STM32 USART Interrupt-Driven Communication — Register-Level Configuration on STM32F4
UART is the most taken-for-granted peripheral in embedded systems — until it silently drops bytes, corrupts frames, or locks up with an overrun error. When you need reliable interrupt-driven communication without the HAL overhead, the register-level USART on STM32F4 is straightforward, provided you handle exactly four flags and one timing detail correctly.
I have debugged more UART issues on client projects than any other peripheral: silent overrun on a GNSS receiver that drops a byte every 30 seconds at 115200 baud, framing errors on a misconfigured RS-485 bus, and a USART that appeared to work but emitted the first byte of every message at the wrong baud rate because the TXE interrupt fired before the BRR was fully stable. Every one of these issues was invisible at the HAL level and trivially diagnosable from the SR register.
This article walks through USART register-level initialisation on STM32F401/STM32F4, implements interrupt-driven TX with a DMA-like ring buffer, adds robust RX with overrun recovery, and demonstrates the pattern on a real UART-based NMEA GPS module.
USART register map on STM32F4
The STM32F401 has three USARTs: USART1 and USART6 on APB2 (up to 84 MHz clock) and USART2 on APB1 (up to 42 MHz). Each uses the same register layout. The registers that matter for standard asynchronous communication:
| Offset | Register | Purpose |
|---|---|---|
| 0x00 | SR | Status — TXE, TC, RXNE, IDLE, ORE, NE, FE, PE |
| 0x04 | DR | Data register — 9-bit; write for TX, read for RX |
| 0x08 | BRR | Baud rate register — DIV_Mantissa [15:4] + DIV_Fraction [3:0] |
| 0x0C | CR1 | Control 1 — UE, TE, RE, TXEIE, TCIE, RXNEIE, IDLEIE, M, PCE, PS, OVER8 |
| 0x10 | CR2 | Control 2 — STOP bits, LINEN, CLKEN, CPOL/CPHA (synchronous mode) |
| 0x14 | CR3 | Control 3 — EIE, DMAR, DMAT, RTSE, CTSE (RTS/CTS flow control) |
| 0x18 | GTPR | Guard time and prescaler (IrDA / Smartcard only) |
Baud rate generation: the BRR trap
The USART baud rate formula depends on the OVER8 bit in CR1:
OVER8 = 0 (default, 16× oversampling): baud = f_CK / (16 × USARTDIV)
OVER8 = 1 (8× oversampling, higher max baud but less noise immunity): baud = f_CK / (8 × USARTDIV)
USARTDIV is an unsigned fixed-point number (12-bit mantissa, 4-bit fraction) encoded in BRR as:
DIV_Mantissa [15:4] = integer part of USARTDIV
DIV_Fraction [3:0] = fractional part × 16 (for OVER8=0)
DIV_Fraction [2:0] = fractional part × 8 (for OVER8=1; bit 3 must be kept clear)
For 115200 baud on USART1 (APB2 = 84 MHz) with 16× oversampling:
USARTDIV = 84 MHz / (16 × 115200) = 45.5729...
DIV_Mantissa = 45 (0x2D)
DIV_Fraction = 0.5729 × 16 = 9.166 → 9 (0x9)
BRR = (45 << 4) | 9 = 0x2D9
Actual baud = 84 MHz / (16 × (45 + 9/16)) = 84 MHz / 729 = 115,226
Error = (115226 - 115200) / 115200 = +0.023%, well within ±2% tolerance.
The trap: if you set OVER8=1 and keep the BRR value calculated for 16× sampling, your baud rate will be almost exactly double what you expect. The reference manual (RM0368 §19.3.4) is clear, but it is easy to miss when porting code between STM32 families. Always check OVER8 before debugging a "garbage" UART link.
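The arithmetic above can be folded into a small helper so that the OVER8 setting and the BRR value cannot drift apart. This is a minimal sketch, not part of the original code: the name usart_brr and its parameters are assumptions for illustration, and it rounds to the nearest divider step.

```c
#include <stdint.h>

/* Hypothetical helper (illustrative name): compute the BRR value for a given
 * peripheral clock, baud rate and OVER8 setting, rounding to nearest. */
static uint32_t usart_brr(uint32_t f_ck_hz, uint32_t baud, int over8)
{
    /* f_CK / baud, rounded. With OVER8=0 this is USARTDIV in 1/16 steps,
     * with OVER8=1 it is USARTDIV in 1/8 steps. */
    uint32_t div = (f_ck_hz + baud / 2U) / baud;

    if (!over8) {
        /* 16x oversampling: mantissa in [15:4], 4-bit fraction in [3:0],
         * which is exactly the rounded quotient. */
        return div;
    }

    /* 8x oversampling: mantissa still in [15:4], 3-bit fraction in [2:0],
     * bit 3 left at zero. */
    return ((div >> 3) << 4) | (div & 0x7U);
}
```

With f_ck_hz = 84000000 and baud = 115200, the OVER8=0 path reproduces the 0x2D9 worked out above. The OVER8=1 path yields a different register value for the same baud rate, which is the mismatch the trap describes.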
Register-level initialisation function
Here is a complete USART1 initialisation for 115200/8N1 on STM32F401 with 16× oversampling, TX and RX enabled, and interrupt mode:
#include "stm32f4xx.h"
#define USART1_BAUD_115200 0x2D9
void usart1_init(void) {
    /* Enable the peripheral and GPIO clocks before touching their registers */
    RCC->APB2ENR |= RCC_APB2ENR_USART1EN;
    RCC->AHB1ENR |= RCC_AHB1ENR_GPIOAEN;
    __DSB();
    /* PA9 (TX) as alternate function push-pull */
    GPIOA->MODER &= ~(3U << 18);
    GPIOA->MODER |= (2U << 18);
    GPIOA->AFR[1] &= ~(0xFU << 4);
    GPIOA->AFR[1] |= (7U << 4); /* AF7 = USART1_TX */
    GPIOA->OSPEEDR |= (3U << 18);
    /* PA10 (RX) as alternate function input */
    GPIOA->MODER &= ~(3U << 20);
    GPIOA->MODER |= (2U << 20);
    GPIOA->AFR[1] &= ~(0xFU << 8);
    GPIOA->AFR[1] |= (7U << 8); /* AF7 = USART1_RX */
    GPIOA->PUPDR &= ~(3U << 20);
    GPIOA->PUPDR |= (1U << 20); /* Pull-up on RX */
    USART1->CR1 = 0;            /* Disable the USART while configuring */
    USART1->BRR = USART1_BAUD_115200;
    USART1->CR2 = 0; /* 1 stop bit, async */
    USART1->CR3 = 0; /* No flow control */
    USART1->CR1 = USART_CR1_UE | USART_CR1_TE | USART_CR1_RE
                | USART_CR1_RXNEIE;
    NVIC_EnableIRQ(USART1_IRQn); /* Without this, USART1_IRQHandler never runs */
}
I keep RXNEIE enabled permanently. The TXEIE is enabled only when we have data to send, and disabled when the ring buffer drains. This prevents the ISR from firing uselessly when there is nothing to transmit.
Interrupt-driven TX with a ring buffer
A ring buffer decouples the application code (which calls usart1_send()) from the ISR (which feeds bytes into DR). The producer (caller) writes to the buffer; the consumer (ISR) reads from it. They never block each other as long as the buffer does not overflow.
#define TX_BUF_SIZE 256
static volatile uint8_t tx_buf[TX_BUF_SIZE];
static volatile uint16_t tx_head = 0;
static volatile uint16_t tx_tail = 0;
static inline uint16_t tx_next(uint16_t i) {
    return (i + 1) & (TX_BUF_SIZE - 1);
}
void usart1_send(const uint8_t *data, uint16_t len) {
    for (uint16_t i = 0; i < len; i++) {
        uint16_t next_head = tx_next(tx_head);
        if (next_head == tx_tail) {
            /* Buffer full: make sure the ISR is draining it before we wait,
               otherwise a message longer than the buffer blocks forever */
            USART1->CR1 |= USART_CR1_TXEIE;
            while (next_head == tx_tail) { }
        }
        tx_buf[tx_head] = data[i];
        tx_head = next_head;
    }
    USART1->CR1 |= USART_CR1_TXEIE;
}
The ISR sends the byte at tx_tail and advances. When the ring drains (tx_head == tx_tail), it disables TXEIE:
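static void rx_put(uint8_t byte);   /* Defined with the RX ring buffer */
volatile uint32_t error_counter = 0;
void USART1_IRQHandler(void) {
    /* Read SR once. An SR read followed by a DR read clears ORE, FE and NE,
       so the error flags are tested on this snapshot, not re-read later. */
    uint32_t sr = USART1->SR;
    /* --- Receive (RXNE) and error handling --- */
    if (sr & (USART_SR_RXNE | USART_SR_ORE | USART_SR_FE | USART_SR_NE)) {
        uint8_t byte = (uint8_t)(USART1->DR & 0xFF); /* Single DR read per event */
        if (sr & (USART_SR_ORE | USART_SR_FE | USART_SR_NE)) {
            error_counter++;
        }
        if (sr & USART_SR_RXNE) {
            rx_put(byte);
        }
    }
    /* --- Transmit (TXE) --- */
    if ((sr & USART_SR_TXE) && (USART1->CR1 & USART_CR1_TXEIE)) {
        if (tx_head != tx_tail) {
            USART1->DR = tx_buf[tx_tail];
            tx_tail = tx_next(tx_tail);
        } else {
            USART1->CR1 &= ~USART_CR1_TXEIE;
        }
    }
}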
Three crucial details:
- Check both SR and CR1 for TXE: TXE is set whenever the transmit data register is empty, which includes every moment when there is nothing to send. The CR1 & TXEIE guard ensures we only transmit when explicitly enabled.
- Read DR once per RXNE: each DR read consumes one received byte. Reading twice discards the second byte silently.
- Clear ORE by reading SR then DR: the RM says "cleared by a read to the SR register followed by a read to the DR register". A single SR read is not enough.
RX ring buffer and line-based parsing
For asynchronous protocols such as NMEA (GPS), AT, or MODBUS RTU, the RX handler stores bytes into a ring buffer and the application periodically checks for complete lines. The ISR does no parsing. It just stores bytes.
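#include <stdbool.h>
#define RX_BUF_SIZE 512
static volatile uint8_t rx_buf[RX_BUF_SIZE];
static volatile uint16_t rx_head = 0;
static volatile uint16_t rx_tail = 0;
static volatile uint32_t rx_overflow = 0; /* Bytes dropped because the ring was full */
/* Called from the ISR: store one byte, or count it as lost if the ring is full */
static void rx_put(uint8_t byte) {
    uint16_t next = (rx_head + 1) & (RX_BUF_SIZE - 1);
    if (next != rx_tail) {
        rx_buf[rx_head] = byte;
        rx_head = next;
    } else {
        rx_overflow++;
    }
}
/* Called from the application: fetch one byte, returns false if the ring is empty */
bool usart1_get_byte(uint8_t *byte) {
    if (rx_tail == rx_head) return false;
    *byte = rx_buf[rx_tail];
    rx_tail = (rx_tail + 1) & (RX_BUF_SIZE - 1);
    return true;
}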
void process_nmea(void) {
    uint8_t byte;
    static char line[128];
    static uint8_t idx = 0;
    while (usart1_get_byte(&byte)) {
        if (byte == '\n' || idx >= sizeof(line) - 1) {
            line[idx] = '\0'; /* Terminate the string before parsing */
            parse_nmea_sentence(line);
            idx = 0;
        } else if (byte != '\r') {
            line[idx++] = (char)byte;
        }
    }
}
This separation means the ISR stays under 1 µs on an 84 MHz STM32F401, and the application parses at its own pace in normal code context.
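The body of parse_nmea_sentence is not shown here. A common first step in such a parser is to reject lines whose checksum does not match, since NMEA 0183 sentences end in "*HH", where HH is the hexadecimal XOR of every character between '$' and '*'. The following is a minimal sketch under that assumption. The names nmea_checksum_ok and hex_val are hypothetical, and the function expects the CR/LF to be stripped already, as process_nmea does.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Convert one hex digit to its value, or -1 if it is not a hex digit. */
static int hex_val(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Hypothetical helper: true if `line` has the form "$...*HH" and HH equals
 * the XOR of all characters between '$' and '*'. */
static bool nmea_checksum_ok(const char *line)
{
    if (line == NULL || line[0] != '$') return false;

    uint8_t sum = 0;
    const char *p = line + 1;
    while (*p != '\0' && *p != '*') {
        sum ^= (uint8_t)*p++;
    }
    if (*p != '*') return false;      /* No checksum field present */

    int hi = hex_val(p[1]);
    if (hi < 0) return false;         /* Also catches a truncated line */
    int lo = hex_val(p[2]);
    if (lo < 0) return false;

    return sum == (uint8_t)((hi << 4) | lo);
}
```

Running this check in the application rather than in the ISR keeps the interrupt path short, and it catches lines that were truncated by a ring buffer overflow or damaged by a dropped byte.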
Error handling: ORE, FE, NE
- ORE (Overrun Error, bit 3): a new byte arrived before DR was read. The byte already in DR is kept, and the incoming byte in the shift register is lost and unrecoverable. Recovery: read SR, then read DR.
- FE (Framing Error, bit 1): the stop bit was not detected as valid. Usually a baud rate mismatch.
- NE (Noise Error, bit 2): oversampling detected inconsistent samples. Meaningful in electrically noisy environments.
I always increment a global error_counter rather than silently clearing. On one client project, the counter showed ORE incrementing exactly once every 32 bytes at 115200 baud — the RX ISR had a printf call for debug. Removing it fixed the errors.
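A single counter shows that something is wrong. Separate counters per flag show what kind of problem it is: overruns point at ISR latency, framing errors at baud rate, noise errors at wiring. The sketch below is one way to split them, using the bit positions listed above. The struct and function names are assumptions for illustration, and the sr argument is meant to be an SR value captured before DR is read.

```c
#include <stdint.h>

/* Bit positions as listed above. On target, the CMSIS masks USART_SR_FE,
 * USART_SR_NE and USART_SR_ORE would normally be used instead. */
#define SR_FE_MASK   (1U << 1)
#define SR_NE_MASK   (1U << 2)
#define SR_ORE_MASK  (1U << 3)

/* Hypothetical per-flag counters (illustrative names). */
typedef struct {
    uint32_t overrun;
    uint32_t framing;
    uint32_t noise;
} usart_err_stats_t;

/* Update the counters from one SR snapshot.
 * Returns nonzero if any error flag was set in that snapshot. */
static int usart_count_errors(usart_err_stats_t *st, uint32_t sr)
{
    if (sr & SR_ORE_MASK) st->overrun++;
    if (sr & SR_FE_MASK)  st->framing++;
    if (sr & SR_NE_MASK)  st->noise++;
    return (sr & (SR_ORE_MASK | SR_FE_MASK | SR_NE_MASK)) != 0U;
}
```

The function touches no hardware, so it can be exercised on a PC with hand-built SR values before it goes into the ISR.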
Practical example: NMEA GPS on STM32F401
Consider a client project with an STM32F401 reading a u-blox NEO-6M GPS module over USART2 at 9600 baud:
#define USART2_BRR_9600 ((273 << 4) | 7)
void usart2_init(void) {
    RCC->APB1ENR |= RCC_APB1ENR_USART2EN;
    RCC->AHB1ENR |= RCC_AHB1ENR_GPIOAEN;
    __DSB();
    /* PA2 (TX) and PA3 (RX) as alternate function AF7 */
    GPIOA->MODER = (GPIOA->MODER & ~(3U << 4) & ~(3U << 6))
                 | (2U << 4) | (2U << 6);
    GPIOA->AFR[0] = (GPIOA->AFR[0] & ~(0xFU << 8) & ~(0xFU << 12))
                  | (7U << 8) | (7U << 12);
    /* Pull-up on RX (PA3) */
    GPIOA->PUPDR = (GPIOA->PUPDR & ~(3U << 6)) | (1U << 6);
    USART2->BRR = USART2_BRR_9600;
    USART2->CR1 = USART_CR1_UE | USART_CR1_TE | USART_CR1_RE | USART_CR1_RXNEIE;
    NVIC_EnableIRQ(USART2_IRQn);
}
The ISR stores bytes into the ring buffer. The main loop extracts $GPGGA sentences:
void main_loop(void) {
    while (1) {
        process_nmea();
        if (new_position_available) {
            new_position_available = false;
            char report[64];
            snprintf(report, sizeof(report),
                     "POS: %.6f,%.6f ALT:%.1f\r\n",
                     latitude, longitude, altitude);
            usart1_send((uint8_t *)report, strlen(report));
        }
    }
}
This pattern works identically for AT command modems, serial sensors, and debug consoles. The ring buffer absorbs bursty traffic while the main loop processes at its own pace.
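To see how much burst the buffer actually absorbs, it can help to compute the fill level from the head and tail indices and record a high-water mark during testing. The helpers below are a sketch with assumed names (ring_used, ring_free, RING_SIZE). They rely on the same power-of-two masking as the buffers above and on the convention that one slot always stays empty.

```c
#include <stdint.h>

#define RING_SIZE 512U   /* Assumed power of two, matching RX_BUF_SIZE above */

/* Bytes currently stored, given head (write index) and tail (read index). */
static uint16_t ring_used(uint16_t head, uint16_t tail)
{
    /* Masking handles the wrapped case where head < tail. */
    return (uint16_t)((head - tail) & (RING_SIZE - 1U));
}

/* Free slots. One slot is always left empty so that "full" and "empty"
 * can be told apart, hence RING_SIZE - 1 usable bytes. */
static uint16_t ring_free(uint16_t head, uint16_t tail)
{
    return (uint16_t)((RING_SIZE - 1U) - ring_used(head, tail));
}
```

Sampling ring_used from the main loop and keeping the maximum seen gives a rough idea of whether 512 bytes is generous or marginal for a given traffic pattern.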
Practical checklist
- Verify OVER8 before debugging baud rate issues. A wrong oversampling factor doubles or halves the baud rate.
- Never print inside the USART ISR: printf, semihosting, or blocking GPIO toggles will cause ORE. Use a dedicated debug USART.
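- Read DR once per RXNE event: each read acknowledges one byte, so a second read in the same handler (for example, once to test the value and once to store it) can throw data away. Read DR into a local variable and work from that.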
- Handle ORE explicitly: the USART does not stall on overrun. It keeps the byte already in DR and drops the incoming one, so data disappears with no other symptom. Log the error or switch to DMA.
- Keep the RX ISR under one byte time: at 115200 baud, one byte (10 bits) takes 86.8 µs. At 921600 baud, the budget is 10.9 µs.
- Set a pull-up on the RX pin: a floating RX pin picks up noise causing spurious RXNE interrupts.
- Use power-of-2 ring buffer sizes: the & (SIZE - 1) trick is a single-cycle AND instruction on Cortex-M4.
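The byte-time budget in the checklist is simple arithmetic, and it is often more useful expressed in CPU cycles than in microseconds. The sketch below shows both conversions. The function names are illustrative, and bits_per_frame is assumed to count start, data, parity and stop bits (10 for 8N1).

```c
#include <stdint.h>

/* Time on the wire for one frame, in nanoseconds (truncated).
 * bits_per_frame = start + data + parity + stop, e.g. 10 for 8N1. */
static uint32_t usart_frame_time_ns(uint32_t baud, uint32_t bits_per_frame)
{
    return (uint32_t)(((uint64_t)bits_per_frame * 1000000000ULL) / baud);
}

/* CPU cycles that elapse during one frame at the given core clock (truncated). */
static uint32_t usart_cycles_per_frame(uint32_t core_hz, uint32_t baud,
                                       uint32_t bits_per_frame)
{
    return (uint32_t)(((uint64_t)core_hz * bits_per_frame) / baud);
}
```

At 84 MHz, the budget at 921600 baud works out to under a thousand cycles per byte, which has to cover interrupt entry, the handler itself, and any higher-priority interrupt that preempts it.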
How I would approach this on a client project
Every embedded project I start now uses the same three-layer UART architecture: a register-level initialisation, a ring-buffer ISR, and a line-based parser in the application. This pattern has shipped on at least a dozen STM32F4 projects — GNSS receivers, LoRaWAN gateways, cellular modems, RS-485 MODBUS slaves, and custom bootloaders.
The key lesson from production debugging: do not trust any UART link until you have stressed it with a continuous stream at full baud rate for at least 60 seconds. A cheap USB-UART adapter and cat /dev/urandom > /dev/ttyUSB0 on the PC side, with an error counter on the STM32 side, has caught ORE issues on three separate projects before they reached production.
On one occasion, the client's PCB had swapped the RX and TX pins on the DB9 connector — the loopback test caught it immediately, whereas a HAL-based initialisation would have returned a HAL_BUSY that the firmware engineer would have spent hours debugging as an "intermittent" issue.
