2026-07-04 · Davide Carrese

STM32 USART Interrupt-Driven Communication — Register-Level Configuration on STM32F4

STM32 · USART · Register-Level · Interrupts · STM32F4 · Embedded

UART is the most taken-for-granted peripheral in embedded systems — until it silently drops bytes, corrupts frames, or locks up with an overrun error. When you need reliable interrupt-driven communication without the HAL overhead, the register-level USART on STM32F4 is straightforward, provided you handle exactly four flags and one timing detail correctly.

I have debugged more UART issues on client projects than any other peripheral: silent overrun on a GNSS receiver that drops a byte every 30 seconds at 115200 baud, framing errors on a misconfigured RS-485 bus, and a USART that appeared to work but emitted the first byte of every message at the wrong baud rate because the TXE interrupt fired before the BRR was fully stable. Every one of these issues was invisible at the HAL level and trivially diagnosable from the SR register.

This article walks through USART register-level initialisation on STM32F401/STM32F4, implements interrupt-driven TX with a DMA-like ring buffer, adds robust RX with overrun recovery, and demonstrates the pattern on a real UART-based NMEA GPS module.

USART register map on STM32F4

The STM32F401 has three USARTs: USART1 and USART6 on APB2 (up to 84 MHz clock) and USART2 on APB1 (up to 42 MHz). USART3 is present only on larger STM32F4 parts. Each uses the same register layout. The registers that matter for standard asynchronous communication:

Offset  Register  Purpose
0x00    SR        Status: TXE, TC, RXNE, IDLE, ORE, NE, FE, PE
0x04    DR        Data register, 9-bit; write for TX, read for RX
0x08    BRR       Baud rate register: DIV_Mantissa [15:4] + DIV_Fraction [3:0]
0x0C    CR1       Control 1: UE, TE, RE, TXEIE, TCIE, RXNEIE, IDLEIE, M, PCE, PS, OVER8
0x10    CR2       Control 2: STOP bits, LINEN, CLKEN, CPOL/CPHA (synchronous mode)
0x14    CR3       Control 3: EIE, DMAR, DMAT, RTSE, CTSE (RTS/CTS flow control)
0x18    GTPR      Guard time and prescaler (IrDA / Smartcard only)
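
As a cross-check of the offsets, the register block can be sketched as a C struct that compiles on a host. The type name usart_regs_t is hypothetical and used only for illustration; in real firmware the CMSIS device header already provides USART_TypeDef for this purpose.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the USART register block (illustration only).
   Each register occupies one 32-bit word, so offsets advance by 4. */
typedef struct {
    volatile uint32_t SR;    /* 0x00 status */
    volatile uint32_t DR;    /* 0x04 data */
    volatile uint32_t BRR;   /* 0x08 baud rate */
    volatile uint32_t CR1;   /* 0x0C control 1 */
    volatile uint32_t CR2;   /* 0x10 control 2 */
    volatile uint32_t CR3;   /* 0x14 control 3 */
    volatile uint32_t GTPR;  /* 0x18 guard time and prescaler */
} usart_regs_t;

/* Compile-time checks: the build fails if the layout drifts. */
_Static_assert(offsetof(usart_regs_t, CR1)  == 0x0C, "CR1 offset");
_Static_assert(offsetof(usart_regs_t, GTPR) == 0x18, "GTPR offset");
```

The static assertions cost nothing at run time and catch a mistyped field order at build time.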

Baud rate generation: the BRR trap

The USART baud rate formula depends on the OVER8 bit in CR1:

OVER8 = 0 (default, 16× oversampling):
  baud = f_CK / (16 × USARTDIV)

OVER8 = 1 (8× oversampling, higher max baud but less noise immunity):
  baud = f_CK / (8 × USARTDIV)

USARTDIV is an unsigned fixed-point number (12-bit integer part, 4-bit fraction) encoded in the low 16 bits of BRR as:

DIV_Mantissa [15:4] = integer part of USARTDIV
DIV_Fraction  [3:0] = fractional part × 16 (for OVER8=0)
                    = fractional part × 8  (for OVER8=1)

For 115200 baud on USART1 (APB2 = 84 MHz) with 16× oversampling:

USARTDIV = 84 MHz / (16 × 115200) = 45.5729...
DIV_Mantissa = 45  (0x2D)
DIV_Fraction = 0.5729 × 16 = 9.166 → 9  (0x9)
BRR = (45 << 4) | 9 = 0x2D9

Actual baud = 84 MHz / (16 × (45 + 9/16)) = 84 MHz / 729 = 115,226
Error = (115226 - 115200) / 115200 = +0.023% — well within ±2% tolerance.
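
The same arithmetic can be sketched as a small helper so the constant does not have to be derived by hand. The function name usart_brr is hypothetical, and the rounding choice (nearest) is an assumption; it reproduces the 0x2D9 value worked out above.

```c
#include <stdint.h>

/* Hypothetical helper: BRR value for a peripheral clock f_ck (Hz) and a
   target baud rate. over8 = 0 selects 16x oversampling, 1 selects 8x. */
uint32_t usart_brr(uint32_t f_ck, uint32_t baud, int over8)
{
    if (!over8) {
        /* 16 x USARTDIV = f_ck / baud, rounded to nearest.
           With a 4-bit fraction this number is the BRR value itself. */
        return (f_ck + baud / 2u) / baud;
    }

    /* For 8x oversampling USARTDIV = f_ck / (8 x baud),
       so 16 x USARTDIV = 2 x f_ck / baud. */
    uint32_t div16 = (2u * f_ck + baud / 2u) / baud;

    /* Mantissa stays in [15:4]; the fraction shrinks to 3 bits in [2:0],
       leaving bit 3 clear. */
    return (div16 & 0xFFF0u) | ((div16 & 0x000Fu) >> 1);
}
```

For example, usart_brr(84000000, 115200, 0) gives 0x2D9, matching the hand calculation.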

The trap: if you set OVER8=1 and keep the default BRR calculation for 16× sampling, your baud rate will be almost exactly double what you expect. The reference manual (RM0368 §19.3.4) is clear, but it is easy to miss when porting code between STM32 families. Always check OVER8 before debugging a "garbage" UART link.
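
To see the size of the error, the BRR value can be decoded back into a baud rate for each oversampling mode. This is a sketch; usart_effective_baud is a hypothetical name, and ignoring bit 3 of the fraction when OVER8=1 is an assumption based on the BRR register description.

```c
#include <stdint.h>

/* Hypothetical helper: baud rate produced by a given BRR value.
   Returns 0 if the divisor would be zero. */
uint32_t usart_effective_baud(uint32_t f_ck, uint32_t brr, int over8)
{
    uint32_t mantissa = (brr >> 4) & 0x0FFFu;
    uint32_t divisor;

    if (!over8) {
        /* 16 x USARTDIV = mantissa x 16 + 4-bit fraction */
        divisor = mantissa * 16u + (brr & 0xFu);
    } else {
        /* 8 x USARTDIV = mantissa x 8 + 3-bit fraction (bit 3 ignored) */
        divisor = mantissa * 8u + (brr & 0x7u);
    }
    return (divisor == 0u) ? 0u : f_ck / divisor;
}
```

With f_ck = 84 MHz and BRR = 0x2D9, the function returns 115226 for OVER8=0 and 232686 for OVER8=1, roughly 2.02 times the intended rate.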

Register-level initialisation function

Here is a complete USART1 initialisation for 115200/8N1 on STM32F401 with 16× oversampling, TX and RX enabled, and interrupt mode:

#include "stm32f4xx.h"

#define USART1_BAUD_115200   0x2D9

void usart1_init(void) {
    RCC->APB2ENR  |= RCC_APB2ENR_USART1EN;
    RCC->AHB1ENR  |= RCC_AHB1ENR_GPIOAEN;
    __DSB();

    /* PA9 (TX) as alternate function push-pull */
    GPIOA->MODER   &= ~(3U << 18);
    GPIOA->MODER   |=  (2U << 18);
    GPIOA->AFR[1]  &= ~(0xF << 4);
    GPIOA->AFR[1]  |=  (7U << 4);    /* AF7 = USART1_TX */
    GPIOA->OSPEEDR |=  (3U << 18);

    /* PA10 (RX) as alternate function input */
    GPIOA->MODER   &= ~(3U << 20);
    GPIOA->MODER   |=  (2U << 20);
    GPIOA->AFR[1]  &= ~(0xF << 8);
    GPIOA->AFR[1]  |=  (7U << 8);    /* AF7 = USART1_RX */
    GPIOA->PUPDR   &= ~(3U << 20);
    GPIOA->PUPDR   |=  (1U << 20);   /* Pull-up on RX */

    USART1->CR1 = 0;
    USART1->BRR = USART1_BAUD_115200;
    USART1->CR2 = 0;                 /* 1 stop bit, async */
    USART1->CR3 = 0;                 /* No flow control */
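    USART1->CR1 = USART_CR1_UE | USART_CR1_TE | USART_CR1_RE
                | USART_CR1_RXNEIE;

    /* Without this the RXNE/TXE requests never reach USART1_IRQHandler */
    NVIC_EnableIRQ(USART1_IRQn);
}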

I keep RXNEIE enabled permanently. The TXEIE is enabled only when we have data to send, and disabled when the ring buffer drains. This prevents the ISR from firing uselessly when there is nothing to transmit.
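
The shift constants in usart1_init() (18 and 20 for MODER, 4 and 8 for AFR[1]) follow directly from the pin number. A minimal sketch of that mapping, using a hypothetical gpio_af_pos() helper that is not part of CMSIS:

```c
#include <stdint.h>

/* Bit positions of one pin inside the GPIO configuration registers. */
typedef struct {
    uint8_t moder_shift;  /* 2 bits per pin in MODER, OSPEEDR and PUPDR */
    uint8_t afr_index;    /* AFR[0] covers pins 0-7, AFR[1] pins 8-15 */
    uint8_t afr_shift;    /* 4 bits per pin inside the selected AFR word */
} gpio_af_pos_t;

/* Hypothetical helper: compute the shifts for pin 0..15. */
gpio_af_pos_t gpio_af_pos(uint8_t pin)
{
    gpio_af_pos_t p;
    p.moder_shift = (uint8_t)(pin * 2u);
    p.afr_index   = (uint8_t)(pin / 8u);
    p.afr_shift   = (uint8_t)((pin % 8u) * 4u);
    return p;
}
```

For PA9 this yields a MODER shift of 18 and AFR[1] shift of 4; for PA10, 20 and 8, which are the values used in the initialisation function.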

Interrupt-driven TX with a ring buffer

A ring buffer decouples the application code (which calls usart1_send()) from the ISR (which feeds bytes into DR). The producer (caller) writes to the buffer; the consumer (ISR) reads from it. They never block each other as long as the buffer does not overflow.

#define TX_BUF_SIZE  256

static volatile uint8_t  tx_buf[TX_BUF_SIZE];
static volatile uint16_t tx_head = 0;
static volatile uint16_t tx_tail = 0;

static inline uint16_t tx_next(uint16_t i) {
    return (i + 1) & (TX_BUF_SIZE - 1);
}

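void usart1_send(const uint8_t *data, uint16_t len) {
    for (uint16_t i = 0; i < len; i++) {
        uint16_t next_head = tx_next(tx_head);
        if (next_head == tx_tail) {
            /* Buffer full: make sure the ISR is draining before waiting,
               otherwise a message longer than the ring would hang here */
            USART1->CR1 |= USART_CR1_TXEIE;
            while (next_head == tx_tail) { /* wait for one free slot */ }
        }
        tx_buf[tx_head] = data[i];
        tx_head = next_head;
    }
    USART1->CR1 |= USART_CR1_TXEIE;
}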
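
usart1_send() busy-waits when the ring is full. Where blocking is not acceptable, one alternative is a write that queues as much as fits and reports the count. The sketch below runs on a host and uses a hypothetical ring_t type instead of the article's globals; on the target, the caller would still set TXEIE after a successful write.

```c
#include <stdint.h>

#define RING_SIZE 256u   /* must be a power of two */

typedef struct {
    volatile uint8_t  buf[RING_SIZE];
    volatile uint16_t head;   /* advanced by the producer only */
    volatile uint16_t tail;   /* advanced by the consumer only */
} ring_t;

/* Number of bytes currently queued. */
uint16_t ring_used(const ring_t *r)
{
    return (uint16_t)((r->head - r->tail) & (RING_SIZE - 1u));
}

/* Queue up to len bytes without blocking; returns how many were accepted.
   One slot always stays empty so that head == tail means "empty". */
uint16_t ring_write(ring_t *r, const uint8_t *data, uint16_t len)
{
    uint16_t queued = 0;
    while (queued < len) {
        uint16_t next = (uint16_t)((r->head + 1u) & (RING_SIZE - 1u));
        if (next == r->tail) {
            break;                      /* full: stop instead of spinning */
        }
        r->buf[r->head] = data[queued++];
        r->head = next;
    }
    return queued;
}

/* Consumer side: returns 1 and stores a byte, or 0 if the ring is empty. */
int ring_read(ring_t *r, uint8_t *out)
{
    if (r->head == r->tail) {
        return 0;
    }
    *out = r->buf[r->tail];
    r->tail = (uint16_t)((r->tail + 1u) & (RING_SIZE - 1u));
    return 1;
}
```

A 256-slot ring therefore accepts at most 255 bytes before ring_write() returns a short count, and the caller decides whether to retry, drop, or report the shortfall.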

The ISR sends the byte at tx_tail and advances. When the ring drains (tx_head == tx_tail), it disables TXEIE:

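static volatile uint32_t error_counter = 0;
static void rx_put(uint8_t byte);      /* defined with the RX ring buffer */

void USART1_IRQHandler(void) {
    /* Read SR once. On STM32F4, reading SR and then DR clears RXNE and
       the ORE/FE/NE error flags in one sequence. */
    uint32_t sr = USART1->SR;

    /* --- Receive (RXNE) and error handling --- */
    if (sr & (USART_SR_RXNE | USART_SR_ORE | USART_SR_FE | USART_SR_NE)) {
        uint8_t byte = (uint8_t)(USART1->DR & 0xFF);   /* the only DR read */

        if (sr & (USART_SR_ORE | USART_SR_FE | USART_SR_NE)) {
            error_counter++;
        }
        /* With FE or NE the byte itself is suspect. With ORE alone the
           byte in DR is still valid; the lost one never reached DR. */
        if ((sr & USART_SR_RXNE) && !(sr & (USART_SR_FE | USART_SR_NE))) {
            rx_put(byte);
        }
    }

    /* --- Transmit (TXE) --- */
    if ((sr & USART_SR_TXE) && (USART1->CR1 & USART_CR1_TXEIE)) {
        if (tx_head != tx_tail) {
            USART1->DR = tx_buf[tx_tail];
            tx_tail = tx_next(tx_tail);
        } else {
            USART1->CR1 &= ~USART_CR1_TXEIE;   /* ring drained */
        }
    }
}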

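Three crucial details:

  1. TXE is set whenever DR is empty, not only when you want to transmit. The handler therefore checks TXEIE as well, so an RX interrupt arriving on an idle transmitter is not mistaken for a TX event.
  2. Reading DR is what clears RXNE. Read it once per received byte and keep the value in a local variable.
  3. ORE, FE and NE are cleared by reading SR and then DR, so the handler must read DR whenever an error flag is set, even if the byte is thrown away.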

RX ring buffer and line-based parsing

For line-oriented protocols such as NMEA (GPS) or AT commands, the RX handler stores bytes into a ring buffer and the application periodically checks for complete lines. MODBUS RTU can use the same ring buffer, but its frames end on a silent interval rather than a line terminator. The ISR does no parsing. It just stores bytes.

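#include <stdbool.h>
#include <stdint.h>

#define RX_BUF_SIZE  512   /* power of two, so the index wraps with a mask */

static volatile uint8_t  rx_buf[RX_BUF_SIZE];
static volatile uint16_t rx_head = 0;
static volatile uint16_t rx_tail = 0;
static volatile uint32_t rx_overflow = 0;  /* bytes dropped: ring was full */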

static void rx_put(uint8_t byte) {
    uint16_t next = (rx_head + 1) & (RX_BUF_SIZE - 1);
    if (next != rx_tail) {
        rx_buf[rx_head] = byte;
        rx_head = next;
    } else {
        rx_overflow++;
    }
}

bool usart1_get_byte(uint8_t *byte) {
    if (rx_tail == rx_head) return false;
    *byte = rx_buf[rx_tail];
    rx_tail = (rx_tail + 1) & (RX_BUF_SIZE - 1);
    return true;
}

void process_nmea(void) {
    uint8_t byte;
    static char line[128];
    static uint8_t idx = 0;

    while (usart1_get_byte(&byte)) {
        if (byte == '\n' || idx >= sizeof(line) - 1) {
            line[idx] = '\0';
            parse_nmea_sentence(line);
            idx = 0;
        } else if (byte != '\r') {
            line[idx++] = byte;
        }
    }
}

This separation means the ISR stays under 1 µs on an 84 MHz STM32F401, and the application parses at its own pace outside interrupt context.
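
parse_nmea_sentence() is not shown in this article. One check it would typically perform first is the NMEA checksum: the XOR of every character between '$' and '*', compared against the two hex digits that follow the '*'. A minimal sketch, with the hypothetical function name nmea_checksum_ok:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Convert one hex digit to its value, or -1 if it is not a hex digit. */
static int hex_val(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Hypothetical helper: true if the sentence starts with '$', contains
   "*HH", and HH equals the XOR of the characters between '$' and '*'. */
bool nmea_checksum_ok(const char *s)
{
    if (s == NULL || *s != '$') {
        return false;
    }

    uint8_t sum = 0;
    const char *p = s + 1;
    while (*p != '\0' && *p != '*') {
        sum ^= (uint8_t)*p++;
    }
    if (*p != '*') {
        return false;               /* no checksum field present */
    }

    int hi = hex_val(p[1]);
    if (hi < 0) {
        return false;               /* also catches a string ending at '*' */
    }
    int lo = hex_val(p[2]);
    if (lo < 0) {
        return false;
    }
    return sum == (uint8_t)((hi << 4) | lo);
}
```

Rejecting a sentence here is cheap and keeps corrupted lines (for example after an overrun) from reaching the field parser.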

Error handling: ORE, FE, NE

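ORE (overrun) means a new byte finished arriving before the previous one was read from DR. FE (framing error) means the expected stop bit was not found. NE (noise error) means the samples taken within a bit disagreed. I always increment a global error_counter rather than silently clearing these flags. On one client project, the counter showed ORE incrementing exactly once every 32 bytes at 115200 baud: the RX ISR had a printf call for debug. Removing it fixed the errors.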
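
A single error_counter says that something went wrong but not what. A possible refinement is one counter per flag, computed from an SR snapshot. The sketch below is host-runnable; the names usart_err_stats_t and usart_rx_account are hypothetical, and the SR_* macros are local copies of the STM32F4 SR bit positions so that the block does not depend on the CMSIS header.

```c
#include <stdbool.h>
#include <stdint.h>

/* SR bit positions on STM32F4 (local names to avoid clashing with CMSIS) */
#define SR_FE    (1u << 1)   /* framing error */
#define SR_NE    (1u << 2)   /* noise detected */
#define SR_ORE   (1u << 3)   /* overrun */
#define SR_RXNE  (1u << 5)   /* receive data register not empty */

typedef struct {
    uint32_t ore;
    uint32_t fe;
    uint32_t ne;
} usart_err_stats_t;

/* Update the per-flag counters from one SR snapshot.
   Returns true if the byte read from DR should be stored:
   RXNE must be set, and the byte must not carry FE or NE. */
bool usart_rx_account(uint32_t sr, usart_err_stats_t *st)
{
    if (sr & SR_ORE) { st->ore++; }
    if (sr & SR_FE)  { st->fe++;  }
    if (sr & SR_NE)  { st->ne++;  }

    return (sr & SR_RXNE) != 0u && (sr & (SR_FE | SR_NE)) == 0u;
}
```

Inside an ISR this would be called with the value read from SR, immediately before or after the single DR read. A rising ore count points at ISR latency, while fe usually points at baud rate or wiring.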

Practical example: NMEA GPS on STM32F401

Consider a client project with an STM32F401 reading a u-blox NEO-6M GPS module over USART2 at 9600 baud:

/* APB1 = 42 MHz: USARTDIV = 42 MHz / (16 × 9600) = 273.4375; 0.4375 × 16 = 7 */
#define USART2_BRR_9600  ((273U << 4) | 7U)

void usart2_init(void) {
    RCC->APB1ENR |= RCC_APB1ENR_USART2EN;
    RCC->AHB1ENR |= RCC_AHB1ENR_GPIOAEN;
    __DSB();

    GPIOA->MODER = (GPIOA->MODER & ~(3U << 4) & ~(3U << 6))
                 | (2U << 4) | (2U << 6);
    GPIOA->AFR[0] = (GPIOA->AFR[0] & ~(0xF << 8) & ~(0xF << 12))
                  | (7U << 8) | (7U << 12);
    GPIOA->PUPDR = (GPIOA->PUPDR & ~(3U << 6)) | (1U << 6);

    USART2->BRR = USART2_BRR_9600;
    USART2->CR1 = USART_CR1_UE | USART_CR1_TE | USART_CR1_RE | USART_CR1_RXNEIE;
    NVIC_EnableIRQ(USART2_IRQn);
}

The USART2 ISR, which follows the same RX pattern as USART1_IRQHandler above, stores bytes into the ring buffer. The main loop extracts $GPGGA sentences:

void main_loop(void) {
    while (1) {
        process_nmea();
        if (new_position_available) {
            new_position_available = false;
            char report[64];
            snprintf(report, sizeof(report),
                     "POS: %.6f,%.6f ALT:%.1f\r\n",
                     latitude, longitude, altitude);
            usart1_send((uint8_t *)report, strlen(report));
        }
    }
}

This pattern works identically for AT command modems, serial sensors, and debug consoles. The ring buffer absorbs bursty traffic while the main loop processes at its own pace.
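
The field parsing behind latitude and longitude is not shown above. NMEA encodes coordinates as ddmm.mmmm (degrees followed by decimal minutes) plus a hemisphere letter, so one step any GGA parser needs is a conversion to signed decimal degrees. A minimal sketch, with the hypothetical name nmea_to_degrees and double precision chosen for clarity rather than speed:

```c
#include <stdlib.h>

/* Hypothetical helper: convert an NMEA coordinate field such as
   "4807.038" (48 degrees, 7.038 minutes) to decimal degrees.
   South and West hemispheres produce negative values. */
double nmea_to_degrees(const char *field, char hemisphere)
{
    double raw = strtod(field, NULL);         /* e.g. 4807.038 */
    int degrees = (int)(raw / 100.0);         /* 48 */
    double minutes = raw - degrees * 100.0;   /* 7.038 */
    double value = degrees + minutes / 60.0;  /* 48 + 7.038 / 60 */

    if (hemisphere == 'S' || hemisphere == 'W') {
        value = -value;
    }
    return value;
}
```

For "4807.038" with hemisphere 'N' the result is approximately 48.1173 degrees. On a Cortex-M4 with a single-precision FPU, double arithmetic runs in software, which is acceptable at GPS update rates but is worth knowing.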

Practical checklist

  1. Verify OVER8 before debugging baud rate issues. A wrong oversampling factor doubles or halves the baud rate.
  2. Never print inside the USART ISR: printf, semihosting, or blocking GPIO toggles will cause ORE. Use a dedicated debug USART.
  3. Read DR once per RXNE event: a cached read pattern may read DR twice. Each read acknowledges one byte.
  4. Handle ORE explicitly: the USART does not stall on overrun. DR keeps the older byte and the newly received one is lost. Log the error or switch to DMA.
  5. Keep the RX ISR under one byte time: at 115200 baud, one byte (10 bits) takes 86.8 µs. At 921600 baud, the budget is 10.9 µs.
  6. Set a pull-up on the RX pin: a floating RX pin picks up noise causing spurious RXNE interrupts.
  7. Use power-of-2 ring buffer sizes: the & (SIZE - 1) trick is a single-cycle AND instruction on Cortex-M4.
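
The byte-time budget in item 5 is simple arithmetic: bits per frame divided by baud rate. A small sketch, with hypothetical function names, that also expresses the budget in CPU cycles:

```c
#include <stdint.h>

/* Time on the wire for one frame, in nanoseconds.
   For 8N1, bits_per_frame is 10 (start + 8 data + stop). */
uint32_t usart_byte_time_ns(uint32_t baud, uint32_t bits_per_frame)
{
    return (uint32_t)(((uint64_t)bits_per_frame * 1000000000ull) / baud);
}

/* The same budget expressed in CPU cycles at clock f_cpu (Hz). */
uint32_t usart_byte_budget_cycles(uint32_t f_cpu, uint32_t baud,
                                  uint32_t bits_per_frame)
{
    return (uint32_t)(((uint64_t)f_cpu * bits_per_frame) / baud);
}
```

At 115200 baud with 8N1 framing this gives 86805 ns per byte, or about 7291 cycles at 84 MHz. At 921600 baud the budget shrinks to 10850 ns.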

How I would approach this on a client project

Every embedded project I start now uses the same three-layer UART architecture: a register-level initialisation, a ring-buffer ISR, and a line-based parser in the application. This pattern has shipped on at least a dozen STM32F4 projects — GNSS receivers, LoRaWAN gateways, cellular modems, RS-485 MODBUS slaves, and custom bootloaders.

The key lesson from production debugging: do not trust any UART link until you have stressed it with a continuous stream at full baud rate for at least 60 seconds. A cheap USB-UART adapter and cat /dev/urandom > /dev/ttyUSB0 on the PC side, with an error counter on the STM32 side, has caught ORE issues on three separate projects before they reached production.
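
Random data exercises the link, but only the hardware error flags reveal a loss. A variation on the test, not the one described above, is to send an incrementing byte pattern so the receiver can also detect bytes that vanished without raising a flag. The sketch below shows the receiving side; seq_check_t and seq_check_feed are hypothetical names, and it assumes the sender transmits 0, 1, 2, ..., 255, 0, ... continuously.

```c
#include <stdint.h>

/* State for checking an incrementing 8-bit sequence. */
typedef struct {
    uint8_t  expected;   /* next byte value we expect to see */
    uint8_t  synced;     /* 0 until the first byte has been received */
    uint32_t received;   /* total bytes fed in */
    uint32_t gaps;       /* number of times the sequence was broken */
} seq_check_t;

/* Feed one received byte. The first byte only synchronises the checker;
   after that, any byte other than previous + 1 (mod 256) counts as a gap. */
void seq_check_feed(seq_check_t *c, uint8_t byte)
{
    if (c->synced && byte != c->expected) {
        c->gaps++;
    }
    c->synced = 1;
    c->expected = (uint8_t)(byte + 1u);   /* wraps from 255 to 0 */
    c->received++;
}
```

Called from the main loop for every byte taken out of the RX ring, a non-zero gaps value after the 60-second run indicates lost or corrupted data even if the error counter stayed at zero.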

On one occasion, the client's PCB had swapped the RX and TX pins on the DB9 connector — the loopback test caught it immediately, whereas a HAL-based initialisation would have returned a HAL_BUSY that the firmware engineer would have spent hours debugging as an "intermittent" issue.
