STM32 I2C Master Mode at Register Level: SCL Timing, Start/Stop, and Multi-Byte Transfers

2026-06-03 · Davide Carrese
STM32 · I2C · STM32F4 · Embedded

I2C is everywhere in embedded systems — sensors, EEPROMs, port expanders, ADCs, display controllers. The STM32 HAL makes I2C look easy, but when a sensor stops responding on a production batch, or the bus times out intermittently, the HAL does not tell you why. Understanding the I2C peripheral at register level is the only way to debug SCL timing, NACK handling, and bus errors with confidence. This article walks through STM32F4 I2C master mode using registers directly, from clock configuration to multi-byte data transfers.

I2C peripheral overview (STM32F4 legacy I2C)

The STM32F4 I2C peripheral is the legacy ST I2C block (not the updated I2C v2 found on STM32G0/H7/L5/U5). It has four key registers for master operation:

The key rule for the legacy I2C block: after every address event, ADDR is cleared only by reading SR1 and then SR2. The SR2 read also reports the transfer direction in TRA (bit 2) and the bus state in BUSY (bit 1). Forgetting this read is the single most common bug in register-level I2C code.
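
As a rough illustration of that rule, the sketch below clears ADDR and decodes the SR2 bits in one helper. The `i2c_regs_t` struct, the `SR2_*` macros, and `i2c_clear_addr()` are stand-ins invented here so the snippet compiles on a host; on target you would pass `I2C1` and use the definitions from your device header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the legacy I2C register block (hypothetical type,
 * registers listed in the same order as the reference manual). */
typedef struct {
    volatile uint32_t CR1, CR2, OAR1, OAR2, DR, SR1, SR2, CCR, TRISE;
} i2c_regs_t;

#define SR2_MSL   (1u << 0)  /* 1 = peripheral is in master mode */
#define SR2_BUSY  (1u << 1)  /* 1 = communication ongoing on the bus */
#define SR2_TRA   (1u << 2)  /* 1 = transmitter, 0 = receiver */

typedef struct {
    bool master;
    bool busy;
    bool transmitter;
} i2c_sr2_info_t;

/* Clears ADDR (read SR1, then SR2) and returns the decoded SR2 bits. */
i2c_sr2_info_t i2c_clear_addr(i2c_regs_t *i2c)
{
    assert(i2c != NULL);
    (void)i2c->SR1;            /* first half of the clear sequence */
    uint32_t sr2 = i2c->SR2;   /* second half: this read clears ADDR */

    i2c_sr2_info_t info;
    info.master      = (sr2 & SR2_MSL)  != 0;
    info.busy        = (sr2 & SR2_BUSY) != 0;
    info.transmitter = (sr2 & SR2_TRA)  != 0;
    return info;
}
```

Returning the decoded bits rather than discarding SR2 costs nothing and gives a debugger something to inspect when a transfer goes in the wrong direction.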

SCL timing: calculating CCR and TRISE

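I2C on STM32F4 derives its SCL from the APB1 clock (PCLK1). On a typical F401/F411 running at 84 MHz, PCLK1 is 42 MHz (APB1 prescaler = 2). The CCR register sets the SCL high and low periods in PCLK1 cycles, and TRISE tells the peripheral the maximum bus rise time, also in PCLK1 cycles. Both must be written while the peripheral is disabled (PE = 0), with CR2.FREQ set to PCLK1 in MHz (42 here).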

Standard mode (100 kHz)

// PCLK1 = 42 MHz, Target SCL = 100 kHz
// CCR = PCLK1 / (2 * target_SCL) = 42e6 / (2 * 100e3) = 210
// TRISE = (maximum rise time in PCLK1 cycles) + 1
// I2C spec: max rise = 1000 ns → 1000e-9 * 42e6 = 42 cycles → TRISE = 43
I2C1->CCR  = 210;        // Standard mode, 100 kHz
I2C1->TRISE = 43;         // 1000 ns @ 42 MHz, plus 1

Fast mode (400 kHz)

// PCLK1 = 42 MHz, Target SCL = 400 kHz
// Fast mode: CCR = PCLK1 / (3 * target_SCL) with DUTY = 0 (t_low : t_high = 2:1)
// or CCR = PCLK1 / (25 * target_SCL) with DUTY = 1 (t_low : t_high = 16:9)
// Using DUTY = 0: CCR = 42e6 / (3 * 400e3) = 35 → exactly 400 kHz
// Using DUTY = 1: CCR = 42e6 / (25 * 400e3) = 4.2; CCR = 4 would give 420 kHz (too fast),
// so round up to CCR = 5 → 336 kHz (min CCR is 1 with DUTY = 1)
// TRISE = (integer part of max rise in PCLK1 cycles) + 1: 300e-9 * 42e6 = 12.6 → 12 + 1 = 13
I2C1->CCR   = 35;          // Fast mode, DUTY=0, 400 kHz
I2C1->CCR  |= 0x8000;      // Set CCR[15] (F/S) for Fast Mode
I2C1->TRISE = 13;          // 300 ns @ 42 MHz, plus 1

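I always prefer DUTY = 0 at 400 kHz: the CCR = PCLK1 / (3 * target) formula is more intuitive to tune, and at 42 MHz it lands exactly on 400 kHz. Note that the resulting waveform is not 1:1; SCL low lasts twice as long as SCL high. DUTY = 1 selects a 16:9 low-to-high ratio instead, which trades a little SCL low time for a longer SCL high time. Its formula, CCR = PCLK1 / (25 * target_SCL), has a much coarser step, so round CCR up when the division is not exact, otherwise SCL ends up faster than the target.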
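
The arithmetic above can be wrapped in a small helper. This is a minimal sketch, not the author's code: `i2c_calc_timing()`, `i2c_speed_t`, and `i2c_timing_t` are hypothetical names, the function returns only the numeric CCR field (the F/S and DUTY bits still have to be OR-ed in), and it rounds CCR up so SCL never exceeds the target. It applies the reference manual's TRISE rule of rise time in PCLK1 cycles plus one, and it does not check the minimum CCR values the hardware imposes.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { I2C_SM_100K, I2C_FM_DUTY0, I2C_FM_DUTY1 } i2c_speed_t;

typedef struct {
    uint16_t ccr;    /* CCR[11:0] only; F/S and DUTY bits not included */
    uint8_t  trise;  /* TRISE[5:0] */
} i2c_timing_t;

/* Ceiling division, so the resulting SCL never exceeds the target. */
static uint32_t div_ceil(uint32_t n, uint32_t d)
{
    return (n + d - 1u) / d;
}

i2c_timing_t i2c_calc_timing(uint32_t pclk1_hz, i2c_speed_t speed)
{
    assert(pclk1_hz >= 2000000u);  /* peripheral needs at least 2 MHz */

    uint32_t target_hz = (speed == I2C_SM_100K) ? 100000u : 400000u;
    uint32_t rise_ns   = (speed == I2C_SM_100K) ? 1000u : 300u;
    /* PCLK1 cycles per SCL period, per unit of CCR: 2, 3 or 25. */
    uint32_t per_ccr   = (speed == I2C_SM_100K)  ? 2u
                       : (speed == I2C_FM_DUTY0) ? 3u : 25u;

    i2c_timing_t t;
    t.ccr = (uint16_t)div_ceil(pclk1_hz, per_ccr * target_hz);
    /* TRISE = integer part of (t_rise_max / T_PCLK1), plus 1. */
    t.trise = (uint8_t)((uint64_t)rise_ns * pclk1_hz / 1000000000u + 1u);
    return t;
}
```

With PCLK1 = 42 MHz this yields CCR = 210 and TRISE = 43 for standard mode, CCR = 35 and TRISE = 13 for fast mode with DUTY = 0, and CCR = 5 for DUTY = 1.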

Checking the numbers

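After setting CCR and TRISE, verify by probing SCL with an oscilloscope or logic analyser. The I2C specification sets limits rather than tolerances: SCL must not exceed 100 kHz in standard mode or 400 kHz in fast mode, and the low and high periods have minimum durations (4.7 µs and 4.0 µs in standard mode, 1.3 µs and 0.6 µs in fast mode). An error of -1 in CCR pushes SCL above the limit, so round up whenever the division is not exact.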
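
Before reaching for the scope, the expected numbers can be checked on paper or on a host. The sketch below is an assumption-laden helper (`i2c_check_ccr()` and its types are hypothetical names): it computes the ideal SCL frequency and high/low times for a given CCR and compares them with the I2C-bus specification limits for standard and fast mode. It ignores rise time and clock synchronisation, so a real bus will typically measure somewhat slower.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { I2C_SM, I2C_FM_DUTY0, I2C_FM_DUTY1 } i2c_mode_t;

typedef struct {
    uint32_t scl_hz;     /* ideal SCL frequency, truncated to whole Hz */
    uint32_t t_high_ns;  /* ideal SCL high time, truncated to whole ns */
    uint32_t t_low_ns;   /* ideal SCL low time, truncated to whole ns */
    bool     in_spec;    /* frequency and both periods within limits */
} i2c_scl_check_t;

i2c_scl_check_t i2c_check_ccr(uint32_t pclk1_hz, uint16_t ccr, i2c_mode_t mode)
{
    assert(pclk1_hz > 0u && ccr > 0u);

    /* High/low duration in PCLK1 cycles per unit of CCR. */
    uint32_t high_mult = (mode == I2C_FM_DUTY1) ? 9u : 1u;
    uint32_t low_mult  = (mode == I2C_SM) ? 1u
                       : (mode == I2C_FM_DUTY0) ? 2u : 16u;
    uint32_t cycles    = (high_mult + low_mult) * ccr;

    /* Specification limits for the selected mode. */
    uint32_t max_hz      = (mode == I2C_SM) ? 100000u : 400000u;
    uint32_t min_high_ns = (mode == I2C_SM) ? 4000u : 600u;
    uint32_t min_low_ns  = (mode == I2C_SM) ? 4700u : 1300u;

    i2c_scl_check_t r;
    r.scl_hz    = pclk1_hz / cycles;
    r.t_high_ns = (uint32_t)((uint64_t)high_mult * ccr * 1000000000u / pclk1_hz);
    r.t_low_ns  = (uint32_t)((uint64_t)low_mult  * ccr * 1000000000u / pclk1_hz);
    /* Frequency test done without division to avoid truncation error. */
    r.in_spec   = ((uint64_t)cycles * max_hz >= pclk1_hz)
               && (r.t_high_ns >= min_high_ns)
               && (r.t_low_ns  >= min_low_ns);
    return r;
}
```

At PCLK1 = 42 MHz, CCR = 210 in standard mode passes, CCR = 209 fails on frequency, and fast mode with DUTY = 1 fails at CCR = 4 (420 kHz) but passes at CCR = 5 (336 kHz).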

Master transmit sequence

A register-level master write to a 7-bit slave follows this exact sequence:

// Assumes I2C1 already configured: CR1_PE=1, CR2_FREQ set

void i2c_master_write(uint8_t slave_addr, uint8_t *data, int len) {
    // 1. Generate START condition
    I2C1->CR1 |= I2C_CR1_START;

    // 2. Wait for SB (Start Bit) event
    while (!(I2C1->SR1 & I2C_SR1_SB));

    // 3. Send slave address (7-bit, left-aligned, write)
    I2C1->DR = slave_addr << 1;  // LSB = 0 for write

    // 4. Wait for ADDR flag, then clear by reading SR2
    while (!(I2C1->SR1 & I2C_SR1_ADDR));
    (void)I2C1->SR2;  // ← CRITICAL: clears ADDR flag

    // 5. Transmit data bytes
    for (int i = 0; i < len; i++) {
        while (!(I2C1->SR1 & I2C_SR1_TXE));  // Wait for TXE
        I2C1->DR = data[i];
    }

    // 6. Wait for last byte to actually shift out (BTF)
    while (!(I2C1->SR1 & I2C_SR1_BTF));

    // 7. Generate STOP condition
    I2C1->CR1 |= I2C_CR1_STOP;
}

Step 4 is where most register-level code breaks. In master mode, ADDR set means the address was sent and acknowledged, and the flag is cleared only by a read of SR1 followed by a read of SR2. The while loop supplies the SR1 read; the (void)I2C1->SR2 line supplies the second. The (void) cast is deliberate: the hardware requires the read, even though we ignore the value. If you skip it, ADDR stays set, the peripheral holds SCL low, and the transfer never proceeds.

Master receive sequence

A master read is similar, but the master must NACK the last byte so the slave releases SDA, and then generate a STOP to free the bus:

void i2c_master_read(uint8_t slave_addr, uint8_t *buf, int len) {
    // 1. Generate START
    I2C1->CR1 |= I2C_CR1_START;
    while (!(I2C1->SR1 & I2C_SR1_SB));

    // 2. Send slave address (7-bit, read — LSB = 1)
    I2C1->DR = (slave_addr << 1) | 0x01;

    // 3. Wait for ADDR, clear by reading SR2
    while (!(I2C1->SR1 & I2C_SR1_ADDR));
    (void)I2C1->SR2;

    // 4. Receive bytes
    for (int i = 0; i < len; i++) {
        if (i == len - 1) {
            // Disable ACK before receiving last byte
            I2C1->CR1 &= ~I2C_CR1_ACK;
        }
        // Wait for RXNE, then read the received byte
        while (!(I2C1->SR1 & I2C_SR1_RXNE));
        buf[i] = I2C1->DR;
    }

    // 5. Generate STOP (after RXNE for last byte)
    I2C1->CR1 |= I2C_CR1_STOP;

    // 6. Re-enable ACK for next transfer
    I2C1->CR1 |= I2C_CR1_ACK;
}

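The NACK/STOP timing is critical. Clear ACK too early (before the second-to-last byte has been received) and the NACK goes out on the wrong byte, so the slave stops sending before the transfer is complete; clear it too late and the master acknowledges the final byte and clocks in an extra one. The sequence above (clear ACK for i == len - 1, wait for RXNE, then immediate STOP) is the one I have validated on dozens of I2C slaves across F4, F0, L4, and G4 projects. The reference manual's own procedure is stricter in two places: it sets STOP together with clearing ACK, right after the second-to-last byte is read, and for a single-byte read it clears ACK before ADDR is cleared. If you see an extra byte clocked after the last one, move to that ordering.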
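
For comparison, here is a sketch of the closing order described in the STM32F4 reference manual: for a single byte, clear ACK before clearing ADDR and set STOP right after; for longer reads, clear ACK and set STOP immediately after reading the second-to-last byte. It is not the routine shown above. The `i2c_regs_t` struct, the `CR1_*`/`SR1_*` macros, and `i2c_read_after_addr()` are stand-ins invented so the snippet compiles on a host, the waits have no timeout, and the code assumes polling is fast compared with one byte time (the manual describes a separate POS-based method for two-byte reads when that cannot be guaranteed).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the legacy I2C register block (hypothetical type). */
typedef struct {
    volatile uint32_t CR1, CR2, OAR1, OAR2, DR, SR1, SR2, CCR, TRISE;
} i2c_regs_t;

#define CR1_STOP  (1u << 9)
#define CR1_ACK   (1u << 10)
#define SR1_RXNE  (1u << 6)

/* Call once START and the read address have been sent and ADDR is set. */
void i2c_read_after_addr(i2c_regs_t *i2c, uint8_t *buf, size_t len)
{
    assert(i2c != NULL && buf != NULL && len > 0);

    if (len == 1) {
        i2c->CR1 &= ~CR1_ACK;   /* arm the NACK before ADDR is cleared */
        (void)i2c->SR1;
        (void)i2c->SR2;         /* clears ADDR: reception starts */
        i2c->CR1 |= CR1_STOP;   /* STOP follows the single byte */
    } else {
        i2c->CR1 |= CR1_ACK;    /* acknowledge all but the last byte */
        (void)i2c->SR1;
        (void)i2c->SR2;         /* clears ADDR: reception starts */
    }

    for (size_t i = 0; i < len; i++) {
        while (!(i2c->SR1 & SR1_RXNE)) {
            /* real code needs a timeout here */
        }
        buf[i] = (uint8_t)i2c->DR;

        if (len > 1 && i == len - 2) {
            /* Second-to-last byte just read, last byte is in flight:
             * NACK it and queue the STOP behind it. */
            i2c->CR1 &= ~CR1_ACK;
            i2c->CR1 |= CR1_STOP;
        }
    }
}
```

ACK is left cleared on exit; the multi-byte branch sets it again at the start of the next read.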

Error conditions and recovery

The legacy I2C block flags three hardware errors in SR1 that demand immediate handling:

AF — Acknowledge Failure

Set when the slave does not ACK the address or a data byte. In a master write, a data NACK means the slave is busy or the byte count exceeds the slave's buffer. In a master read, the master generates the ACK for each data byte, so AF can only fire on the address. Typical causes: wrong address, slave powered down, bus contention, or the slave is still processing the previous command.

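if (I2C1->SR1 & I2C_SR1_AF) {
    // AF is rc_w0: write 0 to AF only. Writing 1 to the other flag bits
    // has no effect, and a direct write avoids the read-modify-write race
    // that could wipe an error flag set in between.
    I2C1->SR1 = ~I2C_SR1_AF;
    I2C1->CR1 |= I2C_CR1_STOP;  // Generate STOP to release bus
    return -1;  // Report NACK to caller
}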

BERR — Bus Error

Indicates a misplaced START or STOP condition detected by the peripheral. This happens when a glitch or another master corrupts the bus. The only reliable recovery is to reset the peripheral and reconfigure.

if (I2C1->SR1 & I2C_SR1_BERR) {
    I2C1->CR1 &= ~I2C_CR1_PE;   // Disable peripheral
    I2C1->CR1 |= I2C_CR1_SWRST; // Software reset
    I2C1->CR1 &= ~I2C_CR1_SWRST;
    i2c_init();  // Re-initialise all registers
    return -2;
}

ARLO — Arbitration Lost

Another master on the multi-master bus started transmitting simultaneously. Clear the flag and wait for the bus to become free before retrying.
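
A minimal sketch of that ARLO handling follows. The `i2c_regs_t` struct, the bit macros, and `i2c_handle_arlo()` are names invented for this illustration, and the wait is bounded by a poll count rather than a real time base. The caller is assumed to retry the whole transaction from START when the function reports the bus as idle.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the legacy I2C register block (hypothetical type). */
typedef struct {
    volatile uint32_t CR1, CR2, OAR1, OAR2, DR, SR1, SR2, CCR, TRISE;
} i2c_regs_t;

#define SR1_ARLO  (1u << 9)
#define SR2_BUSY  (1u << 1)

/* Returns 0 if ARLO was not set, 1 if it was cleared and the bus is
 * idle (safe to retry), -1 if the bus stayed busy for max_polls reads. */
int i2c_handle_arlo(i2c_regs_t *i2c, uint32_t max_polls)
{
    assert(i2c != NULL);

    if (!(i2c->SR1 & SR1_ARLO)) {
        return 0;
    }

    /* ARLO is cleared by writing 0 to it; on the real peripheral the
     * 1s written to the other flag bits leave them unchanged. */
    i2c->SR1 = ~SR1_ARLO;

    /* The winning master still owns the bus: wait for its STOP. */
    while (i2c->SR2 & SR2_BUSY) {
        if (max_polls == 0u) {
            return -1;
        }
        max_polls--;
    }
    return 1;
}
```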

Timeouts: because hardware can hang

Every while(!(SR1 & FLAG)) loop needs a timeout. An I2C slave that holds the clock low (clock stretching) can hang your firmware indefinitely. The pattern I use on client projects:

static int wait_flag(volatile uint32_t *sr, uint32_t mask, uint32_t timeout_us) {
    uint32_t start = micros();  // Or DWT->CYCCNT on Cortex-M
    while (!(*sr & mask)) {
        if (*sr & (I2C_SR1_AF | I2C_SR1_BERR | I2C_SR1_ARLO)) {
            return -1;  // Error during wait
        }
        if ((micros() - start) > timeout_us) {
            return -2;  // Timeout
        }
    }
    return 0;  // OK
}

I use a 5 ms timeout for address events and 1 ms per byte for data transfers. Clock stretching rarely exceeds a few hundred microseconds on modern sensors; a 5 ms timeout catches genuine hangs without false positives.
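
To show how bounded waits change the shape of a transfer, here is a sketch of a master write that returns an error code instead of hanging. It is an illustration under stated assumptions, not the routine used earlier: `i2c_regs_t`, the bit macros, `wait_sr1()`, and `i2c_master_write_checked()` are hypothetical names, and the timeout is a simple poll budget so the example stays self-contained. On target you would substitute a time-based wait such as the micros() version above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the legacy I2C register block (hypothetical type). */
typedef struct {
    volatile uint32_t CR1, CR2, OAR1, OAR2, DR, SR1, SR2, CCR, TRISE;
} i2c_regs_t;

#define CR1_START   (1u << 8)
#define CR1_STOP    (1u << 9)
#define SR1_SB      (1u << 0)
#define SR1_ADDR    (1u << 1)
#define SR1_BTF     (1u << 2)
#define SR1_TXE     (1u << 7)
#define SR1_BERR    (1u << 8)
#define SR1_ARLO    (1u << 9)
#define SR1_AF      (1u << 10)
#define SR1_ERRORS  (SR1_BERR | SR1_ARLO | SR1_AF)

enum { I2C_OK = 0, I2C_ERR_BUS = -1, I2C_ERR_TIMEOUT = -2 };

/* Polls SR1 until a bit in mask is set, an error flag appears,
 * or the poll budget runs out. */
static int wait_sr1(i2c_regs_t *i2c, uint32_t mask, uint32_t max_polls)
{
    for (uint32_t n = 0; n < max_polls; n++) {
        uint32_t sr1 = i2c->SR1;
        if (sr1 & mask)       return I2C_OK;
        if (sr1 & SR1_ERRORS) return I2C_ERR_BUS;
    }
    return I2C_ERR_TIMEOUT;
}

int i2c_master_write_checked(i2c_regs_t *i2c, uint8_t addr7,
                             const uint8_t *data, size_t len,
                             uint32_t max_polls)
{
    assert(i2c != NULL && (data != NULL || len == 0));

    i2c->CR1 |= CR1_START;
    int rc = wait_sr1(i2c, SR1_SB, max_polls);
    if (rc == I2C_OK) {
        i2c->DR = (uint32_t)addr7 << 1;        /* LSB = 0: write */
        rc = wait_sr1(i2c, SR1_ADDR, max_polls);
    }
    if (rc == I2C_OK) {
        (void)i2c->SR2;   /* SR1 was read by the wait; this clears ADDR */
        for (size_t i = 0; i < len && rc == I2C_OK; i++) {
            rc = wait_sr1(i2c, SR1_TXE, max_polls);
            if (rc == I2C_OK) i2c->DR = data[i];
        }
    }
    if (rc == I2C_OK) {
        rc = wait_sr1(i2c, SR1_BTF, max_polls);  /* last byte shifted out */
    }
    if (rc == I2C_ERR_BUS && (i2c->SR1 & SR1_AF)) {
        i2c->SR1 = ~SR1_AF;                      /* clear the NACK flag */
    }
    i2c->CR1 |= CR1_STOP;   /* release the bus on success and on failure */
    return rc;
}
```

BERR and ARLO are reported with the same code as a NACK here; a caller that needs the recovery steps from the previous section would inspect SR1 before retrying.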

Practical example: reading a temperature sensor via register-level I2C

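Here is a complete read from an LM75-class temperature sensor (7-bit address 0x48) using only registers. The 11-bit, 0.125 °C format used below matches parts such as the LM75A and LM75B; the original LM75 reports 9 bits at 0.5 °C per LSB.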

#define LM75_ADDR  0x48
#define LM75_TEMP  0x00  // Temperature register pointer

float lm75_read_temp(void) {
    uint8_t cmd = LM75_TEMP;
    uint8_t raw[2];

    // Write pointer byte
    i2c_master_write(LM75_ADDR, &cmd, 1);

    // Read 2 bytes (temperature)
    i2c_master_read(LM75_ADDR, raw, 2);

    // Convert: 11-bit two's complement, left-aligned in 16 bits, 0.125 °C per LSB
    uint16_t bits = (uint16_t)((raw[0] << 8) | raw[1]) >> 5;  // Right-align 11-bit value
    int16_t temp = (int16_t)bits;
    if (bits & 0x0400) {
        temp -= 0x0800;  // Sign extend: bit 10 set means negative
    }
    return temp * 0.125f;
}
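
The conversion step is easy to get wrong for negative temperatures, so it helps to pull it into a pure function that can be tested off-target. The sketch below assumes the same 11-bit, 0.125 °C per LSB, MSB-first format; `lm75_raw_to_millic()` is a hypothetical helper name, and it returns millidegrees Celsius as an integer to avoid floating point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Converts the two raw temperature bytes (MSB first) of an 11-bit,
 * 0.125 degC/LSB register into millidegrees Celsius. */
int32_t lm75_raw_to_millic(const uint8_t raw[2])
{
    assert(raw != NULL);

    /* Assemble 16 bits, then drop the 5 unused low bits. */
    uint16_t bits = (uint16_t)(((uint16_t)raw[0] << 8) | raw[1]) >> 5;

    /* Bit 10 is the sign bit of the 11-bit two's complement value. */
    int32_t value = (bits & 0x0400u) ? (int32_t)bits - 0x0800
                                     : (int32_t)bits;

    return value * 125;  /* 0.125 degC = 125 millidegrees per LSB */
}
```

For example, the bytes 0x19 0x00 convert to 25000 (25.0 °C) and 0xE7 0x00 to -25000 (-25.0 °C).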

This is the same functional flow as the HAL, but without the HAL's state machine overhead, timeout logic interleaving, or the 150+ function calls per transaction. On a resource-constrained STM32F401, the register-level approach shaves ~800 bytes of code space and eliminates the HAL's I2C state machine (which has tripped up more than one production release with subtle race conditions between master and slave modes on the same I2C instance).

Practical checklist

How I would approach this on a client project

On a production firmware project, I never mix register-level I2C code and HAL I2C code on the same bus instance, because the state machines conflict. I choose one model upfront, register-level or HAL, and keep it for every transfer on that instance.

If the client insists on HAL, I override the HAL's I2C MSP init to configure the pins with the correct alternate function, pull-up, and speed directly — the HAL's weak default GPIO configuration has caused more intermittent I2C failures than any other single cause on STM32F4 designs I have audited.

I also add a short bus recovery sequence in i2c_init(): with the pins driven as open-drain GPIO and SDA released (high), toggle SCL 9 times, then generate a START/STOP. This recovers slaves stuck in an incomplete transaction after a watchdog reset, without requiring a power cycle.
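
A possible shape for that recovery routine is sketched below. It is a variation, not the exact sequence described above: it stops clocking as soon as SDA reads high and finishes with a STOP only. The pin hooks in `i2c_pins_t` are hypothetical; on target they would drive the I2C pins as open-drain GPIO before the pins are handed back to the peripheral. The `sim_*` functions are a host-side stand-in for a stuck slave so the logic can be exercised without hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pin-access hooks supplied by the board code. */
typedef struct {
    void (*set_scl)(void *ctx, bool high);
    void (*set_sda)(void *ctx, bool high);
    bool (*get_sda)(void *ctx);
    void (*half_period)(void *ctx);  /* e.g. 5 us for a 100 kHz bus */
    void *ctx;
} i2c_pins_t;

/* Returns the number of clock pulses issued (0..9),
 * or -1 if SDA is still held low afterwards. */
int i2c_bus_recover(const i2c_pins_t *p)
{
    assert(p != NULL);
    int pulses = 0;

    p->set_sda(p->ctx, true);   /* release SDA */
    p->set_scl(p->ctx, true);
    p->half_period(p->ctx);

    while (!p->get_sda(p->ctx) && pulses < 9) {
        p->set_scl(p->ctx, false);
        p->half_period(p->ctx);
        p->set_scl(p->ctx, true);
        p->half_period(p->ctx);
        pulses++;
    }
    if (!p->get_sda(p->ctx)) {
        return -1;              /* slave never let go of SDA */
    }

    /* STOP condition: SDA rises while SCL is high. */
    p->set_scl(p->ctx, false);
    p->set_sda(p->ctx, false);
    p->half_period(p->ctx);
    p->set_scl(p->ctx, true);
    p->half_period(p->ctx);
    p->set_sda(p->ctx, true);
    p->half_period(p->ctx);
    return pulses;
}

/* Host-side stand-in: a slave that holds SDA low until it has seen
 * bits_left more falling edges on SCL. */
typedef struct { int bits_left; bool scl; bool sda_drive; } sim_bus_t;

static void sim_set_scl(void *c, bool high)
{
    sim_bus_t *b = c;
    if (b->scl && !high && b->bits_left > 0) b->bits_left--;
    b->scl = high;
}
static void sim_set_sda(void *c, bool high) { ((sim_bus_t *)c)->sda_drive = high; }
static bool sim_get_sda(void *c)
{
    sim_bus_t *b = c;
    return b->sda_drive && b->bits_left == 0;
}
static void sim_delay(void *c) { (void)c; }

int sim_recover(int stuck_bits)
{
    sim_bus_t bus = { stuck_bits, true, true };
    i2c_pins_t pins = { sim_set_scl, sim_set_sda, sim_get_sda, sim_delay, &bus };
    return i2c_bus_recover(&pins);
}
```

Passing the pin access in as hooks keeps the routine independent of which GPIO port the bus sits on and lets the same code run in a unit test.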

If you have corrections, a different approach, or a real-world I2C war story, send it to blog@carrese.eu. I read every message and update the article when I learn something new.