ESP-IDF 6.x I2C Master Driver: Migrating Without Hiding Bus Faults

2026-05-27 · Davide Carrese
ESP32 · I2C · Firmware Debugging

ESP-IDF 6.x is a good moment to stop treating I2C as a few copy-pasted calls around i2c_cmd_link_create(). The current master driver models the bus and each peripheral as explicit handles, and the 6.0 migration notes also tighten some error semantics around NACK conditions. That matters in products: a sensor read that occasionally returns an old value is often more expensive than a read that fails loudly.

Why this is worth touching

I2C failures are rarely elegant. On an ESP32-based gateway, thermostat, battery product, or industrial sensor node, the fault may be one loose connector, a slow power rail, a device still waking from reset, bus capacitance slightly outside the design assumption, or a transaction delayed because its task was preempted by higher-priority work. The symptom is usually reported as “the product sometimes needs a reboot”.

The newer ESP-IDF driver pushes the firmware toward a clearer architecture. You create one master bus with its GPIOs, clock source, pull-up policy, glitch filter and optional power-management behaviour. Then you attach devices with their own addresses and bus speeds. A register read is no longer a manually assembled command list in application code; it is a transaction against a device handle, for example with i2c_master_transmit_receive(), which intentionally performs a write followed by a repeated-start read without inserting a STOP condition.

That split is useful beyond style. It gives the application one place to own bus-level configuration and one place to own each device's electrical and protocol assumptions. In a client codebase, that is the difference between “all drivers know a little bit about I2C0” and “the board support layer owns the bus; device drivers own device behaviour”.

The migration trap: preserving bad assumptions

A mechanical migration can compile cleanly and still preserve the original bug. Legacy code often hides three assumptions: a fixed timeout copied from an example, no distinction between NACK and other failures, and no recovery policy after a stuck transaction. The ESP-IDF 6.0 migration notes mention that several I2C master APIs now return ESP_ERR_INVALID_RESPONSE rather than ESP_ERR_INVALID_STATE when a NACK is detected. That is more than a cosmetic enum change: it is an opportunity to decide what NACK means for each device.

A NACK during startup may be normal for a sensor whose regulator is still ramping. A NACK during steady-state acquisition may indicate a disconnected board, an address conflict, brownout, or an internal sensor reset. A timeout may point to clock stretching, a slave holding SDA, a too-aggressive interrupt load, or physical bus problems. Treating every non-ESP_OK as “retry three times then reboot” makes field debugging unnecessarily blind.

A small driver shape that scales

The following example shows the shape I prefer for a simple register-based sensor. The bus is created once, the device handle is retained by the sensor driver, and every read returns a typed status that the product can log and act on.

#include "driver/i2c_master.h"
#include "esp_check.h"   /* ESP_RETURN_ON_ERROR */
#include "esp_err.h"
#include <stdint.h>

#define I2C_PORT            I2C_NUM_0
#define I2C_SCL_GPIO        22
#define I2C_SDA_GPIO        21
#define SENSOR_ADDR         0x48
#define SENSOR_SPEED_HZ     400000
#define I2C_TIMEOUT_MS      20   /* milliseconds; derive from the bus budget, not from a demo */

typedef enum {
    SENSOR_IO_OK = 0,
    SENSOR_IO_NACK,
    SENSOR_IO_TIMEOUT,
    SENSOR_IO_DRIVER_ERROR,
} sensor_io_status_t;

typedef struct {
    i2c_master_bus_handle_t bus;
    i2c_master_dev_handle_t dev;
} board_i2c_sensor_t;

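/* Translate driver errors into statuses the product can log and act on.
   A NACK is reported as ESP_ERR_INVALID_RESPONSE per the 6.0 migration notes. */
static sensor_io_status_t map_i2c_error(esp_err_t err)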
{
    switch (err) {
    case ESP_OK:
        return SENSOR_IO_OK;
    case ESP_ERR_INVALID_RESPONSE:
        return SENSOR_IO_NACK;
    case ESP_ERR_TIMEOUT:
        return SENSOR_IO_TIMEOUT;
    default:
        return SENSOR_IO_DRIVER_ERROR;
    }
}

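esp_err_t board_sensor_i2c_init(board_i2c_sensor_t *s)
{
    if (s == NULL) {
        return ESP_ERR_INVALID_ARG;
    }

    /* Bus-level settings belong to the board layer, not to device drivers. */
    i2c_master_bus_config_t bus_cfg = {
        .clk_source = I2C_CLK_SRC_DEFAULT,
        .i2c_port = I2C_PORT,
        .scl_io_num = I2C_SCL_GPIO,
        .sda_io_num = I2C_SDA_GPIO,
        .glitch_ignore_cnt = 7,
        /* Convenient on evaluation boards; validate real pull-ups on a product. */
        .flags.enable_internal_pullup = true,
    };
    ESP_RETURN_ON_ERROR(i2c_new_master_bus(&bus_cfg, &s->bus), "i2c", "create bus failed");

    /* Device-level settings: address width, address and bus speed. */
    i2c_device_config_t dev_cfg = {
        .dev_addr_length = I2C_ADDR_BIT_LEN_7,
        .device_address = SENSOR_ADDR,
        .scl_speed_hz = SENSOR_SPEED_HZ,
    };
    esp_err_t err = i2c_master_bus_add_device(s->bus, &dev_cfg, &s->dev);
    if (err != ESP_OK) {
        /* Do not leak the bus if the device cannot be attached. */
        i2c_del_master_bus(s->bus);
        s->bus = NULL;
    }
    return err;
}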

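sensor_io_status_t sensor_read_u16(board_i2c_sensor_t *s, uint8_t reg, uint16_t *value)
{
    uint8_t rx[2] = {0};
    /* Write the register address, then read two bytes after a repeated start.
       The last argument is a timeout in milliseconds, not RTOS ticks. */
    esp_err_t err = i2c_master_transmit_receive(
        s->dev, &reg, 1, rx, sizeof(rx), I2C_TIMEOUT_MS);
    sensor_io_status_t st = map_i2c_error(err);
    if (st != SENSOR_IO_OK) {
        return st;  /* *value is left untouched on failure */
    }
    /* Assumes the sensor sends the most significant byte first. */
    *value = (uint16_t)(((uint16_t)rx[0] << 8) | rx[1]);
    return SENSOR_IO_OK;
}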

This is not meant to be a complete production driver. It is meant to show the boundaries: board setup, device configuration, transaction, and error mapping. Once those boundaries exist, adding metrics, retry budgets, bus reset hooks, and device-specific warm-up behaviour becomes straightforward.

Practical example: environmental sensor board on an ESP32 gateway

Assume an ESP32 gateway reads a temperature/humidity sensor and an external ADC on the same I2C bus. The product publishes measurements every second and is installed in a cabinet where replacing hardware is expensive. In that scenario I would not migrate by changing API calls in place. I would introduce a board-level I2C module that creates the bus once during boot, attaches both devices, and exposes device handles only to their drivers.

For the first seconds after boot, each device driver may treat NACK as “not ready yet” and retry with a bounded backoff. After the system enters normal operation, repeated NACKs should become a device-health fault, not a silent zero reading. Timeouts should be counted separately from NACKs because they often suggest different root causes. If the ADC occasionally stretches the clock longer than expected, the timeout budget may need adjustment. If the bus times out and remains stuck, the board layer can attempt a controlled bus recovery or mark the measurement subsystem degraded.
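The policy in the paragraph above can be sketched as a small decision function that runs on the host as well as on the target. This is a minimal sketch under stated assumptions: the phase and action enums, the retry limits and the backoff constants are all hypothetical names and placeholder values, and the status enum simply mirrors the one from the driver sketch earlier so that the block compiles on its own.

```c
#include <stdint.h>

/* Mirrors the status enum from the driver sketch earlier in the article. */
typedef enum {
    SENSOR_IO_OK = 0,
    SENSOR_IO_NACK,
    SENSOR_IO_TIMEOUT,
    SENSOR_IO_DRIVER_ERROR,
} sensor_io_status_t;

/* Everything below is illustrative: names and limits are assumptions. */
typedef enum { SENSOR_PHASE_WARMUP, SENSOR_PHASE_STEADY } sensor_phase_t;

typedef enum {
    SENSOR_ACT_USE_VALUE,         /* read succeeded */
    SENSOR_ACT_RETRY_AFTER_DELAY, /* warm-up NACK: device may not be ready */
    SENSOR_ACT_SKIP_SAMPLE,       /* publish "no value", never a silent zero */
    SENSOR_ACT_DEVICE_FAULT,      /* repeated NACKs: device-health fault */
    SENSOR_ACT_BUS_FAULT,         /* timeout or driver error: board layer decides */
} sensor_action_t;

#define WARMUP_MAX_NACKS     5u
#define STEADY_MAX_NACKS     3u
#define WARMUP_BASE_DELAY_MS 10u
#define WARMUP_MAX_DELAY_MS  200u

/* Delay before retry number `attempt` (0-based): doubles each time, capped. */
uint32_t warmup_backoff_ms(uint32_t attempt)
{
    uint32_t delay = WARMUP_BASE_DELAY_MS;
    for (uint32_t i = 0; i < attempt && delay < WARMUP_MAX_DELAY_MS; ++i) {
        delay *= 2u;
    }
    return delay > WARMUP_MAX_DELAY_MS ? WARMUP_MAX_DELAY_MS : delay;
}

/* `consecutive_nacks` counts NACKs in a row, including the current one. */
sensor_action_t sensor_decide(sensor_phase_t phase, sensor_io_status_t st,
                              uint32_t consecutive_nacks)
{
    if (st == SENSOR_IO_OK) {
        return SENSOR_ACT_USE_VALUE;
    }
    if (st != SENSOR_IO_NACK) {
        return SENSOR_ACT_BUS_FAULT;
    }
    if (phase == SENSOR_PHASE_WARMUP) {
        return consecutive_nacks < WARMUP_MAX_NACKS ? SENSOR_ACT_RETRY_AFTER_DELAY
                                                    : SENSOR_ACT_DEVICE_FAULT;
    }
    return consecutive_nacks < STEADY_MAX_NACKS ? SENSOR_ACT_SKIP_SAMPLE
                                                : SENSOR_ACT_DEVICE_FAULT;
}
```

Keeping the decision separate from the transaction makes it easy to unit-test on a PC and to tune the limits per device without touching the I2C code.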

The application should publish not only sensor values but also a compact health state: last successful read age, NACK counter, timeout counter, and current degradation state. Those four fields make remote support dramatically easier. They also prevent the common anti-pattern where a cloud dashboard shows a clean but stale temperature value while the embedded device has been fighting the bus for hours.
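One possible shape for that health state is shown below. It is a sketch, not a prescribed format: the struct layout, function names and the two staleness thresholds are assumptions chosen for illustration, and the caller is assumed to supply a millisecond timestamp (on ESP-IDF, for example, one derived from esp_timer_get_time()).

```c
#include <stdint.h>

/* Placeholder thresholds; pick values from the product's publish interval. */
#define HEALTH_DEGRADED_AFTER_MS 5000u
#define HEALTH_FAILED_AFTER_MS   60000u

typedef enum {
    SENSOR_HEALTH_OK = 0,
    SENSOR_HEALTH_DEGRADED,
    SENSOR_HEALTH_FAILED,
} sensor_health_level_t;

/* The four fields named in the text; age is derived from last_ok_ms. */
typedef struct {
    uint32_t last_ok_ms;          /* timestamp of the last successful read */
    uint32_t nack_count;          /* monotonic, wraps at UINT32_MAX */
    uint32_t timeout_count;       /* counted separately from NACKs */
    sensor_health_level_t level;  /* current degradation state */
} sensor_health_t;

void sensor_health_init(sensor_health_t *h, uint32_t now_ms)
{
    h->last_ok_ms = now_ms;  /* until the first success, age counts from init */
    h->nack_count = 0u;
    h->timeout_count = 0u;
    h->level = SENSOR_HEALTH_OK;
}

void sensor_health_note_ok(sensor_health_t *h, uint32_t now_ms) { h->last_ok_ms = now_ms; }
void sensor_health_note_nack(sensor_health_t *h) { h->nack_count++; }
void sensor_health_note_timeout(sensor_health_t *h) { h->timeout_count++; }

/* Unsigned subtraction stays correct across one wrap of the ms counter. */
uint32_t sensor_health_age_ms(const sensor_health_t *h, uint32_t now_ms)
{
    return now_ms - h->last_ok_ms;
}

/* Recompute the degradation state from the age of the last good read. */
sensor_health_level_t sensor_health_evaluate(sensor_health_t *h, uint32_t now_ms)
{
    uint32_t age = sensor_health_age_ms(h, now_ms);
    if (age >= HEALTH_FAILED_AFTER_MS) {
        h->level = SENSOR_HEALTH_FAILED;
    } else if (age >= HEALTH_DEGRADED_AFTER_MS) {
        h->level = SENSOR_HEALTH_DEGRADED;
    } else {
        h->level = SENSOR_HEALTH_OK;
    }
    return h->level;
}
```

Publishing the age rather than a raw timestamp keeps the field meaningful to a dashboard that does not share the device's clock.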

Timeouts, pull-ups, and concurrency

Timeouts are part of the product specification

The timeout passed to a transaction is not an arbitrary number. It should be longer than the expected transaction at the selected clock rate, plus known clock stretching and scheduling latency, but short enough that a broken peripheral cannot block the measurement task indefinitely. If the value is copied from a demo, document it as unverified technical debt.
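The arithmetic behind such a budget is simple enough to write down. The helper below is a hypothetical, hardware-independent sketch: it counts nine clocks per byte (eight data bits plus ACK), approximates START, repeated START and STOP as three extra clocks, and adds caller-supplied allowances for clock stretching and scheduling latency, which must come from the device datasheet and from measurements on the real system.

```c
#include <stdint.h>

/* Hypothetical helper: lower bound, in ms, for a write-then-read timeout.
   Returns 0 if scl_hz is 0, which the caller must treat as invalid input. */
uint32_t i2c_timeout_budget_ms(uint32_t scl_hz,
                               uint32_t write_len, uint32_t read_len,
                               uint32_t stretch_us, uint32_t sched_us,
                               uint32_t margin_percent)
{
    if (scl_hz == 0u) {
        return 0u;
    }

    /* Each phase sends one address byte plus its payload. */
    uint64_t bytes = 0u;
    if (write_len > 0u) {
        bytes += 1u + (uint64_t)write_len;
    }
    if (read_len > 0u) {
        bytes += 1u + (uint64_t)read_len;
    }

    /* 9 clocks per byte, plus about 3 for START, repeated START and STOP. */
    uint64_t clocks = bytes * 9u + 3u;

    /* Time on the wire, rounded up to whole microseconds. */
    uint64_t wire_us = (clocks * 1000000u + scl_hz - 1u) / scl_hz;

    /* Add the allowances, apply the safety margin, round up to whole ms. */
    uint64_t total_us = wire_us + stretch_us + sched_us;
    total_us = total_us * (100u + margin_percent) / 100u;
    return (uint32_t)((total_us + 999u) / 1000u);
}
```

For the register read in the driver sketch (400 kHz, one register byte, two data bytes), assumed allowances of 1 ms for stretching and 2 ms for scheduling, and a 50% margin, the helper returns 5 ms. The wire time itself is only about 120 µs, so the allowances dominate, and they are the part that needs evidence.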

Internal pull-ups are not a board design

The ESP-IDF examples often enable internal pull-ups because they make evaluation boards convenient. For a real product, calculate or measure the rise time with the actual bus capacitance, voltage and speed. Internal pull-ups may be acceptable for a short low-speed bus; they are not a substitute for validating the electrical layer.
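A first-order check can be done before reaching for the oscilloscope. The sketch below uses the RC relation from the I2C-bus specification, where the rise time between 30% and 70% of VDD is about 0.8473 × Rp × Cb, together with the specification's limits of 1000 ns (Standard-mode) and 300 ns (Fast-mode). The function names are hypothetical, and the bus capacitance and the internal pull-up value you feed in are assumptions to be replaced with figures from your layout and the chip datasheet.

```c
/* Hypothetical helpers for a first-order pull-up check.
   Units: ohms, picofarads, nanoseconds, volts, milliamps. */

#define I2C_RISE_STANDARD_MODE_NS 1000.0  /* max rise time at up to 100 kHz */
#define I2C_RISE_FAST_MODE_NS     300.0   /* max rise time at up to 400 kHz */

/* Rise time from 30% to 70% of VDD for an RC network: ln(0.7/0.3) * R * C. */
double i2c_rise_time_ns(double rp_ohm, double cb_pf)
{
    return 0.8473 * rp_ohm * cb_pf * 1e-3;  /* ohm * pF = 1e-3 ns */
}

/* Largest pull-up that still meets the rise-time limit. */
double i2c_rp_max_ohm(double rise_max_ns, double cb_pf)
{
    return rise_max_ns / (0.8473 * cb_pf * 1e-3);
}

/* Smallest pull-up the outputs can sink while still reaching VOL(max). */
double i2c_rp_min_ohm(double vdd_v, double vol_max_v, double iol_ma)
{
    return (vdd_v - vol_max_v) / (iol_ma * 1e-3);
}
```

With an assumed 100 pF bus at 400 kHz and 3.3 V, the usable range works out to roughly 0.97 kΩ to 3.5 kΩ (using VOL = 0.4 V at 3 mA). An internal pull-up in the tens of kilohms, for instance an assumed 45 kΩ, would give a rise time near 3.8 µs on that bus, far outside the Fast-mode limit.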

One bus, one owner

If several tasks talk to devices on the same bus, centralise access or protect it explicitly. The handle-based API makes ownership easier to express, but it does not remove the architectural decision. Sensor polling, configuration commands, and diagnostic reads should not interleave unpredictably because two tasks happen to share a peripheral bus.

Practical checklist

How I would approach this on a client project

I would start by inventorying every I2C device, its address, maximum bus speed, reset timing, clock-stretching behaviour and safety relevance. Then I would migrate one bus at a time behind a small board abstraction, with tests or bench scripts that deliberately remove a device, hold reset, slow the power-up, and force repeated reads under RTOS load. The important deliverable is not only “the code builds on ESP-IDF 6.x”. It is a bus contract: who owns the bus, what errors mean, how they are reported, and which failures are allowed to degrade the feature instead of rebooting the product.

For a mature product, I would also compare field logs before and after the migration. If the new driver only changes the API names, the migration was mostly maintenance. If it turns intermittent “random freezes” into specific NACK, timeout and stale-data metrics, the migration improved the product.
