STM32 Flash Memory: Writing, Erasing, and Managing Persistent Data at Runtime

2026-06-02 · Davide Carrese

STM32 · Flash · Firmware · Bootloader

Every production embedded device eventually needs to persist data — calibration coefficients, serial numbers, network credentials, or application logs. On an STM32 without external EEPROM, the on-chip flash is the only non-volatile storage available. Writing to flash at runtime is completely different from writing to RAM: you must manage unlock sequences, erase-before-write constraints, alignment rules, and timing. This article covers the register-level flash programming model across STM32F4, G0, and U5 series, with practical patterns for EEPROM emulation and bootloader integration.

Flash architecture: sectors vs pages

The first thing to understand is that STM32 flash is not organised uniformly. The erase granularity — the smallest region you can erase in one operation — varies by family:

STM32F4 (F401/F411/F412): flash is divided into sectors. On a 512 KB device, sectors are 16 KB (4 sectors), 64 KB (1 sector), and 128 KB (3 sectors). You cannot erase less than a full sector.
STM32G0/G4: flash is divided into pages, typically 2 KB each. Much finer erase granularity, which makes EEPROM emulation simpler.
STM32U5: uses a dual-bank architecture with pages of 4 KB or 8 KB, depending on the bank configuration. Supports read-while-write (RWW) when banks are independent.
STM32L0/L4: pages of 128 or 256 bytes (L0) and 2 KB (L4). The small page size on L0 is excellent for data EEPROM emulation without external memory.

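The erase time also differs by more than an order of magnitude. On F4, the datasheet lists a 16 KB sector erase at roughly 250 ms typical with 32-bit parallelism (longer at lower parallelism), and a 128 KB sector at about 1 s typical and up to 2 s. A 2 KB page erase on G0 takes about 22 ms typical and up to 40 ms. Check the datasheet for your exact part. Your firmware must tolerate these latencies: never erase from an interrupt context unless you can accept the stall.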
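Because F4 sectors are not uniform, a driver usually needs a small helper to map an address to its sector before it can erase anything. The following is a minimal sketch for the 512 KB layout described above (4 × 16 KB, 1 × 64 KB, 3 × 128 KB); the function and macro names are hypothetical, and the flash size is an assumption to adjust for other parts.

```c
#include <stdint.h>

#define EX_FLASH_BASE 0x08000000u   /* start of main flash */
#define EX_FLASH_SIZE 0x00080000u   /* 512 KB device (assumption) */

/* Returns the sector index (0..7) that contains addr, or -1 if addr
 * lies outside main flash. */
int f4_sector_of(uint32_t addr)
{
    if (addr < EX_FLASH_BASE || addr >= EX_FLASH_BASE + EX_FLASH_SIZE)
        return -1;
    uint32_t off = addr - EX_FLASH_BASE;
    if (off < 0x10000u)                 /* first 64 KB: sectors 0..3, 16 KB each */
        return (int)(off / 0x4000u);
    if (off < 0x20000u)                 /* next 64 KB: sector 4 */
        return 4;
    return 5 + (int)((off - 0x20000u) / 0x20000u);   /* 128 KB sectors 5..7 */
}

/* Size in bytes of a sector, or 0 for an invalid index. */
uint32_t f4_sector_size(int sector)
{
    if (sector < 0 || sector > 7)
        return 0;
    if (sector < 4)
        return 0x4000u;
    if (sector == 4)
        return 0x10000u;
    return 0x20000u;
}
```

With this in place, an erase request for an arbitrary address can be checked against the sector size before it wipes more data than intended.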

Flash controller registers (STM32F4 as reference)

The flash interface is controlled through the FLASH->CR (control), FLASH->SR (status), and FLASH->ACR (access control) registers. The key bits are:

FLASH_CR_PG (bit 0): Programming enable. Must be set before every write sequence.
FLASH_CR_SER (bit 1): Sector erase enable.
FLASH_CR_SNB (bits 3–6): Sector number selection for erase.
FLASH_CR_STRT (bit 16): Start the erase operation.
FLASH_CR_LOCK (bit 31): Lock bit. Set after reset; must be unlocked before any write/erase.
FLASH_SR_BSY (bit 16): Busy flag. Poll this between operations.
FLASH_SR_PGSERR / FLASH_SR_PGPERR: Programming sequence and parallelism errors.

The unlock sequence is a write of 0x45670123 then 0xCDEF89AB to FLASH->KEYR. Any other value, or the keys in the wrong order, keeps FLASH->CR locked until the next reset.
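The status flags listed above can be folded into a single result code so that callers do not have to test individual bits. This is a sketch only: the bit positions are the STM32F4 FLASH_SR positions as I read them in the reference manual, the EX_ names are hypothetical (chosen so they do not clash with the CMSIS definitions), and the priority order of the checks is a design choice, not something the hardware mandates.

```c
#include <stdint.h>

/* FLASH_SR bit positions on STM32F4 (assumed from the reference manual). */
#define EX_SR_OPERR   (1u << 1)
#define EX_SR_WRPERR  (1u << 4)
#define EX_SR_PGAERR  (1u << 5)
#define EX_SR_PGPERR  (1u << 6)
#define EX_SR_PGSERR  (1u << 7)
#define EX_SR_BSY     (1u << 16)

typedef enum {
    FLASH_ST_OK,
    FLASH_ST_BUSY,
    FLASH_ST_WRITE_PROTECTED,
    FLASH_ST_ALIGNMENT,
    FLASH_ST_PARALLELISM,
    FLASH_ST_SEQUENCE,
    FLASH_ST_OPERATION
} flash_status_t;

/* Classify a snapshot of FLASH->SR. Busy wins over any error flag,
 * because errors are only meaningful once the operation has ended. */
flash_status_t flash_classify_sr(uint32_t sr)
{
    if (sr & EX_SR_BSY)    return FLASH_ST_BUSY;
    if (sr & EX_SR_WRPERR) return FLASH_ST_WRITE_PROTECTED;
    if (sr & EX_SR_PGAERR) return FLASH_ST_ALIGNMENT;
    if (sr & EX_SR_PGPERR) return FLASH_ST_PARALLELISM;
    if (sr & EX_SR_PGSERR) return FLASH_ST_SEQUENCE;
    if (sr & EX_SR_OPERR)  return FLASH_ST_OPERATION;
    return FLASH_ST_OK;
}
```

On target, the argument would be a read of FLASH->SR taken after the BSY poll.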

Register-level flash write

Writing a 32-bit word to flash on STM32F4 follows this sequence:

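// 1. Unlock the flash controller
FLASH->KEYR = 0x45670123;
FLASH->KEYR = 0xCDEF89AB;

// 2. Wait for any previous operation to complete
while (FLASH->SR & FLASH_SR_BSY);

// 3. Select 32-bit parallelism (PSIZE = 0b10, needs VDD >= 2.7 V).
//    The reset default is x8, and a 32-bit store with PSIZE = x8 raises PGPERR.
FLASH->CR = (FLASH->CR & ~FLASH_CR_PSIZE) | FLASH_CR_PSIZE_1;

// 4. Enable programming
FLASH->CR |= FLASH_CR_PG;

// 5. Write the word (32-bit aligned address!)
*(volatile uint32_t *)0x08040000 = 0xAABBCCDD;

// 6. Wait for completion (typically 10–50 µs per word), then leave programming mode
while (FLASH->SR & FLASH_SR_BSY);
FLASH->CR &= ~FLASH_CR_PG;

// 7. Verify: read back and compare
if (*(volatile uint32_t *)0x08040000 != 0xAABBCCDD) {
    // Handle write failure (inspect the error flags in FLASH->SR)
}

// 8. Lock the controller
FLASH->CR |= FLASH_CR_LOCK;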

Key constraints:

For a 32-bit write, the destination address must be 32-bit aligned and FLASH_CR_PSIZE must select x32 parallelism. F4 also supports byte (x8) and half-word (x16) programming with the matching PSIZE setting; if the access width does not match PSIZE, the controller flags PGPERR.
Programming can only change bits from 1 to 0, so write only to erased (0xFF) locations. On F4, writing over a non-erased word raises no error flag and leaves the bitwise AND of the old and new values; on G0/G4/L4 the controller flags a programming error (PROGERR) instead. Always erase first.
The supply voltage must be in the specified range. On F4, VDD must stay between 1.8 V and 3.6 V during write/erase, and the permitted parallelism depends on it (x32 requires at least 2.7 V). A brownout during programming can corrupt the flash.
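The constraints above are cheap to check in software before touching the controller. The sketch below shows two hypothetical helpers: one for alignment, one for the "bits can only go from 1 to 0" rule. The second check reflects F4-style flash without ECC; on ECC-protected families such as G0/G4/L4, assume instead that the target must be fully erased.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if addr is aligned to `width` bytes; width must be a power of
 * two (1, 2, 4 or 8). */
bool flash_addr_aligned(uint32_t addr, uint32_t width)
{
    return (addr & (width - 1u)) == 0u;
}

/* True if `wanted` can be programmed over `current` without an erase:
 * every bit that must end up 1 has to be 1 already. */
bool flash_word_programmable(uint32_t current, uint32_t wanted)
{
    return (current & wanted) == wanted;
}
```

A driver might call both before setting PG and return an error early, rather than discovering the mismatch in the read-back verification.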

Writing multiple words

For efficiency, you can keep FLASH_CR_PG set and write consecutive 32-bit words. The hardware handles the internal programming sequence for each word. Just poll BSY between writes:

FLASH->CR |= FLASH_CR_PG;
for (size_t i = 0; i < 64; i++) {      // 64 words = 256 bytes
    ((volatile uint32_t *)dest)[i] = data[i];
    while (FLASH->SR & FLASH_SR_BSY);
}
FLASH->CR &= ~FLASH_CR_PG;

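Newer families change the programming granularity: G0, G4, and L4 program one 64-bit double word at a time, and U5 programs 128-bit quad-words. They also offer faster bulk modes (fast programming via FLASH_CR_FSTPG on G0/G4/L4, burst programming on U5) that program a whole row with less CPU overhead. Check the reference manual for your target series.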
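On families that program 64 bits at a time (G0, G4, L4), a buffer whose length is not a multiple of 8 bytes has to be padded before it is written. A minimal sketch of that packing step, assuming a little-endian core and padding with 0xFF so the unused tail stays in the erased state; the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy len bytes from src into an array of 64-bit double words,
 * padding the tail of the last double word with 0xFF.
 * Returns the number of double words produced, or 0 if out_cap is
 * too small (a zero-length input also yields 0). */
size_t pack_doublewords(const uint8_t *src, size_t len,
                        uint64_t *out, size_t out_cap)
{
    size_t n = (len + 7u) / 8u;          /* round up to whole double words */
    if (n > out_cap)
        return 0;
    memset(out, 0xFF, n * sizeof(uint64_t));
    memcpy(out, src, len);
    return n;
}
```

Each resulting double word would then be written as two consecutive 32-bit stores (low word first) followed by one busy-flag poll.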

Register-level flash erase

Erasing a sector on STM32F4:

// 1. Unlock
FLASH->KEYR = 0x45670123;
FLASH->KEYR = 0xCDEF89AB;
while (FLASH->SR & FLASH_SR_BSY);

// 2. Select sector erase with x32 parallelism, pick sector number
FLASH->CR &= ~(FLASH_CR_SNB | FLASH_CR_PSIZE);   // Clear SNB and PSIZE fields
FLASH->CR |= FLASH_CR_SER | FLASH_CR_PSIZE_1;    // Sector erase mode, PSIZE = x32
FLASH->CR |= (5 << FLASH_CR_SNB_Pos);            // SNB = 5 (sector 5, 128 KB on F401)

// 3. Start erase
FLASH->CR |= FLASH_CR_STRT;

// 4. Wait for completion (hundreds of ms for a 16 KB sector, 1–2 s for 128 KB)
while (FLASH->SR & FLASH_SR_BSY);

// 5. Leave erase mode and lock
FLASH->CR &= ~FLASH_CR_SER;
FLASH->CR |= FLASH_CR_LOCK;

On G0/G4 with page erase, the sequence is similar but uses FLASH_CR_PER (page erase) and FLASH_CR_PNB (page number) instead of SER/SNB. The main-flash unlock keys are the same (0x45670123 / 0xCDEF89AB written to KEYR); option bytes use a separate key pair written to OPTKEYR.
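The listings above poll the busy flag in an unbounded loop, which is fine for illustration but hangs forever if the controller never clears it. A hedged alternative is a bounded wait that also reports error flags. In this sketch the function name, the result codes, and the idea of counting polls rather than measuring time are all assumptions; a production driver would more likely use a hardware timer.

```c
#include <stdint.h>

typedef enum {
    FLASH_WAIT_OK      =  0,
    FLASH_WAIT_ERROR   = -1,   /* operation ended with an error flag set */
    FLASH_WAIT_TIMEOUT = -2    /* busy flag never cleared */
} flash_wait_t;

/* Poll *sr until the bits in bsy_mask clear, giving up after
 * max_polls extra reads. Once idle, report any bit in err_mask. */
flash_wait_t flash_wait_idle(volatile const uint32_t *sr, uint32_t bsy_mask,
                             uint32_t err_mask, uint32_t max_polls)
{
    while (*sr & bsy_mask) {
        if (max_polls-- == 0u)
            return FLASH_WAIT_TIMEOUT;
    }
    return (*sr & err_mask) ? FLASH_WAIT_ERROR : FLASH_WAIT_OK;
}
```

On target, the first argument would be &FLASH->SR, with the masks taken from the CMSIS header for the family in use.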

Option bytes

Option bytes control hardware configuration — read-out protection (RDP), hardware watchdog, boot mode, and flash write-protection sectors. They are stored in a dedicated flash region and must be programmed through a separate unlock sequence:

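// Unlock option bytes (clears OPTLOCK in FLASH->OPTCR on F4)
FLASH->OPTKEYR = 0x08192A3B;
FLASH->OPTKEYR = 0x4C5D6E7F;
while (FLASH->SR & FLASH_SR_BSY);

// Set RDP to level 1. RDP byte encoding: 0xAA = level 0,
// 0xCC = level 2 (irreversible!), any other value = level 1.
// On F4 the option bytes are written through FLASH->OPTCR, not by a direct store.
FLASH->OPTCR = (FLASH->OPTCR & ~FLASH_OPTCR_RDP) | (0x55u << FLASH_OPTCR_RDP_Pos);

// Start option byte programming and wait for completion
FLASH->OPTCR |= FLASH_OPTCR_OPTSTRT;
while (FLASH->SR & FLASH_SR_BSY);

// Relock the option bytes
FLASH->OPTCR |= FLASH_OPTCR_OPTLOCK;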

Warning: setting RDP level 2 permanently disables debugging — irreversible. On client projects, I always set RDP level 1 (debug still possible but flash reads from external debugger are blocked) only after the production programming fixture is validated. Never prototype with RDP level 2.

EEPROM emulation: the two-page swap pattern

Since flash cannot be written in place (you must erase the whole sector/page first), you cannot use it like EEPROM. The standard workaround is a two-page swap algorithm:

Allocate two identical flash pages (Page A, Page B).
Start with Page A active. All writes go to the next free slot in Page A.
When Page A fills up, write the latest copy of every variable to Page B and erase Page A.
Page B becomes active; the process repeats in reverse.

Here is a minimal implementation sketch:

#define PAGE_A_ADDR 0x08040000
#define PAGE_B_ADDR 0x08040800  // 2 KB pages on G0/G4
#define PAGE_SIZE   2048

typedef struct {
    uint16_t id;         // Variable identifier
    uint16_t len;        // Data length (bytes)
    uint8_t  data[252];  // Payload
} eeprom_record_t;

static uint32_t current_page = PAGE_A_ADDR;

void eeprom_write(uint16_t id, const uint8_t *data, uint16_t len) {
    // Find next free slot in current page (scan for 0xFFFF header)
    // If no space left, do the swap: copy valid records to the other page
    // Erase the full page, then switch active page
    // Write the new record at the next free offset
}

void eeprom_init(void) {
    // Scan Page A and Page B to determine which is active
    // The active page contains more recent data (valid records + FFFF tail)
}

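On STM32F4 the sectors are far larger than the data, so this pattern wastes more space but still works for calibration storage. I typically reserve sectors at the end of flash for data (for example sector 7 on a 512 KB F401, or sector 11 on a 1 MB F405/F407), leaving the rest for code. Note that these end-of-flash sectors are 128 KB each, and the swap needs two of them. On G0 with 2 KB pages, the overhead is minimal, which suits 100–200 parameters.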
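To make the swap concrete, here is a host-runnable simulation of the algorithm that uses two RAM arrays in place of flash pages. It is a sketch under several assumptions that are not part of the outline above: records are a single 32-bit slot (16-bit id plus 16-bit value), slot 0 of each page holds a hypothetical "active" marker, id 0xFFFF is reserved, and the pages are deliberately tiny so the swap triggers quickly.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIM_PAGE_WORDS 8u            /* tiny pages so the swap is easy to trigger */
#define SLOT_ERASED    0xFFFFFFFFu
#define PAGE_ACTIVE    0x0000AAAAu   /* hypothetical "active page" marker */

static uint32_t sim_page[2][SIM_PAGE_WORDS];  /* RAM stand-in for two flash pages */
static unsigned active;                       /* index of the active page */

static void sim_erase(unsigned p)
{
    memset(sim_page[p], 0xFF, sizeof sim_page[p]);
}

void ee_format(void)
{
    sim_erase(0);
    sim_erase(1);
    sim_page[0][0] = PAGE_ACTIVE;
    active = 0;
}

/* Newest record wins: scan page p from the last slot backwards. */
static int ee_find(unsigned p, uint16_t id, uint16_t *value)
{
    for (unsigned i = SIM_PAGE_WORDS - 1u; i >= 1u; i--) {
        uint32_t slot = sim_page[p][i];
        if (slot != SLOT_ERASED && (uint16_t)(slot >> 16) == id) {
            if (value != NULL)
                *value = (uint16_t)slot;
            return 0;
        }
    }
    return -1;
}

/* Append a record in the first erased slot of page p; -1 if full. */
static int ee_append(unsigned p, uint16_t id, uint16_t value)
{
    for (unsigned i = 1u; i < SIM_PAGE_WORDS; i++) {
        if (sim_page[p][i] == SLOT_ERASED) {
            sim_page[p][i] = ((uint32_t)id << 16) | value;
            return 0;
        }
    }
    return -1;
}

int ee_read(uint16_t id, uint16_t *value)
{
    return ee_find(active, id, value);
}

int ee_write(uint16_t id, uint16_t value)
{
    if (id == 0xFFFFu)
        return -1;                       /* reserved: would look like an erased slot */
    if (ee_append(active, id, value) == 0)
        return 0;

    /* Active page is full: build the other page from the latest values. */
    unsigned other = active ^ 1u;
    sim_erase(other);
    if (ee_append(other, id, value) != 0)
        return -1;
    for (unsigned i = SIM_PAGE_WORDS - 1u; i >= 1u; i--) {
        uint16_t rid = (uint16_t)(sim_page[active][i] >> 16);
        if (ee_find(other, rid, NULL) != 0 &&
            ee_append(other, rid, (uint16_t)sim_page[active][i]) != 0)
            return -1;                   /* more live ids than slots */
    }
    sim_page[other][0] = PAGE_ACTIVE;    /* commit the new page first... */
    sim_erase(active);                   /* ...then retire the old one */
    active = other;
    return 0;
}
```

The ordering at the end of ee_write is the part that matters for power-loss safety: the old page is only erased once the new one is complete and marked, so at every instant at least one page holds a full set of values.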

Practical example: writing calibration data on STM32G070

The STM32G070RB has 128 KB flash with 2 KB pages. I reserve page 63 (the last page, 0x0801F800) for calibration storage:

#include <string.h>   // memcmp

#define CAL_PAGE_ADDR  0x0801F800u   // Page 63 of a 128 KB device
#define CAL_MAGIC      0xCA11CA11u   // Arbitrary marker value

// Not packed: the fields are naturally aligned, so the struct can be
// read as 32-bit words. The reserved field pads it to 24 bytes, a
// whole number of 64-bit double words.
typedef struct {
    uint32_t magic;
    float    gain_x;
    float    offset_x;
    float    gain_y;
    float    offset_y;
    uint16_t checksum;
    uint16_t reserved;
} cal_data_t;

int cal_save(const cal_data_t *cfg) {
    // Erase page first
    FLASH->KEYR = 0x45670123;
    FLASH->KEYR = 0xCDEF89AB;
    while (FLASH->SR & FLASH_SR_BSY1);   // The G0 headers name the busy flag BSY1

    FLASH->CR &= ~FLASH_CR_PNB_Msk;
    FLASH->CR |= FLASH_CR_PER | (63 << FLASH_CR_PNB_Pos);  // Page 63
    FLASH->CR |= FLASH_CR_STRT;
    while (FLASH->SR & FLASH_SR_BSY1);
    FLASH->CR &= ~FLASH_CR_PER;

    // Write data: G0 programs one 64-bit double word per operation,
    // issued as two consecutive 32-bit stores
    const uint32_t *src = (const uint32_t *)cfg;
    volatile uint32_t *dst = (volatile uint32_t *)CAL_PAGE_ADDR;
    FLASH->CR |= FLASH_CR_PG;
    for (size_t i = 0; i < sizeof(cal_data_t) / 4; i += 2) {
        dst[i]     = src[i];
        dst[i + 1] = src[i + 1];
        while (FLASH->SR & FLASH_SR_BSY1);
    }
    FLASH->CR &= ~FLASH_CR_PG;
    FLASH->CR |= FLASH_CR_LOCK;

    // Verify
    return memcmp((const void *)CAL_PAGE_ADDR, cfg, sizeof(cal_data_t)) == 0 ? 0 : -1;
}

This approach is simple, deterministic, and survives unexpected power loss — either the write completed fully (magic + checksum valid) or it did not (magic missing), in which case cal_load() returns the factory defaults. On a client project, I add a second backup page so that if the write is interrupted, the previous valid calibration is still available.
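The cal_load() side mentioned above is not shown, so here is one way it could look. This is a sketch under assumptions: the record type is a hypothetical padded variant of the calibration struct, the marker value is arbitrary, and the checksum is CRC-16/CCITT-FALSE (polynomial 0x1021, initial value 0xFFFF), which is my choice rather than anything the original design specifies. The stored record is passed in as a pointer so the logic can be exercised on a host.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CAL_MAGIC_EX 0xCA11DA7Au     /* hypothetical marker value */

typedef struct {
    uint32_t magic;
    float    gain_x, offset_x, gain_y, offset_y;
    uint16_t checksum;               /* CRC over all preceding bytes */
    uint16_t pad;                    /* keeps the size a multiple of 8 bytes */
} cal_record_t;

/* CRC-16/CCITT-FALSE, bitwise implementation. */
uint16_t crc16_ccitt(const uint8_t *p, size_t n)
{
    uint16_t crc = 0xFFFFu;
    while (n--) {
        crc ^= (uint16_t)(*p++ << 8);
        for (int b = 0; b < 8; b++)
            crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ 0x1021u)
                                  : (uint16_t)(crc << 1);
    }
    return crc;
}

/* Fill in magic and checksum before the record is written to flash. */
void cal_seal(cal_record_t *r)
{
    r->magic = CAL_MAGIC_EX;
    r->pad = 0xFFFFu;
    r->checksum = crc16_ccitt((const uint8_t *)r,
                              offsetof(cal_record_t, checksum));
}

/* Copy the stored record to *out if it is valid, otherwise copy the
 * defaults. Returns 0 for a valid record, -1 when defaults were used. */
int cal_load(const void *stored, cal_record_t *out,
             const cal_record_t *defaults)
{
    cal_record_t tmp;
    memcpy(&tmp, stored, sizeof tmp);
    if (tmp.magic == CAL_MAGIC_EX &&
        tmp.checksum == crc16_ccitt((const uint8_t *)&tmp,
                                    offsetof(cal_record_t, checksum))) {
        *out = tmp;
        return 0;
    }
    *out = *defaults;
    return -1;
}
```

On target, `stored` would be the address of the calibration page; an erased page fails the magic check and a half-written one fails the CRC, so both fall back to defaults.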

Bootloader considerations

If your firmware has a bootloader, flash write/erase typically runs from the bootloader context, not the application. The application receives a firmware image over UART/CAN/SPI, stores it temporarily in RAM or a scratch area, then jumps to the bootloader, which programs the main flash:

Vector table relocation: the bootloader lives in sector 0 (0x08000000) with its own vector table. With a 32 KB bootloader occupying sectors 0 and 1 on F401 (16 KB each), the application starts at sector 2 (0x08008000) and must point SCB->VTOR at its own vector table.
Interrupt-safe window: do not program flash while interrupts may fire. Disable global interrupts during the write/erase loop, or ensure no ISR accesses flash or uses a vector from the region being erased.
Watchdog management: a sector erase can take far longer than a typical watchdog period (up to about 2 s for a 128 KB sector on F4). Refresh the IWDG before starting, or switch to a longer timeout period. I have debugged countless "bricked-on-update" devices where the IWDG reset the MCU mid-erase, leaving a partial image.
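Before the bootloader jumps to the application, it is worth checking that the image looks like a real vector table, since an interrupted update leaves erased or partial flash behind. A minimal plausibility check, assuming the memory map used in this article (application at 0x08008000 on a 512 KB part) and 96 KB of SRAM; all EX_ constants and the function name are hypothetical and must be adapted to the actual device.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed memory map: adjust to the real part and linker script. */
#define EX_APP_BASE   0x08008000u   /* first application sector */
#define EX_FLASH_END  0x08080000u   /* one past the end of 512 KB flash */
#define EX_RAM_BASE   0x20000000u
#define EX_RAM_END    0x20018000u   /* one past the end of 96 KB SRAM */

/* vectors points at the application's vector table: word 0 is the
 * initial stack pointer, word 1 is the reset handler address. */
bool app_image_plausible(const uint32_t *vectors)
{
    uint32_t sp    = vectors[0];
    uint32_t reset = vectors[1];

    /* The stack pointer must be word-aligned and inside SRAM
     * (it may equal the end of RAM because the stack grows down). */
    if (sp <= EX_RAM_BASE || sp > EX_RAM_END || (sp & 0x3u) != 0u)
        return false;

    /* Cortex-M handler addresses have the Thumb bit (bit 0) set. */
    if ((reset & 1u) == 0u)
        return false;

    reset &= ~1u;
    return reset >= EX_APP_BASE && reset < EX_FLASH_END;
}
```

If the check passes, the bootloader would typically set SCB->VTOR to the application base, load the stack pointer with __set_MSP(vectors[0]), and branch to the reset handler. An erased image (all 0xFF) fails the stack pointer test.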

Practical checklist

☐ Flash architecture identified: sector-based (F4) or page-based (G0/G4/U5/L4). Erase granularity confirmed in the reference manual.
☐ Unlock sequence written correctly: KEYR not OPTKEYR for main flash; two specific keys in order.
☐ Destination address is aligned to the programming width (32-bit on F4 with PSIZE x32, 64-bit on G0/G4/L4) and falls within a writable flash region (not protected by WRP option bytes).
☐ BSY flag polled before and after each write/erase operation.
☐ Flash locked after write/erase to prevent spurious writes from runaway code.
☐ VDD monitoring during flash programming: brownout reset could corrupt.
☐ EEPROM emulation design handles power-loss: magic number + checksum to detect valid records.
☐ Bootloader code runs from RAM or from a region not being erased (if self-programming).
☐ IWDG refreshed or prescaled before long erase operations (tens of ms for a G0 page, up to about 2 s for a 128 KB F4 sector).
☐ Option bytes: RDP level verified — never level 2 during development.
☐ Write-protected sectors (WRP) checked: writing or erasing a protected sector sets WRPERR in FLASH_SR and the operation is not performed, so the failure is silent unless you check the flag.

How I would approach this on a client project

On a production firmware project, I never let application code touch flash registers directly. Flash programming is a critical operation with real consequences if misconfigured — one wrong bit in CR and you can lock yourself out of debugging or corrupt the application vector table.

I write a dedicated flash_driver.c module that exposes only:

int  flash_init(void);
int  flash_erase(uint32_t page_addr);
int  flash_write(uint32_t addr, const uint8_t *data, uint32_t len);
int  flash_read(uint32_t addr, uint8_t *out, uint32_t len);
void flash_lock(void);
void flash_unlock(void);

This module goes through code review with a checklist (the one above). Unit tests verify the API calls the correct register sequences by inspecting a mock FLASH peripheral struct. Integration tests run on actual hardware with a dedicated test firmware that exercises every sector — this catches brownout-sensitive boards early in production.
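As an illustration of the mock-peripheral approach, the sketch below passes the register block to the driver as a pointer, so a host test can substitute a plain struct and inspect what was written. The struct, the EX_ bit names, and the function are all hypothetical stand-ins; the bit positions follow the F4 FLASH_CR layout described earlier in the article.

```c
#include <stdint.h>

/* Minimal stand-in for the CMSIS FLASH register block (only the
 * fields this example touches). */
typedef struct {
    volatile uint32_t SR;
    volatile uint32_t CR;
} mock_flash_t;

#define EX_CR_SER        (1u << 1)
#define EX_CR_SNB_POS    3u
#define EX_CR_SNB_MSK    (0xFu << EX_CR_SNB_POS)
#define EX_CR_PSIZE_MSK  (3u << 8)
#define EX_CR_PSIZE_X32  (2u << 8)
#define EX_CR_STRT       (1u << 16)
#define EX_CR_LOCK       (1u << 31)

/* Configure and start a sector erase. Returns -1 if the controller
 * is still locked, -2 for an invalid sector, 0 once STRT is set. */
int flash_start_sector_erase(mock_flash_t *f, uint32_t sector)
{
    if (f->CR & EX_CR_LOCK)
        return -1;
    if (sector > 7u)
        return -2;

    uint32_t cr = f->CR & ~(EX_CR_SNB_MSK | EX_CR_PSIZE_MSK);
    cr |= EX_CR_SER | (sector << EX_CR_SNB_POS) | EX_CR_PSIZE_X32;
    f->CR = cr;                 /* select mode, sector and parallelism */
    f->CR = cr | EX_CR_STRT;    /* then trigger the erase */
    return 0;
}
```

A unit test then asserts on the final CR value, which catches the classic mistakes (stale SNB bits, missing PSIZE, erase attempted while locked) without any hardware attached.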

The EEPROM emulation layer sits above flash_driver.c and adds the swap/page management. It is board-agnostic: the same .c file compiles on F4, G0, and U5 just by swapping the flash driver underneath. I keep the emulation layer simple — two pages, a 16-bit CRC, and no wear-leveling beyond the swap. For most calibration use cases (fewer than 100 000 writes over the product lifetime), this is more than sufficient.

If the client insists on byte-addressable EEPROM from day one, I evaluate the cost of an external I²C EEPROM (like a 24LC512) versus the two-page flash emulation. For volumes above 10K units, the BOM savings of on-chip flash emulation usually win, and the firmware effort is a one-time cost.

Sources and further reading

STM32F401 Reference Manual (RM0368) — Chapter 3: Flash memory interface. Register map, lock/unlock sequence, sector layout, and programming/erase timing.
STM32G0x0 Reference Manual (RM0454). Flash chapter: page erase, write protection, and boot configuration. (RM0444 is the equivalent manual for the G0x1 parts.)
STM32U5 Reference Manual (RM0456) — Chapter 4: Flash. Dual-bank architecture, RWW, and secure flash programming.
ST Application Note AN3969. EEPROM emulation in STM32F40x/STM32F41x microcontrollers. Complete reference for two-page swap with wear-leveling.
ST Application Note AN4657 — STM32 Flash programming. Covers write/erase sequences across multiple STM32 families.
ST Application Note AN4776 — General-purpose timer cookbook (contains flash write timing references for time-triggered programming).
STM32CubeF4 firmware package — FLASH_WriteRead example in Projects/STM32F401RE-Nucleo/Examples/FLASH/.
