STM32 RCC Configuration: HSE, PLL, and Flash Latency from Registers

2026-05-31 · Davide Carrese
STM32 · RCC · Firmware Architecture

Every STM32 project that runs above the default HSI frequency needs to configure the RCC clock tree. Getting the sequence wrong (enabling the PLL before its source oscillator is ready, switching the system clock before setting flash wait-states, or forgetting to enable the PLL) produces a hard fault, a hang, or a chip that keeps running at 16 MHz instead of 168. This article walks through the register-level sequence used on the STM32F4 family, with notes on differences for the STM32G4, L4, and U5 series.

Why register-level RCC configuration matters

CubeMX and the HAL SystemClock_Config() template work for prototyping. But in production firmware you often need to:

Knowing the register sequence lets you read back the RCC status registers and diagnose failures in minutes instead of guessing.

The RCC clock tree: high-level map

All STM32 families share the same conceptual tree. At reset the chip runs from an internal oscillator: HSI (High-Speed Internal) at 16 MHz on F4 and G4, MSI at 4 MHz on L4 and U5. From there you can:

Correct startup sequence (register-level)

The sequence must follow this exact order. Any deviation risks a hard fault when the CPU tries to fetch the next instruction at a clock rate the flash cannot deliver.

Step 1: Enable HSE and wait for ready

RCC->CR |= RCC_CR_HSEON;
while (!(RCC->CR & RCC_CR_HSERDY)) { /* wait */ }

The HSE oscillator typically needs between a few hundred microseconds and a few milliseconds to stabilize, depending on the crystal and its load capacitance. Hardware sets the ready bit; do not proceed until it asserts. If your board has no external crystal, HSERDY never sets and this loop spins forever. In that case either feed an external clock signal into OSC_IN and set HSEBYP (HSE bypass) before HSEON, or source the PLL from HSI instead.
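Because an unbounded wait hangs the boot when the crystal never starts, a bounded poll is worth considering. The helper below is a minimal sketch, not part of any ST library: `wait_for_flag` and its `max_polls` budget are names I made up for illustration, and the poll count would have to be calibrated against the HSE startup time in your datasheet.

```c
#include <stdbool.h>
#include <stdint.h>

/* Poll *reg until any bit in mask is set; give up after max_polls reads.
   Returns true if the flag was seen, false on timeout. */
static bool wait_for_flag(volatile const uint32_t *reg, uint32_t mask,
                          uint32_t max_polls)
{
    while (max_polls-- > 0u) {
        if ((*reg & mask) != 0u) {
            return true;
        }
    }
    return false;
}
```

On target this could be called as wait_for_flag(&RCC->CR, RCC_CR_HSERDY, budget), falling back to an HSI-sourced PLL when it returns false.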

Step 2: Configure flash wait-states

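This must happen before the system clock frequency increases. The flash memory has a maximum access speed, and the required number of wait-states depends on HCLK and supply voltage. On an STM32F401 at 2.7–3.6 V, 84 MHz needs 2 wait-states; an STM32F411 at 100 MHz needs 3. (Five wait-states is the F405/F407 setting for 168 MHz.)

FLASH->ACR = FLASH_ACR_PRFTEN | FLASH_ACR_ICEN | FLASH_ACR_DCEN | FLASH_ACR_LATENCY_2WS;
while ((FLASH->ACR & FLASH_ACR_LATENCY) != FLASH_ACR_LATENCY_2WS) { /* confirm the new latency took effect */ }

The prefetch buffer, instruction cache, and data cache bits are independent of the latency field, so a single write can set all of them together. On F4 and G4 these three features make up the ART Accelerator and live in FLASH_ACR; on U5 the caches are separate ICACHE and DCACHE peripherals with their own registers. The principle is the same everywhere: the new latency must be in effect before the clock rises.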
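Picking the latency can be reduced to a table lookup. The sketch below assumes the 2.7–3.6 V thresholds as I read them in the F401 and F411 reference manuals; the array and function names are hypothetical, and the limits should be checked against the flash chapter for your exact part and voltage range.

```c
#include <stddef.h>
#include <stdint.h>

/* Upper HCLK limit in MHz for 0, 1, 2, ... wait-states at 2.7-3.6 V.
   Assumed values; verify them in the reference manual. */
static const uint32_t F401_WS_LIMIT_MHZ[] = { 30u, 60u, 84u };
static const uint32_t F411_WS_LIMIT_MHZ[] = { 30u, 64u, 90u, 100u };

/* Returns the minimum wait-state count for hclk_mhz,
   or -1 if hclk_mhz is above the last table entry. */
static int flash_wait_states(const uint32_t *limits, size_t n,
                             uint32_t hclk_mhz)
{
    for (size_t ws = 0; ws < n; ws++) {
        if (hclk_mhz <= limits[ws]) {
            return (int)ws;
        }
    }
    return -1;
}
```

A negative result is a useful build-time or boot-time guard: it means the requested HCLK is outside what the table says the flash can serve.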

Step 3: Configure PLL dividers

On a STM32F401 with an 8 MHz HSE, targeting 84 MHz SYSCLK:

// PLLM = 8, PLLN = 336, PLLP = 4  →  SYSCLK = 8 / 8 * 336 / 4 = 84 MHz
// PLLQ = 7  →  336 / 7 = 48 MHz for USB; Q must be at least 2

RCC->PLLCFGR = (8 << RCC_PLLCFGR_PLLM_Pos)    // M = 8
              | (336 << RCC_PLLCFGR_PLLN_Pos)   // N = 336
              | (1 << RCC_PLLCFGR_PLLP_Pos)     // P = 4  (field 0b01; 0b00 would select P = 2)
              | (7 << RCC_PLLCFGR_PLLQ_Pos)     // Q = 7
              | RCC_PLLCFGR_PLLSRC_HSE;           // Source = HSE

The PLLCFGR register can only be modified when the PLL is disabled. Writing to it while PLL is running has no effect on most STM32 families.
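The divider arithmetic is easy to get wrong by hand, so it can help to compute every intermediate frequency in one place. This is a small host-side sketch; `pll_freqs_t` and `pll_freqs` are illustrative names, and the arguments are real divider values, not register encodings.

```c
#include <stdint.h>

typedef struct {
    uint32_t vco_in_hz;   /* source / M */
    uint32_t vco_out_hz;  /* vco_in * N */
    uint32_t sysclk_hz;   /* vco_out / P */
    uint32_t pll48_hz;    /* vco_out / Q (USB, SDIO, RNG domain) */
} pll_freqs_t;

/* m, p and q must be non-zero divider values. */
static pll_freqs_t pll_freqs(uint32_t src_hz, uint32_t m, uint32_t n,
                             uint32_t p, uint32_t q)
{
    pll_freqs_t f;
    f.vco_in_hz  = src_hz / m;
    f.vco_out_hz = f.vco_in_hz * n;
    f.sysclk_hz  = f.vco_out_hz / p;
    f.pll48_hz   = f.vco_out_hz / q;
    return f;
}
```

For the 8 MHz HSE case with M = 8, N = 336, P = 4, Q = 7 this gives a 1 MHz VCO input, a 336 MHz VCO output, 84 MHz SYSCLK, and 48 MHz on the Q output.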

Step 4: Enable PLL and wait for lock

RCC->CR |= RCC_CR_PLLON;
while (!(RCC->CR & RCC_CR_PLLRDY)) { /* wait */ }

The PLL lock time is typically 50–200 µs depending on the input frequency and VCO range. Poll the ready bit — do not use a fixed delay.

Step 5: Configure AHB/APB prescalers

Set the prescalers before switching SYSCLK so the bus frequencies are defined when the switch occurs.

// HPRE = 1 (no division), PPRE1 = 2 (APB1 = HCLK/2), PPRE2 = 1 (APB2 = HCLK)
RCC->CFGR = RCC_CFGR_HPRE_DIV1 | RCC_CFGR_PPRE1_DIV2 | RCC_CFGR_PPRE2_DIV1;
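The prescaler fields are encodings rather than divider values, which matters when you read CFGR back to check the bus frequencies. The sketch below decodes them using the F4 bit positions as I understand them (HPRE in bits 7:4, PPRE1 in bits 12:10, PPRE2 in bits 15:13); the function and type names are hypothetical.

```c
#include <stdint.h>

/* 3-bit PPREx field: 0xx = /1, 100 = /2, 101 = /4, 110 = /8, 111 = /16. */
static uint32_t ppre_divider(uint32_t field)
{
    field &= 0x7u;
    return (field < 4u) ? 1u : (2u << (field - 4u));
}

/* 4-bit HPRE field: 0xxx = /1, 1000..1011 = /2../16,
   1100..1111 = /64../512 (there is no /32 setting). */
static uint32_t hpre_divider(uint32_t field)
{
    field &= 0xFu;
    if (field < 8u) {
        return 1u;
    }
    if (field < 12u) {
        return 2u << (field - 8u);
    }
    return 64u << (field - 12u);
}

typedef struct {
    uint32_t hclk_hz;
    uint32_t pclk1_hz;
    uint32_t pclk2_hz;
} bus_clocks_t;

/* Derive the bus clocks from SYSCLK and a raw RCC_CFGR value. */
static bus_clocks_t bus_clocks(uint32_t sysclk_hz, uint32_t cfgr)
{
    bus_clocks_t b;
    b.hclk_hz  = sysclk_hz / hpre_divider(cfgr >> 4);
    b.pclk1_hz = b.hclk_hz / ppre_divider(cfgr >> 10);
    b.pclk2_hz = b.hclk_hz / ppre_divider(cfgr >> 13);
    return b;
}
```

With SYSCLK at 84 MHz and the CFGR value written above, this yields HCLK = 84 MHz, PCLK1 = 42 MHz, and PCLK2 = 84 MHz.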

Step 6: Switch system clock to PLL

RCC->CFGR = (RCC->CFGR & ~RCC_CFGR_SW_Msk) | RCC_CFGR_SW_PLL;
while ((RCC->CFGR & RCC_CFGR_SWS_Msk) != RCC_CFGR_SWS_PLL) { /* wait */ }

Check the status bits (SWS), not just the switch bits (SW). The hardware takes a few cycles to migrate the clock source. Reading back SWS confirms the switch completed successfully.
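When diagnosing a board that came up at the wrong speed, it helps to decode a CFGR read-back instead of eyeballing the hex. A minimal sketch, assuming the F4 layout (SW in bits 1:0, SWS in bits 3:2); the enum and function names are my own.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    SYSCLK_SRC_HSI     = 0,
    SYSCLK_SRC_HSE     = 1,
    SYSCLK_SRC_PLL     = 2,
    SYSCLK_SRC_INVALID = 3
} sysclk_src_t;

/* Source the hardware is actually running from (SWS field). */
static sysclk_src_t sysclk_active_source(uint32_t cfgr)
{
    return (sysclk_src_t)((cfgr >> 2) & 0x3u);
}

/* True once the requested source (SW) matches the active one (SWS). */
static bool sysclk_switch_done(uint32_t cfgr)
{
    return (cfgr & 0x3u) == ((cfgr >> 2) & 0x3u);
}
```

A read-back where SW says PLL but SWS still says HSI means the switch was requested and never completed, which usually points back at the PLL not being ready.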

Practical example: STM32F411 from HSI to 100 MHz

Some boards have no HSE crystal, and sometimes you simply do not want to depend on one. In that case, source the PLL from HSI (16 MHz on F411):

void SystemClock_HSI_100MHz(void) {
    // 1. Flash: 3 wait-states for 100 MHz
    FLASH->ACR = FLASH_ACR_PRFTEN | FLASH_ACR_ICEN | FLASH_ACR_DCEN
               | FLASH_ACR_LATENCY_3WS;

    // 2. PLL: HSI16 / 8 * 100 / 2 = 100 MHz
    // On the F4 family the PLLP field 0b00 selects P = 2
    RCC->PLLCFGR = (8 << RCC_PLLCFGR_PLLM_Pos)
                 | (100 << RCC_PLLCFGR_PLLN_Pos)
                 | (0 << RCC_PLLCFGR_PLLP_Pos)
                 | (4 << RCC_PLLCFGR_PLLQ_Pos)   // Q must be >= 2; 200 / 4 = 50 MHz, not usable for USB
                 | RCC_PLLCFGR_PLLSRC_HSI;

    RCC->CR |= RCC_CR_PLLON;
    while (!(RCC->CR & RCC_CR_PLLRDY));

    // 3. AHB/APB prescalers
    RCC->CFGR = RCC_CFGR_HPRE_DIV1 | RCC_CFGR_PPRE1_DIV2 | RCC_CFGR_PPRE2_DIV1;

    // 4. Switch
    RCC->CFGR = (RCC->CFGR & ~RCC_CFGR_SW_Msk) | RCC_CFGR_SW_PLL;
    while ((RCC->CFGR & RCC_CFGR_SWS_Msk) != RCC_CFGR_SWS_PLL);
}

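Important: PLLP is an encoded field, not the divider value. On the whole F4 family, F401 and F411 included, PLLP = 0b00 selects P = 2, 0b01 selects P = 4, 0b10 selects P = 6, and 0b11 selects P = 8. Other families use different fields and encodings (on L4 and G4, for example, SYSCLK is taken from the PLLR output rather than PLLP), so always check the divider mapping in your target reference manual. A wrong PLLP setting produces the wrong frequency but the PLL still locks, so the bug hides until you measure the output.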
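One way to keep the encoding out of application code is to pass real divider values and let a helper build the register word. The sketch below assumes the F4 PLLCFGR layout (M at bit 0, N at bit 6, P at bit 16, source select at bit 22, Q at bit 24) and the field ranges as I read them in the reference manual; the macros and function names are local stand-ins so it compiles without the CMSIS header.

```c
#include <stdint.h>

/* F4 field positions, written out locally for this sketch. */
#define PLLM_POS    0u
#define PLLN_POS    6u
#define PLLP_POS    16u
#define PLLSRC_HSE  (1u << 22)
#define PLLQ_POS    24u

/* Encode a real P divider (2, 4, 6 or 8) into the 2-bit PLLP field.
   Returns -1 for any other value. */
static int pllp_encode(uint32_t p)
{
    if (p < 2u || p > 8u || (p & 1u) != 0u) {
        return -1;
    }
    return (int)(p / 2u - 1u);
}

/* Build a PLLCFGR value from real divider values.
   Returns 0 and stores the word in *out, or -1 if a value is out of range.
   Assumed ranges: M 2..63, N 50..432, Q 2..15 (verify per part). */
static int f4_pllcfgr(uint32_t m, uint32_t n, uint32_t p, uint32_t q,
                      int use_hse, uint32_t *out)
{
    int p_field = pllp_encode(p);
    if (p_field < 0 || m < 2u || m > 63u || n < 50u || n > 432u ||
        q < 2u || q > 15u) {
        return -1;
    }
    *out = (m << PLLM_POS) | (n << PLLN_POS) |
           ((uint32_t)p_field << PLLP_POS) | (q << PLLQ_POS) |
           (use_hse ? PLLSRC_HSE : 0u);
    return 0;
}
```

Rejecting odd or out-of-range P values at this point turns a silent wrong-frequency bug into an explicit error return.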

Flash latency per family: what changes

The flash latency tables differ across STM32 families. Here are the key reference manual sections to consult:

Common failure modes

Hard fault immediately after clock switch

Almost always caused by insufficient flash wait-states. The CPU fetches at 168 MHz, but with too few wait-states the flash cannot return valid data in time (at zero wait-states it is only specified up to roughly 30 MHz at 3.3 V), so the core executes garbage. This typically shows up as a HardFault on the first instruction after the SWS check passes. Fix: increase LATENCY before switching.

PLL never locks

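Check the PLL VCO output range (100–432 MHz on most F4 parts, 192–432 MHz on the F401, 96–344 MHz on G4) and, on F4, the 1–2 MHz VCO input range. If PLL input × N / M falls outside the valid VCO range, the PLL may never assert RDY or may lock unreliably. Also confirm that the selected source (HSE or HSI) was ready before PLLON was set. Debug by reading back RCC->PLLCFGR and computing the VCO frequency manually.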
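The manual computation from a PLLCFGR read-back can be scripted. This sketch assumes the F4 register layout and takes the oscillator frequencies as parameters; `pll_readback_t` and `pllcfgr_decode` are illustrative names, not part of CMSIS or the HAL.

```c
#include <stdint.h>

typedef struct {
    uint32_t m, n, p, q;   /* real divider values */
    int      src_is_hse;   /* 1 = HSE, 0 = HSI */
    uint64_t vco_hz;       /* source * N / M, or 0 if M is 0 */
} pll_readback_t;

/* Decode a raw RCC_PLLCFGR value (F4 layout assumed). */
static pll_readback_t pllcfgr_decode(uint32_t pllcfgr, uint32_t hsi_hz,
                                     uint32_t hse_hz)
{
    pll_readback_t r;
    r.m = pllcfgr & 0x3Fu;
    r.n = (pllcfgr >> 6) & 0x1FFu;
    r.p = (((pllcfgr >> 16) & 0x3u) + 1u) * 2u;   /* 00 -> 2 ... 11 -> 8 */
    r.q = (pllcfgr >> 24) & 0xFu;
    r.src_is_hse = (int)((pllcfgr >> 22) & 0x1u);

    uint32_t src_hz = r.src_is_hse ? hse_hz : hsi_hz;
    r.vco_hz = (r.m != 0u) ? ((uint64_t)src_hz * r.n) / r.m : 0u;
    return r;
}
```

Fed the F4 reset value 0x24003010 with a 16 MHz HSI, it reports M = 16, N = 192, P = 2, Q = 4 and a 192 MHz VCO, which is a quick sanity check that the decoder matches the manual.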

APB1 peripheral at wrong frequency

Timers clocked from APB1 (TIM2–TIM7 on F4, plus TIM12–TIM14 on the larger parts) run at twice the APB1 bus frequency whenever the APB1 prescaler is not 1. With HCLK = 84 MHz and PPRE1 = 2, PCLK1 is 42 MHz but the timers tick at 84 MHz. A common CubeMX mismatch: the user computes the TIM prescaler from the 42 MHz bus clock, and the timer then runs twice as fast as intended. Always compute APB1_timer_clock = HCLK / ppre1 * (ppre1 == 1 ? 1 : 2). On parts that have a TIMPRE bit, this assumes it is left at its reset value.
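The doubling rule fits in a few lines. A minimal sketch, assuming the default timer prescaler mode (TIMPRE at its reset value where that bit exists); `apb_timer_clock_hz` is a name chosen for this example, and `apb_div` is the real divider (1, 2, 4, 8, or 16), not the PPRE field encoding.

```c
#include <stdint.h>

/* Timer kernel clock for timers on an APB bus:
   equal to PCLK when the APB divider is 1, otherwise 2 x PCLK. */
static uint32_t apb_timer_clock_hz(uint32_t hclk_hz, uint32_t apb_div)
{
    uint32_t pclk_hz = hclk_hz / apb_div;
    return (apb_div == 1u) ? pclk_hz : 2u * pclk_hz;
}
```

With HCLK at 84 MHz, a divider of 2 gives an 84 MHz timer clock and a divider of 4 gives 42 MHz, so timer prescalers should be derived from this value rather than from PCLK1.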

Practical checklist

How I would approach this on a client project

On a production firmware project, I never embed a single hardcoded SystemClock_Config(). Instead I write a clock configuration structure that carries the PLL dividers, flash wait-states, and bus prescalers as compile-time constants:

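typedef struct {
    uint32_t pll_m;          // real divider values, not register encodings
    uint32_t pll_n;
    uint32_t pll_p;          // 2, 4, 6 or 8; encoded as (p / 2) - 1 when written to PLLCFGR
    uint32_t pll_q;
    uint32_t flash_latency;  // FLASH_ACR_LATENCY_xWS
    uint32_t hpre, ppre1, ppre2;  // pre-shifted RCC_CFGR_* values; PPRE1_DIV2 is 0x1000 and does not fit in uint8_t
} rcc_config_t;

static const rcc_config_t RCC_CFG_84MHZ = {
    .pll_m = 8, .pll_n = 336, .pll_p = 4, .pll_q = 7,
    .flash_latency = FLASH_ACR_LATENCY_2WS,   // F401, 2.7-3.6 V, 84 MHz
    .hpre = RCC_CFGR_HPRE_DIV1,
    .ppre1 = RCC_CFGR_PPRE1_DIV2,
    .ppre2 = RCC_CFGR_PPRE2_DIV1,
};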

int rcc_apply(const rcc_config_t *cfg);  // returns 0 on success

This structure lives in a dedicated rcc.c module with its own unit tests (checked against the reference manual table). When the client swaps the crystal or changes the target frequency, they edit one header, recompile, and verify with a logic analyser on MCO — no CubeMX regen, no copy-paste mistakes.
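The unit tests for such a module can run on the host if the range checks are kept separate from the register writes. The following is a sketch under stated assumptions: it uses a trimmed, hypothetical copy of the structure (`rcc_pll_cfg_t`) so it compiles on its own, and the limits are the F401 values as I read them (1–2 MHz VCO input, 192–432 MHz VCO output); substitute the numbers for your part.

```c
#include <stdint.h>

typedef struct {
    uint32_t pll_m, pll_n, pll_p, pll_q;   /* real divider values */
} rcc_pll_cfg_t;

enum {
    RCC_CHECK_OK    =  0,
    RCC_ERR_VCO_IN  = -1,
    RCC_ERR_VCO_OUT = -2,
    RCC_ERR_PLLP    = -3,
    RCC_ERR_PLLQ    = -4,
    RCC_ERR_SYSCLK  = -5
};

/* Assumed F401 limits; check the reference manual for your part. */
#define VCO_IN_MIN_HZ    1000000u
#define VCO_IN_MAX_HZ    2000000u
#define VCO_OUT_MIN_HZ   192000000u
#define VCO_OUT_MAX_HZ   432000000u

/* Returns RCC_CHECK_OK or the first violated constraint. */
static int rcc_config_check(const rcc_pll_cfg_t *cfg, uint32_t src_hz,
                            uint32_t max_sysclk_hz)
{
    if (cfg->pll_m == 0u) {
        return RCC_ERR_VCO_IN;
    }
    uint32_t vco_in = src_hz / cfg->pll_m;
    if (vco_in < VCO_IN_MIN_HZ || vco_in > VCO_IN_MAX_HZ) {
        return RCC_ERR_VCO_IN;
    }
    uint64_t vco_out = (uint64_t)vco_in * cfg->pll_n;
    if (vco_out < VCO_OUT_MIN_HZ || vco_out > VCO_OUT_MAX_HZ) {
        return RCC_ERR_VCO_OUT;
    }
    if (cfg->pll_p < 2u || cfg->pll_p > 8u || (cfg->pll_p & 1u) != 0u) {
        return RCC_ERR_PLLP;
    }
    if (cfg->pll_q < 2u || cfg->pll_q > 15u) {
        return RCC_ERR_PLLQ;
    }
    if (vco_out / cfg->pll_p > max_sysclk_hz) {
        return RCC_ERR_SYSCLK;
    }
    return RCC_CHECK_OK;
}
```

A check like this catches the P = 2 versus P = 4 mix-up before the firmware ever reaches hardware: the same dividers with P = 2 would ask an 84 MHz part to run at 168 MHz and are rejected with RCC_ERR_SYSCLK.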
