STM32 RCC Configuration: HSE, PLL, and Flash Latency from Registers
Every STM32 project that runs above the default HSI frequency needs to configure the RCC clock tree. Getting the sequence wrong — configuring PLL dividers before the oscillator is ready, switching the system clock before setting flash wait-states, or forgetting to enable the PLL — produces a silent hard fault or a chip that boots at 16 MHz instead of 168. This article walks through the register-level sequence used on the STM32F4 family, with notes on differences for the STM32G4, L4, and U5 series.
Why register-level RCC configuration matters
CubeMX and the HAL SystemClock_Config() template work for prototyping. But in production firmware you often need to:
- Reconfigure clocks after a low-power wakeup (STOP or STANDBY) without the 50 ms HAL warm-up.
- Switch to a bypass source for HSE (external clock injection during board testing).
- Adjust PLL dividers at runtime based on temperature or power budget.
- Understand why a CubeMX-generated sequence sometimes hard-faults when moved to a custom board.
Knowing the register sequence lets you read back the RCC status registers and diagnose failures in minutes instead of guessing.
The RCC clock tree: high-level map
All STM32 families share the same conceptual tree. At reset the chip runs from an internal oscillator: the 16 MHz HSI (High-Speed Internal) on F4 and G4, or the MSI at 4 MHz on L4 and U5. The main building blocks are:
- HSE — external crystal or clock (4–26 MHz typical).
- PLL — sourced from HSE or HSI, with configurable input divider (M), VCO multiplier (N), and output dividers (P, Q, R).
- System clock switch — selects HSI, HSE, or PLL as SYSCLK.
- AHB/APB prescalers — divide SYSCLK down for the AHB bus (HCLK), APB1 (max 42 MHz on F4), and APB2 (max 84 MHz on F4).
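To make the tree concrete, here is a minimal sketch that computes the resulting frequencies from the divider values, using the F4-style formula VCO = f_in × N / M and SYSCLK = VCO / P. The names `clock_tree_t` and `clock_tree_compute` are made up for this example and are not part of CMSIS or the HAL. All arguments are real divider values, not register encodings, and the caller is assumed to pass non-zero dividers.

```c
#include <stdint.h>

/* Hypothetical helper type: resulting frequencies in Hz. */
typedef struct {
    uint32_t vco_hz;
    uint32_t sysclk_hz;
    uint32_t hclk_hz;
    uint32_t pclk1_hz;
    uint32_t pclk2_hz;
} clock_tree_t;

/* Computes the clock tree from real divider/multiplier values.
 * No range checking is done here; dividers must be non-zero. */
static clock_tree_t clock_tree_compute(uint32_t f_in_hz, uint32_t m, uint32_t n,
                                       uint32_t p, uint32_t ahb_div,
                                       uint32_t apb1_div, uint32_t apb2_div)
{
    clock_tree_t t;
    /* 64-bit intermediate so f_in * N cannot overflow before the division */
    t.vco_hz    = (uint32_t)(((uint64_t)f_in_hz * n) / m);
    t.sysclk_hz = t.vco_hz / p;
    t.hclk_hz   = t.sysclk_hz / ahb_div;
    t.pclk1_hz  = t.hclk_hz / apb1_div;
    t.pclk2_hz  = t.hclk_hz / apb2_div;
    return t;
}
```

With an 8 MHz input, M = 8, N = 336, P = 4 and prescalers 1/2/1, this gives a 336 MHz VCO, 84 MHz SYSCLK and HCLK, and 42 MHz on APB1.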
Correct startup sequence (register-level)
The sequence must follow this exact order. Any deviation risks a hard fault when the CPU tries to fetch the next instruction at a clock rate the flash cannot deliver.
Step 1: Enable HSE and wait for ready
RCC->CR |= RCC_CR_HSEON;
while (!(RCC->CR & RCC_CR_HSERDY)) { /* wait */ }
The HSE oscillator takes from a few hundred microseconds to a few milliseconds to stabilize, depending on the crystal and its load capacitance. The ready bit is set by hardware. Do not proceed until it asserts. If your board has no external crystal, HSE never becomes ready and the loop above spins forever. In that case either feed an external clock signal and set HSE bypass (HSEBYP), or source the PLL from HSI.
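Because a missing or dead crystal turns the bare wait loop into a hang, I prefer a bounded poll. The sketch below is one way to write it. `wait_field_equals` is a hypothetical helper, not a CMSIS or HAL function, and the poll budget is a placeholder you would calibrate for your core clock.

```c
#include <stdint.h>

/* Polls until (*reg & mask) == expected, giving up after max_polls reads.
 * Returns 0 on success, -1 on timeout. */
static int wait_field_equals(const volatile uint32_t *reg, uint32_t mask,
                             uint32_t expected, uint32_t max_polls)
{
    while (max_polls-- > 0u) {
        if ((*reg & mask) == expected) {
            return 0;
        }
    }
    return -1;
}
```

On target the call would look something like `wait_field_equals(&RCC->CR, RCC_CR_HSERDY, RCC_CR_HSERDY, 100000u)`, with a fallback to HSI when it returns -1. The same helper works for the PLL ready bit and for the SWS status field.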
Step 2: Configure flash wait-states
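This must happen before the system clock frequency increases, because the flash memory has a maximum access speed. The table below is the 30 MHz-per-step one for STM32F405/407 at 2.7–3.6 V (RM0090). The F401 (84 MHz max) and F411 (100 MHz max) stop earlier and use slightly different thresholds, so check RM0368 / RM0383 for those parts: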
- 0 wait-states: ≤ 30 MHz
- 1 wait-state: 30–60 MHz
- 2 wait-states: 60–90 MHz
- 3 wait-states: 90–120 MHz
- 4 wait-states: 120–150 MHz
- 5 wait-states: > 150 MHz (168 MHz max)
FLASH->ACR = FLASH_ACR_PRFTEN | FLASH_ACR_ICEN | FLASH_ACR_DCEN | FLASH_ACR_LATENCY_5WS;
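The prefetch buffer, instruction cache, and data cache are independent of latency. The single write above sets them together with the wait-states, which is fine as long as it happens before the frequency increases. Read FLASH->ACR back afterwards to confirm the new latency has taken effect. On F4 and G4 all of these bits live in FLASH_ACR, while on U5 the caches are separate ICACHE/DCACHE peripherals. The principle is the same everywhere: configure latency first, then raise the clock.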
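For the 30 MHz-per-wait-state table above, the lookup reduces to one integer division. This is a sketch for that table only. The function name is hypothetical, and other families or supply voltages need their own thresholds from the reference manual.

```c
#include <stdint.h>

/* Wait-states for a table with one extra wait-state per 30 MHz,
 * valid up to 168 MHz. Returns -1 for 0 Hz or anything above 168 MHz. */
static int flash_wait_states_30mhz_step(uint32_t hclk_hz)
{
    if (hclk_hz == 0u || hclk_hz > 168000000u) {
        return -1;
    }
    /* Subtract 1 so the upper bound of each band stays in that band:
     * exactly 30 MHz is still 0 WS, 30 MHz + 1 Hz is 1 WS. */
    return (int)((hclk_hz - 1u) / 30000000u);
}
```

The result maps directly onto the LATENCY field, so a target of 168 MHz yields 5 and 84 MHz yields 2.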
Step 3: Configure PLL dividers
On an STM32F401 with an 8 MHz HSE, targeting 84 MHz SYSCLK:
// PLLM = 8, PLLN = 336, PLLP = 4 → SYSCLK = 8 / 8 * 336 / 4 = 84 MHz
// PLLQ = 7 → 336 / 7 = 48 MHz for USB; PLLQ must be 2..15 even if unused
RCC->PLLCFGR = (8 << RCC_PLLCFGR_PLLM_Pos)    // M = 8
             | (336 << RCC_PLLCFGR_PLLN_Pos)  // N = 336
             | (1 << RCC_PLLCFGR_PLLP_Pos)    // P = 4 (0b01; 0b00 would give P = 2)
             | (7 << RCC_PLLCFGR_PLLQ_Pos)    // Q = 7
             | RCC_PLLCFGR_PLLSRC_HSE;        // Source = HSE
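The PLLCFGR register can only be modified while the PLL is disabled. On most STM32 families a write while the PLL is running has no effect. After reset the PLL is off, so the write above works as-is. When reconfiguring at runtime, first switch SYSCLK back to HSI or HSE, clear PLLON, and wait for PLLRDY to read 0.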
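Hand-shifting the fields is where most PLLCFGR mistakes come from, so a small packer that takes real divider values and rejects invalid ones is worth having. The sketch below restates the F4 field positions locally so it compiles off-target; on target you would use the CMSIS `_Pos` macros instead. `pllcfgr_pack` and the `EX_` constants are names invented for this example, and the accepted ranges (M 2..63, N 50..432, Q 2..15) are my reading of the F4 manuals, so verify them for your part.

```c
#include <stdint.h>

/* RCC_PLLCFGR field positions on STM32F4, restated for an off-target build. */
#define EX_PLLM_POS    0u
#define EX_PLLN_POS    6u
#define EX_PLLP_POS    16u
#define EX_PLLSRC_HSE  (1u << 22)
#define EX_PLLQ_POS    24u

/* Packs real divider values into a PLLCFGR word.
 * Returns 0 if any value is out of range; 0 is never a valid result
 * because M and Q must both be at least 2. */
static uint32_t pllcfgr_pack(uint32_t m, uint32_t n, uint32_t p, uint32_t q,
                             int use_hse)
{
    if (m < 2u || m > 63u)   return 0u;
    if (n < 50u || n > 432u) return 0u;
    if (p != 2u && p != 4u && p != 6u && p != 8u) return 0u;
    if (q < 2u || q > 15u)   return 0u;

    return (m << EX_PLLM_POS)
         | (n << EX_PLLN_POS)
         | (((p / 2u) - 1u) << EX_PLLP_POS)  /* 2,4,6,8 -> 0b00,0b01,0b10,0b11 */
         | (q << EX_PLLQ_POS)
         | (use_hse ? EX_PLLSRC_HSE : 0u);
}
```

For the 84 MHz example, `pllcfgr_pack(8, 336, 4, 7, 1)` yields 0x07415408. The P = 4 request becomes the 0b01 encoding rather than a literal 4 or 0.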
Step 4: Enable PLL and wait for lock
RCC->CR |= RCC_CR_PLLON;
while (!(RCC->CR & RCC_CR_PLLRDY)) { /* wait */ }
The PLL lock time is typically 50–200 µs depending on the input frequency and VCO range. Poll the ready bit — do not use a fixed delay.
Step 5: Configure AHB/APB prescalers
Set the prescalers before switching SYSCLK so the bus frequencies are defined when the switch occurs.
// HPRE = 1 (no division), PPRE1 = 2 (APB1 = HCLK/2), PPRE2 = 1 (APB2 = HCLK)
RCC->CFGR = RCC_CFGR_HPRE_DIV1 | RCC_CFGR_PPRE1_DIV2 | RCC_CFGR_PPRE2_DIV1;
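The prescaler fields are encoded too: the AHB divider skips /32, and the APB fields set their top bit for anything other than /1. A minimal sketch of the mapping, using the F4 bit positions (HPRE at bit 4, PPRE1 at bit 10, PPRE2 at bit 13), could look like this. The function names are hypothetical; on target the CMSIS `RCC_CFGR_*_DIVx` constants give the same values.

```c
#include <stdint.h>

/* AHB divider to HPRE[3:0] encoding; -1 if unsupported (there is no /32). */
static int hpre_bits(uint32_t div)
{
    switch (div) {
    case 1u:   return 0x0;
    case 2u:   return 0x8;
    case 4u:   return 0x9;
    case 8u:   return 0xA;
    case 16u:  return 0xB;
    case 64u:  return 0xC;
    case 128u: return 0xD;
    case 256u: return 0xE;
    case 512u: return 0xF;
    default:   return -1;
    }
}

/* APB divider to PPREx[2:0] encoding; -1 if unsupported. */
static int ppre_bits(uint32_t div)
{
    switch (div) {
    case 1u:  return 0x0;
    case 2u:  return 0x4;
    case 4u:  return 0x5;
    case 8u:  return 0x6;
    case 16u: return 0x7;
    default:  return -1;
    }
}

/* Builds the prescaler part of RCC_CFGR. Returns 0 on success, -1 on a
 * divider the hardware cannot produce; *out is untouched on failure. */
static int cfgr_prescaler_bits(uint32_t ahb_div, uint32_t apb1_div,
                               uint32_t apb2_div, uint32_t *out)
{
    int h  = hpre_bits(ahb_div);
    int p1 = ppre_bits(apb1_div);
    int p2 = ppre_bits(apb2_div);
    if (h < 0 || p1 < 0 || p2 < 0) {
        return -1;
    }
    *out = ((uint32_t)h << 4) | ((uint32_t)p1 << 10) | ((uint32_t)p2 << 13);
    return 0;
}
```

Dividers 1/2/1 produce 0x1000, the same value as `RCC_CFGR_PPRE1_DIV2` alone, since the two /1 fields encode as zero.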
Step 6: Switch system clock to PLL
RCC->CFGR = (RCC->CFGR & ~RCC_CFGR_SW_Msk) | RCC_CFGR_SW_PLL;
while ((RCC->CFGR & RCC_CFGR_SWS_Msk) != RCC_CFGR_SWS_PLL) { /* wait */ }
Check the status bits (SWS), not just the switch bits (SW). The hardware takes a few cycles to migrate the clock source. Reading back SWS confirms the switch completed successfully.
Practical example: STM32F411 from HSI to 100 MHz
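Some boards have no HSE crystal fitted, and sometimes you want a configuration that does not depend on one (the WeAct STM32F411CEU6 "Black Pill" does carry a 25 MHz crystal, but the code below ignores it). In both cases you can use the PLL sourced from HSI (16 MHz on F411):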
void SystemClock_HSI_100MHz(void) {
    // 1. Flash: 3 wait-states for 100 MHz
    FLASH->ACR = FLASH_ACR_PRFTEN | FLASH_ACR_ICEN | FLASH_ACR_DCEN
               | FLASH_ACR_LATENCY_3WS;
    // 2. PLL: HSI16 / 8 * 100 / 2 = 100 MHz (VCO = 200 MHz)
    //    PLLP = 0b00 encodes P = 2; PLLQ must be 2..15 even if unused
    RCC->PLLCFGR = (8 << RCC_PLLCFGR_PLLM_Pos)
                 | (100 << RCC_PLLCFGR_PLLN_Pos)
                 | (0 << RCC_PLLCFGR_PLLP_Pos)
                 | (4 << RCC_PLLCFGR_PLLQ_Pos)   // 200 / 4 = 50 MHz, not usable for USB
                 | RCC_PLLCFGR_PLLSRC_HSI;
    RCC->CR |= RCC_CR_PLLON;
    while (!(RCC->CR & RCC_CR_PLLRDY)) { /* wait */ }
    // 3. AHB/APB prescalers: HCLK = 100 MHz, APB1 = 50 MHz, APB2 = 100 MHz
    RCC->CFGR = RCC_CFGR_HPRE_DIV1 | RCC_CFGR_PPRE1_DIV2 | RCC_CFGR_PPRE2_DIV1;
    // 4. Switch SYSCLK to PLL and confirm via SWS
    RCC->CFGR = (RCC->CFGR & ~RCC_CFGR_SW_Msk) | RCC_CFGR_SW_PLL;
    while ((RCC->CFGR & RCC_CFGR_SWS_Msk) != RCC_CFGR_SWS_PLL) { /* wait */ }
}
Important: PLLP is an encoded field, not the divider value. Across the F4 family (F401, F411, F405/407) the two bits map 0b00 → P=2, 0b01 → P=4, 0b10 → P=6, 0b11 → P=8. Other families use different fields and encodings (on G4 and L4, SYSCLK comes from the PLLR output instead), so always check the divider mapping in your target reference manual. A wrong PLLP setting produces the wrong frequency but the PLL still locks, so the bug hides until you measure the output.
Flash latency per family: what changes
The flash latency tables differ across STM32 families. Here are the key reference manual sections to consult:
- STM32F401/411: RM0368 / RM0383. 0 WS up to 30 MHz, then roughly +1 WS every 30 MHz. The F401 tops out at 2 WS for 84 MHz and the F411 at 3 WS for 100 MHz (2.7–3.6 V).
- STM32F4 high-density (F429, F439): RM0090. Same 30 MHz step, max 5 WS.
- STM32G4: RM0440. In range 1 boost mode, 0 WS up to 34 MHz, +1 WS every 34 MHz, max 4 WS at 170 MHz. Also has CCM SRAM for zero-wait-state code execution.
- STM32L4/L4+: RM0351 (L4) and RM0432 (L4+). On L4 in range 1, 0 WS up to 16 MHz and 4 WS at 80 MHz; L4+ reaches 120 MHz with 5 WS. Voltage scaling (VOS) affects the max frequency per WS level.
- STM32U5: TrustZone-aware flash controller with 0 WS up to 32 MHz and 4 WS at 160 MHz in range 1; lower VOS ranges allow fewer MHz per wait-state.
- STM32H7: RM0433. The flash sits on the AXI bus and its wait-states depend on the AXI clock and the VOS range. Code placed in ITCM RAM runs without wait-states, but anything fetched from the flash bank itself still needs them.
Common failure modes
Hard fault immediately after clock switch
Almost always caused by insufficient flash wait-states. The CPU issues a fetch at 168 MHz, the flash delivers data at 30 MHz speed, and the bus returns garbage — typically seen as a HardFault in the first instruction after the SWS check passes. Fix: increase LATENCY before switching.
PLL never locks
Check the PLL VCO output range (typically 100–432 MHz for F4, 96–344 MHz for G4) and, on F4, that the VCO input f_in / M lands between 1 and 2 MHz. If N × f_in / M is outside the valid VCO range, the PLL may never assert RDY or may lock unreliably. Debug by reading back RCC->PLLCFGR and computing the VCO frequency manually.
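The manual computation is easy to automate. The sketch below decodes a PLLCFGR value read back from an F4 and checks it against the limits I would assume for that family: VCO input 1–2 MHz and VCO output 100–432 MHz. The function and enum names are hypothetical, and the limits should be confirmed against the datasheet for your part.

```c
#include <stdint.h>

typedef enum {
    PLL_OK = 0,
    PLL_ERR_BAD_M,    /* PLLM field below 2 */
    PLL_ERR_VCO_IN,   /* f_in / M outside 1..2 MHz */
    PLL_ERR_VCO_OUT   /* VCO outside 100..432 MHz */
} pll_check_t;

/* Decodes an F4 PLLCFGR word and validates the VCO.
 * On PLL_OK, *sysclk_hz receives the PLL P output frequency. */
static pll_check_t pll_check_f4(uint32_t pllcfgr, uint32_t f_in_hz,
                                uint32_t *sysclk_hz)
{
    uint32_t m = pllcfgr & 0x3Fu;                      /* PLLM[5:0]  */
    uint32_t n = (pllcfgr >> 6) & 0x1FFu;              /* PLLN[14:6] */
    uint32_t p = (((pllcfgr >> 16) & 0x3u) + 1u) * 2u; /* 0b00..0b11 -> 2,4,6,8 */

    if (m < 2u) {
        return PLL_ERR_BAD_M;
    }
    uint32_t vco_in = f_in_hz / m;
    if (vco_in < 1000000u || vco_in > 2000000u) {
        return PLL_ERR_VCO_IN;
    }
    uint64_t vco = ((uint64_t)f_in_hz * n) / m;
    if (vco < 100000000u || vco > 432000000u) {
        return PLL_ERR_VCO_OUT;
    }
    *sysclk_hz = (uint32_t)(vco / p);
    return PLL_OK;
}
```

Feeding it 0x07415408 with an 8 MHz input reports 84 MHz. The same word with the PLLP field left at 0b00 (0x07405408) reports 168 MHz, which is the kind of mistake that locks fine and only shows up when you look at the numbers.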
APB1 peripheral at wrong frequency
Timers clocked from APB1 (TIM2–TIM7 on F4) count at double the APB1 bus frequency whenever the APB1 prescaler is not 1. With HCLK = 84 MHz and PPRE1 = 2, the bus runs at 42 MHz but the timers tick at 84 MHz. A common CubeMX mismatch: the user computes the TIM prescaler from the 42 MHz bus clock, so every timer period comes out half as long as intended. Always compute APB1_timer_clock = HCLK / ppre1 * (ppre1 == 1 ? 1 : 2).
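The rule fits in one small function. This sketch takes the real APB prescaler value rather than the register encoding, and it assumes the default timer-clock behaviour (on parts that have a TIMPRE bit, that bit left at its reset value). The function name is hypothetical.

```c
#include <stdint.h>

/* Timer kernel clock for a timer on an APB bus.
 * apb_div is the real prescaler value: 1, 2, 4, 8 or 16. */
static uint32_t apb_timer_clock_hz(uint32_t hclk_hz, uint32_t apb_div)
{
    uint32_t pclk_hz = hclk_hz / apb_div;
    /* Timers run at PCLK when the prescaler is 1, otherwise at 2 x PCLK. */
    return (apb_div == 1u) ? pclk_hz : 2u * pclk_hz;
}
```

At HCLK = 84 MHz the timer clock is 84 MHz for a prescaler of 1 or 2, and only drops to 42 MHz at a prescaler of 4.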
Practical checklist
- ☐ HSE ready bit polled before programming PLLCFGR.
- ☐ PLL disabled before writing PLLCFGR (on families that require it).
- ☐ Flash latency set to the correct WS count for the target frequency before clock switch.
- ☐ PLL VCO frequency within valid range (N·f_in / M).
- ☐ SWS verified after switching — not just SW written.
- ☐ APB1/APB2 prescalers respect max bus frequency (42 MHz APB1, 84 MHz APB2 on F4).
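- ☐ HSI calibration and trim values (RCC->CR on F4, RCC->ICSCR on L4/G4) checked if using HSI as PLL source.
- ☐ Over-drive, boost mode, or the required voltage-scaling range (where the part has one, e.g. F42x/F43x, G4, L4+, U5, H7) enabled before exceeding the standard voltage range frequency.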
- ☐ PLLP divider mapping confirmed against the reference manual (P=2 vs P=4).
How I would approach this on a client project
On a production firmware project, I never embed a single hardcoded SystemClock_Config(). Instead I write a clock configuration structure that carries the PLL dividers, flash wait-states, and bus prescalers as compile-time constants:
typedef struct {
    uint32_t pll_m;          // real divider values, not register encodings
    uint32_t pll_n;
    uint32_t pll_p;          // 2, 4, 6 or 8; rcc_apply() must encode it
    uint32_t pll_q;
    uint32_t flash_latency;  // FLASH_ACR_LATENCY_xWS
    uint32_t hpre, ppre1, ppre2;  // RCC_CFGR_* masks (too wide for uint8_t)
} rcc_config_t;
static const rcc_config_t RCC_CFG_84MHZ = {
    .pll_m = 8, .pll_n = 336, .pll_p = 4, .pll_q = 7,
    .flash_latency = FLASH_ACR_LATENCY_2WS,  // 60–90 MHz band at 2.7–3.6 V
    .hpre = RCC_CFGR_HPRE_DIV1,
    .ppre1 = RCC_CFGR_PPRE1_DIV2,
    .ppre2 = RCC_CFGR_PPRE2_DIV1,
};
int rcc_apply(const rcc_config_t *cfg); // returns 0 on success
This structure lives in a dedicated rcc.c module with its own unit tests (checked against the reference manual table). When the client swaps the crystal or changes the target frequency, they edit one header, recompile, and verify with a logic analyser on MCO — no CubeMX regen, no copy-paste mistakes.
Sources and further reading
- STM32F401 Reference Manual (RM0368) — Chapter 6: Reset and Clock Control (RCC).
- STM32F411 Reference Manual (RM0383) — RCC register map and PLL configuration examples.
- STM32G4 Reference Manual (RM0440) — Section 7: RCC with HSI16 and PLL tables.
- STM32L4 Reference Manual (RM0351) — Section 3: RCC with voltage scaling and flash WS tables.
- ARM Cortex-M4 Generic User Guide — system control block register descriptions.
- ST Application Note AN5027 — Using HSI16 clocks to improve EMC on STM32G0 series.
- ST Application Note AN2867 — Oscillator design guide (HSE crystal layout and startup time).
