2026-06-22 · Davide Carrese

STM32 HardFault debugging — reading Cortex-M fault registers without a debugger

STM32 · Cortex-M · Fault · Debugging

Nothing worse than a deployed board showing no signs of life, a watchdog barking every few seconds, and no JTAG/SWD in sight. The HardFault_Handler is your last line of defence — here is how to make it talk.

HardFaults are the embedded engineer's equivalent of a segfault on Linux. On Cortex-M, the CPU enters the HardFault exception vector when it encounters an unrecoverable error condition — or when the fault escalates from a configurable exception (MemManage, BusFault, UsageFault) that was not enabled or whose handler was not installed. The result is the same: the program counter jumps to HardFault_Handler and unless you capture the context, you have no idea what went wrong.

In production, you rarely have a debugger attached. Serial-over-UART, a last-resort flash dump, or even a blinking-LED pattern may be your only output channels. This article walks through the Cortex-M fault architecture, the registers you must read, and a battle-tested HardFault_Handler implementation that decodes the failure cause without GDB.

Anatomy of a HardFault on Cortex-M

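The Cortex-M exception model (Cortex-M3 and later) distinguishes four fault exception types. Only HardFault (exception number 3, priority -1) is always enabled. The other three are optional and disabled out of reset: MemManage (exception 4), BusFault (exception 5), and UsageFault (exception 6).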

When any of these occurs and the corresponding handler is not configured, or the fault occurred in the vector fetch itself, the processor escalates to the fixed-priority HardFault handler. Understanding which underlying fault triggered the escalation is the first step.

The registers you need to read

All the diagnostic data lives in memory-mapped registers of the System Control Block (SCB), at fixed addresses. The status bits are sticky (they stay set until software writes 1 to clear them), so read them early in HardFault_Handler, before any code that could raise a second fault and muddy the picture.

Configurable Fault Status Register — CFSR (0xE000ED28)

The 32-bit CFSR is the most important register. It packs three sub-registers:

Bits    Name               Width  Key flags
31:16   UsageFault (UFSR)  16     DIVBYZERO, UNALIGNED, NOCP, INVPC, INVSTATE, UNDEFINSTR
15:8    BusFault (BFSR)    8      BFARVALID, STKERR, UNSTKERR, IMPRECISERR, PRECISERR, IBUSERR
7:0     MemManage (MMFSR)  8      MMARVALID, MSTKERR, MUNSTKERR, DACCVIOL, IACCVIOL

On Cortex-M33/M55 (Armv8-M Mainline), the sub-register layout is the same but additional flags exist, for example UFSR.STKOF, which is set when a stack pointer drops below its limit register (MSPLIM or PSPLIM).
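As a minimal sketch of how the CFSR can be taken apart in software, the helpers below split the register into its three sub-registers and list the names of the flags that are set. The bit positions follow the Armv7-M layout (MMFSR in bits 7:0, BFSR in 15:8, UFSR in 31:16); the function and type names are illustrative and not part of CMSIS.

```c
#include <stddef.h>
#include <stdint.h>

/* Sub-register extraction (helper names are hypothetical). */
static inline uint8_t  cfsr_mmfsr(uint32_t cfsr) { return (uint8_t)(cfsr & 0xFFu); }
static inline uint8_t  cfsr_bfsr(uint32_t cfsr)  { return (uint8_t)((cfsr >> 8) & 0xFFu); }
static inline uint16_t cfsr_ufsr(uint32_t cfsr)  { return (uint16_t)(cfsr >> 16); }

typedef struct {
    uint8_t     bit;   /* bit position within the 32-bit CFSR */
    const char *name;
} CfsrFlag;

/* Flags listed in ascending bit order. */
static const CfsrFlag cfsr_flags[] = {
    { 0, "IACCVIOL"},  { 1, "DACCVIOL"},   { 3, "MUNSTKERR"},
    { 4, "MSTKERR"},   { 7, "MMARVALID"},
    { 8, "IBUSERR"},   { 9, "PRECISERR"},  {10, "IMPRECISERR"},
    {11, "UNSTKERR"},  {12, "STKERR"},     {15, "BFARVALID"},
    {16, "UNDEFINSTR"},{17, "INVSTATE"},   {18, "INVPC"},
    {19, "NOCP"},      {24, "UNALIGNED"},  {25, "DIVBYZERO"},
};

/* Writes up to max flag names into out[]; returns how many were written. */
static size_t cfsr_decode(uint32_t cfsr, const char *out[], size_t max)
{
    size_t n = 0;
    for (size_t i = 0; i < sizeof cfsr_flags / sizeof cfsr_flags[0]; i++) {
        if (((cfsr >> cfsr_flags[i].bit) & 1u) != 0u && n < max) {
            out[n++] = cfsr_flags[i].name;
        }
    }
    return n;
}
```

Because the decoder only touches a plain integer, it can run on the target inside the fault handler or on a PC against a value copied from a log.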

HardFault Status Register — HFSR (0xE000ED2C)

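This tells you why the HardFault was taken. The two critical bits are FORCED (bit 30), set when a configurable fault escalated to HardFault because its handler was disabled or could not run at the current priority, and VECTTBL (bit 1), set when a vector table read failed during exception processing.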
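A small classifier makes the HFSR reading explicit. This is a sketch under the Armv7-M bit assignments (VECTTBL bit 1, FORCED bit 30, DEBUGEVT bit 31); the enum and function names are made up for illustration.

```c
#include <stdint.h>

#define HFSR_VECTTBL  (UINT32_C(1) << 1)   /* vector table read failed      */
#define HFSR_FORCED   (UINT32_C(1) << 30)  /* escalated configurable fault  */
#define HFSR_DEBUGEVT (UINT32_C(1) << 31)  /* debug event with debug off    */

typedef enum {
    HF_CAUSE_UNKNOWN,
    HF_CAUSE_VECTOR_READ,
    HF_CAUSE_ESCALATED,
    HF_CAUSE_DEBUG_EVENT
} HardFaultCause;

/* Maps a raw HFSR value to the reason the HardFault was taken. */
static HardFaultCause hfsr_classify(uint32_t hfsr)
{
    if (hfsr & HFSR_VECTTBL)  { return HF_CAUSE_VECTOR_READ; }
    if (hfsr & HFSR_FORCED)   { return HF_CAUSE_ESCALATED; }
    if (hfsr & HFSR_DEBUGEVT) { return HF_CAUSE_DEBUG_EVENT; }
    return HF_CAUSE_UNKNOWN;
}
```

When the result is HF_CAUSE_ESCALATED, the real cause is in CFSR, so that is the next register to decode.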

Fault Address Registers

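MMFAR (0xE000ED34) and BFAR (0xE000ED38) hold the address of the access that caused a MemManage fault or a precise BusFault. Each is meaningful only while its valid flag in CFSR (MMARVALID or BFARVALID) is set. These are invaluable: they tell you which address was being accessed, which often points directly to a NULL pointer dereference, a stale pointer, or a peripheral register accessed before its clock was enabled.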

Recovering the stacked context

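When the processor takes a fault exception, it pushes eight registers onto the current stack (MSP or PSP depending on the active stack pointer before the fault): R0, R1, R2, R3, R12, LR (the interrupted code's link register), PC (the return address, which for a precise fault is the faulting instruction), and xPSR. The EXC_RETURN value is not stacked; the core places it in the live LR on handler entry.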

The stacked PC is the single most useful value: for a precise fault it is the address of the instruction that faulted. With the ELF or map file, or an addr2line lookup, you get the exact function and source line. The stacked LR usually holds the return address into the caller, unless the faulting function has already reused LR.

To extract these values, you need the value of the stack pointer at the moment of exception entry. That takes a few lines of assembly: a naked handler picks MSP or PSP from EXC_RETURN and hands the pointer to a C function:

#include <stdint.h>
#include <stdio.h>

__attribute__((naked))
void HardFault_Handler(void) {
    __asm volatile(
        " movs  r0, #4      \n"
        " mov   r1, lr      \n"   /* EXC_RETURN tells us which stack */
        " tst   r0, r1      \n"   /* test EXC_RETURN bit 2 */
        " ite   eq          \n"   /* exactly two conditional instructions follow */
        " mrseq r0, msp     \n"   /* if bit 2 clear → MSP */
        " mrsne r0, psp     \n"   /* if bit 2 set   → PSP */
        " bl    hard_fault_capture\n"   /* r0 = pointer to the stacked frame */
        " b     .\n"               /* infinite loop; or call NVIC_SystemReset */
    );
}

typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, psr;
} FaultStack;

void hard_fault_capture(uint32_t *sp) {
    FaultStack *frame = (FaultStack *)sp;

    /* Read fault registers */
    uint32_t cfsr = *(volatile uint32_t *)0xE000ED28;
    uint32_t hfsr = *(volatile uint32_t *)0xE000ED2C;
    uint32_t mmfar = *(volatile uint32_t *)0xE000ED34;
    uint32_t bfar  = *(volatile uint32_t *)0xE000ED38;

    /* Print over UART */
    printf("=== HardFault ===\n");
    printf("CFSR:  0x%08lX\n", (unsigned long)cfsr);
    printf("HFSR:  0x%08lX  FORCED=%lu  VECTTBL=%lu\n",
           (unsigned long)hfsr,
           (unsigned long)((hfsr >> 30) & 1),
           (unsigned long)((hfsr >> 1)  & 1));
    printf("Fault PC:  0x%08lX  (stacked)\n", (unsigned long)frame->pc);
    printf("Fault LR:  0x%08lX  (caller)\n",  (unsigned long)frame->lr);
    if (cfsr & (1 << 7))  /* MMFSR.MMARVALID */
        printf("MMFAR: 0x%08lX\n", (unsigned long)mmfar);
    if (cfsr & (1 << 15)) /* BFSR.BFARVALID  */
        printf("BFAR:  0x%08lX\n", (unsigned long)bfar);
}
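For reference, the decision the assembly shim makes can be written out in C. The sketch below decodes the two EXC_RETURN bits that matter for frame recovery: bit 2 selects the stack that holds the frame, and bit 4 says whether the frame is the basic 8-word one or the extended 26-word one used when the FPU context is stacked. The helper names are hypothetical; on cores without an FPU, bit 4 always reads as 1.

```c
#include <stdbool.h>
#include <stdint.h>

#define EXC_RETURN_SPSEL (UINT32_C(1) << 2)  /* 1 = frame is on PSP, 0 = MSP   */
#define EXC_RETURN_FTYPE (UINT32_C(1) << 4)  /* 1 = basic frame, 0 = extended  */

/* True when the stacked frame lives on the process stack. */
static bool exc_return_used_psp(uint32_t exc_return)
{
    return (exc_return & EXC_RETURN_SPSEL) != 0u;
}

/* Size of the stacked frame in 32-bit words (8 basic, 26 with FPU state). */
static unsigned exc_return_frame_words(uint32_t exc_return)
{
    return (exc_return & EXC_RETURN_FTYPE) != 0u ? 8u : 26u;
}
```

Either way, the first eight words are R0 to xPSR, so the FaultStack struct above remains valid for an extended frame.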

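Beware of imprecise BusFaults (BFSR.IMPRECISERR set). An imprecise BusFault means the faulting write was buffered and the bus error was reported only after the core had moved on, so the stacked PC points somewhere after the offending store rather than at it, and BFAR is not populated. Cortex-M7 with its store buffer and L1 cache is particularly prone to this, but the write buffer on Cortex-M3/M4 can produce it too. To narrow it down while debugging, place DSB barriers after suspect stores so the fault is raised close to its cause, or on Cortex-M3/M4 set ACTLR.DISDEFWBUF to disable write buffering, at a performance cost. BFHFNMIGN in SCB->CCR does not help here: it only lets handlers running at priority -1 or -2 ignore data bus faults.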

Practical example: NULL pointer dereference on STM32F401

Consider a typical contractor scenario: an STM32F401 project with FreeRTOS, a custom sensor driver, and a shared data structure allocated via pvPortMalloc. The device crashes after exactly 47 minutes of uptime.

You add the HardFault_Handler above, deploy the firmware, and the next crash produces:

=== HardFault ===
CFSR:  0x00008200
HFSR:  0x40000000  FORCED=1  VECTTBL=0
Fault PC:  0x08003A4C
Fault LR:  0x08004B10

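Decoding CFSR 0x00008200: bit 15 (BFARVALID) and bit 9 (PRECISERR) are set, both in BFSR. This is a precise BusFault with a valid fault address, and HFSR.FORCED=1 confirms it escalated to HardFault because the BusFault handler was not enabled.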

Because BFARVALID is set, the handler above also prints the fault address:

BFAR: 0x00000004

Now look up the fault PC (0x08003A4C):

arm-none-eabi-addr2line -e build/project.elf 0x08003A4C
# Output: sensor_driver.c:142

Line 142 of sensor_driver.c reads:

status = sensor_ctx->config->sample_rate;

The sensor_ctx pointer was valid, but config was NULL. BFAR = 0x00000004 means the CPU tried to load from offset 4 of the NULL page — i.e., ((ConfigStruct *)0)->sample_rate. The root cause: a config pointer that was never initialised before the sensor was used in a particular code path triggered by a rare sensor-read timeout after 47 minutes.

Without the HardFault capture, you would be staring at a dead board. With it, the fix is a one-line initialisation guard.
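A guard of that kind might look like the sketch below. The struct layouts, error codes, and function name are hypothetical stand-ins (the real driver is not shown here); the only detail taken from the crash is that sample_rate sits at offset 4 of the config structure, matching BFAR = 0x00000004.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layouts: sample_rate lands at offset 4. */
typedef struct {
    uint32_t flags;
    uint32_t sample_rate;
} ConfigStruct;

typedef struct {
    ConfigStruct *config;   /* NULL until the sensor has been configured */
} SensorCtx;

#define SENSOR_OK                   0
#define SENSOR_ERR_NOT_CONFIGURED (-1)

/* Reads the sample rate, refusing to dereference a missing config. */
static int sensor_get_sample_rate(const SensorCtx *ctx, uint32_t *out)
{
    if (ctx == NULL || ctx->config == NULL || out == NULL) {
        return SENSOR_ERR_NOT_CONFIGURED;
    }
    *out = ctx->config->sample_rate;
    return SENSOR_OK;
}
```

Returning an error code lets the timeout path report the problem instead of faulting; the proper fix is still to make sure config is initialised on every path.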

Practical checklist for client projects

When I start a new STM32 contract or join an existing project, the first thing I do is add a HardFault capture to the startup code. Here is my standard checklist:

  1. Add the naked HardFault_Handler in main.c or a dedicated fault_handler.c. Do not rely on the weak default — it just loops.
  2. Enable Configurable Faults in SystemInit() or early in main(). Write to SCB->SHCSR to enable UsageFault, BusFault, and MemManage handlers. Without this, they escalate silently.
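  3. Enable the DIV_0_TRP and UNALIGN_TRP traps via SCB->CCR. Without DIV_0_TRP, an integer division by zero silently returns 0; catching it at the faulting instruction is far better than chasing corrupted results days later. Note that UNALIGN_TRP traps every unaligned access, including ones the core would otherwise handle, so check packed structures and library code first.
  4. Route output over a diagnostic UART (TX only, 115200 8N1) using polled, blocking writes inside the fault handler. Interrupt-driven output cannot make progress while HardFault (priority -1) is active. Even 64 bytes of output is enough to dump the CFSR/HFSR/PC.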
  5. Cross-reference the PC against the build ELF. I keep a make fault-lookup target that calls arm-none-eabi-addr2line against the latest build artifact.
  6. For production firmware, add a flash log. Write the CFSR, HFSR, BFAR/MMFAR, and the stacked PC to a reserved sector in flash. On the next boot, check for the fault signature and report it over UART or BLE before continuing normal operation.
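Items 2 and 3 of the checklist come down to setting five bits. The sketch below takes the register addresses as parameters so the logic can be exercised off-target; on a real device the arguments would be the SHCSR and CCR addresses (0xE000ED24 and 0xE000ED14), or &SCB->SHCSR and &SCB->CCR with CMSIS. The function name is an assumption.

```c
#include <stdint.h>

#define SHCSR_MEMFAULTENA (UINT32_C(1) << 16)
#define SHCSR_BUSFAULTENA (UINT32_C(1) << 17)
#define SHCSR_USGFAULTENA (UINT32_C(1) << 18)
#define CCR_UNALIGN_TRP   (UINT32_C(1) << 3)
#define CCR_DIV_0_TRP     (UINT32_C(1) << 4)

/* Enables the three configurable fault handlers and both UsageFault traps. */
static void fault_traps_enable(volatile uint32_t *shcsr, volatile uint32_t *ccr)
{
    *shcsr |= SHCSR_MEMFAULTENA | SHCSR_BUSFAULTENA | SHCSR_USGFAULTENA;
    *ccr   |= CCR_DIV_0_TRP | CCR_UNALIGN_TRP;
}

/* On target (addresses from the SCB register map):
 *   fault_traps_enable((volatile uint32_t *)0xE000ED24,
 *                      (volatile uint32_t *)0xE000ED14);
 */
```

Once these handlers are enabled, faults arrive at MemManage_Handler, BusFault_Handler, and UsageFault_Handler rather than HardFault_Handler, so those vectors need to reach the same capture code.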

How I would approach this on a client project

When I join a project with an intermittent HardFault, I do not modify the application logic. Instead I:

  1. Add a fault_handler.c file with the naked assembly wrapper and capture function shown above.
  2. Enable all three configurable fault handlers in SystemInit_Ext (a hook called early).
  3. Route the UART output to the debug serial port: typically USART2 on STM32F4 Nucleo boards, or USART1 on PA9/PA10.
  4. Deploy the firmware, reproduce the fault, capture the output.
  5. Look up the PC with addr2line and fix the root cause, not the symptom. If the BFAR points to an address near zero, the first suspect is an uninitialised pointer in the function reported by addr2line.
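For step 3, output from inside a fault handler has to be polled, since no interrupt can preempt HardFault. The following is a minimal sketch assuming the STM32F4-style USART layout (status register SR at offset 0x00, data register DR at offset 0x04, TXE in bit 7, TC in bit 6) and a UART that was already initialised at boot. The function names are hypothetical, and register pointers are passed in so the code can be tried off-target.

```c
#include <stdint.h>

#define USART_SR_TC  (UINT32_C(1) << 6)  /* transmission complete       */
#define USART_SR_TXE (UINT32_C(1) << 7)  /* transmit data register empty */

/* Blocks until the data register is free, then writes one byte. */
static void fault_uart_putc(volatile uint32_t *sr, volatile uint32_t *dr, char c)
{
    while ((*sr & USART_SR_TXE) == 0u) {
        /* busy-wait: no interrupts are available here */
    }
    *dr = (uint8_t)c;
}

/* Sends a string and waits until the last byte has left the shifter. */
static void fault_uart_puts(volatile uint32_t *sr, volatile uint32_t *dr,
                            const char *s)
{
    while (*s != '\0') {
        fault_uart_putc(sr, dr, *s++);
    }
    while ((*sr & USART_SR_TC) == 0u) {
        /* let the final byte finish before any reset */
    }
}

/* On an STM32F4 with USART2 (base 0x40004400):
 *   fault_uart_puts((volatile uint32_t *)0x40004400,
 *                   (volatile uint32_t *)0x40004404, "HardFault\r\n");
 */
```

Waiting for TC at the end matters if the handler goes on to reset the device, because a reset issued too early can cut off the last character. Other STM32 families use different register names and offsets (ISR/TDR), so the constants would need adjusting there.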

This approach has saved me days of debugging across at least five separate client projects. The one-time cost of adding the handler (about 30 minutes including UART initialisation) pays for itself the first time a board crashes after hours of operation.
