How I Run Code Reviews for Firmware Teams (And Why Most Reviews Waste Everyone's Time)

I spent three years on a project where every pull request went through exactly two reviewers, a mandatory 24-hour waiting period, and a checklist with forty-seven items. The code was still full of bugs. The safety-critical module had a race condition that survived fourteen reviews. The junior developer spent more time waiting for approvals than writing code. And the team hated the process so much that people started batch-committing at the end of sprints just to avoid the overhead.

That experience taught me something uncomfortable: most code review processes are designed to make managers feel safe, not to actually improve code. The rituals — mandatory reviewers, minimum comment counts, approval gates — create the illusion of quality while consuming enormous amounts of engineering time.

Over the following years, I stripped the process down, rebuilt it from first principles, and tested it across four different firmware teams. Here is what survived.

The First Principle: Reviews Catch Different Bugs Than Testing

The single biggest mistake I see in firmware teams is treating code review as a substitute for testing, or testing as a substitute for review. They catch different things:

What testing finds	What review finds
Logic errors, timing violations, edge cases in execution	Misunderstood APIs, wrong assumptions about hardware behaviour
Incorrect register values and bitfield masks	Register configuration that works but is not what the datasheet recommends for the use case
DMA descriptor chain corruption under load	DMA buffer layout that will be a maintenance nightmare in six months
Stack overflow from deep call chains	Missing `volatile` qualifiers on shared variables accessed by ISR and main loop

I have seen teams invest in elaborate hardware-in-the-loop test rigs while their code review consisted of a single "LGTM" comment from the most junior person on the team. And I have seen teams with mandatory two-person reviews on every commit to an internal-only test harness, while the bootloader that goes on ten thousand devices got a rubber stamp because "it was already reviewed in the prototype phase."

The fix is simple: match the review depth to the risk profile of the code, and never confuse process completion with quality assurance.

The Three-Tier Review Model

I use three tiers of review, applied based on the code's criticality:

Tier	When	Who	Depth
Tier 1 — Glance	Internal tools, test harnesses, build scripts, documentation	One person, any seniority (can be the author self-merging after 2 hours)	Check for obvious mistakes and naming. No style comments.
Tier 2 — Standard	Application logic, protocol implementations, new peripheral drivers	One person with domain knowledge, within 24 hours	Correctness, API design, concurrency assumptions, register-level sanity. Style only if it affects readability.
Tier 3 — Deep	Bootloaders, safety-critical modules, power management sequences, watchdog configurations	Two people: one domain expert, one architect/senior who did not work on the module	Line-by-line if needed. Walk through every state transition, every error path, every timing assumption. Document the review outcome in the commit message.

The key insight: most firmware code is Tier 2. Treating all code as Tier 3 burns out reviewers and slows the team to a crawl. Treating bootloader code as Tier 1 is how devices brick in the field.

I communicate this tiers model explicitly at the start of every engagement — in the contract or the project kick-off. The client pays for Tier 3 reviews on critical modules and Tier 2 on the rest. Everyone knows where the line is drawn, and nobody argues about "why this change needs two reviewers."

The Anti-Patterns I Ban on Day One

These four patterns destroy the value of code reviews in firmware teams. I call them out in the first team meeting:

1. Style comments disguised as correctness concerns.
"Please rename i2c_transfer() to i2c_transfer_data() because it also transfers the address byte." — This is a style preference dressed up as pedantry. If the function works, is documented, and has a clear contract, let it go. Style debates in reviews are the single biggest time sink, and they disproportionately frustrate junior developers. Use a formatter (clang-format), set it in CI, and move on.

2. The "I would have done it differently" comment.
If the approach is correct, performant enough, and safe, the alternative implementation preference is not actionable. The reviewer should ask "does this approach have a correctness issue?" not "would I have written it this way?"

3. Mandatory N + 1 reviewers.
The most common failure mode: a project requires two reviewers, so every PR waits for two people. Meanwhile, the most knowledgeable reviewer has already approved, and the second reviewer is a formality who skim-reads and clicks Approve. The review adds zero value but doubles the latency. The solution: one reviewer with the right domain knowledge is worth more than two random reviewers.

4. Reviewing generated or configuration code.
STM32CubeMX-generated initialisation code, linker scripts pasted from the SDK example, CMSIS config headers — why are humans reviewing these? If the CubeMX output compiles and the hardware boots, the review is theatre. Focus reviewer time on the logic the engineer actually wrote.

The Embedded-Specific Review Checklist

Beyond the general code review principles, firmware code has failure modes that pure software reviews never encounter. These are the specific things I check on every Tier 3 review:

Volatile semantics. Is every memory-mapped register and every variable shared between ISR and main loop qualified with volatile? Is the qualifier correctly applied to the pointer target, not the pointer itself? (I have lost count of how many volatile uint32_t* declarations I have seen that should have been uint32_t volatile*.)
Interrupt priority and nesting. Does the code assume a certain interrupt priority scheme? Is there a lock (disable interrupts) that could cause a hard fault if the interrupt controller is configured differently on a different chip variant? Does the critical section mask only the right interrupts, not all of them?
Watchdog feeding. Is the watchdog fed from the main loop or from a timer ISR? If fed from the main loop, is there a path where the loop stalls (HAL_Delay with a stuck peripheral, an I2C bus lock) and the watchdog does not fire?
Error path testing. What happens when HAL_I2C_Master_Transmit() returns HAL_ERROR? Most firmware code assumes success. The review should verify that every API call has at least a log statement or a fallback on failure — and if it does not, the justification must be explicit ("this is a development-only path", "the state machine retries from the top").
Clock tree assumptions. Does the code hardcode a delay or a timeout based on a specific clock frequency? If the project moves from HSI to HSE or changes the PLL configuration, those delays break silently.
Endianness and alignment. Firmware often communicates over byte-oriented protocols (I2C, SPI, UART). Are multi-byte fields packed correctly? Is there a potential alignment fault when casting a byte buffer to a struct pointer?

The Infrastructure That Makes Reviews Fast

No amount of process optimisation will fix a review workflow where the reviewer has to check out the branch, compile it, flash it, and run it manually to understand the change. The infrastructure layer is where most teams should invest before tweaking their review policy:

CI builds every PR. Trivial — but you would be surprised how many firmware projects run make only on the developer's machine. CI should compile, run static analysis, and fail fast on warnings.
A diff that strips whitespace and formatting changes. GitHub's ?w=1 is your best friend. If the PR is 80% reformatting, that is a CI concern, not a review concern.
Hardware-in-the-loop smoke tests on critical PRs. Not full regression — just "does the UART still print?" and "does the LED sequence still run?" Enough to catch the obvious breakage before a human looks at the code.
A shared logic analyzer trace or oscilloscope screenshot for timing-critical changes. A picture of the SPI transaction is worth a hundred lines of review commentary.

What Has Worked for Me

Four things made the biggest difference in the teams I have led:

Writing review expectations in the team charter. A one-page document that says: "Tier 2 reviews must start within 24 hours. Tier 3 reviews are scheduled as a meeting. Style comments are banned unless a formatter rule exists. Reviewers approve or request changes — no 'optional nitpick' comments left hanging for days."
Blocking time for reviews. I schedule two 45-minute review blocks per day, first thing in the morning and right after lunch. No meetings, no Slack, no email. If the team knows the reviewer has dedicated time, they submit smaller, more focused PRs — which makes the reviews faster, which makes people submit even smaller PRs. A virtuous cycle.
Reviewing the review process. Every quarter, I ask the team two questions: "What are we spending too much time on in reviews?" and "What bugs slipped through review that we wish had been caught?" The answers drive process changes. One team realised they spent 40% of review time on linker script changes that nobody really understood — so they wrote a linker script generator and reviewed the generator once, not the output every time.
Pair programming on critical modules instead of review. For the bootloader and the power management state machine, I sit with the developer and we write the code together. It takes the same amount of time as a Tier 3 review but produces better code and transfers more knowledge. Then the PR gets a Tier 1 glance just for the record.

Practical Checklist

Classify code into three tiers (Glance / Standard / Deep) and match review depth to risk
Ban style comments — use clang-format in CI instead
One domain-expert reviewer is worth more than two random reviewers
Do not review generated code (CubeMX, SDK examples, linker scripts from vendor)
Check volatile semantics, interrupt priority, watchdog feeding on every firmware review
Verify error paths — most embedded code assumes success
Add CI build + static analysis before expecting humans to review
Schedule dedicated review time; block Slack and email during it
Review the review process quarterly with the team

Sources

Google's Code Review Developer Guide — the "look for correctness, not style" principle
Michael Fagan, "Design and Code Inspections to Reduce Errors in Program Development" (1976) — the original paper on formal inspections, still relevant for Tier 3
Personal experience across 8 firmware teams between 2018 and 2026