← Journal

12 Feb 2026 · engineering

Predictable Beats Heroic

Two years spent making reliability uneventful — and the quiet things that cost.

The first incident bridge I ran at Sequifi was a good one, by the standards of the time. Seven people, three time zones, a customer with real money moving through a system that had stopped moving it. We found the cause around 3:40 in the morning, shipped the fix, watched the graphs come back. Someone said good work in the channel. I closed my laptop and felt the specific high you get from having been useful in a way the day rarely lets you be. Then I sat with that high and decided I did not trust it.

Here is the thing nobody says about the 3 a.m. war room: it is the most flattering room in engineering. Everyone in it is competent, focused, and visibly indispensable. The adrenaline does your motivation for you. And that is exactly the problem. A system that regularly assembles its best people in the dark to save it is not testing their character — it is billing them for a decision someone made months earlier and never wrote down. The heroism is real. The need for it is a defect.

So I spent two years trying to make reliability uneventful, which is a harder sell than it sounds, because nothing about it has a highlight reel. When we joined I inherited a team of about thirty carrying a product under more load than its scaffolding was built for. We crossed sixty engineers across six teams in two quarters, behind north of fifty million in ARR, and the temptation in that kind of growth is to staff the late-night bridge better. Hire people who are good in the dark. I went the other direction. I wanted fewer nights that required anyone to be good at all.

The work was unglamorous on purpose. We treated every incident as a question about design, not about the person holding the pager — what made this possible, what would have caught it earlier, what assumption was load-bearing and undocumented. We pushed ownership down so the team that wrote a service ran it, which sounds like a burden and lands as the opposite: you build differently when the pager is yours. We wrote the recovery steps, and then did the colder thing of deleting the ones a healthy system should never have needed. Customer-impacting incidents fell forty percent. Uptime held above 99.9. On-time delivery moved from roughly sixty percent to north of ninety-five — the same story told in daylight, because predictability is one discipline and it does not respect the boundary between the outage and the roadmap.

The AI tooling fit the same frame, not as a velocity story but as a variance one. Copilot, Cursor, Claude across the org bought us around twenty-five percent more throughput with nothing I could measure as lost quality. I cared less about the speed than about the floor it raised — fewer ways for a tired engineer at the edge of their knowledge to make the quiet mistake that becomes someone’s morning.

I want to be honest about what this traded away, because de-glorifying heroics is not free. The first cost is the invisible work itself. The engineer who refactors the brittle thing on a Tuesday so it never pages anyone produces nothing you can point at. No graph goes green, no thread says good work, no story for the standup. You have to build the recognition by hand — name the prevention out loud, promote the people who make the quiet stretches look easy — or the system quietly teaches everyone that the way to be seen is to let it break and then save it. That incentive is corrosive, and most orgs run on it without noticing.

The second cost is cultural, and it cut closer. Some of my strongest people loved the late-night bridge. Not the failure — the proof. It was where they felt sharpest, most needed, most themselves. When you take that away you are not just removing pain, you are removing a stage where certain people knew exactly who they were. A few felt the floor go quiet and read the quiet as being needed less. I don’t think I handled all of it well. What I learned to offer instead was harder, slower work — the architecture call, the system that holds at the next order of magnitude — where the same instinct pays off without anyone losing a night for it.

I still keep the recovery doc I signed off on, the one we eventually stopped needing. I sat in those rooms. I ran the bridges and meant the good work I typed at 3:40 in the morning. None of that is something I am trying to erase. I am only trying to stop charging my best people for it. Heroic is what you reach for when the design has already failed. Predictable is the harder thing to build and the quieter thing to be proud of — and two years in, I am certain it is the one worth choosing.