The rewrite that wasn't

A few years ago I was on a team that decided to rewrite the monolith. You know the one. Every company has one. Ours was Ruby on Rails, of course it was, and the plan was to carve it into services — pick a language, doesn’t matter which, the point was that anything was better than This.

We did not rewrite the monolith. We spent eighteen months trying, shipped two services, discovered that the seams we’d drawn were in the wrong place, moved them, broke things, ran two systems in parallel for a quarter, and finally — quietly, without a meeting — gave up and started fixing the monolith instead.

The fix took six weeks. It was mostly: delete dead code, add indexes, split the one truly horrible model into three reasonable ones, and stop holding transactions open across HTTP calls. Performance doubled. Defect rate halved. The monolith remains, in production, doing fine.

I think about this often when I read architecture posts. The thing nobody puts in the post is the counterfactual — “we considered just deleting half of it instead, and that would have worked too, and been cheaper.” Because that’s not a story. “We fixed the existing system with sustained boring effort” is not a conference talk. “We rebuilt it in Rust” is.

The lesson I took, and which I now bore colleagues with on a regular basis, is that rewrites mostly fail not because the new technology is wrong, but because the team underestimates how much of the old system’s value is embedded in the bugs they’ve already fixed. You don’t see the bugs anymore. They’re invisible. They’re also load-bearing. The new system meets them all again, in a fresh order, and it isn’t fun.

If you’re about to rewrite a system: first, try fixing it. If you can’t articulate exactly why fixing it won’t work — not “it’s a mess,” but specifically what about it can’t be incrementally improved — you haven’t earned the rewrite yet.