Two models are cheaper than one

Gone are the days when you could hammer away at your code with the largest frontier model and still keep costs reasonable. The obvious alternatives come with trade-offs: switching to a cheaper model or running one locally means accepting a quality hit you might not want to take across the board. So the real question is how to keep using the best model where it matters without paying for it everywhere.

The architect and the engineer

My current setup splits the work in two. I use a frontier model as an “architect”. It holds the big picture, decides on architectural questions, and supervises the output. A smaller, cheaper model acts as the “engineer”. The engineer does the actual implementation work. It writes the code, but only to match the handoff from the architect.

The architect never writes a line of code. It designs, asks me questions, reviews the engineer’s output, and merges the result. The engineer never has to care about the big picture. The bulk of token consumption happens on the cheaper model. The frontier model is involved, but mostly at the high level. Costs drop significantly.
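To make the cost argument concrete, here is a back-of-the-envelope sketch. Every number in it is a made-up placeholder: the per-token prices, the token counts, and the 10x price gap are assumptions for illustration, not measurements from my setup or any provider's price list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    usd_per_million_tokens: float  # hypothetical blended price, not a real quote


def cost(model: Model, tokens: int) -> float:
    """Cost in USD of pushing `tokens` tokens through `model`."""
    return model.usd_per_million_tokens * tokens / 1_000_000


# Hypothetical models and prices (assumed 10x price gap).
frontier = Model("frontier", 15.0)
small = Model("small", 1.5)

# Hypothetical token budget for one topic: design is small, implementation is bulky.
design_tokens = 200_000
implementation_tokens = 2_000_000

# Option 1: the frontier model does everything.
single_model_cost = cost(frontier, design_tokens + implementation_tokens)

# Option 2: frontier model as architect, small model as engineer.
split_cost = cost(frontier, design_tokens) + cost(small, implementation_tokens)

print(f"single model: ${single_model_cost:.2f}")
print(f"architect + engineer: ${split_cost:.2f}")
```

With these placeholder numbers the split comes out at roughly a fifth of the single-model cost. The actual ratio depends entirely on your own prices and on how much of the token volume really lands on the cheaper model.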

There’s a side benefit too: less context rot. Agent contexts fill up fast when you’re deep in implementation: file contents, diffs, error messages, test output. It all accumulates, and the model starts losing track of the bigger picture. The architect never sees any of that. Its context stays clean: design questions, decisions, review comments. You can run long architect sessions without them degrading.

How it works in practice

In Cursor I keep two named threads open per topic: “A: [topic]” for the architect and “E: [topic]” for the engineer. I open the architect thread and work through a design. The result is a Linear issue (Cursor creates it via MCP) with a self-contained implementation plan posted as a comment. If the decision involves a significant architectural choice, an ADR goes into the repo too. That comment is the handoff: everything the engineer needs, with none of our design conversation.
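As a rough illustration of what “self-contained” means for the handoff comment, here is a sketch of one possible shape. The field names, the rendering format, and the example task are all hypothetical. Neither Cursor nor Linear prescribes this structure, and the real handoff is just free text in a comment.

```python
from dataclasses import dataclass, field


@dataclass
class Handoff:
    """Hypothetical shape of a handoff comment; all field names are assumptions."""
    issue_id: str
    goal: str
    steps: list[str]
    acceptance_criteria: list[str]
    out_of_scope: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the plan as plain text suitable for pasting into an issue comment."""
        lines = [f"Implementation plan for {self.issue_id}", "", f"Goal: {self.goal}", "", "Steps:"]
        lines += [f"{i}. {step}" for i, step in enumerate(self.steps, start=1)]
        lines += ["", "Done when:"]
        lines += [f"- {item}" for item in self.acceptance_criteria]
        if self.out_of_scope:
            lines += ["", "Out of scope:"]
            lines += [f"- {item}" for item in self.out_of_scope]
        return "\n".join(lines)


# Invented example content; only the issue key format mirrors the text above.
handoff = Handoff(
    issue_id="FOR-134",
    goal="Add rate limiting to the export endpoint",
    steps=["Add a limiter module", "Wire it into the export handler", "Add tests"],
    acceptance_criteria=["Existing tests pass", "Requests over the limit are rejected"],
    out_of_scope=["Changing the authentication flow"],
)
comment = handoff.render()
print(comment)
```

The point of the structure is what it leaves out: no design discussion, no rejected alternatives, only what the engineer needs in order to start.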

Then I switch to the engineer thread and type “Pickup FOR-134”. The agent reads the issue and its comments directly. It finds the handoff comment and starts implementing. When it’s done I go back to the architect: “Review”. The architect reads the diff and posts findings. Then back to the engineer: “Address review comments”. Repeat until it’s clean, then the architect merges.

The short commands work because the context is already there. The engineer knows which issue it’s on. The architect knows what was handed off. Nothing needs re-explaining.
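The back-and-forth above is a small loop, which can be sketched as follows. This is a simulation of the sequence of prompts I type, not an automation of Cursor: the function name and the `findings_per_review` input are hypothetical, and “Merge” is a placeholder for the architect’s final step rather than a literal command from my workflow.

```python
ARCHITECT = "A"
ENGINEER = "E"


def handoff_loop(issue_id: str, findings_per_review: list[int]) -> list[tuple[str, str]]:
    """Return the (thread, command) sequence for one issue.

    `findings_per_review` simulates how many findings each architect review
    produces; a review with zero findings ends the loop with a merge.
    """
    steps = [(ENGINEER, f"Pickup {issue_id}")]
    for findings in findings_per_review:
        steps.append((ARCHITECT, "Review"))
        if findings == 0:
            # Clean review: the architect merges (placeholder command).
            steps.append((ARCHITECT, "Merge"))
            return steps
        steps.append((ENGINEER, "Address review comments"))
    # Ran out of simulated reviews without a clean pass; nothing gets merged.
    return steps


# One review with two findings, then a clean second review.
for thread, command in handoff_loop("FOR-134", [2, 0]):
    print(f"{thread}: {command}")
```

Each command stays short because the thread it is typed into already carries the state: the engineer thread knows the issue, the architect thread knows the handoff.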

Taking the engineer further

Once you’ve framed the engineer as an interchangeable role, the natural next question is: does it have to run on your machine?

Cloud agents can take the engineer role directly. In Linear you assign the issue to Cursor. The agent picks it up, reads the handoff, works on a branch, and opens a PR when done. The architect reviews the PR on GitHub. If there are findings, you send them back through Linear: @Cursor with the review comments, and the same agent resumes where it left off.

This is where I’m heading next. Multiple architect threads open locally, one per active topic. All the engineers in the cloud, picking up issues and opening PRs.

You could go further and automate the architect too. But I don’t want to, at least not yet. The architect is where I still add real value: shaping what gets built, asking the right questions, catching things before they get implemented. Keeping that role gives me a meaningful amount of control over a system that is otherwise largely running on its own.
