Smaller and local models have a specific failure mode: they forget instructions. You write careful rules in the system prompt. Ten messages in, the model ignores them. They’ve drifted out of its effective attention range.
Say you’ve told the model: “When storing notes about people, always put them under people/.” Early in the conversation it follows this. Then it doesn’t. You correct it. It follows it again. Then it doesn’t. The instruction is present in the context, but the model has stopped acting on it.
What if we injected the instructions exactly when the model needs them?
Guidance injection
The insight behind guidance injection is that instructions don’t need to be remembered if they’re delivered at the right moment.
Instead of embedding rules in the system prompt and hoping they stick, you store them as notes attached to specific tools. When the model calls a gated tool, the system intercepts and checks whether applicable guidance exists. If it does, the tool doesn’t run. The call returns a structured result:
{
"guidanceRequired": true,
"family": "memory",
"revision": "a3f9...",
"guidance": "Store notes about people under people/.",
"message": "Guidance for memory must be acknowledged before this operation."
}
The model reads the guidance, then retries the same call with guidanceAck set to the returned revision. Once the ack matches, the tool runs.
{
"tool": "memory_create",
"input": {
"path": "people/alice.md",
"content": "...",
"guidanceAck": "a3f9..."
}
}
The instructions arrive in the model’s immediate context, attached to the exact operation they govern. Not buried somewhere above the scroll from an hour ago.
One thing worth noting: the ack is a revision hash that travels in the tool call arguments, so it sits in the conversation history. A model that remembers it from an earlier call could theoretically reuse it later without re-reading the guidance. I expect this to be self-correcting in practice: the models most prone to forgetting instructions are also prone to forgetting the ack, so the handshake gets triggered again naturally. A model capable enough to track and reuse the hash across a long conversation is probably capable enough to follow the guidance without needing the gate.
What this changes
The system prompt can stay small. Instead of cramming every rule and special case into it upfront, guidance lives close to the tools it governs and only appears in context when those tools are actually used. A long conversation that never touches the memory tools never loads the memory guidance.
This matters most with smaller and local models, where context is limited and attention is less reliable. A large model can hold a dense system prompt and mostly follow it. A smaller model does better with a lean prompt and instructions that arrive exactly when they’re needed.