The OpenClaw Advisor Pattern: Cut Agent Costs 80% Without Losing Quality
Anthropic has officially announced its Advisor strategy, bringing it to the Claude Platform as a first-class deployment mode. The OpenClaw community was already there — the hot thread on r/openclaw this week ("Running agents on a cheap model + using Claude Code as an advisor") racked up hundreds of upvotes, with users reporting indicative savings in the 60-80% range. This post is the consolidated reference implementation for doing it with OpenClaw today.
The pattern in one diagram
The advisor pattern splits a single agent into two cooperating models. A cheap executor handles the bulk of the work. A strong advisor is consulted only at decision points — when the executor hits a branch, a plan, or an unfamiliar failure mode.
User request
│
▼
┌─────────────┐ consult (rare) ┌──────────────┐
│ Executor │ ─────────────────▶ │ Advisor │
│ (cheap) │ │ (Opus 4.6) │
│ GLM-5.1 / │ ◀───────────────── │ │
│ Sonnet 4.6 │ plan / review │ │
└─────────────┘ └──────────────┘
│
▼
Final response

The executor owns the session. It drives the tool loop, handles the user-visible back-and-forth, and only reaches up to the advisor when it needs a plan, a sanity check, or a hard decision it does not want to make alone.
Why it works
Most steps in an agent session are boring. Reading a file, running a grep, parsing a JSON response, formatting output — these do not need a frontier model. A well-tuned cheap model handles them at a tiny fraction of the cost and often at higher speed.
The steps that actually need intelligence are rare. Choosing which refactor strategy to take. Deciding whether a failing test is a bug or a test bug. Drafting a migration plan across five services. These decisions set the direction of everything that follows — and getting them wrong is expensive in executor tokens even if the model is cheap.
The analogy that keeps coming up on r/openclaw: a junior engineer with a senior mentor on Slack. The junior does the work. When they hit something they are not sure about, they paste a summary into Slack and wait for the senior's call. The senior is not pair-programming every keystroke — that would defeat the point of having a junior at all.
The pattern works because agent sessions have a heavy-tailed distribution of step difficulty. A small fraction of steps matter a lot. Route those to the advisor, and let the executor handle the rest.
When to use it
The pattern is not free — wiring it up adds complexity, and for simple agents it adds nothing. Here is the short decision table.
| Scenario | Use advisor? | Why |
|---|---|---|
| Long-horizon refactor | Yes | Complex planning needed upfront; bad plan blows executor tokens later. |
| One-shot content generation | No | Simple, no planning. Just run the executor and ship. |
| Multi-step debugging | Yes | Branches needed; advice reduces dead ends and wasted tool calls. |
| Daily standup agent | No | Same task every day; the advisor has nothing new to contribute. |
Rule of thumb: if you have caught yourself thinking "I wish this agent would stop and plan before it dives in," you are a candidate for the advisor pattern. If your agent runs the same few tool calls in the same order every time, you are not.
Setup with OpenClaw
The reference implementation lives in awesome-openclaw-agents under configs/advisor-hybrid. It ships two SOUL.md files and a small wiring config. Four steps.
```bash
cp configs/advisor-hybrid/EXECUTOR-SOUL.md ~/.openclaw/agents/executor/SOUL.md
cp configs/advisor-hybrid/ADVISOR.md ~/.openclaw/agents/advisor/SOUL.md
openclaw agent --agent executor --advisor advisor
openclaw agent --agent executor --advisor advisor \
  --message "Refactor the billing module to use the new Stripe adapter."
```

The --advisor flag tells OpenClaw to expose a consult_advisor tool to the executor. The executor decides when to call it; the SOUL.md rules set the ceiling on how often.
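To make the consult step concrete, here is a minimal sketch of what a consult_advisor call might look like from the executor's side. The tool name comes from the post; the payload shape and the `call_advisor` transport are assumptions for illustration, not OpenClaw's actual API.

```python
# Hypothetical sketch of the consult_advisor tool contract.
# The payload fields and call_advisor() transport are assumptions.

def consult_advisor(summary: str, question: str, call_advisor) -> str:
    """Package a compact context summary plus one concrete question,
    send it to the advisor model, and return its short answer."""
    prompt = (
        "You are the advisor for a cheaper executor agent.\n"
        "Context summary:\n" + summary + "\n\n"
        "Question: " + question + "\n"
        "Answer in at most a few structured bullets."
    )
    return call_advisor(prompt)

# Usage with a stand-in advisor; a real setup would call the Opus API.
fake_advisor = lambda prompt: (
    "- Use the Stripe adapter's retry hook\n- Migrate one service first"
)
plan = consult_advisor(
    "Billing module refactor, 3 failing tests.",
    "Which refactor order?",
    fake_advisor,
)
print(plan.splitlines()[0])
```

The key design point is that the executor sends a summary plus one question, never its full transcript.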
The cost math
A worked example. These numbers are indicative — use your own pricing and your own token counts before making a procurement call.
Scenario: 1,000 tasks per day, average 15,000 input tokens and 3,000 output tokens per task. Opus 4.6 priced at $15 per million input tokens and $75 per million output tokens (indicative). GLM-5.1 executor cost rounded to a small fraction of that.
- $225/day input: 1,000 tasks × 15k tokens × $15/M.
- $225/day output: 1,000 tasks × 3k tokens × $75/M.
- ~$450/day: indicative daily cost before the advisor pattern.
- ~$90/day: GLM-5.1 executor + ~100 advisor consults. Roughly 80% savings, indicative.
| Setup | Tasks/day | Advisor consults | Daily cost |
|---|---|---|---|
| Opus-only baseline | 1,000 | n/a | ~$450 |
| Advisor + Sonnet 4.6 executor | 1,000 | ~100 | ~$130 |
| Advisor + GLM-5.1 executor | 1,000 | ~100 | ~$90 |
Reddit threads on r/openclaw suggest savings in the 60-80% range in the wild. Your mileage depends entirely on how often your executor reaches for the advisor. The gotchas below explain why that number can drift.
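The arithmetic behind the baseline row can be checked in a few lines. The prices and token counts are the post's indicative numbers; the hybrid consult sizes (~3k input, ~400 output per consult) are assumptions, not measurements.

```python
# Reproduce the worked example's baseline cost arithmetic.
TASKS_PER_DAY = 1_000
IN_TOKENS, OUT_TOKENS = 15_000, 3_000      # tokens per task
OPUS_IN, OPUS_OUT = 15.0, 75.0             # $ per million tokens (indicative)

def daily_cost(tasks, tok_in, tok_out, price_in, price_out):
    return tasks * (tok_in * price_in + tok_out * price_out) / 1_000_000

baseline = daily_cost(TASKS_PER_DAY, IN_TOKENS, OUT_TOKENS, OPUS_IN, OPUS_OUT)
print(f"Opus-only baseline: ${baseline:.0f}/day")   # $450/day

# Hybrid advisor share: ~100 consults/day, assumed ~3k in / 400 out each.
advisor = daily_cost(100, 3_000, 400, OPUS_IN, OPUS_OUT)
print(f"Advisor share: ${advisor:.2f}/day")         # $7.50/day
```

Under these assumptions the advisor itself is a rounding error; nearly all of the ~$90/day hybrid figure is the cheap executor's bill.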
The 4 gotchas that kill the savings
Teams that set this up tend to report the same four failure modes. Handle them in the SOUL.md up front and you will not have to debug them in production.
1. Runaway consults
If the executor consults the advisor on every step, the savings evaporate — you are now paying for both models on every call. Hard cap the consult rate at roughly 10% of steps in the executor SOUL.md. Treat consult_advisor as a scarce resource the executor has to justify.
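A hard cap like this can be enforced outside the prompt as well. Below is a minimal sketch of a consult budget, assuming the executor loop can check it before emitting a consult_advisor call; the class and method names are illustrative, not part of OpenClaw.

```python
# Minimal consult-rate cap: allow consults only while they stay
# under max_ratio of total steps. Illustrative, not OpenClaw API.

class ConsultBudget:
    def __init__(self, max_ratio: float = 0.10):
        self.max_ratio = max_ratio
        self.steps = 0
        self.consults = 0

    def step(self):
        self.steps += 1

    def may_consult(self) -> bool:
        # Strict inequality: at 10% exactly, the next consult is denied.
        return self.consults < self.max_ratio * self.steps

    def record_consult(self):
        self.consults += 1

budget = ConsultBudget()
for _ in range(30):
    budget.step()
print(budget.may_consult())   # True: 0 consults over 30 steps
```

Belt-and-suspenders: state the cap in the SOUL.md so the model self-limits, and enforce it in code so a chatty executor cannot blow the budget anyway.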
2. Advisor output bloat
Opus 4.6 will happily return 2,000-token plans if you do not restrict it. That is pure output cost with no marginal benefit — most of it never gets read. Cap advisor output around 400 tokens in the ADVISOR SOUL.md and ask for structured bullets, not prose.
3. Context drift
The advisor does not have the full execution context. If you pass it too little, the advice is generic. If you pass it too much, you pay Opus prices to re-read the executor's entire session. Pass only the delta since the last consult plus a compact rolling summary.
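One way to implement "only the delta plus a rolling summary" is to buffer events between consults and fold them into a compressed summary after each one. The `summarize` stub here is a placeholder assumption; a real setup might use the executor model itself to compress old events.

```python
# Sketch of delta-based advisor context: rolling summary + events
# since the last consult, instead of the full transcript.

def summarize(events, limit=200):
    # Stand-in for a model-written summary; just truncates here.
    return " | ".join(e for e in events if e)[:limit]

class AdvisorContext:
    def __init__(self):
        self.summary = ""
        self.pending = []            # events since the last consult

    def record(self, event: str):
        self.pending.append(event)

    def build_consult_payload(self) -> str:
        payload = (
            f"Summary so far: {self.summary}\n"
            "New since last consult:\n" + "\n".join(self.pending)
        )
        # Fold the delta into the rolling summary and reset it.
        self.summary = summarize([self.summary] + self.pending)
        self.pending = []
        return payload

ctx = AdvisorContext()
ctx.record("ran pytest: 3 failures in billing tests")
ctx.record("grep found two call sites of old Stripe client")
payload = ctx.build_consult_payload()
print("pytest" in payload)    # True
```

Each consult then costs Opus prices only for the new material, not for re-reading the whole session.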
4. Caching
Keep the advisor's system prompt and stable context blocks identical across turns so prompt caching kicks in. A small change to the ADVISOR.md or a shifting header can invalidate the cache and double your advisor bill silently. Check your provider's cache hit metrics weekly.
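The cache-friendly shape is a byte-identical static prefix with all per-consult material appended after it. The message structure below is a generic chat-API sketch under that assumption; check your own provider's prompt-caching rules for what actually counts as a cacheable prefix.

```python
# Keep everything before the per-consult delta byte-identical
# across turns so provider-side prompt caching can hit.

STATIC_PREFIX = (
    "You are the advisor model for a cheaper executor.\n"
    "Rules: answer in at most 400 tokens, structured bullets only.\n"
)

def advisor_messages(delta: str):
    # Only the final user message changes between consults.
    return [
        {"role": "system", "content": STATIC_PREFIX},
        {"role": "user", "content": delta},
    ]

m1 = advisor_messages("consult 1: which refactor order?")
m2 = advisor_messages("consult 2: is this test failure a real bug?")
print(m1[0]["content"] == m2[0]["content"])   # True: stable prefix
```

Anything that mutates the prefix per turn, such as a timestamp in the system prompt, silently turns every consult into a full-price cache miss.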
Advisor pattern variants
Three common setups, in order of increasing cost aggressiveness.
Opus 4.6 + Sonnet 4.6
The maximal Anthropic stack. Both models live in the same provider so tool protocols and prompt caching are consistent. Lowest operational risk; smallest savings.
Opus 4.6 + GLM-5.1
Recommended by r/openclaw users for the best cost-to-quality balance. GLM-5.1 tops SWE-Bench Pro, so executor quality stays high while the executor bill drops sharply.
Opus 4.6 + Local Gemma 4
The most extreme variant. Executor runs on local hardware — effectively free after the machine is paid for. Best for long-running agents where latency tolerance is high and you own the box.
All three variants use the same wiring. The only thing that changes is the executor's config.yaml model line. The advisor SOUL.md, the consult budget, and the output caps stay identical.
Grab the reference implementation
The complete executor and advisor SOUL.md files, plus the wiring config and a sample task, live in the awesome-openclaw-agents repo. Anthropic's own Advisor strategy announcement is worth reading alongside it — the two approaches are compatible and the prompt engineering notes transfer directly.
Frequently Asked Questions
Is the advisor pattern Anthropic-specific?
No. The pattern is model-agnostic. Any two models can play the roles — one cheap and fast for execution, one strong for planning and review. Anthropic shipped a first-party version with their Advisor strategy, but r/openclaw users have been wiring up Opus 4.6 + GLM-5.1 hybrids for weeks. You can mix Opus with Sonnet, GLM-5.1, Gemma 4, or anything else that speaks the same tool protocol.
How do I measure if it's actually saving me money?
Track Opus input and output tokens per session and compare against your baseline (Opus-only). The hybrid should show a large drop in Opus tokens paired with a smaller increase in executor tokens. If your Opus token count does not go down meaningfully, your executor is probably consulting the advisor too often — cap consults at roughly 10% of steps and re-measure.
Can the executor be a local model?
Yes. Gemma 4 and Qwen 3 run well as local executors, and the advisor can still be a hosted Opus 4.6. This is the most extreme variant — effectively free execution with occasional paid advice. It works best for long-running agents where latency matters less than cost, and where you control the hardware.
Won't calling two models be slower?
In practice, no. The executor runs normally, and the advisor is only invoked at checkpoints — roughly 10% of steps in a well-tuned setup. Advisor calls can be batched or run in parallel with low-priority executor work. The user-visible latency is dominated by the executor, not the advisor.
Does this replace skills?
No. Skills are orthogonal. You can use skills inside either the executor or the advisor. A common setup is to give the executor the day-to-day skills it needs (search, file ops, shell) and give the advisor a planning skill for structured output. The advisor pattern is about cost routing; skills are about capability.
Ship a hybrid advisor agent in minutes
CrewClaw Team tier generates a complete advisor-hybrid deploy package. Executor SOUL.md, advisor SOUL.md, wiring config, Docker setup, and Telegram bot included. Opus advises, your chosen executor runs the work. $29 one-time. You own the files.