The OpenClaw Advisor Pattern: Cut Agent Costs 80% Without Losing Quality
Anthropic has officially announced its Advisor strategy, bringing it to the Claude Platform as a first-class deployment mode. The OpenClaw community was already there — the hot thread on r/openclaw this week ("Running agents on a cheap model + using Claude Code as an advisor") racked up hundreds of upvotes, with users reporting indicative savings in the 60-80% range. This post is the consolidated reference implementation for doing it with OpenClaw today.
The pattern in one diagram
The advisor pattern splits a single agent into two cooperating models. A cheap executor handles the bulk of the work. A strong advisor is consulted only at decision points — when the executor hits a branch, a plan, or an unfamiliar failure mode.
User request
│
▼
┌─────────────┐ consult (rare) ┌──────────────┐
│ Executor │ ─────────────────▶ │ Advisor │
│ (cheap) │ │ (Opus 4.6) │
│ GLM-5.1 / │ ◀───────────────── │ │
│ Sonnet 4.6 │ plan / review │ │
└─────────────┘ └──────────────┘
│
▼
Final response

The executor owns the session. It drives the tool loop, handles the user-visible back-and-forth, and only reaches up to the advisor when it needs a plan, a sanity check, or a hard decision it does not want to make alone.
Why it works
Most steps in an agent session are boring. Reading a file, running a grep, parsing a JSON response, formatting output — these do not need a frontier model. A well-tuned cheap model handles them at a tiny fraction of the cost and often at higher speed.
The steps that actually need intelligence are rare. Choosing which refactor strategy to take. Deciding whether a failing test is a bug or a test bug. Drafting a migration plan across five services. These decisions set the direction of everything that follows — and getting them wrong is expensive in executor tokens even if the model is cheap.
The analogy that keeps coming up on r/openclaw: a junior engineer with a senior mentor on Slack. The junior does the work. When they hit something they are not sure about, they paste a summary into Slack and wait for the senior's call. The senior is not pair-programming every keystroke — that would defeat the point of having a junior at all.
The pattern works because agent sessions have a heavy-tailed distribution of step difficulty. A small fraction of steps matter a lot. Route those to the advisor, and let the executor handle the rest.
When to use it
The pattern is not free — wiring it up adds complexity, and for simple agents it adds nothing. Here is the short decision table.
| Scenario | Use advisor? | Why |
|---|---|---|
| Long-horizon refactor | Yes | Complex planning needed upfront; bad plan blows executor tokens later. |
| One-shot content generation | No | Simple, no planning. Just run the executor and ship. |
| Multi-step debugging | Yes | Branches needed; advice reduces dead ends and wasted tool calls. |
| Daily standup agent | No | Same task every day; the advisor has nothing new to contribute. |
Rule of thumb: if you have caught yourself thinking "I wish this agent would stop and plan before it dives in," you are a candidate for the advisor pattern. If your agent runs the same few tool calls in the same order every time, you are not.
Setup with OpenClaw
The reference implementation lives in awesome-openclaw-agents under configs/advisor-hybrid. It ships two SOUL.md files and a small wiring config. Four steps.
```bash
cp configs/advisor-hybrid/EXECUTOR-SOUL.md ~/.openclaw/agents/executor/SOUL.md
cp configs/advisor-hybrid/ADVISOR.md ~/.openclaw/agents/advisor/SOUL.md
openclaw agent --agent executor --advisor advisor
openclaw agent --agent executor --advisor advisor \
  --message "Refactor the billing module to use the new Stripe adapter."
```

The --advisor flag tells OpenClaw to expose a consult_advisor tool to the executor. The executor decides when to call it; the SOUL.md rules set the ceiling on how often.
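To make the consult step concrete, here is a minimal sketch of what a consult_advisor call might look like from the executor's side. The tool name comes from the post; the payload shape and the `call_advisor` transport are assumptions for illustration, not OpenClaw's actual API.

```python
# Hypothetical sketch of the consult_advisor tool contract.
# The payload fields and call_advisor() transport are assumptions.

def consult_advisor(summary: str, question: str, call_advisor) -> str:
    """Package a compact context summary plus one concrete question,
    send it to the advisor model, and return its short answer."""
    prompt = (
        "You are the advisor for a cheaper executor agent.\n"
        "Context summary:\n" + summary + "\n\n"
        "Question: " + question + "\n"
        "Answer in at most a few structured bullets."
    )
    return call_advisor(prompt)

# Usage with a stand-in advisor; a real setup would call the Opus API.
fake_advisor = lambda prompt: (
    "- Use the Stripe adapter's retry hook\n- Migrate one service first"
)
plan = consult_advisor(
    "Billing module refactor, 3 failing tests.",
    "Which refactor order?",
    fake_advisor,
)
print(plan.splitlines()[0])
```

The key design point is that the executor sends a summary plus one question, never its full transcript.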
The cost math
A worked example. These numbers are indicative — use your own pricing and your own token counts before making a procurement call.
Scenario: 1,000 tasks per day, average 15,000 input tokens and 3,000 output tokens per task. Opus 4.6 priced at $15 per million input tokens and $75 per million output tokens (indicative). GLM-5.1 executor cost rounded to a small fraction of that.
- $225/day input: 1,000 tasks × 15k tokens × $15/M.
- $225/day output: 1,000 tasks × 3k tokens × $75/M.
- ~$450/day: indicative daily cost before the advisor pattern.
- ~$90/day: GLM-5.1 executor + ~100 advisor consults. Roughly 80% savings, indicative.
| Setup | Tasks/day | Advisor consults | Daily cost |
|---|---|---|---|
| Opus-only baseline | 1,000 | n/a | ~$450 |
| Advisor + Sonnet 4.6 executor | 1,000 | ~100 | ~$130 |
| Advisor + GLM-5.1 executor | 1,000 | ~100 | ~$90 |
Reddit threads on r/openclaw suggest savings in the 60-80% range in the wild. Your mileage depends entirely on how often your executor reaches for the advisor. The gotchas below explain why that number can drift.
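The arithmetic behind the baseline row can be checked in a few lines. The prices and token counts are the post's indicative numbers; the hybrid consult sizes (~3k input, ~400 output per consult) are assumptions, not measurements.

```python
# Reproduce the worked example's baseline cost arithmetic.
TASKS_PER_DAY = 1_000
IN_TOKENS, OUT_TOKENS = 15_000, 3_000      # tokens per task
OPUS_IN, OPUS_OUT = 15.0, 75.0             # $ per million tokens (indicative)

def daily_cost(tasks, tok_in, tok_out, price_in, price_out):
    return tasks * (tok_in * price_in + tok_out * price_out) / 1_000_000

baseline = daily_cost(TASKS_PER_DAY, IN_TOKENS, OUT_TOKENS, OPUS_IN, OPUS_OUT)
print(f"Opus-only baseline: ${baseline:.0f}/day")   # $450/day

# Hybrid advisor share: ~100 consults/day, assumed ~3k in / 400 out each.
advisor = daily_cost(100, 3_000, 400, OPUS_IN, OPUS_OUT)
print(f"Advisor share: ${advisor:.2f}/day")         # $7.50/day
```

Under these assumptions the advisor itself is a rounding error; nearly all of the ~$90/day hybrid figure is the cheap executor's bill.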
The 4 gotchas that kill the savings
Teams that set this up tend to report the same four failure modes. Handle them in the SOUL.md up front and you will not have to debug them in production.
1. Runaway consults
If the executor consults the advisor on every step, the savings evaporate — you are now paying for both models on every call. Hard cap the consult rate at roughly 10% of steps in the executor SOUL.md. Treat consult_advisor as a scarce resource the executor has to justify.
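A hard cap like this can be enforced outside the prompt as well. Below is a minimal sketch of a consult budget, assuming the executor loop can check it before emitting a consult_advisor call; the class and method names are illustrative, not part of OpenClaw.

```python
# Minimal consult-rate cap: allow consults only while they stay
# under max_ratio of total steps. Illustrative, not OpenClaw API.

class ConsultBudget:
    def __init__(self, max_ratio: float = 0.10):
        self.max_ratio = max_ratio
        self.steps = 0
        self.consults = 0

    def step(self):
        self.steps += 1

    def may_consult(self) -> bool:
        # Strict inequality: at 10% exactly, the next consult is denied.
        return self.consults < self.max_ratio * self.steps

    def record_consult(self):
        self.consults += 1

budget = ConsultBudget()
for _ in range(30):
    budget.step()
print(budget.may_consult())   # True: 0 consults over 30 steps
```

Belt-and-suspenders: state the cap in the SOUL.md so the model self-limits, and enforce it in code so a chatty executor cannot blow the budget anyway.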
2. Advisor output bloat
Opus 4.6 will happily return 2,000-token plans if you do not restrict it. That is pure output cost with no marginal benefit — most of it never gets read. Cap advisor output around 400 tokens in the ADVISOR SOUL.md and ask for structured bullets, not prose.
3. Context drift
The advisor does not have the full execution context. If you pass it too little, the advice is generic. If you pass it too much, you pay Opus prices to re-read the executor's entire session. Pass only the delta since the last consult plus a compact rolling summary.
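One way to implement "only the delta plus a rolling summary" is to buffer events between consults and fold them into a compressed summary after each one. The `summarize` stub here is a placeholder assumption; a real setup might use the executor model itself to compress old events.

```python
# Sketch of delta-based advisor context: rolling summary + events
# since the last consult, instead of the full transcript.

def summarize(events, limit=200):
    # Stand-in for a model-written summary; just truncates here.
    return " | ".join(e for e in events if e)[:limit]

class AdvisorContext:
    def __init__(self):
        self.summary = ""
        self.pending = []            # events since the last consult

    def record(self, event: str):
        self.pending.append(event)

    def build_consult_payload(self) -> str:
        payload = (
            f"Summary so far: {self.summary}\n"
            "New since last consult:\n" + "\n".join(self.pending)
        )
        # Fold the delta into the rolling summary and reset it.
        self.summary = summarize([self.summary] + self.pending)
        self.pending = []
        return payload

ctx = AdvisorContext()
ctx.record("ran pytest: 3 failures in billing tests")
ctx.record("grep found two call sites of old Stripe client")
payload = ctx.build_consult_payload()
print("pytest" in payload)    # True
```

Each consult then costs Opus prices only for the new material, not for re-reading the whole session.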
4. Caching
Keep the advisor's system prompt and stable context blocks identical across turns so prompt caching kicks in. A small change to the ADVISOR.md or a shifting header can invalidate the cache and double your advisor bill silently. Check your provider's cache hit metrics weekly.
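The cache-friendly shape is a byte-identical static prefix with all per-consult material appended after it. The message structure below is a generic chat-API sketch under that assumption; check your own provider's prompt-caching rules for what actually counts as a cacheable prefix.

```python
# Keep everything before the per-consult delta byte-identical
# across turns so provider-side prompt caching can hit.

STATIC_PREFIX = (
    "You are the advisor model for a cheaper executor.\n"
    "Rules: answer in at most 400 tokens, structured bullets only.\n"
)

def advisor_messages(delta: str):
    # Only the final user message changes between consults.
    return [
        {"role": "system", "content": STATIC_PREFIX},
        {"role": "user", "content": delta},
    ]

m1 = advisor_messages("consult 1: which refactor order?")
m2 = advisor_messages("consult 2: is this test failure a real bug?")
print(m1[0]["content"] == m2[0]["content"])   # True: stable prefix
```

Anything that mutates the prefix per turn, such as a timestamp in the system prompt, silently turns every consult into a full-price cache miss.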
Advisor pattern variants
Three common setups, in order of increasing cost aggressiveness.
Opus 4.6 + Sonnet 4.6
The maximal Anthropic stack. Both models live in the same provider so tool protocols and prompt caching are consistent. Lowest operational risk; smallest savings.
Opus 4.6 + GLM-5.1
Recommended by r/openclaw users for the best cost-to-quality balance. GLM-5.1 tops SWE-Bench Pro, so executor quality stays high while the executor bill drops sharply.
Opus 4.6 + Local Gemma 4
The most extreme variant. Executor runs on local hardware — effectively free after the machine is paid for. Best for long-running agents where latency tolerance is high and you own the box.
All three variants use the same wiring. The only thing that changes is the executor's config.yaml model line. The advisor SOUL.md, the consult budget, and the output caps stay identical.
Grab the reference implementation
The complete executor and advisor SOUL.md files, plus the wiring config and a sample task, live in the awesome-openclaw-agents repo. Anthropic's own Advisor strategy announcement is worth reading alongside it — the two approaches are compatible and the prompt engineering notes transfer directly.
Frequently Asked Questions
Is the advisor pattern Anthropic-specific?
No. The pattern is model-agnostic. Any two models can play the roles — one cheap and fast for execution, one strong for planning and review. Anthropic shipped a first-party version with their Advisor strategy, but r/openclaw users have been wiring up Opus 4.6 + GLM-5.1 hybrids for weeks. You can mix Opus with Sonnet, GLM-5.1, Gemma 4, or anything else that speaks the same tool protocol.
How do I measure if it's actually saving me money?
Track Opus input and output tokens per session and compare against your baseline (Opus-only). The hybrid should show a large drop in Opus tokens paired with a smaller increase in executor tokens. If your Opus token count does not go down meaningfully, your executor is probably consulting the advisor too often — cap consults at roughly 10% of steps and re-measure.
Can the executor be a local model?
Yes. Gemma 4 and Qwen 3 run well as local executors, and the advisor can still be a hosted Opus 4.6. This is the most extreme variant — effectively free execution with occasional paid advice. It works best for long-running agents where latency matters less than cost, and where you control the hardware.
Won't calling two models be slower?
In practice, no. The executor runs normally, and the advisor is only invoked at checkpoints — roughly 10% of steps in a well-tuned setup. Advisor calls can be batched or run in parallel with low-priority executor work. The user-visible latency is dominated by the executor, not the advisor.
Does this replace skills?
No. Skills are orthogonal. You can use skills inside either the executor or the advisor. A common setup is to give the executor the day-to-day skills it needs (search, file ops, shell) and give the advisor a planning skill for structured output. The advisor pattern is about cost routing; skills are about capability.
Ship a hybrid advisor agent in minutes
CrewClaw Team tier generates a complete advisor-hybrid deploy package. Executor SOUL.md, advisor SOUL.md, wiring config, Docker setup, and Telegram bot included. Opus advises, your chosen executor runs the work. $29 one-time. You own the files.