Cost Optimization · AI Agents · February 16, 2026 · 8 min read

How We Reduced Our AI Agent Cost by 16x

Our Telegram analytics agent was costing $0.40 per query. After four rounds of optimization, we got it down to $0.024. The biggest wins came from problems we did not expect.

The $0.40 Query Problem

We built a Telegram bot that runs analytics scripts on demand. You message it “funnel today” and it runs your Mixpanel funnel script, returning the raw output. Simple, right?

The agent uses OpenClaw under the hood — a SOUL.md config tells it which scripts to run, and it matches keywords to shell commands. The LLM does almost nothing: read the message, pick the right script, return the output.

Yet every query cost $0.40. For a bot that should be running 10-20 queries a day, that is $4-8 per day, or $120-240 per month. For running shell scripts.

Where the Money Actually Goes

Before you can optimize, you need to understand LLM pricing. Every API call has four cost components, and most people only think about two of them.

| Component | What It Is | The Trap |
| --- | --- | --- |
| input | Fresh tokens sent to the model | Usually tiny (your actual message) |
| output | Tokens the model generates | 5x more expensive than input |
| cache_write | System prompt stored on first call | The silent budget killer |
| cache_read | Cached prompt reused on later calls | 10-15x cheaper than write |

The key insight: cache_write is what kills you on cold starts. If your system prompt is 100K tokens and you send one query every 10 minutes (outside the 5-minute cache TTL), every single query pays the full cache_write price.
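As a worked example, here is that arithmetic in a short sketch, using the Sonnet rates from the pricing table below (the token counts are illustrative):

```python
# Sonnet rates in dollars per million tokens (from the pricing table below).
SONNET = {"input": 3.00, "output": 15.00, "cache_write": 3.75, "cache_read": 0.30}

def query_cost(tokens: dict, rates: dict) -> float:
    """Dollar cost of one API call, given token counts per cost component."""
    return sum(tokens.get(component, 0) / 1e6 * price
               for component, price in rates.items())

# Cold start: a 100K-token system prompt is written to cache in full.
cold = query_cost({"input": 1000, "output": 400, "cache_write": 100_000}, SONNET)
# Warm: the same prompt is read back from cache within the 5-minute TTL.
warm = query_cost({"input": 1000, "output": 400, "cache_read": 100_000}, SONNET)
print(f"cold ${cold:.3f}, warm ${warm:.3f}")  # cold $0.384, warm $0.039
```

Nearly all of the cold-start price is the cache write — which is why a bot that gets one query every 10 minutes pays it over and over.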

Round 1: Pick the Right Model ($0.40 → $0.20)

Our first mistake was using Claude Sonnet for a task that needed zero reasoning. The agent literally matches a keyword and runs a shell command. Haiku can do this perfectly.

| Model | Input | Output | Cache Write | Cache Read |
| --- | --- | --- | --- | --- |
| Opus | $15.00 | $75.00 | $18.75 | $1.50 |
| Sonnet | $3.00 | $15.00 | $3.75 | $0.30 |
| Haiku | $1.00 | $5.00 | $1.25 | $0.10 |

All prices are per million tokens. Sonnet to Haiku is a 3x reduction across input, output, and cache costs. For a script-runner agent, this is the easiest win.

Round 2: Shrink the System Prompt ($0.20 → $0.14)

Our agent loaded two files on every request: SOUL.md (the agent config) and TOOLS.md (the command reference). That is two sequential LLM calls — one to read SOUL.md, then the model realizes it needs TOOLS.md and reads that too.

The fix was trivial: inline TOOLS.md directly into SOUL.md. One file, one read, fewer tokens wasted on the tool-call overhead. This saved about 30% on per-query cost.

Lesson: Every file your agent reads is a tool call

Each tool call adds tokens for the call itself, the response formatting, and an extra LLM turn. If a file is always needed, inline it into the system prompt. One file that is always loaded is better than two files that require a decision.
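A minimal sketch of the inlining step — the file names are the ones from the article, but the merge function and section header are assumptions about how you might wire this up:

```python
from pathlib import Path

def build_system_prompt(config_dir: str) -> str:
    """Merge the always-needed tool reference into the agent config,
    so the model gets one prompt up front instead of spending a second
    file-read tool call on every request."""
    soul = Path(config_dir, "SOUL.md").read_text()
    tools = Path(config_dir, "TOOLS.md").read_text()
    return f"{soul}\n\n## Tool Reference\n\n{tools}"
```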

Round 3: The Session Lock Bug ($0.14 → Still $0.14)

After switching to Haiku in the config, the cost did not change. We were still paying $0.14 per query. Something was wrong.

We dug into the session files and found the problem: the session was locked to the old model. When the agent framework creates a session, it records which model was active at that time. Changing the config does not update existing sessions — they keep using the original model until the session is cleared.

Our config said Haiku. Our session was running Sonnet. Every query was billed at Sonnet rates.

Watch out: Config changes might not apply to existing sessions

Many agent frameworks persist session state. If you change the model in your config, verify the actual model in your API responses. Look for a model field in the response — that is what you are actually paying for. Clear old sessions after any model change.
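A cheap guard, sketched against the shape of Anthropic's Messages API response (the top-level `model` field is real; the expected prefix and model IDs here are illustrative):

```python
def check_model(response: dict, expected_prefix: str) -> str:
    """Fail fast if a stale session pinned the call to the wrong model."""
    actual = response.get("model", "")
    if not actual.startswith(expected_prefix):
        raise RuntimeError(
            f"Config expects {expected_prefix!r} but the API billed {actual!r}; "
            "clear the session to pick up the new model."
        )
    return actual
```

Running this once per deploy (or on every response in dev) would have caught our Sonnet-billed-as-Haiku session immediately.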

Round 4: Context Bloat ($0.15 → $0.024)

After fixing the model lock, we cleared the metrics session and confirmed Haiku was running. But costs were still $0.15 on cold starts. The culprit: 116,000 tokens of accumulated session history.

Every conversation turn — every query, every script output, every response — was being stored in the session and sent with every new request. After weeks of use, the context had grown to 116K tokens. Even at Haiku's cheap cache_write rate ($1.25/MTok), writing 116K tokens to cache costs $0.145 per cold start.

The fix: clear the session. The context dropped from 116K to 14K tokens (just the system prompt and framework overhead). Cold start cost went from $0.145 to $0.019.
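The before/after numbers fall straight out of the cache_write rate. This sketch counts the cache write only; real queries add small input/output charges on top, which is why the article's figures run slightly higher:

```python
HAIKU_CACHE_WRITE = 1.25  # dollars per million tokens

def cold_start_cache_cost(context_tokens: int) -> float:
    """Cache-write cost of sending a fresh context to a cold cache."""
    return context_tokens / 1e6 * HAIKU_CACHE_WRITE

before = cold_start_cache_cost(116_000)  # 0.145 — the bloated session
after = cold_start_cache_cost(14_000)    # 0.0175 — after clearing history
```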

Before cleanup: 116K tokens, $0.15/query cold start.
After cleanup: 14K tokens, $0.02/query cold start.

Bonus: 9 Skills Nobody Asked For

While investigating, we discovered the agent framework was loading 9 default skills into the system prompt: weather, iMessage, Slack, GitHub, and more. Our bot runs analytics scripts — it does not need to check the weather or send iMessages.

Each skill adds its description and instructions to the context window. Nine unused skills meant hundreds of wasted tokens on every single request. Removing them would shrink the context even further and save a few more cents per query.

The Full Optimization Timeline

| Change | Cost/Query | Reduction |
| --- | --- | --- |
| Starting point (Sonnet) | $0.400 | — |
| Switch to Haiku + inline TOOLS.md | $0.140 | -65% |
| Fix session model lock | $0.150 | (revealed true cost) |
| Clear 116K session history | $0.024 | -84% |
| Warm cache (repeat queries) | $0.005 | -99% |

From $0.40 to $0.024 per query. That is 16x cheaper. At 20 queries per day, it went from $240/month to $15/month. With warm cache, it could be as low as $3/month.
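The monthly math, spelled out:

```python
def monthly_cost(per_query: float, queries_per_day: int = 20, days: int = 30) -> float:
    """Rough monthly bill at a steady query rate."""
    return per_query * queries_per_day * days

monthly_cost(0.40)   # 240.0 — the starting point
monthly_cost(0.024)  # ~14.4 — after optimization
monthly_cost(0.005)  # ~3.0 — if the cache stays warm
```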

Your AI Agent Cost Checklist

Use this checklist to audit your own agent costs. These apply to any LLM-powered agent, regardless of framework.

1. Check the actual model in API responses. Your config might say one thing, but the session might use another. Verify the model field in the raw API response.

2. Measure your context window size. Look at totalTokens in your usage data. If it's over 20K for a simple agent, you have bloat.

3. Clear session history periodically. Every conversation turn adds tokens. For stateless tasks (like running scripts), clear sessions after each use or on a schedule.

4. Remove unused tools and skills. Every tool definition in your system prompt costs tokens. Only include what the agent actually needs.

5. Inline always-needed files. If your agent reads the same file on every request, put it in the system prompt. This saves a tool-call round trip.

6. Use the cheapest model that works. Haiku handles keyword matching and simple routing. Sonnet handles reasoning. Reserve Opus for tasks that truly need it.

7. Understand your cache pattern. If queries come in bursts, the cache stays warm and costs drop. If they're spread out, every query is a cold start. Design accordingly.
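The first two checks can be automated in one pass. This is a hedged sketch against Anthropic-style `usage` fields (the cache fields appear when prompt caching is enabled; your framework may expose different names, such as the totalTokens mentioned above):

```python
def audit_response(response: dict, expected_model_prefix: str,
                   bloat_threshold: int = 20_000) -> dict:
    """Summarize one API response: which model actually ran, and how
    large the context really was (fresh input + cache writes + cache reads)."""
    usage = response.get("usage", {})
    context = (usage.get("input_tokens", 0)
               + usage.get("cache_creation_input_tokens", 0)
               + usage.get("cache_read_input_tokens", 0))
    model = response.get("model", "unknown")
    return {
        "model": model,
        "model_matches_config": model.startswith(expected_model_prefix),
        "context_tokens": context,
        "bloated": context > bloat_threshold,
    }
```

Log this report for every query in development, and both the session-lock bug and the context bloat from this article show up on day one instead of on the invoice.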

Why This Matters for Agent Deployment

Running an AI agent is not just about getting the SOUL.md right. Session management, model selection, context optimization, and caching strategy all directly impact your monthly bill. A misconfigured agent can cost 16x more than it should.

This is why we built CrewClaw — to handle these deployment details so you can focus on what your agent does, not how much it costs. Every agent deployed through CrewClaw includes optimized Docker configurations, proper session management, and model recommendations based on your agent's actual workload.

Frequently Asked Questions

How much does it cost to run an AI agent?

It depends on the model, context size, and caching. A simple agent on Claude Haiku with a clean 14K token context costs about $0.02-0.03 per query. The same agent on Sonnet costs $0.07, and on Opus $0.36. Without optimization, costs can balloon to $0.15-0.50+ per query due to session accumulation and cache misses.

What is prompt caching and how does it reduce cost?

Prompt caching stores your system prompt on the provider's servers so you don't pay full price to send it every time. The first request (cold cache) pays a cache write fee. Subsequent requests within the TTL window (typically 5 minutes for Anthropic) pay a much cheaper cache read fee — often 10-15x less than the write cost.

Why does my AI agent cost more than expected?

The three most common reasons: (1) Session history accumulates, sending thousands of old messages with every new query. (2) Your session is locked to an expensive model even though you changed the config. (3) Unused tools, skills, and system files inflate your context window. Check your actual token usage per request to diagnose.

Which model should I use for an AI agent?

Match the model to the task. For simple command routing and script execution, Haiku is more than enough at $0.02/query. For complex reasoning and multi-step planning, Sonnet offers the best value at $0.07/query. Reserve Opus ($0.36/query) for tasks that genuinely need the most capable model. Most agent tasks don't need Opus.

How do I reduce my Claude API costs?

Four steps: (1) Use the smallest model that works — Haiku for simple tasks. (2) Keep your context small by clearing old session history. (3) Take advantage of prompt caching by keeping your system prompt stable. (4) Remove unused tools and skills from your agent configuration to shrink the context window.
