OpenClaw Free Models Guide: Which Model for Which Role
Running OpenClaw agents does not require paid APIs. Free, open-source models running locally through Ollama can handle most agent roles. The challenge is picking the right model for each role. This guide compares Qwen, Llama, Mistral, DeepSeek, and Gemma across five common agent roles and shows you how to configure each one, with SOUL.md patterns tuned for local models.
Why Free Models Matter for OpenClaw Agents
A single OpenClaw agent running on Claude Sonnet costs roughly $0.15-$0.40 per conversation. If you run five agents handling 100 messages per day each, that is $75-$200 per month in API costs alone. Multiply that across a team of agents with heartbeat schedules, and costs climb fast.
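As a back-of-envelope check, the monthly figure follows from those numbers if you assume a conversation spans roughly 30 messages — an assumption this sketch makes explicit; adjust the constants to your own traffic:

```python
AGENTS = 5
MESSAGES_PER_AGENT_PER_DAY = 100
MSGS_PER_CONVERSATION = 30            # assumption: one conversation ~= 30 messages
COST_PER_CONVERSATION = (0.15, 0.40)  # USD range for a cloud model, per above

# Total messages per month, then converted to billable conversations
monthly_messages = AGENTS * MESSAGES_PER_AGENT_PER_DAY * 30
monthly_conversations = monthly_messages / MSGS_PER_CONVERSATION

low = monthly_conversations * COST_PER_CONVERSATION[0]
high = monthly_conversations * COST_PER_CONVERSATION[1]
print(f"${low:.0f}-${high:.0f} per month")  # → $75-$200 per month
```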
Free models eliminate that cost entirely. Qwen 2.5 7B running on an M2 Mac Mini responds in under 2 seconds and costs nothing beyond electricity. The quality gap between free 7B models and cloud APIs has narrowed significantly since mid-2025. For structured agent tasks with clear SOUL.md rules, local models now handle 80-90% of workloads that previously required paid APIs.
Ollama Setup: Get Running in 5 Minutes
Ollama is the standard way to run free models locally. It handles model downloads, GPU memory management, and exposes an OpenAI-compatible API on localhost. OpenClaw connects to it like any other provider.
# Install Ollama (macOS / Linux)
curl -fsSL https://ollama.com/install.sh | sh
# Pull models for different agent roles
ollama pull qwen2.5:7b # Best all-rounder
ollama pull llama3.1:8b # General purpose
ollama pull deepseek-coder-v2 # Code specialist
ollama pull mistral:7b # Writing tasks
ollama pull gemma2:9b # Lightweight instruction following
# Verify Ollama is running
curl http://localhost:11434/api/tags
# Quick test
ollama run qwen2.5:7b "List 3 steps to deploy an AI agent"

Now configure OpenClaw to use Ollama as a model provider. You can register multiple models and assign each one to a different agent.
# Register models in OpenClaw
openclaw models add ollama-qwen \
--provider ollama \
--endpoint http://localhost:11434 \
--model qwen2.5:7b
openclaw models add ollama-deepseek \
--provider ollama \
--endpoint http://localhost:11434 \
--model deepseek-coder-v2
openclaw models add ollama-mistral \
--provider ollama \
--endpoint http://localhost:11434 \
--model mistral:7b
# Set a default
openclaw models set-default ollama-qwen
# Test the connection
openclaw models test ollama-qwen

Tip: On Apple Silicon Macs, Ollama uses unified memory. An M2 with 16 GB can run a 7B model at 40+ tokens/sec. An M2 Pro with 32 GB handles 14B-34B models comfortably. No external GPU needed.
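If a registration test fails, it helps to hit Ollama's OpenAI-compatible endpoint directly, outside OpenClaw. A minimal sketch using only the Python standard library — the model name and prompt are placeholders, and `chat` only works against a running Ollama instance:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/v1/chat/completions"  # OpenAI-compatible route

def build_chat_request(model: str, prompt: str) -> dict:
    """Build a chat-completion payload in the OpenAI-compatible shape Ollama accepts."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

def chat(model: str, prompt: str, timeout: int = 120) -> str:
    """POST the request to a locally running Ollama and return the reply text."""
    payload = json.dumps(build_chat_request(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Only the payload is built here; call chat(...) once Ollama is running.
request = build_chat_request("qwen2.5:7b", "List 3 steps to deploy an AI agent")
print(json.dumps(request, indent=2))
```

If this request succeeds with curl or Python but `openclaw models test` fails, the problem is in your OpenClaw provider config, not in Ollama.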
Free Models Compared by Agent Role
Not every model is good at every task. The table below rates each model family across five common OpenClaw agent roles based on real-world testing. Ratings are for the 7B-9B parameter variants unless noted otherwise.
| Model | Writer | Coder | Analyst | PM | Scout |
|---|---|---|---|---|---|
| Qwen 2.5 7B | A | A | A | A | B+ |
| Llama 3.1 8B | B+ | B+ | B | B+ | B+ |
| Mistral 7B | A+ | B | B | B | B |
| DeepSeek Coder V2 | C | A+ | B+ | C | C |
| Gemma 2 9B | B+ | B | B+ | B | A |
| Phi-3 Mini | B | B | C | C | B |
Ratings based on: instruction adherence (does the model follow SOUL.md rules consistently?), output structure (does it produce the expected format?), and task accuracy (does it complete the assigned job correctly?). Tested on OpenClaw v0.8+ with Ollama 0.3+.
Best Free Model for Each Agent Role
Here is the specific model recommendation for each role, with the reasoning behind each pick.
Writer (Echo-type agents)
Mistral 7B
Mistral produces the most natural prose among 7B models. It handles tone instructions well, avoids the robotic patterns common in Llama writing, and consistently follows editorial rules from SOUL.md. For long-form content, it maintains coherence across 1,500+ word articles.
Alternative: Qwen 2.5 7B for multilingual writing or when you need structured output (JSON, tables) alongside prose.
ollama pull mistral:7b

Coder (Builder-type agents)
DeepSeek Coder V2
Purpose-built for code generation. It handles Python, TypeScript, SQL, and shell scripts with higher accuracy than general-purpose models. It understands code context, writes tests when asked, and follows coding style rules from SOUL.md. The 16B variant is worth the extra memory if you have it.
Alternative: Qwen 2.5 Coder 7B if you need a smaller footprint, or Llama 3.1 8B for agents that mix code with non-code tasks.
ollama pull deepseek-coder-v2

Analyst (Radar-type agents)
Qwen 2.5 7B
Qwen excels at structured reasoning and data interpretation. It handles JSON input/output cleanly, follows multi-step analysis instructions, and produces consistent report formats. Its 128K context window means it can process large datasets without truncation.
Alternative: Gemma 2 9B for simpler analytics tasks where you prioritize speed over context length.
ollama pull qwen2.5:7b

PM (Orion-type agents)
Qwen 2.5 7B
Project manager agents need to coordinate tasks, parse status updates, and make routing decisions. Qwen 2.5 handles tool calling and structured output reliably, both critical for PM agents that delegate work to other agents. It follows conditional logic in SOUL.md rules better than alternatives at this size.
Alternative: Llama 3.1 8B if your PM agent focuses more on natural conversation than structured task management.
ollama pull qwen2.5:7b

Scout (Research agents)
Gemma 2 9B
Scout agents summarize web pages, extract key information, and filter results. Gemma 2 is fast, concise, and follows extraction rules tightly. It does not over-explain or add unnecessary context, which is exactly what you want from a research agent that feeds data to other agents.
Alternative: Qwen 2.5 7B for scouts that need to handle longer documents or produce more detailed research summaries.
ollama pull gemma2:9b

SOUL.md Configuration for Free Models
Free models work best with SOUL.md files that are concise and explicit. Cloud models can infer intent from vague instructions. Local 7B models need clear, numbered rules. Here are production-tested SOUL.md patterns for each role.
# Echo — Content Writer
## Role
You write articles and blog posts. You follow
editorial rules exactly. You never add fluff.
## Rules
1. Respond in English only
2. Keep paragraphs under 4 sentences
3. Use active voice exclusively
4. Start articles with a specific claim or stat
5. Never use these phrases: "In today's world",
"It's important to note", "In conclusion"
6. Include the target keyword in the first
100 words and at least one H2 heading
7. Target 1,400-1,800 words per article
8. End with a single actionable next step
## Output Format
- Meta description (max 155 characters)
- H1 title
- Body with H2 sections
- One concluding paragraph with CTA
## Tone
Direct, knowledgeable, conversational.
Write like explaining to a smart colleague.

# Builder — Code Agent
## Role
You write, review, and debug code. You receive
task descriptions and produce working code
with tests.
## Rules
1. Respond with code blocks only — no
explanations unless explicitly asked
2. Use TypeScript by default
3. Add error handling to every function
4. Write at least one test per function
5. Follow the project's existing code style
6. If the task is ambiguous, list assumptions
before writing code
7. Never use deprecated APIs
## Output Format
```typescript
// filename: path/to/file.ts
// Your code here
```
## Tools
- Use File to read existing code before editing
- Use Terminal to run tests after writing code

# Radar — Data Analyst
## Role
You analyze data and produce structured reports.
You read JSON/CSV inputs and output actionable
insights.
## Rules
1. Always output valid JSON when asked for data
2. Include specific numbers — never say "many"
or "significant" without a number
3. Sort findings by impact (highest first)
4. Flag anomalies with a confidence score
5. Keep summaries under 200 words
6. When comparing periods, show absolute and
percentage change
## Output Format
{
"summary": "One-line finding",
"metrics": [...],
"recommendations": [...],
"anomalies": [...]
}
## Tools
- Use File to read data files from workspace/
- Write reports to workspace/reports/

Key principle: Keep SOUL.md files under 15 rules for local models. Cloud models handle 30+ rules gracefully, but 7B models start dropping rules after 15-20 instructions. Prioritize the rules that matter most for your agent's output quality.
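The 15-rule budget is easy to check mechanically. A small sketch — the `count_rules` and `check_rule_budget` helpers are hypothetical, not part of OpenClaw — that counts the numbered rules in a SOUL.md string and flags files that exceed the budget:

```python
import re

RULE_BUDGET = 15  # local 7B models start dropping rules beyond roughly this count

def count_rules(soul_md: str) -> int:
    """Count numbered rules (lines starting '1.', '2.', ...) in SOUL.md text."""
    return len(re.findall(r"^\s*\d+\.\s", soul_md, flags=re.MULTILINE))

def check_rule_budget(soul_md: str) -> str:
    """Return a one-line verdict on whether the file fits the local-model budget."""
    n = count_rules(soul_md)
    if n > RULE_BUDGET:
        return f"{n} rules — trim to {RULE_BUDGET} or fewer for local models"
    return f"{n} rules — within budget"

sample = """## Rules
1. Respond in English only
2. Keep paragraphs under 4 sentences
3. Use active voice exclusively
"""
print(check_rule_budget(sample))  # → 3 rules — within budget
```

Run it over each agent's SOUL.md before deploying to a 7B model; trimming is cheaper than debugging dropped rules in production.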
Performance Tips for Free Models
Running free models well is not just about picking the right one. Configuration matters. These tips come from running production OpenClaw agents on local models.
Use Q4_K_M quantization. Ollama defaults to Q4 quantization for most models, which is the right balance between quality and speed. Avoid Q2 (too much quality loss) and FP16 (too slow and memory-hungry unless you have 48+ GB). For most agent tasks, Q4_K_M is indistinguishable from full precision.
Set context length explicitly. In your OpenClaw config.json, set context_length to what your agent actually needs, not the model maximum. A support bot that handles short questions works fine with 4096 tokens. Setting it to 128K wastes memory and slows down inference. Match context length to your use case.
Keep one model loaded per role. Ollama keeps the last-used model in memory. Switching between models causes a reload that takes 5-15 seconds. If you run multiple agents, assign models so that the most active agents share the same model. Two agents on Qwen 2.5 7B are faster than one on Qwen and one on Mistral, because Ollama keeps Qwen hot in memory.
Use temperature 0.3-0.5 for structured tasks. Free models produce more consistent, rule-following output at lower temperatures. Set temperature to 0.3 for analyst and PM agents that need predictable JSON output. Use 0.7-0.8 for writer agents where you want more natural variation. Never go above 1.0 with local models.
Run Ollama with OLLAMA_NUM_PARALLEL. If multiple agents send requests simultaneously, set the OLLAMA_NUM_PARALLEL environment variable to 2 or 4. This lets Ollama handle concurrent requests instead of queuing them. On machines with 32+ GB memory, parallel inference keeps all agents responsive.
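One way to set it when launching the server manually (a config fragment; if you run Ollama as a managed service or menu-bar app, set the variable in that service's environment instead):

```shell
# Allow Ollama to process up to 4 requests concurrently instead of queuing them
OLLAMA_NUM_PARALLEL=4 ollama serve
```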
# config.json — optimized for local models
{
"models": {
"ollama-qwen": {
"provider": "ollama",
"endpoint": "http://localhost:11434",
"model": "qwen2.5:7b",
"temperature": 0.3,
"context_length": 8192,
"timeout": 120
},
"ollama-mistral": {
"provider": "ollama",
"endpoint": "http://localhost:11434",
"model": "mistral:7b",
"temperature": 0.7,
"context_length": 8192,
"timeout": 120
},
"ollama-deepseek": {
"provider": "ollama",
"endpoint": "http://localhost:11434",
"model": "deepseek-coder-v2",
"temperature": 0.3,
"context_length": 16384,
"timeout": 180
}
},
"default_model": "ollama-qwen"
}
# Assign models to agents
openclaw agents update echo --model ollama-mistral
openclaw agents update builder --model ollama-deepseek
openclaw agents update radar --model ollama-qwen
openclaw agents update orion --model ollama-qwen
openclaw agents update scout --model ollama-qwen

When to Use Cloud Models Instead
Free models handle most routine agent work. But some tasks still benefit from cloud models. Knowing the boundary saves money without sacrificing quality where it matters.
Free Local Models
- FAQ and support responses
- Content drafting and blog posts
- Code generation and debugging
- Data formatting and report generation
- Status updates and routine monitoring
- Single-turn structured tasks
Keep Cloud Models For
- Multi-agent orchestration decisions
- Long-context analysis (over 16K tokens)
- Nuanced strategy and planning
- Complex multi-step reasoning chains
- Ambiguous tasks with edge cases
- Final QA review before publishing
The hybrid approach works well for most teams. Run your orchestrator (Orion-type PM agent) on Claude Sonnet for reliable decision-making. Run all specialist agents on free local models for speed and zero cost. The PM routes tasks, the specialists execute them.
Test Agent Roles in the CrewClaw Playground
Not sure which role fits your use case? The CrewClaw Agent Playground lets you pick a role, customize the SOUL.md, and test how different configurations behave before deploying to your local setup. Build a Writer, Coder, Analyst, PM, or Scout agent and download the complete config package with model recommendations included.
Role Templates
Pre-built SOUL.md configs optimized for each agent role. Rules tuned for 7B model constraints.
Model Config Included
Each package includes config.json with Ollama provider settings. Swap the model name and you are running.
Team Builder
Assemble a multi-agent team with the right model for each role. Download the entire workspace.
Related Guides
OpenClaw vs CrewAI
Compare multi-agent frameworks and find which fits your stack
OpenClaw GitHub Repository Guide
Clone, install, and understand the full OpenClaw codebase
Build an SEO Agent
Create an agent that researches keywords and auto-publishes articles
Run OpenClaw 24/7 on Mac
Keep agents running when your Mac sleeps or locks
Frequently Asked Questions
Can I run OpenClaw agents entirely for free with local models?
Yes. OpenClaw connects to Ollama the same way it connects to any cloud provider. Once you install Ollama and pull a model, the agent runs without any API keys or ongoing costs. The only expense is the electricity to run your machine. A Mac Mini with Apple Silicon or a Linux box with a mid-range GPU can handle multiple agents simultaneously on free models like Llama 3.1, Qwen 2.5, or Mistral.
Which single free model works best if I can only run one?
Qwen 2.5 72B if you have the hardware (48+ GB VRAM or unified memory). It handles writing, reasoning, and code well enough for most agent roles. If you are limited to 8 GB of memory, Qwen 2.5 7B is the strongest all-rounder at that size. It outperforms Llama 3.1 8B on instruction following and structured output, which are the two things that matter most for SOUL.md-driven agents.
How do I switch an existing agent from a cloud model to a free local model?
You do not need to change your SOUL.md at all. The agent identity and rules are model-independent. Just update the model assignment: run openclaw agents update your-agent --model ollama-qwen, where ollama-qwen is a model you configured in config.json pointing to your local Ollama endpoint. The agent will start using the local model on the next message. Test with a few sample tasks before switching production agents.
Do free models support tool use and function calling in OpenClaw?
It depends on the model. Qwen 2.5 and Llama 3.1 both support tool use natively when served through Ollama with the correct template. Mistral 7B has basic tool support. DeepSeek Coder V2 handles code-related tool calls well. Gemma 2 does not support structured tool calling reliably. If your agent relies heavily on tools, stick with Qwen 2.5 or Llama 3.1 as your local model.
Can I mix free local models with paid cloud models in the same team?
Yes, and this is the recommended approach for production teams. Assign free local models to high-volume, routine agents like support bots, content drafters, and data processors. Keep a cloud model like Claude Sonnet for your orchestrator or strategist agent that needs complex reasoning. OpenClaw lets you assign different models per agent in config.json, so each agent uses whatever model fits its role best.
What happens when a free model produces lower quality output than expected?
Three things to check. First, simplify your SOUL.md rules. Local models handle 10-15 clear rules well but struggle with 30+ conditional rules. Second, check context length. If your agent conversations regularly exceed 4K tokens, switch to a model with a larger context window like Qwen 2.5 (128K) or Mistral (32K). Third, try a larger model. Moving from 7B to 14B often solves quality issues without needing cloud APIs.
Build Free-Model Agent Configs in the Playground
Pick your agent roles, get SOUL.md templates optimized for local models, and download a complete workspace with Ollama config included. Test before you deploy.