Models · Ollama · Tutorial · March 2, 2026 · 10 min read

OpenClaw Free Models Guide: Which Model for Which Role

Running OpenClaw agents does not require paid APIs. Free, open-source models running locally through Ollama can handle most agent roles. The challenge is picking the right model for each role. This guide compares Qwen, Llama, Mistral, DeepSeek, and Gemma across five common agent roles and shows you how to configure each one in your SOUL.md.

Why Free Models Matter for OpenClaw Agents

A single OpenClaw agent running on Claude Sonnet costs roughly $0.15-$0.40 per conversation. If you run five agents handling 100 messages per day each (roughly ten conversations per agent), that works out to about $225-$600 per month in API costs alone. Multiply that across a team of agents with heartbeat schedules, and costs climb fast.
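A quick sketch of that arithmetic. The conversation volume and per-conversation rates below are the assumptions from this article; swap in your own numbers:

```shell
# Rough monthly API cost: agents x conversations/day x 30 days x $/conversation
# All four inputs are assumptions to replace with your own usage.
awk -v agents=5 -v conv_per_day=10 -v lo=0.15 -v hi=0.40 \
  'BEGIN { conv = agents * conv_per_day * 30
           printf "%d conversations/month: $%.0f-$%.0f\n", conv, conv * lo, conv * hi }'
# prints: 1500 conversations/month: $225-$600
```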

Free models eliminate that cost entirely. Qwen 2.5 7B running on an M2 Mac Mini responds in under 2 seconds and costs nothing beyond electricity. The quality gap between free 7B models and cloud APIs has narrowed significantly since mid-2025. For structured agent tasks with clear SOUL.md rules, local models now handle 80-90% of workloads that previously required paid APIs.

  • $0 monthly API cost
  • 7+ free model families
  • <2s response on Apple Silicon
  • 128K max context (Qwen 2.5)

Ollama Setup: Get Running in 5 Minutes

Ollama is the standard way to run free models locally. It handles model downloads, GPU memory management, and exposes an OpenAI-compatible API on localhost. OpenClaw connects to it like any other provider.

# Install Ollama (macOS / Linux)
curl -fsSL https://ollama.com/install.sh | sh

# Pull models for different agent roles
ollama pull qwen2.5:7b          # Best all-rounder
ollama pull llama3.1:8b          # General purpose
ollama pull deepseek-coder-v2    # Code specialist
ollama pull mistral:7b           # Writing tasks
ollama pull gemma2:9b            # Lightweight instruction following

# Verify Ollama is running
curl http://localhost:11434/api/tags

# Quick test
ollama run qwen2.5:7b "List 3 steps to deploy an AI agent"

Now configure OpenClaw to use Ollama as a model provider. You can register multiple models and assign each one to a different agent.

# Register models in OpenClaw
openclaw models add ollama-qwen \
  --provider ollama \
  --endpoint http://localhost:11434 \
  --model qwen2.5:7b

openclaw models add ollama-deepseek \
  --provider ollama \
  --endpoint http://localhost:11434 \
  --model deepseek-coder-v2

openclaw models add ollama-mistral \
  --provider ollama \
  --endpoint http://localhost:11434 \
  --model mistral:7b

# Set a default
openclaw models set-default ollama-qwen

# Test the connection
openclaw models test ollama-qwen

Tip: On Apple Silicon Macs, Ollama uses unified memory. An M2 with 16 GB can run a 7B model at 40+ tokens/sec. An M2 Pro with 32 GB handles 14B-34B models comfortably. No external GPU needed.

Free Models Compared by Agent Role

Not every model is good at every task. The table below rates each model family across five common OpenClaw agent roles based on real-world testing. Ratings are for the 7B-9B parameter variants unless noted otherwise.

Model             | Writer | Coder | Analyst | PM | Scout
Qwen 2.5 7B       | A      | A     | A       | A  | B+
Llama 3.1 8B      | B+     | B+    | B       | B+ | B+
Mistral 7B        | A+     | B     | B       | B  | B
DeepSeek Coder V2 | C      | A+    | B+      | C  | C
Gemma 2 9B        | B+     | B     | B+      | B  | A
Phi-3 Mini        | B      | B     | C       | C  | B

Ratings based on: instruction adherence (does the model follow SOUL.md rules consistently?), output structure (does it produce the expected format?), and task accuracy (does it complete the assigned job correctly?). Tested on OpenClaw v0.8+ with Ollama 0.3+.

Best Free Model for Each Agent Role

Here is the specific model recommendation for each role, with the reasoning behind each pick.

Writer (Echo-type agents)

Mistral 7B

Mistral produces the most natural prose among 7B models. It handles tone instructions well, avoids the robotic patterns common in Llama writing, and consistently follows editorial rules from SOUL.md. For long-form content, it maintains coherence across 1,500+ word articles.

Alternative: Qwen 2.5 7B for multilingual writing or when you need structured output (JSON, tables) alongside prose.

ollama pull mistral:7b

Coder (Builder-type agents)

DeepSeek Coder V2

Purpose-built for code generation. It handles Python, TypeScript, SQL, and shell scripts with higher accuracy than general-purpose models. Understands code context, writes tests when asked, and follows coding style rules from SOUL.md. The 16B variant is worth the extra memory if you have it.

Alternative: Qwen 2.5 Coder 7B if you need a smaller footprint, or Llama 3.1 8B for agents that mix code with non-code tasks.

ollama pull deepseek-coder-v2

Analyst (Radar-type agents)

Qwen 2.5 7B

Qwen excels at structured reasoning and data interpretation. It handles JSON input/output cleanly, follows multi-step analysis instructions, and produces consistent report formats. Its 128K context window means it can process large datasets without truncation.

Alternative: Gemma 2 9B for simpler analytics tasks where you prioritize speed over context length.

ollama pull qwen2.5:7b
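One way to exercise that structured output is Ollama's JSON mode: passing "format": "json" in an /api/chat request constrains the model to emit valid JSON. A minimal sketch; the prompt content is illustrative, and the curl line needs a running Ollama server:

```shell
# Build a /api/chat request that forces JSON output via Ollama's JSON mode
PAYLOAD='{
  "model": "qwen2.5:7b",
  "format": "json",
  "stream": false,
  "messages": [
    {"role": "user",
     "content": "Report as JSON with keys summary, metrics: revenue +12%, churn -2%"}
  ]
}'

# Sanity-check the request body locally before sending
echo "$PAYLOAD" | python3 -m json.tool > /dev/null && echo "payload ok"

# Send it (requires Ollama running on localhost):
# curl -s http://localhost:11434/api/chat -d "$PAYLOAD"
```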

PM (Orion-type agents)

Qwen 2.5 7B

Project manager agents need to coordinate tasks, parse status updates, and make routing decisions. Qwen 2.5 handles tool calling and structured output reliably, both critical for PM agents that delegate work to other agents. It follows conditional logic in SOUL.md rules better than alternatives at this size.

Alternative: Llama 3.1 8B if your PM agent focuses more on natural conversation than structured task management.

ollama pull qwen2.5:7b
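Tool calling goes through the same endpoint: Ollama's /api/chat accepts a tools array of JSON-schema function definitions, which Qwen 2.5 supports. A sketch with a hypothetical assign_task tool (the tool name and schema are illustrative; the curl line needs a running Ollama):

```shell
# A /api/chat request exposing one hypothetical tool the PM agent can call
PAYLOAD='{
  "model": "qwen2.5:7b",
  "stream": false,
  "messages": [{"role": "user", "content": "Route this bug report to the right agent"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "assign_task",
      "description": "Assign a task to another agent",
      "parameters": {
        "type": "object",
        "properties": {
          "agent": {"type": "string"},
          "task": {"type": "string"}
        },
        "required": ["agent", "task"]
      }
    }
  }]
}'

# Validate the request body locally before sending
echo "$PAYLOAD" | python3 -m json.tool > /dev/null && echo "payload ok"

# curl -s http://localhost:11434/api/chat -d "$PAYLOAD"
```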

Scout (Research agents)

Gemma 2 9B

Scout agents summarize web pages, extract key information, and filter results. Gemma 2 is fast, concise, and follows extraction rules tightly. It does not over-explain or add unnecessary context, which is exactly what you want from a research agent that feeds data to other agents.

Alternative: Qwen 2.5 7B for scouts that need to handle longer documents or produce more detailed research summaries.

ollama pull gemma2:9b

SOUL.md Configuration for Free Models

Free models work best with SOUL.md files that are concise and explicit. Cloud models can infer intent from vague instructions. Local 7B models need clear, numbered rules. Here are production-tested SOUL.md patterns for each role.

agents/echo-local/SOUL.md — Writer on Mistral 7B
# Echo — Content Writer

## Role
You write articles and blog posts. You follow
editorial rules exactly. You never add fluff.

## Rules
1. Respond in English only
2. Keep paragraphs under 4 sentences
3. Use active voice exclusively
4. Start articles with a specific claim or stat
5. Never use these phrases: "In today's world",
   "It's important to note", "In conclusion"
6. Include the target keyword in the first
   100 words and at least one H2 heading
7. Target 1,400-1,800 words per article
8. End with a single actionable next step

## Output Format
- Meta description (max 155 characters)
- H1 title
- Body with H2 sections
- One concluding paragraph with CTA

## Tone
Direct, knowledgeable, conversational.
Write like explaining to a smart colleague.

agents/builder-local/SOUL.md — Coder on DeepSeek Coder V2
# Builder — Code Agent

## Role
You write, review, and debug code. You receive
task descriptions and produce working code
with tests.

## Rules
1. Respond with code blocks only — no
   explanations unless explicitly asked
2. Use TypeScript by default
3. Add error handling to every function
4. Write at least one test per function
5. Follow the project's existing code style
6. If the task is ambiguous, list assumptions
   before writing code
7. Never use deprecated APIs

## Output Format
```typescript
// filename: path/to/file.ts
// Your code here
```

## Tools
- Use File to read existing code before editing
- Use Terminal to run tests after writing code

agents/radar-local/SOUL.md — Analyst on Qwen 2.5 7B
# Radar — Data Analyst

## Role
You analyze data and produce structured reports.
You read JSON/CSV inputs and output actionable
insights.

## Rules
1. Always output valid JSON when asked for data
2. Include specific numbers — never say "many"
   or "significant" without a number
3. Sort findings by impact (highest first)
4. Flag anomalies with a confidence score
5. Keep summaries under 200 words
6. When comparing periods, show absolute and
   percentage change

## Output Format
{
  "summary": "One-line finding",
  "metrics": [...],
  "recommendations": [...],
  "anomalies": [...]
}

## Tools
- Use File to read data files from workspace/
- Write reports to workspace/reports/

Key principle: Keep SOUL.md files under 15 rules for local models. Cloud models handle 30+ rules gracefully, but 7B models start dropping rules after 15-20 instructions. Prioritize the rules that matter most for your agent's output quality.

Performance Tips for Free Models

Running free models well is not just about picking the right one. Configuration matters. These tips come from running production OpenClaw agents on local models.

Use Q4_K_M quantization. Ollama defaults to Q4 quantization for most models, which is the right balance between quality and speed. Avoid Q2 (too much quality loss) and FP16 (too slow and memory-hungry unless you have 48+ GB). For most agent tasks, Q4_K_M is indistinguishable from full precision.
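To pin a quantization explicitly instead of taking the default tag, pull a tagged variant. A sketch that picks a tag by memory budget; the tag names follow Ollama's library convention for qwen2.5 but vary per model, so verify them on the model's library page:

```shell
# Choose a quantization tag for qwen2.5 based on available memory (GB)
MEM_GB=16
if   [ "$MEM_GB" -ge 48 ]; then TAG="qwen2.5:7b-instruct-fp16"
elif [ "$MEM_GB" -ge 8  ]; then TAG="qwen2.5:7b-instruct-q4_K_M"
else                            TAG="qwen2.5:7b-instruct-q2_K"
fi
echo "ollama pull $TAG"
# prints: ollama pull qwen2.5:7b-instruct-q4_K_M
```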

Set context length explicitly. In your OpenClaw config.json, set context_length to what your agent actually needs, not the model maximum. A support bot that handles short questions works fine with 4096 tokens. Setting it to 128K wastes memory and slows down inference. Match context length to your use case.

Keep one model loaded per role. Ollama keeps the last-used model in memory. Switching between models causes a reload that takes 5-15 seconds. If you run multiple agents, assign models so that the most active agents share the same model. Two agents on Qwen 2.5 7B are faster than one on Qwen and one on Mistral, because Ollama keeps Qwen hot in memory.

Use temperature 0.3-0.5 for structured tasks. Free models produce more consistent, rule-following output at lower temperatures. Set temperature to 0.3 for analyst and PM agents that need predictable JSON output. Use 0.7-0.8 for writer agents where you want more natural variation. Never go above 1.0 with local models.

Run Ollama with OLLAMA_NUM_PARALLEL. If multiple agents send requests simultaneously, set the OLLAMA_NUM_PARALLEL environment variable to 2 or 4. This lets Ollama handle concurrent requests instead of queuing them. On machines with 32+ GB memory, parallel inference keeps all agents responsive.
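A sketch of the relevant environment settings. The values are illustrative; OLLAMA_MAX_LOADED_MODELS is worth setting alongside if you run more than one model:

```shell
# Serve up to 4 concurrent requests and keep 2 models resident in memory
export OLLAMA_NUM_PARALLEL=4
export OLLAMA_MAX_LOADED_MODELS=2
echo "parallel=$OLLAMA_NUM_PARALLEL max_loaded=$OLLAMA_MAX_LOADED_MODELS"

# Restart the server so the settings take effect:
# ollama serve
```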

# config.json — optimized for local models
{
  "models": {
    "ollama-qwen": {
      "provider": "ollama",
      "endpoint": "http://localhost:11434",
      "model": "qwen2.5:7b",
      "temperature": 0.3,
      "context_length": 8192,
      "timeout": 120
    },
    "ollama-mistral": {
      "provider": "ollama",
      "endpoint": "http://localhost:11434",
      "model": "mistral:7b",
      "temperature": 0.7,
      "context_length": 8192,
      "timeout": 120
    },
    "ollama-deepseek": {
      "provider": "ollama",
      "endpoint": "http://localhost:11434",
      "model": "deepseek-coder-v2",
      "temperature": 0.3,
      "context_length": 16384,
      "timeout": 180
    }
  },
  "default_model": "ollama-qwen"
}

# Assign models to agents
openclaw agents update echo --model ollama-mistral
openclaw agents update builder --model ollama-deepseek
openclaw agents update radar --model ollama-qwen
openclaw agents update orion --model ollama-qwen
openclaw agents update scout --model ollama-qwen

When to Use Cloud Models Instead

Free models handle most routine agent work. But some tasks still benefit from cloud models. Knowing the boundary saves money without sacrificing quality where it matters.

Free Local Models

  • FAQ and support responses
  • Content drafting and blog posts
  • Code generation and debugging
  • Data formatting and report generation
  • Status updates and routine monitoring
  • Single-turn structured tasks

Keep Cloud Models For

  • Multi-agent orchestration decisions
  • Long-context analysis (over 16K tokens)
  • Nuanced strategy and planning
  • Complex multi-step reasoning chains
  • Ambiguous tasks with edge cases
  • Final QA review before publishing

The hybrid approach works well for most teams. Run your orchestrator (Orion-type PM agent) on Claude Sonnet for reliable decision-making. Run all specialist agents on free local models for speed and zero cost. The PM routes tasks, the specialists execute them.

Test Agent Roles in the CrewClaw Playground

Not sure which role fits your use case? The CrewClaw Agent Playground lets you pick a role, customize the SOUL.md, and test how different configurations behave before deploying to your local setup. Build a Writer, Coder, Analyst, PM, or Scout agent and download the complete config package with model recommendations included.

Role Templates

Pre-built SOUL.md configs optimized for each agent role. Rules tuned for 7B model constraints.

Model Config Included

Each package includes config.json with Ollama provider settings. Swap the model name and you are running.

Team Builder

Assemble a multi-agent team with the right model for each role. Download the entire workspace.

Frequently Asked Questions

Can I run OpenClaw agents entirely for free with local models?

Yes. OpenClaw connects to Ollama the same way it connects to any cloud provider. Once you install Ollama and pull a model, the agent runs without any API keys or ongoing costs. The only expense is the electricity to run your machine. A Mac Mini with Apple Silicon or a Linux box with a mid-range GPU can handle multiple agents simultaneously on free models like Llama 3.1, Qwen 2.5, or Mistral.

Which single free model works best if I can only run one?

Qwen 2.5 72B if you have the hardware (48+ GB VRAM or unified memory). It handles writing, reasoning, and code well enough for most agent roles. If you are limited to 8 GB of memory, Qwen 2.5 7B is the strongest all-rounder at that size. It outperforms Llama 3.1 8B on instruction following and structured output, which are the two things that matter most for SOUL.md-driven agents.

How do I switch an existing agent from a cloud model to a free local model?

You do not need to change your SOUL.md at all. The agent identity and rules are model-independent. Just update the model assignment: run openclaw agents update your-agent --model ollama-qwen, where ollama-qwen is a model you configured in config.json pointing to your local Ollama endpoint. The agent will start using the local model on the next message. Test with a few sample tasks before switching production agents.

Do free models support tool use and function calling in OpenClaw?

It depends on the model. Qwen 2.5 and Llama 3.1 both support tool use natively when served through Ollama with the correct template. Mistral 7B has basic tool support. DeepSeek Coder V2 handles code-related tool calls well. Gemma 2 does not support structured tool calling reliably. If your agent relies heavily on tools, stick with Qwen 2.5 or Llama 3.1 as your local model.

Can I mix free local models with paid cloud models in the same team?

Yes, and this is the recommended approach for production teams. Assign free local models to high-volume, routine agents like support bots, content drafters, and data processors. Keep a cloud model like Claude Sonnet for your orchestrator or strategist agent that needs complex reasoning. OpenClaw lets you assign different models per agent in config.json, so each agent uses whatever model fits its role best.

What happens when a free model produces lower quality output than expected?

Three things to check. First, simplify your SOUL.md rules. Local models handle 10-15 clear rules well but struggle with 30+ conditional rules. Second, check context length. If your agent conversations regularly exceed 4K tokens, switch to a model with a larger context window like Qwen 2.5 (128K) or Mistral (32K). Third, try a larger model. Moving from 7B to 14B often solves quality issues without needing cloud APIs.

Build Free-Model Agent Configs in the Playground

Pick your agent roles, get SOUL.md templates optimized for local models, and download a complete workspace with Ollama config included. Test before you deploy.
