Tutorial · Ollama · February 25, 2026 · 8 min read

Run OpenClaw Locally with Ollama: Free Setup Guide

Zero API costs. Full privacy. Run your AI agents entirely on your own hardware using OpenClaw and Ollama. This guide covers hardware requirements, model selection, configuration, and a hybrid approach that combines local speed with cloud intelligence.

Why Run OpenClaw with Ollama

Every API call to Claude, GPT-4o, or Gemini costs money. For agents that handle hundreds of messages per day, those costs add up fast. Ollama lets you run open-source LLMs on your own machine, and OpenClaw connects to Ollama the same way it connects to any cloud provider. The result: AI agents that run for free, respond instantly, and never send your data to a third party.

This is not a toy setup. Local models like Llama 3.1, Mistral, and Gemma have reached a quality level where they handle most routine agent tasks well. Answering questions, following SOUL.md rules, drafting content, processing structured data. For these tasks, a local 7B or 13B model running on a mid-range GPU delivers responses in under 2 seconds with zero ongoing cost.

  • $0 API costs per month
  • 100% of data stays local
  • <2s response time (GPU)
  • Offline: works without internet

Hardware Requirements and Ollama Installation

Running local models is GPU-bound. The more VRAM you have, the larger the model you can run and the faster it responds. Here is what you need at minimum, and what is recommended for a good experience.

Setup Level | GPU / VRAM | Max Model Size | Speed
Minimum | No GPU (CPU only) | 7B (Q4 quantized) | 5-15 tokens/sec
Good | 6-8 GB VRAM (RTX 3060, M1) | 7B-13B | 30-60 tokens/sec
Recommended | 12-16 GB VRAM (RTX 4070, M2 Pro) | 13B-34B | 40-80 tokens/sec
Power User | 24-48 GB VRAM (RTX 4090, M2 Ultra) | 70B | 20-40 tokens/sec

Apple Silicon Macs are excellent for local inference because the CPU and GPU share unified memory. An M1 with 16 GB can run 13B models at full speed. An M2 Ultra with 192 GB can run 70B models with room to spare.

Install Ollama

# macOS / Linux: one-line install
curl -fsSL https://ollama.com/install.sh | sh

# Verify installation
ollama --version

# Pull your first model (Llama 3.1 8B, ~4.7 GB download)
ollama pull llama3.1

# Test it works
ollama run llama3.1 "Hello, are you running locally?"

# Ollama runs as a background service on port 11434
# Verify the API is accessible
curl http://localhost:11434/api/tags

Tip: On Windows, download the installer from ollama.com/download. Ollama runs natively on Windows with full GPU support for NVIDIA cards. AMD GPU support is available on Linux through ROCm.
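If you want to see what OpenClaw talks to under the hood, you can hit the Ollama REST API directly. A minimal sketch, assuming Ollama is running on the default port; the payload shape follows the Ollama `/api/generate` endpoint:

```shell
# Build a generate request; "stream": false returns one JSON object
# instead of a stream of partial responses.
PAYLOAD='{"model": "llama3.1", "prompt": "Say hello in five words.", "stream": false}'

# Send it; the reply JSON carries the model output in a "response" field.
curl -s http://localhost:11434/api/generate \
  -d "$PAYLOAD" || echo "Ollama is not running on port 11434"
```

This is the same endpoint OpenClaw uses when you point it at http://localhost:11434, so it is a quick way to rule out Ollama-side problems before debugging your agent config.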

Configure OpenClaw for Ollama

OpenClaw treats Ollama as any other model provider. You point it at the local Ollama endpoint and specify which model to use. No API keys required.

# Option 1: Configure via CLI
openclaw models add ollama \
  --provider ollama \
  --endpoint http://localhost:11434 \
  --model llama3.1

# Set Ollama as the default provider
openclaw models set-default ollama

# Verify the connection
openclaw models test ollama

You can also configure it directly in the config file for more control over parameters.

# Option 2: Edit config.json directly
# Location: ~/.openclaw/config.json

{
  "models": {
    "ollama-llama": {
      "provider": "ollama",
      "endpoint": "http://localhost:11434",
      "model": "llama3.1",
      "temperature": 0.7,
      "context_length": 8192,
      "timeout": 120
    }
  },
  "default_model": "ollama-llama"
}

# Create an agent that uses the local model
mkdir -p ~/agents/local-assistant
cat > ~/agents/local-assistant/SOUL.md << 'EOF'
# Local Assistant

## Identity
You are a helpful assistant running on local hardware via Ollama.
You respond quickly and never send data to external services.

## Rules
- Be concise and direct
- If a question is beyond your knowledge, say so honestly
- Keep responses under 150 words for simple questions
- Use markdown formatting for structured answers

## Tone
Friendly, efficient, and straightforward.
EOF

# Register and test
openclaw agents add local-assistant \
  --workspace ~/agents/local-assistant \
  --model ollama-llama \
  --non-interactive

openclaw agent --agent local-assistant \
  --message "Summarize what you can do"

Best Ollama Models for AI Agents

Not all local models are equal. Different models excel at different agent tasks. Here is a practical breakdown based on real testing with OpenClaw agents.

Model | Size | VRAM Needed | Best For
Llama 3.1 8B | 4.7 GB | 6 GB | General-purpose agents, Q&A, structured tasks
Mistral 7B | 4.1 GB | 6 GB | Writing agents, content drafting, natural prose
Gemma 2 9B | 5.4 GB | 8 GB | Instruction-following, concise responses
CodeGemma 7B | 4.8 GB | 6 GB | Code generation, DevOps agents, scripting
Llama 3.1 70B | 39 GB | 48 GB | Complex reasoning, near-cloud quality
Phi-3 Mini | 2.3 GB | 4 GB | Lightweight tasks, low-resource machines

# Pull models for different agent roles
ollama pull llama3.1        # General-purpose
ollama pull mistral         # Writing tasks
ollama pull codegemma       # Code and DevOps
ollama pull gemma2          # Instruction-following
ollama pull phi3            # Lightweight / low-resource

# List downloaded models
ollama list

# Check model details (size, quantization, parameters)
ollama show llama3.1

Tip: Start with Llama 3.1 8B. It is the best all-round model for agent use cases. Only switch to a specialized model if you notice quality issues for a specific task type.
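If you script your agent setup, a small helper keeps the role-to-model mapping in one place. A sketch using the models pulled above; the role names are illustrative, so adjust them to your own agents:

```shell
# Map an agent role to one of the pulled Ollama models.
# Unknown roles fall back to the general-purpose default.
model_for_role() {
  case "$1" in
    writer)  echo "mistral" ;;      # natural prose
    coder)   echo "codegemma" ;;    # code and DevOps
    light)   echo "phi3" ;;         # low-resource machines
    *)       echo "llama3.1" ;;     # general-purpose default
  esac
}

model_for_role coder    # → codegemma
model_for_role support  # → llama3.1 (fallback)
```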

Hybrid Approach: Local + Cloud

The most practical setup is not purely local or purely cloud. It is hybrid. Use local models for high-volume, routine tasks where speed and cost matter. Use cloud models for complex reasoning where quality matters. OpenClaw lets you assign different providers to different agents.

# config.json with both local and cloud providers
{
  "models": {
    "ollama-llama": {
      "provider": "ollama",
      "endpoint": "http://localhost:11434",
      "model": "llama3.1",
      "temperature": 0.7
    },
    "ollama-mistral": {
      "provider": "ollama",
      "endpoint": "http://localhost:11434",
      "model": "mistral",
      "temperature": 0.8
    },
    "claude-haiku": {
      "provider": "anthropic",
      "model": "claude-3-haiku-20240307",
      "api_key": "sk-ant-..."
    },
    "claude-sonnet": {
      "provider": "anthropic",
      "model": "claude-sonnet-4-20250514",
      "api_key": "sk-ant-..."
    }
  },
  "default_model": "ollama-llama"
}

# Assign different models to different agents
openclaw agents update support-bot --model ollama-llama
openclaw agents update content-writer --model ollama-mistral
openclaw agents update strategist --model claude-sonnet
openclaw agents update quick-responder --model claude-haiku

# Each agent uses its assigned model automatically
openclaw agent --agent support-bot --message "Reset password steps"
# ^ Runs locally, free, instant

openclaw agent --agent strategist --message "Analyze Q1 growth plan"
# ^ Uses Claude Sonnet for complex reasoning

Use Local Models For

  • FAQ and support responses
  • Content drafting and first passes
  • Data formatting and extraction
  • Status checks and routine monitoring
  • High-volume, repetitive tasks

Use Cloud Models For

  • Multi-step reasoning chains
  • Long-context analysis (over 8K tokens)
  • Complex decision-making
  • Agent-to-agent orchestration
  • Tasks requiring latest knowledge
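The split above can be automated with a simple routing heuristic. A minimal sketch, assuming message length is a usable proxy for task complexity (real routing would also consider the task type); the model names come from the hybrid config shown earlier:

```shell
# Pick a configured model name based on rough message size.
# Word count is a crude complexity proxy: long inputs go to the cloud.
pick_model() {
  msg="$1"
  words=$(echo "$msg" | wc -w | tr -d ' ')
  if [ "$words" -gt 500 ]; then
    echo "claude-sonnet"   # long or complex → cloud reasoning
  else
    echo "ollama-llama"    # routine → local, free, instant
  fi
}

pick_model "Reset password steps"   # → ollama-llama
```

In practice you would call this before `openclaw agent` and pass the result via the agent's model assignment, but the threshold and proxy metric are yours to tune.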

Limitations: When Local Models Fall Short

Local models are not a complete replacement for cloud APIs. Being honest about the trade-offs helps you build a setup that actually works in production.

Multi-step agent chains. When Agent A passes context to Agent B, which then decides what Agent C should do, local 7B models lose track of the overall goal. They handle single-turn tasks well but struggle with complex orchestration that requires maintaining state across multiple handoffs. Use cloud models for your orchestrator agent if you run multi-agent workflows.

Long context windows. Most local models cap out at 8K-32K tokens of context. Cloud models like Claude offer 200K tokens. If your agent needs to process long documents, analyze large codebases, or maintain extensive conversation history, local models will truncate or hallucinate. Keep agent interactions short and focused when using local models.
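Before sending a long prompt to a local model, it helps to sanity-check its size. A rough sketch using the common heuristic of about 4 tokens per 3 English words; real tokenizers vary by model, so treat this as an estimate only:

```shell
# Estimate token count of a prompt file (~4 tokens per 3 words, heuristic).
estimate_tokens() {
  words=$(wc -w < "$1" | tr -d ' ')
  echo $(( words * 4 / 3 ))
}

# Example: check a prompt against a typical 8K local context window.
printf 'Summarize the attached report in three bullet points.' > /tmp/prompt.txt
tokens=$(estimate_tokens /tmp/prompt.txt)
[ "$tokens" -lt 8192 ] && echo "fits local 8K context"
```

If the estimate approaches the model's window, route the task to a cloud model or split the input rather than letting the local model silently truncate.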

Nuanced instruction following. A well-written SOUL.md with 10-15 rules works fine with local models. But if your agent config has dozens of edge-case rules, conditional behaviors, and complex persona requirements, local models will miss subtleties that Claude or GPT-4o handle naturally. Simplify your SOUL.md rules when targeting local inference.

Knowledge cutoff and freshness. Local models have a fixed training cutoff. They do not know about recent events, new APIs, or updated documentation. If your agent needs current information, pair it with a web search tool or use a cloud model that has a more recent training date.

Generate Ollama-Ready SOUL.md Configs with CrewClaw

Writing agent configs from scratch takes time. The CrewClaw Generator includes pre-built templates optimized for local model constraints: shorter system prompts, simpler rule structures, and focused skill sets that work well within 8K context windows.

Optimized for Local

SOUL.md templates designed for 7B-13B models. Concise rules, focused skills, and context-efficient prompts that keep token usage low.

Provider Config Included

Each generated package includes a config.json pre-configured for Ollama. Just change the model name if you want a different one.

Hybrid Templates

Multi-agent team configs with routing rules. Local models for routine agents, cloud provider slots for your reasoning agent.

Frequently Asked Questions

Can I run OpenClaw agents entirely offline with Ollama?

Yes, once Ollama has downloaded a model, everything runs locally without an internet connection. OpenClaw sends prompts to the Ollama endpoint on localhost:11434 and receives responses without touching any external server. This is ideal for air-gapped environments, private data processing, and situations where you want full control over your AI stack. The only time you need internet is for the initial Ollama and model downloads.

How much VRAM do I need to run local models for OpenClaw agents?

It depends on the model size. A 7B-8B parameter model like Llama 3.1 8B or Mistral 7B needs about 4-6 GB of VRAM and runs well on most modern GPUs. A 13B model needs 8-10 GB. For the best agent performance with a 70B model, you need 40+ GB of VRAM. If you do not have a GPU, Ollama falls back to CPU inference, which works but is significantly slower (10-30x). For most agent tasks, a 7B or 13B model on a mid-range GPU provides a good balance of speed and quality.
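The VRAM figures above follow a simple rule of thumb: parameter count times bytes per weight, plus headroom for the KV cache and activations. A back-of-the-envelope sketch; the 20% overhead factor is an assumption, and real usage varies with context length and quantization:

```shell
# Rough VRAM estimate in GB: params (billions) × bytes per weight × 1.2
# overhead. Use 0.5 bytes/param for Q4, 1 for Q8, 2 for FP16.
estimate_vram_gb() {
  params_b=$1
  bytes_per_param=$2
  awk -v p="$params_b" -v b="$bytes_per_param" 'BEGIN { printf "%.1f", p * b * 1.2 }'
}

estimate_vram_gb 8 0.5    # Llama 3.1 8B at Q4 → ~4.8 GB
estimate_vram_gb 70 0.5   # 70B at Q4 → ~42 GB
```

These estimates line up with the hardware table earlier in the guide: Q4-quantized 7B-8B models fit in 6 GB cards, while 70B models need workstation-class VRAM.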

Which local model works best for OpenClaw SOUL.md agents?

For general-purpose agents, Llama 3.1 8B is the best starting point. It follows instructions well, handles structured outputs, and runs fast on modest hardware. For writing-heavy agents like content creators or documentation bots, Mistral 7B produces more natural prose. For coding agents, CodeGemma or DeepSeek Coder are better choices. The hybrid approach works best for most teams: use local models for routine tasks and fall back to cloud APIs for complex multi-step reasoning.

Is the quality of local models good enough for production agents?

For single-turn tasks like answering questions, summarizing text, drafting short content, and following structured rules from SOUL.md, local 7B-13B models perform surprisingly well. Where they fall short is in complex multi-step reasoning chains, long-context tasks (over 8K tokens), and nuanced decision-making. If your agent mostly handles routine interactions with clear rules, local models are production-ready. For agents that need to reason through ambiguous situations, cloud models like Claude or GPT-4o still have a significant edge.

Can I switch between Ollama and cloud providers without changing my SOUL.md?

Yes. The SOUL.md file defines your agent's identity, rules, and behavior. It is completely independent of the model provider. You configure the model provider in OpenClaw's config.json file, not in the SOUL.md. This means you can test with Ollama locally, then deploy to production with Anthropic Claude, without touching your agent configuration. The hybrid setup in this guide shows how to route different agents to different providers automatically.

Build Ollama-Ready Agent Configs in Seconds

Use the CrewClaw generator to create SOUL.md configs optimized for local models. Pick a role, customize the rules, and download a complete package with Ollama provider config included. The generator is free to use.