Tutorial · Ollama · Networking · March 2, 2026 · 10 min read

OpenClaw + Remote Ollama Setup Guide

Run Ollama on a dedicated GPU server and connect OpenClaw from any machine on your network. This guide covers network configuration, firewall rules, permission fixes, Docker deployment, and performance tuning for remote LLM inference.

Why Run Ollama on a Separate Machine

Running Ollama locally works fine for testing, but production agent setups benefit from separating the inference server from the machine running OpenClaw. There are three main reasons to use a remote Ollama instance.

GPU Resource Isolation

LLM inference saturates GPU memory and compute. Running it on a dedicated machine means your development workstation stays responsive. No more frozen UI while your agent generates a response.

Always-On Inference

A dedicated server can run 24/7 without interrupting your workflow. Restart your laptop, update your OS, or switch machines freely while agents keep running against the remote Ollama endpoint.

Shared Access

Multiple OpenClaw installations across different machines can all point to the same Ollama server. One GPU serves your entire team or all your devices without duplicating model downloads.

Common setups include a desktop with an NVIDIA GPU acting as the inference server while a MacBook runs OpenClaw agents, a NAS or home server with a GPU card, or a cloud VPS with GPU access. The setup process is the same regardless of hardware.

Expose the Ollama API on Your Network

By default, Ollama only listens on localhost:11434. To accept connections from other machines, you need to set the OLLAMA_HOST environment variable to 0.0.0.0. This tells Ollama to bind to all network interfaces.

On the Ollama server machine:

# Option 1: Set the environment variable before starting Ollama
export OLLAMA_HOST=0.0.0.0
ollama serve

# Option 2: Run inline (one-liner)
OLLAMA_HOST=0.0.0.0 ollama serve

# Option 3: Make it permanent in your shell profile
echo 'export OLLAMA_HOST=0.0.0.0' >> ~/.bashrc
source ~/.bashrc
ollama serve

If Ollama was installed as a systemd service (the default on Linux), you need to edit the service file instead.

For systemd-managed Ollama (Linux):

# Edit the systemd service override
sudo systemctl edit ollama

# Add these lines in the editor that opens:
[Service]
Environment="OLLAMA_HOST=0.0.0.0"

# Save and restart
sudo systemctl daemon-reload
sudo systemctl restart ollama

# Verify it is listening on all interfaces
ss -tlnp | grep 11434
# Expected output: 0.0.0.0:11434 (not 127.0.0.1:11434)

On macOS, if Ollama is running as a desktop application, you set the environment variable using launchctl.

For macOS Ollama app:

# Set the environment variable for the Ollama app
launchctl setenv OLLAMA_HOST "0.0.0.0"

# Restart the Ollama app (quit and reopen)
# Then verify from another machine:
curl http://SERVER_IP:11434/api/tags

Important: Find your server's local IP with ip addr show (Linux) or ipconfig getifaddr en0 (macOS). You will need this IP for the OpenClaw config on your client machine.

Configure OpenClaw to Use Remote Ollama

On the machine running OpenClaw (your laptop, workstation, or another server), point the Ollama provider endpoint to the remote server's IP address instead of localhost.

CLI configuration:

# Replace 192.168.1.100 with your Ollama server's IP
openclaw models add remote-ollama \
  --provider ollama \
  --endpoint http://192.168.1.100:11434 \
  --model llama3.1

# Set it as the default model
openclaw models set-default remote-ollama

# Test the connection
openclaw models test remote-ollama

# If the test succeeds, you'll see:
# ✓ Connected to ollama at 192.168.1.100:11434
# ✓ Model llama3.1 is available
# ✓ Response time: 1.2s

For more control, edit the config file directly.

~/.openclaw/config.json on the client machine:

{
  "models": {
    "remote-ollama": {
      "provider": "ollama",
      "endpoint": "http://192.168.1.100:11434",
      "model": "llama3.1",
      "temperature": 0.7,
      "context_length": 8192,
      "timeout": 120
    },
    "remote-ollama-code": {
      "provider": "ollama",
      "endpoint": "http://192.168.1.100:11434",
      "model": "codegemma",
      "temperature": 0.3,
      "context_length": 8192,
      "timeout": 120
    }
  },
  "default_model": "remote-ollama"
}

Create an agent that uses the remote model:

# Register an agent with the remote model
openclaw agents add remote-assistant \
  --workspace ~/agents/remote-assistant \
  --model remote-ollama \
  --non-interactive

# Test it end-to-end
openclaw agent --agent remote-assistant \
  --message "What model are you running on?"

# The agent processes on your laptop,
# but inference happens on the remote GPU server

Fix Permission and Connection Errors

A common question on Reddit and forums: "I set OLLAMA_HOST=0.0.0.0 but still can't connect from another machine." Here is a systematic checklist to diagnose and fix the most common permission and connection issues.

Step 1: Verify Ollama is actually listening on 0.0.0.0. After setting the environment variable and restarting, run this on the server.

# Linux
ss -tlnp | grep 11434
# Should show: LISTEN 0 ... 0.0.0.0:11434

# macOS
lsof -i :11434
# Should show: ollama ... *:11434 (LISTEN)

# If it shows 127.0.0.1:11434, the env var was not picked up.
# Common causes:
# - You edited .bashrc but Ollama runs as a systemd service
# - On macOS, the app doesn't read shell profile vars
# - The env var was set AFTER Ollama started

Step 2: Test from the server itself. Before testing remotely, confirm the API works locally.

# On the server
curl http://localhost:11434/api/tags

# Should return JSON with your models:
# {"models":[{"name":"llama3.1:latest",...}]}

# If this fails, Ollama is not running.
# Restart it:
sudo systemctl restart ollama   # Linux systemd
# or
OLLAMA_HOST=0.0.0.0 ollama serve  # Manual start

Step 3: Test from the client machine. Now try reaching the server from the OpenClaw machine.

# From the client machine (replace with your server IP)
curl http://192.168.1.100:11434/api/tags

# If this hangs or times out: firewall is blocking the port
# If "connection refused": Ollama is on localhost only
# If it returns the model list: network is fine,
# proceed to configure OpenClaw

Step 4: Check for CORS or binding issues. Some setups require additional environment variables.

# If you get CORS errors (rare, but happens with
# web-based tools hitting the Ollama API):
export OLLAMA_ORIGINS="*"

# Full set of env vars for maximum compatibility:
export OLLAMA_HOST=0.0.0.0
export OLLAMA_ORIGINS="*"
ollama serve

Firewall and Security Configuration

Opening port 11434 without restrictions is fine on a trusted home network. For anything else, you need firewall rules. Ollama has no built-in authentication, so the firewall is your first line of defense.

UFW (Ubuntu/Debian):

# Allow only your OpenClaw client's IP
sudo ufw allow from 192.168.1.50 to any port 11434

# Or allow your entire local subnet
sudo ufw allow from 192.168.1.0/24 to any port 11434

# Verify the rule
sudo ufw status numbered

# NEVER do this on a public server:
# sudo ufw allow 11434  <-- allows the entire internet

iptables (advanced):

# Allow a specific IP
sudo iptables -A INPUT -p tcp --dport 11434 \
  -s 192.168.1.50 -j ACCEPT

# Block all other access to the port
sudo iptables -A INPUT -p tcp --dport 11434 -j DROP

# Save rules so they persist across reboots (requires the
# iptables-persistent package on Debian/Ubuntu). Note: a plain
# "sudo iptables-save > file" fails because the redirect runs
# as your user, so pipe through tee instead.
sudo iptables-save | sudo tee /etc/iptables/rules.v4

macOS firewall (pf):

# macOS uses pf (packet filter) for advanced rules.
# pf evaluates rules last-match-wins, so the broad block rule
# must come first and the narrower pass rule after it.
# Edit /etc/pf.conf and add:
block in on en0 proto tcp from any to any port 11434
pass in on en0 proto tcp from 192.168.1.0/24 to any port 11434

# Reload the rules:
sudo pfctl -f /etc/pf.conf
sudo pfctl -e

Security note: For production setups on cloud servers, put Nginx or Caddy in front of Ollama as a reverse proxy. This lets you add basic auth, rate limiting, and TLS encryption. A minimal Nginx config needs just 10 lines and adds a real authentication layer that Ollama itself lacks.

Nginx reverse proxy with basic auth:

# /etc/nginx/sites-available/ollama
server {
    listen 11435 ssl;
    server_name ollama.local;

    ssl_certificate /etc/nginx/ssl/ollama.crt;
    ssl_certificate_key /etc/nginx/ssl/ollama.key;

    auth_basic "Ollama API";
    auth_basic_user_file /etc/nginx/.htpasswd;

    location / {
        proxy_pass http://127.0.0.1:11434;
        proxy_set_header Host $host;
        proxy_read_timeout 600s;
    }
}

# Create the password file:
# sudo htpasswd -c /etc/nginx/.htpasswd ollama-user

# In OpenClaw config, use the proxied endpoint:
# "endpoint": "https://ollama-user:PASSWORD@ollama.local:11435"

Performance Tuning for Remote Inference

Network latency adds overhead to every request. Here are the key settings to optimize remote Ollama performance.

Server-side Ollama tuning:

# Keep models loaded in memory longer (default: 5m)
# Prevents cold-start delays between requests
export OLLAMA_KEEP_ALIVE=30m

# Set the number of parallel request slots
# Useful when multiple OpenClaw instances connect
export OLLAMA_NUM_PARALLEL=4

# Note: GPU layer offload (num_gpu) is a per-model option set in
# a Modelfile or per request, not a server environment variable
# (inspect current values with: ollama show MODEL)

# Set max loaded models (if you use multiple models)
export OLLAMA_MAX_LOADED_MODELS=3

# Full startup command with all tuning vars
OLLAMA_HOST=0.0.0.0 \
OLLAMA_KEEP_ALIVE=30m \
OLLAMA_NUM_PARALLEL=4 \
OLLAMA_MAX_LOADED_MODELS=3 \
ollama serve

Client-side OpenClaw tuning:

# In ~/.openclaw/config.json, increase the timeout
# for remote connections (default is often 30s)
{
  "models": {
    "remote-ollama": {
      "provider": "ollama",
      "endpoint": "http://192.168.1.100:11434",
      "model": "llama3.1",
      "timeout": 300,
      "context_length": 8192,
      "temperature": 0.7,
      "num_predict": 2048
    }
  }
}

# The timeout value (in seconds) should account for:
# - Network latency (1-5ms on LAN)
# - Model load time on first request (~5-15s)
# - Inference time for long responses (~10-60s)
# 300 seconds is safe for most setups

Reduce Latency

  • Use wired ethernet instead of WiFi
  • Set OLLAMA_KEEP_ALIVE to avoid cold starts
  • Pre-warm models with a dummy request on startup
  • Lower context_length if full 8K is not needed
  • Use quantized models (Q4_K_M) for faster inference
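The pre-warming step above can be done with a single request against the generate endpoint. A minimal sketch, using the server IP from this guide; sending a request with no prompt simply loads the model into GPU memory, and the keep_alive field keeps it resident:

```shell
# Pre-warm llama3.1 so the first real request skips the
# cold-start load delay; no prompt means no inference happens.
curl http://192.168.1.100:11434/api/generate \
  -d '{"model": "llama3.1", "keep_alive": "30m"}'
```

Run this from a cron job or startup script on the server so the model is always hot when agents connect.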

Increase Throughput

  • Set OLLAMA_NUM_PARALLEL for concurrent requests
  • Run multiple Ollama instances on different ports
  • Use smaller models for high-volume tasks
  • Batch similar requests to same model
  • Monitor GPU utilization with nvidia-smi
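For the monitoring step, two common invocations (assuming an NVIDIA GPU with the standard driver tools installed):

```shell
# Refresh full GPU stats every second while agents run
watch -n 1 nvidia-smi

# Or log utilization and memory as CSV every 5 seconds,
# which is easier to graph or grep later
nvidia-smi --query-gpu=utilization.gpu,memory.used,memory.total \
  --format=csv -l 5
```

If utilization sits near 100% while requests queue, that is your signal to add a second instance or a smaller model.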

Docker Setup for Remote Ollama

Docker is the cleanest way to run Ollama on a remote server. It handles CUDA dependencies, makes updates trivial, and provides restart policies out of the box.

docker-compose.yml for the Ollama server:

version: "3.8"

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama-server
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    environment:
      - OLLAMA_HOST=0.0.0.0
      - OLLAMA_KEEP_ALIVE=30m
      - OLLAMA_NUM_PARALLEL=4
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    restart: unless-stopped

volumes:
  ollama_data:

Start the container and pull models:

# Start the Ollama container
docker compose up -d

# Pull models into the running container
docker exec ollama-server ollama pull llama3.1
docker exec ollama-server ollama pull mistral
docker exec ollama-server ollama pull codegemma

# Verify models are available
docker exec ollama-server ollama list

# Check GPU access inside the container
docker exec ollama-server nvidia-smi

# Test the API from another machine
curl http://SERVER_IP:11434/api/tags

For AMD GPUs, use the ROCm image:

# Replace the image and device config:
services:
  ollama:
    image: ollama/ollama:rocm
    devices:
      - /dev/kfd
      - /dev/dri

Tip: The Docker volume ollama_data persists downloaded models across container restarts and updates. You only need to pull each model once.

Troubleshooting Common Issues

Here are the most common problems when setting up remote Ollama and how to fix each one.

Connection refused

Ollama is not running or is only bound to 127.0.0.1. Verify with ss -tlnp | grep 11434 on the server. Ensure OLLAMA_HOST=0.0.0.0 is set and Ollama was restarted after setting it.

Connection timed out

A firewall is blocking port 11434. Check sudo ufw status or sudo iptables -L. Add a rule allowing your client IP on port 11434. On cloud servers, also check the provider's security group or network firewall settings.

Model not found (404 error)

The model name in your OpenClaw config does not match what is installed on the server. Run ollama list on the server and use the exact name shown. Watch for tag suffixes like :latest or :7b-instruct-q4_0.
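You can also check the exact installed names from the client machine without SSH access, since the tags endpoint returns them (server IP as used throughout this guide):

```shell
# List model names as the server reports them
curl -s http://192.168.1.100:11434/api/tags | grep '"name"'
# Copy a name verbatim (including any tag suffix)
# into the "model" field of your OpenClaw config
```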

Timeout during inference

The first request after a model unloads requires loading it back into GPU memory, which can take 5-15 seconds. Increase timeout in your OpenClaw config to at least 120 seconds. Set OLLAMA_KEEP_ALIVE=30m on the server to keep models loaded longer.

Slow responses over WiFi

WiFi adds 5-20ms of latency per packet, and LLM streaming sends many small packets. Switch to wired ethernet if possible. If stuck on WiFi, ensure you are on the 5GHz band and within good range of the router. Consider enabling response buffering in your OpenClaw config to reduce packet overhead.

CUDA out of memory

The model is too large for your GPU. Check memory usage with nvidia-smi. Switch to a smaller model or a more quantized variant (Q4_K_M uses roughly half the memory of Q8). Set OLLAMA_MAX_LOADED_MODELS=1 if multiple models are being loaded simultaneously.
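A sketch of the recovery steps; the quantized tag shown is an example and exact tag names vary per model, so check the model's page in the Ollama library before pulling:

```shell
# See which process is holding GPU memory and how much
nvidia-smi

# Pull a more heavily quantized variant of the same model
# (hypothetical tag - confirm the real one with the library page)
ollama pull llama3.1:8b-instruct-q4_K_M
```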

Related Guides

Build Agent Configs Optimized for Remote Ollama

Use the CrewClaw agent playground to generate SOUL.md configs with provider settings pre-configured for remote Ollama endpoints. Pick a role, set your server IP, and download a ready-to-deploy agent package.

Frequently Asked Questions

Can I run Ollama on one machine and OpenClaw on another?

Yes. Ollama exposes a REST API on port 11434. By default it only listens on localhost, but setting the OLLAMA_HOST environment variable to 0.0.0.0 makes it accept connections from any IP on the network. OpenClaw connects to Ollama using a standard HTTP endpoint, so you just change the endpoint URL from http://localhost:11434 to http://YOUR_SERVER_IP:11434 in your OpenClaw config. Both machines need to be on the same network, or you need proper port forwarding and firewall rules in place.

Is it safe to expose Ollama on 0.0.0.0?

Exposing Ollama on 0.0.0.0 means any device that can reach port 11434 on that machine can send inference requests. On a private home or office network behind a router, this is generally fine. On a public-facing server or cloud VPS, you should never expose Ollama directly without a firewall. Use iptables or ufw to restrict access to specific IP addresses. For additional security, put a reverse proxy like Nginx or Caddy in front of Ollama and add basic authentication or mTLS.

What is the latency difference between local and remote Ollama?

On a local gigabit network, the added latency is negligible, typically 1-5 milliseconds per request. The actual inference time dominates the total response time. A 7B model on a remote RTX 4090 will still respond faster than the same model running on a local CPU. On WiFi, expect 5-20ms of additional latency depending on signal strength. Over a VPN or WAN connection, latency can climb to 50-200ms, which is still acceptable for most agent tasks but may feel sluggish for interactive chat.

Can I use Ollama remote with Docker on the server side?

Yes, and Docker is often the cleanest approach. Run the official ollama/ollama Docker image with --gpus all to pass through your NVIDIA GPU, and map port 11434. The container handles all CUDA dependencies internally. Use docker compose for easier management and to set restart policies. The Docker approach also makes it simple to run multiple Ollama instances on different ports for different model configurations or to isolate workloads.
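For a quick start without compose, the equivalent docker run one-liner looks like this (assumes the NVIDIA Container Toolkit is installed on the host):

```shell
# Same setup as the compose file in this guide, as a single command
docker run -d --gpus all \
  -p 11434:11434 \
  -v ollama_data:/root/.ollama \
  -e OLLAMA_HOST=0.0.0.0 \
  --restart unless-stopped \
  --name ollama-server \
  ollama/ollama
```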

Do I need a static IP for the Ollama server?

Not necessarily, but it helps. On a home network, your router assigns IPs dynamically via DHCP. If the server IP changes, your OpenClaw config breaks. The easiest fix is to assign a static IP to the server in your router's DHCP settings or directly on the server's network interface. Alternatively, use mDNS hostnames like gpu-server.local if both machines support it (macOS does by default, Linux needs avahi-daemon). For cloud setups, use a fixed private IP or DNS name.
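To check whether the mDNS route works in your setup (gpu-server.local is the example hostname from this answer; substitute your server's actual name):

```shell
# Verify the mDNS name resolves from the client machine
ping -c 1 gpu-server.local

# If it resolves, use the hostname instead of a raw IP
# in the OpenClaw endpoint
curl http://gpu-server.local:11434/api/tags
```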

Why does OpenClaw get a connection refused error when connecting to remote Ollama?

Connection refused means either Ollama is not running on the server, it is only listening on localhost instead of 0.0.0.0, a firewall is blocking port 11434, or you have the wrong IP address. First verify Ollama is running on the server with curl http://localhost:11434/api/tags. Then check it is listening on all interfaces with ss -tlnp | grep 11434, which should show 0.0.0.0:11434. Then test from the client machine with curl http://SERVER_IP:11434/api/tags. If that fails, check your firewall rules with sudo ufw status or sudo iptables -L.

Can I connect multiple OpenClaw machines to one remote Ollama server?

Yes. Ollama handles concurrent requests by queuing them. Multiple OpenClaw instances on different machines can all point to the same Ollama endpoint. However, Ollama processes one request at a time per loaded model by default. If you send multiple requests simultaneously, they queue up and execute sequentially. For better throughput with multiple clients, consider running multiple Ollama instances on different ports, each with its own model loaded, and distributing requests across them.
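A second instance on another port can be started like this; OLLAMA_HOST accepts a host:port pair, and OLLAMA_MODELS points the instance at its own model directory (the path shown is an example) so the two servers do not contend for the same files:

```shell
# First instance stays on the default port 11434
# (e.g. systemd-managed). Second instance on 11435:
OLLAMA_HOST=0.0.0.0:11435 \
OLLAMA_MODELS=/var/lib/ollama-2/models \
ollama serve
```

Point some OpenClaw configs at port 11434 and others at 11435 to spread the load.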
