How to Set Up Hermes Agent: Install, Configure, Run (2026)
We have run Hermes Agent on cloud endpoints, on local Ollama models, and on hardware it was arguably too ambitious for. This guide is the result: the official install steps, the provider configuration that actually matters, and the gotchas (including one Ollama context bug) that the docs do not warn you about. It is the hands-on follow-up to our Hermes vs OpenClaw comparison.
Before You Start: What You Actually Need
Hermes Agent is the open-source agent framework from Nous Research (github.com/NousResearch/hermes-agent). It runs on Linux, macOS, Windows (native or WSL2), and even Android via Termux. There is also a desktop installer if you prefer not to touch a terminal at all, though most of this guide assumes the CLI.
You need exactly two things before installing:
- A model to run it on. An API key for OpenRouter, Anthropic, or OpenAI — or a local Ollama / LM Studio setup if you want to stay offline. We cover both paths below, with honest notes on which one you should actually default to.
- A model with a 64K+ context window. Hermes requires a minimum 64K token context. This sounds like a footnote. It is not — it is the root cause of the most confusing failure mode in local setups, and we dedicate a whole section to it.
Budget a realistic block of time for your first session: the install itself is quick, but picking a provider, running the setup wizard, and verifying the agent actually calls tools correctly takes longer than any quickstart implies. Plan for an unhurried evening, not a coffee break.
Step 1: Install Hermes Agent
On Linux, macOS, WSL2, or Android (Termux), the official install script is the supported path:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
source ~/.zshrc # or ~/.bashrc
On Windows PowerShell:
iex (irm https://hermes-agent.nousresearch.com/install.ps1)
If you prefer a GUI, there is a desktop installer for macOS and Windows at hermes-agent.nousresearch.com/desktop. The usual caution about piping curl into bash applies: read the script first if that bothers you; it is short and the repo is public.
Once installed, verify the binary is on your path and run the built-in diagnostic:
hermes doctor
hermes doctor is the command you will come back to every time something feels off: it checks provider health and flags missing configuration. Learn it now, thank yourself later.
Step 2: Configure a Model Provider
The fastest path is the interactive wizard:
hermes setup # interactive setup wizard
hermes setup --portal # OAuth via Nous Portal (model + tool gateway in one)
hermes model # interactive provider/model picker
hermes setup --portal is the lowest-friction option: one OAuth flow covers a model plus the Tool Gateway tools (web search, image generation, TTS, browser). If you would rather bring your own key, set it directly:
hermes config set OPENROUTER_API_KEY sk-or-...
hermes config set model anthropic/claude-opus-4.6
Hermes routes values intelligently: API keys land in ~/.hermes/.env, everything else in ~/.hermes/config.yaml. Three paths worth memorizing:
| Path | What lives there |
|---|---|
| ~/.hermes/config.yaml | Settings: model, terminal backend, TTS, compression |
| ~/.hermes/.env | API keys and secrets — keep these out of config.yaml |
| ~/.hermes/skills/ | Skill files the agent uses (and evolves) |
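The split between the two files is easy to check mechanically. The sketch below is a hypothetical helper, not part of Hermes: it scans the text of config.yaml for keys whose names end in _API_KEY (the naming pattern of the provider variables in this guide) so you can move them to ~/.hermes/.env. The function name and the regex are our own assumptions.

```python
import re
from pathlib import Path

# Matches YAML keys such as "OPENROUTER_API_KEY: ..." at any indentation.
# The *_API_KEY suffix is an assumption based on the variable names in this guide.
SECRET_KEY_PATTERN = re.compile(r"^\s*([A-Z0-9_]*_API_KEY)\s*:", re.MULTILINE)


def secret_keys_in_config(config_text: str) -> list[str]:
    """Return the names of API-key-like entries found in config.yaml text."""
    return SECRET_KEY_PATTERN.findall(config_text)


if __name__ == "__main__":
    config_path = Path.home() / ".hermes" / "config.yaml"
    if config_path.exists():
        for name in secret_keys_in_config(config_path.read_text()):
            print(f"{name} is in config.yaml; move it to ~/.hermes/.env")
```

An empty result means nothing key-shaped was found under that naming pattern. It does not prove the file is free of secrets stored under other names.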
Supported providers include OpenRouter (OPENROUTER_API_KEY), Anthropic (ANTHROPIC_API_KEY), OpenAI (OPENAI_API_KEY), Gemini (GEMINI_API_KEY), and any custom OpenAI-compatible endpoint via OPENAI_BASE_URL — which is the door to local models. Finish with hermes config check to catch anything missing.
Warning: do not plug consumer subscriptions into agent harnesses
Do not connect a Claude Pro/Max or Gemini consumer subscription account to Hermes (or any agent harness). Anthropic and Google have been banning accounts used this way — the subscription terms cover the official apps, not third-party automation. A banned account can take your paid history and email identity with it. Use a proper API key billed per token, or run a local model. This is the single most expensive mistake a new agent builder can make, and it is entirely avoidable.
Step 3: Run Your First Session
With a provider configured, starting the agent is one word:
hermes # start chatting
hermes --tui # full terminal UI
hermes --continue # resume your last session (alias: hermes -c)
hermes sessions list
For a first test, ask it to do something tool-shaped rather than chat-shaped: list files in a directory, fetch a web page, summarize a document on disk. Hermes is an agent framework, and tool calls are exactly where weak setups fail. If file paths come back mangled or tools error in confusing ways, jump to the model-choice section below before blaming the framework.
Hermes also ships a messaging gateway for 20+ platforms — Telegram, Discord, Slack, WhatsApp, Signal, Email and more:
hermes gateway setup # wire up Telegram/Discord/Slack/...
hermes gateway status # check bot status
hermes tools # configure tool access per platform
We recommend getting the terminal session solid before adding channels. A misbehaving agent is much easier to debug in a TUI than through a Telegram bot.
Cloud First: The Default Most People Should Pick
Here is the framing we landed on after testing both paths: cloud endpoints are the default, local is the privacy fallback — not the other way around. The local-first instinct is understandable, but the numbers do not support it as a daily driver on typical hardware.
On an M3 MacBook Air with 24GB of RAM — a perfectly reasonable 2026 laptop — we got roughly 12 tokens per second from a local model with every optimization we could find applied. That is usable for testing a skill or running a private one-off task. It is genuinely painful for an agent that thinks in long tool-calling loops, where a single turn can burn thousands of tokens before you see a result. Cloud endpoints, including OpenRouter's free-tier models, were 3–20x faster in our runs depending on the model and time of day.
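To put those throughput numbers in wall-clock terms, here is a back-of-the-envelope calculation. The 3,000-token turn is an illustrative assumption (the text above only says "thousands"), and the estimate ignores prompt processing and network latency.

```python
def seconds_per_turn(tokens: int, tokens_per_second: float) -> float:
    """Time to generate `tokens` at a steady decode rate."""
    return tokens / tokens_per_second


LOCAL_TOK_S = 12      # rough local rate measured on the M3 Air in this guide
TURN_TOKENS = 3_000   # assumed size of one tool-calling turn (illustrative)

local = seconds_per_turn(TURN_TOKENS, LOCAL_TOK_S)
cloud_slow = seconds_per_turn(TURN_TOKENS, LOCAL_TOK_S * 3)   # 3x faster
cloud_fast = seconds_per_turn(TURN_TOKENS, LOCAL_TOK_S * 20)  # 20x faster

print(f"local: {local:.0f}s, cloud: {cloud_fast:.1f}s to {cloud_slow:.0f}s")
```

Under these assumptions a single turn takes a little over four minutes locally, versus roughly 13 to 83 seconds on a cloud endpoint.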
Cloud endpoint (default)
OpenRouter, Anthropic, or OpenAI key. Fast enough that the agent feels responsive in long tool loops. Free-tier OpenRouter models exist for getting started. Your prompts leave your machine — that is the trade.
Local Ollama (privacy fallback)
Nothing leaves your machine and marginal cost is zero. ~12 tok/s on an M3 Air 24GB with all optimizations — fine for testing, rough for daily use. Requires the context-window fix below or Hermes will not use the model at all.
If your work involves sensitive data, local is the right call and the next two sections make it work. Otherwise, start on a cloud endpoint, get a feel for Hermes at full speed, and add the local setup later.
Running Hermes Agent Locally with Ollama
Hermes talks to local models through any OpenAI-compatible endpoint, which Ollama and LM Studio both expose. The official path is hermes model → Custom Endpoint, or manually in ~/.hermes/config.yaml:
model:
  provider: custom
  default: qwen3:32b-hermes
  base_url: "http://localhost:11434/v1"
  context_length: 65536
Note the model name ends in -hermes and context_length is set explicitly. Both of those are deliberate, and the next subsection explains why. Skipping it is how local setups die.
The 4K Context Gotcha Nobody Documents
This one cost us an evening, and we have not seen it written up anywhere, so here it is in full.
Ollama loads models at a 4K token context window by default, regardless of what the model architecturally supports. Hermes requires a minimum 64K context. When it sees a model offering 4K, it silently refuses to use it — no error message pointing at the context window, no hint in the obvious places. From the outside it just looks like Hermes is ignoring your local model, and you start questioning your base_url, your firewall, and eventually your life choices.
The fix is to create a variant of the model with a bigger context baked in. Write a Modelfile:
FROM qwen3:32b
PARAMETER num_ctx 65536
Then build and use the variant:
ollama create qwen3:32b-hermes -f Modelfile
Point Hermes at qwen3:32b-hermes instead of the base model, set context_length: 65536 in config.yaml so Hermes knows the real window for custom endpoints, and the silent refusal disappears. Two practical notes: a 64K context meaningfully increases RAM usage, so on 24GB machines pick a model size that leaves headroom; and the same trick applies to any base model if you swap the FROM line.
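Since only the FROM line changes between base models, the Modelfile is easy to generate. This is a minimal sketch: build_modelfile and variant_name are hypothetical helpers of our own, and the -hermes suffix is just this guide's naming convention, not something Ollama or Hermes requires.

```python
def build_modelfile(base_model: str, num_ctx: int = 65536) -> str:
    """Return Modelfile text that pins the context window for `base_model`."""
    return f"FROM {base_model}\nPARAMETER num_ctx {num_ctx}\n"


def variant_name(base_model: str, suffix: str = "-hermes") -> str:
    """Name for the rebuilt variant, following this guide's -hermes convention."""
    return f"{base_model}{suffix}"


if __name__ == "__main__":
    base = "qwen3:32b"
    # Print the Modelfile body, then the build command as a reminder.
    print(build_modelfile(base), end="")
    print(f"# then: ollama create {variant_name(base)} -f Modelfile")
```

Write the returned text to a file named Modelfile and run the printed ollama create command against it.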
Which Local Model: Qwen, Not Gemma
Model choice matters more for agents than for chat, because tool calls are unforgiving. A chat model that paraphrases slightly is fine; an agent model that paraphrases a tool argument breaks the run.
In our testing, gemma4:26b mangled tool-call arguments. The concrete failure: it truncated file paths mid-string — a path containing mustafaergisi came out as mustafaer in the tool call. The agent then operated on a path that did not exist, errored, retried, and produced failures that looked random rather than systematic. That class of bug is brutal to diagnose because the model's prose output looks perfectly competent the whole time.
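One way to make this failure look systematic again is to compare the path the model asked for against a path you know exists. The sketch below is a hypothetical diagnostic of our own, not a Hermes feature, and the example paths are invented for illustration. It flags the case where exactly one path component was cut short.

```python
from pathlib import PurePosixPath


def truncated_component(requested: str, known: str):
    """Return (cut, full) if `requested` matches `known` except for one
    component that is a strict prefix of the real one; otherwise None."""
    req_parts = PurePosixPath(requested).parts
    known_parts = PurePosixPath(known).parts
    if len(req_parts) != len(known_parts):
        return None
    # Collect the positions where the two paths disagree.
    diffs = [(r, k) for r, k in zip(req_parts, known_parts) if r != k]
    if len(diffs) == 1:
        cut, full = diffs[0]
        if full.startswith(cut) and len(cut) < len(full):
            return (cut, full)
    return None


if __name__ == "__main__":
    hit = truncated_component(
        "/Users/mustafaer/notes.txt",       # what the tool call contained
        "/Users/mustafaergisi/notes.txt",   # what actually exists
    )
    if hit:
        print(f"tool argument truncated: {hit[0]!r} should be {hit[1]!r}")
```

A hit points at the model rather than at the filesystem or the framework, which is the distinction that is hard to see from the agent's prose output.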
The Qwen family has been the safe local default for Hermes in our experience. Tool arguments came through intact, function-calling fidelity was consistent, and the 64K-context Modelfile trick worked without drama. If you have the RAM, start with a Qwen model sized to your hardware, apply the context fix above, and only experiment with other families once you have a known-good baseline to compare against.
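Sizing a model to your hardware also means budgeting for the context window itself. A rough way to see why 64K costs RAM is the standard key-value cache estimate: two tensors (keys and values) per layer, per KV head, per token position. The architecture numbers below are hypothetical placeholders, not the real specs of any Qwen model, and the estimate assumes an unquantized 16-bit cache, so treat the output as an order of magnitude only.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   n_ctx: int, bytes_per_value: int = 2) -> int:
    """Estimate KV cache size: keys and values (factor 2) are stored
    per layer, per KV head, per token position."""
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_value


GIB = 1024 ** 3

# Hypothetical architecture for illustration only.
arch = dict(n_layers=64, n_kv_heads=8, head_dim=128)

for ctx in (4096, 65536):
    size_gib = kv_cache_bytes(n_ctx=ctx, **arch) / GIB
    print(f"{ctx:>6} tokens: {size_gib:.1f} GiB")
```

For this made-up architecture the cache grows from 1 GiB at 4K to 16 GiB at 64K, on top of the model weights. Real usage depends on the actual model and on how the runtime stores the cache.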
Coming from OpenClaw?
A lot of readers arrive at Hermes with an existing OpenClaw agent and a carefully tuned SOUL.md. The good news is that your work transfers: the persona, rules, and memory conventions you encoded in SOUL.md map onto Hermes's config and skill files, and we built a free converter that does the translation for you — the SOUL.md to Hermes converter. Paste your SOUL.md, get a Hermes-shaped bundle out.
For the full migration story — what maps cleanly, what does not, and what to re-test afterwards — see our OpenClaw-to-Hermes migration guide. And if you are still deciding whether to migrate at all, the honest Hermes vs OpenClaw comparison lays out the tradeoffs: many builders keep an OpenClaw agent in production while running Hermes as the experimental sandbox, and that is a legitimate end-state rather than indecision.
If you are starting from zero on the OpenClaw side too, the template gallery is a faster starting point than a blank file — fork a persona that is close to what you want and adapt it for either framework.
Troubleshooting Checklist
| Symptom | Likely fix |
|---|---|
| Hermes ignores my Ollama model | 4K default context. Modelfile with num_ctx 65536, ollama create a -hermes variant |
| Tool calls fail with wrong file paths | Model fidelity issue — we saw this with gemma4:26b. Switch to a Qwen model |
| Provider errors / missing keys | hermes config check, then hermes doctor; keys belong in ~/.hermes/.env |
| Custom endpoint connects but behaves oddly | Set context_length explicitly in config.yaml — auto-detect only covers built-in providers |
| Local agent feels unbearably slow | Expected: ~12 tok/s on an M3 Air 24GB. Use a cloud endpoint for daily work |
| My Claude/Gemini account got flagged | You used a consumer subscription in a harness. Stop; switch to an API key |
When in doubt: hermes doctor first, then hermes config check. Between them they catch most of what goes wrong in week one.
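For the first and fourth rows of the table, it can help to confirm that both sides agree on the context window. The sketch below is a hypothetical check of our own: it parses Modelfile text (for example the output of ollama show --modelfile qwen3:32b-hermes) and compares it with the context_length value you set in config.yaml. The 65536 threshold is the 64K minimum cited in this guide.

```python
import re

REQUIRED_CTX = 65536  # the 64K minimum this guide cites for Hermes


def modelfile_num_ctx(modelfile_text: str):
    """Return num_ctx from Modelfile text, or None if it is not set."""
    match = re.search(r"^\s*PARAMETER\s+num_ctx\s+(\d+)", modelfile_text,
                      re.MULTILINE | re.IGNORECASE)
    return int(match.group(1)) if match else None


def context_problems(modelfile_text: str, config_context_length) -> list[str]:
    """List mismatches between the Ollama variant and the Hermes config."""
    problems = []
    num_ctx = modelfile_num_ctx(modelfile_text)
    if num_ctx is None:
        problems.append("Modelfile sets no num_ctx; Ollama will use its default")
    elif num_ctx < REQUIRED_CTX:
        problems.append(f"num_ctx is {num_ctx}, below {REQUIRED_CTX}")
    if config_context_length is None:
        problems.append("context_length is not set in config.yaml")
    elif config_context_length < REQUIRED_CTX:
        problems.append(
            f"context_length is {config_context_length}, below {REQUIRED_CTX}")
    return problems


if __name__ == "__main__":
    # Example: the base model was used by mistake and config.yaml has no override.
    for problem in context_problems("FROM qwen3:32b\n", None):
        print(problem)
```

An empty list means the variant and the config both declare at least 64K. It does not verify that your machine has the memory to serve it.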
Next Steps: Skills, MCP, and Beyond
A working install is the boring part. The interesting parts of Hermes are skills — the files the agent authors and evolves — and the Model Context Protocol, which lets it consume external tools. Both have enough depth that we wrote dedicated guides:
- Hermes Agent Skills Guide. How skills work, browsing and installing them, and keeping the self-improvement loop on a leash.
- Hermes Agent MCP Guide. Connecting MCP servers to Hermes and exposing your own tools to the agent.
Skill commands worth knowing now: hermes skills browse, hermes skills search <topic>, and hermes skills install <path>. They live in ~/.hermes/skills/, which is worth putting under version control before the agent starts editing them.
Frequently Asked Questions
Can I run Hermes Agent locally?
Yes. Point Hermes at a local OpenAI-compatible endpoint (Ollama or LM Studio) via hermes model → Custom Endpoint, or set provider: custom with a base_url like http://localhost:11434/v1 in ~/.hermes/config.yaml. Two caveats from our own testing: Hermes wants a minimum 64K token context window, and Ollama loads models at 4K context by default — so you must rebuild the model with a Modelfile that sets num_ctx, or Hermes will silently refuse to use it. Performance is also modest on consumer hardware: on an M3 MacBook Air with 24GB we saw roughly 12 tokens per second with everything tuned. Fine for testing and private work, painful as a daily driver.
What models work best with Hermes Agent?
For hosted use, any strong tool-calling model on OpenRouter, Anthropic, or OpenAI works — run hermes model and pick from the list. For local use, the Qwen family has been the safe default in our testing. We tried gemma4:26b and it mangled tool-call arguments: it truncated file paths mid-string, which breaks agent tool use in ways that look like random failures rather than a model problem. Whatever you choose, it needs solid function-calling fidelity and a context window of at least 64K tokens, because Hermes leans heavily on tools and long sessions.
Is Hermes Agent free?
The framework itself is free and open source from Nous Research (github.com/NousResearch/hermes-agent). Your costs come from the model you run it on: a hosted endpoint bills per token, while a local Ollama model is free to run but limited by your hardware. OpenRouter has free-tier models that work for getting started, and Nous Portal offers an OAuth setup that bundles a model with the Tool Gateway tools. There is no license fee for Hermes itself.
Hermes or OpenClaw for beginners?
If your goal is a production agent on Telegram or Slack this week, OpenClaw is the gentler path: a single SOUL.md config file, built-in channels, and a large template library to fork from. If your goal is to experiment with a self-improving agent — skills that evolve, a reflection loop, persistent memory — Hermes is the more interesting playground and the setup in this guide is very manageable. Many builders run both: OpenClaw in production, Hermes as the research sandbox. Our full comparison covers the tradeoffs in detail.
Why does Hermes ignore my Ollama model?
Almost certainly the context window. Ollama loads models at a 4K token context by default, and Hermes requires a minimum of 64K — so it silently declines to use the model rather than erroring loudly. The fix is to create a variant with a larger context: write a Modelfile containing FROM your-model and PARAMETER num_ctx 65536, run ollama create your-model-hermes -f Modelfile, then point Hermes at the new model name and set context_length in config.yaml so Hermes knows the real window. This cost us a frustrating evening; the dedicated section in this guide walks through it step by step.