Tutorial · Hermes Agent · Setup · June 10, 2026 · 11 min read

How to Set Up Hermes Agent: Install, Configure, Run (2026)

We have run Hermes Agent on cloud endpoints, on local Ollama models, and on hardware it was arguably too ambitious for. This guide is the result: the official install steps, the provider configuration that actually matters, and the gotchas the docs do not warn you about, including one Ollama context bug. It is the hands-on follow-up to our Hermes vs OpenClaw comparison.

Before You Start: What You Actually Need

Hermes Agent is the open-source agent framework from Nous Research (github.com/NousResearch/hermes-agent). It runs on Linux, macOS, Windows (native or WSL2), and even Android via Termux. There is also a desktop installer if you prefer not to touch a terminal at all, though most of this guide assumes the CLI.

You need exactly two things before installing:

  • A model to run it on. An API key for OpenRouter, Anthropic, or OpenAI — or a local Ollama / LM Studio setup if you want to stay offline. We cover both paths below, with honest notes on which one you should actually default to.
  • A model with a 64K+ context window. Hermes requires a minimum 64K token context. This sounds like a footnote. It is not — it is the root cause of the most confusing failure mode in local setups, and we dedicate a whole section to it.

Budget a realistic block of time for your first session: the install itself is quick, but picking a provider, running the setup wizard, and verifying the agent actually calls tools correctly takes longer than any quickstart implies. Plan for an unhurried evening, not a coffee break.

Step 1: Install Hermes Agent

On Linux, macOS, WSL2, or Android (Termux), the official install script is the supported path:

curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
source ~/.zshrc   # or ~/.bashrc

On Windows PowerShell:

iex (irm https://hermes-agent.nousresearch.com/install.ps1)

If you prefer a GUI, there is a desktop installer for macOS and Windows at hermes-agent.nousresearch.com/desktop. The usual caution about piping curl into bash applies — read the script first if that bothers you; it is short and the repo is public.
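If you do want to read the installer before running it, one way is to download it to a file first instead of piping it into bash. A minimal sketch, assuming curl is available; fetch_installer is a hypothetical helper name for illustration, not part of Hermes:

```shell
# Hypothetical helper: download a script to a file instead of piping it into bash.
fetch_installer() {
  url="$1"    # where to download from
  dest="$2"   # local file to write
  curl -fsSL "$url" -o "$dest"
}

# Usage (commented out so nothing runs until you decide to):
# fetch_installer https://hermes-agent.nousresearch.com/install.sh /tmp/hermes-install.sh
# less /tmp/hermes-install.sh    # read it
# bash /tmp/hermes-install.sh    # run it once you are satisfied
```

The end result is the same as the one-liner; the only difference is that the script sits on disk long enough for you to look at it.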

Once installed, verify the binary is on your PATH and run the built-in diagnostic:

hermes doctor

hermes doctor is the command you will come back to every time something feels off — it checks provider health and flags missing configuration. Learn it now, thank yourself later.
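If the shell cannot find hermes right after installing, the usual cause is a profile that has not been re-sourced. The sketch below checks for the binary first and only then runs the diagnostic; check_hermes is a hypothetical wrapper name, and it relies on nothing beyond the hermes doctor command shown above:

```shell
# Hypothetical wrapper: confirm the hermes binary is reachable before running the diagnostic.
check_hermes() {
  if command -v hermes >/dev/null 2>&1; then
    hermes doctor
  else
    echo "hermes not found on PATH; open a new terminal or re-source your shell profile" >&2
    return 1
  fi
}
```

Calling check_hermes returns a non-zero status when the binary is missing, which makes it easy to drop into a setup script.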

Step 2: Configure a Model Provider

The fastest path is the interactive wizard:

hermes setup            # interactive setup wizard
hermes setup --portal   # OAuth via Nous Portal (model + tool gateway in one)
hermes model            # interactive provider/model picker

hermes setup --portal is the lowest-friction option: one OAuth flow covers a model plus the Tool Gateway tools (web search, image generation, TTS, browser). If you would rather bring your own key, set it directly:

hermes config set OPENROUTER_API_KEY sk-or-...
hermes config set model anthropic/claude-opus-4.6

Hermes routes values intelligently: API keys land in ~/.hermes/.env, everything else in ~/.hermes/config.yaml. Three paths worth memorizing:

Path                     What lives there
~/.hermes/config.yaml    Settings: model, terminal backend, TTS, compression
~/.hermes/.env           API keys and secrets (keep these out of config.yaml)
~/.hermes/skills/        Skill files the agent uses (and evolves)

Supported providers include OpenRouter (OPENROUTER_API_KEY), Anthropic (ANTHROPIC_API_KEY), OpenAI (OPENAI_API_KEY), Gemini (GEMINI_API_KEY), and any custom OpenAI-compatible endpoint via OPENAI_BASE_URL — which is the door to local models. Finish with hermes config check to catch anything missing.
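Since secrets belong in ~/.hermes/.env and not in config.yaml, a quick scan of the settings file can catch a key that was pasted into the wrong place. This is a rough heuristic sketch, not a Hermes feature: find_leaked_keys is a hypothetical helper, and the pattern (names ending in _API_KEY, or values starting with sk-) is our assumption about what a leaked key tends to look like:

```shell
# Hypothetical helper: print lines in a file that look like they contain an API key.
# Heuristic only: matches names ending in _API_KEY or tokens that start with "sk-".
find_leaked_keys() {
  grep -nE '_API_KEY|(^|[^A-Za-z0-9])sk-[A-Za-z0-9_-]+' "$1"
}

# Usage: a match means a secret probably ended up in the wrong file.
# find_leaked_keys ~/.hermes/config.yaml && echo "move these to ~/.hermes/.env"
```

grep exits non-zero when nothing matches, so a silent run means the file looks clean under this heuristic.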

Warning: do not plug consumer subscriptions into agent harnesses

Do not connect a Claude Pro/Max or Gemini consumer subscription account to Hermes (or any agent harness). Anthropic and Google have been banning accounts used this way — the subscription terms cover the official apps, not third-party automation. A banned account can take your paid history and email identity with it. Use a proper API key billed per token, or run a local model. This is the single most expensive mistake a new agent builder can make, and it is entirely avoidable.

Step 3: Run Your First Session

With a provider configured, starting the agent is one word:

hermes              # start chatting
hermes --tui        # full terminal UI
hermes --continue   # resume your last session (alias: hermes -c)
hermes sessions list

For a first test, ask it to do something tool-shaped rather than chat-shaped: list files in a directory, fetch a web page, summarize a document on disk. Hermes is an agent framework, and tool calls are exactly where weak setups fail. If file paths come back mangled or tools error in confusing ways, jump to the model-choice section below before blaming the framework.
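A tool-shaped test is easier to judge when you already know the right answer. The sketch below builds a scratch directory with a few deliberately long names; make_fixture and the file names are arbitrary choices for illustration:

```shell
# Hypothetical helper: build a scratch directory with known contents for a first tool test.
# Prints the directory path so it can be handed to the agent.
make_fixture() {
  dir="$(mktemp -d)"
  printf 'alpha\n' > "$dir/notes-2026-planning-draft.txt"
  printf 'beta\n'  > "$dir/quarterly_report_final_v2.md"
  mkdir "$dir/nested-directory-with-a-long-name"
  printf 'gamma\n' > "$dir/nested-directory-with-a-long-name/readme.txt"
  echo "$dir"
}

# Usage:
# fixture="$(make_fixture)"
# ls -R "$fixture"    # the listing the agent should reproduce
```

Ask the agent to list that directory and compare its answer with the ls output. Any shortened or altered name is a sign the model is mangling tool arguments.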

Hermes also ships a messaging gateway for 20+ platforms — Telegram, Discord, Slack, WhatsApp, Signal, Email and more:

hermes gateway setup    # wire up Telegram/Discord/Slack/...
hermes gateway status   # check bot status
hermes tools            # configure tool access per platform

We recommend getting the terminal session solid before adding channels. A misbehaving agent is much easier to debug in a TUI than through a Telegram bot.

Cloud First: The Default Most People Should Pick

Here is the framing we landed on after testing both paths: cloud endpoints are the default, local is the privacy fallback — not the other way around. The local-first instinct is understandable, but the numbers do not support it as a daily driver on typical hardware.

On an M3 MacBook Air with 24GB of RAM — a perfectly reasonable 2026 laptop — we got roughly 12 tokens per second from a local model with every optimization we could find applied. That is usable for testing a skill or running a private one-off task. It is genuinely painful for an agent that thinks in long tool-calling loops, where a single turn can burn thousands of tokens before you see a result. Cloud endpoints, including OpenRouter's free-tier models, were 3–20x faster in our runs depending on the model and time of day.
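To make the speed gap concrete, here is the arithmetic for one agent turn. The 3,000-token turn size is an assumed figure for illustration (the text only says "thousands"); the rates are the 12 tok/s we measured and the 3x and 20x multiples of it:

```shell
# Seconds needed to generate a number of tokens at a given rate (integer division).
turn_seconds() {
  tokens="$1"   # tokens generated in the turn (assumed figure)
  rate="$2"     # tokens per second
  echo $(( tokens / rate ))
}

turn_seconds 3000 12     # local model at ~12 tok/s
turn_seconds 3000 36     # cloud endpoint at 3x that rate
turn_seconds 3000 240    # cloud endpoint at 20x that rate
```

This prints 250, 83, and 12: a little over four minutes per turn locally, against roughly 12 to 83 seconds on a cloud endpoint under the same assumption.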

Cloud endpoint (default)

OpenRouter, Anthropic, or OpenAI key. Fast enough that the agent feels responsive in long tool loops. Free-tier OpenRouter models exist for getting started. Your prompts leave your machine — that is the trade.

Local Ollama (privacy fallback)

Nothing leaves your machine and marginal cost is zero. ~12 tok/s on an M3 Air 24GB with all optimizations — fine for testing, rough for daily use. Requires the context-window fix below or Hermes will not use the model at all.

If your work involves sensitive data, local is the right call and the next two sections make it work. Otherwise, start on a cloud endpoint, get a feel for Hermes at full speed, and add the local setup later.

Running Hermes Agent Locally with Ollama

Hermes talks to local models through any OpenAI-compatible endpoint, which Ollama and LM Studio both expose. The official path is hermes model → Custom Endpoint, or manually in ~/.hermes/config.yaml:

model:
  provider: custom
  default: qwen3:32b-hermes
  base_url: "http://localhost:11434/v1"
  context_length: 65536

Note the model name ends in -hermes and context_length is set explicitly. Both of those are deliberate, and the next subsection explains why — skipping it is how local setups die.
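Before debugging anything on the Hermes side, it helps to rule out the endpoint itself. A minimal sketch, assuming the server answers the /models route that OpenAI-compatible APIs commonly expose (recent Ollama versions do, to our knowledge); endpoint_up is a hypothetical helper name:

```shell
# Hypothetical helper: succeed only if an OpenAI-compatible endpoint answers GET <base_url>/models.
endpoint_up() {
  base_url="$1"   # e.g. http://localhost:11434/v1
  curl -fsS --max-time 5 "$base_url/models" >/dev/null 2>&1
}

# Usage:
# endpoint_up http://localhost:11434/v1 && echo "endpoint reachable" || echo "endpoint not reachable"
```

If this reports the endpoint as reachable and Hermes still ignores the model, the base_url is probably fine and the cause lies elsewhere.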

The 4K Context Gotcha Nobody Documents

This one cost us an evening, and we have not seen it written up anywhere, so here it is in full.

Ollama loads models at a 4K token context window by default, regardless of what the model architecturally supports. Hermes requires a minimum 64K context. When it sees a model offering 4K, it silently refuses to use it — no error message pointing at the context window, no hint in the obvious places. From the outside it just looks like Hermes is ignoring your local model, and you start questioning your base_url, your firewall, and eventually your life choices.

The fix is to create a variant of the model with a bigger context baked in. Write a Modelfile:

FROM qwen3:32b
PARAMETER num_ctx 65536

Then build and use the variant:

ollama create qwen3:32b-hermes -f Modelfile

Point Hermes at qwen3:32b-hermes instead of the base model, set context_length: 65536 in config.yaml so Hermes knows the real window for custom endpoints, and the silent refusal disappears. Two practical notes: a 64K context meaningfully increases RAM usage, so on 24GB machines pick a model size that leaves headroom; and the same trick applies to any base model — just swap the FROM line.
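Because the same two-line Modelfile works for any base model, the steps above can be wrapped in a small helper. This is a sketch under stated assumptions: write_modelfile is a hypothetical name, and the ollama show check assumes your Ollama version supports the --parameters flag:

```shell
# Hypothetical helper: write a Modelfile that pins a larger context window.
write_modelfile() {
  base="$1"   # base model, e.g. qwen3:32b
  ctx="$2"    # context size in tokens, e.g. 65536
  dest="$3"   # path of the Modelfile to write
  printf 'FROM %s\nPARAMETER num_ctx %s\n' "$base" "$ctx" > "$dest"
}

# Usage, assuming ollama is installed and the base model is already pulled:
# write_modelfile qwen3:32b 65536 Modelfile
# ollama create qwen3:32b-hermes -f Modelfile
# ollama show qwen3:32b-hermes --parameters    # num_ctx should read 65536
```

Checking the parameters after the build confirms the larger window took effect before you point Hermes at the variant.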

Which Local Model: Qwen, Not Gemma

Model choice matters more for agents than for chat, because tool calls are unforgiving. A chat model that paraphrases slightly is fine; an agent model that paraphrases a tool argument breaks the run.

In our testing, gemma4:26b mangled tool-call arguments. The concrete failure: it truncated file paths mid-string — a path containing mustafaergisi came out as mustafaer in the tool call. The agent then operated on a path that did not exist, errored, retried, and produced failures that looked random rather than systematic. That class of bug is brutal to diagnose because the model's prose output looks perfectly competent the whole time.
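One way to tell this failure from a genuinely wrong path is to check whether the name in the failed tool call is a cut-off copy of a name that really exists. A small sketch of that check; is_truncation is a hypothetical helper, and how you extract the path from your logs is left open:

```shell
# Hypothetical helper: succeed if $2 is a strictly shorter prefix of $1,
# i.e. it looks like a cut-off copy rather than a different name.
is_truncation() {
  full="$1"   # the real name on disk
  got="$2"    # the name the model put in the tool call
  if [ -z "$got" ] || [ "$full" = "$got" ]; then
    return 1
  fi
  case "$full" in
    "$got"*) return 0 ;;
    *)       return 1 ;;
  esac
}

# Usage:
# is_truncation mustafaergisi mustafaer && echo "tool argument was truncated"
```

A hit here points at model fidelity rather than at your filesystem or the framework.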

The Qwen family has been the safe local default for Hermes in our experience. Tool arguments came through intact, function-calling fidelity was consistent, and the 64K-context Modelfile trick worked without drama. If you have the RAM, start with a Qwen model sized to your hardware, apply the context fix above, and only experiment with other families once you have a known-good baseline to compare against.

Coming from OpenClaw?

A lot of readers arrive at Hermes with an existing OpenClaw agent and a carefully tuned SOUL.md. The good news is that your work transfers: the persona, rules, and memory conventions you encoded in SOUL.md map onto Hermes's config and skill files, and we built a free converter that does the translation for you — the SOUL.md to Hermes converter. Paste your SOUL.md, get a Hermes-shaped bundle out.

For the full migration story — what maps cleanly, what does not, and what to re-test afterwards — see our OpenClaw-to-Hermes migration guide. And if you are still deciding whether to migrate at all, the honest Hermes vs OpenClaw comparison lays out the tradeoffs: many builders keep an OpenClaw agent in production while running Hermes as the experimental sandbox, and that is a legitimate end-state rather than indecision.

If you are starting from zero on the OpenClaw side too, the template gallery is a faster starting point than a blank file — fork a persona that is close to what you want and adapt it for either framework.

Troubleshooting Checklist

Symptom                                      Likely fix
Hermes ignores my Ollama model               4K default context. Modelfile with num_ctx 65536, ollama create a -hermes variant
Tool calls fail with wrong file paths        Model fidelity issue; we saw this with gemma4:26b. Switch to a Qwen model
Provider errors / missing keys               hermes config check, then hermes doctor; keys belong in ~/.hermes/.env
Custom endpoint connects but behaves oddly   Set context_length explicitly in config.yaml; auto-detect only covers built-in providers
Local agent feels unbearably slow            Expected: ~12 tok/s on an M3 Air 24GB. Use a cloud endpoint for daily work
My Claude/Gemini account got flagged         You used a consumer subscription in a harness. Stop; switch to an API key

When in doubt: hermes doctor first, then hermes config check. Between them they catch most of what goes wrong in week one.

Next Steps: Skills, MCP, and Beyond

A working install is the boring part. The interesting parts of Hermes are skills (the files the agent authors and evolves) and the Model Context Protocol, which lets it consume external tools. Both have enough depth that we cover them in dedicated guides.

Skill commands worth knowing now: hermes skills browse, hermes skills search <topic>, and hermes skills install <path>. Installed skills live in ~/.hermes/skills/, a directory worth putting under version control before the agent starts editing its contents.
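Putting the skills directory under version control takes three git commands. A minimal sketch, assuming git is installed and has a user name and email configured; snapshot_dir is a hypothetical helper name:

```shell
# Hypothetical helper: put a directory under git and record its current state.
# The commit step fails if the directory is empty or git has no identity configured.
snapshot_dir() {
  dir="$1"
  git -C "$dir" init -q &&
  git -C "$dir" add -A &&
  git -C "$dir" commit -q -m "Snapshot before the agent edits anything"
}

# Usage:
# snapshot_dir ~/.hermes/skills
```

From then on, running git -C ~/.hermes/skills diff shows exactly what the agent changed since the last commit.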

Frequently Asked Questions

Can I run Hermes Agent locally?

Yes. Point Hermes at a local OpenAI-compatible endpoint (Ollama or LM Studio) via hermes model → Custom Endpoint, or set provider: custom with a base_url like http://localhost:11434/v1 in ~/.hermes/config.yaml. Two caveats from our own testing: Hermes wants a minimum 64K token context window, and Ollama loads models at 4K context by default — so you must rebuild the model with a Modelfile that sets num_ctx, or Hermes will silently refuse to use it. Performance is also modest on consumer hardware: on an M3 MacBook Air with 24GB we saw roughly 12 tokens per second with everything tuned. Fine for testing and private work, painful as a daily driver.

What models work best with Hermes Agent?

For hosted use, any strong tool-calling model on OpenRouter, Anthropic, or OpenAI works — run hermes model and pick from the list. For local use, the Qwen family has been the safe default in our testing. We tried gemma4:26b and it mangled tool-call arguments: it truncated file paths mid-string, which breaks agent tool use in ways that look like random failures rather than a model problem. Whatever you choose, it needs solid function-calling fidelity and a context window of at least 64K tokens, because Hermes leans heavily on tools and long sessions.

Is Hermes Agent free?

The framework itself is free and open source from Nous Research (github.com/NousResearch/hermes-agent). Your costs come from the model you run it on: a hosted endpoint bills per token, while a local Ollama model is free to run but limited by your hardware. OpenRouter has free-tier models that work for getting started, and Nous Portal offers an OAuth setup that bundles a model with the Tool Gateway tools. There is no license fee for Hermes itself.

Hermes or OpenClaw for beginners?

If your goal is a production agent on Telegram or Slack this week, OpenClaw is the gentler path: a single SOUL.md config file, built-in channels, and a large template library to fork from. If your goal is to experiment with a self-improving agent — skills that evolve, a reflection loop, persistent memory — Hermes is the more interesting playground and the setup in this guide is very manageable. Many builders run both: OpenClaw in production, Hermes as the research sandbox. Our full comparison covers the tradeoffs in detail.

Why does Hermes ignore my Ollama model?

Almost certainly the context window. Ollama loads models at a 4K token context by default, and Hermes requires a minimum of 64K — so it silently declines to use the model rather than erroring loudly. The fix is to create a variant with a larger context: write a Modelfile containing FROM your-model and PARAMETER num_ctx 65536, run ollama create your-model-hermes -f Modelfile, then point Hermes at the new model name and set context_length in config.yaml so Hermes knows the real window. This cost us a frustrating evening; the dedicated section in this guide walks through it step by step.
