Why I Moved from OpenClaw to Hermes


A month ago I thought I had the right answer: split everything into specialists.

At the peak, my setup had sixteen agents. One for email. One for writing. One for research. One for infrastructure. Several more for code, review, critique, QA, and orchestration. On paper it looked elegant — decomposition, clear ownership, domain-specific memory, explicit routing.

In practice it gradually became something else: an overengineered system that demanded more maintenance than it returned.

So I moved the whole thing to Hermes.

This post is not a generic “new framework is better” piece. It’s what actually changed, what broke in the old model, and the decision rule I’d recommend if you’re building your own AI setup today.

What OpenClaw gave me

I want to be fair to OpenClaw, because it solved a real problem before most tools in this space even acknowledged it.

It gave me three things that mattered:

  1. Persistence beyond one chat window. The assistant could remember prior work, not just the current prompt.
  2. A messaging-native interface. Matrix, email, scheduled jobs, background work — not just an IDE pane.
  3. A playground for architecture. It was easy to experiment with routing, specialists, cron-like workflows, memory layers, and custom coordination patterns.

That mattered. Session-only tools are useful, but they start every day half-amnesic. Even The New Stack’s recent comparison between OpenClaw and Hermes framed this as the core shift: from session-bound assistants to persistent agents that actually accumulate working context over time.

OpenClaw was the first system in my stack that made that future feel real.

Where it started to fail

The problem wasn’t that OpenClaw was incapable. The problem was that it made it too easy to build a system whose theoretical power exceeded its operational reliability.

I kept layering solutions on top of solutions:

  • more specialists to reduce context pollution
  • more routing logic to choose the right specialist
  • more handoff rules between agents
  • more memory files to keep each agent focused
  • more orchestration to recover when a chain stalled

Eventually the architecture itself became the workload.

When a task failed, the debugging question was no longer “did the model misunderstand the request?” It became:

  • did the worker fail?
  • did the handoff fail?
  • did the orchestrator miss the signal?
  • did the wrong specialist get selected?
  • did the downstream agent lack one specific piece of context the upstream agent had?

That’s not an AI problem. That’s a distributed-systems tax.
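To make that tax concrete, here is a back-of-the-envelope sketch. The step count and success rates are invented for illustration, not measured from my setup: the point is only that a chained pipeline succeeds when every step succeeds, so reliability compounds multiplicatively with each hop.

```python
# Toy model: a chained multi-agent task succeeds only if every step
# (worker, handoff, orchestrator signal, specialist selection,
# context transfer) succeeds. The rates are illustrative.
def chain_reliability(step_success_rates):
    """Probability that every step in the chain succeeds."""
    p = 1.0
    for rate in step_success_rates:
        p *= rate
    return p

# Five steps, each 95% reliable on its own:
print(round(chain_reliability([0.95] * 5), 3))  # → 0.774
```

Five individually decent steps still lose roughly one task in four. That is the multiplication nobody puts on the architecture diagram.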

I wrote earlier about announce-based orchestration failures and the filesystem workaround I ended up using. That workaround worked. But that’s also the point: if your personal assistant requires production-grade coordination patterns to stay reliable, you’ve crossed from useful complexity into accidental complexity.

Sixteen agents, one lesson

The biggest lesson from the 16-agent phase is not “multi-agent is bad.” It’s more precise than that:

Persistent multi-agent setups are expensive unless the domains are truly independent and high-volume.

I had a specialist for nearly everything because I wanted quality. And yes, in some cases quality improved. Focused writer beats generalist writer. Focused reviewer beats generalist reviewer.

But over time I noticed something more important.

Most of my day does not consist of sixteen independent lanes of work running in parallel. It consists of one human agenda with occasional spikes of specialized work.

That means the dominant case is not:

  • email specialist
  • blog specialist
  • infrastructure specialist
  • code reviewer specialist
  • critic specialist
  • all active all the time

The dominant case is:

  • one trusted assistant with continuity
  • one active thread of context
  • occasional need for a highly specialized coding burst

Those are different architectures.

I had optimized for the wrong one.

What Hermes changed

Hermes pushed me back toward the simpler model: one primary assistant that is good at staying useful over time.

What I wanted in the end was not an agent zoo. I wanted a system I trust.

For me, Hermes is the better fit because it is opinionated in the right places:

  • stronger emphasis on durable memory and recall discipline
  • cleaner operational loop around tools, verification, and follow-through
  • better fit for one ongoing assistant relationship instead of many semi-permanent personas
  • easier to keep understandable after weeks of iteration

That last point matters more than people admit.

A personal AI system is not finished when it can do impressive things. It’s finished when you can still understand, repair, and extend it after a month of real life.

OpenClaw encouraged me to explore. Hermes encourages me to simplify.

Right now, simplification is worth more.

Why Claude Code and Codex changed the equation

The other thing that made the big permanent multi-agent setup less compelling was the rise of strong task-specific coding agents.

Both Claude Code and Codex are explicit about what they are in their own docs: local coding agents that can inspect a repo, edit files, and run commands in a focused working directory. That’s exactly the point.

They don’t need to be my forever assistant. They need to be very good at this code problem, right now.

Once those tools became good enough, a lot of my specialist-agent architecture stopped making economic sense.

I no longer need to keep a permanent code-writer, code-reviewer, or test-writer persona alive as part of one giant always-on constellation just in case I need them later. When I hit a serious implementation task, I can use Claude Code or Codex directly on that repository.

That changes the architecture boundary.

Instead of:

  • one persistent system that contains every specialization internally

I can do:

  • one persistent assistant for continuity, operations, memory, messaging, and daily work
  • one ephemeral specialist agent for the hard coding task in front of me

That’s a better split.

The persistent layer keeps history and context. The specialist layer brings concentrated capability on demand.

Those two jobs do not need to live in the same permanent structure.

The practical decision rule

If you’re deciding between a persistent agent runtime and a pile of coding subagents, this is the rule I’d use now.

Use a persistent assistant when the value comes from continuity:

  • remembering your preferences
  • carrying forward project context across days
  • handling scheduled workflows
  • integrating with messaging, email, calendars, or home systems
  • reducing repeated coordination overhead

Use a repo-local specialist agent when the value comes from depth on one bounded task:

  • implementing a feature
  • reviewing a pull request
  • debugging a failing test suite
  • refactoring one codebase
  • researching one technical decision

Don’t force the persistent assistant to impersonate an entire software organization. Don’t force the repo-local coding tool to become your life OS.

Those are different tools.
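The rule above is simple enough to write down as a tiny dispatch function. This is a hypothetical sketch of my own heuristic; the `Task` fields and `pick_agent` are illustrative names, not any framework’s schema:

```python
from dataclasses import dataclass

@dataclass
class Task:
    needs_memory: bool   # preferences, cross-day project context
    scheduled: bool      # recurring or background workflow
    repo_bound: bool     # one bounded codebase or technical problem

def pick_agent(task: Task) -> str:
    # Depth on one bounded task, with no continuity requirement:
    # reach for a repo-local specialist.
    if task.repo_bound and not (task.needs_memory or task.scheduled):
        return "repo-local specialist"
    # Everything else flows through the persistent assistant.
    return "persistent assistant"

print(pick_agent(Task(needs_memory=False, scheduled=False, repo_bound=True)))
# → repo-local specialist
```

The real decision has more texture than three booleans, but the shape is right: continuity routes to the persistent layer, bounded depth routes to the ephemeral one.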

What readers should take from this

The important takeaway is not “single agent good, multi-agent bad.”

It’s this:

Optimize for reliability before capability surface area.

A system that can theoretically do ten kinds of delegation but fails one out of five times is worse than a simpler system that reliably completes the boring parts of your day.

The second takeaway:

Count maintenance, not just features.

Every additional agent, memory file, router, handoff rule, and background workflow has a carrying cost. If you don’t include that cost in the architecture decision, you’ll overbuild.
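One way to force yourself to count it is a toy upkeep ledger: components times minutes of weekly maintenance each. The counts and minutes below are invented for illustration, not logged from my actual setup:

```python
# Hypothetical weekly maintenance ledger: (count, minutes/week each).
components = {
    "agents": (16, 10),
    "routers": (3, 20),
    "memory_files": (16, 5),
    "handoff_rules": (10, 5),
}
weekly_minutes = sum(count * minutes for count, minutes in components.values())
print(weekly_minutes)  # → 350
```

Nearly six hours a week of pure upkeep, before the system has done any actual work. Even if your numbers are half of these, the cost is real enough to belong in the architecture decision.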

And the third:

Use specialization at the edge, not necessarily at the center.

That was the real shift for me. I still use specialized agents. I just don’t keep them all running as permanent residents inside one increasingly elaborate assistant runtime. For coding, it is often better to reach for Claude Code or Codex exactly when the problem calls for them, then come back to the main assistant when the task is over.

That gives me the upside of specialization without paying permanent orchestration tax.

Closing

OpenClaw was an important stage in the path. It helped me discover what I actually wanted from an AI system — and just as importantly, what I didn’t.

What I want now is much less flashy and much more useful:

  • one assistant I trust
  • strong memory
  • clean operational behavior
  • specialized coding help on demand
  • fewer moving parts

Hermes is closer to that target.

Not because it lets me build more. Because it lets me need less.
