AUTO_HERMES

Help & documentation

How AUTO_HERMES works

AUTO_HERMES is an autonomous software factory that runs on your own Mac. A local model writes the code, a frontier model reviews it, and the loop ships real software overnight. This page explains the parts and how they fit together.

What is AUTO_HERMES?

AUTO_HERMES is not a chatbot. It is a closed loop of two AI agents that coordinate only through plain markdown files, with no human in the overnight cycle. One agent writes, the other reviews, and only work that passes review is promoted.

It runs unattended on Apple Silicon, advancing several independent projects at once. You set the backlog; you wake up to progress across every lane.

In one line: a local LLM does the work, a frontier model is the editor, and markdown is the only thing they hand back and forth.

The two-agent loop

The loop has exactly two roles, and they never share memory directly. They communicate the way two engineers would over a shared repo: through files.

  • Hermes (the writer) runs locally on your Mac. It reads the backlog, writes code, runs tests, and commits. It is fast and private, and there is no per-token bill for writing a change.
  • Opus (the reviewer) is a frontier model in the cloud: Claude Opus, or the newer Fable. It reviews every line of what Hermes produced, and promotes only the work that meets the bar. If something fails, it goes back with notes.

The handoff repeats all night, across every project at once. The point is to combine local speed and privacy with frontier-level judgement: the best of both, without a human babysitting the cycle.

Write ↔ review: one produces, one approves.
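As a rough illustration of the handoff described above, here is a minimal sketch of one lane's write and review cycle. The file names HANDOFF.md and REVIEW.md, the Verdict type, the round limit, and both stub agents are assumptions made for this example, not AUTO_HERMES internals. In a real setup the writer would call the local model and the reviewer would call the frontier model.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable
import tempfile


@dataclass
class Verdict:
    approved: bool
    notes: str


def run_cycle(workdir: Path, task: str,
              write: Callable[[str, str], str],
              review: Callable[[str], Verdict],
              max_rounds: int = 3) -> bool:
    """Alternate writer and reviewer until approval or max_rounds is reached."""
    handoff = workdir / "HANDOFF.md"      # hypothetical file name
    review_file = workdir / "REVIEW.md"   # hypothetical file name
    for round_no in range(1, max_rounds + 1):
        # The writer sees only the task and the reviewer's last notes file.
        notes = review_file.read_text() if review_file.exists() else ""
        draft = write(task, notes)
        handoff.write_text(
            f"# Task\n{task}\n\n# Round\n{round_no}\n\n# Change\n{draft}\n"
        )
        # The reviewer sees only the handoff file, never the writer's memory.
        verdict = review(handoff.read_text())
        status = "APPROVED" if verdict.approved else "REJECTED"
        review_file.write_text(
            f"# Verdict\n{status}\n\n# Notes\n{verdict.notes}\n"
        )
        if verdict.approved:
            return True    # promote the change
    return False           # still failing: leave it for a human


def stub_writer(task: str, notes: str) -> str:
    # Stand-in for the local model: adds tests once the notes ask for them.
    return "add(a, b) plus unit tests" if "tests" in notes else "add(a, b)"


def stub_reviewer(handoff_text: str) -> Verdict:
    # Stand-in for the frontier model: approves only changes with tests.
    if "unit tests" in handoff_text:
        return Verdict(True, "looks good")
    return Verdict(False, "missing tests")


with tempfile.TemporaryDirectory() as tmp:
    promoted = run_cycle(Path(tmp), "Implement add", stub_writer, stub_reviewer)
    print(promoted)
```

With these stubs the first round is rejected for missing tests and the second is approved, so the only state carried between rounds is what sits in the two markdown files.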

Under the hood

Everything the writer needs runs on the machine in front of you. The only piece in the cloud is the reviewer.

  • Apple Silicon. An M-series Mac runs the whole writer side locally.
  • MLX. Apple's ML framework serves the local model fast on the GPU.
  • Qwen / Gemma. The local LLMs that write the code (Qwen, or Google's Gemma), on-device.
  • Claude Opus / Fable. The frontier reviewer that checks every change.
  • Claude Code. The agent runtime that drives the loop.
  • git. Every step is versioned and snapshotted.
  • Local-first. The writer runs on your Apple Silicon Mac.
  • Six lanes, in parallel. Independent projects advance together.

The lanes

A "lane" is an independent project the loop drives. New lanes spin up as fast as a backlog can be written. A sample of what is green-lit right now:

  • bullhorn. A local-first investor desk.
  • The Card Room. Handcrafted solitaire & puzzles.
  • explorables. Interactive explainers.
  • HistoryMâché. A turn-based tactical wargame.
  • vendormerge. On-prem master-data dedup.
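To make the idea of a lane concrete, the sketch below models a lane as a name plus a markdown backlog and picks the next open task for each lane. The checkbox backlog format ("- [ ]" for open, "- [x]" for done), the Lane class, and the sample tasks are all assumptions for illustration. The page does not specify how backlogs are written.

```python
from dataclasses import dataclass
from typing import Dict, List
import re

# Assumed backlog format: markdown task-list items.
TASK_RE = re.compile(r"^\s*[-*] \[( |x|X)\] (.+)$")


@dataclass
class Lane:
    name: str
    backlog_md: str  # contents of the lane's backlog file

    def open_tasks(self) -> List[str]:
        """Return unchecked backlog items in file order."""
        tasks = []
        for line in self.backlog_md.splitlines():
            match = TASK_RE.match(line)
            if match and match.group(1) == " ":
                tasks.append(match.group(2).strip())
        return tasks


def next_round(lanes: List[Lane]) -> Dict[str, str]:
    """Pick the first open task per lane; lanes with nothing open are skipped."""
    picks = {}
    for lane in lanes:
        tasks = lane.open_tasks()
        if tasks:
            picks[lane.name] = tasks[0]
    return picks


# Made-up backlog entries for two of the lanes listed above.
lanes = [
    Lane("bullhorn", "- [x] set up repo\n- [ ] add watchlist view\n"),
    Lane("vendormerge", "- [x] import sample records\n"),
]
print(next_round(lanes))
```

Because each lane carries its own backlog and nothing else, adding a lane in this sketch means appending one more Lane to the list.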

Safety & self-healing

An unattended loop has to be hard to break and easy to recover. Three things keep it honest:

  • Git safety nets. Every change is versioned and snapshotted, so the loop can experiment freely and any earlier state can be restored.
  • Self-healing. A model watchdog and deterministic state writers keep the loop alive through restarts, with no babysitting.
  • Mission control. A live dashboard shows every lane at a glance: what shipped, what is building, and what needs a human.

Git safety nets: versioned, snapshotted, reversible.
Self-healing: survives restarts on its own.
Mission control: every lane at a glance.
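The page does not say how the deterministic state writers are built. One common way to make a state file safe across restarts is to write to a temporary file and rename it into place, so a crash never leaves a half-written file. The sketch below assumes a JSON state file; write_state, read_state, and the state keys are hypothetical names for this example.

```python
import json
import os
import tempfile
from pathlib import Path


def write_state(path: Path, state: dict) -> None:
    """Write state so that readers see either the old file or the new one."""
    # sort_keys makes the output deterministic: the same state always
    # produces the same bytes, whatever order the keys were inserted in.
    payload = json.dumps(state, sort_keys=True, indent=2) + "\n"
    fd, tmp = tempfile.mkstemp(
        dir=str(path.parent), prefix=path.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())   # push the bytes to disk before renaming
        os.replace(tmp, path)      # atomic rename on the same filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)         # do not leave stray temp files behind
        raise


def read_state(path: Path, default: dict) -> dict:
    """Load the last complete state, or a copy of default on first run."""
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        return dict(default)
```

After a restart, a loop built this way would call read_state and resume from whatever the last completed write recorded.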

Privacy & your data

Be clear-eyed about where the data goes, because the honest answer is the selling point.

  • The writer is local. Your code, your backlog, and the model that writes the code stay on your machine. Nothing leaves the machine to generate a change.
  • The reviewer is cloud. The review step sends the relevant diff to a frontier model (Claude Opus / Fable). That is a deliberate trade: frontier judgement in exchange for the review context leaving the machine.

This is not a full air-gap, and it does not pretend to be. If a lane must never touch the cloud, the review step can be scoped, redacted, or run against a local reviewer instead.
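As one possible reading of "redacted", the sketch below masks obvious secret assignments in a diff before it would be sent for review. The pattern list and the redact_diff name are assumptions for this example, and a regex like this catches only simple cases. It is not a substitute for a proper secret scanner.

```python
import re

# Hypothetical patterns: "key = value" or "key: value" style assignments
# whose name is exactly api_key, token, secret, or password.
SECRET_PATTERNS = [
    re.compile(r"(?i)\b(api[_-]?key|token|secret|password)\b(\s*[:=]\s*)(\S+)"),
]


def redact_diff(diff: str) -> str:
    """Replace the value of any matching assignment with [REDACTED]."""
    redacted_lines = []
    for line in diff.splitlines():
        for pattern in SECRET_PATTERNS:
            # Keep the name and the separator, drop only the value.
            line = pattern.sub(
                lambda m: f"{m.group(1)}{m.group(2)}[REDACTED]", line
            )
        redacted_lines.append(line)
    return "\n".join(redacted_lines)


sample = '+API_KEY = "sk-123"\n+retries = 3'
print(redact_diff(sample))
```

The diff keeps its shape, so a reviewer can still see that a key was added on that line without seeing its value.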

Getting one running

AUTO_HERMES is built and operated by siTOTis for teams who want frontier capability without handing their whole codebase to the cloud. If you want one running on your Mac, start here.

Work with siTOTis

FAQ

Does it really run with no one watching?
Yes. Once a backlog is written, the loop writes, tests, reviews, and promotes on its own overnight. A human reviews the morning's results, not the cycle itself.

What does the local model need?
An Apple Silicon Mac with enough unified memory to hold the model. The writer is served through MLX; the more memory the machine has, the more lanes it can hold at once.

Why two models instead of one?
A local model is fast, private, and cheap to run, but a frontier model still has the sharper judgement. Splitting the roles gets the speed of local with the quality bar of frontier.

Can it work on my existing codebase?
That is the intended use. Each lane is just a repo with a backlog. New lanes are added as fast as you can describe the work.

What happens when a change is wrong?
It does not get promoted. The reviewer sends it back with notes, and because every step is in git, a bad path is reverted without losing anything.
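For the question above about what the local model needs, a back-of-the-envelope estimate of weight memory is parameter count times bits per weight, divided by eight. The sketch below applies that arithmetic. The 32-billion-parameter, 4-bit example and the 70% headroom factor are illustrative assumptions, not AUTO_HERMES requirements, and the estimate ignores the KV cache and activations.

```python
def weights_gib(params_billion: float, bits_per_weight: int) -> float:
    """Approximate memory needed for the model weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def fits(params_billion: float, bits_per_weight: int,
         unified_memory_gib: float, headroom: float = 0.7) -> bool:
    """Check the weights against an assumed usable share of unified memory."""
    # headroom is a guess at how much memory is left for the model once
    # the OS, other apps, and inference overhead are accounted for.
    usable = unified_memory_gib * headroom
    return weights_gib(params_billion, bits_per_weight) <= usable


print(round(weights_gib(32, 4), 1))   # 32B parameters at 4 bits per weight
print(fits(32, 4, 16), fits(32, 4, 32))
```

Under these assumptions a 32B model at 4 bits needs roughly 15 GiB for weights, which does not fit the usable share of a 16 GiB machine but does fit a 32 GiB one.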