> A practical guide to the Nous Research open-source framework. > Based on Hermes Agent **v0.8.0** (v2026.4.8 — "the intelligence release"). Updated: April 9, 2026. > Original content: https://github.com/alchaincyf/hermes-agent-orange-book ## Part 1: Concepts ### 01 Not Another Agent: From Harness to Hermes **What is Harness Engineering?** In early 2026, a consensus emerged in the AI coding world: the bottleneck isn't the model - it's the environment. The LangChain team ran an experiment using the same model (GPT-5.2-Codex), only adjusting the surrounding "harness" configuration. Scores jumped from 52.8% to 66.5%, rankings leaped from Top 30 to Top 5. Not a single line of model code was changed. Mitchell Hashimoto (creator of Terraform) named this: **Harness Engineering**. His approach: every time the AI made a mistake, add a rule so it would never make the same mistake again. **The five-component mapping:** | Harness Component | Manual Implementation | Hermes Built-in System | |---|---|---| | Instruction Layer | Hand-write CLAUDE.md / AGENTS.md | Skill system (markdown skill files, auto-created + self-improving) | | Constraint Layer | Configure hooks / linter / CI | Tool permissions + sandbox + toolset enabled on demand | | Feedback Layer | Manual review / evaluator Agent | Self-improving Learning Loop (auto-retrospective after each task) | | Memory Layer | Manually maintain knowledge base | Three-layer memory (session/persistent/Skill) + Honcho user modeling | | Orchestration Layer | Build your own multi-Agent pipeline | Sub-Agent delegation + cron scheduling | **Hermes vs OpenClaw vs Claude Code:** - **Claude Code** - interactive coding. Your pair-programming partner at the terminal. - **OpenClaw** - configuration-as-behavior. Write a SOUL.md and it becomes what you want. 5,700+ community Skills on ClawHub. - **Hermes** - autonomous background work + self-improvement. Runs on its own, learns on its own. Online 24/7 via Telegram/Discord/Slack. 
All three tools use the **agentskills.io standard**, so Skills are interoperable. ### 02 Hermes at a Glance: 60 Seconds **Architecture in one line:** ``` Learning Loop --> Three-Layer Memory --> Skill System --> 40+ Tools --> Multi-Platform Gateway ``` **Key numbers (v0.8.0, released April 8, 2026):** | Metric | Data | |---|---| | GitHub stars | 41,200+ | | Built-in tools | 40+ | | Supported platforms | 14 | | MCP integrations | 6,000+ apps | | Sub-Agent concurrency | Up to 3 | | Minimum deployment cost | $5/month VPS | | Memory usage | <500MB (without local LLM) | | License | MIT (fully open source) | **Key differences from OpenClaw:** | Dimension | Hermes Agent | OpenClaw | |---|---|---| | Core philosophy | Self-improving Learning Loop | Configuration-as-behavior (SOUL.md) | | Memory | Three-layer self-improving | Multi-layer, primarily manually maintained | | Skill maintenance | Agent auto-creates + self-improves | Manually written and maintained | | User modeling | Honcho dialectical modeling (12-layer identity inference) | Based on SOUL.md configuration | | Multi-platform | 14-platform Gateway | 50+ messaging platforms | | Ecosystem | 40+ built-in tools + MCP 6,000+ | ClawHub 5,700+ community Skills | | Deployment | Self-hosted (from $5 VPS) | Official hosting / self-hosted | | Skill interop | Both use agentskills.io standard | Both use agentskills.io standard | ## Part 2: Core Mechanisms ### 03 The Learning Loop The Learning Loop has five steps that form a continuous improvement flywheel: ``` Curate Memory --> Create Skill --> Skill Self-Improvement --> FTS5 Recall --> User Modeling ``` **Step 1: Memory curation** After each conversation, Hermes actively decides what's worth remembering. Not passive storage - it writes valuable info into SQLite with FTS5 full-text indexing. Like a person writing a diary. **Step 2: Autonomous Skill creation** When a complex task is complete, Hermes asks: "will this solution be useful again?" 
If yes, it distills it into a Skill file at `~/.hermes/skills/`. **Step 3: Skill self-improvement** Every time a Skill is used and you provide feedback, Hermes modifies the Skill itself. It updates documentation and standards - not just the current output. **Step 4: FTS5 cross-session recall** Uses SQLite's FTS5 extension for full-text indexing. Before each new conversation, it searches historical memory based on the current topic and loads only the relevant parts. All local - no privacy concerns. **Step 5: User modeling** Honcho user modeling (by Plastic Labs) infers what kind of person you are across 12 identity layers - not just what you said, but deeper patterns from behavior. **Manual vs automated comparison:** | Dimension | Mitchell's Way (Manual) | Hermes's Way (Automated) | |---|---|---| | Rule source | Human spots a problem, writes it down | Agent extracts from its own feedback | | Storage | CLAUDE.md (single file) | Multiple Skill files + memory database | | Improvement trigger | Only when human remembers | Automatic evaluation after every use | | Cross-project portability | Manually copy CLAUDE.md | Skills are global, shared across all projects | | Improvement speed | Depends on human diligence | Continuous and automatic | --- ### 04 Three-Layer Memory **Layer 1: Session memory (Episodic)** - Answers: "What happened?" - Every conversation's content, tool calls, and results written to SQLite with FTS5 - On-demand retrieval - not all history loaded at once - Purely local, no network dependency **Layer 2: Persistent memory (Semantic)** - Answers: "Who are you?" - Stores durable state: coding preferences, project structure habits, toolchain - Stored in SQLite under `~/.hermes/` - Portable: back up the directory and continue on any machine **Layer 3: Skill memory (Procedural)** - Answers: "How to do things?" 
- Each Skill is a markdown file in `~/.hermes/skills/` - Human-readable and editable **Cognitive science analogy:** | Memory Type | What Hermes Stores | Human Analogy | |---|---|---| | Episodic | What happened | Remembering falling off a bike | | Semantic | Who you are + project context | Knowing to keep center of gravity low | | Procedural | How to do things (Skills) | Body automatically balancing | **Honcho (optional add-on):** - Dialectical user modeling with 12 identity layers - Infers technical level, work rhythm, communication style, goals, emotional patterns - Catches inconsistencies between stated and revealed preferences - Injected as invisible context into subsequent prompts **Memory plugins (expanded in v0.8.0):** The built-in SQLite memory is solid for solo use, but v0.8.0 turns the memory layer into a proper plugin system. You pick your backend: | Plugin | What It Does | Best For | |---|---|---| | **Built-in SQLite + FTS5** | Default. Local, fast, private | Solo use, privacy-first setups | | **Supermemory** | Cloud-hosted semantic memory, multi-container, per-user scoping | Teams, multi-platform deployments | | **mem0** (v2 API) | Managed long-term memory with semantic search | Production agents, API-first setups | | **Hindsight** | Reflective memory that learns from past sessions | Research workflows, iterative projects | | **RetainDB** | Structured memory with dialectic mode | Data-heavy agents | | **ByteRover** | Pre-LLM-call context injection | Latency-sensitive pipelines | | **OpenViking** | Multi-tenant server mode with tenant-scoping headers | Enterprise / shared deployments | All plugins now receive the gateway `user_id` for per-user memory scoping. This matters when your Hermes instance serves multiple people across Telegram or Discord. > Warning: Memory has no automatic expiration by default. Audit `~/.hermes/` periodically. The silent `/new` and `/resume` memory flush failure is fixed in v0.8.0. 
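The local recall pattern this chapter describes — write memories into SQLite's FTS5 index, then pull back only topic-relevant snippets — can be sketched in a few lines of Python. The table name `memories` and the sample rows below are illustrative assumptions, not Hermes's actual schema; inspect your own `state.db` before querying it.

```python
import sqlite3

# Illustrative sketch of FTS5-based memory recall (the schema here is
# assumed, not Hermes's real one). In practice you would open
# ~/.hermes/state.db instead of an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE memories USING fts5(content)")
conn.executemany(
    "INSERT INTO memories (content) VALUES (?)",
    [
        ("User prefers a low-cost VPS deployment",),
        ("Project hermes-bot uses Telegram as its gateway",),
        ("User dislikes verbose commit messages",),
    ],
)

def recall(topic: str, limit: int = 3) -> list[str]:
    """Return the memory snippets most relevant to the current topic."""
    rows = conn.execute(
        "SELECT content FROM memories WHERE memories MATCH ? "
        "ORDER BY rank LIMIT ?",
        (topic, limit),
    ).fetchall()
    return [r[0] for r in rows]

print(recall("VPS"))  # only the deployment memory matches
```

Before each new conversation, a query like this loads a handful of relevant snippets instead of the full history — which is why token costs stay flat as memory grows.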
| Remember | Don't Remember | |---|---| | User preferences and habits | One-off task details | | Project context | Outdated information | | Validated solutions (Skills) | Wrong inferences (clean these up) | | Recurring patterns | Sensitive info (passwords, keys) | --- ### 05 The Skill System **Three sources of Skills:** | Source | Description | Scale | |---|---|---| | Bundled Skills | Pre-built capabilities shipping with install | 40+ | | Agent-Created | Automatically distilled after complex tasks | Grows with usage | | Skills Hub | Community-contributed, installable with one click | Continuously growing | **agentskills.io standard:** - Supported by 30+ tools including Claude Code, Cursor, Copilot, Codex CLI, Gemini CLI - Skills you wrote for Claude Code work directly in Hermes, and vice versa - Not a walled garden - like a USB port, one Skill plugs in anywhere **Skill self-improvement cycle:** 1. Execute the Skill 2. Collect feedback (user reactions logged into session memory) 3. Agent analyzes feedback and modifies the Skill file 4. Next execution uses the new version **OpenClaw vs Hermes Skills:** | Dimension | OpenClaw Skills | Hermes Skills | |---|---|---| | Creation | Manually written SOUL.md | Agent-created + manually written | | Maintenance | Manual updates | Auto-evolution + manual intervention | | Personalization | Generic templates, fork to customize | Grows organically from usage habits | | Ecosystem Size | 5,700+ (large) | 40+ bundled + community (growing) | > Note: Skill self-improvement requires clear feedback. Vague "something's off" doesn't help. Good feedback = good evolution direction.
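Because a Skill is just a markdown file, the four-step cycle above reduces to file edits. Here is a hypothetical sketch — the file layout and feedback format are made up for illustration, and Hermes's real editing logic is model-driven rather than a simple append:

```python
from pathlib import Path
import tempfile

# Step 1: "execute" a Skill by loading its current rules into the prompt.
def load_skill(skill_file: Path) -> str:
    return skill_file.read_text()

# Steps 2-3: fold explicit user feedback back into the Skill file itself,
# so Step 4 (the next execution) automatically uses the new version.
def improve_skill(skill_file: Path, feedback: str) -> None:
    rules = skill_file.read_text().rstrip()
    skill_file.write_text(rules + f"\n- {feedback}\n")

# Demo on a throwaway skill file (the path and contents are illustrative).
skill = Path(tempfile.mkdtemp()) / "SKILL.md"
skill.write_text("# Git Commit Style\n## Rules\n- First line under 50 chars\n")
improve_skill(skill, "Never use the passive voice in commit bodies")
print("passive voice" in load_skill(skill))  # True
```

The point of the sketch: improvement persists in the artifact, not in the conversation. The next session reads the updated file with no extra prompting.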
--- ### 06 40+ Tools and MCP **Five tool categories:** | Category | Core Tools | What They Do | |---|---|---| | Execution | terminal, code_execution, file | Run commands, execute code (sandboxed), read/write files | | Information | web, browser, session_search | Web search, browser automation, search conversation history | | Media | vision, image_gen, tts | Understand images, generate images, text-to-speech | | Memory | memory, skills, todo, cronjob | Operate memory layer, manage Skills, task planning, scheduled jobs | | Coordination | delegation, moa, clarify | Delegate to sub-agents, multi-model reasoning, ask user for clarification | Notable tools: - **session_search** - FTS5 full-text indexing of conversation history with LLM summarization - **moa** (Multi-model Orchestrated Answering) - calls multiple LLMs simultaneously, synthesizes responses - **cronjob** - natural language scheduled tasks ("check my GitHub notifications every morning at 9am") - **notify_on_complete** (new in v0.8.0) - background processes auto-notify the agent when they finish. Start a long-running build, test suite, or AI training run and walk away. The agent picks up results when they land without polling. **Toolsets mechanism:** Tools are grouped and enabled/disabled in `config.yaml`. Fewer enabled tools = more focused agent, faster response, fewer tokens consumed. Toolsets also serve as security boundaries. **MCP (Model Context Protocol):** - Open standard proposed by Anthropic in late 2024 - Hermes supports stdio or HTTP connection to any MCP Server - 6,000+ applications covered: GitHub, Slack, Jira, Google Drive, databases, etc. 
- Per-server tool filtering: specify which tools each server can expose - Full MCP OAuth 2.1 PKCE authentication added in v0.8.0 - Automatic OSV malware scanning of MCP extension packages on install (v0.8.0) **Sub-Agent delegation:** - Up to 3 concurrent sub-agents - Each has independent context, restricted toolset, isolated terminal sessions - Results relayed back to main agent for consolidation - Best for: "do several unrelated things and then combine results" --- ## Part 3: Hands-On Setup ### 07 Installation and Configuration **Option 1: Local Install (5 minutes)** ```bash curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash ``` Then launch with: `hermes` **Option 2: Docker** ```bash docker pull nousresearch/hermes-agent:latest docker run -v ~/.hermes:/opt/data nousresearch/hermes-agent:latest ``` Key: `-v ~/.hermes:/opt/data` maps state to host. All state lives in one directory. **Option 3: $5 VPS for 24/7 uptime** | VPS Provider | Monthly Cost | Notes | |---|---|---| | Hetzner CX22 | ~$4/mo | Best value, European nodes | | DigitalOcean Droplet | $5/mo | Singapore/US West nodes | | Vultr | $5/mo | Tokyo node, low latency | Pick Ubuntu 22.04 LTS, SSH in, run the install script. 
**config.yaml structure:** ```yaml model: provider: openrouter api_key: sk-or-xxxxx model: anthropic/claude-sonnet-4 terminal: local # local/docker/ssh/daytona/modal gateway: telegram: token: YOUR_BOT_TOKEN discord: token: YOUR_BOT_TOKEN ``` **Model providers:** | Provider | Recommended Models | Best For | |---|---|---| | OpenRouter | Claude Sonnet 4 / GPT-4o | 200+ models, flexible switching | | Nous Portal | Hermes 3 series + MiMo v2 Pro (free tier) | Officially recommended | | OpenAI | GPT-4o / o3 | Direct API | | Google AI Studio | Gemini 2.5 Pro / Flash | Native Gemini, auto context detection via models.dev | | Ollama | Hermes 3 8B/70B | Fully offline, privacy first | > Note: As of April 2026, Anthropic banned third-party tools from accessing Claude through Pro/Max subscriptions. Use API keys (pay-as-you-go) or OpenRouter/Nous Portal instead. **Live model switching (new in v0.8.0):** Use `/model` mid-session from the CLI, Telegram, Discord, or Slack. No restart needed. Telegram and Discord get an interactive inline button picker. Aggregator-aware: stays on OpenRouter/Nous when possible, falls back cross-provider automatically. **Terminal backends:** - `local` - runs directly on your machine - `docker` - runs inside a container (isolated, secure) - `ssh` - connects to a remote server - `daytona` / `modal` - serverless, spins up on demand - `singularity` - for HPC clusters --- ### 08 First Conversation After launching, `~/.hermes/` structure: ``` ~/.hermes/ ├── config.yaml # Your configuration ├── state.db # SQLite database (conversation history + FTS5 index) ├── skills/ # Skills directory │ └── bundled/ # Built-in Skills ├── memories/ # Persistent memory (MEMORY.md + USER.md) └── logs/ # Centralized logs (new in v0.8.0) ├── agent.log # INFO+ level events └── errors.log # WARNING+ level events ``` Use `hermes logs` to tail and filter logs from the CLI. Config structure validation now catches malformed YAML at startup before it causes cryptic failures. 
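Since everything lives under one directory, a quick health check is easy to script. A minimal sketch using the entry names from the tree above (the checker itself is a hypothetical convenience, not a Hermes command):

```python
from pathlib import Path
import tempfile

# Entry names taken from the ~/.hermes/ tree shown above.
EXPECTED = ["config.yaml", "state.db", "skills", "memories", "logs"]

def missing_entries(home: Path) -> list[str]:
    """Return expected entries absent from a ~/.hermes/-style directory."""
    return [name for name in EXPECTED if not (home / name).exists()]

# Demo on a throwaway directory; point it at Path.home() / ".hermes"
# on a real install.
demo = Path(tempfile.mkdtemp())
(demo / "config.yaml").touch()
(demo / "skills").mkdir()
print(missing_entries(demo))  # ['state.db', 'memories', 'logs']
```

The same one-directory property is what makes backup trivial: archive `~/.hermes/` and restore it on any machine.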
What happens behind the scenes from your first message: 1. Conversation written to `state.db` with FTS5 index 2. Preferences detected and written to persistent memory layer 3. After complex tasks, Skill files auto-created in `~/.hermes/skills/` 4. Skill improves when you give corrective feedback --- ### 09 Multi-Platform Access **Supported platforms (14 total):** Telegram, Discord, Slack, WhatsApp, Signal, Email, SMS (Twilio), Home Assistant, Mattermost, Matrix, DingTalk, Feishu/Lark, WeCom, Open WebUI **Telegram setup (3 steps, <2 minutes):** 1. Message @BotFather in Telegram, send `/newbot`, get Token 2. Add to `config.yaml` under `gateway.telegram.token` 3. Launch `hermes` - it auto-connects **Cross-platform continuity:** All platforms share the same Agent instance and memory. A conversation started on Telegram can be continued in the CLI. There is one brain, regardless of which door you walk through. **Practical deployment architecture:** ``` $5 VPS (Ubuntu 22.04) ├── Hermes Agent Core ├── Messaging Gateway │ ├── Telegram Bot (phone) │ ├── Discord Bot (team) │ └── Slack App (enterprise) ├── ~/.hermes/ │ ├── state.db │ ├── skills/ │ └── config.yaml └── Model calls --> OpenRouter API ``` Total cost: VPS $5/month + model API fees (~$2-5/month for light usage). --- ### 10 Custom Skills **Skill file structure:** ``` ~/.hermes/skills/ └── my-skill/ ├── SKILL.md # Entry point ├── references/ # Supporting reference files ├── templates/ # Templates └── scripts/ # Scripts ``` **Anatomy of a good Skill:** | Section | Purpose | Required? 
| |---|---|---| | Title | Quick identification | Yes | | Trigger | When to activate | Strongly recommended | | Rules | Concrete steps, constraints, formats | Yes | | Example | Complete input-to-output | Strongly recommended | | Don'ts | Explicit boundaries | Optional | **Example Skill (git-commit-style):** ```markdown --- name: git-commit-style description: Enforce a consistent Git commit message format version: "1.0.0" --- # Git Commit Style ## Trigger Activate when the user asks me to commit code, write a commit message, or review commit history. ## Rules ### Commit Message Format - First line: type(scope): summary (50 chars max) - Blank line - Body: explain WHY, not WHAT ### Type Enum - feat: new feature - fix: bug fix - refactor: restructure (no behavior change) - docs: documentation - test: tests - chore: build/toolchain ``` **Installing from Skills Hub:** Ask Hermes: "What community Skills are available?" -> "Install XX Skill." - Immediately active, no restart needed. **Porting Claude Code Skills:** Skills follow agentskills.io standard - copy to `~/.hermes/skills/skill-name/SKILL.md`. No format changes needed. Only adjust tool references if the Skill uses Claude Code-specific MCP servers. > Note: Skills can conflict if two have overlapping triggers. If behavior seems off, check for Skill conflicts in `~/.hermes/logs/`. --- ### 11 MCP Integration **Two connection modes:** | Mode | Server Location | Best For | Performance | |---|---|---|---| | stdio | Local subprocess | Local tools, file system, databases | Fast, no network overhead | | HTTP (StreamableHTTP) | Remote server | Cloud services, shared team servers | Depends on network | **Approval buttons (new in v0.8.0):** Dangerous commands no longer require you to type `/approve` in chat. Slack and Telegram now surface native inline buttons. Slack preserves full thread context; Telegram uses emoji reactions for approval status. Less friction, same safety boundary. 
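For the HTTP (StreamableHTTP) mode in the table above, a remote server entry might look like the sketch below. The key names (`transport`, `url`, `headers`) and the endpoint are assumptions for illustration — check the Hermes MCP reference for the exact schema your version expects:

```yaml
mcp_servers:
  team_jira:
    transport: http                       # remote StreamableHTTP server
    url: "https://mcp.example.internal/jira"
    headers:
      Authorization: "Bearer ${JIRA_MCP_TOKEN}"
    allowed_tools:                        # least privilege, as with stdio servers
      - "search_issues"
      - "get_issue"
```

stdio servers, by contrast, are declared with `command`/`args`, as in the examples that follow.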
**GitHub MCP setup:** ```yaml mcp_servers: github: command: "npx" args: ["-y", "@modelcontextprotocol/server-github"] env: GITHUB_PERSONAL_ACCESS_TOKEN: "${GITHUB_TOKEN}" ``` **Database MCP (PostgreSQL):** ```yaml mcp_servers: postgres: command: "npx" args: ["-y", "@modelcontextprotocol/server-postgres"] env: POSTGRES_CONNECTION_STRING: "postgresql://user:pass@localhost:5432/mydb" ``` **Per-server tool filtering (principle of least privilege):** ```yaml mcp_servers: github: command: "npx" args: ["-y", "@modelcontextprotocol/server-github"] env: GITHUB_PERSONAL_ACCESS_TOKEN: "ghp_xxxxx" allowed_tools: - "list_issues" - "create_issue" - "get_pull_request" - "create_pull_request_review" ``` **When to use MCP vs native tools:** - **Native tools** for: terminal commands, file operations, web search, image generation, memory management, sub-Agent delegation - **MCP** for: GitHub, databases, Slack, Jira, Google Drive, and other external services requiring specific API protocols > Practical advice: Don't connect a dozen MCP Servers on day one. Start with one or two you use most (GitHub, database), get comfortable, then add more. **Browser backend change (v0.8.0):** The managed browser provider switched from Browserbase to [Browser Use](https://browser-use.com). Firecrawl is now available as an additional cloud browser option. If your config referenced Browserbase explicitly, update it. **MCP + Skills combo:** MCP solves "what can I connect to," Skills solve "how to use it." Example: GitHub MCP provides PR diffs, a "Code Review" Skill defines review criteria - together Hermes auto-reviews code against your standards. ## Part 4: Real-World Scenarios ### 12 Personal Knowledge Assistant **The cross-session memory advantage:** Traditional AI: Re-explain context every session (3-5 minutes of setup per conversation). 
With Hermes, after week one's conversations, the three memory layers have recorded: | Memory Layer | What It Records | |---|---| | Session memory (SQLite + FTS5) | Exact conversation text, precise retrieval when details needed | | Persistent memory | "User is researching AI Agent deployment, ruled out option X, prefers low cost" | | Skill memory | "Research tasks: list dimensions first -> dig into each -> summarize per round" | **Retrieval vs full-context loading:** - Traditional: Stuff all history into prompt -> token costs explode, information overload - Hermes: Persistent memory stores summaries (few hundred words), FTS5 retrieves specific snippets on demand **Three ways cross-session memory pays off:** 1. Zero startup cost - say "continue" and you continue 2. Research has continuity - ruled-out options don't get re-recommended 3. Methodology compounds - research approach from project one gets reused in project two automatically --- ### 13 Dev Automation **A developer's morning (hypothetical but achievable):** - Hermes sends 3 Telegram messages before you open your laptop - PR merged notification with review findings - CI pipeline failure report - Daily standup notes drafted from commits and PRs **Automated code review setup:** 1. Connect GitHub MCP 2. Set up cron: "Check main branch for new PRs every 6 hours and do a code review" 3. Define review standards as a Skill (evolves automatically from your feedback) **Claude Code vs Hermes division of labor:** | Dimension | Claude Code | Hermes Agent | |---|---|---| | Interaction mode | Real-time conversation | Background, reports on schedule | | Strengths | Writing code, refactoring, debugging | Monitoring, auditing, summarizing, scheduling | | Time horizon | Single session | Continuous across days and weeks | | Trigger | You initiate it | Cron or event-driven | "Claude Code is the craftsman, Hermes is the butler." 
**Pipeline:** ``` Claude Code writes code + opens PR --> Hermes auto-reviews PR --> Hermes runs tests to verify --> Hermes generates daily report ``` --- ### 14 Content Creation **Writing series with Hermes:** - After first article: records series positioning, target audience, editing preferences, concepts already explained - On second article: "write the next one in this series" - it knows style, what to skip, what you disliked last time - By fifth article: remarkably precise understanding of writing preferences, learned from feedback alone **Parallel research with sub-agents:** Three sub-agents simultaneously researching different products/angles. Research that used to take 60+ minutes done in 20. **Skills that accumulate writing style:** - Style rules stored as a Skill, not in prompts - Skill self-improves from edits you make to drafts - A month later: dozens of rules, all from real feedback, maintained automatically **Claude Code vs Hermes for content:** | Dimension | Claude Code | Hermes Agent | |---|---|---| | Best for | Standalone articles, one-off tasks | Content series, ongoing projects | | Style control | CLAUDE.md + manual maintenance | Skills that auto-accumulate and evolve | | Research efficiency | Linear search | Parallel research via sub-agents | | Learning ability | Doesn't learn; rules manually written | Learns automatically from feedback | --- ### 15 Multi-Agent Orchestration **Why multiple agents:** - Context explosion: one agent handling research + coding + testing = all information interfering - Time bottleneck: 3 tasks sequentially = A+B+C minutes; in parallel = max(A, B, C) **delegate_task features:** | Feature | Description | |---|---| | Independent context | Sub-agents have their own conversation history | | Restricted toolset | You specify which tools each sub-agent can use | | Isolated terminal sessions | No interference between sub-agents | | Max 3 concurrent | Hard-coded limit (attention dispersion beyond 3) | | Result relay | Results 
returned to main agent for consolidation | **Security design:** Research sub-agents should only get web+browser. Coding sub-agents only terminal+file+code_execution. Consolidation sub-agents: no external tools. **vs Anthropic's three-agent architecture:** | Dimension | Anthropic Three-Agent | Hermes delegate_task | |---|---|---| | Role assignment | Fixed (plan/execute/evaluate) | Task-driven, flexible | | Communication | Chain | Star topology (main agent <-> sub-agents) | | Parallelism | Typically sequential | Up to 3 concurrent | | Memory | No built-in memory | Main agent maintains full memory | > Rule of thumb: If you find yourself writing lengthy consolidation instructions for the main agent, the task decomposition is probably wrong. Good decomposition makes consolidation simple. ## Part 5: Deep Thinking ### 16 Hermes vs OpenClaw vs Claude Code: Not a Choice **Three design philosophies:** | Dimension | Claude Code | OpenClaw | Hermes Agent | |---|---|---|---| | Core philosophy | Interactive coding | Configuration as behavior | Autonomous background + self-improvement | | Your role | Sitting at the terminal directing | Writing config files to define behavior | Deploy and check in occasionally | | Memory mechanism | CLAUDE.md + auto-memory | Multi-layer (SOUL.md + Daily Logs + semantic search) | Three-layer self-improving memory | | Skill source | Manually installed community Hub | ClawHub 5,700+ | Agent-created + community Hub | | Run mode | On-demand | On-demand | 24/7 background | | Deployment | Local CLI (subscription) | Local CLI (free + API costs) | $5 VPS / Docker / Serverless | **Scenario recommendations:** | Scenario | Recommended Tool | Why | |---|---|---| | Building new features, refactoring | Claude Code | Needs real-time feedback and human judgment | | Standardized agents for a team | OpenClaw | SOUL.md is transparent, auditable, reproducible | | 24/7 code review | Hermes | Cron scheduling + GitHub MCP, runs unattended | | Personal knowledge assistant 
| Hermes | Three-layer memory accumulates across sessions | | Building a community bot | Hermes | Native 14-platform Gateway | | Rapid product idea validation | Claude Code | Fast to start, fast to iterate | | Enterprise scenarios needing control | OpenClaw | Transparent config, predictable behavior | | Long-term content creation | Hermes + Claude Code | Hermes for accumulation, Claude Code for writing | **agentskills.io convergence:** 30+ tools now support the standard (Claude Code, Cursor, OpenAI Codex, Gemini CLI, Hermes). Skills are portable - your Skill library is your own asset, not a platform's appendage. **HuaShu's workflow:** - Claude Code = day shift (tasks needing presence: writing articles, code, product decisions) - Hermes = night shift (doesn't need presence: monitoring repos, scheduled research, maintaining knowledge bases) - OpenClaw's SOUL.md = standardized configuration language for behavioral constraints ### 17 The Boundaries of Self-Improving Agents **Hermes's self-improvement constraints (technically controlled):** - Skill files are readable markdown - you can see every diff - Memory data is local SQLite - you can inspect and delete - Tool permissions are sandboxed - can't arbitrarily acquire new permissions **The practical control problem:** - The whole appeal of Hermes is "not having to babysit it" - But safety requires watching the self-improvement results - This contradiction is fundamental **Nous Research's position:** - User control first - MIT license - you own all source code - You can turn off automatic Skill creation entirely **Open source vs closed source trust:** | | Closed Source (Claude Code) | Open Source (Hermes) | |---|---|---| | Trust basis | Business incentives to keep behavior predictable | Your ability to audit | | If things go wrong | Commercial obligation to fix | MIT license - you bear consequences | | Best for | People who don't want to touch code | People with technical chops who want control | **The ceiling of
self-improvement:** The ceiling isn't technical - it's the feedback signal. Self-improvement works when you're there giving feedback (supervised). Without you, the agent uses its own evaluation criteria - which may not catch domain-specific errors. **HuaShu's conclusion:** Let the agent self-improve on the "how." You own the "what" and the "don't." That's not being lazy - it's a different kind of "on the loop." **Open questions:** - How much autonomous self-improvement are you comfortable with? - Who audits the results of self-improvement? - Do self-improving agents need a "forgetting" mechanism? - If the agent designs its own reins, who judges if the reins are designed correctly? ## Related - [[systems/hermes-agent]] - Technical implementation notes - [[concepts/harness-engineering]] - The underlying methodology - [[concepts/agentskills-standard]] - agentskills.io cross-tool portability - [[concepts/honcho-user-modeling]] - Dialectical user modeling system