Open Source · MIT License

Volundr

Autonomous Agent Orchestration for Claude Code

v5.0 - MIT License
13 Hooks · 8 Agent Types · 21 Personas · Live Dashboard
Volundr v5 landing page - Enter The Forge (localhost:3000)

What Is This?

Volundr turns Claude Code into a senior engineering lead that manages the entire software development lifecycle - automatically.

Völundr (Old Norse; anglicized as Wayland the Smith) - the legendary master craftsman of Norse mythology. A smith of unmatched skill who could forge anything, working alone in his forge with tireless precision.

When asked what the framework should be called, the AI chose the name itself. An autonomous smith that takes raw materials and forges finished work. The dashboard is The Forge. The agent visualization is The Þing - the Old Norse assembly. The campfire is where the team gathers.

One Command to Start

Open Claude Code in the volundr directory and say "Wake up!" - Volundr handles everything from there. It interviews you, creates a blueprint, breaks work into cards, and starts building.

👥 Real Agent Teams

Volundr spawns specialized agents - Developer, Architect, QA, DevOps, Designer, Reviewer, Guardian, Researcher - each with their own domain, tools, and communication channels.

🧠 Learns Across Projects

Every session ends with a self-review. Lessons are extracted, patterns identified, and knowledge accumulates in a private local database - making every future project smarter.

The Elevator Pitch

Volundr is a PM, architect, and engineering lead that lives inside Claude Code. You describe what you want to build. It interviews you with 5-10 targeted questions. It writes a blueprint and reviews it with a panel of virtual perspectives. It breaks work into cards with binary success criteria. It spawns specialized agent teammates - one per domain, running in parallel. It scores every deliverable, retries failures, generates behavioral rules from low scores, and writes a retrospective when done. All data stays on your machine. Nothing leaves your environment. Start with "Wake up!".

How It Works

Every project follows the same five-stage flow. Volundr guides you through each stage and handles the complexity automatically.

01 💬 Discovery

Volundr interviews you - vision, stack, constraints, design, and review gate level. Opinionated defaults. Challenges vague requirements.

02 📄 Blueprint

Architecture, card breakdown, dependency graph. A virtual roundtable of perspectives debates the plan before work starts.

03 Execution

Agent teammates run in parallel - one per domain. Each works in an isolated worktree with trait-composed prompts. Architect and QA run alongside.

04 Quality

Every card is scored on four dimensions. Failures get retried. Low scores generate behavioral rules. Mandatory cross-branch review before merge.

05 🚀 Ship

Integration testing. Guardian architecture audit. Documentation. Retrospective. Lesson promotion to the global knowledge base.

Card System

Work is broken into cards with binary testable success criteria. The dashboard enforces these - a card cannot be marked done unless all criteria pass with evidence. Quality is scored on four dimensions: completeness, code quality, format compliance, and correctness.
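
A sketch of what the "cannot be marked done" rule might look like. The field names here are illustrative, not Volundr's actual card schema - the point is that every criterion is binary and must carry evidence.

```typescript
// Hypothetical card shape - field names are illustrative, not Volundr's real schema.
interface Card {
  id: string;
  domain: string;
  status: "backlog" | "ready" | "in-progress" | "blocked" | "in-review" | "done";
  // Binary criteria: each must pass with evidence before the card can be marked done.
  successCriteria: { description: string; passed: boolean; evidence?: string }[];
}

// A card is only done when every criterion has passed and carries evidence.
function canMarkDone(card: Card): boolean {
  return card.successCriteria.every((c) => c.passed && c.evidence !== undefined);
}

const card: Card = {
  id: "VLDR-12",
  domain: "database",
  status: "in-review",
  successCriteria: [
    { description: "Migration applies cleanly", passed: true, evidence: "migrate.log" },
    { description: "Rollback tested", passed: false },
  ],
};

console.log(canMarkDone(card)); // false - "Rollback tested" has not passed
```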

Quality Gates & Blind Review

After every agent completes: type check, production build, smoke test, anti-pattern scan, and success criteria verification. Then a blind card reviewer (independent Haiku agent) scores the card without seeing the self-score. Both scores are tracked side-by-side. On failure: a Fixer agent retries up to twice. On double failure: escalated to you directly.

The Viking Roster

21 specialized personas named after Norse mythology figures. Each persona carries expertise signals, personality traits, and a preferred work style. When a card needs doing, Volundr matches it to the right persona automatically.

How matching works: Every card has a domain, a stack, and acceptance criteria. Volundr scans the persona roster and selects the persona whose expertise tags best match the work. A database migration card? Mímir Deepwell (database specialist). A security audit? Víðarr Silentward. Frontend polish? Iðunn Goldleaf.
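
In spirit, the matching step can be sketched as tag overlap between card and persona. The tags and scoring rule below are assumptions for illustration, not Volundr's actual routing code:

```typescript
// Expertise-tag matching sketch - persona tags and the scoring rule are
// illustrative assumptions, not Volundr's actual implementation.
interface Persona { name: string; tags: string[]; }

const roster: Persona[] = [
  { name: "Mímir Deepwell", tags: ["database", "sql", "migration-schema"] },
  { name: "Víðarr Silentward", tags: ["security", "audit", "owasp"] },
  { name: "Iðunn Goldleaf", tags: ["frontend", "css", "polish"] },
];

// Pick the persona whose expertise tags overlap most with the card's keywords.
function matchPersona(cardTags: string[]): Persona {
  const score = (p: Persona) => p.tags.filter((t) => cardTags.includes(t)).length;
  return roster.reduce((best, p) => (score(p) > score(best) ? p : best));
}

console.log(matchPersona(["database", "sql"]).name); // Mímir Deepwell
```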

The persona shapes the agent's behavior. A persona with the cautious trait will write more defensive code. One with terse will keep output minimal. Persona stats (quality average, reliability, cards completed) accumulate over time - the dashboard tracks which personas perform best.

Týr Lawbringer - Architect
👁 Heimdall Watchfire - Auth & Security
📚 Mímir Deepwell - Database
🌀 Skuld Threadweaver - Data Engineering
🔧 Brokkr Forgehand - DevOps
📖 Saga Storyteller - Documentation
Baldr Brightblade - Fullstack
🔀 Rán Tidecaller - Migration
🛡 Víðarr Silentward - Security
🔍 Forseti Truthseeker - QA
🍀 Iðunn Goldleaf - Frontend
Hermóðr Swiftmessage - API
🐍 Sigyn Steadfast - Python
📱 Sleipnir Swiftfoot - Mobile
Skaði Cloudpiercer - Cloud
💨 Magni Irongrip - Performance
🤖 Huginn Thoughtwing - AI / ML
Höðr Allseer - Accessibility
🔭 Muninn Farseeker - Researcher
💎 Eitri Runecaster - .NET
🔎 Freyja Goldseeker - SEO
Personas - Radar Chart, Stats & Skill History
Persona detail page with radar chart, quality average, cards completed, and skill history
Skills Library - Extracted Learnings
Skills library with severity badges, tags, source tracking, and extracted learnings

Three-Tier Discovery

User-created personas (highest priority) override pack-installed personas, which override built-in roster personas. Same ID = override. This means you can customize any built-in persona without losing the defaults.
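
The precedence rule is simple enough to state in code. This is a sketch under assumed type names - the real discovery logic lives in the framework:

```typescript
// Three-tier discovery sketch: user > pack > builtin. The source labels mirror
// the source='pack' convention used for pack installs; the rest is illustrative.
type Source = "user" | "pack" | "builtin";
interface PersonaDef { id: string; source: Source; role: string; }

const priority: Record<Source, number> = { user: 3, pack: 2, builtin: 1 };

// Same ID = override: keep only the highest-priority definition per ID.
function resolve(defs: PersonaDef[]): Map<string, PersonaDef> {
  const out = new Map<string, PersonaDef>();
  for (const d of defs) {
    const existing = out.get(d.id);
    if (!existing || priority[d.source] > priority[existing.source]) out.set(d.id, d);
  }
  return out;
}

const defs: PersonaDef[] = [
  { id: "mimir", source: "builtin", role: "Database" },
  { id: "mimir", source: "user", role: "Database + Timescale" },
  { id: "tyr", source: "builtin", role: "Architect" },
];
console.log(resolve(defs).get("mimir")!.role); // "Database + Timescale"
```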

Persona Builder

Create custom personas from the dashboard. Define name, role, expertise tags, personality traits, writing style, and model preference. Or override any built-in persona with your own tweaks.

Skill Tracking

As agents complete work, skills are extracted and linked to their persona. Over time, each persona builds a skill profile - tracked with domain, confidence level, and version. The dashboard shows learned skills per persona with a radar chart.

Scoring & Review

Every card is scored on a 1-10 scale across four weighted dimensions. A blind reviewer double-checks the work. Low scores trigger automatic corrections. Nothing ships without passing the gate.

The Formula

(Completeness ×3 + Code Quality ×3 + Format ×2 + Correctness ×2) / 10 = Weighted Score

Each dimension scored 1-10. Weights reflect priority: completeness and code quality matter most (×3), format and correctness are important but secondary (×2). A score of 7 means "meets spec" - not everything deserves a 10.
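
The formula above, written out as a function. Dimension names follow the text; this is a sketch, not Volundr's scoring code:

```typescript
// Weighted quality score: completeness ×3, code quality ×3, format ×2,
// correctness ×2. Weights sum to 10, so the result stays on the 1-10 scale.
function weightedScore(c: number, q: number, f: number, r: number): number {
  return (c * 3 + q * 3 + f * 2 + r * 2) / 10;
}

console.log(weightedScore(8, 7, 9, 6)); // 7.5 - comfortably "meets spec"
```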

👀 Blind Card Reviewer

After the implementing agent self-scores, an independent reviewer agent (Haiku model for cost efficiency) scores the same card without seeing the self-score. This prevents score inflation. Both scores are stored and shown side-by-side in the compliance heatmap with S (self) and R (reviewer) badges.

⚠ Steering Rules

When a card scores below 5.0/10, Volundr generates a behavioral steering rule. These are concrete "don't do this again" instructions appended to the project constraints. Future agents inherit them automatically. Rules are tagged with the card ID and timestamp so you can see why each rule exists.
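
A steering rule might look like the following. The exact format Volundr writes to the constraints file is an assumption; the card-ID-plus-timestamp tagging is from the text:

```markdown
<!-- Illustrative steering rule - the exact on-disk format is an assumption -->
## Steering Rule SR-014 (card VLDR-12, 2025-01-18)
Don't assume migrations are idempotent. Always generate a matching down()
migration and verify rollback in the smoke test before marking the card done.
```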

Compliance - Enforcement Score & Heatmap
Compliance page with enforcement score, rated averages, and per-card quality heatmap
Insights - Quality Trend, Cost & Token Charts
Insights page with quality trend, card velocity, token usage, and agent breakdown

The Build Gate

Every card runs through a six-step gate before it can be merged:

1. Type Check
2. Production Build
3. Smoke Test
4. Anti-Pattern Scan
5. Success Criteria
6. Blind Review
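
Conceptually, the gate is a short-circuiting pipeline - the first failing step stops the merge. Step names come from the list above; the stubbed implementations are illustrative:

```typescript
// Six-step build gate as a short-circuiting pipeline - a sketch, with each
// step stubbed. Real steps would shell out to tsc, the build, tests, etc.
type GateStep = { name: string; run: () => boolean };

function runGate(steps: GateStep[]): { passed: boolean; failedAt?: string } {
  for (const step of steps) {
    if (!step.run()) return { passed: false, failedAt: step.name }; // stop at first failure
  }
  return { passed: true };
}

const gate: GateStep[] = [
  { name: "Type Check", run: () => true },
  { name: "Production Build", run: () => true },
  { name: "Smoke Test", run: () => false }, // simulate a failing smoke test
  { name: "Anti-Pattern Scan", run: () => true },
  { name: "Success Criteria", run: () => true },
  { name: "Blind Review", run: () => true },
];

console.log(runGate(gate)); // stops at "Smoke Test" - the card goes back for a fix
```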

The Forge

A real-time dashboard that shows everything happening across your project. WebSocket live updates. No polling. No refresh needed.

localhost:3000 - The Forge
The Forge dashboard - at a glance stats, live feed, progress
Board - Cards with ISC Criteria & Dependencies
Board view with acceptance criteria, ISC checks, and dependency graph
Events - Filterable Audit Log
Events page with filterable audit log and expandable detail rows
📋 Real-Time Kanban

Live card board with status columns: backlog, ready, in-progress, blocked, in-review, done. Cards update in real time as agents complete work.

🌳 Agent Tracker

Every agent registered, tracked, and completed. See which agents are live, what they are working on, and how many tokens they have consumed.

📈 Quality Scores

Per-card quality scores with trend lines. Four dimensions weighted and aggregated. Session average vs all-time average. Low scores flagged automatically.

💵 Cost Metrics

Token usage and dollar cost tracked per card, per agent, and per session. Cache hit ratio. Cost per card. Budget gating prevents runaway spending.

📡 Event Log

Every significant action logged with timestamp and context. Agent spawns, card transitions, quality gate outcomes, and steering rule generations.

WebSocket Live Updates

The dashboard uses WebSockets for instant updates. No polling delay. Changes from agents appear on the board within milliseconds.
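
On the receiving end, a pushed event only has to be applied to local state - no request/response round trip. The event shape below is an assumption, not the dashboard's actual WebSocket protocol:

```typescript
// Sketch of applying a pushed WebSocket event to board state. The event shape
// (type, cardId, to) is an illustrative assumption.
type CardStatus = "backlog" | "ready" | "in-progress" | "blocked" | "in-review" | "done";
interface CardMovedEvent { type: "card.moved"; cardId: string; to: CardStatus; }

// Pure reducer: each pushed event updates the board with no polling involved.
function applyEvent(board: Map<string, CardStatus>, ev: CardMovedEvent): Map<string, CardStatus> {
  const next = new Map(board);
  next.set(ev.cardId, ev.to);
  return next;
}

let board = new Map<string, CardStatus>([["VLDR-12", "in-progress"]]);
board = applyEvent(board, { type: "card.moved", cardId: "VLDR-12", to: "in-review" });
console.log(board.get("VLDR-12")); // "in-review"
```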

The Þing

Named after the Old Norse assembly where decisions were debated and made collectively. This is where agent teams communicate, claim work, raise concerns, and resolve conflicts - in real time.

The Þing - Live Agent Assembly
The Þing - full team around the fire

🔥 The Campfire

At the center: an animated campfire with flickering light. Around it: every active agent appears as a silhouette figure. Volundr sits at the conductor's seat. Developers fan out. Specialists appear as they spawn and fade out on completion.

💬 Agent Communication

Agents talk to each other via message passing. A developer claims a card and announces work. The architect reviews the approach and raises concerns. QA reports test failures. Volundr mediates conflicts and makes final calls. All messages stream to the dashboard in real time.

⚖ The Roundtable

Before implementation starts, Volundr convenes a virtual roundtable. Multiple perspectives debate the blueprint - a skeptic challenges assumptions, a pragmatist focuses on delivery, an architect guards patterns. The plan is stress-tested before a single line of code is written.

🔄 Task Lifecycle

A card moves through: persona matched → developer claims → worktree created → implementation → self-review → blind reviewer scores → quality gate pass/fail → merge or retry. Every transition fires a hook and updates the dashboard instantly.
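
The lifecycle above, made concrete as an ordered stage list - a sketch to pin down the sequence, not Volundr's actual state machine:

```typescript
// Card lifecycle stages in pipeline order (names paraphrase the text).
const lifecycle = [
  "persona-matched",
  "developer-claims",
  "worktree-created",
  "implementation",
  "self-review",
  "blind-review",
  "quality-gate",
] as const;
type Stage = (typeof lifecycle)[number];

// Only the next stage in the pipeline is a legal transition; the final gate
// branches to merge (pass) or retry (fail).
function nextStage(s: Stage): Stage | "merge-or-retry" {
  const i = lifecycle.indexOf(s);
  return i === lifecycle.length - 1 ? "merge-or-retry" : lifecycle[i + 1];
}

console.log(nextStage("self-review")); // "blind-review"
console.log(nextStage("quality-gate")); // "merge-or-retry"
```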

The Roster

Eight specialized agent types, each with a defined role, tool access, model tier, and communication protocol. Volundr selects and spawns the right agents based on the work at hand.

Volundr (Orchestrator) - The PM, architect, and engineering lead rolled into one. Manages the entire project lifecycle, spawns all agents, merges branches, scores quality, and generates steering rules from failures.

Developer (Teammate) - Claims tasks from the shared task list and implements cards directly in isolated worktrees. One per domain, up to four running in parallel. Trait-composed per card type.

Architect (Teammate) - Continuous design guardian. Reviews card specs before work starts. Catches scope creep, pattern violations, and anti-patterns. Read-only - influences through messages and comments.

QA Engineer (Teammate) - Owns test strategy and coverage. Writes tests alongside implementation, runs test suites, and messages developers directly when failures are found. Tracks coverage across the project.

DevOps Engineer (Teammate) - Owns infrastructure: Docker, CI/CD, migrations, environment config, and deployment pipelines. Runs environment verification at the start of every project.

Designer (Teammate) - UI/UX quality, component patterns, accessibility, and responsive design. Reviews via browser screenshots. Implements CSS and design tokens directly in the codebase.

Reviewer (Teammate) - Cross-domain code review and spotcheck enforcement. Reads all completed branches, flags issues with file:line references. Severity levels: BLOCK, WARN, and INFO. BLOCK findings prevent merge.

Guardian (Milestone) - Full codebase architecture audit at milestones. Reviews for pattern consistency, circular imports, type safety, duplication, and security. Grades the codebase A/B/C with specific remediation steps.

Researcher (Teammate) - Pre-implementation research on external APIs, libraries, and documentation. Produces reports, TypeScript interface mappings, and endpoint catalogs that developers can use directly.

Packs

Packs bundle agent prompts, persona seeds, skills, and routing rules into installable modules. The framework ships with 8 built-in packs. You can create and install your own.

Core

Developer, Architect, Reviewer, Planner prompts. The foundation every project uses.

Testing

QA Engineer prompts. Test strategy, coverage tracking, Playwright E2E.

Quality

Guardian audit, blind card reviewer, quality rubric. Scoring and gates.

Infrastructure

DevOps Engineer. Docker, CI/CD, migrations, deployment pipelines.

Frontend

Designer prompts. UI/UX, accessibility, component patterns, responsive design.

Security

Security review prompts. OWASP scanning, auth patterns, vulnerability assessment.

Research

Researcher prompts. External API docs, endpoint mapping, library evaluation.

Languages

Python, .NET, Mobile, AI/ML specialist personas and prompts.

What's in a Pack?

Each pack contains a pack.json manifest with: agent prompt templates, persona seed definitions, skill declarations, and routing rules. On install, persona seeds are written to the database with source='pack' so the three-tier discovery knows their priority.
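
A hypothetical pack.json for a custom pack. The top-level keys follow the contents listed above (prompts, persona seeds, skills, routing rules), but the exact manifest schema is an assumption:

```json
{
  "name": "my-pack",
  "version": "1.0.0",
  "prompts": ["prompts/graphql-developer.md"],
  "personas": [
    { "id": "bragi", "name": "Bragi Wordsmith", "role": "GraphQL", "tags": ["graphql", "api"] }
  ],
  "skills": ["graphql-schema-design"],
  "routing": [{ "match": ["graphql"], "persona": "bragi" }]
}
```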

Build Your Own

Create a directory under framework/packs/ with a pack.json, prompt templates in prompts/, and persona definitions. Install via the /vldr-pack install command. Your pack's personas will override built-in ones with the same ID.

Start in 3 Steps

Prerequisites: Docker and the Claude Code CLI. The launcher script handles everything else - Docker startup, dashboard container, browser, and launching Claude with the "Wake up!" prompt.

1

Install Prerequisites

You need Docker and the Claude Code CLI. That's it.

npm install -g @anthropic-ai/claude-code
2

Clone the Repository

Clone Volundr and enter the directory. The framework lives here - your project data stays in ~/.volundr/ and never touches this repo.

github.com/sebwesselhoff/volundr

git clone https://github.com/sebwesselhoff/volundr.git
cd volundr
3

Run the Launcher

The launcher does everything automatically: starts Docker if needed, initializes ~/.volundr/ on first run, pulls and starts the dashboard container, waits for the API health check, opens The Forge in your browser, and launches Claude Code with the "Wake up!" prompt.

# macOS / Linux
./start.sh

# Windows
start.bat

That's it. Volundr activates, checks the dashboard connection, loads or creates a project, and starts the discovery interview. You describe what you want to build - Volundr does the rest.

The dashboard runs at http://localhost:3000, the API at http://localhost:3141. If you prefer to start Claude manually instead, just run claude from the volundr directory and type "Wake up!". Add --dangerously-skip-permissions for fully autonomous operation without permission prompts.

How It's Structured

Two parts: the framework (this repo) and user data (your machine). They never mix. Updates to the framework never overwrite your project data.

Framework Layout (this repo)

framework/ - System instructions, agent prompts, quality rubric, hierarchy logic, community lessons seed
framework/agents/prompts/ - Prompt templates for each agent type (Developer, Architect, QA, DevOps, Designer, Reviewer, Guardian, Researcher)
dashboard/ - The Forge, a Turborepo monorepo: @vldr/web (Next.js 15), @vldr/api (Express), @vldr/db (Drizzle + SQLite), @vldr/sdk, @vldr/shared
.claude/hooks/ - 13 lifecycle hooks: session start/stop, agent start/stop, task completed, teammate idle, worktree management, pre-compact state preservation
start.sh / start.bat - One-click launchers: start Docker, seed community lessons, open The Forge

User Data (~/.volundr/ - your machine, never in the repo)

projects/registry.json - Active project pointer, project index, session state
projects/{id}/ - Per-project: blueprint.md, constraints.md, cards/, reports/, checkpoints/, sow/
global/lessons.md - Aggregated lessons promoted from individual projects
global/patterns/ - Reusable patterns from high-scoring cards, available to all future projects
data/the-forge.db - SQLite database, bind-mounted into Docker: cards, agents, events, quality scores, cost tracking

Data Flow

Claude Code (CLI) → 13 Hooks (.claude/hooks/) → Dashboard API (Express, :3141) → SQLite DB (~/.volundr/data/) → The Forge (Next.js, :3000)

13 Lifecycle Hooks

Claude Code hooks intercept every significant moment. Session start runs crash recovery. Session end clears active project. Agent start injects project context. Agent stop accumulates token costs. Pre-compact preserves state across context resets.

Worktree Isolation

Every Developer agent works in a dedicated git worktree. Branches never collide. Volundr merges branches in dependency order after each parallel round. Failed branches are discarded cleanly.

Three-Tier Memory

HOT - always loaded: project summary, steering rules, last session. WARM - phase-selective: blueprint during planning, card specs during implementation. COLD - on demand: full history. Automatically managed.
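
A sketch of phase-selective loading. The tier assignments follow the text; the selection function itself is illustrative:

```typescript
// Three-tier memory sketch: HOT always loads, WARM depends on the current
// phase, COLD (full history) is fetched only on demand.
type Phase = "planning" | "implementation";

function loadedContext(phase: Phase): string[] {
  const hot = ["project summary", "steering rules", "last session"]; // always loaded
  const warm = phase === "planning" ? ["blueprint"] : ["card specs"]; // phase-selective
  return [...hot, ...warm];
}

console.log(loadedContext("implementation")); // HOT items plus "card specs"
```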

Common Questions

Things people ask before getting started.

Does this require Claude Pro or Claude Max?

Volundr works with any Claude Code plan that supports the CLI. The framework itself has no subscription requirement. You pay for Claude usage the same way you always do - through your Anthropic account.

Can I add my own agents or customize existing ones?

Yes. Add prompt templates to framework/agents/prompts/ for new agent types, or place override files in ~/.volundr/customizations/{agent-type}/override.md for project-level customization. Overrides are additive - they extend the base prompt without replacing it.
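
An override file might look like this - the directory pattern follows the {agent-type}/override.md convention above, while the content and heading are purely illustrative:

```markdown
<!-- ~/.volundr/customizations/developer/override.md - illustrative content -->
## Project overrides (appended to the base Developer prompt)
- Prefer raw SQL over ORM query builders in this codebase.
- Every new endpoint needs an entry in docs/api.md before the card is done.
```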

How much does a typical project cost?

Volundr tracks token usage and dollar cost per card, per agent, and per session. A small project (5-10 cards) typically runs $2-10. Larger projects with multiple parallel agents scale linearly. Budget gating pauses execution before spawning agents if estimated cost exceeds your configured threshold.

What models does it use?

Configurable per agent role. Default tiers: Haiku for lightweight fix agents, Sonnet for domain developers and most team roles, Opus for architecture decisions and the Guardian review. You can override the model tier per role, per card type, or per session.

Is my project data shared anywhere?

No. All project data stays in ~/.volundr/ on your local machine. The SQLite database, blueprints, card specs, lessons, and session history never leave your environment. The framework repo contains only the community lessons seed file - which is anonymized patterns, not project content.

How does the cross-project memory work?

After each session, Volundr runs a self-review: it identifies quality trends, extracts lessons from failures, and promotes high-value patterns to the global knowledge base at ~/.volundr/global/. These patterns are loaded into future projects automatically, so agents get smarter over time.

What if an agent fails or produces bad output?

Volundr's quality gate runs after every agent completes. On failure, a lightweight Fixer agent is spawned for up to two retries. On double failure, the issue is escalated to you directly. Low quality scores (below 5.0/10) automatically generate behavioral steering rules to prevent the same issue in future agents. A blind card reviewer (independent Haiku agent) scores every card without seeing the self-score, ensuring honest quality tracking.
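
The retry policy reduces to a small loop - up to two Fixer attempts, then escalation. This mirrors the description above, not Volundr's actual control flow:

```typescript
// Failure handling sketch: spawn a Fixer up to twice, then escalate.
function handleFailure(attemptFix: () => boolean, maxRetries = 2): "fixed" | "escalated" {
  for (let i = 0; i < maxRetries; i++) {
    if (attemptFix()) return "fixed";
  }
  return "escalated"; // double failure goes straight to the user
}

let calls = 0;
console.log(handleFailure(() => ++calls === 2)); // "fixed" on the second attempt
```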

Commands & Skills

Built-in slash commands you can type during a session. These are shortcuts for common operations - Volundr handles the API calls and formatting.

/vldr-shutdown

Graceful shutdown protocol. Saves WIP, writes a session summary, runs self-review (quality trends, cost analysis, pattern identification), generates lessons, creates a checkpoint, and presents a final status report. Always run this before ending a session.

/vldr-journal

Log a journal entry - decisions, insights, blockers, pivots. These provide cognitive context that helps future sessions understand what happened and why.

/vldr-journal decision Chose flat hierarchy
/vldr-journal blocker Migration failing
/vldr-journal insight Build gate after install

/vldr-status

Quick project status. Shows dashboard health, active project, card progress by status, running agents, and total cost. Useful for a quick check mid-session.

/vldr-pack

Pack management. Install, list, and inspect agent packs. Packs bundle persona seeds, skills, agent prompts, and routing rules into installable modules.

/vldr-pack list
/vldr-pack install security
/vldr-pack inspect core

/vldr-doctor

Setup validation. Checks Docker, dashboard health, VLDR_HOME, project registry, database status, git version, Node.js, hooks, enforcement hooks, and settings. Reports pass/fail/warning for each.

/vldr-directive

Governance directives. List, add, suppress, or supersede active directives for a project. Directives are rules that persist across sessions and influence agent behavior.

/vldr-directive list
/vldr-directive add "No ORM - raw SQL only"

/vldr-economy

Toggle economy mode on the active project. Downgrades agent models to cheaper tiers (Opus → Sonnet, Sonnet → Haiku) to reduce cost when budget is tight. Toggle off to restore default model assignments.
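
The downgrade mapping from the text (Opus → Sonnet, Sonnet → Haiku), sketched as a lookup - the function names are illustrative:

```typescript
// Economy-mode downgrade map; Haiku is already the cheapest tier.
type Model = "opus" | "sonnet" | "haiku";

const economy: Record<Model, Model> = {
  opus: "sonnet",
  sonnet: "haiku",
  haiku: "haiku",
};

function assignModel(defaultModel: Model, economyMode: boolean): Model {
  return economyMode ? economy[defaultModel] : defaultModel;
}

console.log(assignModel("opus", true)); // "sonnet"
console.log(assignModel("opus", false)); // "opus" - defaults restored when toggled off
```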

/vldr-route

Test routing rules. Describe a piece of work and see which persona and agent tier Volundr would select. Useful for debugging persona matching or verifying routing rules behave as expected.

/vldr-route "Add PostgreSQL migration"

/vldr-compact

Context compaction with state preservation. When your conversation gets long, this compacts the context while retaining critical project state - active cards, teammate assignments, phase, and recovery instructions.

Ready to build?

Clone the repo, start the dashboard, launch Claude Code, and say "Wake up!"

github.com/sebwesselhoff/volundr
git clone https://github.com/sebwesselhoff/volundr.git
MIT License - Volundr v5.0 - Autonomous Agent Framework