Volundr v5 - Autonomous Agent Orchestration for Claude Code

Overview

What Is This?

Volundr turns Claude Code into a senior engineering lead that manages the entire software development lifecycle - automatically.

Volundr (Old Norse: Vǫlundr, English: Wayland the Smith) is the legendary master craftsman of Norse mythology: a smith of unmatched skill who could forge anything, working alone in his forge with tireless precision.

When asked what the framework should be called, the AI chose the name itself: an autonomous smith that takes raw materials and forges finished work. The dashboard is The Forge. The agent visualization is The Þing - the Old Norse assembly. The campfire is where the team gathers.

⚡

One Command to Start

Open Claude Code in the volundr directory and say "Wake up!" - Volundr handles everything from there. It interviews you, creates a blueprint, breaks work into cards, and starts building.

👥

Real Agent Teams

Volundr spawns specialized agents - Developer, Architect, QA, DevOps, Designer, Reviewer, Guardian, Researcher - each with their own domain, tools, and communication channels.

🧠

Learns Across Projects

Every session ends with a self-review. Lessons are extracted, patterns identified, and knowledge accumulates in a private local database - making every future project smarter.

The Elevator Pitch

Volundr is a PM, architect, and engineering lead that lives inside Claude Code. You describe what you want to build. It interviews you with 5-10 targeted questions. It writes a blueprint and reviews it with a panel of virtual perspectives. It breaks work into cards with binary success criteria. It spawns specialized agent teammates - one per domain, running in parallel. It scores every deliverable, retries failures, generates behavioral rules from low scores, and writes a retrospective when done. All data stays on your machine. Nothing leaves your environment. Start with "Wake up!".

Lifecycle

How It Works

Every project follows the same five-stage flow. Volundr guides you through each stage and handles the complexity automatically.

💬

Discovery

Volundr interviews you - vision, stack, constraints, design, and review gate level. Opinionated defaults. Challenges vague requirements.

📄

Blueprint

Architecture, card breakdown, dependency graph. A virtual roundtable of perspectives debates the plan before work starts.

⚙

Execution

Agent teammates run in parallel - one per domain. Each works in an isolated worktree with trait-composed prompts. Architect and QA run alongside.

✓

Quality

Every card is scored on four dimensions. Failures get retried. Low scores generate behavioral rules. Mandatory cross-branch review before merge.

🚀

Ship

Integration testing. Guardian architecture audit. Documentation. Retrospective. Lesson promotion to the global knowledge base.

Card System

Work is broken into cards with binary testable success criteria. The dashboard enforces these - a card cannot be marked done unless all criteria pass with evidence. Quality is scored on four dimensions: completeness, code quality, format compliance, and correctness.
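As a rough illustration of the done-check described above, the TypeScript sketch below assumes a hypothetical card shape. `CardSketch`, `SuccessCriterion`, and their fields are invented for this example and are not Volundr's actual schema. It also assumes that a card with no criteria cannot be marked done.

```typescript
// Hypothetical shapes: field names are illustrative, not Volundr's real schema.
interface SuccessCriterion {
  description: string;
  passed: boolean;
  evidence?: string; // e.g. a test log excerpt or command output
}

interface CardSketch {
  id: string;
  criteria: SuccessCriterion[];
}

// A card may be marked done only when every criterion passed AND carries evidence.
// Assumption: a card without any criteria is never considered done.
function canMarkDone(card: CardSketch): boolean {
  return (
    card.criteria.length > 0 &&
    card.criteria.every((c) => c.passed && (c.evidence ?? "").trim().length > 0)
  );
}

const exampleCard: CardSketch = {
  id: "CARD-001",
  criteria: [
    { description: "GET /health returns 200", passed: true, evidence: "curl output: 200 OK" },
    { description: "Type check passes", passed: true }, // no evidence attached yet
  ],
};

console.log(canMarkDone(exampleCard)); // false: the second criterion has no evidence
```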

Quality Gates & Blind Review

After every agent completes, the gate runs a type check, production build, smoke test, anti-pattern scan, and success criteria verification. Then a blind card reviewer (an independent Haiku agent) scores the card without seeing the self-score, and both scores are tracked side by side. On failure, a Fixer agent retries up to twice. On double failure, the card is escalated to you directly.
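The retry-then-escalate behavior can be sketched as a small loop. This is a simplified, synchronous model under stated assumptions: `resolveCard`, `runGate`, and `runFixer` are hypothetical names, and the real flow would spawn agents asynchronously.

```typescript
type GateOutcome = "passed" | "escalated";

// Hypothetical signature: runGate returns true when all checks pass;
// runFixer stands in for spawning a Fixer agent.
function resolveCard(
  runGate: () => boolean,
  runFixer: () => void,
  maxRetries: number = 2,
): { outcome: GateOutcome; fixerRuns: number } {
  let fixerRuns = 0;
  if (runGate()) {
    return { outcome: "passed", fixerRuns };
  }
  while (fixerRuns < maxRetries) {
    runFixer();
    fixerRuns += 1;
    if (runGate()) {
      return { outcome: "passed", fixerRuns };
    }
  }
  // Both retries failed: hand the card to the user.
  return { outcome: "escalated", fixerRuns };
}

// Simulated gate that fails twice and passes on the third run.
let gateCalls = 0;
const flakyGate = (): boolean => {
  gateCalls += 1;
  return gateCalls >= 3;
};

const flakyResult = resolveCard(flakyGate, () => {});
console.log(flakyResult.outcome, flakyResult.fixerRuns); // passed 2
```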

Persona System

The Viking Roster

21 specialized personas named after Norse mythology figures. Each persona carries expertise signals, personality traits, and a preferred work style. When a card needs doing, Volundr matches it to the right persona automatically.

How matching works: Every card has a domain, a stack, and acceptance criteria. Volundr scans the persona roster and selects the persona whose expertise tags best match the work. A database migration card? Mímir Deepwell (database specialist). A security audit? Víðarr Silentward. Frontend polish? Iðunn Goldleaf.
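One plausible way to picture tag-based matching is a simple overlap count. The sketch below is an assumption, not Volundr's actual algorithm: `matchPersona`, `PersonaSketch`, and the expertise tags are made up for illustration. Only the persona names come from the roster.

```typescript
// Hypothetical persona shape; the expertise tags below are invented.
interface PersonaSketch {
  name: string;
  expertise: string[];
}

// Picks the persona with the most expertise tags in common with the card.
// Returns undefined when no persona shares any tag.
function matchPersona(cardTags: string[], roster: PersonaSketch[]): PersonaSketch | undefined {
  const wanted = cardTags.map((t) => t.toLowerCase());
  let best: PersonaSketch | undefined;
  let bestScore = 0;
  for (const persona of roster) {
    const score = persona.expertise.filter(
      (tag) => wanted.indexOf(tag.toLowerCase()) !== -1,
    ).length;
    if (score > bestScore) {
      best = persona;
      bestScore = score;
    }
  }
  return best;
}

const rosterSketch: PersonaSketch[] = [
  { name: "Mímir Deepwell", expertise: ["database", "sql", "migration"] },
  { name: "Víðarr Silentward", expertise: ["security", "audit"] },
  { name: "Iðunn Goldleaf", expertise: ["frontend", "css"] },
];

const pickedPersona = matchPersona(["database", "migration", "postgresql"], rosterSketch);
console.log(pickedPersona ? pickedPersona.name : "no match"); // Mímir Deepwell
```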

The persona shapes the agent's behavior. A persona with the cautious trait will write more defensive code. One with terse will keep output minimal. Persona stats (quality average, reliability, cards completed) accumulate over time - the dashboard tracks which personas perform best.

⚔

Týr Lawbringer

Architect

👁

Heimdall Watchfire

Auth & Security

📚

Mímir Deepwell

Database

🌀

Skuld Threadweaver

Data Engineering

🔧

Brokkr Forgehand

DevOps

📖

Saga Storyteller

Documentation

☆

Baldr Brightblade

Fullstack

🔀

Rán Tidecaller

Migration

🛡

Víðarr Silentward

Security

🔍

Forseti Truthseeker

🍀

Iðunn Goldleaf

Frontend

⚡

Hermóðr Swiftmessage

API

🐍

Sigyn Steadfast

Python

📱

Sleipnir Swiftfoot

Mobile

☁

Skaði Cloudpiercer

Cloud

💨

Magni Irongrip

Performance

🤖

Huginn Thoughtwing

AI / ML

♿

Höðr Allseer

Accessibility

🔭

Muninn Farseeker

Researcher

💎

Eitri Runecaster

.NET

🔎

Freyja Goldseeker

SEO

[Screenshot: Personas - persona detail page with radar chart, quality average, cards completed, and skill history]

[Screenshot: Skills Library - extracted learnings with severity badges, tags, and source tracking]

Three-Tier Discovery

User-created personas (highest priority) override pack-installed personas, which override built-in roster personas. Same ID = override. This means you can customize any built-in persona without losing the defaults.
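The override rule can be sketched as a priority lookup keyed by persona ID. In the sketch below, `source: "pack"` mirrors the value mentioned in the Packs section, while `"user"`, `"builtin"`, the record shape, and the IDs are assumptions made for illustration.

```typescript
// "pack" matches the source value named in the docs; "user" and "builtin" are assumed labels.
type PersonaSource = "user" | "pack" | "builtin";

interface PersonaRecord {
  id: string;
  source: PersonaSource;
  displayName: string;
}

const SOURCE_PRIORITY: Record<PersonaSource, number> = { user: 3, pack: 2, builtin: 1 };

// For each ID, keep only the record from the highest-priority tier.
function resolvePersonas(all: PersonaRecord[]): PersonaRecord[] {
  const byId: Record<string, PersonaRecord> = {};
  for (const candidate of all) {
    const current = byId[candidate.id];
    if (
      current === undefined ||
      SOURCE_PRIORITY[candidate.source] > SOURCE_PRIORITY[current.source]
    ) {
      byId[candidate.id] = candidate;
    }
  }
  return Object.keys(byId).map((id) => byId[id]);
}

const discovered: PersonaRecord[] = [
  { id: "mimir-deepwell", source: "builtin", displayName: "Mímir Deepwell" },
  { id: "vidarr-silentward", source: "builtin", displayName: "Víðarr Silentward" },
  { id: "mimir-deepwell", source: "user", displayName: "Mímir Deepwell (customized)" },
];

const activePersonas = resolvePersonas(discovered);
console.log(activePersonas.map((p) => p.id + ":" + p.source));
// The user-created "mimir-deepwell" wins; "vidarr-silentward" stays built-in.
```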

Persona Builder

Create custom personas from the dashboard. Define name, role, expertise tags, personality traits, writing style, and model preference. Or override any built-in persona with your own tweaks.

Skill Tracking

As agents complete work, skills are extracted and linked to their persona. Over time, each persona builds a skill profile - tracked with domain, confidence level, and version. The dashboard shows learned skills per persona with a radar chart.

Quality System

Scoring & Review

Every card is scored on a 1-10 scale across four weighted dimensions. A blind reviewer double-checks the work. Low scores trigger automatic corrections. Nothing ships without passing the gate.

The Formula

Weighted Score = (C × 3 + Q × 3 + F × 2 + R × 2) / 10

C = Completeness, Q = Code Quality, F = Format, R = Correctness

Each dimension is scored 1-10. Weights reflect priority: completeness and code quality matter most (×3); format and correctness are important but secondary (×2). A score of 7 means "meets spec" - not everything deserves a 10.
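As a worked example of the weighting, the sketch below computes the weighted score from four dimension scores. The function and field names are hypothetical. The weights (3, 3, 2, 2) and the division by 10 follow the formula described here.

```typescript
// Field names are illustrative; weights follow the formula above.
interface DimensionScores {
  completeness: number;
  codeQuality: number;
  format: number;
  correctness: number;
}

const SCORE_WEIGHTS: DimensionScores = {
  completeness: 3,
  codeQuality: 3,
  format: 2,
  correctness: 2,
};

// The weights sum to 10, so dividing by 10 keeps the result on the 1-10 scale.
function weightedScore(s: DimensionScores): number {
  const total =
    s.completeness * SCORE_WEIGHTS.completeness +
    s.codeQuality * SCORE_WEIGHTS.codeQuality +
    s.format * SCORE_WEIGHTS.format +
    s.correctness * SCORE_WEIGHTS.correctness;
  return total / 10;
}

// (8*3 + 7*3 + 6*2 + 9*2) / 10 = (24 + 21 + 12 + 18) / 10 = 7.5
const sampleScore = weightedScore({ completeness: 8, codeQuality: 7, format: 6, correctness: 9 });
console.log(sampleScore); // 7.5
```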

👀 Blind Card Reviewer

After the implementing agent self-scores, an independent reviewer agent (Haiku model for cost efficiency) scores the same card without seeing the self-score. This prevents score inflation. Both scores are stored and shown side-by-side in the compliance heatmap with S (self) and R (reviewer) badges.

⚠ Steering Rules

When a card scores below 5.0/10, Volundr generates a behavioral steering rule. These are concrete "don't do this again" instructions appended to the project constraints. Future agents inherit them automatically. Rules are tagged with the card ID and timestamp so you can see why each rule exists.
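A minimal sketch of the trigger logic, assuming a hypothetical `SteeringRule` shape. In practice the rule text would be generated by the orchestrator; here it is passed in as a plain string so that only the threshold and tagging are shown.

```typescript
// Hypothetical rule shape: tagged with card ID and timestamp, as described above.
interface SteeringRule {
  cardId: string;
  createdAt: string; // ISO 8601 timestamp
  instruction: string;
}

const STEERING_THRESHOLD = 5.0;

// Returns a rule only when the score is strictly below the threshold.
function maybeCreateSteeringRule(
  cardId: string,
  score: number,
  instruction: string,
  now: Date,
): SteeringRule | null {
  if (score >= STEERING_THRESHOLD) {
    return null;
  }
  return { cardId, createdAt: now.toISOString(), instruction };
}

const sampleRule = maybeCreateSteeringRule(
  "CARD-014",
  4.2,
  "Do not skip input validation on API handlers",
  new Date("2025-01-01T00:00:00.000Z"),
);
console.log(sampleRule);
// A rule tagged with cardId "CARD-014" and createdAt "2025-01-01T00:00:00.000Z"
```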

[Screenshot: Compliance - enforcement score, rated averages, and per-card quality heatmap]

[Screenshot: Insights - quality trend, card velocity, cost, token usage, and agent breakdown]

The Build Gate

Every card runs through a six-step gate before it can be merged:

1. Type Check
2. Production Build
3. Smoke Test
4. Anti-Pattern Scan
5. Success Criteria
6. Blind Review
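The six steps can be modeled as an ordered list of checks. The sketch below assumes the gate stops at the first failing step, which the text does not state, and uses a hypothetical `GateStep` shape in which each check is a function returning pass or fail.

```typescript
// Hypothetical step shape: each check is modeled as a function returning pass/fail.
interface GateStep {
  name: string;
  run: () => boolean;
}

// Runs steps in order. Assumption: the gate short-circuits on the first failure.
function runBuildGate(steps: GateStep[]): { passed: boolean; failedAt: string | null } {
  for (const step of steps) {
    if (!step.run()) {
      return { passed: false, failedAt: step.name };
    }
  }
  return { passed: true, failedAt: null };
}

const sampleGate: GateStep[] = [
  { name: "Type Check", run: () => true },
  { name: "Production Build", run: () => true },
  { name: "Smoke Test", run: () => false }, // simulated failure
  { name: "Anti-Pattern Scan", run: () => true },
  { name: "Success Criteria", run: () => true },
  { name: "Blind Review", run: () => true },
];

const gateResult = runBuildGate(sampleGate);
console.log(gateResult.passed, gateResult.failedAt); // false Smoke Test
```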

Dashboard

The Forge

A real-time dashboard that shows everything happening across your project. WebSocket live updates. No polling. No refresh needed.

[Screenshot: The Forge dashboard at localhost:3000 - at-a-glance stats, live feed, progress]

[Screenshot: Board - cards with acceptance criteria, ISC checks, and dependency graph]

[Screenshot: Events - filterable audit log with expandable detail rows]

📋

Real-Time Kanban

Live card board with status columns: backlog, ready, in-progress, blocked, in-review, done. Cards update in real time as agents complete work.

🌳

Agent Tracker

Every agent is registered and tracked through to completion. See which agents are live, what they are working on, and how many tokens they have consumed.

📈

Quality Scores

Per-card quality scores with trend lines. Four dimensions weighted and aggregated. Session average vs all-time average. Low scores flagged automatically.

💵

Cost Metrics

Token usage and dollar cost tracked per card, per agent, and per session. Cache hit ratio. Cost per card. Budget gating prevents runaway spending.

📡

Event Log

Every significant action logged with timestamp and context. Agent spawns, card transitions, quality gate outcomes, and steering rule generations.

⚡

WebSocket Live Updates

The dashboard uses WebSockets for instant updates. No polling delay. Changes from agents appear on the board within milliseconds.

Communication

The Þing

Named after the Old Norse assembly where decisions were debated and made collectively. This is where agent teams communicate, claim work, raise concerns, and resolve conflicts - in real time.

[Screenshot: The Þing - live agent assembly]

🔥 The Campfire

At the center: an animated campfire with flickering light. Around it: every active agent appears as a silhouette figure. Volundr sits at the conductor's seat. Developers fan out. Specialists appear as they spawn and fade out on completion.

💬 Agent Communication

Agents talk to each other via message passing. A developer claims a card and announces work. The architect reviews the approach and raises concerns. QA reports test failures. Volundr mediates conflicts and makes final calls. All messages stream to the dashboard in real time.

⚖ The Roundtable

Before implementation starts, Volundr convenes a virtual roundtable. Multiple perspectives debate the blueprint - a skeptic challenges assumptions, a pragmatist focuses on delivery, an architect guards patterns. The plan is stress-tested before a single line of code is written.

🔄 Task Lifecycle

A card moves through: persona matched → developer claims → worktree created → implementation → self-review → blind reviewer scores → quality gate pass/fail → merge or retry. Every transition fires a hook and updates the dashboard instantly.
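The lifecycle reads naturally as a small state machine. The sketch below is illustrative only: the stage identifiers are invented slugs for the steps listed above, and the assumption that a retry re-enters implementation is not stated in the text.

```typescript
// Stage names are invented slugs for the steps listed above.
type LifecycleStage =
  | "persona-matched"
  | "claimed"
  | "worktree-created"
  | "implementation"
  | "self-review"
  | "blind-review"
  | "quality-gate"
  | "merged"
  | "retry";

const LINEAR_STAGES: LifecycleStage[] = [
  "persona-matched",
  "claimed",
  "worktree-created",
  "implementation",
  "self-review",
  "blind-review",
  "quality-gate",
];

// Returns the next stage, or null once the card is merged.
function advanceStage(stage: LifecycleStage, gatePassed: boolean = false): LifecycleStage | null {
  if (stage === "quality-gate") {
    return gatePassed ? "merged" : "retry";
  }
  if (stage === "merged") {
    return null; // terminal
  }
  if (stage === "retry") {
    return "implementation"; // assumption: a retry goes back to implementation
  }
  return LINEAR_STAGES[LINEAR_STAGES.indexOf(stage) + 1];
}

// Walk the happy path from matching to merge.
const happyPath: LifecycleStage[] = [];
let cursor: LifecycleStage | null = "persona-matched";
while (cursor !== null) {
  happyPath.push(cursor);
  cursor = advanceStage(cursor, true);
}
console.log(happyPath.join(" -> "));
```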

Agent System

The Roster

Eight specialized agent types, each with a defined role, tool access, model tier, and communication protocol. Volundr selects and spawns the right agents based on the work at hand.

[Diagram: Volundr (Team Lead - orchestrates everything) at the top, with Developers, Architect, QA Engineer, DevOps Engineer, Designer, Reviewer, Guardian, and Researcher beneath]

Orchestrator

Volundr

The PM, architect, and engineering lead rolled into one. Manages the entire project lifecycle, spawns all agents, merges branches, scores quality, and generates steering rules from failures.

Teammate

Developer

Claims tasks from the shared task list and implements cards directly in isolated worktrees. One per domain, up to four running in parallel. Trait-composed per card type.

Teammate

Architect

Continuous design guardian. Reviews card specs before work starts. Catches scope creep, pattern violations, and anti-patterns. Read-only - influences through messages and comments.

Teammate

QA Engineer

Owns test strategy and coverage. Writes tests alongside implementation, runs test suites, and messages developers directly when failures are found. Tracks coverage across the project.

Teammate

DevOps Engineer

Owns infrastructure: Docker, CI/CD, migrations, environment config, and deployment pipelines. Runs environment verification at the start of every project.

Teammate

Designer

UI/UX quality, component patterns, accessibility, and responsive design. Reviews via browser screenshots. Implements CSS and design tokens directly in the codebase.

Teammate

Reviewer

Cross-domain code review and spot-check enforcement. Reads all completed branches and flags issues with file:line references. Severity levels: BLOCK, WARN, and INFO. BLOCK findings prevent merge.

Milestone

Guardian

Full codebase architecture audit at milestones. Reviews for pattern consistency, circular imports, type safety, duplication, and security. Grades the codebase A/B/C with specific remediation steps.

Teammate

Researcher

Pre-implementation research on external APIs, libraries, and documentation. Produces reports, TypeScript interface mappings, and endpoint catalogs that developers can use directly.

Extensibility

Packs

Packs bundle agent prompts, persona seeds, skills, and routing rules into installable modules. The framework ships with 8 built-in packs. You can create and install your own.

Core

Developer, Architect, Reviewer, Planner prompts. The foundation every project uses.

Testing

QA Engineer prompts. Test strategy, coverage tracking, Playwright E2E.

Quality

Guardian audit, blind card reviewer, quality rubric. Scoring and gates.

Infrastructure

DevOps Engineer. Docker, CI/CD, migrations, deployment pipelines.

Frontend

Designer prompts. UI/UX, accessibility, component patterns, responsive design.

Security

Security review prompts. OWASP scanning, auth patterns, vulnerability assessment.

Research

Researcher prompts. External API docs, endpoint mapping, library evaluation.

Languages

Python, .NET, Mobile, AI/ML specialist personas and prompts.

What's in a Pack?

Each pack contains a pack.json manifest with: agent prompt templates, persona seed definitions, skill declarations, and routing rules. On install, persona seeds are written to the database with source='pack' so the three-tier discovery knows their priority.
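To make the manifest contents concrete, here is a sketch of what a pack might look like when modeled in TypeScript. Every field name and value below is an assumption made for illustration; the actual pack.json schema is not shown in this document. Only the four content categories and the `source='pack'` tagging come from the description above.

```typescript
// Hypothetical manifest shape; the real pack.json schema may differ.
interface PackManifestSketch {
  name: string;
  prompts: string[]; // agent prompt template paths
  personas: { id: string; role: string }[]; // persona seed definitions
  skills: string[]; // skill declarations
  routing: { match: string; persona: string }[]; // routing rules
}

const examplePack: PackManifestSketch = {
  name: "example-pack",
  prompts: ["prompts/example-agent.md"],
  personas: [{ id: "example-persona", role: "Example Specialist" }],
  skills: ["example-skill"],
  routing: [{ match: "example", persona: "example-persona" }],
};

// On install, persona seeds are tagged with source "pack" so discovery knows their tier.
function personaRowsForInstall(
  pack: PackManifestSketch,
): { id: string; role: string; source: "pack" }[] {
  return pack.personas.map((p) => ({ id: p.id, role: p.role, source: "pack" as const }));
}

const installRows = personaRowsForInstall(examplePack);
console.log(installRows);
```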

Build Your Own

Create a directory under framework/packs/ with a pack.json, prompt templates in prompts/, and persona definitions. Install via the /vldr-pack install command. Your pack's personas will override built-in ones with the same ID.

Getting Started

Start in 3 Steps

Prerequisites: Docker and the Claude Code CLI. The launcher script handles everything else - Docker startup, dashboard container, browser, and launching Claude with the "Wake up!" prompt.

Install Prerequisites

You need Docker and the Claude Code CLI. That's it.

npm install -g @anthropic-ai/claude-code

Clone the Repository

Clone Volundr and enter the directory. The framework lives here - your project data stays in ~/.volundr/ and never touches this repo.

github.com/sebwesselhoff/volundr

git clone https://github.com/sebwesselhoff/volundr.git
cd volundr

Run the Launcher

The launcher does everything automatically: starts Docker if needed, initializes ~/.volundr/ on first run, pulls and starts the dashboard container, waits for the API health check, opens The Forge in your browser, and launches Claude Code with the "Wake up!" prompt.

# macOS / Linux
./start.sh

# Windows
start.bat

That's it. Volundr activates, checks the dashboard connection, loads or creates a project, and starts the discovery interview. You describe what you want to build - Volundr does the rest.

The dashboard runs at http://localhost:3000, the API at http://localhost:3141. If you prefer to start Claude manually instead, just run claude from the volundr directory and type "Wake up!". Add --dangerously-skip-permissions for fully autonomous operation without permission prompts.

Architecture

How It's Structured

Two parts: the framework (this repo) and user data (your machine). They never mix. Updates to the framework never overwrite your project data.

Framework Layout (this repo)

framework/

System instructions, agent prompts, quality rubric, hierarchy logic, community lessons seed

framework/agents/prompts/

Prompt templates for each agent type (Developer, Architect, QA, DevOps, Designer, Reviewer, Guardian, Researcher)

dashboard/

The Forge - Turborepo monorepo: @vldr/web (Next.js 15), @vldr/api (Express), @vldr/db (Drizzle + SQLite), @vldr/sdk, @vldr/shared

.claude/hooks/

13 lifecycle hooks: session start/stop, agent start/stop, task completed, teammate idle, worktree management, pre-compact state preservation

start.sh / start.bat

One-click launchers - start Docker, seed community lessons, and open The Forge

User Data (~/.volundr/ - your machine, never in the repo)

projects/registry.json

Active project pointer, project index, session state

projects/{id}/

Per-project: blueprint.md, constraints.md, cards/, reports/, checkpoints/, sow/

global/lessons.md

Aggregated lessons promoted from individual projects

global/patterns/

Reusable patterns from high-scoring cards, available to all future projects

data/the-forge.db

SQLite database - bind-mounted into Docker. Cards, agents, events, quality scores, cost tracking

Data Flow

Claude Code (CLI) → 13 Hooks (.claude/hooks/) → Dashboard API (:3141, Express) → SQLite DB (~/.volundr/data/) → The Forge (:3000, Next.js)

13 Lifecycle Hooks

Claude Code hooks intercept every significant moment. Session start runs crash recovery. Session end clears active project. Agent start injects project context. Agent stop accumulates token costs. Pre-compact preserves state across context resets.

Worktree Isolation

Every Developer agent works in a dedicated git worktree. Branches never collide. Volundr merges branches in dependency order after each parallel round. Failed branches are discarded cleanly.
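Merging in dependency order amounts to a topological sort over branches. The sketch below is one way to do it, not necessarily how Volundr implements it: `mergeOrder` and the branch names are hypothetical, and each pass of the loop loosely corresponds to one parallel round.

```typescript
// Maps each branch to the branches it depends on. Names are illustrative.
function mergeOrder(deps: Record<string, string[]>): string[] {
  const pending = Object.keys(deps);
  const merged: string[] = [];
  while (pending.length > 0) {
    // A branch is ready once all of its dependencies have been merged.
    const ready = pending.filter((branch) =>
      deps[branch].every((d) => merged.indexOf(d) !== -1),
    );
    if (ready.length === 0) {
      throw new Error("Dependency cycle or unknown branch among: " + pending.join(", "));
    }
    for (const branch of ready) {
      merged.push(branch);
      pending.splice(pending.indexOf(branch), 1);
    }
  }
  return merged;
}

const branchOrder = mergeOrder({
  "card-3-ui": ["card-1-schema", "card-2-api"],
  "card-1-schema": [],
  "card-2-api": ["card-1-schema"],
});
console.log(branchOrder.join(" -> ")); // card-1-schema -> card-2-api -> card-3-ui
```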

Three-Tier Memory

HOT - always loaded: project summary, steering rules, last session.

WARM - phase-selective: blueprint during planning, card specs during implementation.

COLD - on demand: full history.

All three tiers are managed automatically.
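A simple way to picture the tiers is a function that assembles context for a given phase. The sketch below is an assumption about how such a selection could work; `contextFor`, the phase names, and the string labels are illustrative only.

```typescript
// Phase names are assumed; the text mentions planning and implementation.
type MemoryPhase = "planning" | "implementation";

// HOT is always included, WARM depends on the phase, COLD only on request.
function contextFor(phase: MemoryPhase, includeHistory: boolean = false): string[] {
  const hot: string[] = ["project summary", "steering rules", "last session"];
  const warm: string[] = phase === "planning" ? ["blueprint"] : ["card specs"];
  const cold: string[] = includeHistory ? ["full history"] : [];
  return hot.concat(warm, cold);
}

console.log(contextFor("planning").join(", "));
// project summary, steering rules, last session, blueprint
console.log(contextFor("implementation", true).join(", "));
// project summary, steering rules, last session, card specs, full history
```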

FAQ

Common Questions

Things people ask before getting started.

Does this require Claude Pro or Claude Max?

Volundr works with any Claude Code plan that supports the CLI. The framework itself has no subscription requirement. You pay for Claude usage the same way you always do - through your Anthropic account.

Can I add my own agents or customize existing ones?

Yes. Add prompt templates to framework/agents/prompts/ for new agent types, or place override files in ~/.volundr/customizations/{agent-type}/override.md for project-level customization. Overrides are additive - they extend the base prompt without replacing it.

How much does a typical project cost?

Volundr tracks token usage and dollar cost per card, per agent, and per session. A small project (5-10 cards) typically runs $2-10. For larger projects with multiple parallel agents, cost scales linearly. Budget gating pauses execution before spawning agents if estimated cost exceeds your configured threshold.
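The budget gate can be thought of as a check that runs before each spawn. The sketch below assumes the comparison is made on the spend so far plus the estimate for the next agent; the text only says "estimated cost", so this reading and all names are assumptions.

```typescript
// Assumption: the gate compares (spend so far + next agent's estimate) to the threshold.
function shouldPauseForBudget(
  spentUsd: number,
  estimatedNextUsd: number,
  thresholdUsd: number,
): boolean {
  return spentUsd + estimatedNextUsd > thresholdUsd;
}

console.log(shouldPauseForBudget(8, 3, 10)); // true: 11 exceeds the 10 threshold
console.log(shouldPauseForBudget(4, 3, 10)); // false: 7 is within budget
```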

What models does it use?

Configurable per agent role. Default tiers: Haiku for lightweight fix agents, Sonnet for domain developers and most team roles, Opus for architecture decisions and the Guardian review. You can override the model tier per role, per card type, or per session.

Is my project data shared anywhere?

No. All project data stays in ~/.volundr/ on your local machine. The SQLite database, blueprints, card specs, lessons, and session history never leave your environment. The only lesson data in the framework repo is the community lessons seed file, which contains anonymized patterns, not project content.

How does the cross-project memory work?

After each session, Volundr runs a self-review: it identifies quality trends, extracts lessons from failures, and promotes high-value patterns to the global knowledge base at ~/.volundr/global/. These patterns are loaded into future projects automatically, so agents get smarter over time.

What if an agent fails or produces bad output?

Volundr's quality gate runs after every agent completes. On failure, a lightweight Fixer agent is spawned for up to two retries. On double failure, the issue is escalated to you directly. Low quality scores (below 5.0/10) automatically generate behavioral steering rules to prevent the same issue in future agents. A blind card reviewer (independent Haiku agent) scores every card without seeing the self-score, ensuring honest quality tracking.

Slash Commands

Commands & Skills

Built-in slash commands you can type during a session. These are shortcuts for common operations - Volundr handles the API calls and formatting.

`/vldr-shutdown`

Graceful shutdown protocol. Saves WIP, writes a session summary, runs self-review (quality trends, cost analysis, pattern identification), generates lessons, creates a checkpoint, and presents a final status report. Always run this before ending a session.

`/vldr-journal`

Log a journal entry - decisions, insights, blockers, pivots. These provide cognitive context that helps future sessions understand what happened and why.

/vldr-journal decision Chose flat hierarchy
/vldr-journal blocker Migration failing
/vldr-journal insight Build gate after install

`/vldr-status`

Quick project status. Shows dashboard health, active project, card progress by status, running agents, and total cost. Useful for a quick check mid-session.

`/vldr-pack`

Pack management. Install, list, and inspect agent packs. Packs bundle persona seeds, skills, agent prompts, and routing rules into installable modules.

/vldr-pack list
/vldr-pack install security
/vldr-pack inspect core

`/vldr-doctor`

Setup validation. Checks Docker, dashboard health, VLDR_HOME, project registry, database status, git version, Node.js, hooks, enforcement hooks, and settings. Reports pass/fail/warning for each.

`/vldr-directive`

Governance directives. List, add, suppress, or supersede active directives for a project. Directives are rules that persist across sessions and influence agent behavior.

/vldr-directive list
/vldr-directive add "No ORM - raw SQL only"

`/vldr-economy`

Toggle economy mode on the active project. Downgrades agent models to cheaper tiers (Opus → Sonnet, Sonnet → Haiku) to reduce cost when budget is tight. Toggle off to restore default model assignments.
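The downgrade mapping can be sketched as a lookup table. The two downgrades shown in the description are taken as given; keeping Haiku at Haiku is an assumption, since no cheaper tier is mentioned. Names such as `applyEconomy` are hypothetical.

```typescript
type ModelTier = "opus" | "sonnet" | "haiku";

// Opus -> Sonnet and Sonnet -> Haiku come from the description above.
// Assumption: Haiku has no cheaper tier and stays as-is.
const ECONOMY_DOWNGRADE: Record<ModelTier, ModelTier> = {
  opus: "sonnet",
  sonnet: "haiku",
  haiku: "haiku",
};

// Returns the tier an agent would run on given the economy-mode toggle.
function applyEconomy(tier: ModelTier, economyMode: boolean): ModelTier {
  return economyMode ? ECONOMY_DOWNGRADE[tier] : tier;
}

console.log(applyEconomy("opus", true)); // sonnet
console.log(applyEconomy("opus", false)); // opus
```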

`/vldr-route`

Test routing rules. Describe a piece of work and see which persona and agent tier Volundr would select. Useful for debugging persona matching or verifying routing rules behave as expected.

/vldr-route "Add PostgreSQL migration"

`/vldr-compact`

Context compaction with state preservation. When your conversation gets long, this compacts the context while retaining critical project state - active cards, teammate assignments, phase, and recovery instructions.