Harness: A Multi-Agent Claude Orchestrator That Builds and QAs Games Autonomously

See how an orchestrator autonomously builds and tests games using Claude agents, from feature implementation to evaluation, with live progress streaming.

Overview

Harness is a game-agnostic multi-agent orchestrator that uses the Claude Agent SDK to autonomously build and test games end-to-end.

It reads a feature-list.json from any target game project. For each feature, it spawns a role-specific generator agent (gameplay / UI-art / audio) that implements the feature with full tool access (Read, Write, Edit, Bash, Glob, Grep + Godot MCP). It then spawns a separate evaluator agent that launches the game, classifies each test step as DATA / VISUAL / BOTH, captures screenshots, and emits pass/fail verdicts. A watchdog loop detects stuck sessions, and a pre-flight smoke test validates tool permissions before real work begins. Everything streams live to a single-process Fastify + React dashboard over WebSockets, so you can watch agents ship and break features in real time.
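To make the watchdog idea concrete, here is a minimal sketch of one way stuck sessions could be detected: treat a session as stuck when it has emitted no event for longer than a timeout. The SessionActivity shape, the findStuckSessions name, and the timeout value are assumptions for illustration only, not Harness's actual schema or logic.

```typescript
// Hypothetical record of agent activity; the real Harness session schema may differ.
interface SessionActivity {
  sessionId: string;
  lastEventAt: number; // epoch milliseconds of the most recent streamed agent event
}

// Returns the ids of sessions that have been silent for longer than timeoutMs.
function findStuckSessions(
  sessions: SessionActivity[],
  nowMs: number,
  timeoutMs: number,
): string[] {
  return sessions
    .filter((s) => nowMs - s.lastEventAt > timeoutMs)
    .map((s) => s.sessionId);
}

// Usage: one session reported recently, the other has been silent past the timeout.
const checkTime = 1_000_000;
const stuckIds = findStuckSessions(
  [
    { sessionId: "gen-1", lastEventAt: checkTime - 5_000 },
    { sessionId: "eval-1", lastEventAt: checkTime - 400_000 },
  ],
  checkTime,
  300_000, // assumed five-minute timeout
);
console.log(stuckIds);
```

A watchdog loop built on this would call the function on a timer and decide what to do with each returned id (for example, abort and retry); that policy is left out here because the source does not describe it.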

For the demo I’ll drive Harness against a live Godot project, show the full loop (pick → implement → evaluate → retry), and walk through how session records, screenshots, and eval reports are persisted back to the target repo.
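The implement → evaluate → retry part of that loop can be sketched roughly as follows. This is an illustration under stated assumptions, not the Harness implementation: the Feature and StepVerdict shapes, the runFeature function, and the maxAttempts parameter are all hypothetical, and the generator and evaluator are passed in as plain async functions so the sketch does not guess at any Claude Agent SDK call.

```typescript
type Role = "gameplay" | "ui-art" | "audio";
type StepKind = "DATA" | "VISUAL" | "BOTH";

// Hypothetical shapes; the real feature-list.json and eval report formats are not shown.
interface Feature {
  id: string;
  role: Role;
}
interface StepVerdict {
  step: string;
  kind: StepKind;
  passed: boolean;
}
interface FeatureResult {
  id: string;
  passed: boolean;
  attempts: number;
}

// Stand-ins for the generator and evaluator agents.
type Generator = (feature: Feature, attempt: number) => Promise<void>;
type Evaluator = (feature: Feature) => Promise<StepVerdict[]>;

// Implement, evaluate, and retry until every step passes or attempts run out.
async function runFeature(
  feature: Feature,
  generate: Generator,
  evaluate: Evaluator,
  maxAttempts: number,
): Promise<FeatureResult> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    await generate(feature, attempt);
    const verdicts = await evaluate(feature);
    // An empty verdict list is treated as a failure rather than a vacuous pass.
    if (verdicts.length > 0 && verdicts.every((v) => v.passed)) {
      return { id: feature.id, passed: true, attempts: attempt };
    }
  }
  return { id: feature.id, passed: false, attempts: maxAttempts };
}

// Usage with stub agents: the evaluator fails the first attempt and passes the second.
let evalCalls = 0;
runFeature(
  { id: "jump", role: "gameplay" },
  async () => {},
  async () => {
    evalCalls += 1;
    return [{ step: "player leaves ground", kind: "BOTH", passed: evalCalls >= 2 }];
  },
  3,
).then((result) => console.log(result));
```

Injecting the two agents as functions keeps the retry policy testable without launching a game; the "pick" step would sit one level up, choosing the next feature from the list and calling runFeature on it.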

Tech stack