Hackathon Showcase 3rd Place Winner

hagging haggards

Team consisting of an AI Engineer (PALO IT) and a Robotics Engineer (BeeX, SUTD), building with full-stack, vision (YOLO), and agentic memory expertise.

4 members

Project Description

Audio to Diagram Generator removes the need to draw diagrams by hand during technical discussions. Instead of sketching while listening to a meeting or brainstorming session, users upload an audio file and receive automatically generated diagrams in real time as it plays.

Core Flow:

User uploads an audio file (meeting recording, technical discussion, lecture)
Audio plays in 5-second chunks, each transcribed via the ElevenLabs API
Three AI agents work together: (1) Transcription Agent converts speech to text, (2) Diagram Type Decision Agent (GPT-4o-mini via OpenRouter) analyzes context to determine optimal diagram type (mindmap, sequence, ER, flowchart, or generic), (3) Diagram Generation Agent (GPT-4o-mini via OpenRouter) creates React Flow nodes and edges
Diagrams update incrementally on canvas as audio progresses, maintaining conversation context across chunks
Users interact with final diagrams (pan, zoom, rearrange nodes)

Key Demo: Upload a system design discussion audio file, watch as the agent detects it’s an API flow conversation, and observe a sequence diagram automatically generate showing authentication service → database → JWT token flow.
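
The 5-second chunking step in the core flow could be planned roughly as follows. This is a minimal sketch, not the project's actual chunking logic: `planChunks` and `ChunkWindow` are hypothetical names, and the real implementation may slice audio buffers rather than compute time windows.

```typescript
// Hypothetical type: one playback/transcription window within the audio file.
interface ChunkWindow {
  index: number;
  startSec: number;
  endSec: number;
}

// Split a total duration into fixed-length windows; the last one may be shorter.
function planChunks(durationSec: number, chunkSec: number = 5): ChunkWindow[] {
  if (durationSec < 0 || chunkSec <= 0) {
    throw new RangeError("durationSec must be >= 0 and chunkSec must be > 0");
  }
  const windows: ChunkWindow[] = [];
  for (let start = 0, index = 0; start < durationSec; start += chunkSec, index++) {
    windows.push({
      index,
      startSec: start,
      endSec: Math.min(start + chunkSec, durationSec),
    });
  }
  return windows;
}

// A 12-second clip yields three windows: 0-5, 5-10, and 10-12.
const examplePlan = planChunks(12);
```

Each window would then be sent for transcription in order, so the diagram agents always see chunks in the sequence they were spoken.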

End-to-End Setup:

Clone repository
Install dependencies: npm install
Add API keys to .env.local: ELEVENLABS_API_KEY, OPENROUTER_API_KEY, OPENAI_API_KEY
Run: npm run dev
Navigate to http://localhost:3000
Upload sample audio file from public folder or use provided test file
Click play and watch diagram generation in real-time
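
Since the setup depends on three API keys, a startup check along these lines can surface a missing key early. This is an illustrative sketch only: `missingApiKeys` is a hypothetical helper, not part of the repository, and it takes the environment as a plain object so it carries no runtime dependencies.

```typescript
// The three keys named in the setup steps above.
const REQUIRED_KEYS = [
  "ELEVENLABS_API_KEY",
  "OPENROUTER_API_KEY",
  "OPENAI_API_KEY",
] as const;

// Return the names of required keys that are absent or blank.
function missingApiKeys(env: Record<string, string | undefined>): string[] {
  return REQUIRED_KEYS.filter((key) => {
    const value = env[key];
    return value === undefined || value.trim() === "";
  });
}

// Example with a partially filled environment object.
const exampleMissing = missingApiKeys({
  ELEVENLABS_API_KEY: "example-value",
  OPENROUTER_API_KEY: "",
});
```

In a Next.js server module, the same function could presumably be called with `process.env` and the result logged before the first request is handled.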

Judging Criteria Alignment

Working Prototype

Fully functional web application with complete audio-to-diagram pipeline. Users can upload audio files, view real-time transcription progress, see agent status indicators, and interact with generated diagrams. All three agents operate seamlessly with WebSocket-based state management. Demo-ready with test audio files included.

Technical Complexity & Integration

Multi-agent architecture featuring three specialized agents coordinating via WebSocket communication. Implements stateful diagram generation with incremental updates (not full regeneration), context retention across audio chunks, and intelligent diagram type detection. Integrates multiple APIs (ElevenLabs for transcription, OpenRouter as the LLM gateway, OpenAI GPT-4o-mini for reasoning), React Flow for interactive visualizations, and custom audio chunking logic for real-time processing.
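
To illustrate what incremental updates (as opposed to full regeneration) might look like with Map-based stores, here is a minimal sketch. `DiagramStore`, `DiagramPatch`, and the simplified node and edge shapes are assumptions for illustration; the project's real store and its handling of dangling edges may differ.

```typescript
// Simplified, hypothetical shapes; real React Flow nodes carry more fields.
interface DiagramNode { id: string; label: string; }
interface DiagramEdge { id: string; source: string; target: string; }
interface DiagramPatch { nodes: DiagramNode[]; edges: DiagramEdge[]; }

class DiagramStore {
  private nodes = new Map<string, DiagramNode>();
  private edges = new Map<string, DiagramEdge>();

  // Merge a patch: new ids are added, existing ids are overwritten in place.
  apply(patch: DiagramPatch): void {
    patch.nodes.forEach((node) => this.nodes.set(node.id, node));
    patch.edges.forEach((edge) => {
      // Assumption: edges pointing at unknown nodes are dropped, not stored.
      if (this.nodes.has(edge.source) && this.nodes.has(edge.target)) {
        this.edges.set(edge.id, edge);
      }
    });
  }

  // Produce plain arrays suitable for handing to a renderer.
  snapshot(): DiagramPatch {
    return {
      nodes: Array.from(this.nodes.values()),
      edges: Array.from(this.edges.values()),
    };
  }
}

const exampleStore = new DiagramStore();
// Chunk 1 introduces two nodes and one edge.
exampleStore.apply({
  nodes: [{ id: "auth", label: "Auth" }, { id: "db", label: "Database" }],
  edges: [{ id: "e1", source: "auth", target: "db" }],
});
// Chunk 2 renames one node and adds another; nothing else is rebuilt.
exampleStore.apply({
  nodes: [{ id: "auth", label: "Auth Service" }, { id: "jwt", label: "JWT" }],
  edges: [{ id: "e2", source: "db", target: "jwt" }, { id: "e3", source: "jwt", target: "cache" }],
});
```

Keying by id means a later chunk can refine an earlier node without disturbing the rest of the canvas, which is the property the incremental approach relies on.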

Innovation & Creativity

Novel application of agent-based reasoning to visualization problems. The Diagram Type Decision Agent demonstrates meta-cognitive AI: it analyzes conversation content to select a visualization strategy before generation. Incremental diagram updates show agent “memory” and context awareness. Creative use of LLM structured output (JSON schema validation) ensures reliable React Flow diagram generation. Solves the cognitive overload problem of simultaneous listening and diagramming.
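
The structured-output validation step can be sketched as below. The project lists Zod for schema validation; this example deliberately swaps in hand-written type guards so that it runs without dependencies. The `FlowDiagram` shape (nodes with `id`, `position`, and `data.label`; edges with `id`, `source`, `target`) is an assumed minimal subset of what React Flow accepts, and `parseDiagramJson` is a hypothetical name.

```typescript
interface FlowNode { id: string; position: { x: number; y: number }; data: { label: string }; }
interface FlowEdge { id: string; source: string; target: string; }
interface FlowDiagram { nodes: FlowNode[]; edges: FlowEdge[]; }

type ParseResult =
  | { ok: true; diagram: FlowDiagram }
  | { ok: false; error: string };

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

function isFlowNode(v: unknown): v is FlowNode {
  if (!isRecord(v) || typeof v["id"] !== "string") return false;
  const position = v["position"];
  const data = v["data"];
  return isRecord(position)
    && typeof position["x"] === "number"
    && typeof position["y"] === "number"
    && isRecord(data)
    && typeof data["label"] === "string";
}

function isFlowEdge(v: unknown): v is FlowEdge {
  return isRecord(v)
    && typeof v["id"] === "string"
    && typeof v["source"] === "string"
    && typeof v["target"] === "string";
}

// Parse raw model output and reject anything that is not a well-formed diagram.
function parseDiagramJson(raw: string): ParseResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { ok: false, error: "output is not valid JSON" };
  }
  if (!isRecord(parsed)) return { ok: false, error: "top level must be an object" };
  const nodes = parsed["nodes"];
  const edges = parsed["edges"];
  if (!Array.isArray(nodes) || !nodes.every(isFlowNode)) {
    return { ok: false, error: "nodes missing or malformed" };
  }
  if (!Array.isArray(edges) || !edges.every(isFlowEdge)) {
    return { ok: false, error: "edges missing or malformed" };
  }
  return { ok: true, diagram: { nodes, edges } };
}
```

Returning a tagged result instead of throwing lets the caller decide whether to retry the model call or keep the previous diagram when a chunk produces unusable output.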

Real-World Impact

Addresses a widespread pain point across engineering teams, educators, product managers, and consultants. Saves hours of post-meeting diagram creation, reduces documentation drift from actual discussions, enables accessible meeting summaries for remote teams, and helps non-visual thinkers translate conversations into structured diagrams. Immediate applications: technical architecture reviews, database schema planning, process documentation, and educational content creation.

Theme Alignment

Exemplifies practical agentic AI through specialized agents with distinct responsibilities working autonomously. Each agent demonstrates: (1) Goal-directed behavior (transcribe, decide, generate), (2) Context awareness (full transcript history maintained), (3) Decision-making (diagram type selection based on semantic analysis), (4) Iterative refinement (incremental node/edge updates), and (5) Coordination (agents communicate via shared session state). Not just LLM prompting, but true multi-agent orchestration.
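
One way to picture coordination through shared session state is the sketch below. It is a simplification under stated assumptions: `SessionState`, `onChunkTranscribed`, and `keywordDecide` are hypothetical names, and the keyword heuristic is only a stand-in for the GPT-4o-mini decision agent so the example runs offline.

```typescript
type DiagramKind = "mindmap" | "sequence" | "er" | "flowchart" | "generic";

// State shared by all agents for one uploaded audio file.
interface SessionState {
  transcript: string[]; // every chunk transcribed so far, in order
  kind: DiagramKind;    // the decision agent's latest choice
}

type DecideFn = (fullTranscript: string) => DiagramKind;

// Stand-in for the LLM-backed decision agent (assumption: keywords only).
const keywordDecide: DecideFn = (text) => {
  const t = text.toLowerCase();
  if (t.includes("request") || t.includes("response")) return "sequence";
  if (t.includes("table") || t.includes("foreign key")) return "er";
  if (t.includes("step")) return "flowchart";
  return "generic";
};

// Called after each chunk is transcribed: extend the history, then re-decide
// using the full transcript rather than only the newest chunk.
function onChunkTranscribed(
  state: SessionState,
  chunkText: string,
  decide: DecideFn,
): SessionState {
  const transcript = [...state.transcript, chunkText];
  return { transcript, kind: decide(transcript.join(" ")) };
}

let demoSession: SessionState = { transcript: [], kind: "generic" };
demoSession = onChunkTranscribed(demoSession, "Let's talk about login.", keywordDecide);
demoSession = onChunkTranscribed(
  demoSession,
  "The client sends a request to the auth service.",
  keywordDecide,
);
```

Because the decision function receives the whole transcript, the chosen diagram type can change as more context arrives, which matches the context-awareness point above.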

Technologies & Stack

Frontend:

Next.js 16.0.8 (App Router, React Server Components)
React 19.2.1 + TypeScript 5
React Flow 11.11.0 (diagram rendering)
Tailwind CSS 4 (styling)
Lucide React 0.560.0 (icons)

AI & APIs:

ElevenLabs API (audio transcription)
OpenRouter API (LLM gateway)
OpenAI GPT-4o-mini (diagram type decision & generation)
Zod 3.23.0 (schema validation)

Agent Libraries:

Custom agent implementations in TypeScript
WebSocket protocol for real-time agent communication
In-memory state management (Map-based stores for nodes/edges)

Development:

ESLint 9 + Next.js config
PostCSS + Tailwind

Deployment:

Runs locally or deployable to Vercel
Environment variables for API key management

Products & Tools

Cursor, ElevenLabs