ElevenLabs Worldwide Hackathon


Team

Self Print

Project Concept

Memory Keeper: An AI Photo-Journalist for Your Life Stories

Memory Keeper is an AI photo-journalist that turns a single photo into a natural, voice-to-voice conversation — and ultimately into a beautifully printable memory card.
Its purpose is simple: help people, especially families and elders, preserve the stories behind their photos without needing to write or use complicated tools.

How the AI Photo-Journalist Works

When someone uploads a photo, the AI photo-journalist:

1. Looks at the photo like a real reporter

The image is stored in Vercel Blob.
OpenAI GPT-4o Vision grounds the agent in the scene: the people, objects, setting, and emotional tone.

2. Begins a live, back-and-forth conversation

The conversation runs on the OpenAI Realtime API (gpt-4o-realtime).
The agent speaks with a warm, curious voice inspired by Brandon Stanton, the photographer behind Humans of New York.
It asks one gentle follow-up question at a time, just like a real interviewer.

3. Helps the user tell the story behind the moment

It listens, interprets, and probes deeper conversationally:
Who’s in the photo?
What was happening?
Why does this moment matter?
How did it feel?

4. Turns the conversation into a keepsake

The transcript becomes a poetic title and narrative.
Both are rendered as a Polaroid-style memory card you can print or share.

The entire experience is driven by the AI agent — the web layer simply supports it.
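Step 1 above (grounding the agent in the photo) can be sketched as a request builder. This is a minimal sketch, not the team's actual code: `buildVisionRequest` is a hypothetical helper, the prompt wording and the `max_tokens` value are assumptions, and the photo is assumed to be reachable at a public https URL such as the one Vercel Blob returns after upload. The message shape follows the Chat Completions format for image input.

```typescript
// One piece of a multimodal user message: plain text or an image reference.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

interface VisionRequest {
  model: string;
  max_tokens: number;
  messages: { role: "system" | "user"; content: string | ContentPart[] }[];
}

// Hypothetical helper: builds the request body that asks the model to
// describe a photo. The prompts and limits are illustrative assumptions.
function buildVisionRequest(photoUrl: string, model: string = "gpt-4o"): VisionRequest {
  if (!/^https:\/\//.test(photoUrl)) {
    throw new Error("photoUrl must be an https URL");
  }
  return {
    model,
    max_tokens: 300,
    messages: [
      {
        role: "system",
        content:
          "You are a photo-journalist. Describe the people, objects, " +
          "setting, and emotional tone of the photo in a few sentences.",
      },
      {
        role: "user",
        content: [
          { type: "text", text: "Describe this photo so an interviewer can ask about it." },
          { type: "image_url", image_url: { url: photoUrl } },
        ],
      },
    ],
  };
}
```

The returned object would be sent as the JSON body of a POST to the Chat Completions endpoint, and the description in the reply could then seed the interviewer's instructions for the voice session.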

Project Goals

Emotional Goal

Capture stories people intend to write one day but never do — especially those from older relatives or meaningful family moments.

Product Goal

Deliver an end-to-end prototype that feels magical and real: smooth voice interaction, minimal friction, and a delightful final keepsake.

Agent Goal

Demonstrate a true agentic workflow:

Vision-guided context
Realtime conversational loop
Autonomy in deciding when a memory is ready to be turned into a finished card
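One way the readiness decision could work is sketched below. The entry does not say how the agent decides, so everything here is an assumption: the four topics mirror the interview questions listed earlier (who, what, why, how it felt), each user turn is assumed to arrive already tagged with the topics it covers (for example by the model itself), and the word threshold is arbitrary.

```typescript
type Topic = "who" | "what" | "why" | "feeling";

interface Turn {
  speaker: "agent" | "user";
  text: string;
  topics: Topic[]; // assumed to be tagged upstream; not part of the original design
}

interface Readiness {
  ready: boolean;
  missing: Topic[];
  userWords: number;
}

const REQUIRED_TOPICS: Topic[] = ["who", "what", "why", "feeling"];

// Hypothetical heuristic: the memory is "ready" once every topic has been
// touched by the user and the user has said at least minUserWords words.
function memoryReadiness(turns: Turn[], minUserWords: number = 40): Readiness {
  const covered: Record<Topic, boolean> = {
    who: false,
    what: false,
    why: false,
    feeling: false,
  };
  let userWords = 0;
  for (const turn of turns) {
    if (turn.speaker !== "user") continue; // only the user's words count
    userWords += turn.text.split(/\s+/).filter((w) => w.length > 0).length;
    for (const topic of turn.topics) covered[topic] = true;
  }
  const missing = REQUIRED_TOPICS.filter((t) => !covered[t]);
  return { ready: missing.length === 0 && userWords >= minUserWords, missing, userWords };
}
```

The `missing` list doubles as a hint for the next follow-up question, and the same result could drive a visual "memory readiness" cue in the UI.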

What We’re Improving Next

Agent Intelligence

Refining the photo-journalist persona (tone, depth, pacing)
Smarter follow-up questions using Vision + conversation history
Faster turn-taking and lower latency in the realtime loop

End-to-End Flow

Upload → Vercel Blob → Vision → realtime voice → narrative generation
Transcript cleanup for consistent, poetic memory-card output
Serving images via /api/photo/[photoId]
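The upload-to-narrative flow listed above can be modeled as a small state machine, which keeps the web layer from skipping a stage (for example, starting narrative generation before any conversation has happened). This is a sketch under assumptions: the stage names and the single `failed` state are illustrative and are not taken from the project's code.

```typescript
// Stages of one memory, in the order the flow describes them.
type Stage =
  | "uploading"   // photo is being sent from the browser
  | "stored"      // photo is in blob storage
  | "grounded"    // Vision description is available
  | "conversing"  // realtime voice session is active
  | "generating"  // transcript is being turned into title + narrative
  | "card_ready"  // memory card can be rendered
  | "failed";

// Allowed transitions. Any stage that is still in progress may fail.
const ALLOWED: Record<Stage, readonly Stage[]> = {
  uploading: ["stored", "failed"],
  stored: ["grounded", "failed"],
  grounded: ["conversing", "failed"],
  conversing: ["generating", "failed"],
  generating: ["card_ready", "failed"],
  card_ready: [],
  failed: [],
};

// Returns the next stage, or throws if the transition is not allowed.
function advance(current: Stage, next: Stage): Stage {
  if (ALLOWED[current].indexOf(next) === -1) {
    throw new Error("Illegal transition: " + current + " -> " + next);
  }
  return next;
}
```

A graceful Vision fallback could be expressed by also allowing `stored` to move straight to `conversing`, with the agent opening on a generic question instead of a photo-specific one.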

Experience Design

Scrapbook-inspired home screen
Polaroid-camera metaphor for generating keepsakes
Print-ready card layout (typography, framing, vintage touch)
Smooth feedback: progress bar, speaking/listening indicators

Stability & Reliability

Robust WebSocket reconnection
Large image handling
Mic permission issues across browsers
Graceful fallback when Vision encounters errors
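For the WebSocket reconnection item, a common approach is exponential backoff with jitter, sketched below. The entry does not describe the project's retry policy, so the base delay, the cap, and the "full jitter" strategy are assumptions. The random source is injectable so that the function can be tested deterministically.

```typescript
// Delay before reconnect attempt number `attempt` (0-based).
// The ceiling doubles with each attempt up to maxMs, and the actual delay
// is drawn uniformly from [0, ceiling) so that many clients do not all
// reconnect at the same instant.
function reconnectDelayMs(
  attempt: number,
  baseMs: number = 500,
  maxMs: number = 15000,
  random: () => number = Math.random
): number {
  const safeAttempt = Math.max(0, Math.floor(attempt));
  const ceiling = Math.min(maxMs, baseMs * Math.pow(2, safeAttempt));
  return Math.floor(random() * ceiling);
}

// Whether another attempt should be made at all; the limit is an assumption.
function shouldRetry(attempt: number, maxAttempts: number = 8): boolean {
  return attempt < maxAttempts;
}
```

In the browser this would typically be called from the socket's `close` handler, scheduling the next connection with `setTimeout` and resetting the attempt counter once the socket's `open` event fires.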

How Others Can Contribute

Conversation & UX

Improve persona design and question flow
Create flows for grandparents, couples, travel memories
Better visual cues for “memory readiness”

Engineering

Audio encoding + buffering optimization
Optional storage backends (S3, Cloudinary, Supabase)
Export pipelines: email, archive, batch printing
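As a starting point for the audio encoding item: the Web Audio API hands the browser microphone signal over as 32-bit float samples in the range [-1, 1], while the Realtime API's `pcm16` format expects 16-bit signed integers. A minimal conversion sketch follows. It assumes the session is configured for `pcm16` input, and it leaves out resampling, chunking, and base64 encoding.

```typescript
// Converts Web Audio float samples to 16-bit signed PCM.
// Out-of-range input is clamped rather than allowed to wrap around.
function floatTo16BitPcm(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    // Negative values scale to -32768, positive values to 32767.
    out[i] = Math.round(clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff);
  }
  return out;
}
```

The resulting buffer would then be base64-encoded and appended to the session's input audio buffer. Batching several render quanta into one message is one place where buffering can be tuned against latency.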

AI & Prompting

Explore additional “modes” (kids, lighthearted, serious, multilingual)
Improve story-generation consistency and emotional specificity
Add multi-language support end-to-end

Design & Branding

Logo, color system, typography refinements
Additional card templates (wedding, travel, family, archival)
Accessibility across interactions

Research

Field tests with families and elders
Trust-building and safety insights
Privacy, consent, and long-term archival patterns

Tech Stack

Next.js + TypeScript + Tailwind + shadcn/ui
OpenAI Realtime API
GPT-4o + GPT-4o Vision
Vercel Blob Storage

If you love storytelling, memories, and human-centered AI, the AI photo-journalist has a lot of exciting surfaces to build on — from conversation design to realtime audio to visual memory-making.

Entry

Status: Submitted

Last saved: December 06 at 9:45 AM +08


Team Roster


Yu Gary Wang, Team Lead

AI Product Manager at Zoom

Gary led the backend AI pipeline and agent workflow. He implemented the full technical stack, including Vercel Blob for photo upload and storage, OpenAI Vision for image grounding, and the Realtime API for natural, back-and-forth audio conversation. He built the agent state machine, prompt orchestration, and transcript-to-“memory card” generation using GPT-4o. He also set up the browser-based end-to-end architecture with Next.js API routes, coordinating data flow across upload → vision analysis → realtime voice chat → memory-card generation.

I build at the intersection of AI, language, and education. Cornell-trained in CS (after starting as a French major), I’ve built AI language tools, AI-native writing assistants, and coding-education products used by students, engineers, and creators around the world. I love shipping products fast, learning across cultures, and helping people express themselves through tech and language.

I’m looking to jam with builders in AI-native productivity, language tech, voice interfaces, and education. Always down to connect with developers, PMs, designers, founders, and anyone building tools that help people learn, express, and create faster. Also curious about global tech culture, new learning models, and AI-powered communication.

Projects

soyas.ai: an AI language companion for global tech professionals
AI Companion at Zoom

Jialin Ye

Product Designer at Zoom

Jialin led the product design for Self Print. She designed the end-to-end user flow, including the photo-upload experience, the real-time conversation interface, and the final “memory card” layout. She worked closely with the engineering side to ensure the UI integrated smoothly with OpenAI Vision outputs and the Realtime API’s audio interaction model. Her work covered interaction design, visual design, and the overall experience architecture.

I’m a product designer focused on turning messy, complex ideas into clear, useful experiences. I’ve worked across data tooling, AI, fintech, and currently design enterprise AI products. I’m drawn to systems, structure, and the challenge of shaping ambiguous problems into something thoughtful and intuitive.

I’m interested in learning more about agentic workflow design and how voice AI can be used effectively in enterprise environments. I’m especially curious about making voice interactions feel natural in office settings, where speaking to a device can feel awkward. I want to explore conversation models that feel more like collaborative exchanges or meeting-style interactions rather than one-way monologues.

My focus is on how AI can accelerate creating product demos, narratives, and showcase-ready concepts without losing the fundamentals of good design. Part of this is exploring when design should clarify the value proposition at a high level, and when it should dive into detail. I’m trying to understand how AI can support both: helping designers communicate the core idea convincingly while still enabling the level of craft and precision that makes a concept feel real.