Hackathon Portal
AI Tinkerers - Singapore
Final round winners have been announced.
Team

Self Print

Project Concept

Memory Keeper: An AI Photo-Journalist for Your Life Stories

Memory Keeper is an AI photo-journalist that turns a single photo into a natural voice-to-voice conversation and, ultimately, into a printable memory card.
Its purpose is simple: help people, especially families and elders, preserve the stories behind their photos without needing to write or use complicated tools.


How the AI Photo-Journalist Works

When someone uploads a photo, the AI photo-journalist:

1. Looks at the photo like a real reporter

  • The image is stored in Vercel Blob
  • OpenAI GPT-4o Vision grounds the agent in the scene — the people, objects, setting, and emotional tone.

2. Begins a live, back-and-forth conversation

  • The conversation runs on the OpenAI Realtime API (gpt-4o-realtime)
  • The agent speaks with a warm, curious voice inspired by Brandon Stanton
  • It asks one gentle follow-up question at a time, just like a real interviewer.

3. Helps the user tell the story behind the moment

It listens, interprets, and probes deeper conversationally:

  • Who’s in the photo?
  • What was happening?
  • Why does this moment matter?
  • How did it feel?

4. Turns the conversation into a keepsake

  • The transcript becomes a poetic title + narrative
  • Rendered as a Polaroid-style memory card you can print or share.

The entire experience is driven by the AI agent — the web layer simply supports it.
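
As a rough illustration of how steps 1 and 2 connect, the sketch below builds a GPT-4o request that includes the photo and then folds the resulting scene description into the voice agent's instructions. It assumes the stored photo is reachable at an http(s) URL (for example its Vercel Blob URL). The function names, prompt wording, and token cap are hypothetical and are not taken from the team's code; only the message shape (a text part plus an image_url part) follows the OpenAI Chat Completions API.

```typescript
// Hypothetical helpers, not the team's actual implementation.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

interface ChatMessage {
  role: "system" | "user";
  content: string | ContentPart[];
}

interface VisionRequest {
  model: string;
  max_tokens: number;
  messages: ChatMessage[];
}

// Builds the JSON body for a Chat Completions call that includes the photo.
function buildVisionRequest(photoUrl: string): VisionRequest {
  if (!/^https?:\/\//.test(photoUrl)) {
    throw new Error("photoUrl must be an absolute http(s) URL");
  }
  return {
    model: "gpt-4o",
    max_tokens: 300, // assumed cap; a short scene summary is enough
    messages: [
      {
        role: "system",
        content:
          "Describe the people, objects, setting, and emotional tone of this photo in a few sentences.",
      },
      {
        role: "user",
        content: [
          { type: "text", text: "Here is the photo to describe." },
          { type: "image_url", image_url: { url: photoUrl } },
        ],
      },
    ],
  };
}

// Folds the scene description into the voice agent's instructions.
function buildInterviewerInstructions(sceneDescription: string): string {
  return [
    "You are a warm, curious photo-journalist interviewing someone about a photo.",
    `What you can see in the photo: ${sceneDescription.trim()}`,
    "Ask one gentle follow-up question at a time.",
  ].join("\n");
}
```

In a setup like this, the request body would be sent server-side to https://api.openai.com/v1/chat/completions, and the model's reply would be passed to buildInterviewerInstructions before the realtime session starts.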


Project Goals

Emotional Goal

Capture stories people intend to write one day but never do — especially those from older relatives or meaningful family moments.

Product Goal

Deliver an end-to-end prototype that feels magical and real: smooth voice interaction, minimal friction, and a delightful final keepsake.

Agent Goal

Demonstrate a true agentic workflow:

  • Vision-guided context
  • Realtime conversational loop
  • Autonomy in deciding when a memory is ready to be turned into a finished card
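
One way to make "deciding when a memory is ready" concrete is a coverage check over the interview's core questions (who, what, why, how it felt), with a turn limit as a safety net. This is only a sketch under that assumption: the Facet names, the MemoryState shape, and the default of 8 turns are hypothetical, and the actual agent may rely on the model's own judgment instead.

```typescript
// Hypothetical readiness check; names and thresholds are illustrative only.
type Facet = "who" | "what" | "why" | "feeling";

interface MemoryState {
  covered: Set<Facet>; // facets the user has already talked about
  userTurns: number;   // number of user replies so far
}

const ALL_FACETS: Facet[] = ["who", "what", "why", "feeling"];

// Ready when every facet is covered, or when the conversation has run long
// enough that it is kinder to wrap up than to keep asking.
function isMemoryReady(state: MemoryState, maxTurns: number = 8): boolean {
  const allCovered = ALL_FACETS.every((f) => state.covered.has(f));
  return allCovered || state.userTurns >= maxTurns;
}

// The next facet to ask about, or undefined when nothing is missing.
function nextFacet(state: MemoryState): Facet | undefined {
  return ALL_FACETS.find((f) => !state.covered.has(f));
}
```

A check like this could also drive the "memory readiness" cue in the UI, since the share of covered facets maps naturally onto a progress bar.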

What We’re Improving Next

Agent Intelligence

  • Refining the photo-journalist persona (tone, depth, pacing)
  • Smarter follow-up questions using Vision + conversation history
  • Faster turn-taking and lower latency in the realtime loop

End-to-End Flow

  • Upload → Vercel Blob → Vision → realtime voice → narrative generation
  • Transcript cleanup for consistent, poetic memory-card output
  • Serving images via /api/photo/[photoId]
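
The transcript cleanup step might look something like the sketch below, which keeps only the user's turns, strips common filler words, and normalizes whitespace before the text is handed to narrative generation. The Turn shape, the filler list, and the function name are assumptions made for illustration; the project's actual cleanup rules are not described here.

```typescript
// Hypothetical cleanup pass; the filler list is deliberately small.
interface Turn {
  speaker: "agent" | "user";
  text: string;
}

// Matches standalone fillers such as "um", "uhh", "erm", plus a trailing
// comma or period and any following whitespace.
const FILLERS = /\b(um+|uh+|erm+)\b[,.]?\s*/gi;

function cleanTranscript(turns: Turn[]): string {
  return turns
    .filter((t) => t.speaker === "user")          // the story is in the user's words
    .map((t) =>
      t.text
        .replace(FILLERS, "")                     // drop filler words
        .replace(/\s+/g, " ")                     // collapse runs of whitespace
        .trim()
    )
    .filter((text) => text.length > 0)            // skip turns that were only filler
    .join("\n");
}
```

Keeping this step deterministic means the model that writes the title and narrative sees the same kind of input every time, which should help with consistency.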

Experience Design

  • Scrapbook-inspired home screen
  • Polaroid-camera metaphor for generating keepsakes
  • Print-ready card layout (typography, framing, vintage touch)
  • Smooth feedback: progress bar, speaking/listening indicators

Stability & Reliability

  • Robust WebSocket reconnection
  • Large image handling
  • Mic permission issues across browsers
  • Graceful fallback when Vision encounters errors
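
For the WebSocket reconnection item, a common approach is exponential backoff with jitter. The helper below is a minimal sketch of that technique, not the project's code: the name reconnectDelayMs and the default values (500 ms base, 15 s cap) are assumptions, and the random source is injectable so the behavior can be tested.

```typescript
// Hypothetical helper: delay before reconnect attempt number `attempt` (0-based).
function reconnectDelayMs(
  attempt: number,
  baseMs: number = 500,
  maxMs: number = 15000,
  random: () => number = Math.random
): number {
  // Exponential growth, capped so long outages do not produce huge waits.
  const ceiling = Math.min(maxMs, baseMs * Math.pow(2, attempt));
  // Full jitter: a uniform delay in [0, ceiling) avoids synchronized retries.
  return Math.floor(random() * ceiling);
}
```

A reconnect loop would call this with an increasing attempt counter, wait for the returned number of milliseconds, and reset the counter to 0 once the socket opens successfully.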

How Others Can Contribute

Conversation & UX

  • Improve persona design and question flow
  • Create flows for grandparents, couples, travel memories
  • Better visual cues for “memory readiness”

Engineering

  • Audio encoding + buffering optimization
  • Optional storage backends (S3, Cloudinary, Supabase)
  • Export pipelines: email, archive, batch printing

AI & Prompting

  • Explore additional “modes” (kids, lighthearted, serious, multilingual)
  • Improve story-generation consistency and emotional specificity
  • Add multi-language support end-to-end

Design & Branding

  • Logo, color system, typography refinements
  • Additional card templates (wedding, travel, family, archival)
  • Accessibility across interactions

Research

  • Field tests with families and elders
  • Trust-building and safety insights
  • Privacy, consent, and long-term archival patterns

Tech Stack

  • Next.js + TypeScript + Tailwind + shadcn/ui
  • OpenAI Realtime API
  • GPT-4o + GPT-4o Vision
  • Vercel Blob Storage

If you love storytelling, memories, and human-centered AI, the AI photo-journalist has a lot of exciting surfaces to build on — from conversation design to realtime audio to visual memory-making.

Entry

Status: Submitted

Last saved: December 06 at 9:45 AM +08

Team Roster


Yu Gary Wang Team Lead RSVP Approved

AI Product Manager at Zoom
Gary led the backend AI pipeline and agent workflow. He implemented the full technical stack, including Vercel Blob for photo upload and storage, OpenAI Vision for image grounding, and the Realtime API for natural, back-and-forth audio conversation. He built the agent state machine, prompt orchestration, and transcript-to-“memory card” generation using GPT-4o. He also set up the browser-based end-to-end architecture with Next.js API routes, coordinating data flow across upload → vision analysis → realtime voice chat → memory-card generation.
I build at the intersection of AI, language, and education. Cornell-trained in CS (after starting as a French major), I’ve built AI language tools, AI-native writing assistants, and coding-education products used by students, engineers, and creators around the world. I love shipping products fast, learning across cultures, and helping people express themselves through tech and language.
I’m looking to jam with builders in AI-native productivity, language tech, voice interfaces, and education. Always down to connect with developers, PMs, designers, founders, and anyone building tools that help people learn, express, and create faster. Also curious about global tech culture, new learning models, and AI-powered communication.
Projects

  • soyas.ai: an AI language companion for global tech professionals
  • AI Companion at Zoom

Jialin Ye RSVP Approved

Product Designer at Zoom
Jialin led the product design for Selfprint. She designed the end-to-end user flow, including the photo-upload experience, the real-time conversation interface, and the final “memory card” layout. She worked closely with the engineering side to ensure the UI integrated smoothly with OpenAI Vision outputs and the Realtime API’s audio interaction model. Her work covered interaction design, visual design, and the overall experience architecture.
I’m a product designer focused on turning messy, complex ideas into clear, useful experiences. I’ve worked across data tooling, AI, fintech, and currently design enterprise AI products. I’m drawn to systems, structure, and the challenge of shaping ambiguous problems into something thoughtful and intuitive.
I’m interested in learning more about agentic workflow design and how voice AI can be used effectively in enterprise environments. I’m especially curious about making voice interactions feel natural in office settings, where speaking to a device can feel awkward. I want to explore conversation models that feel more like collaborative exchanges or meeting-style interactions rather than one-way monologues.
My focus is on how AI can accelerate creating product demos, narratives, and showcase-ready concepts without losing the fundamentals of good design. Part of this is exploring when design should clarify the value proposition at a high level, and when it should dive into detail. I’m trying to understand how AI can support both: helping designers communicate the core idea convincingly while still enabling the level of craft and precision that makes a concept feel real.