
Case Study

Vora

A Chrome extension built at a hackathon to help people who cannot use a mouse or keyboard navigate the web by voice. Users speak a request in plain language; Vora reads the current page, uses Claude to interpret the intent, performs the action, and reports back, while subtle animations show when it is listening, thinking, and responding.

Role: Co-creator (3-person hackathon team) — built the initial MVP, then owned the voice activation layer (wake-word detection, recognition buffering, mispronunciation learning buffer) and the conversational talk-back UI with its animated speaking states and live transcript
Timeline: May 2026 — Hackathon project
Chrome Extension · React · TypeScript · Web Speech API · Claude API

01

The Problem

People with motor disabilities are locked out of huge parts of the web: nearly every site assumes a mouse and keyboard. Existing voice tools are brittle: they stumble on uncommon words, they don't adapt to how an individual actually speaks, and they don't give clear feedback about whether they heard you. Even our own wake word exposed the problem: the Web Speech API constantly mishears "Vora" as "nora", "bora", or "flora", or splits it across two words. We had one day at a hackathon to build something that actually worked end-to-end.

02

The Approach

We built a Manifest V3 extension with three contexts: a React side panel that owns voice I/O and the state machine, a stateless service worker that runs the AI pipeline (DOM read → Claude → typed action), and a content script that executes actions on the page. I built the initial MVP, then went deep on the two parts that make or break a voice agent: the activation layer — wake-word detection, the speech-recognition buffering pipeline, and a learning buffer that adapts to a user's mispronunciations — and the conversational UI that talks back to you with animated speaking states and a live transcript.
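The service worker's role in this architecture can be sketched as one stateless function. This is a minimal illustration, not Vora's code: `handleUtterance`, `PipelineDeps`, `PageElement`, and the `BrowserAction` shape are hypothetical names, and the three injected dependencies stand in for the messaging the extension would do in practice (for example `chrome.tabs.sendMessage` to reach the content script, and an HTTPS request to the Claude API). Injecting them keeps the sketch runnable outside Chrome.

```typescript
// Hypothetical shapes: Vora's real message and action types are not published in the case study.
interface PageElement {
  id: string;
  role: string;
  label: string;
}

type BrowserAction =
  | { kind: "click"; targetId: string }
  | { kind: "type"; targetId: string; text: string }
  | { kind: "scroll"; direction: "up" | "down" }
  | { kind: "reply"; text: string }; // nothing to do on the page, just speak an answer

// What the worker needs from the content script and the model, injected as functions.
interface PipelineDeps {
  readPage: () => Promise<PageElement[]>; // content script: snapshot of interactive elements
  interpret: (transcript: string, page: PageElement[]) => Promise<BrowserAction>; // Claude call
  execute: (action: BrowserAction) => Promise<string>; // content script: perform it, return a summary
}

// One request in, one spoken result out; nothing is stored between calls.
async function handleUtterance(transcript: string, deps: PipelineDeps): Promise<string> {
  const page = await deps.readPage(); // DOM read
  const action = await deps.interpret(transcript, page); // Claude -> typed action
  if (action.kind === "reply") return action.text;
  return deps.execute(action); // action on the page
}

// Usage with fakes standing in for the content script and the model.
const fakeDeps: PipelineDeps = {
  readPage: async () => [{ id: "btn-1", role: "button", label: "Sign in" }],
  interpret: async (_transcript, page): Promise<BrowserAction> => ({ kind: "click", targetId: page[0].id }),
  execute: async (action) => `done: ${action.kind}`,
};

handleUtterance("click sign in", fakeDeps).then((summary) => console.log(summary)); // logs: done: click
```

Keeping the worker stateless suits Manifest V3, where Chrome can shut down an idle service worker at any time; anything that must persist across requests, such as the voice state machine, lives in the side panel instead.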

03

Technical Highlights

  • Wake-word gate tolerant of Web Speech mishearings: a curated phonetic alias list ("vore-uh", "nora", "bora", "flora"…) plus an edit-distance-≤1 fuzzy match on the first spoken token, so "Vora" — a word the recognizer routinely garbles — still triggers reliably
  • Mispronunciation learning buffer: the extension accumulates per-user aliases over time, so words and commands the recognizer repeatedly mishears for that specific user are folded into Vora's matching vocabulary and recognized on future utterances
  • Speech-recognition buffering pipeline: merges interim and final transcripts into a single utterance, waits for a short silence before flushing so natural pauses don't cut the user off, enforces a hard cap as a safety net for run-on speech, and carries forward the best confidence score for the low-confidence re-prompt path
  • Conversational talk-back UI: TTS readback of every result plus an animated "speaking halo", a pulsing status indicator, a live mic-level visualizer, and a transcript panel that streams words as you speak — distinct visual states for listening, thinking, executing, and confirming
  • Claude (claude-sonnet-4-6) turns the transcript + a structured snapshot of the page's interactive elements into one typed browser action; destructive actions (submit, delete, pay…) get a spoken read-back and wait for a yes/no before executing, and sensitive fields like passwords are never read or filled
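A minimal sketch of the wake-word gate, assuming the recognizer's transcript arrives as a plain string. Only the aliases quoted in the case study are listed (the curated list itself is elided there), and `editDistance`, `normalizeToken`, and `matchWakeWord` are hypothetical names. The distance is standard Levenshtein, computed with a single rolling row.

```typescript
// Wake-word aliases quoted in the case study; its full curated list is elided, so this is partial.
const WAKE_ALIASES: string[] = ["vora", "vore-uh", "nora", "bora", "flora"];

// Levenshtein distance with one rolling row: insert, delete, and substitute each cost 1.
function editDistance(a: string, b: string): number {
  const row: number[] = [];
  for (let j = 0; j <= b.length; j++) row.push(j); // distance from "" to each prefix of b
  for (let i = 1; i <= a.length; i++) {
    let diag = row[0]; // previous row's value at column j - 1
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j]; // previous row's value at column j
      row[j] = Math.min(
        above + 1, // delete a[i - 1]
        row[j - 1] + 1, // insert b[j - 1]
        diag + (a[i - 1] === b[j - 1] ? 0 : 1) // substitute, free when the letters match
      );
      diag = above;
    }
  }
  return row[b.length];
}

// Strip punctuation the recognizer may attach ("Nora," -> "nora"); keep hyphens for "vore-uh".
function normalizeToken(token: string): string {
  return token.toLowerCase().replace(/[^a-z-]/g, "");
}

// Returns the command that follows the wake word, or null when the first token is not a wake word.
function matchWakeWord(transcript: string): string | null {
  const tokens = transcript.trim().split(/\s+/).filter((t) => t.length > 0);
  if (tokens.length === 0) return null;
  const first = normalizeToken(tokens[0]);
  const isWake = first.length > 0 && WAKE_ALIASES.some((alias) => editDistance(first, alias) <= 1);
  return isWake ? tokens.slice(1).join(" ") : null;
}
```

With this gate, "Nora, open my inbox" yields the command "open my inbox", and an unlisted garble such as "vara" still matches "vora" at distance 1. The cost of a distance-1 threshold on four-letter words is that near neighbors like "dora" or "vera" also trigger; favoring recall over precision is a defensible choice for users who cannot fall back to a keyboard.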
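The case study does not say how the learning buffer decides that a word was misheard. One plausible sketch, under stated assumptions: a correction is recorded whenever a heard token is later confirmed to mean a known word (say, after the low-confidence re-prompt), and after `PROMOTE_AFTER` confirmations (a hypothetical threshold of 3) the heard form becomes a per-user alias that rewrites future transcripts. `MispronunciationBuffer` and its methods are illustrative names, not Vora's API.

```typescript
// Hypothetical: how many confirmed corrections it takes before a mishearing becomes an alias.
const PROMOTE_AFTER = 3;

class MispronunciationBuffer {
  // Confirmed (heard -> intended) pairs and how often each has been seen.
  private counts = new Map<string, number>();
  // Learned aliases: what the recognizer hears for this user -> the word they meant.
  private aliases = new Map<string, string>();

  // Record that `heard` turned out to mean `intended`, e.g. after a low-confidence re-prompt.
  recordCorrection(heard: string, intended: string): void {
    const h = heard.trim().toLowerCase();
    const i = intended.trim().toLowerCase();
    if (!h || h === i) return;
    const key = `${h}|${i}`;
    const n = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, n);
    if (n >= PROMOTE_AFTER) this.aliases.set(h, i); // promote only after repeated evidence
  }

  // Rewrite single tokens through the learned aliases before wake-word and intent matching.
  normalize(transcript: string): string {
    return transcript
      .split(/\s+/)
      .filter((t) => t.length > 0)
      .map((t) => this.aliases.get(t.toLowerCase()) ?? t)
      .join(" ");
  }

  // Plain object for persistence between sessions (e.g. in chrome.storage.local).
  exportAliases(): Record<string, string> {
    const out: Record<string, string> = {};
    this.aliases.forEach((intended, heard) => {
      out[heard] = intended;
    });
    return out;
  }

  importAliases(saved: Record<string, string>): void {
    Object.keys(saved).forEach((heard) => this.aliases.set(heard, saved[heard]));
  }
}
```

Requiring several confirmations keeps a single stray mishearing from permanently rewriting a real word. This sketch learns only single-token substitutions; a split such as "vo ra" would need phrase-level matching.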
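The buffering pipeline can be sketched as a clock-driven class, assuming the caller forwards each Web Speech result (`result[0].transcript`, `result.isFinal`, and `result[0].confidence`, reading from `event.resultIndex` onward in continuous mode) together with a timestamp, and calls `tick` from a timer. `UtteranceBuffer`, `SILENCE_MS`, and `HARD_CAP_MS` are hypothetical; the case study gives no concrete timings.

```typescript
// Hypothetical timings: the case study only specifies "a short silence" and "a hard cap".
const SILENCE_MS = 1200;
const HARD_CAP_MS = 15000;

interface Utterance {
  text: string;
  confidence: number; // best confidence among final results; 0 if only interim text arrived
}

class UtteranceBuffer {
  private finals: string[] = [];
  private interim = "";
  private best = 0;
  private startedAt: number | null = null;
  private lastHeardAt = 0;

  // Feed one recognition result plus the time (ms) it arrived.
  push(transcript: string, isFinal: boolean, confidence: number, now: number): void {
    if (this.startedAt === null) this.startedAt = now;
    this.lastHeardAt = now;
    const text = transcript.trim();
    if (isFinal) {
      if (text) this.finals.push(text);
      this.interim = ""; // the final result supersedes the interim guess for this segment
      this.best = Math.max(this.best, confidence);
    } else {
      this.interim = text; // newer interim hypotheses replace older ones
    }
  }

  // Returns the merged utterance once the user has paused long enough or the cap is hit.
  tick(now: number): Utterance | null {
    if (this.startedAt === null) return null;
    const paused = now - this.lastHeardAt >= SILENCE_MS;
    const capped = now - this.startedAt >= HARD_CAP_MS;
    if (!paused && !capped) return null;
    const text = this.finals.concat(this.interim).filter((t) => t.length > 0).join(" ");
    const utterance: Utterance = { text, confidence: this.best };
    this.reset();
    return text ? utterance : null;
  }

  private reset(): void {
    this.finals = [];
    this.interim = "";
    this.best = 0;
    this.startedAt = null;
  }
}
```

Passing `now` in explicitly, rather than reading `Date.now()` inside the class, keeps the silence and cap logic deterministic and easy to test.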
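The four visual states map naturally onto a small transition table. In this sketch the state and event names are hypothetical (idle and speaking are added so the loop closes), and `INDICATOR` merely pairs each state with one of the visual elements the case study mentions; the actual pairing in Vora's UI is not documented.

```typescript
// Hypothetical names. The case study lists listening, thinking, executing, and confirming.
type VoiceState = "idle" | "listening" | "thinking" | "confirming" | "executing" | "speaking";
type VoiceEvent =
  | "wake"
  | "utterance"
  | "reply"
  | "actionReady"
  | "needsConfirm"
  | "yes"
  | "no"
  | "done"
  | "spoken";

const TRANSITIONS: Record<VoiceState, Partial<Record<VoiceEvent, VoiceState>>> = {
  idle: { wake: "listening" },
  listening: { utterance: "thinking" },
  thinking: { reply: "speaking", actionReady: "executing", needsConfirm: "confirming" },
  confirming: { yes: "executing", no: "speaking" }, // spoken read-back, waiting for yes/no
  executing: { done: "speaking" },
  speaking: { spoken: "idle" }, // TTS readback finished
};

// Illustrative pairing of states to the visual elements named in the case study.
const INDICATOR: Record<VoiceState, string> = {
  idle: "mic idle",
  listening: "live mic-level visualizer + streaming transcript",
  thinking: "pulsing status indicator",
  confirming: "speaking halo (read-back, awaiting yes/no)",
  executing: "pulsing status indicator",
  speaking: "speaking halo",
};

// Events that make no sense in the current state are ignored instead of throwing,
// so a late recognizer or TTS callback cannot push the UI into an impossible state.
function nextState(state: VoiceState, event: VoiceEvent): VoiceState {
  return TRANSITIONS[state][event] ?? state;
}
```

Typing both tables as `Record<VoiceState, ...>` makes the compiler reject a state that is missing a transition entry or an indicator.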
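The structured page snapshot can be sketched as a pure function, assuming the content script collects candidate elements (for instance with `document.querySelectorAll("a, button, input, select, textarea")`) and passes them in. `ElementLike`, `SnapshotItem`, `snapshot`, and the sensitive-field policy are hypothetical: here a field counts as sensitive if it is a password input or carries a secret-related `autocomplete` token, and its value is never copied into what is sent to Claude.

```typescript
// Structural subset of a DOM Element, so real elements and test fakes both fit.
interface ElementLike {
  tagName: string;
  textContent: string | null;
  value?: string; // present on inputs, textareas, and selects
  getAttribute(name: string): string | null;
}

interface SnapshotItem {
  ref: number; // index the model uses to name its target
  tag: string;
  label: string;
  value?: string; // omitted for sensitive fields
  sensitive: boolean;
}

// Hypothetical policy: password inputs plus autocomplete tokens that mark secrets.
const SENSITIVE_AUTOCOMPLETE = ["current-password", "new-password", "one-time-code", "cc-number", "cc-csc"];

function isSensitive(el: ElementLike): boolean {
  const type = (el.getAttribute("type") ?? "").toLowerCase();
  const tokens = (el.getAttribute("autocomplete") ?? "").toLowerCase().split(/\s+/);
  return type === "password" || tokens.some((t) => SENSITIVE_AUTOCOMPLETE.includes(t));
}

function snapshot(elements: ElementLike[]): SnapshotItem[] {
  return elements.map((el, ref) => {
    const sensitive = isSensitive(el);
    const label = (el.getAttribute("aria-label") ?? el.getAttribute("placeholder") ?? el.textContent ?? "")
      .trim()
      .slice(0, 80); // keep the prompt small
    const item: SnapshotItem = { ref, tag: el.tagName.toLowerCase(), label, sensitive };
    // The value of a sensitive field is never copied into the snapshot sent to the model.
    if (!sensitive && el.value !== undefined) item.value = el.value;
    return item;
  });
}
```

Keeping the label of a password field, but not its value, still lets the model say "there is a password box" without ever seeing what is in it.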
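The final step, validating the model's reply and deciding whether to run it, ask first, or refuse, might look like the sketch below. It assumes Claude's answer arrives as a JSON string (the case study does not say whether Vora uses plain JSON or tool use), and `Action`, `Decision`, `DESTRUCTIVE`, `parseAction`, and `gate` are hypothetical names; the keyword list only approximates "submit, delete, pay…". Refusing `type` actions on sensitive targets mirrors the rule that passwords are never filled.

```typescript
// Hypothetical action schema; the case study says "one typed browser action" but not its shape.
type Action =
  | { kind: "click"; ref: number }
  | { kind: "type"; ref: number; text: string }
  | { kind: "submit"; ref: number }
  | { kind: "scroll"; direction: "up" | "down" };

interface Target {
  ref: number;
  label: string;
  sensitive: boolean;
}

type Decision =
  | { outcome: "run" }
  | { outcome: "confirm"; prompt: string } // speak the prompt, wait for yes/no
  | { outcome: "refuse"; message: string }; // never executed

// Hypothetical keyword list standing in for "submit, delete, pay…".
const DESTRUCTIVE = /\b(submit|delete|remove|pay|purchase|buy|send|place order)\b/i;

// Validate the model's JSON field by field instead of trusting it; malformed input becomes null.
function parseAction(raw: string): Action | null {
  let v: any;
  try {
    v = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof v !== "object" || v === null) return null;
  const hasRef = Number.isInteger(v.ref);
  switch (v.kind) {
    case "click":
      return hasRef ? { kind: "click", ref: v.ref } : null;
    case "submit":
      return hasRef ? { kind: "submit", ref: v.ref } : null;
    case "type":
      return hasRef && typeof v.text === "string" ? { kind: "type", ref: v.ref, text: v.text } : null;
    case "scroll":
      return v.direction === "up" || v.direction === "down" ? { kind: "scroll", direction: v.direction } : null;
    default:
      return null;
  }
}

function gate(action: Action, targets: Target[]): Decision {
  if (action.kind === "scroll") return { outcome: "run" };
  const ref = action.ref; // copied so the narrowed value is usable inside the callback
  const target = targets.find((t) => t.ref === ref);
  if (!target) return { outcome: "refuse", message: "I couldn't find that on the page." };
  if (action.kind === "type" && target.sensitive) {
    return { outcome: "refuse", message: "I don't fill in passwords or other sensitive fields." };
  }
  if (action.kind === "submit" || DESTRUCTIVE.test(target.label)) {
    return { outcome: "confirm", prompt: `This will activate "${target.label}". Should I go ahead?` };
  }
  return { outcome: "run" };
}
```

Treating the model's output as untrusted input means a hallucinated element reference or a malformed reply degrades into a spoken "I couldn't find that" rather than a click on the wrong thing.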

04

Results

Built end-to-end in a single-day hackathon with a 3-person team

Working voice loop on any site: speech → DOM read → Claude → action, with spoken feedback

Did not place at the hackathon — but shipped a complete, accessibility-first voice agent

Open source on GitHub