
Case Study
Vora
A Chrome extension built at a hackathon to help people who cannot use a mouse or keyboard navigate the web by voice. Users speak a request in plain language; Vora then reads the current page, uses Claude to interpret the intent, performs the action, and reports back. Subtle animations indicate when it is listening, thinking, and responding.
01
The Problem
People with motor disabilities are locked out of huge parts of the web because most sites assume a mouse and keyboard. Existing voice tools are brittle: they stumble on uncommon words (the Web Speech API constantly mishears "Vora" itself as "nora", "bora", or "flora", or splits it across two words), they don't adapt to how an individual actually speaks, and they don't give clear feedback about whether they heard you. We had one day at a hackathon to build something that actually worked end to end.
02
The Approach
We built a Manifest V3 extension with three contexts: a React side panel that owns voice I/O and the state machine, a stateless service worker that runs the AI pipeline (DOM read → Claude → typed action), and a content script that executes actions on the page. I built the initial MVP, then went deep on the two parts that make or break a voice agent. The first is the activation layer: wake-word detection, the speech-recognition buffering pipeline, and a learning buffer that adapts to a user's mispronunciations. The second is the conversational UI, which talks back to you with animated speaking states and a live transcript.
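The hand-off between the service worker and the content script hinges on the typed action. Below is a minimal sketch of how a model reply might be narrowed into one before it is executed. It assumes the reply arrives as a JSON string. The action names (`click`, `type`, `scroll`, `navigate`), the `elementId` field, and the `parseAction` helper are all hypothetical, since the project's actual schema is not shown in this write-up.

```typescript
// Hypothetical action shapes for illustration; not the project's real schema.
type BrowserAction =
  | { kind: "click"; elementId: string }
  | { kind: "type"; elementId: string; text: string }
  | { kind: "scroll"; direction: "up" | "down" }
  | { kind: "navigate"; url: string };

// Narrow an untrusted JSON string (for example a model reply) into a
// BrowserAction. Returns null for anything malformed or unrecognised.
function parseAction(raw: string): BrowserAction | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not valid JSON
  }
  if (typeof data !== "object" || data === null) return null;

  const { kind, elementId, text, direction, url } = data as Record<string, unknown>;

  switch (kind) {
    case "click":
      return typeof elementId === "string" ? { kind: "click", elementId } : null;
    case "type":
      return typeof elementId === "string" && typeof text === "string"
        ? { kind: "type", elementId, text }
        : null;
    case "scroll":
      return direction === "up" || direction === "down"
        ? { kind: "scroll", direction }
        : null;
    case "navigate":
      return typeof url === "string" ? { kind: "navigate", url } : null;
    default:
      return null; // unknown action kind
  }
}
```

Rejecting anything that does not match a known shape means the content script only ever receives actions it knows how to execute, which suits a service worker that keeps no state between requests.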
03
Technical Highlights
- Wake-word gate tolerant of Web Speech mishearings: a curated phonetic alias list ("vore-uh", "nora", "bora", "flora"…) plus an edit-distance-≤1 fuzzy match on the first spoken token, so "Vora" — a word the recognizer routinely garbles — still triggers reliably
- Mispronunciation learning buffer: the extension accumulates per-user aliases over time, so words and commands it repeatedly mishears for that specific user get folded into the recognition vocabulary and matched on future utterances
- Speech-recognition buffering pipeline: merges interim + final transcripts into a single utterance, flushes after a short silence so natural pauses are respected, with a hard cap as a safety net for run-on speech, and carries forward the best confidence score for the low-confidence re-prompt path
- Conversational talk-back UI: TTS readback of every result plus an animated "speaking halo", a pulsing status indicator, a live mic-level visualizer, and a transcript panel that streams words as you speak — distinct visual states for listening, thinking, executing, and confirming
- Claude (claude-sonnet-4-6) turns the transcript + a structured snapshot of the page's interactive elements into one typed browser action; destructive actions (submit, delete, pay…) get a spoken read-back and wait for a yes/no before executing, and sensitive fields like passwords are never read or filled
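The wake-word gate and learning buffer from the first two highlights can be sketched roughly as follows. This is an illustration under stated assumptions, not the extension's actual code: the `WakeWordGate` class, its `match` and `learn` methods, the threshold of three repeats before an alias is learned, and the in-memory storage are all hypothetical. The seed aliases and the edit-distance-≤1 check on the first token come from the description above.

```typescript
// Levenshtein distance using a single rolling row of the DP table.
function editDistance(a: string, b: string): number {
  const row = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diag = row[0]; // value of dp[i-1][j-1]
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j]; // value of dp[i-1][j]
      row[j] = Math.min(
        above + 1, // delete a[i-1]
        row[j - 1] + 1, // insert b[j-1]
        diag + (a[i - 1] === b[j - 1] ? 0 : 1), // substitute or match
      );
      diag = above;
    }
  }
  return row[b.length];
}

// Lowercase and drop everything except letters, so "Vora," and "vore-uh" compare cleanly.
const normalize = (s: string): string => s.toLowerCase().replace(/[^a-z]/g, "");

class WakeWordGate {
  private readonly wakeWord: string;
  private readonly learnAfter: number; // assumed threshold, not from the project
  private readonly aliases: Set<string>;
  private readonly heardCounts = new Map<string, number>();

  constructor(wakeWord = "vora", seedAliases = ["vore-uh", "nora", "bora", "flora"], learnAfter = 3) {
    this.wakeWord = normalize(wakeWord);
    this.aliases = new Set(seedAliases.map(normalize));
    this.learnAfter = learnAfter;
  }

  private isExact(candidate: string): boolean {
    return candidate !== "" && (candidate === this.wakeWord || this.aliases.has(candidate));
  }

  // Returns the command text after the wake word, or null if the gate stays closed.
  match(transcript: string): string | null {
    const tokens = transcript.trim().split(/\s+/).filter((t) => t !== "");
    if (tokens.length === 0) return null;
    const first = normalize(tokens[0]);
    // Wake word split across two tokens ("vor a"): exact match on the joined form only.
    if (tokens.length >= 2 && this.isExact(first + normalize(tokens[1]))) {
      return tokens.slice(2).join(" ");
    }
    // Alias list plus the edit-distance <= 1 fuzzy match on the first token.
    if (this.isExact(first) || (first !== "" && editDistance(first, this.wakeWord) <= 1)) {
      return tokens.slice(1).join(" ");
    }
    return null;
  }

  // Record a mishearing the user confirmed was meant as the wake word;
  // after `learnAfter` repeats it becomes a per-user alias.
  learn(heard: string): void {
    const key = normalize(heard);
    if (key === "" || this.aliases.has(key)) return;
    const count = (this.heardCounts.get(key) ?? 0) + 1;
    this.heardCounts.set(key, count);
    if (count >= this.learnAfter) this.aliases.add(key);
  }
}
```

In this sketch "flora" needs the alias list because it sits two edits away from "vora", while "nora" and "bora" would pass the fuzzy check on their own. A real extension would presumably persist learned aliases per user rather than hold them in memory, and how a mishearing gets confirmed as an attempt at the wake word is left open here.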
04
Results
Built end-to-end in a single-day hackathon with a 3-person team
Working voice loop on any site: speech → DOM read → Claude → action, with spoken feedback
Did not place at the hackathon — but shipped a complete, accessibility-first voice agent
Open source on GitHub