Overview

Vora

Vora is a voice-controlled Chrome extension, built in 13 hours at Kiro Hacks, that lets people who can't use a mouse or keyboard navigate the web hands-free. It extracts and filters the interactive elements from each page's DOM, then uses Claude to interpret a spoken request and act on it. Its voice-activation layer adapts to unreliable speech-to-text, so the wake word still triggers when the recognizer mishears it.

Role: Co-creator (3-person hackathon team) — built the initial MVP, then owned the voice activation layer (wake-word detection, recognition buffering, mispronunciation learning buffer) and the conversational talk-back UI with its animated speaking states and live transcript

Timeline: May 2026 — Hackathon project

Chrome Extension · React · TypeScript · Web Speech API · Claude API

GitHub · Devpost

The Problem

People with motor disabilities are locked out of huge parts of the web because most sites assume a mouse and keyboard. Existing voice tools are brittle: they stumble on words the recognizer garbles (the Web Speech API constantly mishears "Vora" itself as "nora", "bora", or "flora", or splits it across two words), they don't adapt to how an individual actually speaks, and they don't give clear feedback about whether they heard you. We had one day at a hackathon to build something that worked end-to-end.

The Approach

We built a Manifest V3 extension with three contexts: a React side panel that owns voice I/O and the state machine, a stateless service worker that runs the AI pipeline (DOM read → Claude → typed action), and a content script that executes actions on the page. I built the initial MVP, then went deep on the two parts that make or break a voice agent: the activation layer — wake-word detection, the speech-recognition buffering pipeline, and a learning buffer that adapts to a user's mispronunciations — and the conversational UI that talks back to you with animated speaking states and a live transcript.
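The "typed action" handed from the service worker to the content script can be pictured as a discriminated union that is validated before anything touches the page. The sketch below is illustrative only: the `BrowserAction` shape, its field names, and the `parseAction` helper are assumptions made for this example, not Vora's actual schema.

```typescript
// Hypothetical action schema; the real one in Vora may differ.
type BrowserAction =
  | { kind: "click"; elementId: string }
  | { kind: "type"; elementId: string; text: string }
  | { kind: "scroll"; direction: "up" | "down" };

// Validate untrusted model output before it reaches the content script.
// Returns null for anything that is not a well-formed action.
function parseAction(raw: string): BrowserAction | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // the reply was not JSON at all
  }
  if (typeof data !== "object" || data === null) return null;

  const fields = data as Record<string, unknown>;
  const elementId = fields.elementId;
  const text = fields.text;
  const direction = fields.direction;

  switch (fields.kind) {
    case "click":
      return typeof elementId === "string" ? { kind: "click", elementId } : null;
    case "type":
      return typeof elementId === "string" && typeof text === "string"
        ? { kind: "type", elementId, text }
        : null;
    case "scroll":
      return direction === "up" || direction === "down"
        ? { kind: "scroll", direction }
        : null;
    default:
      return null; // unknown action kinds are rejected, never guessed at
  }
}
```

Rejecting malformed replies at this boundary means the content script only ever has to handle the known action kinds, which keeps the page-side code small and predictable.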

Technical Highlights

Wake-word gate tolerant of Web Speech mishearings: a curated phonetic alias list ("vore-uh", "nora", "bora", "flora"…) plus an edit-distance-≤1 fuzzy match on the first spoken token, so "Vora" — a word the recognizer routinely garbles — still triggers reliably
Mispronunciation learning buffer: the extension accumulates per-user aliases over time, so words and commands it repeatedly mishears for that specific user get folded into the recognition vocabulary and matched on future utterances
Speech-recognition buffering pipeline: merges interim + final transcripts into a single utterance, flushes after a short silence so natural pauses are respected, with a hard cap as a safety net for run-on speech, and carries forward the best confidence score for the low-confidence re-prompt path
Conversational talk-back UI: TTS readback of every result plus an animated "speaking halo", a pulsing status indicator, a live mic-level visualizer, and a transcript panel that streams words as you speak — distinct visual states for listening, thinking, executing, and confirming
Typed actions with safety gates: Claude (claude-sonnet-4-6) turns the transcript + a structured snapshot of the page's interactive elements into one typed browser action; destructive actions (submit, delete, pay…) get a spoken read-back and wait for a yes/no before executing, and sensitive fields like passwords are never read or filled
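The wake-word gate described above (alias list plus an edit-distance-≤1 check on the first token) might look roughly like the following. This is a minimal sketch, not the project's code: the alias set contains only the four aliases named in this write-up, and `withinOneEdit`, `matchWakeWord`, and the `learned` parameter (standing in for the per-user learning buffer) are hypothetical names. It does not handle the case where the recognizer splits the wake word across two tokens.

```typescript
const WAKE_WORD = "vora";

// Only the aliases named in the write-up; the real curated list is longer.
const ALIASES = new Set(["vore-uh", "nora", "bora", "flora"]);

// True when a and b differ by at most one insertion, deletion, or substitution.
function withinOneEdit(a: string, b: string): boolean {
  if (Math.abs(a.length - b.length) > 1) return false;
  let i = 0;
  let j = 0;
  let edits = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      i++;
      j++;
      continue;
    }
    edits++;
    if (edits > 1) return false;
    if (a.length > b.length) i++; // skip the extra character in a
    else if (a.length < b.length) j++; // skip the extra character in b
    else {
      i++; // same length: treat as a substitution
      j++;
    }
  }
  // Any unconsumed trailing character counts as one more edit.
  return edits + (a.length - i) + (b.length - j) <= 1;
}

// Checks only the first spoken token; `learned` holds per-user aliases
// accumulated over time (how they are collected is out of scope here).
function matchWakeWord(
  transcript: string,
  learned: Set<string> = new Set(),
): { woke: boolean; command: string } {
  const tokens = transcript.trim().toLowerCase().split(/\s+/).filter(Boolean);
  if (tokens.length === 0) return { woke: false, command: "" };

  // Strip punctuation the recognizer may attach, such as a trailing comma.
  const first = tokens[0].replace(/[^a-z-]/g, "");
  const woke =
    ALIASES.has(first) || learned.has(first) || withinOneEdit(first, WAKE_WORD);
  return { woke, command: woke ? tokens.slice(1).join(" ") : "" };
}
```

Note that "flora" is two edits away from "vora", which is why an alias list is needed alongside the fuzzy match: the edit-distance check alone would miss it.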
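The buffering behaviour described above (merge interim and final transcripts, flush after a short silence, hard cap for run-on speech, keep the best confidence) can be sketched as a small class driven by explicit timestamps. Everything here is an assumption for illustration: the `Chunk` shape is a simplified stand-in for a Web Speech API result, and the default `silenceMs` and `maxMs` values are placeholders, not Vora's actual tuning.

```typescript
// Simplified stand-in for one recognition result.
interface Chunk {
  text: string;
  isFinal: boolean;
  confidence: number;
}

interface Utterance {
  text: string;
  confidence: number;
}

class UtteranceBuffer {
  private finals: string[] = [];
  private interim = "";
  private best = 0;
  private startedAt: number | null = null;
  private lastHeardAt = 0;
  private silenceMs: number;
  private maxMs: number;

  // Default thresholds are placeholder values.
  constructor(silenceMs = 1200, maxMs = 15000) {
    this.silenceMs = silenceMs;
    this.maxMs = maxMs;
  }

  // Record a recognition result that arrived at time `now` (milliseconds).
  push(chunk: Chunk, now: number): void {
    if (this.startedAt === null) this.startedAt = now;
    this.lastHeardAt = now;
    if (chunk.isFinal) {
      // A final result supersedes whatever interim text preceded it.
      this.finals.push(chunk.text.trim());
      this.interim = "";
      this.best = Math.max(this.best, chunk.confidence);
    } else {
      this.interim = chunk.text.trim();
    }
  }

  // Returns the merged utterance once the speaker has paused long enough
  // or the hard cap is reached; otherwise returns null.
  poll(now: number): Utterance | null {
    if (this.startedAt === null) return null;
    const silent = now - this.lastHeardAt >= this.silenceMs;
    const tooLong = now - this.startedAt >= this.maxMs;
    if (!silent && !tooLong) return null;

    const text = [...this.finals, this.interim].filter(Boolean).join(" ");
    const utterance = { text, confidence: this.best };
    this.finals = [];
    this.interim = "";
    this.best = 0;
    this.startedAt = null;
    return utterance;
  }
}
```

In an extension, `push` would be called from the recognizer's result handler and `poll` from a short interval timer; passing `now` in explicitly keeps the logic deterministic and easy to test without real timers.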

Results

Built end-to-end in a single-day hackathon with a 3-person team

Working voice loop on any site: speech → DOM read → Claude → action, with spoken feedback

Did not place at the hackathon — but shipped a complete, accessibility-first voice agent

Open source on GitHub