Back to blog
AI & Tools· 9 min read

Real-Time Interview Assistance: The 200ms Pipeline Behind Amigo

How the Deepgram → OpenAI streaming pipeline works, why latency under 200ms matters in a live interview, and what it actually feels like to use.

D

David Park

13 June 2026 · Updated 25 June 2026

Real-Time Interview Assistance: The 200ms Pipeline Behind Amigo

AI & Tools

The gap between an interviewer finishing their question and you beginning your answer is roughly 1–3 seconds in a natural conversation. That window is where real-time AI assistance either succeeds or fails. At 200ms from speech-end to first streamed word, Amigo fits comfortably inside that natural pause — you read the start of an answer before you would normally have begun speaking.

Why Does Latency Matter So Much?

In a live interview, a two-second pause is normal. A five-second pause is awkward. A ten-second pause signals you are stuck. The entire value of real-time assistance collapses if the AI takes longer to respond than the silence budget allows.

This is why Amigo's architecture is built around a single constraint: the first AI word must arrive before the natural conversational pause ends. Everything in the technical pipeline is optimized around that target.

What's Inside the Real-Time Pipeline?

1

Audio capture

Amigo captures system audio using platform-native APIs (WASAPI on Windows, CoreAudio on macOS). This runs in Rust for performance and security. No audio is stored to disk.

2

Voice Activity Detection (VAD)

A lightweight VAD engine runs locally and detects the moment the interviewer stops speaking. This fires the transcription request immediately — crucially, Amigo doesn't wait for a fixed timeout.

3

Deepgram transcription

The audio buffer is sent to Deepgram's streaming STT API. Deepgram returns interim results as the person speaks (so you see partial transcripts live) and a final result when they stop. Amigo uses the last interim result to fire the AI request immediately, without waiting for the final transcript.

4

OpenAI streaming response

The transcript is combined with your Copilot Profile context (resume, job description, brief notes) and sent to an OpenAI model. The model streams tokens back. The first token typically arrives within 200ms of the VAD trigger.

5

Word-by-word display

Tokens are rendered into the Amigo overlay in real-time, word by word. You start reading the answer as it generates — no waiting for a complete response.

What Does It Feel Like to Use?

In practice, the experience is closer to having a very fast research assistant than to reading from a script:

  • The interviewer finishes their question
  • You take a natural 1-second breath (completely normal in conversation)
  • The first sentence of an answer is already visible on your Amigo overlay
  • You glance at the opening line, take its structure, and speak in your own words
  • The rest of the answer continues streaming as you talk, available if you need it

You do not read the AI output verbatim. You use it as a scaffold — it gives you the structure and key points; you add your personality, your specific examples, and your natural voice on top.

The full pipeline in action — from question to streamed answer in under 200ms.

Experience 200ms AI response yourself

Run a practice session with Amigo before your next interview and feel how real-time assistance works.

Try Amigo free →

Frequently Asked Questions

How fast is real-time interview assistance?

Amigo begins streaming an answer within approximately 200ms of the interviewer stopping speaking — fast enough to appear during the natural conversational pause.

What transcription service does Amigo use?

Deepgram's streaming STT API, running through Amigo's Rust backend. API keys are never exposed to the browser.

Does it work offline?

No — an internet connection is required to access transcription and AI services. Audio is processed locally first.

Can I use my own resume and job description?

Yes. Amigo lets you upload your resume and job description into a Copilot Profile. Every AI response is generated with that context, so answers are personalized to your background and the specific role.

Found this useful?

Share it with someone preparing for an interview.

Try free — no card