Question 1

Is my audio uploaded anywhere?

Accepted Answer

No, nothing is uploaded. The Whisper model runs entirely in your browser using WebAssembly (and WebGPU when your browser supports it), so your audio is transcribed on your device and never sent to a server. The only download is the model itself, which is fetched once and then cached.
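As a rough illustration of the WebGPU-or-WebAssembly choice described above, a page could check whether the browser exposes `navigator.gpu` and fall back to WebAssembly otherwise. This is only a sketch: `pickBackend`, `Backend`, and `NavigatorLike` are hypothetical names, not the tool's actual code.

```typescript
// Hypothetical helper; the names are illustrative, not the tool's real code.
type Backend = "webgpu" | "wasm";

// Minimal shape of the part of `navigator` this sketch inspects.
interface NavigatorLike {
  gpu?: unknown;
}

// Prefer WebGPU when the browser exposes `navigator.gpu`;
// otherwise fall back to WebAssembly.
function pickBackend(nav: NavigatorLike | undefined): Backend {
  const hasGpu = nav !== undefined && nav.gpu !== undefined && nav.gpu !== null;
  return hasGpu ? "webgpu" : "wasm";
}
```

In a real page, the presence of `navigator.gpu` is only a first check, because requesting an adapter can still return nothing on unsupported hardware. A fallback to WebAssembly would cover that case too.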

Question 2

Which model quality should I pick?

Accepted Answer

Tiny is the fastest and works well for quick English notes. Base is the balanced multilingual default. Small is more accurate than Base, at the cost of a bigger download. Large v3 Turbo offers the best quality available and handles tricky audio and accents, but it is a large download and needs a modern browser with WebGPU (a recent Chrome or Edge on a capable GPU) to run at a usable speed.
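The guidance above can be written down as a small decision function. This is a sketch of the reasoning only: the `Needs` fields, the `Model` labels, and `suggestModel` are assumptions made for illustration, and the tool itself simply lets you pick a quality level.

```typescript
// Hypothetical labels for the four quality levels named in the answer.
type Model = "tiny" | "base" | "small" | "large-v3-turbo";

// Hypothetical description of what the user cares about.
interface Needs {
  englishOnly: boolean; // Tiny is the English-only model
  hasWebGPU: boolean;   // Large v3 Turbo needs WebGPU to run at a usable speed
  priority: "speed" | "balanced" | "accuracy" | "best";
}

function suggestModel(n: Needs): Model {
  // Best quality is only practical with WebGPU.
  if (n.priority === "best" && n.hasWebGPU) return "large-v3-turbo";
  // Without WebGPU, Small is the most accurate remaining option.
  if (n.priority === "best" || n.priority === "accuracy") return "small";
  // Tiny is fastest, but only suits English audio.
  if (n.priority === "speed" && n.englishOnly) return "tiny";
  // Base is the balanced multilingual default.
  return "base";
}
```

For example, `suggestModel({ englishOnly: false, hasWebGPU: false, priority: "speed" })` falls through to Base, because Tiny only handles English.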

Question 3

What audio and video formats are supported?

Accepted Answer

MP3, WAV, M4A, FLAC, OGG, and other common audio files all work, since the browser decodes them before transcribing. You can also drop in an MP4 or WebM video: the tool transcribes the audio track and ignores the picture, so it doubles as a video-to-text and subtitle generator.
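To make "the browser decodes them before transcribing" a little more concrete: decoding (for example with the Web Audio API's `decodeAudioData`) yields raw samples per channel, whatever the original container was. Whisper models take single-channel audio, so a plausible next step is to average the channels into one. The sketch below assumes that step. `downmixToMono` is a hypothetical helper, not the tool's actual code, and it leaves out resampling.

```typescript
// Hypothetical helper: averages decoded channels into one mono track.
// Assumes every channel has the same length, as decoded audio normally does.
function downmixToMono(channels: Float32Array[]): Float32Array {
  const first = channels[0];
  if (first === undefined) return new Float32Array(0);

  const mono = new Float32Array(first.length);
  for (const channel of channels) {
    for (let i = 0; i < first.length; i++) {
      // Add this channel's share of the average for sample i.
      mono[i] = (mono[i] ?? 0) + (channel[i] ?? 0) / channels.length;
    }
  }
  return mono;
}
```

The same function applies to the audio track of an MP4 or WebM file, since after decoding it is just another set of sample arrays.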

Question 4

What languages can it transcribe?

Accepted Answer

The multilingual models (Base, Small, and Large v3 Turbo) cover 90+ languages. The language is auto-detected by default, or you can pick a language hint to steer it for cleaner results. The English-only model (Tiny) is tuned for English speech alone.

Question 5

Why does the first run download the model?

Accepted Answer

Because everything runs locally, the Whisper weights have to be fetched to your browser the first time you use the tool. They are cached afterward, so later transcriptions start without another download. Higher quality levels are bigger downloads, which is why the first run can take a moment.
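The download-once-then-cache behaviour follows a cache-first pattern, sketched below. A `Map` stands in for whatever browser storage the tool really uses (Cache Storage or IndexedDB would be typical choices, but that is an assumption). `loadWeights`, `Fetcher`, and the URL in the usage note are hypothetical.

```typescript
// Hypothetical signature for whatever actually downloads the weights.
type Fetcher = (url: string) => Promise<Uint8Array>;

// Cache-first load: return stored weights if present, otherwise
// download them once and remember them for next time.
async function loadWeights(
  cache: Map<string, Uint8Array>,
  url: string,
  fetchWeights: Fetcher,
): Promise<Uint8Array> {
  const cached = cache.get(url);
  if (cached !== undefined) return cached; // later runs: no network

  const weights = await fetchWeights(url); // first run: the slow part
  cache.set(url, weights);
  return weights;
}
```

Calling `loadWeights(cache, "model.bin", fetchWeights)` twice with the same cache triggers the download only on the first call. Each quality level would have its own entry, so switching to a level you have not used before downloads again.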

Audio Transcriber (Speech to Text)
