Boneyard Tools

Audio Transcriber (Speech to Text)

Turn speech into text right here, with no upload and no account. This transcriber runs OpenAI's Whisper model entirely in your browser, so your audio never leaves your device. Drop in an MP3, WAV, M4A, or even an MP4 video and get a full transcript with timestamps. Export it as SRT or VTT subtitles or as plain TXT, and choose a quality level all the way up to Large v3 Turbo. The model downloads once on first use and is then cached, so later runs skip the download.

How to transcribe audio to text

  1. Pick a model quality and, optionally, a language hint (leave it on Auto to detect).
  2. Drop in an audio or video file, or click to browse for one.
  3. Wait while the model downloads once and transcribes, then copy or download the transcript as TXT, SRT, or VTT.

Examples

Podcast clip to subtitles

interview.mp3 (a two-minute spoken clip)
A full transcript plus an SRT file with cues like 00:00:03,000 --> 00:00:07,500 ready to drop into a video editor.
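For readers curious how cues like the one above come about, here is a minimal sketch of turning timed segments into SRT text. The `Segment` shape and the function names are assumptions made for illustration, not this tool's actual internals. The only difference handled for VTT is the millisecond separator (a period instead of a comma).

```typescript
// Hypothetical segment shape: start and end are in seconds.
interface Segment {
  start: number;
  end: number;
  text: string;
}

// Left-pad a non-negative integer with zeros to the given width.
function pad(n: number, width: number): string {
  let out = String(n);
  while (out.length < width) out = "0" + out;
  return out;
}

// Format seconds as HH:MM:SS,mmm (SRT) or HH:MM:SS.mmm (VTT).
function formatTimestamp(seconds: number, msSeparator: "," | "." = ","): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSeconds = Math.floor(totalMs / 1000);
  const s = totalSeconds % 60;
  const m = Math.floor(totalSeconds / 60) % 60;
  const h = Math.floor(totalSeconds / 3600);
  return pad(h, 2) + ":" + pad(m, 2) + ":" + pad(s, 2) + msSeparator + pad(ms, 3);
}

// Build an SRT document: a 1-based cue number, a timing line, then the text,
// with a blank line between cues.
function toSrt(segments: Segment[]): string {
  return segments
    .map((seg, i) =>
      String(i + 1) + "\n" +
      formatTimestamp(seg.start) + " --> " + formatTimestamp(seg.end) + "\n" +
      seg.text.trim() + "\n")
    .join("\n");
}
```

Calling `toSrt` on a single segment that runs from 3 to 7.5 seconds produces a cue whose timing line is `00:00:03,000 --> 00:00:07,500`.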

Frequently asked questions

Is my audio uploaded anywhere?

No. Nothing is uploaded. The Whisper model runs entirely in your browser using WebAssembly (and WebGPU when your browser supports it), so your audio is transcribed on your device and never sent to a server. Only the model itself is downloaded, once, then cached.
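As a rough illustration of the WebAssembly and WebGPU choice described above, the sketch below picks a backend from a navigator-like object. The `NavigatorLike` type and the `pickBackend` name are assumptions for illustration, and the tool's real detection logic may differ. In a browser you would pass the global `navigator`. A stricter check would also await `navigator.gpu.requestAdapter()` and fall back to WebAssembly if it resolves to null.

```typescript
// Minimal structural type so the sketch does not depend on DOM typings.
type NavigatorLike = { gpu?: unknown };

type Backend = "webgpu" | "wasm";

// Prefer WebGPU when the browser exposes navigator.gpu; otherwise use WebAssembly.
function pickBackend(nav: NavigatorLike | undefined): Backend {
  if (nav !== undefined && nav.gpu !== undefined && nav.gpu !== null) {
    return "webgpu";
  }
  return "wasm";
}
```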

Which model quality should I pick?

Tiny is the fastest and works well for quick English notes. Base is the balanced multilingual default. Small is more accurate than Base, at the cost of a larger download. Large v3 Turbo is the best quality available and handles tricky audio and accents, but it is a large download and needs a modern browser with WebGPU (a recent Chrome or Edge on a capable GPU) to run at a usable speed.

What audio and video formats are supported?

MP3, WAV, M4A, FLAC, OGG, and other common audio files all work, since the browser decodes them before transcribing. You can also drop in an MP4 or WebM video: the tool transcribes the audio track and ignores the picture, so it doubles as a video-to-text and subtitle generator.
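Whisper models expect 16 kHz mono audio, so decoded audio usually has to be downmixed before transcription. The sketch below shows one simple way to average channels into a single track. It assumes the browser has already decoded the file into one `Float32Array` per channel, for example via the Web Audio API's `decodeAudioData`. The `mixToMono` name is hypothetical, and how this tool actually prepares audio is not stated here.

```typescript
// Average all channels sample by sample to produce a mono track.
// Assumes every channel has the same length, as decoded audio buffers do.
function mixToMono(channels: Float32Array[]): Float32Array {
  if (channels.length === 0) {
    return new Float32Array(0);
  }
  const length = channels[0].length;
  const mono = new Float32Array(length);
  for (let c = 0; c < channels.length; c++) {
    const channel = channels[c];
    for (let i = 0; i < length; i++) {
      mono[i] += channel[i] / channels.length;
    }
  }
  return mono;
}
```

Resampling to 16 kHz is a separate step. In a browser, one option is to create the audio context with a 16000 Hz sample rate so the decoded buffer already arrives at that rate.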

What languages can it transcribe?

The multilingual models (Base, Small, and Large v3 Turbo) cover 90+ languages. The language is auto-detected by default, or you can pick a language hint to steer it for cleaner results. The English-only model (Tiny) is tuned for English speech alone.

Why does the first run download the model?

Because everything runs locally, the Whisper weights have to be fetched to your browser the first time you use the tool. They are cached afterward, so later transcriptions start without another download. Higher quality levels mean bigger downloads, which is why the first run can take a while.
