Audio Transcriber (Speech to Text)
Turn speech into text right here, with no upload and no account. This transcriber runs OpenAI's Whisper model entirely in your browser, so your audio never leaves your device. Drop in an MP3, WAV, M4A, or even an MP4 video and get a full transcript with timestamps. Export it as SRT or VTT subtitles or plain TXT, and choose a quality level all the way up to Large v3 Turbo. The model downloads once on first use and is then cached, so later runs skip the download.
How to transcribe audio to text
- Pick a model quality and, optionally, a language hint (leave it on Auto to detect).
- Drop in an audio or video file, or click to browse for one.
- Wait while the model downloads (first use only) and transcribes the file, then copy the transcript or download it as TXT, SRT, or VTT.
Examples
Podcast clip to subtitles
Input: interview.mp3 (a two-minute spoken clip)
Output: a full transcript plus an SRT file with cues like 00:00:03,000 --> 00:00:07,500, ready to drop into a video editor.
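For readers curious how timestamped segments become subtitle files, here is a minimal sketch of the SRT and VTT formatting step. It is not this tool's actual code: the Cue shape and the function names are hypothetical, chosen only for illustration. The one substantive difference it shows is that SRT separates milliseconds with a comma while WebVTT uses a period and starts with a WEBVTT header.

```typescript
// Hypothetical cue shape: start and end are in seconds.
interface Cue {
  start: number;
  end: number;
  text: string;
}

// Left-pad a non-negative integer with zeros to the given width.
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = "0" + s;
  return s;
}

// Format seconds as HH:MM:SS followed by a separator and milliseconds.
function formatTimestamp(seconds: number, msSeparator: "," | "."): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSeconds = Math.floor(totalMs / 1000);
  const s = totalSeconds % 60;
  const m = Math.floor(totalSeconds / 60) % 60;
  const h = Math.floor(totalSeconds / 3600);
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}${msSeparator}${pad(ms, 3)}`;
}

// SRT: numbered cues, comma before milliseconds, blank line between cues.
function toSrt(cues: Cue[]): string {
  return cues
    .map((cue, i) => {
      const timing = `${formatTimestamp(cue.start, ",")} --> ${formatTimestamp(cue.end, ",")}`;
      return `${i + 1}\n${timing}\n${cue.text}\n`;
    })
    .join("\n");
}

// WebVTT: "WEBVTT" header, period before milliseconds, cue numbers optional.
function toVtt(cues: Cue[]): string {
  const body = cues
    .map((cue) => {
      const timing = `${formatTimestamp(cue.start, ".")} --> ${formatTimestamp(cue.end, ".")}`;
      return `${timing}\n${cue.text}\n`;
    })
    .join("\n");
  return `WEBVTT\n\n${body}`;
}
```

Calling toSrt with a single cue running from 3 to 7.5 seconds yields the timing line 00:00:03,000 --> 00:00:07,500, matching the cue in the example.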
Frequently asked questions
Is my audio uploaded anywhere?
No. Nothing is uploaded. The Whisper model runs entirely in your browser using WebAssembly (and WebGPU when your browser supports it), so your audio is transcribed on your device and never sent to a server. Only the model itself is downloaded, once, then cached.
Which model quality should I pick?
Tiny is the fastest and good for quick English notes. Base is the balanced multilingual default. Small is more accurate. Large v3 Turbo is the best quality available and handles tricky audio and accents, but it is a large download and needs a modern browser with WebGPU (a recent Chrome or Edge on a capable GPU) to run at a usable speed.
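The guidance above can be written down as a small decision rule. The sketch below is illustrative only and is not how this tool chooses anything for you: the function names, the option names, and the policy itself are assumptions. It checks for navigator.gpu, the entry point browsers expose for WebGPU, and only suggests Large v3 Turbo when that is present.

```typescript
type ModelQuality = "tiny" | "base" | "small" | "large-v3-turbo";

// Feature-detect WebGPU. Browsers that support it expose `navigator.gpu`.
// Presence of the property does not guarantee a capable GPU.
function hasWebGpu(): boolean {
  const nav: any = (globalThis as any).navigator;
  return !!nav && "gpu" in nav;
}

// Hypothetical policy that mirrors the FAQ answer:
// - best quality needs WebGPU, otherwise fall back to Small
// - quick English notes can use Tiny
// - everything else uses the multilingual Base default
function suggestModel(options: {
  webGpu: boolean;
  englishOnly: boolean;
  preferAccuracy: boolean;
}): ModelQuality {
  if (options.preferAccuracy) {
    return options.webGpu ? "large-v3-turbo" : "small";
  }
  return options.englishOnly ? "tiny" : "base";
}
```

For example, suggestModel({ webGpu: hasWebGpu(), englishOnly: false, preferAccuracy: true }) returns "small" in a browser without WebGPU and "large-v3-turbo" in one that has it.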
What audio and video formats are supported?
MP3, WAV, M4A, FLAC, OGG, and other common audio files all work, since the browser decodes them before transcribing. You can also drop in an MP4 or WebM video: the tool transcribes the audio track and ignores the picture, so it doubles as a video-to-text and subtitle generator.
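Once the browser has decoded a file, the audio is available as one array of samples per channel, while Whisper models take a single mono channel sampled at 16 kHz. The sketch below shows only the downmix step, averaging the channels into one. The function name is hypothetical, resampling is left out, and this tool's own pipeline may differ.

```typescript
// Average all channels into one mono channel.
// Each input array holds the samples of one channel, as returned by
// AudioBuffer.getChannelData() in the browser. All channels are assumed
// to have the same length.
function mixToMono(channels: Float32Array[]): Float32Array {
  if (channels.length === 0) return new Float32Array(0);
  const length = channels[0].length;
  const mono = new Float32Array(length);
  // Sum the samples of every channel...
  for (let c = 0; c < channels.length; c++) {
    const data = channels[c];
    for (let i = 0; i < length; i++) mono[i] += data[i];
  }
  // ...then divide by the channel count to keep the level in range.
  for (let i = 0; i < length; i++) mono[i] /= channels.length;
  return mono;
}
```

A stereo file passes in two arrays and gets back one of the same length. A video file is handled the same way, because only its decoded audio track reaches this step.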
What languages can it transcribe?
The multilingual models (Base, Small, and Large v3 Turbo) cover 90+ languages. The language is auto-detected by default, or you can pick a language hint to steer it for cleaner results. The English-only model (Tiny) is tuned for English speech alone.
Why does the first run download the model?
Because everything runs locally, the Whisper weights have to be fetched to your browser the first time you use the tool. They are cached afterward, so later transcriptions start without another download. Higher quality levels mean bigger downloads, which is why the first run can take a while on the larger models.
Related tools
AI Text Summarizer
Summarize any text with AI, right in your browser. Paste an article, pick a length, and get a short abstractive summary. Nothing is uploaded.
WAV File Info
Read a WAV file's audio format, channels, sample rate, bit depth, byte rate and duration in your browser. Nothing is uploaded, so your audio stays private.
MP3 Duration and Bitrate
Check an MP3's bitrate, sample rate, MPEG layer, channel mode, VBR or CBR, and estimated duration in your browser. Nothing is uploaded. Private and instant.
Audio Fade
Add a fade-in or fade-out to audio in your browser. Pick the fade length and a linear or equal-power curve, then download a WAV. Nothing is uploaded.
Audio Reverser
Reverse audio in your browser and play any clip backwards. Drop in a file, hear the reversed preview, then download it as a WAV. Nothing is uploaded.
Audio Silence Remover
Trim silence from the start and end of an audio file in your browser. Set the threshold and padding, then download a WAV. Free, private, nothing uploaded.