Question 1

Is my text sent to a server?

Accepted Answer

No. The text-to-speech model runs entirely in your browser via transformers.js. Your text and the generated audio never leave your device, which makes this safe for private or sensitive content.

Question 2

Which languages are supported?

Accepted Answer

Ten to start: English, Spanish, French, German, Portuguese, Italian, Russian, Arabic, Hindi and Korean. Each language uses Meta's MMS-TTS voice for that language, downloaded on demand (about 63 MB) the first time you use it, then cached.

Question 3

Is there a length limit?

Accepted Answer

No hard limit. Long text is automatically split into sentence-sized chunks, synthesized one by one with a progress bar, and concatenated into a single continuous WAV file. Very long passages simply take longer to generate, because each chunk is synthesized in turn.
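
For readers curious about the mechanics, here is a minimal sketch of the split-and-concatenate idea. It is not the tool's actual code: the function names `splitIntoSentences` and `concatAudio`, the punctuation-based regular expression, and the `maxChars` limit of 200 are all assumptions made for illustration.

```typescript
// Hypothetical helper: split text into sentence-sized chunks.
// Sentences are detected by ., ! or ? (an assumption; real text may need
// smarter rules for abbreviations, decimals, and other scripts).
function splitIntoSentences(text: string, maxChars: number = 200): string[] {
  const limit = Math.max(1, maxChars); // guard against a zero or negative limit
  // Either a run that ends in sentence punctuation, or a final unterminated run.
  const sentences = text.match(/[^.!?]+[.!?]+\s*|[^.!?]+$/g) ?? [];
  const chunks: string[] = [];
  for (const raw of sentences) {
    let sentence = raw.trim();
    // Break overly long sentences at the last space before the limit.
    while (sentence.length > limit) {
      let cut = sentence.lastIndexOf(" ", limit);
      if (cut <= 0) cut = limit; // no space found: hard cut
      chunks.push(sentence.slice(0, cut).trim());
      sentence = sentence.slice(cut).trim();
    }
    if (sentence.length > 0) chunks.push(sentence);
  }
  return chunks;
}

// Hypothetical helper: join per-chunk audio into one continuous buffer.
function concatAudio(parts: Float32Array[]): Float32Array {
  const total = parts.reduce((sum, part) => sum + part.length, 0);
  const out = new Float32Array(total);
  let offset = 0;
  for (const part of parts) {
    out.set(part, offset); // copy this chunk right after the previous one
    offset += part.length;
  }
  return out;
}
```

In a setup like this, each chunk would be passed to the speech model in turn, the progress bar would advance after every chunk, and `concatAudio` would join the results before the file is written.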

Question 4

Can I download the audio?

Accepted Answer

Yes. After generating, click Download WAV to save a standard 16-bit PCM WAV file you can use in videos, podcasts, slideshows or anywhere else. There are no watermarks and no usage caps.
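
To show what "standard 16-bit PCM WAV" means in practice, the sketch below writes the usual 44-byte RIFF/WAVE header followed by the samples. It assumes mono audio supplied as floating-point samples in the range -1 to 1, with the sample rate taken from whatever the speech model reports; `encodeWav16` is a hypothetical name, not the tool's own function.

```typescript
// Hypothetical helper: encode mono float samples as a 16-bit PCM WAV file.
function encodeWav16(samples: Float32Array, sampleRate: number): ArrayBuffer {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true);                // file size minus 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true);                          // size of the fmt chunk
  view.setUint16(20, 1, true);                           // format 1 = uncompressed PCM
  view.setUint16(22, 1, true);                           // channels: mono (assumed)
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // bytes per second
  view.setUint16(32, bytesPerSample, true);              // block align
  view.setUint16(34, 16, true);                          // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  for (let i = 0; i < samples.length; i++) {
    // Clamp to [-1, 1], then scale to the signed 16-bit range.
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buffer;
}
```

In a browser, the returned buffer could be wrapped in a `Blob` with type `audio/wav` and offered through a download link.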

Question 5

What is read-along highlighting?

Accepted Answer

As the audio plays, the tool highlights the exact sentence being spoken, so you can follow the text and the voice together. It works because we track where each sentence starts in the audio and match it to the player's current position.
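
The matching step can be sketched roughly as follows. This is an illustration under stated assumptions, not the tool's implementation: it assumes each sentence was synthesized as its own chunk, that the chunk lengths (in samples) and the sample rate are known, and the names `sentenceStartTimes` and `activeSentenceIndex` are hypothetical.

```typescript
// Hypothetical helper: start time (in seconds) of each sentence, derived from
// the number of audio samples generated for each sentence chunk.
function sentenceStartTimes(chunkLengths: number[], sampleRate: number): number[] {
  const starts: number[] = [];
  let offset = 0;
  for (const length of chunkLengths) {
    starts.push(offset / sampleRate);
    offset += length;
  }
  return starts;
}

// Hypothetical helper: index of the sentence being spoken at `currentTime`,
// or -1 if playback is before the first sentence (or there are no sentences).
// Binary search for the last start time that is <= currentTime.
function activeSentenceIndex(starts: number[], currentTime: number): number {
  let lo = 0;
  let hi = starts.length - 1;
  let found = -1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (starts[mid] <= currentTime) {
      found = mid;   // candidate; keep looking to the right
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return found;
}
```

In a browser, a function like `activeSentenceIndex` would typically be called from the audio element's `timeupdate` event with its `currentTime`, and the returned index used to toggle a highlight on the matching sentence.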

Question 6

Why does the first run take a moment?

Accepted Answer

The first time you pick a language, the browser downloads that voice model (around 63 MB) and caches it. After that, generating speech in the same language is fast and works offline.

AI Text to Speech

Related tools

Audio Transcriber

Audio Speed Changer

Volume Normalizer

Audio Fade

Audio File Size Calculator

Audio Reverser