Boneyard Tools

AI Text to Speech

Type or paste any text, hear it spoken in a natural voice, and download the result as a WAV file. Speech is generated entirely in your browser with Meta's MMS model via transformers.js, so your text never leaves your device. Pick from 10 languages, adjust the playback speed, and follow along as each sentence is highlighted while it is spoken. Long passages are split into sentences automatically and stitched into one continuous audio file, so there is no hard length limit and no per-character billing.

How to convert text to speech

  1. Type or paste your text into the box.
  2. Choose a language and, if you like, adjust the playback speed.
  3. Click Generate speech, listen with read-along highlighting, then download the WAV.

Examples

Reading a paragraph aloud

Input: Welcome to Boneyard Tools. Everything here runs in your browser.
Output: A spoken WAV file you can play with the current sentence highlighted, or download.

Frequently asked questions

Is my text sent to a server?

No. The text-to-speech model runs entirely in your browser via transformers.js. Your text and the generated audio never leave your device, which makes this safe for private or sensitive content.

Which languages are supported?

Ten to start: English, Spanish, French, German, Portuguese, Italian, Russian, Arabic, Hindi and Korean. Each language uses Meta's MMS-TTS voice for that language, downloaded on demand (about 63 MB) the first time you use it, then cached.

Is there a length limit?

No hard limit. Long text is automatically split into sentence-sized chunks, synthesized one at a time while a progress bar tracks the work, and concatenated into a single continuous WAV file. Very long passages simply take longer to generate.
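
For the curious, the split-and-stitch idea can be sketched in a few lines of TypeScript. This is an illustrative sketch, not the tool's actual source: the function names splitSentences and concatChunks are hypothetical, and the splitter assumes sentences end in ".", "!" or "?", which is a simplification (it would also split on abbreviations and decimals, and it ignores punctuation such as the Hindi danda or the Arabic question mark).

```typescript
// Hypothetical helper: split text into sentence-sized chunks.
// Assumes a sentence ends at a run of '.', '!' or '?'.
function splitSentences(text: string): string[] {
  const matches = text.match(/[^.!?]+[.!?]*/g) ?? [];
  return matches.map((s) => s.trim()).filter((s) => s.length > 0);
}

// Hypothetical helper: join per-sentence sample buffers into one
// continuous buffer, recording the sample offset where each one starts.
function concatChunks(chunks: Float32Array[]): {
  samples: Float32Array;
  offsets: number[];
} {
  const total = chunks.reduce((n, c) => n + c.length, 0);
  const samples = new Float32Array(total);
  const offsets: number[] = [];
  let pos = 0;
  for (const chunk of chunks) {
    offsets.push(pos);       // where this sentence begins in the output
    samples.set(chunk, pos); // copy the chunk into place
    pos += chunk.length;
  }
  return { samples, offsets };
}
```

Each sentence would be synthesized separately and its samples passed to concatChunks in order. Keeping the offsets alongside the joined samples is one plausible way to know later where each sentence begins in the audio.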

Can I download the audio?

Yes. After generating, click Download WAV to save a standard 16-bit PCM WAV file you can use in videos, podcasts, slideshows or anywhere else. There are no watermarks and no usage caps.
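
To make "standard 16-bit PCM WAV" concrete, here is a minimal TypeScript sketch of how float samples can be packed into that format. It is an assumption-laden illustration, not the tool's actual code: encodeWav is a hypothetical name, the audio is assumed to be mono, and the sample rate is whatever the model reports.

```typescript
// Hypothetical helper: encode mono float samples in [-1, 1] as a
// 16-bit PCM WAV file (44-byte RIFF header followed by sample data).
function encodeWav(samples: Float32Array, sampleRate: number): Uint8Array {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // file size minus the first 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // size of the fmt chunk for PCM
  view.setUint16(20, 1, true); // audio format 1 = uncompressed PCM
  view.setUint16(22, 1, true); // channel count (mono assumed)
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // bytes per second
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  for (let i = 0; i < samples.length; i++) {
    // Clamp, then scale so -1 maps to -32768 and +1 maps to 32767.
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    const scaled = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
    view.setInt16(44 + i * 2, Math.round(scaled), true);
  }
  return new Uint8Array(buffer);
}
```

In a browser, the returned bytes could be wrapped in a Blob with type "audio/wav" and offered through a download link.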

What is read-along highlighting?

As the audio plays, the tool highlights the exact sentence being spoken, so you can follow the text and the voice together. It works because we track where each sentence starts in the audio and match it to the player's current position.
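
The matching step described above can be sketched as a small lookup in TypeScript. This is one possible approach under stated assumptions, not necessarily how the tool does it: activeSentenceIndex is a hypothetical name, and startTimes is assumed to hold each sentence's start time in seconds, sorted in ascending order.

```typescript
// Hypothetical helper: return the index of the sentence being spoken at
// currentTime, or -1 if playback has not reached the first sentence.
// startTimes must be sorted ascending, one entry per sentence, in seconds.
function activeSentenceIndex(startTimes: number[], currentTime: number): number {
  let lo = 0;
  let hi = startTimes.length - 1;
  let found = -1;
  // Binary search for the last start time that is <= currentTime.
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (startTimes[mid] <= currentTime) {
      found = mid;
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return found;
}
```

In a browser, a function like this could be called from the audio element's timeupdate event with the element's currentTime, and the sentence at the returned index given a highlight style. The start times themselves could be derived by dividing each sentence's starting sample offset by the sample rate.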

Why does the first run take a moment?

The first time you pick a language, the browser downloads that voice model (around 63 MB) and caches it. After that, generating speech in the same language is fast and works offline.
