Adding Text to Speech to JavaScript
Learn how browser speech works, its trade-offs, and how to integrate TTS2Go for high-quality AI text-to-speech in any JavaScript project.
You do not need a framework to add text-to-speech (TTS) to your website. Whether you are working with plain HTML and JavaScript, a static site generator, or a server-rendered page, adding voice to your content is straightforward.
This guide explains how browser speech works, the trade-offs of different approaches, and how to integrate TTS2Go into any JavaScript project.
How Browser Speech Synthesis Works
Modern browsers ship with the Web Speech API built in. With a few lines of vanilla JavaScript you can have the browser read text aloud:
```js
const utterance = new SpeechSynthesisUtterance("Hello world");
window.speechSynthesis.speak(utterance);
```
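No npm packages, no API keys, and no server are required. It is free, and it works offline when the selected voice is installed on the device (some browser voices are streamed from a remote service and need a connection).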
The downside is quality and consistency:
- Voices often sound robotic.
- Voice quality and selection vary between platforms.
- Chrome on Windows sounds different from Safari on macOS.
- You cannot guarantee a consistent listening experience for all users.
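Because the available voices differ per platform, it can help to choose a voice explicitly instead of relying on the browser default. The following is a minimal sketch, not a definitive implementation: `pickVoice` and `speak` are hypothetical helper names, while `speechSynthesis.getVoices()`, `SpeechSynthesisUtterance`, and the `lang` and `localService` voice properties come from the Web Speech API.

```js
// Pick a voice for a language tag such as "en-US".
// `voices` is the array returned by speechSynthesis.getVoices().
function pickVoice(voices, lang) {
  // Some platforms report tags as "en_US", so normalize before comparing.
  const normalize = (tag) => String(tag || "").replace("_", "-").toLowerCase();
  const wanted = normalize(lang);
  const base = wanted.split("-")[0];

  const exact = voices.filter((v) => normalize(v.lang) === wanted);
  const sameLanguage = voices.filter((v) => normalize(v.lang).split("-")[0] === base);
  const candidates = exact.length > 0 ? exact : sameLanguage;

  // Prefer voices installed on the device, since they also work offline.
  return candidates.find((v) => v.localService) || candidates[0] || null;
}

// Speak `text`, returning false when the Web Speech API is unavailable.
function speak(text, lang = "en-US") {
  if (typeof window === "undefined" || !("speechSynthesis" in window)) {
    return false;
  }
  const utterance = new SpeechSynthesisUtterance(text);
  utterance.lang = lang;
  const voice = pickVoice(window.speechSynthesis.getVoices(), lang);
  if (voice) utterance.voice = voice;
  window.speechSynthesis.speak(utterance);
  return true;
}
```

Note that `getVoices()` can return an empty array until the browser fires the `voiceschanged` event, so the first call on a fresh page may fall back to the default voice.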
The Rise of AI-Generated Speech
AI text-to-speech has transformed what is possible with voice on the web. Neural voice models produce audio that sounds genuinely human, with natural pauses, emphasis, and rhythm.
Pre-Generating Audio
One option is to pre-generate audio for all of your content and host the files on a CDN.
Pros
- Instant playback for users
- Identical quality and voice everywhere
Cons
- You pay to generate audio for every piece of content up front, whether it gets played or not
- Content must be known in advance
- Updates require regenerating files
- Storage costs grow with your content library
Pre-generation works well for small, stable catalogs, but becomes expensive and inflexible at scale.
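To make the "updates require regenerating files" cost concrete, here is a rough sketch of how a build step might decide which audio files to regenerate. Everything in it is an assumption for illustration: `fingerprint` and `planPregeneration` are hypothetical helpers, the manifest shape is invented, and the hash is a small FNV-1a-style fingerprint over UTF-16 code units rather than a cryptographic digest.

```js
// Small non-cryptographic fingerprint, used only to notice that text changed.
// A real build script might prefer SHA-256.
function fingerprint(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

// items: [{ id, text }]
// manifest: { [id]: fingerprint } saved by the previous build.
function planPregeneration(items, manifest) {
  const toGenerate = [];
  const nextManifest = {};
  for (const item of items) {
    const print = fingerprint(item.text);
    nextManifest[item.id] = print;
    // New or edited content needs fresh audio.
    if (manifest[item.id] !== print) toGenerate.push(item.id);
  }
  // Audio for content that no longer exists can be deleted to limit storage.
  const toDelete = Object.keys(manifest).filter((id) => !(id in nextManifest));
  return { toGenerate, toDelete, nextManifest };
}
```

Even with change detection like this, every new or edited item still triggers a paid generation whether or not anyone listens to it.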
Lazy Generation on Demand
A more efficient approach is to generate audio only when someone actually clicks play.
Benefits
- You pay only for audio that users request
- Dynamic and user-generated content is handled naturally
Challenge
Without any gating, anyone visiting your site could trigger expensive generations. You need a way to decide what gets generated and when, so you can control your budget while still offering TTS everywhere.
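One simple form of gating is a daily cap combined with a length limit. The sketch below is only an illustration of that idea under stated assumptions: `createGenerationGate`, its options, and the reason strings are all hypothetical, and state is kept in memory where a real service would persist it.

```js
// Returns a function that decides whether a generation request may proceed.
// maxPerDay: how many new generations are allowed per UTC day.
// maxChars: the longest text that may be sent for generation.
function createGenerationGate({ maxPerDay, maxChars }) {
  const requested = new Set(); // content ids that were already allowed once
  let currentDay = null;
  let usedToday = 0;

  return function shouldGenerate(contentId, text, now = new Date()) {
    if (requested.has(contentId)) {
      return { allow: false, reason: "already-requested" };
    }
    if (text.length > maxChars) {
      return { allow: false, reason: "too-long" };
    }
    // Reset the counter when the UTC date changes.
    const today = now.toISOString().slice(0, 10);
    if (today !== currentDay) {
      currentDay = today;
      usedToday = 0;
    }
    if (usedToday >= maxPerDay) {
      return { allow: false, reason: "budget-exhausted" };
    }
    usedToday += 1;
    requested.add(contentId);
    return { allow: true, reason: "ok" };
  };
}
```

A check like this belongs on the server, because anything enforced only in client-side code can be bypassed by the visitor.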
How TTS2Go Solves This
TTS2Go takes a hybrid approach that balances quality, cost, and simplicity.
- Safe client-side key
You add the SDK to your site with a frontend API key. Requests from other domains are blocked and usage is rate limited, so the key can safely live in your client-side code. It is an identification and rate-limiting mechanism, not a secret.
- First visitor: instant browser TTS + background generation
When the first person clicks a TTS button on a piece of content, they hear browser speech synthesis immediately. At the same time, a generation request is sent to TTS2Go in the background.
- Approval and budget control
In your TTS2Go dashboard you can:
- Approve generation requests manually, or
- Configure the AI approval system to auto-approve requests that meet your criteria.
This gives you full control over your generation budget.
- Subsequent visitors: premium AI audio from CDN
Once a request is approved and generated, every subsequent user who clicks TTS on that same content gets high-quality AI audio in your chosen voice, served instantly from TTS2Go's CDN.
In practice: the first visitor gets browser TTS, costs are controlled through approval, and everyone after gets premium AI audio.
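The flow described above can be sketched as a small decision function. This is not the TTS2Go SDK or its API: `choosePlayback`, `play`, the `entry` shape, the `"ready"` status label, and the `actions` callbacks are all hypothetical names used to illustrate the logic.

```js
// entry describes what is known about generated audio for one piece of content:
//   undefined                         -> nobody has requested audio yet
//   { status: "pending" }             -> requested, not yet approved or generated
//   { status: "ready", url: "..." }   -> generated and available on a CDN
function choosePlayback(entry) {
  if (entry && entry.status === "ready" && entry.url) {
    return { source: "cdn", url: entry.url, requestGeneration: false };
  }
  // No generated audio yet: fall back to browser speech. Ask for generation
  // only when no request has been recorded for this content.
  return { source: "browser", url: null, requestGeneration: !entry };
}

// actions supplies the side effects, so the decision logic stays easy to test.
function play(entry, text, actions) {
  const decision = choosePlayback(entry);
  if (decision.source === "cdn") {
    actions.playUrl(decision.url);
  } else {
    actions.speak(text);
  }
  if (decision.requestGeneration) {
    actions.requestGeneration(text);
  }
  return decision;
}
```

Keeping the decision separate from the side effects means the same logic can be reused whether the audio is played through an `<audio>` element, the Web Speech API, or an SDK.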
Step 1: Install the SDK
Install the TTS2Go vanilla JavaScript package using npm:
If you are not using a bundler, you can load it from a CDN with a <script> tag and access it as window.TTS2Go.
Step 2: Create the Client
Initialize a new TTS2Go instance with your project credentials:
Step 3: Add Text to Speech
Call tts.create() with your text and a voice ID to get a TTS instance with full playback controls:
From here you can explore the full TTS2Go JavaScript API, including:
- tts.getVoices(): list available voices for your project