Overview
Convert text into natural-sounding speech using various TTS models. The API supports multiple languages and voices, plus customization options including speed control and voice instructions. Looking for synchronous, low-latency TTS that returns audio bytes directly? See api-reference/endpoint/speech.mdx (POST /v1/speech).
Supported Models
- Kokoro-82m: 44 multilingual voices ($0.001/1k chars)
- Elevenlabs-Turbo-V2.5: Premium quality with style controls ($0.06/1k chars)
- tts-1: OpenAI standard quality ($0.015/1k chars)
- tts-1-hd: OpenAI high definition ($0.030/1k chars)
- gpt-4o-mini-tts: Ultra-low cost ($0.0006/1k chars)
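The per-character prices above make rough cost estimates simple arithmetic. The sketch below is illustrative only: `PRICE_PER_1K_CHARS` and `estimate_cost` are hypothetical names, and it assumes billing is strictly proportional to character count, with no minimum charge or rounding.

```python
from decimal import Decimal

# USD per 1,000 characters, copied from the Supported Models list.
PRICE_PER_1K_CHARS = {
    "Kokoro-82m": Decimal("0.001"),
    "Elevenlabs-Turbo-V2.5": Decimal("0.06"),
    "tts-1": Decimal("0.015"),
    "tts-1-hd": Decimal("0.030"),
    "gpt-4o-mini-tts": Decimal("0.0006"),
}


def estimate_cost(model: str, text: str) -> Decimal:
    """Estimate the USD cost of synthesizing `text` with `model`.

    Assumption: cost scales linearly with len(text); real billing rules
    (minimums, rounding, how characters are counted) may differ.
    """
    rate = PRICE_PER_1K_CHARS[model]  # raises KeyError for unlisted models
    return rate * len(text) / Decimal(1000)
```

Under these assumptions, a 2,000-character script on tts-1 comes to $0.03.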
Basic Usage
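A minimal sketch of building a submit request with Python's standard library. Several details here are assumptions rather than documented facts: the host in `BASE_URL` is a placeholder, the JSON field names (`text`, `model`, `voice`) are guesses based on the parameters this page mentions, and the Bearer authorization header is a common convention that this API may or may not use.

```python
import json
import urllib.request

# Hypothetical host: replace with the real API base URL.
BASE_URL = "https://api.example.com"


def build_tts_request(api_key: str, text: str, model: str, voice: str) -> urllib.request.Request:
    """Build (but do not send) a POST /api/tts request.

    Assumptions: a JSON body with `text`, `model`, and `voice` fields,
    and Bearer-token authentication.
    """
    payload = {"text": text, "model": model, "voice": voice}
    return urllib.request.Request(
        url=f"{BASE_URL}/api/tts",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; keeping construction separate from sending makes the request easy to inspect or log first.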
Async Status and Result Retrieval
Some TTS models run asynchronously. When queued, the API returns HTTP 202 with a ticket containing a runId and model. Use the TTS Status endpoint to poll until the job is complete. Synchronous models return audio immediately and do not require status polling.
Endpoints
- Submit TTS: POST /api/tts
- Check TTS Status (async only): GET /api/tts/status?runId=...&model=...
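The status URL takes runId and model as query parameters, so it is worth building it with proper URL encoding rather than string concatenation. A small sketch, assuming a placeholder host (`BASE_URL`) and a ticket already parsed into a dict; `build_status_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Hypothetical host: replace with the real API base URL.
BASE_URL = "https://api.example.com"


def build_status_url(ticket: dict) -> str:
    """Build the GET /api/tts/status URL from a 202 ticket.

    Only runId and model are included; how any extra ticket fields
    should be passed is not specified here, so they are left out.
    """
    params = {"runId": ticket["runId"], "model": ticket["model"]}
    # urlencode percent-escapes reserved characters in the values.
    return f"{BASE_URL}/api/tts/status?{urlencode(params)}"
```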
When you see status: “pending”
If your initial POST /api/tts returns HTTP 202 with a ticket whose status is "pending", poll GET /api/tts/status with the ticket's runId and model. If present, also include cost, paymentSource, and isApiRequest from the ticket when polling; these help with automatic refunds if the upstream provider later rejects the content.
cURL — Submit, then Poll
Synchronous vs. Asynchronous Models
- Synchronous models (examples: tts-1, tts-1-hd, gpt-4o-mini-tts, Kokoro-82m) return immediately from POST /api/tts with either binary audio or JSON containing { audioUrl, contentType }, depending on the provider.
- Asynchronous models (examples: Elevenlabs-Turbo-V2.5, Elevenlabs-V3, Elevenlabs-Music-V1) return HTTP 202 with a polling ticket. Poll GET /api/tts/status until the job is completed.
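Because one endpoint can answer in three shapes (binary audio, JSON with an audioUrl, or a 202 ticket), client code usually needs a small dispatch step. The following is a sketch, not the official client: `TtsResult` and `interpret_tts_response` are hypothetical names, and it assumes successful synchronous responses use HTTP 200 and that JSON bodies are served with an `application/json` content type.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class TtsResult:
    kind: str  # "audio_bytes", "audio_url", or "ticket"
    audio: Optional[bytes] = None
    audio_url: Optional[str] = None
    ticket: Optional[dict] = None


def interpret_tts_response(status_code: int, content_type: str, body: bytes) -> TtsResult:
    """Classify a POST /api/tts response into one of three shapes."""
    # Ignore parameters such as "; charset=utf-8" when checking the type.
    is_json = content_type.split(";")[0].strip().lower() == "application/json"
    if status_code == 202:
        # Async model: the body is a polling ticket (runId, model, ...).
        return TtsResult(kind="ticket", ticket=json.loads(body))
    if status_code == 200 and is_json:
        # Sync model that returns a hosted audio URL.
        return TtsResult(kind="audio_url", audio_url=json.loads(body)["audioUrl"])
    if status_code == 200:
        # Sync model that returns raw audio bytes.
        return TtsResult(kind="audio_bytes", audio=body)
    raise ValueError(f"unexpected TTS response status: {status_code}")
```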
Best Practices
- Poll every 2–3 seconds; stop after 2–3 minutes and show a timeout error.
- Always include runId and model. If available, include cost, paymentSource, and isApiRequest from the ticket for better error handling and refund automation.
- On completed, prefer using the audioUrl directly (streaming or download). Cache URLs client-side if you plan to replay.
- If you receive CONTENT_POLICY_VIOLATION, do not retry the same content; surface a clear message to the user.
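The practices above can be sketched as a polling loop. This is an illustration under stated assumptions, not the official client: the status response is assumed to be a JSON object with a `status` field ("pending" or "completed"), an `audioUrl` on completion, and a hypothetical `error` field on failure. `fetch_status` is a caller-supplied function that performs the actual GET /api/tts/status request, and `TtsPollError` is a hypothetical exception name.

```python
import time
from typing import Callable


class TtsPollError(Exception):
    """Raised when the job ends in any state other than completed."""


def poll_until_complete(
    fetch_status: Callable[[], dict],
    interval_s: float = 2.5,   # within the suggested 2-3 second interval
    timeout_s: float = 150.0,  # within the suggested 2-3 minute limit
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until the job completes and return its audioUrl."""
    deadline = clock() + timeout_s
    while True:
        result = fetch_status()
        status = result.get("status")
        if status == "completed":
            return result["audioUrl"]
        if status != "pending":
            # Failures such as CONTENT_POLICY_VIOLATION are not retried.
            raise TtsPollError(f"TTS job failed: {result.get('error', status)}")
        if clock() >= deadline:
            raise TimeoutError("TTS job did not complete in time")
        sleep(interval_s)
```

Injecting `sleep` and `clock` keeps the loop testable without real delays; in production the defaults are sufficient.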
FAQ
- Why did I get 202/pending? The selected model runs asynchronously; your request was queued, and billing happens once the queue submission succeeds.
- Can I cancel a pending TTS? Not currently. Let it complete or time out client‑side.
- Do all TTS models require polling? No. Only async models. Synchronous models return immediately.
Model-Specific Examples
Kokoro-82m - Multilingual Voices
44 voices across 13 language groups:
Elevenlabs-Turbo-V2.5 - Advanced Voice Controls
Premium quality with style adjustments:
OpenAI Models - Multiple Formats & Instructions
Response Examples
JSON Response (Most Models)
Binary Response (OpenAI Models)
OpenAI models return audio data directly as binary with appropriate headers:
Voice Options
Kokoro-82m Voices
- American Female: af_bella, af_nova, af_aoede, af_jessica, af_sarah
- American Male: am_adam, am_onyx, am_eric, am_liam
- British: bf_alice, bf_emma, bm_daniel, bm_george
- Asian Languages: jf_alpha (Japanese), zf_xiaoxiao (Chinese)
- European: ff_siwis (French), im_nicola (Italian)
Elevenlabs-Turbo-V2.5 Voices
Rachel, Adam, Bella, Brian, Sarah, Michael, Emily, James, Nicole, and 37 more
OpenAI Voices
alloy, echo, fable, onyx, nova, shimmer, ash, ballad, coral, sage, verse
Error Handling
- 400: Invalid parameters or missing text
- 401: Invalid or missing API key
- 413: Text exceeds model character limit
- 429: Rate limit exceeded
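One way to surface these status codes in client code is to map them to a single exception type. The sketch below uses hypothetical names (`TtsApiError`, `raise_for_tts_status`); treating only 429 as retryable is a design choice made for this example, not documented API behavior.

```python
# Messages copied from the status code list above.
TTS_ERROR_MESSAGES = {
    400: "Invalid parameters or missing text",
    401: "Invalid or missing API key",
    413: "Text exceeds model character limit",
    429: "Rate limit exceeded",
}


class TtsApiError(Exception):
    def __init__(self, status_code: int, message: str, retryable: bool):
        super().__init__(f"HTTP {status_code}: {message}")
        self.status_code = status_code
        self.retryable = retryable


def raise_for_tts_status(status_code: int) -> None:
    """Raise TtsApiError for error responses; return None otherwise."""
    if status_code < 400:
        return
    message = TTS_ERROR_MESSAGES.get(status_code, "Unexpected error")
    # Assumption: only rate limiting is worth retrying automatically.
    raise TtsApiError(status_code, message, retryable=(status_code == 429))
```

A caller can check `retryable` to decide whether to back off and resend or to show the message to the user.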