Overview
Convert text into natural-sounding speech using various TTS models. Supports multiple languages, voices, and customization options including speed control and voice instructions.
Supported Models
- Kokoro-82m: 44 multilingual voices ($0.001/1k chars)
- Elevenlabs-Turbo-V2.5: Premium quality with style controls ($0.06/1k chars)
- tts-1: OpenAI standard quality ($0.015/1k chars)
- tts-1-hd: OpenAI high definition ($0.030/1k chars)
- gpt-4o-mini-tts: Ultra-low cost ($0.0006/1k chars)
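The per-1k-character prices above make it easy to estimate the cost of a request before sending it. The following is a minimal sketch, assuming that billing is strictly proportional to character count with no minimum charge or rounding (the list above does not say either way). The helper name `estimate_cost` is hypothetical.

```python
from decimal import Decimal

# Prices in USD per 1,000 characters, copied from the model list above.
PRICE_PER_1K_CHARS = {
    "Kokoro-82m": Decimal("0.001"),
    "Elevenlabs-Turbo-V2.5": Decimal("0.06"),
    "tts-1": Decimal("0.015"),
    "tts-1-hd": Decimal("0.030"),
    "gpt-4o-mini-tts": Decimal("0.0006"),
}


def estimate_cost(model: str, text: str) -> Decimal:
    """Estimate the USD cost of synthesizing `text` with `model`.

    Assumption: cost scales linearly with len(text); actual billing
    rules (minimums, rounding) may differ.
    """
    if model not in PRICE_PER_1K_CHARS:
        raise KeyError(f"unknown model: {model}")
    return PRICE_PER_1K_CHARS[model] * len(text) / 1000


# Compare all models for the same 2,000-character input.
sample = "a" * 2000
for name in PRICE_PER_1K_CHARS:
    print(f"{name}: ${estimate_cost(name, sample)}")
```

Decimal is used instead of float so that small per-character prices do not pick up binary rounding noise.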
Basic Usage
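A minimal sketch of assembling a request with Python's standard library. Everything specific here is an assumption: the endpoint URL is a placeholder, the `Bearer` authorization scheme is assumed, and the body field names `model`, `text`, `voice`, and `speed` are hypothetical stand-ins for the parameters this page describes. Check the Authorizations and Body sections of the reference for the real values. The sketch only builds the request; it does not send it.

```python
import json
import urllib.request

# Hypothetical endpoint; substitute the real URL from the API reference.
TTS_URL = "https://api.example.com/v1/tts"


def build_tts_request(api_key, model, text, voice=None, speed=None):
    """Build (but do not send) a POST request for text-to-speech.

    Field names and the Bearer scheme are assumptions, not confirmed API.
    """
    if not text:
        # The API reports missing text as a 400; fail early instead.
        raise ValueError("text must be non-empty")
    payload = {"model": model, "text": text}
    if voice is not None:
        payload["voice"] = voice
    if speed is not None:
        payload["speed"] = speed
    return urllib.request.Request(
        TTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_tts_request(
    "YOUR_API_KEY", "Kokoro-82m", "Hello, world!", voice="af_bella"
)
print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it once the placeholder URL and key are replaced.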
Model-Specific Examples
Kokoro-82m - Multilingual Voices
44 voices across 13 language groups.
Elevenlabs-Turbo-V2.5 - Advanced Voice Controls
Premium quality with style adjustments.
OpenAI Models - Multiple Formats & Instructions
Response Examples
JSON Response (Most Models)
Binary Response (OpenAI Models)
OpenAI models return audio data directly as binary with appropriate headers.
Voice Options
Kokoro-82m Voices
- American Female: af_bella, af_nova, af_aoede, af_jessica, af_sarah
- American Male: am_adam, am_onyx, am_eric, am_liam
- British: bf_alice, bf_emma, bm_daniel, bm_george
- Asian Languages: jf_alpha (Japanese), zf_xiaoxiao (Chinese)
- European: ff_siwis (French), im_nicola (Italian)
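The Kokoro voice IDs listed above appear to follow a two-letter prefix pattern: the first letter suggests the language group and the second the voice gender. This convention is inferred from the list, not documented here, so the sketch below treats it as an assumption and returns `None` for anything it cannot map. Only the language groups shown above are covered, and `describe_voice` is a hypothetical helper name.

```python
# Prefix letters inferred from the voice list above (an assumption,
# not a documented naming rule). Only the groups shown are included.
LANGUAGE_BY_LETTER = {
    "a": "American English",
    "b": "British English",
    "j": "Japanese",
    "z": "Chinese",
    "f": "French",
    "i": "Italian",
}
GENDER_BY_LETTER = {"f": "female", "m": "male"}


def describe_voice(voice_id: str) -> dict:
    """Split a Kokoro-style voice ID such as 'af_bella' into its parts."""
    prefix, sep, name = voice_id.partition("_")
    if len(prefix) != 2 or not sep or not name:
        raise ValueError(f"unexpected voice ID format: {voice_id!r}")
    return {
        "language": LANGUAGE_BY_LETTER.get(prefix[0]),  # None if unknown
        "gender": GENDER_BY_LETTER.get(prefix[1]),      # None if unknown
        "name": name,
    }


for voice in ["af_bella", "bm_george", "jf_alpha", "im_nicola"]:
    print(voice, describe_voice(voice))
```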
Elevenlabs-Turbo-V2.5 Voices
Rachel, Adam, Bella, Brian, Sarah, Michael, Emily, James, Nicole, and 37 more
OpenAI Voices
alloy, echo, fable, onyx, nova, shimmer, ash, ballad, coral, sage, verse
Error Handling
- 400: Invalid parameters or missing text
- 401: Invalid or missing API key
- 413: Text exceeds model character limit
- 429: Rate limit exceeded
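The status codes above can be turned into a small client-side policy. The sketch below is one possible approach, not the API's prescribed behavior: it assumes that only 429 is worth retrying, and the exponential backoff schedule is an illustrative choice. The helper names are hypothetical.

```python
# Meanings copied from the error list above.
ERROR_MEANINGS = {
    400: "Invalid parameters or missing text",
    401: "Invalid or missing API key",
    413: "Text exceeds model character limit",
    429: "Rate limit exceeded",
}


def describe_error(status: int) -> str:
    """Return a readable message for a documented error status."""
    return ERROR_MEANINGS.get(status, f"Unexpected status {status}")


def should_retry(status: int) -> bool:
    """Assumption: only rate limiting (429) is transient.

    400, 401, and 413 require changing the request, so retrying the
    same request would fail again.
    """
    return status == 429


def backoff_delays(max_retries: int, base_seconds: float = 1.0) -> list:
    """Exponential backoff schedule: base, 2*base, 4*base, ..."""
    return [base_seconds * (2 ** attempt) for attempt in range(max_retries)]


for code in (400, 401, 413, 429):
    print(code, describe_error(code), "retry" if should_retry(code) else "fix request")
```

For a 413, a caller would typically split the text into smaller pieces and send them separately; the per-model character limits are not listed here.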
Authorizations
Body
application/json
Text-to-speech generation parameters
The body is of type object.
Response
Text-to-speech response. Returns either JSON with audio URL or binary audio data depending on the model.
The response is of type object.
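Because the response is either JSON or raw audio depending on the model, a client needs to branch on what came back. The sketch below does this by inspecting the `Content-Type` header, which is an assumption about how the two cases can be told apart; the exact header values and the name of the JSON field holding the audio URL are not given here. `parse_tts_response` is a hypothetical helper name.

```python
import json


def parse_tts_response(content_type: str, body: bytes) -> dict:
    """Classify a TTS response as JSON metadata or raw audio bytes.

    Assumption: JSON responses are labeled application/json and any
    other media type carries binary audio.
    """
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/json":
        return {"kind": "json", "data": json.loads(body.decode("utf-8"))}
    return {"kind": "audio", "media_type": media_type, "data": body}


# JSON case: the field name "audio_url" is illustrative only.
json_result = parse_tts_response(
    "application/json; charset=utf-8",
    b'{"audio_url": "https://example.com/speech.mp3"}',
)

# Binary case: the bytes would normally be written to a file.
audio_result = parse_tts_response("audio/mpeg", b"\xff\xfb\x90\x00")

print(json_result["kind"], audio_result["kind"], audio_result["media_type"])
```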