Convert text into natural-sounding speech using various TTS models from different providers. Supports multiple languages, voices, and customization options including speed control, voice instructions, and audio format selection.
api-reference/endpoint/speech.mdx (POST /v1/speech).
Models that run asynchronously return a ticket containing runId and model. Use the TTS Status endpoint to poll until the job is complete. Synchronous models return audio immediately and do not require status polling.
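As a rough illustration of the two response paths, the sketch below classifies a POST /api/tts response from its status code, content type, and raw body. It assumes that synchronous success arrives as a 2xx status other than 202 and that JSON responses carry an application/json content type; neither detail is confirmed here. The function name interpret_tts_response is hypothetical.

```python
import json
from typing import Any, Dict


def interpret_tts_response(status_code: int, content_type: str, body: bytes) -> Dict[str, Any]:
    """Classify a POST /api/tts response as a polling ticket, an audio URL, or raw audio."""
    # Ignore parameters such as "; charset=utf-8" when checking for JSON.
    media_type = content_type.split(";")[0].strip().lower()

    if status_code == 202:
        ticket = json.loads(body)
        # runId and model are the two values the status endpoint needs.
        missing = [key for key in ("runId", "model") if key not in ticket]
        if missing:
            raise ValueError(f"ticket is missing {missing}")
        return {"kind": "ticket", "ticket": ticket}

    # Assumption: any other 2xx status means the audio is ready now.
    if not 200 <= status_code < 300:
        raise RuntimeError(f"unexpected HTTP status {status_code}")

    if media_type == "application/json":
        payload = json.loads(body)
        return {
            "kind": "url",
            "audioUrl": payload["audioUrl"],
            "contentType": payload.get("contentType"),
        }

    # Anything else is treated as binary audio.
    return {"kind": "binary", "audio": body, "contentType": media_type}
```

A caller would branch on the returned kind: poll for "ticket", fetch or stream for "url", and write the bytes to disk for "binary".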
POST /api/tts
GET /api/tts/status?runId=...&model=...
POST /api/tts returns HTTP 202 with a body like:
Poll the status endpoint with runId and model. If present, include cost, paymentSource, and isApiRequest from the ticket when polling to help with automatic refunds if the upstream provider later rejects content.
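The polling step could be sketched as follows. The status URL is built from the ticket, forwarding cost, paymentSource, and isApiRequest only when the ticket includes them. Several details are assumptions: the base URL, the lowercase encoding of booleans, and a "status" field in the status response that eventually equals "completed". The HTTP call is injected as fetch_json so the sketch stays independent of any particular client library.

```python
import time
from typing import Any, Callable, Dict
from urllib.parse import urlencode

OPTIONAL_TICKET_FIELDS = ("cost", "paymentSource", "isApiRequest")


def build_status_url(base_url: str, ticket: Dict[str, Any]) -> str:
    """Build the GET /api/tts/status URL from a polling ticket."""
    params: Dict[str, Any] = {"runId": ticket["runId"], "model": ticket["model"]}
    for key in OPTIONAL_TICKET_FIELDS:
        if key in ticket:
            value = ticket[key]
            # Assumption: booleans are sent as lowercase "true"/"false".
            params[key] = str(value).lower() if isinstance(value, bool) else value
    return f"{base_url}/api/tts/status?{urlencode(params)}"


def poll_until_completed(
    fetch_json: Callable[[str], Dict[str, Any]],
    url: str,
    interval_s: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch_json(url) until the (assumed) status field reads "completed"."""
    for _ in range(max_attempts):
        result = fetch_json(url)
        if result.get("status") == "completed":
            return result
        sleep(interval_s)
    raise TimeoutError(f"job not completed after {max_attempts} attempts")
```

The attempt cap keeps a stuck job from polling forever. Provider-side failure states are not modeled, because their response shape is not described in this section.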
Synchronous models (tts-1, tts-1-hd, gpt-4o-mini-tts, Kokoro-82m) return immediately from POST /api/tts with either binary audio or JSON containing { audioUrl, contentType }, depending on the provider.
Asynchronous models (Elevenlabs-Turbo-V2.5, Elevenlabs-V3, Elevenlabs-Music-V1) return HTTP 202 with a polling ticket. Use GET /api/tts/status until the job is completed.
When polling, send runId and model. If available, include cost, paymentSource, and isApiRequest from the ticket for better error handling and refund automation.
Once the job is completed, prefer using the audioUrl directly (streaming or download). Cache URLs client-side if you plan to replay.
On CONTENT_POLICY_VIOLATION, do not retry the same content; surface a clear message to the user.
Text-to-speech generation parameters
The text to convert to speech
"Hello! This is a test of the text-to-speech API."
The TTS model to use for generation
Kokoro-82m, Elevenlabs-Turbo-V2.5, tts-1, tts-1-hd, gpt-4o-mini-tts
The voice to use for synthesis (available voices depend on selected model)
"af_bella"
Speech speed multiplier (0.1-5, not supported for gpt-4o-mini-tts)
0.1 <= x <= 5
Audio output format (OpenAI models only)
mp3, opus, aac, flac, wav, pcm
Voice instructions for fine-tuning (gpt-4o-mini-tts and tts-1-hd only)
"speak with enthusiasm"
Voice stability (Elevenlabs-Turbo-V2.5 only, 0-1)
0 <= x <= 1
Voice similarity boost (Elevenlabs-Turbo-V2.5 only, 0-1)
0 <= x <= 1
Style exaggeration (Elevenlabs-Turbo-V2.5 only, 0-1)
0 <= x <= 1
Text-to-speech response. Returns either JSON with audio URL or binary audio data depending on the model.
URL to the generated audio file
"https://storage.url/audio-file.wav"
MIME type of the audio file
"audio/wav"
Model used for generation
The input text that was synthesized
Voice used for synthesis
Speed multiplier used
Duration of the generated audio in seconds
Cost of the generation
Currency of the cost
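The range and model constraints listed under the generation parameters can be checked client-side before a request is sent. The sketch below is one possible way to do so. The parameter names are not given in this section, so every JSON key used here (text, model, voice, speed, format, instructions, stability) is a hypothetical placeholder, as is the function name build_tts_payload. Only the numeric ranges, the allowed formats, and the per-model restrictions come from the descriptions above.

```python
from typing import Any, Dict, Optional

OPENAI_MODELS = {"tts-1", "tts-1-hd", "gpt-4o-mini-tts"}
AUDIO_FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}
INSTRUCTION_MODELS = {"gpt-4o-mini-tts", "tts-1-hd"}


def build_tts_payload(
    text: str,
    model: str,
    voice: Optional[str] = None,
    speed: Optional[float] = None,
    audio_format: Optional[str] = None,
    instructions: Optional[str] = None,
    stability: Optional[float] = None,
) -> Dict[str, Any]:
    """Validate documented constraints and return a request body dict."""
    # All key names in this dict are hypothetical placeholders.
    payload: Dict[str, Any] = {"text": text, "model": model}
    if voice is not None:
        payload["voice"] = voice
    if speed is not None:
        if model == "gpt-4o-mini-tts":
            raise ValueError("speed is not supported for gpt-4o-mini-tts")
        if not 0.1 <= speed <= 5:
            raise ValueError("speed must be between 0.1 and 5")
        payload["speed"] = speed
    if audio_format is not None:
        if model not in OPENAI_MODELS:
            raise ValueError("audio format applies to OpenAI models only")
        if audio_format not in AUDIO_FORMATS:
            raise ValueError(f"unsupported audio format: {audio_format}")
        payload["format"] = audio_format
    if instructions is not None:
        if model not in INSTRUCTION_MODELS:
            raise ValueError("instructions need gpt-4o-mini-tts or tts-1-hd")
        payload["instructions"] = instructions
    if stability is not None:
        if model != "Elevenlabs-Turbo-V2.5":
            raise ValueError("stability applies to Elevenlabs-Turbo-V2.5 only")
        if not 0 <= stability <= 1:
            raise ValueError("stability must be between 0 and 1")
        payload["stability"] = stability
    return payload
```

Similarity boost and style exaggeration would follow the same pattern as stability, since they share its 0-1 range and model restriction.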