Overview

Convert text into natural-sounding speech using various TTS models. Supports multiple languages, voices, and customization options including speed control and voice instructions. Looking for synchronous, low‑latency TTS that returns audio bytes directly? See api-reference/endpoint/speech.mdx (POST /v1/speech).

Supported Models

  • Kokoro-82m: 44 multilingual voices ($0.001/1k chars)
  • Elevenlabs-Turbo-V2.5: Premium quality with style controls ($0.06/1k chars)
  • tts-1: OpenAI standard quality ($0.015/1k chars)
  • tts-1-hd: OpenAI high definition ($0.030/1k chars)
  • gpt-4o-mini-tts: Ultra-low cost ($0.0006/1k chars)
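
The per-1k-character prices above translate into a rough cost estimate. The sketch below is illustrative only: the helper name `estimate_cost` is hypothetical, and it assumes billing is strictly proportional to character count (no minimum charge or rounding), which this page does not confirm.

```python
# Prices in USD per 1,000 characters, copied from the list above.
PRICE_PER_1K_CHARS = {
    "Kokoro-82m": 0.001,
    "Elevenlabs-Turbo-V2.5": 0.06,
    "tts-1": 0.015,
    "tts-1-hd": 0.030,
    "gpt-4o-mini-tts": 0.0006,
}


def estimate_cost(text: str, model: str) -> float:
    """Estimate the USD cost of synthesizing `text` with `model`.

    Assumption: cost scales linearly with len(text); actual billing may differ.
    """
    if model not in PRICE_PER_1K_CHARS:
        raise ValueError(f"Unknown model: {model}")
    return len(text) / 1000 * PRICE_PER_1K_CHARS[model]


if __name__ == "__main__":
    sample = "Hello! Welcome to our service."
    for name in PRICE_PER_1K_CHARS:
        print(f"{name}: ${estimate_cost(sample, name):.6f}")
```

The `cost` field returned by the API remains the authoritative figure; an estimate like this is mainly useful for budgeting before a request is sent.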

Basic Usage

import requests

def text_to_speech(text, model="Kokoro-82m", voice=None, **kwargs):
    headers = {
        "x-api-key": "YOUR_API_KEY",
        "Content-Type": "application/json"
    }
    
    payload = {
        "text": text,
        "model": model
    }
    
    if voice:
        payload["voice"] = voice
    
    payload.update(kwargs)
    
    response = requests.post(
        "https://nano-gpt.com/api/tts",
        headers=headers,
        json=payload
    )
    
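    if response.status_code == 202:
        # Async model: the request is queued. The JSON body is a ticket
        # (runId, model) to poll with GET /api/tts/status.
        return response

    if response.status_code != 200:
        raise Exception(f"Error: {response.status_code}")

    content_type = response.headers.get('content-type', '')

    if 'application/json' in content_type:
        # JSON response with audio URL
        data = response.json()
        audio_response = requests.get(data['audioUrl'])
        audio_response.raise_for_status()
        with open('output.wav', 'wb') as f:
            f.write(audio_response.content)
    else:
        # Binary audio data (OpenAI models). The .mp3 name assumes the
        # default format; rename if you pass a different response_format.
        with open('output.mp3', 'wb') as f:
            f.write(response.content)

    return response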

# Basic usage
text_to_speech(
    "Hello! Welcome to our service.",
    model="Kokoro-82m",
    voice="af_bella"
)

Async Status and Result Retrieval

Some TTS models run asynchronously. When queued, the API returns HTTP 202 with a ticket containing a runId and model. Use the TTS Status endpoint to poll until the job is complete. Synchronous models return audio immediately and do not require status polling.

Endpoints

  • Submit TTS: POST /api/tts
  • Check TTS Status (async only): GET /api/tts/status?runId=...&model=...

When you see status: “pending”

If your initial POST /api/tts returns HTTP 202 with a body like:
{
  "status": "pending",
  "runId": "98b0d593-fe8d-49b8-89c9-233022232297",
  "model": "Elevenlabs-Turbo-V2.5",
  "charged": true,
  "cost": 0.0050388,
  "paymentSource": "USD",
  "isApiRequest": true
}
…the request is queued. Poll the Status endpoint using the runId and model. If present, include cost, paymentSource, and isApiRequest from the ticket when polling to help with automatic refunds if the upstream provider later rejects content.
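
As a sketch of how the ticket maps onto the status request, the helper below builds the polling URL from a 202 response body. The function name `build_status_url` is hypothetical; the field names and the lowercase `true` for booleans follow the ticket and cURL example on this page.

```python
from urllib.parse import urlencode

STATUS_URL = "https://nano-gpt.com/api/tts/status"
# Optional ticket fields that should be forwarded when present.
OPTIONAL_FIELDS = ("cost", "paymentSource", "isApiRequest")


def build_status_url(ticket: dict) -> str:
    """Build the GET /api/tts/status URL for a 202 ticket."""
    # runId and model are always required.
    params = {"runId": ticket["runId"], "model": ticket["model"]}
    for field in OPTIONAL_FIELDS:
        if field in ticket:
            value = ticket[field]
            # JSON booleans become lowercase strings in the query string.
            params[field] = str(value).lower() if isinstance(value, bool) else value
    return f"{STATUS_URL}?{urlencode(params)}"


if __name__ == "__main__":
    ticket = {
        "status": "pending",
        "runId": "98b0d593-fe8d-49b8-89c9-233022232297",
        "model": "Elevenlabs-Turbo-V2.5",
        "cost": 0.0050388,
        "paymentSource": "USD",
        "isApiRequest": True,
    }
    print(build_status_url(ticket))
```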

cURL — Submit, then Poll

# 1) Submit TTS
curl -X POST https://nano-gpt.com/api/tts \
  -H 'x-api-key: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "text": "Hello there!",
    "model": "Elevenlabs-Turbo-V2.5",
    "voice": "Rachel",
    "speed": 1.0
  }'

# 2) If response is 202/pending, poll using returned values
curl "https://nano-gpt.com/api/tts/status?runId=98b0d593-fe8d-49b8-89c9-233022232297&model=Elevenlabs-Turbo-V2.5&cost=0.0050388&paymentSource=USD&isApiRequest=true" \
  -H 'x-api-key: YOUR_API_KEY'

# 3) On completion, you'll receive an audioUrl
# {
#   "status": "completed",
#   "audioUrl": "https://.../file.mp3",
#   "contentType": "audio/mpeg",
#   "model": "Elevenlabs-Turbo-V2.5"
# }

Synchronous vs. Asynchronous Models

  • Synchronous models (examples: tts-1, tts-1-hd, gpt-4o-mini-tts, Kokoro-82m) return immediately from POST /api/tts with either binary audio or JSON containing { audioUrl, contentType } depending on the provider.
  • Asynchronous models (examples: Elevenlabs-Turbo-V2.5, Elevenlabs-V3, Elevenlabs-Music-V1) return HTTP 202 with a polling ticket. Use GET /api/tts/status until completed.
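
Because the response shape depends on the model, client code usually branches on the status code and Content-Type before touching the body. A minimal sketch, with a hypothetical helper name and return labels chosen for illustration:

```python
def classify_tts_response(status_code: int, content_type: str) -> str:
    """Return "queued", "audio_url", or "binary" for a POST /api/tts response."""
    if status_code == 202:
        return "queued"  # async model: the body is a polling ticket
    if status_code != 200:
        raise ValueError(f"Unexpected status code: {status_code}")
    if "application/json" in content_type.lower():
        return "audio_url"  # JSON body with audioUrl and contentType
    return "binary"  # raw audio bytes in the response body


if __name__ == "__main__":
    print(classify_tts_response(202, "application/json"))
    print(classify_tts_response(200, "application/json; charset=utf-8"))
    print(classify_tts_response(200, "audio/mp3"))
```

Checking the response at runtime, instead of hard-coding which models are synchronous, keeps the client working if a model's behavior changes.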

Best Practices

  • Poll every 2–3 seconds; stop after 2–3 minutes and show a timeout error.
  • Always include runId and model. If available, include cost, paymentSource, and isApiRequest from the ticket for better error handling and refund automation.
  • On completed, prefer using the audioUrl directly (streaming or download). Cache URLs client‑side if you plan to replay.
  • If you receive CONTENT_POLICY_VIOLATION, do not retry the same content; surface a clear message to the user.
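
The polling guidance above can be sketched as a small loop. The function name and its parameters are hypothetical. How failures such as CONTENT_POLICY_VIOLATION appear in the status body is not shown on this page, so the sketch makes an assumption: any status other than "pending" or "completed" is treated as terminal and is not retried.

```python
import time
from typing import Callable


def poll_until_complete(
    fetch_status: Callable[[], dict],
    interval: float = 2.5,
    timeout: float = 180.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Call fetch_status() until it reports "completed" or the timeout passes."""
    deadline = clock() + timeout
    while True:
        body = fetch_status()
        status = body.get("status")
        if status == "completed":
            return body  # contains audioUrl
        if status != "pending":
            # Assumption: anything else is a terminal failure; do not retry.
            raise RuntimeError(f"TTS job failed: {body}")
        if clock() + interval > deadline:
            raise TimeoutError("TTS job still pending after timeout")
        sleep(interval)
```

Here `fetch_status` would wrap the status request, for example `lambda: requests.get(status_url, headers={"x-api-key": "YOUR_API_KEY"}).json()`. Passing it in as a callable keeps the loop testable without network access.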

FAQ

  • Why did I get 202/pending? The selected model runs asynchronously. Your request was queued, and the charge is applied once the queue submission succeeds.
  • Can I cancel a pending TTS? Not currently. Let it complete or time out client‑side.
  • Do all TTS models require polling? No. Only async models. Synchronous models return immediately.

Model-Specific Examples

Kokoro-82m - Multilingual Voices

44 voices across 13 language groups:
# Popular voice examples by category
voices = {
    "american_female": ["af_bella", "af_nova", "af_aoede"],
    "american_male": ["am_adam", "am_onyx", "am_eric"],
    "british_female": ["bf_alice", "bf_emma"],
    "british_male": ["bm_daniel", "bm_george"],
    "japanese_female": ["jf_alpha", "jf_gongitsune"],
    "chinese_female": ["zf_xiaoxiao", "zf_xiaoyi"],
    "french_female": ["ff_siwis"],
    "italian_male": ["im_nicola"]
}

# Generate multilingual samples
samples = [
    {"text": "Hello, welcome!", "voice": "af_bella", "lang": "English"},
    {"text": "Bonjour et bienvenue!", "voice": "ff_siwis", "lang": "French"},
    {"text": "こんにちは!", "voice": "jf_alpha", "lang": "Japanese"},
    {"text": "你好,欢迎!", "voice": "zf_xiaoxiao", "lang": "Chinese"}
]

for sample in samples:
    text_to_speech(
        text=sample["text"],
        model="Kokoro-82m",
        voice=sample["voice"]
    )

Elevenlabs-Turbo-V2.5 - Advanced Voice Controls

Premium quality with style adjustments:
# Stable, consistent voice
text_to_speech(
    text="This is a professional announcement.",
    model="Elevenlabs-Turbo-V2.5",
    voice="Rachel",
    stability=0.9,
    similarity_boost=0.8,
    style=0
)

# Expressive, dynamic voice  
text_to_speech(
    text="This is so exciting!",
    model="Elevenlabs-Turbo-V2.5",
    voice="Rachel",
    stability=0.3,
    similarity_boost=0.7,
    style=0.8,
    speed=1.2
)

# Available voices: Rachel, Adam, Bella, Brian, etc.

OpenAI Models - Multiple Formats & Instructions

# High-definition with voice instructions
text_to_speech(
    text="Welcome to customer service.",
    model="tts-1-hd",
    voice="nova",
    instructions="Speak warmly and professionally like a customer service representative",
    response_format="flac"
)

# Ultra-low cost option
text_to_speech(
    text="This is a cost-effective option.",
    model="gpt-4o-mini-tts",
    voice="alloy",
    instructions="Speak clearly and cheerfully",
    response_format="mp3"
)

# Different format examples
formats = ["mp3", "wav", "opus", "flac", "aac"]
for fmt in formats:
    text_to_speech(
        text=f"This is {fmt.upper()} format.",
        model="tts-1",
        voice="echo",
        response_format=fmt
    )

Response Examples

JSON Response (Most Models)

{
  "audioUrl": "https://storage.url/audio-file.wav",
  "contentType": "audio/wav",
  "model": "Kokoro-82m",
  "text": "Hello world",
  "voice": "af_bella",
  "speed": 1,
  "duration": 2.3,
  "cost": 0.001,
  "currency": "USD"
}

Binary Response (OpenAI Models)

OpenAI models return audio data directly as binary with appropriate headers:
Content-Type: audio/mp3
Content-Length: 123456
[Binary audio data]
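
Since the audio format varies by model and by `response_format`, it can help to derive the file extension from the Content-Type instead of hard-coding it. The sketch below uses hypothetical helper names. Only `audio/mp3`, `audio/mpeg`, and `audio/wav` appear on this page; the other entries in the table are assumptions about what the API may return for other formats.

```python
EXTENSION_BY_CONTENT_TYPE = {
    "audio/mpeg": ".mp3",
    "audio/mp3": ".mp3",
    "audio/wav": ".wav",
    # Assumed content types for the remaining formats:
    "audio/x-wav": ".wav",
    "audio/flac": ".flac",
    "audio/aac": ".aac",
    "audio/opus": ".opus",
    "audio/ogg": ".ogg",
}


def extension_for(content_type: str, default: str = ".bin") -> str:
    """Map a Content-Type header value to a file extension."""
    # Drop parameters such as "; charset=..." and normalize case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return EXTENSION_BY_CONTENT_TYPE.get(media_type, default)


def save_audio(data: bytes, content_type: str, stem: str = "output") -> str:
    """Write audio bytes to `stem` plus the matching extension; return the path."""
    path = stem + extension_for(content_type)
    with open(path, "wb") as f:
        f.write(data)
    return path
```

The same helper works for both response styles: pass `response.headers["content-type"]` for binary responses, or the `contentType` field for JSON responses.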

Voice Options

Kokoro-82m Voices

  • American Female: af_bella, af_nova, af_aoede, af_jessica, af_sarah
  • American Male: am_adam, am_onyx, am_eric, am_liam
  • British: bf_alice, bf_emma, bm_daniel, bm_george
  • Asian Languages: jf_alpha (Japanese), zf_xiaoxiao (Chinese)
  • European: ff_siwis (French), im_nicola (Italian)
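
The voice IDs listed above appear to follow a two-letter prefix pattern: the first letter indicates the language group and the second the gender. The sketch below decodes that pattern. The mapping is inferred only from the names on this page and is not an official naming specification, so prefixes for language groups not listed here raise an error.

```python
# Inferred from the voice names listed on this page (assumption).
LANGUAGE_BY_PREFIX = {
    "a": "American",
    "b": "British",
    "j": "Japanese",
    "z": "Chinese",
    "f": "French",
    "i": "Italian",
}
GENDER_BY_PREFIX = {"f": "female", "m": "male"}


def describe_voice(voice_id: str) -> str:
    """Turn an ID such as "af_bella" into "bella (American, female)"."""
    prefix, sep, name = voice_id.partition("_")
    if not sep or len(prefix) != 2 or not name:
        raise ValueError(f"Unrecognized voice ID format: {voice_id}")
    language = LANGUAGE_BY_PREFIX.get(prefix[0])
    gender = GENDER_BY_PREFIX.get(prefix[1])
    if language is None or gender is None:
        raise ValueError(f"Unknown voice prefix: {prefix}")
    return f"{name} ({language}, {gender})"


if __name__ == "__main__":
    for voice in ["af_bella", "bm_george", "jf_alpha", "im_nicola"]:
        print(describe_voice(voice))
```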

Elevenlabs-Turbo-V2.5 Voices

Rachel, Adam, Bella, Brian, Sarah, Michael, Emily, James, Nicole, and 37 more

OpenAI Voices

alloy, echo, fable, onyx, nova, shimmer, ash, ballad, coral, sage, verse

Error Handling

try:
    result = text_to_speech("Hello world!", model="Kokoro-82m")
    print("Success!")
except Exception as e:
    if "400" in str(e):
        print("Bad request - check parameters")
    elif "401" in str(e):
        print("Unauthorized - check API key")
    elif "413" in str(e):
        print("Text too long for model")
    elif "429" in str(e):
        print("Rate limit exceeded - retry later")
    else:
        print(f"Error: {e}")

Common errors:
  • 400: Invalid parameters or missing text
  • 401: Invalid or missing API key
  • 413: Text exceeds model character limit
  • 429: Rate limit exceeded
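
Matching status codes inside an exception message is fragile, so one alternative is a dedicated exception type that carries the code. The sketch below is illustrative: `TTSError` and `backoff_delays` are hypothetical names, and treating only 429 as retryable with exponential backoff is an assumption, since this page does not specify a retry policy.

```python
# Messages copied from the list of common errors above.
ERROR_MESSAGES = {
    400: "Invalid parameters or missing text",
    401: "Invalid or missing API key",
    413: "Text exceeds model character limit",
    429: "Rate limit exceeded",
}
# Assumption: only rate-limit errors are worth retrying.
RETRYABLE_STATUS_CODES = {429}


class TTSError(Exception):
    """Error raised for a non-success TTS response."""

    def __init__(self, status_code: int):
        self.status_code = status_code
        self.retryable = status_code in RETRYABLE_STATUS_CODES
        message = ERROR_MESSAGES.get(status_code, "Unexpected error")
        super().__init__(f"{status_code}: {message}")


def backoff_delays(retries: int = 3, base: float = 1.0) -> list[float]:
    """Return exponentially growing wait times in seconds."""
    return [base * 2 ** attempt for attempt in range(retries)]


if __name__ == "__main__":
    err = TTSError(429)
    print(err, "| retryable:", err.retryable)
    print("Suggested waits:", backoff_delays())
```

Callers can then branch on `err.status_code` or `err.retryable` instead of searching the message text.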