POST /api/tts
curl --request POST \
  --url https://nano-gpt.com/api/tts \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <api-key>' \
  --data '{
  "text": "Hello! This is a test of the text-to-speech API.",
  "model": "Kokoro-82m",
  "voice": "af_bella",
  "speed": 1,
  "response_format": "mp3",
  "instructions": "speak with enthusiasm",
  "stability": 0.5,
  "similarity_boost": 0.75,
  "style": 0
}'
Example response:

{
  "audioUrl": "https://storage.url/audio-file.wav",
  "contentType": "audio/wav",
  "model": "<string>",
  "text": "<string>",
  "voice": "<string>",
  "speed": 123,
  "duration": 123,
  "cost": 123,
  "currency": "<string>"
}

Overview

Convert text into natural-sounding speech using various TTS models. Supports multiple languages, voices, and customization options including speed control and voice instructions.

Supported Models

  • Kokoro-82m: 44 multilingual voices ($0.001/1k chars)
  • Dia-TTS: Multi-speaker conversations ($0.05/1k chars)
  • Elevenlabs-Turbo-V2.5: Premium quality with style controls ($0.06/1k chars)
  • tts-1: OpenAI standard quality ($0.015/1k chars)
  • tts-1-hd: OpenAI high definition ($0.030/1k chars)
  • gpt-4o-mini-tts: Ultra-low cost ($0.0006/1k chars)
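
Given the per-1,000-character rates above, the cost of a request is simply characters / 1000 × rate. The helper below is a minimal sketch for estimating cost before sending a request; the PRICE_PER_1K table restates the rates listed in this section and is not part of the API.

# Hypothetical helper: estimate request cost from the per-1k-character rates above (USD)
PRICE_PER_1K = {
    "Kokoro-82m": 0.001,
    "Dia-TTS": 0.05,
    "Elevenlabs-Turbo-V2.5": 0.06,
    "tts-1": 0.015,
    "tts-1-hd": 0.030,
    "gpt-4o-mini-tts": 0.0006,
}

def estimate_cost(text, model="Kokoro-82m"):
    """Approximate USD cost: characters / 1000 * per-1k rate."""
    return len(text) / 1000 * PRICE_PER_1K[model]

print(estimate_cost("Hello! Welcome to our service.", "Kokoro-82m"))  # 30 chars -> ~$0.00003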

Basic Usage

import requests

def text_to_speech(text, model="Kokoro-82m", voice=None, **kwargs):
    headers = {
        "x-api-key": "YOUR_API_KEY",
        "Content-Type": "application/json"
    }
    
    payload = {
        "text": text,
        "model": model
    }
    
    if voice:
        payload["voice"] = voice
    
    payload.update(kwargs)
    
    response = requests.post(
        "https://nano-gpt.com/api/tts",
        headers=headers,
        json=payload
    )
    
    if response.status_code == 200:
        content_type = response.headers.get('content-type', '')
        
        if 'application/json' in content_type:
            # JSON response with audio URL
            data = response.json()
            audio_response = requests.get(data['audioUrl'])
            with open('output.wav', 'wb') as f:
                f.write(audio_response.content)
        else:
            # Binary audio data (OpenAI models)
            with open('output.mp3', 'wb') as f:
                f.write(response.content)
        
        return response
    else:
        raise Exception(f"Error: {response.status_code}")

# Basic usage
text_to_speech(
    "Hello! Welcome to our service.",
    model="Kokoro-82m",
    voice="af_bella"
)

Model-Specific Examples

Kokoro-82m - Multilingual Voices

44 voices across 13 language groups:

# Popular voice examples by category
voices = {
    "american_female": ["af_bella", "af_nova", "af_aoede"],
    "american_male": ["am_adam", "am_onyx", "am_eric"],
    "british_female": ["bf_alice", "bf_emma"],
    "british_male": ["bm_daniel", "bm_george"],
    "japanese_female": ["jf_alpha", "jf_gongitsune"],
    "chinese_female": ["zf_xiaoxiao", "zf_xiaoyi"],
    "french_female": ["ff_siwis"],
    "italian_male": ["im_nicola"]
}

# Generate multilingual samples
samples = [
    {"text": "Hello, welcome!", "voice": "af_bella", "lang": "English"},
    {"text": "Bonjour et bienvenue!", "voice": "ff_siwis", "lang": "French"},
    {"text": "こんにちは!", "voice": "jf_alpha", "lang": "Japanese"},
    {"text": "你好,欢迎!", "voice": "zf_xiaoxiao", "lang": "Chinese"}
]

for sample in samples:
    text_to_speech(
        text=sample["text"],
        model="Kokoro-82m",
        voice=sample["voice"]
    )

Dia-TTS - Multi-Speaker Conversations

Create dialogues with speaker tags:

# Multi-speaker conversation
dialogue = "[S1] Welcome to our podcast! [S2] Thanks for having me. [S1] Let's begin!"

text_to_speech(
    text=dialogue,
    model="Dia-TTS",
    speed=1.1
)

# Single speaker with specific voice
text_to_speech(
    text="[S1] This is a single speaker narration.",
    model="Dia-TTS",
    voice="S2"  # Use S2 voice for all text
)

Elevenlabs-Turbo-V2.5 - Advanced Voice Controls

Premium quality with style adjustments:

# Stable, consistent voice
text_to_speech(
    text="This is a professional announcement.",
    model="Elevenlabs-Turbo-V2.5",
    voice="Rachel",
    stability=0.9,
    similarity_boost=0.8,
    style=0
)

# Expressive, dynamic voice  
text_to_speech(
    text="This is so exciting!",
    model="Elevenlabs-Turbo-V2.5",
    voice="Rachel",
    stability=0.3,
    similarity_boost=0.7,
    style=0.8,
    speed=1.2
)

# Available voices: Rachel, Adam, Bella, Brian, etc.

OpenAI Models - Multiple Formats & Instructions

# High-definition with voice instructions
text_to_speech(
    text="Welcome to customer service.",
    model="tts-1-hd",
    voice="nova",
    instructions="Speak warmly and professionally like a customer service representative",
    response_format="flac"
)

# Ultra-low cost option
text_to_speech(
    text="This is a cost-effective option.",
    model="gpt-4o-mini-tts",
    voice="alloy",
    instructions="Speak clearly and cheerfully",
    response_format="mp3"
)

# Different format examples
formats = ["mp3", "wav", "opus", "flac", "aac"]
for fmt in formats:
    text_to_speech(
        text=f"This is {fmt.upper()} format.",
        model="tts-1",
        voice="echo",
        response_format=fmt
    )

Response Examples

JSON Response (Most Models)

{
  "audioUrl": "https://storage.url/audio-file.wav",
  "contentType": "audio/wav",
  "model": "Kokoro-82m",
  "text": "Hello world",
  "voice": "af_bella",
  "speed": 1,
  "duration": 2.3,
  "cost": 0.001,
  "currency": "USD"
}
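
For JSON responses like the one above, the audio itself is fetched separately from audioUrl; the duration, cost, and currency fields can be logged for bookkeeping. A minimal sketch, with field names taken from the example response above:

import requests

def save_json_tts_response(data, filename="output.wav"):
    # Download the generated audio from the returned URL and write it to disk
    audio = requests.get(data["audioUrl"])
    audio.raise_for_status()
    with open(filename, "wb") as f:
        f.write(audio.content)
    # The JSON response also reports duration, cost, and currency
    print(f"Saved audio (duration {data['duration']}), cost {data['cost']} {data['currency']}")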

Binary Response (OpenAI Models)

OpenAI models return audio data directly as binary with appropriate headers:

Content-Type: audio/mp3
Content-Length: 123456
[Binary audio data]

Voice Options

Kokoro-82m Voices

  • American Female: af_bella, af_nova, af_aoede, af_jessica, af_sarah
  • American Male: am_adam, am_onyx, am_eric, am_liam
  • British: bf_alice, bf_emma, bm_daniel, bm_george
  • Asian Languages: jf_alpha (Japanese), zf_xiaoxiao (Chinese)
  • European: ff_siwis (French), im_nicola (Italian)

Elevenlabs-Turbo-V2.5 Voices

Rachel, Adam, Bella, Brian, Sarah, Michael, Emily, James, Nicole, and 37 more

OpenAI Voices

alloy, echo, fable, onyx, nova, shimmer, ash, ballad, coral, sage, verse

Error Handling

try:
    result = text_to_speech("Hello world!", model="Kokoro-82m")
    print("Success!")
except Exception as e:
    if "400" in str(e):
        print("Bad request - check parameters")
    elif "401" in str(e):
        print("Unauthorized - check API key")
    elif "413" in str(e):
        print("Text too long for model")
    else:
        print(f"Error: {e}")

Common errors:

  • 400: Invalid parameters or missing text
  • 401: Invalid or missing API key
  • 413: Text exceeds model character limit
  • 429: Rate limit exceeded (see the retry sketch below)
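
For 429 responses in particular, retrying after a short backoff is usually enough. The wrapper below is a sketch built on the text_to_speech helper from Basic Usage; the retry count and delays are arbitrary choices, not API requirements.

import time

def text_to_speech_with_retry(text, retries=3, **kwargs):
    # Retry only on rate limiting (429); re-raise everything else immediately
    for attempt in range(retries):
        try:
            return text_to_speech(text, **kwargs)
        except Exception as e:
            if "429" in str(e) and attempt < retries - 1:
                time.sleep(2 ** attempt)  # 1s, 2s, 4s ... exponential backoff
                continue
            raise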

Authorizations

x-api-key (string, header, required)

Body

application/json

Text-to-speech generation parameters

The body is of type object.

Response

200 (application/json)

Text-to-speech response. Returns either JSON with audio URL or binary audio data depending on the model.

The response is of type object.