POST /api/tts

cURL
curl --request POST \
  --url https://nano-gpt.com/api/tts \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <api-key>' \
  --data '
{
  "text": "Hello! This is a test of the text-to-speech API.",
  "model": "Kokoro-82m",
  "voice": "af_bella",
  "speed": 1,
  "response_format": "mp3",
  "instructions": "speak with enthusiasm",
  "stability": 0.5,
  "similarity_boost": 0.75,
  "style": 0
}
'

Example Response

{
  "audioUrl": "https://storage.url/audio-file.wav",
  "contentType": "audio/wav",
  "model": "<string>",
  "text": "<string>",
  "voice": "<string>",
  "speed": 123,
  "duration": 123,
  "cost": 123,
  "currency": "<string>"
}

Overview

Convert text into natural-sounding speech using various TTS models. Supports multiple languages, voices, and customization options including speed control and voice instructions. Looking for synchronous, low‑latency TTS that returns audio bytes directly? See api-reference/endpoint/speech.mdx (POST /v1/speech).

Supported Models

  • Kokoro-82m: 44 multilingual voices ($0.001/1k chars)
  • Elevenlabs-Turbo-V2.5: Premium quality with style controls ($0.06/1k chars)
  • tts-1: OpenAI standard quality ($0.015/1k chars)
  • tts-1-hd: OpenAI high definition ($0.030/1k chars)
  • gpt-4o-mini-tts: Ultra-low cost ($0.0006/1k chars)
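Since billing is per 1,000 characters, you can estimate a request's cost up front from the prices listed above. The helper below is an illustrative sketch (the function name and price table are assumptions drawn from this page; actual prices may change):

```python
# Per-1k-character prices from the model list above (illustrative; verify
# against current pricing before relying on these numbers).
PRICE_PER_1K_CHARS = {
    "Kokoro-82m": 0.001,
    "Elevenlabs-Turbo-V2.5": 0.06,
    "tts-1": 0.015,
    "tts-1-hd": 0.030,
    "gpt-4o-mini-tts": 0.0006,
}

def estimate_cost(text: str, model: str) -> float:
    """Return the estimated USD cost of synthesizing `text` with `model`."""
    return len(text) / 1000 * PRICE_PER_1K_CHARS[model]
```

For example, a 2,000-character script on tts-1 costs roughly $0.03, while the same text on gpt-4o-mini-tts costs about $0.0012.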

Basic Usage

import requests

def text_to_speech(text, model="Kokoro-82m", voice=None, **kwargs):
    headers = {
        "x-api-key": "YOUR_API_KEY",
        "Content-Type": "application/json"
    }
    
    payload = {
        "text": text,
        "model": model
    }
    
    if voice:
        payload["voice"] = voice
    
    payload.update(kwargs)
    
    response = requests.post(
        "https://nano-gpt.com/api/tts",
        headers=headers,
        json=payload
    )
    
    if response.status_code == 200:
        content_type = response.headers.get('content-type', '')
        
        if 'application/json' in content_type:
            # JSON response with audio URL
            data = response.json()
            audio_response = requests.get(data['audioUrl'])
            with open('output.wav', 'wb') as f:
                f.write(audio_response.content)
        else:
            # Binary audio data (OpenAI models)
            with open('output.mp3', 'wb') as f:
                f.write(response.content)
        
        return response
    else:
        # Include the response body to make debugging failures easier
        raise Exception(f"Error: {response.status_code} - {response.text}")

# Basic usage
text_to_speech(
    "Hello! Welcome to our service.",
    model="Kokoro-82m",
    voice="af_bella"
)

Async Status and Result Retrieval

Some TTS models run asynchronously. When queued, the API returns HTTP 202 with a ticket containing a runId and model. Use the TTS Status endpoint to poll until the job is complete. Synchronous models return audio immediately and do not require status polling.

Endpoints

  • Submit TTS: POST /api/tts
  • Check TTS Status (async only): GET /api/tts/status?runId=...&model=...

When you see status: “pending”

If your initial POST /api/tts returns HTTP 202 with a body like:
{
  "status": "pending",
  "runId": "98b0d593-fe8d-49b8-89c9-233022232297",
  "model": "Elevenlabs-Turbo-V2.5",
  "charged": true,
  "cost": 0.0050388,
  "paymentSource": "USD",
  "isApiRequest": true
}
…the request is queued. Poll the Status endpoint using the runId and model. If present, include cost, paymentSource, and isApiRequest from the ticket when polling to help with automatic refunds if the upstream provider later rejects content.

cURL — Submit, then Poll

# 1) Submit TTS
curl -X POST https://nano-gpt.com/api/tts \
  -H 'x-api-key: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "text": "Hello there!",
    "model": "Elevenlabs-Turbo-V2.5",
    "voice": "Rachel",
    "speed": 1.0
  }'

# 2) If response is 202/pending, poll using returned values
curl "https://nano-gpt.com/api/tts/status?runId=98b0d593-fe8d-49b8-89c9-233022232297&model=Elevenlabs-Turbo-V2.5&cost=0.0050388&paymentSource=USD&isApiRequest=true" \
  -H 'x-api-key: YOUR_API_KEY'

# 3) On completion, you'll receive an audioUrl
# {
#   "status": "completed",
#   "audioUrl": "https://.../file.mp3",
#   "contentType": "audio/mpeg",
#   "model": "Elevenlabs-Turbo-V2.5"
# }

Synchronous vs. Asynchronous Models

  • Synchronous models (examples: tts-1, tts-1-hd, gpt-4o-mini-tts, Kokoro-82m) return immediately from POST /api/tts with either binary audio or JSON containing { audioUrl, contentType } depending on the provider.
  • Asynchronous models (examples: Elevenlabs-Turbo-V2.5, Elevenlabs-V3, Elevenlabs-Music-V1) return HTTP 202 with a polling ticket. Use GET /api/tts/status until completed.

Best Practices

  • Poll every 2–3 seconds; stop after 2–3 minutes and show a timeout error.
  • Always include runId and model. If available, include cost, paymentSource, and isApiRequest from the ticket for better error handling and refund automation.
  • On completed, prefer using the audioUrl directly (streaming or download). Cache URLs client‑side if you plan to replay.
  • If you receive CONTENT_POLICY_VIOLATION, do not retry the same content; surface a clear message to the user.
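The practices above can be sketched as a minimal polling loop. This is an illustration, not an official client: the helper name and the failure branch for non-pending statuses are assumptions, while the query parameters follow the status URL shown in the cURL example:

```python
import time
import requests

API_BASE = "https://nano-gpt.com/api"
API_KEY = "YOUR_API_KEY"  # placeholder

def wait_for_tts(ticket, poll_interval=2.5, timeout=180):
    """Poll GET /api/tts/status until a queued TTS job completes.

    `ticket` is the JSON body of a 202 response from POST /api/tts.
    Returns the audioUrl on completion; raises on failure or timeout.
    """
    params = {"runId": ticket["runId"], "model": ticket["model"]}
    # Forward billing fields when present to aid automatic refunds.
    for key in ("cost", "paymentSource", "isApiRequest"):
        if key in ticket:
            params[key] = ticket[key]

    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        resp = requests.get(
            f"{API_BASE}/tts/status",
            headers={"x-api-key": API_KEY},
            params=params,
        )
        resp.raise_for_status()
        data = resp.json()
        status = data.get("status")
        if status == "completed":
            return data["audioUrl"]
        if status != "pending":
            # Any other status is assumed terminal (e.g. a content rejection).
            raise RuntimeError(f"TTS job failed: {data}")
        time.sleep(poll_interval)
    raise TimeoutError("TTS job did not complete within the timeout")
```

Pass the entire 202 body as `ticket` so the optional billing fields are forwarded automatically.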

FAQ

  • Why did I get 202/pending? The selected model runs asynchronously; your request was queued and billed after a successful queue submission.
  • Can I cancel a pending TTS? Not currently. Let it complete or time out client‑side.
  • Do all TTS models require polling? No. Only async models. Synchronous models return immediately.

Model-Specific Examples

Kokoro-82m - Multilingual Voices

44 voices across 13 language groups:
# Popular voice examples by category
voices = {
    "american_female": ["af_bella", "af_nova", "af_aoede"],
    "american_male": ["am_adam", "am_onyx", "am_eric"],
    "british_female": ["bf_alice", "bf_emma"],
    "british_male": ["bm_daniel", "bm_george"],
    "japanese_female": ["jf_alpha", "jf_gongitsune"],
    "chinese_female": ["zf_xiaoxiao", "zf_xiaoyi"],
    "french_female": ["ff_siwis"],
    "italian_male": ["im_nicola"]
}

# Generate multilingual samples
samples = [
    {"text": "Hello, welcome!", "voice": "af_bella", "lang": "English"},
    {"text": "Bonjour et bienvenue!", "voice": "ff_siwis", "lang": "French"},
    {"text": "こんにちは!", "voice": "jf_alpha", "lang": "Japanese"},
    {"text": "你好,欢迎!", "voice": "zf_xiaoxiao", "lang": "Chinese"}
]

for sample in samples:
    text_to_speech(
        text=sample["text"],
        model="Kokoro-82m",
        voice=sample["voice"]
    )

Elevenlabs-Turbo-V2.5 - Advanced Voice Controls

Premium quality with style adjustments:
# Stable, consistent voice
text_to_speech(
    text="This is a professional announcement.",
    model="Elevenlabs-Turbo-V2.5",
    voice="Rachel",
    stability=0.9,
    similarity_boost=0.8,
    style=0
)

# Expressive, dynamic voice  
text_to_speech(
    text="This is so exciting!",
    model="Elevenlabs-Turbo-V2.5",
    voice="Rachel",
    stability=0.3,
    similarity_boost=0.7,
    style=0.8,
    speed=1.2
)

# Available voices: Rachel, Adam, Bella, Brian, etc.

OpenAI Models - Multiple Formats & Instructions

# High-definition with voice instructions
text_to_speech(
    text="Welcome to customer service.",
    model="tts-1-hd",
    voice="nova",
    instructions="Speak warmly and professionally like a customer service representative",
    response_format="flac"
)

# Ultra-low cost option
text_to_speech(
    text="This is a cost-effective option.",
    model="gpt-4o-mini-tts",
    voice="alloy",
    instructions="Speak clearly and cheerfully",
    response_format="mp3"
)

# Different format examples
formats = ["mp3", "wav", "opus", "flac", "aac"]
for fmt in formats:
    text_to_speech(
        text=f"This is {fmt.upper()} format.",
        model="tts-1",
        voice="echo",
        response_format=fmt
    )

Response Examples

JSON Response (Most Models)

{
  "audioUrl": "https://storage.url/audio-file.wav",
  "contentType": "audio/wav",
  "model": "Kokoro-82m",
  "text": "Hello world",
  "voice": "af_bella",
  "speed": 1,
  "duration": 2.3,
  "cost": 0.001,
  "currency": "USD"
}

Binary Response (OpenAI Models)

OpenAI models return audio data directly as binary with appropriate headers:
Content-Type: audio/mp3
Content-Length: 123456
[Binary audio data]

Voice Options

Kokoro-82m Voices

  • American Female: af_bella, af_nova, af_aoede, af_jessica, af_sarah
  • American Male: am_adam, am_onyx, am_eric, am_liam
  • British: bf_alice, bf_emma, bm_daniel, bm_george
  • Asian Languages: jf_alpha (Japanese), zf_xiaoxiao (Chinese)
  • European: ff_siwis (French), im_nicola (Italian)

Elevenlabs-Turbo-V2.5 Voices

Rachel, Adam, Bella, Brian, Sarah, Michael, Emily, James, Nicole, and 37 more

OpenAI Voices

alloy, echo, fable, onyx, nova, shimmer, ash, ballad, coral, sage, verse

Error Handling

try:
    result = text_to_speech("Hello world!", model="Kokoro-82m")
    print("Success!")
except Exception as e:
    if "400" in str(e):
        print("Bad request - check parameters")
    elif "401" in str(e):
        print("Unauthorized - check API key")
    elif "413" in str(e):
        print("Text too long for model")
    else:
        print(f"Error: {e}")

Common errors:
  • 400: Invalid parameters or missing text
  • 401: Invalid or missing API key
  • 413: Text exceeds model character limit
  • 429: Rate limit exceeded
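A 429 is usually transient, so a retry with exponential backoff is often enough. The wrapper below is a sketch under that assumption (the function name and retry policy are illustrative, not part of the API):

```python
import time
import requests

def post_with_retry(url, headers, payload, max_retries=3):
    """POST, retrying with exponential backoff on 429 responses (sketch)."""
    for attempt in range(max_retries + 1):
        response = requests.post(url, headers=headers, json=payload)
        if response.status_code != 429 or attempt == max_retries:
            return response
        # Back off 1s, 2s, 4s, ... before retrying.
        time.sleep(2 ** attempt)
```

Do not apply this pattern to 400 or 413 errors; those indicate a problem with the request itself and will fail identically on retry.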

Authorizations

x-api-key
string
header
required

Body

application/json

Text-to-speech generation parameters

text
string
required

The text to convert to speech

Example:

"Hello! This is a test of the text-to-speech API."

model
enum<string>
default:Kokoro-82m

The TTS model to use for generation

Available options:
Kokoro-82m,
Elevenlabs-Turbo-V2.5,
tts-1,
tts-1-hd,
gpt-4o-mini-tts
voice
string

The voice to use for synthesis (available voices depend on selected model)

Example:

"af_bella"

speed
number
default:1

Speech speed multiplier (0.1-5, not supported for gpt-4o-mini-tts)

Required range: 0.1 <= x <= 5
response_format
enum<string>
default:mp3

Audio output format (OpenAI models only)

Available options:
mp3,
opus,
aac,
flac,
wav,
pcm
instructions
string

Voice instructions for fine-tuning (gpt-4o-mini-tts and tts-1-hd only)

Example:

"speak with enthusiasm"

stability
number
default:0.5

Voice stability (Elevenlabs-Turbo-V2.5 only, 0-1)

Required range: 0 <= x <= 1
similarity_boost
number
default:0.75

Voice similarity boost (Elevenlabs-Turbo-V2.5 only, 0-1)

Required range: 0 <= x <= 1
style
number
default:0

Style exaggeration (Elevenlabs-Turbo-V2.5 only, 0-1)

Required range: 0 <= x <= 1

Response

Text-to-speech response. Returns either JSON with audio URL or binary audio data depending on the model.

audioUrl
string<uri>

URL to the generated audio file

Example:

"https://storage.url/audio-file.wav"

contentType
string

MIME type of the audio file

Example:

"audio/wav"

model
string

Model used for generation

text
string

The input text that was synthesized

voice
string

Voice used for synthesis

speed
number

Speed multiplier used

duration
number

Duration of the generated audio in seconds

cost
number

Cost of the generation

currency
string

Currency of the cost