Skip to main content

Overview

Synthesize speech with a single, low‑latency request. The POST /v1/speech endpoint returns audio bytes directly in the HTTP response—ideal for simple request/response flows, UI playback, and short prompts without job polling.

Endpoint

  • Method/Path: POST https://api.nano-gpt.com/v1/speech
  • Auth: Authorization: Bearer <API_KEY>
  • Required header: Content-Type: application/json
  • Optional header: Accept to prefer a MIME type (for example audio/mpeg, audio/wav, audio/ogg). If omitted, the response Content-Type follows the requested format in the body.

Request Body (JSON)

  • model (string, required): TTS‑capable model ID (for example nano-tts-1).
  • input (string, required): The text to synthesize. Plain text is supported.
  • voice (string, required): Voice preset (for example luna, verse, sonic). See Voices below.
  • format (string, optional): Output container/codec. Common values: mp3, wav, ogg, opus, aac, flac, pcm16.
  • sample_rate (number, optional): Required when format=pcm16 (for example 16000).
  • speed (number, optional): Speaking rate multiplier (for example 0.52.0).
  • language (string, optional): BCP‑47 tag (for example en-US).
Additional controls (if supported by the selected model):
  • pitch (number, optional): Pitch shift or style value; model‑specific range.
  • emotion (string, optional): Style tag (for example neutral, friendly, excited).
  • stability (number, optional): 0–1; voice stability (provider‑specific).
  • similarity (number, optional): 0–1; similarity boost (provider‑specific).

Response

  • Success: 200 OK, body contains binary audio.
  • Content-Type: Based on format/Accept (for example audio/mpeg, audio/wav, audio/ogg).
On error, returns JSON with details and a non‑2xx status:
{ "error": { "type": "invalid_request_error", "message": "details" } }
Common error types: invalid_model, invalid_voice, unsupported_format, invalid_sample_rate, input_too_long, rate_limit_exceeded.

Examples

All examples write audio to disk.
curl -X POST \
  -H "Authorization: Bearer $NANOGPT_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Accept: audio/mpeg" \
  https://api.nano-gpt.com/v1/speech \
  -d '{
    "model": "nano-tts-1",
    "input": "Hello from NanoGPT!",
    "voice": "luna",
    "format": "mp3"
  }' \
  --output speech.mp3

Notes & Limits

  • Max input length: depends on model; measured in characters or tokens. For short, interactive prompts, prefer under ~1–2k characters.
  • Language support: varies by model. Specify language to force selection; otherwise, language may be auto‑detected.
  • Typical latency: scales with input length and selected format; compressed formats like mp3 are often faster than wav.
  • Usage metering: billed by characters/tokens or generated audio seconds (provider‑specific). See Pricing.

Audio Formats

Mapping between format and response Content-Type:
formatContent-TypeNotes
mp3audio/mpegWidely supported in browsers
wavaudio/wavPCM; larger payloads
oggaudio/oggContainer; may include Opus
opusaudio/opus or audio/oggStreaming‑friendly
aacaudio/aacSafari‑friendly
flacaudio/flacLossless
pcm16application/octet-streamLittle‑endian mono; requires sample_rate
Browser notes: Safari supports AAC/MP3; Ogg/Opus plays in Chrome/Firefox; WAV is universal but larger.

Voices

  • Voice IDs vary by model/provider. See model‑specific voices on Text‑to‑Speech: api-reference/text-to-speech.mdx.
  • If a voices listing endpoint is available (for example GET /v1/voices), it returns available voice IDs and metadata (language coverage, gender/pitch, sample links).

Errors & Troubleshooting

  • invalid_model, invalid_voice, unsupported_format, invalid_sample_rate: Verify model, voice, format, and required sample_rate for pcm16.
  • input_too_long: Reduce length; split long text into chunks and stitch audio client‑side.
  • rate_limit_exceeded: Exponential backoff; retry after the window resets.
  • Network/client tips: Set Accept to your preferred audio type and write the raw response body directly to a file.

Security

  • Do not expose API keys in browsers. Proxy via your server.
  • Redact PII in logs; avoid logging raw text/audio in production.
  • Rate‑limit public routes.

Pricing, Quotas, and Rate Limits

  • Billing: per characters/tokens or generated seconds depending on provider/model; minimum billing increments may apply.
  • Rate limits: per‑minute/day caps; contact support to request increases. See api-reference/miscellaneous/pricing.mdx and api-reference/miscellaneous/rate-limits.mdx.

Migration from Job‑based TTS

Already using the async POST /tts + GET /tts/status flow?
  • When to switch: choose v1/speech for short prompts, low latency, and direct playback; keep job‑based TTS for long/batch generation and webhook workflows.
  • Parameter mapping: textinput, voice stays voice, output_formatformat, speed/controls map directly when supported.
  • Retries/timeouts: v1/speech returns inline; implement client‑side timeouts and simple retries on 5xx.

Streaming

If chunked audio streaming is enabled for your account, you can request streaming with compatible formats (for example Opus/MP3) and consume the response as a stream. Example Node pattern:
const res = await fetch("https://api.nano-gpt.com/v1/speech", { /* headers/body as above */ });
if (!res.ok) throw new Error(await res.text());
const file = await fs.open("speech.mp3", "w");
for await (const chunk of res.body) {
  await file.write(chunk);
}
await file.close();
Note: Streaming availability and formats may vary by model/provider.

See Also

  • Async/job‑based TTS: api-reference/endpoint/tts.mdx
  • TTS Status polling: api-reference/endpoint/tts-status.mdx
I