## Overview

Synthesize speech (TTS) or generate music with a single request. The OpenAI-compatible `POST /v1/audio/speech` endpoint returns audio bytes directly in the HTTP response.
Optional chunked streaming is available with `stream: true`. In streaming mode, audio bytes are delivered progressively as they are generated, which reduces time-to-first-byte (TTFB) for real-time playback.

Default behavior is unchanged: omit `stream` (or set it to `false`) to receive one buffered audio file after generation completes.

When you use a music model (for example `Minimax-Music-02`), the `input` field is treated as a music prompt (not text to speak) and `voice` is ignored. For a dedicated music guide and model list, see `api-reference/music-generation.mdx`.
## Endpoint

- Method/Path: `POST https://nano-gpt.com/api/v1/audio/speech`
- Auth: `Authorization: Bearer <API_KEY>`
- Required header: `Content-Type: application/json`
- You may see older examples using `POST https://nano-gpt.com/api/v1/speech`. Prefer `/api/v1/audio/speech` for OpenAI SDK compatibility.
## Request Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `model` | string | None | Audio-speech model ID (TTS or music). |
| `input` | string | None | Text to speak (TTS) or a music prompt (music models). |
| `voice` | string | None | Voice preset for TTS. Ignored for music models. If your client requires this field, pass any string (for example `"alloy"`). |
| `response_format` | string | `mp3` | Output format when supported by the selected provider/model. |
| `speed` | number | `1` | Speaking-rate multiplier for TTS (for example 0.5-2.0). Ignored for music models. |
| `instructions` | string | None | Voice/style instructions (supported by some models/providers). |
| `stream` | boolean | `false` | When `true`, returns chunked audio bytes progressively. |
- Some provider-backed models may support additional fields; accepted parameters vary by model.
- For unsupported models, `stream: true` is ignored and the endpoint returns the normal buffered response.
## Streaming Support (TTS Models)

| Model | Provider | Streaming Support |
|---|---|---|
| `tts-1` | OpenAI | Yes |
| `tts-1-hd` | OpenAI | Yes |
| `gpt-4o-mini-tts` | OpenAI | Yes |
| `Elevenlabs-Turbo-V2.5` | ElevenLabs (via FAL) | Yes |
| `Elevenlabs-V3` | ElevenLabs (via FAL) | Yes |

All other models ignore `stream` and return buffered responses.
## Response Behavior
### Non-streaming (default)

- Status: `200 OK`
- Body: complete audio file returned once generation finishes
- Headers: typically include `Content-Length`
### Streaming (`stream: true`)

- Status: `200 OK`
- Body: audio bytes arrive progressively in chunks
- Headers: no `Content-Length`; uses `Transfer-Encoding: chunked`
- Client can start playback/processing as soon as the first chunk arrives
- `Content-Type` by provider:
  - OpenAI TTS models: matches the selected output format (for example `audio/mpeg`, `audio/wav`, `audio/opus`, `audio/flac`, `audio/aac`, `audio/pcm`)
  - ElevenLabs TTS models: always `audio/mpeg`
### Errors

- If an error happens before streaming starts, the API returns the standard OpenAI-style JSON error envelope.
- If streaming fails after bytes have started, the connection is terminated and clients may receive partial/corrupt audio.
- Common error codes: `invalid_model`, `invalid_voice`, `unsupported_format`, `input_too_long`, `rate_limit_exceeded`.
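For reference, a pre-stream failure returns an OpenAI-style envelope shaped roughly like the following (field values here are illustrative, not an exact API transcript):

```json
{
  "error": {
    "message": "Model 'tts-9' not found",
    "type": "invalid_request_error",
    "code": "invalid_model"
  }
}
```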
## Examples
### Non-streaming request (unchanged)
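A minimal buffered request can be sketched as follows; the model and voice values are illustrative, and `$NANOGPT_API_KEY` is a placeholder for your key:

```bash
curl https://nano-gpt.com/api/v1/audio/speech \
  -H "Authorization: Bearer $NANOGPT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "Hello from the speech endpoint.",
    "voice": "alloy",
    "response_format": "mp3"
  }' \
  --output speech.mp3
```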
### Streaming request (cURL)
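The same request with `stream: true` might look like this; `curl -N` disables output buffering so chunks are written as they arrive (values illustrative):

```bash
curl -N https://nano-gpt.com/api/v1/audio/speech \
  -H "Authorization: Bearer $NANOGPT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "Streaming lets playback start before generation finishes.",
    "voice": "alloy",
    "stream": true
  }' \
  --output streamed.mp3
```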
### OpenAI Node.js SDK (streaming)
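A sketch using the official `openai` Node package pointed at the compatible base URL. Note two assumptions: `stream` is not part of the SDK's typed speech parameters, so a cast is used to pass it through, and Node 18+ is assumed so the web `ReadableStream` body can be async-iterated:

```typescript
import fs from "node:fs";
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://nano-gpt.com/api/v1",
  apiKey: process.env.NANOGPT_API_KEY,
});

async function main(): Promise<void> {
  // `stream` is not in the SDK's typed params for speech; cast to forward it.
  const response = await client.audio.speech.create({
    model: "tts-1",
    voice: "alloy",
    input: "Streaming audio from Node.",
    stream: true,
  } as any);

  // Write audio chunks to disk as they arrive instead of buffering the file.
  const out = fs.createWriteStream("speech.mp3");
  for await (const chunk of response.body as any) {
    out.write(Buffer.from(chunk));
  }
  out.end();
}

main();
```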
### OpenAI Python SDK (streaming)
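A sketch using the official `openai` Python package. `with_streaming_response` reads the body incrementally, and `extra_body` forwards `stream` since it is not a typed parameter of `audio.speech.create` (model/voice values illustrative):

```python
from openai import OpenAI

# Point the official SDK at the OpenAI-compatible base URL.
client = OpenAI(
    base_url="https://nano-gpt.com/api/v1",
    api_key="YOUR_API_KEY",
)

# with_streaming_response avoids buffering the whole file in memory.
with client.audio.speech.with_streaming_response.create(
    model="tts-1",
    voice="alloy",
    input="Streaming audio from Python.",
    extra_body={"stream": True},  # not a typed SDK parameter; pass via extra_body
) as response:
    with open("speech.mp3", "wb") as f:
        for chunk in response.iter_bytes():
            f.write(chunk)
```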
### Compare TTFB (streaming vs non-streaming)
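One way to observe the TTFB difference is curl's `%{time_starttransfer}` write-out variable, which approximates time to first byte. This sketch fires the same request with and without `stream` (values illustrative):

```bash
URL=https://nano-gpt.com/api/v1/audio/speech
BUFFERED='{"model":"tts-1","input":"Measure time to first byte.","voice":"alloy"}'
STREAMED='{"model":"tts-1","input":"Measure time to first byte.","voice":"alloy","stream":true}'

# time_starttransfer ~= TTFB; expect the streamed request to report a lower value.
for BODY in "$BUFFERED" "$STREAMED"; do
  curl -s -N -o /dev/null -w "TTFB: %{time_starttransfer}s\n" "$URL" \
    -H "Authorization: Bearer $NANOGPT_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$BODY"
done
```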
## Notes & Limits

- Max input length: depends on model; measured in characters or tokens. For short, interactive prompts, keep input under roughly 1-2k characters.
- Typical latency: scales with input length and output format; compressed formats like `mp3` are often faster than `wav`.
- Usage metering: billed by input characters for TTS models; output file size does not affect billing.
## Audio Format Support by Provider

| Provider | Supported `response_format` values |
|---|---|
| OpenAI (`tts-1`, `tts-1-hd`, `gpt-4o-mini-tts`) | `mp3` (default), `opus`, `aac`, `flac`, `wav`, `pcm` |
| ElevenLabs (`Elevenlabs-Turbo-V2.5`, `Elevenlabs-V3`) | Always returns `audio/mpeg` (MP3); `response_format` is ignored. |
## Voices

- Voice IDs vary by model/provider. See model-specific voices on Text-to-Speech: `api-reference/text-to-speech.mdx`.
- If a voices listing endpoint is available (for example `GET /v1/voices`), it returns available voice IDs and metadata (language coverage, gender/pitch, sample links).
## Errors & Troubleshooting

- `invalid_model`, `invalid_voice`, `unsupported_format`: verify `model`, `voice`, and `response_format`.
- `input_too_long`: reduce length; split long text into chunks and stitch audio client-side.
- `rate_limit_exceeded`: back off exponentially; retry after the window resets.
- Network/client tips: set `Accept` to your preferred audio type and write raw response bytes directly to a file/stream.
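The chunk-and-stitch workaround for `input_too_long` can be sketched with a hypothetical helper that splits text on sentence boundaries; the boundary rule here (splitting on `". "`) is deliberately simplified, and a single sentence longer than `max_chars` will still exceed the limit:

```python
def split_for_tts(text: str, max_chars: int = 1000) -> list[str]:
    """Split text into chunks of at most max_chars, breaking on '. ' boundaries.

    Hypothetical helper for working around input_too_long. Each chunk is sent
    as a separate request and the resulting audio is concatenated client-side.
    """
    sentences = text.replace("\n", " ").split(". ")
    chunks: list[str] = []
    current = ""
    for s in sentences:
        piece = s if s.endswith(".") else s + "."
        # Start a new chunk when appending would exceed the budget.
        if current and len(current) + 1 + len(piece) > max_chars:
            chunks.append(current)
            current = piece
        else:
            current = f"{current} {piece}".strip()
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk maps to one request body's `input`; the audio files come back in order and can be concatenated (trivially for raw `pcm`, or with a tool like ffmpeg for compressed formats).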
## Security
- Do not expose API keys in browsers. Proxy via your server.
- Redact PII in logs; avoid logging raw text/audio in production.
- Rate-limit public routes.
## Pricing, Quotas, and Rate Limits
- Billing is based on input character count, not output audio size.
- For streaming requests, billing is recorded after the first audio chunk is confirmed. If the upstream provider fails before any audio is produced, no charge is applied.
- If the client disconnects mid-stream after audio starts, the charge still applies because generation already began upstream.
- Rate limits: per-minute/day caps; contact support to request increases. See `api-reference/miscellaneous/pricing.mdx` and `api-reference/miscellaneous/rate-limits.mdx`.
## Migration from Job-based TTS

Already using the async `POST /tts` + `GET /tts/status` flow?
- When to switch: choose `v1/audio/speech` for short prompts, low latency, and direct playback; keep job-based TTS for long/batch generation and webhook workflows.
- Parameter mapping: `text` -> `input`, `voice` stays `voice`, and output format can be requested with `response_format` when supported.
- Retries/timeouts: `v1/audio/speech` returns inline; implement client-side timeouts and simple retries on 5xx.
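The parameter mapping above can be sketched as a small converter. This is illustrative only: the `format` key on the old payload and the `tts-1`/`alloy` fallbacks are assumptions, not part of the documented job-based schema:

```python
def migrate_tts_payload(old: dict) -> dict:
    """Map a job-based /tts payload onto /v1/audio/speech (sketch).

    Applies the mapping text -> input; voice is carried over unchanged.
    The 'format' source key and the default model/voice are assumptions.
    """
    new = {
        "model": old.get("model", "tts-1"),   # assumed default
        "input": old["text"],                 # text -> input
        "voice": old.get("voice", "alloy"),   # assumed default
    }
    if "format" in old:
        new["response_format"] = old["format"]
    return new
```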
## See Also

- Async/job-based TTS: `api-reference/endpoint/tts.mdx`
- TTS status polling: `api-reference/endpoint/tts-status.mdx`