Overview
Synthesize speech with a single, low-latency request. The `POST /v1/speech` endpoint returns audio bytes directly in the HTTP response, which suits simple request/response flows, UI playback, and short prompts without job polling.
Endpoint
- Method/Path: `POST https://api.nano-gpt.com/v1/speech`
- Auth: `Authorization: Bearer <API_KEY>`
- Required header: `Content-Type: application/json`
- Optional header: `Accept` to prefer a MIME type (for example `audio/mpeg`, `audio/wav`, `audio/ogg`). If omitted, the response `Content-Type` follows the requested `format` in the body.
Request Body (JSON)
- `model` (string, required): TTS-capable model ID (for example `nano-tts-1`).
- `input` (string, required): The text to synthesize. Plain text is supported.
- `voice` (string, required): Voice preset (for example `luna`, `verse`, `sonic`). See Voices below.
- `format` (string, optional): Output container/codec. Common values: `mp3`, `wav`, `ogg`, `opus`, `aac`, `flac`, `pcm16`.
- `sample_rate` (number, optional): Required when `format=pcm16` (for example `16000`).
- `speed` (number, optional): Speaking rate multiplier (for example `0.5` to `2.0`).
- `language` (string, optional): BCP-47 tag (for example `en-US`).
- `pitch` (number, optional): Pitch shift or style value; model-specific range.
- `emotion` (string, optional): Style tag (for example `neutral`, `friendly`, `excited`).
- `stability` (number, optional): 0–1; voice stability (provider-specific).
- `similarity` (number, optional): 0–1; similarity boost (provider-specific).
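The fields above can be assembled into a request as in the following minimal sketch. It uses only the Python standard library; the helper names `build_speech_request` and `synthesize_to_file` are hypothetical, and the only validation shown is the `sample_rate` requirement for `pcm16` stated in the list. The URL, headers, and field names are taken from this page.

```python
import json
import urllib.request

API_URL = "https://api.nano-gpt.com/v1/speech"  # endpoint from this page


def build_speech_request(api_key, model, text, voice, fmt="mp3",
                         sample_rate=None, **extra):
    """Return a urllib Request for POST /v1/speech (nothing is sent yet)."""
    if fmt == "pcm16" and sample_rate is None:
        raise ValueError("sample_rate is required when format is pcm16")
    body = {"model": model, "input": text, "voice": voice, "format": fmt}
    if sample_rate is not None:
        body["sample_rate"] = sample_rate
    body.update(extra)  # optional fields such as speed, language, pitch
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(request, path, timeout=30):
    """Send the request and write the raw audio bytes to `path`."""
    with urllib.request.urlopen(request, timeout=timeout) as resp, \
            open(path, "wb") as out:
        out.write(resp.read())
```

A call might look like `synthesize_to_file(build_speech_request(key, "nano-tts-1", "Hello!", "luna"), "hello.mp3")`, where `key` holds your API key. Building the request separately from sending it keeps the payload easy to inspect and test.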
Response
- Success: `200 OK`, body contains binary audio.
- `Content-Type`: Based on `format`/`Accept` (for example `audio/mpeg`, `audio/wav`, `audio/ogg`).
- Errors: `invalid_model`, `invalid_voice`, `unsupported_format`, `invalid_sample_rate`, `input_too_long`, `rate_limit_exceeded`.
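A client usually needs to tell an audio payload apart from an error before writing anything to disk. The sketch below is one way to do that. This page does not specify the shape of the error body, so the code assumes only that one of the listed error codes appears somewhere in the body text; `SpeechError` and `handle_speech_response` are hypothetical names.

```python
KNOWN_ERROR_CODES = (
    "invalid_model", "invalid_voice", "unsupported_format",
    "invalid_sample_rate", "input_too_long", "rate_limit_exceeded",
)


class SpeechError(Exception):
    """Raised when the response is not a successful audio payload."""

    def __init__(self, status, code, detail):
        super().__init__(f"HTTP {status}: {code or 'unknown_error'}")
        self.status = status
        self.code = code      # one of KNOWN_ERROR_CODES, or None
        self.detail = detail  # raw body text, for debugging


def handle_speech_response(status, content_type, body):
    """Return audio bytes on 200 OK with an audio type, else raise SpeechError."""
    is_audio = content_type.startswith(("audio/", "application/octet-stream"))
    if status == 200 and is_audio:
        return body
    text = body.decode("utf-8", errors="replace")
    # Assumption: the error code string appears verbatim in the body.
    code = next((c for c in KNOWN_ERROR_CODES if c in text), None)
    raise SpeechError(status, code, text)
```

Scanning for the code string rather than parsing a fixed JSON path keeps the sketch independent of the actual error schema; replace it with proper parsing once the schema is known.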
Examples
All examples write audio to disk.
Notes & Limits
- Max input length: depends on model; measured in characters or tokens. For short, interactive prompts, prefer under ~1–2k characters.
- Language support: varies by model. Specify `language` to force selection; otherwise, language may be auto-detected.
- Typical latency: scales with input length and selected `format`; compressed formats like `mp3` are often faster than `wav`.
- Usage metering: billed by characters/tokens or generated audio seconds (provider-specific). See Pricing.
Audio Formats
Mapping between `format` and response `Content-Type`:
format | Content-Type | Notes |
---|---|---|
mp3 | audio/mpeg | Widely supported in browsers |
wav | audio/wav | PCM; larger payloads |
ogg | audio/ogg | Container; may include Opus |
opus | audio/opus or audio/ogg | Streaming‑friendly |
aac | audio/aac | Safari‑friendly |
flac | audio/flac | Lossless |
pcm16 | application/octet-stream | Little‑endian mono; requires sample_rate |
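The table can be encoded as a lookup to sanity-check a response, and raw `pcm16` output can be wrapped in a WAV container for playback. This is a sketch using Python's standard `wave` module; the names `CONTENT_TYPES`, `matches_format`, and `pcm16_to_wav` are hypothetical, and the mono, 16-bit, little-endian layout follows the `pcm16` row above.

```python
import io
import wave

# format -> Content-Type values listed in the table above
CONTENT_TYPES = {
    "mp3": ("audio/mpeg",),
    "wav": ("audio/wav",),
    "ogg": ("audio/ogg",),
    "opus": ("audio/opus", "audio/ogg"),
    "aac": ("audio/aac",),
    "flac": ("audio/flac",),
    "pcm16": ("application/octet-stream",),
}


def matches_format(fmt, content_type):
    """True if the response Content-Type is one the table lists for fmt."""
    mime = content_type.split(";", 1)[0].strip().lower()
    return mime in CONTENT_TYPES.get(fmt, ())


def pcm16_to_wav(pcm_bytes, sample_rate):
    """Wrap raw little-endian mono 16-bit PCM in a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)        # mono, per the pcm16 row
        wav.setsampwidth(2)        # 16 bits = 2 bytes per sample
        wav.setframerate(sample_rate)
        wav.writeframes(pcm_bytes)
    return buf.getvalue()
```

Pass the same `sample_rate` you sent in the request; a mismatched rate yields audio that plays too fast or too slow.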
Voices
- Voice IDs vary by model/provider. See model-specific voices on Text-to-Speech: `api-reference/text-to-speech.mdx`.
- If a voices listing endpoint is available (for example `GET /v1/voices`), it returns available voice IDs and metadata (language coverage, gender/pitch, sample links).
Errors & Troubleshooting
- `invalid_model`, `invalid_voice`, `unsupported_format`, `invalid_sample_rate`: Verify `model`, `voice`, `format`, and required `sample_rate` for `pcm16`.
- `input_too_long`: Reduce length; split long text into chunks and stitch audio client-side.
- `rate_limit_exceeded`: Exponential backoff; retry after the window resets.
- Network/client tips: Set `Accept` to your preferred audio type and write the raw response body directly to a file.
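The backoff advice for `rate_limit_exceeded` can be sketched as a small wrapper around any callable that performs the request. The name `retry_with_backoff`, the attempt count, and the delay values are illustrative assumptions; the caller supplies `is_retryable` to decide which exceptions warrant another attempt.

```python
import random
import time


def retry_with_backoff(call, is_retryable, max_attempts=5,
                       base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Run call(); on a retryable exception, wait and try again."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            # Give up on the last attempt or on non-retryable errors.
            if attempt == max_attempts - 1 or not is_retryable(exc):
                raise
            # Delay doubles each attempt (1s, 2s, 4s, ...), capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Add up to 50% random jitter so clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
```

If the API returns a retry hint for the rate-limit window, prefer waiting for that interval over the computed delay.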
Security
- Do not expose API keys in browsers. Proxy via your server.
- Redact PII in logs; avoid logging raw text/audio in production.
- Rate‑limit public routes.
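One way to follow the logging advice is to record request metadata instead of the text itself. The sketch below logs the character count and a short hash prefix, which lets you correlate requests without storing their content. The logger name `tts` and the function `log_request_summary` are hypothetical.

```python
import hashlib
import logging

logger = logging.getLogger("tts")


def log_request_summary(text, voice, fmt):
    """Log request metadata without including the raw input text."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
    summary = {
        "chars": len(text),
        "sha256_prefix": digest,  # for correlation only
        "voice": voice,
        "format": fmt,
    }
    logger.info("tts request %s", summary)
    return summary
```

A hash of very short or predictable text can still be guessed by brute force, so drop the digest if inputs may contain sensitive short strings.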
Pricing, Quotas, and Rate Limits
- Billing: per characters/tokens or generated seconds depending on provider/model; minimum billing increments may apply.
- Rate limits: per-minute/day caps; contact support to request increases. See `api-reference/miscellaneous/pricing.mdx` and `api-reference/miscellaneous/rate-limits.mdx`.
Migration from Job‑based TTS
Already using the async `POST /tts` + `GET /tts/status` flow?
- When to switch: choose `v1/speech` for short prompts, low latency, and direct playback; keep job-based TTS for long/batch generation and webhook workflows.
- Parameter mapping: `text` → `input`, `voice` stays `voice`, `output_format` → `format`, speed/controls map directly when supported.
- Retries/timeouts: `v1/speech` returns inline; implement client-side timeouts and simple retries on 5xx.
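The parameter mapping above amounts to two key renames, with everything else passed through unchanged. A minimal sketch, assuming job parameters are held in a plain dictionary (`job_params_to_speech` is a hypothetical helper name):

```python
def job_params_to_speech(job_params):
    """Translate job-based TTS parameters to a /v1/speech request body."""
    renames = {"text": "input", "output_format": "format"}
    # Keys not listed in `renames` (voice, speed, ...) keep their names.
    return {renames.get(key, key): value for key, value in job_params.items()}
```

Controls that the target model does not support are passed through as-is here, so filter them out first if your model rejects unknown fields.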
Streaming
If chunked audio streaming is enabled for your account, you can request streaming with compatible formats (for example Opus/MP3) and consume the response as a stream. Example Node pattern:
See Also
- Async/job-based TTS: `api-reference/endpoint/tts.mdx`
- TTS Status polling: `api-reference/endpoint/tts-status.mdx`