Overview
NanoGPT supports AI music generation through the same OpenAI-compatible Text-to-Speech endpoint. When you specify a music model, the input field is treated as a music prompt (not text to speak) and the API returns an audio file.
Endpoint
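The concrete URL is not shown here; since the API is OpenAI-compatible, the route presumably mirrors OpenAI's TTS path under the nano-gpt.com API base (an assumption — confirm against the API reference):

```text
POST https://nano-gpt.com/api/v1/audio/speech
```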
Choosing A Music Model
Music model availability changes over time. Discover available audio models via GET https://nano-gpt.com/api/v1/audio-models (see api-reference/endpoint/models.mdx) and select a model intended for music generation.
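Model discovery can be sketched with Python's standard library. Bearer authentication is an assumption based on the API being OpenAI-compatible, and the shape of the response payload is also assumed — adjust both to the actual API reference:

```python
import json
import urllib.request

API_KEY = "YOUR_API_KEY"  # placeholder; supply your real NanoGPT key

# Build the discovery request; Bearer auth is assumed (OpenAI-compatible APIs
# typically use an Authorization header).
req = urllib.request.Request(
    "https://nano-gpt.com/api/v1/audio-models",
    headers={"Authorization": f"Bearer {API_KEY}"},
)

def list_music_models() -> list:
    """Fetch the audio-model list and keep entries that look music-capable.

    The 'data' key and the substring filter are assumptions about the
    response shape; inspect the real payload and adjust.
    """
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return [m for m in data.get("data", []) if "music" in str(m).lower()]
```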
Request Format
The request follows the standard TTS format, including the voice field. For music models, voice is ignored, so it can be any string (many examples use "alloy").
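The request body described above can be sketched as a JSON payload. The model ID here is a placeholder, not a real NanoGPT model — discover real IDs via the audio-models endpoint:

```python
import json

# Standard OpenAI-style TTS body, reused for music generation.
payload = {
    "model": "example-music-model",  # placeholder; discover real IDs via /audio-models
    "input": "An upbeat 120 BPM synthwave track with retro drums and a bright lead",
    "voice": "alloy",  # ignored for music models, but some clients require the field
}

# Serialize to UTF-8 bytes, ready to send as an application/json body.
body = json.dumps(payload).encode("utf-8")
```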
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | A music-capable audio model ID (discover via GET https://nano-gpt.com/api/v1/audio-models) |
| input | string | Yes | A text prompt describing the music you want generated |
| voice | string | No | Ignored for music models. Some OpenAI-compatible clients require this field; if so, pass any string (for example "alloy"). |
Response
The response is an audio file (typically MP3). The Content-Type header indicates the format.
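Picking a file extension from the Content-Type header can be sketched as follows; the MIME-to-extension mapping is a simplified assumption covering common audio formats:

```python
def extension_for(content_type: str) -> str:
    """Map a Content-Type header value to a file extension (assumed mapping)."""
    mapping = {
        "audio/mpeg": ".mp3",
        "audio/wav": ".wav",
        "audio/ogg": ".ogg",
    }
    # Strip any parameters like "; charset=..." before the lookup.
    base = content_type.split(";")[0].strip().lower()
    return mapping.get(base, ".mp3")  # default to MP3, the typical format

def save_audio(data: bytes, content_type: str, stem: str = "music") -> str:
    """Write the raw response bytes to disk and return the resulting path."""
    path = stem + extension_for(content_type)
    with open(path, "wb") as f:
        f.write(data)
    return path
```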
Examples
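An end-to-end call can be sketched in Python. The /v1/audio/speech path mirrors the OpenAI TTS route and is an assumption, as is Bearer auth; the model ID passed by the caller must come from the audio-models endpoint:

```python
import json
import urllib.request

def build_music_request(api_key: str, prompt: str, model: str) -> urllib.request.Request:
    """Assemble the POST request; split out from sending so it can be inspected."""
    body = json.dumps({
        "model": model,
        "input": prompt,
        "voice": "alloy",  # ignored for music models
    }).encode("utf-8")
    return urllib.request.Request(
        "https://nano-gpt.com/api/v1/audio/speech",  # assumed path, mirrors OpenAI TTS
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )

def generate_music(api_key: str, prompt: str, model: str) -> bytes:
    """POST a music prompt and return the raw audio bytes."""
    with urllib.request.urlopen(build_music_request(api_key, prompt, model)) as resp:
        return resp.read()
```

The returned bytes can be written to disk with a filename extension chosen from the response's Content-Type header.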
Tips
- Be descriptive: include genre, instruments, tempo (BPM), mood, and style.
- Duration: generation duration varies by model. Most models produce ~10-30 seconds by default; duration may be influenced by the prompt.
- Cost and quality vary by model. If cost predictability matters, prefer models with flat per-generation pricing (when available).
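The first tip above — covering genre, instruments, tempo, and mood — can be sketched as a small prompt builder. The parameter names are illustrative, not an API contract; only the final string is sent in the input field:

```python
def music_prompt(genre: str, instruments: list, bpm: int, mood: str) -> str:
    """Compose a descriptive music prompt from the attributes the tips recommend."""
    return (
        f"{mood} {genre} track at {bpm} BPM "
        f"featuring {', '.join(instruments)}"
    )
```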