Overview
NanoGPT supports voice cloning so you can create reusable custom voices from short reference audio clips and then use them in text-to-speech (TTS). There are two voice-clone providers exposed via NanoGPT:
- MiniMax voice clone: creates a reusable customVoiceId you can pass as voice when using compatible MiniMax Speech TTS models.
- Qwen voice clone (1.7B): generates a speaker embedding file URL that you can pass to Qwen 3 TTS as speaker_voice_embedding_file_url.
Both providers follow the same asynchronous flow:
- Submit a clone job, receive a runId (HTTP 202).
- Poll the status endpoint until status: "completed".
Authentication
All voice clone endpoints support:
- API key auth: x-api-key: <your NanoGPT API key> (or Authorization: Bearer <key>)
- Session auth (web app): browser cookies
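As a minimal sketch of the two API-key header forms listed above, the helper below returns one or the other. The function name `auth_headers` and its `use_bearer` flag are hypothetical, introduced only for illustration.

```python
def auth_headers(api_key: str, use_bearer: bool = False) -> dict[str, str]:
    """Return the API-key header in one of the two documented forms."""
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    if use_bearer:
        # Alternative form: Authorization: Bearer <key>
        return {"Authorization": f"Bearer {api_key}"}
    # Default form: x-api-key: <your NanoGPT API key>
    return {"x-api-key": api_key}
```

Session auth relies on browser cookies, so it is not covered by this helper.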
Endpoints
| Provider | Submit | Status |
|---|---|---|
| MiniMax | POST /api/voice-clone/minimax | POST /api/voice-clone/minimax/status |
| Qwen | POST /api/voice-clone/qwen | POST /api/voice-clone/qwen/status |
MiniMax Voice Clone
Submit a Clone Job
Supported content types:
- multipart/form-data (upload an audio file)
- application/json (provide audioUrl)
| Field | Type | Required | Notes |
|---|---|---|---|
| audio | file | Yes (if no audioUrl) | MP3, M4A, WAV |
| audioUrl | string | Yes (if no audio) | Hosted audio URL |
| customVoiceId / custom_voice_id | string | Yes | Must match ^[A-Za-z][A-Za-z0-9]{7,}$ |
| voiceCloneModel / model | string | No | Example values: speech-02-hd, speech-02-turbo |
| needNoiseReduction / need_noise_reduction | boolean | No | Default false |
| needVolumeNormalization / need_volume_normalization | boolean | No | Default false |
| accuracy | number | No | 0 to 1, default 0.7 |
| text / previewText | string | No | Preview text |
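A JSON submission using the fields in the table might look like the sketch below. It uses only the Python standard library. `BASE_URL` and both function names are assumptions made for illustration; the field names and the endpoint path come from the tables above.

```python
import json
import urllib.request

BASE_URL = "https://nano-gpt.com"  # assumption: adjust to the host you actually call


def build_minimax_clone_request(api_key, audio_url, custom_voice_id, **options):
    """Build (but do not send) a JSON clone submission for MiniMax."""
    payload = {"audioUrl": audio_url, "customVoiceId": custom_voice_id}
    # Optional fields from the table, e.g. accuracy=0.8 or needNoiseReduction=True.
    payload.update(options)
    return urllib.request.Request(
        BASE_URL + "/api/voice-clone/minimax",
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def submit_clone(request):
    """Send the request and return (HTTP status, parsed JSON body).

    Per the overview, a successful submit returns HTTP 202 with a runId.
    """
    with urllib.request.urlopen(request) as response:
        body = json.loads(response.read().decode("utf-8"))
        return response.status, body
```

Building the request separately from sending it makes the payload easy to inspect or log before any charge is incurred.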
Poll Job Status
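Polling goes to POST /api/voice-clone/minimax/status and continues until status is "completed". The sketch below shows one way to structure that loop. The request body shape (`{"runId": ...}`), the failure status names, `BASE_URL`, and the default interval and timeout are all assumptions, not documented values.

```python
import json
import time
import urllib.request

BASE_URL = "https://nano-gpt.com"  # assumption: adjust to the host you actually call


def fetch_minimax_status(api_key, run_id):
    """POST the runId to the status endpoint and return the parsed JSON body."""
    request = urllib.request.Request(
        BASE_URL + "/api/voice-clone/minimax/status",
        data=json.dumps({"runId": run_id}).encode("utf-8"),  # assumed body shape
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_until_completed(fetch_status, interval=2.0, timeout=300.0, sleep=time.sleep):
    """Call fetch_status() repeatedly until the job reports status "completed"."""
    deadline = time.monotonic() + timeout
    while True:
        body = fetch_status()
        status = body.get("status")
        if status == "completed":
            return body
        if status in ("failed", "error"):  # assumed failure values
            raise RuntimeError(f"clone job ended with status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError("clone job did not complete in time")
        sleep(interval)
```

A call such as `wait_until_completed(lambda: fetch_minimax_status(api_key, run_id))` ties the two together. Passing the fetcher in as a callable keeps the loop reusable for the Qwen status endpoint.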
Qwen Voice Clone (1.7B)
Submit a Clone Job
Supported content types:
- multipart/form-data (upload an audio file)
- application/json (provide audioUrl)
| Field | Type | Required | Notes |
|---|---|---|---|
| audio | file | Yes (if no audioUrl) | MP3, OGG, WAV, M4A, AAC |
| audioUrl / audio_url | string | Yes (if no audio) | Hosted audio URL |
| referenceText / reference_text | string | No | Optional transcript |
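For the file-upload variant, a multipart/form-data body can be assembled by hand with the standard library, as in the sketch below. The part names `audio` and `referenceText` come from the table. `BASE_URL`, the function name, and the default `audio/wav` content type are assumptions.

```python
import uuid
import urllib.request

BASE_URL = "https://nano-gpt.com"  # assumption: adjust to the host you actually call


def build_qwen_upload_request(api_key, filename, audio_bytes,
                              content_type="audio/wav", reference_text=None):
    """Build (but do not send) a multipart/form-data clone submission for Qwen."""
    boundary = uuid.uuid4().hex
    parts = []
    if reference_text:
        # Optional transcript of the reference clip, sent as a plain text part.
        parts.append(
            (f"--{boundary}\r\n"
             'Content-Disposition: form-data; name="referenceText"\r\n\r\n'
             f"{reference_text}\r\n").encode("utf-8")
        )
    # The audio file part: headers, a blank line, then the raw bytes.
    parts.append(
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="audio"; filename="{filename}"\r\n'
         f"Content-Type: {content_type}\r\n\r\n").encode("utf-8")
        + audio_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))  # closing boundary
    return urllib.request.Request(
        BASE_URL + "/api/voice-clone/qwen",
        data=b"".join(parts),
        headers={
            "x-api-key": api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )
```

If the clip is already hosted, sending application/json with audioUrl avoids the multipart encoding entirely.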
Poll Job Status
Status responses include an X-Poll-After header indicating how many seconds to wait before polling again.
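The X-Poll-After value can drive the wait between polls. The sketch below reads it defensively. The fallback of 2 seconds and the 60-second cap are assumptions chosen for illustration, and `poll_delay` is a hypothetical helper name.

```python
import math


def poll_delay(headers, default=2.0, max_delay=60.0):
    """Return the number of seconds to wait before the next status poll.

    `headers` is any mapping with a .get() method. Response headers from
    urllib are case-insensitive; a plain dict is matched by exact key.
    """
    raw = headers.get("X-Poll-After")
    try:
        seconds = float(raw)
    except (TypeError, ValueError):
        return default  # header missing or not numeric
    if not math.isfinite(seconds) or seconds < 0:
        return default  # ignore nonsensical values
    return min(seconds, max_delay)
```

Capping the delay guards against an unexpectedly large value stalling the client.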
Response (completed)
Using Cloned Voices with TTS
MiniMax cloned voice (customVoiceId)
Use your customVoiceId as the normal voice on POST /api/tts with a compatible MiniMax Speech TTS model:
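A request body for this call might be built as in the sketch below. Only the `voice` field is taken from the text above. The `model` and `text` key names are assumptions, and the model identifier is left as a placeholder because no specific TTS model name is given here.

```python
import json


def minimax_tts_payload(text, custom_voice_id, model):
    """Build a TTS request body that uses a cloned MiniMax voice."""
    return {
        "model": model,            # assumed key; pass a compatible MiniMax Speech TTS model
        "text": text,              # assumed key for the text to synthesize
        "voice": custom_voice_id,  # the customVoiceId from the clone job
    }


body = json.dumps(
    minimax_tts_payload(
        "Hello from my cloned voice.",
        "MyVoice01",
        "<compatible-minimax-speech-model>",  # placeholder, not a real model id
    )
)
```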
Qwen cloned voice (speakerEmbeddingUrl)
Use speakerEmbeddingUrl as speaker_voice_embedding_file_url on POST /api/tts with Qwen-3-TTS-1.7B:
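The equivalent body for Qwen could be sketched as follows. The `speaker_voice_embedding_file_url` field is taken from the text above. The `model` and `text` key names, and the use of "Qwen-3-TTS-1.7B" as the literal model identifier, are assumptions.

```python
def qwen_tts_payload(text, speaker_embedding_url):
    """Build a TTS request body that uses a Qwen speaker embedding."""
    return {
        "model": "Qwen-3-TTS-1.7B",  # assumed to be the literal model id
        "text": text,                # assumed key for the text to synthesize
        # speakerEmbeddingUrl from the completed clone job:
        "speaker_voice_embedding_file_url": speaker_embedding_url,
    }
```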
Saving MiniMax Voice IDs (Web App)
If you use the NanoGPT web app, you can save and list your MiniMax customVoiceId values.
These endpoints are session-authenticated only (they do not support API key auth).
List Saved Voice IDs
Save a Voice ID
Pricing
Clone runs are charged as a flat per-run fee:
- MiniMax voice clone: $1.00 per run
- Qwen voice clone (1.7B): $0.25 per run
Responses report the cost and paymentSource for the run.
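Because the fee is flat per run, a budget estimate is simple multiplication. The sketch below uses the two prices listed above and assumes every submitted run is billed; whether failed runs are charged is not stated here. The function name is hypothetical.

```python
from decimal import Decimal

# Flat per-run fees in USD, from the pricing list.
CLONE_FEE_USD = {
    "minimax": Decimal("1.00"),
    "qwen": Decimal("0.25"),
}


def estimate_clone_cost(runs: dict[str, int]) -> Decimal:
    """Return the total fee for a mapping of provider name to run count."""
    return sum(
        (CLONE_FEE_USD[provider] * count for provider, count in runs.items()),
        Decimal("0"),
    )
```

For example, two MiniMax runs and four Qwen runs come to 2 × $1.00 + 4 × $0.25 = $3.00.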
Limitations
- Both providers are asynchronous; clients must poll status until completion.
- MiniMax customVoiceId must match ^[A-Za-z][A-Za-z0-9]{7,}$.
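The pattern requires a leading letter followed by at least seven letters or digits, so the shortest valid ID is eight characters. A client-side check before submitting can be sketched as below; the function name is hypothetical.

```python
import re

CUSTOM_VOICE_ID_RE = re.compile(r"^[A-Za-z][A-Za-z0-9]{7,}$")


def is_valid_custom_voice_id(value: str) -> bool:
    """Return True if value satisfies the documented customVoiceId pattern."""
    # fullmatch also rejects a trailing newline, which "$" alone would allow.
    return CUSTOM_VOICE_ID_RE.fullmatch(value) is not None
```

Underscores and hyphens are rejected, as are IDs that start with a digit.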