## Overview

NanoGPT provides a drop-in OpenAI-compatible endpoint for speech-to-text (STT) transcription.

## Endpoint
## Authentication
Use either header:

- `Authorization: Bearer YOUR_API_KEY`
- `x-api-key: YOUR_API_KEY`
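As a small sketch, either header form can be built with a helper like the hypothetical `auth_headers` below (the name is illustrative, not part of the API):

```python
def auth_headers(api_key: str, bearer: bool = True) -> dict:
    """Build NanoGPT auth headers; both header forms are accepted."""
    if bearer:
        return {"Authorization": f"Bearer {api_key}"}
    return {"x-api-key": api_key}

auth_headers("YOUR_API_KEY")
# {'Authorization': 'Bearer YOUR_API_KEY'}
```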
## Request Formats
### 1) Multipart upload (OpenAI-compatible)
Send `multipart/form-data` with:

- `file` (required): audio (or video for supported models)
- `model` (required): STT model ID
- `language` (optional): language code (default: auto-detect)
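A minimal multipart sketch using `requests`; the endpoint path and base URL are assumptions, since this page does not spell them out:

```python
def form_fields(model, language=None):
    """Form fields for a multipart transcription request."""
    data = {"model": model}
    if language:  # omit to let the server auto-detect the language
        data["language"] = language
    return data

def transcribe_upload(path, model, api_key, language=None):
    import requests  # imported here so form_fields stays stdlib-only

    with open(path, "rb") as f:
        resp = requests.post(
            "https://nano-gpt.com/api/v1/audio/transcriptions",  # assumed path
            headers={"Authorization": f"Bearer {api_key}"},
            files={"file": f},
            data=form_fields(model, language),
        )
    resp.raise_for_status()
    return resp.json()
```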
### 2) JSON with URL
Send a JSON body with `file_url` or `audio_url` for URL-based transcription.
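A sketch of the JSON body for the URL variant (field set assumed to mirror the multipart form above):

```python
def url_payload(model, file_url, language=None):
    """JSON body for URL-based transcription ("audio_url" is also accepted)."""
    payload = {"model": model, "file_url": file_url}
    if language:
        payload["language"] = language
    return payload

url_payload("Whisper-Large-V3", "https://example.com/clip.mp3")
# {'model': 'Whisper-Large-V3', 'file_url': 'https://example.com/clip.mp3'}
```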
## Supported Models (Examples)
Model availability changes; use `GET /api/v1/models?detailed=true` for discovery.
| Model | Notes |
|---|---|
| Whisper-Large-V3 | High-accuracy transcription |
| Wizper | Fast processing |
| Elevenlabs-STT | Speaker diarization + audio event tagging |
| gpt-4o-mini-transcribe | Improved accuracy vs Whisper (OpenAI family) |
| openai-whisper-with-video | Accepts video files (MP4, MOV, etc.) |
## Voice Cloning (via the same endpoint)
Some special model IDs run voice-cloning workflows instead of returning plain transcription text:

- `qwen-voice-clone`: returns a reusable speaker embedding URL
- `minimax-voice-clone`: returns a reusable custom voice ID (and/or preview output)
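Since the clone models return different artifacts, client code has to branch on the model ID. The response field names below (`embedding_url`, `voice_id`) are hypothetical placeholders, not documented names:

```python
CLONE_MODELS = {"qwen-voice-clone", "minimax-voice-clone"}

def extract_clone_artifact(model, response):
    """Pull the reusable artifact out of a voice-clone response (field names assumed)."""
    if model == "qwen-voice-clone":
        return response.get("embedding_url")  # hypothetical field name
    if model == "minimax-voice-clone":
        return response.get("voice_id")  # hypothetical field name
    raise ValueError(f"{model} is not a voice-clone model")
```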
## Response
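Because the endpoint is OpenAI-compatible, a transcription response is assumed to carry the transcript under a top-level `text` field, as in OpenAI's format. A minimal sketch under that assumption:

```python
def transcript_text(response: dict) -> str:
    # OpenAI-compatible transcription responses put the transcript in "text".
    return response["text"]

transcript_text({"text": "hello world"})
# 'hello world'
```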
## Supported Formats
- Audio: MP3, OGG, WAV, M4A, AAC
- Video (model-dependent): MP4, MOV, AVI, MKV, WEBM
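A small client-side guard built from the lists above can reject unsupported files before uploading (the helper name and behavior are illustrative):

```python
from pathlib import Path

AUDIO_EXTS = {".mp3", ".ogg", ".wav", ".m4a", ".aac"}
VIDEO_EXTS = {".mp4", ".mov", ".avi", ".mkv", ".webm"}

def media_kind(filename: str) -> str:
    """Classify a file by extension; video is accepted only by some models."""
    ext = Path(filename).suffix.lower()
    if ext in AUDIO_EXTS:
        return "audio"
    if ext in VIDEO_EXTS:
        return "video"
    raise ValueError(f"unsupported format: {ext}")

media_kind("meeting.MP3")
# 'audio'
```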
## Example (Python, OpenAI SDK)
## See Also
- NanoGPT transcription workflows: `api-reference/endpoint/transcribe.mdx`
- Full STT guide and model list: `api-reference/speech-to-text.mdx`