Overview
The NanoGPT STT API allows you to transcribe audio files into text using state-of-the-art speech recognition models. The API supports multiple languages, speaker diarization, and various audio formats with both synchronous and asynchronous processing options. For drop-in OpenAI SDK compatibility, you can also use the OpenAI-compatible endpoint: POST /api/v1/audio/transcriptions (see api-reference/endpoint/audio-transcriptions.mdx).
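A minimal sketch of a request to the OpenAI-compatible endpoint, built with Python's standard library only. The endpoint path comes from the text above. The host `https://nano-gpt.com`, the Bearer authorization header, and the `file` and `model` form fields are assumptions based on the OpenAI transcription API convention, so check them against the endpoint reference. The request is built but not sent.

```python
import urllib.request
import uuid

# Assumption: the API is served from this host; the text only gives the path.
BASE_URL = "https://nano-gpt.com"
TRANSCRIPTIONS_PATH = "/api/v1/audio/transcriptions"


def build_transcription_request(api_key, filename, audio_bytes, model="Whisper-Large-V3"):
    """Build (but do not send) a multipart/form-data POST request.

    Field names and the auth header follow the OpenAI convention (assumed).
    """
    boundary = uuid.uuid4().hex
    # Text field carrying the model ID (placeholder taken from the models table).
    model_part = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
    ).encode()
    # File field header; the raw audio bytes follow it.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    closing = f"\r\n--{boundary}--\r\n".encode()
    body = model_part + file_header + audio_bytes + closing
    return urllib.request.Request(
        BASE_URL + TRANSCRIPTIONS_PATH,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. For real drop-in use, an OpenAI SDK client pointed at the same base URL is probably the simpler route.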
Available Models
NanoGPT supports multiple Speech-to-Text and audio-to-text workflows, including standard transcription, video transcription, and voice cloning.

| Model ID | Type | Billing | Price |
|---|---|---|---|
| Whisper-Large-V3 | Transcription | Per minute | ~$0.0005/min |
| Wizper | Transcription | Per minute | $0.01/min |
| Elevenlabs-STT | Transcription (async + diarization) | Per minute | $0.03/min |
| gpt-4o-mini-transcribe | Transcription | Per minute | $0.003/min |
| gpt-4o-mini-transcribe-2025-03-20 | Transcription | Per minute | $0.003/min |
| gpt-4o-mini-transcribe-2025-12-15 | Transcription | Per minute | $0.003/min |
| gpt-4o-mini-transcribe-latest | Transcription | Per minute | $0.003/min |
| openai-whisper-with-video | Video transcription | Per minute | $0.06/min |
| qwen-voice-clone | Voice cloning (async) | Per run | $0.25/run |
| minimax-voice-clone | Voice cloning (async) | Per run | $1.00/run |
Authentication
All requests require authentication via an API key.
File Upload Methods
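Which upload method applies depends on file size. A minimal sketch of that check, assuming the 3MB limit means 3 × 1024 × 1024 bytes (the text does not say whether MB is decimal or binary). `choose_upload_method` and `choose_upload_method_for_file` are hypothetical helpers, not part of the API.

```python
import os

# Assumption: "3MB" is read as 3 * 1024 * 1024 bytes.
DIRECT_UPLOAD_LIMIT_BYTES = 3 * 1024 * 1024


def choose_upload_method(size_bytes):
    """Return "direct" for files at or under the limit, otherwise "url"."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    return "direct" if size_bytes <= DIRECT_UPLOAD_LIMIT_BYTES else "url"


def choose_upload_method_for_file(path):
    """Same check, reading the size from a local file."""
    return choose_upload_method(os.path.getsize(path))
```

Checking the size before building the request avoids sending a large multipart body that the direct-upload path would likely reject.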
Method 1: Direct File Upload (≤3MB)
For smaller audio files, upload directly using multipart/form-data.
Method 2: URL Upload (Recommended for >3MB)
For larger files, use URL-based upload.
Advanced Features with Elevenlabs-STT
Speaker Diarization
Identify and label different speakers in conversations.
Language Support
The API supports 97+ languages with auto-detection.
Complete Class Implementation
A complete transcriber class wraps the upload methods above with error handling and retry logic.
Error Handling and Best Practices
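Retry logic is commonly built as exponential backoff around the request call. The following is a generic sketch, not NanoGPT-specific: `with_retries`, `TransientError`, and the delay values are illustrative names and numbers, and which failures are actually worth retrying depends on the API's error responses.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. timeouts)."""


def with_retries(call, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run call(); on TransientError wait base_delay * 2**attempt, then retry."""
    for attempt in range(max_attempts):
        try:
            return call()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter is injectable so the backoff schedule can be tested without waiting. Errors that are not `TransientError` propagate immediately, which is usually what you want for authentication or validation failures.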
Common Error Responses
File Format Support
Pricing and Billing
Transcription models are billed by audio/video duration. Voice cloning models are billed per run.
Transcription (per minute)
- Whisper-Large-V3: ~$0.0005/min
- Wizper: $0.01/min
- Elevenlabs-STT: $0.03/min
- gpt-4o-mini-transcribe (and snapshots/aliases): $0.003/min
- openai-whisper-with-video: $0.06/min
Voice cloning (per run)
- qwen-voice-clone: $0.25/run
- minimax-voice-clone: $1.00/run
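The rates above translate directly into a cost estimate. A small sketch using the listed prices: `estimate_cost` is an illustrative helper, the Whisper-Large-V3 rate is only approximate in the list, and how partial minutes are rounded for billing is not stated, so the duration is used as given.

```python
from decimal import Decimal

# USD per minute of audio/video, from the pricing list.
PER_MINUTE_USD = {
    "Whisper-Large-V3": Decimal("0.0005"),  # listed as approximate (~)
    "Wizper": Decimal("0.01"),
    "Elevenlabs-STT": Decimal("0.03"),
    "gpt-4o-mini-transcribe": Decimal("0.003"),
    "openai-whisper-with-video": Decimal("0.06"),
}
# Snapshots and aliases share the base model's rate.
for _suffix in ("2025-03-20", "2025-12-15", "latest"):
    PER_MINUTE_USD[f"gpt-4o-mini-transcribe-{_suffix}"] = PER_MINUTE_USD["gpt-4o-mini-transcribe"]

# USD per run for voice cloning.
PER_RUN_USD = {
    "qwen-voice-clone": Decimal("0.25"),
    "minimax-voice-clone": Decimal("1.00"),
}


def estimate_cost(model, minutes=0, runs=1):
    """Estimated USD cost: rate * minutes for transcription, rate * runs for cloning."""
    if model in PER_MINUTE_USD:
        return PER_MINUTE_USD[model] * Decimal(str(minutes))
    if model in PER_RUN_USD:
        return PER_RUN_USD[model] * runs
    raise KeyError(f"no price listed for {model!r}")
```

For example, `estimate_cost("Elevenlabs-STT", minutes=90)` comes to $2.70, and three runs of minimax-voice-clone come to $3.00.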