Overview
NanoGPT provides a fully OpenAI-compatible embeddings API that offers access to both OpenAI’s embedding models and a curated selection of alternative embedding models at competitive prices. The API supports 20+ embedding models, and the list changes over time; use `GET /api/v1/embedding-models` for the source-of-truth list.
Quick Start
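A minimal sketch of a first request, using only Python's standard library. It assumes the request body follows OpenAI's embeddings schema (`model`, `input`), that the API key is sent as a Bearer token, and that vectors come back under `data[i].embedding`. None of these details are confirmed by this page beyond the "OpenAI-compatible" claim.

```python
import json
import urllib.request

EMBEDDINGS_URL = "https://nano-gpt.com/api/v1/embeddings"


def build_request(api_key: str, model: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for a single text input."""
    body = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        EMBEDDINGS_URL,
        data=body,
        method="POST",
        headers={
            # Assumption: Bearer-token auth, as in OpenAI's API.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def embed(api_key: str, model: str, text: str) -> list[float]:
    """Send the request and return the first embedding vector."""
    with urllib.request.urlopen(build_request(api_key, model, text)) as resp:
        payload = json.load(resp)
    # Assumption: OpenAI-style response shape, data[0].embedding.
    return payload["data"][0]["embedding"]
```

Calling `embed(api_key, "text-embedding-3-small", "Hello, world")` should return a list of floats whose length matches the model's dimension count in the tables below, if the assumptions above hold.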
Available Models
OpenAI Models
| Model | Dimensions | Max Tokens | Price/1M tokens | Features |
|---|---|---|---|---|
| text-embedding-3-small | 1536 | 8191 | $0.02 | Dimension reduction support, most cost-effective |
| text-embedding-3-large | 3072 | 8191 | $0.13 | Dimension reduction support, highest performance |
| text-embedding-ada-002 | 1536 | 8191 | $0.10 | Legacy model, no dimension reduction |
Alternative Models
Multilingual Models
| Model | Dimensions | Price/1M tokens | Description |
|---|---|---|---|
| BAAI/bge-m3 | 1024 | $0.01 | Excellent multilingual support |
| jina-clip-v1 | 768 | $0.04 | Multimodal CLIP embeddings |
Language-Specific Models
| Model | Language | Dimensions | Price/1M tokens |
|---|---|---|---|
| BAAI/bge-base-en-v1.5 | English | 768 | $0.01 |
| BAAI/bge-large-en-v1.5 | English | 1024 | $0.01 |
| BAAI/bge-large-zh-v1.5 | Chinese | 1024 | $0.01 |
| jina-embeddings-v2-base-en | English | 768 | $0.05 |
| jina-embeddings-v2-base-de | German | 768 | $0.05 |
| jina-embeddings-v2-base-zh | Chinese | 768 | $0.05 |
| jina-embeddings-v2-base-es | Spanish | 768 | $0.05 |
Specialized Models
| Model | Use Case | Dimensions | Price/1M tokens |
|---|---|---|---|
| BAAI/bge-reranker-large | Reranking | 1024 | $0.01 |
| jina-embeddings-v2-base-code | Code | 768 | $0.05 |
| Baichuan-Text-Embedding | General | 1024 | $0.088 |
| netease-youdao/bce-embedding-base_v1 | General | 1024 | $0.02 |
| zhipu-embedding-2 | Chinese | 1024 | $0.07 |
| Qwen/Qwen3-Embedding-0.6B | General | 1024 | $0.01 |
| Qwen/Qwen3-Embedding-4B | General | 1536 | $0.03 |
| Qwen/Qwen3-Embedding-8B | General | 1536 | $0.05 |
| jina-embeddings-v3 | General | 1024 | $0.10 |
| jina-embeddings-v4 | General | 2048 | $0.10 |
| gemini-embedding-001 | General | 3072 | $0.15 |
| doubao-embedding-large-text-240915 | General | 4096 | $0.10 |
API Endpoints
Create Embeddings
Endpoint: `POST https://nano-gpt.com/api/v1/embeddings`
Create embeddings for one or more text inputs.
Discover Embedding Models
Endpoint: `GET https://nano-gpt.com/api/v1/embedding-models`
List all available embedding models with detailed information.
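As a rough sketch, the model list can be fetched with a plain GET request. The response schema is not described on this page, so the helper below simply returns the parsed JSON. Whether the endpoint requires an API key is also not stated, so the Bearer header is optional and is an assumption.

```python
import json
import urllib.request
from typing import Any, Optional

MODELS_URL = "https://nano-gpt.com/api/v1/embedding-models"


def build_models_request(api_key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) the GET request for the model list."""
    headers = {"Accept": "application/json"}
    if api_key:
        # Assumption: Bearer-token auth, sent only if a key is supplied.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(MODELS_URL, method="GET", headers=headers)


def list_embedding_models(api_key: Optional[str] = None) -> Any:
    """Fetch the model list and return the decoded JSON as-is."""
    with urllib.request.urlopen(build_models_request(api_key)) as resp:
        return json.load(resp)
```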
Advanced Features
Batch Processing
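A minimal batching sketch, assuming `input` accepts a list of strings and each response item carries an `index` field, as in OpenAI's schema. The 2048-item cap comes from the Best Practices section; the helper names are hypothetical.

```python
from typing import Iterator, Sequence

MAX_BATCH_SIZE = 2048  # per-request limit quoted under Best Practices


def batches(texts: Sequence[str], size: int = MAX_BATCH_SIZE) -> Iterator[list[str]]:
    """Yield consecutive slices of `texts`, each with at most `size` items."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(texts), size):
        yield list(texts[start:start + size])


def batch_payload(model: str, texts: Sequence[str]) -> dict:
    """Request body for one batch (assumed OpenAI-style list input)."""
    return {"model": model, "input": list(texts)}


def vectors_in_input_order(response: dict) -> list[list[float]]:
    """Sort response items by their assumed `index` field and return the vectors."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]
```

Each batch would be sent as its own POST to the embeddings endpoint, and the per-batch results concatenated in order.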
Process multiple texts efficiently in a single request.
Dimension Reduction
Reduce embedding dimensions for faster similarity comparisons (supported models only):
- text-embedding-3-small
- text-embedding-3-large
- Qwen/Qwen3-Embedding-0.6B
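The sketch below builds a request body that asks for reduced dimensions. It assumes the field is named `dimensions`, as in OpenAI's schema; the supported-model list and native sizes are taken from this page, and the helper name is hypothetical.

```python
# Native dimensions of the models listed as supporting reduction.
DIMENSION_REDUCTION_MODELS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "Qwen/Qwen3-Embedding-0.6B": 1024,
}


def reduced_payload(model: str, texts: list[str], dimensions: int) -> dict:
    """Build a request body asking for `dimensions`-sized vectors."""
    native = DIMENSION_REDUCTION_MODELS.get(model)
    if native is None:
        raise ValueError(f"{model} is not listed as supporting dimension reduction")
    if not 1 <= dimensions <= native:
        raise ValueError(f"dimensions must be between 1 and {native}")
    # Assumption: the parameter is called `dimensions`, as in OpenAI's API.
    return {"model": model, "input": texts, "dimensions": dimensions}
```

Validating on the client side avoids spending a request on a model that would reject the parameter.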
Base64 Encoding
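A sketch of requesting and decoding base64 embeddings. It assumes the request flag is `encoding_format: "base64"` and that the payload is a packed array of little-endian float32 values, which is how OpenAI's API encodes them. Neither detail is confirmed on this page.

```python
import base64
import struct


def base64_payload(model: str, texts: list[str]) -> dict:
    """Request body asking for base64-encoded vectors (assumed field name)."""
    return {"model": model, "input": texts, "encoding_format": "base64"}


def decode_embedding(b64: str) -> list[float]:
    """Decode a base64 string of little-endian float32 values into floats."""
    raw = base64.b64decode(b64)
    if len(raw) % 4:
        raise ValueError("payload length is not a multiple of 4 bytes")
    count = len(raw) // 4
    return list(struct.unpack(f"<{count}f", raw))
```

A float32 takes 4 bytes, while its JSON decimal form often takes more than twice that, which is where the transfer savings come from.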
For more efficient data transfer, request base64-encoded embeddings.
Use Cases
Semantic Search
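A minimal sketch of the ranking step in semantic search: score every stored document vector against the query vector with cosine similarity and keep the best matches. It assumes the vectors were already obtained from the embeddings endpoint; the function names are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(
    query_vec: list[float],
    doc_vecs: dict[str, list[float]],
    top_k: int = 3,
) -> list[tuple[str, float]]:
    """Return the `top_k` (doc_id, score) pairs, best match first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]
```

A linear scan like this is fine for a few thousand documents; larger collections usually call for a vector index.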
Build powerful search systems that understand meaning.
RAG (Retrieval Augmented Generation)
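A rough sketch of the retrieval half of a RAG pipeline: pick the chunks whose embeddings are closest to the question's embedding and paste them into a prompt. The prompt wording and function names are illustrative assumptions. Sending the prompt to a chat model is left out.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity, returning 0.0 for zero-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_rag_prompt(
    question: str,
    question_vec: list[float],
    chunks: list[tuple[str, list[float]]],
    top_k: int = 2,
) -> str:
    """Select the `top_k` most similar chunks and embed them in a prompt."""
    ranked = sorted(chunks, key=lambda c: cosine(question_vec, c[1]), reverse=True)
    context = "\n\n".join(text for text, _ in ranked[:top_k])
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```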
Enhance LLM responses with relevant context.
Clustering & Classification
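One simple way to classify with embeddings is a nearest-centroid rule: average the vectors of a few labeled examples per class, then assign a new vector to the class whose centroid is most similar. This is a sketch of that one technique, not the only option. The names are hypothetical and the vectors are assumed to come from the embeddings endpoint.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity, returning 0.0 for zero-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def centroid(vectors: list[list[float]]) -> list[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def classify(vec: list[float], labeled: dict[str, list[list[float]]]) -> str:
    """Return the label whose centroid has the highest cosine similarity to `vec`."""
    centroids = {label: centroid(examples) for label, examples in labeled.items()}
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))
```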
Group similar texts or classify content.
Duplicate Detection
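A minimal duplicate-detection sketch: compare every pair of embeddings and flag pairs whose cosine similarity reaches a threshold. The 0.95 default is an illustrative assumption and should be tuned per model and dataset; the names are hypothetical.

```python
import math
from itertools import combinations


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity, returning 0.0 for zero-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_duplicates(
    vectors: dict[str, list[float]],
    threshold: float = 0.95,  # illustrative value, not a recommendation from this page
) -> list[tuple[str, str]]:
    """Return every pair of IDs whose vectors are at least `threshold` similar."""
    return [
        (a, b)
        for a, b in combinations(vectors, 2)
        if cosine(vectors[a], vectors[b]) >= threshold
    ]
```

The pairwise comparison is quadratic in the number of items, so this form suits small collections only.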
Find similar or duplicate content.
Model Selection Guide
By Use Case
| Use Case | Recommended Model | Rationale |
|---|---|---|
| General English text | text-embedding-3-small | Best price/performance ratio |
| Maximum accuracy | text-embedding-3-large | Highest quality embeddings |
| Multilingual content | BAAI/bge-m3 | Excellent cross-language performance |
| Code embeddings | jina-embeddings-v2-base-code | Specialized for programming languages |
| Budget-conscious | BAAI/bge-large-en-v1.5 | Just $0.01/1M tokens |
| Chinese content | BAAI/bge-large-zh-v1.5 | Optimized for Chinese |
| Fast similarity search | Models with dimension reduction | Can reduce dimensions for speed |
By Requirements
Need fastest search?
- Use models supporting dimension reduction
- Reduce to 256-512 dimensions
- Trade small accuracy loss for 2-4x speed improvement
Need highest accuracy?
- Use `text-embedding-3-large`
- Keep full 3072 dimensions
- Best for critical applications
Working with multiple languages?
- Use `BAAI/bge-m3` for general multilingual
- Use language-specific Jina models for best per-language performance
Working with code?
- Use `jina-embeddings-v2-base-code`
- Optimized for programming language semantics
Best Practices
Performance Optimization
- Batch Requests: Send up to 2048 texts in a single request
- Use Dimension Reduction: Reduce dimensions when exact precision isn’t critical
- Cache Embeddings: Store computed embeddings to avoid re-processing
- Choose Appropriate Models: Don’t use 3072-dimension models if 768 suffices
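The caching advice above can be sketched as a small in-memory wrapper keyed by a hash of model and text. `EmbeddingCache` and the `embed_fn` callback are hypothetical names; `embed_fn` stands in for whatever function actually calls the API.

```python
import hashlib
from typing import Callable


class EmbeddingCache:
    """In-memory cache so identical (model, text) pairs are embedded only once."""

    def __init__(self, embed_fn: Callable[[str, str], list[float]]):
        self._embed_fn = embed_fn
        self._store: dict[str, list[float]] = {}

    @staticmethod
    def _key(model: str, text: str) -> str:
        # The NUL separator keeps ("ab", "c") and ("a", "bc") from colliding.
        return hashlib.sha256(f"{model}\x00{text}".encode("utf-8")).hexdigest()

    def get(self, model: str, text: str) -> list[float]:
        """Return the cached vector, calling `embed_fn` only on a cache miss."""
        key = self._key(model, text)
        if key not in self._store:
            self._store[key] = self._embed_fn(model, text)
        return self._store[key]
```

The model name is part of the key because vectors from different models are not interchangeable. A persistent store could replace the dictionary without changing the interface.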
Cost Optimization
- Monitor Usage: Track the `usage` field in responses
- Start Small: Begin with `text-embedding-3-small` before upgrading
- Implement Caching: Avoid re-embedding identical content
- Batch Processing: Reduce API call overhead
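As a worked example of usage tracking, the cost of a request is `tokens / 1,000,000 × price per 1M tokens`. The sketch below assumes the response's `usage` object exposes `total_tokens`, as in OpenAI's format; the prices are copied from the tables on this page and may go stale.

```python
# USD per 1M tokens, copied from the model tables (subject to change).
PRICE_PER_MILLION_USD = {
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
    "BAAI/bge-m3": 0.01,
}


def request_cost_usd(model: str, response: dict) -> float:
    """Estimate the cost of one response from its token usage."""
    # Assumption: OpenAI-style `usage.total_tokens` field.
    tokens = response["usage"]["total_tokens"]
    return tokens / 1_000_000 * PRICE_PER_MILLION_USD[model]
```

For instance, 500,000 tokens on `text-embedding-3-small` works out to 0.5 × $0.02 = $0.01.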
Quality Optimization
- Preprocess Text: Clean and normalize text before embedding
- Consider Context: Include relevant context in the text to embed
- Test Different Models: Compare performance for your specific use case
- Use Appropriate Similarity Metrics: Cosine similarity for most cases
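What counts as "clean" text depends on the application, but a common baseline is Unicode normalization plus whitespace collapsing. The following is one possible sketch of that baseline, not a prescribed pipeline; `normalize_text` is a hypothetical helper.

```python
import re
import unicodedata


def normalize_text(text: str) -> str:
    """Apply NFKC normalization, collapse whitespace runs, and trim the ends."""
    text = unicodedata.normalize("NFKC", text)  # e.g. ligatures and non-breaking spaces
    text = re.sub(r"\s+", " ", text)            # any whitespace run becomes one space
    return text.strip()
```

Applying the same normalization to both stored documents and incoming queries keeps their embeddings comparable.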
Integration Examples
JavaScript/TypeScript
cURL
Direct API Usage
Rate Limits & Error Handling
Rate Limits
Rate limits vary by endpoint and account. See Rate Limits.
Error Codes
| Code | Description | Solution |
|---|---|---|
| 401 | Invalid or missing API key | Check your API key |
| 400 | Invalid request parameters | Verify model name and input format |
| 429 | Rate limit exceeded | Implement exponential backoff |
| 500 | Server error | Retry with exponential backoff |
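The exponential backoff suggested for 429 and 500 responses can be sketched as a small retry wrapper. The attempt count, base delay, and jitter are illustrative assumptions; `with_backoff` is a hypothetical helper that wraps any zero-argument function raising `urllib.error.HTTPError`.

```python
import random
import time
import urllib.error
from typing import Callable, TypeVar

T = TypeVar("T")
RETRYABLE_STATUS = {429, 500}  # the codes the table above says to retry


def with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying retryable HTTP errors with doubling delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except urllib.error.HTTPError as err:
            last_attempt = attempt == max_attempts - 1
            if err.code not in RETRYABLE_STATUS or last_attempt:
                raise
            # Delay doubles each attempt; random jitter spreads out retries.
            sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
    raise RuntimeError("unreachable: the loop always returns or raises")
```

Errors such as 400 and 401 are re-raised immediately, since retrying an invalid request or a bad key cannot succeed.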
Error Response Format
Migration from OpenAI
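Assuming an existing codebase uses the official `openai` Python package (v1 or later), migration may come down to pointing the client at a different base URL and swapping the key. The base URL below is derived from the embeddings endpoint on this page; the helper names are hypothetical.

```python
NANOGPT_BASE_URL = "https://nano-gpt.com/api/v1"


def make_client(api_key: str):
    """Return an OpenAI SDK client configured to call NanoGPT."""
    # Third-party dependency (pip install openai), imported lazily.
    from openai import OpenAI

    return OpenAI(api_key=api_key, base_url=NANOGPT_BASE_URL)


def embed(client, text: str, model: str = "text-embedding-3-small") -> list[float]:
    """Same call as against OpenAI; only the client configuration differs."""
    response = client.embeddings.create(model=model, input=text)
    return response.data[0].embedding
```

The rest of the calling code should stay unchanged if the API is as compatible as described.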
Switching from OpenAI to NanoGPT is seamless.
Pricing Summary
| Price Range | Models | Best For |
|---|---|---|
| $0.01/1M | BAAI models, Qwen/Qwen3-Embedding-0.6B | Budget applications |
| $0.02/1M | text-embedding-3-small, netease-youdao | Balanced performance |
| $0.04-0.05/1M | jina-clip-v1, Jina v2 models | Specialized use cases |
| $0.07-0.088/1M | zhipu, Baichuan | Specific requirements |
| $0.10/1M | ada-002 | Legacy compatibility |
| $0.13/1M | text-embedding-3-large | Maximum performance |