Overview
NanoGPT provides a fully OpenAI-compatible embeddings API that offers access to both OpenAI's industry-leading embedding models and a curated selection of alternative embedding models at competitive prices. Our API supports 16 different embedding models, providing options for various use cases, languages, and budget requirements.

Quick Start
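A minimal quick-start sketch, assuming Node 18+ (global `fetch`) and an API key in a `NANOGPT_API_KEY` environment variable; the request and response shapes follow the OpenAI embeddings convention this doc describes:

```typescript
const BASE_URL = "https://nano-gpt.com/api/v1";

// Build the request body; shape follows the OpenAI embeddings API.
function buildEmbeddingRequest(model: string, input: string | string[]) {
  return { model, input };
}

async function embed(text: string): Promise<number[]> {
  const res = await fetch(`${BASE_URL}/embeddings`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.NANOGPT_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(buildEmbeddingRequest("text-embedding-3-small", text)),
  });
  if (!res.ok) throw new Error(`Embeddings request failed: ${res.status}`);
  const json = await res.json();
  return json.data[0].embedding; // OpenAI-style response shape
}
```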
Available Models
OpenAI Models
Model | Dimensions | Max Tokens | Price/1M tokens | Features |
---|---|---|---|---|
text-embedding-3-small | 1536 | 8191 | $0.02 | Dimension reduction support, most cost-effective |
text-embedding-3-large | 3072 | 8191 | $0.13 | Dimension reduction support, highest performance |
text-embedding-ada-002 | 1536 | 8191 | $0.10 | Legacy model, no dimension reduction |
Alternative Models
Multilingual Models
Model | Dimensions | Price/1M tokens | Description |
---|---|---|---|
BAAI/bge-m3 | 1024 | $0.01 | Excellent multilingual support |
jina-clip-v1 | 768 | $0.04 | Multimodal CLIP embeddings |
Language-Specific Models
Model | Language | Dimensions | Price/1M tokens |
---|---|---|---|
BAAI/bge-large-en-v1.5 | English | 768 | $0.01 |
BAAI/bge-large-zh-v1.5 | Chinese | 1024 | $0.01 |
jina-embeddings-v2-base-en | English | 768 | $0.05 |
jina-embeddings-v2-base-de | German | 768 | $0.05 |
jina-embeddings-v2-base-zh | Chinese | 768 | $0.05 |
jina-embeddings-v2-base-es | Spanish | 768 | $0.05 |
Specialized Models
Model | Use Case | Dimensions | Price/1M tokens |
---|---|---|---|
jina-embeddings-v2-base-code | Code | 768 | $0.05 |
Baichuan-Text-Embedding | General | 1024 | $0.088 |
netease-youdao/bce-embedding-base_v1 | General | 1024 | $0.02 |
zhipu-embedding-2 | Chinese | 1024 | $0.07 |
Qwen/Qwen3-Embedding-0.6B | General | 1024 | $0.01 |
API Endpoints
Create Embeddings
Endpoint: POST https://nano-gpt.com/api/v1/embeddings
Create embeddings for one or more text inputs.
Discover Embedding Models
Endpoint: GET https://nano-gpt.com/api/v1/embedding-models
List all available embedding models with detailed information.
Advanced Features
Batch Processing
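Batching can be sketched as below, splitting a large corpus at the documented 2048-inputs-per-request limit and passing each batch as an array `input` (the response shape is assumed OpenAI-compatible, with per-item `index` fields):

```typescript
// Split a large corpus into chunks of at most `size` items
// (the documented per-request limit is 2048 inputs).
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// One request per chunk; `input` may be an array of strings.
async function embedAll(texts: string[], model = "text-embedding-3-small") {
  const vectors: number[][] = [];
  for (const batch of chunk(texts, 2048)) {
    const res = await fetch("https://nano-gpt.com/api/v1/embeddings", {
      method: "POST",
      headers: {
        Authorization: `Bearer ${process.env.NANOGPT_API_KEY}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ model, input: batch }),
    });
    const json = await res.json();
    // Results carry an `index` field; sort to preserve input order.
    json.data.sort((a: any, b: any) => a.index - b.index);
    for (const d of json.data) vectors.push(d.embedding);
  }
  return vectors;
}
```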
Process multiple texts efficiently in a single request.

Dimension Reduction
Reduce embedding dimensions for faster similarity comparisons (supported models only):
- text-embedding-3-small
- text-embedding-3-large
- Qwen/Qwen3-Embedding-0.6B
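With a supporting model, reduction is just an extra `dimensions` field in the request body (the parameter name follows the OpenAI embeddings API):

```typescript
// Request reduced-dimension vectors from a model that supports the
// `dimensions` parameter (per this doc: text-embedding-3-small,
// text-embedding-3-large, Qwen/Qwen3-Embedding-0.6B).
function buildReducedRequest(input: string, dimensions: number) {
  return { model: "text-embedding-3-small", input, dimensions };
}

// Example body sent to POST /api/v1/embeddings:
const body = buildReducedRequest("hello world", 256);
```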
Base64 Encoding
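Assuming the OpenAI-style `encoding_format: "base64"` option, each embedding arrives as a base64 string of little-endian float32 values; a decoding sketch:

```typescript
// Decode a base64-encoded embedding into a Float32Array.
// Uint8Array.from copies into a fresh, aligned buffer.
function decodeEmbedding(b64: string): Float32Array {
  const bytes = Uint8Array.from(Buffer.from(b64, "base64"));
  return new Float32Array(bytes.buffer);
}

// Round-trip demonstration with a known vector:
const original = new Float32Array([0.25, -1.5, 3.0]);
const encoded = Buffer.from(original.buffer).toString("base64");
const decoded = decodeEmbedding(encoded);
```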
For more efficient data transfer, request base64-encoded embeddings.

Use Cases
Semantic Search
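Semantic search reduces to ranking documents by cosine similarity between the query embedding and each document embedding; a minimal sketch over precomputed vectors:

```typescript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank documents by similarity to the query, highest first.
function rank(query: number[], docs: { id: string; embedding: number[] }[]) {
  return docs
    .map((d) => ({ id: d.id, score: cosineSimilarity(query, d.embedding) }))
    .sort((x, y) => y.score - x.score);
}
```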
Build powerful search systems that understand meaning.

RAG (Retrieval Augmented Generation)
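For RAG, the retrieval step picks the top-k chunks by embedding similarity and prepends them to the prompt; a hedged sketch (the prompt template and helper names are illustrative):

```typescript
// Cosine similarity, redefined so this snippet stands alone.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

// Pick the k most relevant chunks and build an augmented prompt.
function buildRagPrompt(
  question: string,
  queryEmbedding: number[],
  chunks: { text: string; embedding: number[] }[],
  k = 3,
): string {
  const top = [...chunks]
    .sort((a, b) =>
      cosine(queryEmbedding, b.embedding) - cosine(queryEmbedding, a.embedding))
    .slice(0, k);
  return `Answer using only this context:\n` +
    top.map((c) => `- ${c.text}`).join("\n") +
    `\n\nQuestion: ${question}`;
}
```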
Enhance LLM responses with relevant context.

Clustering & Classification
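Classification over embeddings can be as simple as assigning each vector to its nearest labeled centroid, a minimal stand-in for full k-means or kNN pipelines:

```typescript
// Assign an embedding to the label of the nearest centroid
// (squared Euclidean distance).
function nearestLabel(
  embedding: number[],
  centroids: { label: string; center: number[] }[],
): string {
  let best = centroids[0].label;
  let bestDist = Infinity;
  for (const c of centroids) {
    let d = 0;
    for (let i = 0; i < embedding.length; i++) {
      const diff = embedding[i] - c.center[i];
      d += diff * diff;
    }
    if (d < bestDist) {
      bestDist = d;
      best = c.label;
    }
  }
  return best;
}
```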
Group similar texts or classify content.

Duplicate Detection
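Near-duplicates can be flagged by thresholding pairwise cosine similarity; the 0.9 threshold below is an assumption to tune per corpus:

```typescript
// Return id pairs whose cosine similarity meets the threshold.
function findDuplicates(
  items: { id: string; embedding: number[] }[],
  threshold = 0.9,
): [string, string][] {
  const cos = (a: number[], b: number[]) => {
    let dot = 0, na = 0, nb = 0;
    for (let i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      na += a[i] * a[i];
      nb += b[i] * b[i];
    }
    return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
  };
  const pairs: [string, string][] = [];
  for (let i = 0; i < items.length; i++) {
    for (let j = i + 1; j < items.length; j++) {
      if (cos(items[i].embedding, items[j].embedding) >= threshold) {
        pairs.push([items[i].id, items[j].id]);
      }
    }
  }
  return pairs;
}
```

Note the O(n²) pairwise loop; for large corpora an approximate nearest-neighbor index would replace it.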
Find similar or duplicate content.

Model Selection Guide
By Use Case
Use Case | Recommended Model | Rationale |
---|---|---|
General English text | text-embedding-3-small | Best price/performance ratio |
Maximum accuracy | text-embedding-3-large | Highest quality embeddings |
Multilingual content | BAAI/bge-m3 | Excellent cross-language performance |
Code embeddings | jina-embeddings-v2-base-code | Specialized for programming languages |
Budget-conscious | BAAI/bge-large-en-v1.5 | Just $0.01/1M tokens |
Chinese content | BAAI/bge-large-zh-v1.5 | Optimized for Chinese |
Fast similarity search | Models with dimension reduction | Can reduce dimensions for speed |
By Requirements
Need the fastest search?
- Use models supporting dimension reduction
- Reduce to 256-512 dimensions
- Trade a small accuracy loss for a 2-4x speed improvement

Need maximum accuracy?
- Use text-embedding-3-large
- Keep the full 3072 dimensions
- Best for critical applications

Working with multilingual content?
- Use BAAI/bge-m3 for general multilingual text
- Use language-specific Jina models for the best per-language performance

Embedding code?
- Use jina-embeddings-v2-base-code
- Optimized for programming language semantics
Best Practices
Performance Optimization
- Batch Requests: Send up to 2048 texts in a single request
- Use Dimension Reduction: Reduce dimensions when exact precision isn’t critical
- Cache Embeddings: Store computed embeddings to avoid re-processing
- Choose Appropriate Models: Don’t use 3072-dimension models if 768 suffices
Cost Optimization
- Monitor Usage: Track the usage field in responses
- Start Small: Begin with text-embedding-3-small before upgrading
- Implement Caching: Avoid re-embedding identical content
- Batch Processing: Reduce API call overhead
Quality Optimization
- Preprocess Text: Clean and normalize text before embedding
- Consider Context: Include relevant context in the text to embed
- Test Different Models: Compare performance for your specific use case
- Use Appropriate Similarity Metrics: Cosine similarity for most cases
Integration Examples
JavaScript/TypeScript
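A typed client sketch using the global `fetch` of Node 18+; the response interface assumes the OpenAI-compatible shape (including the `usage` field mentioned under Cost Optimization), and `NANOGPT_API_KEY` is an assumed environment variable:

```typescript
// Response shape follows the OpenAI embeddings convention.
interface EmbeddingResponse {
  data: { index: number; embedding: number[] }[];
  model: string;
  usage: { prompt_tokens: number; total_tokens: number };
}

async function createEmbeddings(
  input: string | string[],
  model = "text-embedding-3-small",
): Promise<EmbeddingResponse> {
  const res = await fetch("https://nano-gpt.com/api/v1/embeddings", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.NANOGPT_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model, input }),
  });
  if (!res.ok) throw new Error(`Embeddings request failed: HTTP ${res.status}`);
  return (await res.json()) as EmbeddingResponse;
}
```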
cURL
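An equivalent request from the command line, assuming your key is exported as `NANOGPT_API_KEY`:

```shell
# OpenAI-compatible request shape; set NANOGPT_API_KEY first.
curl https://nano-gpt.com/api/v1/embeddings \
  -H "Authorization: Bearer $NANOGPT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-small",
    "input": "The quick brown fox"
  }'
```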
Direct API Usage
Rate Limits & Error Handling
Rate Limits
- Default: 100 requests per second per IP address
- Internal requests: No rate limiting (requires internal auth token)
Error Codes
Code | Description | Solution |
---|---|---|
401 | Invalid or missing API key | Check your API key |
400 | Invalid request parameters | Verify model name and input format |
429 | Rate limit exceeded | Implement exponential backoff |
500 | Server error | Retry with exponential backoff |
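The exponential-backoff guidance for 429 and 500 responses can be sketched as follows; the base delay, cap, and "equal jitter" strategy are illustrative defaults, not prescribed values:

```typescript
// Exponential backoff with jitter: delay doubles per attempt up to a cap,
// and the lower half is randomized to spread out retries.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 30_000): number {
  const exp = Math.min(capMs, baseMs * 2 ** attempt);
  return exp / 2 + Math.random() * (exp / 2); // "equal jitter"
}

// Retry a failing async operation up to maxAttempts times.
async function withRetries<T>(fn: () => Promise<T>, maxAttempts = 5): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt + 1 >= maxAttempts) throw err;
      await new Promise((r) => setTimeout(r, backoffDelayMs(attempt)));
    }
  }
}
```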
Error Response Format
Migration from OpenAI
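Because the API is OpenAI-compatible, migration with the official `openai` npm package should come down to a configuration change; a sketch, assuming that package and a `NANOGPT_API_KEY` environment variable:

```typescript
import OpenAI from "openai";

// Only the base URL and key change; model names and calls stay the same.
const client = new OpenAI({
  apiKey: process.env.NANOGPT_API_KEY,    // was: OPENAI_API_KEY
  baseURL: "https://nano-gpt.com/api/v1", // was: https://api.openai.com/v1
});

async function example() {
  return client.embeddings.create({
    model: "text-embedding-3-small",
    input: "Hello world",
  });
}
```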
Switching from OpenAI to NanoGPT is seamless.

Pricing Summary
Price Range | Models | Best For |
---|---|---|
$0.01/1M | BAAI models, Qwen | Budget applications |
$0.02/1M | text-embedding-3-small, netease-youdao | Balanced performance |
$0.04-0.05/1M | Jina models | Specialized use cases |
$0.07-0.088/1M | zhipu, Baichuan | Specific requirements |
$0.10/1M | ada-002 | Legacy compatibility |
$0.13/1M | text-embedding-3-large | Maximum performance |