Overview
NanoGPT provides a fully OpenAI-compatible embeddings API. It serves OpenAI's embedding models and a curated selection of alternative embedding models from other providers. The API supports 20+ embedding models, and the list changes over time; use GET /api/v1/embedding-models as the source of truth.
Quick Start
```python
from openai import OpenAI

# Initialize client pointing to NanoGPT
client = OpenAI(
    api_key="YOUR_NANOGPT_API_KEY",
    base_url="https://nano-gpt.com/api/v1"
)

# Create embedding
response = client.embeddings.create(
    input="Your text to embed",
    model="text-embedding-3-small"
)
embedding = response.data[0].embedding
print(f"Embedding has {len(embedding)} dimensions")
```
Available Models
OpenAI Models
| Model | Dimensions | Max Tokens | Price/1M tokens | Features |
|---|---|---|---|---|
| text-embedding-3-small | 1536 | 8191 | $0.02 | Dimension reduction support, most cost-effective |
| text-embedding-3-large | 3072 | 8191 | $0.13 | Dimension reduction support, highest performance |
| text-embedding-ada-002 | 1536 | 8191 | $0.10 | Legacy model, no dimension reduction |
Alternative Models
Multilingual Models
| Model | Dimensions | Price/1M tokens | Description |
|---|---|---|---|
| BAAI/bge-m3 | 1024 | $0.01 | Excellent multilingual support |
| jina-clip-v1 | 768 | $0.04 | Multimodal CLIP embeddings |
Language-Specific Models
| Model | Language | Dimensions | Price/1M tokens |
|---|---|---|---|
| BAAI/bge-base-en-v1.5 | English | 768 | $0.01 |
| BAAI/bge-large-en-v1.5 | English | 1024 | $0.01 |
| BAAI/bge-large-zh-v1.5 | Chinese | 1024 | $0.01 |
| jina-embeddings-v2-base-en | English | 768 | $0.05 |
| jina-embeddings-v2-base-de | German | 768 | $0.05 |
| jina-embeddings-v2-base-zh | Chinese | 768 | $0.05 |
| jina-embeddings-v2-base-es | Spanish | 768 | $0.05 |
Specialized Models
| Model | Use Case | Dimensions | Price/1M tokens |
|---|---|---|---|
| BAAI/bge-reranker-large | Reranking | 1024 | $0.01 |
| jina-embeddings-v2-base-code | Code | 768 | $0.05 |
| Baichuan-Text-Embedding | General | 1024 | $0.088 |
| netease-youdao/bce-embedding-base_v1 | General | 1024 | $0.02 |
| zhipu-embedding-2 | Chinese | 1024 | $0.07 |
| Qwen/Qwen3-Embedding-0.6B | General | 1024 | $0.01 |
| Qwen/Qwen3-Embedding-4B | General | 1536 | $0.03 |
| Qwen/Qwen3-Embedding-8B | General | 1536 | $0.05 |
| jina-embeddings-v3 | General | 1024 | $0.10 |
| jina-embeddings-v4 | General | 2048 | $0.10 |
| gemini-embedding-001 | General | 3072 | $0.15 |
| doubao-embedding-large-text-240915 | General | 4096 | $0.10 |
API Endpoints
Create Embeddings
Endpoint: POST https://nano-gpt.com/api/v1/embeddings
Create embeddings for one or more text inputs.
Discover Embedding Models
Endpoint: GET https://nano-gpt.com/api/v1/embedding-models
List all available embedding models with detailed information.
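A minimal standard-library sketch for calling the discovery endpoint and pulling out model IDs. Two things here are assumptions, not documented behavior: that the endpoint accepts the same Bearer header as the embeddings endpoint, and that it returns OpenAI-style list JSON of the form {"data": [{"id": ...}, ...]}. Print the raw JSON once to confirm the real shape before relying on model_ids. Both function names are hypothetical.

```python
import json
import urllib.request

MODELS_URL = "https://nano-gpt.com/api/v1/embedding-models"


def fetch_embedding_models(api_key: str) -> dict:
    """GET the discovery endpoint and return the parsed JSON body."""
    request = urllib.request.Request(
        MODELS_URL,
        headers={"Authorization": f"Bearer {api_key}"},  # assumed auth scheme
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def model_ids(payload: dict) -> list[str]:
    """Extract model IDs, ASSUMING an OpenAI-style {"data": [{"id": ...}]} shape."""
    return [entry["id"] for entry in payload.get("data", []) if "id" in entry]


# Usage (makes a network call):
# payload = fetch_embedding_models("YOUR_NANOGPT_API_KEY")
# print(json.dumps(payload, indent=2)[:2000])  # inspect the real shape first
# print(model_ids(payload))
```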
Advanced Features
Batch Processing
Process multiple texts efficiently in a single request:
```python
texts = [
    "First text to embed",
    "Second text to embed",
    "Third text to embed"
]
response = client.embeddings.create(
    input=texts,  # Pass a list of strings
    model="text-embedding-3-small"
)
# Access embeddings by index
for i, data in enumerate(response.data):
    print(f"Text {i}: {len(data.embedding)} dimensions")
```
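Each item in the response's data list carries an index field, as in OpenAI's embeddings response format, which NanoGPT's API is designed to mirror. Placing results by that field rather than by list position keeps embeddings aligned with their inputs even if items are ever returned out of order. The sketch below works on the plain-dict form of the response (for example the JSON from a raw HTTP request); embeddings_in_input_order is a hypothetical helper name.

```python
def embeddings_in_input_order(data: list[dict]) -> list[list[float]]:
    """Return embeddings ordered to match the input list, using each item's 'index'."""
    ordered: list = [None] * len(data)
    for item in data:
        ordered[item["index"]] = item["embedding"]  # place by index, not by position
    if any(vector is None for vector in ordered):
        raise ValueError("response is missing one or more indices")
    return ordered
```

With the OpenAI SDK objects, the equivalent one-liner is sorting response.data by each item's index attribute before reading .embedding.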
Dimension Reduction
Reduce embedding dimensions for faster similarity comparisons (supported models only):
```python
# Reduce dimensions to 256 for faster processing
response = client.embeddings.create(
    input="Your text here",
    model="text-embedding-3-small",
    dimensions=256  # Reduce from 1536 to 256
)
```
Supported models for dimension reduction:
- text-embedding-3-small
- text-embedding-3-large
- Qwen/Qwen3-Embedding-0.6B
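The dimensions parameter shortens vectors on the server. For vectors you have already stored at full length, OpenAI's documentation for the text-embedding-3 models describes shortening them client-side by keeping the leading values and re-normalizing to unit length. The sketch below assumes you apply it only to models trained to tolerate this; truncating other models' vectors the same way is likely to hurt quality. shorten_embedding is a hypothetical helper.

```python
import math


def shorten_embedding(embedding: list[float], dims: int) -> list[float]:
    """Keep the first `dims` values and rescale the result to unit (L2) length."""
    if not 0 < dims <= len(embedding):
        raise ValueError(f"dims must be between 1 and {len(embedding)}")
    head = embedding[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0:
        raise ValueError("cannot normalize an all-zero vector")
    return [x / norm for x in head]
```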
Base64 Encoding
For more efficient data transfer, request base64-encoded embeddings:
```python
response = client.embeddings.create(
    input="Your text here",
    model="text-embedding-3-small",
    encoding_format="base64"  # Each data[i].embedding is a base64 string, not a list of floats
)
```
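A base64 response has to be decoded before use. OpenAI encodes each vector as packed little-endian float32 values; this sketch assumes NanoGPT uses the same layout, which you can confirm by comparing one decoded vector with the same input requested in the default float format. decode_embedding is a hypothetical helper.

```python
import base64
import struct


def decode_embedding(b64: str) -> list[float]:
    """Decode a base64 string of packed little-endian float32 values (assumed layout)."""
    raw = base64.b64decode(b64)
    if len(raw) % 4:
        raise ValueError("payload length is not a multiple of 4 bytes")
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))
```

For the request above, decode_embedding(response.data[0].embedding) would give the vector as a list of floats.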
Use Cases
Semantic Search
Build powerful search systems that understand meaning:
```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Create embeddings for your documents
documents = ["Document 1 text", "Document 2 text", "Document 3 text"]
doc_embeddings = []
for doc in documents:
    response = client.embeddings.create(input=doc, model="text-embedding-3-small")
    doc_embeddings.append(response.data[0].embedding)

# Create embedding for search query
query = "Search query text"
query_response = client.embeddings.create(input=query, model="text-embedding-3-small")
query_embedding = query_response.data[0].embedding

# Calculate similarities
similarities = cosine_similarity([query_embedding], doc_embeddings)[0]

# Find most similar documents (highest similarity first)
top_matches = np.argsort(similarities)[::-1][:3]
for idx in top_matches:
    print(f"Document {idx}: {similarities[idx]:.3f} similarity")
```
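For a small corpus, the same ranking works without NumPy or scikit-learn. This standard-library sketch computes cosine similarity directly and uses heapq.nlargest to keep the k best matches; cosine and top_k are illustrative names, and a linear scan like this only suits modest corpus sizes.

```python
import heapq
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: list[float], docs: list[list[float]], k: int = 3) -> list[tuple[int, float]]:
    """Return (document index, similarity) pairs for the k most similar documents."""
    scores = ((i, cosine(query, doc)) for i, doc in enumerate(docs))
    return heapq.nlargest(k, scores, key=lambda pair: pair[1])
```

With the variables from the example above, top_k(query_embedding, doc_embeddings) returns the same top three documents as the NumPy version.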
RAG (Retrieval Augmented Generation)
Enhance LLM responses with relevant context:
```python
import numpy as np

# 1. Embed and store your knowledge base
knowledge_base = [
    {"text": "Company founded in 2020...", "embedding": None},
    {"text": "Product features include...", "embedding": None},
]
for item in knowledge_base:
    response = client.embeddings.create(
        input=item["text"],
        model="text-embedding-3-small"
    )
    item["embedding"] = response.data[0].embedding

# 2. For a user query, find relevant context
user_query = "When was the company founded?"
query_response = client.embeddings.create(
    input=user_query,
    model="text-embedding-3-small"
)
query_embedding = np.array(query_response.data[0].embedding)

# 3. Rank facts by cosine similarity to the query and keep the top 3
def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

ranked = sorted(
    knowledge_base,
    key=lambda item: cosine(query_embedding, np.array(item["embedding"])),
    reverse=True,
)
relevant_facts = "\n".join(item["text"] for item in ranked[:3])

# 4. Use retrieved context with chat completion
chat_response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": f"Use this context to answer:\n{relevant_facts}"},
        {"role": "user", "content": user_query}
    ]
)
print(chat_response.choices[0].message.content)
```
Clustering & Classification
Group similar texts or classify content:
```python
from sklearn.cluster import KMeans

# Create embeddings for texts (replace with your own list)
texts = ["Text 1", "Text 2", "Text 3"]
embeddings = []
for text in texts:
    response = client.embeddings.create(input=text, model="text-embedding-3-small")
    embeddings.append(response.data[0].embedding)

# Cluster embeddings (KMeans needs at least as many texts as clusters)
kmeans = KMeans(n_clusters=min(5, len(texts)), random_state=0)
clusters = kmeans.fit_predict(embeddings)

# Each text now has a cluster assignment
for text, cluster_id in zip(texts, clusters):
    print(f"'{text}' belongs to cluster {cluster_id}")
```
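For the classification half of this use case, one simple approach (an illustrative choice, not a NanoGPT feature) is nearest-centroid classification: average the embeddings of a few labeled examples per class, then assign a new text to the class whose centroid is most similar. In practice the vectors would come from client.embeddings.create as in the clustering example; toy 2-D vectors keep this sketch self-contained, and the function names are hypothetical.

```python
import math
from collections import defaultdict


def _cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def build_centroids(examples: list[tuple[str, list[float]]]) -> dict[str, list[float]]:
    """Average the embeddings of the labeled examples for each label."""
    grouped = defaultdict(list)
    for label, vector in examples:
        grouped[label].append(vector)
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def classify(vector: list[float], centroids: dict[str, list[float]]) -> str:
    """Return the label whose centroid has the highest cosine similarity to `vector`."""
    return max(centroids, key=lambda label: _cosine(vector, centroids[label]))


# Toy 2-D vectors stand in for real embeddings
centroids = build_centroids([
    ("billing", [0.9, 0.1]),
    ("billing", [0.8, 0.2]),
    ("support", [0.1, 0.9]),
])
print(classify([0.7, 0.3], centroids))  # billing
```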
Duplicate Detection
Find similar or duplicate content:
```python
from sklearn.metrics.pairwise import cosine_similarity

def find_duplicates(texts, threshold=0.95):
    embeddings = []
    # Generate embeddings
    for text in texts:
        response = client.embeddings.create(
            input=text,
            model="text-embedding-3-small"
        )
        embeddings.append(response.data[0].embedding)

    # Calculate pairwise similarities (an N x N matrix)
    similarities = cosine_similarity(embeddings)

    # Collect every pair (i, j) whose similarity exceeds the threshold
    duplicates = []
    for i in range(len(texts)):
        for j in range(i + 1, len(texts)):
            if similarities[i][j] > threshold:
                duplicates.append((i, j, similarities[i][j]))
    return duplicates
```
Model Selection Guide
By Use Case
| Use Case | Recommended Model | Rationale |
|---|---|---|
| General English text | text-embedding-3-small | Best price/performance ratio |
| Maximum accuracy | text-embedding-3-large | Highest quality embeddings |
| Multilingual content | BAAI/bge-m3 | Excellent cross-language performance |
| Code embeddings | jina-embeddings-v2-base-code | Specialized for programming languages |
| Budget-conscious | BAAI/bge-large-en-v1.5 | Just $0.01/1M tokens |
| Chinese content | BAAI/bge-large-zh-v1.5 | Optimized for Chinese |
| Fast similarity search | Models with dimension reduction | Can reduce dimensions for speed |
By Requirements
Need fastest search?
- Use models supporting dimension reduction
- Reduce to 256-512 dimensions
- Trade small accuracy loss for 2-4x speed improvement
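To make the trade-off concrete, here is a rough estimate of raw vector storage, assuming each value is stored as a 4-byte float32 (your vector store may use another type and will add index overhead). Per-comparison work scales the same way, since a dot product touches every dimension. raw_vector_bytes is a hypothetical helper.

```python
BYTES_PER_FLOAT32 = 4  # assumed storage type


def raw_vector_bytes(num_vectors: int, dims: int) -> int:
    """Raw storage for num_vectors embeddings of `dims` float32 values each."""
    return num_vectors * dims * BYTES_PER_FLOAT32


for dims in (1536, 512, 256):
    gib = raw_vector_bytes(1_000_000, dims) / 2**30
    print(f"{dims:>4} dims x 1M vectors: {gib:.2f} GiB")
```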
Need highest accuracy?
- Use text-embedding-3-large
- Keep full 3072 dimensions
- Best for critical applications
Processing many languages?
- Use BAAI/bge-m3 for general multilingual content
- Use language-specific Jina models for best per-language performance
Working with code?
- Use jina-embeddings-v2-base-code
- Optimized for programming language semantics
Best Practices
- Batch Requests: Send up to 2048 texts in a single request
- Use Dimension Reduction: Reduce dimensions when exact precision isn’t critical
- Cache Embeddings: Store computed embeddings to avoid re-processing
- Choose Appropriate Models: Don’t use 3072-dimension models if 768 suffices
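When you have more texts than fit in one request, split them into batches. This sketch uses the 2048-texts-per-request figure from the Batch Requests item above as its default; a request may also be subject to other limits (such as total tokens) that it does not check. embed_batch stands in for whatever performs the API call, and embed_in_batches is a hypothetical helper.

```python
from typing import Callable, Sequence

MAX_BATCH = 2048  # per-request text limit from the best practices above


def embed_in_batches(
    texts: Sequence[str],
    embed_batch: Callable[[list[str]], list[list[float]]],
    batch_size: int = MAX_BATCH,
) -> list[list[float]]:
    """Embed texts in order, sending at most batch_size inputs per request."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    vectors: list[list[float]] = []
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start:start + batch_size])
        vectors.extend(embed_batch(batch))
    return vectors


# Usage with the OpenAI client configured for NanoGPT (network call):
# vectors = embed_in_batches(
#     texts,
#     lambda batch: [d.embedding for d in client.embeddings.create(
#         input=batch, model="text-embedding-3-small").data],
# )
```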
Cost Optimization
- Monitor Usage: Track the usage field in responses
- Start Small: Begin with text-embedding-3-small before upgrading
- Implement Caching: Avoid re-embedding identical content
- Batch Processing: Reduce API call overhead
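Caching and batching combine naturally: look up each text in a cache keyed by model and text, and send only the misses in a single request. A minimal in-memory sketch; EmbeddingCache is a hypothetical class, embed_batch is any function that embeds a list of texts with a given model, and a real deployment would likely persist the cache (for example to disk or a database) and batch very large sets of misses.

```python
import hashlib
from typing import Callable


class EmbeddingCache:
    """In-memory cache so identical (model, text) pairs are embedded only once."""

    def __init__(self, model: str, embed_batch: Callable[[str, list[str]], list[list[float]]]):
        self.model = model
        self.embed_batch = embed_batch
        self._store: dict[str, list[float]] = {}

    def _key(self, text: str) -> str:
        # Hash model and text together so different models never share entries
        return hashlib.sha256(f"{self.model}\x00{text}".encode("utf-8")).hexdigest()

    def embed(self, texts: list[str]) -> list[list[float]]:
        # Distinct uncached texts, in first-seen order
        missing = list(dict.fromkeys(t for t in texts if self._key(t) not in self._store))
        if missing:
            for text, vector in zip(missing, self.embed_batch(self.model, missing)):
                self._store[self._key(text)] = vector
        return [self._store[self._key(t)] for t in texts]


# Usage (network call):
# cache = EmbeddingCache(
#     "text-embedding-3-small",
#     lambda model, batch: [d.embedding for d in client.embeddings.create(
#         input=batch, model=model).data],
# )
# vectors = cache.embed(["same text", "same text", "other text"])  # one request, two inputs
```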
Quality Optimization
- Preprocess Text: Clean and normalize text before embedding
- Consider Context: Include relevant context in the text to embed
- Test Different Models: Compare performance for your specific use case
- Use Appropriate Similarity Metrics: Cosine similarity for most cases
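What counts as cleaning depends on your data. As one assumed baseline, this sketch applies Unicode NFKC normalization (which folds compatibility characters such as the "ﬁ" ligature into "fi") and collapses runs of whitespace. Lowercasing or stripping markup can also help but may remove signal, so measure them against your own retrieval results. Whatever you choose, apply the same preprocessing to both documents and queries. preprocess is a hypothetical helper.

```python
import re
import unicodedata

_WHITESPACE = re.compile(r"\s+")


def preprocess(text: str) -> str:
    """Normalize Unicode (NFKC) and collapse whitespace runs to single spaces."""
    text = unicodedata.normalize("NFKC", text)
    return _WHITESPACE.sub(" ", text).strip()
```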
Integration Examples
JavaScript/TypeScript
```javascript
import OpenAI from 'openai';

// Initialize client
const openai = new OpenAI({
  apiKey: 'YOUR_NANOGPT_API_KEY',
  baseURL: 'https://nano-gpt.com/api/v1'
});

// Create embedding
const response = await openai.embeddings.create({
  input: "Your text to embed",
  model: "text-embedding-3-small"
});
const embedding = response.data[0].embedding;
console.log(`Embedding has ${embedding.length} dimensions`);
```
cURL
```bash
curl https://nano-gpt.com/api/v1/embeddings \
  -H "Authorization: Bearer YOUR_NANOGPT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Your text to embed",
    "model": "text-embedding-3-small"
  }'
```
Direct API Usage
```python
import requests

headers = {
    "Authorization": "Bearer YOUR_NANOGPT_API_KEY",
    "Content-Type": "application/json"
}
data = {
    "input": "Your text to embed",
    "model": "text-embedding-3-small"
}
response = requests.post(
    "https://nano-gpt.com/api/v1/embeddings",
    headers=headers,
    json=data
)
response.raise_for_status()  # Raise on 4xx/5xx instead of parsing an error body
result = response.json()
embedding = result["data"][0]["embedding"]
```
Rate Limits & Error Handling
Rate Limits
Rate limits vary by endpoint and account. See Rate Limits.
Error Codes
| Code | Description | Solution |
|---|---|---|
| 401 | Invalid or missing API key | Check your API key |
| 400 | Invalid request parameters | Verify model name and input format |
| 429 | Rate limit exceeded | Implement exponential backoff |
| 500 | Server error | Retry with exponential backoff |
Example error response:
```json
{
  "error": {
    "message": "Invalid model specified",
    "type": "invalid_request_error",
    "param": "model",
    "code": null
  }
}
```
For a general guide across NanoGPT APIs, see Error Handling.
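The error table recommends exponential backoff for 429 and 500 responses. A minimal standard-library sketch, assuming your request function raises an exception that carries the HTTP status; StatusCodeError and with_backoff are hypothetical names. With the OpenAI SDKs, note that the client already retries some failures on its own (configurable through the max_retries constructor option), so you would catch the SDK's exception types instead. 400 and 401 are not retried, because repeating the same request will not fix them.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")
RETRYABLE_STATUS = {429, 500}  # from the error table above


class StatusCodeError(Exception):
    """Hypothetical error type carrying the HTTP status code of a failed call."""

    def __init__(self, status: int, message: str = ""):
        super().__init__(message or f"HTTP {status}")
        self.status = status


def with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying retryable status codes with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except StatusCodeError as err:
            if err.status not in RETRYABLE_STATUS or attempt == max_attempts - 1:
                raise
            # Base delay doubles each attempt (1s, 2s, 4s, ...), capped, then jittered
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))
    raise RuntimeError("not reached")
```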
Migration from OpenAI
Switching from OpenAI to NanoGPT only requires changing the client configuration:
```python
from openai import OpenAI

# OpenAI
client = OpenAI(api_key="sk-...")

# NanoGPT (just change base_url and api_key)
client = OpenAI(
    api_key="YOUR_NANOGPT_API_KEY",
    base_url="https://nano-gpt.com/api/v1"
)
# All other code stays the same, as long as the models you request are available on NanoGPT
```
Pricing Summary
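| Price Range | Models | Best For |
|---|---|---|
| $0.01/1M | BAAI models, Qwen/Qwen3-Embedding-0.6B | Budget applications |
| $0.02/1M | text-embedding-3-small, netease-youdao/bce-embedding-base_v1 | Balanced performance |
| $0.03-0.05/1M | Jina v2 models, jina-clip-v1, Qwen/Qwen3-Embedding-4B, Qwen/Qwen3-Embedding-8B | Specialized use cases, mid-range general use |
| $0.07-0.088/1M | zhipu-embedding-2, Baichuan-Text-Embedding | Specific requirements |
| $0.10/1M | text-embedding-ada-002, jina-embeddings-v3, jina-embeddings-v4, doubao-embedding-large-text-240915 | Legacy compatibility (ada-002), general use |
| $0.13/1M | text-embedding-3-large | Maximum performance |
| $0.15/1M | gemini-embedding-001 | General use, 3072 dimensions |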
Additional Resources