Overview

NanoGPT provides a fully OpenAI-compatible embeddings API that offers access to both OpenAI’s industry-leading embedding models and a curated selection of alternative embedding models at competitive prices. Our API supports 16 different embedding models, providing options for various use cases, languages, and budget requirements.

Quick Start

from openai import OpenAI

# Initialize client pointing to NanoGPT
client = OpenAI(
    api_key="YOUR_NANOGPT_API_KEY",
    base_url="https://nano-gpt.com/api/v1"
)

# Create embedding
response = client.embeddings.create(
    input="Your text to embed",
    model="text-embedding-3-small"
)

embedding = response.data[0].embedding
print(f"Embedding has {len(embedding)} dimensions")

Available Models

OpenAI Models

ModelDimensionsMax TokensPrice/1M tokensFeatures
text-embedding-3-small15368191$0.02Dimension reduction support, most cost-effective
text-embedding-3-large30728191$0.13Dimension reduction support, highest performance
text-embedding-ada-00215368191$0.10Legacy model, no dimension reduction

Alternative Models

Multilingual Models

ModelDimensionsPrice/1M tokensDescription
BAAI/bge-m31024$0.01Excellent multilingual support
jina-clip-v1768$0.04Multimodal CLIP embeddings

Language-Specific Models

ModelLanguageDimensionsPrice/1M tokens
BAAI/bge-large-en-v1.5English768$0.01
BAAI/bge-large-zh-v1.5Chinese1024$0.01
jina-embeddings-v2-base-enEnglish768$0.05
jina-embeddings-v2-base-deGerman768$0.05
jina-embeddings-v2-base-zhChinese768$0.05
jina-embeddings-v2-base-esSpanish768$0.05

Specialized Models

ModelUse CaseDimensionsPrice/1M tokens
jina-embeddings-v2-base-codeCode768$0.05
Baichuan-Text-EmbeddingGeneral1024$0.088
netease-youdao/bce-embedding-base_v1General1024$0.02
zhipu-embedding-2Chinese1024$0.07
Qwen/Qwen3-Embedding-0.6BGeneral1024$0.01

API Endpoints

Create Embeddings

Endpoint: POST https://nano-gpt.com/api/v1/embeddings Create embeddings for one or more text inputs.

Discover Embedding Models

Endpoint: GET https://nano-gpt.com/api/v1/embedding-models List all available embedding models with detailed information.

Advanced Features

Batch Processing

Process multiple texts efficiently in a single request:
texts = [
    "First text to embed",
    "Second text to embed",
    "Third text to embed"
]

response = client.embeddings.create(
    input=texts,  # Pass array of strings
    model="text-embedding-3-small"
)

# Access embeddings by index
for i, data in enumerate(response.data):
    print(f"Text {i}: {len(data.embedding)} dimensions")

Dimension Reduction

Reduce embedding dimensions for faster similarity comparisons (supported models only):
# Reduce dimensions to 256 for faster processing
response = client.embeddings.create(
    input="Your text here",
    model="text-embedding-3-small",
    dimensions=256  # Reduce from 1536 to 256
)
Supported models for dimension reduction:
  • text-embedding-3-small
  • text-embedding-3-large
  • Qwen/Qwen3-Embedding-0.6B

Base64 Encoding

For more efficient data transfer, request base64-encoded embeddings:
response = client.embeddings.create(
    input="Your text here",
    model="text-embedding-3-small",
    encoding_format="base64"  # Returns base64-encoded bytes
)

Use Cases

Build powerful search systems that understand meaning:
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Create embeddings for your documents
documents = ["Document 1 text", "Document 2 text", "Document 3 text"]
doc_embeddings = []

for doc in documents:
    response = client.embeddings.create(input=doc, model="text-embedding-3-small")
    doc_embeddings.append(response.data[0].embedding)

# Create embedding for search query
query = "Search query text"
query_response = client.embeddings.create(input=query, model="text-embedding-3-small")
query_embedding = query_response.data[0].embedding

# Calculate similarities
similarities = cosine_similarity([query_embedding], doc_embeddings)[0]

# Find most similar documents
top_matches = np.argsort(similarities)[::-1][:3]
for idx in top_matches:
    print(f"Document {idx}: {similarities[idx]:.3f} similarity")

RAG (Retrieval Augmented Generation)

Enhance LLM responses with relevant context:
# 1. Embed and store your knowledge base
knowledge_base = [
    {"text": "Company founded in 2020...", "embedding": None},
    {"text": "Product features include...", "embedding": None},
]

for item in knowledge_base:
    response = client.embeddings.create(
        input=item["text"], 
        model="text-embedding-3-small"
    )
    item["embedding"] = response.data[0].embedding

# 2. For a user query, find relevant context
user_query = "When was the company founded?"
query_response = client.embeddings.create(
    input=user_query,
    model="text-embedding-3-small"
)
query_embedding = query_response.data[0].embedding

# 3. Find most relevant facts (implement similarity search)
# relevant_facts = find_similar_texts(query_embedding, knowledge_base, top_k=3)

# 4. Use retrieved context with chat completion
chat_response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": f"Use this context to answer: {relevant_facts}"},
        {"role": "user", "content": user_query}
    ]
)

Clustering & Classification

Group similar texts or classify content:
from sklearn.cluster import KMeans

# Create embeddings for texts
texts = ["Text 1", "Text 2", "Text 3", ...]
embeddings = []

for text in texts:
    response = client.embeddings.create(input=text, model="text-embedding-3-small")
    embeddings.append(response.data[0].embedding)

# Cluster embeddings
kmeans = KMeans(n_clusters=5)
clusters = kmeans.fit_predict(embeddings)

# Each text now has a cluster assignment
for text, cluster_id in zip(texts, clusters):
    print(f"'{text}' belongs to cluster {cluster_id}")

Duplicate Detection

Find similar or duplicate content:
def find_duplicates(texts, threshold=0.95):
    embeddings = []
    
    # Generate embeddings
    for text in texts:
        response = client.embeddings.create(
            input=text,
            model="text-embedding-3-small"
        )
        embeddings.append(response.data[0].embedding)
    
    # Calculate pairwise similarities
    similarities = cosine_similarity(embeddings)
    
    # Find duplicates
    duplicates = []
    for i in range(len(texts)):
        for j in range(i+1, len(texts)):
            if similarities[i][j] > threshold:
                duplicates.append((i, j, similarities[i][j]))
    
    return duplicates

Model Selection Guide

By Use Case

Use CaseRecommended ModelRationale
General English texttext-embedding-3-smallBest price/performance ratio
Maximum accuracytext-embedding-3-largeHighest quality embeddings
Multilingual contentBAAI/bge-m3Excellent cross-language performance
Code embeddingsjina-embeddings-v2-base-codeSpecialized for programming languages
Budget-consciousBAAI/bge-large-en-v1.5Just $0.01/1M tokens
Chinese contentBAAI/bge-large-zh-v1.5Optimized for Chinese
Fast similarity searchModels with dimension reductionCan reduce dimensions for speed

By Requirements

Need fastest search?
  • Use models supporting dimension reduction
  • Reduce to 256-512 dimensions
  • Trade small accuracy loss for 2-4x speed improvement
Need highest accuracy?
  • Use text-embedding-3-large
  • Keep full 3072 dimensions
  • Best for critical applications
Processing many languages?
  • Use BAAI/bge-m3 for general multilingual
  • Use language-specific Jina models for best per-language performance
Working with code?
  • Use jina-embeddings-v2-base-code
  • Optimized for programming language semantics

Best Practices

Performance Optimization

  1. Batch Requests: Send up to 2048 texts in a single request
  2. Use Dimension Reduction: Reduce dimensions when exact precision isn’t critical
  3. Cache Embeddings: Store computed embeddings to avoid re-processing
  4. Choose Appropriate Models: Don’t use 3072-dimension models if 768 suffices

Cost Optimization

  1. Monitor Usage: Track the usage field in responses
  2. Start Small: Begin with text-embedding-3-small before upgrading
  3. Implement Caching: Avoid re-embedding identical content
  4. Batch Processing: Reduce API call overhead

Quality Optimization

  1. Preprocess Text: Clean and normalize text before embedding
  2. Consider Context: Include relevant context in the text to embed
  3. Test Different Models: Compare performance for your specific use case
  4. Use Appropriate Similarity Metrics: Cosine similarity for most cases

Integration Examples

JavaScript/TypeScript

import OpenAI from 'openai';

// Initialize client
const openai = new OpenAI({
    apiKey: 'YOUR_NANOGPT_API_KEY',
    baseURL: 'https://nano-gpt.com/api/v1'
});

// Create embedding
const response = await openai.embeddings.create({
    input: "Your text to embed",
    model: "text-embedding-3-small"
});

const embedding = response.data[0].embedding;
console.log(`Embedding has ${embedding.length} dimensions`);

cURL

curl https://nano-gpt.com/api/v1/embeddings \
  -H "Authorization: Bearer YOUR_NANOGPT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Your text to embed",
    "model": "text-embedding-3-small"
  }'

Direct API Usage

import requests
import json

headers = {
    "Authorization": "Bearer YOUR_NANOGPT_API_KEY",
    "Content-Type": "application/json"
}

data = {
    "input": "Your text to embed",
    "model": "text-embedding-3-small"
}

response = requests.post(
    "https://nano-gpt.com/api/v1/embeddings",
    headers=headers,
    json=data
)

result = response.json()
embedding = result["data"][0]["embedding"]

Rate Limits & Error Handling

Rate Limits

  • Default: 100 requests per second per IP address
  • Internal requests: No rate limiting (requires internal auth token)

Error Codes

CodeDescriptionSolution
401Invalid or missing API keyCheck your API key
400Invalid request parametersVerify model name and input format
429Rate limit exceededImplement exponential backoff
500Server errorRetry with exponential backoff

Error Response Format

{
  "error": {
    "message": "Invalid model specified",
    "type": "invalid_request_error",
    "param": "model",
    "code": null
  }
}

Migration from OpenAI

Switching from OpenAI to NanoGPT is seamless:
# OpenAI
client = OpenAI(api_key="sk-...")

# NanoGPT (just change base_url and api_key)
client = OpenAI(
    api_key="YOUR_NANOGPT_API_KEY",
    base_url="https://nano-gpt.com/api/v1"
)

# All other code remains exactly the same!

Pricing Summary

Price RangeModelsBest For
$0.01/1MBAAI models, QwenBudget applications
$0.02/1Mtext-embedding-3-small, netease-youdaoBalanced performance
$0.04-0.05/1MJina modelsSpecialized use cases
$0.07-0.088/1Mzhipu, BaichuanSpecific requirements
$0.10/1Mada-002Legacy compatibility
$0.13/1Mtext-embedding-3-largeMaximum performance

Additional Resources