Embeddings - NanoGPT API Documentation

Overview

NanoGPT provides a fully OpenAI-compatible embeddings API that offers access to both OpenAI’s industry-leading embedding models and a curated selection of alternative embedding models at competitive prices. Our API supports 16 different embedding models, providing options for various use cases, languages, and budget requirements.

Quick Start

from openai import OpenAI

# Initialize client pointing to NanoGPT
client = OpenAI(
    api_key="YOUR_NANOGPT_API_KEY",
    base_url="https://nano-gpt.com/api/v1"
)

# Create embedding
response = client.embeddings.create(
    input="Your text to embed",
    model="text-embedding-3-small"
)

embedding = response.data[0].embedding
print(f"Embedding has {len(embedding)} dimensions")

Available Models

OpenAI Models

Model	Dimensions	Max Tokens	Price/1M tokens	Features
`text-embedding-3-small`	1536	8191	$0.02	Dimension reduction support, most cost-effective
`text-embedding-3-large`	3072	8191	$0.13	Dimension reduction support, highest performance
`text-embedding-ada-002`	1536	8191	$0.10	Legacy model, no dimension reduction

Alternative Models

Multilingual Models

Model	Dimensions	Price/1M tokens	Description
`BAAI/bge-m3`	1024	$0.01	Excellent multilingual support
`jina-clip-v1`	768	$0.04	Multimodal CLIP embeddings

Language-Specific Models

Model	Language	Dimensions	Price/1M tokens
`BAAI/bge-large-en-v1.5`	English	768	$0.01
`BAAI/bge-large-zh-v1.5`	Chinese	1024	$0.01
`jina-embeddings-v2-base-en`	English	768	$0.05
`jina-embeddings-v2-base-de`	German	768	$0.05
`jina-embeddings-v2-base-zh`	Chinese	768	$0.05
`jina-embeddings-v2-base-es`	Spanish	768	$0.05

Specialized Models

Model	Use Case	Dimensions	Price/1M tokens
`jina-embeddings-v2-base-code`	Code	768	$0.05
`Baichuan-Text-Embedding`	General	1024	$0.088
`netease-youdao/bce-embedding-base_v1`	General	1024	$0.02
`zhipu-embedding-2`	Chinese	1024	$0.07
`Qwen/Qwen3-Embedding-0.6B`	General	1024	$0.01

API Endpoints

Create Embeddings

Endpoint: POST https://nano-gpt.com/api/v1/embeddings Create embeddings for one or more text inputs.

Discover Embedding Models

Endpoint: GET https://nano-gpt.com/api/v1/embedding-models List all available embedding models with detailed information.

Advanced Features

Batch Processing

Process multiple texts efficiently in a single request:

texts = [
    "First text to embed",
    "Second text to embed",
    "Third text to embed"
]

response = client.embeddings.create(
    input=texts,  # Pass array of strings
    model="text-embedding-3-small"
)

# Access embeddings by index
for i, data in enumerate(response.data):
    print(f"Text {i}: {len(data.embedding)} dimensions")

Dimension Reduction

Reduce embedding dimensions for faster similarity comparisons (supported models only):

# Reduce dimensions to 256 for faster processing
response = client.embeddings.create(
    input="Your text here",
    model="text-embedding-3-small",
    dimensions=256  # Reduce from 1536 to 256
)

Supported models for dimension reduction:

text-embedding-3-small
text-embedding-3-large
Qwen/Qwen3-Embedding-0.6B

Base64 Encoding

For more efficient data transfer, request base64-encoded embeddings:

response = client.embeddings.create(
    input="Your text here",
    model="text-embedding-3-small",
    encoding_format="base64"  # Returns base64-encoded bytes
)

Use Cases

Semantic Search

Build powerful search systems that understand meaning:

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Create embeddings for your documents
documents = ["Document 1 text", "Document 2 text", "Document 3 text"]
doc_embeddings = []

for doc in documents:
    response = client.embeddings.create(input=doc, model="text-embedding-3-small")
    doc_embeddings.append(response.data[0].embedding)

# Create embedding for search query
query = "Search query text"
query_response = client.embeddings.create(input=query, model="text-embedding-3-small")
query_embedding = query_response.data[0].embedding

# Calculate similarities
similarities = cosine_similarity([query_embedding], doc_embeddings)[0]

# Find most similar documents
top_matches = np.argsort(similarities)[::-1][:3]
for idx in top_matches:
    print(f"Document {idx}: {similarities[idx]:.3f} similarity")

RAG (Retrieval Augmented Generation)

Enhance LLM responses with relevant context:

# 1. Embed and store your knowledge base
knowledge_base = [
    {"text": "Company founded in 2020...", "embedding": None},
    {"text": "Product features include...", "embedding": None},
]

for item in knowledge_base:
    response = client.embeddings.create(
        input=item["text"], 
        model="text-embedding-3-small"
    )
    item["embedding"] = response.data[0].embedding

# 2. For a user query, find relevant context
user_query = "When was the company founded?"
query_response = client.embeddings.create(
    input=user_query,
    model="text-embedding-3-small"
)
query_embedding = query_response.data[0].embedding

# 3. Find most relevant facts (implement similarity search)
# relevant_facts = find_similar_texts(query_embedding, knowledge_base, top_k=3)

# 4. Use retrieved context with chat completion
chat_response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": f"Use this context to answer: {relevant_facts}"},
        {"role": "user", "content": user_query}
    ]
)

Clustering & Classification

Group similar texts or classify content:

from sklearn.cluster import KMeans

# Create embeddings for texts
texts = ["Text 1", "Text 2", "Text 3", ...]
embeddings = []

for text in texts:
    response = client.embeddings.create(input=text, model="text-embedding-3-small")
    embeddings.append(response.data[0].embedding)

# Cluster embeddings
kmeans = KMeans(n_clusters=5)
clusters = kmeans.fit_predict(embeddings)

# Each text now has a cluster assignment
for text, cluster_id in zip(texts, clusters):
    print(f"'{text}' belongs to cluster {cluster_id}")

Duplicate Detection

Find similar or duplicate content:

def find_duplicates(texts, threshold=0.95):
    embeddings = []
    
    # Generate embeddings
    for text in texts:
        response = client.embeddings.create(
            input=text,
            model="text-embedding-3-small"
        )
        embeddings.append(response.data[0].embedding)
    
    # Calculate pairwise similarities
    similarities = cosine_similarity(embeddings)
    
    # Find duplicates
    duplicates = []
    for i in range(len(texts)):
        for j in range(i+1, len(texts)):
            if similarities[i][j] > threshold:
                duplicates.append((i, j, similarities[i][j]))
    
    return duplicates

Model Selection Guide

By Use Case

Use Case	Recommended Model	Rationale
General English text	`text-embedding-3-small`	Best price/performance ratio
Maximum accuracy	`text-embedding-3-large`	Highest quality embeddings
Multilingual content	`BAAI/bge-m3`	Excellent cross-language performance
Code embeddings	`jina-embeddings-v2-base-code`	Specialized for programming languages
Budget-conscious	`BAAI/bge-large-en-v1.5`	Just $0.01/1M tokens
Chinese content	`BAAI/bge-large-zh-v1.5`	Optimized for Chinese
Fast similarity search	Models with dimension reduction	Can reduce dimensions for speed

By Requirements

Need fastest search?

Use models supporting dimension reduction
Reduce to 256-512 dimensions
Trade small accuracy loss for 2-4x speed improvement

Need highest accuracy?

Use text-embedding-3-large
Keep full 3072 dimensions
Best for critical applications

Processing many languages?

Use BAAI/bge-m3 for general multilingual
Use language-specific Jina models for best per-language performance

Working with code?

Use jina-embeddings-v2-base-code
Optimized for programming language semantics

Best Practices

Performance Optimization

Batch Requests: Send up to 2048 texts in a single request
Use Dimension Reduction: Reduce dimensions when exact precision isn’t critical
Cache Embeddings: Store computed embeddings to avoid re-processing
Choose Appropriate Models: Don’t use 3072-dimension models if 768 suffices

Cost Optimization

Monitor Usage: Track the usage field in responses
Start Small: Begin with text-embedding-3-small before upgrading
Implement Caching: Avoid re-embedding identical content
Batch Processing: Reduce API call overhead

Quality Optimization

Preprocess Text: Clean and normalize text before embedding
Consider Context: Include relevant context in the text to embed
Test Different Models: Compare performance for your specific use case
Use Appropriate Similarity Metrics: Cosine similarity for most cases

Integration Examples

JavaScript/TypeScript

import OpenAI from 'openai';

// Initialize client
const openai = new OpenAI({
    apiKey: 'YOUR_NANOGPT_API_KEY',
    baseURL: 'https://nano-gpt.com/api/v1'
});

// Create embedding
const response = await openai.embeddings.create({
    input: "Your text to embed",
    model: "text-embedding-3-small"
});

const embedding = response.data[0].embedding;
console.log(`Embedding has ${embedding.length} dimensions`);

cURL

curl https://nano-gpt.com/api/v1/embeddings \
  -H "Authorization: Bearer YOUR_NANOGPT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Your text to embed",
    "model": "text-embedding-3-small"
  }'

Direct API Usage

import requests
import json

headers = {
    "Authorization": "Bearer YOUR_NANOGPT_API_KEY",
    "Content-Type": "application/json"
}

data = {
    "input": "Your text to embed",
    "model": "text-embedding-3-small"
}

response = requests.post(
    "https://nano-gpt.com/api/v1/embeddings",
    headers=headers,
    json=data
)

result = response.json()
embedding = result["data"][0]["embedding"]

Rate Limits & Error Handling

Rate Limits

Default: 100 requests per second per IP address
Internal requests: No rate limiting (requires internal auth token)

Error Codes

Code	Description	Solution
401	Invalid or missing API key	Check your API key
400	Invalid request parameters	Verify model name and input format
429	Rate limit exceeded	Implement exponential backoff
500	Server error	Retry with exponential backoff

Error Response Format

{
  "error": {
    "message": "Invalid model specified",
    "type": "invalid_request_error",
    "param": "model",
    "code": null
  }
}

Migration from OpenAI

Switching from OpenAI to NanoGPT is seamless:

# OpenAI
client = OpenAI(api_key="sk-...")

# NanoGPT (just change base_url and api_key)
client = OpenAI(
    api_key="YOUR_NANOGPT_API_KEY",
    base_url="https://nano-gpt.com/api/v1"
)

# All other code remains exactly the same!

Pricing Summary

Price Range	Models	Best For
$0.01/1M	BAAI models, Qwen	Budget applications
$0.02/1M	text-embedding-3-small, netease-youdao	Balanced performance
$0.04-0.05/1M	Jina models	Specialized use cases
$0.07-0.088/1M	zhipu, Baichuan	Specific requirements
$0.10/1M	ada-002	Legacy compatibility
$0.13/1M	text-embedding-3-large	Maximum performance

Get Started

Endpoint Examples

API Reference

Miscellaneous

Integrations

​Overview

​Quick Start

​Available Models

​OpenAI Models

​Alternative Models

​Multilingual Models

​Language-Specific Models

​Specialized Models

​API Endpoints

​Create Embeddings

​Discover Embedding Models

​Advanced Features

​Batch Processing

​Dimension Reduction

​Base64 Encoding

​Use Cases

​Semantic Search

​RAG (Retrieval Augmented Generation)

​Clustering & Classification

​Duplicate Detection

​Model Selection Guide

​By Use Case

​By Requirements

​Best Practices

​Performance Optimization

​Cost Optimization

​Quality Optimization

​Integration Examples

​JavaScript/TypeScript

​cURL

​Direct API Usage

​Rate Limits & Error Handling

​Rate Limits

​Error Codes

​Error Response Format

​Migration from OpenAI

​Pricing Summary

​Additional Resources