Embeddings

Overview

Create embeddings for text using OpenAI-compatible and alternative embedding models. Our API provides access to 16 different embedding models at competitive prices, fully compatible with OpenAI’s embedding API.

Available Models

OpenAI Models

text-embedding-3-small - 1536 dimensions, $0.02/1M tokens - Most cost-effective with dimension reduction support
text-embedding-3-large - 3072 dimensions, $0.13/1M tokens - Highest performance with dimension reduction support
text-embedding-ada-002 - 1536 dimensions, $0.10/1M tokens - Legacy model

Alternative Models

Multilingual:

BAAI/bge-m3 - 1024 dimensions, $0.01/1M tokens - Multilingual support
jina-clip-v1 - 768 dimensions, $0.04/1M tokens - Multimodal CLIP embeddings

Language-Specific:

BAAI/bge-large-en-v1.5 - 768 dimensions, $0.01/1M tokens - English optimized
BAAI/bge-large-zh-v1.5 - 1024 dimensions, $0.01/1M tokens - Chinese optimized
jina-embeddings-v2-base-en - 768 dimensions, $0.05/1M tokens - English
jina-embeddings-v2-base-de - 768 dimensions, $0.05/1M tokens - German
jina-embeddings-v2-base-zh - 768 dimensions, $0.05/1M tokens - Chinese
jina-embeddings-v2-base-es - 768 dimensions, $0.05/1M tokens - Spanish

Specialized:

jina-embeddings-v2-base-code - 768 dimensions, $0.05/1M tokens - Code embeddings
Baichuan-Text-Embedding - 1024 dimensions, $0.088/1M tokens
netease-youdao/bce-embedding-base_v1 - 1024 dimensions, $0.02/1M tokens
zhipu-embedding-2 - 1024 dimensions, $0.07/1M tokens
Qwen/Qwen3-Embedding-0.6B - 1024 dimensions, $0.01/1M tokens - Supports dimension reduction

Request Parameters

Parameter	Type	Required	Description
`input`	string or array	Yes	Single text string or array of up to 2048 strings to embed
`model`	string	Yes	ID of the embedding model to use
`encoding_format`	string	No	Format for embeddings: `"float"` (default) or `"base64"`
`dimensions`	integer	No	Reduce embedding dimensions (only for supported models)
`user`	string	No	Optional identifier for tracking usage

Response Format

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.023, -0.012, 0.045, ...]
    }
  ],
  "model": "text-embedding-3-small",
  "usage": {
    "prompt_tokens": 8,
    "total_tokens": 8
  }
}

Code Examples

Python with OpenAI SDK

from openai import OpenAI

# Initialize client pointing to NanoGPT
client = OpenAI(
    api_key="YOUR_NANOGPT_API_KEY",
    base_url="https://nano-gpt.com/api/v1"
)

# Create embedding
response = client.embeddings.create(
    input="Your text to embed",
    model="text-embedding-3-small"
)

# Access the embedding
embedding = response.data[0].embedding
print(f"Embedding dimensions: {len(embedding)}")

JavaScript/TypeScript

import OpenAI from 'openai';

// Initialize client pointing to NanoGPT
const openai = new OpenAI({
    apiKey: 'YOUR_NANOGPT_API_KEY',
    baseURL: 'https://nano-gpt.com/api/v1'
});

// Create embedding
const response = await openai.embeddings.create({
    input: "Your text to embed",
    model: "text-embedding-3-small"
});

// Access the embedding
const embedding = response.data[0].embedding;
console.log(`Embedding dimensions: ${embedding.length}`);

cURL

curl https://nano-gpt.com/api/v1/embeddings \
  -H "Authorization: Bearer YOUR_NANOGPT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Your text to embed",
    "model": "text-embedding-3-small"
  }'

Batch Processing

# Process multiple texts in a single request
texts = [
    "First text to embed",
    "Second text to embed",
    "Third text to embed"
]

response = client.embeddings.create(
    input=texts,  # Pass array of strings
    model="text-embedding-3-small"
)

# Access embeddings by index
for i, data in enumerate(response.data):
    print(f"Text {i}: {len(data.embedding)} dimensions")

Dimension Reduction

For models that support it (text-embedding-3-small, text-embedding-3-large, Qwen/Qwen3-Embedding-0.6B):

# Reduce dimensions to 256 for faster similarity comparisons
response = client.embeddings.create(
    input="Your text to embed",
    model="text-embedding-3-small",
    dimensions=256  # Reduce from 1536 to 256
)

Use Cases

Semantic Search

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Create embeddings for your documents
documents = ["Document 1 text", "Document 2 text", "Document 3 text"]
doc_embeddings = []

for doc in documents:
    response = client.embeddings.create(input=doc, model="text-embedding-3-small")
    doc_embeddings.append(response.data[0].embedding)

# Create embedding for search query
query = "Search query text"
query_response = client.embeddings.create(input=query, model="text-embedding-3-small")
query_embedding = query_response.data[0].embedding

# Calculate similarities
similarities = cosine_similarity([query_embedding], doc_embeddings)[0]

# Find most similar documents
top_matches = np.argsort(similarities)[::-1][:3]
for idx in top_matches:
    print(f"Document {idx}: {similarities[idx]:.3f} similarity")

RAG (Retrieval Augmented Generation)

# 1. Embed and store your knowledge base
knowledge_base = [
    {"text": "Fact 1...", "embedding": None},
    {"text": "Fact 2...", "embedding": None},
]

for item in knowledge_base:
    response = client.embeddings.create(
        input=item["text"], 
        model="text-embedding-3-small"
    )
    item["embedding"] = response.data[0].embedding

# 2. For a user query, find relevant context
user_query = "What is...?"
query_response = client.embeddings.create(
    input=user_query,
    model="text-embedding-3-small"
)
query_embedding = query_response.data[0].embedding

# 3. Find most relevant facts
# relevant_facts = find_similar_texts(query_embedding, knowledge_base, top_k=3)

# 4. Use retrieved context with chat completion
chat_response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": f"Context: {relevant_facts}"},
        {"role": "user", "content": user_query}
    ]
)

Best Practices

Model Selection

General English text: Use text-embedding-3-small for best price/performance
Maximum accuracy: Use text-embedding-3-large
Multilingual: Use BAAI/bge-m3 or language-specific Jina models
Code: Use jina-embeddings-v2-base-code
Budget-conscious: Use BAAI models at $0.01/1M tokens

Performance Optimization

Batch requests: Send up to 2048 texts in a single request
Use dimension reduction: Reduce dimensions for faster similarity calculations when exact precision isn’t critical
Cache embeddings: Store computed embeddings to avoid re-processing identical texts
Choose appropriate models: Don’t use 3072-dimension models if 768 dimensions suffice

Cost Optimization

Monitor token usage: Track the usage field in responses
Use smaller models: Start with text-embedding-3-small before upgrading
Implement caching: Avoid re-embedding identical content
Batch processing: Reduce API call overhead

Rate Limits

Default: 100 requests per second per IP address
Internal requests: No rate limiting (requires internal auth token)

Error Handling

The API returns standard HTTP status codes and OpenAI-compatible error responses:

{
  "error": {
    "message": "Invalid model specified",
    "type": "invalid_request_error",
    "param": "model",
    "code": null
  }
}

Common error codes:

401: Invalid or missing API key
400: Invalid request parameters
429: Rate limit exceeded
500: Server error

Authorizations

Authorization

string

header

required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Body

application/json

Parameters for creating embeddings

input

required

Text to embed - single string or array of up to 2048 strings Single text string to embed

model

string

required

ID of the embedding model to use

Example:

"text-embedding-3-small"

encoding_format

enum<string>

default:float

Format for embeddings

Available options:

float,

base64

dimensions

integer

Reduce embedding dimensions (only for supported models)

Example:

256

user

string

Optional identifier for tracking usage

Response

Embedding response

object

string

required

Always 'list' for embeddings

Example:

"list"

data

object[]

required

Array of embedding objects

Show child attributes

model

string

required

Model used for embeddings

Example:

"text-embedding-3-small"

usage

object

required

Show child attributes

Get Started

Endpoint Examples

API Reference

Miscellaneous

Integrations

Overview

Available Models

OpenAI Models

Alternative Models

Request Parameters

Response Format

Code Examples

Python with OpenAI SDK

JavaScript/TypeScript

cURL

Batch Processing

Dimension Reduction

Use Cases

Semantic Search

RAG (Retrieval Augmented Generation)

Best Practices

Model Selection

Performance Optimization

Cost Optimization

Rate Limits

Error Handling

Authorizations

Body

Response

Get Started

Endpoint Examples

API Reference

Miscellaneous

Integrations

​Overview

​Available Models

​OpenAI Models

​Alternative Models

​Request Parameters

​Response Format

​Code Examples

​Python with OpenAI SDK

​JavaScript/TypeScript

​cURL

​Batch Processing

​Dimension Reduction

​Use Cases

​Semantic Search

​RAG (Retrieval Augmented Generation)

​Best Practices

​Model Selection

​Performance Optimization

​Cost Optimization

​Rate Limits

​Error Handling

Authorizations

Body

Response

Overview

Available Models

OpenAI Models

Alternative Models

Request Parameters

Response Format

Code Examples

Python with OpenAI SDK

JavaScript/TypeScript

cURL

Batch Processing

Dimension Reduction

Use Cases

Semantic Search

RAG (Retrieval Augmented Generation)

Best Practices

Model Selection

Performance Optimization

Cost Optimization

Rate Limits

Error Handling