Web Scraping
Extract clean, formatted content from web pages. Returns both raw HTML content and formatted markdown.
Overview
The NanoGPT Web Scraping API allows you to extract clean, formatted content from web pages. It uses the Firecrawl service to scrape URLs and returns both raw HTML content and formatted markdown.
Authentication
The API supports two authentication methods:
1. API Key Authentication (Recommended)
Include your API key in the request header:
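A minimal sketch, assuming the key travels in an `x-api-key` header (the exact header name is an assumption; confirm it in your dashboard):

```
x-api-key: YOUR_API_KEY
```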
2. Bearer Token Authentication
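Alternatively, pass the same API key as a standard Bearer token:

```
Authorization: Bearer YOUR_API_KEY
```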
Request Format
Headers
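Assuming JSON requests and either authentication scheme above, a typical header set looks like:

```
Content-Type: application/json
x-api-key: YOUR_API_KEY
```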
Request Body
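Based on the parameters below, a minimal request body looks like this:

```json
{
  "urls": [
    "https://example.com",
    "https://example.org"
  ]
}
```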
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| urls | string[] | Yes | Array of URLs to scrape. Maximum 5 URLs per request. |
URL Requirements
- Must be valid HTTP or HTTPS URLs
- Must use standard web ports (80, 443, or the scheme default)
- Cannot be localhost, private IPs, or metadata endpoints
- YouTube URLs are not supported (use the YouTube transcription endpoint instead)
Response Format
Success Response (200 OK)
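A sketch of the response shape, assembled from the fields documented below (values are illustrative):

```json
{
  "results": [
    {
      "url": "https://example.com",
      "success": true,
      "title": "Example Domain",
      "content": "<html>...</html>",
      "markdown": "# Example Domain\n..."
    },
    {
      "url": "https://example.org",
      "success": false,
      "error": "Failed to scrape URL"
    }
  ],
  "summary": {
    "requested": 2,
    "processed": 2,
    "successful": 1,
    "failed": 1,
    "totalCost": 0.001
  }
}
```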
Response Fields
results
Array of scraping results for each URL:
- `url` (string): The URL that was scraped
- `success` (boolean): Whether the scraping was successful
- `title` (string, optional): Page title if successfully scraped
- `content` (string, optional): Raw HTML content
- `markdown` (string, optional): Formatted markdown version of the content
- `error` (string, optional): Error message if scraping failed
summary
Summary statistics for the request:
- `requested` (number): Number of URLs in the original request
- `processed` (number): Number of valid URLs that were processed
- `successful` (number): Number of URLs successfully scraped
- `failed` (number): Number of URLs that failed to scrape
- `totalCost` (number): Total cost in USD (only for successful scrapes)
Error Responses
- 400 Bad Request: invalid parameters, e.g. a missing urls array, more than 5 URLs, or a URL that fails the requirements above
- 401 Unauthorized: missing or invalid API key
- 402 Payment Required: insufficient balance to cover the request
- 429 Too Many Requests: rate limit exceeded (see Rate Limits below)
- 500 Internal Server Error: unexpected server-side failure
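Error responses carry a message describing the failure; the body shape below is a sketch, not a confirmed schema:

```json
{
  "error": "Maximum 5 URLs per request"
}
```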
Pricing
- Cost: $0.001 per successfully scraped URL
- Billing: You are only charged for URLs that are successfully scraped
- Payment Methods: USD balance or Nano (XNO) cryptocurrency
Rate Limits
- Default: 30 requests per minute per IP address
- With API Key: 30 requests per minute per API key
Code Examples
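A minimal Python sketch using the requests library. The endpoint path and the `x-api-key` header name are assumptions; adjust them to match your account's API reference:

```python
import requests

# Both the endpoint path and the x-api-key header name are assumptions;
# confirm them against your account's API reference.
API_URL = "https://nano-gpt.com/api/scrape-urls"
API_KEY = "YOUR_API_KEY"

response = requests.post(
    API_URL,
    headers={"x-api-key": API_KEY},
    json={"urls": ["https://example.com", "https://example.org"]},  # up to 5 URLs
    timeout=60,
)
response.raise_for_status()
data = response.json()

# Check each result's success flag before touching its content.
for result in data["results"]:
    if result["success"]:
        print(result["url"], "->", result.get("title"))
        print(result.get("markdown", "")[:200])  # formatted markdown version
    else:
        print(result["url"], "failed:", result.get("error"))

print("Total cost: $", data["summary"]["totalCost"])
```

Since billing applies only to successful scrapes, iterating on a snippet like this costs nothing for failed attempts.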
Best Practices
- Batch Requests: Send multiple URLs in a single request (up to 5) to minimize API calls
- Error Handling: Always check the `success` field for each result before accessing content
- Content Size: Scraped content is limited to 100KB per URL
- URL Validation: Validate URLs on your end before sending to reduce failed requests
- Markdown Format: Use the markdown field for better readability and formatting
Limitations
- Maximum 5 URLs per request
- Maximum content size: 100KB per URL
- No JavaScript rendering (static content only)
FAQ
Q: Why was my URL rejected? A: URLs can be rejected for several reasons:
- Invalid format (not HTTP/HTTPS)
- Pointing to localhost or private IPs
- Using non-standard ports
- Being a YouTube URL (use the YouTube transcription endpoint)
Q: Can I scrape JavaScript-heavy sites? A: The scraper fetches static HTML content. Sites that rely heavily on JavaScript may not return complete content.
Q: What happens if a URL fails to scrape? A: You are not charged for failed URLs. The response will include an error message for that specific URL.
Q: Is there a sandbox/test environment? A: You can test with your regular API key. Since you’re only charged for successful scrapes, failed attempts during testing won’t cost anything.