Video Generation
Generate videos using supported text-to-video, image-to-video, and video-to-video models. The response includes a runId and pending status; poll the status endpoint for completion. See the docs for the current model list and required inputs.
Image-conditioned models accept either imageDataUrl (base64) or a public imageUrl. The service uses the explicit value you provide before checking any saved attachments.
Overview
POST /generate-video submits an asynchronous job to create, extend, or edit a video. The endpoint responds immediately with runId, id, model, and status: "pending". runId and id are the same NanoGPT job identifier (format vid_...). Poll the unified Video Status endpoint with that job ID until you receive final assets. Duration-based billing is assessed after completion.
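The submission step can be sketched as follows. This is an illustrative sketch only: the host in `BASE_URL`, the `x-api-key` header name, and the model ID `example-video-model` are placeholders assumed for the example, not values confirmed by this page. The request is built but not sent.

```python
import json
import urllib.request

# Assumption: placeholder host; use the API base URL for your account.
BASE_URL = "https://example.invalid/api"


def build_submit_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request to the generate-video endpoint."""
    return urllib.request.Request(
        url=f"{BASE_URL}/generate-video",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the auth header name is hypothetical in this sketch.
            "x-api-key": api_key,
        },
        method="POST",
    )


def job_id_from_response(body: dict) -> str:
    """Return the job ID; runId and id carry the same value."""
    job_id = body.get("runId") or body.get("id")
    if not job_id:
        raise ValueError("submission response has no runId or id")
    return job_id


request = build_submit_request(
    "YOUR_API_KEY",
    # Hypothetical model ID; pick a real one from the model list.
    {"model": "example-video-model", "prompt": "A serene lake at sunset", "duration": "5"},
)
```

To send the request, pass it to `urllib.request.urlopen`, parse the JSON body, and hand the result to `job_id_from_response` before polling.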
Errors include descriptive JSON payloads. Surface the error.message (and HTTP status) to help users correct content-policy or validation issues.
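One way to surface error.message together with the HTTP status is sketched below. The payload shape `{"error": {"message": ...}}` is inferred from the wording above, and the fallbacks for a bare string or a non-JSON body are assumptions added for robustness; `describe_error` is a hypothetical helper name.

```python
import json


def describe_error(http_status: int, body_text: str) -> str:
    """Format an error payload for display, falling back to the raw body."""
    try:
        body = json.loads(body_text)
    except json.JSONDecodeError:
        return f"HTTP {http_status}: {body_text.strip()}"
    error = body.get("error") if isinstance(body, dict) else None
    # Assumption: error is normally an object with a message field.
    if isinstance(error, dict) and error.get("message"):
        return f"HTTP {http_status}: {error['message']}"
    # Assumption: tolerate a bare string in case a model returns one.
    if isinstance(error, str) and error:
        return f"HTTP {http_status}: {error}"
    return f"HTTP {http_status}: {body_text.strip()}"


print(describe_error(400, '{"error": {"message": "prompt is required"}}'))
```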
Extend Workflows
- Midjourney extend (task-based): use POST /api/generate-video/extend with runId (preferred) or taskId (legacy alias) plus index (0-3). This flow does not accept video, videoUrl, videoDataUrl, or videoAttachmentId.
- Source video extend (extend models): use POST /api/generate-video with an extend model plus prompt and a source video (videoUrl, videoDataUrl, or videoAttachmentId). The video field is only accepted by select models (for example, wan-wavespeed-25-extend). Max source video length: 120 seconds.
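
The two extend flows take different request bodies, which the following sketch makes explicit. The helper names are hypothetical, and requiring exactly one source video field is a conservative choice made for this example rather than a documented rule.

```python
SOURCE_VIDEO_FIELDS = ("video", "videoUrl", "videoDataUrl", "videoAttachmentId")


def midjourney_extend_payload(run_id: str, index: int) -> dict:
    """Body for POST /api/generate-video/extend (task-based flow)."""
    if not isinstance(index, int) or not 0 <= index <= 3:
        raise ValueError("index must be an integer from 0 to 3")
    # No source video fields here: this flow does not accept them.
    return {"runId": run_id, "index": index}


def source_extend_payload(model: str, prompt: str, **source: str) -> dict:
    """Body for POST /api/generate-video with an extend model."""
    unknown = set(source) - set(SOURCE_VIDEO_FIELDS)
    if unknown:
        raise ValueError(f"unexpected fields: {sorted(unknown)}")
    # Assumption: send exactly one source video field per request.
    if len(source) != 1:
        raise ValueError("provide exactly one source video field")
    return {"model": model, "prompt": prompt, **source}


body = source_extend_payload(
    "wan-wavespeed-25-extend",
    "Continue the shot as the camera pulls back",
    videoUrl="https://example.com/clip.mp4",  # placeholder URL
)
```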
Request Schema
Only include the fields required by your chosen model. Unknown keys are ignored, but some models fail when extra media fields are present.
Core Fields
| field | type | required | details |
|---|---|---|---|
| model | string | yes | Video model ID. Model availability changes; discover models via GET /api/v1/models?detailed=true. |
| conversationUUID | string | no | Attach the request to a conversation thread. |
| prompt | string | conditional | Required for text-to-video and edit models unless a structured script is supplied. |
| negative_prompt | string | no | Suppresses specific content. Respected by Veo, Wan, Runway, Pixverse, and other models noted below. |
| script | string | conditional | LongStories models accept full scripts instead of relying on prompt. |
| storyConfig | object | conditional | LongStories structured payload (e.g. scenes, narration, voice). |
| animation | boolean | no | Enables animation for LongStories outputs. |
| language | string | no | Output language for LongStories. |
| characters | array | no | Character definitions for LongStories. |
| duration | string | conditional | Seconds as a string ("5", "8", "60"). Limits vary per model; see individual entries. |
| seconds | string | conditional | Sora-specific duration selector ("4", "8", "12"). |
| aspect_ratio | string | conditional | Ratios such as 16:9, 9:16, 1:1, 3:4, 4:3, 21:9, auto. |
| orientation | string | conditional | landscape or portrait for Sora and Wan text/image flows. |
| resolution | string | conditional | Resolution tokens (480p, 580p, 720p, 1080p, 1792x1024, 2k, 4k). |
| size | string | conditional | Output size preset (supported by select models). |
| mode | string | no | Operation mode: text-to-video, image-to-video, reference-to-video, video-edit. |
| generateAudio | boolean | no | Adds AI audio on Veo 3 and Lightricks models. Defaults to false. |
| enhancePrompt | boolean | no | Optional Veo 3 prompt optimizer. Defaults to false. |
| pro_mode / pro | boolean | no | High-quality toggle for Sora and Hunyuan families. Defaults to false. |
| enable_prompt_expansion | boolean | no | Prompt booster for Wan/Seedance/Minimax variants. Disabled by default. |
| enable_safety_checker | boolean | no | Optional safety checker toggle (supported by select models). |
| camera_fix / camera_fixed / cameraFixed | boolean | no | Locks the virtual camera for Seedance and Wan variants. |
| seed | number or string | no | Deterministic seed when supported (Veo, Wan, Pixverse). |
| voiceId | string | conditional | Alternate voice selector for lipsync models. |
| voice_id | string | conditional | Required by kling-lipsync-t2v. |
| voice_language | string | conditional | en or zh for kling-lipsync-t2v. |
| voice_speed | number | conditional | Range 0.8-2.0 for kling-lipsync-t2v. |
| videoDuration / billedDuration | number | no | Optional overrides for upscaler billing calculations. |
| adjust_fps_for_interpolation | boolean | no | Optional toggle for interpolation-aware upscaling. Defaults to false. |
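
Client-side checks can catch some validation errors before a request is sent. The sketch below covers only the kling-lipsync-t2v voice constraints listed in the table; `lipsync_voice_fields` is a hypothetical helper, and treating voice_language and voice_speed as optional is an assumption.

```python
from typing import Optional


def lipsync_voice_fields(
    voice_id: str,
    voice_language: Optional[str] = None,
    voice_speed: Optional[float] = None,
) -> dict:
    """Validate and return the voice fields for kling-lipsync-t2v."""
    if not voice_id:
        raise ValueError("voice_id is required by kling-lipsync-t2v")
    fields: dict = {"voice_id": voice_id}
    if voice_language is not None:
        if voice_language not in ("en", "zh"):
            raise ValueError("voice_language must be 'en' or 'zh'")
        fields["voice_language"] = voice_language
    if voice_speed is not None:
        if not 0.8 <= voice_speed <= 2.0:
            raise ValueError("voice_speed must be between 0.8 and 2.0")
        fields["voice_speed"] = voice_speed
    return fields


# Merge the validated fields into the rest of the request body.
payload = {"model": "kling-lipsync-t2v", **lipsync_voice_fields("voice-1", "en", 1.2)}
```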
Media Inputs
| field | type | required | details |
|---|---|---|---|
| imageDataUrl | string | conditional | Base64-encoded data URL. Recommended for private assets or files larger than 4 MB. |
| imageUrl | string | conditional | HTTPS link to a source image. |
| imageAttachmentId | string | conditional | Reference to a library-stored image. |
| image | string | conditional | Alternate image field accepted by select models. Prefer imageUrl unless the model explicitly requires image. |
| reference_image | string | conditional | Optional still image guiding runwayml-gen4-aleph. |
| referenceImages | array | conditional | Multiple reference images for reference-to-video flows. |
| referenceVideos | array | conditional | Multiple reference videos. |
| audioDataUrl | string | conditional | Base64 data URL for audio-driven models. |
| audioDuration | number | conditional | Duration of provided audio in seconds. |
| audioUrl | string | conditional | HTTPS audio input. |
| audio | string | conditional | Alternate audio field accepted by select models. Prefer audioUrl unless the model explicitly requires audio. |
| videoUrl | string | conditional | HTTPS link to a source video (edit, extend, upscaler, or lipsync jobs). |
| videoDataUrl | string | conditional | Base64 data URL for a source video. |
| video | string | conditional | Alternate video field accepted by select models. Prefer videoUrl unless the model explicitly requires video. |
| videoAttachmentId | string | conditional | Reference to a library-stored video. |
| swapImage | string | conditional | Swap image (face-swap models). |
| targetVideo | string | conditional | Target video (face-swap models). |
| targetFaceIndex | number | no | Optional face index (face-swap models). |
Provide only the media fields that your target model expects. Extra media inputs often trigger validation errors. Prefer videoUrl (camelCase) for source videos; only send video when the model explicitly requires it.
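A base64 data URL for imageDataUrl can be produced with the standard library, as in this sketch. The helper names are hypothetical, and choosing between imageUrl and imageDataUrl based on an `https://` prefix is a convention assumed for the example.

```python
import base64
import mimetypes
from pathlib import Path


def to_data_url(data: bytes, mime_type: str) -> str:
    """Encode raw bytes as a base64 data URL."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def image_field(source: str) -> dict:
    """Return exactly one image field: imageUrl for HTTPS links,
    imageDataUrl for a local file path."""
    if source.startswith("https://"):
        return {"imageUrl": source}
    mime_type, _ = mimetypes.guess_type(source)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"cannot determine an image MIME type for {source!r}")
    return {"imageDataUrl": to_data_url(Path(source).read_bytes(), mime_type)}
```

Merging the returned dictionary into the request body keeps the payload to a single image field, which avoids sending extra media inputs.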
Advanced Controls
| field | type | models |
|---|---|---|
| num_frames | integer | Wan 2.2 families, Seedance 22 5B, Wan image-to-video. |
| frames_per_second | integer | Wan 2.2 5B. |
| num_inference_steps | integer | Wan 2.2 families. |
| guidance_scale | number | Wan 2.2 5B. |
| shift | number | Wan 2.2 5B. |
| interpolator_model | string | Wan 2.2 5B. |
| num_interpolated_frames | integer | Wan 2.2 5B. |
| movementAmplitude | string | Select models (for example auto, small, medium, large). |
| motion | string | Select models (for example low, high). |
| style | string | Select models (style/preset strings). |
| effectType, effect, cameraMovement, motionMode, soundEffectSwitch, soundEffectPrompt | varies | Pixverse v4.5/v5. |
| mode | string | Select models (for example animate, replace). |
| prompt_optimizer | boolean | Select models. |
Model Discovery
Video model IDs and supported fields change over time. Use GET /api/v1/models?detailed=true to discover the current list and select a model intended for video generation.
Notes:
- Different models accept different media inputs (for example imageUrl vs a source videoUrl) and may support different duration / resolution options.
- If you see validation errors, first retry with only the minimal required fields for your chosen model.
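
Retrying with minimal fields can be done by filtering the original payload. In this sketch the set of required fields is supplied by the caller, since it depends on the model; `minimal_payload` is a hypothetical helper.

```python
def minimal_payload(payload: dict, required_fields: set) -> dict:
    """Keep only the required fields, failing early if any are absent."""
    missing = required_fields - set(payload)
    if missing:
        raise ValueError(f"payload is missing required fields: {sorted(missing)}")
    return {key: value for key, value in payload.items() if key in required_fields}


original = {
    "model": "example-video-model",  # hypothetical model ID
    "prompt": "A serene lake at sunset",
    "duration": "5",
    "seed": 42,
    "negative_prompt": "blur",
}
# Assumption: this model needs only model, prompt, and duration.
retry_body = minimal_payload(original, {"model", "prompt", "duration"})
```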
Async Processing & Status Polling
- The submission response includes { runId, id, model, status: "pending" } where id and runId are identical.
- Poll /api/video/status?requestId=<runId> (or runId) until the job reaches status: "COMPLETED" or status: "FAILED". The legacy /api/generate-video/status endpoint is deprecated.
- Many jobs emit intermediate states (queued, processing, generating, delivering). Persist them if you need audit trails.
- Failed jobs include an error object. Surface the message and adjust prompts or inputs before retrying.
- Duration and resolution determine credit usage.
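
The polling loop can be sketched as below. `fetch_status` is a hypothetical callable that wraps a GET to /api/video/status?requestId=<runId> and returns the parsed JSON body; the top-level `status` key, the 5-second interval, and the 10-minute timeout are assumptions chosen for illustration.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"COMPLETED", "FAILED"}


def poll_until_done(
    fetch_status: Callable[[str], dict],
    run_id: str,
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job reaches COMPLETED or FAILED, or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        body = fetch_status(run_id)
        # Assumption: status sits at the top level; compare case-insensitively.
        status = str(body.get("status", "")).upper()
        if status in TERMINAL_STATUSES:
            return body
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {run_id} still {status or 'unknown'} after {timeout_s}s")
        sleep(interval_s)
```

Injecting `fetch_status` and `sleep` keeps the loop independent of any HTTP client and lets it be exercised with a fake sequence of statuses. A caller would inspect the returned body for the error object when the status is FAILED.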
Response example
Content & Safety Notes
Some models may block prompts that violate content policies. Non-200 responses describe the violation reason; relay these messages verbatim to users or implement automated prompt adjustments.
Next Steps
- Poll the Video Status endpoint after every submission to retrieve final assets.
- Keep customer-facing pricing tables in sync with the API behavior you observe in production.
Authorizations
Body
Parameters for video generation across different models and providers
The video model to use for generation. See the docs for the current model list and required inputs.
Text prompt describing the video to generate
"A serene lake at sunset with gentle ripples on the water"
Fully-written script for LongStories models (takes precedence over prompt)
UUID for conversation tracking
Project identifier for LongStories models
Story framework for LongStories models
Available options: default, emotional_story, product_showcase, tutorial
Smart Enhancement: if true, automatically choose better framework and add Director Notes if necessary
Target length in words for LongStories models (legacy parameter)
Target length in seconds (alternative to words)
Prompt for the image generation engine (LongStories). Example: 'Warm lighting' or 'Make the first image very impactful'
"Warm, cozy lighting with focus on people interacting"
Video aspect ratio for LongStories
Available options: 9:16, 16:9
Script generation configuration for LongStories
Image generation configuration for LongStories
Video generation configuration for LongStories
Voiceover configuration for LongStories
Captions configuration for LongStories
Effects configuration for LongStories
Music configuration for LongStories
Legacy: Voice ID for narration (use voiceoverConfig.voiceId instead)
"pNInz6obpgDQGcFmaJgB"
Legacy: Whether to show captions (use captionsConfig.captionsEnabled instead)
Legacy: Style for captions (use captionsConfig.captionsStyle instead)
Available options: default, minimal, neon, cinematic, fancy, tiktok, highlight, gradient, instagram, vida, manuscripts
Legacy: Video effects configuration (use effectsConfig instead)
Legacy: Video quality (handled by videoConfig now)
Available options: low, medium, high
Legacy: Motion configuration (handled by videoConfig now)
Legacy: Music track (use musicConfig instead)
"video-creation/music/dramatic_cinematic_score.mp3"
Video duration (format varies by model - '5s' for Veo2, '5' for Kling, etc.)
"5s"
Aspect ratio (supported by select models)
Available options: 16:9, 9:16, 1:1, 4:3, 3:4
Negative prompt to avoid certain elements
"blur, distort, and low quality"
Classifier-free guidance scale
Required range: 0 <= x <= 20
Base64 data URL of input image for image-to-video models. Aliases image_data_url and image are also accepted and normalized.
"data:image/jpeg;base64,/9j/4AAQ..."
Public HTTPS URL of the input image (interchangeable with imageDataUrl). The service will prioritize whichever field you supply before falling back to library attachments.
"https://images.unsplash.com/photo-1504196606672-aef5c9cefc92?w=1024"
Library attachment ID for input image
Public HTTPS URL of the input video (extend/edit/upscale). Preferred field name for source videos.
Base64 data URL of the input video.
Alternate video field accepted by select providers.
Library attachment ID for input video.
Whether to optimize the prompt (MiniMax model)
Number of inference steps
Required range: 1 <= x <= 50
Enable pro mode for Hunyuan Video
Video resolution
Available options: 720p, 1080p, 540p
Number of frames to generate
Frames per second
Required range: 5 <= x <= 24
Random seed for reproducible results
Enable safety content filtering
Allow explicit content (inverse of safety checker)
Enable automatic prompt expansion
Enable acceleration for faster processing
Shift parameter for certain models
Age setting for PromptChan model
Required range: 18 <= x <= 60
Enable audio for PromptChan model
Video quality for PromptChan model
Available options: Standard, High
Aspect setting for PromptChan model
Available options: Portrait, Landscape, Square
Response
Video generation request submitted successfully (asynchronous processing)
Unique identifier for the video generation request
Current status of the generation
Available options: pending, processing, completed, failed
The model used for generation
Project identifier (for LongStories models)
Cost of the video generation
Payment source used (USD or XNO)
Remaining balance after the generation
Provider label for the precharge