Grok Imagine Video 1.5 generates video with synchronized audio from a single image
Single-pass image-to-video model with synchronized audio, now available through AI Gateway and the AI SDK; chain it with image generation for end-to-end animation workflows.
June 4, 2026
Summary
Generating video and audio in one pass eliminates the separate video generation and audio synthesis steps, which cuts latency and API calls for developers building text-to-video pipelines. Expanded reference image support gives finer control over output style without fine-tuning.
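The chained workflow this enables (text to a still image, then that image to video with audio in one pass) can be sketched as below. This is a minimal sketch, not the AI SDK's API: the `ImageGenerator` and `VideoWithAudioGenerator` types, the `textToAnimatedClip` helper, and the request fields (`model`, `image`, `prompt`) are hypothetical wrappers you would implement around real AI Gateway calls. Only the model id comes from the brief. Injecting the generators keeps the orchestration runnable offline and makes the call count explicit: two model calls, with no separate audio synthesis step.

```typescript
// Hypothetical wrappers: plug real AI SDK / AI Gateway calls in behind these types.
type ImageGenerator = (prompt: string) => Promise<Uint8Array>;
type VideoWithAudioGenerator = (req: {
  model: string;
  image: Uint8Array;
  prompt: string;
}) => Promise<Uint8Array>;

// Model id from the brief; everything else here is illustrative.
const VIDEO_MODEL = "xai/grok-imagine-video-1.5-preview";

interface ClipResult {
  video: Uint8Array; // video bytes with the audio track already synchronized
  apiCalls: number; // model calls made for this clip
}

// Text -> image -> video-with-audio: two model calls in total.
async function textToAnimatedClip(
  prompt: string,
  generateImage: ImageGenerator,
  generateVideoWithAudio: VideoWithAudioGenerator,
): Promise<ClipResult> {
  let apiCalls = 0;

  // Step 1: generate the still image that seeds the animation.
  const image = await generateImage(prompt);
  apiCalls++;

  // Step 2: one pass returns video and synchronized audio together,
  // so no separate audio synthesis call follows.
  const video = await generateVideoWithAudio({ model: VIDEO_MODEL, image, prompt });
  apiCalls++;

  return { video, apiCalls };
}

// Usage with in-memory fakes (no network access needed).
const fakeImage: ImageGenerator = async () => new Uint8Array([1, 2, 3]);
const fakeVideo: VideoWithAudioGenerator = async ({ image }) => image.slice().reverse();

textToAnimatedClip("a fox running through snow", fakeImage, fakeVideo).then((r) => {
  console.log(r.apiCalls); // prints 2
});
```

In production, each fake would be replaced by a function that calls the corresponding model through AI Gateway; the orchestration itself stays the same.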
Implementation verdict
Replaces multi-step video-plus-audio workflows. Set `model: 'xai/grok-imagine-video-1.5-preview'` and pass the input image as base64 data. Available now through AI Gateway; the TypeScript AI SDK exposes it through an experimental generateVideo API. Worth testing if you already build on Vercel's stack.
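The two setup requirements above (the model id and base64 image data) can be sketched as a small, dependency-free helper. The `toBase64` and `buildVideoRequest` names and the `VideoRequest` shape (`model`, `image`, `prompt`) are hypothetical, chosen for illustration; check the AI SDK reference for the exact options the experimental generateVideo API accepts before passing such an object to it. The encoder is plain RFC 4648 base64 with `=` padding, written by hand so it runs the same in Node, Deno, or a browser.

```typescript
const BASE64_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Standard base64 (RFC 4648) with '=' padding, no runtime-specific APIs.
function toBase64(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    // Out-of-range reads on a typed array yield undefined; treat them as 0.
    const b0 = bytes[i] ?? 0;
    const b1 = bytes[i + 1] ?? 0;
    const b2 = bytes[i + 2] ?? 0;
    const triple = (b0 << 16) | (b1 << 8) | b2;
    out += BASE64_ALPHABET.charAt((triple >> 18) & 63);
    out += BASE64_ALPHABET.charAt((triple >> 12) & 63);
    // Pad with '=' when the final group has fewer than three input bytes.
    out += i + 1 < bytes.length ? BASE64_ALPHABET.charAt((triple >> 6) & 63) : "=";
    out += i + 2 < bytes.length ? BASE64_ALPHABET.charAt(triple & 63) : "=";
  }
  return out;
}

// Hypothetical request shape; field names are illustrative assumptions.
interface VideoRequest {
  model: string;
  image: string; // base64-encoded input image
  prompt: string; // assumed optional motion/scene description
}

function buildVideoRequest(imageBytes: Uint8Array, prompt: string): VideoRequest {
  return {
    model: "xai/grok-imagine-video-1.5-preview", // model id from the brief
    image: toBase64(imageBytes),
    prompt,
  };
}

// Usage: the bytes of the ASCII string "Man" encode to "TWFu".
const req = buildVideoRequest(new Uint8Array([77, 97, 110]), "slow pan across the scene");
console.log(req.image); // prints "TWFu"
```

In Node you could swap the hand-written encoder for `Buffer.from(bytes).toString("base64")`; the manual version is shown only to keep the sketch portable.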
Sources
1. The model generates video from an input image with synchronized audio in a single pass.
2. Face accuracy and character consistency are stronger across longer sequences, with better lighting and physical realism in the output.
3. Set model to `xai/grok-imagine-video-1.5-preview` in the AI SDK.
Dev Signal