Grok Imagine Video 1.5 generates video with synchronized audio from an image

Single-pass image-to-video model with synchronized audio, now available through AI Gateway and the AI SDK; chain it with image generation for end-to-end animation workflows.

June 4, 2026

Summary

Eliminates separate video generation and audio synthesis steps, reducing latency and API calls for developers building video-from-text pipelines. Expanded reference image support gives finer control over output style without fine-tuning.

Implementation verdict

Replaces multi-step video and audio workflows with a single call. To use it, set `model: 'xai/grok-imagine-video-1.5-preview'` and pass the input image as base64 data. It is available now through AI Gateway, and the TypeScript AI SDK includes an experimental generateVideo API. Worth testing if you're already on Vercel's stack.
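
As a rough illustration of the two requirements above (the model ID and base64 image data), here is a minimal sketch that assembles a request object. The `VideoRequest` shape and the `buildVideoRequest` helper are hypothetical names introduced for this example; the option names the experimental generateVideo API actually expects may differ, so check the SDK documentation before passing anything like this to it.

```typescript
import { Buffer } from "node:buffer";

// Model ID quoted in the brief.
const MODEL_ID = "xai/grok-imagine-video-1.5-preview";

// Hypothetical request shape for illustration only;
// the real SDK's option names may differ.
interface VideoRequest {
  model: string;
  prompt: { image: string; text: string };
}

// Encode raw image bytes as base64 and pair them with a text prompt.
function buildVideoRequest(imageBytes: Uint8Array, text: string): VideoRequest {
  if (imageBytes.length === 0) {
    throw new Error("imageBytes must not be empty");
  }
  const image = Buffer.from(imageBytes).toString("base64");
  return { model: MODEL_ID, prompt: { image, text } };
}

// Example: the first four bytes of a PNG file stand in for a real image.
const request = buildVideoRequest(
  new Uint8Array([0x89, 0x50, 0x4e, 0x47]),
  "Slow pan across the scene with ambient sound",
);
console.log(request.model, request.prompt.image);
```

In a real pipeline the bytes would come from a file read or from an upstream image generation step, which is where the chaining mentioned at the top of this brief would plug in.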

Sources

  1. The model generates video from an input image with synchronized audio in a single pass
  2. Face accuracy and character consistency are stronger across longer sequences, with better lighting and physical realism in the output
  3. Set model to `xai/grok-imagine-video-1.5-preview` in the AI SDK

Dev Signal
