Overview

Kling 3.0 is a unified multimodal video model. It understands cinematic direction, not keyword lists. Write prompts like a director: describe what the audience sees, hears, and feels over time.

Core shift: Description → Direction. Think "direct a scene", not "describe an image."

Interactive Builder Workflow

When invoked, guide the user through these steps:

Step 1: Determine Generation Mode

Ask the user which mode:

- Text-to-Video: prompt from scratch
- Image-to-Video: animate a reference image
- Multi-Shot Sequence: 2-6 shot storyboard (up to 15s)
- Keyframe Transition: s…