Making AI Video
May 9, 2025
A few quick notes from the intersection of AI, marketing, branding, and small business.
How AI video is getting created. On LinkedIn and elsewhere, AI video creators are showing off samples of impressive work. This PJ Pereira short film is one beautiful example. This hybrid example, posted by Albert Bozesan, combines AI with more traditional filmmaking methods. To say that change is happening fast is the understatement of the decade.
What follows is a description of what the current workflow looks like to get the best results in AI filmmaking. I’m focused on this because it looks like inexpensive AI-generated product videos are just around the corner. Of course I can’t stress enough that this workflow only represents today’s best practices. Next week the tools and way of working might be different—will be different—and the results even more eye-popping.
The AI video creation process that’s currently in vogue follows a multi-stage approach that uses different AI tools based on their specific strengths. This workflow begins with text-to-image generation for keyframes, followed by specialized video generation tools for animation, refinement, and finishing.
But first, often before touching AI, the best creators develop a clear concept and vision of what they ultimately want to make. This first step is what can separate “point-and-shoot” videos from art. Next, with or without AI collaboration, they script out the video, including an outline and detailed scene descriptions. Don’t rush these first steps. The more you polish here, the better the outcome.
Next, using text-to-image AI tools like Midjourney, DALL-E / ChatGPT, or Stable Diffusion, these “vide-ai-ographers” generate high-quality still images that serve as keyframes. From what I’ve seen, many creators are using the recent ChatGPT release because it adheres closely to prompts, makes fine adjustments easy, and has been doing a better job at rendering difficult things like hands. The keyframes establish the visual style, composition, character designs, and environment details that the video will follow. It’s common to generate many variations of each keyframe, reprompting along the way, to arrive at the most compelling images that are in line with the original vision.
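To make the keyframe step concrete, here’s a minimal sketch of generating a few variations through the OpenAI Images API (the DALL-E models mentioned above). The prompt, model choice, and filenames are illustrative assumptions, not a recommendation of one tool over another.

```python
# Minimal sketch: generate keyframe variations with the OpenAI Images API.
# Assumes the `openai` Python package is installed and OPENAI_API_KEY is set.
# The prompt, model name, and file naming are illustrative placeholders.
from openai import OpenAI
import urllib.request

client = OpenAI()

KEYFRAME_PROMPT = (
    "Cinematic product shot of a ceramic coffee mug on a rain-soaked café table, "
    "golden-hour light, shallow depth of field, 35mm film look"
)

# Generate several variations of the same keyframe so you can pick the one
# that best matches the original vision, then reprompt and repeat as needed.
for i in range(4):
    result = client.images.generate(
        model="dall-e-3",      # assumed model; swap in whichever tool you prefer
        prompt=KEYFRAME_PROMPT,
        size="1792x1024",      # a wide frame suits video keyframes
        quality="hd",
        n=1,                   # dall-e-3 returns one image per request
    )
    urllib.request.urlretrieve(result.data[0].url, f"keyframe_v{i + 1}.png")
    print(f"Saved keyframe_v{i + 1}.png")
```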
Once the keyframes are finalized, they're imported into specialized video generation platforms. Tools like Runway, Pika, Kling, or Higgsfield take these reference images and transform them into fluid video sequences. Each platform offers different specialized capabilities, so you’ll want to select the one that will work best for the video you envision: Runway excels at realistic motion, Pika offers strong character animation, and tools like Higgsfield and Kling provide extensive camera movement options, including pans, zooms, and focus adjustments.
The most sophisticated creators might employ a multi-platform approach, using different tools for specific effects in what will ultimately be a unified video. For example, they might use Runway for realistic human movement sequences, while using Higgsfield for dynamic camera movements. This stage, like the ones before, often involves numerous iterations as creators refine prompts to achieve the desired motion, framing, pacing, and visual coherence.
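The shape of this keyframe-to-video step is roughly the same everywhere: upload a still plus a motion prompt, wait for an asynchronous job to finish, download a clip. The sketch below shows that pattern against a hypothetical endpoint; the URL, fields, and response shape are placeholders, since Runway, Pika, Kling, and Higgsfield each have their own APIs or app workflows.

```python
# Conceptual sketch of the keyframe-to-video step. The endpoint, payload fields,
# and response shape are HYPOTHETICAL placeholders; check the docs of whichever
# platform (Runway, Pika, Kling, Higgsfield) you actually choose.
import time
import requests

API_BASE = "https://api.example-video-platform.com/v1"   # placeholder URL
API_KEY = "YOUR_API_KEY"                                  # placeholder credential
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

def animate_keyframe(image_path: str, motion_prompt: str) -> str:
    """Submit a keyframe plus a motion prompt, poll until the clip is ready,
    and return the URL of the generated video clip."""
    with open(image_path, "rb") as f:
        job = requests.post(
            f"{API_BASE}/image-to-video",
            headers=HEADERS,
            files={"image": f},
            data={"prompt": motion_prompt, "duration_seconds": 5},
        ).json()

    # Video generation is asynchronous on most platforms: poll the job status.
    while True:
        status = requests.get(f"{API_BASE}/jobs/{job['id']}", headers=HEADERS).json()
        if status["state"] == "succeeded":
            return status["video_url"]
        if status["state"] == "failed":
            raise RuntimeError(status.get("error", "generation failed"))
        time.sleep(10)

clip_url = animate_keyframe(
    "keyframe_v2.png",
    "slow dolly-in toward the mug as steam rises, rain blurring in the background",
)
print("Clip ready:", clip_url)
```

In a multi-platform workflow, you would send different keyframes to different services and simply collect the resulting clips for the edit.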
After generating the raw video clips, many creators will import them into traditional video editing software like Adobe Premiere for final assembly. Here they can add transitions, sound effects, music, voiceovers, and text overlays.
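For creators who want a scriptable rough cut before (or instead of) opening Premiere, ffmpeg can stitch the raw clips together and lay a music track underneath. This is a sketch, not a substitute for a real edit; the filenames are placeholders and it assumes ffmpeg is installed and the clips share the same codec and resolution.

```python
# Optional sketch: rough-assemble the raw clips with ffmpeg via subprocess.
# Filenames are placeholders; transitions, voiceover, and titles still belong
# in a proper editor.
import subprocess

clips = ["clip_01.mp4", "clip_02.mp4", "clip_03.mp4"]

# ffmpeg's concat demuxer reads a plain-text list of input files.
with open("clips.txt", "w") as f:
    for clip in clips:
        f.write(f"file '{clip}'\n")

# Stitch the clips together without re-encoding.
subprocess.run(
    ["ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", "clips.txt",
     "-c", "copy", "assembled.mp4"],
    check=True,
)

# Lay a music track under the assembled cut, ending when the video ends.
subprocess.run(
    ["ffmpeg", "-y", "-i", "assembled.mp4", "-i", "music.mp3",
     "-map", "0:v", "-map", "1:a", "-c:v", "copy", "-shortest", "final.mp4"],
    check=True,
)
```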
This multi-stage process has been evolving quickly and constantly as new tools emerge and existing ones improve. The workflow represents a balance between leveraging AI capabilities and maintaining creative control through human curation and traditional editing techniques. It’ll be interesting to see whether this video creation ecosystem continues to be a multi-step process, with platforms competing on specific features at a particular step, or whether platforms will try to become the single go-to for video creation from script to keyframe to scene creation to what used to be called post-production. Time will tell.