Lumiere, Google’s new video generation artificial intelligence (AI) model, marks a new milestone in video editing and creation technology. According to Ars Technica, Lumiere uses a new diffusion model called Space-Time U-Net (STUNet), which figures out where objects are in a video and how they move and change over time. This approach lets Lumiere render a video in a single pass, producing one coherent clip rather than stitching together individual still frames.
Lumiere starts by creating an initial frame from a text prompt, then uses the STUNet framework to predict how objects in that frame will move, generating additional frames that flow into one another and create the impression of seamless motion. Lumiere also produces 80 frames per clip, compared to Stable Video Diffusion’s 25.
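The difference between the two approaches described above can be illustrated with a toy sketch. This is not Google's code; all names, shapes, and the interpolation scheme below are illustrative assumptions, contrasting keyframe-then-interpolate pipelines with a STUNet-style pass that treats the whole clip as one space-time volume:

```python
# Conceptual sketch (not Lumiere's actual implementation).
# All function names, tensor shapes, and steps are illustrative assumptions.
import numpy as np

FRAMES, H, W, C = 80, 8, 8, 3  # tiny toy resolution for illustration

def keyframe_pipeline(rng):
    """Older approach: generate a few sparse keyframes, then fill in
    the frames between them as a separate interpolation step."""
    keyframes = rng.random((5, H, W, C))          # e.g. 5 generated stills
    idx = np.linspace(0, 4, FRAMES)               # map 80 frames onto 5 keys
    lo = np.floor(idx).astype(int)
    hi = np.ceil(idx).astype(int)
    t = (idx - lo)[:, None, None, None]
    return (1 - t) * keyframes[lo] + t * keyframes[hi]

def space_time_pipeline(rng):
    """STUNet-style idea: treat the clip as one space-time volume and
    process it jointly, downsampling in time as well as space, so all
    frames are produced in a single pass."""
    volume = rng.random((FRAMES, H, W, C))        # the full clip, one tensor
    coarse = volume[::4, ::2, ::2]                # downsample time AND space
    _ = coarse  # a real model would denoise `coarse` and upsample it back
    return volume                                  # all frames emitted together

rng = np.random.default_rng(0)
print(keyframe_pipeline(rng).shape)   # both yield an 80-frame clip
print(space_time_pipeline(rng).shape)
```

The point of the sketch is the shape of the computation: the first function only ever "sees" five frames at a time and invents the rest afterward, while the second operates on the entire 80-frame tensor at once, which is why motion can be modeled globally rather than patched in between keyframes.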
Lumiere: A step forward in AI technology that pushes the limits of realism
Google has published a “sizzle reel” and a scientific preprint on Lumiere, showing how far AI video generation and editing tools have advanced toward realism in just a few years. Lumiere establishes Google’s position in a space already crowded with competitors such as Runway, Stable Video Diffusion, and Meta’s Emu. Runway, one of the first mass-market text-to-video platforms, has already begun delivering more realistic-looking videos.
Google posted clips and the prompts used to create them on the Lumiere site, which allowed me to compare it to Runway. Lumiere’s videos can still look artificial up close, particularly in skin texture or in more atmospheric scenes, but in details such as the way a turtle actually moves underwater, they come across as quite realistic.
Where other models assemble videos from generated keyframes in which the motion has already happened, STUNet lets Lumiere focus on the motion itself, tracking where rendered content should be at each point in time in the video. Google has not yet become a major player in the text-to-video category, but it has steadily introduced more advanced AI models and shifted toward a more multimodal focus; its Gemini large language model will eventually bring image generation to Bard. Although Lumiere is not yet available for testing, it demonstrates that Google can build an AI video platform comparable to, and perhaps slightly better than, generally available generators such as Runway and Pika. That is a notable leap from where Google’s AI video efforts stood just two years ago.
Lumiere goes beyond text-to-video generation: it supports image-to-video generation, stylized generation that creates videos in a specific style, cinemagraphs that animate only a selected portion of a video, and inpainting, which fills in masked areas of a video.
However, Google’s Lumiere paper notes that “there is a risk of creating fake or harmful content with our technology, and developing tools to detect bias and abusive use cases is vital to ensuring safe and fair use of the app.” The authors do not explain how this would be achieved.