Today, Google announced the development of Imagen Video, a text-to-video AI model capable of producing 1280×768 videos at 24 frames per second from a written prompt. Currently, it's in a research phase, but its appearance five months after Google Imagen points to the rapid development of video synthesis models.
Only six months after the launch of OpenAI's DALL-E 2 text-to-image generator, progress in the field of AI diffusion models has been heating up rapidly. Google's Imagen Video announcement comes less than a week after Meta unveiled its own text-to-video AI tool, Make-A-Video.
According to Google’s research paper, Imagen Video includes several notable stylistic abilities, such as generating videos based on the work of famous painters (the paintings of Vincent van Gogh, for example), generating 3D rotating objects while preserving object structure, and rendering text in a variety of animation styles. Google is hopeful that general-purpose video synthesis models can “significantly decrease the difficulty of high-quality content generation.”