Demis Hassabis, CEO of Google DeepMind, says Google plans to eventually combine its Gemini and Veo AI models.

Google DeepMind’s Vision for AI
In a recent episode of the podcast Possible, co-hosted by LinkedIn co-founder Reid Hoffman, Google DeepMind CEO Demis Hassabis said that Google plans to eventually combine its Gemini AI models with Veo, its video-generating model. The aim is to improve Gemini’s understanding of the physical world.
The Multimodal Approach
Hassabis remarked, "We’ve always built Gemini, our foundation model, to be multimodal from the beginning." The goal, he said, is a universal digital assistant that helps users in the real world. This fits a broader trend in the AI industry toward models that can handle many types of media, often called "omni" models.
What Are Omni Models?
Omni models are AI systems that can understand and generate several forms of media, including text, images, audio, and video. Google’s latest Gemini models fit this description: they can now generate audio as well as text and images. Similarly, the default model in OpenAI’s ChatGPT can now create images, including art in the style of Studio Ghibli.
Amazon is also entering this space and has announced plans to launch an "any-to-any" model later this year. Taken together, these releases show how quickly the major AI developers are moving toward models that work across media types.
The Role of Training Data
Building these omni models isn’t a simple task; they require vast amounts of training data sourced from multiple media types, including images, videos, audio, and text. Hassabis indicated that the video data for Veo largely comes from YouTube, which is owned by Google.
As he explained, "By watching YouTube videos — a lot of YouTube videos — [Veo 2] can figure out the physics of the world." This highlights the reliance on extensive content from platforms like YouTube to train AI models effectively.
YouTube’s Data Contribution
Google has said its models may be trained on "some" YouTube content, in accordance with its agreements with creators on the platform. The company also reportedly broadened its terms of service last year, in part to make more data available for training its AI models. Access to training data at this scale is a competitive advantage as AI development accelerates.
Future Outlook for AI and Google
Combining capabilities such as those of Gemini and Veo reflects Google’s push toward a single model that understands and generates multiple media types. Multimodal models of this kind could power assistants that work across many contexts, from text conversations to video.
As Google, OpenAI, and Amazon continue to build in this direction, both users and developers stand to be affected. Each new capability makes AI more versatile and could change how people approach problem-solving and everyday tasks.