Demis Hassabis, CEO of Google DeepMind, says Google plans to eventually combine its Gemini and Veo AI models.

Google’s Vision for AI: Merging Gemini with Veo
An Insight from Demis Hassabis
In a recent episode of the podcast Possible, co-hosted by Reid Hoffman, Demis Hassabis, the CEO of Google DeepMind, discussed where the company's AI technology is headed. Hassabis said that Google intends to combine its Gemini AI models with its Veo video-generating models. The goal is to improve Gemini's understanding of the physical world, which would broaden the range of tasks it can help with.
Hassabis explained that Gemini was designed from the outset to be a multimodal foundation model. That design reflects Google's ambition to build what it calls a "universal digital assistant": an assistant that, according to Hassabis, would actually help people in the real world.
The Rise of Omni Models
The AI industry is gradually shifting toward "omni" models, which can understand and generate many forms of media, including text, images, audio, and video. Google's latest Gemini models, for example, can generate audio as well as images and text. Similarly, OpenAI's default model in ChatGPT can now natively create images, including artwork in the style of Studio Ghibli.
Amazon has also announced plans to launch an "any-to-any" model, another sign of the broader industry trend toward models that can handle multiple types of media.
Importance of Training Data
Building omni models requires large amounts of training data, typically a mix of images, videos, audio recordings, and text. Hassabis suggested that much of Veo's training data comes from YouTube, which Google owns.
He explained that by watching a large number of YouTube videos, Veo 2 can learn the physics of the world around us, knowledge that should help it produce more realistic results.
Google’s Use of YouTube Data
Google previously told TechCrunch that its AI models may be trained on "some" YouTube content, in accordance with its agreements with creators on the platform. Google also reportedly broadened its terms of service last year, in part to access more data for training its AI models, including models like Gemini and Veo.
Drawing on resources it already owns, such as YouTube, gives Google a large supply of real-world video for refining models intended to provide practical, everyday assistance.
Conclusion: The Future of AI Development
Combining Gemini and Veo would be a significant step toward Google's goal of building a universal digital assistant. As AI models increasingly bridge text, images, audio, and video, the range of possible applications continues to grow, offering a glimpse of how digital assistants may fit into daily life.