
What can we learn from Reid Hoffman’s AI clone? In this episode of the Turing Podcast, Mahesh Joshi, Head of Data and AI at Turing, explores how multimodal AI (systems that integrate text, audio, video, and more) is redefining what’s possible in artificial intelligence.
From the rapid progress in audio generation to the more complex challenges of video generation, Mahesh breaks down where the technology stands today, the realistic benchmarks needed to measure its true capabilities, and the enterprise opportunities emerging from these breakthroughs. The conversation also looks ahead to the quest for embodied AI, where digital intelligence could interact with the world in human-like ways.
Whether you’re fascinated by the idea of an AI clone or looking to understand the cutting edge of generative AI applications, this episode offers a clear-eyed view of the multimodal frontier and Turing’s role in pushing it forward.
Chapters
[00:00] Introduction to Multimodal AI
[01:08] The Rise of Multimodal Systems
[02:59] State of the Art in Audio and Video
[07:37] Challenges in Video Generation
[10:14] Opportunities for Incumbents in Video AI
[14:11] Benchmarking AI: The Turing Test and Beyond
[18:22] Defining Human Interaction with AI
[24:45] Future of Multimodal Applications
[27:04] Enterprise Adoption of Multimodal AI
[30:06] Turing's Role in AI Advancement
[33:16] Research Focus: Embodied AI