Episode 253 - The Future of Voice? Exploring Gemini 2.5's TTS Model

Two Voice Devs

Mark and Allen

256 episodes

Mark and Allen talk about the latest news in the VoiceFirst world from a developer point of view.

Technology

25 minutes 40 seconds

2 months ago

In this episode of Two Voice Devs, Mark and Allen dive into the new experimental Text-to-Speech (TTS) model in Google's Gemini 2.5. They explore its capabilities, from single-speaker to multi-speaker audio generation, and discuss how it's a significant leap from the old days of SSML. They also touch on how this new technology can be integrated with LangChainJS to create more dynamic and natural-sounding voice applications. Is this the return of voice as the primary interface for AI?

[00:00:00] Introduction

[00:00:45] Google's new experimental TTS model for Gemini

[00:01:55] Demo of single-speaker TTS in Google's AI Studio

[00:03:05] Code walkthrough for single-speaker TTS

[00:04:30] Lack of fine-grained control compared to SSML

[00:05:15] Using text cues to shape the TTS output

[00:06:20] Demo of multi-speaker TTS with a script

[00:09:50] Code walkthrough for multi-speaker TTS

[00:11:30] The model is tuned for TTS, not general conversation

[00:12:10] Using a separate LLM to generate a script for the TTS model

[00:13:30] Code walkthrough of the two-function approach with LangChainJS

[00:16:15] LangChainJS integration details

[00:19:00] Is Speech Markdown still relevant?

[00:21:20] Latency issues with the current TTS model

[00:22:00] Caching strategies for TTS

[00:23:30] Voice as the natural UI for AI

[00:25:30] Outro

#Gemini #TTS #VoiceAI #VoiceFirst #AI #Google #LangChainJS #LLM #Developer #Podcast