Two Voice Devs
Mark and Allen
256 episodes
1 day ago
Mark and Allen talk about the latest news in the VoiceFirst world from a developer point of view.
Technology
All content for Two Voice Devs is the property of Mark and Allen and is served directly from their servers with no modification, redirects, or rehosting. The podcast is not affiliated with or endorsed by Podjoint in any way.
Episode 253 - The Future of Voice? Exploring Gemini 2.5's TTS Model
Two Voice Devs
25 minutes 40 seconds
2 months ago

In this episode of Two Voice Devs, Mark and Allen dive into the new experimental Text-to-Speech (TTS) model in Google's Gemini 2.5. They explore its capabilities, from single-speaker to multi-speaker audio generation, and discuss how it's a significant leap from the old days of SSML. They also touch on how this new technology can be integrated with LangChainJS to create more dynamic and natural-sounding voice applications. Is this the return of voice as the primary interface for AI?


[00:00:00] Introduction

[00:00:45] Google's new experimental TTS model for Gemini

[00:01:55] Demo of single-speaker TTS in Google's AI Studio

[00:03:05] Code walkthrough for single-speaker TTS

[00:04:30] Lack of fine-grained control compared to SSML

[00:05:15] Using text cues to shape the TTS output

[00:06:20] Demo of multi-speaker TTS with a script

[00:09:50] Code walkthrough for multi-speaker TTS

[00:11:30] The model is tuned for TTS, not general conversation

[00:12:10] Using a separate LLM to generate a script for the TTS model

[00:13:30] Code walkthrough of the two-function approach with LangChainJS

[00:16:15] LangChainJS integration details

[00:19:00] Is Speech Markdown still relevant?

[00:21:20] Latency issues with the current TTS model

[00:22:00] Caching strategies for TTS

[00:23:30] Voice as the natural UI for AI

[00:25:30] Outro


#Gemini #TTS #VoiceAI #VoiceFirst #AI #Google #LangChainJS #LLM #Developer #Podcast