Ep 2: ASR models, accuracy, cost & the role of humans

AI Innovators - By SaladCloud

AI Innovators

17 episodes

6 months ago

Technology

Ep 2: ASR models, accuracy, cost & the role of humans - Aleks Smechov from Wordcab

20 minutes 17 seconds

1 year ago

In this conversation, Derick Thompson from Salad Technologies interviews Aleks Smechov from Wordcab about transcription, ASR, and accessibility. They discuss the importance of accurate transcripts for global accessibility, the different definitions of verbatim transcription, and the impact of audio cues. They also talk about the best ASR models, tools for post-processing, and the need for human editors in transcription. The conversation concludes with a discussion on the future of ASR and transcription.

Takeaways
Accurate transcripts are crucial for global accessibility, allowing people with disabilities to understand audio and video content.
Different definitions of verbatim transcription exist, ranging from including all disfluencies to a more cleaned-up version.
Audio cues, such as laughter or coughing, are important for accessibility and may need to be added during transcription.
The best ASR models for transcription depend on the specific use case and language requirements.
Post-processing is essential for improving transcript accuracy, especially for industry-specific terms and difficult words.
Human editors play a vital role in fine-tuning transcripts and adding value through post-processing and audio cues.
The future of ASR and transcription lies in increasing accuracy, reducing word error rates, and focusing on post-processing capabilities.
Transcription will become a commodity, and the real value will come from what can be done with the transcript after transcription.
Using cost-effective GPU instances and cloud-agnostic tools is important for hosting ASR models.
The goal is to provide reliable and affordable transcription services to meet the needs of different use cases.

Sound Bites
"Accessibility in terms of video and audio, captions and transcription in general, is making sure that people who have some sort of disability, maybe they're hard of hearing or deaf, are still able to understand the captions or subtitles or transcript as well as someone who could hear."
"Transcript editing will always be there as a kind of a last mile thing for edge cases and there will always be edge cases."
"Transcription will become a commodity or table stakes like, you'll have to have excellent transcription, 95% accuracy, et cetera, in the future. And the real value will come in with what you could do after."

Chapters
00:00: Introduction and Overview of WordCab
01:14: Defining Verbatim Transcription and Audio Cues
07:03: Choosing the Best ASR Models for Transcription
09:26: The Importance of Post-Processing in Transcription
12:51: Accuracy, Word Error Rate, and Transcription
14:17: Tools and Approaches for ASR and Transcription
19:43: The Future of ASR and Transcription
21:08: Optimizing ASR Performance and Cost
22:07: Providing Reliable and Affordable Transcription Services
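
The episode refers to word error rate (WER) and to accuracy figures such as 95% without defining them. As a rough illustration only, WER is commonly computed as the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the ASR output, divided by the number of words in the reference, and "accuracy" is often quoted informally as 1 minus WER. The sketch below assumes that common definition, which may not be the exact metric the guest has in mind; the function name `word_error_rate` and the sample sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumes simple lowercase, whitespace tokenization; real evaluations
    usually also normalize punctuation and numbers first.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substituted word and one dropped word.
reference = "the quick brown fox jumps over the lazy dog today"
hypothesis = "the quick brown fox jumped over the lazy dog"
wer = word_error_rate(reference, hypothesis)
print(f"WER: {wer:.2f}, approximate accuracy: {1 - wer:.0%}")
```

Because insertions count as errors, WER can exceed 1.0 when the ASR output is much longer than the reference, which is one reason "accuracy = 1 - WER" is only an informal shorthand.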