The Data Life Podcast
Sanket Gupta
27 episodes
1 week ago
This is a podcast where we talk all about real-life experiences of dealing with data and machine learning tools, techniques and personalities. We cover not just the technical aspects but also the "life" aspects of working in the field. Note: Opinions expressed are my own and do not express the views or opinions of my employer.
Technology
All content for The Data Life Podcast is the property of Sanket Gupta and is served directly from their servers with no modification, redirects, or rehosting. The podcast is not affiliated with or endorsed by Podjoint in any way.
16: Getting Started with Natural Language Processing
The Data Life Podcast
19 minutes 31 seconds
6 years ago

We are surrounded by tweets, news articles and other unstructured text. How do we make sense of it all? Natural language processing (NLP) can help. NLP refers to algorithms that process, understand and generate natural language, whether written or spoken. In this episode we cover some of the common techniques in NLP to help you get started in this exciting field!

We cover several tasks in an NLP pipeline:
1. Tokenization and punctuation removal
2. Stemming and Lemmatization
3. One hot vectors
4. Word embeddings, including Word2Vec and GloVe
5. Recurrent Neural Networks and LSTMs
6. tf and tf-idf approaches: when to use word embeddings and when to use tf / tf-idf
7. Generating text using encoder-decoder or sequence-to-sequence models
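
To make a few of these steps concrete, here is a minimal sketch of tokenization with punctuation removal, one hot vectors, and tf / tf-idf weighting, using only the Python standard library. It is not taken from the episode: the function names, the toy corpus and the choice of idf = log(N / df) (one common variant among several) are all assumptions made for illustration.

```python
import math
import string
from collections import Counter


def tokenize(text):
    """Lowercase, strip ASCII punctuation, then split on whitespace."""
    table = str.maketrans("", "", string.punctuation)
    return text.lower().translate(table).split()


def one_hot(token, vocab):
    """Vector of zeros with a single 1 at the token's vocabulary position."""
    return [1 if token == word else 0 for word in vocab]


def tf(tokens):
    """Term frequency: count of each token divided by document length."""
    counts = Counter(tokens)
    return {word: n / len(tokens) for word, n in counts.items()}


def tf_idf(docs):
    """tf-idf per document, assuming idf = log(N / df)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each word appears.
    df = Counter(word for doc in docs for word in set(doc))
    return [
        {word: weight * math.log(n_docs / df[word])
         for word, weight in tf(doc).items()}
        for doc in docs
    ]


# Hypothetical toy corpus, for illustration only.
corpus = [
    "The cat sat on the mat.",
    "The dog chased the cat!",
    "Dogs and cats are pets.",
]
docs = [tokenize(text) for text in corpus]
vocab = sorted({word for doc in docs for word in doc})
weights = tf_idf(docs)
```

With this weighting, a word that occurs in only one document (such as "mat") scores higher than a word shared across documents (such as "cat"), and a word present in every document would score zero. Stemming or lemmatization would normally run between tokenization and counting, so that "cat" and "cats" are treated as the same term; that step is left out here because it usually relies on a library.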

Some resources:
1. Sequence Models - course by Andrew Ng on Coursera - one of the best courses I have seen on this topic! https://www.coursera.org/learn/nlp-sequence-models
2. awesome-nlp - a popular curated collection of NLP resources for Python, C++, Scala and other languages: https://github.com/keon/awesome-nlp
3. Overview of Text Similarity Metrics (a blog written by me on Medium): https://towardsdatascience.com/overview-of-text-similarity-metrics-3397c4601f50
4. How to train custom word embeddings on a GPU: https://towardsdatascience.com/how-to-train-custom-word-embeddings-using-gpu-on-aws-f62727a1e3f6

Thanks for listening! Please support this podcast by following the link at the end.
