Your Data Teacher Podcast

Your Data Teacher

7 episodes

5 days ago

A podcast about data science, machine learning, artificial intelligence, statistics and everything related to data. Home Page: https://www.yourdatateacher.com

Technology

Episodes (7/7)

Episode 7 - A Python library to remove collinearity

Collinearity is a serious problem in machine learning. It increases the number of dimensions of a dataset without increasing the amount of information it carries. That's why I've created a Python library that removes collinearity from a dataset, and I talk about it in this episode.
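
As a rough illustration of the idea, and not the API of the collinearity package, the following pure-Python sketch drops any feature whose absolute Pearson correlation with an already-kept feature exceeds a threshold. The function names, the greedy keep-first strategy, the data layout, and the 0.9 threshold are all assumptions made for this example.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def drop_collinear(columns, threshold=0.9):
    """Greedily keep features whose |correlation| with every kept feature
    stays at or below `threshold`.

    `columns` maps feature name -> list of values (hypothetical layout).
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Made-up data: "b" is almost exactly 2 * "a", "c" is only weakly related.
data = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.1, 3.9, 6.2, 8.0, 9.9],
    "c": [5.0, 1.0, 4.0, 2.0, 3.0],
}
selected = drop_collinear(data)
print(selected)
```

Which feature of a correlated pair survives depends on column order here; a real selection procedure might instead rank features by their relevance to the target.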

Article: https://www.yourdatateacher.com/2021/06/28/a-python-library-to-remove-collinearity/

PyPI package: https://pypi.org/project/collinearity/

GitHub repo: https://github.com/gianlucamalato/collinearity

4 years ago

8 minutes 39 seconds

Episode 6 - Checking the distribution of your data using Q-Q plot

In this episode, I'm talking about the Q-Q plot and how to use it to check whether a dataset follows a particular distribution. Instead of relying on more complex hypothesis tests such as the Kolmogorov-Smirnov test, this simple plot lets us check whether a dataset follows a given distribution or whether two datasets were drawn from the same distribution.
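
A minimal sketch of what a Q-Q plot computes, using only the Python standard library: the sorted sample is paired with the quantiles of a normal distribution fitted to it, and the points fall close to a straight line when the sample is roughly normal. The helper names, the plotting positions `(i + 0.5) / n`, and the use of a correlation coefficient as a "straightness" summary are assumptions of this example; in practice the points would usually be drawn with a plotting library.

```python
import random
from statistics import NormalDist, mean, stdev


def qq_points(sample):
    """Return (theoretical, observed) quantile pairs against a normal
    distribution fitted with the sample mean and standard deviation."""
    ordered = sorted(sample)
    n = len(ordered)
    dist = NormalDist(mean(ordered), stdev(ordered))
    # Plotting positions (i + 0.5) / n avoid the undefined 0 and 1 quantiles.
    theoretical = [dist.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))


def straightness(points):
    """Pearson correlation of the Q-Q points; values near 1 mean the points
    lie close to a straight line."""
    xs, ys = zip(*points)
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in points)
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


# Synthetic samples: one normal, one strongly skewed (exponential).
rng = random.Random(0)
normal_sample = [rng.gauss(10, 2) for _ in range(500)]
skewed_sample = [rng.expovariate(1.0) for _ in range(500)]

r_normal = straightness(qq_points(normal_sample))
r_skewed = straightness(qq_points(skewed_sample))
print(r_normal, r_skewed)
```

The normal sample should give a value closer to 1 than the skewed one, which is the numeric counterpart of the Q-Q points bending away from the diagonal.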

Link to the article: https://www.yourdatateacher.com/2021/06/16/how-to-use-q-q-plot-for-checking-the-distribution-of-our-data/

4 years ago

7 minutes 28 seconds

Episode 5 - Tuning the threshold in binary classification tasks

In this episode, I'll talk about tuning the threshold in binary classification tasks. The usual value for the threshold is 0.5, but it's useful to optimize it in order to make the model fit our needs. I talk about optimizing according to the ROC curve and maximizing the balanced accuracy.
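
The threshold search described above can be sketched as follows. This is an illustrative example with made-up labels and scores, not the code from the article: it tries every distinct predicted score as a threshold and keeps the one with the highest balanced accuracy. Since balanced accuracy equals (TPR + 1 - FPR) / 2, this is the same as picking the point on the ROC curve that maximizes TPR - FPR.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of the recall on the positive class and on the negative class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(y_true)
    neg = len(y_true) - pos
    return 0.5 * (tp / pos + tn / neg)


def best_threshold(y_true, scores):
    """Try every distinct score as a threshold and keep the one with the
    highest balanced accuracy (ties go to the lowest threshold)."""
    best_t, best_ba = None, -1.0
    for t in sorted(set(scores)):
        y_pred = [1 if s >= t else 0 for s in scores]
        ba = balanced_accuracy(y_true, y_pred)
        if ba > best_ba:
            best_t, best_ba = t, ba
    return best_t, best_ba


# Made-up labels and predicted probabilities for an imbalanced problem.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 0, 1]
scores = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.60]

threshold, score = best_threshold(y_true, scores)
default_pred = [1 if s >= 0.5 else 0 for s in scores]
default_score = balanced_accuracy(y_true, default_pred)
print(threshold, score, default_score)
```

On this toy data the tuned threshold sits well below 0.5 and gives a higher balanced accuracy than the default. In a real project the threshold would be chosen on a validation set rather than on the data used for the final evaluation.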

Link to the article: https://www.yourdatateacher.com/2021/06/14/are-you-still-using-0-5-as-a-threshold/

4 years ago

7 minutes 45 seconds

Episode 4 - Ensemble models. Bagging and boosting

In this episode, I'm going to talk about ensemble models, particularly bagging and boosting. Bagging is very useful for reducing variance, while boosting is used for reducing bias. The most common bagging algorithm is Random Forest; the most common boosting algorithm is Gradient Boosting, whose most common implementations are XGBoost, LightGBM and CatBoost.
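
To make the variance-reduction claim concrete, here is a small sketch of bagging only (boosting is not shown). It is not Random Forest: the base learner is a 1-nearest-neighbour regressor, chosen here as an assumption because it is simple and has high variance. Each copy is trained on a bootstrap resample and the predictions are averaged. The data, function names, and number of models are made up for illustration.

```python
import random


def fit_1nn(xs, ys):
    """Return a 1-nearest-neighbour regressor, a deliberately high-variance
    base learner."""
    pairs = list(zip(xs, ys))

    def predict(x):
        return min(pairs, key=lambda p: abs(p[0] - x))[1]

    return predict


def fit_bagged(xs, ys, n_models=50, seed=0):
    """Bagging: train each base learner on a bootstrap resample of the data
    and average their predictions."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_1nn([xs[i] for i in idx], [ys[i] for i in idx]))

    def predict(x):
        return sum(m(x) for m in models) / len(models)

    return predict


# Noisy samples of the made-up target y = 2x.
rng = random.Random(1)
xs = [i / 10 for i in range(100)]
ys = [2 * x + rng.gauss(0, 1) for x in xs]

single = fit_1nn(xs, ys)
bagged = fit_bagged(xs, ys)

# Compare both models against the noise-free target on unseen points.
test_xs = [i / 10 + 0.05 for i in range(5, 95)]
mse_single = sum((single(x) - 2 * x) ** 2 for x in test_xs) / len(test_xs)
mse_bagged = sum((bagged(x) - 2 * x) ** 2 for x in test_xs) / len(test_xs)
print(mse_single, mse_bagged)
```

The single model reproduces the noise of whichever training point is nearest, while the bagged average smooths over several neighbours, so its error against the true function is expected to be lower.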

Home Page: https://www.yourdatateacher.com

4 years ago

11 minutes 55 seconds

Episode 3 - Precision, recall, accuracy. How to choose?

In this episode, I talk about accuracy, precision and recall. We're going to focus on what they are and when to use them in machine learning projects.
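
For reference, the three metrics can be computed from the confusion-matrix counts as in the sketch below. The example data is made up to show a common pitfall: on an imbalanced dataset, accuracy can look good while recall is poor. Returning 0.0 when a denominator is zero is a convention assumed for this example.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision and recall for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        # Fraction of all predictions that are correct.
        "accuracy": (tp + tn) / len(y_true),
        # Of the samples predicted positive, how many really are positive.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Of the truly positive samples, how many were found.
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Made-up imbalanced data: 5 positives, 95 negatives.
# The model finds only one of the five positives and raises no false alarms.
y_true = [1] * 5 + [0] * 95
y_pred = [1] + [0] * 4 + [0] * 95

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Here accuracy is 0.96 and precision is 1.0, yet recall is only 0.2, so the choice of metric depends on whether missed positives or false alarms are more costly.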

Link to the article: https://www.yourdatateacher.com/2021/06/07/precision-recall-accuracy-how-to-choose/

4 years ago

11 minutes 55 seconds

Episode 2 - How to explain neural networks using SHAP

Today we're going to talk about how we can explain neural networks. Neural networks are like black boxes that hide the way they model and represent data, which makes them very difficult to explain. A very powerful approach is called SHAP. Using this method, we can calculate the impact of each feature on a model's predictions, regardless of the type of model we're using. It's very useful for black boxes like neural networks.
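
SHAP is built on Shapley values. The sketch below does not use the shap library; it computes exact Shapley values for a tiny hypothetical model by enumerating every coalition of features, treating the model purely as a black-box function. Replacing an "absent" feature with a baseline value is one simple assumption about how to remove a feature, and the toy model and inputs are invented for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for one prediction.

    Features outside a coalition are replaced by their `baseline` value
    (an assumption of this sketch). The cost grows as 2**n, so this only
    suits a handful of features.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(features):
    """Hypothetical black box with an interaction between features 0 and 1."""
    a, b, c = features
    return 3 * a + a * b - 2 * c


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The contributions add up to the difference between the prediction for `x` and the prediction for the baseline, and the interaction term `a * b` is split equally between the two features involved. Practical tools approximate this computation because exact enumeration is infeasible for models with many features.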

Home page: https://www.yourdatateacher.com

Link to the article: https://www.yourdatateacher.com/2021/05/17/how-to-explain-neural-networks-using-shap/

4 years ago

6 minutes 54 seconds

Episode 1 - How accurate is your accuracy?

Today we're going to talk about the standard error of proportions. In data science, it's very important to compute the standard error of every estimate we calculate, both to see whether finite-size effects are lowering the precision too much and to compare two different measurement results with each other.
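
As a small worked example, accuracy is a proportion, so under the usual binomial assumption of independent samples its standard error is sqrt(p * (1 - p) / n). The sketch below applies that formula and compares two accuracies measured on separate test sets. The numbers and function names are made up for illustration.

```python
import math


def proportion_standard_error(p, n):
    """Standard error of a proportion p estimated from n independent samples."""
    return math.sqrt(p * (1 - p) / n)


def z_score(p1, n1, p2, n2):
    """Difference between two independent proportions, in units of the
    combined standard error."""
    se1 = proportion_standard_error(p1, n1)
    se2 = proportion_standard_error(p2, n2)
    return (p1 - p2) / math.sqrt(se1 ** 2 + se2 ** 2)


# Made-up example: two models evaluated on separate test sets of 400 samples.
se_a = proportion_standard_error(0.90, 400)
se_b = proportion_standard_error(0.88, 400)
z = z_score(0.90, 400, 0.88, 400)
print(round(se_a, 4), round(se_b, 4), round(z, 2))
```

With 400 samples, an accuracy of 0.90 carries a standard error of 0.015, and the two-point gap between the models is less than one combined standard error, so it is not strong evidence that one model is better. Quadrupling the test set size would halve each standard error.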

Home page: https://www.yourdatateacher.com

Link to the article: https://www.yourdatateacher.com/2021/05/31/how-accurate-is-your-accuracy/

4 years ago

6 minutes 40 seconds