On the (Mis)Use of Machine Learning with Panel Data


Marketing^AI

Enoch H. Kang

AI breaks down top marketing research papers into clear, quick insights.

Marketing

Business

17 minutes 37 seconds

This academic paper investigates data leakage when machine learning (ML) is applied to panel data, which combine cross-sectional and time-series observations. The authors explain that standard ML practices, which are not designed for the structure of panel data, can cause two kinds of leakage: temporal leakage, where future information influences predictions about the past, and cross-sectional leakage, where information is shared between the units used for training and those used for testing. Leakage inflates measured model performance and can lead to misleading policy recommendations, as the authors show in empirical applications, notably predicting income across U.S. counties. To counter this, the paper offers practical guidelines: practitioners should first state their research goal clearly (cross-sectional prediction or sequential forecasting) and then choose data splitting and cross-validation strategies that match it, so that model evaluation is robust and realistic.
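The splitting strategies summarized above can be illustrated with a small plain-Python sketch. It is not the authors' code: the `Obs` record, every function name, the 25% hold-out share, and the expanding-window (forward-chaining) cross-validation scheme are hypothetical choices made for this example. `split_by_unit` fits a cross-sectional prediction goal (no unit appears in both train and test), `split_by_time` and `expanding_window_folds` fit a sequential forecasting goal (validation periods always come after training periods), and `random_row_split` is the naive row-level shuffle that ignores panel structure. `leakage_report` is a hypothetical diagnostic that flags each leakage type for a given split.

```python
import random
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass(frozen=True)
class Obs:
    unit: str    # cross-sectional identifier, e.g. a county (hypothetical field name)
    period: int  # time index, e.g. a year (hypothetical field name)


Split = Tuple[List[Obs], List[Obs]]


def random_row_split(obs: List[Obs], test_frac: float = 0.25, seed: int = 0) -> Split:
    """Naive split: shuffles individual rows, ignoring unit and time structure."""
    rows = list(obs)
    random.Random(seed).shuffle(rows)
    n_test = max(1, round(len(rows) * test_frac))
    return rows[n_test:], rows[:n_test]


def split_by_unit(obs: List[Obs], test_frac: float = 0.25, seed: int = 0) -> Split:
    """Cross-sectional prediction: each unit lands entirely in train or entirely in test."""
    units = sorted({o.unit for o in obs})
    random.Random(seed).shuffle(units)
    n_test = max(1, round(len(units) * test_frac))
    test_units = set(units[:n_test])
    train = [o for o in obs if o.unit not in test_units]
    test = [o for o in obs if o.unit in test_units]
    return train, test


def split_by_time(obs: List[Obs], cutoff: int) -> Split:
    """Sequential forecasting: train on periods up to `cutoff`, test strictly after it."""
    train = [o for o in obs if o.period <= cutoff]
    test = [o for o in obs if o.period > cutoff]
    return train, test


def expanding_window_folds(obs: List[Obs], first_cutoff: int, horizon: int = 1) -> Iterator[Split]:
    """Forward-chaining CV: each fold trains on all periods <= cutoff and validates
    on the next `horizon` periods, so validation data always lie in the future."""
    periods = sorted({o.period for o in obs})
    for cutoff in periods:
        if cutoff < first_cutoff:
            continue
        val_periods = {p for p in periods if cutoff < p <= cutoff + horizon}
        if not val_periods:
            break  # no later periods left to validate on
        train = [o for o in obs if o.period <= cutoff]
        val = [o for o in obs if o.period in val_periods]
        yield train, val


def leakage_report(train: List[Obs], test: List[Obs]) -> dict:
    """Flags cross-sectional leakage (shared units) and temporal leakage
    (training data from periods at or after the earliest test period)."""
    shared_units = {o.unit for o in train} & {o.unit for o in test}
    temporal = max(o.period for o in train) >= min(o.period for o in test)
    return {"cross_sectional": bool(shared_units), "temporal": temporal}


# Toy panel: 4 hypothetical units observed over 5 periods.
panel = [Obs(u, t) for u in ["A", "B", "C", "D"] for t in range(2015, 2020)]
print(leakage_report(*split_by_unit(panel)))        # {'cross_sectional': False, 'temporal': True}
print(leakage_report(*split_by_time(panel, 2017)))  # {'cross_sectional': True, 'temporal': False}
```

In this toy example each split removes only one kind of leakage: the unit split still trains and tests on the same periods, and the time split still reuses the same units. That is why the choice of split has to follow from the research goal rather than from a default row-level shuffle.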