
This episode focuses on feature engineering: transforming complex data such as text and images into numerical representations called embeddings for use in predictive and causal applications. It begins by explaining principal component analysis and autoencoders as methods for generating these embeddings. It then turns to text embeddings, covering early methods like Word2Vec and later, more sophisticated sequence models such as ELMo and BERT, and highlighting their architectural differences and advances in capturing context. Finally, it covers image embeddings through models like ResNet50 and illustrates their practical application in hedonic price modeling, showing how these engineered features substantially improve prediction accuracy over traditional methods.
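As a concrete illustration of the first method mentioned, principal component analysis produces a low-dimensional embedding by projecting centered data onto its top principal components. The sketch below is a minimal, assumed implementation using NumPy's SVD on toy data; it is not the chapter's own code.

```python
import numpy as np

def pca_embed(X: np.ndarray, k: int) -> np.ndarray:
    """Project the rows of X onto the top-k principal components."""
    Xc = X - X.mean(axis=0)                      # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                         # k-dimensional embedding per row

# Toy example: embed 100 samples with 10 raw features into 2 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
Z = pca_embed(X, k=2)
print(Z.shape)  # (100, 2)
```

The same interface generalizes to the richer embedding models discussed later: raw inputs go in, a fixed-length numeric vector per observation comes out.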
Disclosure