
The episode introduces GDPval, a new benchmark created by OpenAI to evaluate AI model performance on real-world, economically valuable tasks drawn from the work of industry experts in the nine sectors that contribute most to U.S. GDP. The evaluation covers tasks from 44 occupations and is intended to provide a more realistic assessment of AI capabilities than traditional academic benchmarks. To that end, it uses multimodal inputs and subjective grading by human experts.