
Why do GenAI systems confidently state incorrect medical facts instead of saying "I don't know"? Groundbreaking research from OpenAI and Georgia Tech reveals that AI hallucinations aren't bugs to be fixed; they're inevitable consequences of how these systems are trained. This episode explores the "singleton problem" (facts that appear only once in training data), which makes AI systematically unreliable on rare facts, connects it to our previous discussion of AI benchmark saturation (Episode 9), and explains why the same evaluation methods that produce impressive test scores actually reward confident guessing over appropriate uncertainty. For medical faculty evaluating AI tools, understanding these statistical realities is crucial for teaching students, conducting research, and developing institutional policies that account for AI's fundamental limitations.