
This week on Fresh from the Labs, Shilpa, Kevin, and Jared dive into three headline-grabbing stories, and what they mean for builders right now:
Klarna’s customer-support U-turn. The team dissects the "replace 700 agents with AI" experiment, why a two-humans-per-bot fallback isn’t a failure, and what sustainable AI adoption in ops should look like.
OpenAI’s new Healthbench. Thousands of curated physician conversations power a fresh benchmark that pushes GPT-4o and rivals toward real clinical usefulness. We unpack where the models still stumble (context seeking!), the Epic integrations everyone is watching, and why a safer WebMD can’t come soon enough.
Codex 1 & cloud coding agents. OpenAI plants a giant flag in the fully-agentic dev-tool space, right as rumors swirl about the Windsurf acquisition. Kevin shares war-stories from building his own open-source coding agent, and the crew debates whether verticalized startups or open-source stacks will win the long game.
Along the way you’ll hear about the perils of voice agents mispronouncing simple words, Hacker News snark, and why watching fourth-graders play Ultimate Frisbee might be the purest form of agentic chaos.