This episode introduces and evaluates On-Policy Distillation (OPD) as a highly efficient method for the post-training of large language models (LLMs). The authors categorize LLM training into three phases—pre-training, mid-training, and post-training—and distinguish between on-policy training (sampling from the student model) and off-policy training (imitating external sources).
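To make the on-policy/off-policy distinction concrete, here is a minimal sketch of one on-policy distillation step: the student samples its own completion, and the teacher scores those same tokens so the student is corrected on states it actually visits. This is an illustrative assumption of how such a step can look, not the episode's exact recipe; "distilgpt2" and "gpt2" stand in as hypothetical student and teacher, and the reverse-KL loss is one common choice.

```python
# Illustrative on-policy distillation step (sketch only, not the paper's exact setup).
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
student = AutoModelForCausalLM.from_pretrained("distilgpt2")       # small student
teacher = AutoModelForCausalLM.from_pretrained("gpt2").eval()      # larger teacher (same vocab)

prompt = tokenizer("Explain photosynthesis:", return_tensors="pt").input_ids

# On-policy: the *student* generates the trajectory it will be trained on.
with torch.no_grad():
    rollout = student.generate(prompt, max_new_tokens=32, do_sample=True,
                               pad_token_id=tokenizer.eos_token_id)

# Score the same token sequence with both models.
student_logits = student(rollout).logits[:, :-1]
with torch.no_grad():
    teacher_logits = teacher(rollout).logits[:, :-1]

# Per-token reverse KL(student || teacher), restricted to the generated continuation.
gen = slice(prompt.shape[1] - 1, None)
s_logp = F.log_softmax(student_logits[:, gen], dim=-1)
t_logp = F.log_softmax(teacher_logits[:, gen], dim=-1)
loss = (s_logp.exp() * (s_logp - t_logp)).sum(-1).mean()
loss.backward()  # one distillation step; an optimizer.step() would follow in practice
```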
This episode introduces Chronos-2, a new time series foundation model developed by Amazon that expands beyond the limitations of previous models by supporting multivariate and covariate-informed forecasting in a zero-shot manner. The core innovation enabling this capability is the group attention mechanism, which allows the model to share information across related time series and external factors, significantly improving prediction accuracy in complex scenarios.
This episode is about C2S-Scale, a new family of large language models (LLMs) built upon Google's Gemma framework and designed for next-generation single-cell analysis. This platform translates high-dimensional single-cell RNA sequencing data into textual "cell sentences," enabling LLMs to process and synthesize vast amounts of transcriptomic and biological text data.
This episode is about the partnership between Google DeepMind and Commonwealth Fusion Systems (CFS) to accelerate the development of fusion energy, specifically focusing on CFS’s SPARC tokamak machine. This collaboration leverages Google DeepMind's Artificial Intelligence (AI) expertise, particularly reinforcement learning, to address the complex physics problems associated with stabilizing plasma at over 100 million degrees Celsius. A key component of this partnership is the open-source TORAX software, a fast, differentiable plasma simulator built in JAX, which allows researchers to run millions of virtual experiments to optimize SPARC's operations and identify the most efficient paths to achieving net fusion energy, or "breakeven."
This episode dives deep on a significant shift in the AI development landscape, moving away from exclusive reliance on large, general-purpose cloud computing.
This episode dives deep on an Anthropic report and a related research paper detailing a joint study on the vulnerability of large language models (LLMs) to data poisoning attacks. The research surprisingly demonstrates that injecting a near-constant, small number of malicious documents—as few as 250—is sufficient to successfully introduce a backdoor vulnerability, regardless of the LLM's size (up to 13 billion parameters) or the total volume of its clean training data.
This episode introduces Petri (Parallel Exploration Tool for Risky Interactions), an open-source framework developed by Anthropic to accelerate AI safety research through automated auditing. Petri uses specialized AI auditor agents and LLM judges to test target models across diverse, multi-turn scenarios defined by human researchers via seed instructions.
This episode dives deep on the Gemini 2.5 Computer Use model, a specialized AI model from Google DeepMind built on the Gemini 2.5 Pro architecture, designed to power agents capable of interacting with user interfaces (UIs). This model is accessible via the Gemini API for developers to create agents that can perform tasks like clicking, typing, and scrolling on web pages and applications.
This episode dives deep on the latest article from The Budget Lab at Yale, which analyzes the initial impact of Artificial Intelligence (AI) on the U.S. labor market since the introduction of generative AI in November 2022. The authors conclude that despite widespread public anxiety about job losses, their data indicates no substantial, economy-wide disruption or acceleration in the rate of change in the occupational mix that can be clearly attributed to AI.
This episode dives deep on GEM (General Experience Maker), an open-source environment simulator designed to accelerate research on agentic Large Language Models (LLMs) by shifting their training paradigm from static datasets to experience-based learning in complex, interactive environments. Modeled after OpenAI-Gym, GEM provides a standardized framework for the agent-environment interface, supporting asynchronous execution, diverse tasks (including games, math, and coding), and external tools like Python and Search; a rough sketch of what such an interface typically looks like follows below.
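As a rough illustration of the OpenAI-Gym-style interface the episode describes, the sketch below shows the kind of reset/step loop such environments typically expose. The class and method names are generic assumptions for illustration, not GEM's actual API, and the placeholder policy stands in for what would be an LLM call in agentic training.

```python
# Generic Gym-style agent-environment loop (illustrative only; not GEM's actual API).
import random

class ToyMathEnv:
    """A toy single-turn text environment: the agent answers a simple addition question."""

    def reset(self) -> str:
        self.a, self.b = random.randint(0, 9), random.randint(0, 9)
        return f"What is {self.a} + {self.b}?"          # observation given to the agent

    def step(self, action: str):
        reward = 1.0 if action.strip() == str(self.a + self.b) else 0.0
        done = True                                      # single-turn task ends immediately
        return "", reward, done, {}                      # obs, reward, done, info

def agent_policy(observation: str) -> str:
    # Placeholder policy; in experience-based LLM training this would be a model call.
    left, right = observation.removeprefix("What is ").removesuffix("?").split(" + ")
    return str(int(left) + int(right))

env = ToyMathEnv()
obs = env.reset()
action = agent_policy(obs)
_, reward, done, _ = env.step(action)
print(obs, "->", action, "| reward:", reward)
```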
This episode dives deep on Anthropic's latest piece on the emerging field of context engineering, presented as the natural evolution of prompt engineering for building effective AI agents. Context engineering focuses on curating and managing the entire set of tokens that inform a large language model (LLM) during inference, including prompts, tools, message history, and external data, acknowledging that context is a finite resource subject to degradation.
This episode dives deep on the Gemini-Robotics-1-5-Tech-Report, which presents a significant advancement in generalist robots through the introduction of the Gemini Robotics 1.5 model family. This system features two core components: Gemini Robotics 1.5 (GR 1.5), a Vision-Language-Action (VLA) model that translates instructions into robot actions and supports multi-embodiment control, and Gemini Robotics-ER 1.5 (GR-ER 1.5), an enhanced Vision-Language Model (VLM) specialized in complex embodied reasoning and high-level task planning.
The episode introduces GDPval, a new benchmark created by OpenAI to evaluate AI model performance on real-world, economically valuable tasks derived from the work of industry experts across the top nine sectors contributing to U.S. GDP. This evaluation covers tasks from 44 occupations and is intended to provide a more realistic assessment of AI capabilities than traditional academic benchmarks, including the use of multi-modal inputs and subjective grading by human experts.
This episode is about a study titled "AI-Enhanced Sensemaking: Exploring the Design of a Generative AI-Based Assistant to Support Genetic Professionals," which investigates integrating generative AI to assist genetic experts in diagnosing rare diseases through whole genome sequencing (WGS) analysis. The research, conducted by collaborators from Microsoft Research, Drexel University, and the Broad Institute, identifies significant challenges faced by genetic professionals, such as information overload and difficulty prioritizing cases for reanalysis.
This episode details a groundbreaking research effort by Google DeepMind and collaborating academic institutions, focusing on the discovery of unstable singularities in fluid dynamics using advanced AI techniques.
This episode is about the latest Nvidia paper, which advocates for the widespread adoption of Small Language Models (SLMs) over Large Language Models (LLMs) within agentic AI systems, asserting that SLMs are sufficiently powerful, more economical, and inherently more suitable for the repetitive and specialized tasks typical of such agents.
This episode dives deep on the Amazon Science article "Scientific frontiers of agentic AI," which discusses the emerging field of agentic AI, contrasting it with generative AI by emphasizing its ability to act autonomously on behalf of users by accessing and interacting with external resources.
This episode is about the working paper "How People Use ChatGPT," which investigates the widespread adoption and diverse applications of ChatGPT from its 2022 launch through July 2025. The authors analyze millions of de-identified user messages to understand usage patterns, finding that non-work-related interactions constitute the majority, though work-related use is significant for educated professionals.
This episode dives deep on the report from Anthropic that examines the rapid and geographically uneven adoption of AI, specifically Claude, across both consumer and enterprise users. It highlights that AI adoption is concentrated in higher-income regions and for certain tasks, particularly coding and administrative functions, mirroring historical patterns of technological diffusion but at an accelerated pace.
This episode dives deep on the Thinking Machines Lab publication that addresses the challenge of achieving reproducibility in large language model (LLM) inference, noting that even with "greedy sampling" (temperature set to 0), results are often nondeterministic.