Machine Learning Street Talk (MLST)
230 episodes
2 days ago
Welcome! We engage in fascinating discussions with pre-eminent figures in the AI field. Our flagship show covers current affairs in AI, cognitive science, neuroscience and philosophy of mind with in-depth analysis. Our approach is unrivalled in terms of scope and rigour – we believe in intellectual diversity in AI, and we touch on all of the main ideas in the field with the hype surgically removed. MLST is run by Tim Scarfe, Ph.D. (https://www.linkedin.com/in/ecsquizor/) and features regular appearances from MIT Ph.D. Keith Duggar (https://www.linkedin.com/in/dr-keith-duggar/).
Technology
Episodes (20/230)
Deep Learning is Not So Mysterious or Different - Prof. Andrew Gordon Wilson (NYU)

Professor Andrew Wilson from NYU explains why many common-sense ideas in artificial intelligence might be wrong. For decades, the rule of thumb in machine learning has been to fear complexity. The thinking goes: if your model has too many parameters (is "too complex") for the amount of data you have, it will "overfit" by essentially memorizing the data instead of learning the underlying patterns, leading to poor performance on new, unseen data. This is the classic "bias-variance trade-off": a balancing act between a model that's too simple and one that's too complex.
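For reference, the textbook story in that paragraph is easy to reproduce. A minimal sketch (our own illustration, not from the episode or Wilson's paper; the dataset, model, and degrees are arbitrary choices): a low-degree polynomial underfits a noisy sine curve, while a very high-degree one nearly interpolates the training points and generalises worse.

```python
# Classic bias-variance picture (illustrative only): underfitting vs overfitting.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 30))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, x.size)   # noisy training data
x_test = np.linspace(0, 1, 200)
y_test = np.sin(2 * np.pi * x_test)                       # clean held-out truth

for degree in [1, 4, 25]:
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x[:, None], y)
    train_mse = mean_squared_error(y, model.predict(x[:, None]))
    test_mse = mean_squared_error(y_test, model.predict(x_test[:, None]))
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")
```

Degree 1 underfits, degree 4 is about right, and degree 25 chases the noise; this is exactly the intuition Wilson then goes on to complicate.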


**SPONSOR MESSAGES**

—

Tufa AI Labs is an AI research lab based in Zurich. **They are hiring ML research engineers!**

This is a once-in-a-lifetime opportunity to work with one of the best labs in Europe.

Contact Benjamin Crouzier - https://tufalabs.ai/

—

Take the Prolific human data survey - https://www.prolific.com/humandatasurvey?utm_source=mlst and be the first to see the results and benchmark your practices against the wider community!

—

cyber•Fund https://cyber.fund/?utm_source=mlst is a founder-led investment firm accelerating the cybernetic economy

Oct SF conference - https://dagihouse.com/?utm_source=mlst - Joscha Bach keynoting(!) + OAI, Anthropic, NVDA,++

Hiring a SF VC Principal: https://talent.cyber.fund/companies/cyber-fund-2/jobs/57674170-ai-investment-principal#content?utm_source=mlst

Submit investment deck: https://cyber.fund/contact?utm_source=mlst

—


Description Continued:


Professor Wilson challenges this fundamental belief that complexity is something to be feared. He makes a few surprising points:


**Bigger Can Be Better**: Massive models don't just get more flexible; they also develop a stronger "simplicity bias". So, if your model is overfitting, the solution might paradoxically be to make it even bigger.


**The "Bias-Variance Trade-off" is a Misnomer**: Wilson claims you don't actually have to trade one for the other. You can have a model that is incredibly expressive and flexible while also being strongly biased toward simple solutions. He points to the "double descent" phenomenon, where performance first gets worse as models get more complex, but then surprisingly starts getting better again (see the short sketch after these points).


**Honest Beliefs and Bayesian Thinking**: His core philosophy is that we should build models that honestly represent our beliefs about the world. We believe the world is complex, so our models should be expressive. But we also believe in Occam's razor—that the simplest explanation is often the best. He champions Bayesian methods, which naturally balance these two ideas through a process called marginalization, which he describes as an automatic Occam's razor.
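The double-descent curve mentioned above is easy to reproduce in miniature. A minimal sketch, assuming nothing from the paper (the random-feature setup, sizes, and noise level are arbitrary choices of ours): fit minimum-norm least squares on random ReLU features and watch test error spike near the interpolation threshold (features roughly equal to training points), then fall again as the model keeps growing.

```python
# Toy double-descent demo (illustrative only, not the paper's experiments):
# minimum-norm least squares on random ReLU features of a noisy linear target.
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d = 100, 1000, 20
w_true = rng.normal(size=d)

def make_data(n):
    X = rng.normal(size=(n, d))
    return X, X @ w_true + 0.5 * rng.normal(size=n)

def features(X, W):
    return np.maximum(X @ W, 0.0)                     # ReLU random features

X_tr, y_tr = make_data(n_train)
X_te, y_te = make_data(n_test)

for n_feat in [10, 50, 90, 100, 110, 200, 1000, 5000]:
    W = rng.normal(size=(d, n_feat)) / np.sqrt(d)
    beta = np.linalg.pinv(features(X_tr, W)) @ y_tr   # minimum-norm solution
    mse = np.mean((features(X_te, W) @ beta - y_te) ** 2)
    print(f"features={n_feat:5d}  test MSE={mse:8.2f}")
```

Exact numbers vary with the seed, but the qualitative shape (error rising toward ~100 features, then falling well past it) is the point being made: bigger can be better.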


TOC:


[00:00:00] Introduction and Thesis

[00:04:19] Challenging Conventional Wisdom

[00:11:17] The Philosophy of a Scientist-Engineer

[00:16:47] Expressiveness, Overfitting, and Bias

[00:28:15] Understanding, Compression, and Kolmogorov Complexity

[01:05:06] The Surprising Power of Generalization

[01:13:21] The Elegance of Bayesian Inference

[01:33:02] The Geometry of Learning

[01:46:28] Practical Advice and The Future of AI


Prof. Andrew Gordon Wilson:

https://x.com/andrewgwils

https://cims.nyu.edu/~andrewgw/

https://scholar.google.com/citations?user=twWX2LIAAAAJ&hl=en

https://www.youtube.com/watch?v=Aja0kZeWRy4

https://www.youtube.com/watch?v=HEp4TOrkwV4


TRANSCRIPT:

https://app.rescript.info/public/share/H4Io1Y7Rr54MM05FuZgAv4yphoukCfkqokyzSYJwCK8


Hosts:

Dr. Tim Scarfe / Dr. Keith Duggar (MIT Ph.D)


REFS:


Deep Learning is Not So Mysterious or Different [Andrew Gordon Wilson]

https://arxiv.org/abs/2503.02113


Bayesian Deep Learning and a Probabilistic Perspective of Generalization [Andrew Gordon Wilson, Pavel Izmailov]

https://arxiv.org/abs/2002.08791


Compute-Optimal LLMs Provably Generalize Better With Scale [Marc Finzi, Sanyam Kapoor, Diego Granziol, Anming Gu, Christopher De Sa, J. Zico Kolter, Andrew Gordon Wilson]

https://arxiv.org/abs/2504.15208

6 days ago
2 hours 3 minutes 48 seconds

Karl Friston - Why Intelligence Can't Get Too Large (Goldilocks principle)

In this episode, hosts Tim and Keith finally realize their long-held dream of sitting down with their hero, the brilliant neuroscientist Professor Karl Friston. The conversation is a fascinating and mind-bending journey into Professor Friston's life's work, the Free Energy Principle, and what it reveals about life, intelligence, and consciousness itself.


**SPONSORS**

Gemini CLI is an open-source AI agent that brings the power of Gemini directly into your terminal - https://github.com/google-gemini/gemini-cli

---

Take the Prolific human data survey - https://www.prolific.com/humandatasurvey?utm_source=mlst and be the first to see the results and benchmark your practices against the wider community!

---

cyber•Fund https://cyber.fund/?utm_source=mlst is a founder-led investment firm accelerating the cybernetic economy

Oct SF conference - https://dagihouse.com/?utm_source=mlst - Joscha Bach keynoting(!) + OAI, Anthropic, NVDA,++

Hiring a SF VC Principal: https://talent.cyber.fund/companies/cyber-fund-2/jobs/57674170-ai-investment-principal#content?utm_source=mlst

Submit investment deck: https://cyber.fund/contact?utm_source=mlst

***


They kick things off by looking back on the 20-year journey of the Free Energy Principle. Professor Friston explains it as a fundamental rule for survival: all living things, from a single cell to a human being, are constantly trying to make sense of the world and reduce unpredictability. It’s this drive to minimize surprise that allows things to exist and maintain their structure.

This leads to a bigger question: What does it truly mean to be "intelligent"? The group debates whether intelligence is everywhere, even in a virus or a plant, or if it requires a certain level of complexity.


Professor Friston introduces the idea of different "kinds" of things, suggesting that creatures like us, who can model themselves and think about the future, possess a unique and "strange" kind of agency that sets us apart.


From intelligence, the discussion naturally flows to the even trickier concept of consciousness. Is it the same as intelligence? Professor Friston argues they are different. He explains that consciousness might emerge from deep, layered self-awareness—not just acting, but understanding that you are the one causing your actions and thinking about your place in the world.


They also explore intelligence at different sizes. Is a corporation intelligent? What about the entire planet? Professor Friston suggests there might be a "Goldilocks zone" for intelligence. It doesn't seem to exist at the super-tiny atomic level or at the massive scale of planets and solar systems, but thrives in the complex middle-ground where we live.


Finally, they tackle one of the most pressing topics of our time: Can we build a truly conscious AI? Professor Friston shares his doubts about whether our current computers are capable of a feat like that. He suggests that genuine consciousness might require a different kind of "mortal" computation, where the machine's physical body and its "mind" are inseparable, much like in biological creatures.


TRANSCRIPT:

https://app.rescript.info/public/share/FZkF8BO7HMt9aFfu2_q69WGT_ZbYZ1VVkC6RtU3eeOI


TOC:

00:00:00: Introduction & Retrospective on the Free Energy Principle

00:09:34: Strange Particles, Agency, and Consciousness

00:37:45: The Scale of Intelligence: From Viruses to the Biosphere

01:01:35: Modelling, Boundaries, and Practical Application

01:21:12: Conclusion

2 weeks ago
1 hour 21 minutes 39 seconds

The Day AI Solves My Puzzles Is The Day I Worry (Prof. Cristopher Moore)

We are joined by Cristopher Moore, a professor at the Santa Fe Institute with a diverse background in physics, computer science, and machine learning. The conversation begins with Cristopher, who calls himself a "frog", explaining that he prefers to dive deep into specific, concrete problems rather than taking a high-level "bird's-eye view". They explore why current AI models, like transformers, are so surprisingly effective. Cristopher argues it's because the real world isn't random; it's full of rich structures, patterns, and hierarchies that these models can learn to exploit, even if we don't fully understand how.

**SPONSORS**

Take the Prolific human data survey - https://www.prolific.com/humandatasurvey?utm_source=mlst and be the first to see the results and benchmark your practices against the wider community!

---

cyber•Fund https://cyber.fund/?utm_source=mlst is a founder-led investment firm accelerating the cybernetic economy.

Oct SF conference - https://dagihouse.com/?utm_source=mlst - Joscha Bach keynoting(!) + OAI, Anthropic, NVDA,++

Hiring a SF VC Principal: https://talent.cyber.fund/companies/cyber-fund-2/jobs/57674170-ai-investment-principal#content?utm_source=mlst

Submit investment deck: https://cyber.fund/contact?utm_source=mlst

***

Cristopher Moore:

https://sites.santafe.edu/~moore/

TOC:

00:00:00 - Introduction
00:02:05 - Meet Cristopher Moore: A Frog in the World of Science
00:05:14 - The Limits of Transformers and Real-World Data
00:11:19 - Intelligence as Creative Problem-Solving
00:23:30 - Grounding, Meaning, and Shared Reality
00:31:09 - The Nature of Creativity and Aesthetics
00:44:31 - Computational Irreducibility and Universality
00:53:06 - Turing Completeness, Recursion, and Intelligence
01:11:26 - The Universe Through a Computational Lens
01:26:45 - Algorithmic Justice and the Need for Transparency

TRANSCRIPT: https://app.rescript.info/public/share/VRe2uQSvKZOm0oIBoDsrNwt46OMCqRnShVnUF3qyoFk

Filmed at DISI (Diverse Intelligences Summer Institute)

https://disi.org/

REFS:

The Nature of Computation [Chris Moore]
https://nature-of-computation.org/

Birds and Frogs [Freeman Dyson]
https://www.ams.org/notices/200902/rtx090200212p.pdf

Replica Theory [Parisi et al]
https://arxiv.org/pdf/1409.2722

Janossy pooling [Fabian Fuchs]
https://fabianfuchsml.github.io/equilibriumaggregation/

Cracking the Cryptic [YT channel]
https://www.youtube.com/c/CrackingTheCryptic

Sudoku Bench [Sakana]
https://sakana.ai/sudoku-bench/

Fractured entangled representations, "phylogenetic locking in" comment [Kumar/Stanley]
https://arxiv.org/pdf/2505.11581 (see our shows on this)

The War Against Cliché [Martin Amis]
https://www.amazon.com/War-Against-Cliche-Reviews-1971-2000/dp/0375727167

Rule 110 (CA)
https://mathworld.wolfram.com/Rule150.html

Universality in Elementary Cellular Automata [Matthew Cook]
https://wpmedia.wolfram.com/sites/13/2018/02/15-1-1.pdf

Small Semi-Weakly Universal Turing Machines [Damien Woods]
https://tilde.ini.uzh.ch/users/tneary/public_html/WoodsNeary-FI09.pdf

Computing Machinery and Intelligence [Turing, 1950]
https://courses.cs.umbc.edu/471/papers/turing.pdf

Comment on Space Time as a Causal Set [Moore, 88]
https://sites.santafe.edu/~moore/comment.pdf

Recursion Theory on the Reals and Continuous-time Computation [Moore, 96]

3 weeks ago
1 hour 34 minutes 52 seconds

Michael Timothy Bennett: Defining Intelligence and AGI Approaches

Dr. Michael Timothy Bennett is a computer scientist who's deeply interested in understanding artificial intelligence, consciousness, and what it means to be alive. He's known for his provocative paper "What the F*** is Artificial Intelligence", which challenges conventional thinking about AI and intelligence.

**SPONSOR MESSAGES**

***

Prolific: Quality data. From real people. For faster breakthroughs.

https://prolific.com/mlst?utm_campaign=98404559-MLST&utm_source=youtube&utm_medium=podcast&utm_content=mb

***

Michael takes us on a journey through some of the biggest questions in AI and consciousness. He starts by exploring what intelligence actually is - settling on the idea that it's about "adaptation with limited resources" (a definition from researcher Pei Wang that he particularly likes).

The discussion ranges from technical AI concepts to philosophical questions about consciousness, with Michael offering fresh perspectives that challenge Silicon Valley's "just scale it up" approach to AI. He argues that true intelligence isn't just about having more parameters or data - it's about being able to adapt efficiently, like biological systems do.

TOC:

1. Introduction & Paper Overview [00:01:34]
2. Definitions of Intelligence [00:02:54]
3. Formal Models (AIXI, Active Inference) [00:07:06]
4. Causality, Abstraction & Embodiment [00:10:45]
5. Computational Dualism & Mortal Computation [00:25:51]
6. Modern AI, AGI Progress & Benchmarks [00:31:30]
7. Hybrid AI Approaches [00:35:00]
8. Consciousness & The Hard Problem [00:39:35]
9. The Diverse Intelligences Summer Institute (DISI) [00:53:20]
10. Living Systems & Self-Organization [00:54:17]
11. Closing Thoughts [01:04:24]

Michael's socials:

https://michaeltimothybennett.com/
https://x.com/MiTiBennett

TRANSCRIPT:

https://app.rescript.info/public/share/4jSKbcM77Sf6Zn-Ms4hda7C4krRrMcQt0qwYqiqPTPI

REFS:

Bennett, M.T. "What the F*** is Artificial Intelligence"
https://arxiv.org/abs/2503.23923

Bennett, M.T. "Are Biological Systems More Intelligent Than Artificial Intelligence?"
https://arxiv.org/abs/2405.02325

Bennett, M.T. PhD Thesis "How To Build Conscious Machines"
https://osf.io/preprints/thesiscommons/wehmg_v1

Legg, S. & Hutter, M. (2007). "Universal Intelligence: A Definition of Machine Intelligence"

Wang, P. "Defining Artificial Intelligence" - on non-axiomatic reasoning systems (NARS)

Chollet, F. (2019). "On the Measure of Intelligence" - introduces the ARC benchmark and developer-aware generalization

Hutter, M. (2005). "Universal Artificial Intelligence: Sequential Decisions Based on Algorithmic Probability"

Chalmers, D. "The Hard Problem of Consciousness"

Descartes, R. - Cartesian dualism and the pineal gland theory (historical context)

Friston, K. - Free Energy Principle and Active Inference framework

Levin, M. - Work on collective intelligence, cancer as information isolation, and "mind blindness"

Hinton, G. (2022). "The Forward-Forward Algorithm" - introduces the mortal computation concept

Alexander Ororbia & Friston - Formal treatment of mortal computation

Sutton, R. "The Bitter Lesson" - on search and learning in AI

Pearl, J. "The Book of Why" - causal inference and reasoning

Alternative AGI Approaches:
Wang, P. - NARS (Non-Axiomatic Reasoning System)
Goertzel, B. - Hyperon system and modular AGI architectures

Benchmarks & Evaluation:
Hendrycks, D. - Humanity's Last Exam benchmark (mentioned re: saturation)

Filmed at: Diverse Intelligences Summer Institute (DISI)
https://disi.org/

4 weeks ago
1 hour 5 minutes 44 seconds

Superintelligence Strategy (Dan Hendrycks)

Deep dive with Dan Hendrycks, a leading AI safety researcher and co-author of the "Superintelligence Strategy" paper with former Google CEO Eric Schmidt and Scale AI CEO Alexandr Wang.


*** SPONSOR MESSAGES

Gemini CLI is an open-source AI agent that brings the power of Gemini directly into your terminal - https://github.com/google-gemini/gemini-cli


Prolific: Quality data. From real people. For faster breakthroughs.

https://prolific.com/mlst?utm_campaign=98404559-MLST&utm_source=youtube&utm_medium=podcast&utm_content=script-gen

***


Hendrycks argues that society is making a fundamental mistake in how it views artificial intelligence. We often compare AI to transformative but ultimately manageable technologies like electricity or the internet. He contends a far better and more realistic analogy is nuclear technology. Like nuclear power, AI has the potential for immense good, but it is also a dual-use technology that carries the risk of unprecedented catastrophe.


The Problem with an AI "Manhattan Project":


A popular idea is for the U.S. to launch a "Manhattan Project" for AI—a secret, all-out government race to build a superintelligence before rivals like China. Hendrycks argues this strategy is deeply flawed and dangerous for several reasons:


- It wouldn’t be secret. You cannot hide a massive, heat-generating data center from satellite surveillance.


- It would be destabilizing. A public race would alarm rivals, causing them to start their own desperate, corner-cutting projects, dramatically increasing global risk.


- It’s vulnerable to sabotage. An AI project can be crippled in many ways, from cyberattacks that poison its training data to physical attacks on its power plants. This is what the paper refers to as a "maiming attack."


This vulnerability leads to the paper's central concept: Mutual Assured AI Malfunction (MAIM). This is the AI-era version of the nuclear-era's Mutual Assured Destruction (MAD). In this dynamic, any nation that makes an aggressive, destabilizing bid for a world-dominating AI must expect its rivals to sabotage the project to ensure their own survival.


This deterrence, Hendrycks argues, is already the default reality we live in.


A Better Strategy: The Three Pillars

Instead of a reckless race, the paper proposes a more stable, three-part strategy modeled on Cold War principles:


- Deterrence: Acknowledge the reality of MAIM. The goal should not be to "win" the race to superintelligence, but to deter anyone from starting such a race in the first place through the credible threat of sabotage.


- Nonproliferation: Just as we work to keep fissile materials for nuclear bombs out of the hands of terrorists and rogue states, we must control the key inputs for catastrophic AI. The most critical input is advanced AI chips (GPUs). Hendrycks makes the powerful claim that building cutting-edge GPUs is now more difficult than enriching uranium, making this strategy viable.


- Competitiveness: The race between nations like the U.S. and China should not be about who builds superintelligence first. Instead, it should be about who can best use existing AI to build a stronger economy, a more effective military, and more resilient supply chains (for example, by manufacturing more chips domestically).


Dan says the stakes are high if we fail to manage this transition:


- Erosion of Control

- Intelligence Recursion

- Worthless Labor


Hendrycks maintains that while the risks are existential, the future is not set.


TOC:

1 Measuring the Beast [00:00:00]

2 Defining the Beast [00:11:34]

3 The Core Strategy [00:38:20]

4 Ideological Battlegrounds [00:53:12]

5 Mechanisms of Control [01:34:45]


TRANSCRIPT:

https://app.rescript.info/public/share/cOKcz4pWRPjh7BTIgybd7PUr_vChUaY6VQW64No8XMs


<truncated, see refs and larger description on YT version>


1 month ago
1 hour 45 minutes 38 seconds

DeepMind Genie 3 [World Exclusive] (Jack Parker Holder, Shlomi Fruchter)

This episode features Shlomi Fruchter and Jack Parker Holder from Google DeepMind, who are unveiling a new AI called Genie 3. The host, Tim Scarfe, describes it as the most mind-blowing technology he has ever seen. We were invited to their offices to conduct the interview (not sponsored).

Imagine you could create a video game world just by describing it. That's what Genie 3 does. It's an AI "world model" that learns how the real world works by watching massive amounts of video. Unlike a normal video game engine (like Unreal or the one for Doom) that needs to be programmed manually, Genie generates a realistic, interactive, 3D world from a simple text prompt.

**SPONSOR MESSAGES**

***

Prolific: Quality data. From real people. For faster breakthroughs.

https://prolific.com/mlst?utm_campaign=98404559-MLST&utm_source=youtube&utm_medium=podcast&utm_content=script-gen

***

Here's a breakdown of what makes it so revolutionary:

From Text to a Virtual World: You can type "a drone flying by a beautiful lake" or "a ski slope," and Genie 3 creates that world for you in about three seconds. You can then navigate and interact with it in real time.

It's Consistent: The worlds it creates have a reliable memory. If you look away from an object and then look back, it will still be there, just as it was. The guests explain that this consistency isn't explicitly programmed in; it's a surprising, "emergent" capability of the powerful AI model.

A Huge Leap Forward: The previous version, Genie 2, was a major step, but it wasn't fast enough for real-time interaction and was much lower resolution. Genie 3 is 720p, interactive, and photorealistic, running smoothly for several minutes at a time.

The Killer App - Training Robots: Beyond entertainment, the team sees Genie 3 as a game-changer for training AI. Instead of training a self-driving car or a robot in the real world (which is slow and dangerous), you can create infinite simulations. You can even prompt rare events to happen, like a deer running across the road, to teach an AI how to handle unexpected situations safely.

The Future of Entertainment: This could lead to a "YouTube version 2" or a new form of VR, where users can create and explore endless, interconnected worlds together, like the experience machine from philosophy.

While the technology is still a research prototype and not yet available to the public, it represents a monumental step towards creating true artificial worlds from the ground up.

Jack Parker Holder [Research Scientist at Google DeepMind in the Open-Endedness Team]
https://jparkerholder.github.io/

Shlomi Fruchter [Research Director, Google DeepMind]
https://shlomifruchter.github.io/

TOC:

[00:00:00] - Introduction: "The Most Mind-Blowing Technology I've Ever Seen"
[00:02:30] - The Evolution from Genie 1 to Genie 2
[00:04:30] - Enter Genie 3: Photorealistic, Interactive Worlds from Text
[00:07:00] - Promptable World Events & Training Self-Driving Cars
[00:14:21] - Guest Introductions: Shlomi Fruchter & Jack Parker Holder
[00:15:08] - Core Concepts: What is a "World Model"?
[00:19:30] - The Challenge of Consistency in a Generated World
[00:21:15] - Context: The Neural Network Doom Simulation
[00:25:25] - How Do You Measure the Quality of a World Model?
[00:28:09] - The Vision: Using Genie to Train Advanced Robots
[00:32:21] - Open-Endedness: Human Skill and Prompting Creativity
[00:38:15] - The Future: Is This the Next YouTube or VR?
[00:42:18] - The Next Step: Multi-Agent Simulations
[00:52:51] - Limitations: Thinking, Computation, and the Sim-to-Real Gap
[00:58:07] - Conclusion & The Future of Game Engines

REFS:

World Models [David Ha, Jürgen Schmidhuber]
https://arxiv.org/abs/1803.10122

POET
https://arxiv.org/abs/1901.01753

The Fractured Entangled Representation Hypothesis [Akarsh Kumar, Jeff Clune, Joel Lehman, Kenneth O. Stanley]
https://arxiv.org/pdf/2505.11581

TRANSCRIPT:
https://app.rescript.info/public/share/Zk5tZXk6mb06yYOFh6nSja7Lg6_qZkgkuXQ-kl5AJqM

1 month ago
58 minutes 22 seconds

Large Language Models and Emergence: A Complex Systems Perspective (Prof. David C. Krakauer)

Prof. David Krakauer, President of the Santa Fe Institute, argues that we are fundamentally confusing knowledge with intelligence, especially when it comes to AI.


He defines true intelligence as the ability to do more with less—to solve novel problems with limited information. This is contrasted with current AI models, which he describes as doing less with more; they require astounding amounts of data to perform tasks that don't necessarily demonstrate true understanding or adaptation. He humorously calls this "really shit programming".


David challenges the popular notion of "emergence" in Large Language Models (LLMs). He explains that the tech community's definition—seeing a sudden jump in a model's ability to perform a task like three-digit math—is superficial. True emergence, from a complex systems perspective, involves a fundamental change in the system's internal organization, allowing for a new, simpler, and more powerful level of description. He gives the example of moving from tracking individual water molecules to using the elegant laws of fluid dynamics. For LLMs to be truly emergent, we'd need to see them develop new, efficient internal representations, not just get better at memorizing patterns as they scale.


Drawing on his background in evolutionary theory, David explains that systems like brains, and later, culture, evolved to process information that changes too quickly for genetic evolution to keep up. He calls culture "evolution at light speed" because it allows us to store our accumulated knowledge externally (in books, tools, etc.) and build upon it without corrupting the original.


This leads to his concept of "exbodiment," where we outsource our cognitive load to the world through things like maps, abacuses, or even language itself.


We create these external tools, internalize the skills they teach us, improve them, and create a feedback loop that enhances our collective intelligence.


However, he ends with a warning. While technology has historically complemented our deficient abilities, modern AI presents a new danger. Because we have an evolutionary drive to conserve energy, we will inevitably outsource our thinking to AI if we can. He fears this is already leading to a "diminution and dilution" of human thought and creativity. Just as our muscles atrophy without use, he argues our brains will too, and we risk becoming mentally dependent on these systems.


TOC:

[00:00:00] Intelligence: Doing more with less

[00:02:10] Why brains evolved: The limits of evolution

[00:05:18] Culture as evolution at light speed

[00:08:11] True meaning of emergence: "More is Different"

[00:10:41] Why LLM capabilities are not true emergence

[00:15:10] What real emergence would look like in AI

[00:19:24] Symmetry breaking: Physics vs. Life

[00:23:30] Two types of emergence: Knowledge In vs. Out

[00:26:46] Causality, agency, and coarse-graining

[00:32:24] "Exbodiment": Outsourcing thought to objects

[00:35:05] Collective intelligence & the boundary of the mind

[00:39:45] Mortal vs. Immortal forms of computation

[00:42:13] The risk of AI: Atrophy of human thought


David Krakauer

President and William H. Miller Professor of Complex Systems

https://www.santafe.edu/people/profile/david-krakauer


REFS:

Large Language Models and Emergence: A Complex Systems Perspective

David C. Krakauer, John W. Krakauer, Melanie Mitchell

https://arxiv.org/abs/2506.11135


Filmed at the Diverse Intelligences Summer Institute:

https://disi.org/

1 month ago
49 minutes 48 seconds

Pushing compute to the limits of physics

Dr. Maxwell Ramstead grills Guillaume Verdon (AKA "Beff Jezos"), the founder of thermodynamic computing startup Extropic.

***SPONSOR MESSAGE***

Google Gemini 2.5 Flash is a state-of-the-art language model in the Gemini app. Sign up at https://gemini.google.com

***

Guillaume shares his unique path – from dreaming about space travel as a kid to becoming a physicist, then working on quantum computing at Google, to developing a radically new form of computing hardware for machine learning. He explains how he hit roadblocks with traditional physics and computing, leading him to start his company – building "thermodynamic computers." These are based on a new design for super-efficient chips that use the natural chaos of electrons (think noise and heat) to power AI tasks, which promises to speed up AND lower the costs of modern probabilistic techniques like sampling. He is driven by the pursuit of building computers that work more like your brain, which (by the way) runs on a banana and a glass of water! 

Guillaume talks about his alter ego, Beff Jezos, and the "Effective Accelerationism" (e/acc) movement that he initiated. Its objective is to speed up tech progress in order to “grow civilization” (as measured by energy use and innovation), rather than “slowing down out of fear”. Guillaume argues we need to embrace variance, exploration, and optimism to avoid getting stuck or outpaced by competitors like China. He and Maxwell discuss big ideas like merging humans with AI, decentralizing intelligence, and why boundless growth (with smart constraints) is “key to humanity's future”.
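For orientation, the "sampling" workload mentioned above usually means Markov chain Monte Carlo over an energy-based distribution (see the MCMC and Metropolis-Hastings references below). Here is a deliberately tiny CPU sketch of a Metropolis sampler on a made-up double-well energy; it only illustrates the kind of computation such hardware targets and is in no way Extropic's algorithm or chip design.

```python
# Illustrative only: a tiny Metropolis sampler for p(x) ∝ exp(-E(x)),
# the sort of probabilistic workload thermodynamic hardware aims to accelerate.
import numpy as np

def energy(x):
    return (x**2 - 1.0) ** 2          # double-well potential (arbitrary choice)

def metropolis(n_steps=50_000, step=0.5, seed=0):
    rng = np.random.default_rng(seed)
    x = 0.0
    samples = np.empty(n_steps)
    for t in range(n_steps):
        proposal = x + step * rng.normal()
        # Accept with probability min(1, exp(E(x) - E(proposal))),
        # done in log space to avoid overflow.
        if np.log(rng.random()) < energy(x) - energy(proposal):
            x = proposal
        samples[t] = x
    return samples

s = metropolis()
print("mean:", s.mean(), " fraction in right-hand well:", (s > 0).mean())
```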

REFS:

1. John Archibald Wheeler - "It From Bit" Concept

00:04:45 - Foundational work proposing that physical reality emerges from information at the quantum level

Learn more: https://cqi.inf.usi.ch/qic/wheeler.pdf 

2. AdS/CFT Correspondence (Holographic Principle)

00:05:15 - Theoretical physics duality connecting quantum gravity in Anti-de Sitter space with conformal field theory

https://en.wikipedia.org/wiki/Holographic_principle 

3. Renormalization Group Theory

00:06:15 - Mathematical framework for analyzing physical systems across different length scales

https://www.damtp.cam.ac.uk/user/dbs26/AQFT/Wilsonchap.pdf 

4. Maxwell's Demon and Information Theory

00:21:15 - Thought experiment linking information processing to thermodynamics and entropy

https://plato.stanford.edu/entries/information-entropy/ 

5. Landauer's Principle

00:29:45 - Fundamental limit establishing minimum energy required for information erasure

https://en.wikipedia.org/wiki/Landauer%27s_principle 

6. Free Energy Principle and Active Inference

01:03:00 - Mathematical framework for understanding self-organizing systems and perception-action loops

https://www.nature.com/articles/nrn2787 

7. Max Tegmark - Information Bottleneck Principle

01:07:00 - Connections between information theory and renormalization in machine learning

https://arxiv.org/abs/1907.07331 

8. Fisher's Fundamental Theorem of Natural Selection

01:11:45 - Mathematical relationship between genetic variance and evolutionary fitness

https://en.wikipedia.org/wiki/Fisher%27s_fundamental_theorem_of_natural_selection 

9. Tensor Networks in Quantum Systems

00:06:45 - Computational framework for simulating many-body quantum systems

https://arxiv.org/abs/1912.10049 

10. Quantum Neural Networks

00:09:30 - Hybrid quantum-classical models for machine learning applications

https://en.wikipedia.org/wiki/Quantum_neural_network 

11. Energy-Based Models (EBMs)

00:40:00 - Probabilistic framework for unsupervised learning based on energy functions

https://www.researchgate.net/publication/200744586_A_tutorial_on_energy-based_learning 

12. Markov Chain Monte Carlo (MCMC)

00:20:00 - Sampling algorithm fundamental to modern AI and statistical physics

https://en.wikipedia.org/wiki/Markov_chain_Monte_Carlo 

13. Metropolis-Hastings Algorithm

00:23:00 - Core sampling method for probability distributions

https://arxiv.org/abs/1504.01896

2 months ago
1 hour 23 minutes 32 seconds

The Fractured Entangled Representation Hypothesis (Kenneth Stanley, Akarsh Kumar)

Are the AI models you use today imposters?


Please watch the intro video we did before this: https://www.youtube.com/watch?v=o1q6Hhz0MAg


In this episode, hosts Dr. Tim Scarfe and Dr. Keith Duggar are joined by AI researcher Prof. Kenneth Stanley and MIT PhD student Akarsh Kumar to discuss their fascinating paper, "Questioning Representational Optimism in Deep Learning."


Imagine you ask two people to draw a perfect skull. One is a brilliant artist who understands anatomy; the other is a machine that just traces the image. Both drawings look identical, but the artist understands what a skull is—they know where the mouth is, how the jaw works, and that it's symmetrical. The machine just has a tangled mess of lines that happens to form the right picture.


An AI with an elegant representation has the building blocks to generate truly new ideas.


The Path Is the Goal: As Kenneth Stanley puts it, "it matters not just where you get, but how you got there". Two students can ace a math test, but the one who truly understands the concepts—instead of just memorizing formulas—is the one who will go on to make new discoveries.


The show is a mixture of three separate recordings: the original Patreon warm-up with Tim/Kenneth, the Tim/Keith "Steakhouse" recorded after the main interview, and then the main interview with Kenneth/Akarsh/Keith/Tim. Feel free to skip around. We had to edit this in a rush as we are travelling next week, but it's reasonably cleaned up.


TOC:

00:00:00 Intro: Garbage vs. Amazing Representations

00:05:42 How Good Representations Form

00:11:14 Challenging the "Bitter Lesson"

00:18:04 AI Creativity & Representation Types

00:22:13 Steakhouse: Critiques & Alternatives

00:28:30 Steakhouse: Key Concepts & Goldilocks Zone

00:39:42 Steakhouse: A Sober View on AI Risk

00:43:46 Steakhouse: The Paradox of Open-Ended Search

00:47:58 Main Interview: Paper Intro & Core Concepts

00:56:44 Main Interview: Deception and Evolvability

01:36:30 Main Interview: Reinterpreting Evolution

01:56:16 Main Interview: Impostor Intelligence

02:11:15 Main Interview: Recommendations for AI Research


REFS:

Questioning Representational Optimism in Deep Learning:

The Fractured Entangled Representation Hypothesis

Akarsh Kumar, Jeff Clune, Joel Lehman, Kenneth O. Stanley

https://arxiv.org/pdf/2505.11581


Kenneth O. Stanley, Joel Lehman

Why Greatness Cannot Be Planned: The Myth of the Objective

https://amzn.to/44xLaXK


Original show with Kenneth from 4 years ago:

https://www.youtube.com/watch?v=lhYGXYeMq_E


Kenneth Stanley is SVP Open Endedness at Lila Sciences

https://x.com/kenneth0stanley


Akarsh Kumar (MIT)

https://akarshkumar.com/


AND... Kenneth is HIRING (this is an OPPORTUNITY OF A LIFETIME!)

Research Engineer: https://job-boards.greenhouse.io/lila/jobs/7890007002

Research Scientist: https://job-boards.greenhouse.io/lila/jobs/8012245002


TRANSCRIPT:

https://app.rescript.info/public/share/W_T7E1OC2Wj49ccqlIOOztg2MJWaaVbovTeyxcFEQdU

2 months ago
2 hours 16 minutes 22 seconds

The Fractured Entangled Representation Hypothesis (Intro)

What if today's incredible AI is just a brilliant "impostor"? This episode features host Dr. Tim Scarfe in conversation with guests Prof. Kenneth Stanley (ex-OpenAI), Dr. Keith Duggar (MIT), and Akarsh Kumar (MIT).

While AI today produces amazing results on the surface, its internal understanding is a complete mess, described as "total spaghetti" [00:00:49]. This is because it's trained with a brute-force method (SGD) that's like building a sandcastle: it looks right from a distance, but has no real structure holding it together [00:01:45].

To explain the difference, Keith Duggar shares a great analogy about his high school physics classes [00:03:18]. One class was about memorizing lots of formulas for specific situations (like the "impostor" AI). The other used calculus to derive the answers from a deeper understanding, which was much easier and more powerful. This is the core difference: one method memorizes, the other truly understands.

The episode then introduces a different, more powerful way to build AI, based on Kenneth Stanley's old experiment, "Picbreeder" [00:04:45]. This method creates AI with a shockingly clean and intuitive internal model of the world. For example, it might develop a model of a skull where it understands the "mouth" as a separate component it can open and close, without ever being explicitly trained on that action [00:06:15]. This deep understanding emerges bottom-up, without massive datasets.

The secret is to abandon a fixed goal and embrace "deception" [00:08:42]—the idea that the stepping stones to a great discovery often don't look anything like the final result. Instead of optimizing for a target, the AI is built through an open-ended process of exploring what's "interesting" [00:09:15]. This creates a more flexible and adaptable foundation, a bit like how evolvability wins out in nature [00:10:30].

The show concludes by arguing that this choice matters immensely. The "impostor" path may be hitting a wall, requiring insane amounts of money and energy for progress and failing to deliver true creativity or continual learning [00:13:00]. The ultimate message is a call to not put all our eggs in one basket [00:14:25]. We should explore these open-ended, creative paths to discover a more genuine form of intelligence, which may be found where we least expect it.

REFS:

Questioning Representational Optimism in Deep Learning: The Fractured Entangled Representation Hypothesis
Akarsh Kumar, Jeff Clune, Joel Lehman, Kenneth O. Stanley
https://arxiv.org/pdf/2505.11581

Kenneth O. Stanley, Joel Lehman
Why Greatness Cannot Be Planned: The Myth of the Objective
https://amzn.to/44xLaXK

Original show with Kenneth from 4 years ago:
https://www.youtube.com/watch?v=lhYGXYeMq_E

Kenneth Stanley is SVP Open Endedness at Lila Sciences
https://x.com/kenneth0stanley

Akarsh Kumar (MIT)
https://akarshkumar.com/

AND... Kenneth is HIRING (this is an OPPORTUNITY OF A LIFETIME!)

Research Engineer: https://job-boards.greenhouse.io/lila/jobs/7890007002
Research Scientist: https://job-boards.greenhouse.io/lila/jobs/8012245002

Tim's code visualisation of FER based on Akarsh's repo: https://github.com/ecsplendid/fer

TRANSCRIPT: https://app.rescript.info/public/share/YKAZzZ6lwZkjTLRpVJreOOxGhLI8y4m3fAyU8NSavx0

2 months ago
15 minutes 45 seconds

Three Red Lines We're About to Cross Toward AGI (Daniel Kokotajlo, Gary Marcus, Dan Hendrycks)

What if the most powerful technology in human history is being built by people who openly admit they don't trust each other? In this explosive 2-hour debate, three AI experts pull back the curtain on the shocking psychology driving the race to Artificial General Intelligence—and why the people building it might be the biggest threat of all. Kokotajlo predicts AGI by 2028 based on compute scaling trends. Marcus argues we haven't solved basic cognitive problems from his 2001 research. The stakes? If Kokotajlo is right and Marcus is wrong about safety progress, humanity may have already lost control.


Sponsor messages:

========

Google Gemini: Google Gemini features Veo3, a state-of-the-art AI video generation model in the Gemini app. Sign up at https://gemini.google.com


Tufa AI Labs are hiring for ML Engineers and a Chief Scientist in Zurich/SF. They are top of the ARCv2 leaderboard!

https://tufalabs.ai/

========


Guest Powerhouse

Gary Marcus - Cognitive scientist, author of "Taming Silicon Valley," and AI's most prominent skeptic who's been warning about the same fundamental problems for 25 years (https://garymarcus.substack.com/)

Daniel Kokotajlo - Former OpenAI insider turned whistleblower who reveals the disturbing rationalizations of AI lab leaders in his viral "AI 2027" scenario (https://ai-2027.com/)

Dan Hendrycks - Director of the Center for AI Safety who created the benchmarks used to measure AI progress and argues we have only years, not decades, to prevent catastrophe (https://danhendrycks.com/)


Transcript:

http://app.rescript.info/public/share/tEcx4UkToi-2jwS1cN51CW70A4Eh6QulBRxDILoXOno


TOC:

Introduction: The AI Arms Race

00:00:04 - The Danger of Automated AI R&D

00:00:43 - The Rationalization: "If we don't, someone else will"

00:01:56 - Sponsor Reads (Tufa AI Labs & Google Gemini)

00:02:55 - Guest Introductions


The Philosophical Stakes

00:04:13 - What is the Positive Vision for AGI?

00:07:00 - The Abundance Scenario: Superintelligent Economy

00:09:06 - Differentiating AGI and Superintelligence (ASI)

00:11:41 - Sam Altman: "A Decade in a Month"

00:14:47 - Economic Inequality & The UBI Problem


Policy and Red Lines

00:17:13 - The Pause Letter: Stopping vs. Delaying AI

00:20:03 - Defining Three Concrete Red Lines for AI Development

00:25:24 - Racing Towards Red Lines & The Myth of "Durable Advantage"

00:31:15 - Transparency and Public Perception

00:35:16 - The Rationalization Cascade: Why AI Labs Race to "Win"


Forecasting AGI: Timelines and Methodologies

00:42:29 - The Case for Short Timelines (Median 2028)

00:47:00 - Scaling Limits: Compute, Data, and Money

00:49:36 - Forecasting Models: Bio-Anchors and Agentic Coding

00:53:15 - The 10^45 FLOP Thought Experiment


The Great Debate: Cognitive Gaps vs. Scaling

00:58:41 - Gary Marcus's Counterpoint: The Unsolved Problems of Cognition

01:00:46 - Current AI Can't Play Chess Reliably

01:08:23 - Can Tools and Neurosymbolic AI Fill the Gaps?

01:16:13 - The Multi-Dimensional Nature of Intelligence

01:24:26 - The Benchmark Debate: Data Contamination and Reliability

01:31:15 - The Superhuman Coder Milestone Debate

01:37:45 - The Driverless Car Analogy


The Alignment Problem

01:39:45 - Has Any Progress Been Made on Alignment?

01:42:43 - "Fairly Reasonably Scares the Sh*t Out of Me"

01:46:30 - Distinguishing Model vs. Process Alignment


Scenarios and Conclusions

01:49:26 - Gary's Alternative Scenario: The Neurosymbolic Shift

01:53:35 - Will AI Become Jeff Dean?

01:58:41 - Takeoff Speeds and Exceeding Human Intelligence

02:03:19 - Final Disagreements and Closing Remarks


REFS:

Gary Marcus (2001) - The Algebraic Mind

https://mitpress.mit.edu/9780262632683/the-algebraic-mind/

00:59:00


Gary Marcus & Ernest Davis (2019) - Rebooting AI

https://www.penguinrandomhouse.com/books/566677/rebooting-ai-by-gary-marcus-and-ernest-davis/

01:31:59


Gary Marcus (2024) - Taming Silicon Valley

https://www.hachettebookgroup.com/titles/gary-marcus/taming-silicon-valley/9781541704091/

00:03:01


3 months ago
2 hours 7 minutes 7 seconds

How AI Learned to Talk and What It Means - Prof. Christopher Summerfield

We interview Professor Christopher Summerfield from Oxford University about his new book "These Strange New Minds: How AI Learned to Talk and What It Means". AI learned to understand the world just by reading text - something scientists thought was impossible. You don't need to see a cat to know what one is; you can learn everything from words alone. This is "the most astonishing scientific discovery of the 21st century."

People are split: some refuse to call what AI does "thinking" even when it outperforms humans, while others believe if it acts intelligent, it is intelligent. Summerfield takes the middle ground - AI does something genuinely like human reasoning, but that doesn't make it human.

Sponsor messages:

========

Google Gemini: Google Gemini features Veo3, a state-of-the-art AI video generation model in the Gemini app. Sign up at https://gemini.google.com

Tufa AI Labs are hiring for ML Engineers and a Chief Scientist in Zurich/SF. They are top of the ARCv2 leaderboard! https://tufalabs.ai/

========

Table of Contents:

Introduction & Setup
00:00:00 Superman 3 Metaphor - Humans Absorbed by Machines
00:02:01 Book Introduction & AI Debate Context
00:03:45 Sponsor Segments (Google Gemini, Tufa Labs)

Philosophical Foundations
00:04:48 The Fractured AI Discourse
00:08:21 Ancient Roots: Aristotle vs Plato (Empiricism vs Rationalism)
00:10:14 Historical AI: Symbolic Logic and Its Limits

The Language Revolution
00:12:11 ChatGPT as the Rubicon Moment
00:14:00 The Astonishing Discovery: Learning Reality from Words Alone
00:15:47 Equivalentists vs Exceptionalists Debate

Cognitive Science Perspectives
00:19:12 Functionalism and the Duck Test
00:21:48 Brain-AI Similarities and Computational Principles
00:24:53 Reconciling Chomsky: Evolution vs Learning
00:28:15 Lamarckian AI vs Darwinian Human Learning

The Reality of AI Capabilities
00:30:29 Anthropomorphism and the Clever Hans Effect
00:32:56 The Intentional Stance and Nature of Thinking
00:37:56 Three Major AI Worries: Agency, Personalization, Dynamics

Societal Risks and Complex Systems
00:37:56 AI Agents and Flash Crash Scenarios
00:42:50 Removing Frictions: The Lawfare Example
00:46:15 Gradual Disempowerment Theory
00:49:18 The Faustian Pact of Technology

Human Agency and Control
00:51:18 The Crisis of Authenticity
00:56:22 Psychology of Control vs Reward
01:00:21 Dopamine Hacking and Variable Reinforcement

Future Directions
01:02:27 Evolution as Goal-less Optimization
01:03:31 Open-Endedness and Creative Evolution
01:06:46 Writing, Creativity, and AI-Generated Content
01:08:18 Closing Remarks

REFS:

Academic References (Abbreviated)

Essential Books
"These Strange New Minds" - C. Summerfield [00:02:01] - Main discussion topic
"The Mind is Flat" - N. Chater [00:33:45] - Summerfield's favorite on cognitive illusions
"AI: A Guide for Thinking Humans" - M. Mitchell [00:04:58] - Host's previous favorite
"Principia Mathematica" - Russell & Whitehead [00:11:00] - Logic Theorist reference
"Syntactic Structures" - N. Chomsky (1957) [00:13:30] - Generative grammar foundation
"Why Greatness Cannot Be Planned" - Stanley & Lehman [01:04:00] - Open-ended evolution

Key Papers & Studies
"Gradual Disempowerment" - D. Duvenaud [00:46:45] - AI threat model
"Counterfeit People" - D. Dennett (Atlantic) [00:52:45] - AI societal risks
"Open-Endedness is Essential..." - DeepMind/Rocktäschel/Hughes [01:03:42]
Heider & Simmel (1944) [00:30:45] - Agency attribution to shapes
Whitehall Studies - M. Marmot [00:59:32] - Control and health outcomes
"Clever Hans" - O. Pfungst (1911) [00:31:47] - Animal intelligence illusion

Historical References
"Logic Theorist" - Newell & Simon (1956) [00:10:45] - "First superintelligence"
"Computing Machinery..." - A. Turing (1950) - AI foundations
Dartmouth Conference (1955) - McCarthy et al. - Birth of AI field
"Logical Calculus..." - McCulloch & Pitts (1943) - Neural network foundations

Philosophical Concepts
<trunc>

3 months ago
1 hour 8 minutes 28 seconds

"Blurring Reality" - Chai's Social AI Platform (SPONSORED)

"Blurring Reality" - Chai's Social AI Platform - sponsored


This episode of MLST explores the groundbreaking work of Chai, a social AI platform that quietly built one of the world's largest AI companion ecosystems before ChatGPT's mainstream adoption. With over 10 million active users and just 13 engineers serving 2 trillion tokens per day, Chai discovered the massive appetite for AI companionship through serendipity while searching for product-market fit.


CHAI sponsored this show *because they want to hire amazing engineers* --


CAREER OPPORTUNITIES AT CHAI

Chai is actively hiring in Palo Alto with competitive compensation ($300K-$800K+ equity) for roles including AI Infrastructure Engineers, Software Engineers, Applied AI Researchers, and more. Fast-track qualification available for candidates with significant product launches, open source contributions, or entrepreneurial success.

https://www.chai-research.com/jobs/


The conversation with founder William Beauchamp and engineers Tom Lu and Nischay Dhankhar covers Chai's innovative technical approaches including reinforcement learning from human feedback (RLHF), model blending techniques that combine smaller models to outperform larger ones, and their unique infrastructure challenges running exaflop-class compute.


SPONSOR MESSAGES:

***

Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers in Zurich and SF.


Goto https://tufalabs.ai/

***


Key themes explored include:

- The ethics of AI engagement optimization and attention hacking

- Content moderation at scale with a lean engineering team

- The shift from AI as utility tool to AI as social companion

- How users form deep emotional bonds with artificial intelligence

- The broader implications of AI becoming a social medium


We also examine OpenAI's recent pivot toward companion AI with April's new GPT-4o, suggesting a fundamental shift in how we interact with artificial intelligence - from utility-focused tools to companion-like experiences that blur the lines between human and artificial intimacy.


The episode also covers Chai's unconventional approach to hiring only top-tier engineers, their bootstrap funding strategy focused on user revenue over VC funding, and their rapid experimentation culture where one in five experiments succeed.


TOC:

00:00:00 - Introduction: Steve Jobs' AI Vision & Chai's Scale

00:04:02 - Chapter 1: Simulators - The Birth of Social AI

00:13:34 - Chapter 2: Engineering at Chai - RLHF & Model Blending

00:21:49 - Chapter 3: Social Impact of GenAI - Ethics & Safety

00:33:55 - Chapter 4: The Lean Machine - 13 Engineers, Millions of Users

00:42:38 - Chapter 5: GPT-4o Becoming a Companion - OpenAI's Pivot

00:50:10 - Chapter 6: What Comes Next - The Future of AI Intimacy


TRANSCRIPT: https://www.dropbox.com/scl/fi/yz2ewkzmwz9rbbturfbap/CHAI.pdf?rlkey=uuyk2nfhjzezucwdgntg5ubqb&dl=0

4 months ago
50 minutes 59 seconds

Google AlphaEvolve - Discovering new science (exclusive interview)

Today Google DeepMind released AlphaEvolve: a Gemini coding agent for algorithm discovery. It beat the record for matrix multiplication set 56 years ago by Strassen's famous algorithm. Google has been killing it recently. We had early access to the paper and interviewed the researchers behind the work.


AlphaEvolve: A Gemini-powered coding agent for designing advanced algorithms

https://deepmind.google/discover/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/

Authors: Alexander Novikov*, Ngân Vũ*, Marvin Eisenberger*, Emilien Dupont*, Po-Sen Huang*, Adam Zsolt Wagner*, Sergey Shirobokov*, Borislav Kozlovskii*, Francisco J. R. Ruiz, Abbas Mehrabian, M. Pawan Kumar, Abigail See, Swarat Chaudhuri, George Holland, Alex Davies, Sebastian Nowozin, Pushmeet Kohli, Matej Balog*

(* indicates equal contribution or special designation, if defined elsewhere)


SPONSOR MESSAGES:

***

Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich.


Goto https://tufalabs.ai/

***


AlphaEvolve works like a very smart, tireless programmer. It uses powerful AI language models (like Gemini) to generate ideas for computer code. Then, it uses an "evolutionary" process – like survival of the fittest for programs. It tries out many different program ideas, automatically tests how well they solve a problem, and then uses the best ones to inspire new, even better programs.
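To make that loop concrete, here is a heavily simplified toy sketch of the same generate-evaluate-select pattern (our own illustration, not DeepMind's code): it "evolves" a vector of numbers with random mutations instead of evolving programs with Gemini, but the propose, score, keep-the-best structure is the same.

```python
# Toy sketch of the generate -> evaluate -> select loop described above.
# Real AlphaEvolve mutates programs with an LLM; here we just mutate a vector
# of numbers and score it against a fixed target. Purely illustrative.
import numpy as np

rng = np.random.default_rng(0)
TARGET = np.array([3.0, -1.0, 2.0, 0.5])       # stand-in for "the problem"

def fitness(candidate):
    # Automatic evaluator: higher is better (negative squared error).
    return -np.sum((candidate - TARGET) ** 2)

def mutate(parent, scale=0.3):
    # Stand-in for the LLM proposing a modified program.
    return parent + scale * rng.normal(size=parent.shape)

population = [rng.normal(size=4) for _ in range(20)]
for generation in range(100):
    scored = sorted(population, key=fitness, reverse=True)
    parents = scored[:5]                        # survival of the fittest
    population = parents + [mutate(p) for p in parents for _ in range(3)]

best = max(population, key=fitness)
print("best candidate:", np.round(best, 2), "fitness:", round(fitness(best), 4))
```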


Beyond this mathematical breakthrough, AlphaEvolve has already been used to improve real-world systems at Google, such as making their massive data centers run more efficiently and even speeding up the training of the AI models that power AlphaEvolve itself. The discussion also covers how humans work with AlphaEvolve, the challenges of making AI discover things, and the exciting future of AI helping scientists make new discoveries.


In short, AlphaEvolve is a powerful new AI tool that can invent new algorithms and solve complex problems, showing how AI can be a creative partner in science and engineering.


Guests:

Matej Balog: https://x.com/matejbalog

Alexander Novikov: https://x.com/SashaVNovikov


REFS:

MAP Elites [Jean-Baptiste Mouret, Jeff Clune]

https://arxiv.org/abs/1504.04909


FunSearch [Bernardino Romera-Paredes, Mohammadamin Barekatain, Alexander Novikov, Matej Balog, M. Pawan Kumar, Emilien Dupont, Francisco J. R. Ruiz, Jordan S. Ellenberg, Pengming Wang, Omar Fawzi, Pushmeet Kohli & Alhussein Fawzi]

https://www.nature.com/articles/s41586-023-06924-6


TOC:


[00:00:00] Introduction: AlphaEvolve's Breakthroughs, DeepMind's Lineage, and Real-World Impact

[00:12:06] Introducing AlphaEvolve: Concept, Evolutionary Algorithms, and Architecture

[00:16:56] Search Challenges: The Halting Problem and Enabling Creative Leaps

[00:23:20] Knowledge Augmentation: Self-Generated Data, Meta-Prompting, and Library Learning

[00:29:08] Matrix Multiplication Breakthrough: From Strassen to AlphaEvolve's 48 Multiplications

[00:39:11] Problem Representation: Direct Solutions, Constructors, and Search Algorithms

[00:46:06] Developer Reflections: Surprising Outcomes and Superiority over Simple LLM Sampling

[00:51:42] Algorithmic Improvement: Hill Climbing, Program Synthesis, and Intelligibility

[01:00:24] Real-World Application: Complex Evaluations and Robotics

[01:05:39] Role of LLMs & Future: Advanced Models, Recursive Self-Improvement, and Human-AI Collaboration

[01:11:22] Resource Considerations: Compute Costs of AlphaEvolve


This is a trial of posting videos on Spotify, thoughts? Email me or chat in our Discord

4 months ago
1 hour 13 minutes 58 seconds

Prof. Randall Balestriero - LLMs without pretraining and SSL

Randall Balestriero joins the show to discuss some counterintuitive findings in AI. He shares research showing that huge language models, even when started from scratch (randomly initialized) without massive pre-training, can learn specific tasks like sentiment analysis surprisingly well, train stably, and avoid severe overfitting, sometimes matching the performance of costly pre-trained models. This raises questions about when giant pre-training efforts are truly worth it.
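A rough sketch of the kind of comparison being described, assuming the Hugging Face transformers API (the architecture name and task are placeholders, not the paper's exact setup): the only difference between the two runs is whether the weights come from a pretrained checkpoint or are freshly initialised from the config.

```python
# Rough sketch (not the paper's setup): same architecture, two initialisations.
from transformers import AutoConfig, AutoModelForSequenceClassification

name = "bert-base-uncased"                      # placeholder architecture
config = AutoConfig.from_pretrained(name, num_labels=2)

pretrained_model = AutoModelForSequenceClassification.from_pretrained(
    name, num_labels=2)                         # weights from pre-training
scratch_model = AutoModelForSequenceClassification.from_config(
    config)                                     # same architecture, random init

# ...then run an identical fine-tuning loop on a sentiment dataset for both
# models; the surprising claim is that the gap is often smaller than expected.
```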


He also talks about how self-supervised learning (where models learn from data structure itself) and traditional supervised learning (using labeled data) are fundamentally similar, allowing researchers to apply decades of supervised learning theory to improve newer self-supervised methods.


Finally, Randall touches on fairness in AI models used for Earth data (like climate prediction), revealing that these models can be biased, performing poorly in specific locations like islands or coastlines even if they seem accurate overall, which has important implications for policy decisions based on this data.


SPONSOR MESSAGES:

***

Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich.


Goto https://tufalabs.ai/

***


TRANSCRIPT + SHOWNOTES:

https://www.dropbox.com/scl/fi/n7yev71nsjso71jyjz1fy/RANDALLNEURIPS.pdf?rlkey=0dn4injp1sc4ts8njwf3wfmxv&dl=0


TOC:

1. Model Training Efficiency and Scale

[00:00:00] 1.1 Training Stability of Large Models on Small Datasets

[00:04:09] 1.2 Pre-training vs Random Initialization Performance Comparison

[00:07:58] 1.3 Task-Specific Models vs General LLMs Efficiency


2. Learning Paradigms and Data Distribution

[00:10:35] 2.1 Fair Language Model Paradox and Token Frequency Issues

[00:12:02] 2.2 Pre-training vs Single-task Learning Spectrum

[00:16:04] 2.3 Theoretical Equivalence of Supervised and Self-supervised Learning

[00:19:40] 2.4 Self-Supervised Learning and Supervised Learning Relationships

[00:21:25] 2.5 SSL Objectives and Heavy-tailed Data Distribution Challenges


3. Geographic Representation in ML Systems

[00:25:20] 3.1 Geographic Bias in Earth Data Models and Neural Representations

[00:28:10] 3.2 Mathematical Limitations and Model Improvements

[00:30:24] 3.3 Data Quality and Geographic Bias in ML Datasets


REFS:

[00:01:40] Research on training large language models from scratch on small datasets, Randall Balestriero et al.

https://openreview.net/forum?id=wYGBWOjq1Q

[00:10:35] The Fair Language Model Paradox (2024), Andrea Pinto, Tomer Galanti, Randall Balestriero

https://arxiv.org/abs/2410.11985

[00:12:20] Muppet: Massive Multi-task Representations with Pre-Finetuning (2021), Armen Aghajanyan et al.

https://arxiv.org/abs/2101.11038

[00:14:30] Dissociating language and thought in large language models (2023), Kyle Mahowald et al.

https://arxiv.org/abs/2301.06627

[00:16:05] The Birth of Self-Supervised Learning: A Supervised Theory, Randall Balestriero et al.

https://openreview.net/forum?id=NhYAjAAdQT

[00:21:25] VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning, Adrien Bardes, Jean Ponce, Yann LeCun

https://arxiv.org/abs/2105.04906

[00:25:20] No Location Left Behind: Measuring and Improving the Fairness of Implicit Representations for Earth Data (2025), Daniel Cai, Randall Balestriero, et al.

https://arxiv.org/abs/2502.06831

[00:33:45] Work on geographic bias in computer vision datasets, Mark Ibrahim et al.

https://arxiv.org/pdf/2304.12210

5 months ago
34 minutes 30 seconds

Machine Learning Street Talk (MLST)
How Machines Learn to Ignore the Noise (Kevin Ellis + Zenna Tavares)

Prof. Kevin Ellis and Dr. Zenna Tavares talk about making AI smarter, like humans. They want AI to learn from just a little bit of information by actively trying things out, not just by looking at tons of data.


They discuss two main ways AI can "think": one way is like following specific rules or steps (like a computer program), and the other is more intuitive, like guessing based on patterns (like modern AI often does). They found combining both methods works well for solving complex puzzles like ARC.


A key idea is "compositionality" - building big ideas from small ones, like LEGOs. This is powerful but can also be overwhelming. Another important idea is "abstraction" - understanding things simply, without getting lost in details, and knowing there are different levels of understanding.
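
As a toy illustration of compositionality in this sense (not the guests' actual system), here is a sketch that builds larger grid transformations by composing a few primitives and enumerates compositions until one explains the examples; the combinatorial blow-up with depth is the "overwhelming" part:

from itertools import product

# Toy DSL of grid primitives (grids are tuples of tuples of ints).
def flip_h(g): return tuple(row[::-1] for row in g)
def flip_v(g): return tuple(reversed(g))
def transpose(g): return tuple(zip(*g))

PRIMITIVES = {"flip_h": flip_h, "flip_v": flip_v, "transpose": transpose}

def search(examples, max_depth=3):
    # Enumerate compositions of primitives until one explains all input/output pairs.
    for depth in range(1, max_depth + 1):
        for names in product(PRIMITIVES, repeat=depth):
            def program(g, names=names):
                for n in names:
                    g = PRIMITIVES[n](g)
                return g
            if all(program(x) == y for x, y in examples):
                return names
    return None

# One ARC-like example where the target transformation is a 180-degree rotation.
ex = [(((1, 2), (3, 4)), ((4, 3), (2, 1)))]
print(search(ex))  # e.g. ('flip_h', 'flip_v')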


Ultimately, they believe the best AI will need to explore, experiment, and build models of the world, much like humans do when learning something new.


SPONSOR MESSAGES:

***

Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich.


Go to https://tufalabs.ai/

***


TRANSCRIPT:

https://www.dropbox.com/scl/fi/3ngggvhb3tnemw879er5y/BASIS.pdf?rlkey=lr2zbj3317mex1q5l0c2rsk0h&dl=0

Zenna Tavares:

http://www.zenna.org/

Kevin Ellis:

https://www.cs.cornell.edu/~ellisk/


TOC:

1. Compositionality and Learning Foundations

[00:00:00] 1.1 Compositional Search and Learning Challenges

[00:03:55] 1.2 Bayesian Learning and World Models

[00:12:05] 1.3 Programming Languages and Compositionality Trade-offs

[00:15:35] 1.4 Inductive vs Transductive Approaches in AI Systems


2. Neural-Symbolic Program Synthesis

[00:27:20] 2.1 Integration of LLMs with Traditional Programming and Meta-Programming

[00:30:43] 2.2 Wake-Sleep Learning and DreamCoder Architecture

[00:38:26] 2.3 Program Synthesis from Interactions and Hidden State Inference

[00:41:36] 2.4 Abstraction Mechanisms and Resource Rationality

[00:48:38] 2.5 Inductive Biases and Causal Abstraction in AI Systems


3. Abstract Reasoning Systems

[00:52:10] 3.1 Abstract Concepts and Grid-Based Transformations in ARC

[00:56:08] 3.2 Induction vs Transduction Approaches in Abstract Reasoning

[00:59:12] 3.3 ARC Limitations and Interactive Learning Extensions

[01:06:30] 3.4 Wake-Sleep Program Learning and Hybrid Approaches

[01:11:37] 3.5 Project MARA and Future Research Directions


REFS:

[00:00:25] DreamCoder, Kevin Ellis et al.

https://arxiv.org/abs/2006.08381


[00:01:10] Mind Your Step, Ryan Liu et al.

https://arxiv.org/abs/2410.21333


[00:06:05] Bayesian inference, Thomas L. Griffiths, Charles Kemp, Joshua B. Tenenbaum

https://psycnet.apa.org/record/2008-06911-003


[00:13:00] Induction and Transduction, Wen-Ding Li, Zenna Tavares, Yewen Pu, Kevin Ellis

https://arxiv.org/abs/2411.02272


[00:23:15] Neurosymbolic AI, Artur d'Avila Garcez et al.

https://arxiv.org/abs/2012.05876


[00:33:50] Induction and Transduction (II), Wen-Ding Li, Kevin Ellis et al.

https://arxiv.org/abs/2411.02272


[00:38:35] ARC, François Chollet

https://arxiv.org/abs/1911.01547


[00:39:20] Causal Reactive Programs, Ria Das, Joshua B. Tenenbaum, Armando Solar-Lezama, Zenna Tavares

http://www.zenna.org/publications/autumn2022.pdf


[00:42:50] MuZero, Julian Schrittwieser et al.

http://arxiv.org/pdf/1911.08265


[00:43:20] VisualPredicator, Yichao Liang

https://arxiv.org/abs/2410.23156


[00:48:55] Bayesian models of cognition, Joshua B. Tenenbaum

https://mitpress.mit.edu/9780262049412/bayesian-models-of-cognition/


[00:49:30] The Bitter Lesson, Rich Sutton

http://www.incompleteideas.net/IncIdeas/BitterLesson.html


[01:06:35] Program induction, Kevin Ellis, Wen-Ding Li

https://arxiv.org/pdf/2411.02272


[01:06:50] DreamCoder (II), Kevin Ellis et al.

https://arxiv.org/abs/2006.08381


[01:11:55] Project MARA, Zenna Tavares, Kevin Ellis

https://www.basis.ai/blog/mara/

5 months ago
1 hour 16 minutes 55 seconds

Machine Learning Street Talk (MLST)
Eiso Kant (CTO poolside) - Superhuman Coding Is Coming!

Eiso Kant, CTO of poolside AI, discusses the company's approach to building frontier AI foundation models, particularly focused on software development. Their unique strategy is reinforcement learning from code execution feedback, which is an important axis for scaling AI capabilities beyond just increasing model size or data volume. Kant predicts human-level AI in knowledge work could be achieved within 18-36 months, outlining poolside's vision to dramatically increase software development productivity and accessibility.
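
A minimal sketch of the general "reinforcement learning from code execution feedback" loop, purely illustrative of the paradigm rather than poolside's actual pipeline (real systems sandbox execution and use richer, per-test rewards):

import os, subprocess, sys, tempfile

def execution_reward(candidate_code: str, test_code: str, timeout: int = 5) -> float:
    # Run the candidate plus its tests in a subprocess; reward 1.0 if they pass.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(candidate_code + "\n\n" + test_code)
        path = f.name
    try:
        result = subprocess.run([sys.executable, path], capture_output=True, timeout=timeout)
        return 1.0 if result.returncode == 0 else 0.0
    except subprocess.TimeoutExpired:
        return 0.0
    finally:
        os.unlink(path)

# The RL loop (policy update omitted): sample candidate programs from the model,
# score them with execution_reward, and push the policy toward high-reward samples.
candidate = "def add(a, b):\n    return a + b"
tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0"
print(execution_reward(candidate, tests))  # 1.0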


SPONSOR MESSAGES:

***

Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich.


Go to https://tufalabs.ai/

***


Eiso Kant:

https://x.com/eisokant

https://poolside.ai/


TRANSCRIPT:

https://www.dropbox.com/scl/fi/szepl6taqziyqie9wgmk9/poolside.pdf?rlkey=iqar7dcwshyrpeoz0xa76k422&dl=0


TOC:

1. Foundation Models and AI Strategy

[00:00:00] 1.1 Foundation Models and Timeline Predictions for AI Development

[00:02:55] 1.2 Poolside AI's Corporate History and Strategic Vision

[00:06:48] 1.3 Foundation Models vs Enterprise Customization Trade-offs


2. Reinforcement Learning and Model Economics

[00:15:42] 2.1 Reinforcement Learning and Code Execution Feedback Approaches

[00:22:06] 2.2 Model Economics and Experimental Optimization


3. Enterprise AI Implementation

[00:25:20] 3.1 Poolside's Enterprise Deployment Strategy and Infrastructure

[00:26:00] 3.2 Enterprise-First Business Model and Market Focus

[00:27:05] 3.3 Foundation Models and AGI Development Approach

[00:29:24] 3.4 DeepSeek Case Study and Infrastructure Requirements


4. LLM Architecture and Performance

[00:30:15] 4.1 Distributed Training and Hardware Architecture Optimization

[00:33:01] 4.2 Model Scaling Strategies and Chinchilla Optimality Trade-offs

[00:36:04] 4.3 Emergent Reasoning and Model Architecture Comparisons

[00:43:26] 4.4 Balancing Creativity and Determinism in AI Models

[00:50:01] 4.5 AI-Assisted Software Development Evolution


5. AI Systems Engineering and Scalability

[00:58:31] 5.1 Enterprise AI Productivity and Implementation Challenges

[00:58:40] 5.2 Low-Code Solutions and Enterprise Hiring Trends

[01:01:25] 5.3 Distributed Systems and Engineering Complexity

[01:01:50] 5.4 GenAI Architecture and Scalability Patterns

[01:01:55] 5.5 Scaling Limitations and Architectural Patterns in AI Code Generation


6. AI Safety and Future Capabilities

[01:06:23] 6.1 Semantic Understanding and Language Model Reasoning Approaches

[01:12:42] 6.2 Model Interpretability and Safety Considerations in AI Systems

[01:16:27] 6.3 AI vs Human Capabilities in Software Development

[01:33:45] 6.4 Enterprise Deployment and Security Architecture


CORE REFS (see shownotes for URLs/more refs):


[00:15:45] Research demonstrating how training on model-generated content leads to distribution collapse in AI models, Ilia Shumailov et al. (Key finding on synthetic data risk)


[00:20:05] Foundational paper introducing Word2Vec for computing word vector representations, Tomas Mikolov et al. (Seminal NLP technique)


[00:22:15] OpenAI O3 model's breakthrough performance on ARC Prize Challenge, OpenAI (Significant AI reasoning benchmark achievement)


[00:22:40] Seminal paper proposing a formal definition of intelligence as skill-acquisition efficiency, François Chollet (Influential AI definition/philosophy)


[00:30:30] Technical documentation of DeepSeek's V3 model architecture and capabilities, DeepSeek AI (Details on a major new model)


[00:34:30] Foundational paper establishing optimal scaling laws for LLM training, Jordan Hoffmann et al. (Key paper on LLM scaling)


[00:45:45] Seminal essay arguing that scaling computation consistently trumps human-engineered solutions in AI, Richard S. Sutton (Influential "Bitter Lesson" perspective)

<trunc - see PDF shownotes>

5 months ago
1 hour 36 minutes 28 seconds

Machine Learning Street Talk (MLST)
The Compendium - Connor Leahy and Gabriel Alfour

Connor Leahy and Gabriel Alfour, AI researchers from Conjecture and authors of "The Compendium," join us for a critical discussion centered on Artificial Superintelligence (ASI) safety and governance. Drawing from their comprehensive analysis in "The Compendium," they articulate a stark warning about the existential risks inherent in uncontrolled AI development, framing it through the lens of "intelligence domination", where a sufficiently advanced AI could subordinate humanity, much like humans dominate less intelligent species.


SPONSOR MESSAGES:

***

Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich.


Go to https://tufalabs.ai/

***


TRANSCRIPT + REFS + NOTES:

https://www.dropbox.com/scl/fi/p86l75y4o2ii40df5t7no/Compendium.pdf?rlkey=tukczgf3flw133sr9rgss0pnj&dl=0


https://www.thecompendium.ai/

https://en.wikipedia.org/wiki/Connor_Leahy

https://www.conjecture.dev/about

https://substack.com/@gabecc


TOC:

1. AI Intelligence and Safety Fundamentals

[00:00:00] 1.1 Understanding Intelligence and AI Capabilities

[00:06:20] 1.2 Emergence of Intelligence and Regulatory Challenges

[00:10:18] 1.3 Human vs Animal Intelligence Debate

[00:18:00] 1.4 AI Regulation and Risk Assessment Approaches

[00:26:14] 1.5 Competing AI Development Ideologies


2. Economic and Social Impact

[00:29:10] 2.1 Labor Market Disruption and Post-Scarcity Scenarios

[00:32:40] 2.2 Institutional Frameworks and Tech Power Dynamics

[00:37:40] 2.3 Ethical Frameworks and AI Governance Debates

[00:40:52] 2.4 AI Alignment Evolution and Technical Challenges


3. Technical Governance Framework

[00:55:07] 3.1 Three Levels of AI Safety: Alignment, Corrigibility, and Boundedness

[00:55:30] 3.2 Challenges of AI System Corrigibility and Constitutional Models

[00:57:35] 3.3 Limitations of Current Boundedness Approaches

[00:59:11] 3.4 Abstract Governance Concepts and Policy Solutions


4. Democratic Implementation and Coordination

[00:59:20] 4.1 Governance Design and Measurement Challenges

[01:00:10] 4.2 Democratic Institutions and Experimental Governance

[01:14:10] 4.3 Political Engagement and AI Safety Advocacy

[01:25:30] 4.4 Practical AI Safety Measures and International Coordination


CORE REFS:

[00:01:45] The Compendium (2023), Leahy et al.

https://pdf.thecompendium.ai/the_compendium.pdf


[00:06:50] Geoffrey Hinton Leaves Google, BBC News

https://www.bbc.com/news/world-us-canada-65452940


[00:10:00] ARC-AGI, Chollet

https://arcprize.org/arc-agi


[00:13:25] A Brief History of Intelligence, Bennett

https://www.amazon.com/Brief-History-Intelligence-Humans-Breakthroughs/dp/0063286343


[00:25:35] Statement on AI Risk, Center for AI Safety

https://www.safe.ai/work/statement-on-ai-risk


[00:26:15] Machines of Love and Grace, Amodei

https://darioamodei.com/machines-of-loving-grace


[00:26:35] The Techno-Optimist Manifesto, Andreessen

https://a16z.com/the-techno-optimist-manifesto/


[00:31:55] Techno-Feudalism, Varoufakis

https://www.amazon.co.uk/Technofeudalism-Killed-Capitalism-Yanis-Varoufakis/dp/1847927270


[00:42:40] Introducing Superalignment, OpenAI

https://openai.com/index/introducing-superalignment/


[00:47:20] Three Laws of Robotics, Asimov

https://www.britannica.com/topic/Three-Laws-of-Robotics


[00:50:00] Symbolic AI (GOFAI), Haugeland

https://en.wikipedia.org/wiki/Symbolic_artificial_intelligence


[00:52:30] Intent Alignment, Christiano

https://www.alignmentforum.org/posts/HEZgGBZTpT4Bov7nH/mapping-the-conceptual-territory-in-ai-existential-safety


[00:55:10] Large Language Model Alignment: A Survey, Jiang et al.

http://arxiv.org/pdf/2309.15025


[00:55:40] Constitutional Checks and Balances, Bok

https://plato.stanford.edu/entries/montesquieu/

<trunc, see PDF>

5 months ago
1 hour 37 minutes 10 seconds

Machine Learning Street Talk (MLST)
ARC Prize v2 Launch! (Francois Chollet and Mike Knoop)

We are joined by Francois Chollet and Mike Knoop to launch the new version of the ARC Prize! In version 2, the tasks have been calibrated with humans such that at least two humans could solve each one within a reasonable amount of time, and adversarially selected so that frontier reasoning models cannot solve them. The best LLMs today get negligible performance on this challenge.
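
For readers new to the benchmark, a quick sketch of how an ARC-AGI task is laid out and scored. The JSON field names follow the public ARC-AGI repository; the grading function is an illustrative simplification (official scoring allows a small fixed number of attempts and requires an exact grid match):

import json

# Task files have "train" and "test" splits, each a list of {"input": grid, "output": grid}
# pairs, where grids are lists of lists of ints 0-9.
task = json.loads("""
{"train": [{"input": [[0, 1], [1, 0]], "output": [[1, 0], [0, 1]]}],
 "test":  [{"input": [[1, 1], [0, 0]], "output": [[0, 0], [1, 1]]}]}
""")

def score(predictions, task):
    # A test item counts as solved only if some attempt matches its output grid exactly.
    solved = 0
    for attempts, item in zip(predictions, task["test"]):
        if any(p == item["output"] for p in attempts):
            solved += 1
    return solved / len(task["test"])

attempt_1 = [[1, 1], [0, 0]]   # wrong
attempt_2 = [[0, 0], [1, 1]]   # exact match
print(score([[attempt_1, attempt_2]], task))  # 1.0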


https://arcprize.org/


SPONSOR MESSAGES:

***

Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich.


Go to https://tufalabs.ai/

***


TRANSCRIPT:

https://www.dropbox.com/scl/fi/0v9o8xcpppdwnkntj59oi/ARCv2.pdf?rlkey=luqb6f141976vra6zdtptv5uj&dl=0


TOC:

1. ARC v2 Core Design & Objectives

[00:00:00] 1.1 ARC v2 Launch and Benchmark Architecture

[00:03:16] 1.2 Test-Time Optimization and AGI Assessment

[00:06:24] 1.3 Human-AI Capability Analysis

[00:13:02] 1.4 OpenAI o3 Initial Performance Results


2. ARC Technical Evolution

[00:17:20] 2.1 ARC-v1 to ARC-v2 Design Improvements

[00:21:12] 2.2 Human Validation Methodology

[00:26:05] 2.3 Task Design and Gaming Prevention

[00:29:11] 2.4 Intelligence Measurement Framework


3. O3 Performance & Future Challenges

[00:38:50] 3.1 O3 Comprehensive Performance Analysis

[00:43:40] 3.2 System Limitations and Failure Modes

[00:49:30] 3.3 Program Synthesis Applications

[00:53:00] 3.4 Future Development Roadmap


REFS:

[00:00:15] On the Measure of Intelligence, François Chollet

https://arxiv.org/abs/1911.01547

[00:06:45] ARC Prize Foundation, François Chollet, Mike Knoop

https://arcprize.org/

[00:12:50] OpenAI o3 model performance on ARC v1, ARC Prize Team

https://arcprize.org/blog/oai-o3-pub-breakthrough

[00:18:30] Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, Jason Wei et al.

https://arxiv.org/abs/2201.11903

[00:21:45] ARC-v2 benchmark tasks, Mike Knoop

https://arcprize.org/blog/introducing-arc-agi-public-leaderboard

[00:26:05] ARC Prize 2024: Technical Report, Francois Chollet et al.

https://arxiv.org/html/2412.04604v2

[00:32:45] ARC Prize 2024 Technical Report, Francois Chollet, Mike Knoop, Gregory Kamradt

https://arxiv.org/abs/2412.04604

[00:48:55] The Bitter Lesson, Rich Sutton

http://www.incompleteideas.net/IncIdeas/BitterLesson.html

[00:53:30] Decoding strategies in neural text generation, Sina Zarrieß

https://www.mdpi.com/2078-2489/12/9/355/pdf

6 months ago
54 minutes 15 seconds

Machine Learning Street Talk (MLST)
Test-Time Adaptation: the key to reasoning with DL (Mohamed Osman)

Mohamed Osman joins to discuss MindsAI's highest scoring entry to the ARC challenge 2024 and the paradigm of test-time fine-tuning. They explore how the team, now part of Tufa Labs in Zurich, achieved state-of-the-art results using a combination of pre-training techniques, a unique meta-learning strategy, and an ensemble voting mechanism. Mohamed emphasizes the importance of raw data input and the flexibility of the network.
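
A rough sketch of the test-time fine-tuning plus voting recipe discussed here, with placeholder fine_tune/predict functions standing in for the real model. The augmentations, step count, and voting scheme are illustrative assumptions, not MindsAI's exact pipeline:

from collections import Counter

def augmentations(grid):
    # A few invertible views of a grid: identity, horizontal flip, transpose.
    yield grid, lambda g: g
    yield [row[::-1] for row in grid], lambda g: [row[::-1] for row in g]
    yield [list(r) for r in zip(*grid)], lambda g: [list(r) for r in zip(*g)]

def solve_task(model, task, fine_tune, predict, steps=64):
    # 1) Test-time fine-tuning: adapt the model on this task's demonstration pairs only.
    model = fine_tune(model, task["train"], steps=steps)
    # 2) Predict under several augmented views of the test input, map each prediction
    #    back to the original frame, and majority-vote. (A real pipeline would augment
    #    the demonstration pairs consistently as well.)
    votes, grids = Counter(), {}
    for view, invert in augmentations(task["test"][0]["input"]):
        pred = invert(predict(model, task["train"], view))
        votes[str(pred)] += 1
        grids[str(pred)] = pred
    return grids[votes.most_common(1)[0][0]]

# Toy usage with stand-in functions (a real run would fine-tune an actual network):
identity_ft = lambda m, pairs, steps: m
echo_input = lambda m, pairs, x: x   # "predict" that just copies the input
task = {"train": [], "test": [{"input": [[1, 2], [3, 4]]}]}
print(solve_task(None, task, identity_ft, echo_input))  # [[1, 2], [3, 4]]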


SPONSOR MESSAGES:

***

Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich.


Go to https://tufalabs.ai/

***


TRANSCRIPT + REFS:

https://www.dropbox.com/scl/fi/jeavyqidsjzjgjgd7ns7h/MoFInal.pdf?rlkey=cjjmo7rgtenxrr3b46nk6yq2e&dl=0


Mohamed Osman (Tufa Labs)

https://x.com/MohamedOsmanML


Jack Cole (Tufa Labs)

https://x.com/MindsAI_Jack


How and why deep learning for ARC paper:

https://github.com/MohamedOsman1998/deep-learning-for-arc/blob/main/deep_learning_for_arc.pdf


TOC:

1. Abstract Reasoning Foundations

[00:00:00] 1.1 Test-Time Fine-Tuning and ARC Challenge Overview

[00:10:20] 1.2 Neural Networks vs Programmatic Approaches to Reasoning

[00:13:23] 1.3 Code-Based Learning and Meta-Model Architecture

[00:20:26] 1.4 Technical Implementation with Long T5 Model


2. ARC Solution Architectures

[00:24:10] 2.1 Test-Time Tuning and Voting Methods for ARC Solutions

[00:27:54] 2.2 Model Generalization and Function Generation Challenges

[00:32:53] 2.3 Input Representation and VLM Limitations

[00:36:21] 2.4 Architecture Innovation and Cross-Modal Integration

[00:40:05] 2.5 Future of ARC Challenge and Program Synthesis Approaches


3. Advanced Systems Integration

[00:43:00] 3.1 DreamCoder Evolution and LLM Integration

[00:50:07] 3.2 MindsAI Team Progress and Acquisition by Tufa Labs

[00:54:15] 3.3 ARC v2 Development and Performance Scaling

[00:58:22] 3.4 Intelligence Benchmarks and Transformer Limitations

[01:01:50] 3.5 Neural Architecture Optimization and Processing Distribution


REFS:

[00:01:32] Original ARC challenge paper, François Chollet

https://arxiv.org/abs/1911.01547


[00:06:55] DreamCoder, Kevin Ellis et al.

https://arxiv.org/abs/2006.08381


[00:12:50] Deep Learning with Python, François Chollet

https://www.amazon.com/Deep-Learning-Python-Francois-Chollet/dp/1617294438


[00:13:35] Influence of pretraining data for reasoning, Laura Ruis

https://arxiv.org/abs/2411.12580


[00:17:50] Latent Program Networks, Clement Bonnet

https://arxiv.org/html/2411.08706v1


[00:20:50] T5, Colin Raffel et al.

https://arxiv.org/abs/1910.10683


[00:30:30] Combining Induction and Transduction for Abstract Reasoning, Wen-Ding Li, Kevin Ellis et al.

https://arxiv.org/abs/2411.02272


[00:34:15] Six finger problem, Chen et al.

https://openaccess.thecvf.com/content/CVPR2024/papers/Chen_SpatialVLM_Endowing_Vision-Language_Models_with_Spatial_Reasoning_Capabilities_CVPR_2024_paper.pdf


[00:38:15] DeepSeek-R1-Distill-Llama, DeepSeek AI

https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B


[00:40:10] ARC Prize 2024 Technical Report, François Chollet et al.

https://arxiv.org/html/2412.04604v2


[00:45:20] LLM-Guided Compositional Program Synthesis, Wen-Ding Li and Kevin Ellis

https://arxiv.org/html/2503.15540


[00:54:25] Abstraction and Reasoning Corpus, François Chollet

https://github.com/fchollet/ARC-AGI


[00:57:10] O3 breakthrough on ARC-AGI, OpenAI

https://arcprize.org/


[00:59:35] ConceptARC Benchmark, Arseny Moskvichev, Melanie Mitchell

https://arxiv.org/abs/2305.07141


[01:02:05] Mixtape: Breaking the Softmax Bottleneck Efficiently, Zhilin Yang, Zihang Dai, Ruslan Salakhutdinov, William W. Cohen

http://papers.neurips.cc/paper/9723-mixtape-breaking-the-softmax-bottleneck-efficiently.pdf

6 months ago
1 hour 3 minutes 36 seconds

Machine Learning Street Talk (MLST)
Welcome! We engage in fascinating discussions with pre-eminent figures in the AI field. Our flagship show covers current affairs in AI, cognitive science, neuroscience and philosophy of mind with in-depth analysis. Our approach is unrivalled in terms of scope and rigour – we believe in intellectual diversity in AI, and we touch on all of the main ideas in the field with the hype surgically removed. MLST is run by Tim Scarfe, Ph.D (https://www.linkedin.com/in/ecsquizor/) and features regular appearances from MIT Doctor of Philosophy Keith Duggar (https://www.linkedin.com/in/dr-keith-duggar/).