The Turing Podcast
Turing
5 episodes
1 week ago
The Turing Podcast is a weekly dispatch from Founder & CEO Jonathan Siddharth and Turing experts on the future of AI infrastructure, research, and frontier model development. Each episode explores how labs are building, training, and evaluating the next generation of intelligent systems, with an eye on speed, neutrality, and real-world scale.
Technology
GPT-5 and SWE-Bench: A Launchpad for O5-Level Code Reasoning
The Turing Podcast
32 minutes 23 seconds
3 months ago
In this episode, Lilin Wang, Engineering Director at Turing, discusses SWE-Bench, a benchmark designed to evaluate the software engineering reasoning capabilities of large language models. She explores the motivation behind SWE-Bench, its structure, and how it differs from traditional coding benchmarks. Lilin explains Turing's approach to enhancing model performance through data expansion and trajectory data, as well as the challenges SWE-Bench poses compared to other benchmarks. The episode concludes with insights into the future of software engineering with AI and the evolving role of engineers.


Highlights


  • SWE-Bench evaluates the capability of large language models on real-world software engineering tasks.
  • The benchmark moves beyond simple coding tasks to include bug fixing and feature development.
  • SWE-Bench leverages high-quality data from GitHub repositories for evaluation.
  • The model's ability to understand context is crucial for solving complex problems.
  • Turing aims to expand the SWE-Bench dataset for better model training.
  • Trajectory data helps in understanding and correcting model failures.
  • SWE-Bench presents unique challenges compared to other benchmarks like HumanEval.
  • The future of software engineering may see models acting as junior engineers.
  • Engineers will shift to supervisory roles, focusing on high-level planning.
  • Improving model capabilities will enhance efficiency in software development.
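
The highlights above describe how SWE-Bench scores a model on real GitHub issues: a task is resolved only when the tests tied to that issue flip from failing to passing. A minimal sketch of that scoring idea is below; the field names mirror the public SWE-bench dataset schema (repo, instance_id, problem_statement, FAIL_TO_PASS), but the harness itself is a hypothetical toy, not SWE-Bench's or Turing's actual evaluator.

```python
from dataclasses import dataclass

@dataclass
class SweBenchTask:
    repo: str                    # GitHub repository the issue comes from
    instance_id: str             # unique task identifier
    problem_statement: str       # issue text the model must resolve
    fail_to_pass: list           # tests that must go from failing to passing

def resolved(task: SweBenchTask, passing_tests: set) -> bool:
    """A task counts as resolved only if every FAIL_TO_PASS test now passes."""
    return all(t in passing_tests for t in task.fail_to_pass)

def resolved_rate(tasks: list, results: dict) -> float:
    """Fraction of tasks resolved; `results` maps instance_id -> set of passing tests."""
    done = sum(resolved(t, results.get(t.instance_id, set())) for t in tasks)
    return done / len(tasks) if tasks else 0.0
```

For example, a single task whose one required test passes yields a resolved rate of 1.0, while an empty result set for the same task yields 0.0.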


Chapters

00:00 Introduction and Model Breaking Prompts

03:52 Understanding SWE Bench: Motivation and Structure

06:58 Evaluating Tasks: Solvable vs. Hard

10:04 Turing's Approach to Multi-Step Code Reasoning

16:23 Challenges of SWE-Bench vs. Other Benchmarks

20:16 Future of AI in Software Engineering

27:04 Conclusion and Future Prospects
