#201 - GPT 4.5, Sonnet 3.7, Grok 3, Phi 4

Last Week in AI

Skynet Today

255 episodes

4 days ago

Weekly summaries and discussion about the most interesting developments in AI, deep learning, robotics, and more!

Technology, News, Tech News

58 minutes 37 seconds

3 months ago

Our 201st episode with a summary and discussion of last week's big AI news!

Recorded on 03/02/2025

Join our brand new Discord here! https://discord.gg/nTyezGSKwP

Hosted by Andrey Kurenkov and guest host Sharon Zhou

Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai

Read our text newsletter and comment on the podcast at https://lastweekin.ai/.

In this episode:
- The release of GPT-4.5 from OpenAI, Anthropic's Claude 3.7, and Grok 3 from xAI, comparing their features, costs, and capabilities.
- Discussion of new tools and applications, including Sesame's new voice assistant and Google's AI coding assistant, Gemini Code Assist, highlighting their unique benefits.
- OpenAI's continued user growth despite competition, pricing for Google's text-to-video platform, and HP acquiring Humane and shutting down its AI Pin.
- Insights into new research on alignment and specification gaming in LLMs, including papers on fine-tuning causing broad misalignment and Google's multi-agent system for scientific collaboration.

Timestamps + Links:

(00:00:00) Intro / Banter
(00:01:36) News Preview

Tools & Apps
(00:02:33) OpenAI announces GPT-4.5, warns it’s not a frontier AI model
(00:07:22) Anthropic launches a new AI model that ‘thinks’ as long as you want
(00:11:14) New Grok 3 release tops LLM leaderboards
(00:16:43) Sesame is the first voice assistant I’ve ever wanted to talk to more than once
(00:18:30) Google launches a free AI coding assistant with very high usage caps
(00:20:45) Rabbit shows off the AI agent it should have launched with
(00:22:23) Mistral’s Le Chat tops 1M downloads in just 14 days

Applications & Business
(00:24:06) OpenAI Tops 400 Million Users Despite DeepSeek’s Emergence
(00:27:37) Google’s new AI video model Veo 2 will cost 50 cents per second
(00:29:52) HP is buying Humane and shutting down the AI Pin

Projects & Open Source
(00:31:44) Microsoft launches next-gen Phi AI models
(00:33:47) OpenAI introduces SWE-Lancer: A Benchmark for Evaluating Model Performance on Real-World Freelance Software Engineering Work
(00:37:12) SWE-Bench+: Enhanced Coding Benchmark for LLMs

Research & Advancements
(00:40:00) Towards an AI co-scientist
(00:42:52) Magma: A Foundation Model for Multimodal AI Agents

Policy & Safety
(00:47:32) Demonstrating specification gaming in reasoning models
(00:51:03) Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs