GPT Reviews

Earkind

301 episodes

4 days ago

A daily show about AI made by AI: news, announcements, and research from arXiv, mixed in with some fun. Hosted by Giovani Pete Tizzano, an overly hyped AI enthusiast; Robert, an often unimpressed analyst; Olivia, an overly online reader; and Belinda, a witty research expert.

Daily News

News

All content for GPT Reviews is the property of Earkind and is served directly from their servers with no modification, redirects, or rehosting. The podcast is not affiliated with or endorsed by Podjoint in any way.

Episodes (20/301)

OpenAI's Strawberry Revolution 🍓 // Nvidia's Lucrative Paychecks 💸 // Google Pipe SQL Simplification 📊

This episode dives into OpenAI's promising new model, Strawberry, which could revolutionize interactions in ChatGPT. We explore the financial envy Nvidia employees inspire in their Google and Meta counterparts due to lucrative stock options. Google’s new Pipe SQL syntax aims to simplify data querying, while concerns about research accessibility are raised. Finally, we discuss the BaichuanSEED and Dolphin models, which highlight advancements in extensive data collection and energy-efficient processing, paving the way for enhanced AI capabilities.

Contact: sergi@earkind.com

Timestamps:

00:34 Introduction

01:40 OpenAI Races to Launch Strawberry

03:07 Google, Meta workers envy Nvidia staffers’ fat paychecks: ‘Bought a 100K car … all cash’

05:01 Google's New Pipe SQL Syntax

06:12 Fake sponsor

07:47 BaichuanSEED: Sharing the Potential of ExtensivE Data Collection and Deduplication by Introducing a Competitive Large Language Model Baseline

09:20 Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models

11:09 Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders

12:50 Outro

1 year ago

14 minutes 1 second

OpenAI's 'Strawberry' AI 🚀 // World's Fastest AI Inference ⚡ // Photo-realistic 3D Avatars 🎨

OpenAI's 'Strawberry' AI tackles complex math and programming with enhanced reasoning, while Cerebras claims to have launched the fastest AI inference, enabling real-time applications at competitive prices. The GenCA model revolutionizes avatar creation with photo-realistic, controllable 3D avatars, and the "Build-A-Scene" paper introduces interactive 3D layout control for text-to-image generation, enhancing creative fields with dynamic object manipulation.

Contact: sergi@earkind.com

Timestamps:

00:34 Introduction

02:02 OpenAI Shows ‘Strawberry’ AI to the Feds and Uses It to Develop ‘Orion’

03:23 Cerebras Launches the World’s Fastest AI Inference

05:07 Diffusion Models Are Real-Time Game Engines

06:15 Fake sponsor

08:06 The Mamba in the Llama: Distilling and Accelerating Hybrid Models

09:42 GenCA: A Text-conditioned Generative Model for Realistic and Drivable Codec Avatars

11:16 Build-A-Scene: Interactive 3D Layout Control for Diffusion-Based Image Generation

13:04 Outro

1 year ago

14 minutes 14 seconds

Grok-2's Speed & Accuracy 🚀 // OpenAI's Transparency Push 🗳️ // LlamaDuo for Local LLMs 🔄

Grok-2's advancements in speed and accuracy position it as a leading AI model, particularly in math and coding. OpenAI's backing of California's AI bill highlights the critical need for transparency in synthetic content, especially during an election year. The episode features groundbreaking research on the SwiftBrush diffusion model and K-Sort Arena for generative model evaluation. Additionally, the LlamaDuo pipeline offers a practical solution for migrating from cloud-based LLMs to local models, tackling privacy and operational challenges.

Contact: sergi@earkind.com

Timestamps:

00:34 Introduction

01:55 grok-2 is Faster and Better

03:32 OpenAI supports California AI bill requiring 'watermarking' of synthetic content

04:53 Fake sponsor

06:45 SwiftBrush v2: Make Your One-step Diffusion Model Better Than Its Teacher

08:10 SWE-bench-java: A GitHub Issue Resolving Benchmark for Java

09:40 K-Sort Arena: Efficient and Reliable Benchmarking for Generative Models via K-wise Human Preferences

11:24 LlamaDuo: LLMOps Pipeline for Seamless Migration from Service LLMs to Small-Scale Local LLMs

13:26 Outro

1 year ago

14 minutes 46 seconds

Salesforce's AI Sales Agents 🤖 // NVIDIA's Compact Language Model ⚡ // Optimized Computation for Performance 📊

This episode dives into Salesforce's innovative AI sales agents that automate tasks but risk losing human touch, NVIDIA's compact yet powerful language model that promises efficiency, groundbreaking research showing how optimized computation can enhance model performance, and insights into compound inference systems revealing the delicate balance in maximizing language model effectiveness.

Contact: sergi@earkind.com

Timestamps:

00:34 Introduction

01:49 Salesforce's New Sales AI Agents

03:09 Lightweight Champ: NVIDIA Releases Small Language Model With State-of-the-Art Accuracy

04:52 avante.nvim

05:56 Fake sponsor

07:45 Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters

09:22 Large Language Monkeys: Scaling Inference Compute with Repeated Sampling

11:15 Are More LLM Calls All You Need? Towards Scaling Laws of Compound Inference Systems

13:10 Outro

1 year ago

14 minutes 20 seconds

Amazon Cloud Chief Spicy Takes 🚀 // Zuckerberg's AI Vision 📈 // Multimodal Models for Safety 🔒

This episode dives deep into the future of coding, challenging the belief that AI will render developers obsolete. It highlights Meta's stock surge, attributing it to Zuckerberg's compelling AI narrative that captivates investors. The discussion also covers groundbreaking research like Transfusion, which merges text and image processing, and the innovative approach of automated design for intelligent agents. Lastly, it emphasizes the xGen-MM framework's commitment to safety in AI, showcasing the critical need to mitigate harmful behaviors in advanced models.

Contact: sergi@earkind.com

Timestamps:

00:34 Introduction

01:28 Amazon cloud chief: Devs may stop coding when AI takes over

02:53 Meta Shares Are Flying High as Zuckerberg Sells His AI Vision

04:34 I've Built My First Successful Side Project, and I Hate It

05:41 Fake sponsor

07:35 Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

09:16 Automated Design of Agentic Systems

10:56 xGen-MM (BLIP-3): A Family of Open Large Multimodal Models

12:44 Outro

1 year ago

13 minutes 54 seconds

OpenAI's SearchGPT Launch 🔍 // Vision Transformers Efficiency 📊 // Automated Agent Design Revolution 🚀

OpenAI's SearchGPT is launching with limited access for only 10,000 users, raising questions about trust and the potential risks of generative search products. A comprehensive analysis challenges the belief that Vision Transformers are inefficient, suggesting they can handle higher resolutions effectively. The introduction of Automated Design of Agentic Systems (ADAS) could revolutionize how intelligent agents are created, outperforming traditional hand-designed models. The xGen-MM framework aims to enhance multimodal AI capabilities while prioritizing safety measures to mitigate harmful behaviors.

Contact: sergi@earkind.com

Timestamps:

00:34 Introduction

01:43 OpenAI is fresh out of SearchGPT

02:50 From ChatGPT to Gemini: how AI is rewriting the internet

04:32 On the speed of ViTs and CNNs

05:49 Fake sponsor

07:49 JPEG-LM: LLMs as Image Generators with Canonical Codec Representations

09:34 Automated Design of Agentic Systems

11:12 xGen-MM (BLIP-3): A Family of Open Large Multimodal Models

13:01 Outro

1 year ago

14 minutes 11 seconds

Grok-2 Beta Release 🚀 // Apple's $1,000 Home Robot 🏡 // ChemVLM Breakthrough in Chemistry 🔬

This episode dives into the Grok-2 Beta Release, highlighting its advanced reasoning capabilities and competitive edge. We explore Apple’s ambitious plans for a $1,000 tabletop robotic home device, set to transform smart home technology. The introduction of ChemVLM marks a breakthrough in chemistry research, effectively integrating chemical images and text. Lastly, InfinityMATH presents a scalable dataset that enhances language models' mathematical reasoning, showcasing impressive performance improvements.

Contact: sergi@earkind.com

Timestamps:

00:34 Introduction

01:37 Grok-2 Beta Release

02:58 Apple Aiming to Launch Tabletop Robotic Home Device as Soon as 2026 With Pricing Around $1,000

04:29 Gemlite: Towards Building Custom Low-Bit Fused CUDA Kernels

05:34 Fake sponsor

07:16 Seeing and Understanding: Bridging Vision with Chemical Knowledge Via ChemVLM

08:55 Generative Photomontage

10:26 InfinityMATH: A Scalable Instruction Tuning Dataset in Programmatic Mathematical Reasoning

12:22 Outro

1 year ago

13 minutes 41 seconds

Gemini Live AI Assistant 📱 // OpenAI’s Coding Benchmark ✅ // LongWriter’s 10K Word Generation ✍️

This episode dives into Gemini Live's interactive AI capabilities, OpenAI's improved coding benchmark for reliable evaluations, LongWriter's breakthrough in generating ultra-long outputs, and SlotLifter's advancements in 3D object-centric learning. Each topic highlights significant innovations and their implications in the AI landscape.

Contact: sergi@earkind.com

Timestamps:

00:34 Introduction

01:48 Gemini makes your mobile device a powerful AI assistant

03:08 New OpenAI Coding Benchmark

04:52 Things I learned from teaching

05:59 Fake sponsor

07:38 Imagen 3

09:05 LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs

10:46 SlotLifter: Slot-guided Feature Lifting for Learning Object-centric Radiance Fields

12:22 Outro

1 year ago

13 minutes 23 seconds

Google Meet's AI Note-Taking 📝 // Trump’s AI Crowd Claims 🤔 // ControlNeXt & Image Generation 🎨

Google Meet's new AI note-taking feature could change meeting dynamics, while Trump’s claims about Kamala Harris reveal the political implications of AI. The exploration of AI's role in scientific research raises ethical concerns, and cutting-edge papers on ControlNeXt, rStar, and FruitNeRF showcase advancements in image generation, reasoning capabilities, and fruit counting accuracy.

Contact: sergi@earkind.com

Timestamps:

00:34 Introduction

01:43 Google Meet call will soon be able to take notes for you

02:56 Trump falsely claims Kamala Harris ‘AI’d’ her rally crowd size

04:23 The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery

05:35 Fake sponsor

07:15 ControlNeXt: Powerful and Efficient Control for Image and Video Generation

08:47 Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers

10:41 FruitNeRF: A Unified Neural Radiance Field based Fruit Counting Framework

12:41 Outro

1 year ago

13 minutes 51 seconds

OpenAI's Strawberry Model 🍓 // Meta's Celebrity Voice Assistants 🎙️ // Human-level Robot Table Tennis 🏓

OpenAI's mysterious "Strawberry" AI model is causing a buzz in the tech world, with rumors of advanced reasoning capabilities.

Meta is trying to improve its AI assistants by enlisting celebrities like Awkwafina to give them a more relatable and entertaining vibe.

Google DeepMind's research on building a robot capable of playing table tennis at a human level is a remarkable exploration of robotics and sports.

UC Berkeley and Google DeepMind's paper on optimizing LLMs and Harbin Institute of Technology's research on building a general-purpose AI agent capable of completing long-horizon tasks are both groundbreaking developments in the field of AI.

Contact: sergi@earkind.com

Timestamps:

00:34 Introduction

01:35 Sam Altman teases project Strawberry

03:06 Meta courts celebs like Awkwafina to voice AI assistants ahead of Meta Connect

04:58 Achieving Human Level Competitive Robot Table Tennis

06:11 Fake sponsor

08:15 Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters

09:55 Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks

11:41 UniBench: Visual Reasoning Requires Rethinking Vision-Language Beyond Scaling

13:30 Outro

1 year ago

15 minutes 27 seconds

Nvidia's Stock Struggles 📉 // Meta's AI Hallucinations 🤖 // Superconducting Microprocessors ⚡

This episode dives into Nvidia's stock struggles amid rising competition, while also unpacking Meta's AI blunders and the implications of "hallucinations" in tech. We explore cutting-edge superconducting microprocessors that promise unprecedented energy efficiency and highlight groundbreaking AI research, including eavesdropping techniques and advancements in reinforcement learning.

Contact: sergi@earkind.com

Timestamps:

00:34 Introduction

01:50 Nvidia Sank Again Today -- Time to Buy the Artificial Intelligence (AI) Growth Stock Hand Over Fist?

03:09 Meta blames hallucinations after its AI said Trump rally shooting didn’t happen

04:52 Superconducting Microprocessors? Turns Out They're Ultra-Efficient

06:07 Fake sponsor

07:48 Deep-TEMPEST: Using Deep Learning to Eavesdrop on HDMI from its Unintended Electromagnetic Emanations

09:22 SAPG: Split and Aggregate Policy Gradients

10:45 MindSearch: Mimicking Human Minds Elicits Deep AI Searcher

12:44 Outro

1 year ago

14 minutes 41 seconds

Google's Gemma 2 vs. GPT-3.5 ⚔️ // Black Forest Labs' Flux Model 🌲 // Ethical Concerns in AI 🚨

This episode dives into Google’s Gemma 2, which claims to outperform GPT-3.5 while tackling responsible AI practices. We explore Black Forest Labs' Flux model, featuring 12 billion parameters and tailored versions for various users. Olivia sheds light on the ethical concerns surrounding the resurgence of pseudoscience in machine learning, particularly physiognomy. Lastly, Belinda reviews critical research on AI safety, advocating for clearer metrics to prevent misleading claims about safety advancements.

Contact: sergi@earkind.com

Timestamps:

00:34 Introduction

01:37 Google’s tiny AI model bests GPT-3.5

02:48 Announcing Flux by Black Forest Labs: The Next Leap in Text-to-Image Models

04:28 The reanimation of pseudoscience in machine learning and its ethical repercussions

06:06 Fake sponsor

08:04 MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts

09:55 Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models

11:41 Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?

13:33 Outro

1 year ago

14 minutes 44 seconds

Apple's AI Feature Delay 📅 // SAM 2 Object Segmentation 🖼️ // Google's TPU Chips Shift ⚡

Apple’s delay in releasing AI features until October could affect iPhone 16 sales and customer excitement. The tech giant’s choice to use Google’s TPU chips instead of Nvidia marks a significant shift in AI hardware competition. Meta’s SAM 2 introduces groundbreaking real-time object segmentation with zero-shot generalization, revolutionizing visual content interaction. Additionally, Sony AI’s research presents a cost-effective approach to training diffusion models, democratizing access to advanced AI technology.

Contact: sergi@earkind.com

Timestamps:

00:34 Introduction

01:54 Apple Intelligence Won't Be Released Until October

03:09 Apple used Google's chips to train two AI models, research paper shows

04:44 A Visual Guide to Quantization

05:38 Introducing SAM 2: The next generation of Meta Segment Anything Model for videos and images

06:41 Fake sponsor

08:46 Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget

10:28 Theia: Distilling Diverse Vision Foundation Models for Robot Learning

12:27 Outro

1 year ago

14 minutes 25 seconds

OpenAI's SearchGPT 🧐 // AI in Math Olympiad 🏅 // Unreliable AI Existential Risk 🔍

OpenAI's new prototype, SearchGPT, promises to combine AI smarts with real-time web information to make search easier.

AI has achieved silver-medal standards at the International Mathematical Olympiad, raising questions about the future of mathematics and the role of AI in solving complex problems.

The reliability of AI existential risk probabilities is called into question in a thought-provoking article, challenging the authority we often assign to these forecasts and calling for more scrutiny.

Three fascinating papers from UNC Chapel Hill, Google DeepMind, and a collaboration between Caltech and NVIDIA explore advancements in theorem proving, balancing fast and slow planning, and aligning large language models with Best-of-N distillation. These papers could transform the way we approach complex problems with language models and streamline the development of LLMs.

Contact: sergi@earkind.com

Timestamps:

00:34 Introduction

01:54 OpenAI Announces SearchGPT

03:15 AI achieves silver-medal standard solving International Mathematical Olympiad problems

04:55 AI existential risk probabilities are too unreliable to inform policy

06:25 Fake sponsor

08:21 LeanDojo: Theorem Proving with Retrieval-Augmented Language Models

10:10 System-1.x: Learning to Balance Fast and Slow Planning with Language Models

12:01 BOND: Aligning LLMs with Best-of-N Distillation

13:43 Outro

1 year ago

15 minutes 50 seconds

Mistral Large 2 🌍 // Memphis Supercluster 💻 // Emergence in Complex Systems 🧩

Mistral Large 2 release with advanced features and multilingual support.

Elon Musk's announcement of the Memphis Supercluster for creating the world's most powerful AI.

Discussion of emergence in complex systems and the MINT-1T dataset for training large multimodal models.

Introduction of OpenDevin, an open platform for developing AI agents, and MOMAland, a benchmark framework for multi-objective multi-agent reinforcement learning.

Contact: sergi@earkind.com

Timestamps:

00:34 Introduction

01:39 Mistral Large 2 Release

03:01 Elon Musk Announces Memphis Supercomputer

04:48 The Puzzle of How Large-Scale Order Emerges in Complex Systems

06:22 Fake sponsor

08:37 MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens

10:16 OpenDevin: An Open Platform for AI Software Developers as Generalist Agents

11:53 MOMAland: A Set of Benchmarks for Multi-Objective Multi-Agent Reinforcement Learning

13:31 Outro

1 year ago

14 minutes 51 seconds

Llama 3.1 Unveiled 🦙 // Alphabet's 14% Revenue Growth 📈 // MovieDreamer Revolutionizes Video 🎬

This episode features the introduction of Llama 3.1, Meta's cutting-edge AI model with remarkable flexibility and extensive language support. We delve into Alphabet's impressive 14% revenue growth, highlighting the increasing demand for AI infrastructure in cloud computing. The System-1.x Planner is explored, demonstrating its innovative balance between fast and slow planning modes, leading to enhanced performance. Finally, we discuss MovieDreamer, a groundbreaking model that elevates video generation by ensuring narrative coherence and high visual quality in long-form content.

Contact: sergi@earkind.com

Timestamps:

00:34 Introduction

01:39 Introducing Llama 3.1: Our most capable models to date

02:59 Alphabet revenue jump shows no sign of AI denting search business

04:36 Open Source AI Is the Path Forward

05:40 Fake sponsor

07:41 System-1.x: Learning to Balance Fast and Slow Planning with Language Models

09:31 KAN or MLP: A Fairer Comparison

11:08 MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequence

12:53 Outro

1 year ago

14 minutes 50 seconds

Meta's Llama 3.1 vs. GPT-4o 🤯 // OpenAI's own AI chips 🧐 // SlowFast-LLaVA for Video LLMs 🎬

Meta's upcoming Llama 3.1 models could outperform the current state-of-the-art closed-source LLM, OpenAI's GPT-4o.

OpenAI is planning to develop its own AI chip to optimize performance and potentially supercharge its progress towards AGI.

Apple's SlowFast-LLaVA is a new training-free video large language model that captures both detailed spatial semantics and long-range temporal context in video without exceeding the token budget of commonly used LLMs.

Google's Conditioned Language Policy (CLP) is a general framework that builds on techniques from multi-task training and parameter-efficient finetuning to develop steerable models that can trade off multiple conflicting objectives at inference time.

Contact: sergi@earkind.com

Timestamps:

00:34 Introduction

01:28 LLAMA 405B Performance Leaked

03:01 OpenAI Wants Its Own AI Chips

04:25 Towards more cooperative AI safety strategies

06:01 Fake sponsor

07:35 SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models

09:17 AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?

10:56 Conditioned Language Policy: A General Framework for Steerable Multi-Objective Finetuning

12:46 Outro

1 year ago

14 minutes 6 seconds

Claude for Android 🤖 // AI for Material Sciences ⚡ // TinkerBird Disrupts RAG Workflows 🐦

Claude for Android is now available, bringing AI-powered assistance to a wider audience.

MIT researchers have developed a new machine-learning framework that can predict materials' thermal properties up to 1,000 times faster than other AI-based techniques, potentially improving energy efficiency.

TinkerBird, a vector database designed for efficient storage and retrieval of high-dimensional vectors, is disrupting traditional RAG workflows and eliminating roundtrip delays associated with client-server models.

ChatQA 2, a Llama3-based model from NVIDIA, bridges the gap between open-access LLMs and leading proprietary models in long-context understanding and retrieval-augmented generation capabilities, while Stable Audio Open, an open-weights text-to-audio model from Stability AI, showcases potential for high-quality stereo sound synthesis at 44.1kHz.

Contact: sergi@earkind.com

Timestamps:

00:34 Introduction

01:34 Claude for Android is here

02:50 AI method radically speeds predictions of materials’ thermal properties

04:44 TinkerBird

06:10 Fake sponsor

08:10 ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities

09:54 Stable Audio Open

11:28 Jumping Ahead: Improving Reconstruction Fidelity with JumpReLU Sparse Autoencoders

13:54 Outro

1 year ago

15 minutes 4 seconds

OpenAI's GPT-4o mini 💰 // NVIDIA's Mistral NeMo 12B 🚀 // Transcribro speech recognition 🎤

OpenAI has released its newest model, GPT-4o mini, which is more cost-efficient and excels in mathematical reasoning and coding tasks.

NVIDIA's Mistral NeMo 12B is a state-of-the-art language model with unprecedented accuracy and enterprise-grade support.

Transcribro, a new private, on-device speech recognition keyboard and service for Android, has been developed.

Research papers explore the impact of vocabulary size on language model scaling, the use of large datastores for retrieval-based language models, and a method for generating long sequences of views of a cityscape using AI and computer vision.

Contact: sergi@earkind.com

Timestamps:

00:34 Introduction

01:40 OpenAI Announces GPT 4o mini

03:11 Mistral AI and NVIDIA Unveil Mistral NeMo 12B, a Cutting-Edge Enterprise AI Model

05:28 Transcribro: Private and on-device speech recognition keyboard and service for Android

06:43 Fake sponsor

08:49 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies

10:19 Scaling Retrieval-Based Language Models with a Trillion-Token Datastore

11:49 Streetscapes: Large-scale Consistent Street View Generation Using Autoregressive Video Diffusion

13:26 Outro

1 year ago

14 minutes 36 seconds

Apple, Nvidia, Anthropic, and Salesforce caught using content without creators' consent for AI training.

Mistral AI launches two new open-source models, Codestral Mamba and Mathstral, with impressive capabilities.

NVIDIA transitions to fully open-source GPU kernel modules, offering new capabilities and easy switching for users.

Exciting research papers include Ref-AVS for multimodal object segmentation, Qwen2-Audio for large-scale audio-language modeling, and DiT-MoE for scalable language modeling and image generation.

Contact: sergi@earkind.com

Timestamps:

00:34 Introduction

01:27 Apple, Nvidia, Anthropic Used Thousands of Swiped YouTube Videos to Train AI

02:46 Mistral's New Open Source Models

04:09 NVIDIA Transitions Fully Towards Open-Source GPU Kernel Modules

05:37 Fake sponsor

07:15 Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes

08:47 Qwen2-Audio Technical Report

10:49 Scaling Diffusion Transformers to 16 Billion Parameters

12:21 Outro

1 year ago

13 minutes 22 seconds