muckrAIkers
Jacob Haimes and Igor Krawczuk
18 episodes
3 weeks ago
Join us as we dig a tiny bit deeper into the hype surrounding "AI" press releases, research papers, and more. Each episode, we'll highlight ongoing research and investigations, providing some much needed contextualization, constructive critique, and even a smidge of occasional good will teasing to the conversation, trying to find the meaning under all of this muck.
Technology, Science, Mathematics
All content for muckrAIkers is the property of Jacob Haimes and Igor Krawczuk and is served directly from their servers with no modification, redirects, or rehosting. The podcast is not affiliated with or endorsed by Podjoint in any way.
NeurIPS 2024 Wrapped 🌯
1 hour 26 minutes
10 months ago

What happens when you bring over 15,000 machine learning nerds to one city? If your guess didn't include racism, sabotage and scandal, belated epiphanies, a spicy SoLaR panel, and many fantastic research papers, it doesn't capture my experience. In this episode, we discuss the drama and takeaways from NeurIPS 2024.

Posters available at time of episode preparation can be found on the episode webpage.

EPISODE RECORDED 2024.12.22


  • (00:00) - Recording date
  • (00:05) - Intro
  • (00:44) - Obligatory mentions
  • (01:54) - SoLaR panel
  • (18:43) - Test of Time
  • (24:17) - And now: science!
  • (28:53) - Downsides of benchmarks
  • (41:39) - Improving the science of ML
  • (53:07) - Performativity
  • (57:33) - NopenAI and Nanthropic
  • (01:09:35) - Fun/interesting papers
  • (01:13:12) - Initial takes on o3
  • (01:18:12) - WorkArena
  • (01:25:00) - Outro


Links

Note: many workshop papers had not yet been published to arXiv as of preparing this episode; in these cases, the OpenReview submission page is provided.

  • NeurIPS statement on inclusivity
  • CTOL Digital Solutions article - NeurIPS 2024 Sparks Controversy: MIT Professor's Remarks Ignite "Racism" Backlash Amid Chinese Researchers’ Triumphs
  • (1/2) NeurIPS Best Paper - Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction
  • Visual Autoregressive Model report (this link now returns a 404 error)
    • Don't worry, here it is on archive.is
  • Reuters article - ByteDance seeks $1.1 mln damages from intern in AI breach case, report says
  • CTOL Digital Solutions article - NeurIPS Award Winner Entangled in ByteDance's AI Sabotage Accusations: The Two Tales of an AI Genius
  • Reddit post on Ilya's talk
  • SoLaR workshop page

Referenced Sources

  • Harvard Data Science Review article - Data Science at the Singularity
  • Paper - Reward Reports for Reinforcement Learning
  • Paper - It's Not What Machines Can Learn, It's What We Cannot Teach
  • Paper - NeurIPS Reproducibility Program
  • Paper - A Metric Learning Reality Check

Improving Datasets, Benchmarks, and Measurements

  • Tutorial video + slides - Experimental Design and Analysis for AI Researchers (I think you need to have attended NeurIPS to access the recording, but I couldn't find a different version)
  • Paper - BetterBench: Assessing AI Benchmarks, Uncovering Issues, and Establishing Best Practices
  • Paper - Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?
  • Paper - A Systematic Review of NeurIPS Dataset Management Practices
  • Paper - The State of Data Curation at NeurIPS: An Assessment of Dataset Development Practices in the Datasets and Benchmarks Track
  • Paper - Benchmark Repositories for Better Benchmarking
  • Paper - Croissant: A Metadata Format for ML-Ready Datasets
  • Paper - Rethinking the Evaluation of Out-of-Distribution Detection: A Sorites Paradox
  • Paper - Evaluating Generative AI Systems is a Social Science Measurement Challenge
  • Paper - Report Cards: Qualitative Evaluation of LLMs

Governance Related

  • Paper - Towards Data Governance of Frontier AI Models
  • Paper - Ways Forward for Global AI Benefit Sharing
  • Paper - How do we warn downstream model providers of upstream risks?
    • Unified Model Records tool
  • Paper - Policy Dreamer: Diverse Public Policy Creation via Elicitation and Simulation of Human Preferences
  • Paper - Monitoring Human Dependence on AI Systems with Reliance Drills
  • Paper - On the Ethical Considerations of Generative Agents
  • Paper - GPAI Evaluation Standards Taskforce: Towards Effective AI Governance
  • Paper - Levels of Autonomy: Liability in the age of AI Agents

Certified Bangers + Useful Tools

  • Paper - Model Collapse Demystified: The Case of Regression
  • Paper - Preference Learning Algorithms Do Not Learn Preference Rankings
  • LLM Dataset Inference paper + repo
  • dattri paper + repo
  • DeTikZify paper + repo

Fun Benchmarks/Datasets

  • Paloma paper + dataset
  • RedPajama paper + dataset
  • Assemblage webpage
  • WikiDBs webpage
  • WhodunitBench repo
  • ApeBench paper + repo
  • WorkArena++ paper

Other Sources

  • Paper - The Mirage of Artificial Intelligence Terms of Use Restrictions