Home
Categories
EXPLORE
True Crime
Comedy
Society & Culture
Business
Sports
History
News
About Us
Contact Us
Copyright
© 2024 PodJoint
00:00 / 00:00
Sign in

or

Don't have an account?
Sign up
Forgot password
https://is1-ssl.mzstatic.com/image/thumb/Podcasts211/v4/93/de/2f/93de2f95-e84a-2a5a-2180-9713bbbd3f33/mza_16539530088674354596.jpg/600x600bb.jpg
Agora - The Marketplace of Ideas
Matthew Harris
98 episodes
5 days ago
Welcome to Agora, the Marketplace of Ideas I'd say the sky's the limit, but how can that be true when there are footprints on the moon. This is your home for bleeding edge tech and macro perspectives with just a bit of philosophy. Contributor: https://s3.news/
Show more...
Technology
RSS
All content for Agora - The Marketplace of Ideas is the property of Matthew Harris and is served directly from their servers with no modification, redirects, or rehosting. The podcast is not affiliated with or endorsed by Podjoint in any way.
Welcome to Agora, the Marketplace of Ideas I'd say the sky's the limit, but how can that be true when there are footprints on the moon. This is your home for bleeding edge tech and macro perspectives with just a bit of philosophy. Contributor: https://s3.news/
Show more...
Technology
https://d3t3ozftmdmh3i.cloudfront.net/staging/podcast_uploaded_episode/25391569/25391569-1744548105649-0b3aae35f1e86.jpg
Peaking Inside the Mind of AI
Agora - The Marketplace of Ideas
21 minutes 1 second
7 months ago
Peaking Inside the Mind of AI

"On the Biology of a Large Language Model," details Anthropic's investigation into the internal mechanisms of their Claude 3.5 Haiku language model using a novel technique called attribution graphs. By dissecting the model's processing of various prompts, the researchers identify interpretable "features" and their interactions, drawing analogies to biological systems to understand how the model performs tasks like multi-step reasoning, poetry planning, multilingual processing, and even refusal of harmful requests. This "bottom-up" approach aims to reveal the complex, often surprising, computations happening within the AI, including instances of meta-cognition, generalization, and unfaithful chain-of-thought reasoning, while also acknowledging the limitations of their current interpretability methods.


a research paper on chain-of-thought (CoT) faithfulness in reasoning models, examines the reliability of a language model's self-generated explanations. Through a methodology of comparing model responses to unhinted and hinted prompts, the authors evaluate whether models explicitly acknowledge their reliance on hints, particularly misaligned or unethical ones. Their findings suggest that even in reasoning models, CoTs are often unfaithful, rarely reliably verbalizing reasoning hints or reward hacking behaviors learned during reinforcement learning, indicating that CoT monitoring alone may not be sufficient to ensure the safety and alignment of advanced AI systems.

Agora - The Marketplace of Ideas
Welcome to Agora, the Marketplace of Ideas I'd say the sky's the limit, but how can that be true when there are footprints on the moon. This is your home for bleeding edge tech and macro perspectives with just a bit of philosophy. Contributor: https://s3.news/