“On Fleshling Safety: A Debate by Klurl and Trapaucius.” by Eliezer Yudkowsky

LessWrong (Curated & Popular)

LessWrong

655 episodes

1 day ago

This is a link post. New Anthropic research (tweet, blog post, paper): We investigate whether large language models can introspect on their internal states. It is difficult to answer this question through conversation alone, as genuine introspection cannot be distinguished from confabulations. Here, we address this challenge by injecting representations of known concepts into a model's activations, and measuring the influence of these manipulations on the model's self-reported states. We f...

Technology

Society & Culture

Philosophy

“On Fleshling Safety: A Debate by Klurl and Trapaucius.” by Eliezer Yudkowsky

LessWrong (Curated & Popular)

2 hours 22 minutes

1 week ago

(23K words; best considered as nonfiction with a fictional-dialogue frame, not a proper short story.) Prologue: Klurl and Trapaucius were members of the machine race. And no ordinary citizens they, but Constructors: licensed, bonded, and insured; proven, experienced, and reputed. Together Klurl and Trapaucius had collaborated on such famed artifices as the Eternal Clock, Silicon Sphere, Wandering Flame, and Diamond Book; and as individuals, both had constructed wonders too numerous to nu...