The Inside View
Michaël Trazzi
52 episodes
1 week ago
The goal of this podcast is to create a place where people discuss their inside views about existential risk from AI.
Technology
[Crosspost] Adam Gleave on Vulnerabilities in GPT-4 APIs (+ extra Nathan Labenz interview)
The Inside View
2 hours 16 minutes 8 seconds
1 year ago
This is a special crosspost episode in which Adam Gleave is interviewed by Nathan Labenz from The Cognitive Revolution. At the end, I also have a discussion with Nathan Labenz about his takes on AI.


Adam Gleave is the founder of FAR AI. With Nathan, he discusses finding vulnerabilities in GPT-4's fine-tuning and Assistants APIs, FAR AI's work exposing exploitable flaws in "superhuman" Go AIs through innovative adversarial strategies, how naive developers can accidentally jailbreak models during fine-tuning, and more.


OUTLINE

(00:00) Intro
(02:57) NATHAN INTERVIEWS ADAM GLEAVE: FAR.AI's Mission
(05:33) Unveiling the Vulnerabilities in GPT-4's Fine-Tuning and Assistants APIs
(11:48) Divergence Between the Growth of System Capability and the Improvement of Control
(13:15) Finding Substantial Vulnerabilities
(14:55) Exploiting GPT-4 APIs: Accidentally Jailbreaking a Model
(18:51) On Fine-Tuned Attacks and Targeted Misinformation
(24:32) Malicious Code Generation
(27:12) Discovering Private Emails
(29:46) Harmful Assistants
(33:56) Hijacking the Assistant Based on the Knowledge Base
(36:41) The Ethical Dilemma of AI Vulnerability Disclosure
(46:34) Exploring AI's Ethical Boundaries and Industry Standards
(47:47) The Dangers of AI in Unregulated Applications
(49:30) AI Safety Across Different Domains
(51:09) Strategies for Enhancing AI Safety and Responsibility
(52:58) Taxonomy of Affordances and Minimal Best Practices for Application Developers
(57:21) Open Source in AI Safety and Ethics
(1:02:20) Vulnerabilities of Superhuman Go-Playing AIs
(1:23:28) Variation on AlphaZero-Style Self-Play
(1:31:37) The Future of AI: Scaling Laws and Adversarial Robustness
(1:37:21) MICHAËL TRAZZI INTERVIEWS NATHAN LABENZ
(1:37:33) Nathan's Background
(1:39:44) Where Nathan Falls on the Eliezer-to-Kurzweil Spectrum
(1:47:52) AI in Biology Could Spiral Out of Control
(1:56:20) Bioweapons
(2:01:10) Adoption Accelerationist, Hyperscaling Pauser
(2:06:26) Current Harms vs. Future Harms, Risk Tolerance
(2:11:58) Jailbreaks, Nathan's Experiments with Claude


Links:

The Cognitive Revolution: https://www.cognitiverevolution.ai/
Exploiting Novel GPT-4 APIs: https://far.ai/publication/pelrine2023novelapis/
Adversarial Policies Beat Superhuman Go AIs: https://far.ai/publication/wang2022adversarial/
