The Inside View
Michaël Trazzi
52 episodes
1 week ago
The goal of this podcast is to create a place where people discuss their inside views about existential risk from AI.
Technology
Erik Jones on Automatically Auditing Large Language Models
The Inside View
22 minutes 36 seconds
2 years ago

Erik is a PhD student at Berkeley working with Jacob Steinhardt, interested in making generative machine learning systems more robust, reliable, and aligned, with a focus on large language models. In this interview we talk about his paper "Automatically Auditing Large Language Models via Discrete Optimization", which he presented at ICML.
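The core idea discussed in the episode, searching over a finite token vocabulary to find prompts that elicit a target behavior, can be illustrated with a deliberately tiny sketch. This is not the paper's actual algorithm (which uses discrete optimization to scale far beyond brute force); `toy_model`, `VOCAB`, and the exhaustive search below are all hypothetical stand-ins.

```python
import itertools

# Hypothetical toy vocabulary and "model"; a real audit targets an LLM's
# token space and uses log-probabilities rather than exact string matches.
VOCAB = ["safe", "ignore", "please", "override", "hello"]

def toy_model(prompt_tokens):
    # Stand-in for a language model: emits "UNSAFE" only for one
    # rare prompt, the kind of behavior an audit tries to surface.
    return "UNSAFE" if list(prompt_tokens) == ["override", "ignore"] else "OK"

def audit(target, length=2):
    # Exhaustive search over discrete token sequences. The paper's
    # contribution is making this search tractable for real models,
    # where enumerating every sequence is impossible.
    for candidate in itertools.product(VOCAB, repeat=length):
        if toy_model(candidate) == target:
            return list(candidate)
    return None

print(audit("UNSAFE"))  # → ['override', 'ignore']
```

The point of the sketch is the framing Erik describes in the interview: auditing is posed as optimization over a finite set of tokens, not over continuous embeddings, so any prompt the search finds is a real input you could actually feed the model.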


Youtube: https://youtu.be/bhE5Zs3Y1n8

Paper: https://arxiv.org/abs/2303.04381

Erik: https://twitter.com/ErikJones313

Host: https://twitter.com/MichaelTrazzi

Patreon: https://www.patreon.com/theinsideview


Outline


00:00 Highlights

00:31 Erik's background and research at Berkeley

01:19 Motivation for doing safety research on language models

02:56 Is it too easy to fool today's language models?

03:31 The goal of adversarial attacks on language models

04:57 Automatically Auditing Large Language Models via Discrete Optimization

06:01 Optimizing over a finite set of tokens rather than continuous embeddings

06:44 Goal is revealing behaviors, not necessarily breaking the AI

07:51 On the feasibility of solving adversarial attacks

09:18 Suppressing dangerous knowledge vs just bypassing safety filters

10:35 Can you really ask a language model to cook meth?

11:48 Optimizing French to English translation example

13:07 Forcing toxic celebrity outputs just to test rare behaviors

13:19 Testing the method on GPT-2 and GPT-J

14:03 Adversarial prompts transferred to GPT-3 as well

14:39 How this auditing research fits into the broader AI safety field

15:49 Need for automated tools to audit failures beyond what humans can find

17:47 Auditing to avoid unsafe deployments, not for existential risk reduction

18:41 Adaptive auditing that updates based on the model's outputs

19:54 Prospects for using these methods to detect model deception

22:26 Preferring safety via alignment over auditing constraints alone; closing thoughts


Patreon supporters:

  • Tassilo Neubauer
  • MonikerEpsilon
  • Alexey Malafeev
  • Jack Seroy
  • JJ Hepburn
  • Max Chiswick
  • William Freire
  • Edward Huff
  • Gunnar Höglund
  • Ryan Coppolo
  • Cameron Holmes
  • Emil Wallner
  • Jesse Hoogland
  • Jacques Thibodeau
  • Vincent Weisser