Home
Categories
EXPLORE
True Crime
Comedy
Society & Culture
Business
Sports
Technology
Health & Fitness
About Us
Contact Us
Copyright
© 2024 PodJoint
Podjoint Logo
US
00:00 / 00:00
Sign in

or

Don't have an account?
Sign up
Forgot password
https://is1-ssl.mzstatic.com/image/thumb/Podcasts211/v4/86/0c/75/860c75aa-068a-18b9-1cb5-600f803acdd4/mza_17177667092256625558.jpg/600x600bb.jpg
AI Illuminated
The AI Illuminators
25 episodes
1 day ago
A new way to keep up with AI research. Delivered to your ears. Illuminated by AI. Part of the GenAI4Good initiative.
Show more...
Courses
Education
RSS
All content for AI Illuminated is the property of The AI Illuminators and is served directly from their servers with no modification, redirects, or rehosting. The podcast is not affiliated with or endorsed by Podjoint in any way.
A new way to keep up with AI research. Delivered to your ears. Illuminated by AI. Part of the GenAI4Good initiative.
Show more...
Courses
Education
https://d3t3ozftmdmh3i.cloudfront.net/staging/podcast_uploaded_episode/42256170/42256170-1729400633459-a8627ccf1e4ed.jpg
The Ingredients for Robotic Diffusion Transformers
AI Illuminated
7 minutes 53 seconds
1 year ago
The Ingredients for Robotic Diffusion Transformers

[00:00] Intro

[00:33] Combining transformers & diffusion models

[01:12] Key design: Scalable attention blocks (AdaLN)

[02:30] Efficient observation tokenization

[03:45] DiT Block policy architecture overview

[04:20] BiPlay dataset introduction

[04:53] Performance improvements over baselines

[05:30] Key findings from ablations

[06:08] Generalization to different robot types

[06:43] Simulation vs real-world performance

[07:13] Takeaways and future research directions


Authors: Sudeep Dasari, Oier Mees, Sebastian Zhao, Mohan Kumar Srirama, Sergey Levine


Affiliations: Carnegie Mellon University, University of California, Berkeley.


Abstract: In recent years roboticists have achieved remarkable progress in solving increasingly general tasks on dexterous robotic hardware by leveraging high capacity Transformer network architectures and generative diffusion models. Unfortunately, combining these two orthogonal improvements has proven surprisingly difficult, since there is no clear and well-understood process for making important design choices. In this paper, we identify, study and improve key architectural design decisions for high-capacity diffusion transformer policies. The resulting models can efficiently solve diverse tasks on multiple robot embodiments, without the excruciating pain of per-setup hyper-parameter tuning. By combining the results of our investigation with our improved model components, we are able to present a novel architecture, named \method, that significantly outperforms the state of the art in solving long-horizon (1500+ time-steps) dexterous tasks on a bi-manual ALOHA robot. In addition, we find that our policies show improved scaling performance when trained on 10 hours of highly multi-modal, language annotated ALOHA demonstration data. We hope this work will open the door for future robot learning techniques that leverage the efficiency of generative diffusion modeling with the scalability of large scale transformer architectures. Code, robot dataset, and videos are available at: this https URL


Link: https://arxiv.org/abs/2410.10088

AI Illuminated
A new way to keep up with AI research. Delivered to your ears. Illuminated by AI. Part of the GenAI4Good initiative.