Programming Throwdown
Patrick Wheeler and Jason Gauci
183 episodes
15 hours ago
Programming Throwdown educates Computer Scientists and Software Engineers on a cavalcade of programming and tech topics. Every show will cover a new programming language, so listeners will be able to speak intelligently about any programming language.
How To, Education, News, Tech News
All content for Programming Throwdown is the property of Patrick Wheeler and Jason Gauci and is served directly from their servers with no modification, redirects, or rehosting. The podcast is not affiliated with or endorsed by Podjoint in any way.
180: Reinforcement Learning
1 hour 52 minutes
4 months ago

Intro topic: Grills

News/Links:

  • You can’t call yourself a senior until you’ve worked on a legacy project
    • https://www.infobip.com/developers/blog/seniors-working-on-a-legacy-project
  • Recraft might be the most powerful AI image platform I’ve ever used — here’s why
    • https://www.tomsguide.com/ai/ai-image-video/recraft-might-be-the-most-powerful-ai-image-platform-ive-ever-used-heres-why
  • NASA has a list of 10 rules for software development
    • https://www.cs.otago.ac.nz/cosc345/resources/nasa-10-rules.htm
  • AMD Radeon RX 9070 XT performance estimates leaked: 42% to 66% faster than Radeon RX 7900 GRE
    • https://www.tomshardware.com/tech-industry/amd-estimates-of-radeon-rx-9070-xt-performance-leaked-42-percent-66-percent-faster-than-radeon-rx-7900-gre 

Book of the Show

  • Patrick: 
    • The Player of Games (Iain M. Banks)
      • https://a.co/d/1ZpUhGl (non-affiliate)
  • Jason: 
    • Basic Roleplaying Universal Game Engine
      • https://amzn.to/3ES4p5i


Patreon Plug https://www.patreon.com/programmingthrowdown?ty=h


Tool of the Show

  • Patrick: 
    • Pokémon Sword and Shield
  • Jason: 
    • Features and Labels ( https://fal.ai )

Topic: Reinforcement Learning

  • Three types of AI
    • Supervised Learning
    • Unsupervised Learning
    • Reinforcement Learning
  • Online vs Offline RL
  • Optimization algorithms
    • Value optimization
      • SARSA
      • Q-Learning
    • Policy optimization
      • Policy Gradients
      • Actor-Critic
      • Proximal Policy Optimization
  • Value vs Policy Optimization
    • Value optimization is more intuitive (Value loss)
    • Policy optimization is less intuitive at first (policy gradients)
    • Converting values to policies in deep learning is difficult
  • Imitation Learning
    • Supervised policy learning
    • Often used to bootstrap reinforcement learning
  • Policy Evaluation
    • Propensity scoring versus model-based
  • Challenges in training an RL model
    • Two optimization loops
      • Collecting feedback vs updating the model
    • Difficult optimization target
      • Policy evaluation
  • RLHF & GRPO
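
The two value-optimization methods named in the outline, SARSA and Q-Learning, differ only in which next-step value they bootstrap from. The sketch below is a minimal tabular illustration, not code from the episode; the function names, the dictionary-based Q-table, and the ALPHA and GAMMA values are all assumptions chosen for the example.

```python
from collections import defaultdict

ALPHA = 0.5  # learning rate (assumed value for illustration)
GAMMA = 0.9  # discount factor (assumed value for illustration)


def q_learning_update(q, state, action, reward, next_state, actions):
    """Off-policy update: bootstrap from the best action in next_state."""
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + GAMMA * best_next
    q[(state, action)] += ALPHA * (target - q[(state, action)])


def sarsa_update(q, state, action, reward, next_state, next_action):
    """On-policy update: bootstrap from the action actually taken next."""
    target = reward + GAMMA * q[(next_state, next_action)]
    q[(state, action)] += ALPHA * (target - q[(state, action)])


def greedy_action(q, state, actions):
    """Convert values to a policy by picking the highest-valued action."""
    return max(actions, key=lambda a: q[(state, a)])


def make_table():
    """Hypothetical Q-table with two known values in state 's1'."""
    q = defaultdict(float)
    q[("s1", "left")] = 1.0
    q[("s1", "right")] = 2.0
    return q


actions = ["left", "right"]

# Same transition, two different targets.
q_off = make_table()
q_learning_update(q_off, "s0", "right", 1.0, "s1", actions)

q_on = make_table()
sarsa_update(q_on, "s0", "right", 1.0, "s1", "left")
```

With the same transition, Q-Learning assumes the best next action will be taken while SARSA uses the action the behavior policy really chose, so the two tables end up with different values for ("s0", "right").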

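The "propensity scoring" side of policy evaluation can be illustrated with inverse propensity scoring (IPS), one common estimator of this kind. The notes do not say which estimator the hosts discuss, so treat this as a sketch under that assumption; the log format, the `ips_estimate` name, and the toy data are hypothetical.

```python
def ips_estimate(logged, target_policy):
    """Estimate a target policy's average reward from logged data.

    logged: list of (context, action, reward, logging_propensity) tuples,
            where logging_propensity is the probability with which the
            logging policy chose that action in that context.
    target_policy: function (context, action) -> probability that the
            policy being evaluated would choose that action.
    """
    total = 0.0
    for context, action, reward, propensity in logged:
        # Reweight each logged reward by how much more (or less) likely
        # the target policy is to take the logged action.
        weight = target_policy(context, action) / propensity
        total += weight * reward
    return total / len(logged)


# Toy log collected by a uniform-random policy over two actions.
logged = [
    ("u1", "a", 1.0, 0.5),
    ("u1", "b", 0.0, 0.5),
    ("u2", "a", 0.0, 0.5),
    ("u2", "b", 1.0, 0.5),
]


def always_a(context, action):
    """Hypothetical policy that always picks action 'a'."""
    return 1.0 if action == "a" else 0.0


def personalized(context, action):
    """Hypothetical policy: 'a' for u1, 'b' for u2."""
    best = {"u1": "a", "u2": "b"}
    return 1.0 if action == best[context] else 0.0


value_always_a = ips_estimate(logged, always_a)
value_personalized = ips_estimate(logged, personalized)
```

A model-based evaluator would instead fit a reward model on the log and score the target policy's actions with it; IPS needs no reward model but requires the logging propensities to be known and nonzero.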
★ Support this podcast on Patreon ★