The Paleo Cyborg Podcast
Steve
18 episodes
5 days ago
AI
Technology
Caio Temer Episode 13 part 1
The Paleo Cyborg Podcast
1 hour 21 minutes 30 seconds
1 year ago
Caio Temer explains LLMs and machine learning.

Question: I found that generalized learning happens after around 10k epochs with a considerable amount of data per epoch; are there studies on this?

ChatGPT: Yes, there are studies on generalization after many epochs with a large amount of data per epoch. A notable example is the "Deep Bootstrap Framework", presented at ICLR 2021. This framework draws a connection between generalization and online optimization. It compares a "real world", where a model is trained on a finite number of samples for multiple epochs, with an "ideal world", where each training step uses fresh samples from a virtually infinite dataset. The research found that models that optimize quickly in the ideal (infinite-data) world are the ones that generalize well in the real (finite-data) world. This study offers a new perspective on practical design choices and provides a theoretical path toward understanding generalization.

Caio Temer show notes:

Part One summary, interesting topics discussed:
- how I created the first bots
- how models combine language models with autonomous robots
- why conventional methods don't work

About language models:
- language models are simpler to obtain with other structures
- the problem in the current structure and how I solved it
- development of intuitive neural networks using hash abstraction

Caio developed a way for neural networks, using sum-based cryptographic hashing techniques, to learn to abstract and understand deeply, quickly (sketch below). He created a library for a tiny language model that learns to predict all tokens at absurd speed by using distance-vector embeddings instead of tokens (sketch below).

Transformers: https://arxiv.org/abs/2205.13504

- For example, if the network is exploding, something is wrong with a component or the update rate is wrong. If the rate is right, the nonlinearities are wrong: exchange a linear or near-linear connection like ReLU for tanh, or add a tanh (sketch below).
- Softmax in deep layers as a projector reduces network convergence time by forcing layer alignment (sketch below).
- Training the transformer one layer at a time, adding a copy of the first layer and freezing the first, accelerates training and improves retention (sketch below).
- For sequence-to-sequence, a permuted linear dense layer solves 99% of problems in less time and better than the transformer (sketch below).
- Giant embeddings work like entire networks, needing only one activation at the end after reshaping (sketch below).

"1 tanh layer of 1 million units solves everything better than any other network with the same number of parameters, at least in the tasks I use. I always start with it, then see how to reduce."

Colabs: making a language model using Facebook's FastText (sketch below)
https://colab.research.google.com/drive/1wVQrpzyY-SkCZTRZCcP6xJGRdg1ik0jR#scrollTo=pIW6-VwMuRlz
https://colab.research.google.com/drive/1oABIZr1xiIu7DKc7AbbZfeBJFLkZW6Ep#scrollTo=XfR-3PpPYoFU
https://github.com/rtkclouds/fast-js-language-model
https://x.com/canalCCore2/status/1735044356535472278?s=20

GitHub gists:
https://gist.github.com/rtkclouds/50b81d10736793f07cdca354516e8757
https://gist.github.com/rtkclouds/a6ee9afd96461ca94b3e9c22f78bda3a
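The episode doesn't spell out the sum-based hash abstraction, so here is a minimal Python sketch under the assumption that it behaves like additive feature hashing: character trigrams are hashed into buckets and summed into a fixed-size vector. This is an interpretation, not Caio's implementation.

```python
import hashlib
import numpy as np

def hash_abstraction(text: str, dim: int = 256) -> np.ndarray:
    """Hedged sketch of a sum-based hash abstraction: hash each character
    trigram to a bucket and accumulate counts into a fixed-size vector."""
    vec = np.zeros(dim, dtype=np.float32)
    for i in range(len(text) - 2):
        trigram = text[i:i + 3]
        bucket = int(hashlib.md5(trigram.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0  # the "sum" part: additive hashing
    return vec / max(np.linalg.norm(vec), 1e-8)

print(hash_abstraction("the paleo cyborg podcast").shape)  # (256,)
```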
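The tiny-language-model library itself isn't shown in the notes; this is a hedged sketch assuming that "predicting distance-vector embeds instead of tokens" means regressing the next token's embedding with a distance loss and decoding by nearest neighbour, rather than computing a softmax over the vocabulary.

```python
import torch
import torch.nn as nn

vocab, dim = 1000, 64
embed = nn.Embedding(vocab, dim)
rnn = nn.GRU(dim, dim, batch_first=True)
loss_fn = nn.MSELoss()

tokens = torch.randint(0, vocab, (8, 16))        # toy batch of token ids
inputs, targets = tokens[:, :-1], tokens[:, 1:]
hidden, _ = rnn(embed(inputs))                   # (8, 15, dim) predicted vectors
# Distance loss against the target tokens' embeddings instead of cross-entropy.
loss = loss_fn(hidden, embed(targets).detach())

# At inference, decode by nearest neighbour over the embedding table.
pred = torch.cdist(hidden[:, -1], embed.weight).argmin(dim=-1)  # (8,) token ids
```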
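A small PyTorch illustration of the "swap ReLU for tanh" tip above: tanh is bounded in [-1, 1], so it caps activations that ReLU would let grow. Layer sizes are illustrative.

```python
import torch.nn as nn

# Block with an unbounded nonlinearity (can feed an activation explosion).
exploding_block = nn.Sequential(
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Linear(512, 512),
)

# Same block with the ReLU exchanged for tanh, plus an extra tanh at the end
# so the block's output stays in [-1, 1].
bounded_block = nn.Sequential(
    nn.Linear(512, 512),
    nn.Tanh(),
    nn.Linear(512, 512),
    nn.Tanh(),
)
```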
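The notes don't say exactly where the projector sits; a minimal sketch assuming "softmax in deep layers as a projector" means applying a softmax over the feature dimension between hidden layers, so every layer emits values on the same normalized scale.

```python
import torch
import torch.nn as nn

class SoftmaxProjectedMLP(nn.Module):
    """Sketch: softmax over the feature dimension after each hidden layer
    (an assumption about the placement, not confirmed by the episode)."""
    def __init__(self, dim: int = 256, depth: int = 4):
        super().__init__()
        self.layers = nn.ModuleList(nn.Linear(dim, dim) for _ in range(depth))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            x = layer(x)
            # Project each hidden state onto the probability simplex,
            # forcing successive layers onto a common normalized scale.
            x = torch.softmax(x, dim=-1)
        return x

out = SoftmaxProjectedMLP()(torch.randn(8, 256))  # -> (8, 256)
```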
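A hedged sketch of the layer-at-a-time training idea, assuming it means progressively growing the encoder by appending a copy of the first layer and freezing the layer that was already trained.

```python
import copy
import torch.nn as nn

def grow_encoder(encoder: nn.TransformerEncoder) -> nn.TransformerEncoder:
    """Append a copy of the first layer and freeze the original first layer,
    so only the fresh copy receives gradients in the next training phase."""
    new_layer = copy.deepcopy(encoder.layers[0])   # copy made before freezing
    for p in encoder.layers[0].parameters():
        p.requires_grad = False                    # freeze the trained first layer
    encoder.layers.append(new_layer)
    encoder.num_layers = len(encoder.layers)
    return encoder

# Usage: start with one layer, train it, then grow and train only the new copy.
base = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=256, nhead=4, batch_first=True),
    num_layers=1,
)
grown = grow_encoder(base)  # now 2 layers; layer 0 is frozen
```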
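A sketch of a "permuted linear dense" sequence-to-sequence mapping in the spirit of the linear baselines in the linked paper (arXiv:2205.13504): the input is transposed so a single Linear mixes time steps per feature. Sequence lengths and feature counts are illustrative assumptions.

```python
import torch
import torch.nn as nn

class PermutedLinear(nn.Module):
    """Map an input sequence to an output sequence with one Linear applied
    across the (permuted) time axis rather than the feature axis."""
    def __init__(self, in_len: int = 96, out_len: int = 24):
        super().__init__()
        self.proj = nn.Linear(in_len, out_len)  # mixes time steps, not features

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_len, features) -> move time to the last dimension
        x = x.permute(0, 2, 1)           # (batch, features, in_len)
        y = self.proj(x)                 # (batch, features, out_len)
        return y.permute(0, 2, 1)        # (batch, out_len, features)

y = PermutedLinear()(torch.randn(4, 96, 7))  # -> (4, 24, 7)
```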
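Two hedged sketches of the "wide beats deep" claims above, with all sizes assumed for illustration: a giant embedding table followed by a single activation, and the quoted one-million-unit tanh layer.

```python
import torch
import torch.nn as nn

# (a) A giant embedding does most of the work: look up very wide vectors,
#     pool the sequence, and apply a single activation at the end.
class GiantEmbedNet(nn.Module):
    def __init__(self, vocab: int = 5000, width: int = 4096, out_dim: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab, width)
        self.out = nn.Linear(width, out_dim)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        x = self.embed(tokens).mean(dim=1)   # pool the sequence into one vector
        return torch.tanh(self.out(x))       # the single activation at the end

# (b) The quoted baseline: one tanh hidden layer with ~1 million units
#     (roughly 75M parameters with these toy input/output sizes).
wide_tanh = nn.Sequential(
    nn.Linear(64, 1_000_000),
    nn.Tanh(),
    nn.Linear(1_000_000, 10),
)

out_a = GiantEmbedNet()(torch.randint(0, 5000, (2, 12)))  # -> (2, 128)
out_b = wide_tanh(torch.randn(2, 64))                     # -> (2, 10)
```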
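The linked Colabs are the actual reference for the FastText language model; as a placeholder, here is a minimal gensim sketch that only covers the first step, training FastText subword embeddings on a toy corpus and querying them.

```python
from gensim.models import FastText

# Toy corpus; the real Colabs train on actual text data.
corpus = [["the", "paleo", "cyborg", "podcast"],
          ["caio", "temer", "explains", "language", "models"]]

ft = FastText(sentences=corpus, vector_size=32, window=3, min_count=1, epochs=50)
print(ft.wv.most_similar("models", topn=3))  # nearest neighbours in embedding space
```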