
In today’s 5–6 minute roundup, we cover:
(1) SAPO’s decentralized RL that shares rollouts across a swarm for cheaper, faster LM post-training (arXiv:2509.08721),
(2) VLA-Adapter’s “Bridge Attention” that makes small vision-language-action models both fast and state-of-the-art on robotics tasks (arXiv:2509.09372), and
(3) HuMo’s unified generator that coordinates text, reference images, and audio for people-centric video with strong identity preservation and lip-sync (arXiv:2509.08519).
Subscribe for crisp takes on what was done, why it matters, and where it might go next.