
1. RLAIF at Scale: Reinforcement Learning from AI Feedback for Multi-Turn Reasoning
This paper explores using AI-generated feedback instead of expensive human labels to train reasoning models. The authors show that Reinforcement Learning from AI Feedback (RLAIF) can match, and in some cases outperform, training on limited human feedback, especially on multi-turn reasoning tasks.
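For intuition, here is a minimal Python sketch of the RLAIF idea: candidate responses are scored by an AI judge instead of a human annotator, and the scores drive a deliberately crude policy update. The functions `generate`, `ai_judge`, and `update_policy` are hypothetical stand-ins, not the paper's implementation.

```python
# Toy RLAIF loop: AI feedback replaces human labels as the reward signal.
import random

def generate(prompt: str, policy: dict, n: int = 4) -> list[str]:
    """Sample n candidate responses; here, canned strings weighted by policy."""
    candidates = ["step-by-step answer", "short answer", "off-topic reply"]
    weights = [policy.get(c, 1.0) for c in candidates]
    return random.choices(candidates, weights=weights, k=n)

def ai_judge(prompt: str, response: str) -> float:
    """Stand-in for an AI preference model: rewards reasoning-like replies."""
    return 1.0 if "step-by-step" in response else 0.1

def update_policy(policy: dict, response: str, reward: float, lr: float = 0.5):
    """Crude reward-weighted update in place of a real RL step (e.g. PPO)."""
    policy[response] = policy.get(response, 1.0) + lr * reward

policy: dict[str, float] = {}
for step in range(20):
    prompt = "Explain why the sky is blue."
    for response in generate(prompt, policy):
        reward = ai_judge(prompt, response)  # AI feedback, no human label
        update_policy(policy, response, reward)

print(max(policy, key=policy.get))  # the policy now favors the judged-best style
```

In a real system the judge would be a strong LLM scoring full multi-turn transcripts, and the update would be a proper RL algorithm; the loop structure is the part that carries over.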
2. Learning to Forget: Dynamic Memory Compression in Long-Context Transformers
The authors propose a method for making transformers more efficient on long contexts by teaching them to “forget” unimportant details. Their dynamic memory compression reduces memory usage by over 40% while maintaining — and sometimes improving — accuracy on long-sequence benchmarks.
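As a rough illustration of the "learning to forget" idea, the sketch below keeps a fixed-budget cache and evicts the entries with the lowest importance scores. The `importance` values stand in for scores a real model would learn to predict; the paper's actual compression mechanism is not shown here.

```python
# Toy "learned forgetting" for a memory cache: keep only what scores as important.
from dataclasses import dataclass, field

@dataclass
class CompressedCache:
    budget: int                                   # max entries kept in memory
    entries: list[tuple[float, int, str]] = field(default_factory=list)

    def add(self, token_id: int, state: str, importance: float) -> None:
        """Insert an entry, then evict the least important ones over budget."""
        self.entries.append((importance, token_id, state))
        if len(self.entries) > self.budget:
            self.entries.sort(reverse=True)           # most important first
            self.entries = self.entries[: self.budget]  # "forget" the rest

cache = CompressedCache(budget=3)
for i, (state, score) in enumerate(
    [("the", 0.1), ("theorem", 0.9), ("of", 0.05), ("Pythagoras", 0.95)]
):
    cache.add(i, state, score)

print([state for _, _, state in cache.entries])
# content-bearing entries survive; low-importance filler is dropped
```

The memory saving comes from the fixed budget: past a point, cache size stays constant no matter how long the context grows.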
3. VidAgent: Scalable Video Agents with Spatio-Temporal Reasoning
This work introduces VidAgent, a system that can understand and reason over long videos by grounding events in both space and time. It achieves state-of-the-art performance on video QA benchmarks and points toward applications in video search and monitoring.
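To make "grounding events in space and time" concrete, here is a hypothetical sketch: detected events carry a time span and a bounding box, and a question is answered by filtering over them. The `Event` structure and `answer_when` helper are illustrative assumptions, not VidAgent's actual API.

```python
# Hypothetical spatio-temporal grounding for video QA: events are anchored
# to a time span (when) and a bounding box (where).
from dataclasses import dataclass

@dataclass
class Event:
    label: str                        # what happened
    t_start: float                    # seconds into the video
    t_end: float
    box: tuple[int, int, int, int]    # (x, y, w, h) in frame pixels

def answer_when(events: list[Event], label: str) -> str:
    """Ground a 'when did X happen?' query in time and space."""
    hits = [e for e in events if e.label == label]
    if not hits:
        return f"'{label}' was not detected."
    e = min(hits, key=lambda e: e.t_start)
    return f"'{label}' first occurs at {e.t_start:.0f}-{e.t_end:.0f}s, box {e.box}."

events = [
    Event("dog enters frame", 12.0, 15.5, (40, 80, 120, 90)),
    Event("ball is thrown", 18.2, 19.0, (200, 30, 25, 25)),
]
print(answer_when(events, "ball is thrown"))
```

Answering over an event index like this, rather than raw frames, is what lets such a system scale to long videos.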