
Today we cover three standout arXiv releases shaping vision, language, and evaluation. First, PANORAMA surveys the rise of omnidirectional, 360° perception for embodied AI—why standard pinhole vision isn’t enough, where datasets and models fall short, and how new backbones and adaptation methods are closing the gap. Read: https://arxiv.org/pdf/2509.12989 (arXiv:2509.12989).
Next, the HALA technical report details an Arabic-centric instruction and translation pipeline—from FP8 translator teachers to multi-million-sample corpora—powering models from 350M to 9B parameters with strong benchmark gains. Read: https://arxiv.org/pdf/2509.14008 (arXiv:2509.14008).
Finally, GenExam proposes a multidisciplinary “exam” for text-to-image models, revealing how strict, knowledge-heavy prompts expose major gaps in today’s generators. Read: https://arxiv.org/pdf/2509.14232 (arXiv:2509.14232).