This academic paper challenges the common belief that the final layers of large language models (LLMs) are the most effective for downstream tasks. The authors propose a unified framework that integrates information theory, geometry, and invariance metrics to assess the quality of hidden-layer representations. Extensive experiments across various LLM architectures, and even vision models, demonstrate that intermediate layers often provide richer, more robust features and frequently outperform the final layer on downstream-task accuracy. The paper also explores how different architectures and training objectives shape these internal representation patterns, highlighting a "compression valley" in autoregressive models that appears crucial for balancing information and noise. Ultimately, the research advocates shifting focus toward strategically leveraging mid-layer representations to build more accurate and robust AI systems.
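
To make the idea of layer-wise evaluation concrete, the sketch below extracts hidden states from every layer of a causal LM and scores each one with a simple information-theoretic measure (the entropy of the spectrum of the token-embedding Gram matrix). This is a minimal illustration, not the authors' implementation: the model choice (`gpt2`), the single-prompt setup, and the specific entropy estimator are assumptions made here for demonstration only.

```python
# Illustrative sketch (not the paper's code): score each layer's representation
# of a prompt with a spectral entropy measure. Higher entropy loosely indicates
# that the layer spreads information across more directions of the hidden space.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "gpt2"  # assumption: any HF causal or masked LM would work here
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_hidden_states=True)
model.eval()


def spectral_entropy(h: torch.Tensor) -> float:
    """Entropy of the eigenvalue spectrum of the token Gram matrix.

    h: (seq_len, hidden_dim) hidden states for one prompt at one layer.
    """
    h = h - h.mean(dim=0, keepdim=True)            # center token vectors
    gram = h @ h.T                                  # (seq_len, seq_len)
    gram = gram / gram.trace().clamp_min(1e-12)     # normalize to unit trace
    eigvals = torch.linalg.eigvalsh(gram).clamp_min(1e-12)
    return float(-(eigvals * eigvals.log()).sum())  # Shannon entropy of spectrum


prompt = "Intermediate layers often carry surprisingly useful features."
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# hidden_states is a tuple: the embedding layer plus one tensor per block
for layer_idx, layer_h in enumerate(outputs.hidden_states):
    score = spectral_entropy(layer_h[0])            # drop the batch dimension
    print(f"layer {layer_idx:2d}: entropy = {score:.3f}")
```

Plotting such a per-layer score (or, alternatively, probe accuracy on a downstream task) against layer depth is the kind of analysis that can reveal a mid-network dip or "compression valley" of the sort the paper describes, though the exact metrics the authors use differ from this toy example.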