

Context Rot: How Increasing Input Tokens Impacts LLM Performance
Recent developments in LLMs show a trend toward longer context windows, with the input token counts of the latest models reaching into the millions. Because these models achieve near-perfect scores on widely adopted benchmarks like Needle in a Haystack (NIAH) [1], it’s often assumed that their performance is uniform across long-context tasks. However, NIAH is fundamentally…
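
To make the benchmark concrete, here is a minimal sketch of a NIAH-style probe: a "needle" fact is planted at a chosen depth inside filler text, the model is asked to retrieve it, and the answer is scored by simple string match. Everything here is illustrative, not the benchmark's actual data: `query_model` is a hypothetical stand-in for an LLM API call, and the filler, needle, and question are made up.

```python
# Sketch of a Needle-in-a-Haystack (NIAH) style trial.
# Assumptions: query_model is a placeholder for a real LLM client;
# the needle/question pair and filler text are invented for illustration.

def build_haystack(filler: str, needle: str, total_chars: int, depth: float) -> str:
    """Tile filler text to total_chars, then splice the needle in at a
    relative depth between 0.0 (start) and 1.0 (end)."""
    haystack = (filler * (total_chars // len(filler) + 1))[:total_chars]
    pos = int(len(haystack) * depth)
    return haystack[:pos] + " " + needle + " " + haystack[pos:]

def query_model(prompt: str) -> str:
    """Hypothetical LLM call; swap in a real API client here."""
    raise NotImplementedError

def run_niah_trial(context_chars: int, depth: float) -> bool:
    filler = "The morning was quiet and the market opened without incident. "
    needle = "The access code for the archive is 7319."
    question = "What is the access code for the archive?"
    context = build_haystack(filler, needle, context_chars, depth)
    answer = query_model(f"{context}\n\nQuestion: {question}")
    # NIAH-style evals typically score retrieval by exact-match on the needle fact.
    return "7319" in answer

# A typical evaluation sweeps input length and needle depth, then charts accuracy:
# for n in (1_000, 10_000, 100_000):
#     for d in (0.0, 0.25, 0.5, 0.75, 1.0):
#         run_niah_trial(n, d)
```

A model that scores near-perfectly across this sweep demonstrates only that it can locate a lexically distinctive fact, which is why such results alone do not establish uniform long-context performance.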