Artificial Analysis overhauls its AI Intelligence Index, replacing saturated benchmarks with real-world tests measuring ...
Beijing-based Ubiquant launches code-focused systems claiming benchmark wins over US peers despite using far fewer parameters ...
What if the future of artificial intelligence wasn’t about building ever-larger models but instead about doing more with less? In a stunning upset, the 27-million-parameter Hierarchical Reasoning ...
Ben Gao '25 asks us to reconsider how we can use AI effectively, arguing that human-centered design needs to be prioritized.
China’s new coding AI beats GPT-5.1 and Claude 4.5, with 128,000-token context helping you solve tougher repos faster and cut ...
Hello and welcome to Eye on AI. In this edition…A new OpenAI benchmark shows how good models are getting at completing professional tasks…California has a new AI law…OpenAI rolls out Instant Purchases ...
AI chatbots have been linked to serious mental health harms in heavy users, but there have been few standards for measuring whether they safeguard human well-being or just maximize for engagement. A ...
Researchers tested the accuracy of five AI models using 500 everyday math prompts. The results show that there is roughly a ...
Anthropic launched Claude Sonnet 4.5, claiming the title of the world’s best coding model. The key breakthrough is its ability to code autonomously for up to 30 hours, a massive increase in endurance.