Stay updated on LLM benchmarks and evaluations. MMLU, GPQA, coding benchmarks, and model comparisons. Daily updates.
Will 2026 be looked back on as the pivotal year for making decisions about the singularity?
There's too much going on!
From rewriting Google’s search stack in the early 2000s to reviving sparse trillion-parameter models and co-designing TPUs with frontier ML research, Jeff Dean has quietly shaped nearly every layer of the modern AI stack.
MiniMax-M2.5 is a SOTA large language model designed for real-world productivity. Trained in a diverse range of complex real-world digital working environments, M2.5 builds upon the coding expertise of M2.1 to extend into general office work, reaching fluency in generating and operating Word, Excel, and PowerPoint files, context switching between diverse software environments, and working across different agent and human teams. Scoring 80.2% on SWE-Bench Verified, 51.3% on Multi-SWE-Bench, and 76.3% on BrowseComp, M2.5 is also more token-efficient than previous generations, having been trained to optimize its actions and output through planning.
The past year has marked a turning point for Chinese AI. Since DeepSeek released its R1 reasoning model in January 2025, Chinese companies have repeatedly delivered AI models that match the performance of leading Western models at a fraction of the cost. Just last week the Chinese firm Moonshot AI released its latest open-weight model,…
We have Opus 4.5 at home
Inside Boltz, AlphaFold’s Legacy, and the Tools Powering Next-Gen Molecular Discovery
Today we’re excited to announce that the NVIDIA Nemotron 3 Nano 30B model with 3B active parameters is now generally available in the Amazon SageMaker JumpStart model catalog. You can accelerate innovation and deliver tangible business value with Nemotron 3 Nano on Amazon Web Services (AWS) without having to manage model deployment complexities. You can power your generative AI applications with Nemotron capabilities using the managed deployment capabilities offered by SageMaker JumpStart.
Engineers at the University of California San Diego have developed a new way to train artificial intelligence systems to solve complex problems more reliably, particularly those that require interpreting both text and images. In widely used tests to evaluate mathematical reasoning, AI models trained with this method outperformed others in solving math word problems containing visual elements like charts and diagrams.
a quiet day lets us reflect on a pithy quote from the ClawFather.
How can you quantify creativity?
Built by @aellman
© 2026 68 Ventures, LLC. All rights reserved.