Stay updated on LLM benchmarks and evaluations. MMLU, GPQA, coding benchmarks, and model comparisons. Daily updates.
In this post, we build a multimodal retrieval system for aerospace manufacturing documents using Amazon Nova Multimodal Embeddings on Amazon Bedrock and Amazon S3 Vectors. We evaluate the system on 26 manufacturing queries and compare generation quality between a text-only pipeline and the multimodal pipeline.
Amazon Quick helps turn your large enterprise data into fast and accurate AI-powered decisions. In this post, you will learn about five new capabilities of Amazon Quick that accelerate how data professionals deliver trusted AI-powered insights at enterprise scale.
AI-powered dictation apps are useful for replying to emails, taking notes, and even coding by voice.
a quiet day lets us make a call for speakers!
a quiet day lets us reflect on coding agents "breaking containment"
In this post, we introduce a systematic framework for LLM migration or upgrade in generative AI production, encompassing essential tools, methodologies, and best practices. The framework facilitates transitions between different LLMs by providing robust protocols for prompt conversion and optimization.
In this post, we show how Sun Finance used Amazon Bedrock, Amazon Textract, and Amazon Rekognition to build an AI-powered identity verification (IDV) pipeline. The solution improved extraction accuracy from 79.7% to 90.8%, cut per-document costs by 91%, and reduced processing time from up to 20 hours to under 5 seconds. You'll learn how combining specialized OCR with large language model (LLM) structuring outperformed using either tool alone. You'll also learn how to architect a serverless fraud detection system using vector similarity search.
A new method developed by MIT researchers can accelerate a privacy-preserving artificial intelligence training method by about 81%. This advance could enable a wider array of resource-constrained edge devices, like sensors and smartwatches, to deploy more accurate AI models while keeping user data secure.
a quiet day.
Spud lives!
In this post, we'll explore how multimodal BioFMs work, showcase real-world applications in drug discovery and clinical development, and contextualize how AWS enables organizations to build and deploy multimodal BioFMs.
a quiet day lets us reflect on the top conversation that AI leaders are having everywhere.
In this post, we walk through building a scalable, event-driven transcription pipeline that automatically processes audio files uploaded to Amazon Simple Storage Service (Amazon S3), and show you how to use Amazon EC2 Spot Instances and buffered streaming inference to further reduce costs.
Most of what AI chatbots know about the world comes from devouring massive amounts of text from the internet—with all its facts, falsehoods, knowledge and nonsense. Given that input, is it possible that AI language models have an understanding of the real world? As it turns out, they do—or at least something like an understanding. That's according to a new study by researchers from Brown University to be presented on Saturday, April 25 at the International Conference on Learning Representations in Rio de Janeiro, Brazil. The study is published on the arXiv preprint server.
Today, Amazon SageMaker AI supports optimized generative AI inference recommendations. By delivering validated, optimal deployment configurations with performance metrics, Amazon SageMaker AI keeps your model developers focused on building accurate models, not managing infrastructure.
Google's newest TPUs are faster and cheaper than the previous versions. But the company is still embracing Nvidia in its cloud — for now.
Language models like ChatGPT are not neutral. Without our realizing it, they can absorb all kinds of bias—for example, around gender and ethnicity—which then become increasingly embedded in the model. According to AI researcher Oskar van der Wal, we need different kinds of measurements to detect these biases so that they can be removed from the models. In his doctoral thesis, he shows how this can be done. On 29 April, he will defend his thesis at the University of Amsterdam.
A paddle-wielding robot is so adept at playing table tennis that it is posing a tough challenge to elite human players and sometimes defeating them, according to a new study that shows how advances in artificial intelligence are making robots more agile.
A University of Queensland study has shown large language models (LLMs) used in AI content moderation may be prone to subtle biases that undermine their neutrality. A team led by data scientist Professor Gianluca Demartini from UQ's School of Electrical Engineering and Computer Science used persona prompting to test the tendency of AI chatbots to encode and reproduce political biases, and found significant behavioral shifts.
MiMo-V2.5-Pro is Xiaomi’s flagship model, delivering strong performance in general agentic capabilities, complex software engineering, and long-horizon tasks, with top rankings on benchmarks such as ClawEval, GDPVal, and SWE-bench Pro....
Built by @aellman
© 2026 68 Ventures, LLC. All rights reserved.