AI image generation news. DALL-E, Midjourney, Stable Diffusion, Flux updates. Visual AI model releases and capabilities.
In this post, we build a multimodal retrieval system for aerospace manufacturing documents using Amazon Nova Multimodal Embeddings on Amazon Bedrock and Amazon S3 Vectors. We evaluate the system on 26 manufacturing queries and compare generation quality between a text-only pipeline and the multimodal pipeline.
In this post, we dive deep into the architecture and techniques we used to improve Miro’s bug routing with Amazon Bedrock, achieving six times fewer team reassignments and a five times shorter time to resolution.
a quiet day lets us make a call for speakers!
Despite significant advances in vision-based equipment tracking, frequent occlusions caused by multiple interacting machines continue to degrade tracking accuracy on construction sites. While previous studies have explored multi-camera approaches, they often assume that at least one camera maintains a clear view at all times. In practice, however, such conditions are rarely guaranteed. Even with multiple CCTV systems installed, simultaneous occlusions across cameras frequently occur, making it difficult to identify which camera provides the most reliable view at any given moment.
Artificial intelligence (AI) experts from The University of Texas at Dallas have partnered with a Japanese company through its Irving, Texas-based subsidiary to help local governments prioritize road repairs. The system builds on NEXCO-Central's existing technology, which combines artificial intelligence and video footage gathered from mobile cameras to assess road conditions and provide a network-wide view of pavement conditions.
Paintings are often made up of thousands of tiny brushstrokes, each going in a certain direction, that are not easily observed by the viewer. A cross-disciplinary research team from the Penn State College of Information Sciences and Technology (IST) and Loughborough University in England has developed an image analysis method that helps to make the underlying brushstroke structure of paintings visible, giving new insight into how artists physically created their works.
a quiet day lets us reflect on coding agents "breaking containment"
In this post, we show how Sun Finance used Amazon Bedrock, Amazon Textract, and Amazon Rekognition to build an AI-powered identity verification (IDV) pipeline. The solution improved extraction accuracy from 79.7% to 90.8%, cut per-document costs by 91%, and reduced processing time from up to 20 hours to under 5 seconds. You'll learn how combining specialized OCR with large language model (LLM) structuring outperformed using either tool alone. You'll also learn how to architect a serverless fraud detection system using vector similarity search.
In today's hospitals and clinics, a dermatologist may use an artificial intelligence model for classifying skin lesions to assess if the lesion is at risk of developing into a cancer or if it is benign. But if the model is biased toward certain skin tones, it could fail to identify a high-risk patient.
Google recreates Cher's closet from "Clueless" with AI.
Google TV just got more Gemini features, including the ability to transform photos and videos with the Nano Banana and Veo tools.
a quiet day.
In this post, we'll explore how multimodal BioFMs work, showcase real-world applications in drug discovery and clinical development, and contextualize how AWS enables organizations to build and deploy multimodal BioFMs.
a quiet day lets us reflect on the top conversation that AI leaders are having everywhere.
Virtual reality (VR) experiences and 360-degree videos are transforming viewers from passive observers into active participants immersed within a scene. Yet this shift raises an important question: Where do people direct their attention in such environments, and what shapes that attention?
MiMo-V2.5 is a native omnimodal model by Xiaomi. It delivers Pro-level agentic performance at roughly half the inference cost, while surpassing MiMo-V2-Omni in multimodal perception across image and video understanding...
with Cursor getting a $10B contract with xAI and a right to acquire for $60B.
When ChatGPT launched as an experimental prototype in late 2022, OpenAI’s chatbot became an everyday everything app for hundreds of millions of people. LLMs like ChatGPT were the new future: The entire tech industry was consumed by the inferno, with companies racing to spin up rival products. The ashes of the old tech world still…
Built by @aellman
© 2026 68 Ventures, LLC. All rights reserved.