# AI Embedding Model Pricing Comparison
Compare pricing for embedding models across AWS Bedrock and direct provider APIs. Find the cheapest embedding API for Amazon Titan, Cohere Embed, and other models. All prices are per 1M input tokens.
## Embedding API Pricing Overview

### All Embedding Model Prices
| Author | Model | Dimensions | Max Input | Bedrock / 1M | Direct API / 1M | Cheapest |
|---|---|---|---|---|---|---|
| | | 1,024 | 512 | $0.100 | N/A | Bedrock |
| | | 1,024 | 512 | $0.100 | N/A | Bedrock |
| | | 1,024 | 128,000 | $0.120 | N/A | Bedrock |
## About AI Embedding Model Pricing
Embedding models convert text into dense vector representations used for semantic search, retrieval-augmented generation (RAG), clustering, and classification. Unlike LLMs, embedding models only have input pricing — there are no output tokens.
- AWS Bedrock provides managed access to embedding models from Amazon, Cohere, and others
- Direct APIs are available from providers like OpenAI, Cohere, and Voyage AI
All prices shown are per 1 million input tokens. Key factors when choosing an embedding model include dimensions (vector size), max input tokens (context window), and price.
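Because embedding pricing is input-only, estimating a bill is a single multiplication. The sketch below shows the arithmetic; the function name and the corpus figures are illustrative assumptions, not quotes from any provider.

```python
# Estimate embedding cost from a per-1M-token price.
# `embedding_cost` is an illustrative helper, not a provider API.

def embedding_cost(total_tokens: int, price_per_million: float) -> float:
    """Return the dollar cost of embedding `total_tokens` input tokens."""
    return total_tokens / 1_000_000 * price_per_million

# Example: a 50,000-document corpus averaging 400 tokens per document,
# priced at $0.100 per 1M input tokens.
tokens = 50_000 * 400               # 20,000,000 input tokens
cost = embedding_cost(tokens, 0.100)
print(f"${cost:.2f}")               # $2.00
```

Note there is no output-token term: unlike LLM pricing, the vector returned by the model is not billed separately.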
## Frequently Asked Questions
### What are embedding models?
Embedding models convert text into numerical vectors that capture semantic meaning. These vectors enable similarity search, clustering, and retrieval-augmented generation (RAG). Unlike LLMs that generate text, embedding models produce fixed-size vectors as output.
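The similarity search these vectors enable is typically cosine similarity: two texts with nearby meanings produce vectors pointing in nearby directions. A minimal, dependency-free sketch (the vectors here are toy stand-ins for real model output):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors: 1.0 means
    identical direction, 0.0 means orthogonal (unrelated)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

In a real RAG pipeline the same function (usually vectorized via NumPy or a vector database) ranks stored document vectors against a query vector.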
### What is the cheapest embedding API?

The cheapest embedding API in this comparison starts at $0.100 per 1M input tokens. Prices vary by model and provider. AWS Bedrock often offers competitive pricing for embedding models. Use our comparison table to find the best deal.
### What do embedding dimensions mean?

Dimensions refer to the size of the output vector (e.g., 1,024 means each text input is converted to a 1,024-number vector). Higher dimensions can capture more nuance but require more storage and compute for similarity searches. Many modern models use 1,024 dimensions.
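The storage cost of dimensions is easy to quantify: a flat index stores one float per dimension per vector. A back-of-envelope sketch (the helper name and float32 assumption are mine):

```python
def index_size_bytes(num_vectors: int, dims: int, bytes_per_value: int = 4) -> int:
    """Raw storage for a flat vector index, assuming float32 (4 bytes) values.
    Ignores index overhead, metadata, and any compression/quantization."""
    return num_vectors * dims * bytes_per_value

# 1M documents embedded at 1,024 dimensions in float32:
size = index_size_bytes(1_000_000, 1024)
print(f"{size / 2**30:.1f} GiB")  # ~3.8 GiB
```

Halving the dimensions halves this figure, which is why some providers offer reduced-dimension variants for large indexes.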
### What is max input tokens for embeddings?
Max input tokens is the maximum amount of text the model can process in a single embedding request. Models with larger context windows (like 128K tokens) can embed entire documents at once, while smaller windows (512 tokens) require chunking longer texts.
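Chunking for a small context window is usually a sliding window with overlap, so sentences that straddle a boundary appear in both neighboring chunks. A minimal sketch operating on a pre-tokenized list (the 512-token window matches the table above; the 50-token overlap is an illustrative choice):

```python
def chunk_tokens(tokens: list[str], max_tokens: int = 512,
                 overlap: int = 50) -> list[list[str]]:
    """Split a token list into overlapping chunks that each fit the
    embedding model's context window."""
    step = max_tokens - overlap  # advance by window size minus overlap
    return [tokens[i:i + max_tokens] for i in range(0, len(tokens), step)]

# A 1,200-token document against a 512-token window:
doc = [f"tok{i}" for i in range(1200)]
chunks = chunk_tokens(doc)
print(len(chunks))  # 3
```

Each chunk is embedded separately and all resulting vectors are stored, so smaller windows multiply both token cost and index size for long documents.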
Built by @aellman
© 2026 68 Ventures, LLC. All rights reserved.
