Price Per Token

WMT 2014 Leaderboard

Workshop on Machine Translation 2014 — multilingual translation quality benchmark.

Data from LayerLens

As of March 15, 2026, the top-scoring model on WMT 2014 is Gemini 2.0 Flash at 38.9%, followed by Llama 3.1 405B Instruct and Llama 4 Maverick, tied at 38.0%. Thirteen models have been evaluated on this benchmark.

Last updated: March 15, 2026

Models: 13
Best Score: 38.9
Average: 35.6
Std Dev: 3.7

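Category: Multilingual

| Rank | Input $/M | Output $/M | WMT 2014 |
|------|-----------|------------|----------|
| 1 | $0.100 | $0.400 | 38.9 |
| 2 | $0.900 | $0.900 | 38.0 |
| 3 | $0.150 | $0.600 | 38.0 |
| 4 | $2.000 | $8.000 | 37.6 |
| 5 | $3.000 | $15.000 | 37.4 |
| 6 | $2.500 | $10.000 | 37.3 |
| 7 | $0.080 | $0.300 | 37.1 |
| 8 | $0.800 | $3.200 | 36.9 |
| 9 | $0.014 | $0.028 | 36.6 |
| 10 | $0.800 | $4.000 | 35.6 |
| 11 | $0.070 | $0.280 | 34.1 |
| 12 | $2.500 | $10.000 | 30.0 |
| 13 | $0.030 | $0.050 | 25.2 |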

Pricing from OpenRouter. Benchmarks from Artificial Analysis.

108 out of our 483 tracked models have had a price change in March.

About WMT 2014

This leaderboard shows all models with WMT 2014 benchmark scores, ranked from highest to lowest. Pricing data is included to help you compare performance against cost.
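One way to compare performance against cost is to look for the rows that no other row beats on both price and score at once. The sketch below is illustrative only: the prices and scores are copied from the leaderboard rows in rank order, while `blended_price`, `pareto_front`, and the 50/50 input/output token split are assumptions made for this example, not something the leaderboard itself defines.

```python
# (input $/M, output $/M, WMT 2014 score), copied from the leaderboard in rank order.
ROWS = [
    (0.100, 0.400, 38.9),
    (0.900, 0.900, 38.0),
    (0.150, 0.600, 38.0),
    (2.000, 8.000, 37.6),
    (3.000, 15.000, 37.4),
    (2.500, 10.000, 37.3),
    (0.080, 0.300, 37.1),
    (0.800, 3.200, 36.9),
    (0.014, 0.028, 36.6),
    (0.800, 4.000, 35.6),
    (0.070, 0.280, 34.1),
    (2.500, 10.000, 30.0),
    (0.030, 0.050, 25.2),
]


def blended_price(input_price, output_price, input_share=0.5):
    """Price per million tokens when `input_share` of the tokens are input.

    The 0.5 default is an assumption; translation workloads may differ.
    """
    return input_share * input_price + (1.0 - input_share) * output_price


def pareto_front(rows, input_share=0.5):
    """Return (rank, blended price, score) for every row that is not dominated.

    A row is dominated when some other row is at least as cheap and scores
    at least as high, and is strictly better on one of the two.
    """
    priced = [
        (rank, blended_price(inp, out, input_share), score)
        for rank, (inp, out, score) in enumerate(rows, start=1)
    ]
    front = []
    for rank, price, score in priced:
        dominated = any(
            p <= price and s >= score and (p < price or s > score)
            for _, p, s in priced
        )
        if not dominated:
            front.append((rank, price, score))
    return front


if __name__ == "__main__":
    for rank, price, score in pareto_front(ROWS):
        print(f"rank {rank}: ${price:.3f}/M blended, score {score}")
```

Under the even-split assumption, only the rows ranked 1, 7, and 9 remain on the front; every other row is matched or beaten on both price and score by one of them. Changing `input_share` shifts the blended prices and can change which rows survive.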

Frequently Asked Questions

What is WMT 2014?
WMT 2014 is the benchmark from the 2014 Workshop on Machine Translation; it measures multilingual translation quality.

Which model leads the WMT 2014 leaderboard?
As of March 15, 2026, Gemini 2.0 Flash leads the WMT 2014 leaderboard with a score of 38.9. Rankings change as new models are released and evaluated.

How many models have been evaluated on WMT 2014?
Currently 13 models have been evaluated on WMT 2014, with an average score of 35.6 and a standard deviation of 3.7.

How often is the data updated?
Benchmark scores are updated when new evaluations are published by our data sources (Artificial Analysis and LayerLens). Pricing data is refreshed daily from OpenRouter.
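The summary figures quoted above (13 models, best 38.9, average 35.6, standard deviation 3.7) can be reproduced from the thirteen scores listed on the leaderboard. The following is a minimal sketch using Python's standard library; the `summarize` helper is a name chosen for this example. It uses the population standard deviation (dividing by n), which is an inference: that formula rounds to the published 3.7, whereas the sample formula (dividing by n - 1) gives about 3.9.

```python
from statistics import mean, pstdev, stdev

# WMT 2014 scores copied from the leaderboard, highest to lowest.
SCORES = [38.9, 38.0, 38.0, 37.6, 37.4, 37.3, 37.1,
          36.9, 36.6, 35.6, 34.1, 30.0, 25.2]


def summarize(scores):
    """Return the leaderboard-style summary, rounded to one decimal place."""
    return {
        "models": len(scores),
        "best": max(scores),
        "average": round(mean(scores), 1),
        # Population standard deviation: assumed here because it matches
        # the published figure; the page does not state which one it uses.
        "std_dev": round(pstdev(scores), 1),
    }


if __name__ == "__main__":
    print(summarize(SCORES))
    print("sample std dev:", round(stdev(SCORES), 1))
```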