Price Per Token

MBPP Plus Leaderboard

Mostly Basic Python Problems Plus (MBPP Plus) tests Python code generation with enhanced test cases.

Data from LayerLens

As of April 18, 2026, the top score on MBPP Plus is 66.2%, shared by two Qwen3 235B A22B listings, followed by R1 at 64.7%. A total of 65 models have been evaluated on this benchmark.

Last updated: April 18, 2026

Models: 65
Best Score: 66.2
Average: 57.2
Std Dev: 9.5

Categories: Computer Science and Programming

Input $/M    Output $/M    MBPP Plus
$0.455       $0.900        66.2
$0.455       $0.900        66.2
$0.550       $2.000        64.7
$1.100       $4.400        64.3
$0.080       $0.240        63.4
$0.080       $0.240        63.4
$15.000      $75.000       63.4
$15.000      $75.000       63.4
$0.550       $2.200        63.2
$0.220       $0.900        63.2
$0.125       $1.000        63.2
$0.125       $1.000        63.2
$0.090       $0.780        63.2
$3.000       $15.000       63.2
$2.000       $8.000        63.0
$1.250       $10.000       63.0
$1.000       $10.000       63.0
$0.050       $0.400        63.0
$0.050       $0.400        63.0
$0.050       $0.400        63.0
$0.071       $0.100        62.7
$0.080       $0.280        62.6
$0.080       $0.280        62.6
$3.000       $15.000       62.2
$3.000       $15.000       62.2
$3.000       $15.000       62.1
$0.150       $0.580        62.1
$0.150       $0.580        62.1
$0.065       $0.140        61.9
$0.400       $1.760        61.9
$0.400       $1.760        61.9
$2.000       $8.000        61.6
$0.100       $0.400        61.1
$0.300       $0.500        61.1
$0.070       $0.270        61.1
$3.000       $15.000       60.6
$0.150       $0.600        60.1
$0.080       $0.160        59.8
$0.500       $2.150        59.5
$3.000       $15.000       58.7
$3.000       $15.000       58.7
$3.000       $15.000       58.5
$0.300       $2.500        57.7
$0.300       $2.500        57.7
$0.300       $2.500        57.7
$0.300       $2.500        57.7
$0.075       $0.200        56.6
$0.060       $0.120        55.8
$0.400       $2.000        54.5
$2.000       $6.000        54.0
$0.080       $0.300        54.0
$1.000       $5.000        53.4
$1.000       $5.000        53.4
$2.500       $10.000       53.2
$0.014       $0.028        51.1
$0.060       $0.240        50.8
$0.280       $0.900        49.7
$0.800       $3.200        49.2
$0.035       $0.140        47.9
$0.900       $0.900        36.8
$0.800       $4.000        33.9
$2.500       $10.000       32.3
$2.500       $10.000       30.5
$0.030       $0.050        26.5
$0.300       $0.300        24.1

Pricing from OpenRouter. Benchmarks from Artificial Analysis.

About MBPP Plus

This leaderboard shows all models with MBPP Plus benchmark scores, ranked from highest to lowest. Pricing data is included to help you compare performance against cost.
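
One way to compare performance against cost is to divide each score by a blended token price. The following is a minimal sketch, assuming a 3:1 input-to-output token mix (an assumption for illustration, not something this site specifies). `Row`, `blended_price`, and `points_per_dollar` are hypothetical helper names, and the three sample rows are copied from the leaderboard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    input_per_m: float   # USD per million input tokens
    output_per_m: float  # USD per million output tokens
    score: float         # MBPP Plus score


def blended_price(row: Row, input_share: float = 0.75) -> float:
    """Weighted price per million tokens; input_share is an assumed workload mix."""
    return input_share * row.input_per_m + (1.0 - input_share) * row.output_per_m


def points_per_dollar(row: Row, input_share: float = 0.75) -> float:
    """Benchmark points per blended dollar (per million tokens)."""
    return row.score / blended_price(row, input_share)


# Three rows taken from the leaderboard (model names omitted).
rows = [
    Row(0.455, 0.900, 66.2),
    Row(15.000, 75.000, 63.4),
    Row(0.071, 0.100, 62.7),
]

# Sort from most to least cost-efficient under the assumed mix.
ranked = sorted(rows, key=points_per_dollar, reverse=True)

for r in ranked:
    print(f"score={r.score:5.1f}  blended=${blended_price(r):7.3f}/M  "
          f"points per dollar={points_per_dollar(r):7.1f}")
```

A ratio like this is only a rough guide: a different token mix changes the blended price, and small score differences near the top of the table may not matter for a given workload.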

Frequently Asked Questions

MBPP Plus (Mostly Basic Python Problems Plus) is a benchmark that tests Python code generation with enhanced test cases.
As of April 18, 2026, Qwen3 235B A22B leads the MBPP Plus leaderboard with a score of 66.2. Rankings change as new models are released and evaluated.
Currently, 65 models have been evaluated on MBPP Plus, with an average score of 57.2 and a standard deviation of 9.5.
Benchmark scores are updated when new evaluations are published by our data sources (Artificial Analysis and LayerLens). Pricing data is refreshed daily from OpenRouter.
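
The average and standard deviation quoted above can be checked against the scores in the leaderboard. The following is a minimal sketch, assuming the 65 listed scores are the full input. The page does not say whether its standard deviation is the population or the sample form, so the sketch computes both. The name `score_counts` and the (score, count) grouping are illustrative choices, not part of the site.

```python
import statistics

# (score, number of leaderboard rows with that score), transcribed from the table.
score_counts = [
    (66.2, 2), (64.7, 1), (64.3, 1), (63.4, 4), (63.2, 6), (63.0, 6),
    (62.7, 1), (62.6, 2), (62.2, 2), (62.1, 3), (61.9, 3), (61.6, 1),
    (61.1, 3), (60.6, 1), (60.1, 1), (59.8, 1), (59.5, 1), (58.7, 2),
    (58.5, 1), (57.7, 4), (56.6, 1), (55.8, 1), (54.5, 1), (54.0, 2),
    (53.4, 2), (53.2, 1), (51.1, 1), (50.8, 1), (49.7, 1), (49.2, 1),
    (47.9, 1), (36.8, 1), (33.9, 1), (32.3, 1), (30.5, 1), (26.5, 1),
    (24.1, 1),
]

# Expand the grouped counts back into one score per model.
scores = [score for score, count in score_counts for _ in range(count)]

mean = statistics.fmean(scores)
population_sd = statistics.pstdev(scores)  # divides by n
sample_sd = statistics.stdev(scores)       # divides by n - 1

print(f"models: {len(scores)}")            # 65
print(f"mean: {mean:.1f}")                 # 57.2
print(f"population std dev: {population_sd:.2f}")
print(f"sample std dev: {sample_sd:.2f}")
```

The spread is driven mostly by the handful of scores below 40. Most models fall within a few points of one another in the upper range of the table.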