Price Per Token

MBPP Plus Leaderboard

Mostly Basic Python Problems Plus (MBPP Plus) tests Python code generation with enhanced test cases.

Data from LayerLens

Models: 57
Best Score: 66.2
Average: 56.6
Std Dev: 10.0
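
The summary figures above can be recomputed from the MBPP Plus column of the leaderboard. The sketch below copies the 57 scores from the table and derives the count, best score, and average. The page does not say whether its Std Dev is the population or the sample form, so the use of `pstdev` (population) here is an assumption, and the `SCORES` and `summary` names are illustrative only.

```python
from statistics import mean, pstdev

# MBPP Plus scores copied from the leaderboard table, highest to lowest.
SCORES = [
    66.2, 66.2, 64.7, 64.3, 63.4, 63.4, 63.4, 63.4,
    63.2, 63.2, 63.2, 63.2, 63.0, 63.0, 63.0, 63.0,
    62.7, 62.6, 62.6, 62.2, 62.2, 62.1, 62.1, 61.9,
    61.9, 61.6, 61.1, 61.1, 61.1, 60.6, 60.6, 60.1,
    59.8, 59.5, 58.7, 58.5, 57.7, 57.7, 56.6, 55.8,
    54.5, 54.0, 54.0, 53.4, 53.4, 53.2, 51.1, 50.8,
    49.7, 49.2, 47.9, 36.8, 33.9, 32.3, 30.5, 26.5,
    24.1,
]

summary = {
    "models": len(SCORES),
    "best": max(SCORES),
    "average": round(mean(SCORES), 1),
    # Assumption: population standard deviation; the page does not specify.
    "std_dev": round(pstdev(SCORES), 1),
}

print(summary)
```

Swapping `pstdev` for `statistics.stdev` gives the sample form instead, which differs only slightly at this sample size.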

Categories: Computer Science and Programming
Input $/M   Output $/M   MBPP Plus
$0.455      $1.820       66.2
$0.455      $1.820       66.2
$0.700      $2.500       64.7
$1.100      $4.400       64.3
$0.080      $0.240       63.4
$0.080      $0.240       63.4
$15.000     $75.000      63.4
$15.000     $75.000      63.4
$1.100      $4.400       63.2
$0.220      $1.000       63.2
$0.250      $2.000       63.2
$3.000      $15.000      63.2
$2.000      $8.000       63.0
$1.250      $10.000      63.0
$1.250      $10.000      63.0
$0.050      $0.400       63.0
$0.071      $0.100       62.7
$0.080      $0.280       62.6
$0.080      $0.280       62.6
$3.000      $15.000      62.2
$3.000      $15.000      62.2
$3.000      $15.000      62.1
$0.150      $0.400       62.1
$0.060      $0.140       61.9
$0.400      $2.200       61.9
$2.000      $8.000       61.6
$0.100      $0.400       61.1
$0.300      $0.500       61.1
$0.070      $0.270       61.1
$2.500      $10.000      60.6
$3.000      $15.000      60.6
$0.150      $0.600       60.1
$0.040      $0.150       59.8
$0.450      $2.150       59.5
$3.000      $15.000      58.7
$3.000      $15.000      58.5
$0.300      $2.500       57.7
$0.300      $2.500       57.7
$0.060      $0.180       56.6
$0.020      $0.040       55.8
$0.400      $2.000       54.5
$2.000      $6.000       54.0
$0.080      $0.300       54.0
$1.000      $5.000       53.4
$1.000      $5.000       53.4
$2.500      $10.000      53.2
$0.320      $0.890       51.1
$0.060      $0.240       50.8
$0.280      $1.100       49.7
$0.800      $3.200       49.2
$0.035      $0.140       47.9
$4.000      $4.000       36.8
$0.800      $4.000       33.9
$2.500      $10.000      32.3
$2.500      $10.000      30.5
$0.051      $0.340       26.5
$0.300      $0.300       24.1

Pricing from OpenRouter. Benchmarks from Artificial Analysis.

93 out of our 301 tracked models have had a price change in March.

About MBPP Plus

This leaderboard shows all models with MBPP Plus benchmark scores, ranked from highest to lowest. Pricing data is included to help you compare performance against cost.
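
One way to compare performance against cost is to collapse the two prices into a single blended figure and keep only the rows that no cheaper row beats on score. The sketch below does this for six rows taken from the table. The 3:1 input-to-output token mix is an assumption, not something this page specifies, and `Row`, `blended_price`, and `pareto_frontier` are hypothetical helper names. Rows are identified by their position in the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    position: int      # row position in the leaderboard table (1 = top)
    input_usd: float   # Input $/M
    output_usd: float  # Output $/M
    score: float       # MBPP Plus


def blended_price(row: Row, input_share: float = 0.75) -> float:
    """Weighted price per million tokens.

    Assumption: 75% of tokens are input and 25% are output (a 3:1 mix).
    """
    return input_share * row.input_usd + (1 - input_share) * row.output_usd


def pareto_frontier(rows: list[Row]) -> list[Row]:
    """Return rows that are not beaten on score by any cheaper row."""
    frontier = []
    best_score = float("-inf")
    # Walk from cheapest to most expensive; on a price tie, higher score first.
    for row in sorted(rows, key=lambda r: (blended_price(r), -r.score)):
        if row.score > best_score:
            frontier.append(row)
            best_score = row.score
    return frontier


# Six rows copied from the leaderboard table.
ROWS = [
    Row(1, 0.455, 1.820, 66.2),
    Row(3, 0.700, 2.500, 64.7),
    Row(5, 0.080, 0.240, 63.4),
    Row(7, 15.000, 75.000, 63.4),
    Row(17, 0.071, 0.100, 62.7),
    Row(40, 0.020, 0.040, 55.8),
]

for row in pareto_frontier(ROWS):
    print(f"row {row.position}: ${blended_price(row):.3f}/M blended, score {row.score}")
```

With this sample, rows 3 and 7 drop out because row 1 scores higher at a lower blended price. A different token mix can change which rows survive, so the `input_share` value is worth adjusting to match the intended workload.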