Price Per Token

HumanEval Leaderboard

The OpenAI HumanEval benchmark measures Python code generation from function docstrings.

Data from LayerLens

Models: 69
Best Score: 97.6
Average: 89.1
Std Dev: 10.8

Categories: Computer Science and Programming
Input $/M   Output $/M   HumanEval
$3.000      $15.000      97.6
$0.700      $2.500       97.4
$3.000      $15.000      97.0
$3.000      $15.000      97.0
$2.000      $12.000      97.0
$5.000      $25.000      97.0
$5.000      $25.000      97.0
$5.000      $25.000      97.0
$5.000      $25.000      97.0
$1.100      $4.400       96.3
$3.000      $15.000      96.3
$3.000      $15.000      96.3
$15.000     $75.000      96.3
$15.000     $75.000      96.3
$1.000      $5.000       96.3
$0.550      $3.500       96.3
$3.000      $15.000      96.2
$0.300      $0.500       95.7
$0.080      $0.240       95.7
$0.080      $0.240       95.7
$15.000     $75.000      95.7
$15.000     $75.000      95.7
$0.060      $0.140       95.5
$0.150      $0.400       95.5
$1.100      $4.400       95.1
$1.250      $10.000      95.1
$0.200      $1.100       95.1
$1.250      $10.000      94.5
$1.000      $5.000       93.9
$0.250      $0.400       93.9
$2.000      $8.000       93.3
$0.080      $0.280       93.3
$0.080      $0.280       93.3
$0.450      $2.150       93.3
$0.050      $0.200       93.3
$0.050      $0.200       93.3
$0.400      $2.400       93.3
$0.220      $1.000       92.7
$3.000      $15.000      92.1
$0.071      $0.100       92.1
$0.060      $0.400       92.1
$0.060      $0.400       92.1
$3.000      $15.000      91.5
$3.000      $15.000      90.9
$0.455      $1.820       90.2
$0.455      $1.820       90.2
$0.500      $1.500       90.2
$2.500      $10.000      88.4
$0.400      $2.000       88.4
$0.100      $0.400       87.8
$0.320      $0.890       87.2
$0.280      $1.100       86.6
$0.040      $0.150       85.4
$0.400      $2.000       85.4
$0.150      $0.600       84.8
$0.060      $0.180       83.5
$2.500      $10.000      82.9
$0.120      $0.390       82.3
$2.000      $6.000       82.3
$0.080      $0.300       81.1
$0.060      $0.240       78.0
$0.100      $0.300       77.4
$0.800      $3.200       76.8
$0.800      $4.000       75.6
$0.020      $0.040       73.2
$4.000      $4.000       67.1
$1.000      $1.000       51.2
$2.500      $10.000      48.2
$0.051      $0.340       47.6

Pricing from OpenRouter. Benchmarks from Artificial Analysis.
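
The summary figures at the top of the page can be checked against the HumanEval column. The Python sketch below recomputes them from the 69 scores listed in the table. It assumes the reported standard deviation is the population form (dividing by n), which the page does not state; the function name `summarize` is illustrative.

```python
from statistics import mean, pstdev

# HumanEval scores transcribed from the table, highest to lowest.
SCORES = [
    97.6, 97.4, 97.0, 97.0, 97.0, 97.0, 97.0, 97.0, 97.0, 96.3,
    96.3, 96.3, 96.3, 96.3, 96.3, 96.3, 96.2, 95.7, 95.7, 95.7,
    95.7, 95.7, 95.5, 95.5, 95.1, 95.1, 95.1, 94.5, 93.9, 93.9,
    93.3, 93.3, 93.3, 93.3, 93.3, 93.3, 93.3, 92.7, 92.1, 92.1,
    92.1, 92.1, 91.5, 90.9, 90.2, 90.2, 90.2, 88.4, 88.4, 87.8,
    87.2, 86.6, 85.4, 85.4, 84.8, 83.5, 82.9, 82.3, 82.3, 81.1,
    78.0, 77.4, 76.8, 75.6, 73.2, 67.1, 51.2, 48.2, 47.6,
]


def summarize(scores):
    """Return the count, best score, mean, and population standard deviation."""
    return {
        "models": len(scores),
        "best": max(scores),
        "average": round(mean(scores), 1),
        # Assumption: population standard deviation (divide by n, not n - 1).
        "std_dev": round(pstdev(scores), 1),
    }


if __name__ == "__main__":
    print(summarize(SCORES))
```

The count, best score, and average come out at the reported 69, 97.6, and 89.1. The long tail of three scores near 50 accounts for most of the spread, which is why the average sits well below the median row.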



About HumanEval

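HumanEval is a set of 164 hand-written Python programming problems released by OpenAI. Each problem gives a function signature and docstring, and a model's completion counts as correct only if it passes the problem's unit tests.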
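
To make the task format concrete, the Python sketch below shows a HumanEval-style problem: a prompt made of a function signature and docstring, a candidate completion, and a unit-test check that decides pass or fail. The problem, the completion, and the helper name `passes` are all illustrative and are not taken from the benchmark. A real evaluation harness would also sandbox execution, which this sketch does not do.

```python
# Hypothetical prompt in the HumanEval style: signature plus docstring only.
PROMPT = '''def running_max(xs):
    """Return a list whose i-th element is the maximum of xs[:i + 1].

    >>> running_max([1, 3, 2, 5])
    [1, 3, 3, 5]
    """
'''

# A candidate function body, as a model might generate it.
COMPLETION = '''    out, best = [], None
    for x in xs:
        best = x if best is None or x > best else best
        out.append(best)
    return out
'''

# Unit tests used to score the completion (illustrative).
TEST = '''def check(candidate):
    assert candidate([1, 3, 2, 5]) == [1, 3, 3, 5]
    assert candidate([]) == []
    assert candidate([4, 4, 1]) == [4, 4, 4]
'''


def passes(prompt, completion, test, entry_point):
    """Execute prompt + completion, then the tests; True if nothing raises.

    This calls exec without any sandboxing, so only run trusted code here.
    """
    namespace = {}
    try:
        exec(prompt + completion + "\n" + test, namespace)
        namespace["check"](namespace[entry_point])
        return True
    except Exception:
        return False


if __name__ == "__main__":
    print(passes(PROMPT, COMPLETION, TEST, "running_max"))
```

A benchmark score is then the share of problems for which a generated completion passes. HumanEval results are conventionally reported as pass@k, although this page does not say which k its scores use.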

This leaderboard shows all models with HumanEval benchmark scores, ranked from highest to lowest. Pricing data is included to help you compare performance against cost.
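
One way to compare performance against cost is to price a representative request and keep only the rows that no other row beats on both cost and score (a Pareto frontier). The Python sketch below does this for a few rows copied from the table. The `row-N` labels (position in the table), the workload of 2,000 input and 500 output tokens, and the function names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    label: str           # hypothetical label: position of the row in the table
    input_per_m: float   # USD per million input tokens
    output_per_m: float  # USD per million output tokens
    score: float         # HumanEval score


# A handful of rows copied from the leaderboard table.
ROWS = [
    Row("row-1", 3.000, 15.000, 97.6),
    Row("row-2", 0.700, 2.500, 97.4),
    Row("row-13", 15.000, 75.000, 96.3),
    Row("row-18", 0.300, 0.500, 95.7),
    Row("row-23", 0.060, 0.140, 95.5),
    Row("row-65", 0.020, 0.040, 73.2),
    Row("row-68", 2.500, 10.000, 48.2),
]


def request_cost(row, input_tokens, output_tokens):
    """Cost in USD of one request, given prices quoted per million tokens."""
    total = row.input_per_m * input_tokens + row.output_per_m * output_tokens
    return total / 1_000_000


def pareto_frontier(rows, input_tokens, output_tokens):
    """Rows for which no cheaper row has an equal or higher score."""
    by_cost = sorted(
        rows,
        key=lambda r: (request_cost(r, input_tokens, output_tokens), -r.score),
    )
    frontier, best_score = [], float("-inf")
    for row in by_cost:
        # Walking from cheapest to priciest, keep a row only if it
        # improves on the best score seen so far.
        if row.score > best_score:
            frontier.append(row)
            best_score = row.score
    return frontier


if __name__ == "__main__":
    for row in pareto_frontier(ROWS, input_tokens=2_000, output_tokens=500):
        cost = request_cost(row, 2_000, 500)
        print(f"{row.label}: ${cost:.5f} per request, score {row.score}")
```

With this workload, `row-13` and `row-68` drop out because cheaper rows score higher. The result depends on the assumed input-to-output token ratio, so it is worth rerunning with figures that match the actual workload.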
