salesevals.com · Evaluated Jul 1, 2026

Which models know sales?

Twenty-six model configurations each coach 50 synthetic sales calls, generated by GPT and Sonnet models, that carry hidden ground truth. A judge scores each coaching note from 0 to 100 on whether it found the real strengths, flaws, and next moves.

Calls: 50
Models: 26
Evaluations: 1300
Benchmark: 86.2
50 calls · 1300 evaluations · Rank: Sales coaching benchmark · All available runs · Build-time static data · Evals completed Jul 1, 2026
All eight scoring axes

How each model scores on every dimension

Cells are shaded by score (greener is better). Models are sorted by benchmark score.

Average score by model and scorecard dimension.
| Model | Overall | Needle recall | Evidence grounding | False-positive control | Prioritization | Actionability | Sales instinct | Technical accuracy |
|---|---|---|---|---|---|---|---|---|
| GPT-5.4 · xhigh | 89.0 | 88.1 | 93.9 | 89.6 | 88.2 | 92.9 | 90.6 | 91.5 |
| GPT-5.4 · high | 89.0 | 87.6 | 93.4 | 89.9 | 88.0 | 92.8 | 90.6 | 91.3 |
| GPT-5.5 · medium | 88.8 | 88.5 | 93.7 | 88.0 | 87.8 | 93.3 | 90.6 | 91.2 |
| GPT-5.5 · xhigh | 89.0 | 88.5 | 93.9 | 89.2 | 87.6 | 93.3 | 89.9 | 91.3 |
| GPT-5.5 · high | 88.6 | 88.4 | 93.6 | 89.0 | 86.7 | 93.4 | 90.2 | 91.1 |
| GPT-5.4 · medium | 88.3 | 86.5 | 93.3 | 88.8 | 87.2 | 92.3 | 90.0 | 90.9 |
| GPT-5.5 · none | 88.1 | 86.8 | 93.5 | 87.9 | 86.9 | 92.9 | 90.0 | 91.1 |
| GPT-5.5 · low | 87.7 | 87.0 | 92.9 | 87.1 | 86.3 | 92.4 | 89.2 | 90.4 |
| Claude Fable 5 · high | 87.5 | 86.9 | 90.1 | 83.5 | 86.7 | 93.0 | 90.4 | 89.6 |
| GPT-5.4 · low | 87.4 | 86.0 | 92.1 | 86.8 | 86.3 | 91.8 | 89.0 | 90.4 |
| GPT-5.4 · none | 87.4 | 85.8 | 92.7 | 86.7 | 86.7 | 91.4 | 88.7 | 90.5 |
| Claude Opus 4.7 · max | 87.3 | 86.9 | 90.6 | 83.5 | 85.9 | 92.7 | 89.1 | 89.3 |
| Claude Opus 4.7 · high | 86.8 | 86.1 | 89.0 | 82.4 | 85.3 | 92.1 | 88.8 | 88.8 |
| Claude Opus 4.8 · medium | 85.8 | 84.7 | 89.7 | 81.8 | 83.8 | 90.5 | 88.1 | 88.2 |
| Claude Opus 4.7 · medium | 85.6 | 84.2 | 89.1 | 82.6 | 83.9 | 91.0 | 88.2 | 87.8 |
| Claude Opus 4.7 · xhigh | 85.6 | 84.6 | 89.0 | 82.3 | 83.9 | 91.6 | 87.8 | 88.0 |
| Claude Opus 4.7 · low | 85.6 | 84.1 | 89.6 | 82.7 | 84.1 | 90.9 | 87.6 | 88.5 |
| Claude Opus 4.8 · max | 85.4 | 85.4 | 88.6 | 80.9 | 83.8 | 91.4 | 87.6 | 88.2 |
| Claude Opus 4.8 · xhigh | 85.2 | 84.8 | 88.7 | 81.3 | 83.8 | 90.4 | 87.6 | 88.3 |
| Claude Opus 4.8 · high | 84.9 | 83.6 | 89.2 | 81.1 | 83.7 | 90.5 | 87.1 | 88.6 |
| Claude Sonnet 4.6 · default | 84.6 | 83.8 | 87.0 | 79.6 | 83.2 | 91.2 | 87.6 | 86.6 |
| Claude Sonnet 5 · default | 84.6 | 83.8 | 88.5 | 81.0 | 82.3 | 89.3 | 85.9 | 87.8 |
| Claude Opus 4.8 · low | 84.0 | 82.8 | 88.5 | 80.4 | 81.9 | 89.3 | 85.5 | 87.5 |
| GLM 5.2 · default | 84.0 | 82.2 | 88.4 | 80.8 | 81.6 | 89.4 | 85.7 | 87.3 |
| DeepSeek V4 Pro · default | 83.5 | 81.9 | 86.9 | 79.5 | 81.9 | 88.5 | 84.9 | 86.6 |
| Gemini 3.1 Pro Preview · default | 78.9 | 74.5 | 86.2 | 78.0 | 76.9 | 84.1 | 81.4 | 84.1 |
| Mean | 86.3 | 85.1 | 90.5 | 84.0 | 84.8 | 91.2 | 88.2 | 89.0 |
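
As a quick sanity check on the Mean row, the Overall column can be averaged directly from the published per-model values. This is only a sketch: it uses the rounded one-decimal scores shown on the page, and the page does not say whether its own Mean or Benchmark figures are computed from unrounded data or weighted in some other way.

```python
from statistics import mean

# Overall scores copied from the table, in row order (26 model configurations).
overall = [
    89.0, 89.0, 88.8, 89.0, 88.6, 88.3, 88.1, 87.7, 87.5, 87.4,
    87.4, 87.3, 86.8, 85.8, 85.6, 85.6, 85.6, 85.4, 85.2, 84.9,
    84.6, 84.6, 84.0, 84.0, 83.5, 78.9,
]

# Unweighted mean of the rounded values; an assumption about how the
# Mean row is built, not something the page confirms.
overall_mean = mean(overall)

print(f"{len(overall)} rows, unweighted mean {overall_mean:.2f}")
```

The result lands at roughly 86.25, which is compatible with both the 86.3 shown in the Mean row and the 86.2 shown as the Benchmark figure, depending on rounding of the underlying values.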
Coach model only

Estimated coach-run cost

Estimated from the saved visible prompt, saved coach output, and AI Gateway listed input/output rates. Generation and judging are excluded.
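
The estimate can be reproduced from the per-row figures. The sketch below assumes the listed rates are USD per one million tokens; the page does not state the unit, but the arithmetic matches the published rows under that assumption. The function name and the `CALLS` constant are illustrative, not taken from the site.

```python
CALLS = 50  # number of calls each model configuration coached


def cost_per_call(avg_input_tokens, avg_output_tokens, rate_in, rate_out):
    """Estimated USD cost of one coaching call.

    Assumes rate_in and rate_out are USD per 1,000,000 tokens
    (an assumption; the unit is not stated on the page).
    """
    return (avg_input_tokens * rate_in + avg_output_tokens * rate_out) / 1_000_000


# (label, avg input tokens, avg output tokens, rate in, rate out) from the cost rows
rows = [
    ("DeepSeek V4 Pro · default", 4559, 3180, 0.43, 0.87),
    ("GPT-5.4 · xhigh", 4559, 4110, 2.50, 15.00),
    ("Claude Fable 5 · high", 4559, 5535, 10.00, 50.00),
]

for label, tokens_in, tokens_out, rate_in, rate_out in rows:
    per_call = cost_per_call(tokens_in, tokens_out, rate_in, rate_out)
    total = per_call * CALLS
    print(f"{label}: ${per_call:.4f} per call, ${total:.2f} total")

# DeepSeek V4 Pro · default: $0.0047 per call, $0.24 total
# GPT-5.4 · xhigh: $0.0730 per call, $3.65 total
# Claude Fable 5 · high: $0.3223 per call, $16.12 total
```

Because input length is the same 4,559 tokens for every row, differences in cost come entirely from the output length and the listed rates.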

Est. coach spend: $150.89
Median / call: $0.11
Rows: 26
| Model | Score | Cost / call | Total cost | Avg input | Avg output | Rate in / out |
|---|---|---|---|---|---|---|
| DeepSeek V4 Pro · default | 83.5 | $0.0047 | $0.24 | 4,559 | 3,180 | $0.43 / $0.87 |
| GLM 5.2 · default | 84.0 | $0.03 | $1.30 | 4,559 | 4,455 | $1.40 / $4.40 |
| Gemini 3.1 Pro Preview · default | 78.9 | $0.03 | $1.60 | 4,559 | 1,900 | $2.00 / $12.00 |
| Claude Sonnet 5 · default | 84.6 | $0.05 | $2.50 | 4,559 | 4,079 | $2.00 / $10.00 |
| GPT-5.4 · none | 87.4 | $0.07 | $3.47 | 4,559 | 3,865 | $2.50 / $15.00 |
| GPT-5.4 · xhigh | 89.0 | $0.07 | $3.65 | 4,559 | 4,110 | $2.50 / $15.00 |
| GPT-5.4 · medium | 88.3 | $0.07 | $3.66 | 4,559 | 4,115 | $2.50 / $15.00 |
| GPT-5.4 · high | 89.0 | $0.07 | $3.67 | 4,559 | 4,133 | $2.50 / $15.00 |
| GPT-5.4 · low | 87.4 | $0.07 | $3.68 | 4,559 | 4,142 | $2.50 / $15.00 |
| Claude Opus 4.8 · low | 84.0 | $0.10 | $4.76 | 4,559 | 2,893 | $5.00 / $25.00 |
| Claude Opus 4.7 · low | 85.6 | $0.10 | $4.93 | 4,559 | 3,034 | $5.00 / $25.00 |
| Claude Sonnet 4.6 · default | 84.6 | $0.10 | $5.21 | 4,559 | 6,034 | $3.00 / $15.00 |
| Claude Opus 4.8 · medium | 85.8 | $0.11 | $5.38 | 4,559 | 3,389 | $5.00 / $25.00 |
| Claude Opus 4.7 · medium | 85.6 | $0.11 | $5.52 | 4,559 | 3,508 | $5.00 / $25.00 |
| Claude Opus 4.8 · high | 84.9 | $0.11 | $5.59 | 4,559 | 3,563 | $5.00 / $25.00 |
| Claude Opus 4.7 · high | 86.8 | $0.13 | $6.48 | 4,559 | 4,273 | $5.00 / $25.00 |
| Claude Opus 4.8 · xhigh | 85.2 | $0.13 | $6.63 | 4,559 | 4,393 | $5.00 / $25.00 |
| Claude Opus 4.7 · xhigh | 85.6 | $0.14 | $7.07 | 4,559 | 4,747 | $5.00 / $25.00 |
| Claude Opus 4.8 · max | 85.4 | $0.15 | $7.55 | 4,559 | 5,127 | $5.00 / $25.00 |
| Claude Opus 4.7 · max | 87.3 | $0.16 | $8.11 | 4,559 | 5,574 | $5.00 / $25.00 |
| GPT-5.5 · none | 88.1 | $0.17 | $8.45 | 4,559 | 4,876 | $5.00 / $30.00 |
| GPT-5.5 · low | 87.7 | $0.17 | $8.66 | 4,559 | 5,015 | $5.00 / $30.00 |
| GPT-5.5 · medium | 88.8 | $0.17 | $8.67 | 4,559 | 5,021 | $5.00 / $30.00 |
| GPT-5.5 · xhigh | 89.0 | $0.18 | $8.98 | 4,559 | 5,228 | $5.00 / $30.00 |
| GPT-5.5 · high | 88.6 | $0.18 | $9.02 | 4,559 | 5,253 | $5.00 / $30.00 |
| Claude Fable 5 · high | 87.5 | $0.32 | $16.12 | 4,559 | 5,535 | $10.00 / $50.00 |

Token counts are estimated from the character counts of the saved coach prompt and the saved structured output. Exact Gateway usage, cache reads, reasoning tokens, and any repair retries were not persisted, so these figures should be read as directional cost per coaching result.
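
A character-count token estimate of the kind described here might look like the sketch below. The ratio of four characters per token is a common rule of thumb for English text and is purely an assumption; the page does not say which ratio it uses, and `estimate_tokens` is a hypothetical helper, not the site's code.

```python
import math

# Assumed characters-per-token ratio; the page does not state the value it uses.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Approximate a token count from the length of the text in characters."""
    return math.ceil(len(text) / chars_per_token)


# Illustrative strings only; these are not real prompts or outputs from the benchmark.
saved_prompt = "Coach this sales call transcript and list strengths, flaws, and next moves."
saved_output = '{"strengths": ["clear agenda"], "flaws": ["no discovery"], "next_moves": []}'

print(estimate_tokens(saved_prompt), estimate_tokens(saved_output))
```

An estimate like this ignores tokenizer differences between model families and any hidden reasoning tokens, which is consistent with the note's caution that the costs are directional.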