salesevals.com · Evaluated Jul 1, 2026

Which models know sales?

26 model configurations each coach 50 synthetic sales calls, generated by GPT and Sonnet models with hidden ground truth. A judge scores each coaching note from 0 to 100 on whether it found the real strengths, flaws, and next moves.

Calls: 50
Models: 26
Evaluations: 1300
Benchmark: 86.2

50 calls · 1300 evaluations · Rank: Sales coaching benchmark · All available runs · Build-time static data · Evals completed Jul 1, 2026

All eight scoring axes

How each model scores on every dimension

Cells shaded by score — greener is better. Models sorted by benchmark score.

Average score by model and scorecard dimension.
Model	Overall	Needle recall	Evidence grounding	False-positive control	Prioritization	Actionability	Sales instinct	Technical accuracy
GPT-5.4 · xhigh	89.0	88.1	93.9	89.6	88.2	92.9	90.6	91.5
GPT-5.4 · high	89.0	87.6	93.4	89.9	88.0	92.8	90.6	91.3
GPT-5.5 · medium	88.8	88.5	93.7	88.0	87.8	93.3	90.6	91.2
GPT-5.5 · xhigh	89.0	88.5	93.9	89.2	87.6	93.3	89.9	91.3
GPT-5.5 · high	88.6	88.4	93.6	89.0	86.7	93.4	90.2	91.1
GPT-5.4 · medium	88.3	86.5	93.3	88.8	87.2	92.3	90.0	90.9
GPT-5.5 · none	88.1	86.8	93.5	87.9	86.9	92.9	90.0	91.1
GPT-5.5 · low	87.7	87.0	92.9	87.1	86.3	92.4	89.2	90.4
Claude Fable 5 · high	87.5	86.9	90.1	83.5	86.7	93.0	90.4	89.6
GPT-5.4 · low	87.4	86.0	92.1	86.8	86.3	91.8	89.0	90.4
GPT-5.4 · none	87.4	85.8	92.7	86.7	86.7	91.4	88.7	90.5
Claude Opus 4.7 · max	87.3	86.9	90.6	83.5	85.9	92.7	89.1	89.3
Claude Opus 4.7 · high	86.8	86.1	89.0	82.4	85.3	92.1	88.8	88.8
Claude Opus 4.8 · medium	85.8	84.7	89.7	81.8	83.8	90.5	88.1	88.2
Claude Opus 4.7 · medium	85.6	84.2	89.1	82.6	83.9	91.0	88.2	87.8
Claude Opus 4.7 · xhigh	85.6	84.6	89.0	82.3	83.9	91.6	87.8	88.0
Claude Opus 4.7 · low	85.6	84.1	89.6	82.7	84.1	90.9	87.6	88.5
Claude Opus 4.8 · max	85.4	85.4	88.6	80.9	83.8	91.4	87.6	88.2
Claude Opus 4.8 · xhigh	85.2	84.8	88.7	81.3	83.8	90.4	87.6	88.3
Claude Opus 4.8 · high	84.9	83.6	89.2	81.1	83.7	90.5	87.1	88.6
Claude Sonnet 4.6 · default	84.6	83.8	87.0	79.6	83.2	91.2	87.6	86.6
Claude Sonnet 5 · default	84.6	83.8	88.5	81.0	82.3	89.3	85.9	87.8
Claude Opus 4.8 · low	84.0	82.8	88.5	80.4	81.9	89.3	85.5	87.5
GLM 5.2 · default	84.0	82.2	88.4	80.8	81.6	89.4	85.7	87.3
DeepSeek V4 Pro · default	83.5	81.9	86.9	79.5	81.9	88.5	84.9	86.6
Gemini 3.1 Pro Preview · default	78.9	74.5	86.2	78.0	76.9	84.1	81.4	84.1
Mean	86.3	85.1	90.5	84.0	84.8	91.2	88.2	89.0
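
As a quick consistency check, the Mean row's Overall value and the headline evaluation count can be recomputed from the published numbers. The sketch below copies the Overall column in row order; the variable names are illustrative and not part of the benchmark's own tooling.

```python
from statistics import mean

# Overall column, copied from the table above in row order (26 configurations).
overall = [
    89.0, 89.0, 88.8, 89.0, 88.6, 88.3, 88.1, 87.7, 87.5, 87.4, 87.4, 87.3, 86.8,
    85.8, 85.6, 85.6, 85.6, 85.4, 85.2, 84.9, 84.6, 84.6, 84.0, 84.0, 83.5, 78.9,
]

calls = 50                               # "Calls: 50" from the page header
evaluations = calls * len(overall)       # one evaluation per (call, configuration) pair
overall_mean = round(mean(overall), 1)   # mean of the already-rounded Overall scores

print(evaluations, overall_mean)  # prints: 1300 86.3
```

This reproduces the 86.3 in the Mean row. The headline Benchmark figure of 86.2 differs by 0.1; one possible explanation is that it averages unrounded per-evaluation scores, but the page does not say.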

Coach model only

Estimated coach-run cost

Estimated from the saved visible prompt, saved coach output, and AI Gateway listed input/output rates. Generation and judging are excluded.

Est. coach spend: $150.89
Median / call: $0.11
Rows: 26
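
The per-call figures appear to follow from the average token counts and the listed rates. The sketch below assumes the rates are quoted in USD per one million tokens (the page does not state the unit, though the arithmetic matches) and that each configuration ran all 50 calls; `cost_per_call` is a hypothetical helper name, not the site's code.

```python
def cost_per_call(input_tokens: float, output_tokens: float,
                  rate_in: float, rate_out: float) -> float:
    """Estimated USD cost of one coaching call.

    Assumes rate_in / rate_out are USD per 1,000,000 tokens (inferred, not stated).
    """
    return (input_tokens * rate_in + output_tokens * rate_out) / 1_000_000


CALLS = 50  # calls per configuration, from the page header

# GPT-5.4 · xhigh: 4,559 avg input tokens, 4,110 avg output tokens, $2.50 / $15.00
per_call = cost_per_call(4559, 4110, 2.50, 15.00)
total = per_call * CALLS

print(f"{per_call:.4f} {total:.2f}")  # prints: 0.0730 3.65
```

The result matches the listed $0.07 per call and $3.65 total for that row. Because output rates are several times the input rates, average output length drives most of the spread between configurations of the same model.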

Model	Score	Cost / call	Total cost	Avg input (tokens)	Avg output (tokens)	Rate in / out
DeepSeek V4 Pro · default	83.5	$0.0047	$0.24	4,559	3,180	$0.43 / $0.87
GLM 5.2 · default	84.0	$0.03	$1.30	4,559	4,455	$1.40 / $4.40
Gemini 3.1 Pro Preview · default	78.9	$0.03	$1.60	4,559	1,900	$2.00 / $12.00
Claude Sonnet 5 · default	84.6	$0.05	$2.50	4,559	4,079	$2.00 / $10.00
GPT-5.4 · none	87.4	$0.07	$3.47	4,559	3,865	$2.50 / $15.00
GPT-5.4 · xhigh	89.0	$0.07	$3.65	4,559	4,110	$2.50 / $15.00
GPT-5.4 · medium	88.3	$0.07	$3.66	4,559	4,115	$2.50 / $15.00
GPT-5.4 · high	89.0	$0.07	$3.67	4,559	4,133	$2.50 / $15.00
GPT-5.4 · low	87.4	$0.07	$3.68	4,559	4,142	$2.50 / $15.00
Claude Opus 4.8 · low	84.0	$0.10	$4.76	4,559	2,893	$5.00 / $25.00
Claude Opus 4.7 · low	85.6	$0.10	$4.93	4,559	3,034	$5.00 / $25.00
Claude Sonnet 4.6 · default	84.6	$0.10	$5.21	4,559	6,034	$3.00 / $15.00
Claude Opus 4.8 · medium	85.8	$0.11	$5.38	4,559	3,389	$5.00 / $25.00
Claude Opus 4.7 · medium	85.6	$0.11	$5.52	4,559	3,508	$5.00 / $25.00
Claude Opus 4.8 · high	84.9	$0.11	$5.59	4,559	3,563	$5.00 / $25.00
Claude Opus 4.7 · high	86.8	$0.13	$6.48	4,559	4,273	$5.00 / $25.00
Claude Opus 4.8 · xhigh	85.2	$0.13	$6.63	4,559	4,393	$5.00 / $25.00
Claude Opus 4.7 · xhigh	85.6	$0.14	$7.07	4,559	4,747	$5.00 / $25.00
Claude Opus 4.8 · max	85.4	$0.15	$7.55	4,559	5,127	$5.00 / $25.00
Claude Opus 4.7 · max	87.3	$0.16	$8.11	4,559	5,574	$5.00 / $25.00
GPT-5.5 · none	88.1	$0.17	$8.45	4,559	4,876	$5.00 / $30.00
GPT-5.5 · low	87.7	$0.17	$8.66	4,559	5,015	$5.00 / $30.00
GPT-5.5 · medium	88.8	$0.17	$8.67	4,559	5,021	$5.00 / $30.00
GPT-5.5 · xhigh	89.0	$0.18	$8.98	4,559	5,228	$5.00 / $30.00
GPT-5.5 · high	88.6	$0.18	$9.02	4,559	5,253	$5.00 / $30.00
Claude Fable 5 · high	87.5	$0.32	$16.12	4,559	5,535	$10.00 / $50.00

Token counts are estimated from the character counts of the saved coach prompt and the saved structured output. Exact Gateway usage, cache reads, reasoning tokens, and any repair retries were not persisted, so treat these figures as a directional cost per coaching result.
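
A character-count token estimate of the kind described above might look like the following minimal sketch. The ratio of 4 characters per token is an assumption chosen for illustration (the page does not state the ratio it used), and `estimate_tokens` is a hypothetical name.

```python
import math

# Assumption: roughly 4 characters per token. The benchmark's actual ratio is not published.
CHARS_PER_TOKEN = 4.0


def estimate_tokens(text: str, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Approximate a token count from text length, rounding up to a whole token."""
    return math.ceil(len(text) / chars_per_token)


saved_prompt = "x" * 18_236  # stand-in for a saved coach prompt of 18,236 characters
print(estimate_tokens(saved_prompt))  # prints: 4559
```

An estimate like this ignores tokenizer differences between providers and any hidden reasoning tokens, which is consistent with the note that the costs are directional rather than exact.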