Default all-Exa setups overpay up to 8.2× — here's the per-class routing that fixes it.
Naive 'all-Exa' setups overpay by up to 8.2x. Routing each query class to the cheapest engine that clears its quality bar (blended $0.85/1k vs $7/1k flat) closes the gap. Exa stays the right call where it wins; the savings come from not defaulting everything to it.
The comparison is between a naive all-Exa default and an optimized per-class router. The 8.2x figure is the ceiling a naive default leaves on the table, not a claim that Exa is overpriced. Best-value picks are 'cheapest for THESE query types' at list prices (see the "On price" note under Methodology).
The headline is a cost finding — cost/1k and latency are measured on all 170 queries. Quality is judged on a 62-query gold subset and shown as indicative per class; we do not rank on quality.
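The headline ratio is plain arithmetic on the two figures quoted above, and it can be checked in a few lines. The sketch below also shows one way a blended cost could be computed, assuming it is a query-weighted average over the five benchmark classes. The function names and the per-class prices in `example_costs` are hypothetical placeholders, not values from the dataset.

```python
# Figures quoted in the text, in USD per 1,000 queries.
FLAT_ALL_EXA_PER_1K = 7.00
BLENDED_ROUTED_PER_1K = 0.85

# Benchmark class sizes from the methodology (170 queries in total).
QUERY_COUNTS = {
    "academic": 35,
    "news": 33,
    "page_lookup": 33,
    "direct_answer": 34,
    "web": 35,
}


def overpay_ratio(flat_per_1k: float, routed_per_1k: float) -> float:
    """Return how many times more the flat default costs than the routed blend."""
    if routed_per_1k <= 0:
        raise ValueError("routed cost must be positive")
    return flat_per_1k / routed_per_1k


def blended_cost_per_1k(query_counts: dict, cost_per_1k_by_class: dict) -> float:
    """Query-weighted average cost per 1,000 queries (assumed blending rule)."""
    total = sum(query_counts.values())
    if total == 0:
        raise ValueError("query mix is empty")
    weighted = sum(n * cost_per_1k_by_class[c] for c, n in query_counts.items())
    return weighted / total


print(f"{overpay_ratio(FLAT_ALL_EXA_PER_1K, BLENDED_ROUTED_PER_1K):.1f}x")  # 8.2x

# HYPOTHETICAL per-class prices, for illustration only (not measured values).
example_costs = {
    "academic": 1.50,
    "news": 1.00,
    "page_lookup": 0.85,
    "direct_answer": 0.50,
    "web": 0.30,
}
print(round(blended_cost_per_1k(QUERY_COUNTS, example_costs), 2))
```

With your own traffic mix in place of `QUERY_COUNTS`, the same function gives a blended figure for your workload rather than for the benchmark's.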
Best value by query class
Each class is ranked by cost, then latency. “Best value” is the cheapest engine that clears the class quality bar, defined as scoring within a 0.05 tolerance of the class quality leader (computed in the dataset). Quality is indicative; see methodology.
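The selection rule described above can be sketched as follows. This is a minimal illustration, not the production router: `EngineStats`, `best_value`, and the engine names and numbers are all hypothetical. It assumes an engine with unmeasured quality (`None`) is excluded from selection rather than treated as 0.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EngineStats:
    name: str
    cost_per_1k: float        # USD per 1,000 queries
    latency_ms: float
    quality: Optional[float]  # None means "not measured", which is not the same as 0


def best_value(engines: list, tolerance: float = 0.05) -> Optional[EngineStats]:
    """Pick the cheapest engine within `tolerance` of the quality leader.

    Ties on cost are broken by lower latency.
    """
    # Assumption: engines without a quality measurement cannot clear the bar.
    measured = [e for e in engines if e.quality is not None]
    if not measured:
        return None
    leader_quality = max(e.quality for e in measured)
    bar = leader_quality - tolerance
    eligible = [e for e in measured if e.quality >= bar]
    return min(eligible, key=lambda e: (e.cost_per_1k, e.latency_ms))


# HYPOTHETICAL engines and scores for one query class.
engines = [
    EngineStats("engine_a", cost_per_1k=7.00, latency_ms=900, quality=0.83),
    EngineStats("engine_b", cost_per_1k=1.00, latency_ms=600, quality=0.80),
    EngineStats("engine_c", cost_per_1k=0.50, latency_ms=400, quality=0.60),
    EngineStats("engine_d", cost_per_1k=0.30, latency_ms=300, quality=None),
]

print(best_value(engines).name)  # engine_b
```

Here `engine_a` leads on quality, but `engine_b` sits inside the 0.05 band at a lower price, so it is the pick. `engine_c` is cheaper still but falls below the bar, and `engine_d` is skipped because it has no quality score.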
Head-to-head comparisons
The best search API for AI agents and RAG depends on your query mix.
Read the full report
Detailed findings, methodology, per-class breakdowns, and caveats — the version worth linking to.
Methodology
Credibility is the point — here's exactly how this was measured, and where it's indicative.
- Queries: 170 class-discriminating benchmark queries across 5 classes (Academic 35, News 33, Page lookup 33, Direct answer 34, Web 35).
- Quality judging: 62 queries scored by an LLM judge validated against human gold labels. Quadratic-weighted κ = 0.78 (target 0.6, passes).
- What's measured where: Quality scores come from the 62 judged queries (kappa_quad = 0.7776). Latency and per-query cost come from all 170 benchmark queries in bench_v2_raw.jsonl. cost_per_1k is read from the _COST_PER_1K table in the production router's static.py.
- Cache economics: Cache hit rate ranges from 9% to 46% across traffic-shape scenarios, with a baseline blended hit rate of about 46%. Caching compounds the routing savings but does not carry the margin on its own.
- Sources: bench_v2_raw.jsonl, bench_queries_v2.jsonl, bench_queries_v2.README.md, gold_judgments.jsonl, judge_kappa_report.json, route_bench_report.json, cache_sensitivity_report.json, groundroute/src/groundroute/router/static.py (_COST_PER_1K)
- Generated: 2026-06-14T07:40:03Z
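For readers unfamiliar with the agreement statistic used to validate the judge, here is a minimal sketch of quadratic-weighted Cohen's kappa in plain Python. It assumes both raters score on the same ordinal scale encoded as integers `0..num_levels-1`. The actual rating scale and the sample labels below are assumptions for illustration and are not taken from gold_judgments.jsonl.

```python
def quadratic_weighted_kappa(rater_a: list, rater_b: list, num_levels: int) -> float:
    """Quadratic-weighted Cohen's kappa for two raters on an ordinal scale."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must score the same non-empty set of items")
    if num_levels < 2:
        raise ValueError("need at least two rating levels")

    n = len(rater_a)
    k = num_levels

    # Observed confusion matrix: rows are rater A, columns are rater B.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1

    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2   # penalty grows with distance
            expected = hist_a[i] * hist_b[j] / n   # count expected by chance
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected

    if weighted_expected == 0:
        # Both raters used one identical level for every item.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


# HYPOTHETICAL labels on a 0-2 scale, for illustration only.
judge = [0, 1, 2, 2, 1, 0, 2, 1]
human = [0, 1, 2, 1, 1, 0, 2, 2]

kappa = quadratic_weighted_kappa(judge, human, num_levels=3)
TARGET = 0.6  # pass threshold quoted in the methodology
print(f"kappa={kappa:.3f} passes={kappa >= TARGET}")
```

Quadratic weighting penalizes a two-step disagreement four times as heavily as a one-step disagreement, which suits an ordinal quality scale where near misses matter less than opposite verdicts.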
On price: List prices as of 2026; actual cost varies by plan tier and volume. Notably, Serper lists at $1.00/1k but drops to ~$0.30/1k at scale, and Firecrawl lists at $0.85/1k but its 'Enhanced Mode' costs ~5x for bot-protected sites. Comparisons use published list prices for a like-for-like baseline.
- Perplexity quality is shown as 'not measured' (null), NOT 0: its API key was non-functional during this run so it returned no parseable results. We publish its cost/latency but make NO quality claim — a 0 would be a data artifact, not a verdict. Re-run with a working key to measure it.
- Quality is judged on the 62-query gold subset; per-class judged-n is small (10-17), so per-class quality is INDICATIVE — the headline rests on COST (covered on all 170 queries), not on a quality ranking.
- Quality scores are tightly clustered (judge granularity ~0.083 = one result of three). Within-class quality_leader margins are small; best_value uses a 0.05 tolerance band around the leader to define 'clears the bar'.
- Queries are synthetic class-discriminating probes (see bench_queries_v2.README.md), not real partner traffic; treat rankings as directional.
Stop overpaying for search.
GroundRoute routes each query to the cheapest engine that clears your bar — with caching on top. You pay 50% of what the cache saves you, never more than going direct.
© 2026 GroundRoute, Inc. · Benchmark generated 2026-06-14T07:40:03Z · numbers traceable to the published dataset.