State of AI Search · 2026

Default all-Exa setups overpay up to 8.2× — here's the per-class routing that fixes it.

Naive 'all-Exa' setups overpay up to 8.2x — routing each query class to the cheapest engine that clears its quality bar (blended $0.85/1k vs $7/1k flat) closes the gap. Exa stays the right call where it wins; the savings come from not defaulting everything to it.

The comparison is a naive all-Exa default vs an optimized per-class router: the 8.2x is the ceiling a naive default leaves on the table, not a claim that Exa is overpriced. Best-value picks are the cheapest engines for these query types at list prices (see the "On price" note below).

  • 8.24× overpay: default-Exa ($7.00/1k) vs cheapest-that-clears-the-bar ($0.85/1k blended)
  • 170 benchmark queries across 5 query classes, with cost and latency measured on all of them
  • 6 search APIs compared on the same queries, one endpoint
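
The headline ratio follows from the two per-1k prices. A minimal Python sketch of the arithmetic, assuming the query mix equals the per-class counts listed under Methodology and that every class routes to its best-value engine. The names `CLASS_COUNTS`, `BEST_VALUE_COST` and `blended_cost_per_1k` are illustrative and not part of the published dataset:

```python
# Per-class query counts (from the Methodology section) and the list price
# per 1k of the best-value engine in each class (firecrawl in all five).
CLASS_COUNTS = {
    "academic": 35,
    "news": 33,
    "page_lookup": 33,
    "direct_answer": 34,
    "web": 35,
}
BEST_VALUE_COST = {name: 0.85 for name in CLASS_COUNTS}
FLAT_EXA_COST = 7.00  # default-Exa list price per 1k


def blended_cost_per_1k(counts, cost_per_1k):
    """Query-weighted average cost per 1k across classes."""
    total = sum(counts.values())
    return sum(counts[c] * cost_per_1k[c] for c in counts) / total


blended = blended_cost_per_1k(CLASS_COUNTS, BEST_VALUE_COST)
overpay = FLAT_EXA_COST / blended
print(f"blended ${blended:.2f}/1k, overpay {overpay:.2f}x")
```

Because the same engine is best value in every class, the blended figure is independent of the mix here. It would change with the mix if different classes routed to differently priced engines.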

The headline is a cost finding — cost/1k and latency are measured on all 170 queries. Quality is judged on a 62-query gold subset and shown as indicative per class; we do not rank on quality.

Best value by query class

Each class ranked by cost then latency. “Best value” is the cheapest engine that clears the class quality bar (computed in the dataset, 0.05 tolerance). Quality is indicative — see methodology.
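
The selection rule can be sketched in Python as follows. This is a hedged illustration, not the dataset's own code: it assumes "clears the bar" means quality within 0.05 of the class leader, excludes engines with no measured quality, and breaks cost ties by engine name because latency was not measured. The function name `best_value` and the data layout are hypothetical; the numbers are the Page lookup rows from the tables below.

```python
from typing import Dict, Optional, Tuple

TOLERANCE = 0.05  # tolerance band around the class quality leader

# engine -> (indicative quality as a fraction, or None; list cost per 1k)
PAGE_LOOKUP: Dict[str, Tuple[Optional[float], float]] = {
    "firecrawl": (0.89, 0.85),
    "serper": (0.88, 1.00),
    "brave": (0.91, 5.00),
    "perplexity": (None, 5.00),  # quality not measured: cannot clear the bar
    "exa": (0.87, 7.00),
    "tavily": (0.88, 8.00),
}


def best_value(rows, tolerance: float = TOLERANCE) -> Optional[str]:
    """Return the cheapest engine whose quality is within `tolerance` of the leader."""
    scored = {e: (q, c) for e, (q, c) in rows.items() if q is not None}
    if not scored:
        return None
    bar = max(q for q, _ in scored.values()) - tolerance
    # Small epsilon guards against floating-point noise at the edge of the band.
    eligible = [(c, e) for e, (q, c) in scored.items() if q >= bar - 1e-9]
    return min(eligible)[1]


print(best_value(PAGE_LOOKUP))
```

With these rows the leader is brave at 0.91, so the bar is 0.86 and firecrawl is the cheapest engine at or above it.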

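Academic · 35 queries · cheapest: firecrawl · fastest: not measured

| Engine | Quality · indicative (n=11) | p50 latency | p95 latency | Cost / 1k |
| --- | --- | --- | --- | --- |
| firecrawl (cheapest) | 91% | not measured | not measured | $0.85 / 1k |
| serper | 90% | not measured | not measured | $1.00 / 1k |
| brave | 88% | not measured | not measured | $5.00 / 1k |
| perplexity | not measured | not measured | not measured | $5.00 / 1k |
| exa | 91% | not measured | not measured | $7.00 / 1k |
| tavily | 86% | not measured | not measured | $8.00 / 1k |

Cheapest: firecrawl · Fastest: not measured. The cheapest engine is not necessarily the fastest: for high-fan-out agents (100+ searches/run) latency compounds, so we route on both cost and latency, not price alone. Note: Firecrawl's list price is $0.85/1k; Enhanced Mode for bot-protected sites costs ~5x.
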
News · 33 queries · cheapest: firecrawl · fastest: not measured

| Engine | Quality · indicative (n=10) | p50 latency | p95 latency | Cost / 1k |
| --- | --- | --- | --- | --- |
| firecrawl (cheapest) | 83% | not measured | not measured | $0.85 / 1k |
| serper | 58% | not measured | not measured | $1.00 / 1k |
| brave | 81% | not measured | not measured | $5.00 / 1k |
| perplexity | not measured | not measured | not measured | $5.00 / 1k |
| exa | 83% | not measured | not measured | $7.00 / 1k |
| tavily | 77% | not measured | not measured | $8.00 / 1k |

Page lookup · 33 queries · cheapest: firecrawl · fastest: not measured

| Engine | Quality · indicative (n=10) | p50 latency | p95 latency | Cost / 1k |
| --- | --- | --- | --- | --- |
| firecrawl (cheapest) | 89% | not measured | not measured | $0.85 / 1k |
| serper | 88% | not measured | not measured | $1.00 / 1k |
| brave | 91% | not measured | not measured | $5.00 / 1k |
| perplexity | not measured | not measured | not measured | $5.00 / 1k |
| exa | 87% | not measured | not measured | $7.00 / 1k |
| tavily | 88% | not measured | not measured | $8.00 / 1k |

Direct answer · 34 queries · cheapest: firecrawl · fastest: not measured

| Engine | Quality · indicative (n=14) | p50 latency | p95 latency | Cost / 1k |
| --- | --- | --- | --- | --- |
| firecrawl (cheapest) | 85% | not measured | not measured | $0.85 / 1k |
| serper | 83% | not measured | not measured | $1.00 / 1k |
| brave | 86% | not measured | not measured | $5.00 / 1k |
| perplexity | not measured | not measured | not measured | $5.00 / 1k |
| exa | 76% | not measured | not measured | $7.00 / 1k |
| tavily | 82% | not measured | not measured | $8.00 / 1k |

Web · 35 queries · cheapest: firecrawl · fastest: not measured

| Engine | Quality · indicative (n=17) | p50 latency | p95 latency | Cost / 1k |
| --- | --- | --- | --- | --- |
| firecrawl (cheapest) | 84% | not measured | not measured | $0.85 / 1k |
| serper | 83% | not measured | not measured | $1.00 / 1k |
| brave | 88% | not measured | not measured | $5.00 / 1k |
| perplexity | not measured | not measured | not measured | $5.00 / 1k |
| exa | 79% | not measured | not measured | $7.00 / 1k |
| tavily | 81% | not measured | not measured | $8.00 / 1k |

Head-to-head comparisons

The best search API for AI agents and RAG depends on your query mix.

Methodology

Credibility is the point — here's exactly how this was measured, and where it's indicative.

Queries
170 class-discriminating benchmark queries across 5 classes (Academic 35, News 33, Page lookup 33, Direct answer 34, Web 35).
Quality judging
62 queries judged by an LLM judge that was validated against human gold labels: quadratic-weighted κ = 0.78 (target 0.6, passes).
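
For readers unfamiliar with the agreement statistic: quadratic-weighted Cohen's κ penalizes each judge/human disagreement by the squared distance between the two ratings. The following is a minimal sketch of the standard formula, assuming integer rating categories 0..k-1. It is not the project's validation script, and the ratings in the example are made up for illustration:

```python
from typing import Sequence


def quadratic_weighted_kappa(a: Sequence[int], b: Sequence[int], k: int) -> float:
    """Cohen's kappa with quadratic weights for two raters over categories 0..k-1."""
    if len(a) != len(b) or not a:
        raise ValueError("rating lists must be non-empty and of equal length")
    n = len(a)
    # Observed confusion matrix between rater a (rows) and rater b (columns).
    observed = [[0] * k for _ in range(k)]
    for x, y in zip(a, b):
        observed[x][y] += 1
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2
            weighted_observed += weight * observed[i][j]
            # Expected count under independence of the two raters.
            weighted_expected += weight * hist_a[i] * hist_b[j] / n
    if weighted_expected == 0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return 1.0 - weighted_observed / weighted_expected


# Illustrative ratings only (not taken from gold_judgments.jsonl).
judge = [0, 1, 2, 3, 4, 4, 2, 1]
human = [0, 1, 2, 3, 4, 3, 2, 2]
print(round(quadratic_weighted_kappa(judge, human, k=5), 3))
```

A value of 1 means perfect agreement, 0 means agreement no better than chance, and negative values mean systematic disagreement.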
What's measured where
QUALITY scores come from the 62 judged queries (judge validated against human gold, kappa_quad=0.7776); LATENCY and per-query COST come from all 170 benchmark queries in bench_v2_raw.jsonl. cost_per_1k is read from the production router static.py _COST_PER_1K table.
Cache economics
Cache hit rate ranges from 9% to 46% across traffic-shape scenarios; the baseline blended hit rate is ~46%. Caching compounds the routing savings but does not carry the margin on its own.
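
A rough sketch of how the hit rate compounds with routing, under the assumption that a cache hit incurs no engine cost and a miss is billed at the blended routed price of $0.85/1k. The source does not spell out this billing model, and `effective_engine_cost` is an illustrative name:

```python
BLENDED_COST_PER_1K = 0.85  # routed list-price blend from the headline


def effective_engine_cost(cost_per_1k: float, hit_rate: float) -> float:
    """Engine spend per 1k queries when a fraction `hit_rate` is served from cache."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return cost_per_1k * (1.0 - hit_rate)


# Low and high ends of the reported hit-rate band.
for hit_rate in (0.09, 0.46):
    cost = effective_engine_cost(BLENDED_COST_PER_1K, hit_rate)
    print(f"hit rate {hit_rate:.0%}: ${cost:.2f}/1k engine spend")
```

Under these assumptions engine spend falls to roughly $0.77/1k at a 9% hit rate and $0.46/1k at 46%. Most of the gap to $7.00/1k still comes from routing, which matches the statement that caching alone does not carry the margin.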
Sources
bench_v2_raw.jsonl, bench_queries_v2.jsonl, bench_queries_v2.README.md, gold_judgments.jsonl, judge_kappa_report.json, route_bench_report.json, cache_sensitivity_report.json, groundroute/src/groundroute/router/static.py (_COST_PER_1K)
Generated
2026-06-14T07:40:03Z

On price: List prices as of 2026; actual cost varies by plan tier and volume. Notably, Serper lists at $1.00/1k but drops to ~$0.30/1k at scale, and Firecrawl lists at $0.85/1k but its Enhanced Mode for bot-protected sites costs ~5x. Comparisons use published list prices for a like-for-like baseline.
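
To gauge how sensitive the Firecrawl pick is to Enhanced Mode, here is a small sketch. It assumes the multiplier is exactly 5x (the source says "~5x") and that it applies per request to a fixed share of traffic. Both function names are hypothetical:

```python
FIRECRAWL_LIST = 0.85       # list price per 1k
ENHANCED_MULTIPLIER = 5.0   # assumed exact; the source gives "~5x"


def firecrawl_effective_cost(enhanced_share: float,
                             list_price: float = FIRECRAWL_LIST,
                             multiplier: float = ENHANCED_MULTIPLIER) -> float:
    """Cost per 1k when `enhanced_share` of requests need Enhanced Mode."""
    if not 0.0 <= enhanced_share <= 1.0:
        raise ValueError("enhanced_share must be between 0 and 1")
    return list_price * (1.0 + (multiplier - 1.0) * enhanced_share)


def break_even_share(competitor_price: float,
                     list_price: float = FIRECRAWL_LIST,
                     multiplier: float = ENHANCED_MULTIPLIER) -> float:
    """Enhanced Mode share at which Firecrawl costs the same as a competitor."""
    return (competitor_price - list_price) / (list_price * (multiplier - 1.0))


share = break_even_share(1.00)  # against Serper's list price
print(f"break-even Enhanced Mode share vs Serper list: {share:.1%}")
```

Under these assumptions Firecrawl stops being cheaper than Serper's list price once about 4.4% of requests need Enhanced Mode, so the best-value pick depends on how much bot-protected traffic a workload contains.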

  • Perplexity quality is shown as 'not measured' (null), NOT 0: its API key was non-functional during this run so it returned no parseable results. We publish its cost/latency but make NO quality claim — a 0 would be a data artifact, not a verdict. Re-run with a working key to measure it.
  • Quality is judged on the 62-query gold subset; per-class judged-n is small (10-17), so per-class quality is INDICATIVE — the headline rests on COST (covered on all 170 queries), not on a quality ranking.
  • Quality scores are tightly clustered (judge granularity ~0.083 = one result of three). Within-class quality_leader margins are small; best_value uses a 0.05 tolerance band around the leader to define 'clears the bar'.
  • Queries are synthetic class-discriminating probes (see bench_queries_v2.README.md), not real partner traffic; treat rankings as directional.

Stop overpaying for search.

GroundRoute routes each query to the cheapest engine that clears your bar — with caching on top. You pay 50% of what the cache saves you, never more than going direct.
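
One way to read the fee statement is sketched below, assuming "what the cache saves you" means the engine spend avoided by cache hits. The function name `amount_paid` and the dollar figures are illustrative, not published pricing terms:

```python
FEE_SHARE = 0.5  # share of cache savings charged as the fee


def amount_paid(direct_cost: float, cache_savings: float,
                fee_share: float = FEE_SHARE) -> float:
    """Total paid: engine spend after cache savings, plus the fee on those savings."""
    if not 0.0 <= cache_savings <= direct_cost:
        raise ValueError("cache_savings must be between 0 and direct_cost")
    return direct_cost - cache_savings + fee_share * cache_savings


# Hypothetical month: $100 of direct engine spend, $46 avoided by cache hits.
print(amount_paid(100.0, 46.0))
```

Because the fee is a fraction of the savings, the total can never exceed the direct cost under this reading. With no cache savings the two are equal.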

© 2026 GroundRoute, Inc. · Benchmark generated 2026-06-14T07:40:03Z · numbers traceable to the published dataset.