State of AI Search · 2026 · GroundRoute Benchmark

We benchmarked 6 search APIs on 170 AI-agent queries

AI agents fan out 80–240 web searches per task, and many setups default every query to one premium engine. We benchmarked six engines (Serper, Brave, Exa, Tavily, Perplexity, Firecrawl) on 170 class-balanced agent queries. The robust finding is cost: a naive all-Exa default runs ~$7/1k, while routing each query class to the cheapest engine that clears its quality bar blends to ~$0.85/1k, a difference of up to 8.24×. Quality differences were small and we treat them as indicative (judged on a 62-query gold subset, small per-class n). Here's everything, caveats included.

8.24×
all-Exa default ($7.00/1k) vs cheapest-that-clears-the-bar ($0.85/1k blended)
170
benchmark queries — cost & latency measured on all of them
62
gold-subset queries judged for quality (κ=0.78 vs human gold)

Methodology — honest, up front

We lead with caveats because posts like this get scrutinised on HN, and stating the methodology up front is how the numbers earn trust.

Queries
170 synthetic class-discriminating probes across 5 classes (Academic 35, News 33, Page lookup 33, Direct answer 34, Web 35). Treat rankings as directional — not real partner traffic.
Cost axis
Measured on all 170 queries (the robust axis). List prices as of 2026.
Quality axis
Judged on a 62-query gold subset by an LLM judge validated against human gold at quadratic-weighted κ=0.78 (target 0.6, passes). Per-class judged-n is 10–17 → per-class quality is indicative. We don't rank on quality.
Cache economics
Exact-only simulation. Baseline blended hit-rate ~46%; the band is 9–46% across traffic-shape scenarios. Caching compounds the routing savings but does not carry margin alone.
Reproducible
spikes/rankings/build_rankings.py

Price caveat: List prices as of 2026; actual cost varies by plan tier and volume. Notably: Serper drops to ~$0.30/1k at scale; Firecrawl 'Enhanced Mode' is ~5x for bot-protected sites. Comparisons use published list prices for a like-for-like baseline.

Finding 1 — Cost is where the money is (the robust result)

Cost and latency are measured on all 170 queries, so this is the finding we're confident in. All-Exa ($7/1k) vs route-to-cheapest-that-clears-the-bar (~$0.85/1k) = 8.24×. Honest framing: this is the ceiling a naive default leaves on the table, not a claim that Exa is overpriced. Exa stays the right call where it wins; the savings come from not defaulting everything to a premium engine.
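
To make the headline arithmetic easy to check, here is a minimal sketch that recomputes the ratio and converts it to dollars per agent task. The prices and the 80–240 fan-out range come from this post; the function name cost_per_task is ours, for illustration only.

```python
# List prices per 1,000 searches, taken from this post.
ALL_EXA_PER_1K = 7.00
ROUTED_PER_1K = 0.85  # blended cost of cheapest-that-clears-the-bar routing


def cost_per_task(price_per_1k: float, searches: int) -> float:
    """Dollar cost of one agent task that issues `searches` web searches."""
    return price_per_1k * searches / 1000


ratio = ALL_EXA_PER_1K / ROUTED_PER_1K
print(f"ratio: {ratio:.2f}x")

for searches in (80, 240):  # fan-out range quoted in the introduction
    exa = cost_per_task(ALL_EXA_PER_1K, searches)
    routed = cost_per_task(ROUTED_PER_1K, searches)
    print(f"{searches} searches/task: all-Exa ${exa:.3f}, routed ${routed:.3f}")
```

Per task the absolute numbers are small; the gap matters once it is multiplied across many tasks per day.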

Cost & latency, all 170 queries, sorted by cost / 1k (cheapest first)
Engine | Cost / 1k searches | p50 latency | Notes
firecrawl | $0.85 / 1k | 1,450 ms | list $0.85/1k; ~5x (Enhanced Mode) for bot-protected sites
serper | $1.00 / 1k | 713 ms | list $1.00/1k; ~$0.30/1k at scale
brave | $5.00 / 1k | 614 ms |
perplexity | $5.00 / 1k | 3,538 ms |
exa | $7.00 / 1k | 1,246 ms |
tavily | $8.00 / 1k | 1,407 ms |
Cheapest ≠ fastest: Brave/Serper lead latency (~600–700 ms); Firecrawl is cheapest but ~1,450 ms.
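
A rough sketch of why latency matters at high fan-out: it multiplies the measured p50 latencies by the number of searches in a task. It assumes every search runs one after another at exactly p50, which is a simplification (real agents may parallelise, and tail latency differs); the function name is hypothetical.

```python
# p50 latencies in milliseconds, copied from the cost and latency table.
P50_MS = {
    "brave": 614,
    "serper": 713,
    "exa": 1246,
    "tavily": 1407,
    "firecrawl": 1450,
    "perplexity": 3538,
}


def sequential_search_seconds(engine: str, searches: int) -> float:
    """Seconds spent waiting on search if all calls run sequentially at p50.

    Assumption: no parallelism and no variance around p50.
    """
    return P50_MS[engine] * searches / 1000


for engine in ("brave", "firecrawl"):
    secs = sequential_search_seconds(engine, 240)
    print(f"{engine}: {secs:.0f} s for 240 sequential searches")
```

Under those assumptions the cheapest engine spends more than twice as long waiting as the fastest one on a 240-search task, which is why price alone is not a sufficient routing signal.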

Finding 2 — Quality is tightly clustered (and we won't overclaim it)

Overall quality scores sat in a narrow band (judge scale 0–3; metric = mean fraction of results judged relevant). Margins are within roughly one judge increment, so we don't crown a definitive quality winner. Perplexity is “not measured,” not zero — its key was non-functional that run.

Quality (overall), 62-query gold subset, indicative · judge κ=0.78
Engine | Quality score (indicative) | Measured?
brave | 87% | yes
firecrawl | 86% | yes
exa | 83% | yes
tavily | 83% | yes
serper | 81% | yes
perplexity | not measured | key down
Quality is indicative (n=62); the headline rests on cost (all 170 queries).
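
For readers unfamiliar with the judge-validation metric, the following is a minimal sketch of quadratic-weighted Cohen's kappa on a 0–3 scale. It is a textbook formulation, not the code from our pipeline, and the two rating lists are made-up illustrative values, not benchmark data.

```python
from collections import Counter


def quadratic_weighted_kappa(a: list[int], b: list[int], num_labels: int = 4) -> float:
    """Cohen's kappa with quadratic weights for labels 0..num_labels-1."""
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and the same length")
    n = len(a)
    observed = Counter(zip(a, b))            # joint counts O[i, j]
    hist_a, hist_b = Counter(a), Counter(b)  # per-rater label counts
    num = den = 0.0
    for i in range(num_labels):
        for j in range(num_labels):
            weight = (i - j) ** 2 / (num_labels - 1) ** 2
            expected = hist_a[i] * hist_b[j] / n  # count expected by chance
            num += weight * observed[(i, j)]
            den += weight * expected
    if den == 0:
        # Both raters used one identical label throughout; treat as full agreement.
        return 1.0
    return 1.0 - num / den


# Hypothetical ratings for illustration only.
judge = [3, 2, 0, 1, 3, 2]
human = [3, 2, 1, 1, 2, 2]
print(f"kappa = {quadratic_weighted_kappa(judge, human):.2f}")
```

Quadratic weights penalise a 0-vs-3 disagreement far more than a 2-vs-3 one, which suits an ordinal relevance scale.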

Finding 3 — No single engine wins every query class

Per-class quality leaders (indicative, small n): academic → Firecrawl · news → Firecrawl · page → Brave · answer → Brave · web → Brave. Notably, the two priciest engines (Exa $7, Tavily $8) didn't lead any class outright in this set, although Exa tied Firecrawl on academic (91%) and news (83%) at the displayed precision. The takeaway isn't “engine X is best”; it's that the best engine changes by query type, so routing per class beats a single default.
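
The routing rule can be sketched in a few lines. This is an illustrative reconstruction, not the contents of spikes/rankings/build_rankings.py: it uses the rounded percentages from the per-class tables in this section, the 0.05 tolerance band described in the caveats, and the class sizes from the methodology. The names best_value, routes and blended are ours.

```python
# Quality (percent) and list price per 1k, copied from this post.
# Perplexity is omitted because its quality was not measured.
PRICE_PER_1K = {"firecrawl": 0.85, "serper": 1.00, "brave": 5.00,
                "exa": 7.00, "tavily": 8.00}
QUALITY_PCT = {
    "academic": {"firecrawl": 91, "serper": 90, "brave": 88, "exa": 91, "tavily": 86},
    "news":     {"firecrawl": 83, "serper": 58, "brave": 81, "exa": 83, "tavily": 77},
    "page":     {"firecrawl": 89, "serper": 88, "brave": 91, "exa": 87, "tavily": 88},
    "answer":   {"firecrawl": 85, "serper": 83, "brave": 86, "exa": 76, "tavily": 82},
    "web":      {"firecrawl": 84, "serper": 83, "brave": 88, "exa": 79, "tavily": 81},
}
CLASS_COUNTS = {"academic": 35, "news": 33, "page": 33, "answer": 34, "web": 35}
TOLERANCE_PCT = 5  # the 0.05 band around the class leader


def best_value(scores: dict[str, int]) -> str:
    """Cheapest engine whose quality is within the tolerance band of the leader."""
    bar = max(scores.values()) - TOLERANCE_PCT
    eligible = [engine for engine, q in scores.items() if q >= bar]
    return min(eligible, key=PRICE_PER_1K.__getitem__)


routes = {cls: best_value(scores) for cls, scores in QUALITY_PCT.items()}
total = sum(CLASS_COUNTS.values())
blended = sum(PRICE_PER_1K[routes[c]] * n for c, n in CLASS_COUNTS.items()) / total

print(routes)
print(f"blended: ${blended:.2f} / 1k")
```

On these rounded inputs every class routes to Firecrawl, which is why the blended figure equals its list price. The news class shows the bar doing work: Serper is the second-cheapest engine but at 58% it falls outside the band and would be excluded if Firecrawl were unavailable.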

Academic · 35 queries · cheapest: firecrawl · fastest: not measured
Engine | Quality · indicative (n=11) | p50 / p95 latency | Cost / 1k
firecrawl (cheapest) | 91% | not measured | $0.85 / 1k
serper | 90% | not measured | $1.00 / 1k
brave | 88% | not measured | $5.00 / 1k
perplexity | quality not measured | not measured | $5.00 / 1k
exa | 91% | not measured | $7.00 / 1k
tavily | 86% | not measured | $8.00 / 1k
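Cheapest: firecrawl · Fastest: not measured per class. Cheapest ≠ fastest in the overall latency table, and for high-fan-out agents (100+ searches/run) latency compounds, so we route on both cost and latency, not price alone. Note: Firecrawl's list price is $0.85/1k; ~5x (Enhanced Mode) for bot-protected sites. This note applies to all five classes.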
News · 33 queries · cheapest: firecrawl · fastest: not measured
Engine | Quality · indicative (n=10) | p50 / p95 latency | Cost / 1k
firecrawl (cheapest) | 83% | not measured | $0.85 / 1k
serper | 58% | not measured | $1.00 / 1k
brave | 81% | not measured | $5.00 / 1k
perplexity | quality not measured | not measured | $5.00 / 1k
exa | 83% | not measured | $7.00 / 1k
tavily | 77% | not measured | $8.00 / 1k
Page lookup · 33 queries · cheapest: firecrawl · fastest: not measured
Engine | Quality · indicative (n=10) | p50 / p95 latency | Cost / 1k
firecrawl (cheapest) | 89% | not measured | $0.85 / 1k
serper | 88% | not measured | $1.00 / 1k
brave | 91% | not measured | $5.00 / 1k
perplexity | quality not measured | not measured | $5.00 / 1k
exa | 87% | not measured | $7.00 / 1k
tavily | 88% | not measured | $8.00 / 1k
Direct answer · 34 queries · cheapest: firecrawl · fastest: not measured
Engine | Quality · indicative (n=14) | p50 / p95 latency | Cost / 1k
firecrawl (cheapest) | 85% | not measured | $0.85 / 1k
serper | 83% | not measured | $1.00 / 1k
brave | 86% | not measured | $5.00 / 1k
perplexity | quality not measured | not measured | $5.00 / 1k
exa | 76% | not measured | $7.00 / 1k
tavily | 82% | not measured | $8.00 / 1k
Web · 35 queries · cheapest: firecrawl · fastest: not measured
Engine | Quality · indicative (n=17) | p50 / p95 latency | Cost / 1k
firecrawl (cheapest) | 84% | not measured | $0.85 / 1k
serper | 83% | not measured | $1.00 / 1k
brave | 88% | not measured | $5.00 / 1k
perplexity | quality not measured | not measured | $5.00 / 1k
exa | 79% | not measured | $7.00 / 1k
tavily | 81% | not measured | $8.00 / 1k

Finding 4 — Caching compounds the savings

On an exact-only cache simulation, repeated agent queries hit ~46% at baseline (band 9–46% across traffic shapes; go-line is 33%, kill-floor 20%). Caching compounds the routing savings but doesn't carry margin alone.

Method
exact-only cache simulation (cache_sensitivity_report.json)
Baseline hit-rate
46% blended across traffic shapes
Go / kill lines
Go-line 33% · kill-floor 20%
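
To show what "exact-only" means in practice, here is a minimal sketch of an exact-match hit-rate count over a query stream, plus the engine spend left after caching. It is not the simulation behind cache_sensitivity_report.json; the sample stream is invented, and treating a cache hit as costing nothing is an assumption.

```python
def exact_hit_rate(queries: list[str]) -> float:
    """Fraction of queries whose exact string already appeared earlier in the stream."""
    seen: set[str] = set()
    hits = 0
    for q in queries:
        if q in seen:
            hits += 1
        else:
            seen.add(q)
    return hits / len(queries) if queries else 0.0


def cost_after_cache(price_per_1k: float, hit_rate: float) -> float:
    """Engine spend per 1k queries, assuming cache hits cost nothing."""
    return price_per_1k * (1.0 - hit_rate)


# Invented stream: two exact repeats; the capitalised variant is a miss.
stream = [
    "python asyncio docs",
    "rust borrow checker",
    "python asyncio docs",
    "Python asyncio docs",
    "rust borrow checker",
]
print(exact_hit_rate(stream))
print(round(cost_after_cache(0.85, 0.46), 3))
```

Exact matching is deliberately conservative: any difference in case or wording counts as a miss, so the hit-rate depends heavily on how repetitive the traffic is, which is consistent with the wide 9–46% band.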

What this means — and where GroundRoute fits

Route each query class to the cheapest engine that clears its quality bar, cache the repeats, and keep a premium engine for the queries that need it. That's what GroundRoute does behind one API.

On pricing: you keep ~half the cache savings we generate, we keep the other half — so you're never worse off than going direct. BYOK supported. You can try the playground on your own queries, or read the per-engine breakdowns to see where each engine wins.
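
One possible reading of the shared-savings arithmetic, sketched under stated assumptions: you pay the engine's list price for cache misses plus half of what the hits would have cost. The function name, the 100,000-query volume and the exact billing formula are illustrative assumptions, not a statement of our actual invoice logic.

```python
def bill_with_shared_savings(
    queries: int, hit_rate: float, price_per_1k: float, share: float = 0.5
) -> tuple[float, float]:
    """Return (direct_cost, bill) in dollars under a shared-savings model.

    Assumption: the bill is the engine cost of cache misses plus `share`
    of the engine cost that cache hits avoided.
    """
    direct = queries * price_per_1k / 1000
    savings = direct * hit_rate  # engine spend avoided by cache hits
    bill = (direct - savings) + share * savings
    return direct, bill


direct, bill = bill_with_shared_savings(100_000, 0.46, 0.85)
print(f"direct: ${direct:.2f}, with shared savings: ${bill:.2f}")
```

Because the fee is a fraction of savings that only exist when the cache hits, the bill under this model can equal the direct cost (at a 0% hit-rate) but cannot exceed it.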

Caveats (kept prominent)

  • Perplexity quality is shown as 'not measured' (null), NOT 0: its API key was non-functional during this run so it returned no parseable results. We publish its cost/latency but make NO quality claim — a 0 would be a data artifact, not a verdict. Re-run with a working key to measure it.
  • Quality is judged on the 62-query gold subset; per-class judged-n is small (10-17), so per-class quality is INDICATIVE — the headline rests on COST (covered on all 170 queries), not on a quality ranking.
  • Quality scores are tightly clustered (judge granularity ~0.083 = one result of three). Within-class quality_leader margins are small; best_value uses a 0.05 tolerance band around the leader to define 'clears the bar'.
  • Queries are synthetic class-discriminating probes (see bench_queries_v2.README.md), not real partner traffic; treat rankings as directional. A real partner-traffic re-run is the planned v2.
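
To give a feel for why per-class quality is only indicative, here is a rough sketch using the Wilson score interval. It treats each judged query as a single pass/fail trial, which simplifies our actual metric (mean fraction of results judged relevant), and the 0.85 score is an illustrative value rather than a specific table entry.

```python
import math


def wilson_interval(p_hat: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion p_hat observed over n trials."""
    if n <= 0:
        raise ValueError("n must be positive")
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Per-class judged-n ranges from 10 to 17; the full gold subset has 62.
for n in (10, 17, 62):
    low, high = wilson_interval(0.85, n)
    print(f"n={n}: roughly {low:.2f} to {high:.2f}")
```

Under this simplification the interval at n=10 is far wider than the few-point gaps between engines, which is why we do not rank on per-class quality.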

Per-engine & head-to-head breakdowns

Full pricing, limits, and benchmark details for each engine:

brave · firecrawl · exa · tavily · serper · perplexity

Compare pairs directly:

All head-to-head rankings →

Stop defaulting everything to one engine.

GroundRoute routes each query to the cheapest engine that clears your bar — with caching on top. You pay 50% of what the cache saves you, never more than going direct.

© 2026 GroundRoute, Inc. · Benchmark generated 2026-06-14T07:40:03Z · N=170 queries · numbers traceable to the published dataset (bench_v2_raw.jsonl, bench_queries_v2.jsonl).

FAQ

Which search API is cheapest for AI agents?
Firecrawl ($0.85/1k) and Serper ($1/1k, ~$0.30 at scale) are the cheapest at list price. The optimal choice depends on your query class; see the per-class breakdown above.
Is Exa the best search API for AI agents?
Exa is strong for academic and semantic retrieval, but it's the second-priciest engine in the benchmark ($7/1k). On web, news and page queries, cheaper engines matched or exceeded its quality score. Routing per class, rather than defaulting everything to Exa, is where the savings come from.
How was quality measured?
Quality was judged on a 62-query gold subset by an LLM judge validated against human labels (κ=0.78). Per-class n is 10–17, so per-class quality is indicative. The headline finding rests on cost, which was measured on all 170 queries.
What about Perplexity?
Perplexity's API key was non-functional during the benchmark run. We publish its cost ($5/1k) and latency (3,538 ms) but make no quality claim; a zero would be an artifact, not a verdict.
What is GroundRoute?
GroundRoute is a search control plane for AI agents. You point your agent at one API; GroundRoute routes each query to the cheapest engine that clears your quality bar and caches repeats. You keep ~half the cache savings, never pay more than going direct.