State of AI Search · 2026 · GroundRoute Benchmark

We benchmarked 6 search APIs on 170 AI-agent queries

AI agents fan out 80–240 web searches per task, and many setups default everything to one premium engine. We benchmarked six (Serper, Brave, Exa, Tavily, Perplexity, Firecrawl) on 170 class-balanced agent queries. The robust finding is cost: a naive all-Exa default runs ~$7/1k, while routing each query class to the cheapest engine that clears its quality bar blends to ~$0.85/1k, a difference of up to 8.24×. Quality differences were small and we treat them as indicative (judged on a 62-query gold subset, small per-class n). Here's everything, caveats included.

8.24×: all-Exa default ($7.00/1k) vs cheapest-that-clears-the-bar ($0.85/1k blended)

170: benchmark queries, with cost and latency measured on all of them

62: gold-subset queries judged for quality (κ=0.78 vs human gold)

Methodology — honest, up front

We lead with caveats because this is the kind of post that gets scrutinised on HN. Methodology-forward is the credibility play.

Queries: 170 synthetic class-discriminating probes across 5 classes (Academic 35, News 33, Page lookup 33, Direct answer 34, Web 35). Treat rankings as directional — not real partner traffic.
Cost axis: Measured on all 170 queries (the robust axis). List prices as of 2026.
Quality axis: Judged on a 62-query gold subset by an LLM judge validated against human gold at quadratic-weighted κ=0.78 (target 0.6, passes). Per-class judged-n is 10–17 → per-class quality is indicative. We don't rank on quality.
Cache economics: Exact-only simulation. Baseline blended hit-rate ~46%; hit-rate band 9–46% across traffic-shape scenarios. Caching compounds the routing savings but does not carry margin alone.
Reproducible: spikes/rankings/build_rankings.py

Price caveat: List prices as of 2026; actual cost varies by plan tier and volume. Notably: Serper drops to ~$0.30/1k at scale; Firecrawl 'Enhanced Mode' is ~5x for bot-protected sites. Comparisons use published list prices for a like-for-like baseline.
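For readers who want to check a judge-vs-human agreement figure like the κ=0.78 above, here is a minimal sketch of quadratic-weighted Cohen's kappa on a 0–3 ordinal scale. The function name and the toy labels are hypothetical; this is not the code from build_rankings.py, and the labels are not benchmark data.

```python
from collections import Counter

def quadratic_weighted_kappa(judge, human, num_levels=4):
    """Cohen's kappa with quadratic weights for ordinal labels 0..num_levels-1."""
    if len(judge) != len(human) or not judge:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(judge)
    observed = Counter(zip(judge, human))  # joint counts of (judge, human) labels
    judge_hist = Counter(judge)            # marginal counts per rater
    human_hist = Counter(human)
    max_dist = (num_levels - 1) ** 2
    disagreement_obs = 0.0
    disagreement_exp = 0.0
    for i in range(num_levels):
        for j in range(num_levels):
            weight = (i - j) ** 2 / max_dist  # 0 on the diagonal, 1 at the extremes
            disagreement_obs += weight * observed[(i, j)] / n
            disagreement_exp += weight * judge_hist[i] * human_hist[j] / (n * n)
    if disagreement_exp == 0:
        return 1.0  # both raters used one identical label throughout
    return 1.0 - disagreement_obs / disagreement_exp

# Toy labels for illustration only (hypothetical, not from the benchmark).
toy_judge = [3, 2, 2, 0, 1, 3]
toy_human = [3, 2, 1, 0, 1, 2]
print(f"kappa = {quadratic_weighted_kappa(toy_judge, toy_human):.2f}")
```

Quadratic weights penalise a 0-vs-3 disagreement far more than a 2-vs-3 one, which suits a graded relevance scale.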

Finding 1 — Cost is where the money is (the robust result)

Cost and latency are measured on all 170 queries, so this is the finding we're confident in. All-Exa ($7/1k) vs route-to-cheapest-that-clears-the-bar (~$0.85/1k) = 8.24×. Honest framing: this is the ceiling a naive default leaves on the table, not a claim that Exa is overpriced. Exa stays the right call where it wins; the savings come from not defaulting everything to a premium engine.

Cost & latency, all 170 queries, sorted by cost / 1k (cheapest first)

| Engine | Cost / 1k searches | p50 latency | Notes |
| --- | --- | --- | --- |
| firecrawl | $0.85 / 1k | 1,450 ms | list $0.85/1k; ~5x (Enhanced Mode) for bot-protected sites |
| serper | $1.00 / 1k | 713 ms | list $1.00/1k; ~$0.30/1k at scale |
| brave | $5.00 / 1k | 614 ms | - |
| perplexity | $5.00 / 1k | 3,538 ms | - |
| exa | $7.00 / 1k | 1,246 ms | - |
| tavily | $8.00 / 1k | 1,407 ms | - |

Cheapest ≠ fastest: Brave/Serper lead latency (~600–700 ms); Firecrawl is cheapest but ~1,450 ms.
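To put the list prices and p50 latencies above in per-task terms, the sketch below scales them to the 80–240 searches an agent task fans out. It assumes every search in a task goes to one engine and that searches run one after another; real agents often parallelise, so the latency figures are an upper-bound illustration. The names `ENGINES` and `per_task` are ours.

```python
# (list price in $ per 1k searches, p50 latency in ms), taken from the cost table.
ENGINES = {
    "firecrawl": (0.85, 1450),
    "serper": (1.00, 713),
    "brave": (5.00, 614),
    "perplexity": (5.00, 3538),
    "exa": (7.00, 1246),
    "tavily": (8.00, 1407),
}

def per_task(engine, searches):
    """Return (cost in dollars, sequential seconds) for one agent task."""
    cost_per_1k, p50_ms = ENGINES[engine]
    cost = cost_per_1k * searches / 1000
    seconds = p50_ms * searches / 1000  # assumption: searches are not parallelised
    return cost, seconds

for name in ENGINES:
    lo_cost, lo_s = per_task(name, 80)
    hi_cost, hi_s = per_task(name, 240)
    print(f"{name:<11} ${lo_cost:.3f}-${hi_cost:.3f} per task, "
          f"{lo_s:.0f}-{hi_s:.0f} s sequential")

# The headline ratio: all-Exa list price over the cheapest list price.
ratio = ENGINES["exa"][0] / ENGINES["firecrawl"][0]
print(f"all-Exa vs all-Firecrawl: {ratio:.2f}x")
```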

Finding 2 — Quality is tightly clustered (and we won't overclaim it)

Overall quality scores sat in a narrow band (judge scale 0–3; metric = mean fraction of results judged relevant). Margins are within roughly one judge increment, so we don't crown a definitive quality winner. Perplexity is “not measured,” not zero — its key was non-functional that run.

Quality (overall): 62-query gold subset, indicative · judge κ=0.78

| Engine | Quality score (indicative) | Measured? |
| --- | --- | --- |
| brave | 87% | yes |
| firecrawl | 86% | yes |
| exa | 83% | yes |
| tavily | 83% | yes |
| serper | 81% | yes |
| perplexity | not measured | key down |

Quality is indicative (n=62); the headline rests on cost (all 170 queries).

Finding 3 — No single engine wins every query class

Per-class quality leaders (indicative, small n): academic → Firecrawl · news → Firecrawl · page → Brave · answer → Brave · web → Brave. Notably, the two priciest engines (Exa $7, Tavily $8) didn't lead any class outright in this set. The takeaway isn't “engine X is best” — it's that the best engine changes by query type, so routing per class beats a single default.

Academic · 35 queries · cheapest: firecrawl · fastest: not measured

| Engine | Quality · indicative (n=11) | p50 latency | p95 latency | Cost / 1k |
| --- | --- | --- | --- | --- |
| firecrawl (cheapest) | 91% | not measured | - | $0.85 / 1k |
| serper | 90% | not measured | - | $1.00 / 1k |
| brave | 88% | not measured | - | $5.00 / 1k |
| perplexity | not measured | not measured | - | $5.00 / 1k |
| exa | 91% | not measured | - | $7.00 / 1k |
| tavily | 86% | not measured | - | $8.00 / 1k |

Cheapest: firecrawl · Fastest: not measured per class. Cheapest ≠ fastest: for high-fan-out agents (100+ searches/run) latency compounds, so we route on both cost and latency, not price alone. Note: Firecrawl's list price is $0.85/1k; ~5x (Enhanced Mode) for bot-protected sites.
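One way to route on both cost and latency is to pick the cheapest engine whose p50 fits a latency budget. The sketch below is a simplification under assumptions: per-class latency was not measured, so it uses the overall p50 figures from the cost table, and the function name and budget values are hypothetical, not GroundRoute's routing logic.

```python
# (list price in $ per 1k searches, overall p50 latency in ms) from the cost table.
ENGINES = {
    "firecrawl": (0.85, 1450),
    "serper": (1.00, 713),
    "brave": (5.00, 614),
    "perplexity": (5.00, 3538),
    "exa": (7.00, 1246),
    "tavily": (8.00, 1407),
}

def cheapest_within_budget(engines, max_p50_ms):
    """Return the cheapest engine whose p50 latency fits the budget, or None."""
    eligible = [
        (cost, p50, name)
        for name, (cost, p50) in engines.items()
        if p50 <= max_p50_ms
    ]
    if not eligible:
        return None
    return min(eligible)[2]  # ties on cost break toward the lower p50

for budget_ms in (500, 1000, 2000):  # hypothetical budgets
    print(budget_ms, "ms ->", cheapest_within_budget(ENGINES, budget_ms))
```

With these figures a 1,000 ms budget selects serper and a 2,000 ms budget selects firecrawl, so the choice of budget decides whether the cheapest engine is usable at all.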

News · 33 queries · cheapest: firecrawl · fastest: not measured

| Engine | Quality · indicative (n=10) | p50 latency | p95 latency | Cost / 1k |
| --- | --- | --- | --- | --- |
| firecrawl (cheapest) | 83% | not measured | - | $0.85 / 1k |
| serper | 58% | not measured | - | $1.00 / 1k |
| brave | 81% | not measured | - | $5.00 / 1k |
| perplexity | not measured | not measured | - | $5.00 / 1k |
| exa | 83% | not measured | - | $7.00 / 1k |
| tavily | 77% | not measured | - | $8.00 / 1k |

Page lookup · 33 queries · cheapest: firecrawl · fastest: not measured

| Engine | Quality · indicative (n=10) | p50 latency | p95 latency | Cost / 1k |
| --- | --- | --- | --- | --- |
| firecrawl (cheapest) | 89% | not measured | - | $0.85 / 1k |
| serper | 88% | not measured | - | $1.00 / 1k |
| brave | 91% | not measured | - | $5.00 / 1k |
| perplexity | not measured | not measured | - | $5.00 / 1k |
| exa | 87% | not measured | - | $7.00 / 1k |
| tavily | 88% | not measured | - | $8.00 / 1k |

Direct answer · 34 queries · cheapest: firecrawl · fastest: not measured

| Engine | Quality · indicative (n=14) | p50 latency | p95 latency | Cost / 1k |
| --- | --- | --- | --- | --- |
| firecrawl (cheapest) | 85% | not measured | - | $0.85 / 1k |
| serper | 83% | not measured | - | $1.00 / 1k |
| brave | 86% | not measured | - | $5.00 / 1k |
| perplexity | not measured | not measured | - | $5.00 / 1k |
| exa | 76% | not measured | - | $7.00 / 1k |
| tavily | 82% | not measured | - | $8.00 / 1k |

Web · 35 queries · cheapest: firecrawl · fastest: not measured

| Engine | Quality · indicative (n=17) | p50 latency | p95 latency | Cost / 1k |
| --- | --- | --- | --- | --- |
| firecrawl (cheapest) | 84% | not measured | - | $0.85 / 1k |
| serper | 83% | not measured | - | $1.00 / 1k |
| brave | 88% | not measured | - | $5.00 / 1k |
| perplexity | not measured | not measured | - | $5.00 / 1k |
| exa | 79% | not measured | - | $7.00 / 1k |
| tavily | 81% | not measured | - | $8.00 / 1k |
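The "cheapest engine that clears the bar" rule can be sketched from the per-class scores above. This version assumes the bar is the class leader's score minus the 0.05 tolerance band mentioned in the caveats (expressed here as 5 percentage points), and that the blended cost weights the five classes equally. Both are our simplifications, and the function name is hypothetical; Perplexity is left out because its quality was not measured.

```python
# Indicative quality (%) per class, copied from the per-class tables.
QUALITY = {
    "academic": {"firecrawl": 91, "serper": 90, "brave": 88, "exa": 91, "tavily": 86},
    "news": {"firecrawl": 83, "serper": 58, "brave": 81, "exa": 83, "tavily": 77},
    "page": {"firecrawl": 89, "serper": 88, "brave": 91, "exa": 87, "tavily": 88},
    "answer": {"firecrawl": 85, "serper": 83, "brave": 86, "exa": 76, "tavily": 82},
    "web": {"firecrawl": 84, "serper": 83, "brave": 88, "exa": 79, "tavily": 81},
}
COST_PER_1K = {"firecrawl": 0.85, "serper": 1.00, "brave": 5.00, "exa": 7.00, "tavily": 8.00}
TOLERANCE_POINTS = 5  # the 0.05 band, in percentage points (integers avoid float drift)

def best_value(scores):
    """Cheapest engine within TOLERANCE_POINTS of the class leader."""
    bar = max(scores.values()) - TOLERANCE_POINTS
    clearing = [engine for engine, quality in scores.items() if quality >= bar]
    return min(clearing, key=COST_PER_1K.get)

routes = {cls: best_value(scores) for cls, scores in QUALITY.items()}
# Assumption: equal traffic per class; real traffic mixes will shift the blend.
blended = sum(COST_PER_1K[engine] for engine in routes.values()) / len(routes)

for cls, engine in routes.items():
    print(f"{cls:<9} -> {engine}")
print(f"blended list price: ${blended:.2f} / 1k")
```

On these indicative scores the sketch routes all five classes to firecrawl, which is consistent with the ~$0.85/1k blended figure. With per-class n of 10–17, a small change in scores could flip a route, so treat the output as directional.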

Finding 4 — Caching compounds the savings

On an exact-only cache simulation, repeated agent queries hit ~46% at baseline (band 9–46% across traffic shapes; go-line is 33%, kill-floor 20%). Caching compounds the routing savings but doesn't carry margin alone.

Method: exact-only cache simulation (cache_sensitivity_report.json)
Baseline hit-rate: 46% blended across traffic shapes
Go / kill lines: Go-line 33% · kill-floor 20%
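A minimal sketch of what an exact-only cache simulation might look like, plus the effect of a given hit rate on engine spend. It assumes an unbounded cache with no expiry and that cache hits cost nothing; the function names and the toy query trace are hypothetical, and the method behind cache_sensitivity_report.json may differ.

```python
def simulate_exact_cache(queries):
    """Replay queries through an exact-match cache and return the hit rate."""
    seen = set()
    hits = 0
    for query in queries:
        if query in seen:
            hits += 1        # byte-for-byte repeat: served from cache
        else:
            seen.add(query)  # first occurrence: goes to the engine
    return hits / len(queries) if queries else 0.0

def effective_cost_per_1k(cost_per_1k, hit_rate):
    """Engine spend per 1k queries, assuming cache hits are free."""
    return cost_per_1k * (1.0 - hit_rate)

# Toy trace: the last query differs only by case, so exact matching misses it.
trace = ["python asyncio docs", "exa pricing", "python asyncio docs", "Python asyncio docs"]
print(f"toy hit rate: {simulate_exact_cache(trace):.0%}")

# Band edges, kill-floor, go-line and baseline from the text, at the $0.85/1k list price.
for rate in (0.09, 0.20, 0.33, 0.46):
    print(f"hit rate {rate:.0%}: ${effective_cost_per_1k(0.85, rate):.3f} per 1k")
```

Because matching is exact, any normalisation (case, whitespace) would raise the hit rate; the sketch deliberately does none.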

What this means — and where GroundRoute fits

Route each query class to the cheapest engine that clears its quality bar, cache the repeats, and keep a premium engine for the queries that need it. That's what GroundRoute does behind one API.

On pricing: you keep ~half the cache savings we generate, we keep the other half — so you're never worse off than going direct. BYOK supported. You can try the playground on your own queries, or read the per-engine breakdowns to see where each engine wins.
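As a worked example of the half-and-half split, the sketch below computes a bill per 1k queries under one reading of the pricing: you pay engine cost on cache misses plus 50% of what the cache saved. This is our interpretation for illustration, not GroundRoute's billing code; the function name, the free-cache-hit assumption, and the example inputs are hypothetical.

```python
def bill_per_1k(engine_cost_per_1k, hit_rate, share_of_savings=0.5):
    """Split cache savings between customer and provider (illustrative reading)."""
    direct = engine_cost_per_1k                        # going direct, no cache
    miss_spend = engine_cost_per_1k * (1 - hit_rate)   # assumption: hits cost nothing
    savings = direct - miss_spend
    fee = share_of_savings * savings                   # provider's half of the savings
    return {
        "direct": direct,
        "bill": miss_spend + fee,
        "kept_by_customer": savings - fee,
    }

# Example inputs: the $0.85/1k list price and the ~46% baseline hit rate.
result = bill_per_1k(0.85, 0.46)
print({key: round(value, 4) for key, value in result.items()})
```

Under this reading the bill equals the direct cost only when the hit rate is zero, and falls below it for any positive hit rate.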

Caveats (kept prominent)

Perplexity quality is shown as 'not measured' (null), NOT 0: its API key was non-functional during this run so it returned no parseable results. We publish its cost/latency but make NO quality claim — a 0 would be a data artifact, not a verdict. Re-run with a working key to measure it.
Quality is judged on the 62-query gold subset; per-class judged-n is small (10-17), so per-class quality is INDICATIVE — the headline rests on COST (covered on all 170 queries), not on a quality ranking.
Quality scores are tightly clustered (judge granularity ~0.083 = one result of three). Within-class quality_leader margins are small; best_value uses a 0.05 tolerance band around the leader to define 'clears the bar'.
Queries are synthetic class-discriminating probes (see bench_queries_v2.README.md), not real partner traffic; treat rankings as directional. A real partner-traffic re-run is the planned v2.

Per-engine & head-to-head breakdowns

Full pricing, limits, and benchmark details for each engine:

brave firecrawl exa tavily serper perplexity

Compare pairs directly:

All head-to-head rankings →

Stop defaulting everything to one engine.

GroundRoute routes each query to the cheapest engine that clears your bar — with caching on top. You pay 50% of what the cache saves you, never more than going direct.

Get an API key Try it in the playground →

FAQ

Which search API is cheapest for AI agents?: Firecrawl ($0.85/1k) and Serper ($1/1k, ~$0.30 at scale) are the cheapest. The optimal choice depends on your query class; see the per-class breakdown above.
Is Exa the best search API for AI agents?: Exa is strong for academic and semantic retrieval, but it's the second-priciest engine in the benchmark ($7/1k). On web/news/page queries, cheaper engines matched quality. Routing per class — not defaulting everything to Exa — is where the savings come from.
How was quality measured?: Quality was judged on a 62-query gold subset by an LLM judge validated against human labels (κ=0.78). Per-class n is 10–17, so per-class quality is indicative. The headline finding rests on cost, which was measured on all 170 queries.
What about Perplexity?: Perplexity's API key was non-functional during the benchmark run. We publish its cost ($5/1k) and latency (3,538ms) but make no quality claim — a zero would be an artifact, not a verdict.
What is GroundRoute?: GroundRoute is a search control plane for AI agents. You point your agent at one API; GroundRoute routes each query to the cheapest engine that clears your quality bar and caches repeats. You keep ~half the cache savings, never pay more than going direct.