LLM Tool-Selection Benchmark (Search-Agent Harness)
A search-agent evaluation harness that measures how well LLMs pick the right retrieval tool with the right arguments — scored on Cost Per Correct (CPC).
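The text above does not define how CPC is computed. As a minimal sketch, assume it means total spend across all trials divided by the number of trials in which the model chose the expected tool with the expected arguments. Every name below (`ToolCall`, `Trial`, `cost_usd`, `cost_per_correct`) is hypothetical and for illustration only; none is taken from the harness itself.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToolCall:
    """A tool selection: the tool name plus its arguments (hypothetical shape)."""
    tool: str
    args: dict[str, Any] = field(default_factory=dict)


@dataclass
class Trial:
    """One benchmark trial: expected call, model's call, and what the call cost."""
    expected: ToolCall
    predicted: ToolCall
    cost_usd: float  # assumed unit; any consistent cost unit works


def is_correct(trial: Trial) -> bool:
    # Assumption: "correct" means exact match on both tool name and arguments.
    return trial.predicted == trial.expected


def cost_per_correct(trials: list[Trial]) -> float:
    """Total cost of all trials divided by the number of correct trials."""
    total_cost = sum(t.cost_usd for t in trials)
    n_correct = sum(1 for t in trials if is_correct(t))
    if n_correct == 0:
        # No correct answers: cost per correct is unbounded.
        return float("inf")
    return total_cost / n_correct


trials = [
    Trial(ToolCall("web_search", {"q": "llm evals"}), ToolCall("web_search", {"q": "llm evals"}), 0.5),
    Trial(ToolCall("doc_lookup", {"id": 7}), ToolCall("doc_lookup", {"id": 7}), 0.25),
    Trial(ToolCall("doc_lookup", {"id": 9}), ToolCall("web_search", {"q": "9"}), 0.25),
]
cpc = cost_per_correct(trials)
```

Under this assumed definition, failed trials still add to the numerator, so a model that is cheap per call but often wrong can score worse than a pricier model that is usually right.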
In retrieval systems, stable interfaces and rollout discipline usually matter more than early micro-optimizations.
Backend systems for indexing and retrieval workflows powering AI research experiences.