LLM Tool-Selection Benchmark (Search-Agent Harness)
Status: in progress
A search-agent evaluation harness that measures how well LLMs pick the right retrieval tool with the right arguments, scored on Cost Per Correct (CPC).
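The description does not say exactly how CPC is computed. One natural reading, assumed here, is total evaluation spend divided by the number of episodes in which the model chose the correct tool with the correct arguments. Failed episodes still add to the cost, so a cheap but inaccurate model is penalized. The minimal Python sketch below follows that assumption. `EpisodeResult`, `is_correct`, `cost_per_correct`, the exact-match argument scoring, and the tool names in the usage example are all hypothetical illustrations, not the harness's actual API.

```python
from dataclasses import dataclass


@dataclass
class EpisodeResult:
    """One hypothetical benchmark episode: was the tool call correct, and what did it cost?"""
    correct: bool    # assumed: tool name AND arguments matched the gold label
    cost_usd: float  # assumed: total model spend for this episode


def is_correct(pred_tool: str, pred_args: dict, gold_tool: str, gold_args: dict) -> bool:
    """Hypothetical strict scoring: the right tool AND exactly the right arguments."""
    return pred_tool == gold_tool and pred_args == gold_args


def cost_per_correct(results: list[EpisodeResult]) -> float:
    """Assumed CPC definition: total spend over all episodes / number of correct episodes.

    Returns infinity when no episode was correct, since no correct answer was bought.
    """
    total_cost = sum(r.cost_usd for r in results)
    n_correct = sum(1 for r in results if r.correct)
    if n_correct == 0:
        return float("inf")
    return total_cost / n_correct


if __name__ == "__main__":
    ok = is_correct("web_search", {"query": "CPC"}, "web_search", {"query": "CPC"})
    wrong_tool = is_correct("vector_search", {"query": "CPC"}, "web_search", {"query": "CPC"})
    runs = [
        EpisodeResult(correct=ok, cost_usd=0.50),
        EpisodeResult(correct=wrong_tool, cost_usd=0.25),
        EpisodeResult(correct=True, cost_usd=0.75),
    ]
    print(cost_per_correct(runs))  # 1.50 total spend / 2 correct -> 0.75
```

Dividing by correct answers rather than by episodes means accuracy and cost are folded into a single number: a model that is half as expensive per call but half as accurate ends up with the same CPC.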