LLM Tool-Selection Benchmark (Search-Agent Harness)
A search-agent evaluation harness that measures how well LLMs choose the right retrieval tool and supply the right arguments, scored on Cost Per Correct (CPC).
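Below is a minimal sketch of how a CPC score could be computed. It assumes CPC is the total spend across all attempts divided by the number of attempts judged correct, where "correct" means the right tool was called with the right arguments. The `Run` record, its `cost_usd` and `correct` fields, and the `cost_per_correct` function are hypothetical illustrations, not the harness's actual API.

```python
from dataclasses import dataclass


@dataclass
class Run:
    # Hypothetical record for one benchmark task attempt.
    cost_usd: float  # assumed: total spend for this attempt (tokens, tool calls)
    correct: bool    # assumed: right tool AND right arguments


def cost_per_correct(runs: list[Run]) -> float:
    """Total cost of all runs divided by the number of correct runs (assumed definition)."""
    n_correct = sum(1 for r in runs if r.correct)
    if n_correct == 0:
        # No correct answers: the cost per correct answer is unbounded.
        return float("inf")
    return sum(r.cost_usd for r in runs) / n_correct


runs = [Run(0.02, True), Run(0.03, False), Run(0.01, True)]
print(round(cost_per_correct(runs), 4))  # 0.03
```

Under this assumed definition, failed attempts still add to the numerator, so a cheap but unreliable model is penalized for its misses rather than scored only on the runs it got right.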