LLM Tool-Selection Benchmark (Search-Agent Harness)
A search-agent evaluation harness that measures how well LLMs pick the right retrieval tool with the right arguments — scored on Cost Per Correct (CPC).
Why GPT-5's unified router model changes AI unit economics and why token discipline is becoming a core engineering capability.
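CPC is not defined further here. A natural reading is total spend divided by the number of items the model got right, so a cheap model that is often wrong can still cost more per correct answer than a pricier, more accurate one. The sketch below is a minimal illustration under that assumption. It also assumes an item counts as correct only when both the chosen tool and its arguments exactly match a gold label. All names (`TrialResult`, `item_cost`, `cost_per_correct`, the tool names `web_search` and `doc_lookup`, and the per-million-token prices) are hypothetical and not taken from the harness itself.

```python
from dataclasses import dataclass


@dataclass
class TrialResult:
    # One benchmark item: the gold tool call, the model's tool call, and its cost.
    expected_tool: str
    expected_args: dict
    chosen_tool: str
    chosen_args: dict
    cost_usd: float  # total API spend for this item


def item_cost(prompt_tokens: int, completion_tokens: int,
              input_price_per_m: float, output_price_per_m: float) -> float:
    # Hypothetical pricing model: separate USD prices per million input/output tokens.
    return (prompt_tokens * input_price_per_m
            + completion_tokens * output_price_per_m) / 1_000_000


def is_correct(r: TrialResult) -> bool:
    # Assumption: correct means right tool AND exactly matching arguments.
    return r.chosen_tool == r.expected_tool and r.chosen_args == r.expected_args


def cost_per_correct(results: list[TrialResult]) -> float:
    # Assumed CPC definition: total spend over all items / number of correct items.
    total_cost = sum(r.cost_usd for r in results)
    n_correct = sum(1 for r in results if is_correct(r))
    if n_correct == 0:
        return float("inf")  # money spent with nothing correct: unbounded CPC
    return total_cost / n_correct


# Usage with made-up items: two correct calls, one wrong tool choice.
results = [
    TrialResult("web_search", {"query": "gpt-5 router"},
                "web_search", {"query": "gpt-5 router"}, 0.002),
    TrialResult("doc_lookup", {"doc_id": "42"},
                "web_search", {"query": "42"}, 0.003),
    TrialResult("web_search", {"query": "cpc"},
                "web_search", {"query": "cpc"}, 0.001),
]

if __name__ == "__main__":
    print(f"CPC: ${cost_per_correct(results):.4f}")
```

Here the wrong call still adds its cost to the numerator, which is why failed tool selections, not just token counts, drive CPC up (about $0.003 per correct item in this toy run).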