LLM

LLM Tool-Selection Benchmark (Search-Agent Harness)
June 12, 2026
A search-agent evaluation harness that measures how well LLMs pick the right retrieval tool with the right arguments — scored on Cost Per Correct (CPC).
#ai-infra #retrieval #evaluation #llm
GPT-5's First Week: Margin Engineering, Context Ceilings, and Token Discipline
February 10, 2026
Why GPT-5's unified router model changes AI unit economics, and why token discipline is becoming a core engineering capability.
#llm #ai-systems #token-economics #gpt-5

LLM Tool-Selection Benchmark (Search-Agent Harness)