agent-comparison
A/B test agent variants, measuring quality and total session token cost across simple and complex benchmarks. Use when creating compact agent versions, validating agent changes, comparing internal vs. external agents, or choosing between variants for production. Trigger phrases: "compare agents", "A/B test", "benchmark agents", or "test agent efficiency". Do NOT use for evaluating a single agent, testing skills, or optimizing prompts without a variant comparison.
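As a rough illustration of the comparison described above, the sketch below scores two variants on mean quality and mean total session tokens. It is not taken from the skill itself: `RunResult`, `summarize`, `compare`, the 0.0-1.0 quality scale, and the `max_quality_drop` tolerance are all hypothetical names and assumptions chosen for the example.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RunResult:
    """One benchmark run for one agent variant (hypothetical record shape)."""
    benchmark: str        # e.g. "simple" or "complex"
    quality: float        # assumed quality score on a 0.0-1.0 scale
    session_tokens: int   # total tokens consumed over the whole session


def summarize(runs):
    """Return (mean quality, mean total session tokens) for a list of runs."""
    return mean(r.quality for r in runs), mean(r.session_tokens for r in runs)


def compare(baseline, candidate, max_quality_drop=0.02):
    """Pick the candidate only if it is cheaper and quality stays within tolerance.

    max_quality_drop is an assumed threshold, not a value from the skill.
    """
    base_q, base_t = summarize(baseline)
    cand_q, cand_t = summarize(candidate)
    quality_ok = cand_q >= base_q - max_quality_drop
    cheaper = cand_t < base_t
    return {
        "quality_delta": cand_q - base_q,
        "token_savings": 1 - cand_t / base_t,
        "winner": "candidate" if quality_ok and cheaper else "baseline",
    }


# Example data is made up purely to exercise the functions.
baseline = [RunResult("simple", 0.90, 10_000), RunResult("complex", 0.80, 30_000)]
candidate = [RunResult("simple", 0.90, 6_000), RunResult("complex", 0.79, 18_000)]
report = compare(baseline, candidate)
```

Averaging over both simple and complex benchmarks keeps a compact variant from winning on easy tasks alone; a stricter comparison might report each benchmark tier separately.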
Source: GitHub https://github.com/notque/ai-overkill