benchmark-runner
Auto-discover every skill that ships evals in RConsortium/pharma-skills, benchmark each one with and without the skill loaded using matched, isolated sessions, and post the scored results to the linked GitHub issue. Use whenever someone says "run benchmarks", "compare skill performance", "eval the skills", or wants to measure whether a skill improves output quality.
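The discovery, paired comparison and reporting steps can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the repository's implementation. It assumes a hypothetical layout in which each skill directory contains a non-empty `evals/` folder. It also assumes every eval case has already been scored once with the skill and once without it in matched, isolated sessions. The names `discover_skills`, `PairedResult`, `summarize` and `to_markdown` are hypothetical. Running the sessions and posting the comment to GitHub are left out.

```python
from dataclasses import dataclass
from pathlib import Path
from statistics import mean


@dataclass
class PairedResult:
    """Hypothetical record: one eval case scored in two matched, isolated sessions."""
    skill: str
    case_id: str
    with_skill: float     # score when the skill is loaded
    without_skill: float  # score for the same prompt with no skill loaded


def discover_skills(root: Path) -> list[str]:
    """Return names of skills that ship evals.

    Assumed (hypothetical) layout: <root>/<skill>/evals/ with at least one file.
    """
    found = []
    for skill_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        evals = skill_dir / "evals"
        if evals.is_dir() and any(evals.iterdir()):
            found.append(skill_dir.name)
    return found


def summarize(results: list[PairedResult]) -> dict[str, dict]:
    """Aggregate paired scores per skill; delta = with_skill - without_skill."""
    by_skill: dict[str, list[PairedResult]] = {}
    for r in results:
        by_skill.setdefault(r.skill, []).append(r)
    summary = {}
    for skill, rows in sorted(by_skill.items()):
        deltas = [r.with_skill - r.without_skill for r in rows]
        summary[skill] = {
            "cases": len(rows),
            "mean_with": mean(r.with_skill for r in rows),
            "mean_without": mean(r.without_skill for r in rows),
            "mean_delta": mean(deltas),
            "wins": sum(d > 0 for d in deltas),  # cases where the skill scored higher
        }
    return summary


def to_markdown(summary: dict[str, dict]) -> str:
    """Render the per-skill summary as a Markdown table for an issue comment."""
    lines = [
        "| Skill | Cases | With skill | Without skill | Delta | Wins |",
        "|---|---|---|---|---|---|",
    ]
    for skill, s in summary.items():
        lines.append(
            f"| {skill} | {s['cases']} | {s['mean_with']:.2f} | "
            f"{s['mean_without']:.2f} | {s['mean_delta']:+.2f} | {s['wins']} |"
        )
    return "\n".join(lines)
```

Scores are compared per case rather than by pooling all "with" and "without" runs. This way each delta reflects the same prompt under both conditions, so differences in case difficulty cancel out.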
Source: GitHub https://github.com/RConsortium/pharma-skills