nasde-benchmark-creator
Create coding agent benchmarks for evaluation with nasde. Use this skill when the user wants to:

- Create a new benchmark project (set of tasks for evaluating coding agents)
- Add tasks to an existing benchmark
- Create or modify agent variants (configurations that control agent behavior)
- Set up assessment dimensions and scoring criteria
- Verify that a new benchmark's Docker environment and tests work

Even if the user doesn't say "benchmark", this skill applies if they're talking about creating coding challenges for AI agents or setting up evaluation criteria.

---

# NASDE Benchmark Creator

Create and configure coding agent benchmarks for evaluation with `nasde`. A benchmark is a set of coding tasks that AI agents solve inside isolated Docker containers, scored both by functional tests (pass/fail) and by an LLM-as-a-Judge architecture assessment.

## Critical: line endings on Windows (read this first)

Benchmark scripts execute inside **Linux** sandboxes (Docker, Daytona). If `tests/test.sh`, `solution/solve.sh`, or `environment/Dockerfile` are checked out with **CRLF** line endings (the Windows git default when `core.autocrlf=true` and there is no `.gitattributes`), every trial fails immediately with:
Source: GitHub, https://github.com/NoesisVision/nasde-toolkit