agentic-eval

Category: Tools & Productivity | Uploader: vchelaru | Downloads: 0 | Version: v1.0 (Latest)

Patterns and techniques for evaluating and improving AI agent outputs. Use this skill when:

- Implementing self-critique and reflection loops
- Building evaluator-optimizer pipelines for quality-critical generation
- Creating test-driven code refinement workflows
- Designing rubric-based or LLM-as-judge evaluation systems
- Adding iterative improvement to agent outputs (code, reports, analysis)
- Measuring and improving agent response quality

---

# Agentic Evaluation Patterns

Patterns for self-improvement through iterative evaluation and refinement.

## Overview

Evaluation patterns enable agents to assess and improve their own outputs, moving beyond single-shot generation to iterative refinement loops.

```
Generate → Evaluate → Critique → Refine → Output
    ↑                              │
    └──────────────────────────────┘
```

## When to Use

- **Quality-critical generation**: Code, reports, analysis requiring high accuracy
- **Tasks with clear evaluation criteria**: Defined success metrics exist
- **Content requiring specific standards**: Style guides, compliance, formatting

---

## Pattern 1: Basic Reflection

Agent evaluates and improves its own output through self-critique.

```python
def reflect_and_refine(task: str, criteria: list[str], max_iterations: int = 3) -> str:
    """Generate with reflection loop."""
    output = llm(f"Complete this task:\n{task}")

    for i in range(max_iterations):
        # Self-critique
        critique = llm(f"""
```
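The preview of Pattern 1 stops partway through the loop, so the following is only an illustrative sketch of the Generate → Evaluate → Critique → Refine cycle from the overview, not the skill's own implementation. It assumes a hypothetical `llm` callable that maps a prompt string to a response string, and it assumes (as a convention invented for this example) that the critic replies with `PASS` when every criterion is met. The function name `refine_with_critique` and the prompt wording are likewise assumptions.

```python
from typing import Callable


def refine_with_critique(
    task: str,
    criteria: list[str],
    llm: Callable[[str], str],  # hypothetical: prompt in, text out
    max_iterations: int = 3,
) -> str:
    """Generate a draft, then critique and revise it until it passes or the budget runs out."""
    # Generate: first attempt at the task.
    output = llm(f"Complete this task:\n{task}")
    checklist = "\n".join(f"- {c}" for c in criteria)

    for _ in range(max_iterations):
        # Evaluate + critique: ask the model to judge the draft against the criteria.
        critique = llm(
            "Evaluate the output against the criteria.\n"
            f"Criteria:\n{checklist}\n"
            f"Output:\n{output}\n"
            "Reply with PASS if every criterion is met, otherwise list the problems."
        )
        # Assumed convention: a reply starting with PASS ends the loop.
        if critique.strip().upper().startswith("PASS"):
            break
        # Refine: revise the draft using the critique.
        output = llm(
            "Revise the output to address the critique.\n"
            f"Task:\n{task}\nOutput:\n{output}\nCritique:\n{critique}"
        )

    return output
```

Passing `llm` in as a parameter keeps the loop testable with a stub and independent of any particular model client. The `max_iterations` cap bounds cost when the critic never returns `PASS`.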

Changelog: Source: GitHub https://github.com/vchelaru/FlatRedBall2

Directory Structure

Current level: tree/main/.claude/skills/agentic-eval/

  • 📄 SKILL.md 5.8 KB

