AI Agent Skills Search and Discovery Platform

[Chart: Daily Featured Skills Count, 05/10 through 05/16]

♾️ Free & Open Source 🛡️ Secure & Worry-Free

Import Skills


langchain-ai

from GitHub Research & Analysis

📄 SKILL.md

eval data benchmark

eval-writer

Create new eval suites for the deepagentsjs monorepo. Handles dataset design, test case scaffolding, scoring logic, vitest configuration, and LangSmith integration. Use when the user asks to: (1) create an eval, (2) write an evaluation, (3) add a benchmark, (4) build an eval suite, (5) evaluate agent behaviour, (6) add test cases for a capability, or (7) implement an existing benchmark (e.g. oolong, AgentBench, SWE-bench). Trigger on phrases like 'create eval', 'new eval', 'add eval', 'benchmark', 'evaluate', 'eval suite', 'write evals for'.

⬇0 ❤1K 1 month ago · Uploaded Detail →

guanyang

from GitHub Data & AI

📁 references/
📁 scripts/
📄 SKILL.md

evaluation bias llm-as-judge

advanced-evaluation

This skill should be used when the user asks to "implement LLM-as-judge", "compare model outputs", "create evaluation rubrics", "mitigate evaluation bias", or mentions direct scoring, pairwise comparison, position bias, evaluation pipelines, or automated quality assessment.

⬇0 ❤628 1 month ago · Uploaded Detail →

agentscope-ai

from GitHub Data & AI

📄 SKILL.md

queries model evaluation

auto-arena

Automatically evaluate and compare multiple AI models or agents without pre-existing test data. Generates test queries from a task description, collects responses from all target endpoints, auto-generates evaluation rubrics, runs pairwise comparisons via a judge model, and produces win-rate rankings with reports and charts. Supports checkpoint resume, incremental endpoint addition, and judge model hot-swap. Use when the user asks to compare, benchmark, or rank multiple models or agents on a custom task, or run an arena-style evaluation.

---

# Auto Arena Skill

End-to-end automated model comparison using the OpenJudge `AutoArenaPipeline`:

1. **Generate queries**: LLM creates diverse test queries from task description
2. **Collect responses**: query all target endpoints concurrently
3. **Generate rubrics**: LLM produces evaluation criteria from task + sample queries
4. **Pairwise evaluation**: judge model compares every model pair (with position-bias swap)
5. **Analyze & rank**: compute win rates, win matrix, and rankings
6. **Report & charts**: Markdown report + win-rate bar chart + optional matrix heatmap

## Prerequisites

```bash
# Install OpenJudge
pip install py-openjudge

# Extra dependency for auto_arena (chart generation)
pip install matplotlib
```

## Gather from user before running

| Info | Required? | Notes |
|------|-----------|-------|
| Task description | Yes | What the models/agents should do (set in config YAML) |
| Target endpoints | Yes | At least 2 OpenAI-compatible endpoints to compare |
| Judge endpoint | Yes | Strong model for pairwise evaluation (e.g. `gpt-4`, `qwen-max`) |
| API keys | Yes | Env vars: `OPENAI_API_KEY`, `DASHSCOPE_API_KEY`, etc. |
| Number of queries | No | Default: `20` |
| Seed queries | No | Example queries to guide generation style |
| System prompts | No | Per-endpoint system prompts |
| Output directory | No | Default: `./evaluation_results` |
| Report language | No | `"zh"` (default) or `"en"` |

## Quick start

### CLI

…

⬇0 ❤509 1 month ago · Uploaded Detail →
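The pairwise-evaluation and ranking steps in the auto-arena description above can be illustrated with a small sketch. This is not the OpenJudge `AutoArenaPipeline` API: `toy_judge`, `win_rates`, and the sample responses are all hypothetical, and the judge is a stand-in that simply prefers the longer answer. The point is only to show how judging each pair in both orders (the position-bias swap) feeds a win-rate ranking.

```python
from collections import defaultdict
from itertools import combinations


def toy_judge(query: str, first: str, second: str) -> str:
    """Hypothetical judge: prefers the longer answer, ties go to the first slot."""
    return "first" if len(first) >= len(second) else "second"


def win_rates(responses: dict, judge) -> dict:
    """Judge every model pair on every query in both orders; return wins / games."""
    wins = defaultdict(int)
    games = defaultdict(int)
    for a, b in combinations(sorted(responses), 2):
        for query in responses[a]:
            # Position-bias swap: the same pair is judged as (a, b) and as (b, a).
            for first, second in ((a, b), (b, a)):
                verdict = judge(query, responses[first][query], responses[second][query])
                winner = first if verdict == "first" else second
                wins[winner] += 1
                games[first] += 1
                games[second] += 1
    return {model: wins[model] / games[model] for model in sorted(responses)}


# Hypothetical responses: model name -> {query -> answer}.
responses = {
    "A": {"q1": "aaaa", "q2": "aa"},
    "B": {"q1": "bbb", "q2": "bb"},
    "C": {"q1": "c", "q2": "c"},
}
rates = win_rates(responses, toy_judge)
ranking = sorted(rates, key=rates.get, reverse=True)
print(rates, ranking)
```

Note how the tie on `q2` between A and B is awarded to whichever answer sits in the first slot; swapping positions splits that tie evenly instead of handing both judgments to one model.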

allenai

from GitHub Research & Analysis

📄 SKILL.md

benchmark add new

add-benchmark

Add a new simulation benchmark to the VLA evaluation harness. Use this skill whenever the user wants to integrate, create, or add a new benchmark or simulation environment — e.g. 'add ManiSkill3', 'integrate OmniGibson', 'hook up a new sim'. Also use when they ask how benchmarks are structured or want to understand the benchmark interface.

⬇0 ❤275 1 month ago · Uploaded Detail →

ory

from GitHub Research & Analysis

📄 SKILL.md

add bench-swe benchmark

add-benchmark

Add a new SWE benchmark task from a real GitHub bug-fix. Use when the user provides a GitHub issue or PR URL and wants to add it to the bench-swe pipeline.

⬇0 ❤185 1 month ago · Uploaded Detail →

radimsem

from GitHub Tools & Productivity

📄 SKILL.md

scenario benchmark remindb

add-bench-scenario

Use when adding a new scenario to remindb's benchmark suite — symptoms include "compare X tool against grep/cat", "add a token-savings benchmark for Y", "extend `internal/bench/scenarios.go`", "wire a new scenario into `bench.Run`", or any task that adds a row to the `scenario / naive (tok) / remindb (tok) / saved` output table. Distinct from Go `testing.B` benchmarks in `*_bench_test.go`.

⬇0 ❤83 3 days ago · Uploaded Detail →

DexForce

from GitHub Research & Analysis

📄 SKILL.md

conventions embodichain benchmark

benchmark

Write benchmark scripts for EmbodiChain modules following project conventions

⬇0 ❤149 1 month ago · Uploaded Detail →

poemswe

from GitHub Content & Multimedia

📄 SKILL.md

critically evaluation arguments

analyze

Critically analyze content, claims, or arguments with rigorous evaluation.

⬇0 ❤59 1 month ago · Uploaded Detail →

mlflow

from GitHub Tools & Productivity

📁 assets/
📁 references/
📁 scripts/
📄 SKILL.md

automation data evaluation

agent-evaluation

Use this when you need to EVALUATE OR IMPROVE or OPTIMIZE an existing LLM agent's output quality - including improving tool selection accuracy, answer quality, reducing costs, or fixing issues where the agent gives wrong/incomplete responses. Evaluates agents systematically using MLflow evaluation with datasets, scorers, and tracing. IMPORTANT - Always also load the instrumenting-with-mlflow-tracing skill before starting any work. Covers end-to-end evaluation workflow or individual components (tracing setup, dataset creation, scorer definition, evaluation execution).

⬇0 ❤20 1 month ago · Uploaded Detail →

netease-youdao

from GitHub Research & Analysis

📁 examples/
📁 scripts/
📁 server/
📄 .gitignore
📄 group.jpg
📄 install.sh

data paperswithcode benchmark

scholarclaw

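Academic paper search and analysis service. When the user's request involves any of the following academic scenarios, this skill must be used instead of web-search: searching for papers; finding ArXiv/PubMed/PapersWithCode papers; querying SOTA leaderboards and benchmark results; citation analysis; generating paper-explainer blog posts; finding GitHub repositories related to a paper; getting trending-paper recommendations. Keywords: arxiv, paper, papers, academic, scholar, research, 论文, 学术, 搜索论文, 找论文, SOTA, benchmark, MMLU, citation, 引用, 博客, blog, PapersWithCode, HuggingFace.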

⬇0 ❤9 1 month ago · Uploaded Detail →

NoesisVision

from GitHub Research & Analysis

📄 SKILL.md

benchmark coding create

nasde-benchmark-creator

Create coding agent benchmarks for evaluation with nasde. Use this skill when the user wants to:

- Create a new benchmark project (set of tasks for evaluating coding agents)
- Add tasks to an existing benchmark
- Create or modify agent variants (configurations that control agent behavior)
- Set up assessment dimensions and scoring criteria
- Verify that a new benchmark's Docker environment and tests work

Even if the user doesn't say "benchmark", if they're talking about creating coding challenges for AI agents or setting up evaluation criteria, this skill applies.

---

# NASDE Benchmark Creator

Create and configure coding agent benchmarks for evaluation with `nasde`. A benchmark is a set of coding tasks that AI agents solve inside isolated Docker containers, scored both by functional tests (pass/fail) and by an LLM-as-a-Judge architecture assessment.

## Critical: line endings on Windows (read this first)

Benchmark scripts execute inside **Linux** sandboxes (Docker, Daytona). If `tests/test.sh`, `solution/solve.sh`, or `environment/Dockerfile` are checked out with **CRLF** line endings (the Windows git default when `core.autocrlf=true` and there is no `.gitattributes`), every trial fails immediately with: …

⬇0 ❤7 16 days ago · Uploaded Detail →
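The CRLF problem described in the nasde-benchmark-creator excerpt above can be checked for before a trial runs. The sketch below is not part of `nasde`; `files_with_crlf` is a hypothetical helper that looks for the byte sequence `\r\n` in the three files the excerpt names, demonstrated on a throwaway directory.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List

# File names taken from the excerpt above.
CHECKED_FILES = ["tests/test.sh", "solution/solve.sh", "environment/Dockerfile"]


def files_with_crlf(root: Path, names: Iterable[str]) -> List[str]:
    """Return the relative paths under root that exist and contain CRLF."""
    hits = []
    for rel in names:
        path = root / rel
        if path.is_file() and b"\r\n" in path.read_bytes():
            hits.append(rel)
    return hits


# Demo on a temporary task directory (contents are made up).
with tempfile.TemporaryDirectory() as tmp:
    task = Path(tmp)
    (task / "tests").mkdir()
    (task / "solution").mkdir()
    # write_bytes avoids any newline translation by the platform.
    (task / "tests" / "test.sh").write_bytes(b"#!/bin/bash\r\necho ok\r\n")
    (task / "solution" / "solve.sh").write_bytes(b"#!/bin/bash\necho ok\n")
    crlf_hits = files_with_crlf(task, CHECKED_FILES)

print(crlf_hits)
```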

akshansh

from GitHub Development & Coding

📄 SKILL.md

recommendations prioritized evaluation

ade-audit

Run a full Build + Style + Move + Write evaluation on a page — score each framework, produce a combined report out of /200 with prioritized recommendations across all four.

⬇0 ❤7 1 month ago · Uploaded Detail →


Skill File Structure Sample (Reference)

skill-sample/
├─ SKILL.md              ⭐ Required: skill entry doc (purpose / usage / examples / deps)
├─ manifest.sample.json  ⭐ Recommended: machine-readable metadata (index / validation / autofill)
├─ LICENSE.sample        ⭐ Recommended: license & scope (open source / restriction / commercial)
├─ scripts/
│  └─ example-run.py     ✅ Runnable example script for quick verification
├─ assets/
│  ├─ example-formatting-guide.md  🧩 Output conventions: layout / structure / style
│  └─ example-template.tex         🧩 Templates: quickly generate standardized output
└─ references/           🧩 Knowledge base: methods / guides / best practices
   ├─ example-ref-structure.md     🧩 Structure reference
   ├─ example-ref-analysis.md      🧩 Analysis reference
   └─ example-ref-visuals.md       🧩 Visual reference
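As a rough illustration of the layout above, the following sketch checks a folder against it. The function `check_skill_layout` is hypothetical, and the check is an assumption drawn only from the tree: `SKILL.md` is treated as required, the manifest and license as recommended (using the sample file names shown), and the three subfolders as optional.

```python
import tempfile
from pathlib import Path

REQUIRED_FILES = ["SKILL.md"]
RECOMMENDED_FILES = ["manifest.sample.json", "LICENSE.sample"]  # sample names from the tree
OPTIONAL_DIRS = ["scripts", "assets", "references"]


def check_skill_layout(root: Path) -> dict:
    """Report which parts of the sample layout are present under root."""
    return {
        "missing_required": [n for n in REQUIRED_FILES if not (root / n).is_file()],
        "missing_recommended": [n for n in RECOMMENDED_FILES if not (root / n).is_file()],
        "present_dirs": [d for d in OPTIONAL_DIRS if (root / d).is_dir()],
    }


# Demo on a temporary folder holding only SKILL.md and scripts/.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "skill-sample"
    (root / "scripts").mkdir(parents=True)
    (root / "SKILL.md").write_text("---\nname: skill-sample\n---\n", encoding="utf-8")
    report = check_skill_layout(root)

print(report)
```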

More on the Agent Skills spec (Anthropic docs): https://agentskills.io/home

SKILL.md Requirements

├─ ⭐ Required: YAML Frontmatter (must be at top)
│  ├─ ⭐ name                 : unique skill name, follow naming convention
│  └─ ⭐ description          : include trigger keywords for matching
│
├─ ✅ Optional: Frontmatter extension fields
│  ├─ ✅ license              : license identifier
│  ├─ ✅ compatibility        : runtime constraints when needed
│  ├─ ✅ metadata             : key-value fields (author/version/source_url...)
│  └─ 🧩 allowed-tools        : tool whitelist (experimental)
│
└─ ✅ Recommended: Markdown body (progressive disclosure)
   ├─ ✅ Overview / Purpose
   ├─ ✅ When to use
   ├─ ✅ Step-by-step
   ├─ ✅ Inputs / Outputs
   ├─ ✅ Examples
   ├─ 🧩 Files & References
   ├─ 🧩 Edge cases
   ├─ 🧩 Troubleshooting
   └─ 🧩 Safety notes
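The two required frontmatter fields can be checked mechanically. The sketch below is a deliberately naive reader, not a YAML parser: it assumes flat `key: value` lines between two `---` delimiters and skips indented (nested) lines. The names `parse_frontmatter` and `missing_required`, and the sample document, are hypothetical.

```python
REQUIRED_FIELDS = ("name", "description")


def parse_frontmatter(text: str) -> dict:
    """Read flat key: value pairs from a frontmatter block at the top of text."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("frontmatter must start on the first line")
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        if line.startswith((" ", "\t")) or ":" not in line:
            continue  # nested or continuation lines are ignored in this sketch
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    raise ValueError("frontmatter is not closed with '---'")


def missing_required(fields: dict) -> list:
    """Return the required field names that are absent or empty."""
    return [key for key in REQUIRED_FIELDS if not fields.get(key)]


# Hypothetical SKILL.md content.
SAMPLE = """---
name: skill-sample
description: Hypothetical demo skill. Use when the user asks for a demo.
license: MIT
---
# Overview
"""

fields = parse_frontmatter(SAMPLE)
missing = missing_required(fields)
print(fields["name"], missing)
```

For real skills, a full YAML parser is the safer choice, since `metadata` is a nested mapping that this reader skips.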

Why SkillWink?

Skill files are scattered across GitHub and community sites, which makes them hard to find and hard to evaluate. SkillWink organizes open-source skills into a searchable, filterable library that you can download and use directly.

We provide keyword search, version updates, multi-metric ranking (downloads / likes / comments / updates), and open SKILL.md standards. You can also discuss usage and improvements on skill detail pages.


Quick Start:

Import or download a skill (.zip/.skill), then place it in the local skills folder for your tool:

~/.claude/skills/ (Claude Code)

~/.codex/skills/ (Codex CLI)

One SKILL.md can be reused across tools.
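A minimal sketch of the "place locally" step, assuming the download is an ordinary `.zip` with `SKILL.md` at its root (whether a `.skill` file is also a zip archive is not stated here). The helper `install_skill` is hypothetical, and the demo writes into a temporary folder standing in for `~/.claude/skills/` so nothing in a real home directory is touched.

```python
import tempfile
import zipfile
from pathlib import Path


def install_skill(archive: Path, skills_dir: Path) -> Path:
    """Extract archive into skills_dir/<archive name without extension>."""
    target = skills_dir / archive.stem
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)
    return target


with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)

    # Build a tiny made-up archive to install.
    archive = tmp_path / "skill-sample.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("SKILL.md", "---\nname: skill-sample\n---\n")

    # Stand-in for ~/.claude/skills/ (or ~/.codex/skills/).
    skills_dir = tmp_path / ".claude" / "skills"
    installed = install_skill(archive, skills_dir)
    installed_name = installed.name
    installed_ok = (installed / "SKILL.md").is_file()

print(installed_name, installed_ok)
```

Review an archive's contents before extracting it for real use, in line with the safety note in the FAQ.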

FAQ

Everything you need to know: what skills are, how they work, how to find/import them, and how to contribute.

1. What are Agent Skills?

A skill is a reusable capability package, usually including SKILL.md (purpose/IO/how-to) and optional scripts/templates/examples.

Think of it as a plugin playbook + resource bundle for AI assistants/toolchains.

2. How do Skills work?

Skills use progressive disclosure: the agent loads brief metadata first, loads the full docs only when they are needed, and then executes by following that guidance.

This keeps agents lightweight while preserving enough context for complex tasks.
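Progressive disclosure can be sketched as lazy loading. In the toy model below, only `name` and `description` are consulted while choosing a skill, and the body is read the first time it is requested. `LazySkill`, `pick_skill`, and the word-overlap matching are hypothetical simplifications, and `body_source` is a string standing in for the file on disk.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LazySkill:
    name: str
    description: str
    body_source: str  # stands in for the SKILL.md body on disk
    body_loads: int = 0
    _body: Optional[str] = None

    def body(self) -> str:
        """Load the full body on first use, then reuse the cached copy."""
        if self._body is None:
            self._body = self.body_source
            self.body_loads += 1
        return self._body


def pick_skill(skills: List[LazySkill], request: str) -> Optional[LazySkill]:
    """Choose the skill whose description shares the most words with the request."""
    words = set(request.lower().split())

    def overlap(skill: LazySkill) -> int:
        return len(words & set(skill.description.lower().split()))

    best = max(skills, key=overlap)
    return best if overlap(best) > 0 else None


skills = [
    LazySkill("benchmark", "write benchmark scripts", "# Full benchmark guide"),
    LazySkill("analyze", "critically analyze arguments", "# Full analysis guide"),
]

chosen = pick_skill(skills, "please write a benchmark")  # metadata only
guide = chosen.body()  # full docs loaded here
chosen.body()          # served from cache, no second load
print(chosen.name, chosen.body_loads, skills[1].body_loads)
```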

3. How can I quickly find the right skill?

Use these three together:

Semantic search: describe your goal in natural language.
Multi-filtering: category/tag/author/language/license.
Sort by downloads/likes/comments/updated to find higher-quality skills.
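The filter-then-sort combination can be sketched on an in-memory list. The `Skill` record, the `find` function, and the sample rows are hypothetical and do not reflect SkillWink's actual API or data; semantic search is left out entirely.

```python
from dataclasses import dataclass
from typing import List, Optional

SORTABLE = {"downloads", "likes", "comments"}


@dataclass(frozen=True)
class Skill:
    name: str
    category: str
    downloads: int
    likes: int
    comments: int


def find(skills: List[Skill], category: Optional[str] = None,
         sort_by: str = "likes") -> List[Skill]:
    """Filter by category (if given), then sort descending by one metric."""
    if sort_by not in SORTABLE:
        raise ValueError(f"unsupported sort key: {sort_by}")
    rows = [s for s in skills if category is None or s.category == category]
    return sorted(rows, key=lambda s: getattr(s, sort_by), reverse=True)


# Made-up catalog rows.
catalog = [
    Skill("alpha", "Data & AI", downloads=10, likes=5, comments=1),
    Skill("beta", "Data & AI", downloads=3, likes=9, comments=0),
    Skill("gamma", "Tools & Productivity", downloads=50, likes=2, comments=4),
]

by_likes = [s.name for s in find(catalog, category="Data & AI", sort_by="likes")]
by_downloads = [s.name for s in find(catalog, sort_by="downloads")]
print(by_likes, by_downloads)
```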

4. Which import methods are supported?

Upload archive: .zip / .skill (recommended)
Upload skills folder
Import from GitHub repository

Note: for every import method, the upload must not exceed 10 MB.

5. How do I use skills in Claude / Codex?

Typical paths (may vary by local setup):

Claude Code: ~/.claude/skills/
Codex CLI: ~/.codex/skills/

One SKILL.md can usually be reused across tools.

6. Can one skill be shared across tools?

Yes. Most skills are standardized docs plus assets, so they can be reused wherever the format is supported.

Example: retrieval + writing + automation scripts as one workflow.

7. Are these skills safe to use?

Some skills come from public GitHub repositories and some are uploaded by SkillWink creators. Always review code before installing and own your security decisions.

8. Why doesn't a skill work after import?

Most common reasons:

Wrong folder path or nested one level too deep
Invalid/incomplete SKILL.md fields or format
Dependencies missing (Python/Node/CLI)
Tool has not reloaded skills yet
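The first two causes in the list above can be checked with a short script. This is a sketch under simplifying assumptions: `diagnose` is a hypothetical helper, it only looks one folder level down for a misplaced `SKILL.md`, and it tests for `name:` and `description:` lines by plain text matching instead of parsing YAML. Missing dependencies and tool reloads are outside its scope.

```python
import tempfile
from pathlib import Path
from typing import List


def diagnose(skill_dir: Path) -> List[str]:
    """Return human-readable problems found in an installed skill folder."""
    entry = skill_dir / "SKILL.md"
    if not entry.is_file():
        nested = sorted(skill_dir.glob("*/SKILL.md"))
        if nested:
            rel = nested[0].relative_to(skill_dir).as_posix()
            return [f"SKILL.md is nested one level too deep: {rel}"]
        return ["SKILL.md not found"]

    problems = []
    text = entry.read_text(encoding="utf-8")
    if not text.startswith("---"):
        problems.append("SKILL.md does not start with YAML frontmatter")
    for field in ("name", "description"):
        if f"\n{field}:" not in "\n" + text:
            problems.append(f"missing frontmatter field: {field}")
    return problems


with tempfile.TemporaryDirectory() as tmp:
    skills_dir = Path(tmp)

    # Case 1: the archive was unpacked one folder too deep.
    nested_dir = skills_dir / "nested-skill" / "nested-skill"
    nested_dir.mkdir(parents=True)
    (nested_dir / "SKILL.md").write_text("---\nname: x\ndescription: y\n---\n", encoding="utf-8")
    nested_report = diagnose(skills_dir / "nested-skill")

    # Case 2: SKILL.md is in place but lacks a description.
    partial_dir = skills_dir / "partial-skill"
    partial_dir.mkdir()
    (partial_dir / "SKILL.md").write_text("---\nname: partial-skill\n---\n", encoding="utf-8")
    partial_report = diagnose(partial_dir)

print(nested_report, partial_report)
```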

9. Does SkillWink include duplicates/low-quality skills?

We try to avoid that. Use ranking + comments to surface better skills:

Duplicate skills: compare differences (speed/stability/focus)
Low-quality skills: removed in regular cleanups
