- 📄 post.py
- 📄 SKILL.md
bot2bot-post
Post a coordination message from this bot to the shared bot2bot channel, @-mentioning the other Sutando node.
Evaluate and score agent behavior against a golden reference. Use this skill whenever the user wants to run evaluation, check pass/fail status, understand metric scores, compare sessions for regressions, validate agent behavior, or score a trace from a file or a live session. Trigger on phrases like "eval this trace", "check my agent output", "did my agent do the right thing", "compare runs", "did my agent regress", "score session X", "evaluate against golden", "run evals". Works with both local trace files and live streaming sessions.

---

Evaluate agent behavior and explain what the scores mean.

## Determine the input type

First, figure out what to evaluate:

- **Trace file(s)** — user mentions a `.json` or `.jsonl` file path → use `evaluate_traces`
- **Sessions vs golden** — user has multiple live sessions and wants regression testing → use `evaluate_sessions`
- **Single live session** — user wants to score one session against a golden eval set → guide them to use `evaluate_sessions` with one session as golden

## Evaluating trace files

1. Get the file path(s). Check the extension: `.jsonl` → `trace_format: "otlp-json"` | `.json` → `"jaeger-json"` (default)
2. Ask if they have a golden eval set JSON. For `tool_trajectory_avg_score` (the default metric), an eval set is required — it provides the expected tool call sequence to compare against. If they don't have one yet, explain this and suggest starting with `hallucinations_v1`, or ask if they want to create a golden set from a reference run first.
3. Call `evaluate_traces` with the file(s), format, and eval set.
4. Present results as a score table (see Score interpretation below) and explain failures.

## Evaluating sessions (regression testing)

This workflow requires the server to be running with the `--dev` flag (which enables WebSocket and session streaming). Plain `agentevals serve` will not have sessions. If you get a connection error from any tool below, tell the user:

```bash
uv run agentevals serve --dev
```
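For step 1 of the trace-file workflow above, a minimal shell sketch of the extension-to-format mapping; the `TRACE` variable and the final echo are illustrative, and only the mapping itself comes from this skill:

```bash
# Pick the trace format from the file extension (mapping from step 1 above).
TRACE="agent_run.jsonl"                    # hypothetical path
case "$TRACE" in
  *.jsonl) TRACE_FORMAT="otlp-json" ;;     # OTLP line-delimited JSON
  *.json)  TRACE_FORMAT="jaeger-json" ;;   # Jaeger export (default)
esac
echo "call evaluate_traces with trace_format=$TRACE_FORMAT"
```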
SSH into host `h100_sglang`, enter Docker container `sglang_bbuf`, work in `/data/bbuf/repos/sglang`, and use the ready H100 remote environment for SGLang **diffusion** development and validation. Use when a task needs diffusion model smoke tests, Triton/CUDA kernel validation, torch.compile diffusion checks, or a safe remote copy for diffusion-specific SGLang changes.
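A minimal sketch of the connection sequence this description implies; entering the container over a single `ssh -t` call and using `docker exec -w` to set the working directory are assumptions, not part of the skill itself:

```bash
# SSH to the H100 host and open a shell in the SGLang dev container,
# starting in the diffusion working directory named above.
ssh -t h100_sglang "docker exec -it -w /data/bbuf/repos/sglang sglang_bbuf bash"
```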
Routing guide -- when to use `nansen agent` (AI research) vs direct CLI data commands. Use when deciding how to answer a user's research question with Nansen tools.
Provides information about the bitwize-music plugin, its version, and its creator. Use when the user asks about the plugin, its purpose, version, or capabilities.
A deterministic thinking partner that challenges assumptions and applies mental models to sharpen decisions, solve problems, and think more clearly. Use this skill whenever a user says "help me think through X", "challenge my thinking", "what am I missing", "apply mental models to this", "play devil's advocate", "stress test this idea", "poke holes in my plan", "help me decide between X and Y", "what are the second-order effects", "I'm stuck on a decision", names any specific model (SWOT, first principles, inversion, pre-mortem, etc.), or asks for structured reasoning on any ambiguous, high-stakes, or complex problem. Also trigger when the user seems uncertain, is rationalizing, or is asking "am I thinking about this right?" Even casual phrases like "what do you think about..." on non-trivial topics should trigger this skill.

---

# Thinking Partner

A deterministic thinking partner that challenges assumptions and applies mental models to help users think better and clearer. Not a lecture — a sparring session.

## Core Philosophy

Good thinking is an active achievement, not a default state. The goal is not to tell the user what to think, but to sharpen *how* they think by:

1. **Challenging assumptions** — Surface hidden beliefs the user is treating as facts
2. **Applying mental models** — Select and deploy the right thinking frameworks for the situation
3. **Detecting orientation capture** — Notice when thinking serves comfort instead of truth
4. **Maintaining productive tension** — Hold complexity open long enough to find real insight

You are not a yes-machine. You are not an interrogator. You are a thinking partner: respectful, direct, genuinely curious, and willing to push back.

## When This Triggers

- "Help me think through X"
- "Challenge my thinking / assumptions"
- "What am I missing?"
- "Apply [any model name] to this"
- "Play devil's advocate"
- "Stress test this idea / plan"
- "Help me decide between X and Y"
- "What are the second-order effects?"
- "Am I thin
Headless browser automation for AI agents using agent-browser CLI. Use when Claude needs to automate web browsing, scrape web data, interact with web pages, fill forms, take screenshots, or perform any browser-based tasks. Supports reference-based element targeting, session management, and semantic locators.
Creating and refining Mermaid diagrams with live reload. Use when users want flowcharts, sequence diagrams, class diagrams, ER diagrams, state diagrams, or any other Mermaid visualization. Provides best practices for syntax, styling, and the iterative workflow using mermaid_preview and mermaid_save tools.
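A tiny example of the kind of Mermaid source this workflow iterates on; the filename is illustrative, and in practice the mermaid_preview and mermaid_save tools handle rendering and saving:

```bash
# Write a minimal flowchart definition to iterate on (filename is a placeholder).
cat > diagram.mmd <<'EOF'
flowchart TD
    A[Draft diagram] --> B{Renders cleanly?}
    B -- yes --> C[Save with mermaid_save]
    B -- no --> D[Adjust syntax and styling]
    D --> A
EOF
```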
Explore code structure, architecture, files, directories, and component relationships using the OpenTrace knowledge graph. Use this skill for ANY question about the codebase that the graph might answer — including browsing, searching, and understanding code organization.
Guide for creating effective skills that extend agent capabilities with specialized knowledge, workflows, or tool integrations. Use this skill when the user asks to: (1) create a new skill, (2) make a skill, (3) build a skill, (4) set up a skill, (5) initialize a skill, (6) scaffold a skill, (7) update or modify an existing skill, (8) validate a skill, (9) learn about skill structure, (10) understand how skills work, or (11) get guidance on skill design patterns. Trigger on phrases like "create a skill", "new skill", "make a skill", "skill for X", "how do I create a skill", or "help me build a skill".
Testing-first Open-FDD lab skill: external bench validation, frontend/API parity, BRICK+BACnet verification, overnight triage, and issue filing for confirmed product defects.
Monitors context window health throughout a session and rides peak context quality for maximum output fidelity. Activates automatically after plan-interview and intent-framed-agent. Stays active through execution and hands off cleanly to simplify-and-harden and self-improvement when the wave completes naturally or exits via handoff. Use this skill whenever a multi-step agent task is underway and session continuity or context drift is a concern. Especially important for long-running tasks, complex refactors, or any work where degraded context would silently corrupt the output. Trigger even if the user doesn't say "context surfing" — if an agent task is running across multiple steps with intent and a plan already established, this skill is live.

---

# Context Surfing

## Install

```bash
npx skills add pskoett/pskoett-ai-skills/skills/context-surfing
```

The agent rides the wave of peak context. When the wave crests, it commits. When it detects drift, it pulls out cleanly — saving state, handing off, and letting the next session catch the next wave. No wipeouts. No zombie sessions. Only intentional, high-fidelity execution.

---

## Mental Model
skill-sample/
├─ SKILL.md                         ⭐ Required: skill entry doc (purpose / usage / examples / deps)
├─ manifest.sample.json             ⭐ Recommended: machine-readable metadata (index / validation / autofill)
├─ LICENSE.sample                   ⭐ Recommended: license & scope (open source / restriction / commercial)
├─ scripts/
│  └─ example-run.py                ✅ Runnable example script for quick verification
├─ assets/
│  ├─ example-formatting-guide.md   🧩 Output conventions: layout / structure / style
│  └─ example-template.tex          🧩 Templates: quickly generate standardized output
└─ references/                      🧩 Knowledge base: methods / guides / best practices
   ├─ example-ref-structure.md      🧩 Structure reference
   ├─ example-ref-analysis.md       🧩 Analysis reference
   └─ example-ref-visuals.md        🧩 Visual reference
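A minimal shell sketch for scaffolding this layout locally; the directory and file names are taken directly from the sample tree above:

```bash
# Scaffold the sample skill layout shown above.
mkdir -p skill-sample/{scripts,assets,references}
touch skill-sample/SKILL.md \
      skill-sample/manifest.sample.json \
      skill-sample/LICENSE.sample \
      skill-sample/scripts/example-run.py \
      skill-sample/assets/example-formatting-guide.md \
      skill-sample/assets/example-template.tex \
      skill-sample/references/example-ref-structure.md \
      skill-sample/references/example-ref-analysis.md \
      skill-sample/references/example-ref-visuals.md
```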
More on the Agent Skills spec (Anthropic docs): https://agentskills.io/home
├─ ⭐ Required: YAML Frontmatter (must be at top)
│  ├─ ⭐ name        : unique skill name, follow naming convention
│  └─ ⭐ description : include trigger keywords for matching
│
├─ ✅ Optional: Frontmatter extension fields
│  ├─ ✅ license       : license identifier
│  ├─ ✅ compatibility : runtime constraints when needed
│  ├─ ✅ metadata      : key-value fields (author/version/source_url...)
│  └─ 🧩 allowed-tools : tool whitelist (experimental)
│
└─ ✅ Recommended: Markdown body (progressive disclosure)
   ├─ ✅ Overview / Purpose
   ├─ ✅ When to use
   ├─ ✅ Step-by-step
   ├─ ✅ Inputs / Outputs
   ├─ ✅ Examples
   ├─ 🧩 Files & References
   ├─ 🧩 Edge cases
   ├─ 🧩 Troubleshooting
   └─ 🧩 Safety notes
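A minimal SKILL.md sketch built from the fields above; the name, description, and metadata values are placeholders, not a real skill:

```bash
# Write a minimal SKILL.md with the required frontmatter (values are placeholders).
cat > SKILL.md <<'EOF'
---
name: example-skill
description: Summarize CSV files. Trigger on "summarize csv" or "csv report".
license: MIT
metadata:
  author: your-name
  version: 0.1.0
---

## Overview / Purpose
Summarize a CSV file into a short report.

## When to use
When the user asks for a quick summary of tabular data.
EOF
```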
Skill files are scattered across GitHub and communities, difficult to search, and hard to evaluate. SkillWink organizes open-source skills into a searchable, filterable library you can directly download and use.
We provide keyword search, version updates, multi-metric ranking (downloads / likes / comments / updates), and open SKILL.md standards. You can also discuss usage and improvements on skill detail pages.
Quick Start:
Import/download skills (.zip/.skill), then place locally:

- ~/.claude/skills/ (Claude Code)
- ~/.codex/skills/ (Codex CLI)

One SKILL.md can be reused across tools.
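A minimal sketch of the placement step; the archive and folder names are placeholders, and the target paths come from the list above:

```bash
# Unpack a downloaded skill archive into the Claude Code skills directory.
mkdir -p ~/.claude/skills
unzip example-skill.zip -d ~/.claude/skills/example-skill   # archive name is a placeholder
# For Codex CLI, place the skill under ~/.codex/skills/ instead.
```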
Everything you need to know: what skills are, how they work, how to find/import them, and how to contribute.
A skill is a reusable capability package, usually including SKILL.md (purpose/IO/how-to) and optional scripts/templates/examples.
Think of it as a plugin playbook + resource bundle for AI assistants/toolchains.
Skills use progressive disclosure: load brief metadata first, load full docs only when needed, then execute by guidance.
This keeps agents lightweight while preserving enough context for complex tasks.
Use these three together:
Note: for all import methods, file size should be within 10MB.
Typical paths (may vary by local setup):
One SKILL.md can usually be reused across tools.
Yes. Most skills are standardized docs + assets, so they can be reused wherever the format is supported.
Example: retrieval + writing + automation scripts as one workflow.
Some skills come from public GitHub repositories and some are uploaded by SkillWink creators. Always review code before installing and own your security decisions.
Most common reasons:
We try to avoid that. Use ranking + comments to surface better skills: