Use when the user wants to analyze dataset bias, create stratified samples, evaluate fairness, or plan dataset collection. Triggers on phrases like "dataset bias", "stratified sample", "class imbalance", "data distribution", "fairness analysis", or "ethical review".

---

# Dataset Curation Methodology

You are helping a researcher curate, analyze, or expand a dataset with attention to bias, fairness, and quality.

## Step 1: Distribution Analysis

Before any curation action, understand the current state:

### Per-Class Distribution

- Count instances per class/label/tag
- Compute imbalance ratio (max_count / min_count)
- Identify severely underrepresented classes (< 5% of max class)
- Visualize: bar chart of class frequencies sorted by count

### Co-occurrence Analysis

- Build co-occurrence matrix: which labels appear together
- Identify spurious correlations (e.g., "violence" always co-occurs with "male")
- Check for label leakage between splits

### Metadata Distribution

- Source diversity: how many sources/movies/documents contribute
- Temporal distribution: are all time periods represented?
- Content diversity: genre, style, domain coverage

## Step 2: Bias Assessment
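The Step 1 per-class and co-occurrence checks can be sketched in plain Python. This is a minimal illustration, not the skill's actual implementation: the helper names `class_distribution` and `cooccurrence` are hypothetical, and the 5% cutoff mirrors the rule of thumb above.

```python
from collections import Counter
from itertools import combinations

def class_distribution(labels, under_threshold=0.05):
    """Per-class counts, imbalance ratio, and underrepresented classes."""
    counts = Counter(labels)
    max_count = max(counts.values())
    min_count = min(counts.values())
    imbalance_ratio = max_count / min_count
    # "severely underrepresented" = below 5% of the largest class
    underrepresented = [c for c, n in counts.items()
                        if n < under_threshold * max_count]
    return counts, imbalance_ratio, underrepresented

def cooccurrence(tag_sets):
    """Count how often each unordered pair of labels appears together."""
    co = Counter()
    for tags in tag_sets:
        for a, b in combinations(sorted(set(tags)), 2):
            co[(a, b)] += 1
    return co
```

A spurious correlation would then show up as a pair whose co-occurrence count nearly equals the count of one of its members.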
Recursively scan neuroscience data directories containing HDF5 (.h5, .hdf5) and MATLAB (.mat) files, inspect their internal structure (dataset keys, shapes, dtypes, byte sizes), and produce a structured meta.json catalog. Automatically detects MATLAB v7.3 files and falls back to h5py when scipy.io.loadmat fails. Supports wildcard pattern merging to collapse repeated subject directory structures into single entries with dimension ranges. Use this skill when the user needs to generate a metadata summary of a hierarchical neuroscience dataset folder.
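The scan described above might be sketched as follows. This is an illustrative outline, not the skill's code: the helper names (`is_matlab_v73`, `catalog_hdf5`, `scan`) are invented, wildcard merging is omitted, and instead of catching a `scipy.io.loadmat` failure it sniffs the documented .mat header directly (the first 116 bytes are descriptive text, and v7.3 files, which are HDF5 containers, embed "MATLAB 7.3" there).

```python
import json
import os

def is_matlab_v73(path):
    """Detect MATLAB v7.3 .mat files from the 116-byte text header."""
    with open(path, "rb") as f:
        header = f.read(116)
    return b"MATLAB 7.3" in header

def catalog_hdf5(path):
    """Collect dataset keys, shapes, dtypes, and byte sizes from one file."""
    import h5py  # deferred so the header check works without h5py installed
    entries = {}
    with h5py.File(path, "r") as f:
        def visit(name, obj):
            if isinstance(obj, h5py.Dataset):
                entries[name] = {
                    "shape": list(obj.shape),
                    "dtype": str(obj.dtype),
                    "nbytes": int(obj.nbytes),
                }
        f.visititems(visit)
    return entries

def scan(root):
    """Walk root, catalog HDF5-backed files, and write meta.json."""
    catalog = {}
    for dirpath, _, files in os.walk(root):
        for fn in files:
            p = os.path.join(dirpath, fn)
            if fn.endswith((".h5", ".hdf5")) or (
                fn.endswith(".mat") and is_matlab_v73(p)
            ):
                catalog[os.path.relpath(p, root)] = catalog_hdf5(p)
    with open(os.path.join(root, "meta.json"), "w") as out:
        json.dump(catalog, out, indent=2)
```

Pre-v7.3 .mat files would still go through `scipy.io.loadmat`, since they are not HDF5 and `h5py` cannot open them.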
INVOKE THIS SKILL when creating, managing, or querying Arize datasets and examples. Covers dataset CRUD, appending examples, exporting data, and file-based dataset creation using the ax CLI.
Skill files are scattered across GitHub and communities, difficult to search, and hard to evaluate. SkillWink organizes open-source skills into a searchable, filterable library you can directly download and use.
We provide keyword search, version updates, multi-metric ranking (downloads / likes / comments / updates), and open SKILL.md standards. You can also discuss usage and improvements on skill detail pages.
Sort by downloads/likes/comments/updated to find higher-quality skills.
4. Which import methods are supported?
Upload archive: .zip / .skill (recommended)
Upload skills folder
Import from GitHub repository
Note: for all import methods, the file size must be within 10 MB.
5. How to use in Claude / Codex?
Typical paths (may vary by local setup):
Claude Code: ~/.claude/skills/
Codex CLI: ~/.codex/skills/
One SKILL.md can usually be reused across tools.
6. Can one skill be shared across tools?
Yes. Most skills are standardized docs + assets, so they can be reused wherever the format is supported.
Example: retrieval + writing + automation scripts as one workflow.
7. Are these skills safe to use?
Some skills come from public GitHub repositories and some are uploaded by SkillWink creators. Always review code before installing and own your security decisions.
8. Why doesn't a skill work after import?
Most common reasons:
Wrong folder path or nested one level too deep
Invalid/incomplete SKILL.md fields or format
Dependencies missing (Python/Node/CLI)
Tool has not reloaded skills yet
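The first two causes above (wrong folder path, extra nesting) can be checked with a short script. This is a hypothetical sanity check, assuming the layout shown earlier (`~/.claude/skills/<skill-name>/SKILL.md`); the helper name `check_skills_dir` is invented for illustration.

```python
import os

def check_skills_dir(root):
    """Flag skills whose SKILL.md is missing or nested one level too deep."""
    problems = []
    for name in sorted(os.listdir(root)):
        skill_dir = os.path.join(root, name)
        if not os.path.isdir(skill_dir):
            continue
        if os.path.isfile(os.path.join(skill_dir, "SKILL.md")):
            continue  # correctly laid out
        # common mistake: archive extracted with an extra wrapper folder
        nested = [
            os.path.join(name, sub)
            for sub in os.listdir(skill_dir)
            if os.path.isfile(os.path.join(skill_dir, sub, "SKILL.md"))
        ]
        problems.append((name, nested))
    return problems
```

Each entry in the result names a broken skill folder and, where found, the deeper path that actually holds its SKILL.md, so you can move it up one level.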
9. Does SkillWink include duplicates/low-quality skills?
We try to avoid that. Use the ranking and comments to surface the better skills.