mineru-document-explorer

Category: Tools & Productivity | Uploader: mi-iro | Downloads: 0 | Version: v1.0 (latest)

Changelog: Source: GitHub https://github.com/mi-iro/doc-search

Directory structure

Current path: tree/main/.claude/skills/mineru-document-explorer/

📁 references/
- 📄 cmd-elements.md 1.3 KB
- 📄 cmd-init.md 1.1 KB
- 📄 cmd-outline.md 1.4 KB
- 📄 cmd-pages.md 1.1 KB
- 📄 cmd-search-keyword.md 1.4 KB
- 📄 cmd-search-semantic.md 1.4 KB
- 📄 tips.md 624 B
📄 SKILL.md 2.3 KB

SKILL.md

---
name: mineru-document-explorer
description: >
  REQUIRED for any task that involves reading or understanding a PDF's contents. When a user mentions a .pdf and asks what's inside — to read specific pages, answer questions, compare tables or data, extract facts or numbers, count figures, or locate topics — this skill MUST be used instead of generic PDF tools. Provides targeted search, page-level navigation, and precise extraction so you never need to dump an entire PDF into context. Only skip this skill for PDF file operations: merge, split, watermark, create, form-fill, or encrypt.
---

# MinerU Document Explorer

A PDF reading toolkit used through the `doc-search` CLI. Workflow: **init → search/outline → read pages → (optionally) extract elements**.

## Commands

| Command | Purpose |
|---|---|
| `init` | Upload PDF, start processing, get `doc_id` |
| `outline` | Browse TOC/structure |
| `pages` | Read specific pages (images or text) |
| `search-keyword` | Find pages by regex pattern |
| `search-semantic` | Find pages by semantic query |
| `elements` | Extract evidence with bboxes and cropped images |
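The workflow line and the command table above can be read as ordered stages. The minimal sketch below encodes that reading; `DocSearchCommand`, `STAGE`, and `followsWorkflow` are hypothetical names, not part of `doc-search`. It assumes the strictest reading, in which a plan never returns to an earlier stage (real sessions may well go back to searching after reading a page), and that every plan begins with `init`, which may not hold if a `doc_id` from an earlier session is reused.

```typescript
// Hypothetical names: the six doc-search subcommands listed in the table.
type DocSearchCommand =
  | "init"
  | "outline"
  | "pages"
  | "search-keyword"
  | "search-semantic"
  | "elements";

// Stage of each command in init → search/outline → pages → elements.
// Commands sharing a stage are interchangeable alternatives.
const STAGE: Record<DocSearchCommand, number> = {
  init: 0,
  outline: 1,
  "search-keyword": 1,
  "search-semantic": 1,
  pages: 2,
  elements: 3,
};

// True if the plan starts with init and never steps back to an earlier
// stage. Repeating a stage (two searches, several pages calls) is allowed,
// and so is skipping one (e.g. init straight to pages).
function followsWorkflow(plan: readonly DocSearchCommand[]): boolean {
  if (plan[0] !== "init") return false;
  let prev = STAGE.init;
  for (const cmd of plan.slice(1)) {
    const stage = STAGE[cmd];
    if (stage < prev) return false;
    prev = stage;
  }
  return true;
}
```

For example, `followsWorkflow(["init", "search-keyword", "pages"])` is `true`, while `["init", "pages", "search-keyword"]` is rejected under this strict reading.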

All commands output JSON to stdout. Append `2>/dev/null` to a command to discard its stderr logs.
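When calling the CLI from a Node.js script rather than a shell, the same contract (JSON on stdout, logs on stderr) can be consumed as in the minimal sketch below. `runDocSearch`, `parseDocSearchOutput`, and `isErrorResponse` are hypothetical helpers; the sketch assumes the subcommand is passed as the first argument to `doc-search`, leaves all other arguments to the caller (take them from the matching `references/cmd-*.md`), and assumes a failing command may exit non-zero while still printing an `ErrorResponse` on stdout.

```typescript
import { execFileSync } from "node:child_process";

// Mirrors the ErrorResponse type declared in the shared types.
interface ErrorResponse {
  status: "error";
  error: string;
  warnings?: string[];
}

function isErrorResponse(x: unknown): x is ErrorResponse {
  return typeof x === "object" && x !== null && (x as { status?: unknown }).status === "error";
}

// Every command prints JSON on stdout, so anything that fails to parse is
// turned into an ErrorResponse rather than returned as data.
function parseDocSearchOutput(stdout: string): unknown {
  try {
    return JSON.parse(stdout);
  } catch {
    const failure: ErrorResponse = {
      status: "error",
      error: `non-JSON stdout: ${stdout.slice(0, 80)}`,
    };
    return failure;
  }
}

// Hypothetical wrapper. `args` must come from the matching references/cmd-*.md
// file; no flag syntax is assumed here. Ignoring stderr via stdio is the
// in-process equivalent of appending 2>/dev/null.
export function runDocSearch(command: string, args: string[]): unknown {
  try {
    const stdout = execFileSync("doc-search", [command, ...args], {
      encoding: "utf8",
      stdio: ["ignore", "pipe", "ignore"],
    });
    return parseDocSearchOutput(stdout);
  } catch (err) {
    // Assumption: a failing command may still print an ErrorResponse on
    // stdout; Node attaches the captured stdout to the thrown error.
    const stdout = (err as { stdout?: string | null }).stdout;
    if (stdout) return parseDocSearchOutput(stdout);
    const failure: ErrorResponse = { status: "error", error: String(err) };
    return failure;
  }
}
```

Checking `isErrorResponse(result)` before using a result lets the caller surface `result.error` and any `warnings` instead of treating a failure as document content.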

```typescript
// Shared types used across all commands
interface Page {
  page_idx: number;          // 0-indexed
  image_path?: string;       // present unless --no_image
  ocr_text?: string;         // present with --return_text
  num_tokens?: number;       // present with --return_text
}

interface ErrorResponse {
  status: "error";
  error: string;
  warnings?: string[];
}
```
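Because `page_idx` is 0-indexed and `num_tokens` is only present with `--return_text`, two small helpers tend to be useful when handling `Page` results. This is a hypothetical sketch (`toPageIdx`, `toHumanPage`, and `takeWithinBudget` are illustrative names, not part of the CLI). The 1-based conversion assumes the user counts physical pages starting at 1, which can differ from page labels printed inside the PDF.

```typescript
// Same shape as the Page interface above.
interface Page {
  page_idx: number;
  image_path?: string;
  ocr_text?: string;
  num_tokens?: number;
}

// page_idx is 0-indexed, while people usually call the first page "page 1".
const toPageIdx = (humanPage: number): number => humanPage - 1;
const toHumanPage = (p: Page): number => p.page_idx + 1;

// Keep pages in order until adding the next one would exceed maxTokens.
// Pages without num_tokens (fetched without --return_text) are skipped,
// since their text cost is unknown.
function takeWithinBudget(pages: Page[], maxTokens: number): Page[] {
  const kept: Page[] = [];
  let used = 0;
  for (const p of pages) {
    if (p.num_tokens === undefined) continue;
    if (used + p.num_tokens > maxTokens) break;
    kept.push(p);
    used += p.num_tokens;
  }
  return kept;
}
```

A budget like this keeps text pulled into context bounded, in line with the goal of never dumping an entire PDF at once.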

## Reference docs (read on demand)

**IMPORTANT: before using a command, you MUST read its reference doc first.** Each one contains parameter details and practical examples:

- `references/cmd-init.md` / `cmd-outline.md` / `cmd-pages.md`
- `references/cmd-search-keyword.md` / `cmd-search-semantic.md` / `cmd-elements.md`
- `references/tips.md` — workflow patterns and lessons learned

## Lessons learned (mandatory)

After completing each PDF-related task, review the session:
- **Hit a pitfall?** Write it to `references/tips.md`
- **Discovered a new workflow?** Add it to the "Common workflows" section in tips.md
- **Learned something new about command parameters?** Add that too

**Keep each entry to 1-2 lines max — conclusions only, no narrative.** Skipping this = repeating the same mistakes next time.
