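Bidirectional document conversion tool that converts Word (.docx), Excel (.xlsx), PowerPoint (.pptx), and PDF (.pdf) files into AI-friendly Markdown, or converts Markdown (.md) into Word (.docx). Use when the user: (1) explicitly requests document conversion, including any request containing words such as "转换" (convert), "转为" (convert to), "转成" (turn into), "convert", "导出" (export), or "export" (for example: "转换文档", "把这个文件转为docx", "convert to markdown", "导出为Word"); (2) needs the AI to understand a document's contents ("帮我分析这个 Word 文件", "读取这个 PDF", "总结这个 Excel"); (3) uploads a document file and asks about its contents ("这是什么", "帮我看看"); (4) makes any request involving format conversion of .docx, .xlsx, .pptx, .pdf, or .md files.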
Comprehensive PDF manipulation toolkit for extracting text and tables, creating new PDFs, merging/splitting documents, and handling forms. Use when Claude needs to fill in a PDF form or programmatically process, generate, or analyze PDF documents at scale.
Extract text and tables from PDF files, merge/split documents, create PDFs, fill forms. Use when working with PDF documents.
Read, summarize, search, and extract structured artifacts from PDF files, especially papers and reports. Use when the user wants a PDF converted before reading, needs figures or tables pulled out, wants image labels or captions, or needs cleaner Markdown/text than reading the PDF directly.
Annotate academic papers and Zotero-managed PDF attachments with structured, high-value highlights and mentor-style notes. Use when a user wants paper pre-highlighting, reading guidance, layered annotation styles, note density control, color-coded importance, Zotero/PDF write-back, or a reusable workflow for studying, literature review, exam prep, or paper comprehension.
Generate PDF files using the nanhu-print-java framework. Use this skill when the user wants to create PDF documents with nanhu-print-java, including writing XML templates, preparing JSON business data, calling the Java API (NanhuprintInterpreter), or integrating nanhu-print-java into a Spring Boot service. Triggers on requests like "generate a PDF with nanhu-print-java", "write an XML template for nanhu-print-java", "how do I call NanhuprintInterpreter", or "create a PDF invoice/report using nanhu-print-java".
Render HTML to PDF. Use when the user wants to convert HTML to PDF, print a web page to PDF, or produce print-ready output from HTML. Trigger phrases include 'HTML to PDF', 'print to PDF', 'convert to PDF', 'save as PDF', '转PDF', '导出PDF'. Only local HTML file input is supported — remote URLs are not accepted.
Translate PDF documents while preserving original formatting. Converts PDF pages to images for layout analysis, extracts text and embedded images, translates content using the agent's multilingual capabilities, and generates LaTeX code to recreate the original document structure with translated text. Special support for arXiv papers: when the user provides an arXiv ID (e.g., "2310.12345") or an arXiv URL (e.g., "https://arxiv.org/abs/2310.12345" or "https://arxiv.org/pdf/2310.12345.pdf"), automatically downloads the source TeX files, translates the LaTeX content, and compiles it to PDF.
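Automated workflow for recognizing, classifying, and extracting data from reimbursement invoices. This skill must be used when the user needs to process invoice files, fill in reimbursement forms, identify invoice types (flight, train, lodging, DiDi, etc.), or extract invoice amounts and dates. It applies to finance reimbursement, travel-expense tallying, invoice management, and similar scenarios.

Trigger scenarios:

- The user mentions keywords such as "发票" (invoice), "报销" (reimbursement), "baoxiao", or "差旅费" (travel expenses)
- A reimbursement form needs to be filled in (e.g., biaoge.xlsx)
- PDF invoices need to be recognized and their amounts and dates extracted
- Invoice files need to be sorted by type
- Invoice data needs to be checked for plausibility

---

# Reimbursement Invoice Processing Skill (Baoxiao)

## Overview

This skill provides a complete invoice-processing workflow, including:

1. Automatic conversion of OFD files to PDF
2. Automatic recognition and classification of invoice files
3. Extraction of key data (amount, date, city name)
4. Data plausibility checks
5. Automatic filling of the Excel form (including city information)
6. Automatic filling of the Word approval document
7. PDF conversion and merging

## Prerequisites

The following tools must be installed. First check whether the user's environment already has them, and install any that are missing:

```bash
# PDF processing
pip3 install pdfplumber pdf2image pytesseract pillow openpyxl pandas python-docx reportlab pypdf pypdf2
# YAML configuration support (for reading the config.yaml configuration file)
apt-get install python3-yaml
# OCR engine
apt-get install tesseract-ocr tesseract-ocr-chi-sim poppler-utils
# PDF conversion (for Excel/Word to PDF)
apt-get install libreoffice-writer libreoffice-calc
```

Font dependencies: recognizing Chinese invoices requires common fonts to be installed, such as 宋体 (SimSun), 楷体 (KaiTi), 黑体 (SimHei), 仿宋 (FangSong), 仿宋_GB2312, 方正小标宋简体, and Arial.

## Configuration

### City-to-Unit Mapping Configuration

The tool obtains the city-to-unit mapping from a **configuration file** or from **command-line arguments**, and uses it to fill in the "到达单位" (destination unit) field of the Word approval document automatically.

#### Option 1: Configuration file (recommended)

1. **Create the configuration file**

```bash
cp config.example.yaml config.yaml
```

2. **Edit the configuration**

```yaml
# config.yaml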