pdf-convert-to-word — Skillopedia

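Role

You are a super work assistant that completes tasks in the current working directory at the user's request.

You do not execute a user's task hastily; before each task you think it through independently and thoroughly:
- Identify the user's preferred language, interact in the language the user knows best, and produce the deliverables in it;
- Check the `.memories` directory for reference material from earlier tasks, completed in the current working directory, that the current task builds on;
- Understand the user's requirements in depth, analyze the user's goal, and draw up a sensible work plan;
- Set objective evaluation criteria for the task result and use them as the basis for checking completion.

You have a rigorous system for executing tasks:
- You use programming skills and mature open-source libraries to build your own tools for the task;
- For computational tasks, you use scripts and code to compute accurately over complex, large-volume data;
- For complex tasks, you design, verify, and refine a sensible checklist to keep the task on track;
- You check task results thoroughly, with a rigorous and objective attitude;
- You work tidily: documents, code, and scripts produced along the way are organized and stored in an orderly manner, and unneeded temporary files are cleaned up promptly.

You have a complete, mature system for presenting results to the user:
- You "generate a local web page" to present task results more vividly;
- You use Markdown documents to present document-style content;
- You use echarts, matplotlib, and similar tools to generate charts that support key data.

You record key results and process experience in `.memories`:
- Record key results: for what the user asks…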

)]\n \n if not docx_files:\n print(\"当前目录下没有找到.docx文件\")\n return\n \n print(f\"找到 {len(docx_files)} 个Word文档: {docx_files}\")\n \n # 转换每个Word文档\n for docx_file in docx_files:\n docx_path = os.path.join(current_dir, docx_file)\n base_name = os.path.splitext(os.path.basename(docx_path))[0]\n md_file_path = os.path.join(current_dir, f\"{base_name}.md\")\n \n print(f\"正在将 {docx_file} 转换为 Markdown...\")\n \n try:\n markdown_content = convert_docx_to_md(docx_path)\n \n with open(md_file_path, 'w', encoding='utf-8') as f:\n f.write(markdown_content)\n \n print(f\"转换完成：{md_file_path}\")\n \n except Exception as e:\n print(f\"转换失败 {docx_file}: {e}\")\n\n\nif __name__ == \"__main__\":\n main()","content_type":"text/x-python; charset=utf-8","language":"python","size":3147,"content_sha256":"155428979ab3634b142d405a894e9bd92ef4979f13443bb851fe58bdd565934c"},{"filename":"convert_word_to_md.py","content":"import docx\nimport os\n\n\ndef word_to_markdown(docx_path):\n \"\"\"\n 将Word文档(.docx)转换为Markdown格式\n \"\"\"\n # 加载Word文档\n try:\n doc = docx.Document(docx_path)\n except Exception as e:\n print(f\"错误：无法加载文档 {docx_path} - {e}\")\n return None\n\n markdown_content = []\n \n # 处理文档中的每个段落和元素\n for element in doc.element.body:\n # 检查元素类型并转换为相应的Markdown格式\n if element.tag.endswith('p'): # 段落\n paragraph = docx.text.paragraph.Paragraph(element, doc.styles)\n text = paragraph.text.strip()\n if text:\n # 检查标题样式\n style = paragraph.style.name\n if style.startswith('Heading'):\n level = style.split(' ')[1] if ' ' in style else '1'\n try:\n level = int(level)\n except ValueError:\n level = 1\n markdown_content.append(f\"{'#' * level} {text}\\n\")\n else:\n markdown_content.append(f\"{text}\\n\")\n elif element.tag.endswith('tbl'): # 表格\n table = docx.table.Table(element, doc.part)\n # 转换表格为Markdown格式\n markdown_content.append(\"\\n| \")\n # 添加表头\n header_cells = [cell.text for cell in table.rows[0].cells]\n markdown_content.append(\" | \".join(header_cells))\n markdown_content.append(\" 
|\\n\")\n \n # 添加分隔行\n markdown_content.append(\"| \")\n markdown_content.append(\" | \".join(['---' for _ in header_cells]))\n markdown_content.append(\" |\\n\")\n \n # 添加其他行\n for i in range(1, len(table.rows)):\n cells = [cell.text for cell in table.rows[i].cells]\n markdown_content.append(\"| \")\n markdown_content.append(\" | \".join(cells))\n markdown_content.append(\" |\\n\")\n \n return \"\".join(markdown_content)\n\n\ndef convert_docx_to_md(docx_file_path):\n \"\"\"\n 转换单个docx文件到markdown\n \"\"\"\n # 获取文件名（不含扩展名）\n base_name = os.path.splitext(os.path.basename(docx_file_path))[0]\n md_file_path = os.path.join(os.path.dirname(docx_file_path), f\"{base_name}.md\")\n \n print(f\"正在将 {docx_file_path} 转换为 {md_file_path}\")\n \n markdown_content = word_to_markdown(docx_file_path)\n \n if markdown_content:\n with open(md_file_path, 'w', encoding='utf-8') as f:\n f.write(markdown_content)\n print(f\"转换完成：{md_file_path}\")\n return md_file_path\n else:\n print(f\"转换失败：{docx_file_path}\")\n return None\n\n\ndef main():\n # 获取当前目录\n current_dir = os.getcwd()\n \n # 查找所有.docx文件\n docx_files = [f for f in os.listdir(current_dir) if f.endswith('.docx')]\n \n if not docx_files:\n print(\"当前目录下没有找到.docx文件\")\n return\n \n print(f\"找到 {len(docx_files)} 个Word文档: {docx_files}\")\n \n # 转换每个Word文档\n for docx_file in docx_files:\n docx_path = os.path.join(current_dir, docx_file)\n convert_docx_to_md(docx_path)\n\n\nif __name__ == \"__main__\":\n main()","content_type":"text/x-python; charset=utf-8","language":"python","size":3387,"content_sha256":"e6107c6e9c7597cdb9ce168c993bddd7bfaebbcc6783103d3b32d75dcf724e98"},{"filename":"prompts/pdf_to_word_conversion_guide.md","content":"# PDF to Word Conversion Guide for AI Agents\n\nThis guide outlines a reliable method for converting PDF files to Word documents (`.docx`) while preserving images and formatting. This process utilizes the `pdf2docx` Python library.\n\n## 1. 
Goal\n\nThe primary goal is to convert a PDF file to a Word document, ensuring that images, layout, and text formatting are retained as accurately as possible.\n\n## 2. Recommended Tool\n\nThe recommended tool for this task is the `pdf2docx` Python library. It is an open-source library that has proven to be effective for this purpose.\n\n## 3. Conversion Process\n\nThe following steps provide a detailed walkthrough of the conversion process.\n\n### Step 1: Check for `pdf2docx` Installation\n\nFirst, check if the `pdf2docx` library is installed. Use the `pip3 show` command.\n\n```bash\npip3 show pdf2docx\n```\n\nIf the library is installed, you will see information about the package. If not, the command will likely return an error or no output.\n\n### Step 2: Install `pdf2docx` (if necessary)\n\nIf `pdf2docx` is not installed, install it using `pip3`.\n\n```bash\npip3 install pdf2docx\n```\n\n### Step 3: Handle Potential Version Conflicts\n\nA known issue with `pdf2docx` is an incompatibility with newer versions of its dependency, `PyMuPDF`. This can cause an `AttributeError: 'Rect' object has no attribute 'get_area'`.\n\nIf you encounter this error, you must downgrade `PyMuPDF` to a compatible version (e.g., 1.26.4).\n\n```bash\npip3 install PyMuPDF==1.26.4\n```\n\n### Step 4: Create a Conversion Script\n\nCreate a Python script to handle the conversion. 
This script will import the necessary library, define the file paths, and perform the conversion.\n\nHere is a template for the conversion script:\n\n```python\nfrom pdf2docx import Converter\n\n# Define the input PDF and output DOCX file paths\npdf_file = 'path/to/your/document.pdf'\ndocx_file = 'path/to/your/document.docx'\n\ntry:\n # Create a Converter object\n cv = Converter(pdf_file)\n\n # Convert the PDF to a DOCX file\n cv.convert(docx_file, start=0, end=None)\n\n # Close the Converter object\n cv.close()\n\n print(f\"Successfully converted {pdf_file} to {docx_file}\")\n\nexcept Exception as e:\n print(f\"An error occurred: {e}\")\n\n```\n\nReplace `'path/to/your/document.pdf'` and `'path/to/your/document.docx'` with the actual file paths.\n\n### Step 5: Execute the Conversion Script\n\nRun the Python script from your terminal.\n\n```bash\npython3 /path/to/your/script/convert.py\n```\n\nThe script will print a success message upon completion or an error message if something goes wrong.\n\n### Step 6: Clean Up\n\nAfter the conversion is complete, you can delete the temporary conversion script.\n\n```bash\nrm /path/to/your/script/convert.py\n```\n\n## 4. Summary of Commands\n\nHere is a summary of the shell commands used in this process:\n\n```bash\n# Check for pdf2docx installation\npip3 show pdf2docx\n\n# Install pdf2docx (if necessary)\npip3 install pdf2docx\n\n# Downgrade PyMuPDF to a compatible version (if needed)\npip3 install PyMuPDF==1.26.4\n\n# Execute the conversion script\npython3 convert.py\n\n# Remove the temporary script\nrm convert.py\n```\n\nBy following this guide, AI agents can efficiently and accurately convert PDF files to Word documents.\n","content_type":"text/markdown; charset=utf-8","language":"markdown","size":3215,"content_sha256":"911cb084bc6f825d25aa356d2950d1739ee35a4070398569f15d2b0760023ba5"},{"filename":"转换说明.md","content":"# 文档转换说明\n\n## 已完成的转换工作\n\n1. 
PDF → Word: 使用 `convert_pdf_to_word.py` 将 `曼德月度采购计划.pdf` 转换为 `曼德月度采购计划.docx`\n2. Word → Markdown: 使用 `convert_word_to_md_simple.py` 将 `曼德月度采购计划.docx` 转换为 `曼德月度采购计划.md`\n\n## 转换工具\n\n- `convert_pdf_to_word.py`: 用于将PDF文档转换为Word文档\n- `convert_word_to_md_simple.py`: 用于将Word文档转换为Markdown格式\n- `曼德月度采购计划.md`: 从Word文档转换而来的最终Markdown文件\n\n## 文件说明\n\n- `曼德月度采购计划.pdf`: 原始PDF文件\n- `曼德月度采购计划.docx`: 从PDF转换来的Word文件\n- `曼德月度采购计划.md`: 从Word转换来的Markdown文件（最终结果）\n\n## 注意事项\n\n- 转换结果保留了原始文档的表格和文本内容\n- Markdown文件可以直接在支持Markdown的编辑器中查看和编辑","content_type":"text/markdown; charset=utf-8","language":"markdown","size":926,"content_sha256":"72d47663e3e26cf372dbebe06c9f15a957de4d0ac5f8080d90b9ddd0a44658a3"}],"content_json":{"type":"doc","content":[{"type":"heading","attrs":{"level":2},"content":[{"text":"角色","type":"text"}]},{"type":"paragraph","content":[{"text":"你是一个超级工作助理，根据用户要求在当前工作目录文件夹完成工作","type":"text"}]},{"type":"paragraph","content":[{"text":"你不会直接武断的执行用户任务，每次任务开始前你都会进行独立周全的思考：","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"分析用户偏好的沟通语言，并采用用户最熟悉的语言与用户互动，产出工作结果；","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"查阅 ","type":"text"},{"text":".memories","type":"text","marks":[{"type":"code_inline"}]},{"text":" 
文件目录中可能存在的在当前工作目录已经完成的，与用户当前任务有前后承接关系的历史任务的参考资料","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"深入理解用户需求，分析用于目的，并合理的规划工作计划","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"客观制定任务结果的评估标准，并用于任务完成度检查的依据","type":"text"}]}]}]},{"type":"paragraph","content":[{"text":"你有一套严谨的任务执行方案体系：","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"你善于利用编程技巧，使用成熟的开源库来构建自己的任务执行的工具","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"对于计算类的任务，善于利用脚本、代码来完成复杂、海量数据的准确计算；","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"对于复杂任务，善于设计、核对、优化合理的计划清单保障任务的正常进行；","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"对于任务结果，你会以严谨客观的态度对结果进行周全的检查；","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"你有整洁的工作作风，对工作产生的文档、代码、脚本，能够有条理的整理和存放，并及时处理不必要的临时文件；","type":"text"}]}]}]},{"type":"paragraph","content":[{"text":"你有一套完善、成熟的体系来向用户来呈现工作结果：","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"善与用“生成一个本地网页”的形式来更生动的呈现任务结果；","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"善于用 markdown 文档来呈现文档型内容结果；","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"善于用 echarts 、matplotlib 等方式来生成图表来配合呈现关键数据；","type":"text"}]}]}]},{"type":"paragraph","content":[{"text":"你善于将关键执行结果和过程经验记录下到 ","type":"text"},{"text":".memories","type":"text","marks":[{"type":"code_inline"}]},{"text":" 中：","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"记录关键结果：对用户要求编写或者总结的内容，都会以 
markdown文件的形式记录下来；","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"记录执行过程：每次任务完成，都会将本次任务发生的过程，以目标、计划、过程、结果以 markdown文件的形式记录下来；","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"设计合理的记录文件名：每一份记录文件的文件名都会语义明确的表明内容，方便你下一步任务执行的时候能够语句文件名有效的进行关联检索或用户主动查阅；","type":"text"}]}]}]},{"type":"paragraph","content":[{"text":"你具备专业的文档/文件处理知识包括但不限于：","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"使用 openpyxl 等开源库和工具来处理 excel 文件；","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"使用 FFmpeg 等开源库和工具来处理视频和音频文件；","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"使用 ","type":"text"},{"text":".prompts/pdf_to_word_conversion_guide.md","type":"text","marks":[{"type":"code_inline"}]},{"text":" 中记录的相关方法处理 pdf 内容提取，以及生成word文档的","type":"text"}]}]}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Directory Overview","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":".attachments","type":"text","marks":[{"type":"code_inline"}]},{"text":"：附件目录，通过用户通过上传附件的方式添加，根据提示词判断是否需要读取内容","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"memories","type":"text","marks":[{"type":"code_inline"}]},{"text":": 
用于你主动记录和查阅在当前工作目录下的历史执行过程和关键执行结果","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":".prompts","type":"text","marks":[{"type":"code_inline"}]},{"text":"：提示词目录，包括各个特殊的文件处理场景的经验和技术方案；","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"SKILL.md","type":"text","marks":[{"type":"code_inline"}]},{"text":"：本模板的唯一规范文件。","type":"text"}]}]}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"要求","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"安全隐私要求：严禁上传/暴露本机私密信息与凭据；如需示例，使用脱敏占位数据。","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"字体使用要求：在为图表、文档或脚本设置字体时，应始终检查并使用目标系统上已安装且支持所需语言字符（如中文、日文等）的字体，避免使用不存在或不支持特定语言的字体导致显示异常。","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"默认语言要求：请尽量用中文跟用户交流","type":"text"}]}]}]},{"type":"hr","attrs":{"markup":"---"}}]},"metadata":{"date":"2026-06-05","name":"pdf-convert-to-word","author":"@skillopedia","source":{"stars":114,"repo_name":"dingtalk-wukong-skills","origin_url":"https://github.com/stvlynn/dingtalk-wukong-skills/blob/HEAD/pdf-convert-to-word/SKILL.md","repo_owner":"stvlynn","body_sha256":"11d1e09cbd4061e4dc9234a21ed338fd1097f05c29934a27646bca4affd0f8af","cluster_key":"58dca40698c5fc4b8e49872068d4ab69b1fbadd58b1671a696a647e2d523e3d9","clean_bundle":{"format":"clean-skill-bundle-v1","source":"stvlynn/dingtalk-wukong-skills/pdf-convert-to-word/SKILL.md","attachments":[{"id":"17edbe0a-d7bf-5488-9ac8-bcbacc3fdcc7","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/17edbe0a-d7bf-5488-9ac8-bcbacc3fdcc7/attachment.py","path":"convert_pdf_to_word.py","size":717,"sha256":"b97e7de462d89f629ddd6fcb47dda63083dc9a1dde46b86aacc059b374a81086","contentType":"text/x-python; 
charset=utf-8"},{"id":"48a2ad9b-693e-5297-8be2-c6296a5da8b1","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/48a2ad9b-693e-5297-8be2-c6296a5da8b1/attachment.py","path":"convert_word_to_md.py","size":3387,"sha256":"e6107c6e9c7597cdb9ce168c993bddd7bfaebbcc6783103d3b32d75dcf724e98","contentType":"text/x-python; charset=utf-8"},{"id":"3b5961a3-d6c7-5cc7-85e1-e5de9cb05935","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/3b5961a3-d6c7-5cc7-85e1-e5de9cb05935/attachment.py","path":"convert_word_to_md_simple.py","size":3147,"sha256":"155428979ab3634b142d405a894e9bd92ef4979f13443bb851fe58bdd565934c","contentType":"text/x-python; charset=utf-8"},{"id":"133e299e-87bd-575e-8466-77b612e2cfaf","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/133e299e-87bd-575e-8466-77b612e2cfaf/attachment.md","path":"prompts/pdf_to_word_conversion_guide.md","size":3215,"sha256":"911cb084bc6f825d25aa356d2950d1739ee35a4070398569f15d2b0760023ba5","contentType":"text/markdown; charset=utf-8"},{"id":"2fc7c20c-983e-5cbc-a3cc-2baae904e4fc","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/2fc7c20c-983e-5cbc-a3cc-2baae904e4fc/attachment.md","path":"转换说明.md","size":926,"sha256":"72d47663e3e26cf372dbebe06c9f15a957de4d0ac5f8080d90b9ddd0a44658a3","contentType":"text/markdown; charset=utf-8"}],"bundle_sha256":"8a63d5522f8caf09c1e8c4a0addc9086b1bf906af3ac76e53f99f57066c3a7ad","attachment_count":5,"text_attachments":5,"attachment_storage":"skillopedia-attachments-v1","binary_attachments":0,"excluded_attachments":[]},"cluster_size":1,"skill_md_path":"pdf-convert-to-word/SKILL.md","import_metadata":{"date":"2026-06-05","author":"@skillopedia","version":"v1","category":"web-development","category_label":"Web"},"exact_dupes_collapsed_into_this":0},"version":"v1","category":"web-development","import_tag":"clean-skills-v1","description":"Convert PDF files into editable Word and Markdown outputs with the bundled conversion scripts and workflow guide."}},"renderedAt":1782981626252}
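
The two formatting rules that `word_to_markdown` in `convert_word_to_md.py` applies (heading styles become `#` prefixes, the first table row becomes the Markdown header) can be sketched without python-docx. This is a minimal sketch under stated assumptions, not the bundled implementation: the helper names `heading_prefix`, `escape_cell`, and `rows_to_markdown_table` are hypothetical, and escaping `|`, collapsing line breaks inside cells, padding short rows, and clamping headings to six levels are additions that the original script does not perform.

```python
from typing import List, Sequence


def heading_prefix(style_name: str) -> str:
    """Map a Word style name such as 'Heading 2' to a Markdown prefix ('## ').

    Hypothetical helper: non-heading styles yield an empty prefix, and an
    unparsable or missing level falls back to 1, as in the bundled script.
    """
    if not style_name.startswith("Heading"):
        return ""
    parts = style_name.split()
    try:
        level = int(parts[1]) if len(parts) > 1 else 1
    except ValueError:
        level = 1
    # Assumption: clamp to the six heading levels Markdown supports.
    level = max(1, min(level, 6))
    return "#" * level + " "


def escape_cell(text: str) -> str:
    """Escape pipes and collapse whitespace so a cell stays on one table row."""
    return " ".join(text.replace("|", "\\|").split())


def rows_to_markdown_table(rows: Sequence[Sequence[str]]) -> str:
    """Render rows of cell text as a Markdown table; the first row is the header."""
    if not rows:
        return ""
    width = max(len(row) for row in rows)
    if width == 0:
        return ""
    # Pad short rows so every line has the same number of columns.
    padded: List[List[str]] = [list(row) + [""] * (width - len(row)) for row in rows]

    def render(cells: Sequence[str]) -> str:
        return "| " + " | ".join(escape_cell(cell) for cell in cells) + " |"

    lines = [render(padded[0]), "| " + " | ".join(["---"] * width) + " |"]
    lines.extend(render(row) for row in padded[1:])
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(heading_prefix("Heading 2") + "Monthly plan")
    print(rows_to_markdown_table([["Item", "Qty"], ["A|B", "1\n2"], ["C"]]))
```

With python-docx, the rows could be built from `[[cell.text for cell in row.cells] for row in table.rows]`, which is how the bundled script reads cell text.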

