pdf-page-extract — Skillopedia

PDF Page Extract Skill

Purpose

This skill extracts all necessary data from PDF pages to enable accurate AI-driven HTML generation. It produces three critical artifacts:

1. Rich extraction data - Text spans with font metadata (sizes, styles, positions)
2. Rendered PNG image - Visual reference for the AI to understand page layout
3. Page mapping - Authoritative mapping of PDF indices to book pages

This is the deterministic, Python-based foundation for the entire pipeline. All extracted data is saved to persistent files for traceability and future processing.

What to Do

1. Validate input parameters…