pdf-ocr — Skillopedia

PDF OCR

Overview

Extract readable text from scanned or image-based PDF documents using optical character recognition (OCR). This skill converts PDF pages to images, runs OCR on them to detect text, and outputs clean, structured text. It handles multi-page documents, multiple languages, and low-quality scans (through image preprocessing).

Instructions

When a user asks to OCR a scanned PDF or extract text from an image-based PDF, follow these steps.

Step 1: Check whether OCR is actually needed

First, attempt normal text extraction. If the PDF already contains selectable text, OCR is unnecessary.

Step 2: Install and verify…
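Step 1 says to try normal text extraction before running OCR. The Python sketch below shows one way to make that decision. It assumes the third-party pypdf library as the extractor (`PdfReader` and `page.extract_text()`). The helper names `extract_page_texts`, `pages_needing_ocr` and `needs_ocr` and the `MIN_CHARS` threshold are hypothetical, not part of the skill. The check runs page by page, so a mostly digital PDF with a few scanned pages is still flagged.

```python
"""Sketch: decide whether a PDF needs OCR by first trying normal text extraction.

Assumptions (illustrative, not from the original skill): pypdf is the extractor,
and a page counts as having a text layer only if it yields at least MIN_CHARS
non-whitespace characters.
"""

from typing import List, Sequence

# Hypothetical threshold: pages with fewer visible characters than this are
# treated as image-only (a stray page number alone should not count as text).
MIN_CHARS = 20


def extract_page_texts(path: str) -> List[str]:
    """Return the embedded text layer of each page (empty string if none)."""
    from pypdf import PdfReader  # third-party: pip install pypdf

    reader = PdfReader(path)
    return [page.extract_text() or "" for page in reader.pages]


def pages_needing_ocr(page_texts: Sequence[str], min_chars: int = MIN_CHARS) -> List[int]:
    """Return 0-based indices of pages whose text layer is missing or too sparse."""
    needs = []
    for i, text in enumerate(page_texts):
        visible = sum(1 for ch in text if not ch.isspace())
        if visible < min_chars:
            needs.append(i)
    return needs


def needs_ocr(page_texts: Sequence[str], min_chars: int = MIN_CHARS) -> bool:
    """True if any page lacks a usable text layer, so OCR should run."""
    return bool(pages_needing_ocr(page_texts, min_chars))
```

For example, `needs_ocr(extract_page_texts("scan.pdf"))` should return `True` for a purely scanned file, in which case you continue with OCR. The indices from `pages_needing_ocr` let you OCR only the image-only pages and keep the existing text layer for the rest.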