Doc-Process: Document Intelligence Skill

Step 0: Auto-Setup (run once on first use)

Before invoking any script for the first time in a session, check whether the script dependencies are available. If any are missing, run the setup script automatically, with no prompting needed:

This installs all Python packages ( , , , , , , , ) and attempts to install system binaries ( , ) via or depending on the platform.

When to run Step 0:
- First time any script-assisted mode is used in a session
- After a fresh
- If a script fails with or

To install Python packages only (no system packages):

Or install di…
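The dependency check that Step 0 describes could be sketched roughly as follows. This is a minimal illustration, not the skill's actual setup logic: the helper name `missing_modules` is hypothetical, and the module names in the example are placeholders (standard-library modules plus one deliberately absent name), because the real package list is not recoverable from the text above.

```python
import importlib.util


def missing_modules(module_names):
    """Return the names from module_names that cannot be found for import.

    Hypothetical helper for illustration. Only top-level module names are
    handled here; find_spec() may raise for dotted names whose parent
    package is absent.
    """
    return [
        name
        for name in module_names
        if importlib.util.find_spec(name) is None
    ]


if __name__ == "__main__":
    # Placeholder list: substitute the modules the scripts actually import.
    required = ["json", "csv", "module_that_does_not_exist_xyz"]
    missing = missing_modules(required)
    if missing:
        # At this point a caller would run the setup script once.
        print("Missing modules, setup needed:", ", ".join(missing))
    else:
        print("All dependencies available.")
```

Checking with `find_spec` avoids actually importing heavy packages just to confirm they are installed, which keeps the first-use check cheap.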

, '', date_str)\n date_str = re.sub(r'\\s*[A-Z]{2,5}$'
, '', date_str).strip()\n\n for fmt in _DATE_FORMATS:\n try:\n return datetime.strptime(date_str, fmt)\n except ValueError:\n continue\n\n raise ValueError(\n f\"Unrecognised date format: '{date_str}'. \"\n \"Supported: YYYY-MM-DD, DD/MM/YYYY, MM/DD/YYYY, DD.MM.YYYY, \"\n \"DD Mon YYYY, Mon DD YYYY, YYYYMMDD, and variants with time.\"\n )\n\n\ndef normalise_date(date_str: str) -> str:\n \"\"\"Return ISO 8601 date string (YYYY-MM-DD), or raise ValueError.\"\"\"\n return parse_date(date_str).strftime(\"%Y-%m-%d\")\n\n\n# ---------------------------------------------------------------------------\n# Amount parsing\n# ---------------------------------------------------------------------------\n\n# Comprehensive currency symbol regex\n_CURRENCY_SYMBOLS = r'[£$€¥₹₩₪₺₽฿₫₦₲₡₵₴₸₼₾₠₢₣¢₤₨$£¥]'\n\n\ndef parse_amount(amount_str) -> float:\n \"\"\"\n Parse a monetary amount string to a float.\n\n Handles:\n - US format: $1,234.56 → 1234.56\n - European format: €1.234,56 → 1234.56\n - Space thousands: 1 234,56 → 1234.56\n - Parenthetical: (42.00) → -42.00\n - Currency codes: USD 12.50 → 12.50\n - Signed: -42.00, +12.00\n - Integers: 1000 → 1000.0\n - None / empty raises ValueError\n \"\"\"\n if amount_str is None:\n raise ValueError(\"Amount is None.\")\n if isinstance(amount_str, (int, float)):\n return round(float(amount_str), 4)\n\n s = str(amount_str).strip()\n if not s:\n raise ValueError(f\"Could not parse amount: '{amount_str}'\")\n\n # Parenthetical negative: (42.00) → -42.00\n negative = False\n if s.startswith('(') and s.endswith(')'):\n negative = True\n s = s[1:-1].strip()\n\n # Strip currency symbols\n s = re.sub(_CURRENCY_SYMBOLS, '', s).strip()\n\n # Strip leading/trailing ISO currency codes (USD, EUR, GBP…)\n s = re.sub(r'^[A-Za-z]{2,3}\\s+', '', s).strip()\n s = re.sub(r'\\s+[A-Za-z]{2,3}

$'
, '', s).strip()\n\n # Handle explicit sign\n if s.startswith('-'):\n negative = True\n s = s[1:].strip()\n elif s.startswith('+'):\n s = s[1:].strip()\n\n if not s:\n raise ValueError(f\"Could not parse amount: '{amount_str}'\")\n\n # Remove spaces (thousands separator in some EU locales)\n s_ns = s.replace('\\u00a0', '').replace(' ', '') # also strip non-breaking space\n\n has_comma = ',' in s_ns\n has_dot = '.' in s_ns\n\n if has_comma and has_dot:\n # Whichever is rightmost is the decimal separator\n if s_ns.rfind(',') > s_ns.rfind('.'):\n # EU: 1.234,56\n s_clean = s_ns.replace('.', '').replace(',', '.')\n else:\n # US: 1,234.56\n s_clean = s_ns.replace(',', '')\n\n elif has_comma and not has_dot:\n parts = s_ns.split(',')\n if len(parts) == 2 and len(parts[1]) == 3 and parts[0].lstrip('-+').isdigit():\n # Likely US thousands with no decimal: 1,234\n s_clean = s_ns.replace(',', '')\n elif len(parts) > 2:\n # Multiple commas → US thousands grouping: 1,234,567\n s_clean = s_ns.replace(',', '')\n else:\n # EU decimal: 1234,56 or 1,23\n s_clean = s_ns.replace(',', '.')\n\n elif has_dot and not has_comma:\n parts = s_ns.split('.')\n if len(parts) > 2:\n # Multiple dots → EU thousands: 1.234.567\n s_clean = s_ns.replace('.', '')\n else:\n # Single dot → decimal separator\n s_clean = s_ns\n\n else:\n # No separators → plain integer\n s_clean = s_ns\n\n if not s_clean or s_clean in ('.', ','):\n raise ValueError(f\"Could not parse amount: '{amount_str}'\")\n\n try:\n value = float(s_clean)\n if negative:\n value = -abs(value)\n return round(value, 4)\n except ValueError:\n raise ValueError(f\"Could not parse amount: '{amount_str}'\")\n\n\n# ---------------------------------------------------------------------------\n# Currency detection\n# ---------------------------------------------------------------------------\n\n_CURRENCY_CODE_RE = re.compile(\n r'\\b(USD|EUR|GBP|JPY|CAD|AUD|CHF|CNY|CNH|HKD|SGD|INR|KRW|MXN|BRL|'\n 
r'SEK|NOK|DKK|NZD|ZAR|AED|SAR|QAR|KWD|BHD|OMR|EGP|NGN|KES|IDR|MYR|'\n r'THB|PHP|VND|PKR|BDT|LKR|TWD|CZK|PLN|HUF|RON|BGN|HRK|RSD|TRY|RUB|'\n r'UAH|ILS|ARS|CLP|COP|PEN|UYU|PYG|BOB|GTQ|HNL|CRC|DOP|JMD|TTD|BBD)\\b'\n)\n\n\ndef detect_currency(text: str) -> str:\n \"\"\"\n Detect ISO currency code from a string. Returns code or 'USD' as default.\n \"\"\"\n if not text:\n return \"USD\"\n m = _CURRENCY_CODE_RE.search(text.upper())\n if m:\n return m.group(1)\n # Symbol fallback\n symbol_map = {\n '£': 'GBP', '€': 'EUR', '¥': 'JPY', '₹': 'INR', '₩': 'KRW',\n '₪': 'ILS', '₺': 'TRY', '₽': 'RUB', '฿': 'THB', '₫': 'VND',\n '₦': 'NGN', '₲': 'PYG', '₡': 'CRC', '₵': 'GHS', '₴': 'UAH',\n '₸': 'KZT', 'R

$': 'BRL', 'S$': 'SGD', 'HK$': 'HKD', 'NT$'
: 'TWD',\n }\n for sym, code in symbol_map.items():\n if sym in text:\n return code\n return \"USD\"\n\n\n# ---------------------------------------------------------------------------\n# Category taxonomy\n# ---------------------------------------------------------------------------\n\nEXPENSE_CATEGORIES: dict[str, list[str]] = {\n \"Food & Dining\": [\n \"Restaurants\", \"Groceries\", \"Coffee & Cafes\", \"Bars & Alcohol\",\n \"Food Delivery\", \"Fast Food\", \"Bubble Tea\", \"Bakery\",\n ],\n \"Travel\": [\n \"Flights\", \"Hotels & Lodging\", \"Car Rental\", \"Taxis & Rideshare\",\n \"Parking\", \"Fuel\", \"Public Transit\", \"Travel Packages\",\n ],\n \"Office & Supplies\": [\n \"Office Supplies\", \"Printing\", \"Postage & Shipping\", \"Furniture\",\n ],\n \"Technology\": [\n \"Software & SaaS\", \"Hardware\", \"Phone & Internet\",\n \"Cloud Services\", \"Cybersecurity\",\n ],\n \"Professional Services\": [\n \"Legal\", \"Accounting\", \"Consulting\", \"Recruiting\", \"Freelance\",\n ],\n \"Marketing\": [\"Advertising\", \"Design\", \"Events\", \"PR & Communications\"],\n \"Health & Medical\": [\n \"Doctor & Clinic\", \"Pharmacy\", \"Insurance\",\n \"Gym & Wellness\", \"Mental Health\", \"Dental & Vision\",\n ],\n \"Entertainment\": [\n \"Streaming\", \"Music & Audio\", \"Gaming\", \"Events & Tickets\", \"Books & Media\",\n ],\n \"Utilities\": [\n \"Electricity\", \"Water\", \"Gas\", \"Waste\", \"Phone & Internet\",\n ],\n \"Education\": [\n \"Online Courses\", \"Books\", \"Conferences\", \"Certifications\",\n \"Language Learning\", \"Tuition\",\n ],\n \"Retail & Shopping\": [\n \"Online Shopping\", \"Department Stores\", \"Clothing & Apparel\",\n \"Electronics\", \"Home & Garden\", \"Furniture & Home\", \"Personal Care\",\n ],\n \"Financial Services\": [\n \"Banking Fees\", \"Investment\", \"Insurance Premiums\", \"Loan Payments\",\n \"Cryptocurrency\",\n ],\n \"Other\": [\n \"Miscellaneous\", \"Payment Services\", \"ATM & Cash\", \"Uncategorized\",\n 
],\n}\n\nALL_CATEGORIES: set[str] = set(EXPENSE_CATEGORIES.keys())\nALL_SUBCATEGORIES: set[str] = {\n sub for subs in EXPENSE_CATEGORIES.values() for sub in subs\n}\n\n\ndef validate_category(category: str, subcategory: str = \"\") -> tuple[str, str]:\n \"\"\"\n Validate category and subcategory. Case-insensitive matching.\n Returns (canonical_category, subcategory) or raises ValueError with suggestions.\n \"\"\"\n # Case-insensitive category lookup\n cat_map = {c.lower(): c for c in ALL_CATEGORIES}\n matched_cat = cat_map.get(category.lower())\n if not matched_cat:\n # Suggest closest\n suggestions = [c for c in ALL_CATEGORIES if category.lower() in c.lower()]\n hint = f\" Did you mean: {', '.join(suggestions)}?\" if suggestions else \\\n f\" Valid: {', '.join(sorted(ALL_CATEGORIES))}\"\n raise ValueError(f\"Unknown category '{category}'.{hint}\")\n\n if subcategory:\n valid_subs = EXPENSE_CATEGORIES[matched_cat]\n sub_map = {s.lower(): s for s in valid_subs}\n matched_sub = sub_map.get(subcategory.lower())\n if not matched_sub:\n suggestions = [s for s in valid_subs if subcategory.lower() in s.lower()]\n hint = f\" Did you mean: {', '.join(suggestions)}?\" if suggestions else \\\n f\" Valid for '{matched_cat}': {', '.join(valid_subs)}\"\n raise ValueError(f\"Unknown subcategory '{subcategory}'.{hint}\")\n return matched_cat, matched_sub\n\n return matched_cat, \"\"\n\n\n# ---------------------------------------------------------------------------\n# Global merchant patterns for auto-categorisation\n# (regex pattern, category, subcategory)\n# Ordered: more specific patterns before broader ones.\n# ---------------------------------------------------------------------------\n\nMERCHANT_PATTERNS: list[tuple[str, str, str]] = [\n\n # ── STREAMING ─────────────────────────────────────────────────────────────\n (r\"netflix\", \"Entertainment\", \"Streaming\"),\n (r\"hulu\", \"Entertainment\", \"Streaming\"),\n (r\"disney\\+|disneyplus\", \"Entertainment\", 
\"Streaming\"),\n (r\"apple\\s*tv\\+?(?!\\s*store)\", \"Entertainment\", \"Streaming\"),\n (r\"paramount\\+|paramountplus|cbs\\s*all\\s*access\", \"Entertainment\", \"Streaming\"),\n (r\"peacock\\s*(?:tv|premium)?\", \"Entertainment\", \"Streaming\"),\n (r\"hbo\\s*max|\\bmax\\s*(?:streaming|subscription)\", \"Entertainment\", \"Streaming\"),\n (r\"discovery\\+|discoveryplus\", \"Entertainment\", \"Streaming\"),\n (r\"amazon\\s*prime\\s*video|prime\\s*video\", \"Entertainment\", \"Streaming\"),\n (r\"crunchyroll|funimation\", \"Entertainment\", \"Streaming\"),\n (r\"mubi|criterion\\s*channel|shudder|britbox|acorn\\s*tv\", \"Entertainment\", \"Streaming\"),\n (r\"youtube\\s*premium\", \"Entertainment\", \"Streaming\"),\n (r\"espn\\+|espnplus\", \"Entertainment\", \"Streaming\"),\n (r\"sling\\s*tv|fubotv?|philo\\s*tv|directv\\s*stream\", \"Entertainment\", \"Streaming\"),\n (r\"tubi\\s*tv|pluto\\s*tv|plex\\s*(?:pass|tv)?\", \"Entertainment\", \"Streaming\"),\n (r\"viu\\b|iflix|catchplay|bilibili|iqiyi|youku|mango\\s*tv\", \"Entertainment\", \"Streaming\"), # Asia\n (r\"wavve|tving|watcha|laftel|seezn|kakao\\s*tv\", \"Entertainment\", \"Streaming\"), # Korea\n (r\"hotstar|jio\\s*cinema|zee5|voot|sony\\s*liv|eros\\s*now|alt\\s*balaji\", \"Entertainment\", \"Streaming\"), # India\n (r\"vidio|mola\\s*tv|genflix\", \"Entertainment\", \"Streaming\"), # Indonesia\n (r\"iptv|stan\\b|binge\\b|foxtel\\s*(?:now|go)?\", \"Entertainment\", \"Streaming\"), # Australia\n\n # ── MUSIC ─────────────────────────────────────────────────────────────────\n (r\"spotify\", \"Entertainment\", \"Streaming\"),\n (r\"apple\\s*music\", \"Entertainment\", \"Music & Audio\"),\n (r\"tidal\\b\", \"Entertainment\", \"Music & Audio\"),\n (r\"amazon\\s*music\", \"Entertainment\", \"Music & Audio\"),\n (r\"youtube\\s*music\", \"Entertainment\", \"Music & Audio\"),\n (r\"deezer\", \"Entertainment\", \"Music & Audio\"),\n (r\"soundcloud\", \"Entertainment\", \"Music & Audio\"),\n (r\"pandora\", 
\"Entertainment\", \"Music & Audio\"),\n (r\"qobuz|napster|iheartradio|bandcamp\", \"Entertainment\", \"Music & Audio\"),\n\n # ── GAMING ────────────────────────────────────────────────────────────────\n (r\"xbox\\s*(?:game\\s*pass|live|gold)|microsoft\\s*gaming\", \"Entertainment\", \"Gaming\"),\n (r\"playstation|psn\\b|ps\\s*(?:now|plus|store)\", \"Entertainment\", \"Gaming\"),\n (r\"nintendo(?:\\s*eshop)?\", \"Entertainment\", \"Gaming\"),\n (r\"steam(?:\\s*games|\\s*purchase)?\", \"Entertainment\", \"Gaming\"),\n (r\"epic\\s*games\", \"Entertainment\", \"Gaming\"),\n (r\"ea\\s*(?:play|origin)|\\borigin\\.com\", \"Entertainment\", \"Gaming\"),\n (r\"battle\\.net|blizzard\\s*entertainment\", \"Entertainment\", \"Gaming\"),\n (r\"ubisoft|uplay\\b\", \"Entertainment\", \"Gaming\"),\n (r\"discord\\s*nitro\", \"Entertainment\", \"Gaming\"),\n (r\"roblox|minecraft|valorant|fortnite|genshin\", \"Entertainment\", \"Gaming\"),\n\n # ── EVENTS & TICKETS ──────────────────────────────────────────────────────\n (r\"ticketmaster|ticketek|sistic|axs\\.com|eventbrite|stubhub|viagogo|dice\\.fm\", \"Entertainment\", \"Events & Tickets\"),\n (r\"amc\\s*theatre|regal\\s*cinema|cinemark|imax\\b|cineplex|odeon|cineworld|vue\\s*cinema\", \"Entertainment\", \"Events & Tickets\"),\n (r\"golden\\s*village|cathay\\s*cineplexes|shaw\\s*theatres|gsc\\s*cinemas|tgv\\s*cinemas\", \"Entertainment\", \"Events & Tickets\"), # SEA\n\n # ── FOOD DELIVERY ─────────────────────────────────────────────────────────\n (r\"uber\\s*eats\", \"Food & Dining\", \"Food Delivery\"),\n (r\"doordash\", \"Food & Dining\", \"Food Delivery\"),\n (r\"grubhub\", \"Food & Dining\", \"Food Delivery\"),\n (r\"postmates\", \"Food & Dining\", \"Food Delivery\"),\n (r\"instacart\", \"Food & Dining\", \"Food Delivery\"),\n (r\"swiggy\", \"Food & Dining\", \"Food Delivery\"),\n (r\"zomato\", \"Food & Dining\", \"Food Delivery\"),\n (r\"deliveroo\", \"Food & Dining\", \"Food Delivery\"),\n (r\"foodpanda|food\\s*panda\", 
\"Food & Dining\", \"Food Delivery\"),\n (r\"grabfood|grab\\s*food\", \"Food & Dining\", \"Food Delivery\"),\n (r\"rappi\", \"Food & Dining\", \"Food Delivery\"),\n (r\"ifood\\b\", \"Food & Dining\", \"Food Delivery\"),\n (r\"pedidosya|pedidos\\s*ya\", \"Food & Dining\", \"Food Delivery\"),\n (r\"wolt\\b\", \"Food & Dining\", \"Food Delivery\"),\n (r\"glovo\", \"Food & Dining\", \"Food Delivery\"),\n (r\"just\\s*eat|justeat|menulog|skip\\s*the\\s*dishes|skipthedishes\", \"Food & Dining\", \"Food Delivery\"),\n (r\"talabat\", \"Food & Dining\", \"Food Delivery\"),\n (r\"noon\\s*food|careem\\s*food|hungerstation\", \"Food & Dining\", \"Food Delivery\"),\n (r\"meituan|ele\\.me|baidu\\s*waimai\", \"Food & Dining\", \"Food Delivery\"),\n (r\"baemin|yogiyo|coupang\\s*eats\", \"Food & Dining\", \"Food Delivery\"),\n (r\"demaecan\", \"Food & Dining\", \"Food Delivery\"),\n\n # ── COFFEE & CAFES ────────────────────────────────────────────────────────\n (r\"starbucks\", \"Food & Dining\", \"Coffee & Cafes\"),\n (r\"dunkin(?:\\s*donuts)?\", \"Food & Dining\", \"Coffee & Cafes\"),\n (r\"tim\\s*hortons\", \"Food & Dining\", \"Coffee & Cafes\"),\n (r\"peet'?s\\s*coffee\", \"Food & Dining\", \"Coffee & Cafes\"),\n (r\"blue\\s*bottle\\s*coffee\", \"Food & Dining\", \"Coffee & Cafes\"),\n (r\"costa\\s*coffee\", \"Food & Dining\", \"Coffee & Cafes\"),\n (r\"caffe\\s*nero\", \"Food & Dining\", \"Coffee & Cafes\"),\n (r\"second\\s*cup\", \"Food & Dining\", \"Coffee & Cafes\"),\n (r\"gloria\\s*jean'?s\", \"Food & Dining\", \"Coffee & Cafes\"),\n (r\"the\\s*coffee\\s*bean\", \"Food & Dining\", \"Coffee & Cafes\"),\n (r\"caribou\\s*coffee\", \"Food & Dining\", \"Coffee & Cafes\"),\n (r\"mccafe|mc\\s*cafe\", \"Food & Dining\", \"Coffee & Cafes\"),\n (r\"nespresso|nescafe|lavazza|segafredo|illy\\s*caffe\", \"Food & Dining\", \"Coffee & Cafes\"),\n (r\"hollys\\s*coffee|ediya|coffee\\s*bene|twosome\\s*place|paik'?s\\s*coffee\", \"Food & Dining\", \"Coffee & Cafes\"), # Korea\n 
(r\"doutor|komeda|st\\s*marc\\s*cafe|tully'?s\", \"Food & Dining\", \"Coffee & Cafes\"), # Japan\n (r\"old\\s*town\\s*white\\s*coffee|zus\\s*coffee|tealive\", \"Food & Dining\", \"Coffee & Cafes\"), # Malaysia\n (r\"ya\\s*kun|toast\\s*box|killiney\\s*kopitiam\", \"Food & Dining\", \"Coffee & Cafes\"), # Singapore\n (r\"kopi\\s*kenangan|fore\\s*coffee|janji\\s*jiwa\", \"Food & Dining\", \"Coffee & Cafes\"), # Indonesia\n\n # ── BUBBLE TEA ────────────────────────────────────────────────────────────\n (r\"gong\\s*cha|kung\\s*fu\\s*tea|tiger\\s*sugar|koi\\s*the|xing\\s*fu\\s*tang|moge\\s*tee\", \"Food & Dining\", \"Bubble Tea\"),\n (r\"chatime|coco\\s*(?:fresh|boba)?|share\\s*tea|the\\s*alley|r&b\\s*tea|presotea|tp\\s*tea\", \"Food & Dining\", \"Bubble Tea\"),\n (r\"tpumkin|boba\\s*guys|kung\\s*tea|tiger\\s*milk\\s*tea|happy\\s*lemon\", \"Food & Dining\", \"Bubble Tea\"),\n\n # ── FAST FOOD ─────────────────────────────────────────────────────────────\n (r\"mcdonald'?s|mcdonalds|mcd\\b|golden\\s*arches\", \"Food & Dining\", \"Fast Food\"),\n (r\"burger\\s*king\", \"Food & Dining\", \"Fast Food\"),\n (r\"wendy'?s\", \"Food & Dining\", \"Fast Food\"),\n (r\"kfc|kentucky\\s*fried\\s*chicken\", \"Food & Dining\", \"Fast Food\"),\n (r\"subway(?!\\s*(?:train|metro|station|transit|rail))\", \"Food & Dining\", \"Fast Food\"),\n (r\"pizza\\s*hut\", \"Food & Dining\", \"Fast Food\"),\n (r\"domino'?s(?:\\s*pizza)?\", \"Food & Dining\", \"Fast Food\"),\n (r\"taco\\s*bell\", \"Food & Dining\", \"Fast Food\"),\n (r\"chick-?fil-?a\", \"Food & Dining\", \"Fast Food\"),\n (r\"five\\s*guys\", \"Food & Dining\", \"Fast Food\"),\n (r\"shake\\s*shack\", \"Food & Dining\", \"Fast Food\"),\n (r\"in-?n-?out\\s*burger\", \"Food & Dining\", \"Fast Food\"),\n (r\"popeyes\\s*(?:louisiana)?\", \"Food & Dining\", \"Fast Food\"),\n (r\"chipot?le\", \"Food & Dining\", \"Fast Food\"),\n (r\"panda\\s*express\", \"Food & Dining\", \"Fast Food\"),\n 
(r\"wingstop|wing\\s*stop|buffalo\\s*wild\\s*wings|bww\\b\", \"Food & Dining\", \"Fast Food\"),\n (r\"sonic\\s*drive-?in\", \"Food & Dining\", \"Fast Food\"),\n (r\"jack\\s*in\\s*the\\s*box\", \"Food & Dining\", \"Fast Food\"),\n (r\"carl'?s\\s*jr|hardee'?s|whataburger\", \"Food & Dining\", \"Fast Food\"),\n (r\"papa\\s*john'?s|little\\s*caesars\", \"Food & Dining\", \"Fast Food\"),\n (r\"nando'?s|wagamama|itsu\\b|leon\\b|pret\\s*a\\s*manger|pret\\b\", \"Food & Dining\", \"Fast Food\"), # UK\n (r\"greggs\\b\", \"Food & Dining\", \"Bakery\"),\n (r\"krispy\\s*kreme|cinnabon|auntie\\s*anne'?s\", \"Food & Dining\", \"Bakery\"),\n (r\"jollibee\", \"Food & Dining\", \"Fast Food\"), # Philippines\n (r\"mos\\s*burger|lotteria|yoshinoya|sukiya|matsuya|gyudon|pepper\\s*lunch\", \"Food & Dining\", \"Fast Food\"), # Japan/Korea\n (r\"grill'?d|hungry\\s*jack'?s|red\\s*rooster|oporto\", \"Food & Dining\", \"Fast Food\"), # Australia\n\n # ── GROCERIES ─────────────────────────────────────────────────────────────\n # US / Canada\n (r\"walmart(?!\\s*(?:pharmacy|vision|auto|money|credit))\", \"Food & Dining\", \"Groceries\"),\n (r\"costco\", \"Food & Dining\", \"Groceries\"),\n (r\"sam'?s\\s*club\", \"Food & Dining\", \"Groceries\"),\n (r\"kroger\", \"Food & Dining\", \"Groceries\"),\n (r\"whole\\s*foods\", \"Food & Dining\", \"Groceries\"),\n (r\"trader\\s*joe'?s\", \"Food & Dining\", \"Groceries\"),\n (r\"safeway\", \"Food & Dining\", \"Groceries\"),\n (r\"albertsons\", \"Food & Dining\", \"Groceries\"),\n (r\"publix\", \"Food & Dining\", \"Groceries\"),\n (r\"h-?e-?b\\b\", \"Food & Dining\", \"Groceries\"),\n (r\"meijer\", \"Food & Dining\", \"Groceries\"),\n (r\"wegmans\", \"Food & Dining\", \"Groceries\"),\n (r\"harris\\s*teeter\", \"Food & Dining\", \"Groceries\"),\n (r\"stop\\s*&\\s*shop\", \"Food & Dining\", \"Groceries\"),\n (r\"shoprite\", \"Food & Dining\", \"Groceries\"),\n (r\"aldi(?!\\s*(?:cafe|coffee))\", \"Food & Dining\", \"Groceries\"),\n 
(r\"food\\s*lion|sprouts\\s*farmers|fresh\\s*market|market\\s*basket\", \"Food & Dining\", \"Groceries\"),\n (r\"winco\\s*foods|grocery\\s*outlet|save-?a-?lot\", \"Food & Dining\", \"Groceries\"),\n (r\"winn-?dixie|jewel-?osco|randalls|tom\\s*thumb|vons|ralphs|pavilions|king\\s*soopers\", \"Food & Dining\", \"Groceries\"),\n (r\"loblaws|sobeys|maxi\\b|provigo|iga\\s*canada|metro\\s*(?:inc|grocery|épicerie)\", \"Food & Dining\", \"Groceries\"),\n # UK / Ireland\n (r\"tesco(?!\\s*(?:bank|mobile))\", \"Food & Dining\", \"Groceries\"),\n (r\"sainsbury'?s\", \"Food & Dining\", \"Groceries\"),\n (r\"asda\", \"Food & Dining\", \"Groceries\"),\n (r\"waitrose(?!\\s*(?:uae|me|middle\\s*east))\", \"Food & Dining\", \"Groceries\"),\n (r\"m&s\\s*food|marks\\s*&\\s*spencer\\s*food\", \"Food & Dining\", \"Groceries\"),\n (r\"lidl\", \"Food & Dining\", \"Groceries\"),\n (r\"iceland\\s*foods\", \"Food & Dining\", \"Groceries\"),\n (r\"co-?op\\s*(?:food|group)?\", \"Food & Dining\", \"Groceries\"),\n (r\"morrisons\", \"Food & Dining\", \"Groceries\"),\n (r\"dunnes\\s*stores\", \"Food & Dining\", \"Groceries\"),\n # France\n (r\"carrefour(?!\\s*(?:sa\\s*?|uae|egypt|kenya|maroc))\", \"Food & Dining\", \"Groceries\"),\n (r\"e\\.\\s*leclerc|leclerc\\b\", \"Food & Dining\", \"Groceries\"),\n (r\"intermarche|intermarché\", \"Food & Dining\", \"Groceries\"),\n (r\"auchan\", \"Food & Dining\", \"Groceries\"),\n (r\"monoprix|franprix|naturalia|biocoop|systeme\\s*u|système\\s*u\", \"Food & Dining\", \"Groceries\"),\n # Germany / Austria / Switzerland\n (r\"rewe\\b\", \"Food & Dining\", \"Groceries\"),\n (r\"edeka\", \"Food & Dining\", \"Groceries\"),\n (r\"netto\\s*marken-?discount\", \"Food & Dining\", \"Groceries\"),\n (r\"penny\\s*markt\", \"Food & Dining\", \"Groceries\"),\n (r\"billa(?!\\s*(?:creek))\", \"Food & Dining\", \"Groceries\"),\n (r\"migros\", \"Food & Dining\", \"Groceries\"),\n (r\"coop\\s*(?:schweiz|suisse|supermarkt)?\", \"Food & Dining\", \"Groceries\"),\n # Spain / Italy 
/ Netherlands / Nordics\n (r\"mercadona\", \"Food & Dining\", \"Groceries\"),\n (r\"dia\\s*(?:supermarket)?\", \"Food & Dining\", \"Groceries\"),\n (r\"eroski|consum\\b|plusfresc\", \"Food & Dining\", \"Groceries\"),\n (r\"esselunga|conad|eurospin\\b\", \"Food & Dining\", \"Groceries\"),\n (r\"albert\\s*heijn|jumbo\\s*(?:supermarkt)?|plus\\s*supermarkt\", \"Food & Dining\", \"Groceries\"),\n (r\"ica\\s*(?:maxi|kvantum|nara|supermarket)?|willys|hemkop\", \"Food & Dining\", \"Groceries\"),\n (r\"rema\\s*1000|kiwi\\s*(?:minipris)?|bunnpris|joker\\s*(?:butikk)?\", \"Food & Dining\", \"Groceries\"),\n (r\"k-?market|k-?ruoka|prisma\\b|s-?market|sale\\b|alepa\\b\", \"Food & Dining\", \"Groceries\"),\n # Asia-Pacific\n (r\"fairprice|ntuc\\s*fairprice|cold\\s*storage|sheng\\s*siong|prime\\s*supermarket\", \"Food & Dining\", \"Groceries\"), # SG\n (r\"aeon(?!\\s*(?:insurance|credit|bank|mall\\s*(?!food)))\", \"Food & Dining\", \"Groceries\"),\n (r\"ito\\s*yokado|daiei|york\\s*mart|seiyu\\b|tokyustore\", \"Food & Dining\", \"Groceries\"),\n (r\"parknshop|wellcome\\s*(?:supermarket)?|citysuper|fusion\\s*superstore\", \"Food & Dining\", \"Groceries\"),\n (r\"99\\s*ranch|h\\s*mart|mitsuwa|nijiya|marukai|t&t\\s*supermarket\", \"Food & Dining\", \"Groceries\"),\n (r\"big\\s*c|lotus'?s|tops\\s*(?:market|supermarket)|villa\\s*market\", \"Food & Dining\", \"Groceries\"),\n (r\"giant\\s*(?:hypermarket|supermarket|superstore)|jaya\\s*grocer|village\\s*grocer|mydin\", \"Food & Dining\", \"Groceries\"),\n (r\"transmart|hypermart|superindo|lotte\\s*mart|ranch\\s*market|grand\\s*lucky\", \"Food & Dining\", \"Groceries\"),\n (r\"sm\\s*supermarket|puregold|robinsons\\s*supermarket|rustan'?s\", \"Food & Dining\", \"Groceries\"),\n (r\"woolworths(?!\\s*(?:financial|sa|south\\s*africa))\", \"Food & Dining\", \"Groceries\"),\n (r\"coles(?!\\s*(?:group|myer))\", \"Food & Dining\", \"Groceries\"),\n (r\"pak'?n'?save|new\\s*world\\s*(?:nz)?|countdown\\s*(?:nz)?|four\\s*square\", \"Food & Dining\", 
\"Groceries\"),\n # Middle East\n (r\"lulu\\s*(?:hypermarket|express)?\", \"Food & Dining\", \"Groceries\"),\n (r\"spinneys\", \"Food & Dining\", \"Groceries\"),\n (r\"waitrose\\s*(?:uae|me)\", \"Food & Dining\", \"Groceries\"),\n (r\"al\\s*maya|union\\s*coop|nesto\\s*hypermarket\", \"Food & Dining\", \"Groceries\"),\n (r\"carrefour\\s*(?:uae|sa|egypt|maroc|kenya)\", \"Food & Dining\", \"Groceries\"),\n # South Africa\n (r\"checkers(?!\\s*(?:auto|hardware))\", \"Food & Dining\", \"Groceries\"),\n (r\"pick\\s*n\\s*pay|shoprite\\s*(?:holdings|supermarket)?\", \"Food & Dining\", \"Groceries\"),\n (r\"woolworths\\s*(?:sa|south\\s*africa|food)\", \"Food & Dining\", \"Groceries\"),\n (r\"spar\\s*(?:south\\s*africa)?\", \"Food & Dining\", \"Groceries\"),\n\n # ── RESTAURANTS ──────────────────────────────────────────────────────────\n (r\"applebee'?s|chili'?s|ihop|denny'?s|olive\\s*garden|red\\s*lobster\", \"Food & Dining\", \"Restaurants\"),\n (r\"tgi\\s*friday'?s|outback\\s*steakhouse|texas\\s*roadhouse|cheesecake\\s*factory\", \"Food & Dining\", \"Restaurants\"),\n (r\"cracker\\s*barrel|bob\\s*evans|waffle\\s*house|first\\s*watch\", \"Food & Dining\", \"Restaurants\"),\n (r\"benihana|p\\.?\\s*f\\.?\\s*chang'?s|melting\\s*pot|bonefish\\s*grill\", \"Food & Dining\", \"Restaurants\"),\n\n # ── AIRLINES ──────────────────────────────────────────────────────────────\n # Americas\n (r\"delta\\s*(?:air\\s*lines|airlines)?\", \"Travel\", \"Flights\"),\n (r\"united\\s*(?:airlines|air)?\", \"Travel\", \"Flights\"),\n (r\"american\\s*(?:airlines|air)?\", \"Travel\", \"Flights\"),\n (r\"southwest\\s*(?:airlines|air)?\", \"Travel\", \"Flights\"),\n (r\"jetblue\\s*(?:airways)?\", \"Travel\", \"Flights\"),\n (r\"alaska\\s*(?:airlines|air)?\", \"Travel\", \"Flights\"),\n (r\"spirit\\s*(?:airlines)?\", \"Travel\", \"Flights\"),\n (r\"frontier\\s*(?:airlines)?\", \"Travel\", \"Flights\"),\n (r\"hawaiian\\s*(?:airlines)?\", \"Travel\", \"Flights\"),\n 
(r\"sun\\s*country|breeze\\s*airways|avelo\\s*airlines\", \"Travel\", \"Flights\"),\n (r\"air\\s*canada\", \"Travel\", \"Flights\"),\n (r\"westjet\", \"Travel\", \"Flights\"),\n (r\"porter\\s*airlines\", \"Travel\", \"Flights\"),\n (r\"aeromexico\", \"Travel\", \"Flights\"),\n (r\"latam\\s*(?:airlines)?\", \"Travel\", \"Flights\"),\n (r\"avianca\", \"Travel\", \"Flights\"),\n (r\"gol\\s*(?:linhas|airlines)?|azul\\s*(?:linhas|airlines)?\", \"Travel\", \"Flights\"),\n (r\"copa\\s*(?:airlines)?\", \"Travel\", \"Flights\"),\n # Europe\n (r\"british\\s*airways\", \"Travel\", \"Flights\"),\n (r\"lufthansa\", \"Travel\", \"Flights\"),\n (r\"air\\s*france\", \"Travel\", \"Flights\"),\n (r\"klm\\b\", \"Travel\", \"Flights\"),\n (r\"iberia\\s*(?:airlines)?\", \"Travel\", \"Flights\"),\n (r\"swiss\\s*(?:international\\s*air|air\\s*lines)?\", \"Travel\", \"Flights\"),\n (r\"austrian\\s*(?:airlines)?\", \"Travel\", \"Flights\"),\n (r\"sas\\b|scandinavian\\s*airlines\", \"Travel\", \"Flights\"),\n (r\"finnair\", \"Travel\", \"Flights\"),\n (r\"norwegian\\s*(?:air)?\", \"Travel\", \"Flights\"),\n (r\"wizz\\s*air|wizzair\", \"Travel\", \"Flights\"),\n (r\"ryanair\", \"Travel\", \"Flights\"),\n (r\"easyjet\", \"Travel\", \"Flights\"),\n (r\"vueling|transavia|volotea\", \"Travel\", \"Flights\"),\n (r\"brussels\\s*airlines\", \"Travel\", \"Flights\"),\n (r\"lot\\s*(?:polish\\s*airlines)?\", \"Travel\", \"Flights\"),\n (r\"tap\\s*(?:air\\s*portugal)?\", \"Travel\", \"Flights\"),\n (r\"alitalia|ita\\s*airways\", \"Travel\", \"Flights\"),\n # Middle East & Africa\n (r\"emirates\\s*(?:airlines)?\", \"Travel\", \"Flights\"),\n (r\"etihad\\s*(?:airways)?\", \"Travel\", \"Flights\"),\n (r\"qatar\\s*airways\", \"Travel\", \"Flights\"),\n (r\"turkish\\s*(?:airlines|air)?\", \"Travel\", \"Flights\"),\n (r\"flydubai\", \"Travel\", \"Flights\"),\n (r\"air\\s*arabia\", \"Travel\", \"Flights\"),\n (r\"ethiopian\\s*(?:airlines)?\", \"Travel\", \"Flights\"),\n (r\"kenya\\s*airways\", \"Travel\", 
\"Flights\"),\n (r\"south\\s*african\\s*airways\", \"Travel\", \"Flights\"),\n (r\"royal\\s*air\\s*maroc\", \"Travel\", \"Flights\"),\n (r\"egyptair\", \"Travel\", \"Flights\"),\n # Asia-Pacific\n (r\"singapore\\s*airlines|silkair\", \"Travel\", \"Flights\"),\n (r\"cathay\\s*pacific|hk\\s*express|hong\\s*kong\\s*express\", \"Travel\", \"Flights\"),\n (r\"japan\\s*airlines|jal\\b\", \"Travel\", \"Flights\"),\n (r\"ana\\b|all\\s*nippon\\s*airways\", \"Travel\", \"Flights\"),\n (r\"korean\\s*air\", \"Travel\", \"Flights\"),\n (r\"asiana\\s*(?:airlines)?\", \"Travel\", \"Flights\"),\n (r\"thai\\s*(?:airways|airasia)\", \"Travel\", \"Flights\"),\n (r\"garuda\\s*(?:indonesia)?\", \"Travel\", \"Flights\"),\n (r\"malaysia\\s*airlines|malindo\\s*air|batik\\s*air\", \"Travel\", \"Flights\"),\n (r\"vietnam\\s*airlines\", \"Travel\", \"Flights\"),\n (r\"(?:air)?asia|airasia\", \"Travel\", \"Flights\"),\n (r\"lion\\s*air|wings\\s*air\", \"Travel\", \"Flights\"),\n (r\"indigo\\s*(?:airlines)?|6e\\b\", \"Travel\", \"Flights\"),\n (r\"air\\s*india\", \"Travel\", \"Flights\"),\n (r\"spicejet|vistara|go\\s*(?:first|air|airlines)\", \"Travel\", \"Flights\"),\n (r\"qantas\\s*(?:airways)?\", \"Travel\", \"Flights\"),\n (r\"jetstar\\s*(?:airways)?\", \"Travel\", \"Flights\"),\n (r\"virgin\\s*australia\", \"Travel\", \"Flights\"),\n (r\"air\\s*new\\s*zealand\", \"Travel\", \"Flights\"),\n (r\"cebu\\s*pacific|philippine\\s*airlines\", \"Travel\", \"Flights\"),\n\n # ── HOTELS & LODGING ──────────────────────────────────────────────────────\n (r\"marriott\", \"Travel\", \"Hotels & Lodging\"),\n (r\"hilton(?!\\s*(?:honors\\s*credit|garden\\s*inn\\s*credit))\", \"Travel\", \"Hotels & Lodging\"),\n (r\"hyatt\", \"Travel\", \"Hotels & Lodging\"),\n (r\"sheraton|westin|w\\s*hotels|st\\.\\s*regis|ritz-?carlton|le\\s*meridien|autograph\\s*collection\", \"Travel\", \"Hotels & Lodging\"),\n (r\"intercontinental|crowne\\s*plaza|holiday\\s*inn|staybridge|indigo\\s*hotel|kimpton\\s*hotel\", 
\"Travel\", \"Hotels & Lodging\"),\n (r\"best\\s*western\", \"Travel\", \"Hotels & Lodging\"),\n (r\"radisson(?!\\s*(?:blu\\s*rewards\\s*credit))\", \"Travel\", \"Hotels & Lodging\"),\n (r\"wyndham|ramada|days\\s*inn|super\\s*8|la\\s*quinta|microtel|travelodge|howard\\s*johnson\", \"Travel\", \"Hotels & Lodging\"),\n (r\"choice\\s*hotels|comfort\\s*inn|quality\\s*inn|sleep\\s*inn|econo\\s*lodge\", \"Travel\", \"Hotels & Lodging\"),\n (r\"motel\\s*6|red\\s*roof\\s*inn|extended\\s*stay|woodspring\\s*suites\", \"Travel\", \"Hotels & Lodging\"),\n (r\"four\\s*seasons\", \"Travel\", \"Hotels & Lodging\"),\n (r\"mandarin\\s*oriental\", \"Travel\", \"Hotels & Lodging\"),\n (r\"sofitel|novotel|ibis(?!\\s*(?:paint|styles\\s*paint))|mercure|pullman|mgallery\", \"Travel\", \"Hotels & Lodging\"),\n (r\"accorhotels|accor\\s*(?:live|hotels)\", \"Travel\", \"Hotels & Lodging\"),\n (r\"airbnb\", \"Travel\", \"Hotels & Lodging\"),\n (r\"vrbo\", \"Travel\", \"Hotels & Lodging\"),\n (r\"booking\\.com\", \"Travel\", \"Hotels & Lodging\"),\n (r\"expedia(?!\\s*(?:cruise|group\\s*media))\", \"Travel\", \"Hotels & Lodging\"),\n (r\"hotels\\.com\", \"Travel\", \"Hotels & Lodging\"),\n (r\"sonder\\b|vacasa\\b|blueground|furnished\\s*finder\", \"Travel\", \"Hotels & Lodging\"),\n\n # ── RIDESHARE & TAXI ──────────────────────────────────────────────────────\n (r\"uber(?!\\s*(?:eats|one))\", \"Travel\", \"Taxis & Rideshare\"),\n (r\"lyft\", \"Travel\", \"Taxis & Rideshare\"),\n (r\"grab(?!\\s*(?:food|mart|express|financial|pay))\", \"Travel\", \"Taxis & Rideshare\"),\n (r\"ola\\s*(?:cabs|money)?\", \"Travel\", \"Taxis & Rideshare\"),\n (r\"bolt(?!\\s*(?:ev|electric|scooter|energy))\", \"Travel\", \"Taxis & Rideshare\"),\n (r\"gett\\b|taxify\", \"Travel\", \"Taxis & Rideshare\"),\n (r\"cabify\", \"Travel\", \"Taxis & Rideshare\"),\n (r\"99\\s*taxis|indriver\", \"Travel\", \"Taxis & Rideshare\"),\n (r\"didi(?!\\s*(?:food|chuxing\\s*invest))\", \"Travel\", \"Taxis & Rideshare\"),\n 
(r\"yandex\\s*(?:taxi|go)\\b\", \"Travel\", \"Taxis & Rideshare\"),\n (r\"gojek|gocar\", \"Travel\", \"Taxis & Rideshare\"),\n\n # ── CAR RENTAL ────────────────────────────────────────────────────────────\n (r\"hertz\\b\", \"Travel\", \"Car Rental\"),\n (r\"avis\\s*(?:car|rent|budget)?\", \"Travel\", \"Car Rental\"),\n (r\"budget\\s*(?:car|rent\\s*a\\s*car)\", \"Travel\", \"Car Rental\"),\n (r\"enterprise\\s*(?:rent|car|holdings)?\", \"Travel\", \"Car Rental\"),\n (r\"national\\s*car\\s*rental\", \"Travel\", \"Car Rental\"),\n (r\"alamo(?!\\s*(?:drafthouse))\", \"Travel\", \"Car Rental\"),\n (r\"thrifty\\s*(?:car)?\", \"Travel\", \"Car Rental\"),\n (r\"dollar\\s*car\\s*rental\", \"Travel\", \"Car Rental\"),\n (r\"sixt\\s*(?:rent\\s*a\\s*car)?\", \"Travel\", \"Car Rental\"),\n (r\"europcar\", \"Travel\", \"Car Rental\"),\n (r\"zipcar\", \"Travel\", \"Car Rental\"),\n (r\"turo\\b\", \"Travel\", \"Car Rental\"),\n (r\"getaround\", \"Travel\", \"Car Rental\"),\n\n # ── FUEL / GAS ────────────────────────────────────────────────────────────\n (r\"shell(?!\\s*(?:gift|rewards|credit))\", \"Travel\", \"Fuel\"),\n (r\"bp\\b|british\\s*petroleum\", \"Travel\", \"Fuel\"),\n (r\"chevron(?!\\s*(?:bank|card|credit))\", \"Travel\", \"Fuel\"),\n (r\"exxon(?:mobil)?\", \"Travel\", \"Fuel\"),\n (r\"texaco\", \"Travel\", \"Fuel\"),\n (r\"arco\\b\", \"Travel\", \"Fuel\"),\n (r\"sunoco\", \"Travel\", \"Fuel\"),\n (r\"marathon\\s*(?:petro|gas|fuel)?\", \"Travel\", \"Fuel\"),\n (r\"murphy\\s*(?:usa|express)?\", \"Travel\", \"Fuel\"),\n (r\"casey'?s\\s*(?:general\\s*store)?\", \"Travel\", \"Fuel\"),\n (r\"circle\\s*k(?!\\s*convenience\\s*non)\", \"Travel\", \"Fuel\"),\n (r\"pilot\\s*(?:flying\\s*j|travel\\s*centers)?\", \"Travel\", \"Fuel\"),\n (r\"love'?s\\s*(?:travel\\s*stops)?\", \"Travel\", \"Fuel\"),\n (r\"speedway(?!\\s*(?:motorsports|nascar|casino))\", \"Travel\", \"Fuel\"),\n (r\"wawa(?!\\s*(?:inc|foundation|music))\", \"Travel\", \"Fuel\"),\n (r\"sheetz\", \"Travel\", 
\"Fuel\"),\n (r\"quiktrip|kwik\\s*trip|racetrac|maverik|cenex|kum\\s*&\\s*go|thorntons\", \"Travel\", \"Fuel\"),\n (r\"total\\s*(?:energies)?|elf\\b|esso\\b\", \"Travel\", \"Fuel\"),\n (r\"repsol\", \"Travel\", \"Fuel\"),\n (r\"petrobras|ipiranga|br\\s*distribuidora\", \"Travel\", \"Fuel\"),\n (r\"pemex\", \"Travel\", \"Fuel\"),\n (r\"sinopec|petrochina|cnpc\", \"Travel\", \"Fuel\"),\n (r\"enoc\\b|adnoc\\b|emarat\\b\", \"Travel\", \"Fuel\"),\n (r\"petron\\b|caltex(?!\\s*credit)\", \"Travel\", \"Fuel\"),\n (r\"pertamina\", \"Travel\", \"Fuel\"),\n (r\"ampol|bp\\s*australia\", \"Travel\", \"Fuel\"),\n\n # ── PUBLIC TRANSIT ────────────────────────────────────────────────────────\n (r\"mta\\b|nyc\\s*transit|new\\s*york\\s*transit\", \"Travel\", \"Public Transit\"),\n (r\"bart\\b\", \"Travel\", \"Public Transit\"),\n (r\"cta\\b|chicago\\s*transit\", \"Travel\", \"Public Transit\"),\n (r\"wmata\\b|washington\\s*metro\", \"Travel\", \"Public Transit\"),\n (r\"septa\\b\", \"Travel\", \"Public Transit\"),\n (r\"clipper\\s*card|orca\\s*card|charlie\\s*card|smartrip\\s*card\", \"Travel\", \"Public Transit\"),\n (r\"tfl\\b|transport\\s*for\\s*london|oyster\\s*card\", \"Travel\", \"Public Transit\"),\n (r\"translink\\b|stm\\b|oc\\s*transpo\\b|presto\\s*card\", \"Travel\", \"Public Transit\"),\n (r\"suica|pasmo|toica|manaca|nimoca|hayakaken\", \"Travel\", \"Public Transit\"),\n (r\"t-?money\\b|cashbee\\b\", \"Travel\", \"Public Transit\"),\n (r\"octopus\\s*card\", \"Travel\", \"Public Transit\"),\n (r\"ez-?link|nets\\s*flashpay\", \"Travel\", \"Public Transit\"),\n (r\"opal\\s*card\", \"Travel\", \"Public Transit\"),\n\n # ── CLOUD & INFRASTRUCTURE ────────────────────────────────────────────────\n (r\"amazon\\s*web\\s*services|aws\\b\", \"Technology\", \"Cloud Services\"),\n (r\"google\\s*cloud|gcp\\b\", \"Technology\", \"Cloud Services\"),\n (r\"microsoft\\s*azure|azure\\b\", \"Technology\", \"Cloud Services\"),\n (r\"digitalocean\", \"Technology\", \"Cloud Services\"),\n 
(r\"cloudflare\", \"Technology\", \"Cloud Services\"),\n (r\"linode\\b|akamai\\s*cloud\", \"Technology\", \"Cloud Services\"),\n (r\"vultr\\b\", \"Technology\", \"Cloud Services\"),\n (r\"hetzner\", \"Technology\", \"Cloud Services\"),\n (r\"ovhcloud|ovh\\b\", \"Technology\", \"Cloud Services\"),\n (r\"rackspace\", \"Technology\", \"Cloud Services\"),\n (r\"ibm\\s*cloud\", \"Technology\", \"Cloud Services\"),\n (r\"oracle\\s*cloud\", \"Technology\", \"Cloud Services\"),\n (r\"alibaba\\s*cloud|aliyun\", \"Technology\", \"Cloud Services\"),\n (r\"tencent\\s*cloud|huawei\\s*cloud\", \"Technology\", \"Cloud Services\"),\n\n # ── SOFTWARE & SaaS ───────────────────────────────────────────────────────\n (r\"github\", \"Technology\", \"Software & SaaS\"),\n (r\"gitlab\", \"Technology\", \"Software & SaaS\"),\n (r\"atlassian|jira\\b|confluence\\b|bitbucket|trello\\b\", \"Technology\", \"Software & SaaS\"),\n (r\"notion\\b\", \"Technology\", \"Software & SaaS\"),\n (r\"figma\\b\", \"Technology\", \"Software & SaaS\"),\n (r\"slack(?!\\s*(?:apparel|store))\", \"Technology\", \"Software & SaaS\"),\n (r\"zoom(?!\\s*(?:car|dental|whitening|tan))\", \"Technology\", \"Software & SaaS\"),\n (r\"webex\", \"Technology\", \"Software & SaaS\"),\n (r\"microsoft\\s*365|office\\s*365|microsoft\\s*office\", \"Technology\", \"Software & SaaS\"),\n (r\"google\\s*workspace|g\\s*suite|google\\s*one\\b|google\\s*drive\", \"Technology\", \"Software & SaaS\"),\n (r\"adobe\\s*(?:creative\\s*cloud|acrobat|lightroom|photoshop|illustrator|premiere|after\\s*effects|substance)\", \"Technology\", \"Software & SaaS\"),\n (r\"dropbox(?!\\s*(?:sign|paper\\s*sub))\", \"Technology\", \"Software & SaaS\"),\n (r\"box\\.com|box\\s*inc\", \"Technology\", \"Software & SaaS\"),\n (r\"canva\\b\", \"Technology\", \"Software & SaaS\"),\n (r\"sketch\\s*(?:app)?\", \"Technology\", \"Software & SaaS\"),\n (r\"invision|zeplin|framer\\b\", \"Technology\", \"Software & SaaS\"),\n (r\"linear\\.app|linear\\s*(?:issues)?\", 
\"Technology\", \"Software & SaaS\"),\n (r\"asana\\b\", \"Technology\", \"Software & SaaS\"),\n (r\"monday\\.com\", \"Technology\", \"Software & SaaS\"),\n (r\"clickup\", \"Technology\", \"Software & SaaS\"),\n (r\"basecamp\", \"Technology\", \"Software & SaaS\"),\n (r\"airtable\", \"Technology\", \"Software & SaaS\"),\n (r\"hubspot\", \"Technology\", \"Software & SaaS\"),\n (r\"salesforce\", \"Technology\", \"Software & SaaS\"),\n (r\"pipedrive\", \"Technology\", \"Software & SaaS\"),\n (r\"intercom\\b\", \"Technology\", \"Software & SaaS\"),\n (r\"zendesk\", \"Technology\", \"Software & SaaS\"),\n (r\"freshdesk|freshworks\", \"Technology\", \"Software & SaaS\"),\n (r\"mailchimp\", \"Technology\", \"Software & SaaS\"),\n (r\"klaviyo|activecampaign|convertkit|sendgrid|mailgun|brevo|sendinblue\", \"Technology\", \"Software & SaaS\"),\n (r\"twilio\\b\", \"Technology\", \"Software & SaaS\"),\n (r\"datadog\", \"Technology\", \"Software & SaaS\"),\n (r\"new\\s*relic\", \"Technology\", \"Software & SaaS\"),\n (r\"sentry\\.io|sentry\\s*subscription\", \"Technology\", \"Software & SaaS\"),\n (r\"pagerduty\", \"Technology\", \"Software & SaaS\"),\n (r\"circleci|travis\\s*ci\", \"Technology\", \"Software & SaaS\"),\n (r\"vercel\\b\", \"Technology\", \"Software & SaaS\"),\n (r\"netlify\\b\", \"Technology\", \"Software & SaaS\"),\n (r\"heroku\\b\", \"Technology\", \"Software & SaaS\"),\n (r\"supabase|planetscale|neon\\s*database|railway\\.app|render\\.com|fly\\.io\", \"Technology\", \"Software & SaaS\"),\n (r\"jetbrains|intellij|pycharm|webstorm|goland|clion|datagrip|rider\\b\", \"Technology\", \"Software & SaaS\"),\n (r\"quickbooks|xero\\b|freshbooks|wave\\s*accounting|sage\\s*(?:50|intacct|accounting)\", \"Technology\", \"Software & SaaS\"),\n (r\"shopify\\b\", \"Technology\", \"Software & SaaS\"),\n (r\"squarespace|wix\\.com|webflow\\b|wordpress\\.com\", \"Technology\", \"Software & SaaS\"),\n (r\"loom\\.com\", \"Technology\", \"Software & SaaS\"),\n (r\"calendly\", 
\"Technology\", \"Software & SaaS\"),\n (r\"typeform|surveymonkey|hotjar\\b\", \"Technology\", \"Software & SaaS\"),\n\n # ── CYBERSECURITY ─────────────────────────────────────────────────────────\n (r\"1password|lastpass|bitwarden|dashlane|keeper\\s*security\", \"Technology\", \"Cybersecurity\"),\n (r\"nordvpn|expressvpn|protonvpn|surfshark|private\\s*internet\\s*access|mullvad\", \"Technology\", \"Cybersecurity\"),\n (r\"norton\\s*(?:360|antivirus|lifelock)?|mcafee|bitdefender|malwarebytes|crowdstrike|eset\\b\", \"Technology\", \"Cybersecurity\"),\n\n # ── TELECOM / PHONE / INTERNET ────────────────────────────────────────────\n # US\n (r\"at&t(?!\\s*(?:retirement|pension|savings\\s*plan))\", \"Utilities\", \"Phone & Internet\"),\n (r\"verizon(?!\\s*(?:pension|retirement|media))\", \"Utilities\", \"Phone & Internet\"),\n (r\"t-?mobile\", \"Utilities\", \"Phone & Internet\"),\n (r\"comcast|xfinity\", \"Utilities\", \"Phone & Internet\"),\n (r\"charter\\s*communications|spectrum\\b\", \"Utilities\", \"Phone & Internet\"),\n (r\"cox\\s*communications\", \"Utilities\", \"Phone & Internet\"),\n (r\"dish\\s*network|directv\\s*(?!stream)\", \"Utilities\", \"Phone & Internet\"),\n (r\"centurylink|lumen\\s*technologies|frontier\\s*communications|windstream\", \"Utilities\", \"Phone & Internet\"),\n (r\"boost\\s*mobile|cricket\\s*wireless|metropcs|tracfone|mint\\s*mobile|visible\\s*wireless\", \"Utilities\", \"Phone & Internet\"),\n # Canada\n (r\"rogers(?!\\s*(?:arena|centre))\", \"Utilities\", \"Phone & Internet\"),\n (r\"bell\\s*(?:canada|mobility|fibe|mts)?\", \"Utilities\", \"Phone & Internet\"),\n (r\"telus(?!\\s*garden)\", \"Utilities\", \"Phone & Internet\"),\n (r\"shaw\\s*(?:communications)?|freedom\\s*mobile|videotron|eastlink|sasktel\", \"Utilities\", \"Phone & Internet\"),\n # UK\n (r\"vodafone(?!\\s*(?:idea|india))\", \"Utilities\", \"Phone & Internet\"),\n (r\"o2\\s*(?:uk)?\", \"Utilities\", \"Phone & Internet\"),\n (r\"bt\\s*(?:group|broadband|mobile)?\", 
\"Utilities\", \"Phone & Internet\"),\n (r\"sky\\s*(?:uk|broadband|tv|mobile|q)?\", \"Utilities\", \"Phone & Internet\"),\n (r\"three\\s*(?:uk)?|3\\s*uk\\b\", \"Utilities\", \"Phone & Internet\"),\n (r\"ee\\b|everything\\s*everywhere\", \"Utilities\", \"Phone & Internet\"),\n (r\"virgin\\s*media\", \"Utilities\", \"Phone & Internet\"),\n (r\"talktalk|plusnet\", \"Utilities\", \"Phone & Internet\"),\n # Europe\n (r\"deutsche\\s*telekom|t-?online|telekom\\s*deutschland\", \"Utilities\", \"Phone & Internet\"),\n (r\"orange(?!\\s*(?:julius|theory\\s*fitness|county|bank|theory))\", \"Utilities\", \"Phone & Internet\"),\n (r\"sfr\\b|bouygues\\s*telecom|free\\s*mobile|free\\s*telecom\", \"Utilities\", \"Phone & Internet\"),\n (r\"swisscom\", \"Utilities\", \"Phone & Internet\"),\n (r\"salt\\s*mobile|sunrise\\s*uw\", \"Utilities\", \"Phone & Internet\"),\n (r\"proximus\\b|telenet\\s*belgium\", \"Utilities\", \"Phone & Internet\"),\n (r\"telefonica|movistar|yoigo|masmovil\", \"Utilities\", \"Phone & Internet\"),\n (r\"tim\\s*(?:brasil|telecom\\s*italia)\", \"Utilities\", \"Phone & Internet\"),\n (r\"telenor\", \"Utilities\", \"Phone & Internet\"),\n (r\"telia(?!\\s*(?:company\\s*arena|arena))\", \"Utilities\", \"Phone & Internet\"),\n # Asia\n (r\"singtel\", \"Utilities\", \"Phone & Internet\"),\n (r\"starhub\", \"Utilities\", \"Phone & Internet\"),\n (r\"m1\\s*(?:limited|singapore)?\", \"Utilities\", \"Phone & Internet\"),\n (r\"telstra\\b\", \"Utilities\", \"Phone & Internet\"),\n (r\"optus\\b\", \"Utilities\", \"Phone & Internet\"),\n (r\"tpg\\s*telecom|aussie\\s*broadband\", \"Utilities\", \"Phone & Internet\"),\n (r\"airtel(?!\\s*(?:africa\\s*nigeria|money))\", \"Utilities\", \"Phone & Internet\"),\n (r\"jio\\b|reliance\\s*jio\", \"Utilities\", \"Phone & Internet\"),\n (r\"bsnl\\b|vi\\b|vodafone\\s*idea\", \"Utilities\", \"Phone & Internet\"),\n (r\"globe\\s*telecom|smart\\s*communications|dito\\s*telecom\", \"Utilities\", \"Phone & Internet\"),\n 
(r\"ntt\\s*docomo|softbank(?!\\s*(?:vision\\s*fund|robotics))|kddi\\b\", \"Utilities\", \"Phone & Internet\"),\n (r\"sk\\s*telecom|kt\\s*(?:corporation)?|lg\\s*uplus\", \"Utilities\", \"Phone & Internet\"),\n (r\"china\\s*mobile|china\\s*unicom|china\\s*telecom\", \"Utilities\", \"Phone & Internet\"),\n (r\"celcom|maxis\\b|digi\\s*telecom|u\\s*mobile\", \"Utilities\", \"Phone & Internet\"),\n (r\"du\\b\\s*(?:telecom)?|etisalat\\b|e&\\b\", \"Utilities\", \"Phone & Internet\"),\n (r\"stc\\b|mobily\\b|zain\\s*(?:ksa|saudi|group)?|ooredoo\", \"Utilities\", \"Phone & Internet\"),\n (r\"mtn\\s*(?:group|nigeria|ghana|south\\s*africa|uganda)?\", \"Utilities\", \"Phone & Internet\"),\n (r\"safaricom\", \"Utilities\", \"Phone & Internet\"),\n\n # ── RETAIL & SHOPPING ─────────────────────────────────────────────────────\n # MUST come after specific Amazon sub-services above\n (r\"amazon(?!\\s*(?:web\\s*services|music|prime\\s*video|pay|flex|fresh|go\\b))\", \"Retail & Shopping\", \"Online Shopping\"),\n (r\"ebay\", \"Retail & Shopping\", \"Online Shopping\"),\n (r\"etsy\\b\", \"Retail & Shopping\", \"Online Shopping\"),\n (r\"aliexpress\", \"Retail & Shopping\", \"Online Shopping\"),\n (r\"alibaba(?!\\s*cloud)\", \"Retail & Shopping\", \"Online Shopping\"),\n (r\"taobao|tmall|jd\\.com|jingdong|pinduoduo\", \"Retail & Shopping\", \"Online Shopping\"),\n (r\"shopee\", \"Retail & Shopping\", \"Online Shopping\"),\n (r\"lazada\", \"Retail & Shopping\", \"Online Shopping\"),\n (r\"tokopedia|bukalapak\", \"Retail & Shopping\", \"Online Shopping\"),\n (r\"shein\\b|temu\\b|wish\\.com\", \"Retail & Shopping\", \"Online Shopping\"),\n # Department stores\n (r\"target(?!\\s*(?:optical|pharmacy|clinic))\", \"Retail & Shopping\", \"Department Stores\"),\n (r\"nordstrom(?!\\s*rack\\s*credit)\", \"Retail & Shopping\", \"Department Stores\"),\n (r\"bloomingdale'?s\", \"Retail & Shopping\", \"Department Stores\"),\n (r\"neiman\\s*marcus|saks\\s*fifth\\s*avenue\", \"Retail & Shopping\", 
\"Department Stores\"),\n (r\"macy'?s\", \"Retail & Shopping\", \"Department Stores\"),\n (r\"j\\.?\\s*c\\.?\\s*penney|jcpenney\", \"Retail & Shopping\", \"Department Stores\"),\n (r\"kohl'?s\", \"Retail & Shopping\", \"Department Stores\"),\n (r\"burlington\\s*(?:coat\\s*factory)?\", \"Retail & Shopping\", \"Department Stores\"),\n (r\"tj\\s*maxx|t\\.?j\\.?\\s*maxx\", \"Retail & Shopping\", \"Department Stores\"),\n (r\"marshalls\", \"Retail & Shopping\", \"Department Stores\"),\n (r\"ross\\s*(?:stores|dress\\s*for\\s*less)?\", \"Retail & Shopping\", \"Department Stores\"),\n (r\"john\\s*lewis\", \"Retail & Shopping\", \"Department Stores\"),\n (r\"selfridges|harrods|harvey\\s*nichols|house\\s*of\\s*fraser\", \"Retail & Shopping\", \"Department Stores\"),\n (r\"el\\s*corte\\s*ingles\", \"Retail & Shopping\", \"Department Stores\"),\n (r\"galeries\\s*lafayette|printemps\\b|bon\\s*marche\", \"Retail & Shopping\", \"Department Stores\"),\n (r\"galeria\\s*kaufhof|karstadt\", \"Retail & Shopping\", \"Department Stores\"),\n (r\"isetan|takashimaya|sogo\\b|parco\\b|tokyu\\s*(?:dept|hands)\", \"Retail & Shopping\", \"Department Stores\"),\n (r\"david\\s*jones|myer\\b\", \"Retail & Shopping\", \"Department Stores\"),\n (r\"sm\\s*(?:department\\s*store|supermalls)|robinsons\\s*(?:dept|malls)\", \"Retail & Shopping\", \"Department Stores\"),\n # Clothing\n (r\"h&m\\b|h\\s*and\\s*m\\b|hennes\\s*&\\s*mauritz\", \"Retail & Shopping\", \"Clothing & Apparel\"),\n (r\"zara(?!\\s*larsson)\", \"Retail & Shopping\", \"Clothing & Apparel\"),\n (r\"uniqlo\", \"Retail & Shopping\", \"Clothing & Apparel\"),\n (r\"gap(?!\\s*(?:insurance|year))\", \"Retail & Shopping\", \"Clothing & Apparel\"),\n (r\"mango(?!\\s*(?:language|smoothie|juice))\", \"Retail & Shopping\", \"Clothing & Apparel\"),\n (r\"pull\\s*&\\s*bear|bershka|massimo\\s*dutti|stradivarius\", \"Retail & Shopping\", \"Clothing & Apparel\"),\n (r\"primark|penneys\\s*ireland\", \"Retail & Shopping\", \"Clothing & Apparel\"),\n 
(r\"asos\\b\", \"Retail & Shopping\", \"Clothing & Apparel\"),\n (r\"boohoo|prettylittlething|plt\\b|missguided|nasty\\s*gal\", \"Retail & Shopping\", \"Clothing & Apparel\"),\n (r\"levi'?s|wrangler\\b|lee\\s*jeans\", \"Retail & Shopping\", \"Clothing & Apparel\"),\n (r\"nike\\b\", \"Retail & Shopping\", \"Clothing & Apparel\"),\n (r\"adidas\", \"Retail & Shopping\", \"Clothing & Apparel\"),\n (r\"puma\\b\", \"Retail & Shopping\", \"Clothing & Apparel\"),\n (r\"under\\s*armour\", \"Retail & Shopping\", \"Clothing & Apparel\"),\n (r\"lululemon\", \"Retail & Shopping\", \"Clothing & Apparel\"),\n (r\"patagonia|the\\s*north\\s*face|columbia\\s*sportswear|arc'?teryx\", \"Retail & Shopping\", \"Clothing & Apparel\"),\n # Electronics\n (r\"best\\s*buy\", \"Retail & Shopping\", \"Electronics\"),\n (r\"b&h\\s*photo|adorama\", \"Retail & Shopping\", \"Electronics\"),\n (r\"micro\\s*center|newegg\", \"Retail & Shopping\", \"Electronics\"),\n (r\"apple\\s*(?:store|retail|com)(?!\\s*(?:tv\\+|music|pay|card|fitness|one|arcade|news))\", \"Retail & Shopping\", \"Electronics\"),\n (r\"currys|curry'?s\\s*pc\\s*world|pcworld\", \"Retail & Shopping\", \"Electronics\"),\n (r\"mediamarkt|saturn\\s*(?:electro)?\", \"Retail & Shopping\", \"Electronics\"),\n (r\"fnac\\b\", \"Retail & Shopping\", \"Electronics\"),\n (r\"darty\\b\", \"Retail & Shopping\", \"Electronics\"),\n (r\"harvey\\s*norman\", \"Retail & Shopping\", \"Electronics\"),\n (r\"jb\\s*hi-?fi\", \"Retail & Shopping\", \"Electronics\"),\n # Home improvement\n (r\"home\\s*depot\", \"Retail & Shopping\", \"Home & Garden\"),\n (r\"lowe'?s(?!\\s*foods)\", \"Retail & Shopping\", \"Home & Garden\"),\n (r\"b&q\\b\", \"Retail & Shopping\", \"Home & Garden\"),\n (r\"bunnings\\s*(?:warehouse)?\", \"Retail & Shopping\", \"Home & Garden\"),\n (r\"bauhaus\\b\", \"Retail & Shopping\", \"Home & Garden\"),\n (r\"ikea\", \"Retail & Shopping\", \"Furniture & Home\"),\n (r\"wayfair\", \"Retail & Shopping\", \"Furniture & Home\"),\n 
(r\"pottery\\s*barn|williams-?sonoma\", \"Retail & Shopping\", \"Furniture & Home\"),\n (r\"crate\\s*&\\s*barrel|west\\s*elm|restoration\\s*hardware\", \"Retail & Shopping\", \"Furniture & Home\"),\n\n # ── PHARMACY & HEALTH ─────────────────────────────────────────────────────\n (r\"cvs(?!\\s*(?:energy|corporate|pharmacy\\s*credit))\", \"Health & Medical\", \"Pharmacy\"),\n (r\"walgreens\", \"Health & Medical\", \"Pharmacy\"),\n (r\"rite\\s*aid\", \"Health & Medical\", \"Pharmacy\"),\n (r\"boots\\s*(?:uk|pharmacy)?\", \"Health & Medical\", \"Pharmacy\"),\n (r\"superdrug\", \"Health & Medical\", \"Pharmacy\"),\n (r\"dm\\s*(?:drogerie)?\", \"Health & Medical\", \"Pharmacy\"),\n (r\"rossmann\", \"Health & Medical\", \"Pharmacy\"),\n (r\"mueller|müller\\s*drogerie\", \"Health & Medical\", \"Pharmacy\"),\n (r\"kruidvat\", \"Health & Medical\", \"Pharmacy\"),\n (r\"apotek(?:\\s*hjärtat)?|apoteket|lloydsapotek\", \"Health & Medical\", \"Pharmacy\"),\n (r\"chemist\\s*warehouse\", \"Health & Medical\", \"Pharmacy\"),\n (r\"priceline\\s*pharmacy|terry\\s*white\", \"Health & Medical\", \"Pharmacy\"),\n (r\"guardian\\s*pharmacy\", \"Health & Medical\", \"Pharmacy\"),\n (r\"watsons(?!\\s*hotel)\", \"Health & Medical\", \"Pharmacy\"),\n (r\"mannings\\b\", \"Health & Medical\", \"Pharmacy\"),\n (r\"mercury\\s*drug\", \"Health & Medical\", \"Pharmacy\"),\n # Gym & Fitness\n (r\"planet\\s*fitness\", \"Health & Medical\", \"Gym & Wellness\"),\n (r\"equinox(?!\\s*(?:payments|capital))\", \"Health & Medical\", \"Gym & Wellness\"),\n (r\"24\\s*hour\\s*fitness\", \"Health & Medical\", \"Gym & Wellness\"),\n (r\"anytime\\s*fitness\", \"Health & Medical\", \"Gym & Wellness\"),\n (r\"la\\s*fitness\", \"Health & Medical\", \"Gym & Wellness\"),\n (r\"ymca|ywca\", \"Health & Medical\", \"Gym & Wellness\"),\n (r\"orangetheory\\s*fitness\", \"Health & Medical\", \"Gym & Wellness\"),\n (r\"f45\\s*training|barry'?s\\s*bootcamp|soulcycle|soul\\s*cycle\", \"Health & Medical\", \"Gym & 
Wellness\"),\n (r\"peloton\", \"Health & Medical\", \"Gym & Wellness\"),\n (r\"classpass|mindbody\", \"Health & Medical\", \"Gym & Wellness\"),\n (r\"fitness\\s*first|virgin\\s*active|pure\\s*gym|snap\\s*fitness\", \"Health & Medical\", \"Gym & Wellness\"),\n (r\"goodlife\\s*(?:fitness|health\\s*clubs)\", \"Health & Medical\", \"Gym & Wellness\"),\n # Healthcare\n (r\"teladoc|mdlive|doctor\\s*on\\s*demand|sesame\\s*care|zocdoc|one\\s*medical\", \"Health & Medical\", \"Doctor & Clinic\"),\n (r\"labcorp|quest\\s*diagnostics\", \"Health & Medical\", \"Doctor & Clinic\"),\n (r\"cigna|aetna|humana|anthem\\b|bluecross|blue\\s*cross|bcbs\\b|kaiser\\s*permanente\", \"Health & Medical\", \"Insurance\"),\n (r\"unitedhealthcare|uhc\\b|oscar\\s*health|united\\s*health\", \"Health & Medical\", \"Insurance\"),\n\n # ── FINANCIAL SERVICES ────────────────────────────────────────────────────\n (r\"paypal\", \"Other\", \"Payment Services\"),\n (r\"stripe(?!\\s*security)\", \"Other\", \"Payment Services\"),\n (r\"square(?!\\s*(?:enix|mile|meal|trade))\", \"Other\", \"Payment Services\"),\n (r\"venmo\", \"Other\", \"Payment Services\"),\n (r\"zelle\\b\", \"Other\", \"Payment Services\"),\n (r\"cash\\s*app\", \"Other\", \"Payment Services\"),\n (r\"wise\\b(?!\\s*(?:owl|crack|man|acre|word|town|use))|transferwise\", \"Other\", \"Payment Services\"),\n (r\"revolut\\b\", \"Other\", \"Payment Services\"),\n (r\"n26\\b\", \"Other\", \"Payment Services\"),\n (r\"monzo\\b\", \"Other\", \"Payment Services\"),\n (r\"starling\\s*bank\", \"Other\", \"Payment Services\"),\n (r\"chime(?!\\s*communications)\", \"Other\", \"Payment Services\"),\n (r\"remitly|worldremit|western\\s*union|moneygram\", \"Other\", \"Payment Services\"),\n (r\"paynow|paylah|grabpay\", \"Other\", \"Payment Services\"),\n (r\"wechat\\s*pay|alipay|unionpay\", \"Other\", \"Payment Services\"),\n (r\"paytm|phonepe|bhim\\b|gpay\\b\", \"Other\", \"Payment Services\"),\n (r\"ovo\\b|gopay|dana\\b|linkaja|qris\", \"Other\", 
\"Payment Services\"),\n (r\"gcash|maya(?!\\s*(?:bay|riviera|angelou))|paymaya\", \"Other\", \"Payment Services\"),\n (r\"m-?pesa|m-?shwari|tigo\\s*pesa\", \"Other\", \"Payment Services\"),\n (r\"flutterwave|paystack\", \"Other\", \"Payment Services\"),\n (r\"coinbase|binance|kraken\\s*exchange|gemini\\s*exchange|crypto\\.com\", \"Financial Services\", \"Cryptocurrency\"),\n (r\"robinhood|webull|etrade|e\\*trade|td\\s*ameritrade|charles\\s*schwab|fidelity|vanguard\\b\", \"Financial Services\", \"Investment\"),\n (r\"betterment|wealthfront|acorns\\b|stash\\b|sofi\\s*invest\", \"Financial Services\", \"Investment\"),\n\n # ── EDUCATION ─────────────────────────────────────────────────────────────\n (r\"coursera\", \"Education\", \"Online Courses\"),\n (r\"udemy\", \"Education\", \"Online Courses\"),\n (r\"udacity\", \"Education\", \"Online Courses\"),\n (r\"linkedin\\s*learning|lynda\\.com\", \"Education\", \"Online Courses\"),\n (r\"skillshare\", \"Education\", \"Online Courses\"),\n (r\"pluralsight\", \"Education\", \"Online Courses\"),\n (r\"masterclass\", \"Education\", \"Online Courses\"),\n (r\"brilliant\\.org\", \"Education\", \"Online Courses\"),\n (r\"duolingo\", \"Education\", \"Language Learning\"),\n (r\"babbel|rosetta\\s*stone\", \"Education\", \"Language Learning\"),\n (r\"o'reilly\\s*(?:media|learning)?\", \"Education\", \"Books\"),\n\n # ── PROFESSIONAL SERVICES ─────────────────────────────────────────────────\n (r\"legalzoom|rocket\\s*lawyer|docusign|hellosign|pandadoc\", \"Professional Services\", \"Legal\"),\n (r\"upwork|fiverr|toptal|freelancer\\.com|99designs\", \"Professional Services\", \"Freelance\"),\n\n # ── OFFICE & SHIPPING ─────────────────────────────────────────────────────\n (r\"staples(?!\\s*(?:center|arena))\", \"Office & Supplies\", \"Office Supplies\"),\n (r\"office\\s*depot|officemax|office\\s*max\", \"Office & Supplies\", \"Office Supplies\"),\n (r\"ryman(?!\\s*auditorium)|viking\\s*direct\", \"Office & Supplies\", \"Office 
Supplies\"),\n (r\"uline\\b|quill\\.com\", \"Office & Supplies\", \"Office Supplies\"),\n (r\"fedex(?!\\s*field)\", \"Office & Supplies\", \"Postage & Shipping\"),\n (r\"ups\\s*(?:store)?(?!\\s*arena)\", \"Office & Supplies\", \"Postage & Shipping\"),\n (r\"usps\\b|royal\\s*mail|australia\\s*post|canada\\s*post|new\\s*zealand\\s*post\", \"Office & Supplies\", \"Postage & Shipping\"),\n (r\"dhl(?!\\s*fashion)\", \"Office & Supplies\", \"Postage & Shipping\"),\n (r\"ninja\\s*van|lalamove|j&t\\s*express|pos\\s*malaysia|singpost|thailand\\s*post\", \"Office & Supplies\", \"Postage & Shipping\"),\n]\n\n\ndef auto_categorise(merchant: str) -> tuple[str, str]:\n \"\"\"\n Auto-categorise a merchant/description by pattern matching.\n Returns (category, subcategory). Falls back to ('Other', 'Miscellaneous').\n \"\"\"\n if not merchant:\n return \"Other\", \"Miscellaneous\"\n\n # Normalise: lowercase and trim surrounding whitespace, then build an ASCII-folded variant with diacritics stripped\n merchant_lower = merchant.lower().strip()\n try:\n merchant_ascii = unicodedata.normalize('NFKD', merchant_lower)\n merchant_ascii = merchant_ascii.encode('ascii', 'ignore').decode('ascii')\n except Exception:\n merchant_ascii = merchant_lower\n\n for pattern, category, subcategory in MERCHANT_PATTERNS:\n if re.search(pattern, merchant_lower) or re.search(pattern, merchant_ascii):\n return category, subcategory\n\n return \"Other\", \"Miscellaneous\"\n","content_type":"text/x-python; charset=utf-8","language":"python","size":59362,"content_sha256":"4e9ee3e311c0b252cee9b58153785df6cdda02efd9d8e830703f96861a6087c4"},{"filename":"setup.sh","content":"#!/usr/bin/env bash\n# setup.sh — Install all doc-process skill dependencies\n#\n# Usage:\n# bash skills/doc-process/setup.sh # install everything\n# bash skills/doc-process/setup.sh --light # Python packages only (no system deps)\n#\n# Runs from your project root (the directory where skills/ lives).\n\nset -e\n\nSKILL_DIR=\"$(cd \"$(dirname \"${BASH_SOURCE[0]}\")\" && 
pwd)\"\nLIGHT=false\n[[ \"${1:-}\" == \"--light\" ]] && LIGHT=true\n\n# ── Step 1: Ensure Python 3 is available ─────────────────────────────────────\necho \"=== doc-process: checking Python ===\"\nif command -v python3 &>/dev/null; then\n echo \"✓ $(python3 --version)\"\n PY=python3\nelif command -v python &>/dev/null && python -c \"import sys; assert sys.version_info.major==3\" 2>/dev/null; then\n echo \"✓ $(python --version)\"\n PY=python\nelse\n echo \"✗ Python 3 not found. Install it first:\"\n echo \" Amazon Linux / RHEL: sudo yum install -y python3\"\n echo \" Ubuntu / Debian: sudo apt-get install -y python3\"\n echo \" macOS: brew install python\"\n exit 1\nfi\n\n# ── Step 2: Ensure pip is available ──────────────────────────────────────────\necho \"\"\necho \"=== doc-process: ensuring pip is installed ===\"\nif $PY -m pip --version &>/dev/null; then\n echo \"✓ $($PY -m pip --version)\"\nelse\n echo \" pip not found — installing via ensurepip...\"\n if $PY -m ensurepip --upgrade 2>/dev/null; then\n echo \"✓ pip installed via ensurepip\"\n else\n echo \" ensurepip unavailable — trying package manager...\"\n if command -v apt-get &>/dev/null; then\n sudo apt-get install -y python3-pip\n elif command -v yum &>/dev/null; then\n sudo yum install -y python3-pip\n elif command -v dnf &>/dev/null; then\n sudo dnf install -y python3-pip\n elif command -v brew &>/dev/null; then\n brew install python # pip is bundled with Homebrew Python\n else\n echo \" Falling back to get-pip.py...\"\n curl -sSL https://bootstrap.pypa.io/get-pip.py | $PY\n fi\n fi\n # Upgrade pip to latest\n $PY -m pip install --upgrade pip --quiet\n echo \"✓ $($PY -m pip --version)\"\nfi\n\n# ── Step 3: Install Python packages ──────────────────────────────────────────\necho \"\"\necho \"=== doc-process: installing Python dependencies ===\"\n$PY -m pip install -r \"$SKILL_DIR/requirements.txt\" --quiet && \\\n echo \"✓ Python packages installed\" || \\\n { echo \"✗ pip install failed — try: $PY -m 
pip install --user -r skills/doc-process/requirements.txt\"; exit 1; }\n\nif $LIGHT; then\n echo \"Skipping system packages (--light mode).\"\n exit 0\nfi\n\necho \"\"\necho \"=== doc-process: checking system dependencies ===\"\n\n# ── tesseract (needed by redactor.py image mode) ─────────────────────────────\nif command -v tesseract &>/dev/null; then\n echo \"✓ tesseract $(tesseract --version 2>&1 | head -1 | awk '{print $2}')\"\nelse\n echo \" tesseract not found — installing...\"\n if command -v brew &>/dev/null; then\n brew install tesseract\n elif command -v apt-get &>/dev/null; then\n sudo apt-get install -y tesseract-ocr\n elif command -v dnf &>/dev/null; then\n sudo dnf install -y tesseract\n else\n echo \" ⚠ Could not auto-install tesseract. Install manually:\"\n echo \" macOS: brew install tesseract\"\n echo \" Ubuntu: apt install tesseract-ocr\"\n echo \" (image redaction will not work without it)\"\n fi\nfi\n\n# ── ffmpeg (needed by audio_transcriber.py) ──────────────────────────────────\nif command -v ffmpeg &>/dev/null; then\n echo \"✓ ffmpeg $(ffmpeg -version 2>&1 | head -1 | awk '{print $3}')\"\nelse\n echo \" ffmpeg not found — installing...\"\n if command -v brew &>/dev/null; then\n brew install ffmpeg\n elif command -v apt-get &>/dev/null; then\n sudo apt-get install -y ffmpeg\n elif command -v dnf &>/dev/null; then\n sudo dnf install -y ffmpeg\n else\n echo \" ⚠ Could not auto-install ffmpeg. 
Install manually:\"\n echo \" macOS: brew install ffmpeg\"\n echo \" Ubuntu: apt install ffmpeg\"\n echo \" (audio transcription will not work without it)\"\n fi\nfi\n\necho \"\"\necho \"=== doc-process: setup complete ===\"\necho \"All script-assisted modes are ready.\"\necho \"Note: openai-whisper will download its model file (~140 MB) on first audio transcription.\"\n","content_type":"application/x-sh; charset=utf-8","language":"bash","size":4794,"content_sha256":"c7f9eac3f7100709f801cabddc2748ff71dbe35d4f0c7d901a567c660dc04e9f"}],"content_json":{"type":"doc","content":[{"type":"heading","attrs":{"level":1},"content":[{"text":"Doc-Process — Document Intelligence Skill","type":"text"}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Step 0 — Auto-Setup (run once on first use)","type":"text"}]},{"type":"paragraph","content":[{"text":"Before invoking any script for the first time in a session, check whether the script dependencies are available. If any are missing, run the setup script automatically — no prompting needed:","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"bash skills/doc-process/setup.sh","type":"text"}]},{"type":"paragraph","content":[{"text":"This installs all Python packages (","type":"text"},{"text":"pymupdf","type":"text","marks":[{"type":"code_inline"}]},{"text":", ","type":"text"},{"text":"Pillow","type":"text","marks":[{"type":"code_inline"}]},{"text":", ","type":"text"},{"text":"pytesseract","type":"text","marks":[{"type":"code_inline"}]},{"text":", ","type":"text"},{"text":"opencv-python-headless","type":"text","marks":[{"type":"code_inline"}]},{"text":", ","type":"text"},{"text":"numpy","type":"text","marks":[{"type":"code_inline"}]},{"text":", ","type":"text"},{"text":"img2pdf","type":"text","marks":[{"type":"code_inline"}]},{"text":", ","type":"text"},{"text":"pdfplumber","type":"text","marks":[{"type":"code_inline"}]},{"text":", 
","type":"text"},{"text":"openai-whisper","type":"text","marks":[{"type":"code_inline"}]},{"text":") and attempts to install system binaries (","type":"text"},{"text":"tesseract","type":"text","marks":[{"type":"code_inline"}]},{"text":", ","type":"text"},{"text":"ffmpeg","type":"text","marks":[{"type":"code_inline"}]},{"text":") via ","type":"text"},{"text":"brew","type":"text","marks":[{"type":"code_inline"}]},{"text":" or ","type":"text"},{"text":"apt","type":"text","marks":[{"type":"code_inline"}]},{"text":" depending on the platform.","type":"text"}]},{"type":"paragraph","content":[{"text":"When to run Step 0:","type":"text","marks":[{"type":"strong"}]}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"First time any script-assisted mode is used in a session","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"After a fresh ","type":"text"},{"text":"clawhub install piyush-zinc/doc-process","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"If a script fails with ","type":"text"},{"text":"ModuleNotFoundError","type":"text","marks":[{"type":"code_inline"}]},{"text":" or ","type":"text"},{"text":"ImportError","type":"text","marks":[{"type":"code_inline"}]}]}]}]},{"type":"paragraph","content":[{"text":"To install Python packages only (no system packages):","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"bash skills/doc-process/setup.sh --light","type":"text"}]},{"type":"paragraph","content":[{"text":"Or install directly from the skill's requirements file:","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"pip install -r skills/doc-process/requirements.txt","type":"text"}]},{"type":"blockquote","content":[{"type":"paragraph","content":[{"text":"Note:","type":"text","marks":[{"type":"strong"}]},{"text":" 
","type":"text"},{"text":"openai-whisper","type":"text","marks":[{"type":"code_inline"}]},{"text":" downloads its model (~140 MB) on first audio transcription — not at install time.","type":"text"}]}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":2},"content":[{"text":"Overview","type":"text"}]},{"type":"paragraph","content":[{"text":"This skill handles all document-related tasks using Claude's native vision/language capabilities for reading and analysis, and Python scripts for file-output operations. Most modes require ","type":"text"},{"text":"no installation","type":"text","marks":[{"type":"strong"}]},{"text":" — only the file-output scripts need third-party libraries.","type":"text"}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":2},"content":[{"text":"How Features Are Implemented","type":"text"}]},{"type":"table","attrs":{"layout":null},"content":[{"type":"tr","content":[{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Feature","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Implementation","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"External libraries","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"OCR / reading images","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Claude built-in 
vision","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"None","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"MRZ decoding (passport/ID)","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Claude reads MRZ visually, applies ICAO algorithm","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"None","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"PDF reading","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Claude reads PDF text layer or visually","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"None","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Form autofill","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Claude reads form fields, outputs fill table","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"None","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Contract 
analysis","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Claude applies reference rule set","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"None","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Receipt / invoice scanning","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Claude reads image or PDF","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"None","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Bank statement (PDF)","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Claude reads PDF pages","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"None","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Bank statement (CSV)","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"statement_parser.py","type":"text","marks":[{"type":"code_inline"}]},{"text":" — pure 
stdlib","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"None","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Expense logging","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"expense_logger.py","type":"text","marks":[{"type":"code_inline"}]},{"text":" — pure stdlib","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"None","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Bank report generation","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"report_generator.py","type":"text","marks":[{"type":"code_inline"}]},{"text":" — pure stdlib","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"None","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Resume / CV parsing","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Claude reads 
document","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"None","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Medical summarizer","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Claude reads document","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"None","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Legal redaction (display)","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Claude marks up output","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"None","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Legal redaction (file output)","type":"text","marks":[{"type":"strong"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"redactor.py","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"pymupdf","type":"text","marks":[{"type":"strong"}]},{"text":" (PDF); ","type":"text"},{"text":"Pillow + pytesseract","type":"text","marks":[{"type":"strong"}]},{"text":" 
(image)","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Meeting minutes (text/PDF)","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Claude reads document","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"None","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Translation","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Claude's multilingual capabilities","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"None","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Document categorizer","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Claude reads first 1–2 pages (with consent gate)","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"None","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Timeline 
logging","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"timeline_manager.py","type":"text","marks":[{"type":"code_inline"}]},{"text":" — pure stdlib","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"None","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Table extraction (PDF)","type":"text","marks":[{"type":"strong"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"table_extractor.py","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"pdfplumber","type":"text","marks":[{"type":"strong"}]}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Audio transcription","type":"text","marks":[{"type":"strong"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"audio_transcriber.py","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"openai-whisper + ffmpeg","type":"text","marks":[{"type":"strong"}]}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Doc scan / perspective 
correction","type":"text","marks":[{"type":"strong"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"doc_scanner.py","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"opencv-python-headless, numpy, Pillow","type":"text","marks":[{"type":"strong"}]},{"text":"; img2pdf optional","type":"text"}]}]}]}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":2},"content":[{"text":"Dependencies & Installation","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"No installation required for core functionality","type":"text"}]},{"type":"paragraph","content":[{"text":"Reading, analysis, form filling, contract review, receipt scanning, bank statement analysis (PDF), resume parsing, ID scanning, medical summarising, redaction markup, meeting minutes, and translation all run on Claude's built-in capabilities.","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Optional — install only for file-output scripts","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"# Version specifiers are quoted so the shell does not treat '>' as output redirection.\n\n# PII redaction to PDF/image files (redactor.py)\npip install 'pymupdf>=1.23' # required for PDF redaction\npip install 'Pillow>=10.0' # required for image redaction\npip install 'pytesseract>=0.3' # required for image redaction (also: brew install tesseract)\n\n# Document scanning / perspective correction (doc_scanner.py)\npip install 'opencv-python-headless>=4.9' 'numpy>=1.24' 'Pillow>=10.0'\npip install 'img2pdf>=0.5' # optional — for PDF output; Pillow fallback used if absent\n\n# Table extraction from PDFs (table_extractor.py)\npip install 'pdfplumber>=0.11'\n\n# Audio transcription (audio_transcriber.py)\n# Also requires ffmpeg binary: brew install ffmpeg / apt install ffmpeg\npip install 
'openai-whisper>=20231117'","type":"text"}]},{"type":"paragraph","content":[{"text":"All dependencies are also listed in ","type":"text"},{"text":"requirements.txt","type":"text","marks":[{"type":"code_inline"}]},{"text":" at the repository root.","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Binary dependencies","type":"text"}]},{"type":"table","attrs":{"layout":null},"content":[{"type":"tr","content":[{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Binary","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Required by","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Install","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"tesseract","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"redactor.py","type":"text","marks":[{"type":"code_inline"}]},{"text":" (image mode)","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"brew install tesseract","type":"text","marks":[{"type":"code_inline"}]},{"text":" / ","type":"text"},{"text":"apt install 
tesseract-ocr","type":"text","marks":[{"type":"code_inline"}]}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"ffmpeg","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"audio_transcriber.py","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"brew install ffmpeg","type":"text","marks":[{"type":"code_inline"}]},{"text":" / ","type":"text"},{"text":"apt install ffmpeg","type":"text","marks":[{"type":"code_inline"}]}]}]}]}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Network access","type":"text"}]},{"type":"paragraph","content":[{"text":"openai-whisper","type":"text","marks":[{"type":"code_inline"}]},{"text":" downloads model files (~140 MB) from OpenAI/HuggingFace servers ","type":"text"},{"text":"on first run only","type":"text","marks":[{"type":"strong"}]},{"text":". The downloaded files are cached at ","type":"text"},{"text":"~/.cache/whisper/","type":"text","marks":[{"type":"code_inline"}]},{"text":". 
All other scripts are fully local after installation.","type":"text"}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":2},"content":[{"text":"Script Reference","type":"text"}]},{"type":"table","attrs":{"layout":null},"content":[{"type":"tr","content":[{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Script","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Dependencies","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Purpose","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Example","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"redactor.py","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"pymupdf; Pillow + pytesseract (image mode)","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"PII redaction to file (PDF/image/text)","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"python scripts/redactor.py --file doc.pdf --mode full 
--log","type":"text","marks":[{"type":"code_inline"}]}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"doc_scanner.py","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"opencv-python-headless, numpy, Pillow; img2pdf optional","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Document scanning: edge detection, perspective correction, scan-quality output","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"python scripts/doc_scanner.py --input photo.jpg --output scanned.png --mode bw","type":"text","marks":[{"type":"code_inline"}]}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"expense_logger.py","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"None","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Add/list/edit/delete expense entries in CSV","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"python scripts/expense_logger.py add --date 2024-03-15 --merchant \"Starbucks\" --amount 13.12 --file 
expenses.csv","type":"text","marks":[{"type":"code_inline"}]}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"statement_parser.py","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"None","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Parse bank CSV export, categorize transactions","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"python scripts/statement_parser.py --file statement.csv --output categorized.json","type":"text","marks":[{"type":"code_inline"}]}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"report_generator.py","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"None","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Format categorized JSON into a markdown report","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"python scripts/report_generator.py --file categorized.json --type 
bank","type":"text","marks":[{"type":"code_inline"}]}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"timeline_manager.py","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"None","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Manage opt-in document processing timeline","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"python scripts/timeline_manager.py show","type":"text","marks":[{"type":"code_inline"}]}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"audio_transcriber.py","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"openai-whisper, ffmpeg","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Transcribe audio files to text","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"python scripts/audio_transcriber.py --file meeting.mp3 --output 
transcript.txt","type":"text","marks":[{"type":"code_inline"}]}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"table_extractor.py","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"pdfplumber","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Extract tables from PDFs to CSV or JSON","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"python scripts/table_extractor.py --file document.pdf --output data.csv","type":"text","marks":[{"type":"code_inline"}]}]}]}]}]},{"type":"paragraph","content":[{"text":"All scripts import only what they declare. Scripts with no declared deps use Python stdlib only. 
You can verify any script: \"show me the source of [script name]\".","type":"text"}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":2},"content":[{"text":"Script Import Verification","type":"text"}]},{"type":"table","attrs":{"layout":null},"content":[{"type":"tr","content":[{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Script","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Stdlib imports","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Third-party","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Network","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"timeline_manager.py","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"argparse, json, sys, datetime, pathlib, uuid, 
collections","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"None","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Never","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"redactor.py","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"argparse, re, sys, pathlib, dataclasses","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"pymupdf (PDF); Pillow + pytesseract (image)","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Never","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"doc_scanner.py","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"argparse, json, sys, time, pathlib","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"opencv-python-headless, numpy, Pillow; img2pdf 
optional","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Never","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"expense_logger.py","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"argparse, csv, json, sys, pathlib","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"None","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Never","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"statement_parser.py","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"argparse, csv, json, re, sys, collections, datetime, 
pathlib","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"None","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Never","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"report_generator.py","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"argparse, json, sys, collections, pathlib","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"None","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Never","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"utils.py","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"re, unicodedata, datetime, 
pathlib","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"None","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Never","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"audio_transcriber.py","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"argparse, sys, pathlib","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"openai-whisper","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"First-run model download only","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"table_extractor.py","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"argparse, csv, io, json, sys, pathlib","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"pdfplumber","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Never","type":"text"}]}]}]}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":2},"content":[{"text":"Privacy & Data 
Handling","type":"text"}]},{"type":"table","attrs":{"layout":null},"content":[{"type":"tr","content":[{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Aspect","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Policy","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Document content","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Read locally within this session only. Not stored, indexed, or transmitted.","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Personal data for form autofill","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Used only to complete the current form. Not written to any file. Not retained after session.","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Timeline log","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Opt-in only. Confirmed by user before any entry is written. 
Contains no raw document content — only category-level summaries.","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Redacted output files","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Written only to a path the user explicitly confirms.","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Audio transcripts","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Written to a local file the user specifies. Model download on first Whisper use only.","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"No telemetry","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"This skill has no analytics, usage reporting, or network calls beyond what is listed above.","type":"text"}]}]}]}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":2},"content":[{"text":"Step 1 — Identify the Mode","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Explicit intent → go directly to the matching mode","type":"text"}]},{"type":"table","attrs":{"layout":null},"content":[{"type":"tr","content":[{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Mode","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"User intent 
signals","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Typical file types","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Document Categorizer","type":"text","marks":[{"type":"strong"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"\"process this\", \"what is this?\", \"analyze this\", \"help with this\", no clear intent","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Any","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Form Autofill","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"fill, autofill, fill out, complete this form","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"PDF form, image, screenshot","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Contract Analyzer","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"review, summarize, contract, agreement, risks, red flags, NDA, lease","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"PDF, 
text","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Receipt Scanner","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"receipt, invoice, log expense, scan this bill","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Photo, image, PDF","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Bank Statement Analyzer","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"bank statement, transactions, subscriptions, categorize spending","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"PDF, CSV","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Resume / CV Parser","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"parse resume, extract cv, what's on this resume, scan resume","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"PDF, image, text","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"ID & Passport 
Scanner","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"scan id, read passport, extract from id card, scan my passport","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Photo, image, PDF","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Medical Summarizer","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"lab report, blood test, prescription, discharge summary, medical results","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"PDF, image, text","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Legal Redactor","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"redact, remove pii, anonymize, censor sensitive info","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"PDF, text, image","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Meeting Minutes","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"meeting minutes, action items, summarize meeting, transcribe 
meeting","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Text, PDF, image, audio","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Table Extractor","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"extract table, table to csv, get data from pdf, table to json","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"PDF, image, text","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Document Translator","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"translate this, translate to [language], document translation","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Any","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Document Timeline","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"show my timeline, document history, what have I processed, save 
timeline","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"—","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Doc Scan","type":"text","marks":[{"type":"strong"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"scan this photo, make this look scanned, correct perspective, dewarp, clean this photo, digitize this, straighten this","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Photo, image","type":"text"}]}]}]}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Ambiguous intent → Document Categorizer (with consent gate)","type":"text"}]},{"type":"paragraph","content":[{"text":"If the user uploads a file without a clear mode signal, ","type":"text"},{"text":"do not read it yet","type":"text","marks":[{"type":"strong"}]},{"text":". Ask:","type":"text"}]},{"type":"blockquote","content":[{"type":"paragraph","content":[{"text":"\"I can classify this document automatically to suggest the best mode — that requires me to read the first 1–2 pages. 
Or you can choose directly:","type":"text"}]},{"type":"table","attrs":{"layout":null},"content":[{"type":"tr","content":[{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Option","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Best for","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Form Autofill","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Forms with fill-in fields","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Contract Analyzer","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Agreements, NDAs, leases","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Receipt Scanner","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Receipts, invoices","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Bank Statement Analyzer","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Bank/credit card 
statements","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Resume Parser","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"CVs, resumes","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"ID Scanner","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Passports, IDs, driver's licenses","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Medical Summarizer","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Lab reports, prescriptions","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Legal Redactor","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Any document with PII to remove","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Meeting Minutes","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Notes or 
recordings","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Table Extractor","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Documents with data tables","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Translator","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Non-English documents","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Doc Scan","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Document photo needing perspective correction","type":"text"}]}]}]}]},{"type":"paragraph","content":[{"text":"Shall I classify it, or which mode would you like?\"","type":"text"}]}]},{"type":"paragraph","content":[{"text":"Only read the document after the user confirms.","type":"text"}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":2},"content":[{"text":"Step 2 — Read the Document","type":"text"}]},{"type":"paragraph","content":[{"text":"Use the ","type":"text"},{"text":"Read","type":"text","marks":[{"type":"code_inline"}]},{"text":" tool on the uploaded file. For images, read them visually. 
For PDFs over 10 pages, read in page ranges.","type":"text"}]},{"type":"paragraph","content":[{"text":"For audio files (Meeting Minutes mode only):","type":"text","marks":[{"type":"strong"}]},{"text":" confirm before running — this requires ","type":"text"},{"text":"openai-whisper","type":"text","marks":[{"type":"code_inline"}]},{"text":" and downloads a model on first run:","type":"text"}]},{"type":"blockquote","content":[{"type":"paragraph","content":[{"text":"\"Transcribing this audio requires the ","type":"text"},{"text":"openai-whisper","type":"text","marks":[{"type":"code_inline"}]},{"text":" library. On first use it downloads a model file (~140 MB). Is that OK?\"","type":"text"}]}]},{"type":"paragraph","content":[{"text":"If yes:","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"python skills/doc-process/scripts/audio_transcriber.py --file \u003cpath> --output transcript.txt","type":"text"}]},{"type":"paragraph","content":[{"text":"If no: ask if the user can provide a text transcript.","type":"text"}]},{"type":"paragraph","content":[{"text":"For document photos (Doc Scan mode):","type":"text","marks":[{"type":"strong"}]},{"text":" read the image visually first to assess quality and detect the document type before running the scanner script.","type":"text"}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":2},"content":[{"text":"Step 3 — Execute the Mode","type":"text"}]},{"type":"paragraph","content":[{"text":"Load and follow the matching reference file in full:","type":"text"}]},{"type":"table","attrs":{"layout":null},"content":[{"type":"tr","content":[{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Mode","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Reference 
file","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Document Categorizer","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"references/document-categorizer.md","type":"text","marks":[{"type":"code_inline"}]}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Form Autofill","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"references/form-autofill.md","type":"text","marks":[{"type":"code_inline"}]}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Contract Analyzer","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"references/contract-analyzer.md","type":"text","marks":[{"type":"code_inline"}]}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Receipt Scanner","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"references/receipt-scanner.md","type":"text","marks":[{"type":"code_inline"}]}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Bank Statement 
Analyzer","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"references/bank-statement-analyzer.md","type":"text","marks":[{"type":"code_inline"}]}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Resume / CV Parser","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"references/resume-parser.md","type":"text","marks":[{"type":"code_inline"}]}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"ID & Passport Scanner","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"references/id-scanner.md","type":"text","marks":[{"type":"code_inline"}]}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Medical Summarizer","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"references/medical-summarizer.md","type":"text","marks":[{"type":"code_inline"}]}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Legal 
Redactor","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"references/legal-redactor.md","type":"text","marks":[{"type":"code_inline"}]}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Meeting Minutes","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"references/meeting-minutes.md","type":"text","marks":[{"type":"code_inline"}]}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Table Extractor","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"references/table-extractor.md","type":"text","marks":[{"type":"code_inline"}]}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Document Translator","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"references/document-translator.md","type":"text","marks":[{"type":"code_inline"}]}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Document 
Timeline","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"references/document-timeline.md","type":"text","marks":[{"type":"code_inline"}]}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Doc Scan","type":"text","marks":[{"type":"strong"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"references/doc-scan.md","type":"text","marks":[{"type":"code_inline"}]}]}]}]}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":2},"content":[{"text":"Step 4 — Redactor: PII Rule Coverage","type":"text"}]},{"type":"paragraph","content":[{"text":"The ","type":"text"},{"text":"redactor.py","type":"text","marks":[{"type":"code_inline"}]},{"text":" script covers the following PII categories across ","type":"text"},{"text":"50+ rule types","type":"text","marks":[{"type":"strong"}]},{"text":" for global document types (bank statements, contracts, medical records, invoices, share-purchase agreements, government forms, and more).","type":"text"}]},{"type":"paragraph","content":[{"text":"Category 1 — Personal Identifiers","type":"text","marks":[{"type":"strong"}]},{"text":" (standard + light mode)","type":"text"}]},{"type":"table","attrs":{"layout":null},"content":[{"type":"tr","content":[{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Rule","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Examples","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"SSN 
(US)","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"123-45-6789","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"SIN (Canada)","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"123-456-789","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"UK National Insurance Number","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"AB 12 34 56 C","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Australian TFN","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"123 456 789","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Australian Medicare number","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"1234 56789 1","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Indian Aadhaar","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"1234 5678 
9012","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Passport number","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"A12345678","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Driver's license","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"keyword-anchored","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"UK NHS number","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"943 476 5919","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"National / voter ID","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"keyword-anchored","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Vehicle VIN","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"keyword-anchored 17-char code","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"NRIC 
(Singapore)","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"S1234567A","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Medical record (MRN)","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"keyword-anchored","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Indian PAN","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"AABCW6386P","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Email address","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"[email protected]","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Phone number","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"all international formats; date/reference false-positives suppressed","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Street 
address","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"BLK/BLOCK/FLAT/UNIT/APT prefix + number + street name + type (Street, Ave, Rd, Hill, Close, Quay, Park, etc.)","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Unit / apartment number","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"#02-01, Unit 3B, Apt 4C, Flat 12","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"P.O. Box","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"PO Box 1234","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"US ZIP / CA postal","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"10001, M5V 3A8","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"UK postcode","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"SW1A 2AA","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"International 6-digit 
postal","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Singapore 229572, Bangalore 560067","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"IPv4 address","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"192.168.1.1","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"MAC address","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"AA:BB:CC:DD:EE:FF","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Date of birth","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"keyword + numeric/month-name formats","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Age","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"\"Age: 34\"","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Labeled name (50+ field keywords)","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Bill To, Shipper, Attention, 
Buyer, Seller, Patient, Employee, Plaintiff, Trustee, Shareholder, Director, Tenant, Lender, Beneficiary, etc.","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Honorific prefix + name","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Mr./Mrs./Ms./Dr./Prof./Rev./Hon./Mx. + name","type":"text"}]}]}]}]},{"type":"paragraph","content":[{"text":"Category 2 — Financial Data","type":"text","marks":[{"type":"strong"}]},{"text":" (standard + full mode)","type":"text"}]},{"type":"table","attrs":{"layout":null},"content":[{"type":"tr","content":[{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Rule","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Examples","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Credit / debit card number","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"4111 1111 1111 1111","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Card CVV","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"CVV: 123","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Card 
expiry","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"03/26","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Bank account number","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"keyword-anchored","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"IBAN","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"IBAN country-code validated (GB, DE, FR, etc.)","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"ABA / routing number","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"\"Routing No.\" and \"ABA No.\"","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"UK Sort code","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"20-00-00","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Australian 
BSB","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"063-000","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Indian IFSC code","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"HDFC0000001","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"SWIFT / BIC code","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"allows space in code (e.g. CHAS US33)","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Salary / compensation","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"salary, CTC, gross/net pay, take-home, remuneration","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Credit score","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"keyword-anchored","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Loan / mortgage 
amount","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"keyword-anchored","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Tax figures","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"AGI, taxable income, tax paid","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Net worth / total assets","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"keyword-anchored","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Cryptocurrency wallet","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Bitcoin, Ethereum","type":"text"}]}]}]}]},{"type":"paragraph","content":[{"text":"Category 3 — Sensitive / Protected","type":"text","marks":[{"type":"strong"}]},{"text":" (full mode only)","type":"text"}]},{"type":"paragraph","content":[{"text":"HIV/AIDS status, blood type, mental health diagnoses (expanded), reproductive health, substance use history, sexual orientation / gender identity, disability, criminal record, genetic information, immigration status, minor's name, attorney–client privilege, trade secrets.","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Redaction 
modes","type":"text"}]},{"type":"table","attrs":{"layout":null},"content":[{"type":"tr","content":[{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Flag","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Categories","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Use case","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"--mode light","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Cat 1 only","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Sharing docs where financial details can remain","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"--mode standard","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Cat 1 + 2 (default)","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"General privacy protection","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"--mode 
full","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Cat 1 + 2 + 3","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Legal filings, healthcare, immigration, HR","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"--custom REGEX","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Cat 0 + selected mode","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Domain-specific or proprietary terms","type":"text"}]}]}]}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"How PDF redaction works","type":"text"}]},{"type":"ordered_list","attrs":{"order":1,"listStyle":"number"},"content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Word bounding boxes are extracted from the PDF layout engine","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"PII is detected using a single-pass, non-overlapping regex engine","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Matched spans are mapped back to word bounding boxes","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"PyMuPDF redaction annotations (solid black fill) are placed on the exact word rects","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"apply_redactions()","type":"text","marks":[{"type":"code_inline"}]},{"text":" burns the black fills in and removes the 
underlying text data from the content stream — redacted text cannot be copy-pasted or extracted","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"The file is saved incrementally — every non-redacted element (fonts, images, vector graphics, metadata) is left completely untouched","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"The original file is never modified; output is always a separate copy","type":"text"}]}]}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":2},"content":[{"text":"Step 5 — Doc Scan: How It Works","type":"text"}]},{"type":"paragraph","content":[{"text":"The ","type":"text"},{"text":"doc_scanner.py","type":"text","marks":[{"type":"code_inline"}]},{"text":" script converts a document photo into a professional scan in 7 steps:","type":"text"}]},{"type":"ordered_list","attrs":{"order":1,"listStyle":"number"},"content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Multi-strategy edge detection","type":"text","marks":[{"type":"strong"}]},{"text":" — tries three approaches in order: (A) Canny on greyscale; (B) Morphological gradient; (C) Colour/brightness threshold. 
Stops at first success.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Sub-pixel corner refinement","type":"text","marks":[{"type":"strong"}]},{"text":" — ","type":"text"},{"text":"cv2.cornerSubPix","type":"text","marks":[{"type":"code_inline"}]},{"text":" refines the four corner points to sub-pixel accuracy for a more precise warp.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Perspective warp","type":"text","marks":[{"type":"strong"}]},{"text":" — four-point transform using Lanczos interpolation flattens the document to a perfect rectangle.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Shadow removal","type":"text","marks":[{"type":"strong"}]},{"text":" — per-channel background estimation + normalisation removes cast shadows and uneven lighting without affecting text.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Scan-quality enhancement","type":"text","marks":[{"type":"strong"}]},{"text":" — mode-specific: BW = adaptive threshold (block size auto-scaled to resolution) + stroke repair + denoising; Gray = auto-levels + CLAHE + unsharp mask; Color = white-balance + CLAHE + sharpening.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Scanner border","type":"text","marks":[{"type":"strong"}]},{"text":" — an 8 px white border simulates the scanner bed edge.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"DPI-tagged output","type":"text","marks":[{"type":"strong"}]},{"text":" — saved with embedded DPI metadata (default 300 DPI, print quality).","type":"text"}]}]}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"When auto-detection fails","type":"text"}]},{"type":"paragraph","content":[{"text":"If the script reports ","type":"text"},{"text":"\"corners_detected\": 
false","type":"text","marks":[{"type":"code_inline"}]},{"text":":","type":"text"}]},{"type":"ordered_list","attrs":{"order":1,"listStyle":"number"},"content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Offer manual corner hints: ask the user for the approximate positions of the document's four corners","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Use ","type":"text"},{"text":"--no-warp","type":"text","marks":[{"type":"code_inline"}]},{"text":" to at least apply enhancement without perspective correction","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Provide photography tips (see ","type":"text"},{"text":"references/doc-scan.md","type":"text","marks":[{"type":"code_inline"}]},{"text":" Step 8)","type":"text"}]}]}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":2},"content":[{"text":"Step 6 — Document Timeline (Opt-In)","type":"text"}]},{"type":"paragraph","content":[{"text":"Off by default. After completing the first document task in a session, ask once:","type":"text"}]},{"type":"blockquote","content":[{"type":"paragraph","content":[{"text":"\"Would you like me to keep a processing log for this session? It records document type, filename, and a category-level summary (no raw content, no personal data) to ","type":"text"},{"text":"~/.doc-process-timeline.json","type":"text","marks":[{"type":"code_inline"}]},{"text":" on your local machine. Entirely optional — yes or no.\"","type":"text"}]}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Yes","type":"text","marks":[{"type":"strong"}]},{"text":" → confirm \"Timeline logging is on.\" Log the current and all subsequent documents. 
Announce each with \"Logged to your timeline.\"","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"No","type":"text","marks":[{"type":"strong"}]},{"text":" → confirm \"No log will be kept.\" Do not run any timeline script. Do not ask again this session.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"No response / unsure","type":"text","marks":[{"type":"strong"}]},{"text":" → treat as No.","type":"text"}]}]}]},{"type":"paragraph","content":[{"text":"Summary rules (strictly enforced):","type":"text","marks":[{"type":"strong"}]},{"text":" the ","type":"text"},{"text":"--summary","type":"text","marks":[{"type":"code_inline"}]},{"text":" argument must never contain names, ID numbers, dates of birth, addresses, account numbers, card numbers, medical values, or any data that could identify a person. Category-level descriptions only.","type":"text"}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":2},"content":[{"text":"Step 7 — Deliver Output","type":"text"}]},{"type":"paragraph","content":[{"text":"Present output in clean tables with section headers as specified in each reference file. Always end with an action prompt relevant to the mode. 
For Doc Scan, always offer to continue processing the scanned output.","type":"text"}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":2},"content":[{"text":"General Principles","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Never hallucinate field values.","type":"text","marks":[{"type":"strong"}]},{"text":" Unknown values → ","type":"text"},{"text":"[MISSING]","type":"text","marks":[{"type":"code_inline"}]},{"text":" or ","type":"text"},{"text":"[UNREADABLE]","type":"text","marks":[{"type":"code_inline"}]},{"text":".","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Flag risks conservatively","type":"text","marks":[{"type":"strong"}]},{"text":" — when in doubt, include it.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Keep summaries scannable","type":"text","marks":[{"type":"strong"}]},{"text":" with tables and bullets.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Do not echo sensitive data","type":"text","marks":[{"type":"strong"}]},{"text":" beyond what is necessary for the immediate task.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Always include relevant disclaimers","type":"text","marks":[{"type":"strong"}]},{"text":" (medical, legal, privacy) where required by the reference guide.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Timeline is opt-in per session.","type":"text","marks":[{"type":"strong"}]},{"text":" Never log without confirmed consent.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Personal data for form autofill is session-only.","type":"text","marks":[{"type":"strong"}]},{"text":" Never write it to a 
file.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Before running any script with third-party deps","type":"text","marks":[{"type":"strong"}]},{"text":", run ","type":"text"},{"text":"bash skills/doc-process/setup.sh","type":"text","marks":[{"type":"code_inline"}]},{"text":" automatically if deps are not yet installed (see Step 0). No need to ask — the setup script is safe and idempotent.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Categorize before asking","type":"text","marks":[{"type":"strong"}]},{"text":" — but only after confirming the user wants auto-classification.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"For Doc Scan:","type":"text","marks":[{"type":"strong"}]},{"text":" always assess the image visually first; never process non-document images.","type":"text"}]}]}]},{"type":"hr","attrs":{"markup":"---"}}]},"metadata":{"date":"2026-06-05","name":"doc-process","author":"@skillopedia","source":{"stars":2012,"repo_name":"openclaw-master-skills","origin_url":"https://github.com/leoyeai/openclaw-master-skills/blob/HEAD/skills/doc-process/SKILL.md","repo_owner":"leoyeai","body_sha256":"ea3a996ba424ca817396b9b9094bece39179b46c60c96065dd7574ca2313cf1f","cluster_key":"d90d5721ee3aa2f16b32f505b7a00b1263a0e4b765b89e13282bd40a85575df0","clean_bundle":{"format":"clean-skill-bundle-v1","source":"leoyeai/openclaw-master-skills/skills/doc-process/SKILL.md","attachments":[{"id":"8c7cae0e-a327-51c9-bd83-b49f418a886f","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/8c7cae0e-a327-51c9-bd83-b49f418a886f/attachment.json","path":"_meta.json","size":633,"sha256":"faaa0bfc19c9206d802f3c174f6e7bff9d4bb3aedec4eeff1aaf73ef148f6992","contentType":"application/json; 
charset=utf-8"},{"id":"864d2a1a-8f52-5428-9a85-12854515bbd1","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/864d2a1a-8f52-5428-9a85-12854515bbd1/attachment.json","path":"evals/evals.json","size":6046,"sha256":"b619e1d9e47d95ab69d5f8a4381cd1d4407765a281d6a2e23362a4ad6cc64585","contentType":"application/json; charset=utf-8"},{"id":"1ffcc1bc-182b-538f-be1d-9a51527d3569","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/1ffcc1bc-182b-538f-be1d-9a51527d3569/attachment.md","path":"references/bank-statement-analyzer.md","size":11904,"sha256":"2311beb91333ade86cd1f4403ac5bc914449aecaa76f04009621077ad4e9dfe3","contentType":"text/markdown; charset=utf-8"},{"id":"2f62d86d-f79f-5d68-bec5-b95030277d3a","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/2f62d86d-f79f-5d68-bec5-b95030277d3a/attachment.md","path":"references/contract-analyzer.md","size":16287,"sha256":"4a9c3bd833fc8574d457469f6ae186dd2ec255026498ad6f8f5672597876129c","contentType":"text/markdown; charset=utf-8"},{"id":"83c223dd-f498-59a4-bf40-b88e5c5f7d7d","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/83c223dd-f498-59a4-bf40-b88e5c5f7d7d/attachment.md","path":"references/doc-scan.md","size":9861,"sha256":"e983d97b6dafa367dd7977f6ae8d943ebac8502297f2e7fb2115c5cb17c89e69","contentType":"text/markdown; charset=utf-8"},{"id":"4e1fca30-6725-5965-86a5-0f8c86e0b78e","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/4e1fca30-6725-5965-86a5-0f8c86e0b78e/attachment.md","path":"references/document-categorizer.md","size":10858,"sha256":"b5fbf09a425ea4cf38f0398234a82fec2619f91e1eb94247d9e4c64747bef82b","contentType":"text/markdown; charset=utf-8"},{"id":"061211da-7c6a-52b0-860e-3097c925654a","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/061211da-7c6a-52b0-860e-3097c925654a/attachment.md","path":"references/document-timeline.md","size":6216,"sha256":"3dd7379ae2d960c1eebf74af6cb19c1f252d3e9a65f3e1b3f0c969c64c852b6b","contentType":"text/markdown; 
