# Data Masking

## Overview

Data masking replaces real sensitive data with realistic but fake data, preserving format and structure. Essential for:

- **Dev/staging environments**: Use masked production data without exposing real PII
- **Log sanitization**: Prevent PII from appearing in log aggregation systems
- **Analytics**: Analyze behavioral patterns without raw PII
- **Testing**: Realistic test data that won't trigger real consequences

## Masking Techniques

| Technique | How | When to Use |
|-----------|-----|-------------|
| Static masking | Replace data at rest permanently | Dev DB copy |
| Dynamic masking…

, '\\1***\\3') AS email,\n    -- Mask phone: show only last 4\n    '***-***-' || RIGHT(phone, 4) AS phone,\n    -- Mask SSN: show only last 4\n    '***-**-' || RIGHT(ssn, 4) AS ssn,\n    -- Keep non-sensitive fields as-is\n    created_at,\n    status,\n    country\nFROM users;\n\n-- Grant dev team access to masked view only (not base table)\nGRANT SELECT ON users_masked TO dev_team;\nREVOKE SELECT ON users FROM dev_team;\n\n-- Column-level masking for card numbers (PAN): keep the first 6 and last 4 digits\nCREATE OR REPLACE FUNCTION mask_pan(pan TEXT) RETURNS TEXT AS $$\nBEGIN\n    RETURN RPAD(LEFT(pan, 6), LENGTH(pan) - 4, '*') || RIGHT(pan, 4);\nEND;\n$$ LANGUAGE plpgsql IMMUTABLE;\n\n-- Dynamic masking based on the calling role\nCREATE OR REPLACE FUNCTION get_user_data(p_user_id UUID)\nRETURNS TABLE (name TEXT, email TEXT, phone TEXT) AS $$\nBEGIN\n    -- session_user rather than current_user: inside a SECURITY DEFINER function, current_user is the function owner\n    IF session_user = 'admin_role' THEN\n        RETURN QUERY SELECT u.name, u.email, u.phone FROM users u WHERE u.id = p_user_id;\n    ELSE\n        RETURN QUERY SELECT \n            LEFT(u.name, 1) || '***',\n            REGEXP_REPLACE(u.email, '^([^@])([^@]*)(@.+)$', '\\1***\\3'),\n            '***-***-' || RIGHT(u.phone, 4)\n        FROM users u WHERE u.id = p_user_id;\n    END IF;\nEND;\n$$ LANGUAGE plpgsql SECURITY DEFINER;\n```\n\n## Microsoft Presidio: Auto-Detection\n\n```python\n# Presidio automatically detects and masks PII using NLP\nfrom presidio_analyzer import AnalyzerEngine\nfrom presidio_anonymizer import AnonymizerEngine\nfrom presidio_anonymizer.entities import OperatorConfig\n\nanalyzer = AnalyzerEngine()\nanonymizer = AnonymizerEngine()\n\ndef mask_text_presidio(text: str, masking_style: str = \"replace\") -> str:\n    \"\"\"Auto-detect and mask PII using Presidio NLP.\"\"\"\n    results = analyzer.analyze(text=text, language=\"en\")\n\n    if masking_style == \"replace\":\n        # Replace each entity with a short type label, e.g. [EMAIL]\n        operators = {\n            \"DEFAULT\": OperatorConfig(\"replace\", {\"new_value\": \"[REDACTED]\"}),\n            \"EMAIL_ADDRESS\": OperatorConfig(\"replace\", {\"new_value\": \"[EMAIL]\"}),\n            \"PHONE_NUMBER\": OperatorConfig(\"replace\", {\"new_value\": \"[PHONE]\"}),\n            \"PERSON\": OperatorConfig(\"replace\", {\"new_value\": \"[NAME]\"}),\n            \"US_SSN\": OperatorConfig(\"replace\", {\"new_value\": \"[SSN]\"}),\n        }\n    elif masking_style == \"hash\":\n        # Hash for consistent pseudonymization (same input → same output)\n        operators = {\"DEFAULT\": OperatorConfig(\"hash\", {\"hash_type\": \"sha256\"})}\n    else:\n        raise ValueError(f\"Unsupported masking_style: {masking_style!r}\")\n\n    anonymized = anonymizer.anonymize(\n        text=text,\n        analyzer_results=results,\n        operators=operators\n    )\n    return anonymized.text\n\n# Example\ntext = \"Contact John Smith at [email protected] or 555-123-4567\"\nprint(mask_text_presidio(text))\n# → \"Contact [NAME] at [EMAIL] or [PHONE]\"\n```\n\n## Production DB → Dev DB Pipeline\n\n```bash\n#!/bin/bash\n# mask-db-for-dev.sh: safe production → dev data pipeline\n\nset -e\nPROD_DB=\"postgresql://prod-server/app\"\nDEV_DB=\"postgresql://dev-server/app_dev\"\n\necho \"Dumping production schema...\"\npg_dump --schema-only $PROD_DB > schema.sql\n\necho \"Applying schema to 
dev...\"\npsql $DEV_DB \u003c schema.sql\n\necho \"Copying and masking data...\"\npsql $PROD_DB -c \"\\COPY (\n SELECT \n id,\n LEFT(first_name, 1) || 'XXXX' AS first_name,\n 'User' AS last_name,\n 'user_' || id || '@example.com' AS email,\n '555-000-' || LPAD((ROW_NUMBER() OVER())::TEXT, 4, '0') AS phone,\n created_at,\n status\n FROM users\n) TO STDOUT WITH CSV\" | psql $DEV_DB -c \"\\COPY users_masked FROM STDIN WITH CSV\"\n\necho \"Done. Dev database ready with masked data.\"\n```\n\n## Statistical Anonymization (GDPR)\n\n**Anonymization vs Pseudonymization (GDPR Article 4):**\n- **Anonymization**: Irreversible -- data can never be linked to an individual. Falls outside GDPR scope.\n- **Pseudonymization**: Reversible -- data can be re-linked with additional info. Still personal data under GDPR.\n\n**Key techniques for true anonymization:**\n- **k-Anonymity**: Each record is indistinguishable from at least k-1 others on quasi-identifiers (age, ZIP, gender). Generalize values into ranges and suppress groups smaller than k.\n- **l-Diversity**: Each equivalence class has at least l distinct sensitive attribute values, preventing attribute disclosure.\n- **Differential Privacy**: Mathematical privacy guarantee controlled by epsilon -- add calibrated noise to query results. 
Use `diffprivlib` (Python) or Google DP libraries.\n\nk-anonymity alone is often insufficient for GDPR -- combine with l-diversity and/or differential privacy.\n\n## Compliance Checklist\n\n- [ ] PII inventory completed (what data, where it lives)\n- [ ] Log scrubbing middleware deployed in all services\n- [ ] Dev/staging environments use masked data only\n- [ ] Database views/roles restrict raw PII access\n- [ ] API responses mask PII for non-privileged callers\n- [ ] CI pipeline scans for hardcoded PII/secrets\n- [ ] Masked data pipeline documented and tested\n- [ ] Masking solution reviewed annually\n---","attachment_filenames":["_scores.json"],"attachments":[{"filename":"_scores.json","content":"{\n \"version\": \"1.0.0\",\n \"skillHash\": \"sha256:d6b7ec9b15b12bc4544710a6fa1c03872f53e4e3a695b8bdeab9fb33b146dc0a\",\n \"scoredAt\": \"2026-05-13T16:20:02.446Z\",\n \"backend\": \"ollama\",\n \"model\": \"gpt-oss:20b\",\n \"quality\": {\n \"score\": 87,\n \"dimensions\": {\n \"clarity\": \"PASS\",\n \"completeness\": \"WEAK\",\n \"conciseness\": \"PASS\",\n \"actionability\": \"PASS\",\n \"crossPlatform\": \"WEAK\",\n \"examples\": \"PASS\"\n },\n \"issues\": [\n {\n \"severity\": \"MEDIUM\",\n \"category\": \"completeness\",\n \"detail\": \"The skill does not address error handling or edge cases in the provided examples.\"\n },\n {\n \"severity\": \"MEDIUM\",\n \"category\": \"crossPlatform\",\n \"detail\": \"The skill assumes specific runtime environments (Python 3.9+, Node.js 18+) and libraries, limiting cross-platform applicability.\"\n }\n ]\n },\n \"security\": {\n \"verdict\": \"SAFE\",\n \"issues\": []\n },\n \"impact\": {\n \"multiplier\": 1,\n \"baselineAvg\": 98,\n \"treatmentAvg\": 98,\n \"scenarios\": [\n {\n \"name\": \"mask-emails-in-csv-with-faker\",\n \"baseline\": 100,\n \"treatment\": 100,\n \"rationale\": \"Both responses fully satisfy the rubric, correctly importing csv and faker, reading the CSV, replacing the email column with faker.email(), 
preserving other columns, and writing to a new CSV.\"\n },\n {\n \"name\": \"presidio-auto-mask-pii\",\n \"baseline\": 95,\n \"treatment\": 95,\n \"rationale\": \"Both responses fully implement the required Presidio pipeline, import the correct engines, analyze and anonymize with replace operators, and return the masked text; they differ only in style and placeholder choices.\"\n }\n ]\n }\n}\n","content_type":"application/json; charset=utf-8","language":"json","size":1731,"content_sha256":"77ff54a33611f1137b8a38753c316a33543863fe512c74e836c386bcfe81db77"}],"content_json":{"type":"doc","content":[{"type":"heading","attrs":{"level":1},"content":[{"text":"Data Masking","type":"text"}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Overview","type":"text"}]},{"type":"paragraph","content":[{"text":"Data masking replaces real sensitive data with realistic but fake data, preserving format and structure. Essential for:","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Dev/staging environments","type":"text","marks":[{"type":"strong"}]},{"text":": Use masked production data without exposing real PII","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Log sanitization","type":"text","marks":[{"type":"strong"}]},{"text":": Prevent PII from appearing in log aggregation systems","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Analytics","type":"text","marks":[{"type":"strong"}]},{"text":": Analyze behavioral patterns without raw PII","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Testing","type":"text","marks":[{"type":"strong"}]},{"text":": Realistic test data that won't trigger real consequences","type":"text"}]}]}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Masking 
Techniques","type":"text"}]},{"type":"table","attrs":{"layout":null},"content":[{"type":"tr","content":[{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Technique","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"How","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"When to Use","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Static masking","type":"text","marks":[{"type":"strong"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Replace data at rest permanently","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Dev DB copy","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Dynamic masking","type":"text","marks":[{"type":"strong"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Mask on-read, original preserved","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Role-based 
views","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Tokenization","type":"text","marks":[{"type":"strong"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Replace with token that maps to real value","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Payment cards","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Format-preserving","type":"text","marks":[{"type":"strong"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Keep format, change values (e.g., real-looking SSN)","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Testing","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Redaction","type":"text","marks":[{"type":"strong"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Replace with placeholder 
(","type":"text"},{"text":"[REDACTED]","type":"text","marks":[{"type":"code_inline"}]},{"text":")","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Logs","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Generalization","type":"text","marks":[{"type":"strong"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Replace specific value with range (age 34 → 30-40)","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Analytics","type":"text"}]}]}]}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"PII Pattern Library","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"python"},"content":[{"text":"import re\n\nPII_PATTERNS = {\n \"email\": r'\\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Z|a-z]{2,}\\b',\n \"phone_us\": r'\\b(?:\\+1[-.]?)?\\(?[0-9]{3}\\)?[-.\\s]?[0-9]{3}[-.\\s]?[0-9]{4}\\b',\n \"ssn\": r'\\b(?!000|666|9\\d{2})\\d{3}-(?!00)\\d{2}-(?!0000)\\d{4}\\b',\n \"credit_card\": r'\\b(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|3[47][0-9]{13}|6(?:011|5[0-9]{2})[0-9]{12})\\b',\n \"ip_address\": r'\\b(?:(?:25[0-5]|2[0-4]\\d|[01]?\\d\\d?)\\.){3}(?:25[0-5]|2[0-4]\\d|[01]?\\d\\d?)\\b',\n \"date_of_birth\": r'\\b(?:0[1-9]|1[0-2])[\\/\\-](?:0[1-9]|[12]\\d|3[01])[\\/\\-](?:19|20)\\d{2}\\b',\n \"passport\": r'\\b[A-Z]{1,2}[0-9]{6,9}\\b',\n \"zip_code\": r'\\b\\d{5}(?:-\\d{4})?\\b',\n}","type":"text"}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Email and Credit Card Maskers","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"python"},"content":[{"text":"import random\nimport string\nfrom faker import Faker\n\nfake = 
Faker()\n\ndef mask_email(email: str) -> str:\n \"\"\"Mask email preserving domain structure.\"\"\"\n local, domain = email.split('@')\n masked_local = local[0] + '*' * (len(local) - 2) + local[-1] if len(local) > 2 else '***'\n return f\"{masked_local}@{domain}\"\n\ndef mask_email_fake(email: str) -> str:\n \"\"\"Replace email with realistic fake.\"\"\"\n return fake.email()\n\ndef mask_credit_card(card_number: str) -> str:\n \"\"\"Mask credit card — show only last 4 digits.\"\"\"\n cleaned = re.sub(r'[\\s-]', '', card_number)\n return '*' * (len(cleaned) - 4) + cleaned[-4:]\n\ndef mask_ssn(ssn: str) -> str:\n \"\"\"Mask SSN — show only last 4.\"\"\"\n cleaned = ssn.replace('-', '').replace(' ', '')\n return f\"***-**-{cleaned[-4:]}\"\n\ndef mask_phone(phone: str) -> str:\n \"\"\"Mask phone — show only last 4 digits.\"\"\"\n digits = re.sub(r'\\D', '', phone)\n return f\"***-***-{digits[-4:]}\"\n\ndef generate_fake_pii() -> dict:\n \"\"\"Generate a complete set of realistic fake PII for testing.\"\"\"\n return {\n \"name\": fake.name(),\n \"email\": fake.email(),\n \"phone\": fake.phone_number(),\n \"address\": fake.address(),\n \"ssn\": fake.ssn(),\n \"dob\": fake.date_of_birth(minimum_age=18, maximum_age=90).isoformat(),\n \"credit_card\": fake.credit_card_number(card_type='visa'),\n \"company\": fake.company(),\n }","type":"text"}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Log Sanitizer Middleware","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"python"},"content":[{"text":"# Express.js log scrubbing middleware\nconst PII_PATTERNS = {\n email: /\\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Z|a-z]{2,}\\b/g,\n creditCard: /\\b(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|3[47][0-9]{13})\\b/g,\n ssn: /\\b(?!000|666|9\\d{2})\\d{3}-(?!00)\\d{2}-(?!0000)\\d{4}\\b/g,\n phone: /\\b(?:\\+1[-.]?)?\\(?[0-9]{3}\\)?[-.\\s]?[0-9]{3}[-.\\s]?[0-9]{4}\\b/g,\n password: /\"password\"\\s*:\\s*\"[^\"]*\"/g,\n token: 
/\"(?:token|api_key|secret|authorization)\"\\s*:\\s*\"[^\"]*\"/gi,\n};\n\nfunction sanitizeLog(data) {\n let sanitized = typeof data === 'string' ? data : JSON.stringify(data);\n \n sanitized = sanitized.replace(PII_PATTERNS.email, '[EMAIL]');\n sanitized = sanitized.replace(PII_PATTERNS.creditCard, '[CREDIT_CARD]');\n sanitized = sanitized.replace(PII_PATTERNS.ssn, '[SSN]');\n sanitized = sanitized.replace(PII_PATTERNS.phone, '[PHONE]');\n sanitized = sanitized.replace(PII_PATTERNS.password, '\"password\":\"[REDACTED]\"');\n sanitized = sanitized.replace(PII_PATTERNS.token, (match) => {\n const key = match.split(':')[0];\n return `${key}:\"[REDACTED]\"`;\n });\n \n return sanitized;\n}\n\n// Wrap Winston logger to auto-sanitize\nconst winston = require('winston');\nconst logger = winston.createLogger({\n transports: [new winston.transports.Console()],\n format: winston.format.combine(\n winston.format.printf(({ level, message, ...meta }) => {\n return JSON.stringify({\n level,\n message: sanitizeLog(message),\n ...JSON.parse(sanitizeLog(JSON.stringify(meta)))\n });\n })\n )\n});","type":"text"}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Database Masking (PostgreSQL)","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"sql"},"content":[{"text":"-- Create masked view for dev access\nCREATE OR REPLACE VIEW users_masked AS\nSELECT\n id,\n -- Mask name: keep first letter + *** \n LEFT(first_name, 1) || '***' AS first_name,\n LEFT(last_name, 1) || '***' AS last_name,\n -- Mask email: preserve domain\n REGEXP_REPLACE(email, '^([^@])([^@]*)(@.+)

$', '\\1***\\3') AS email,\n    -- Mask phone: show only last 4\n    '***-***-' || RIGHT(phone, 4) AS phone,\n    -- Mask SSN: show only last 4\n    '***-**-' || RIGHT(ssn, 4) AS ssn,\n    -- Keep non-sensitive fields as-is\n    created_at,\n    status,\n    country\nFROM users;\n\n-- Grant dev team access to masked view only (not base table)\nGRANT SELECT ON users_masked TO dev_team;\nREVOKE SELECT ON users FROM dev_team;\n\n-- Column-level masking for card numbers (PAN): keep the first 6 and last 4 digits\nCREATE OR REPLACE FUNCTION mask_pan(pan TEXT) RETURNS TEXT AS $$\nBEGIN\n    RETURN RPAD(LEFT(pan, 6), LENGTH(pan) - 4, '*') || RIGHT(pan, 4);\nEND;\n$$ LANGUAGE plpgsql IMMUTABLE;\n\n-- Dynamic masking based on the calling role\nCREATE OR REPLACE FUNCTION get_user_data(p_user_id UUID)\nRETURNS TABLE (name TEXT, email TEXT, phone TEXT) AS $$\nBEGIN\n    -- session_user rather than current_user: inside a SECURITY DEFINER function, current_user is the function owner\n    IF session_user = 'admin_role' THEN\n        RETURN QUERY SELECT u.name, u.email, u.phone FROM users u WHERE u.id = p_user_id;\n    ELSE\n        RETURN QUERY SELECT \n            LEFT(u.name, 1) || '***',\n            REGEXP_REPLACE(u.email, '^([^@])([^@]*)(@.+)$', '\\1***\\3'),\n            '***-***-' || RIGHT(u.phone, 4)\n        FROM users u WHERE u.id = p_user_id;\n    END IF;\nEND;\n$$ LANGUAGE plpgsql SECURITY DEFINER;","type":"text"}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Microsoft Presidio: Auto-Detection","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"python"},"content":[{"text":"# Presidio automatically detects and masks PII using NLP\nfrom presidio_analyzer import AnalyzerEngine\nfrom presidio_anonymizer import AnonymizerEngine\nfrom presidio_anonymizer.entities import OperatorConfig\n\nanalyzer = AnalyzerEngine()\nanonymizer = AnonymizerEngine()\n\ndef mask_text_presidio(text: str, masking_style: str = \"replace\") -> str:\n    \"\"\"Auto-detect and mask PII using Presidio NLP.\"\"\"\n    results = analyzer.analyze(text=text, language=\"en\")\n\n    if masking_style == \"replace\":\n        # Replace each entity with a short type label, e.g. [EMAIL]\n        operators = {\n            \"DEFAULT\": OperatorConfig(\"replace\", {\"new_value\": \"[REDACTED]\"}),\n            \"EMAIL_ADDRESS\": OperatorConfig(\"replace\", {\"new_value\": \"[EMAIL]\"}),\n            \"PHONE_NUMBER\": OperatorConfig(\"replace\", {\"new_value\": \"[PHONE]\"}),\n            \"PERSON\": OperatorConfig(\"replace\", {\"new_value\": \"[NAME]\"}),\n            \"US_SSN\": OperatorConfig(\"replace\", {\"new_value\": \"[SSN]\"}),\n        }\n    elif masking_style == \"hash\":\n        # Hash for consistent pseudonymization (same input → same output)\n        operators = {\"DEFAULT\": OperatorConfig(\"hash\", {\"hash_type\": \"sha256\"})}\n    else:\n        raise ValueError(f\"Unsupported masking_style: {masking_style!r}\")\n\n    anonymized = anonymizer.anonymize(\n        text=text,\n        analyzer_results=results,\n        operators=operators\n    )\n    return anonymized.text\n\n# Example\ntext = \"Contact John Smith at [email protected] or 555-123-4567\"\nprint(mask_text_presidio(text))\n# → \"Contact [NAME] at [EMAIL] or [PHONE]\"","type":"text"}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Production DB → Dev DB 
Pipeline","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"#!/bin/bash\n# mask-db-for-dev.sh — Safe production → dev data pipeline\n\nset -e\nPROD_DB=\"postgresql://prod-server/app\"\nDEV_DB=\"postgresql://dev-server/app_dev\"\n\necho \"Dumping production schema...\"\npg_dump --schema-only $PROD_DB > schema.sql\n\necho \"Applying schema to dev...\"\npsql $DEV_DB \u003c schema.sql\n\necho \"Copying and masking data...\"\npsql $PROD_DB -c \"\\COPY (\n SELECT \n id,\n LEFT(first_name, 1) || 'XXXX' AS first_name,\n 'User' AS last_name,\n 'user_' || id || '@example.com' AS email,\n '555-000-' || LPAD((ROW_NUMBER() OVER())::TEXT, 4, '0') AS phone,\n created_at,\n status\n FROM users\n) TO STDOUT WITH CSV\" | psql $DEV_DB -c \"\\COPY users_masked FROM STDIN WITH CSV\"\n\necho \"Done. Dev database ready with masked data.\"","type":"text"}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Statistical Anonymization (GDPR)","type":"text"}]},{"type":"paragraph","content":[{"text":"Anonymization vs Pseudonymization (GDPR Article 4):","type":"text","marks":[{"type":"strong"}]}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Anonymization","type":"text","marks":[{"type":"strong"}]},{"text":": Irreversible -- data can never be linked to an individual. Falls outside GDPR scope.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Pseudonymization","type":"text","marks":[{"type":"strong"}]},{"text":": Reversible -- data can be re-linked with additional info. 
Still personal data under GDPR.","type":"text"}]}]}]},{"type":"paragraph","content":[{"text":"Key techniques for true anonymization:","type":"text","marks":[{"type":"strong"}]}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"k-Anonymity","type":"text","marks":[{"type":"strong"}]},{"text":": Each record is indistinguishable from at least k-1 others on quasi-identifiers (age, ZIP, gender). Generalize values into ranges and suppress groups smaller than k.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"l-Diversity","type":"text","marks":[{"type":"strong"}]},{"text":": Each equivalence class has at least l distinct sensitive attribute values, preventing attribute disclosure.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Differential Privacy","type":"text","marks":[{"type":"strong"}]},{"text":": Mathematical privacy guarantee controlled by epsilon -- add calibrated noise to query results. 
Use ","type":"text"},{"text":"diffprivlib","type":"text","marks":[{"type":"code_inline"}]},{"text":" (Python) or Google DP libraries.","type":"text"}]}]}]},{"type":"paragraph","content":[{"text":"k-anonymity alone is often insufficient for GDPR -- combine with l-diversity and/or differential privacy.","type":"text"}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Compliance Checklist","type":"text"}]},{"type":"checkbox_list","attrs":{"id":null},"content":[{"type":"checkbox_item","attrs":{"checked":false},"content":[{"type":"paragraph","content":[{"text":"PII inventory completed (what data, where it lives)","type":"text"}]}]},{"type":"checkbox_item","attrs":{"checked":false},"content":[{"type":"paragraph","content":[{"text":"Log scrubbing middleware deployed in all services","type":"text"}]}]},{"type":"checkbox_item","attrs":{"checked":false},"content":[{"type":"paragraph","content":[{"text":"Dev/staging environments use masked data only","type":"text"}]}]},{"type":"checkbox_item","attrs":{"checked":false},"content":[{"type":"paragraph","content":[{"text":"Database views/roles restrict raw PII access","type":"text"}]}]},{"type":"checkbox_item","attrs":{"checked":false},"content":[{"type":"paragraph","content":[{"text":"API responses mask PII for non-privileged callers","type":"text"}]}]},{"type":"checkbox_item","attrs":{"checked":false},"content":[{"type":"paragraph","content":[{"text":"CI pipeline scans for hardcoded PII/secrets","type":"text"}]}]},{"type":"checkbox_item","attrs":{"checked":false},"content":[{"type":"paragraph","content":[{"text":"Masked data pipeline documented and tested","type":"text"}]}]},{"type":"checkbox_item","attrs":{"checked":false},"content":[{"type":"paragraph","content":[{"text":"Masking solution reviewed 
annually","type":"text"}]}]}]},{"type":"hr","attrs":{"markup":"---"}}]},"metadata":{"date":"2026-06-05","name":"data-masking","author":"@skillopedia","source":{"stars":62,"repo_name":"skills","origin_url":"https://github.com/terminalskills/skills/blob/HEAD/skills/data-masking/SKILL.md","repo_owner":"terminalskills","body_sha256":"764523341774e39873ec1e0442fac2d9a19587fd71e83488901e4265e9e3e024","cluster_key":"9e9abfdcd2917df96f38d773072e39beeffbe316b150db4372999a1855d60ccc","clean_bundle":{"format":"clean-skill-bundle-v1","source":"terminalskills/skills/skills/data-masking/SKILL.md","attachments":[{"id":"b73ea828-50c3-5063-8829-bb69a89edadb","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/b73ea828-50c3-5063-8829-bb69a89edadb/attachment.json","path":"_scores.json","size":1731,"sha256":"77ff54a33611f1137b8a38753c316a33543863fe512c74e836c386bcfe81db77","contentType":"application/json; charset=utf-8"}],"bundle_sha256":"dfa0d3f462b1b14fd6a42f1a70a417b8754df29add3c173b0588691c423499db","attachment_count":1,"text_attachments":1,"attachment_storage":"skillopedia-attachments-v1","binary_attachments":0,"excluded_attachments":[]},"cluster_size":1,"skill_md_path":"skills/data-masking/SKILL.md","import_metadata":{"date":"2026-06-05","author":"@skillopedia","version":"v1","category":"data-analytics","category_label":"Data"},"exact_dupes_collapsed_into_this":0},"license":"Apache-2.0","version":"v1","category":"data-analytics","metadata":{"tags":["data-masking","pii","privacy","anonymization","redaction"],"agents":["claude-code","openai-codex","gemini-cli","cursor"],"author":"terminal-skills","version":"1.0.0","category":"development","use-cases":["Mask production PII before copying to dev/staging environment","Scrub sensitive data from application logs","Redact PII from API responses for analytics"]},"import_tag":"clean-skills-v1","description":"Mask, redact, and anonymize sensitive data (PII, PCI, PHI) in databases, logs, and APIs. 
Use when protecting PII in dev/staging environments, redacting sensitive data from logs, anonymizing data for analytics, or applying k-anonymity and differential privacy for GDPR-compliant data sharing.","compatibility":"Python 3.9+, Node.js 18+. Libraries: faker, presidio, anonymize-it."}},"renderedAt":1782979954349}
