Data Validation Engine Overview Implement and enforce data integrity rules at the database level using CHECK constraints, triggers, foreign keys, and custom validation functions across PostgreSQL and MySQL. Prerequisites - Database credentials with ALTER TABLE and CREATE FUNCTION permissions - or CLI for executing validation queries - Current schema documentation or access to for column specifications - Business rules document describing valid data ranges, formats, and relationships - Backup of production data before applying new constraints (constraints may reject existing invalid data) Inst…

`\n - Phone format: `SELECT phone FROM contacts WHERE phone !~ '^\\+?[1-9]\\d{6,14}

Data Validation Engine Overview Implement and enforce data integrity rules at the database level using CHECK constraints, triggers, foreign keys, and custom validation functions across PostgreSQL and MySQL. Prerequisites - Database credentials with ALTER TABLE and CREATE FUNCTION permissions - or CLI for executing validation queries - Current schema documentation or access to for column specifications - Business rules document describing valid data ranges, formats, and relationships - Backup of production data before applying new constraints (constraints may reject existing invalid data) Inst…

`\n - URL format: `SELECT url FROM links WHERE url !~ '^https?://.+'`\n - Date ranges: `SELECT * FROM events WHERE start_date > end_date`\n\n4. Check numeric range violations: `SELECT * FROM products WHERE price \u003c 0 OR price > 999999.99` and `SELECT * FROM users WHERE age \u003c 0 OR age > 150`. Map each column to its valid range based on business rules.\n\n5. Identify duplicate records that violate intended uniqueness: `SELECT email, COUNT(*) FROM users GROUP BY email HAVING COUNT(*) > 1`. Determine which duplicate to keep (most recent, most complete) and plan deduplication.\n\n6. Generate CHECK constraints for validated rules:\n - `ALTER TABLE products ADD CONSTRAINT chk_price_positive CHECK (price >= 0)`\n - `ALTER TABLE users ADD CONSTRAINT chk_email_format CHECK (email ~ '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}

Data Validation Engine Overview Implement and enforce data integrity rules at the database level using CHECK constraints, triggers, foreign keys, and custom validation functions across PostgreSQL and MySQL. Prerequisites - Database credentials with ALTER TABLE and CREATE FUNCTION permissions - or CLI for executing validation queries - Current schema documentation or access to for column specifications - Business rules document describing valid data ranges, formats, and relationships - Backup of production data before applying new constraints (constraints may reject existing invalid data) Inst…

)`\n - `ALTER TABLE events ADD CONSTRAINT chk_date_order CHECK (start_date \u003c= end_date)`\n - `ALTER TABLE orders ADD CONSTRAINT chk_status_valid CHECK (status IN ('pending', 'processing', 'shipped', 'delivered', 'cancelled'))`\n\n7. Create foreign key constraints with appropriate cascade behavior:\n - `ALTER TABLE orders ADD CONSTRAINT fk_orders_customer FOREIGN KEY (customer_id) REFERENCES customers(id) ON DELETE RESTRICT`\n - Use `ON DELETE CASCADE` for dependent data (order_items when order is deleted)\n - Use `ON DELETE SET NULL` for optional relationships (assigned_to when user is deactivated)\n\n8. Implement complex business rule validation using database triggers when CHECK constraints are insufficient:\n - Trigger that prevents order total from exceeding customer credit limit\n - Trigger that enforces at least one admin user per organization\n - Trigger that validates JSON schema for JSONB columns\n\n9. Apply constraints in a safe two-phase approach:\n - Phase 1: Run validation queries to find all violations. Generate data cleanup scripts. Execute cleanup.\n - Phase 2: Apply constraints with `NOT VALID` option (PostgreSQL): `ALTER TABLE users ADD CONSTRAINT chk_email CHECK (email ~ '...') NOT VALID` then `ALTER TABLE users VALIDATE CONSTRAINT chk_email` (validates existing data without blocking writes).\n\n10. Generate a data quality report summarizing: total records per table, violation counts by constraint type, cleanup actions taken, constraints applied, and remaining data quality issues requiring manual review.\n\n## Output\n\n- **Data quality audit report** with violation counts, examples, and severity ratings\n- **Data cleanup scripts** (SQL) to fix violations before constraint application\n- **Constraint DDL scripts** with CHECK, FOREIGN KEY, NOT NULL, and UNIQUE constraints\n- **Validation triggers** for complex business rules beyond simple constraints\n- **Ongoing validation queries** for periodic data quality monitoring\n\n## Error Handling\n\n| Error | Cause | Solution |\n|-------|-------|---------|\n| `check constraint violated by existing row` | Existing data fails the new constraint | Run the validation query first to find violations; clean up data; use `NOT VALID` option to add constraint without checking existing data, then validate separately |\n| `cannot add foreign key: referenced row not found` | Orphaned child records reference non-existent parent | Clean up orphaned records first with DELETE or UPDATE to valid parent; or insert missing parent records |\n| `column cannot be made NOT NULL: contains NULL values` | Existing rows have NULL in the target column | Backfill NULLs with `UPDATE table SET column = default_value WHERE column IS NULL` before adding NOT NULL |\n| Trigger function causes performance regression | Complex validation logic executes on every INSERT/UPDATE | Optimize trigger function; use WHEN clause to limit trigger firing; consider CHECK constraints instead of triggers for simple rules |\n| Circular foreign key prevents constraint creation | Tables reference each other, preventing creation order | Use `ALTER TABLE ADD CONSTRAINT` after both tables exist; or use `DEFERRABLE INITIALLY DEFERRED` constraints |\n\n## Examples\n\n**Auditing a legacy database with 50,000 invalid email addresses**: Validation query reveals 50,000 of 2M user records have invalid email formats (missing @, double dots, spaces). A cleanup script normalizes common issues (trim whitespace, lowercase) and flags 3,000 unfixable records for manual review. After cleanup, a CHECK constraint with regex validation is applied.\n\n**Enforcing referential integrity on a database without foreign keys**: An application relied on application-level FK enforcement, resulting in 12,000 orphaned order_items, 800 orphaned payments, and 200 orphaned reviews. Cleanup scripts archive orphaned records to backup tables, then foreign key constraints with `ON DELETE CASCADE` are added. A nightly validation job monitors for new orphans.\n\n**Implementing business rules for a financial application**: Constraints enforce: account balance cannot be negative (`CHECK (balance >= 0)`), transfer amount must be positive (`CHECK (amount > 0)`), transaction date cannot be in the future (`CHECK (transaction_date \u003c= CURRENT_DATE)`), and a trigger prevents transfers between accounts owned by different customers unless explicitly authorized.\n\n## Resources\n\n- PostgreSQL CHECK constraints: https://www.postgresql.org/docs/current/ddl-constraints.html\n- PostgreSQL triggers: https://www.postgresql.org/docs/current/triggers.html\n- MySQL CHECK constraints (8.0.16+): https://dev.mysql.com/doc/refman/8.0/en/create-table-check-constraints.html\n- Data validation patterns: https://www.postgresql.org/docs/current/ddl-constraints.html#DDL-CONSTRAINTS-CHECK-CONSTRAINTS\n- NOT VALID constraint option: https://www.postgresql.org/docs/current/sql-altertable.html\n---","attachment_filenames":["assets/README.md","references/README.md","scripts/configure_validation_rules.py","scripts/generate_validation_report.py","scripts/README.md"],"attachments":[{"filename":"assets/README.md","content":"# Assets\n\nBundled resources for data-validation-engine skill\n\n- [ ] validation_report_template.html: HTML template for generating visually appealing and informative data validation reports.\n","content_type":"text/markdown; charset=utf-8","language":"markdown","size":190,"content_sha256":"8e03eab6255fe7ae2637f81a943ab4461c9465f6f7d5ad713a90516a52190496"},{"filename":"references/README.md","content":"# References\n\nBundled resources for data-validation-engine skill\n","content_type":"text/markdown; charset=utf-8","language":"markdown","size":65,"content_sha256":"3aada9fbb2db449b37b7bd36644626be76b70fe1f508c0dcb40e6ae6bef1619f"},{"filename":"scripts/configure_validation_rules.py","content":"#!/usr/bin/env python3\n\"\"\"\nInteractively configure data validation rules for a database table.\n\nThis script allows users to define and customize validation rules for database tables,\nwhich are then saved in a configuration file for use by other validation scripts.\n\"\"\"\n\nimport argparse\nimport json\nimport sys\nfrom datetime import datetime\nfrom typing import Dict, Any\n\n\nclass ValidationRuleConfigurator:\n \"\"\"Interactively configure validation rules.\"\"\"\n\n def __init__(self):\n \"\"\"Initialize configurator.\"\"\"\n self.rules = []\n self.table_name = \"\"\n self.database = \"\"\n\n def add_not_null_rule(self, column: str):\n \"\"\"\n Add a NOT NULL validation rule.\n\n Args:\n column: Column name\n \"\"\"\n self.rules.append(\n {\"rule\": \"not_null\", \"column\": column, \"description\": f\"Column {column} must not contain NULL values\"}\n )\n\n def add_unique_rule(self, column: str):\n \"\"\"\n Add a UNIQUE validation rule.\n\n Args:\n column: Column name\n \"\"\"\n self.rules.append(\n {\"rule\": \"unique\", \"column\": column, \"description\": f\"Column {column} must contain unique values\"}\n )\n\n def add_range_rule(self, column: str, min_value: float, max_value: float):\n \"\"\"\n Add a RANGE validation rule.\n\n Args:\n column: Column name\n min_value: Minimum allowed value\n max_value: Maximum allowed value\n \"\"\"\n self.rules.append(\n {\n \"rule\": \"range\",\n \"column\": column,\n \"min\": min_value,\n \"max\": max_value,\n \"description\": f\"Column {column} values must be between {min_value} and {max_value}\",\n }\n )\n\n def add_pattern_rule(self, column: str, pattern: str):\n \"\"\"\n Add a PATTERN (regex) validation rule.\n\n Args:\n column: Column name\n pattern: Regular expression pattern\n \"\"\"\n self.rules.append(\n {\n \"rule\": \"pattern\",\n \"column\": column,\n \"pattern\": pattern,\n \"description\": f\"Column {column} values must match pattern: {pattern}\",\n }\n )\n\n def add_custom_rule(self, column: str, query: str):\n \"\"\"\n Add a custom SQL validation rule.\n\n Args:\n column: Column name\n query: Custom SQL query\n \"\"\"\n self.rules.append(\n {\"rule\": \"custom\", \"column\": column, \"query\": query, \"description\": f\"Custom validation on {column}\"}\n )\n\n def remove_rule(self, index: int) -> bool:\n \"\"\"\n Remove a rule by index.\n\n Args:\n index: Rule index\n\n Returns:\n True if successful, False otherwise\n \"\"\"\n if 0 \u003c= index \u003c len(self.rules):\n del self.rules[index]\n return True\n return False\n\n def get_config_dict(self) -> Dict[str, Any]:\n \"\"\"\n Get configuration as dictionary.\n\n Returns:\n Configuration dictionary\n \"\"\"\n return {\n \"table\": self.table_name,\n \"database\": self.database,\n \"created_at\": datetime.now().isoformat(),\n \"validations\": self.rules,\n }\n\n def load_config(self, filepath: str) -> bool:\n \"\"\"\n Load configuration from JSON file.\n\n Args:\n filepath: Path to JSON file\n\n Returns:\n True if successful, False otherwise\n \"\"\"\n try:\n with open(filepath, \"r\") as f:\n config = json.load(f)\n\n self.table_name = config.get(\"table\", \"\")\n self.database = config.get(\"database\", \"\")\n self.rules = config.get(\"validations\", [])\n\n return True\n except (FileNotFoundError, json.JSONDecodeError) as e:\n print(f\"Error loading config: {e}\", file=sys.stderr)\n return False\n\n def save_config(self, filepath: str) -> bool:\n \"\"\"\n Save configuration to JSON file.\n\n Args:\n filepath: Path to save JSON file\n\n Returns:\n True if successful, False otherwise\n \"\"\"\n try:\n with open(filepath, \"w\") as f:\n json.dump(self.get_config_dict(), f, indent=2)\n return True\n except Exception as e:\n print(f\"Error saving config: {e}\", file=sys.stderr)\n return False\n\n\ndef interactive_mode(configurator: ValidationRuleConfigurator):\n \"\"\"\n Run interactive configuration mode.\n\n Args:\n configurator: ValidationRuleConfigurator instance\n \"\"\"\n print(\"\\n\" + \"=\" * 60)\n print(\"Data Validation Rule Configurator\")\n print(\"=\" * 60 + \"\\n\")\n\n # Get table and database info\n configurator.table_name = input(\"Enter table name: \").strip()\n if not configurator.table_name:\n print(\"Error: Table name is required\")\n sys.exit(1)\n\n configurator.database = input(\"Enter database name (optional): \").strip()\n\n print(\"\\nConfigure validation rules for this table.\")\n print(\"Enter 'help' for rule descriptions, 'done' when finished.\\n\")\n\n while True:\n print(\"\\nAvailable rules:\")\n print(\" 1. not-null - Column cannot contain NULL values\")\n print(\" 2. unique - Column values must be unique\")\n print(\" 3. range - Column values must be within min/max\")\n print(\" 4. pattern - Column values must match regex pattern\")\n print(\" 5. custom - Custom SQL validation query\")\n print(\" 6. list - Show current rules\")\n print(\" 7. remove - Remove a rule\")\n print(\" 8. done - Finish configuration\")\n\n choice = input(\"\\nEnter rule type (1-8): \").strip().lower()\n\n if choice in [\"done\", \"8\"]:\n break\n elif choice == \"help\":\n print_help()\n elif choice in [\"1\", \"not-null\"]:\n column = input(\"Enter column name: \").strip()\n if column:\n configurator.add_not_null_rule(column)\n print(f\"✓ Added NOT NULL rule for {column}\")\n elif choice in [\"2\", \"unique\"]:\n column = input(\"Enter column name: \").strip()\n if column:\n configurator.add_unique_rule(column)\n print(f\"✓ Added UNIQUE rule for {column}\")\n elif choice in [\"3\", \"range\"]:\n column = input(\"Enter column name: \").strip()\n try:\n min_val = float(input(\"Enter minimum value: \"))\n max_val = float(input(\"Enter maximum value: \"))\n configurator.add_range_rule(column, min_val, max_val)\n print(f\"✓ Added RANGE rule for {column} [{min_val}, {max_val}]\")\n except ValueError:\n print(\"Error: Invalid numeric values\")\n elif choice in [\"4\", \"pattern\"]:\n column = input(\"Enter column name: \").strip()\n pattern = input(\"Enter regex pattern: \").strip()\n if column and pattern:\n configurator.add_pattern_rule(column, pattern)\n print(f\"✓ Added PATTERN rule for {column}\")\n elif choice in [\"5\", \"custom\"]:\n column = input(\"Enter column name: \").strip()\n query = input(\"Enter SQL query: \").strip()\n if column and query:\n configurator.add_custom_rule(column, query)\n print(f\"✓ Added CUSTOM rule for {column}\")\n elif choice in [\"6\", \"list\"]:\n list_rules(configurator)\n elif choice in [\"7\", \"remove\"]:\n list_rules(configurator)\n try:\n idx = int(input(\"Enter rule number to remove: \")) - 1\n if configurator.remove_rule(idx):\n print(f\"✓ Removed rule {idx + 1}\")\n else:\n print(\"Error: Invalid rule number\")\n except ValueError:\n print(\"Error: Invalid input\")\n else:\n print(\"Invalid choice. Please try again.\")\n\n # Summary and save\n print(\"\\n\" + \"=\" * 60)\n print(\"Configuration Summary\")\n print(\"=\" * 60)\n print(f\"Table: {configurator.table_name}\")\n print(f\"Database: {configurator.database or '(none specified)'}\")\n print(f\"Total Rules: {len(configurator.rules)}\\n\")\n\n list_rules(configurator)\n\n # Save configuration\n save_choice = input(\"\\nSave configuration? (y/n): \").strip().lower()\n if save_choice in [\"y\", \"yes\"]:\n filepath = input(\"Enter filename to save (default: validation_rules.json): \").strip()\n if not filepath:\n filepath = \"validation_rules.json\"\n\n if configurator.save_config(filepath):\n print(f\"✓ Configuration saved to {filepath}\")\n else:\n print(\"Error: Failed to save configuration\")\n sys.exit(1)\n\n\ndef list_rules(configurator: ValidationRuleConfigurator):\n \"\"\"\n Display configured rules.\n\n Args:\n configurator: ValidationRuleConfigurator instance\n \"\"\"\n if not configurator.rules:\n print(\"No rules configured yet.\")\n return\n\n print(\"\\nConfigured Rules:\")\n print(\"-\" * 60)\n for i, rule in enumerate(configurator.rules, 1):\n rule_type = rule.get(\"rule\", \"unknown\").upper()\n column = rule.get(\"column\", \"N/A\")\n description = rule.get(\"description\", \"\")\n\n print(f\"{i}. [{rule_type}] {column}\")\n print(f\" {description}\")\n\n # Show additional details based on rule type\n if rule.get(\"rule\") == \"range\":\n print(f\" Range: [{rule.get('min')}, {rule.get('max')}]\")\n elif rule.get(\"rule\") == \"pattern\":\n print(f\" Pattern: {rule.get('pattern')}\")\n elif rule.get(\"rule\") == \"custom\":\n print(f\" Query: {rule.get('query')}\")\n\n print(\"-\" * 60)\n\n\ndef print_help():\n \"\"\"Print help information.\"\"\"\n help_text = r\"\"\"\nRule Types:\n\nNOT NULL\n Description: Ensures column contains no NULL values\n Example: Validate that 'user_id' is never NULL\n\nUNIQUE\n Description: Ensures all values in column are unique\n Example: Email addresses must be unique for user table\n\nRANGE\n Description: Ensures numeric values fall within min/max bounds\n Example: Age must be between 0 and 150\n Price must be between 0 and 999999\n\nPATTERN\n Description: Ensures values match a regular expression\n Example: Email format: ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$\n Phone: ^\\+?1?\\d{9,15}$\n\nCUSTOM\n Description: Run custom SQL query for complex validations\n Example: SELECT COUNT(*) FROM users WHERE created_at > updated_at\n (Ensures created_at is always before updated_at)\n\"\"\"\n print(help_text)\n\n\ndef create_from_args(args: argparse.Namespace) -> ValidationRuleConfigurator:\n \"\"\"\n Create configurator from command-line arguments.\n\n Args:\n args: Parsed arguments\n\n Returns:\n Configured ValidationRuleConfigurator\n \"\"\"\n configurator = ValidationRuleConfigurator()\n configurator.table_name = args.table\n configurator.database = args.database\n\n # Add rules from arguments\n if args.not_null:\n for column in args.not_null.split(\",\"):\n configurator.add_not_null_rule(column.strip())\n\n if args.unique:\n for column in args.unique.split(\",\"):\n configurator.add_unique_rule(column.strip())\n\n if args.range:\n # Format: \"column:min:max\" or \"column:min:max,column2:min2:max2\"\n for range_spec in args.range.split(\",\"):\n parts = range_spec.split(\":\")\n if len(parts) == 3:\n try:\n column = parts[0].strip()\n min_val = float(parts[1].strip())\n max_val = float(parts[2].strip())\n configurator.add_range_rule(column, min_val, max_val)\n except ValueError:\n print(f\"Warning: Invalid range specification: {range_spec}\")\n\n return configurator\n\n\ndef main():\n \"\"\"Main entry point for rule configuration.\"\"\"\n parser = argparse.ArgumentParser(\n description=\"Configure data validation rules for database tables\",\n formatter_class=argparse.RawDescriptionHelpFormatter,\n epilog=\"\"\"\nExamples:\n # Interactive mode (recommended)\n %(prog)s\n\n # Command-line mode with single table\n %(prog)s --table users --database mydb \\\\\n --not-null id,email \\\\\n --unique email \\\\\n --output rules.json\n\n # With range validation\n %(prog)s --table products --database catalog \\\\\n --range \"price:0:10000,quantity:0:1000000\" \\\\\n --output rules.json\n\n # Load and modify existing rules\n %(prog)s --load rules.json --not-null phone --output rules.json\n \"\"\",\n )\n\n parser.add_argument(\"--table\", help=\"Table name for non-interactive mode\")\n parser.add_argument(\"--database\", help=\"Database name\")\n parser.add_argument(\"--not-null\", help=\"Comma-separated columns that must not be NULL\")\n parser.add_argument(\"--unique\", help=\"Comma-separated columns that must be unique\")\n parser.add_argument(\"--range\", help=\"Range validations in format: col:min:max,col2:min2:max2\")\n parser.add_argument(\"--load\", help=\"Load existing configuration file\")\n parser.add_argument(\"--output\", help=\"Output file for configuration (JSON)\")\n\n args = parser.parse_args()\n\n try:\n # Determine mode\n if args.load:\n # Load existing config\n configurator = ValidationRuleConfigurator()\n if not configurator.load_config(args.load):\n sys.exit(1)\n print(f\"Loaded configuration from {args.load}\")\n list_rules(configurator)\n elif args.table:\n # Command-line mode\n configurator = create_from_args(args)\n else:\n # Interactive mode\n configurator = ValidationRuleConfigurator()\n interactive_mode(configurator)\n sys.exit(0)\n\n # If table specified, save configuration\n if args.table or args.load:\n if args.table and not configurator.table_name:\n configurator.table_name = args.table\n\n output_file = args.output or \"validation_rules.json\"\n\n if configurator.save_config(output_file):\n print(f\"\\n✓ Configuration saved to {output_file}\")\n print(f\"Total rules: {len(configurator.rules)}\")\n sys.exit(0)\n else:\n sys.exit(1)\n\n except Exception as e:\n print(f\"Error: {e}\", file=sys.stderr)\n sys.exit(1)\n\n\nif __name__ == \"__main__\":\n main()\n","content_type":"text/x-python; charset=utf-8","language":"python","size":14669,"content_sha256":"dedc87f7fcabf041f6ca27666206d3a6f61c155515cac28888b26f158152ebc0"},{"filename":"scripts/generate_validation_report.py","content":"#!/usr/bin/env python3\n\"\"\"\nGenerate comprehensive report of data validation results.\n\nThis script creates detailed HTML, JSON, or Markdown reports from validation results,\nincluding statistics, identified issues, trend analysis, and recommendations.\n\"\"\"\n\nimport argparse\nimport json\nimport sys\nfrom datetime import datetime\nfrom typing import Dict, Any\n\n\nclass ValidationReportGenerator:\n \"\"\"Generates comprehensive validation reports.\"\"\"\n\n def __init__(self):\n \"\"\"Initialize report generator.\"\"\"\n self.results = []\n self.metadata = {}\n\n def load_results(self, filepath: str) -> bool:\n \"\"\"\n Load validation results from JSON file.\n\n Args:\n filepath: Path to JSON results file\n\n Returns:\n True if successful, False otherwise\n \"\"\"\n try:\n with open(filepath, \"r\") as f:\n data = json.load(f)\n\n self.results = data.get(\"validations\", [])\n self.metadata = {\n \"table\": data.get(\"table\", \"unknown\"),\n \"timestamp\": data.get(\"timestamp\", datetime.now().isoformat()),\n \"statistics\": data.get(\"statistics\", {}),\n }\n\n return True\n except (FileNotFoundError, json.JSONDecodeError) as e:\n print(f\"Error loading results: {e}\", file=sys.stderr)\n return False\n\n def generate_summary(self) -> Dict[str, Any]:\n \"\"\"\n Generate summary statistics from results.\n\n Returns:\n Dictionary with summary stats\n \"\"\"\n if not self.results:\n return {\"total\": 0, \"passed\": 0, \"failed\": 0, \"pass_rate\": 0.0, \"issues\": []}\n\n total = len(self.results)\n passed = sum(1 for r in self.results if r.get(\"valid\", False))\n failed = total - passed\n\n issues = []\n for result in self.results:\n if not result.get(\"valid\", False):\n details = result.get(\"details\", {})\n issues.append(\n {\n \"rule\": details.get(\"rule\", \"unknown\"),\n \"column\": details.get(\"column\", \"unknown\"),\n \"severity\": self._determine_severity(details.get(\"rule\")),\n }\n )\n\n return {\n \"total\": total,\n \"passed\": passed,\n \"failed\": failed,\n \"pass_rate\": (passed / total * 100) if total > 0 else 0,\n \"issues\": issues,\n \"critical_issues\": sum(1 for i in issues if i[\"severity\"] == \"critical\"),\n \"high_issues\": sum(1 for i in issues if i[\"severity\"] == \"high\"),\n \"medium_issues\": sum(1 for i in issues if i[\"severity\"] == \"medium\"),\n }\n\n def _determine_severity(self, rule: str) -> str:\n \"\"\"\n Determine issue severity based on rule type.\n\n Args:\n rule: Validation rule name\n\n Returns:\n Severity level (critical, high, medium, low)\n \"\"\"\n critical_rules = [\"not_null\", \"unique\", \"foreign_key\"]\n high_rules = [\"range\", \"pattern\"]\n medium_rules = [\"custom\"]\n\n if rule in critical_rules:\n return \"critical\"\n elif rule in high_rules:\n return \"high\"\n elif rule in medium_rules:\n return \"medium\"\n else:\n return \"low\"\n\n def generate_json_report(self) -> str:\n \"\"\"Generate JSON report.\"\"\"\n summary = self.generate_summary()\n\n report = {\n \"metadata\": self.metadata,\n \"summary\": summary,\n \"timestamp\": datetime.now().isoformat(),\n \"validations\": self.results,\n }\n\n return json.dumps(report, indent=2)\n\n def generate_markdown_report(self) -> str:\n \"\"\"Generate Markdown report.\"\"\"\n summary = self.generate_summary()\n\n md = []\n md.append(\"# Data Validation Report\")\n md.append(f\"\\n**Table:** {self.metadata.get('table', 'Unknown')}\")\n md.append(f\"**Generated:** {datetime.now().isoformat()}\")\n md.append(\"\")\n\n # Summary section\n md.append(\"## Executive Summary\\n\")\n md.append(\"| Metric | Value |\")\n md.append(\"|--------|-------|\")\n md.append(f\"| Total Checks | {summary['total']} |\")\n md.append(f\"| Passed | {summary['passed']} |\")\n md.append(f\"| Failed | {summary['failed']} |\")\n md.append(f\"| Pass Rate | {summary['pass_rate']:.1f}% |\")\n md.append(\"\")\n\n # Table statistics\n stats = self.metadata.get(\"statistics\", {})\n if stats and \"error\" not in stats:\n md.append(\"## Table Statistics\\n\")\n md.append(f\"- **Total Rows:** {stats.get('row_count', 'N/A')}\")\n md.append(f\"- **Total Columns:** {stats.get('column_count', 'N/A')}\")\n md.append(\"\")\n\n # Issues section\n if summary[\"failed\"] > 0:\n md.append(\"## Issues Identified\\n\")\n\n if summary[\"critical_issues\"] > 0:\n md.append(\"### Critical Issues\\n\")\n for issue in summary[\"issues\"]:\n if issue[\"severity\"] == \"critical\":\n md.append(f\"- **{issue['rule']}** on column `{issue['column']}`\")\n md.append(\"\")\n\n if summary[\"high_issues\"] > 0:\n md.append(\"### High Priority Issues\\n\")\n for issue in summary[\"issues\"]:\n if issue[\"severity\"] == \"high\":\n md.append(f\"- **{issue['rule']}** on column `{issue['column']}`\")\n md.append(\"\")\n\n if summary[\"medium_issues\"] > 0:\n md.append(\"### Medium Priority Issues\\n\")\n for issue in summary[\"issues\"]:\n if issue[\"severity\"] == \"medium\":\n md.append(f\"- **{issue['rule']}** on column `{issue['column']}`\")\n md.append(\"\")\n\n # Detailed results\n if summary[\"failed\"] > 0:\n md.append(\"## Detailed Validation Results\\n\")\n for result in self.results:\n if not result.get(\"valid\", False):\n details = result.get(\"details\", {})\n rule = details.get(\"rule\", \"unknown\")\n column = details.get(\"column\", \"N/A\")\n\n md.append(f\"### {rule.replace('_', ' ').title()} - {column}\\n\")\n\n if \"error\" in details:\n md.append(f\"**Error:** {details['error']}\\n\")\n elif rule == \"not_null\":\n md.append(f\"**Issue:** Found {details.get('null_count', 0)} NULL values\\n\")\n elif rule == \"unique\":\n md.append(f\"**Issue:** Found {details.get('duplicate_count', 0)} duplicate groups\\n\")\n elif rule == \"range\":\n md.append(\n f\"**Issue:** {details.get('out_of_range_count', 0)} values \"\n f\"outside range [{details.get('min')}, {details.get('max')}]\\n\"\n )\n\n # Recommendations\n md.append(\"## Recommendations\\n\")\n if summary[\"critical_issues\"] > 0:\n md.append(\"1. **Immediately address critical issues** - These indicate data integrity problems\")\n if summary[\"high_issues\"] > 0:\n md.append(\"2. **Resolve high priority issues within 1 week** - These affect data quality\")\n if summary[\"failed\"] == 0:\n md.append(\"✅ **All validations passed!** - Data integrity is maintained\")\n else:\n md.append(\"3. **Review validation rules** - Ensure they match business requirements\")\n md.append(\"4. **Implement data quality monitoring** - Set up alerts for recurring issues\")\n\n md.append(\"\")\n md.append(\"---\")\n md.append(\"*Report generated by Data Validation Engine*\")\n\n return \"\\n\".join(md)\n\n def generate_html_report(self) -> str:\n \"\"\"Generate HTML report.\"\"\"\n summary = self.generate_summary()\n stats = self.metadata.get(\"statistics\", {})\n\n html = [\n \"\u003c!DOCTYPE html>\",\n \"\u003chtml>\",\n \"\u003chead>\",\n \"\u003cmeta charset='UTF-8'>\",\n \"\u003cmeta name='viewport' content='width=device-width, initial-scale=1.0'>\",\n \"\u003ctitle>Data Validation Report\u003c/title>\",\n \"\u003cstyle>\",\n \"body { font-family: Arial, sans-serif; margin: 20px; background-color: #f5f5f5; }\",\n \".container { max-width: 1000px; margin: 0 auto; background-color: white; padding: 20px; border-radius: 8px; box-shadow: 0 2px 4px rgba(0,0,0,0.1); }\",\n \"h1 { color: #333; border-bottom: 3px solid #007bff; padding-bottom: 10px; }\",\n \"h2 { color: #555; margin-top: 30px; }\",\n \"table { width: 100%; border-collapse: collapse; margin: 15px 0; }\",\n \"th, td { padding: 12px; text-align: left; border-bottom: 1px solid #ddd; }\",\n \"th { background-color: #007bff; color: white; }\",\n \"tr:hover { background-color: #f9f9f9; }\",\n \".summary { display: grid; grid-template-columns: repeat(auto-fit, minmax(200px, 1fr)); gap: 20px; margin: 20px 0; }\",\n \".metric { background-color: #f0f0f0; padding: 15px; border-radius: 5px; }\",\n \".metric .value { font-size: 24px; font-weight: bold; color: #007bff; }\",\n \".metric .label { color: #666; margin-top: 5px; }\",\n \".critical { color: #dc3545; font-weight: bold; }\",\n \".high { color: #ff6b6b; font-weight: bold; }\",\n \".medium { color: #ffc107; font-weight: bold; }\",\n \".success { color: #28a745; font-weight: bold; }\",\n \".issue-list { margin: 15px 0; }\",\n \".issue { background-color: #fff3cd; padding: 10px; margin: 5px 0; border-left: 4px solid #ffc107; border-radius: 3px; }\",\n \".footer { text-align: center; color: #999; margin-top: 30px; padding-top: 15px; border-top: 1px solid #ddd; }\",\n \"\u003c/style>\",\n \"\u003c/head>\",\n \"\u003cbody>\",\n \"\u003cdiv class='container'>\",\n \"\u003ch1>Data Validation Report\u003c/h1>\",\n f\"\u003cp>\u003cstrong>Table:\u003c/strong> {self.metadata.get('table', 'Unknown')}\u003c/p>\",\n f\"\u003cp>\u003cstrong>Generated:\u003c/strong> {datetime.now().isoformat()}\u003c/p>\",\n ]\n\n # Summary metrics\n html.append(\"\u003ch2>Summary\u003c/h2>\")\n html.append(\"\u003cdiv class='summary'>\")\n html.append(\n f\"\u003cdiv class='metric'>\u003cdiv class='value'>{summary['total']}\u003c/div>\u003cdiv class='label'>Total Checks\u003c/div>\u003c/div>\"\n )\n html.append(\n f\"\u003cdiv class='metric'>\u003cdiv class='value success'>{summary['passed']}\u003c/div>\u003cdiv class='label'>Passed\u003c/div>\u003c/div>\"\n )\n html.append(\n f\"\u003cdiv class='metric'>\u003cdiv class='value critical'>{summary['failed']}\u003c/div>\u003cdiv class='label'>Failed\u003c/div>\u003c/div>\"\n )\n html.append(\n f\"\u003cdiv class='metric'>\u003cdiv class='value'>{summary['pass_rate']:.1f}%\u003c/div>\u003cdiv class='label'>Pass Rate\u003c/div>\u003c/div>\"\n )\n html.append(\"\u003c/div>\")\n\n # Table statistics\n if stats and \"error\" not in stats:\n html.append(\"\u003ch2>Table Statistics\u003c/h2>\")\n html.append(\"\u003ctable>\")\n html.append(\"\u003ctr>\u003cth>Metric\u003c/th>\u003cth>Value\u003c/th>\u003c/tr>\")\n html.append(f\"\u003ctr>\u003ctd>Total Rows\u003c/td>\u003ctd>{stats.get('row_count', 'N/A')}\u003c/td>\u003c/tr>\")\n html.append(f\"\u003ctr>\u003ctd>Total Columns\u003c/td>\u003ctd>{stats.get('column_count', 'N/A')}\u003c/td>\u003c/tr>\")\n html.append(\"\u003c/table>\")\n\n # Issues\n if summary[\"failed\"] > 0:\n html.append(\"\u003ch2>Issues\u003c/h2>\")\n html.append(\"\u003cdiv class='issue-list'>\")\n\n for issue in summary[\"issues\"]:\n severity_class = issue[\"severity\"]\n html.append(\n f\"\u003cdiv class='issue'>\"\n f\"\u003cspan class='{severity_class}'>{issue['severity'].upper()}\u003c/span>: \"\n f\"\u003cstrong>{issue['rule']}\u003c/strong> on column \u003ccode>{issue['column']}\u003c/code>\"\n f\"\u003c/div>\"\n )\n\n html.append(\"\u003c/div>\")\n else:\n html.append(\"\u003ch2>Results\u003c/h2>\")\n html.append(\"\u003cp class='success'>✅ All validations passed!\u003c/p>\")\n\n # Footer\n html.append(\"\u003cdiv class='footer'>\")\n html.append(\"\u003cp>Report generated by Data Validation Engine\u003c/p>\")\n html.append(\"\u003c/div>\")\n\n html.append(\"\u003c/div>\")\n html.append(\"\u003c/body>\")\n html.append(\"\u003c/html>\")\n\n return \"\\n\".join(html)\n\n\ndef main():\n \"\"\"Main entry point for report generation.\"\"\"\n parser = argparse.ArgumentParser(\n description=\"Generate comprehensive data validation reports\",\n formatter_class=argparse.RawDescriptionHelpFormatter,\n epilog=\"\"\"\nExamples:\n %(prog)s --results validation.json\n %(prog)s --results validation.json --format markdown\n %(prog)s --results validation.json --format html --output report.html\n %(prog)s --results validation.json --format json --output report.json\n \"\"\",\n )\n\n parser.add_argument(\"--results\", required=True, help=\"Path to JSON file containing validation results\")\n parser.add_argument(\"--format\", default=\"markdown\", choices=[\"json\", \"markdown\", \"html\"], help=\"Report format\")\n parser.add_argument(\"--output\", help=\"Output file for report\")\n parser.add_argument(\"--verbose\", action=\"store_true\", help=\"Print detailed output\")\n\n args = parser.parse_args()\n\n try:\n generator = ValidationReportGenerator()\n\n # Load results\n if args.verbose:\n print(f\"Loading results from {args.results}...\", file=sys.stderr)\n\n if not generator.load_results(args.results):\n sys.exit(1)\n\n # Generate report\n if args.format == \"json\":\n report = generator.generate_json_report()\n elif args.format == \"html\":\n report = generator.generate_html_report()\n else: # markdown\n report = generator.generate_markdown_report()\n\n # Print report\n print(report)\n\n # Save to file if requested\n if args.output:\n with open(args.output, \"w\") as f:\n f.write(report)\n\n if args.verbose:\n print(f\"\\nReport saved to {args.output}\", file=sys.stderr)\n\n sys.exit(0)\n\n except Exception as e:\n print(f\"Error: {e}\", file=sys.stderr)\n sys.exit(1)\n\n\nif __name__ == \"__main__\":\n main()\n","content_type":"text/x-python; charset=utf-8","language":"python","size":14526,"content_sha256":"5c1250724dd83ba0d83e91b38d574017ed7e2d98588fdd195d3ee362ce219c3e"},{"filename":"scripts/README.md","content":"# Scripts\n\nBundled resources for data-validation-engine skill\n\n- [x] validate_data.py: Script to execute data validation checks against a specified database and table, logging any discrepancies.\n- [x] generate_validation_report.py: Script to generate a comprehensive report of data validation results, including statistics and identified issues.\n- [x] configure_validation_rules.py: Script to interactively configure data validation rules for a given database and table, storing the rules in a configuration file.\n","content_type":"text/markdown; charset=utf-8","language":"markdown","size":514,"content_sha256":"54291eea4c81c1f2def66238523290db9c90e1d5313531075403b437e7abe017"}],"content_json":{"type":"doc","content":[{"type":"heading","attrs":{"level":1},"content":[{"text":"Data Validation Engine","type":"text"}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Overview","type":"text"}]},{"type":"paragraph","content":[{"text":"Implement and enforce data integrity rules at the database level using CHECK constraints, triggers, foreign keys, and custom validation functions across PostgreSQL and MySQL.","type":"text"}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Prerequisites","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Database credentials with ALTER TABLE and CREATE FUNCTION permissions","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"psql","type":"text","marks":[{"type":"code_inline"}]},{"text":" or ","type":"text"},{"text":"mysql","type":"text","marks":[{"type":"code_inline"}]},{"text":" CLI for executing validation queries","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Current schema documentation or access to ","type":"text"},{"text":"information_schema","type":"text","marks":[{"type":"code_inline"}]},{"text":" for column specifications","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Business rules document describing valid data ranges, formats, and relationships","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Backup of production data before applying new constraints (constraints may reject existing invalid data)","type":"text"}]}]}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Instructions","type":"text"}]},{"type":"ordered_list","attrs":{"order":1,"listStyle":"number"},"content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Audit existing data quality by running validation queries before adding constraints. Check for NULL values in columns that should be required: ","type":"text"},{"text":"SELECT column_name, COUNT(*) FILTER (WHERE column_name IS NULL) AS null_count, COUNT(*) AS total FROM table_name GROUP BY column_name","type":"text","marks":[{"type":"code_inline"}]},{"text":".","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Detect orphaned records (broken referential integrity): ","type":"text"},{"text":"SELECT c.id FROM child_table c LEFT JOIN parent_table p ON c.parent_id = p.id WHERE p.id IS NULL","type":"text","marks":[{"type":"code_inline"}]},{"text":". Document all orphaned records for cleanup or archival before adding foreign key constraints.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Validate data format compliance:","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Email format: ","type":"text"},{"text":"SELECT email FROM users WHERE email !~ '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}

Data Validation Engine Overview Implement and enforce data integrity rules at the database level using CHECK constraints, triggers, foreign keys, and custom validation functions across PostgreSQL and MySQL. Prerequisites - Database credentials with ALTER TABLE and CREATE FUNCTION permissions - or CLI for executing validation queries - Current schema documentation or access to for column specifications - Business rules document describing valid data ranges, formats, and relationships - Backup of production data before applying new constraints (constraints may reject existing invalid data) Inst…

","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Phone format: ","type":"text"},{"text":"SELECT phone FROM contacts WHERE phone !~ '^\\+?[1-9]\\d{6,14}

Data Validation Engine Overview Implement and enforce data integrity rules at the database level using CHECK constraints, triggers, foreign keys, and custom validation functions across PostgreSQL and MySQL. Prerequisites - Database credentials with ALTER TABLE and CREATE FUNCTION permissions - or CLI for executing validation queries - Current schema documentation or access to for column specifications - Business rules document describing valid data ranges, formats, and relationships - Backup of production data before applying new constraints (constraints may reject existing invalid data) Inst…

","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"URL format: ","type":"text"},{"text":"SELECT url FROM links WHERE url !~ '^https?://.+'","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Date ranges: ","type":"text"},{"text":"SELECT * FROM events WHERE start_date > end_date","type":"text","marks":[{"type":"code_inline"}]}]}]}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Check numeric range violations: ","type":"text"},{"text":"SELECT * FROM products WHERE price \u003c 0 OR price > 999999.99","type":"text","marks":[{"type":"code_inline"}]},{"text":" and ","type":"text"},{"text":"SELECT * FROM users WHERE age \u003c 0 OR age > 150","type":"text","marks":[{"type":"code_inline"}]},{"text":". Map each column to its valid range based on business rules.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Identify duplicate records that violate intended uniqueness: ","type":"text"},{"text":"SELECT email, COUNT(*) FROM users GROUP BY email HAVING COUNT(*) > 1","type":"text","marks":[{"type":"code_inline"}]},{"text":". Determine which duplicate to keep (most recent, most complete) and plan deduplication.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Generate CHECK constraints for validated rules:","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"ALTER TABLE products ADD CONSTRAINT chk_price_positive CHECK (price >= 0)","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"ALTER TABLE users ADD CONSTRAINT chk_email_format CHECK (email ~ '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}

Data Validation Engine Overview Implement and enforce data integrity rules at the database level using CHECK constraints, triggers, foreign keys, and custom validation functions across PostgreSQL and MySQL. Prerequisites - Database credentials with ALTER TABLE and CREATE FUNCTION permissions - or CLI for executing validation queries - Current schema documentation or access to for column specifications - Business rules document describing valid data ranges, formats, and relationships - Backup of production data before applying new constraints (constraints may reject existing invalid data) Inst…

)","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"ALTER TABLE events ADD CONSTRAINT chk_date_order CHECK (start_date \u003c= end_date)","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"ALTER TABLE orders ADD CONSTRAINT chk_status_valid CHECK (status IN ('pending', 'processing', 'shipped', 'delivered', 'cancelled'))","type":"text","marks":[{"type":"code_inline"}]}]}]}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Create foreign key constraints with appropriate cascade behavior:","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"ALTER TABLE orders ADD CONSTRAINT fk_orders_customer FOREIGN KEY (customer_id) REFERENCES customers(id) ON DELETE RESTRICT","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Use ","type":"text"},{"text":"ON DELETE CASCADE","type":"text","marks":[{"type":"code_inline"}]},{"text":" for dependent data (order_items when order is deleted)","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Use ","type":"text"},{"text":"ON DELETE SET NULL","type":"text","marks":[{"type":"code_inline"}]},{"text":" for optional relationships (assigned_to when user is deactivated)","type":"text"}]}]}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Implement complex business rule validation using database triggers when CHECK constraints are insufficient:","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Trigger that prevents order total from exceeding customer credit limit","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Trigger that enforces at least one admin user per organization","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Trigger that validates JSON schema for JSONB columns","type":"text"}]}]}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Apply constraints in a safe two-phase approach:","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Phase 1: Run validation queries to find all violations. Generate data cleanup scripts. Execute cleanup.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Phase 2: Apply constraints with ","type":"text"},{"text":"NOT VALID","type":"text","marks":[{"type":"code_inline"}]},{"text":" option (PostgreSQL): ","type":"text"},{"text":"ALTER TABLE users ADD CONSTRAINT chk_email CHECK (email ~ '...') NOT VALID","type":"text","marks":[{"type":"code_inline"}]},{"text":" then ","type":"text"},{"text":"ALTER TABLE users VALIDATE CONSTRAINT chk_email","type":"text","marks":[{"type":"code_inline"}]},{"text":" (validates existing data without blocking writes).","type":"text"}]}]}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Generate a data quality report summarizing: total records per table, violation counts by constraint type, cleanup actions taken, constraints applied, and remaining data quality issues requiring manual review.","type":"text"}]}]}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Output","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Data quality audit report","type":"text","marks":[{"type":"strong"}]},{"text":" with violation counts, examples, and severity ratings","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Data cleanup scripts","type":"text","marks":[{"type":"strong"}]},{"text":" (SQL) to fix violations before constraint application","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Constraint DDL scripts","type":"text","marks":[{"type":"strong"}]},{"text":" with CHECK, FOREIGN KEY, NOT NULL, and UNIQUE constraints","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Validation triggers","type":"text","marks":[{"type":"strong"}]},{"text":" for complex business rules beyond simple constraints","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Ongoing validation queries","type":"text","marks":[{"type":"strong"}]},{"text":" for periodic data quality monitoring","type":"text"}]}]}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Error Handling","type":"text"}]},{"type":"table","attrs":{"layout":null},"content":[{"type":"tr","content":[{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Error","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Cause","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Solution","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"check constraint violated by existing row","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Existing data fails the new constraint","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Run the validation query first to find violations; clean up data; use ","type":"text"},{"text":"NOT VALID","type":"text","marks":[{"type":"code_inline"}]},{"text":" option to add constraint without checking existing data, then validate separately","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"cannot add foreign key: referenced row not found","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Orphaned child records reference non-existent parent","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Clean up orphaned records first with DELETE or UPDATE to valid parent; or insert missing parent records","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"column cannot be made NOT NULL: contains NULL values","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Existing rows have NULL in the target column","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Backfill NULLs with ","type":"text"},{"text":"UPDATE table SET column = default_value WHERE column IS NULL","type":"text","marks":[{"type":"code_inline"}]},{"text":" before adding NOT NULL","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Trigger function causes performance regression","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Complex validation logic executes on every INSERT/UPDATE","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Optimize trigger function; use WHEN clause to limit trigger firing; consider CHECK constraints instead of triggers for simple rules","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Circular foreign key prevents constraint creation","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Tables reference each other, preventing creation order","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Use ","type":"text"},{"text":"ALTER TABLE ADD CONSTRAINT","type":"text","marks":[{"type":"code_inline"}]},{"text":" after both tables exist; or use ","type":"text"},{"text":"DEFERRABLE INITIALLY DEFERRED","type":"text","marks":[{"type":"code_inline"}]},{"text":" constraints","type":"text"}]}]}]}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Examples","type":"text"}]},{"type":"paragraph","content":[{"text":"Auditing a legacy database with 50,000 invalid email addresses","type":"text","marks":[{"type":"strong"}]},{"text":": Validation query reveals 50,000 of 2M user records have invalid email formats (missing @, double dots, spaces). A cleanup script normalizes common issues (trim whitespace, lowercase) and flags 3,000 unfixable records for manual review. After cleanup, a CHECK constraint with regex validation is applied.","type":"text"}]},{"type":"paragraph","content":[{"text":"Enforcing referential integrity on a database without foreign keys","type":"text","marks":[{"type":"strong"}]},{"text":": An application relied on application-level FK enforcement, resulting in 12,000 orphaned order_items, 800 orphaned payments, and 200 orphaned reviews. Cleanup scripts archive orphaned records to backup tables, then foreign key constraints with ","type":"text"},{"text":"ON DELETE CASCADE","type":"text","marks":[{"type":"code_inline"}]},{"text":" are added. A nightly validation job monitors for new orphans.","type":"text"}]},{"type":"paragraph","content":[{"text":"Implementing business rules for a financial application","type":"text","marks":[{"type":"strong"}]},{"text":": Constraints enforce: account balance cannot be negative (","type":"text"},{"text":"CHECK (balance >= 0)","type":"text","marks":[{"type":"code_inline"}]},{"text":"), transfer amount must be positive (","type":"text"},{"text":"CHECK (amount > 0)","type":"text","marks":[{"type":"code_inline"}]},{"text":"), transaction date cannot be in the future (","type":"text"},{"text":"CHECK (transaction_date \u003c= CURRENT_DATE)","type":"text","marks":[{"type":"code_inline"}]},{"text":"), and a trigger prevents transfers between accounts owned by different customers unless explicitly authorized.","type":"text"}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Resources","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"PostgreSQL CHECK constraints: https://www.postgresql.org/docs/current/ddl-constraints.html","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"PostgreSQL triggers: https://www.postgresql.org/docs/current/triggers.html","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"MySQL CHECK constraints (8.0.16+): https://dev.mysql.com/doc/refman/8.0/en/create-table-check-constraints.html","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Data validation patterns: https://www.postgresql.org/docs/current/ddl-constraints.html#DDL-CONSTRAINTS-CHECK-CONSTRAINTS","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"NOT VALID constraint option: https://www.postgresql.org/docs/current/sql-altertable.html","type":"text"}]}]}]},{"type":"hr","attrs":{"markup":"---"}}]},"metadata":{"date":"2026-06-05","name":"validating-database-integrity","tags":["database","validating-database"],"author":"@skillopedia","source":{"stars":2275,"repo_name":"claude-code-plugins-plus-skills","origin_url":"https://github.com/jeremylongshore/claude-code-plugins-plus-skills/blob/HEAD/plugins/database/data-validation-engine/skills/validating-database-integrity/SKILL.md","repo_owner":"jeremylongshore","body_sha256":"5e8d211de350138a21fb9d84d8b42a6a125ca11ac197951c672364445d0b81ed","cluster_key":"adb2cb8c378e90f00562b32555c303365c0576b4366d2961d90a327b425dc3fe","clean_bundle":{"format":"clean-skill-bundle-v1","source":"jeremylongshore/claude-code-plugins-plus-skills/plugins/database/data-validation-engine/skills/validating-database-integrity/SKILL.md","attachments":[{"id":"3225e530-8a0c-5b8b-9cc2-4d6291d769af","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/3225e530-8a0c-5b8b-9cc2-4d6291d769af/attachment.md","path":"assets/README.md","size":190,"sha256":"8e03eab6255fe7ae2637f81a943ab4461c9465f6f7d5ad713a90516a52190496","contentType":"text/markdown; charset=utf-8"},{"id":"f13e7fc4-eea7-532f-8a10-5da0ed03b232","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/f13e7fc4-eea7-532f-8a10-5da0ed03b232/attachment.md","path":"references/README.md","size":65,"sha256":"3aada9fbb2db449b37b7bd36644626be76b70fe1f508c0dcb40e6ae6bef1619f","contentType":"text/markdown; charset=utf-8"},{"id":"0b37136b-328f-5f37-b1b4-83803dae175f","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/0b37136b-328f-5f37-b1b4-83803dae175f/attachment.md","path":"scripts/README.md","size":514,"sha256":"54291eea4c81c1f2def66238523290db9c90e1d5313531075403b437e7abe017","contentType":"text/markdown; charset=utf-8"},{"id":"7fd21e88-f974-5d7f-8bf9-74a32f721728","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/7fd21e88-f974-5d7f-8bf9-74a32f721728/attachment.py","path":"scripts/configure_validation_rules.py","size":14669,"sha256":"dedc87f7fcabf041f6ca27666206d3a6f61c155515cac28888b26f158152ebc0","contentType":"text/x-python; charset=utf-8"},{"id":"8987d23f-0ae5-5b59-89b8-724af1a68b72","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/8987d23f-0ae5-5b59-89b8-724af1a68b72/attachment.py","path":"scripts/generate_validation_report.py","size":14526,"sha256":"5c1250724dd83ba0d83e91b38d574017ed7e2d98588fdd195d3ee362ce219c3e","contentType":"text/x-python; charset=utf-8"}],"bundle_sha256":"6e2a71fd4c139924e32d9f565393bfee2951cb79ed6432deafcf291eb86cc2e6","attachment_count":5,"text_attachments":5,"attachment_storage":"skillopedia-attachments-v1","binary_attachments":0,"excluded_attachments":[]},"cluster_size":1,"skill_md_path":"plugins/database/data-validation-engine/skills/validating-database-integrity/SKILL.md","import_metadata":{"date":"2026-06-05","author":"@skillopedia","version":"v1","category":"data-analytics","category_label":"Data"},"exact_dupes_collapsed_into_this":0},"license":"MIT","version":"v1","category":"data-analytics","import_tag":"clean-skills-v1","description":"Process use when you need to ensure database integrity through comprehensive data validation.\nThis skill validates data types, ranges, formats, referential integrity, and business rules.\nTrigger with phrases like \"validate database data\", \"implement data validation rules\",\n\"enforce data integrity constraints\", or \"validate data formats\".\n","allowed-tools":"Read, Write, Edit, Grep, Glob, Bash(psql:*), Bash(mysql:*)","compatibility":"Designed for Claude Code, also compatible with Codex and OpenClaw"}},"renderedAt":1782979292444}

Data Validation Engine Overview Implement and enforce data integrity rules at the database level using CHECK constraints, triggers, foreign keys, and custom validation functions across PostgreSQL and MySQL. Prerequisites - Database credentials with ALTER TABLE and CREATE FUNCTION permissions - or CLI for executing validation queries - Current schema documentation or access to for column specifications - Business rules document describing valid data ranges, formats, and relationships - Backup of production data before applying new constraints (constraints may reject existing invalid data) Inst…