phy-regex-audit — Skillopedia

# ReDoS & Regex Quality Auditor

One regex. One crafted input. Your Node.js server hangs for 30 seconds.

ReDoS (Regular Expression Denial of Service) is real, underestimated, and embarrassingly fixable. This skill walks every source file in your project, extracts regex literals, identifies catastrophic backtracking patterns, and tells you exactly which ones are dangerous and how to fix them.

**Supports JS/TS, Python, Go, Java, Ruby, PHP, Rust. Zero external API.**

---

## Trigger Phrases

- "regex security", "ReDoS", "catastrophic backtracking"
- "regex audit", "slow regex", "regex vulnerability"
- "chec…
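To make "catastrophic backtracking" concrete, here is a minimal sketch of how a naive backtracking matcher behaves on the classic nested-quantifier pattern `^(a+)+$`. The helper `count_steps` is hypothetical and not part of the skill; it models a simplified engine, and real engines apply optimizations that this model ignores.

```python
def count_steps(text: str) -> int:
    """Count the attempts a naive backtracking matcher makes for ^(a+)+$ on `text`."""
    steps = 0
    n = len(text)

    def match_from(pos: int) -> bool:
        nonlocal steps
        # One iteration of the inner a+: greedily find the longest run of 'a'.
        end = pos
        while end < n and text[end] == "a":
            end += 1
        # Try the longest run first, then give characters back one at a time.
        for stop in range(end, pos, -1):
            steps += 1
            if stop == n:  # the $ anchor matches here
                return True
            if match_from(stop):  # outer +: start another iteration
                return True
        return False

    match_from(0)
    return steps


if __name__ == "__main__":
    # A matching input succeeds on the first attempt.
    print("match:", count_steps("a" * 10))
    # A trailing non-matching character forces every split of the run to be tried.
    for k in (5, 10, 15):
        print(k, count_steps("a" * k + "X"))
```

In this model the attempt count for `k` leading `a` characters followed by one non-matching character is `2**k - 1`, so each extra character doubles the work. That doubling is what the CRITICAL severity level in the report refers to.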

)):\n findings.append(ReDoSFinding(\n regex_match=rm,\n severity=Severity.MEDIUM,\n vulnerability_type='MISSING_ANCHORS',\n dangerous_subpattern=pattern[:40],\n description='Validator regex lacks ^ and $ anchors — matches anywhere in string.',\n attack_input_example='\"garbage_validinput_garbage\"',\n fix_suggestion=f'Add anchors: /^{pattern[:40]}$/',\n ))\n\n return findings\n\n\ndef check_locale_assumptions(rm: RegexMatch) -> Optional[ReDoSFinding]:\n \"\"\"Detect ASCII-only character classes that should be locale-aware.\"\"\"\n ASCII_ONLY_HINTS = [\n (re.compile(r'\\[a-zA-Z\\]|\\[a-z\\]|\\[A-Z\\]'), 'Matches only ASCII letters — fails on é, ü, ñ, etc.'),\n (re.compile(r'\\[0-9\\]'), 'Use \\\\d or [0-9] explicitly; be aware \\\\d matches Unicode digits in some engines'),\n ]\n for detector, msg in ASCII_ONLY_HINTS:\n if detector.search(rm.pattern):\n return ReDoSFinding(\n regex_match=rm,\n severity=Severity.LOW,\n vulnerability_type='LOCALE_ASSUMPTION',\n dangerous_subpattern=detector.search(rm.pattern).group(0),\n description=f'ASCII-only character class: {msg}',\n attack_input_example='\"Ångström\" or \"naïve\"',\n fix_suggestion='Use \\\\p{L} (Unicode letter) if your regex engine supports it, or explicitly list the characters you need.',\n )\n return None\n```\n\n---\n\n## Step 4: Run Full Scan\n\n```python\nimport os\nimport glob\n\ndef run_regex_audit(target_dir='.', min_severity=Severity.LOW, high_risk_only=False):\n \"\"\"Full scan: discover files, extract regexes, analyze, report.\"\"\"\n\n # Discover files\n all_files = []\n for ext in ['.js', '.jsx', '.ts', '.tsx', '.mjs', '.py', '.go', '.java', '.rb', '.php', '.rs']:\n pattern = f'{target_dir}/**/*{ext}'\n for f in glob.glob(pattern, recursive=True):\n if not any(skip in f for skip in ['node_modules', '.git', 'dist', 'build', '.next', 'vendor']):\n all_files.append(f)\n\n # Extract and analyze\n all_findings = []\n total_regexes = 0\n\n for fpath in all_files:\n regexes = 
extract_regexes_from_file(fpath)\n total_regexes += len(regexes)\n\n for rm in regexes:\n if high_risk_only and not rm.in_handler:\n continue\n\n findings = analyze_regex_for_redos(rm)\n locale_finding = check_locale_assumptions(rm)\n if locale_finding:\n findings.append(locale_finding)\n\n for f in findings:\n if f.severity.value >= min_severity.value:\n all_findings.append(f)\n\n # Sort by severity desc, then file\n all_findings.sort(key=lambda x: (-x.severity.value, x.regex_match.file, x.regex_match.line))\n\n return all_findings, total_regexes, len(all_files)\n```\n\n---\n\n## Step 5: Output Report\n\n```markdown\n## ReDoS & Regex Security Audit\nProject: my-app | Files scanned: 847 | Regexes found: 214\n\n---\n\n### Summary\n\n| Severity | Count | Description |\n|----------|-------|-------------|\n| 🔴 CRITICAL | 2 | Exponential backtracking — proven DoS vector |\n| 🟠 HIGH | 4 | Polynomial backtracking — slow on long inputs |\n| 🟡 MEDIUM | 7 | Potentially slow or logic error |\n| ⚪ LOW | 11 | Style/locale issues |\n\n**⚠️ 3 findings are in HTTP request handlers — prioritize these.**\n\n---\n\n### 🔴 CRITICAL — Exponential Backtracking\n\n**#1 — src/middleware/auth.js:47**\n```js\n// Context: JWT token format validator in Express middleware\napp.use('/api', (req, res, next) => {\n const token = req.headers.authorization\n if (!/^([a-zA-Z0-9_-]+\\.)+[a-zA-Z0-9_-]+$/.test(token)) { ... }\n```\n\n**Dangerous pattern:** `([a-zA-Z0-9_-]+\\.)+`\n**Vulnerability:** NESTED_QUANTIFIERS — the inner `+` and outer `+` create exponential backtracking.\n**Attack input:** `\"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\"` (no dot — triggers backtracking)\n**DoS potential:** Input of length 30 takes ~2 seconds on Node.js. 
Length 50 takes minutes.\n**In HTTP handler:** ⚠️ YES — any unauthenticated request can trigger this.\n\n**Fix:**\n```js\n// Before (vulnerable):\n/^([a-zA-Z0-9_-]+\\.)+[a-zA-Z0-9_-]+$/\n\n// After (safe):\n// Option 1: Possessive quantifier (not supported in JS — emulate an atomic group with a lookahead and a backreference)\n// Option 2: Rewrite without nested quantifiers:\n/^[a-zA-Z0-9_-]+(?:\\.[a-zA-Z0-9_-]+)+$/\n// This eliminates the nested quantifier — the outer group cannot backtrack\n// into positions already consumed by the inner match.\n```\n\n---\n\n**#2 — src/utils/email-validator.ts:12**\n```ts\nconst EMAIL_RE = /^(([^\u003c>()\\[\\]\\\\.,;:\\s@\"]+(\\.[^\u003c>()\\[\\]\\\\.,;:\\s@\"]+)*)|(\".+\"))@/\n```\n\n**Dangerous pattern:** `([^\u003c>...]+(\\.[^\u003c>...]+)*)`\n**Vulnerability:** NESTED_QUANTIFIERS inside alternation\n**Attack input:** `\"a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a@\"` (missing domain after @)\n**Fix:**\n```ts\n// Use a proven safe email regex:\nconst EMAIL_RE = /^[^\\s@]+@[^\\s@]+\\.[^\\s@]+$/\n// For strict RFC 5322: use the validator.js library (pre-audited, safe)\n// npm install validator → isEmail(str)\n```\n\n---\n\n### 🟠 HIGH — Polynomial Backtracking\n\n**#3 — src/api/search.js:89**\n```js\n// URL parameter parser\nconst queryRe = /.*id=.*&.*/\n```\n\n**Vulnerability:** GREEDY_DOTSTAR_ANCHORED — `.*...*` is O(n²)\n**Attack input:** 10,000-char query string with no `id=` → scans 10,000 × 10,000 positions\n**In HTTP handler:** ⚠️ YES — `req.query` passed directly\n\n**Fix:**\n```js\n// Before:\n/.*id=.*&.*/\n// After (anchored, no double .* scan):\n/(?:^|&)id=([^&]+)/\n```\n\n---\n\n### 🟡 MEDIUM — Missing Anchors\n\n**#5 — src/validation/phone.ts:3**\n```ts\nconst PHONE_RE = /\\+?[0-9]{10,15}/ // used in: validatePhone(userInput)\n```\n\n**Issue:** No `^` or `$` anchor — matches `+15551234567` embedded in `\"Call +15551234567 for support\"`.\n**Attack scenario:** Attacker bypasses phone validation by embedding valid 
number in malicious input.\n\n**Fix:**\n```ts\nconst PHONE_RE = /^\\+?[0-9]{10,15}$/\n```\n\n---\n\n### ⚪ LOW — Locale Assumptions\n\n**#8 — src/utils/slugify.ts:7**\n```ts\n.replace(/[^a-zA-Z0-9-]/g, '-')\n```\n\n**Issue:** `[a-zA-Z]` excludes ñ, é, ü, ç, ș — slugs for non-English content will be all dashes.\n**Fix:** Normalize Unicode first: `str.normalize('NFKD').replace(/[\\u0300-\\u036f]/g, '').replace(/[^a-z0-9-]/gi, '-')`\n\n---\n\n### CI Integration\n\nAdd to your test suite or pre-commit hook:\n\n```bash\n# One-liner scan — exits non-zero if CRITICAL or HIGH found\npython3 -c \"\nimport re, sys, glob\n\nNESTED_Q = re.compile(r'\\(([^()]{1,30}\\+[^()]{0,10})\\)\\+|\\(([^()]{1,30})\\)\\*\\+')\nDOUBLE_STAR = re.compile(r'\\.\\*[^)]{0,10}\\.\\*')\n\n# glob has no brace expansion, so expand the extensions by hand\nfiles = [f for ext in ('js', 'ts', 'py') for f in glob.glob(f'src/**/*.{ext}', recursive=True)]\n\nfound = []\nfor fpath in files:\n lines = open(fpath, errors='replace').readlines()\n for i, line in enumerate(lines, 1):\n for m in re.finditer(r'/((?:[^/\\\\]|\\\\.){3,})/', line):\n pat = m.group(1)\n if NESTED_Q.search(pat) or DOUBLE_STAR.search(pat):\n found.append(f'{fpath}:{i}: {pat[:60]}')\n\nif found:\n print(f'FAIL: {len(found)} ReDoS-vulnerable regex(es) found:')\n for f in found: print(' ', f)\n sys.exit(1)\nelse:\n print('PASS: No ReDoS patterns detected')\n\"\n```\n\n---\n\n### Resources\n\n- **safe-regex** npm package: `npx safe-regex \"your-pattern\"` — quick single-pattern check\n- **vuln-regex-detector**: more comprehensive, uses fuzzing\n- **OWASP ReDoS prevention**: https://owasp.org/www-community/attacks/ReDoS\n- **Cloudflare outage 2019**: caused by a single ReDoS regex in a WAF rule\n- **Stack Overflow outage 2016**: caused by a whitespace-trimming regex (`\\s+$`-style) that backtracked quadratically on a line with roughly 20,000 whitespace characters\n```\n\n---\n\n## Quick Mode Output\n\n```\nReDoS Audit: my-app (847 files, 214 regexes)\n\n🔴 CRITICAL (2): 2 exponential-backtracking patterns in HTTP handlers\n src/middleware/auth.js:47 — ([a-zA-Z0-9_-]+\\.)+ nested quantifiers\n src/utils/email-validator.ts:12 — complex email 
regex, nested groups\n\n🟠 HIGH (4): polynomial backtracking\n src/api/search.js:89 — double .* in query parser (in HTTP handler ⚠️)\n src/parsers/csv.ts:23, src/lib/url.js:15, src/routes/user.ts:88\n\n🟡 MEDIUM (7): missing anchors (5), unbounded complex groups (2)\n⚪ LOW (11): locale assumptions in character classes\n\nPriority: Fix auth.js:47 first — it's in unauthenticated middleware, CRITICAL severity\nQuick win: s/([a-zA-Z0-9_-]+\\.)+/[a-zA-Z0-9_-]+(?:\\.[a-zA-Z0-9_-]+)+/ → safe in 30 seconds\n```\n---","attachment_filenames":["_meta.json"],"attachments":[{"filename":"_meta.json","content":"{\n \"owner\": \"phy041\",\n \"slug\": \"phy-regex-audit\",\n \"displayName\": \"Phy Regex Audit\",\n \"latest\": {\n \"version\": \"1.0.0\",\n \"publishedAt\": 1773799346046,\n \"commit\": \"https://github.com/openclaw/skills/commit/78d4af91ed05ce9b92239effe4464a1188172420\"\n },\n \"history\": []\n}\n","content_type":"application/json; charset=utf-8","language":"json","size":282,"content_sha256":"39f90164854528309b400406bcafd89f4c61cc9606ce745f5bd7ecd4b1ab0777"}],"content_json":{"type":"doc","content":[{"type":"heading","attrs":{"level":1},"content":[{"text":"ReDoS & Regex Quality Auditor","type":"text"}]},{"type":"paragraph","content":[{"text":"One regex. One crafted input. Your Node.js server hangs for 30 seconds.","type":"text"}]},{"type":"paragraph","content":[{"text":"ReDoS (Regular Expression Denial of Service) is real, underestimated, and embarrassingly fixable. This skill walks every source file in your project, extracts regex literals, identifies catastrophic backtracking patterns, and tells you exactly which ones are dangerous and how to fix them.","type":"text"}]},{"type":"paragraph","content":[{"text":"Supports JS/TS, Python, Go, Java, Ruby, PHP, Rust. 
Zero external API.","type":"text","marks":[{"type":"strong"}]}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":2},"content":[{"text":"Trigger Phrases","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"\"regex security\", \"ReDoS\", \"catastrophic backtracking\"","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"\"regex audit\", \"slow regex\", \"regex vulnerability\"","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"\"check my regexes\", \"regex denial of service\"","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"\"is this regex safe\", \"regex performance\"","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"\"hardcoded locale regex\", \"missing anchor\"","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"\"/regex-audit\"","type":"text"}]}]}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":2},"content":[{"text":"How to Provide Input","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"# Option 1: Audit current directory (auto-detect all source files)\n/regex-audit\n\n# Option 2: Specific directory or file\n/regex-audit src/\n/regex-audit lib/validators.js\n\n# Option 3: Focus on specific language\n/regex-audit --lang js\n/regex-audit --lang python\n\n# Option 4: Show only CRITICAL and HIGH severity\n/regex-audit --min-severity high\n\n# Option 5: Check a single regex pattern for safety\n/regex-audit --pattern \"^(a+)+\"\n\n# Option 6: Focus on HTTP handler files (highest risk)\n/regex-audit --high-risk-only\n\n# Option 7: Output machine-readable JSON for CI\n/regex-audit --json","type":"text"}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":2},"content":[{"text":"Step 
1: Discover Source Files","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"python3 -c \"\nimport glob, os\nfrom pathlib import Path\n\n# Language file patterns\npatterns = {\n 'JavaScript/TypeScript': ['**/*.js', '**/*.ts', '**/*.jsx', '**/*.tsx', '**/*.mjs'],\n 'Python': ['**/*.py'],\n 'Go': ['**/*.go'],\n 'Java': ['**/*.java'],\n 'Ruby': ['**/*.rb'],\n 'PHP': ['**/*.php'],\n 'Rust': ['**/*.rs'],\n}\n\nskip_dirs = {'node_modules', '.git', 'dist', 'build', '.next', 'vendor', '__pycache__', '.venv', 'venv'}\n\nall_files = []\nfor lang, file_patterns in patterns.items():\n lang_files = []\n for p in file_patterns:\n for f in glob.glob(p, recursive=True):\n parts = set(Path(f).parts)\n if not parts & skip_dirs:\n lang_files.append(f)\n if lang_files:\n print(f'{lang}: {len(lang_files)} files')\n all_files.extend(lang_files)\n\nprint(f'\\\\nTotal: {len(all_files)} source files to scan')\n\"","type":"text"}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":2},"content":[{"text":"Step 2: Extract Regex Literals","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"python"},"content":[{"text":"import re\nfrom pathlib import Path\nfrom dataclasses import dataclass\nfrom typing import Optional\n\n@dataclass\nclass RegexMatch:\n file: str\n line: int\n pattern: str\n raw_context: str # the surrounding code line\n language: str\n in_handler: bool # is this in an HTTP handler / validator?\n\n# Language-specific regex extraction patterns\nEXTRACTORS = {\n 'js': [\n # Regex literals: /pattern/flags\n re.compile(r'(?\u003c![=!\u003c>])\\/([^\\/\\n\\r]{3,}?)\\/([gimsuy]*)'),\n # new RegExp(\"pattern\")\n re.compile(r'new\\s+RegExp\\([\"\\']([^\"\\']{3,})[\"\\']'),\n # .test(), .match(), .exec() with string literal\n re.compile(r'\\.(?:test|match|exec|replace|search)\\([\"\\'/]([^\"\\'\\/\\n]{3,})[\"\\'/]'),\n ],\n 'python': [\n # re.compile(r\"pattern\")\n 
re.compile(r're\\.(?:compile|match|search|fullmatch|findall|finditer|sub|subn|split)\\([\"\\']([^\"\\']{3,})[\"\\']'),\n re.compile(r're\\.(?:compile|match|search|fullmatch|findall|finditer|sub|subn|split)\\(r[\"\\']([^\"\\']{3,})[\"\\']'),\n ],\n 'go': [\n # regexp.MustCompile(`pattern`)\n re.compile(r'regexp\\.(?:MustCompile|Compile|Match|MatchString)\\([\"`]([^\"`]{3,})[\"`]'),\n ],\n 'java': [\n # Pattern.compile(\"pattern\")\n re.compile(r'Pattern\\.compile\\([\"\\']([^\"\\']{3,})[\"\\']'),\n re.compile(r'\\.matches\\([\"\\']([^\"\\']{3,})[\"\\']'),\n ],\n 'ruby': [\n # /pattern/ or Regexp.new(\"pattern\")\n re.compile(r'\\/([^\\/\\n]{3,})\\/'),\n re.compile(r'Regexp\\.new\\([\"\\']([^\"\\']{3,})[\"\\']'),\n ],\n 'php': [\n # preg_match('/pattern/', ...)\n re.compile(r'preg_(?:match|replace|split|grep)\\([\"\\']([^\"\\']{3,})[\"\\']'),\n ],\n 'rust': [\n # Regex::new(r\"pattern\")\n re.compile(r'Regex::new\\([r]?[\"\\']([^\"\\']{3,})[\"\\']'),\n ],\n}\n\n# Keywords that indicate high-risk call sites\nHIGH_RISK_CONTEXTS = [\n 'app.get', 'app.post', 'app.put', 'app.delete',\n 'router.', 'express', 'fastify', 'koa',\n 'validate', 'validator', 'sanitize',\n 'request.body', 'req.body', 'req.params', 'req.query',\n 'process.argv', 'sys.argv',\n 'input(', 'readline(',\n 'url.parse', 'new URL(',\n '@app.route', 'flask.request',\n 'r.URL.Query', 'r.FormValue',\n]\n\nLANG_MAP = {\n '.js': 'js', '.jsx': 'js', '.ts': 'js', '.tsx': 'js', '.mjs': 'js',\n '.py': 'python',\n '.go': 'go',\n '.java': 'java',\n '.rb': 'ruby',\n '.php': 'php',\n '.rs': 'rust',\n}\n\n\ndef extract_regexes_from_file(fpath: str) -> list[RegexMatch]:\n \"\"\"Extract all regex literals from a source file.\"\"\"\n ext = Path(fpath).suffix.lower()\n lang = LANG_MAP.get(ext)\n if not lang or lang not in EXTRACTORS:\n return []\n\n try:\n lines = Path(fpath).read_text(encoding='utf-8', errors='replace').splitlines()\n except Exception:\n return []\n\n results = []\n for line_num, line in enumerate(lines, 
1):\n # Check if this line is in a high-risk context (look at surrounding 5 lines)\n context_window = '\\n'.join(lines[max(0, line_num-5):line_num+5])\n in_handler = any(kw in context_window for kw in HIGH_RISK_CONTEXTS)\n\n for extractor in EXTRACTORS[lang]:\n for m in extractor.finditer(line):\n pattern = m.group(1)\n if len(pattern) >= 3:\n results.append(RegexMatch(\n file=fpath,\n line=line_num,\n pattern=pattern,\n raw_context=line.strip(),\n language=lang,\n in_handler=in_handler,\n ))\n\n return results","type":"text"}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":2},"content":[{"text":"Step 3: Detect ReDoS Patterns","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"python"},"content":[{"text":"from enum import Enum\n\nclass Severity(Enum):\n CRITICAL = 4 # Exponential backtracking — proven DoS vector\n HIGH = 3 # Polynomial backtracking — slow on long inputs\n MEDIUM = 2 # Potentially slow — context-dependent\n LOW = 1 # Style/quality issue\n\n@dataclass\nclass ReDoSFinding:\n regex_match: RegexMatch\n severity: Severity\n vulnerability_type: str\n dangerous_subpattern: str\n description: str\n attack_input_example: str\n fix_suggestion: str\n\n\n# ReDoS pattern signatures\nREDOS_PATTERNS = [\n\n # ===== CRITICAL: Exponential Backtracking =====\n\n {\n 'name': 'NESTED_QUANTIFIERS',\n 'severity': Severity.CRITICAL,\n 'detector': re.compile(r'\\(([^()]{1,30}\\+[^()]{0,10})\\)\\+|\\(([^()]{1,30}\\*[^()]{0,10})\\)\\+'),\n 'description': 'Nested quantifiers (a+)+ or (a*)+ create exponential backtracking.',\n 'attack_shape': 'Long string of matching chars followed by one non-matching char',\n 'example_attack': '\"aaaaaaaaaaaaaaaaaaaaaaaaaX\"',\n 'fix': 'Use atomic group (?>...) 
or possessive quantifier — rewrite to remove nesting.',\n },\n {\n 'name': 'NESTED_STAR_PLUS',\n 'severity': Severity.CRITICAL,\n 'detector': re.compile(r'\\(([^()]{1,30})\\)\\*\\+|\\(([^()]{1,30})\\)\\+\\*'),\n 'description': 'Nested star/plus combination enables exponential path explosion.',\n 'attack_shape': 'Repeated matching chars followed by failure',\n 'example_attack': '\"aaaaaaaaaaaX\"',\n 'fix': 'Flatten quantifiers or use possessive quantifiers.',\n },\n\n # ===== HIGH: Polynomial Backtracking =====\n\n {\n 'name': 'ALTERNATION_OVERLAP',\n 'severity': Severity.HIGH,\n 'detector': re.compile(r'\\(([a-zA-Z]{1,5})\\|([a-zA-Z]{1,5})\\)\\+|\\(([a-zA-Z]{1,5})\\|([a-zA-Z]{1,5})\\)\\*'),\n 'description': 'Overlapping alternation with quantifier: (ab|a)+ causes polynomial backtracking.',\n 'attack_shape': 'Long string that partially matches both alternatives',\n 'example_attack': '\"ababababababababX\"',\n 'fix': 'Reorder alternatives longest-first; avoid overlapping prefixes. Use (?>a(?:b)?)+ instead of (ab|a)+',\n },\n {\n 'name': 'GREEDY_DOTSTAR_ANCHORED',\n 'severity': Severity.HIGH,\n 'detector': re.compile(r'\\.\\*.*\\.\\*'),\n 'description': 'Multiple .* in sequence causes O(n²) backtracking on non-matching inputs.',\n 'attack_shape': 'Long string that partially matches then fails at the end',\n 'example_attack': '\"a\" * 10000 + \"X\"',\n 'fix': 'Use [^\\\\n]* instead of .* when newlines are impossible; add anchors.',\n },\n\n # ===== MEDIUM: Potentially Slow =====\n\n {\n 'name': 'UNBOUNDED_REPETITION_COMPLEX',\n 'severity': Severity.MEDIUM,\n 'detector': re.compile(r'\\([^()]{5,}\\)\\{[0-9,]+\\}'),\n 'description': 'Large repetition count on complex group — can be slow on long non-matching inputs.',\n 'attack_shape': 'Input that triggers maximum iterations before failing',\n 'example_attack': 'Input matching N-1 repetitions but failing on last character',\n 'fix': 'Add possessive quantifier or reduce repetition scope.',\n },\n {\n 'name': 'MISSING_ANCHORS',\n 
'severity': Severity.MEDIUM,\n 'detector': None, # Checked separately\n 'description': 'Regex without ^ or $ anchors on email/URL patterns causes full-text search instead of match.',\n 'attack_shape': 'Input containing valid pattern embedded in garbage',\n 'example_attack': '\"evil.com/redirect?to=legit.com\"',\n 'fix': 'Add ^ at start and $ at end: /^pattern$/',\n },\n]\n\n\ndef analyze_regex_for_redos(rm: RegexMatch) -> list[ReDoSFinding]:\n \"\"\"Check a single regex for ReDoS vulnerabilities.\"\"\"\n findings = []\n pattern = rm.pattern\n\n for sig in REDOS_PATTERNS:\n if sig['detector'] is None:\n continue\n match = sig['detector'].search(pattern)\n if match:\n findings.append(ReDoSFinding(\n regex_match=rm,\n severity=sig['severity'],\n vulnerability_type=sig['name'],\n dangerous_subpattern=match.group(0),\n description=sig['description'],\n attack_input_example=sig.get('example_attack', ''),\n fix_suggestion=sig['fix'],\n ))\n\n # Check for missing anchors on patterns that look like email/URL/phone validators\n looks_like_validator = any(\n kw in rm.raw_context.lower()\n for kw in ['email', 'url', 'phone', 'validate', 'isValid', 'check']\n )\n if looks_like_validator and not (pattern.startswith('^') or pattern.endswith('

$'

)):\n findings.append(ReDoSFinding(\n regex_match=rm,\n severity=Severity.MEDIUM,\n vulnerability_type='MISSING_ANCHORS',\n dangerous_subpattern=pattern[:40],\n description='Validator regex lacks ^ and $ anchors — matches anywhere in string.',\n attack_input_example='\"garbage_validinput_garbage\"',\n fix_suggestion=f'Add anchors: /^{pattern[:40]}$/',\n ))\n\n return findings\n\n\ndef check_locale_assumptions(rm: RegexMatch) -> Optional[ReDoSFinding]:\n \"\"\"Detect ASCII-only character classes that should be locale-aware.\"\"\"\n ASCII_ONLY_HINTS = [\n (re.compile(r'\\[a-zA-Z\\]|\\[a-z\\]|\\[A-Z\\]'), 'Matches only ASCII letters — fails on é, ü, ñ, etc.'),\n (re.compile(r'\\[0-9\\]'), 'Use \\\\d or [0-9] explicitly; be aware \\\\d matches Unicode digits in some engines'),\n ]\n for detector, msg in ASCII_ONLY_HINTS:\n if detector.search(rm.pattern):\n return ReDoSFinding(\n regex_match=rm,\n severity=Severity.LOW,\n vulnerability_type='LOCALE_ASSUMPTION',\n dangerous_subpattern=detector.search(rm.pattern).group(0),\n description=f'ASCII-only character class: {msg}',\n attack_input_example='\"Ångström\" or \"naïve\"',\n fix_suggestion='Use \\\\p{L} (Unicode letter) if your regex engine supports it, or explicitly list the characters you need.',\n )\n return None","type":"text"}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":2},"content":[{"text":"Step 4: Run Full Scan","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"python"},"content":[{"text":"import os\nimport glob\n\ndef run_regex_audit(target_dir='.', min_severity=Severity.LOW, high_risk_only=False):\n \"\"\"Full scan: discover files, extract regexes, analyze, report.\"\"\"\n\n # Discover files\n all_files = []\n for ext in ['.js', '.jsx', '.ts', '.tsx', '.mjs', '.py', '.go', '.java', '.rb', '.php', '.rs']:\n pattern = f'{target_dir}/**/*{ext}'\n for f in glob.glob(pattern, recursive=True):\n if not any(skip in f for skip in ['node_modules', '.git', 'dist', 
'build', '.next', 'vendor']):\n all_files.append(f)\n\n # Extract and analyze\n all_findings = []\n total_regexes = 0\n\n for fpath in all_files:\n regexes = extract_regexes_from_file(fpath)\n total_regexes += len(regexes)\n\n for rm in regexes:\n if high_risk_only and not rm.in_handler:\n continue\n\n findings = analyze_regex_for_redos(rm)\n locale_finding = check_locale_assumptions(rm)\n if locale_finding:\n findings.append(locale_finding)\n\n for f in findings:\n if f.severity.value >= min_severity.value:\n all_findings.append(f)\n\n # Sort by severity desc, then file\n all_findings.sort(key=lambda x: (-x.severity.value, x.regex_match.file, x.regex_match.line))\n\n return all_findings, total_regexes, len(all_files)","type":"text"}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":2},"content":[{"text":"Step 5: Output Report","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"markdown"},"content":[{"text":"## ReDoS & Regex Security Audit\nProject: my-app | Files scanned: 847 | Regexes found: 214\n\n---\n\n### Summary\n\n| Severity | Count | Description |\n|----------|-------|-------------|\n| 🔴 CRITICAL | 2 | Exponential backtracking — proven DoS vector |\n| 🟠 HIGH | 4 | Polynomial backtracking — slow on long inputs |\n| 🟡 MEDIUM | 7 | Potentially slow or logic error |\n| ⚪ LOW | 11 | Style/locale issues |\n\n**⚠️ 3 findings are in HTTP request handlers — prioritize these.**\n\n---\n\n### 🔴 CRITICAL — Exponential Backtracking\n\n**#1 — src/middleware/auth.js:47**\n```js\n// Context: JWT token format validator in Express middleware\napp.use('/api', (req, res, next) => {\n const token = req.headers.authorization\n if (!/^([a-zA-Z0-9_-]+\\.)+[a-zA-Z0-9_-]+$/.test(token)) { ... 
}","type":"text"}]},{"type":"paragraph","content":[{"text":"Dangerous pattern:","type":"text","marks":[{"type":"strong"}]},{"text":" ","type":"text"},{"text":"([a-zA-Z0-9_-]+\\.)+","type":"text","marks":[{"type":"code_inline"}]},{"text":" ","type":"text"},{"text":"Vulnerability:","type":"text","marks":[{"type":"strong"}]},{"text":" NESTED_QUANTIFIERS — the inner ","type":"text"},{"text":"+","type":"text","marks":[{"type":"code_inline"}]},{"text":" and outer ","type":"text"},{"text":"+","type":"text","marks":[{"type":"code_inline"}]},{"text":" create exponential backtracking. ","type":"text"},{"text":"Attack input:","type":"text","marks":[{"type":"strong"}]},{"text":" ","type":"text"},{"text":"\"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\"","type":"text","marks":[{"type":"code_inline"}]},{"text":" (no dot — triggers backtracking) ","type":"text"},{"text":"DoS potential:","type":"text","marks":[{"type":"strong"}]},{"text":" Input of length 30 takes ~2 seconds on Node.js. Length 50 takes minutes. 
","type":"text"},{"text":"In HTTP handler:","type":"text","marks":[{"type":"strong"}]},{"text":" ⚠️ YES — any unauthenticated request can trigger this.","type":"text"}]},{"type":"paragraph","content":[{"text":"Fix:","type":"text","marks":[{"type":"strong"}]}]},{"type":"code_block","attrs":{"wrap":false,"language":"js"},"content":[{"text":"// Before (vulnerable):\n/^([a-zA-Z0-9_-]+\\.)+[a-zA-Z0-9_-]+$/\n\n// After (safe):\n// Option 1: Possessive quantifier (not supported in JS — use atomic group via lookbehind trick)\n// Option 2: Rewrite without nested quantifiers:\n/^[a-zA-Z0-9_-]+(?:\\.[a-zA-Z0-9_-]+)+$/\n// This eliminates the nested quantifier — the outer group cannot backtrack\n// into positions already consumed by the inner match.","type":"text"}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"paragraph","content":[{"text":"#2 — src/utils/email-validator.ts:12","type":"text","marks":[{"type":"strong"}]}]},{"type":"code_block","attrs":{"wrap":false,"language":"ts"},"content":[{"text":"const EMAIL_RE = /^(([^\u003c>()\\[\\]\\\\.,;:\\s@\"]+(\\.[^\u003c>()\\[\\]\\\\.,;:\\s@\"]+)*)|(\".+\"))@/","type":"text"}]},{"type":"paragraph","content":[{"text":"Dangerous pattern:","type":"text","marks":[{"type":"strong"}]},{"text":" ","type":"text"},{"text":"([^\u003c>...]+(\\.[^\u003c>...]+)*)","type":"text","marks":[{"type":"code_inline"}]},{"text":" ","type":"text"},{"text":"Vulnerability:","type":"text","marks":[{"type":"strong"}]},{"text":" NESTED_QUANTIFIERS inside alternation ","type":"text"},{"text":"Attack input:","type":"text","marks":[{"type":"strong"}]},{"text":" ","type":"text"},{"text":"\"a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a@\"","type":"text","marks":[{"type":"code_inline"}]},{"text":" (missing domain after @) ","type":"text"},{"text":"Fix:","type":"text","marks":[{"type":"strong"}]}]},{"type":"code_block","attrs":{"wrap":false,"language":"ts"},"content":[{"text":"// Use a proven safe email regex:\nconst EMAIL_RE = /^[^\\s@]+@[^\\s@]+\\.[^\\s@]+$/\n// For 
strict RFC 5322: use the validator.js library (pre-audited, safe)\n// npm install validator → isEmail(str)","type":"text"}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":3},"content":[{"text":"🟠 HIGH — Polynomial Backtracking","type":"text"}]},{"type":"paragraph","content":[{"text":"#3 — src/api/search.js:89","type":"text","marks":[{"type":"strong"}]}]},{"type":"code_block","attrs":{"wrap":false,"language":"js"},"content":[{"text":"// URL parameter parser\nconst queryRe = /.*id=.*&.*/","type":"text"}]},{"type":"paragraph","content":[{"text":"Vulnerability:","type":"text","marks":[{"type":"strong"}]},{"text":" GREEDY_DOTSTAR_ANCHORED — ","type":"text"},{"text":".*...*","type":"text","marks":[{"type":"code_inline"}]},{"text":" is O(n²) ","type":"text"},{"text":"Attack input:","type":"text","marks":[{"type":"strong"}]},{"text":" 10,000-char query string with no ","type":"text"},{"text":"id=","type":"text","marks":[{"type":"code_inline"}]},{"text":" → scans 10,000 × 10,000 positions ","type":"text"},{"text":"In HTTP handler:","type":"text","marks":[{"type":"strong"}]},{"text":" ⚠️ YES — ","type":"text"},{"text":"req.query","type":"text","marks":[{"type":"code_inline"}]},{"text":" passed directly","type":"text"}]},{"type":"paragraph","content":[{"text":"Fix:","type":"text","marks":[{"type":"strong"}]}]},{"type":"code_block","attrs":{"wrap":false,"language":"js"},"content":[{"text":"// Before:\n/.*id=.*&.*/\n// After (anchored, no double .* scan):\n/(?:^|&)id=([^&]+)/","type":"text"}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":3},"content":[{"text":"🟡 MEDIUM — Missing Anchors","type":"text"}]},{"type":"paragraph","content":[{"text":"#5 — src/validation/phone.ts:3","type":"text","marks":[{"type":"strong"}]}]},{"type":"code_block","attrs":{"wrap":false,"language":"ts"},"content":[{"text":"const PHONE_RE = /\\+?[0-9]{10,15}/ // used in: 
validatePhone(userInput)","type":"text"}]},{"type":"paragraph","content":[{"text":"Issue:","type":"text","marks":[{"type":"strong"}]},{"text":" No ","type":"text"},{"text":"^","type":"text","marks":[{"type":"code_inline"}]},{"text":" or ","type":"text"},{"text":"$","type":"text","marks":[{"type":"code_inline"}]},{"text":" anchor — matches ","type":"text"},{"text":"+15551234567","type":"text","marks":[{"type":"code_inline"}]},{"text":" embedded in ","type":"text"},{"text":"\"Call +15551234567 for support\"","type":"text","marks":[{"type":"code_inline"}]},{"text":". ","type":"text"},{"text":"Attack scenario:","type":"text","marks":[{"type":"strong"}]},{"text":" Attacker bypasses phone validation by embedding valid number in malicious input.","type":"text"}]},{"type":"paragraph","content":[{"text":"Fix:","type":"text","marks":[{"type":"strong"}]}]},{"type":"code_block","attrs":{"wrap":false,"language":"ts"},"content":[{"text":"const PHONE_RE = /^\\+?[0-9]{10,15}$/","type":"text"}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":3},"content":[{"text":"⚪ LOW — Locale Assumptions","type":"text"}]},{"type":"paragraph","content":[{"text":"#8 — src/utils/slugify.ts:7","type":"text","marks":[{"type":"strong"}]}]},{"type":"code_block","attrs":{"wrap":false,"language":"ts"},"content":[{"text":".replace(/[^a-zA-Z0-9-]/g, '-')","type":"text"}]},{"type":"paragraph","content":[{"text":"Issue:","type":"text","marks":[{"type":"strong"}]},{"text":" ","type":"text"},{"text":"[a-zA-Z]","type":"text","marks":[{"type":"code_inline"}]},{"text":" excludes ñ, é, ü, ç, ș — slugs for non-English content will be all dashes. 
","type":"text"},{"text":"Fix:","type":"text","marks":[{"type":"strong"}]},{"text":" Normalize Unicode first: ","type":"text"},{"text":"str.normalize('NFKD').replace(/[\\u0300-\\u036f]/g, '').replace(/[^a-z0-9-]/gi, '-')","type":"text","marks":[{"type":"code_inline"}]}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":3},"content":[{"text":"CI Integration","type":"text"}]},{"type":"paragraph","content":[{"text":"Add to your test suite or pre-commit hook:","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"# One-liner scan — exits non-zero if CRITICAL or HIGH found\npython3 -c \"\nimport re, sys, glob\n\nNESTED_Q = re.compile(r'\$([^()]{1,30}\\+[^()]{0,10})\$\\+|\$([^()]{1,30})\$\\*\\+')\nDOUBLE_STAR = re.compile(r'\\.\\*[^)]{0,10}\\.\\*')\n\nfound = []\nfor fpath in glob.glob('src/**/*.{js,ts,py}', recursive=True):\n lines = open(fpath, errors='replace').readlines()\n for i, line in enumerate(lines, 1):\n for m in re.finditer(r'/((?:[^/\\\\]|\\\\.){3,})/', line):\n pat = m.group(1)\n if NESTED_Q.search(pat) or DOUBLE_STAR.search(pat):\n found.append(f'{fpath}:{i}: {pat[:60]}')\n\nif found:\n print(f'FAIL: {len(found)} ReDoS-vulnerable regex(es) found:')\n for f in found: print(' ', f)\n sys.exit(1)\nelse:\n print(f'PASS: No ReDoS patterns detected')\n\"","type":"text"}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":3},"content":[{"text":"Resources","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"safe-regex","type":"text","marks":[{"type":"strong"}]},{"text":" npm package: ","type":"text"},{"text":"npx safe-regex \"your-pattern\"","type":"text","marks":[{"type":"code_inline"}]},{"text":" — quick single-pattern check","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"vuln-regex-detector","type":"text","marks":[{"type":"strong"}]},{"text":": more 
comprehensive, uses fuzzing","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"OWASP ReDoS prevention","type":"text","marks":[{"type":"strong"}]},{"text":": https://owasp.org/www-community/attacks/ReDoS","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Cloudflare outage 2019","type":"text","marks":[{"type":"strong"}]},{"text":": caused by a single ReDoS regex in a WAF rule","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Stack Overflow outage 2016","type":"text","marks":[{"type":"strong"}]},{"text":": caused by a whitespace-trimming regex, ","type":"text"},{"text":"^[\\s\\u200c]+|[\\s\\u200c]+$","type":"text","marks":[{"type":"code_inline"}]},{"text":", backtracking quadratically on a post that contained about 20,000 consecutive whitespace characters","type":"text"}]}]}]},{"type":"code_block","attrs":{"wrap":false,"language":""},"content":[{"text":"\n---\n\n## Quick Mode Output\n","type":"text"}]},{"type":"paragraph","content":[{"text":"ReDoS Audit: my-app (847 files, 214 regexes)","type":"text"}]},{"type":"paragraph","content":[{"text":"🔴 CRITICAL (2): 2 exponential-backtracking patterns in HTTP handlers src/middleware/auth.js:47 — ([a-zA-Z0-9_-]+.)+ nested quantifiers src/utils/email-validator.ts:12 — complex email regex, nested groups","type":"text"}]},{"type":"paragraph","content":[{"text":"🟠 HIGH (4): polynomial backtracking src/api/search.js:89 — double .* in query parser (in HTTP handler ⚠️) src/parsers/csv.ts:23, src/lib/url.js:15, src/routes/user.ts:88","type":"text"}]},{"type":"paragraph","content":[{"text":"🟡 MEDIUM (7): missing anchors (5), unbounded complex groups (2) ⚪ LOW (11): locale assumptions in character classes","type":"text"}]},{"type":"paragraph","content":[{"text":"Priority: Fix auth.js:47 first — it's in unauthenticated middleware, CRITICAL severity Quick win: s/([a-zA-Z0-9_-]+.)+/[a-zA-Z0-9_-]+(?:.[a-zA-Z0-9_-]+)+/ → safe in 30 
seconds","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":""},"content":[{"text":"---","type":"text"}]}]},"metadata":{"date":"2026-06-05","name":"phy-regex-audit","author":"@skillopedia","source":{"stars":2012,"repo_name":"openclaw-master-skills","origin_url":"https://github.com/leoyeai/openclaw-master-skills/blob/HEAD/skills/phy-regex-audit/SKILL.md","repo_owner":"leoyeai","body_sha256":"1b0f4b2bdae72d27c34a8cf34ce834b4cf9e3b65db45c7e39427b31559b33dec","cluster_key":"aba9e486f5adc2aaefaf90305f87fb9c96d32a2ae8c0d185e6cf02bbcec4ab45","clean_bundle":{"format":"clean-skill-bundle-v1","source":"leoyeai/openclaw-master-skills/skills/phy-regex-audit/SKILL.md","attachments":[{"id":"2d20022c-4522-588a-b786-1b25e2ceec8e","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/2d20022c-4522-588a-b786-1b25e2ceec8e/attachment.json","path":"_meta.json","size":282,"sha256":"39f90164854528309b400406bcafd89f4c61cc9606ce745f5bd7ecd4b1ab0777","contentType":"application/json; charset=utf-8"}],"bundle_sha256":"3891641d6f9292233fba836240db7196704836910afeb953eab2d612e83763b0","attachment_count":1,"text_attachments":1,"attachment_storage":"skillopedia-attachments-v1","binary_attachments":0,"excluded_attachments":[]},"cluster_size":1,"skill_md_path":"skills/phy-regex-audit/SKILL.md","import_metadata":{"date":"2026-06-05","author":"@skillopedia","version":"v1","category":"security","category_label":"Security"},"exact_dupes_collapsed_into_this":0},"license":"Apache-2.0","version":"v1","category":"security","metadata":{"tags":["security","regex","redos","performance","static-analysis","developer-tools","javascript","python","denial-of-service"],"author":"PHY041","version":"1.0.0"},"import_tag":"clean-skills-v1","description":"Static ReDoS (Regular Expression Denial of Service) vulnerability scanner and regex quality auditor for codebases. 
Walks all source files to extract regex literals, detects catastrophic backtracking patterns (nested quantifiers, overlapping alternation, unbounded repetition on complex groups), severity-ranks each finding as CRITICAL/HIGH/MEDIUM, reports file and line number with the dangerous sub-pattern highlighted, identifies high-risk call sites (HTTP request handlers, form validators, URL parsers), and suggests safe rewrites using atomic groups or simplified alternatives. Also detects hardcoded locale assumptions (character classes assuming ASCII), overly permissive patterns, and regexes missing anchors. Supports JS/TS, Python, Go, Java, Ruby, PHP, Rust. Zero external API — pure static analysis. Triggers on \"regex security\", \"ReDoS\", \"catastrophic backtracking\", \"regex audit\", \"slow regex\", \"regex vulnerability\", \"/regex-audit\"."}},"renderedAt":1782980912848}
