Karpathy Coder — Active Coding Discipline

Derived from Andrej Karpathy's observations on LLM coding pitfalls. This is more than a set of guidelines: it ships Python tools that detect violations, a review agent, a slash command, and a pre-commit hook.

"The models make wrong assumptions on your behalf and just run along with them without checking. They don't manage their confusion, don't seek clarifications, don't surface inconsistencies, don't present tradeoffs, don't push back when they should."

"They really like to overcomplicate code and APIs, bloat abstractions, don't clean up dead code... implemen…

| tr '\\n' ' ') \\\n --threshold medium --json > complexity.json\n - name: Diff noise check\n run: |\n python engineering/karpathy-coder/scripts/diff_surgeon.py \\\n --diff origin/main...HEAD --json > noise.json\n - name: Report\n run: |\n echo \"## Karpathy Review\" >> $GITHUB_STEP_SUMMARY\n python -c \"\n import json\n c = json.load(open('complexity.json'))\n n = json.load(open('noise.json'))\n print(f'Complexity: {c[\\\"average_score\\\"]}/100 ({c[\\\"total_findings\\\"]} findings)')\n print(f'Diff noise: {n[\\\"noise_ratio\\\"]*100:.0f}% ({n[\\\"verdict\\\"]})')\n \" >> $GITHUB_STEP_SUMMARY\n```\n\n## Team adoption\n\n1. **Start with Level 1** for a week. Let the team see the principles in action.\n2. **Add Level 2** when reviewing PRs. Run `/karpathy-check` on every PR.\n3. **Add Level 3** when the team agrees the principles are useful. Gate commits.\n4. **Add Level 4** for repos with multiple contributors or LLM-heavy workflows.\n\n**Anti-pattern:** Going straight to Level 4 without team buy-in. The principles are opinionated — teams should experience them before enforcing them.\n","content_type":"text/markdown; charset=utf-8","language":"markdown","size":3751,"content_sha256":"f03c5a3c1851e2453ce2da679848a4b1e8a47062756310a808e2d0a7048d3e9a"},{"filename":"references/karpathy-principles.md","content":"# Karpathy Principles — Full Context\n\nSource: [Andrej Karpathy on X](https://x.com/karpathy/status/2015883857489522876), January 2026.\n\n## The original observations\n\nKarpathy identified four categories of LLM coding failure:\n\n### 1. Assumption management\n\n> \"The models make wrong assumptions on your behalf and just run along with them without checking. 
They don't manage their confusion, don't seek clarifications, don't surface inconsistencies, don't present tradeoffs, don't push back when they should.\"\n\n**What this means in practice:**\n- User says \"export user data\" → LLM picks JSON, writes to disk, includes all fields, doesn't ask which users\n- User says \"make it faster\" → LLM adds caching, async, and connection pooling without asking what \"faster\" means\n- User says \"fix the bug\" → LLM guesses which bug based on context, never confirms\n\n**The fix:** Before writing ANY code, list assumptions explicitly. If there are 2+ valid interpretations, present them and ask. If something is unclear, stop and name the confusion.\n\n### 2. Overcomplexity\n\n> \"They really like to overcomplicate code and APIs, bloat abstractions, don't clean up dead code... implement a bloated construction over 1000 lines when 100 would do.\"\n\n**Why LLMs do this:**\n- Training data contains enterprise patterns (Strategy, Factory, Observer) applied at inappropriate scale\n- \"More thorough\" feels safe — the LLM can't be wrong for handling edge cases, even if they're impossible\n- No cost pressure — generating 1000 lines takes the same effort as generating 100\n\n**The fix:** Ask \"would a senior engineer say this is overcomplicated?\" after writing. If a function has one caller, it shouldn't be a class. If an abstraction serves one use case, inline it.\n\n### 3. Orthogonal edits\n\n> \"They still sometimes change/remove comments and code they don't sufficiently understand as side effects, even if orthogonal to the task.\"\n\n**Common manifestations:**\n- Reformats quote style while fixing a bug\n- Adds type annotations to unchanged functions\n- \"Improves\" a comment near the bug fix\n- Renames variables in untouched code\n- Adds docstrings to functions that weren't changed\n\n**The fix:** Every changed line must trace to the user's request. 
If you notice something unrelated that could be improved, mention it — don't change it.\n\n### 4. Weak verification loops\n\n> \"LLMs are exceptionally good at looping until they meet specific goals... Don't tell it what to do, give it success criteria and watch it go.\"\n\n**The insight:** LLMs perform dramatically better with declarative goals (\"all tests pass\") than imperative instructions (\"add a try/except block\"). The best workflow:\n\n1. Define success criteria as concrete, verifiable checks\n2. Let the LLM loop until all checks pass\n3. Each step has its own \"verify:\" annotation\n\n## When to relax each principle\n\n| Principle | Relax when... |\n|---|---|\n| Think Before Coding | The request is unambiguous and self-contained (e.g., \"add a return statement on line 42\") |\n| Simplicity First | The user explicitly asked for an abstraction, configuration, or extensibility |\n| Surgical Changes | The user said \"refactor this file\" or \"clean up this module\" |\n| Goal-Driven Execution | The task is a one-liner with obvious correctness (e.g., rename a variable) |\n\n## The 80/20 of enforcement\n\nIf you adopt only ONE principle, adopt **Surgical Changes** (#3). It's the most measurable (diff analysis), the most commonly violated (LLMs love to \"improve\" things), and the easiest to check (does the diff contain lines unrelated to the task?).\n\nIf you adopt TWO, add **Simplicity First** (#2). Overcomplexity is the second-most-common failure and the most expensive to fix (you ship abstraction debt, then maintain it forever).\n","content_type":"text/markdown; charset=utf-8","language":"markdown","size":3754,"content_sha256":"c1470d2f73763d8a740830b39e3eff2784e35b7588a95bbb991754d290893bff"},{"filename":"scripts/assumption_linter.py","content":"#!/usr/bin/env python3\n\"\"\"\nassumption_linter.py — Detect hidden assumptions in a plan or proposal.\n\nKarpathy Principle #1 (Think Before Coding): \"State your assumptions\nexplicitly. If uncertain, ask. 
If multiple interpretations exist, present\nthem — don't pick silently.\"\n\nReads a markdown plan (or stdin) and flags:\n - Phrases that indicate silent choices (\"I'll just...\", \"Obviously...\", \"Simply...\")\n - Missing scope boundaries (\"export\" without specifying what/who/how)\n - Format/location assumptions without explicit mention\n - Single-interpretation language for ambiguous requirements\n - Missing error/edge-case consideration\n\nUsage:\n python assumption_linter.py plan.md\n echo \"I'll add a function to export user data\" | python assumption_linter.py -\n python assumption_linter.py plan.md --json\n\nThis is a heuristic tool, not a proof engine. False positives are expected;\nthe point is to trigger a conversation about assumptions.\n\"\"\"\nfrom __future__ import annotations\nimport argparse\nimport json\nimport re\nimport sys\nfrom pathlib import Path\n\n# --- Pattern library ---\n\nASSUMPTION_SIGNALS = [\n (re.compile(r\"\\b(?:I'll just|let me just|we can just|just)\\b\", re.I),\n \"assumption-just\", \"'just' often hides complexity. What's being skipped?\"),\n (re.compile(r\"\\b(?:obviously|clearly|of course|naturally)\\b\", re.I),\n \"assumption-obvious\", \"Signals an unstated assumption. Is it really obvious?\"),\n (re.compile(r\"\\b(?:simply|straightforward|trivial|easy)\\b\", re.I),\n \"assumption-simple\", \"Minimizing language. Could be hiding real complexity.\"),\n (re.compile(r\"\\b(?:should be fine|should work|shouldn't be a problem)\\b\", re.I),\n \"assumption-hopeful\", \"Hopeful rather than verified. How will you confirm?\"),\n (re.compile(r\"\\b(?:I assume|assuming|I'm guessing|probably)\\b\", re.I),\n \"assumption-explicit\", \"At least it's explicit — but have you verified?\"),\n (re.compile(r\"\\b(?:all users|every|everything|always|never)\\b\", re.I),\n \"scope-absolute\", \"Absolute scope. 
Is that really the case?\"),\n]\n\nMISSING_CLARIFICATION = [\n (re.compile(r\"\\b(?:export|import|save|load|fetch|send)\\b.*\\b(?:data|file|users)\\b\", re.I),\n \"missing-format\", \"Export/save/fetch mentioned but format not specified (JSON? CSV? API?)\"),\n (re.compile(r\"\\b(?:fix|improve|optimize|refactor|update)\\b\", re.I),\n \"vague-action\", \"Vague action verb. What specifically changes? What's the measurable improvement?\"),\n (re.compile(r\"\\b(?:handle|deal with|take care of)\\b.*\\b(?:error|edge|case)\\b\", re.I),\n \"vague-error-handling\", \"Error handling mentioned vaguely. Which errors? What behavior?\"),\n (re.compile(r\"\\b(?:the user|users)\\b(?!.*\\b(?:who|which|specific|certain|admin|role)\\b)\", re.I),\n \"unscoped-user\", \"Which user(s)? All? Specific role? Authenticated only?\"),\n]\n\nNO_VERIFICATION = [\n (re.compile(r\"^(?:(?!(?:test|verify|check|assert|confirm|ensure|validate)).)*$\", re.I),\n \"no-verification\", \"No verification step found in this block. How will you know it works?\"),\n]\n\n\ndef lint_text(text, source_name=\"stdin\"):\n \"\"\"Lint a plan text. 
Return list of findings.\"\"\"\n findings = []\n lines = text.splitlines()\n\n for i, line in enumerate(lines, 1):\n stripped = line.strip()\n if not stripped or stripped.startswith(\"#\"):\n continue\n\n for pattern, category, message in ASSUMPTION_SIGNALS:\n for m in pattern.finditer(stripped):\n findings.append({\n \"line\": i,\n \"category\": category,\n \"matched\": m.group(0),\n \"message\": message,\n \"context\": stripped[:120],\n })\n\n for pattern, category, message in MISSING_CLARIFICATION:\n if pattern.search(stripped):\n findings.append({\n \"line\": i,\n \"category\": category,\n \"matched\": pattern.search(stripped).group(0),\n \"message\": message,\n \"context\": stripped[:120],\n })\n\n # Check if any \"plan\" or numbered-list block lacks verification\n plan_blocks = re.findall(r\"(?:^|\\n)((?:\\d+\\.\\s+.+\\n?)+)\", text)\n for block in plan_blocks:\n has_verify = bool(re.search(r\"\\b(?:test|verify|check|assert|confirm|ensure|validate)\\b\", block, re.I))\n if not has_verify:\n findings.append({\n \"line\": 0,\n \"category\": \"missing-verification\",\n \"matched\": block[:80].replace(\"\\n\", \" \"),\n \"message\": \"Plan block has no verification step. Add 'verify:' checks.\",\n \"context\": block[:120].replace(\"\\n\", \" \"),\n })\n\n return findings\n\n\ndef main():\n p = argparse.ArgumentParser(\n description=\"Detect hidden assumptions in a plan or proposal (Karpathy Principle #1).\",\n epilog=\"Reads a markdown file or stdin. 
Flags silent choices, vague actions, and missing verification.\",\n )\n p.add_argument(\"input\", nargs=\"?\", default=\"-\", help=\"Markdown file to lint, or - for stdin\")\n p.add_argument(\"--json\", action=\"store_true\", help=\"JSON output\")\n args = p.parse_args()\n\n if args.input == \"-\":\n text = sys.stdin.read()\n source = \"stdin\"\n else:\n path = Path(args.input)\n if not path.exists():\n print(f\"[error] {path} not found\", file=sys.stderr)\n sys.exit(1)\n text = path.read_text(encoding=\"utf-8\", errors=\"replace\")\n source = str(path)\n\n findings = lint_text(text, source)\n\n categories = {}\n for f in findings:\n categories.setdefault(f[\"category\"], []).append(f)\n\n result = {\n \"status\": \"ok\",\n \"source\": source,\n \"total_findings\": len(findings),\n \"by_category\": {k: len(v) for k, v in categories.items()},\n \"verdict\": \"CLEAN\" if len(findings) == 0 else (\"REVIEW\" if len(findings) \u003c 5 else \"CLARIFY\"),\n \"findings\": findings,\n }\n\n if args.json:\n print(json.dumps(result, indent=2))\n return\n\n print(f\"Assumption Linter — {source}\")\n print(f\"Findings: {len(findings)} Verdict: {result['verdict']}\")\n if findings:\n print()\n for cat, items in categories.items():\n print(f\" [{cat}] ({len(items)})\")\n for item in items[:5]:\n line_ref = f\"L{item['line']}: \" if item[\"line\"] else \"\"\n print(f\" {line_ref}{item['message']}\")\n print(f\" → \\\"{item['matched']}\\\" in: {item['context'][:80]}\")\n if len(items) > 5:\n print(f\" ... and {len(items) - 5} more\")\n print()\n else:\n print(\"\\n Plan looks explicit. 
Assumptions are surfaced.\")\n\n print(f\"\\nVerdict: {result['verdict']}\")\n\n\nif __name__ == \"__main__\":\n main()\n","content_type":"text/x-python; charset=utf-8","language":"python","size":6830,"content_sha256":"7569dc75fdff4533f0b13070a33f561a31a6f451cb8c2a6d06ab425eea78b191"},{"filename":"scripts/complexity_checker.py","content":"#!/usr/bin/env python3\n\"\"\"\ncomplexity_checker.py — Detect over-engineering in Python/TypeScript files.\n\nKarpathy Principle #2 (Simplicity First): \"No abstractions for single-use code.\nIf you write 200 lines and it could be 50, rewrite it.\"\n\nChecks:\n - Cyclomatic complexity (branches per function)\n - Class count relative to file size (too many classes = premature abstraction)\n - Nesting depth (deep nesting = hard to read)\n - Function length (long functions = doing too much)\n - Import count (many imports = over-coupled)\n - Abstract base classes / protocols for small files (premature patterns)\n\nUsage:\n python complexity_checker.py path/to/file.py\n python complexity_checker.py src/ --threshold medium\n python complexity_checker.py . 
--ext py,ts --json\n\nThresholds:\n strict — flags aggressively (good for new code)\n medium — balanced (default)\n relaxed — flags only egregious cases (good for legacy code)\n\"\"\"\nfrom __future__ import annotations\nimport argparse\nimport json\nimport os\nimport re\nimport sys\nfrom pathlib import Path\n\n# --- Thresholds ---\n\nTHRESHOLDS = {\n \"strict\": {\n \"max_cyclomatic\": 5,\n \"max_nesting\": 3,\n \"max_function_lines\": 30,\n \"max_imports\": 10,\n \"max_classes_per_100_lines\": 2,\n \"max_file_lines\": 300,\n },\n \"medium\": {\n \"max_cyclomatic\": 8,\n \"max_nesting\": 4,\n \"max_function_lines\": 50,\n \"max_imports\": 15,\n \"max_classes_per_100_lines\": 3,\n \"max_file_lines\": 500,\n },\n \"relaxed\": {\n \"max_cyclomatic\": 12,\n \"max_nesting\": 5,\n \"max_function_lines\": 80,\n \"max_imports\": 25,\n \"max_classes_per_100_lines\": 5,\n \"max_file_lines\": 1000,\n },\n}\n\n# --- Analysis functions ---\n\nBRANCH_KEYWORDS_PY = re.compile(\n r\"^\\s*(if |elif |for |while |except |with |and |or |case )\", re.MULTILINE\n)\nBRANCH_KEYWORDS_TS = re.compile(\n r\"^\\s*(if\\s*\\(|else if|for\\s*\\(|while\\s*\\(|catch\\s*\\(|case |switch\\s*\\(|\\?\\?|&&|\\|\\|)\",\n re.MULTILINE,\n)\nFUNC_DEF_PY = re.compile(r\"^\\s*(?:async\\s+)?def\\s+(\\w+)\", re.MULTILINE)\nFUNC_DEF_TS = re.compile(\n r\"^\\s*(?:export\\s+)?(?:async\\s+)?(?:function\\s+(\\w+)|(?:const|let)\\s+(\\w+)\\s*=\\s*(?:async\\s+)?\\()\",\n re.MULTILINE,\n)\nCLASS_DEF_PY = re.compile(r\"^\\s*class\\s+\\w+\", re.MULTILINE)\nCLASS_DEF_TS = re.compile(r\"^\\s*(?:export\\s+)?(?:abstract\\s+)?class\\s+\\w+\", re.MULTILINE)\nIMPORT_PY = re.compile(r\"^(?:import |from \\S+ import )\", re.MULTILINE)\nIMPORT_TS = re.compile(r\"^import\\s+\", re.MULTILINE)\nABC_PATTERN = re.compile(r\"ABC|abstractmethod|Protocol|@abstract|Abstract\\w+Base\", re.MULTILINE)\nINDENT_RE = re.compile(r\"^( *)\\S\", re.MULTILINE)\n\n\ndef detect_lang(path):\n ext = path.suffix.lower()\n if ext in {\".py\"}:\n return 
\"python\"\n if ext in {\".ts\", \".tsx\", \".js\", \".jsx\"}:\n return \"typescript\"\n return None\n\n\ndef count_branches(text, lang):\n pat = BRANCH_KEYWORDS_PY if lang == \"python\" else BRANCH_KEYWORDS_TS\n return len(pat.findall(text))\n\n\ndef extract_functions(text, lang):\n \"\"\"Return list of (name, start_line, line_count).\"\"\"\n pat = FUNC_DEF_PY if lang == \"python\" else FUNC_DEF_TS\n lines = text.splitlines()\n funcs = []\n for m in pat.finditer(text):\n name = m.group(1) or (m.group(2) if m.lastindex and m.lastindex >= 2 else \"anonymous\")\n start = text[:m.start()].count(\"\\n\")\n # Estimate function length: count indented lines until next same-level def or end\n indent = len(m.group(0)) - len(m.group(0).lstrip())\n end = start + 1\n for i in range(start + 1, len(lines)):\n stripped = lines[i].rstrip()\n if not stripped:\n continue\n line_indent = len(stripped) - len(stripped.lstrip())\n if line_indent \u003c= indent and stripped.lstrip() and not stripped.lstrip().startswith((\"#\", \"//\", \"/*\", \"*\")):\n if lang == \"python\" and (stripped.lstrip().startswith(\"def \") or stripped.lstrip().startswith(\"class \") or stripped.lstrip().startswith(\"async def \")):\n break\n if lang == \"typescript\" and pat.match(stripped):\n break\n end = i + 1\n funcs.append({\"name\": name, \"start_line\": start + 1, \"lines\": end - start})\n return funcs\n\n\ndef max_nesting(text, lang):\n \"\"\"Return the maximum indentation depth in the file.\"\"\"\n if lang == \"python\":\n unit = 4\n else:\n unit = 2\n depths = []\n for m in INDENT_RE.finditer(text):\n spaces = len(m.group(1))\n depths.append(spaces // unit if unit else 0)\n return max(depths) if depths else 0\n\n\ndef analyze_file(path, thresholds):\n \"\"\"Analyze a single file. 
Return dict with findings.\"\"\"\n text = path.read_text(encoding=\"utf-8\", errors=\"replace\")\n lang = detect_lang(path)\n if not lang:\n return None\n\n lines = text.splitlines()\n line_count = len(lines)\n findings = []\n\n # File length\n if line_count > thresholds[\"max_file_lines\"]:\n findings.append({\n \"rule\": \"file-length\",\n \"severity\": \"warn\",\n \"message\": f\"File is {line_count} lines (max {thresholds['max_file_lines']}). Consider splitting.\",\n })\n\n # Import count\n imp_pat = IMPORT_PY if lang == \"python\" else IMPORT_TS\n import_count = len(imp_pat.findall(text))\n if import_count > thresholds[\"max_imports\"]:\n findings.append({\n \"rule\": \"import-count\",\n \"severity\": \"warn\",\n \"message\": f\"{import_count} imports (max {thresholds['max_imports']}). High coupling?\",\n })\n\n # Class density\n cls_pat = CLASS_DEF_PY if lang == \"python\" else CLASS_DEF_TS\n class_count = len(cls_pat.findall(text))\n if line_count > 0:\n density = class_count / (line_count / 100)\n if density > thresholds[\"max_classes_per_100_lines\"]:\n findings.append({\n \"rule\": \"class-density\",\n \"severity\": \"warn\",\n \"message\": f\"{class_count} classes in {line_count} lines ({density:.1f} per 100). Premature abstraction?\",\n })\n\n # Premature ABC/Protocol in small files\n if class_count > 0 and line_count \u003c 200 and ABC_PATTERN.search(text):\n findings.append({\n \"rule\": \"premature-abstraction\",\n \"severity\": \"warn\",\n \"message\": \"Abstract base class / Protocol in a file under 200 lines. Is this needed yet?\",\n })\n\n # Nesting depth\n depth = max_nesting(text, lang)\n if depth > thresholds[\"max_nesting\"]:\n findings.append({\n \"rule\": \"nesting-depth\",\n \"severity\": \"warn\",\n \"message\": f\"Max nesting depth {depth} (max {thresholds['max_nesting']}). 
Extract or flatten.\",\n })\n\n # Cyclomatic complexity (file-level)\n branches = count_branches(text, lang)\n funcs = extract_functions(text, lang)\n func_count = max(len(funcs), 1)\n avg_cyclomatic = branches / func_count\n if avg_cyclomatic > thresholds[\"max_cyclomatic\"]:\n findings.append({\n \"rule\": \"cyclomatic-complexity\",\n \"severity\": \"warn\",\n \"message\": f\"Average cyclomatic complexity {avg_cyclomatic:.1f} (max {thresholds['max_cyclomatic']}). Simplify branching.\",\n })\n\n # Function length\n for f in funcs:\n if f[\"lines\"] > thresholds[\"max_function_lines\"]:\n findings.append({\n \"rule\": \"function-length\",\n \"severity\": \"warn\",\n \"message\": f\"Function '{f['name']}' is {f['lines']} lines (max {thresholds['max_function_lines']}). Split it.\",\n \"line\": f[\"start_line\"],\n })\n\n score = max(0, 100 - len(findings) * 15)\n return {\n \"file\": str(path),\n \"language\": lang,\n \"lines\": line_count,\n \"functions\": len(funcs),\n \"classes\": class_count,\n \"imports\": import_count,\n \"max_nesting\": depth,\n \"avg_cyclomatic\": round(avg_cyclomatic, 1),\n \"score\": score,\n \"findings\": findings,\n }\n\n\ndef collect_files(target, extensions):\n target = Path(target)\n if target.is_file():\n return [target]\n files = []\n for ext in extensions:\n files.extend(target.rglob(f\"*.{ext}\"))\n # Exclude common non-source dirs\n skip = {\"node_modules\", \".git\", \"__pycache__\", \".venv\", \"venv\", \"dist\", \"build\"}\n return [f for f in files if not any(p in skip for p in f.parts)]\n\n\ndef main():\n p = argparse.ArgumentParser(\n description=\"Detect over-engineering in Python/TypeScript files (Karpathy Principle #2).\",\n epilog=\"Thresholds: strict (new code), medium (default), relaxed (legacy).\",\n )\n p.add_argument(\"target\", help=\"File or directory to analyze\")\n p.add_argument(\n \"--threshold\",\n choices=sorted(THRESHOLDS.keys()),\n default=\"medium\",\n help=\"Strictness level (default: medium)\",\n )\n 
p.add_argument(\n \"--ext\",\n default=\"py,ts,tsx,js,jsx\",\n help=\"Comma-separated file extensions to scan (default: py,ts,tsx,js,jsx)\",\n )\n p.add_argument(\"--json\", action=\"store_true\", help=\"JSON output\")\n args = p.parse_args()\n\n thresholds = THRESHOLDS[args.threshold]\n extensions = [e.strip().lstrip(\".\") for e in args.ext.split(\",\")]\n files = collect_files(args.target, extensions)\n\n if not files:\n msg = f\"No files found matching extensions: {extensions}\"\n if args.json:\n print(json.dumps({\"status\": \"error\", \"message\": msg}))\n else:\n print(f\"[error] {msg}\", file=sys.stderr)\n sys.exit(1)\n\n results = []\n for f in sorted(files):\n r = analyze_file(f, thresholds)\n if r:\n results.append(r)\n\n total_findings = sum(len(r[\"findings\"]) for r in results)\n avg_score = sum(r[\"score\"] for r in results) / len(results) if results else 100\n\n summary = {\n \"status\": \"ok\",\n \"threshold\": args.threshold,\n \"files_analyzed\": len(results),\n \"total_findings\": total_findings,\n \"average_score\": round(avg_score, 1),\n \"verdict\": \"PASS\" if total_findings == 0 else (\"WARN\" if avg_score >= 50 else \"FAIL\"),\n \"results\": results,\n }\n\n if args.json:\n print(json.dumps(summary, indent=2))\n return\n\n print(f\"Karpathy Simplicity Check — {len(results)} files, threshold: {args.threshold}\")\n print(f\"Average score: {avg_score:.0f}/100 Findings: {total_findings}\")\n print()\n for r in results:\n if not r[\"findings\"]:\n continue\n print(f\" {r['file']} (score {r['score']}/100)\")\n for f in r[\"findings\"]:\n line = f\" line {f['line']}\" if \"line\" in f else \"\"\n print(f\" [{f['severity'].upper()}] {f['rule']}{line}: {f['message']}\")\n print()\n if total_findings == 0:\n print(\" No findings. 
Code looks appropriately simple.\")\n print(f\"\\nVerdict: {summary['verdict']}\")\n\n\nif __name__ == \"__main__\":\n main()\n","content_type":"text/x-python; charset=utf-8","language":"python","size":10976,"content_sha256":"d521f491b956484e00744a3b81713dfb4a9cd97a1768db9408605e181c91e80e"},{"filename":"scripts/diff_surgeon.py","content":"#!/usr/bin/env python3\n\"\"\"\ndiff_surgeon.py — Detect diff noise: changes that don't trace to the stated goal.\n\nKarpathy Principle #3 (Surgical Changes): \"Every changed line should trace\ndirectly to the user's request.\"\n\nAnalyzes a git diff and flags:\n - Comment-only changes (unrelated to the task)\n - Whitespace / formatting changes\n - Import additions not used by the new code\n - Style changes (quote style, trailing commas, semicolons)\n - Docstring additions to unchanged functions\n - Variable renames in untouched code\n - Type annotation additions to unchanged signatures\n\nUsage:\n python diff_surgeon.py # analyze staged diff\n python diff_surgeon.py --diff HEAD~1..HEAD # analyze last commit\n python diff_surgeon.py --file changes.diff # analyze a diff file\n python diff_surgeon.py --json\n\nExit codes:\n 0 clean — all changes look intentional\n 1 noise detected — review before committing\n\"\"\"\nfrom __future__ import annotations\nimport argparse\nimport json\nimport re\nimport subprocess\nimport sys\nfrom pathlib import Path\n\n# --- Noise detectors ---\n\nCOMMENT_ONLY = re.compile(r\"^[+-]\\s*(?:#|//|/\\*|\\*|\u003c!--)\")\nWHITESPACE_ONLY = re.compile(r\"^[+-]\\s*$\")\nQUOTE_CHANGE = re.compile(r'^[+-]\\s*.*[\"\\'].*[\"\\']')\nDOCSTRING_ADD = re.compile(r'^[+]\\s*\"\"\"')\nIMPORT_LINE = re.compile(r\"^[+]\\s*(?:import |from \\S+ import |const .* = require)\")\nTYPE_ANNOTATION = re.compile(r\"^[+-].*:\\s*(?:str|int|float|bool|list|dict|Optional|Union|Any|string|number|boolean)\\b\")\nSEMICOLON_CHANGE = re.compile(r\"^[+-].*;\\s*$\")\nTRAILING_COMMA = re.compile(r\"^[+-].*,\\s*$\")\n\n\ndef get_diff(args):\n 
\"\"\"Get diff text from args.\"\"\"\n if args.file:\n return Path(args.file).read_text(encoding=\"utf-8\", errors=\"replace\")\n diff_range = args.diff or \"--staged\"\n cmd = [\"git\", \"diff\", diff_range] if diff_range != \"--staged\" else [\"git\", \"diff\", \"--staged\"]\n try:\n result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)\n return result.stdout\n except (subprocess.TimeoutExpired, FileNotFoundError) as e:\n print(f\"[error] git diff failed: {e}\", file=sys.stderr)\n sys.exit(1)\n\n\ndef parse_hunks(diff_text):\n \"\"\"Parse a unified diff into per-file hunks.\"\"\"\n files = []\n current_file = None\n current_lines = []\n\n for line in diff_text.splitlines():\n if line.startswith(\"diff --git\"):\n if current_file:\n files.append({\"file\": current_file, \"lines\": current_lines})\n # Extract filename: diff --git a/path b/path\n parts = line.split(\" b/\")\n current_file = parts[-1] if len(parts) > 1 else \"unknown\"\n current_lines = []\n elif line.startswith(\"+++ \") or line.startswith(\"--- \"):\n continue\n elif line.startswith(\"@@\"):\n current_lines.append({\"type\": \"hunk_header\", \"text\": line})\n elif line.startswith(\"+\") or line.startswith(\"-\"):\n current_lines.append({\"type\": \"change\", \"text\": line})\n\n if current_file:\n files.append({\"file\": current_file, \"lines\": current_lines})\n return files\n\n\ndef classify_line(line_text):\n \"\"\"Classify a changed line. 
Returns a noise category or None if intentional.\"\"\"\n if WHITESPACE_ONLY.match(line_text):\n return \"whitespace\"\n if COMMENT_ONLY.match(line_text):\n return \"comment-only\"\n if DOCSTRING_ADD.match(line_text):\n return \"docstring-addition\"\n if SEMICOLON_CHANGE.match(line_text):\n # Check if ONLY change is semicolon\n stripped = line_text[1:].rstrip(\";\").rstrip()\n if not stripped.strip():\n return None\n return \"semicolon-style\"\n return None\n\n\ndef analyze_file_diff(file_data):\n \"\"\"Analyze a single file's diff for noise.\"\"\"\n findings = []\n change_lines = [l for l in file_data[\"lines\"] if l[\"type\"] == \"change\"]\n total_changes = len(change_lines)\n\n if total_changes == 0:\n return findings\n\n # Detect paired +/- that are only whitespace/style changes\n additions = [l[\"text\"] for l in change_lines if l[\"text\"].startswith(\"+\")]\n deletions = [l[\"text\"] for l in change_lines if l[\"text\"].startswith(\"-\")]\n\n noise_count = 0\n for line_data in change_lines:\n category = classify_line(line_data[\"text\"])\n if category:\n noise_count += 1\n findings.append({\n \"category\": category,\n \"line\": line_data[\"text\"][:120],\n })\n\n # Detect quote-style swaps (paired changes where only quotes differ)\n for a, d in zip(sorted(additions), sorted(deletions)):\n a_norm = a[1:].replace('\"', \"'\").strip()\n d_norm = d[1:].replace('\"', \"'\").strip()\n if a_norm == d_norm and a[1:].strip() != d[1:].strip():\n findings.append({\n \"category\": \"quote-style-swap\",\n \"line\": f\"{d[:60]} → {a[:60]}\",\n })\n\n noise_ratio = noise_count / total_changes if total_changes > 0 else 0\n return findings\n\n\ndef main():\n p = argparse.ArgumentParser(\n description=\"Detect diff noise — changes that don't trace to the stated goal (Karpathy Principle #3).\",\n epilog=\"Run before committing to catch drive-by refactors and style drift.\",\n )\n p.add_argument(\"--diff\", default=None, help=\"Git diff range (e.g. HEAD~1..HEAD). 
Default: staged changes.\")\n p.add_argument(\"--file\", default=None, help=\"Read diff from a file instead of git\")\n p.add_argument(\"--json\", action=\"store_true\", help=\"JSON output\")\n args = p.parse_args()\n\n diff_text = get_diff(args)\n if not diff_text.strip():\n result = {\"status\": \"ok\", \"message\": \"No diff to analyze\", \"files\": 0, \"noise_lines\": 0, \"verdict\": \"CLEAN\"}\n if args.json:\n print(json.dumps(result, indent=2))\n else:\n print(\"No diff to analyze. Stage changes first (git add) or specify --diff range.\")\n return\n\n file_diffs = parse_hunks(diff_text)\n all_findings = []\n file_results = []\n\n for fd in file_diffs:\n findings = analyze_file_diff(fd)\n if findings:\n file_results.append({\"file\": fd[\"file\"], \"findings\": findings})\n all_findings.extend(findings)\n\n total_noise = len(all_findings)\n total_changes = sum(\n len([l for l in fd[\"lines\"] if l[\"type\"] == \"change\"]) for fd in file_diffs\n )\n noise_ratio = total_noise / total_changes if total_changes > 0 else 0\n\n verdict = \"CLEAN\" if noise_ratio \u003c 0.1 else (\"NOISY\" if noise_ratio \u003c 0.3 else \"VERY_NOISY\")\n\n result = {\n \"status\": \"ok\",\n \"files_in_diff\": len(file_diffs),\n \"total_change_lines\": total_changes,\n \"noise_lines\": total_noise,\n \"noise_ratio\": round(noise_ratio, 2),\n \"verdict\": verdict,\n \"file_results\": file_results,\n }\n\n if args.json:\n print(json.dumps(result, indent=2))\n return\n\n print(f\"Diff Surgeon — {len(file_diffs)} files, {total_changes} changed lines\")\n print(f\"Noise ratio: {noise_ratio:.0%} ({total_noise} noise lines)\")\n print(f\"Verdict: {verdict}\")\n if file_results:\n print()\n for fr in file_results:\n print(f\" {fr['file']}:\")\n categories = {}\n for f in fr[\"findings\"]:\n categories.setdefault(f[\"category\"], []).append(f[\"line\"])\n for cat, lines in categories.items():\n print(f\" [{cat}] {len(lines)} instance(s)\")\n for l in lines[:3]:\n print(f\" {l}\")\n if 
len(lines) > 3:\n print(f\" ... and {len(lines) - 3} more\")\n print()\n print(\"Recommendation: review flagged lines. Remove changes that don't trace to your task.\")\n else:\n print(\"\\n All changes look intentional. Clean diff.\")\n\n sys.exit(1 if verdict != \"CLEAN\" else 0)\n\n\nif __name__ == \"__main__\":\n main()\n","content_type":"text/x-python; charset=utf-8","language":"python","size":7918,"content_sha256":"982ae9dbc050d3cf2b16968c13adf3acfe8b700d908efa5b14c94f12594c2f12"},{"filename":"scripts/goal_verifier.py","content":"#!/usr/bin/env python3\n\"\"\"\ngoal_verifier.py — Check if a plan has verifiable success criteria.\n\nKarpathy Principle #4 (Goal-Driven Execution): \"Define success criteria.\nLoop until verified. Don't tell it what to do — give it success criteria\nand watch it go.\"\n\nReads a markdown plan and scores:\n - Does each step have a verification check?\n - Are success criteria concrete (test, assertion, measurement)?\n - Are there vague criteria (\"make it work\", \"looks good\")?\n - Is there a final verification step?\n\nUsage:\n python goal_verifier.py plan.md\n python goal_verifier.py plan.md --json\n\nScoring:\n Each plan step gets 0-3 points:\n 3 = concrete verification (test assertion, metric, command)\n 2 = reasonable verification (manual check, visual)\n 1 = vague verification (\"should work\", \"looks right\")\n 0 = no verification mentioned\n\"\"\"\nfrom __future__ import annotations\nimport argparse\nimport json\nimport re\nimport sys\nfrom pathlib import Path\n\nCONCRETE_VERIFY = re.compile(\n r\"\\b(?:test\\s+pass|assert|assertEqual|expect\\(|\\.toBe|\\.toEqual|\"\n r\"exit\\s+code\\s*[=:]\\s*0|status\\s*[=:]\\s*200|curl\\s|\"\n r\"grep\\s|diff\\s|python.*test|npm\\s+test|pytest|jest|\"\n r\"measure|benchmark|metric|latency\\s*\u003c|throughput\\s*>)\\b\",\n re.I,\n)\nREASONABLE_VERIFY = re.compile(\n r\"\\b(?:verify|check|confirm|inspect|review|compare|validate|\"\n 
r\"run\\s+and\\s+see|manually|open\\s+in\\s+browser|visual|screenshot)\\b\",\n re.I,\n)\nVAGUE_VERIFY = re.compile(\n r\"\\b(?:should\\s+work|looks?\\s+(?:good|right|fine|ok)|\"\n r\"seems?\\s+(?:correct|fine)|hopefully|probably\\s+works?)\\b\",\n re.I,\n)\nSTEP_PATTERN = re.compile(r\"^(?:\\d+[\\.\\)]\\s+|[-*]\\s+\\[.\\]\\s+|[-*]\\s+(?:Step\\s+\\d+))\", re.M)\nVERIFY_LABEL = re.compile(r\"(?:verify|check|success\\s+criteria|done\\s+when|acceptance)\\s*:\", re.I)\n\n\ndef extract_steps(text):\n \"\"\"Extract plan steps from markdown.\"\"\"\n lines = text.splitlines()\n steps = []\n current_step = None\n current_body = []\n\n for line in lines:\n if STEP_PATTERN.match(line.strip()):\n if current_step:\n steps.append({\"title\": current_step, \"body\": \"\\n\".join(current_body)})\n current_step = line.strip()\n current_body = []\n elif current_step:\n current_body.append(line)\n\n if current_step:\n steps.append({\"title\": current_step, \"body\": \"\\n\".join(current_body)})\n\n return steps\n\n\ndef score_step(step):\n \"\"\"Score a step's verification quality (0-3).\"\"\"\n full_text = step[\"title\"] + \"\\n\" + step[\"body\"]\n\n if CONCRETE_VERIFY.search(full_text):\n return 3, \"concrete\"\n if VERIFY_LABEL.search(full_text) and REASONABLE_VERIFY.search(full_text):\n return 2, \"reasonable\"\n if REASONABLE_VERIFY.search(full_text):\n return 2, \"reasonable\"\n if VAGUE_VERIFY.search(full_text):\n return 1, \"vague\"\n return 0, \"none\"\n\n\ndef analyze_plan(text, source):\n \"\"\"Analyze a plan for verification quality.\"\"\"\n steps = extract_steps(text)\n\n if not steps:\n return {\n \"status\": \"ok\",\n \"source\": source,\n \"steps_found\": 0,\n \"message\": \"No numbered/bulleted plan steps found. 
Is this a plan?\",\n \"verdict\": \"NO_PLAN\",\n \"score\": 0,\n \"max_score\": 0,\n \"step_results\": [],\n }\n\n step_results = []\n total_score = 0\n max_score = len(steps) * 3\n\n for step in steps:\n pts, level = score_step(step)\n total_score += pts\n step_results.append({\n \"title\": step[\"title\"][:120],\n \"score\": pts,\n \"level\": level,\n \"has_verify_label\": bool(VERIFY_LABEL.search(step[\"body\"])),\n })\n\n # Check for final verification\n has_final = False\n if steps:\n last_full = steps[-1][\"title\"] + steps[-1][\"body\"]\n if re.search(r\"\\b(?:final|end-to-end|full.*test|regression|all.*pass)\\b\", last_full, re.I):\n has_final = True\n\n pct = (total_score / max_score * 100) if max_score > 0 else 0\n if pct >= 70:\n verdict = \"STRONG\"\n elif pct >= 40:\n verdict = \"WEAK\"\n else:\n verdict = \"MISSING\"\n\n return {\n \"status\": \"ok\",\n \"source\": source,\n \"steps_found\": len(steps),\n \"score\": total_score,\n \"max_score\": max_score,\n \"percentage\": round(pct, 1),\n \"has_final_verification\": has_final,\n \"verdict\": verdict,\n \"step_results\": step_results,\n \"recommendations\": _recommendations(step_results, has_final),\n }\n\n\ndef _recommendations(step_results, has_final):\n recs = []\n none_steps = [s for s in step_results if s[\"level\"] == \"none\"]\n vague_steps = [s for s in step_results if s[\"level\"] == \"vague\"]\n\n if none_steps:\n recs.append(f\"{len(none_steps)} step(s) have no verification. Add 'verify: [check]' to each.\")\n if vague_steps:\n recs.append(f\"{len(vague_steps)} step(s) have vague criteria. Replace 'should work' with a concrete check.\")\n if not has_final:\n recs.append(\"No final/end-to-end verification step. Add one at the end.\")\n if not recs:\n recs.append(\"Plan has strong verification coverage. 
Good to go.\")\n return recs\n\n\ndef main():\n p = argparse.ArgumentParser(\n description=\"Check if a plan has verifiable success criteria (Karpathy Principle #4).\",\n epilog=\"Scores each step 0-3 based on verification quality.\",\n )\n p.add_argument(\"input\", nargs=\"?\", default=\"-\", help=\"Markdown plan file, or - for stdin\")\n p.add_argument(\"--json\", action=\"store_true\", help=\"JSON output\")\n args = p.parse_args()\n\n if args.input == \"-\":\n text = sys.stdin.read()\n source = \"stdin\"\n else:\n path = Path(args.input)\n if not path.exists():\n print(f\"[error] {path} not found\", file=sys.stderr)\n sys.exit(1)\n text = path.read_text(encoding=\"utf-8\", errors=\"replace\")\n source = str(path)\n\n result = analyze_plan(text, source)\n\n if args.json:\n print(json.dumps(result, indent=2))\n return\n\n print(f\"Goal Verifier — {source}\")\n print(f\"Steps: {result['steps_found']} Score: {result['score']}/{result['max_score']} ({result['percentage']}%)\")\n print(f\"Verdict: {result['verdict']}\")\n print()\n\n for sr in result[\"step_results\"]:\n icon = {\"concrete\": \"+\", \"reasonable\": \"~\", \"vague\": \"?\", \"none\": \"!\"}[sr[\"level\"]]\n print(f\" [{icon}] {sr['title'][:100]} ({sr['level']}, {sr['score']}/3)\")\n\n print()\n for rec in result[\"recommendations\"]:\n print(f\" -> {rec}\")\n\n\nif __name__ == \"__main__\":\n main()\n","content_type":"text/x-python; charset=utf-8","language":"python","size":6633,"content_sha256":"b8b516ee11bcc61b5253bcb66db09769fe1f72a143194d095dd4e265b3a208ac"}],"content_json":{"type":"doc","content":[{"type":"heading","attrs":{"level":1},"content":[{"text":"Karpathy Coder — Active Coding Discipline","type":"text"}]},{"type":"paragraph","content":[{"text":"Derived from ","type":"text"},{"text":"Andrej Karpathy's observations","type":"text","marks":[{"type":"link","attrs":{"href":"https://x.com/karpathy/status/2015883857489522876","title":null}}]},{"text":" on LLM coding pitfalls. 
This is ","type":"text"},{"text":"not just guidelines","type":"text","marks":[{"type":"strong"}]},{"text":" — it ships Python tools that detect violations, a review agent, a slash command, and a pre-commit hook.","type":"text"}]},{"type":"blockquote","content":[{"type":"paragraph","content":[{"text":"\"The models make wrong assumptions on your behalf and just run along with them without checking. They don't manage their confusion, don't seek clarifications, don't surface inconsistencies, don't present tradeoffs, don't push back when they should.\"","type":"text"}]},{"type":"paragraph","content":[{"text":"\"They really like to overcomplicate code and APIs, bloat abstractions, don't clean up dead code... implement a bloated construction over 1000 lines when 100 would do.\"","type":"text"}]},{"type":"paragraph","content":[{"text":"\"LLMs are exceptionally good at looping until they meet specific goals... Don't tell it what to do, give it success criteria and watch it go.\"","type":"text"}]},{"type":"paragraph","content":[{"text":"— Andrej Karpathy","type":"text"}]}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"The four principles","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"1. Think Before Coding","type":"text"}]},{"type":"paragraph","content":[{"text":"Don't assume. Don't hide confusion. Surface tradeoffs.","type":"text","marks":[{"type":"strong"}]}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"State assumptions explicitly. If uncertain, ask.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"If multiple interpretations exist, present them — don't pick silently.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"If a simpler approach exists, say so. 
Push back when warranted.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"If something is unclear, stop. Name what's confusing. Ask.","type":"text"}]}]}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"2. Simplicity First","type":"text"}]},{"type":"paragraph","content":[{"text":"Minimum code that solves the problem. Nothing speculative.","type":"text","marks":[{"type":"strong"}]}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"No features beyond what was asked.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"No abstractions for single-use code.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"No \"flexibility\" or \"configurability\" that wasn't requested.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"No error handling for impossible scenarios.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"If you write 200 lines and it could be 50, rewrite it.","type":"text"}]}]}]},{"type":"paragraph","content":[{"text":"The test:","type":"text","marks":[{"type":"strong"}]},{"text":" Would a senior engineer say this is overcomplicated? If yes, simplify.","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"3. Surgical Changes","type":"text"}]},{"type":"paragraph","content":[{"text":"Touch only what you must. 
Clean up only your own mess.","type":"text","marks":[{"type":"strong"}]}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Don't \"improve\" adjacent code, comments, or formatting.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Don't refactor things that aren't broken.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Match existing style, even if you'd do it differently.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"If you notice unrelated dead code, mention it — don't delete it.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Remove imports/variables/functions that YOUR changes made unused.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Don't remove pre-existing dead code unless asked.","type":"text"}]}]}]},{"type":"paragraph","content":[{"text":"The test:","type":"text","marks":[{"type":"strong"}]},{"text":" Every changed line should trace directly to the user's request.","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"4. Goal-Driven Execution","type":"text"}]},{"type":"paragraph","content":[{"text":"Define success criteria. 
Loop until verified.","type":"text","marks":[{"type":"strong"}]}]},{"type":"table","attrs":{"layout":null},"content":[{"type":"tr","content":[{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Instead of...","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Transform to...","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"\"Add validation\"","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"\"Write tests for invalid inputs, then make them pass\"","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"\"Fix the bug\"","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"\"Write a test that reproduces it, then make it pass\"","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"\"Refactor X\"","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"\"Ensure tests pass before and after\"","type":"text"}]}]}]}]},{"type":"paragraph","content":[{"text":"For multi-step tasks, state a brief plan:","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":""},"content":[{"text":"1. [Step] → verify: [check]\n2. [Step] → verify: [check]\n3. 
[Step] → verify: [check]","type":"text"}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Slash command","type":"text"}]},{"type":"paragraph","content":[{"text":"/karpathy-check","type":"text","marks":[{"type":"code_inline"}]},{"text":" — Run the full 4-principle review on your staged changes.","type":"text"}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Python tools (","type":"text"},{"text":"scripts/","type":"text","marks":[{"type":"code_inline"}]},{"text":")","type":"text"}]},{"type":"paragraph","content":[{"text":"All tools are stdlib-only. Run with ","type":"text"},{"text":"--help","type":"text","marks":[{"type":"code_inline"}]},{"text":".","type":"text"}]},{"type":"table","attrs":{"layout":null},"content":[{"type":"tr","content":[{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Script","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"What it detects","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"complexity_checker.py","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Over-engineering: too many classes, deep nesting, high cyclomatic complexity, unused params, premature abstractions","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"diff_surgeon.py","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Diff noise: lines that don't trace to the stated goal — 
comment changes, style drift, drive-by refactors","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"assumption_linter.py","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Hidden assumptions in a plan: unasked features, missing clarifications, silent interpretation choices","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"goal_verifier.py","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Weak success criteria: vague plans without verifiable checks, missing test assertions","type":"text"}]}]}]}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Sub-agent","type":"text"}]},{"type":"paragraph","content":[{"text":"karpathy-reviewer","type":"text","marks":[{"type":"code_inline"}]},{"text":" — Runs all 4 principles against a diff. Dispatched by ","type":"text"},{"text":"/karpathy-check","type":"text","marks":[{"type":"code_inline"}]},{"text":" or manually before committing.","type":"text"}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Pre-commit hook","type":"text"}]},{"type":"paragraph","content":[{"text":"hooks/karpathy-gate.sh","type":"text","marks":[{"type":"code_inline"}]},{"text":" — runs ","type":"text"},{"text":"complexity_checker.py","type":"text","marks":[{"type":"code_inline"}]},{"text":" and ","type":"text"},{"text":"diff_surgeon.py","type":"text","marks":[{"type":"code_inline"}]},{"text":" on staged files. Warns (non-blocking) when violations are found. 
Wire it via ","type":"text"},{"text":".claude/settings.json","type":"text","marks":[{"type":"code_inline"}]},{"text":" or Husky.","type":"text"}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"References","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"references/karpathy-principles.md","type":"text","marks":[{"type":"code_inline"}]},{"text":" — the source quotes, deeper context, when to relax each principle","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"references/anti-patterns.md","type":"text","marks":[{"type":"code_inline"}]},{"text":" — 10+ before/after examples across Python, TypeScript, and shell","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"references/enforcement-patterns.md","type":"text","marks":[{"type":"code_inline"}]},{"text":" — how to wire hooks, CI integration, team adoption","type":"text"}]}]}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"When to relax","type":"text"}]},{"type":"paragraph","content":[{"text":"These principles bias toward ","type":"text"},{"text":"caution over speed","type":"text","marks":[{"type":"strong"}]},{"text":". For trivial tasks (typo fixes, obvious one-liners), use judgment. 
The principles matter most on:","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Non-trivial implementations (>20 lines changed)","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Code you don't fully understand","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Multi-step tasks with unclear requirements","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Anything that will be reviewed by humans","type":"text"}]}]}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Cross-tool compatibility","type":"text"}]},{"type":"paragraph","content":[{"text":"Installs via plugin for Claude Code. For other tools, copy the principles into your schema file:","type":"text"}]},{"type":"table","attrs":{"layout":null},"content":[{"type":"tr","content":[{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Tool","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Schema file","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Claude Code","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"CLAUDE.md","type":"text","marks":[{"type":"code_inline"}]},{"text":" (auto-loaded by plugin)","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Codex 
CLI","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"AGENTS.md","type":"text","marks":[{"type":"code_inline"}]}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Cursor","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"AGENTS.md","type":"text","marks":[{"type":"code_inline"}]},{"text":" or ","type":"text"},{"text":".cursorrules","type":"text","marks":[{"type":"code_inline"}]}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Antigravity / OpenCode / Gemini CLI","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"AGENTS.md","type":"text","marks":[{"type":"code_inline"}]}]}]}]}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Related skills (chains via ","type":"text"},{"text":"context: fork","type":"text","marks":[{"type":"code_inline"}]},{"text":")","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"self-eval","type":"text","marks":[{"type":"code_inline"},{"type":"strong"}]},{"text":" — honest quality scoring after completing work","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"code-reviewer","type":"text","marks":[{"type":"code_inline"},{"type":"strong"}]},{"text":" — broader code review; karpathy-coder focuses on the 4 LLM-specific pitfalls","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"llm-wiki","type":"text","marks":[{"type":"code_inline"},{"type":"strong"}]},{"text":" — compound knowledge; 
karpathy-coder ensures you don't overcomplicate while building it","type":"text"}]}]}]},{"type":"hr","attrs":{"markup":"---"}}]},"metadata":{"date":"2026-06-05","name":"karpathy-coder","tags":["code-quality","discipline","karpathy","simplicity","surgical-changes","anti-patterns","review"],"author":"@skillopedia","source":{"stars":16818,"repo_name":"claude-skills","origin_url":"https://github.com/alirezarezvani/claude-skills/blob/HEAD/engineering/karpathy-coder/skills/karpathy-coder/SKILL.md","repo_owner":"alirezarezvani","body_sha256":"a0b4631bbc2eecd874c2b488dde6c1b7e0b964d39b5848e75cf3beb420ac67bf","cluster_key":"64c62e5a66a5b2f74f6cac36235a063dee40e5aecdad2fcabfd476beca03246b","clean_bundle":{"format":"clean-skill-bundle-v1","source":"alirezarezvani/claude-skills/engineering/karpathy-coder/skills/karpathy-coder/SKILL.md","attachments":[{"id":"07f31de5-5df4-52f7-9e35-423104ae7bfd","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/07f31de5-5df4-52f7-9e35-423104ae7bfd/attachment.json","path":"expected_outputs/assumption_linter.json","size":933,"sha256":"c5ecb4b74e4ba0ef177152af75dedc145ddc9ed3186ad681cb67e9d5224f8b49","contentType":"application/json; charset=utf-8"},{"id":"4f2c11cc-73bf-53d4-a345-bdccfe37d21a","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/4f2c11cc-73bf-53d4-a345-bdccfe37d21a/attachment.json","path":"expected_outputs/complexity_checker.json","size":584,"sha256":"1cc591af88d83561f74e992570bdbaf7b95166b96759a8b00ef02f882152c0ed","contentType":"application/json; charset=utf-8"},{"id":"3a527ca3-97ad-5b22-a49a-8be299599100","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/3a527ca3-97ad-5b22-a49a-8be299599100/attachment.json","path":"expected_outputs/diff_surgeon.json","size":158,"sha256":"baa0ec12b990f7f14bd3a68125b98b25ab914624e11e4abc6587bbffb2c87399","contentType":"application/json; 
charset=utf-8"},{"id":"4f0ebb95-0df3-5ea1-801c-4b8a6546eff0","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/4f0ebb95-0df3-5ea1-801c-4b8a6546eff0/attachment.json","path":"expected_outputs/goal_verifier.json","size":853,"sha256":"61b323d92e70c2baedabffb65b9e5723b20e6b048e4f96e47907f6595cea298b","contentType":"application/json; charset=utf-8"},{"id":"04a4e318-c01a-5a01-9216-b7001752037a","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/04a4e318-c01a-5a01-9216-b7001752037a/attachment.md","path":"references/anti-patterns.md","size":4965,"sha256":"08bd1915d8f4b03c5cfeb725caacc5238616f3ab5384b85845af12e9c58c5916","contentType":"text/markdown; charset=utf-8"},{"id":"f69875f6-986b-5624-9924-e55799edaa68","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/f69875f6-986b-5624-9924-e55799edaa68/attachment.md","path":"references/enforcement-patterns.md","size":3751,"sha256":"f03c5a3c1851e2453ce2da679848a4b1e8a47062756310a808e2d0a7048d3e9a","contentType":"text/markdown; charset=utf-8"},{"id":"91bb981a-b512-5d46-a7cc-6afeb6126d33","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/91bb981a-b512-5d46-a7cc-6afeb6126d33/attachment.md","path":"references/karpathy-principles.md","size":3754,"sha256":"c1470d2f73763d8a740830b39e3eff2784e35b7588a95bbb991754d290893bff","contentType":"text/markdown; charset=utf-8"},{"id":"e4035124-06fa-50ef-abe2-6570a284536f","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/e4035124-06fa-50ef-abe2-6570a284536f/attachment.py","path":"scripts/assumption_linter.py","size":6830,"sha256":"7569dc75fdff4533f0b13070a33f561a31a6f451cb8c2a6d06ab425eea78b191","contentType":"text/x-python; charset=utf-8"},{"id":"2658f31d-f730-505f-9974-32b05c5a53fb","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/2658f31d-f730-505f-9974-32b05c5a53fb/attachment.py","path":"scripts/complexity_checker.py","size":10976,"sha256":"d521f491b956484e00744a3b81713dfb4a9cd97a1768db9408605e181c91e80e","contentType":"text/x-python; 
charset=utf-8"},{"id":"8e482a6c-f345-5ffe-8924-eeb016a188b1","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/8e482a6c-f345-5ffe-8924-eeb016a188b1/attachment.py","path":"scripts/diff_surgeon.py","size":7918,"sha256":"982ae9dbc050d3cf2b16968c13adf3acfe8b700d908efa5b14c94f12594c2f12","contentType":"text/x-python; charset=utf-8"},{"id":"ea5f598b-1bcb-52c8-aedc-d75af9a67350","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/ea5f598b-1bcb-52c8-aedc-d75af9a67350/attachment.py","path":"scripts/goal_verifier.py","size":6633,"sha256":"b8b516ee11bcc61b5253bcb66db09769fe1f72a143194d095dd4e265b3a208ac","contentType":"text/x-python; charset=utf-8"}],"bundle_sha256":"69c99da6858658d13b109e75b5077e04c524d1532a9aff37d815b0a3272526d8","attachment_count":11,"text_attachments":11,"attachment_storage":"skillopedia-attachments-v1","binary_attachments":0,"excluded_attachments":[]},"cluster_size":2,"skill_md_path":"engineering/karpathy-coder/skills/karpathy-coder/SKILL.md","import_metadata":{"date":"2026-06-05","author":"@skillopedia","version":"v1","category":"ai-agent-development","category_label":"AI"},"exact_dupes_collapsed_into_this":1},"context":"fork","license":"MIT","version":"v1","category":"ai-agent-development","import_tag":"clean-skills-v1","description":"Use when writing, reviewing, or committing code to enforce Karpathy's 4 coding principles — surface assumptions before coding, keep it simple, make surgical changes, define verifiable goals. Triggers on \"review my diff\", \"check complexity\", \"am I overcomplicating this\", \"karpathy check\", \"before I commit\", or any code quality concern where the LLM might be overcoding.","compatible_tools":["claude-code","codex-cli","cursor","antigravity","opencode","gemini-cli"]}},"renderedAt":1782979738673}
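The verdict logic in `diff_surgeon.py` above comes down to one ratio and two thresholds. The following is a minimal sketch, assuming the same cutoffs (0.1 and 0.3) that appear in the script. The function name `noise_verdict` is hypothetical and not part of the shipped tool, and the sketch takes the noise count as an input rather than detecting noise itself.

```python
from __future__ import annotations


def noise_verdict(noise_lines: int, changed_lines: int) -> tuple[float, str]:
    """Return (noise_ratio, verdict) using the cutoffs seen in diff_surgeon.py.

    noise_verdict is a hypothetical helper for illustration only.
    """
    # An empty diff has no noise by definition; this also avoids dividing by zero.
    ratio = noise_lines / changed_lines if changed_lines > 0 else 0.0
    if ratio < 0.1:
        verdict = "CLEAN"
    elif ratio < 0.3:
        verdict = "NOISY"
    else:
        verdict = "VERY_NOISY"
    return round(ratio, 2), verdict


if __name__ == "__main__":
    # 2 flagged lines out of 40 changed lines stays under the 10% cutoff.
    print(noise_verdict(2, 40))
    # 8 of 40 is 20%, which falls between the two cutoffs.
    print(noise_verdict(8, 40))
    # 20 of 40 is 50%, above the 30% cutoff.
    print(noise_verdict(20, 40))
```

Note that the script itself exits non-zero for any verdict other than `CLEAN` only in text mode. In `--json` mode it returns before reaching `sys.exit`, so a CI job reading the JSON has to inspect the `verdict` field.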

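The 0-3 step scoring and the STRONG/WEAK/MISSING cutoffs in `goal_verifier.py` can be illustrated with a reduced version. This is a sketch under stated assumptions: the three keyword patterns are deliberately simplified stand-ins rather than the script's actual regexes, each step is a single line, and `score_line`, `plan_verdict`, and `EXAMPLE_PLAN` are hypothetical names.

```python
from __future__ import annotations

import re

# Simplified stand-ins for the script's patterns (assumption: far fewer keywords).
CONCRETE = re.compile(r"\b(?:pytest|assert|npm\s+test|benchmark)\b", re.I)
REASONABLE = re.compile(r"\b(?:verify|check|confirm|manually)\b", re.I)
VAGUE = re.compile(r"\b(?:should\s+work|looks?\s+(?:good|right))\b", re.I)

# Hypothetical plan used only for this illustration.
EXAMPLE_PLAN = [
    "Add parser; run pytest tests/test_parser.py",
    "Update README, then manually confirm links",
    "Refactor helpers, should work",
    "Deploy",
]


def score_line(step: str) -> int:
    """Score one step 0-3; the strongest kind of evidence found wins."""
    if CONCRETE.search(step):
        return 3
    if REASONABLE.search(step):
        return 2
    if VAGUE.search(step):
        return 1
    return 0


def plan_verdict(steps: list[str]) -> tuple[float, str]:
    """Return (percentage, verdict) with the 70% and 40% cutoffs from the script."""
    if not steps:
        return 0.0, "NO_PLAN"
    total = sum(score_line(s) for s in steps)
    pct = total / (len(steps) * 3) * 100
    if pct >= 70:
        verdict = "STRONG"
    elif pct >= 40:
        verdict = "WEAK"
    else:
        verdict = "MISSING"
    return round(pct, 1), verdict


if __name__ == "__main__":
    for step in EXAMPLE_PLAN:
        print(score_line(step), step)
    print(plan_verdict(EXAMPLE_PLAN))
```

One consequence of checking patterns in this order, which also holds for the script's own `score_step`: because the word "verify" counts as reasonable evidence, a step written as `verify: looks good` scores 2 rather than 1.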