autoforge — Skillopedia

# AutoForge — Autonomous Optimization Framework

> Stop reflecting. Start converging. Every iteration is measured, logged, and validated — not vibed.

AutoForge replaces ad-hoc "improve this" prompts with a rigorous optimization loop: define evals, run iterations, track pass rates in TSV, report live to your channel, and stop only when math says you're done. Multi-model cross-validation prevents the "same model grades its own homework" blind spot.

**Four modes. One convergence standard.**

| Mode | What it does | Best for |
|------|-------------|----------|
| `prompt` | Simulate 5 scenarios/iter, evaluate Yes/No…
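
The "track pass rates in TSV" step can be sketched in Python. This is a minimal illustration, not part of AutoForge itself: it assumes the five-column layout the skill documents (`iteration`, `prompt_version_summary`, `pass_rate`, `change_description`, `status`), the 50- and 100-character limits on the two text columns, and the four status values `baseline`, `improved`, `retained`, `discard`. The helper names `clean` and `append_result` are hypothetical.

```python
from pathlib import Path

# Column order and status values as documented for the AutoForge TSV format.
COLUMNS = ["iteration", "prompt_version_summary", "pass_rate",
           "change_description", "status"]
STATUSES = {"baseline", "improved", "retained", "discard"}


def clean(text: str, limit: int) -> str:
    """Collapse tabs, newlines and repeated spaces, then cap the length."""
    return " ".join(text.split())[:limit]


def append_result(path, iteration, summary, pass_rate, change, status):
    """Append one iteration row, writing the header first if the file is new.

    `pass_rate` is a number between 0 and 100; it is stored as an
    integer percentage such as "58%".
    """
    if status not in STATUSES:
        raise ValueError(f"unknown status: {status!r}")
    if not 0 <= pass_rate <= 100:
        raise ValueError(f"pass_rate out of range: {pass_rate!r}")

    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    is_new = not path.exists() or path.stat().st_size == 0

    row = [
        str(int(iteration)),
        clean(summary, 50),
        f"{int(round(pass_rate))}%",
        clean(change, 100),
        status,
    ]
    with path.open("a", encoding="utf-8", newline="") as f:
        if is_new:
            f.write("\t".join(COLUMNS) + "\n")
        # Fields are written literally; clean() has already removed tabs
        # and newlines, so no quoting or escaping is needed.
        f.write("\t".join(row) + "\n")
```

Calling `append_result("results/demo-results.tsv", 1, "Baseline", 58, "Original version", "baseline")` would create the file with a header and one baseline row. Writing fields literally rather than through a quoting CSV writer keeps the output identical in shape to what the skill's `printf '%s\t…'` commands produce.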

\\t'\n\n# Generate progress chart\npython3 scripts/visualize.py results.tsv --title \"ML Experiment\"\n```\n\n## Preventing Sleep During Training\n\n```bash\n# macOS: prevent sleep while loop runs\ncaffeinate -i &\nCAFE_PID=$!\n# After the run: kill $CAFE_PID\n\n# Linux: use systemd-inhibit or screen/tmux\nsystemd-inhibit --what=idle uv run train.py\n```\n\n## Integration with AutoForge\n\nML mode integrates with the standard autoforge infrastructure:\n\n1. **TSV tracking** — Same format, `val_bpb` maps to `pass_rate` (inverted: lower is better)\n2. **Reporting** — `report.sh` works unchanged, showing progress bars\n3. **Stop conditions** — Same convergence rules apply (adapt for minimization)\n4. **Visualization** — `visualize.py` charts the training curve\n\nTo adapt stop conditions for minimization (lower = better):\n- `improved` = val_bpb is **lower** than previous best\n- `retained` = val_bpb is equal\n- `discard` = val_bpb is higher\n","content_type":"text/markdown; charset=utf-8","language":"markdown","size":3004,"content_sha256":"8826681430db6077465e881cfd152870b2462365175021810de719b4e3070f08"},{"filename":"results/email-prompt-proposed.md","content":"# Email Briefing Prompt\nCheck unread emails (max 10). For each important email:\n1. Sender\n2. One-sentence summary\n3. Action: Reply / Ignore / Forward\n4. 
If Reply: draft a copy-paste reply suggestion\n\nPriority: Bank, Arzt, Behörde, Kunden, Rechnungen, Verträge, Fristen > Freunde > Rest\nIgnore completely: Newsletter, Marketing, LinkedIn, Spam, Massenmails\nIf nothing important: \"Keine wichtigen Emails.\"\nMax 300 words total.\n","content_type":"text/markdown; charset=utf-8","language":"markdown","size":427,"content_sha256":"7882c035e5b5bf9dea410a102695ea023df72972d745f6e3b781c990433ca2a7"},{"filename":"scripts/report.sh","content":"#!/bin/bash\n# AutoForge Report — Live progress updates with Unicode bars\n# Supports: Telegram, Discord, Slack, stdout (ANSI fallback)\n#\n# Usage: ./report.sh [results.tsv] [skill-name] [--final] [--json]\n#\n# Environment:\n# AF_CHANNEL — Messaging channel (telegram, discord, slack). Default: telegram\n# AF_CHAT_ID — Chat/group ID for delivery. If unset, prints to stdout.\n# AF_TOPIC_ID — Thread/topic ID within the chat (optional).\n#\n# Examples:\n# AF_CHAT_ID=\"-100123456\" AF_TOPIC_ID=\"2211\" ./report.sh results.tsv \"My Skill\"\n# ./report.sh results.tsv \"My Skill\" --final\n# ./report.sh results.tsv \"My Skill\" --json\n\nset -euo pipefail\n\nRESULTS_FILE=\"${1:-results.tsv}\"\nSKILL_NAME=\"${2:-Skill}\"\nshift 2 2>/dev/null || true\n\n# Parse flags\nFINAL_FLAG=\"false\"\nJSON_FLAG=\"\"\nfor arg in \"$@\"; do\n case \"$arg\" in\n --final) FINAL_FLAG=\"true\" ;;\n --json) JSON_FLAG=\"yes\" ;;\n esac\ndone\n\n# Configuration from environment\nCHANNEL=\"${AF_CHANNEL:-telegram}\"\nCHAT_ID=\"${AF_CHAT_ID:-}\"\nTOPIC_ID=\"${AF_TOPIC_ID:-}\"\n\n# --- Validation ---\n\nif [ ! 
-f \"$RESULTS_FILE\" ]; then\n echo \"Error: Results file not found: $RESULTS_FILE\" >&2\n exit 1\nfi\n\nLINE_COUNT=$(tail -n +2 \"$RESULTS_FILE\" 2>/dev/null | wc -l | tr -d ' ')\nif [ \"$LINE_COUNT\" -eq 0 ]; then\n echo \"Error: No data rows in $RESULTS_FILE\" >&2\n exit 1\nfi\n\n# --- Data Extraction ---\n\nTOTAL=$(tail -n +2 \"$RESULTS_FILE\" | wc -l | tr -d ' ')\nKEEP=$(tail -n +2 \"$RESULTS_FILE\" | awk -F'\\t' '{s=$NF} s==\"keep\"||s==\"best\"||s==\"improved\"||s==\"retained\"||s==\"baseline\" {c++} END{print c+0}')\nDISCARD=$(tail -n +2 \"$RESULTS_FILE\" | awk -F'\\t' '$NF==\"discard\" {c++} END{print c+0}')\nBEST=$(tail -n +2 \"$RESULTS_FILE\" | awk -F'\\t' '{val=$3; gsub(/%/,\"\",val); if(val ~ /^[0-9.]+$/ && val+0>max+0)max=val} END{print max+0}')\nBEST_ITER=$(tail -n +2 \"$RESULTS_FILE\" | awk -F'\\t' -v best=\"$BEST\" '{val=$3; gsub(/%/,\"\",val); if(val ~ /^[0-9.]+$/ && val+0==best+0){print NR; exit}}')\nLAST_RATE=$(tail -n +2 \"$RESULTS_FILE\" | tail -1 | awk -F'\\t' '{print $3}')\nLAST_STATUS=$(tail -n +2 \"$RESULTS_FILE\" | tail -1 | awk -F'\\t' '{print $NF}')\n\n# --- JSON Output ---\n\nif [ \"$JSON_FLAG\" = \"yes\" ]; then\n # Collect iteration data as JSON array\n ITER_JSON=$(tail -n +2 \"$RESULTS_FILE\" | awk -F'\\t' '\n BEGIN { printf \"[\" }\n NR>1 { printf \",\" }\n {\n gsub(/\"/, \"\\\\\\\"\", $2);\n gsub(/\"/, \"\\\\\\\"\", $4);\n gsub(/%/, \"\", $3);\n printf \"{\\\"iteration\\\":%s,\\\"summary\\\":\\\"%s\\\",\\\"pass_rate\\\":%s,\\\"change\\\":\\\"%s\\\",\\\"status\\\":\\\"%s\\\"}\", $1, $2, ($3 ~ /^[0-9.]+$/ ? 
$3 : \"0\"), $4, $5\n }\n END { printf \"]\" }\n ')\n\n cat \u003c\u003cEOF\n{\n \"skill\": \"${SKILL_NAME}\",\n \"total_iterations\": ${TOTAL},\n \"kept\": ${KEEP},\n \"discarded\": ${DISCARD},\n \"best_pass_rate\": ${BEST},\n \"best_iteration\": ${BEST_ITER:-0},\n \"last_rate\": \"${LAST_RATE}\",\n \"last_status\": \"${LAST_STATUS}\",\n \"final\": ${FINAL_FLAG},\n \"iterations\": ${ITER_JSON}\n}\nEOF\n exit 0\nfi\n\n# --- Build Unicode Bar Display ---\n\nITER_LINES=\"\"\nwhile IFS=

Important: agents should read /llm.txt, /llms.txt, or /.well-known/skills.json to discover the public Skillopedia API.

\\t' read -r iter summary rate change status; do\n rate_num=\"${rate//%/}\"\n # Skip non-numeric rates (audit mode: PASS/FAIL)\n if ! echo \"$rate_num\" | grep -qE '^[0-9.]+

; then\n rate_num=\"0\"\n fi\n\n # Build progress bar (pure bash)\n filled=$((rate_num / 5))\n empty=$((20 - filled))\n bar=\"\"\n for ((b=0; b\u003cfilled; b++)); do bar=\"${bar}█\"; done\n for ((b=0; b\u003cempty; b++)); do bar=\"${bar}░\"; done\n\n case \"$status\" in\n keep|improved|retained|best) icon=\"✅\" ;;\n discard) icon=\"❌\" ;;\n crash) icon=\"💥\" ;;\n baseline) icon=\"📍\" ;;\n *) icon=\"🔹\" ;;\n esac\n\n ITER_LINES=\"${ITER_LINES}\n${icon} Iter ${iter} ${bar} ${rate}\"\ndone \u003c \u003c(tail -n +2 \"$RESULTS_FILE\")\n\n# --- Build Message ---\n\nif [ \"$FINAL_FLAG\" = \"true\" ]; then\n case \"$LAST_STATUS\" in\n improved|best) CONCLUSION=\"✅ Loop converged — improvement found\" ;;\n retained) CONCLUSION=\"➡️ Loop stable — no further improvement potential\" ;;\n discard) CONCLUSION=\"⚠️ Last attempt discarded — best state from Iter ${BEST_ITER}\" ;;\n *) CONCLUSION=\"🏁 Loop finished\" ;;\n esac\n\n # Channel-specific formatting\n case \"$CHANNEL\" in\n discord)\n # Discord: no markdown in code blocks, simpler formatting\n MSG=\"📊 **AutoForge complete: ${SKILL_NAME}**\n${ITER_LINES}\n\n──────────────────────\nIterations: ${TOTAL} ✅ Keep: ${KEEP} ❌ Discard: ${DISCARD}\n🏆 Best pass rate: ${BEST}% (Iter ${BEST_ITER})\n\n${CONCLUSION}\n\n_In --dry-run mode: No changes written. Approve for --live?_\"\n ;;\n *)\n MSG=\"📊 *AutoForge complete: ${SKILL_NAME}*\n${ITER_LINES}\n\n──────────────────────\nIterations: ${TOTAL} ✅ Keep: ${KEEP} ❌ Discard: ${DISCARD}\n🏆 Best pass rate: ${BEST}% (Iter ${BEST_ITER})\n\n${CONCLUSION}\n\n_In --dry-run mode: No changes written. 
Approve for --live?_\"\n ;;\n esac\nelse\n case \"$CHANNEL\" in\n discord)\n MSG=\"📊 **AutoForge: ${SKILL_NAME}**\n${ITER_LINES}\n\n──────────────────────\nIterations: ${TOTAL} ✅ Keep: ${KEEP} ❌ Discard: ${DISCARD}\n🏆 Best: ${BEST}%\"\n ;;\n *)\n MSG=\"📊 *AutoForge: ${SKILL_NAME}*\n${ITER_LINES}\n\n──────────────────────\nIterations: ${TOTAL} ✅ Keep: ${KEEP} ❌ Discard: ${DISCARD}\n🏆 Best: ${BEST}%\"\n ;;\n esac\nfi\n\n# --- Deliver ---\n\nif [ -n \"$CHAT_ID\" ] && command -v openclaw &>/dev/null; then\n # Build openclaw command\n CMD=\"openclaw message send --channel ${CHANNEL} --target ${CHAT_ID}\"\n if [ -n \"$TOPIC_ID\" ]; then\n CMD=\"${CMD} --thread-id ${TOPIC_ID}\"\n fi\n CMD=\"${CMD} --message\"\n\n $CMD \"$MSG\"\nelse\n # Stdout fallback with ANSI colors\n if [ -t 1 ]; then\n # Terminal: add colors\n echo \"\"\n echo -e \"\\033[1;36m${MSG}\\033[0m\"\n echo \"\"\n if [ -z \"$CHAT_ID\" ]; then\n echo -e \"\\033[33mTip: Set AF_CHAT_ID to deliver reports to a channel.\\033[0m\"\n fi\n if ! 
command -v openclaw &>/dev/null; then\n echo -e \"\\033[33mTip: Install openclaw CLI for channel delivery.\\033[0m\"\n fi\n else\n # Piped: plain text\n echo \"$MSG\"\n fi\nfi\n","content_type":"application/x-sh; charset=utf-8","language":"bash","size":6152,"content_sha256":"aa13a3347cecce2fa87603434e61004c2c6046e49802d1144da015676630eef4"},{"filename":"scripts/visualize.py","content":"#!/usr/bin/env python3\n\"\"\"\nAutoForge Visualizer\nReads results.tsv and generates a pass-rate chart as PNG.\nUsage: python3 visualize.py [results.tsv] [--output ./results/progress.png] [--title \"Skill Name\"]\n\"\"\"\n\nimport sys\nimport csv\nimport argparse\nfrom pathlib import Path\n\n\ndef main():\n parser = argparse.ArgumentParser(description=\"Generate pass-rate progress chart from autoforge results.\")\n parser.add_argument(\"results\", nargs=\"?\", default=\"results.tsv\", help=\"Path to results TSV file\")\n parser.add_argument(\"--output\", default=\"./results/af-progress.png\", help=\"Output PNG path\")\n parser.add_argument(\"--title\", default=\"AutoForge Progress\", help=\"Chart title\")\n args = parser.parse_args()\n\n # Read TSV\n rows = []\n with open(args.results, newline=\"\") as f:\n reader = csv.DictReader(f, delimiter=\"\\t\")\n for row in reader:\n rows.append(row)\n\n if not rows:\n print(\"No data in results file.\")\n sys.exit(1)\n\n # Extract data\n iterations = list(range(1, len(rows) + 1))\n\n # Parse pass rate (e.g. 
\"83%\" or \"0.83\")\n pass_rates = []\n for r in rows:\n val = r.get(\"pass_rate\", \"0\").strip().rstrip(\"%\")\n try:\n v = float(val)\n if v \u003c= 1.0:\n v *= 100\n pass_rates.append(v)\n except ValueError:\n pass_rates.append(0)\n\n statuses = [r.get(\"status\", \"keep\") for r in rows]\n changes = [r.get(\"change_description\", \"\") for r in rows]\n\n # Matplotlib chart\n try:\n import matplotlib\n matplotlib.use(\"Agg\")\n import matplotlib.pyplot as plt\n import matplotlib.patches as mpatches\n\n fig, ax = plt.subplots(figsize=(10, 5))\n fig.patch.set_facecolor(\"#1a1a2e\")\n ax.set_facecolor(\"#16213e\")\n\n # Line\n ax.plot(iterations, pass_rates, color=\"#e94560\", linewidth=2.5, zorder=3, marker=\"o\", markersize=8)\n\n # Color points by status\n keep_statuses = {\"keep\", \"improved\", \"retained\", \"baseline\", \"best\"}\n for i, (x, y, status) in enumerate(zip(iterations, pass_rates, statuses)):\n color = \"#00b4d8\" if status in keep_statuses else \"#e94560\"\n ax.scatter(x, y, color=color, s=100, zorder=4)\n\n # 80% threshold line\n ax.axhline(y=80, color=\"#ffffff\", linestyle=\"--\", linewidth=1, alpha=0.4, label=\"80% target\")\n\n # Axes\n ax.set_xlabel(\"Iteration\", color=\"#cccccc\", fontsize=11)\n ax.set_ylabel(\"Pass Rate (%)\", color=\"#cccccc\", fontsize=11)\n ax.set_title(args.title, color=\"#ffffff\", fontsize=14, fontweight=\"bold\", pad=15)\n ax.set_ylim(0, 105)\n ax.set_xticks(iterations)\n ax.tick_params(colors=\"#cccccc\")\n for spine in ax.spines.values():\n spine.set_edgecolor(\"#444444\")\n\n # Legend\n keep_patch = mpatches.Patch(color=\"#00b4d8\", label=\"Keep/Improved\")\n discard_patch = mpatches.Patch(color=\"#e94560\", label=\"Discard\")\n ax.legend(handles=[keep_patch, discard_patch], facecolor=\"#1a1a2e\",\n labelcolor=\"#cccccc\", framealpha=0.8)\n\n # Annotate best pass rate\n best_idx = pass_rates.index(max(pass_rates))\n ax.annotate(f\"Best: {max(pass_rates):.0f}%\",\n xy=(iterations[best_idx], 
pass_rates[best_idx]),\n xytext=(iterations[best_idx] + 0.3, pass_rates[best_idx] - 8),\n color=\"#ffffff\", fontsize=10,\n arrowprops=dict(arrowstyle=\"->\", color=\"#ffffff\", lw=1.2))\n\n # Change labels (short, below X axis)\n for i, (x, change) in enumerate(zip(iterations, changes)):\n short = change[:20] + \"…\" if len(change) > 20 else change\n ax.text(x, -12, short, ha=\"center\", va=\"top\", fontsize=7,\n color=\"#888888\", rotation=30, transform=ax.get_xaxis_transform())\n\n # Ensure output directory exists\n Path(args.output).parent.mkdir(parents=True, exist_ok=True)\n\n plt.tight_layout()\n plt.savefig(args.output, dpi=150, bbox_inches=\"tight\", facecolor=fig.get_facecolor())\n plt.close()\n print(f\"Chart saved: {args.output}\")\n return args.output\n\n except ImportError:\n # Fallback: ASCII chart\n print(f\"\\n📊 {args.title}\")\n print(\"─\" * 50)\n for i, (x, y, s) in enumerate(zip(iterations, pass_rates, statuses)):\n bar = \"█\" * int(y / 5)\n icon = \"✅\" if s in (\"keep\", \"improved\", \"retained\", \"baseline\", \"best\") else \"❌\"\n print(f\" Iter {x:2d} {icon} {bar:\u003c20} {y:.0f}%\")\n print(f\"\\n Best: {max(pass_rates):.0f}% @ Iter {pass_rates.index(max(pass_rates))+1}\")\n print(\"─\" * 50)\n print(\"(matplotlib not installed — ASCII fallback)\")\n sys.exit(0)\n\n\nif __name__ == \"__main__\":\n main()\n","content_type":"text/x-python; charset=utf-8","language":"python","size":4825,"content_sha256":"328e29e9bd909ca712f92c83e44abed6c794d4a8e2686ee8f0feba2c81ef237e"}],"content_json":{"type":"doc","content":[{"type":"heading","attrs":{"level":1},"content":[{"text":"AutoForge — Autonomous Optimization Framework","type":"text"}]},{"type":"blockquote","content":[{"type":"paragraph","content":[{"text":"Stop reflecting. Start converging. 
Every iteration is measured, logged, and validated — not vibed.","type":"text"}]}]},{"type":"paragraph","content":[{"text":"AutoForge replaces ad-hoc \"improve this\" prompts with a rigorous optimization loop: define evals, run iterations, track pass rates in TSV, report live to your channel, and stop only when math says you're done. Multi-model cross-validation prevents the \"same model grades its own homework\" blind spot.","type":"text"}]},{"type":"paragraph","content":[{"text":"Four modes. One convergence standard.","type":"text","marks":[{"type":"strong"}]}]},{"type":"table","attrs":{"layout":null},"content":[{"type":"tr","content":[{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Mode","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"What it does","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Best for","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"prompt","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Simulate 5 scenarios/iter, evaluate Yes/No","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"SKILL.md, prompts, doc 
templates","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"code","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Sandboxed test execution, measure exit/stdout/stderr","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Shell scripts, Python tools, pipelines","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"audit","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Test CLI commands live, verify SKILL.md matches reality","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"CLI skill documentation","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"project","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Scan whole repo, cross-file consistency analysis","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"README↔CLI drift, Dockerfile↔deps, CI gaps","type":"text"}]}]}]}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":1},"content":[{"text":"AutoForge — Top-Agent 
Architecture","type":"text"}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Overview","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":""},"content":[{"text":"Agent (you)\n├── State: results.tsv, current target file state, iteration counter\n├── Iteration 1: evaluate → improve → write TSV → report\n├── Iteration 2: evaluate → improve → write TSV → report\n├── ...\n└── Finish: report.sh --final → configured channel","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Sub-Agent = You","type":"text"}]},{"type":"paragraph","content":[{"text":"\"Sub-Agent\" is a ","type":"text"},{"text":"conceptual role","type":"text","marks":[{"type":"strong"}]},{"text":", not a separate process. You (the top-agent) execute each iteration yourself: simulate/execute → evaluate → write TSV → call report.sh. The templates below describe what you do PER ITERATION — not what you send to another agent.","type":"text"}]},{"type":"paragraph","content":[{"text":"For code mode, run tests using the ","type":"text"},{"text":"exec","type":"text","marks":[{"type":"code_inline"}]},{"text":" tool.","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Multi-Model Setup (recommended for Deep Audits)","type":"text"}]},{"type":"paragraph","content":[{"text":"For complex audits, you can ","type":"text"},{"text":"split two roles across different 
models","type":"text","marks":[{"type":"strong"}]},{"text":":","type":"text"}]},{"type":"table","attrs":{"layout":null},"content":[{"type":"tr","content":[{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Role","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Model","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Task","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Optimizer","type":"text","marks":[{"type":"strong"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Opus / GPT-4.1","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Analyzes, finds issues, writes fixes","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Validator","type":"text","marks":[{"type":"strong"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"GPT-5 / Gemini (different model)","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Checks against ground truth, provides pass rate","type":"text"}]}]}]}]},{"type":"paragraph","content":[{"text":"Flow:","type":"text","marks":[{"type":"strong"}]},{"text":" Optimizer and Validator alternate. 
Optimizer iterations have status ","type":"text"},{"text":"improved","type":"text","marks":[{"type":"code_inline"}]},{"text":"/","type":"text"},{"text":"retained","type":"text","marks":[{"type":"code_inline"}]},{"text":"/","type":"text"},{"text":"discard","type":"text","marks":[{"type":"code_inline"}]},{"text":". Validator iterations confirm or refute the pass rate. Spawn validators as sub-agents with ","type":"text"},{"text":"sessions_spawn","type":"text","marks":[{"type":"code_inline"}]},{"text":" and explicit ","type":"text"},{"text":"model","type":"text","marks":[{"type":"code_inline"}]},{"text":".","type":"text"}]},{"type":"paragraph","content":[{"text":"When to use Multi-Model:","type":"text","marks":[{"type":"strong"}]},{"text":" Deep Audits (>5 iterations expected), complex ground truth, or when a single model is blind to its own errors.","type":"text"}]},{"type":"paragraph","content":[{"text":"When Single-Model suffices:","type":"text","marks":[{"type":"strong"}]},{"text":" Simple CLI audits, prompt optimization, code with clear tests.","type":"text"}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":2},"content":[{"text":"Configuration","type":"text"}]},{"type":"paragraph","content":[{"text":"AutoForge uses environment variables for reporting. 
All are optional — without them, output goes to stdout.","type":"text"}]},{"type":"table","attrs":{"layout":null},"content":[{"type":"tr","content":[{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Variable","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Default","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Description","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"AF_CHANNEL","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"telegram","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Messaging channel for reports","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"AF_CHAT_ID","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"(none)","type":"text","marks":[{"type":"em"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Chat/group ID for report 
delivery","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"AF_TOPIC_ID","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"(none)","type":"text","marks":[{"type":"em"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Thread/topic ID within the chat","type":"text"}]}]}]}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":2},"content":[{"text":"Hard Invariants","type":"text"}]},{"type":"paragraph","content":[{"text":"These rules apply ","type":"text"},{"text":"always","type":"text","marks":[{"type":"strong"}]},{"text":", regardless of mode:","type":"text"}]},{"type":"ordered_list","attrs":{"order":1,"listStyle":"number"},"content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"TSV is mandatory.","type":"text","marks":[{"type":"strong"}]},{"text":" Every iteration writes exactly one row to ","type":"text"},{"text":"results/[target]-results.tsv","type":"text","marks":[{"type":"code_inline"}]},{"text":".","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Reporting is mandatory.","type":"text","marks":[{"type":"strong"}]},{"text":" Call ","type":"text"},{"text":"report.sh","type":"text","marks":[{"type":"code_inline"}]},{"text":" immediately after every TSV row.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"--dry-run never overwrites the target.","type":"text","marks":[{"type":"strong"}]},{"text":" Only TSV, ","type":"text"},{"text":"*-proposed.md","type":"text","marks":[{"type":"code_inline"}]},{"text":", and reports are 
written.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Mode isolation is strict.","type":"text","marks":[{"type":"strong"}]},{"text":" Only execute steps for the assigned mode.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Iteration 1 = Baseline.","type":"text","marks":[{"type":"strong"}]},{"text":" Evaluate the original version unchanged, status ","type":"text"},{"text":"baseline","type":"text","marks":[{"type":"code_inline"}]},{"text":".","type":"text"}]}]}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":2},"content":[{"text":"Modes — Read ONLY Your Mode!","type":"text"}]},{"type":"paragraph","content":[{"text":"You are assigned ONE mode. ","type":"text"},{"text":"Ignore all sections for other modes.","type":"text","marks":[{"type":"strong"}]}]},{"type":"table","attrs":{"layout":null},"content":[{"type":"tr","content":[{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Mode","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"What happens","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Output","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"prompt","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Mentally simulate skill/prompt, evaluate against evals","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Improved prompt 
text","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"code","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Run tests in sandbox, measure results","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Improved code","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"audit","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Test CLI commands (read-only only!) + verify SKILL.md against reality","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Improved SKILL.md","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"project","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Scan whole repo, cross-file analysis, fix multiple files per iteration","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Improved repository","type":"text"}]}]}]}]},{"type":"paragraph","content":[{"text":"Your mode is in the task prompt.","type":"text","marks":[{"type":"strong"}]},{"text":" Everything else is irrelevant to 
you.","type":"text"}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":2},"content":[{"text":"TSV Format (same for ALL modes)","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Header (once at loop start):","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"printf '%s\\t%s\\t%s\\t%s\\t%s\\n' \"iteration\" \"prompt_version_summary\" \"pass_rate\" \"change_description\" \"status\" > results/[target]-results.tsv","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Row per iteration:","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"printf '%s\\t%s\\t%s\\t%s\\t%s\\n' \"1\" \"Baseline\" \"58%\" \"Original version\" \"baseline\" >> results/[target]-results.tsv","type":"text"}]},{"type":"blockquote","content":[{"type":"paragraph","content":[{"text":"Use ","type":"text","marks":[{"type":"strong"}]},{"text":"printf","type":"text","marks":[{"type":"code_inline"},{"type":"strong"}]},{"text":" not ","type":"text","marks":[{"type":"strong"}]},{"text":"echo -e","type":"text","marks":[{"type":"code_inline"},{"type":"strong"}]},{"text":"!","type":"text","marks":[{"type":"strong"}]},{"text":" ","type":"text"},{"text":"echo -e","type":"text","marks":[{"type":"code_inline"}]},{"text":" interprets backslashes in field values. 
","type":"text"},{"text":"printf '%s'","type":"text","marks":[{"type":"code_inline"}]},{"text":" outputs strings literally.","type":"text"}]}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"5 columns, TAB-separated, EXACTLY this order:","type":"text"}]},{"type":"table","attrs":{"layout":null},"content":[{"type":"tr","content":[{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"#","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Column","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Type","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Rules","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"1","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"iteration","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Integer","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"1, 2, 3, 
...","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"2","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"prompt_version_summary","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"String","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Max 50 Unicode chars. No tabs, no newlines.","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"3","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"pass_rate","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"String","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Number + ","type":"text"},{"text":"%","type":"text","marks":[{"type":"code_inline"}]},{"text":": ","type":"text"},{"text":"58%","type":"text","marks":[{"type":"code_inline"}]},{"text":", ","type":"text"},{"text":"92%","type":"text","marks":[{"type":"code_inline"}]},{"text":", ","type":"text"},{"text":"100%","type":"text","marks":[{"type":"code_inline"}]},{"text":". 
Always integer.","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"4","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"change_description","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"String","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Max 100 Unicode chars. No tabs, no newlines.","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"5","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"status","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Enum","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Exactly one of: ","type":"text"},{"text":"baseline","type":"text","marks":[{"type":"code_inline"}]},{"text":" · ","type":"text"},{"text":"improved","type":"text","marks":[{"type":"code_inline"}]},{"text":" · ","type":"text"},{"text":"retained","type":"text","marks":[{"type":"code_inline"}]},{"text":" · ","type":"text"},{"text":"discard","type":"text","marks":[{"type":"code_inline"}]}]}]}]}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Escaping 
rules:","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Tabs","type":"text","marks":[{"type":"strong"}]},{"text":" in text fields → replace with spaces","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Newlines","type":"text","marks":[{"type":"strong"}]},{"text":" in text fields → replace with ","type":"text"},{"text":"|","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Empty fields","type":"text","marks":[{"type":"strong"}]},{"text":" → use hyphen ","type":"text"},{"text":"-","type":"text","marks":[{"type":"code_inline"}]},{"text":" (never leave empty)","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"$","type":"text","marks":[{"type":"code_inline"},{"type":"strong"}]},{"text":" and backticks","type":"text","marks":[{"type":"strong"}]},{"text":" → use ","type":"text"},{"text":"printf '%s'","type":"text","marks":[{"type":"code_inline"}]},{"text":" or escape with ","type":"text"},{"text":"\\$","type":"text","marks":[{"type":"code_inline"}]},{"text":" (prevents unintended variable interpolation)","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Unicode/Emoji","type":"text","marks":[{"type":"strong"}]},{"text":" allowed, count as 1 character (not bytes)","type":"text"}]}]}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Status rules (based on pass-rate comparison):","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"baseline","type":"text","marks":[{"type":"code_inline"}]},{"text":" — ","type":"text"},{"text":"Mandatory for Iteration 1.","type":"text","marks":[{"type":"strong"}]},{"text":" Evaluate original version 
only.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"improved","type":"text","marks":[{"type":"code_inline"}]},{"text":" — Pass rate ","type":"text"},{"text":"higher","type":"text","marks":[{"type":"strong"}]},{"text":" than previous best → new version becomes current state","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"retained","type":"text","marks":[{"type":"code_inline"}]},{"text":" — Pass rate ","type":"text"},{"text":"equal or marginally better","type":"text","marks":[{"type":"strong"}]},{"text":" → predecessor remains","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"discard","type":"text","marks":[{"type":"code_inline"}]},{"text":" — Pass rate ","type":"text"},{"text":"lower","type":"text","marks":[{"type":"strong"}]},{"text":" → change discarded, revert to best state","type":"text"}]}]}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":2},"content":[{"text":"Reporting (same for ALL modes)","type":"text"}]},{"type":"paragraph","content":[{"text":"After EVERY TSV row","type":"text","marks":[{"type":"strong"}]},{"text":" (including baseline):","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"bash scripts/report.sh results/[target]-results.tsv \"[Skill Name]\"","type":"text"}]},{"type":"paragraph","content":[{"text":"After loop ends","type":"text","marks":[{"type":"strong"}]},{"text":", additionally with ","type":"text"},{"text":"--final","type":"text","marks":[{"type":"code_inline"}]},{"text":":","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"bash scripts/report.sh results/[target]-results.tsv \"[Skill Name]\" --final","type":"text"}]},{"type":"paragraph","content":[{"text":"The report script reads ","type":"text"},{"text":"AF_CHANNEL","type":"text","marks":[{"type":"code_inline"}]},{"text":", 
","type":"text"},{"text":"AF_CHAT_ID","type":"text","marks":[{"type":"code_inline"}]},{"text":", and ","type":"text"},{"text":"AF_TOPIC_ID","type":"text","marks":[{"type":"code_inline"}]},{"text":" from environment. Without them, it prints to stdout with ANSI colors.","type":"text"}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":2},"content":[{"text":"Stop Conditions (for ALL modes)","type":"text"}]},{"type":"paragraph","content":[{"text":"Priority — first matching condition wins, top to bottom:","type":"text"}]},{"type":"ordered_list","attrs":{"order":1,"listStyle":"number"},"content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"🛑 ","type":"text"},{"text":"Minimum iterations","type":"text","marks":[{"type":"strong"}]},{"text":" — If specified in task (e.g. \"min 5\"), this count MUST be reached. No other condition can stop before.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"🛑 ","type":"text"},{"text":"Max 30 iterations","type":"text","marks":[{"type":"strong"}]},{"text":" — Hard safety net, stop immediately.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"❌ ","type":"text"},{"text":"3× ","type":"text","marks":[{"type":"strong"}]},{"text":"discard","type":"text","marks":[{"type":"code_inline"},{"type":"strong"}]},{"text":" in a row","type":"text","marks":[{"type":"strong"}]},{"text":" → structural problem, stop + analyze.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"✅ ","type":"text"},{"text":"3× 100% pass rate","type":"text","marks":[{"type":"strong"}]},{"text":" (after minimum) → confirmed perfect, done.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"➡️ ","type":"text"},{"text":"5× ","type":"text","marks":[{"type":"strong"}]},{"text":"retained","type":"text","marks":[{"type":"code_inline"},{"type":"strong"}]},{"text":" in a 
row","type":"text","marks":[{"type":"strong"}]},{"text":" → converged, done.","type":"text"}]}]}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Counting rules:","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"3× 100%","type":"text","marks":[{"type":"code_inline"}]},{"text":" = three iterations with ","type":"text"},{"text":"pass_rate == 100%","type":"text","marks":[{"type":"code_inline"}]},{"text":", not necessarily consecutive.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"5× retained","type":"text","marks":[{"type":"code_inline"}]},{"text":" and ","type":"text"},{"text":"3× discard","type":"text","marks":[{"type":"code_inline"}]},{"text":" = ","type":"text"},{"text":"consecutive","type":"text","marks":[{"type":"strong"}]},{"text":" (in a row).","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"baseline","type":"text","marks":[{"type":"code_inline"}]},{"text":" does not count toward any series.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"improved","type":"text","marks":[{"type":"code_inline"}]},{"text":" interrupts ","type":"text"},{"text":"retained","type":"text","marks":[{"type":"code_inline"}]},{"text":" and ","type":"text"},{"text":"discard","type":"text","marks":[{"type":"code_inline"}]},{"text":" series.","type":"text"}]}]}]},{"type":"paragraph","content":[{"text":"At 100% in early iterations:","type":"text","marks":[{"type":"strong"}]},{"text":" Keep going! Test harder edge cases. 
Only 3× 100% ","type":"text"},{"text":"after the minimum","type":"text","marks":[{"type":"em"}]},{"text":" confirms true perfection.","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Recognizing Validator Noise","type":"text"}]},{"type":"paragraph","content":[{"text":"In multi-model setups, the Validator can produce ","type":"text"},{"text":"false positives","type":"text","marks":[{"type":"strong"}]},{"text":" — fails that aren't real issues:","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Config path vs tool name confusion","type":"text","marks":[{"type":"strong"}]},{"text":" (e.g. ","type":"text"},{"text":"agents.list[]","type":"text","marks":[{"type":"code_inline"}]},{"text":" ≠ ","type":"text"},{"text":"agents_list","type":"text","marks":[{"type":"code_inline"}]},{"text":" tool)","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Inverted checks","type":"text","marks":[{"type":"strong"}]},{"text":" (\"no X\" → Validator looks for X as required)","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Normal English as forbidden reference","type":"text","marks":[{"type":"strong"}]},{"text":" (e.g. 
\"runtime outcome\" ≠ ","type":"text"},{"text":"runtime: \"acp\"","type":"text","marks":[{"type":"code_inline"}]},{"text":")","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Overcounting","type":"text","marks":[{"type":"strong"}]},{"text":" (thread commands counted as subagent commands)","type":"text"}]}]}]},{"type":"paragraph","content":[{"text":"Rule:","type":"text","marks":[{"type":"strong"}]},{"text":" If after all real fixes >3 discards come in a row and the fail justifications don't hold up under scrutiny → ","type":"text"},{"text":"declare convergence","type":"text","marks":[{"type":"strong"}]},{"text":", don't validate endlessly.","type":"text"}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":2},"content":[{"text":"Execution Modes","type":"text"}]},{"type":"table","attrs":{"layout":null},"content":[{"type":"tr","content":[{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Flag","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Behavior","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"--dry-run","type":"text","marks":[{"type":"code_inline"}]},{"text":" (default)","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Only TSV + proposed files. 
Target file/repo remains unchanged.","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"--live","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Target file/repo is overwritten. Auto-backup → ","type":"text"},{"text":"results/backups/","type":"text","marks":[{"type":"code_inline"}]}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"--resume","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Read existing TSV, continue from last iteration. On invalid format: abort.","type":"text"}]}]}]}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":2},"content":[{"text":"mode: prompt","type":"text"}]},{"type":"blockquote","content":[{"type":"paragraph","content":[{"text":"Only read if your task contains ","type":"text","marks":[{"type":"strong"}]},{"text":"mode: prompt","type":"text","marks":[{"type":"code_inline"},{"type":"strong"}]},{"text":"!","type":"text","marks":[{"type":"strong"}]}]}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Per Iteration: What you do","type":"text"}]},{"type":"ordered_list","attrs":{"order":1,"listStyle":"number"},"content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Read current prompt/skill","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Mentally simulate 5 different realistic scenarios","type":"text","marks":[{"type":"strong"}]}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Evaluate each scenario against 
","type":"text"},{"text":"all evals","type":"text","marks":[{"type":"strong"}]},{"text":" (Yes=1, No=0)","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Pass rate = (Sum Yes) / (Eval count × 5 scenarios) × 100","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Compare with best previous pass rate → determine status","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"On ","type":"text"},{"text":"improved","type":"text","marks":[{"type":"code_inline"}]},{"text":": propose ","type":"text"},{"text":"minimal, surgical","type":"text","marks":[{"type":"strong"}]},{"text":" improvement","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Write TSV row + call report.sh","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Check stop conditions","type":"text"}]}]}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"At the End","type":"text"}]},{"type":"paragraph","content":[{"text":"Best version → ","type":"text"},{"text":"results/[target]-proposed.md","type":"text","marks":[{"type":"code_inline"}]},{"text":" + report.sh ","type":"text"},{"text":"--final","type":"text","marks":[{"type":"code_inline"}]}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":2},"content":[{"text":"mode: code","type":"text"}]},{"type":"blockquote","content":[{"type":"paragraph","content":[{"text":"Only read if your task contains ","type":"text","marks":[{"type":"strong"}]},{"text":"mode: code","type":"text","marks":[{"type":"code_inline"},{"type":"strong"}]},{"text":"!","type":"text","marks":[{"type":"strong"}]}]}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Per Iteration: What you do","type":"text"}]},{"type":"ordered_list","attrs":{"order":1,"listStyle":"number"},"content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Create 
sandbox: ","type":"text"},{"text":"SCRATCH=$(mktemp -d) && cd \"$SCRATCH\"","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Write current code to sandbox","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Execute test command (with ","type":"text"},{"text":"timeout 60s","type":"text","marks":[{"type":"code_inline"}]},{"text":")","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Measure: exit_code, stdout, stderr, runtime","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Evaluate against evals → calculate pass rate","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"On ","type":"text"},{"text":"improved","type":"text","marks":[{"type":"code_inline"}]},{"text":": minimal code improvement + verify again","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Write TSV row + call report.sh","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Check stop conditions","type":"text"}]}]}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Code Eval Types","type":"text"}]},{"type":"table","attrs":{"layout":null},"content":[{"type":"tr","content":[{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Eval 
Type","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Description","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Example","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"exit_code","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Process exit code","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"exit_code == 0","type":"text","marks":[{"type":"code_inline"}]}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"output_contains","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"stdout contains string","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"\"SUCCESS\" in stdout","type":"text","marks":[{"type":"code_inline"}]}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"output_matches","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"stdout matches 
regex","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"r\"Total: \\d+\"","type":"text","marks":[{"type":"code_inline"}]}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"test_pass","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Test framework green","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"pytest exit 0","type":"text","marks":[{"type":"code_inline"}]}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"runtime","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Runtime limit","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"\u003c 5000ms","type":"text","marks":[{"type":"code_inline"}]}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"no_stderr","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"No error output","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"stderr == 
\"\"","type":"text","marks":[{"type":"code_inline"}]}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"file_exists","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Output file created","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"result.json exists","type":"text","marks":[{"type":"code_inline"}]}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"json_valid","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Output is valid JSON","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"json.loads(stdout)","type":"text","marks":[{"type":"code_inline"}]}]}]}]}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"At the End","type":"text"}]},{"type":"paragraph","content":[{"text":"Best code → ","type":"text"},{"text":"results/[target]-proposed.[ext]","type":"text","marks":[{"type":"code_inline"}]},{"text":" + report.sh ","type":"text"},{"text":"--final","type":"text","marks":[{"type":"code_inline"}]}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":2},"content":[{"text":"mode: audit","type":"text"}]},{"type":"blockquote","content":[{"type":"paragraph","content":[{"text":"Only read if your task contains ","type":"text","marks":[{"type":"strong"}]},{"text":"mode: 
audit","type":"text","marks":[{"type":"code_inline"},{"type":"strong"}]},{"text":"!","type":"text","marks":[{"type":"strong"}]}]}]},{"type":"paragraph","content":[{"text":"⚠️ ","type":"text"},{"text":"DO NOT write your own code.","type":"text","marks":[{"type":"strong"}]},{"text":" Only test CLI commands of the target tool (","type":"text"},{"text":"--help","type":"text","marks":[{"type":"code_inline"}]},{"text":" + read-only).","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Two Variants","type":"text"}]},{"type":"paragraph","content":[{"text":"Simple Audit (CLI skill, clear commands):","type":"text","marks":[{"type":"strong"}]}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"2 iterations: Baseline → Proposed Fix","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"For tools with clear ","type":"text"},{"text":"--help","type":"text","marks":[{"type":"code_inline"}]},{"text":" output and simple command structure","type":"text"}]}]}]},{"type":"paragraph","content":[{"text":"Deep Audit (complex docs, many checks):","type":"text","marks":[{"type":"strong"}]}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Iterative loop like prompt/code, same stop conditions","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"For extensive documentation with many checkpoints (e.g. 
config keys, tool policy, parameter lists)","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Recommended: Multi-Model setup (Opus Optimizer + external Validator)","type":"text"}]}]}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Simple Audit Flow","type":"text"}]},{"type":"ordered_list","attrs":{"order":1,"listStyle":"number"},"content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Write TSV header","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Iteration 1 (Baseline):","type":"text","marks":[{"type":"strong"}]},{"text":" Test every documented command → pass rate → TSV + report","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Iteration 2 (Proposed Fix):","type":"text","marks":[{"type":"strong"}]},{"text":" Write improved SKILL.md → expected pass rate → TSV + report","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Improved SKILL.md → ","type":"text"},{"text":"results/[target]-proposed.md","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Detail results → ","type":"text"},{"text":"results/[target]-audit-details.md","type":"text","marks":[{"type":"code_inline"}]},{"text":" (NOT in TSV!)","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"report.sh ","type":"text"},{"text":"--final","type":"text","marks":[{"type":"code_inline"}]}]}]}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Deep Audit Flow","type":"text"}]},{"type":"ordered_list","attrs":{"order":1,"listStyle":"number"},"content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Write TSV header","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Iteration 1 (Baseline):","type":"text","marks":[{"type":"strong"}]},{"text":" 
Extract ground truth from source, define all checks, evaluate baseline","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Iterations 2+:","type":"text","marks":[{"type":"strong"}]},{"text":" Optimizer fixes issues → Validator checks → TSV + report per iteration","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Loop runs until stop conditions trigger (3× 100%, 5× retained, 3× discard)","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Final version → ","type":"text"},{"text":"results/[target]-proposed.md","type":"text","marks":[{"type":"code_inline"}]},{"text":" or ","type":"text"},{"text":"results/[target]-v1.md","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"report.sh ","type":"text"},{"text":"--final","type":"text","marks":[{"type":"code_inline"}]}]}]}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Fixed Evals (audit)","type":"text"}]},{"type":"ordered_list","attrs":{"order":1,"listStyle":"number"},"content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Completeness — Does SKILL.md cover ≥80% of real commands/config?","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Correctness — Are ≥90% of documented commands/params syntactically correct?","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"No stale references — Does everything documented actually exist?","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"No missing core features — Are all important features covered?","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Workflow quality — Does quick-start actually 
work?","type":"text"}]}]}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":2},"content":[{"text":"mode: project","type":"text"}]},{"type":"blockquote","content":[{"type":"paragraph","content":[{"text":"Only read if your task contains ","type":"text","marks":[{"type":"strong"}]},{"text":"mode: project","type":"text","marks":[{"type":"code_inline"},{"type":"strong"}]},{"text":"!","type":"text","marks":[{"type":"strong"}]}]}]},{"type":"paragraph","content":[{"text":"⚠️ ","type":"text"},{"text":"This mode operates on an ENTIRE repository/directory","type":"text","marks":[{"type":"strong"}]},{"text":", not a single file. Cross-file consistency is the core feature — this is NOT \"audit on many files.\"","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Three Phases","type":"text"}]},{"type":"paragraph","content":[{"text":"Project mode runs through three sequential phases. Phases 1 and 2 happen once (in Iteration 1 = Baseline). Phase 3 is the iterative fix loop.","type":"text"}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":3},"content":[{"text":"Phase 1: Scan & Plan","type":"text"}]},{"type":"ordered_list","attrs":{"order":1,"listStyle":"number"},"content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Analyze the repo directory:","type":"text","marks":[{"type":"strong"}]}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"# Discover structure\ntree -L 3 --dirsfirst [target_dir]\nls -la [target_dir]","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Identify relevant files","type":"text","marks":[{"type":"strong"}]},{"text":" and classify by 
priority:","type":"text"}]},{"type":"table","attrs":{"layout":null},"content":[{"type":"tr","content":[{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Priority","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Files","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"critical","type":"text","marks":[{"type":"strong"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"README, Dockerfile, CI workflows (.github/workflows), package.json/requirements.txt, main entry points","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"normal","type":"text","marks":[{"type":"strong"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Tests, configs, scripts, .env.example, .gitignore","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"low","type":"text","marks":[{"type":"strong"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Docs, examples, LICENSE, CHANGELOG","type":"text"}]}]}]}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Build the File-Map","type":"text","marks":[{"type":"strong"}]},{"text":" — a mental inventory of what exists and what's missing.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Compose eval 
set:","type":"text","marks":[{"type":"strong"}]},{"text":" Merge user-provided evals with auto-detected evals (see Default Evals below).","type":"text"}]}]}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":3},"content":[{"text":"Phase 2: Cross-File Analysis","type":"text"}]},{"type":"paragraph","content":[{"text":"Run consistency checks ","type":"text"},{"text":"across","type":"text","marks":[{"type":"strong"}]},{"text":" files. Each check = one eval point:","type":"text"}]},{"type":"table","attrs":{"layout":null},"content":[{"type":"tr","content":[{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Check","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"What it verifies","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"README ↔ CLI","type":"text","marks":[{"type":"strong"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Documented commands/flags match actual ","type":"text"},{"text":"--help","type":"text","marks":[{"type":"code_inline"}]},{"text":" output","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Dockerfile ↔ deps","type":"text","marks":[{"type":"strong"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"requirements.txt","type":"text","marks":[{"type":"code_inline"}]},{"text":" / ","type":"text"},{"text":"package.json","type":"text","marks":[{"type":"code_inline"}]},{"text":" versions match what Dockerfile 
installs","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"CI ↔ project structure","type":"text","marks":[{"type":"strong"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Workflow references correct paths, scripts, test commands","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":".env.example","type":"text","marks":[{"type":"code_inline"},{"type":"strong"}]},{"text":" ↔ code","type":"text","marks":[{"type":"strong"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Every env var in code has a corresponding entry in ","type":"text"},{"text":".env.example","type":"text","marks":[{"type":"code_inline"}]}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Imports ↔ dependencies","type":"text","marks":[{"type":"strong"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Every ","type":"text"},{"text":"import","type":"text","marks":[{"type":"code_inline"}]},{"text":" / ","type":"text"},{"text":"require","type":"text","marks":[{"type":"code_inline"}]},{"text":" has a matching dependency declaration","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Tests ↔ source","type":"text","marks":[{"type":"strong"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Test files 
exist for critical modules","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":".gitignore","type":"text","marks":[{"type":"code_inline"},{"type":"strong"}]},{"text":" ↔ artifacts","type":"text","marks":[{"type":"strong"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Build outputs, secrets, and caches are excluded","type":"text"}]}]}]}]},{"type":"paragraph","content":[{"text":"Result of Phase 2:","type":"text","marks":[{"type":"strong"}]},{"text":" A complete eval checklist with per-file and cross-file checks, each scored Yes/No.","type":"text"}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":3},"content":[{"text":"Phase 3: Iterative Fix Loop","type":"text"}]},{"type":"paragraph","content":[{"text":"Same loop logic as prompt/code/audit — TSV, report.sh, stop conditions. 
Key differences:","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Multiple files","type":"text","marks":[{"type":"strong"}]},{"text":" can be changed per iteration","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Pass rate","type":"text","marks":[{"type":"strong"}]},{"text":" = aggregated over ALL evals (file-specific + cross-file)","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Fixes are minimal and surgical","type":"text","marks":[{"type":"strong"}]},{"text":" — don't refactor blindly, only fix what improves pass rate","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"change_description","type":"text","marks":[{"type":"code_inline"},{"type":"strong"}]},{"text":" includes which files were touched: ","type":"text"},{"text":"\"Fix Dockerfile + CI workflow sync\"","type":"text","marks":[{"type":"code_inline"}]}]}]}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Per Iteration: What you do","type":"text"}]},{"type":"ordered_list","attrs":{"order":1,"listStyle":"number"},"content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Evaluate current repo state against ","type":"text"},{"text":"all evals","type":"text","marks":[{"type":"strong"}]},{"text":" (file-specific + cross-file)","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Calculate pass rate: (passing evals / total evals) × 100","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Compare with best previous pass rate → determine status","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"On ","type":"text"},{"text":"improved","type":"text","marks":[{"type":"code_inline"}]},{"text":": apply ","type":"text"},{"text":"minimal, surgical 
fixes","type":"text","marks":[{"type":"strong"}]},{"text":" to the fewest files necessary","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Verify the fix didn't break other evals (re-run affected checks)","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Write TSV row + call report.sh","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Check stop conditions","type":"text"}]}]}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Dry-Run vs Live","type":"text"}]},{"type":"table","attrs":{"layout":null},"content":[{"type":"tr","content":[{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Flag","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Behavior","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"--dry-run","type":"text","marks":[{"type":"code_inline"}]},{"text":" (default)","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Fixed files → ","type":"text"},{"text":"results/[target]-proposed/","type":"text","marks":[{"type":"code_inline"}]},{"text":" directory (mirrors repo structure). Original repo untouched.","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"--live","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Files overwritten in-place. 
Originals backed up → ","type":"text"},{"text":"results/backups/","type":"text","marks":[{"type":"code_inline"}]},{"text":" (preserving directory structure).","type":"text"}]}]}]}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Default Evals (auto-applied unless overridden)","type":"text"}]},{"type":"paragraph","content":[{"text":"These evals are ","type":"text"},{"text":"automatically used","type":"text","marks":[{"type":"strong"}]},{"text":" when the user doesn't provide custom evals. The agent detects which are applicable based on what exists in the repo:","type":"text"}]},{"type":"table","attrs":{"layout":null},"content":[{"type":"tr","content":[{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"#","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Eval","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Condition","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"1","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"README accurate? 
(describes actual features/commands)","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"README exists","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"2","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Tests present and green? (","type":"text"},{"text":"pytest","type":"text","marks":[{"type":"code_inline"}]},{"text":" / ","type":"text"},{"text":"npm test","type":"text","marks":[{"type":"code_inline"}]},{"text":" / ","type":"text"},{"text":"go test","type":"text","marks":[{"type":"code_inline"}]},{"text":")","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Test files or test config detected","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"3","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"CI configured and syntactically correct?","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":".github/workflows/","type":"text","marks":[{"type":"code_inline"}]},{"text":" or ","type":"text"},{"text":".gitlab-ci.yml","type":"text","marks":[{"type":"code_inline"}]},{"text":" 
exists","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"4","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"No hardcoded secrets? (`grep -rE \"(password","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"api_key","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"5","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Dependencies complete? (","type":"text"},{"text":"requirements.txt","type":"text","marks":[{"type":"code_inline"}]},{"text":" ↔ imports, ","type":"text"},{"text":"package.json","type":"text","marks":[{"type":"code_inline"}]},{"text":" ↔ requires)","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Dependency file exists","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"6","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Dockerfile functional? 
(","type":"text"},{"text":"docker build","type":"text","marks":[{"type":"code_inline"}]},{"text":" succeeds or Dockerfile syntax valid)","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Dockerfile exists","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"7","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":".gitignore","type":"text","marks":[{"type":"code_inline"}]},{"text":" sensible? (no secrets, build artifacts excluded)","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":".gitignore","type":"text","marks":[{"type":"code_inline"}]},{"text":" exists","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"8","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"License present?","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Always","type":"text"}]}]}]}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Eval Scoring","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":""},"content":[{"text":"Pass Rate = (Passing Evals / Total Applicable Evals) × 100","type":"text"}]},{"type":"paragraph","content":[{"text":"Evals that don't apply (e.g. 
\"Dockerfile functional?\" when no Dockerfile exists) are ","type":"text"},{"text":"excluded from the total","type":"text","marks":[{"type":"strong"}]},{"text":", not counted as passes.","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"At the End","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"--dry-run","type":"text","marks":[{"type":"code_inline"}]},{"text":": All proposed changes → ","type":"text"},{"text":"results/[target]-proposed/","type":"text","marks":[{"type":"code_inline"}]},{"text":" directory","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"--live","type":"text","marks":[{"type":"code_inline"}]},{"text":": Changes already applied, backups in ","type":"text"},{"text":"results/backups/","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"report.sh ","type":"text"},{"text":"--final","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Optionally: ","type":"text"},{"text":"results/[target]-project-details.md","type":"text","marks":[{"type":"code_inline"}]},{"text":" with per-file findings (NOT in TSV!)","type":"text"}]}]}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":2},"content":[{"text":"Directory Structure","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":""},"content":[{"text":"autoforge/\n├── SKILL.md ← This file\n├── results/\n│ ├── [target]-results.tsv ← TSV logs\n│ ├── [target]-proposed.md ← Proposed improvement (prompt/audit)\n│ ├── [target]-proposed/ ← Proposed repo changes (project mode)\n│ │ ├── README.md\n│ │ ├── Dockerfile\n│ │ └── ...\n│ ├── [target]-v1.md ← Deep audit final version\n│ ├── [target]-audit-details.md ← Audit details (audit mode only)\n│ ├── [target]-project-details.md ← Project details (project mode 
only)\n│ └── backups/ ← Auto-backups (--live)\n│ ├── [file].bak ← Single file backups (prompt/code/audit)\n│ └── [target]-backup/ ← Full directory backup (project mode)\n├── scripts/\n│ ├── report.sh ← Channel reporting\n│ └── visualize.py ← PNG chart (optional)\n├── references/\n│ ├── eval-examples.md ← Pre-built evals\n│ └── ml-mode.md ← ML training guide\n└── examples/\n ├── demo-results.tsv ← Demo data\n └── example-config.json ← Example configuration","type":"text"}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":2},"content":[{"text":"Examples (task descriptions, NOT CLI commands)","type":"text"}]},{"type":"paragraph","content":[{"text":"AutoForge is not a CLI tool — it's a ","type":"text"},{"text":"skill prompt","type":"text","marks":[{"type":"strong"}]},{"text":" for the agent:","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":""},"content":[{"text":"# Optimize a prompt\n\"Start autoforge mode: prompt for the coding-agent skill.\n Evals: PTY correct? Workspace protected? Clearly structured?\"\n\n# Audit a CLI skill (simple)\n\"Start autoforge mode: audit for notebooklm-py.\"\n\n# Deep audit with multi-model\n\"Start autoforge mode: audit (deep) for subagents docs.\n Optimizer: Opus, Validator: GPT-5\n Extract ground truth from source, validate iteratively.\"\n\n# Optimize code\n\"Start autoforge mode: code for backup.sh.\n File: ./backup.sh\n Test: bash backup.sh personal --dry-run\n Evals: exit_code==0, backup file created, \u003c 10s runtime\"\n\n# Optimize a whole repository\n\"Start autoforge mode: project for ./my-app\n Evals: Tests green? CI correct? No hardcoded secrets? 
README accurate?\"\n\n# Project mode with custom focus\n\"Start autoforge mode: project for /path/to/api-server\n Focus: Docker + CI pipeline consistency\n Evals: docker build succeeds, CI workflow references correct paths,\n .env.example covers all env vars used in code\"\n\n# Project mode dry-run (default)\n\"Start autoforge mode: project for ./my-tool --dry-run\n Use default evals. Show me what needs fixing.\"","type":"text"}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":2},"content":[{"text":"Eval Examples → Mode Mapping","type":"text"}]},{"type":"paragraph","content":[{"text":"references/eval-examples.md","type":"text","marks":[{"type":"code_inline"}]},{"text":" provides ready-to-use Yes/No evals grouped by category. Here's how they map to AutoForge modes:","type":"text"}]},{"type":"table","attrs":{"layout":null},"content":[{"type":"tr","content":[{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"eval-examples.md Category","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"AutoForge Mode","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Notes","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Briefing, Email, Calendar, Summary, Proposal","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"prompt","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Mental simulation with scenario 
evals","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Python Script, Shell Script, API, Data Pipeline, Build","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"code","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Real execution with measurable criteria","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"CI/CD, Docker, Helm, Kubernetes, Terraform","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"code","type":"text","marks":[{"type":"code_inline"}]},{"text":" or ","type":"text"},{"text":"project","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"code","type":"text","marks":[{"type":"code_inline"}]},{"text":" for single files, ","type":"text"},{"text":"project","type":"text","marks":[{"type":"code_inline"}]},{"text":" for cross-file","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Code Review, API 
Documentation","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"audit","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Verify docs match reality","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Project / Repository, Cross-File Consistency, Security Baseline","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"project","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Whole-repo scanning and cross-file checks","type":"text"}]}]}]}]},{"type":"paragraph","content":[{"text":"Pick evals from the matching category and paste them into your task prompt as the eval set.","type":"text"}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":2},"content":[{"text":"Tips","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Always start with ","type":"text"},{"text":"--dry-run","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"prompt","type":"text","marks":[{"type":"code_inline"}]},{"text":" = think, ","type":"text"},{"text":"code","type":"text","marks":[{"type":"code_inline"}]},{"text":" = execute, ","type":"text"},{"text":"audit","type":"text","marks":[{"type":"code_inline"}]},{"text":" = test CLI, ","type":"text"},{"text":"project","type":"text","marks":[{"type":"code_inline"}]},{"text":" = optimize 
repo","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Simple Audit for clear CLI skills, Deep Audit for complex docs","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Project mode scans the whole repo — cross-file consistency is the killer feature","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Multi-Model for Deep Audits: different models cover different blind spots","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"At >3 discards after all fixes: check for validator noise, declare convergence if justified","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"TSV + report.sh are NOT optional — they are the user interface","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"For ML training: see ","type":"text"},{"text":"references/ml-mode.md","type":"text","marks":[{"type":"code_inline"}]}]}]}]},{"type":"hr","attrs":{"markup":"---"}}]},"metadata":{"date":"2026-06-05","name":"autoforge","author":"@skillopedia","source":{"stars":2012,"repo_name":"openclaw-master-skills","origin_url":"https://github.com/leoyeai/openclaw-master-skills/blob/HEAD/skills/autoforge/SKILL.md","repo_owner":"leoyeai","body_sha256":"2cc0f61138db56d50cec69f052e42cf2fb8ee856b6b4c682fb7eb004cb411b59","cluster_key":"0ab14439e84a6c8129ab8e2abcdbb1da0a1fdd9a303d9860662702672f5d2e49","clean_bundle":{"format":"clean-skill-bundle-v1","source":"leoyeai/openclaw-master-skills/skills/autoforge/SKILL.md","attachments":[{"id":"575bd1f5-7be9-5d7e-ab3d-685ce906f34a","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/575bd1f5-7be9-5d7e-ab3d-685ce906f34a/attachment.md","path":"README.md","size":16512,"sha256":"188a1fdf37a2c1a890b520b628137833761c77ec9e0b49cab8e3390e43861d8e","contentType":"text/markdown; 
charset=utf-8"},{"id":"9d1c210f-b1ca-581d-a073-3186c4b25400","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/9d1c210f-b1ca-581d-a073-3186c4b25400/attachment.json","path":"_meta.json","size":273,"sha256":"72224c7fb689851cb7bb5df6bb7c8f02b3f10e52b828c81d446b5f2eeb4084a3","contentType":"application/json; charset=utf-8"},{"id":"566564e7-e420-57a9-a6a5-e8c9f217c554","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/566564e7-e420-57a9-a6a5-e8c9f217c554/attachment.json","path":"examples/example-config.json","size":241,"sha256":"894062bf74d6451cf6dd4ea59cb53d08aec833eb508fd437904e2c17a3ed9753","contentType":"application/json; charset=utf-8"},{"id":"ad5cfe32-5fac-55f1-8ffd-490aecddfc9e","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/ad5cfe32-5fac-55f1-8ffd-490aecddfc9e/attachment.md","path":"references/eval-examples.md","size":9042,"sha256":"0de2c34d29ff5b563590e37e2224a1f613b39040acfd882a8a52e40acfc7e096","contentType":"text/markdown; charset=utf-8"},{"id":"cfd41785-ccec-5e08-bd48-38d843b0b9b5","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/cfd41785-ccec-5e08-bd48-38d843b0b9b5/attachment.md","path":"references/ml-mode.md","size":3004,"sha256":"8826681430db6077465e881cfd152870b2462365175021810de719b4e3070f08","contentType":"text/markdown; charset=utf-8"},{"id":"a4af4fd9-88c3-5182-8ea1-3bfb9c3d5aad","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/a4af4fd9-88c3-5182-8ea1-3bfb9c3d5aad/attachment.md","path":"results/email-prompt-proposed.md","size":427,"sha256":"7882c035e5b5bf9dea410a102695ea023df72972d745f6e3b781c990433ca2a7","contentType":"text/markdown; charset=utf-8"},{"id":"cf3f19d9-4492-52d3-ac41-23d3da1d7546","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/cf3f19d9-4492-52d3-ac41-23d3da1d7546/attachment.sh","path":"scripts/report.sh","size":6152,"sha256":"aa13a3347cecce2fa87603434e61004c2c6046e49802d1144da015676630eef4","contentType":"application/x-sh; 
charset=utf-8"},{"id":"cbc27e4f-9916-57bd-b332-53de5fed1364","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/cbc27e4f-9916-57bd-b332-53de5fed1364/attachment.py","path":"scripts/visualize.py","size":4825,"sha256":"328e29e9bd909ca712f92c83e44abed6c794d4a8e2686ee8f0feba2c81ef237e","contentType":"text/x-python; charset=utf-8"}],"bundle_sha256":"56c15818fda534eebbb7e156aac1812769fb19ca5e28ed0bbcdf2349b44f2e05","attachment_count":8,"text_attachments":8,"attachment_storage":"skillopedia-attachments-v1","binary_attachments":0,"excluded_attachments":[]},"cluster_size":1,"skill_md_path":"skills/autoforge/SKILL.md","import_metadata":{"date":"2026-06-05","author":"@skillopedia","version":"v1","category":"security","category_label":"Security"},"exact_dupes_collapsed_into_this":0},"version":"v1","category":"security","import_tag":"clean-skills-v1","description":"AutoForge is a production-grade autonomous optimization framework for AI agents. It replaces subjective \"reflection\" with mathematically rigorous convergence loops — tracking every iteration in TSV, cross-validating with multiple models, and stopping only when pass rates confirm real improvement. Four specialized modes: prompt (skill & doc optimization via scenario simulation), code (sandboxed test execution with measurable criteria), audit (CLI verification against live tool behavior), and project (whole-repo cross-file consistency analysis). Battle-tested across 50+ iterations on production skills. Use when: user says \"autoforge\", \"forge\", \"optimize skill\", \"improve\", \"run autoforge\", \"optimize code\", \"improve script\", \"optimize repo\", \"forge project\", \"check project\", \"repo audit\"."}},"renderedAt":1782979438541}
