# Cost Benchmark

Runs against the structural+adversarial corpus and writes per-case + summary results to . This is the verification gate that backs every measurable claim in / .

## When to use

- Before publishing a release: verify booster win rate didn't regress.
- After expanding : confirm new cases route correctly.
- When auditing a "claimed upstream" tag: flip it to "verified" once the bench supports it.
- On a cost question ("is Sonnet 4.6 cheaper than Opus 4.7 for these tasks?"): re-run with .

## Steps

1. Run the bench from (where resolves):
2. Inspect the markdown summary printed to stdout.…
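
The release check described under "When to use" (verifying that the booster win rate didn't regress) can be sketched roughly as follows. This is a minimal illustration, not the bench's actual code: the `CaseResult` record, its `booster_won` field, the `tolerance` parameter, and the sample data are all hypothetical, since the real per-case output schema is not shown here.

```python
from dataclasses import dataclass


@dataclass
class CaseResult:
    # Hypothetical per-case record; the real bench output schema may differ.
    case_id: str
    booster_won: bool


def win_rate(results: list[CaseResult]) -> float:
    """Fraction of cases the booster won; 0.0 for an empty run."""
    if not results:
        return 0.0
    return sum(r.booster_won for r in results) / len(results)


def regressed(current: float, baseline: float, tolerance: float = 0.01) -> bool:
    """True when the current win rate is below baseline by more than tolerance."""
    return current < baseline - tolerance


# Made-up sample data for illustration only, not real bench results.
results = [
    CaseResult("case-a", True),
    CaseResult("case-b", True),
    CaseResult("case-c", False),
    CaseResult("case-d", True),
]

rate = win_rate(results)
print(f"win rate: {rate:.2%}")
print("regressed" if regressed(rate, baseline=0.80) else "ok")
```

The tolerance is there so that small run-to-run noise does not block a release; what counts as an acceptable margin is a project decision and is assumed here.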