# Benchmark E2E

Single-command pipeline that creates projects, exercises skill injection via , launches dev servers, verifies they work, analyzes conversation logs, and generates actionable improvement reports.

## Quick Start

Options:

| Flag | Description | Default |
|------|-------------|---------|
| | Run only first 3 projects | |
| | Override base directory | |
| | Per-project timeout (forwarded to runner) | (15 min) |

## Pipeline Stages

The orchestrator chains four stages sequentially, aborting on failure:

1. runner: Creates test dirs, installs plugin, runs with
2. verify: Detects package manag…