AgentBench for OpenClaw

Benchmark your OpenClaw agent's general capabilities across 40 real-world tasks spanning 7 domains.

Commands

When the user says any of these, follow the corresponding instructions:

- — Run the full benchmark suite (all 40 tasks)
- — Run only easy+medium tasks (19 tasks)
- — Run one domain only
- — Run a single task
- — Tag results as externally verified scoring
- — List all tasks grouped by domain
- — Show results from previous runs
- — Compare two runs side-by-side

Flags are combinable:

Running a Benchmark

Step 1: Discover Tasks

Read `task.yaml` files from the directory…
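This copy of Step 1 is truncated, so it does not specify the task directory or the fields inside `task.yaml`. The following is a minimal Python sketch of how discovery and task selection could fit together. It assumes each task lives in its own folder with a `task.yaml` that has top-level `id`, `domain`, and `difficulty` keys. Those field names, the `Task` record, and every function name here are hypothetical. To stay dependency-free, it parses only flat `key: value` lines. A real runner would more likely use a YAML library such as PyYAML's `yaml.safe_load`.

```python
from __future__ import annotations

from collections import defaultdict
from collections.abc import Iterable
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Task:
    """Hypothetical record for one benchmark task (field names are assumptions)."""
    task_id: str
    domain: str
    difficulty: str
    path: Path


def parse_flat_yaml(text: str) -> dict[str, str]:
    """Read top-level `key: value` pairs only; a stand-in for yaml.safe_load."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blank lines, comments, and indented (nested) lines.
        if not stripped or stripped.startswith("#") or line[0].isspace():
            continue
        key, sep, value = stripped.partition(":")
        if sep:
            # Drop surrounding quotes from simple scalar values.
            fields[key.strip()] = value.strip().strip("'\"")
    return fields


def discover_tasks(root: Path) -> list[Task]:
    """Find every task.yaml under `root` (assumed layout: one folder per task)."""
    tasks: list[Task] = []
    for path in sorted(root.rglob("task.yaml")):
        fields = parse_flat_yaml(path.read_text(encoding="utf-8"))
        tasks.append(
            Task(
                # Fall back to the folder name if the file has no `id` key.
                task_id=fields.get("id", path.parent.name),
                domain=fields.get("domain", "unknown"),
                difficulty=fields.get("difficulty", "unknown"),
                path=path,
            )
        )
    return tasks


def select_tasks(
    tasks: Iterable[Task],
    *,
    difficulties: set[str] | None = None,
    domain: str | None = None,
    task_id: str | None = None,
) -> list[Task]:
    """Apply every filter that is set; unset filters match all tasks."""
    return [
        t
        for t in tasks
        if (difficulties is None or t.difficulty in difficulties)
        and (domain is None or t.domain == domain)
        and (task_id is None or t.task_id == task_id)
    ]


def group_by_domain(tasks: Iterable[Task]) -> dict[str, list[str]]:
    """Map each domain to its task IDs, with domains in sorted order."""
    groups: defaultdict[str, list[str]] = defaultdict(list)
    for t in tasks:
        groups[t.domain].append(t.task_id)
    return dict(sorted(groups.items()))
```

For example, `select_tasks(discover_tasks(Path("tasks")), difficulties={"easy", "medium"})` would approximate the easy+medium run, where `"tasks"` is a placeholder directory name. The filters are keyword arguments combined with AND, so a domain filter and a difficulty filter can be passed together. This loosely mirrors the note that flags are combinable.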