# Eval Harness Skill

A formal evaluation framework for Claude Code sessions, implementing eval-driven development (EDD) principles.

## When to Activate

- Setting up eval-driven development (EDD) for AI-assisted workflows
- Defining pass/fail criteria for Claude Code task completion
- Measuring agent reliability with pass@k metrics
- Creating regression test suites for prompt or agent changes
- Benchmarking agent performance across model versions

## Philosophy

Eval-Driven Development treats evals as the "unit tests of AI development":

- Define expected behavior BEFORE implementation
- Run evals contin…