Skill Grader

Evaluate skill test run outputs against expectations and extract implicit claims.

Input Schema

Grading Process

Step 1: Read Context

Note: eval prompt, execution steps, errors, final result.

Step 2: Grade Expectations

For each EXPECTATION in expectations:

PASS criteria:
- Clear evidence in transcript or outputs
- Evidence reflects genuine task completion, not surface compliance
- A correct filename with wrong content is FAIL, not PASS

FAIL criteria:
- No evidence found
- Evidence contradicts expectation
- Evidence is superficial (right format, wrong substance)
- Cannot be verified…
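
The PASS and FAIL criteria above can be read as a small decision rule: an expectation passes only when none of the FAIL conditions apply. The following is a minimal sketch of that rule, not the grader's actual implementation. The `Evidence` record, its field names, and the functions `grade_expectation` and `grade_all` are all hypothetical names introduced for illustration; how a grader actually decides that evidence is superficial or unverifiable is a judgment the source leaves to the reader of the transcript.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Evidence:
    """Hypothetical summary of what the grader found for one expectation."""
    found: bool        # some evidence exists in the transcript or outputs
    contradicts: bool  # the evidence contradicts the expectation
    superficial: bool  # right format, wrong substance
    verifiable: bool   # the evidence can actually be checked


def grade_expectation(evidence: Evidence) -> str:
    """Return "PASS" only when none of the listed FAIL criteria apply."""
    if not evidence.found:
        return "FAIL"  # no evidence found
    if evidence.contradicts:
        return "FAIL"  # evidence contradicts the expectation
    if evidence.superficial:
        return "FAIL"  # surface compliance only
    if not evidence.verifiable:
        return "FAIL"  # cannot be verified
    return "PASS"


def grade_all(expectations: Dict[str, Evidence]) -> Dict[str, str]:
    """Grade every expectation independently, keyed by its text."""
    return {text: grade_expectation(ev) for text, ev in expectations.items()}


# Example: a file with the correct name but the wrong content is superficial.
results = grade_all({
    "report.csv is written": Evidence(True, False, True, True),
    "run finishes without errors": Evidence(True, False, False, True),
})
print(results)
```

In this sketch the first expectation is graded FAIL because its evidence is flagged as superficial, which matches the rule that a correct filename with wrong content does not pass. Checking the FAIL conditions first keeps PASS as the result that has to be earned.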