Agent Evaluation

Overview

An LLM-as-judge evaluation framework that scores AI-generated content on 5 dimensions using a 1-5 rubric. Agents evaluate outputs, compute a weighted composite score, and emit a structured verdict with evidence citations.

Core principle: Systematic quality verification before claiming completion.

Agent-studio currently has no way to verify agent output quality; this skill fills that gap.

When to Use

Always:
- Before marking a task complete (pair with )
- After a plan is generated (evaluate plan quality)
- After a code review produces output (evaluate review quality)
- During ref…
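The Overview describes scoring each output on five dimensions with a 1-5 rubric, combining those scores into a weighted composite, and emitting a structured verdict backed by evidence citations. The Python sketch below shows one way that scoring step could work. It is illustrative only: this excerpt does not name the five dimensions or give their weights, so the dimension names, the weights in `WEIGHTS`, and the 4.0 pass threshold are all hypothetical placeholders.

```python
from dataclasses import dataclass, field

# HYPOTHETICAL dimension names and weights: the skill specifies 5 dimensions and a
# weighted composite, but these particular names and values are placeholders.
WEIGHTS = {
    "correctness": 0.30,
    "completeness": 0.25,
    "clarity": 0.15,
    "evidence": 0.20,
    "actionability": 0.10,
}


@dataclass
class DimensionScore:
    score: int  # rubric score on the 1-5 scale
    evidence: list[str] = field(default_factory=list)  # citations justifying the score


@dataclass
class Verdict:
    composite: float
    passed: bool
    scores: dict[str, DimensionScore]


def composite_score(scores: dict[str, DimensionScore], weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted mean of the rubric scores, normalized by the total weight."""
    if set(scores) != set(weights):
        raise ValueError("scores must cover exactly the rubric dimensions")
    for name, s in scores.items():
        if not 1 <= s.score <= 5:
            raise ValueError(f"{name}: score {s.score} is outside the 1-5 rubric")
    total_weight = sum(weights.values())
    # Dividing by the total keeps the composite on the 1-5 scale even if the
    # weights do not sum exactly to 1.
    return sum(weights[n] * scores[n].score for n in weights) / total_weight


def evaluate(scores: dict[str, DimensionScore], threshold: float = 4.0) -> Verdict:
    """Build a structured verdict; the 4.0 threshold is a HYPOTHETICAL default."""
    uncited = [n for n, s in scores.items() if not s.evidence]
    if uncited:
        raise ValueError(f"no evidence cited for: {uncited}")
    composite = composite_score(scores)
    return Verdict(composite=round(composite, 2), passed=composite >= threshold, scores=scores)


if __name__ == "__main__":
    example = {name: DimensionScore(4, ["see output section 2"]) for name in WEIGHTS}
    example["correctness"] = DimensionScore(5, ["all tests cited in the transcript pass"])
    verdict = evaluate(example)
    print(verdict.composite, verdict.passed)
```

Rejecting any dimension that has no citation mirrors the requirement that verdicts carry evidence; a real judge prompt would supply the scores and citations that this sketch takes as input.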