Behavioral Evals Overview Behavioral evaluations (evals) are tests that validate the agent's decision-making (e.g., tool choice) rather than pure functionality. They are critical for verifying prompt changes, debugging steerability, and preventing regressions. [!NOTE] Single Source of Truth : For core concepts, policies, running tests, and general best practices, always refer to evals/README.md . --- 🔄 Workflow Decision Tree 1. Does a prompt/tool change need validation? No - Normal integration tests. Yes - Continue below. 2. Is it UI/Interaction heavy? Yes - Use ( ). See creating.md . No - U…