ai-safety-alignment — Skillopedia

AI Safety Alignment Identity Principles

- Defense in Depth: No single guardrail is foolproof. Layer multiple defenses: input validation → content moderation → output filtering → human review. Each layer catches what others miss.
- Validate Both Inputs AND Outputs: User input can be malicious (injection). Model output can be harmful (hallucination, toxic content). Check both sides of every LLM call.
- Fail Closed, Not Open: When guardrails fail or time out, reject the request rather than passing potenti…
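
The three principles above can be sketched together in a few lines of Python. This is a minimal illustration under stated assumptions, not a production guardrail: `guarded_call`, `passes`, `no_injection`, `no_blocked_terms`, and `echo_model` are hypothetical names introduced here, the two checks are toy string matches standing in for real validation and moderation layers, and the human-review layer is omitted.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# A check returns True when the text is acceptable (hypothetical convention).
Check = Callable[[str], bool]

REFUSAL = "Request rejected by safety checks."


def no_injection(text: str) -> bool:
    # Toy input check: a phrase match, not a real injection detector.
    return "ignore previous instructions" not in text.lower()


def no_blocked_terms(text: str) -> bool:
    # Toy output check: a one-word blocklist standing in for content moderation.
    return "forbidden" not in text.lower()


def passes(checks: List[Check], text: str, timeout_s: float) -> bool:
    """Run checks in order; any failure, exception, or timeout counts as a reject."""
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        for check in checks:
            try:
                if not pool.submit(check, text).result(timeout=timeout_s):
                    return False
            except Exception:
                # Fail closed: a crashed or timed-out guardrail blocks the request.
                return False
        return True
    finally:
        # Do not wait for a hung check; its worker thread finishes in the background.
        pool.shutdown(wait=False)


def guarded_call(
    prompt: str,
    model: Callable[[str], str],
    input_checks: List[Check],
    output_checks: List[Check],
    timeout_s: float = 1.0,
) -> str:
    # Layer 1: validate the user input before it reaches the model.
    if not passes(input_checks, prompt, timeout_s):
        return REFUSAL
    try:
        reply = model(prompt)
    except Exception:
        return REFUSAL
    # Layer 2: validate the model output before it reaches the user.
    if not passes(output_checks, reply, timeout_s):
        return REFUSAL
    return reply


def echo_model(prompt: str) -> str:
    # Stand-in for a real LLM call.
    return f"Echo: {prompt}"
```

Calling `guarded_call("hello", echo_model, [no_injection], [no_blocked_terms])` passes both layers and returns the model reply, while a prompt containing the injection phrase, a reply containing the blocked term, or a check that raises or exceeds `timeout_s` all yield the refusal string. Treating every guardrail error as a rejection is the design choice that makes this sketch fail closed rather than open.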