exploring-llm-evaluations

Exploring LLM evaluations

PostHog evaluations score events. Each evaluation is one of two types, both first-class:

- — deterministic Hog code that returns / (and optionally N/A). Best for objective, rule-based checks: format validation (JSON parses, schema matches), length limits, keyword presence/absence, regex patterns, structural assertions, latency thresholds, and cost guards. Cheap, fast, and reproducible, with no LLM call per run. Prefer this type when the criterion can be expressed as code.
- — an LLM scores generations against a prompt you write. Best for subjective or fuzzy checks: tone, helpfulness,…
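The sketch below illustrates the deterministic, code-based style of check described above. It combines several of the listed rules into one pass/fail decision, with a separate N/A outcome. It is written in Python purely for illustration. It is not Hog, and it is not PostHog's evaluation API.

Several details are assumptions chosen for the example:

- the function name `check_output`;
- the thresholds, banned words, and required keys;
- the idea that the check receives a generation's output as a string.

```python
import json
import re
from typing import Optional

# Hypothetical rule settings, chosen only for illustration.
MAX_CHARS = 2000
BANNED_WORDS = ("lorem", "TODO")
REQUIRED_KEYS = {"answer", "sources"}
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def check_output(output: Optional[str]) -> Optional[bool]:
    """Hypothetical deterministic check on one generation's output text.

    Returns True (pass), False (fail), or None (N/A: nothing to evaluate).
    """
    if output is None or output.strip() == "":
        return None  # N/A: no output was produced, so the rules don't apply

    # Length limit
    if len(output) > MAX_CHARS:
        return False

    # Keyword absence (case-insensitive)
    lowered = output.lower()
    if any(word.lower() in lowered for word in BANNED_WORDS):
        return False

    # Format validation: the output must parse as a JSON object
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False

    # Structural assertion: required keys present (a minimal stand-in for a schema check)
    if not REQUIRED_KEYS.issubset(data):
        return False

    # Regex pattern: if a "date" field is present, it must look like YYYY-MM-DD
    date = data.get("date")
    if date is not None and not (isinstance(date, str) and ISO_DATE.fullmatch(date)):
        return False

    return True


if __name__ == "__main__":
    print(check_output('{"answer": "42", "sources": []}'))  # True
    print(check_output("not json"))  # False
    print(check_output(""))  # None
```

The cheap checks (empty output, length, keywords) run before JSON parsing, so an obviously failing output short-circuits early. Returning `None` for N/A keeps "nothing to check" distinct from a genuine failure. The same inputs always produce the same result, which is what makes this kind of evaluation reproducible without an LLM call.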