# Model Benchmark

## Overview

Standardized 5-dimension LLM evaluation workflow inspired by OpenClaw PinchBench. Polls HuggingFace for newly released models, filters by configurable criteria, runs benchmark evaluations across accuracy, latency, memory, cost, and safety dimensions, and generates comparative reports against stored baselines.

Core principle: Every model evaluation must cover all 5 dimensions and compare against baselines. Single-dimension winners are misleading.

## When to Use

Invoke this skill when:

- Evaluating a new LLM for potential adoption
- Comparing model performance after a prov…