AI Vision

Overview

This skill provides a standalone CLI that calls multimodal models for UI querying, assertions, and single-step planning. It is device-agnostic: you supply a screenshot and receive structured output (coordinates, decisions, or next actions). Execution and multi-step loops are handled externally by agents using adb, hdc, or other drivers.

Prefer storing screenshots in and adding timestamps to their filenames to avoid overwriting earlier captures.

Path Convention

Canonical install and execution directory: .

Run commands from this directory:

One-off (safe in scripts/loops from any working directory):

Model Con…
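The overview leaves capture and execution to an external agent. Below is a minimal sketch of that agent side for an Android device driven by adb. It shows timestamped screenshot names and how a planned action might become an adb input command. Several parts are assumptions and not part of the AI Vision CLI: the helper names, the screenshot directory passed in by the caller, and the action dict shape (for example `{"action": "tap", "x": 540, "y": 1200}`). The CLI's real output schema and invocation are not shown here.

```python
"""Hypothetical agent-side helpers for an adb-driven loop.

Assumptions (not the AI Vision CLI's API): the caller picks the screenshot
directory, and planner actions look like {"action": "tap", "x": 540, "y": 1200}
or {"action": "back"}.
"""
import subprocess
from datetime import datetime
from pathlib import Path


def timestamped_screenshot_path(directory: str, prefix: str = "screen") -> Path:
    """Build a PNG path whose name embeds the capture time, so captures do not overwrite each other."""
    # Microsecond precision keeps names distinct even in tight loops.
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S-%f")
    return Path(directory) / f"{prefix}_{stamp}.png"


def capture_with_adb(path: Path) -> Path:
    """Save the current Android screen to `path` via `adb exec-out screencap -p`."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "wb") as out:
        # exec-out streams raw PNG bytes without shell line-ending translation.
        subprocess.run(["adb", "exec-out", "screencap", "-p"], stdout=out, check=True)
    return path


def adb_command_for(action: dict) -> list[str]:
    """Translate a hypothetical planner action into an adb input command."""
    kind = action.get("action")
    if kind == "tap":
        return ["adb", "shell", "input", "tap", str(action["x"]), str(action["y"])]
    if kind == "back":
        return ["adb", "shell", "input", "keyevent", "KEYCODE_BACK"]
    raise ValueError(f"unsupported action: {kind!r}")
```

In a loop, the agent would call `capture_with_adb(timestamped_screenshot_path(...))`, pass the file to the CLI, and then run `subprocess.run(adb_command_for(action), check=True)` on the returned action. Keeping command construction separate from execution makes it easy to log or dry-run each step, and an hdc-based driver could swap in its own command builder.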