/watch — Claude watches a video You don't have a video input; this skill gives you one. A Python script downloads the video, extracts frames as JPEGs, gets a timestamped transcript (native captions first, then Whisper API as fallback), and prints frame paths. You then each frame path to see the images and combine them with the transcript to answer the user. Step 0 — Setup preflight (runs every invocation, silent on success) Python interpreter: every command in this skill is for macOS/Linux. On Windows , substitute — the command on Windows is the Microsoft Store stub and will not run the scrip…