# Transcribe Skill

Production-grade speech-to-text transcription with intelligent file handling, multiple output formats, and parallel processing.

## When to Use

✅ **USE this skill when:**

- Transcribing audio recordings to text
- Creating subtitles for video content
- Converting speech to searchable text
- Needing word-level timestamps
- Processing podcasts or meeting recordings
- Transcribing interviews
- Converting audio notes to text
- Creating transcripts for video editing

❌ **DON'T use this skill when:**

- Transcribing YouTube videos → Use youtube-transcript (faster, no API cost)
- Real-time transcr…

\\t' read -r time word; do\n echo \"$time: $word\" >> index.txt\ndone \u003c meeting-by-words.txt\n\necho \"Archive created: index.txt\"\n```\n\n### Subtitle Synchronization\n\n```bash\n#!/bin/bash\nVIDEO=\"video.mp4\"\nAUDIO=\"video.m4a\" # Extracted audio\n\n# Get word-level transcription\n{baseDir}/transcribe.js \"$AUDIO\" --format json --output transcription.json\n\n# Build SRT cues of up to 5 words each from the word-level timestamps\njq -r '\n # Left-pad a number with zeros to width n\n def pad(n): tostring | (n - length) as $k | if $k > 0 then (\"0\" * $k) + . else . end;\n\n # Format seconds as an SRT timestamp (HH:MM:SS,mmm)\n def srt_time:\n (. * 1000 | round) as $t\n | \"\\($t / 3600000 | floor | pad(2)):\\($t % 3600000 / 60000 | floor | pad(2)):\\($t % 60000 / 1000 | floor | pad(2)),\\($t % 1000 | pad(3))\";\n\n # Emit one numbered cue per group of up to 5 words\n .words as $w\n | range(0; $w | length; 5) as $i\n | $w[$i:$i + 5] as $cue\n | \"\\($i / 5 + 1)\",\n \"\\($cue[0].start | srt_time) --> \\($cue | last | .end | srt_time)\",\n ($cue | map(.word) | join(\" \")),\n \"\"\n' transcription.json > subtitles.srt\n\necho \"SRT subtitles created: subtitles.srt\"\n```\n\n### Extract Keywords with Timestamps\n\n```bash\n#!/bin/bash\nAUDIO=\"recording.mp3\"\nKEYWORDS=(\"budget\" \"timeline\" \"decision\")\n\n# Transcribe\n{baseDir}/transcribe.js \"$AUDIO\" --format json --output data.json\n\n# Find keywords with timestamps\necho \"Keyword timestamps:\"\nfor kw in \"${KEYWORDS[@]}\"; do\n jq -r --arg kw \"${kw,,}\" '.words[] | select(.word | ascii_downcase | contains($kw)) | \"\\(.word) at \\(.start)s\"' data.json\ndone\n```\n\n## Performance Tips\n\n### 1. 
Use Cache\n\n```bash\n# First time (slow)\n{baseDir}/transcribe.js audio.mp3\n\n# Second time (fast)\n{baseDir}/transcribe.js audio.mp3\n\n# Same file, different format - different cache\n{baseDir}/transcribe.js audio.mp3 --format srt # New cache entry\n```\n\n### 2. Specify Language\n\n```bash\n# Auto-detect (slower first pass)\n{baseDir}/transcribe.js spanish.mp3\n\n# Specify language (faster, more accurate)\n{baseDir}/transcribe.js spanish.mp3 --language es\n```\n\n### 3. Pre-extract Audio\n\n```bash\n# Slower: video with embedded audio\n{baseDir}/transcribe.js video.mp4\n\n# Faster: pre-extracted audio\nffmpeg -i video.mp4 -vn -c:a libmp3lame -b:a 192k audio.mp3\n{baseDir}/transcribe.js audio.mp3\n```\n\n### 4. Batch Processing\n\n```bash\n# Process multiple files\nfor f in *.mp3; do\n {baseDir}/transcribe.js \"$f\" &\ndone\nwait\n```\n\n### 5. Parallel Segments\n\n```bash\n# Large files process segments in parallel\n# 30-minute file with 3 segments\n# Elapsed time: ~60 seconds (3x faster than sequential)\n```\n\n## Notes\n\n- Maximum file duration: 2 hours\n- Maximum file size for direct upload: 25MB\n- Caching includes format in key (different formats = different caches)\n- API rate limits: 60 requests/minute\n- Segment size: 10 minutes (configurable in code)\n- Output format affects cache (srt and json cached separately)\n- Word timestamps provide ~50ms precision\n- SRT/VTT formats group words into phrases (~5 words)\n- TSV/CSV provide per-word timestamps\n- JSON includes all metadata and word-level data\n- Audio preprocessing preserves quality while optimizing for Whisper\n- FFmpeg required for format conversion and segmentation\n- Network errors retry up to 3 times with exponential backoff\n---","attachment_filenames":["transcribe.sh"],"attachments":[{"filename":"transcribe.sh","content":"#!/bin/bash\n\nSCRIPT_DIR=\"$(cd \"$(dirname \"$0\")\" && pwd)\"\n\n# Source config if available\nif [ -f \"$SCRIPT_DIR/config\" ]; then\n source 
\"$SCRIPT_DIR/config\"\nfi\n\nif [ -z \"$1\" ]; then\n echo \"Usage: transcribe.sh \u003caudio-file>\"\n exit 1\nfi\n\nif [ -z \"$GROQ_API_KEY\" ]; then\n echo \"Error: GROQ_API_KEY not set. Create config file with: echo 'GROQ_API_KEY=\\\"your-key\\\"' > $SCRIPT_DIR/config\"\n exit 1\nfi\n\nAUDIO_FILE=\"$1\"\n\nif [ ! -f \"$AUDIO_FILE\" ]; then\n echo \"Error: File not found: $AUDIO_FILE\"\n exit 1\nfi\n\ncurl -s -X POST \"https://api.groq.com/openai/v1/audio/transcriptions\" \\\n -H \"Authorization: Bearer $GROQ_API_KEY\" \\\n -F \"file=@${AUDIO_FILE}\" \\\n -F \"model=whisper-large-v3-turbo\" \\\n -F \"response_format=text\"\n","content_type":"application/x-sh; charset=utf-8","language":"bash","size":715,"content_sha256":"b06056d1a676f54d5500c0acac508bb16bce9a76bd117aab3f644dae940530c0"}],"content_json":{"type":"doc","content":[{"type":"heading","attrs":{"level":1},"content":[{"text":"Transcribe Skill","type":"text"}]},{"type":"paragraph","content":[{"text":"Production-grade speech-to-text transcription with intelligent file handling, multiple output formats, and parallel processing.","type":"text"}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"When to Use","type":"text"}]},{"type":"paragraph","content":[{"text":"✅ ","type":"text"},{"text":"USE this skill when:","type":"text","marks":[{"type":"strong"}]}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Transcribing audio recordings to text","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Creating subtitles for video content","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Converting speech to searchable text","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Needing word-level timestamps","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Processing podcasts or meeting 
recordings","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Transcribing interviews","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Converting audio notes to text","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Creating transcripts for video editing","type":"text"}]}]}]},{"type":"paragraph","content":[{"text":"❌ ","type":"text"},{"text":"DON'T use this skill when:","type":"text","marks":[{"type":"strong"}]}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Transcribing YouTube videos → Use youtube-transcript (faster, no API cost)","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Real-time transcription → Use streaming tools","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Already have captions → Use youtube-transcript","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Need video-specific processing → Use ffmpeg-tools first","type":"text"}]}]}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Prerequisites","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"# 1. Get Groq API key\n# Visit: https://console.groq.com/\n# Create an API key\n\n# 2. Set environment variable\nexport GROQ_API_KEY=\"gsk_your_api_key_here\"\n\n# 3. Install FFmpeg (for audio processing)\nbrew install ffmpeg # macOS\nsudo apt install ffmpeg # Ubuntu/Debian\n\n# 4. 
Verify\nnode --version # Should show version","type":"text"}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Commands","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Basic Usage","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"# Basic transcription (outputs plain text)\n{baseDir}/transcribe.js audio.m4a\n\n# Transcribe with specific output format\n{baseDir}/transcribe.js audio.mp3 --format srt --output subtitles.srt\n{baseDir}/transcribe.js meeting.wav --format json --output result.json\n\n# Specify language for better accuracy\n{baseDir}/transcribe.js spanish.mp3 --language es --format text\n{baseDir}/transcribe.js audio.mp3 --language de --format vtt","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Output Formats","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"# Plain text (default)\n{baseDir}/transcribe.js audio.mp3 --format text\nTranscriber output follows without timestamps.\n\n# JSON with detailed data\n{baseDir}/transcribe.js audio.mp3 --format json\n{\n \"text\": \"Transcription text...\",\n \"duration\": 123.45,\n \"language\": \"en\",\n \"words\": [{\"word\": \"Transcription\", \"start\": 0.0, \"end\": 0.5}, ...]\n}\n\n# SRT subtitles\n{baseDir}/transcribe.js audio.mp3 --format srt --output subtitles.srt\n1\n00:00:00,000 --> 00:00:05,500\nTranscription of the audio begins here\n\n2\n00:00:05,500 --> 00:00:11,200\nAnd continues in the next segment\n\n# VTT subtitles\n{baseDir}/transcribe.js audio.mp3 --format vtt --output captions.vtt\nWEBVTT\n\n00:00.000 --> 00:05.500\nTranscription of the audio begins here\n\n# Word timings TSV\n{baseDir}/transcribe.js audio.mp3 --format tsv\nstart\\tend\\tword\n0.000\\t0.450\\tTranscription\n0.450\\t0.820\\tof\n0.820\\t1.240\\tthe\n\n# Word timings CSV\n{baseDir}/transcribe.js audio.mp3 --format 
csv\nstart,end,word\n0.000,0.450,\"Transcription\"\n0.450,0.820,\"of\"\n0.820,1.240,\"the\"","type":"text"}]},{"type":"paragraph","content":[{"text":"Format Comparison:","type":"text","marks":[{"type":"strong"}]}]},{"type":"table","attrs":{"layout":null},"content":[{"type":"tr","content":[{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Format","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Use Case","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Word Timestamps","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"File Size","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"text","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"General use","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"❌","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Small","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"json","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"API 
integration","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"✅","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Large","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"srt","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Subtitles","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"⚠️ Phrases","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Medium","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"vtt","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Web captions","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"⚠️ 
Phrases","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Medium","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"tsv","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Spreadsheet","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"✅","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Medium","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"csv","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Database 
import","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"✅","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Medium","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"word_timings","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Analysis","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"✅","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Large","type":"text"}]}]}]}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Language Selection","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"# Auto-detect (default)\n{baseDir}/transcribe.js audio.mp3\n\n# Specify language for better accuracy\n{baseDir}/transcribe.js audio.mp3 --language en # English\n{baseDir}/transcribe.js audio.mp3 --language es # Spanish\n{baseDir}/transcribe.js audio.mp3 --language fr # French\n{baseDir}/transcribe.js audio.mp3 --language de # German\n{baseDir}/transcribe.js audio.mp3 --language ja # Japanese","type":"text"}]},{"type":"paragraph","content":[{"text":"Supported Languages:","type":"text","marks":[{"type":"strong"}]},{"text":" All 99 languages supported by Whisper","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Large File Processing","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"# Files >25MB are automatically 
segmented\n{baseDir}/transcribe.js long-recording.mp3\n\n# Progress shown for segmented files\n⏳ Transcribing: Segment 3/12 (25.0%) | Elapsed: 45.2s\n\n# Output combined automatically","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Cache Control","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"# Use cache (default) - instant for previously transcribed\n{baseDir}/transcribe.js audio.mp3\n\n# Force fresh transcription\n{baseDir}/transcribe.js audio.mp3 --no-cache","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"API Provider Selection","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"# Use Groq (default) - faster, cheaper\n{baseDir}/transcribe.js audio.mp3 --provider groq\n\n# Use OpenAI Whisper (requires OPENAI_API_KEY)\n{baseDir}/transcribe.js audio.mp3 --provider openai","type":"text"}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Supported Audio 
Formats","type":"text"}]},{"type":"table","attrs":{"layout":null},"content":[{"type":"tr","content":[{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Format","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Extension","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Notes","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"MP3","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":".mp3","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Best compatibility","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"MP4","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":".mp4, .m4a","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"iOS 
recordings","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"WAV","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":".wav","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Uncompressed, large files","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"OGG","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":".ogg, .oga, .ogv","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Open format","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"FLAC","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":".flac","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Lossless 
compression","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"WebM","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":".webm","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Web audio/videos","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"AAC","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":".aac","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Apple format","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"WMA","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":".wma","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Windows format","type":"text"}]}]}]}]},{"type":"paragraph","content":[{"text":"Audio Preprocessing:","type":"text","marks":[{"type":"strong"}]}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Unsupported formats are auto-converted to MP3","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Sample rate normalized to 16kHz (Whisper 
optimal)","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Mono channel for better accuracy","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Bitrate: 192kbps MP3","type":"text"}]}]}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Features","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Automatic Segmentation","type":"text"}]},{"type":"paragraph","content":[{"text":"Large audio files are automatically split for processing:","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":""},"content":[{"text":"Audio File >25MB\n ↓ FFmpeg\nConvert to MP3 (16kHz, mono)\n ↓\nSplit into 10-minute segments\n ↓\nTranscribe segments in parallel\n ↓\nMerge results with adjusted timestamps","type":"text"}]},{"type":"paragraph","content":[{"text":"Segmentation Benefits:","type":"text","marks":[{"type":"strong"}]}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"✓ Handles recordings up to 2 hours","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"✓ Respects API rate limits","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"✓ Parallel processing for speed","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"✓ Seamless results (timestamps adjusted)","type":"text"}]}]}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Word-Level Timestamps","type":"text"}]},{"type":"paragraph","content":[{"text":"Each word includes start and end timestamps:","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"json"},"content":[{"text":"{\n \"words\": [\n {\"word\": \"Hello\", \"start\": 0.000, \"end\": 0.320},\n {\"word\": \"and\", \"start\": 0.320, \"end\": 0.560},\n {\"word\": \"welcome\", \"start\": 0.560, \"end\": 0.980},\n {\"word\": \"everyone\", \"start\": 
0.980, \"end\": 1.420}\n ]\n}","type":"text"}]},{"type":"paragraph","content":[{"text":"Uses for Timestamps:","type":"text","marks":[{"type":"strong"}]}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Jump to specific words in audio","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Create perfectly synced subtitles","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Search within transcripts","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Edit audio at transcript points","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Analyze speech patterns","type":"text"}]}]}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Intelligent Caching","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Cache Location:","type":"text","marks":[{"type":"strong"}]},{"text":" ","type":"text"},{"text":"/tmp/transcribe-cache/","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"TTL:","type":"text","marks":[{"type":"strong"}]},{"text":" 24 hours","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Cache Key:","type":"text","marks":[{"type":"strong"}]},{"text":" File hash + language + model","type":"text"}]}]}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"# First time: ~10-60 seconds\n{baseDir}/transcribe.js audio.mp3 --format json\n\n# Second time: ~1 second (cache hit)\n{baseDir}/transcribe.js audio.mp3 --format json\n\n# Force fresh: ~10-60 seconds\n{baseDir}/transcribe.js audio.mp3 --format json --no-cache","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Rate 
Limiting","type":"text"}]},{"type":"paragraph","content":[{"text":"Built-in protection against API limits:","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Max 60 requests per minute","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Automatic delays between requests","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Sequential processing for safety","type":"text"}]}]}]},{"type":"paragraph","content":[{"text":"Cost Optimization:","type":"text","marks":[{"type":"strong"}]}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Groq Whisper Turbo: Free tier available","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Cached results cost nothing","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Segmented files use 1 request per segment","type":"text"}]}]}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Error Handling","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Error 
Codes","type":"text"}]},{"type":"table","attrs":{"layout":null},"content":[{"type":"tr","content":[{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Code","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Name","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Description","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"0","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"SUCCESS","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Transcription complete","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"1","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"INVALID_INPUT","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Bad 
parameters","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"2","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"FILE_NOT_FOUND","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Audio file missing","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"3","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"FILE_TOO_LARGE","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Exceeds 2 hours","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"4","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"UNSUPPORTED_FORMAT","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Can't process 
format","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"5","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"API_KEY_MISSING","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"GROQ_API_KEY not set","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"6","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"API_ERROR","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Request failed","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"7","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"RATE_LIMITED","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"API 
throttling","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"8","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"NETWORK_ERROR","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Connection issue","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"9","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"TIMEOUT","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Request took too long","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"10","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"AUDIO_PROCESSING_ERROR","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"FFmpeg 
failed","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"11","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"SEGMENTATION_ERROR","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Splitting failed","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"12","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"INTERRUPTED","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"User cancelled","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"99","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"UNKNOWN","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Unexpected error","type":"text"}]}]}]}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Common Errors","type":"text"}]},{"type":"paragraph","content":[{"text":"\"API key not found\"","type":"text","marks":[{"type":"strong"}]}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"# Solution: Set the environment variable\nexport GROQ_API_KEY=\"gsk_your_key\"\necho \"export GROQ_API_KEY=gsk_your_key\" >> ~/.zshrc # 
Persist","type":"text"}]},{"type":"paragraph","content":[{"text":"\"File too large\"","type":"text","marks":[{"type":"strong"}]}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"# Video duration exceeds 2 hours\n# Solution: Split manually first\nffmpeg -i long.mp4 -ss 0 -t 7200 first.mp4\nffmpeg -i long.mp4 -ss 7200 -t 7200 second.mp4","type":"text"}]},{"type":"paragraph","content":[{"text":"\"Rate limited\"","type":"text","marks":[{"type":"strong"}]}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"# Too many requests\n# Solution: Wait 1 minute, try again\n# Or add delay between batch operations","type":"text"}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Technical Details","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Processing Pipeline","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":""},"content":[{"text":"1. Validate Input\n ├── Check file exists\n ├── Check format supported\n ├── Probe audio metadata\n └── Validate size/duration\n\n2. Check Cache\n └── Return cached if available\n\n3. Preprocess (if needed)\n ├── Convert to MP3\n ├── Set sample rate to 16kHz\n └── Normalize to mono\n\n4. Split (if >25MB)\n └── Create 10-minute segments\n\n5. Transcribe\n ├── Rate-limited requests\n ├── Word-level timestamps\n └── Progress tracking\n\n6. Merge (if segmented)\n └── Adjust timestamps\n\n7. Format Output\n └── Apply selected format\n\n8. 
Cache Result\n └── Store for 24 hours","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"API Configuration","type":"text"}]},{"type":"paragraph","content":[{"text":"Groq (Default):","type":"text","marks":[{"type":"strong"}]}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Endpoint: ","type":"text"},{"text":"api.groq.com/openai/v1/audio/transcriptions","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Model: ","type":"text"},{"text":"whisper-large-v3-turbo","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Max file size: 25MB per request","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Word-level timestamps: Yes","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Cost: Free tier: $0.0013/minute","type":"text"}]}]}]},{"type":"paragraph","content":[{"text":"OpenAI (Optional):","type":"text","marks":[{"type":"strong"}]}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Endpoint: ","type":"text"},{"text":"api.openai.com/v1/audio/transcriptions","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Model: ","type":"text"},{"text":"whisper-1","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Max file size: 25MB per request","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Word-level timestamps: Yes","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Cost: $0.006/minute","type":"text"}]}]}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Timestamp 
Adjustment","type":"text"}]},{"type":"paragraph","content":[{"text":"For segmented files, timestamps are adjusted:","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":""},"content":[{"text":"Segment 1: [0:00 - 10:00] → [0:00 - 10:00]\nSegment 2: [0:00 - 10:00] → [10:00 - 20:00]\nSegment 3: [0:00 - 10:00] → [20:00 - 30:00]","type":"text"}]},{"type":"paragraph","content":[{"text":"Example:","type":"text","marks":[{"type":"strong"}]}]},{"type":"code_block","attrs":{"wrap":false,"language":""},"content":[{"text":"Segment 2 word: \"discussion\", start: 5:30\nAdjusted timestamp: 5:30 + 10:00 = 15:30","type":"text"}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Examples","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Transcribe Meeting Recording","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"#!/bin/bash\nMEETING=\"meeting-$(date +%Y%m%d).mp3\"\n\necho \"Transcribing meeting...\"\n{baseDir}/transcribe.js \"$MEETING\" --format txt --output \"$MEETING.txt\"\n{baseDir}/transcribe.js \"$MEETING\" --format srt --output \"$MEETING.srt\"\n{baseDir}/transcribe.js \"$MEETING\" --format json --output \"$MEETING.json\"\n\necho \"Done: $MEETING.{txt,srt,json}\"","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Batch Transcribe Directory","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"#!/bin/bash\nmkdir -p transcripts\n\nfor audio in *.mp3 *.m4a *.wav; do\n [ -f \"$audio\" ] || continue\n \n echo \"Processing: $audio\"\n base=\"${audio%.*}\"\n \n {baseDir}/transcribe.js \"$audio\" --format srt --output \"transcripts/${base}.srt\" 2>/dev/null\n \n if [ $? 
-eq 0 ]; then\n echo \" ✓ Created transcripts/${base}.srt\"\n else\n echo \" ✗ Failed\"\n fi\n \n sleep 1 # Rate limit protection\ndone","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Create Searchable Meeting Archive","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"#!/bin/bash\nINPUT=\"meeting.mp3\"\n\n# Transcribe with word timings\n{baseDir}/transcribe.js \"$INPUT\" --format json --output meeting.json\n\n# Extract all utterances with timestamps\njq -r '\n .words[] | \n \"\\(.start | tostring | split(\".\") | .[0] + \".\" + .[1][:2])\\t\\(.word)\"\n' meeting.json > meeting-by-words.txt\n\n# Create time-indexed file\necho \"Meeting transcript indexed by time\" > index.txt\nwhile IFS=$'\\t' read -r time word; do\n echo \"$time: $word\" >> index.txt\ndone \u003c meeting-by-words.txt\n\necho \"Archive created: index.txt\"","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Subtitle 
Synchronization","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"#!/bin/bash\nVIDEO=\"video.mp4\"\nAUDIO=\"video.m4a\" # Extracted audio\n\n# Get word-level transcription\n{baseDir}/transcribe.js \"$AUDIO\" --format json --output transcription.json\n\n# Create SRT, grouping words into cues of 7 (the group size is an arbitrary choice)\njq -r '\n # Zero-pad a number to width w\n def pad(w): tostring | if length \u003c w then (\"0\" * (w - length)) + . else . end;\n # Convert seconds to an SRT timestamp (HH:MM:SS,mmm)\n def srt_time:\n (. * 1000 | round) as $t |\n \"\\($t / 3600000 | floor | pad(2)):\\($t % 3600000 / 60000 | floor | pad(2)):\\($t % 60000 / 1000 | floor | pad(2)),\\($t % 1000 | pad(3))\";\n\n [.words | range(0; length; 7) as $i | .[$i:$i+7]]\n | to_entries[]\n | \"\\(.key + 1)\\n\\(.value[0].start | srt_time) --> \\(.value[-1].end | srt_time)\\n\\(.value | map(.word) | join(\" \"))\\n\"\n' transcription.json > subtitles.srt\n\necho \"SRT subtitles created: subtitles.srt\"","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Extract 
Keywords with Timestamps","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"#!/bin/bash\nAUDIO=\"recording.mp3\"\nKEYWORDS=(\"budget\" \"timeline\" \"decision\")\n\n# Transcribe\n{baseDir}/transcribe.js \"$AUDIO\" --format json --output data.json\n\n# Find keywords with timestamps\necho \"Keyword timestamps:\"\nfor kw in \"${KEYWORDS[@]}\"; do\n jq -r --arg kw \"${kw,,}\" '.words[] | select(.word | ascii_downcase | contains($kw)) | \"\\(.word) at \\(.start)s\"' data.json\ndone","type":"text"}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Performance Tips","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"1. Use Cache","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"# First time (slow)\n{baseDir}/transcribe.js audio.mp3\n\n# Second time (fast)\n{baseDir}/transcribe.js audio.mp3\n\n# Same file, different format - different cache\n{baseDir}/transcribe.js audio.mp3 --format srt # New cache entry","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"2. Specify Language","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"# Auto-detect (slower first pass)\n{baseDir}/transcribe.js spanish.mp3\n\n# Specify language (faster, more accurate)\n{baseDir}/transcribe.js spanish.mp3 --language es","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"3. 
Pre-extract Audio","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"# Slower: video with embedded audio\n{baseDir}/transcribe.js video.mp4\n\n# Faster: pre-extracted audio\nffmpeg -i video.mp4 -vn -c:a libmp3lame -b:a 192k audio.mp3\n{baseDir}/transcribe.js audio.mp3","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"4. Batch Processing","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"# Process multiple files\nfor f in *.mp3; do\n {baseDir}/transcribe.js \"$f\" &\ndone\nwait","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"5. Parallel Segments","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"# Large files process segments in parallel\n# 30-minute file with 3 segments\n# Elapsed time: ~60 seconds (3x faster than sequential)","type":"text"}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Notes","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Maximum file duration: 2 hours","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Maximum file size for direct upload: 25MB","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Caching includes format in key (different formats = different caches)","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"API rate limits: 60 requests/minute","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Segment size: 10 minutes (configurable in code)","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Output format affects cache (srt and json cached separately)","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Word timestamps provide 
~50ms precision","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"SRT/VTT formats group words into phrases (~5 words)","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"TSV/CSV provide per-word timestamps","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"JSON includes all metadata and word-level data","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Audio preprocessing preserves quality while optimizing for Whisper","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"FFmpeg required for format conversion and segmentation","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Network errors retry up to 3 times with exponential backoff","type":"text"}]}]}]},{"type":"hr","attrs":{"markup":"---"}}]},"metadata":{"date":"2026-06-05","name":"transcribe","author":"@skillopedia","source":{"stars":0,"repo_name":"upgraded-carnival","origin_url":"https://github.com/winsorllc/upgraded-carnival/blob/HEAD/.pi/skills/transcribe/SKILL.md","repo_owner":"winsorllc","body_sha256":"1a4e8a4b0e4642ec434f8e4b9caf9447a848726d6a33747216de39c47f73dda2","cluster_key":"79ace6b5ad0adfd74b5a8a66effd371c0456a6e8d4347930b496623dfd263d14","clean_bundle":{"format":"clean-skill-bundle-v1","source":"winsorllc/upgraded-carnival/.pi/skills/transcribe/SKILL.md","attachments":[{"id":"43b1f4cd-cba4-5f4a-8f58-93965220ff49","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/43b1f4cd-cba4-5f4a-8f58-93965220ff49/attachment.js","path":"transcribe.js","size":25540,"sha256":"983d668d5dfc9765a02e3213fd4ca57a22c8059c54b1b20f1a1a4bcbb084a8ed","contentType":"application/javascript; 
charset=utf-8"},{"id":"c89286db-e775-5ef0-9a3f-15eb2e74cca2","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/c89286db-e775-5ef0-9a3f-15eb2e74cca2/attachment.sh","path":"transcribe.sh","size":715,"sha256":"b06056d1a676f54d5500c0acac508bb16bce9a76bd117aab3f644dae940530c0","contentType":"application/x-sh; charset=utf-8"}],"bundle_sha256":"de9e8c3dd5b9acd44588595b8b9ff4243c63483907690e81798f19a0ee67dcb2","attachment_count":2,"text_attachments":2,"attachment_storage":"skillopedia-attachments-v1","binary_attachments":0,"excluded_attachments":[]},"cluster_size":1,"skill_md_path":".pi/skills/transcribe/SKILL.md","import_metadata":{"date":"2026-06-05","author":"@skillopedia","version":"v1","category":"documents-office","category_label":"Documents"},"exact_dupes_collapsed_into_this":0},"version":"v1","category":"documents-office","metadata":{"tags":["transcription","speech-to-text","whisper","audio","srt","vtt","subtitle"],"author":"Pi Agent","version":"2.0.0","category":"media","requires":{"env":["GROQ_API_KEY"],"bins":["node","ffmpeg"]}},"import_tag":"clean-skills-v1","description":"Production-grade speech-to-text transcription with Groq Whisper API support. Automatic file segmentation, multiple output formats, word-level timestamps, language auto-detection, and intelligent caching."}},"renderedAt":1782982096643}
