# Groq Core Workflow B: Audio, Vision & Speech

## Overview

Beyond chat completions, Groq provides ultra-fast audio transcription (Whisper at 216x real-time), multimodal vision (Llama 4 Scout/Maverick), and text-to-speech. These endpoints use the same client.

## Prerequisites

- installed, set
- For audio: audio files in supported formats
- For vision: image URLs or base64 images

## Audio Models

| Model ID | Languages | Speed | Best For |
|----------|-----------|-------|----------|
| | 100+ | 164x real-time | Best accuracy, multilingual |
| | 100+ | 216x real-time | Best speed/accuracy balance |

Supported…