Multimodal Corpus Ingestion

Overview

Mixed corpora break down when everything is treated like plain text. Ingest code, prose, visuals, and transcripts according to what each artifact can actually tell you, then normalize them into one corpus with provenance intact.

When to Use

- A task spans code, docs, PDFs, screenshots, or diagrams
- You need one queryable corpus instead of scattered files
- The user gives a folder with mixed artifact types
- Architecture or product understanding depends on visuals and prose together
- Retrieval quality is poor because source types are inconsistent

Source C…