## Instructions

Determine these before writing code. Prefer discovering them from the repo and the user request. Ask only when the choice materially changes the implementation.

1. **Runtime shape**
   - Are they connecting to a running local server, embedding Chroma into tests, or setting up local development from scratch?
   - Decide whether they need `chroma run`, a Docker or service command, `HttpClient` or `ChromaClient`, or Python `EphemeralClient`.
2. **Persistence**
   - Persistent local data: choose an intentional data path.
   - Disposable test data: use defaults or a temp directory.
3. **Embedding model**
   - Reuse the app's existing embedding provider when possible…
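
The runtime-shape and persistence decisions above can be sketched in Python as follows. This is a minimal illustration, not part of the skill itself: `RuntimeChoice`, `choose_runtime`, and `make_client` are hypothetical names, and `chromadb.PersistentClient` is assumed as the persistent option because the text only names `HttpClient` and `EphemeralClient`. The `localhost:8000` default follows the address this skill uses elsewhere.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RuntimeChoice:
    """Hypothetical record of the runtime-shape and persistence decision."""

    kind: str  # "http", "persistent", or "ephemeral"
    host: str = "localhost"
    port: int = 8000
    path: Optional[str] = None


def choose_runtime(
    server_running: bool,
    needs_persistence: bool,
    data_path: Optional[str] = None,
) -> RuntimeChoice:
    """Pick one mode explicitly; never switch modes silently."""
    if server_running:
        # An existing local server wins: connect to it over HTTP.
        return RuntimeChoice(kind="http")
    if needs_persistence:
        # Persistent data requires a deliberately chosen path.
        if not data_path:
            raise ValueError("persistent data needs an intentional data path")
        return RuntimeChoice(kind="persistent", path=data_path)
    # Disposable data: lost when the process exits.
    return RuntimeChoice(kind="ephemeral")


def make_client(choice: RuntimeChoice):
    """Build the matching client (assumes the `chromadb` package is installed)."""
    import chromadb  # imported lazily so the decision logic runs without it

    if choice.kind == "http":
        return chromadb.HttpClient(host=choice.host, port=choice.port)
    if choice.kind == "persistent":
        # Assumption: PersistentClient is not named in the text above.
        return chromadb.PersistentClient(path=choice.path)
    return chromadb.EphemeralClient()
```

Keeping the decision separate from client construction makes the chosen mode visible in code or config, and lets tests check the decision without a running server.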

,\n },\n});\n```\n\n### Combining regex with metadata filters\n\nRegex filters can be combined with metadata filters using `$and` and `$or` operators. This is powerful for narrowing results by both content patterns and structured metadata. Note however regex can not be used on metadata string values.\n\n```typescript\nawait collection.query({\n queryTexts: ['query1', 'query2'],\n whereDocument: {\n $and: [{ $contains: 'search_string_1' }, { $regex: '[a-z]+' }],\n },\n});\n```\n\n### Performance considerations\n\nRegex filtering happens after the initial vector search retrieves candidates. For best performance:\n- Keep regex patterns simple when possible\n- Use metadata filters to reduce the candidate set before regex matching\n- Consider whether a metadata field with pre-extracted values would be faster than runtime regex\n","content_type":"text/markdown; charset=utf-8","language":"markdown","size":2420,"content_sha256":"cecf6396b638ab810326a34145cf7cad930141ee55d42dee495c12b33b4487a9"},{"filename":"understanding-a-codebase.md","content":"---\nname: Integrating Chroma into an existing system\ndescription: Guidance for adding Chroma search to an existing application\n---\n\n## Integrating Chroma into an existing system\n\nAdding search to an existing application requires understanding the data flow and planning both the initial import and ongoing synchronization. This guide helps identify the key questions to answer.\n\n### Key questions to ask the user\n\nBefore writing any code, clarify:\n\n1. **What data should be searchable?** (documents, products, messages, etc.)\n2. **Where does that data currently live?** (database, S3, API, files)\n3. **How is the data structured?** (helps determine chunking strategy)\n4. **How often does the data change?** (informs sync strategy)\n5. 
**What latency is acceptable for updates to appear in search?**\n\n## Understanding the source data\n\nBefore designing the import pipeline, ask the user if you can look at a sample of the data that will be made searchable. Seeing real records is far more useful than a description of the schema.\n\n**If the data is in a database:** Write a short script that connects to the database and prints a few records. For example, a script that queries 3-5 rows from the relevant table and prints them to the terminal. This lets you see the actual field names, content lengths, and metadata available.\n\n**If the data is on disk:** Read a few of the files directly to understand their structure, format, and size. For example, if indexing markdown files, read 2-3 of them to see how they're organized.\n\nWhat to look for:\n- **Which field(s) contain the searchable text** — this becomes the document content in Chroma\n- **How long the content is** — determines whether chunking is needed\n- **What metadata is available** — fields like category, author, date, or tenant ID that could be useful for filtering\n- **How records are identified** — the primary key or filename that will link Chroma documents back to the source\n\nThis step prevents guesswork and leads to better chunking and metadata design decisions.\n\n## Initial data import (offline ingest)\n\nThe first step is getting existing data into Chroma. This typically involves:\n\n1. **Reading from the source** - database queries, S3 listing, API pagination\n2. **Chunking** - breaking large documents into searchable pieces (see data-model.md)\n3. **Embedding** - converting text chunks to vectors\n4. **Writing to Chroma** - batching for efficiency\n\nBuild this as a reusable pipeline, not a one-off script. The same chunking and embedding logic will be needed for ongoing updates.\n\n**Progress tracking:** For large imports, track which records have been processed. This allows resuming after failures and re-running for updates. 
A simple approach is storing the last processed ID or timestamp.\n\n## Keeping data in sync (online writes)\n\nAfter the initial import, new and updated data must flow to Chroma. There are two main patterns:\n\n### Asynchronous (recommended)\n\nUse a message queue (SQS, RabbitMQ, Redis streams, etc.) to decouple the primary write path from Chroma updates:\n\n1. Application writes to primary database\n2. Application publishes an event with the record ID\n3. Queue consumer fetches the record, chunks, embeds, and writes to Chroma\n\n**Benefits:** Primary writes aren't slowed by embedding latency. Retries are handled by the queue. Search updates can lag slightly without affecting the main application.\n\n### Synchronous\n\nIf no queue infrastructure exists and slight latency is acceptable, update Chroma in the same request:\n\n1. Application writes to primary database\n2. Application chunks, embeds, and writes to Chroma\n3. Request completes\n\n**Tradeoffs:** Simpler infrastructure but adds latency to every write. Failures in Chroma can affect the primary write path unless carefully handled.\n\n**Ask the user:** Do they have an async queue? If not, is synchronous acceptable, or should we set one up?\n\n## Handling updates and deletes\n\n- **Updates:** Re-chunk and re-embed the document, then use `upsert` to replace existing chunks\n- **Deletes:** Delete all chunks for the document by ID prefix or metadata filter\n\nStoring the source record ID in chunk metadata makes this straightforward. For example, if a blog post with ID `post-123` has 3 chunks, store `{\"source_id\": \"post-123\", \"chunk_index\": 0}` etc. 
on each chunk.","content_type":"text/markdown; charset=utf-8","language":"markdown","size":4315,"content_sha256":"de253c412a1c5899774606a6728cea15693b385d283022a68cbded083f3b9864"},{"filename":"updating-deleting/python.md","content":"---\nname: Updating and Deleting\ndescription: Update existing documents and delete data from collections\n---\n\n## Updating and Deleting\n\nChroma provides `update`, `upsert`, and `delete` methods for modifying data after initial insertion. Understanding when to use each is important for building reliable data sync pipelines.\n\n### Method overview\n\n| Method | Behavior | Use when |\n|--------|----------|----------|\n| `update` | Modifies existing documents, fails if ID doesn't exist | You know the document exists |\n| `upsert` | Updates if exists, inserts if not | Syncing from external data source |\n| `delete` | Removes documents by ID or filter | Removing stale or unwanted data |\n\n### Imports\n\n```python\nimport time\nfrom typing import TypedDict\n\nimport chromadb\n\nclient = chromadb.HttpClient(host=\"localhost\", port=8000)\n```\n\n## Update\n\nUpdate modifies existing documents. 
If an ID doesn't exist, the operation fails silently for that ID (no error thrown, but nothing is updated).\n\n**Important:** When you update a document's text, Chroma re-computes the embedding automatically using the collection's embedding function.\n\n```python\ncollection = client.get_or_create_collection(name=\"my_collection\")\n\ncollection.add(\n ids=[\"doc1\", \"doc2\"],\n documents=[\"Original text for doc1\", \"Original text for doc2\"],\n metadatas=[{\"category\": \"draft\"}, {\"category\": \"draft\"}],\n)\n\ncollection.update(\n ids=[\"doc1\"],\n documents=[\"Updated text for doc1\"],\n)\n\ncollection.update(\n ids=[\"doc1\", \"doc2\"],\n metadatas=[{\"category\": \"published\"}, {\"category\": \"published\"}],\n)\n\ncollection.update(\n ids=[\"doc2\"],\n documents=[\"Completely revised doc2 content\"],\n metadatas=[{\"category\": \"published\", \"revision\": 2}],\n)\n```\n\n## Upsert\n\nUpsert is the preferred method for syncing data from an external source. It inserts new documents and updates existing ones in a single operation.\n\n**When to use upsert vs update:**\n- Use `upsert` when syncing from a primary database (you don't know which records are new)\n- Use `update` when you're certain the document already exists\n\n```python\ncollection2 = client.get_or_create_collection(name=\"articles\")\n\ncollection2.upsert(\n ids=[\"article-123\", \"article-456\", \"article-789\"],\n documents=[\n \"Content of article 123\",\n \"Content of article 456\",\n \"Content of article 789\",\n ],\n metadatas=[\n {\"source_id\": \"123\", \"updated_at\": int(time.time())},\n {\"source_id\": \"456\", \"updated_at\": int(time.time())},\n {\"source_id\": \"789\", \"updated_at\": int(time.time())},\n ],\n)\n\ncollection2.upsert(\n ids=[\"article-123\", \"article-456\"],\n documents=[\n \"Updated content of article 123\",\n \"Updated content of article 456\",\n ],\n metadatas=[\n {\"source_id\": \"123\", \"updated_at\": int(time.time())},\n {\"source_id\": \"456\", 
\"updated_at\": int(time.time())},\n ],\n)\n```\n\n## Delete by ID\n\nThe simplest way to delete documents is by their IDs.\n\n```python\nawaiting_cleanup = [\"article-789\", \"article-456\"]\ncollection2.delete(ids=awaiting_cleanup)\n```\n\n## Delete by filter\n\nDelete documents matching metadata or content filters without knowing specific IDs. Useful for bulk cleanup operations.\n\n```python\ncollection2.delete(\n where={\"source_id\": \"123\"},\n)\n```\n\n## Syncing from an external data source\n\nA common pattern is keeping Chroma in sync with a primary database. This example shows how to handle creates, updates, and deletes.\n\n```python\nclass SourceRecord(TypedDict):\n id: str\n content: str\n deleted: bool\n updated_at: int\n\n\ndef sync_records(records: list[SourceRecord]) -> None:\n active_records = [record for record in records if not record[\"deleted\"]]\n deleted_ids = [record[\"id\"] for record in records if record[\"deleted\"]]\n\n if active_records:\n collection2.upsert(\n ids=[record[\"id\"] for record in active_records],\n documents=[record[\"content\"] for record in active_records],\n metadatas=[\n {\n \"source_id\": record[\"id\"],\n \"updated_at\": record[\"updated_at\"],\n }\n for record in active_records\n ],\n )\n\n if deleted_ids:\n collection2.delete(ids=deleted_ids)\n```\n\n### Sync strategy tips\n\n**Track source IDs:** Always store the primary database ID in metadata so you can find and update documents later.\n\n**Batch operations:** Process updates in batches of 100-500 to balance throughput and memory usage.\n\n**Handle deletes:** When records are deleted from your primary database, delete them from Chroma too. 
Use metadata filters if you track `source_id`.\n\n**Idempotent syncs:** Use `upsert` so re-running a sync doesn't create duplicates.\n","content_type":"text/markdown; charset=utf-8","language":"markdown","size":4726,"content_sha256":"4750797962aeb6262c06190a708ae9556d694962d4d784a7ebaa62698ec60f70"},{"filename":"updating-deleting/typescript.md","content":"---\nname: Updating and Deleting\ndescription: Update existing documents and delete data from collections\n---\n\n## Updating and Deleting\n\nChroma provides `update`, `upsert`, and `delete` methods for modifying data after initial insertion. Understanding when to use each is important for building reliable data sync pipelines.\n\n### Method overview\n\n| Method | Behavior | Use when |\n|--------|----------|----------|\n| `update` | Modifies existing documents, fails if ID doesn't exist | You know the document exists |\n| `upsert` | Updates if exists, inserts if not | Syncing from external data source |\n| `delete` | Removes documents by ID or filter | Removing stale or unwanted data |\n\n### Imports\n\n```typescript\nimport { ChromaClient } from 'chromadb';\nimport { DefaultEmbeddingFunction } from '@chroma-core/default-embed';\n\nconst client = new ChromaClient();\nconst embeddingFunction = new DefaultEmbeddingFunction();\n```\n\n## Update\n\nUpdate modifies existing documents. 
If an ID doesn't exist, the operation fails silently for that ID (no error thrown, but nothing is updated).\n\n**Important:** When you update a document's text, Chroma re-computes the embedding automatically using the collection's embedding function.\n\n```typescript\nconst collection = await client.getOrCreateCollection({\n name: 'my_collection',\n embeddingFunction,\n});\n\nawait collection.add({\n ids: ['doc1', 'doc2'],\n documents: ['Original text for doc1', 'Original text for doc2'],\n metadatas: [{ category: 'draft' }, { category: 'draft' }],\n});\n\nawait collection.update({\n ids: ['doc1'],\n documents: ['Updated text for doc1'],\n});\n\nawait collection.update({\n ids: ['doc1', 'doc2'],\n metadatas: [{ category: 'published' }, { category: 'published' }],\n});\n\nawait collection.update({\n ids: ['doc2'],\n documents: ['Completely revised doc2 content'],\n metadatas: [{ category: 'published', revision: 2 }],\n});\n```\n\n## Upsert\n\nUpsert is the preferred method for syncing data from an external source. 
It inserts new documents and updates existing ones in a single operation.\n\n**When to use upsert vs update:**\n- Use `upsert` when syncing from a primary database (you don't know which records are new)\n- Use `update` when you're certain the document already exists\n\n```typescript\nconst collection2 = await client.getOrCreateCollection({\n name: 'articles',\n embeddingFunction,\n});\n\nawait collection2.upsert({\n ids: ['article-123', 'article-456', 'article-789'],\n documents: [\n 'Content of article 123',\n 'Content of article 456',\n 'Content of article 789',\n ],\n metadatas: [\n { source_id: '123', updated_at: Date.now() },\n { source_id: '456', updated_at: Date.now() },\n { source_id: '789', updated_at: Date.now() },\n ],\n});\n\nawait collection2.upsert({\n ids: ['article-123', 'article-456'],\n documents: [\n 'Updated content of article 123',\n 'Updated content of article 456',\n ],\n metadatas: [\n { source_id: '123', updated_at: Date.now() },\n { source_id: '456', updated_at: Date.now() },\n ],\n});\n```\n\n## Delete by ID\n\nThe simplest way to delete documents is by their IDs.\n\n```typescript\nconst collection3 = await client.getOrCreateCollection({\n name: 'my_collection',\n embeddingFunction,\n});\n\nawait collection3.delete({\n ids: ['doc1', 'doc2'],\n});\n\nawait collection3.delete({\n ids: ['doc3'],\n});\n```\n\n## Delete by filter\n\nDelete documents matching metadata or content filters without knowing specific IDs. 
Useful for bulk cleanup operations.\n\n```typescript\nconst collection4 = await client.getOrCreateCollection({\n name: 'my_collection',\n embeddingFunction,\n});\n\nawait collection4.delete({\n where: { status: 'archived' },\n});\n\nawait collection4.delete({\n where: { source_id: 'old-source-123' },\n});\n\nawait collection4.delete({\n whereDocument: { $contains: 'DEPRECATED' },\n});\n\nawait collection4.delete({\n ids: ['doc1', 'doc2', 'doc3', 'doc4'],\n where: { category: 'temp' },\n});\n```\n\n## Syncing from an external data source\n\nA common pattern is keeping Chroma in sync with a primary database. This example shows how to handle creates, updates, and deletes.\n\n```typescript\ninterface SourceRecord {\n id: string;\n content: string;\n updated_at: number;\n category: string;\n}\n\nasync function syncToChroma(\n collectionName: string,\n records: SourceRecord[],\n deletedIds: string[]\n) {\n const collection = await client.getOrCreateCollection({\n name: collectionName,\n embeddingFunction,\n });\n\n if (records.length > 0) {\n const batchSize = 100;\n\n for (let i = 0; i \u003c records.length; i += batchSize) {\n const batch = records.slice(i, i + batchSize);\n\n await collection.upsert({\n ids: batch.map((record) => `source-${record.id}`),\n documents: batch.map((record) => record.content),\n metadatas: batch.map((record) => ({\n source_id: record.id,\n updated_at: record.updated_at,\n category: record.category,\n })),\n });\n }\n }\n\n if (deletedIds.length > 0) {\n await collection.delete({\n ids: deletedIds.map((id) => `source-${id}`),\n });\n }\n\n return { synced: records.length, deleted: deletedIds.length };\n}\n\nconst changedRecords: SourceRecord[] = [\n {\n id: '1',\n content: 'Article about TypeScript',\n updated_at: Date.now(),\n category: 'tech',\n },\n {\n id: '2',\n content: 'Guide to vector databases',\n updated_at: Date.now(),\n category: 'tech',\n },\n];\n\nconst deletedRecordIds = ['old-1', 'old-2'];\n\nawait syncToChroma('articles', 
changedRecords, deletedRecordIds);\n```\n\n### Sync strategy tips\n\n**Track source IDs:** Always store the primary database ID in metadata so you can find and update documents later.\n\n**Batch operations:** Process updates in batches of 100-500 to balance throughput and memory usage.\n\n**Handle deletes:** When records are deleted from your primary database, delete them from Chroma too. Use metadata filters if you track `source_id`.\n\n**Idempotent syncs:** Use `upsert` so re-running a sync doesn't create duplicates.\n","content_type":"text/markdown; charset=utf-8","language":"markdown","size":5978,"content_sha256":"988bea6497c12f361d9718f936d7a3923746ad6d8b15ac60073cfa3f4a244672"}],"content_json":{"type":"doc","content":[{"type":"heading","attrs":{"level":2},"content":[{"text":"Instructions","type":"text"}]},{"type":"paragraph","content":[{"text":"Determine these before writing code. Prefer discovering them from the repo and the user request. Ask only when the choice materially changes the implementation.","type":"text"}]},{"type":"ordered_list","attrs":{"order":1,"listStyle":"number"},"content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Runtime shape","type":"text","marks":[{"type":"strong"}]}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Are they connecting to a running local server, embedding Chroma into tests, or setting up local development from scratch?","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Decide whether they need ","type":"text"},{"text":"chroma run","type":"text","marks":[{"type":"code_inline"}]},{"text":", a Docker or service command, ","type":"text"},{"text":"HttpClient","type":"text","marks":[{"type":"code_inline"}]},{"text":" or ","type":"text"},{"text":"ChromaClient","type":"text","marks":[{"type":"code_inline"}]},{"text":", or Python 
","type":"text"},{"text":"EphemeralClient","type":"text","marks":[{"type":"code_inline"}]},{"text":".","type":"text"}]}]}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Persistence","type":"text","marks":[{"type":"strong"}]}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Persistent local data: choose an intentional data path.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Disposable test data: use defaults or a temp directory.","type":"text"}]}]}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Embedding model","type":"text","marks":[{"type":"strong"}]}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Reuse the app's existing embedding provider when possible.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Otherwise default to ","type":"text"},{"text":"@chroma-core/default-embed","type":"text","marks":[{"type":"code_inline"}]},{"text":" in TypeScript or the standard local default in Python.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"If the user explicitly wants OpenAI embeddings in TypeScript, install and use ","type":"text"},{"text":"@chroma-core/openai","type":"text","marks":[{"type":"code_inline"}]},{"text":".","type":"text"}]}]}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Indexed data shape","type":"text","marks":[{"type":"strong"}]}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Determine what is being indexed, how it should be chunked, and what metadata is needed for filtering and 
updates.","type":"text"}]}]}]}]}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Routing","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Existing local server","type":"text","marks":[{"type":"strong"}]}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Confirm host and port before changing client code.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Validate the server is reachable before assuming collections are missing.","type":"text"}]}]}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Fresh local development","type":"text","marks":[{"type":"strong"}]}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Add a local startup path such as ","type":"text"},{"text":"chroma run","type":"text","marks":[{"type":"code_inline"}]},{"text":" or the repo's existing Docker or service command.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Default to ","type":"text"},{"text":"localhost:8000","type":"text","marks":[{"type":"code_inline"}]},{"text":" unless the repo already uses another address.","type":"text"}]}]}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Python tests or disposable local workflows","type":"text","marks":[{"type":"strong"}]}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Prefer ","type":"text"},{"text":"EphemeralClient","type":"text","marks":[{"type":"code_inline"}]},{"text":" when persistence is unnecessary.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Call out that data is lost when the process exits.","type":"text"}]}]}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Persistent local 
development","type":"text","marks":[{"type":"strong"}]}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Use a stable data path and make persistence explicit in code or config.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Do not silently switch between ephemeral and persistent modes.","type":"text"}]}]}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Search integration work","type":"text","marks":[{"type":"strong"}]}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Use ","type":"text"},{"text":"getOrCreateCollection()","type":"text","marks":[{"type":"code_inline"}]},{"text":" in TypeScript or ","type":"text"},{"text":"get_or_create_collection()","type":"text","marks":[{"type":"code_inline"}]},{"text":" in Python.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Design document IDs and metadata so upserts and deletes are straightforward.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Batch writes when syncing large datasets.","type":"text"}]}]}]}]}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Ask vs proceed","type":"text"}]},{"type":"paragraph","content":[{"text":"Ask first:","type":"text","marks":[{"type":"strong"}]}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Embedding model choice (cost and quality implications)","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Whether they need persistent local data","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"How they are starting the local server","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Multi-tenant data isolation 
strategy","type":"text"}]}]}]},{"type":"paragraph","content":[{"text":"Proceed with sensible defaults:","type":"text","marks":[{"type":"strong"}]}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Use ","type":"text"},{"text":"getOrCreateCollection()","type":"text","marks":[{"type":"code_inline"}]},{"text":" (TypeScript) / ","type":"text"},{"text":"get_or_create_collection()","type":"text","marks":[{"type":"code_inline"}]},{"text":" (Python)","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Use cosine similarity (most common)","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Chunk size under 8KB","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Store source IDs in metadata for updates/deletes","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Use a local server on ","type":"text"},{"text":"localhost:8000","type":"text","marks":[{"type":"code_inline"}]},{"text":" unless the repo already configures another address or is using Python ","type":"text"},{"text":"EphemeralClient","type":"text","marks":[{"type":"code_inline"}]}]}]}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"What to validate","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Correct client import (","type":"text"},{"text":"ChromaClient","type":"text","marks":[{"type":"code_inline"}]},{"text":", ","type":"text"},{"text":"HttpClient","type":"text","marks":[{"type":"code_inline"}]},{"text":", or ","type":"text"},{"text":"Client","type":"text","marks":[{"type":"code_inline"}]},{"text":")","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Embedding function package is installed 
(TypeScript)","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Local server is reachable before assuming collections are missing","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Local path and persistence mode are intentional","type":"text"}]}]}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Implementation notes","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Local Chroma is the right default for development, tests, and self-hosted deployments.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"OSS Chroma does not include Chroma Cloud-only features such as ","type":"text"},{"text":"Schema()","type":"text","marks":[{"type":"code_inline"}]},{"text":" and ","type":"text"},{"text":"Search()","type":"text","marks":[{"type":"code_inline"}]},{"text":".","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"If the user asks for hybrid dense and sparse retrieval, treat that as a likely Chroma Cloud requirement unless the repo already implements an OSS workaround.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"For open source Chroma, dense retrieval with a single embedding function is the normal baseline.","type":"text"}]}]}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Minimal patterns","type":"text"}]},{"type":"paragraph","content":[{"text":"Start a local Chroma server when the repo needs one:","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"chroma run","type":"text"}]},{"type":"paragraph","content":[{"text":"Default address: ","type":"text"},{"text":"localhost:8000","type":"text","marks":[{"type":"code_inline"}]},{"text":".","type":"text"}]},{"type":"paragraph","content":[{"text":"TypeScript local 
client:","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"typescript"},"content":[{"text":"import { ChromaClient } from 'chromadb';\nimport { DefaultEmbeddingFunction } from '@chroma-core/default-embed';\n\nconst client = new ChromaClient();\n\nconst embeddingFunction = new DefaultEmbeddingFunction();\nconst collection = await client.getOrCreateCollection({\n name: 'my_collection',\n embeddingFunction,\n});\n\n// Add documents\nawait collection.add({\n ids: ['doc1', 'doc2'],\n documents: ['First document text', 'Second document text'],\n});\n\n// Query\nconst results = await collection.query({\n queryTexts: ['search query'],\n nResults: 5,\n});","type":"text"}]},{"type":"paragraph","content":[{"text":"Python local client:","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"python"},"content":[{"text":"import chromadb\n\nclient = chromadb.HttpClient(host=\"localhost\", port=8000)\n\ncollection = client.get_or_create_collection(name=\"my_collection\")\n\n# Add documents\ncollection.add(\n ids=[\"doc1\", \"doc2\"] ,\n documents=[\"First document text\", \"Second document text\"],\n)\n\n# Query\nresults = collection.query(\n query_texts=[\"search query\"],\n n_results=5,\n)","type":"text"}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Learn More","type":"text"}]},{"type":"paragraph","content":[{"text":"Fetch Chroma's ","type":"text"},{"text":"llms.txt","type":"text","marks":[{"type":"code_inline"}]},{"text":" only when you need API or product details that are not already in the repo or this skill: https://docs.trychroma.com/llms.txt","type":"text"}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Available Topics","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Typescript","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Chroma Regex 
Filtering","type":"text","marks":[{"type":"link","attrs":{"href":"./regex/typescript.md","title":null}}]},{"text":" - Learn how to use regex filters in Chroma queries","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Query and Get","type":"text","marks":[{"type":"link","attrs":{"href":"./querying/typescript.md","title":null}}]},{"text":" - Query and Get Data from Chroma Collections","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Metadata","type":"text","marks":[{"type":"link","attrs":{"href":"./metadata/typescript.md","title":null}}]},{"text":" - Store and query metadata, including filters and array values","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Updating and Deleting","type":"text","marks":[{"type":"link","attrs":{"href":"./updating-deleting/typescript.md","title":null}}]},{"text":" - Update existing documents and delete data from collections","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Error Handling","type":"text","marks":[{"type":"link","attrs":{"href":"./error-handling/typescript.md","title":null}}]},{"text":" - Handling errors and failures when working with Chroma","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Local Chroma","type":"text","marks":[{"type":"link","attrs":{"href":"./local-chroma/typescript.md","title":null}}]},{"text":" - How to run and use local chroma","type":"text"}]}]}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Python","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Chroma Regex Filtering","type":"text","marks":[{"type":"link","attrs":{"href":"./regex/python.md","title":null}}]},{"text":" - Learn how to use regex filters in Chroma queries","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Query 
and Get","type":"text","marks":[{"type":"link","attrs":{"href":"./querying/python.md","title":null}}]},{"text":" - Query and Get Data from Chroma Collections","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Metadata","type":"text","marks":[{"type":"link","attrs":{"href":"./metadata/python.md","title":null}}]},{"text":" - Store and query metadata, including filters and array values","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Updating and Deleting","type":"text","marks":[{"type":"link","attrs":{"href":"./updating-deleting/python.md","title":null}}]},{"text":" - Update existing documents and delete data from collections","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Error Handling","type":"text","marks":[{"type":"link","attrs":{"href":"./error-handling/python.md","title":null}}]},{"text":" - Handling errors and failures when working with Chroma","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Local Chroma","type":"text","marks":[{"type":"link","attrs":{"href":"./local-chroma/python.md","title":null}}]},{"text":" - How to run and use local chroma","type":"text"}]}]}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"General","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Data Model","type":"text","marks":[{"type":"link","attrs":{"href":"./data-model.md","title":null}}]},{"text":" - An overview of how Chroma stores data","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Integrating Chroma into an existing system","type":"text","marks":[{"type":"link","attrs":{"href":"./understanding-a-codebase.md","title":null}}]},{"text":" - Guidance for adding Chroma search to an existing 
application","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Chroma CLI","type":"text","marks":[{"type":"link","attrs":{"href":"./cli.md","title":null}}]},{"text":" - Starting and managing a local open source Chroma server from the CLI","type":"text"}]}]}]},{"type":"hr","attrs":{"markup":"---"}}]},"metadata":{"date":"2026-06-05","name":"chroma-local","author":"@skillopedia","source":{"stars":18,"repo_name":"agent-skills","origin_url":"https://github.com/chroma-core/agent-skills/blob/HEAD/skills/chroma-local/SKILL.md","repo_owner":"chroma-core","body_sha256":"d47aa56b105664b21564dd0d7c431286b2ddf343085ef79270abebc38c65f522","cluster_key":"e4a97736bc207e900004c3484309312a9eae221c15de7230a1c0cfb575eb785e","clean_bundle":{"format":"clean-skill-bundle-v1","source":"chroma-core/agent-skills/skills/chroma-local/SKILL.md","attachments":[{"id":"1c1da354-153a-57b8-9309-a691e27a4c8e","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/1c1da354-153a-57b8-9309-a691e27a4c8e/attachment.md","path":"cli.md","size":1081,"sha256":"152796438697e7be4ac89934d69b3e78508d5f822bd91fc449979c7872c67d56","contentType":"text/markdown; charset=utf-8"},{"id":"c3f5177a-569b-5e71-9b4a-1b761f1378f1","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/c3f5177a-569b-5e71-9b4a-1b761f1378f1/attachment.md","path":"data-model.md","size":4079,"sha256":"829d2e66b2bfeb4b6b09e90288092d05b1d5b1e28fb73f85f73224ed230a0b2d","contentType":"text/markdown; charset=utf-8"},{"id":"33bd90e1-e38c-5153-bf96-0414dd017bcb","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/33bd90e1-e38c-5153-bf96-0414dd017bcb/attachment.md","path":"error-handling/python.md","size":5201,"sha256":"033697e62dfa828c26e807a9bf89ea5d284d6db3f770f07c8f8f12cb0e3f532b","contentType":"text/markdown; 
charset=utf-8"},{"id":"aa3e7d0f-060c-5bb5-9987-1d866795cbac","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/aa3e7d0f-060c-5bb5-9987-1d866795cbac/attachment.md","path":"error-handling/typescript.md","size":6952,"sha256":"a69264672a53c997b8c3098b8f0da498d0f5dd154974f40309788fdd477708ee","contentType":"text/markdown; charset=utf-8"},{"id":"b9b8de1c-6c32-51ac-b44e-c981ea0a4b8c","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/b9b8de1c-6c32-51ac-b44e-c981ea0a4b8c/attachment.md","path":"local-chroma/python.md","size":2452,"sha256":"f500c5a3ea3a83814a090791f4ac266dd6ae27f82dabb98e1a5041f89bebbfb0","contentType":"text/markdown; charset=utf-8"},{"id":"969c2042-2a67-5b69-bb2f-5e12b1d4d95b","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/969c2042-2a67-5b69-bb2f-5e12b1d4d95b/attachment.md","path":"local-chroma/typescript.md","size":2559,"sha256":"3396c139f1a751d9c16554a8d4b8377ee0c48c3121c047d645ed00c8207f1869","contentType":"text/markdown; charset=utf-8"},{"id":"45a61aa4-710f-589a-a3f3-16801a8050ff","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/45a61aa4-710f-589a-a3f3-16801a8050ff/attachment.md","path":"metadata/python.md","size":4408,"sha256":"456a9b5b57dc5a7dd6a3b8144ebd5fc52bce4d75d1ad8415f0c0f30be62b5f9a","contentType":"text/markdown; charset=utf-8"},{"id":"f3330cf3-4a1f-5e82-9074-0f64b7342fd4","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/f3330cf3-4a1f-5e82-9074-0f64b7342fd4/attachment.md","path":"metadata/typescript.md","size":4456,"sha256":"7b32e59f632af1e80fc3e397761c38ccd9f702d515dd438ca51ba66bc401dd96","contentType":"text/markdown; charset=utf-8"},{"id":"bab65092-9168-5dc3-8633-37283b8e400b","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/bab65092-9168-5dc3-8633-37283b8e400b/attachment.md","path":"querying/python.md","size":4038,"sha256":"2d9c508d43f885c6fed002786fc18094321c4ef28b304011fa388af4c672270d","contentType":"text/markdown; 
charset=utf-8"},{"id":"661659ca-a685-54f7-b483-73fc40c7c24f","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/661659ca-a685-54f7-b483-73fc40c7c24f/attachment.md","path":"querying/typescript.md","size":4382,"sha256":"65c31da9f8b257c78e3f2196982e0b2d1e66465ccbf9e3894262a9c017068364","contentType":"text/markdown; charset=utf-8"},{"id":"572bed76-b62e-56ed-b826-ba59dd49734c","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/572bed76-b62e-56ed-b826-ba59dd49734c/attachment.md","path":"regex/python.md","size":2227,"sha256":"31b0f34152ded5f028d1157340e36d0ba37ba3a9f655b787144f62e947e1b128","contentType":"text/markdown; charset=utf-8"},{"id":"4a664349-3bd5-5d73-93a7-1e073129761e","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/4a664349-3bd5-5d73-93a7-1e073129761e/attachment.md","path":"regex/typescript.md","size":2420,"sha256":"cecf6396b638ab810326a34145cf7cad930141ee55d42dee495c12b33b4487a9","contentType":"text/markdown; charset=utf-8"},{"id":"71a915ea-2a90-5460-8f97-9f478e291bfd","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/71a915ea-2a90-5460-8f97-9f478e291bfd/attachment.md","path":"understanding-a-codebase.md","size":4315,"sha256":"de253c412a1c5899774606a6728cea15693b385d283022a68cbded083f3b9864","contentType":"text/markdown; charset=utf-8"},{"id":"4d262abe-1aa0-5aad-afd1-a4e004738abc","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/4d262abe-1aa0-5aad-afd1-a4e004738abc/attachment.md","path":"updating-deleting/python.md","size":4726,"sha256":"4750797962aeb6262c06190a708ae9556d694962d4d784a7ebaa62698ec60f70","contentType":"text/markdown; charset=utf-8"},{"id":"4ce371ca-184c-526d-9bf9-d59d4858c266","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/4ce371ca-184c-526d-9bf9-d59d4858c266/attachment.md","path":"updating-deleting/typescript.md","size":5978,"sha256":"988bea6497c12f361d9718f936d7a3923746ad6d8b15ac60073cfa3f4a244672","contentType":"text/markdown; 
charset=utf-8"}],"bundle_sha256":"60283d03e2b0c6a32fc130f19b740011a031df05e17c61de9819f38cb4249e08","attachment_count":15,"text_attachments":15,"attachment_storage":"skillopedia-attachments-v1","binary_attachments":0,"excluded_attachments":[]},"cluster_size":1,"skill_md_path":"skills/chroma-local/SKILL.md","import_metadata":{"date":"2026-06-05","author":"@skillopedia","version":"v1","category":"devops-infrastructure","category_label":"DevOps"},"exact_dupes_collapsed_into_this":0},"version":"v1","category":"devops-infrastructure","import_tag":"clean-skills-v1","description":"Use when the user needs self-hosted or local Chroma for semantic search, including `ChromaClient`, `HttpClient`, or Python `EphemeralClient`, local persistence, Docker or `chroma run`, or OSS Chroma without Chroma Cloud features."}},"renderedAt":1782980754677}
