PostgreSQL Expert Skills This skill provides comprehensive PostgreSQL expertise through specialized references. Load the appropriate reference based on the task. Available References Table Design - design-postgres-tables — Data types, constraints, indexes, JSONB patterns, partitioning, and PostgreSQL best practices. Use for any general table/schema design task. - design-postgis-tables — PostGIS spatial table design: geometry vs geography types, SRIDs, spatial indexing, and location-based query patterns. Use when the task involves geographic or spatial data. Search - pgvector-semantic-search —…

)` for reusable custom types with validation. Enforces constraints across tables.\n- **Composite types**: `CREATE TYPE address AS (street TEXT, city TEXT, zip TEXT)` for structured data within columns. Access with `(col).field` syntax.\n- **JSONB**: preferred over JSON; index with **GIN**. Use only for optional/semi-structured attrs. ONLY use JSON if the original ordering of the contents MUST be preserved.\n- **Vector types**: `vector` type by `pgvector` for vector similarity search for embeddings.\n\n### Do not use the following data types\n\n- DO NOT use `timestamp` (without time zone); DO use `timestamptz` instead.\n- DO NOT use `char(n)` or `varchar(n)`; DO use `text` instead.\n- DO NOT use `money` type; DO use `numeric` instead.\n- DO NOT use `timetz` type; DO use `timestamptz` instead.\n- DO NOT use `timestamptz(0)` or any other precision specification; DO use `timestamptz` instead\n- DO NOT use `serial` type; DO use `generated always as identity` instead.\n- DO NOT use `POINT`, `LINE`, `POLYGON`, `CIRCLE` built-in types, DO use `geometry` from postgis extension instead.\n\n## Table Types\n\n- **Regular**: default; fully durable, logged.\n- **TEMPORARY**: session-scoped, auto-dropped, not logged. Faster for scratch work.\n- **UNLOGGED**: persistent but not crash-safe. Faster writes; good for caches/staging.\n\n## Row-Level Security\n\nEnable with `ALTER TABLE tbl ENABLE ROW LEVEL SECURITY`. Create policies: `CREATE POLICY user_access ON orders FOR SELECT TO app_users USING (user_id = current_user_id())`. Built-in user-based access control at the row level.\n\n## Constraints\n\n- **PK**: implicit UNIQUE + NOT NULL; creates a B-tree index.\n- **FK**: specify `ON DELETE/UPDATE` action (`CASCADE`, `RESTRICT`, `SET NULL`, `SET DEFAULT`). Add explicit index on referencing column—speeds up joins and prevents locking issues on parent deletes/updates. Use `DEFERRABLE INITIALLY DEFERRED` for circular FK dependencies checked at transaction end.\n- **UNIQUE**: creates a B-tree index; allows multiple NULLs unless `NULLS NOT DISTINCT` (PG15+). Standard behavior: `(1, NULL)` and `(1, NULL)` are allowed. With `NULLS NOT DISTINCT`: only one `(1, NULL)` allowed. Prefer `NULLS NOT DISTINCT` unless you specifically need duplicate NULLs.\n- **CHECK**: row-local constraints; NULL values pass the check (three-valued logic). Example: `CHECK (price > 0)` allows NULL prices. Combine with `NOT NULL` to enforce: `price NUMERIC NOT NULL CHECK (price > 0)`.\n- **EXCLUDE**: prevents overlapping values using operators. `EXCLUDE USING gist (room_id WITH =, booking_period WITH &&)` prevents double-booking rooms. Requires appropriate index type (often GiST).\n\n## Indexing\n\n- **B-tree**: default for equality/range queries (`=`, `\u003c`, `>`, `BETWEEN`, `ORDER BY`)\n- **Composite**: order matters—index used if equality on leftmost prefix (`WHERE a = ? AND b > ?` uses index on `(a,b)`, but `WHERE b = ?` does not). Put most selective/frequently filtered columns first.\n- **Covering**: `CREATE INDEX ON tbl (id) INCLUDE (name, email)` - includes non-key columns for index-only scans without visiting table.\n- **Partial**: for hot subsets (`WHERE status = 'active'` → `CREATE INDEX ON tbl (user_id) WHERE status = 'active'`). Any query with `status = 'active'` can use this index.\n- **Expression**: for computed search keys (`CREATE INDEX ON tbl (LOWER(email))`). Expression must match exactly in WHERE clause: `WHERE LOWER(email) = '[email protected]'`.\n- **GIN**: JSONB containment/existence, arrays (`@>`, `?`), full-text search (`@@`)\n- **GiST**: ranges, geometry, exclusion constraints\n- **BRIN**: very large, naturally ordered data (time-series)—minimal storage overhead. Effective when row order on disk correlates with indexed column (insertion order or after `CLUSTER`).\n\n## Partitioning\n\n- Use for very large tables (>100M rows) where queries consistently filter on partition key (often time/date).\n- Alternate use: use for tables where data maintenance tasks dictates e.g. data pruned or bulk replaced periodically\n- **RANGE**: common for time-series (`PARTITION BY RANGE (created_at)`). Create partitions: `CREATE TABLE logs_2024_01 PARTITION OF logs FOR VALUES FROM ('2024-01-01') TO ('2024-02-01')`. **TimescaleDB** automates time-based or ID-based partitioning with retention policies and compression.\n- **LIST**: for discrete values (`PARTITION BY LIST (region)`). Example: `FOR VALUES IN ('us-east', 'us-west')`.\n- **HASH**: for even distribution when no natural key (`PARTITION BY HASH (user_id)`). Creates N partitions with modulus.\n- **Constraint exclusion**: requires `CHECK` constraints on partitions for query planner to prune. Auto-created for declarative partitioning (PG10+).\n- Prefer declarative partitioning or hypertables. Do NOT use table inheritance.\n- **Limitations**: no global UNIQUE constraints—include partition key in PK/UNIQUE. FKs from partitioned tables not supported; use triggers.\n\n## Special Considerations\n\n### Update-Heavy Tables\n\n- **Separate hot/cold columns**—put frequently updated columns in separate table to minimize bloat.\n- **Use `fillfactor=90`** to leave space for HOT updates that avoid index maintenance.\n- **Avoid updating indexed columns**—prevents beneficial HOT updates.\n- **Partition by update patterns**—separate frequently updated rows in a different partition from stable data.\n\n### Insert-Heavy Workloads\n\n- **Minimize indexes**—only create what you query; every index slows inserts.\n- **Use `COPY` or multi-row `INSERT`** instead of single-row inserts.\n- **UNLOGGED tables** for rebuildable staging data—much faster writes.\n- **Defer index creation** for bulk loads—>drop index, load data, recreate indexes.\n- **Partition by time/hash** to distribute load. **TimescaleDB** automates partitioning and compression of insert-heavy data.\n- **Use a natural key for primary key** such as a (timestamp, device_id) if enforcing global uniqueness is important many insert-heavy tables don't need a primary key at all.\n- If you do need a surrogate key, **Prefer `BIGINT GENERATED ALWAYS AS IDENTITY` over `UUID`**.\n\n### Upsert-Friendly Design\n\n- **Requires UNIQUE index** on conflict target columns—`ON CONFLICT (col1, col2)` needs exact matching unique index (partial indexes don't work).\n- **Use `EXCLUDED.column`** to reference would-be-inserted values; only update columns that actually changed to reduce write overhead.\n- **`DO NOTHING` faster** than `DO UPDATE` when no actual update needed.\n\n### Safe Schema Evolution\n\n- **Transactional DDL**: most DDL operations can run in transactions and be rolled back—`BEGIN; ALTER TABLE...; ROLLBACK;` for safe testing.\n- **Concurrent index creation**: `CREATE INDEX CONCURRENTLY` avoids blocking writes but can't run in transactions.\n- **Volatile defaults cause rewrites**: adding `NOT NULL` columns with volatile defaults (e.g., `now()`, `gen_random_uuid()`) rewrites entire table. Non-volatile defaults are fast.\n- **Drop constraints before columns**: `ALTER TABLE DROP CONSTRAINT` then `DROP COLUMN` to avoid dependency issues.\n- **Function signature changes**: `CREATE OR REPLACE` with different arguments creates overloads, not replacements. DROP old version if no overload desired.\n\n## Generated Columns\n\n- `... GENERATED ALWAYS AS (\u003cexpr>) STORED` for computed, indexable fields. PG18+ adds `VIRTUAL` columns (computed on read, not stored).\n\n## Extensions\n\n- **`pgcrypto`**: `crypt()` for password hashing.\n- **`uuid-ossp`**: alternative UUID functions; prefer `pgcrypto` for new projects.\n- **`pg_trgm`**: fuzzy text search with `%` operator, `similarity()` function. Index with GIN for `LIKE '%pattern%'` acceleration.\n- **`citext`**: case-insensitive text type. Prefer expression indexes on `LOWER(col)` unless you need case-insensitive constraints.\n- **`btree_gin`/`btree_gist`**: enable mixed-type indexes (e.g., GIN index on both JSONB and text columns).\n- **`hstore`**: key-value pairs; mostly superseded by JSONB but useful for simple string mappings.\n- **`timescaledb`**: essential for time-series—automated partitioning, retention, compression, continuous aggregates.\n- **`postgis`**: comprehensive geospatial support beyond basic geometric types—essential for location-based applications.\n- **`pgvector`**: vector similarity search for embeddings.\n- **`pgaudit`**: audit logging for all database activity.\n\n## JSONB Guidance\n\n- Prefer `JSONB` with **GIN** index.\n- Default: `CREATE INDEX ON tbl USING GIN (jsonb_col);` → accelerates:\n - **Containment** `jsonb_col @> '{\"k\":\"v\"}'`\n - **Key existence** `jsonb_col ? 'k'`, **any/all keys** `?\\|`, `?&`\n - **Path containment** on nested docs\n - **Disjunction** `jsonb_col @> ANY(ARRAY['{\"status\":\"active\"}', '{\"status\":\"pending\"}'])`\n- Heavy `@>` workloads: consider opclass `jsonb_path_ops` for smaller/faster containment-only indexes:\n - `CREATE INDEX ON tbl USING GIN (jsonb_col jsonb_path_ops);`\n - **Trade-off**: loses support for key existence (`?`, `?|`, `?&`) queries—only supports containment (`@>`)\n- Equality/range on a specific scalar field: extract and index with B-tree (generated column or expression):\n - `ALTER TABLE tbl ADD COLUMN price INT GENERATED ALWAYS AS ((jsonb_col->>'price')::INT) STORED;`\n - `CREATE INDEX ON tbl (price);`\n - Prefer queries like `WHERE price BETWEEN 100 AND 500` (uses B-tree) over `WHERE (jsonb_col->>'price')::INT BETWEEN 100 AND 500` without index.\n- Arrays inside JSONB: use GIN + `@>` for containment (e.g., tags). Consider `jsonb_path_ops` if only doing containment.\n- Keep core relations in tables; use JSONB for optional/variable attributes.\n- Use constraints to limit allowed JSONB values in a column e.g. `config JSONB NOT NULL CHECK(jsonb_typeof(config) = 'object')`\n\n## Examples\n\n### Users\n\n```sql\nCREATE TABLE users (\n user_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,\n email TEXT NOT NULL UNIQUE,\n name TEXT NOT NULL,\n created_at TIMESTAMPTZ NOT NULL DEFAULT now()\n);\nCREATE UNIQUE INDEX ON users (LOWER(email));\nCREATE INDEX ON users (created_at);\n```\n\n### Orders\n\n```sql\nCREATE TABLE orders (\n order_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,\n user_id BIGINT NOT NULL REFERENCES users(user_id),\n status TEXT NOT NULL DEFAULT 'PENDING' CHECK (status IN ('PENDING','PAID','CANCELED')),\n total NUMERIC(10,2) NOT NULL CHECK (total > 0),\n created_at TIMESTAMPTZ NOT NULL DEFAULT now()\n);\nCREATE INDEX ON orders (user_id);\nCREATE INDEX ON orders (created_at);\n```\n\n### JSONB\n\n```sql\nCREATE TABLE profiles (\n user_id BIGINT PRIMARY KEY REFERENCES users(user_id),\n attrs JSONB NOT NULL DEFAULT '{}',\n theme TEXT GENERATED ALWAYS AS (attrs->>'theme') STORED\n);\nCREATE INDEX profiles_attrs_gin ON profiles USING GIN (attrs);\n```\n","content_type":"text/markdown; charset=utf-8","language":"markdown","size":16881,"content_sha256":"a4822cefdc283979e8a75f312780cb00aad83fb03b994f72e1675dc04a8eb252"},{"filename":"references/find-hypertable-candidates.md","content":"---\nname: find-hypertable-candidates\ndescription: |\n Use this skill to analyze an existing PostgreSQL database and identify which tables should be converted to Timescale/TimescaleDB hypertables.\n\n **Trigger when user asks to:**\n - Analyze database tables for hypertable conversion potential\n - Identify time-series or event tables in an existing schema\n - Evaluate if a table would benefit from Timescale/TimescaleDB\n - Audit PostgreSQL tables for migration to Timescale/TimescaleDB/TigerData\n - Score or rank tables for hypertable candidacy\n\n\n **Keywords:** hypertable candidate, table analysis, migration assessment, Timescale, TimescaleDB, time-series detection, insert-heavy tables, event logs, audit tables\n\n Provides SQL queries to analyze table statistics, index patterns, and query patterns. Includes scoring criteria (8+ points = good candidate) and pattern recognition for IoT, events, transactions, and sequential data.\nlicense: Apache-2.0\ncompatibility: Requires PostgreSQL 15+ with TimescaleDB\nmetadata:\n author: tigerdata\n---\n\n# PostgreSQL Hypertable Candidate Analysis\n\nIdentify tables that would benefit from TimescaleDB hypertable conversion. After identification, use the companion \"migrate-postgres-tables-to-hypertables\" skill for configuration and migration.\n\n## TimescaleDB Benefits\n\n**Performance gains:** 90%+ compression, fast time-based queries, improved insert performance, efficient aggregations, continuous aggregates for materialization (dashboards, reports, analytics), automatic data management (retention, compression).\n\n**Best for insert-heavy patterns:**\n\n- Time-series data (sensors, metrics, monitoring)\n- Event logs (user events, audit trails, application logs)\n- Transaction records (orders, payments, financial)\n- Sequential data (auto-incrementing IDs with timestamps)\n- Append-only datasets (immutable records, historical)\n\n**Requirements:** Large volumes (1M+ rows), time-based queries, infrequent updates\n\n## Step 1: Database Schema Analysis\n\n### Option A: From Database Connection\n\n#### Table statistics and size\n\n```sql\n-- Get all tables with row counts and insert/update patterns\nWITH table_stats AS (\n SELECT\n schemaname, tablename,\n n_tup_ins as total_inserts,\n n_tup_upd as total_updates,\n n_tup_del as total_deletes,\n n_live_tup as live_rows,\n n_dead_tup as dead_rows\n FROM pg_stat_user_tables\n),\ntable_sizes AS (\n SELECT\n schemaname, tablename,\n pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) as total_size,\n pg_total_relation_size(schemaname||'.'||tablename) as total_size_bytes\n FROM pg_tables\n WHERE schemaname NOT IN ('information_schema', 'pg_catalog')\n)\nSELECT\n ts.schemaname, ts.tablename, ts.live_rows,\n tsize.total_size, tsize.total_size_bytes,\n ts.total_inserts, ts.total_updates, ts.total_deletes,\n ROUND(CASE WHEN ts.live_rows > 0\n THEN (ts.total_inserts::float / ts.live_rows) * 100\n ELSE 0 END, 2) as insert_ratio_pct\nFROM table_stats ts\nJOIN table_sizes tsize ON ts.schemaname = tsize.schemaname AND ts.tablename = tsize.tablename\nORDER BY tsize.total_size_bytes DESC;\n```\n\n**Look for:**\n\n- mostly insert-heavy patterns (less updates/deletes)\n- big tables (1M+ rows or 100MB+)\n\n#### Index patterns\n\n```sql\n-- Identify common query dimensions\nSELECT schemaname, tablename, indexname, indexdef\nFROM pg_indexes\nWHERE schemaname NOT IN ('information_schema', 'pg_catalog')\nORDER BY tablename, indexname;\n```\n\n**Look for:**\n\n- Multiple indexes with timestamp/created_at columns → time-based queries\n- Composite (entity_id, timestamp) indexes → good candidates\n- Time-only indexes → time range filtering common\n\n#### Query patterns (if pg_stat_statements available)\n\n```sql\n-- Check availability\nSELECT EXISTS (SELECT 1 FROM pg_extension WHERE extname = 'pg_stat_statements');\n\n-- Analyze expensive queries for candidate tables\nSELECT query, calls, mean_exec_time, total_exec_time\nFROM pg_stat_statements\nWHERE query ILIKE '%your_table_name%'\nORDER BY total_exec_time DESC LIMIT 20;\n```\n\n**✅ Good patterns:** Time-based WHERE, entity filtering combined with time-based qualifiers, GROUP BY time_bucket, range queries over time\n**❌ Poor patterns:** Non-time lookups with no time-based qualifiers in same query (WHERE email = ...)\n\n#### Constraints\n\n```sql\n-- Check migration compatibility\nSELECT conname, contype, pg_get_constraintdef(oid) as definition\nFROM pg_constraint\nWHERE conrelid = 'your_table_name'::regclass;\n```\n\n**Compatibility:**\n\n- Primary keys (p): Must include partition column or ask user if can be modified\n- Foreign keys (f): Plain→Hypertable and Hypertable→Plain OK, Hypertable→Hypertable NOT supported\n- Unique constraints (u): Must include partition column or ask user if can be modified\n- Check constraints (c): Usually OK\n\n### Option B: From Code Analysis\n\n#### ✅ GOOD Patterns\n\n```python\n# Append-only logging\nINSERT INTO events (user_id, event_time, data) VALUES (...);\n# Time-series collection\nINSERT INTO metrics (device_id, timestamp, value) VALUES (...);\n# Time-based queries\nSELECT * FROM metrics WHERE timestamp >= NOW() - INTERVAL '24 hours';\n# Time aggregations\nSELECT DATE_TRUNC('day', timestamp), COUNT(*) GROUP BY 1;\n```\n\n#### ❌ POOR Patterns\n\n```python\n# Frequent updates to historical records\nUPDATE users SET email = ..., updated_at = NOW() WHERE id = ...;\n# Non-time lookups\nSELECT * FROM users WHERE email = ...;\n# Small reference tables\nSELECT * FROM countries ORDER BY name;\n```\n\n#### Schema Indicators\n\n**✅ GOOD:**\n\n- Has timestamp/timestamptz column\n- Multiple indexes with timestamp-based columns\n- Composite (entity_id, timestamp) indexes\n\n**❌ POOR:**\n\n- Mostly indexes with non-time-based columns (on columns like email, name, status, etc.)\n- Columns that you expect to be updated over time (updated_at, updated_by, status, etc.)\n- Unique constraints on non-time fields\n- Frequent updated_at modifications\n- Small static tables\n\n#### Special Case: ID-Based Tables\n\nSequential ID tables can be candidates if:\n\n- Insert-mostly pattern / updates are either infrequent or only on recent records.\n- If updates do happen, they occur on recent records (such as an order status being updated orderered->processing->delivered. Note once an order is delivered, it is unlikely to be updated again.)\n- IDs correlate with time (as is the case for serial/auto-incrementing IDs/GENERATED ALWAYS AS IDENTITY)\n- ID is the primary query dimension\n- Recent data accessed more often (frequently the case in ecommerce, finance, etc.)\n- Time-based reporting common (e.g. monthly, daily summaries/analytics)\n\n```sql\nCREATE TABLE orders (\n id BIGSERIAL PRIMARY KEY, -- Can partition by ID\n user_id BIGINT,\n created_at TIMESTAMPTZ DEFAULT NOW() -- For sparse indexes\n);\n```\n\nNote: For ID-based tables where there is also a time column (created_at, ordered_at, etc.),\nyou can partition by ID and use sparse indexes on the time column.\nSee the `migrate-postgres-tables-to-hypertables` skill for details.\n\n## Step 2: Candidacy Scoring (8+ points = good candidate)\n\n### Time-Series Characteristics (5+ points needed)\n\n- Has timestamp/timestamptz column: **3 points**\n- Data inserted chronologically: **2 points**\n- Queries filter by time: **2 points**\n- Time aggregations common: **2 points**\n\n### Scale & Performance (3+ points recommended)\n\n- Large table (1M+ rows or 100MB+): **2 points**\n- High insert volume: **1 point**\n- Infrequent updates to historical: **1 point**\n- Range queries common: **1 point**\n- Aggregation queries: **2 points**\n\n### Data Patterns (bonus)\n\n- Contains entity ID for segmentation (device_id, user_id, product_id, symbol, etc.): **1 point**\n- Numeric measurements: **1 point**\n- Log/event structure: **1 point**\n\n## Common Patterns\n\n### ✅ GOOD Candidates\n\n**✅ Event/Log Tables** (user_events, audit_logs)\n\n```sql\nCREATE TABLE user_events (\n id BIGSERIAL PRIMARY KEY,\n user_id BIGINT,\n event_type TEXT,\n event_time TIMESTAMPTZ DEFAULT NOW(),\n metadata JSONB\n);\n-- Partition by id, segment by user_id, enable minmax sparse_index on event_time\n```\n\n**✅ Sensor/IoT Data** (sensor_readings, telemetry)\n\n```sql\nCREATE TABLE sensor_readings (\n device_id TEXT,\n timestamp TIMESTAMPTZ,\n temperature DOUBLE PRECISION,\n humidity DOUBLE PRECISION\n);\n-- Partition by timestamp, segment by device_id, minmax sparse indexes on temperature and humidity\n```\n\n**✅ Financial/Trading** (stock_prices, transactions)\n\n```sql\nCREATE TABLE stock_prices (\n symbol VARCHAR(10),\n price_time TIMESTAMPTZ,\n open_price DECIMAL,\n close_price DECIMAL,\n volume BIGINT\n);\n-- Partition by price_time, segment by symbol, minmax sparse indexes on open_price and close_price and volume\n```\n\n**✅ System Metrics** (monitoring_data)\n\n```sql\nCREATE TABLE system_metrics (\n hostname TEXT,\n metric_time TIMESTAMPTZ,\n cpu_usage DOUBLE PRECISION,\n memory_usage BIGINT\n);\n-- Partition by metric_time, segment by hostname, minmax sparse indexes on cpu_usage and memory_usage\n```\n\n### ❌ POOR Candidates\n\n**❌ Reference Tables** (countries, categories)\n\n```sql\nCREATE TABLE countries (\n id SERIAL PRIMARY KEY,\n name VARCHAR(100),\n code CHAR(2)\n);\n-- Static data, no time component\n```\n\n**❌ User Profiles** (users, accounts)\n\n```sql\nCREATE TABLE users (\n id BIGSERIAL PRIMARY KEY,\n email VARCHAR(255),\n created_at TIMESTAMPTZ,\n updated_at TIMESTAMPTZ\n);\n-- Accessed by ID, frequently updated, has timestamp but it's not the primary query dimension (the primary query dimension is id or email)\n```\n\n**❌ Settings/Config** (user_settings)\n\n```sql\nCREATE TABLE user_settings (\n user_id BIGINT PRIMARY KEY,\n theme VARCHAR(20), -- Changes: light -> dark -> auto\n language VARCHAR(10), -- Changes: en -> es -> fr\n notifications JSONB, -- Frequent preference updates\n updated_at TIMESTAMPTZ\n);\n-- Accessed by user_id, frequently updated, has timestamp but it's not the primary query dimension (the primary query dimension is user_id)\n```\n\n## Analysis Output Requirements\n\nFor each candidate table provide:\n\n- **Score:** Based on criteria (8+ = strong candidate)\n- **Pattern:** Insert vs update ratio\n- **Access:** Time-based vs entity lookups\n- **Size:** Current size and growth rate\n- **Queries:** Time-range, aggregations, point lookups\n\nFocus on insert-heavy patterns with time-based or sequential access. Tables scoring 8+ points are strong candidates for conversion.\n","content_type":"text/markdown; charset=utf-8","language":"markdown","size":10525,"content_sha256":"12472d435c00f8676faa0be2d9a068ffea1c57f4f8376bacfe5ec921a727bd94"},{"filename":"references/migrate-postgres-tables-to-hypertables.md","content":"---\nname: migrate-postgres-tables-to-hypertables\ndescription: |\n Use this skill to migrate identified PostgreSQL tables to Timescale/TimescaleDB hypertables with optimal configuration and validation.\n\n **Trigger when user asks to:**\n - Migrate or convert PostgreSQL tables to hypertables\n - Execute hypertable migration with minimal downtime\n - Plan blue-green migration for large tables\n - Validate hypertable migration success\n - Configure compression after migration\n\n **Prerequisites:** Tables already identified as candidates (use find-hypertable-candidates first if needed)\n\n **Keywords:** migrate to hypertable, convert table, Timescale, TimescaleDB, blue-green migration, in-place conversion, create_hypertable, migration validation, compression setup\n\n Step-by-step migration planning including: partition column selection, chunk interval calculation, PK/constraint handling, migration execution (in-place vs blue-green), and performance validation queries.\nlicense: Apache-2.0\ncompatibility: Requires PostgreSQL 15+ with TimescaleDB\nmetadata:\n author: tigerdata\n---\n\n# PostgreSQL to TimescaleDB Hypertable Migration\n\nMigrate identified PostgreSQL tables to TimescaleDB hypertables with optimal configuration, migration planning and validation.\n\n**Prerequisites**: Tables already identified as hypertable candidates (use companion \"find-hypertable-candidates\" skill if needed).\n\n## Step 1: Optimal Configuration\n\n### Partition Column Selection\n\n```sql\n-- Find potential partition columns\nSELECT column_name, data_type, is_nullable\nFROM information_schema.columns\nWHERE table_name = 'your_table_name'\n AND data_type IN ('timestamp', 'timestamptz', 'bigint', 'integer', 'date')\nORDER BY ordinal_position;\n```\n\n**Requirements:** Time-based (TIMESTAMP/TIMESTAMPTZ/DATE) or sequential integer (INT/BIGINT)\n\nShould represent when the event actually occurred or sequential ordering.\n\n**Common choices:**\n\n- `timestamp`, `created_at`, `event_time` - when event occurred\n- `id`, `sequence_number` - auto-increment (for sequential data without timestamps)\n- `ingested_at` - less ideal, only if primary query dimension\n- `updated_at` - AVOID (records updated out of order, breaks chunk distribution) unless primary query dimension\n\n#### Special Case: table with BOTH ID AND Timestamp\n\nWhen table has sequential ID (PK) AND timestamp that correlate:\n\n```sql\n-- Partition by ID, enable minmax sparse indexes on timestamp\nSELECT create_hypertable('orders', 'id', chunk_time_interval => 1000000);\nALTER TABLE orders SET (\n timescaledb.sparse_index = 'minmax(created_at),...'\n);\n```\n\nSparse indexes on time column enable skipping compressed blocks outside queried time ranges.\n\nUse when: ID correlates with time (newer records have higher IDs), need ID-based lookups, time queries also common\n\n### Chunk Interval Selection\n\n```sql\n-- Ensure statistics are current\nANALYZE your_table_name;\n\n-- Estimate index size per time unit\nWITH time_range AS (\n SELECT\n MIN(timestamp_column) as min_time,\n MAX(timestamp_column) as max_time,\n EXTRACT(EPOCH FROM (MAX(timestamp_column) - MIN(timestamp_column)))/3600 as total_hours\n FROM your_table_name\n),\ntotal_index_size AS (\n SELECT SUM(pg_relation_size(indexname::regclass)) as total_index_bytes\n FROM pg_stat_user_indexes\n WHERE schemaname||'.'||tablename = 'your_schema.your_table_name'\n)\nSELECT\n pg_size_pretty(tis.total_index_bytes / tr.total_hours) as index_size_per_hour\nFROM time_range tr, total_index_size tis;\n```\n\n**Target:** Indexes of recent chunks \u003c 25% of RAM\n**Default:** IMPORTANT: Keep default of 7 days if unsure\n**Range:** 1 hour minimum, 30 days maximum\n\n**Example:** 32GB RAM → target 8GB for recent indexes. If index_size_per_hour = 200MB:\n\n- 1 hour chunks: 200MB chunk index size × 40 recent = 8GB ✓\n- 6 hour chunks: 1.2GB chunk index size × 7 recent = 8.4GB ✓\n- 1 day chunks: 4.8GB chunk index size × 2 recent = 9.6GB ⚠️\n Choose largest interval keeping 2+ recent chunk indexes under target.\n\n### Primary Key/ Unique Constraints Compatibility\n\n```sql\n-- Check existing primary key/ unique constraints\nSELECT conname, pg_get_constraintdef(oid) as definition\nFROM pg_constraint\nWHERE conrelid = 'your_table_name'::regclass AND contype = 'p' OR contype = 'u';\n```\n\n**Rules:** PK/UNIQUE must include partition column\n\n**Actions:**\n\n1. **No PK/UNIQUE:** No changes needed\n2. **PK/UNIQUE includes partition column:** No changes needed\n3. **PK/UNIQUE excludes partition column:** ⚠️ **ASK USER PERMISSION** to modify PK/UNIQUE\n\n**Example: user prompt if needed:**\n\n> \"Primary key (id) doesn't include partition column (timestamp). Must modify to PRIMARY KEY (id, timestamp) to convert to hypertable. This may break application code. Is this acceptable?\"\n> \"Unique constraint (id) doesn't include partition column (timestamp). Must modify to UNIQUE (id, timestamp) to convert to hypertable. This may break application code. Is this acceptable?\"\n\nIf the user accepts, modify the constraint:\n\n```sql\nBEGIN;\nALTER TABLE your_table_name DROP CONSTRAINT existing_pk_name;\nALTER TABLE your_table_name ADD PRIMARY KEY (existing_columns, partition_column);\nCOMMIT;\n```\n\nIf the user does not accept, you should NOT migrate the table.\n\nIMPORTANT: DO NOT modify the primary key/unique constraint without user permission.\n\n### Compression Configuration\n\nFor detailed segment_by and order_by selection, see \"setup-timescaledb-hypertables\" skill. Quick reference:\n\n**segment_by:** Most common WHERE filter with >100 rows per value per chunk\n\n- IoT: `device_id`\n- Finance: `symbol`\n- Analytics: `user_id` or `session_id`\n\n```sql\n-- Analyze cardinality for segment_by selection\nSELECT column_name, COUNT(DISTINCT column_name) as unique_values,\n ROUND(COUNT(*)::float / COUNT(DISTINCT column_name), 2) as avg_rows_per_value\nFROM your_table_name GROUP BY column_name;\n```\n\n**order_by:** Usually `timestamp DESC`. The (segment_by, order_by) combination should form a natural time-series progression.\n\n- If column has \u003c100 rows/chunk (too low for segment_by), prepend to order_by: `order_by='low_density_col, timestamp DESC'`\n\n**sparse indexes:** add minmax on the columns that are used in the WHERE clauses but are not in the segment_by or order_by. Use minmax for columns used in range queries.\n\n```sql\nALTER TABLE your_table_name SET (\n timescaledb.enable_columnstore,\n timescaledb.segmentby = 'entity_id',\n timescaledb.orderby = 'timestamp DESC'\n timescaledb.sparse_index = 'minmax(value_1),...'\n);\n\n-- Compress after data unlikely to change (adjust `after` parameter based on update patterns)\nCALL add_columnstore_policy('your_table_name', after => INTERVAL '7 days');\n```\n\n## Step 2: Migration Planning\n\n### Pre-Migration Checklist\n\n- [ ] Partition column selected\n- [ ] Chunk interval calculated (or using default)\n- [ ] PK includes partition column OR user approved modification\n- [ ] No Hypertable→Hypertable foreign keys\n- [ ] Unique constraints include partition column\n- [ ] Created compression configuration (segment_by, order_by, sparse indexes, compression policy)\n- [ ] Maintenance window scheduled / backup created.\n\n### Migration Options\n\n#### Option 1: In-Place (Tables \u003c 1GB)\n\n```sql\n-- Enable extension\nCREATE EXTENSION IF NOT EXISTS timescaledb;\n\n-- Convert to hypertable (locks table)\nSELECT create_hypertable(\n 'your_table_name',\n 'timestamp_column',\n chunk_time_interval => INTERVAL '7 days',\n if_not_exists => TRUE\n);\n\n-- Configure compression\nALTER TABLE your_table_name SET (\n timescaledb.enable_columnstore,\n timescaledb.segmentby = 'entity_id',\n timescaledb.orderby = 'timestamp DESC',\n timescaledb.sparse_index = 'minmax(value_1),...'\n);\n\n-- Adjust `after` parameter based on update patterns\nCALL add_columnstore_policy('your_table_name', after => INTERVAL '7 days');\n```\n\n#### Option 2: Blue-Green (Tables > 1GB)\n\n```sql\n-- 1. Create new hypertable\nCREATE TABLE your_table_name_new (LIKE your_table_name INCLUDING ALL);\n\n-- 2. Convert to hypertable\nSELECT create_hypertable('your_table_name_new', 'timestamp_column');\n\n-- 3. Configure compression\nALTER TABLE your_table_name_new SET (\n timescaledb.enable_columnstore,\n timescaledb.segmentby = 'entity_id',\n timescaledb.orderby = 'timestamp DESC'\n);\n\n-- 4. Migrate data in batches\nINSERT INTO your_table_name_new\nSELECT * FROM your_table_name\nWHERE timestamp_column >= '2024-01-01' AND timestamp_column \u003c '2024-02-01';\n-- Repeat for each time range\n\n-- 4. Enter maintenance window and do the following:\n\n-- 5. Pause modification of the old table.\n\n-- 6. Copy over the most recent data from the old table to the new table.\n\n-- 7. Swap tables\nBEGIN;\nALTER TABLE your_table_name RENAME TO your_table_name_old;\nALTER TABLE your_table_name_new RENAME TO your_table_name;\nCOMMIT;\n\n-- 8. Exit maintenance window.\n\n-- 9. (sometime much later) Drop old table after validation\n-- DROP TABLE your_table_name_old;\n```\n\n### Common Issues\n\n#### Foreign Keys\n\n```sql\n-- Check foreign keys\nSELECT conname, confrelid::regclass as referenced_table\nFROM pg_constraint\nWHERE (conrelid = 'your_table_name'::regclass\n OR confrelid = 'your_table_name'::regclass)\n AND contype = 'f';\n```\n\n**Supported:** Plain→Hypertable, Hypertable→Plain\n**NOT supported:** Hypertable→Hypertable\n\n⚠️ **CRITICAL:** Hypertable→Hypertable FKs must be dropped (enforce in application). **ASK USER PERMISSION**. If no, **STOP MIGRATION**.\n\n#### Large Table Migration Time\n\n```sql\n-- Rough estimate: ~75k rows/second\nSELECT\n pg_size_pretty(pg_total_relation_size(tablename)) as size,\n n_live_tup as rows,\n ROUND(n_live_tup / 75000.0 / 60, 1) as estimated_minutes\nFROM pg_stat_user_tables\nWHERE tablename = 'your_table_name';\n```\n\n**Solutions for large tables (>1GB/10M rows):** Use blue-green migration, migrate during off-peak, test on subset first\n\n## Step 3: Performance Validation\n\n### Chunk & Compression Analysis\n\n```sql\n-- View chunks and compression\nSELECT\n chunk_name,\n pg_size_pretty(total_bytes) as size,\n pg_size_pretty(compressed_total_bytes) as compressed_size,\n ROUND((total_bytes - compressed_total_bytes::numeric) / total_bytes * 100, 1) as compression_pct,\n range_start,\n range_end\nFROM timescaledb_information.chunks\nWHERE hypertable_name = 'your_table_name'\nORDER BY range_start DESC;\n```\n\n**Look for:**\n\n- Consistent chunk sizes (within 2x)\n- Compression >90% for time-series\n- Recent chunks uncompressed\n- Chunk indexes \u003c 25% RAM\n\n### Query Performance Tests\n\n```sql\n-- 1. Time-range query (should show chunk exclusion)\nEXPLAIN (ANALYZE, BUFFERS)\nSELECT COUNT(*), AVG(value)\nFROM your_table_name\nWHERE timestamp >= NOW() - INTERVAL '1 day';\n\n-- 2. Entity + time query (benefits from segment_by)\nEXPLAIN (ANALYZE, BUFFERS)\nSELECT * FROM your_table_name\nWHERE entity_id = 'X' AND timestamp >= NOW() - INTERVAL '1 week';\n\n-- 3. Aggregation (benefits from columnstore)\nEXPLAIN (ANALYZE, BUFFERS)\nSELECT DATE_TRUNC('hour', timestamp), entity_id, COUNT(*), AVG(value)\nFROM your_table_name\nWHERE timestamp >= NOW() - INTERVAL '1 month'\nGROUP BY 1, 2;\n```\n\n**✅ Good signs:**\n\n- \"Chunks excluded during startup: X\" in EXPLAIN plan\n- \"Custom Scan (ColumnarScan)\" for compressed data\n- Lower \"Buffers: shared read\" in EXPLAIN ANALYZE plan than pre-migration\n- Faster execution times\n\n**❌ Bad signs:**\n\n- \"Seq Scan\" on large chunks\n- No chunk exclusion messages\n- Slower than before migration\n\n### Storage Metrics\n\n```sql\n-- Monitor compression effectiveness\nSELECT\n hypertable_name,\n pg_size_pretty(total_bytes) as total_size,\n pg_size_pretty(compressed_total_bytes) as compressed_size,\n ROUND(compressed_total_bytes::numeric / total_bytes * 100, 1) as compressed_pct_of_total,\n ROUND((uncompressed_total_bytes - compressed_total_bytes::numeric) /\n uncompressed_total_bytes * 100, 1) as compression_ratio_pct\nFROM timescaledb_information.hypertables\nWHERE hypertable_name = 'your_table_name';\n```\n\n**Monitor:**\n\n- compression_ratio_pct >90% (typical time-series)\n- compressed_pct_of_total growing as data ages\n- Size growth slowing significantly vs pre-hypertable\n- Decreasing compression_ratio_pct = poor segment_by\n\n### Troubleshooting\n\n#### Poor Chunk Exclusion\n\n```sql\n-- Verify chunks are being excluded\nEXPLAIN (ANALYZE, BUFFERS)\nSELECT * FROM your_table_name\nWHERE timestamp >= '2024-01-01' AND timestamp \u003c '2024-01-02';\n-- Look for \"Chunks excluded during startup: X\"\n```\n\n#### Poor Compression\n\n```sql\n-- Get newest compressed chunk name\nSELECT chunk_name FROM timescaledb_information.chunks\nWHERE hypertable_name = 'your_table_name'\n AND compressed_total_bytes IS NOT NULL\nORDER BY range_start DESC LIMIT 1;\n\n-- Analyze segment distribution\nSELECT segment_by_column, COUNT(*) as rows_per_segment\nFROM _timescaledb_internal._hyper_X_Y_chunk -- Use actual chunk name\nGROUP BY 1 ORDER BY 2 DESC;\n```\n\n**Look for:** \u003c20 rows per segment: Poor segment_by choice (should be >100) => Low compression potential.\n\n#### Poor insert performance\n\nCheck that you don't have too many indexes. Unused indexes hurt insert performance and should be dropped.\n\n```sql\nSELECT\n schemaname,\n tablename,\n indexname,\n idx_tup_read,\n idx_tup_fetch,\n idx_scan\nFROM pg_stat_user_indexes\nWHERE tablename LIKE '%your_table_name%'\nORDER BY idx_scan DESC;\n```\n\n**Look for:** Unused indexes via a low idx_scan value. Drop such indexes (but ask user permission).\n\n### Ongoing Monitoring\n\n```sql\n-- Monitor chunk compression status\nCREATE OR REPLACE VIEW hypertable_compression_status AS\nSELECT\n h.hypertable_name,\n COUNT(c.chunk_name) as total_chunks,\n COUNT(c.chunk_name) FILTER (WHERE c.compressed_total_bytes IS NOT NULL) as compressed_chunks,\n ROUND(\n COUNT(c.chunk_name) FILTER (WHERE c.compressed_total_bytes IS NOT NULL)::numeric /\n COUNT(c.chunk_name) * 100, 1\n ) as compression_coverage_pct,\n pg_size_pretty(SUM(c.total_bytes)) as total_size,\n pg_size_pretty(SUM(c.compressed_total_bytes)) as compressed_size\nFROM timescaledb_information.hypertables h\nLEFT JOIN timescaledb_information.chunks c ON h.hypertable_name = c.hypertable_name\nGROUP BY h.hypertable_name;\n\n-- Query this view regularly to monitor compression progress\nSELECT * FROM hypertable_compression_status\nWHERE hypertable_name = 'your_table_name';\n```\n\n**Look for:**\n\n- compression_coverage_pct should increase over time as data ages and gets compressed.\n- total_chunks should not grow too quickly (more than 10000 becomes a problem).\n- You should not see unexpected spikes in total_size or compressed_size.\n\n## Success Criteria\n\n**✅ Migration successful when:**\n\n- All queries return correct results\n- Query performance equal or better\n- Compression >90% for older data\n- Chunk exclusion working for time queries\n- Insert performance acceptable\n\n**❌ Investigate if:**\n\n- Query performance >20% worse\n- Compression \u003c80%\n- No chunk exclusion\n- Insert performance degraded\n- Increased error rates\n\nFocus on high-volume, insert-heavy workloads with time-based access patterns for best ROI.\n","content_type":"text/markdown; charset=utf-8","language":"markdown","size":15068,"content_sha256":"1d52f9d8351a62deaaa63f5d226954d93439d9c2c8d0bd60745958dc43779565"},{"filename":"references/pgvector-semantic-search.md","content":"---\nname: pgvector-semantic-search\ndescription: |\n Use this skill for setting up vector similarity search with pgvector for AI/ML embeddings, RAG applications, or semantic search.\n\n **Trigger when user asks to:**\n - Store or search vector embeddings in PostgreSQL\n - Set up semantic search, similarity search, or nearest neighbor search\n - Create HNSW or IVFFlat indexes for vectors\n - Implement RAG (Retrieval Augmented Generation) with PostgreSQL\n - Optimize pgvector performance, recall, or memory usage\n - Use binary quantization for large vector datasets\n\n **Keywords:** pgvector, embeddings, semantic search, vector similarity, HNSW, IVFFlat, halfvec, cosine distance, nearest neighbor, RAG, LLM, AI search\n\n Covers: halfvec storage, HNSW index configuration (m, ef_construction, ef_search), quantization strategies, filtered search, bulk loading, and performance tuning.\nlicense: Apache-2.0\ncompatibility: Requires PostgreSQL 15+ with the pgvector extension\nmetadata:\n author: tigerdata\n---\n\n# pgvector for Semantic Search\n\nSemantic search finds content by meaning rather than exact keywords. An embedding model converts text into high-dimensional vectors, where similar meanings map to nearby points. pgvector stores these vectors in PostgreSQL and uses approximate nearest neighbor (ANN) indexes to find the closest matches quickly—scaling to millions of rows without leaving the database. Store your text alongside its embedding, then query by converting your search text to a vector and returning the rows with the smallest distance.\n\nThis guide covers pgvector setup and tuning—not embedding model selection or text chunking, which significantly affect search quality. Requires pgvector 0.8.0+ for all features (`halfvec`, `binary_quantize`, iterative scan).\n\n## Golden Path (Default Setup)\n\nUse this configuration unless you have a specific reason not to.\n- Embedding column data type: `halfvec(N)` where `N` is your embedding dimension (must match everywhere). Examples use 1536; replace with your dimension `N`.\n- Distance: cosine (`\u003c=>`)\n- Index: HNSW (`m = 16`, `ef_construction = 64`). Use `halfvec_cosine_ops` and query with `\u003c=>`.\n- Query-time recall: `SET hnsw.ef_search = 100` (good starting point from published benchmarks, increase for higher recall at higher latency)\n- Query pattern: `ORDER BY embedding \u003c=> $1::halfvec(N) LIMIT k`\n\nThis setup provides a strong speed–recall tradeoff for most text-embedding workloads.\n\n## Core Rules\n\n- **Enable the extension** in each database: `CREATE EXTENSION IF NOT EXISTS vector;`\n- **Use HNSW indexes by default**—superior speed-recall tradeoff, can be created on empty tables, no training step required. Only consider IVFFlat for write-heavy or memory-bound workloads.\n- **Use `halfvec` by default**—store and index as `halfvec` for 50% smaller storage and indexes with minimal recall loss.\n- **Index after bulk loading** initial data for best build performance.\n- **Create indexes concurrently** in production: `CREATE INDEX CONCURRENTLY ...`\n- **Use cosine distance by default** (`\u003c=>`): For non-normalized embeddings, use cosine. For unit-normalized embeddings, cosine and inner product yield identical rankings; default to cosine.\n- **Match query operator to index ops**: Index with `halfvec_cosine_ops` requires `\u003c=>` in queries; `halfvec_l2_ops` requires `\u003c->`; mismatched operators won't use the index.\n- **Always cast query vectors explicitly** (`$1::halfvec(N)`) to avoid implicit-cast failures in prepared statements.\n- **Always use the same embedding model for data and queries**. Similarity search only works when the model generating the vectors is the same.\n\n## Type Rules\n\n- Store embeddings as `halfvec(N)`\n- Cast query vectors to `halfvec(N)`\n- Store binary quantized vectors as `bit(N)` in a generated column\n- Do not mix `vector` / `halfvec` / `bit` without explicit casts\n- Never call `binary_quantize()` on table columns inside `ORDER BY`; store it instead\n- Dimensions must match: a `halfvec(1536)` column requires query vectors cast as `::halfvec(1536)`.\n\n## Standard Pattern\n\n```sql\n-- Store and index as halfvec\nCREATE TABLE items (\n id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,\n contents TEXT NOT NULL,\n embedding halfvec(1536) NOT NULL -- NOT NULL requires embeddings generated before insert, not async\n);\nCREATE INDEX ON items USING hnsw (embedding halfvec_cosine_ops);\n\n-- Query: returns 10 closest items. $1 is the embedding of your search text.\nSELECT id, contents FROM items ORDER BY embedding \u003c=> $1::halfvec(1536) LIMIT 10;\n```\n\nFor other distance operators (L2, inner product, etc.), see the [pgvector README](https://github.com/pgvector/pgvector).\n\n## HNSW Index\n\nThe recommended index type. Creates a multilayer navigable graph with superior speed-recall tradeoff. Can be created on empty tables (no training step required).\n\n```sql\nCREATE INDEX ON items USING hnsw (embedding halfvec_cosine_ops);\n\n-- With tuning parameters\nCREATE INDEX ON items USING hnsw (embedding halfvec_cosine_ops) WITH (m = 16, ef_construction = 64);\n```\n\n### HNSW Parameters\n\n| Parameter | Default | Description |\n|-----------|---------|-------------|\n| `m` | 16 | Max connections per layer. Higher = better recall, more memory |\n| `ef_construction` | 64 | Build-time candidate list. Higher = better graph quality, slower build |\n| `hnsw.ef_search` | 40 | Query-time candidate list. Higher = better recall, slower queries. Should be ≥ LIMIT. |\n\n**ef_search tuning (rough guidelines—actual results vary by dataset):**\n\n| ef_search | Approx Recall | Relative Speed |\n|-----------|---------------|----------------|\n| 40 | lower (~95% on some benchmarks) | 1x (baseline) |\n| 100 | higher | ~2x slower |\n| 200 | very-high | ~4x slower |\n| 400 | near-exact | ~8x slower |\n\n```sql\n-- Set search parameter for session\nSET hnsw.ef_search = 100;\n\n-- Set for single query\nBEGIN;\nSET LOCAL hnsw.ef_search = 100;\nSELECT id, contents FROM items ORDER BY embedding \u003c=> $1::halfvec(1536) LIMIT 10;\nCOMMIT;\n```\n\n## IVFFlat Index (Generally Not Recommended)\n\nDefault to HNSW. Use IVFFlat only when HNSW’s operational costs matter more than peak recall.\n\nChoose IVFFlat if:\n- Write-heavy or constantly changing data AND you're willing to rebuild the index frequently\n- You rebuild indexes often and want predictable build time and memory usage\n- Memory is tight and you cannot keep an HNSW graph mostly resident\n- Data is partitioned or tiered, and this index lives on colder partitions\n\nAvoid IVFFlat if you need:\n- highest recall at low latency\n- minimal tuning\n- a “set and forget” index\n\nNotes:\n- IVFFlat requires data to exist before index creation.\n- Recall depends on `lists` and `ivfflat.probes`; higher probes = better recall, slower queries.\n\nStarter config:\n```sql\nCREATE INDEX ON items\nUSING ivfflat (embedding halfvec_cosine_ops)\nWITH (lists = 1000);\n\nSET ivfflat.probes = 10;\n```\n\n## Quantization Strategies\n\n- Quantization is a memory decision, not a recall decision.\n- Use `halfvec` by default for storage and indexing.\n- Estimate HNSW index footprint as ~4–6 KB per 1536-dim `halfvec` (m=16) (order-of-magnitude); 3072-dim is ~2×; m=32 roughly doubles HNSW link/graph overhead.\n- If p95/p99 latency rises while CPU is mostly idle, the HNSW index is likely no longer resident in memory.\n- If `halfvec` doesn’t fit, use binary quantization + re-ranking.\n\n### Guidelines for 1536-dim vectors\n\nApproximate `halfvec` capacity at `m=16`, 1536-dim (assumes RAM mostly available for index caching):\n\n| RAM | Approx max halfvec vectors |\n|-----|----------------------------|\n| 16 GB | ~2–3M vectors |\n| 32 GB | ~4–6M vectors |\n| 64 GB | ~8–12M vectors |\n| 128 GB | ~16–25M vectors |\n\nFor 3072-dim embeddings, divide these numbers by ~2. \nFor `m=32`, also divide capacity by ~2.\n\nIf the index cannot fit in memory at this scale, use binary quantization.\n\nThese are ranges, not guarantees. Validate by monitoring cache residency and p95/p99 latency under load.\n\n### Binary Quantization (For Very Large Datasets)\n\n32× memory reduction. Use with re-ranking for acceptable recall.\n\n```sql\n-- Table with generated column for binary quantization\nCREATE TABLE items (\n id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,\n contents TEXT NOT NULL,\n embedding halfvec(1536) NOT NULL,\n embedding_bq bit(1536) GENERATED ALWAYS AS (binary_quantize(embedding)::bit(1536)) STORED\n);\n\nCREATE INDEX ON items USING hnsw (embedding_bq bit_hamming_ops);\n\n-- Query with re-ranking for better recall\n-- ef_search must be >= inner LIMIT to retrieve enough candidates\nSET hnsw.ef_search = 800;\nWITH q AS (\n SELECT binary_quantize($1::halfvec(1536))::bit(1536) AS qb\n)\nSELECT *\nFROM (\n SELECT i.id, i.contents, i.embedding\n FROM items i, q\n ORDER BY i.embedding_bq \u003c~> q.qb -- computes binary distance using index\n LIMIT 800\n) candidates\nORDER BY candidates.embedding \u003c=> $1::halfvec(1536) -- computes halfvec distance (no index), more accurate than binary\nLIMIT 10;\n```\n\nThe 80× oversampling ratio (800 candidates for 10 results) is a reasonable starting point. Binary quantization loses precision, so more candidates are needed to find true nearest neighbors during re-ranking. Increase if recall is insufficient; decrease if re-ranking latency is too high.\n\n## Performance by Dataset Size\n\n| Scale | Vectors | Config | Notes |\n|-------|---------|--------|-------|\n| Small | \u003c100K | Defaults | Index optional but improves tail latency |\n| Medium | 100K–5M | Defaults | Monitor p95 latency; most common production range |\n| Large | 5M+ | `ef_construction=100+` | Memory residency critical |\n| Very Large | 10M+ | Binary quantization + re-ranking | Add RAM or partition first if possible |\n\nTune `ef_search` first for recall; only increase `m` if recall plateaus and memory allows. Under concurrency, tail latency spikes when the index doesn't fit in memory. Binary quantization is an escape hatch—prefer adding RAM or partitioning first.\n\n## Filtering Best Practices\n\nFiltered vector search requires care. Depending on filter selectivity and query shape, filters can cause early termination (too few rows, missing results) or increase work (latency).\n\n### Iterative scan (recommended when filters are selective)\n\nBy default, HNSW may stop early when a WHERE clause is present, which can lead to fewer results than expected. Iterative scan allows HNSW to continue searching until enough filtered rows are found.\n\nEnable iterative scan when filters materially reduce the result set.\n\n```sql\n-- Enable iterative scans for filtered queries\nSET hnsw.iterative_scan = relaxed_order;\n\nSELECT id, contents\nFROM items\nWHERE category_id = 123\nORDER BY embedding \u003c=> $1::halfvec(1536)\nLIMIT 10;\n```\n\nIf results are still sparse, increase the scan budget:\n\n```sql\nSET hnsw.max_scan_tuples = 50000;\n```\n\nTrade-off: increasing `hnsw.max_scan_tuples` improves recall but can significantly increase latency.\n\n**When iterative scan is not needed:**\n- The filter matches a large portion of the table (low selectivity)\n- You are prefiltering via a B-tree index\n- You are querying a single partition or partial index\n\n### Choose the right filtering strategy\n\n**Highly selective filters (under ~10k rows)**\nUse a B-tree index on the filter column so Postgres can prefilter before ANN.\n\n```sql\nCREATE INDEX ON items (category_id);\n```\n\n**Low-cardinality filters (few distinct values)**\nUse partial HNSW indexes per filter value.\n\n```sql\nCREATE INDEX ON items\nUSING hnsw (embedding halfvec_cosine_ops)\nWHERE category_id = 11;\n```\n\n**Many filter values or large datasets**\nPartition by the filter key to keep each ANN index small.\n\n```sql\nCREATE TABLE items (\n embedding halfvec(1536),\n category_id int\n) PARTITION BY LIST (category_id);\n```\n\n### Key rules\n\n- Filters that match few rows require prefiltering, partitioning, or iterative scan.\n- Always validate filtered queries by measuring p95/p99 latency and tuples visited under realistic load.\n\n### Alternative: pgvectorscale for label-based filtering\n\nFor large datasets with label-based filters, [pgvectorscale](https://github.com/timescale/pgvectorscale)'s StreamingDiskANN index supports filtered indexes on `smallint[]` columns. Labels are indexed alongside vectors, enabling efficient filtered search without the accuracy tradeoffs of HNSW post-filtering. See the pgvectorscale documentation for setup details.\n\n## Bulk Loading\n\n```sql\n-- COPY is fastest; binary format is faster but requires proper encoding\n-- Text format: '[0.1, 0.2, ...]'\nCOPY items (contents, embedding) FROM STDIN;\n-- Binary format (if your client supports it):\nCOPY items (contents, embedding) FROM STDIN WITH (FORMAT BINARY);\n\n-- Add indexes AFTER loading\nSET maintenance_work_mem = '4GB';\nSET max_parallel_maintenance_workers = 7;\nCREATE INDEX ON items USING hnsw (embedding halfvec_cosine_ops);\n```\n\n## Maintenance\n\n- **VACUUM regularly** after updates/deletes—stale entries may persist until vacuumed\n- **REINDEX** if performance degrades after high churn (rebuilds the graph from scratch)\n- For write-heavy workloads with frequent deletes, consider IVFFlat or partitioning by time using hypertables\n\n## Monitoring & Debugging\n\n```sql\n-- Check index size\nSELECT pg_size_pretty(pg_relation_size('items_embedding_idx'));\n\n-- Debug query performance\nEXPLAIN (ANALYZE, BUFFERS) SELECT id, contents FROM items ORDER BY embedding \u003c=> $1::halfvec(1536) LIMIT 10;\n\n-- Monitor index build progress\nSELECT phase, round(100.0 * blocks_done / nullif(blocks_total, 0), 1) AS \"%\" \nFROM pg_stat_progress_create_index;\n\n-- Compare approximate vs exact recall\nBEGIN;\nSET LOCAL enable_indexscan = off; -- Force exact search\nSELECT id, contents FROM items ORDER BY embedding \u003c=> $1::halfvec(1536) LIMIT 10;\nCOMMIT;\n\n-- Force index use for debugging\nBEGIN;\nSET LOCAL enable_seqscan = off;\nSELECT id, contents FROM items ORDER BY embedding \u003c=> $1::halfvec(1536) LIMIT 10;\nCOMMIT;\n```\n\n## Common Issues (Symptom → Fix)\n\n| Symptom | Likely Cause | Fix |\n|--------|--------------|-----|\n| Query does not use ANN index | Missing `ORDER BY` + `LIMIT`, operator mismatch, or implicit casts | Use `ORDER BY` with a distance operator that matches the index ops class; explicitly cast query vectors |\n| Fewer results than expected (filtered query) | HNSW stops early due to filter | Enable iterative scan; increase `hnsw.max_scan_tuples`; or prefilter (B-tree), use partial indexes, or partition |\n| Fewer results than expected (unfiltered query) | ANN recall too low | Increase `hnsw.ef_search` |\n| High latency with low CPU usage | HNSW index not resident in memory | Use `halfvec`, reduce `m`/`ef_construction`, add RAM, partition, or use binary quantization |\n| Slow index builds | Insufficient build memory or parallelism | Increase `maintenance_work_mem` and `max_parallel_maintenance_workers`; build after bulk load |\n| Out-of-memory errors | Index too large for available RAM | Use `halfvec`, reduce index parameters, or switch to binary quantization with re-ranking |\n| Zero or missing results | NULL or zero vectors | Avoid NULL embeddings; do not use zero vectors with cosine distance |\n","content_type":"text/markdown; charset=utf-8","language":"markdown","size":15106,"content_sha256":"0c23028b9bc8f9a8bc4f45042d5996877533f60d7798a074ce73373ebc3e24f6"},{"filename":"references/postgres-hybrid-text-search.md","content":"---\nname: postgres-hybrid-text-search\ndescription: |\n Use this skill to implement hybrid search combining BM25 keyword search with semantic vector search using Reciprocal Rank Fusion (RRF).\n\n **Trigger when user asks to:**\n - Combine keyword and semantic search\n - Implement hybrid search or multi-modal retrieval\n - Use BM25/pg_textsearch with pgvector together\n - Implement RRF (Reciprocal Rank Fusion) for search\n - Build search that handles both exact terms and meaning\n\n\n **Keywords:** hybrid search, BM25, pg_textsearch, RRF, reciprocal rank fusion, keyword search, full-text search, reranking, cross-encoder\n\n Covers: pg_textsearch BM25 index setup, parallel query patterns, client-side RRF fusion (Python/TypeScript), weighting strategies, and optional ML reranking.\nlicense: Apache-2.0\ncompatibility: Requires PostgreSQL 15+ with pgvector and pg_textsearch extensions\nmetadata:\n author: tigerdata\n---\n\n# Hybrid Text Search\n\nHybrid search combines keyword search (BM25) with semantic search (vector embeddings) to get the best of both: exact keyword matching and meaning-based retrieval. Use Reciprocal Rank Fusion (RRF) to merge results from both methods into a single ranked list.\n\nThis guide covers combining [pg_textsearch](https://github.com/timescale/pg_textsearch) (BM25) with [pgvector](https://github.com/pgvector/pgvector). Requires both extensions. For high-volume setups, filtering, or advanced pgvector tuning (binary quantization, HNSW parameters), see the **pgvector-semantic-search** skill.\n\npg_textsearch is a new BM25 text search extension for PostgreSQL, fully open-source and available hosted on Tiger Cloud as well as for self-managed deployments. It provides true BM25 ranking, which often improves relevance compared to PostgreSQL's built-in ts_rank and can offer better performance at scale. Note: pg_textsearch is currently in prerelease and not yet recommended for production use. pg_textsearch currently supports PostgreSQL 17 and 18.\n\n## When to Use Hybrid Search\n\n- **Use hybrid** when queries mix specific terms (product names, codes, proper nouns) with conceptual intent\n- **Use semantic only** when meaning matters more than exact wording (e.g., \"how to fix slow queries\" should match \"query optimization\")\n- **Use keyword only** when exact matches are critical (e.g., error codes, SKUs, legal citations)\n\nHybrid search typically improves recall over either method alone, at the cost of slightly more complexity.\n\n## Data Preparation\n\nChunk your documents into smaller pieces (typically 500–1000 tokens) and store each chunk with its embedding. Both BM25 and semantic search operate on the same chunks—this keeps fusion simple since you're comparing like with like.\n\n## Golden Path (Default Setup)\n\n```sql\n-- Enable extensions\nCREATE EXTENSION IF NOT EXISTS vector;\nCREATE EXTENSION IF NOT EXISTS pg_textsearch;\n\n-- Table with both indexes\nCREATE TABLE documents (\n id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,\n content TEXT NOT NULL,\n embedding halfvec(1536) NOT NULL\n);\n\n-- BM25 index for keyword search\nCREATE INDEX ON documents USING bm25 (content) WITH (text_config = 'english');\n\n-- HNSW index for semantic search\nCREATE INDEX ON documents USING hnsw (embedding halfvec_cosine_ops);\n```\n\n### BM25 Notes\n\n- **Negative scores**: The `\u003c@>` operator returns negative values where lower = better match. RRF uses rank position, so this doesn't affect fusion.\n- **Language config**: Change `text_config` to match your content language (e.g., `'french'`, `'german'`). See [PostgreSQL text search configurations](https://www.postgresql.org/docs/current/textsearch-configuration.html).\n- **Tuning**: BM25 has `k1` (term frequency saturation, default 1.2) and `b` (length normalization, default 0.75) parameters. Defaults work well; only tune if relevance is poor.\n ```sql\n CREATE INDEX ON documents USING bm25 (content) WITH (text_config = 'english', k1 = 1.5, b = 0.8);\n ```\n- **Partitioned tables**: Each partition maintains local statistics. Scores are not directly comparable across partitions—query individual partitions when score comparability matters.\n\n## RRF Query Pattern\n\nReciprocal Rank Fusion combines rankings from multiple searches. Each result's score is `1 / (k + rank)` where `k` is a constant (typically 60). Results are summed across searches and re-sorted.\n\n**Run both queries in parallel from your client** for lower latency, then fuse results client-side:\n\n```sql\n-- Query 1: Keyword search (BM25)\n-- $1: search text\nSELECT id, content FROM documents ORDER BY content \u003c@> $1 LIMIT 50;\n```\n\n```sql\n-- Query 2: Semantic search (separate query, run in parallel)\n-- $1: embedding of your search text as halfvec(1536)\nSELECT id, content FROM documents ORDER BY embedding \u003c=> $1::halfvec(1536) LIMIT 50;\n```\n\n```python\n# Client-side RRF fusion (Python)\ndef rrf_fusion(keyword_results, semantic_results, k=60, limit=10):\n scores = {}\n content_map = {}\n\n for rank, row in enumerate(keyword_results, start=1):\n scores[row['id']] = scores.get(row['id'], 0) + 1 / (k + rank)\n content_map[row['id']] = row['content']\n\n for rank, row in enumerate(semantic_results, start=1):\n scores[row['id']] = scores.get(row['id'], 0) + 1 / (k + rank)\n content_map[row['id']] = row['content']\n\n sorted_ids = sorted(scores, key=scores.get, reverse=True)[:limit]\n return [{'id': id, 'content': content_map[id], 'score': scores[id]} for id in sorted_ids]\n```\n\n```typescript\n// Client-side RRF fusion (TypeScript)\ntype Row = { id: number; content: string };\ntype Result = Row & { score: number };\n\nfunction rrfFusion(keywordResults: Row[], semanticResults: Row[], k = 60, limit = 10): Result[] {\n const scores = new Map\u003cnumber, number>();\n const contentMap = new Map\u003cnumber, string>();\n\n keywordResults.forEach((row, i) => {\n scores.set(row.id, (scores.get(row.id) ?? 0) + 1 / (k + i + 1));\n contentMap.set(row.id, row.content);\n });\n\n semanticResults.forEach((row, i) => {\n scores.set(row.id, (scores.get(row.id) ?? 0) + 1 / (k + i + 1));\n contentMap.set(row.id, row.content);\n });\n\n return [...scores.entries()]\n .sort((a, b) => b[1] - a[1])\n .slice(0, limit)\n .map(([id, score]) => ({ id, content: contentMap.get(id)!, score }));\n}\n```\n\n### RRF Parameters\n\n| Parameter | Default | Description |\n|-----------|---------|-------------|\n| `k` | 60 | Smoothing constant. Higher values reduce rank differences; 60 is standard |\n| Candidates per search | 50 | Higher = better recall, more work |\n| Final limit | 10 | Results returned after fusion |\n\nIncrease candidates if relevant results are being missed. The k=60 constant rarely needs tuning.\n\n## Weighting Keyword vs Semantic\n\nTo favor one method over another, multiply its RRF contribution:\n\n```python\n# Weight semantic search 2x higher than keyword\nkeyword_weight = 1.0\nsemantic_weight = 2.0\n\nfor rank, row in enumerate(keyword_results, start=1):\n scores[row['id']] = scores.get(row['id'], 0) + keyword_weight / (k + rank)\n\nfor rank, row in enumerate(semantic_results, start=1):\n scores[row['id']] = scores.get(row['id'], 0) + semantic_weight / (k + rank)\n```\n\n```typescript\n// Weight semantic search 2x higher than keyword\nconst keywordWeight = 1.0;\nconst semanticWeight = 2.0;\n\nkeywordResults.forEach((row, i) => {\n scores.set(row.id, (scores.get(row.id) ?? 0) + keywordWeight / (k + i + 1));\n});\n\nsemanticResults.forEach((row, i) => {\n scores.set(row.id, (scores.get(row.id) ?? 0) + semanticWeight / (k + i + 1));\n});\n```\n\nStart with equal weights (1.0 each) and adjust based on measured relevance.\n\n## Reranking with ML Models\n\nFor highest quality, add a reranking step using a cross-encoder model. Cross-encoders (e.g., `cross-encoder/ms-marco-MiniLM-L-6-v2`) are more accurate than bi-encoders but too slow for initial retrieval—use them only on the candidate set.\n\nRun the same parallel queries as above with a higher LIMIT (e.g., 100), then:\n\n```python\n# 1. Fuse results with RRF (more candidates for reranking)\ncandidates = rrf_fusion(keyword_results, semantic_results, limit=100)\n\n# 2. Rerank with cross-encoder\nfrom sentence_transformers import CrossEncoder\nreranker = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')\n\npairs = [(query_text, doc['content']) for doc in candidates]\nscores = reranker.predict(pairs)\n\n# 3. Return top 10 by reranker score\nreranked = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)[:10]\n```\n\n```typescript\nimport { CohereClientV2 } from 'cohere-ai';\n\n// 1. Fuse results with RRF (more candidates for reranking)\nconst candidates = rrfFusion(keywordResults, semanticResults, 60, 100);\n\n// 2. Rerank via API (example uses Cohere SDK; Jina, Voyage, and others work similarly)\nconst cohere = new CohereClientV2({ token: COHERE_API_KEY });\n\nconst reranked = await cohere.rerank({\n model: 'rerank-v3.5',\n query: queryText,\n documents: candidates.map(c => c.content),\n topN: 10\n});\n\n// 3. Map back to original documents\nconst results = reranked.results.map(r => candidates[r.index]);\n```\n\nReranking is optional—hybrid RRF alone significantly improves over single-method search.\n\n## Performance Considerations\n\n- **Index both columns**: BM25 index on text, HNSW index on embedding\n- **Limit candidate pools**: 50–100 candidates per method is usually sufficient\n- **Run queries in parallel**: Client-side parallelism reduces latency vs sequential execution\n- **Monitor latency**: Hybrid adds overhead; ensure both indexes fit in memory\n\n## Scaling with pgvectorscale\n\nFor large datasets (10M+ vectors) or workloads with selective metadata filters, consider [pgvectorscale](https://github.com/timescale/pgvectorscale)'s StreamingDiskANN index instead of HNSW for the semantic search component.\n\n**When to use StreamingDiskANN:**\n- Large datasets where HNSW doesn't fit in memory\n- Queries that filter by labels (e.g., tenant_id, category, tags)\n- When you need high-performance filtered vector search\n\n**Label-based filtering:** StreamingDiskANN supports filtered indexes on `smallint[]` label columns. Labels are indexed alongside vectors, enabling efficient filtered search without post-filtering accuracy loss.\n\n```sql\n-- Enable pgvectorscale (in addition to pgvector)\nCREATE EXTENSION IF NOT EXISTS vectorscale;\n\n-- Table with label column for filtering\nCREATE TABLE documents (\n id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,\n content TEXT NOT NULL,\n embedding halfvec(1536) NOT NULL,\n labels smallint[] NOT NULL -- e.g., category IDs, tenant IDs\n);\n\n-- StreamingDiskANN index with label filtering\nCREATE INDEX ON documents USING diskann (embedding vector_cosine_ops, labels);\n\n-- BM25 index for keyword search\nCREATE INDEX ON documents USING bm25 (content) WITH (text_config = 'english');\n\n-- Filtered semantic search using && (array overlap)\nSELECT id, content FROM documents\nWHERE labels && ARRAY[1, 3]::smallint[]\nORDER BY embedding \u003c=> $1::halfvec(1536) LIMIT 50;\n```\n\nSee the [pgvectorscale documentation](https://github.com/timescale/pgvectorscale) for more details on filtered indexes and tuning parameters.\n\n## Monitoring & Debugging\n\n```sql\n-- Force index usage for verification (planner may prefer seqscan on small tables)\nSET enable_seqscan = off;\n\n-- Verify BM25 index is used\nEXPLAIN SELECT id, content FROM documents ORDER BY content \u003c@> 'search text' LIMIT 10;\n-- Look for: Index Scan using ... (bm25)\n\n-- Verify HNSW index is used\nEXPLAIN SELECT id, content FROM documents ORDER BY embedding \u003c=> '[0.1, 0.2, ...]'::halfvec(1536) LIMIT 10;\n-- Look for: Index Scan using ... (hnsw)\n\nSET enable_seqscan = on; -- Re-enable for normal operation\n\n-- Check index sizes\nSELECT indexname, pg_size_pretty(pg_relation_size(indexname::regclass)) AS size\nFROM pg_indexes WHERE tablename = 'documents';\n```\n\nIf EXPLAIN still shows sequential scans with `enable_seqscan = off`, verify indexes exist and queries use correct operators (`\u003c@>` for BM25, `\u003c=>` for cosine). For more pgvector debugging guidance, see the **pgvector-semantic-search** skill.\n\n## Common Issues\n\n| Symptom | Likely Cause | Fix |\n|---------|--------------|-----|\n| Missing exact matches | Keyword search not returning them | Check BM25 index exists; verify text_config matches content language |\n| Poor semantic results | Embedding model mismatch | Ensure query embedding uses same model as stored embeddings |\n| Slow queries | Large candidate pools or missing indexes | Reduce inner LIMIT; verify both indexes exist and are used (EXPLAIN) |\n| Skewed results | One method dominating | Adjust RRF weights; verify both searches return reasonable candidates |\n","content_type":"text/markdown; charset=utf-8","language":"markdown","size":12600,"content_sha256":"cb4837e06c8072acbd1134522d6f29bec819c4b8ae0fc6417f106e739b7421e8"},{"filename":"references/setup-timescaledb-hypertables.md","content":"---\nname: setup-timescaledb-hypertables\ndescription: |\n Use this skill when creating database schemas or tables for Timescale, TimescaleDB, TigerData, or Tiger Cloud, especially for time-series, IoT, metrics, events, or log data. Use this to improve the performance of any insert-heavy table.\n\n **Trigger when user asks to:**\n - Create or design SQL schemas/tables AND Timescale/TimescaleDB/TigerData/Tiger Cloud is available\n - Set up hypertables, compression, retention policies, or continuous aggregates\n - Configure partition columns, segment_by, order_by, or chunk intervals\n - Optimize time-series database performance or storage\n - Create tables for sensors, metrics, telemetry, events, or transaction logs\n\n **Keywords:** CREATE TABLE, hypertable, Timescale, TimescaleDB, time-series, IoT, metrics, sensor data, compression policy, continuous aggregates, columnstore, retention policy, chunk interval, segment_by, order_by\n\n Step-by-step instructions for hypertable creation, column selection, compression policies, retention, continuous aggregates, and indexes.\nlicense: Apache-2.0\ncompatibility: Requires PostgreSQL 15+ with TimescaleDB\nmetadata:\n author: tigerdata\n---\n\n# TimescaleDB Complete Setup\n\nInstructions for insert-heavy data patterns where data is inserted but rarely changed:\n\n- **Time-series data** (sensors, metrics, system monitoring)\n- **Event logs** (user events, audit trails, application logs)\n- **Transaction records** (orders, payments, financial transactions)\n- **Sequential data** (records with auto-incrementing IDs and timestamps)\n- **Append-only datasets** (immutable records, historical data)\n\n## Step 1: Create Hypertable\n\n```sql\nCREATE TABLE your_table_name (\n timestamp TIMESTAMPTZ NOT NULL,\n entity_id TEXT NOT NULL, -- device_id, user_id, symbol, etc.\n category TEXT, -- sensor_type, event_type, asset_class, etc.\n value_1 DOUBLE PRECISION, -- price, temperature, latency, etc.\n value_2 DOUBLE PRECISION, -- volume, humidity, throughput, etc.\n value_3 INTEGER, -- count, status, level, etc.\n metadata JSONB -- flexible additional data\n) WITH (\n tsdb.hypertable,\n tsdb.partition_column='timestamp',\n tsdb.enable_columnstore=true, -- Disable if table has vector columns\n tsdb.segmentby='entity_id', -- See selection guide below\n tsdb.orderby='timestamp DESC', -- See selection guide below\n tsdb.sparse_index='minmax(value_1),minmax(value_2),minmax(value_3)' -- see selection guide below\n);\n```\n\n### Compression Decision\n\n- **Enable by default** for insert-heavy patterns\n- **Disable** if table has vector type columns (pgvector) - indexes on vector columns incompatible with columnstore\n\n### Partition Column Selection\n\nMust be time-based (TIMESTAMP/TIMESTAMPTZ/DATE) or integer (INT/BIGINT) with good temporal/sequential distribution.\n\n**Common patterns:**\n\n- TIME-SERIES: `timestamp`, `event_time`, `measured_at`\n- EVENT LOGS: `event_time`, `created_at`, `logged_at`\n- TRANSACTIONS: `created_at`, `transaction_time`, `processed_at`\n- SEQUENTIAL: `id` (auto-increment when no timestamp), `sequence_number`\n- APPEND-ONLY: `created_at`, `inserted_at`, `id`\n\n**Less ideal:** `ingested_at` (when data entered system - use only if it's your primary query dimension)\n**Avoid:** `updated_at` (breaks time ordering unless it's primary query dimension)\n\n### Segment_By Column Selection\n\n**PREFER SINGLE COLUMN** - multi-column rarely optimal. Multi-column can only work for highly correlated columns (e.g., metric_name + metric_type) with sufficient row density.\n\n**Requirements:**\n\n- Frequently used in WHERE clauses (most common filter)\n- Good row density (>100 rows per value per chunk)\n- Primary logical partition/grouping\n\n**Examples:**\n\n- IoT: `device_id`\n- Finance: `symbol`\n- Metrics: `service_name`, `service_name, metric_type` (if sufficient row density), `metric_name, metric_type` (if sufficient row density)\n- Analytics: `user_id` if sufficient row density, otherwise `session_id`\n- E-commerce: `product_id` if sufficient row density, otherwise `category_id`\n\n**Row density guidelines:**\n\n- Target: >100 rows per segment_by value within each chunk.\n- Poor: \u003c10 rows per segment_by value per chunk → choose less granular column\n- What to do with low-density columns: prepend to order_by column list.\n\n**Query pattern drives choice:**\n\n```sql\nSELECT * FROM table WHERE entity_id = 'X' AND timestamp > ...\n-- ↳ segment_by: entity_id (if >100 rows per chunk)\n```\n\n**Avoid:** timestamps, unique IDs, low-density columns (\u003c100 rows/value/chunk), columns rarely used in filtering\n\n### Order_By Column Selection\n\nCreates natural time-series progression when combined with segment_by for optimal compression.\n\n**Most common:** `timestamp DESC`\n\n**Examples:**\n\n- IoT/Finance/E-commerce: `timestamp DESC`\n- Metrics: `metric_name, timestamp DESC` (if metric_name has too low density for segment_by)\n- Analytics: `user_id, timestamp DESC` (user_id has too low density for segment_by)\n\n**Alternative patterns:**\n\n- `sequence_id DESC` for event streams with sequence numbers\n- `timestamp DESC, event_order DESC` for sub-ordering within same timestamp\n\n**Low-density column handling:**\nIf a column has \u003c100 rows per chunk (too low for segment_by), prepend it to order_by:\n\n- Example: `metric_name` has 20 rows/chunk → use `segment_by='service_name'`, `order_by='metric_name, timestamp DESC'`\n- Groups similar values together (all temperature readings, then pressure readings) for better compression\n\n**Good test:** ordering created by `(segment_by_column, order_by_column)` should form a natural time-series progression. Values close to each other in the progression should be similar.\n\n**Avoid in order_by:** random columns, columns with high variance between adjacent rows, columns unrelated to segment_by\n\n### Compression Sparse Index Selection\n\n**Sparse indexes** enable query filtering on compressed data without decompression. Store metadata per batch (~1000 rows) to eliminate batches that don't match query predicates.\n\n**Types:**\n\n- **minmax:** Min/max values per batch - for range queries (>, \u003c, BETWEEN) on numeric/temporal columns\n\n**Use minmax for:** price, temperature, measurement, timestamp (range filtering)\n\n**Use for:**\n\n- minmax for outlier detection (temperature > 90).\n- minmax for fields that are highly correlated with segmentby and orderby columns (e.g. if orderby includes `created_at`, minmax on `updated_at` is useful).\n\n**Avoid:** rarely filtered columns.\n\nIMPORTANT: NEVER index columns in segmentby or orderby. Orderby columns will always have minmax indexes without any configuration.\n\n**Configuration:**\nThe format is a comma-separated list of type_of_index(column_name).\n\n```sql\nALTER TABLE table_name SET (\n timescaledb.sparse_index = 'minmax(value_1),minmax(value_2)'\n);\n```\n\nExplicit configuration available since v2.22.0 (was auto-created since v2.16.0).\n\n### Chunk Time Interval (Optional)\n\nDefault: 7 days (use if volume unknown, or ask user). Adjust based on volume:\n\n- High frequency: 1 hour - 1 day\n- Medium: 1 day - 1 week\n- Low: 1 week - 1 month\n\n```sql\nSELECT set_chunk_time_interval('your_table_name', INTERVAL '1 day');\n```\n\n**Good test:** recent chunk indexes should fit in less than 25% of RAM.\n\n### Indexes & Primary Keys\n\nCommon index patterns - composite indexes on an id and timestamp:\n\n```sql\nCREATE INDEX idx_entity_timestamp ON your_table_name (entity_id, timestamp DESC);\n```\n\n**Important:** Only create indexes you'll actually use - each has maintenance overhead.\n\n**Primary key and unique constraints rules:** Must include partition column.\n\n**Option 1: Composite PK with partition column**\n\n```sql\nALTER TABLE your_table_name ADD PRIMARY KEY (entity_id, timestamp);\n```\n\n**Option 2: Single-column PK (only if it's the partition column)**\n\n```sql\nCREATE TABLE ... (id BIGINT PRIMARY KEY, ...) WITH (tsdb.partition_column='id');\n```\n\n**Option 3: No PK**: strict uniqueness is often not required for insert-heavy patterns.\n\n## Step 2: Compression Policy (Optional)\n\n**IMPORTANT**: If you used `tsdb.enable_columnstore=true` in Step 1, starting with TimescaleDB version 2.23 a columnstore policy is **automatically created** with `after => INTERVAL '7 days'`. You only need to call `add_columnstore_policy()` if you want to customize the `after` interval to something other than 7 days.\n\nSet `after` interval for when: data becomes mostly immutable (some updates/backfill OK) AND B-tree indexes aren't needed for queries (less common criterion).\n\n```sql\n-- In TimescaleDB 2.23 and later only needed if you want to override the default 7-day policy created by tsdb.enable_columnstore=true\n-- Remove the existing auto-created policy first:\n-- CALL remove_columnstore_policy('your_table_name');\n-- Then add custom policy:\n-- CALL add_columnstore_policy('your_table_name', after => INTERVAL '1 day');\n```\n\n## Step 3: Retention Policy\n\nIMPORTANT: Don't guess - ask user or comment out if unknown.\n\n```sql\n-- Example - replace with requirements or comment out\nSELECT add_retention_policy('your_table_name', INTERVAL '365 days');\n```\n\n## Step 4: Create Continuous Aggregates\n\nUse different aggregation intervals for different uses.\n\n### Short-term (Minutes/Hours)\n\nFor up-to-the-minute dashboards on high-frequency data.\n\n```sql\nCREATE MATERIALIZED VIEW your_table_hourly\nWITH (timescaledb.continuous) AS\nSELECT\n time_bucket(INTERVAL '1 hour', timestamp) AS bucket,\n entity_id,\n category,\n COUNT(*) as record_count,\n AVG(value_1) as avg_value_1,\n MIN(value_1) as min_value_1,\n MAX(value_1) as max_value_1,\n SUM(value_2) as sum_value_2\nFROM your_table_name\nGROUP BY bucket, entity_id, category;\n```\n\n### Long-term (Days/Weeks/Months)\n\nFor long-term reporting and analytics.\n\n```sql\nCREATE MATERIALIZED VIEW your_table_daily\nWITH (timescaledb.continuous) AS\nSELECT\n time_bucket(INTERVAL '1 day', timestamp) AS bucket,\n entity_id,\n category,\n COUNT(*) as record_count,\n AVG(value_1) as avg_value_1,\n MIN(value_1) as min_value_1,\n MAX(value_1) as max_value_1,\n PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY value_1) as median_value_1,\n PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY value_1) as p95_value_1,\n SUM(value_2) as sum_value_2\nFROM your_table_name\nGROUP BY bucket, entity_id, category;\n```\n\n## Step 5: Aggregate Refresh Policies\n\nSet up refresh policies based on your data freshness requirements.\n\n**start_offset:** Usually omit (refreshes all). Exception: If you don't care about refreshing data older than X (see below). With retention policy on raw data: match the retention policy.\n\n**end_offset:** Set beyond active update window (e.g., 15 min if data usually arrives within 10 min). Data newer than end_offset won't appear in queries without real-time aggregation. If you don't know your update window, use the size of the time_bucket in the query, but not less than 5 minutes.\n\n**schedule_interval:** Set to the same value as the end_offset but not more than 1 hour.\n\n**Hourly - frequent refresh for dashboards:**\n\n```sql\nSELECT add_continuous_aggregate_policy('your_table_hourly',\n start_offset => NULL,\n end_offset => INTERVAL '15 minutes',\n schedule_interval => INTERVAL '15 minutes');\n```\n\n**Daily - less frequent for reports:**\n\n```sql\nSELECT add_continuous_aggregate_policy('your_table_daily',\n start_offset => NULL,\n end_offset => INTERVAL '1 hour',\n schedule_interval => INTERVAL '1 hour');\n```\n\n**Use start_offset only if you don't care about refreshing old data**\nUse for high-volume systems where query accuracy on older data doesn't matter:\n\n```sql\n-- the following aggregate can be stale for data older than 7 days\n-- SELECT add_continuous_aggregate_policy('aggregate_for_last_7_days',\n-- start_offset => INTERVAL '7 days', -- only refresh last 7 days (NULL = refresh all)\n-- end_offset => INTERVAL '15 minutes',\n-- schedule_interval => INTERVAL '15 minutes');\n```\n\nIMPORTANT: you MUST set a start_offset to be less than the retention policy on raw data. By default, set the start_offset equal to the retention policy.\nIf the retention policy is commented out, comment out the start_offset as well. like this:\n\n```sql\nSELECT add_continuous_aggregate_policy('your_table_daily',\n start_offset => NULL, -- Use NULL to refresh all data, or set to retention period if enabled on raw data\n-- start_offset => INTERVAL '\u003cretention period here>', -- uncomment if retention policy is enabled on the raw data table\n end_offset => INTERVAL '1 hour',\n schedule_interval => INTERVAL '1 hour');\n```\n\n## Step 6: Real-Time Aggregation (Optional)\n\nReal-time combines materialized + recent raw data at query time. Provides up-to-date results at the cost of higher query latency.\n\nMore useful for fine-grained aggregates (e.g., minutely) than coarse ones (e.g., daily/monthly) since large buckets will be mostly incomplete with recent data anyway.\n\nDisabled by default in v2.13+, before that it was enabled by default.\n\n**Use when:** Need data newer than end_offset, up-to-minute dashboards, can tolerate higher query latency\n**Disable when:** Performance critical, refresh policies sufficient, high query volume, missing and stale data for recent data is acceptable\n\n**Enable for current results (higher query cost):**\n\n```sql\nALTER MATERIALIZED VIEW your_table_hourly SET (timescaledb.materialized_only = false);\n```\n\n**Disable for performance (but with stale results):**\n\n```sql\nALTER MATERIALIZED VIEW your_table_hourly SET (timescaledb.materialized_only = true);\n```\n\n## Step 7: Compress Aggregates\n\nRule: segment_by = ALL GROUP BY columns except time_bucket, order_by = time_bucket DESC\n\n```sql\n-- Hourly\nALTER MATERIALIZED VIEW your_table_hourly SET (\n timescaledb.enable_columnstore,\n timescaledb.segmentby = 'entity_id, category',\n timescaledb.orderby = 'bucket DESC'\n);\nCALL add_columnstore_policy('your_table_hourly', after => INTERVAL '3 days');\n\n-- Daily\nALTER MATERIALIZED VIEW your_table_daily SET (\n timescaledb.enable_columnstore,\n timescaledb.segmentby = 'entity_id, category',\n timescaledb.orderby = 'bucket DESC'\n);\nCALL add_columnstore_policy('your_table_daily', after => INTERVAL '7 days');\n```\n\n## Step 8: Aggregate Retention\n\nAggregates are typically kept longer than raw data.\nIMPORTANT: Don't guess - ask user or you **MUST comment out if unknown**.\n\n```sql\n-- Example - replace or comment out\nSELECT add_retention_policy('your_table_hourly', INTERVAL '2 years');\nSELECT add_retention_policy('your_table_daily', INTERVAL '5 years');\n```\n\n## Step 9: Performance Indexes on Continuous Aggregates\n\n**Index strategy:** Analyze WHERE clauses in common queries → Create indexes matching filter columns + time ordering\n\n**Pattern:** `(filter_column, bucket DESC)` supports `WHERE filter_column = X AND bucket >= Y ORDER BY bucket DESC`\n\nExamples:\n\n```sql\nCREATE INDEX idx_hourly_entity_bucket ON your_table_hourly (entity_id, bucket DESC);\nCREATE INDEX idx_hourly_category_bucket ON your_table_hourly (category, bucket DESC);\n```\n\n**Multi-column filters:** Create composite indexes for `WHERE entity_id = X AND category = Y`:\n\n```sql\nCREATE INDEX idx_hourly_entity_category_bucket ON your_table_hourly (entity_id, category, bucket DESC);\n```\n\n**Important:** Only create indexes you'll actually use - each has maintenance overhead.\n\n## Step 10: Optional Enhancements\n\n### Space Partitioning (NOT RECOMMENDED)\n\nOnly for query patterns where you ALWAYS filter by the space-partition column with expert knowledge and extensive benchmarking. STRONGLY prefer time-only partitioning.\n\n## Step 11: Verify Configuration\n\n```sql\n-- Check hypertable\nSELECT * FROM timescaledb_information.hypertables\nWHERE hypertable_name = 'your_table_name';\n\n-- Check compression settings\nSELECT * FROM hypertable_compression_stats('your_table_name');\n\n-- Check aggregates\nSELECT * FROM timescaledb_information.continuous_aggregates;\n\n-- Check policies\nSELECT * FROM timescaledb_information.jobs ORDER BY job_id;\n\n-- Monitor chunk information\nSELECT\n chunk_name,\n range_start,\n range_end,\n is_compressed\nFROM timescaledb_information.chunks\nWHERE hypertable_name = 'your_table_name'\nORDER BY range_start DESC;\n```\n\n## Performance Guidelines\n\n- **Chunk size:** Recent chunk indexes should fit in less than 25% of RAM\n- **Compression:** Expect 90%+ reduction (10x) with proper columnstore config\n- **Query optimization:** Use continuous aggregates for historical queries and dashboards\n- **Memory:** Run `timescaledb-tune` for self-hosting (auto-configured on cloud)\n\n## Schema Best Practices\n\n### Do's and Don'ts\n\n- ✅ Use `TIMESTAMPTZ` NOT `timestamp`\n- ✅ Use `>=` and `\u003c` NOT `BETWEEN` for timestamps\n- ✅ Use `TEXT` with constraints NOT `char(n)`/`varchar(n)`\n- ✅ Use `snake_case` NOT `CamelCase`\n- ✅ Use `BIGINT GENERATED ALWAYS AS IDENTITY` NOT `SERIAL`\n- ✅ Use `BIGINT` for IDs by default over `INTEGER` or `SMALLINT`\n- ✅ Use `DOUBLE PRECISION` by default over `REAL`/`FLOAT`\n- ✅ Use `NUMERIC` NOT `MONEY`\n- ✅ Use `NOT EXISTS` NOT `NOT IN`\n- ✅ Use `time_bucket()` or `date_trunc()` NOT `timestamp(0)` for truncation\n\n## API Reference (Current vs Deprecated)\n\n**Deprecated Parameters → New Parameters:**\n\n- `timescaledb.compress` → `timescaledb.enable_columnstore`\n- `timescaledb.compress_segmentby` → `timescaledb.segmentby`\n- `timescaledb.compress_orderby` → `timescaledb.orderby`\n\n**Deprecated Functions → New Functions:**\n\n- `add_compression_policy()` → `add_columnstore_policy()`\n- `remove_compression_policy()` → `remove_columnstore_policy()`\n- `compress_chunk()` → `convert_to_columnstore()` (use with `CALL`, not `SELECT`)\n- `decompress_chunk()` → `convert_to_rowstore()` (use with `CALL`, not `SELECT`)\n\n**Compression Stats (use functions, not views):**\n\n- Use function: `hypertable_compression_stats('table_name')`\n- Use function: `chunk_compression_stats('_timescaledb_internal._hyper_X_Y_chunk')`\n- Note: Views like `columnstore_settings` may not be available in all versions; use functions instead\n\n**Manual Compression Example:**\n\n```sql\n-- Compress a specific chunk\nCALL convert_to_columnstore('_timescaledb_internal._hyper_7_1_chunk');\n\n-- Check compression statistics\nSELECT\n number_compressed_chunks,\n pg_size_pretty(before_compression_total_bytes) as before_compression,\n pg_size_pretty(after_compression_total_bytes) as after_compression,\n ROUND(100.0 * (1 - after_compression_total_bytes::numeric / NULLIF(before_compression_total_bytes, 0)), 1) as compression_pct\nFROM hypertable_compression_stats('your_table_name');\n```\n\n## Questions to Ask User\n\n1. What kind of data will you be storing?\n2. How do you expect to use the data?\n3. What queries will you run?\n4. How long to keep the data?\n5. Column types if unclear\n","content_type":"text/markdown; charset=utf-8","language":"markdown","size":18850,"content_sha256":"dcfce6a31d21f427f852eb030e36a30ff5d38f5ad06176beb35f02f0485e2098"}],"content_json":{"type":"doc","content":[{"type":"heading","attrs":{"level":1},"content":[{"text":"PostgreSQL Expert Skills","type":"text"}]},{"type":"paragraph","content":[{"text":"This skill provides comprehensive PostgreSQL expertise through specialized references. Load the appropriate reference based on the task.","type":"text"}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Available References","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Table Design","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"design-postgres-tables","type":"text","marks":[{"type":"link","attrs":{"href":"references/design-postgres-tables.md","title":null}},{"type":"strong"}]},{"text":" — Data types, constraints, indexes, JSONB patterns, partitioning, and PostgreSQL best practices. ","type":"text"},{"text":"Use for any general table/schema design task.","type":"text","marks":[{"type":"strong"}]}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"design-postgis-tables","type":"text","marks":[{"type":"link","attrs":{"href":"references/design-postgis-tables.md","title":null}},{"type":"strong"}]},{"text":" — PostGIS spatial table design: geometry vs geography types, SRIDs, spatial indexing, and location-based query patterns. ","type":"text"},{"text":"Use when the task involves geographic or spatial data.","type":"text","marks":[{"type":"strong"}]}]}]}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Search","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"pgvector-semantic-search","type":"text","marks":[{"type":"link","attrs":{"href":"references/pgvector-semantic-search.md","title":null}},{"type":"strong"}]},{"text":" — Vector similarity search with pgvector: HNSW/IVFFlat indexes, halfvec storage, quantization, filtered search, and tuning. ","type":"text"},{"text":"Use for embeddings, RAG, or semantic search.","type":"text","marks":[{"type":"strong"}]}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"postgres-hybrid-text-search","type":"text","marks":[{"type":"link","attrs":{"href":"references/postgres-hybrid-text-search.md","title":null}},{"type":"strong"}]},{"text":" — Hybrid search combining BM25 keyword search with pgvector semantic search using RRF. ","type":"text"},{"text":"Use when combining keyword and meaning-based search.","type":"text","marks":[{"type":"strong"}]}]}]}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"TimescaleDB","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"setup-timescaledb-hypertables","type":"text","marks":[{"type":"link","attrs":{"href":"references/setup-timescaledb-hypertables.md","title":null}},{"type":"strong"}]},{"text":" — Hypertable creation, compression, retention policies, continuous aggregates, and indexes. ","type":"text"},{"text":"Use when setting up TimescaleDB from scratch.","type":"text","marks":[{"type":"strong"}]}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"find-hypertable-candidates","type":"text","marks":[{"type":"link","attrs":{"href":"references/find-hypertable-candidates.md","title":null}},{"type":"strong"}]},{"text":" — SQL queries to analyze existing tables and score them for hypertable conversion. ","type":"text"},{"text":"Use when evaluating which tables to migrate.","type":"text","marks":[{"type":"strong"}]}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"migrate-postgres-tables-to-hypertables","type":"text","marks":[{"type":"link","attrs":{"href":"references/migrate-postgres-tables-to-hypertables.md","title":null}},{"type":"strong"}]},{"text":" — Step-by-step migration: partition column selection, in-place vs blue-green, validation. ","type":"text"},{"text":"Use when executing a migration.","type":"text","marks":[{"type":"strong"}]}]}]}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"How to Use","type":"text"}]},{"type":"ordered_list","attrs":{"order":1,"listStyle":"number"},"content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Identify which reference matches the user's task from the descriptions above.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Load the reference file to get detailed instructions and SQL patterns.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"For tasks spanning multiple areas (e.g., \"design a table with vector search\"), load multiple references as needed.","type":"text"}]}]}]},{"type":"hr","attrs":{"markup":"---"}}]},"metadata":{"date":"2026-06-05","name":"postgres","author":"@skillopedia","source":{"stars":1751,"repo_name":"pg-aiguide","origin_url":"https://github.com/timescale/pg-aiguide/blob/HEAD/skills/postgres/SKILL.md","repo_owner":"timescale","body_sha256":"0b909640fbeab5152e504794363b8043944a036967f42f9783a7e102c6fd708a","cluster_key":"456fdc452d2cd8024b036ce97df674daf084985921fef148efe8e08fd39eae38","clean_bundle":{"format":"clean-skill-bundle-v1","source":"timescale/pg-aiguide/skills/postgres/SKILL.md","attachments":[{"id":"a6c51dd0-1e0c-56d9-ab61-dc7b3a0f3ea2","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/a6c51dd0-1e0c-56d9-ab61-dc7b3a0f3ea2/attachment.md","path":"references/design-postgis-tables.md","size":16924,"sha256":"c94284724fb0d611cf7717b2cfa8863a135db62a5c144e17dbbf691ead760b87","contentType":"text/markdown; charset=utf-8"},{"id":"cda1319a-6a64-5fcd-a3a6-32448d887f92","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/cda1319a-6a64-5fcd-a3a6-32448d887f92/attachment.md","path":"references/design-postgres-tables.md","size":16881,"sha256":"a4822cefdc283979e8a75f312780cb00aad83fb03b994f72e1675dc04a8eb252","contentType":"text/markdown; charset=utf-8"},{"id":"7ee2ad90-cbd7-5a56-a9fe-1667993e4597","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/7ee2ad90-cbd7-5a56-a9fe-1667993e4597/attachment.md","path":"references/find-hypertable-candidates.md","size":10525,"sha256":"12472d435c00f8676faa0be2d9a068ffea1c57f4f8376bacfe5ec921a727bd94","contentType":"text/markdown; charset=utf-8"},{"id":"260890ee-adb1-5a10-85b9-ccbfc12726b9","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/260890ee-adb1-5a10-85b9-ccbfc12726b9/attachment.md","path":"references/migrate-postgres-tables-to-hypertables.md","size":15068,"sha256":"1d52f9d8351a62deaaa63f5d226954d93439d9c2c8d0bd60745958dc43779565","contentType":"text/markdown; charset=utf-8"},{"id":"aefea659-3796-5731-a6be-875412111f91","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/aefea659-3796-5731-a6be-875412111f91/attachment.md","path":"references/pgvector-semantic-search.md","size":15106,"sha256":"0c23028b9bc8f9a8bc4f45042d5996877533f60d7798a074ce73373ebc3e24f6","contentType":"text/markdown; charset=utf-8"},{"id":"7b2ed06c-cd57-5071-8583-36fa2e2b3c98","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/7b2ed06c-cd57-5071-8583-36fa2e2b3c98/attachment.md","path":"references/postgres-hybrid-text-search.md","size":12600,"sha256":"cb4837e06c8072acbd1134522d6f29bec819c4b8ae0fc6417f106e739b7421e8","contentType":"text/markdown; charset=utf-8"},{"id":"6ea7e21c-842f-5895-9b43-475cfd9b7f98","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/6ea7e21c-842f-5895-9b43-475cfd9b7f98/attachment.md","path":"references/setup-timescaledb-hypertables.md","size":18850,"sha256":"dcfce6a31d21f427f852eb030e36a30ff5d38f5ad06176beb35f02f0485e2098","contentType":"text/markdown; charset=utf-8"}],"bundle_sha256":"8001c1bb616d6376373d0da85b91c32728028e02e2b4714782662e33fc555c7d","attachment_count":7,"text_attachments":0,"attachment_storage":"skillopedia-attachments-v1","binary_attachments":0,"excluded_attachments":[]},"cluster_size":1,"skill_md_path":"skills/postgres/SKILL.md","import_metadata":{"date":"2026-06-05","author":"@skillopedia","version":"v1","category":"data-analytics","category_label":"Data"},"exact_dupes_collapsed_into_this":0},"license":"Apache-2.0","version":"v1","category":"data-analytics","metadata":{"author":"tigerdata"},"import_tag":"clean-skills-v1","description":"Use this skill for any PostgreSQL database work — table design, indexing, data types, constraints, extensions (pgvector, PostGIS, TimescaleDB), search, and migrations.\n\n**Trigger when user asks to:**\n- Design or modify PostgreSQL tables, schemas, or data models\n- Choose data types, constraints, indexes, or partitioning strategies\n- Work with pgvector embeddings, semantic search, or RAG\n- Set up full-text search, hybrid search, or BM25 ranking\n- Use PostGIS for spatial/geographic data\n- Set up TimescaleDB hypertables for time-series data\n- Migrate tables to hypertables or evaluate migration candidates\n\n**Keywords:** PostgreSQL, Postgres, SQL, schema, table design, indexes, constraints, pgvector, PostGIS, TimescaleDB, hypertable, semantic search, hybrid search, BM25, time-series\n"}},"renderedAt":1782979265371}

PostgreSQL Expert Skills This skill provides comprehensive PostgreSQL expertise through specialized references. Load the appropriate reference based on the task. Available References Table Design - design-postgres-tables — Data types, constraints, indexes, JSONB patterns, partitioning, and PostgreSQL best practices. Use for any general table/schema design task. - design-postgis-tables — PostGIS spatial table design: geometry vs geography types, SRIDs, spatial indexing, and location-based query patterns. Use when the task involves geographic or spatial data. Search - pgvector-semantic-search —…