sql-cookbook — Skillopedia

Platform-Specific Syntax Guide PostgreSQL (Aurora, RDS, Supabase, Neon) Working with dates and times: Text operations: JSON and array handling: Tuning guidance: - Run to inspect actual execution plans - Add indexes on columns used in WHERE and JOIN conditions - Prefer over for correlated lookups - Consider partial indexes for frequently used filter predicates - Deploy connection pooling when handling concurrent workloads Snowflake Date and time functions: Text and pattern matching: Navigating semi-structured columns: Tuning guidance: - Rely on clustering keys rather than traditional indexes f…

-- POSIX regex\n\n-- Splitting and extracting\nLEFT(str, n), RIGHT(str, n)\nSPLIT_PART(str, delimiter, part_num)\nREGEXP_REPLACE(str, pattern, replacement)\n```\n\n**JSON and array handling:**\n```sql\n-- Accessing JSON fields\npayload->>'key' -- returns text\npayload->'nested'->'field' -- returns json object\npayload#>>'{a,b,c}' -- deep path as text\n\n-- Array utilities\nARRAY_AGG(col)\nANY(arr_col)\narr_col @> ARRAY['val']\n```\n\n**Tuning guidance:**\n- Run `EXPLAIN ANALYZE` to inspect actual execution plans\n- Add indexes on columns used in WHERE and JOIN conditions\n- Prefer `EXISTS` over `IN` for correlated lookups\n- Consider partial indexes for frequently used filter predicates\n- Deploy connection pooling when handling concurrent workloads\n\n### Snowflake\n\n**Date and time functions:**\n```sql\n-- Getting current moments\nCURRENT_DATE(), CURRENT_TIMESTAMP(), SYSDATE()\n\n-- Shifting dates\nDATEADD(day, 7, some_date)\nDATEDIFF(day, start_dt, end_dt)\n\n-- Truncation\nDATE_TRUNC('month', event_ts)\n\n-- Component extraction\nYEAR(event_ts), MONTH(event_ts), DAY(event_ts)\nDAYOFWEEK(event_ts)\n\n-- Formatting\nTO_CHAR(event_ts, 'YYYY-MM-DD')\n```\n\n**Text and pattern matching:**\n```sql\n-- Case-insensitive matching (default collation dependent)\ncol ILIKE '%term%'\nREGEXP_LIKE(col, 'pattern')\n\n-- Working with VARIANT / semi-structured data\ncol:key::string\nPARSE_JSON('{\"a\": 1}')\nGET_PATH(variant_col, 'path.to.field')\n\n-- Expanding nested arrays\nSELECT f.value FROM tbl, LATERAL FLATTEN(input => arr_col) f\n```\n\n**Navigating semi-structured columns:**\n```sql\n-- Dot-path access on VARIANT\npayload:customer:name::STRING\npayload:items[0]:price::NUMBER\n\n-- Exploding nested arrays into rows\nSELECT\n t.id,\n elem.value:name::STRING AS item_name,\n elem.value:qty::NUMBER AS item_qty\nFROM my_table t,\nLATERAL FLATTEN(input => t.payload:items) elem\n```\n\n**Tuning guidance:**\n- Rely on clustering keys rather than traditional indexes for large tables\n- Target clustering key columns in filters to maximize partition pruning\n- Size the virtual warehouse appropriately for query complexity\n- Reuse results via `RESULT_SCAN(LAST_QUERY_ID())` to skip re-execution\n- Use transient tables for ephemeral staging data\n\n### BigQuery\n\n**Date and time functions:**\n```sql\n-- Current moments\nCURRENT_DATE(), CURRENT_TIMESTAMP()\n\n-- Shifting dates forward and backward\nDATE_ADD(some_date, INTERVAL 7 DAY)\nDATE_SUB(some_date, INTERVAL 1 MONTH)\nDATE_DIFF(end_dt, start_dt, DAY)\nTIMESTAMP_DIFF(end_ts, start_ts, HOUR)\n\n-- Truncation\nDATE_TRUNC(event_ts, MONTH)\nTIMESTAMP_TRUNC(event_ts, HOUR)\n\n-- Component extraction\nEXTRACT(YEAR FROM event_ts)\nEXTRACT(DAYOFWEEK FROM event_ts) -- Sunday = 1\n\n-- Display formatting\nFORMAT_DATE('%Y-%m-%d', date_col)\nFORMAT_TIMESTAMP('%Y-%m-%d %H:%M:%S', ts_col)\n```\n\n**Text operations:**\n```sql\n-- No native ILIKE; lowercase first\nLOWER(col) LIKE '%term%'\nREGEXP_CONTAINS(col, r'pattern')\nREGEXP_EXTRACT(col, r'pattern')\n\n-- Splitting and reassembling\nSPLIT(str, delimiter) -- produces an ARRAY\nARRAY_TO_STRING(arr, delimiter)\n```\n\n**Arrays and structs:**\n```sql\n-- Array utilities\nARRAY_AGG(col)\nUNNEST(arr_col)\nARRAY_LENGTH(arr_col)\nval IN UNNEST(arr_col)\n\n-- Struct field access\nstruct_col.field_name\n```\n\n**Tuning guidance:**\n- Always apply filters on partition columns (typically date) to minimize bytes scanned\n- Add clustering on columns that appear frequently in WHERE clauses\n- Swap `COUNT(DISTINCT ...)` for `APPROX_COUNT_DISTINCT()` on high-cardinality data\n- Avoid `SELECT *` since billing scales with bytes read\n- Use `DECLARE` / `SET` for parameterized script logic\n- Run a dry-run estimate before executing expensive queries\n\n### Redshift\n\n**Date and time functions:**\n```sql\n-- Current moments\nCURRENT_DATE, GETDATE(), SYSDATE\n\n-- Shifting dates\nDATEADD(day, 7, some_date)\nDATEDIFF(day, start_dt, end_dt)\n\n-- Truncation\nDATE_TRUNC('month', event_ts)\n\n-- Component extraction\nEXTRACT(YEAR FROM event_ts)\nDATE_PART('dow', event_ts)\n```\n\n**Text operations:**\n```sql\n-- Case-insensitive matching\ncol ILIKE '%term%'\nREGEXP_INSTR(col, 'pattern') > 0\n\n-- Splitting and concatenation\nSPLIT_PART(str, delimiter, part_num)\nLISTAGG(col, ', ') WITHIN GROUP (ORDER BY col)\n```\n\n**Tuning guidance:**\n- Assign distribution keys (DISTKEY) so joined tables are co-located on the same nodes\n- Define sort keys (SORTKEY) on columns commonly used for filtering\n- Review query plans with `EXPLAIN`\n- Watch for DS_BCAST and DS_DIST in plans (signs of costly cross-node shuffles)\n- Run `ANALYZE` and `VACUUM` on a regular cadence\n- Use late-binding views for flexibility when schemas evolve\n\n### Databricks SQL\n\n**Date and time functions:**\n```sql\n-- Current moments\nCURRENT_DATE(), CURRENT_TIMESTAMP()\n\n-- Shifting dates\nDATE_ADD(some_date, 7)\nDATEDIFF(end_dt, start_dt)\nADD_MONTHS(some_date, 1)\n\n-- Truncation\nDATE_TRUNC('MONTH', event_ts)\nTRUNC(date_col, 'MM')\n\n-- Component extraction\nYEAR(event_ts), MONTH(event_ts)\nDAYOFWEEK(event_ts)\n```\n\n**Delta Lake capabilities:**\n```sql\n-- Query historical snapshots\nSELECT * FROM tbl TIMESTAMP AS OF '2024-01-15'\nSELECT * FROM tbl VERSION AS OF 42\n\n-- Inspect table history\nDESCRIBE HISTORY tbl\n\n-- Upsert with MERGE\nMERGE INTO target USING source\nON target.id = source.id\nWHEN MATCHED THEN UPDATE SET *\nWHEN NOT MATCHED THEN INSERT *\n```\n\n**Tuning guidance:**\n- Run `OPTIMIZE` with `ZORDER` on frequently queried columns\n- Enable the Photon engine for compute-heavy workloads\n- Apply `CACHE TABLE` on datasets that are read repeatedly\n- Partition by low-cardinality date columns for efficient pruning\n\n## Reusable Analytical Patterns\n\n### Window Function Recipes\n\n```sql\n-- Assign sequential position within groups\nROW_NUMBER() OVER (PARTITION BY user_id ORDER BY event_ts DESC)\nRANK() OVER (PARTITION BY category ORDER BY revenue DESC)\nDENSE_RANK() OVER (ORDER BY score DESC)\n\n-- Cumulative sums and rolling averages\nSUM(amount) OVER (ORDER BY dt ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_total\nAVG(amount) OVER (ORDER BY dt ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS rolling_7d_avg\n\n-- Accessing adjacent rows\nLAG(val, 1) OVER (PARTITION BY entity ORDER BY dt) AS prior_val\nLEAD(val, 1) OVER (PARTITION BY entity ORDER BY dt) AS next_val\n\n-- Boundary values within a partition\nFIRST_VALUE(status) OVER (PARTITION BY user_id ORDER BY event_ts ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)\nLAST_VALUE(status) OVER (PARTITION BY user_id ORDER BY event_ts ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)\n\n-- Proportional share\nrevenue / SUM(revenue) OVER () AS share_of_total\nrevenue / SUM(revenue) OVER (PARTITION BY category) AS share_within_category\n```\n\n### Structured Queries with CTEs\n\n```sql\nWITH\n-- Stage 1: Narrow down to the target population\neligible_users AS (\n SELECT user_id, signup_date, tier\n FROM users\n WHERE signup_date >= DATE '2024-01-01'\n AND account_status = 'active'\n),\n\n-- Stage 2: Derive per-user measures\nper_user AS (\n SELECT\n u.user_id,\n u.tier,\n COUNT(DISTINCT e.session_id) AS sessions,\n SUM(e.revenue) AS revenue\n FROM eligible_users u\n LEFT JOIN events e ON u.user_id = e.user_id\n GROUP BY u.user_id, u.tier\n),\n\n-- Stage 3: Roll up to tier-level summary\ntier_summary AS (\n SELECT\n tier,\n COUNT(*) AS users,\n AVG(sessions) AS avg_sessions,\n SUM(revenue) AS total_revenue\n FROM per_user\n GROUP BY tier\n)\n\nSELECT * FROM tier_summary ORDER BY total_revenue DESC;\n```\n\n### Cohort Retention Tracking\n\n```sql\nWITH signup_cohorts AS (\n SELECT\n user_id,\n DATE_TRUNC('month', first_seen) AS cohort\n FROM users\n),\nmonthly_activity AS (\n SELECT\n user_id,\n DATE_TRUNC('month', activity_date) AS active_month\n FROM user_events\n)\nSELECT\n s.cohort,\n COUNT(DISTINCT s.user_id) AS cohort_size,\n COUNT(DISTINCT CASE\n WHEN a.active_month = s.cohort THEN a.user_id\n END) AS m0,\n COUNT(DISTINCT CASE\n WHEN a.active_month = s.cohort + INTERVAL '1 month' THEN a.user_id\n END) AS m1,\n COUNT(DISTINCT CASE\n WHEN a.active_month = s.cohort + INTERVAL '3 months' THEN a.user_id\n END) AS m3\nFROM signup_cohorts s\nLEFT JOIN monthly_activity a ON s.user_id = a.user_id\nGROUP BY s.cohort\nORDER BY s.cohort;\n```\n\n### Conversion Funnel Measurement\n\n```sql\nWITH step_flags AS (\n SELECT\n user_id,\n MAX(CASE WHEN event_name = 'page_view' THEN 1 ELSE 0 END) AS saw_page,\n MAX(CASE WHEN event_name = 'signup_start' THEN 1 ELSE 0 END) AS began_signup,\n MAX(CASE WHEN event_name = 'signup_complete' THEN 1 ELSE 0 END) AS finished_signup,\n MAX(CASE WHEN event_name = 'first_purchase' THEN 1 ELSE 0 END) AS made_purchase\n FROM events\n WHERE event_date >= CURRENT_DATE - INTERVAL '30 days'\n GROUP BY user_id\n)\nSELECT\n COUNT(*) AS total_users,\n SUM(saw_page) AS viewers,\n SUM(began_signup) AS signup_starts,\n SUM(finished_signup) AS signup_completions,\n SUM(made_purchase) AS purchasers,\n ROUND(100.0 * SUM(began_signup) / NULLIF(SUM(saw_page), 0), 1) AS view_to_start_rate,\n ROUND(100.0 * SUM(finished_signup) / NULLIF(SUM(began_signup), 0), 1) AS start_to_finish_rate,\n ROUND(100.0 * SUM(made_purchase) / NULLIF(SUM(finished_signup), 0), 1) AS finish_to_purchase_rate\nFROM step_flags;\n```\n\n### Removing Duplicate Records\n\n```sql\n-- Retain only the latest version of each entity\nWITH ordered AS (\n SELECT\n *,\n ROW_NUMBER() OVER (\n PARTITION BY entity_id\n ORDER BY modified_at DESC\n ) AS seq\n FROM raw_table\n)\nSELECT * FROM ordered WHERE seq = 1;\n```\n\n## Troubleshooting Query Failures\n\nWhen a query produces an error, work through these checks:\n\n1. **Syntax issues**: Verify dialect compatibility (e.g., BigQuery lacks `ILIKE`; only BigQuery has `SAFE_DIVIDE`)\n2. **Missing columns**: Confirm column names against the actual schema; watch for case sensitivity with quoted identifiers in PostgreSQL\n3. **Type conflicts**: Add explicit casts when comparing mismatched types (`CAST(col AS DATE)` or `col::DATE`)\n4. **Divide-by-zero**: Wrap denominators with `NULLIF(denominator, 0)` or use platform-specific safe division\n5. **Column ambiguity**: Alias every table in JOINs and always prefix column references\n6. **GROUP BY violations**: Every selected column that is not inside an aggregate must appear in GROUP BY (BigQuery permits alias references as an exception)\n---","attachment_filenames":[],"attachments":[],"content_json":{"type":"doc","content":[{"type":"heading","attrs":{"level":2},"content":[{"text":"Platform-Specific Syntax Guide","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"PostgreSQL (Aurora, RDS, Supabase, Neon)","type":"text"}]},{"type":"paragraph","content":[{"text":"Working with dates and times:","type":"text","marks":[{"type":"strong"}]}]},{"type":"code_block","attrs":{"wrap":false,"language":"sql"},"content":[{"text":"-- Today and now\nCURRENT_DATE, CURRENT_TIMESTAMP, NOW()\n\n-- Adding and subtracting intervals\nsome_date + INTERVAL '7 days'\nsome_date - INTERVAL '1 month'\n\n-- Round down to a time boundary\nDATE_TRUNC('month', event_ts)\n\n-- Pull out individual components\nEXTRACT(YEAR FROM event_ts)\nEXTRACT(DOW FROM event_ts) -- Sunday = 0\n\n-- Render as formatted text\nTO_CHAR(event_ts, 'YYYY-MM-DD')","type":"text"}]},{"type":"paragraph","content":[{"text":"Text operations:","type":"text","marks":[{"type":"strong"}]}]},{"type":"code_block","attrs":{"wrap":false,"language":"sql"},"content":[{"text":"-- Joining strings together\nfirst_name || ' ' || last_name\nCONCAT(first_name, ' ', last_name)\n\n-- Flexible matching\ncol ILIKE '%term%' -- ignores case\ncol ~ '^regex

Important: agents should read /llm.txt, /llms.txt, or /.well-known/skills.json to discover the public Skillopedia API.

-- POSIX regex\n\n-- Splitting and extracting\nLEFT(str, n), RIGHT(str, n)\nSPLIT_PART(str, delimiter, part_num)\nREGEXP_REPLACE(str, pattern, replacement)","type":"text"}]},{"type":"paragraph","content":[{"text":"JSON and array handling:","type":"text","marks":[{"type":"strong"}]}]},{"type":"code_block","attrs":{"wrap":false,"language":"sql"},"content":[{"text":"-- Accessing JSON fields\npayload->>'key' -- returns text\npayload->'nested'->'field' -- returns json object\npayload#>>'{a,b,c}' -- deep path as text\n\n-- Array utilities\nARRAY_AGG(col)\nANY(arr_col)\narr_col @> ARRAY['val']","type":"text"}]},{"type":"paragraph","content":[{"text":"Tuning guidance:","type":"text","marks":[{"type":"strong"}]}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Run ","type":"text"},{"text":"EXPLAIN ANALYZE","type":"text","marks":[{"type":"code_inline"}]},{"text":" to inspect actual execution plans","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Add indexes on columns used in WHERE and JOIN conditions","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Prefer ","type":"text"},{"text":"EXISTS","type":"text","marks":[{"type":"code_inline"}]},{"text":" over ","type":"text"},{"text":"IN","type":"text","marks":[{"type":"code_inline"}]},{"text":" for correlated lookups","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Consider partial indexes for frequently used filter predicates","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Deploy connection pooling when handling concurrent workloads","type":"text"}]}]}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Snowflake","type":"text"}]},{"type":"paragraph","content":[{"text":"Date and time functions:","type":"text","marks":[{"type":"strong"}]}]},{"type":"code_block","attrs":{"wrap":false,"language":"sql"},"content":[{"text":"-- Getting current moments\nCURRENT_DATE(), CURRENT_TIMESTAMP(), SYSDATE()\n\n-- Shifting dates\nDATEADD(day, 7, some_date)\nDATEDIFF(day, start_dt, end_dt)\n\n-- Truncation\nDATE_TRUNC('month', event_ts)\n\n-- Component extraction\nYEAR(event_ts), MONTH(event_ts), DAY(event_ts)\nDAYOFWEEK(event_ts)\n\n-- Formatting\nTO_CHAR(event_ts, 'YYYY-MM-DD')","type":"text"}]},{"type":"paragraph","content":[{"text":"Text and pattern matching:","type":"text","marks":[{"type":"strong"}]}]},{"type":"code_block","attrs":{"wrap":false,"language":"sql"},"content":[{"text":"-- Case-insensitive matching (default collation dependent)\ncol ILIKE '%term%'\nREGEXP_LIKE(col, 'pattern')\n\n-- Working with VARIANT / semi-structured data\ncol:key::string\nPARSE_JSON('{\"a\": 1}')\nGET_PATH(variant_col, 'path.to.field')\n\n-- Expanding nested arrays\nSELECT f.value FROM tbl, LATERAL FLATTEN(input => arr_col) f","type":"text"}]},{"type":"paragraph","content":[{"text":"Navigating semi-structured columns:","type":"text","marks":[{"type":"strong"}]}]},{"type":"code_block","attrs":{"wrap":false,"language":"sql"},"content":[{"text":"-- Dot-path access on VARIANT\npayload:customer:name::STRING\npayload:items[0]:price::NUMBER\n\n-- Exploding nested arrays into rows\nSELECT\n t.id,\n elem.value:name::STRING AS item_name,\n elem.value:qty::NUMBER AS item_qty\nFROM my_table t,\nLATERAL FLATTEN(input => t.payload:items) elem","type":"text"}]},{"type":"paragraph","content":[{"text":"Tuning guidance:","type":"text","marks":[{"type":"strong"}]}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Rely on clustering keys rather than traditional indexes for large tables","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Target clustering key columns in filters to maximize partition pruning","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Size the virtual warehouse appropriately for query complexity","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Reuse results via ","type":"text"},{"text":"RESULT_SCAN(LAST_QUERY_ID())","type":"text","marks":[{"type":"code_inline"}]},{"text":" to skip re-execution","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Use transient tables for ephemeral staging data","type":"text"}]}]}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"BigQuery","type":"text"}]},{"type":"paragraph","content":[{"text":"Date and time functions:","type":"text","marks":[{"type":"strong"}]}]},{"type":"code_block","attrs":{"wrap":false,"language":"sql"},"content":[{"text":"-- Current moments\nCURRENT_DATE(), CURRENT_TIMESTAMP()\n\n-- Shifting dates forward and backward\nDATE_ADD(some_date, INTERVAL 7 DAY)\nDATE_SUB(some_date, INTERVAL 1 MONTH)\nDATE_DIFF(end_dt, start_dt, DAY)\nTIMESTAMP_DIFF(end_ts, start_ts, HOUR)\n\n-- Truncation\nDATE_TRUNC(event_ts, MONTH)\nTIMESTAMP_TRUNC(event_ts, HOUR)\n\n-- Component extraction\nEXTRACT(YEAR FROM event_ts)\nEXTRACT(DAYOFWEEK FROM event_ts) -- Sunday = 1\n\n-- Display formatting\nFORMAT_DATE('%Y-%m-%d', date_col)\nFORMAT_TIMESTAMP('%Y-%m-%d %H:%M:%S', ts_col)","type":"text"}]},{"type":"paragraph","content":[{"text":"Text operations:","type":"text","marks":[{"type":"strong"}]}]},{"type":"code_block","attrs":{"wrap":false,"language":"sql"},"content":[{"text":"-- No native ILIKE; lowercase first\nLOWER(col) LIKE '%term%'\nREGEXP_CONTAINS(col, r'pattern')\nREGEXP_EXTRACT(col, r'pattern')\n\n-- Splitting and reassembling\nSPLIT(str, delimiter) -- produces an ARRAY\nARRAY_TO_STRING(arr, delimiter)","type":"text"}]},{"type":"paragraph","content":[{"text":"Arrays and structs:","type":"text","marks":[{"type":"strong"}]}]},{"type":"code_block","attrs":{"wrap":false,"language":"sql"},"content":[{"text":"-- Array utilities\nARRAY_AGG(col)\nUNNEST(arr_col)\nARRAY_LENGTH(arr_col)\nval IN UNNEST(arr_col)\n\n-- Struct field access\nstruct_col.field_name","type":"text"}]},{"type":"paragraph","content":[{"text":"Tuning guidance:","type":"text","marks":[{"type":"strong"}]}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Always apply filters on partition columns (typically date) to minimize bytes scanned","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Add clustering on columns that appear frequently in WHERE clauses","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Swap ","type":"text"},{"text":"COUNT(DISTINCT ...)","type":"text","marks":[{"type":"code_inline"}]},{"text":" for ","type":"text"},{"text":"APPROX_COUNT_DISTINCT()","type":"text","marks":[{"type":"code_inline"}]},{"text":" on high-cardinality data","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Avoid ","type":"text"},{"text":"SELECT *","type":"text","marks":[{"type":"code_inline"}]},{"text":" since billing scales with bytes read","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Use ","type":"text"},{"text":"DECLARE","type":"text","marks":[{"type":"code_inline"}]},{"text":" / ","type":"text"},{"text":"SET","type":"text","marks":[{"type":"code_inline"}]},{"text":" for parameterized script logic","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Run a dry-run estimate before executing expensive queries","type":"text"}]}]}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Redshift","type":"text"}]},{"type":"paragraph","content":[{"text":"Date and time functions:","type":"text","marks":[{"type":"strong"}]}]},{"type":"code_block","attrs":{"wrap":false,"language":"sql"},"content":[{"text":"-- Current moments\nCURRENT_DATE, GETDATE(), SYSDATE\n\n-- Shifting dates\nDATEADD(day, 7, some_date)\nDATEDIFF(day, start_dt, end_dt)\n\n-- Truncation\nDATE_TRUNC('month', event_ts)\n\n-- Component extraction\nEXTRACT(YEAR FROM event_ts)\nDATE_PART('dow', event_ts)","type":"text"}]},{"type":"paragraph","content":[{"text":"Text operations:","type":"text","marks":[{"type":"strong"}]}]},{"type":"code_block","attrs":{"wrap":false,"language":"sql"},"content":[{"text":"-- Case-insensitive matching\ncol ILIKE '%term%'\nREGEXP_INSTR(col, 'pattern') > 0\n\n-- Splitting and concatenation\nSPLIT_PART(str, delimiter, part_num)\nLISTAGG(col, ', ') WITHIN GROUP (ORDER BY col)","type":"text"}]},{"type":"paragraph","content":[{"text":"Tuning guidance:","type":"text","marks":[{"type":"strong"}]}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Assign distribution keys (DISTKEY) so joined tables are co-located on the same nodes","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Define sort keys (SORTKEY) on columns commonly used for filtering","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Review query plans with ","type":"text"},{"text":"EXPLAIN","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Watch for DS_BCAST and DS_DIST in plans (signs of costly cross-node shuffles)","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Run ","type":"text"},{"text":"ANALYZE","type":"text","marks":[{"type":"code_inline"}]},{"text":" and ","type":"text"},{"text":"VACUUM","type":"text","marks":[{"type":"code_inline"}]},{"text":" on a regular cadence","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Use late-binding views for flexibility when schemas evolve","type":"text"}]}]}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Databricks SQL","type":"text"}]},{"type":"paragraph","content":[{"text":"Date and time functions:","type":"text","marks":[{"type":"strong"}]}]},{"type":"code_block","attrs":{"wrap":false,"language":"sql"},"content":[{"text":"-- Current moments\nCURRENT_DATE(), CURRENT_TIMESTAMP()\n\n-- Shifting dates\nDATE_ADD(some_date, 7)\nDATEDIFF(end_dt, start_dt)\nADD_MONTHS(some_date, 1)\n\n-- Truncation\nDATE_TRUNC('MONTH', event_ts)\nTRUNC(date_col, 'MM')\n\n-- Component extraction\nYEAR(event_ts), MONTH(event_ts)\nDAYOFWEEK(event_ts)","type":"text"}]},{"type":"paragraph","content":[{"text":"Delta Lake capabilities:","type":"text","marks":[{"type":"strong"}]}]},{"type":"code_block","attrs":{"wrap":false,"language":"sql"},"content":[{"text":"-- Query historical snapshots\nSELECT * FROM tbl TIMESTAMP AS OF '2024-01-15'\nSELECT * FROM tbl VERSION AS OF 42\n\n-- Inspect table history\nDESCRIBE HISTORY tbl\n\n-- Upsert with MERGE\nMERGE INTO target USING source\nON target.id = source.id\nWHEN MATCHED THEN UPDATE SET *\nWHEN NOT MATCHED THEN INSERT *","type":"text"}]},{"type":"paragraph","content":[{"text":"Tuning guidance:","type":"text","marks":[{"type":"strong"}]}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Run ","type":"text"},{"text":"OPTIMIZE","type":"text","marks":[{"type":"code_inline"}]},{"text":" with ","type":"text"},{"text":"ZORDER","type":"text","marks":[{"type":"code_inline"}]},{"text":" on frequently queried columns","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Enable the Photon engine for compute-heavy workloads","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Apply ","type":"text"},{"text":"CACHE TABLE","type":"text","marks":[{"type":"code_inline"}]},{"text":" on datasets that are read repeatedly","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Partition by low-cardinality date columns for efficient pruning","type":"text"}]}]}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Reusable Analytical Patterns","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Window Function Recipes","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"sql"},"content":[{"text":"-- Assign sequential position within groups\nROW_NUMBER() OVER (PARTITION BY user_id ORDER BY event_ts DESC)\nRANK() OVER (PARTITION BY category ORDER BY revenue DESC)\nDENSE_RANK() OVER (ORDER BY score DESC)\n\n-- Cumulative sums and rolling averages\nSUM(amount) OVER (ORDER BY dt ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_total\nAVG(amount) OVER (ORDER BY dt ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS rolling_7d_avg\n\n-- Accessing adjacent rows\nLAG(val, 1) OVER (PARTITION BY entity ORDER BY dt) AS prior_val\nLEAD(val, 1) OVER (PARTITION BY entity ORDER BY dt) AS next_val\n\n-- Boundary values within a partition\nFIRST_VALUE(status) OVER (PARTITION BY user_id ORDER BY event_ts ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)\nLAST_VALUE(status) OVER (PARTITION BY user_id ORDER BY event_ts ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)\n\n-- Proportional share\nrevenue / SUM(revenue) OVER () AS share_of_total\nrevenue / SUM(revenue) OVER (PARTITION BY category) AS share_within_category","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Structured Queries with CTEs","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"sql"},"content":[{"text":"WITH\n-- Stage 1: Narrow down to the target population\neligible_users AS (\n SELECT user_id, signup_date, tier\n FROM users\n WHERE signup_date >= DATE '2024-01-01'\n AND account_status = 'active'\n),\n\n-- Stage 2: Derive per-user measures\nper_user AS (\n SELECT\n u.user_id,\n u.tier,\n COUNT(DISTINCT e.session_id) AS sessions,\n SUM(e.revenue) AS revenue\n FROM eligible_users u\n LEFT JOIN events e ON u.user_id = e.user_id\n GROUP BY u.user_id, u.tier\n),\n\n-- Stage 3: Roll up to tier-level summary\ntier_summary AS (\n SELECT\n tier,\n COUNT(*) AS users,\n AVG(sessions) AS avg_sessions,\n SUM(revenue) AS total_revenue\n FROM per_user\n GROUP BY tier\n)\n\nSELECT * FROM tier_summary ORDER BY total_revenue DESC;","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Cohort Retention Tracking","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"sql"},"content":[{"text":"WITH signup_cohorts AS (\n SELECT\n user_id,\n DATE_TRUNC('month', first_seen) AS cohort\n FROM users\n),\nmonthly_activity AS (\n SELECT\n user_id,\n DATE_TRUNC('month', activity_date) AS active_month\n FROM user_events\n)\nSELECT\n s.cohort,\n COUNT(DISTINCT s.user_id) AS cohort_size,\n COUNT(DISTINCT CASE\n WHEN a.active_month = s.cohort THEN a.user_id\n END) AS m0,\n COUNT(DISTINCT CASE\n WHEN a.active_month = s.cohort + INTERVAL '1 month' THEN a.user_id\n END) AS m1,\n COUNT(DISTINCT CASE\n WHEN a.active_month = s.cohort + INTERVAL '3 months' THEN a.user_id\n END) AS m3\nFROM signup_cohorts s\nLEFT JOIN monthly_activity a ON s.user_id = a.user_id\nGROUP BY s.cohort\nORDER BY s.cohort;","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Conversion Funnel Measurement","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"sql"},"content":[{"text":"WITH step_flags AS (\n SELECT\n user_id,\n MAX(CASE WHEN event_name = 'page_view' THEN 1 ELSE 0 END) AS saw_page,\n MAX(CASE WHEN event_name = 'signup_start' THEN 1 ELSE 0 END) AS began_signup,\n MAX(CASE WHEN event_name = 'signup_complete' THEN 1 ELSE 0 END) AS finished_signup,\n MAX(CASE WHEN event_name = 'first_purchase' THEN 1 ELSE 0 END) AS made_purchase\n FROM events\n WHERE event_date >= CURRENT_DATE - INTERVAL '30 days'\n GROUP BY user_id\n)\nSELECT\n COUNT(*) AS total_users,\n SUM(saw_page) AS viewers,\n SUM(began_signup) AS signup_starts,\n SUM(finished_signup) AS signup_completions,\n SUM(made_purchase) AS purchasers,\n ROUND(100.0 * SUM(began_signup) / NULLIF(SUM(saw_page), 0), 1) AS view_to_start_rate,\n ROUND(100.0 * SUM(finished_signup) / NULLIF(SUM(began_signup), 0), 1) AS start_to_finish_rate,\n ROUND(100.0 * SUM(made_purchase) / NULLIF(SUM(finished_signup), 0), 1) AS finish_to_purchase_rate\nFROM step_flags;","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Removing Duplicate Records","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"sql"},"content":[{"text":"-- Retain only the latest version of each entity\nWITH ordered AS (\n SELECT\n *,\n ROW_NUMBER() OVER (\n PARTITION BY entity_id\n ORDER BY modified_at DESC\n ) AS seq\n FROM raw_table\n)\nSELECT * FROM ordered WHERE seq = 1;","type":"text"}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Troubleshooting Query Failures","type":"text"}]},{"type":"paragraph","content":[{"text":"When a query produces an error, work through these checks:","type":"text"}]},{"type":"ordered_list","attrs":{"order":1,"listStyle":"number"},"content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Syntax issues","type":"text","marks":[{"type":"strong"}]},{"text":": Verify dialect compatibility (e.g., BigQuery lacks ","type":"text"},{"text":"ILIKE","type":"text","marks":[{"type":"code_inline"}]},{"text":"; only BigQuery has ","type":"text"},{"text":"SAFE_DIVIDE","type":"text","marks":[{"type":"code_inline"}]},{"text":")","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Missing columns","type":"text","marks":[{"type":"strong"}]},{"text":": Confirm column names against the actual schema; watch for case sensitivity with quoted identifiers in PostgreSQL","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Type conflicts","type":"text","marks":[{"type":"strong"}]},{"text":": Add explicit casts when comparing mismatched types (","type":"text"},{"text":"CAST(col AS DATE)","type":"text","marks":[{"type":"code_inline"}]},{"text":" or ","type":"text"},{"text":"col::DATE","type":"text","marks":[{"type":"code_inline"}]},{"text":")","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Divide-by-zero","type":"text","marks":[{"type":"strong"}]},{"text":": Wrap denominators with ","type":"text"},{"text":"NULLIF(denominator, 0)","type":"text","marks":[{"type":"code_inline"}]},{"text":" or use platform-specific safe division","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Column ambiguity","type":"text","marks":[{"type":"strong"}]},{"text":": Alias every table in JOINs and always prefix column references","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"GROUP BY violations","type":"text","marks":[{"type":"strong"}]},{"text":": Every selected column that is not inside an aggregate must appear in GROUP BY (BigQuery permits alias references as an exception)","type":"text"}]}]}]},{"type":"hr","attrs":{"markup":"---"}}]},"metadata":{"date":"2026-06-05","name":"sql-cookbook","author":"@skillopedia","source":{"stars":63,"repo_name":"vm0-skills","origin_url":"https://github.com/vm0-ai/vm0-skills/blob/HEAD/sql-cookbook/SKILL.md","repo_owner":"vm0-ai","body_sha256":"5d3234017ae891d961710fba9dcc2adf2314ba629813967a371b8516be42482f","cluster_key":"7d27c6e48bf1fb5ad7b42874adf760c8cfd92c67f5d069904e05063d3e865865","clean_bundle":{"format":"clean-skill-bundle-v1","source":"vm0-ai/vm0-skills/sql-cookbook/SKILL.md","bundle_sha256":"4cd0797bb4f0a30e2039555bb7a7b21d46c5dfd7dd0486b70850522e09f1341b","attachment_count":0,"text_attachments":0,"binary_attachments":0},"cluster_size":1,"skill_md_path":"sql-cookbook/SKILL.md","import_metadata":{"date":"2026-06-05","author":"@skillopedia","version":"v1","category":"data-analytics","category_label":"Data"},"exact_dupes_collapsed_into_this":0},"version":"v1","category":"data-analytics","import_tag":"clean-skills-v1","description":"Craft and optimize SQL for any warehouse dialect — Snowflake, BigQuery, Databricks, Redshift, PostgreSQL. Covers window functions, CTEs, aggregations, joins, funnel queries, cohort retention, deduplication, dialect translation, slow query tuning, and cross-platform syntax differences."}},"renderedAt":1782979551409}

Important: agents should read /llm.txt, /llms.txt, or /.well-known/skills.json to discover the public Skillopedia API.