Alibaba Cloud EMR Serverless StarRocks Development & Operations Assistant Help users perform day-to-day table design, data ingestion, SQL writing & tuning, and health diagnostics on Alibaba Cloud EMR Serverless StarRocks. All cluster access goes through the bundled CLI (pymysql-based, uses the user's own account); no MySQL client required. Non-READ SQL is classified by sqlglot and requires confirmation before execution. Scope statement : This Skill focuses on using StarRocks — development, diagnostics, and day-to-day data operations. Cluster-internal data and schema operations (DDL, DML, mate…

\n\n-- LOCATE / INSTR: find substring position\nSELECT LOCATE('World', 'Hello World'); -- returns 7\n```\n\n---\n\n## Date Functions\n\n### Date Retrieval and Conversion\n\n| Function | Description | Example | Result |\n|------|------|------|------|\n| `NOW()` | Current datetime | `SELECT NOW()` | `2024-01-15 10:30:00` |\n| `CURDATE()` | Current date | `SELECT CURDATE()` | `2024-01-15` |\n| `DATE(dt)` | Extract the date part | `DATE('2024-01-15 10:30:00')` | `2024-01-15` |\n| `YEAR(dt)` | Extract year | `YEAR('2024-01-15')` | `2024` |\n| `MONTH(dt)` | Extract month | `MONTH('2024-01-15')` | `1` |\n| `DAY(dt)` | Extract day | `DAY('2024-01-15')` | `15` |\n| `HOUR(dt)` | Extract hour | `HOUR('2024-01-15 10:30:00')` | `10` |\n| `DAYOFWEEK(dt)` | Day of week (1=Sunday) | `DAYOFWEEK('2024-01-15')` | `2` (Monday) |\n\n### Date Arithmetic\n\n```sql\n-- DATE_ADD / DATE_SUB: add/subtract dates\nSELECT DATE_ADD('2024-01-15', INTERVAL 7 DAY); -- 2024-01-22\nSELECT DATE_SUB('2024-01-15', INTERVAL 1 MONTH); -- 2023-12-15\n\n-- DATEDIFF: difference in days\nSELECT DATEDIFF('2024-01-15', '2024-01-01'); -- 14\n\n-- TIMESTAMPDIFF: time difference (multiple units supported)\nSELECT TIMESTAMPDIFF(HOUR, '2024-01-15 00:00:00', '2024-01-15 10:30:00'); -- 10\n\n-- DATE_TRUNC: date truncation (commonly used for grouping)\nSELECT DATE_TRUNC('month', '2024-01-15'); -- 2024-01-01\nSELECT DATE_TRUNC('week', '2024-01-15'); -- 2024-01-15 (Monday)\nSELECT DATE_TRUNC('hour', '2024-01-15 10:35:00'); -- 2024-01-15 10:00:00\n```\n\n### Date Formatting\n\n```sql\n-- DATE_FORMAT: date to string\nSELECT DATE_FORMAT('2024-01-15', '%Y-%m-%d'); -- '2024-01-15'\nSELECT DATE_FORMAT(NOW(), '%Y-%m-%d %H:%i:%s');\n\n-- STR_TO_DATE: string to date\nSELECT STR_TO_DATE('15/01/2024', '%d/%m/%Y'); -- 2024-01-15\n\n-- Common format specifiers\n-- %Y 4-digit year %m 2-digit month %d 2-digit day\n-- %H 24-hour %i minute %s second\n-- %T equivalent to %H:%i:%s\n```\n\n### Time Zone Handling\n\n```sql\n-- CONVERT_TZ: time zone conversion\nSELECT CONVERT_TZ('2024-01-15 10:00:00', 'UTC', 'Asia/Shanghai');\n-- Result: 2024-01-15 18:00:00\n\n-- UNIX_TIMESTAMP / FROM_UNIXTIME\nSELECT UNIX_TIMESTAMP('2024-01-15 00:00:00'); -- timestamp\nSELECT FROM_UNIXTIME(1705248000); -- datetime\n```\n\n---\n\n## JSON Functions\n\n### JSON Parsing and Querying\n\n```sql\n-- Creating and querying JSON columns\n-- Use the JSON type when creating the table: json_data JSON\n\n-- Use the -> operator (returns JSON)\nSELECT json_data -> 'name' FROM table;\n\n-- Use the ->> operator (returns string)\nSELECT json_data ->> 'name' FROM table;\n\n-- JSON_QUERY: extract a JSON sub-object\nSELECT JSON_QUERY(json_data, '$.address') FROM users;\n\n-- JSON_VALUE: extract a scalar value\nSELECT JSON_VALUE(json_data, '$.age') FROM users;\n\n-- JSON_EXISTS: check whether a path exists\nSELECT * FROM users WHERE JSON_EXISTS(json_data, '$.phone');\n\n-- JSON_LENGTH: get array length\nSELECT JSON_LENGTH(json_data, '$.tags') FROM products;\n```\n\n### JSON Construction\n\n```sql\n-- PARSE_JSON: string to JSON\nSELECT PARSE_JSON('{\"name\": \"John\", \"age\": 30}');\n\n-- JSON_OBJECT: construct a JSON object\nSELECT JSON_OBJECT('name', name, 'age', age) FROM users;\n\n-- JSON_ARRAY: construct a JSON array\nSELECT JSON_ARRAY(1, 2, 3);\n```\n\n### JSON Expansion\n\n```sql\n-- json_each / json_each_text: expand a JSON object's key-value pairs\nSELECT jt.key, jt.value\nFROM products,\nLATERAL json_each(CAST(json_data AS JSON)) AS jt;\n\n-- Unnest a JSON array\nSELECT *\nFROM orders,\nLATERAL unnest(CAST(JSON_QUERY(json_data, '$.items') AS ARRAY\u003cJSON>)) AS t(item);\n```\n\n---\n\n## Array and MAP Functions\n\n### Array Functions\n\n```sql\n-- Array construction\nSELECT [1, 2, 3]; -- ARRAY literal\nSELECT ARRAY_GENERATE(1, 10); -- generates [1,2,...,10]\n\n-- Array access\nSELECT arr[1] FROM t; -- first element (1-based)\n\n-- Array operations\nSELECT ARRAY_LENGTH([1, 2, 3]); -- 3\nSELECT ARRAY_CONTAINS([1, 2, 3], 2); -- true\nSELECT ARRAY_APPEND([1, 2], 3); -- [1, 2, 3]\nSELECT ARRAY_REMOVE([1, 2, 3, 2], 2); -- [1, 3]\nSELECT ARRAY_DISTINCT([1, 2, 2, 3]); -- [1, 2, 3]\nSELECT ARRAY_SORT([3, 1, 2]); -- [1, 2, 3]\nSELECT ARRAY_SLICE([1, 2, 3, 4], 2, 3); -- [2, 3]\nSELECT ARRAY_CONCAT([1, 2], [3, 4]); -- [1, 2, 3, 4]\nSELECT ARRAY_JOIN([1, 2, 3], ','); -- '1,2,3'\n\n-- Unnest an array (pivot)\nSELECT order_id, tag\nFROM orders, unnest(tags) AS t(tag);\n\n-- Array aggregation\nSELECT user_id, ARRAY_AGG(product_id) FROM orders GROUP BY user_id;\nSELECT ARRAY_UNIQUE_AGG(tag) FROM products GROUP BY category;\n\n-- Lambda expressions (array transformations)\nSELECT ARRAY_MAP(x -> x * 2, [1, 2, 3]); -- [2, 4, 6]\nSELECT ARRAY_FILTER(x -> x > 2, [1, 2, 3, 4]); -- [3, 4]\nSELECT ARRAY_SUM(ARRAY_MAP(x -> x * x, [1,2,3])); -- 14\n```\n\n### MAP Functions\n\n```sql\n-- MAP construction\nSELECT MAP{'key1': 'value1', 'key2': 'value2'};\nSELECT MAP_FROM_ARRAYS(['k1','k2'], ['v1','v2']);\n\n-- MAP access\nSELECT map_col['key1'] FROM t;\n\n-- MAP operations\nSELECT MAP_KEYS(map_col) FROM t; -- get all keys\nSELECT MAP_VALUES(map_col) FROM t; -- get all values\nSELECT MAP_SIZE(map_col) FROM t; -- number of key-value pairs\nSELECT MAP_CONTAINS_KEY(map_col, 'k1'); -- whether a key exists\n\n-- MAP aggregation\nSELECT MAP_AGG(key_col, value_col) FROM t GROUP BY group_col;\n```\n\n---\n\n## Conditional Functions\n\n### CASE / IF / COALESCE\n\n```sql\n-- CASE WHEN: multi-branch condition\nSELECT\n order_id,\n CASE\n WHEN amount > 10000 THEN 'large'\n WHEN amount > 1000 THEN 'medium'\n ELSE 'small'\n END AS order_level\nFROM orders;\n\n-- IF: simple binary choice\nSELECT IF(status = 'paid', amount, 0) AS paid_amount FROM orders;\n\n-- IFNULL / COALESCE: NULL handling\nSELECT COALESCE(phone, email, 'unknown') AS contact FROM users;\nSELECT IFNULL(amount, 0) AS amount FROM orders;\n\n-- NULLIF: returns NULL when two values are equal\nSELECT NULLIF(a, b) FROM t; -- NULL when a = b, otherwise a\n-- Commonly used to avoid division by zero:\nSELECT total / NULLIF(count, 0) AS avg_value FROM stats;\n```\n\n---\n\n## Type Conversion\n\n### CAST Function\n\n```sql\n-- Basic type conversion\nCAST(value AS INT)\nCAST(value AS BIGINT)\nCAST(value AS DOUBLE)\nCAST(value AS VARCHAR)\nCAST(value AS DATE)\nCAST(value AS DATETIME)\nCAST(value AS DECIMAL(10, 2))\nCAST(value AS JSON)\nCAST(value AS ARRAY\u003cINT>)\n\n-- Common caveats\n-- 1. String to numeric: invalid characters return NULL (no error)\nSELECT CAST('abc' AS INT); -- NULL\n\n-- 2. Precision loss\nSELECT CAST(1.999 AS INT); -- 1 (truncated, not rounded)\n\n-- 3. Use DECIMAL for monetary computations to avoid floating-point precision issues\nSELECT CAST(0.1 + 0.2 AS DECIMAL(10,2)); -- 0.30 (exact)\nSELECT 0.1 + 0.2; -- 0.30000000000000004 (floating-point error)\n\n-- 4. Date string conversion\nSELECT CAST('2024-01-15' AS DATE); -- standard format converts directly\nSELECT CAST('20240115' AS DATE); -- compact format also supported\n```\n\n---\n\n## BITMAP and HLL Functions\n\n### BITMAP Functions (Exact Deduplication)\n\n```sql\n-- Define a BITMAP column at table creation\n-- CREATE TABLE: uv BITMAP BITMAP_UNION\n\n-- Convert to BITMAP during ingestion\n-- Stream Load: to_bitmap(user_id)\n-- INSERT: INSERT INTO ... SELECT ..., to_bitmap(user_id) ...\n\n-- Query\nSELECT BITMAP_UNION_COUNT(uv) FROM table WHERE dt = '2024-01-15';\n\n-- Intersection (users active on both days)\nSELECT BITMAP_INTERSECT_COUNT(uv) FROM table WHERE dt IN ('2024-01-14', '2024-01-15');\n\n-- Compute retention rate\nSELECT\n BITMAP_UNION_COUNT(CASE WHEN dt = '2024-01-14' THEN uv END) AS day1_users,\n BITMAP_COUNT(\n BITMAP_AND(\n BITMAP_UNION(CASE WHEN dt = '2024-01-14' THEN uv END),\n BITMAP_UNION(CASE WHEN dt = '2024-01-15' THEN uv END)\n )\n ) AS retained_users\nFROM user_bitmap_table;\n```\n\n### HLL Functions (Approximate Deduplication)\n\n```sql\n-- Define an HLL column at table creation\n-- CREATE TABLE: uv HLL HLL_UNION\n\n-- Convert to HLL during ingestion\n-- Stream Load: hll_hash(user_id)\n\n-- Query\nSELECT HLL_UNION_AGG(uv) FROM table WHERE dt = '2024-01-15';\n-- About 1% error\n\n-- HLL's advantage: distinct counts can be merged across different dimensions\nSELECT\n region,\n HLL_UNION_AGG(uv) AS uv\nFROM table\nGROUP BY region;\n```\n","content_type":"text/markdown; charset=utf-8","language":"markdown","size":12361,"content_sha256":"17f87077945d3a43e8b55c3e3a34be59234fca74e5428581fdac57c6520669b0"},{"filename":"references/sql/materialized-views.md","content":"# Materialized Views\n\n## Table of Contents\n\n1. [Sync MV vs Async MV](#sync-mv-vs-async-mv)\n2. [Sync Materialized Views (Rollup)](#sync-materialized-views-rollup)\n3. [Async Materialized Views](#async-materialized-views)\n4. [Query Rewrite](#query-rewrite)\n5. [MV Operations and Management](#mv-operations-and-management)\n6. [Common Pitfalls and Limitations](#common-pitfalls-and-limitations)\n\n---\n\n## Sync MV vs Async MV\n\n| Feature | Sync MV (Rollup) | Async MV |\n|------|---------------------|-------------|\n| Data consistency | Strong consistency (real-time sync with base table) | Eventual consistency (per refresh strategy) |\n| Supported table count | Single table only | Supports multi-table JOIN |\n| External table support | Not supported | Supported (Hive/Iceberg, etc.) |\n| Aggregate functions | SUM, MIN, MAX, COUNT, BITMAP_UNION, HLL_UNION | All aggregate functions |\n| JOIN | Not supported | Supported |\n| Window functions | Not supported | Supported |\n| Refresh method | Automatic real-time sync | Manual / scheduled / triggered |\n| Creation syntax | `CREATE MATERIALIZED VIEW ... AS SELECT` | `CREATE MATERIALIZED VIEW ... REFRESH ... AS SELECT` |\n| Use case | Single-table pre-aggregation (PV/UV/SUM) | Multi-table JOIN, complex query acceleration |\n| Version requirement | All versions | v2.4+ (v3.0+ recommended) |\n\n**Decision guide:**\n- Single-table simple aggregation + strong consistency required -> Sync MV\n- Multi-table JOIN / complex query / latency acceptable -> Async MV\n- External table data acceleration -> Async MV\n\n---\n\n## Sync Materialized Views (Rollup)\n\n### Creation Syntax\n\n```sql\n-- Create a sync MV for a Duplicate Key table\nCREATE MATERIALIZED VIEW mv_daily_sales\nAS\nSELECT\n dt,\n product_id,\n SUM(quantity) AS total_quantity,\n SUM(amount) AS total_amount,\n COUNT(*) AS order_count\nFROM order_detail\nGROUP BY dt, product_id;\n```\n\n### Supported Aggregate Functions\n\n| Aggregate function | Notes |\n|---------|------|\n| `SUM` | Sum |\n| `MIN` | Minimum |\n| `MAX` | Maximum |\n| `COUNT` | Count |\n| `BITMAP_UNION` | BITMAP aggregation (for exact deduplication) |\n| `HLL_UNION` | HLL aggregation (for approximate deduplication) |\n| `PERCENTILE_UNION` | Percentile aggregation |\n\n### Automatic Query Rewrite\n\nSync MV rewrite is fully automatic and transparent:\n\n```sql\n-- Original query -> optimizer automatically rewrites to read from the MV\nSELECT dt, SUM(amount) FROM order_detail\nWHERE dt = '2024-01-15'\nGROUP BY dt;\n-- Actual execution: reads from mv_daily_sales rather than scanning the base table\n\n-- Use EXPLAIN to verify whether the MV was hit\nEXPLAIN SELECT dt, SUM(amount) FROM order_detail\nWHERE dt = '2024-01-15' GROUP BY dt;\n-- Check whether the output contains rollup: mv_daily_sales\n```\n\n---\n\n## Async Materialized Views\n\n### Creation Syntax\n\n```sql\n-- Basic syntax\nCREATE MATERIALIZED VIEW mv_name\n[COMMENT 'description']\n[PARTITION BY partition_expr]\n[DISTRIBUTED BY HASH(column) [BUCKETS N]]\n[REFRESH refresh_strategy]\n[PROPERTIES (\"key\"=\"value\", ...)]\nAS\nSELECT ...;\n```\n\n### Refresh Strategies\n\n```sql\n-- 1. Manual refresh (default)\nCREATE MATERIALIZED VIEW mv_report\nREFRESH MANUAL\nAS SELECT ...;\n-- Trigger manually: REFRESH MATERIALIZED VIEW mv_report;\n\n-- 2. Scheduled refresh\nCREATE MATERIALIZED VIEW mv_report\nREFRESH ASYNC EVERY (INTERVAL 1 HOUR)\nAS SELECT ...;\n\n-- 3. Partition-based scheduled refresh (incremental refresh)\nCREATE MATERIALIZED VIEW mv_report\nPARTITION BY dt\nREFRESH ASYNC EVERY (INTERVAL 1 HOUR)\nAS\nSELECT dt, region, SUM(amount)\nFROM orders\nGROUP BY dt, region;\n-- Only refreshes partitions whose data has changed\n\n-- 4. Set refresh time range (only refresh the most recent N partitions)\nCREATE MATERIALIZED VIEW mv_report\nPARTITION BY dt\nREFRESH ASYNC EVERY (INTERVAL 1 HOUR)\nPROPERTIES (\n \"partition_refresh_number\" = \"3\" -- refresh at most 3 partitions per run\n)\nAS SELECT ...;\n```\n\n### Multi-Table JOIN MV\n\n```sql\n-- Wide-table materialized view: fact table JOINed with dimension tables\nCREATE MATERIALIZED VIEW mv_order_wide\nPARTITION BY dt\nDISTRIBUTED BY HASH(order_id) BUCKETS 16\nREFRESH ASYNC EVERY (INTERVAL 30 MINUTE)\nAS\nSELECT\n o.dt,\n o.order_id,\n o.user_id,\n o.amount,\n u.name AS user_name,\n u.region,\n p.product_name,\n p.category\nFROM orders o\nJOIN users u ON o.user_id = u.user_id\nJOIN products p ON o.product_id = p.product_id;\n```\n\n### External Table MV (Data Lake Acceleration)\n\n```sql\n-- Accelerate Hive table data\nCREATE MATERIALIZED VIEW mv_hive_summary\nPARTITION BY dt\nREFRESH ASYNC EVERY (INTERVAL 1 HOUR)\nAS\nSELECT dt, region, SUM(revenue) AS total_revenue\nFROM hive_catalog.db.sales_table\nGROUP BY dt, region;\n\n-- Accelerate Iceberg table data\nCREATE MATERIALIZED VIEW mv_iceberg_agg\nREFRESH ASYNC EVERY (INTERVAL 2 HOUR)\nAS\nSELECT event_date, COUNT(*) AS event_count\nFROM iceberg_catalog.db.events\nGROUP BY event_date;\n```\n\n---\n\n## Query Rewrite\n\n### Conditions for Rewrite\n\nFor an async MV, Query Rewrite requires:\n\n| Condition | Notes |\n|------|------|\n| MV is Active | In `SHOW MATERIALIZED VIEWS`, is_active = true |\n| Data freshness | Some staleness is allowed by default; configurable via `query_rewrite_consistency` |\n| Query is coverable | The query's tables, columns, aggregates, and JOINs can be covered by the MV |\n| Rewrite switch is on | `enable_materialized_view_rewrite = true` (default on) |\n\n### Rewrite Modes\n\n```sql\n-- 1. Default: rewrite only when MV data is fresh\nSET materialized_view_rewrite_mode = 'default';\n\n-- 2. Force rewrite: rewrite even if MV data may be stale (trade consistency for performance)\nSET materialized_view_rewrite_mode = 'force';\n\n-- 3. Disable rewrite\nSET materialized_view_rewrite_mode = 'disable';\n```\n\n### Verifying Rewrite\n\n```sql\n-- Use EXPLAIN to check\nEXPLAIN SELECT region, SUM(amount)\nFROM orders\nWHERE dt >= '2024-01-01'\nGROUP BY region;\n\n-- Check whether the output contains the MV name\n-- If rewrite takes effect, you will see something like:\n-- TABLE: mv_order_summary\n-- instead of the original table name\n```\n\n### Common Reasons for Rewrite Failure\n\n| Reason | Resolution |\n|------|---------|\n| Query contains columns not in the MV | Add the corresponding columns to the MV definition |\n| Aggregate function in the query not supported by the MV | Adjust the MV definition or the SQL |\n| Query's WHERE range exceeds the MV's partitions | Extend the MV's partition range |\n| MV is inactive | Check refresh status, fix issues, then `ALTER MATERIALIZED VIEW mv_name ACTIVE` |\n| Query uses non-deterministic functions | Avoid `now()`, `random()`, etc. in queries |\n\n---\n\n## MV Operations and Management\n\n### View MV Status\n\n```sql\n-- List all materialized views\nSHOW MATERIALIZED VIEWS;\n\n-- View details\nSHOW MATERIALIZED VIEWS LIKE 'mv_name';\n\n-- Show the CREATE statement\nSHOW CREATE MATERIALIZED VIEW mv_name;\n\n-- View refresh history\nSELECT * FROM information_schema.task_runs\nWHERE task_name LIKE '%mv_name%'\nORDER BY create_time DESC\nLIMIT 10;\n```\n\n### Manual Refresh\n\n```sql\n-- Full refresh\nREFRESH MATERIALIZED VIEW mv_name;\n\n-- Refresh specified partitions\nREFRESH MATERIALIZED VIEW mv_name\nPARTITION START('2024-01-01') END('2024-01-02');\n\n-- Force refresh (even if no data changed)\nREFRESH MATERIALIZED VIEW mv_name FORCE;\n\n-- Wait synchronously for refresh to complete\nREFRESH MATERIALIZED VIEW mv_name WITH SYNC MODE;\n```\n\n### Modifying an MV\n\n```sql\n-- Change the refresh strategy\nALTER MATERIALIZED VIEW mv_name\nREFRESH ASYNC EVERY (INTERVAL 2 HOUR);\n\n-- Change properties\nALTER MATERIALIZED VIEW mv_name\nSET (\"partition_refresh_number\" = \"5\");\n\n-- Pause/resume refresh\nALTER MATERIALIZED VIEW mv_name INACTIVE; -- pause\nALTER MATERIALIZED VIEW mv_name ACTIVE; -- resume\n\n-- Drop\nDROP MATERIALIZED VIEW mv_name;\n-- For a sync MV, use:\nDROP MATERIALIZED VIEW mv_name ON table_name;\n```\n\n---\n\n## Common Pitfalls and Limitations\n\n| Pitfall | Notes | Resolution |\n|------|------|---------|\n| MV definition contains `now()` | Refresh results are unstable | Use a business time column instead |\n| MV partition key inconsistent with the base table | Cannot refresh incrementally; only full refresh | The MV's partition key must map to the base table's partition column |\n| External-table MV cannot detect data changes | Always full refresh | Set a reasonable refresh interval |\n| Too many MVs cause refresh resource contention | Refresh tasks queue and time out | Limit MV count; stagger refresh times |\n| Refresh failure causes data staleness | Query Rewrite falls back to the base table | Monitor refresh status; set alerts |\n| Base table schema change for a multi-table JOIN MV | MV becomes inactive | Rebuild the MV |\n| MV query rewrite produces results different from the original query | Data staleness or incomplete MV coverage | Validate MV data; verify rewrite logic |\n\n### MV Design Recommendations\n\n1. **Design by query frequency:** Prioritize MVs for high-frequency queries; low-frequency queries aren't worth it.\n2. **Limit MV count:** Recommend no more than 5 MVs per base table.\n3. **Partition alignment:** Keep MV partition granularity equal to or coarser than the base table (day -> month).\n4. **Monitor refreshes:** Periodically verify refresh success and acceptable latency.\n5. **Reserve extra columns:** Include a few extra dimension columns in the MV to cover more query patterns.\n","content_type":"text/markdown; charset=utf-8","language":"markdown","size":9310,"content_sha256":"4901f7a2cb0effd1d8e62899a08e394cbee0f4b12ec1c704bd1cb558ace7e6e8"},{"filename":"references/sql/query-writing.md","content":"# Query Writing Best Practices\n\n## Table of Contents\n\n1. [SELECT Writing Conventions](#select-writing-conventions)\n2. [JOIN Writing Conventions](#join-writing-conventions)\n3. [Subquery vs JOIN vs CTE](#subquery-vs-join-vs-cte)\n4. [WHERE Condition Optimization](#where-condition-optimization)\n5. [GROUP BY and DISTINCT Optimization](#group-by-and-distinct-optimization)\n6. [ORDER BY and LIMIT Optimization](#order-by-and-limit-optimization)\n7. [UNION and UNION ALL](#union-and-union-all)\n8. [Other Query Optimization Techniques](#other-query-optimization-techniques)\n\n---\n\n## SELECT Writing Conventions\n\n### Select Only the Columns You Need\n\n```sql\n-- Anti-pattern: SELECT *\nSELECT * FROM orders WHERE order_date = '2024-01-15';\n\n-- Correct: select only the needed columns to reduce I/O and network transfer\nSELECT order_id, user_id, amount, status\nFROM orders\nWHERE order_date = '2024-01-15';\n```\n\n**Reason:** StarRocks is a columnar engine that only reads the data files of the specified columns. `SELECT *` reads all columns and wastes I/O.\n\n### Alias Conventions\n\n```sql\n-- Recommended: use meaningful aliases for both tables and columns\nSELECT\n o.order_id,\n o.amount AS order_amount,\n u.name AS user_name\nFROM orders AS o\nJOIN users AS u ON o.user_id = u.user_id;\n```\n\n---\n\n## JOIN Writing Conventions\n\n### JOIN Type Selection\n\n| JOIN type | Use case | Notes |\n|-----------|---------|------|\n| `INNER JOIN` | Both tables must match | Most common; returns only matching rows |\n| `LEFT JOIN` | Left table is primary; right may be null | Keeps all rows from the left table |\n| `RIGHT JOIN` | Right table is primary | Recommend rewriting as LEFT JOIN (readability) |\n| `FULL OUTER JOIN` | Keep both tables | Lower performance; use with caution |\n| `CROSS JOIN` | Cartesian product | Only for small datasets |\n| `LEFT SEMI JOIN` | Existence check (similar to EXISTS) | Doesn't return right-table columns; filter only |\n| `LEFT ANTI JOIN` | Non-existence check (similar to NOT EXISTS) | Doesn't return right-table columns; exclusion only |\n\n### JOIN Distribution Strategies\n\nStarRocks supports three JOIN distribution strategies. The optimizer selects automatically, but you can override via hints:\n\n**1. Broadcast Join**\n\n```sql\n-- Automatically chosen when the right-side row estimate ≤ broadcast_row_limit (default 15,000,000 rows)\n-- The small table is broadcast to all BE nodes\nSELECT o.*, p.product_name\nFROM orders o -- large table\nJOIN products p ON o.product_id = p.product_id; -- small table, auto-broadcast\n```\n\n**2. Shuffle Join**\n\n```sql\n-- Automatically chosen when two large tables are joined\n-- Both tables are redistributed to the same nodes by the JOIN key\nSELECT o.*, l.line_amount\nFROM orders o -- large table\nJOIN order_lines l ON o.order_id = l.order_id; -- large table\n```\n\n**3. Colocate Join**\n\n```sql\n-- Tables in the same Colocate Group already share the same distribution\n-- Local JOIN, zero network transfer, best performance\nSELECT o.*, l.line_amount\nFROM orders o\nJOIN order_lines l ON o.order_id = l.order_id;\n-- Prerequisite: both tables in the same Colocate Group with order_id as the bucket key\n```\n\n**Force a JOIN strategy with hints:**\n\n```sql\n-- Force Broadcast Join\nSELECT /*+ SET_VAR(broadcast_row_limit=15000000) */ o.*, p.product_name\nFROM orders o\nJOIN [broadcast] products p ON o.product_id = p.product_id;\n\n-- Force Shuffle Join\nSELECT o.*, l.line_amount\nFROM orders o\nJOIN [shuffle] order_lines l ON o.order_id = l.order_id;\n\n-- Force Bucket Join (local Shuffle)\nSELECT o.*, l.line_amount\nFROM orders o\nJOIN [bucket] order_lines l ON o.order_id = l.order_id;\n```\n\n### JOIN Condition Conventions\n\n```sql\n-- Correct: JOIN conditions go in ON; filter conditions go in WHERE\nSELECT o.order_id, u.name\nFROM orders o\nLEFT JOIN users u ON o.user_id = u.user_id\nWHERE o.order_date >= '2024-01-01'; -- filter condition\n\n-- Anti-pattern: LEFT JOIN filter condition in WHERE (effectively becomes INNER JOIN)\nSELECT o.order_id, u.name\nFROM orders o\nLEFT JOIN users u ON o.user_id = u.user_id\nWHERE u.status = 'active'; -- This filters out rows where u is NULL!\n\n-- Correct: put right-table filter conditions in the ON clause\nSELECT o.order_id, u.name\nFROM orders o\nLEFT JOIN users u ON o.user_id = u.user_id AND u.status = 'active';\n```\n\n### JOIN Order Optimization\n\n```sql\n-- Recommended: large table on the left, small tables on the right (helps Broadcast)\n-- The optimizer usually adjusts automatically, but explicit ordering is clearer\nSELECT /*+ SET_VAR(disable_join_reorder=false) */\n fact.*,\n dim1.name,\n dim2.category\nFROM fact_table fact -- largest fact table on the far left\nJOIN dim_table_1 dim1 ON ... -- smaller dimension table\nJOIN dim_table_2 dim2 ON ...; -- smaller dimension table\n```\n\n---\n\n## Subquery vs JOIN vs CTE\n\n### Decision Guide\n\n| Scenario | Recommended | Reason |\n|------|------|------|\n| Simple existence check | `EXISTS` / `IN` | Clear semantics |\n| Need to reference columns from the related table | `JOIN` | Subquery cannot return outer-table columns |\n| Same result set referenced multiple times | `CTE (WITH)` | Avoids redundant computation |\n| Logical layers > 2 | `CTE (WITH)` | Improves readability |\n| Simple filter list | `IN (value list)` | Concise and efficient |\n\n### CTE Writing Example\n\n```sql\n-- Use CTEs to organize a complex query in layers\nWITH\n-- Layer 1: filter active users\nactive_users AS (\n SELECT user_id, name, region\n FROM users\n WHERE last_login >= '2024-01-01'\n AND status = 'active'\n),\n-- Layer 2: compute per-user order statistics\nuser_orders AS (\n SELECT\n u.user_id,\n u.name,\n u.region,\n COUNT(*) AS order_count,\n SUM(o.amount) AS total_amount\n FROM active_users u\n JOIN orders o ON u.user_id = o.user_id\n WHERE o.order_date >= '2024-01-01'\n GROUP BY u.user_id, u.name, u.region\n)\n-- Final query\nSELECT\n region,\n COUNT(*) AS user_count,\n SUM(order_count) AS total_orders,\n AVG(total_amount) AS avg_amount\nFROM user_orders\nGROUP BY region\nORDER BY total_orders DESC;\n```\n\n### Rewriting a Subquery as a JOIN\n\n```sql\n-- Anti-pattern: correlated subquery (runs once per outer row)\nSELECT user_id, name,\n (SELECT COUNT(*) FROM orders o WHERE o.user_id = u.user_id) AS order_count\nFROM users u;\n\n-- Rewrite as a JOIN (single pass)\nSELECT u.user_id, u.name, COALESCE(oc.order_count, 0) AS order_count\nFROM users u\nLEFT JOIN (\n SELECT user_id, COUNT(*) AS order_count\n FROM orders\n GROUP BY user_id\n) oc ON u.user_id = oc.user_id;\n```\n\n### Rewriting NOT IN\n\n```sql\n-- Anti-pattern: NOT IN has a NULL pitfall\nSELECT * FROM orders\nWHERE user_id NOT IN (SELECT user_id FROM blacklist);\n-- If blacklist.user_id contains NULL, the whole result is empty!\n\n-- Correct rewrite 1: NOT EXISTS\nSELECT * FROM orders o\nWHERE NOT EXISTS (\n SELECT 1 FROM blacklist b WHERE b.user_id = o.user_id\n);\n\n-- Correct rewrite 2: LEFT ANTI JOIN\nSELECT o.* FROM orders o\nLEFT ANTI JOIN blacklist b ON o.user_id = b.user_id;\n```\n\n---\n\n## WHERE Condition Optimization\n\n### Partition Pruning\n\nPartition pruning is one of the most important optimizations, reducing scanned data by orders of magnitude.\n\n```sql\n-- Correct: reference the partition column directly; partition pruning works\nSELECT * FROM events\nWHERE event_date >= '2024-01-01' AND event_date \u003c '2024-02-01';\n\n-- Anti-pattern: wrapping the partition column in a function; partition pruning fails\nSELECT * FROM events\nWHERE DATE_FORMAT(event_date, '%Y-%m') = '2024-01';\n\n-- Anti-pattern: expression computation; partition pruning fails\nSELECT * FROM events\nWHERE event_date - INTERVAL 7 DAY >= '2024-01-01';\n\n-- Correct rewrite: move the computation to the constant side\nSELECT * FROM events\nWHERE event_date >= DATE_ADD('2024-01-01', INTERVAL 7 DAY);\n```\n\n### Bucket Pruning\n\n```sql\n-- If the table is HASH bucketed on user_id\n-- Equality conditions trigger bucket pruning, scanning only matching tablets\nSELECT * FROM orders WHERE user_id = 12345; -- bucket pruning works\n\n-- IN also supports bucket pruning\nSELECT * FROM orders WHERE user_id IN (12345, 67890); -- bucket pruning works\n\n-- Range conditions do not support bucket pruning\nSELECT * FROM orders WHERE user_id > 10000; -- bucket pruning does not work\n```\n\n### Predicate Writing Conventions\n\n```sql\n-- 1. Avoid implicit type conversion\nWHERE varchar_col = '123' -- correct\nWHERE varchar_col = 123 -- wrong: implicit conversion invalidates the index\n\n-- 2. Avoid applying functions to columns\nWHERE dt >= '2024-01-01' -- correct\nWHERE YEAR(dt) = 2024 -- wrong: cannot use the index\n\n-- 3. LIKE prefix matching can use the sort key\nWHERE name LIKE 'John%' -- can use the prefix index\nWHERE name LIKE '%John%' -- cannot use the prefix index\n\n-- 4. Multi-condition ordering: high selectivity first\nWHERE user_id = 12345 -- high selectivity (first)\n AND status = 'active' -- low selectivity\n AND create_date >= '2024-01-01' -- partition pruning\n```\n\n---\n\n## GROUP BY and DISTINCT Optimization\n\n### GROUP BY Optimization\n\n```sql\n-- 1. Keep the GROUP BY column list short\nSELECT region, COUNT(*) FROM orders GROUP BY region; -- good\n\n-- 2. For high-cardinality GROUP BY, consider two-stage aggregation (StarRocks enables automatically)\n-- Controllable via session variable\nSET new_planner_agg_stage = 2; -- force two-stage aggregation\n\n-- 3. Avoid GROUP BY followed by HAVING COUNT for Top-N; use a window function instead\n-- Anti-pattern\nSELECT region, SUM(amount)\nFROM orders\nGROUP BY region\nHAVING SUM(amount) > 100000;\n\n-- For Top-N, use a window function\nWITH ranked AS (\n SELECT region, SUM(amount) AS total,\n ROW_NUMBER() OVER(ORDER BY SUM(amount) DESC) AS rn\n FROM orders\n GROUP BY region\n)\nSELECT region, total FROM ranked WHERE rn \u003c= 10;\n```\n\n### DISTINCT Optimization\n\n```sql\n-- Low cardinality (\u003c 10 million): COUNT(DISTINCT) is fine\nSELECT COUNT(DISTINCT user_id) FROM events WHERE dt = '2024-01-15';\n\n-- Large cardinality (billions): use approximate deduplication\nSELECT APPROX_COUNT_DISTINCT(user_id) FROM events WHERE dt = '2024-01-15';\n-- About 2-3% error, several times faster\n\n-- Very high cardinality + exact precision: use a BITMAP column at table creation\n-- At creation: `uv BITMAP BITMAP_UNION`\n-- Query:\nSELECT BITMAP_UNION_COUNT(uv) FROM events_agg WHERE dt = '2024-01-15';\n\n-- Multi-column DISTINCT: rewrite as GROUP BY\n-- Anti-pattern\nSELECT DISTINCT region, category FROM products;\n-- Equivalent (usually better performance)\nSELECT region, category FROM products GROUP BY region, category;\n```\n\n---\n\n## ORDER BY and LIMIT Optimization\n\n### Basic Principles\n\n```sql\n-- 1. ORDER BY must have LIMIT (unless a global sort is truly required)\nSELECT * FROM orders ORDER BY create_time DESC LIMIT 100;\n\n-- 2. Top-N query optimization (StarRocks uses the TopN operator automatically)\nSELECT * FROM orders\nWHERE order_date = '2024-01-15'\nORDER BY amount DESC\nLIMIT 10;\n\n-- 3. Pagination\n-- Small offset: use OFFSET directly\nSELECT * FROM orders ORDER BY order_id LIMIT 20 OFFSET 100;\n\n-- Large offset: use a WHERE cursor for pagination (avoid huge OFFSET)\nSELECT * FROM orders\nWHERE order_id > 10000 -- order_id of the last row on the previous page\nORDER BY order_id\nLIMIT 20;\n```\n\n### Sort Key Optimization\n\n```sql\n-- If the table's sort key is (user_id, create_time)\n-- The following query can leverage the sort key for acceleration\nSELECT * FROM orders\nWHERE user_id = 12345\nORDER BY create_time DESC\nLIMIT 10;\n-- Sort key prefix match + ORDER BY in the same direction as the sort key -> avoids extra sorting\n```\n\n---\n\n## UNION and UNION ALL\n\n```sql\n-- UNION ALL: no deduplication, good performance (recommended)\nSELECT user_id, 'order' AS source FROM orders WHERE dt = '2024-01-15'\nUNION ALL\nSELECT user_id, 'return' AS source FROM returns WHERE dt = '2024-01-15';\n\n-- UNION: automatic deduplication, extra sorting overhead\n-- Use only when deduplication is truly needed\nSELECT user_id FROM orders WHERE dt = '2024-01-15'\nUNION\nSELECT user_id FROM returns WHERE dt = '2024-01-15';\n\n-- Combine multiple tables then aggregate: UNION ALL first, then outer GROUP BY\nSELECT source, COUNT(DISTINCT user_id)\nFROM (\n SELECT user_id, 'order' AS source FROM orders\n UNION ALL\n SELECT user_id, 'return' AS source FROM returns\n) t\nGROUP BY source;\n```\n\n---\n\n## Other Query Optimization Techniques\n\n### EXISTS vs IN\n\n```sql\n-- Small result set: use IN\nSELECT * FROM orders WHERE user_id IN (1, 2, 3, 4, 5);\n\n-- Large subquery result set: use EXISTS (StarRocks auto-optimizes to Semi Join)\nSELECT * FROM orders o\nWHERE EXISTS (SELECT 1 FROM active_users u WHERE u.user_id = o.user_id);\n```\n\n### CASE WHEN Optimization\n\n```sql\n-- Pivot (compute multi-dimensional statistics in one SQL)\nSELECT\n dt,\n COUNT(*) AS total_orders,\n COUNT(CASE WHEN status = 'paid' THEN 1 END) AS paid_count,\n COUNT(CASE WHEN status = 'cancelled' THEN 1 END) AS cancel_count,\n SUM(CASE WHEN status = 'paid' THEN amount ELSE 0 END) AS paid_amount\nFROM orders\nWHERE dt >= '2024-01-01'\nGROUP BY dt;\n```\n\n### LATERAL JOIN to Unnest Arrays\n\n```sql\n-- Unnest a JSON array or ARRAY column\nSELECT order_id, item\nFROM orders,\nLATERAL unnest(split(item_ids, ',')) AS t(item);\n\n-- Unnest an ARRAY-typed column\nSELECT order_id, tag\nFROM orders,\nLATERAL unnest(tags) AS t(tag);\n```\n","content_type":"text/markdown; charset=utf-8","language":"markdown","size":13368,"content_sha256":"bd6e739979666ab5378d741cf4d06a94c307eddaa405f9745bfdd67e876ad65f"},{"filename":"references/sql/sql-tuning-checklist.md","content":"# SQL Tuning Checklist\n\n## Table of Contents\n\n1. [Collecting Statistics](#collecting-statistics)\n2. [Reading Execution Plans](#reading-execution-plans)\n3. [Verifying Partition Pruning](#verifying-partition-pruning)\n4. [Tuning Join Strategies](#tuning-join-strategies)\n5. [Verifying Predicate Pushdown](#verifying-predicate-pushdown)\n6. [Low-Cardinality Dictionary Optimization](#low-cardinality-dictionary-optimization)\n7. [Tuning Pipeline DOP](#tuning-pipeline-dop)\n8. [Common Session Variables](#common-session-variables)\n9. [Tuning Checklist Summary](#tuning-checklist-summary)\n\n---\n\n## Collecting Statistics\n\nStatistics are the foundation that lets the CBO (cost-based optimizer) make correct decisions. Stale or missing statistics cause the optimizer to pick a wrong execution plan.\n\n### View Statistics Status\n\n```sql\n-- View table-level statistics\nSHOW TABLE STATS table_name;\n-- Watch Updated_time for staleness (older than 1 day needs an update)\n\n-- View column-level statistics\nSHOW COLUMN STATS table_name;\n-- Watch NDV (number of distinct values), MIN, MAX, NULL ratio\n```\n\n### Collect Statistics\n\n```sql\n-- Sampling collection (recommended, fast)\nANALYZE TABLE table_name;\n\n-- Full collection (more precise but slow)\nANALYZE FULL TABLE table_name;\n\n-- Collect for specific columns\nANALYZE TABLE table_name (col1, col2);\n\n-- Collect histograms (for columns with data skew)\nANALYZE TABLE table_name UPDATE HISTOGRAM ON col1, col2;\n\n-- Drop statistics\nDROP STATS table_name;\n```\n\n### Auto-Collection Configuration\n\n```sql\n-- View auto-collection status\nSHOW ANALYZE STATUS;\n\n-- Auto-collection is driven by FE background tasks. Related settings:\n-- enable_statistic_collect = true -- enable auto-collection\n-- statistic_auto_collect_interval = 300 -- check interval (seconds)\n-- statistic_collect_interval_sec = 14400 -- minimum collection interval (seconds)\n```\n\n### When to Collect Manually\n\n| Scenario | Action |\n|------|------|\n| After a large bulk load | `ANALYZE TABLE` |\n| Execution plan clearly unreasonable (e.g., Broadcast of a large table) | `ANALYZE TABLE` and retry |\n| After a table schema change | `ANALYZE TABLE` |\n| After first ingestion into a newly created table | `ANALYZE TABLE` |\n| Data distribution has changed significantly | `ANALYZE TABLE UPDATE HISTOGRAM ON ...` |\n\n---\n\n## Reading Execution Plans\n\n### Basic Workflow\n\n```sql\n-- 1. View the execution plan\nEXPLAIN VERBOSE \u003cyour_sql>;\n\n-- 2. Pay attention to the following operators (read bottom-up)\n-- OlapScanNode -> HashJoinNode -> AggregateNode -> SortNode -> ExchangeNode -> ResultSink\n```\n\n### Key Operators\n\n#### OlapScanNode (Local Table Scan)\n\n```\nSCAN OLAP table_name\n partitions=3/365 \u003c- partition pruning: 3 hit / 365 total\n rollup: mv_name \u003c- hit sync materialized view (rollup)\n tabletRatio=10/30 \u003c- bucket pruning\n cardinality=1000000 \u003c- CBO estimated rows (large deviation from actual = stale statistics)\n avgRowSize=128.0 \u003c- average row size\n predicates: dt >= '2024-01-01' \u003c- predicate pushed down to the storage layer\n```\n\n**Check points:**\n- `partitions` denominator much greater than the numerator -> partition pruning works\n- `partitions` numerator equals the denominator -> partition pruning not applied; check the WHERE conditions\n- `tabletRatio` numerator equals the denominator -> bucket pruning not applied (all tablets scanned)\n- `rollup` shows an MV name -> materialized view hit\n- `cardinality` is the **CBO's row estimate**. Compare it **directly** to the user-stated total table size to get the staleness ratio; a ratio > 10× means stats are stale.\n\n**How to compute the cardinality / stale-stats ratio (read this carefully):**\n\nUse **`cardinality` vs the user-provided total table row count**, as a direct ratio. Example: prompt says \"actual table size ~500M\", `cardinality=5000000` → `500M / 5M = 100×` → stats are stale by ~100×, run `ANALYZE TABLE`.\n\n**Do NOT** try to estimate \"how many rows the predicate should filter to\" and compare against that guess. Two reasons:\n1. You don't have runtime row counts — only the user's stated totals and the optimizer's estimate.\n2. Guessing predicate selectivity from a `WHERE` clause (e.g. assuming `create_time > '2025-04-01'` matches \"about one month of data\") introduces order-of-magnitude errors and is unrelated to the diagnostic task. The job is to flag stats staleness, not to model selectivity.\n\n**Independent diagnostic signals — must be reported separately:**\n`partitions`/`tabletRatio` (plan-level pruning) and `cardinality` (statistics freshness) are **orthogonal**. When both look bad in the same plan (e.g. `partitions=365/365` AND `cardinality` diverges from the total table size by 10×–100×), you must list them as **two independent findings**, each with its own root cause and fix. Do **not** use cardinality deviation to \"explain\" pruning failure, and do not let pruning failure absorb the stale-stats finding — they need different remediations (rewrite predicate / fix type vs. `ANALYZE TABLE`). Always quantify the cardinality gap explicitly using the direct comparison (e.g. \"estimate 5M vs total 500M ≈ 100×\").\n\n#### HashJoinNode\n\n```\nHASH JOIN\n join op: INNER JOIN\n distribution type: BROADCAST \u003c- distribution strategy\n equal join conjunct: a.id = b.id\n cardinality=5000000\n```\n\n**Check points:**\n- BROADCAST on a large table -> verify the right-hand table is actually small (check cardinality)\n- Should be COLOCATE but shows SHUFFLE -> check Colocate Group configuration\n- BROADCAST between two large tables -> statistics may be stale; retry after ANALYZE\n\n#### ExchangeNode\n\n```\nEXCHANGE\n distribution type: SHUFFLE \u003c- data redistribution type\n partition exprs: order_id\n```\n\n**Check points:**\n- Many SHUFFLEs -> check if Colocate Join can eliminate them\n- UNPARTITIONED -> data is funneled to a single node; may become a bottleneck\n\n---\n\n## Verifying Partition Pruning\n\n### How to Check\n\n```sql\n-- In EXPLAIN output, check the partitions field of OlapScanNode\nEXPLAIN VERBOSE SELECT * FROM events WHERE dt = '2024-01-15';\n\n-- Correct: partitions=1/365 (scans only 1 partition)\n-- Abnormal: partitions=365/365 (scans all partitions)\n```\n\n### Common Causes of Pruning Failure\n\n| Cause | Example | Fix |\n|------|------|---------|\n| Function wraps the partition column | `WHERE YEAR(dt) = 2024` | `WHERE dt >= '2024-01-01' AND dt \u003c '2025-01-01'` |\n| Expression computation | `WHERE dt + 1 = '2024-01-16'` | `WHERE dt = '2024-01-15'` |\n| Implicit type conversion | Partition column is DATE, condition uses a string | Ensure types match |\n| OR condition spans partitions | `WHERE dt = '2024-01-15' OR user_id = 1` | Split into UNION ALL |\n| Condition uses a non-partition column | `WHERE user_id = 12345` (partition column is dt) | Add a condition on the partition column |\n\n---\n\n## Tuning Join Strategies\n\n### Judging Whether a Join Strategy Is Reasonable\n\n```sql\nEXPLAIN VERBOSE \u003cjoin_query>;\n-- Inspect the distribution type of HashJoinNode\n```\n\n### Tuning Methods\n\n| Current strategy | Issue | Tuning method |\n|---------|------|---------|\n| BROADCAST but right-hand table is large | Broadcasting a large table causes OOM | 1. `ANALYZE TABLE` to update statistics\u003cbr>2. `SET broadcast_row_limit = N` to limit broadcast rows |\n| SHUFFLE but COLOCATE is possible | Unnecessary network transfer | Ensure both tables are in the same Colocate Group |\n| BROADCAST but want to force SHUFFLE | Specific scenario requires SHUFFLE | `JOIN [shuffle] table ON ...` |\n| COLOCATE not actually applied | Colocate Group configuration mismatch | Verify bucket count, bucket key types, and replica count match |\n\n### Colocate Join Prerequisites\n\n```sql\n-- 1. Both tables must be in the same Colocate Group\n-- 2. Bucket key columns must match in count, type, and order\n-- 3. Bucket count (BUCKETS) must be identical\n-- 4. Replica count must match\n-- 5. JOIN condition must include all bucket key columns\n\n-- Check Colocate status\nSHOW PROC '/colocation_group';\n```\n\n---\n\n## Verifying Predicate Pushdown\n\n### How to Check\n\n```sql\nEXPLAIN VERBOSE \u003cquery>;\n-- Inspect the predicates field of OlapScanNode\n-- Conditions appearing in predicates = pushed down to the storage layer\n```\n\n### Common Causes of Pushdown Failure\n\n| Cause | Example | Fix |\n|------|------|---------|\n| Cross-table predicate | `WHERE a.x + b.y > 10` | Split into single-table conditions |\n| WHERE condition on outer-table side of an OUTER JOIN | LEFT JOIN with a WHERE on the right table | Move into the ON clause |\n| Non-aggregate condition in HAVING | `HAVING status = 'A'` | Move into WHERE |\n\n---\n\n## Low-Cardinality Dictionary Optimization\n\nStarRocks automatically enables dictionary encoding optimization for low-cardinality string columns (e.g., status, region, gender), using integer IDs in place of string operations across Scan/Shuffle/Join/Aggregate stages.\n\n```sql\n-- Enabled automatically; usually no manual configuration needed\n-- To verify: check the DictDecode-related metrics\n-- in the Query Profile\n\n-- If you suspect low-cardinality optimization isn't applied:\nSET global_dict_columns = 'col1, col2'; -- specify manually\n\n-- Disable (for troubleshooting)\nSET enable_low_cardinality_optimize = false;\n```\n\n---\n\n## Tuning Pipeline DOP\n\nDOP (Degree of Parallelism) controls a query's parallelism.\n\n```sql\n-- View the current DOP\nSHOW VARIABLES LIKE 'pipeline_dop';\n-- Default 0 = automatic (CPU cores / 2)\n\n-- Set manually\nSET pipeline_dop = 8; -- fixed parallelism\n\n-- Temporarily adjust for a single query\nSELECT /*+ SET_VAR(pipeline_dop=4) */ ...\n```\n\n### DOP Tuning Recommendations\n\n| Scenario | Suggested DOP |\n|------|---------|\n| Default | 0 (auto) |\n| High-concurrency small queries (QPS > 100) | 1-2 (lower per-query parallelism, improve concurrency) |\n| Low-concurrency large queries | CPU core count (fully utilize resources) |\n| Mixed workload | Combine with Resource Group isolation |\n\n---\n\n## Common Session Variables\n\n### Query Behavior\n\n| Variable | Default | Description |\n|------|--------|------|\n| `query_timeout` | 300 | Query timeout (seconds) |\n| `query_mem_limit` | 0 | Per-query memory limit (0 = unlimited) |\n| `pipeline_dop` | 0 | Parallelism (0 = auto) |\n| `parallel_fragment_exec_instance_num` | 1 | Parallel instance count per BE |\n\n### Optimizer Control\n\n| Variable | Default | Description |\n|------|--------|------|\n| `broadcast_row_limit` | 10000000 | Broadcast Join row limit |\n| `new_planner_agg_stage` | 0 | Aggregation stages (0 = auto, 1 = single-stage, 2 = two-stage) |\n| `enable_materialized_view_rewrite` | true | Whether to enable MV rewrite |\n| `cbo_max_reorder_node_use_exhaustive` | 4 | Use exhaustive ordering for Joins with fewer than this count |\n| `disable_join_reorder` | false | Disable Join reorder |\n\n### Resource Control\n\n| Variable | Default | Description |\n|------|--------|------|\n| `resource_group` | '' | Specify the resource group |\n| `enable_spill` | false | Allow intermediate results to spill to disk |\n| `spill_mem_table_size` | 100MB | Memory buffer size before spilling |\n\n### Viewing and Setting\n\n```sql\n-- View all variables\nSHOW VARIABLES;\n\n-- View a specific variable\nSHOW VARIABLES LIKE '%timeout%';\n\n-- Set a session-level variable (current connection only)\nSET query_timeout = 600;\n\n-- Set a global variable (affects all new connections)\nSET GLOBAL query_timeout = 600;\n```\n\n---\n\n## Tuning Checklist Summary\n\nRun the following checks in order of priority:\n\n| # | Check | Command | Expected outcome |\n|---|--------|------|---------|\n| 1 | Are statistics fresh | `SHOW TABLE STATS t` | Updated_time \u003c 24h |\n| 2 | Is partition pruning effective | `EXPLAIN`, view partitions | numerator \u003c\u003c denominator |\n| 3 | Is bucket pruning effective | `EXPLAIN`, view tabletRatio | numerator \u003c\u003c denominator |\n| 4 | Is the Join strategy reasonable | `EXPLAIN`, view distribution type | Large table not BROADCAST |\n| 5 | Are predicates pushed down | `EXPLAIN`, view predicates | Conditions appear in ScanNode |\n| 6 | Is the MV hit | `EXPLAIN`, view rollup / table | MV name shown |\n| 7 | Are estimated rows accurate | `EXPLAIN`, view cardinality | Deviation from actual \u003c 10x |\n| 8 | Is DOP reasonable | Profile or SHOW VARIABLES | Matches workload |\n| 9 | Is memory sufficient | query_mem in Profile | Not near mem_limit |\n| 10 | Any spill | spill metrics in Profile | No spill or acceptable |\n\n### Three-Step Tuning Method\n\n```\n1. EXPLAIN -> identify issues (pruning failed, wrong strategy, row-count deviation)\n2. ANALYZE -> refresh statistics so the optimizer can make correct choices\n3. SET_VAR / Hint -> fine-tune optimizer behavior (last resort)\n```\n","content_type":"text/markdown; charset=utf-8","language":"markdown","size":12667,"content_sha256":"e516738af63bb7e2b233ed3db64ddaa6123a1ae975c6c4237850e40c66a5c4cb"},{"filename":"references/sql/window-functions.md","content":"# Window and Analytic Functions\n\n## Table of Contents\n\n1. [Window Function Overview](#window-function-overview)\n2. [Ranking Functions](#ranking-functions)\n3. [Offset Functions](#offset-functions)\n4. [Aggregate Window Functions](#aggregate-window-functions)\n5. [Window Frame Definition](#window-frame-definition)\n6. [Typical Business Scenarios](#typical-business-scenarios)\n7. [Performance Caveats](#performance-caveats)\n\n---\n\n## Window Function Overview\n\nWindow functions compute over a result set without changing the row count. Basic syntax:\n\n```sql\nfunction_name(args) OVER (\n [PARTITION BY partition_columns]\n [ORDER BY sort_columns [ASC|DESC]]\n [window_frame]\n)\n```\n\n| Component | Purpose | Required |\n|------|------|---------|\n| `PARTITION BY` | Group rows; each group computed independently | Optional (omitted: the entire result set is one partition) |\n| `ORDER BY` | Order within the group | Required for ranking/offset functions; optional for aggregates |\n| `window_frame` | Defines the rows used in computation | Optional (has a default) |\n\n---\n\n## Ranking Functions\n\n### ROW_NUMBER()\n\nGenerates a sequential, unique number for each row; does not handle ties.\n\n```sql\n-- Salary ranking within each department (ties get different numbers)\nSELECT\n department,\n employee_name,\n salary,\n ROW_NUMBER() OVER(PARTITION BY department ORDER BY salary DESC) AS rn\nFROM employees;\n\n-- Example result:\n-- Sales | Zhang San | 15000 | 1\n-- Sales | Li Si | 15000 | 2 \u003c- Same salary, different number\n-- Sales | Wang Wu | 12000 | 3\n```\n\n### RANK()\n\nTies receive the same rank; the next rank skips ahead.\n\n```sql\nSELECT\n department,\n employee_name,\n salary,\n RANK() OVER(PARTITION BY department ORDER BY salary DESC) AS rnk\nFROM employees;\n\n-- Example result:\n-- Sales | Zhang San | 15000 | 1\n-- Sales | Li Si | 15000 | 1 \u003c- Tied at 1\n-- Sales | Wang Wu | 12000 | 3 \u003c- Skips 2\n```\n\n### DENSE_RANK()\n\nTies receive the same rank; the next rank does not skip.\n\n```sql\nSELECT\n department,\n employee_name,\n salary,\n DENSE_RANK() OVER(PARTITION BY department ORDER BY salary DESC) AS dense_rnk\nFROM employees;\n\n-- Example result:\n-- Sales | Zhang San | 15000 | 1\n-- Sales | Li Si | 15000 | 1 \u003c- Tied at 1\n-- Sales | Wang Wu | 12000 | 2 \u003c- Continues at 2 (no skip)\n```\n\n### Ranking Function Comparison\n\n| Function | Tie handling | Continuity | Typical scenario |\n|------|---------|--------|---------|\n| `ROW_NUMBER()` | No tie handling | Continuous | Pagination, deduplication (take first per group) |\n| `RANK()` | Ties share rank | Non-continuous | Competition rankings |\n| `DENSE_RANK()` | Ties share rank | Continuous | Top-N categories (top 3 may include more than 3 people) |\n\n---\n\n## Offset Functions\n\n### LAG() / LEAD()\n\nAccess data N rows before or after the current row.\n\n```sql\n-- LAG: access the previous row (period-over-period calculation)\nSELECT\n dt,\n revenue,\n LAG(revenue, 1) OVER(ORDER BY dt) AS prev_day_revenue, -- previous day\n LAG(revenue, 7) OVER(ORDER BY dt) AS prev_week_revenue, -- same day last week\n LAG(revenue, 1, 0) OVER(ORDER BY dt) AS prev_day_or_zero -- third argument: default value\nFROM daily_stats;\n\n-- LEAD: access the next row\nSELECT\n dt,\n revenue,\n LEAD(revenue, 1) OVER(ORDER BY dt) AS next_day_revenue\nFROM daily_stats;\n```\n\n### FIRST_VALUE() / LAST_VALUE()\n\nGet the value of the first or last row in the window.\n\n```sql\n-- Each user's first and most recent order amount\nSELECT\n user_id,\n order_date,\n amount,\n FIRST_VALUE(amount) OVER(\n PARTITION BY user_id ORDER BY order_date\n ) AS first_order_amount,\n LAST_VALUE(amount) OVER(\n PARTITION BY user_id ORDER BY order_date\n ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING\n ) AS last_order_amount\nFROM orders;\n```\n\n> **Note:** The default frame for `LAST_VALUE` is `ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`. You must explicitly specify `UNBOUNDED FOLLOWING` to retrieve the true last row.\n\n### NTH_VALUE()\n\nGet the value of the Nth row in the window.\n\n```sql\n-- Find the second-highest-paid employee in each department\nSELECT\n department,\n employee_name,\n salary,\n NTH_VALUE(salary, 2) OVER(\n PARTITION BY department ORDER BY salary DESC\n ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING\n ) AS second_highest\nFROM employees;\n```\n\n---\n\n## Aggregate Window Functions\n\nStandard aggregate functions (SUM, AVG, COUNT, MIN, MAX) can all be used as window functions.\n\n```sql\nSELECT\n dt,\n revenue,\n -- Cumulative sum\n SUM(revenue) OVER(ORDER BY dt) AS cumulative_revenue,\n -- Rolling average (last 7 days)\n AVG(revenue) OVER(\n ORDER BY dt ROWS BETWEEN 6 PRECEDING AND CURRENT ROW\n ) AS rolling_7d_avg,\n -- Rolling maximum\n MAX(revenue) OVER(\n ORDER BY dt ROWS BETWEEN 6 PRECEDING AND CURRENT ROW\n ) AS rolling_7d_max,\n -- Group total (for computing share)\n SUM(revenue) OVER(PARTITION BY region) AS region_total,\n -- Share\n ROUND(revenue * 100.0 / SUM(revenue) OVER(PARTITION BY region), 2) AS pct_of_region\nFROM daily_stats;\n```\n\n---\n\n## Window Frame Definition\n\n### Syntax\n\n```sql\n{ROWS | RANGE} BETWEEN frame_start AND frame_end\n\n-- frame_start / frame_end can be:\n-- UNBOUNDED PRECEDING -- the first row of the partition\n-- N PRECEDING -- N rows before the current row\n-- CURRENT ROW -- the current row\n-- N FOLLOWING -- N rows after the current row\n-- UNBOUNDED FOLLOWING -- the last row of the partition\n```\n\n### ROWS vs RANGE\n\n| Type | Meaning | Use case |\n|------|------|---------|\n| `ROWS` | Physical row count | Rolling window (last 7 rows) |\n| `RANGE` | Value range | Value-range window (last 7 days, including all rows for the same day) |\n\n```sql\n-- ROWS: exact physical row count\nSUM(revenue) OVER(ORDER BY dt ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)\n-- Meaning: previous 6 rows + current row = 7 rows\n\n-- RANGE: by value range (rows with the same dt form a group)\nSUM(revenue) OVER(ORDER BY dt RANGE BETWEEN INTERVAL 6 DAY PRECEDING AND CURRENT ROW)\n-- Meaning: all rows where dt is in [current dt - 6 days, current dt]\n```\n\n### Default Frame\n\n| ORDER BY present? | Default frame |\n|--------------|-----------|\n| No ORDER BY | `ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING` (entire partition) |\n| With ORDER BY | `RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW` (from partition start to current row) |\n\n---\n\n## Typical Business Scenarios\n\n### Scenario 1: Top-N Query\n\n```sql\n-- Top 3 best-selling products in each category\nWITH ranked AS (\n SELECT\n category,\n product_name,\n sales,\n ROW_NUMBER() OVER(PARTITION BY category ORDER BY sales DESC) AS rn\n FROM product_sales\n)\nSELECT category, product_name, sales\nFROM ranked\nWHERE rn \u003c= 3;\n```\n\n### Scenario 2: Period-Over-Period Calculation\n\n```sql\n-- Day-over-day and week-over-week\nSELECT\n dt,\n revenue,\n -- Day-over-day\n ROUND(\n (revenue - LAG(revenue, 1) OVER(ORDER BY dt))\n * 100.0 / LAG(revenue, 1) OVER(ORDER BY dt),\n 2\n ) AS day_over_day_pct,\n -- Week-over-week\n ROUND(\n (revenue - LAG(revenue, 7) OVER(ORDER BY dt))\n * 100.0 / LAG(revenue, 7) OVER(ORDER BY dt),\n 2\n ) AS week_over_week_pct\nFROM daily_revenue\nWHERE dt BETWEEN '2024-01-01' AND '2024-01-31';\n```\n\n### Scenario 3: Cumulative Computation\n\n```sql\n-- Month-to-date cumulative sales and completion progress\nSELECT\n dt,\n revenue,\n SUM(revenue) OVER(\n PARTITION BY DATE_TRUNC('month', dt)\n ORDER BY dt\n ) AS mtd_revenue, -- Month-to-Date\n ROUND(\n SUM(revenue) OVER(PARTITION BY DATE_TRUNC('month', dt) ORDER BY dt)\n * 100.0 / monthly_target,\n 2\n ) AS completion_pct\nFROM daily_revenue\nJOIN monthly_targets ON DATE_TRUNC('month', daily_revenue.dt) = monthly_targets.month;\n```\n\n### Scenario 4: Per-Group Deduplication (Latest Row per Group)\n\n```sql\n-- Most recent order per user\nWITH latest AS (\n SELECT *,\n ROW_NUMBER() OVER(PARTITION BY user_id ORDER BY order_time DESC) AS rn\n FROM orders\n)\nSELECT * FROM latest WHERE rn = 1;\n```\n\n### Scenario 5: Consecutive Event Detection\n\n```sql\n-- Detect users who logged in for 3 consecutive days\nWITH daily_logins AS (\n SELECT DISTINCT user_id, DATE(login_time) AS login_date\n FROM login_logs\n),\nnumbered AS (\n SELECT\n user_id,\n login_date,\n login_date - INTERVAL ROW_NUMBER() OVER(\n PARTITION BY user_id ORDER BY login_date\n ) DAY AS group_id\n FROM daily_logins\n)\nSELECT user_id, MIN(login_date) AS start_date, COUNT(*) AS consecutive_days\nFROM numbered\nGROUP BY user_id, group_id\nHAVING COUNT(*) >= 3;\n```\n\n### Scenario 6: Share Analysis\n\n```sql\n-- Share of each region's revenue relative to the total\nSELECT\n region,\n revenue,\n ROUND(\n revenue * 100.0 / SUM(revenue) OVER(), -- no PARTITION BY = global total\n 2\n ) AS pct_of_total,\n ROUND(\n SUM(revenue) OVER(ORDER BY revenue DESC)\n * 100.0 / SUM(revenue) OVER(),\n 2\n ) AS cumulative_pct -- cumulative share (Pareto analysis)\nFROM regional_stats\nORDER BY revenue DESC;\n```\n\n---\n\n## Performance Caveats\n\n### Choosing PARTITION BY\n\n| Scenario | Suggestion |\n|------|------|\n| PARTITION BY column with very low cardinality (e.g., gender) | Large per-partition data; high memory pressure |\n| PARTITION BY column with extremely high cardinality (e.g., billion-level user_id) | Too many partitions; high scheduling overhead |\n| Recommended cardinality | Similar to GROUP BY; thousands to millions is optimal |\n\n### Avoid Redundant Computation\n\n```sql\n-- Anti-pattern: same window definition repeated multiple times\nSELECT\n dt,\n SUM(x) OVER(PARTITION BY region ORDER BY dt),\n AVG(x) OVER(PARTITION BY region ORDER BY dt),\n COUNT(x) OVER(PARTITION BY region ORDER BY dt)\nFROM t;\n\n-- Correct: StarRocks automatically merges computations on identical window definitions\n-- The above pattern incurs no extra cost in StarRocks (the optimizer merges them)\n-- However, using the WINDOW clause improves readability:\nSELECT\n dt,\n SUM(x) OVER w,\n AVG(x) OVER w,\n COUNT(x) OVER w\nFROM t\nWINDOW w AS (PARTITION BY region ORDER BY dt);\n```\n\n### Window Function vs GROUP BY\n\n```sql\n-- If you don't need to keep the original rows, use GROUP BY (more efficient)\nSELECT region, SUM(revenue) FROM sales GROUP BY region;\n\n-- If you need to keep original rows + aggregate values, use a window function\nSELECT region, dt, revenue,\n SUM(revenue) OVER(PARTITION BY region) AS region_total\nFROM sales;\n```\n","content_type":"text/markdown; charset=utf-8","language":"markdown","size":10690,"content_sha256":"df043f79b1f60c439863c13e6d789449182f2f5c6ae3f47c17d1a82ea8c01636"},{"filename":"scripts/sr_connect/__init__.py","content":"\"\"\"alibabacloud-starrocks-connect: base skill for StarRocks EMR Serverless access.\"\"\"\n\n__version__ = \"0.1.0\"\n","content_type":"text/x-python; charset=utf-8","language":"python","size":109,"content_sha256":"e4d16a2f96c0e06e15e8ec717b8c8f97e7f3b11e5df0645712af4034ce8c435b"},{"filename":"scripts/sr_connect/classify.py","content":"\"\"\"Classify SQL statements as READ vs non-READ for the srsql execution gate.\n\nUses sqlglot with dialect=\"starrocks\". Conservative: only inspects AST root\nnode class names; does not depend on deep parsing accuracy.\n\nWhen sqlglot cannot parse a statement (StarRocks-specific syntax it doesn't\nmodel yet, or genuine syntax errors), falls back to a leading-keyword match.\nThe fallback is best-effort and marked with a warning so the gate / Skill\ncan surface it to the user.\n\nVerdict semantics for the gate:\n READ -> executes directly\n everything else (incl. UNKNOWN) -> requires --yes\n\"\"\"\n\nfrom __future__ import annotations\n\nfrom dataclasses import dataclass\nfrom enum import Enum\n\nimport sqlglot\nimport sqlglot.errors\nimport sqlglot.expressions as exp\n\n\nDIALECT = \"starrocks\"\n\n\nclass Verdict(str, Enum):\n READ = \"read\"\n WRITE_DML = \"write_dml\" # INSERT / UPDATE / DELETE / MERGE / LOAD\n DDL = \"ddl\" # CREATE / ALTER / DROP / TRUNCATE\n ADMIN = \"admin\" # GRANT / REVOKE / CREATE USER / SET CONFIG ...\n SESSION = \"session\" # SET / USE / COMMIT / ROLLBACK\n UNKNOWN = \"unknown\" # parse failure and no keyword match\n\n\n# AST root class name -> Verdict.\n# Stable surface: depends only on root-level node identity, never deep AST.\n_ROOT_CLASS_TO_VERDICT: dict[str, Verdict] = {\n # Read\n \"Select\": Verdict.READ,\n \"Union\": Verdict.READ,\n \"Intersect\": Verdict.READ,\n \"Except\": Verdict.READ,\n \"With\": Verdict.READ,\n \"Subquery\": Verdict.READ,\n \"Show\": Verdict.READ,\n \"Describe\": Verdict.READ,\n \"Pragma\": Verdict.READ,\n # Write (DML)\n \"Insert\": Verdict.WRITE_DML,\n \"Update\": Verdict.WRITE_DML,\n \"Delete\": Verdict.WRITE_DML,\n \"Merge\": Verdict.WRITE_DML,\n # DDL\n \"Create\": Verdict.DDL,\n \"Drop\": Verdict.DDL,\n \"Alter\": Verdict.DDL,\n \"AlterColumn\": Verdict.DDL,\n \"TruncateTable\": Verdict.DDL,\n # Admin (sqlglot has dedicated classes for some)\n \"Kill\": Verdict.ADMIN,\n # Session-scoped\n \"Set\": Verdict.SESSION,\n \"Use\": Verdict.SESSION,\n \"Transaction\": Verdict.SESSION,\n \"Commit\": Verdict.SESSION,\n \"Rollback\": Verdict.SESSION,\n}\n\n\n# Leading-keyword -> Verdict, used by:\n# (a) exp.Command (sqlglot's catch-all for un-modeled statements)\n# (b) Parse-failure fallback (sqlglot couldn't parse at all)\n# Ordering is critical: longer / more-specific prefixes MUST come before\n# shorter ones, because the first prefix match wins.\n_COMMAND_VERBS: list[tuple[str, Verdict]] = [\n # Two-word: user/role admin\n (\"CREATE USER\", Verdict.ADMIN),\n (\"CREATE ROLE\", Verdict.ADMIN),\n (\"DROP USER\", Verdict.ADMIN),\n (\"DROP ROLE\", Verdict.ADMIN),\n (\"ALTER USER\", Verdict.ADMIN),\n (\"ALTER ROLE\", Verdict.ADMIN),\n (\"SET PASSWORD\", Verdict.ADMIN),\n (\"SET DEFAULT\", Verdict.ADMIN), # SET DEFAULT ROLE\n (\"SET ROLE\", Verdict.SESSION),\n # Two-word: write/refresh\n (\"SUBMIT TASK\", Verdict.WRITE_DML),\n (\"REFRESH MATERIALIZED\", Verdict.WRITE_DML),\n # Single-word admin\n (\"GRANT\", Verdict.ADMIN),\n (\"REVOKE\", Verdict.ADMIN),\n (\"ADMIN\", Verdict.ADMIN), # ADMIN SET CONFIG / REPAIR / ...\n (\"CANCEL\", Verdict.ADMIN),\n (\"PAUSE\", Verdict.ADMIN),\n (\"RESUME\", Verdict.ADMIN),\n (\"STOP\", Verdict.ADMIN),\n (\"KILL\", Verdict.ADMIN),\n (\"ANALYZE\", Verdict.ADMIN),\n (\"BACKUP\", Verdict.ADMIN),\n (\"RESTORE\", Verdict.ADMIN),\n (\"RECOVER\", Verdict.ADMIN),\n (\"INSTALL\", Verdict.ADMIN),\n (\"UNINSTALL\", Verdict.ADMIN),\n # Write (job-level commands)\n (\"LOAD\", Verdict.WRITE_DML),\n (\"REFRESH\", Verdict.WRITE_DML),\n # Read\n (\"EXPLAIN\", Verdict.READ),\n (\"DESCRIBE\", Verdict.READ),\n (\"DESC\", Verdict.READ),\n (\"HELP\", Verdict.READ),\n (\"SHOW\", Verdict.READ),\n (\"WITH\", Verdict.READ),\n (\"SELECT\", Verdict.READ),\n # Session\n (\"USE\", Verdict.SESSION),\n (\"SET\", Verdict.SESSION),\n # DDL fallback (must come AFTER more-specific user/role variants above)\n (\"CREATE\", Verdict.DDL),\n (\"DROP\", Verdict.DDL),\n (\"ALTER\", Verdict.DDL),\n (\"TRUNCATE\", Verdict.DDL),\n (\"RENAME\", Verdict.DDL),\n # DML fallback\n (\"INSERT\", Verdict.WRITE_DML),\n (\"UPDATE\", Verdict.WRITE_DML),\n (\"DELETE\", Verdict.WRITE_DML),\n (\"MERGE\", Verdict.WRITE_DML),\n]\n\n\n_TYPE_LABEL: dict[str, str] = {\n \"Select\": \"SELECT\",\n \"Union\": \"UNION\",\n \"Intersect\": \"INTERSECT\",\n \"Except\": \"EXCEPT\",\n \"With\": \"WITH (CTE)\",\n \"Subquery\": \"SUBQUERY\",\n \"Show\": \"SHOW\",\n \"Describe\": \"DESCRIBE\",\n \"Pragma\": \"PRAGMA\",\n \"Insert\": \"INSERT\",\n \"Update\": \"UPDATE\",\n \"Delete\": \"DELETE\",\n \"Merge\": \"MERGE\",\n \"Create\": \"CREATE\",\n \"Drop\": \"DROP\",\n \"Alter\": \"ALTER\",\n \"AlterColumn\": \"ALTER COLUMN\",\n \"TruncateTable\": \"TRUNCATE TABLE\",\n \"Kill\": \"KILL\",\n \"Set\": \"SET\",\n \"Use\": \"USE\",\n \"Transaction\": \"TRANSACTION\",\n \"Commit\": \"COMMIT\",\n \"Rollback\": \"ROLLBACK\",\n}\n\n\n@dataclass(frozen=True)\nclass Classification:\n verdict: Verdict\n statement_type: str\n target: str | None = None\n warning: str | None = None\n\n @property\n def is_read_only(self) -> bool:\n return self.verdict == Verdict.READ\n\n\ndef classify(sql: str) -> list[Classification]:\n \"\"\"Classify each top-level statement in `sql`. Multi-statement supported.\"\"\"\n if not sql or not sql.strip():\n return [Classification(\n verdict=Verdict.UNKNOWN,\n statement_type=\"EMPTY\",\n warning=\"no SQL statement detected\",\n )]\n try:\n asts = sqlglot.parse(sql, dialect=DIALECT)\n except sqlglot.errors.ParseError as e:\n return [_fallback_keyword_classify(sql, parse_error=str(e))]\n if not asts:\n return [_fallback_keyword_classify(sql, parse_error=\"no statement detected (comments only?)\")]\n results = []\n for ast in asts:\n if ast is None:\n results.append(_fallback_keyword_classify(sql, parse_error=\"statement could not be parsed\"))\n else:\n results.append(_classify_node(ast))\n return results\n\n\ndef classify_one(sql: str) -> Classification:\n \"\"\"Single decision for a SQL blob. Multi-statement aggregates to the\n most permission-demanding verdict (any non-READ wins).\"\"\"\n results = classify(sql)\n if not results:\n return Classification(\n verdict=Verdict.UNKNOWN,\n statement_type=\"EMPTY\",\n warning=\"no SQL statement detected\",\n )\n if len(results) == 1:\n return results[0]\n non_reads = [r for r in results if not r.is_read_only]\n if non_reads:\n first = non_reads[0]\n return Classification(\n verdict=first.verdict,\n statement_type=f\"MIXED ({len(results)} stmts, first non-READ: {first.statement_type})\",\n target=first.target,\n warning=f\"batch contains {len(results)} statements; gating on first non-READ\",\n )\n return Classification(\n verdict=Verdict.READ,\n statement_type=f\"MULTI READ ({len(results)} stmts)\",\n )\n\n\ndef _classify_node(ast: exp.Expression) -> Classification:\n cls_name = type(ast).__name__\n verdict = _ROOT_CLASS_TO_VERDICT.get(cls_name)\n if verdict is not None:\n return Classification(\n verdict=verdict,\n statement_type=_TYPE_LABEL.get(cls_name, cls_name.upper()),\n target=_extract_target(ast),\n )\n if isinstance(ast, exp.Command):\n verb = (ast.this or \"\").upper().strip()\n rest = \"\"\n if ast.expression is not None:\n rest = getattr(ast.expression, \"name\", \"\") or \"\"\n full = f\"{verb} {rest}\".upper().strip()\n for prefix, v in _COMMAND_VERBS:\n if full.startswith(prefix):\n return Classification(verdict=v, statement_type=prefix)\n return Classification(\n verdict=Verdict.UNKNOWN,\n statement_type=verb or \"UNKNOWN\",\n warning=f\"unrecognized statement verb: {verb!r}\",\n )\n return Classification(\n verdict=Verdict.UNKNOWN,\n statement_type=cls_name.upper(),\n warning=f\"unrecognized AST root: {cls_name}\",\n )\n\n\ndef _fallback_keyword_classify(sql: str, parse_error: str) -> Classification:\n \"\"\"Best-effort classification by leading keyword when sqlglot fails.\"\"\"\n leader = _leading_keyword(sql)\n if not leader:\n return Classification(\n verdict=Verdict.UNKNOWN,\n statement_type=\"EMPTY\",\n warning=parse_error,\n )\n for prefix, v in _COMMAND_VERBS:\n if leader.startswith(prefix):\n return Classification(\n verdict=v,\n statement_type=prefix,\n warning=(\n f\"sqlglot could not parse; classified by leading keyword \"\n f\"(best-effort). Reason: {parse_error[:120]}\"\n ),\n )\n one_word = leader.split()[0]\n return Classification(\n verdict=Verdict.UNKNOWN,\n statement_type=one_word,\n warning=f\"unrecognized leading keyword {one_word!r}; parse error: {parse_error[:120]}\",\n )\n\n\ndef _leading_keyword(sql: str) -> str:\n \"\"\"Strip leading comments + whitespace, return up to 3 leading words uppercased.\"\"\"\n s = sql\n while True:\n s = s.lstrip()\n if s.startswith(\"--\"):\n nl = s.find(\"\\n\")\n s = s[nl + 1:] if nl >= 0 else \"\"\n elif s.startswith(\"#\"):\n nl = s.find(\"\\n\")\n s = s[nl + 1:] if nl >= 0 else \"\"\n elif s.startswith(\"/*\"):\n end = s.find(\"*/\")\n s = s[end + 2:] if end >= 0 else \"\"\n else:\n break\n if not s:\n return \"\"\n parts = s.split(None, 3)\n return \" \".join(parts[:3]).upper()\n\n\ndef _extract_target(ast: exp.Expression) -> str | None:\n \"\"\"Best-effort target table/object name for display. Returns None when\n the target is not a simple identifier (e.g. SELECT subqueries).\"\"\"\n target_node = getattr(ast, \"this\", None)\n if target_node is None:\n return None\n if isinstance(target_node, exp.Table):\n return target_node.sql(dialect=DIALECT)\n if isinstance(target_node, exp.Schema):\n inner = target_node.this\n if isinstance(inner, exp.Table):\n return inner.sql(dialect=DIALECT)\n if hasattr(target_node, \"sql\"):\n try:\n s = target_node.sql(dialect=DIALECT)\n if len(s) \u003c= 100 and \"\\n\" not in s:\n return s\n except Exception:\n pass\n return None\n","content_type":"text/x-python; charset=utf-8","language":"python","size":10505,"content_sha256":"c09a558b87ded3f30c204d80443594aa64e9761f79e005e2534f455a8f55938c"},{"filename":"scripts/sr_connect/cli.py","content":"\"\"\"click CLI entry points: sr-login / sr-logout / sr-whoami / srsql.\"\"\"\n\nfrom __future__ import annotations\n\nimport os\nimport sys\n\nimport click\n\nfrom . import doctor as doctor_mod\nfrom . import login as login_mod\nfrom . import logout as logout_mod\nfrom . import query as query_mod\nfrom . import whoami as whoami_mod\nfrom .errors import SRConnectError\n\n\ndef _read_password_interactive(prompt: str) -> str:\n if not sys.stdin.isatty():\n _fail(\n \"sr-login needs a password but stdin is not a TTY. \"\n \"Set SR_PASSWORD in the environment for non-interactive login, \"\n \"or run sr-login from an interactive terminal.\"\n )\n return click.prompt(prompt, hide_input=True, err=True)\n\n\ndef _fail(msg: str, code: int = 1) -> None:\n click.echo(f\"ERROR: {msg}\", err=True)\n sys.exit(code)\n\n\[email protected](name=\"sr-login\")\[email protected](\"--host\", default=None, help=\"StarRocks FE host (required unless --from-env)\")\[email protected](\"--port\", default=9030, show_default=True, type=int)\[email protected](\"--user\", default=None, help=\"StarRocks user account (required unless --from-env)\")\[email protected](\n \"--password\", default=None, envvar=\"SR_PASSWORD\",\n help=\"Password. Prefer interactive prompt; SR_PASSWORD env var supported.\"\n)\[email protected](\n \"--profile\", default=\"default\", show_default=True,\n help=\"Local profile name; supports multi-cluster (use SR_PROFILE to select at runtime).\"\n)\[email protected](\"--ssl\", is_flag=True, help=\"Enable TLS (rarely needed; EMR Serverless uses plain TCP)\")\[email protected](\n \"--from-env\", \"from_env\", is_flag=True,\n help=(\n \"Non-interactive login for CI / sandbox setup scripts. \"\n \"Reads required SR_HOST / SR_USER / SR_PASSWORD and optional SR_PORT (default 9030) \"\n \"from the environment. Exits 2 if any required variable is missing.\"\n ),\n)\ndef login_cmd(host, port, user, password, profile, ssl, from_env):\n \"\"\"Log into a StarRocks cluster (registers credentials locally; overwrites if exists).\"\"\"\n if from_env:\n host = host or os.environ.get(\"SR_HOST\")\n user = user or os.environ.get(\"SR_USER\")\n sr_port = os.environ.get(\"SR_PORT\")\n if sr_port:\n try:\n port = int(sr_port)\n except ValueError:\n _fail(f\"SR_PORT must be an integer, got: {sr_port!r}\")\n missing = [\n name for name, value in (\n (\"SR_HOST\", host),\n (\"SR_USER\", user),\n (\"SR_PASSWORD\", password),\n ) if not value\n ]\n if missing:\n _fail(\n \"--from-env requires these environment variables to be set: \"\n + \", \".join(missing),\n code=2,\n )\n else:\n if not host:\n _fail(\"Missing --host\")\n if not user:\n _fail(\"Missing --user\")\n if password is None:\n password = _read_password_interactive(f\"Password for {user}@{host}\")\n try:\n login_mod.run_login(\n host=host, port=port, user=user, password=password,\n profile=profile, ssl=ssl,\n )\n except SRConnectError as e:\n _fail(str(e))\n\n\[email protected](name=\"sr-logout\")\[email protected](\n \"--profile\",\n default=lambda: os.environ.get(\"SR_PROFILE\", \"default\"),\n show_default=\"default or $SR_PROFILE\",\n)\ndef logout_cmd(profile):\n \"\"\"Remove local profile for a cluster.\"\"\"\n try:\n logout_mod.run_logout(profile=profile)\n except SRConnectError as e:\n _fail(str(e))\n\n\[email protected](name=\"sr-whoami\")\[email protected](\n \"--profile\",\n default=lambda: os.environ.get(\"SR_PROFILE\", \"default\"),\n show_default=\"default or $SR_PROFILE\",\n)\ndef whoami_cmd(profile):\n \"\"\"Print current profile state and captured grants.\"\"\"\n try:\n whoami_mod.run_whoami(profile=profile)\n except SRConnectError as e:\n _fail(str(e))\n\n\[email protected](name=\"sr-doctor\")\[email protected](\n \"--host\",\n default=lambda: os.environ.get(\"SR_HOST\"),\n help=\"StarRocks FE host to diagnose (defaults to SR_HOST env var)\",\n)\[email protected](\n \"--port\",\n default=lambda: int(os.environ.get(\"SR_PORT\", \"9030\")),\n show_default=\"9030 or $SR_PORT\",\n type=int,\n)\ndef doctor_cmd(host, port):\n \"\"\"Diagnose connection failures and print actionable next steps.\"\"\"\n if not host:\n _fail(\"Missing --host (or set SR_HOST in the environment)\")\n sys.exit(doctor_mod.run_doctor(host, port))\n\n\[email protected](name=\"srsql\")\[email protected](\"-e\", \"--execute\", \"sql\", required=True, help=\"SQL to execute\")\[email protected](\n \"--profile\",\n default=lambda: os.environ.get(\"SR_PROFILE\", \"default\"),\n show_default=\"default or $SR_PROFILE\",\n)\[email protected](\n \"-f\", \"--format\", \"fmt\",\n default=lambda: os.environ.get(\"SR_FORMAT\", \"tsv\"),\n type=click.Choice(query_mod.SUPPORTED_FORMATS),\n help=\"Output format\",\n)\[email protected](\"--yes\", is_flag=True, help=\"Confirm execution of non-READ statements\")\[email protected](\n \"--dry-run\", is_flag=True,\n help=\"Classify SQL and print verdict; do not execute\"\n)\ndef srsql_cmd(sql, profile, fmt, yes, dry_run):\n \"\"\"Execute SQL against a logged-in cluster. Non-READ statements require --yes.\"\"\"\n try:\n rc = query_mod.run_query(\n profile=profile, sql=sql, fmt=fmt,\n confirm=yes, dry_run=dry_run,\n )\n sys.exit(rc)\n except SRConnectError as e:\n _fail(str(e))\n","content_type":"text/x-python; charset=utf-8","language":"python","size":5439,"content_sha256":"c50efc81a9d93d8dde46c2546e9054e8e7d5f2b3813d47d240cc308b1e941b42"},{"filename":"scripts/sr_connect/config.py","content":"\"\"\"Read/write ~/.starrocks/{profile}.cnf — MySQL client compatible INI format.\n\nPer-profile state lives in two files (both mode 600 under a 700 directory):\n {profile}.cnf INI with [client] (pymysql-compatible) + [meta] section\n {profile}.grants Raw SHOW GRANTS FOR CURRENT_USER() output, plain text\n\"\"\"\n\nfrom __future__ import annotations\n\nimport configparser\nimport os\nimport stat\nfrom dataclasses import dataclass\nfrom pathlib import Path\n\nfrom .errors import ConfigNotFoundError\n\n\n@dataclass\nclass ProfileConfig:\n host: str\n port: int\n user: str\n password: str\n ssl: bool = False\n logged_in_at: str | None = None\n\n\ndef config_dir() -> Path:\n return Path.home() / \".starrocks\"\n\n\ndef config_path(profile: str) -> Path:\n return config_dir() / f\"{profile}.cnf\"\n\n\ndef grants_path(profile: str) -> Path:\n return config_dir() / f\"{profile}.grants\"\n\n\ndef exists(profile: str) -> bool:\n return config_path(profile).exists()\n\n\ndef read(profile: str) -> ProfileConfig:\n path = config_path(profile)\n if not path.exists():\n raise ConfigNotFoundError(\n f\"Profile '{profile}' not found at {path}. \"\n f\"Run sr-login first, or check SR_PROFILE env var.\"\n )\n parser = configparser.ConfigParser()\n parser.read(path)\n if \"client\" not in parser:\n raise ConfigNotFoundError(f\"{path} missing [client] section\")\n client = parser[\"client\"]\n meta = parser[\"meta\"] if \"meta\" in parser else {}\n return ProfileConfig(\n host=client[\"host\"],\n port=int(client.get(\"port\", \"9030\")),\n user=client[\"user\"],\n password=client[\"password\"],\n ssl=client.getboolean(\"ssl\", fallback=False),\n logged_in_at=meta.get(\"logged_in_at\") or None,\n )\n\n\ndef write(profile: str, cfg: ProfileConfig) -> Path:\n \"\"\"Write profile config with strict mode 600. Creates ~/.starrocks with 700 if missing.\"\"\"\n d = config_dir()\n d.mkdir(parents=True, exist_ok=True)\n os.chmod(d, stat.S_IRWXU) # 700\n\n path = config_path(profile)\n parser = configparser.ConfigParser()\n parser[\"client\"] = {\n \"host\": cfg.host,\n \"port\": str(cfg.port),\n \"user\": cfg.user,\n \"password\": cfg.password,\n }\n if cfg.ssl:\n parser[\"client\"][\"ssl\"] = \"true\"\n if cfg.logged_in_at:\n parser[\"meta\"] = {\"logged_in_at\": cfg.logged_in_at}\n\n tmp = path.with_suffix(\".cnf.tmp\")\n fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)\n try:\n with os.fdopen(fd, \"w\") as f:\n f.write(f\"# Managed by sr-login. Do not edit manually.\\n\")\n f.write(f\"# Profile: {profile}\\n\")\n parser.write(f)\n except Exception:\n tmp.unlink(missing_ok=True)\n raise\n os.replace(tmp, path)\n os.chmod(path, 0o600)\n return path\n\n\ndef read_grants(profile: str) -> str:\n p = grants_path(profile)\n if not p.exists():\n return \"\"\n return p.read_text()\n\n\ndef write_grants(profile: str, grants: str) -> Path:\n d = config_dir()\n d.mkdir(parents=True, exist_ok=True)\n os.chmod(d, stat.S_IRWXU)\n p = grants_path(profile)\n tmp = p.with_suffix(\".grants.tmp\")\n fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)\n try:\n with os.fdopen(fd, \"w\") as f:\n f.write(grants)\n except Exception:\n tmp.unlink(missing_ok=True)\n raise\n os.replace(tmp, p)\n os.chmod(p, 0o600)\n return p\n\n\ndef remove(profile: str) -> bool:\n \"\"\"Delete profile config + grants. Returns True if .cnf was removed.\"\"\"\n removed = False\n path = config_path(profile)\n if path.exists():\n path.unlink()\n removed = True\n gp = grants_path(profile)\n if gp.exists():\n gp.unlink()\n return removed\n","content_type":"text/x-python; charset=utf-8","language":"python","size":3758,"content_sha256":"b242636ab89983bad4029281cbf7884f52d827c7366bbc541aeea3c2f31880d1"},{"filename":"scripts/sr_connect/connection.py","content":"\"\"\"pymysql-based StarRocks connection helpers.\"\"\"\n\nfrom __future__ import annotations\n\nfrom contextlib import contextmanager\nfrom dataclasses import dataclass\nfrom typing import Iterator\n\nimport pymysql\n\nfrom .errors import ConnectionError as SRConnectionError\n\n\n@dataclass\nclass ConnParams:\n host: str\n port: int = 9030\n user: str = \"\"\n password: str = \"\"\n ssl: bool = False\n database: str | None = None\n connect_timeout: int = 10\n\n\ndef connect(params: ConnParams) -> pymysql.connections.Connection:\n \"\"\"Open a pymysql connection. Raises SRConnectionError on failure.\"\"\"\n kwargs = dict(\n host=params.host,\n port=params.port,\n user=params.user,\n password=params.password,\n connect_timeout=params.connect_timeout,\n charset=\"utf8mb4\",\n autocommit=True,\n )\n if params.database:\n kwargs[\"database\"] = params.database\n if params.ssl:\n # Empty dict is the pymysql idiom for \"enable TLS with defaults\".\n kwargs[\"ssl\"] = {}\n try:\n return pymysql.connect(**kwargs)\n except pymysql.err.OperationalError as e:\n raise SRConnectionError(\n f\"Cannot connect to {params.host}:{params.port} as {params.user}: \"\n f\"{e.args[1] if len(e.args) > 1 else e}\"\n ) from e\n\n\n@contextmanager\ndef connection(params: ConnParams) -> Iterator[pymysql.connections.Connection]:\n conn = connect(params)\n try:\n yield conn\n finally:\n conn.close()\n\n\ndef ping(params: ConnParams) -> None:\n \"\"\"Verify connection + run SELECT 1. Raises on any failure.\"\"\"\n with connection(params) as conn:\n with conn.cursor() as cur:\n cur.execute(\"SELECT 1\")\n cur.fetchone()\n\n\ndef capture_grants(conn) -> str:\n \"\"\"Run SHOW GRANTS FOR CURRENT_USER() and return joined raw text.\n\n Returns empty string on failure (older StarRocks may not support this);\n callers should treat capabilities as unknown in that case.\n \"\"\"\n try:\n with conn.cursor() as cur:\n cur.execute(\"SHOW GRANTS FOR CURRENT_USER()\")\n rows = cur.fetchall()\n except pymysql.err.MySQLError:\n return \"\"\n lines: list[str] = []\n for row in rows:\n for v in row:\n if isinstance(v, str) and v.strip():\n lines.append(v.strip())\n return \"\\n\".join(lines)\n","content_type":"text/x-python; charset=utf-8","language":"python","size":2358,"content_sha256":"759f78f41e85217d6c708efb2087f296264eb95a43468ac2ad5b0e8aa9ba973b"},{"filename":"scripts/sr_connect/doctor.py","content":"\"\"\"sr-doctor: diagnose connection failures and produce actionable next steps.\n\nEMR Serverless StarRocks exposes two FE endpoint suffixes:\n\n - \"-internal.starrocks.aliyuncs.com\" → VPC internal\n - \".starrocks.aliyuncs.com\" → Public\n\nThe diagnostic logic branches on the suffix:\n\n - VPC + unreachable → suggest the public swap (and enabling it in the console\n if the instance hasn't yet).\n - Public + unreachable → discover the client's egress IP via ipinfo.io and\n suggest a /24 whitelist CIDR (the /24 buffers for NAT\n IP drift within a typical datacenter).\n\"\"\"\n\nfrom __future__ import annotations\n\nimport socket\nimport urllib.error\nimport urllib.request\nfrom dataclasses import dataclass\nfrom typing import Literal\n\nEndpointKind = Literal[\"vpc\", \"public\", \"custom\"]\n\nVPC_SUFFIX = \"-internal.starrocks.aliyuncs.com\"\nPUBLIC_SUFFIX = \".starrocks.aliyuncs.com\"\nIPINFO_URL = \"https://ipinfo.io/ip\"\n\n\ndef classify_endpoint(host: str) -> EndpointKind:\n \"\"\"Classify an FE host by its DNS suffix.\"\"\"\n h = host.lower().strip()\n if h.endswith(VPC_SUFFIX):\n return \"vpc\"\n if h.endswith(PUBLIC_SUFFIX):\n return \"public\"\n return \"custom\"\n\n\ndef can_reach(host: str, port: int, timeout: float = 5.0) -> bool:\n \"\"\"TCP-probe host:port. Returns True if a connection can be established.\"\"\"\n try:\n with socket.create_connection((host, port), timeout=timeout):\n return True\n except (OSError, socket.timeout):\n return False\n\n\ndef get_egress_ip(timeout: float = 5.0) -> str | None:\n \"\"\"Discover the client's public egress IP via ipinfo.io.\n\n Returns the IPv4 string on success, or None if the lookup fails or the\n response doesn't look like an IPv4 address. ipinfo.io was picked over\n api.ipify.org and the Aliyun ECS metadata service because those two were\n observed to be unreliable in agenthub sandboxes (empty body / hangs).\n \"\"\"\n try:\n req = urllib.request.Request(\n IPINFO_URL, headers={\"User-Agent\": \"sr-doctor/1.0\"}\n )\n with urllib.request.urlopen(req, timeout=timeout) as r:\n body = r.read().decode(\"utf-8\").strip()\n except (urllib.error.URLError, OSError, ValueError):\n return None\n parts = body.split(\".\")\n if len(parts) != 4:\n return None\n try:\n if all(0 \u003c= int(p) \u003c= 255 for p in parts):\n return body\n except ValueError:\n pass\n return None\n\n\ndef mask_to_24(ip: str) -> str:\n \"\"\"Return the /24 CIDR for an IPv4 address (e.g. '1.2.3.4' -> '1.2.3.0/24').\"\"\"\n parts = ip.split(\".\")\n if len(parts) != 4:\n raise ValueError(f\"Not an IPv4 address: {ip!r}\")\n return f\"{parts[0]}.{parts[1]}.{parts[2]}.0/24\"\n\n\ndef vpc_to_public(host: str) -> str:\n \"\"\"Rewrite a VPC -internal host to its public counterpart.\n\n Returns the input unchanged if it isn't a -internal host.\n \"\"\"\n lower = host.lower()\n if not lower.endswith(VPC_SUFFIX):\n return host\n idx = lower.rfind(VPC_SUFFIX)\n return host[:idx] + PUBLIC_SUFFIX\n\n\n@dataclass\nclass Diagnosis:\n host: str\n port: int\n endpoint_kind: EndpointKind\n reachable: bool\n egress_ip: str | None = None\n suggested_cidr: str | None = None\n suggested_alternate_host: str | None = None\n\n\ndef diagnose(host: str, port: int) -> Diagnosis:\n \"\"\"Run the full diagnostic pipeline for a single host:port.\"\"\"\n kind = classify_endpoint(host)\n reachable = can_reach(host, port)\n d = Diagnosis(host=host, port=port, endpoint_kind=kind, reachable=reachable)\n if reachable:\n return d\n if kind == \"vpc\":\n d.suggested_alternate_host = vpc_to_public(host)\n elif kind == \"public\":\n ip = get_egress_ip()\n if ip:\n d.egress_ip = ip\n d.suggested_cidr = mask_to_24(ip)\n return d\n\n\ndef _describe_kind(kind: EndpointKind) -> str:\n return {\n \"vpc\": \"VPC (-internal.starrocks.aliyuncs.com)\",\n \"public\": \"Public (.starrocks.aliyuncs.com)\",\n \"custom\": \"Custom (not a recognized EMR Serverless suffix)\",\n }[kind]\n\n\ndef format_diagnosis(d: Diagnosis) -> str:\n \"\"\"Render a Diagnosis as a user-facing multi-line message.\"\"\"\n lines: list[str] = []\n\n if d.reachable:\n lines.append(f\"[OK] {d.host}:{d.port} is reachable.\")\n lines.append(f\" Endpoint type: {_describe_kind(d.endpoint_kind)}\")\n return \"\\n\".join(lines)\n\n lines.append(f\"[!] Cannot reach {d.host}:{d.port}\")\n lines.append(\"\")\n lines.append(f\"Endpoint type: {_describe_kind(d.endpoint_kind)}\")\n\n if d.endpoint_kind == \"vpc\":\n lines.append(\"Cause: Your client is not inside the cluster's VPC.\")\n lines.append(\"\")\n lines.append('Fix: Switch to the public endpoint by removing \"-internal\" from the host:')\n lines.append(\"\")\n lines.append(f\" export SR_HOST={d.suggested_alternate_host}\")\n lines.append(\" sr-login --from-env\")\n lines.append(\"\")\n lines.append(\"If the instance has not enabled the public endpoint yet:\")\n lines.append(\" Console -> EMR Serverless StarRocks -> instance details\")\n lines.append(\" -> Gateway info -> Provision SLB (prerequisite)\")\n lines.append(\" -> Public address -> Enable public endpoint\")\n lines.append(\" (creates a billable CLB; provisioning takes a few minutes)\")\n lines.append(\" Docs: https://help.aliyun.com/zh/emr/emr-serverless-starrocks/manage-gateways\")\n\n elif d.endpoint_kind == \"public\":\n if d.egress_ip and d.suggested_cidr:\n lines.append(f\"Your public egress IP: {d.egress_ip}\")\n lines.append(f\"Suggested whitelist CIDR: {d.suggested_cidr}\")\n lines.append(\" (/24 buffers for NAT IP drift within a cluster)\")\n lines.append(\"\")\n lines.append(\"Fix: Add the CIDR to the cluster whitelist:\")\n lines.append(\" Console -> EMR Serverless -> instance details\")\n lines.append(\" -> Network -> Whitelist -> Add CIDR -> Save\")\n lines.append(\"\")\n lines.append(\"Then retry: sr-login --from-env\")\n else:\n lines.append(\"Could not auto-discover your public egress IP (ipinfo.io unreachable).\")\n lines.append(\"\")\n lines.append(\"Fix: Find your egress IP manually (curl ipinfo.io from this host,\")\n lines.append(\" or ask the platform team) and add it to the cluster whitelist:\")\n lines.append(\" Console -> EMR Serverless -> instance details\")\n lines.append(\" -> Network -> Whitelist -> Add IP/CIDR -> Save\")\n\n else:\n lines.append(\"Cause: Host is not a recognized EMR Serverless StarRocks endpoint.\")\n lines.append(\" Verify the address is correct and that the network path is open.\")\n\n return \"\\n\".join(lines)\n\n\ndef run_doctor(host: str, port: int) -> int:\n \"\"\"sr-doctor entry-point logic. Returns shell exit code (0 reachable, 1 not).\"\"\"\n d = diagnose(host, port)\n print(format_diagnosis(d))\n return 0 if d.reachable else 1\n","content_type":"text/x-python; charset=utf-8","language":"python","size":7128,"content_sha256":"df845da8c29c216fbe8aaa68a22c8865c8bbe9632351cd38e2099d46d9654005"},{"filename":"scripts/sr_connect/errors.py","content":"\"\"\"Custom exceptions for sr_connect.\"\"\"\n\n\nclass SRConnectError(Exception):\n \"\"\"Base error for sr_connect.\"\"\"\n\n\nclass ConfigNotFoundError(SRConnectError):\n \"\"\"Profile config file not found.\"\"\"\n\n\nclass ConnectionError(SRConnectError):\n \"\"\"Failed to connect to StarRocks.\"\"\"\n","content_type":"text/x-python; charset=utf-8","language":"python","size":281,"content_sha256":"ff36797765adf660f2b1a1633c18c76fa7169ffcdab6674c8752f5e9591ee46d"},{"filename":"scripts/sr_connect/login.py","content":"\"\"\"sr-login: register a cluster credential locally + smoke-test connection.\"\"\"\n\nfrom __future__ import annotations\n\nfrom datetime import datetime, timezone\n\nimport click\n\nfrom . import config, connection, doctor\nfrom .errors import ConnectionError as SRConnectionError\n\n\ndef run_login(\n host: str,\n port: int,\n user: str,\n password: str,\n profile: str,\n ssl: bool,\n) -> None:\n \"\"\"Connect with user-provided creds, capture grants, write profile.\"\"\"\n click.echo(f\"[..] Connecting to {host}:{port} as {user}\", err=True)\n params = connection.ConnParams(\n host=host, port=port, user=user, password=password, ssl=ssl\n )\n try:\n ctx = connection.connection(params)\n conn = ctx.__enter__()\n except SRConnectionError:\n click.echo(\"\", err=True)\n click.echo(doctor.format_diagnosis(doctor.diagnose(host, port)), err=True)\n raise\n try:\n click.echo(\"[OK] Connection verified\", err=True)\n grants = connection.capture_grants(conn)\n finally:\n ctx.__exit__(None, None, None)\n\n config.write(\n profile,\n config.ProfileConfig(\n host=host, port=port, user=user, password=password, ssl=ssl,\n logged_in_at=datetime.now(timezone.utc).isoformat(timespec=\"seconds\"),\n ),\n )\n if grants:\n config.write_grants(profile, grants)\n click.echo(f\"[OK] Captured {len(grants.splitlines())} grant line(s)\", err=True)\n else:\n click.echo(\"[!!] SHOW GRANTS unavailable; capability hints disabled\", err=True)\n\n cfg_path = config.config_path(profile)\n click.echo(f\"[OK] Profile '{profile}' written to {cfg_path}\", err=True)\n click.echo(\"\", err=True)\n click.echo(f\" Try: srsql -e \\\"SELECT CURRENT_VERSION()\\\"\", err=True)\n if profile != \"default\":\n click.echo(f\" Or: SR_PROFILE={profile} srsql -e \\\"...\\\"\", err=True)\n","content_type":"text/x-python; charset=utf-8","language":"python","size":1879,"content_sha256":"65b984eed4cce5c4233c5a768426f048a34c3b7538ae829244ed738e20c88b2f"},{"filename":"scripts/sr_connect/logout.py","content":"\"\"\"sr-logout: remove the local profile + grants for a cluster.\"\"\"\n\nfrom __future__ import annotations\n\nimport click\n\nfrom . import config\n\n\ndef run_logout(profile: str) -> None:\n \"\"\"Delete local profile files. No cluster-side action.\"\"\"\n cfg_path = config.config_path(profile)\n if not cfg_path.exists():\n click.echo(f\"[..] No profile '{profile}' to remove\", err=True)\n return\n config.remove(profile)\n click.echo(f\"[OK] Removed profile '{profile}' ({cfg_path})\", err=True)\n","content_type":"text/x-python; charset=utf-8","language":"python","size":501,"content_sha256":"27c906b8715464a83018628481f146e878a25851cbe2d940491cc2dbbd6c8e04"},{"filename":"scripts/sr_connect/query.py","content":"\"\"\"srsql: classify SQL → READ executes; non-READ requires --yes; supports multiple output formats.\"\"\"\n\nfrom __future__ import annotations\n\nimport json\nimport sys\nfrom typing import Any, Iterable\n\nimport click\nimport pymysql\n\nfrom . import config, connection\nfrom .classify import classify_one\nfrom .errors import SRConnectError\n\n\nSUPPORTED_FORMATS = (\"tsv\", \"json\", \"table\", \"markdown\", \"vertical\")\n\n\ndef run_query(\n profile: str,\n sql: str,\n fmt: str = \"tsv\",\n confirm: bool = False,\n dry_run: bool = False,\n) -> int:\n \"\"\"Execute sql and print results. Returns exit code.\n\n Gate:\n - READ executes directly.\n - non-READ (including UNKNOWN / parse failures) requires confirm=True.\n - dry_run prints classification verdict only; never executes.\n \"\"\"\n if fmt not in SUPPORTED_FORMATS:\n click.echo(f\"Unknown format: {fmt} (supported: {', '.join(SUPPORTED_FORMATS)})\", err=True)\n return 2\n\n cls = classify_one(sql)\n\n if dry_run:\n _print_classification(cls)\n return 0\n\n if not cls.is_read_only and not confirm:\n lines = [\n \"Refusing to execute non-READ SQL without --yes.\",\n f\" verdict: {cls.verdict.value}\",\n f\" statement_type: {cls.statement_type}\",\n ]\n if cls.target:\n lines.append(f\" target: {cls.target}\")\n if cls.warning:\n lines.append(f\" warning: {cls.warning}\")\n lines.append(\" Re-run with --yes to confirm execution.\")\n raise SRConnectError(\"\\n\".join(lines))\n\n if not cls.is_read_only and cls.warning:\n click.echo(f\"[!!] {cls.warning}\", err=True)\n\n cfg = config.read(profile) # raises ConfigNotFoundError\n params = connection.ConnParams(\n host=cfg.host, port=cfg.port, user=cfg.user, password=cfg.password, ssl=cfg.ssl,\n )\n\n try:\n with connection.connection(params) as conn:\n with conn.cursor() as cur:\n cur.execute(sql)\n if cur.description is None:\n click.echo(f\"Affected rows: {cur.rowcount}\", err=True)\n return 0\n columns = [d[0] for d in cur.description]\n rows = cur.fetchall()\n except pymysql.err.MySQLError as e:\n errno = e.args[0] if e.args else \"?\"\n msg = e.args[1] if len(e.args) > 1 else str(e)\n raise SRConnectError(f\"SQL error [{errno}]: {msg}\") from e\n\n _render(columns, rows, fmt)\n return 0\n\n\ndef _print_classification(cls) -> None:\n click.echo(f\"verdict: {cls.verdict.value}\", err=True)\n click.echo(f\"statement_type: {cls.statement_type}\", err=True)\n if cls.target:\n click.echo(f\"target: {cls.target}\", err=True)\n if cls.warning:\n click.echo(f\"warning: {cls.warning}\", err=True)\n click.echo(f\"would_execute: {'yes' if cls.is_read_only else 'requires --yes'}\", err=True)\n\n\ndef _render(columns: list[str], rows: Iterable[tuple], fmt: str) -> None:\n rows = list(rows)\n if fmt == \"tsv\":\n _render_tsv(columns, rows)\n elif fmt == \"json\":\n _render_json(columns, rows)\n elif fmt == \"table\":\n _render_table(columns, rows)\n elif fmt == \"markdown\":\n _render_markdown(columns, rows)\n elif fmt == \"vertical\":\n _render_vertical(columns, rows)\n\n\ndef _s(v: Any) -> str:\n if v is None:\n return \"NULL\"\n return str(v)\n\n\ndef _render_tsv(columns: list[str], rows: list[tuple]) -> None:\n sys.stdout.write(\"\\t\".join(columns) + \"\\n\")\n for r in rows:\n sys.stdout.write(\"\\t\".join(_s(v) for v in r) + \"\\n\")\n\n\ndef _render_json(columns: list[str], rows: list[tuple]) -> None:\n out = [dict(zip(columns, (_s_json(v) for v in r))) for r in rows]\n json.dump(out, sys.stdout, ensure_ascii=False, indent=2, default=str)\n sys.stdout.write(\"\\n\")\n\n\ndef _s_json(v: Any) -> Any:\n if v is None:\n return None\n if isinstance(v, (str, int, float, bool)):\n return v\n return str(v)\n\n\ndef _render_table(columns: list[str], rows: list[tuple]) -> None:\n widths = [len(c) for c in columns]\n for r in rows:\n for i, v in enumerate(r):\n widths[i] = max(widths[i], len(_s(v)))\n\n def sep() -> str:\n return \"+\" + \"+\".join(\"-\" * (w + 2) for w in widths) + \"+\"\n\n def fmt_row(vals: Iterable[Any]) -> str:\n return \"| \" + \" | \".join(_s(v).ljust(widths[i]) for i, v in enumerate(vals)) + \" |\"\n\n print(sep())\n print(fmt_row(columns))\n print(sep())\n for r in rows:\n print(fmt_row(r))\n print(sep())\n\n\ndef _render_markdown(columns: list[str], rows: list[tuple]) -> None:\n print(\"| \" + \" | \".join(columns) + \" |\")\n print(\"| \" + \" | \".join(\"---\" for _ in columns) + \" |\")\n for r in rows:\n print(\"| \" + \" | \".join(_s(v).replace(\"|\", \"\\\\|\") for v in r) + \" |\")\n\n\ndef _render_vertical(columns: list[str], rows: list[tuple]) -> None:\n width = max((len(c) for c in columns), default=0)\n for i, r in enumerate(rows, 1):\n print(f\"*************************** {i}. row ***************************\")\n for c, v in zip(columns, r):\n print(f\"{c.rjust(width)}: {_s(v)}\")\n","content_type":"text/x-python; charset=utf-8","language":"python","size":5164,"content_sha256":"6a910b05421166c1c5b1576a8cbd6e0237e8b5923deb9fec2999dd4f57bf4c4a"},{"filename":"scripts/sr_connect/whoami.py","content":"\"\"\"sr-whoami: print current profile state — host, user, login time, grants summary.\"\"\"\n\nfrom __future__ import annotations\n\nimport click\n\nfrom . import config\nfrom .errors import SRConnectError\n\n\ndef run_whoami(profile: str) -> None:\n cfg_path = config.config_path(profile)\n if not cfg_path.exists():\n raise SRConnectError(\n f\"No profile '{profile}'. Run sr-login first.\"\n )\n cfg = config.read(profile)\n grants = config.read_grants(profile)\n grant_lines = [ln for ln in grants.splitlines() if ln.strip()]\n\n click.echo(f\"profile: {profile}\")\n click.echo(f\"host: {cfg.host}:{cfg.port}\")\n click.echo(f\"user: {cfg.user}\")\n click.echo(f\"ssl: {'yes' if cfg.ssl else 'no'}\")\n click.echo(f\"logged_in_at: {cfg.logged_in_at or '(unknown)'}\")\n click.echo(f\"config_file: {cfg_path}\")\n\n if grant_lines:\n click.echo(\"\")\n click.echo(f\"grants ({len(grant_lines)} line(s)):\")\n for ln in grant_lines:\n click.echo(f\" {ln}\")\n else:\n click.echo(\"\")\n click.echo(\"grants: (not captured — SHOW GRANTS unavailable at login)\")\n","content_type":"text/x-python; charset=utf-8","language":"python","size":1167,"content_sha256":"78ab38e6cc7183ec99ae52a828f9b323e940704c16c65d38992dc218e681e333"}],"content_json":{"type":"doc","content":[{"type":"heading","attrs":{"level":1},"content":[{"text":"Alibaba Cloud EMR Serverless StarRocks Development & Operations Assistant","type":"text"}]},{"type":"paragraph","content":[{"text":"Help users perform day-to-day table design, data ingestion, SQL writing & tuning, and health diagnostics on Alibaba Cloud EMR Serverless StarRocks. All cluster access goes through the bundled ","type":"text"},{"text":"srsql","type":"text","marks":[{"type":"code_inline"}]},{"text":" CLI (pymysql-based, uses the user's own account); no MySQL client required. Non-READ SQL is classified by sqlglot and requires ","type":"text"},{"text":"--yes","type":"text","marks":[{"type":"code_inline"}]},{"text":" confirmation before execution.","type":"text"}]},{"type":"blockquote","content":[{"type":"paragraph","content":[{"text":"Scope statement","type":"text","marks":[{"type":"strong"}]},{"text":": This Skill focuses on ","type":"text"},{"text":"using","type":"text","marks":[{"type":"em"}]},{"text":" StarRocks — development, diagnostics, and day-to-day data operations. Cluster-internal data and schema operations (DDL, DML, materialized view refresh, GRANT, etc.) are supported and execute under the user's own account, gated by sqlglot classification + ","type":"text"},{"text":"--yes","type":"text","marks":[{"type":"code_inline"}]},{"text":" confirmation. Instance-lifecycle operations (create, scale, restart, configuration change, version upgrade) are control-plane operations and are ","type":"text"},{"text":"not","type":"text","marks":[{"type":"strong"}]},{"text":" in this Skill's scope; please use the EMR Serverless console or the corresponding OpenAPI.","type":"text"}]}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"When to use / When not to use","type":"text"}]},{"type":"paragraph","content":[{"text":"When to use","type":"text","marks":[{"type":"strong"}]},{"text":":","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Schema design (table model, partitioning, bucketing, sort key, indexes, storage parameters)","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Data ingestion selection (Stream/Broker/Routine Load, INSERT, Pipe, Flink/Kafka Connector, CDC)","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"SQL writing, rewriting, and tuning (JOIN strategy, window functions, CTE, aggregation optimization, statistics)","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Materialized view design and operations","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Cluster health diagnostics (FE/BE/CN nodes, tablet health, compaction, warehouse, recent failed loads)","type":"text"}]}]}]},{"type":"paragraph","content":[{"text":"When NOT to use","type":"text","marks":[{"type":"strong"}]},{"text":":","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Instance lifecycle control: create / scale / restart / config change / upgrade StarRocks instances — these are control-plane operations; use the EMR Serverless console or the corresponding OpenAPI instead","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Operating non-StarRocks products: EMR Cluster, Spark, Milvus, ClickHouse, Doris, RDS, ECS, etc.","type":"text"}]}]}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"First-time setup: install & log in","type":"text"}]},{"type":"paragraph","content":[{"text":"This Skill ships with the ","type":"text"},{"text":"sr-connect","type":"text","marks":[{"type":"code_inline"}]},{"text":" Python CLI. See ","type":"text"},{"text":"references/connect.md","type":"text","marks":[{"type":"link","attrs":{"href":"references/connect.md","title":null}}]},{"text":" for details.","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Assistant bootstrap protocol (instructions for Claude)","type":"text"}]},{"type":"paragraph","content":[{"text":"When this Skill is invoked and you anticipate running any cluster query, ensure ","type":"text"},{"text":"srsql","type":"text","marks":[{"type":"code_inline"}]},{"text":" is available ","type":"text"},{"text":"before","type":"text","marks":[{"type":"em"}]},{"text":" asking the user for anything:","type":"text"}]},{"type":"ordered_list","attrs":{"order":1,"listStyle":"number"},"content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Run ","type":"text"},{"text":"which srsql","type":"text","marks":[{"type":"code_inline"}]},{"text":". If it returns a path, skip to step 4.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"If missing, install it yourself: ","type":"text"},{"text":"uv tool install \u003cskill-project-root>","type":"text","marks":[{"type":"code_inline"}]},{"text":" where ","type":"text"},{"text":"\u003cskill-project-root>","type":"text","marks":[{"type":"code_inline"}]},{"text":" is the directory containing this ","type":"text"},{"text":"SKILL.md","type":"text","marks":[{"type":"code_inline"}]},{"text":" and ","type":"text"},{"text":"pyproject.toml","type":"text","marks":[{"type":"code_inline"}]},{"text":" (the Skill's base directory shown at invocation time; commonly ","type":"text"},{"text":"~/.claude/skills/alibabacloud-emr-starrocks-assistant/","type":"text","marks":[{"type":"code_inline"}]},{"text":", which may be a symlink). ","type":"text"},{"text":"Do not ask the user to run this","type":"text","marks":[{"type":"strong"}]},{"text":" — the bundled CLI is part of the Skill's capability surface, not user infrastructure.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"If ","type":"text"},{"text":"uv","type":"text","marks":[{"type":"code_inline"}]},{"text":" itself is missing (","type":"text"},{"text":"which uv","type":"text","marks":[{"type":"code_inline"}]},{"text":" fails), surface that to the user — ","type":"text"},{"text":"uv","type":"text","marks":[{"type":"code_inline"}]},{"text":" is a system tool and not auto-installed.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Check ","type":"text"},{"text":"~/.starrocks/{profile}.cnf","type":"text","marks":[{"type":"code_inline"}]},{"text":" (default profile name: ","type":"text"},{"text":"default","type":"text","marks":[{"type":"code_inline"}]},{"text":"; respect ","type":"text"},{"text":"SR_PROFILE","type":"text","marks":[{"type":"code_inline"}]},{"text":" env var if set). If it exists, skip to step 5. If missing:","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"First try ","type":"text","marks":[{"type":"strong"}]},{"text":"sr-login --from-env","type":"text","marks":[{"type":"code_inline"},{"type":"strong"}]},{"text":".","type":"text","marks":[{"type":"strong"}]},{"text":" Safe to call unconditionally — it exits 2 with a clear \"missing\" message when the environment doesn't have the credentials it needs, and does nothing else. You do not need to inspect environment variables yourself.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"If ","type":"text","marks":[{"type":"strong"}]},{"text":"sr-login --from-env","type":"text","marks":[{"type":"code_inline"},{"type":"strong"}]},{"text":" exits non-zero","type":"text","marks":[{"type":"strong"}]},{"text":", the user hasn't logged in yet. Give them the ","type":"text"},{"text":"sr-login --host ... --user ...","type":"text","marks":[{"type":"code_inline"}]},{"text":" command and ask them to run it themselves. ","type":"text"},{"text":"Do not run interactive ","type":"text","marks":[{"type":"strong"}]},{"text":"sr-login","type":"text","marks":[{"type":"code_inline"},{"type":"strong"}]},{"text":" yourself","type":"text","marks":[{"type":"strong"}]},{"text":" — it would block on a password prompt you cannot answer.","type":"text"}]}]}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"After both ","type":"text"},{"text":"srsql","type":"text","marks":[{"type":"code_inline"}]},{"text":" is on PATH and the profile file exists, run queries via ","type":"text"},{"text":"srsql -e \"...\"","type":"text","marks":[{"type":"code_inline"}]},{"text":" yourself.","type":"text"}]}]}]},{"type":"paragraph","content":[{"text":"If ","type":"text"},{"text":"srsql","type":"text","marks":[{"type":"code_inline"}]},{"text":" was just installed in this session and PATH hasn't been refreshed in the user's shell, fall back to the absolute path printed by ","type":"text"},{"text":"uv tool install","type":"text","marks":[{"type":"code_inline"}]},{"text":" (typically ","type":"text"},{"text":"~/.local/bin/srsql","type":"text","marks":[{"type":"code_inline"}]},{"text":").","type":"text"}]},{"type":"paragraph","content":[{"text":"Chat-style rule after bootstrap succeeds","type":"text","marks":[{"type":"strong"}]},{"text":": Do ","type":"text"},{"text":"not","type":"text","marks":[{"type":"strong"}]},{"text":" echo ","type":"text"},{"text":"sr-whoami","type":"text","marks":[{"type":"code_inline"}]},{"text":" / ","type":"text"},{"text":"srsql -e \"...\"","type":"text","marks":[{"type":"code_inline"}]},{"text":" invocation syntax to the user as a \"you can now run …\" hint. You are the one calling these CLIs on the user's behalf — the user drives the Skill, not the binaries. Skip the post-success \"next step\" narration entirely and just ask what they want to do, or proceed if their intent is already clear.","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Login command (give this to the user when their profile is missing)","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"# EMR Serverless StarRocks — both internal and public endpoints use the MySQL\n# wire protocol over plain TCP; no SSL/TLS. Use the same form for either.\nsr-login --host \u003cfe-endpoint> --port 9030 --user \u003caccount>\n\n# Verify\nsr-whoami\nsrsql -e \"SELECT CURRENT_VERSION()\"","type":"text"}]},{"type":"paragraph","content":[{"text":"Re-running ","type":"text"},{"text":"sr-login","type":"text","marks":[{"type":"code_inline"}]},{"text":" with the same ","type":"text"},{"text":"--profile","type":"text","marks":[{"type":"code_inline"}]},{"text":" silently overwrites the stored credential (same semantics as ","type":"text"},{"text":"docker login","type":"text","marks":[{"type":"code_inline"}]},{"text":"). Use ","type":"text"},{"text":"--profile","type":"text","marks":[{"type":"code_inline"}]},{"text":" for multi-cluster:","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"sr-login --profile prod --host fe-prod.xxx --user app_user\nSR_PROFILE=prod srsql -e \"...\"","type":"text"}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Security model","type":"text"}]},{"type":"paragraph","content":[{"text":"This Skill has ","type":"text"},{"text":"two layers","type":"text","marks":[{"type":"strong"}]},{"text":":","type":"text"}]},{"type":"ordered_list","attrs":{"order":1,"listStyle":"number"},"content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"FE is the authoritative permission boundary.","type":"text","marks":[{"type":"strong"}]},{"text":" The user supplies their own StarRocks account; whatever they're allowed to do, they're allowed to do. The Skill does not create, elevate, or rotate any accounts.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"srsql","type":"text","marks":[{"type":"code_inline"},{"type":"strong"}]},{"text":" is a UX gate, not a security boundary.","type":"text","marks":[{"type":"strong"}]},{"text":" Every statement is parsed by sqlglot (dialect ","type":"text"},{"text":"starrocks","type":"text","marks":[{"type":"code_inline"}]},{"text":"):","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"READ","type":"text","marks":[{"type":"code_inline"}]},{"text":" (SELECT / SHOW / DESC / EXPLAIN / WITH / …) executes directly.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Any non-READ","type":"text","marks":[{"type":"strong"}]},{"text":" (INSERT / UPDATE / DELETE / DDL / GRANT / SET / USE / …) ","type":"text"},{"text":"is refused unless ","type":"text","marks":[{"type":"strong"}]},{"text":"--yes","type":"text","marks":[{"type":"code_inline"},{"type":"strong"}]},{"text":" is passed","type":"text","marks":[{"type":"strong"}]},{"text":".","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"SQL sqlglot cannot parse falls back to a leading-keyword check; if still ambiguous → ","type":"text"},{"text":"UNKNOWN","type":"text","marks":[{"type":"code_inline"}]},{"text":", treated as non-READ, executable with ","type":"text"},{"text":"--yes","type":"text","marks":[{"type":"code_inline"}]},{"text":" plus a soft warning.","type":"text"}]}]}]}]}]},{"type":"paragraph","content":[{"text":"When the user asks for a write operation:","type":"text"}]},{"type":"ordered_list","attrs":{"order":1,"listStyle":"number"},"content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Show them the SQL you intend to run.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Optionally preview classification via ","type":"text"},{"text":"srsql --dry-run -e \"...\"","type":"text","marks":[{"type":"code_inline"}]},{"text":".","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Get explicit confirmation in chat.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Then run with ","type":"text"},{"text":"srsql --yes -e \"...\"","type":"text","marks":[{"type":"code_inline"}]},{"text":".","type":"text"}]}]}]},{"type":"paragraph","content":[{"text":"For DDL on production tables, or operations that change global cluster state (CREATE/DROP USER, ADMIN SET CONFIG, etc.), prefer to print the SQL and let the user run it themselves — even though the gate would let them run it via ","type":"text"},{"text":"--yes","type":"text","marks":[{"type":"code_inline"}]},{"text":". The gate is a safety net, not a license.","type":"text"}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Input validation & command-injection protection","type":"text"}]},{"type":"paragraph","content":[{"text":"SQL passed into ","type":"text"},{"text":"srsql -e \"...\"","type":"text","marks":[{"type":"code_inline"}]},{"text":" is assembled by the LLM and must follow these rules:","type":"text"}]},{"type":"ordered_list","attrs":{"order":1,"listStyle":"number"},"content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Identifiers (table / column / database names) are validated before interpolation: only ","type":"text"},{"text":"[A-Za-z0-9_]","type":"text","marks":[{"type":"code_inline"}]},{"text":" plus backtick-quoted forms.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"User-provided string values (search terms, label names, etc.) are ","type":"text"},{"text":"not","type":"text","marks":[{"type":"strong"}]},{"text":" spliced into SQL directly; use parameter binding or pre-escape.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Never execute raw user-provided strings as SQL fragments.","type":"text"}]}]}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Sensitive data masking","type":"text"}]},{"type":"table","attrs":{"layout":null},"content":[{"type":"tr","content":[{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Scenario","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Handling","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Profile file content (incl. user password)","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Never echoed; mode 600 under a 700 directory","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Password in error messages","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Truncate / replace with ","type":"text"},{"text":"******","type":"text","marks":[{"type":"code_inline"}]}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Query results contain obvious key / token columns","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Warn the user without displaying full content","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"aliyun configure list","type":"text","marks":[{"type":"code_inline"}]},{"text":" output containing AK","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Show only the first 4 chars; replace the rest with ","type":"text"},{"text":"****","type":"text","marks":[{"type":"code_inline"}]}]}]}]}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Intent routing","type":"text"}]},{"type":"blockquote","content":[{"type":"paragraph","content":[{"text":"Disambiguation rule","type":"text","marks":[{"type":"strong"}]},{"text":": When the user input is ambiguous (e.g. \"ingestion is slow\", \"queries are slow\") and context is unclear, ask one clarifying question before acting.","type":"text"}]}]},{"type":"table","attrs":{"layout":null},"content":[{"type":"tr","content":[{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"User intent","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Route","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Reference","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"First-time cluster connection / register or switch credentials / multi-cluster setup","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"sr-login / sr-whoami / sr-logout","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"references/connect.md","type":"text","marks":[{"type":"link","attrs":{"href":"references/connect.md","title":null}}]}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"New table / change schema / table model selection / partition+bucket design","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Schema design","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"references/schema.md","type":"text","marks":[{"type":"link","attrs":{"href":"references/schema.md","title":null}}]}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Choose ingestion method / configure Stream/Broker/Routine Load / Flink/Kafka Connector","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Import selection","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"references/data-import.md","type":"text","marks":[{"type":"link","attrs":{"href":"references/data-import.md","title":null}}]}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Write SQL / optimize SQL / materialized views / function selection / read execution plans","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"SQL development & tuning","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"references/sql.md","type":"text","marks":[{"type":"link","attrs":{"href":"references/sql.md","title":null}}]}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Cluster health check / FE/BE/CN status / unhealthy tablets / compaction lag","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Cluster diagnostics","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"references/diagnostics.md","type":"text","marks":[{"type":"link","attrs":{"href":"references/diagnostics.md","title":null}}]}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"\"Ingestion used to be fine, suddenly slow\"","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Cluster diagnostics (distinct from import selection)","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"references/diagnostics.md","type":"text","marks":[{"type":"link","attrs":{"href":"references/diagnostics.md","title":null}}]}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"\"How should I design a new ingestion pipeline\"","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Import selection","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"references/data-import.md","type":"text","marks":[{"type":"link","attrs":{"href":"references/data-import.md","title":null}}]}]}]}]}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Five scenarios at a glance","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"1. Schema design","type":"text"}]},{"type":"paragraph","content":[{"text":"Four table models and their typical use cases:","type":"text"}]},{"type":"table","attrs":{"layout":null},"content":[{"type":"tr","content":[{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Use case","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Model","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Logs / events / detail records","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Duplicate Key","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Pre-aggregated metrics","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Aggregate","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Real-time upsert / CDC","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Primary Key","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Simple deduplication","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Unique Key (for new use cases, prefer Primary Key)","type":"text"}]}]}]}]},{"type":"paragraph","content":[{"text":"⚠ Anti-patterns — do not produce these in DDL:","type":"text","marks":[{"type":"strong"}]}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Shared-data PK table without ","type":"text","marks":[{"type":"strong"}]},{"text":"persistent_index_type=CLOUD_NATIVE","type":"text","marks":[{"type":"code_inline"},{"type":"strong"}]},{"text":" + ","type":"text","marks":[{"type":"strong"}]},{"text":"datacache.partition_duration","type":"text","marks":[{"type":"code_inline"},{"type":"strong"}]},{"text":" — LOCAL index doesn't survive CN rebalance; no hot-data caching window. See ","type":"text"},{"text":"schema/storage-properties.md","type":"text","marks":[{"type":"link","attrs":{"href":"references/schema/storage-properties.md","title":null}}]},{"text":".","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Setting ","type":"text","marks":[{"type":"strong"}]},{"text":"datacache.partition_duration","type":"text","marks":[{"type":"code_inline"},{"type":"strong"}]},{"text":" to an arbitrary \"hot window\" (e.g. ","type":"text","marks":[{"type":"strong"}]},{"text":"30 DAY","type":"text","marks":[{"type":"code_inline"},{"type":"strong"}]},{"text":") instead of the user's stated query window","type":"text","marks":[{"type":"strong"}]},{"text":" — the value MUST be ","type":"text"},{"text":"≥ the query window","type":"text","marks":[{"type":"strong"}]},{"text":". If the user says \"查询近 N 天\" / \"queries the last N days\", set ","type":"text"},{"text":"datacache.partition_duration = \"N DAY\"","type":"text","marks":[{"type":"code_inline"}]},{"text":" (or larger). A value smaller than the query window guarantees cache misses on in-window queries. Do not default to 7/30/60 days when the user has given you a number.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"storage_cooldown_time","type":"text","marks":[{"type":"code_inline"},{"type":"strong"}]},{"text":"/","type":"text","marks":[{"type":"strong"}]},{"text":"storage_cooldown_ttl","type":"text","marks":[{"type":"code_inline"},{"type":"strong"}]},{"text":"/","type":"text","marks":[{"type":"strong"}]},{"text":"storage_medium","type":"text","marks":[{"type":"code_inline"},{"type":"strong"}]},{"text":"/","type":"text","marks":[{"type":"strong"}]},{"text":"replicated_storage","type":"text","marks":[{"type":"code_inline"},{"type":"strong"}]},{"text":" on shared-data","type":"text","marks":[{"type":"strong"}]},{"text":" — silently stripped or rejected by ","type":"text"},{"text":"PropertyAnalyzer","type":"text","marks":[{"type":"code_inline"}]},{"text":"; use ","type":"text"},{"text":"datacache.partition_duration","type":"text","marks":[{"type":"code_inline"}]},{"text":" for the cooldown effect.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"FLOAT / DOUBLE columns inside ","type":"text","marks":[{"type":"strong"}]},{"text":"PRIMARY KEY","type":"text","marks":[{"type":"code_inline"},{"type":"strong"}]},{"text":" — not supported; use BIGINT or DECIMAL.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Treating \"CN\" as a shared-nothing signal","type":"text","marks":[{"type":"strong"}]},{"text":" — CN = Compute Node, which is the shared-data terminology. BE = Backend = shared-nothing.","type":"text"}]}]}]},{"type":"paragraph","content":[{"text":"See ","type":"text"},{"text":"references/schema.md","type":"text","marks":[{"type":"link","attrs":{"href":"references/schema.md","title":null}}]},{"text":".","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"2. Data ingestion","type":"text"}]},{"type":"table","attrs":{"layout":null},"content":[{"type":"tr","content":[{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Data source","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Recommended method","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Local files \u003c 10 GB","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Stream Load","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Object storage / HDFS bulk","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Broker Load or ","type":"text"},{"text":"INSERT INTO ... FROM FILES()","type":"text","marks":[{"type":"code_inline"}]}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Object storage with continuous file arrivals","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Pipe + AUTO_INGEST","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Kafka / Pulsar","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Routine Load or Kafka/Flink Connector","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"MySQL CDC","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Flink CDC + Flink Connector","type":"text"}]}]}]}]},{"type":"paragraph","content":[{"text":"⚠ Anti-patterns — do not produce these in load configs:","type":"text","marks":[{"type":"strong"}]}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"PK-table DELETE without ","type":"text","marks":[{"type":"strong"}]},{"text":"__op","type":"text","marks":[{"type":"code_inline"},{"type":"strong"}]},{"text":" integer column (","type":"text","marks":[{"type":"strong"}]},{"text":"0","type":"text","marks":[{"type":"code_inline"},{"type":"strong"}]},{"text":"=UPSERT, ","type":"text","marks":[{"type":"strong"}]},{"text":"1","type":"text","marks":[{"type":"code_inline"},{"type":"strong"}]},{"text":"=DELETE) in COLUMNS list + ","type":"text","marks":[{"type":"strong"}]},{"text":"$.__op","type":"text","marks":[{"type":"code_inline"},{"type":"strong"}]},{"text":" in ","type":"text","marks":[{"type":"strong"}]},{"text":"jsonpaths","type":"text","marks":[{"type":"code_inline"},{"type":"strong"}]},{"text":" — all events are silently treated as UPSERT. ","type":"text"},{"text":"The ","type":"text","marks":[{"type":"strong"}]},{"text":"__op","type":"text","marks":[{"type":"code_inline"},{"type":"strong"}]},{"text":" contract is a pair and must be taught as a pair","type":"text","marks":[{"type":"strong"}]},{"text":": the literal column name is ","type":"text"},{"text":"__op","type":"text","marks":[{"type":"code_inline"}]},{"text":", and the integer values are ","type":"text"},{"text":"__op=0","type":"text","marks":[{"type":"code_inline"}]},{"text":" for UPSERT ","type":"text"},{"text":"and","type":"text","marks":[{"type":"strong"}]},{"text":" ","type":"text"},{"text":"__op=1","type":"text","marks":[{"type":"code_inline"}]},{"text":" for DELETE. Even when the user only asks about DELETE, your response MUST state ","type":"text"},{"text":"both","type":"text","marks":[{"type":"strong"}]},{"text":" mappings (","type":"text"},{"text":"__op=0","type":"text","marks":[{"type":"code_inline"}]},{"text":" → UPSERT, ","type":"text"},{"text":"__op=1","type":"text","marks":[{"type":"code_inline"}]},{"text":" → DELETE) — never one without the other. This applies on every ingestion path including Flink Connector and Kafka Connector, where the connector populates ","type":"text"},{"text":"__op","type":"text","marks":[{"type":"code_inline"}]},{"text":" for the user but they still need both values to debug \"DELETE not applied\" / \"UPSERT not applied\" symptoms.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Treating ","type":"text","marks":[{"type":"strong"}]},{"text":"partial_update=true","type":"text","marks":[{"type":"code_inline"},{"type":"strong"}]},{"text":" as a DELETE enabler","type":"text","marks":[{"type":"strong"}]},{"text":" — it controls partial-column UPSERT and has ","type":"text"},{"text":"nothing to do","type":"text","marks":[{"type":"strong"}]},{"text":" with DELETE. If a user enables it while asking why DELETE doesn't work, ","type":"text"},{"text":"flag it as misconfigured-for-intent","type":"text","marks":[{"type":"strong"}]},{"text":" and tell them to remove it unless they actually have a partial-column UPSERT use case. Do not validate the existing setting just because it parses.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"COLUMNS FROM PATH AS (...)","type":"text","marks":[{"type":"code_inline"},{"type":"strong"}]},{"text":" in Routine Load","type":"text","marks":[{"type":"strong"}]},{"text":" — that's Broker Load's Hive-partition path syntax; not valid in Routine Load.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"__op","type":"text","marks":[{"type":"code_inline"},{"type":"strong"}]},{"text":" values as strings (","type":"text","marks":[{"type":"strong"}]},{"text":"\"upsert\"","type":"text","marks":[{"type":"code_inline"},{"type":"strong"}]},{"text":"/","type":"text","marks":[{"type":"strong"}]},{"text":"\"delete\"","type":"text","marks":[{"type":"code_inline"},{"type":"strong"}]},{"text":")","type":"text","marks":[{"type":"strong"}]},{"text":" — must be the integers ","type":"text"},{"text":"0","type":"text","marks":[{"type":"code_inline"}]},{"text":" / ","type":"text"},{"text":"1","type":"text","marks":[{"type":"code_inline"}]},{"text":".","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"High-throughput CDC (≥ ~10K events/sec) without flagging TOO_MANY_VERSION risk","type":"text","marks":[{"type":"strong"}]},{"text":" — applies to Routine Load, Flink Connector, Kafka Connector, not just ","type":"text"},{"text":"INSERT INTO VALUES","type":"text","marks":[{"type":"code_inline"}]},{"text":". Whenever the user's scenario implies high event rate, the recommendation MUST cover: (a) the method-appropriate concurrency cap (","type":"text"},{"text":"desired_concurrent_number","type":"text","marks":[{"type":"code_inline"}]},{"text":" ≤ Kafka partitions for Routine Load; ","type":"text"},{"text":"sink.parallelism","type":"text","marks":[{"type":"code_inline"}]},{"text":" ≤ Kafka partitions for Flink/Kafka Connector), AND (b) an explicit TOO_MANY_VERSION / compaction-pressure warning with the relevant flush-interval guidance.","type":"text"}]}]}]},{"type":"paragraph","content":[{"text":"See ","type":"text"},{"text":"references/data-import.md","type":"text","marks":[{"type":"link","attrs":{"href":"references/data-import.md","title":null}}]},{"text":".","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"3. SQL development","type":"text"}]},{"type":"table","attrs":{"layout":null},"content":[{"type":"tr","content":[{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Use case","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Pattern","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Period-over-period / cumulative / Top-N","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Window functions","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Large fact table JOIN small dimension (right side ≤ ","type":"text"},{"text":"broadcast_row_limit","type":"text","marks":[{"type":"code_inline"}]},{"text":", default 15M rows)","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Broadcast / Colocate","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Complex layered logic","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"CTE","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Billion-scale deduplication","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"APPROX_COUNT_DISTINCT","type":"text","marks":[{"type":"code_inline"}]},{"text":" / BITMAP / HLL","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"High-frequency repeated query acceleration","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Asynchronous materialized view","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Cross-source query","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"External Catalog","type":"text"}]}]}]}]},{"type":"paragraph","content":[{"text":"⚠ Anti-patterns — do not produce these in query rewrites or tuning advice:","type":"text","marks":[{"type":"strong"}]}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Wrapping the partition column with ","type":"text","marks":[{"type":"strong"}]},{"text":"date_format()","type":"text","marks":[{"type":"code_inline"},{"type":"strong"}]},{"text":" / ","type":"text","marks":[{"type":"strong"}]},{"text":"date_trunc()","type":"text","marks":[{"type":"code_inline"},{"type":"strong"}]},{"text":" / ","type":"text","marks":[{"type":"strong"}]},{"text":"cast()","type":"text","marks":[{"type":"code_inline"},{"type":"strong"}]},{"text":" in WHERE","type":"text","marks":[{"type":"strong"}]},{"text":" — breaks partition pruning; rewrite as a range predicate (","type":"text"},{"text":"col >= '...' AND col \u003c '...'","type":"text","marks":[{"type":"code_inline"}]},{"text":").","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Tuning advice without ","type":"text","marks":[{"type":"strong"}]},{"text":"EXPLAIN VERBOSE","type":"text","marks":[{"type":"code_inline"},{"type":"strong"}]},{"text":" + checking ","type":"text","marks":[{"type":"strong"}]},{"text":"partitions=N/M","type":"text","marks":[{"type":"code_inline"},{"type":"strong"}]},{"text":" and ","type":"text","marks":[{"type":"strong"}]},{"text":"tabletRatio=N/M","type":"text","marks":[{"type":"code_inline"},{"type":"strong"}]},{"text":" — pruning failures (numerator == denominator) go undetected; never use plain ","type":"text"},{"text":"EXPLAIN","type":"text","marks":[{"type":"code_inline"}]},{"text":" for this.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Reading ","type":"text","marks":[{"type":"strong"}]},{"text":"cardinality","type":"text","marks":[{"type":"code_inline"},{"type":"strong"}]},{"text":" in EXPLAIN as the result row count","type":"text","marks":[{"type":"strong"}]},{"text":" — it's the ","type":"text"},{"text":"CBO's row estimate","type":"text","marks":[{"type":"strong"}]},{"text":". Always quantify the staleness gap using the ","type":"text"},{"text":"direct comparison ","type":"text","marks":[{"type":"strong"}]},{"text":"cardinality","type":"text","marks":[{"type":"code_inline"},{"type":"strong"}]},{"text":" vs the user-stated total table size","type":"text","marks":[{"type":"strong"}]},{"text":" (e.g. \"estimate 5M vs total 500M ≈ 100×\"); a ratio > 10× means stats are stale → run ","type":"text"},{"text":"ANALYZE TABLE","type":"text","marks":[{"type":"code_inline"}]},{"text":".","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Estimating \"real filtered rows\" by guessing predicate selectivity, then comparing cardinality to that guess","type":"text","marks":[{"type":"strong"}]},{"text":" — you don't have runtime row counts, and guessing selectivity from a predicate like ","type":"text"},{"text":"WHERE create_time > '...'","type":"text","marks":[{"type":"code_inline"}]},{"text":" introduces large errors (you don't know the data distribution). When the user gives you a total row count, compare ","type":"text"},{"text":"cardinality","type":"text","marks":[{"type":"code_inline"}]},{"text":" to that ","type":"text"},{"text":"directly","type":"text","marks":[{"type":"strong"}]},{"text":"; do not divide the total by an assumed time window or selectivity factor.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Conflating ","type":"text","marks":[{"type":"strong"}]},{"text":"partitions","type":"text","marks":[{"type":"code_inline"},{"type":"strong"}]},{"text":"/","type":"text","marks":[{"type":"strong"}]},{"text":"tabletRatio","type":"text","marks":[{"type":"code_inline"},{"type":"strong"}]},{"text":" pruning failures with ","type":"text","marks":[{"type":"strong"}]},{"text":"cardinality","type":"text","marks":[{"type":"code_inline"},{"type":"strong"}]},{"text":" deviation","type":"text","marks":[{"type":"strong"}]},{"text":" — these are ","type":"text"},{"text":"two independent diagnostic signals","type":"text","marks":[{"type":"strong"}]},{"text":". When both look bad in the same OlapScanNode (e.g. ","type":"text"},{"text":"partitions=N/N","type":"text","marks":[{"type":"code_inline"}]},{"text":" AND ","type":"text"},{"text":"cardinality","type":"text","marks":[{"type":"code_inline"}]},{"text":" off from total table size by 10×–100×), report them as ","type":"text"},{"text":"separate findings with separate fixes","type":"text","marks":[{"type":"strong"}]},{"text":" (predicate/type fix vs ","type":"text"},{"text":"ANALYZE TABLE","type":"text","marks":[{"type":"code_inline"}]},{"text":"). Do not use cardinality deviation to \"explain\" pruning failure, and do not let pruning failure absorb the stale-stats finding.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Recommending BE/CN scale-out before plan/stats analysis","type":"text","marks":[{"type":"strong"}]},{"text":" — SQL/stats fixes precede capacity changes.","type":"text"}]}]}]},{"type":"paragraph","content":[{"text":"See ","type":"text"},{"text":"references/sql.md","type":"text","marks":[{"type":"link","attrs":{"href":"references/sql.md","title":null}}]},{"text":".","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"4. Cluster diagnostics","type":"text"}]},{"type":"paragraph","content":[{"text":"Diagnostic order:","type":"text"}]},{"type":"ordered_list","attrs":{"order":1,"listStyle":"number"},"content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Identify architecture (shared-nothing / shared-data) → ","type":"text"},{"text":"SHOW WAREHOUSES","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"FE → ","type":"text"},{"text":"SHOW FRONTENDS","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"BE or CN → ","type":"text"},{"text":"SHOW BACKENDS","type":"text","marks":[{"type":"code_inline"}]},{"text":" / ","type":"text"},{"text":"SHOW COMPUTE NODES","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Warehouse (shared-data only) → ","type":"text"},{"text":"SHOW WAREHOUSES","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Tablet health overview → ","type":"text"},{"text":"SHOW PROC '/statistic'","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Scheduling queue → ","type":"text"},{"text":"information_schema.fe_tablet_schedules","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Compaction → ","type":"text"},{"text":"information_schema.be_compactions","type":"text","marks":[{"type":"code_inline"}]},{"text":" / ","type":"text"},{"text":"be_cloud_native_compactions","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Recent 24-hour loads → ","type":"text"},{"text":"information_schema.loads","type":"text","marks":[{"type":"code_inline"}]}]}]}]},{"type":"paragraph","content":[{"text":"⚠ Anti-patterns — do not produce these in diagnostic conclusions:","type":"text","marks":[{"type":"strong"}]}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Restarting BE/CN or scaling out before checking ","type":"text","marks":[{"type":"strong"}]},{"text":"information_schema.fe_tablet_schedules","type":"text","marks":[{"type":"code_inline"},{"type":"strong"}]},{"text":" — may collide with in-flight clone/decommission; root cause first.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Subjectively downgrading ","type":"text","marks":[{"type":"strong"}]},{"text":"UnhealthyTabletNum > 0","type":"text","marks":[{"type":"code_inline"},{"type":"strong"}]},{"text":" — always ","type":"text"},{"text":"critical","type":"text","marks":[{"type":"strong"}]},{"text":" per the severity table, never \"medium\" or \"low\" risk; the cluster has unhealthy replicas.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Treating ","type":"text","marks":[{"type":"strong"}]},{"text":"CloningTabletNum > 0","type":"text","marks":[{"type":"code_inline"},{"type":"strong"}]},{"text":" as a separate problem","type":"text","marks":[{"type":"strong"}]},{"text":" — clone is the recovery action triggered by ","type":"text"},{"text":"UnhealthyTabletNum","type":"text","marks":[{"type":"code_inline"}]},{"text":", not an independent fault signal.","type":"text"}]}]}]},{"type":"paragraph","content":[{"text":"See ","type":"text"},{"text":"references/diagnostics.md","type":"text","marks":[{"type":"link","attrs":{"href":"references/diagnostics.md","title":null}}]},{"text":".","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"5. Cluster connection (base layer)","type":"text"}]},{"type":"table","attrs":{"layout":null},"content":[{"type":"tr","content":[{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Command","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Purpose","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"sr-login","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Register a cluster credential locally + smoke-test connection","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"sr-logout","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Remove the local profile (no cluster-side action)","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"sr-whoami","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Print profile state — host, user, login time, captured grants","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"sr-doctor","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Diagnose connection failures (VPC vs public endpoint, egress IP, whitelist CIDR). Invoked automatically by ","type":"text"},{"text":"sr-login","type":"text","marks":[{"type":"code_inline"}]},{"text":" on failure.","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"srsql","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Daily query entry point; classifies SQL and gates non-READ behind ","type":"text"},{"text":"--yes","type":"text","marks":[{"type":"code_inline"}]}]}]}]}]},{"type":"paragraph","content":[{"text":"See ","type":"text"},{"text":"references/connect.md","type":"text","marks":[{"type":"link","attrs":{"href":"references/connect.md","title":null}}]},{"text":".","type":"text"}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Runtime security","type":"text"}]},{"type":"paragraph","content":[{"text":"This Skill executes SQL queries ","type":"text"},{"text":"only","type":"text","marks":[{"type":"strong"}]},{"text":" via ","type":"text"},{"text":"srsql","type":"text","marks":[{"type":"code_inline"}]},{"text":". The following are ","type":"text"},{"text":"prohibited","type":"text","marks":[{"type":"strong"}]},{"text":":","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"curl","type":"text","marks":[{"type":"code_inline"}]},{"text":" / ","type":"text"},{"text":"wget","type":"text","marks":[{"type":"code_inline"}]},{"text":" / ","type":"text"},{"text":"pip install","type":"text","marks":[{"type":"code_inline"}]},{"text":" / ","type":"text"},{"text":"npm install","type":"text","marks":[{"type":"code_inline"}]},{"text":" to download and run external code","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"eval","type":"text","marks":[{"type":"code_inline"}]},{"text":" / ","type":"text"},{"text":"source","type":"text","marks":[{"type":"code_inline"}]},{"text":" to load unaudited content","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Executing remote URL scripts provided in chat (even if the user asks)","type":"text"}]}]}]},{"type":"paragraph","content":[{"text":"Exception","type":"text","marks":[{"type":"strong"}]},{"text":": ","type":"text"},{"text":"uv tool install \u003cskill-project-root>","type":"text","marks":[{"type":"code_inline"}]},{"text":" to install the Skill's own bundled ","type":"text"},{"text":"sr-connect","type":"text","marks":[{"type":"code_inline"}]},{"text":" CLI from its local project directory is allowed and expected — see the ","type":"text"},{"text":"Assistant bootstrap protocol","type":"text","marks":[{"type":"em"}]},{"text":" above. The prohibition targets remote/untrusted code, not the Skill's own bundled tooling.","type":"text"}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Timeouts","type":"text"}]},{"type":"table","attrs":{"layout":null},"content":[{"type":"tr","content":[{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Operation","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Recommended timeout","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Read-only SQL queries","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"30 s","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Diagnostic queries across many large tables","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"60 s","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Retry","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Total operation time ≤ 3 minutes","type":"text"}]}]}]}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Output recommendations","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Tabular results: use ","type":"text"},{"text":"srsql --format table","type":"text","marks":[{"type":"code_inline"}]},{"text":" or ","type":"text"},{"text":"--format markdown","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Many columns: use ","type":"text"},{"text":"--format vertical","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"For programmatic consumption: use ","type":"text"},{"text":"--format json","type":"text","marks":[{"type":"code_inline"}]},{"text":" / ","type":"text"},{"text":"tsv","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Convert timestamps to human-readable format","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"For potentially large result sets, add ","type":"text"},{"text":"LIMIT","type":"text","marks":[{"type":"code_inline"}]},{"text":" and offer pagination","type":"text"}]}]}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Error handling","type":"text"}]},{"type":"table","attrs":{"layout":null},"content":[{"type":"tr","content":[{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Error","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Cause","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Action","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Cannot connect to host:port","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Wrong endpoint type / IP not whitelisted","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"sr-login","type":"text","marks":[{"type":"code_inline"}]},{"text":" auto-runs ","type":"text"},{"text":"sr-doctor","type":"text","marks":[{"type":"code_inline"}]},{"text":" on connection failure. Read its output: it detects VPC vs public endpoint, suggests the public swap (for unreachable ","type":"text"},{"text":"-internal","type":"text","marks":[{"type":"code_inline"}]},{"text":" hosts) or shows the egress IP + suggested /24 whitelist CIDR (for unreachable public hosts). Pass the recommendation to the user verbatim. See ","type":"text"},{"text":"references/connect.md","type":"text","marks":[{"type":"link","attrs":{"href":"references/connect.md#connection-troubleshooting-sr-doctor","title":null}}]},{"text":".","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Access denied for user 'X'","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Stale password / account locked / wrong account","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Re-run ","type":"text"},{"text":"sr-login","type":"text","marks":[{"type":"code_inline"}]},{"text":" to update the stored password","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Refusing to execute non-READ SQL without --yes","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Skill correctly classified the SQL as mutating","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Confirm with user, then re-run with ","type":"text"},{"text":"--yes","type":"text","marks":[{"type":"code_inline"}]}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Privilege denied: OPERATE / SELECT / ...","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"User account lacks the privilege","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Surface the limitation; skip the affected diagnostic; don't retry","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Table not found","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Wrong DB / table name","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Confirm with ","type":"text"},{"text":"SHOW DATABASES","type":"text","marks":[{"type":"code_inline"}]},{"text":" / ","type":"text"},{"text":"SHOW TABLES FROM db","type":"text","marks":[{"type":"code_inline"}]}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Query returns empty but user expects rows","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Over-aggressive predicate / RBAC isolation","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Check WHERE clauses; suggest the user verify with admin","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"No profile 'X'","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"srsql --profile X","type":"text","marks":[{"type":"code_inline"}]},{"text":" without prior ","type":"text"},{"text":"sr-login --profile X","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Run ","type":"text"},{"text":"sr-login","type":"text","marks":[{"type":"code_inline"}]},{"text":" for that profile first","type":"text"}]}]}]}]},{"type":"paragraph","content":[{"text":"Principle","type":"text","marks":[{"type":"strong"}]},{"text":": Read the full error message before deciding; do not retry blindly on the error code alone.","type":"text"}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Related documents","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"references/connect.md","type":"text","marks":[{"type":"link","attrs":{"href":"references/connect.md","title":null}}]},{"text":" — sr-connect CLI, install, security model, troubleshooting","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"references/ram-policies.md","type":"text","marks":[{"type":"link","attrs":{"href":"references/ram-policies.md","title":null}}]},{"text":" — RAM permission declaration (none required; StarRocks-internal auth only)","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"references/schema.md","type":"text","marks":[{"type":"link","attrs":{"href":"references/schema.md","title":null}}]},{"text":" — schema design flow: table models, partitioning, bucketing, sort key, indexes, storage parameters","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"references/data-import.md","type":"text","marks":[{"type":"link","attrs":{"href":"references/data-import.md","title":null}}]},{"text":" — ingestion method selection, parameters, performance tuning, Primary Key updates","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"references/sql.md","type":"text","marks":[{"type":"link","attrs":{"href":"references/sql.md","title":null}}]},{"text":" — query writing, window functions, materialized views, functions, SQL tuning, advanced features","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"references/diagnostics.md","type":"text","marks":[{"type":"link","attrs":{"href":"references/diagnostics.md","title":null}}]},{"text":" — cluster health inspection flow, severity classification, synthesis template","type":"text"}]}]}]},{"type":"hr","attrs":{"markup":"---"}}]},"metadata":{"date":"2026-06-05","name":"alibabacloud-emr-starrocks-assistant","author":"@skillopedia","source":{"stars":133,"repo_name":"alibabacloud-aiops-skills","origin_url":"https://github.com/aliyun/alibabacloud-aiops-skills/blob/HEAD/skills/analyticscomputing/emapreduce/alibabacloud-emr-starrocks-assistant/SKILL.md","repo_owner":"aliyun","body_sha256":"db5f75eaf1498d6fe916172b8bb3f7161e1bb8543706fafed6b59366374c4976","cluster_key":"014452159f3705794745dff37ecdeafd6ec11ca23559f9a7bed5b3aeec61f9ca","clean_bundle":{"format":"clean-skill-bundle-v1","source":"aliyun/alibabacloud-aiops-skills/skills/analyticscomputing/emapreduce/alibabacloud-emr-starrocks-assistant/SKILL.md","attachments":[{"id":"23cfb3dc-6ded-557e-8482-eea4afbcef4a","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/23cfb3dc-6ded-557e-8482-eea4afbcef4a/attachment.md","path":"references/connect.md","size":9457,"sha256":"69f0a5633e74b03e9ec79e495f892ca9a196976c6e4f09c194d0ba5595d226f0","contentType":"text/markdown; charset=utf-8"},{"id":"eb482aca-6f6f-5c1a-a08c-313b89b2b263","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/eb482aca-6f6f-5c1a-a08c-313b89b2b263/attachment.md","path":"references/data-import.md","size":9649,"sha256":"1a046a2911f3f610119915c71f7c03a55fe67dcb5a8de31db9549712cf038969","contentType":"text/markdown; charset=utf-8"},{"id":"8dfb7ff7-4730-5971-8b2d-788e52c68474","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/8dfb7ff7-4730-5971-8b2d-788e52c68474/attachment.md","path":"references/data-import/broker-load.md","size":8678,"sha256":"a69ecb6949d69928044c21d4b6786d11d678cacd61750b8fa34c2441a0b112d3","contentType":"text/markdown; charset=utf-8"},{"id":"40f835d0-9877-537e-b571-7b77ba0367f8","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/40f835d0-9877-537e-b571-7b77ba0367f8/attachment.md","path":"references/data-import/connectors.md","size":10801,"sha256":"406944fbabdc3650b50f091f9eb2e66d4e3f38e9aa00d8e112dcf0f20290e3b5","contentType":"text/markdown; charset=utf-8"},{"id":"0f710979-587b-571a-8011-228754aca86e","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/0f710979-587b-571a-8011-228754aca86e/attachment.md","path":"references/data-import/insert-and-pipe.md","size":8048,"sha256":"f9af49489a1792ef6de17a31f25ba3f2fb8c975fdaeef4c40f86b5a018035461","contentType":"text/markdown; charset=utf-8"},{"id":"d5b4a18e-996b-5b6d-9355-49d338dadfda","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/d5b4a18e-996b-5b6d-9355-49d338dadfda/attachment.md","path":"references/data-import/performance-tuning.md","size":10631,"sha256":"575f2fcddc599dd85b1b9dc9a20813e7498d70f96c7d9d2c8d2046f2163cbac6","contentType":"text/markdown; charset=utf-8"},{"id":"5024a173-2502-567a-abd5-8049beefae7b","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/5024a173-2502-567a-abd5-8049beefae7b/attachment.md","path":"references/data-import/primary-key-updates.md","size":9488,"sha256":"a4c8f39bb926b4ae0ba9e49027db4bc6ecc957c790374f1d39a09735e3f8a779","contentType":"text/markdown; charset=utf-8"},{"id":"6262b285-dad1-52b7-aed5-1e34ecc4b571","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/6262b285-dad1-52b7-aed5-1e34ecc4b571/attachment.md","path":"references/data-import/routine-load.md","size":10944,"sha256":"3f15c85b31a08c7080aa47d837c317f630542e0a12863f806760af9a9a68b769","contentType":"text/markdown; charset=utf-8"},{"id":"ca35ae43-a53b-5ddc-986c-ef721fd91048","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/ca35ae43-a53b-5ddc-986c-ef721fd91048/attachment.md","path":"references/data-import/stream-load.md","size":10374,"sha256":"edd5ab00df3475911e20384b3dc687960c292acf4ba8d6048da0e94360a7e75e","contentType":"text/markdown; charset=utf-8"},{"id":"f5e60df6-d7b6-5528-b1d4-b9536771f71b","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/f5e60df6-d7b6-5528-b1d4-b9536771f71b/attachment.md","path":"references/diagnostics.md","size":8431,"sha256":"e6424f2a533f448b8fe0b0185048028b57c1993bf0662e1234396e7334099f92","contentType":"text/markdown; charset=utf-8"},{"id":"db06a2f2-19dc-5685-9149-21b7a54cc845","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/db06a2f2-19dc-5685-9149-21b7a54cc845/attachment.md","path":"references/ram-policies.md","size":552,"sha256":"9098cc06baf7dfd5643adac2e54aabdf617d7a1b4435e0df23b0bf498de8b1b6","contentType":"text/markdown; charset=utf-8"},{"id":"6994b6da-e097-5687-8816-e2dd6a1dc1c0","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/6994b6da-e097-5687-8816-e2dd6a1dc1c0/attachment.md","path":"references/schema.md","size":7108,"sha256":"b7d2a6fa0786a1a4067588b594caa6f1689cbb3319cc253f274e4c6424bae171","contentType":"text/markdown; charset=utf-8"},{"id":"e20e77fb-f60d-5d3f-990f-4d95f85c90ce","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/e20e77fb-f60d-5d3f-990f-4d95f85c90ce/attachment.md","path":"references/schema/bucketing.md","size":9272,"sha256":"865dca8316315cb2a1e258696e52777ff551d87976bb90ef57a7cc530771b9a7","contentType":"text/markdown; charset=utf-8"},{"id":"f61146a1-d2b2-53ab-8f2a-0cbadd40f470","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/f61146a1-d2b2-53ab-8f2a-0cbadd40f470/attachment.md","path":"references/schema/partitioning.md","size":9097,"sha256":"b1e3fdc5f7347e892408149dd60f3c359f5fe7ca9a8a46aef15426226273c4b2","contentType":"text/markdown; charset=utf-8"},{"id":"88881a58-e181-5963-bc3b-005bd6b4e47b","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/88881a58-e181-5963-bc3b-005bd6b4e47b/attachment.md","path":"references/schema/sort-key-and-indexes.md","size":10515,"sha256":"4a006fc61134a79e9714625b412b51723dd96982a508ebf2f60326f7ed445681","contentType":"text/markdown; charset=utf-8"},{"id":"cea6bb37-0172-5d49-8908-0b05cde3e526","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/cea6bb37-0172-5d49-8908-0b05cde3e526/attachment.md","path":"references/schema/storage-properties.md","size":11056,"sha256":"1c4d00916b8fe87db7b8e7070a39626cd8425db32ed274c933fcd1b38302ac32","contentType":"text/markdown; charset=utf-8"},{"id":"2708a941-f484-58b5-ad02-48fd333a6995","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/2708a941-f484-58b5-ad02-48fd333a6995/attachment.md","path":"references/schema/table-types.md","size":8810,"sha256":"497f58bbc3c026e9011466478c12ce90b8db4ac16b664d601e6d557bf1df9819","contentType":"text/markdown; charset=utf-8"},{"id":"6877799c-f72e-5918-a53c-a339da419c0a","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/6877799c-f72e-5918-a53c-a339da419c0a/attachment.md","path":"references/sql.md","size":7059,"sha256":"cf706bec757d469a28b11325b8c404ebdb637cf478892326b3fc5644ce7e2137","contentType":"text/markdown; charset=utf-8"},{"id":"38a8afc5-f160-5f63-83bc-f85a46044e8a","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/38a8afc5-f160-5f63-83bc-f85a46044e8a/attachment.md","path":"references/sql/advanced-sql.md","size":11108,"sha256":"6666350c52b7577fd5f450f822f061d44b2d6e49f4d61e0dca17b4aee8b3920c","contentType":"text/markdown; charset=utf-8"},{"id":"baafaba6-5536-53d6-b8d9-18d46c9b6f3a","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/baafaba6-5536-53d6-b8d9-18d46c9b6f3a/attachment.md","path":"references/sql/functions-guide.md","size":12361,"sha256":"17f87077945d3a43e8b55c3e3a34be59234fca74e5428581fdac57c6520669b0","contentType":"text/markdown; charset=utf-8"},{"id":"b3018841-5215-5579-bf86-fc47d723a92a","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/b3018841-5215-5579-bf86-fc47d723a92a/attachment.md","path":"references/sql/materialized-views.md","size":9310,"sha256":"4901f7a2cb0effd1d8e62899a08e394cbee0f4b12ec1c704bd1cb558ace7e6e8","contentType":"text/markdown; charset=utf-8"},{"id":"d651e918-c780-5168-8a5d-3a8d3b35d34a","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/d651e918-c780-5168-8a5d-3a8d3b35d34a/attachment.md","path":"references/sql/query-writing.md","size":13368,"sha256":"bd6e739979666ab5378d741cf4d06a94c307eddaa405f9745bfdd67e876ad65f","contentType":"text/markdown; charset=utf-8"},{"id":"98d82175-2056-554f-be9e-84b4deb24100","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/98d82175-2056-554f-be9e-84b4deb24100/attachment.md","path":"references/sql/sql-tuning-checklist.md","size":12667,"sha256":"e516738af63bb7e2b233ed3db64ddaa6123a1ae975c6c4237850e40c66a5c4cb","contentType":"text/markdown; charset=utf-8"},{"id":"c81bcba4-e193-5c81-8386-47b681db7903","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/c81bcba4-e193-5c81-8386-47b681db7903/attachment.md","path":"references/sql/window-functions.md","size":10690,"sha256":"df043f79b1f60c439863c13e6d789449182f2f5c6ae3f47c17d1a82ea8c01636","contentType":"text/markdown; charset=utf-8"},{"id":"e86270c8-505f-519a-9436-456139a78258","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/e86270c8-505f-519a-9436-456139a78258/attachment.py","path":"scripts/sr_connect/__init__.py","size":109,"sha256":"e4d16a2f96c0e06e15e8ec717b8c8f97e7f3b11e5df0645712af4034ce8c435b","contentType":"text/x-python; charset=utf-8"},{"id":"b66ae27c-ad31-5f7e-b801-e3ada787eb92","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/b66ae27c-ad31-5f7e-b801-e3ada787eb92/attachment.py","path":"scripts/sr_connect/classify.py","size":10505,"sha256":"c09a558b87ded3f30c204d80443594aa64e9761f79e005e2534f455a8f55938c","contentType":"text/x-python; charset=utf-8"},{"id":"70549380-475e-579c-a51a-2c2ca788d6ff","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/70549380-475e-579c-a51a-2c2ca788d6ff/attachment.py","path":"scripts/sr_connect/cli.py","size":5439,"sha256":"c50efc81a9d93d8dde46c2546e9054e8e7d5f2b3813d47d240cc308b1e941b42","contentType":"text/x-python; charset=utf-8"},{"id":"8ce0b9f7-d1bf-5476-965b-8ab0a528eefc","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/8ce0b9f7-d1bf-5476-965b-8ab0a528eefc/attachment.py","path":"scripts/sr_connect/config.py","size":3758,"sha256":"b242636ab89983bad4029281cbf7884f52d827c7366bbc541aeea3c2f31880d1","contentType":"text/x-python; charset=utf-8"},{"id":"2ee1f300-e1a1-5d8f-82d3-77a22c731f80","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/2ee1f300-e1a1-5d8f-82d3-77a22c731f80/attachment.py","path":"scripts/sr_connect/connection.py","size":2358,"sha256":"759f78f41e85217d6c708efb2087f296264eb95a43468ac2ad5b0e8aa9ba973b","contentType":"text/x-python; charset=utf-8"},{"id":"0d036cd0-d429-5d51-a138-c91cc6c10f38","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/0d036cd0-d429-5d51-a138-c91cc6c10f38/attachment.py","path":"scripts/sr_connect/doctor.py","size":7128,"sha256":"df845da8c29c216fbe8aaa68a22c8865c8bbe9632351cd38e2099d46d9654005","contentType":"text/x-python; charset=utf-8"},{"id":"69944ea5-906e-5d52-a3b4-ea66267838c7","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/69944ea5-906e-5d52-a3b4-ea66267838c7/attachment.py","path":"scripts/sr_connect/errors.py","size":281,"sha256":"ff36797765adf660f2b1a1633c18c76fa7169ffcdab6674c8752f5e9591ee46d","contentType":"text/x-python; charset=utf-8"},{"id":"a8b83003-d8a1-5ab6-8087-95758c50fc22","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/a8b83003-d8a1-5ab6-8087-95758c50fc22/attachment.py","path":"scripts/sr_connect/login.py","size":1879,"sha256":"65b984eed4cce5c4233c5a768426f048a34c3b7538ae829244ed738e20c88b2f","contentType":"text/x-python; charset=utf-8"},{"id":"e32e2791-a02a-569a-b2a5-5eb50962ca6a","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/e32e2791-a02a-569a-b2a5-5eb50962ca6a/attachment.py","path":"scripts/sr_connect/logout.py","size":501,"sha256":"27c906b8715464a83018628481f146e878a25851cbe2d940491cc2dbbd6c8e04","contentType":"text/x-python; charset=utf-8"},{"id":"0085bf9f-35fc-5e54-98ff-e6bf358f2e88","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/0085bf9f-35fc-5e54-98ff-e6bf358f2e88/attachment.py","path":"scripts/sr_connect/query.py","size":5164,"sha256":"6a910b05421166c1c5b1576a8cbd6e0237e8b5923deb9fec2999dd4f57bf4c4a","contentType":"text/x-python; charset=utf-8"},{"id":"06b6666a-7155-5458-9208-e2d30c7ef04e","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/06b6666a-7155-5458-9208-e2d30c7ef04e/attachment.py","path":"scripts/sr_connect/whoami.py","size":1167,"sha256":"78ab38e6cc7183ec99ae52a828f9b323e940704c16c65d38992dc218e681e333","contentType":"text/x-python; charset=utf-8"}],"bundle_sha256":"641b66e3ced7222e3d871ef89a5c7e65ed73b7d0e92ca9a817df373f4341b847","attachment_count":35,"text_attachments":35,"attachment_storage":"skillopedia-attachments-v1","binary_attachments":0,"excluded_attachments":[]},"cluster_size":1,"skill_md_path":"skills/analyticscomputing/emapreduce/alibabacloud-emr-starrocks-assistant/SKILL.md","import_metadata":{"date":"2026-06-05","author":"@skillopedia","version":"v1","category":"data-analytics","category_label":"Data"},"exact_dupes_collapsed_into_this":0},"license":"MIT","version":"v1","category":"data-analytics","metadata":{"owner":"starrocks-team","domain":"aiops","contact":"[email protected]","required_starrocks_privileges":["Whatever the user's account already has. The skill does not create or elevate accounts.","For full feature coverage: SELECT on the databases of interest + OPERATE ON SYSTEM for cluster diagnostics. Missing privileges degrade gracefully (specific suggestions are skipped)."]},"import_tag":"clean-skills-v1","description":"Alibaba Cloud EMR Serverless StarRocks development & operations assistant. Covers five scenarios: cluster connection, schema design, data ingestion, SQL development & tuning, and cluster health diagnostics. Use this Skill when users ask about StarRocks table design, writing SQL, choosing an ingestion method, query execution plans, materialized views, cluster health checks, FE/BE/CN node status, tablet health, or compaction. Typical scenarios: table design, Stream Load / Routine Load / Broker Load selection, SQL optimization, window functions, CTEs, JOIN tuning, materialized view design, cluster health inspection, node-down diagnosis. Not applicable for: StarRocks instance lifecycle management (create / scale / restart / config change / version upgrade — these are control-plane operations, please use the EMR Serverless console or the corresponding OpenAPI), or other Alibaba Cloud products (EMR Cluster, Spark, Milvus, ClickHouse, Doris, RDS, ECS).\n","allowed-tools":"Bash Read","compatibility":"Python 3.10+ with uv; reachable StarRocks FE endpoint (default port 9030). On first use, run `sr-login` to register a cluster credential locally. Privileges follow the user's own account — capabilities (which databases / diagnostic commands are accessible) are introspected at login via SHOW GRANTS FOR CURRENT_USER().\n"}},"renderedAt":1782982029130}

Alibaba Cloud EMR Serverless StarRocks Development & Operations Assistant Help users perform day-to-day table design, data ingestion, SQL writing & tuning, and health diagnostics on Alibaba Cloud EMR Serverless StarRocks. All cluster access goes through the bundled CLI (pymysql-based, uses the user's own account); no MySQL client required. Non-READ SQL is classified by sqlglot and requires confirmation before execution. Scope statement : This Skill focuses on using StarRocks — development, diagnostics, and day-to-day data operations. Cluster-internal data and schema operations (DDL, DML, mate…