cheerio — Skillopedia

# Cheerio

## Overview

Cheerio is a fast, lightweight HTML/XML parser for Node.js that implements a jQuery-like API. Unlike Puppeteer, it does not run a browser. Instead it parses raw HTML strings, which makes it much faster and lighter and well suited to scraping server-rendered pages, parsing HTML files, and transforming HTML content. Pair it with `fetch` or `axios` for web scraping, or use it standalone for HTML processing.

## Instructions

### Step 1: Installation

```bash
npm install cheerio
```

The examples use ES module `import` syntax, so save them as `.mjs` files or set `"type": "module"` in `package.json`. Step 3 also relies on the global `fetch`, which is available in Node.js 18 and later.

### Step 2: Parse HTML and Extract Data

```javascript
// parse_html.js: load HTML and extract structured data with CSS selectors
import * as cheerio from 'cheerio'

const html = `
<html>
  <body>
    <h1>Products</h1>
    <div class="product" data-id="1">
      <h2>Widget Pro</h2>
      <span class="price">$29.99</span>
      <a href="/products/widget-pro">Details</a>
    </div>
    <div class="product" data-id="2">
      <h2>Gadget Max</h2>
      <span class="price">$49.99</span>
      <a href="/products/gadget-max">Details</a>
    </div>
  </body>
</html>`

const $ = cheerio.load(html)

// Extract all products
const products = []
$('.product').each((i, el) => {
  products.push({
    id: $(el).attr('data-id'),
    title: $(el).find('h2').text().trim(),
    price: $(el).find('.price').text().trim(),
    link: $(el).find('a').attr('href'),
  })
})

console.log(products)
// [{ id: '1', title: 'Widget Pro', price: '$29.99', link: '/products/widget-pro' }, ...]
```

### Step 3: Web Scraping with Fetch

```javascript
// scrape_site.js: fetch a page and extract data
import * as cheerio from 'cheerio'

async function scrape(url) {
  const response = await fetch(url)
  // fetch only rejects on network errors, so check the HTTP status explicitly
  if (!response.ok) {
    throw new Error(`Request failed: ${response.status} ${response.statusText} (${url})`)
  }
  const html = await response.text()
  const $ = cheerio.load(html)

  // Extract all links
  const links = []
  $('a[href]').each((i, el) => {
    links.push({
      text: $(el).text().trim(),
      href: $(el).attr('href'),
    })
  })

  // Extract meta tags (attr() returns undefined when a tag is missing)
  const meta = {
    title: $('title').text().trim(),
    description: $('meta[name="description"]').attr('content'),
    ogImage: $('meta[property="og:image"]').attr('content'),
  }

  return { links, meta }
}
```

### Step 4: Advanced Selectors and Traversal

```javascript
// selectors.js: complex CSS selectors and DOM traversal
import * as cheerio from 'cheerio'

const $ = cheerio.load(html) // `html` is the sample markup from Step 2

// Attribute selectors
$('a[href^="https"]')      // links whose href starts with "https"
$('img[src$=".png"]')      // PNG images
$('div[class*="product"]') // divs whose class attribute contains the substring "product"

// Traversal
$('.product').first()        // first product
$('.product').last()         // last product
$('.product').eq(2)          // third product (0-indexed; empty if there are fewer than three)
$('.price').parent()         // parent of each .price element
$('.product').children('h2') // direct h2 children
$('.product').find('.price') // descendants matching .price
$('.product').next()         // next sibling element
$('.product').prev()         // previous sibling element

// Filtering: keep products priced under $50
const affordable = $('.product').filter((i, el) => {
  const price = parseFloat($(el).find('.price').text().replace('$', ''))
  return price < 50
})

// Text and HTML
$('.product').first().text() // all text content, flattened
$('.product').first().html() // inner HTML
```

### Step 5: Table Extraction

```javascript
// extract_table.js: parse HTML tables into structured data

/**
 * Convert an HTML table to an array of objects, using the <thead> header cells as keys.
 * @param {import('cheerio').CheerioAPI} $ - Cheerio instance returned by cheerio.load()
 * @param {string} tableSelector - CSS selector for the table element
 * @returns {Array<Record<string, string>>}
 */
function extractTable($, tableSelector) {
  const headers = []
  $(`${tableSelector} thead th`).each((i, el) => {
    headers.push($(el).text().trim())
  })

  const rows = []
  $(`${tableSelector} tbody tr`).each((i, tr) => {
    const row = {}
    $(tr).find('td').each((j, td) => {
      row[headers[j]] = $(td).text().trim()
    })
    rows.push(row)
  })
  return rows
}

// Usage, assuming $ was loaded from a page that contains <table id="pricing-table">
const tableData = extractTable($, '#pricing-table')
// e.g. [{ Plan: 'Free', Price: '$0', Users: '1' }, { Plan: 'Pro', Price: '$29', Users: '10' }]
```

### Step 6: HTML Transformation

```javascript
// transform.js: modify HTML content
import * as cheerio from 'cheerio'

const $ = cheerio.load(html) // `html` is the sample markup from Step 2

// Add class
$('.product').addClass('featured')

// Remove elements
$('.ad-banner').remove()

// Replace content
$('h1').text('Updated Title')

// Wrap elements
$('.product').wrap('<section class="product-section"></section>')

// Add attributes
$('a').attr('target', '_blank')
$('img').attr('loading', 'lazy')

// Get modified HTML
const modifiedHtml = $.html()
```

## Examples

### Example 1: Build a price monitoring scraper

**User prompt:** "Scrape product prices from 5 competitor websites daily and save to a CSV. The sites are server-rendered (no JavaScript needed)."

The agent will:
1. Use `fetch` + cheerio for each site (no browser overhead).
2. Write site-specific selectors for product name, price, and availability.
3. Parse prices into numbers and normalize currency.
4. Append results to a CSV with timestamps.
5. Set up a cron job for daily execution.

### Example 2: Extract and clean article content from HTML

**User prompt:** "I have 1,000 saved HTML pages from a blog. Extract just the article title, author, date, and body text from each, ignoring navigation, ads, and footers."

The agent will:
1. Read each HTML file and load it with cheerio.
2. Extract content using article-specific selectors (`article`, `.post-content`, etc.).
3. Strip HTML tags from the body and normalize whitespace.
4. Output structured JSON with title, author, date, and clean text.

## Guidelines

- Use cheerio for server-rendered pages, where the fetched HTML already contains the data you need. For SPAs or JavaScript-rendered content, use Puppeteer instead.
- Cheerio does not execute JavaScript, fetch external resources, or render CSS; it only parses the HTML string you give it.
- Always call `.trim()` on extracted text: HTML often contains whitespace, newlines, and indentation that clutter results.
- Use `.attr('href')` and `.attr('src')` to get link and image URLs. These may be relative, so resolve them against the page's base URL.
- For large-scale scraping, cheerio is far faster and lighter than Puppeteer because it skips browser startup, rendering, and script execution. In practice, throughput is usually limited by network I/O and request delays rather than by parsing.
- Combine cheerio with `fetch` or `axios` for scraping, and add delays between requests to avoid overwhelming target servers.


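
Steps 3 and 4 of Example 1 (parse prices into numbers, append rows to a CSV) are easy to get subtly wrong. Thousands separators can break the number parsing, and product names that contain commas can break the CSV. The sketch below is one possible approach. `parsePrice`, `csvField`, and `toCsvRow` are illustrative names. The sketch assumes prices use `.` as the decimal separator and `,` for thousands. It only strips the currency symbol; converting between currencies is out of scope. Each product could then be appended with `fs.promises.appendFile('prices.csv', toCsvRow([new Date().toISOString(), name, parsePrice(priceText)]))`.

```javascript
// Hypothetical helpers for Example 1. parsePrice assumes '.' is the decimal separator
// and ',' groups thousands (e.g. "$1,299.99"); it returns null when no number is found.
function parsePrice(text) {
  const match = text.replace(/,/g, '').match(/\d+(?:\.\d+)?/)
  return match ? Number(match[0]) : null
}

// Quote a CSV field if it contains a comma, double quote, or line break (RFC 4180-style quoting).
function csvField(value) {
  const s = String(value ?? '')
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s
}

// Build one newline-terminated CSV line, e.g. from [timestamp, productName, price].
function toCsvRow(fields) {
  return fields.map(csvField).join(',') + '\n'
}
```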

---

Source: [terminalskills/skills, `skills/cheerio/SKILL.md`](https://github.com/terminalskills/skills/blob/HEAD/skills/cheerio/SKILL.md) · Author: terminal-skills · Version 1.0.0 · License: Apache-2.0 · Imported by @skillopedia on 2026-06-05
