apify-sdk-patterns — Skillopedia

# Apify SDK Patterns

## Overview

Production patterns for both the `apify` SDK (building Actors) and `apify-client` (calling Actors remotely). Covers Crawlee crawler selection, data storage, proxy configuration, and typed client wrappers.

## Prerequisites

- `apify-client` and/or `apify` + `crawlee` installed
- `APIFY_TOKEN` configured
- TypeScript recommended

## Pattern 1: Typed Client Singleton

```typescript
// src/apify/client.ts
import { ApifyClient } from 'apify-client';

let instance: ApifyClient | null = null;

export function getApifyClient(): ApifyClient {
  if (!instance) {
    const token = process.env.APIFY_TOKEN;
    if (!token) throw new Error('APIFY_TOKEN is required');
    instance = new ApifyClient({ token });
  }
  return instance;
}

// Reset for testing
export function resetClient(): void {
  instance = null;
}
```

## Pattern 2: Crawlee Crawler Selection

Choose the right crawler for the job:

```typescript
import { Actor } from 'apify';
import { CheerioCrawler, PlaywrightCrawler, PuppeteerCrawler } from 'crawlee';

// CHEERIO: fast, lightweight, no JavaScript rendering
// Use for: static HTML, server-rendered pages, APIs
const cheerioCrawler = new CheerioCrawler({
  async requestHandler({ request, $, enqueueLinks }) {
    const title = $('title').text();
    await Actor.pushData({ url: request.url, title });
    await enqueueLinks({ strategy: 'same-domain' });
  },
});

// PLAYWRIGHT: full browser, all engines, modern API
// Use for: SPAs, JavaScript-heavy pages, complex interactions
const playwrightCrawler = new PlaywrightCrawler({
  launchContext: { launchOptions: { headless: true } },
  async requestHandler({ page, request, enqueueLinks }) {
    await page.waitForSelector('h1');
    const title = await page.title();
    const content = await page.$eval('main', el => el.textContent);
    await Actor.pushData({ url: request.url, title, content });
    await enqueueLinks({ strategy: 'same-domain' });
  },
});

// PUPPETEER: Chromium-only browser automation
// Use for: when you need Chromium specifically or legacy Puppeteer code
const puppeteerCrawler = new PuppeteerCrawler({
  async requestHandler({ page, request }) {
    const title = await page.title();
    await Actor.pushData({ url: request.url, title });
  },
});
```

## Pattern 3: Actor Lifecycle with Error Handling

```typescript
import { Actor } from 'apify';
import { CheerioCrawler, log } from 'crawlee';

// Actor.main() wraps init + exit + error handling
await Actor.main(async () => {
  const input = await Actor.getInput<{
    startUrls: { url: string }[];
    maxPages?: number;
    proxyConfig?: { useApifyProxy: boolean; groups?: string[] };
  }>();

  if (!input?.startUrls?.length) {
    throw new Error('Input must include at least one startUrl');
  }

  // Configure proxy if requested
  const proxyConfiguration = input.proxyConfig?.useApifyProxy
    ? await Actor.createProxyConfiguration({
        groups: input.proxyConfig.groups,
      })
    : undefined;

  const crawler = new CheerioCrawler({
    proxyConfiguration,
    maxRequestsPerCrawl: input.maxPages ?? 50,
    maxConcurrency: 10,

    async requestHandler({ request, $, enqueueLinks }) {
      log.info(`Processing ${request.url}`);

      await Actor.pushData({
        url: request.url,
        title: $('title').text().trim(),
        h1: $('h1').first().text().trim(),
        paragraphs: $('p').map((_, el) => $(el).text().trim()).get(),
      });

      await enqueueLinks({ strategy: 'same-domain' });
    },

    // Called once a request has exhausted its retries
    async failedRequestHandler({ request }, error) {
      log.error(`Request failed: ${request.url}`, { error: error.message });
      await Actor.pushData({
        url: request.url,
        error: error.message,
        '#isFailed': true,
      });
    },
  });

  await crawler.run(input.startUrls.map(s => s.url));
  log.info(`Crawler finished. ${crawler.stats.state.requestsFinished} pages processed.`);
});
```

## Pattern 4: Dataset Operations

```typescript
import { Actor } from 'apify';
import { ApifyClient } from 'apify-client';

// --- Inside an Actor (apify SDK) ---

// Push single item
await Actor.pushData({ url: 'https://example.com', title: 'Example' });

// Push batch
await Actor.pushData([
  { url: 'https://a.com', price: 10 },
  { url: 'https://b.com', price: 20 },
]);

// Store named output in key-value store
await Actor.setValue('SUMMARY', {
  totalItems: 100,
  avgPrice: 15.50,
  crawledAt: new Date().toISOString(),
});

// Get value back
const summary = await Actor.getValue('SUMMARY');

// --- From external app (apify-client) ---
const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

// List dataset items with pagination
const { items, total } = await client
  .dataset('DATASET_ID')
  .listItems({ limit: 1000, offset: 0 });

// Push items to a named dataset
const dataset = await client.datasets().getOrCreate('my-results');
await client.dataset(dataset.id).pushItems([
  { url: 'https://example.com', data: 'scraped content' },
]);

// Download entire dataset
const csv = await client.dataset(dataset.id).downloadItems('csv');
const json = await client.dataset(dataset.id).downloadItems('json');
```

## Pattern 5: Key-Value Store Operations

```typescript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

// Create or get a named store
const store = await client.keyValueStores().getOrCreate('my-config');
const storeClient = client.keyValueStore(store.id);

// Set a record (any content type)
await storeClient.setRecord({
  key: 'CONFIG',
  value: { retries: 3, timeout: 30000 },
  contentType: 'application/json',
});

// Get a record
const record = await storeClient.getRecord('CONFIG');
console.log(record?.value); // { retries: 3, timeout: 30000 }

// Placeholder: a Buffer produced elsewhere (for example, a browser screenshot)
declare const screenshotBuffer: Buffer;

// Store binary data (screenshots, PDFs)
await storeClient.setRecord({
  key: 'screenshot.png',
  value: screenshotBuffer,
  contentType: 'image/png',
});

// List all keys
const { items: keys } = await storeClient.listKeys();
```

## Pattern 6: Proxy Configuration

```typescript
import { Actor } from 'apify';
import { CheerioCrawler } from 'crawlee';

// Datacenter proxy (included in subscription, fast)
const dcProxy = await Actor.createProxyConfiguration({
  groups: ['BUYPROXIES94952'],
});

// Residential proxy (pay per GB, high success rate)
const resProxy = await Actor.createProxyConfiguration({
  groups: ['RESIDENTIAL'],
  countryCode: 'US',
});

// Google SERP proxy (specialized for Google)
const serpProxy = await Actor.createProxyConfiguration({
  groups: ['GOOGLE_SERP'],
});

// Use with any crawler
const crawler = new CheerioCrawler({
  proxyConfiguration: dcProxy,
  // ...
});
```

## Pattern 7: Router for Multi-Page Actors

```typescript
import { Actor } from 'apify';
import { CheerioCrawler, createCheerioRouter } from 'crawlee';

const router = createCheerioRouter();

// Default route: listing pages
router.addDefaultHandler(async ({ request, $, enqueueLinks }) => {
  // Extract links to detail pages
  const detailLinks = $('a.product-link')
    .map((_, el) => $(el).attr('href'))
    .get();

  await enqueueLinks({
    urls: detailLinks,
    label: 'DETAIL',
  });
});

// Detail route: individual item pages
router.addHandler('DETAIL', async ({ request, $ }) => {
  await Actor.pushData({
    url: request.url,
    name: $('h1.product-name').text().trim(),
    // Assumes prices are prefixed with '$'
    price: parseFloat($('.price').text().replace('$', '')),
    description: $('div.description').text().trim(),
  });
});

await Actor.main(async () => {
  const crawler = new CheerioCrawler({
    requestHandler: router,
  });
  await crawler.run(['https://example-store.com/products']);
});
```

## Pattern 8: Safe Result Wrapper

```typescript
import { ApifyClient } from 'apify-client';

type Result<T> = { data: T; error: null } | { data: null; error: Error };

async function safeActorCall<T>(
  client: ApifyClient,
  actorId: string,
  input: Record<string, unknown>,
): Promise<Result<T[]>> {
  try {
    // timeout is given in seconds
    const run = await client.actor(actorId).call(input, { timeout: 300 });

    if (run.status !== 'SUCCEEDED') {
      return { data: null, error: new Error(`Run ${run.status}: ${run.statusMessage}`) };
    }

    const { items } = await client.dataset(run.defaultDatasetId).listItems();
    return { data: items as T[], error: null };
  } catch (err) {
    return { data: null, error: err as Error };
  }
}

// Usage
const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

const result = await safeActorCall<{ url: string; title: string }>(
  client, 'apify/web-scraper', { startUrls: [{ url: 'https://example.com' }] }
);

if (result.error) {
  console.error('Actor call failed:', result.error.message);
} else {
  console.log(`Got ${result.data.length} items`);
}
```

## Error Handling

| Pattern | Use Case | Benefit |
|---------|----------|---------|
| `Actor.main()` | Actor entry point | Auto init/exit + error reporting |
| `failedRequestHandler` | Per-request failures | Log failures without stopping crawl |
| Safe wrapper | External calls | Prevents uncaught exceptions |
| Router | Multi-page scrapes | Clean separation of page types |
| Proxy rotation | Anti-bot sites | Higher success rate |

## Resources

- [Apify SDK Reference](https://docs.apify.com/sdk/js/reference)
- [Crawlee Documentation](https://crawlee.dev/js/docs/quick-start)
- [Apify JS Client Reference](https://docs.apify.com/api/client/js/reference)
- [Proxy Management Guide](https://docs.apify.com/sdk/js/docs/guides/proxy-management)

## Next Steps

Apply patterns in `apify-core-workflow-a` for a complete web scraping workflow.
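The dataset example in Pattern 4 reads a single page of items using `limit` and `offset`. For datasets larger than one page, those calls can be repeated until `total` is reached. The following is a minimal sketch of that loop, not part of `apify-client`: `fetchAllItems`, `PageFetcher`, and `makeFakeFetcher` are hypothetical names introduced here, and the page fetcher is passed in as a function so the loop can be exercised without network access.

```typescript
// Hypothetical helper types (not part of apify-client).
type Page<T> = { items: T[]; total: number };
type PageFetcher<T> = (offset: number, limit: number) => Promise<Page<T>>;

// Collects every item by requesting consecutive pages until `total` is reached.
async function fetchAllItems<T>(fetchPage: PageFetcher<T>, limit = 1000): Promise<T[]> {
  const all: T[] = [];
  let offset = 0;
  while (true) {
    const { items, total } = await fetchPage(offset, limit);
    all.push(...items);
    offset += items.length;
    // Stop when everything is collected or the source returns an empty page.
    if (items.length === 0 || offset >= total) break;
  }
  return all;
}

// In-memory stand-in for a dataset, used only to demonstrate the loop.
function makeFakeFetcher<T>(source: T[]): PageFetcher<T> {
  return async (offset, limit) => ({
    items: source.slice(offset, offset + limit),
    total: source.length,
  });
}
```

With the real client, the fetcher could plausibly be `(offset, limit) => client.dataset('DATASET_ID').listItems({ offset, limit })`, since the `listItems` result in Pattern 4 exposes both `items` and `total`. Stopping on an empty page guards against looping forever if the reported total changes while reading.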
