Project Development Methodology This skill covers the principles for identifying tasks suited to LLM processing, designing effective project architectures, and iterating rapidly using agent-assisted development. The methodology applies whether building a batch processing pipeline, a multi-agent research system, or an interactive agent application. The unit of work for this skill is the whole project or a multi-stage pipeline. Individual tool design (descriptions, schemas, error messages) belongs to . Per-skill activation routing belongs to the corresponding skill plus the corpus index. This s…

, section, re.MULTILINE)\n return [item.strip() for item in items]\n```\n\n**Score Extraction with Validation**\n\n```python\ndef extract_score(text: str, field_name: str, min_val: int, max_val: int) -> int | None:\n \"\"\"Extract and validate numeric score.\"\"\"\n raw = extract_field(text, field_name)\n if not raw:\n return None\n \n # Extract first number from the value\n match = re.search(r'\\d+', raw)\n if not match:\n return None\n \n score = int(match.group())\n return max(min_val, min(max_val, score)) # Clamp to valid range\n```\n\n### Graceful Degradation\n\n```python\n@dataclass\nclass ParseResult:\n summary: str = \"\"\n score: int | None = None\n items: list[str] = field(default_factory=list)\n parse_errors: list[str] = field(default_factory=list)\n\ndef parse_response(text: str) -> ParseResult:\n \"\"\"Parse LLM response with graceful error handling.\"\"\"\n result = ParseResult()\n \n # Try each field, log errors but continue\n try:\n result.summary = extract_section(text, \"Summary\") or \"\"\n except Exception as e:\n result.parse_errors.append(f\"Summary extraction failed: {e}\")\n \n try:\n result.score = extract_score(text, \"Rating\", 1, 10)\n except Exception as e:\n result.parse_errors.append(f\"Score extraction failed: {e}\")\n \n try:\n result.items = extract_list_items(text, \"Analysis\")\n except Exception as e:\n result.parse_errors.append(f\"Items extraction failed: {e}\")\n \n return result\n```\n\n## Error Handling Patterns\n\n### Retry with Exponential Backoff\n\n```python\nimport time\nfrom functools import wraps\n\ndef retry_with_backoff(max_retries: int = 3, base_delay: float = 1.0):\n \"\"\"Retry decorator with exponential backoff.\"\"\"\n def decorator(func):\n @wraps(func)\n def wrapper(*args, **kwargs):\n last_exception = None\n for attempt in range(max_retries):\n try:\n return func(*args, **kwargs)\n except Exception as e:\n last_exception = e\n if attempt \u003c max_retries - 1:\n delay = base_delay * (2 ** attempt)\n time.sleep(delay)\n raise last_exception\n return wrapper\n return decorator\n```\n\n### Error Logging Pattern\n\n```python\nimport json\nfrom datetime import datetime\n\ndef log_error(item_dir: Path, stage: str, error: str, context: dict = None):\n \"\"\"Log error to file for later analysis.\"\"\"\n error_file = item_dir / \"errors.jsonl\"\n \n error_record = {\n \"timestamp\": datetime.now().isoformat(),\n \"stage\": stage,\n \"error\": error,\n \"context\": context or {},\n }\n \n with open(error_file, \"a\") as f:\n f.write(json.dumps(error_record) + \"\\n\")\n```\n\n### Partial Success Handling\n\n```python\ndef process_batch_with_partial_success(items: list) -> tuple[list, list]:\n \"\"\"Process batch, separating successes from failures.\"\"\"\n successes = []\n failures = []\n \n for item in items:\n try:\n result = process_item(item)\n successes.append((item, result))\n except Exception as e:\n failures.append((item, str(e)))\n log_error(item.directory, \"process\", str(e))\n \n # Report summary\n print(f\"Processed {len(items)} items: {len(successes)} succeeded, {len(failures)} failed\")\n \n return successes, failures\n```\n\n## Cost Estimation Patterns\n\n### Token Counting\n\n```python\nimport tiktoken\n\ndef count_tokens(text: str, model: str = \"gpt-4\") -> int:\n \"\"\"Count tokens for cost estimation.\"\"\"\n try:\n encoding = tiktoken.encoding_for_model(model)\n except KeyError:\n encoding = tiktoken.get_encoding(\"cl100k_base\")\n \n return len(encoding.encode(text))\n\ndef estimate_cost(\n input_tokens: int,\n output_tokens: int,\n input_price_per_mtok: float,\n output_price_per_mtok: float,\n) -> float:\n \"\"\"Estimate cost in dollars.\"\"\"\n input_cost = (input_tokens / 1_000_000) * input_price_per_mtok\n output_cost = (output_tokens / 1_000_000) * output_price_per_mtok\n return input_cost + output_cost\n```\n\n### Batch Cost Estimation\n\n```python\ndef estimate_batch_cost(\n items: list,\n prompt_template: str,\n avg_output_tokens: int = 1000,\n model_pricing: dict = None,\n) -> dict:\n \"\"\"Estimate total cost for a batch.\"\"\"\n model_pricing = model_pricing or {\n \"input_price_per_mtok\": 3.00, # Example: GPT-4 Turbo input\n \"output_price_per_mtok\": 15.00, # Example: GPT-4 Turbo output\n }\n \n total_input_tokens = 0\n for item in items:\n prompt = format_prompt(prompt_template, item)\n total_input_tokens += count_tokens(prompt)\n \n total_output_tokens = len(items) * avg_output_tokens\n \n estimated_cost = estimate_cost(\n total_input_tokens,\n total_output_tokens,\n **model_pricing,\n )\n \n return {\n \"item_count\": len(items),\n \"total_input_tokens\": total_input_tokens,\n \"total_output_tokens\": total_output_tokens,\n \"estimated_cost_usd\": estimated_cost,\n \"avg_input_tokens_per_item\": total_input_tokens / len(items),\n \"cost_per_item_usd\": estimated_cost / len(items),\n }\n```\n\n## CLI Pattern\n\n### Standard CLI Structure\n\n```python\nimport argparse\nfrom datetime import date\n\ndef main():\n parser = argparse.ArgumentParser(description=\"LLM Processing Pipeline\")\n \n parser.add_argument(\n \"stage\",\n choices=[\"acquire\", \"prepare\", \"process\", \"parse\", \"render\", \"all\", \"clean\"],\n help=\"Pipeline stage to run\",\n )\n parser.add_argument(\n \"--batch-id\",\n default=None,\n help=\"Batch identifier (default: today's date)\",\n )\n parser.add_argument(\n \"--limit\",\n type=int,\n default=None,\n help=\"Limit number of items (for testing)\",\n )\n parser.add_argument(\n \"--workers\",\n type=int,\n default=10,\n help=\"Number of parallel workers for processing\",\n )\n parser.add_argument(\n \"--model\",\n default=\"gpt-4-turbo\",\n help=\"Model to use for processing\",\n )\n parser.add_argument(\n \"--dry-run\",\n action=\"store_true\",\n help=\"Estimate costs without processing\",\n )\n parser.add_argument(\n \"--clean-stage\",\n choices=[\"acquire\", \"prepare\", \"process\", \"parse\"],\n help=\"For clean: only clean this stage and downstream\",\n )\n \n args = parser.parse_args()\n \n batch_id = args.batch_id or date.today().isoformat()\n \n if args.stage == \"clean\":\n stage_clean(batch_id, args.clean_stage)\n elif args.dry_run:\n estimate_costs(batch_id, args.limit)\n else:\n run_pipeline(batch_id, args.stage, args.limit, args.workers, args.model)\n\nif __name__ == \"__main__\":\n main()\n```\n\n## Rendering Patterns\n\n### Static HTML Output\n\n```python\nimport html\nimport json\n\ndef render_html(data: list[dict], output_path: Path, template: str):\n \"\"\"Render data to static HTML file.\"\"\"\n # Escape data for JavaScript embedding\n data_json = json.dumps([\n {k: html.escape(str(v)) if isinstance(v, str) else v \n for k, v in item.items()}\n for item in data\n ])\n \n html_content = template.replace(\"{{DATA_JSON}}\", data_json)\n \n output_path.parent.mkdir(parents=True, exist_ok=True)\n with open(output_path, \"w\") as f:\n f.write(html_content)\n```\n\n### Incremental Output\n\n```python\ndef render_incremental(items: list, output_dir: Path):\n \"\"\"Render each item as it completes, plus index.\"\"\"\n output_dir.mkdir(parents=True, exist_ok=True)\n \n # Render individual item pages\n for item in items:\n item_html = render_item(item)\n item_path = output_dir / f\"{item.id}.html\"\n with open(item_path, \"w\") as f:\n f.write(item_html)\n \n # Render index linking to all items\n index_html = render_index(items)\n with open(output_dir / \"index.html\", \"w\") as f:\n f.write(index_html)\n```\n\n## Checkpoint and Resume Pattern\n\nFor long-running pipelines:\n\n```python\nimport json\nfrom pathlib import Path\n\nclass PipelineCheckpoint:\n def __init__(self, checkpoint_file: Path):\n self.checkpoint_file = checkpoint_file\n self.state = self._load()\n \n def _load(self) -> dict:\n if self.checkpoint_file.exists():\n with open(self.checkpoint_file) as f:\n return json.load(f)\n return {\"completed\": [], \"failed\": [], \"last_item\": None}\n \n def save(self):\n with open(self.checkpoint_file, \"w\") as f:\n json.dump(self.state, f, indent=2)\n \n def mark_complete(self, item_id: str):\n self.state[\"completed\"].append(item_id)\n self.state[\"last_item\"] = item_id\n self.save()\n \n def mark_failed(self, item_id: str, error: str):\n self.state[\"failed\"].append({\"id\": item_id, \"error\": error})\n self.save()\n \n def get_remaining(self, all_items: list[str]) -> list[str]:\n completed = set(self.state[\"completed\"])\n return [item for item in all_items if item not in completed]\n```\n\n## Testing Patterns\n\n### Stage Unit Tests\n\n```python\ndef test_prepare_stage():\n \"\"\"Test prompt generation independently.\"\"\"\n test_item = {\"id\": \"test\", \"content\": \"Sample content\"}\n prompt = prepare_prompt(test_item)\n \n assert \"Sample content\" in prompt\n assert \"## Section 1\" in prompt # Format markers present\n\ndef test_parse_stage():\n \"\"\"Test parsing with known good output.\"\"\"\n test_response = \"\"\"\n ## Summary\n This is a test summary.\n \n ## Score\n Rating: 7\n \"\"\"\n \n result = parse_response(test_response)\n assert result.summary == \"This is a test summary.\"\n assert result.score == 7\n\ndef test_parse_stage_malformed():\n \"\"\"Test parsing handles malformed output.\"\"\"\n test_response = \"Some random text without sections\"\n \n result = parse_response(test_response)\n assert result.summary == \"\"\n assert result.score is None\n assert len(result.parse_errors) > 0\n```\n\n### Integration Test Pattern\n\n```python\ndef test_pipeline_end_to_end():\n \"\"\"Test full pipeline with single item.\"\"\"\n test_dir = Path(\"test_data\")\n test_item = create_test_item()\n \n try:\n # Run each stage\n acquire_result = stage_acquire(test_dir, [test_item])\n assert (test_dir / test_item.id / \"raw.json\").exists()\n \n prepare_result = stage_prepare(test_dir)\n assert (test_dir / test_item.id / \"prompt.md\").exists()\n \n # Skip process stage in unit tests (costs money)\n # Create mock response instead\n mock_response(test_dir / test_item.id)\n \n parse_result = stage_parse(test_dir)\n assert (test_dir / test_item.id / \"parsed.json\").exists()\n \n finally:\n # Cleanup\n shutil.rmtree(test_dir, ignore_errors=True)\n```\n\n","content_type":"text/markdown; charset=utf-8","language":"markdown","size":16921,"content_sha256":"d27c755a0e8dcca6d2c92ff7260a0a884f5fe76fef0d492de6b8eecf8cc4619a"},{"filename":"scripts/pipeline_template.py","content":"\"\"\"\nLLM Batch Processing Pipeline Template.\n\nA composable, staged pipeline architecture for LLM batch processing.\nEach stage is discrete, idempotent, and cacheable. Customize the acquire,\nprepare, process, parse, and render functions for your use case.\n\nUse when:\n - Building a new batch processing pipeline with structured LLM outputs\n - Prototyping an acquire -> prepare -> process -> parse -> render workflow\n - Need a file-system-based state machine for pipeline stage tracking\n\nUsage:\n python pipeline_template.py acquire --batch-id 2025-01-15\n python pipeline_template.py prepare --batch-id 2025-01-15\n python pipeline_template.py process --batch-id 2025-01-15 --workers 10\n python pipeline_template.py parse --batch-id 2025-01-15\n python pipeline_template.py render --batch-id 2025-01-15\n python pipeline_template.py all --batch-id 2025-01-15\n python pipeline_template.py clean --batch-id 2025-01-15 --clean-stage process\n python pipeline_template.py estimate --batch-id 2025-01-15\n\nProgrammatic usage:\n from pipeline_template import stage_acquire, stage_prepare, stage_process\n stage_acquire(\"2025-01-15\", limit=5)\n stage_prepare(\"2025-01-15\")\n stage_process(\"2025-01-15\", model=\"claude-sonnet-4-20250514\", max_workers=3)\n\"\"\"\n\nimport argparse\nimport json\nimport re\nimport time\nfrom concurrent.futures import ThreadPoolExecutor, as_completed\nfrom dataclasses import dataclass, field, asdict\nfrom datetime import date\nfrom pathlib import Path\nfrom typing import Any\n\n__all__ = [\n \"Item\",\n \"ParsedResult\",\n \"stage_acquire\",\n \"stage_prepare\",\n \"stage_process\",\n \"stage_parse\",\n \"stage_render\",\n \"stage_clean\",\n \"stage_estimate\",\n \"parse_response\",\n \"get_batch_dir\",\n \"get_item_dir\",\n \"get_output_dir\",\n]\n\n\n# -----------------------------------------------------------------------------\n# Configuration - Customize for your use case\n# -----------------------------------------------------------------------------\n\nDATA_DIR = Path(\"data\")\nOUTPUT_DIR = Path(\"output\")\n\n# Prompt template with structured output requirements\nPROMPT_TEMPLATE = \"\"\"Analyze the following content and provide your response in exactly this format.\n\n## Summary\n[2-3 sentence summary of the content]\n\n## Key Points\n- [Point 1]\n- [Point 2]\n- [Point 3]\n\n## Score\nRating: [1-10]\nConfidence: [low/medium/high]\n\n## Reasoning\n[Explanation of your analysis]\n\nFollow this format exactly because I will be parsing it programmatically.\n\n---\n\n# Content to Analyze\n\nTitle: {title}\n\n{content}\n\"\"\"\n\n\n# -----------------------------------------------------------------------------\n# Data Structures\n# -----------------------------------------------------------------------------\n\n@dataclass\nclass Item:\n \"\"\"Represents a single item to process through the pipeline.\n\n Use when: creating items during the acquire stage or loading raw data\n from any source (API, database, file system).\n \"\"\"\n\n id: str\n title: str\n content: str\n metadata: dict[str, Any] = field(default_factory=dict)\n\n\n@dataclass\nclass ParsedResult:\n \"\"\"Structured result from LLM response parsing.\n\n Use when: extracting structured data from free-text LLM responses\n during the parse stage.\n \"\"\"\n\n summary: str = \"\"\n key_points: list[str] = field(default_factory=list)\n score: int | None = None\n confidence: str = \"\"\n reasoning: str = \"\"\n parse_errors: list[str] = field(default_factory=list)\n\n\n# -----------------------------------------------------------------------------\n# Path Utilities\n# -----------------------------------------------------------------------------\n\ndef get_batch_dir(batch_id: str) -> Path:\n \"\"\"Get the data directory for a batch.\n\n Use when: resolving the root directory for a specific batch run.\n \"\"\"\n return DATA_DIR / batch_id\n\n\ndef get_item_dir(batch_id: str, item_id: str) -> Path:\n \"\"\"Get the directory for a specific item.\n\n Use when: locating stage output files for a single pipeline item.\n \"\"\"\n return get_batch_dir(batch_id) / item_id\n\n\ndef get_output_dir(batch_id: str) -> Path:\n \"\"\"Get the output directory for a batch.\n\n Use when: writing final rendered outputs (HTML, reports, etc.).\n \"\"\"\n return OUTPUT_DIR / batch_id\n\n\n# -----------------------------------------------------------------------------\n# Stage: Acquire\n# -----------------------------------------------------------------------------\n\ndef stage_acquire(batch_id: str, limit: int | None = None) -> list[Path]:\n \"\"\"Stage 1: Acquire raw data from sources.\n\n Use when: fetching data from APIs, databases, or file systems\n and persisting it as raw.json per item for downstream stages.\n\n Output: {batch_dir}/{item_id}/raw.json\n Returns: List of item directories that were acquired.\n \"\"\"\n batch_dir = get_batch_dir(batch_id)\n batch_dir.mkdir(parents=True, exist_ok=True)\n\n # CUSTOMIZE: Replace with your data acquisition logic\n items = fetch_items_from_source(limit)\n\n acquired_dirs: list[Path] = []\n for item in items:\n item_dir = get_item_dir(batch_id, item.id)\n item_dir.mkdir(exist_ok=True)\n\n raw_file = item_dir / \"raw.json\"\n if not raw_file.exists():\n with open(raw_file, \"w\") as f:\n json.dump(asdict(item), f, indent=2)\n print(f\"Acquired: {item.id}\")\n else:\n print(f\"Cached: {item.id}\")\n\n acquired_dirs.append(item_dir)\n\n print(f\"\\nAcquire complete. {len(items)} items in {batch_dir}\")\n return acquired_dirs\n\n\ndef fetch_items_from_source(limit: int | None = None) -> list[Item]:\n \"\"\"CUSTOMIZE: Implement your data fetching logic here.\n\n Use when: pulling raw items from your specific data source.\n Replace this with actual API calls, database queries, etc.\n \"\"\"\n # Example: Generate sample items\n items: list[Item] = []\n for i in range(limit or 10):\n items.append(Item(\n id=f\"item-{i:04d}\",\n title=f\"Sample Item {i}\",\n content=f\"This is sample content for item {i}. \" * 10,\n metadata={\"source\": \"example\", \"index\": i},\n ))\n return items\n\n\n# -----------------------------------------------------------------------------\n# Stage: Prepare\n# -----------------------------------------------------------------------------\n\ndef stage_prepare(batch_id: str) -> int:\n \"\"\"Stage 2: Generate prompts from raw data.\n\n Use when: transforming raw acquired data into LLM-ready prompts\n using the configured PROMPT_TEMPLATE.\n\n Output: {batch_dir}/{item_id}/prompt.md\n Returns: Number of items prepared.\n \"\"\"\n batch_dir = get_batch_dir(batch_id)\n prepared_count = 0\n\n for item_dir in sorted(batch_dir.iterdir()):\n if not item_dir.is_dir():\n continue\n\n raw_file = item_dir / \"raw.json\"\n prompt_file = item_dir / \"prompt.md\"\n\n if not raw_file.exists():\n continue\n\n if prompt_file.exists():\n continue\n\n with open(raw_file) as f:\n item_data: dict[str, Any] = json.load(f)\n\n prompt = generate_prompt(item_data)\n\n with open(prompt_file, \"w\") as f:\n f.write(prompt)\n\n prepared_count += 1\n print(f\"Prepared: {item_dir.name}\")\n\n print(f\"\\nPrepare complete. {prepared_count} items prepared.\")\n return prepared_count\n\n\ndef generate_prompt(item_data: dict[str, Any]) -> str:\n \"\"\"Generate prompt from item data using template.\n\n Use when: converting a raw item dict into a formatted prompt string.\n \"\"\"\n return PROMPT_TEMPLATE.format(\n title=item_data.get(\"title\", \"Untitled\"),\n content=item_data.get(\"content\", \"\"),\n )\n\n\n# -----------------------------------------------------------------------------\n# Stage: Process\n# -----------------------------------------------------------------------------\n\ndef stage_process(\n batch_id: str,\n model: str = \"claude-sonnet-4-20250514\",\n max_workers: int = 5,\n) -> list[tuple[str, int, str | None]]:\n \"\"\"Stage 3: Execute LLM calls (the expensive, non-deterministic stage).\n\n Use when: sending prepared prompts to the LLM API and caching\n responses. This is the only non-deterministic stage.\n\n Output: {batch_dir}/{item_id}/response.md\n Returns: List of (item_id, char_count, error_or_none) tuples.\n \"\"\"\n batch_dir = get_batch_dir(batch_id)\n\n # Collect items needing processing\n to_process: list[tuple[Path, str]] = []\n for item_dir in sorted(batch_dir.iterdir()):\n if not item_dir.is_dir():\n continue\n\n prompt_file = item_dir / \"prompt.md\"\n response_file = item_dir / \"response.md\"\n\n if prompt_file.exists() and not response_file.exists():\n to_process.append((item_dir, prompt_file.read_text()))\n\n if not to_process:\n print(\"No items to process.\")\n return []\n\n print(f\"Processing {len(to_process)} items with {max_workers} workers...\")\n\n results: list[tuple[str, int, str | None]] = []\n\n def process_one(args: tuple[Path, str]) -> tuple[str, int, str | None]:\n item_dir, prompt = args\n response_file = item_dir / \"response.md\"\n\n try:\n # CUSTOMIZE: Replace with your LLM API call\n response = call_llm(prompt, model)\n\n with open(response_file, \"w\") as f:\n f.write(response)\n\n return (item_dir.name, len(response), None)\n except Exception as e:\n return (item_dir.name, 0, str(e))\n\n with ThreadPoolExecutor(max_workers=max_workers) as executor:\n futures = {executor.submit(process_one, item): item for item in to_process}\n\n for future in as_completed(futures):\n item_id, chars, error = future.result()\n results.append((item_id, chars, error))\n if error:\n print(f\" {item_id}: Error - {error}\")\n else:\n print(f\" {item_id}: Done ({chars} chars)\")\n\n print(f\"\\nProcess complete. {len(results)} items processed.\")\n return results\n\n\ndef call_llm(prompt: str, model: str) -> str:\n \"\"\"CUSTOMIZE: Implement your LLM API call here.\n\n Use when: sending a single prompt to the LLM and returning the response.\n Replace with actual OpenAI, Anthropic, etc. API calls.\n \"\"\"\n # Example mock response - replace with actual API call\n #\n # import anthropic\n # client = anthropic.Anthropic()\n # message = client.messages.create(\n # model=model,\n # max_tokens=1024,\n # messages=[{\"role\": \"user\", \"content\": prompt}],\n # )\n # return message.content[0].text\n\n # Simulate API delay\n time.sleep(0.1)\n\n # Return mock structured response\n return \"\"\"## Summary\nThis is a sample summary of the analyzed content.\n\n## Key Points\n- First key observation from the content\n- Second important finding\n- Third notable aspect\n\n## Score\nRating: 7\nConfidence: medium\n\n## Reasoning\nThe content demonstrates several characteristics that merit this rating.\nThe analysis considered multiple factors including relevance and clarity.\n\"\"\"\n\n\n# -----------------------------------------------------------------------------\n# Stage: Parse\n# -----------------------------------------------------------------------------\n\ndef stage_parse(batch_id: str) -> list[dict[str, Any]]:\n \"\"\"Stage 4: Extract structured data from LLM responses.\n\n Use when: converting free-text LLM responses into structured\n ParsedResult objects for aggregation and rendering.\n\n Output: {batch_dir}/{item_id}/parsed.json\n Returns: List of parsed result dicts with item IDs.\n \"\"\"\n batch_dir = get_batch_dir(batch_id)\n all_results: list[dict[str, Any]] = []\n\n for item_dir in sorted(batch_dir.iterdir()):\n if not item_dir.is_dir():\n continue\n\n response_file = item_dir / \"response.md\"\n parsed_file = item_dir / \"parsed.json\"\n\n if not response_file.exists():\n continue\n\n response = response_file.read_text()\n result = parse_response(response)\n\n with open(parsed_file, \"w\") as f:\n json.dump(asdict(result), f, indent=2)\n\n all_results.append({\n \"id\": item_dir.name,\n **asdict(result),\n })\n\n error_count = len(result.parse_errors)\n print(f\"Parsed: {item_dir.name} (score={result.score}, errors={error_count})\")\n\n # Save aggregated results\n agg_file = batch_dir / \"all_results.json\"\n with open(agg_file, \"w\") as f:\n json.dump(all_results, f, indent=2)\n\n print(f\"\\nParse complete. Results saved to {agg_file}\")\n return all_results\n\n\ndef parse_response(text: str) -> ParsedResult:\n \"\"\"Parse structured LLM response with graceful error handling.\n\n Use when: extracting sections, scores, and lists from a formatted\n LLM response. Logs parse errors rather than raising exceptions.\n \"\"\"\n result = ParsedResult()\n\n # Extract summary\n try:\n result.summary = extract_section(text, \"Summary\") or \"\"\n except Exception as e:\n result.parse_errors.append(f\"Summary: {e}\")\n\n # Extract key points\n try:\n result.key_points = extract_list_items(text, \"Key Points\")\n except Exception as e:\n result.parse_errors.append(f\"Key Points: {e}\")\n\n # Extract score\n try:\n result.score = extract_score(text, \"Rating\", 1, 10)\n except Exception as e:\n result.parse_errors.append(f\"Score: {e}\")\n\n # Extract confidence\n try:\n result.confidence = extract_field(text, \"Confidence\") or \"\"\n except Exception as e:\n result.parse_errors.append(f\"Confidence: {e}\")\n\n # Extract reasoning\n try:\n result.reasoning = extract_section(text, \"Reasoning\") or \"\"\n except Exception as e:\n result.parse_errors.append(f\"Reasoning: {e}\")\n\n return result\n\n\ndef extract_section(text: str, section_name: str) -> str | None:\n \"\"\"Extract content between section headers.\n\n Use when: pulling a named markdown section from LLM output.\n \"\"\"\n pattern = rf'(?:^|\\n)(?:#+ *)?{re.escape(section_name)}[:\\s]*\\n(.*?)(?=\\n#|\\Z)'\n match = re.search(pattern, text, re.IGNORECASE | re.DOTALL)\n return match.group(1).strip() if match else None\n\n\ndef extract_field(text: str, field_name: str) -> str | None:\n \"\"\"Extract value after field label.\n\n Use when: pulling a single key-value field (e.g., \"Confidence: high\").\n \"\"\"\n pattern = rf'(?:\\*\\*)?{re.escape(field_name)}(?:\\*\\*)?[\\s:\\-]+([^\\n]+)'\n match = re.search(pattern, text, re.IGNORECASE)\n return match.group(1).strip() if match else None\n\n\ndef extract_list_items(text: str, section_name: str) -> list[str]:\n \"\"\"Extract bullet points from a section.\n\n Use when: parsing a markdown list under a named section header.\n \"\"\"\n section = extract_section(text, section_name)\n if not section:\n return []\n\n items = re.findall(r'^[\\-\\*]\\s*(.+)

Project Development Methodology This skill covers the principles for identifying tasks suited to LLM processing, designing effective project architectures, and iterating rapidly using agent-assisted development. The methodology applies whether building a batch processing pipeline, a multi-agent research system, or an interactive agent application. The unit of work for this skill is the whole project or a multi-stage pipeline. Individual tool design (descriptions, schemas, error messages) belongs to . Per-skill activation routing belongs to the corresponding skill plus the corpus index. This s…

, section, re.MULTILINE)\n return [item.strip() for item in items]\n\n\ndef extract_score(\n text: str, field_name: str, min_val: int, max_val: int\n) -> int | None:\n \"\"\"Extract and validate numeric score.\n\n Use when: pulling a bounded integer score from LLM output.\n \"\"\"\n raw = extract_field(text, field_name)\n if not raw:\n return None\n\n match = re.search(r'\\d+', raw)\n if not match:\n return None\n\n score = int(match.group())\n return max(min_val, min(max_val, score))\n\n\n# -----------------------------------------------------------------------------\n# Stage: Render\n# -----------------------------------------------------------------------------\n\ndef stage_render(batch_id: str) -> Path | None:\n \"\"\"Stage 5: Generate final outputs from parsed results.\n\n Use when: producing human-readable output (HTML, reports)\n from aggregated parsed results.\n\n Output: {output_dir}/index.html\n Returns: Path to the rendered output file, or None if no results.\n \"\"\"\n batch_dir = get_batch_dir(batch_id)\n output_dir = get_output_dir(batch_id)\n output_dir.mkdir(parents=True, exist_ok=True)\n\n # Load aggregated results\n results_file = batch_dir / \"all_results.json\"\n if not results_file.exists():\n print(\"No results to render. Run parse stage first.\")\n return None\n\n with open(results_file) as f:\n results: list[dict[str, Any]] = json.load(f)\n\n # CUSTOMIZE: Replace with your rendering logic\n html = render_html(results, batch_id)\n\n output_file = output_dir / \"index.html\"\n with open(output_file, \"w\") as f:\n f.write(html)\n\n print(f\"Rendered: {output_file}\")\n return output_file\n\n\ndef render_html(results: list[dict[str, Any]], batch_id: str) -> str:\n \"\"\"Generate HTML output from results.\n\n Use when: creating a summary HTML table from parsed pipeline results.\n \"\"\"\n import html as html_lib\n\n rows = \"\"\n for r in results:\n rows += f\"\"\"\n \u003ctr>\n \u003ctd>{html_lib.escape(r.get('id', ''))}\u003c/td>\n \u003ctd>{html_lib.escape(r.get('summary', '')[:100])}...\u003c/td>\n \u003ctd>{r.get('score', 'N/A')}\u003c/td>\n \u003ctd>{html_lib.escape(r.get('confidence', ''))}\u003c/td>\n \u003c/tr>\"\"\"\n\n return f\"\"\"\u003c!DOCTYPE html>\n\u003chtml>\n\u003chead>\n \u003cmeta charset=\"utf-8\">\n \u003ctitle>Results - {batch_id}\u003c/title>\n \u003cstyle>\n body {{ font-family: system-ui, sans-serif; max-width: 1000px; margin: 0 auto; padding: 20px; }}\n table {{ width: 100%; border-collapse: collapse; }}\n th, td {{ text-align: left; padding: 10px; border-bottom: 1px solid #ddd; }}\n th {{ background: #f5f5f5; }}\n \u003c/style>\n\u003c/head>\n\u003cbody>\n \u003ch1>Results: {batch_id}\u003c/h1>\n \u003cp>{len(results)} items processed\u003c/p>\n \u003ctable>\n \u003ctr>\n \u003cth>ID\u003c/th>\n \u003cth>Summary\u003c/th>\n \u003cth>Score\u003c/th>\n \u003cth>Confidence\u003c/th>\n \u003c/tr>\n {rows}\n \u003c/table>\n\u003c/body>\n\u003c/html>\"\"\"\n\n\n# -----------------------------------------------------------------------------\n# Clean Stage\n# -----------------------------------------------------------------------------\n\ndef stage_clean(batch_id: str, from_stage: str | None = None) -> int:\n \"\"\"Remove stage outputs to enable re-processing.\n\n Use when: a stage produced bad results and needs to be re-run,\n or when clearing all intermediate files for a fresh pipeline run.\n\n Returns: Number of files deleted.\n \"\"\"\n batch_dir = get_batch_dir(batch_id)\n\n if not batch_dir.exists():\n print(f\"No data directory for {batch_id}\")\n return 0\n\n stage_outputs: dict[str, list[str]] = {\n \"acquire\": [\"raw.json\"],\n \"prepare\": [\"prompt.md\"],\n \"process\": [\"response.md\"],\n \"parse\": [\"parsed.json\"],\n }\n\n stage_order = [\"acquire\", \"prepare\", \"process\", \"parse\", \"render\"]\n\n if from_stage:\n start_idx = stage_order.index(from_stage)\n stages_to_clean = stage_order[start_idx:]\n else:\n stages_to_clean = stage_order\n\n files_to_delete: set[str] = set()\n for s in stages_to_clean:\n files_to_delete.update(stage_outputs.get(s, []))\n\n deleted_count = 0\n for item_dir in batch_dir.iterdir():\n if not item_dir.is_dir():\n continue\n\n for filename in files_to_delete:\n filepath = item_dir / filename\n if filepath.exists():\n filepath.unlink()\n deleted_count += 1\n\n # Clean aggregated results\n if \"parse\" in stages_to_clean:\n agg_file = batch_dir / \"all_results.json\"\n if agg_file.exists():\n agg_file.unlink()\n deleted_count += 1\n\n print(f\"Cleaned {deleted_count} files from stage '{from_stage or 'all'}' onwards\")\n return deleted_count\n\n\n# -----------------------------------------------------------------------------\n# Cost Estimation\n# -----------------------------------------------------------------------------\n\ndef stage_estimate(batch_id: str) -> dict[str, Any] | None:\n \"\"\"Estimate processing costs before running the process stage.\n\n Use when: projecting token costs and budget requirements before\n committing to expensive LLM API calls.\n\n Returns: Dict with item_count, token estimates, and cost projection,\n or None if no prompts are available.\n \"\"\"\n batch_dir = get_batch_dir(batch_id)\n\n if not batch_dir.exists():\n print(f\"No data directory for {batch_id}. Run acquire first.\")\n return None\n\n # Count items and estimate tokens\n item_count = 0\n total_prompt_chars = 0\n\n for item_dir in batch_dir.iterdir():\n if not item_dir.is_dir():\n continue\n\n prompt_file = item_dir / \"prompt.md\"\n if prompt_file.exists():\n total_prompt_chars += len(prompt_file.read_text())\n item_count += 1\n\n if item_count == 0:\n print(\"No prompts found. Run prepare first.\")\n return None\n\n # Rough token estimation (1 token ~ 4 chars)\n est_input_tokens = total_prompt_chars / 4\n est_output_tokens = item_count * 500 # Assume 500 tokens per response\n\n # Example pricing (adjust for your model)\n input_price = 3.0 / 1_000_000 # $3 per MTok\n output_price = 15.0 / 1_000_000 # $15 per MTok\n\n est_cost = (est_input_tokens * input_price) + (est_output_tokens * output_price)\n\n estimate: dict[str, Any] = {\n \"batch_id\": batch_id,\n \"item_count\": item_count,\n \"est_input_tokens\": int(est_input_tokens),\n \"est_output_tokens\": int(est_output_tokens),\n \"est_cost_usd\": round(est_cost, 2),\n }\n\n print(f\"Cost Estimate for {batch_id}\")\n print(f\" Items: {item_count}\")\n print(f\" Estimated input tokens: {int(est_input_tokens):,}\")\n print(f\" Estimated output tokens: {int(est_output_tokens):,}\")\n print(f\" Estimated cost: ${est_cost:.2f}\")\n print(f\"\\nNote: Actual costs may vary. Add 20-30% buffer for retries.\")\n\n return estimate\n\n\n# -----------------------------------------------------------------------------\n# CLI\n# -----------------------------------------------------------------------------\n\ndef main() -> None:\n \"\"\"Entry point for CLI usage. Parses arguments and dispatches to stages.\"\"\"\n parser = argparse.ArgumentParser(\n description=\"LLM Batch Processing Pipeline\",\n formatter_class=argparse.RawDescriptionHelpFormatter,\n epilog=__doc__,\n )\n\n parser.add_argument(\n \"stage\",\n choices=[\"acquire\", \"prepare\", \"process\", \"parse\", \"render\", \"all\", \"clean\", \"estimate\"],\n help=\"Pipeline stage to run\",\n )\n parser.add_argument(\n \"--batch-id\",\n default=None,\n help=\"Batch identifier (default: today's date)\",\n )\n parser.add_argument(\n \"--limit\",\n type=int,\n default=None,\n help=\"Limit number of items (for testing)\",\n )\n parser.add_argument(\n \"--workers\",\n type=int,\n default=5,\n help=\"Number of parallel workers for processing\",\n )\n parser.add_argument(\n \"--model\",\n default=\"claude-sonnet-4-20250514\",\n help=\"Model to use for processing\",\n )\n parser.add_argument(\n \"--clean-stage\",\n choices=[\"acquire\", \"prepare\", \"process\", \"parse\"],\n help=\"For clean: only clean this stage and downstream\",\n )\n\n args = parser.parse_args()\n\n batch_id = args.batch_id or date.today().isoformat()\n print(f\"Batch ID: {batch_id}\\n\")\n\n if args.stage == \"clean\":\n stage_clean(batch_id, args.clean_stage)\n elif args.stage == \"estimate\":\n stage_estimate(batch_id)\n elif args.stage == \"all\":\n stage_acquire(batch_id, args.limit)\n stage_prepare(batch_id)\n stage_process(batch_id, args.model, args.workers)\n stage_parse(batch_id)\n stage_render(batch_id)\n else:\n if args.stage == \"acquire\":\n stage_acquire(batch_id, args.limit)\n elif args.stage == \"prepare\":\n stage_prepare(batch_id)\n elif args.stage == \"process\":\n stage_process(batch_id, args.model, args.workers)\n elif args.stage == \"parse\":\n stage_parse(batch_id)\n elif args.stage == \"render\":\n stage_render(batch_id)\n\n\nif __name__ == \"__main__\":\n main()\n","content_type":"text/x-python; charset=utf-8","language":"python","size":24159,"content_sha256":"ca23fd98d7bafb81e32117efd1163316995f04152ebc75cf9cb133bff3e3a16e"}],"content_json":{"type":"doc","content":[{"type":"heading","attrs":{"level":1},"content":[{"text":"Project Development Methodology","type":"text"}]},{"type":"paragraph","content":[{"text":"This skill covers the principles for identifying tasks suited to LLM processing, designing effective project architectures, and iterating rapidly using agent-assisted development. The methodology applies whether building a batch processing pipeline, a multi-agent research system, or an interactive agent application.","type":"text"}]},{"type":"paragraph","content":[{"text":"The unit of work for this skill is the whole project or a multi-stage pipeline. Individual tool design (descriptions, schemas, error messages) belongs to ","type":"text"},{"text":"tool-design","type":"text","marks":[{"type":"code_inline"}]},{"text":". Per-skill activation routing belongs to the corresponding skill plus the corpus index. This skill owns the project-level questions: should you build this with an LLM at all, what shape should the pipeline take, what does it cost, how should it be iterated.","type":"text"}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"When to Activate","type":"text"}]},{"type":"paragraph","content":[{"text":"Activate this skill when the unit of work is a whole project or pipeline:","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Deciding whether an LLM is the right primitive for a task at all (task-model fit before any code).","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Shaping a multi-stage batch or agent pipeline (acquire / prepare / process / parse / render).","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Estimating tokens, dollar cost, and timelines for an LLM-heavy project.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Choosing between single-agent and multi-agent at the project level.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Structuring agent-assisted iteration (where the agent helps build the project itself).","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Designing structured output at the pipeline contract level (cross-stage handoff format).","type":"text"}]}]}]},{"type":"paragraph","content":[{"text":"Do not activate this skill for adjacent work owned by other skills:","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Per-tool description, schema, naming, response format, error message: ","type":"text"},{"text":"tool-design","type":"text","marks":[{"type":"code_inline"}]},{"text":".","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Per-trajectory token-efficiency tactics (masking, partitioning, caching): ","type":"text"},{"text":"context-optimization","type":"text","marks":[{"type":"code_inline"}]},{"text":".","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Deciding to split work across sub-agents at the agent topology level: ","type":"text"},{"text":"multi-agent-patterns","type":"text","marks":[{"type":"code_inline"}]},{"text":".","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Designing the autonomous control loop (locked metrics, novelty gates, human approval boundaries): ","type":"text"},{"text":"harness-engineering","type":"text","marks":[{"type":"code_inline"}]},{"text":".","type":"text"}]}]}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Core Concepts","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Task-Model Fit Recognition","type":"text"}]},{"type":"paragraph","content":[{"text":"Evaluate task-model fit before writing any code, because building automation on a fundamentally mismatched task wastes days of effort. Run every proposed task through these two tables to decide proceed-or-stop.","type":"text"}]},{"type":"paragraph","content":[{"text":"Proceed when the task has these characteristics:","type":"text","marks":[{"type":"strong"}]}]},{"type":"table","attrs":{"layout":null},"content":[{"type":"tr","content":[{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Characteristic","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Rationale","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Synthesis across sources","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"LLMs combine information from multiple inputs better than rule-based alternatives","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Subjective judgment with rubrics","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Grading, evaluation, and classification with criteria map naturally to language reasoning","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Natural language output","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"When the goal is human-readable text, LLMs deliver it natively","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Error tolerance","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Individual failures do not break the overall system, so LLM non-determinism is acceptable","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Batch processing","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"No conversational state required between items, which keeps context clean","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Domain knowledge in training","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"The model already has relevant context, reducing prompt engineering overhead","type":"text"}]}]}]}]},{"type":"paragraph","content":[{"text":"Stop when the task has these characteristics:","type":"text","marks":[{"type":"strong"}]}]},{"type":"table","attrs":{"layout":null},"content":[{"type":"tr","content":[{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Characteristic","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Rationale","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Precise computation","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Math, counting, and exact algorithms are unreliable in language models","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Real-time requirements","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"LLM latency is too high for sub-second responses","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Perfect accuracy requirements","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Hallucination risk makes 100% accuracy impossible","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Proprietary data dependence","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"The model lacks necessary context and cannot acquire it from prompts alone","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Sequential dependencies","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Each step depends heavily on the previous result, compounding errors","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Deterministic output requirements","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Same input must produce identical output, which LLMs cannot guarantee","type":"text"}]}]}]}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"The Manual Prototype Step","type":"text"}]},{"type":"paragraph","content":[{"text":"Always validate task-model fit with a manual test before investing in automation. Copy one representative input into the model interface, evaluate the output quality, and use the result to answer these questions:","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Does the model have the knowledge required for this task?","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Can the model produce output in the format needed?","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"What level of quality should be expected at scale?","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Are there obvious failure modes to address?","type":"text"}]}]}]},{"type":"paragraph","content":[{"text":"Do this because a failed manual prototype predicts a failed automated system, while a successful one provides both a quality baseline and a prompt-design template. The test takes minutes and prevents hours of wasted development.","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Pipeline Architecture","type":"text"}]},{"type":"paragraph","content":[{"text":"Structure LLM projects as staged pipelines because separation of deterministic and non-deterministic stages enables fast iteration and cost control. Design each stage to be:","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Discrete","type":"text","marks":[{"type":"strong"}]},{"text":": Clear boundaries between stages so each can be debugged independently","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Idempotent","type":"text","marks":[{"type":"strong"}]},{"text":": Re-running produces the same result, preventing duplicate work","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Cacheable","type":"text","marks":[{"type":"strong"}]},{"text":": Intermediate results persist to disk, avoiding expensive re-computation","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Independent","type":"text","marks":[{"type":"strong"}]},{"text":": Each stage can run separately, enabling selective re-execution","type":"text"}]}]}]},{"type":"paragraph","content":[{"text":"Use this canonical pipeline structure:","type":"text","marks":[{"type":"strong"}]}]},{"type":"code_block","attrs":{"wrap":false,"language":""},"content":[{"text":"acquire -> prepare -> process -> parse -> render","type":"text"}]},{"type":"ordered_list","attrs":{"order":1,"listStyle":"number"},"content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Acquire","type":"text","marks":[{"type":"strong"}]},{"text":": Fetch raw data from sources (APIs, files, databases)","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Prepare","type":"text","marks":[{"type":"strong"}]},{"text":": Transform data into prompt format","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Process","type":"text","marks":[{"type":"strong"}]},{"text":": Execute LLM calls (the expensive, non-deterministic step)","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Parse","type":"text","marks":[{"type":"strong"}]},{"text":": Extract structured data from LLM outputs","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Render","type":"text","marks":[{"type":"strong"}]},{"text":": Generate final outputs (reports, files, visualizations)","type":"text"}]}]}]},{"type":"paragraph","content":[{"text":"Stages 1, 2, 4, and 5 are deterministic. Stage 3 is non-deterministic and expensive. Maintain this separation because it allows re-running the expensive LLM stage only when necessary, while iterating quickly on parsing and rendering.","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"File System as State Machine","type":"text"}]},{"type":"paragraph","content":[{"text":"Use the file system to track pipeline state rather than databases or in-memory structures, because file existence provides natural idempotency and human-readable debugging.","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":""},"content":[{"text":"data/{id}/\n raw.json # acquire stage complete\n prompt.md # prepare stage complete\n response.md # process stage complete\n parsed.json # parse stage complete","type":"text"}]},{"type":"paragraph","content":[{"text":"Check if an item needs processing by checking whether the output file exists. Re-run a stage by deleting its output file and downstream files. Debug by reading the intermediate files directly. This pattern works because each directory is independent, enabling simple parallelization and trivial caching.","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Structured Output Design","type":"text"}]},{"type":"paragraph","content":[{"text":"Design prompts for structured, parseable outputs because prompt design directly determines parsing reliability. Include these elements in every structured prompt:","type":"text"}]},{"type":"ordered_list","attrs":{"order":1,"listStyle":"number"},"content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Section markers","type":"text","marks":[{"type":"strong"}]},{"text":": Explicit headers or prefixes that parsers can match on","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Format examples","type":"text","marks":[{"type":"strong"}]},{"text":": Show exactly what output should look like","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Rationale disclosure","type":"text","marks":[{"type":"strong"}]},{"text":": State \"I will be parsing this programmatically\" so the model prioritizes format compliance","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Constrained values","type":"text","marks":[{"type":"strong"}]},{"text":": Enumerated options, score ranges, and fixed formats","type":"text"}]}]}]},{"type":"paragraph","content":[{"text":"Build parsers that handle LLM output variations gracefully, because LLMs do not follow instructions perfectly. Use regex patterns flexible enough for minor formatting variations, provide sensible defaults when sections are missing, and log parsing failures for review rather than crashing.","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Agent-Assisted Development","type":"text"}]},{"type":"paragraph","content":[{"text":"Use agent-capable models to accelerate development through rapid iteration: describe the project goal and constraints, let the agent generate initial implementation, test and iterate on specific failures, then refine prompts and architecture based on results.","type":"text"}]},{"type":"paragraph","content":[{"text":"Adopt these practices because they keep agent output focused and high-quality:","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Provide clear, specific requirements upfront to reduce revision cycles","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Break large projects into discrete components so each can be validated independently","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Test each component before moving to the next to catch failures early","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Keep the agent focused on one task at a time to prevent context degradation","type":"text"}]}]}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Cost and Scale Estimation","type":"text"}]},{"type":"paragraph","content":[{"text":"Estimate LLM processing costs before starting, because token costs compound quickly at scale and late discovery of budget overruns forces costly rework. Use this formula:","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":""},"content":[{"text":"Total cost = (items x tokens_per_item x price_per_token) + API overhead","type":"text"}]},{"type":"paragraph","content":[{"text":"For batch processing, estimate input tokens per item (prompt + context), estimate output tokens per item (typical response length), multiply by item count, and add 20-30% buffer for retries and failures.","type":"text"}]},{"type":"paragraph","content":[{"text":"Track actual costs during development. If costs exceed estimates significantly, reduce context length through truncation, use smaller models for simpler items, cache and reuse partial results, or add parallel processing to reduce wall-clock time.","type":"text"}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Detailed Topics","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Choosing Single vs Multi-Agent Architecture","type":"text"}]},{"type":"paragraph","content":[{"text":"Default to single-agent pipelines for batch processing with independent items, because they are simpler to manage, cheaper to run, and easier to debug. Escalate to multi-agent architectures only when one of these conditions holds:","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Parallel exploration of different aspects is required","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"The task exceeds single context window capacity","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Specialized sub-agents demonstrably improve quality on benchmarks","type":"text"}]}]}]},{"type":"paragraph","content":[{"text":"Choose multi-agent for context isolation, not role anthropomorphization. Sub-agents get fresh context windows for focused subtasks, which prevents context degradation on long-running tasks.","type":"text"}]},{"type":"paragraph","content":[{"text":"See ","type":"text"},{"text":"multi-agent-patterns","type":"text","marks":[{"type":"code_inline"}]},{"text":" skill for detailed architecture guidance.","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Architectural Reduction","type":"text"}]},{"type":"paragraph","content":[{"text":"Start with minimal architecture and add complexity only when production evidence proves it necessary, because over-engineered scaffolding often constrains rather than enables model performance.","type":"text"}]},{"type":"paragraph","content":[{"text":"Vercel's d0 case study reports improved success after reducing many specialized tools to two primitives: command execution and SQL (claim-project-development-vercel-d0-reduction). The file system agent pattern uses standard Unix utilities instead of custom exploration tools.","type":"text"}]},{"type":"paragraph","content":[{"text":"Reduce when:","type":"text","marks":[{"type":"strong"}]}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"The data layer is well-documented and consistently structured","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"The model has sufficient reasoning capability","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Specialized tools are constraining rather than enabling","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"More time is spent maintaining scaffolding than improving outcomes","type":"text"}]}]}]},{"type":"paragraph","content":[{"text":"Add complexity when:","type":"text","marks":[{"type":"strong"}]}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"The underlying data is messy, inconsistent, or poorly documented","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"The domain requires specialized knowledge the model lacks","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Safety constraints require limiting agent capabilities","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Operations are truly complex and benefit from structured workflows","type":"text"}]}]}]},{"type":"paragraph","content":[{"text":"See ","type":"text"},{"text":"tool-design","type":"text","marks":[{"type":"code_inline"}]},{"text":" skill for detailed tool architecture guidance.","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Iteration and Refactoring","type":"text"}]},{"type":"paragraph","content":[{"text":"Plan for multiple architectural iterations from the start, because production agent systems at scale always require refactoring. Manus refactored their agent framework five times since launch. The Bitter Lesson suggests that structures added for current model limitations become constraints as models improve.","type":"text"}]},{"type":"paragraph","content":[{"text":"Build for change by following these practices:","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Keep architecture simple and unopinionated so refactoring is cheap","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Test across model generations to verify the harness is not limiting performance","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Design systems that benefit from model improvements rather than locking in limitations","type":"text"}]}]}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Practical Guidance","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Project Planning Template","type":"text"}]},{"type":"paragraph","content":[{"text":"Follow this template in order, because each step validates assumptions before the next step invests effort.","type":"text"}]},{"type":"ordered_list","attrs":{"order":1,"listStyle":"number"},"content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Task Analysis","type":"text","marks":[{"type":"strong"}]}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Define the input and desired output explicitly","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Classify: synthesis, generation, classification, or analysis","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Set an acceptable error rate based on business impact","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Estimate the value per successful completion to justify costs","type":"text"}]}]}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Manual Validation","type":"text","marks":[{"type":"strong"}]}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Test one representative example with the target model","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Evaluate output quality and format against requirements","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Identify failure modes that need parser hardening or prompt revision","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Estimate tokens per item for cost projection","type":"text"}]}]}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Architecture Selection","type":"text","marks":[{"type":"strong"}]}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Choose single pipeline vs multi-agent based on the criteria above","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Identify required tools and data sources","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Design storage and caching strategy using file-system state","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Plan parallelization approach for the process stage","type":"text"}]}]}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Cost Estimation","type":"text","marks":[{"type":"strong"}]}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Calculate items x tokens x price with a 20-30% buffer","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Estimate development time for each pipeline stage","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Identify infrastructure requirements (API keys, storage, compute)","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Project ongoing operational costs for production runs","type":"text"}]}]}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Development Plan","type":"text","marks":[{"type":"strong"}]}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Implement stage-by-stage, testing each before proceeding","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Define a testing strategy per stage with expected outputs","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Set iteration milestones tied to quality metrics","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Plan deployment approach with rollback capability","type":"text"}]}]}]}]}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Examples","type":"text"}]},{"type":"paragraph","content":[{"text":"Example 1: Batch Analysis Pipeline (Karpathy's HN Time Capsule)","type":"text","marks":[{"type":"strong"}]}]},{"type":"paragraph","content":[{"text":"Task: Analyze 930 HN discussions from 10 years ago with hindsight grading.","type":"text"}]},{"type":"paragraph","content":[{"text":"Architecture:","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"5-stage pipeline: fetch -> prompt -> analyze -> parse -> render","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"File system state: data/{date}/{item_id}/ with stage output files","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Structured output: 6 sections with explicit format requirements","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Parallel execution: 15 workers for LLM calls","type":"text"}]}]}]},{"type":"paragraph","content":[{"text":"Results: $58 total cost, ~1 hour execution, static HTML output.","type":"text"}]},{"type":"paragraph","content":[{"text":"Example 2: Architectural Reduction (Vercel d0)","type":"text","marks":[{"type":"strong"}]}]},{"type":"paragraph","content":[{"text":"Task: Text-to-SQL agent for internal analytics.","type":"text"}]},{"type":"paragraph","content":[{"text":"Before: many specialized tools with lower measured success and longer average execution.","type":"text"}]},{"type":"paragraph","content":[{"text":"After: two tools (bash + SQL) with higher measured success and shorter average execution (claim-project-development-vercel-d0-reduction).","type":"text"}]},{"type":"paragraph","content":[{"text":"Key insight: The semantic layer was already good documentation. Claude just needed access to read files directly.","type":"text"}]},{"type":"paragraph","content":[{"text":"See ","type":"text"},{"text":"Case Studies","type":"text","marks":[{"type":"link","attrs":{"href":"./references/case-studies.md","title":null}}]},{"text":" for detailed analysis.","type":"text"}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Guidelines","type":"text"}]},{"type":"ordered_list","attrs":{"order":1,"listStyle":"number"},"content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Validate task-model fit with manual prototyping before building automation","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Structure pipelines as discrete, idempotent, cacheable stages","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Use the file system for state management and debugging","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Design prompts for structured, parseable outputs with explicit format examples","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Start with minimal architecture; add complexity only when proven necessary","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Estimate costs early and track throughout development","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Build robust parsers that handle LLM output variations","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Expect and plan for multiple architectural iterations","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Test whether scaffolding helps or constrains model performance","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Use agent-assisted development for rapid iteration on implementation","type":"text"}]}]}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Gotchas","type":"text"}]},{"type":"ordered_list","attrs":{"order":1,"listStyle":"number"},"content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Skipping manual validation","type":"text","marks":[{"type":"strong"}]},{"text":": Building automation before verifying the model can do the task wastes significant time when the approach is fundamentally flawed. Always run one representative example through the model interface first.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Monolithic pipelines","type":"text","marks":[{"type":"strong"}]},{"text":": Combining all stages into one script makes debugging and iteration difficult. Separate stages with persistent intermediate outputs so each can be re-run independently.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Over-constraining the model","type":"text","marks":[{"type":"strong"}]},{"text":": Adding guardrails, pre-filtering, and validation logic that the model could handle on its own reduces performance. Test whether scaffolding helps or hurts before keeping it.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Ignoring costs until production","type":"text","marks":[{"type":"strong"}]},{"text":": Token costs compound quickly at scale. Estimate and track from the beginning to avoid budget surprises that force architectural rework.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Perfect parsing requirements","type":"text","marks":[{"type":"strong"}]},{"text":": Expecting LLMs to follow format instructions perfectly leads to brittle systems. Build robust parsers that handle variations and log failures for review.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Premature optimization","type":"text","marks":[{"type":"strong"}]},{"text":": Adding caching, parallelization, and optimization before the basic pipeline works correctly wastes effort on code that may be discarded during iteration.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Model version lock-in","type":"text","marks":[{"type":"strong"}]},{"text":": Building pipelines that only work with one specific model version creates fragile systems. Test across model generations and abstract the LLM call layer so models can be swapped without rewriting pipeline logic.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Evaluation-less deployment","type":"text","marks":[{"type":"strong"}]},{"text":": Shipping agent pipelines without measuring output quality means regressions go undetected. Define quality metrics during development and run evaluation checks before and after every model or prompt change.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Provenance drift","type":"text","marks":[{"type":"strong"}]},{"text":": Raw inputs, intermediate outputs, and final proposals separated across ad hoc folders become impossible to audit. Keep each pipeline run in a single directory with source evidence, transformations, validation reports, and decisions.","type":"text"}]}]}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Integration","type":"text"}]},{"type":"paragraph","content":[{"text":"This skill owns project-shape and pipeline decisions. Adjacent decisions are owned elsewhere:","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"tool-design","type":"text","marks":[{"type":"code_inline"}]},{"text":": the per-tool interface layer (descriptions, schemas, response formats, error messages, MCP namespacing, individual tool consolidation). If the question is \"what should this specific tool look like\" rather than \"what should the pipeline look like,\" route there.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"multi-agent-patterns","type":"text","marks":[{"type":"code_inline"}]},{"text":": agent topology decisions (supervisor vs swarm vs hierarchical, handoff protocols, context isolation across agents). This skill picks single-vs-multi at the project level; the topology details belong to multi-agent-patterns.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"harness-engineering","type":"text","marks":[{"type":"code_inline"}]},{"text":": the autonomous control loop around the project (locked metrics, novelty gates, run state machine, human approval boundaries). If the question is \"how do we make this run unattended for days,\" route there.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"context-fundamentals","type":"text","marks":[{"type":"code_inline"}]},{"text":": the conceptual frame for context constraints that inform prompt design at every stage.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"evaluation","type":"text","marks":[{"type":"code_inline"}]},{"text":": outcome measurement and quality gates for pipeline runs.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"context-compression","type":"text","marks":[{"type":"code_inline"}]},{"text":": when long-running pipeline stages produce trajectories that need summarization.","type":"text"}]}]}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"References","type":"text"}]},{"type":"paragraph","content":[{"text":"Internal references:","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Case Studies","type":"text","marks":[{"type":"link","attrs":{"href":"./references/case-studies.md","title":null}}]},{"text":" - Read when: evaluating architecture tradeoffs or reviewing real-world pipeline implementations (Karpathy HN Capsule, Vercel d0, Manus patterns)","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Pipeline Patterns","type":"text","marks":[{"type":"link","attrs":{"href":"./references/pipeline-patterns.md","title":null}}]},{"text":" - Read when: designing a new pipeline stage layout, choosing caching strategies, or debugging stage boundaries","type":"text"}]}]}]},{"type":"paragraph","content":[{"text":"Related skills in this collection:","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"tool-design - Tool architecture and reduction patterns","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"multi-agent-patterns - When to use multi-agent architectures","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"evaluation - Output evaluation frameworks","type":"text"}]}]}]},{"type":"paragraph","content":[{"text":"External resources:","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Karpathy's HN Time Capsule project: https://github.com/karpathy/hn-time-capsule","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Vercel d0 architectural reduction: https://vercel.com/blog/we-removed-80-percent-of-our-agents-tools","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Manus context engineering: Peak Ji's blog on context engineering lessons","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Anthropic multi-agent research: How we built our multi-agent research system","type":"text"}]}]}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":2},"content":[{"text":"Skill Metadata","type":"text"}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Created","type":"text","marks":[{"type":"strong"}]},{"text":": 2025-12-25 ","type":"text"},{"text":"Last Updated","type":"text","marks":[{"type":"strong"}]},{"text":": 2026-05-15 ","type":"text"},{"text":"Author","type":"text","marks":[{"type":"strong"}]},{"text":": Agent Skills for Context Engineering Contributors ","type":"text"},{"text":"Version","type":"text","marks":[{"type":"strong"}]},{"text":": 1.3.0","type":"text"}]}]},"metadata":{"date":"2026-06-05","author":"@skillopedia","source":{"stars":16233,"repo_name":"agent-skills-for-context-engineering","origin_url":"https://github.com/muratcankoylan/agent-skills-for-context-engineering/blob/HEAD/skills/project-development/SKILL.md","repo_owner":"muratcankoylan","body_sha256":"f058d62bc856ab121a2ea37bce3814cbc0bf9de605e126cc30616747ef233e77","cluster_key":"814699ffa728633a09f2f24b58097f61163ab5f0173bba6d1393027124a93c25","clean_bundle":{"format":"clean-skill-bundle-v1","source":"muratcankoylan/agent-skills-for-context-engineering/skills/project-development/SKILL.md","attachments":[{"id":"dd3bc52c-e076-5b22-acba-85096ec73645","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/dd3bc52c-e076-5b22-acba-85096ec73645/attachment.md","path":"references/case-studies.md","size":14830,"sha256":"243ee1adc820f6fef822334dd747dad6069cb22fee6f8e285bad1c8e5309357f","contentType":"text/markdown; charset=utf-8"},{"id":"62c1977b-2a68-5f4f-b174-13be75a52a73","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/62c1977b-2a68-5f4f-b174-13be75a52a73/attachment.md","path":"references/pipeline-patterns.md","size":16921,"sha256":"d27c755a0e8dcca6d2c92ff7260a0a884f5fe76fef0d492de6b8eecf8cc4619a","contentType":"text/markdown; charset=utf-8"},{"id":"cd7ce979-1f9c-51be-9c24-02ec93a09cdc","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/cd7ce979-1f9c-51be-9c24-02ec93a09cdc/attachment.py","path":"scripts/pipeline_template.py","size":24159,"sha256":"ca23fd98d7bafb81e32117efd1163316995f04152ebc75cf9cb133bff3e3a16e","contentType":"text/x-python; charset=utf-8"}],"bundle_sha256":"056c10faa5a9688c28eac2da9b892666634b6b0648c1dc66e109484d283ef6f4","attachment_count":3,"text_attachments":3,"attachment_storage":"skillopedia-attachments-v1","binary_attachments":0,"excluded_attachments":[]},"cluster_size":2,"skill_md_path":"skills/project-development/SKILL.md","import_metadata":{"date":"2026-06-05","author":"@skillopedia","version":"v1","category":"general","category_label":"General"},"exact_dupes_collapsed_into_this":1},"version":"v1","category":"general","import_tag":"clean-skills-v1","_yaml_error":"YAMLException: bad indentation of a mapping entry (2:93)\n\n 1 | ... \n 2 | ... sions about LLM-powered systems: whether an LLM is the right pr ...\n-----------------------------------------^"}},"renderedAt":1782986381148}

Project Development Methodology This skill covers the principles for identifying tasks suited to LLM processing, designing effective project architectures, and iterating rapidly using agent-assisted development. The methodology applies whether building a batch processing pipeline, a multi-agent research system, or an interactive agent application. The unit of work for this skill is the whole project or a multi-stage pipeline. Individual tool design (descriptions, schemas, error messages) belongs to . Per-skill activation routing belongs to the corresponding skill plus the corpus index. This s…