nf-core Pipeline Deployment Run nf-core bioinformatics pipelines on local or public sequencing data. Target users: Bench scientists and researchers without specialized bioinformatics training who need to run large-scale omics analyses—differential expression, variant calling, or chromatin accessibility analysis. Workflow Checklist --- Step 0: Acquire Data (GEO/SRA Only) Skip this step if user has local FASTQ files. For public datasets, fetch from GEO/SRA first. See references/geo-sra-acquisition.md for the full workflow. Quick start: DECISION POINT: After fetching study info, confirm with use…

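The sample-sheet helpers in this skill use regular expressions to infer lanes and tumor/normal status from file names. The sketch below illustrates that idea under stated assumptions. The keyword lists are hypothetical, reduced stand-ins, not the script's real `TUMOR_KEYWORDS`/`NORMAL_KEYWORDS`. It also uses alphanumeric lookarounds instead of `\b`. That substitution matters because `_` is a word character in Python regexes, so a pattern such as `\bnormal\b` (used in `NORMAL_KEYWORDS`) does not match an underscore-delimited name like `P001_normal`.

```python
import re
from typing import Optional

# Hypothetical, reduced keyword lists for illustration only; the real
# TUMOR_KEYWORDS / NORMAL_KEYWORDS in the script are longer.
# The lookarounds behave like word boundaries that also treat "_" as a
# separator; a plain \b does not, because "_" is a regex word character.
TUMOR_KEYWORDS = [r'(?<![a-z0-9])tumou?r(?![a-z0-9])', r'[-_]T\d*$', r'^T\d*[-_]']
NORMAL_KEYWORDS = [r'(?<![a-z0-9])normal(?![a-z0-9])', r'[-_]N\d*$', r'^N\d*[-_]']
LANE_PATTERN = r'[_.]L(\d{3})[_.]'  # Illumina-style _L001_ lane tag


def infer_status(sample_name: str) -> Optional[int]:
    """Return 1 for tumor, 0 for normal, None if undetermined (tumor checked first)."""
    for pattern in TUMOR_KEYWORDS:
        if re.search(pattern, sample_name, re.IGNORECASE):
            return 1
    for pattern in NORMAL_KEYWORDS:
        if re.search(pattern, sample_name, re.IGNORECASE):
            return 0
    return None


def extract_lane(filename: str) -> str:
    """Return the lane tag (e.g. 'L002'), defaulting to 'L001' when absent."""
    match = re.search(LANE_PATTERN, filename)
    return f"L{match.group(1)}" if match else "L001"


if __name__ == "__main__":
    # \b fails on underscore-delimited names; the lookaround version does not.
    print(bool(re.search(r'\bnormal\b', 'P001_normal')))  # False
    print(infer_status('P001_normal'))                    # 0
    print(infer_status('P001_T1'))                        # 1
    print(infer_status('sampleA'))                        # None
    print(extract_lane('S1_L002_R1_001.fastq.gz'))        # L002
```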
,\n r'^T\\d*[-_]',\n]\n\nNORMAL_KEYWORDS = [\n r'\\bnormal\\b',\n r'\\bgermline\\b',\n r'\\bblood\\b',\n r'\\bpbmc\\b',\n r'\\bcontrol\\b',\n r'\\bhealthy\\b',\n r'\\bmatched\\b',\n r'[-_]N[-_]',\n r'[-_]N\\d*

nf-core Pipeline Deployment Run nf-core bioinformatics pipelines on local or public sequencing data. Target users: Bench scientists and researchers without specialized bioinformatics training who need to run large-scale omics analyses—differential expression, variant calling, or chromatin accessibility analysis. Workflow Checklist --- Step 0: Acquire Data (GEO/SRA Only) Skip this step if user has local FASTQ files. For public datasets, fetch from GEO/SRA first. See references/geo-sra-acquisition.md for the full workflow. Quick start: DECISION POINT: After fetching study info, confirm with use…

,\n r'^N\\d*[-_]',\n]\n\n# Lane pattern\nLANE_PATTERN = r'[_.]L(\\d{3})[_.]'\n\n# Patient/sample extraction patterns\nPATIENT_PATTERNS = [\n r'^(P\\d+)[-_]', # P001_sample\n r'^(patient\\d+)[-_]', # patient1_sample\n r'^(TCGA-\\w+-\\w+)', # TCGA format\n r'^([A-Z]{2,3}\\d{3,})[-_]', # AB123_sample\n]\n\n# Replicate patterns\nREPLICATE_PATTERNS = [\n r'[_.]rep(\\d+)', # _rep1, .rep2\n r'[_.]replicate(\\d+)', # _replicate1\n r'[_.]R(\\d+)[_.]', # _R1_ (but not R1/R2 for reads!)\n r'[-_](\\d+)

nf-core Pipeline Deployment Run nf-core bioinformatics pipelines on local or public sequencing data. Target users: Bench scientists and researchers without specialized bioinformatics training who need to run large-scale omics analyses—differential expression, variant calling, or chromatin accessibility analysis. Workflow Checklist --- Step 0: Acquire Data (GEO/SRA Only) Skip this step if user has local FASTQ files. For public datasets, fetch from GEO/SRA first. See references/geo-sra-acquisition.md for the full workflow. Quick start: DECISION POINT: After fetching study info, confirm with use…

, # sample_1 (last resort)\n]\n\n\ndef extract_sample_info(filepath: str) -> Dict[str, str]:\n \"\"\"\n Extract sample metadata from filepath.\n\n Args:\n filepath: Path to sequencing file\n\n Returns:\n Dict with: sample, patient, lane (if detectable)\n \"\"\"\n filename = os.path.basename(filepath)\n\n # Remove extensions\n stem = filename\n for ext in ['.fastq.gz', '.fq.gz', '.fastq', '.fq', '.bam', '.cram', '.bai', '.crai']:\n if stem.lower().endswith(ext):\n stem = stem[:-len(ext)]\n break\n\n info = {}\n\n # Extract lane\n lane_match = re.search(LANE_PATTERN, stem)\n info['lane'] = f\"L{lane_match.group(1)}\" if lane_match else \"L001\"\n\n # Remove lane from stem\n clean_stem = re.sub(LANE_PATTERN, '_', stem)\n\n # Remove R1/R2 indicators and everything after\n for pattern, _ in R1_PATTERNS + R2_PATTERNS:\n clean_stem = re.sub(pattern + r'.*', '', clean_stem, flags=re.IGNORECASE)\n\n # Clean up trailing/multiple underscores and dots\n clean_stem = re.sub(r'[_.-]+

nf-core Pipeline Deployment Run nf-core bioinformatics pipelines on local or public sequencing data. Target users: Bench scientists and researchers without specialized bioinformatics training who need to run large-scale omics analyses—differential expression, variant calling, or chromatin accessibility analysis. Workflow Checklist --- Step 0: Acquire Data (GEO/SRA Only) Skip this step if user has local FASTQ files. For public datasets, fetch from GEO/SRA first. See references/geo-sra-acquisition.md for the full workflow. Quick start: DECISION POINT: After fetching study info, confirm with use…

, '', clean_stem)\n clean_stem = re.sub(r'[_.-]{2,}', '_', clean_stem)\n\n # Try to extract patient ID\n for pattern in PATIENT_PATTERNS:\n match = re.match(pattern, clean_stem, re.IGNORECASE)\n if match:\n info['patient'] = match.group(1)\n break\n\n # Sample is the cleaned stem\n info['sample'] = clean_stem if clean_stem else filename.split('.')[0]\n\n # Default patient to sample if not extracted\n if 'patient' not in info:\n info['patient'] = info['sample']\n\n return info\n\n\ndef infer_tumor_normal_status(sample_name: str) -> Optional[int]:\n \"\"\"\n Infer tumor (1) or normal (0) status from sample name.\n\n Args:\n sample_name: Sample identifier\n\n Returns:\n 1 for tumor, 0 for normal, None if cannot determine\n \"\"\"\n name_lower = sample_name.lower()\n\n # Check tumor indicators\n for pattern in TUMOR_KEYWORDS:\n if re.search(pattern, name_lower, re.IGNORECASE):\n return 1\n\n # Check normal indicators\n for pattern in NORMAL_KEYWORDS:\n if re.search(pattern, name_lower, re.IGNORECASE):\n return 0\n\n return None\n\n\ndef extract_replicate_number(sample_name: str) -> Optional[int]:\n \"\"\"\n Extract replicate number from sample name.\n\n Args:\n sample_name: Sample identifier\n\n Returns:\n Replicate number if found, None otherwise\n \"\"\"\n for pattern in REPLICATE_PATTERNS:\n match = re.search(pattern, sample_name, re.IGNORECASE)\n if match:\n try:\n return int(match.group(1))\n except ValueError:\n continue\n return None\n\n\ndef _get_pattern_score(filename: str, patterns: List[Tuple[str, int]]) -> int:\n \"\"\"Get highest matching pattern score.\"\"\"\n max_score = 0\n for pattern, score in patterns:\n if re.search(pattern, filename, re.IGNORECASE):\n max_score = max(max_score, score)\n return max_score\n\n\ndef _get_sample_key(filepath: str) -> str:\n \"\"\"Generate a key for grouping related files.\"\"\"\n info = extract_sample_info(filepath)\n sample = info['sample']\n lane = info.get('lane', 'L001')\n\n # Include lane in key for multi-lane 
samples\n if lane != \"L001\":\n return f\"{sample}_{lane}\"\n return sample\n\n\ndef match_read_pairs(files) -> Dict[str, Dict]:\n \"\"\"\n Match R1/R2 read pairs using scored pattern matching.\n\n Args:\n files: List of FileInfo objects (from file_discovery)\n\n Returns:\n Dict mapping sample_key to {'r1': path, 'r2': path, 'info': dict}\n \"\"\"\n # Classify files\n r1_files = []\n r2_files = []\n\n for file in files:\n filename = file.name if hasattr(file, 'name') else os.path.basename(str(file))\n filepath = file.path if hasattr(file, 'path') else str(file)\n\n r1_score = _get_pattern_score(filename, R1_PATTERNS)\n r2_score = _get_pattern_score(filename, R2_PATTERNS)\n\n if r2_score > r1_score and r2_score > 0:\n r2_files.append((filepath, r2_score))\n elif r1_score > 0:\n r1_files.append((filepath, r1_score))\n else:\n # No clear indicator - assume R1 (single-end or non-standard naming)\n r1_files.append((filepath, 0))\n\n # Build pairs by matching sample keys\n pairs = {}\n\n # Process R1 files first\n for r1_path, score in r1_files:\n key = _get_sample_key(r1_path)\n info = extract_sample_info(r1_path)\n\n if key not in pairs:\n pairs[key] = {\n 'r1': r1_path,\n 'r2': None,\n 'info': info,\n 'score': score\n }\n else:\n # Multiple R1 files for same sample (should not happen)\n pairs[key]['r1'] = r1_path\n\n # Match R2 files\n for r2_path, score in r2_files:\n key = _get_sample_key(r2_path)\n info = extract_sample_info(r2_path)\n\n if key in pairs:\n pairs[key]['r2'] = r2_path\n else:\n # R2 without matching R1\n pairs[key] = {\n 'r1': None,\n 'r2': r2_path,\n 'info': info,\n 'score': score\n }\n\n return pairs\n\n\ndef infer_patient_groupings(sample_names: List[str]) -> Dict[str, str]:\n \"\"\"\n Infer patient groupings from sample names.\n\n Groups samples that share a common prefix pattern.\n\n Args:\n sample_names: List of sample identifiers\n\n Returns:\n Dict mapping sample_name to patient_id\n \"\"\"\n patient_map = {}\n\n for sample in 
sample_names:\n # Try to find a patient pattern\n for pattern in PATIENT_PATTERNS:\n match = re.match(pattern, sample, re.IGNORECASE)\n if match:\n patient_map[sample] = match.group(1)\n break\n\n if sample not in patient_map:\n # Default: each sample is its own patient\n patient_map[sample] = sample\n\n return patient_map\n","content_type":"text/x-python; charset=utf-8","language":"python","size":7848,"content_sha256":"3f2dcebbc540fa76e9285dd247cb0d6922d4fbe2588b7f7b3ad7ac7ab82b63af"},{"filename":"scripts/utils/validators.py","content":"\"\"\"\nSamplesheet validation utilities.\n\nValidates samplesheet rows against pipeline configuration before writing,\ncatching errors early with helpful messages.\n\"\"\"\n\nimport os\nfrom dataclasses import dataclass, field\nfrom pathlib import Path\nfrom typing import Dict, List, Optional\nimport yaml\n\n\n@dataclass\nclass ValidationResult:\n \"\"\"Result of samplesheet validation.\"\"\"\n valid: bool\n errors: List[str] = field(default_factory=list)\n warnings: List[str] = field(default_factory=list)\n suggestions: List[str] = field(default_factory=list)\n\n def __bool__(self):\n return self.valid\n\n def summary(self) -> str:\n \"\"\"Generate human-readable summary.\"\"\"\n lines = []\n if self.errors:\n lines.append(\"Errors:\")\n for e in self.errors:\n lines.append(f\" - {e}\")\n if self.warnings:\n lines.append(\"Warnings:\")\n for w in self.warnings:\n lines.append(f\" - {w}\")\n if self.suggestions:\n lines.append(\"Suggestions:\")\n for s in self.suggestions:\n lines.append(f\" - {s}\")\n return \"\\n\".join(lines)\n\n\ndef load_pipeline_config(pipeline: str) -> Optional[Dict]:\n \"\"\"Load pipeline configuration from YAML file.\"\"\"\n # Find config directory relative to this file\n script_dir = Path(__file__).parent.parent.parent\n config_path = script_dir / \"config\" / \"pipelines\" / f\"{pipeline}.yaml\"\n\n if not config_path.exists():\n return None\n\n with open(config_path) as f:\n return 
yaml.safe_load(f)\n\n\ndef validate_samplesheet(\n rows: List[Dict],\n pipeline: str,\n config: Optional[Dict] = None\n) -> ValidationResult:\n \"\"\"\n Validate samplesheet rows against pipeline requirements.\n\n Args:\n rows: List of row dictionaries\n pipeline: Pipeline name (e.g., 'rnaseq', 'sarek')\n config: Optional pre-loaded config dict\n\n Returns:\n ValidationResult with errors, warnings, and suggestions\n \"\"\"\n errors = []\n warnings = []\n suggestions = []\n\n # Load config if not provided\n if config is None:\n config = load_pipeline_config(pipeline)\n\n if config is None:\n errors.append(f\"Unknown pipeline: {pipeline}\")\n return ValidationResult(valid=False, errors=errors)\n\n columns = config.get(\"samplesheet\", {}).get(\"columns\", [])\n required_cols = [c[\"name\"] for c in columns if c.get(\"required\", False)]\n\n if not rows:\n errors.append(\"Samplesheet is empty - no samples found\")\n return ValidationResult(valid=False, errors=errors)\n\n # Validate each row\n for i, row in enumerate(rows):\n row_num = i + 2 # Account for header row\n\n # Check required columns\n for col_name in required_cols:\n col_config = next((c for c in columns if c[\"name\"] == col_name), None)\n\n # Skip columns with conditions that don't apply\n if col_config and \"condition\" in col_config:\n # Simple condition check - skip for now\n # Full implementation would evaluate conditions\n pass\n\n if col_name not in row or row[col_name] is None or row[col_name] == \"\":\n # Check if there's a default\n if col_config and \"default\" in col_config:\n continue\n errors.append(f\"Row {row_num}: Missing required column '{col_name}'\")\n\n # Validate path columns exist\n for col_name in [\"fastq_1\", \"fastq_2\", \"bam\", \"bai\"]:\n if col_name in row and row[col_name]:\n path = row[col_name]\n if not os.path.exists(path):\n errors.append(f\"Row {row_num}: File not found: {path}\")\n elif not os.path.isfile(path):\n errors.append(f\"Row {row_num}: Not a file: 
{path}\")\n\n # Validate enum values\n for col_config in columns:\n col_name = col_config[\"name\"]\n if col_name in row and row[col_name] and \"allowed\" in col_config:\n value = row[col_name]\n allowed = col_config[\"allowed\"]\n if value not in allowed:\n errors.append(\n f\"Row {row_num}: Invalid value '{value}' for '{col_name}'. \"\n f\"Allowed: {allowed}\"\n )\n\n # Check R1/R2 pairing consistency\n r1 = row.get(\"fastq_1\", \"\")\n r2 = row.get(\"fastq_2\", \"\")\n if r1 and not r2:\n warnings.append(f\"Row {row_num}: Single-end data (no R2 file)\")\n elif r2 and not r1:\n errors.append(f\"Row {row_num}: R2 present but R1 missing\")\n\n # Check for duplicate samples\n sample_col = \"sample\" if \"sample\" in rows[0] else \"patient\"\n if sample_col in rows[0]:\n samples = [r.get(sample_col, \"\") for r in rows]\n duplicates = [s for s in set(samples) if samples.count(s) > 1]\n if duplicates:\n warnings.append(f\"Duplicate sample names: {duplicates}\")\n suggestions.append(\n \"Duplicates may be intentional (multi-lane sequencing). 
\"\n \"Verify sample grouping is correct.\"\n )\n\n # Pipeline-specific validation\n if pipeline == \"sarek\":\n _validate_sarek_specific(rows, errors, warnings, suggestions)\n elif pipeline == \"atacseq\":\n _validate_atacseq_specific(rows, errors, warnings, suggestions)\n\n return ValidationResult(\n valid=len(errors) == 0,\n errors=errors,\n warnings=warnings,\n suggestions=suggestions\n )\n\n\ndef _validate_sarek_specific(\n rows: List[Dict],\n errors: List[str],\n warnings: List[str],\n suggestions: List[str]\n):\n \"\"\"Sarek-specific validation for tumor/normal pairing.\"\"\"\n # Group by patient\n patients = {}\n for row in rows:\n patient = row.get(\"patient\", \"\")\n status = row.get(\"status\")\n\n if patient not in patients:\n patients[patient] = {\"tumor\": 0, \"normal\": 0, \"unknown\": 0}\n\n if status == 1:\n patients[patient][\"tumor\"] += 1\n elif status == 0:\n patients[patient][\"normal\"] += 1\n else:\n patients[patient][\"unknown\"] += 1\n\n # Check pairing\n for patient, counts in patients.items():\n if counts[\"tumor\"] > 0 and counts[\"normal\"] == 0:\n warnings.append(\n f\"Patient '{patient}': Tumor sample(s) without matched normal. \"\n \"Somatic calling works best with paired tumor-normal.\"\n )\n suggestions.append(\n f\"For patient '{patient}': Add a normal sample or use tumor-only mode.\"\n )\n\n if counts[\"unknown\"] > 0:\n warnings.append(\n f\"Patient '{patient}': {counts['unknown']} sample(s) with unknown status. 
\"\n \"Set status column to 0 (normal) or 1 (tumor).\"\n )\n\n\ndef _validate_atacseq_specific(\n rows: List[Dict],\n errors: List[str],\n warnings: List[str],\n suggestions: List[str]\n):\n \"\"\"ATAC-seq specific validation for replicates.\"\"\"\n # Group by sample (condition)\n samples = {}\n for row in rows:\n sample = row.get(\"sample\", \"\")\n replicate = row.get(\"replicate\", 1)\n\n if sample not in samples:\n samples[sample] = []\n\n samples[sample].append(replicate)\n\n # Check replicates\n for sample, reps in samples.items():\n if len(reps) \u003c 2:\n warnings.append(\n f\"Sample '{sample}': Only {len(reps)} replicate(s). \"\n \"Consensus peaks require 2+ replicates.\"\n )\n\n # Check for duplicate replicate numbers\n if len(reps) != len(set(reps)):\n errors.append(\n f\"Sample '{sample}': Duplicate replicate numbers detected. \"\n \"Each replicate must have a unique number.\"\n )\n\n # Check all samples have R2 (ATAC-seq requires paired-end)\n for i, row in enumerate(rows):\n if not row.get(\"fastq_2\"):\n errors.append(\n f\"Row {i+2}: ATAC-seq requires paired-end data. 
R2 file missing.\"\n )\n\n\ndef validate_file_exists(path: str) -> bool:\n \"\"\"Check if file exists and is accessible.\"\"\"\n return os.path.isfile(path) and os.access(path, os.R_OK)\n\n\ndef validate_absolute_path(path: str) -> bool:\n \"\"\"Check if path is absolute.\"\"\"\n return os.path.isabs(path)\n","content_type":"text/x-python; charset=utf-8","language":"python","size":8419,"content_sha256":"2572671195244a3ac7e67a6ebe4c44bbd541fbaa1afe3ebab1f81a46b29feea1"}],"content_json":{"type":"doc","content":[{"type":"heading","attrs":{"level":1},"content":[{"text":"nf-core Pipeline Deployment","type":"text"}]},{"type":"paragraph","content":[{"text":"Run nf-core bioinformatics pipelines on local or public sequencing data.","type":"text"}]},{"type":"paragraph","content":[{"text":"Target users:","type":"text","marks":[{"type":"strong"}]},{"text":" Bench scientists and researchers without specialized bioinformatics training who need to run large-scale omics analyses—differential expression, variant calling, or chromatin accessibility analysis.","type":"text"}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Workflow Checklist","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":""},"content":[{"text":"- [ ] Step 0: Acquire data (if from GEO/SRA)\n- [ ] Step 1: Environment check (MUST pass)\n- [ ] Step 2: Select pipeline (confirm with user)\n- [ ] Step 3: Run test profile (MUST pass)\n- [ ] Step 4: Create samplesheet\n- [ ] Step 5: Configure & run (confirm genome with user)\n- [ ] Step 6: Verify outputs","type":"text"}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":2},"content":[{"text":"Step 0: Acquire Data (GEO/SRA Only)","type":"text"}]},{"type":"paragraph","content":[{"text":"Skip this step if user has local FASTQ files.","type":"text","marks":[{"type":"strong"}]}]},{"type":"paragraph","content":[{"text":"For public datasets, fetch from GEO/SRA first. 
See ","type":"text"},{"text":"references/geo-sra-acquisition.md","type":"text","marks":[{"type":"link","attrs":{"href":"references/geo-sra-acquisition.md","title":null}}]},{"text":" for the full workflow.","type":"text"}]},{"type":"paragraph","content":[{"text":"Quick start:","type":"text","marks":[{"type":"strong"}]}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"# 1. Get study info\npython scripts/sra_geo_fetch.py info GSE110004\n\n# 2. Download (interactive mode)\npython scripts/sra_geo_fetch.py download GSE110004 -o ./fastq -i\n\n# 3. Generate samplesheet\npython scripts/sra_geo_fetch.py samplesheet GSE110004 --fastq-dir ./fastq -o samplesheet.csv","type":"text"}]},{"type":"paragraph","content":[{"text":"DECISION POINT:","type":"text","marks":[{"type":"strong"}]},{"text":" After fetching study info, confirm with user:","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Which sample subset to download (if multiple data types)","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Suggested genome and pipeline","type":"text"}]}]}]},{"type":"paragraph","content":[{"text":"Then continue to Step 1.","type":"text"}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":2},"content":[{"text":"Step 1: Environment Check","type":"text"}]},{"type":"paragraph","content":[{"text":"Run first. Pipeline will fail without passing environment.","type":"text","marks":[{"type":"strong"}]}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"python scripts/check_environment.py","type":"text"}]},{"type":"paragraph","content":[{"text":"All critical checks must pass. 
If any fail, provide fix instructions:","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Docker issues","type":"text"}]},{"type":"table","attrs":{"layout":null},"content":[{"type":"tr","content":[{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Problem","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Fix","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Not installed","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Install from https://docs.docker.com/get-docker/","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Permission denied","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"sudo usermod -aG docker $USER","type":"text","marks":[{"type":"code_inline"}]},{"text":" then re-login","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Daemon not running","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"sudo systemctl start docker","type":"text","marks":[{"type":"code_inline"}]}]}]}]}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Nextflow 
issues","type":"text"}]},{"type":"table","attrs":{"layout":null},"content":[{"type":"tr","content":[{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Problem","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Fix","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Not installed","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"curl -s https://get.nextflow.io | bash && mv nextflow ~/bin/","type":"text","marks":[{"type":"code_inline"}]}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Version \u003c 23.04","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"nextflow self-update","type":"text","marks":[{"type":"code_inline"}]}]}]}]}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Java issues","type":"text"}]},{"type":"table","attrs":{"layout":null},"content":[{"type":"tr","content":[{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Problem","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Fix","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Not installed / \u003c 
11","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"sudo apt install openjdk-11-jdk","type":"text","marks":[{"type":"code_inline"}]}]}]}]}]},{"type":"paragraph","content":[{"text":"Do not proceed until all checks pass.","type":"text","marks":[{"type":"strong"}]},{"text":" For HPC/Singularity, see ","type":"text"},{"text":"references/troubleshooting.md","type":"text","marks":[{"type":"link","attrs":{"href":"references/troubleshooting.md","title":null}}]},{"text":".","type":"text"}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":2},"content":[{"text":"Step 2: Select Pipeline","type":"text"}]},{"type":"paragraph","content":[{"text":"DECISION POINT: Confirm with user before proceeding.","type":"text","marks":[{"type":"strong"}]}]},{"type":"table","attrs":{"layout":null},"content":[{"type":"tr","content":[{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Data 
Type","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Pipeline","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Version","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Goal","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"RNA-seq","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"rnaseq","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"3.22.2","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Gene expression","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"WGS/WES","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"sarek","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"3.7.1","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Variant 
calling","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"ATAC-seq","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"atacseq","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"2.1.2","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Chromatin accessibility","type":"text"}]}]}]}]},{"type":"paragraph","content":[{"text":"Auto-detect from data:","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"python scripts/detect_data_type.py /path/to/data","type":"text"}]},{"type":"paragraph","content":[{"text":"For pipeline-specific details:","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"references/pipelines/rnaseq.md","type":"text","marks":[{"type":"link","attrs":{"href":"references/pipelines/rnaseq.md","title":null}}]}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"references/pipelines/sarek.md","type":"text","marks":[{"type":"link","attrs":{"href":"references/pipelines/sarek.md","title":null}}]}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"references/pipelines/atacseq.md","type":"text","marks":[{"type":"link","attrs":{"href":"references/pipelines/atacseq.md","title":null}}]}]}]}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":2},"content":[{"text":"Step 3: Run Test Profile","type":"text"}]},{"type":"paragraph","content":[{"text":"Validates environment with small data. 
MUST pass before real data.","type":"text","marks":[{"type":"strong"}]}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"nextflow run nf-core/\u003cpipeline> -r \u003cversion> -profile test,docker --outdir test_output","type":"text"}]},{"type":"table","attrs":{"layout":null},"content":[{"type":"tr","content":[{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Pipeline","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Command","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"rnaseq","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"nextflow run nf-core/rnaseq -r 3.22.2 -profile test,docker --outdir test_rnaseq","type":"text","marks":[{"type":"code_inline"}]}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"sarek","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"nextflow run nf-core/sarek -r 3.7.1 -profile test,docker --outdir test_sarek","type":"text","marks":[{"type":"code_inline"}]}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"atacseq","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"nextflow run nf-core/atacseq -r 2.1.2 -profile test,docker --outdir 
test_atacseq","type":"text","marks":[{"type":"code_inline"}]}]}]}]}]},{"type":"paragraph","content":[{"text":"Verify:","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"ls test_output/multiqc/multiqc_report.html\ngrep \"Pipeline completed successfully\" .nextflow.log","type":"text"}]},{"type":"paragraph","content":[{"text":"If test fails, see ","type":"text"},{"text":"references/troubleshooting.md","type":"text","marks":[{"type":"link","attrs":{"href":"references/troubleshooting.md","title":null}}]},{"text":".","type":"text"}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":2},"content":[{"text":"Step 4: Create Samplesheet","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Generate automatically","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"python scripts/generate_samplesheet.py /path/to/data \u003cpipeline> -o samplesheet.csv","type":"text"}]},{"type":"paragraph","content":[{"text":"The script:","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Discovers FASTQ/BAM/CRAM files","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Pairs R1/R2 reads","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Infers sample metadata","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Validates before writing","type":"text"}]}]}]},{"type":"paragraph","content":[{"text":"For sarek:","type":"text","marks":[{"type":"strong"}]},{"text":" Script prompts for tumor/normal status if not auto-detected.","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Validate existing samplesheet","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"python scripts/generate_samplesheet.py --validate 
samplesheet.csv \u003cpipeline>","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Samplesheet formats","type":"text"}]},{"type":"paragraph","content":[{"text":"rnaseq:","type":"text","marks":[{"type":"strong"}]}]},{"type":"code_block","attrs":{"wrap":false,"language":"csv"},"content":[{"text":"sample,fastq_1,fastq_2,strandedness\nSAMPLE1,/abs/path/R1.fq.gz,/abs/path/R2.fq.gz,auto","type":"text"}]},{"type":"paragraph","content":[{"text":"sarek:","type":"text","marks":[{"type":"strong"}]}]},{"type":"code_block","attrs":{"wrap":false,"language":"csv"},"content":[{"text":"patient,sample,lane,fastq_1,fastq_2,status\npatient1,tumor,L001,/abs/path/tumor_R1.fq.gz,/abs/path/tumor_R2.fq.gz,1\npatient1,normal,L001,/abs/path/normal_R1.fq.gz,/abs/path/normal_R2.fq.gz,0","type":"text"}]},{"type":"paragraph","content":[{"text":"atacseq:","type":"text","marks":[{"type":"strong"}]}]},{"type":"code_block","attrs":{"wrap":false,"language":"csv"},"content":[{"text":"sample,fastq_1,fastq_2,replicate\nCONTROL,/abs/path/ctrl_R1.fq.gz,/abs/path/ctrl_R2.fq.gz,1","type":"text"}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":2},"content":[{"text":"Step 5: Configure & Run","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"5a. Check genome availability","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"python scripts/manage_genomes.py check \u003cgenome>\n# If not installed:\npython scripts/manage_genomes.py download \u003cgenome>","type":"text"}]},{"type":"paragraph","content":[{"text":"Common genomes: GRCh38 (human), GRCh37 (legacy), GRCm39 (mouse), R64-1-1 (yeast), BDGP6 (fly)","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"5b. 
Decision points","type":"text"}]},{"type":"paragraph","content":[{"text":"DECISION POINT: Confirm with user:","type":"text","marks":[{"type":"strong"}]}]},{"type":"ordered_list","attrs":{"order":1,"listStyle":"number"},"content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Genome:","type":"text","marks":[{"type":"strong"}]},{"text":" Which reference to use","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Pipeline-specific options:","type":"text","marks":[{"type":"strong"}]}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"rnaseq:","type":"text","marks":[{"type":"strong"}]},{"text":" aligner (star_salmon recommended, hisat2 for low memory)","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"sarek:","type":"text","marks":[{"type":"strong"}]},{"text":" tools (haplotypecaller for germline, mutect2 for somatic)","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"atacseq:","type":"text","marks":[{"type":"strong"}]},{"text":" read_length (50, 75, 100, or 150)","type":"text"}]}]}]}]}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"5c. 
Run pipeline","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"nextflow run nf-core/\u003cpipeline> \\\n -r \u003cversion> \\\n -profile docker \\\n --input samplesheet.csv \\\n --outdir results \\\n --genome \u003cgenome> \\\n -resume","type":"text"}]},{"type":"paragraph","content":[{"text":"Key flags:","type":"text","marks":[{"type":"strong"}]}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"-r","type":"text","marks":[{"type":"code_inline"}]},{"text":": Pin version","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"-profile docker","type":"text","marks":[{"type":"code_inline"}]},{"text":": Use Docker (or ","type":"text"},{"text":"singularity","type":"text","marks":[{"type":"code_inline"}]},{"text":" for HPC)","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"--genome","type":"text","marks":[{"type":"code_inline"}]},{"text":": iGenomes key","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"-resume","type":"text","marks":[{"type":"code_inline"}]},{"text":": Reuse cached results from a previous run","type":"text"}]}]}]},{"type":"paragraph","content":[{"text":"Resource limits (if needed):","type":"text","marks":[{"type":"strong"}]},{"text":" These ","type":"text"},{"text":"--max_*","type":"text","marks":[{"type":"code_inline"}]},{"text":" flags apply to pipeline releases built on nf-core templates older than 3.0. 
Newer releases removed them; instead, set ","type":"text"},{"text":"process.resourceLimits = [cpus: 8, memory: 32.GB, time: 24.h]","type":"text","marks":[{"type":"code_inline"}]},{"text":" in a config file passed with ","type":"text"},{"text":"-c","type":"text","marks":[{"type":"code_inline"}]},{"text":".","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"--max_cpus 8 --max_memory '32.GB' --max_time '24.h'","type":"text"}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":2},"content":[{"text":"Step 6: Verify Outputs","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Check completion","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"ls results/multiqc/multiqc_report.html\ngrep \"Pipeline completed successfully\" 
.nextflow.log","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Key outputs by pipeline","type":"text"}]},{"type":"paragraph","content":[{"text":"rnaseq:","type":"text","marks":[{"type":"strong"}]}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"results/star_salmon/salmon.merged.gene_counts.tsv","type":"text","marks":[{"type":"code_inline"}]},{"text":" - Gene counts","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"results/star_salmon/salmon.merged.gene_tpm.tsv","type":"text","marks":[{"type":"code_inline"}]},{"text":" - TPM values","type":"text"}]}]}]},{"type":"paragraph","content":[{"text":"sarek:","type":"text","marks":[{"type":"strong"}]}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"results/variant_calling/*/","type":"text","marks":[{"type":"code_inline"}]},{"text":" - VCF files","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"results/preprocessing/recalibrated/","type":"text","marks":[{"type":"code_inline"}]},{"text":" - Recalibrated alignments (CRAM by default)","type":"text"}]}]}]},{"type":"paragraph","content":[{"text":"atacseq:","type":"text","marks":[{"type":"strong"}]}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"results/macs2/narrowPeak/","type":"text","marks":[{"type":"code_inline"}]},{"text":" - Peak calls","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"results/bwa/mergedLibrary/bigwig/","type":"text","marks":[{"type":"code_inline"}]},{"text":" - Coverage tracks","type":"text"}]}]}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":2},"content":[{"text":"Quick Reference","type":"text"}]},{"type":"paragraph","content":[{"text":"For common exit codes and fixes, see 
","type":"text"},{"text":"references/troubleshooting.md","type":"text","marks":[{"type":"link","attrs":{"href":"references/troubleshooting.md","title":null}}]},{"text":".","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Resume failed run","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"# From the same launch directory, repeat the original command (same -r, -profile and params) with -resume\nnextflow run nf-core/\u003cpipeline> -r \u003cversion> -profile docker --input samplesheet.csv --outdir results --genome \u003cgenome> -resume","type":"text"}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":2},"content":[{"text":"References","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"references/geo-sra-acquisition.md","type":"text","marks":[{"type":"link","attrs":{"href":"references/geo-sra-acquisition.md","title":null}}]},{"text":" - Downloading public GEO/SRA data","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"references/troubleshooting.md","type":"text","marks":[{"type":"link","attrs":{"href":"references/troubleshooting.md","title":null}}]},{"text":" - Common issues and fixes","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"references/installation.md","type":"text","marks":[{"type":"link","attrs":{"href":"references/installation.md","title":null}}]},{"text":" - Environment setup","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"references/pipelines/rnaseq.md","type":"text","marks":[{"type":"link","attrs":{"href":"references/pipelines/rnaseq.md","title":null}}]},{"text":" - RNA-seq pipeline details","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"references/pipelines/sarek.md","type":"text","marks":[{"type":"link","attrs":{"href":"references/pipelines/sarek.md","title":null}}]},{"text":" - Variant calling 
details","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"references/pipelines/atacseq.md","type":"text","marks":[{"type":"link","attrs":{"href":"references/pipelines/atacseq.md","title":null}}]},{"text":" - ATAC-seq details","type":"text"}]}]}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":2},"content":[{"text":"Disclaimer","type":"text"}]},{"type":"paragraph","content":[{"text":"This skill is provided as a prototype example demonstrating how to integrate nf-core bioinformatics pipelines into Claude Code for automated analysis workflows. The current implementation supports three pipelines (rnaseq, sarek, and atacseq), serving as a foundation that enables the community to expand support to the full set of nf-core pipelines.","type":"text"}]},{"type":"paragraph","content":[{"text":"It is intended for educational and research purposes and should not be considered production-ready without appropriate validation for your specific use case. Users are responsible for ensuring their computing environment meets pipeline requirements and for verifying analysis results.","type":"text"}]},{"type":"paragraph","content":[{"text":"Anthropic does not guarantee the accuracy of bioinformatics outputs, and users should follow standard practices for validating computational analyses. This integration is not officially endorsed by or affiliated with the nf-core community.","type":"text"}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Attribution","type":"text"}]},{"type":"paragraph","content":[{"text":"When publishing results, cite the appropriate pipeline. 
Citations are available in each nf-core repository's CITATIONS.md file (e.g., https://github.com/nf-core/rnaseq/blob/3.22.2/CITATIONS.md).","type":"text"}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Licenses","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"nf-core pipelines:","type":"text","marks":[{"type":"strong"}]},{"text":" MIT License (https://nf-co.re/about)","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Nextflow:","type":"text","marks":[{"type":"strong"}]},{"text":" Apache License, Version 2.0 (https://www.nextflow.io/about-us.html)","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"NCBI SRA Toolkit:","type":"text","marks":[{"type":"strong"}]},{"text":" Public Domain (https://github.com/ncbi/sra-tools/blob/master/LICENSE)","type":"text"}]}]}]},{"type":"hr","attrs":{"markup":"---"}}]},"metadata":{"date":"2026-06-05","name":"nextflow-development","author":"@skillopedia","source":{"stars":18616,"repo_name":"knowledge-work-plugins","origin_url":"https://github.com/anthropics/knowledge-work-plugins/blob/HEAD/bio-research/skills/nextflow-development/SKILL.md","repo_owner":"anthropics","body_sha256":"48b0c99125fe1678a1b258c468801ef6235ab42570fdfdc7757d3cf291248079","cluster_key":"4f6fea64a1f88dcfbd5ca9e650e4fe0056c439877f95b528a193f4945dbc16da","clean_bundle":{"format":"clean-skill-bundle-v1","source":"anthropics/knowledge-work-plugins/bio-research/skills/nextflow-development/SKILL.md","attachments":[{"id":"a11d3999-818b-5336-b7bd-866cb0b76b60","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/a11d3999-818b-5336-b7bd-866cb0b76b60/attachment.md","path":"references/geo-sra-acquisition.md","size":14122,"sha256":"85238c3bf2c43d2ca5ad8c4fb4418927ff3443bf8993813925a188a1c287d170","contentType":"text/markdown; 
charset=utf-8"},{"id":"91ad67cd-4c28-524a-8d28-9224b15b340c","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/91ad67cd-4c28-524a-8d28-9224b15b340c/attachment.md","path":"references/installation.md","size":1816,"sha256":"d214ad4508868ed9fd648ec625e633c2b7629df300a4953a4b89fc6010fa7577","contentType":"text/markdown; charset=utf-8"},{"id":"e5cfdda1-5ae0-55c7-bc22-fd1e832bad2e","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/e5cfdda1-5ae0-55c7-bc22-fd1e832bad2e/attachment.md","path":"references/pipelines/atacseq.md","size":4181,"sha256":"ca6ca22e409e78be3aa6f0e18de9c255fd5e9ac66c59d71ce214fc4a76849a1b","contentType":"text/markdown; charset=utf-8"},{"id":"5e9bb772-26ce-5861-967b-7480d700d3e5","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/5e9bb772-26ce-5861-967b-7480d700d3e5/attachment.md","path":"references/pipelines/rnaseq.md","size":3614,"sha256":"c7b0c59b33d52b62d6ca2a476f1f45c849dc83923773f258a0731ba2b7b7c93a","contentType":"text/markdown; charset=utf-8"},{"id":"806efd2b-d64a-57ca-971f-5302277d0d2d","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/806efd2b-d64a-57ca-971f-5302277d0d2d/attachment.md","path":"references/pipelines/sarek.md","size":4018,"sha256":"67fa5ee65022875b95e6181baa3eec67c572d0564d2aa7bb498d85551458c2fa","contentType":"text/markdown; charset=utf-8"},{"id":"78f5609b-af9e-515d-891b-cdc4e7544127","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/78f5609b-af9e-515d-891b-cdc4e7544127/attachment.md","path":"references/troubleshooting.md","size":3368,"sha256":"f49b6b48590170770d8e0eb69ee9894c3de04d779b0031d0515529f2e7c01450","contentType":"text/markdown; charset=utf-8"},{"id":"f79ccc0f-0186-5a6c-8102-d93957a4857f","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/f79ccc0f-0186-5a6c-8102-d93957a4857f/attachment.py","path":"scripts/check_environment.py","size":14111,"sha256":"ab259534959c86c4bcdb7a4a6d2eab1edf9d47de6e273c2a3ca0eb13e6b61604","contentType":"text/x-python; 
charset=utf-8"},{"id":"d55bff02-ded1-5179-80e5-fd4ac9fe507d","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/d55bff02-ded1-5179-80e5-fd4ac9fe507d/attachment.yaml","path":"scripts/config/genomes.yaml","size":3520,"sha256":"940bd833b87542a25705668a3ba8f157ea25fcb698f654f2d410efd410fa4172","contentType":"application/yaml; charset=utf-8"},{"id":"0eec19fe-c742-5b77-a67c-de117a16d9d3","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/0eec19fe-c742-5b77-a67c-de117a16d9d3/attachment.yaml","path":"scripts/config/pipelines/atacseq.yaml","size":5158,"sha256":"fb63a0f76fec6b71303e8a58bba8c4c38a56ee533784d438953fe5e394444158","contentType":"application/yaml; charset=utf-8"},{"id":"de54547a-4b6a-5ec9-b38d-959552919f88","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/de54547a-4b6a-5ec9-b38d-959552919f88/attachment.yaml","path":"scripts/config/pipelines/rnaseq.yaml","size":4173,"sha256":"8df7b123fc64328aa4b97d047b316cdbaf4622d89684dfd0b6aca5a5da263f5a","contentType":"application/yaml; charset=utf-8"},{"id":"583ad52a-650e-5c8b-9b2c-fe61d84e1be3","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/583ad52a-650e-5c8b-9b2c-fe61d84e1be3/attachment.yaml","path":"scripts/config/pipelines/sarek.yaml","size":6278,"sha256":"03a904ab4a5f3de216200b60646f58faa37416d4bafda6289fcca91caf1c4f2f","contentType":"application/yaml; charset=utf-8"},{"id":"28794f6a-ad60-5feb-8d92-7582591de6b7","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/28794f6a-ad60-5feb-8d92-7582591de6b7/attachment.py","path":"scripts/detect_data_type.py","size":9953,"sha256":"b69c0293c00f292b27e00b5d36629469e843e60f952d299ce85845fbf258314b","contentType":"text/x-python; charset=utf-8"},{"id":"7f13aa15-df58-5074-8fc8-bb9067eb859e","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/7f13aa15-df58-5074-8fc8-bb9067eb859e/attachment.py","path":"scripts/generate_samplesheet.py","size":15440,"sha256":"14a4acb03e08369ea9d73de872b20e3578bcd3eec7bbfa03c2df6adcf5eacbf1","contentType":"text/x-python; 
charset=utf-8"},{"id":"42e29048-b65d-5956-aae4-7a7933294b25","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/42e29048-b65d-5956-aae4-7a7933294b25/attachment.py","path":"scripts/manage_genomes.py","size":16718,"sha256":"d27e36e0c442b081ef7e60d426c037967afec97e441e8fd7ae0a5e436b64e863","contentType":"text/x-python; charset=utf-8"},{"id":"5c73b10a-865e-5032-8c61-9a4e9c6a5afa","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/5c73b10a-865e-5032-8c61-9a4e9c6a5afa/attachment.py","path":"scripts/sra_geo_fetch.py","size":24982,"sha256":"0bd4f292350b47b71e0048a223ca50ebdde0ced1f33d5661daf756c73d1b5a31","contentType":"text/x-python; charset=utf-8"},{"id":"b2867b1d-4308-589b-9e61-a1bef34de68e","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/b2867b1d-4308-589b-9e61-a1bef34de68e/attachment.py","path":"scripts/utils/__init__.py","size":1788,"sha256":"4ef421fc4ad6f13aafc0c34e8c92cf3463cebb0c9f7732d2b9ff876c5cd4d89c","contentType":"text/x-python; charset=utf-8"},{"id":"ebe832ae-6af8-55ae-ba57-2c0f31ec59d6","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/ebe832ae-6af8-55ae-ba57-2c0f31ec59d6/attachment.py","path":"scripts/utils/file_discovery.py","size":5169,"sha256":"3540e5411b1bf52e017a1636a127c4f52392f5e888e0ace47361e4c29e3ef380","contentType":"text/x-python; charset=utf-8"},{"id":"7784fe08-3602-5bdd-a4d1-ff99894c74ff","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/7784fe08-3602-5bdd-a4d1-ff99894c74ff/attachment.py","path":"scripts/utils/ncbi_utils.py","size":28480,"sha256":"bfed0a0570f3344036277867c3e69fb6136ae87040ed2640d5b62919a046cfc8","contentType":"text/x-python; charset=utf-8"},{"id":"3a1d8e56-2ded-5a4a-b4d7-190e914e2191","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/3a1d8e56-2ded-5a4a-b4d7-190e914e2191/attachment.py","path":"scripts/utils/sample_inference.py","size":7848,"sha256":"3f2dcebbc540fa76e9285dd247cb0d6922d4fbe2588b7f7b3ad7ac7ab82b63af","contentType":"text/x-python; 
charset=utf-8"},{"id":"d887c099-dde4-5a93-97c1-59a3b9836c2b","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/d887c099-dde4-5a93-97c1-59a3b9836c2b/attachment.py","path":"scripts/utils/validators.py","size":8419,"sha256":"2572671195244a3ac7e67a6ebe4c44bbd541fbaa1afe3ebab1f81a46b29feea1","contentType":"text/x-python; charset=utf-8"}],"bundle_sha256":"df5ecdde4f62921430324a442889fa6e7b8e32be340544e2719fbbdc48c3b5d4","attachment_count":20,"text_attachments":20,"attachment_storage":"skillopedia-attachments-v1","binary_attachments":0,"excluded_attachments":[]},"cluster_size":2,"skill_md_path":"bio-research/skills/nextflow-development/SKILL.md","import_metadata":{"date":"2026-06-05","author":"@skillopedia","version":"v1","category":"data-analytics","category_label":"Data"},"exact_dupes_collapsed_into_this":1},"version":"v1","category":"data-analytics","import_tag":"clean-skills-v1","description":"Run nf-core bioinformatics pipelines (rnaseq, sarek, atacseq) on sequencing data. Use when analyzing RNA-seq, WGS/WES, or ATAC-seq data—either local FASTQs or public datasets from GEO/SRA. Triggers on nf-core, Nextflow, FASTQ analysis, variant calling, gene expression, differential expression, GEO reanalysis, GSE/GSM/SRR accessions, or samplesheet creation."}},"renderedAt":1782979459095}
