Rare Disease Diagnosis Advisor Systematic diagnosis support for rare diseases using phenotype matching, gene panel prioritization, and variant interpretation across Orphanet, OMIM, HPO, ClinVar, and structure-based analysis. KEY PRINCIPLES : 1. Report-first - Create report file FIRST, update progressively 2. Phenotype-driven - Convert symptoms to HPO terms before searching 3. Multi-database triangulation - Cross-reference Orphanet, OMIM, OpenTargets 4. Evidence grading - Grade diagnoses by supporting evidence strength 5. English-first queries - Always use English terms in tool calls LOOK UP,…

, variant_info.get('aa_change', ''))\n if variant_info.get('uniprot_id') and aa_match:\n ref_aa, pos_str, alt_aa = aa_match.groups()\n position = int(pos_str)\n up = tu.tools.UniProt_get_entry_by_accession(accession=variant_info['uniprot_id'])\n wt_seq = ((up.get('data') or {}).get('sequence', {}) or {}).get('value') if up.get('status') == 'success' else None\n # Only call SAE when the WT residue at `position` matches `ref_aa` —\n # otherwise the variant is on a different isoform than the UniProt\n # canonical sequence, and a silent skip is safer than the tool's\n # ref_aa-mismatch error in the diagnostic report.\n if wt_seq and 1 \u003c= position \u003c= len(wt_seq) and wt_seq[position - 1] == ref_aa:\n mech = tu.tools.ESM_explain_variant_mechanism(\n sequence=wt_seq, position=position, ref_aa=ref_aa, alt_aa=alt_aa,\n top_k_features=5,\n )\n if mech.get('status') == 'success':\n predictions['sae_mechanism'] = {\n 'summary': mech['data']['mechanism_summary'],\n 'lost_categories': mech['data']['lost_feature_categories'],\n 'gained_categories': mech['data']['gained_feature_categories'],\n # Map to ACMG: catalytic / ligand-binding / ptm loss is\n # mechanistic evidence supporting PP3 (does not replace\n # functional study PS3).\n }\n\n # 3. EVE - Evolutionary prediction\n eve = tu.tools.EVE_get_variant_score(\n chrom=variant_info['chrom'],\n pos=variant_info['pos'],\n ref=variant_info['ref'],\n alt=variant_info['alt']\n )\n if eve.get('status') == 'success':\n eve_scores = eve['data'].get('eve_scores', [])\n if eve_scores:\n predictions['eve'] = {\n 'score': eve_scores[0].get('eve_score'),\n 'classification': eve_scores[0].get('classification'),\n 'acmg': 'PP3' if eve_scores[0].get('eve_score', 0) > 0.5 else 'BP4'\n }\n\n # 4. 
SpliceAI - Splice variant prediction\n variant_str = f\"chr{variant_info['chrom']}-{variant_info['pos']}-{variant_info['ref']}-{variant_info['alt']}\"\n splice = tu.tools.SpliceAI_predict_splice(\n variant=variant_str,\n genome=\"38\"\n )\n if splice.get('data'):\n max_score = splice['data'].get('max_delta_score', 0)\n interpretation = splice['data'].get('interpretation', '')\n\n if max_score >= 0.8:\n splice_acmg = 'PP3 (strong) - high splice impact'\n elif max_score >= 0.5:\n splice_acmg = 'PP3 (moderate) - splice impact'\n elif max_score >= 0.2:\n splice_acmg = 'PP3 (supporting) - possible splice effect'\n else:\n splice_acmg = 'BP7 (if synonymous) - no splice impact'\n\n predictions['spliceai'] = {\n 'max_delta_score': max_score,\n 'interpretation': interpretation,\n 'scores': splice['data'].get('scores', []),\n 'acmg': splice_acmg\n }\n\n # Consensus for PP3/BP4\n damaging = sum(1 for p in predictions.values() if 'PP3' in p.get('acmg', ''))\n benign = sum(1 for p in predictions.values() if 'BP4' in p.get('acmg', ''))\n\n return {\n 'predictions': predictions,\n 'consensus': {\n 'damaging_count': damaging,\n 'benign_count': benign,\n 'pp3_applicable': damaging >= 2 and benign == 0,\n 'bp4_applicable': benign >= 2 and damaging == 0\n }\n }\n```\n\n### 4.4 ACMG Classification Criteria\n\n| Evidence Type | Criteria | Weight |\n|---------------|----------|--------|\n| **PVS1** | Null variant in gene where LOF is mechanism | Very Strong |\n| **PS1** | Same amino acid change as established pathogenic | Strong |\n| **PM2** | Absent from population databases | Moderate |\n| **PP3** | Computational evidence supports deleterious (AlphaMissense, CADD, EVE, SpliceAI) | Supporting |\n| **BA1** | Allele frequency >5% | Benign standalone |\n\n**Enhanced PP3 Evidence**:\n- **AlphaMissense pathogenic** (>0.564) = Strong PP3 support (~90% accuracy)\n- **CADD >=20** + **EVE >0.5** = Multiple concordant predictions\n- Agreement from 2+ predictors strengthens PP3 
evidence\n\n---\n\n## Phase 5: Structure Analysis for VUS\n\n### 5.1 When to Perform Structure Analysis\n\nPerform when:\n- Variant is VUS or conflicting interpretations\n- Missense variant in critical domain\n- Novel variant not in databases\n- Additional evidence needed for classification\n\n### 5.2 Structure Prediction (NVIDIA NIM)\n\n```python\ndef analyze_variant_structure(tu, protein_sequence, variant_position):\n \"\"\"Predict structure and analyze variant impact.\"\"\"\n\n structure = tu.tools.NvidiaNIM_alphafold2(\n sequence=protein_sequence,\n algorithm=\"mmseqs2\",\n relax_prediction=False\n )\n\n variant_plddt = get_residue_plddt(structure, variant_position)\n confidence = \"High\" if variant_plddt > 70 else \"Low\"\n\n return {\n 'structure': structure,\n 'variant_plddt': variant_plddt,\n 'confidence': confidence\n }\n```\n\n### 5.3 Domain Impact Assessment\n\n```python\ndef assess_domain_impact(tu, uniprot_id, variant_position):\n \"\"\"Check if variant affects functional domain.\"\"\"\n\n domains = tu.tools.InterPro_get_protein_domains(accession=uniprot_id)\n\n for domain in domains:\n if domain['start'] \u003c= variant_position \u003c= domain['end']:\n return {\n 'in_domain': True,\n 'domain_name': domain['name'],\n 'domain_function': domain['description']\n }\n\n return {'in_domain': False}\n```\n\n---\n\n## Phase 6: Literature Evidence\n\n### 6.1 Published Literature (PubMed)\n\n```python\ndef search_disease_literature(tu, disease_name, genes):\n \"\"\"Search for relevant published literature.\"\"\"\n\n disease_papers = tu.tools.PubMed_search_articles(\n query=f'\"{disease_name}\" AND (genetics OR mutation OR variant)',\n limit=20\n )\n\n gene_papers = []\n for gene in genes[:5]:\n papers = tu.tools.PubMed_search_articles(\n query=f'\"{gene}\" AND rare disease AND pathogenic',\n limit=10\n )\n gene_papers.extend(papers)\n\n return {\n 'disease_literature': disease_papers,\n 'gene_literature': gene_papers\n }\n```\n\n### 6.2 Preprint Literature 
(BioRxiv/MedRxiv)\n\n```python\ndef search_preprints(tu, disease_name, genes):\n \"\"\"Search preprints for cutting-edge findings.\"\"\"\n\n biorxiv = tu.tools.BioRxiv_list_recent_preprints(\n query=f\"{disease_name} genetics\",\n limit=10\n )\n\n arxiv = tu.tools.ArXiv_search_papers(\n query=f\"rare disease diagnosis {' OR '.join(genes[:3])}\",\n category=\"q-bio\",\n limit=5\n )\n\n return {\n 'biorxiv': biorxiv,\n 'arxiv': arxiv\n }\n```\n\n### 6.3 Citation Analysis (OpenAlex)\n\n```python\ndef analyze_citations(tu, key_papers):\n \"\"\"Analyze citation network for key papers.\"\"\"\n\n citation_analysis = []\n for paper in key_papers[:5]:\n work = tu.tools.openalex_search_works(\n query=paper['title'],\n limit=1\n )\n if work:\n citation_analysis.append({\n 'title': paper['title'],\n 'citations': work[0].get('cited_by_count', 0),\n 'year': work[0].get('publication_year')\n })\n\n return citation_analysis\n```\n","content_type":"text/markdown; charset=utf-8","language":"markdown","size":22098,"content_sha256":"db518547bd44e240de7e7ef8c68132f24674ab277ba6ef8d1abc5cdc115ba7f0"},{"filename":"EXAMPLES.md","content":"# Rare Disease Diagnosis - Examples\n\n## Example 1: Connective Tissue Disorder\n\n### User Query\n\"23-year-old male with tall stature (6'5\"), long fingers, joint hypermobility, and recent echo showing aortic root dilation 4.2cm. Father had similar build and died suddenly at age 45. What could this be?\"\n\n### Expected Output\n\n```markdown\n# Rare Disease Diagnostic Report\n\n**Patient ID**: Example-001 | **Date**: 2026-02-04\n\n---\n\n## Executive Summary\n\nThis patient presents with a classic phenotype highly suggestive of **Marfan syndrome** (★★★). Key findings include tall stature, arachnodactyly, joint hypermobility, and aortic root dilation with a positive family history of sudden death. The phenotype match is 89% for Marfan syndrome. 
**Immediate FBN1 genetic testing is recommended**, along with urgent cardiology referral given the aortic dilation and family history of sudden death.\n\n---\n\n## 1. Phenotype Analysis\n\n### 1.1 Standardized HPO Terms\n\n| Clinical Feature | HPO Term | HPO ID | Category |\n|------------------|----------|--------|----------|\n| Tall stature (6'5\") | Tall stature | HP:0000098 | Core |\n| Long fingers | Arachnodactyly | HP:0001166 | Core |\n| Joint hypermobility | Joint hypermobility | HP:0001382 | Core |\n| Aortic root dilation | Aortic root aneurysm | HP:0002616 | Core |\n| Family history of sudden death | Sudden cardiac death | HP:0001645 | Family |\n\n**Total HPO Terms**: 5\n**Age of Onset**: Childhood (stature)\n**Family History**: Father deceased age 45, sudden death, similar phenotype\n**Suspected Inheritance**: Autosomal dominant\n\n*Source: HPO via `HPO_search_terms`*\n\n---\n\n## 2. Differential Diagnosis\n\n### 2.1 Ranked Candidate Diseases\n\n| Rank | Disease | ORPHA | OMIM | Match | Inheritance | Genes |\n|------|---------|-------|------|-------|-------------|-------|\n| 1 | Marfan syndrome | 558 | 154700 | 89% ★★★ | AD | FBN1 |\n| 2 | Loeys-Dietz syndrome | 60030 | 609192 | 72% ★★☆ | AD | TGFBR1, TGFBR2, SMAD3 |\n| 3 | Vascular EDS | 286 | 130050 | 58% ★☆☆ | AD | COL3A1 |\n| 4 | Familial TAAD | 91387 | 607086 | 52% ★☆☆ | AD | ACTA2, MYH11 |\n| 5 | Homocystinuria | 394 | 236200 | 45% ☆☆☆ | AR | CBS |\n\n### 2.2 Disease Details\n\n#### 1. 
Marfan Syndrome (★★★)\n\n**ORPHA**: 558 | **OMIM**: 154700 | **Prevalence**: 1-5/10,000\n\n**Phenotype Comparison**:\n| Patient Feature | Marfan Feature | Frequency | Match |\n|-----------------|----------------|-----------|-------|\n| Tall stature | Tall stature | 95% | ✓ |\n| Arachnodactyly | Arachnodactyly | 90% | ✓ |\n| Joint hypermobility | Joint hypermobility | 85% | ✓ |\n| Aortic root dilation | Aortic root dilation | 80% | ✓ |\n| Family sudden death | Aortic dissection | 30% | ✓ |\n| Ectopia lentis | Ectopia lentis | 60% | Not assessed |\n\n**Ghent Criteria Assessment**:\n- Aortic root Z-score: Likely ≥2 (4.2cm at age 23)\n- Systemic score: ≥7 points (tall, arachnodactyly, joint hypermobility)\n- FBN1 mutation: Pending testing\n- **Clinical diagnosis likely met even without genetic testing**\n\n**Gene**: FBN1 (fibrillin-1)\n- ClinGen validity: Definitive\n- Inheritance: AD (25% de novo)\n\n*Source: Orphanet via `Orphanet_558`, OMIM via `OMIM_get_entry`*\n\n#### 2. Loeys-Dietz Syndrome (★★☆)\n\n**ORPHA**: 60030 | **OMIM**: 609192\n\n**Key Distinguishing Features** (not present in patient):\n- Hypertelorism (wide-set eyes)\n- Bifid uvula or cleft palate\n- Arterial tortuosity\n- Translucent skin\n\n**Consider if**: FBN1 negative AND craniofacial features present\n\n*Source: Orphanet, OMIM*\n\n---\n\n## 3. Recommended Gene Panel\n\n### 3.1 Prioritized Genes\n\n| Priority | Gene | Disease | Evidence | pLI | Expression |\n|----------|------|---------|----------|-----|------------|\n| ★★★ | FBN1 | Marfan | Definitive | 1.00 | Aorta, heart |\n| ★★☆ | TGFBR1 | LDS1 | Definitive | 0.98 | Ubiquitous |\n| ★★☆ | TGFBR2 | LDS2 | Definitive | 0.99 | Ubiquitous |\n| ★★☆ | SMAD3 | LDS3 | Definitive | 0.89 | Ubiquitous |\n| ★☆☆ | COL3A1 | vEDS | Definitive | 1.00 | Connective |\n| ★☆☆ | ACTA2 | FTAAD | Definitive | 0.97 | Smooth muscle |\n\n### 3.2 Testing Strategy\n\n**Recommended Approach**:\n1. 
**Immediate**: FBN1 sequencing + deletion/duplication analysis\n - Highest pre-test probability\n - Expected turnaround: 2-4 weeks\n \n2. **If FBN1 negative**: Aortopathy gene panel\n - TGFBR1, TGFBR2, SMAD3, COL3A1, ACTA2, MYH11\n \n3. **If panel negative**: Consider WES with phenotype-guided analysis\n\n*Source: ClinGen gene-disease validity, GTEx expression*\n\n---\n\n## 4. Variant Interpretation\n\n**No variants provided for interpretation.**\n\nGenetic testing recommended - see Section 3.\n\n---\n\n## 5. Structural Analysis\n\n**Not applicable** - No VUS requiring structural analysis.\n\n---\n\n## 6. Clinical Recommendations\n\n### 6.1 Diagnostic Next Steps\n\n| Priority | Action | Rationale | Timeline |\n|----------|--------|-----------|----------|\n| 1 | **Urgent cardiology referral** | Aortic root 4.2cm + family history | This week |\n| 2 | **FBN1 genetic testing** | Confirm diagnosis, family cascade | Order now |\n| 3 | **Ophthalmology exam** | Ectopia lentis screening | Within 1 month |\n| 4 | **Full skeletal assessment** | Document systemic features | Within 1 month |\n\n### 6.2 Specialist Referrals\n\n- **Cardiology** (URGENT): Aortic surveillance, beta-blocker consideration\n- **Medical Genetics**: Genetic counseling, testing coordination\n- **Ophthalmology**: Slit-lamp exam for ectopia lentis\n- **Orthopedics**: Scoliosis screening if indicated\n\n### 6.3 Family Screening\n\n**High priority given family history**:\n- Father deceased (autopsy results if available)\n- Any siblings should be offered:\n - Clinical screening (echo, skeletal exam)\n - Genetic testing once proband result available\n- Extended family (paternal) should be informed\n\n### 6.4 Urgent Considerations\n\n⚠️ **URGENT**: Given aortic root dilation AND family history of sudden death:\n- Avoid competitive sports and isometric exercise\n- Discuss blood pressure management\n- Review for aortic dissection symptoms\n- Cardiology referral within 1 week\n\n---\n\n## 7. 
Data Gaps & Limitations\n\n| Gap | Impact | Recommendation |\n|-----|--------|----------------|\n| No ophthalmology exam | Cannot assess ectopia lentis | Schedule exam |\n| Echo Z-score not calculated | Need BSA-adjusted measurement | Request from cardiology |\n| Father's autopsy unknown | Cannot confirm aortic dissection | Obtain records |\n| No genetic testing yet | Diagnosis presumptive | Order FBN1 testing |\n\n---\n\n## 8. Data Sources\n\n| Tool | Query | Data Retrieved |\n|------|-------|----------------|\n| HPO_search_terms | Patient symptoms | HPO term mapping |\n| Orphanet_558 | Marfan syndrome | Disease details |\n| OMIM_get_entry | 154700 | Clinical synopsis |\n| ClinGen | FBN1-Marfan | Gene-disease validity |\n| GTEx | FBN1 expression | Tissue expression |\n```\n\n---\n\n## Example 2: Pediatric Neurological Phenotype\n\n### User Query\n\"5-year-old with developmental delay, hypotonia, seizures starting at age 2, and MRI showing periventricular white matter changes. WES found a VUS: GFAP c.1186C>T (p.Arg396Cys). What's the diagnosis?\"\n\n### Expected Output (Key Sections)\n\n```markdown\n# Rare Disease Diagnostic Report\n\n**Patient ID**: Example-002 | **Date**: 2026-02-04\n\n---\n\n## Executive Summary\n\nThis patient's phenotype and VUS in GFAP are highly consistent with **Alexander disease** (★★★). The combination of developmental delay, hypotonia, seizures, and frontal-predominant white matter changes in a young child matches infantile/juvenile Alexander disease. The GFAP p.Arg396Cys variant affects a highly conserved residue in the rod domain. **Structural analysis and segregation studies are recommended to support reclassification of this VUS to Likely Pathogenic.**\n\n---\n\n## 1. 
Phenotype Analysis\n\n| Clinical Feature | HPO Term | HPO ID | Category |\n|------------------|----------|--------|----------|\n| Developmental delay | Global developmental delay | HP:0001263 | Core |\n| Hypotonia | Muscular hypotonia | HP:0001252 | Core |\n| Seizures (age 2) | Seizures | HP:0001250 | Core |\n| White matter changes | Leukoencephalopathy | HP:0002352 | Core |\n| Frontal predominance | Frontal white matter abnormality | HP:0012762 | Specific |\n\n---\n\n## 2. Differential Diagnosis\n\n| Rank | Disease | ORPHA | Match | Key Features |\n|------|---------|-------|-------|--------------|\n| 1 | Alexander disease | 58 | 92% ★★★ | GFAP, frontal WM, macrocephaly |\n| 2 | Vanishing white matter | 135 | 68% ★★☆ | eIF2B genes, progressive |\n| 3 | Canavan disease | 141 | 55% ★☆☆ | ASPA, NAA elevated |\n| 4 | Metachromatic leukodystrophy | 512 | 48% ★☆☆ | ARSA, progressive |\n\n### Alexander Disease Details\n\n**Diagnostic Criteria**:\n1. Clinical: Macrocephaly, seizures, developmental delay ✓\n2. MRI: Frontal-predominant white matter changes ✓\n3. Genetic: Heterozygous GFAP variant ✓ (VUS)\n\n*Source: Orphanet via `Orphanet_58`*\n\n---\n\n## 4. 
Variant Interpretation\n\n### Variant: GFAP c.1186C>T (p.Arg396Cys)\n\n| Property | Value | Interpretation |\n|----------|-------|----------------|\n| Gene | GFAP | Alexander disease gene |\n| Consequence | Missense | Amino acid change |\n| ClinVar | VUS | 1 submission |\n| gnomAD AF | 0.0000032 | Extremely rare (PM2) |\n| CADD | 29.2 | Deleterious |\n| REVEL | 0.89 | Likely damaging |\n\n### ACMG Evidence\n\n| Criterion | Evidence | Strength |\n|-----------|----------|----------|\n| PM2 | Extremely rare in gnomAD (AF 0.0000032) | Moderate |\n| PP3 | CADD=29.2, REVEL=0.89 | Supporting |\n| PP4 | Phenotype specific for Alexander | Supporting |\n| PM1 | Located in rod domain (critical) | Moderate |\n\n**Current Classification**: VUS (2 Moderate + 2 Supporting)\n**With confirmed de novo status (PS2) or functional data (PS3)**: Would become Likely Pathogenic\n\n---\n\n## 5. Structural Analysis\n\n### 5.1 Structure Prediction\n\n**Method**: AlphaFold2 via NVIDIA NIM\n**Protein**: Glial fibrillary acidic protein (GFAP)\n**Sequence Length**: 432 amino acids\n\n| Metric | Value |\n|--------|-------|\n| Mean pLDDT | 78.4 |\n| Position 396 pLDDT | 89.2 (high confidence) |\n| Domain | Rod domain, coil 2B |\n\n### 5.2 Variant Impact\n\n**p.Arg396Cys Analysis**:\n\n| Feature | Finding |\n|---------|---------|\n| Location | Rod domain (coiled-coil) |\n| Wild-type | Arginine (positive, polar) |\n| Mutant | Cysteine (neutral, potential disulfide) |\n| Conservation | 100% across vertebrates |\n| Nearby pathogenic | p.Arg398Trp (Pathogenic) |\n\n**Structural Interpretation**:\n- Arginine at position 396 participates in coiled-coil interactions\n- Cysteine substitution disrupts ionic interactions\n- Nearby residue 398 has pathogenic variants\n- **Strong structural support for pathogenicity**\n\n*Source: NVIDIA NIM via `NvidiaNIM_alphafold2`, InterPro*\n\n---\n\n## 6. Clinical Recommendations\n\n### Immediate Actions\n1. **Clinical genetics consultation** - Discuss VUS implications\n2. 
**Parental testing** - De novo status would upgrade variant (PS2)\n3. **Neurology follow-up** - Seizure management, prognosis discussion\n\n### Supporting Studies\n- Obtain parental samples for GFAP c.1186C>T\n- If not in parents → add PS2 (strong evidence) → Likely Pathogenic\n```\n\n---\n\n## Example 3: Limited Data Scenario\n\n### User Query\n\"6-month-old with severe hypotonia and feeding difficulties. No genetic testing done yet. Where do we start?\"\n\n### Expected Output (Key Sections)\n\n```markdown\n## Executive Summary\n\nThis infant presents with a non-specific phenotype (hypotonia + feeding difficulties) that could represent numerous conditions. Given the limited phenotypic information, **a broad approach is recommended**: either chromosomal microarray + trio WES, or a comprehensive neuromuscular panel. Additional clinical information (neuroimaging, EMG, metabolic workup) would help narrow the differential.\n\n---\n\n## 2. Differential Diagnosis\n\n**Note**: With only 2 phenotypic features, differential diagnosis is broad.\n\n### Top Categories to Consider\n\n| Category | Examples | Key Tests |\n|----------|----------|-----------|\n| **SMA/Neuromuscular** | SMA type 1, CMD | SMN1 del/dup, CK |\n| **Congenital myopathy** | RYR1, MTM1 | Muscle biopsy, EMG |\n| **Chromosomal** | Prader-Willi, 1p36 del | CMA |\n| **Metabolic** | Pompe, mitochondrial | GAA enzyme, lactate |\n| **Syndromic** | Various | Dysmorphology exam |\n\n### Priority Testing\n\n| Test | Yield | Rationale |\n|------|-------|-----------|\n| Chromosomal microarray | 15-20% | Screen for CNVs |\n| SMN1 deletion | 8-10% if SMA | Treatable if positive |\n| Trio WES | 25-40% | Broad diagnostic screen |\n\n---\n\n## 3. 
Recommended Additional Workup\n\nBefore genetic testing, consider:\n\n| Test | Purpose | Urgency |\n|------|---------|---------|\n| CK level | Myopathy vs neuropathy | High |\n| Lactate | Metabolic disease | Medium |\n| Brain MRI | Structural vs metabolic | Medium |\n| EMG/NCS | Localize lesion | Medium |\n| Dysmorphology exam | Syndromic features | High |\n\n---\n\n## 6. Recommendations\n\n### Immediate\n1. **Genetics referral** for comprehensive evaluation\n2. **CK and metabolic labs** as first-line screening\n3. **SMN1 testing** - treatable condition, high impact\n\n### Diagnostic Strategy Options\n\n**Option A**: Targeted approach\n- SMN1 → CMA → Neuromuscular panel → WES\n\n**Option B**: Broad approach (recommended if resources allow)\n- Trio WES + CMA concurrently\n\n---\n\n## 7. Data Gaps\n\n| Gap | Impact | Action Needed |\n|-----|--------|---------------|\n| Limited phenotype | Broad differential | Detailed clinical exam |\n| No neuroimaging | Cannot assess CNS | Order brain MRI |\n| No metabolic studies | May miss treatable | Order basic metabolic |\n| No family history | Cannot assess inheritance | Take detailed pedigree |\n\n*Note: This case has significant diagnostic uncertainty due to limited phenotypic information.*\n```\n","content_type":"text/markdown; charset=utf-8","language":"markdown","size":13697,"content_sha256":"81b51f3c61a9e0b991eba8e34d5ca590a244f776d8939dced79c3167004921c9"},{"filename":"REPORT_TEMPLATE.md","content":"# Report Template & Output Examples\n\nTemplates and example outputs for each phase of the rare disease diagnosis workflow.\n\n---\n\n## Report File Template\n\n**File**: `[PATIENT_ID]_rare_disease_report.md`\n\n```markdown\n# Rare Disease Diagnostic Report\n\n**Patient ID**: [ID] | **Date**: [Date] | **Status**: In Progress\n\n---\n\n## Executive Summary\n[Researching...]\n\n---\n\n## 1. Phenotype Analysis\n### 1.1 Standardized HPO Terms\n[Researching...]\n### 1.2 Key Clinical Features\n[Researching...]\n\n---\n\n## 2. 
Differential Diagnosis\n### 2.1 Ranked Candidate Diseases\n[Researching...]\n### 2.2 Disease Details\n[Researching...]\n\n---\n\n## 3. Recommended Gene Panel\n### 3.1 Prioritized Genes\n[Researching...]\n### 3.2 Testing Strategy\n[Researching...]\n\n---\n\n## 4. Variant Interpretation (if applicable)\n### 4.1 Variant Details\n[Researching...]\n### 4.2 ACMG Classification\n[Researching...]\n\n---\n\n## 5. Structural Analysis (if applicable)\n### 5.1 Structure Prediction\n[Researching...]\n### 5.2 Variant Impact\n[Researching...]\n\n---\n\n## 6. Clinical Recommendations\n### 6.1 Diagnostic Next Steps\n[Researching...]\n### 6.2 Specialist Referrals\n[Researching...]\n### 6.3 Family Screening\n[Researching...]\n\n---\n\n## 7. Data Gaps & Limitations\n[Researching...]\n\n---\n\n## 8. Data Sources\n[Will be populated as research progresses...]\n```\n\n---\n\n## Phase Output Examples\n\n### Phase 1: Phenotype Analysis Output\n\n```markdown\n## 1. Phenotype Analysis\n\n### 1.1 Standardized HPO Terms\n\n| Clinical Feature | HPO Term | HPO ID | Category |\n|------------------|----------|--------|----------|\n| Tall stature | Tall stature | HP:0000098 | Core |\n| Long fingers | Arachnodactyly | HP:0001166 | Core |\n| Heart murmur | Cardiac murmur | HP:0030148 | Variable |\n| Joint hypermobility | Joint hypermobility | HP:0001382 | Core |\n\n**Total HPO Terms**: 4\n**Onset**: Childhood\n**Family History**: Father with similar features (AD suspected)\n\n*Source: HPO via `HPO_search_terms`*\n```\n\n### Phase 2: Differential Diagnosis Output\n\n```markdown\n## 2. 
Differential Diagnosis\n\n### Top Candidate Diseases (Ranked by Phenotype Match)\n\n| Rank | Disease | ORPHA | OMIM | Match | Inheritance | Key Gene(s) |\n|------|---------|-------|------|-------|-------------|-------------|\n| 1 | Marfan syndrome | 558 | 154700 | 85% | AD | FBN1 |\n| 2 | Loeys-Dietz syndrome | 60030 | 609192 | 72% | AD | TGFBR1, TGFBR2 |\n| 3 | Ehlers-Danlos, vascular | 286 | 130050 | 65% | AD | COL3A1 |\n| 4 | Homocystinuria | 394 | 236200 | 58% | AR | CBS |\n\n### DisGeNET Gene-Disease Evidence\n\n| Gene | Associated Diseases | GDA Score | Evidence |\n|------|---------------------|-----------|----------|\n| FBN1 | Marfan syndrome, MASS phenotype | 0.95 | Curated |\n| TGFBR1 | Loeys-Dietz syndrome | 0.89 | Curated |\n| COL3A1 | vascular EDS | 0.91 | Curated |\n\n*Source: DisGeNET via `DisGeNET_search_gene`*\n\n### Disease Details\n\n#### 1. Marfan Syndrome\n\n**ORPHA**: 558 | **OMIM**: 154700 | **Prevalence**: 1-5/10,000\n\n**Phenotype Match Analysis**:\n| Patient Feature | Disease Feature | Match |\n|-----------------|-----------------|-------|\n| Tall stature | Present in 95% | Yes |\n| Arachnodactyly | Present in 90% | Yes |\n| Joint hypermobility | Present in 85% | Yes |\n| Cardiac murmur | Aortic root dilation (70%) | Partial |\n\n**OMIM Clinical Synopsis** (via `OMIM_get_clinical_synopsis`):\n- **Cardiovascular**: Aortic root dilation, mitral valve prolapse\n- **Skeletal**: Scoliosis, pectus excavatum, tall stature\n- **Ocular**: Ectopia lentis, myopia\n\n**Diagnostic Criteria**: Ghent nosology (2010)\n- Aortic root dilation/dissection + FBN1 mutation = Diagnosis\n- Without genetic testing: aortic root dilation/dissection + ectopia lentis, or aortic root dilation/dissection + systemic score >=7\n\n**Inheritance**: Autosomal dominant (25% de novo)\n\n*Source: Orphanet via `Orphanet_get_disease`, OMIM via `OMIM_get_entry`, DisGeNET*\n```\n\n### Phase 3: Gene Panel Output\n\n```markdown\n## 3. 
Recommended Gene Panel\n\n### 3.1 Prioritized Genes for Testing\n\n| Priority | Gene | Diseases | Evidence | Constraint (pLI) | Expression |\n|----------|------|----------|----------|------------------|------------|\n| High | FBN1 | Marfan syndrome | Definitive | 1.00 | Heart, aorta |\n| High | TGFBR1 | Loeys-Dietz 1 | Definitive | 0.98 | Ubiquitous |\n| High | TGFBR2 | Loeys-Dietz 2 | Definitive | 0.99 | Ubiquitous |\n| Medium | COL3A1 | EDS vascular | Definitive | 1.00 | Connective tissue |\n| Low | CBS | Homocystinuria | Definitive | 0.00 | Liver |\n\n### 3.2 Panel Design Recommendation\n\n**Minimum Panel** (high yield): FBN1, TGFBR1, TGFBR2, COL3A1\n**Extended Panel** (+differential): Add CBS, SMAD3, ACTA2\n\n**Testing Strategy**:\n1. Start with FBN1 sequencing (highest pre-test probability)\n2. If negative, proceed to full connective tissue panel\n3. Consider WES if panel negative\n\n*Source: ClinGen via gene-disease validity, GTEx expression*\n```\n\n### Phase 3.5: Expression & Regulatory Context Output\n\n```markdown\n## 3.5 Expression & Regulatory Context\n\n### Cell-Type Specific Expression (CELLxGENE)\n\n| Gene | Top Expressing Cell Types | Expression Level | Tissue Relevance |\n|------|---------------------------|------------------|------------------|\n| FBN1 | Fibroblasts, Smooth muscle | High (TPM=45) | Connective tissue |\n| TGFBR1 | Endothelial, Fibroblasts | Medium (TPM=12) | Vascular |\n| COL3A1 | Fibroblasts, Myofibroblasts | Very High (TPM=120) | Connective tissue |\n\n**Interpretation**: All top candidate genes show high expression in disease-relevant cell types.\n\n### Regulatory Context (ChIPAtlas)\n\n| Gene | Key TF Regulators | Regulatory Significance |\n|------|-------------------|------------------------|\n| FBN1 | TGFb pathway (SMAD2/3), AP-1 | TGFb-responsive |\n| TGFBR1 | STAT3, NF-kB | Inflammation-responsive |\n\n*Source: CELLxGENE Census, ChIPAtlas*\n```\n\n### Phase 3.6: Pathway & Network Context Output\n\n```markdown\n## 3.6 
Pathway & Network Context\n\n### KEGG Pathways\n\n| Gene | Key Pathways | Biological Process |\n|------|--------------|-------------------|\n| FBN1 | ECM-receptor interaction (hsa04512) | Extracellular matrix |\n| TGFBR1/2 | TGF-beta signaling (hsa04350) | Cell signaling |\n| COL3A1 | Focal adhesion (hsa04510) | Cell-matrix adhesion |\n\n### Shared Pathway Analysis\n\n**Convergent pathways** (>=2 candidate genes):\n- TGF-beta signaling pathway: FBN1, TGFBR1, TGFBR2, SMAD3\n- ECM organization: FBN1, COL3A1\n\n**Interpretation**: Candidate genes converge on TGF-beta signaling and extracellular matrix pathways, consistent with connective tissue disorder etiology.\n\n### Protein-Protein Interactions (IntAct)\n\n| Gene | Direct Interactors | Notable Partners |\n|------|-------------------|------------------|\n| FBN1 | 42 | LTBP1, TGFB1, ADAMTS10 |\n| TGFBR1 | 68 | TGFBR2, SMAD2, SMAD3 |\n\n*Source: KEGG, IntAct, Reactome*\n```\n\n### Phase 4: Variant Interpretation Output\n\n```markdown\n## 4. 
Variant Interpretation\n\n### 4.1 Variant: FBN1 c.4621G>A (p.Glu1541Lys)\n\n| Property | Value | Interpretation |\n|----------|-------|----------------|\n| Gene | FBN1 | Marfan syndrome gene |\n| Consequence | Missense | Amino acid change |\n| ClinVar | VUS | Uncertain significance |\n| gnomAD AF | 0.000004 | Ultra-rare (PM2) |\n\n### 4.2 Computational Predictions\n\n| Predictor | Score | Classification | ACMG Support |\n|-----------|-------|----------------|--------------|\n| **AlphaMissense** | 0.78 | Pathogenic | PP3 (strong) |\n| **CADD PHRED** | 28.5 | Top 0.1% deleterious | PP3 |\n| **EVE** | 0.72 | Likely pathogenic | PP3 |\n\n**Consensus**: 3/3 predictors concordant damaging -> **Strong PP3 support**\n\n*Source: AlphaMissense, CADD API, EVE via Ensembl VEP*\n\n### 4.3 ACMG Evidence Summary\n\n| Criterion | Evidence | Strength |\n|-----------|----------|----------|\n| PM2 | Ultra-rare in gnomAD (AF \u003c 0.00001) | Moderate |\n| PP3 | AlphaMissense + CADD + EVE concordant | Supporting (strong) |\n| PP4 | Phenotype highly specific for Marfan | Supporting |\n| PP1 | Co-segregation in multiple affected family members | Strong (upgraded) |\n\n**Preliminary Classification**: Likely Pathogenic (1 Strong + 1 Moderate + 2 Supporting)\n\n*Source: ClinVar, gnomAD, AlphaMissense, CADD, EVE*\n```\n\n### Phase 5: Structure Analysis Output\n\n```markdown\n## 5. 
Structural Analysis\n\n### 5.1 Structure Prediction\n\n**Method**: AlphaFold2 via NVIDIA NIM\n**Protein**: Fibrillin-1 (FBN1)\n**Sequence Length**: 2,871 amino acids\n\n| Metric | Value | Interpretation |\n|--------|-------|----------------|\n| Mean pLDDT | 85.3 | High confidence overall |\n| Variant position pLDDT | 92.1 | Very high confidence |\n| Nearby domain | cbEGF-like domain 23 | Calcium-binding |\n\n### 5.2 Variant Location Analysis\n\n**Variant**: p.Glu1541Lys\n\n| Feature | Finding | Impact |\n|---------|---------|--------|\n| Domain | cbEGF-like domain 23 | Critical for calcium binding |\n| Conservation | 100% conserved across vertebrates | High constraint |\n| Structural role | Calcium coordination residue | Likely destabilizing |\n| Nearby pathogenic | p.Glu1540Lys (Pathogenic) | Adjacent residue |\n\n### 5.3 Structural Interpretation\n\nThe variant p.Glu1541Lys:\n1. **Located in cbEGF domain** - Critical for fibrillin-1 function\n2. **Glutamate to Lysine** - Charge reversal (negative to positive)\n3. **Calcium binding** - Glutamate at this position coordinates Ca2+\n4. **Adjacent pathogenic variant** - p.Glu1540Lys is classified Pathogenic\n\n**Structural Evidence**: Strong support for pathogenicity (PM1 - critical domain)\n\n*Source: NVIDIA NIM via `NvidiaNIM_alphafold2`, InterPro*\n```\n\n### Phase 6: Literature Evidence Output\n\n```markdown\n## 6. Literature Evidence\n\n### 6.1 Key Published Studies\n\n| PMID | Title | Year | Citations | Relevance |\n|------|-------|------|-----------|-----------|\n| 32123456 | FBN1 variants in Marfan syndrome... | 2023 | 45 | Direct |\n| 31987654 | TGF-beta signaling in connective... | 2022 | 89 | Pathway |\n| 30876543 | Novel diagnostic criteria for... | 2021 | 156 | Diagnostic |\n\n### 6.2 Recent Preprints (Not Yet Peer-Reviewed)\n\n| Source | Title | Posted | Relevance |\n|--------|-------|--------|-----------|\n| BioRxiv | Novel FBN1 splice variant causes... 
| 2024-01 | Case report |\n| MedRxiv | Machine learning for Marfan... | 2024-02 | Diagnostic |\n\n**Note**: Preprints have not undergone peer review. Use with caution.\n\n### 6.3 Evidence Summary\n\n| Evidence Type | Count | Strength |\n|---------------|-------|----------|\n| Case reports | 12 | Supporting |\n| Functional studies | 5 | Strong |\n| Clinical trials | 2 | Strong |\n| Reviews | 8 | Context |\n\n*Source: PubMed, BioRxiv, OpenAlex*\n```\n\n---\n\n## Additional Output Files\n\n### Gene Panel CSV\n\n**File**: `[PATIENT_ID]_gene_panel.csv`\n\n```csv\npriority,gene,diseases,evidence_level,pLI,expression,clingen_classification,actionable\n1,FBN1,Marfan syndrome,Definitive,1.00,\"Heart, aorta\",Definitive,Yes\n2,TGFBR1,Loeys-Dietz 1,Definitive,0.98,Ubiquitous,Definitive,Yes\n3,TGFBR2,Loeys-Dietz 2,Definitive,0.99,Ubiquitous,Definitive,Yes\n4,COL3A1,EDS vascular,Definitive,1.00,Connective tissue,Definitive,Yes\n5,CBS,Homocystinuria,Definitive,0.00,Liver,Definitive,No\n```\n\n### Variant Interpretation CSV\n\n**File**: `[PATIENT_ID]_variant_interpretation.csv`\n\n```csv\ngene,variant,consequence,clinvar,gnomad_af,cadd,alphamissense,eve,acmg_class\nFBN1,c.4621G>A,Missense,VUS,0.000004,28.5,0.78,0.72,Likely Pathogenic\n```\n","content_type":"text/markdown; charset=utf-8","language":"markdown","size":11129,"content_sha256":"76585f3cb12d6bf02975c60845aa35f801dcfa959625d14ccf9929ea40755d19"},{"filename":"scripts/clinical_patterns.py","content":"\"\"\"\nClinical diagnosis reference for rare-disease-diagnosis skill.\n\nUsage:\n python clinical_patterns.py --type syndrome --name \"Felty\"\n python clinical_patterns.py --type differential --symptoms \"RA,splenomegaly,neutropenia\"\n python clinical_patterns.py --type red_flag --symptom \"splenomegaly\"\n python clinical_patterns.py --type occupational --exposure \"asbestos\"\n\"\"\"\n\nimport argparse\nimport json\nimport sys\n\nSYNDROMES = {\n \"felty\": {\n \"name\": \"Felty's Syndrome\",\n \"triad\": [\"rheumatoid 
arthritis\", \"splenomegaly\", \"neutropenia\"],\n \"key_distinction\": \"NOT infectious — neutropenia is autoimmune, not septic\",\n \"misdiagnosis_traps\": [\n \"Splenomegaly + neutropenia mistaken for lymphoma or leukemia\",\n \"Recurrent infections (secondary to neutropenia) mistaken for primary immunodeficiency\",\n \"RA assumed inactive when joint disease quiets but hematologic disease progresses\",\n ],\n \"red_flags\": [\n \"Seropositive RA (high RF, anti-CCP) — essential for diagnosis\",\n \"Neutrophil count \u003c 2.0 × 10⁹/L in RA patient\",\n \"Palpable spleen in long-standing RA\",\n \"Recurrent bacterial infections without obvious cause\",\n ],\n \"diagnostic_steps\": [\n \"1. Confirm seropositive RA (RF, anti-CCP)\",\n \"2. CBC with differential — absolute neutrophil count\",\n \"3. Abdominal exam / ultrasound for splenomegaly\",\n \"4. Bone marrow biopsy if needed to exclude myeloid pathology\",\n \"5. Rule out drug-induced neutropenia (methotrexate, gold, penicillamine)\",\n ],\n \"treatment_hint\": \"MTX or leflunomide; G-CSF for severe/recurrent infections; splenectomy rarely needed\",\n \"icd10\": \"M05.0\",\n \"orpha\": \"ORPHA:47612\",\n },\n \"coccidioidomycosis\": {\n \"name\": \"Coccidioidomycosis (Valley Fever)\",\n \"triad\": [\"fever\", \"cough\", \"chest pain\"],\n \"key_distinction\": \"Endemic fungal infection — travel history to SW USA, Mexico, Central/South America is ESSENTIAL\",\n \"misdiagnosis_traps\": [\n \"Pulmonary form mistaken for community-acquired pneumonia or TB\",\n \"Disseminated form (meningitis, bone) mistaken for bacterial meningitis or osteomyelitis\",\n \"Skin lesions mistaken for sarcoidosis or cutaneous TB\",\n \"Eosinophilia clue often overlooked\",\n ],\n \"red_flags\": [\n \"Flu-like illness + eosinophilia after travel to endemic region\",\n \"Erythema nodosum or erythema multiforme with respiratory symptoms\",\n \"Pneumonia not responding to standard antibiotics\",\n \"Meningitis with eosinophils in CSF\",\n 
\"Lytic bone lesions + pulmonary infiltrates\",\n ],\n \"diagnostic_steps\": [\n \"1. ALWAYS take travel/residence history (endemic: Arizona, California's Central Valley, Texas, Mexico, Central America)\",\n \"2. Serology: IgM (early) and IgG (complement fixation — severity marker)\",\n \"3. Culture of sputum/BAL (BSL-3 precautions — inform lab)\",\n \"4. Urine antigen for disseminated disease\",\n \"5. Chest CT: nodules, cavities, hilar lymphadenopathy\",\n ],\n \"treatment_hint\": \"Mild pulmonary: fluconazole or itraconazole. Severe/disseminated: amphotericin B then azole step-down. Meningitis: lifelong fluconazole.\",\n \"icd10\": \"B38\",\n \"orpha\": None,\n },\n \"sle_nephritis\": {\n \"name\": \"SLE Nephritis (Lupus Nephritis)\",\n \"triad\": [\"hematuria\", \"proteinuria\", \"hypertension\"],\n \"key_distinction\": \"vs. post-streptococcal GN: check ASO titers and complement (C3/C4 persistently low in SLE, transiently low in PSGN)\",\n \"misdiagnosis_traps\": [\n \"Post-strep GN: low C3 but ASO high, short course, resolves in weeks\",\n \"SLE: C3 AND C4 both low, ANA positive, multi-system involvement\",\n \"IgA nephropathy: normal complement, elevated IgA, no systemic features\",\n \"ANCA vasculitis: pauci-immune on biopsy, ANCA positive\",\n ],\n \"red_flags\": [\n \"Hematuria + proteinuria in young woman with malar rash or photosensitivity\",\n \"Nephrotic range proteinuria (>3.5 g/day) with systemic symptoms\",\n \"Complement levels (C3, C4) persistently depressed — not just transiently\",\n \"Positive ANA (>1:160), anti-dsDNA, anti-Smith\",\n \"Thrombocytopenia or hemolytic anemia alongside renal disease\",\n ],\n \"diagnostic_steps\": [\n \"1. ASO titer + anti-DNase B — if high, favors PSGN; repeat complement in 6-8 weeks\",\n \"2. ANA panel (ANA, anti-dsDNA, anti-Smith, anti-SSA/SSB, antiphospholipid)\",\n \"3. C3, C4, CH50 — persistent low favors SLE\",\n \"4. Urinalysis with microscopy — RBC casts = glomerulonephritis\",\n \"5. 
24-hour urine protein or urine protein:creatinine ratio\",\n \"6. Renal biopsy — ISN/RPS class I-VI guides treatment\",\n ],\n \"treatment_hint\": \"Class III/IV: hydroxychloroquine + MMF/cyclophosphamide + steroids. Class V: MMF. Target remission (proteinuria \u003c0.5 g/day).\",\n \"icd10\": \"M32.14\",\n \"orpha\": \"ORPHA:536425\",\n },\n \"turner\": {\n \"name\": \"Turner Syndrome\",\n \"triad\": [\"short stature\", \"ovarian dysgenesis / primary amenorrhea\", \"cardiac malformation\"],\n \"key_distinction\": \"Chromosomal (45,X or mosaic) — cardiac and renal anomalies must be screened even in mild phenotypes\",\n \"misdiagnosis_traps\": [\n \"Mosaic Turner (45,X/46,XX) has near-normal phenotype — missed if karyotype not performed\",\n \"Short stature attributed to growth hormone deficiency without chromosomal workup\",\n \"Bicuspid aortic valve/coarctation diagnosed without Turner consideration\",\n \"Amenorrhea attributed to hypothalamic cause without karyotype\",\n ],\n \"red_flags\": [\n \"Short stature in a girl (height >2 SD below mean) with no clear cause\",\n \"Webbed neck, low posterior hairline, wide-carrying angle (cubitus valgus)\",\n \"Bicuspid aortic valve or coarctation of aorta in a female\",\n \"Primary amenorrhea or premature ovarian insufficiency\",\n \"Shield chest, widely spaced nipples, lymphedema in neonate\",\n ],\n \"diagnostic_steps\": [\n \"1. Karyotype (at least 30 cells to detect mosaicism)\",\n \"2. Cardiac MRI/echo — bicuspid AV, coarctation, aortic root dilation\",\n \"3. Renal ultrasound — horseshoe kidney, duplicated collecting system\",\n \"4. Pelvic ultrasound — streak gonads\",\n \"5. FSH, LH (elevated in ovarian failure), estradiol\",\n \"6. Bone age X-ray\",\n \"7. Hearing evaluation — sensorineural hearing loss common\",\n ],\n \"treatment_hint\": \"Growth hormone from early childhood. Estrogen replacement at puberty (bone, cardiovascular, quality of life). 
Cardiac surveillance lifelong.\",\n \"icd10\": \"Q96\",\n \"orpha\": \"ORPHA:881\",\n },\n \"wernicke_korsakoff\": {\n \"name\": \"Wernicke-Korsakoff Syndrome\",\n \"triad\": [\"confusion / encephalopathy\", \"ophthalmoplegia (eye movement abnormalities)\", \"ataxia\"],\n \"key_distinction\": \"Wernicke's encephalopathy (acute, reversible with thiamine) progresses to Korsakoff's (chronic amnestic syndrome with confabulation if untreated)\",\n \"misdiagnosis_traps\": [\n \"Classic triad present in only 16% of cases — do NOT wait for all three\",\n \"Confusion attributed to alcohol intoxication/withdrawal, missing thiamine deficiency\",\n \"Ophthalmoplegia subtle (nystagmus only) and attributed to intoxication\",\n \"MRI normal in up to 50% of acute Wernicke's\",\n \"Korsakoff's confabulation mistaken for psychosis or dementia\",\n ],\n \"red_flags\": [\n \"ANY two of: confusion, ophthalmoplegia, ataxia in an at-risk patient\",\n \"Risk groups: alcohol use disorder, prolonged vomiting/starvation, bariatric surgery, malabsorption, refeeding without thiamine\",\n \"Nystagmus (most common eye finding), lateral gaze palsy, or complete ophthalmoplegia\",\n \"Confabulation: plausible but fabricated memories — pathognomonic of Korsakoff's\",\n \"Peripheral neuropathy + encephalopathy in malnourished patient\",\n ],\n \"diagnostic_steps\": [\n \"1. GIVE THIAMINE BEFORE GLUCOSE — IV thiamine 500 mg TID × 3 days (NEVER give dextrose first)\",\n \"2. Whole blood thiamine level (but do not delay treatment for result)\",\n \"3. MRI brain: hyperintense lesions in periaqueductal gray, mammillary bodies, thalamus (DWI/FLAIR)\",\n \"4. Blood glucose (hypoglycemia may coexist)\",\n \"5. Comprehensive metabolic panel, B12, folate\",\n \"6. Neuropsychological testing for memory deficits (Korsakoff's)\",\n ],\n \"treatment_hint\": \"IV thiamine STAT. Oral thiamine unreliable in alcoholics (malabsorption). Continue thiamine supplementation. 
Address underlying cause.\",\n \"icd10\": \"E51.2\",\n \"orpha\": \"ORPHA:900\",\n },\n}\n\n# Pre-compute lowercased triads once to avoid repeating in every build_differential call.\n_SYNDROME_TRIADS_LOWER = {\n key: [t.lower() for t in s[\"triad\"]] for key, s in SYNDROMES.items()\n}\n\nOCCUPATIONAL_EXPOSURES = {\n \"asbestos\": {\n \"diseases\": [\"Mesothelioma\", \"Asbestosis\", \"Pleural plaques\", \"Pleural effusion\", \"Bronchogenic carcinoma (synergistic with smoking)\"],\n \"latency\": \"10-40 years (mesothelioma up to 50 years)\",\n \"at_risk_occupations\": [\"Shipbuilding / ship repair\", \"Insulation installation\", \"Construction (pre-1980 buildings)\", \"Brake lining repair\", \"Boilermaking\", \"Mining (crocidolite / amosite)\"],\n \"key_findings\": [\n \"Pleural plaques on CT (calcified, bilateral, diaphragmatic) — pathognomonic of asbestos exposure\",\n \"Mesothelioma: pleural thickening + effusion + restricted lung; mesothelial cells on cytology\",\n \"Asbestosis: bilateral basal fibrosis, honeycombing on HRCT; ferruginous bodies on BAL\",\n ],\n \"diagnostic_note\": \"No safe level of asbestos exposure has been established. All fiber types cause mesothelioma; amphiboles (crocidolite, amosite) carry a higher risk than chrysotile.\",\n \"workup\": [\"HRCT chest\", \"PFTs (restrictive pattern)\", \"Bronchoscopy + BAL\", \"Pleural biopsy (CT/thoracoscopy-guided)\", \"Calretinin, WT1, D2-40 IHC for mesothelioma\"],\n },\n \"silica\": {\n \"diseases\": [\"Silicosis (simple, accelerated, acute)\", \"Progressive massive fibrosis (PMF)\", \"COPD\", \"Lung cancer (IARC Group 1)\", \"Autoimmune diseases (SLE, RA, scleroderma)\", \"Chronic kidney disease\"],\n \"latency\": \"Simple: >10 years. Accelerated: 5-10 years. 
Acute: \u003c5 years (high dose).\",\n \"at_risk_occupations\": [\"Mining / tunneling\", \"Sandblasting\", \"Quarrying\", \"Foundry work\", \"Ceramics / pottery\", \"Denim sandblasting (epidemic in Turkey)\", \"Artificial stone (engineered quartz) countertop fabrication\"],\n \"key_findings\": [\n \"Bilateral upper lobe nodules (1-10 mm) on CXR/CT\",\n \"Eggshell calcification of hilar/mediastinal lymph nodes — classic\",\n \"PMF: conglomerate upper lobe masses >1 cm\",\n \"Acute silicosis: alveolar proteinosis pattern (crazy paving on HRCT)\",\n ],\n \"diagnostic_note\": \"Occupational history is key: silicosis is a radiographic diagnosis made in the setting of documented exposure. Rule out TB (silico-TB is a recognized complication). Engineered stone fabrication is driving an epidemic of accelerated silicosis.\",\n \"workup\": [\"HRCT chest\", \"PFTs\", \"ANA, ANCA (autoimmune screen)\", \"TB screening (IGRA / TST)\", \"Urinalysis + creatinine\"],\n },\n \"coal\": {\n \"diseases\": [\"Coal workers' pneumoconiosis (CWP) / black lung\", \"Progressive massive fibrosis\", \"COPD\", \"Caplan syndrome (RA + large pulmonary nodules)\"],\n \"latency\": \"Simple CWP: 10+ years. PMF: earlier with higher dust levels.\",\n \"at_risk_occupations\": [\"Underground coal mining\", \"Surface coal mining\", \"Coal processing / washing\", \"Coal-fired power plant workers\"],\n \"key_findings\": [\n \"Small rounded opacities on CXR (ILO classification p, q, r type)\",\n \"PMF: bilateral upper lobe masses, may cavitate\",\n \"Caplan nodules: well-defined 1-5 cm nodules in setting of RA\",\n \"Black lung resurgence (newer mining exposures to silica-rich seams)\",\n ],\n \"diagnostic_note\": \"ILO chest X-ray classification used for grading. Differentiate from TB (coalminers at increased TB risk). 
NIOSH-certified B Reader for X-ray interpretation.\",\n \"workup\": [\"ILO-classified CXR\", \"HRCT chest\", \"PFTs (spirometry)\", \"Rheumatoid factor, anti-CCP (Caplan's)\", \"TB screening\"],\n },\n \"metal_smelting\": {\n \"diseases\": [\"Heavy metal poisoning (lead, arsenic, mercury, cadmium, manganese)\", \"Metal fume fever (zinc, copper fumes)\", \"Hard metal disease (cobalt)\", \"Nickel-induced lung cancer / nasal cancer\", \"Chromium-induced lung cancer / nasal ulcers\"],\n \"latency\": \"Metal fume fever: hours to days. Chronic poisoning: months to years.\",\n \"at_risk_occupations\": [\"Smelting / foundry workers\", \"Battery manufacturing (lead)\", \"Electronic waste recycling\", \"Welders\", \"Pesticide manufacturers (arsenic)\", \"Chloralkali plant workers (mercury)\", \"Hard metal (WC-Co) tooling\"],\n \"key_findings\": {\n \"lead\": \"Anemia + basophilic stippling, peripheral neuropathy, Burton's lines (gingival), encephalopathy; blood Pb >5 µg/dL\",\n \"arsenic\": \"Peripheral neuropathy + hyperkeratosis + Mees' lines (transverse white bands on nails) + rain-drop skin pigmentation\",\n \"mercury\": \"Erethism (excessive shyness, memory loss, irritability) + tremor + gingivitis ('mad hatter')\",\n \"cadmium\": \"Renal tubular dysfunction (proximal) + osteomalacia (Itai-itai disease) + proteinuria\",\n \"manganese\": \"Manganism: Parkinsonism-like syndrome with cock-walk gait, psychiatric symptoms; basal ganglia T1 hyperintensity on MRI\",\n \"cobalt\": \"Hard metal lung disease: giant cell interstitial pneumonitis on biopsy\",\n },\n \"diagnostic_note\": \"Collect 24-hour urine for heavy metals (not just serum). 
Chelation therapy depends on specific metal and severity.\",\n \"workup\": [\"Blood lead, mercury, arsenic levels\", \"24-hour urine arsenic, mercury, cadmium\", \"CBC with differential (lead: basophilic stippling)\", \"Nerve conduction studies\", \"Renal function / urinalysis\", \"Brain MRI (manganese)\", \"Chest HRCT (cobalt)\"],\n },\n}\n\nRED_FLAGS = {\n \"splenomegaly\": {\n \"symptom\": \"Splenomegaly\",\n \"differentials\": [\n {\n \"diagnosis\": \"Felty's Syndrome\",\n \"additional_features\": [\"seropositive RA\", \"neutropenia\", \"recurrent infections\"],\n \"key_test\": \"CBC + RF + anti-CCP\",\n \"pitfall\": \"Confused with infection or lymphoma\",\n },\n {\n \"diagnosis\": \"Portal hypertension (cirrhosis)\",\n \"additional_features\": [\"variceal bleeding\", \"ascites\", \"jaundice\", \"spider angiomata\"],\n \"key_test\": \"LFTs, platelet count, hepatic ultrasound with Doppler\",\n \"pitfall\": \"Hypersplenism causes cytopenias — may mimic hematologic disease\",\n },\n {\n \"diagnosis\": \"Lymphoma / CLL\",\n \"additional_features\": [\"lymphadenopathy\", \"B symptoms (fever, night sweats, weight loss)\", \"fatigue\"],\n \"key_test\": \"CBC + flow cytometry + CT staging + biopsy\",\n \"pitfall\": \"B symptoms overlap with infection\",\n },\n {\n \"diagnosis\": \"Visceral leishmaniasis (kala-azar)\",\n \"additional_features\": [\"massive splenomegaly\", \"fever\", \"weight loss\", \"travel to endemic area\", \"pancytopenia\"],\n \"key_test\": \"Leishmania serology / rK39 antigen / bone marrow biopsy\",\n \"pitfall\": \"Missed without travel history\",\n },\n {\n \"diagnosis\": \"Storage diseases (Gaucher, Niemann-Pick)\",\n \"additional_features\": [\"hepatomegaly\", \"bone pain\", \"cytopenias\", \"childhood onset\"],\n \"key_test\": \"Glucocerebrosidase enzyme assay (Gaucher), GBA sequencing\",\n \"pitfall\": \"Adult Gaucher often missed; bone crisis confused with osteomyelitis\",\n },\n ],\n \"urgent_workup\": [\"CBC with differential\", 
\"Peripheral blood smear\", \"LFTs\", \"LDH + uric acid\", \"Abdominal ultrasound\", \"Epstein-Barr virus / CMV serology\"],\n },\n \"neutropenia\": {\n \"symptom\": \"Neutropenia (ANC \u003c 1.5 × 10⁹/L)\",\n \"differentials\": [\n {\n \"diagnosis\": \"Drug-induced neutropenia\",\n \"additional_features\": [\"recent medication start (clozapine, carbimazole, trimethoprim, methotrexate, chemotherapy)\"],\n \"key_test\": \"Medication review; stop offending drug\",\n \"pitfall\": \"Drug timeline not reviewed systematically\",\n },\n {\n \"diagnosis\": \"Felty's Syndrome\",\n \"additional_features\": [\"seropositive RA\", \"splenomegaly\"],\n \"key_test\": \"RF, anti-CCP, abdominal exam\",\n \"pitfall\": \"Attributed to infection rather than autoimmune cause\",\n },\n {\n \"diagnosis\": \"Autoimmune neutropenia\",\n \"additional_features\": [\"isolated neutropenia\", \"otherwise well\", \"childhood more common\"],\n \"key_test\": \"ANA, anti-neutrophil antibodies\",\n \"pitfall\": \"Often benign and self-limited in children\",\n },\n {\n \"diagnosis\": \"Large granular lymphocyte (LGL) leukemia\",\n \"additional_features\": [\"cytopenias\", \"splenomegaly\", \"recurrent infections\", \"RA association\"],\n \"key_test\": \"Peripheral blood flow cytometry (CD8+/CD57+ clonal T cells), TCR gene rearrangement\",\n \"pitfall\": \"Overlaps with Felty syndrome; LGL leukemia may coexist\",\n },\n ],\n \"urgent_workup\": [\"CBC with differential + peripheral smear\", \"Drug review\", \"ANA, RF, anti-CCP\", \"Vitamin B12, folate, copper\", \"Flow cytometry if persistent\", \"Bone marrow biopsy if ANC \u003c0.5\"],\n },\n \"ophthalmoplegia\": {\n \"symptom\": \"Ophthalmoplegia / Eye movement abnormality\",\n \"differentials\": [\n {\n \"diagnosis\": \"Wernicke's Encephalopathy\",\n \"additional_features\": [\"confusion\", \"ataxia\", \"malnutrition / alcohol use disorder\", \"nystagmus\"],\n \"key_test\": \"GIVE THIAMINE FIRST — then MRI brain, blood thiamine level\",\n 
\"pitfall\": \"Attributed to alcohol intoxication; dextrose given before thiamine\",\n },\n {\n \"diagnosis\": \"Miller Fisher Syndrome (GBS variant)\",\n \"additional_features\": [\"ophthalmoplegia\", \"ataxia\", \"areflexia — triad\", \"recent infection\"],\n \"key_test\": \"Anti-GQ1b antibodies (positive in >90%), CSF albumino-cytologic dissociation\",\n \"pitfall\": \"Mistaken for Wernicke's — key: anti-GQ1b and areflexia distinguish\",\n },\n {\n \"diagnosis\": \"Myasthenia Gravis\",\n \"additional_features\": [\"ptosis\", \"diplopia worse with fatigue\", \"bulbar symptoms\", \"proximal limb weakness\"],\n \"key_test\": \"Anti-AChR, anti-MuSK antibodies; repetitive nerve stimulation\",\n \"pitfall\": \"Fatigable ophthalmoplegia missed; ice pack test positive\",\n },\n {\n \"diagnosis\": \"Internuclear Ophthalmoplegia (INO)\",\n \"additional_features\": [\"adduction deficit\", \"contralateral nystagmus\", \"diplopia\", \"MS or brainstem lesion\"],\n \"key_test\": \"MRI brain (MLF lesion), VEP\",\n \"pitfall\": \"INO in young adult — consider multiple sclerosis\",\n },\n ],\n \"urgent_workup\": [\"Thiamine 500 mg IV (before glucose)\", \"MRI brain + posterior fossa\", \"Anti-GQ1b antibodies\", \"AChR antibodies\", \"Edrophonium test (myasthenia)\", \"CSF analysis\"],\n },\n \"confabulation\": {\n \"symptom\": \"Confabulation (fabricated plausible memories)\",\n \"differentials\": [\n {\n \"diagnosis\": \"Korsakoff's Syndrome\",\n \"additional_features\": [\"severe anterograde amnesia\", \"intact procedural memory\", \"history of Wernicke's episode\", \"alcohol use disorder / malnutrition\"],\n \"key_test\": \"Detailed neuropsychological testing, MRI (mammillary body atrophy), thiamine history\",\n \"pitfall\": \"Mistaken for dementia or psychosis; confabulation not explicitly elicited\",\n },\n {\n \"diagnosis\": \"Frontotemporal Dementia\",\n \"additional_features\": [\"behavioral disinhibition\", \"executive dysfunction\", \"early onset (\u003c65)\", 
\"language problems\"],\n \"key_test\": \"MRI (frontal/temporal atrophy), neuropsychological battery, FDG-PET\",\n \"pitfall\": \"Korsakoff's distinguished by specific amnestic profile + thiamine history\",\n },\n ],\n \"urgent_workup\": [\"Thiamine level\", \"MRI brain with hippocampal + mammillary body volumetry\", \"Neuropsychological testing (RBMT, WMS)\", \"B12, folate, TSH\", \"RPR/VDRL (neurosyphilis)\"],\n },\n \"erythema_nodosum\": {\n \"symptom\": \"Erythema Nodosum (tender red nodules, shins)\",\n \"differentials\": [\n {\n \"diagnosis\": \"Sarcoidosis (Löfgren syndrome)\",\n \"additional_features\": [\"bilateral hilar lymphadenopathy\", \"arthritis\", \"fever\", \"excellent prognosis\"],\n \"key_test\": \"CXR / CT chest, ACE level, BAL\",\n \"pitfall\": \"Löfgren syndrome treated conservatively — biopsy usually NOT needed\",\n },\n {\n \"diagnosis\": \"Coccidioidomycosis\",\n \"additional_features\": [\"travel to SW USA / Mexico\", \"respiratory symptoms\", \"eosinophilia\"],\n \"key_test\": \"Coccidioides serology (IgM/IgG), urine antigen\",\n \"pitfall\": \"Mistaken for sarcoidosis without travel history\",\n },\n {\n \"diagnosis\": \"Inflammatory bowel disease\",\n \"additional_features\": [\"diarrhea\", \"rectal bleeding\", \"abdominal pain\", \"weight loss\"],\n \"key_test\": \"Colonoscopy with biopsies, calprotectin, CRP\",\n \"pitfall\": \"Skin manifestation precedes gut symptoms in some IBD patients\",\n },\n {\n \"diagnosis\": \"Streptococcal infection\",\n \"additional_features\": [\"recent pharyngitis\", \"high ASO titer\"],\n \"key_test\": \"Throat culture, ASO titer, anti-DNase B\",\n \"pitfall\": \"Most common cause in children — always check ASO\",\n },\n ],\n \"urgent_workup\": [\"CXR\", \"ASO titer\", \"Coccidioides serology (if travel history)\", \"ACE level\", \"ANA\", \"Stool culture / calprotectin\", \"Tuberculin skin test / IGRA\"],\n },\n \"short_stature\": {\n \"symptom\": \"Short Stature in Females (>2 SD below mean)\",\n 
\"differentials\": [\n {\n \"diagnosis\": \"Turner Syndrome\",\n \"additional_features\": [\"ovarian failure\", \"cardiac anomalies\", \"webbed neck\", \"cubitus valgus\", \"learning differences\"],\n \"key_test\": \"Karyotype (minimum 30 cells), FSH/LH/estradiol, cardiac echo\",\n \"pitfall\": \"Mosaic Turner underdiagnosed; karyotype not ordered for 'short but otherwise normal' girls\",\n },\n {\n \"diagnosis\": \"Growth Hormone Deficiency\",\n \"additional_features\": [\"delayed bone age\", \"decreased IGF-1\", \"normal body proportions\"],\n \"key_test\": \"IGF-1, IGFBP-3, GH stimulation test, MRI pituitary\",\n \"pitfall\": \"Turner excluded if chromosomes not checked\",\n },\n {\n \"diagnosis\": \"Hypothyroidism\",\n \"additional_features\": [\"fatigue\", \"constipation\", \"delayed bone age\", \"elevated TSH\"],\n \"key_test\": \"TSH, free T4\",\n \"pitfall\": \"Acquired hypothyroidism in girls — easily treatable cause of growth failure\",\n },\n {\n \"diagnosis\": \"Celiac Disease\",\n \"additional_features\": [\"failure to thrive\", \"diarrhea\", \"anemia\", \"delayed puberty\"],\n \"key_test\": \"Anti-tTG IgA, total IgA, duodenal biopsy\",\n \"pitfall\": \"Silent celiac (no GI symptoms) presents with short stature only\",\n },\n ],\n \"urgent_workup\": [\"Karyotype (females)\", \"Bone age X-ray\", \"TSH + free T4\", \"IGF-1, IGFBP-3\", \"CBC, CMP\", \"Anti-tTG IgA + total IgA\", \"FSH, LH, estradiol\"],\n },\n \"hematuria_proteinuria\": {\n \"symptom\": \"Hematuria + Proteinuria (nephritic/nephrotic syndrome)\",\n \"differentials\": [\n {\n \"diagnosis\": \"SLE Nephritis\",\n \"additional_features\": [\"young woman\", \"malar rash\", \"photosensitivity\", \"arthritis\", \"persistent low C3/C4\"],\n \"key_test\": \"ANA, anti-dsDNA, complement (C3/C4), renal biopsy\",\n \"pitfall\": \"Mistaken for PSGN; ASO titers and persistent complement depression distinguish\",\n },\n {\n \"diagnosis\": \"Post-streptococcal GN (PSGN)\",\n \"additional_features\": [\"2-3 
weeks after strep throat / skin infection\", \"periorbital edema\", \"hypertension\", \"HIGH ASO titers\", \"TRANSIENT low C3\"],\n \"key_test\": \"ASO titer, anti-DNase B, C3 (normalizes in 6-8 weeks)\",\n \"pitfall\": \"Persistent low complement or positive ANA → SLE, not PSGN\",\n },\n {\n \"diagnosis\": \"IgA Nephropathy\",\n \"additional_features\": [\"synpharyngitic hematuria (during illness, not 2-3 weeks after)\", \"normal complement\", \"male predominance\"],\n \"key_test\": \"Serum IgA, renal biopsy (mesangial IgA deposits on IF)\",\n \"pitfall\": \"Timing of hematuria relative to infection is key (concurrent vs. delayed)\",\n },\n {\n \"diagnosis\": \"ANCA Vasculitis (GPA/MPA)\",\n \"additional_features\": [\"rapidly progressive GN\", \"pulmonary hemorrhage\", \"sinusitis\", \"weight loss\"],\n \"key_test\": \"c-ANCA/p-ANCA (PR3/MPO), renal biopsy (pauci-immune)\",\n \"pitfall\": \"Rapidly progressive — act urgently; delay causes permanent renal loss\",\n },\n ],\n \"urgent_workup\": [\"Urinalysis with microscopy (RBC casts)\", \"24-h urine protein or Pr:Cr ratio\", \"ANA, anti-dsDNA, C3, C4\", \"ASO titer, anti-DNase B\", \"ANCA (PR3/MPO)\", \"Renal biopsy if progressive\"],\n },\n}\n\n\ndef _header(title: str) -> list[str]:\n bar = \"=\" * 60\n return [f\"\\n{bar}\", f\" {title}\", bar]\n\n\ndef _fuzzy_find(query: str, mapping: dict, name_field: str) -> tuple[str, dict] | tuple[None, None]:\n \"\"\"Return (key, entry) for the first entry whose key or name_field contains query.\"\"\"\n q = query.lower()\n for key, entry in mapping.items():\n if q in key or q in entry[name_field].lower():\n return key, entry\n return None, None\n\n\ndef _require_arg(value: str | None, flag: str, query_type: str) -> None:\n if not value:\n print(f\"Error: {flag} required for --type {query_type}\", file=sys.stderr)\n sys.exit(1)\n\n\ndef build_differential(symptoms_str: str) -> dict:\n # Drop empty items (e.g. from a trailing comma): an empty string is a substring of\n # every triad term and would otherwise match every syndrome.\n symptoms = [s.strip().lower() for s in symptoms_str.split(\",\") if s.strip()]\n matches = []\n for 
key, syndrome in SYNDROMES.items():\n triad_lower = _SYNDROME_TRIADS_LOWER[key]\n matched = [s for s in symptoms if any(s in t or t in s for t in triad_lower)]\n if matched:\n matches.append({\n \"syndrome\": syndrome[\"name\"],\n \"matched_features\": matched,\n \"match_count\": len(matched),\n \"triad\": syndrome[\"triad\"],\n \"key_distinction\": syndrome[\"key_distinction\"],\n \"top_diagnostic_step\": syndrome[\"diagnostic_steps\"][0] if syndrome[\"diagnostic_steps\"] else \"\",\n \"icd10\": syndrome.get(\"icd10\"),\n \"orpha\": syndrome.get(\"orpha\"),\n })\n matches.sort(key=lambda x: x[\"match_count\"], reverse=True)\n return {\n \"input_symptoms\": symptoms,\n \"matched_syndromes\": matches,\n \"note\": \"Differential built from syndrome triad matching. Use --type red_flag for symptom-specific differentials.\",\n }\n\n\ndef parse_args() -> argparse.Namespace:\n parser = argparse.ArgumentParser(\n description=\"Clinical diagnosis reference for rare-disease-diagnosis skill.\",\n formatter_class=argparse.RawDescriptionHelpFormatter,\n epilog=__doc__,\n )\n parser.add_argument(\n \"--type\",\n required=True,\n choices=[\"syndrome\", \"differential\", \"red_flag\", \"occupational\", \"list\"],\n help=(\n \"Query type: \"\n \"'syndrome' (look up a named syndrome), \"\n \"'differential' (build diff from symptom list), \"\n \"'red_flag' (red-flag analysis for a single symptom), \"\n \"'occupational' (occupational exposure patterns), \"\n \"'list' (list all available items)\"\n ),\n )\n parser.add_argument(\"--name\", help=\"Syndrome name (for --type syndrome), partial match supported\")\n parser.add_argument(\n \"--symptoms\",\n help=\"Comma-separated symptoms for --type differential (e.g. 
'RA,splenomegaly,neutropenia')\",\n )\n parser.add_argument(\"--symptom\", help=\"Single symptom for --type red_flag\")\n parser.add_argument(\"--exposure\", help=\"Occupational exposure agent for --type occupational\")\n parser.add_argument(\n \"--format\",\n choices=[\"text\", \"json\"],\n default=\"text\",\n help=\"Output format (default: text)\",\n )\n return parser.parse_args()\n\n\ndef format_syndrome(s: dict) -> str:\n lines = _header(s[\"name\"]) + [\n f\"\\nTRIAD: {' + '.join(s['triad'])}\",\n f\"\\nKEY DISTINCTION:\\n {s['key_distinction']}\",\n \"\\nMISDIAGNOSIS TRAPS:\",\n ]\n for trap in s[\"misdiagnosis_traps\"]:\n lines.append(f\" ! {trap}\")\n lines.append(\"\\nRED FLAGS:\")\n for flag in s[\"red_flags\"]:\n lines.append(f\" >> {flag}\")\n lines.append(\"\\nDIAGNOSTIC STEPS:\")\n for step in s[\"diagnostic_steps\"]:\n lines.append(f\" {step}\")\n lines += [\n f\"\\nTREATMENT HINT:\\n {s['treatment_hint']}\",\n f\"\\nICD-10: {s.get('icd10', 'N/A')} | ORPHA: {s.get('orpha', 'N/A')}\",\n \"\",\n ]\n return \"\\n\".join(lines)\n\n\ndef format_occupational(key: str, data: dict) -> str:\n lines = _header(f\"Occupational Exposure: {key.upper()}\") + [\n f\"\\nDISEASES: {', '.join(data['diseases'])}\",\n f\"\\nLATENCY: {data['latency']}\",\n \"\\nAT-RISK OCCUPATIONS:\",\n ]\n for occ in data[\"at_risk_occupations\"]:\n lines.append(f\" - {occ}\")\n lines.append(\"\\nKEY CLINICAL FINDINGS:\")\n findings = data[\"key_findings\"]\n if isinstance(findings, list):\n for f in findings:\n lines.append(f\" - {f}\")\n else:\n for metal, finding in findings.items():\n lines.append(f\" [{metal.upper()}] {finding}\")\n lines += [\n f\"\\nDIAGNOSTIC NOTE:\\n {data['diagnostic_note']}\",\n \"\\nWORKUP:\",\n ]\n for w in data[\"workup\"]:\n lines.append(f\" - {w}\")\n return \"\\n\".join(lines)\n\n\ndef format_red_flag(data: dict) -> str:\n lines = _header(f\"Red-Flag Analysis: {data['symptom']}\") + [\"\\nDIFFERENTIAL DIAGNOSIS:\"]\n for i, d in 
enumerate(data[\"differentials\"], 1):\n lines.append(f\"\\n {i}. {d['diagnosis']}\")\n lines.append(f\" Additional features: {', '.join(d['additional_features'])}\")\n lines.append(f\" Key test: {d['key_test']}\")\n lines.append(f\" Pitfall: {d['pitfall']}\")\n lines.append(\"\\nURGENT WORKUP:\")\n for w in data[\"urgent_workup\"]:\n lines.append(f\" - {w}\")\n return \"\\n\".join(lines)\n\n\ndef format_differential(result: dict) -> str:\n lines = _header(f\"Differential for: {', '.join(result['input_symptoms'])}\")\n if not result[\"matched_syndromes\"]:\n lines.append(\"\\n No syndrome triad matches found. Try --type red_flag for individual symptoms.\")\n for m in result[\"matched_syndromes\"]:\n lines.append(f\"\\n [{m['match_count']} match(es)] {m['syndrome']}\")\n lines.append(f\" Triad: {' + '.join(m['triad'])}\")\n lines.append(f\" Matched on: {', '.join(m['matched_features'])}\")\n lines.append(f\" Key distinction: {m['key_distinction']}\")\n lines.append(f\" First step: {m['top_diagnostic_step']}\")\n if m.get(\"orpha\"):\n lines.append(f\" ORPHA: {m['orpha']} ICD-10: {m.get('icd10', 'N/A')}\")\n lines.append(f\"\\n Note: {result['note']}\")\n return \"\\n\".join(lines)\n\n\ndef list_all() -> str:\n lines = [\"\\n=== SYNDROMES ===\"]\n for key, s in SYNDROMES.items():\n lines.append(f\" {key:25s} → {s['name']}: {' + '.join(s['triad'])}\")\n lines.append(\"\\n=== OCCUPATIONAL EXPOSURES ===\")\n for key, data in OCCUPATIONAL_EXPOSURES.items():\n lines.append(f\" {key:25s} → {', '.join(data['diseases'][:2])} ...\")\n lines.append(\"\\n=== RED-FLAG SYMPTOMS ===\")\n for key, data in RED_FLAGS.items():\n lines.append(f\" {key:25s} → {data['symptom']}\")\n return \"\\n\".join(lines)\n\n\ndef main() -> None:\n args = parse_args()\n\n if args.type == \"list\":\n result_text = list_all()\n result_data = {\n \"syndromes\": list(SYNDROMES.keys()),\n \"occupational\": list(OCCUPATIONAL_EXPOSURES.keys()),\n \"red_flags\": list(RED_FLAGS.keys()),\n }\n\n elif args.type 
== \"syndrome\":\n _require_arg(args.name, \"--name\", \"syndrome\")\n _, syndrome = _fuzzy_find(args.name, SYNDROMES, \"name\")\n if syndrome is None:\n print(f\"Error: No syndrome found matching '{args.name}'. Try --type list.\", file=sys.stderr)\n sys.exit(1)\n result_text = format_syndrome(syndrome)\n result_data = syndrome\n\n elif args.type == \"differential\":\n _require_arg(args.symptoms, \"--symptoms\", \"differential\")\n result_data = build_differential(args.symptoms)\n result_text = format_differential(result_data)\n\n elif args.type == \"red_flag\":\n _require_arg(args.symptom, \"--symptom\", \"red_flag\")\n symptom_norm = args.symptom.lower().replace(\" \", \"_\")\n _, data = _fuzzy_find(symptom_norm, RED_FLAGS, \"symptom\")\n if data is None:\n print(f\"Error: No red-flag entry found for '{args.symptom}'. Try --type list.\", file=sys.stderr)\n sys.exit(1)\n result_text = format_red_flag(data)\n result_data = data\n\n else: # occupational\n _require_arg(args.exposure, \"--exposure\", \"occupational\")\n exp_lower = args.exposure.lower()\n # _fuzzy_find expects a string-valued name_field, but \"diseases\" is a list and\n # calling .lower() on it raises AttributeError for any non-first key (e.g. \"silica\").\n # Match on the key or on the joined disease names directly instead.\n exp_key, exp_data = None, None\n for key, data in OCCUPATIONAL_EXPOSURES.items():\n if exp_lower in key or exp_lower in \" \".join(data[\"diseases\"]).lower():\n exp_key, exp_data = key, data\n break\n if exp_data is None:\n print(f\"Error: No exposure entry found for '{args.exposure}'. 
Try --type list.\", file=sys.stderr)\n sys.exit(1)\n result_text = format_occupational(exp_key, exp_data)\n result_data = exp_data\n\n if args.format == \"json\":\n print(json.dumps(result_data, indent=2, default=str))\n else:\n print(result_text)\n\n\nif __name__ == \"__main__\":\n main()\n","content_type":"text/x-python; charset=utf-8","language":"python","size":36160,"content_sha256":"6c1893fa875a5a89d2478a9782e21e8aeda0c6e332e5ee49a282950186d549c6"},{"filename":"TOOLS_REFERENCE.md","content":"# Rare Disease Diagnosis - Tool Reference\n\n## Phase 1: Phenotype Standardization\n\n### HPO Tools\n\n| Tool | Purpose | Key Parameters |\n|------|---------|----------------|\n| `HPO_search_terms` | Search HPO by text | `query` |\n| `HPO_get_term_by_id` | Get HPO term details | `hp_id` |\n| `HPO_get_term_genes` | Genes associated with HPO term | `hp_id` |\n| `HPO_get_term_diseases` | Diseases with HPO term | `hp_id` |\n\n**Example - Convert symptom to HPO**:\n```python\n# Search for HPO term\nresults = tu.tools.HPO_search_terms(query=\"tall stature\")\n# Returns: [{\"id\": \"HP:0000098\", \"name\": \"Tall stature\", ...}]\n```\n\n---\n\n## Phase 2: Disease Matching\n\n### Orphanet Tools (UPDATED)\n\n| Tool | Purpose | Key Parameters |\n|------|---------|----------------|\n| `Orphanet_search_diseases` | Search rare diseases | `operation=\"search_diseases\"`, `query` |\n| `Orphanet_get_disease` | Get disease details | `operation=\"get_disease\"`, `orpha_code` |\n| `Orphanet_get_genes` | Genes for disease | `operation=\"get_genes\"`, `orpha_code` |\n| `Orphanet_get_classification` | Disease hierarchy | `operation=\"get_classification\"`, `orpha_code` |\n| `Orphanet_search_by_name` | Exact name search | `operation=\"search_by_name\"`, `name`, `exact` |\n\n**Example - Search Orphanet (NEW)**:\n```python\n# Search for rare diseases\nresults = tu.tools.Orphanet_search_diseases(\n operation=\"search_diseases\",\n query=\"Marfan\"\n)\n# Returns: List of matching rare diseases with 
ORPHA codes\n\n# Get genes for a disease\ngenes = tu.tools.Orphanet_get_genes(\n operation=\"get_genes\",\n orpha_code=\"558\"\n)\n# Returns: FBN1 (causative), associated genes\n```\n\n**Common Orphanet Disease Codes**:\n| Disease | ORPHA Code |\n|---------|------------|\n| Marfan syndrome | 558 |\n| Loeys-Dietz syndrome | 60030 |\n| Vascular EDS | 286 |\n| Alexander disease | 58 |\n| Prader-Willi syndrome | 739 |\n\n### OMIM Tools (UPDATED)\n\n**⚠️ Requires**: `OMIM_API_KEY` environment variable (register at omim.org/api)\n\n| Tool | Purpose | Key Parameters |\n|------|---------|----------------|\n| `OMIM_search` | Search OMIM | `operation=\"search\"`, `query`, `limit` |\n| `OMIM_get_entry` | Get MIM entry | `operation=\"get_entry\"`, `mim_number` |\n| `OMIM_get_clinical_synopsis` | Clinical features by organ | `operation=\"get_clinical_synopsis\"`, `mim_number` |\n| `OMIM_get_gene_map` | Gene-disease mappings | `operation=\"get_gene_map\"`, `mim_number` or `chromosome` |\n\n**Example - Get OMIM details (NEW)**:\n```python\n# Search OMIM\nsearch = tu.tools.OMIM_search(\n operation=\"search\",\n query=\"BRCA1\",\n limit=5\n)\n# Returns: List of MIM numbers\n\n# Get detailed entry\nentry = tu.tools.OMIM_get_entry(\n operation=\"get_entry\",\n mim_number=\"154700\" # Marfan syndrome\n)\n# Returns: Full text, inheritance, molecular genetics\n\n# Get clinical synopsis (structured phenotype)\nsynopsis = tu.tools.OMIM_get_clinical_synopsis(\n operation=\"get_clinical_synopsis\",\n mim_number=\"154700\"\n)\n# Returns: Features by organ system (neurologicCentralNervousSystem, cardiovascular, etc.)\n```\n\n### DisGeNET Tools (NEW)\n\n**⚠️ Requires**: `DISGENET_API_KEY` environment variable (register free at disgenet.org)\n\n| Tool | Purpose | Key Parameters |\n|------|---------|----------------|\n| `DisGeNET_search_gene` | Diseases for a gene | `operation=\"search_gene\"`, `gene`, `limit` |\n| `DisGeNET_search_disease` | Genes for a disease | `operation=\"search_disease\"`, 
`disease`, `limit` |\n| `DisGeNET_get_gda` | Gene-disease associations | `operation=\"get_gda\"`, `gene`/`disease`, `source`, `min_score` |\n| `DisGeNET_get_vda` | Variant-disease associations | `operation=\"get_vda\"`, `variant`/`gene`, `limit` |\n| `DisGeNET_get_disease_genes` | All genes for disease | `operation=\"get_disease_genes\"`, `disease`, `min_score` |\n\n**Example - DisGeNET gene-disease associations**:\n```python\n# Get diseases associated with gene\nresult = tu.tools.DisGeNET_search_gene(\n operation=\"search_gene\",\n gene=\"FBN1\",\n limit=20\n)\n# Returns: Marfan syndrome (score: 0.95), MASS phenotype, etc.\n\n# Get high-confidence curated associations\ngda = tu.tools.DisGeNET_get_gda(\n operation=\"get_gda\",\n gene=\"FBN1\",\n source=\"CURATED\",\n min_score=0.3,\n limit=20\n)\n# Returns: Gene-disease associations with evidence scores\n\n# Get variant-disease associations for diagnosis\nvda = tu.tools.DisGeNET_get_vda(\n operation=\"get_vda\",\n gene=\"FBN1\",\n limit=30\n)\n# Returns: Variants with disease associations\n```\n\n**DisGeNET Score Interpretation**:\n| Score | Interpretation | Use |\n|-------|----------------|-----|\n| >0.7 | Very Strong | High confidence |\n| 0.4-0.7 | Strong | Good evidence |\n| 0.2-0.4 | Moderate | Consider |\n| \u003c0.2 | Weak | Low confidence |\n\n### ClinGen - Gene-Disease Validity (NEW)\n\nAuthoritative curation of gene-disease relationships.\n\n| Tool | Purpose | Key Parameters |\n|------|---------|----------------|\n| `ClinGen_search_gene_validity` | Gene-disease validity | `gene` |\n| `ClinGen_search_dosage_sensitivity` | HI/TS scores | `gene` |\n| `ClinGen_search_actionability` | Clinical actionability | `gene` |\n| `ClinGen_get_variant_classifications` | Expert variant classifications | `gene`, `variant` |\n\n```python\n# Check gene-disease validity classification\nvalidity = tu.tools.ClinGen_search_gene_validity(gene=\"FBN1\")\n# Returns: Definitive for Marfan syndrome, Strong for MASS phenotype\n\n# 
Check dosage sensitivity (for CNV interpretation)\ndosage = tu.tools.ClinGen_search_dosage_sensitivity(gene=\"MECP2\")\n# Returns: HI Score 3 (haploinsufficient), TS Score 0\n\n# Check clinical actionability\nactionability = tu.tools.ClinGen_search_actionability(gene=\"BRCA1\")\n# Returns: Adult and pediatric actionability data\n```\n\n**ClinGen Validity Classification** (for gene panel prioritization):\n| Classification | Include in Panel? | ACMG Impact |\n|----------------|-------------------|-------------|\n| **Definitive** | Yes - mandatory | Strong PP4 support |\n| **Strong** | Yes | Good PP4 support |\n| **Moderate** | Yes | Moderate PP4 support |\n| **Limited** | Yes, but flag | Weak support |\n| **Disputed** | Exclude | Conflicting evidence |\n| **Refuted** | EXCLUDE | Gene not causative |\n\n**Dosage Sensitivity Scores** (for CNV interpretation):\n| Score | Meaning | ACMG Impact |\n|-------|---------|-------------|\n| **3** | Sufficient evidence | PVS1 for LOF deletions |\n| **2** | Emerging evidence | PM1 |\n| **1** | Little evidence | Weak support |\n| **0/40** | None/Unlikely | No dosage sensitivity |\n\n### OpenTargets Disease Tools\n\n| Tool | Purpose | Key Parameters |\n|------|---------|----------------|\n| `OpenTargets_get_disease_info_by_efoId` | Disease details | `efoId` |\n| `OpenTargets_get_disease_associated_targets` | Genes for disease | `efoId` |\n| `OpenTargets_get_associated_diseases_by_target_ensemblId` | Diseases for gene | `ensemblId` |\n\n---\n\n## Phase 3: Gene Panel\n\n### Gene Information\n\n| Tool | Purpose | Key Parameters |\n|------|---------|----------------|\n| `MyGene_query_genes` | Search genes | `q`, `species` |\n| `MyGene_get_gene_by_id` | Gene details | `geneid` |\n| `ensembl_lookup_gene` | Ensembl gene info | `id`, `species` |\n\n**Parameter Note**: Use `q` not `gene` for MyGene_query_genes.\n\n### Expression Validation\n\n| Tool | Purpose | Key Parameters |\n|------|---------|----------------|\n| 
`GTEx_get_median_gene_expression` | Tissue expression | `gencode_id` |\n| `HPA_get_gene_expression` | Protein expression | `ensembl_id` |\n\n**Note**: GTEx requires versioned Ensembl ID (e.g., `ENSG00000166147.15`)\n\n### Constraint Scores\n\n| Tool | Purpose | Key Parameters |\n|------|---------|----------------|\n| `gnomAD_get_gene_constraints` | pLI, LOEUF scores | `gene_symbol` |\n| `ExAC_get_constraint_metrics` | Constraint data | `gene` |\n\n---\n\n## Phase 4: Variant Interpretation\n\n### ClinVar Tools\n\n| Tool | Purpose | Key Parameters |\n|------|---------|----------------|\n| `ClinVar_search_variants` | Search variants | `query` |\n| `clinvar_get_variant_details` | Get variant details | `id` (not `variant_id`) |\n| `ClinVar_get_variant_classifications` | Classification history | `id` |\n\n**Parameter Note**: Use `id` not `variant_id` for ClinVar lookups.\n\n### Population Frequency\n\n| Tool | Purpose | Key Parameters |\n|------|---------|----------------|\n| `gnomAD_get_variant_frequencies` | Allele frequencies | `variant_id` |\n| `gnomAD_get_variant_annotations` | Variant annotations | `variant_id` |\n\n**Variant ID Format**: `1-55505647-G-A` (chrom-pos-ref-alt)\n\n### Pathogenicity Prediction (ENHANCED)\n\n| Tool | Purpose | Key Parameters |\n|------|---------|----------------|\n| `CADD_get_variant_score` | **CADD deleteriousness (NEW API)** | `chrom`, `pos`, `ref`, `alt`, `version` |\n| `AlphaMissense_get_variant_score` | **DeepMind pathogenicity (NEW)** | `uniprot_id`, `variant` |\n| `ESM_explain_variant_mechanism` | **ESMC-6B SAE mechanism of effect (NEW)** | `sequence`, `position`, `ref_aa`, `alt_aa` |\n| `EVE_get_variant_score` | **Evolutionary pathogenicity (NEW)** | `chrom`, `pos`, `ref`, `alt` OR `variant` (HGVS) |\n| `SpliceAI_predict_splice` | **Splice impact (NEW)** | `variant`, `genome` |\n| `SpliceAI_get_max_delta` | **Quick splice triage (NEW)** | `variant`, `genome` |\n| `SpliceAI_predict_pangolin` | Alternative splice model | 
`variant`, `genome` |\n\n### CADD API (NEW)\n\nDirect access to CADD deleteriousness scores:\n\n```python\n# Get CADD score for variant\nresult = tu.tools.CADD_get_variant_score(\n chrom=\"15\",\n pos=48942946,\n ref=\"G\",\n alt=\"A\",\n version=\"GRCh38-v1.7\"\n)\n# Returns: phred_score, raw_score, interpretation\n# PHRED ≥20 = top 1% deleterious (PP3 support)\n```\n\n### AlphaMissense (NEW)\n\nDeepMind's state-of-the-art missense pathogenicity prediction (~90% accuracy):\n\n```python\n# Get pathogenicity score for missense variant\nresult = tu.tools.AlphaMissense_get_variant_score(\n uniprot_id=\"P35555\", # FBN1\n variant=\"E1541K\" # or \"p.E1541K\"\n)\n# Returns: pathogenicity_score, classification (pathogenic/ambiguous/benign)\n# Thresholds: >0.564 pathogenic, \u003c0.34 benign\n```\n\n### ESMC-6B SAE — Mechanism of Effect (NEW)\n\nAlphaMissense gives a pathogenicity score but no mechanism. ESMC-6B Sparse Autoencoder features identify **which interpretable protein-language-model features the mutation disrupts** (catalytic, ligand-binding, PTM, domain, transmembrane, etc.). Use as a mechanism complement when the report needs to explain *how* a variant is pathogenic, not just *whether*.\n\n```python\n# One-call mechanism for a VUS (requires WT protein sequence)\nresult = tu.tools.ESM_explain_variant_mechanism(\n sequence=wt_protein_sequence,\n position=1541, ref_aa=\"E\", alt_aa=\"K\",\n top_k_features=5,\n)\n# result[\"data\"][\"mechanism_summary\"] e.g.:\n# \"Disrupted feature categories (lost): ligand-binding=2, domain=1\"\n```\n\n**Other SAE tools** (advanced):\n- `ESM_score_variant_sae_disruption` — single variant, raw feature deltas, no labels (faster, no describe-feature calls)\n- `ESM_score_variant_sae_batch` — many variants at once (N+1 Forge calls instead of 2N); use for saturation mutagenesis\n- `ESM_get_region_sae_features` — aggregate features over a residue range (e.g. 
characterize a domain or motif)\n- `ESM_describe_sae_feature` — biological category label for a feature_id (cached per id)\n\n**Mapping SAE categories → ACMG support**:\n| SAE category lost | Mechanistic claim | ACMG line |\n|---|---|---|\n| `catalytic` | Active-site disruption | Mechanistic support for PP3 |\n| `ligand-binding` / `ptm` / `domain` | Functional site disruption | Supports PP3 |\n| `structural-stability` / `secondary-structure` | Fold-destabilizing | Supports PP3 |\n| `transmembrane` / `signal-peptide` | Targeting / membrane integration | Supports PP3 |\n| (no interpretable change) | No mechanistic signal | Do not strengthen PP3 above the predictor score alone |\n\n**Requires**: `ESM_API_KEY` env var (free non-commercial token at https://forge.evolutionaryscale.ai) and `pip install 'esm @ git+https://github.com/evolutionaryscale/esm@ee891c52'` (SAE support on unmerged feature branch; PyPI esm 3.2.x lacks SAEConfig). Outputs governed by EvolutionaryScale Cambrian Inference License — non-commercial use only.\n\n### EVE (NEW)\n\nEvolutionary variant effect prediction (Harvard/Oxford):\n\n```python\n# Get EVE score\nresult = tu.tools.EVE_get_variant_score(\n chrom=\"15\",\n pos=48942946,\n ref=\"G\",\n alt=\"A\"\n)\n# Returns: eve_score, classification (likely_pathogenic/likely_benign)\n# Threshold: >0.5 likely pathogenic\n```\n\n### SpliceAI - Splice Variant Prediction (NEW)\n\nDeep learning model for predicting splice-altering effects. 
~15% of pathogenic variants affect splicing.\n\n```python\n# Full splice prediction\nresult = tu.tools.SpliceAI_predict_splice(\n variant=\"chr15-48942946-G-A\",\n genome=\"38\" # or \"37\"\n)\n# Returns: DS_AG, DS_AL, DS_DG, DS_DL scores + max_delta_score + interpretation\n\n# Quick triage (max score only)\nquick = tu.tools.SpliceAI_get_max_delta(\n variant=\"chr15-48942946-G-A\",\n genome=\"38\"\n)\n# Returns: max_delta_score, interpretation, pathogenicity_threshold\n```\n\n**Variant Format**: `chr{chrom}-{pos}-{ref}-{alt}`\n\n**SpliceAI Delta Score Interpretation**:\n| Score Type | Meaning |\n|------------|---------|\n| DS_AG | Acceptor Gain (creates new) |\n| DS_AL | Acceptor Loss (disrupts existing) |\n| DS_DG | Donor Gain (creates new) |\n| DS_DL | Donor Loss (disrupts existing) |\n\n**Max Score Thresholds for ACMG**:\n| Max Delta Score | Interpretation | ACMG |\n|-----------------|----------------|------|\n| ≥0.8 | High splice impact | PP3 (strong) |\n| 0.5-0.8 | Moderate impact | PP3 (supporting) |\n| 0.2-0.5 | Low impact | PP3 (weak) |\n| \u003c0.2 | Likely no impact | BP7 (if synonymous) |\n\n**When to Use SpliceAI**:\n- Intronic variants within ±50bp of splice sites\n- Synonymous variants (may still affect splicing)\n- Exonic variants near splice junctions\n- Variants creating cryptic splice sites\n\n---\n\n**Prediction Tool Thresholds for PP3**:\n| Tool | Damaging | Uncertain | Benign |\n|------|----------|-----------|--------|\n| **AlphaMissense** | >0.564 | 0.34-0.564 | \u003c0.34 |\n| **CADD PHRED** | ≥20 | 15-20 | \u003c15 |\n| **EVE** | >0.5 | - | ≤0.5 |\n| **SpliceAI** | ≥0.5 | 0.2-0.5 | \u003c0.2 |\n\n**Recommended Strategy for VUS**:\n1. Run all predictors (AlphaMissense, CADD, EVE for missense; SpliceAI for splice)\n2. If ≥2 concordant damaging → Strong PP3 support\n3. If ≥2 concordant benign → BP4 support\n4. 
If discordant → Weight AlphaMissense highest for missense, SpliceAI for splice\n\n---\n\n## Phase 3.5: Expression & Regulatory Context (NEW)\n\n### CELLxGENE - Single-Cell Expression\n\n| Tool | Purpose | Key Parameters |\n|------|---------|----------------|\n| `CELLxGENE_get_expression_data` | Cell-type specific expression | `gene`, `tissue` |\n| `CELLxGENE_get_cell_metadata` | Cell type annotations | `gene` |\n| `CELLxGENE_download_h5ad` | Download full dataset | `dataset_id` |\n| `CELLxGENE_get_embeddings` | UMAP/tSNE coordinates | `dataset_id` |\n\n**Example - Get cell-type expression**:\n```python\n# Get expression across cell types\nexpression = tu.tools.CELLxGENE_get_expression_data(\n gene=\"FBN1\",\n tissue=\"heart\"\n)\n# Returns: Expression values per cell type\n```\n\n**Why use it**: Validates that candidate genes are expressed in disease-relevant cell types (e.g., fibroblasts for connective tissue disorders).\n\n### ChIPAtlas - Transcription Factor Binding\n\n| Tool | Purpose | Key Parameters |\n|------|---------|----------------|\n| `ChIPAtlas_enrichment_analysis` | TF binding enrichment | `gene`, `cell_type` |\n| `ChIPAtlas_get_peak_data` | ChIP-seq peaks | `gene`, `experiment_type` |\n| `ChIPAtlas_search_datasets` | Find experiments | `antigen`, `cell_type` |\n| `ChIPAtlas_get_experiments` | Experiment metadata | `experiment_id` |\n\n**Example - Get regulatory context**:\n```python\n# Find TFs that regulate gene\ntf_binding = tu.tools.ChIPAtlas_enrichment_analysis(\n gene=\"FBN1\",\n cell_type=\"Fibroblast\"\n)\n# Returns: TFs with significant binding near gene\n```\n\n**Why use it**: Identifies regulatory mechanisms that may be disrupted; helps interpret regulatory variants.\n\n### ENCODE - Regulatory Elements\n\n| Tool | Purpose | Key Parameters |\n|------|---------|----------------|\n| `ENCODE_search_experiments` | Find experiments | `assay_title`, `biosample` |\n| `ENCODE_get_experiment` | Experiment details | `accession` |\n| 
`ENCODE_get_biosample` | Sample annotations | `accession` |\n| `ENCODE_list_files` | Get data files | `experiment_accession` |\n\n**Example - Get regulatory data**:\n```python\n# Search for regulatory data\nexperiments = tu.tools.ENCODE_search_experiments(\n assay_title=\"ATAC-seq\",\n biosample=\"heart\"\n)\n```\n\n---\n\n## Phase 3.6: Pathway Analysis (NEW)\n\n### KEGG - Metabolic & Signaling Pathways\n\n| Tool | Purpose | Key Parameters |\n|------|---------|----------------|\n| `kegg_search_pathway` | Search pathways | `query` |\n| `kegg_get_pathway_info` | Pathway details | `pathway_id` |\n| `kegg_find_genes` | Find gene in KEGG | `query` |\n| `kegg_get_gene_info` | Gene pathway membership | `gene_id` |\n\n**Example - Get pathway context**:\n```python\n# Find gene in KEGG\nkegg_gene = tu.tools.kegg_find_genes(query=\"hsa:FBN1\")\n# Get pathway membership\ngene_info = tu.tools.kegg_get_gene_info(gene_id=\"hsa:2200\")\n# Returns: Pathways containing FBN1\n```\n\n### Reactome - Biological Processes\n\n| Tool | Purpose | Key Parameters |\n|------|---------|----------------|\n| `reactome_search_pathways` | Search pathways | `query` |\n| `reactome_get_pathway` | Pathway details | `pathway_id` |\n| `reactome_disease_target_score` | Disease-pathway links | `disease`, `target` |\n\n**Example - Get Reactome pathways**:\n```python\n# Search for pathways\npathways = tu.tools.reactome_search_pathways(query=\"TGF-beta signaling\")\n```\n\n### IntAct - Protein-Protein Interactions\n\n| Tool | Purpose | Key Parameters |\n|------|---------|----------------|\n| `intact_search_interactions` | Search interactions | `query`, `species` |\n| `intact_get_interaction_network` | Network view | `gene`, `depth` |\n| `intact_get_complex_details` | Protein complexes | `complex_id` |\n\n**Example - Get protein interactions**:\n```python\n# Get interaction partners\ninteractions = tu.tools.intact_search_interactions(\n query=\"FBN1\",\n species=\"human\"\n)\n# Returns: Direct interaction 
partners with confidence scores\n```\n\n**Why use it**: Identifies protein complexes and pathways; variants may disrupt protein-protein interactions.\n\n---\n\n## Phase 5: Structure Analysis (NVIDIA NIM)\n\n### Structure Prediction\n\n| Tool | Purpose | Key Parameters |\n|------|---------|----------------|\n| `NvidiaNIM_alphafold2` | High-accuracy prediction | `sequence`, `algorithm` |\n| `NvidiaNIM_esmfold` | Fast prediction | `sequence` |\n\n**Example - AlphaFold2 prediction**:\n```python\nstructure = tu.tools.NvidiaNIM_alphafold2(\n sequence=protein_sequence,\n algorithm=\"mmseqs2\",\n relax_prediction=False\n)\n# Returns: PDB structure with pLDDT scores\n```\n\n### Domain Annotation\n\n| Tool | Purpose | Key Parameters |\n|------|---------|----------------|\n| `InterPro_get_protein_domains` | Domain architecture | `accession` |\n| `UniProt_get_protein_features` | Sequence features | `accession` |\n| `Pfam_get_domains` | Pfam domains | `uniprot_id` |\n\n---\n\n## Phase 6: Literature Evidence (NEW)\n\n### PubMed - Published Literature\n\n| Tool | Purpose | Key Parameters |\n|------|---------|----------------|\n| `PubMed_search_articles` | Search articles | `query`, `limit` |\n| `PubMed_get_article` | Get article details | `pmid` |\n| `PubMed_get_related` | Related articles | `pmid` |\n| `PubMed_get_cited_by` | Citation tracking | `pmid` |\n\n**Example - Search disease literature**:\n```python\n# Disease-specific search\npapers = tu.tools.PubMed_search_articles(\n query='\"Marfan syndrome\" AND (FBN1 OR genetics)',\n limit=20\n)\n```\n\n### BioRxiv/MedRxiv - Preprints\n\n| Tool | Purpose | Key Parameters |\n|------|---------|----------------|\n| `EuropePMC_search_articles` | Search preprints (bioRxiv/medRxiv) | `query`, `source='PPR'`, `pageSize` |\n| `BioRxiv_get_preprint` | Get preprint by DOI | `doi` |\n| `ArXiv_search_papers` | Search ArXiv | `query`, `category`, `limit` |\n\n**Example - Search preprints** (bioRxiv/medRxiv don't have search APIs, use 
EuropePMC):\n```python\n# Search for recent preprints\npreprints = tu.tools.EuropePMC_search_articles(\n    query=\"Marfan syndrome genetics\",\n    source=\"PPR\",  # PPR = preprints only\n    pageSize=10\n)\n# Returns: Recent preprints (not peer-reviewed)\n\n# Get full metadata if you have a DOI.\n# doi_from_results is a placeholder for a DOI string taken from one of the hits above.\nif doi_from_results.startswith('10.1101/'):  # 10.1101 = bioRxiv/medRxiv DOI prefix\n    full = tu.tools.BioRxiv_get_preprint(doi=doi_from_results)\n```\n\n**⚠️ Important**: Preprints are NOT peer-reviewed. Flag this in reports.\n\n### OpenAlex - Citation Analysis\n\n| Tool | Purpose | Key Parameters |\n|------|---------|----------------|\n| `openalex_search_works` | Search publications | `query`, `limit` |\n| `openalex_get_author` | Author metrics | `author_id` |\n| `openalex_literature_search` | Advanced search | `query`, `filters` |\n\n**Example - Citation analysis**:\n```python\n# Search publications and retrieve their citation data\nwork = tu.tools.openalex_search_works(\n    query=\"FBN1 Marfan pathogenic\",\n    limit=10\n)\n# Returns: Papers with citation counts, open access status\n```\n\n### Semantic Scholar - AI-Enhanced Search\n\n| Tool | Purpose | Key Parameters |\n|------|---------|----------------|\n| `SemanticScholar_search_papers` | AI-ranked search | `query`, `limit` |\n\n**Example**:\n```python\n# AI-enhanced literature search\npapers = tu.tools.SemanticScholar_search_papers(\n    query=\"rare disease diagnosis machine learning\",\n    limit=15\n)\n```\n\n---\n\n## Workflow Code Examples\n\n### Example 1: Full Phenotype-to-Diagnosis\n\n```python\ndef diagnose_rare_disease(tu, symptoms, patient_id):\n    \"\"\"Complete rare disease diagnostic workflow.\"\"\"\n\n    # Phase 1: Standardize phenotype\n    hpo_terms = []\n    for symptom in symptoms:\n        results = tu.tools.HPO_search_terms(query=symptom)\n        if results:\n            hpo_terms.append(results[0])  # keep the top-ranked HPO match\n\n    # Phase 2: Match diseases\n    candidate_diseases = []\n    for hpo in hpo_terms:\n        diseases = tu.tools.HPO_get_term_diseases(hp_id=hpo['id'])\n        candidate_diseases.extend(diseases)\n\n    # Rank diseases by the number of matching HPO terms\n    
from collections import Counter  # stdlib; needed for the ranking below\n    disease_counts = Counter(d['orpha_id'] for d in candidate_diseases)\n    top_diseases = disease_counts.most_common(10)\n\n    # Phase 3: Build gene panel\n    genes = set()\n    for orpha_id, count in top_diseases:\n        # Tool name and parameters as listed in the Orphanet table above\n        disease_genes = tu.tools.Orphanet_get_genes(\n            operation=\"get_genes\",\n            orpha_code=orpha_id\n        )\n        genes.update(disease_genes)\n\n    return {\n        'hpo_terms': hpo_terms,\n        'candidate_diseases': top_diseases,\n        'gene_panel': list(genes)\n    }\n```\n\n### Example 2: Variant Interpretation\n\n```python\ndef interpret_variant(tu, variant_hgvs, variant_id, gene_symbol):\n    \"\"\"Interpret a variant using ACMG criteria.\n\n    variant_hgvs: HGVS string, used for the ClinVar search.\n    variant_id: gnomAD-style ID in chrom-pos-ref-alt format (e.g. 15-48942946-G-A).\n    \"\"\"\n\n    evidence = {}\n    chrom, pos, ref, alt = variant_id.split('-')\n\n    # PM2: Population frequency (gnomAD expects chrom-pos-ref-alt, not HGVS)\n    freq = tu.tools.gnomAD_get_variant_frequencies(variant_id=variant_id)\n    if freq['allele_frequency'] \u003c 0.00001:\n        evidence['PM2'] = {'strength': 'Moderate', 'reason': 'Absent or extremely rare in gnomAD'}\n\n    # PP3: Computational predictions\n    cadd = tu.tools.CADD_get_variant_score(\n        chrom=chrom, pos=int(pos), ref=ref, alt=alt,\n        version=\"GRCh38-v1.7\"  # must match the genome build of variant_id\n    )\n    if cadd['phred_score'] > 25:\n        evidence['PP3'] = {'strength': 'Supporting', 'reason': f'CADD={cadd[\"phred_score\"]}'}\n\n    # ClinVar\n    clinvar = tu.tools.ClinVar_search_variants(query=variant_hgvs)\n    if clinvar:\n        evidence['ClinVar'] = clinvar[0]['clinical_significance']\n\n    return evidence\n```\n\n### Example 3: Structure Analysis for VUS\n\n```python\ndef analyze_vus_structure(tu, uniprot_id, variant_position):\n    \"\"\"Structural analysis for variant of uncertain significance.\"\"\"\n\n    # Get protein sequence\n    protein = tu.tools.UniProt_get_protein_by_accession(accession=uniprot_id)\n    sequence = protein['sequence']\n\n    # Predict structure\n    structure = tu.tools.NvidiaNIM_alphafold2(\n        sequence=sequence,\n        algorithm=\"mmseqs2\"\n    )\n\n    # Get domain annotations\n    domains = tu.tools.InterPro_get_protein_domains(accession=uniprot_id)\n\n    # Check whether the variant falls inside an annotated domain\n    variant_domain = None\n    for domain in domains:\n        if domain['start'] \u003c= variant_position \u003c= domain['end']:\n            variant_domain = domain\n            break\n\n    return {\n        'structure': 
structure,\n        # get_plddt is a caller-supplied helper, not a ToolUniverse tool; it is not defined in this reference\n        'plddt_at_position': get_plddt(structure, variant_position),\n        'domain': variant_domain\n    }\n```\n\n---\n\n## Fallback Chains\n\n### Disease Matching\n| Primary | Fallback 1 | Fallback 2 |\n|---------|------------|------------|\n| `Orphanet_search_diseases` | `OMIM_search` | `DisGeNET_search_disease` |\n| `Orphanet_get_genes` | `OMIM_get_gene_map` | `DisGeNET_get_disease_genes` |\n| `OMIM_get_clinical_synopsis` | `Orphanet_get_disease` | `OpenTargets` |\n| `DisGeNET_search_gene` | `OpenTargets_diseases` | Literature search |\n\n### Expression & Regulatory\n| Primary | Fallback 1 | Fallback 2 |\n|---------|------------|------------|\n| `CELLxGENE_get_expression_data` | `GTEx_get_median_gene_expression` | `HPA_get_gene_expression` |\n| `ChIPAtlas_enrichment_analysis` | `ENCODE_search_experiments` | Literature search |\n\n### Pathway Analysis\n| Primary | Fallback 1 | Fallback 2 |\n|---------|------------|------------|\n| `kegg_get_gene_info` | `reactome_search_pathways` | `OpenTargets_pathways` |\n| `intact_search_interactions` | `STRING_interactions` | Literature search |\n\n### Variant Annotation\n| Primary | Fallback 1 | Fallback 2 |\n|---------|------------|------------|\n| `clinvar_get_variant_details` | `gnomAD_get_variant` | Literature search |\n| `gnomAD_get_variant_frequencies` | `gnomad_get_variant` | 1000 Genomes |\n\n### Pathogenicity Prediction (ENHANCED)\n| Primary | Fallback 1 | Fallback 2 |\n|---------|------------|------------|\n| `AlphaMissense_get_variant_score` | `CADD_get_variant_score` | `EVE_get_variant_score` |\n| `CADD_get_variant_score` | myvariant CADD field | PolyPhen-2 |\n| `EVE_get_variant_score` | VEP with EVE plugin | REVEL |\n\n### Structure Prediction\n| Primary | Fallback 1 | Fallback 2 |\n|---------|------------|------------|\n| `NvidiaNIM_alphafold2` | `alphafold_get_prediction` | `NvidiaNIM_esmfold` |\n| `InterPro_get_protein_domains` | `Pfam_get_domains` | `UniProt_features` |\n\n### Literature\n| Primary | Fallback 
1 | Fallback 2 |\n|---------|------------|------------|\n| `PubMed_search_articles` | `EuropePMC_search_articles` | `SemanticScholar_search_papers` |\n| `EuropePMC_search_articles` (source='PPR') | `web_search` (site:biorxiv.org) | Skip preprints |\n| `openalex_search_works` | `Crossref_search_works` | PubMed |\n\n---\n\n## Common Parameter Mistakes\n\n| Tool | Wrong | Correct |\n|------|-------|---------|\n| `MyGene_query_genes` | `gene=\"FBN1\"` | `q=\"FBN1\"` |\n| `clinvar_get_variant_details` | `variant_id=123` | `id=123` |\n| `OpenTargets_*` | `ensemblID` | `ensemblId` (camelCase) |\n| `GTEx_get_median_gene_expression` | `ensembl_id` | `gencode_id` (versioned) |\n| `gnomAD_get_variant_frequencies` | `variant=\"c.123A>G\"` | `variant_id=\"1-123-A-G\"` |\n\n---\n\n## NVIDIA NIM Requirements\n\n**API Key**: `NVIDIA_API_KEY` environment variable required\n\n**Check availability**:\n```python\nimport os\nnvidia_available = bool(os.environ.get(\"NVIDIA_API_KEY\"))\n```\n\n**Rate limits**: 40 RPM (1.5 second minimum between calls)\n\n**Async operations**: AlphaFold2 may return 202, requiring polling:\n```python\n# Initial call may return 202\nresult = tu.tools.NvidiaNIM_alphafold2(sequence=seq)\nif result.get('status') == 'pending':\n # Poll for completion (handled internally by tool)\n pass\n```\n","content_type":"text/markdown; charset=utf-8","language":"markdown","size":27519,"content_sha256":"43181355de7a4527c380b2f502c8e4e9640d86f6e53077ad475be3d544c2ea89"}],"content_json":{"type":"doc","content":[{"type":"heading","attrs":{"level":1},"content":[{"text":"Rare Disease Diagnosis Advisor","type":"text"}]},{"type":"paragraph","content":[{"text":"Systematic diagnosis support for rare diseases using phenotype matching, gene panel prioritization, and variant interpretation across Orphanet, OMIM, HPO, ClinVar, and structure-based analysis.","type":"text"}]},{"type":"paragraph","content":[{"text":"KEY 
PRINCIPLES","type":"text","marks":[{"type":"strong"}]},{"text":":","type":"text"}]},{"type":"ordered_list","attrs":{"order":1,"listStyle":"number"},"content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Report-first","type":"text","marks":[{"type":"strong"}]},{"text":" - Create report file FIRST, update progressively","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Phenotype-driven","type":"text","marks":[{"type":"strong"}]},{"text":" - Convert symptoms to HPO terms before searching","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Multi-database triangulation","type":"text","marks":[{"type":"strong"}]},{"text":" - Cross-reference Orphanet, OMIM, OpenTargets","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Evidence grading","type":"text","marks":[{"type":"strong"}]},{"text":" - Grade diagnoses by supporting evidence strength","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"English-first queries","type":"text","marks":[{"type":"strong"}]},{"text":" - Always use English terms in tool calls","type":"text"}]}]}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"LOOK UP, DON'T GUESS","type":"text"}]},{"type":"paragraph","content":[{"text":"When uncertain about any scientific fact, SEARCH databases first rather than reasoning from memory.","type":"text"}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":2},"content":[{"text":"COMPUTE, DON'T DESCRIBE","type":"text"}]},{"type":"paragraph","content":[{"text":"When analysis requires computation (statistics, data processing, scoring, enrichment), write and run Python code via Bash. Don't describe what you would do — execute it and report actual results. 
Use ToolUniverse tools to retrieve data, then Python (pandas, scipy, statsmodels, matplotlib) to analyze it.","type":"text"}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Clinical Reasoning Framework (BEFORE Tools)","type":"text"}]},{"type":"paragraph","content":[{"text":"Apply these strategies to form a 3-5 candidate differential, then use tools to confirm/refute:","type":"text"}]},{"type":"ordered_list","attrs":{"order":1,"listStyle":"number"},"content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Multi-system involvement","type":"text","marks":[{"type":"strong"}]},{"text":" - Symptoms spanning 2+ organ systems = strongest rare disease signal. Ask: what single pathway explains ALL features?","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Regression question","type":"text","marks":[{"type":"strong"}]},{"text":" - Losing abilities vs never acquired? Regression = neurodegenerative/metabolic storage. Stable = developmental/structural.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Trigger question","type":"text","marks":[{"type":"strong"}]},{"text":" - Episodic/triggered (fasting, illness, exercise) = metabolic disorder (often treatable). Constitutive = structural/degenerative.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Rarest feature first","type":"text","marks":[{"type":"strong"}]},{"text":" - Build differential from most specific finding, not most prominent. 
Check remaining features for consistency.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Treatable-first","type":"text","marks":[{"type":"strong"}]},{"text":" - Move treatable conditions to top for urgent workup (enzyme replacement, dietary, chelation, vitamin-responsive).","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Occupational/environmental exposure","type":"text","marks":[{"type":"strong"}]},{"text":" - Latency up to 50 years. Asbestos/silica/heavy metals/solvents/farming. Always ask about PAST jobs.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Autoimmune differential","type":"text","marks":[{"type":"strong"}]},{"text":" - Which joints? Symmetric? Extra-articular? Serologic pattern? Organ under attack?","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Rare syndrome signals","type":"text","marks":[{"type":"strong"}]},{"text":" - Named triads, common diagnoses failing to explain ALL findings, failed standard treatment, unusual lab findings.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Tools verify, not generate","type":"text","marks":[{"type":"strong"}]},{"text":" - Form hypothesis first, then use databases to confirm.","type":"text"}]}]}]},{"type":"paragraph","content":[{"text":"Common pitfalls","type":"text","marks":[{"type":"strong"}]},{"text":": Felty's (RA+splenomegaly+neutropenia) mimics infection; SLE nephritis mimics PSGN (check ASO); occupational exposures trigger autoimmunity (silica→scleroderma/RA/SLE).","type":"text"}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":2},"content":[{"text":"Tool Parameter 
Corrections","type":"text"}]},{"type":"table","attrs":{"layout":null},"content":[{"type":"tr","content":[{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Tool","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"WRONG","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"CORRECT","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"OpenTargets_get_associated_drugs_by_target_ensemblID","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"ensemblID","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"ensemblId","type":"text","marks":[{"type":"code_inline"}]}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"ClinVar_get_variant_details","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"variant_id","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"id","type":"text","marks":[{"type":"code_inline"}]}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"MyGene_query_ge
nes","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"gene","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"q","type":"text","marks":[{"type":"code_inline"}]}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"gnomad_get_variant","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"variant","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"variant_id","type":"text","marks":[{"type":"code_inline"}]}]}]}]}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":2},"content":[{"text":"Workflow","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":""},"content":[{"text":"Phase 0: Clinical Reasoning → 3-5 candidate differential\nPhase 1: Phenotype → HPO terms (HPO_search_terms), core vs variable, onset, family history\nPhase 2: Disease Matching → Orphanet_search_diseases, OMIM_search, DisGeNET_search_gene\nPhase 3: Gene Panel → ClinGen validation, GTEx expression, prioritization scoring\nPhase 3.5: Expression Context → CELLxGENE, ChIPAtlas for tissue/cell-type confirmation\nPhase 3.6: Pathway Analysis → KEGG, IntAct for convergent pathways\nPhase 4: Variant Interpretation → ClinVar, gnomAD frequency, CADD/AlphaMissense/EVE/SpliceAI, ACMG criteria\nPhase 5: Structure Analysis → AlphaFold2, InterPro domains (for VUS)\nPhase 6: Literature → PubMed, BioRxiv/MedRxiv, OpenAlex\nPhase 7: Report Synthesis → Prioritized 
differential with next steps","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Key Phase Details","type":"text"}]},{"type":"paragraph","content":[{"text":"Phase 2 - Disease Matching","type":"text","marks":[{"type":"strong"}]},{"text":": ","type":"text"},{"text":"Orphanet_search_diseases(operation=\"search_diseases\", query=keyword)","type":"text","marks":[{"type":"code_inline"}]},{"text":" then ","type":"text"},{"text":"Orphanet_get_genes(operation=\"get_genes\", orpha_code=code)","type":"text","marks":[{"type":"code_inline"}]},{"text":". Score overlap: Excellent >80%, Good 60-80%, Possible 40-60%.","type":"text"}]},{"type":"paragraph","content":[{"text":"Phase 3 - Gene Panel","type":"text","marks":[{"type":"strong"}]},{"text":": ClinGen classification drives inclusion (Definitive/Strong/Moderate = include; Limited = flag; Disputed/Refuted = exclude). Scoring: Tier 1 (top disease gene +5), Tier 2 (multi-disease +3), Tier 3 (ClinGen Definitive +3), Tier 4 (tissue expression +2), Tier 5 (pLI >0.9 +1).","type":"text"}]},{"type":"paragraph","content":[{"text":"Phase 4 - Variants","type":"text","marks":[{"type":"strong"}]},{"text":": gnomAD frequency classes: ultra-rare \u003c0.00001, rare \u003c0.0001, low-freq \u003c0.01. ACMG: PVS1 (null), PS1 (same AA), PM2 (absent pop), PP3 (computational), BA1 (>5% AF). 
2+ concordant predictors strengthen PP3.","type":"text"}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":2},"content":[{"text":"Evidence Grading","type":"text"}]},{"type":"table","attrs":{"layout":null},"content":[{"type":"tr","content":[{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Tier","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Criteria","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"T1","type":"text","marks":[{"type":"strong"}]},{"text":" (High)","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Phenotype match >80% + gene match","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"T2","type":"text","marks":[{"type":"strong"}]},{"text":" (Medium-High)","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Phenotype match 60-80% OR likely pathogenic variant","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"T3","type":"text","marks":[{"type":"strong"}]},{"text":" (Medium)","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Phenotype match 40-60% OR VUS in candidate 
gene","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"T4","type":"text","marks":[{"type":"strong"}]},{"text":" (Low)","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Phenotype \u003c40% OR uncertain gene","type":"text"}]}]}]}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":2},"content":[{"text":"Fallback Chains","type":"text"}]},{"type":"table","attrs":{"layout":null},"content":[{"type":"tr","content":[{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Primary","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Fallback 1","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Fallback 2","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"get_joint_associated_diseases_by_HPO_ID_list","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Orphanet_search_diseases","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"PubMed phenotype 
search","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"ClinVar_get_variant_details","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"gnomad_get_variant","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"VEP annotation","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"GTEx_get_expression_summary","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"HPA_search_genes_by_query","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Tissue-specific literature","type":"text"}]}]}]}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":2},"content":[{"text":"Reference Files","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"DIAGNOSTIC_WORKFLOW.md","type":"text","marks":[{"type":"link","attrs":{"href":"DIAGNOSTIC_WORKFLOW.md","title":null}}]},{"text":" - Code examples and algorithms per phase","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"REPORT_TEMPLATE.md","type":"text","marks":[{"type":"link","attrs":{"href":"REPORT_TEMPLATE.md","title":null}}]},{"text":" - Report template and 
examples","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"CHECKLIST.md","type":"text","marks":[{"type":"link","attrs":{"href":"CHECKLIST.md","title":null}}]},{"text":" - Interactive completeness checklist","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"scripts/clinical_patterns.py","type":"text","marks":[{"type":"code_inline"}]},{"text":" - Clinical pattern lookup (syndromes, differentials, red flags, occupational exposures)","type":"text"}]}]}]},{"type":"hr","attrs":{"markup":"---"}}]},"metadata":{"date":"2026-06-05","name":"tooluniverse-rare-disease-diagnosis","author":"@skillopedia","source":{"stars":1404,"repo_name":"tooluniverse","origin_url":"https://github.com/mims-harvard/tooluniverse/blob/HEAD/skills/tooluniverse-rare-disease-diagnosis/SKILL.md","repo_owner":"mims-harvard","body_sha256":"6c3b594064446581b743fb869115dcb5d16b55806942cce60cfe4f5f4221b43c","cluster_key":"6bc30c1bbcdb55fb0351cf2153ce045defc72795220885a1e614abced93d1012","clean_bundle":{"format":"clean-skill-bundle-v1","source":"mims-harvard/tooluniverse/skills/tooluniverse-rare-disease-diagnosis/SKILL.md","attachments":[{"id":"0bb5463d-75d9-5f30-8ca9-57be17104d6d","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/0bb5463d-75d9-5f30-8ca9-57be17104d6d/attachment.md","path":"CHECKLIST.md","size":6320,"sha256":"d8fc5d08ca3b584f3641466cc76eb0fb654e83ce1a2af7443db6ee4869a60d6c","contentType":"text/markdown; charset=utf-8"},{"id":"ad9e68ea-778e-5fe7-a669-a2abc124465f","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/ad9e68ea-778e-5fe7-a669-a2abc124465f/attachment.md","path":"DIAGNOSTIC_WORKFLOW.md","size":22098,"sha256":"db518547bd44e240de7e7ef8c68132f24674ab277ba6ef8d1abc5cdc115ba7f0","contentType":"text/markdown; 
charset=utf-8"},{"id":"e8661e7a-e3d7-5f92-85a8-f54937d8b212","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/e8661e7a-e3d7-5f92-85a8-f54937d8b212/attachment.md","path":"EXAMPLES.md","size":13697,"sha256":"81b51f3c61a9e0b991eba8e34d5ca590a244f776d8939dced79c3167004921c9","contentType":"text/markdown; charset=utf-8"},{"id":"96b82c92-04e0-5834-8c65-a4fadf5b156a","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/96b82c92-04e0-5834-8c65-a4fadf5b156a/attachment.md","path":"REPORT_TEMPLATE.md","size":11129,"sha256":"76585f3cb12d6bf02975c60845aa35f801dcfa959625d14ccf9929ea40755d19","contentType":"text/markdown; charset=utf-8"},{"id":"684105d8-510a-5db1-b932-5955a6b79dfc","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/684105d8-510a-5db1-b932-5955a6b79dfc/attachment.md","path":"TOOLS_REFERENCE.md","size":27519,"sha256":"43181355de7a4527c380b2f502c8e4e9640d86f6e53077ad475be3d544c2ea89","contentType":"text/markdown; charset=utf-8"},{"id":"071e73b6-2112-5149-a409-8b52c4b051a9","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/071e73b6-2112-5149-a409-8b52c4b051a9/attachment.py","path":"scripts/clinical_patterns.py","size":36160,"sha256":"6c1893fa875a5a89d2478a9782e21e8aeda0c6e332e5ee49a282950186d549c6","contentType":"text/x-python; charset=utf-8"}],"bundle_sha256":"cb36540d8c412bd117bcaf8a47eef21d360ba19656233e5f3cf1e744f42a3326","attachment_count":6,"text_attachments":6,"attachment_storage":"skillopedia-attachments-v1","binary_attachments":0,"excluded_attachments":[]},"cluster_size":2,"skill_md_path":"skills/tooluniverse-rare-disease-diagnosis/SKILL.md","import_metadata":{"date":"2026-06-05","author":"@skillopedia","version":"v1","category":"general","category_label":"General"},"exact_dupes_collapsed_into_this":1},"version":"v1","category":"general","import_tag":"clean-skills-v1","description":"Rare disease differential diagnosis from patient phenotype — HPO term matching to candidate diseases (Orphanet, OMIM), gene panel prioritization, ACMG variant 
interpretation, and structure-based variant analysis. Use for diagnostic odyssey assistance, phenotype-to-disease ranking, and genetic-counseling differential generation.","disable-model-invocation":true}},"renderedAt":1782989650812}
