Claw Compactor — OpenClaw Skill Reference

Overview

Claw Compactor reduces token usage across the full OpenClaw workspace using 6 compression layers:

| Layer | Name | Cost | Notes |
|-------|------|------|-------|
| 1 | Rule Engine | Free | Dedup, strip filler, merge sections |
| 2 | Dictionary Encoding | Free | Auto-codebook, substitution |
| 3 | Observation Compression | Free | Session JSONL → structured summaries |
| 4 | RLE Patterns | Free | Path/IP/enum shorthand |
| 5 | Compressed Context Protocol | Free | Format abbreviations |
| 6 | Engram | LLM API | Real-time Observational Memory |

S…
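To make Layer 2 (Dictionary Encoding) from the table above concrete, here is a minimal sketch of codebook substitution with `$`-prefixed codes. It is an illustration under stated assumptions, not the project's implementation: the names `encode`, `decode`, and `ESCAPE` are hypothetical, the codebook is assumed to map code to phrase, and phrases are assumed not to overlap with the letters used in the codes.

```python
from typing import Dict

# Hypothetical sentinel standing in for a literal '$' in the source text.
ESCAPE = "\x00DLR\x00"


def encode(text: str, codebook: Dict[str, str]) -> str:
    """Replace each phrase with its code; codebook maps code -> phrase."""
    # Protect literal '$' so it can never be mistaken for a code later.
    out = text.replace("$", ESCAPE)
    # Longest phrases first so a short phrase never splits a longer one.
    for code, phrase in sorted(codebook.items(), key=lambda kv: -len(kv[1])):
        out = out.replace(phrase.replace("$", ESCAPE), code)
    return out


def decode(text: str, codebook: Dict[str, str]) -> str:
    """Expand codes back to phrases, then restore literal '$'."""
    out = text
    # Longest codes first so '$AAA' is expanded before '$AA'.
    for code, phrase in sorted(codebook.items(), key=lambda kv: -len(kv[0])):
        out = out.replace(code, phrase.replace("$", ESCAPE))
    # Only now turn the sentinel back into a literal '$'.
    return out.replace(ESCAPE, "$")
```

In this sketch the sentinel stays in the encoded text until `decode` runs, which is what keeps a literal `$AA` in the input from being expanded as a code on the way back.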

+ chr(65 + i) + chr(65 + j))\n if len(codes) >= n:\n return codes\n # 3-letter codes if needed\n for i in range(26):\n for j in range(26):\n for k in range(26):\n codes.append('

$'
+ chr(65 + i) + chr(65 + j) + chr(65 + k))\n if len(codes) >= n:\n return codes\n return codes\n\n\ndef _tokenize_ngrams(text: str, min_n: int = 2, max_n: int = 5) -> Counter:\n \"\"\"Extract word n-grams from *text*, filtering by minimum length.\"\"\"\n counter: Counter = Counter()\n if not text:\n return counter\n words = text.split()\n for n in range(min_n, max_n + 1):\n for i in range(len(words) - n + 1):\n gram = ' '.join(words[i:i + n])\n if len(gram) >= MIN_PHRASE_LEN:\n counter[gram] += 1\n return counter\n\n\ndef _extract_ip_prefixes(texts: List[str]) -> Dict[str, int]:\n \"\"\"Find frequently occurring IP prefixes (3-octet) across *texts*.\"\"\"\n counter: Counter = Counter()\n for text in texts:\n for ip in _IP_RE.findall(text):\n parts = ip.split('.')\n prefix = '.'.join(parts[:3]) + '.'\n counter[prefix] += 1\n return {prefix: count for prefix, count in counter.items() if count >= 2}\n\n\ndef _extract_path_prefixes(texts: List[str]) -> Dict[str, int]:\n \"\"\"Find frequently occurring path prefixes (directory components) across *texts*.\"\"\"\n all_paths: List[str] = []\n for text in texts:\n for m in _PATH_RE.finditer(text):\n all_paths.append(m.group())\n \n if len(all_paths) \u003c 2:\n return {}\n \n # Extract directory prefixes at various depths\n counter: Counter = Counter()\n for path in all_paths:\n parts = path.split('/')\n # Generate prefixes of increasing length (at least 3 components)\n for depth in range(3, len(parts)):\n prefix = '/'.join(parts[:depth])\n counter[prefix] += 1\n \n return {prefix: count for prefix, count in counter.items() if count >= 2}\n\n\ndef build_codebook(\n texts: List[str],\n min_freq: int = MIN_FREQ,\n max_entries: int = MAX_CODEBOOK,\n) -> Dict[str, str]:\n \"\"\"Build a codebook from a list of text documents.\n\n Scans for high-frequency n-grams, IPs, and paths. 
Returns a dict\n mapping short codes ($XX) to the phrases they replace.\n \"\"\"\n if not texts:\n return {}\n\n # Gather candidates: n-grams + IPs + paths\n combined = Counter()\n for text in texts:\n combined.update(_tokenize_ngrams(text))\n\n # Add IPs and paths\n ip_freqs = _extract_ip_prefixes(texts)\n for ip, count in ip_freqs.items():\n if len(ip) >= MIN_PHRASE_LEN:\n combined[ip] = max(combined.get(ip, 0), count)\n\n path_freqs = _extract_path_prefixes(texts)\n for path, count in path_freqs.items():\n if len(path) >= MIN_PHRASE_LEN:\n combined[path] = max(combined.get(path, 0), count)\n\n # Filter by min_freq and sort by savings potential (freq * len)\n candidates = [\n (phrase, count)\n for phrase, count in combined.items()\n if count >= min_freq and len(phrase) >= MIN_PHRASE_LEN\n ]\n candidates.sort(key=lambda x: x[1] * len(x[0]), reverse=True)\n\n # Take top entries, avoiding overlapping phrases\n codes = _generate_codes(min(len(candidates), max_entries))\n codebook: Dict[str, str] = {}\n used_phrases: Set[str] = set()\n\n for (phrase, _count), code in zip(candidates, codes):\n # Skip if this phrase is a substring of an already-selected phrase\n skip = False\n for existing in used_phrases:\n if phrase in existing or existing in phrase:\n skip = True\n break\n if skip:\n continue\n codebook[code] = phrase\n used_phrases.add(phrase)\n if len(codebook) >= max_entries:\n break\n\n return codebook\n\n\ndef _normalize_codebook(codebook: Dict[str, str]) -> Dict[str, str]:\n \"\"\"Normalize codebook to {code: phrase} format.\n \n Accepts either {code: phrase} or {phrase: code} format.\n Detects format by checking if keys start with '

$'.\n \"\"\"\n if not codebook:\n return {}\n # Check first key to determine format\n first_key = next(iter(codebook))\n if first_key.startswith('
$'):\n return codebook # Already {code: phrase}\n else:\n # {phrase: code} -> {code: phrase}\n return {code: phrase for phrase, code in codebook.items()}\n\n\n_DOLLAR_ESCAPE = \"\\x00DLR\\x00\" # sentinel for literal '
$' in source text\n\n\ndef compress_text(text: str, codebook: Dict[str, str]) -> str:\n \"\"\"Apply codebook substitutions to *text*. Lossless.\n \n Accepts codebook in either {code: phrase} or {phrase: code} format.\n Pre-existing '
$' characters are escaped so they survive roundtrip.\n \"\"\"\n if not text or not codebook:\n return text\n normalized = _normalize_codebook(codebook)\n # Escape pre-existing '
$' to avoid collisions with codes\n result = text.replace(\"$\", _DOLLAR_ESCAPE)\n # Sort by phrase length descending to avoid partial matches\n for code, phrase in sorted(normalized.items(), key=lambda x: -len(x[1])):\n escaped_phrase = phrase.replace(\"$\", _DOLLAR_ESCAPE)\n result = result.replace(escaped_phrase, code)\n # Restore pre-existing '
$' that were not part of any codebook phrase\n result = result.replace(_DOLLAR_ESCAPE, \"$\")\n return result\n\n\ndef decompress_text(text: str, codebook: Dict[str, str]) -> str:\n \"\"\"Reverse codebook substitutions. Lossless.\n \n Accepts codebook in either {code: phrase} or {phrase: code} format.\n \"\"\"\n if not text or not codebook:\n return text\n normalized = _normalize_codebook(codebook)\n result = text\n # Sort by code length descending to handle $AAA before $AA\n for code, phrase in sorted(normalized.items(), key=lambda x: -len(x[0])):\n result = result.replace(code, phrase)\n # Unescape literal '
$'
characters\n result = result.replace(_DOLLAR_ESCAPE, \"$\")\n return result\n\n\ndef save_codebook(codebook: Dict[str, str], path: Path) -> None:\n \"\"\"Save codebook to a JSON file.\"\"\"\n path = Path(path)\n path.parent.mkdir(parents=True, exist_ok=True)\n data = {\"version\": 1, \"entries\": codebook}\n path.write_text(json.dumps(data, indent=2, ensure_ascii=False), encoding=\"utf-8\")\n\n\ndef load_codebook(path: Path) -> Dict[str, str]:\n \"\"\"Load codebook from a JSON file.\"\"\"\n path = Path(path)\n if not path.exists():\n raise FileNotFoundError(f\"Codebook not found: {path}\")\n data = json.loads(path.read_text(encoding=\"utf-8\"))\n if not isinstance(data, dict) or \"entries\" not in data:\n raise ValueError(f\"Invalid codebook format: {path}\")\n return data[\"entries\"]\n\n\ndef compression_stats(\n texts_or_original, codebook_or_compressed=None, codebook=None\n) -> Dict[str, object]:\n \"\"\"Calculate compression statistics.\n \n Can be called as:\n compression_stats(texts_dict, codebook) — where texts_dict maps filenames to content\n compression_stats(original_str, compressed_str, codebook)\n \"\"\"\n if codebook is not None:\n # 3-arg form: (original, compressed, codebook)\n original = texts_or_original\n compressed = codebook_or_compressed\n orig_len = len(original)\n comp_len = len(compressed)\n elif isinstance(texts_or_original, dict) and isinstance(codebook_or_compressed, dict):\n # 2-arg form: (texts_dict, codebook)\n codebook = codebook_or_compressed\n all_text = '\\n'.join(texts_or_original.values())\n original = all_text\n compressed = compress_text(all_text, codebook)\n orig_len = len(original)\n comp_len = len(compressed)\n else:\n return {\"original_chars\": 0, \"compressed_chars\": 0, \"gross_reduction_pct\": 0.0,\n \"codebook_entries\": 0, \"codes_used\": 0}\n\n reduction = ((orig_len - comp_len) / orig_len * 100) if orig_len else 0.0\n\n # Count how many codes are actually used in the compressed text\n normalized = 
_normalize_codebook(codebook)\n codes_used = sum(1 for code in normalized if code in compressed)\n\n # Net reduction accounts for codebook overhead\n codebook_overhead = sum(len(k) + len(v) + 2 for k, v in normalized.items()) # code: phrase + separator\n net_saved = orig_len - comp_len - codebook_overhead\n net_reduction = (net_saved / orig_len * 100) if orig_len else 0.0\n\n return {\n \"original_chars\": orig_len,\n \"compressed_chars\": comp_len,\n \"gross_reduction_pct\": round(reduction, 2),\n \"net_reduction_pct\": round(net_reduction, 2),\n \"codebook_entries\": len(codebook),\n \"codes_used\": codes_used,\n }\n","content_type":"text/x-python; charset=utf-8","language":"python","size":9918,"content_sha256":"75e7dee23c3f0c09c1c7dc20d681ae03ac2890d111cd3231c64ff27645d59817"},{"filename":"scripts/lib/engram_http.py","content":"\"\"\"\nengram_http.py — HTTP POST helper with retry logic for Engram LLM calls.\n\nPrefers httpx when available, falls back to stdlib urllib.\nPart of claw-compactor / Engram layer. 
License: MIT.\n\"\"\"\n\nfrom __future__ import annotations\n\nimport json\nimport logging\nimport time\nimport urllib.error\nimport urllib.request\nfrom typing import Optional\n\nlogger = logging.getLogger(__name__)\n\n# ---------------------------------------------------------------------------\n# Optional httpx import\n# ---------------------------------------------------------------------------\ntry:\n import httpx as _httpx\n _HTTPX_AVAILABLE = True\nexcept ImportError:\n _httpx = None # type: ignore[assignment]\n _HTTPX_AVAILABLE = False\n\n# HTTP status codes that should not be retried (client errors)\n_NO_RETRY_CODES = {400, 401, 403}\n# HTTP status codes that are transient and worth retrying\n_RETRY_CODES = {429, 500, 502, 503, 504}\n# Exception types that indicate transient network issues\n_RETRY_EXCEPTIONS = (ConnectionError, ConnectionResetError, TimeoutError,\n urllib.error.URLError)\n\n\ndef http_post(url: str, headers: dict, body: dict, max_retries: int = 3) -> dict:\n \"\"\"\n POST JSON body to *url* and return parsed JSON response.\n\n Retries on transient HTTP errors (429, 500, 502, 503, 504) and network\n exceptions using exponential back-off: 2, 4, 8 seconds between attempts.\n Non-retriable errors (400, 401, 403) are raised immediately.\n\n Args:\n url: Target URL.\n headers: HTTP headers dict.\n body: Request body (will be JSON-serialised).\n max_retries: Maximum number of retry attempts (default 3).\n\n Returns:\n Parsed JSON response dict.\n\n Raises:\n RuntimeError: On non-retriable HTTP errors or after exhausting retries.\n \"\"\"\n payload = json.dumps(body, ensure_ascii=False).encode(\"utf-8\")\n\n if _HTTPX_AVAILABLE and _httpx is not None:\n return _post_httpx(url, headers, payload, max_retries)\n\n return _post_urllib(url, headers, payload, max_retries)\n\n\ndef _post_httpx(url: str, headers: dict, payload: bytes, max_retries: int) -> dict:\n last_exc: Optional[Exception] = None\n with _httpx.Client(timeout=120.0) as client:\n for 
attempt in range(max_retries + 1):\n try:\n resp = client.post(url, headers=headers, content=payload)\n if resp.status_code in _NO_RETRY_CODES:\n raise RuntimeError(\n f\"Engram HTTP {resp.status_code} from {url}: {resp.text[:200]}\"\n )\n if resp.status_code in _RETRY_CODES and attempt \u003c max_retries:\n delay = 2 ** (attempt + 1)\n logger.warning(\n \"Engram HTTP %d, retry %d/%d in %ds…\",\n resp.status_code, attempt + 1, max_retries, delay,\n )\n time.sleep(delay)\n last_exc = RuntimeError(\n f\"Engram HTTP {resp.status_code} from {url}\"\n )\n continue\n resp.raise_for_status()\n return resp.json()\n except _RETRY_EXCEPTIONS as exc:\n last_exc = exc\n if attempt \u003c max_retries:\n delay = 2 ** (attempt + 1)\n logger.warning(\n \"Engram network error (%s), retry %d/%d in %ds…\",\n exc, attempt + 1, max_retries, delay,\n )\n time.sleep(delay)\n else:\n raise\n raise last_exc or RuntimeError(f\"Engram: max retries exceeded for {url}\")\n\n\ndef _post_urllib(url: str, headers: dict, payload: bytes, max_retries: int) -> dict:\n last_exc: Optional[Exception] = None\n for attempt in range(max_retries + 1):\n req = urllib.request.Request(url, data=payload, headers=headers, method=\"POST\")\n try:\n with urllib.request.urlopen(req, timeout=120) as resp:\n raw = resp.read().decode(\"utf-8\")\n return json.loads(raw)\n except urllib.error.HTTPError as exc:\n if exc.code in _NO_RETRY_CODES:\n body_text = exc.read().decode(\"utf-8\", errors=\"replace\")[:200]\n raise RuntimeError(\n f\"Engram HTTP {exc.code} from {url}: {body_text}\"\n ) from exc\n if exc.code in _RETRY_CODES and attempt \u003c max_retries:\n delay = 2 ** (attempt + 1)\n logger.warning(\n \"Engram HTTP %d, retry %d/%d in %ds…\",\n exc.code, attempt + 1, max_retries, delay,\n )\n time.sleep(delay)\n last_exc = exc\n continue\n body_text = exc.read().decode(\"utf-8\", errors=\"replace\")[:200]\n raise RuntimeError(\n f\"Engram HTTP {exc.code} from {url}: {body_text}\"\n ) from exc\n except 
_RETRY_EXCEPTIONS as exc:\n last_exc = exc\n if attempt \u003c max_retries:\n delay = 2 ** (attempt + 1)\n logger.warning(\n \"Engram network error (%s), retry %d/%d in %ds…\",\n exc, attempt + 1, max_retries, delay,\n )\n time.sleep(delay)\n else:\n raise\n raise last_exc or RuntimeError(f\"Engram: max retries exceeded for {url}\")\n","content_type":"text/x-python; charset=utf-8","language":"python","size":5489,"content_sha256":"6dc4d14a27b45301486205e495020d3be27f47f21a6d6187b98c24c4a9c47aee"},{"filename":"scripts/lib/engram_learner.py","content":"\"\"\"engram_learner.py — Engram v2: failure learning for claw-compactor.\n\nScans JSONL session logs, classifies error events into known failure patterns,\nand generates compression rules (with evidence thresholds) that can be exported\nfor insertion into MEMORY.md.\n\nZero required dependencies beyond the Python 3.9+ standard library.\n\nPart of claw-compactor / Engram layer. License: MIT.\n\"\"\"\nfrom __future__ import annotations\n\nimport json\nimport logging\nimport re\nfrom dataclasses import dataclass, field\nfrom pathlib import Path\nfrom typing import Iterator\n\nlogger = logging.getLogger(__name__)\n\n# ---------------------------------------------------------------------------\n# Data classes (immutable)\n# ---------------------------------------------------------------------------\n\n\n@dataclass(frozen=True)\nclass FailureEvent:\n \"\"\"A single classified failure extracted from a session log.\"\"\"\n\n pattern_name: str # key from ERROR_PATTERNS\n raw_message: str # original error text\n source_file: str # absolute path to the JSONL file\n line_number: int # 1-based line in the JSONL file\n role: str = \"unknown\" # message role if available\n timestamp: str = \"\" # ISO timestamp if available\n\n\n@dataclass(frozen=True)\nclass CompressionRule:\n \"\"\"A learnt compression rule derived from repeated failure patterns.\n\n Only generated when ``evidence_count >= 2``.\n \"\"\"\n\n pattern_name: str\n 
description: str\n evidence_count: int\n example_messages: tuple[str, ...] # up to 3 representative raw messages\n suggested_annotation: str # short text for MEMORY.md\n\n\n# ---------------------------------------------------------------------------\n# Error pattern registry\n# ---------------------------------------------------------------------------\n\n# Each entry maps a pattern name → list of regex fragments (any match = hit).\n# Patterns are compiled once at import time.\n\n_RAW_PATTERNS: dict[str, list[str]] = {\n \"FILE_NOT_FOUND\": [\n r\"No such file or directory\",\n r\"FileNotFoundError\",\n r\"ENOENT\",\n r\"cannot find.*file\",\n r\"file not found\",\n ],\n \"MODULE_NOT_FOUND\": [\n r\"ModuleNotFoundError\",\n r\"Cannot find module\",\n r\"No module named\",\n r\"ImportError.*no module\",\n r\"module not found\",\n ],\n \"PERMISSION_DENIED\": [\n r\"Permission denied\",\n r\"EACCES\",\n r\"PermissionError\",\n r\"Access is denied\",\n r\"not permitted\",\n ],\n \"TIMEOUT\": [\n r\"TimeoutError\",\n r\"timed out\",\n r\"ETIMEDOUT\",\n r\"deadline exceeded\",\n r\"operation timed out\",\n ],\n \"BUILD_FAILED\": [\n r\"Build failed\",\n r\"compilation error\",\n r\"make.*Error\",\n r\"exit code [1-9]\",\n r\"FAILED.*build\",\n ],\n \"TEST_FAILED\": [\n r\"FAILED.*test\",\n r\"AssertionError\",\n r\"test.*failed\",\n r\"pytest.*FAILED\",\n r\"FAIL.*suite\",\n ],\n \"SYNTAX_ERROR\": [\n r\"SyntaxError\",\n r\"syntax error\",\n r\"unexpected token\",\n r\"unexpected.*EOF\",\n r\"invalid syntax\",\n ],\n \"TYPE_ERROR\": [\n r\"TypeError\",\n r\"type error\",\n r\"cannot read propert\", # JS: \"Cannot read property X of undefined\"\n r\"is not a function\",\n r\"unsupported operand type\",\n ],\n \"IMPORT_ERROR\": [\n r\"ImportError\",\n r\"cannot import name\",\n r\"failed to import\",\n r\"import.*failed\",\n r\"unresolved import\",\n ],\n \"CONNECTION_ERROR\": [\n r\"ConnectionError\",\n r\"connection refused\",\n r\"ECONNREFUSED\",\n r\"network 
error\",\n r\"failed to connect\",\n ],\n \"AUTH_FAILED\": [\n r\"401 Unauthorized\",\n r\"authentication failed\",\n r\"invalid credentials\",\n r\"AuthenticationError\",\n r\"access token.*expired\",\n ],\n \"RATE_LIMITED\": [\n r\"429 Too Many Requests\",\n r\"rate limit\",\n r\"RateLimitError\",\n r\"quota exceeded\",\n r\"too many requests\",\n ],\n \"OUT_OF_MEMORY\": [\n r\"MemoryError\",\n r\"out of memory\",\n r\"OOM\",\n r\"Cannot allocate memory\",\n r\"JavaScript heap out of memory\",\n ],\n \"DISK_FULL\": [\n r\"No space left on device\",\n r\"ENOSPC\",\n r\"disk full\",\n r\"DiskFullError\",\n r\"not enough space\",\n ],\n}\n\n# Compile all patterns once\nERROR_PATTERNS: dict[str, list[re.Pattern[str]]] = {\n name: [re.compile(frag, re.IGNORECASE) for frag in frags]\n for name, frags in _RAW_PATTERNS.items()\n}\n\n# Minimum evidence count required to emit a CompressionRule\n_MIN_EVIDENCE = 2\n\n# Max example messages kept per rule\n_MAX_EXAMPLES = 3\n\n# Fields in a JSONL line that may carry error text\n_TEXT_FIELDS = (\"content\", \"text\", \"message\", \"error\", \"output\", \"stderr\", \"stdout\")\n\n# Role values that typically contain error information\n_ERROR_ROLES = {\"assistant\", \"tool\", \"system\", \"error\"}\n\n\n# ---------------------------------------------------------------------------\n# EngramLearner\n# ---------------------------------------------------------------------------\n\n\nclass EngramLearner:\n \"\"\"Learn from session failures to generate compression rules.\n\n Usage::\n\n learner = EngramLearner()\n failures = learner.scan_session(\"/path/to/session/dir\")\n rules = learner.generate_rules(failures)\n md_block = learner.export_rules(rules)\n \"\"\"\n\n # Expose pattern map as a class attribute for introspection / testing\n ERROR_PATTERNS: dict[str, list[re.Pattern[str]]] = ERROR_PATTERNS\n\n # ------------------------------------------------------------------\n # Public API\n # 
------------------------------------------------------------------\n\n def scan_session(self, session_dir: str) -> list[FailureEvent]:\n \"\"\"Scan all JSONL files in *session_dir* and return classified failures.\n\n Each line of every ``*.jsonl`` file is parsed as a JSON object. Lines\n that contain error-like text (according to ``ERROR_PATTERNS``) are\n converted to :class:`FailureEvent` instances.\n\n Args:\n session_dir: Path to the directory containing session JSONL files.\n\n Returns:\n List of :class:`FailureEvent` objects, in file/line order.\n \"\"\"\n root = Path(session_dir)\n if not root.exists():\n logger.warning(\"EngramLearner.scan_session: directory not found: %s\", session_dir)\n return []\n\n events: list[FailureEvent] = []\n for jsonl_path in sorted(root.rglob(\"*.jsonl\")):\n events.extend(self._scan_file(jsonl_path))\n\n logger.info(\n \"EngramLearner: scanned %s, found %d failure events\",\n session_dir,\n len(events),\n )\n return events\n\n def classify_failure(self, event: dict) -> str:\n \"\"\"Classify a single raw event dict against ERROR_PATTERNS.\n\n Args:\n event: Arbitrary dict (e.g. 
a parsed JSONL line).\n\n Returns:\n The first matching pattern name, or ``\"UNKNOWN\"`` if none match.\n \"\"\"\n text = _extract_text(event)\n return self._classify_text(text)\n\n def generate_rules(self, failures: list[FailureEvent]) -> list[CompressionRule]:\n \"\"\"Derive compression rules from a list of failure events.\n\n Only patterns with ``evidence_count >= 2`` produce a rule.\n\n Args:\n failures: Output of :meth:`scan_session`.\n\n Returns:\n List of :class:`CompressionRule` objects sorted by evidence_count\n descending (highest evidence first).\n \"\"\"\n # Bucket failures by pattern name\n buckets: dict[str, list[FailureEvent]] = {}\n for evt in failures:\n buckets.setdefault(evt.pattern_name, []).append(evt)\n\n rules: list[CompressionRule] = []\n for name, evts in buckets.items():\n count = len(evts)\n if count \u003c _MIN_EVIDENCE:\n logger.debug(\n \"EngramLearner: skipping rule %s (evidence=%d \u003c %d)\",\n name,\n count,\n _MIN_EVIDENCE,\n )\n continue\n\n examples = tuple(e.raw_message[:200] for e in evts[:_MAX_EXAMPLES])\n annotation = _build_annotation(name, count)\n\n rules.append(\n CompressionRule(\n pattern_name=name,\n description=_DESCRIPTIONS.get(name, name),\n evidence_count=count,\n example_messages=examples,\n suggested_annotation=annotation,\n )\n )\n\n rules.sort(key=lambda r: r.evidence_count, reverse=True)\n return rules\n\n def export_rules(self, rules: list[CompressionRule]) -> str:\n \"\"\"Format rules as a Markdown block suitable for insertion into MEMORY.md.\n\n Args:\n rules: Output of :meth:`generate_rules`.\n\n Returns:\n A formatted Markdown string. Empty string if *rules* is empty.\n \"\"\"\n if not rules:\n return \"\"\n\n lines: list[str] = [\n \"## Learnt Failure Patterns (Engram v2)\",\n \"\",\n \"Auto-generated by EngramLearner. 
Review before committing.\",\n \"\",\n ]\n\n for rule in rules:\n lines.append(f\"### {rule.pattern_name} (seen {rule.evidence_count}x)\")\n lines.append(f\"- **Description**: {rule.description}\")\n lines.append(f\"- **Annotation**: {rule.suggested_annotation}\")\n if rule.example_messages:\n lines.append(\"- **Examples**:\")\n for ex in rule.example_messages:\n # Truncate long examples for readability\n safe = ex.replace(\"\\n\", \" \")[:120]\n lines.append(f\" - `{safe}`\")\n lines.append(\"\")\n\n return \"\\n\".join(lines)\n\n # ------------------------------------------------------------------\n # Internal helpers\n # ------------------------------------------------------------------\n\n def _scan_file(self, path: Path) -> list[FailureEvent]:\n \"\"\"Parse a single JSONL file and return failure events.\"\"\"\n events: list[FailureEvent] = []\n try:\n with path.open(encoding=\"utf-8\", errors=\"replace\") as fh:\n for lineno, raw_line in enumerate(fh, start=1):\n raw_line = raw_line.strip()\n if not raw_line:\n continue\n event = self._parse_line(raw_line, str(path), lineno)\n if event is not None:\n events.append(event)\n except OSError as exc:\n logger.warning(\"EngramLearner: cannot read %s: %s\", path, exc)\n return events\n\n def _parse_line(\n self, raw_line: str, source_file: str, lineno: int\n ) -> FailureEvent | None:\n \"\"\"Parse a single JSONL line. 
Return a FailureEvent or None.\"\"\"\n try:\n obj = json.loads(raw_line)\n except json.JSONDecodeError:\n # Non-JSON line — try treating the raw text as the message\n pattern = self._classify_text(raw_line)\n if pattern == \"UNKNOWN\":\n return None\n return FailureEvent(\n pattern_name=pattern,\n raw_message=raw_line[:500],\n source_file=source_file,\n line_number=lineno,\n )\n\n if not isinstance(obj, dict):\n return None\n\n text = _extract_text(obj)\n if not text:\n return None\n\n pattern = self._classify_text(text)\n if pattern == \"UNKNOWN\":\n return None\n\n role = obj.get(\"role\", \"unknown\")\n timestamp = obj.get(\"timestamp\", obj.get(\"ts\", \"\"))\n\n return FailureEvent(\n pattern_name=pattern,\n raw_message=text[:500],\n source_file=source_file,\n line_number=lineno,\n role=str(role),\n timestamp=str(timestamp),\n )\n\n def _classify_text(self, text: str) -> str:\n \"\"\"Return the first matching ERROR_PATTERNS key, or ``\"UNKNOWN\"``.\"\"\"\n if not text:\n return \"UNKNOWN\"\n for name, compiled_patterns in ERROR_PATTERNS.items():\n for pat in compiled_patterns:\n if pat.search(text):\n return name\n return \"UNKNOWN\"\n\n\n# ---------------------------------------------------------------------------\n# Helpers\n# ---------------------------------------------------------------------------\n\ndef _extract_text(obj: dict) -> str:\n \"\"\"Pull the most informative text from a JSONL event dict.\"\"\"\n parts: list[str] = []\n for field_name in _TEXT_FIELDS:\n val = obj.get(field_name)\n if isinstance(val, str) and val.strip():\n parts.append(val.strip())\n elif isinstance(val, list):\n # List of content blocks (Anthropic-style)\n for block in val:\n if isinstance(block, dict):\n inner = block.get(\"text\", block.get(\"content\", \"\"))\n if isinstance(inner, str) and inner.strip():\n parts.append(inner.strip())\n return \" | \".join(parts)\n\n\ndef _build_annotation(pattern_name: str, count: int) -> str:\n \"\"\"Build a short MEMORY.md annotation 
for a pattern.\"\"\"\n desc = _DESCRIPTIONS.get(pattern_name, pattern_name.replace(\"_\", \" \").title())\n return (\n f\"[{pattern_name}] {desc} occurred {count} time(s) in recent sessions. \"\n \"Investigate root cause and add mitigation.\"\n )\n\n\n_DESCRIPTIONS: dict[str, str] = {\n \"FILE_NOT_FOUND\": \"A required file was missing from the expected path.\",\n \"MODULE_NOT_FOUND\": \"A Python/Node module was not installed or importable.\",\n \"PERMISSION_DENIED\": \"A file or network operation was blocked by OS permissions.\",\n \"TIMEOUT\": \"An operation exceeded its time limit.\",\n \"BUILD_FAILED\": \"The build or compilation step exited with an error.\",\n \"TEST_FAILED\": \"One or more automated tests failed.\",\n \"SYNTAX_ERROR\": \"Source code contained a syntax error.\",\n \"TYPE_ERROR\": \"A value had an unexpected or incompatible type.\",\n \"IMPORT_ERROR\": \"A module import failed at runtime.\",\n \"CONNECTION_ERROR\": \"A network connection could not be established.\",\n \"AUTH_FAILED\": \"Authentication or authorisation was rejected.\",\n \"RATE_LIMITED\": \"An API or service enforced a rate limit.\",\n \"OUT_OF_MEMORY\": \"The process ran out of available memory.\",\n \"DISK_FULL\": \"The disk or volume ran out of free space.\",\n}\n","content_type":"text/x-python; charset=utf-8","language":"python","size":14681,"content_sha256":"fe801442b2deee338d6c91eeca5ab0618d8a04777ef74d58405d0c9655c4ebae"},{"filename":"scripts/lib/engram_llm.py","content":"\"\"\"\nengram_llm.py — LLM client for Engram Observer/Reflector calls.\n\nSupports Anthropic Messages API and OpenAI-compatible chat completions.\nPart of claw-compactor / Engram layer. 
License: MIT.\n\"\"\"\n\nfrom __future__ import annotations\n\nimport logging\nfrom typing import Optional\n\nfrom claw_compactor.engram_http import http_post\n\nlogger = logging.getLogger(__name__)\n\nDEFAULT_ANTHROPIC_VERSION = \"2023-06-01\"\n\n\nclass EngramLLMClient:\n \"\"\"\n LLM client that routes calls to Anthropic or OpenAI-compatible endpoints.\n\n Args:\n model: LLM model identifier.\n max_tokens: Max tokens the LLM may produce per call.\n anthropic_api_key: Anthropic API key (empty string = disabled).\n openai_api_key: OpenAI API key (empty string = disabled).\n openai_base_url: OpenAI-compatible base URL.\n \"\"\"\n\n def __init__(\n self,\n model: str,\n max_tokens: int,\n anthropic_api_key: str = \"\",\n openai_api_key: str = \"\",\n openai_base_url: str = \"https://api.openai.com\",\n ) -> None:\n self.model = model\n self.max_tokens = max_tokens\n self.anthropic_api_key = anthropic_api_key\n self.openai_api_key = openai_api_key\n self.openai_base_url = openai_base_url\n\n def call(self, system: str, user: str) -> str:\n \"\"\"\n Call LLM API. Prefers Anthropic if key available, else OpenAI-compatible.\n\n Args:\n system: System prompt.\n user: User message content.\n\n Returns:\n Assistant response text.\n\n Raises:\n RuntimeError: If no API key is configured.\n \"\"\"\n if self.anthropic_api_key:\n return self._call_anthropic(system, user)\n if self.openai_api_key:\n return self._call_openai_compatible(system, user)\n raise RuntimeError(\n \"EngramEngine: no API key configured. 
\"\n \"Set ANTHROPIC_API_KEY or OPENAI_API_KEY environment variable.\"\n )\n\n def _call_anthropic(self, system: str, user: str) -> str:\n url = \"https://api.anthropic.com/v1/messages\"\n headers = {\n \"x-api-key\": self.anthropic_api_key,\n \"anthropic-version\": DEFAULT_ANTHROPIC_VERSION,\n \"content-type\": \"application/json\",\n }\n body = {\n \"model\": self.model,\n \"max_tokens\": self.max_tokens,\n \"system\": system,\n \"messages\": [{\"role\": \"user\", \"content\": user}],\n }\n data = http_post(url, headers, body)\n content = data.get(\"content\", [])\n for block in content:\n if block.get(\"type\") == \"text\":\n return block[\"text\"]\n raise ValueError(f\"Engram: no text content in Anthropic response: {data}\")\n\n def _call_openai_compatible(self, system: str, user: str) -> str:\n base = self.openai_base_url.rstrip(\"/\")\n url = f\"{base}/v1/chat/completions\"\n headers = {\n \"Authorization\": f\"Bearer {self.openai_api_key}\",\n \"content-type\": \"application/json\",\n }\n body = {\n \"model\": self.model,\n \"max_tokens\": self.max_tokens,\n \"messages\": [\n {\"role\": \"system\", \"content\": system},\n {\"role\": \"user\", \"content\": user},\n ],\n }\n data = http_post(url, headers, body)\n try:\n return data[\"choices\"][0][\"message\"][\"content\"]\n except (KeyError, IndexError) as exc:\n raise ValueError(\n f\"Engram: unexpected OpenAI response structure: {data}\"\n ) from exc\n","content_type":"text/x-python; charset=utf-8","language":"python","size":3628,"content_sha256":"a2394d1c3cbbe97fc65ff12622f98fa2103599ed6edcf248071668df1c406ef0"},{"filename":"scripts/lib/engram_prompts.py","content":"\"\"\"\nengram_prompts.py — System prompts for Engram Observer and Reflector agents.\n\nBoth prompts produce structured, bilingual (EN/ZH), priority-annotated\nobservation logs compatible with the claw-compactor pipeline.\n\nPart of claw-compactor / Engram layer. 
License: MIT.\n\"\"\"\n\nfrom __future__ import annotations\n\n# ---------------------------------------------------------------------------\n# Observer system prompt\n# ---------------------------------------------------------------------------\n\nOBSERVER_SYSTEM_PROMPT = \"\"\"\\\nYou are the **Observer Agent** — a specialist in compressing raw conversation \\\nhistory into a structured, high-signal observation log.\n\n## Your Mission\nTransform a batch of raw messages into a concise, structured observation log \\\nthat preserves all important information while drastically reducing token count.\n\n## Output Format\n\nProduce observations grouped by date, using the following exact structure:\n\nDate: YYYY-MM-DD\n- 🔴 HH:MM \u003ccritical observation — user goals, deadlines, blockers, key decisions>\n - 🔴 HH:MM \u003csub-point — equally critical detail>\n - 🟡 HH:MM \u003csub-point — important context>\n - 🟢 HH:MM \u003csub-point — useful but lower priority>\n- 🟡 HH:MM \u003cimportant observation — technical details, preferences, plans>\n- 🟢 HH:MM \u003cuseful observation — background info, mentions, soft context>\n\n## Priority Legend\n- 🔴 **Critical** — user goals, hard deadlines, blocking issues, key decisions, \\\nimportant user preferences\n- 🟡 **Important** — technical details, ongoing work, significant context, \\\ntool outputs summary, preferences\n- 🟢 **Useful** — background information, mentions, soft context, \\\nnon-blocking observations\n\n## Timestamp Rules\n- Use the actual timestamps from the conversation when available.\n- If no timestamp is present, use an approximate relative position (e.g., 00:01, 00:02…).\n- Each observation entry must have exactly ONE timestamp on the same line.\n\n## Three-Date Model\n- **Observation date**: The date the Observer is running (today).\n- **Referenced date**: The date the events actually occurred (from message timestamps).\n- **Relative date**: How far back the events are (e.g., \"yesterday\", \"3 days 
ago\").\nUse the referenced date in entries, not the observation date, unless they are the same.\n\n## Compression Targets\n- Plain text conversations: achieve **3–6× compression**\n- Tool call outputs / code blocks: achieve **5–40× compression** \\\n(summarise results, not raw output)\n- **Never** omit critical information (🔴); minimise 🟢 freely.\n\n## Language\n- Write observations in **both Chinese and English** when the conversation is \\\nbilingual, or match the dominant language of the conversation.\n- Technical terms, proper nouns, and code identifiers: keep in original language.\n\n## Important Rules\n1. Preserve ALL 🔴 critical items — no exceptions.\n2. Merge closely related consecutive items into one entry with sub-bullets.\n3. For tool outputs: summarise the outcome, not the raw data.\n4. For code blocks: note what the code does / its result, not the full code \\\n(unless it's a critical snippet ≤5 lines).\n5. Dates come from the messages; if ambiguous use today's date.\n6. Output ONLY the observation log — no preamble, no explanation, no markdown fences.\n\n## Example Output\nDate: 2026-03-05\n- 🔴 12:10 User is building OpenCompress project; deadline within one week / \\\n用户在构建 OpenCompress 项目,deadline 一周内\n - 🔴 12:10 Using ModernBERT-large for inference / 使用 ModernBERT-large 做推理\n - 🟡 12:12 Discussed training data annotation strategy / 讨论了训练数据标注策略\n - 🟢 12:15 Mentioned benchmark results are promising / 提到 benchmark 结果不错\n- 🟡 12:30 Switched to discussing deployment pipeline on M3 Ultra\n- 🟢 12:45 User prefers concise, structured replies\n\"\"\"\n\n# ---------------------------------------------------------------------------\n# Reflector system prompt\n# ---------------------------------------------------------------------------\n\nREFLECTOR_SYSTEM_PROMPT = \"\"\"\\\nYou are the **Reflector Agent** — a specialist in distilling and compressing \\\nan accumulated observation log into a tighter, pattern-aware reflection log.\n\n## Your Mission\nTake a large 
observation log (previously produced by the Observer Agent) and:\n1. **Merge** related entries across dates into unified threads.\n2. **Promote** recurring patterns and long-term context to the top.\n3. **Prune** outdated or superseded information.\n4. **Preserve** all 🔴 critical items — never drop them.\n\n## Output Format\n\nProduce a two-section reflection log:\n\n## Persistent Context (long-term patterns & facts)\n- 🔴 \u003cfact/pattern that spans multiple sessions or is permanently relevant>\n- 🟡 \u003crecurring theme or preference>\n- 🟢 \u003cbackground context>\n\n## Recent Events (chronological, compressed)\nDate: YYYY-MM-DD\n- 🔴 HH:MM \u003ccritical event>\n - 🟡 HH:MM \u003cimportant sub-detail>\n- 🟡 HH:MM \u003cimportant event>\n- 🟢 HH:MM \u003cuseful event>\n\nDate: YYYY-MM-DD\n...\n\n## Priority Legend (same as Observer)\n- 🔴 **Critical** — user goals, hard deadlines, blocking issues, key decisions\n- 🟡 **Important** — technical details, ongoing work, significant context\n- 🟢 **Useful** — background, mentions, soft context\n\n## Reflection Rules\n1. **Never drop 🔴 items** — consolidate if possible, never delete.\n2. **Merge related items**: If the same topic appears across multiple dates, \\\nmerge into a single \"Persistent Context\" entry with a note like \\\n\"(repeated across 3 sessions, last: 2026-03-05)\".\n3. **Mark superseded info**: If a later entry contradicts an earlier one, \\\nkeep only the latest and note it was updated.\n4. **Identify patterns**: If a user repeatedly asks about the same topic, \\\nnote it as a persistent interest.\n5. **Prune freely**: 🟢 items older than 7 days that are not referenced again \\\ncan be dropped. 🟡 items older than 30 days that are not part of a pattern \\\ncan be condensed to a one-liner.\n6. **Keep event structure**: Do NOT collapse everything into a blob summary. 
\\\nThe output must remain scannable and structured.\n\n## Compression Target\n- Achieve **2–4× compression** over the input observation log while retaining \\\nall 🔴 items and key 🟡 items.\n\n## Language\n- Match the language style of the input (bilingual if input is bilingual).\n- Technical terms, proper nouns, code identifiers: keep in original language.\n\n## Output\nOutput ONLY the reflection log — no preamble, no explanation, no markdown fences.\n\"\"\"\n\n# ---------------------------------------------------------------------------\n# User-turn templates\n# ---------------------------------------------------------------------------\n\nOBSERVER_USER_TEMPLATE = \"\"\"\\\nPlease observe and compress the following conversation messages into a \\\nstructured observation log.\n\nCurrent date/time: {current_datetime}\n\n--- MESSAGES START ---\n{messages_text}\n--- MESSAGES END ---\n\"\"\"\n\nREFLECTOR_USER_TEMPLATE = \"\"\"\\\nPlease reflect on and compress the following accumulated observation log.\n\nCurrent date/time: {current_datetime}\n\n--- OBSERVATIONS START ---\n{observations_text}\n--- OBSERVATIONS END ---\n\"\"\"\n","content_type":"text/x-python; charset=utf-8","language":"python","size":7176,"content_sha256":"bd296836393694e1ab2eb29fc5538b74b23dd4d5751aa1c4facd3f0f50238807"},{"filename":"scripts/lib/engram_storage.py","content":"\"\"\"\nengram_storage.py — File-system storage backend for Engram (Observational Memory).\n\nLayout under base_path/memory/engram/{thread_id}/:\n observations.md — append-only observation log (Observer output)\n reflections.md — latest reflection (Reflector output, overwritten each run)\n pending.jsonl — raw pending messages not yet observed (JSONL, append-only)\n meta.json — per-thread statistics and timestamps\n\nreflections.md and meta.json are written via atomic rename (tempfile + os.replace)\nto avoid partial-write corruption on crash; observations.md and pending.jsonl are\nappended to (or truncated) in place.\n\nPart of claw-compactor / Engram layer. 
License: MIT.\n\"\"\"\n\nfrom __future__ import annotations\n\nimport json\nimport os\nimport tempfile\nfrom datetime import datetime, timezone\nfrom pathlib import Path\nfrom typing import List, Optional\n\n\nclass EngramStorage:\n \"\"\"\n File-system storage for Engram's three-layer memory.\n\n Args:\n base_path: Workspace root directory. Engram data lives at\n ``{base_path}/memory/engram/{thread_id}/``.\n \"\"\"\n\n def __init__(self, base_path: Path) -> None:\n self.base_path = Path(base_path)\n\n # ------------------------------------------------------------------\n # Path helpers\n # ------------------------------------------------------------------\n\n def _thread_dir(self, thread_id: str) -> Path:\n \"\"\"Return (and create) the directory for a thread.\"\"\"\n if not thread_id or \"/\" in thread_id or \"\\\\\" in thread_id or \"..\" in thread_id:\n raise ValueError(f\"Invalid thread_id: {thread_id!r}\")\n d = self.base_path / \"memory\" / \"engram\" / thread_id\n d.mkdir(parents=True, exist_ok=True)\n return d\n\n def _obs_path(self, thread_id: str) -> Path:\n return self._thread_dir(thread_id) / \"observations.md\"\n\n def _ref_path(self, thread_id: str) -> Path:\n return self._thread_dir(thread_id) / \"reflections.md\"\n\n def _pending_path(self, thread_id: str) -> Path:\n return self._thread_dir(thread_id) / \"pending.jsonl\"\n\n def _meta_path(self, thread_id: str) -> Path:\n return self._thread_dir(thread_id) / \"meta.json\"\n\n # ------------------------------------------------------------------\n # Observations (append-only Markdown)\n # ------------------------------------------------------------------\n\n def append_observation(\n self,\n thread_id: str,\n observation: str,\n timestamp: Optional[str] = None,\n ) -> None:\n \"\"\"\n Append a new observation block to the thread's observation log.\n\n A separator header is prepended so multiple Observer runs are\n distinguishable.\n\n Args:\n thread_id: Thread identifier.\n observation: Observation 
text from the Observer LLM.\n timestamp: Optional ISO timestamp; defaults to UTC now.\n \"\"\"\n ts = timestamp or _now_utc()\n header = f\"\\n\u003c!-- observed_at: {ts} -->\\n\"\n content = header + observation.strip() + \"\\n\"\n\n path = self._obs_path(thread_id)\n with path.open(\"a\", encoding=\"utf-8\") as f:\n f.write(content)\n\n self._update_meta(thread_id, last_observed_at=ts)\n\n def read_observations(self, thread_id: str) -> str:\n \"\"\"Read the full observation log for a thread (empty string if none).\"\"\"\n path = self._obs_path(thread_id)\n if not path.exists():\n return \"\"\n return path.read_text(encoding=\"utf-8\")\n\n # ------------------------------------------------------------------\n # Reflections (overwrite each run)\n # ------------------------------------------------------------------\n\n def write_reflection(\n self,\n thread_id: str,\n reflection: str,\n timestamp: Optional[str] = None,\n ) -> None:\n \"\"\"\n Write (overwrite) the reflection for a thread using atomic rename.\n\n Args:\n thread_id: Thread identifier.\n reflection: Reflection text from the Reflector LLM.\n timestamp: Optional ISO timestamp; defaults to UTC now.\n \"\"\"\n ts = timestamp or _now_utc()\n header = f\"\u003c!-- reflected_at: {ts} -->\\n\"\n content = header + reflection.strip() + \"\\n\"\n\n path = self._ref_path(thread_id)\n _atomic_write(path, content)\n self._update_meta(thread_id, last_reflected_at=ts)\n\n def read_reflection(self, thread_id: str) -> str:\n \"\"\"Read the latest reflection for a thread (empty string if none).\"\"\"\n path = self._ref_path(thread_id)\n if not path.exists():\n return \"\"\n return path.read_text(encoding=\"utf-8\")\n\n # ------------------------------------------------------------------\n # Pending messages (JSONL, append-only)\n # ------------------------------------------------------------------\n\n def append_message(self, thread_id: str, message: dict) -> None:\n \"\"\"\n Append a raw message dict to the pending 
queue.\n\n Args:\n thread_id: Thread identifier.\n message: Dict with at least ``\"role\"`` and ``\"content\"``.\n \"\"\"\n path = self._pending_path(thread_id)\n with path.open(\"a\", encoding=\"utf-8\") as f:\n f.write(json.dumps(message, ensure_ascii=False) + \"\\n\")\n\n def read_pending(self, thread_id: str) -> List[dict]:\n \"\"\"\n Read all pending messages for a thread.\n\n Returns:\n List of message dicts in append order.\n \"\"\"\n path = self._pending_path(thread_id)\n if not path.exists():\n return []\n messages: List[dict] = []\n with path.open(\"r\", encoding=\"utf-8\") as f:\n for line in f:\n line = line.strip()\n if line:\n try:\n messages.append(json.loads(line))\n except json.JSONDecodeError:\n pass # skip corrupted lines\n return messages\n\n def clear_pending(self, thread_id: str) -> None:\n \"\"\"\n Truncate the pending queue (called after a successful observe run).\n\n Args:\n thread_id: Thread identifier.\n \"\"\"\n path = self._pending_path(thread_id)\n if path.exists():\n path.write_text(\"\", encoding=\"utf-8\")\n self._update_meta(thread_id, pending_count=0)\n\n def pending_count(self, thread_id: str) -> int:\n \"\"\"Return the number of pending messages (counts lines without parsing).\"\"\"\n path = self._pending_path(thread_id)\n if not path.exists():\n return 0\n count = 0\n with path.open(\"r\", encoding=\"utf-8\") as f:\n for line in f:\n if line.strip():\n count += 1\n return count\n\n # ------------------------------------------------------------------\n # Metadata\n # ------------------------------------------------------------------\n\n def read_meta(self, thread_id: str) -> dict:\n \"\"\"\n Read thread metadata.\n\n Returns:\n Metadata dict, or a minimal default dict if none exists.\n \"\"\"\n path = self._meta_path(thread_id)\n if not path.exists():\n return {\"thread_id\": thread_id, \"created_at\": None}\n try:\n return json.loads(path.read_text(encoding=\"utf-8\"))\n except (json.JSONDecodeError, OSError):\n return 
{\"thread_id\": thread_id}\n\n def _update_meta(self, thread_id: str, **kwargs: object) -> None:\n \"\"\"Merge *kwargs* into thread metadata and persist atomically.\"\"\"\n meta = self.read_meta(thread_id)\n if not meta.get(\"created_at\"):\n meta[\"created_at\"] = datetime.now(timezone.utc).isoformat()\n meta[\"thread_id\"] = thread_id\n meta.update(kwargs)\n meta[\"updated_at\"] = datetime.now(timezone.utc).isoformat()\n _atomic_write(\n self._meta_path(thread_id),\n json.dumps(meta, ensure_ascii=False, indent=2),\n )\n\n # ------------------------------------------------------------------\n # Thread discovery\n # ------------------------------------------------------------------\n\n def list_threads(self) -> List[str]:\n \"\"\"Return sorted list of all known thread IDs.\"\"\"\n engram_dir = self.base_path / \"memory\" / \"engram\"\n if not engram_dir.exists():\n return []\n return sorted(\n d.name\n for d in engram_dir.iterdir()\n if d.is_dir() and (d / \"meta.json\").exists()\n )\n\n\n# ---------------------------------------------------------------------------\n# Helpers\n# ---------------------------------------------------------------------------\n\ndef _now_utc() -> str:\n return datetime.now(timezone.utc).strftime(\"%Y-%m-%d %H:%M:%S UTC\")\n\n\ndef _atomic_write(path: Path, content: str) -> None:\n \"\"\"Write *content* to *path* atomically via tempfile + os.replace.\"\"\"\n dir_ = path.parent\n dir_.mkdir(parents=True, exist_ok=True)\n fd, tmp_path = tempfile.mkstemp(dir=dir_, prefix=\".tmp_engram_\")\n try:\n with os.fdopen(fd, \"w\", encoding=\"utf-8\") as f:\n f.write(content)\n os.replace(tmp_path, path)\n except Exception:\n try:\n os.unlink(tmp_path)\n except OSError:\n pass\n raise\n","content_type":"text/x-python; charset=utf-8","language":"python","size":9269,"content_sha256":"952fc5e63d91539e73f65baf095ba0e26858b2b082773c504b42b1c850dbf9de"},{"filename":"scripts/lib/engram_utils.py","content":"\"\"\"\nengram_utils.py — Utility functions for 
Engram message processing.\n\nPart of claw-compactor / Engram layer. License: MIT.\n\"\"\"\n\nfrom __future__ import annotations\n\nimport json\nfrom datetime import datetime, timezone\nfrom typing import List\n\nfrom claw_compactor.tokens import estimate_tokens\n\n\ndef now_utc() -> str:\n \"\"\"Return current UTC timestamp as a formatted string.\"\"\"\n return datetime.now(timezone.utc).strftime(\"%Y-%m-%d %H:%M:%S UTC\")\n\n\ndef count_messages_tokens(messages: List[dict]) -> int:\n \"\"\"Estimate token count for a list of message dicts.\"\"\"\n total = 0\n for msg in messages:\n content = msg.get(\"content\", \"\")\n if isinstance(content, list):\n for block in content:\n if isinstance(block, dict):\n total += estimate_tokens(block.get(\"text\", \"\"))\n total += estimate_tokens(str(block.get(\"input\", \"\")))\n else:\n total += estimate_tokens(str(content))\n total += 4 # per-message overhead\n return total\n\n\ndef messages_to_text(messages: List[dict]) -> str:\n \"\"\"Serialise a list of message dicts into a human-readable text block.\"\"\"\n lines: List[str] = []\n for i, msg in enumerate(messages):\n role = msg.get(\"role\", \"unknown\").upper()\n ts = msg.get(\"timestamp\", \"\")\n ts_str = f\" [{ts}]\" if ts else \"\"\n content = msg.get(\"content\", \"\")\n\n if isinstance(content, list):\n parts: List[str] = []\n for block in content:\n if isinstance(block, dict):\n btype = block.get(\"type\", \"\")\n if btype == \"text\":\n parts.append(block.get(\"text\", \"\"))\n elif btype == \"tool_use\":\n parts.append(\n f\"[tool_call: {block.get('name')} \"\n f\"input={json.dumps(block.get('input', {}), ensure_ascii=False)[:200]}]\"\n )\n elif btype == \"tool_result\":\n raw = block.get(\"content\", \"\")\n if isinstance(raw, list):\n raw = \" \".join(\n b.get(\"text\", \"\") for b in raw if isinstance(b, dict)\n )\n parts.append(f\"[tool_result: {str(raw)[:500]}]\")\n else:\n parts.append(str(block))\n content_str = \"\\n\".join(parts)\n else:\n content_str = 
str(content)\n\n lines.append(f\"[{i + 1}] {role}{ts_str}:\\n{content_str}\\n\")\n\n return \"\\n\".join(lines)\n","content_type":"text/x-python; charset=utf-8","language":"python","size":2633,"content_sha256":"0192c034c8808617dfaa1d3335cb63605c4736016c536e589dac696cdf30f412"},{"filename":"scripts/lib/engram.py","content":"\"\"\"\nengram.py — EngramEngine: LLM-driven Observational Memory for claw-compactor.\n\nArchitecture (Layer 6 — sits on top of the 5 deterministic layers):\n\n Layer 1 — Rule engine (compress_memory.py)\n Layer 2 — Dictionary (dictionary_compress.py)\n Layer 3 — Observation (observation_compressor.py) ← rule-based\n Layer 4 — RLE patterns (lib/rle.py)\n Layer 5 — CCP (lib/tokenizer_optimizer.py)\n ──────────────────────────────────────────────────────\n Layer 6 — Engram (THIS) ← LLM-driven, real-time\n\nEngramEngine maintains three memory layers per thread:\n • pending.jsonl — raw un-observed messages\n • observations.md — Observer-compressed event log (append-only)\n • reflections.md — Reflector-distilled long-term context\n\nTwo LLM agents run automatically when token thresholds are exceeded:\n • Observer : pending messages → structured observation log\n • Reflector : accumulated obs → compressed long-term reflection\n\nZero required dependencies: Python 3.9+.\nOptional: httpx (faster HTTP), tiktoken (exact token counts).\n\nPart of claw-compactor / Engram layer. 
License: MIT.\n\"\"\"\n\nfrom __future__ import annotations\n\nimport logging\nimport os\nfrom datetime import datetime, timezone\nfrom pathlib import Path\nfrom typing import Any, Dict, List, Optional\n\nfrom claw_compactor.tokens import estimate_tokens\nfrom claw_compactor.engram_storage import EngramStorage\nfrom claw_compactor.engram_prompts import (\n OBSERVER_SYSTEM_PROMPT,\n REFLECTOR_SYSTEM_PROMPT,\n OBSERVER_USER_TEMPLATE,\n REFLECTOR_USER_TEMPLATE,\n)\nfrom claw_compactor.engram_llm import EngramLLMClient\nfrom claw_compactor.engram_utils import (\n now_utc,\n count_messages_tokens,\n messages_to_text,\n)\n\nlogger = logging.getLogger(__name__)\n\n# ---------------------------------------------------------------------------\n# Default configuration\n# ---------------------------------------------------------------------------\n\nDEFAULT_OBSERVER_THRESHOLD = 30_000 # tokens — pending messages before observe\nDEFAULT_REFLECTOR_THRESHOLD = 40_000 # tokens — accumulated obs before reflect\nDEFAULT_MODEL_ANTHROPIC = \"claude-opus-4-5\"\nDEFAULT_MODEL_OPENAI = \"gpt-4o\"\nDEFAULT_MAX_TOKENS = 4096\n\nMAX_OBSERVER_INPUT_TOKENS = 80_000 # max tokens per Observer LLM call\nMAX_REFLECTOR_INPUT_TOKENS = 80_000 # max tokens per Reflector LLM call\n\n\n# ---------------------------------------------------------------------------\n# EngramEngine\n# ---------------------------------------------------------------------------\n\nclass EngramEngine:\n \"\"\"\n Real-time, LLM-driven Observational Memory engine.\n\n Usage::\n\n engine = EngramEngine(workspace_path=\"/path/to/workspace\")\n engine.add_message(\"thread-1\", role=\"user\", content=\"Hello!\")\n engine.add_message(\"thread-1\", role=\"assistant\", content=\"Hi!\")\n ctx_str = engine.build_system_context(\"thread-1\")\n engine.observe(\"thread-1\")\n engine.reflect(\"thread-1\")\n\n Args:\n workspace_path: Workspace root (data stored at {workspace}/memory/engram/).\n observer_threshold: Token count of pending 
messages that triggers Observer.\n reflector_threshold: Token count of accumulated observations that triggers Reflector.\n model: LLM model identifier (auto-detected per provider).\n max_tokens: Max tokens the LLM may produce per call.\n anthropic_api_key: Anthropic API key (falls back to ANTHROPIC_API_KEY env).\n openai_api_key: OpenAI API key (falls back to OPENAI_API_KEY env).\n openai_base_url: OpenAI-compatible base URL (default: official OpenAI).\n config: Raw dict to override any of the above.\n \"\"\"\n\n def __init__(\n self,\n workspace_path: str | Path,\n observer_threshold: int = DEFAULT_OBSERVER_THRESHOLD,\n reflector_threshold: int = DEFAULT_REFLECTOR_THRESHOLD,\n model: Optional[str] = None,\n max_tokens: int = DEFAULT_MAX_TOKENS,\n anthropic_api_key: Optional[str] = None,\n openai_api_key: Optional[str] = None,\n openai_base_url: Optional[str] = None,\n config: Optional[Dict[str, Any]] = None,\n ) -> None:\n cfg = config or {}\n\n self.observer_threshold = cfg.get(\"observer_threshold\", observer_threshold)\n self.reflector_threshold = cfg.get(\"reflector_threshold\", reflector_threshold)\n\n # API keys — explicit args > config dict > env vars\n _anthropic_key = (\n anthropic_api_key\n or cfg.get(\"anthropic_api_key\")\n or os.environ.get(\"ANTHROPIC_API_KEY\", \"\")\n )\n _openai_key = (\n openai_api_key\n or cfg.get(\"openai_api_key\")\n or os.environ.get(\"OPENAI_API_KEY\", \"\")\n )\n _openai_base = (\n openai_base_url\n or cfg.get(\"openai_base_url\")\n or os.environ.get(\"OPENAI_BASE_URL\", \"https://api.openai.com\")\n )\n\n # Model selection (explicit arg > config > ENGRAM_MODEL env > provider default)\n _env_model = os.environ.get(\"ENGRAM_MODEL\", \"\")\n _max_tokens = cfg.get(\"max_tokens\", max_tokens)\n if model:\n _model = model\n elif cfg.get(\"model\"):\n _model = cfg[\"model\"]\n elif _env_model:\n _model = _env_model\n elif _anthropic_key:\n _model = cfg.get(\"anthropic_model\", DEFAULT_MODEL_ANTHROPIC)\n else:\n _model = 
cfg.get(\"openai_model\", DEFAULT_MODEL_OPENAI)\n\n self.llm = EngramLLMClient(\n model=_model,\n max_tokens=_max_tokens,\n anthropic_api_key=_anthropic_key,\n openai_api_key=_openai_key,\n openai_base_url=_openai_base,\n )\n self.storage = EngramStorage(Path(workspace_path))\n\n if not _anthropic_key and not _openai_key:\n logger.warning(\n \"EngramEngine: no API key configured. \"\n \"Set ANTHROPIC_API_KEY or OPENAI_API_KEY to enable LLM compression.\"\n )\n\n # ------------------------------------------------------------------\n # Public API\n # ------------------------------------------------------------------\n\n def add_message(\n self,\n thread_id: str,\n role: str,\n content: str,\n timestamp: Optional[str] = None,\n auto_observe: bool = True,\n ) -> Dict[str, Any]:\n \"\"\"Add a message to the thread and auto-trigger observe/reflect if needed.\"\"\"\n ts = timestamp or datetime.now(timezone.utc).strftime(\"%Y-%m-%d %H:%M\")\n message = {\"role\": role, \"content\": content, \"timestamp\": ts}\n self.storage.append_message(thread_id, message)\n\n if not auto_observe:\n return {\n \"observed\": False,\n \"reflected\": False,\n \"pending_tokens\": 0,\n \"observation_tokens\": 0,\n \"error\": None,\n }\n\n return self._check_thresholds(thread_id)\n\n def _check_thresholds(self, thread_id: str) -> Dict[str, Any]:\n \"\"\"Check Observer and Reflector thresholds and trigger as needed.\"\"\"\n status: Dict[str, Any] = {\n \"observed\": False,\n \"reflected\": False,\n \"pending_tokens\": 0,\n \"observation_tokens\": 0,\n \"error\": None,\n }\n\n pending = self.storage.read_pending(thread_id)\n pending_tokens = count_messages_tokens(pending)\n status[\"pending_tokens\"] = pending_tokens\n\n if pending_tokens >= self.observer_threshold:\n logger.info(\n \"Engram: Observer triggered (thread=%s, pending_tokens=%d >= %d)\",\n thread_id, pending_tokens, self.observer_threshold,\n )\n try:\n self._run_observer(thread_id, pending)\n status[\"observed\"] = True\n except 
Exception as exc:\n logger.error(\"Engram: Observer failed: %s\", exc)\n status[\"error\"] = str(exc)\n\n obs_text = self.storage.read_observations(thread_id)\n obs_tokens = estimate_tokens(obs_text)\n status[\"observation_tokens\"] = obs_tokens\n\n if obs_tokens >= self.reflector_threshold:\n logger.info(\n \"Engram: Reflector triggered (thread=%s, obs_tokens=%d >= %d)\",\n thread_id, obs_tokens, self.reflector_threshold,\n )\n try:\n self._run_reflector(thread_id, obs_text)\n status[\"reflected\"] = True\n except Exception as exc:\n logger.error(\"Engram: Reflector failed: %s\", exc)\n if status[\"error\"]:\n status[\"error\"] += \"; \" + str(exc)\n else:\n status[\"error\"] = str(exc)\n\n return status\n\n def batch_ingest(\n self,\n thread_id: str,\n messages: List[Dict[str, Any]],\n batch_size: int = 500,\n ) -> Dict[str, Any]:\n \"\"\"Bulk-write messages then check thresholds once at the end.\"\"\"\n for msg in messages:\n self.add_message(\n thread_id,\n msg[\"role\"],\n msg[\"content\"],\n msg.get(\"timestamp\"),\n auto_observe=False,\n )\n return self._check_thresholds(thread_id)\n\n def observe(self, thread_id: str) -> Optional[str]:\n \"\"\"Manually trigger the Observer for a thread regardless of thresholds.\"\"\"\n pending = self.storage.read_pending(thread_id)\n if not pending:\n logger.info(\"Engram observe: no pending messages for thread=%s\", thread_id)\n return None\n return self._run_observer(thread_id, pending)\n\n def reflect(self, thread_id: str) -> Optional[str]:\n \"\"\"Manually trigger the Reflector for a thread regardless of thresholds.\"\"\"\n obs_text = self.storage.read_observations(thread_id)\n if not obs_text.strip():\n logger.info(\"Engram reflect: no observations for thread=%s\", thread_id)\n return None\n return self._run_reflector(thread_id, obs_text)\n\n def get_context(self, thread_id: str) -> Dict[str, Any]:\n \"\"\"Return the full three-layer memory context for a thread.\"\"\"\n observations = 
self.storage.read_observations(thread_id)\n reflection = self.storage.read_reflection(thread_id)\n recent_messages = self.storage.read_pending(thread_id)\n meta = self.storage.read_meta(thread_id)\n\n obs_tokens = estimate_tokens(observations)\n ref_tokens = estimate_tokens(reflection)\n pending_tokens = count_messages_tokens(recent_messages)\n\n return {\n \"thread_id\": thread_id,\n \"observations\": observations,\n \"reflection\": reflection,\n \"recent_messages\": recent_messages,\n \"stats\": {\n \"observation_tokens\": obs_tokens,\n \"reflection_tokens\": ref_tokens,\n \"pending_tokens\": pending_tokens,\n \"total_tokens\": obs_tokens + ref_tokens + pending_tokens,\n \"pending_count\": len(recent_messages),\n },\n \"meta\": meta,\n }\n\n def build_system_context(self, thread_id: str) -> str:\n \"\"\"Build a compact, injectable system-context string for this thread.\"\"\"\n ctx = self.get_context(thread_id)\n parts: List[str] = []\n\n if ctx[\"reflection\"].strip():\n parts.append(\"## Long-Term Memory (Reflections)\\n\" + ctx[\"reflection\"])\n\n if ctx[\"observations\"].strip():\n obs_lines = ctx[\"observations\"].splitlines()\n if len(obs_lines) > 200:\n obs_lines = obs_lines[-200:]\n parts.append(\"## Recent Observations\\n\" + \"\\n\".join(obs_lines))\n\n if not parts:\n return \"\"\n\n total = ctx[\"stats\"][\"total_tokens\"]\n parts.append(f\"\\n\u003c!-- engram_tokens: {total} -->\")\n return \"\\n\\n\".join(parts)\n\n # ------------------------------------------------------------------\n # Internal helpers\n # ------------------------------------------------------------------\n\n def _run_observer(self, thread_id: str, messages: List[dict]) -> str:\n \"\"\"Run Observer LLM, persist result, clear pending queue.\"\"\"\n total_tokens = count_messages_tokens(messages)\n\n if total_tokens \u003c= MAX_OBSERVER_INPUT_TOKENS:\n observation = self._llm_observe(messages)\n ts = now_utc()\n self.storage.append_observation(thread_id, observation, timestamp=ts)\n 
self.storage.clear_pending(thread_id)\n logger.debug(\n \"Engram: Observer done (thread=%s, chars=%d)\", thread_id, len(observation)\n )\n return observation\n\n # Batch path — split messages into chunks\n logger.info(\n \"Engram: Observer batching (thread=%s, total_tokens=%d, max=%d)\",\n thread_id, total_tokens, MAX_OBSERVER_INPUT_TOKENS,\n )\n\n all_observations: List[str] = []\n batch_start = 0\n\n while batch_start \u003c len(messages):\n batch: List[dict] = []\n batch_tokens = 0\n next_start = batch_start\n\n for i in range(batch_start, len(messages)):\n msg = messages[i]\n msg_tokens = count_messages_tokens([msg])\n if batch_tokens + msg_tokens > MAX_OBSERVER_INPUT_TOKENS and batch:\n break\n batch.append(msg)\n batch_tokens += msg_tokens\n next_start = i + 1\n\n if not batch:\n batch = [messages[batch_start]]\n next_start = batch_start + 1\n\n logger.info(\n \"Engram: Observer batch %d (thread=%s, msgs=%d, tokens=%d)\",\n len(all_observations) + 1, thread_id, len(batch), batch_tokens,\n )\n\n observation = self._llm_observe(batch)\n all_observations.append(observation)\n batch_start = next_start\n\n combined = \"\\n\\n---\\n\\n\".join(all_observations)\n ts = now_utc()\n self.storage.append_observation(thread_id, combined, timestamp=ts)\n self.storage.clear_pending(thread_id)\n\n logger.debug(\n \"Engram: Observer done (thread=%s, batches=%d, chars=%d)\",\n thread_id, len(all_observations), len(combined),\n )\n return combined\n\n def _run_reflector(self, thread_id: str, observations: str) -> str:\n \"\"\"Run Reflector LLM, persist result (overwrites previous reflection).\"\"\"\n obs_tokens = estimate_tokens(observations)\n\n if obs_tokens > MAX_REFLECTOR_INPUT_TOKENS:\n lines = observations.splitlines()\n truncated: List[str] = []\n running_tokens = 0\n for line in reversed(lines):\n line_tokens = estimate_tokens(line) + 1 # +1 for newline join token\n if running_tokens + line_tokens > MAX_REFLECTOR_INPUT_TOKENS:\n break\n truncated.append(line)\n 
running_tokens += line_tokens\n observations = \"\\n\".join(reversed(truncated))\n logger.info(\n \"Engram: Reflector input truncated (thread=%s, %d -> %d tokens)\",\n thread_id, obs_tokens, running_tokens,\n )\n\n reflection = self._llm_reflect(observations)\n ts = now_utc()\n self.storage.write_reflection(thread_id, reflection, timestamp=ts)\n logger.debug(\n \"Engram: Reflector done (thread=%s, chars=%d)\", thread_id, len(reflection)\n )\n return reflection\n\n def _llm_observe(self, messages: List[dict]) -> str:\n \"\"\"Format messages and call the Observer LLM.\"\"\"\n text = messages_to_text(messages)\n current_dt = datetime.now(timezone.utc).strftime(\"%Y-%m-%d %H:%M UTC\")\n user_content = OBSERVER_USER_TEMPLATE.format(\n current_datetime=current_dt,\n messages_text=text,\n )\n return self.llm.call(OBSERVER_SYSTEM_PROMPT, user_content)\n\n def _llm_reflect(self, observations: str) -> str:\n \"\"\"Format observations and call the Reflector LLM.\"\"\"\n current_dt = datetime.now(timezone.utc).strftime(\"%Y-%m-%d %H:%M UTC\")\n user_content = REFLECTOR_USER_TEMPLATE.format(\n current_datetime=current_dt,\n observations_text=observations,\n )\n return self.llm.call(REFLECTOR_SYSTEM_PROMPT, user_content)\n","content_type":"text/x-python; charset=utf-8","language":"python","size":16669,"content_sha256":"dd908fbd78d3e4e17547c8c8b7ff55b395d611c478de421b2ced6f7930e41f24"},{"filename":"scripts/lib/engram/http.py","content":"\"\"\"\nengram/http.py — HTTP POST helper with retry logic for LLM API calls.\n\nSupports httpx (preferred) with stdlib urllib fallback.\nRetries transient errors with exponential back-off.\n\"\"\"\n\nfrom __future__ import annotations\n\nimport json\nimport logging\nimport time\nimport urllib.request\nimport urllib.error\nfrom typing import Optional\n\nlogger = logging.getLogger(__name__)\n\n# ---------------------------------------------------------------------------\n# Optional httpx import\n# 
---------------------------------------------------------------------------\ntry:\n import httpx as _httpx\n _HTTPX_AVAILABLE = True\nexcept ImportError:\n _httpx = None # type: ignore[assignment]\n _HTTPX_AVAILABLE = False\n\n\n# HTTP status codes that should not be retried (client errors)\n_NO_RETRY_CODES = {400, 401, 403}\n# HTTP status codes that are transient and worth retrying\n_RETRY_CODES = {429, 500, 502, 503, 504}\n# Exception types that indicate transient network issues\n_RETRY_EXCEPTIONS = (ConnectionError, ConnectionResetError, TimeoutError,\n urllib.error.URLError)\n# httpx raises its own transport errors, which do not subclass the builtins above\nif _HTTPX_AVAILABLE and _httpx is not None:\n _RETRY_EXCEPTIONS = _RETRY_EXCEPTIONS + (_httpx.TransportError,)\n\n\ndef _http_post(url: str, headers: dict, body: dict, max_retries: int = 3) -> dict:\n \"\"\"\n POST JSON body to *url* and return parsed JSON response.\n\n Retries on transient HTTP errors (429, 500, 502, 503, 504) and network\n exceptions using exponential back-off: 2, 4, 8 seconds between attempts.\n Non-retriable errors (400, 401, 403) are raised immediately.\n\n Args:\n url: Target URL.\n headers: HTTP headers dict.\n body: Request body (will be JSON-serialised).\n max_retries: Maximum number of retry attempts (default 3).\n\n Returns:\n Parsed JSON response dict.\n\n Raises:\n RuntimeError: On non-retriable HTTP errors.\n Exception: Once retries are exhausted, the last HTTP or network error\n is raised; its concrete type depends on the backend in use.\n \"\"\"\n payload = json.dumps(body, ensure_ascii=False).encode(\"utf-8\")\n\n if _HTTPX_AVAILABLE and _httpx is not None:\n last_exc: Optional[Exception] = None\n with _httpx.Client(timeout=120.0) as client:\n for attempt in range(max_retries + 1):\n try:\n resp = client.post(url, headers=headers, content=payload)\n if resp.status_code in _NO_RETRY_CODES:\n raise RuntimeError(\n f\"Engram HTTP {resp.status_code} from {url}: {resp.text[:200]}\"\n )\n if resp.status_code in _RETRY_CODES and attempt \u003c max_retries:\n delay = 2 ** (attempt + 1)\n logger.warning(\n \"Engram HTTP %d, retry %d/%d in %ds…\",\n resp.status_code, attempt + 1, max_retries, delay,\n )\n time.sleep(delay)\n last_exc = RuntimeError(\n f\"Engram HTTP 
{resp.status_code} from {url}\"\n )\n continue\n resp.raise_for_status()\n return resp.json()\n except _RETRY_EXCEPTIONS as exc:\n last_exc = exc\n if attempt \u003c max_retries:\n delay = 2 ** (attempt + 1)\n logger.warning(\n \"Engram network error (%s), retry %d/%d in %ds…\",\n exc, attempt + 1, max_retries, delay,\n )\n time.sleep(delay)\n else:\n raise\n raise last_exc or RuntimeError(f\"Engram: max retries exceeded for {url}\")\n\n # Fallback: stdlib urllib\n last_exc2: Optional[Exception] = None\n for attempt in range(max_retries + 1):\n req = urllib.request.Request(url, data=payload, headers=headers, method=\"POST\")\n try:\n with urllib.request.urlopen(req, timeout=120) as resp:\n raw = resp.read().decode(\"utf-8\")\n return json.loads(raw)\n except urllib.error.HTTPError as exc:\n if exc.code in _NO_RETRY_CODES:\n body_text = exc.read().decode(\"utf-8\", errors=\"replace\")[:200]\n raise RuntimeError(\n f\"Engram HTTP {exc.code} from {url}: {body_text}\"\n ) from exc\n if exc.code in _RETRY_CODES and attempt \u003c max_retries:\n delay = 2 ** (attempt + 1)\n logger.warning(\n \"Engram HTTP %d, retry %d/%d in %ds…\",\n exc.code, attempt + 1, max_retries, delay,\n )\n time.sleep(delay)\n last_exc2 = exc\n continue\n body_text = exc.read().decode(\"utf-8\", errors=\"replace\")[:200]\n raise RuntimeError(\n f\"Engram HTTP {exc.code} from {url}: {body_text}\"\n ) from exc\n except _RETRY_EXCEPTIONS as exc:\n last_exc2 = exc\n if attempt \u003c max_retries:\n delay = 2 ** (attempt + 1)\n logger.warning(\n \"Engram network error (%s), retry %d/%d in %ds…\",\n exc, attempt + 1, max_retries, delay,\n )\n time.sleep(delay)\n else:\n raise\n raise last_exc2 or RuntimeError(f\"Engram: max retries exceeded for {url}\")\n","content_type":"text/x-python; charset=utf-8","language":"python","size":5358,"content_sha256":"5e6e0aca091abe149b93a4f0f69daaaf13c08ad3d0d6f354f51548e854a68251"},{"filename":"scripts/lib/exceptions.py","content":"\"\"\"Custom exceptions for 
claw-compactor.\n\nPart of claw-compactor. License: MIT.\n\"\"\"\n\n\nclass MemCompressError(Exception):\n \"\"\"Base exception for claw-compactor operations.\"\"\"\n pass\n\n\nclass FileNotFoundError_(MemCompressError):\n \"\"\"Raised when a required file or directory is not found.\"\"\"\n pass\n\n\nclass ParseError(MemCompressError):\n \"\"\"Raised when input cannot be parsed (malformed markdown, JSON, etc.).\"\"\"\n pass\n\n\nclass TokenEstimationError(MemCompressError):\n \"\"\"Raised when token estimation fails.\"\"\"\n pass\n","content_type":"text/x-python; charset=utf-8","language":"python","size":535,"content_sha256":"b4673698011eeac97014a52f9b3f9c55b716741fb4bcf478aa8af92dc56e1718"},{"filename":"scripts/lib/feedback.py","content":"\"\"\"FeedbackLoop: track Rewind retrieval events and auto-adjust compression rates.\n\nPart of claw-compactor. License: MIT.\n\"\"\"\nfrom __future__ import annotations\n\nimport time\nfrom collections import deque\nfrom dataclasses import dataclass\n\n# ---------------------------------------------------------------------------\n# Threshold constants\n# ---------------------------------------------------------------------------\n\n# If retrieval rate for a stage exceeds this fraction, suggest backing off.\n_HIGH_RETRIEVAL_THRESHOLD = 0.3\n\n# Default suggested compression-rate reduction when threshold is exceeded.\n_DEFAULT_REDUCTION = 0.1\n\n\n# ---------------------------------------------------------------------------\n# Data model\n# ---------------------------------------------------------------------------\n\n@dataclass(frozen=True)\nclass RetrievalEvent:\n \"\"\"Immutable record of one Rewind retrieval observation.\"\"\"\n hash_id: str\n stage_name: str # which FusionStage produced this compressed chunk\n compression_ratio: float\n was_retrieved: bool # True if the LLM actually called rewind_retrieve\n timestamp: float # monotonic seconds\n\n\n# 
---------------------------------------------------------------------------\n# FeedbackLoop\n# ---------------------------------------------------------------------------\n\nclass FeedbackLoop:\n \"\"\"Track Rewind retrieval events and auto-adjust compression rates.\n\n Maintains a sliding window of the last *window_size* events. When the\n retrieval rate for a stage exceeds the high-retrieval threshold (0.3) the\n loop recommends reducing that stage's compression rate by\n ``_DEFAULT_REDUCTION`` (0.1, i.e. 10 percentage points) scaled by\n ``retrieval_rate / 0.3``, so the suggested reduction always exceeds 0.1.\n \"\"\"\n\n def __init__(self, window_size: int = 100) -> None:\n if window_size \u003c 1:\n raise ValueError(f\"window_size must be >= 1, got {window_size}\")\n self._window_size = window_size\n self._events: deque[RetrievalEvent] = deque(maxlen=window_size)\n\n # ------------------------------------------------------------------\n # Public API\n # ------------------------------------------------------------------\n\n def record(self, event: RetrievalEvent) -> None:\n \"\"\"Append *event* to the sliding window.\n\n Once the window is full, the oldest event is automatically evicted.\n \"\"\"\n self._events.append(event)\n\n def retrieval_rate(self, stage_name: str | None = None) -> float:\n \"\"\"Return the fraction of events where ``was_retrieved`` is True.\n\n Args:\n stage_name: When provided, only events for that stage are\n considered. When ``None``, all events are included.\n\n Returns:\n A float in [0.0, 1.0]. 
Returns 0.0 when there are no matching\n events (avoids division-by-zero).\n \"\"\"\n events = self._filter(stage_name)\n if not events:\n return 0.0\n retrieved_count = sum(1 for e in events if e.was_retrieved)\n return retrieved_count / len(events)\n\n def suggest_adjustments(self) -> dict[str, float]:\n \"\"\"Return per-stage suggested compression-rate reductions.\n\n For each stage whose retrieval rate exceeds the high-retrieval\n threshold (0.3), the suggested reduction is:\n\n suggested_reduction = _DEFAULT_REDUCTION\n * (retrieval_rate / _HIGH_RETRIEVAL_THRESHOLD)\n\n Stages below the threshold are omitted from the result dict.\n\n Returns:\n ``{stage_name: reduction_amount}`` where the reduction is a\n positive float (e.g. 0.1 means \"reduce compression rate by 10%\").\n \"\"\"\n stage_names = {e.stage_name for e in self._events}\n adjustments: dict[str, float] = {}\n for stage in stage_names:\n rate = self.retrieval_rate(stage)\n if rate > _HIGH_RETRIEVAL_THRESHOLD:\n # Scale reduction proportionally to how far over threshold we are\n reduction = _DEFAULT_REDUCTION * (rate / _HIGH_RETRIEVAL_THRESHOLD)\n adjustments[stage] = round(reduction, 6)\n return adjustments\n\n def export_stats(self) -> dict:\n \"\"\"Return summary statistics for monitoring / dashboards.\"\"\"\n stage_names = sorted({e.stage_name for e in self._events})\n per_stage: dict[str, dict] = {}\n for stage in stage_names:\n events = self._filter(stage)\n retrieved = sum(1 for e in events if e.was_retrieved)\n ratios = [e.compression_ratio for e in events]\n per_stage[stage] = {\n \"event_count\": len(events),\n \"retrieved_count\": retrieved,\n \"retrieval_rate\": retrieved / len(events) if events else 0.0,\n \"avg_compression_ratio\": (\n sum(ratios) / len(ratios) if ratios else 0.0\n ),\n }\n\n total_events = len(self._events)\n total_retrieved = sum(1 for e in self._events if e.was_retrieved)\n\n return {\n \"window_size\": self._window_size,\n \"total_events\": total_events,\n 
\"total_retrieved\": total_retrieved,\n \"overall_retrieval_rate\": (\n total_retrieved / total_events if total_events else 0.0\n ),\n \"per_stage\": per_stage,\n \"adjustments\": self.suggest_adjustments(),\n }\n\n # ------------------------------------------------------------------\n # Internal helpers\n # ------------------------------------------------------------------\n\n def _filter(self, stage_name: str | None) -> list[RetrievalEvent]:\n \"\"\"Return events matching *stage_name*, or all events when None.\"\"\"\n if stage_name is None:\n return list(self._events)\n return [e for e in self._events if e.stage_name == stage_name]\n","content_type":"text/x-python; charset=utf-8","language":"python","size":5876,"content_sha256":"fa0a2f3174cc08cb8e29da455054466cdd6efedb48450a089433956a7e18c2c0"},{"filename":"scripts/lib/fusion/__init__.py","content":"\"\"\"Fusion Pipeline — 14-stage LLM token compression framework.\n\nStages (execution order):\n QuantumLock(3) KV-cache alignment for system prompts\n Cortex(5) Content type + language auto-detection\n Photon(8) Base64 image compression\n RLE(10) Path/IP/enum shorthand encoding\n SemanticDedup(12) SimHash near-duplicate block elimination\n Ionizer(15) JSON array statistical sampling (reversible)\n LogCrunch(16) Build/test log line folding\n SearchCrunch(17) Search result deduplication\n DiffCrunch(18) Git diff context folding\n StructuralCollapse(20) Import merging + repeated pattern collapse\n Neurosyntax(25) AST-aware code compression (tree-sitter)\n Nexus(35) ML token-level compression\n TokenOpt(40) Tokenizer format optimization\n Abbrev(45) Natural language abbreviation (text only)\n\nCore abstractions:\n FusionContext Immutable input snapshot flowing through the pipeline\n FusionResult Immutable output from a single stage\n FusionStage Abstract base: should_apply() + apply()\n FusionPipeline Ordered chain with timing and metrics\n FusionEngine Unified entry point (see engine.py)\n\nPart of claw-compactor v7. 
License: MIT.\n\"\"\"\nfrom claw_compactor.fusion.base import FusionStage, FusionContext, FusionResult\nfrom claw_compactor.fusion.pipeline import FusionPipeline, FusionPipelineResult\n\n__all__ = [\n \"FusionStage\",\n \"FusionPipeline\",\n \"FusionContext\",\n \"FusionResult\",\n \"FusionPipelineResult\",\n]\n\n# v8: Conversation-level compaction (inspired by Claude Code architecture)\nfrom claw_compactor.fusion.tool_result_budget import budget_tool_results\nfrom claw_compactor.fusion.conversation_summarizer import summarize_conversation\nfrom claw_compactor.fusion.tiered_compaction import (\n CompactionLevel,\n CircuitBreaker,\n FileAccessTracker,\n compact,\n determine_level,\n)\n\n__all__ += [\n \"budget_tool_results\",\n \"summarize_conversation\",\n \"CompactionLevel\",\n \"CircuitBreaker\",\n \"FileAccessTracker\",\n \"compact\",\n \"determine_level\",\n]\n\n# v8.1: Additional Claude Code-inspired features\nfrom claw_compactor.fusion.llm_summarizer import LLMSummarizer\nfrom claw_compactor.fusion.plan_reinjection import PlanTaskTracker\nfrom claw_compactor.fusion.skill_reinjection import SkillSchemaTracker\nfrom claw_compactor.fusion.compact_hooks import HookRegistry, HookPhase\nfrom claw_compactor.fusion.content_stripper import strip_images_and_docs\nfrom claw_compactor.fusion.cache_prefix import CachePrefixManager\n\n__all__ += [\n \"LLMSummarizer\",\n \"PlanTaskTracker\",\n \"SkillSchemaTracker\",\n \"HookRegistry\",\n \"HookPhase\",\n \"strip_images_and_docs\",\n \"CachePrefixManager\",\n]\n","content_type":"text/x-python; charset=utf-8","language":"python","size":2807,"content_sha256":"4e303de89b9b681bf8c186825abf30b323cbc65479204e901f5d3e100da339de"},{"filename":"scripts/lib/fusion/base.py","content":"\"\"\"Fusion stage base classes for the Claw Compactor pipeline.\n\nThis module defines the three core abstractions that all 14 pipeline stages\nbuild upon:\n\n FusionContext Immutable snapshot of the text being compressed, along with\n detected content 
type, language, role, and metadata. Flows\n forward through the pipeline — each stage receives the\n previous stage's output as a new FusionContext.\n\n FusionResult Immutable output from a single stage: the compressed text,\n token counts, timing, Rewind markers, and optional context\n overrides for downstream stages.\n\n FusionStage Abstract base class. Subclasses implement should_apply()\n (gating) and apply() (compression). The pipeline calls\n timed_apply() which wraps both with timing and skip logic.\n\nDesign invariants:\n - All dataclasses are frozen — no mutation after construction.\n - Stages are stateless functions of (FusionContext -> FusionResult).\n - Stage ordering is declarative (the ``order`` class attribute) and\n resolved by FusionPipeline at construction time.\n\nPart of claw-compactor v7. License: MIT.\n\"\"\"\nfrom __future__ import annotations\nimport time\nfrom abc import ABC, abstractmethod\nfrom dataclasses import dataclass, field, replace\nfrom typing import Any\n\n\n@dataclass(frozen=True)\nclass FusionContext:\n \"\"\"Immutable context passed through the fusion pipeline.\"\"\"\n content: str\n content_type: str = \"text\" # text|code|json|log|diff|search\n language: str | None = None\n role: str = \"user\" # system|user|assistant|tool\n model: str | None = None\n token_budget: int | None = None\n query: str | None = None\n metadata: dict = field(default_factory=dict)\n\n def evolve(self, **kwargs) -> FusionContext:\n \"\"\"Return a new context with specified fields replaced.\"\"\"\n return replace(self, **kwargs)\n\n\n@dataclass(frozen=True)\nclass FusionResult:\n \"\"\"Immutable result from a single fusion stage.\"\"\"\n content: str\n original_tokens: int = 0\n compressed_tokens: int = 0\n markers: list[str] = field(default_factory=list)\n warnings: list[str] = field(default_factory=list)\n timing_ms: float = 0.0\n skipped: bool = False\n # Optional overrides applied to FusionContext after this stage runs.\n # Keys must match 
FusionContext field names (e.g. content_type, language).\n context_updates: dict[str, Any] = field(default_factory=dict)\n\n\nclass FusionStage(ABC):\n \"\"\"Base class for all compression fusion stages.\"\"\"\n name: str = \"unnamed\"\n order: int = 50 # execution order (lower = earlier)\n\n @abstractmethod\n def should_apply(self, ctx: FusionContext) -> bool:\n \"\"\"Return True if this fusion stage should run on the given context.\"\"\"\n ...\n\n @abstractmethod\n def apply(self, ctx: FusionContext) -> FusionResult:\n \"\"\"Apply the fusion stage and return the result.\"\"\"\n ...\n\n def timed_apply(self, ctx: FusionContext) -> FusionResult:\n \"\"\"Apply with timing. Used by FusionPipeline.\"\"\"\n if not self.should_apply(ctx):\n return FusionResult(content=ctx.content, skipped=True)\n start = time.monotonic()\n result = self.apply(ctx)\n elapsed = (time.monotonic() - start) * 1000\n return FusionResult(\n content=result.content,\n original_tokens=result.original_tokens,\n compressed_tokens=result.compressed_tokens,\n markers=result.markers,\n warnings=result.warnings,\n timing_ms=elapsed,\n skipped=False,\n context_updates=result.context_updates,\n )\n","content_type":"text/x-python; charset=utf-8","language":"python","size":3723,"content_sha256":"92959b686f5a42b16ffae9a14a86272f41ba69869b268e91c9b37939e57990fd"},{"filename":"scripts/lib/fusion/cache_prefix.py","content":"\"\"\"CachePrefix — prompt cache prefix management for compaction.\n\nInspired by Claude Code's cache prefix reuse: the compaction loop and the\nmain conversation loop share a common prompt cache prefix. 
This avoids\nre-processing the system prompt and early conversation turns that remain\nunchanged after compaction.\n\nThe cache prefix is the longest common prefix of system messages and\nearly conversation turns that hasn't changed between compaction rounds.\n\nUsage::\n\n from claw_compactor.fusion.cache_prefix import CachePrefixManager\n\n manager = CachePrefixManager()\n prefix_info = manager.compute_prefix(messages)\n # Use prefix_info['prefix_hash'] and prefix_info['prefix_tokens']\n # to enable API-level prompt caching.\n\nPart of claw-compactor v8. License: MIT.\n\"\"\"\nfrom __future__ import annotations\n\nimport hashlib\nimport json\nfrom typing import Any, Optional\n\nfrom claw_compactor.tokens import estimate_tokens\n\n\n# Maximum prefix length in tokens.\nMAX_PREFIX_TOKENS = 50_000\n\n\nclass CachePrefixManager:\n \"\"\"Manages prompt cache prefix computation and reuse.\n\n Tracks the stable prefix of a conversation — system messages and early\n turns that don't change between compaction rounds — so that API-level\n prompt caching can skip re-processing them.\n \"\"\"\n\n def __init__(self) -> None:\n self._last_prefix_hash: Optional[str] = None\n self._last_prefix_length: int = 0\n self._cache_hits: int = 0\n self._cache_misses: int = 0\n\n def compute_prefix(\n self,\n messages: list[dict[str, Any]],\n max_tokens: int = MAX_PREFIX_TOKENS,\n ) -> dict[str, Any]:\n \"\"\"Compute the cacheable prefix of a message list.\n\n The prefix is the leading run of system messages. Collection stops\n at the first non-system message, or earlier if adding the next\n system message would push the total past max_tokens.\n\n Parameters\n ----------\n messages:\n OpenAI-format message list.\n max_tokens:\n Maximum tokens for the prefix.\n\n Returns\n -------\n dict with:\n prefix_messages — the messages that form the prefix\n prefix_tokens — total tokens in the prefix\n prefix_hash — hash of the prefix content (for cache key)\n prefix_length — number of messages in the prefix\n cache_hit 
— whether the prefix matches the last computation\n \"\"\"\n prefix_messages: list[dict[str, Any]] = []\n total_tokens = 0\n\n for msg in messages:\n role = msg.get(\"role\", \"\")\n content = msg.get(\"content\", \"\")\n\n # Include system messages and compact_boundary messages.\n if role == \"system\":\n msg_tokens = estimate_tokens(\n content if isinstance(content, str) else str(content)\n )\n if total_tokens + msg_tokens > max_tokens:\n break\n prefix_messages.append(msg)\n total_tokens += msg_tokens\n else:\n # Stop at first non-system message (user/assistant/tool).\n break\n\n # Compute hash for cache key.\n prefix_content = json.dumps(\n [\n {\"role\": m.get(\"role\"), \"content\": m.get(\"content\")}\n for m in prefix_messages\n ],\n sort_keys=True,\n ensure_ascii=False,\n )\n prefix_hash = hashlib.sha256(prefix_content.encode(\"utf-8\")).hexdigest()[:16]\n\n # Check cache hit.\n cache_hit = prefix_hash == self._last_prefix_hash\n if cache_hit:\n self._cache_hits += 1\n else:\n self._cache_misses += 1\n\n self._last_prefix_hash = prefix_hash\n self._last_prefix_length = len(prefix_messages)\n\n return {\n \"prefix_messages\": prefix_messages,\n \"prefix_tokens\": total_tokens,\n \"prefix_hash\": prefix_hash,\n \"prefix_length\": len(prefix_messages),\n \"cache_hit\": cache_hit,\n }\n\n def annotate_messages_for_caching(\n self,\n messages: list[dict[str, Any]],\n max_tokens: int = MAX_PREFIX_TOKENS,\n ) -> list[dict[str, Any]]:\n \"\"\"Annotate messages with cache_control markers for API caching.\n\n Adds ``cache_control: {\"type\": \"ephemeral\"}`` to the last message\n in the stable prefix, following the Anthropic prompt caching format.\n\n Parameters\n ----------\n messages:\n OpenAI-format message list.\n max_tokens:\n Maximum tokens for the prefix.\n\n Returns\n -------\n New message list with cache_control annotations.\n \"\"\"\n prefix_info = self.compute_prefix(messages, max_tokens)\n prefix_length = prefix_info[\"prefix_length\"]\n\n if 
prefix_length == 0:\n return list(messages)\n\n result = list(messages)\n # Mark the last prefix message with cache_control.\n last_prefix_idx = prefix_length - 1\n msg = dict(result[last_prefix_idx])\n\n # Handle multipart content.\n content = msg.get(\"content\")\n if isinstance(content, str):\n msg[\"content\"] = [\n {\n \"type\": \"text\",\n \"text\": content,\n \"cache_control\": {\"type\": \"ephemeral\"},\n }\n ]\n elif isinstance(content, list):\n # Add cache_control to the last text block.\n new_content = list(content)\n for i in range(len(new_content) - 1, -1, -1):\n if isinstance(new_content[i], dict) and new_content[i].get(\"type\") == \"text\":\n new_content[i] = {\n **new_content[i],\n \"cache_control\": {\"type\": \"ephemeral\"},\n }\n break\n msg[\"content\"] = new_content\n\n result[last_prefix_idx] = msg\n return result\n\n @property\n def stats(self) -> dict[str, Any]:\n \"\"\"Return cache statistics.\"\"\"\n total = self._cache_hits + self._cache_misses\n return {\n \"cache_hits\": self._cache_hits,\n \"cache_misses\": self._cache_misses,\n \"hit_rate\": round(self._cache_hits / total, 3) if total > 0 else 0.0,\n \"last_prefix_hash\": self._last_prefix_hash,\n \"last_prefix_length\": self._last_prefix_length,\n }\n\n def reset(self) -> None:\n \"\"\"Reset cache state.\"\"\"\n self._last_prefix_hash = None\n self._last_prefix_length = 0\n self._cache_hits = 0\n self._cache_misses = 0\n","content_type":"text/x-python; charset=utf-8","language":"python","size":6672,"content_sha256":"5f4bd6a1f4695e927b63574af5bb89e41eb53cae137ccaccfb06dcbafdc0a504"},{"filename":"scripts/lib/fusion/compact_hooks.py","content":"\"\"\"CompactHooks — plugin hooks for pre/post compaction.\n\nInspired by Claude Code's plugin architecture: allows external code to\nregister callbacks that run before and after compaction. 
Use cases:\n - Inject custom context before compaction (e.g., environment state)\n - Log or audit compaction events\n - Modify messages before they enter the compaction pipeline\n - Post-process compacted messages (e.g., add watermarks)\n\nUsage::\n\n from claw_compactor.fusion.compact_hooks import HookRegistry, HookPhase\n\n registry = HookRegistry()\n\n @registry.register(HookPhase.PRE_COMPACT)\n def inject_env_context(messages, **kwargs):\n messages.append({\"role\": \"system\", \"content\": \"ENV: production\"})\n return messages\n\n @registry.register(HookPhase.POST_COMPACT)\n def log_compaction(messages, **kwargs):\n print(f\"Compacted to {len(messages)} messages\")\n return messages\n\nPart of claw-compactor v8. License: MIT.\n\"\"\"\nfrom __future__ import annotations\n\nimport logging\nimport time\nfrom enum import Enum\nfrom typing import Any, Callable, Optional\n\nlogger = logging.getLogger(__name__)\n\n\nclass HookPhase(Enum):\n \"\"\"Phases where hooks can be registered.\"\"\"\n PRE_COMPACT = \"pre_compact\" # Before any compaction\n POST_COMPACT = \"post_compact\" # After all compaction\n PRE_SUMMARIZE = \"pre_summarize\" # Before conversation summarization\n POST_SUMMARIZE = \"post_summarize\" # After conversation summarization\n PRE_BUDGET = \"pre_budget\" # Before tool result budgeting\n POST_BUDGET = \"post_budget\" # After tool result budgeting\n\n\n# Type for hook callbacks.\n# Signature: (messages, **kwargs) -> messages\nHookCallback = Callable[..., list[dict[str, Any]]]\n\n\nclass HookRegistry:\n \"\"\"Registry for compaction lifecycle hooks.\n\n Hooks are called in registration order within each phase. 
Each hook\n receives the current message list and must return a (possibly modified)\n message list.\n\n If a hook raises an exception, it is logged and skipped (fail-open).\n \"\"\"\n\n def __init__(self) -> None:\n self._hooks: dict[HookPhase, list[tuple[str, HookCallback]]] = {\n phase: [] for phase in HookPhase\n }\n self._stats: dict[str, Any] = {\n \"hooks_registered\": 0,\n \"hooks_called\": 0,\n \"hooks_failed\": 0,\n }\n\n def register(\n self,\n phase: HookPhase,\n name: Optional[str] = None,\n ) -> Callable[[HookCallback], HookCallback]:\n \"\"\"Decorator to register a hook for a specific phase.\n\n Parameters\n ----------\n phase:\n The compaction phase to hook into.\n name:\n Optional human-readable name for the hook (for logging).\n\n Returns\n -------\n Decorator that registers the function and returns it unchanged.\n \"\"\"\n def decorator(func: HookCallback) -> HookCallback:\n hook_name = name or func.__name__\n self._hooks[phase].append((hook_name, func))\n self._stats[\"hooks_registered\"] += 1\n return func\n return decorator\n\n def add_hook(\n self,\n phase: HookPhase,\n callback: HookCallback,\n name: Optional[str] = None,\n ) -> None:\n \"\"\"Imperatively add a hook (non-decorator API).\"\"\"\n hook_name = name or getattr(callback, \"__name__\", \"anonymous\")\n self._hooks[phase].append((hook_name, callback))\n self._stats[\"hooks_registered\"] += 1\n\n def remove_hook(self, phase: HookPhase, name: str) -> bool:\n \"\"\"Remove a hook by name. 
Returns True if found and removed.\"\"\"\n hooks = self._hooks[phase]\n for i, (hook_name, _) in enumerate(hooks):\n if hook_name == name:\n hooks.pop(i)\n return True\n return False\n\n def run_hooks(\n self,\n phase: HookPhase,\n messages: list[dict[str, Any]],\n **kwargs: Any,\n ) -> tuple[list[dict[str, Any]], dict[str, Any]]:\n \"\"\"Run all hooks for a phase, passing messages through each.\n\n Parameters\n ----------\n phase:\n The compaction phase.\n messages:\n Current message list.\n **kwargs:\n Additional context (e.g., token_budget, level, stats).\n\n Returns\n -------\n (modified_messages, hook_stats)\n \"\"\"\n hooks = self._hooks.get(phase, [])\n if not hooks:\n return messages, {\"phase\": phase.value, \"hooks_run\": 0}\n\n hook_stats: dict[str, Any] = {\n \"phase\": phase.value,\n \"hooks_run\": 0,\n \"hooks_failed\": 0,\n \"details\": [],\n }\n\n current_messages = messages\n for hook_name, callback in hooks:\n t0 = time.monotonic()\n try:\n result = callback(current_messages, **kwargs)\n if isinstance(result, list):\n current_messages = result\n elapsed_ms = (time.monotonic() - t0) * 1000\n hook_stats[\"details\"].append({\n \"name\": hook_name,\n \"success\": True,\n \"timing_ms\": round(elapsed_ms, 2),\n })\n hook_stats[\"hooks_run\"] += 1\n self._stats[\"hooks_called\"] += 1\n except Exception as exc:\n elapsed_ms = (time.monotonic() - t0) * 1000\n logger.warning(\n \"Hook %s (%s) failed: %s\",\n hook_name, phase.value, exc,\n )\n hook_stats[\"details\"].append({\n \"name\": hook_name,\n \"success\": False,\n \"error\": str(exc),\n \"timing_ms\": round(elapsed_ms, 2),\n })\n hook_stats[\"hooks_failed\"] += 1\n self._stats[\"hooks_failed\"] += 1\n # Fail-open: continue with unmodified messages.\n\n return current_messages, hook_stats\n\n def has_hooks(self, phase: HookPhase) -> bool:\n \"\"\"Check if any hooks are registered for a phase.\"\"\"\n return bool(self._hooks.get(phase))\n\n def list_hooks(self, phase: Optional[HookPhase] = None) -> 
dict[str, list[str]]:\n \"\"\"List registered hook names, optionally filtered by phase.\"\"\"\n if phase is not None:\n return {phase.value: [name for name, _ in self._hooks.get(phase, [])]}\n return {\n p.value: [name for name, _ in hooks]\n for p, hooks in self._hooks.items()\n if hooks\n }\n\n @property\n def stats(self) -> dict[str, Any]:\n \"\"\"Return hook execution statistics.\"\"\"\n return dict(self._stats)\n\n def clear(self) -> None:\n \"\"\"Remove all registered hooks.\"\"\"\n for phase in HookPhase:\n self._hooks[phase] = []\n self._stats = {\n \"hooks_registered\": 0,\n \"hooks_called\": 0,\n \"hooks_failed\": 0,\n }\n","content_type":"text/x-python; charset=utf-8","language":"python","size":6986,"content_sha256":"87bb11861ee8ee56a9334e69e473252de04ac0ffdfeffdc5f3e5082cd0c8377a"},{"filename":"scripts/lib/fusion/content_detector.py","content":"\"\"\"Rule-based content type detector for the Fusion Pipeline Cortex.\n\nDetection priority (highest confidence first):\n 1. Markdown code fences → code + language (0.95)\n 2. Diff headers → diff (0.95)\n 3. JSON parse → json (0.90)\n 4. Shebang line → code + language (0.90)\n 5. Log line density → log (0.80)\n 6. Search result density → search (0.80)\n 7. Code keyword density → code (0.70)\n 8. Fallback → text (0.50)\n\nPart of claw-compactor. 
License: MIT.\n\"\"\"\nfrom __future__ import annotations\n\nimport json\nimport re\nfrom dataclasses import dataclass\n\n\n# ---------------------------------------------------------------------------\n# Public types\n# ---------------------------------------------------------------------------\n\n@dataclass(frozen=True)\nclass DetectionResult:\n content_type: str # text | code | json | log | diff | search\n language: str | None\n confidence: float # 0.0 – 1.0\n\n\n@dataclass(frozen=True)\nclass Section:\n content: str\n content_type: str\n language: str | None\n start_line: int\n end_line: int\n\n\n# ---------------------------------------------------------------------------\n# Regex constants\n# ---------------------------------------------------------------------------\n\n# Code fence: ```lang or ~~~lang (lang optional)\n_FENCE_OPEN = re.compile(r\"^(`{3,}|~{3,})([\\w+-]*)$\", re.MULTILINE)\n_FENCE_CLOSE_BACKTICK = re.compile(r\"^`{3,}\\s*$\", re.MULTILINE)\n_FENCE_CLOSE_TILDE = re.compile(r\"^~{3,}\\s*$\", re.MULTILINE)\n\n# Diff\n_DIFF_HEADER = re.compile(r\"^(--- a/|\\+\\+\\+ b/|@@ .* @@)\", re.MULTILINE)\n\n# JSON first char\n_JSON_START = re.compile(r\"^\\s*[\\[{]\")\n\n# Shebang\n_SHEBANG = re.compile(r\"^#!\")\n\n# Log line: leading timestamp + log level keyword\n_LOG_LINE = re.compile(\n r\"(?:\"\n r\"\\d{4}-\\d{2}-\\d{2}[T ]\\d{2}:\\d{2}:\\d{2}\" # ISO timestamp\n r\"|\"\n r\"\\[?\\d{2}[:/]\\d{2}[:/]\\d{2}\\]?\" # HH:MM:SS\n r\")\"\n r\".{0,40}\"\n r\"\\b(?:INFO|WARN(?:ING)?|ERROR|DEBUG|FATAL|TRACE|CRITICAL)\\b\",\n re.IGNORECASE,\n)\n\n# Search result: path:lineno: content (grep/rg style)\n_SEARCH_LINE = re.compile(r\"^[^\\s:][^:]*:\\d+[:\\s]\")\n\n# Code keywords (per-line density check)\n_CODE_KEYWORDS = re.compile(\n r\"\\b(?:import|from|def |class |function |const |let |var |return|if |else |\"\n r\"for |while |switch |case |elif |endif|public |private |protected |\"\n r\"static |void |int |str |bool |fn |func |package |use )\\b\"\n)\n\n# Language 
fingerprints for content-based detection (no fence)\n_LANG_PATTERNS: list[tuple[str, re.Pattern[str]]] = [\n (\"python\", re.compile(r\"\\bdef \\w+\\(|^from \\w+ import |^import \\w|class \\w+\\s*:\", re.MULTILINE)),\n (\"go\", re.compile(r\"^package \\w|^func \\w+\\(|^import \\(\", re.MULTILINE)),\n (\"rust\", re.compile(r\"\\bfn \\w+\\(|let mut |^impl |^use \\w\", re.MULTILINE)),\n (\"java\", re.compile(r\"\\bpublic class |\\bprivate |\\bprotected |\\bpublic static void main\\b\")),\n (\"typescript\", re.compile(r\"\\b(const|let|var)\\b\\s+\\w+\\s*:\\s*\\w+|interface \\w+\\s*\\{|export type |:\\s*(string|number|boolean|any|void|never)\\b\")),\n (\"javascript\", re.compile(r\"\\b(const|let|var)\\b|\\bfunction\\b|\\b=>\\b|\\bexport\\b|\\brequire\\s*\\(\")),\n (\"css\", re.compile(r\"^\\s*[\\w#.:\\[*][^{]*\\{\\s*$\", re.MULTILINE)),\n (\"html\", re.compile(r\"\u003c(!DOCTYPE|html|head|body|div|span|p|a)\\b\", re.IGNORECASE)),\n (\"sql\", re.compile(r\"\\b(SELECT|INSERT|UPDATE|DELETE|CREATE|DROP|ALTER|FROM|WHERE)\\b\", re.IGNORECASE)),\n (\"yaml\", re.compile(r\"^\\w[\\w\\s]*:\\s*\\S\", re.MULTILINE)),\n]\n\n# Fence language aliases → canonical name\n_FENCE_LANG_MAP: dict[str, str] = {\n \"py\": \"python\",\n \"python\": \"python\",\n \"python3\": \"python\",\n \"js\": \"javascript\",\n \"javascript\": \"javascript\",\n \"jsx\": \"javascript\",\n \"ts\": \"typescript\",\n \"typescript\": \"typescript\",\n \"tsx\": \"typescript\",\n \"java\": \"java\",\n \"go\": \"go\",\n \"golang\": \"go\",\n \"rs\": \"rust\",\n \"rust\": \"rust\",\n \"c\": \"c\",\n \"cpp\": \"cpp\",\n \"c++\": \"cpp\",\n \"cxx\": \"cpp\",\n \"rb\": \"ruby\",\n \"ruby\": \"ruby\",\n \"php\": \"php\",\n \"sh\": \"shell\",\n \"bash\": \"shell\",\n \"shell\": \"shell\",\n \"zsh\": \"shell\",\n \"fish\": \"shell\",\n \"sql\": \"sql\",\n \"yaml\": \"yaml\",\n \"yml\": \"yaml\",\n \"toml\": \"toml\",\n \"html\": \"html\",\n \"css\": \"css\",\n \"json\": \"json\",\n \"xml\": \"xml\",\n \"md\": 
\"markdown\",\n \"markdown\": \"markdown\",\n}\n\n# Shebang interpreter → language\n_SHEBANG_LANG: list[tuple[re.Pattern[str], str]] = [\n (re.compile(r\"python\"), \"python\"),\n (re.compile(r\"node|nodejs\"), \"javascript\"),\n (re.compile(r\"ruby\"), \"ruby\"),\n (re.compile(r\"php\"), \"php\"),\n (re.compile(r\"perl\"), \"perl\"),\n (re.compile(r\"bash|sh|zsh|fish|dash\"), \"shell\"),\n (re.compile(r\"env\\s+(\\w+)\"), None), # handled specially below\n]\n\n\n# ---------------------------------------------------------------------------\n# Detector\n# ---------------------------------------------------------------------------\n\nclass ContentDetector:\n \"\"\"Rule-based content type detector.\"\"\"\n\n # -- Public API ----------------------------------------------------------\n\n def detect(self, text: str) -> DetectionResult:\n \"\"\"Detect content type from text. Returns best match.\"\"\"\n if not text or not text.strip():\n return DetectionResult(\"text\", None, 0.5)\n\n # 1. Markdown code fence\n fence_result = self._check_code_fence(text)\n if fence_result is not None:\n return fence_result\n\n # 2. Diff headers\n if self._check_diff(text):\n return DetectionResult(\"diff\", None, 0.95)\n\n # 3. JSON\n if self._check_json(text):\n return DetectionResult(\"json\", None, 0.9)\n\n # 4. Shebang\n shebang_result = self._check_shebang(text)\n if shebang_result is not None:\n return shebang_result\n\n lines = text.splitlines()\n non_empty = [ln for ln in lines if ln.strip()]\n total = max(len(non_empty), 1)\n\n # 5. Log density\n log_hits = sum(1 for ln in non_empty if _LOG_LINE.search(ln))\n if log_hits / total > 0.30:\n return DetectionResult(\"log\", None, 0.8)\n\n # 6. Search result density\n search_hits = sum(1 for ln in non_empty if _SEARCH_LINE.match(ln))\n if search_hits / total > 0.40:\n return DetectionResult(\"search\", None, 0.8)\n\n # 7. 
Code keyword density\n kw_hits = sum(1 for ln in non_empty if _CODE_KEYWORDS.search(ln))\n if kw_hits / total > 0.15:\n lang = self.detect_language(text)\n return DetectionResult(\"code\", lang, 0.7)\n\n return DetectionResult(\"text\", None, 0.5)\n\n def detect_language(self, text: str) -> str | None:\n \"\"\"Detect programming language from code text (no fence context).\"\"\"\n for lang, pattern in _LANG_PATTERNS:\n if pattern.search(text):\n return lang\n return None\n\n def detect_sections(self, text: str) -> list[Section]:\n \"\"\"Split mixed content into typed sections (text interleaved with code fences).\"\"\"\n sections: list[Section] = []\n lines = text.splitlines(keepends=True)\n i = 0\n text_start = 0\n\n while i \u003c len(lines):\n stripped = lines[i].rstrip(\"\\n\\r\")\n m = _FENCE_OPEN.match(stripped)\n if m is None:\n i += 1\n continue\n\n # Flush preceding text block\n if i > text_start:\n block = \"\".join(lines[text_start:i])\n sections.append(self._classify_block(block, text_start + 1, i))\n\n fence_char = m.group(1)[0]\n raw_lang = m.group(2).strip().lower()\n lang = _FENCE_LANG_MAP.get(raw_lang) or (raw_lang or None)\n fence_start = i\n close_pat = _FENCE_CLOSE_BACKTICK if fence_char == \"`\" else _FENCE_CLOSE_TILDE\n\n i += 1\n while i \u003c len(lines) and not close_pat.match(lines[i].rstrip(\"\\n\\r\")):\n i += 1\n\n code_lines = lines[fence_start: i + 1]\n code_block = \"\".join(code_lines)\n sections.append(Section(\n content=code_block,\n content_type=\"code\",\n language=lang,\n start_line=fence_start + 1,\n # An unclosed fence runs to the end of the input; clamp so end_line stays in range.\n end_line=min(i + 1, len(lines)),\n ))\n i += 1\n text_start = i\n\n # Trailing text\n if text_start \u003c len(lines):\n block = \"\".join(lines[text_start:])\n sections.append(self._classify_block(block, text_start + 1, len(lines)))\n\n return sections\n\n # -- Private helpers -----------------------------------------------------\n\n def _check_code_fence(self, text: str) -> DetectionResult | None:\n m = _FENCE_OPEN.search(text)\n if m is None:\n 
return None\n raw_lang = m.group(2).strip().lower()\n lang = _FENCE_LANG_MAP.get(raw_lang) or (raw_lang or None)\n return DetectionResult(\"code\", lang, 0.95)\n\n def _check_diff(self, text: str) -> bool:\n matches = _DIFF_HEADER.findall(text)\n return len(matches) >= 2\n\n def _check_json(self, text: str) -> bool:\n stripped = text.strip()\n if not stripped or stripped[0] not in (\"{\", \"[\"):\n return False\n try:\n json.loads(stripped)\n return True\n except (json.JSONDecodeError, ValueError):\n return False\n\n def _check_shebang(self, text: str) -> DetectionResult | None:\n first_line = text.split(\"\\n\", 1)[0]\n if not _SHEBANG.match(first_line):\n return None\n lang = self._lang_from_shebang(first_line)\n return DetectionResult(\"code\", lang, 0.9)\n\n def _lang_from_shebang(self, shebang: str) -> str | None:\n for pattern, lang in _SHEBANG_LANG:\n m = pattern.search(shebang)\n if m:\n if lang is not None:\n return lang\n # env case: look at captured interpreter name\n interpreter = m.group(1).lower() if m.lastindex else \"\"\n return _FENCE_LANG_MAP.get(interpreter, interpreter or None)\n return None\n\n def _classify_block(self, block: str, start_line: int, end_line: int) -> Section:\n result = self.detect(block)\n return Section(\n content=block,\n content_type=result.content_type,\n language=result.language,\n start_line=start_line,\n end_line=end_line,\n )\n","content_type":"text/x-python; charset=utf-8","language":"python","size":10810,"content_sha256":"a9e88fa1f0da3a8feef844ea57cbdcd01ae5ad5de9ea2dcbc8aef03469f9a610"},{"filename":"scripts/lib/fusion/content_stripper.py","content":"\"\"\"ContentStripper — strip images and documents before summarization.\n\nInspired by Claude Code's pre-summarization content stripping: before\nsending conversation history to the summarizer, large binary content\n(base64 images, embedded documents) is replaced with lightweight\nplaceholders. 
This prevents the summarizer from wasting tokens on\nnon-textual content.\n\nUsage::\n\n from claw_compactor.fusion.content_stripper import strip_images_and_docs\n\n cleaned_messages, stats = strip_images_and_docs(messages)\n\nPart of claw-compactor v8. License: MIT.\n\"\"\"\nfrom __future__ import annotations\n\nimport re\nfrom typing import Any\n\nfrom claw_compactor.tokens import estimate_tokens\n\n\n# Patterns for detecting embedded content.\n_BASE64_DATA_URI_RE = re.compile(\n r'data:(?P\u003cmime>[a-zA-Z0-9/+.-]+);base64,(?P\u003cb64>[A-Za-z0-9+/=\\n]{100,})',\n)\n_MARKDOWN_IMAGE_RE = re.compile(\n r'!\\[(?P\u003calt>[^\\]]*)\\]\\((?P\u003curl>[^)]+)\\)',\n)\n_HTML_IMG_RE = re.compile(\n r'\u003cimg\\s[^>]*src=[\"\\'](?P\u003curl>[^\"\\']+)[\"\\'][^>]*/?>',\n re.IGNORECASE,\n)\n_DOCUMENT_BLOCK_RE = re.compile(\n r'```(?:pdf|doc|docx|xlsx|csv)\\s*\\n.*?```',\n re.DOTALL,\n)\n\n# Placeholder templates.\n_IMAGE_PLACEHOLDER = \"[image: {mime}, ~{size_kb}KB]\"\n_MARKDOWN_IMAGE_PLACEHOLDER = \"[image: {alt}]\"\n_DOCUMENT_PLACEHOLDER = \"[embedded document removed, ~{size_kb}KB]\"\n\n\ndef strip_images_and_docs(\n messages: list[dict[str, Any]],\n strip_base64: bool = True,\n strip_markdown_images: bool = True,\n strip_html_images: bool = True,\n strip_document_blocks: bool = True,\n min_base64_length: int = 100,\n) -> tuple[list[dict[str, Any]], dict[str, Any]]:\n \"\"\"Strip images and documents from messages, replacing with placeholders.\n\n Parameters\n ----------\n messages:\n OpenAI-format message list.\n strip_base64:\n Replace base64 data URIs with placeholders.\n strip_markdown_images:\n Replace markdown image syntax with alt-text placeholders.\n strip_html_images:\n Replace HTML img tags with placeholders.\n strip_document_blocks:\n Replace embedded document code blocks with placeholders.\n min_base64_length:\n Minimum base64 string length to trigger stripping.\n\n Returns\n -------\n (cleaned_messages, stats)\n \"\"\"\n stats: dict[str, Any] = {\n 
\"images_stripped\": 0,\n \"documents_stripped\": 0,\n \"tokens_saved\": 0,\n \"multipart_images_stripped\": 0,\n }\n\n result_messages: list[dict[str, Any]] = []\n\n for msg in messages:\n content = msg.get(\"content\")\n\n # Handle multipart content (OpenAI list format).\n if isinstance(content, list):\n new_parts, part_stats = _strip_multipart(content)\n result_messages.append({**msg, \"content\": new_parts})\n stats[\"multipart_images_stripped\"] += part_stats[\"images_stripped\"]\n stats[\"tokens_saved\"] += part_stats[\"tokens_saved\"]\n continue\n\n if not isinstance(content, str) or not content:\n result_messages.append(msg)\n continue\n\n original_tokens = estimate_tokens(content)\n cleaned = content\n\n # Strip base64 data URIs.\n if strip_base64:\n def _replace_base64(match: re.Match) -> str:\n mime = match.group(\"mime\")\n b64 = match.group(\"b64\")\n size_kb = round(len(b64) * 3 / 4 / 1024, 1)\n stats[\"images_stripped\"] += 1\n return _IMAGE_PLACEHOLDER.format(mime=mime, size_kb=size_kb)\n\n cleaned = _BASE64_DATA_URI_RE.sub(_replace_base64, cleaned)\n\n # Strip markdown images.\n if strip_markdown_images:\n def _replace_md_image(match: re.Match) -> str:\n alt = match.group(\"alt\") or \"unnamed\"\n stats[\"images_stripped\"] += 1\n return _MARKDOWN_IMAGE_PLACEHOLDER.format(alt=alt)\n\n cleaned = _MARKDOWN_IMAGE_RE.sub(_replace_md_image, cleaned)\n\n # Strip HTML images.\n if strip_html_images:\n def _replace_html_image(match: re.Match) -> str:\n stats[\"images_stripped\"] += 1\n return \"[image removed]\"\n\n cleaned = _HTML_IMG_RE.sub(_replace_html_image, cleaned)\n\n # Strip document blocks.\n if strip_document_blocks:\n def _replace_doc(match: re.Match) -> str:\n size_kb = round(len(match.group(0)) / 1024, 1)\n stats[\"documents_stripped\"] += 1\n return _DOCUMENT_PLACEHOLDER.format(size_kb=size_kb)\n\n cleaned = _DOCUMENT_BLOCK_RE.sub(_replace_doc, cleaned)\n\n new_tokens = estimate_tokens(cleaned)\n stats[\"tokens_saved\"] += original_tokens - 
new_tokens\n\n result_messages.append({**msg, \"content\": cleaned})\n\n return result_messages, stats\n\n\ndef _strip_multipart(\n parts: list[Any],\n) -> tuple[list[Any], dict[str, int]]:\n \"\"\"Strip image parts from multipart content.\"\"\"\n stats = {\"images_stripped\": 0, \"tokens_saved\": 0}\n new_parts: list[Any] = []\n\n for part in parts:\n if not isinstance(part, dict):\n new_parts.append(part)\n continue\n\n part_type = part.get(\"type\", \"\")\n\n if part_type == \"image_url\":\n # Replace image_url with a text placeholder.\n url = part.get(\"image_url\", {}).get(\"url\", \"\")\n if url.startswith(\"data:\"):\n mime_match = re.match(r'data:([^;]+)', url)\n mime = mime_match.group(1) if mime_match else \"unknown\"\n size_kb = round(len(url) * 3 / 4 / 1024, 1)\n placeholder = _IMAGE_PLACEHOLDER.format(mime=mime, size_kb=size_kb)\n else:\n placeholder = f\"[image: {url[:80]}]\"\n\n new_parts.append({\"type\": \"text\", \"text\": placeholder})\n stats[\"images_stripped\"] += 1\n stats[\"tokens_saved\"] += max(0, estimate_tokens(url) - estimate_tokens(placeholder))\n\n elif part_type == \"image\":\n # Another image format variant.\n new_parts.append({\"type\": \"text\", \"text\": \"[image removed]\"})\n stats[\"images_stripped\"] += 1\n\n else:\n new_parts.append(part)\n\n return new_parts, stats\n","content_type":"text/x-python; charset=utf-8","language":"python","size":6195,"content_sha256":"0a79f4c98faf273f80ea1142c8bb177c429b9b71925945bb629f715799a66bcb"},{"filename":"scripts/lib/fusion/conversation_summarizer.py","content":"\"\"\"ConversationSummarizer — LLM-free conversation turn summarization.\n\nInspired by Claude Code's AutoCompact: when a conversation exceeds a token\nbudget, older turns are collapsed into a structured summary block.\n\nUnlike Claude Code (which calls the LLM for summarization), this module uses\ndeterministic extraction to avoid API calls and latency:\n - Extracts key decisions, file paths, function names, and error patterns\n 
- Preserves user instructions and system messages verbatim\n - Collapses assistant responses to their first sentence + action items\n - Collapses tool results to one-line summaries\n\nThe summarized turns are replaced by a single system message with subtype\n``compact_boundary`` (compatible with Claude Code's format) so downstream\nconsumers can detect and handle compacted history.\n\nPart of claw-compactor v8. License: MIT.\n\"\"\"\nfrom __future__ import annotations\n\nimport json\nimport re\nfrom typing import Any\n\nfrom claw_compactor.tokens import estimate_tokens\n\n\n# Summarization fires when total message tokens exceed this fraction of budget.\nDEFAULT_TRIGGER_PCT = 0.80\n\n# After summarization, the summary should be at most this many tokens.\nMAX_SUMMARY_TOKENS = 20_000\n\n# Keep the N most recent turns unsummarized (a \"turn\" = one user + one assistant).\nDEFAULT_PRESERVE_RECENT_TURNS = 4\n\n# Patterns to extract from assistant messages.\n_FILE_PATH_RE = re.compile(r'[`\"\\']?(/[\\w./-]+\\.\\w{1,10})[`\"\\']?')\n_FUNCTION_RE = re.compile(r'(?:def|function|class|fn|func)\\s+(\\w+)')\n_ERROR_RE = re.compile(r'(?:Error|Exception|FAIL|error|failed|bug)[:. 
]\\s*(.{10,80})')\n_DECISION_RE = re.compile(\n r'(?:decided|decision|chose|choosing|will use|going with|plan is|approach:)\\s+(.{10,120})',\n re.IGNORECASE,\n)\n\n\ndef summarize_conversation(\n messages: list[dict[str, Any]],\n token_budget: int = 200_000,\n trigger_pct: float = DEFAULT_TRIGGER_PCT,\n preserve_recent_turns: int = DEFAULT_PRESERVE_RECENT_TURNS,\n) -> tuple[list[dict[str, Any]], dict[str, Any]]:\n \"\"\"Summarize older conversation turns if total tokens exceed budget threshold.\n\n Parameters\n ----------\n messages:\n OpenAI-format message list.\n token_budget:\n The context window size in tokens.\n trigger_pct:\n Fraction of token_budget at which summarization activates.\n preserve_recent_turns:\n Number of recent user+assistant turn pairs to keep verbatim.\n\n Returns\n -------\n (new_messages, stats) — stats includes tokens_before, tokens_after, turns_summarized.\n \"\"\"\n total_tokens = sum(estimate_tokens(m.get(\"content\", \"\") if isinstance(m.get(\"content\"), str) else str(m.get(\"content\", \"\"))) for m in messages)\n threshold = int(token_budget * trigger_pct)\n\n stats: dict[str, Any] = {\n \"total_tokens_before\": total_tokens,\n \"total_tokens_after\": total_tokens,\n \"turns_summarized\": 0,\n \"triggered\": False,\n \"threshold\": threshold,\n }\n\n if total_tokens \u003c threshold:\n return messages, stats\n\n stats[\"triggered\"] = True\n\n # Split messages into: system prefix, conversation body, recent tail.\n system_msgs, body_msgs, recent_msgs = _split_messages(\n messages, preserve_recent_turns\n )\n\n if len(body_msgs) \u003c 2:\n # Not enough to summarize.\n return messages, stats\n\n # Build a deterministic summary of the body.\n summary_lines = _extract_summary(body_msgs)\n summary_text = \"\\n\".join(summary_lines)\n\n # Enforce MAX_SUMMARY_TOKENS.\n summary_tokens = estimate_tokens(summary_text)\n if summary_tokens > MAX_SUMMARY_TOKENS:\n # Truncate to budget.\n lines = summary_lines\n while 
estimate_tokens(\"\\n\".join(lines)) > MAX_SUMMARY_TOKENS and len(lines) > 5:\n lines = lines[:len(lines) - 1]\n summary_text = \"\\n\".join(lines) + \"\\n[...truncated summary]\"\n\n # Build compact_boundary message.\n boundary_msg = _make_compact_boundary(\n summary_text,\n turns_summarized=len(body_msgs),\n original_tokens=sum(\n estimate_tokens(m.get(\"content\", \"\") if isinstance(m.get(\"content\"), str) else \"\")\n for m in body_msgs\n ),\n )\n\n # Reassemble.\n new_messages = system_msgs + [boundary_msg] + recent_msgs\n\n new_total = sum(\n estimate_tokens(m.get(\"content\", \"\") if isinstance(m.get(\"content\"), str) else str(m.get(\"content\", \"\")))\n for m in new_messages\n )\n stats[\"total_tokens_after\"] = new_total\n stats[\"turns_summarized\"] = len(body_msgs)\n\n return new_messages, stats\n\n\ndef _split_messages(\n messages: list[dict[str, Any]],\n preserve_recent_turns: int,\n) -> tuple[list[dict[str, Any]], list[dict[str, Any]], list[dict[str, Any]]]:\n \"\"\"Split messages into (system_prefix, compactable_body, recent_tail).\"\"\"\n # System messages at the start.\n system_msgs: list[dict[str, Any]] = []\n i = 0\n while i \u003c len(messages) and messages[i].get(\"role\") == \"system\":\n system_msgs.append(messages[i])\n i += 1\n\n remaining = messages[i:]\n\n # Count turns from the end (a turn = user msg followed by any non-user msgs).\n if preserve_recent_turns \u003c= 0:\n return system_msgs, remaining, []\n\n # Walk backwards counting user messages as turn boundaries.\n turns_found = 0\n split_idx = len(remaining)\n for j in range(len(remaining) - 1, -1, -1):\n if remaining[j].get(\"role\") == \"user\":\n turns_found += 1\n if turns_found >= preserve_recent_turns:\n split_idx = j\n break\n\n body = remaining[:split_idx]\n recent = remaining[split_idx:]\n return system_msgs, body, recent\n\n\ndef _extract_summary(messages: list[dict[str, Any]]) -> list[str]:\n \"\"\"Extract a structured summary from a list of conversation 
messages.\"\"\"\n lines: list[str] = [\"## Conversation Summary (auto-compacted)\"]\n lines.append(\"\")\n\n decisions: list[str] = []\n files_mentioned: set[str] = set()\n functions_mentioned: set[str] = set()\n errors: list[str] = []\n user_instructions: list[str] = []\n actions_taken: list[str] = []\n\n for msg in messages:\n role = msg.get(\"role\", \"\")\n content = msg.get(\"content\", \"\")\n if not isinstance(content, str):\n content = str(content)\n\n if role == \"user\":\n # Preserve user instructions (first 200 chars each).\n trimmed = content.strip()[:200]\n if trimmed:\n user_instructions.append(trimmed)\n\n elif role == \"assistant\":\n # Extract decisions.\n for m in _DECISION_RE.finditer(content):\n decisions.append(m.group(1).strip())\n # Extract first sentence as action summary.\n first_sentence = content.split(\"\\n\")[0][:150].strip()\n if first_sentence:\n actions_taken.append(first_sentence)\n\n elif role == \"tool\":\n # One-line summary.\n tool_name = msg.get(\"name\", \"tool\")\n token_count = estimate_tokens(content)\n actions_taken.append(f\"[{tool_name}: {token_count} tokens]\")\n\n # Extract file paths, functions, errors from any role.\n files_mentioned.update(_FILE_PATH_RE.findall(content))\n functions_mentioned.update(_FUNCTION_RE.findall(content))\n for m in _ERROR_RE.finditer(content):\n errors.append(m.group(1).strip()[:100])\n\n # Build summary sections.\n if user_instructions:\n lines.append(\"### User Instructions\")\n for instr in user_instructions[-10:]: # cap at 10\n lines.append(f\"- {instr}\")\n lines.append(\"\")\n\n if decisions:\n lines.append(\"### Key Decisions\")\n for d in decisions[-10:]:\n lines.append(f\"- {d}\")\n lines.append(\"\")\n\n if actions_taken:\n lines.append(\"### Actions Taken\")\n for a in actions_taken[-15:]:\n lines.append(f\"- {a}\")\n lines.append(\"\")\n\n if files_mentioned:\n lines.append(\"### Files Referenced\")\n for f in sorted(files_mentioned)[:20]:\n lines.append(f\"- `{f}`\")\n 
lines.append(\"\")\n\n if errors:\n lines.append(\"### Errors Encountered\")\n for e in errors[-5:]:\n lines.append(f\"- {e}\")\n lines.append(\"\")\n\n return lines\n\n\ndef _make_compact_boundary(\n summary: str,\n turns_summarized: int,\n original_tokens: int,\n) -> dict[str, Any]:\n \"\"\"Create a compact_boundary system message (Claude Code compatible format).\"\"\"\n return {\n \"role\": \"system\",\n \"content\": json.dumps({\n \"type\": \"system\",\n \"subtype\": \"compact_boundary\",\n \"summary\": summary,\n \"compactMetadata\": {\n \"turnsSummarized\": turns_summarized,\n \"originalTokens\": original_tokens,\n \"compressedTokens\": estimate_tokens(summary),\n \"preservedSegment\": True,\n },\n }),\n }\n","content_type":"text/x-python; charset=utf-8","language":"python","size":8916,"content_sha256":"b474d5cd358bd902b10817515bfdafe067222b97d21349727d8fb6ba38d45a43"},{"filename":"scripts/lib/fusion/cortex.py","content":"\"\"\"Cortex — intelligent content router for the Fusion Pipeline.\n\nCortex is the pipeline's \"brain\" — it runs at order=5 (before all compressor\nstages) and auto-detects content type (code, json, log, diff, search, text)\nand programming language (16 languages supported) by analyzing structural\npatterns, keywords, and syntax markers. Detection results are propagated\ninto FusionContext via context_updates, so every downstream stage can make\ntype-aware compression decisions without redundant analysis.\n\nPart of claw-compactor v7. License: MIT.\n\"\"\"\nfrom __future__ import annotations\n\nfrom claw_compactor.fusion.base import FusionContext, FusionResult, FusionStage\nfrom claw_compactor.fusion.content_detector import ContentDetector\nfrom claw_compactor.tokens import estimate_tokens\n\n\nclass Cortex(FusionStage):\n \"\"\"Intelligent content router. 
Detects content type and routes to appropriate compressors.\"\"\"\n\n name = \"cortex\"\n order = 5 # must run before all compressor stages\n\n def __init__(self) -> None:\n self.detector = ContentDetector()\n\n def should_apply(self, ctx: FusionContext) -> bool:\n # Skip if a caller has already made an explicit type decision (non-default value).\n return ctx.content_type == \"text\"\n\n def apply(self, ctx: FusionContext) -> FusionResult:\n detection = self.detector.detect(ctx.content)\n tokens = estimate_tokens(ctx.content)\n\n context_updates: dict[str, object] = {\n \"content_type\": detection.content_type,\n }\n if detection.language is not None:\n context_updates[\"language\"] = detection.language\n\n return FusionResult(\n content=ctx.content,\n original_tokens=tokens,\n compressed_tokens=tokens, # Cortex never modifies content\n skipped=False,\n context_updates=context_updates,\n )\n","content_type":"text/x-python; charset=utf-8","language":"python","size":1877,"content_sha256":"81b793b1445043d6e058bef074444ba3766ad6762b0bddca2cbd6ae339fefa00"},{"filename":"scripts/lib/fusion/diff_crunch.py","content":"\"\"\"DiffCrunch — git diff compression FusionStage.\n\nPreserves file headers, hunk headers, and all changed lines (+/-).\nCompresses context blocks (unchanged lines) to at most 1 line at each end.\nStores large diffs in RewindStore for full retrieval.\n\nPart of claw-compactor. 
License: MIT.\n\"\"\"\nfrom __future__ import annotations\n\nimport re\n\nfrom claw_compactor.fusion.base import FusionStage, FusionContext, FusionResult\nfrom claw_compactor.rewind.marker import embed_marker\nfrom claw_compactor.rewind.store import RewindStore\nfrom claw_compactor.tokens import estimate_tokens\n\n# ---------------------------------------------------------------------------\n# Configuration\n# ---------------------------------------------------------------------------\n\n# Context lines to keep at the start/end of each context block.\n_CONTEXT_KEEP = 1\n\n# Line count above which we store the original in RewindStore.\n_LARGE_DIFF_THRESHOLD = 200\n\n# ---------------------------------------------------------------------------\n# Line-type classification\n# ---------------------------------------------------------------------------\n\n# File header patterns (unified diff format).\n_FILE_HEADER_RE = re.compile(r'^(--- |--- a/|\\+\\+\\+ |\\+\\+\\+ b/|diff --git |index [0-9a-f]+\\.\\.|new file mode|deleted file mode|rename from |rename to |old mode |new mode )')\n_HUNK_HEADER_RE = re.compile(r'^@@')\n_ADDED_RE = re.compile(r'^\\+(?!\\+\\+)') # + lines that are not +++\n_REMOVED_RE = re.compile(r'^-(?!--)') # - lines that are not ---\n_NO_NEWLINE_RE = re.compile(r'^\\\\ No newline')\n\n\ndef _line_type(line: str) -> str:\n \"\"\"Classify a diff line.\n\n Returns one of: \"file_header\" | \"hunk_header\" | \"added\" | \"removed\"\n | \"no_newline\" | \"context\"\n \"\"\"\n if _FILE_HEADER_RE.match(line):\n return \"file_header\"\n if _HUNK_HEADER_RE.match(line):\n return \"hunk_header\"\n if _ADDED_RE.match(line):\n return \"added\"\n if _REMOVED_RE.match(line):\n return \"removed\"\n if _NO_NEWLINE_RE.match(line):\n return \"no_newline\"\n return \"context\"\n\n\n# ---------------------------------------------------------------------------\n# Context block compression\n# ---------------------------------------------------------------------------\n\ndef 
_compress_context_block(block: list[str]) -> list[str]:\n \"\"\"\n Compress a run of context lines.\n\n If the block has \u003c= 2*_CONTEXT_KEEP lines: keep all.\n Otherwise: keep first _CONTEXT_KEEP, emit ellipsis, keep last _CONTEXT_KEEP.\n \"\"\"\n keep = _CONTEXT_KEEP\n if len(block) \u003c= keep * 2:\n return list(block)\n\n head = block[:keep]\n tail = block[-keep:]\n omitted = len(block) - keep * 2\n ellipsis = f\" [... {omitted} unchanged line{'s' if omitted != 1 else ''} ...]\"\n return head + [ellipsis] + tail\n\n\n# ---------------------------------------------------------------------------\n# Main compression logic\n# ---------------------------------------------------------------------------\n\ndef _compress_diff(lines: list[str]) -> list[str]:\n \"\"\"\n Walk diff lines, preserving structural lines and compressing context blocks.\n \"\"\"\n output: list[str] = []\n context_buffer: list[str] = []\n\n def flush_context() -> None:\n if context_buffer:\n output.extend(_compress_context_block(context_buffer))\n context_buffer.clear()\n\n for line in lines:\n ltype = _line_type(line)\n\n if ltype == \"context\":\n context_buffer.append(line)\n else:\n flush_context()\n output.append(line)\n\n # Flush any trailing context.\n flush_context()\n return output\n\n\n# ---------------------------------------------------------------------------\n# Summary generation (for very large diffs)\n# ---------------------------------------------------------------------------\n\ndef _summarise_diff(lines: list[str]) -> str:\n \"\"\"Generate a high-level summary of a large diff.\"\"\"\n files_changed: list[str] = []\n added_lines = 0\n removed_lines = 0\n hunks = 0\n\n current_file: str | None = None\n for line in lines:\n ltype = _line_type(line)\n if ltype == \"file_header\":\n if line.startswith(\"+++ \"):\n path = line[4:].strip()\n # Strip \"b/\" prefix from git diff output.\n if path.startswith(\"b/\"):\n path = path[2:]\n if path != \"/dev/null\":\n current_file = 
path\n if current_file not in files_changed:\n files_changed.append(current_file)\n elif ltype == \"hunk_header\":\n hunks += 1\n elif ltype == \"added\":\n added_lines += 1\n elif ltype == \"removed\":\n removed_lines += 1\n\n summary_lines = [\n f\"[Large diff summary: {len(files_changed)} file(s) changed, \"\n f\"+{added_lines} insertions, -{removed_lines} deletions, {hunks} hunk(s)]\",\n \"Files:\",\n ]\n for f in files_changed:\n summary_lines.append(f\" {f}\")\n\n return \"\\n\".join(summary_lines)\n\n\n# ---------------------------------------------------------------------------\n# FusionStage implementation\n# ---------------------------------------------------------------------------\n\nclass DiffCrunch(FusionStage):\n \"\"\"git diff compression — preserves headers and changes, compresses context.\"\"\"\n\n name = \"diff_crunch\"\n order = 18\n\n def __init__(\n self,\n rewind_store: RewindStore | None = None,\n large_diff_threshold: int = _LARGE_DIFF_THRESHOLD,\n context_keep: int = _CONTEXT_KEEP,\n ) -> None:\n self._rewind_store = rewind_store\n self._large_diff_threshold = large_diff_threshold\n self._context_keep = context_keep\n\n def should_apply(self, ctx: FusionContext) -> bool:\n return ctx.content_type == \"diff\"\n\n def apply(self, ctx: FusionContext) -> FusionResult:\n original_tokens = estimate_tokens(ctx.content)\n lines = ctx.content.splitlines()\n original_line_count = len(lines)\n markers: list[str] = []\n warnings: list[str] = []\n\n is_large = original_line_count > self._large_diff_threshold\n\n if is_large and self._rewind_store is not None:\n # Store the full original for later retrieval.\n hash_id = self._rewind_store.store(\n original=ctx.content,\n compressed=\"\", # will be filled in after compression\n original_tokens=original_tokens,\n compressed_tokens=0,\n )\n markers.append(f\"diff_crunch:large:hash={hash_id}\")\n\n # Compress the diff.\n compressed_lines = _compress_diff(lines)\n compressed = 
\"\\n\".join(compressed_lines)\n\n if is_large:\n summary = _summarise_diff(lines)\n compressed = summary + \"\\n\\n\" + compressed\n if self._rewind_store is not None:\n compressed = embed_marker(\n compressed,\n original_count=original_line_count,\n compressed_count=len(compressed_lines),\n hash_id=hash_id,\n )\n warnings.append(\n f\"diff_crunch: large diff ({original_line_count} lines) — summary prepended\"\n )\n\n compressed_tokens = estimate_tokens(compressed)\n markers.insert(0, f\"diff_crunch:{original_line_count}->{len(compressed_lines)} lines\")\n\n return FusionResult(\n content=compressed,\n original_tokens=original_tokens,\n compressed_tokens=compressed_tokens,\n markers=markers,\n warnings=warnings,\n )\n","content_type":"text/x-python; charset=utf-8","language":"python","size":7669,"content_sha256":"0480113c5e919616e24a7a4c32d3748f96ce58e2abb911bc3f442f16be183491"},{"filename":"scripts/lib/fusion/engine.py","content":"\"\"\"FusionEngine — unified entry point for all Claw Compactor compression.\n\nConstructs the full 14-stage Fusion Pipeline and exposes two public methods:\n\n engine.compress(text, ...) — compress a single string\n engine.compress_messages(messages) — compress a list of OpenAI-format messages\n\nThe pipeline chains 14 stages in a fixed execution order:\n\n QuantumLock(3) -> Cortex(5) -> Photon(8) -> RLE(10) -> SemanticDedup(12)\n -> Ionizer(15) -> LogCrunch(16) -> SearchCrunch(17) -> DiffCrunch(18)\n -> StructuralCollapse(20) -> Neurosyntax(25) -> Nexus(35) -> TokenOpt(40)\n -> Abbrev(45)\n\nEach stage receives an immutable FusionContext and returns an immutable\nFusionResult. The pipeline threads the compressed output forward — each\nstage's result becomes the next stage's input context. 
Stages that don't\napply to the current content type are skipped at zero cost via should_apply().\n\nThree legacy modules (RLE, TokenizerOptimizer, CompressedContext) are wrapped\nas adapter FusionStages so they participate in the same pipeline and metrics\ninfrastructure.\n\nAchieves 54% weighted-average compression across six content types (code, JSON,\nlogs, diffs, search results, agent conversations) — a 5.9x improvement over\nthe legacy regex-only path.\n\nPart of claw-compactor v7. License: MIT.\n\"\"\"\nfrom __future__ import annotations\n\nimport sys\nimport time\nfrom pathlib import Path\nfrom typing import Any\n\n# ---------------------------------------------------------------------------\n# Path bootstrap — allow running from any cwd; project root is three levels up\n# from this file (scripts/lib/fusion/engine.py → scripts/)\n# ---------------------------------------------------------------------------\n_SCRIPTS_DIR = Path(__file__).resolve().parent.parent.parent\nif str(_SCRIPTS_DIR) not in sys.path:\n sys.path.insert(0, str(_SCRIPTS_DIR))\n\nfrom claw_compactor.fusion.base import FusionContext, FusionResult, FusionStage\nfrom claw_compactor.fusion.pipeline import FusionPipeline\nfrom claw_compactor.fusion.cortex import Cortex\nfrom claw_compactor.fusion.quantum_lock import QuantumLock\nfrom claw_compactor.fusion.photon import PhotonStage\nfrom claw_compactor.fusion.ionizer import Ionizer\nfrom claw_compactor.fusion.log_crunch import LogCrunch\nfrom claw_compactor.fusion.search_crunch import SearchCrunch\nfrom claw_compactor.fusion.diff_crunch import DiffCrunch\nfrom claw_compactor.fusion.semantic_dedup import SemanticDedup, dedup_across_messages\nfrom claw_compactor.fusion.structural_collapse import StructuralCollapse\nfrom claw_compactor.fusion.neurosyntax import Neurosyntax\nfrom claw_compactor.fusion.nexus import NexusStage\nfrom claw_compactor.rewind.store import RewindStore\nfrom claw_compactor.tokens import estimate_tokens\n\n# Legacy modules 
wrapped as adapter stages\nimport claw_compactor.rle as _rle\nfrom claw_compactor.tokenizer_optimizer import optimize_tokens as _optimize_tokens\n\n# compressed_context lives in scripts/, not scripts/lib/ — optional import\n_CC_DIR = _SCRIPTS_DIR\nif str(_CC_DIR) not in sys.path:\n sys.path.insert(0, str(_CC_DIR))\n\ntry:\n from compressed_context import ( # type: ignore[import]\n compress_ultra as _compress_ultra,\n ULTRA_ABBREVS as _ULTRA_ABBREVS,\n ULTRA_FILLERS as _ULTRA_FILLERS,\n )\nexcept ImportError:\n _compress_ultra = None\n _ULTRA_ABBREVS = {}\n _ULTRA_FILLERS = set()\n\n\n# ---------------------------------------------------------------------------\n# Adapter FusionStages wrapping legacy modules\n# ---------------------------------------------------------------------------\n\nclass RLEStage(FusionStage):\n \"\"\"Wraps lib.rle.compress() — path, IP, and enum compression.\n\n Applies to all content types (the RLE transforms are structural-pattern\n aware and safe on any text). Order 10 — runs after Photon (8) and before\n Ionizer (15).\n \"\"\"\n\n name = \"rle\"\n order = 10\n\n def should_apply(self, ctx: FusionContext) -> bool:\n return bool(ctx.content)\n\n def apply(self, ctx: FusionContext) -> FusionResult:\n original_tokens = estimate_tokens(ctx.content)\n compressed = _rle.compress(ctx.content)\n compressed_tokens = estimate_tokens(compressed)\n markers: list[str] = []\n if compressed_tokens \u003c original_tokens:\n markers.append(f\"rle:{original_tokens}->{compressed_tokens}\")\n return FusionResult(\n content=compressed,\n original_tokens=original_tokens,\n compressed_tokens=compressed_tokens,\n markers=markers,\n )\n\n\nclass TokenOptStage(FusionStage):\n \"\"\"Wraps lib.tokenizer_optimizer.optimize_tokens(aggressive=True).\n\n Cleans up formatting (bold/italic, excess whitespace, tables, bullets) to\n reduce tokenizer overhead. 
Order 40 — runs after most semantic stages,\n before AbbrevStage.\n \"\"\"\n\n name = \"token_opt\"\n order = 40\n\n def should_apply(self, ctx: FusionContext) -> bool:\n return bool(ctx.content)\n\n def apply(self, ctx: FusionContext) -> FusionResult:\n original_tokens = estimate_tokens(ctx.content)\n compressed = _optimize_tokens(ctx.content, aggressive=True)\n compressed_tokens = estimate_tokens(compressed)\n markers: list[str] = []\n if compressed_tokens \u003c original_tokens:\n markers.append(f\"token_opt:{original_tokens}->{compressed_tokens}\")\n return FusionResult(\n content=compressed,\n original_tokens=original_tokens,\n compressed_tokens=compressed_tokens,\n markers=markers,\n )\n\n\nclass AbbrevStage(FusionStage):\n \"\"\"Wraps abbreviation + filler removal from compressed_context.compress_ultra().\n\n Only applied to natural language text (content_type == \"text\"), never to\n code, JSON, logs, diffs, or search results where abbreviations would corrupt\n structured data. Order 45 — the final pass: runs after Nexus (35) and after\n TokenOpt (40) has cleaned up whitespace.\n \"\"\"\n\n name = \"abbrev\"\n order = 45\n\n def should_apply(self, ctx: FusionContext) -> bool:\n return ctx.content_type == \"text\" and bool(ctx.content) and _compress_ultra is not None\n\n def apply(self, ctx: FusionContext) -> FusionResult:\n original_tokens = estimate_tokens(ctx.content)\n compressed = _compress_ultra(ctx.content) # type: ignore[misc]\n compressed_tokens = estimate_tokens(compressed)\n markers: list[str] = []\n if compressed_tokens \u003c original_tokens:\n markers.append(f\"abbrev:{original_tokens}->{compressed_tokens}\")\n return FusionResult(\n content=compressed,\n original_tokens=original_tokens,\n compressed_tokens=compressed_tokens,\n markers=markers,\n )\n\n\n# ---------------------------------------------------------------------------\n# Stage ordering summary (for documentation)\n# 
---------------------------------------------------------------------------\n#\n# 3 QuantumLock — KV-cache alignment (system messages only)\n# 5 Cortex — content-type detection\n# 8 Photon — image/base64 compression\n# 10 RLEStage — path / IP / enum compression [adapter]\n# 12 SemanticDedup — near-duplicate block deduplication\n# 15 Ionizer — JSON array sampling\n# 16 LogCrunch — build/test log compression\n# 17 SearchCrunch — search result compression\n# 18 DiffCrunch — diff/patch compression\n# 20 StructuralCollapse — import/repeated-line collapse\n# 25 Neurosyntax — AST-aware code compression\n# 35 NexusStage — ML token-level compressor (fallback: stopword removal)\n# 40 TokenOptStage — tokenizer format optimisation [adapter]\n# 45 AbbrevStage — ultra-abbreviation (text only) [adapter]\n\n\ndef _build_pipeline(rewind_store: RewindStore | None) -> FusionPipeline:\n \"\"\"Construct the full pipeline with every stage, in order.\"\"\"\n stages: list[FusionStage] = [\n QuantumLock(),\n Cortex(),\n PhotonStage(),\n RLEStage(),\n SemanticDedup(),\n Ionizer(rewind_store=rewind_store),\n LogCrunch(),\n SearchCrunch(),\n DiffCrunch(),\n StructuralCollapse(),\n Neurosyntax(),\n NexusStage(),\n TokenOptStage(),\n AbbrevStage(),\n ]\n return FusionPipeline(stages)\n\n\n# ---------------------------------------------------------------------------\n# FusionEngine\n# ---------------------------------------------------------------------------\n\nclass FusionEngine:\n \"\"\"Unified compression engine. Single entry point for all compression.\n\n Parameters\n ----------\n enable_rewind:\n Maintain a RewindStore so compressed JSON arrays can be reversed.\n Default True.\n aggressive:\n Reserved for future per-stage aggressiveness knob. Currently all\n adapter stages run at maximum aggressiveness. 
Default True.\n \"\"\"\n\n def __init__(\n self,\n enable_rewind: bool = True,\n aggressive: bool = True,\n ) -> None:\n self._rewind_store: RewindStore | None = (\n RewindStore() if enable_rewind else None\n )\n self._aggressive = aggressive\n self._pipeline: FusionPipeline = _build_pipeline(self._rewind_store)\n\n # ------------------------------------------------------------------\n # Public API\n # ------------------------------------------------------------------\n\n def compress(\n self,\n text: str,\n content_type: str = \"text\",\n role: str = \"user\",\n language: str | None = None,\n model: str | None = None,\n token_budget: int | None = None,\n query: str | None = None,\n metadata: dict | None = None,\n ) -> dict[str, Any]:\n \"\"\"Compress a single text string through the full pipeline.\n\n Parameters\n ----------\n text:\n The text to compress.\n content_type:\n Hint for the pipeline: \"text\", \"code\", \"json\", \"log\", \"diff\",\n \"search\". Cortex will auto-detect if left as \"text\".\n role:\n Message role — \"system\", \"user\", \"assistant\", \"tool\".\n language:\n Optional programming language hint (e.g. 
\"python\", \"go\").\n model, token_budget, query, metadata:\n Additional context passed into FusionContext.\n\n Returns\n -------\n dict with keys:\n compressed — the compressed string\n original — the original string\n stats — per-stage and aggregate stats dict\n markers — list of compression marker strings\n warnings — list of warning strings\n \"\"\"\n if not isinstance(text, str):\n raise TypeError(f\"compress() requires a string, got {type(text).__name__}\")\n if not text:\n return {\n \"compressed\": text,\n \"original\": text,\n \"stats\": _empty_stats(text),\n \"markers\": [],\n \"warnings\": [],\n }\n\n ctx = FusionContext(\n content=text,\n content_type=content_type,\n role=role,\n language=language,\n model=model,\n token_budget=token_budget,\n query=query,\n metadata=metadata or {},\n )\n\n pipeline_result = self._pipeline.run(ctx)\n stats = _build_stats(text, pipeline_result.content, pipeline_result)\n\n return {\n \"compressed\": pipeline_result.content,\n \"original\": text,\n \"stats\": stats,\n \"markers\": pipeline_result.markers,\n \"warnings\": pipeline_result.warnings,\n }\n\n def compress_messages(self, messages: list[dict[str, Any]]) -> dict[str, Any]:\n \"\"\"Compress a list of OpenAI-format chat messages.\n\n Each message must have at minimum ``role`` and ``content`` keys.\n Content may be a string or a list (OpenAI multipart format — only the\n text parts are compressed; image_url parts are passed through the\n normal Photon path).\n\n Parameters\n ----------\n messages:\n List of dicts, each with \"role\" and \"content\".\n\n Returns\n -------\n dict with keys:\n messages — list of compressed message dicts (same structure\n as input, content replaced with compressed text)\n stats — aggregate stats across all messages\n per_message — list of per-message stat dicts\n markers — all markers from all messages combined\n warnings — all warnings from all messages combined\n \"\"\"\n if not messages:\n return {\n \"messages\": [],\n \"stats\": 
_empty_aggregate_stats(),\n \"per_message\": [],\n \"markers\": [],\n \"warnings\": [],\n }\n\n # Phase 0: cross-message semantic dedup\n deduped_messages, dedup_stats = dedup_across_messages(messages)\n\n compressed_messages: list[dict[str, Any]] = []\n per_message_stats: list[dict[str, Any]] = []\n all_markers: list[str] = []\n all_warnings: list[str] = []\n\n if dedup_stats.get(\"messages_deduped\", 0) > 0:\n all_markers.append(\n f\"cross_msg_dedup:{dedup_stats['messages_deduped']}_msgs_deduped\"\n )\n\n total_original_tokens = 0\n total_compressed_tokens = 0\n total_original_chars = 0\n total_compressed_chars = 0\n total_timing_ms = 0.0\n\n for msg in deduped_messages:\n role = msg.get(\"role\", \"user\")\n content = msg.get(\"content\", \"\")\n\n # Handle multipart content (OpenAI list format).\n if isinstance(content, list):\n result_msg, msg_stats, msg_markers, msg_warnings = (\n self._compress_multipart_message(role, content, msg)\n )\n else:\n result_msg, msg_stats, msg_markers, msg_warnings = (\n self._compress_text_message(role, str(content), msg)\n )\n\n compressed_messages.append(result_msg)\n per_message_stats.append(msg_stats)\n all_markers.extend(msg_markers)\n all_warnings.extend(msg_warnings)\n\n total_original_tokens += msg_stats[\"original_tokens\"]\n total_compressed_tokens += msg_stats[\"compressed_tokens\"]\n total_original_chars += msg_stats[\"original_chars\"]\n total_compressed_chars += msg_stats[\"compressed_chars\"]\n total_timing_ms += msg_stats[\"timing_ms\"]\n\n aggregate_stats = _aggregate_stats(\n original_tokens=total_original_tokens,\n compressed_tokens=total_compressed_tokens,\n original_chars=total_original_chars,\n compressed_chars=total_compressed_chars,\n timing_ms=total_timing_ms,\n message_count=len(messages),\n )\n\n return {\n \"messages\": compressed_messages,\n \"stats\": aggregate_stats,\n \"per_message\": per_message_stats,\n \"markers\": all_markers,\n \"warnings\": all_warnings,\n }\n\n # 
------------------------------------------------------------------\n # Internal helpers\n # ------------------------------------------------------------------\n\n def _compress_text_message(\n self,\n role: str,\n content: str,\n original_msg: dict[str, Any],\n ) -> tuple[dict[str, Any], dict[str, Any], list[str], list[str]]:\n \"\"\"Compress a plain-text message. Returns (msg, stats, markers, warnings).\"\"\"\n t0 = time.monotonic()\n result = self.compress(text=content, role=role)\n elapsed_ms = (time.monotonic() - t0) * 1000\n\n # Build the output message preserving all keys from the original.\n out_msg = {**original_msg, \"content\": result[\"compressed\"]}\n\n original_tokens = estimate_tokens(content)\n compressed_tokens = estimate_tokens(result[\"compressed\"])\n\n msg_stats = {\n \"role\": role,\n \"original_tokens\": original_tokens,\n \"compressed_tokens\": compressed_tokens,\n \"original_chars\": len(content),\n \"compressed_chars\": len(result[\"compressed\"]),\n \"reduction_pct\": _reduction_pct(original_tokens, compressed_tokens),\n \"timing_ms\": round(elapsed_ms, 2),\n \"stages_run\": result[\"stats\"].get(\"stages_run\", 0),\n }\n\n return out_msg, msg_stats, result[\"markers\"], result[\"warnings\"]\n\n def _compress_multipart_message(\n self,\n role: str,\n parts: list[Any],\n original_msg: dict[str, Any],\n ) -> tuple[dict[str, Any], dict[str, Any], list[str], list[str]]:\n \"\"\"Compress a multipart (list-content) message.\n\n Text parts are run through the full pipeline. Other part types\n (image_url, etc.) 
are passed through unchanged — Photon handles\n base64 images at the string level, but multipart image_url objects\n are left alone here.\n \"\"\"\n t0 = time.monotonic()\n compressed_parts: list[Any] = []\n all_markers: list[str] = []\n all_warnings: list[str] = []\n total_original_tokens = 0\n total_compressed_tokens = 0\n total_original_chars = 0\n total_compressed_chars = 0\n total_stages_run = 0\n\n for part in parts:\n if isinstance(part, dict) and part.get(\"type\") == \"text\":\n text = part.get(\"text\", \"\")\n result = self.compress(text=text, role=role)\n compressed_parts.append({**part, \"text\": result[\"compressed\"]})\n all_markers.extend(result[\"markers\"])\n all_warnings.extend(result[\"warnings\"])\n total_original_tokens += estimate_tokens(text)\n total_compressed_tokens += estimate_tokens(result[\"compressed\"])\n total_original_chars += len(text)\n total_compressed_chars += len(result[\"compressed\"])\n total_stages_run += result[\"stats\"].get(\"stages_run\", 0)\n else:\n # Non-text part — pass through unchanged.\n compressed_parts.append(part)\n\n elapsed_ms = (time.monotonic() - t0) * 1000\n out_msg = {**original_msg, \"content\": compressed_parts}\n\n msg_stats = {\n \"role\": role,\n \"original_tokens\": total_original_tokens,\n \"compressed_tokens\": total_compressed_tokens,\n \"original_chars\": total_original_chars,\n \"compressed_chars\": total_compressed_chars,\n \"reduction_pct\": _reduction_pct(total_original_tokens, total_compressed_tokens),\n \"timing_ms\": round(elapsed_ms, 2),\n \"stages_run\": total_stages_run, # summed across text parts\n }\n\n return out_msg, msg_stats, all_markers, all_warnings\n\n # ------------------------------------------------------------------\n # Introspection\n # ------------------------------------------------------------------\n\n @property\n def pipeline(self) -> FusionPipeline:\n \"\"\"The underlying FusionPipeline instance.\"\"\"\n return self._pipeline\n\n @property\n def rewind_store(self) -> RewindStore | None:\n \"\"\"The RewindStore instance (None if enable_rewind=False).\"\"\"\n return 
self._rewind_store\n\n @property\n def stage_names(self) -> list[str]:\n \"\"\"Ordered list of stage names in the pipeline.\"\"\"\n return [t.name for t in self._pipeline.transforms]\n\n\n# ---------------------------------------------------------------------------\n# Stats helpers\n# ---------------------------------------------------------------------------\n\ndef _reduction_pct(original: int, compressed: int) -> float:\n if original == 0:\n return 0.0\n return round((original - compressed) / original * 100, 2)\n\n\ndef _build_stats(\n original_text: str,\n compressed_text: str,\n pipeline_result: Any,\n) -> dict[str, Any]:\n \"\"\"Build a rich stats dict from a single-text pipeline result.\"\"\"\n original_tokens = estimate_tokens(original_text)\n compressed_tokens = estimate_tokens(compressed_text)\n\n stages_run = sum(\n 1 for step in pipeline_result.steps if not step.result.skipped\n )\n stages_skipped = sum(\n 1 for step in pipeline_result.steps if step.result.skipped\n )\n\n per_stage = [\n {\n \"name\": step.transform_name,\n \"skipped\": step.result.skipped,\n \"original_tokens\": step.result.original_tokens,\n \"compressed_tokens\": step.result.compressed_tokens,\n \"timing_ms\": round(step.result.timing_ms, 3),\n }\n for step in pipeline_result.steps\n ]\n\n return {\n \"original_tokens\": original_tokens,\n \"compressed_tokens\": compressed_tokens,\n \"original_chars\": len(original_text),\n \"compressed_chars\": len(compressed_text),\n \"reduction_pct\": _reduction_pct(original_tokens, compressed_tokens),\n \"total_timing_ms\": round(pipeline_result.total_timing_ms, 3),\n \"stages_run\": stages_run,\n \"stages_skipped\": stages_skipped,\n \"per_stage\": per_stage,\n }\n\n\ndef _empty_stats(text: str) -> dict[str, Any]:\n tokens = estimate_tokens(text)\n return {\n \"original_tokens\": tokens,\n \"compressed_tokens\": tokens,\n \"original_chars\": len(text),\n \"compressed_chars\": len(text),\n \"reduction_pct\": 0.0,\n \"total_timing_ms\": 0.0,\n 
\"stages_run\": 0,\n \"stages_skipped\": 0,\n \"per_stage\": [],\n }\n\n\ndef _empty_aggregate_stats() -> dict[str, Any]:\n return {\n \"original_tokens\": 0,\n \"compressed_tokens\": 0,\n \"original_chars\": 0,\n \"compressed_chars\": 0,\n \"reduction_pct\": 0.0,\n \"total_timing_ms\": 0.0,\n \"message_count\": 0,\n }\n\n\ndef _aggregate_stats(\n original_tokens: int,\n compressed_tokens: int,\n original_chars: int,\n compressed_chars: int,\n timing_ms: float,\n message_count: int,\n) -> dict[str, Any]:\n return {\n \"original_tokens\": original_tokens,\n \"compressed_tokens\": compressed_tokens,\n \"original_chars\": original_chars,\n \"compressed_chars\": compressed_chars,\n \"reduction_pct\": _reduction_pct(original_tokens, compressed_tokens),\n \"total_timing_ms\": round(timing_ms, 3),\n \"message_count\": message_count,\n }\n\n\n# ---------------------------------------------------------------------------\n# v8: Conversation-level compaction API\n# ---------------------------------------------------------------------------\n\nfrom claw_compactor.fusion.tiered_compaction import (\n CompactionLevel,\n CircuitBreaker,\n FileAccessTracker,\n compact as _tiered_compact,\n)\n\n# Re-export for convenience.\nFusionEngine.CompactionLevel = CompactionLevel # type: ignore[attr-defined]\n\n\ndef _compact_messages_method(\n self,\n messages: list[dict[str, Any]],\n token_budget: int = 200_000,\n level: CompactionLevel | None = None,\n) -> tuple[list[dict[str, Any]], dict[str, Any]]:\n \"\"\"Apply tiered compaction to a message list.\n\n Combines tool result budgeting, conversation summarization, and\n per-message Fusion Pipeline compression based on context pressure.\n\n Parameters\n ----------\n messages:\n OpenAI-format message list.\n token_budget:\n Context window size in tokens.\n level:\n Force a specific compaction level. 
If None, auto-detected.\n\n Returns\n -------\n (compacted_messages, stats)\n \"\"\"\n if not hasattr(self, \"_circuit_breaker\"):\n self._circuit_breaker = CircuitBreaker()\n if not hasattr(self, \"_file_tracker\"):\n self._file_tracker = FileAccessTracker()\n\n return _tiered_compact(\n messages=messages,\n token_budget=token_budget,\n circuit_breaker=self._circuit_breaker,\n file_tracker=self._file_tracker,\n fusion_engine=self,\n level_override=level,\n )\n\n\n# Monkey-patch onto FusionEngine (avoids modifying the class definition above).\nFusionEngine.compact_messages = _compact_messages_method # type: ignore[attr-defined]\n","content_type":"text/x-python; charset=utf-8","language":"python","size":24229,"content_sha256":"699f0919ca3290975d7301897c5a91847f70dba482fe5c88d5ea1f870c60039a"},{"filename":"scripts/lib/fusion/ionizer.py","content":"\"\"\"Ionizer — JSON/structured data compression via statistical sampling.\n\nFor large JSON arrays (common in tool call responses), Ionizer performs\nintelligent sampling rather than brute-force truncation:\n\n 1. Schema discovery — identifies shared keys across dict items\n 2. Error preservation — items containing error/exception signals are\n always kept, regardless of sampling\n 3. Statistical sampling — keeps front/back boundary items plus a\n representative sample from the middle\n 4. Reversible storage — full original array is stored in RewindStore\n with a hash marker, so the LLM can retrieve it via tool call\n\nAchieves 81.9% compression on 100-item JSON arrays while preserving\nstructural understanding and all error cases.\n\nPart of claw-compactor v7. 
License: MIT.\n\"\"\"\nfrom __future__ import annotations\n\nimport json\nimport random\nfrom typing import Any\n\nfrom claw_compactor.fusion.base import FusionStage, FusionContext, FusionResult\nfrom claw_compactor.rewind.marker import embed_marker\nfrom claw_compactor.rewind.store import RewindStore\nfrom claw_compactor.tokens import estimate_tokens\n\n# Minimum array length before sampling is considered worthwhile.\n_MIN_ARRAY_LEN = 5\n\n# Number of items to keep from the front and back of a dict array.\n_FRONT_BACK_K = 3\n\n# Maximum array length before we start sampling dict arrays.\n_SAMPLE_THRESHOLD = 20\n\n# For large string arrays, keep at most this many unique entries.\n_MAX_UNIQUE_STRINGS = 30\n\n# Keywords that flag an item as an error record that must be preserved.\n_ERROR_KEYWORDS = frozenset({\"error\", \"exception\", \"failed\", \"failure\", \"fatal\"})\n\n\ndef _item_is_error(item: dict) -> bool:\n \"\"\"Return True if any string value or any key in *item* contains an error keyword.\"\"\"\n # Only string values are inspected; numbers, booleans and nested values are ignored.\n for v in item.values():\n if isinstance(v, str):\n lowered = v.lower()\n if any(kw in lowered for kw in _ERROR_KEYWORDS):\n return True\n # Also check keys\n for k in item:\n if any(kw in k.lower() for kw in _ERROR_KEYWORDS):\n return True\n return False\n\n\ndef _detect_id_fields(items: list[dict]) -> list[str]:\n \"\"\"Heuristically detect ID-like field names in a list of dicts.\"\"\"\n if not items:\n return []\n candidate_keys: list[str] = []\n first = items[0]\n for k in first:\n lower_k = k.lower()\n if lower_k in {\"id\", \"uuid\", \"key\", \"name\", \"index\", \"seq\", \"sequence\", \"num\", \"number\"}:\n candidate_keys.append(k)\n elif lower_k.endswith(\"_id\") or lower_k.endswith(\"_key\") or lower_k.endswith(\"_uuid\"):\n candidate_keys.append(k)\n return candidate_keys\n\n\ndef _discover_schema(items: list[dict]) -> list[str]:\n \"\"\"Return the union of all keys seen across 
items.\"\"\"\n seen: dict[str, None] = {}\n for item in items:\n for k in item:\n seen[k] = None\n return list(seen.keys())\n\n\ndef _sample_dict_array(items: list[dict], k: int) -> list[dict]:\n \"\"\"\n Sample a dict array:\n 1. Always keep error items.\n 2. Keep first K + last K items.\n 3. Fill remaining budget with uniform random sample from the middle.\n \"\"\"\n n = len(items)\n error_indices = {i for i, item in enumerate(items) if _item_is_error(item)}\n front_indices = set(range(min(k, n)))\n back_indices = set(range(max(0, n - k), n))\n\n protected = error_indices | front_indices | back_indices\n middle_indices = [i for i in range(n) if i not in protected]\n\n # Determine how many middle items to sample.\n total_budget = min(n, k * 2 + len(error_indices) + max(0, _SAMPLE_THRESHOLD - k * 2))\n middle_budget = max(0, total_budget - len(protected))\n\n if middle_indices and middle_budget > 0:\n sampled_middle = sorted(random.sample(middle_indices, min(middle_budget, len(middle_indices))))\n else:\n sampled_middle = []\n\n kept_indices = sorted(protected | set(sampled_middle))\n return [items[i] for i in kept_indices]\n\n\ndef _compress_dict_array(items: list[dict], rewind_store: RewindStore | None) -> tuple[str, str, int, int]:\n \"\"\"\n Compress a JSON array of dicts. 
Returns (original_json, compressed_json,\n original_count, compressed_count).\n \"\"\"\n original_json = json.dumps(items, indent=2)\n schema = _discover_schema(items)\n id_fields = _detect_id_fields(items)\n sampled = _sample_dict_array(items, _FRONT_BACK_K)\n\n compressed_json = json.dumps(sampled, indent=2)\n\n schema_comment = f\"// Schema fields: {', '.join(schema)}\"\n if id_fields:\n schema_comment += f\" | ID fields: {', '.join(id_fields)}\"\n\n header = f\"{schema_comment}\\n// Showing {len(sampled)} of {len(items)} items\"\n result_text = f\"{header}\\n{compressed_json}\"\n\n if rewind_store is not None:\n hash_id = rewind_store.store(\n original=original_json,\n compressed=result_text,\n original_tokens=estimate_tokens(original_json),\n compressed_tokens=estimate_tokens(result_text),\n )\n result_text = embed_marker(result_text, len(items), len(sampled), hash_id)\n\n return original_json, result_text, len(items), len(sampled)\n\n\ndef _compress_string_array(items: list[str], rewind_store: RewindStore | None) -> tuple[str, str, int, int]:\n \"\"\"\n Compress a JSON array of strings via deduplication + sampling.\n \"\"\"\n original_json = json.dumps(items, indent=2)\n\n # Deduplicate while preserving order.\n seen: dict[str, None] = {}\n for s in items:\n seen[s] = None\n unique = list(seen.keys())\n # Count duplicates before sampling so the header does not report sampled-out items as duplicates.\n duplicates_removed = len(items) - len(unique)\n\n if len(unique) > _MAX_UNIQUE_STRINGS:\n kept = unique[:_FRONT_BACK_K] + unique[-_FRONT_BACK_K:]\n middle = unique[_FRONT_BACK_K: len(unique) - _FRONT_BACK_K]\n budget = max(0, _MAX_UNIQUE_STRINGS - _FRONT_BACK_K * 2)\n if middle and budget > 0:\n kept += random.sample(middle, min(budget, len(middle)))\n # Filter rather than reorder, so surviving entries keep their original order.\n kept_set = set(kept)\n unique = [u for u in unique if u in kept_set]\n\n compressed_json = json.dumps(unique, indent=2)\n header = f\"// {duplicates_removed} duplicates removed. 
Showing {len(unique)} of {len(items)} items.\"\n result_text = f\"{header}\\n{compressed_json}\"\n\n if rewind_store is not None:\n hash_id = rewind_store.store(\n original=original_json,\n compressed=result_text,\n original_tokens=estimate_tokens(original_json),\n compressed_tokens=estimate_tokens(result_text),\n )\n result_text = embed_marker(result_text, len(items), len(unique), hash_id)\n\n return original_json, result_text, len(items), len(unique)\n\n\nclass Ionizer(FusionStage):\n \"\"\"JSON array statistical sampling compressor.\"\"\"\n\n name = \"ionizer\"\n order = 15\n\n def __init__(self, rewind_store: RewindStore | None = None) -> None:\n self._rewind_store = rewind_store\n\n def should_apply(self, ctx: FusionContext) -> bool:\n return ctx.content_type == \"json\"\n\n def apply(self, ctx: FusionContext) -> FusionResult:\n original_tokens = estimate_tokens(ctx.content)\n content = ctx.content.strip()\n\n # Attempt to parse the JSON.\n try:\n data: Any = json.loads(content)\n except json.JSONDecodeError as exc:\n return FusionResult(\n content=ctx.content,\n original_tokens=original_tokens,\n compressed_tokens=original_tokens,\n warnings=[f\"Ionizer: JSON parse error — {exc}\"],\n skipped=True,\n )\n\n # Only operate on arrays.\n if not isinstance(data, list):\n return FusionResult(\n content=ctx.content,\n original_tokens=original_tokens,\n compressed_tokens=original_tokens,\n skipped=True,\n )\n\n # Skip small arrays.\n if len(data) \u003c _MIN_ARRAY_LEN:\n return FusionResult(\n content=ctx.content,\n original_tokens=original_tokens,\n compressed_tokens=original_tokens,\n skipped=True,\n )\n\n markers: list[str] = []\n\n # Dispatch based on element type.\n if data and all(isinstance(item, dict) for item in data):\n _, compressed, orig_count, comp_count = _compress_dict_array(data, self._rewind_store)\n markers.append(f\"ionizer:dict_array:{orig_count}->{comp_count}\")\n elif data and all(isinstance(item, str) for item in data):\n _, compressed, 
orig_count, comp_count = _compress_string_array(data, self._rewind_store)\n markers.append(f\"ionizer:string_array:{orig_count}->{comp_count}\")\n else:\n # Mixed or unsupported array — skip.\n return FusionResult(\n content=ctx.content,\n original_tokens=original_tokens,\n compressed_tokens=original_tokens,\n skipped=True,\n )\n\n compressed_tokens = estimate_tokens(compressed)\n return FusionResult(\n content=compressed,\n original_tokens=original_tokens,\n compressed_tokens=compressed_tokens,\n markers=markers,\n )\n","content_type":"text/x-python; charset=utf-8","language":"python","size":9344,"content_sha256":"afd9d7f6ec65605e734d0b408ad1d3ed141894f74fc86de6f1841c40e8a9004a"},{"filename":"scripts/lib/fusion/llm_summarizer.py","content":"\"\"\"LLMSummarizer — API-based conversation summarization.\n\nInspired by Claude Code's AutoCompact which calls the LLM to generate a 20K\ntoken summary of compacted conversation history. This module provides an\noptional LLM-powered summarizer that produces higher-quality summaries than\nthe deterministic extractor in conversation_summarizer.py.\n\nUsage::\n\n from claw_compactor.fusion.llm_summarizer import LLMSummarizer\n\n summarizer = LLMSummarizer(api_key=\"sk-...\", model=\"claude-sonnet-4-20250514\")\n messages, stats = summarizer.summarize(messages, token_budget=200_000)\n\nFalls back to deterministic summarization if:\n - No API key is provided\n - The LLM call fails\n - The LLM response is empty or too large\n\nPart of claw-compactor v8. 
License: MIT.\n\"\"\"\nfrom __future__ import annotations\n\nimport json\nimport logging\nimport time\nfrom typing import Any, Callable, Optional, Protocol\n\nfrom claw_compactor.tokens import estimate_tokens\nfrom claw_compactor.fusion.conversation_summarizer import (\n summarize_conversation as _deterministic_summarize,\n _split_messages,\n DEFAULT_PRESERVE_RECENT_TURNS,\n DEFAULT_TRIGGER_PCT,\n MAX_SUMMARY_TOKENS,\n)\n\nlogger = logging.getLogger(__name__)\n\n# Default model for summarization.\nDEFAULT_MODEL = \"claude-sonnet-4-20250514\"\n\n# System prompt for the summarizer LLM call.\n_SUMMARIZER_SYSTEM_PROMPT = \"\"\"\\\nYou are a conversation summarizer. Summarize the conversation history into a \\\nstructured, concise summary that preserves:\n1. All user instructions and requirements\n2. Key decisions made\n3. Files and functions referenced\n4. Errors encountered and their resolutions\n5. Current state of the task\n\nOutput format: Use markdown headers for each section. Be concise but complete.\nDo NOT include any preamble or explanation — output only the summary.\nTarget length: {target_tokens} tokens maximum.\"\"\"\n\n# Budget for the summarizer LLM call itself.\nSUMMARIZER_MAX_INPUT_TOKENS = 100_000\nSUMMARIZER_MAX_OUTPUT_TOKENS = 20_000\n\n\nclass LLMClient(Protocol):\n \"\"\"Protocol for LLM API clients.\"\"\"\n\n def create_message(\n self,\n messages: list[dict[str, str]],\n system: str,\n model: str,\n max_tokens: int,\n ) -> str:\n \"\"\"Send a chat completion request and return the text response.\"\"\"\n ...\n\n\nclass SimpleLLMClient:\n \"\"\"Minimal LLM client that calls an HTTP API.\n\n Supports both Anthropic and OpenAI-compatible endpoints.\n \"\"\"\n\n def __init__(\n self,\n api_key: str,\n base_url: str = \"https://api.anthropic.com\",\n provider: str = \"anthropic\",\n ) -> None:\n self.api_key = api_key\n self.base_url = base_url.rstrip(\"/\")\n self.provider = provider\n\n def create_message(\n self,\n messages: list[dict[str, str]],\n 
system: str,\n model: str,\n max_tokens: int,\n ) -> str:\n \"\"\"Call the LLM API and return the text response.\"\"\"\n import urllib.request\n import urllib.error\n\n if self.provider == \"anthropic\":\n url = f\"{self.base_url}/v1/messages\"\n headers = {\n \"Content-Type\": \"application/json\",\n \"x-api-key\": self.api_key,\n \"anthropic-version\": \"2023-06-01\",\n }\n body = {\n \"model\": model,\n \"max_tokens\": max_tokens,\n \"system\": system,\n \"messages\": messages,\n }\n else:\n # OpenAI-compatible\n url = f\"{self.base_url}/v1/chat/completions\"\n headers = {\n \"Content-Type\": \"application/json\",\n \"Authorization\": f\"Bearer {self.api_key}\",\n }\n body = {\n \"model\": model,\n \"max_tokens\": max_tokens,\n \"messages\": [{\"role\": \"system\", \"content\": system}] + messages,\n }\n\n data = json.dumps(body).encode(\"utf-8\")\n req = urllib.request.Request(url, data=data, headers=headers, method=\"POST\")\n\n try:\n with urllib.request.urlopen(req, timeout=60) as resp:\n result = json.loads(resp.read().decode(\"utf-8\"))\n except (urllib.error.URLError, urllib.error.HTTPError) as exc:\n raise RuntimeError(f\"LLM API call failed: {exc}\") from exc\n\n if self.provider == \"anthropic\":\n content_blocks = result.get(\"content\", [])\n return \"\".join(\n block.get(\"text\", \"\") for block in content_blocks\n if block.get(\"type\") == \"text\"\n )\n else:\n choices = result.get(\"choices\", [])\n if choices:\n return choices[0].get(\"message\", {}).get(\"content\", \"\")\n return \"\"\n\n\nclass LLMSummarizer:\n \"\"\"LLM-powered conversation summarizer with deterministic fallback.\n\n Parameters\n ----------\n api_key:\n API key for the LLM service. 
If None, always falls back to\n deterministic summarization.\n model:\n Model to use for summarization.\n base_url:\n API base URL.\n provider:\n \"anthropic\" or \"openai\".\n client:\n Optional pre-built LLM client (overrides api_key/base_url/provider).\n fallback_to_deterministic:\n If True (default), falls back to deterministic summarization on failure.\n max_output_tokens:\n Maximum tokens for the LLM response.\n \"\"\"\n\n def __init__(\n self,\n api_key: Optional[str] = None,\n model: str = DEFAULT_MODEL,\n base_url: str = \"https://api.anthropic.com\",\n provider: str = \"anthropic\",\n client: Optional[LLMClient] = None,\n fallback_to_deterministic: bool = True,\n max_output_tokens: int = SUMMARIZER_MAX_OUTPUT_TOKENS,\n ) -> None:\n self.model = model\n self.fallback_to_deterministic = fallback_to_deterministic\n self.max_output_tokens = max_output_tokens\n\n if client is not None:\n self._client: Optional[LLMClient] = client\n elif api_key:\n self._client = SimpleLLMClient(\n api_key=api_key, base_url=base_url, provider=provider\n )\n else:\n self._client = None\n\n @property\n def has_llm(self) -> bool:\n \"\"\"Whether an LLM client is configured.\"\"\"\n return self._client is not None\n\n def summarize(\n self,\n messages: list[dict[str, Any]],\n token_budget: int = 200_000,\n trigger_pct: float = DEFAULT_TRIGGER_PCT,\n preserve_recent_turns: int = DEFAULT_PRESERVE_RECENT_TURNS,\n ) -> tuple[list[dict[str, Any]], dict[str, Any]]:\n \"\"\"Summarize conversation using LLM, with deterministic fallback.\n\n Parameters\n ----------\n messages:\n OpenAI-format message list.\n token_budget:\n Context window size in tokens.\n trigger_pct:\n Fraction of token_budget at which summarization activates.\n preserve_recent_turns:\n Number of recent turn pairs to keep verbatim.\n\n Returns\n -------\n (new_messages, stats)\n \"\"\"\n total_tokens = sum(\n estimate_tokens(\n m.get(\"content\", \"\") if isinstance(m.get(\"content\"), str)\n else str(m.get(\"content\", 
\"\"))\n )\n for m in messages\n )\n threshold = int(token_budget * trigger_pct)\n\n if total_tokens \u003c threshold:\n return messages, {\n \"method\": \"none\",\n \"triggered\": False,\n \"total_tokens_before\": total_tokens,\n \"total_tokens_after\": total_tokens,\n }\n\n # Split messages.\n system_msgs, body_msgs, recent_msgs = _split_messages(\n messages, preserve_recent_turns\n )\n\n if len(body_msgs) \u003c 2:\n return messages, {\n \"method\": \"none\",\n \"triggered\": False,\n \"reason\": \"too_few_messages\",\n \"total_tokens_before\": total_tokens,\n \"total_tokens_after\": total_tokens,\n }\n\n # Try LLM summarization first.\n if self._client is not None:\n try:\n summary, llm_stats = self._llm_summarize(body_msgs)\n boundary_msg = self._make_boundary(\n summary, len(body_msgs), total_tokens, method=\"llm\"\n )\n new_messages = system_msgs + [boundary_msg] + recent_msgs\n new_total = sum(\n estimate_tokens(\n m.get(\"content\", \"\") if isinstance(m.get(\"content\"), str)\n else str(m.get(\"content\", \"\"))\n )\n for m in new_messages\n )\n return new_messages, {\n \"method\": \"llm\",\n \"model\": self.model,\n \"triggered\": True,\n \"turns_summarized\": len(body_msgs),\n \"total_tokens_before\": total_tokens,\n \"total_tokens_after\": new_total,\n **llm_stats,\n }\n except Exception as exc:\n logger.warning(\"LLM summarization failed: %s\", exc)\n if not self.fallback_to_deterministic:\n return messages, {\n \"method\": \"llm_failed\",\n \"triggered\": True,\n \"error\": str(exc),\n \"total_tokens_before\": total_tokens,\n \"total_tokens_after\": total_tokens,\n }\n # Fall through to deterministic.\n\n # Deterministic fallback.\n return _deterministic_summarize(\n messages,\n token_budget=token_budget,\n trigger_pct=trigger_pct,\n preserve_recent_turns=preserve_recent_turns,\n )\n\n def _llm_summarize(\n self, body_msgs: list[dict[str, Any]]\n ) -> tuple[str, dict[str, Any]]:\n \"\"\"Call the LLM to summarize conversation body messages.\"\"\"\n 
assert self._client is not None\n\n # Build the conversation text for the LLM.\n conversation_parts: list[str] = []\n total_input_tokens = 0\n\n for msg in body_msgs:\n role = msg.get(\"role\", \"unknown\")\n content = msg.get(\"content\", \"\")\n if not isinstance(content, str):\n content = str(content)\n line = f\"[{role}]: {content}\"\n line_tokens = estimate_tokens(line)\n if total_input_tokens + line_tokens > SUMMARIZER_MAX_INPUT_TOKENS:\n conversation_parts.append(\"[...older messages truncated...]\")\n break\n conversation_parts.append(line)\n total_input_tokens += line_tokens\n\n system_prompt = _SUMMARIZER_SYSTEM_PROMPT.format(\n target_tokens=self.max_output_tokens\n )\n\n t0 = time.monotonic()\n summary = self._client.create_message(\n messages=[{\"role\": \"user\", \"content\": \"\\n\\n\".join(conversation_parts)}],\n system=system_prompt,\n model=self.model,\n max_tokens=self.max_output_tokens,\n )\n elapsed_ms = (time.monotonic() - t0) * 1000\n\n # Enforce MAX_SUMMARY_TOKENS.\n summary_tokens = estimate_tokens(summary)\n if summary_tokens > MAX_SUMMARY_TOKENS:\n lines = summary.split(\"\\n\")\n while estimate_tokens(\"\\n\".join(lines)) > MAX_SUMMARY_TOKENS and len(lines) > 5:\n lines.pop()\n summary = \"\\n\".join(lines) + \"\\n[...truncated]\"\n\n stats = {\n \"llm_input_tokens\": total_input_tokens,\n \"llm_output_tokens\": estimate_tokens(summary),\n \"llm_latency_ms\": round(elapsed_ms, 2),\n }\n return summary, stats\n\n def _make_boundary(\n self,\n summary: str,\n turns_summarized: int,\n original_tokens: int,\n method: str = \"llm\",\n ) -> dict[str, Any]:\n \"\"\"Create a compact_boundary system message.\"\"\"\n return {\n \"role\": \"system\",\n \"content\": json.dumps({\n \"type\": \"system\",\n \"subtype\": \"compact_boundary\",\n \"summary\": summary,\n \"compactMetadata\": {\n \"turnsSummarized\": turns_summarized,\n \"originalTokens\": original_tokens,\n \"compressedTokens\": estimate_tokens(summary),\n \"preservedSegment\": True,\n 
\"method\": method,\n },\n }),\n }\n","content_type":"text/x-python; charset=utf-8","language":"python","size":12608,"content_sha256":"ce11aa5e0a9956863b0a99b527ec8f438a13cd423f8b5d3a67c57f96e8dea714"},{"filename":"scripts/lib/fusion/log_crunch.py","content":"\"\"\"LogCrunch — Build/test log compression FusionStage.\n\nPreserves ERROR/WARN/FATAL lines, stack traces, and failure-related lines.\nCompresses repeated INFO/DEBUG lines to occurrence summaries.\nNormalises timestamps to relative deltas.\n\nPart of claw-compactor. License: MIT.\n\"\"\"\nfrom __future__ import annotations\n\nimport re\nfrom typing import NamedTuple\n\nfrom claw_compactor.fusion.base import FusionStage, FusionContext, FusionResult\nfrom claw_compactor.tokens import estimate_tokens\n\n# ---------------------------------------------------------------------------\n# Regex patterns\n# ---------------------------------------------------------------------------\n\n# Standard log level prefix: optional timestamp, optional logger name, level.\n_LEVEL_RE = re.compile(\n r'(?i)\\b(ERROR|ERR|FATAL|CRITICAL|WARN(?:ING)?|INFO|DEBUG|TRACE|VERBOSE)\\b'\n)\n\n_ERROR_LEVEL_RE = re.compile(r'(?i)\\b(ERROR|ERR|FATAL|CRITICAL)\\b')\n_WARN_LEVEL_RE = re.compile(r'(?i)\\bWARN(?:ING)?\\b')\n_INFO_DEBUG_RE = re.compile(r'(?i)\\b(INFO|DEBUG|TRACE|VERBOSE)\\b')\n\n# Lines that always matter regardless of log level.\n_IMPORTANT_CONTENT_RE = re.compile(\n r'(?i)(failed|failure|exception|error|assert|panic|abort|traceback|caused by)',\n)\n\n# Stack-trace indicators: indented lines or common stack frame patterns.\n_STACK_INDENT_RE = re.compile(r'^(\\s{2,}|\\t)')\n_STACK_FRAME_RE = re.compile(\n r'(?:'\n r'^\\s+at\\s+' # Java/JS: \" at ...\"\n r'|^\\s+File\\s+\"' # Python: ' File \"...'\n r'|^\\s+in\\s+\\w' # Go: ' in funcName'\n r'|\\bTraceback\\b' # Python: 'Traceback (most recent call last):'\n r'|\\bgoroutine\\s+\\d+\\b' # Go goroutine dump\n r')',\n re.IGNORECASE,\n)\n\n# Common timestamp formats — we capture the 
group so we can normalise.\n_TIMESTAMP_RE = re.compile(\n r'(?:'\n r'\\d{4}-\\d{2}-\\d{2}[T ]\\d{2}:\\d{2}:\\d{2}(?:\\.\\d+)?(?:Z|[+-]\\d{2}:?\\d{2})?' # ISO 8601\n r'|\\d{2}:\\d{2}:\\d{2}(?:\\.\\d+)?' # HH:MM:SS\n r'|\\d{10,13}' # Unix epoch (seconds or ms)\n r')'\n)\n\n# How many trailing non-important INFO/DEBUG lines to keep as \"context\".\n_TAIL_CONTEXT = 2\n# Minimum repetition count before we collapse.\n_MIN_REPEAT = 3\n\n\nclass _LineInfo(NamedTuple):\n raw: str\n important: bool # must always be preserved\n in_trace: bool # part of a stack-trace block\n level: str # \"error\" | \"warn\" | \"info_debug\" | \"other\"\n norm: str # normalised version of the line (timestamps replaced)\n\n\ndef _classify_line(line: str) -> _LineInfo:\n \"\"\"Classify a single log line.\"\"\"\n level_match = _LEVEL_RE.search(line)\n level_str = level_match.group(1).upper() if level_match else \"\"\n\n is_error = bool(_ERROR_LEVEL_RE.search(line))\n is_warn = bool(_WARN_LEVEL_RE.search(line))\n is_info_debug = bool(_INFO_DEBUG_RE.search(line)) and not is_error and not is_warn\n is_important_content = bool(_IMPORTANT_CONTENT_RE.search(line))\n is_stack = bool(_STACK_FRAME_RE.search(line))\n\n important = is_error or is_warn or is_important_content or is_stack\n\n if is_error:\n level = \"error\"\n elif is_warn:\n level = \"warn\"\n elif is_info_debug:\n level = \"info_debug\"\n else:\n level = \"other\"\n\n norm = _TIMESTAMP_RE.sub(\"\u003cTS>\", line)\n\n return _LineInfo(\n raw=line,\n important=important,\n in_trace=is_stack,\n level=level,\n norm=norm,\n )\n\n\ndef _is_stack_continuation(line: str) -> bool:\n \"\"\"Return True if this line looks like it belongs inside a stack trace.\"\"\"\n return bool(_STACK_FRAME_RE.search(line) or _STACK_INDENT_RE.match(line))\n\n\ndef _normalise_timestamps(lines: list[str]) -> list[str]:\n \"\"\"Replace absolute timestamps with relative deltas (+Xs) where possible.\"\"\"\n # We do a best-effort pass: find the first ISO timestamp and use 
it as t0.\n first_ts: float | None = None\n result: list[str] = []\n\n for line in lines:\n m = re.search(\n r'(\\d{4}-\\d{2}-\\d{2}[T ](\\d{2}):(\\d{2}):(\\d{2})(?:\\.(\\d+))?)',\n line,\n )\n if m:\n try:\n h, mn, s = int(m.group(2)), int(m.group(3)), int(m.group(4))\n frac = float(\"0.\" + m.group(5)) if m.group(5) else 0.0\n ts = h * 3600 + mn * 60 + s + frac\n if first_ts is None:\n first_ts = ts\n delta = ts - first_ts\n new_line = line[: m.start()] + f\"[+{delta:.3f}s]\" + line[m.end():]\n result.append(new_line)\n continue\n except (ValueError, IndexError):\n pass\n result.append(line)\n\n return result\n\n\ndef _compress_log(lines: list[str]) -> list[str]:\n \"\"\"\n Core compression logic:\n - Always keep important lines (error/warn/important content/stack traces).\n - Collapse runs of repeated info/debug lines.\n - Keep first + last occurrence of repeated patterns.\n \"\"\"\n classified = [_classify_line(ln) for ln in lines]\n output: list[str] = []\n\n # Track whether we are inside a stack-trace block.\n in_trace = False\n trace_buffer: list[str] = []\n\n # Track runs of info/debug lines with the same normalised form.\n run_norm: str | None = None\n run_lines: list[str] = []\n\n def flush_run() -> None:\n nonlocal run_norm, run_lines\n if not run_lines:\n return\n if len(run_lines) >= _MIN_REPEAT:\n output.append(run_lines[0])\n output.append(f\"[... 
repeated {len(run_lines) - 2} more times ...]\")\n output.append(run_lines[-1])\n else:\n output.extend(run_lines)\n run_norm = None\n run_lines = []\n\n def flush_trace() -> None:\n nonlocal in_trace, trace_buffer\n output.extend(trace_buffer)\n in_trace = False\n trace_buffer = []\n\n i = 0\n while i \u003c len(classified):\n info = classified[i]\n line = info.raw\n\n # Detect start of stack trace block.\n if not in_trace and _STACK_FRAME_RE.search(line):\n flush_run()\n in_trace = True\n trace_buffer = [line]\n i += 1\n # Collect continuation lines.\n while i \u003c len(classified):\n next_info = classified[i]\n if _is_stack_continuation(next_info.raw) or next_info.in_trace:\n trace_buffer.append(next_info.raw)\n i += 1\n else:\n break\n flush_trace()\n continue\n\n # Important line — always keep.\n if info.important:\n flush_run()\n output.append(line)\n i += 1\n continue\n\n # Info/debug line — try to collapse repetitions.\n if info.level == \"info_debug\":\n if info.norm == run_norm:\n run_lines.append(line)\n else:\n flush_run()\n run_norm = info.norm\n run_lines = [line]\n i += 1\n continue\n\n # Other lines (no level detected): keep them but break any run.\n flush_run()\n output.append(line)\n i += 1\n\n flush_run()\n return output\n\n\nclass LogCrunch(FusionStage):\n \"\"\"Build/test log compression. 
Preserves errors, warnings and stack traces.\"\"\"\n\n name = \"log_crunch\"\n order = 16\n\n def __init__(self, normalise_timestamps: bool = True) -> None:\n self._normalise_timestamps = normalise_timestamps\n\n def should_apply(self, ctx: FusionContext) -> bool:\n return ctx.content_type == \"log\"\n\n def apply(self, ctx: FusionContext) -> FusionResult:\n original_tokens = estimate_tokens(ctx.content)\n lines = ctx.content.splitlines()\n\n if self._normalise_timestamps:\n lines = _normalise_timestamps(lines)\n\n compressed_lines = _compress_log(lines)\n compressed = \"\\n\".join(compressed_lines)\n compressed_tokens = estimate_tokens(compressed)\n\n original_count = len(lines)\n compressed_count = len(compressed_lines)\n markers = [f\"log_crunch:{original_count}->{compressed_count} lines\"]\n\n return FusionResult(\n content=compressed,\n original_tokens=original_tokens,\n compressed_tokens=compressed_tokens,\n markers=markers,\n )\n","content_type":"text/x-python; charset=utf-8","language":"python","size":8316,"content_sha256":"f50e70cdd3152aa6d6cfe29f35bcd2a140c4d8f1ce05c6f084ce66d1a396a1e6"},{"filename":"scripts/lib/fusion/neurosyntax.py","content":"\"\"\"Neurosyntax — AST-aware code compression FusionStage.\n\nUses tree-sitter for multi-language AST parsing when available; falls back to\nsafe regex-based compression that strips comments and normalizes whitespace\nwithout touching code semantics.\n\nCritical safety rule: identifier names are NEVER shortened. Class names,\nfunction names, and variable names are semantic anchors that LLMs use to\nunderstand code context. Shortening them destroys comprehension and causes\ndownstream task failures (validated on SWE-bench).\n\nSupports: Python, JavaScript, TypeScript, Go, Rust, Java, C, C++, Ruby,\nPHP, Swift, Kotlin, Scala, Bash, R, Perl.\n\nPart of claw-compactor v7. 
License: MIT.\n\"\"\"\nfrom __future__ import annotations\n\nimport re\nfrom typing import Any\n\nfrom claw_compactor.fusion.base import FusionStage, FusionContext, FusionResult\nfrom claw_compactor.tokens import estimate_tokens\n\n# ---------------------------------------------------------------------------\n# Optional tree-sitter import\n# ---------------------------------------------------------------------------\n_TREE_SITTER_AVAILABLE = False\ntry:\n import tree_sitter_language_pack as tslp # type: ignore[import]\n _TREE_SITTER_AVAILABLE = True\nexcept ImportError:\n pass\n\n\n# ---------------------------------------------------------------------------\n# Comment patterns per language family\n# ---------------------------------------------------------------------------\n_HASH_COMMENT_LANGS = {\"python\", \"ruby\", \"bash\", \"sh\", \"perl\", \"r\"}\n_SLASH_COMMENT_LANGS = {\"javascript\", \"typescript\", \"java\", \"go\", \"rust\", \"c\", \"cpp\", \"csharp\", \"kotlin\", \"swift\"}\n\n# Matches a full-line Python/Ruby/shell comment (optional leading whitespace + #)\n_HASH_COMMENT_RE = re.compile(r\"^\\s*#\")\n# Matches a full-line C-family comment (optional leading whitespace + //)\n_SLASH_COMMENT_RE = re.compile(r\"^\\s*//\")\n# Matches a full-line block-comment opener or closer /* ... */\n_BLOCK_OPEN_RE = re.compile(r\"^\\s*/\\*\")\n_BLOCK_CLOSE_RE = re.compile(r\"\\*/\\s*$\")\n\n# Annotations that must be preserved even inside comment lines\n_IMPORTANT_COMMENT_RE = re.compile(\n r\"type:\\s*ignore|noqa|pragma|TODO|FIXME|HACK|NOTE\"\n r\"|eslint-disable|@ts-ignore|@ts-expect-error\",\n re.IGNORECASE,\n)\n\n# Python triple-quote docstring openers\n_TRIPLE_DOUBLE_RE = re.compile(r'^\\s*(\"\"\")')\n_TRIPLE_SINGLE_RE = re.compile(r\"^\\s*(''')\")\n\n# Python import lines\n_IMPORT_RE = re.compile(r\"^\\s*(import |from \\S+ import )\")\n\n\nclass Neurosyntax(FusionStage):\n \"\"\"AST-aware code compression. 
Uses tree-sitter when available, regex fallback otherwise.\"\"\"\n\n name = \"neurosyntax\"\n order = 25 # after Cortex(5), before dictionary/dedup stages\n\n SUPPORTED_LANGS = {\"python\", \"javascript\", \"typescript\", \"java\", \"go\", \"rust\", \"c\", \"cpp\"}\n\n def __init__(self) -> None:\n self._tree_sitter_available = _TREE_SITTER_AVAILABLE\n\n # ------------------------------------------------------------------\n # FusionStage interface\n # ------------------------------------------------------------------\n\n def should_apply(self, ctx: FusionContext) -> bool:\n return ctx.content_type == \"code\"\n\n def apply(self, ctx: FusionContext) -> FusionResult:\n language = ctx.language\n original_tokens = estimate_tokens(ctx.content)\n\n if self._tree_sitter_available and language in self.SUPPORTED_LANGS:\n compressed = self._ast_compress(ctx.content, language)\n else:\n compressed = self._fallback_compress(ctx.content, language)\n\n compressed_tokens = estimate_tokens(compressed)\n return FusionResult(\n content=compressed,\n original_tokens=original_tokens,\n compressed_tokens=compressed_tokens,\n )\n\n # ------------------------------------------------------------------\n # Regex fallback (primary path — tree-sitter is optional)\n # ------------------------------------------------------------------\n\n def _fallback_compress(self, text: str, language: str | None) -> str:\n \"\"\"Safe regex-based code compression. 
No identifier shortening.\"\"\"\n lines = text.split(\"\\n\")\n result: list[str] = []\n in_block_comment = False\n in_docstring = False\n docstring_quote: str | None = None\n docstring_first_content: str | None = None\n is_python = (language == \"python\")\n\n i = 0\n while i \u003c len(lines):\n line = lines[i]\n stripped = line.strip()\n\n # ---- Block comment tracking (C-family) ----\n if not is_python and not in_block_comment and _BLOCK_OPEN_RE.match(line):\n in_block_comment = True\n if self._is_important_comment(line):\n result.append(line.rstrip())\n # else: skip the opening line entirely\n if \"*/\" in line:\n in_block_comment = False # single-line /* ... */\n i += 1\n continue\n\n if in_block_comment:\n if self._is_important_comment(line):\n result.append(line.rstrip())\n if _BLOCK_CLOSE_RE.search(line):\n in_block_comment = False\n i += 1\n continue\n\n # ---- Python docstring collapsing ----\n if is_python and not in_docstring:\n quote = self._docstring_opener(stripped)\n if quote is not None:\n # Check if it closes on the same line (after the opener)\n rest = stripped[len(quote):]\n if rest.endswith(quote) and len(rest) >= len(quote):\n # Single-line docstring — keep as-is\n result.append(line.rstrip())\n i += 1\n continue\n # Multi-line docstring: record first content line\n first_content = rest.strip()\n in_docstring = True\n docstring_quote = quote\n docstring_first_content = first_content\n indent = len(line) - len(line.lstrip())\n # Emit collapsed single-line version once we know the content\n # We'll finalize when we hit the closing quote\n i += 1\n # Collect until closing quote\n closing_found = False\n while i \u003c len(lines):\n dl = lines[i]\n ds = dl.strip()\n if docstring_quote in ds:\n closing_found = True\n in_docstring = False\n # emit collapsed form\n preview = docstring_first_content or ds.replace(docstring_quote, \"\").strip()\n if preview:\n result.append(\" \" * indent + quote + preview + \" \" + quote)\n i += 1\n break\n if not 
docstring_first_content:\n docstring_first_content = ds\n i += 1\n if not closing_found:\n in_docstring = False\n continue\n\n # ---- Pure comment lines ----\n if self._is_pure_comment(line, language):\n if self._is_important_comment(line):\n result.append(line.rstrip())\n # else: drop\n i += 1\n continue\n\n # ---- Blank line deduplication ----\n if not stripped:\n if result and not result[-1].strip():\n i += 1\n continue # skip consecutive blanks\n result.append(\"\")\n i += 1\n continue\n\n # ---- Trailing whitespace strip ----\n result.append(line.rstrip())\n i += 1\n\n return \"\\n\".join(result)\n\n # ------------------------------------------------------------------\n # Tree-sitter AST path (optional)\n # ------------------------------------------------------------------\n\n def _ast_compress(self, text: str, language: str) -> str:\n \"\"\"AST-aware compression using tree-sitter.\"\"\"\n try:\n parser = tslp.get_parser(language)\n tree = parser.parse(text.encode())\n root = tree.root_node\n lines = text.split(\"\\n\")\n keep_ranges = self._collect_keep_ranges(root, language)\n return self._reconstruct(lines, keep_ranges)\n except Exception: # noqa: BLE001 — graceful fallback\n return self._fallback_compress(text, language)\n\n def _collect_keep_ranges(self, root: Any, language: str) -> list[tuple[int, int]]:\n \"\"\"Walk the AST and return (start_line, end_line) ranges to keep (0-indexed, inclusive).\"\"\"\n keep: list[tuple[int, int]] = []\n self._walk(root, keep, language)\n return sorted(set_merge(keep))\n\n def _walk(self, node: Any, keep: list[tuple[int, int]], language: str) -> None:\n \"\"\"Recursively walk tree-sitter nodes and collect keep ranges.\"\"\"\n node_type = node.type\n\n # Always keep: import statements, top-level declarations, type annotations\n if node_type in {\n \"import_statement\", \"import_from_statement\", # Python\n \"import_declaration\", \"import_specifier\", # JS/TS\n \"use_declaration\", # Rust\n \"package_declaration\", 
\"import_declaration\", # Java/Go\n }:\n keep.append((node.start_point[0], node.end_point[0]))\n return\n\n # Always keep: function / method / class signatures (first line only for bodies)\n if node_type in {\n \"function_definition\", \"function_declaration\", \"method_definition\",\n \"class_definition\", \"class_declaration\",\n \"decorated_definition\", # Python decorators + def/class\n }:\n sig_end = node.start_point[0]\n # Keep decorator lines too\n keep.append((node.start_point[0], sig_end))\n # Walk children to keep signature parts and returns; compress body\n for child in node.children:\n if child.type == \"block\" or child.type == \"statement_block\":\n self._compress_body(child, keep)\n else:\n keep.append((child.start_point[0], child.end_point[0]))\n return\n\n # Always keep: error handling\n if node_type in {\n \"try_statement\", \"except_clause\", \"finally_clause\",\n \"catch_clause\", \"try_expression\",\n }:\n keep.append((node.start_point[0], node.end_point[0]))\n return\n\n # Recurse into everything else\n for child in node.children:\n self._walk(child, keep, language)\n\n def _compress_body(self, block_node: Any, keep: list[tuple[int, int]]) -> None:\n \"\"\"Keep only first line + return/raise statements from a function body.\"\"\"\n if not block_node.children:\n return\n first = block_node.children[0]\n keep.append((first.start_point[0], first.end_point[0]))\n for child in block_node.children:\n if child.type in {\"return_statement\", \"raise_statement\", \"throw_statement\"}:\n keep.append((child.start_point[0], child.end_point[0]))\n\n def _reconstruct(self, lines: list[str], keep_ranges: list[tuple[int, int]]) -> str:\n \"\"\"Rebuild source from original lines, keeping only the kept ranges.\"\"\"\n if not keep_ranges:\n return \"\\n\".join(lines)\n kept: list[str] = []\n for start, end in keep_ranges:\n for ln in range(start, min(end + 1, len(lines))):\n kept.append(lines[ln].rstrip())\n return \"\\n\".join(kept)\n\n # 
------------------------------------------------------------------\n # Comment helpers\n # ------------------------------------------------------------------\n\n def _is_pure_comment(self, line: str, language: str | None) -> bool:\n \"\"\"Return True if the line is entirely a comment (no code).\"\"\"\n if not line.strip():\n return False\n lang = (language or \"\").lower()\n if lang in _HASH_COMMENT_LANGS or lang == \"\":\n if _HASH_COMMENT_RE.match(line):\n return True\n if lang in _SLASH_COMMENT_LANGS:\n if _SLASH_COMMENT_RE.match(line):\n return True\n # Python fallback\n if lang == \"python\" and _HASH_COMMENT_RE.match(line):\n return True\n return False\n\n def _is_important_comment(self, line: str) -> bool:\n \"\"\"Return True if the comment contains a marker that must be preserved.\"\"\"\n return bool(_IMPORTANT_COMMENT_RE.search(line))\n\n # ------------------------------------------------------------------\n # Docstring helpers\n # ------------------------------------------------------------------\n\n def _docstring_opener(self, stripped: str) -> str | None:\n \"\"\"Return the triple-quote token if this line opens a Python docstring, else None.\"\"\"\n if stripped.startswith('\"\"\"'):\n return '\"\"\"'\n if stripped.startswith(\"'''\"):\n return \"'''\"\n return None\n\n\n# ---------------------------------------------------------------------------\n# Utility: merge overlapping line ranges\n# ---------------------------------------------------------------------------\n\ndef set_merge(ranges: list[tuple[int, int]]) -> list[tuple[int, int]]:\n \"\"\"Merge overlapping or adjacent (start, end) ranges.\"\"\"\n if not ranges:\n return []\n sorted_ranges = sorted(ranges)\n merged: list[tuple[int, int]] = [sorted_ranges[0]]\n for start, end in sorted_ranges[1:]:\n prev_start, prev_end = merged[-1]\n if start \u003c= prev_end + 1:\n merged[-1] = (prev_start, max(prev_end, end))\n else:\n merged.append((start, end))\n return merged\n","content_type":"text/x-python; 
charset=utf-8","language":"python","size":14058,"content_sha256":"356a661404d5e5871e3a5d1fa3678d724777f76fe8ef9c069588bef5e8836c62"},{"filename":"scripts/lib/fusion/nexus_model.py","content":"\"\"\"Nexus ML model architecture — CrunchModel dual-head token classifier.\n\nProvides:\n - CrunchModel(nn.Module): backbone + token_head + span_head\n - forward() returning token_logits and span_scores\n - compress() running inference and filtering tokens\n\nWhen torch is unavailable the module exports stub classes that raise\nImportError on instantiation, so callers can guard with TORCH_AVAILABLE.\n\nPart of claw-compactor. License: MIT.\n\"\"\"\nfrom __future__ import annotations\n\n# ---------------------------------------------------------------------------\n# Optional torch import\n# ---------------------------------------------------------------------------\ntry:\n import torch\n import torch.nn as nn\n import torch.nn.functional as F\n TORCH_AVAILABLE = True\nexcept ImportError: # pragma: no cover\n TORCH_AVAILABLE = False\n torch = None # type: ignore[assignment]\n nn = None # type: ignore[assignment]\n F = None # type: ignore[assignment]\n\n\n# ---------------------------------------------------------------------------\n# Model constants\n# ---------------------------------------------------------------------------\n_HIDDEN_SIZE = 128 # lightweight backbone hidden dim (mock/test-safe)\n_SPAN_KERNEL = 3 # 1-D CNN kernel size for span head\n_NUM_LABELS = 2 # keep / discard\n\n\n# ---------------------------------------------------------------------------\n# CrunchModel — only defined when torch is present\n# ---------------------------------------------------------------------------\nif TORCH_AVAILABLE:\n\n class CrunchModel(nn.Module): # type: ignore[misc]\n \"\"\"Dual-head ModernBERT-style token classifier.\n\n Architecture:\n backbone — 2-layer bidirectional GRU over token embeddings\n token_head — linear → 2-class logits (keep / discard) per token\n span_head — 
1-D CNN → scalar importance score per token position\n\n The backbone is intentionally small so tests run on CPU with random\n weights in milliseconds. In production the backbone would be replaced\n by a pretrained ModernBERT encoder.\n \"\"\"\n\n def __init__(\n self,\n vocab_size: int = 30522, # default BERT vocab size\n embed_dim: int = 64,\n hidden_size: int = _HIDDEN_SIZE,\n num_labels: int = _NUM_LABELS,\n span_kernel: int = _SPAN_KERNEL,\n ) -> None:\n super().__init__()\n self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)\n self.backbone = nn.GRU(\n input_size=embed_dim,\n hidden_size=hidden_size,\n num_layers=2,\n batch_first=True,\n bidirectional=True,\n )\n backbone_out = hidden_size * 2 # bidirectional\n\n # Token head: per-token binary classification\n self.token_head = nn.Linear(backbone_out, num_labels)\n\n # Span head: 1-D CNN over backbone output → importance scalar\n self.span_conv = nn.Conv1d(\n in_channels=backbone_out,\n out_channels=1,\n kernel_size=span_kernel,\n padding=span_kernel // 2,\n )\n\n def forward(\n self,\n input_ids: \"torch.Tensor\", # (B, T)\n ) -> tuple[\"torch.Tensor\", \"torch.Tensor\"]:\n \"\"\"Return (token_logits, span_scores).\n\n token_logits : (B, T, num_labels) — raw logits for keep/discard\n span_scores : (B, T) — importance score in [0, 1]\n \"\"\"\n emb = self.embedding(input_ids) # (B, T, E)\n hidden, _ = self.backbone(emb) # (B, T, 2*H)\n\n token_logits = self.token_head(hidden) # (B, T, 2)\n\n # Span head needs (B, C, T) channel-first layout\n hidden_t = hidden.transpose(1, 2) # (B, 2*H, T)\n span_raw = self.span_conv(hidden_t) # (B, 1, T)\n span_scores = torch.sigmoid(\n span_raw.squeeze(1) # (B, T)\n )\n\n return token_logits, span_scores\n\n def compress(\n self,\n tokens: list[str],\n token_prob_threshold: float = 0.5,\n span_score_threshold: float = 0.6,\n uncertain_low: float = 0.3,\n ) -> list[str]:\n \"\"\"Run inference and return the filtered token list.\n\n Fusion rule:\n keep if 
token_prob > token_prob_threshold\n OR (uncertain_low \u003c token_prob \u003c token_prob_threshold\n AND span_score > span_score_threshold)\n\n Args:\n tokens: whitespace-split word tokens.\n token_prob_threshold: minimum keep-class probability to keep\n a token outright.\n span_score_threshold: span importance threshold applied in the\n uncertain band.\n uncertain_low: lower bound of the uncertain probability band.\n\n Returns:\n Filtered list of kept tokens (same strings, no modifications).\n \"\"\"\n if not tokens:\n return []\n\n # Encode tokens as simple char-hash indices (mock tokenizer).\n # In production this would use a real BPE tokenizer.\n input_ids = torch.tensor(\n [[_char_hash(t) for t in tokens]],\n dtype=torch.long,\n ) # (1, T)\n\n self.eval()\n with torch.no_grad():\n token_logits, span_scores = self.forward(input_ids)\n\n # token_logits: (1, T, 2) → probabilities\n probs = F.softmax(token_logits, dim=-1) # (1, T, 2)\n keep_probs = probs[0, :, 1].tolist() # (T,) — prob of keep class\n span_vals = span_scores[0, :].tolist() # (T,)\n\n kept: list[str] = []\n for token, kp, sv in zip(tokens, keep_probs, span_vals):\n if kp > token_prob_threshold:\n kept.append(token)\n elif uncertain_low \u003c kp and sv > span_score_threshold:\n kept.append(token)\n # else: discard\n\n return kept\n\nelse:\n # Stub so `from claw_compactor.fusion.nexus_model import CrunchModel` always works.\n class CrunchModel: # type: ignore[no-redef]\n \"\"\"Stub — torch is not installed.\"\"\"\n\n def __init__(self, *args, **kwargs): # noqa: ANN204\n raise ImportError(\n \"CrunchModel requires torch. 
Install it with: pip install torch\"\n )\n\n\n# ---------------------------------------------------------------------------\n# Utility: simple hash-based mock tokenizer\n# ---------------------------------------------------------------------------\n\ndef _char_hash(token: str, vocab_size: int = 30522) -> int:\n \"\"\"Map a token string to a vocabulary index via its string hash.\"\"\"\n return (hash(token) & 0x7FFF_FFFF) % max(1, vocab_size - 1) + 1\n","content_type":"text/x-python; charset=utf-8","language":"python","size":7026,"content_sha256":"ef8838149e82aff33d193b84fc63ec56b0a2a5538dce285d42b99fa59d5c9d24"},{"filename":"scripts/lib/fusion/nexus.py","content":"\"\"\"Nexus — ML-powered token compressor FusionStage (order=35).\n\nUses a dual-head ModernBERT-style classifier (CrunchModel) to make\nkeep/discard decisions for each token in a text passage.\n\nWhen torch is unavailable the stage falls back to a rule-based heuristic\ncompressor (stopword removal + repetition detection) so the pipeline stays\nfunctional without heavy ML dependencies.\n\nPart of claw-compactor. License: MIT.\n\"\"\"\nfrom __future__ import annotations\n\nimport re\nfrom typing import Any\n\nfrom claw_compactor.fusion.base import FusionStage, FusionContext, FusionResult\n\ntry:\n from claw_compactor.tokens import estimate_tokens # type: ignore[import]\nexcept ImportError: # pragma: no cover — tokens module may not exist yet\n def estimate_tokens(text: str) -> int: # type: ignore[misc]\n return max(1, len(text.split()))\n\n# ---------------------------------------------------------------------------\n# Optional torch / transformers import\n# ---------------------------------------------------------------------------\nTORCH_AVAILABLE = False\ntry:\n import torch # noqa: F401\n TORCH_AVAILABLE = True\nexcept ImportError:\n pass\n\n# Import CrunchModel regardless — it has its own graceful stub when torch\n# is absent. 
We gate actual instantiation on TORCH_AVAILABLE.\nfrom claw_compactor.fusion.nexus_model import CrunchModel # noqa: E402\n\n# ---------------------------------------------------------------------------\n# Rule-based fallback constants\n# ---------------------------------------------------------------------------\n_STOPWORDS: frozenset[str] = frozenset({\n \"a\", \"an\", \"the\", \"and\", \"or\", \"but\", \"in\", \"on\", \"at\", \"to\", \"for\",\n \"of\", \"with\", \"by\", \"from\", \"up\", \"about\", \"into\", \"through\", \"during\",\n \"is\", \"are\", \"was\", \"were\", \"be\", \"been\", \"being\", \"have\", \"has\", \"had\",\n \"do\", \"does\", \"did\", \"will\", \"would\", \"could\", \"should\", \"may\", \"might\",\n \"shall\", \"can\", \"it\", \"its\", \"this\", \"that\", \"these\", \"those\",\n \"he\", \"she\", \"they\", \"we\", \"you\", \"i\", \"me\", \"him\", \"her\", \"us\",\n \"which\", \"who\", \"whom\", \"what\", \"where\", \"when\", \"how\",\n})\n\n# Minimum word count before NexusStage runs.\n_MIN_WORDS = 20\n\n# Fusion thresholds (also used to expose for testing).\nTOKEN_PROB_THRESHOLD = 0.5\nSPAN_SCORE_THRESHOLD = 0.6\nUNCERTAIN_LOW = 0.3\n\n\n# ---------------------------------------------------------------------------\n# NexusModel — thin wrapper around CrunchModel\n# ---------------------------------------------------------------------------\n\nclass NexusModel:\n \"\"\"Dual-head ModernBERT token classifier for keep/discard decisions.\n\n Wraps CrunchModel with configurable fusion thresholds.\n\n Fusion rule applied per token t_i with keep-class probability p_i and\n span importance score s_i:\n\n keep ← p_i > TOKEN_PROB_THRESHOLD\n keep ← UNCERTAIN_LOW \u003c p_i ≤ TOKEN_PROB_THRESHOLD AND s_i > SPAN_SCORE_THRESHOLD\n discard otherwise\n \"\"\"\n\n def __init__(\n self,\n token_prob_threshold: float = TOKEN_PROB_THRESHOLD,\n span_score_threshold: float = SPAN_SCORE_THRESHOLD,\n uncertain_low: float = UNCERTAIN_LOW,\n **model_kwargs: Any,\n ) -> 
None:\n if not TORCH_AVAILABLE:\n raise ImportError(\n \"NexusModel requires torch. Install it with: pip install torch\"\n )\n self._model = CrunchModel(**model_kwargs)\n self._token_prob_threshold = token_prob_threshold\n self._span_score_threshold = span_score_threshold\n self._uncertain_low = uncertain_low\n\n def compress(self, tokens: list[str]) -> list[str]:\n \"\"\"Return the subset of *tokens* that the model decides to keep.\"\"\"\n return self._model.compress(\n tokens,\n token_prob_threshold=self._token_prob_threshold,\n span_score_threshold=self._span_score_threshold,\n uncertain_low=self._uncertain_low,\n )\n\n\n# ---------------------------------------------------------------------------\n# NexusStage\n# ---------------------------------------------------------------------------\n\nclass NexusStage(FusionStage):\n \"\"\"ML-powered token compressor FusionStage.\n\n - Uses NexusModel (CrunchModel) when torch is available.\n - Falls back to rule-based compression (stopword removal + repetition\n detection) when torch is absent, so the pipeline still runs.\n - Skips entirely (should_apply → False) when torch is absent AND\n the caller has set require_torch=True in the constructor.\n\n Ordering: 35 (after Cortex=5, Neurosyntax=25; before later dedup stages).\n \"\"\"\n\n name = \"nexus\"\n order = 35\n\n def __init__(\n self,\n require_torch: bool = False,\n token_prob_threshold: float = TOKEN_PROB_THRESHOLD,\n span_score_threshold: float = SPAN_SCORE_THRESHOLD,\n uncertain_low: float = UNCERTAIN_LOW,\n ) -> None:\n self._require_torch = require_torch\n self._token_prob_threshold = token_prob_threshold\n self._span_score_threshold = span_score_threshold\n self._uncertain_low = uncertain_low\n self._model: NexusModel | None = None\n\n if TORCH_AVAILABLE:\n self._model = NexusModel(\n token_prob_threshold=token_prob_threshold,\n span_score_threshold=span_score_threshold,\n uncertain_low=uncertain_low,\n )\n\n # 
------------------------------------------------------------------\n # FusionStage interface\n # ------------------------------------------------------------------\n\n def should_apply(self, ctx: FusionContext) -> bool:\n \"\"\"Return True when the stage should run.\n\n Conditions:\n 1. content_type must be \"text\"\n 2. content must contain at least _MIN_WORDS words\n 3. If require_torch=True, torch must be available.\n If require_torch=False (default), falls back gracefully.\n \"\"\"\n if ctx.content_type != \"text\":\n return False\n if len(ctx.content.split()) \u003c _MIN_WORDS:\n return False\n if self._require_torch and not TORCH_AVAILABLE:\n return False\n return True\n\n def apply(self, ctx: FusionContext) -> FusionResult:\n \"\"\"Apply ML or rule-based token compression.\"\"\"\n original_tokens = estimate_tokens(ctx.content)\n words = ctx.content.split()\n\n warnings: list[str] = []\n\n if TORCH_AVAILABLE and self._model is not None:\n kept_words, method = self._ml_compress(words)\n else:\n kept_words, method = self._fallback_compress(words)\n warnings.append(\n \"nexus: torch unavailable — used rule-based fallback compression\"\n )\n\n compressed = \" \".join(kept_words)\n compressed_tokens = estimate_tokens(compressed)\n\n return FusionResult(\n content=compressed,\n original_tokens=original_tokens,\n compressed_tokens=compressed_tokens,\n markers=[f\"nexus:{method}\"],\n warnings=warnings,\n )\n\n # ------------------------------------------------------------------\n # ML compression path\n # ------------------------------------------------------------------\n\n def _ml_compress(self, words: list[str]) -> tuple[list[str], str]:\n \"\"\"Run CrunchModel inference and return (kept_words, method_label).\"\"\"\n assert self._model is not None\n kept = self._model.compress(words)\n # Always keep at least one word to avoid empty output.\n if not kept and words:\n kept = [words[0]]\n return kept, \"ml\"\n\n # 
------------------------------------------------------------------\n # Rule-based fallback compression\n # ------------------------------------------------------------------\n\n def _fallback_compress(self, words: list[str]) -> tuple[list[str], str]:\n \"\"\"Simple heuristic compression: stopword removal + repetition detection.\"\"\"\n # Phase 1: Remove stopwords (case-insensitive). Tokens that clean to an\n # empty string (pure punctuation) are never stopwords, so they are kept.\n after_stopwords = [\n w for w in words\n if _clean(w) not in _STOPWORDS or not _clean(w)\n ]\n\n # Guard against over-compression: if fewer than 40% of the original\n # words survive, revert to the unfiltered list.\n if len(after_stopwords) \u003c max(1, len(words) * 0.4):\n after_stopwords = words[:]\n\n # Phase 2: Remove consecutive duplicate tokens (case-insensitive).\n deduplicated = _deduplicate_consecutive(after_stopwords)\n\n # Phase 3: Remove repeated n-grams (bigrams that appear 3+ times).\n compressed = _remove_repeated_ngrams(deduplicated, n=2, min_count=3)\n\n # Guarantee non-empty output.\n if not compressed and words:\n compressed = [words[0]]\n\n return compressed, \"fallback\"\n\n\n# ---------------------------------------------------------------------------\n# Fallback helpers\n# ---------------------------------------------------------------------------\n\ndef _clean(word: str) -> str:\n \"\"\"Lowercase and strip punctuation from a word for stopword lookup.\"\"\"\n return re.sub(r\"[^\\w]\", \"\", word).lower()\n\n\ndef _deduplicate_consecutive(words: list[str]) -> list[str]:\n \"\"\"Remove consecutive duplicate tokens (case-insensitive).\"\"\"\n if not words:\n return []\n result: list[str] = [words[0]]\n for word in words[1:]:\n if word.lower() != result[-1].lower():\n result.append(word)\n return result\n\n\ndef _remove_repeated_ngrams(\n words: list[str],\n n: int = 2,\n min_count: int = 3,\n) -> list[str]:\n \"\"\"Drop later occurrences of any n-gram repeated >= min_count times.\n\n The first occurrence of each repeated n-gram is kept.\n \"\"\"\n if len(words) \u003c n:\n return 
words[:]\n\n # Count n-gram occurrences.\n ngram_counts: dict[tuple[str, ...], int] = {}\n for i in range(len(words) - n + 1):\n gram = tuple(w.lower() for w in words[i : i + n])\n ngram_counts[gram] = ngram_counts.get(gram, 0) + 1\n\n # Find n-grams that exceed the threshold.\n repeated: set[tuple[str, ...]] = {\n gram for gram, count in ngram_counts.items() if count >= min_count\n }\n if not repeated:\n return words[:]\n\n # Mark positions that are part of a repeated n-gram (keep first occurrence).\n seen_repeated: set[tuple[str, ...]] = set()\n drop_positions: set[int] = set()\n\n for i in range(len(words) - n + 1):\n gram = tuple(w.lower() for w in words[i : i + n])\n if gram in repeated:\n if gram in seen_repeated:\n for j in range(i, i + n):\n drop_positions.add(j)\n else:\n seen_repeated.add(gram)\n\n return [w for i, w in enumerate(words) if i not in drop_positions]\n","content_type":"text/x-python; charset=utf-8","language":"python","size":10724,"content_sha256":"cf29107f265c8fe5b2877944c3788b10cc0803e3a06ef4ae7604c2ce972aacdd"},{"filename":"scripts/lib/fusion/photon.py","content":"\"\"\"Photon — image optimiser FusionStage for the claw-compactor pipeline.\n\nDetects base64-encoded images embedded in message content (OpenAI, Anthropic,\nand Google GenAI multi-modal formats), applies size-based resizing / quality\nreduction via Pillow when available, converts PNG screenshots to JPEG, and\nsets OpenAI ``detail: \"low\"`` to cap vision-token cost.\n\norder = 8 (runs early; images bloat context most aggressively)\n\nPart of claw-compactor. 
License: MIT.\n\"\"\"\nfrom __future__ import annotations\n\nimport base64\nimport io\nimport json\nimport logging\nimport math\nimport re\nfrom typing import Any\n\nfrom claw_compactor.fusion.base import FusionContext, FusionResult, FusionStage\nfrom claw_compactor.tokens import estimate_tokens\n\nlogger = logging.getLogger(__name__)\n\n# ---------------------------------------------------------------------------\n# Optional Pillow import\n# ---------------------------------------------------------------------------\ntry:\n from PIL import Image as _PILImage # type: ignore[import]\n PILLOW_AVAILABLE = True\nexcept ImportError:\n _PILImage = None # type: ignore[assignment]\n PILLOW_AVAILABLE = False\n\n# ---------------------------------------------------------------------------\n# Constants\n# ---------------------------------------------------------------------------\n\n# Size thresholds (decoded bytes)\n_THRESHOLD_1MB = 1 * 1024 * 1024 # 1 MB — resize to 512 px wide, quality=85\n_THRESHOLD_2MB = 2 * 1024 * 1024 # 2 MB — resize to 384 px wide, quality=75\n\n# Resize targets: (max_width, jpeg_quality)\n_RESIZE_1MB = (512, 85)\n_RESIZE_2MB = (384, 75)\n\n# PNG → JPEG conversion quality\n_PNG_JPEG_QUALITY = 85\n\n# Regex: match a full data-URI base64 payload\n_DATA_URI_RE = re.compile(\n r\"data:image/(?P\u003cfmt>[a-zA-Z0-9+.\\-]+);base64,(?P\u003cb64>[A-Za-z0-9+/=\\n]+)\"\n)\n\n# OpenAI \"detail\" field values\n_DETAIL_LOW = \"low\"\n_DETAIL_HIGH = \"high\"\n_DETAIL_AUTO = \"auto\"\n\n\n# ---------------------------------------------------------------------------\n# Token estimation\n# ---------------------------------------------------------------------------\n\ndef estimate_image_tokens(width: int, height: int) -> int:\n \"\"\"Estimate vision tokens for an image using an OpenAI-style tile formula.\n\n Formula: ceil(width/512) * ceil(height/512) * 85 + 170\n Each tile count is rounded up before multiplying, so the result is an integer.\n \"\"\"\n tiles_w = math.ceil(width / 512)\n tiles_h = math.ceil(height / 512)\n return 
int(math.ceil(tiles_w * tiles_h * 85 + 170))\n\n\n# ---------------------------------------------------------------------------\n# Image helpers\n# ---------------------------------------------------------------------------\n\ndef _decode_b64(b64_str: str) -> bytes:\n \"\"\"Decode a base64 string (strips whitespace first).\"\"\"\n cleaned = b64_str.strip().replace(\"\\n\", \"\").replace(\" \", \"\")\n return base64.b64decode(cleaned)\n\n\ndef _encode_b64(data: bytes) -> str:\n \"\"\"Encode bytes to a base64 string (no newlines).\"\"\"\n return base64.b64encode(data).decode(\"ascii\")\n\n\ndef _image_size_bytes(b64_str: str) -> int:\n \"\"\"Return decoded byte size of a base64 image payload.\"\"\"\n try:\n return len(_decode_b64(b64_str))\n except Exception:\n return 0\n\n\ndef _resize_and_encode(\n raw: bytes,\n max_width: int,\n jpeg_quality: int,\n source_fmt: str,\n) -> tuple[bytes, str]:\n \"\"\"Resize *raw* image bytes to *max_width* and return (new_bytes, mime_type).\n\n The output format is always JPEG. *source_fmt* is used only for logging.\n Requires Pillow.\n \"\"\"\n img = _PILImage.open(io.BytesIO(raw))\n orig_w, orig_h = img.size\n\n if orig_w > max_width:\n ratio = max_width / orig_w\n new_h = max(1, int(orig_h * ratio))\n img = img.resize((max_width, new_h), _PILImage.LANCZOS)\n\n # Convert to RGB for JPEG (removes alpha channel if present)\n if img.mode not in (\"RGB\", \"L\"):\n img = img.convert(\"RGB\")\n\n buf = io.BytesIO()\n img.save(buf, format=\"JPEG\", quality=jpeg_quality, optimize=True)\n return buf.getvalue(), \"jpeg\"\n\n\ndef _png_to_jpeg(raw: bytes, quality: int = _PNG_JPEG_QUALITY) -> tuple[bytes, str]:\n \"\"\"Convert PNG bytes to JPEG. 
Requires Pillow.\"\"\"\n img = _PILImage.open(io.BytesIO(raw))\n if img.mode not in (\"RGB\", \"L\"):\n img = img.convert(\"RGB\")\n buf = io.BytesIO()\n img.save(buf, format=\"JPEG\", quality=quality, optimize=True)\n return buf.getvalue(), \"jpeg\"\n\n\n# ---------------------------------------------------------------------------\n# Per-image optimisation (returns updated data-URI or original on failure)\n# ---------------------------------------------------------------------------\n\ndef _optimise_image_data_uri(\n fmt: str,\n b64_payload: str,\n) -> tuple[str, str, int, int]:\n \"\"\"Optimise a single image represented as ``(fmt, b64_payload)``.\n\n Returns ``(new_fmt, new_b64, original_bytes, new_bytes)``.\n\n Without Pillow, only records sizes; no transformation is applied.\n \"\"\"\n raw = _decode_b64(b64_payload)\n original_bytes = len(raw)\n fmt_lower = fmt.lower().replace(\"image/\", \"\")\n\n if not PILLOW_AVAILABLE:\n # Cannot resize without Pillow; return unchanged\n return fmt_lower, b64_payload, original_bytes, original_bytes\n\n try:\n if original_bytes >= _THRESHOLD_2MB:\n new_raw, new_fmt = _resize_and_encode(raw, _RESIZE_2MB[0], _RESIZE_2MB[1], fmt_lower)\n elif original_bytes >= _THRESHOLD_1MB:\n new_raw, new_fmt = _resize_and_encode(raw, _RESIZE_1MB[0], _RESIZE_1MB[1], fmt_lower)\n elif fmt_lower == \"png\":\n new_raw, new_fmt = _png_to_jpeg(raw)\n else:\n # Nothing to do\n return fmt_lower, b64_payload, original_bytes, original_bytes\n\n new_b64 = _encode_b64(new_raw)\n return new_fmt, new_b64, original_bytes, len(new_raw)\n except Exception as exc:\n logger.warning(\"Photon: image optimisation failed (%s); keeping original.\", exc)\n return fmt_lower, b64_payload, original_bytes, original_bytes\n\n\n# ---------------------------------------------------------------------------\n# Content traversal helpers\n# ---------------------------------------------------------------------------\n\ndef _process_openai_content(content: list[dict[str, Any]]) 
-> tuple[list[dict[str, Any]], list[str], int, int]:\n \"\"\"Walk an OpenAI message content list and optimise image_url blocks.\n\n Returns ``(new_content, markers, saved_bytes, total_original_bytes)``.\n \"\"\"\n new_content: list[dict[str, Any]] = []\n markers: list[str] = []\n saved = 0\n original_total = 0\n\n for block in content:\n if not isinstance(block, dict) or block.get(\"type\") != \"image_url\":\n new_content.append(block)\n continue\n\n image_url = block.get(\"image_url\", {})\n if not isinstance(image_url, dict):\n new_content.append(block)\n continue\n\n url = image_url.get(\"url\", \"\")\n detail = image_url.get(\"detail\", _DETAIL_AUTO)\n\n # Always set detail:low for token savings\n new_detail = _DETAIL_LOW\n updated_url_obj: dict[str, Any] = {**image_url, \"detail\": new_detail}\n\n m = _DATA_URI_RE.match(url) if isinstance(url, str) else None\n if m:\n fmt = m.group(\"fmt\")\n b64 = m.group(\"b64\")\n new_fmt, new_b64, orig_b, new_b = _optimise_image_data_uri(fmt, b64)\n original_total += orig_b\n saved += orig_b - new_b\n new_data_uri = f\"data:image/{new_fmt};base64,{new_b64}\"\n updated_url_obj = {**updated_url_obj, \"url\": new_data_uri}\n markers.append(\n f\"photon:openai_image orig={orig_b}B new={new_b}B fmt={fmt}->{new_fmt}\"\n )\n else:\n # External URL — only set detail:low\n if detail != _DETAIL_LOW:\n markers.append(f\"photon:openai_detail_low url={url[:60]}\")\n\n new_block: dict[str, Any] = {**block, \"image_url\": updated_url_obj}\n new_content.append(new_block)\n\n return new_content, markers, saved, original_total\n\n\ndef _process_anthropic_content(content: list[dict[str, Any]]) -> tuple[list[dict[str, Any]], list[str], int, int]:\n \"\"\"Walk an Anthropic message content list and optimise image blocks.\n\n Anthropic format::\n\n {\"type\": \"image\", \"source\": {\"type\": \"base64\", \"media_type\": \"image/png\", \"data\": \"\u003cb64>\"}}\n\n Returns ``(new_content, markers, saved_bytes, total_original_bytes)``.\n \"\"\"\n 
new_content: list[dict[str, Any]] = []\n markers: list[str] = []\n saved = 0\n original_total = 0\n\n for block in content:\n if not isinstance(block, dict) or block.get(\"type\") != \"image\":\n new_content.append(block)\n continue\n\n source = block.get(\"source\", {})\n if not isinstance(source, dict) or source.get(\"type\") != \"base64\":\n new_content.append(block)\n continue\n\n media_type = source.get(\"media_type\", \"image/jpeg\")\n b64_data = source.get(\"data\", \"\")\n fmt = media_type.replace(\"image/\", \"\")\n\n new_fmt, new_b64, orig_b, new_b = _optimise_image_data_uri(fmt, b64_data)\n original_total += orig_b\n saved += orig_b - new_b\n\n new_source: dict[str, Any] = {\n **source,\n \"media_type\": f\"image/{new_fmt}\",\n \"data\": new_b64,\n }\n new_block: dict[str, Any] = {**block, \"source\": new_source}\n new_content.append(new_block)\n markers.append(\n f\"photon:anthropic_image orig={orig_b}B new={new_b}B fmt={fmt}->{new_fmt}\"\n )\n\n return new_content, markers, saved, original_total\n\n\ndef _process_google_content(content: list[dict[str, Any]]) -> tuple[list[dict[str, Any]], list[str], int, int]:\n \"\"\"Walk a Google GenAI ``parts`` list and optimise inlineData image parts.\n\n Google format::\n\n {\"inlineData\": {\"mimeType\": \"image/png\", \"data\": \"\u003cb64>\"}}\n\n Returns ``(new_content, markers, saved_bytes, total_original_bytes)``.\n \"\"\"\n new_content: list[dict[str, Any]] = []\n markers: list[str] = []\n saved = 0\n original_total = 0\n\n for part in content:\n if not isinstance(part, dict) or \"inlineData\" not in part:\n new_content.append(part)\n continue\n\n inline = part[\"inlineData\"]\n if not isinstance(inline, dict):\n new_content.append(part)\n continue\n\n mime = inline.get(\"mimeType\", \"image/jpeg\")\n b64_data = inline.get(\"data\", \"\")\n fmt = mime.replace(\"image/\", \"\")\n\n new_fmt, new_b64, orig_b, new_b = _optimise_image_data_uri(fmt, b64_data)\n original_total += orig_b\n saved += orig_b - 
new_b\n\n new_inline: dict[str, Any] = {\n **inline,\n \"mimeType\": f\"image/{new_fmt}\",\n \"data\": new_b64,\n }\n new_part: dict[str, Any] = {**part, \"inlineData\": new_inline}\n new_content.append(new_part)\n markers.append(\n f\"photon:google_image orig={orig_b}B new={new_b}B fmt={fmt}->{new_fmt}\"\n )\n\n return new_content, markers, saved, original_total\n\n\n# ---------------------------------------------------------------------------\n# Data-URI scanning in plain text / JSON strings\n# ---------------------------------------------------------------------------\n\ndef _scan_and_replace_data_uris(text: str) -> tuple[str, list[str], int, int]:\n \"\"\"Find all data-URI image payloads inside *text* and optimise them.\n\n Handles plain-text content that embeds images as data URIs (e.g. raw JSON\n serialised into the content string).\n\n Returns ``(new_text, markers, saved_bytes, original_bytes)``.\n \"\"\"\n markers: list[str] = []\n saved = 0\n original_total = 0\n\n def replacer(m: re.Match) -> str:\n nonlocal saved, original_total\n fmt = m.group(\"fmt\")\n b64 = m.group(\"b64\")\n new_fmt, new_b64, orig_b, new_b = _optimise_image_data_uri(fmt, b64)\n original_total += orig_b\n saved += orig_b - new_b\n markers.append(\n f\"photon:inline_image orig={orig_b}B new={new_b}B fmt={fmt}->{new_fmt}\"\n )\n return f\"data:image/{new_fmt};base64,{new_b64}\"\n\n new_text = _DATA_URI_RE.sub(replacer, text)\n return new_text, markers, saved, original_total\n\n\n# ---------------------------------------------------------------------------\n# PhotonStage\n# ---------------------------------------------------------------------------\n\nclass PhotonStage(FusionStage):\n \"\"\"Image optimiser fusion stage.\n\n Detects base64 images in message content in OpenAI, Anthropic, and Google\n GenAI multi-modal formats. 
Applies size-based resizing (Pillow required),\n PNG-to-JPEG conversion, and OpenAI ``detail:low`` token caps.\n\n Without Pillow installed, only the OpenAI ``detail:low`` optimisation is\n applied; all other paths degrade gracefully (images are passed through\n unchanged, markers still emitted for accounting).\n \"\"\"\n\n name = \"photon\"\n order = 8\n\n def should_apply(self, ctx: FusionContext) -> bool:\n content = ctx.content.strip()\n # Apply if the content looks like it may contain images\n if \"base64\" in content:\n return True\n if '\"image_url\"' in content or '\"image\"' in content or \"inlineData\" in content:\n return True\n # Check for data: URI scheme\n if \"data:image/\" in content:\n return True\n return False\n\n def apply(self, ctx: FusionContext) -> FusionResult: # noqa: C901\n content = ctx.content\n original_tokens = estimate_tokens(content)\n all_markers: list[str] = []\n all_warnings: list[str] = []\n total_saved = 0\n total_original = 0\n\n # ------------------------------------------------------------------\n # Attempt to parse content as JSON (multi-modal message list)\n # ------------------------------------------------------------------\n parsed: Any = None\n try:\n parsed = json.loads(content)\n except (json.JSONDecodeError, ValueError):\n parsed = None\n\n if parsed is not None:\n # Could be a list of content blocks or a single object\n if isinstance(parsed, list):\n # Try to figure out the format from the first image block found\n new_parsed, markers, sv, orig = _dispatch_list(parsed)\n all_markers.extend(markers)\n total_saved += sv\n total_original += orig\n if markers:\n try:\n content = json.dumps(new_parsed, ensure_ascii=False)\n except Exception as exc:\n all_warnings.append(f\"photon: JSON re-serialisation failed: {exc}\")\n\n elif isinstance(parsed, dict):\n # Might be a message object with a \"content\" key\n inner = parsed.get(\"content\")\n if isinstance(inner, list):\n new_inner, markers, sv, orig = 
_dispatch_list(inner)\n all_markers.extend(markers)\n total_saved += sv\n total_original += orig\n if markers:\n try:\n new_parsed = {**parsed, \"content\": new_inner}\n content = json.dumps(new_parsed, ensure_ascii=False)\n except Exception as exc:\n all_warnings.append(\n f\"photon: JSON re-serialisation failed: {exc}\"\n )\n\n # ------------------------------------------------------------------\n # Scan plain-text content for inline data URIs\n # ------------------------------------------------------------------\n if \"data:image/\" in content:\n new_content, markers, sv, orig = _scan_and_replace_data_uris(content)\n if markers:\n content = new_content\n all_markers.extend(markers)\n total_saved += sv\n total_original += orig\n\n if not all_markers:\n all_warnings.append(\"photon: no images detected or optimised\")\n\n if not PILLOW_AVAILABLE:\n all_warnings.append(\n \"photon: Pillow not installed; only detail:low applied. \"\n \"Install Pillow for full image resizing support.\"\n )\n\n compressed_tokens = estimate_tokens(content)\n\n return FusionResult(\n content=content,\n original_tokens=original_tokens,\n compressed_tokens=compressed_tokens,\n markers=all_markers,\n warnings=all_warnings,\n )\n\n\n# ---------------------------------------------------------------------------\n# Format dispatch helper\n# ---------------------------------------------------------------------------\n\ndef _dispatch_list(\n blocks: list[Any],\n) -> tuple[list[Any], list[str], int, int]:\n \"\"\"Detect format and dispatch to the right processor.\n\n Tries OpenAI → Anthropic → Google in order. 
Processes whichever format is\n detected; if none match, returns the list unchanged.\n \"\"\"\n # OpenAI: blocks have type == \"image_url\"\n if any(\n isinstance(b, dict) and b.get(\"type\") == \"image_url\" for b in blocks\n ):\n return _process_openai_content(blocks)\n\n # Anthropic: blocks have type == \"image\" with source.type == \"base64\"\n if any(\n isinstance(b, dict) and b.get(\"type\") == \"image\"\n and isinstance(b.get(\"source\"), dict)\n for b in blocks\n ):\n return _process_anthropic_content(blocks)\n\n # Google: blocks (parts) have \"inlineData\" key\n if any(isinstance(b, dict) and \"inlineData\" in b for b in blocks):\n return _process_google_content(blocks)\n\n return blocks, [], 0, 0\n","content_type":"text/x-python; charset=utf-8","language":"python","size":17817,"content_sha256":"0c7aea548a8715fa7cffda1382945434f5872e83870ad3a615dd360e3ae310a8"},{"filename":"scripts/lib/fusion/pipeline.py","content":"\"\"\"Fusion pipeline engine: ordered chain of FusionStages with immutable data flow.\n\nStages are sorted by their ``order`` attribute at construction time. At runtime,\neach stage's timed_apply() is called sequentially — the compressed output from\nstage N becomes the input FusionContext for stage N+1. Stages may propagate\ncontext_updates (e.g. Cortex setting content_type=\"code\") that modify the\ncontext for all downstream stages.\n\nThe pipeline is immutable: add() returns a new FusionPipeline instance.\n\nPart of claw-compactor v7. 
License: MIT.\n\"\"\"\nfrom __future__ import annotations\nimport logging\nfrom dataclasses import dataclass, field\nfrom claw_compactor.fusion.base import FusionStage, FusionContext, FusionResult\n\nlogger = logging.getLogger(__name__)\n\n\n@dataclass(frozen=True)\nclass FusionStepResult:\n \"\"\"Result from a single fusion pipeline step.\"\"\"\n transform_name: str\n result: FusionResult\n\n\n@dataclass(frozen=True)\nclass FusionPipelineResult:\n \"\"\"Aggregated result from running all fusion stages.\"\"\"\n content: str\n steps: list[FusionStepResult] = field(default_factory=list)\n total_timing_ms: float = 0.0\n markers: list[str] = field(default_factory=list)\n warnings: list[str] = field(default_factory=list)\n\n\nclass FusionPipeline:\n \"\"\"Ordered chain of FusionStages.\"\"\"\n\n def __init__(self, transforms: list[FusionStage] | None = None):\n self._transforms: list[FusionStage] = sorted(\n transforms or [], key=lambda t: t.order\n )\n\n def add(self, transform: FusionStage) -> FusionPipeline:\n \"\"\"Return a new FusionPipeline with the fusion stage added (immutable).\"\"\"\n new_transforms = sorted(\n [*self._transforms, transform], key=lambda t: t.order\n )\n return FusionPipeline(new_transforms)\n\n @property\n def transforms(self) -> list[FusionStage]:\n return list(self._transforms)\n\n def run(self, ctx: FusionContext) -> FusionPipelineResult:\n \"\"\"Run all fusion stages sequentially. 
Each stage's output feeds the next.\"\"\"\n steps: list[FusionStepResult] = []\n all_markers: list[str] = []\n all_warnings: list[str] = []\n total_ms = 0.0\n current_ctx = ctx\n\n for transform in self._transforms:\n result = transform.timed_apply(current_ctx)\n steps.append(FusionStepResult(\n transform_name=transform.name,\n result=result,\n ))\n total_ms += result.timing_ms\n\n if not result.skipped:\n updates = {\"content\": result.content, **result.context_updates}\n current_ctx = current_ctx.evolve(**updates)\n all_markers.extend(result.markers)\n all_warnings.extend(result.warnings)\n logger.debug(\n \"%s: %d→%d tokens (%.1fms)\",\n transform.name,\n result.original_tokens,\n result.compressed_tokens,\n result.timing_ms,\n )\n else:\n logger.debug(\"%s: skipped\", transform.name)\n\n return FusionPipelineResult(\n content=current_ctx.content,\n steps=steps,\n total_timing_ms=total_ms,\n markers=all_markers,\n warnings=all_warnings,\n )\n","content_type":"text/x-python; charset=utf-8","language":"python","size":3332,"content_sha256":"f5bcdde38a7669592ce579f196f392b63478b77150e13b2e0813909b2f17d41d"},{"filename":"scripts/lib/fusion/plan_reinjection.py","content":"\"\"\"PlanReinjection — re-inject active plans and tasks after compaction.\n\nInspired by Claude Code's plan/task re-injection: when conversation history\nis compacted, active plans and incomplete tasks are re-injected into the\ncontext so the LLM doesn't lose track of what it's working on.\n\nUsage::\n\n from claw_compactor.fusion.plan_reinjection import PlanTaskTracker\n\n tracker = PlanTaskTracker()\n tracker.record_plan(\"Build authentication system\", steps=[...])\n tracker.record_task(\"T-001\", \"Implement login endpoint\", status=\"running\")\n\n # After compaction:\n injection_msg = tracker.build_injection_message()\n messages.append(injection_msg)\n\nPart of claw-compactor v8. 
License: MIT.\n\"\"\"\nfrom __future__ import annotations\n\nimport re\nimport time\nfrom dataclasses import dataclass, field\nfrom typing import Any, Optional\n\nfrom claw_compactor.tokens import estimate_tokens\n\n# Budget limits.\nPLAN_INJECTION_MAX_TOKENS = 10_000\nTASK_INJECTION_MAX_TOKENS = 5_000\nTOTAL_INJECTION_MAX_TOKENS = 15_000\n\n# Patterns to detect plans and tasks in messages.\n_PLAN_PATTERN = re.compile(\n r'(?:plan|roadmap|strategy|approach|steps?)[\\s:]+\\n((?:\\s*[-*\\d]+\\.?\\s+.+\\n?)+)',\n re.IGNORECASE | re.MULTILINE,\n)\n_TASK_PATTERN = re.compile(\n r'(?:T-\\d+|task|todo|action item)[\\s:]+(.+)',\n re.IGNORECASE,\n)\n_STATUS_PATTERN = re.compile(\n r'(?:status|state)[\\s:]+(\\w+)',\n re.IGNORECASE,\n)\n\n\n@dataclass\nclass PlanItem:\n \"\"\"A tracked plan with steps.\"\"\"\n title: str\n steps: list[str] = field(default_factory=list)\n created_at: float = field(default_factory=time.time)\n completed: bool = False\n\n\n@dataclass\nclass TaskItem:\n \"\"\"A tracked task.\"\"\"\n task_id: str\n title: str\n status: str = \"pending\" # pending, running, done, failed, blocked\n created_at: float = field(default_factory=time.time)\n updated_at: float = field(default_factory=time.time)\n\n\nclass PlanTaskTracker:\n \"\"\"Tracks active plans and tasks for re-injection after compaction.\n\n Plans and tasks can be explicitly recorded via the API, or auto-detected\n from conversation messages using heuristic patterns.\n \"\"\"\n\n def __init__(self) -> None:\n self._plans: list[PlanItem] = []\n self._tasks: dict[str, TaskItem] = {}\n self._auto_id_counter: int = 0\n\n # ------------------------------------------------------------------\n # Explicit API\n # ------------------------------------------------------------------\n\n def record_plan(\n self,\n title: str,\n steps: Optional[list[str]] = None,\n ) -> PlanItem:\n \"\"\"Record an active plan.\"\"\"\n plan = PlanItem(title=title, steps=steps or [])\n self._plans.append(plan)\n return plan\n\n def 
complete_plan(self, title: str) -> None:\n \"\"\"Mark a plan as completed.\"\"\"\n for plan in self._plans:\n if plan.title == title:\n plan.completed = True\n return\n\n def record_task(\n self,\n task_id: str,\n title: str,\n status: str = \"pending\",\n ) -> TaskItem:\n \"\"\"Record or update a task.\"\"\"\n if task_id in self._tasks:\n task = self._tasks[task_id]\n task.title = title\n task.status = status\n task.updated_at = time.time()\n else:\n task = TaskItem(task_id=task_id, title=title, status=status)\n self._tasks[task_id] = task\n return task\n\n def update_task_status(self, task_id: str, status: str) -> None:\n \"\"\"Update a task's status.\"\"\"\n if task_id in self._tasks:\n self._tasks[task_id].status = status\n self._tasks[task_id].updated_at = time.time()\n\n # ------------------------------------------------------------------\n # Auto-detection from messages\n # ------------------------------------------------------------------\n\n def scan_messages(self, messages: list[dict[str, Any]]) -> dict[str, int]:\n \"\"\"Scan messages for plans and tasks, auto-recording them.\n\n Returns stats about what was found.\n \"\"\"\n plans_found = 0\n tasks_found = 0\n\n for msg in messages:\n content = msg.get(\"content\", \"\")\n if not isinstance(content, str):\n continue\n\n # Detect plans.\n for match in _PLAN_PATTERN.finditer(content):\n steps_text = match.group(1)\n steps = [\n line.strip().lstrip(\"-*0123456789. 
\")\n for line in steps_text.strip().split(\"\\n\")\n if line.strip()\n ]\n if steps:\n self._auto_id_counter += 1\n title = f\"Auto-detected plan #{self._auto_id_counter}\"\n self.record_plan(title, steps)\n plans_found += 1\n\n # Detect tasks.\n for match in _TASK_PATTERN.finditer(content):\n task_title = match.group(1).strip()[:200]\n if task_title:\n self._auto_id_counter += 1\n task_id = f\"auto-{self._auto_id_counter}\"\n status = \"pending\"\n status_match = _STATUS_PATTERN.search(content)\n if status_match:\n detected = status_match.group(1).lower()\n if detected in (\"running\", \"done\", \"failed\", \"blocked\", \"pending\"):\n status = detected\n self.record_task(task_id, task_title, status)\n tasks_found += 1\n\n return {\"plans_found\": plans_found, \"tasks_found\": tasks_found}\n\n # ------------------------------------------------------------------\n # Injection\n # ------------------------------------------------------------------\n\n @property\n def active_plans(self) -> list[PlanItem]:\n \"\"\"Return plans that are not completed.\"\"\"\n return [p for p in self._plans if not p.completed]\n\n @property\n def active_tasks(self) -> list[TaskItem]:\n \"\"\"Return tasks that are not done or cancelled.\"\"\"\n return [\n t for t in self._tasks.values()\n if t.status not in (\"done\", \"cancelled\")\n ]\n\n def build_injection_message(self) -> Optional[dict[str, Any]]:\n \"\"\"Build a system message with active plans and tasks for re-injection.\n\n Returns None if there's nothing to inject.\n \"\"\"\n parts: list[str] = []\n total_tokens = 0\n\n # Plans section.\n active_plans = self.active_plans\n if active_plans:\n plan_lines = [\"## Active Plans (re-injected after compaction)\\n\"]\n for plan in active_plans:\n plan_lines.append(f\"### {plan.title}\")\n for i, step in enumerate(plan.steps, 1):\n plan_lines.append(f\" {i}. 
{step}\")\n plan_lines.append(\"\")\n\n plan_text = \"\\n\".join(plan_lines)\n plan_tokens = estimate_tokens(plan_text)\n if plan_tokens \u003c= PLAN_INJECTION_MAX_TOKENS:\n parts.append(plan_text)\n total_tokens += plan_tokens\n else:\n # Truncate steps.\n truncated_lines = plan_lines[:20]\n truncated_lines.append(\n f\"\\n[...truncated, {len(plan_lines) - 20} more lines]\"\n )\n parts.append(\"\\n\".join(truncated_lines))\n total_tokens += estimate_tokens(\"\\n\".join(truncated_lines))\n\n # Tasks section.\n active_tasks = self.active_tasks\n if active_tasks:\n task_lines = [\"## Active Tasks (re-injected after compaction)\\n\"]\n task_lines.append(\"| ID | Title | Status |\")\n task_lines.append(\"|-----|-------|--------|\")\n for task in active_tasks:\n task_lines.append(f\"| {task.task_id} | {task.title[:80]} | {task.status} |\")\n task_lines.append(\"\")\n\n task_text = \"\\n\".join(task_lines)\n task_tokens = estimate_tokens(task_text)\n if total_tokens + task_tokens \u003c= TOTAL_INJECTION_MAX_TOKENS:\n parts.append(task_text)\n total_tokens += task_tokens\n\n if not parts:\n return None\n\n return {\n \"role\": \"system\",\n \"content\": \"\\n\".join(parts),\n \"_metadata\": {\n \"type\": \"plan_task_reinjection\",\n \"plans_count\": len(active_plans),\n \"tasks_count\": len(active_tasks),\n \"tokens\": total_tokens,\n },\n }\n\n def clear_completed(self) -> int:\n \"\"\"Remove completed plans and done tasks. 
Returns count removed.\"\"\"\n removed = 0\n self._plans = [p for p in self._plans if not p.completed]\n before = len(self._tasks)\n self._tasks = {\n k: v for k, v in self._tasks.items()\n if v.status not in (\"done\", \"cancelled\")\n }\n removed += before - len(self._tasks)\n return removed\n\n def to_dict(self) -> dict[str, Any]:\n \"\"\"Serialize tracker state.\"\"\"\n return {\n \"plans\": [\n {\"title\": p.title, \"steps\": p.steps, \"completed\": p.completed}\n for p in self._plans\n ],\n \"tasks\": [\n {\n \"id\": t.task_id,\n \"title\": t.title,\n \"status\": t.status,\n }\n for t in self._tasks.values()\n ],\n }\n","content_type":"text/x-python; charset=utf-8","language":"python","size":9528,"content_sha256":"add36e18f758a88d3bec5aa95b0076ac54a381336bf8aa0291e937a0c725028f"},{"filename":"scripts/lib/fusion/quantum_lock.py","content":"\"\"\"Quantum Lock — KV Cache alignment as a FusionStage.\n\nRuns at order=3, just before Cortex (order=5), so that downstream stages\nalways receive a prefix-stable system message.\n\nThe Anthropic prompt cache keys on the first N tokens of the system prompt.\nAny dynamic content (dates, UUIDs, API keys, JWTs, timestamps) that appears\nnear the top of a system message will bust the cache on every request.\n\nQuantumLock solves this by:\n 1. Detecting all dynamic fragments using regex patterns.\n 2. Replacing each occurrence with a stable placeholder token.\n 3. Appending a clearly delimited \"dynamic context\" block at the END of the\n message so the model still has access to the real values.\n\nThe \"quantum\" metaphor: dynamic values are collapsed into a deterministic\ntail section so the wavefunction of the prefix stays locked (stable).\n\nPart of claw-compactor Phase 5. 
License: MIT.\n\"\"\"\nfrom __future__ import annotations\n\nimport hashlib\nimport re\nfrom dataclasses import dataclass\nfrom typing import Any\n\nfrom claw_compactor.fusion.base import FusionContext, FusionResult, FusionStage\nfrom claw_compactor.tokens import estimate_tokens\n\n\n# ---------------------------------------------------------------------------\n# Dynamic content patterns\n# ---------------------------------------------------------------------------\n\n@dataclass(frozen=True)\nclass DynamicPattern:\n \"\"\"A compiled pattern that identifies dynamic content.\"\"\"\n name: str\n regex: re.Pattern\n placeholder: str\n\n\n_RAW_PATTERNS: list[tuple[str, str, str]] = [\n # ISO 8601 date/datetime\n (\n \"iso_date\",\n r\"\\b\\d{4}-\\d{2}-\\d{2}\"\n r\"(?:T\\d{2}:\\d{2}:\\d{2}(?:\\.\\d+)?(?:Z|[+-]\\d{2}:\\d{2})?)?\\b\",\n \"\u003cdate>\",\n ),\n # Plain HH:MM:SS times\n (\n \"time\",\n r\"\\b\\d{2}:\\d{2}:\\d{2}\\b\",\n \"\u003ctime>\",\n ),\n # JWTs (eyJ...) — header.payload.signature\n (\n \"jwt\",\n r\"\\beyJ[A-Za-z0-9_-]+\\.[A-Za-z0-9_-]+\\.[A-Za-z0-9_-]+\\b\",\n \"\u003cjwt>\",\n ),\n # API keys: sk-..., rk-... OR pk_live_..., pk_test_... 
(Stripe-style underscore separator)\n (\n \"api_key\",\n r\"\\b(?:(?:sk|rk)-[A-Za-z0-9_-]{16,}|(?:pk_live|pk_test)_[A-Za-z0-9_-]{16,})\\b\",\n \"\u003capi_key>\",\n ),\n # UUIDs (case-insensitive)\n (\n \"uuid\",\n r\"\\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}\"\n r\"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\\b\",\n \"\u003cuuid>\",\n ),\n # Unix timestamps: 10-digit (seconds since ~2001) or 13-digit (ms)\n (\n \"unix_ts\",\n r\"\\b(?:1[5-9]\\d{8}|[2-9]\\d{9}|\\d{13})\\b\",\n \"\u003ctimestamp>\",\n ),\n # High-entropy hex strings: 32–64 hex chars (request/trace/session IDs)\n (\n \"hex_id\",\n r\"\\b[0-9a-fA-F]{32,64}\\b\",\n \"\u003cid>\",\n ),\n]\n\nDYNAMIC_PATTERNS: list[DynamicPattern] = [\n DynamicPattern(\n name=name,\n regex=re.compile(pattern),\n placeholder=placeholder,\n )\n for name, pattern, placeholder in _RAW_PATTERNS\n]\n\nAPPENDIX_START = \"\u003c!-- quantum-lock: dynamic context -->\"\nAPPENDIX_END = \"\u003c!-- end quantum-lock -->\"\nAPPENDIX_SEPARATOR = \"---\"\n\n\n# ---------------------------------------------------------------------------\n# Public functions (usable standalone, not only as a FusionStage)\n# ---------------------------------------------------------------------------\n\n@dataclass(frozen=True)\nclass DynamicFragment:\n \"\"\"A single dynamic fragment extracted from content.\"\"\"\n name: str\n original: str\n placeholder: str\n indices: tuple[int, ...] 
# positions in the original string\n\n\ndef extract_dynamic(content: str) -> list[DynamicFragment]:\n \"\"\"Return all dynamic fragments found in *content*, sorted by first index.\n\n De-duplicates by original value: the same UUID appearing multiple times\n is reported once with all its positions.\n \"\"\"\n seen: dict[str, DynamicFragment] = {}\n\n for dp in DYNAMIC_PATTERNS:\n for match in dp.regex.finditer(content):\n val = match.group(0)\n if val in seen:\n frag = seen[val]\n seen[val] = DynamicFragment(\n name=frag.name,\n original=frag.original,\n placeholder=frag.placeholder,\n indices=(*frag.indices, match.start()),\n )\n else:\n seen[val] = DynamicFragment(\n name=dp.name,\n original=val,\n placeholder=dp.placeholder,\n indices=(match.start(),),\n )\n\n return sorted(seen.values(), key=lambda f: f.indices[0])\n\n\ndef stabilize(content: str) -> str:\n \"\"\"Stabilise *content* for KV cache alignment.\n\n Replaces dynamic fragments with placeholders and appends a\n \"dynamic context\" appendix at the end so the model still has\n access to the real values.\n\n Returns *content* unchanged if no dynamic fragments are found.\n \"\"\"\n fragments = extract_dynamic(content)\n if not fragments:\n return content\n\n stabilized = content\n # Process longest originals first to avoid partial substitution\n for frag in sorted(fragments, key=lambda f: len(f.original), reverse=True):\n stabilized = stabilized.replace(frag.original, frag.placeholder)\n\n appendix_lines = [\n \"\",\n APPENDIX_SEPARATOR,\n APPENDIX_START,\n ]\n for frag in fragments:\n appendix_lines.append(f\"{frag.name}: {frag.original}\")\n appendix_lines.append(APPENDIX_END)\n\n return stabilized + \"\\n\".join(appendix_lines)\n\n\ndef get_prefix_hash(content: str) -> str:\n \"\"\"Return a SHA-256 hex digest of the stable prefix of *content*.\n\n The stable prefix is the portion before the quantum-lock appendix\n delimiter. 
Identical hashes across requests indicate a likely\n prompt-cache hit.\n \"\"\"\n stabilized = stabilize(content)\n marker = f\"\\n{APPENDIX_SEPARATOR}\\n{APPENDIX_START}\"\n idx = stabilized.find(marker)\n prefix = stabilized[:idx] if idx != -1 else stabilized\n return hashlib.sha256(prefix.encode(\"utf-8\")).hexdigest()\n\n\n# ---------------------------------------------------------------------------\n# FusionStage implementation\n# ---------------------------------------------------------------------------\n\nclass QuantumLock(FusionStage):\n \"\"\"KV cache alignment stage for the Fusion Pipeline.\n\n Runs at order=3 (before Cortex at order=5) so that every downstream\n stage receives a prefix-stable version of the content.\n\n Only applies to system-role content; user/assistant/tool messages are\n passed through unchanged (they are not cached by Anthropic).\n \"\"\"\n\n name = \"quantum_lock\"\n order = 3 # runs before Cortex (order=5)\n\n def should_apply(self, ctx: FusionContext) -> bool:\n \"\"\"Apply only to system messages that contain dynamic content.\"\"\"\n if ctx.role != \"system\":\n return False\n return bool(extract_dynamic(ctx.content))\n\n def apply(self, ctx: FusionContext) -> FusionResult:\n original_tokens = estimate_tokens(ctx.content)\n stabilized = stabilize(ctx.content)\n compressed_tokens = estimate_tokens(stabilized)\n\n fragments = extract_dynamic(ctx.content)\n markers = [\n f\"quantum_lock:{frag.name}={frag.placeholder}\"\n for frag in fragments\n ]\n\n warnings: list[str] = []\n if compressed_tokens > original_tokens:\n warnings.append(\n f\"quantum_lock: stabilized content is larger than original \"\n f\"({compressed_tokens} > {original_tokens} tokens) — \"\n f\"dynamic appendix overhead\"\n )\n\n return FusionResult(\n content=stabilized,\n original_tokens=original_tokens,\n compressed_tokens=compressed_tokens,\n markers=markers,\n warnings=warnings,\n )\n","content_type":"text/x-python; 
charset=utf-8","language":"python","size":7844,"content_sha256":"51abf9ea0309959cfc6af4ead2d25d616c6caba5275ee869f8c9573880560dc3"},{"filename":"scripts/lib/fusion/search_crunch.py","content":"\"\"\"SearchCrunch — grep/ripgrep output compression FusionStage.\n\nParses \"file:line:content\" search output, groups by file, deduplicates\nidentical matches, merges consecutive line numbers into ranges, and\ntruncates to top-N files by match count when the result set is large.\n\nPart of claw-compactor. License: MIT.\n\"\"\"\nfrom __future__ import annotations\n\nimport re\nfrom collections import defaultdict\nfrom dataclasses import dataclass, field\n\nfrom claw_compactor.fusion.base import FusionStage, FusionContext, FusionResult\nfrom claw_compactor.tokens import estimate_tokens\n\n# ---------------------------------------------------------------------------\n# Configuration constants\n# ---------------------------------------------------------------------------\n\n# Maximum number of files to retain; the rest are summarised.\n_MAX_FILES = 20\n\n# Maximum matches to show per file before truncating.\n_MAX_MATCHES_PER_FILE = 50\n\n# Matches the canonical \"file:line:content\" format produced by grep/rg.\n# We also tolerate Windows paths like \"C:\\path\\to\\file:10:content\".\n_GREP_LINE_RE = re.compile(\n r'^(?P\u003cfile>(?:[A-Za-z]:[\\\\/]|/|\\.[\\\\/])?[^\\x00:]+?)'\n r':(?P\u003cline>\\d+)'\n r':(?P\u003ccontent>.*)

$'

\n)\n\n# Lines that look like binary-match notifications or separator lines.\n_SEPARATOR_RE = re.compile(r'^--

$'

)\n_BINARY_RE = re.compile(r'Binary file .+ matches')\n\n\n# ---------------------------------------------------------------------------\n# Data structures\n# ---------------------------------------------------------------------------\n\n@dataclass\nclass _Match:\n line_no: int\n content: str\n\n\n@dataclass\nclass _FileMatches:\n path: str\n matches: list[_Match] = field(default_factory=list)\n\n\n# ---------------------------------------------------------------------------\n# Parsing helpers\n# ---------------------------------------------------------------------------\n\ndef _parse_grep_output(text: str) -> tuple[dict[str, _FileMatches], list[str]]:\n \"\"\"\n Parse grep/rg output into per-file match collections.\n\n Returns:\n (file_map, unparsed_lines) where file_map maps path -> _FileMatches\n and unparsed_lines holds lines that did not match the grep format.\n \"\"\"\n file_map: dict[str, _FileMatches] = {}\n unparsed: list[str] = []\n\n for raw_line in text.splitlines():\n if not raw_line.strip():\n continue\n if _SEPARATOR_RE.match(raw_line):\n continue\n if _BINARY_RE.match(raw_line):\n unparsed.append(raw_line)\n continue\n\n m = _GREP_LINE_RE.match(raw_line)\n if m:\n path = m.group(\"file\")\n line_no = int(m.group(\"line\"))\n content = m.group(\"content\")\n if path not in file_map:\n file_map[path] = _FileMatches(path=path)\n file_map[path].matches.append(_Match(line_no=line_no, content=content))\n else:\n unparsed.append(raw_line)\n\n return file_map, unparsed\n\n\ndef _dedup_matches(matches: list[_Match]) -> list[_Match]:\n \"\"\"Remove matches with identical content on the same line number.\"\"\"\n seen: set[tuple[int, str]] = set()\n result: list[_Match] = []\n for m in matches:\n key = (m.line_no, m.content.strip())\n if key not in seen:\n seen.add(key)\n result.append(m)\n return result\n\n\ndef _merge_consecutive(matches: list[_Match]) -> list[str]:\n \"\"\"\n Merge consecutive or adjacent line numbers into range strings.\n\n Returns a list 
of formatted strings like:\n \" L10: content\"\n \" L12-15: [4 lines]\"\n \"\"\"\n if not matches:\n return []\n\n sorted_matches = sorted(matches, key=lambda m: m.line_no)\n output: list[str] = []\n\n i = 0\n while i \u003c len(sorted_matches):\n start = sorted_matches[i]\n j = i + 1\n # Extend run while line numbers are consecutive.\n while j \u003c len(sorted_matches) and sorted_matches[j].line_no == sorted_matches[j - 1].line_no + 1:\n j += 1\n\n run = sorted_matches[i:j]\n if len(run) == 1:\n output.append(f\" L{start.line_no}: {start.content}\")\n elif len(run) == 2:\n # Two lines — show both individually; the range marker adds no value.\n for r in run:\n output.append(f\" L{r.line_no}: {r.content}\")\n else:\n first_content = run[0].content\n last_content = run[-1].content\n output.append(f\" L{run[0].line_no}: {first_content}\")\n output.append(f\" L{run[0].line_no + 1}-{run[-1].line_no - 1}: [{len(run) - 2} lines omitted]\")\n output.append(f\" L{run[-1].line_no}: {last_content}\")\n\n i = j\n\n return output\n\n\ndef _format_file_section(fm: _FileMatches, max_matches: int) -> list[str]:\n \"\"\"Format a single file's matches into output lines.\"\"\"\n deduped = _dedup_matches(fm.matches)\n total = len(deduped)\n\n truncated = deduped[:max_matches]\n lines = _merge_consecutive(truncated)\n\n section: list[str] = [f\"{fm.path} ({total} match{'es' if total != 1 else ''}):\"]\n section.extend(lines)\n if total > max_matches:\n section.append(f\" ... 
[{total - max_matches} more matches not shown]\")\n return section\n\n\n# ---------------------------------------------------------------------------\n# FusionStage implementation\n# ---------------------------------------------------------------------------\n\nclass SearchCrunch(FusionStage):\n \"\"\"grep/ripgrep search result compression.\"\"\"\n\n name = \"search_crunch\"\n order = 17\n\n def __init__(\n self,\n max_files: int = _MAX_FILES,\n max_matches_per_file: int = _MAX_MATCHES_PER_FILE,\n ) -> None:\n self._max_files = max_files\n self._max_matches_per_file = max_matches_per_file\n\n def should_apply(self, ctx: FusionContext) -> bool:\n return ctx.content_type == \"search\"\n\n def apply(self, ctx: FusionContext) -> FusionResult:\n original_tokens = estimate_tokens(ctx.content)\n file_map, unparsed = _parse_grep_output(ctx.content)\n\n if not file_map:\n # Nothing parseable — return as-is.\n return FusionResult(\n content=ctx.content,\n original_tokens=original_tokens,\n compressed_tokens=original_tokens,\n skipped=True,\n warnings=[\"search_crunch: no grep-format lines found\"],\n )\n\n # Sort files by descending match count, then alphabetically.\n sorted_files = sorted(\n file_map.values(),\n key=lambda fm: (-len(fm.matches), fm.path),\n )\n\n total_files = len(sorted_files)\n omitted_files = max(0, total_files - self._max_files)\n top_files = sorted_files[: self._max_files]\n\n output_sections: list[str] = []\n\n # Summary header.\n total_matches = sum(len(fm.matches) for fm in sorted_files)\n output_sections.append(\n f\"Search results: {total_matches} matches across {total_files} file{'s' if total_files != 1 else ''}\"\n )\n if omitted_files:\n output_sections.append(\n f\"[Showing top {self._max_files} of {total_files} files by match count]\"\n )\n\n for fm in top_files:\n output_sections.append(\"\")\n output_sections.extend(_format_file_section(fm, self._max_matches_per_file))\n\n if omitted_files:\n omitted_names = [fm.path for fm in 
sorted_files[self._max_files:]]\n output_sections.append(\"\")\n output_sections.append(\n f\"[{omitted_files} additional file{'s' if omitted_files != 1 else ''} omitted: \"\n + \", \".join(omitted_names[:5])\n + (\" ...\" if len(omitted_names) > 5 else \"\")\n + \"]\"\n )\n\n if unparsed:\n output_sections.append(\"\")\n output_sections.append(f\"[{len(unparsed)} non-grep line(s):]\")\n output_sections.extend(f\" {ln}\" for ln in unparsed[:10])\n if len(unparsed) > 10:\n output_sections.append(f\" ... [{len(unparsed) - 10} more]\")\n\n compressed = \"\\n\".join(output_sections)\n compressed_tokens = estimate_tokens(compressed)\n\n markers = [f\"search_crunch:{total_files} files, {total_matches} matches\"]\n if omitted_files:\n markers.append(f\"search_crunch:omitted {omitted_files} files\")\n\n return FusionResult(\n content=compressed,\n original_tokens=original_tokens,\n compressed_tokens=compressed_tokens,\n markers=markers,\n )\n","content_type":"text/x-python; charset=utf-8","language":"python","size":8578,"content_sha256":"01ca4a0f6cb0c728213901dccaeab9d35dd29a72215c059b5bad6190cf27ecd0"},{"filename":"scripts/lib/fusion/semantic_dedup.py","content":"\"\"\"SemanticDedup — near-duplicate content block elimination FusionStage.\n\nDetects and eliminates repeated content blocks within a single text using\n3-word shingle fingerprinting (no external dependencies). Blocks with\nJaccard similarity > 0.8 are considered near-duplicates; only the first\noccurrence is kept, later ones are replaced with a compact reference.\n\nAlso exposes ``dedup_across_messages`` for cross-message deduplication in\na chat message list.\n\nPart of claw-compactor. 
License: MIT.\n\"\"\"\nfrom __future__ import annotations\n\nimport re\nfrom dataclasses import dataclass, field\nfrom typing import Sequence\n\nfrom claw_compactor.fusion.base import FusionContext, FusionResult, FusionStage\nfrom claw_compactor.tokens import estimate_tokens\n\n# ---------------------------------------------------------------------------\n# Constants\n# ---------------------------------------------------------------------------\n\n# Minimum block length (chars) to be considered for deduplication.\n_MIN_BLOCK_CHARS = 50\n\n# Jaccard similarity threshold above which two blocks are \"near-duplicate\".\n_SIM_THRESHOLD = 0.80\n\n# Shingle size (number of consecutive words).\n_SHINGLE_N = 3\n\n# Minimum shingle set size; blocks with fewer shingles are not fingerprinted.\n_MIN_SHINGLES = 2\n\n# Template used to replace a duplicate block in-place.\n_REF_TEMPLATE = \"[duplicate of block {n} — omitted]\"\n\n# Template used for cross-message references.\n_MSG_REF_TEMPLATE = \"[content similar to message {idx} — omitted]\"\n\n\n# ---------------------------------------------------------------------------\n# Fingerprinting helpers\n# ---------------------------------------------------------------------------\n\ndef _tokenise(text: str) -> list[str]:\n \"\"\"Split text into lowercase word tokens (letters + digits only).\"\"\"\n return re.findall(r\"[a-z0-9]+\", text.lower())\n\n\ndef _shingles(tokens: list[str], n: int = _SHINGLE_N) -> frozenset[tuple[str, ...]]:\n \"\"\"Return the set of n-gram shingles from *tokens*.\"\"\"\n if len(tokens) \u003c n:\n return frozenset()\n return frozenset(tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1))\n\n\ndef _jaccard(a: frozenset, b: frozenset) -> float:\n \"\"\"Return the Jaccard similarity of two sets.\"\"\"\n if not a and not b:\n return 1.0\n union = len(a | b)\n if union == 0:\n return 0.0\n return len(a & b) / union\n\n\n# ---------------------------------------------------------------------------\n# 
Block splitting\n# ---------------------------------------------------------------------------\n\n@dataclass\nclass _Block:\n \"\"\"A single logical block extracted from the source text.\"\"\"\n text: str\n # Offsets into the *original* text for reconstruction.\n start: int\n end: int\n is_code: bool = False\n shingles: frozenset = field(default_factory=frozenset)\n kept: bool = True\n ref_to: int | None = None # 1-based index of the first occurrence\n\n\n_CODE_FENCE_RE = re.compile(r\"```.*?```\", re.DOTALL)\n\n\ndef _split_blocks(text: str) -> list[_Block]:\n \"\"\"\n Split *text* into logical blocks.\n\n Rules (applied in order):\n 1. Fenced code blocks (``` ... ```) are treated as atomic units.\n 2. All remaining text is split on blank lines (one or more empty lines).\n \"\"\"\n blocks: list[_Block] = []\n # Find code fence spans so we can protect them.\n fence_spans: list[tuple[int, int]] = [\n (m.start(), m.end()) for m in _CODE_FENCE_RE.finditer(text)\n ]\n\n def _in_fence(start: int, end: int) -> bool:\n return any(fs \u003c= start and end \u003c= fe for fs, fe in fence_spans)\n\n # Add fenced code blocks as atomic blocks first.\n for fs, fe in fence_spans:\n block_text = text[fs:fe]\n sh = _shingles(_tokenise(block_text))\n blocks.append(_Block(\n text=block_text,\n start=fs,\n end=fe,\n is_code=True,\n shingles=sh,\n ))\n\n # Build a set of positions covered by fences.\n fence_positions: set[int] = set()\n for fs, fe in fence_spans:\n fence_positions.update(range(fs, fe))\n\n # Split the non-fence remainder on blank lines.\n # We iterate over the text, collecting runs of non-fence characters.\n # Then split those runs by blank-line boundaries.\n non_fence_segments: list[tuple[int, str]] = []\n i = 0\n while i \u003c len(text):\n if i in fence_positions:\n i += 1\n continue\n seg_start = i\n buf: list[str] = []\n while i \u003c len(text) and i not in fence_positions:\n buf.append(text[i])\n i += 1\n segment = \"\".join(buf)\n if segment.strip():\n 
non_fence_segments.append((seg_start, segment))\n\n for seg_start, segment in non_fence_segments:\n # Split by blank lines (2+ newlines or line with only whitespace).\n para_re = re.compile(r\"\\n\\s*\\n\")\n last = 0\n for m in para_re.finditer(segment):\n chunk = segment[last : m.start()]\n if chunk.strip():\n abs_start = seg_start + last\n abs_end = seg_start + m.start()\n sh = _shingles(_tokenise(chunk))\n blocks.append(_Block(\n text=chunk,\n start=abs_start,\n end=abs_end,\n is_code=False,\n shingles=sh,\n ))\n last = m.end()\n # Trailing chunk after last separator.\n chunk = segment[last:]\n if chunk.strip():\n abs_start = seg_start + last\n abs_end = seg_start + len(segment)\n sh = _shingles(_tokenise(chunk))\n blocks.append(_Block(\n text=chunk,\n start=abs_start,\n end=abs_end,\n is_code=False,\n shingles=sh,\n ))\n\n # Sort by position in original text.\n blocks.sort(key=lambda b: b.start)\n return blocks\n\n\n# ---------------------------------------------------------------------------\n# Core dedup logic\n# ---------------------------------------------------------------------------\n\n@dataclass\nclass DedupStats:\n \"\"\"Statistics returned from a dedup run.\"\"\"\n blocks_total: int = 0\n blocks_kept: int = 0\n blocks_deduped: int = 0\n chars_removed: int = 0\n tokens_before: int = 0\n tokens_after: int = 0\n\n @property\n def blocks_skipped_too_short(self) -> int:\n return self.blocks_total - self.blocks_kept - self.blocks_deduped\n\n def as_dict(self) -> dict:\n return {\n \"blocks_total\": self.blocks_total,\n \"blocks_kept\": self.blocks_kept,\n \"blocks_deduped\": self.blocks_deduped,\n \"chars_removed\": self.chars_removed,\n \"tokens_before\": self.tokens_before,\n \"tokens_after\": self.tokens_after,\n }\n\n\ndef _run_dedup(text: str) -> tuple[str, DedupStats]:\n \"\"\"\n Run within-text block deduplication.\n\n Returns the rewritten text and statistics.\n \"\"\"\n stats = DedupStats(tokens_before=estimate_tokens(text))\n\n blocks = 
_split_blocks(text)\n stats.blocks_total = len(blocks)\n\n if not blocks:\n stats.tokens_after = stats.tokens_before\n return text, stats\n\n # Assign 1-based sequential numbers for use in references.\n # We'll use the position in the sorted block list as the \"block number\".\n # Blocks that are too short to consider receive no shingle set.\n\n # First pass: mark duplicates.\n # kept_blocks: list of (block_number, shingles) for blocks we are keeping.\n kept_blocks: list[tuple[int, frozenset]] = []\n\n for idx, block in enumerate(blocks):\n block_num = idx + 1 # 1-based\n short = len(block.text.strip()) \u003c _MIN_BLOCK_CHARS\n no_shingles = len(block.shingles) \u003c _MIN_SHINGLES\n\n if short or no_shingles:\n # Too short / no shingles — always keep, never dedup.\n block.kept = True\n block.ref_to = None\n continue\n\n # Compare against all previously kept blocks.\n duplicate_of: int | None = None\n for prev_num, prev_sh in kept_blocks:\n sim = _jaccard(block.shingles, prev_sh)\n if sim >= _SIM_THRESHOLD:\n duplicate_of = prev_num\n break\n\n if duplicate_of is not None:\n block.kept = False\n block.ref_to = duplicate_of\n else:\n block.kept = True\n kept_blocks.append((block_num, block.shingles))\n\n # Second pass: reconstruct the text.\n # We rebuild from the original text, replacing duplicate block spans with\n # compact references. 
Because blocks may not cover the full text (gaps\n # between them contain separators / fences), we work by scanning through\n # the original text character by character.\n\n result_parts: list[str] = []\n pos = 0\n blocks_kept = 0\n blocks_deduped = 0\n chars_removed = 0\n\n for block in blocks:\n # Append any gap before this block.\n if block.start > pos:\n result_parts.append(text[pos : block.start])\n pos = block.end\n\n if block.kept:\n result_parts.append(block.text)\n blocks_kept += 1\n else:\n ref = _REF_TEMPLATE.format(n=block.ref_to)\n result_parts.append(ref)\n chars_removed += len(block.text) - len(ref)\n blocks_deduped += 1\n\n # Append any trailing text after the last block.\n if pos \u003c len(text):\n result_parts.append(text[pos:])\n\n output = \"\".join(result_parts)\n\n stats.blocks_kept = blocks_kept\n stats.blocks_deduped = blocks_deduped\n stats.chars_removed = max(0, chars_removed)\n stats.tokens_after = estimate_tokens(output)\n\n return output, stats\n\n\n# ---------------------------------------------------------------------------\n# FusionStage\n# ---------------------------------------------------------------------------\n\nclass SemanticDedup(FusionStage):\n \"\"\"Near-duplicate content block eliminator.\n\n Splits text into blocks (paragraphs + fenced code blocks), fingerprints\n each with 3-word shingles, and replaces near-duplicate blocks\n (Jaccard >= 0.8) with compact back-references.\n \"\"\"\n\n name = \"semantic_dedup\"\n order = 12 # After Cortex(5), after any RLE-style stages(10), before Ionizer(15)\n\n def should_apply(self, ctx: FusionContext) -> bool:\n \"\"\"Apply to any content longer than 200 characters.\"\"\"\n return len(ctx.content) > 200\n\n def apply(self, ctx: FusionContext) -> FusionResult:\n original_tokens = estimate_tokens(ctx.content)\n output, stats = _run_dedup(ctx.content)\n compressed_tokens = estimate_tokens(output)\n\n markers: list[str] = []\n if stats.blocks_deduped > 0:\n markers.append(\n 
f\"semantic_dedup:{stats.blocks_deduped}_blocks_removed\"\n f\":{stats.tokens_before}->{compressed_tokens}_tokens\"\n )\n\n return FusionResult(\n content=output,\n original_tokens=original_tokens,\n compressed_tokens=compressed_tokens,\n markers=markers,\n )\n\n\n# ---------------------------------------------------------------------------\n# Cross-message deduplication\n# ---------------------------------------------------------------------------\n\ndef dedup_across_messages(\n messages: list[dict],\n) -> tuple[list[dict], dict]:\n \"\"\"Deduplicate repeated content across multiple chat messages.\n\n If message B's content is >80% similar to a prior message A's content,\n B's content is replaced with a compact reference to A.\n\n Only processes messages whose ``content`` value is a non-empty string.\n Messages with list-valued content (multi-part) are passed through\n unchanged.\n\n Args:\n messages: List of message dicts, each with at least a ``\"content\"``\n key (and typically a ``\"role\"`` key).\n\n Returns:\n A 2-tuple of (deduped_messages, stats).\n ``deduped_messages`` is a new list — the originals are not mutated.\n ``stats`` is a plain dict with keys:\n - ``messages_total``\n - ``messages_deduped``\n - ``tokens_before``\n - ``tokens_after``\n \"\"\"\n if not messages:\n return [], {\n \"messages_total\": 0,\n \"messages_deduped\": 0,\n \"tokens_before\": 0,\n \"tokens_after\": 0,\n }\n\n tokens_before = sum(\n estimate_tokens(m[\"content\"])\n for m in messages\n if isinstance(m.get(\"content\"), str)\n )\n\n # Build fingerprints for messages that are eligible for comparison.\n # A message is eligible when its content is a non-empty string and has\n # enough shingles.\n kept: list[tuple[int, frozenset]] = [] # (0-based index, shingles)\n deduped_messages: list[dict] = []\n deduped_count = 0\n\n for idx, msg in enumerate(messages):\n content = msg.get(\"content\")\n\n # Non-string or empty content — pass through unchanged.\n if not isinstance(content, 
str) or not content.strip():\n deduped_messages.append(dict(msg))\n continue\n\n sh = _shingles(_tokenise(content))\n too_short = len(content.strip()) \u003c _MIN_BLOCK_CHARS\n no_shingles = len(sh) \u003c _MIN_SHINGLES\n\n if too_short or no_shingles:\n deduped_messages.append(dict(msg))\n kept.append((idx, sh))\n continue\n\n # Compare against all previously kept messages.\n duplicate_of: int | None = None\n for prev_idx, prev_sh in kept:\n sim = _jaccard(sh, prev_sh)\n if sim >= _SIM_THRESHOLD:\n duplicate_of = prev_idx\n break\n\n if duplicate_of is not None:\n new_msg = dict(msg)\n new_msg[\"content\"] = _MSG_REF_TEMPLATE.format(idx=duplicate_of)\n deduped_messages.append(new_msg)\n deduped_count += 1\n else:\n deduped_messages.append(dict(msg))\n kept.append((idx, sh))\n\n tokens_after = sum(\n estimate_tokens(m[\"content\"])\n for m in deduped_messages\n if isinstance(m.get(\"content\"), str)\n )\n\n stats = {\n \"messages_total\": len(messages),\n \"messages_deduped\": deduped_count,\n \"tokens_before\": tokens_before,\n \"tokens_after\": tokens_after,\n }\n return deduped_messages, stats\n","content_type":"text/x-python; charset=utf-8","language":"python","size":14351,"content_sha256":"f16193b2bd48b88aec9f267fd899676474a8ac1be12556c3ce7761ba6765bdbd"},{"filename":"scripts/lib/fusion/skill_reinjection.py","content":"\"\"\"SkillReinjection — re-inject recently used skill schemas after compaction.\n\nInspired by Claude Code's skill schema re-injection: when history is compacted,\nrecently used tool/skill schemas are re-injected so the LLM retains awareness\nof available tools.\n\nUsage::\n\n from claw_compactor.fusion.skill_reinjection import SkillSchemaTracker\n\n tracker = SkillSchemaTracker()\n tracker.record_usage(\"read_file\", schema={...})\n tracker.record_usage(\"edit_file\", schema={...})\n\n # After compaction:\n injection_msg = tracker.build_injection_message()\n messages.append(injection_msg)\n\nPart of claw-compactor v8. 
License: MIT.\n\"\"\"\nfrom __future__ import annotations\n\nimport time\nfrom dataclasses import dataclass, field\nfrom typing import Any, Optional\n\nfrom claw_compactor.tokens import estimate_tokens\n\n\n# Budget for skill schema injection.\nSKILL_INJECTION_MAX_TOKENS = 10_000\nMAX_SKILLS_TO_INJECT = 15\n\n\n@dataclass\nclass SkillRecord:\n \"\"\"A tracked skill/tool usage.\"\"\"\n name: str\n schema: dict[str, Any] = field(default_factory=dict)\n usage_count: int = 0\n last_used_at: float = field(default_factory=time.time)\n description: str = \"\"\n\n\nclass SkillSchemaTracker:\n \"\"\"Tracks recently used skills/tools for re-injection after compaction.\n\n Skills are ranked by recency and usage frequency. After compaction,\n the most relevant skill schemas are re-injected into the context.\n \"\"\"\n\n def __init__(self, max_skills: int = MAX_SKILLS_TO_INJECT) -> None:\n self._skills: dict[str, SkillRecord] = {}\n self._max_skills = max_skills\n\n def record_usage(\n self,\n name: str,\n schema: Optional[dict[str, Any]] = None,\n description: str = \"\",\n ) -> SkillRecord:\n \"\"\"Record a skill/tool usage.\"\"\"\n if name in self._skills:\n record = self._skills[name]\n record.usage_count += 1\n record.last_used_at = time.time()\n if schema is not None:\n record.schema = schema\n if description:\n record.description = description\n else:\n record = SkillRecord(\n name=name,\n schema=schema or {},\n usage_count=1,\n description=description,\n )\n self._skills[name] = record\n return record\n\n def scan_messages_for_tools(\n self, messages: list[dict[str, Any]]\n ) -> dict[str, int]:\n \"\"\"Scan messages to auto-detect tool usage patterns.\n\n Looks for tool-role messages and function_call patterns.\n Returns stats about tools found.\n \"\"\"\n tools_found = 0\n for msg in messages:\n role = msg.get(\"role\", \"\")\n\n # Tool result messages.\n if role == \"tool\":\n tool_name = msg.get(\"name\", msg.get(\"tool_call_id\", \"\"))\n if tool_name:\n 
self.record_usage(tool_name)\n tools_found += 1\n\n # Assistant messages with tool_calls.\n tool_calls = msg.get(\"tool_calls\", [])\n if isinstance(tool_calls, list):\n for tc in tool_calls:\n if isinstance(tc, dict):\n fn = tc.get(\"function\", {})\n name = fn.get(\"name\", \"\")\n if name:\n self.record_usage(name)\n tools_found += 1\n\n # Legacy function_call format.\n fc = msg.get(\"function_call\")\n if isinstance(fc, dict):\n name = fc.get(\"name\", \"\")\n if name:\n self.record_usage(name)\n tools_found += 1\n\n return {\"tools_found\": tools_found}\n\n @property\n def recent_skills(self) -> list[SkillRecord]:\n \"\"\"Return skills sorted by recency, limited to max_skills.\"\"\"\n sorted_skills = sorted(\n self._skills.values(),\n key=lambda s: (s.last_used_at, s.usage_count),\n reverse=True,\n )\n return sorted_skills[: self._max_skills]\n\n def build_injection_message(self) -> Optional[dict[str, Any]]:\n \"\"\"Build a system message with recently used skill schemas.\n\n Returns None if there are no skills to inject.\n \"\"\"\n skills = self.recent_skills\n if not skills:\n return None\n\n lines: list[str] = [\n \"## Recently Used Tools (re-injected after compaction)\\n\"\n ]\n total_tokens = 0\n\n for skill in skills:\n skill_line = f\"- **{skill.name}**\"\n if skill.description:\n skill_line += f\": {skill.description}\"\n skill_line += f\" (used {skill.usage_count}x)\"\n\n line_tokens = estimate_tokens(skill_line)\n if total_tokens + line_tokens > SKILL_INJECTION_MAX_TOKENS:\n lines.append(\n f\"\\n[...{len(skills) - len(lines) + 1} more tools truncated]\"\n )\n break\n lines.append(skill_line)\n total_tokens += line_tokens\n\n # Include schema if available and within budget.\n if skill.schema:\n import json\n schema_text = json.dumps(skill.schema, indent=2)\n schema_tokens = estimate_tokens(schema_text)\n if total_tokens + schema_tokens \u003c= SKILL_INJECTION_MAX_TOKENS:\n lines.append(f\" ```json\\n {schema_text}\\n ```\")\n total_tokens += 
schema_tokens\n\n if len(lines) \u003c= 1:\n return None\n\n content = \"\\n\".join(lines)\n return {\n \"role\": \"system\",\n \"content\": content,\n \"_metadata\": {\n \"type\": \"skill_schema_reinjection\",\n \"skills_count\": len(skills),\n \"tokens\": estimate_tokens(content),\n },\n }\n\n def clear(self) -> None:\n \"\"\"Clear all tracked skills.\"\"\"\n self._skills.clear()\n\n def to_dict(self) -> dict[str, Any]:\n \"\"\"Serialize tracker state.\"\"\"\n return {\n \"skills\": [\n {\n \"name\": s.name,\n \"usage_count\": s.usage_count,\n \"description\": s.description,\n \"has_schema\": bool(s.schema),\n }\n for s in self.recent_skills\n ],\n }\n","content_type":"text/x-python; charset=utf-8","language":"python","size":6435,"content_sha256":"13093d7fe8be0182491f7dd23d0dcd8b94c0d09d60de83c9141c14c416d44752"},{"filename":"scripts/lib/rewind/__init__.py","content":"\"\"\"Rewind reversible compression engine for Claw Compactor v7.0.\n\nPart of claw-compactor. License: MIT.\n\"\"\"\nfrom .store import RewindStore\nfrom .marker import embed_marker, extract_markers, has_markers\nfrom .retriever import rewind_tool_def, handle_rewind\n\n__all__ = [\n \"RewindStore\",\n \"embed_marker\",\n \"extract_markers\",\n \"has_markers\",\n \"rewind_tool_def\",\n \"handle_rewind\",\n]\n","content_type":"text/x-python; charset=utf-8","language":"python","size":396,"content_sha256":"fffb3cca9482c191f35c9703c4e8f12afafb4d504da2053608f30ae6e3a06a54"},{"filename":"scripts/lib/rewind/marker.py","content":"\"\"\"Rewind markers: embed/extract hash references in compressed text.\n\nPart of claw-compactor. License: MIT.\n\"\"\"\nfrom __future__ import annotations\nimport re\nfrom dataclasses import dataclass\n\n\nMARKER_PATTERN = re.compile(\n r'\\[(\\d+) items? compressed to (\\d+)\\. 
Retrieve: hash=([a-f0-9]{24})\\]'\n)\n\n\n@dataclass(frozen=True)\nclass MarkerInfo:\n original_count: int\n compressed_count: int\n hash_id: str\n span: tuple[int, int] # (start, end) in text\n\n\ndef embed_marker(text: str, original_count: int, compressed_count: int, hash_id: str) -> str:\n \"\"\"Append a Rewind retrieval marker to compressed text.\"\"\"\n item_word = \"item\" if original_count == 1 else \"items\"\n marker = f\"[{original_count} {item_word} compressed to {compressed_count}. Retrieve: hash={hash_id}]\"\n return f\"{text}\\n{marker}\"\n\n\ndef extract_markers(text: str) -> list[MarkerInfo]:\n \"\"\"Extract all Rewind markers from text.\"\"\"\n markers = []\n for match in MARKER_PATTERN.finditer(text):\n markers.append(MarkerInfo(\n original_count=int(match.group(1)),\n compressed_count=int(match.group(2)),\n hash_id=match.group(3),\n span=(match.start(), match.end()),\n ))\n return markers\n\n\ndef has_markers(text: str) -> bool:\n \"\"\"Return True if text contains any Rewind markers.\"\"\"\n return bool(MARKER_PATTERN.search(text))\n\n\ndef strip_markers(text: str) -> str:\n \"\"\"Remove all Rewind markers from text.\"\"\"\n return MARKER_PATTERN.sub(\"\", text).rstrip()\n","content_type":"text/x-python; charset=utf-8","language":"python","size":1506,"content_sha256":"333d90370024761da6a5d134846b5a646a1a6f353cb8cd2d1787b6394369cfb8"},{"filename":"SECURITY.md","content":"# Security Policy\n\n## Reporting Security Issues\n\nIf you discover a security vulnerability, please report it privately via GitHub Security Advisories.\n\nPlease include:\n- Description of the issue\n- Steps to reproduce\n- Potential impact\n- Suggested fix (if available)\n\n## Response Timeline\n\n- Acknowledgment within 48 hours\n- Regular progress updates\n\n## Supported Versions\n\nOnly the latest release is actively supported with security updates.","content_type":"text/markdown; 
charset=utf-8","language":"markdown","size":440,"content_sha256":"114e46847bf36110b67aecf02f40f0d9bf80d57f60d241babe6959c409b5331e"},{"filename":"tests/__init__.py","content":"","content_type":"text/x-python; charset=utf-8","language":"python","size":0,"content_sha256":"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"}],"content_json":{"type":"doc","content":[{"type":"heading","attrs":{"level":1},"content":[{"text":"Claw Compactor — OpenClaw Skill Reference","type":"text"}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Overview","type":"text"}]},{"type":"paragraph","content":[{"text":"Claw Compactor reduces token usage across the full OpenClaw workspace using 6 compression layers:","type":"text"}]},{"type":"table","attrs":{"layout":null},"content":[{"type":"tr","content":[{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Layer","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Name","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Cost","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Notes","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"1","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Rule 
Engine","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Free","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Dedup, strip filler, merge sections","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"2","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Dictionary Encoding","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Free","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Auto-codebook, ","type":"text"},{"text":"$XX","type":"text","marks":[{"type":"code_inline"}]},{"text":" substitution","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"3","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Observation Compression","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Free","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Session JSONL → structured 
summaries","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"4","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"RLE Patterns","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Free","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Path/IP/enum shorthand","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"5","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Compressed Context Protocol","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Free","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Format abbreviations","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"6","type":"text","marks":[{"type":"strong"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Engram","type":"text","marks":[{"type":"strong"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"LLM 
API","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Real-time Observational Memory","type":"text"}]}]}]}]},{"type":"paragraph","content":[{"text":"Skill location:","type":"text","marks":[{"type":"strong"}]},{"text":" ","type":"text"},{"text":"skills/claw-compactor/","type":"text","marks":[{"type":"code_inline"}]},{"type":"br"},{"text":"Entry point:","type":"text","marks":[{"type":"strong"}]},{"text":" ","type":"text"},{"text":"scripts/mem_compress.py","type":"text","marks":[{"type":"code_inline"}]},{"type":"br"},{"text":"Engram CLI:","type":"text","marks":[{"type":"strong"}]},{"text":" ","type":"text"},{"text":"scripts/engram_cli.py","type":"text","marks":[{"type":"code_inline"}]}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":2},"content":[{"text":"Auto Mode (Recommended — Run at Session Start)","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"python3 skills/claw-compactor/scripts/mem_compress.py \u003cworkspace> auto","type":"text"}]},{"type":"paragraph","content":[{"text":"Automatically compresses all workspace files, tracks token counts between runs, and reports savings. Run this at the start of every session.","type":"text"}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":2},"content":[{"text":"Core Commands","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Full Pipeline (All Layers)","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"python3 scripts/mem_compress.py \u003cworkspace> full","type":"text"}]},{"type":"paragraph","content":[{"text":"Runs all 5 deterministic layers in optimal order. 
Typical: 50%+ combined savings.","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Benchmark (Non-Destructive)","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"python3 scripts/mem_compress.py \u003cworkspace> benchmark\n# JSON output:\npython3 scripts/mem_compress.py \u003cworkspace> benchmark --json","type":"text"}]},{"type":"paragraph","content":[{"text":"Dry-run report showing potential savings without writing any files.","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Individual Layers","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"# Layer 1: Rule-based compression\npython3 scripts/mem_compress.py \u003cworkspace> compress\n\n# Layer 2: Dictionary encoding\npython3 scripts/mem_compress.py \u003cworkspace> dict\n\n# Layer 3: Observation compression (session JSONL → summaries)\npython3 scripts/mem_compress.py \u003cworkspace> observe\n\n# Layer 4: RLE pattern encoding (runs inside `compress`)\n# Layer 5: Tokenizer optimization\npython3 scripts/mem_compress.py \u003cworkspace> optimize\n\n# Tiered summaries (L0/L1/L2)\npython3 scripts/mem_compress.py \u003cworkspace> tiers\n\n# Cross-file deduplication\npython3 scripts/mem_compress.py \u003cworkspace> dedup\n\n# Token count report\npython3 scripts/mem_compress.py \u003cworkspace> estimate\n\n# Workspace health check\npython3 scripts/mem_compress.py \u003cworkspace> audit","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Global Options","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":""},"content":[{"text":"--json Machine-readable JSON output\n--dry-run Preview without writing files\n--since DATE Filter sessions by date (YYYY-MM-DD)\n--auto-merge Auto-merge duplicates (dedup command)","type":"text"}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":2},"content":[{"text":"Engram — Layer 6: 
Real-Time Observational Memory","type":"text"}]},{"type":"paragraph","content":[{"text":"Engram is the flagship layer. It operates as a live engine alongside conversations, automatically compressing messages into structured, priority-annotated knowledge.","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Prerequisites","type":"text"}]},{"type":"paragraph","content":[{"text":"Configure via ","type":"text"},{"text":"engram.yaml","type":"text","marks":[{"type":"code_inline"}]},{"text":" (recommended) or environment variables:","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"yaml"},"content":[{"text":"# engram.yaml — place in claw-compactor root\nllm:\n provider: openai-compatible\n base_url: http://localhost:8403\n model: claude-code/sonnet\n max_tokens: 4096\n\nthreads:\n default:\n observer_threshold: 30000 # pending tokens before Observer fires\n reflector_threshold: 40000 # observation tokens before Reflector fires\n\nconcurrency:\n max_workers: 4 # parallel thread workers","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"# Alternative: environment variables\nexport ANTHROPIC_API_KEY=sk-ant-... # Preferred\n# or\nexport OPENAI_API_KEY=sk-... # OpenAI-compatible fallback\nexport OPENAI_BASE_URL=https://... 
# Optional: custom endpoint (local LLM, etc.)","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Engram Auto-Mode (Recommended for Production)","type":"text"}]},{"type":"paragraph","content":[{"text":"Auto-detects all active threads and processes them concurrently (4 workers):","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"# Single run — auto-detects all threads\npython3 scripts/engram_auto.py --workspace ~/.openclaw/workspace\n\n# Via shell wrapper\nbash scripts/engram-auto.sh\n\n# Via CLI\npython3 scripts/engram_cli.py \u003cworkspace> auto --config engram.yaml\npython3 scripts/engram_cli.py \u003cworkspace> status --thread openclaw-main\npython3 scripts/engram_cli.py \u003cworkspace> observe --thread openclaw-main\npython3 scripts/engram_cli.py \u003cworkspace> reflect --thread openclaw-main","type":"text"}]},{"type":"paragraph","content":[{"text":"Retry:","type":"text","marks":[{"type":"strong"}]},{"text":" LLM calls retry on 429/5xx with exponential backoff (2s→4s→8s, max 3 attempts). 
No retry on 400/401/403 (fail fast on config errors).","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Engram via Unified Entry Point","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"# Check all thread statuses\npython3 scripts/mem_compress.py \u003cworkspace> engram status\n\n# Force Observer for a thread\npython3 scripts/mem_compress.py \u003cworkspace> engram observe --thread \u003cthread-id>\n\n# Force Reflector for a thread\npython3 scripts/mem_compress.py \u003cworkspace> engram reflect --thread \u003cthread-id>\n\n# Print injectable context\npython3 scripts/mem_compress.py \u003cworkspace> engram context --thread \u003cthread-id>","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Engram via Dedicated CLI","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"# Status: all threads\npython3 scripts/engram_cli.py \u003cworkspace> status\n\n# Status: single thread\npython3 scripts/engram_cli.py \u003cworkspace> status --thread \u003cthread-id>\n\n# Force observe\npython3 scripts/engram_cli.py \u003cworkspace> observe --thread \u003cthread-id>\n\n# Force reflect\npython3 scripts/engram_cli.py \u003cworkspace> reflect --thread \u003cthread-id>\n\n# Import conversation from file (JSON array or JSONL)\npython3 scripts/engram_cli.py \u003cworkspace> ingest \\\n --thread \u003cthread-id> --input /path/to/conversation.jsonl\n\n# Get injectable context string (ready for system prompt)\npython3 scripts/engram_cli.py \u003cworkspace> context --thread \u003cthread-id>\n\n# JSON output for any command\npython3 scripts/engram_cli.py \u003cworkspace> status --json\npython3 scripts/engram_cli.py \u003cworkspace> context --thread \u003cid> --json","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Engram Daemon Mode (Real-Time 
Streaming)","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"# Start daemon, pipe JSONL messages via stdin\npython3 scripts/engram_cli.py \u003cworkspace> daemon --thread \u003cthread-id>\n\n# Pipe a message:\necho '{\"role\":\"user\",\"content\":\"Hello!\",\"timestamp\":\"12:00\"}' | \\\n python3 scripts/engram_cli.py \u003cworkspace> daemon --thread \u003cthread-id>\n\n# Control commands (send as JSONL):\necho '{\"__cmd\":\"observe\"}' # force observe now\necho '{\"__cmd\":\"reflect\"}' # force reflect now\necho '{\"__cmd\":\"status\"}' # print thread status JSON\necho '{\"__cmd\":\"quit\"}' # exit daemon\n\n# Quiet mode (suppress startup messages on stderr)\npython3 scripts/engram_cli.py \u003cworkspace> daemon --thread \u003cid> --quiet","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Engram Python API","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"python"},"content":[{"text":"from scripts.lib.engram import EngramEngine\n\nengine = EngramEngine(\n workspace_path=\"/path/to/workspace\",\n observer_threshold=30_000, # tokens before auto-observe\n reflector_threshold=40_000, # tokens before auto-reflect\n anthropic_api_key=\"sk-ant-...\", # or set ANTHROPIC_API_KEY env\n)\n\n# Add a message — auto-triggers observe/reflect when thresholds exceeded\nstatus = engine.add_message(\"thread-id\", role=\"user\", content=\"Hello!\")\n# Returns: {\"observed\": bool, \"reflected\": bool, \"pending_tokens\": int, ...}\n\n# Manual trigger regardless of thresholds\nobs_text = engine.observe(\"thread-id\") # returns None if no pending msgs\nref_text = engine.reflect(\"thread-id\") # returns None if no observations\n\n# Get full context dict\nctx = engine.get_context(\"thread-id\")\n# Returns: {\"thread_id\", \"observations\", \"reflection\", \"recent_messages\", \"stats\", \"meta\"}\n\n# Build injectable system context string\nctx_str = 
engine.build_system_context(\"thread-id\")\n# Ready to prepend to system prompt","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Engram Configuration Variables","type":"text"}]},{"type":"table","attrs":{"layout":null},"content":[{"type":"tr","content":[{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Variable","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Default","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Description","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"ANTHROPIC_API_KEY","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"—","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Anthropic API key (preferred)","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"OPENAI_API_KEY","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"—","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"OpenAI-compatible API 
key","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"OPENAI_BASE_URL","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"https://api.openai.com","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Custom endpoint for local LLMs","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"OM_OBSERVER_THRESHOLD","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"30000","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Pending tokens before auto-observe","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"OM_REFLECTOR_THRESHOLD","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"40000","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Observation tokens before 
auto-reflect","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"OM_MODEL","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"claude-opus-4-5","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"LLM model override","type":"text"}]}]}]}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Threshold Tuning Quick Reference","type":"text"}]},{"type":"paragraph","content":[{"text":"Each Observer call ≈ 2K output tokens (Sonnet). Daily volume at default 30K threshold:","type":"text"}]},{"type":"table","attrs":{"layout":null},"content":[{"type":"tr","content":[{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Channel","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Daily Tokens","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"@30K threshold","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"@10K 
threshold","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"#aimm","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"~149K","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"~5×/day","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"~15×/day","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"openclaw-main","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"~138K","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"~4.5×/day","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"~14×/day","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"#open-compress","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"~68K","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"~2.3×/day","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"~7×/day","type":"text"}]}]
}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"#general","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"~62K","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"~2×/day","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"~6×/day","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"subagent","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"~43K","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"~1.4×/day","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"~4×/day","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"cron","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"~9K","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"~0.3×/day","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"~1×/day","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","att
rs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Total","type":"text","marks":[{"type":"strong"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"~470K/day","type":"text","marks":[{"type":"strong"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"~16×/day (~32K output tokens)","type":"text","marks":[{"type":"strong"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"~47×/day (~94K output tokens)","type":"text","marks":[{"type":"strong"}]}]}]}]}]},{"type":"paragraph","content":[{"text":"Start at ","type":"text"},{"text":"observer_threshold: 30000","type":"text","marks":[{"type":"code_inline"}]},{"text":". Tune down for fresher context; tune up to reduce cost.","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Engram Benchmark Summary","type":"text"}]},{"type":"table","attrs":{"layout":null},"content":[{"type":"tr","content":[{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Strategy","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Token 
Savings","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"ROUGE-L","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"IR-F1","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Latency","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"LLM Calls","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Engram (L6)","type":"text","marks":[{"type":"strong"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"87.5%","type":"text","marks":[{"type":"strong"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"0.038","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"0.414","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"~35s","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"2","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"RuleCompressor 
(L1–5)","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"9.0%","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"0.923","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"0.958","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"~6ms","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"0","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"RandomDrop","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"21.5%","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"0.852","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"0.911","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"~0ms","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"0","type":"text"}]}]}]}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Engram low ROUGE-L = semantic restructuring, not verbatim copy — intent is 
preserved","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Use RuleCompressor for instant prompt compression; Engram for long-term memory","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Full results → ","type":"text"},{"text":"benchmark/RESULTS.md","type":"text","marks":[{"type":"code_inline"}]}]}]}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Observation Format","type":"text"}]},{"type":"paragraph","content":[{"text":"Engram produces structured, bilingual (EN/中文) priority-annotated logs:","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":""},"content":[{"text":"Date: 2026-03-05\n- 🔴 12:10 User building OpenCompress; deadline one week / 用户在构建 OpenCompress,deadline 一周内\n - 🔴 12:10 Using ModernBERT-large / 使用 ModernBERT-large\n - 🟡 12:12 Discussed annotation strategy / 讨论了标注策略\n- 🟡 12:30 Deployment pipeline discussion on M3 Ultra\n- 🟢 12:45 User prefers concise replies","type":"text"}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"🔴 ","type":"text"},{"text":"Critical","type":"text","marks":[{"type":"strong"}]},{"text":" — goals, deadlines, blockers, key decisions (never dropped)","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"🟡 ","type":"text"},{"text":"Important","type":"text","marks":[{"type":"strong"}]},{"text":" — technical details, ongoing work, preferences","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"🟢 ","type":"text"},{"text":"Useful","type":"text","marks":[{"type":"strong"}]},{"text":" — background, mentions, soft context","type":"text"}]}]}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Memory Storage Layout","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":""},"content":[{"text":"memory/engram/{thread_id}/\n├── pending.jsonl # Unobserved message buffer 
(auto-cleared after observe)\n├── observations.md # Observer output — append-only structured log\n├── reflections.md # Reflector output — compressed long-term memory (overwrites)\n└── meta.json # Timestamps and token counts","type":"text"}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":2},"content":[{"text":"Integration with OpenClaw Memory System","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"System Prompt Injection","type":"text"}]},{"type":"paragraph","content":[{"text":"Inject Engram context at the start of each session:","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"python"},"content":[{"text":"from scripts.lib.engram import EngramEngine\n\nengine = EngramEngine(workspace_path)\nctx_str = engine.build_system_context(\"my-session\")\nif ctx_str:\n system_prompt = ctx_str + \"\\n\\n\" + base_system_prompt","type":"text"}]},{"type":"paragraph","content":[{"text":"The ","type":"text"},{"text":"build_system_context()","type":"text","marks":[{"type":"code_inline"}]},{"text":" output structure:","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":""},"content":[{"text":"## Long-Term Memory (Reflections)\n\u003cReflector output — long-term compressed context>\n\n## Recent Observations\n\u003cLast 200 lines of Observer output>\n\n\u003c!-- engram_tokens: 1234 -->","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Combining Engram with Deterministic Layers","type":"text"}]},{"type":"paragraph","content":[{"text":"After an Engram session, run the deterministic pipeline on the output files:","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"# Engram produces observations.md and reflections.md\n# Then apply deterministic compression to further reduce those:\npython3 scripts/mem_compress.py \u003cworkspace> full","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Recommended 
Workflow for Long-Running Agent Sessions","type":"text"}]},{"type":"ordered_list","attrs":{"order":1,"listStyle":"number"},"content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Session start:","type":"text","marks":[{"type":"strong"}]},{"text":" inject ","type":"text"},{"text":"build_system_context()","type":"text","marks":[{"type":"code_inline"}]},{"text":" into system prompt","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Each message:","type":"text","marks":[{"type":"strong"}]},{"text":" call ","type":"text"},{"text":"engine.add_message()","type":"text","marks":[{"type":"code_inline"}]},{"text":" — auto-triggers observe/reflect","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Session end / weekly cron:","type":"text","marks":[{"type":"strong"}]},{"text":" run ","type":"text"},{"text":"full","type":"text","marks":[{"type":"code_inline"}]},{"text":" pipeline on workspace","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Multi-session continuity:","type":"text","marks":[{"type":"strong"}]},{"text":" context persists in ","type":"text"},{"text":"memory/engram/{thread}/","type":"text","marks":[{"type":"code_inline"}]}]}]}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":2},"content":[{"text":"OpenClaw Skill Installation","type":"text"}]},{"type":"paragraph","content":[{"text":"To install as an OpenClaw skill, ensure the skill directory is available at:","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":""},"content":[{"text":"~/.openclaw/workspace/skills/claw-compactor/","type":"text"}]},{"type":"paragraph","content":[{"text":"or configure the path in your OpenClaw skill registry.","type":"text"}]},{"type":"paragraph","content":[{"text":"SKILL.md is read by the OpenClaw agent dispatcher. 
The ","type":"text"},{"text":"description","type":"text","marks":[{"type":"code_inline"}]},{"text":" and ","type":"text"},{"text":"triggers","type":"text","marks":[{"type":"code_inline"}]},{"text":" fields above control when this skill is automatically activated.","type":"text"}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":2},"content":[{"text":"Heartbeat / Cron Automation","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"markdown"},"content":[{"text":"## Memory Maintenance (weekly)\n- python3 skills/claw-compactor/scripts/mem_compress.py \u003cworkspace> benchmark\n- If savings > 5%: run full pipeline\n- If pending Engram messages: run engram observe --thread \u003cid>","type":"text"}]},{"type":"paragraph","content":[{"text":"Cron (Sunday 3am):","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"0 3 * * 0 cd /path/to/skills/claw-compactor && \\\n python3 scripts/mem_compress.py /path/to/workspace full","type":"text"}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":2},"content":[{"text":"Output Artifacts Reference","type":"text"}]},{"type":"table","attrs":{"layout":null},"content":[{"type":"tr","content":[{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Artifact","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Location","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Description","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Dictionary 
codebook","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"memory/.codebook.json","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Must travel with memory files","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Observed session log","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"memory/.observed-sessions.json","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Tracks processed transcripts","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Layer 3 summaries","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"memory/observations/","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Observation compression output","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Engram 
observations","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"memory/engram/{thread}/observations.md","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Live Observer log","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Engram reflections","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"memory/engram/{thread}/reflections.md","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Distilled long-term memory","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Level 0 summary","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"memory/MEMORY-L0.md","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"~200 token ultra-compressed summary","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Level 1 
summary","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"memory/MEMORY-L1.md","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"~500 token compressed summary","type":"text"}]}]}]}]},{"type":"hr","attrs":{"markup":"---"}},{"type":"heading","attrs":{"level":2},"content":[{"text":"Troubleshooting","type":"text"}]},{"type":"table","attrs":{"layout":null},"content":[{"type":"tr","content":[{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Problem","type":"text"}]}]},{"type":"th","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Solution","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"FileNotFoundError","type":"text","marks":[{"type":"code_inline"}]},{"text":" on workspace","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Point path to workspace root containing ","type":"text"},{"text":"memory/","type":"text","marks":[{"type":"code_inline"}]}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Dictionary decompression fails","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Check ","type":"text"},{"text":"memory/.codebook.json","type":"text","marks":[{"type":"code_inline"}]},{"text":" is valid 
JSON","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Zero savings on ","type":"text"},{"text":"benchmark","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Workspace already optimized","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"observe","type":"text","marks":[{"type":"code_inline"}]},{"text":" finds no transcripts","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Check ","type":"text"},{"text":"sessions/","type":"text","marks":[{"type":"code_inline"}]},{"text":" for ","type":"text"},{"text":".jsonl","type":"text","marks":[{"type":"code_inline"}]},{"text":" files","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Engram: \"no API key configured\"","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Set ","type":"text"},{"text":"ANTHROPIC_API_KEY","type":"text","marks":[{"type":"code_inline"}]},{"text":" or ","type":"text"},{"text":"OPENAI_API_KEY","type":"text","marks":[{"type":"code_inline"}]}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Engram Observer returns 
","type":"text"},{"text":"None","type":"text","marks":[{"type":"code_inline"}]}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"No pending messages for that thread","type":"text"}]}]}]},{"type":"tr","content":[{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Token counts seem wrong","type":"text"}]}]},{"type":"td","attrs":{"colspan":1,"rowspan":1,"colwidth":null,"alignment":""},"content":[{"type":"paragraph","content":[{"text":"Install tiktoken: ","type":"text"},{"text":"pip3 install tiktoken","type":"text","marks":[{"type":"code_inline"}]}]}]}]}]},{"type":"hr","attrs":{"markup":"---"}}]},"metadata":{"date":"2026-06-05","name":"claw-compactor","author":"@skillopedia","source":{"stars":2209,"repo_name":"claw-compactor","origin_url":"https://github.com/aeromomo/claw-compactor/blob/HEAD/SKILL.md","repo_owner":"aeromomo","body_sha256":"310e1146d1ebc646ada2b46a00bf9e8cfa8c486df632dc628290c8a72a50b943","cluster_key":"a298d67945b0c3c5b0d6ce3e09f8dd98d24db7f203312170ff203c0b91024447","clean_bundle":{"format":"clean-skill-bundle-v1","source":"aeromomo/claw-compactor/SKILL.md","attachments":[{"id":"480dd1c9-f9ac-5121-838c-a8aa15c783fe","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/480dd1c9-f9ac-5121-838c-a8aa15c783fe/attachment.yml","path":".github/DISCUSSION_TEMPLATE/ideas.yml","size":830,"sha256":"2789534e12cbf6ef6500da5189da2993f983bb2eeea1ea20b5871136e218ef00","contentType":"application/yaml; charset=utf-8"},{"id":"8d0f204b-323b-5aaa-be0d-37090c9ec26c","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/8d0f204b-323b-5aaa-be0d-37090c9ec26c/attachment.yml","path":".github/DISCUSSION_TEMPLATE/q-a.yml","size":571,"sha256":"a6ad681ba1b2373ad74e82812bb6c1164072dc48dc988cd0978a7a24427545ca","contentType":"application/yaml; 
charset=utf-8"},{"id":"23e52975-6da1-5145-846b-248b178f3f0b","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/23e52975-6da1-5145-846b-248b178f3f0b/attachment.yml","path":".github/FUNDING.yml","size":49,"sha256":"a92671cb63708a035dfc831ba92b67fdceb34e89f01dfe65e368743dda1f8ca4","contentType":"application/yaml; charset=utf-8"},{"id":"47e2e6b8-d843-51c2-9df7-7bb91395bfb9","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/47e2e6b8-d843-51c2-9df7-7bb91395bfb9/attachment.md","path":".github/ISSUE_TEMPLATE/bug_report.md","size":701,"sha256":"967a6944c52902df9c09be9c4daab90024665ad0e1870fe7df371dfc6a5cfc87","contentType":"text/markdown; charset=utf-8"},{"id":"0b309921-7d82-5e52-a60e-0c93220e8477","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/0b309921-7d82-5e52-a60e-0c93220e8477/attachment.yml","path":".github/ISSUE_TEMPLATE/config.yml","size":285,"sha256":"2370899fc8b2933174bb647d0665f9129bacaf54a295d141b9534aa2868eccc6","contentType":"application/yaml; charset=utf-8"},{"id":"23b45fde-eb69-5597-b1e9-0a0c211882eb","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/23b45fde-eb69-5597-b1e9-0a0c211882eb/attachment.md","path":".github/ISSUE_TEMPLATE/feature_request.md","size":784,"sha256":"f97190a59e8018ce50b1bccab4ed3ebbe6f31c7942be29d64ef5e6735252586a","contentType":"text/markdown; charset=utf-8"},{"id":"8ea22e82-08db-56fd-b8f3-bc16317dc709","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/8ea22e82-08db-56fd-b8f3-bc16317dc709/attachment.md","path":".github/PULL_REQUEST_TEMPLATE.md","size":1208,"sha256":"8190df918b783368a4e50694beca1fecadce72a3ffc0a6033a34f52c50da2781","contentType":"text/markdown; charset=utf-8"},{"id":"bc030f98-5980-5cda-8e48-e0df32426c31","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/bc030f98-5980-5cda-8e48-e0df32426c31/attachment.yml","path":".github/workflows/ci.yml","size":2361,"sha256":"1de6cc704ff6accd379e857ef8f197d7dc3515df6d91532de2adfb3d31b50915","contentType":"application/yaml; 
charset=utf-8"},{"id":"667194f2-3385-597e-9546-ecc0ef88f164","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/667194f2-3385-597e-9546-ecc0ef88f164/attachment.yml","path":".github/workflows/docs.yml","size":484,"sha256":"d6cce84b87a599121a18741a7c70176cdc75f8e33c07d8eab214ac8ef4b86d09","contentType":"application/yaml; charset=utf-8"},{"id":"45640b0d-94c8-51c6-92f2-a5966ffeb4be","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/45640b0d-94c8-51c6-92f2-a5966ffeb4be/attachment.yml","path":".github/workflows/publish.yml","size":466,"sha256":"2422038e652d7090ade352227a093b687d3f9b2298e3f5e5610963aadb7e6c35","contentType":"application/yaml; charset=utf-8"},{"id":"49c7bc55-a884-5e52-855d-994ab20bc11c","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/49c7bc55-a884-5e52-855d-994ab20bc11c/attachment.yml","path":".github/workflows/stale.yml","size":522,"sha256":"236ef0772413aa810e2a690d628088a1b06b310fbc37cf14312d1166e0cf1b16","contentType":"application/yaml; charset=utf-8"},{"id":"c2db10fe-d5f7-5570-8502-d1b78cd060cb","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/c2db10fe-d5f7-5570-8502-d1b78cd060cb/attachment","path":".gitignore","size":145,"sha256":"ebe34bec08da60eab0aebc4a164dcc8c3ad7471a582f4a93846bdf056a457dee","contentType":"text/plain; charset=utf-8"},{"id":"57b4848c-d2b2-51be-87f3-a2a27e07531d","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/57b4848c-d2b2-51be-87f3-a2a27e07531d/attachment.md","path":"ARCHITECTURE.md","size":27022,"sha256":"5f787d6218830e0bea6d612e68aad9420134d176cb241c16e594208553f3b5ff","contentType":"text/markdown; charset=utf-8"},{"id":"3cbd4752-71db-5534-86cd-7ffaebdac565","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/3cbd4752-71db-5534-86cd-7ffaebdac565/attachment.md","path":"CHANGELOG.md","size":4244,"sha256":"747f4e565fabc983eed6a70030e039348525134cbfb2606180230abbc0b2b23c","contentType":"text/markdown; 
charset=utf-8"},{"id":"cbc0883b-1860-5a5b-ae23-fa661e484654","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/cbc0883b-1860-5a5b-ae23-fa661e484654/attachment.cff","path":"CITATION.cff","size":706,"sha256":"8b827a16f9b768fb08f5a936cc5e66d3299c1971d747424a1a502b46cbc5abef","contentType":"text/plain; charset=utf-8"},{"id":"94dd0a7e-82ad-53b3-8e82-8ebf622b046e","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/94dd0a7e-82ad-53b3-8e82-8ebf622b046e/attachment.md","path":"CODE_OF_CONDUCT.md","size":2336,"sha256":"4d6cdec9894b340cf2ede882cb32b9db6dbacb6d0e9fb9f74c4c30696a022a42","contentType":"text/markdown; charset=utf-8"},{"id":"5c0071a5-b75c-5d5a-b48b-0722c570fd1e","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/5c0071a5-b75c-5d5a-b48b-0722c570fd1e/attachment.md","path":"CONTRIBUTING.md","size":4001,"sha256":"0f333fb62c551786a3e16fcbf2b878f3c423f2776072a1bcdfe60fa8c475cf53","contentType":"text/markdown; charset=utf-8"},{"id":"731674b6-8fc6-5213-bde0-a752fff103b0","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/731674b6-8fc6-5213-bde0-a752fff103b0/attachment.md","path":"README.md","size":16755,"sha256":"bb7cfbd0f7a2bd4ff4eaf39c3572af28505c29ccbe4a722e2b72b0ffdb252cba","contentType":"text/markdown; charset=utf-8"},{"id":"00dd281d-1ccf-5931-ab41-a1c279c10ad1","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/00dd281d-1ccf-5931-ab41-a1c279c10ad1/attachment.md","path":"SECURITY.md","size":440,"sha256":"114e46847bf36110b67aecf02f40f0d9bf80d57f60d241babe6959c409b5331e","contentType":"text/markdown; 
charset=utf-8"},{"id":"f91e7f0c-6c57-5bc3-a1f0-9e19125f6bcd","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/f91e7f0c-6c57-5bc3-a1f0-9e19125f6bcd/attachment.png","path":"assets/banner.png","size":909779,"sha256":"520b1cfa1f23a8c6f46f6642342cf04664afbd3116603227dd33a274683a393a","contentType":"image/png"},{"id":"246d0fd8-bbc0-5f1c-9efd-a0564594f1a5","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/246d0fd8-bbc0-5f1c-9efd-a0564594f1a5/attachment.webp","path":"assets/banner.webp","size":36,"sha256":"c3fa59901d56ce8a95a303b22fd119cb94abf4f43c4f6d60a81fd78b7d00fa65","contentType":"image/webp"},{"id":"388605b8-6534-5709-a6a6-459508b9e458","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/388605b8-6534-5709-a6a6-459508b9e458/attachment.md","path":"benchmark/RESULTS.md","size":5904,"sha256":"26b1179f59c2d8b6e6330a8f9ef6514e76641edd29c49301f6e697cda69c5b67","contentType":"text/markdown; charset=utf-8"},{"id":"f96ab669-85b5-53b4-90eb-13a7d5d25893","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/f96ab669-85b5-53b4-90eb-13a7d5d25893/attachment.py","path":"benchmark/__init__.py","size":20,"sha256":"6b3afc29b6c065525a16de48a0d287d429a469743c214a5cbe6c15edce14e219","contentType":"text/x-python; charset=utf-8"},{"id":"b9cc0bc6-5c3b-5cd0-b5a6-a5744c2fa657","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/b9cc0bc6-5c3b-5cd0-b5a6-a5744c2fa657/attachment.py","path":"benchmark/compressors.py","size":17973,"sha256":"48b3eced499d84a150eff7e96b0afaface85697a0899a29aeebe0e7cd983250c","contentType":"text/x-python; charset=utf-8"},{"id":"99eee07d-2e6a-5502-b754-b1ceedc2e3d5","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/99eee07d-2e6a-5502-b754-b1ceedc2e3d5/attachment.json","path":"benchmark/data/sample_01_devops.json","size":20694,"sha256":"f15866eaefeadce3bee9295e823b148ee357c587f189ac0371984ba3aae7e516","contentType":"application/json; 
charset=utf-8"},{"id":"9ed433d5-be7b-56cb-b586-d5ed10356a19","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/9ed433d5-be7b-56cb-b586-d5ed10356a19/attachment.json","path":"benchmark/data/sample_02_trading.json","size":15836,"sha256":"c92127d72e2edaa628fc7d08c54f89bcb4a3042980c9b8d157fc20bb402be667","contentType":"application/json; charset=utf-8"},{"id":"5db1aaa9-695f-59f8-a24a-b03109fbce02","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/5db1aaa9-695f-59f8-a24a-b03109fbce02/attachment.json","path":"benchmark/data/sample_03_ml_short.json","size":8757,"sha256":"65a1513898e3bb88694e15bc789639a46dcedc7ba10c1f067b6c4d48e7c828ec","contentType":"application/json; charset=utf-8"},{"id":"29fc3f4f-2956-5ec9-8737-9fd5691450c5","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/29fc3f4f-2956-5ec9-8737-9fd5691450c5/attachment.json","path":"benchmark/data/sample_04_mixed_long.json","size":22600,"sha256":"92405651ba890fcddedafceb066b56983c6d7137c735aaa69b14060cb12a929c","contentType":"application/json; charset=utf-8"},{"id":"9e428c9d-24ec-5803-851b-179aed6dd150","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/9e428c9d-24ec-5803-851b-179aed6dd150/attachment.json","path":"benchmark/data/sample_05_sysadmin.json","size":15159,"sha256":"ceb9992c44f5302c1ad196195bf6e19b554bea81e5a1f0d75bb06a82b85d16f9","contentType":"application/json; charset=utf-8"},{"id":"9444c96a-8bc0-591e-b4b8-dd20a600afad","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/9444c96a-8bc0-591e-b4b8-dd20a600afad/attachment.py","path":"benchmark/evaluate.py","size":9851,"sha256":"7bf8e50b3e556eac91bd6f5eaedfd2cd58bf9b55ef1fe26d891f5a02e9923ced","contentType":"text/x-python; charset=utf-8"},{"id":"3be7c23b-64a0-598d-ba91-b17b66bf13ed","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/3be7c23b-64a0-598d-ba91-b17b66bf13ed/attachment.py","path":"benchmark/report.py","size":10448,"sha256":"3dee350168e6d93662ebb5fe4350f0c781768f1f44d8371d9693919e08e2b0b4","contentType":"text/x-python; 
charset=utf-8"},{"id":"cbf1ed4c-ced2-5ca2-aa85-c67dfd0a072a","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/cbf1ed4c-ced2-5ca2-aa85-c67dfd0a072a/attachment.json","path":"benchmark/results/benchmark_results.json","size":16892,"sha256":"fa604abb9e929f6252e4a6e9fc34aa46321a2eff4cbb6486247989b249eb4b36","contentType":"application/json; charset=utf-8"},{"id":"56246f62-8958-529c-95f4-e6604af1c763","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/56246f62-8958-529c-95f4-e6604af1c763/attachment.py","path":"benchmark/run_benchmark.py","size":8230,"sha256":"42dd7213a94fa5173c22221325cf45f4b63806c2cf5b279061280eb8af073fab","contentType":"text/x-python; charset=utf-8"},{"id":"4f12e33e-a95c-5765-b3fc-1b62f391af3b","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/4f12e33e-a95c-5765-b3fc-1b62f391af3b/attachment.md","path":"docs/README.md","size":4919,"sha256":"80fce28c2a799d22d579c130b1f6485d27bb5fb93c723b7a84a60e46e34afadb","contentType":"text/markdown; charset=utf-8"},{"id":"462f4955-4369-55de-9a05-48c30d4d3aaa","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/462f4955-4369-55de-9a05-48c30d4d3aaa/attachment.md","path":"docs/api/engine.md","size":1110,"sha256":"3976ff931d4cc96ccd2f7fd936fcb15ca43d5697e7ade0f0d7659daec6f00dda","contentType":"text/markdown; charset=utf-8"},{"id":"5a2f50ad-9ad9-5185-8e47-8bdcc10d025d","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/5a2f50ad-9ad9-5185-8e47-8bdcc10d025d/attachment.md","path":"docs/api/rewind.md","size":1258,"sha256":"40f3457a6f0bba56aee72efc76d0454b9199bf3dea55838e45ba7733d2238609","contentType":"text/markdown; charset=utf-8"},{"id":"47771627-9217-5e71-a8f3-8967b9a64b5e","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/47771627-9217-5e71-a8f3-8967b9a64b5e/attachment.md","path":"docs/api/stages.md","size":893,"sha256":"8e1e96450cc145d663ca373e9a9032f8828a3c1f52e7832a3821bf689f7891a3","contentType":"text/markdown; 
charset=utf-8"},{"id":"8fcfbea4-f923-56d2-a1cf-f5a22c230f73","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/8fcfbea4-f923-56d2-a1cf-f5a22c230f73/attachment.md","path":"docs/architecture/overview.md","size":1981,"sha256":"87a32fa67865358802a333eb6d712d48924b4fa15a411b5504d13be20be5d01f","contentType":"text/markdown; charset=utf-8"},{"id":"795d47b9-e67e-5753-905f-f9d1950ca78b","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/795d47b9-e67e-5753-905f-f9d1950ca78b/attachment.md","path":"docs/architecture/pipeline.md","size":2018,"sha256":"e508a88934fc5a0791b97a00fd1449c7edb22cba0ed969bc5426acd67ac8aa48","contentType":"text/markdown; charset=utf-8"},{"id":"6a0dd285-41b8-5b61-8ef3-fd4ec4db3d51","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/6a0dd285-41b8-5b61-8ef3-fd4ec4db3d51/attachment.md","path":"docs/architecture/stages.md","size":1778,"sha256":"677f8e54dea734770c25a4492dd8bd9135b4f82a7283f6eea1237098b7d4cbf2","contentType":"text/markdown; charset=utf-8"},{"id":"9f226c21-7499-5142-b75d-1b91d1d8e3ad","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/9f226c21-7499-5142-b75d-1b91d1d8e3ad/attachment.md","path":"docs/benchmarks.md","size":4627,"sha256":"093bd401c2de4f30d0fbb96fe5e9315d686bca33fa63379977f6d0595f7dd703","contentType":"text/markdown; charset=utf-8"},{"id":"455f8491-b584-5d75-9323-3a3e360f2afc","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/455f8491-b584-5d75-9323-3a3e360f2afc/attachment.md","path":"docs/contributing.md","size":149,"sha256":"c981bcb4ad2a4140291e867d2d0c7e6929d69250946d6c6fc58d197e8c03c28d","contentType":"text/markdown; charset=utf-8"},{"id":"28fd2ceb-f42e-50b5-a3b8-9b8d623bec79","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/28fd2ceb-f42e-50b5-a3b8-9b8d623bec79/attachment.md","path":"docs/getting-started/installation.md","size":712,"sha256":"9b082cf5498607dc20dd7bab7cb7c136cb2152b99ddd9a493ab1cd5602514706","contentType":"text/markdown; 
charset=utf-8"},{"id":"ab70aa33-e35c-5eda-a59f-13cfd0e6f516","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/ab70aa33-e35c-5eda-a59f-13cfd0e6f516/attachment.md","path":"docs/getting-started/quickstart.md","size":1865,"sha256":"d6d540e71354bfd72b77a12cb78921a16052141eb0e513a70d975c559ec974ea","contentType":"text/markdown; charset=utf-8"},{"id":"9eff42e3-7c64-57dc-bd64-c5a8c1b71052","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/9eff42e3-7c64-57dc-bd64-c5a8c1b71052/attachment.md","path":"docs/index.md","size":2031,"sha256":"bddf2382fd71c8e76f44f4208110948d930a7cbb767c6441277d23267085c06a","contentType":"text/markdown; charset=utf-8"},{"id":"4ecb8f3a-833a-5445-a09c-03d290fbed93","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/4ecb8f3a-833a-5445-a09c-03d290fbed93/attachment.yaml","path":"engram.yaml","size":1528,"sha256":"f4351ce7f6b9b56cbb9c5d6f3eed32d03b0d06ef19d12789624e380521f05bbf","contentType":"application/yaml; charset=utf-8"},{"id":"ce75e16a-82e1-5c96-b33b-cef84471115a","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/ce75e16a-82e1-5c96-b33b-cef84471115a/attachment.yml","path":"mkdocs.yml","size":1647,"sha256":"c73d8a72ebe0abaa2ebe1c72c2067fbc3392f38e80c0484c643e5bd90001b01f","contentType":"application/yaml; charset=utf-8"},{"id":"9eb52745-83b5-5c10-a37e-a3c1f62d27af","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/9eb52745-83b5-5c10-a37e-a3c1f62d27af/attachment","path":"proxy/.gitignore","size":62,"sha256":"0670b7347c1ade8c63505c9f517d4a46ad11359ccb32e48830c301662a3773fe","contentType":"text/plain; charset=utf-8"},{"id":"6f699d0c-6333-51c1-883e-afcf8a6280ba","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/6f699d0c-6333-51c1-883e-afcf8a6280ba/attachment.md","path":"proxy/README.md","size":3767,"sha256":"ccf5021d69280e9a3fbced0b0654af71db10e2cefc4fea0a93d026a3a70ee956","contentType":"text/markdown; 
charset=utf-8"},{"id":"64e814a6-358a-5d06-ab4b-829be9184527","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/64e814a6-358a-5d06-ab4b-829be9184527/attachment.mjs","path":"proxy/compression-middleware.mjs","size":10064,"sha256":"d839dd77e839d5cea9ce3e8e256f47e07c4d0422370213e96d5d469aae87a5d1","contentType":"text/javascript"},{"id":"69e63417-45db-5c0d-ace2-afc62bef496e","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/69e63417-45db-5c0d-ace2-afc62bef496e/attachment.html","path":"proxy/dashboard.html","size":40632,"sha256":"908e2843742606c193843c6309b858ebca14ab90ec90ce214a156556bf22ecb4","contentType":"text/html; charset=utf-8"},{"id":"da433878-7826-5368-8087-191a3eceed72","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/da433878-7826-5368-8087-191a3eceed72/attachment.mjs","path":"proxy/event-log.mjs","size":5942,"sha256":"20eea125179143879d25491d751fe3dfb28bcd8878b4fbef46bbffd4ba2774f8","contentType":"text/javascript"},{"id":"58d1eec9-633b-5490-ac1b-3efb362c0aae","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/58d1eec9-633b-5490-ac1b-3efb362c0aae/attachment.mjs","path":"proxy/fair-queue.mjs","size":8739,"sha256":"463f9add602934b31940ae2dd40641e3e3370d7bd77903b9d8a26c6f985b53aa","contentType":"text/javascript"},{"id":"d6a51de8-d3a1-50c7-94e9-3a8f520abe54","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/d6a51de8-d3a1-50c7-94e9-3a8f520abe54/attachment.mjs","path":"proxy/metrics-store.mjs","size":11444,"sha256":"351df4254c8453fdcbeb74aa05218011c86557e88ab17e222124cca5fa972054","contentType":"text/javascript"},{"id":"23c1bd37-9aca-584b-96c7-c5b2fdb87eeb","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/23c1bd37-9aca-584b-96c7-c5b2fdb87eeb/attachment.json","path":"proxy/package-lock.json","size":4510,"sha256":"605f78614b2a2b5b5b455fef064830047ca50014a1f6de39673ad5c102413d46","contentType":"application/json; 
charset=utf-8"},{"id":"eeb4bc60-857a-5c97-b15a-7d6fe50f0bb8","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/eeb4bc60-857a-5c97-b15a-7d6fe50f0bb8/attachment.json","path":"proxy/package.json","size":375,"sha256":"fcbf92c04de5a88853f7582f1ba547318a5a1d877879738d26d7f53df57b4ad2","contentType":"application/json; charset=utf-8"},{"id":"1e48530f-1209-50cc-869f-37978ab39dba","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/1e48530f-1209-50cc-869f-37978ab39dba/attachment.html","path":"proxy/portal.html","size":9211,"sha256":"e13d0370ba62d7a42e2dd7c47fbc576116cea999e5cc4c7230edcf483610ebad","contentType":"text/html; charset=utf-8"},{"id":"71498db0-67dd-5c0a-8f6a-d0df9c6221f3","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/71498db0-67dd-5c0a-8f6a-d0df9c6221f3/attachment.mjs","path":"proxy/process-registry.mjs","size":12217,"sha256":"bf1382cab3e9c32187613ca048b1b974ef3d91d4b3ba96fa434bab6c499c40fe","contentType":"text/javascript"},{"id":"40a71347-b53f-5ad8-8309-5b40e248d2b6","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/40a71347-b53f-5ad8-8309-5b40e248d2b6/attachment.mjs","path":"proxy/quantum-lock.mjs","size":8863,"sha256":"8fe1e1892753a230740790e847a39169586dd95c19906240414f861dbc581db9","contentType":"text/javascript"},{"id":"1b11a911-8c65-56a1-8935-d9a7f7afc258","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/1b11a911-8c65-56a1-8935-d9a7f7afc258/attachment.mjs","path":"proxy/rate-limiter.mjs","size":3705,"sha256":"84b506e5be3859538f67592609453e27127d0b337337bd2682fc5a50bb17ea1a","contentType":"text/javascript"},{"id":"e6dd9869-b719-5459-8a4b-3b147b514e08","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/e6dd9869-b719-5459-8a4b-3b147b514e08/attachment.mjs","path":"proxy/redis-client.mjs","size":2097,"sha256":"f40f970ebee5a462efdfdbcee953f0e69b6f750cceb076c3daa0262cf743a146","contentType":"text/javascript"},{"id":"0f58f1c3-f685-5a04-863f-41578c73aa58","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/0f58f1c3-f685-5a04-863f-41578c73aa58/attachment
.mjs","path":"proxy/retry.mjs","size":4794,"sha256":"4946b0aaef5eecd7eb8a20b04cb0b2653adf47f89cfd6d988d6219bbab644680","contentType":"text/javascript"},{"id":"84d6e611-7bbd-5c29-b188-b9fd930d9f82","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/84d6e611-7bbd-5c29-b188-b9fd930d9f82/attachment.mjs","path":"proxy/rewind-handler.mjs","size":9393,"sha256":"7b75a5b20c101d00d3f2015f8b218d9baf607f3132f87746ecd33095026d3de3","contentType":"text/javascript"},{"id":"5067fab4-dc15-5bf3-ba9d-c74cb9576cfd","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/5067fab4-dc15-5bf3-ba9d-c74cb9576cfd/attachment.mjs","path":"proxy/server.mjs","size":82293,"sha256":"80f100b7371aee87726b37f211ae59d044c5fe17a55bd3c8883aae84943823df","contentType":"text/javascript"},{"id":"3981e3d1-f9a4-53cd-9e95-e89f22335bf8","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/3981e3d1-f9a4-53cd-9e95-e89f22335bf8/attachment.mjs","path":"proxy/session-affinity.mjs","size":4609,"sha256":"3f3f7c06d7881cd9ce712f40386d7c9c10fdd75496c474db0f125b3d9b530397","contentType":"text/javascript"},{"id":"1f0c603a-054c-5566-8ee8-5f6959184277","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/1f0c603a-054c-5566-8ee8-5f6959184277/attachment.sh","path":"proxy/start.sh","size":723,"sha256":"411368e9eed5378f16b21aa73ccc599560dd1d98420d60fe8a47a7dc4f328cbc","contentType":"application/x-sh; 
charset=utf-8"},{"id":"8a59b6d5-9b7f-5162-a921-6f51791bebb8","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/8a59b6d5-9b7f-5162-a921-6f51791bebb8/attachment.mjs","path":"proxy/test/event-log.test.mjs","size":7479,"sha256":"85e6ed06c93536951a81106cb7c6542185a531fe3ea9da2eb2a0e411f14f50ef","contentType":"text/javascript"},{"id":"9c018a60-5717-5b9b-aa1b-4d996dd12e9c","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/9c018a60-5717-5b9b-aa1b-4d996dd12e9c/attachment.mjs","path":"proxy/test/helpers.mjs","size":2425,"sha256":"e50b302e8896a9ab3c36c267ead48ec6af272c8fca6e28bbeea9e7d287725436","contentType":"text/javascript"},{"id":"d623a78c-bb69-5c99-b540-5c7cf3bc3821","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/d623a78c-bb69-5c99-b540-5c7cf3bc3821/attachment.mjs","path":"proxy/test/integration.test.mjs","size":10106,"sha256":"3a5e0384bf40770a24006a4024a00c9290f108e2d67349c361928c959f234274","contentType":"text/javascript"},{"id":"a2fc29a9-1ce9-500f-8985-8fe2a407b287","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/a2fc29a9-1ce9-500f-8985-8fe2a407b287/attachment.mjs","path":"proxy/test/metrics-store.test.mjs","size":7356,"sha256":"e7d58b2e2bcaf961677e3092d1f1f65e28180fee35bd24582dd7b0f5ae975393","contentType":"text/javascript"},{"id":"04ecb38c-6659-5c2a-b1fb-86a5d0655993","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/04ecb38c-6659-5c2a-b1fb-86a5d0655993/attachment.mjs","path":"proxy/test/process-registry.test.mjs","size":12753,"sha256":"935741b235eba12529b1e198fbe544d08530578a53c80810f84fa3152cdf3593","contentType":"text/javascript"},{"id":"420980d6-7aa2-5bc0-ab64-ab259d38130c","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/420980d6-7aa2-5bc0-ab64-ab259d38130c/attachment.mjs","path":"proxy/test/rate-limiter.test.mjs","size":3860,"sha256":"ed3de43fdf02d175388ba13c1f1c161a963c5704f46df9ae941446f088d08231","contentType":"text/javascript"},{"id":"f240d58b-647a-5706-919c-de3ef20b5241","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/f240d58b-647a-57
06-919c-de3ef20b5241/attachment.mjs","path":"proxy/test/redis-client.test.mjs","size":1748,"sha256":"5f65ecbbb710e96ba99b94a32da5f123fb3e622abc96d53c1c083281bf47622e","contentType":"text/javascript"},{"id":"6453716e-317c-5a97-8422-5eabfb436ddb","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/6453716e-317c-5a97-8422-5eabfb436ddb/attachment.mjs","path":"proxy/test/token-tracker.test.mjs","size":6877,"sha256":"7a5a380040b2a61c3883890fca93c7ee4aab0765200a6d88479584d7a3807b4b","contentType":"text/javascript"},{"id":"0bb994bf-6cd2-5919-a284-f941b5330539","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/0bb994bf-6cd2-5919-a284-f941b5330539/attachment.mjs","path":"proxy/token-tracker.mjs","size":13960,"sha256":"30c4ab058444ee50c73c633374be0a2ed20341b3c51b35ce4d2e36707632c9e9","contentType":"text/javascript"},{"id":"48084059-0d14-54a6-bf66-1799188d33d9","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/48084059-0d14-54a6-bf66-1799188d33d9/attachment.toml","path":"pyproject.toml","size":2275,"sha256":"266012ba705733b6a91544546f7ff55e0a32bbbbb689ebfa4272ae1bd330ca1d","contentType":"text/plain; charset=utf-8"},{"id":"70033ef5-2305-5e56-81dc-99a1a51c93f0","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/70033ef5-2305-5e56-81dc-99a1a51c93f0/attachment.md","path":"references/README.md","size":1438,"sha256":"c7a471164bc11dda6461f409ce0d39c66762a8fcd2f18cca6c28ddbaada9addd","contentType":"text/markdown; charset=utf-8"},{"id":"2515619d-2a23-5e6e-851b-2343e4c72ad3","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/2515619d-2a23-5e6e-851b-2343e4c72ad3/attachment.md","path":"references/architecture.md","size":12211,"sha256":"bc604ff4d29f9e1ea9e9e8ea68c164284db39ae1c8c769f852e85b7369f50509","contentType":"text/markdown; 
charset=utf-8"},{"id":"79d3d751-0946-53b6-9f9f-9e9b1d50a86b","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/79d3d751-0946-53b6-9f9f-9e9b1d50a86b/attachment.md","path":"references/benchmarks.md","size":4179,"sha256":"943317d7141bc31aa116159a254fb82789023e084bae46db348e7c988813dcde","contentType":"text/markdown; charset=utf-8"},{"id":"b0878d1b-bd8a-5f17-8f74-712444ca9701","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/b0878d1b-bd8a-5f17-8f74-712444ca9701/attachment.md","path":"references/compression-prompts.md","size":1942,"sha256":"c104be8c979d0011aecb6533d5ff2454eaf8bcc3739fa36c8044120a502a4cfd","contentType":"text/markdown; charset=utf-8"},{"id":"bfc17f07-5e93-500a-addf-904d31a798ad","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/bfc17f07-5e93-500a-addf-904d31a798ad/attachment.md","path":"references/compression-techniques.md","size":8237,"sha256":"5708995cf9e84cd1fadafe51a1f944e85be80940ce6d91345e4b2149e07de94e","contentType":"text/markdown; charset=utf-8"},{"id":"4c079384-96ec-5860-8dfe-4a07da800034","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/4c079384-96ec-5860-8dfe-4a07da800034/attachment.md","path":"references/testing.md","size":6307,"sha256":"881dcabd1c90eb8bf41d1d5659da5ca153e037a4a3890b852c221b1327fc20b8","contentType":"text/markdown; charset=utf-8"},{"id":"d161de6e-ac86-5493-9e77-68ef65883d8d","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/d161de6e-ac86-5493-9e77-68ef65883d8d/attachment.py","path":"scripts/__init__.py","size":0,"sha256":"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855","contentType":"text/x-python; charset=utf-8"},{"id":"2bbafd4e-f665-51d2-a1f6-1853d3e9b350","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/2bbafd4e-f665-51d2-a1f6-1853d3e9b350/attachment.py","path":"scripts/audit_memory.py","size":6522,"sha256":"01eb81f45160ddfed9888f940bc9ea745784484246a357423f6a91d090385902","contentType":"text/x-python; 
charset=utf-8"},{"id":"4c564b16-2f31-5dd7-a974-c9053f87f64e","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/4c564b16-2f31-5dd7-a974-c9053f87f64e/attachment.py","path":"scripts/benchmark_fusion.py","size":48706,"sha256":"b3e12e95508812a526a321e80b679b99ab5b0e91824d93a1f4d7a4f37d04f244","contentType":"text/x-python; charset=utf-8"},{"id":"f568fc58-29c8-514f-ae5b-d9f2569ab20e","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/f568fc58-29c8-514f-ae5b-d9f2569ab20e/attachment.py","path":"scripts/cli.py","size":746,"sha256":"8a01fcda385ab35bdd22f66f9b4541c63caa15cae841647222f9215382bfaa50","contentType":"text/x-python; charset=utf-8"},{"id":"e7f1b7ec-18da-545b-86bc-63ee9f2a96b0","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/e7f1b7ec-18da-545b-86bc-63ee9f2a96b0/attachment.py","path":"scripts/compress_memory.py","size":7881,"sha256":"49c78013918d057f4c7f9360353383ba56a2fc40a9b66017abf10105bf0840fe","contentType":"text/x-python; charset=utf-8"},{"id":"ed642a42-c5be-51f6-942b-1d1ffcc05e05","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/ed642a42-c5be-51f6-942b-1d1ffcc05e05/attachment.py","path":"scripts/compressed_context.py","size":8798,"sha256":"57a31d227a8988bdb1a1b4472d201dce024400aeba14e409d4139a5e4eb5cd53","contentType":"text/x-python; charset=utf-8"},{"id":"f2a07f92-d1e4-57c2-94bc-8ebf769c29ef","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/f2a07f92-d1e4-57c2-94bc-8ebf769c29ef/attachment.py","path":"scripts/dedup_memory.py","size":5086,"sha256":"027f07957f8f1617419ea73071c0c49782b94c436cc1b1a70da2a5b1d2a3b178","contentType":"text/x-python; charset=utf-8"},{"id":"7edda73c-7cd6-59f2-8d10-62255fd69352","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/7edda73c-7cd6-59f2-8d10-62255fd69352/attachment.py","path":"scripts/dictionary_compress.py","size":5617,"sha256":"6e8693ecec5684593cb5aecc7b73b1115600556ee45da80e36f5cb4096af2dfd","contentType":"text/x-python; 
charset=utf-8"},{"id":"9d98732a-7d4a-563a-8153-b3c0965fe29c","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/9d98732a-7d4a-563a-8153-b3c0965fe29c/attachment.sh","path":"scripts/engram-auto.sh","size":5058,"sha256":"2329c3e75928fe2275a2a1346df7e2d9fc0cc3be42495593d13cbdde75a79193","contentType":"application/x-sh; charset=utf-8"},{"id":"d61e9c93-9fea-5661-9ced-e69a9be81f09","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/d61e9c93-9fea-5661-9ced-e69a9be81f09/attachment.py","path":"scripts/engram_auto.py","size":41036,"sha256":"7f7867ecc38a470e30bba2a14ec20d30b853d66682b80db4b13bcfa3adbe88f8","contentType":"text/x-python; charset=utf-8"},{"id":"2b65b57b-1163-5452-bbde-10c128ee2c6f","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/2b65b57b-1163-5452-bbde-10c128ee2c6f/attachment.py","path":"scripts/engram_cli.py","size":17522,"sha256":"0dc968ab504f7883b66f9cd0d88a6504ef55c32aa3054a058f518d3b14b46311","contentType":"text/x-python; charset=utf-8"},{"id":"7918dc4f-5b74-55da-944c-e0dd5acb278d","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/7918dc4f-5b74-55da-944c-e0dd5acb278d/attachment.py","path":"scripts/estimate_tokens.py","size":4297,"sha256":"c3a8206bcdf0230b4dff8e23c11b839ad2b60962d2b6c6396430d2871ade1934","contentType":"text/x-python; charset=utf-8"},{"id":"f595e2e6-eaa9-50ce-a5bb-71f32fa45776","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/f595e2e6-eaa9-50ce-a5bb-71f32fa45776/attachment.py","path":"scripts/generate_summary_tiers.py","size":8837,"sha256":"9c8275b11619332a915e8118c963d3a84357b7cf66d966f651212c3282417611","contentType":"text/x-python; charset=utf-8"},{"id":"863432c5-3656-5751-a532-98b954a41ee6","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/863432c5-3656-5751-a532-98b954a41ee6/attachment.py","path":"scripts/lib/__init__.py","size":1113,"sha256":"40cbf7ac55e0882f8143ab0b59e13907288f2739a91a3a7aadf47ff385a6a170","contentType":"text/x-python; 
charset=utf-8"},{"id":"441ee5fd-d07b-5b2d-94e9-e4c6e8a24555","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/441ee5fd-d07b-5b2d-94e9-e4c6e8a24555/attachment.py","path":"scripts/lib/cli.py","size":648,"sha256":"f48deaa2d26c0d4c5b672cd649fd1517fa12d0308c0e298d922b5d15f9270012","contentType":"text/x-python; charset=utf-8"},{"id":"b176691e-67ee-5d2a-b0cb-d50aafa7f2dc","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/b176691e-67ee-5d2a-b0cb-d50aafa7f2dc/attachment.py","path":"scripts/lib/config.py","size":11407,"sha256":"9aad2dc3c8bb90bb7a5fbeffea6065ea213dd157f2fd3c2cf722e07abb49570b","contentType":"text/x-python; charset=utf-8"},{"id":"4744b853-dab1-5810-a285-531c81b94d6c","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/4744b853-dab1-5810-a285-531c81b94d6c/attachment.py","path":"scripts/lib/crunch_bench.py","size":10354,"sha256":"867fc19d186b0f142eb2841c1ee83c1098aa8b8ca97dd3b2ccbd91fceaf491db","contentType":"text/x-python; charset=utf-8"},{"id":"bf1733ba-4f50-57c3-9d25-ff5defce2cc7","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/bf1733ba-4f50-57c3-9d25-ff5defce2cc7/attachment.py","path":"scripts/lib/dedup.py","size":3425,"sha256":"52a68a5f3db2a9b0fa611ad4135fa70c06f4722b331f85d0f4fb1f45f83d60cd","contentType":"text/x-python; charset=utf-8"},{"id":"7ab29635-ef2e-5c08-af7b-7e470f7becf0","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/7ab29635-ef2e-5c08-af7b-7e470f7becf0/attachment.py","path":"scripts/lib/dictionary.py","size":9918,"sha256":"75e7dee23c3f0c09c1c7dc20d681ae03ac2890d111cd3231c64ff27645d59817","contentType":"text/x-python; charset=utf-8"},{"id":"4e21e128-b1aa-58a4-a9e7-c11e9d364104","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/4e21e128-b1aa-58a4-a9e7-c11e9d364104/attachment.py","path":"scripts/lib/engram.py","size":16669,"sha256":"dd908fbd78d3e4e17547c8c8b7ff55b395d611c478de421b2ced6f7930e41f24","contentType":"text/x-python; 
charset=utf-8"},{"id":"3f57e6ce-8dc6-5ef7-bdda-01977afc7794","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/3f57e6ce-8dc6-5ef7-bdda-01977afc7794/attachment.py","path":"scripts/lib/engram/http.py","size":5358,"sha256":"5e6e0aca091abe149b93a4f0f69daaaf13c08ad3d0d6f354f51548e854a68251","contentType":"text/x-python; charset=utf-8"},{"id":"437ae345-a1f9-550d-9466-9ed4c535cdf3","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/437ae345-a1f9-550d-9466-9ed4c535cdf3/attachment.py","path":"scripts/lib/engram_http.py","size":5489,"sha256":"6dc4d14a27b45301486205e495020d3be27f47f21a6d6187b98c24c4a9c47aee","contentType":"text/x-python; charset=utf-8"},{"id":"32fcf7f9-6a0a-581a-b199-a5d4bc8f2720","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/32fcf7f9-6a0a-581a-b199-a5d4bc8f2720/attachment.py","path":"scripts/lib/engram_learner.py","size":14681,"sha256":"fe801442b2deee338d6c91eeca5ab0618d8a04777ef74d58405d0c9655c4ebae","contentType":"text/x-python; charset=utf-8"},{"id":"dd254ac1-20b7-5902-9f50-a88c08d0b1fc","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/dd254ac1-20b7-5902-9f50-a88c08d0b1fc/attachment.py","path":"scripts/lib/engram_llm.py","size":3628,"sha256":"a2394d1c3cbbe97fc65ff12622f98fa2103599ed6edcf248071668df1c406ef0","contentType":"text/x-python; charset=utf-8"},{"id":"de2a1a96-21dd-54e4-abd8-9c579da403a5","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/de2a1a96-21dd-54e4-abd8-9c579da403a5/attachment.py","path":"scripts/lib/engram_prompts.py","size":7176,"sha256":"bd296836393694e1ab2eb29fc5538b74b23dd4d5751aa1c4facd3f0f50238807","contentType":"text/x-python; charset=utf-8"},{"id":"17475583-1354-5dca-8059-82cf632767ac","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/17475583-1354-5dca-8059-82cf632767ac/attachment.py","path":"scripts/lib/engram_storage.py","size":9269,"sha256":"952fc5e63d91539e73f65baf095ba0e26858b2b082773c504b42b1c850dbf9de","contentType":"text/x-python; 
charset=utf-8"},{"id":"54f354ef-297d-5cdd-b28d-4c4b739791ca","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/54f354ef-297d-5cdd-b28d-4c4b739791ca/attachment.py","path":"scripts/lib/engram_utils.py","size":2633,"sha256":"0192c034c8808617dfaa1d3335cb63605c4736016c536e589dac696cdf30f412","contentType":"text/x-python; charset=utf-8"},{"id":"9af319cc-60dc-57d6-92bd-21b17625b88d","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/9af319cc-60dc-57d6-92bd-21b17625b88d/attachment.py","path":"scripts/lib/exceptions.py","size":535,"sha256":"b4673698011eeac97014a52f9b3f9c55b716741fb4bcf478aa8af92dc56e1718","contentType":"text/x-python; charset=utf-8"},{"id":"4407cb72-ea88-59cf-bd39-9a3b245daa7e","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/4407cb72-ea88-59cf-bd39-9a3b245daa7e/attachment.py","path":"scripts/lib/feedback.py","size":5876,"sha256":"fa0a2f3174cc08cb8e29da455054466cdd6efedb48450a089433956a7e18c2c0","contentType":"text/x-python; charset=utf-8"},{"id":"1c5b4d87-2a0a-51ff-b790-bc03c639c999","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/1c5b4d87-2a0a-51ff-b790-bc03c639c999/attachment.py","path":"scripts/lib/fusion/__init__.py","size":2807,"sha256":"4e303de89b9b681bf8c186825abf30b323cbc65479204e901f5d3e100da339de","contentType":"text/x-python; charset=utf-8"},{"id":"c58192a5-5cc2-5ca3-87e4-18213c6fc887","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/c58192a5-5cc2-5ca3-87e4-18213c6fc887/attachment.py","path":"scripts/lib/fusion/base.py","size":3723,"sha256":"92959b686f5a42b16ffae9a14a86272f41ba69869b268e91c9b37939e57990fd","contentType":"text/x-python; charset=utf-8"},{"id":"641cb2f4-47fe-568f-a307-ae1188157a9d","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/641cb2f4-47fe-568f-a307-ae1188157a9d/attachment.py","path":"scripts/lib/fusion/cache_prefix.py","size":6672,"sha256":"5f4bd6a1f4695e927b63574af5bb89e41eb53cae137ccaccfb06dcbafdc0a504","contentType":"text/x-python; 
charset=utf-8"},{"id":"37656299-a7b0-5a3b-8f08-c1a46675e83b","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/37656299-a7b0-5a3b-8f08-c1a46675e83b/attachment.py","path":"scripts/lib/fusion/compact_hooks.py","size":6986,"sha256":"87bb11861ee8ee56a9334e69e473252de04ac0ffdfeffdc5f3e5082cd0c8377a","contentType":"text/x-python; charset=utf-8"},{"id":"648ccf1c-5880-5f82-81df-3736e0fc2aef","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/648ccf1c-5880-5f82-81df-3736e0fc2aef/attachment.py","path":"scripts/lib/fusion/content_detector.py","size":10810,"sha256":"a9e88fa1f0da3a8feef844ea57cbdcd01ae5ad5de9ea2dcbc8aef03469f9a610","contentType":"text/x-python; charset=utf-8"},{"id":"f31c54bf-0373-531d-80c9-19283c6b0bc1","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/f31c54bf-0373-531d-80c9-19283c6b0bc1/attachment.py","path":"scripts/lib/fusion/content_stripper.py","size":6195,"sha256":"0a79f4c98faf273f80ea1142c8bb177c429b9b71925945bb629f715799a66bcb","contentType":"text/x-python; charset=utf-8"},{"id":"07b96651-f42a-5176-bed7-483668710879","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/07b96651-f42a-5176-bed7-483668710879/attachment.py","path":"scripts/lib/fusion/conversation_summarizer.py","size":8916,"sha256":"b474d5cd358bd902b10817515bfdafe067222b97d21349727d8fb6ba38d45a43","contentType":"text/x-python; charset=utf-8"},{"id":"107f436d-e6bf-5e56-9fae-500233716b61","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/107f436d-e6bf-5e56-9fae-500233716b61/attachment.py","path":"scripts/lib/fusion/cortex.py","size":1877,"sha256":"81b793b1445043d6e058bef074444ba3766ad6762b0bddca2cbd6ae339fefa00","contentType":"text/x-python; charset=utf-8"},{"id":"b2621c36-03e9-5c19-9400-6f5c3a37d8e9","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/b2621c36-03e9-5c19-9400-6f5c3a37d8e9/attachment.py","path":"scripts/lib/fusion/diff_crunch.py","size":7669,"sha256":"0480113c5e919616e24a7a4c32d3748f96ce58e2abb911bc3f442f16be183491","contentType":"text/x-python; 
charset=utf-8"},{"id":"d00ba230-b7e8-5663-b6b2-615acd9ae101","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/d00ba230-b7e8-5663-b6b2-615acd9ae101/attachment.py","path":"scripts/lib/fusion/engine.py","size":24229,"sha256":"699f0919ca3290975d7301897c5a91847f70dba482fe5c88d5ea1f870c60039a","contentType":"text/x-python; charset=utf-8"},{"id":"c6a42c60-966a-58a9-8c69-7c09205e7843","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/c6a42c60-966a-58a9-8c69-7c09205e7843/attachment.py","path":"scripts/lib/fusion/ionizer.py","size":9344,"sha256":"afd9d7f6ec65605e734d0b408ad1d3ed141894f74fc86de6f1841c40e8a9004a","contentType":"text/x-python; charset=utf-8"},{"id":"cb854da7-0525-5e8c-8e40-362a948c0912","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/cb854da7-0525-5e8c-8e40-362a948c0912/attachment.py","path":"scripts/lib/fusion/llm_summarizer.py","size":12608,"sha256":"ce11aa5e0a9956863b0a99b527ec8f438a13cd423f8b5d3a67c57f96e8dea714","contentType":"text/x-python; charset=utf-8"},{"id":"6074b314-bfdc-5e32-a38d-89e9ffe0064c","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/6074b314-bfdc-5e32-a38d-89e9ffe0064c/attachment.py","path":"scripts/lib/fusion/log_crunch.py","size":8316,"sha256":"f50e70cdd3152aa6d6cfe29f35bcd2a140c4d8f1ce05c6f084ce66d1a396a1e6","contentType":"text/x-python; charset=utf-8"},{"id":"11ace14a-7961-5bd7-b76b-7981ebe36043","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/11ace14a-7961-5bd7-b76b-7981ebe36043/attachment.py","path":"scripts/lib/fusion/neurosyntax.py","size":14058,"sha256":"356a661404d5e5871e3a5d1fa3678d724777f76fe8ef9c069588bef5e8836c62","contentType":"text/x-python; charset=utf-8"},{"id":"07c544c0-249b-506d-8d16-96bdcf8b9dc0","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/07c544c0-249b-506d-8d16-96bdcf8b9dc0/attachment.py","path":"scripts/lib/fusion/nexus.py","size":10724,"sha256":"cf29107f265c8fe5b2877944c3788b10cc0803e3a06ef4ae7604c2ce972aacdd","contentType":"text/x-python; 
charset=utf-8"},{"id":"d19fb571-14ca-5cd1-84b2-9eb1b6871021","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/d19fb571-14ca-5cd1-84b2-9eb1b6871021/attachment.py","path":"scripts/lib/fusion/nexus_model.py","size":7026,"sha256":"ef8838149e82aff33d193b84fc63ec56b0a2a5538dce285d42b99fa59d5c9d24","contentType":"text/x-python; charset=utf-8"},{"id":"468f9f48-82a5-5dad-9157-dfec8b789809","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/468f9f48-82a5-5dad-9157-dfec8b789809/attachment.py","path":"scripts/lib/fusion/photon.py","size":17817,"sha256":"0c7aea548a8715fa7cffda1382945434f5872e83870ad3a615dd360e3ae310a8","contentType":"text/x-python; charset=utf-8"},{"id":"a4fc90b9-de14-5da8-ab86-2220df48b875","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/a4fc90b9-de14-5da8-ab86-2220df48b875/attachment.py","path":"scripts/lib/fusion/pipeline.py","size":3332,"sha256":"f5bcdde38a7669592ce579f196f392b63478b77150e13b2e0813909b2f17d41d","contentType":"text/x-python; charset=utf-8"},{"id":"34d73d3f-4757-5d4a-a3a6-6e7993aace47","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/34d73d3f-4757-5d4a-a3a6-6e7993aace47/attachment.py","path":"scripts/lib/fusion/plan_reinjection.py","size":9528,"sha256":"add36e18f758a88d3bec5aa95b0076ac54a381336bf8aa0291e937a0c725028f","contentType":"text/x-python; charset=utf-8"},{"id":"b3559de2-d211-5776-8d00-a23055866258","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/b3559de2-d211-5776-8d00-a23055866258/attachment.py","path":"scripts/lib/fusion/quantum_lock.py","size":7844,"sha256":"51abf9ea0309959cfc6af4ead2d25d616c6caba5275ee869f8c9573880560dc3","contentType":"text/x-python; charset=utf-8"},{"id":"e80fe227-3f81-5479-b298-4e8592bfb8cb","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/e80fe227-3f81-5479-b298-4e8592bfb8cb/attachment.py","path":"scripts/lib/fusion/search_crunch.py","size":8578,"sha256":"01ca4a0f6cb0c728213901dccaeab9d35dd29a72215c059b5bad6190cf27ecd0","contentType":"text/x-python; 
charset=utf-8"},{"id":"25fd5c2a-0b15-5955-aa70-dedfc2701e72","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/25fd5c2a-0b15-5955-aa70-dedfc2701e72/attachment.py","path":"scripts/lib/fusion/semantic_dedup.py","size":14351,"sha256":"f16193b2bd48b88aec9f267fd899676474a8ac1be12556c3ce7761ba6765bdbd","contentType":"text/x-python; charset=utf-8"},{"id":"5d1a06e4-a4c6-5a82-8c03-5534adf64a6e","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/5d1a06e4-a4c6-5a82-8c03-5534adf64a6e/attachment.py","path":"scripts/lib/fusion/skill_reinjection.py","size":6435,"sha256":"13093d7fe8be0182491f7dd23d0dcd8b94c0d09d60de83c9141c14c416d44752","contentType":"text/x-python; charset=utf-8"},{"id":"c91790c9-8f3c-5a78-81cc-857627e3e376","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/c91790c9-8f3c-5a78-81cc-857627e3e376/attachment.py","path":"scripts/lib/fusion/structural_collapse.py","size":16271,"sha256":"1941acd288fa8b7e33360eb7c0ca2320a3cd56043a4a58ab1be4bb488fd26934","contentType":"text/x-python; charset=utf-8"},{"id":"a93262f3-e9f3-54e8-9ac5-d025132faf8f","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/a93262f3-e9f3-54e8-9ac5-d025132faf8f/attachment.py","path":"scripts/lib/fusion/tiered_compaction.py","size":15408,"sha256":"472a7270b24495de40b086baa9d7304b61006df19f13d70788b3689b5117b99f","contentType":"text/x-python; charset=utf-8"},{"id":"37b3399e-eae7-5852-9f7d-fdaf45ece31b","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/37b3399e-eae7-5852-9f7d-fdaf45ece31b/attachment.py","path":"scripts/lib/fusion/tool_result_budget.py","size":4597,"sha256":"89ca5ef6a2e7f6504d2564c98cae0757edd1b18b129bacd175a2bb4e7568391f","contentType":"text/x-python; charset=utf-8"},{"id":"cf4bd67b-21ab-5514-bf02-baf9a12a30b1","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/cf4bd67b-21ab-5514-bf02-baf9a12a30b1/attachment.py","path":"scripts/lib/markdown.py","size":9852,"sha256":"8f7827e7a8d591d9a84773660d0d7089a5228039d77fe2eb3a3b88b48bc6f195","contentType":"text/x-python; 
charset=utf-8"},{"id":"e6888906-8450-5ecb-9d04-d7c430bd45ef","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/e6888906-8450-5ecb-9d04-d7c430bd45ef/attachment.py","path":"scripts/lib/rewind/__init__.py","size":396,"sha256":"fffb3cca9482c191f35c9703c4e8f12afafb4d504da2053608f30ae6e3a06a54","contentType":"text/x-python; charset=utf-8"},{"id":"05144407-ddc7-5a9e-8083-511623c2e406","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/05144407-ddc7-5a9e-8083-511623c2e406/attachment.py","path":"scripts/lib/rewind/marker.py","size":1506,"sha256":"333d90370024761da6a5d134846b5a646a1a6f353cb8cd2d1787b6394369cfb8","contentType":"text/x-python; charset=utf-8"},{"id":"f0fb2d39-531f-58d5-8687-d9f0233d2227","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/f0fb2d39-531f-58d5-8687-d9f0233d2227/attachment.py","path":"scripts/lib/rewind/retriever.py","size":2181,"sha256":"b2c15b37abf9b67bcabb45b8fc5d4f1323d6cbb0ca1f956203862e4924586fed","contentType":"text/x-python; charset=utf-8"},{"id":"b56db5d2-5995-56fa-b378-264583c970b8","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/b56db5d2-5995-56fa-b378-264583c970b8/attachment.py","path":"scripts/lib/rewind/store.py","size":3286,"sha256":"0ae86b87db5f803b2599cf4e40b91f3ff11c7a648982e26a59f0bad31c37bd49","contentType":"text/x-python; charset=utf-8"},{"id":"80705fd2-c393-5bbb-a1e6-11aee5dc395b","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/80705fd2-c393-5bbb-a1e6-11aee5dc395b/attachment.py","path":"scripts/lib/rle.py","size":5104,"sha256":"81265aeaad8feafb733341785d1997e994208158fa79f0ca863e854ef52ac702","contentType":"text/x-python; charset=utf-8"},{"id":"ec0799fc-8e20-5bd4-809f-2c11b758e2e5","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/ec0799fc-8e20-5bd4-809f-2c11b758e2e5/attachment.py","path":"scripts/lib/tokenizer_optimizer.py","size":5806,"sha256":"93bd5f293ae9fb7ba03c58b35fb2321488a0501e9bb70c8545a0550b5364b288","contentType":"text/x-python; 
charset=utf-8"},{"id":"c93bfd78-aa19-5ffb-be11-dad6a88b47a1","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/c93bfd78-aa19-5ffb-be11-dad6a88b47a1/attachment.py","path":"scripts/lib/tokens.py","size":2607,"sha256":"fa6adb5c1e041752817dbbd174effcbe60591b9d7e4ca3ff325541d97935c78e","contentType":"text/x-python; charset=utf-8"},{"id":"0bfdf6f6-cd0a-59ea-8716-15eceb8242e9","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/0bfdf6f6-cd0a-59ea-8716-15eceb8242e9/attachment.py","path":"scripts/lib/unicode_maps.py","size":3372,"sha256":"9c5027e4ca0869c5e5bd0b5c57b232d7c79863d4db898f7e17ffeef9531dc143","contentType":"text/x-python; charset=utf-8"},{"id":"4960f8be-02b3-5897-bbce-13aa5463c1c6","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/4960f8be-02b3-5897-bbce-13aa5463c1c6/attachment.py","path":"scripts/mem_compress.py","size":29395,"sha256":"f104b302941efd6b76817026562ea64d5280008a7fa7d574ffe4240c38897be9","contentType":"text/x-python; charset=utf-8"},{"id":"2e406639-3728-563f-965e-a32144cad543","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/2e406639-3728-563f-965e-a32144cad543/attachment.py","path":"scripts/observation_compressor.py","size":15106,"sha256":"2df124e9d3294b10d25304f83beabcebf3d7bf29f2e4c9c6e0b6a4a27b192e09","contentType":"text/x-python; charset=utf-8"},{"id":"5c0ac007-f508-51b8-a986-341b2aa20b0c","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/5c0ac007-f508-51b8-a986-341b2aa20b0c/attachment.py","path":"tests/__init__.py","size":0,"sha256":"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855","contentType":"text/x-python; charset=utf-8"},{"id":"9ed05ce8-4121-57cd-b796-dc28f73b464b","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/9ed05ce8-4121-57cd-b796-dc28f73b464b/attachment.py","path":"tests/conftest.py","size":3316,"sha256":"a267e9b0ee256616066934fa042a7dd4b85a950ac6a49d31fce7300bf9287d09","contentType":"text/x-python; 
charset=utf-8"},{"id":"612202d3-82bc-5959-bc29-609b9e60b96b","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/612202d3-82bc-5959-bc29-609b9e60b96b/attachment.py","path":"tests/test_audit_comprehensive.py","size":4020,"sha256":"7f1f7325dd197fa6fbe26e42823632d8da6de96137b6a14b25a23ff9f8a6dce4","contentType":"text/x-python; charset=utf-8"},{"id":"d5017981-66e9-575d-a9c3-0d5c774496dc","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/d5017981-66e9-575d-a9c3-0d5c774496dc/attachment.py","path":"tests/test_audit_memory.py","size":2299,"sha256":"741487d1dd2235104be3489c48118cd51a3009cae80dd4f0cb9e0c4d0bb46bbc","contentType":"text/x-python; charset=utf-8"},{"id":"56cd54ac-bf15-55ef-86b0-0b7ad7f80b26","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/56cd54ac-bf15-55ef-86b0-0b7ad7f80b26/attachment.py","path":"tests/test_benchmark.py","size":4762,"sha256":"2182c2022bfc96290d6cba60cbd7162e71d675e714ae9adb7c506462fb49bcc7","contentType":"text/x-python; charset=utf-8"},{"id":"f22e6bfd-ff15-5bb7-962e-5ab26096a430","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/f22e6bfd-ff15-5bb7-962e-5ab26096a430/attachment.py","path":"tests/test_cli_commands.py","size":7626,"sha256":"b7d840f65d0bfa721f72449573a1a55e2c2e45a9d1fe8bfc041e4b09e116c90e","contentType":"text/x-python; charset=utf-8"},{"id":"ca40b172-9dc4-53e5-8804-5f32ad1e0d84","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/ca40b172-9dc4-53e5-8804-5f32ad1e0d84/attachment.py","path":"tests/test_compress_memory.py","size":3811,"sha256":"b1673cbdfea37959ea97f8b037d9dab70824a414d37db0bf571e0d57c773e3b8","contentType":"text/x-python; charset=utf-8"},{"id":"e69baac4-ebf6-5a64-a646-dbc84b14d0b9","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/e69baac4-ebf6-5a64-a646-dbc84b14d0b9/attachment.py","path":"tests/test_compress_memory_comprehensive.py","size":6799,"sha256":"d3a50aad5665a2f5cef20b5b7015c9c74d4b3cf43cab84a85fc30daf9742aff4","contentType":"text/x-python; 
charset=utf-8"},{"id":"cf3acc20-fe7c-5fcc-b41d-4dcd716e65ba","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/cf3acc20-fe7c-5fcc-b41d-4dcd716e65ba/attachment.py","path":"tests/test_compressed_context.py","size":5219,"sha256":"d767b9cf2feb2eee3393e059b73e41bb440b4f85c3e7caecf9fdee8bdacf8c24","contentType":"text/x-python; charset=utf-8"},{"id":"305b2a8e-28b8-56dc-91d2-2254d01c77f1","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/305b2a8e-28b8-56dc-91d2-2254d01c77f1/attachment.py","path":"tests/test_config.py","size":1901,"sha256":"293d33b95e34f4c7c8b47031f687f613336da4a11ada2ee43f0ccb22defdecda","contentType":"text/x-python; charset=utf-8"},{"id":"69f5d9a3-8dd7-5710-9266-38b681cb3718","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/69f5d9a3-8dd7-5710-9266-38b681cb3718/attachment.py","path":"tests/test_conversation_summarizer.py","size":7318,"sha256":"b32aa41c57cf896daf970d1525f191a5d478c5004bfc98efa9fe7fbd64646479","contentType":"text/x-python; charset=utf-8"},{"id":"c384107a-bf6c-542b-b8b4-c96ce808646a","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/c384107a-bf6c-542b-b8b4-c96ce808646a/attachment.py","path":"tests/test_cortex.py","size":8922,"sha256":"1cab9c9e850b6ba352adf2ce17bb51cc6230e63de5fa88a67809fbab40a11171","contentType":"text/x-python; charset=utf-8"},{"id":"90e1900b-71d5-5a06-83ec-7db219002bef","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/90e1900b-71d5-5a06-83ec-7db219002bef/attachment.py","path":"tests/test_crunch_bench.py","size":17923,"sha256":"568e02f345597cd95dd6e52973db00d4dbe700f40545c7149cc6c927a0364c2f","contentType":"text/x-python; charset=utf-8"},{"id":"e9eaa2ac-a27e-534d-8c1a-f20c342aa157","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/e9eaa2ac-a27e-534d-8c1a-f20c342aa157/attachment.py","path":"tests/test_dedup_memory.py","size":4168,"sha256":"3896289d932fa12312ddc4e7e6bb32a8c46accd6c761dea9d364122767060b31","contentType":"text/x-python; 
charset=utf-8"},{"id":"0bb4a70e-54ef-5d64-88a5-f4fe28312ff2","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/0bb4a70e-54ef-5d64-88a5-f4fe28312ff2/attachment.py","path":"tests/test_dictionary.py","size":11069,"sha256":"e0d342d50f83070582573c3ff8e35bdadc6f0b87b0567c447a367f520c5ed57d","contentType":"text/x-python; charset=utf-8"},{"id":"35f7dcc5-3270-596f-be96-d17c709b3cbd","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/35f7dcc5-3270-596f-be96-d17c709b3cbd/attachment.py","path":"tests/test_dictionary_comprehensive.py","size":7806,"sha256":"05504ee07732151f7be02d2151e2ca3bb7ec9d69aa3cac27641cf54f50595aa9","contentType":"text/x-python; charset=utf-8"},{"id":"6940a307-f3ca-5b76-8f97-c5438e06816c","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/6940a307-f3ca-5b76-8f97-c5438e06816c/attachment.py","path":"tests/test_engram.py","size":44177,"sha256":"39c54a5342a88bc162de96ae1f395aa89d604acb317af80a23a850ebe3200e96","contentType":"text/x-python; charset=utf-8"},{"id":"950ad3d8-60b9-5984-92f8-682abd246fe9","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/950ad3d8-60b9-5984-92f8-682abd246fe9/attachment.py","path":"tests/test_engram_auto.py","size":39426,"sha256":"e7eedaab09c2cecc6137eaef7d67456f929901270063f9b51ac3f744b2cbaf5f","contentType":"text/x-python; charset=utf-8"},{"id":"61aff553-e5cb-5bae-95e0-97666db8a044","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/61aff553-e5cb-5bae-95e0-97666db8a044/attachment.py","path":"tests/test_engram_learner.py","size":15129,"sha256":"9d9cdbbaac65e80ef8ea50450dc0f0cc3f09855027675bdfd7af6b5233881bf6","contentType":"text/x-python; charset=utf-8"},{"id":"d9d60e38-f299-56d7-bd25-e7f972c6de8c","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/d9d60e38-f299-56d7-bd25-e7f972c6de8c/attachment.py","path":"tests/test_error_handling.py","size":6575,"sha256":"cca182d7849d8ada16c91eb43e802553e77e49f79cd5e53e9608ff33140dab81","contentType":"text/x-python; 
charset=utf-8"},{"id":"a0e9a93c-f9be-5576-95d4-6966d4e891d3","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/a0e9a93c-f9be-5576-95d4-6966d4e891d3/attachment.py","path":"tests/test_estimate_tokens.py","size":3027,"sha256":"adbc81d4a3116bb817566dbbc4e643c69176b610a11f24417a22f58095d26445","contentType":"text/x-python; charset=utf-8"},{"id":"5cf23f9f-df18-5bd4-94fd-e254179a2f35","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/5cf23f9f-df18-5bd4-94fd-e254179a2f35/attachment.py","path":"tests/test_existing_edge_cases.py","size":15568,"sha256":"41a551181f5fdc7a212a68295eddffeeb5515cef863238af754a597142b36467","contentType":"text/x-python; charset=utf-8"},{"id":"184f6b14-c3ce-5884-a74a-7c2def39d7ff","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/184f6b14-c3ce-5884-a74a-7c2def39d7ff/attachment.py","path":"tests/test_feedback.py","size":12647,"sha256":"1e6720daca80bc1b3f612a16503d8dd5c54ada54cda175c4aa349f4544393402","contentType":"text/x-python; charset=utf-8"},{"id":"7bd9abeb-e7d7-5108-b278-fb08f5a78d76","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/7bd9abeb-e7d7-5108-b278-fb08f5a78d76/attachment.py","path":"tests/test_fusion_engine.py","size":36725,"sha256":"b4dd325f566284f25a19bec325d1dd1385bfba72edc504ded2f0d446b2cfab8d","contentType":"text/x-python; charset=utf-8"},{"id":"3b1b228f-b8a9-544d-adc1-aa426e3edc8e","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/3b1b228f-b8a9-544d-adc1-aa426e3edc8e/attachment.py","path":"tests/test_fusion_pipeline.py","size":18551,"sha256":"e04d1da553769fa3fb38771a4d2d70dc5d69a6a60b5258f1e0101c443e3b502f","contentType":"text/x-python; charset=utf-8"},{"id":"39769ef6-9560-5afc-9e34-d3ef58a0a269","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/39769ef6-9560-5afc-9e34-d3ef58a0a269/attachment.py","path":"tests/test_generate_summary_tiers.py","size":3621,"sha256":"db2828fa2da522469b796d150fbf0c3802d082f299463aa01adfa0e304776762","contentType":"text/x-python; 
charset=utf-8"},{"id":"f6009c07-271d-56f1-bea1-203dfc943421","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/f6009c07-271d-56f1-bea1-203dfc943421/attachment.py","path":"tests/test_integration.py","size":10197,"sha256":"8103524f7df26b8d6776bfa219a59282d1a8183e9ab9bf2d60ee061ff82d2119","contentType":"text/x-python; charset=utf-8"},{"id":"5dc60a43-fc96-561a-b4ef-7c871dd7e770","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/5dc60a43-fc96-561a-b4ef-7c871dd7e770/attachment.py","path":"tests/test_lib_dedup.py","size":4244,"sha256":"cbf274dba3289cf6021bd587f9b7459212f139889e5287688cd151fbecb38211","contentType":"text/x-python; charset=utf-8"},{"id":"5f87f2d2-0147-5799-987d-2cd57bd4fe0a","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/5f87f2d2-0147-5799-987d-2cd57bd4fe0a/attachment.py","path":"tests/test_lib_markdown.py","size":12436,"sha256":"ee86a3c5023f9d57b6f740b35a1ca5d3f6eeda229734c3f4d5a3c99b589e95c4","contentType":"text/x-python; charset=utf-8"},{"id":"f7bb583b-7d07-59ee-81cf-fe02e0104b74","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/f7bb583b-7d07-59ee-81cf-fe02e0104b74/attachment.py","path":"tests/test_lib_tokens.py","size":3272,"sha256":"f5c17e66714960bdcacac9e90600e9f9ce84fd6847ddce7ca86a3b8959c9a331","contentType":"text/x-python; charset=utf-8"},{"id":"60323c2a-a9b5-565a-ad13-45a8de48a495","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/60323c2a-a9b5-565a-ad13-45a8de48a495/attachment.py","path":"tests/test_main_entry.py","size":6254,"sha256":"c5a5b079e4c754ca1f197aa6f522a940b96bfb18c9199d411391970bb3024133","contentType":"text/x-python; charset=utf-8"},{"id":"55050b31-0831-5d45-9d75-6a8a625655f7","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/55050b31-0831-5d45-9d75-6a8a625655f7/attachment.py","path":"tests/test_markdown_advanced.py","size":8984,"sha256":"36f70094f270cdcb2d5d597dd67f04d9d38cadc42e665590df528da0ea45a9e0","contentType":"text/x-python; 
charset=utf-8"},{"id":"0a83271e-8de5-595d-922c-38b72fc3ba24","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/0a83271e-8de5-595d-922c-38b72fc3ba24/attachment.py","path":"tests/test_neurosyntax.py","size":14497,"sha256":"6355cc7ea57ea50d62b82e3b81eda0bfc5d1584fec0566c7491f6cbef7d1d19d","contentType":"text/x-python; charset=utf-8"},{"id":"830f02d7-ede4-58fb-948c-6d9d23c2fdac","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/830f02d7-ede4-58fb-948c-6d9d23c2fdac/attachment.py","path":"tests/test_new_features.py","size":2752,"sha256":"58ea94a696b511f6c1124cecabb4869cbb27b618f32a4adaa77fc1bba501e374","contentType":"text/x-python; charset=utf-8"},{"id":"a86c09ac-c5c0-5f5f-9f38-c5b76acaf8dc","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/a86c09ac-c5c0-5f5f-9f38-c5b76acaf8dc/attachment.py","path":"tests/test_new_features_v8.py","size":46652,"sha256":"5cb8b2a2cfa4dc005d98ceab4d6f337b4569ae93d0ba2ed4652abad607a03dd2","contentType":"text/x-python; charset=utf-8"},{"id":"d0040817-8842-5bc3-a6e1-650043c40360","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/d0040817-8842-5bc3-a6e1-650043c40360/attachment.py","path":"tests/test_nexus.py","size":21141,"sha256":"697c05abe37440b98fb5642f5e280196fed9ce098a4bf19d4b892dee2cf86df5","contentType":"text/x-python; charset=utf-8"},{"id":"7e2b7c1c-875f-57cf-8b69-cf212d548993","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/7e2b7c1c-875f-57cf-8b69-cf212d548993/attachment.py","path":"tests/test_observation_comprehensive.py","size":9834,"sha256":"6f2f5b7608011ed3b97dc99b8d163307a285775477de59600bc710753a5aed15","contentType":"text/x-python; charset=utf-8"},{"id":"0bafe5d5-f71a-5c36-af47-074c216edd6b","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/0bafe5d5-f71a-5c36-af47-074c216edd6b/attachment.py","path":"tests/test_observation_compressor.py","size":4754,"sha256":"8e16240e5e428337ba10a22ae868f2fd0405a5351da3644260b59d05e0a70f59","contentType":"text/x-python; 
charset=utf-8"},{"id":"6a18a83b-ae9f-5f56-9049-61b2afe9dce4","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/6a18a83b-ae9f-5f56-9049-61b2afe9dce4/attachment.py","path":"tests/test_performance.py","size":6840,"sha256":"416cc23d718ac4934279f0e5b506c6d2577e5174cad39db75e9e251313901bfa","contentType":"text/x-python; charset=utf-8"},{"id":"e241b907-0cd3-5403-9167-ce0cc5fc9cde","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/e241b907-0cd3-5403-9167-ce0cc5fc9cde/attachment.py","path":"tests/test_phase3_structured.py","size":24329,"sha256":"9a1b765568a5c45e2311de2786c1a2c9e0bd4c5c98af2e088e9123f22626147c","contentType":"text/x-python; charset=utf-8"},{"id":"aea81b19-41a1-5281-8387-0a0a04c08f5a","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/aea81b19-41a1-5281-8387-0a0a04c08f5a/attachment.py","path":"tests/test_photon.py","size":16322,"sha256":"d61f69d9df82800580519562eb17141e23faf359c7b0aa50bec84138c2103ff1","contentType":"text/x-python; charset=utf-8"},{"id":"ff555e80-9f47-5ac6-b2db-4b77ec7bf3af","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/ff555e80-9f47-5ac6-b2db-4b77ec7bf3af/attachment.py","path":"tests/test_pipeline.py","size":4402,"sha256":"40c6db5a53adddbc0da7afa85097f1b85670436029fa9b0bd952946e44e9a930","contentType":"text/x-python; charset=utf-8"},{"id":"95f4f08f-e732-5b19-9c8f-4c7cf17c7f67","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/95f4f08f-e732-5b19-9c8f-4c7cf17c7f67/attachment.py","path":"tests/test_quantum_lock.py","size":14802,"sha256":"4e813eb641e259824496202e0f1e5833decd6d3dc389c6fdc8c04cfee9776d91","contentType":"text/x-python; charset=utf-8"},{"id":"5efa3467-31d4-5e42-bce4-b0a791e6f2a4","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/5efa3467-31d4-5e42-bce4-b0a791e6f2a4/attachment.py","path":"tests/test_real_workspace.py","size":6615,"sha256":"348aa1093d8ff49f84337643d830855feb5b3c1341196571316cd65536cf56e4","contentType":"text/x-python; 
charset=utf-8"},{"id":"4f8ea45b-e239-5c2f-9dac-4847bd8cb2f1","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/4f8ea45b-e239-5c2f-9dac-4847bd8cb2f1/attachment.py","path":"tests/test_rewind.py","size":20518,"sha256":"fd4924829fae452abe7a4541b329d9e3d8eeb3e7e8e7741a49ab92cd2bbe90ef","contentType":"text/x-python; charset=utf-8"},{"id":"8db3e83f-0e88-5f15-8357-e87f183dc9ec","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/8db3e83f-0e88-5f15-8357-e87f183dc9ec/attachment.py","path":"tests/test_rle.py","size":3671,"sha256":"2eda01e0fbc6aa5aa6179dc2fcdd773a0beda1dd83f42012a1d519e7871b3f7a","contentType":"text/x-python; charset=utf-8"},{"id":"5a3b675a-2a5d-5d29-a73b-7be2a4b139ea","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/5a3b675a-2a5d-5d29-a73b-7be2a4b139ea/attachment.py","path":"tests/test_rle_comprehensive.py","size":4629,"sha256":"5835189eefcce8ea40e4c3528ae8a86186a7454197f37c61996822f3b1443713","contentType":"text/x-python; charset=utf-8"},{"id":"6235e384-b77a-5faf-90c3-3cdc974a6d4b","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/6235e384-b77a-5faf-90c3-3cdc974a6d4b/attachment.py","path":"tests/test_roundtrip.py","size":8877,"sha256":"1aa5dbb625c12ec94a77ee266363d6e666032f6c4afd2783d660c57b4f95f27b","contentType":"text/x-python; charset=utf-8"},{"id":"7bc9de3b-0fa0-59db-8ea7-c7f1ca57f4e1","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/7bc9de3b-0fa0-59db-8ea7-c7f1ca57f4e1/attachment.py","path":"tests/test_roundtrip_comprehensive.py","size":8121,"sha256":"77c9debcba6773d0b9a51a7c285ff15a0e99be1d9ee7ec390568583b80f67996","contentType":"text/x-python; charset=utf-8"},{"id":"3c76f7e0-2eb3-5386-b760-18d652b95f3e","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/3c76f7e0-2eb3-5386-b760-18d652b95f3e/attachment.py","path":"tests/test_semantic_dedup.py","size":27416,"sha256":"edc34b6ec3f3a499bd15eebd7505c4091438d8ef244d5a16be58bbfdd9768452","contentType":"text/x-python; 
charset=utf-8"},{"id":"304ce442-65b8-5849-aa9f-b11c75d2f337","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/304ce442-65b8-5849-aa9f-b11c75d2f337/attachment.py","path":"tests/test_structural_collapse.py","size":31673,"sha256":"a7afae932ddbf83245148b1c3685d5eba008d61c8c1dfcc9e8f212640e43b1ed","contentType":"text/x-python; charset=utf-8"},{"id":"ae0e226e-4ca3-5765-b749-aa6432a5df13","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/ae0e226e-4ca3-5765-b749-aa6432a5df13/attachment.py","path":"tests/test_tiered_compaction.py","size":7759,"sha256":"4c6b64316270e56913f37834ef3832ace8004926335dfc4bdbcbe59cef5e4799","contentType":"text/x-python; charset=utf-8"},{"id":"2eb4e8e2-f72b-5b9b-8514-ee4adb200387","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/2eb4e8e2-f72b-5b9b-8514-ee4adb200387/attachment.py","path":"tests/test_tiers_comprehensive.py","size":5921,"sha256":"04c56a8afd4b8322ffd0bfaba2f68e14caad51c973df9719d0933007618fd7d4","contentType":"text/x-python; charset=utf-8"},{"id":"74502164-8cfe-5900-9e84-f5ded151a721","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/74502164-8cfe-5900-9e84-f5ded151a721/attachment.py","path":"tests/test_token_economics.py","size":9729,"sha256":"d3be8b463f8c7c080f20c53b5d31082384ffc9f690704d7b6fb7811ec9a88381","contentType":"text/x-python; charset=utf-8"},{"id":"483488ee-b759-5c7d-b896-00c45eca254a","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/483488ee-b759-5c7d-b896-00c45eca254a/attachment.py","path":"tests/test_tokenizer_optimizer.py","size":5930,"sha256":"00c7945e84cc15d52879b5e7f7342bf56b316e4e95ee7ddabd5978e1f99e6d7f","contentType":"text/x-python; charset=utf-8"},{"id":"694dd2c5-1317-524f-a6a6-975a24658585","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/694dd2c5-1317-524f-a6a6-975a24658585/attachment.py","path":"tests/test_tokenizer_optimizer_comprehensive.py","size":8058,"sha256":"88a15447cff97efbccde27b75a2d48c9371bf11cb8c9277f0af844770f8f05a7","contentType":"text/x-python; 
charset=utf-8"},{"id":"1b15e49d-2bf0-59c4-9bcf-d89b406594a8","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/1b15e49d-2bf0-59c4-9bcf-d89b406594a8/attachment.py","path":"tests/test_tokens.py","size":803,"sha256":"f25e565fb93d638de2e1cb67e47aed883ccbeb1364b12bdfb2773d070a325286","contentType":"text/x-python; charset=utf-8"},{"id":"3f557c6d-02df-5ac5-b3b9-f66a776d6e4a","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/3f557c6d-02df-5ac5-b3b9-f66a776d6e4a/attachment.py","path":"tests/test_tool_result_budget.py","size":4609,"sha256":"5eaeab91a64f696517ee0ddc1f7a9837a66ad32f28a2d6504bddcaada931a411","contentType":"text/x-python; charset=utf-8"},{"id":"0f421592-62b5-5d54-90b9-4f958924f8c1","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/0f421592-62b5-5d54-90b9-4f958924f8c1/attachment.py","path":"tests/test_v8_deep.py","size":53684,"sha256":"c28829fccdc58d362dc7f36acd5409af91c7defe40cf3b206916f3625d47c99f","contentType":"text/x-python; charset=utf-8"}],"bundle_sha256":"1cad24ce6146874d5adf9fe8709bd2e32b1f69de2b22d5256abe5a6d028e16b9","attachment_count":207,"text_attachments":203,"attachment_storage":"skillopedia-attachments-v1","binary_attachments":4,"excluded_attachments":[]},"cluster_size":1,"skill_md_path":"SKILL.md","import_metadata":{"date":"2026-06-05","author":"@skillopedia","version":"v1","category":"ai-agent-development","category_label":"AI"},"exact_dupes_collapsed_into_this":0},"version":"v1","category":"ai-agent-development","triggers":["compress memory","compress workspace","save tokens","token savings","compress context","run engram","engram observe","engram reflect","memory compression","benchmark compression"],"import_tag":"clean-skills-v1","description":"Claw Compactor — 6-layer token compression skill for OpenClaw agents. Cuts workspace token spend by 50–97% using deterministic rule-engines plus Engram: a real-time, LLM-driven Observational Memory system. Run at session start for automatic savings reporting.\n"}},"renderedAt":1782979669283}

Claw Compactor — OpenClaw Skill Reference

Overview

Claw Compactor reduces token usage across the full OpenClaw workspace using 6 compression layers:

| Layer | Name | Cost | Notes |
|-------|------|------|-------|
| 1 | Rule Engine | Free | Dedup, strip filler, merge sections |
| 2 | Dictionary Encoding | Free | Auto-codebook, substitution |
| 3 | Observation Compression | Free | Session JSONL → structured summaries |
| 4 | RLE Patterns | Free | Path/IP/enum shorthand |
| 5 | Compressed Context Protocol | Free | Format abbreviations |
| 6 | Engram | LLM API | Real-time Observational Memory |

S…