<!-- Adapted from: claude-scientific-skills/scientific-skills/huggingface-tokenizers --
HuggingFace Tokenizers
Fast, production-ready tokenization - Rust-powered, Python API.

When to Use
- High-performance tokenization (<20s per GB)
- Train custom tokenizers from scratch
- Track token-to-text alignment
- Production NLP pipelines
- Need BPE, WordPiece, or Unigram tokenization

Quick Start
Train Custom Tokenizer
BPE (GPT-2 style)
WordPiece (BERT style)
Encoding Options
Access Encoding Data
Pre-tokenizers
Post-processing
Normalization
With Transformers
Save and Load

Performance Tips
1. Use batch…