transformer-lens-interpretability

# TransformerLens: Mechanistic Interpretability for Transformers

TransformerLens is the de facto standard library for mechanistic interpretability research on GPT-style language models. Created by Neel Nanda and maintained by Bryce Meyer, it provides clean interfaces to inspect and manipulate model internals via HookPoints on every activation.

GitHub: TransformerLensOrg/TransformerLens (2,900+ stars)

## When to Use TransformerLens

Use TransformerLens when you need to:

- Reverse-engineer algorithms learned during training
- Perform activation patching / causal tracing experiments
- Study attention…