Sparse autoencoders can interpret randomly initialized transformers
Published in arXiv, 2024
Recommended citation: Heap T, Lawson T, Farnik L, and Aitchison L. (2024). "Sparse autoencoders can interpret randomly initialized transformers." arXiv:2501.17727.