Sparse autoencoders can interpret randomly initialized transformers

Published in arXiv, 2025

Recommended citation: Heap T, Lawson T, Farnik L, and Aitchison L. (2025). "Sparse autoencoders can interpret randomly initialized transformers." arXiv:2501.17727.