
Commit e19b94f

Author: zhengfuhe
Message: merge with main
2 parents: ab84485 + 9cb4984

File tree

234 files changed: +2344 additions, −236845 deletions


.github/workflows/checks.yml

Lines changed: 10 additions & 1 deletion

```diff
@@ -42,7 +42,10 @@ jobs:
     name: Code Checks
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@v4
+      - name: Checkout repository
+        uses: actions/checkout@v4
+        with:
+          submodules: "true"
       - name: Setup PDM
         uses: pdm-project/setup-pdm@v4
         # You are now able to use PDM in your workflow
@@ -52,3 +55,9 @@ jobs:
         run: pdm run mypy .
       - name: Unit tests
         run: pdm run pytest ./tests
+
+  ruff:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: astral-sh/ruff-action@v1
```
File renamed without changes.
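The checks this workflow runs in CI can also be reproduced locally before pushing. A minimal sketch, assuming `pdm` (with the project's dev dependencies synced) and `ruff` are installed; the exact ruff invocation is an assumption, since `astral-sh/ruff-action` runs `ruff check` by default:

```shell
# Mirror the "Code Checks" job locally (assumes pdm + dev dependencies installed)
pdm run mypy .            # type checking, as in the mypy step
pdm run pytest ./tests    # unit tests, as in the "Unit tests" step

# Mirror the new "ruff" job (ruff-action defaults to `ruff check`)
ruff check .
```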

.pre-commit-config.yaml

Lines changed: 10 additions & 0 deletions

```diff
@@ -0,0 +1,10 @@
+repos:
+  - repo: https://github.com/astral-sh/ruff-pre-commit
+    # Ruff version.
+    rev: v0.7.1
+    hooks:
+      # Run the linter.
+      - id: ruff
+        args: [--fix]
+      # Run the formatter.
+      - id: ruff-format
```
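To activate these hooks locally, the standard pre-commit workflow applies. A minimal sketch, assuming the `pre-commit` tool is installed in the environment:

```shell
# Register the hooks from .pre-commit-config.yaml into .git/hooks
pre-commit install

# Optionally run the ruff linter and formatter over the whole repo once
pre-commit run --all-files
```

After `pre-commit install`, the ruff hooks run automatically on each `git commit`, touching only the staged files.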

README.md

Lines changed: 9 additions & 12 deletions

````diff
@@ -1,22 +1,19 @@
 # Language-Model-SAEs
 
-This repo aims to provide a general codebase for conducting dictionary-learning-based mechanistic interpretability research on Language Models (LMs). It powers a configurable pipeline for training and evaluating GPT-2 dictionaries, and provides a set of tools (mainly a React-based webpage) for analyzing and visualizing the learned dictionaries.
+This repo aims to provide a general codebase for conducting dictionary-learning-based mechanistic interpretability research on Language Models (LMs). It powers a configurable pipeline for training and evaluating Sparse Autoencoders and their variants, and provides a set of tools (mainly a React-based webpage) for analyzing and visualizing the learned dictionaries.
 
 The design of the pipeline (including the configuration and some training detail) is highly inspired by the [mats_sae_training
-](https://github.com/jbloomAus/mats_sae_training) project and heavily relies on the [TransformerLens](https://github.com/TransformerLensOrg/TransformerLens) library. We thank the authors for their great work.
+](https://github.com/jbloomAus/mats_sae_training) project (now known as [SAELens](https://github.com/jbloomAus/SAELens)) and heavily relies on the [TransformerLens](https://github.com/TransformerLensOrg/TransformerLens) library. We thank the authors for their great work.
 
-## Getting Started with Mechanistic Interpretability and Dictionary Learning
+## News
 
-If you are new to the concept of mechanistic interpretability and dictionary learning, we recommend you to start from the following papers:
+- 2024.10.29 We introduce Llama Scope, our first contribution to the open-source Sparse Autoencoder ecosystem. Stay tuned! Link: [Llama Scope: Extracting Millions of Features from Llama-3.1-8B with Sparse Autoencoders](http://arxiv.org/abs/2410.20526)
 
-- [A Mathematical Framework for Transformer Circuits](https://transformer-circuits.pub/2021/framework/index.html)
-- [Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small](https://arxiv.org/abs/2211.00593)
-- [Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task](https://arxiv.org/abs/2210.13382)
-- [Toy Models of Superposition](https://transformer-circuits.pub/2022/toy_model/index.html)
-- [Towards Monosemanticity: Decomposing Language Models With Dictionary Learning](https://transformer-circuits.pub/2023/monosemantic-features/index.html)
-- [Sparse Autoencoders Find Highly Interpretable Features in Language Models](https://arxiv.org/abs/2309.08600)
+- 2024.10.9 Transformers and Mambas are mechanistically similar at both the feature and circuit level. Can we follow this line and find universal motifs and fundamental differences between language model architectures? Link: [Towards Universality: Studying Mechanistic Similarity Across Language Model Architectures](https://arxiv.org/pdf/2410.06672)
 
-Furthermore, to dive deeper into the inner activations of LMs, it's recommended to get familiar with the [TransformerLens](https://github.com/TransformerLensOrg/TransformerLens) library.
+- 2024.5.22 We propose hierarchical tracing, a promising method to scale up sparse feature circuit analysis to industrial-size language models! Link: [Automatically Identifying Local and Global Circuits with Linear Computation Graphs](https://arxiv.org/pdf/2405.13868)
+
+- 2024.2.19 Our first attempt at SAE-based circuit analysis for Othello-GPT found an example of Attention Superposition in the wild! Link: [Dictionary learning improves patch-free circuit discovery in mechanistic interpretability: A case study on othello-gpt](https://arxiv.org/pdf/2402.12201).
 
 ## Installation
 
@@ -35,7 +32,7 @@ cd ui
 bun install
 ```
 
-It's worth noting that `bun` is not well-supported on Windows, so you may need to use WSL or other Linux-based solutions to run the frontend, or consider using a different package manager, such as `pnpm` or `yarn`.
+`bun` is not well-supported on Windows, so you may need to use WSL or other Linux-based solutions to run the frontend, or consider using a different package manager, such as `pnpm` or `yarn`.
 
 ## Launch an Experiment
````
TransformerLens/.devcontainer/Dockerfile

Lines changed: 0 additions & 34 deletions
This file was deleted.

TransformerLens/.devcontainer/devcontainer.json

Lines changed: 0 additions & 24 deletions
This file was deleted.

TransformerLens/.gitattributes

Lines changed: 0 additions & 1 deletion
This file was deleted.

TransformerLens/.gitconfig

Lines changed: 0 additions & 11 deletions
This file was deleted.

TransformerLens/.github/ISSUE_TEMPLATE/bug.md

Lines changed: 0 additions & 27 deletions
This file was deleted.

TransformerLens/.github/ISSUE_TEMPLATE/proposal.md

Lines changed: 0 additions & 31 deletions
This file was deleted.

0 commit comments
