
Commit 30e4cfa

Merge pull request #127 from OpenMOSS/dev
Docs update
2 parents ab8c9f0 + 6cb1392 commit 30e4cfa

File tree

12 files changed: +892 −3 lines changed

README.md

Lines changed: 1 addition & 1 deletion
@@ -92,7 +92,7 @@ Please cite this library as:
 ```
 @misc{Ge2024OpenMossSAEs,
     title = {OpenMoss Language Model Sparse Autoencoders},
-    author = {Xuyang Ge, Fukang Zhu, Junxuan Wang, Wentao Shu, Lingjie Chen, Zhengfu He},
+    author = {Xuyang Ge, Wentao Shu, Junxuan Wang, Guancheng Zhou, Jiaxing Wu, Fukang Zhu, Lingjie Chen, Zhengfu He},
     url = {https://github.com/OpenMOSS/Language-Model-SAEs},
     year = {2024}
 }

docs/assets/images/lm-saes-overview.svg

Lines changed: 4 additions & 0 deletions

docs/concepts.md

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+# Key Concepts
+
+![Overview of Language Model SAEs Pipeline](assets/images/lm-saes-overview.svg)

docs/index.md

Lines changed: 107 additions & 2 deletions
@@ -10,7 +10,112 @@ This library provides:

 - **Scalability**: Our framework is fully distributed with arbitrary combinations of data, model, and head parallelism for both training and analysis. Enjoy training SAEs with millions of features!
 - **Flexibility**: We support a wide range of SAE variants, including vanilla SAEs, Lorsa (Low-rank Sparse Attention), CLT (Cross-layer Transcoder), MoLT (Mixture of Linear Transforms), CrossCoder, and more. Each variant can be combined with different activation functions (e.g., ReLU, JumpReLU, TopK, BatchTopK) and sparsity penalties (e.g., L1, Tanh).
-- **Easy to Use**: We provide high-level `runners` APIs to quickly launch experiments with simple configurations. Check our [examples](examples) for verified hyperparameters.
+- **Easy to Use**: We provide high-level `runners` APIs to quickly launch experiments with simple configurations. Check our [examples](https://github.com/OpenMOSS/Language-Model-SAEs/tree/main/examples) for verified hyperparameters.
 - **Visualization**: We provide a unified web interface to visualize learned SAE variants and their features.

-## Getting Started
+## Quick Start

### Installation

=== "Astral uv"

    We strongly recommend using [uv](https://docs.astral.sh/uv/) for dependency management. uv is a modern drop-in replacement for Poetry or PDM, with lightning-fast dependency resolution and package installation. See their [instructions](https://docs.astral.sh/uv/getting-started/) on how to initialize a Python project with uv.

    To add our library as a project dependency, run:

    ```bash
    uv add lm-saes
    ```

    We also support [Ascend NPU](https://github.com/Ascend/pytorch) as an accelerator backend. To add our library as a project dependency with NPU dependency constraints, run:

    ```bash
    uv add lm-saes[npu]
    ```

=== "Pip"

    Of course, you can also install our library directly with [pip](https://pypi.org/project/pip/):

    ```bash
    pip install lm-saes
    ```

    Note that we use a forked version of [TransformerLens](https://github.com/TransformerLensOrg/TransformerLens), so it is best to install the package in a separate environment created by [conda](https://github.com/conda-forge/miniforge) or [virtualenv](https://virtualenv.pypa.io/en/latest/) to avoid conflicts.

    We also support [Ascend NPU](https://github.com/Ascend/pytorch) as an accelerator backend. To install our library with NPU dependency constraints, run:

    ```bash
    pip install lm-saes[npu]
    ```

### Load a trained Sparse Autoencoder from HuggingFace

WIP

### Training a Sparse Autoencoder

To train a simple Sparse Autoencoder on `blocks.5.hook_resid_post` of a Pythia-160M model with $768 \times 8$ features, you can use the following:

```python
from pathlib import Path

import torch

# The configuration classes below are assumed to be importable from the top-level
# `lm_saes` package, as in the analysis scripts shipped with the repository.
from lm_saes import (
    ActivationFactoryActivationsSource,
    ActivationFactoryConfig,
    ActivationFactoryTarget,
    InitializerConfig,
    SAEConfig,
    TrainerConfig,
    TrainSAESettings,
    WandbConfig,
    train_sae,
)

# Placeholders: point these at your own pre-generated 1D activations and experiment name.
activation_path = "~/activations/pythia-160m-1d"
exp_name = "L5R-pythia-160m"

settings = TrainSAESettings(
    sae=SAEConfig(
        hook_point_in="blocks.5.hook_resid_post",
        d_model=768,
        expansion_factor=8,
        act_fn="jumprelu",
    ),
    initializer=InitializerConfig(
        grid_search_init_norm=True,
    ),
    trainer=TrainerConfig(
        lr=5e-5,
        l1_coefficient=0.3,
        total_training_tokens=800_000_000,
        sparsity_loss_type="tanh-quad",
        jumprelu_lr_factor=0.1,
    ),
    wandb=WandbConfig(
        wandb_project="lm-saes",
        exp_name=exp_name,
    ),
    activation_factory=ActivationFactoryConfig(
        sources=[
            ActivationFactoryActivationsSource(
                path=Path(activation_path).expanduser(),
                name="pythia-160m-1d",
                device="cuda",
                dtype=torch.float32,
            )
        ],
        target=ActivationFactoryTarget.ACTIVATIONS_1D,
        hook_points=["blocks.5.hook_resid_post"],
        batch_size=4096,
        buffer_size=None,
    ),
    sae_name="L5R",
    sae_series="pythia-sae",
)
train_sae(settings)
```
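As a quick sanity check on the configuration above (plain arithmetic only, not an `lm_saes` API call), the dictionary size and the number of optimizer steps implied by these settings are:

```python
# Derived from the settings shown above; nothing here imports the library.
d_model = 768
expansion_factor = 8
n_features = d_model * expansion_factor        # 6144 dictionary features

total_training_tokens = 800_000_000
batch_size = 4096                              # token activations per optimizer step
n_steps = total_training_tokens // batch_size  # 195312 optimizer steps
print(n_features, n_steps)
```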

### Analyze a trained Sparse Autoencoder

WIP

### Convert trained Sparse Autoencoder to SAELens format

WIP

## Citation

If you find this library useful in your research, please cite:

```
@misc{Ge2024OpenMossSAEs,
    title = {OpenMoss Language Model Sparse Autoencoders},
    author = {Xuyang Ge, Wentao Shu, Junxuan Wang, Guancheng Zhou, Jiaxing Wu, Fukang Zhu, Lingjie Chen, Zhengfu He},
    url = {https://github.com/OpenMOSS/Language-Model-SAEs},
    year = {2024}
}
```
Lines changed: 88 additions & 0 deletions
@@ -0,0 +1,88 @@
# Code for Evolution of Concepts in Language Model Pre-Training

## Install the Environment

We use [uv](https://docs.astral.sh/uv/getting-started/installation/) as the dependency manager. Install `uv`, then run:

```bash
uv sync --extra default
```

to fetch all dependencies.

## Replicate the Crosscoders

To replicate our key results, you need to generate Pythia model activations, train the crosscoders, and analyze them.

### Requirements

The following instructions assume you have access to a GPU cluster with at least 16 NVIDIA A100/H100 or better GPUs, with CUDA version 12.8. With some simple modifications (e.g. changing all `"cuda"` to `"npu"` and installing the environment with `uv sync --extra npu`), this code can also run on an NPU cluster with at least 32 Ascend 910B or better NPUs. The cluster should also have a large amount of disk space (>200 TB) to store all model activations.

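A hypothetical helper, not part of the released scripts, sketching one way to pick the accelerator backend at runtime instead of hard-coding `"cuda"`; it assumes the Ascend `torch_npu` adapter is installed on NPU machines:

```python
import torch


def pick_device() -> str:
    """Return "cuda", "npu", or "cpu" depending on what is available on this machine."""
    if torch.cuda.is_available():
        return "cuda"
    try:
        # The Ascend adapter patches torch with an `npu` backend on import.
        import torch_npu  # noqa: F401

        if torch.npu.is_available():
            return "npu"
    except ImportError:
        pass
    return "cpu"
```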
Our scripts also require that you have a [subset](https://huggingface.co/datasets/Hzfinfdu/SlimPajama-3B) of the SlimPajama dataset, saved with `dataset.save_to_disk()` at `~/data/SlimPajama-3B`, and all Pythia model checkpoints at `~/models/pythia-{size}-all/step{step}`, where `size` can be `160m` or `6.9b`. You can change the paths in the scripts to your own paths.

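A minimal sketch of one way to put these files in place, assuming the Hugging Face `datasets` and `huggingface_hub` packages are installed; the revision names follow the public Pythia repositories, and whether the scripts expect a single split or the full dataset dictionary is not specified here:

```python
import os

from datasets import load_dataset
from huggingface_hub import snapshot_download

# Save the SlimPajama subset where the scripts expect it.
dataset = load_dataset("Hzfinfdu/SlimPajama-3B")
dataset.save_to_disk(os.path.expanduser("~/data/SlimPajama-3B"))

# Fetch one Pythia checkpoint into the expected directory layout; repeat for every step you analyze.
snapshot_download(
    repo_id="EleutherAI/pythia-160m",
    revision="step143000",
    local_dir=os.path.expanduser("~/models/pythia-160m-all/step143000"),
)
```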
### Generate Activations

Two types of model activations are required for training and analyzing crosscoders:

1. **1D Activations:** Activations where the context dimension is folded into the batch dimension and the samples are re-shuffled. Typically of shape `(batch, d_model)`. Used for crosscoder training.
2. **2D Activations:** Activations where the context dimension is preserved. Typically of shape `(batch, n_context, d_model)`. Used for crosscoder analysis.

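A minimal PyTorch sketch of how the two formats relate (illustrative only; the generation scripts below handle shuffling and storage for you):

```python
import torch

batch, n_context, d_model = 4, 128, 768
acts_2d = torch.randn(batch, n_context, d_model)     # 2D: context dimension preserved

acts_1d = acts_2d.reshape(-1, d_model)               # fold context into the batch dimension
acts_1d = acts_1d[torch.randperm(acts_1d.shape[0])]  # re-shuffle token activations for training
print(acts_2d.shape, acts_1d.shape)                  # (4, 128, 768) -> (512, 768)
```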
To generate 1D activations of Pythia-160M, run:

```bash
uv run torchrun --nproc-per-node=8 generate-pythia-activations-1d.py --size 160m --layer 6
```

This will take up ~40 TB of disk space.

To generate 2D activations of Pythia-160M, run:

```bash
uv run torchrun --nproc-per-node=8 generate-pythia-activations-2d.py --size 160m --layer 6
```

To generate 1D activations of Pythia-6.9B, run:

```bash
uv run torchrun --nproc-per-node=8 generate-pythia-activations-1d.py --size 6.9b --layer 16
```

This will take up ~170 TB of disk space.

To generate 2D activations of Pythia-6.9B, run:

```bash
uv run torchrun --nproc-per-node=8 generate-pythia-activations-2d.py --size 6.9b --layer 16
```

### Training Crosscoders

To train crosscoders on Pythia-160M, run:

```bash
uv run torchrun --nproc-per-node=8 train-pythia-crosscoders.py --init_encoder_factor 1 --lr 5e-5 --l1_coefficient 0.3 --jumprelu_lr_factor 0.1 --layer 6 --expansion_factor 32 --batch_size 2048
```

To train crosscoders on Pythia-6.9B, run:

```bash
uv run torchrun --nproc-per-node=8 --nnodes=2 train-pythia-crosscoders.py --init_encoder_factor 1 --lr 1e-5 --l1_coefficient 0.3 --jumprelu_lr_factor 0.3 --layer 16 --expansion_factor 8 --batch_size 2048 --size 6.9b # Requires 2 nodes
```

You can modify `expansion_factor` to get crosscoders with different dictionary sizes, and modify `l1_coefficient` to adjust the trade-off between sparsity and reconstruction fidelity.

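For reference (plain arithmetic, not part of the scripts), and assuming the dictionary size is `d_model * expansion_factor` as in the SAE example in the docs, the expansion factors above imply:

```python
# Dictionary size implied by --expansion_factor for the two configurations above.
d_model = {"160m": 768, "6.9b": 4096}
for size, expansion_factor in [("160m", 32), ("6.9b", 8)]:
    print(size, d_model[size] * expansion_factor, "features")  # 160m: 24576, 6.9b: 32768
```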
### Analyze Crosscoders

To analyze trained crosscoders, you should first have a MongoDB instance running at `localhost:27017` (a quick connectivity check is sketched at the end of this section), then run

```bash
uv run analyze-pythia-crosscoder.py --name <crosscoder-name> --batch-size 16
```

where `<crosscoder-name>` is the name of your trained crosscoder. Results are saved to MongoDB. Afterwards, you can use our visualization tool to view the features:

```bash
cd ../../ui
bun install
bun run dev
```
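If the analysis job cannot reach the database, a minimal connectivity check for the MongoDB instance mentioned above (assuming `pymongo` is installed) is:

```python
from pymongo import MongoClient

# Ping the MongoDB instance the analysis results are written to.
client = MongoClient("localhost", 27017, serverSelectionTimeoutMS=2000)
client.admin.command("ping")
print("MongoDB is reachable at localhost:27017")
```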
Lines changed: 141 additions & 0 deletions
@@ -0,0 +1,141 @@
import argparse
import os
import re
from pathlib import Path

import torch
from more_itertools import batched

from lm_saes import (
    ActivationFactoryActivationsSource,
    ActivationFactoryConfig,
    ActivationFactoryTarget,
    AnalyzeCrossCoderSettings,
    CrossCoderConfig,
    FeatureAnalyzerConfig,
    MongoDBConfig,
    analyze_crosscoder,
)

def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--size", type=str, default="160m")
    parser.add_argument("--name", type=str, default="L6R-lr5e-05-l1c0.5-32heads-8x-jlr0.1")
    parser.add_argument("--batch-size", type=int, default=16)
    parser.add_argument("--analysis-name", type=str, default="default")
    return parser.parse_args()

# Hidden sizes and layer counts of the Pythia model suite, keyed by model size.
d_model_map = {
    "70m": 512,
    "160m": 768,
    "410m": 1024,
    "1b": 2048,
    "1.4b": 2048,
    "2.8b": 2560,
    "6.9b": 4096,
    "12b": 5120,
}

n_layers_map = {
    "70m": 6,
    "160m": 12,
    "410m": 24,
    "1b": 16,
    "1.4b": 24,
    "2.8b": 32,
    "6.9b": 32,
    "12b": 36,
}

# Pythia pre-training checkpoint steps to analyze (one crosscoder head per checkpoint);
# the list is split evenly across ranks.
steps = [
    0,
    2,
    4,
    8,
    16,
    32,
    64,
    128,
    256,
    512,
    1000,
    2000,
    3000,
    4000,
    5000,
    6000,
    7000,
    8000,
    9000,
    10000,
    14000,
    20000,
    27000,
    34000,
    47000,
    60000,
    74000,
    87000,
    100000,
    114000,
    127000,
    143000,
]

if __name__ == "__main__":
    args = parse_args()

    # WORLD_SIZE is set by torchrun; read it as a string first so a missing value
    # raises a clear error instead of a TypeError from int(None).
    world_size_env = os.environ.get("WORLD_SIZE")
    if world_size_env is None:
        raise ValueError("WORLD_SIZE is not set")
    world_size = int(world_size_env)
    assert len(steps) % world_size == 0, f"Head count {len(steps)} is not divisible by world size {world_size}"

    head_per_device = len(steps) // world_size
    # The layer index is encoded in the crosscoder name, e.g. "L6R-..." -> layer 6.
    match = re.search(r"L(\d+)R", args.name)
    if match is None:
        raise ValueError(f"Cannot parse layer from crosscoder name {args.name!r}")
    layer = int(match.group(1))
    print(f"Analyzing {args.name} at layer {layer}")
    settings = AnalyzeCrossCoderSettings(
        sae=CrossCoderConfig.from_pretrained(
            os.path.expanduser(f"~/results/{args.name}"),
            device="cuda",
            dtype=torch.float16,
        ),
        analyzer=FeatureAnalyzerConfig(
            total_analyzing_tokens=100_000_000,
            subsamples={
                "top_activations": {"proportion": 1.0, "n_samples": 20},
                "non_activating": {"proportion": 0.3, "n_samples": 20, "max_length": 50},
            },
            ignore_token_ids=[0],
        ),
        sae_name=args.name,
        sae_series="pythia-crosscoder",
        # One activation factory per chunk of checkpoint steps (len(steps) // world_size steps each).
        activation_factories=[
            ActivationFactoryConfig(
                sources=[
                    ActivationFactoryActivationsSource(
                        path={
                            f"step{step}": Path(
                                os.path.expanduser(
                                    f"~/activations/SlimPajama-3B-activations-pythia-{args.size}-2d-all-fp16/step{step}/blocks.{layer}.hook_resid_post"
                                )
                            )
                            for step in per_device_steps
                        },
                        sample_weights=1.0,
                        name="SlimPajama-3B",
                        device="cuda",
                        dtype=torch.float16,
                    )
                ],
                target=ActivationFactoryTarget.ACTIVATIONS_2D,
                hook_points=[f"step{step}" for step in per_device_steps],
                batch_size=args.batch_size,
            )
            for per_device_steps in batched(steps, head_per_device)
        ],
        mongo=MongoDBConfig(),
        feature_analysis_name=args.analysis_name,
        device_type="cuda",
    )
    analyze_crosscoder(settings)
