
Commit 8d8aa0b

Method comparison: LoRA that targets MLP modules (#2845)
The "LoRA Without Regret" blog post (https://thinkingmachines.ai/blog/lora/) mentions that targeting the MLP part of the transformer is more effective than targeting the attention modules. This experiment tests this by targeting: ["gate_proj", "up_proj", "down_proj"] instead of the default layers (["q_proj", "v_proj"]). I chose a rank to match the parameter count we would get when targeting the attention modules with rank 32, which is rank 10. Testing on my machine, there is indeed a nice improvement in the test score: | metric | target attention | target MLP | |----------------------|------------------|------------| | test accuracy | 48.2% | 51.3% | | # trainable params | 9175040 | 9461760 | | peak memory reserved | 20.74 GB | 23.02 GB | There is, however, also a marked increase in memory usage, despite matching parameter count. Since the operations are different, this may not be a surprise, but let's wait for the final verdict once this experiment runs on our AWS instance. Note: I also tested higher and lower ranks when targeting the MLP. The effect on memory usage was negligible, but it did improve the score: | metric | rank 8 | rank 10 | rank 12 | rank 32 | |--------------------|---------|---------|----------|----------| | test accuracy | 50.3% | 51.3% | 52.2% | 54.8% | | # trainable params | 7569408 | 9461760 | 11354112 | 30277632 | In the end, I chose only to add the rank 10 experiment to match the number of trainable parameters.
1 parent 182f4c9 commit 8d8aa0b

File tree

1 file changed (+30 −0 lines)
  • method_comparison/MetaMathQA/experiments/lora/llama-3.2-3B-rank10-target-mlp

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": null,
+  "bias": "none",
+  "corda_config": null,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": false,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 20,
+  "lora_bias": false,
+  "lora_dropout": 0.0,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 10,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": ["gate_proj", "up_proj", "down_proj"],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}
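For reference, a minimal sketch (not part of the commit) of how the adapter configuration above could be built directly with PEFT's `LoraConfig`; the base model id and the final parameter-count check are illustrative assumptions.

```python
# Sketch: the adapter_config.json above expressed as a peft LoraConfig.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

config = LoraConfig(
    r=10,                # rank chosen to match the rank-32 attention param count
    lora_alpha=20,
    lora_dropout=0.0,
    bias="none",
    target_modules=["gate_proj", "up_proj", "down_proj"],  # MLP instead of q_proj/v_proj
    task_type="CAUSAL_LM",
)

# Base model id assumed for illustration.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-3B")
model = get_peft_model(model, config)
model.print_trainable_parameters()  # expected roughly 9,461,760 trainable params
```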
