
Commit 8d8aa0b

Method comparison: LoRA that targets MLP modules (#2845)
The "LoRA Without Regret" blog post (https://thinkingmachines.ai/blog/lora/) mentions that targeting the MLP part of the transformer is more effective than targeting the attention modules. This experiment tests this by targeting: ["gate_proj", "up_proj", "down_proj"] instead of the default layers (["q_proj", "v_proj"]). I chose a rank to match the parameter count we would get when targeting the attention modules with rank 32, which is rank 10. Testing on my machine, there is indeed a nice improvement in the test score: | metric | target attention | target MLP | |----------------------|------------------|------------| | test accuracy | 48.2% | 51.3% | | # trainable params | 9175040 | 9461760 | | peak memory reserved | 20.74 GB | 23.02 GB | There is, however, also a marked increase in memory usage, despite matching parameter count. Since the operations are different, this may not be a surprise, but let's wait for the final verdict once this experiment runs on our AWS instance. Note: I also tested higher and lower ranks when targeting the MLP. The effect on memory usage was negligible, but it did improve the score: | metric | rank 8 | rank 10 | rank 12 | rank 32 | |--------------------|---------|---------|----------|----------| | test accuracy | 50.3% | 51.3% | 52.2% | 54.8% | | # trainable params | 7569408 | 9461760 | 11354112 | 30277632 | In the end, I chose only to add the rank 10 experiment to match the number of trainable parameters.
1 parent 182f4c9 commit 8d8aa0b

File tree

1 file changed (+30 −0 lines)
  • method_comparison/MetaMathQA/experiments/lora/llama-3.2-3B-rank10-target-mlp

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": null,
+  "bias": "none",
+  "corda_config": null,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": false,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 20,
+  "lora_bias": false,
+  "lora_dropout": 0.0,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 10,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": ["gate_proj", "up_proj", "down_proj"],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}
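For reference, a minimal sketch (not part of the commit) of how the adapter configuration above could be built directly with PEFT's `LoraConfig`; the base model id and the final parameter-count check are illustrative assumptions.

```python
# Sketch: the adapter_config.json above expressed as a peft LoraConfig.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

config = LoraConfig(
    r=10,                # rank chosen to match the rank-32 attention param count
    lora_alpha=20,
    lora_dropout=0.0,
    bias="none",
    target_modules=["gate_proj", "up_proj", "down_proj"],  # MLP instead of q_proj/v_proj
    task_type="CAUSAL_LM",
)

# Base model id assumed for illustration.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-3B")
model = get_peft_model(model, config)
model.print_trainable_parameters()  # expected roughly 9,461,760 trainable params
```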
