
Commit 2813b9c

FEAT Add DeLoRA (#2780)
Implements DeLoRA: "Decoupling Angles and Strength in Low-rank Adaptation" (https://huggingface.co/papers/2503.18225). Similar to DoRA, DeLoRA decouples angular learning from adaptation strength, but it additionally makes it possible to bound the norm of the weight change. In this way, DeLoRA promises to reduce the risk of catastrophic forgetting and to be more robust to hyperparameter settings such as the learning rate.
1 parent 8d8aa0b commit 2813b9c

File tree

23 files changed: +1059 −2 lines changed


docs/source/_toctree.yml

Lines changed: 2 additions & 0 deletions
@@ -136,6 +136,8 @@
       title: RoAd
     - local: package_reference/waveft
       title: WaveFT
+    - local: package_reference/delora
+      title: DeLoRA

       title: Adapters
   - sections:
docs/source/package_reference/delora.md

Lines changed: 35 additions & 0 deletions

<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->

# DeLoRA: Decoupled Low-rank Adaptation

[DeLoRA](https://huggingface.co/papers/2503.18225) is a parameter-efficient fine-tuning technique that implicitly maintains a Frobenius-norm boundary with respect to the pretrained weights by normalizing and scaling learnable low-rank matrices. This effectively decouples the learning of the directions (the BA term) and the magnitude (the boundary term) of the weight updates, avoiding catastrophic shifts in the adapted weights and enhancing robustness to hyperparameter choices.
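
Schematically (a sketch based on the description above; the exact layer-wise scaling factor is defined in the paper), the DeLoRA update takes the form

$$ W' = W + \frac{\lambda}{r}\,\xi \sum_{i=1}^{r} \frac{b_i a_i^{\top}}{\lVert b_i \rVert\,\lVert a_i \rVert}, $$

where $b_i$ and $a_i$ are the columns of $B$ and rows of $A$, and $\xi$ is a layer-wise factor tied to the pretrained weight norm $\lVert W \rVert$. Each normalized rank-1 term has unit Frobenius norm, so the size of the update is bounded in terms of the learnable boundary $\lambda$.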

Notes (a minimal configuration sketch follows this list):
- Use a 10-100x larger learning rate than for standard LoRA variants (typical values are around 1e-3 to 1e-2).
- Do not set the initial boundary parameter lambda too small (typical values are around 10-15).
- Setting different lambdas for different layers is possible.
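
A minimal configuration sketch following these recommendations (the optimizer setup is illustrative and not part of the DeLoRA API):

```python
import torch
from peft import DeloraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")
config = DeloraConfig(r=32, delora_lambda=15)  # boundary lambda in the recommended 10-15 range
model = get_peft_model(base_model, config)

# DeLoRA works well with a much larger learning rate than standard LoRA variants
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
```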

The abstract from the paper is:

> Parameter-Efficient FineTuning (PEFT) methods have recently gained significant popularity thanks to the widespread availability of large-scale pretrained models. These methods allow for quick adaptation to downstream tasks with minimal computational cost. However, popular finetuning methods such as LoRA exhibit limited robustness when it comes to hyperparameter choices or extended training regimes, preventing optimal out-of-the-box performance. In contrast, bounded approaches, such as ETHER, provide greater robustness but are limited to extremely low-rank adaptations and fixed-strength transformations, reducing their adaptation expressive power. In this work, we propose Decoupled Low-rank Adaptation (DeLoRA), a novel finetuning method that normalizes and scales learnable low-rank matrices. By bounding the distance of the transformation, DeLoRA effectively decouples the angular learning from the adaptation strength, enhancing robustness without compromising performance. Through evaluations on subject-driven image generation, natural language understanding, and instruction tuning, we show that DeLoRA matches or surpasses performance of competing PEFT methods, while exhibiting stronger robustness.

## DeloraConfig

[[autodoc]] tuners.delora.config.DeloraConfig

## DeloraModel

[[autodoc]] tuners.delora.model.DeloraModel
examples/delora_finetuning/README.md

Lines changed: 102 additions & 0 deletions

# DeLoRA: Decoupled Low-Rank Adaptation

## Introduction
[DeLoRA](https://huggingface.co/papers/2503.18225) tackles finetuning in a Frobenius-norm-bounded setup: this prevents divergence from the pretrained model and effectively decouples the learning of angles and magnitudes.

This is achieved by (i) normalizing the BA low-rank matrices, which bounds the updates' Frobenius norm, (ii) a learnable scaling lambda, which controls the update's boundary/magnitude, and (iii) a layer-wise scaling by ||W||, which adapts each update's norm to the norm of the original weights.
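
The following snippet illustrates points (i)-(iii) numerically. It is a simplified sketch of the idea, not the actual PEFT implementation, and the exact layer-wise scaling used in the paper may differ:

```python
import torch

d_out, d_in, r = 64, 64, 8
W = torch.randn(d_out, d_in)      # pretrained weight (frozen)
B = torch.randn(d_out, r)         # learnable low-rank factor
A = torch.randn(r, d_in)          # learnable low-rank factor
delora_lambda = 15.0

# (i) normalize each rank-1 component b_i a_i^T to unit Frobenius norm
B_hat = B / B.norm(dim=0, keepdim=True)   # column-normalize B
A_hat = A / A.norm(dim=1, keepdim=True)   # row-normalize A

# (ii) scale by the learnable boundary lambda / r and (iii) by the pretrained weight norm
delta_w = (delora_lambda / r) * W.norm() * (B_hat @ A_hat)

# the update's Frobenius norm is bounded: ||delta_w||_F <= lambda * ||W||_F
assert delta_w.norm() <= delora_lambda * W.norm() + 1e-4
```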

## Quick start
Compared to a standard PEFT training run with LoRA, simply swap your `LoraConfig` for a `DeloraConfig`. Note, however, that the `lora_alpha` parameter is replaced by the `delora_lambda` parameter, which sets an upper bound on the Frobenius norm of the weight change.

```python
import torch
from peft import DeloraConfig, get_peft_model
from transformers import AutoTokenizer, AutoModelForCausalLM
from trl import SFTConfig, SFTTrainer
from datasets import load_dataset

model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B", dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")
tokenizer.pad_token_id = tokenizer.eos_token_id
delora_config = DeloraConfig(r=32, delora_lambda=15)

peft_model = get_peft_model(model, delora_config)
peft_model.print_trainable_parameters()

dataset = load_dataset("imdb", split="train[:1%]")

training_args = SFTConfig(dataset_text_field="text", max_seq_length=128)
trainer = SFTTrainer(
    model=peft_model,
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
peft_model.save_pretrained("delora-llama-3-8b")
```

To use the fine-tuned DeLoRA modules, simply load them back onto the base model:
```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B", dtype=torch.bfloat16, device_map="auto"
)
peft_model = PeftModel.from_pretrained(model, "delora-llama-3-8b")
```

## Advanced Usage
In this script, the DeLoRA adapters are applied to the query and value projection layers of the Llama model by default. Applying adapters to more layers increases memory usage. If you wish to apply DeLoRA to a different set of layers, you can specify them like this:
```bash
python examples/delora_finetuning/delora_finetuning.py --base_model meta-llama/Meta-Llama-3-8B --target_modules "q_proj,k_proj,v_proj,o_proj"
```

Using different lambdas for different layers is also possible by setting `lambda_pattern`.
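
For example, assuming `lambda_pattern` maps layer-name patterns to per-layer lambda values, analogous to LoRA's `rank_pattern` and `alpha_pattern` (this keying is an assumption; check the `DeloraConfig` reference for the exact semantics), a sketch could look like this:

```python
from peft import DeloraConfig

config = DeloraConfig(
    r=32,
    delora_lambda=15,                      # default boundary for all targeted layers
    target_modules=["q_proj", "v_proj"],
    # assumed: keys are layer-name patterns, values override `delora_lambda`
    lambda_pattern={"q_proj": 10},
)
```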

### Fine-tune
```bash
python delora_finetuning.py \
    --base_model "PATH_TO_MODEL" \
    --data_path "PATH_TO_DATASET" \
    --output_dir "PATH_TO_OUTPUT_DIR" \
    --batch_size 1 \
    --num_epochs 3 \
    --learning_rate 3e-3 \
    --cutoff_len 512 \
    --val_set_size 500 \
    --eval_step 10 \
    --save_step 100 \
    --device "auto" \
    --rank 32 \
    --delora_lambda 15 \
    --module_dropout 0.1 \
    --target_modules "q_proj,v_proj" \
    --hub_model_id "YOUR_HF_REPO" \
    --push_to_hub
```

## Additional Notes
### Best practices
- Use a 10-100x larger learning rate than for standard LoRA variants (typical values are around 1e-3 to 1e-2).
- Do not set the initial boundary parameter lambda too small (typical values are around 10-15).

### DeLoRA vs DoRA
DeLoRA may look quite similar to DoRA, given the shared goal of decoupling angular from magnitude learning, but there are key differences: (i) DoRA applies its normalization and scaling operations to the fully finetuned weights ($W + \Delta W$), and (ii) DoRA's normalization is performed over the column space of the weight matrices.

Conversely, DeLoRA (i) applies the normalization and scaling operations directly to the weight updates $\Delta W$, better preventing divergence from the pretrained model, and (ii) normalizes the inner low-dimensional space, which enforces a Frobenius-norm bound on the weight updates.
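
Schematically (a hedged sketch rather than the exact formulations; $m$ is DoRA's learnable magnitude vector, $\lVert\cdot\rVert_c$ a column-wise norm, and $\xi$ a layer-wise factor tied to $\lVert W \rVert$):

$$ \text{DoRA:}\quad W' = m \odot \frac{W + BA}{\lVert W + BA \rVert_c}, \qquad \text{DeLoRA:}\quad W' = W + \frac{\lambda}{r}\,\xi \sum_{i=1}^{r} \frac{b_i a_i^{\top}}{\lVert b_i \rVert\,\lVert a_i \rVert}. $$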

## Citation
```
@inproceedings{bini2025decouplinganglesstrengthlowrank,
    title={Decoupling Angles and Strength in Low-rank Adaptation},
    author={Massimo Bini and Leander Girrbach and Zeynep Akata},
    year={2025},
    booktitle={International Conference on Learning Representations (ICLR)},
}
```
examples/delora_finetuning/delora_finetuning.py

Lines changed: 189 additions & 0 deletions
# This script is based on examples/randlora_finetuning/randlora_finetuning.py
import os

import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

from peft import DeloraConfig, get_peft_model


def train_model(
    base_model: str,
    data_path: str,
    output_dir: str,
    batch_size: int,
    num_epochs: int,
    learning_rate: float,
    cutoff_len: int,
    val_set_size: int,
    eval_step: int,
    save_step: int,
    device: str,
    rank: int,
    delora_lambda: int,
    module_dropout: float,
    target_modules: str,
    hub_model_id: str,
    push_to_hub: bool,
):
    os.environ["TOKENIZERS_PARALLELISM"] = "false"
    hf_token = os.getenv("HF_TOKEN")

    # Setup device
    device = torch.device(device)
    print(f"Using device: {device}")

    # Load tokenizer
    tokenizer = AutoTokenizer.from_pretrained(base_model, token=hf_token)

    # Compute dtype
    device_type = device.type
    device_module = getattr(torch, device_type, torch.cuda)
    bf16_supported = device_module.is_available() and device_module.is_bf16_supported()
    dtype = torch.bfloat16 if bf16_supported else torch.float32

    # Load the base model
    model = AutoModelForCausalLM.from_pretrained(
        base_model,
        dtype=dtype,
    )

    # DeLoRA config for the PEFT model
    peft_config = DeloraConfig(
        r=rank,
        delora_lambda=delora_lambda,
        target_modules=(target_modules.split(",") if target_modules else None),
        module_dropout=module_dropout,
        bias="none",
    )

    # Get the PEFT model with the DeLoRA config
    model = get_peft_model(model, peft_config)

    model.to(device)  # move the model to the accelerator
    tokenizer.pad_token = tokenizer.eos_token

    # Load the dataset
    dataset = load_dataset(data_path)

    def tokenize_function(examples):
        inputs = tokenizer(examples["text"], padding="max_length", truncation=True, max_length=cutoff_len)
        inputs["labels"] = inputs["input_ids"].copy()  # setting labels for a language modeling task
        return inputs

    # Tokenize the dataset and prepare for training
    tokenized_datasets = dataset.map(tokenize_function, batched=True, remove_columns=dataset["train"].column_names)

    # Data collator to dynamically pad the batched examples
    data_collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

    # Compute the total number of training steps for warmup
    max_steps = int((len(tokenized_datasets["train"]) // batch_size) * num_epochs)

    # Define training arguments
    training_args = TrainingArguments(
        output_dir=output_dir,
        num_train_epochs=num_epochs,
        per_device_train_batch_size=batch_size,
        per_device_eval_batch_size=batch_size,
        warmup_steps=int(max_steps * 0.1),  # 10% of total training steps
        weight_decay=0.0,
        logging_steps=eval_step,
        save_steps=save_step,
        save_total_limit=2,
        push_to_hub=push_to_hub,
        hub_model_id=hub_model_id,
        gradient_accumulation_steps=16,
        learning_rate=learning_rate,
        hub_token=hf_token,
        label_names=["labels"],
    )

    # Clear accelerator cache to free memory
    device_module.empty_cache()

    # Initialize the Trainer
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=tokenized_datasets["train"],
        eval_dataset=tokenized_datasets["test"],
        data_collator=data_collator,
    )

    # Start model training
    trainer.train()

    # Save and push the trained model and tokenizer
    if push_to_hub:
        # Push the main model to the hub
        trainer.push_to_hub(commit_message="Fine-tuned model")

    # Save the model and tokenizer locally
    model.save_pretrained(output_dir)
    tokenizer.save_pretrained(output_dir)


if __name__ == "__main__":
    import argparse

    parser = argparse.ArgumentParser(description="Fine-tune LLaMA with DeLoRA")
    parser.add_argument("--base_model", type=str, default="huggyllama/llama-7b", help="Base model path or name")
    parser.add_argument(
        "--data_path", type=str, default="timdettmers/openassistant-guanaco", help="Dataset path or name"
    )
    parser.add_argument(
        "--output_dir", type=str, default="path/to/output", help="Output directory for the fine-tuned model"
    )
    parser.add_argument("--batch_size", type=int, default=1, help="Batch size")
    parser.add_argument("--num_epochs", type=int, default=1, help="Number of training epochs")
    parser.add_argument("--learning_rate", type=float, default=3e-3, help="Learning rate")
    parser.add_argument("--cutoff_len", type=int, default=512, help="Cutoff length for tokenization")
    parser.add_argument("--val_set_size", type=int, default=500, help="Validation set size")
    parser.add_argument("--eval_step", type=int, default=10, help="Evaluation step interval")
    parser.add_argument("--save_step", type=int, default=100, help="Save step interval")
    parser.add_argument("--device", type=str, default="auto", help="Device to use for training")
    parser.add_argument("--rank", type=int, default=32, help="DeLoRA basis rank")
    parser.add_argument("--delora_lambda", type=int, default=640, help="DeLoRA lambda (bound on the update norm)")
    parser.add_argument("--module_dropout", type=float, default=0.05, help="DeLoRA dropout rate")
    parser.add_argument(
        "--target_modules", type=str, default=None, help="Comma-separated list of target modules for DeLoRA"
    )
    parser.add_argument(
        "--hub_model_id",
        type=str,
        default="path/to/repo",
        help="Repository name to push the model on the Hugging Face Hub",
    )
    parser.add_argument("--push_to_hub", action="store_true", help="Whether to push the model to Hugging Face Hub")
    args = parser.parse_args()

    if args.device == "auto":
        args.device = torch.accelerator.current_accelerator().type if hasattr(torch, "accelerator") else "cuda"

    train_model(
        base_model=args.base_model,
        data_path=args.data_path,
        output_dir=args.output_dir,
        batch_size=args.batch_size,
        num_epochs=args.num_epochs,
        learning_rate=args.learning_rate,
        cutoff_len=args.cutoff_len,
        val_set_size=args.val_set_size,
        eval_step=args.eval_step,
        save_step=args.save_step,
        device=args.device,
        rank=args.rank,
        delora_lambda=args.delora_lambda,
        module_dropout=args.module_dropout,
        target_modules=args.target_modules,
        hub_model_id=args.hub_model_id,
        push_to_hub=args.push_to_hub,
    )
Lines changed: 20 additions & 0 deletions
{
  "lambda_pattern": {},
  "auto_mapping": null,
  "base_model_name_or_path": null,
  "bias": "none",
  "exclude_modules": null,
  "inference_mode": false,
  "init_weights": true,
  "layers_pattern": null,
  "layers_to_transform": null,
  "delora_lambda": 15,
  "module_dropout": 0.0,
  "modules_to_save": null,
  "peft_type": "DELORA",
  "r": 32,
  "rank_pattern": {},
  "revision": null,
  "target_modules": null,
  "task_type": "CAUSAL_LM"
}
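
For reference, a `DeloraConfig` roughly matching the adapter config above could be built like this (a sketch; several of these values are also the defaults):

```python
from peft import DeloraConfig, TaskType

config = DeloraConfig(
    r=32,
    delora_lambda=15,
    module_dropout=0.0,
    bias="none",
    init_weights=True,
    target_modules=None,  # fall back to the model's default target modules
    task_type=TaskType.CAUSAL_LM,
)
```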
Lines changed: 6 additions & 0 deletions
{
  "optimizer_kwargs": {
    "lr": 1e-3
  }
}
src/peft/__init__.py

Lines changed: 4 additions & 0 deletions
@@ -59,6 +59,8 @@
     C3AModel,
     CPTConfig,
     CPTEmbedding,
+    DeloraConfig,
+    DeloraModel,
     EvaConfig,
     FourierFTConfig,
     FourierFTModel,
@@ -154,6 +156,8 @@
     "C3AModel",
     "CPTConfig",
     "CPTEmbedding",
+    "DeloraConfig",
+    "DeloraModel",
     "EvaConfig",
     "FourierFTConfig",
     "FourierFTModel",