
Conversation

@RubensZimbres
Contributor

Add VaultGemma Fine-tuning with Differential Privacy and Inference

Overview

This PR adds a complete pipeline for privacy-preserving fine-tuning and inference of VaultGemma 1B on medical data using LoRA adapters and differential privacy via Opacus.

What's Added

  • Complete training pipeline with 4-bit quantization, LoRA, and differential privacy
  • Inference code for loading and running fine-tuned models
  • Comprehensive README with setup instructions and usage examples
  • Notebook: VaultGemma_FineTuning_Inference_Huggingface.ipynb

Key Features

  • 4-bit quantization using BitsAndBytes NF4 for memory efficiency
  • LoRA fine-tuning targeting all projection layers with r=8, alpha=16 (see the configuration sketch after this list)
  • Differential privacy with configurable epsilon and delta budgets
  • Prompt masking to train only on response tokens
  • Automatic checkpointing based on loss thresholds
  • Cosine learning rate schedule with warmup
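
For orientation, here is a minimal sketch of how these pieces might fit together using the standard transformers/peft APIs. It is not the notebook's exact code: the target module names, LoRA dropout, and compute dtype are assumptions.

```python
# Sketch only: 4-bit NF4 quantization plus LoRA adapters on the projection layers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_path = "google/vaultgemma-1b"

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=quantization_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# LoRA adapters on all projection layers, r=8 / alpha=16 as listed above
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.05,  # dropout value is an assumption
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
```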

Technical Details

Training Configuration

  • Model: VaultGemma 1B (4-bit quantized)
  • Dataset: Medical Meadow Medical Flashcards (1000 samples)
  • Privacy Budget: ε=8.0, δ=1e-5
  • Batch Size: 1 with gradient accumulation of 8
  • Learning Rate: 2e-5 with cosine schedule (see the optimizer sketch after this list)
  • Train/Validation Split: 90/10
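
The following sketch illustrates how these hyperparameters could be wired up; it assumes `model` and a `train_dataset` from earlier cells, and the warmup fraction is an assumption rather than the notebook's setting.

```python
# Illustrative optimizer / LR-schedule wiring for the configuration listed above.
import torch
from torch.utils.data import DataLoader
from transformers import get_cosine_schedule_with_warmup

num_train_epochs = 2
gradient_accumulation_steps = 8
learning_rate = 2e-5

train_dataloader = DataLoader(train_dataset, batch_size=1, shuffle=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate)

steps_per_epoch = max(len(train_dataloader) // gradient_accumulation_steps, 1)
total_steps = steps_per_epoch * num_train_epochs
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=int(0.1 * total_steps),  # 10% warmup is an assumption
    num_training_steps=total_steps,
)
```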

Privacy Guarantees

The implementation provides (ε, δ)-differential privacy guarantees through gradient clipping (max norm: 1.0) and automatic privacy accounting via Opacus.
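
A sketch of how Opacus typically provides this, assuming the `model`, `optimizer`, and `train_dataloader` from the training setup; it mirrors the standard PrivacyEngine API rather than quoting the notebook.

```python
# Wrap the training objects for (ε, δ)-DP with per-sample gradient clipping.
from opacus import PrivacyEngine

target_epsilon = 8.0
target_delta = 1e-5
max_grad_norm = 1.0  # gradient clipping max norm

privacy_engine = PrivacyEngine()
model, optimizer, train_dataloader = privacy_engine.make_private_with_epsilon(
    module=model,
    optimizer=optimizer,
    data_loader=train_dataloader,
    epochs=num_train_epochs,
    target_epsilon=target_epsilon,
    target_delta=target_delta,
    max_grad_norm=max_grad_norm,
)

# After training, report the privacy budget actually spent
epsilon = privacy_engine.get_epsilon(delta=target_delta)
print(f"Final privacy cost: ε = {epsilon:.2f} for δ = {target_delta}")
```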

Inference

Includes simple inference functions with adjustable generation parameters (temperature, top_p, max_new_tokens) and support for single or batch processing.
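
A sketch of what such a helper could look like for batch processing, assuming the fine-tuned `model` and `tokenizer` are already loaded and the tokenizer has a pad token; the prompt template and generation defaults are assumptions.

```python
# Batch generation with adjustable sampling parameters (illustrative helper).
def generate_batch(questions, max_new_tokens=128, temperature=0.9, top_p=0.9):
    prompts = [
        f"Instruction:\nAnswer this question truthfully.\n\nQuestion:\n{q}\n\nResponse:\n"
        for q in questions
    ]
    tokenizer.padding_side = "left"  # left-pad for decoder-only generation
    inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        temperature=temperature,
        top_p=top_p,
        do_sample=True,
    )
    # Strip the prompt tokens and decode only the generated continuation
    new_tokens = outputs[:, inputs["input_ids"].shape[1]:]
    return tokenizer.batch_decode(new_tokens, skip_special_tokens=True)

answers = generate_batch([
    "What are the symptoms of diabetes?",
    "How is hypertension diagnosed?",
])
```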

Files Changed

  • VaultGemma_FineTuning_Inference_Huggingface.ipynb - Main training and inference notebook
  • README.md - Documentation

@review-notebook-app

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@gemini-code-assist

Summary of Changes

Hello @RubensZimbres, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request delivers a robust solution for fine-tuning the VaultGemma 1B language model, specifically tailored for medical question-answering, with a strong emphasis on data privacy. It integrates state-of-the-art techniques like LoRA and differential privacy to enable efficient and secure model adaptation, making it suitable for applications involving sensitive information. The changes provide a complete, documented pipeline for both training and inference.

Highlights

  • Privacy-Preserving Fine-Tuning: Introduces a complete pipeline for fine-tuning Google's VaultGemma 1B model with differential privacy using Opacus, ensuring privacy guarantees for sensitive data.
  • Efficient Model Adaptation: Leverages LoRA (Low-Rank Adaptation) with 4-bit quantization (BitsAndBytes NF4) for memory-efficient and parameter-efficient fine-tuning, targeting all projection layers of the model.
  • Medical Q&A Application: The fine-tuning is demonstrated on the Medical Meadow Medical Flashcards dataset, aiming to create a medical question-answering model.
  • Comprehensive Workflow: Includes a full workflow from dataset loading and tokenization (with prompt masking) to training with gradient accumulation, learning rate scheduling, automatic checkpointing, and inference capabilities.
  • Detailed Documentation: A new README.md provides an overview, features, repository structure, training details, quick start for inference, requirements, and explicit privacy guarantees.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces a comprehensive notebook for fine-tuning VaultGemma with differential privacy. The implementation is well-structured, but there are several critical issues and inconsistencies that need to be addressed. My review focuses on a critical discrepancy in the reported privacy budget, errors in model checkpointing and loading logic that will cause failures, and various inconsistencies between the code and documentation. Addressing these points will significantly improve the reliability and usability of the notebook.

Comment on lines 697 to 698
"epsilon = privacy_engine.get_epsilon(delta=target_delta)\n",
"print(f\"Final privacy cost: ε = {epsilon:.2f} for δ = {target_delta}\")"

critical

There is a critical inconsistency in the reported privacy budget. The cell's output shows a final privacy cost of ε = 22.21 for δ = 0.01. However, the code is configured with target_epsilon = 3.0 and target_delta = 1e-5. This discrepancy suggests the output is from a different execution or there is a fundamental issue in the privacy accounting. The reported epsilon is also significantly higher than the target. This must be corrected to ensure the privacy claims of this notebook are valid.

" device_map=\"auto\",\n",
")\n",
"\n",
"adapter_path = \"./final_model\"\n",

critical

The path to the adapter for inference is hardcoded to ./final_model. However, the training loop saves checkpoints to a dynamically generated path based on the training loss (e.g., ./final_model_acc_...). This will cause a FileNotFoundError when running the inference cell. The path should be updated to point to a valid checkpoint saved during training.
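
One possible fix, assuming the training loop writes adapter directories named `./final_model_acc_<loss>` (the naming is inferred from this comment, not quoted from the notebook): resolve the most recently saved checkpoint instead of hardcoding the path.

```python
import glob
import os

# Pick the most recently written checkpoint directory, if any exists
checkpoints = sorted(glob.glob("./final_model_acc_*"), key=os.path.getmtime)
if not checkpoints:
    raise FileNotFoundError("No fine-tuned adapter checkpoints found; run the training loop first.")
adapter_path = checkpoints[-1]
```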

"\n",
"# Training hyperparameters\n",
"device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n",
"num_train_epochs = 2\n",

high

There is an inconsistency between the code and the documentation. The number of training epochs is set to 2 here, but the markdown cell in section 8 (b84fda3c) states, "The loop will run for 20 epochs." Please update either the code or the markdown to ensure they are consistent.

" log_message = f\"Step {global_step}: Train Loss = {avg_train_loss:.4f}\"\n",
" \n",
" # Save checkpoint if loss is below threshold\n",
" if avg_train_loss < 0.06:\n",

high

Checkpointing based on a hardcoded training loss threshold (avg_train_loss < 0.06) is unreliable. This condition may never be met, or it could be met too frequently, leading to no checkpoints or too many. A more robust strategy is to save checkpoints based on improvements in the validation loss or simply at the end of each epoch.
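
A sketch of the suggested alternative, assuming a `val_dataloader` from the 90/10 split with batches already on the right device; with Opacus, the wrapped module may need to be unwrapped before saving, which is omitted here.

```python
import torch

best_val_loss = float("inf")

def evaluate(model, val_dataloader):
    """Average loss over the validation set."""
    model.eval()
    total, count = 0.0, 0
    with torch.no_grad():
        for batch in val_dataloader:
            outputs = model(**batch)
            total += outputs.loss.item()
            count += 1
    model.train()
    return total / max(count, 1)

# Inside the training loop, e.g. at the end of each epoch:
val_loss = evaluate(model, val_dataloader)
if val_loss < best_val_loss:
    best_val_loss = val_loss
    model.save_pretrained("./final_model_best")  # keep only the best adapter
```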

Comment on lines 43 to 55
```python
from transformers import AutoModelForCausalLM, GemmaTokenizer
from peft import PeftModel

# Load model and adapters
model = AutoModelForCausalLM.from_pretrained("google/vaultgemma-1b")
tokenizer = GemmaTokenizer.from_pretrained("google/vaultgemma-1b")
peft_model = PeftModel.from_pretrained(model, "path/to/adapters")

# Generate response
question = "What are the symptoms of diabetes?"
response = generate_response(question)
```

medium

The code snippet in the Quick Start section is incomplete because the generate_response function is not defined. This will cause an error for users trying to run this example directly. Please include the function definition or add a note directing users to the notebook for the full implementation.
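
One possible definition to drop into the README, assuming the `peft_model` and `tokenizer` loaded in the quoted snippet; the generation defaults here are assumptions.

```python
def generate_response(question, max_new_tokens=128, temperature=0.9, top_p=0.9):
    prompt = f"Instruction:\nAnswer this question truthfully.\n\nQuestion:\n{question}\n\nResponse:\n"
    inputs = tokenizer(prompt, return_tensors="pt").to(peft_model.device)
    outputs = peft_model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        temperature=temperature,
        top_p=top_p,
        do_sample=True,
    )
    # Decode only the newly generated tokens
    return tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)
```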

Comment on lines 49 to 51
"!pip install -q -U transformers peft accelerate bitsandbytes datasets pandas\n",
"!pip install git+https://github.com/huggingface/[email protected]\n",
"!pip install kagglehub ipywidgets opacus -q"

medium

The transformers library is installed twice: first from pip and then immediately overwritten by an installation from a specific git commit. This is redundant. To streamline the setup, you can remove transformers from the first pip install command.

!pip install -q -U peft accelerate bitsandbytes datasets pandas
!pip install git+https://github.com/huggingface/[email protected]
!pip install kagglehub ipywidgets opacus -q

"\n",
"# Load medical dataset\n",
"medical_data = load_dataset(\"medalpaca/medical_meadow_medical_flashcards\", split=\"train\")\n",
"data = medical_data.to_pandas().head(1000)\n",

medium

The number of samples to be used from the dataset is hardcoded as 1000. This makes the notebook less flexible for experimentation. It would be better to define this as a configurable variable at the top of the cell or in a dedicated configuration section.

NUM_SAMPLES = 1000
data = medical_data.to_pandas().head(NUM_SAMPLES)

"model = AutoModelForCausalLM.from_pretrained(\n",
" model_path,\n",
" quantization_config=quantization_config,\n",
" torch_dtype=torch.bfloat16,\n",

medium

The torch_dtype argument is deprecated and will be removed in a future version of the transformers library. The notebook's output already includes a warning about this. You should use dtype instead for forward compatibility and to remove the warning.

    dtype=torch.bfloat16,

def generate_response(question, max_new_tokens=128, temperature=0.9, top_p=0.9):
    prompt = f"Instruction:\nAnswer this medical question concisely.\n\nQuestion:\n{question}\n\nResponse:\n"

medium

The prompt template used for inference (Answer this medical question concisely.) is different from the one used during training (Answer this question truthfully.). This inconsistency can lead to suboptimal model performance, as the model is being prompted in a way it was not trained for. For best results, the prompt templates for training and inference should be identical.

    prompt = f"Instruction:\nAnswer this question truthfully.\n\nQuestion:\n{question}\n\nResponse:\n"

@bebechien
Collaborator

Hi @RubensZimbres
Thanks for your contribution.
Since VaultGemma is a research-focused model, I think the Research folder is the correct place for this notebook.

So, could you make the following changes?

  1. move VaultGemma/VaultGemma_FineTuning_Inference_Huggingface.ipynb -> Research/[VaultGemma]FineTuning_with_Huggingface.ipynb
  2. run nbfmt script
$ python3 -m pip install -U --user git+https://github.com/tensorflow/docs
$ python3 -m tensorflow_docs.tools.nbfmt notebook.ipynb

@RubensZimbres
Contributor Author

RubensZimbres commented Oct 2, 2025

Done, @bebechien, and the gemini-code-assist issues have been addressed as well.

@bebechien (Collaborator) left a comment

lgtm!

@bebechien bebechien merged commit 99d909c into google-gemini:main Oct 2, 2025
3 checks passed