
Conversation


@Solventerritory Solventerritory commented Nov 14, 2025

Summary

What: Adds documentation and a small tooling script to enable creation, validation, and export of sub‑billion (≤1B params) Gemma 3n submodels for resource‑constrained deployments (mobile/web). Includes a recommended 0.9B (26‑layer) config and guidance to extend slicing to the audio encoder.
Why: Provide teams with tested, ready‑to‑use slicing configs and an implementation path so Gemma can run efficiently on 4–6GB mobile devices and in browser environments.

#256

What I Changed

Docs:
RESPONSE_SUB_BILLION_AND_AUDIO_SLICING.md — deep technical analysis, FFN/layer strategies, audio encoder design
QUICK_START_SUB_BILLION_MODELS.md — step‑by‑step quickstart, troubleshooting, deployment tips
FEATURE_REQUEST_RESPONSE_SUMMARY.md — executive summary, recommendations, FAQ
README_SUB_BILLION_MODELS.md — navigation + TL;DR for quick adoption
INDEX_SUB_BILLION_RESPONSE.txt — consolidated index of created assets
Tooling:
custom_slicing_configs.py — runnable helper that lists presets, validates configs, and exports MatFormer Lab snippets
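For orientation, a rough sketch of what one preset entry might look like (illustrative only: the field names follow the 0.9B example discussed in this thread, and the actual dictionary layout in custom_slicing_configs.py may differ):

# Illustrative sketch; the real preset names and layout in custom_slicing_configs.py may differ.
# The fields mirror the 0.9B (26-layer) example referenced in the docs.
PRESET_0_9B = {
    "num_layers": 26,
    "layers_to_skip": [19, 20, 21, 22, 23, 24, 25, 26, 27],
    "ffn_hidden_dims": [2048*3]*10 + [int(2048*3.5)]*9 + [2048*4]*7,
}
# A preset like this can be pasted into the MatFormer Lab "Config details" cell
# or checked with the script's validation helper before export.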

@gemini-code-assist

Summary of Changes

Hello @Solventerritory, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a complete package of documentation and a utility script designed to enable the efficient deployment of Gemma 3n models on devices with limited resources, such as mobile phones and web browsers. It provides clear, actionable guidance for creating sub-billion parameter models, including a recommended 0.9B configuration, and explores the potential for extending these optimization techniques to the audio encoder. The changes aim to make Gemma 3n more accessible for a wider range of applications by offering tested and ready-to-use slicing configurations.

Highlights

  • Sub-Billion Model Guidance: Comprehensive documentation and a tooling script are introduced to facilitate the creation, validation, and export of sub-billion (≤1B parameters) Gemma 3n submodels for resource-constrained deployments.
  • Recommended 0.9B Configuration: A specific 0.9B (26-layer) model configuration is recommended as an optimal Pareto point for mobile devices with 4-6GB RAM, fitting in approximately 1.5GB when 4-bit quantized.
  • Audio Encoder Slicing Design: Guidance is provided for extending the slicing methodology to the audio encoder, outlining its feasibility and implementation requirements, though it currently requires custom implementation.
  • Tooling for Configuration Management: A new Python script, custom_slicing_configs.py, offers pre-defined sub-billion configurations, validation checks, and export capabilities for integration with the MatFormer Lab notebook.
  • Performance Metrics Included: Detailed performance predictions, including MMLU accuracy, quantized size, and inference speed, are included for various sub-billion model options to aid in deployment decisions.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check its output and use code with caution.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request is an excellent contribution, adding comprehensive documentation and a utility script for creating sub-billion parameter Gemma models. The guides are well-structured and provide valuable information for developers targeting resource-constrained environments. My review focuses on improving the correctness of code snippets, enhancing the maintainability of the documentation, and increasing the robustness of the Python script. The key suggestions involve ensuring model dimensions are integers, fixing minor inaccuracies in the documentation, and adding safeguards to the code.

@gemini-code-assist

Warning

Gemini encountered an error creating the review. You can try again by commenting /gemini review.

Solventerritory and others added 8 commits November 14, 2025 22:53
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@Solventerritory
Author

@evansenter @markmcd @smsohan @justinmahood is this good to merge?

@Solventerritory
Author

/gemini review


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a comprehensive set of documentation and a helper script for creating sub-billion parameter Gemma 3n models. The documentation is thorough and covers various aspects from quick-start to deep technical details. The Python script provides useful presets and validation. My main feedback is to address a recurring issue where floating-point numbers are used for FFN hidden dimensions, which will cause errors. I've left specific suggestions to cast these to integers. Additionally, there's a significant amount of duplicated content across the documentation files, which could be streamlined to improve maintainability by using a single source of truth for configurations.
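One way the single-source-of-truth suggestion could be realized is sketched below; the import path and the CONFIGS structure are assumptions for illustration, not the script's actual API:

# Hypothetical sketch: keep the numbers only in custom_slicing_configs.py and
# render the documentation tables from them, so the markdown files never
# duplicate the raw configuration values.
from custom_slicing_configs import CONFIGS  # assumed shape: {"0.9B": {"num_layers": 26, ...}, ...}

def render_markdown_table(configs: dict) -> str:
    rows = ["| Preset | Layers | Distinct FFN dims |",
            "| --- | --- | --- |"]
    for name, cfg in configs.items():
        dims = ", ".join(str(d) for d in sorted(set(cfg["ffn_hidden_dims"])))
        rows.append(f"| {name} | {cfg['num_layers']} | {dims} |")
    return "\n".join(rows)

if __name__ == "__main__":
    print(render_markdown_table(CONFIGS))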

1. Open: Gemma/[Gemma_3n]MatFormer_Lab.ipynb
2. In "Config details" cell, set:
layers_to_skip = [19, 20, 21, 22, 23, 24, 25, 26, 27]
ffn_hidden_dims = [2048*3]*10 + [2048*3.5]*9 + [2048*4]*7


high

The multiplication 2048*3.5 will result in a float value (7168.0). Model layer dimensions must be integers. Please cast the result to an integer to avoid potential errors when this code is used.

   ffn_hidden_dims = [2048*3]*10 + [int(2048*3.5)]*9 + [2048*4]*7
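A standalone snippet, runnable on its own, that shows why the cast matters:

# Without the cast, a float lands where an integer layer dimension is expected.
dims_without_cast = [2048*3]*10 + [2048*3.5]*9 + [2048*4]*7
dims_with_cast    = [2048*3]*10 + [int(2048*3.5)]*9 + [2048*4]*7
print(dims_without_cast[10], type(dims_without_cast[10]))  # 7168.0 <class 'float'>
print(dims_with_cast[10], type(dims_with_cast[10]))        # 7168 <class 'int'>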

Solventerritory and others added 8 commits November 14, 2025 23:03
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@Solventerritory
Author

/gemini review


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a comprehensive set of documentation and a utility script for creating sub-billion parameter Gemma 3n models. The documentation is exceptionally detailed and well-structured, providing clear, actionable guidance for developers. The inclusion of a Python script for generating and validating configurations is a fantastic addition that will greatly aid in creating custom models.

My review focuses on ensuring consistency and improving maintainability. The primary issue identified is the inconsistent use of int() when calculating ffn_hidden_dims in various code examples, which can lead to float values where integers are expected. I have also suggested some minor enhancements to the Python script to replace magic numbers with named constants, which will make the code easier to understand and maintain in the long run.

Overall, this is an excellent contribution that will be highly valuable for teams working on resource-constrained deployments.

Solventerritory and others added 4 commits November 14, 2025 23:13
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@Solventerritory
Author

/gemini review


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a comprehensive set of documentation and a Python helper script for creating and using sub-billion parameter Gemma 3n models. The documentation is exceptionally detailed, well-structured, and provides clear guidance for users with different levels of expertise, from quick-start guides to deep technical analyses. The Python script is a valuable tool for programmatically generating and validating model configurations.

My review includes a critical fix for a bug in the Python script that would cause it to crash, along with a few suggestions to improve clarity and correctness in the documentation and the script's output. Overall, this is an excellent contribution that will be highly valuable for developers working in resource-constrained environments.

Comment on lines +186 to +213
def validate_config(config: Dict, base_model_num_layers: int = 35, base_hidden_size: int = 2048, max_ffn_dim: int = 16384) -> Tuple[bool, List[str]]:
    """
    Validate a configuration for logical consistency.

    Returns:
        (is_valid, list_of_errors)
    """
    errors = []

    expected_layers = BASE_MODEL_NUM_LAYERS - len(config["layers_to_skip"])
    if expected_layers != config["num_layers"]:
        errors.append(
            f"Layer mismatch: {expected_layers} expected but {config['num_layers']} specified"
        )

    if len(config["ffn_hidden_dims"]) != config["num_layers"]:
        errors.append(
            f"FFN dims length ({len(config['ffn_hidden_dims'])}) != num_layers ({config['num_layers']})"
        )

    if not all(isinstance(d, int) for d in config["ffn_hidden_dims"]):
        errors.append("FFN dimensions must be integers.")

    for dim in config["ffn_hidden_dims"]:
        if dim < BASE_HIDDEN_SIZE or dim > MAX_FFN_DIM:
            errors.append(f"FFN dimension {dim} outside reasonable range [{BASE_HIDDEN_SIZE}, {MAX_FFN_DIM}]")

    return len(errors) == 0, errors


critical

The validate_config function will raise a NameError at runtime because it references undefined variables. The function parameters are defined in snake_case (e.g., base_model_num_layers), but they are used in UPPER_SNAKE_CASE (e.g., BASE_MODEL_NUM_LAYERS) inside the function. This needs to be corrected for the script to run.

def validate_config(config: Dict, base_model_num_layers: int = 35, base_hidden_size: int = 2048, max_ffn_dim: int = 16384) -> Tuple[bool, List[str]]:
    """
    Validate a configuration for logical consistency.
    
    Returns:
        (is_valid, list_of_errors)
    """
    errors = []
    
    expected_layers = base_model_num_layers - len(config["layers_to_skip"])
    if expected_layers != config["num_layers"]:
        errors.append(
            f"Layer mismatch: {expected_layers} expected but {config['num_layers']} specified"
        )
    
    if len(config["ffn_hidden_dims"]) != config["num_layers"]:
        errors.append(
            f"FFN dims length ({len(config['ffn_hidden_dims'])}) != num_layers ({config['num_layers']})"
        )
    
    if not all(isinstance(d, int) for d in config["ffn_hidden_dims"]):
        errors.append("FFN dimensions must be integers.")

    for dim in config["ffn_hidden_dims"]:
        if dim < base_hidden_size or dim > max_ffn_dim:
            errors.append(f"FFN dimension {dim} outside reasonable range [{base_hidden_size}, {max_ffn_dim}]")
    
    return len(errors) == 0, errors
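For completeness, a minimal usage sketch of the corrected function; the example values mirror the 0.9B configuration quoted earlier in this thread:

# Example call against the corrected validate_config; values follow the 0.9B (26-layer) config.
example = {
    "layers_to_skip": [19, 20, 21, 22, 23, 24, 25, 26, 27],           # 35 - 9 = 26 layers kept
    "num_layers": 26,
    "ffn_hidden_dims": [2048*3]*10 + [int(2048*3.5)]*9 + [2048*4]*7,  # 26 integer dims
}
is_valid, problems = validate_config(example)
print(is_valid)   # True when all checks pass
print(problems)   # [] for an internally consistent config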

Solventerritory and others added 3 commits November 14, 2025 23:45
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@Solventerritory
Author

/gemini review


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a comprehensive set of documentation and a Python helper script for creating sub-billion parameter Gemma 3n models, targeting resource-constrained environments. The documentation is extensive, providing a technical deep-dive, a quick-start guide, and performance benchmarks. The Python script is a valuable tool for managing and exporting slicing configurations.

My review focuses on ensuring consistency and accuracy across the newly added files. I've provided feedback on:

  • Correcting a misleading comment and a typo in the documentation and Python script.
  • Ensuring consistency in model configuration parameters across different files.
  • Addressing minor redundancy and formatting issues in the documentation.

Overall, this is a very well-executed and valuable addition that will greatly help users looking to deploy Gemma on mobile and web platforms. The changes are clear and the provided guidance is excellent.

Solventerritory and others added 5 commits November 14, 2025 23:54
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@Solventerritory
Author

/gemini review


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a comprehensive set of documentation and a Python helper script for creating sub-billion parameter Gemma 3n models, targeting resource-constrained environments. The documentation is exceptionally thorough, covering technical analysis, quick-start guides, and tooling. The Python script is well-designed with clear configurations and validation functions. My review primarily focuses on enhancing the clarity, consistency, and correctness of code snippets within the documentation to ensure they are complete and runnable for a better user experience. I've identified several minor inconsistencies in configuration definitions and some incomplete code examples across the new markdown files.

Solventerritory and others added 6 commits November 15, 2025 00:06
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@Solventerritory
Author

/gemini review


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces extensive documentation and a helper script for creating smaller, sub-billion parameter Gemma models, which is a valuable addition for developers working in resource-constrained environments. The documentation is well-structured, comprehensive, and provides clear guidance. The Python script is a useful utility for generating and validating configurations. My review focuses on improving the accuracy and consistency of the information presented to avoid potential user confusion. I've identified a few areas for improvement, including a misleading calculation in the Python script that underestimates model sizes, a confusing file self-reference, and a hardcoded path in the documentation. Addressing these points will enhance the overall quality and usability of these new assets.

@Solventerritory
Author

/gemini review


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a comprehensive set of documentation and a helper script for creating sub-billion parameter Gemma 3n models. The documentation is thorough, well-structured, and provides clear guidance for users looking to deploy these models in resource-constrained environments. The Python script is a great addition for programmatically generating and validating configurations.

My review includes a few suggestions to improve consistency and correctness in the provided configurations and documentation. Specifically, I've pointed out a configuration error in one of the text files, a formatting inconsistency in a markdown guide, and a missing data point in the Python script's configuration dictionaries that affects the generated output.

@Solventerritory
Author

/gemini review


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a comprehensive set of documentation and a helper script for creating sub-billion parameter Gemma 3n models, targeting resource-constrained environments. The documentation is well-structured, providing deep technical analysis, quick-start guides, and performance metrics. The Python script is a valuable tool for generating and validating model configurations. My review focuses on improving consistency across the different files, correcting minor errors in code snippets, and enhancing the utility of the helper script to ensure a clear and error-free experience for developers.

Configuration: 0.9B (26 layers)

layers_to_skip = [19, 20, 21, 22, 23, 24, 25, 26, 27]
ffn_hidden_dims = [2048*3]*10 + [2048*3.5]*9 + [2048*4]*7


medium

The ffn_hidden_dims configuration for the 0.9B model appears to have a typo. The expression [2048*3.5]*9 will create a list of floats, but model configuration dimensions typically require integers. In other files and in the Python script, this is correctly written as [int(2048*3.5)]*9. To ensure consistency and prevent potential errors for users who copy-paste this configuration, it should be corrected.

ffn_hidden_dims = [2048*3]*10 + [int(2048*3.5)]*9 + [2048*4]*7

Comment on lines +150 to +152
# Best: 0.5B with extreme quantization
layers_to_skip = [12, 13, 14, 15, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27]
ffn_hidden_dims = [2048*2]*8 + [int(2048*2.5)]*7 + [2048*3]*5


medium

The configuration for "Scenario 2: Web Browser (Client-side)" is not formatted as a code block, unlike the other scenarios in this document. This makes it harder to read and copy. Please wrap it in a ```python block for consistency and clarity.

Solventerritory and others added 2 commits November 16, 2025 20:36
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>