-
Notifications
You must be signed in to change notification settings - Fork 370
docs: add sub-billion slicing guides and config tool #259
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
Solventerritory
wants to merge
51
commits into
google-gemini:main
Choose a base branch
from
Solventerritory:feat/sub-billion-slicing-docs
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+2,030
−0
Open
Changes from 1 commit
Commits
Show all changes
51 commits
Select commit
Hold shift + click to select a range
47d52aa
docs: add sub-billion slicing guides and config tool
Solventerritory c7af99d
Update QUICK_START_SUB_BILLION_MODELS.md
Solventerritory 2846b17
Update RESPONSE_SUB_BILLION_AND_AUDIO_SLICING.md
Solventerritory 04d2493
Update custom_slicing_configs.py
Solventerritory 71cd8ef
Update README_SUB_BILLION_MODELS.md
Solventerritory 66cc439
Update QUICK_START_SUB_BILLION_MODELS.md
Solventerritory 7591a04
Update RESPONSE_SUB_BILLION_AND_AUDIO_SLICING.md
Solventerritory 8bb810e
Update RESPONSE_SUB_BILLION_AND_AUDIO_SLICING.md
Solventerritory 9a110fe
Update FEATURE_REQUEST_RESPONSE_SUMMARY.md
Solventerritory 98e5dbc
Update README_SUB_BILLION_MODELS.md
Solventerritory cd06f99
Update custom_slicing_configs.py
Solventerritory 2730165
Update QUICK_START_SUB_BILLION_MODELS.md
Solventerritory c420a48
Update QUICK_START_SUB_BILLION_MODELS.md
Solventerritory 479bfd4
Update custom_slicing_configs.py
Solventerritory 275aa40
Update custom_slicing_configs.py
Solventerritory cc57db7
Update RESPONSE_SUB_BILLION_AND_AUDIO_SLICING.md
Solventerritory 15db11a
Update custom_slicing_configs.py
Solventerritory d77db05
Update custom_slicing_configs.py
Solventerritory 1115309
Update custom_slicing_configs.py
Solventerritory cce08cf
Update QUICK_START_SUB_BILLION_MODELS.md
Solventerritory e1d1cbe
Update RESPONSE_SUB_BILLION_AND_AUDIO_SLICING.md
Solventerritory d0b531d
Update custom_slicing_configs.py
Solventerritory cc29a95
Update FEATURE_REQUEST_RESPONSE_SUMMARY.md
Solventerritory fd5a249
Update QUICK_START_SUB_BILLION_MODELS.md
Solventerritory 1e7de5d
Update RESPONSE_SUB_BILLION_AND_AUDIO_SLICING.md
Solventerritory 178905d
Update QUICK_START_SUB_BILLION_MODELS.md
Solventerritory 8ef151e
Update custom_slicing_configs.py
Solventerritory 515e42a
Update custom_slicing_configs.py
Solventerritory 735dfae
Update custom_slicing_configs.py
Solventerritory 6201623
Update custom_slicing_configs.py
Solventerritory 8bcd553
Update custom_slicing_configs.py
Solventerritory 1b85636
Update README_SUB_BILLION_MODELS.md
Solventerritory c15543a
Update custom_slicing_configs.py
Solventerritory 43b628c
Update QUICK_START_SUB_BILLION_MODELS.md
Solventerritory 16a0a22
Update FEATURE_REQUEST_RESPONSE_SUMMARY.md
Solventerritory c88589d
custom_slicing_configs
Solventerritory 78783d1
Merge branch 'feat/sub-billion-slicing-docs' of https://github.com/So…
Solventerritory b382eeb
Update RESPONSE_SUB_BILLION_AND_AUDIO_SLICING.md
Solventerritory 6e753fa
Update RESPONSE_SUB_BILLION_AND_AUDIO_SLICING.md
Solventerritory 5c1c14c
Update README_SUB_BILLION_MODELS.md
Solventerritory d17d4e8
Update RESPONSE_SUB_BILLION_AND_AUDIO_SLICING.md
Solventerritory c256d00
Update custom_slicing_configs.py
Solventerritory e17b2fb
Update FEATURE_REQUEST_RESPONSE_SUMMARY.md
Solventerritory 76f6cf0
Update QUICK_START_SUB_BILLION_MODELS.md
Solventerritory f2f54a2
Update QUICK_START_SUB_BILLION_MODELS.md
Solventerritory 575dfc2
Update QUICK_START_SUB_BILLION_MODELS.md
Solventerritory 8dbba36
Update QUICK_START_SUB_BILLION_MODELS.md
Solventerritory 062775b
Update RESPONSE_SUB_BILLION_AND_AUDIO_SLICING.md
Solventerritory dcc1c08
Update custom_slicing_configs.py
Solventerritory ed66f8f
Update custom_slicing_configs.py
Solventerritory 5357fb5
Update custom_slicing_configs.py
Solventerritory File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Some comments aren't visible on the classic Files Changed page.
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,261 @@ | ||||||
| # Feature Request Response Summary | ||||||
|
|
||||||
| ## Issue | ||||||
| **Request**: Create sub-billion Gemma 3n models (0.9B or smaller) with 26 layers for mobile deployment (4-6GB RAM), and explore audio encoder layer slicing. | ||||||
|
|
||||||
| **Status**: ✅ **ADDRESSED WITH COMPREHENSIVE GUIDANCE** | ||||||
|
|
||||||
| --- | ||||||
|
|
||||||
| ## Solution Overview | ||||||
|
|
||||||
| I've created detailed technical guidance on: | ||||||
| 1. ✅ **Creating 0.9B models** with optimal slicing configurations | ||||||
| 2. ✅ **Sub-billion alternatives** (0.5B, 0.7B, 1.3B options) | ||||||
| 3. ✅ **Audio encoder slicing** approach and implementation requirements | ||||||
| 4. ✅ **Practical implementation guide** for MatFormer Lab notebook | ||||||
| 5. ✅ **Performance predictions** and deployment recommendations | ||||||
|
|
||||||
| --- | ||||||
|
|
||||||
| ## Deliverables | ||||||
|
|
||||||
| ### 1. **RESPONSE_SUB_BILLION_AND_AUDIO_SLICING.md** 📋 | ||||||
| **Comprehensive technical analysis document** | ||||||
|
|
||||||
| Contains: | ||||||
| - Feasibility assessment (YES, both text and audio slicing are possible) | ||||||
| - Detailed 0.9B model configuration (26 layers) | ||||||
| - Alternative sub-billion configs (0.5B, 0.7B, 1.3B, 1.5B) | ||||||
| - Audio encoder slicing approach | ||||||
| - Implementation roadmap | ||||||
| - Pareto frontier analysis | ||||||
| - Performance predictions (MMLU, inference speed, memory) | ||||||
| - Deployment recommendations for 4-6GB RAM devices | ||||||
|
|
||||||
| **Key Finding**: | ||||||
| - 0.9B model achieves **46-48% MMLU** (vs E2B's 50.9%) | ||||||
| - Fits in **1.5GB with 4-bit quantization** (vs E2B's 2.9GB) | ||||||
| - Maintains **50-100 tokens/sec inference** speed | ||||||
|
|
||||||
| --- | ||||||
|
|
||||||
| ### 2. **QUICK_START_SUB_BILLION_MODELS.md** 🚀 | ||||||
| **Practical quick-start guide for users** | ||||||
|
|
||||||
| Contains: | ||||||
| - TL;DR implementation in 5 minutes | ||||||
| - Step-by-step instructions for MatFormer Lab | ||||||
| - Configuration presets for different scenarios: | ||||||
| - Mobile (4GB RAM): 0.9B | ||||||
| - Web browser: 0.5B | ||||||
| - High-end mobile: 1.3B | ||||||
| - FFN dimension strategy explanation | ||||||
| - Inference optimization tips | ||||||
| - Performance benchmarks | ||||||
| - Troubleshooting guide | ||||||
|
|
||||||
| **Recommended Configuration**: | ||||||
| ```python | ||||||
| layers_to_skip = [19, 20, 21, 22, 23, 24, 25, 26, 27] | ||||||
| ffn_hidden_dims = [2048*3]*10 + [2048*3.5]*9 + [2048*4]*7 | ||||||
|
||||||
| ffn_hidden_dims = [2048*3]*10 + [2048*3.5]*9 + [2048*4]*7 | |
| ffn_hidden_dims = [2048*3]*10 + [int(2048*3.5)]*9 + [2048*4]*7 |
Solventerritory marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
Solventerritory marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
Solventerritory marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
Solventerritory marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
Solventerritory marked this conversation as resolved.
Show resolved
Hide resolved
Solventerritory marked this conversation as resolved.
Show resolved
Hide resolved
Solventerritory marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
Solventerritory marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code snippet for the recommended configuration uses float multiplication (
2048*3.5). While Python calculates this correctly, model layer dimensions are typically expected to be integers. Using floats in a configuration can lead to unexpected behavior or errors in downstream tools. To ensure robustness, please explicitly cast the result to an integer.