
Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models #38966

@AshAnand34

Description

Model description

Nemotron-H is a family of hybrid Mamba-Transformer models developed by NVIDIA that combines the efficiency of Mamba layers with the accuracy of Transformer attention layers. The family comes in two sizes:

  • 8B parameter model
  • 56B parameter model, plus a 47B version compressed from it with MiniPuzzle

Key Features:

  • Hybrid Architecture: Replaces the majority of self-attention layers with Mamba layers, keeping computation and memory per generated token constant (see the sketch after this list)
  • Superior Performance: Up to 3x faster inference than similarly sized state-of-the-art Transformer models
  • Competitive Accuracy: Accuracy on par with or better than Qwen-2.5-7B/72B and Llama-3.1-8B/70B
  • FP8 Training: Introduces an FP8-based training recipe that matches BF16 training results
  • Compression Technique: MiniPuzzle compression reduces the 56B model to 47B while maintaining accuracy and improving inference speed by 20%
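
The efficiency argument is essentially about the decode-time cache: each attention layer keeps a KV cache that grows with the number of generated tokens, while each Mamba layer keeps a fixed-size state. Below is a rough back-of-the-envelope sketch of that difference; the layer pattern, layer counts, and dimensions are illustrative placeholders, not the actual Nemotron-H configuration.

```python
# Illustrative only: compare per-sequence cache growth for attention layers
# (KV cache grows with sequence length) vs. Mamba layers (fixed-size state).
# All sizes below are made-up placeholders, not Nemotron-H's real config.

ATTN, MAMBA = "*", "M"
layer_pattern = [MAMBA] * 48 + [ATTN] * 4  # hypothetical hybrid stack

hidden_size = 4096
num_kv_heads = 8
head_dim = 128
mamba_state_size = 128  # fixed-size SSM state, independent of sequence length


def cache_bytes(seq_len: int, dtype_bytes: int = 2) -> int:
    """Rough per-sequence cache footprint for the hypothetical stack above."""
    total = 0
    for layer in layer_pattern:
        if layer == ATTN:
            # KV cache: keys + values for every generated token
            total += 2 * seq_len * num_kv_heads * head_dim * dtype_bytes
        else:
            # Mamba: constant-size recurrent state regardless of seq_len
            total += hidden_size * mamba_state_size * dtype_bytes
    return total


for n in (1_000, 10_000, 100_000):
    print(f"{n:>7} tokens -> {cache_bytes(n) / 2**20:8.1f} MiB")
```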

Technical Innovations:

  • Constant computation and memory requirements per generated token
  • Novel compression technique based on pruning and distillation (MiniPuzzle; a generic distillation sketch follows this list)
  • FP8 training recipe for efficient training
  • Hybrid Mamba-Transformer architecture optimization
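
MiniPuzzle is described in the paper as a pruning-plus-distillation procedure that derives the 47B model from the 56B one. Purely as a generic illustration of the distillation half (standard KL-based logit distillation, not NVIDIA's actual MiniPuzzle recipe), it could look something like:

```python
import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, temperature: float = 2.0):
    """Generic KL-based logit distillation; hyperparameters are placeholders."""
    s = F.log_softmax(student_logits / temperature, dim=-1)
    t = F.softmax(teacher_logits / temperature, dim=-1)
    # Flatten to (tokens, vocab) so batchmean averages over all tokens,
    # and scale by T^2 as is conventional for distillation losses.
    return (
        F.kl_div(
            s.reshape(-1, s.size(-1)),
            t.reshape(-1, t.size(-1)),
            reduction="batchmean",
        )
        * temperature**2
    )


# Toy usage with random logits (batch of 2, sequence of 4, vocab of 8).
student = torch.randn(2, 4, 8)
teacher = torch.randn(2, 4, 8)
print(distillation_loss(student, teacher).item())
```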

Open source status

  • The model implementation is available
  • The model weights are available

Provide useful links for the implementation

HF model: https://huggingface.co/nvidia/Nemotron-H-8B-Base-8K/tree/main
arXiv paper: https://arxiv.org/abs/2504.03624
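
Until a native implementation lands, the checkpoint can presumably be tried through the Hub's remote-code path (assuming the repo above ships custom modeling code; I have not verified this):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Nemotron-H-8B-Base-8K"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",      # whatever dtype the checkpoint recommends
    trust_remote_code=True,  # only needed until native support exists
)

inputs = tokenizer("Hybrid Mamba-Transformer models are", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```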

I would definitely love to integrate this into Transformers.
