NVIDIA Introduces Nemotron 3 Model Family with Hybrid Mamba-Transformer Architecture
NVIDIA has released the Nemotron 3 family of language models, according to papers published on arXiv. The family includes three variants: Nano, Super, and Ultra, which the abstract says “deliver strong agentic, reasoning, and conversational capabilities.”
The models use a Mixture-of-Experts hybrid Mamba-Transformer architecture designed to provide better efficiency, according to the research paper. The Nemotron 3 Nano 30B-A3B variant was pretrained on 25 trillion text tokens, “including more than 3 trillion new unique tokens over Nemotron 2,” and then refined with supervised fine-tuning, according to the paper’s abstract.
The architecture combines two emerging approaches in language model design: Mixture-of-Experts (MoE), which activates only subsets of model parameters for different inputs, and the Mamba-Transformer hybrid approach, which blends transformer attention mechanisms with the newer Mamba state-space model architecture.
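For readers unfamiliar with these terms, the sketch below illustrates the general pattern in PyTorch: a stack in which most blocks use a linear-time recurrent sequence mixer (standing in for Mamba) while a few interleaved blocks use attention, and every block routes each token through only a small subset of feed-forward “experts.” All layer sizes, the attention-to-SSM ratio, and the routing scheme here are illustrative assumptions, not details from the Nemotron 3 papers.

```python
# Illustrative sketch only; not NVIDIA's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    """Top-k expert routing: only a few experts run per token."""
    def __init__(self, d_model, n_experts=8, top_k=2, d_ff=2048):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):                     # x: (batch, seq, d_model)
        scores = self.router(x)               # (batch, seq, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)  # normalize over chosen experts
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            sel = (idx == e)                  # tokens that picked expert e
            mask = sel.any(dim=-1)
            if mask.any():
                w = (weights * sel).sum(dim=-1)[mask].unsqueeze(-1)
                out[mask] += w * expert(x[mask])
        return out

class SimplifiedSSMBlock(nn.Module):
    """Stand-in for a Mamba layer: a gated, linear-time recurrence."""
    def __init__(self, d_model):
        super().__init__()
        self.in_proj = nn.Linear(d_model, 2 * d_model)
        self.decay = nn.Parameter(torch.full((d_model,), 0.9))
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x):                     # x: (batch, seq, d_model)
        u, gate = self.in_proj(x).chunk(2, dim=-1)
        state = torch.zeros_like(u[:, 0])
        outs = []
        for t in range(u.size(1)):            # recurrent scan over the sequence
            state = self.decay * state + u[:, t]
            outs.append(state)
        h = torch.stack(outs, dim=1) * torch.sigmoid(gate)
        return self.out_proj(h)

class HybridBlock(nn.Module):
    """One block: SSM or attention for sequence mixing, then an MoE FFN."""
    def __init__(self, d_model, use_attention):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.use_attention = use_attention
        self.mixer = (nn.MultiheadAttention(d_model, 8, batch_first=True)
                      if use_attention else SimplifiedSSMBlock(d_model))
        self.moe = MoEFeedForward(d_model)

    def forward(self, x):
        h = self.norm1(x)
        if self.use_attention:
            h, _ = self.mixer(h, h, h, need_weights=False)
        else:
            h = self.mixer(h)
        x = x + h
        return x + self.moe(self.norm2(x))

# Mostly SSM blocks, with attention every fourth layer (an assumed ratio).
layers = nn.ModuleList(HybridBlock(512, use_attention=(i % 4 == 3)) for i in range(8))
x = torch.randn(2, 16, 512)
for layer in layers:
    x = layer(x)
```

The efficiency argument is that the recurrent scan avoids attention’s quadratic cost on long sequences, while top-k routing activates only a fraction of the total parameters per token, which is likely what the “A3B” in the Nano variant’s name denotes: roughly 3 billion active parameters out of about 30 billion total.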
According to the arXiv listing, the models are described as “open” and “efficient,” suggesting they are intended for wider availability. The papers are cross-listed across multiple arXiv research categories.