Nemotron

Nvidia family of AI foundation models

Nemotron is a family of foundation models developed by Nvidia, chiefly large language models and related reasoning models. Nvidia has also used the name more broadly for associated datasets, training recipes, and developer tools; in March 2026, the company formed the Nemotron Coalition, a group of AI labs collaborating on future open models. Major releases have included the dense Nemotron-4 models in 2024, the Llama Nemotron reasoning models announced at the 2025 Consumer Electronics Show, and the hybrid Nemotron 3 family launched in late 2025.

Nvidia's keynote at CES 2025, where the company announced the Llama Nemotron family of reasoning models

History and development

The first public Nemotron-branded release was Nemotron-3 8B, which Nvidia introduced in November 2023 for enterprise chatbot and copilot development using its Nvidia NeMo framework.[1][2] In February 2024, the company published the Nemotron-4 15B Technical Report, describing a 15-billion-parameter multilingual decoder-only Transformer model trained on 8 trillion tokens.[3] In June 2024, Nvidia followed with the Nemotron-4 340B family, made up of Base, Instruct, and Reward models, intended for synthetic data generation and instruction tuning.[4] According to Nvidia's technical report, more than 98 percent of the alignment data used for the 340B family was synthetically generated.[4] A related line of research, published under the name Minitron, used pruning and knowledge distillation to compress Nemotron-4 15B into smaller 8-billion and 4-billion-parameter variants; the paper was accepted at the International Conference on Learning Representations in 2025.[5]

In parallel, Nvidia began post-training Meta's Llama models under the Nemotron brand. In October 2024, the company released Llama-3.1-Nemotron-70B-Instruct, a derivative of Meta's Llama 3.1 tuned using Nvidia's own reward model and alignment data.[6] At the Consumer Electronics Show in January 2025, Nvidia announced a broader Llama Nemotron family in three size tiers—Nano, Super, and Ultra—intended for enterprise reasoning and agentic AI tasks.[7] VentureBeat described the line as partly a response to the rise of DeepSeek R1 and other open reasoning models.[8]

During 2025, Nvidia also published technical reports on Nemotron-H (April 2025) and Nemotron Nano 2 (August 2025), both hybrid Mamba–Transformer models aimed at more efficient inference.[9][10] A more prominent expansion came in December 2025, when Nvidia announced the Nemotron 3 family. Reuters and Wired treated the launch as part of Nvidia's attempt to expand its open-model offerings and to compete more directly as a model developer, not only as a supplier of AI hardware.[11][2]

At launch, Nvidia released Nemotron 3 Nano and said that larger Super and Ultra models would follow in 2026.[11] Nvidia described the line as an open-model family and said it would provide not only weights but also training data and recipes.[2][12] In March 2026, Nvidia published a technical report for Nemotron 3 Super, while independent coverage framed the model as part of the company's push into agentic AI, in which models are combined with tools and multi-step workflows.[13][14] Later that month, Nvidia announced the Nemotron Coalition, a group of AI labs and partners intended to collaborate on open frontier models. Reporting at the time described it as groundwork for future Nemotron 4 systems, though those plans were still prospective.[15][16]

Model families and notable releases

Nemotron releases have spanned several distinct model lines: early dense Transformer models (Nemotron-3 8B, Nemotron-4 15B, and the Nemotron-4 340B Base/Instruct/Reward family),[1][3][4] the Llama Nemotron derivatives post-trained from Meta's Llama architecture,[6][7] hybrid Mamba–Transformer models such as Nemotron-H and Nemotron Nano 2,[9][10] and the later Nemotron 3 Nano and Super models.[17][13] The Nemotron 3 models were designated with rounded labels such as "30B-A3B" and "120B-A12B"; Nvidia's technical reports described Nano as having about 31.6 billion total parameters with about 3.2 billion active at a time, and Super as having about 120 billion total parameters with about 12 billion active.[17][13] Nvidia's current Nemotron pages also extend the brand to additional model types beyond core language models, including safety, vision, speech, and retrieval systems.[18][12]
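The "A" suffix in labels such as "30B-A3B" encodes active versus total parameters. A minimal sketch, using only the parameter counts reported above, shows the implied active fraction:

```python
# Active-parameter fractions implied by the reported figures:
# Nano ~31.6B total / ~3.2B active, Super ~120B total / ~12B active.
nano_total, nano_active = 31.6e9, 3.2e9
super_total, super_active = 120e9, 12e9

print(f"Nano:  {nano_active / nano_total:.1%} active")   # 10.1%
print(f"Super: {super_active / super_total:.1%} active")  # 10.0%
```

In both cases roughly a tenth of the network's weights participate in any single forward pass, which is what allows per-token compute to stay far below what the total parameter count would suggest.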

Technical characteristics

The Nemotron architecture has shifted over successive releases from dense Transformer designs to hybrids that combine Transformer attention layers with Mamba state-space layers.[3][9][10] The Nemotron 3 line added mixture-of-experts routing, a design in which only part of the network is active for any given input, reducing the amount of computation needed per token.[19][17][13]
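The core idea of mixture-of-experts routing can be sketched generically; the following is a minimal NumPy illustration of top-k expert selection, not Nvidia's actual implementation (all names and shapes here are assumptions for the example):

```python
import numpy as np

rng = np.random.default_rng(0)

def moe_layer(x, gate_w, experts, k=2):
    """Route one token vector x to the top-k of len(experts) experts.

    gate_w: (d, n_experts) gating weights; experts: list of (d, d) matrices.
    Only k experts run per token, so compute scales with k, not n_experts.
    """
    logits = x @ gate_w                       # score every expert for this token
    top = np.argsort(logits)[-k:]             # indices of the k highest-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                  # softmax over the selected experts only
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

d, n_experts = 8, 16
x = rng.standard_normal(d)
gate_w = rng.standard_normal((d, n_experts))
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]

y = moe_layer(x, gate_w, experts, k=2)
print(y.shape)  # (8,)
```

With k=2 of 16 experts selected, only an eighth of the expert weights are touched per token, which is the sense in which "active" parameters are far fewer than total parameters.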

Training scale increased with each generation: Nvidia reported 8 trillion tokens for Nemotron-4 15B and 9 trillion for Nemotron-4 340B,[3][4] rising to about 20 trillion tokens for Nemotron Nano 2[10] and 25 trillion tokens for the Nemotron 3 Nano and Super models.[17][13] The Nemotron 3 white paper reported context windows of up to one million tokens.[19] Nvidia also described two architectural features specific to the Nemotron 3 line: multi-token prediction, in which the model forecasts several tokens at once to speed inference, and LatentMoE, a variant of mixture-of-experts routing that projects token representations to a lower-dimensional space before selecting experts.[13]
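The LatentMoE idea described above, routing on a lower-dimensional projection of the token rather than the full hidden state, can be sketched as follows. This is an illustrative reconstruction from the one-sentence description, not Nvidia's implementation; the projection and gating names are assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def latent_moe_gate(x, down_proj, gate_w, k=2):
    """Pick top-k experts from a low-dimensional projection of token x.

    down_proj: (d, d_latent) with d_latent << d, so the gating matmul and
    the routing decision operate on a much smaller vector than x itself.
    """
    z = x @ down_proj                  # project token into the latent space
    logits = z @ gate_w                # expert scores computed in that space
    return np.argsort(logits)[-k:]     # indices of the chosen experts

d, d_latent, n_experts = 64, 8, 32
x = rng.standard_normal(d)
down_proj = rng.standard_normal((d, d_latent))
gate_w = rng.standard_normal((d_latent, n_experts))

chosen = latent_moe_gate(x, down_proj, gate_w, k=2)
print(len(chosen))  # 2
```

Scoring experts in an 8-dimensional latent space instead of the 64-dimensional hidden space shrinks the gating computation by the same factor, which is the stated motivation for projecting before selecting experts.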

Release, licensing, and distribution

Nvidia distributes Nemotron model weights, training recipes, and supporting code through its developer portal, GitHub, and Hugging Face.[12][20] The company also ties Nemotron to Nvidia NeMo for model customization and to Nvidia NIM microservices for enterprise deployment.[1][12]

Earlier Nemotron releases used Nvidia's broader model licenses, including the Nvidia Open Model License for the Nemotron-4 340B family.[4] In December 2025, Nvidia introduced a separate Nemotron Open Model License. According to the license text, it permits commercial use and derivative works, states that users keep ownership of model outputs, requires preservation of certain notices, and includes a termination clause under which the license ends automatically if the licensee initiates certain patent infringement claims against Nvidia.[21]

Press coverage sometimes described Nemotron as "open-source" and sometimes as "open-weight". Reuters used the former term in its report on the December 2025 launch, while Wired used the latter in its March 2026 coverage; Nvidia's own pages described the brand more broadly as open models with associated data, recipes, and technologies.[11][22][12]

Reception and significance

Independent coverage treated Nemotron as evidence that Nvidia was trying to become a model developer as well as a chip supplier. Reuters framed the December 2025 launch against the rise of Chinese open-model efforts, and Wired described Nemotron 3 as part of Nvidia's transformation into a major model maker in its own right.[11][2] InfoWorld said the March 2026 Super release showed Nvidia targeting enterprise AI agents rather than only raw benchmark competition.[14]

Some coverage was more skeptical. AI Business reported that analysts viewed the December 2025 release as a meaningful step but not a dramatic departure from existing open models.[23] Igor's Lab argued that Nemotron's openness served primarily as a means of extending Nvidia's platform influence, since the models were optimized for Nvidia hardware.[24] Independent benchmark provider Artificial Analysis rated Nemotron 3 Super as the most capable open-weight model at its level of openness as of March 2026, while noting that it trailed closed frontier models on overall reasoning scores.[25]

In March 2026, Wired reported that Nvidia expected to spend about $26 billion over five years on open-weight models, presenting Nemotron as a central example of the company's broader strategy.[22]

References
