List of large language models


A large language model (LLM) is a type of machine learning model designed for natural language processing tasks such as language generation. LLMs are language models with many parameters, and are trained with self-supervised learning on a vast amount of text.
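The "self-supervised" part is what makes web-scale corpora usable for training: the text itself supplies the training targets, with no human labeling. A minimal Python sketch of the idea, using toy word-level splitting as a stand-in for a real subword tokenizer:

```python
# Minimal sketch of the self-supervised next-token objective behind LLMs:
# every position in raw text yields a (context, next-token) training pair.
text = "large language models are trained to predict the next token"
tokens = text.split()  # toy word-level "tokenization", not a real tokenizer
pairs = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]
for context, target in pairs[:3]:
    print(" ".join(context), "->", target)
```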

List

For the training cost column, 1 petaFLOP-day equals 1 petaFLOP/s × 1 day, or 8.64×10¹⁹ FLOP (floating-point operations). Only the cost of the largest model is shown.
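As a sanity check on the unit, the sketch below converts a raw FLOP count into petaFLOP-days and pairs it with the common C ≈ 6·N·D rule of thumb for dense-transformer training compute. The rule of thumb is an approximation, not necessarily how the cited figures were derived:

```python
# Sketch: the unit conversion behind the training-cost column, plus the
# common C ≈ 6*N*D approximation for dense-transformer training compute.
PETAFLOP_DAY = 1e15 * 86_400  # 1 petaFLOP/s sustained for one day = 8.64e19 FLOP

def petaflop_days(total_flop: float) -> float:
    """Convert a raw FLOP count into petaFLOP-days."""
    return total_flop / PETAFLOP_DAY

def approx_training_flop(n_params: float, n_tokens: float) -> float:
    """Rough training compute for a dense transformer: C ≈ 6 * N * D."""
    return 6 * n_params * n_tokens

# Example with the Llama 3.1 405B figures from the table (15.6T tokens):
flop = approx_training_flop(405e9, 15.6e12)        # ≈ 3.8e25 FLOP, as cited
print(f"{petaflop_days(flop):,.0f} petaFLOP-days")  # ≈ 440,000, matching the table
```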

Name | Release date[a] | Developer | Number of parameters (billion)[b] | Corpus size | Training cost (petaFLOP-day) | License[c] | Notes
GPT-1 | June 11, 2018 | United States OpenAI | 0.117 | Unknown | 1[1] | MIT[2] | First GPT model, decoder-only transformer. Trained for 30 days on 8 P600 GPUs.[3]
BERT | October 2018 | United States Google | 0.340[4] | 3.3 billion words[4] | 9[5] | Apache 2.0[6] | An early and influential language model.[7] Encoder-only and thus not built to be prompted or generative.[8] Training took 4 days on 64 TPUv2 chips.[9]
T5 | October 2019 | United States Google | 11[10] | 34 billion tokens[10] | | Apache 2.0[11] | Base model for many Google projects, such as Imagen.[12]
XLNet | June 2019 | United States Google | 0.340[13] | 33 billion words | 330 | Apache 2.0[14] | An alternative to BERT; designed as encoder-only. Trained on 512 TPU v3 chips for 5.5 days.[15]
GPT-2 | February 2019 | United States OpenAI | 1.5[16] | 40GB[17] (~10 billion tokens)[18] | 28[19] | MIT[20] | Trained on 32 TPUv3 chips for 1 week.[19]
GPT-3 | May 2020 | United States OpenAI | 175[21] | 300 billion tokens[18] | 3640[22] | Proprietary | A fine-tuned variant of GPT-3, termed GPT-3.5, was made available to the public through a web interface called ChatGPT in 2022.[23]
GPT-Neo | March 2021 | EleutherAI | 2.7[24] | 825 GiB[25] | Unknown | MIT[26] | The first of a series of free GPT-3 alternatives released by EleutherAI. GPT-Neo outperformed an equivalent-size GPT-3 model on some benchmarks, but was significantly worse than the largest GPT-3.[26]
GPT-J | June 2021 | EleutherAI | 6[27] | 825 GiB[25] | 200[28] | Apache 2.0 | GPT-3-style language model trained on The Pile.
Megatron-Turing NLG | October 2021[29] | United States Microsoft and Nvidia | 530[30] | 338.6 billion tokens[30] | 38000[31] | Unreleased | Trained for 3 months on over 2000 A100 GPUs on the NVIDIA Selene Supercomputer, for over 3 million GPU-hours.[31]
Ernie 3.0 Titan | December 2021 | China Baidu | 260[32] | 4TB | Unknown | Proprietary | Chinese-language LLM initially used by Ernie Bot.
Claude[33] | December 2021 | United States Anthropic | 52[34] | 400 billion tokens[34] | Unknown | Proprietary | Fine-tuned for desirable behavior in conversations.[35]
GLaM (Generalist Language Model) | December 2021 | United States Google | 1200[36] | 1.6 trillion tokens[36] | 5600[36] | Proprietary | Sparse mixture-of-experts (MoE) model, making it more expensive to train but cheaper to run inference compared to GPT-3.
Gopher | December 2021 | United States Google DeepMind | 280[37] | 300 billion tokens[38] | 5833[39] | Proprietary | Later developed into the Chinchilla model.
LaMDA (Language Models for Dialog Applications) | January 2022 | United States Google | 137[40] | 1.56T words,[40] 168 billion tokens[38] | 4110[41] | Proprietary | Specialized for response generation in conversations.
GPT-NeoX | February 2022 | EleutherAI | 20[42] | 825 GiB[25] | 740[28] | Apache 2.0 | Based on the Megatron architecture.
Chinchilla | March 2022 | United States Google DeepMind | 70[43] | 1.4 trillion tokens[43][38] | 6805[39] | Proprietary | Reduced-parameter model trained on more data. Used in the Sparrow bot. Often cited for its neural scaling law.
PaLM (Pathways Language Model) | April 2022 | United States Google | 540[44] | 768 billion tokens[43] | 29,250[39] | Proprietary | Trained for ~60 days on ~6000 TPU v4 chips.[39]
OPT (Open Pretrained Transformer) | May 2022 | United States Meta | 175[45] | 180 billion tokens[46] | 310[28] | Non-commercial research[d] | GPT-3 architecture with some adaptations from Megatron. The team's training logbook was published.[47]
YaLM 100B | June 2022 | Russia Yandex | 100[48] | 1.7TB[48] | Unknown | Apache 2.0 | English-Russian model based on Microsoft's Megatron-LM.
Minerva | June 2022 | United States Google | 540[49] | 38.5B tokens from webpages filtered for mathematical content and from papers submitted to the arXiv preprint server[49] | Unknown | Proprietary | For solving "mathematical and scientific questions using step-by-step reasoning".[50] Initialized from PaLM models, then fine-tuned on mathematical and scientific data.
BLOOM | July 2022 | Large collaboration led by Hugging Face | 175[51] | 350 billion tokens (1.6TB)[52] | Unknown | Responsible AI | Essentially GPT-3 but trained on a multilingual corpus (30% English, excluding programming languages).
Galactica | November 2022 | United States Meta | 120 | 106 billion tokens[53] | Unknown | CC-BY-NC-4.0 | Trained on scientific text and modalities.
AlexaTM (Teacher Models) | November 2022 | United States Amazon | 20[54] | 1.3 trillion[55] | Unknown | Proprietary[56] | Uses a bidirectional sequence-to-sequence architecture.
Llama | February 2023 | United States Meta AI | 65[57] | 1.4 trillion[57] | 6300[58] | Non-commercial research[e] | Corpus has 20 languages. "Overtrained" (compared to the Chinchilla scaling law) for better performance with fewer parameters.[57][failed verification]
GPT-4 | March 2023 | United States OpenAI | Unknown[f] (1760, according to rumors)[60] | Unknown | Unknown, estimated 230,000 | Proprietary | Now available for all ChatGPT users; used in several products.
Cerebras-GPT | March 2023 | United States Cerebras | 13[61] | | 270[28] | Apache 2.0 | Trained with the Chinchilla neural scaling formula.
Falcon | March 2023 | United Arab Emirates Technology Innovation Institute | 40[62] | 1 trillion tokens, from RefinedWeb (filtered web text corpus)[63] plus some "curated corpora"[64] | 2800[58] | Apache 2.0[65] |
BloombergGPT | March 2023 | United States Bloomberg L.P. | 50 | 363 billion tokens from Bloomberg's proprietary data sources, plus 345 billion tokens from general-purpose datasets[66] | Unknown | Unreleased | Designed for financial tasks.[66]
PanGu-Σ | March 2023 | China Huawei | 1085 | 329 billion tokens[67] | Unknown | Proprietary |
OpenAssistant[68] | March 2023 | Germany LAION | 17 | 1.5 trillion tokens | Unknown | Apache 2.0 | Trained on crowdsourced, open conversational data.
Jurassic-2[69][70] | March 2023 | Israel AI21 Labs | Unknown | Unknown | Unknown | Proprietary |
PaLM 2 (Pathways Language Model 2) | May 2023 | United States Google | 340[71] | 3.6 trillion tokens[71] | 85,000[58] | Proprietary | Used in the Bard chatbot.[72]
YandexGPT | May 17, 2023 | Russia Yandex | Unknown | Unknown | Unknown | Proprietary | Used in the Alice chatbot.
Llama 2 | July 2023 | United States Meta AI | 70[73] | 2 trillion tokens[73] | 21,000 | Llama 2 license | Trained over 3.3 million GPU (A100) hours.[74]
Claude 2 | July 2023 | United States Anthropic | Unknown | Unknown | Unknown | Proprietary | Used in the Claude chatbot.[75]
Granite 13b | July 2023 | United States IBM | Unknown | Unknown | Unknown | Proprietary | Used in IBM Watsonx.[76]
Mistral 7B | September 2023 | France Mistral AI | 7.3[77] | Unknown | Unknown | Apache 2.0 |
YandexGPT 2 | September 7, 2023 | Russia Yandex | Unknown | Unknown | Unknown | Proprietary | Used in the Alice chatbot.
Claude 2.1 | November 2023 | United States Anthropic | Unknown | Unknown | Unknown | Proprietary | Used in the Claude chatbot. Has a context window of 200,000 tokens, or ~500 pages.[78]
Grok 1[79] | November 2023 | United States xAI | 314 | Unknown | Unknown | Apache 2.0 | Used in the Grok chatbot. Grok 1 has a context length of 8,192 tokens and has access to X (Twitter).[80]
Gemini 1.0 | December 2023 | United States Google DeepMind | Unknown | Unknown | Unknown | Proprietary | Multimodal model, comes in three sizes. Used in the chatbot of the same name.[81]
Mixtral 8x7B | December 2023 | France Mistral AI | 46.7 | Unknown | Unknown | Apache 2.0 | Outperforms GPT-3.5 and Llama 2 70B on many benchmarks.[82] Mixture-of-experts model, with 12.9 billion parameters activated per token (see the sketch after the table).[83]
DeepSeek-LLM | November 29, 2023 | China DeepSeek | 67 | 2T tokens[84]:table 2 | 12,000 | DeepSeek License | Trained on English and Chinese text. Used roughly 10²⁴ training FLOP for the 67B model and 10²³ FLOP for the 7B.[84]:figure 5
Phi-2 | December 2023 | United States Microsoft | 2.7 | 1.4T tokens | 419[85] | MIT | Trained on real and synthetic "textbook-quality" data over 14 days on 96 A100 GPUs.[85]
Gemini 1.5 | February 2024 | United States Google DeepMind | Unknown | Unknown | Unknown | Proprietary | Multimodal model based on a MoE architecture. Context window above 1 million tokens.[86]
Gemini Ultra | February 2024 | United States Google DeepMind | Unknown | Unknown | Unknown | Proprietary |
Gemma | February 2024 | United States Google DeepMind | 7 | 6T tokens | Unknown | Gemma Terms of Use[87] |
OLMo | February 2024 | United States Allen Institute for AI | 7[88] | 2T tokens[89] | Unknown | Apache 2.0 |
Claude 3 | March 2024 | United States Anthropic | Unknown | Unknown | Unknown | Proprietary | Includes three models: Haiku, Sonnet, and Opus.[90]
DBRX | March 2024 | United States Databricks and Mosaic ML | 136 | 12T tokens | Unknown | Databricks Open Model License[91][92] | Training cost 10 million USD.[citation needed]
YandexGPT 3 Pro | March 28, 2024 | Russia Yandex | Unknown | Unknown | Unknown | Proprietary | Used in the Alice chatbot.
Fugaku-LLM | May 2024 | Japan Fujitsu, Tokyo Institute of Technology, etc. | 13 | 380B tokens | Unknown | Fugaku-LLM Terms of Use[93] | The largest model ever trained using only CPUs, on the Fugaku supercomputer.[94]
Chameleon | May 2024 | United States Meta AI | 34[95] | 4.4 trillion | Unknown | Non-commercial research[96] |
Mixtral 8x22B[97] | April 17, 2024 | France Mistral AI | 141 | Unknown | Unknown | Apache 2.0 |
Phi-3 | April 23, 2024 | United States Microsoft | 14[98] | 4.8T tokens[citation needed] | Unknown | MIT | Marketed by Microsoft as a "small language model".[99]
Granite Code Models | May 2024 | United States IBM | Unknown | Unknown | Unknown | Apache 2.0 |
YandexGPT 3 Lite | May 28, 2024 | Russia Yandex | Unknown | Unknown | Unknown | Proprietary | Used in the Alice chatbot.
Qwen2 | June 2024 | China Alibaba Cloud | 72[100] | 3T tokens | Unknown | Qwen License | Multiple sizes, the smallest being 0.5B.
DeepSeek-V2 | June 2024 | China DeepSeek | 236 | 8.1T tokens | 28,000 | DeepSeek License | Trained for 1.4M GPU-hours on H800s.[101]
Nemotron-4 | June 2024 | United States Nvidia | 340 | 9T tokens | 200,000 | NVIDIA Open Model License[102][103] | Trained for 1 epoch on 6144 H100 GPUs between December 2023 and May 2024.[104][105]
Claude 3.5 | June 2024 | United States Anthropic | Unknown | Unknown | Unknown | Proprietary | Initially, only one model, Sonnet, was released.[106] In October 2024, Sonnet 3.5 was upgraded, and Haiku 3.5 became available.[107]
Llama 3.1 | July 2024 | United States Meta AI | 405 | 15.6T tokens | 440,000 | Llama 3 license | The 405B version took 31 million GPU-hours on H100-80GB, at 3.8×10²⁵ FLOP.[108][109]
Grok-2 | August 14, 2024 | United States xAI | Unknown | Unknown | Unknown | xAI Community License Agreement[110][111] | Originally closed-source, then re-released as "Grok 2.5" under a source-available license in August 2025.[112][113]
OpenAI o1 | September 12, 2024 | United States OpenAI | Unknown | Unknown | Unknown | Proprietary | First LLM described as a "reasoning model".[114][115][better source needed]
YandexGPT 4 Lite and Pro | October 24, 2024 | Russia Yandex | Unknown | Unknown | Unknown | Proprietary | Used in the Alice chatbot.
Mistral Large | November 2024 | France Mistral AI | 123 | Unknown | Unknown | Mistral Research License | Upgraded over time. The latest version is 24.11.[116]
Pixtral | November 2024 | France Mistral AI | 123 | Unknown | Unknown | Mistral Research License | Multimodal. There is also a 12B version under the Apache 2.0 license.[116]
OLMo 2 | November 2024 | United States Allen Institute for AI | 32[117][118] | 6.6T tokens[118] | 15,000[118] | Apache 2.0 | Initially had 7B and 13B parameter variants, with 32B released later.
Phi-4 | December 12, 2024 | United States Microsoft | 14[119] | 9.8T tokens | Unknown | MIT | Marketed by Microsoft as a "small language model".[120]
DeepSeek-V3 | December 2024 | China DeepSeek | 671 | 14.8T tokens | 56,000 | MIT | Used 2.788M H800 GPU-hours for training.[121] Originally released under the DeepSeek License, then re-released under the MIT License as "DeepSeek-V3-0324" in March 2025.[122]
Amazon Nova | December 2024 | United States Amazon | Unknown | Unknown | Unknown | Proprietary | Includes three models: Nova Micro, Nova Lite, and Nova Pro.[123]
DeepSeek-R1 | January 2025 | China DeepSeek | 671 | Not applicable | Unknown | MIT | No additional pretraining; trained with reinforcement learning on top of V3-Base.[124][125]
Qwen2.5 | January 2025 | China Alibaba | 72 | 18T tokens | Unknown | Qwen License | Seven dense models, with parameter counts from 0.5B to 72B. Alibaba also released two MoE variants.[126]
MiniMax-Text-01 | January 2025 | China Minimax | 456 | 4.7T tokens[127] | Unknown | Minimax Model license | [128][127]
Gemini 2.0 | February 2025 | United States Google DeepMind | Unknown | Unknown | Unknown | Proprietary | Three models released: Flash, Flash-Lite and Pro.[129][130][131]
Claude 3.7 | February 24, 2025 | United States Anthropic | Unknown | Unknown | Unknown | Proprietary | One model, Sonnet 3.7.[132]
YandexGPT 5 Lite Pretrain and Pro | February 25, 2025 | Russia Yandex | Unknown | Unknown | Unknown | Proprietary | Used in the Alice Neural Network chatbot.
GPT-4.5 | February 27, 2025 | United States OpenAI | Unknown | Unknown | Unknown | Proprietary | OpenAI's largest non-reasoning model at the time.[133]
Grok 3 | February 2025 | United States xAI | Unknown | Unknown | Unknown | Proprietary | Training cost claimed to be "10x the compute of previous state-of-the-art models".[134]
Gemini 2.5 | March 25, 2025 | United States Google DeepMind | Unknown | Unknown | Unknown | Proprietary | Three models released: Flash, Flash-Lite and Pro.[135]
YandexGPT 5 Lite Instruct | March 31, 2025 | Russia Yandex | Unknown | Unknown | Unknown | Proprietary | Used in the Alice Neural Network chatbot.
Llama 4 | April 5, 2025 | United States Meta AI | 400 | 40T tokens | Unknown | Llama 4 license | [136][137]
OpenAI o3 and o4-mini | April 16, 2025 | United States OpenAI | Unknown | Unknown | Unknown | Proprietary | Reasoning models.[138]
Qwen3 | April 2025 | China Alibaba Cloud | 235 | 36T tokens | Unknown | Apache 2.0 | Multiple sizes, the smallest being 0.6B.[139]
Claude 4 | May 22, 2025 | United States Anthropic | Unknown | Unknown | Unknown | Proprietary | Includes two models, Sonnet and Opus.[140]
Sarvam-M | May 23, 2025 | India Sarvam AI | 24 | Unknown | Unknown | Apache 2.0 | Hybrid reasoning model fine-tuned on a Mistral Small base; optimized for math, programming, and Indian languages.[141][142]
Grok 4 | July 9, 2025 | United States xAI | Unknown | Unknown | Unknown | Proprietary | [citation needed]
Param-1 | July 21, 2025 | BharatGen | 2.9[143] | 5T tokens "focus[ed] on India's linguistic landscape"[143] | Unknown | Unknown |
GLM-4.5 | July 29, 2025 | China Zhipu AI | 355 | 22T tokens[144][g] | Unknown | MIT | Released in 355B and 106B sizes.[145]
GPT-OSS | August 5, 2025 | United States OpenAI | 117 | Unknown | Unknown | Apache 2.0 | Released in 20B and 120B sizes.[146]
Claude 4.1 | August 5, 2025 | United States Anthropic | Unknown | Unknown | Unknown | Proprietary | Includes one model, Opus.[147]
GPT-5 | August 7, 2025 | United States OpenAI | Unknown | Unknown | Unknown | Proprietary | Includes three models: GPT-5, GPT-5 mini, and GPT-5 nano. Available in ChatGPT and via the API; includes reasoning abilities.[148][149]
DeepSeek-V3.1 | August 21, 2025 | China DeepSeek | 671 | 15.639T tokens | | MIT | Based on DeepSeek V3 (trained on 14.8T tokens); further trained on 839B tokens from the extension phases (630B + 209B).[150] A hybrid model that can switch between thinking and non-thinking modes.[151]
YandexGPT 5.1 Pro | August 28, 2025 | Russia Yandex | Unknown | Unknown | Unknown | Proprietary | Used in the Alice Neural Network chatbot.
Apertus | September 2, 2025 | Switzerland ETH Zurich and EPF Lausanne | 70 | 15 trillion[152] | Unknown | Apache 2.0 | The first LLM to be compliant with the Artificial Intelligence Act of the European Union.[153]
Claude Sonnet 4.5 | September 29, 2025 | United States Anthropic | Unknown | Unknown | Unknown | Proprietary | [154]
DeepSeek-V3.2-Exp | September 29, 2025 | China DeepSeek | 685 | | | MIT | Experimental model built upon V3.1-Terminus; uses a custom DeepSeek Sparse Attention (DSA) mechanism.[155][156][157]
GLM-4.6 | September 30, 2025 | China Zhipu AI | 357 | | | Apache 2.0 | [158][159][160]
Alice AI LLM 1.0 | October 28, 2025 | Russia Yandex | Unknown | Unknown | Unknown | Proprietary | Available in the Alice AI chatbot.
Gemini 3 | November 18, 2025 | United States Google DeepMind | Unknown | Unknown | Unknown | Proprietary | Two models released: Deep Think and Pro.[161]
Olmo 3[162] | November 20, 2025 | United States Allen Institute for AI | 32 | 5.9T tokens[163] | Unknown | Apache 2.0 | Includes 7B and 32B parameter versions, alongside reasoning and instruction-following variants.[163]
Claude Opus 4.5 | November 24, 2025 | United States Anthropic | Unknown | Unknown | Unknown | Proprietary | The largest model in the Claude family.[164]
GPT-5.2 | December 11, 2025 | United States OpenAI | Unknown | Unknown | Unknown | Proprietary | Solved an open problem in statistical learning theory that human researchers had been unable to resolve.[165]
GLM-4.7 | December 22, 2025 | China Zhipu AI | 355 | | | Apache 2.0 | MoE architecture. Open-source state of the art on coding benchmarks.[citation needed] A smaller Flash variant (30B-A3B) was released on January 19, 2026.
Qwen3-Max-Thinking | January 26, 2026 | China Alibaba Cloud | Unknown | Unknown | Unknown | Proprietary | Reasoning model with adaptive tool use, test-time scaling, and iterative self-reflection.[166]
Kimi K2.5 | January 27, 2026 | China Moonshot AI | 1040 | 15T tokens | | Modified MIT License | Multimodal MoE with 32B active parameters, derived from Kimi K2.[167] Can use "Agent Swarm" technology to coordinate up to 100 parallel sub-agents.[168][169]
Claude Opus 4.6 | February 5, 2026 | United States Anthropic | Unknown | Unknown | Unknown | Proprietary |
GPT-5.3-Codex | February 5, 2026 | United States OpenAI | Unknown | Unknown | Unknown | Proprietary |
GLM-5 | February 12, 2026 | China Zhipu AI | 754 | | | MIT | Specialized for agentic engineering and long-horizon tasks. Integrates DeepSeek Sparse Attention (DSA) for a 200K context window.
Param-2 | February 17, 2026 | BharatGen | 17 | ~22T tokens | Unknown | Unknown | Mixture-of-experts model, successor of Param-1; many more Indic languages are supported. Trained on H100 GPUs for 24 days.[170]
Sarvam-1[171] | February 18, 2026[h] | India Sarvam AI | 105 | ~12T tokens | Unknown | Apache 2.0 | India's first independently trained foundation model; has 105B and 30B versions.[173] Mixture-of-experts model, using only 10.3B active parameters at a time.[174] Superior in Indic languages.[compared to?]
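Several rows above pair a large total parameter count with a much smaller "active" count (e.g. Mixtral 8x7B, Kimi K2.5, Sarvam-1). A minimal sketch of that arithmetic: with n experts of which k are routed per token, each token runs only the shared weights plus k expert blocks. The shared/per-expert split below is back-solved to be consistent with Mixtral 8x7B's published totals; it is an assumption for illustration, not an official breakdown.

```python
# Sketch: why MoE rows list far fewer "active" than total parameters.
def moe_param_counts(shared: float, expert: float, n_experts: int, top_k: int):
    total = shared + n_experts * expert   # all experts exist in memory
    active = shared + top_k * expert      # but only top_k run per token
    return total, active

# Values chosen to match Mixtral 8x7B's published figures (46.7B total,
# ~12.9B active, 2-of-8 routing); the split itself is an assumption.
total, active = moe_param_counts(shared=1.63e9, expert=5.63e9, n_experts=8, top_k=2)
print(f"total ≈ {total / 1e9:.1f}B, active ≈ {active / 1e9:.1f}B")  # 46.7B, 12.9B
```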


Notes

  a. This is the date that documentation describing the model's architecture was first released.
  b. In many cases, researchers release or report on multiple versions of a model having different sizes. In these cases, the size of the largest model is listed here.
  c. This is the license of the pre-trained model weights. In almost all cases the training code itself is open-source or can be easily replicated. LLMs may be licensed differently from the chatbots that use them; for the licenses of chatbots, see List of chatbots.
  d. The smaller models, including 66B, are publicly available, while the 175B model is available on request.
  e. Facebook's license and distribution scheme restricted access to approved researchers, but the model weights were leaked and became widely available.
  f. As stated in the technical report: "Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method ..."[59]
  g. Corpus size was calculated by combining the 15-trillion-token and the 7-trillion-token pre-training mixes.
  h. An early checkpoint of the model was released in January.[172]

