List of large language models

From Wikipedia, the free encyclopedia

A large language model (LLM) is a type of machine learning model designed for natural language processing tasks such as language generation. LLMs are language models with many parameters, and are trained with self-supervised learning on a vast amount of text.

List

For the training cost column, 1 petaFLOP-day equals 1 petaFLOP/sec × 1 day, or 8.64×10^19 FLOP (floating point operations). Only the cost of the largest model is shown. The number of parameters is measured in billions,[a] and the training cost is measured in petaFLOP-days.
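
The unit conversion for the training cost column can be sketched in a few lines of Python. This is an illustrative sketch only: the helper names `petaflop_days_to_flop` and `flop_to_petaflop_days` are hypothetical and do not come from any library, and the sample values are the GPT-3 and Llama 3.1 training costs listed in the tables of this article.

```python
# Illustrative helpers (hypothetical names) for the training cost unit.
SECONDS_PER_DAY = 24 * 60 * 60                   # 86,400 seconds
PETA = 10**15                                    # 1 petaFLOP/sec = 10^15 FLOP/sec
FLOP_PER_PETAFLOP_DAY = PETA * SECONDS_PER_DAY   # 8.64 x 10^19 FLOP


def petaflop_days_to_flop(pf_days: float) -> float:
    """Convert a training cost in petaFLOP-days to total FLOP."""
    return pf_days * FLOP_PER_PETAFLOP_DAY


def flop_to_petaflop_days(flop: float) -> float:
    """Convert a total FLOP count to petaFLOP-days."""
    return flop / FLOP_PER_PETAFLOP_DAY


if __name__ == "__main__":
    # Training costs as listed in the tables (petaFLOP-days).
    for name, cost in [("GPT-3", 3640), ("Llama 3.1", 440_000)]:
        print(f"{name}: {petaflop_days_to_flop(cost):.3e} FLOP")
```

Under this conversion, the 440,000 petaFLOP-days listed for Llama 3.1 comes to roughly 3.8×10^25 FLOP, which agrees with the FLOP figure given in that model's notes.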

2018

Name Release date[b] Developer Number of parameters Corpus size Training cost License[c] Notes
GPT-1 Jun 11 OpenAI 0.117B Unknown 1[1] MIT[2]
First GPT model, decoder-only transformer. Trained for 30 days on 8 P600 GPUs.[3]
BERT Oct 2018 Google 0.340B[4] 3.3B words[4] 9[5] Apache 2.0[6]
An early and influential language model.[7] Encoder-only and thus not built to be prompted or generative.[8] Training took 4 days on 64 TPUv2 chips.[4]

2019

Name Release date[b] Developer Number of parameters Corpus size Training cost License[c] Notes
T5 Oct 2019 Google 11B[9] 34B tokens[9] Unknown Apache 2.0[10]
Base model for Google projects like Imagen.[11]
XLNet Jun 2019 Google 0.340B[12] 33B words 330 Apache 2.0[13]
An alternative to BERT; designed as encoder-only. Trained on 512 TPU v3 chips for 5.5 days.[14]
GPT-2 Feb 2019 OpenAI 1.5B[15] 40GB[16] (~10B tokens)[17] 28[18] MIT[19]
Trained on 32 TPUv3 chips for 1 week.[18]

2020

Name Release date[b] Developer Number of parameters Corpus size Training cost License[c] Notes
GPT-3 May 2020 OpenAI 175B[20] 300B tokens[17] 3640[21] Proprietary
A fine-tuned variant of GPT-3, termed GPT-3.5, was made available to the public through ChatGPT in 2022.[22]

2021

Name Release date[b] Developer Number of parameters Corpus size Training cost License[c] Notes
GPT-Neo Mar 2021 EleutherAI 2.7B[23] 825 GiB[24] Unknown MIT[25]
The first of a series of free GPT-3 alternatives released by EleutherAI. GPT-Neo outperformed an equivalent-size GPT-3 model on some benchmarks, but was significantly worse than the largest GPT-3.[25]
GPT-J Jun 2021 EleutherAI 6B[26] 825 GiB[24] 200[27] Apache 2.0
Megatron-Turing NLG Oct 2021[28] Microsoft and Nvidia 530B[29] 338.6B tokens[29] 38000[30] Unreleased
Trained for 3 months on over 2000 A100 GPUs on the NVIDIA Selene Supercomputer, for over 3 million GPU-hours.[30]
Ernie 3.0 Titan Dec 2021 Baidu 260B[31] 4TB Unknown Proprietary
Claude[32] Dec 2021 Anthropic 52B[33] 400B tokens[33] Unknown Proprietary
Fine-tuned for desirable behavior in conversations.[34]
GLaM (Generalist Language Model) Dec 2021 Google 1200B[35] 1.6T tokens[35] 5600[35] Proprietary
Gopher Dec 2021 Google DeepMind 280B[36] 300B tokens[37] 5833[38] Proprietary

2022

Name Release date[b] Developer Number of parameters Corpus size Training cost License[c] Notes
LaMDA (Language Models for Dialog Applications) Jan 2022 Google 137B[39] 1.56T words,[39] 168B tokens[37] 4110[40] Proprietary
GPT-NeoX Feb 2022 EleutherAI 20B[41] 825 GiB[24] 740[27] Apache 2.0
Chinchilla Mar 2022 Google DeepMind 70B[42] 1.4T tokens[42][37] 6805[38] Proprietary
PaLM (Pathways Language Model) Apr 2022 Google 540B[43] 768B tokens[42] 29,250[38] Proprietary
Trained for ~60 days on ~6000 TPU v4 chips.[38]
OPT (Open Pretrained Transformer) May 2022 Meta 175B[44] 180B tokens[45] 310[27] Non-commercial research[d]
GPT-3 architecture with some adaptations from Megatron. The training logbook written by the team was published.[46]
YaLM 100B Jun 2022 Yandex 100B[47] 1.7TB[47] Unknown Apache 2.0
Minerva Jun 2022 Google 540B[48] 38.5B tokens from webpages filtered for math content and from arXiv[48] Unknown Proprietary
For solving "mathematical and scientific questions using step-by-step reasoning".[49]
BLOOM Jul 2022 Large collaboration led by Hugging Face 175B[50] 350B tokens (1.6TB)[51] Unknown Responsible AI
Galactica Nov 2022 Meta 120B 106B tokens[52] Unknown CC-BY-NC-4.0
AlexaTM (Teacher Models) Nov 2022 Amazon 20B[53] 1.3T[54] Unknown Proprietary[55]

2023

Name Release date[b] Developer Number of parameters Corpus size Training cost License[c] Notes
Llama Feb 2023 Meta AI 65B[56] 1.4T[56] 6300[57] Non-commercial research[e]
GPT-4 Mar 2023 OpenAI Unknown[f] (according to rumors: 1760)[59] Unknown Unknown, estimated 230,000 Proprietary
Cerebras-GPT Mar 2023 Cerebras 13B[60] 270[27] Apache 2.0
Falcon Mar 2023 Technology Innovation Institute 40B[61] 1T tokens, from RefinedWeb (filtered web text corpus)[62] plus some "curated corpora".[63] 2800[57] Apache 2.0[64]
BloombergGPT Mar 2023 Bloomberg L.P. 50B 363B tokens from Bloomberg's proprietary data sources, plus 345B tokens from general purpose datasets[65] Unknown Unreleased
Designed for financial tasks.[65]
PanGu-Σ Mar 2023 Huawei 1085B 329B tokens[66] Unknown Proprietary
OpenAssistant[67] Mar 2023 LAION 17B 1.5T tokens Unknown Apache 2.0
Jurassic-2[68][69] Mar 2023 AI21 Labs Unknown Unknown Unknown Proprietary
PaLM 2 (Pathways Language Model 2) May 2023 Google 340B[70] 3.6T tokens[70] 85,000[57] Proprietary
Used in the Bard chatbot.[71]
YandexGPT May 17, 2023 Yandex Unknown Unknown Unknown Proprietary
Phi-1 Jun 21, 2023 Microsoft 1.3B[72] 7B tokens[72] Unknown MIT
Trained for 4 days on 8 A100s.[72]
Llama 2 Jul 2023 Meta AI 70B[73] 2T tokens[73] 21,000 Llama 2
Trained for 3.3 million A100 GPU-hours.[74]
Claude 2 Jul 2023 Anthropic Unknown Unknown Unknown Proprietary
Used in the Claude chatbot.[75]
Granite 13b Jul 2023 IBM Unknown Unknown Unknown Proprietary
Used in IBM Watsonx.[76]
Mistral 7B Sep 2023 Mistral AI 7.3B[77] Unknown Unknown Apache 2.0
YandexGPT 2 Sep 7, 2023 Yandex Unknown Unknown Unknown Proprietary
Claude 2.1 Nov 2023 Anthropic Unknown Unknown Unknown Proprietary
Used in the Claude chatbot. Has a context window of 200,000 tokens, or ~500 pages.[78]
Grok-1[79] Nov 2023 xAI 314B Unknown Unknown Apache 2.0
Used in the Grok chatbot. Grok 1 has a context length of 8,192 tokens and has access to X (Twitter).[80]
Gemini 1.0 Dec 2023 Google DeepMind Unknown Unknown Unknown Proprietary
Multimodal model, comes in three sizes. Used in the chatbot of the same name.[81]
Mixtral 8x7B Dec 2023 Mistral AI 46.7B Unknown Unknown Apache 2.0
Outperforms GPT-3.5 and Llama 2 70B on many benchmarks.[82] Mixture of experts model, with 12.9 billion parameters activated per token.[83]
DeepSeek-LLM Nov 29, 2023 DeepSeek 67B 2T tokens[84]:table 2 12,000 DeepSeek
Trained on English and Chinese text. Used 10^24 training FLOPs for the 67B model and 10^23 FLOPs for the 7B model.[84]:figure 5
Phi-2 Dec 2023 Microsoft 2.7B 1.4T tokens 419[85] MIT
Trained on real and synthetic "textbook-quality" data over 14 days on 96 A100 GPUs.[85]

2024

Name Release date[b] Developer Number of parameters Corpus size Training cost License[c] Notes
Gemini 1.5 Feb 2024 Google DeepMind Unknown Unknown Unknown Proprietary
Multimodal model based on a MoE architecture. Context window above 1 million tokens.[86]
Gemini Ultra Feb 2024 Google DeepMind Unknown Unknown Unknown Proprietary
Gemma Feb 2024 Google DeepMind 7B 6T tokens Unknown Gemma Terms of Use[87]
OLMo Feb 2024 Allen Institute for AI 7B[88] 2T tokens[89] Unknown Apache 2.0
Claude 3 Mar 2024 Anthropic Unknown Unknown Unknown Proprietary
Includes three models: Haiku, Sonnet, and Opus.[90]
DBRX Mar 2024 Databricks and Mosaic ML 136B 12T tokens Unknown Databricks Open Model[91][92]
YandexGPT 3 Pro Mar 28, 2024 Yandex Unknown Unknown Unknown Proprietary
Fugaku-LLM[93] May 2024 Fujitsu, Tokyo Institute of Technology, Tohoku University, RIKEN, etc. 13B 380B tokens Unknown Fugaku-LLM Terms of Use[94]
The largest model ever trained using only CPUs. It was trained from scratch on 380 billion tokens using 13,824 nodes of the Fugaku supercomputer.[93][95]
Chameleon May 2024 Meta AI 34B[96] 4.4T Unknown Non-commercial research[97]
Mixtral 8x22B[98] Apr 17, 2024 Mistral AI 141B Unknown Unknown Apache 2.0
Phi-3 Apr 23, 2024 Microsoft 14B[99] 4.8T tokens[100] Unknown MIT
Marketed by Microsoft as a "small language model".[99]
Granite Code Models May 2024 IBM Unknown Unknown Unknown Apache 2.0
YandexGPT 3 Lite May 28, 2024 Yandex Unknown Unknown Unknown Proprietary
Qwen2 Jun 2024 Alibaba Cloud 72B[101] 3T tokens Unknown Various
DeepSeek-V2 Jun 2024 DeepSeek 236B 8.1T tokens 28,000 DeepSeek
Trained for 1.4M GPU-hours on H800 GPUs.[102]
Nemotron-4 Jun 2024 Nvidia 340B 9T tokens 200,000 NVIDIA Open Model[103][104]
Trained for 1 epoch. Trained on 6144 H100 GPUs between December 2023 and May 2024.[105][106]
Claude 3.5 Jun 2024 Anthropic Unknown Unknown Unknown Proprietary
Initially, only one model, Sonnet, was released.[107] In October 2024, Sonnet 3.5 was upgraded, and Haiku 3.5 became available.[108]
Llama 3.1 Jul 2024 Meta AI 405B 15.6T tokens 440,000 Llama 3
The 405B version took 31 million GPU-hours on H100-80GB, for a total of 3.8×10^25 FLOPs.[109][110]
Grok-2 Aug 14, 2024 xAI Unknown Unknown Unknown xAI Community License Agreement[111][112]
Originally closed-source, then re-released as "Grok 2.5" under a source-available license in August 2025.[113][114]
OpenAI o1 Sep 12, 2024 OpenAI Unknown Unknown Unknown Proprietary
Sarvam-1 Oct 24, 2024 Sarvam AI 2B ~2T tokens Unknown Sarvam AI Research
Supports 10 Indic languages and English.[117][118]
YandexGPT 4 Lite and Pro Oct 24, 2024 Yandex Unknown Unknown Unknown Proprietary
Mistral Large Nov 2024 Mistral AI 123B Unknown Unknown Mistral Research
Upgraded over time. The latest version is 24.11.[119]
Pixtral Nov 2024 Mistral AI 123B Unknown Unknown Mistral Research
Multimodal. There is also a 12B version which is under Apache 2 license.[119]
OLMo 2 Nov 2024 Allen Institute for AI 32B[120][121] 6.6T tokens[121] 15,000[121] Apache 2.0
Phi-4 Dec 12, 2024 Microsoft 14B[122] 9.8T tokens Unknown MIT
Marketed by Microsoft as a "small language model".[123]
DeepSeek-V3 Dec 2024 DeepSeek 671B 14.8T tokens 56,000 MIT
Used 2.788M training hours on H800 GPUs.[124] Originally released under the DeepSeek License, then re-released under the MIT License as "DeepSeek-V3-0324" in March 2025.[125]
Amazon Nova Dec 2024 Amazon Unknown Unknown Unknown Proprietary
Includes three models: Nova Micro, Nova Lite, and Nova Pro.[126]

2025

Name Release date[b] Developer Number of parameters Corpus size License[c] Notes
DeepSeek-R1 Jan 20 DeepSeek 671B Not applicable MIT
Not pretrained from scratch; trained with reinforcement learning on top of V3-Base.[127][128]
Qwen2.5 Jan 26 Alibaba 72B 18T tokens Various
7 dense models with parameter counts from 0.5B to 72B. Alibaba also released 2 MoE variants.[129]
MiniMax-Text-01 Jan 14 Minimax 456B 4.7T tokens[130] Minimax Model
Gemini 2.0 Feb 5 Google DeepMind Unknown Unknown Proprietary
Three models released: Flash, Flash-Lite and Pro.[132][133][134]
Grok 3 Feb 19 xAI Unknown Unknown Proprietary
Training cost claimed to be "10x the compute of previous state-of-the-art models".[135]
Claude 3.7 Feb 24 Anthropic Unknown Unknown Proprietary
One model, Sonnet 3.7.[136]
YandexGPT 5 Lite Pretrain and Pro Feb 25 Yandex Unknown Unknown Proprietary
GPT-4.5 Feb 27 OpenAI Unknown Unknown Proprietary
OpenAI's largest non-reasoning model at the time.[137]
Gemini 2.5 Mar 25 Google DeepMind Unknown Unknown Proprietary
Three models released: Flash, Flash-Lite and Pro.[138]
YandexGPT 5 Lite Instruct Mar 31 Yandex Unknown Unknown Proprietary
Llama 4 Apr 5 Meta AI 400B 40T tokens Llama 4
OpenAI o3 and o4-mini Apr 16 OpenAI Unknown Unknown Proprietary
Reasoning models.[141]
Qwen3 Apr 28 Alibaba Cloud 235B 36T tokens Apache 2.0
Multiple sizes, the smallest being 0.6B.[142]
Claude 4 May 22 Anthropic Unknown Unknown Proprietary
Includes two models, Sonnet and Opus.[143]
Sarvam-M May 23 Sarvam AI 24B Unknown Apache 2.0
Hybrid reasoning model fine-tuned on Mistral Small base; optimized for math, programming, and Indian languages.[144][145]
Grok 4 Jul 9 xAI Unknown Unknown Proprietary
Param-1 Jul 21 BharatGen 2.9B[147] 5T tokens[g][147] Apache 2.0
GLM-4.5 Jul 29 Z.ai 355B 22T tokens[149][h] MIT
Released in 355B and 106B sizes.[150]
GPT-OSS Aug 5 OpenAI 117B Unknown Apache 2.0
Released in 20B and 120B sizes.[151]
Claude 4.1 Aug 5 Anthropic Unknown Unknown Proprietary
Includes one model, Opus.[152]
GPT-5 Aug 7 OpenAI Unknown Unknown Proprietary
Includes three models: GPT-5, GPT-5 mini, and GPT-5 nano. GPT-5 is available in ChatGPT and through the API. It includes reasoning abilities.[153][154]
DeepSeek-V3.1 Aug 21 DeepSeek 671B 15.639T MIT
Based on DeepSeek V3 (trained on 14.8T tokens); further trained on 839B tokens from the extension phases (630B + 209B).[155] A hybrid model that can switch between thinking and non-thinking modes.[156]
YandexGPT 5.1 Pro Aug 28 Yandex Unknown Unknown Proprietary
Apertus Sep 2 ETH Zurich and EPF Lausanne 70B 15T[157] Apache 2.0
The first LLM to be compliant with the Artificial Intelligence Act of the European Union.[158]
Claude Sonnet 4.5 Sep 29 Anthropic Unknown Unknown Proprietary
GLM-4.6 Sep 30 Z.ai 357B Unknown Apache 2.0
Alice AI LLM 1.0 Oct 28 Yandex Unknown Unknown Proprietary
Gemini 3 Nov 18 Google DeepMind Unknown Unknown Proprietary
Models released: Deep Think and Pro.[163]
Olmo 3[164] Nov 20 Allen Institute for AI 32B 5.9T tokens[165] Apache 2.0
Includes 7B and 32B parameter versions, alongside reasoning and instruction-following models.[165]
Claude Opus 4.5 Nov 24 Anthropic Unknown Unknown Proprietary
Largest model in the Claude family.[166]
DeepSeek-V3.2 Dec 1 DeepSeek 685B Unknown MIT
Uses a custom DeepSeek Sparse Attention (DSA) mechanism.[167][168][169]
GPT-5.2 Dec 11 OpenAI Unknown Unknown Proprietary
Solved a previously open problem in statistical learning theory.[170]
GLM-4.7 Dec 22 Z.ai 355B Unknown Apache 2.0

2026

Name Release date[b] Developer Number of parameters Corpus size License[c] Notes
Qwen3-Max-Thinking Jan 26 Alibaba Cloud Unknown Unknown Proprietary
Proprietary reasoning model with adaptive tool-use, test-time scaling, and iterative self-reflection.[171]
Kimi K2.5 Jan 27 Moonshot AI 1040B 15T tokens Modified MIT
Multimodal MoE with 32B active parameters, derived from Kimi K2.[172] Can use "Agent Swarm" technology to coordinate up to 100 parallel sub-agents.[173][174]
Step-3.5-Flash Feb 12 StepFun 196B Unknown Apache 2.0
MoE model with 11B active parameters out of 196B total.[175][176][177]
Claude Opus 4.6 Feb 5 Anthropic Unknown Unknown Proprietary
GPT-5.3-Codex Feb 5 OpenAI Unknown Unknown Proprietary
GLM-5 Feb 12 Z.ai 754B Unknown MIT
Claude Sonnet 4.6 Feb 17 Anthropic Unknown Unknown Proprietary
Param-2 Feb 17 BharatGen 17B ~22T tokens BharatGen Research[178]
Mixture-of-experts model, successor of Param-1; many more Indic languages are supported. Trained on H100 GPUs for 24 days.[179]
Sarvam-105B Feb 18[i] Sarvam AI 105B[181] 12T tokens[181] Apache 2.0
India's first independently trained foundation model; has 105B and 30B versions. A mixture-of-experts model that uses only 10.3B active parameters at a time.[182] Interprets Indic languages and Hinglish.[183][184]
Sarvam-30B 30B[181] 16T tokens[181]
GPT-5.4 Mar 5 OpenAI Unknown Unknown Proprietary
Mistral Small 4 Mar 17 Mistral AI 119B Unknown Apache 2.0
MoE model with 6B active parameters out of 119B total.[185][186]
MiMo-V2-Pro Mar 18 Xiaomi 1000B[187] Unknown Proprietary
Mixture-of-experts (MoE) model with more than 1 trillion parameters (43 billion active). Designed for agentic scenarios. Initially available on OpenRouter under the codename "Hunter Alpha" before official release.[188]
Gemma 4 Apr 2 Google DeepMind 31B Unknown Apache 2.0
Released in 31B, 26B A4B (3.8 billion active parameters), E4B (4 billion effective parameters), and E2B variants.[189][190]
GLM-5.1 Apr 7 Z.ai 754B Unknown MIT
MoE model designed for agentic coding.[191][192]
Muse Spark Apr 8 Meta Superintelligence Labs Unknown Unknown Proprietary
Qwen3.6 (Qwen3.6-35B-A3B) Apr 15 Alibaba Cloud 35B Unknown Apache 2.0
MoE model with 3B active parameters out of 35B total.[194][195]
Claude Opus 4.7 Apr 16 Anthropic Unknown Unknown Proprietary
GPT-5.5 Apr 23 OpenAI Unknown Unknown Proprietary
DeepSeek-V4-Flash Apr 24 DeepSeek 284B 32T MIT
Preview release.[196]
DeepSeek-V4-Pro 1.6T
MiMo-V2.5-Pro Apr 27 Xiaomi 1.02T 48T MIT
MoE model designed for agentic coding and long-horizon software engineering tasks.[197][198]
MiMo-V2.5 310B 27T
Omni-modal MoE model with agentic capabilities and 1M-token context.[199]
Gemini 3.5 Flash May 19 Google DeepMind Unknown Unknown Proprietary
Claude Opus 4.8 May 28 Anthropic Unknown Unknown Proprietary
Step 3.7 Flash May 29 StepFun 198B[j] Unknown Apache 2.0

Notes

  a. In many cases, researchers release or report on multiple versions of a model having different sizes. In these cases, the size of the largest model is listed here.
  b. This is the date that documentation describing the model's architecture was first released.
  c. This is the license of the pre-trained model weights. In almost all cases the training code itself is open-source or can be easily replicated. LLMs may be licensed differently from the chatbots that use them; for the licenses of chatbots, see List of chatbots.
  d. The smaller models, up to and including 66B, are publicly available, while the 175B model is available on request.
  e. Facebook's license and distribution scheme restricted access to approved researchers, but the model weights were leaked and became widely available.
  f. As stated in the technical report: "Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method ..."[58]
  g. "focus[ed] on India’s linguistic landscape"
  h. Corpus size was calculated by adding the 15 trillion tokens to the 7 trillion tokens of the pre-training mix.
  i. An early checkpoint of the model was released in January.[180]
  j. 196B + 1.8B (ViT)

References
