Gemini (language model)

Large language model developed by Google

Gemini is a family of multimodal large language models (LLMs) developed by Google DeepMind, and the successor to LaMDA and PaLM 2. Comprising Gemini Pro, Gemini Deep Think, Gemini Flash, and Gemini Flash Lite,[1] it was announced on December 6, 2023. It powers the chatbot of the same name.

Gemini
Developers: Google AI, Google DeepMind
Initial release: December 6, 2023 (beta version); February 8, 2024 (official rollout)
Stable release: 3.1 Pro, 3 Deep Think, 3 Flash, 3.1 Flash Lite[1][2] / March 3, 2026
Predecessor: PaLM
Available in: English and other languages
Type: Large language model
License: Proprietary
Website: deepmind.google/technologies/gemini/

History

Development

Google announced Gemini, a large language model (LLM) developed by subsidiary Google DeepMind, during the Google I/O keynote on May 10, 2023. It was positioned as a more powerful successor to PaLM 2, which was also unveiled at the event, with Google CEO Sundar Pichai stating that Gemini was still in its early developmental stages.[3][4] Unlike other LLMs, Gemini was said to be unique in that it was not trained on a text corpus alone and was designed to be multimodal, meaning it could process multiple types of data simultaneously, including text, images, audio, video, and computer code.[5] It had been developed as a collaboration between DeepMind and Google Brain, two branches of Google that had been merged as Google DeepMind.[6] In an interview with Wired, DeepMind CEO Demis Hassabis touted Gemini's advanced capabilities, which he believed would allow the algorithm to trump OpenAI's ChatGPT, which runs on GPT-4 and whose growing popularity had been aggressively challenged by Google with LaMDA and Bard. Hassabis highlighted the strengths of DeepMind's AlphaGo program, which gained worldwide attention in 2016 when it defeated Go champion Lee Sedol, saying that Gemini would combine the power of AlphaGo and other Google–DeepMind LLMs.[7]

In August 2023, The Information published a report outlining Google's roadmap for Gemini, revealing that the company was targeting a launch date of late 2023. According to the report, Google hoped to surpass OpenAI and other competitors by combining conversational text capabilities present in most LLMs with artificial intelligence–powered image generation, allowing it to create contextual images and be adapted for a wider range of use cases.[8] Like Bard,[9] Google co-founder Sergey Brin was summoned out of retirement to assist in the development of Gemini, along with hundreds of other engineers from Google Brain and DeepMind;[8][10] he was later credited as a "core contributor" to Gemini.[11] Because Gemini was being trained on transcripts of YouTube videos, lawyers were brought in to filter out any potentially copyrighted materials.[8]

With news of Gemini's impending launch, OpenAI hastened its work on integrating GPT-4 with multimodal features similar to those of Gemini.[12] The Information reported in September that several companies had been granted early access to "an early version" of the LLM, which Google intended to make available to clients through Google Cloud's Vertex AI service. The publication also stated that Google was arming Gemini to compete with both GPT-4 and Microsoft's GitHub Copilot.[13][14]

Launch

On December 6, 2023, Pichai and Hassabis announced "Gemini 1.0" at a virtual press conference.[15][16] It contains three models: Gemini Ultra, designed for "highly complex tasks"; Gemini Pro, designed for "a wide range of tasks"; and Gemini Nano, designed for "on-device tasks". At launch, Gemini Pro and Nano were integrated into Bard and the Pixel 8 Pro smartphone, respectively, while Gemini Ultra was set to power "Bard Advanced" and become available to software developers in early 2024. Other products that Google intended to incorporate Gemini into included Search, Ads, Chrome, Duet AI on Google Workspace, and AlphaCode 2.[17][16] It was made available only in English.[16][18] Although Google touted Gemini as its "largest and most capable AI model", designed to emulate human behavior,[19][16][20] the company stated that Gemini would not be made widely available until the following year due to the need for "extensive safety testing".[15] Gemini was trained on and powered by Google's Tensor Processing Units (TPUs),[15][18] and the name references both the DeepMind–Google Brain merger and NASA's Project Gemini.[21]

Gemini Ultra was said to have outperformed GPT-4, Anthropic's Claude 2, Inflection AI's Inflection-2, Meta's LLaMA 2, and xAI's Grok 1 on a variety of industry benchmarks,[22][15] while Gemini Pro was said to have outperformed GPT-3.5.[5] Gemini Ultra was also the first language model to outperform human experts on the 57-subject Massive Multitask Language Understanding (MMLU) test, obtaining a score of 90%.[5][21] Gemini Pro was made available to Google Cloud customers on AI Studio and Vertex AI on December 13, while Gemini Nano was to be made available to Android developers as well.[23][24][25] Hassabis further revealed that DeepMind was exploring how Gemini could be "combined with robotics to physically interact with the world".[26] In accordance with an executive order signed by U.S. President Joe Biden in October, Google stated that it would share testing results of Gemini Ultra with the federal government of the United States. Similarly, the company was engaged in discussions with the government of the United Kingdom to comply with the principles laid out at the AI Safety Summit at Bletchley Park in November.[5]

In June 2025, Google introduced Gemini CLI, an open-source AI agent that brings the capabilities of Gemini directly to the terminal, offering advanced coding, automation, and problem-solving features with generous free usage limits for individual developers.[27]

Updates

In January 2024, Google partnered with Samsung to integrate Gemini Nano and Gemini Pro into its Galaxy S24 smartphone lineup.[28][29] The following month, Bard and Duet AI were unified under the Gemini brand,[30][31] with "Gemini Advanced with Ultra 1.0" releasing via a new "AI Premium" tier of the Google One subscription service.[32] Gemini Pro also received a global launch.[33]

In February 2024, Google launched Gemini 1.5, describing it as a more powerful and capable model than 1.0 Ultra.[34][35][36] Changes include a new architecture, a mixture-of-experts approach, and a larger one-million-token context window.[37] The same month, Google debuted Gemma, a smaller, free and open-source range of Gemini models. Multiple publications described this as a response to Meta and others open-sourcing their AI models, and a reversal from Google's prior practice of keeping its AI proprietary.[38][39][40] Google announced an additional model, Gemini 1.5 Flash, on May 14 at the 2024 I/O keynote.[41]
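The mixture-of-experts approach mentioned above can be sketched in miniature. The following NumPy example shows top-1 routing, where a learned gate sends each token to a single expert feed-forward network; the gating function, expert shapes, and routing strategy here are illustrative assumptions, not Gemini's actual architecture:

```python
import numpy as np

def moe_layer(x, gate_w, experts):
    """Route each token to one expert (top-1 gating).

    x: (tokens, d) activations; gate_w: (d, n_experts) router weights;
    experts: list of (W, b) feed-forward parameters, one pair per expert.
    """
    choice = (x @ gate_w).argmax(axis=-1)      # router picks one expert per token
    out = np.empty_like(x)
    for e, (W, b) in enumerate(experts):
        sel = choice == e                      # tokens routed to expert e
        if sel.any():
            out[sel] = x[sel] @ W + b          # only that expert's weights run
    return out
```

Because only one expert's weights are applied per token, total parameter count can grow with the number of experts while per-token compute stays roughly constant — the property that lets sparse models add capacity cheaply.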

Two updated Gemini models, Gemini-1.5-Pro-002 and Gemini-1.5-Flash-002, were released on September 24, 2024.[42][non-primary source needed]

On December 11, 2024, Google announced the new Gemini 2.0 Flash Experimental model.[43] Features include a Multimodal Live API for real-time audio and video interactions, native image and controllable text-to-speech generation (with watermarking), and integrated Google Search.[44] It also introduces improved agentic capabilities, a new Google Gen AI SDK, and "Jules", an experimental AI coding agent for GitHub.[45][non-primary source needed]

On January 30, 2025, Google released Gemini 2.0 Flash as the new default model, with Gemini 1.5 Flash still available for usage. This was followed by the release of Gemini 2.0 Pro on February 5, 2025. Additionally, Google released Gemini 2.0 Flash Thinking Experimental, which generates a summary of the language model's thinking process when responding to prompts.[46][non-primary source needed]

On March 12, 2025, Google also announced Gemini Robotics, a vision-language-action model based on the Gemini 2.0 family of models.[47][non-primary source needed] The next day, Google announced that Gemini in Android Studio would be able to understand simple UI mockups and transform them into working Jetpack Compose code.[48]

Gemini 2.5 Pro Experimental was released on March 25, 2025.[49][50][51] This model offers chain-of-thought reasoning while maintaining native multimodality.[49][51][52]

At Google I/O 2025, Google announced that Gemini 2.5 Flash would become the default model.[53] Gemini 2.5 Pro was introduced as the most advanced Gemini model, featuring better coding capabilities and other improvements.[54] Both 2.5 Pro and Flash support native audio output.

On June 17, 2025, Google announced general availability for 2.5 Pro and Flash. They also introduced Gemini 2.5 Flash-Lite that same day, a model optimized for speed and cost-efficiency.[55][non-primary source needed]

On November 18, 2025, Google announced the release of 3 Pro and 3 Deep Think, which replaced 2.5 Pro and Flash.[56] The launch prompted OpenAI to accelerate the release of its competing model, GPT-5.2, which followed on December 11.[57][58][59][60]

On December 4, 2025, Google announced that 3 Deep Think would start rolling out to Ultra subscribers.[61][non-primary source needed]

On December 17, 2025, Google announced the release of 3 Flash replacing the current version of 2.5 Flash.[62][non-primary source needed]

On January 12, 2026, Apple announced plans to use the Gemini AI model in the upcoming version of Siri.[63][64][65][66]

On February 19, 2026, Google released Gemini 3.1 Pro.[2][non-primary source needed]

On February 26, 2026, Nano Banana 2 was rolled out. It is an updated version built on the Gemini 3.1 Flash Image platform.[67]

On March 3, 2026, Google made Gemini 3.1 Flash Lite available to developers through its API.[68][69]

Model versions

The following table lists the main model versions of Gemini, describing the significant changes included with each version:[70][71]

Version Release date Status[72][55] Description
Bard 21 March 2023 Unsupported: Discontinued Google's first experimental chatbot service based on LaMDA.[73]
1.0 Nano 6 December 2023 Unsupported: Discontinued Designed for on-device tasks and first available in Google's Pixel 8 Pro.[74]
1.0 Pro 13 December 2023 Unsupported: Discontinued Designed for a diverse range of tasks.[74]
1.0 Ultra 8 February 2024 Unsupported: Discontinued Google's most powerful offering in the Gemini 1.0 family.[74]
1.5 Pro 15 February 2024 Unsupported: Discontinued As a successor to the 1.0 series of models, 1.5 Pro offers significantly increased context size (up to 1 million tokens). It is designed to be the most capable model in the Gemini 1.5 family.[75]
1.5 Flash 14 May 2024 Unsupported: Discontinued A lighter, faster model in the Gemini 1.5 family; it served as Gemini's free model.
2.0 Flash 30 January 2025 Supported: GA Developed by Google with a focus on multimodality, agentic capabilities, and speed.[76]
2.0 Flash-Lite 1 February 2025 Supported: GA First-ever Gemini Flash-Lite model designed for cost-efficiency and speed.[77]
2.0 Pro 5 February 2025 Supported: GA
2.5 Pro 25 March 2025 Latest version: GA
2.5 Flash 17 April 2025 Latest version: GA A faster, lower-cost model in the Gemini 2.5 family.
2.5 Flash-Lite 17 June 2025 Latest version: GA
2.5 Flash Image (Nano Banana) 26 August 2025 Latest version: GA
3 Pro 18 November 2025 Unsupported: Discontinued preview[78] Sparse mixture-of-experts. Outputs up to 64K tokens.[79]
3 Deep Think 3 December 2025 Future version: preview Based on the 2.5 Pro "Deep Think" mode, which achieved high ratings at the International Olympiad in Informatics (IOI).[80][81]
3 Pro Image (Nano Banana Pro) 20 November 2025 Future version: preview An improved version of Nano Banana with better text rendering and better real-world knowledge.[82]
3 Flash 17 December 2025 Future version: preview
3.1 Pro 19 February 2026 Future version: preview
3.1 Flash Image (Nano Banana 2) 26 February 2026 Future version: preview
3.1 Flash Lite 3 March 2026 Future version: preview

Nano Banana

AI-generated abstract art created with Nano Banana 2 in Gemini

Nano Banana (officially Gemini 2.5 Flash Image), Nano Banana Pro (officially Gemini 3 Pro Image) and Nano Banana 2 (officially Gemini 3.1 Flash Image) are image generation and editing models.

"Nano Banana" was the codename used for the model while it was undergoing secret public testing on Arena. It first appeared publicly as an anonymous model on the crowd-sourced AI evaluation platform Arena on August 12, 2025. It was released publicly on August 26, 2025 through the Gemini app and related Google AI services. The nickname "Nano Banana" originated from nicknames given to Naina Raisinghani, Product Manager at Google DeepMind.[83] Google later confirmed its identity as Gemini 2.5 Flash Image in an official announcement upon public release.[84][85] On November 20, 2025, DeepMind released Nano Banana Pro (Gemini 3 Pro Image) with improved text rendering and world knowledge.[86][87][88][89]

Upon release, Nano Banana became a viral Internet sensation on social media, particularly for its photorealistic "3D figurine" images. Following its release, Nano Banana was made available in the Gemini app, Google AI Studio, and through Vertex AI. According to Google, it helped attract over 10 million new users to the Gemini app and facilitated more than 200 million image edits within weeks of launch.[90][91]

The model lets users change hairstyles and backdrops and blend photos using natural-language prompts. Subject consistency keeps the same person or item recognizable across revisions of an image, multi-image fusion joins photographs into one seamless output, and world knowledge allows context-aware edits. It also applies SynthID watermarking, an invisible digital signature embedded in outputs to identify AI-generated content.[85][92] Nano Banana became associated with a viral trend in which people turned their selfies into 3D, toy-like figurines, which spread quickly on platforms such as Instagram and X (formerly Twitter).[93][94] After the model was integrated into X, users could tag Nano Banana directly in posts to generate photos from prompts, further boosting its popularity.[93]

A September 2025 review in TechRadar reported that Nano Banana was more realistic and consistent across multiple prompts than ChatGPT's image generation.[95] A review in Tom's Guide praised its ability to handle creative and lively image edits.[96] Another review in PC Gamer mentioned that the model lacked some basic editing tools like cropping, and that it sometimes failed to apply changes, reverting to the original image instead.[92] Nano Banana showed good performance in architectural visualization, producing imagery at the correct scale even with complex geometry.[97][98]

On 26 February 2026, Nano Banana 2 was rolled out and integrated into the Gemini chatbot, Search AI Mode, and Lens. It is a faster version built on Gemini 3.1 Flash Image, with better instruction following and text rendering.[99]

Technical specifications

As Gemini is multimodal, each context window can contain multiple forms of input. The different modes can be interleaved and do not have to be presented in a fixed order, allowing for a multimodal conversation. For example, the user might open the conversation with a mix of text, picture, video, and audio, presented in any order, and Gemini might reply with the same free ordering. Input images may be of different resolutions, while video is inputted as a sequence of images. Audio is sampled at 16 kHz and then converted into a sequence of tokens by the Universal Speech Model. Gemini's dataset is multimodal and multilingual, consisting of "web documents, books, and code, and includ[ing] image, audio, and video data".[100]
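The free interleaving described above can be pictured as each input part, whatever its modality, being converted to tokens and concatenated in the order the user supplied it. The following sketch uses toy stand-in encoders; the part types, token formats, and per-patch/per-sample counts are invented for illustration and say nothing about Gemini's actual tokenizers:

```python
def build_context(parts, encoders):
    """parts: list of (modality, payload) in user-supplied order;
    encoders: mapping from modality name to a payload -> token-list function."""
    tokens = []
    for modality, payload in parts:
        tokens.extend(encoders[modality](payload))  # each part becomes tokens
    return tokens                                   # one interleaved sequence

# Toy encoders standing in for a text tokenizer, an image patcher,
# and an audio codec operating on raw samples.
encoders = {
    "text": lambda s: [f"txt:{w}" for w in s.split()],
    "image": lambda img: [f"img:{p}" for p in img],          # one token per patch
    "audio": lambda samples: [f"aud:{i}" for i in range(len(samples) // 4)],
}

context = build_context(
    [("text", "describe this"), ("image", ["p0", "p1"]), ("audio", [0] * 8)],
    encoders,
)
```

Because the context is just one token sequence, a reply can likewise mix modalities in any order, which is what makes a free-form multimodal conversation possible.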

Gemini and Gemma models are decoder-only transformers, with modifications to allow efficient training and inference on TPUs. The 1.0 generation uses multi-query attention.[100]
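Multi-query attention, as cited for the 1.0 generation, shares a single key/value head across all query heads, shrinking the key-value cache that dominates inference memory. A minimal NumPy sketch of causal multi-query attention follows; the weight shapes and dimensions are illustrative, not Gemini's actual configuration:

```python
import numpy as np

def multi_query_attention(x, Wq, Wk, Wv, num_heads):
    """x: (seq, d_model); Wq: (d_model, num_heads*d_head);
    Wk, Wv: (d_model, d_head) -- one shared key/value head."""
    seq, _ = x.shape
    d_head = Wk.shape[1]
    q = (x @ Wq).reshape(seq, num_heads, d_head)   # per-head queries
    k = x @ Wk                                     # single shared key head
    v = x @ Wv                                     # single shared value head
    scores = np.einsum("shd,td->hst", q, k) / np.sqrt(d_head)
    # Causal mask: a decoder-only model may not attend to future positions.
    future = np.triu(np.ones((seq, seq), dtype=bool), k=1)
    scores = np.where(future[None, :, :], -np.inf, scores)
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)
    out = np.einsum("hst,td->shd", weights, v)     # attend over shared values
    return out.reshape(seq, num_heads * d_head)
```

With full multi-head attention the cache holds `num_heads` key/value tensors per layer; here it holds one, which is the efficiency modification the design targets.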

Technical specifications of Gemini models
Generation Variant Release date Parameters Context length Notes
1.0 Nano-1 6 December 2023 1.8B 32,768 Distilled from "larger Gemini models", 4-bit quantized[100]
Nano-2 6 December 2023 3.25B
Pro 13 December 2023 ?
Ultra 8 February 2024 ?
1.5 Pro 15 February 2024 ? 10,000,000[101][102] 1 million tokens in production API
Flash 14 May 2024

No whitepapers were published for Gemini 2.0, 2.5, and 3.

Reception

MIT Technology Review described speculation about Gemini's launch as "peak AI hype".[103][22] Hugh Langley of Business Insider described Gemini as a make-or-break moment for Google, writing: "If Gemini dazzles, it will help Google change the narrative that it was blindsided by Microsoft and OpenAI. If it disappoints, it will embolden critics who say Google has fallen behind."[104]

Reacting to its unveiling in December 2023, computer scientist Oren Etzioni predicted an arms race between Google and OpenAI. Computer scientist Alexei Efros praised the potential of Gemini's multimodal approach.[21] Professors Percy Liang, Emily Bender and Michael Madden cautioned that it was difficult to interpret benchmark scores without insight into the training data used.[103][105] Writing for Fast Company, Mark Sullivan opined that Google had the opportunity to challenge the iPhone's dominant market share, believing that Apple was unlikely to have the capacity to develop functionality similar to Gemini with its Siri virtual assistant.[106] Google shares increased by 5.3 percent the day after Gemini's launch.[107][108]

Google faced criticism for a demonstration video of Gemini, which was not recorded in real time.[109][vague]

Gemini 2.5 Pro Experimental debuted at the top position on the LMArena leaderboard, a benchmark measuring human preference.[49][51] The model achieved high results across various AI benchmarks, including Humanity's Last Exam.[49][110][51][50] Initial reviews mention its improved reasoning capabilities and performance gains compared to previous versions.[50][52] Published benchmarks also showed areas where contemporary models from competitors like Anthropic, xAI, or OpenAI held advantages.[110][51]

See also

References

Further reading
