GPT-J

Large language model developed by EleutherAI

GPT-J or GPT-J-6B is an open-source large language model (LLM) developed by EleutherAI in 2021. As the name suggests, it is a generative pre-trained transformer model designed to produce human-like text that continues from a prompt. The optional "6B" in the name refers to its 6 billion parameters.[1] The model is available on GitHub, but the official web interface no longer communicates with it; development stopped in 2021.[2]

GPT-J
Developer: EleutherAI
Initial release: June 9, 2021 (2021-06-09)
License: Apache License 2.0
Website: 6b.eleuther.ai

Architecture

GPT-J is a GPT-3-like model with 6 billion parameters. Like GPT-3, it is an autoregressive, decoder-only transformer model designed to solve natural language processing (NLP) tasks by predicting how a piece of text will continue.
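The autoregressive decoding described above can be sketched as a simple loop: at each step the model scores every token in the vocabulary given the context so far, and the chosen token is appended before the next step. The toy scorer below is a hypothetical stand-in for GPT-J's actual transformer forward pass, used only to make the loop runnable.

```python
# Minimal sketch of autoregressive (next-token) decoding, the scheme
# decoder-only models like GPT-J use. toy_model is a hypothetical
# stand-in: it scores each token in a tiny 5-token vocabulary; in
# GPT-J, a full transformer forward pass plays this role.

def toy_model(context):
    # Hypothetical scorer: always favors the successor of the last
    # token id, cycling through vocabulary ids 0..4.
    vocab_size = 5
    last = context[-1]
    return [1.0 if t == (last + 1) % vocab_size else 0.0
            for t in range(vocab_size)]

def generate(prompt_ids, n_new_tokens, model):
    """Greedy decoding: repeatedly append the highest-scoring next token."""
    ids = list(prompt_ids)
    for _ in range(n_new_tokens):
        scores = model(ids)  # one "forward pass" per generated token
        next_id = max(range(len(scores)), key=scores.__getitem__)
        ids.append(next_id)
    return ids

print(generate([0], 4, toy_model))  # -> [0, 1, 2, 3, 4]
```

Real decoders usually sample from the predicted distribution (with temperature, top-k, or nucleus sampling) rather than always taking the argmax as this greedy sketch does.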

The model has 28 transformer layers and 16 attention heads. Its vocabulary size is 50257 tokens, the same size as GPT-2's.[1] It has a context window size of 2048 tokens.[3][non-primary source needed]
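The hyperparameters above can be collected into a plain config sketch and cross-checked against the "6B" parameter count with the standard rule of thumb for transformer blocks. Note that the hidden size (`d_model`) is not stated in this article; 4096 is the widely reported value for GPT-J and is included here as an assumption.

```python
# Hyperparameters from the text, as a plain config dict.
config = {
    "n_layer": 28,        # transformer layers
    "n_head": 16,         # attention heads
    "vocab_size": 50257,  # same tokenizer vocabulary size as GPT-2
    "n_ctx": 2048,        # context window (tokens)
    "d_model": 4096,      # hidden size (assumed, not stated above)
}

# Rough parameter count: ~12 * n_layer * d_model^2 for the transformer
# blocks (attention + MLP weights), plus the token embedding matrix.
block_params = 12 * config["n_layer"] * config["d_model"] ** 2
embed_params = config["vocab_size"] * config["d_model"]
total = block_params + embed_params
print(f"~{total / 1e9:.1f}B parameters")  # close to the advertised 6B
```

The estimate lands near 5.8 billion, consistent with the model's "6B" label; the exact figure depends on biases, layer norms, and other small terms the rule of thumb ignores.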

It was trained on the Pile dataset,[1] using the Mesh Transformer JAX library in JAX to handle the parallelization scheme.[1][4][non-primary source needed]

GPT-J was designed to generate English text from a prompt. It was not designed for translating or generating text in other languages or for performance without first fine-tuning the model for a specific task. Like all LLMs, it is not programmed to give factually accurate information, only to generate text based on probability.[1][non-primary source needed]
