GPT-J
Large language model developed by EleutherAI
From Wikipedia, the free encyclopedia
GPT-J or GPT-J-6B is an open-source large language model (LLM) developed by EleutherAI in 2021. As the name suggests, it is a generative pre-trained transformer model designed to produce human-like text that continues from a prompt. The optional "6B" in the name refers to its 6 billion parameters.[1] The model is available on GitHub, but the web interface no longer communicates with it. Development stopped in 2021.[2]
| GPT-J | |
|---|---|
| Developer | EleutherAI |
| Initial release | June 9, 2021 |
| Type | Large language model (generative pre-trained transformer) |
| License | Apache License 2.0 |
| Website | 6b |
Architecture
GPT-J is a GPT-3-like model with 6 billion parameters. Like GPT-3, it is an autoregressive, decoder-only transformer model designed to solve natural language processing (NLP) tasks by predicting how a piece of text will continue.
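Autoregressive generation can be illustrated with a minimal sketch. It is not GPT-J's actual code: `generate` and `toy_next_token` are hypothetical names, and the toy predictor stands in for the transformer, which would return the next token given the tokens so far. The default of 2048 mirrors GPT-J's context window size.

```python
from typing import Callable, List


def generate(
    next_token: Callable[[List[int]], int],
    prompt: List[int],
    max_new_tokens: int,
    context_window: int = 2048,  # GPT-J's context window size
) -> List[int]:
    """Extend `prompt` one token at a time, feeding each output back in."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        # The model only sees the most recent `context_window` tokens.
        context = tokens[-context_window:]
        tokens.append(next_token(context))
    return tokens


def toy_next_token(context: List[int]) -> int:
    """Hypothetical stand-in for the model: predicts (last token + 1) mod 10."""
    return (context[-1] + 1) % 10


out = generate(toy_next_token, prompt=[3, 4], max_new_tokens=5)
print(out)  # [3, 4, 5, 6, 7, 8, 9]
```

Each predicted token is appended to the sequence and becomes part of the input for the next prediction, which is what "autoregressive" means here.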
The model has 28 transformer layers and 16 attention heads. Its vocabulary size is 50257 tokens, the same size as GPT-2's.[1] It has a context window size of 2048 tokens.[3][non-primary source needed]
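As a rough check on the "6 billion" figure, the parameter count can be estimated from these numbers. The sketch below relies on assumptions not stated in the text: a hidden size of 4096 and a feed-forward size of 16384 (values from the published GPT-J configuration), separate input and output embedding matrices, and no biases or layer-norm weights. The function name `estimate_parameters` is hypothetical.

```python
N_LAYERS = 28        # from the text
N_HEADS = 16         # from the text
VOCAB_SIZE = 50257   # from the text
D_MODEL = 4096       # assumption: hidden size, not stated in the text
D_FF = 4 * D_MODEL   # assumption: feed-forward inner size (16384)


def estimate_parameters(
    n_layers: int = N_LAYERS,
    d_model: int = D_MODEL,
    d_ff: int = D_FF,
    vocab_size: int = VOCAB_SIZE,
) -> int:
    """Approximate weight count, ignoring biases and layer norms."""
    attention = 4 * d_model * d_model        # query, key, value, output projections
    feed_forward = 2 * d_model * d_ff        # up- and down-projection
    embeddings = 2 * vocab_size * d_model    # assumed untied input and output matrices
    return n_layers * (attention + feed_forward) + embeddings


total = estimate_parameters()
head_dim = D_MODEL // N_HEADS
print(f"{total / 1e9:.2f} billion parameters")  # 6.05 billion parameters
print(f"{head_dim} dimensions per attention head")  # 256 dimensions per attention head
```

Under these assumptions the transformer layers account for most of the total, with the embedding matrices contributing roughly 0.4 billion.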
It was trained on the Pile dataset,[1] using the Mesh Transformer JAX library, which is built on JAX, to handle the parallelization scheme.[1][4][non-primary source needed]
GPT-J was designed to generate English text from a prompt. It was not designed for translation, for generating text in other languages, or for use on a specific task without first being fine-tuned for it. Like all LLMs, it is not programmed to give factually accurate information, only to generate text based on probability.[1][non-primary source needed]
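Generating text "based on probability" can be shown with a small sketch of softmax sampling. The vocabulary, the scores, and the function names are invented for illustration and do not come from GPT-J; a real model would produce one score (logit) for each of its 50257 tokens.

```python
import math
import random
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Turn raw scores into probabilities that sum to 1."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits: List[float], rng: random.Random, temperature: float = 1.0) -> int:
    """Draw one token index at random, weighted by its probability."""
    probs = softmax(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical three-word vocabulary with made-up scores for the next token.
vocab = ["Paris", "London", "Rome"]
logits = [2.0, 1.0, 0.5]

rng = random.Random(0)  # fixed seed so repeated runs draw the same samples
counts = {word: 0 for word in vocab}
for _ in range(1000):
    counts[vocab[sample_token(logits, rng)]] += 1
print(counts)
```

The highest-scoring token is chosen most often but not always. Nothing in this procedure checks whether the chosen continuation is true, which is why likely text and accurate text are not the same thing.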