Artificial intelligence (AI) and ML
  4 April 2023

How does ChatGPT work?

OpenAI’s release of ChatGPT has had a significant impact on the world of artificial intelligence. This AI assistant has captured wide attention by handling a variety of tasks in one place: it can answer questions, troubleshoot code, and even compose poems, making it a versatile companion for people in various fields. Because it works through natural language, it is easy to interact with, and it has the potential to change the way people work, learn, and communicate.
Although many people use ChatGPT daily, its inner workings remain a mystery to most. In this article, we will uncover how it works by introducing the key technologies that make it possible.

What are "Transformers"? (Introduced by Google)

The development of GPT technology owes a great deal to a key research result from Google known as the Transformer. In 2017, Google researchers published a paper [1] called “Attention Is All You Need” introducing the Transformer architecture, a neural network architecture that relies entirely on attention mechanisms, with no recurrence or convolutions involved. The attention mechanism within a transformer neural network serves as a spotlight that shines on the most relevant information in the input data, allowing the model to process long sequences of data more efficiently and accurately. This mechanism is a significant improvement over the recurrent and convolutional models that came before it, and it plays a crucial role in enabling GPT technology to operate effectively.
In a transformer neural network, each input token is represented as a vector. For every position in the sequence, the attention mechanism computes a weight for each of the other vectors based on how relevant it is to the position being processed. Vectors with higher weights contribute more to that position’s updated representation, while vectors with lower weights contribute less. By selectively attending to the most important parts of the input data, the model can make better predictions and learn more effectively. This attention mechanism has been particularly successful in natural language processing tasks, where long sequences of text need to be processed.
Figure 1: The encoder self-attention distribution for the word “it” from the 5th to the 6th layer of a Transformer trained on English-to-French translation (one of eight attention heads). Source: Google AI Blog.
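The weighting described above can be sketched in a few lines of NumPy. This is a minimal, single-head illustration of the scaled dot-product attention from [1], not the code of any production model: the projection matrices W_q, W_k and W_v are random placeholders (in a real model they are learned), and the sizes (4 tokens, 8 dimensions) are arbitrary assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum first for numerical stability.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Relevance score of every position (rows) to every other position (columns).
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # Turn scores into weights that sum to 1 for each position.
    weights = softmax(scores, axis=-1)
    # Each output vector is a weighted average of the value vectors.
    return weights @ V, weights

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))  # 4 tokens, 8-dimensional embeddings (arbitrary sizes)
# Hypothetical projection matrices; random here, learned in a real model.
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))

output, weights = scaled_dot_product_attention(X @ W_q, X @ W_k, X @ W_v)
print(weights.round(2))  # row i shows how strongly token i attends to each token
```

Each row of `weights` is the "spotlight" for one token: larger entries mark the tokens that contribute most to its new representation.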

Introduction of GPT (Generative Pre-trained Transformer)

The Generative Pre-trained Transformer (GPT) is a neural network architecture that has reshaped natural language processing. GPT is a type of transformer model that is pre-trained, without human-written labels, on vast amounts of text data, such as books, articles, and websites.

Learning Mechanism

At its core, GPT uses a decoder-only transformer architecture that employs causal (masked) attention. This means that it can generate text by predicting the next word in a sequence based on the preceding words. To do this, GPT uses a sequence of self-attention layers, where each layer takes in a sequence of embeddings representing the input text and applies attention mechanisms to identify the most relevant information in the sequence. The output of each layer is then passed to the next layer, allowing the model to learn increasingly complex relationships between words and phrases in the text.
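To make "causal (masked) attention" concrete, here is a small sketch, assuming NumPy and a toy 4-token sequence. The function name `causal_attention_weights` is made up for this illustration. The idea is that scores for future positions are set to negative infinity before the softmax, so each token can only attend to itself and the tokens before it.

```python
import numpy as np

def causal_attention_weights(scores):
    # scores[i, j] = relevance of position j to position i (square matrix).
    n = scores.shape[0]
    # True above the diagonal marks "future" positions (j > i).
    future = np.triu(np.ones((n, n), dtype=bool), k=1)
    masked = np.where(future, -np.inf, scores)
    # Softmax over each row; exp(-inf) = 0, so future positions get zero weight.
    masked = masked - masked.max(axis=-1, keepdims=True)
    e = np.exp(masked)
    return e / e.sum(axis=-1, keepdims=True)

# Toy example: identical scores everywhere, so the mask alone shapes the weights.
scores = np.zeros((4, 4))
causal_weights = causal_attention_weights(scores)
print(causal_weights)
```

With equal scores, the first token attends only to itself, and the last token spreads its attention evenly over all four positions. This is what lets the model predict the next word from the preceding words without seeing the answer.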


During pre-training, GPT is trained on massive amounts of text data, typically in the form of books, articles, and web pages. The goal of pre-training is to teach the model to learn the underlying patterns and relationships between words and phrases in natural language. This is achieved by using a self-supervised learning approach, where the model is trained to predict the next word in a sequence based on the preceding words. By doing so, the model can learn to generate natural language text that is coherent and contextually relevant.
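The next-word objective can be written down as a cross-entropy loss. The sketch below is an illustration under simplifying assumptions, not GPT's actual training code: the vocabulary has only 5 tokens, the token IDs are invented, and the "model outputs" (`logits`) are filled in by hand rather than produced by a network.

```python
import numpy as np

def next_token_loss(logits, token_ids):
    """Average cross-entropy of predicting token t+1 from position t."""
    predictions = logits[:-1]   # outputs at positions 0..n-2
    targets = token_ids[1:]     # the tokens that actually come next
    # Log-softmax, computed in a numerically stable way.
    shifted = predictions - predictions.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Pick out the log-probability assigned to each correct next token.
    return -log_probs[np.arange(len(targets)), targets].mean()

vocab_size = 5                       # toy vocabulary (assumption)
token_ids = np.array([2, 0, 3, 1])   # an invented 4-token "sentence"

# A model that has learned nothing: every token is equally likely.
uniform_logits = np.zeros((len(token_ids), vocab_size))
uniform_loss = next_token_loss(uniform_logits, token_ids)

# A model that strongly favors the correct next token at each position.
good_logits = np.zeros((len(token_ids), vocab_size))
good_logits[np.arange(len(token_ids) - 1), token_ids[1:]] = 10.0
good_loss = next_token_loss(good_logits, token_ids)

print(uniform_loss, good_loss)
```

Pre-training repeatedly adjusts the model's weights to push this loss down across billions of such positions. No labels are needed, because the text itself supplies the "correct" next word.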

Fine Tuning

Once pre-training is complete, the model can be fine-tuned on a smaller dataset for a specific task, such as text classification or language generation. Fine-tuning involves adjusting the weights of the pre-trained model to optimize its performance on the specific task at hand. Because it starts from the pre-trained weights rather than from scratch, the fine-tuned model can achieve strong performance on a wide range of natural language processing tasks, often with far less task-specific data.


GPT has demonstrated impressive capabilities in generating high-quality natural language text. For example, it can be used to generate coherent and contextually relevant responses in chatbots, summarize long articles or documents, or even write entire articles or stories. However, the sheer size of GPT and its computational requirements make it challenging to deploy and use in certain applications. Nonetheless, GPT represents a significant milestone in the development of transformer models for natural language processing, and it has inspired further research and innovation in this field.


In short, the Generative Pre-trained Transformer (GPT) combines two steps: pre-training a large neural network on vast amounts of text data, then fine-tuning it for specific tasks. The result is a model that can generate natural language text that is coherent and contextually relevant. GPT has opened new possibilities in applications such as chatbots, language translation, and text summarization, and it is likely to play a major role in shaping the future of natural language processing and artificial intelligence more broadly.
GPT (2018)
  • Transformer model designed for natural language processing tasks
  • 117 million parameters
GPT-2 (2019)
  • 1.5 billion parameters
  • Trained on a massive corpus of text from the internet, consisting of 8 million web pages (40GB)
GPT-3 (2020)
  • A massive 175 billion parameters
  • Trained on a diverse range of internet text
  • Demonstrated remarkable capabilities in natural language processing tasks, including language translation, question-answering, and text completion
GPT-4 (2023)
  • Multimodal model (accepting image and text inputs, emitting text outputs)
Figure 2: Evolution of the GPT family of large language models.
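The parameter counts listed above can be roughly sanity-checked with a common rule of thumb: a transformer block with model width d has about 12·d² weights (4·d² for the attention projections and 8·d² for a feed-forward layer four times as wide). The layer counts and widths below come from the published model papers, not from this article, so treat them as assumptions. The estimate also ignores embedding tables, biases and layer norms.

```python
def approx_block_params(n_layers: int, d_model: int) -> int:
    # 4*d^2 (query, key, value and output projections)
    # + 8*d^2 (two feed-forward matrices with a 4x wider hidden layer)
    return 12 * n_layers * d_model ** 2

# (layers, width) as reported in the respective papers; assumptions here.
configs = {
    "GPT (2018)":   (12, 768),
    "GPT-2 (2019)": (48, 1600),
    "GPT-3 (2020)": (96, 12288),
}

estimates = {name: approx_block_params(*cfg) for name, cfg in configs.items()}
for name, n in estimates.items():
    print(f"{name}: ~{n / 1e9:.2f} billion parameters in transformer blocks")
```

The estimates land a little below the headline figures (for example, about 0.08 billion versus 117 million for the first GPT), mostly because the token-embedding table is left out. The gap matters less as models grow, which is why the rule gets close to 175 billion for GPT-3.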

Future of GPT

Looking ahead, we can anticipate that future GPT models will continue to improve upon previous generations by incorporating new features, such as multimodal inputs that combine images and text (already introduced with GPT-4 at the time of writing). However, as the complexity of these models increases, the training process becomes more resource-intensive, requiring large amounts of energy and infrastructure. Addressing these costs will be critical for the continued development of GPT technology and for applying it across a wide range of uses, from customer service chatbots to automated writing assistants.


[1] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention Is All You Need. Advances in Neural Information Processing Systems 30.

