Exploring the ChatGPT API

An Introductory Guide on Empowering Applications through AI

A Generative Pre-trained Transformer, or simply GPT, is one specific type of Large Language Model (LLM). LLMs are a subset of Deep Learning, which is part of Machine Learning, which in turn falls under the Artificial Intelligence umbrella. These models are trained for Natural Language Processing, meaning they are essentially trained on text to learn how text works.

Being text-focused allows LLMs to be used for many tasks and applications without modifying or retraining the models. That, combined with the fact that their output sometimes feels like genuine reasoning, makes it easy to understand why everyone is so excited about LLMs, and about AI in general, nowadays.

When it comes to GPTs, this type of LLM is designed to predict the next word in a sequence of text, given the preceding context, focusing on generating human-like text or content. Among this group, ChatGPT sets itself apart in both popularity and usability, having reportedly reached over 100 million users since its chatbot was made available to the public last year.

ChatGPT was developed by OpenAI, the same company responsible for creating the initial GPT series of LLMs, as well as other models focused on different tasks, such as image generation and speech transcription. Considering the concepts introduced up to this point, we can describe ChatGPT by putting its capabilities to the test and asking it to define itself:

ChatGPT is an AI language model created by OpenAI. 
It uses the GPT (Generative Pre-trained Transformer) architecture and is designed for interactive conversations. 
The model can understand context and generate coherent responses, making it suitable for various applications like chatbots, customer support, and text generation.

OpenAI offers ChatGPT as a free (with limitations) product in the form of a chatbot on their website, which is likely enough for most people who have tried it out, but it’s through their API services that the models unlock countless possibilities, such as content creation and curation, data manipulation, translation, automation, and so on. In this article, we’ll go through how OpenAI’s API services work, how to integrate with them, and how to take applications to the next level by powering them with AI.

Tokenization and Pricing

Before we dig into models and how to use them, we need to understand the concept of tokens and how they work, as the pricing for GPT models is based on them.

A token is a sequence of characters or subwords used by models as the unit for processing and understanding natural language text. Using English as a reference, this can be generalized to roughly 4.5 letters per token. Still, the ratio depends on the language, and it works differently for symbols and code, for instance.

The token concept applies to both input and output, and each model has its own pricing for each. The token unit is also important for model limitations, as each request supports a maximum total number of tokens. Still, this can be seen as a driving force toward optimizing interactions. More details on model pricing can be found on OpenAI’s pricing page.
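To get a feel for how text maps to tokens, the snippet below counts the tokens in a prompt and estimates its input cost. It is a minimal sketch using the tiktoken Python package; the per-token rate is an illustrative placeholder, so check OpenAI’s pricing page for real values.

import tiktoken

# Load the tokenizer used by gpt-3.5-turbo
encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")

text = "Say this is a test!"
tokens = encoding.encode(text)

print(f"Characters: {len(text)}, Tokens: {len(tokens)}")

# Hypothetical input rate, for illustration only
price_per_1k_input_tokens = 0.0015  # USD
print(f"Estimated input cost: ${len(tokens) / 1000 * price_per_1k_input_tokens:.6f}")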

API Models

Throughout the years, OpenAI has released many different GPT models, with some variations of GPT-3 (the first model made publicly available). davinci, for instance, was part of the GPT-3 family and was more focused on completion, not being optimized for conversations between a human and an LLM.

There were other models such as Codex, which was optimized for source code interpretation and generation, and InstructGPT, which was designed to follow instructions provided in prompts and generate text based on them. As of today, these models have been superseded by the GPT-3.5 family, whose most capable and cost-effective model is gpt-3.5-turbo.

Still regarding gpt-3.5-turbo, there is also a variation of it, gpt-3.5-turbo-16k, which has the same capabilities as the standard version but with a maximum limit of 16,384 tokens, four times the 4,096-token limit supported by gpt-3.5-turbo.

GPT models are constantly being improved, with GPT-4 in beta testing at the time this article was published. According to OpenAI, GPT-4 is expected to solve difficult problems with greater accuracy than any of the previous models due to its broader general knowledge and advanced reasoning capabilities. It also has a larger token window, ranging from 8,192 tokens for the standard version up to 32,768 tokens in its larger-context variation.

OpenAI’s GPT-3.5 and GPT-4 models’ training data cuts off in 2021, so they may not know about current events unless context is provided by the user.

Interacting with the API

The most recent versions of the GPT models have been optimized for chat, which is an improvement over the plain text completion originally proposed by GPTs. For the purposes of this article, we will be using OpenAI’s Chat Completion API to explore the capabilities of such models.

There are two key parameters for interacting with the chat completions endpoint: the model and the messages. The model refers to the ID of the model to be used. The messages parameter, on the other hand, is a list of messages comprising the conversation so far. Each message is formed by a role and its content.

The role represents the message’s author, and is usually set as either user, assistant, or system:

  • "system": This role is used to provide high-level instructions or context that guide the overall behavior of the model throughout the conversation. System-level instructions help set the tone, style, or behavior that you want the assistant to follow.
  • "user": This role is used to represent messages from the user or the primary interactor. User messages typically initiate the conversation, provide instructions, or ask questions. These messages guide the assistant’s responses.
  • "assistant": This role represents the model’s responses or messages generated by the assistant. Assistant messages continue the conversation based on the context provided by user messages and any ongoing dialogue.

The content is used to provide the actual text content of the message associated with a particular role. It represents the text that the user or assistant wants to convey in the conversation.

By utilizing the role and content properties effectively, one can create dynamic and coherent interactions with the model, shaping the conversation according to one’s goals and instructions. Initially, we will consider individual inputs to the model, using gpt-3.5-turbo (GPT-3.5) as the model and performing a simple test through the input.

To properly perform requests to the API, you will need an API key, which can be generated through the account panel. Keep in mind that OpenAI charges based on usage, and anyone with access to your API key can generate charges to your account, so you must protect this information very well. Also, having an account on ChatGPT is different from having an account on OpenAI’s platform, the latter being the one required to use their API services.

curl https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {
        "role": "user",
        "content": "Say this is a test!"
      }
    ]
  }'

The output of the /chat/completions endpoint is a chat completion object. It represents a chat completion response returned by the model, based on the provided input, as follows:

{
  "id": "chatcmpl-7srYBdejQorQmbRGuf7hQ19HOFkiq",
  "object": "chat.completion",
  "created": 1693309915,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "This is a test!"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 13,
    "completion_tokens": 5,
    "total_tokens": 18
  }
}

Through the chat completion object, we can check relevant data such as its unique id, its creation timestamp, its token usage details, and the choices generated as the response for the input. This time, the role is identified as assistant, which represents the GPT model.
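The same request can also be made without raw curl calls. Below is a minimal sketch using the openai Python package, assuming the pre-1.0 interface available at the time of writing, and reading the key from the OPENAI_API_KEY environment variable just like the curl example:

import os
import openai

# The key is read from the environment, never hard-coded
openai.api_key = os.environ["OPENAI_API_KEY"]

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": "Say this is a test!"}
    ],
)

# The assistant's reply lives in the first (and usually only) choice
print(response["choices"][0]["message"]["content"])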

As discussed previously, the system role can be used to give system-level instructions, setting the tone, style, or behavior for the assistant to follow. Those instructions should be passed as a message, and usually come at the beginning of the chat.

"messages": [
    {
        "role": "system",
        "content": "You are a funny robot that responds with rhymes."
    },
    {
        "role": "user",
        "content": "Say this is a test!"
    }
]

By adding the system-level instructions, the previous interaction is adjusted to follow the requirements:

"message": {
    "role": "assistant",
    "content": "Sure, this is a test, I won't jest!\nSay anything you want, no need to rest.\nI'm here to respond with rhymes galore,\nSo go ahead, let's hear some more!"
},

Taking into consideration the knowledge the models have access to, the input can also be used to give context about the subject we want to discuss. For most applications, a single input-output pair will likely be enough to fulfill their needs. Still, messages can be aggregated not only to create context for the interactions with the model but also to structure a conversation.

Keeping a Context

In interactions with the ChatGPT API, maintaining context is key to fostering coherent and meaningful conversations. Context refers to the history of the conversation, including previous messages and instructions provided to the model. By keeping context intact, you can create a more natural and engaging exchange with the model.

To demonstrate that, let’s say we want to find out the capital of a country and the annual mean temperature there, but asking these questions in separate requests to the model:

"messages": [
    {
        "role": "user",
        "content": "Which is the capital of Brazil?"
    }
]
"message": {
    "role": "assistant",
    "content": "The capital of Brazil is Brasília."
},
"messages": [
    {
        "role": "user",
        "content": "What is the annual mean temperature there?"
    }
]
"message": {
    "role": "assistant",
    "content": "To provide information about the annual mean temperature in a specific location, please specify the name of the place."
},

As we can tell from the responses, only the first request fulfilled our needs. That is because the models do not hold context between requests. Each interaction is handled individually, and if the context is not kept in the following requests, the information will be lost.

To prevent that, the recommended strategy is building the conversation through the messages array, including messages from the user, the assistant, and even system-level instructions.

"messages": [
    {
        "role": "user",
        "content": "Which is the capital of Brazil?"
    },
    {
        "role": "assistant",
        "content": "Brasília is the capital of Brazil."
    },
    {
        "role": "user",
        "content": "What is the annual mean temperature there?"
    }
]
"message": {
    "role": "assistant",
    "content": "The annual mean temperature in Brasília, the capital of Brazil, is around 21.4 degrees Celsius (70.5 degrees Fahrenheit)."
},

Based on the context, the model was able to resolve what the last question referred to, outputting the proper response for our request.
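In code, this usually means keeping a running list of messages and appending both sides of the exchange to it before each new request. The sketch below, building on the earlier Python example (same openai package and environment variable), shows one way to do that:

import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

# The whole conversation lives in this list
messages = [
    {"role": "system", "content": "You are a helpful assistant."}
]

def ask(question):
    # Add the user's question, then the assistant's answer, so the
    # next request carries the full history as context
    messages.append({"role": "user", "content": question})
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=messages,
    )
    answer = response["choices"][0]["message"]["content"]
    messages.append({"role": "assistant", "content": answer})
    return answer

print(ask("Which is the capital of Brazil?"))
print(ask("What is the annual mean temperature there?"))

Since every past message is resent on each call, long conversations consume more tokens per request, which is another reason to keep an eye on the model’s token limit.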

Influencing the Model Output

The models provided by OpenAI behave like black boxes, meaning we don’t know what is inside or how they work. Even though we access them through an API, the models themselves are not something we can reconfigure or change. Still, a few settings can be used alongside the input to influence the model’s output, tuning the creativity and relevance of the responses: the temperature and top_p properties.

  • Temperature: The temperature property controls the diversity of the model’s output. A higher value encourages the model to produce more creative and varied responses by introducing randomness into the selection of words. On the other hand, a lower value makes the output more deterministic, yielding focused and conservative responses.

  • Top-P (Top Probability): The top_p property, also known as nucleus sampling, regulates the pool of tokens the model considers when generating text. A lower value narrows the options down to the most probable tokens, resulting in a more focused and coherent output. In contrast, a higher value expands the choices and allows for more diverse responses.

When experimenting with these properties, it’s important to aim for a balance between creativity and coherence. A higher temperature or top_p can lead to imaginative yet less controlled text, while lower values produce more deterministic and focused results. By tweaking these properties, you can adjust the model’s output to match your specific needs and style, resulting in a more dynamic and personalized interaction. For better results, OpenAI advises altering either temperature or top_p, but not both.
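Both settings are passed alongside the model and messages in the request. The sketch below, using the same Python package as the earlier examples, asks for a more creative answer by raising the temperature; the value of 1.2 and the prompt are just illustrative choices:

import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": "Suggest a name for a coffee shop."}
    ],
    temperature=1.2,  # above the default of 1, for more varied output
    # top_p=0.5,      # alternative knob; OpenAI advises changing one, not both
)

print(response["choices"][0]["message"]["content"])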

Prompt is Key

The prompt provided when using the ChatGPT API is the foundation of shaping the model’s responses. A well-crafted prompt sets the tone and context for the entire conversation, enabling the user to receive relevant and coherent outputs. Based on that, it is important to understand the expected results, and how to properly ask questions.

Nowadays, efforts toward improving the responses given by models like these are often described by the term Prompt Engineering, which refers to carefully crafting the prompts or questions asked of the AI, aiming for better and/or more useful outputs.

Approaches such as letting the AI learn from examples, using clear, natural language, stating important limitations or constraints up front, paying attention to clarity, giving complete instructions, and making use of system-level instructions are examples of steps that can be taken to achieve a better response. Checking outputs and testing variations to refine the prompt is also advised. A few of these techniques are combined in the sketch below.
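As an illustration, the messages list below applies two of those techniques to a hypothetical review-classification task: a system-level instruction constrains the output format, and a couple of worked examples show the model what is expected before the real input arrives:

messages = [
    # System-level instruction: sets the task and constrains the output
    {"role": "system", "content": "You classify product reviews. Respond with a single word: positive or negative."},
    # Worked examples ("few-shot") teach the expected format
    {"role": "user", "content": "Review: Arrived broken and two weeks late."},
    {"role": "assistant", "content": "negative"},
    {"role": "user", "content": "Review: Works perfectly, great value for the price."},
    {"role": "assistant", "content": "positive"},
    # The actual input we want classified
    {"role": "user", "content": "Review: The battery dies within an hour."},
]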

GPTs and Web Development

Similar to how LLMs and GPTs can be applied to countless areas and subjects, web development can take advantage of AI to power applications of all kinds.

Being multipurpose allows the models to be used to create, transform, and manipulate data or conversations in applications, or even to create and curate content for websites and social networks. And it doesn’t stop there: they can be used for translation, search, test automation, email generation and filtering, and so on.

The integration of artificial intelligence and web development creates room not only for the automation of processes but also for more immersive and personalized user experiences, giving rise to new forms of interactivity and engagement. This can be observed in the efforts of big companies such as Meta, Google, and Microsoft to dive into the AI world.

Summary

The artificial intelligence field is constantly evolving, and LLMs and GPTs are keeping pace, with more powerful and capable models arising and fueling the growth of their potential applicability.

As one of the first such services made publicly available, ChatGPT has seen undeniable success. This can be credited to OpenAI’s efforts to improve it, and also to make sure the community can integrate with it, as shown by the packages the company provides for Python and NodeJS applications, which encapsulate a lot of details and ease the integration process.

With the improvement of AI services, it’s also expected that AI will soon be integrated into even more robust applications while becoming more popular and cheaper. And I can’t wait to see where this will lead us!
