Ilya Gusev, Feb 1, 2023

Training language models, such as OpenAI's GPTs, is now a crucial step in advancing the field of natural language processing. This post will explore training language models from scratch, focusing on efficiency. We will cover existing models, text datasets, scaling laws, and optimization tweaks.

Whether you're interested in training a language model or using the existing ones, this post will provide valuable insights and information to help you achieve your goals.

But first, let me answer several popular questions and bust some myths.

FAQ

Can we use ChatGPT for powering any products?

No. At the time of writing, ChatGPT has no API, so there is no technical or legal way to use it for commercial purposes. However, people sometimes confuse ChatGPT with GPT-3. For GPT-3, the situation is different: there is an API that we can use.

What is the difference between ChatGPT and GPT-3?

A lot of feedback from data annotators is embedded into ChatGPT through reinforcement learning. The format of interactions is also different:

ChatGPT is a dialog agent. It has only a web interface.
GPT-3 and GPT-3.5 (InstructGPT) are families of general-purpose models, and they have a playground and an API. The most advanced of those models is text-davinci-003.

A figure from a famous Notion article

Can I convert InstructGPT to ChatGPT?

No, but you can close the gap with the right prompt. See here.

Example of a prompt: As an advanced chatbot named ChatGPT, your primary goal is to assist users to the best of your ability. This may involve answering questions, providing helpful information, or completing tasks based on user input.
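To make the idea concrete, here is a minimal sketch of how such an instruction could be combined with a dialog history into a single text prompt for a completion model. The function name `build_prompt` and the `User:` / `ChatGPT:` speaker labels are assumptions chosen for illustration, not a format prescribed by OpenAI, and the sketch only builds the string without calling any API.

```python
# Instruction text taken from the example prompt above.
INSTRUCTION = (
    "As an advanced chatbot named ChatGPT, your primary goal is to assist "
    "users to the best of your ability. This may involve answering questions, "
    "providing helpful information, or completing tasks based on user input."
)


def build_prompt(history, user_message, instruction=INSTRUCTION):
    """Build a dialog-style prompt for a completion model.

    history: list of (user_text, bot_text) pairs from earlier turns.
    The speaker labels are hypothetical; any consistent labels would do.
    """
    lines = [instruction, ""]
    for user_text, bot_text in history:
        lines.append(f"User: {user_text}")
        lines.append(f"ChatGPT: {bot_text}")
    lines.append(f"User: {user_message}")
    # Leave the last line open so the model continues as the chatbot.
    lines.append("ChatGPT:")
    return "\n".join(lines)


if __name__ == "__main__":
    history = [("Hi!", "Hello! How can I help you today?")]
    print(build_prompt(history, "What is a language model?"))
```

The resulting string would be sent as the prompt to a general-purpose model such as text-davinci-003. Since the whole history is resent on every turn, longer dialogs consume more input tokens.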

How much does it cost to use GPT-3?

$0.02 per 1000 tokens. One token is roughly 4 characters of English text. OpenAI charges for both input and output tokens.

Imagine you’d like to use it to summarize hotel reviews, and there are 1 million hotels to summarize. Let’s say each hotel has 10 reviews on average, and each review has 100 tokens on average, so the reviews fit in about 1000 tokens per hotel. That is 1 billion tokens in total, which will cost you about $20K. With some tweaks, you can reduce this number, of course. Still, in the end, there will be an irreducible cost of about $4K: the generated summaries, which are also 100 tokens on average, plus the same number of input tokens, or 200 tokens per hotel.
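The arithmetic behind these estimates can be sketched in a few lines of Python. The helper names `estimate_tokens` and `cost_usd` are hypothetical, the price is the $0.02 per 1000 tokens quoted above, and the 4-characters-per-token ratio is only a rule of thumb, so treat the results as rough estimates rather than a billing calculation.

```python
PRICE_PER_1K_TOKENS = 0.02  # USD, the price quoted above


def estimate_tokens(text, chars_per_token=4.0):
    """Rough token count from character length (rule of thumb for English)."""
    return max(1, round(len(text) / chars_per_token))


def cost_usd(num_requests, input_tokens, output_tokens,
             price_per_1k=PRICE_PER_1K_TOKENS):
    """Total cost when both input and output tokens are billed."""
    total_tokens = num_requests * (input_tokens + output_tokens)
    return total_tokens / 1000 * price_per_1k


if __name__ == "__main__":
    hotels = 1_000_000
    # About 1000 tokens per hotel, as in the example above.
    print(f"Full job: ${cost_usd(hotels, 1000, 0):,.0f}")
    # Lower bound: 100 input tokens and 100 output tokens per hotel.
    print(f"Lower bound: ${cost_usd(hotels, 100, 100):,.0f}")
```

Under these assumptions the two figures come out at about $20K and $4K, matching the estimates in the text.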

How much does it cost to build our own ChatGPT/GPT-3?