The essential lingo for building with Large Language Models — get up to speed fast with our LLM Primer ✔️
Ready to get started with LLMs? Our team has learned a ton from playing and building with LLMs over the last few years. As our founder/CEO says, “AI years are like dog years.” By that measure, we now have decades of experience, or so it feels 😅
Here are the 21 most important terms to know if you want to work with LLMs:
1. Foundation models: LLMs that serve as the basis or starting point for developing more specialized models tailored to specific tasks or industries.
2. GPT or Generative Pretrained Transformer: A series of LLMs developed by OpenAI.
We use GPT-4, GPT-3.5-turbo (the model behind ChatGPT), and text-davinci-003 (the latest GPT-3 model). But we previously also used text-davinci-002, Curie, Ada, etc. In other words, there’s lots of them!
3. Neural networks: A type of machine learning model inspired by the human brain. LLMs are a specific type of neural network known as transformer-based models.
4. Gradient descent: The process used to train neural networks, including LLMs, by iteratively adjusting the model's weights to minimize the loss function.
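Here's a toy sketch of the idea in plain NumPy. Fitting a single weight is obviously nothing like training an LLM at scale, but the loop has the same shape: predict, measure the loss, nudge the weights downhill.

```python
import numpy as np

# Toy example: learn y = 2x with a single weight via gradient descent.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x

w = 0.0    # the model's only parameter
lr = 0.01  # learning rate

for step in range(200):
    pred = w * x
    loss = np.mean((pred - y) ** 2)      # mean squared error
    grad = np.mean(2 * (pred - y) * x)   # derivative of the loss w.r.t. w
    w -= lr * grad                       # step downhill

print(round(w, 3))  # converges to ~2.0
```

LLM training does the same thing with billions of weights and fancier optimizers, but "adjust the weights to reduce the loss" is the heart of it.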
5. Pre-training: This is the initial phase of training, where the model learns to predict the next word in a sentence.
A common misconception is that biased training data used in pre-training will necessarily make the model biased. Bias can be addressed during fine-tuning.
6. Fine-tuning: This is a process where the pre-trained model is further trained (or "tuned") on a specific task. Fine-tuning allows the model to adapt its general language understanding to specific use cases and to align its behavior more closely with human values.
7. Alignment: The process of ensuring that the model's output aligns with human values and goals.
8. Training Corpus: This refers to the massive amount of text data that the model is trained on. It includes books, articles, websites, and other forms of written language.
9. Parameters: Internal variables that the model uses to make predictions. The more parameters, the bigger the model.
When an LLM is "trained", these parameters are adjusted so that the model can make accurate predictions or generate coherent text.
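If you want a feel for what "parameters" actually are, here's a quick PyTorch sketch that counts the weights in a single feed-forward block, roughly the shape you'd find inside one transformer layer. Real LLMs stack many of these (plus attention), which is how you get to billions.

```python
import torch.nn as nn

# One feed-forward block with transformer-ish dimensions.
block = nn.Sequential(
    nn.Linear(768, 3072),
    nn.ReLU(),
    nn.Linear(3072, 768),
)

num_params = sum(p.numel() for p in block.parameters())
print(f"{num_params:,}")  # ~4.7 million parameters for this one block alone
```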
10. Tokenization: Breaking down text into words or parts of words, which the model can process.
One token is roughly four characters in English and fewer in other languages, even other Latin-alphabet languages, which we learned the hard way at Kraftful (don't ask!).
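If you're curious how your own text gets split up, OpenAI's tiktoken library lets you peek. A quick sketch:

```python
import tiktoken

# cl100k_base is the tokenizer used by gpt-3.5-turbo and gpt-4.
enc = tiktoken.get_encoding("cl100k_base")

text = "Tokenization breaks text into pieces the model can process."
tokens = enc.encode(text)

print(len(text), "characters ->", len(tokens), "tokens")
print([enc.decode([t]) for t in tokens])  # see exactly how the text was split
```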
11. Prompt engineering: This is the process of designing the input (or "prompt") given to an LLM to elicit a desired response.
Good prompt engineering can significantly improve the performance of a language model.
Prompt engineering gets less critical as models improve. We recall writing super complex prompts back in 2020 to get GPT-3 to do basic things that can now be done with a one-sentence prompt. It's also less important in chat, where you can give the model feedback until you get the desired output.
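For illustration, here's the difference between a lazy prompt and a more engineered one. The wording is made up for this example, not a recipe we ship:

```python
# A vague prompt leaves the model guessing about format and scope.
vague_prompt = "Summarize this feedback."

# An engineered prompt spells out the role, the rules, and the output format.
engineered_prompt = """You are a product analyst. Summarize the user feedback below.

Rules:
- Return exactly 3 bullet points.
- Each bullet names a feature request and how many users mentioned it.
- If there are no feature requests, reply "No feature requests found."

Feedback:
{feedback}
"""

print(engineered_prompt.format(feedback="The export button is hidden. Two users asked for dark mode."))
```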
12. Context window or sequence length: The number of tokens that the model can take as input at once.
For those of us using LLMs to summarize large texts (like user feedback from various sources), part of the puzzle is figuring out how to keep the original context when breaking up text to fit into the context window.
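One common workaround is to split the text into overlapping chunks that each fit inside the window, then summarize the chunks and combine the results. A minimal sketch (not our actual pipeline), using tiktoken to count tokens:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def chunk_text(text, max_tokens=3000, overlap=200):
    """Split text into overlapping chunks that each fit in the context window."""
    tokens = enc.encode(text)
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(enc.decode(tokens[start:start + max_tokens]))
        start += max_tokens - overlap  # the overlap keeps some surrounding context
    return chunks
```

The overlap is the crude version of "keeping the original context"; smarter approaches split on natural boundaries like sources or conversations.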
13. Hallucinations: Instances where an LLM generates output that is not grounded in its training data or the prompt, often producing content that is factually incorrect or nonsensical. I wrote a thread about how to limit hallucinations yesterday.
14. Zero-shot learning: This is the ability of the model to understand and perform tasks that it was not specifically trained on. It uses its general understanding of language to make inferences and predictions.
15. One or few-shot learning: In one-shot learning, the model receives a single example, and in few-shot learning, it receives a small number of examples.
For example: you could tell the model to write a poem and include a sample poem so that it can see your desired format.
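Here's roughly what a few-shot prompt looks like in practice, with made-up feedback and labels (purely hypothetical):

```python
# Three examples show the model the task and the output format; the last line is the real input.
few_shot_prompt = """Classify each piece of user feedback as BUG, FEATURE_REQUEST, or PRAISE.

Feedback: "The app crashes when I upload a photo."
Label: BUG

Feedback: "Would love a dark mode option."
Label: FEATURE_REQUEST

Feedback: "Honestly the best tool I've used all year."
Label: PRAISE

Feedback: "Export to CSV keeps timing out."
Label:"""
```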
16. Agent: In the context of LLMs, this refers to an implementation of a language model that is designed to interact with users or other systems, like a chatbot or virtual assistant.
17. Embedding: A numerical vector representation of text that can be compared with the embeddings of other text to measure similarity. Words used in similar contexts have embeddings that are closer together in the vector space.
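The standard way to compare embeddings is cosine similarity. A minimal sketch with NumPy and made-up 3-dimensional vectors (real embeddings have hundreds or thousands of dimensions):

```python
import numpy as np

def cosine_similarity(a, b):
    """Returns a value near 1.0 when two embeddings point in a similar direction."""
    a, b = np.asarray(a), np.asarray(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Pretend these came from an embedding model.
dog = [0.9, 0.1, 0.3]
puppy = [0.85, 0.15, 0.35]
invoice = [0.1, 0.9, 0.05]

print(cosine_similarity(dog, puppy))    # ~0.99: very similar
print(cosine_similarity(dog, invoice))  # ~0.23: not similar
```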
18. Vector database: A specialized type of database designed to efficiently store and manage embeddings. The key feature of a vector database is its ability to perform efficient similarity search.
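Under the hood, the core operation is "find the stored vectors most similar to my query." Real vector databases add indexing tricks so this stays fast with millions of vectors, but a brute-force sketch of the idea looks like this:

```python
import numpy as np

def top_k_similar(query, stored, k=3):
    """Return indices of the k stored vectors most similar to the query (cosine similarity)."""
    stored = np.asarray(stored, dtype=float)
    query = np.asarray(query, dtype=float)
    stored_norm = stored / np.linalg.norm(stored, axis=1, keepdims=True)
    query_norm = query / np.linalg.norm(query)
    scores = stored_norm @ query_norm
    return np.argsort(scores)[::-1][:k]
```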
19. LLMOps: This term refers to the operational aspects of deploying and managing LLMs in production, including model training, deployment, monitoring, and troubleshooting.
20. Open-source language models: These are models that are made available for public use and modification, so they can be deployed on-premise or in a private cloud.
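For example, here's how you might run a small open-source model locally with Hugging Face's transformers library (gpt2 is used purely as an illustration; it's tiny compared to modern open models):

```python
from transformers import pipeline

# Downloads the model weights once, then everything runs on your own hardware.
generator = pipeline("text-generation", model="gpt2")

result = generator("Large language models are", max_new_tokens=20)
print(result[0]["generated_text"])
```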
21. Multimodal models: Models that combine language with other types of data, like images or sound.
Thanks for joining us on this journey through the fascinating world of large language models and their applications. Happy building!