
LLM, Token, Prompt — Key AI Terms Explained Simply

What's behind the buzzwords everyone keeps using? A straightforward overview without the jargon.

If you've spent any time with AI tools like Claude, ChatGPT, or Gemini, you've probably stumbled across terms that nobody bothers to explain properly. Here's an overview — grouped by topic, as brief as possible, as precise as necessary.


Fundamentals — What AI Actually Is

Generative AI

The umbrella term for AI that creates new content: text, images, audio, video, code. Claude, ChatGPT, Midjourney, DALL-E — all fall under Generative AI. In contrast, there's AI that only analyzes or classifies (e.g., spam filters) but doesn't create anything new.

LLM — Large Language Model

The technology behind Claude, ChatGPT and the rest. An LLM is a language model trained on enormous amounts of text. Through this, it learned how language works — grammar, relationships, domain knowledge, style. When you ask Claude something, it generates the answer word by word, based on what's most likely given the context.

An LLM doesn't "know" things in the human sense. It has no memory between sessions (unless Memory is enabled) and no opinions of its own.

Transformer

The architecture that virtually all modern language models are built on. Developed at Google in 2017. The key advantage: Transformers can recognize relationships between words across long passages of text — not just adjacent words. Without Transformers, neither ChatGPT nor Claude would exist.

GPT — Generative Pre-trained Transformer

The name of the model architecture behind ChatGPT. Stands for "Generative Pre-trained Transformer". The term is often used as a synonym for LLMs in general, even though that's not quite technically correct. Claude also uses the Transformer architecture but with its own model development and training.

Model

A specific version of an LLM. Claude has several: Opus (most capable), Sonnet (balanced), Haiku (fastest). OpenAI has GPT-4o, Google has Gemini. Different models have different strengths — larger ones generally deliver better results but respond more slowly and cost more per token.

Parameters

The internal numerical values of a model that were adjusted during training. When someone says "the model has 70 billion parameters", that's a rough measure of its size and capacity. More parameters generally mean more knowledge and better results, but also more computational effort.


How You Work with AI

Token

Tokens are the unit in which LLMs process text. A token isn't the same as a word — it's more like a word fragment. "Unbelievable" might be split into multiple tokens, while "yes" is a single one.

Your usage limit is measured in tokens. Every message and every response consumes tokens. Documents and the conversation history count too. Rule of thumb for English: 1,000 tokens ≈ 750 words.
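The rule of thumb can be sketched in a few lines of Python. Note that the 4-characters-per-token figure is only a rough heuristic for English; real tokenizers use learned schemes like byte-pair encoding and will give different counts.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters per token
    heuristic for English. Real tokenizers (BPE) differ."""
    return max(1, len(text) // 4)

def estimate_tokens_from_words(word_count: int) -> int:
    # Rule of thumb: 1,000 tokens ≈ 750 words
    return round(word_count * 1000 / 750)

print(estimate_tokens_from_words(750))  # → 1000
print(estimate_tokens("yes"))           # → 1
```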

Prompt

The prompt is what you send to the model — your question, your task, your input. The quality of the prompt determines the quality of the answer. A vague prompt delivers a vague answer. A prompt with clear context and a defined format will almost always deliver something usable on the first try.

Prompt Engineering

The skill of writing prompts so that the model reliably delivers the desired result. This includes: providing context, defining the model's role, specifying the output format, setting constraints. No rocket science, but a noticeable difference.
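The ingredients listed above can be assembled mechanically. This is a minimal sketch; the section labels are illustrative, not a required syntax, and any prompt structure that conveys the same information works just as well.

```python
def build_prompt(role, context, task, output_format, constraints):
    """Assemble a structured prompt from role, context, task,
    output format, and constraints."""
    parts = [
        f"You are {role}.",
        f"Context: {context}",
        f"Task: {task}",
        f"Output format: {output_format}",
        "Constraints: " + "; ".join(constraints),
    ]
    return "\n".join(parts)

prompt = build_prompt(
    role="an experienced technical editor",
    context="a 500-word blog post draft about tokenization",
    task="tighten the prose without changing the meaning",
    output_format="the revised text only, no commentary",
    constraints=["keep the original headings", "stay under 450 words"],
)
print(prompt)
```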

Context Window

The maximum amount of text a model can process at once — prompt, conversation history, and response combined. Claude's models have a context window of 200,000 tokens; Sonnet additionally supports up to 1 million tokens via the API.

When a chat gets too long, the beginning falls out of the context window. The model then "forgets" the first messages. That's why: start a new chat for new topics.
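The "falling out" of old messages can be illustrated with a small sketch. The token counter here is a stand-in stub, not a real tokenizer, and actual chat products use more sophisticated truncation strategies.

```python
def trim_history(messages, max_tokens, count_tokens=lambda m: len(m) // 4):
    """Drop the oldest messages until the history fits the window.
    count_tokens is a placeholder for a real tokenizer."""
    kept = list(messages)
    while kept and sum(count_tokens(m) for m in kept) > max_tokens:
        kept.pop(0)  # the beginning of the chat falls out first
    return kept

history = ["a" * 40, "b" * 40, "c" * 40]  # 10 "tokens" each
print(trim_history(history, 25))  # first message is dropped
```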

System Prompt / Custom Instructions

An instruction that goes to the model before your actual question. You don't see it with every message, but it works in the background.

In Claude, the feature is called "Custom Instructions" — you can set globally how Claude should respond, or define specific rules per Project. Saves time because you don't have to repeat things.


How AI Is Trained

Training Data

The texts, books, websites, and documents an LLM was trained on. The model learns language, facts, and patterns from these. Anything missing from the training data, the model simply doesn't know, and biases in the data can carry over into its answers.

Knowledge Cutoff

The date up to which the training data extends. Claude and ChatGPT don't know about events after this date — unless they have access to web search or you provide the information in the chat.

Fine-Tuning

Re-training an existing LLM with additional, specific data. A company might train a model with its own support tickets, for example, so it responds better to customer inquiries. Expensive and complex — regular users don't do this.

RLHF — Reinforcement Learning from Human Feedback

The training method that ensures chatbots give useful answers. Humans rate a model's responses — what was helpful, what was bad. The model learns from this which answers are desirable. Without RLHF, today's chatbots would be much harder to use.


Technical Concepts

Temperature

Controls how "creative" a model's responses are. Low: predictable, consistent. High: more variation, but also more risk of nonsense. As a user, you rarely set this yourself — but it explains why Claude sometimes answers the same question slightly differently.
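Under the hood, temperature scales the model's raw scores before they become probabilities. This sketch shows the standard softmax-with-temperature calculation on made-up scores: low temperature concentrates probability on the top candidate, high temperature spreads it out.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw model scores into probabilities. Dividing by the
    temperature sharpens (low T) or flattens (high T) the result."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
cold = softmax_with_temperature(logits, 0.2)
hot = softmax_with_temperature(logits, 2.0)
print(max(cold))  # top candidate dominates at low temperature
print(max(hot))   # probability is spread more evenly at high temperature
```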

Inference

The moment when a trained model actually generates a response. Training is the learning process (months, enormous computing power). Inference is the application (seconds, with every message).

API

An interface through which software communicates with software. The Claude API lets developers integrate Claude into their own applications. Many apps that advertise "with AI" use the Claude or OpenAI API behind the scenes.
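What such an API call looks like can be sketched without sending anything over the network. The structure below loosely follows the shape many chat APIs use (a model name plus a list of role-tagged messages); the endpoint URL and model name are hypothetical placeholders, not real identifiers.

```python
import json

# Hypothetical endpoint and model name, for illustration only.
API_URL = "https://api.example.com/v1/messages"

def build_request(user_message, system=None, model="example-model"):
    """Build a JSON request body in the common chat-API shape."""
    body = {
        "model": model,
        "max_tokens": 500,
        "messages": [{"role": "user", "content": user_message}],
    }
    if system:
        body["system"] = system  # instructions applied before the question
    return json.dumps(body)

payload = build_request("Summarize this article in three bullets.",
                        system="Answer concisely.")
print(payload)
```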

MCP — Model Context Protocol

An open standard that lets AI models connect to external tools. Developed by Anthropic, open to everyone. Through MCP, Claude can access emails, search Google Drive, or read Notion pages — these are the "Connectors" in Claude. Currently over 50 connections.

RAG — Retrieval-Augmented Generation

A method where an LLM retrieves relevant documents from a database before answering. Instead of only responding from training material, the model works with current information. When you upload a PDF to Claude, it's loaded directly into the context — technically not RAG, but the principle is similar: the model works with specific documents rather than just its training material.
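The retrieve-then-answer idea can be shown with a toy retriever. Real RAG systems rank documents by vector-embedding similarity; this sketch substitutes simple word overlap to keep the principle visible.

```python
def words(text):
    """Lowercase word set, stripped of common punctuation."""
    return {w.strip(".,!?").lower() for w in text.split()}

def retrieve(query, documents, top_k=1):
    """Return the documents sharing the most words with the query.
    A stand-in for embedding-based similarity search."""
    q = words(query)
    scored = sorted(documents, key=lambda d: len(q & words(d)), reverse=True)
    return scored[:top_k]

def augmented_prompt(query, documents):
    """Prepend the retrieved documents to the question."""
    context = "\n".join(retrieve(query, documents))
    return f"Use this context:\n{context}\n\nQuestion: {query}"

docs = ["The refund policy allows returns within 30 days.",
        "Our office is closed on public holidays."]
print(augmented_prompt("What is the refund policy?", docs))
```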

Agent

An AI system that executes tasks independently. It breaks a goal into steps, uses tools, and makes intermediate decisions — without you having to direct every step. Claude Code is an example: "Build me a contact page" — and Claude writes code, creates files, tests, and fixes errors.
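The loop behind such a system can be sketched in a few lines. Real agents decide the next step dynamically from each observation; this toy version uses a fixed plan, and the tools and planner are hypothetical stubs, not Claude Code's actual internals.

```python
def run_agent(goal, plan, tools):
    """Minimal agent loop: break the goal into steps, pick a tool
    for each step, execute it, and record the observation."""
    log = []
    for step, tool_name in plan(goal):
        observation = tools[tool_name](step)
        log.append((step, tool_name, observation))
    return log

# Hypothetical tools and planner for illustration.
tools = {
    "write_file": lambda step: f"wrote {step}",
    "run_tests": lambda step: "all tests passed",
}

def plan(goal):
    return [("contact.html", "write_file"), ("test suite", "run_tests")]

for step, tool, result in run_agent("Build a contact page", plan, tools):
    print(f"{tool}: {result}")
```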

Multimodal

A model that understands not only text but can also process images, audio, or video. Claude can analyze photos, read screenshots, and interpret diagrams. GPT-4o can additionally process speech in real time.


Safety and Society

Hallucination

When an LLM responds with something factually wrong but convincing-sounding. Happens because the model operates on probabilities, not lookups. Always double-check facts, numbers, and sources.

Bias

Systematic distortions in a model's responses. When training data overrepresents certain perspectives, that's reflected in the output. AI providers work to reduce bias — it's not fully eliminated.

Guardrails

Safety rules built into a model. They prevent the model from generating harmful or illegal content. The boundaries of these guardrails are an ongoing topic in AI research.

Alignment

The challenge of training AI systems to behave in the interest of the user and society. An "aligned" model does what you mean — not just what you literally wrote. One of the central research topics at Anthropic, OpenAI, and Google.

Deepfake

AI-generated images, videos, or audio that convincingly imitate a real person. The quality is now so high that deepfakes are nearly impossible to detect without technical tools. Frequently associated with disinformation and fraud.


Model Types and Market

Text-to-Image

AI tools that generate images from text descriptions. DALL-E, Midjourney, and Stable Diffusion are the most well-known. Quality has improved significantly — text rendering and hands, once typical weak spots, are increasingly well-handled.

Open Weight vs. Closed Source

Open-weight models (e.g., Llama from Meta) publish their model weights — you can download and run them locally, often with certain license terms. Closed-source models (Claude, GPT-4) are only available through the provider. The performance gap between the two categories is shrinking.

AGI — Artificial General Intelligence

The hypothetical AI that can handle all tasks at a human level. It doesn't exist today, but is often presented as the end goal of the major AI labs. When or whether AGI arrives is hotly contested among experts.

Benchmark

A standardized test for comparing models. When a provider says their model is "the best", they're referring to benchmark results. Useful as orientation, but a good score doesn't automatically mean the model is best for your specific task.


These are the terms that come up most frequently in articles, discussions, and product descriptions. Not exhaustive — but enough to keep up everywhere.