What Is an LLM, Really?
You have seen the headlines: "Large language models are changing everything." CEOs bring them up on earnings calls. Your cousin mentioned them at dinner. The term gets thrown around as if everyone already knows what it means.
Here is the simplest way to think about it:
A large language model (LLM) is a very advanced version of your phone's autocomplete.
When you type "Happy Bir..." on your phone, it suggests "Birthday." That is a tiny language model making a prediction. Now imagine that system trained on a huge share of the public internet (books, Wikipedia, Reddit, scientific papers, GitHub code, news articles) and scaled up by many orders of magnitude. That is an LLM.
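A phone-style autocomplete can be sketched as a bigram counter. It records which word followed which in some text, then suggests the most frequent follower. This is a minimal Python sketch. The tiny corpus and the names `train_bigrams` and `suggest` are made up for illustration. A real LLM replaces this count table with a neural network trained on trillions of tokens.

```python
from collections import Counter, defaultdict
from typing import Optional

# Hypothetical toy corpus; a real model trains on trillions of tokens.
CORPUS = (
    "happy birthday to you . happy birthday dear friend . "
    "happy new year . happy birthday again"
)

def train_bigrams(text: str) -> dict[str, Counter]:
    """Map each word to a Counter of the words that came right after it."""
    words = text.split()
    followers: dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        followers[current][nxt] += 1
    return followers

def suggest(followers: dict[str, Counter], word: str) -> Optional[str]:
    """Suggest the most frequent next word, or None if `word` was never seen."""
    if word not in followers:
        return None
    return followers[word].most_common(1)[0][0]

model = train_bigrams(CORPUS)
print(suggest(model, "happy"))  # → birthday
```

"birthday" wins because it followed "happy" three times in the corpus, while "new" followed it only once.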
How LLMs Actually Work (No Math)
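Every LLM does one core thing: it predicts the next word, or more precisely the next "token." In English text a token averages roughly 0.75 words. Common words are often a single token, and rare words get split into several pieces. When you type a question, the model looks at all the tokens so far and computes a probability for every possible next token. It then picks one (often the most likely), appends it, and repeats until the response is complete.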
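The predict, append, repeat loop can be sketched in a few lines of Python. Three things here are assumptions for illustration: the hand-written `TOY_MODEL` table, the `<eos>` stop marker, and the choice of greedy decoding (always taking the top token). A real LLM computes fresh probabilities over tens of thousands of tokens at every step, and it often samples rather than always taking the most likely token.

```python
# Hypothetical next-token probabilities, keyed by the full sequence so far.
# A real model computes these with a neural network; this table only
# illustrates the shape of the generation loop.
TOY_MODEL = {
    ("Happy",): {" birthday": 0.9, " new": 0.1},
    ("Happy", " birthday"): {" to": 0.7, "<eos>": 0.3},
    ("Happy", " birthday", " to"): {" you": 0.95, " all": 0.05},
    ("Happy", " birthday", " to", " you"): {"<eos>": 0.8, "!": 0.2},
}

def generate(prompt: list[str], max_new_tokens: int = 10) -> list[str]:
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        probs = TOY_MODEL.get(tuple(tokens))
        if probs is None:  # context not in the toy table: stop
            break
        # Greedy decoding: take the single most probable next token.
        next_token = max(probs, key=probs.get)
        if next_token == "<eos>":  # special end-of-sequence ("stop") token
            break
        tokens.append(next_token)
    return tokens

print("".join(generate(["Happy"])))  # → Happy birthday to you
```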
There is no database lookup. There is no "knowledge" being retrieved. The model has no internal Wikipedia. It has a statistical map of which tokens tend to follow which sequences of tokens, built from the training data.
By the Numbers
The scale is genuinely staggering:
- GPT-4 was reportedly trained on about 13 trillion tokens, or roughly 10 trillion words. At about 100,000 words per book, that is on the order of 100 million books.
- GPT-4 is estimated to have about 1.76 trillion parameters; OpenAI has not disclosed the official figure. More on what parameters are below.
- Training reportedly ran on thousands of GPUs for months, at an estimated cost of more than $100 million.
- Meta's Llama 3 405B was trained on 15.6 trillion tokens using 30.8 million GPU hours.
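These figures can be sanity-checked with back-of-envelope arithmetic. The 0.75 words-per-token ratio is the rule of thumb used in this article. Two inputs are assumptions: 100,000 words per book, and a cluster of 16,000 GPUs (roughly the size Meta described for Llama 3 405B). The GPT-4 numbers are unofficial estimates.

```python
# Back-of-envelope checks. GPT-4 figures are unofficial estimates;
# WORDS_PER_BOOK and GPUS are assumptions, not reported measurements.
WORDS_PER_TOKEN = 0.75      # rule of thumb for English text
WORDS_PER_BOOK = 100_000    # assumed length of a typical book

gpt4_tokens = 13e12
gpt4_words = gpt4_tokens * WORDS_PER_TOKEN
books = gpt4_words / WORDS_PER_BOOK
print(f"GPT-4: ~{gpt4_words:,.0f} words, ~{books:,.0f} books")
# → GPT-4: ~9,750,000,000,000 words, ~97,500,000 books

# Llama 3 405B: 30.8 million GPU hours spread over an assumed 16,000 GPUs.
gpu_hours = 30.8e6
GPUS = 16_000
days = gpu_hours / GPUS / 24
print(f"Llama 3 405B: ~{days:.0f} days if all {GPUS:,} GPUs ran nonstop")
# → Llama 3 405B: ~80 days if all 16,000 GPUs ran nonstop
```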
These numbers help explain why only a handful of companies in the world can build frontier models: OpenAI, Google, Anthropic, Meta, and a few others. The compute cost alone is prohibitive.
"Billions of Parameters" — What Does That Mean?
A parameter is a number the model learned during training. Think of it as a tiny weight that nudges the prediction. Your text is first converted into numbers. Those numbers are multiplied by these weights and combined, layer after layer, through the model's neural network. The final layer produces a probability for every possible next token.
A useful analogy: imagine a machine with 1.76 trillion knobs. During training, each knob is adjusted bit by bit so that, given "The capital of France is ___," the machine makes "Paris" the most probable answer. "Billions of parameters" just means there are that many knobs to tune.
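To make "parameter" concrete, here is a toy next-token scorer in Python. Each score is a weighted sum of some input numbers plus a bias, and a softmax turns the scores into probabilities. The vocabulary, weights, biases, and input features are all invented for illustration. A real LLM stacks many such layers, has a vocabulary of tens of thousands of tokens, and learns every weight from data.

```python
import math

# Every number in WEIGHTS and BIASES is a parameter (a "knob").
# These values are invented; a real LLM learns billions of them.
VOCAB = ["Paris", "London", "banana"]
WEIGHTS = [              # one row of weights per vocabulary word
    [2.0, 1.5, 0.5],
    [0.5, 1.0, 0.2],
    [-1.0, -0.5, 0.1],
]
BIASES = [0.1, 0.0, -0.2]

def softmax(scores: list[float]) -> list[float]:
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(features: list[float]) -> dict[str, float]:
    # Each score is a weighted sum of the inputs plus a bias.
    scores = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(WEIGHTS, BIASES)]
    return dict(zip(VOCAB, softmax(scores)))

# Hypothetical numbers standing in for "The capital of France is".
probs = predict([1.0, 0.8, 0.3])
print(max(probs, key=probs.get))  # → Paris

n_params = sum(len(row) for row in WEIGHTS) + len(BIASES)
print(n_params)  # → 12
```

Training amounts to nudging those 12 numbers (or, in a frontier model, more than a hundred billion of them) until the right answers come out most probable.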
By comparison:
- GPT-1 (2018): 117 million parameters
- GPT-3 (2020): 175 billion parameters
- GPT-4 (2023): ~1.76 trillion parameters (estimated)
- Llama 3 (2024): 8B, 70B, and 405B parameter variants
- Claude 3.5 Sonnet (2024): parameter count undisclosed; unofficial estimates put it under 100B, yet it outperforms many models believed to be larger
Notice the last entry: bigger is not always better. Architecture and training data quality matter at least as much as raw parameter count.
How LLMs Are Different From Traditional Software
This is the most important distinction to understand:
Traditional software: A developer writes explicit rules. If you click "Save," the program calls saveFile(). Every behavior is deterministic and programmed by a human. If it does something wrong, a human wrote the wrong code.
LLMs: No human wrote rules for what to say. The model learned patterns from data. When you ask a question, it generates a response that is statistically likely based on its training, not one that is guaranteed correct. This is why LLMs can write poetry (there is no "poetry function") and also why they confidently state false information (there is no "fact-check function").
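The contrast can be sketched in Python. The rule-based function always returns the same answer. The LLM-style function samples a token from probabilities, which is one common decoding strategy. The rule table and the probabilities are invented for illustration. Note that nothing in the sampling step checks whether the chosen answer is true.

```python
import random

# Traditional software: an explicit rule written by a person.
# The same input always produces the same output.
def capital_lookup(country: str) -> str:
    rules = {"France": "Paris", "Japan": "Tokyo"}
    return rules.get(country, "unknown")

# LLM-style: pick the next token by sampling from (invented) probabilities.
# There is no fact-check step; a wrong token is merely less likely.
NEXT_TOKEN_PROBS = {" Tokyo": 0.95, " Kyoto": 0.04, " Osaka": 0.01}

def sample_next_token(probs: dict[str, float], rng: random.Random) -> str:
    tokens = list(probs)
    weights = list(probs.values())
    return rng.choices(tokens, weights=weights, k=1)[0]

print(capital_lookup("Japan"))  # → Tokyo, every time
rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [sample_next_token(NEXT_TOKEN_PROBS, rng) for _ in range(1000)]
print({t: samples.count(t) for t in NEXT_TOKEN_PROBS})  # mostly " Tokyo"
```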
Real Example: What Happens Token by Token
Let's say you ask ChatGPT: "What is the capital of Japan?"
Here is a simplified view of what happens under the hood (the probabilities are illustrative, not measured from a real model):
Input tokens: ["What", " is", " the", " capital", " of", " Japan", "?"]
Generation, one token at a time:
Step 1: Model predicts next token → "The" (probability: 0.85)
Step 2: → " capital" (probability: 0.78)
Step 3: → " of" (probability: 0.92)
Step 4: → " Japan" (probability: 0.90)
Step 5: → " is" (probability: 0.83)
Step 6: → " Tokyo" (probability: 0.95)
Step 7: → "." (probability: 0.89)
Step 8: → " It" (probability: 0.72), beginning a follow-up elaboration
... continues until the model predicts a special end-of-sequence ("stop") token
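The model produces its answer one token at a time. The probability of the whole sentence is therefore the product of the per-step probabilities, which is the chain rule of probability. This Python sketch reuses the illustrative numbers from the trace above. It shows that seven confident steps still multiply out to well under 50%. Implementations usually sum log-probabilities instead, so that long outputs do not underflow to zero.

```python
import math

# Per-step probabilities from the illustrative trace (made-up numbers).
steps = [
    ("The", 0.85), ("capital", 0.78), ("of", 0.92), ("Japan", 0.90),
    ("is", 0.83), ("Tokyo", 0.95), (".", 0.89),
]

# Chain rule: P(whole answer) = product of each step's conditional probability.
sentence_prob = math.prod(p for _, p in steps)

# Same quantity in log space, which is how it is usually computed in practice.
log_prob = sum(math.log(p) for _, p in steps)

print(f"{sentence_prob:.3f}")  # → 0.385
print(f"{math.exp(log_prob):.3f}")  # → 0.385
```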
The model does not know Japan has a capital. It does not know Tokyo is a city. It has seen the pattern "the capital of [country] is [city]" so many times in its training data that "Tokyo" is the overwhelmingly probable next token after "The capital of Japan is."
This is also why the model might go on to tell you about Tokyo's population or the history of Edo, or recommend restaurants: it has seen those patterns follow the same trigger phrase.
The Most Important Thing to Remember
LLMs predict. They do not know.
When a lawyer asked ChatGPT for case law and it fabricated six nonexistent court cases (this actually happened in 2023), the model was not lying. It was predicting the most likely sequence of tokens that looks like a legal citation. The model has never seen a "truth database." It has seen patterns of text that include citations, so it generates more text that looks like a citation.
This distinction — prediction versus knowledge — explains nearly every weird behavior of LLMs: the hallucinations, the confident wrong answers, the creativity, the ability to write in any style, and the inability to do simple arithmetic reliably.
FAQ
Q: Do LLMs understand what they are saying?
No. There is no evidence of understanding, consciousness, or awareness in any current LLM. They manipulate tokens based on statistical patterns. They can appear to understand because human language is patterned, and mimicking patterns convincingly creates the illusion of understanding. But the underlying mechanism is prediction, not comprehension.
Q: Are all LLMs basically the same under the hood?
Architecturally, yes — most modern LLMs use a variant of the Transformer architecture (introduced by Google in 2017). But they differ enormously in training data, training methodology, size, and fine-tuning. GPT-4, Claude, Gemini, Llama, and DeepSeek all use Transformers but produce very different outputs because of different training choices.
Q: Can I run an LLM on my own computer?
Yes, but with caveats. Small models, roughly 7 billion parameters or fewer (such as Llama 3.2 3B or Microsoft Phi-3), can run offline on a modern laptop with 8GB+ RAM using tools like Ollama or LM Studio. Frontier models like GPT-4 require data-center-scale hardware and cannot run locally. There is a growing ecosystem of capable small models that work offline and keep your data on your own machine, at the cost of some capability compared with cloud models.
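Whether a model fits on a laptop comes down to a rough rule: the weights alone take about (parameters × bytes per parameter). This Python sketch applies that rule at full 16-bit precision and at 4-bit quantization, a reduced-precision format that local tools commonly use. The precision table is a simplification. Real memory use is higher because of activations, the context cache, and runtime overhead. GPT-4's size is the unofficial estimate quoted earlier.

```python
# Lower-bound memory for model weights: parameters x bytes per parameter.
# Real usage is higher (activations, context cache, runtime overhead).
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "4-bit": 0.5}

def weight_memory_gb(n_params: float, precision: str) -> float:
    return n_params * BYTES_PER_PARAM[precision] / 1e9

models = [("~3B model", 3e9), ("7B model", 7e9), ("GPT-4 (est.)", 1.76e12)]
for name, n in models:
    for precision in ("fp16", "4-bit"):
        gb = weight_memory_gb(n, precision)
        print(f"{name:>13} @ {precision:>5}: {gb:,.1f} GB")
```

A 7B model at 4-bit needs about 3.5 GB for its weights, which is why it fits in an 8GB laptop. GPT-4's estimated size would need hundreds of gigabytes even at 4-bit.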
Q: How does an LLM know the answer without searching the internet?
LLMs do not search the internet or a database for answers. They predict the most likely next word based on patterns learned from billions of text examples during training. Think of it as advanced autocomplete, not a search engine. This is why LLMs can sound confident even when wrong.
Q: What is the difference between a token and a word?
A token is roughly 0.75 words on average in English text. LLMs process text in tokens, not whole words. API usage is typically billed per token, counting both the input you send and the output the model generates; output tokens often cost more than input tokens.
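The words-to-tokens rule of thumb turns into a quick cost estimate. This is a rough Python sketch. The per-million-token prices and the 300-token answer length are hypothetical, so check your provider's price list. A real tokenizer gives exact counts; for example, the trace earlier splits the sample question into 7 tokens, while the rule of thumb estimates 8.

```python
import math

WORDS_PER_TOKEN = 0.75  # rule of thumb for English; varies by tokenizer

def estimate_tokens(text: str) -> int:
    """Rough token count from a whitespace word count."""
    words = len(text.split())
    return math.ceil(words / WORDS_PER_TOKEN)

def estimate_cost(input_tokens: int, output_tokens: int,
                  usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Cost in USD, given prices per million tokens (hypothetical here)."""
    return (input_tokens * usd_per_m_input
            + output_tokens * usd_per_m_output) / 1_000_000

prompt = "What is the capital of Japan?"
n_in = estimate_tokens(prompt)  # 6 words -> about 8 tokens
cost = estimate_cost(n_in, 300, usd_per_m_input=3.0, usd_per_m_output=15.0)
print(n_in, f"${cost:.6f}")  # → 8 $0.004524
```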