AI Study Online
AI Basics

AI Hallucinations Explained: Why ChatGPT Makes Stuff Up (With Real Examples)

5 min read

What Is an AI Hallucination?

An AI hallucination is when a language model generates information that is confidently incorrect. The model states falsehoods as facts, with the same grammatical certainty as true statements. It does not know it is wrong — because from its perspective, it is simply predicting the most probable next token.

This is not a bug that can be "fixed" with better code. It is a fundamental property of how LLMs work. They are next-token predictors, not fact-retrieval systems. Hallucination is the price we pay for a model that can write a poem, explain quantum physics, and draft a business plan — because the same mechanism that enables creativity also enables fabrication.

3 Real Documented Cases

Case 1: The Lawyer Who Filed Fake Cases

In 2023, a New York lawyer named Steven Schwartz used ChatGPT to prepare a legal brief. ChatGPT cited six court cases that did not exist — complete with docket numbers, judges, and legal reasoning. Schwartz filed the brief without verifying. The opposing counsel could not find any of the cases. When the judge asked, Schwartz admitted he had not verified the citations. He was sanctioned (fined) by the court.

What happened: ChatGPT had seen countless examples of legal citations in its training data. When asked for relevant cases, it generated plausible-looking citations — because that is the most statistically likely pattern. It had no way of knowing these cases did not exist.

Case 2: The Hallucinated Product Description

What happened: The AI combined patterns from other product descriptions (temperature-sensitive strips are real in other products) with the prompt's keywords. It "filled in" the details with plausible-sounding features that did not exist.

Case 3: Invented Scientific Citations

Researchers have documented multiple cases where AI models generate fake academic citations. A 2024 study found that when LLMs were asked to summarize research papers on specific topics, they invented non-existent papers with plausible titles, author names, and journal names — including publishing in real journals but with fabricated volume and page numbers.

What happened: The model learned the structure of academic citations (Author, Year, Title, Journal, Volume, Pages) and generated text matching that structure. The content was fabricated because the model does not have a database of real papers — it has a statistical pattern of what citations look like.

Why Does Hallucination Happen?

Go back to the core mechanism from Part 1: token prediction. When you ask "What is the capital of France?" the model's training data contains the pattern "Paris" following "capital of France" millions of times. The probability for "Paris" is ~95%, so the answer is correct.

But when you ask something that looks like a factual question but has no clear statistical answer in the data, the model does not say "I don't know." It generates the most plausible-sounding sequence it can. Several factors increase hallucination risk:

  • Obscure topics: Less training data means weaker statistical patterns, so the model fills in what is plausible.
  • Specific numbers and dates: LLMs are notoriously bad at exact numbers because token prediction does not favor arithmetic accuracy.
  • Recent events: If the event happened after the model's training cutoff, the model cannot know about it — but it may fabricate a plausible-sounding answer rather than admitting ignorance.
  • Ambiguous prompts: Vague questions give the model more room to fill gaps with invented details.

How to Detect Hallucinations

Technique 1: Cross-check facts. Treat every specific claim from an LLM as potentially fabricated until verified. Dates, statistics, citations, and quotes are the most commonly hallucinated items.

Technique 2: Ask for sources. Say "Can you provide specific sources for that claim?" If the model produces citations, verify them independently. Many users have caught hallucinations this way.

Technique 3: Use Perplexity for factual queries. Perplexity.ai is designed to ground responses in web search results. It is not immune to hallucination but includes citations you can click to verify. For factual research, Perplexity outperforms ChatGPT's standalone knowledge.

Technique 4: Ask the model to self-critique. A known workaround: after getting an answer, prompt "Are you sure about that? Double-check." This sometimes causes the model to reconsider high-probability but incorrect token sequences.

How to Reduce Hallucinations in Your Own Use

These techniques will not eliminate hallucinations (nothing will), but they reduce the rate significantly:

  1. Provide context. Do not ask "What are the key findings?" Say "Based on the transcript I just provided, what are the key findings?" Grounding the model in provided text reduces reliance on its statistical guesses.
  2. Ask for probabilities. "Rate your confidence in this answer from 1-10 and explain why." Models tend to be more cautious when explicitly asked about confidence.
  3. Break complex questions into steps. Instead of "Analyze this contract," ask "First, list all dates mentioned. Then summarize each clause separately." Step-by-step instructions reduce the model's need to "fill in" missing context.
  4. Use retrieval-augmented generation (RAG) tools. Tools like NotebookLM, Claude Projects, or custom GPTs let you upload documents that the model uses as its source of truth. When the model is constrained to your documents, hallucination drops dramatically.

What AI Companies Are Doing About It

The industry is actively working on the problem. The main approaches in 2026:

  • Retrieval-Augmented Generation (RAG): Before generating an answer, the model searches a knowledge base for relevant documents and uses them as context. This grounds the response in verified information. Every major AI platform now offers some form of RAG.
  • Grounding with web search: ChatGPT can now search the web, and Google Gemini is natively grounded in Google Search. This means the model can check facts against live sources — but only when search is explicitly enabled.
  • Constitutional AI and training improvements: Anthropic's Constitutional AI approach and improved post-training techniques have reduced hallucination rates in Claude compared to earlier models. Independent benchmarks show Claude 3.5 Sonnet hallucinates approximately 40-60% less than GPT-3.5 on factual questions.
  • Citation requirements: Modern models can be prompted to cite sources from their context, but this is a band-aid — the citation itself can be hallucinated.

The honest truth: Hallucination cannot be eliminated from pure LLMs. The mechanism that generates novel text is the same mechanism that generates false text. The solution is to combine LLMs with external tools (search, databases, verification systems) — not to rely on the model's "knowledge" alone.

FAQ

Q: Does Claude hallucinate less than ChatGPT?

In independent benchmarks (such as Vectara's Hallucination Leaderboard and LMSYS evaluations), Claude 3.5 Sonnet and GPT-4o have comparable hallucination rates on factual tasks, with Claude showing a slight edge on summarization tasks. Both hallucinate significantly less than GPT-3.5 or older models. However, no model is immune — you should verify critical information from any AI.

Q: Can I train a model to not hallucinate on my specific data?

Yes, this is called fine-tuning. If you have a dataset of verified Q&A pairs in your domain, you can fine-tune a model to be more accurate on those specific types of questions. This does not eliminate hallucinations on out-of-domain questions but can dramatically improve accuracy on your use case. Tools like LlamaFactory and services like OpenAI's fine-tuning API make this accessible without being a machine learning expert.

Q: Is a hallucination the same as a bug in the software?

No. A software bug is when code does not do what it was designed to do. Hallucination is when the model does exactly what it was designed to do (predict likely tokens) but that behavior produces an incorrect statement from a human perspective. It is a feature of the architecture, not a flaw in the implementation. This is why "fixing" hallucinations is fundamentally harder than fixing a normal software bug.

Frequently Asked Questions

Q: Why does ChatGPT make up facts when it sounds so confident?

LLMs do not distinguish between true and false — they only predict the statistically most likely next word. When the model lacks reliable training data, it still generates a plausible-sounding answer. Confidence is built into the architecture.

Q: How do I reduce hallucinations when using AI for research?

Ask the AI to cite sources or show its reasoning step by step. Use chain-of-thought prompting. For important facts, cross-check with a web search. Tools like Perplexity hallucinate less because they search the web in real time.

Q: Are some AI models more prone to hallucinations than others?

Yes. Smaller and older models hallucinate more. Claude and GPT-4 generally hallucinate less than GPT-3.5 or Llama 2. Models that can search the web hallucinate less on recent topics.

Share this article

Related Articles