AI Hallucination Examples: When ChatGPT, Claude, and Gemini Get It Wrong

[Infographic: AI hallucination examples, including the fake legal cases ChatGPT generated and Deloitte's $440,000 report errors]

AI hallucinations cost businesses real money. In October 2025, Deloitte had to refund part of a $440,000 government contract after its report contained fabricated academic sources generated by AI. A U.S. lawyer faced a 90-day suspension for citing fake legal cases that ChatGPT invented. Google’s parent company Alphabet lost $100 billion in market value when its Bard chatbot provided incorrect information in a promotional video.

These aren’t isolated incidents. Damien Charlotin’s AI Hallucination Cases database now tracks 486 documented cases worldwide, with 324 occurring in U.S. courts alone. As of May 2025, lawyers submit hallucinated content at a rate of two to three cases per day, up from two per week just months earlier.

Understanding what AI hallucinations look like in practice matters whether you’re a developer building AI systems, a business leader evaluating AI adoption, or a professional using ChatGPT for work. This article examines real hallucination examples from 2025, explains why they happen, and shows you how to spot fabricated AI content before it causes problems.


What Are AI Hallucinations? The Technical Reality

AI hallucinations occur when large language models like ChatGPT, Claude, or Gemini confidently generate information that sounds plausible but is completely false, misleading, or fabricated. Unlike simple mistakes, hallucinations get presented with the same confidence as verified facts, making them particularly dangerous.

The technical cause is straightforward: LLMs don’t actually “know” anything. They predict the next word based on statistical patterns learned from training data. When asked about topics where training data is sparse or contradictory, the model fills gaps with plausible-sounding content that matches learned patterns. The result is fiction presented as fact.

Research from Anthropic on their Claude model revealed something fascinating about how hallucinations actually work. Rather than being hardwired to hallucinate, Claude’s default behavior is to refuse answering questions it’s uncertain about. Hallucinations occur when internal circuits that normally say “I don’t know” fail to activate properly, allowing the model to generate responses without adequate grounding.

Think of it like a student who confidently answers an exam question by making educated guesses based on related material they studied, rather than admitting they don’t know. The answer sounds knowledgeable and uses correct terminology, but the specific facts are invented.

Current AI models achieve varying hallucination rates. According to 2025 benchmarks, Google’s Gemini-2.0-Flash-001 shows a 0.7% hallucination rate, while ChatGPT’s GPT-4o sits at approximately 1.5%, and Claude models range from 4.4% to 10.1% depending on the version. While these numbers represent massive improvements from the 21.8% rate in 2021, even a 1.5% error rate means roughly 15 out of every 1,000 responses contain fabricated information.


The Lawyer Who Cited Fake Legal Cases: ChatGPT’s Courtroom Disaster

[Image: fake legal case citations generated by ChatGPT that led to lawyer sanctions in U.S. federal court]

Perhaps the most publicized hallucination example involved a U.S. lawyer who used ChatGPT to help draft court filings and ended up citing six completely fictitious legal cases. When opposing counsel challenged the citations, the lawyer claimed he didn’t realize ChatGPT was a generative language tool rather than a reliable legal database.

The federal judge was not amused. The case resulted in sanctions and a standing order requiring anyone appearing before that court to attest whether AI was used to draft filings, with any AI-generated content flagged for accuracy verification.

This wasn’t an isolated incident. By May 2025, legal AI expert Damien Charlotin’s database documented over 30 instances just in that month where lawyers submitted AI-hallucinated content. Legal observers fear the actual number is significantly higher, as many cases settle or get corrected before public records capture the errors.

In Colorado, a Denver attorney accepted a 90-day suspension after an investigation found he’d texted a paralegal about fabrications in a motion that ChatGPT helped draft, admitting “like an idiot” he hadn’t checked the work. The lawyer had specifically denied using AI when initially questioned, making the ethical violation more severe.

What makes legal hallucinations particularly problematic is that fabricated case citations can mislead judges and influence actual disputes between real parties. As University of Miami law professor Christina Frohock noted in her paper “Ghosts at the Gate: A Call for Vigilance Against AI-Generated Case Hallucinations,” if fake cases become prevalent and effective, they undermine the integrity of the legal system and erode trust in judicial orders.

Testing revealed the scope of the problem. When SCOTUSblog tested ChatGPT with 50 questions about the Supreme Court in January 2023, only 21 answers were correct. One egregiously wrong response claimed liberal Justice Ruth Bader Ginsburg had dissented in Obergefell v. Hodges, the 2015 ruling recognizing constitutional rights to same-sex marriage. Ginsburg actually voted with the majority.

Legal AI tools hallucinate at rates far higher than acceptable for responsible legal practice. The combination of high stakes, reliance on precise precedent, and the ease with which AI can fabricate convincing case citations creates a perfect storm for professional consequences.


The $440,000 Government Report: Deloitte’s Phantom Citations

In October 2025, a $440,000 report that Deloitte prepared for the Australian government was found to contain multiple hallucinations, including non-existent academic sources and a fabricated quote from a federal court judgment. The consulting giant later submitted a revised report with the errors removed and issued a partial refund.

After a University of Sydney academic flagged multiple errors and urged investigation, Deloitte acknowledged using a generative AI tool to fill “traceability and documentation gaps” in its analysis. The revised version stripped more than a dozen bogus references and footnotes.

While officials claimed the report’s substantive recommendations remained unchanged, the incident undermines trust in consultancy practice, particularly when paid expert reports rely partially on flawed AI-generated content. This case illustrates how hallucinations can slip past multiple review stages at major professional services firms.

The phantom citations weren’t obviously fake. They followed proper academic citation format, referenced plausible-sounding journals and authors, and appeared alongside legitimate citations. This mixing of real and fabricated sources makes detection particularly difficult without systematic fact-checking of every single reference.


Google’s $100 Billion Loss: When Bard Got Astronomy Wrong

Google’s parent company Alphabet lost $100 billion in market value after its AI chatbot Bard provided incorrect information in a promotional video. The ad showed Bard mistakenly claiming that the James Webb Space Telescope had taken the very first pictures of a planet outside our solar system.

This was factually wrong. The first exoplanet images came from other instruments years earlier. The error occurred in promotional material designed to showcase Bard’s capabilities, making the mistake particularly embarrassing for Google. The market responded swiftly, valuing the reputational damage and competitive concerns at $100 billion in lost market capitalization.

This example demonstrates that hallucinations don’t just affect end users. They can materially impact companies deploying AI, particularly when errors appear in high-visibility contexts. Google’s engineering resources and review processes failed to catch a basic factual error that damaged investor confidence in their AI capabilities.


Air Canada’s Chatbot: The Bereavement Fare That Never Existed

In February 2024, Canadian airline Air Canada was ordered by the Civil Resolution Tribunal to pay damages to a customer and honor a bereavement fare that its AI chatbot had fabricated. The chatbot confidently told a customer they could apply for a bereavement discount retroactively after purchasing a ticket.

This policy didn’t exist. When the customer tried to claim the discount following their purchase, Air Canada refused, claiming the chatbot was wrong. The customer sued, and the tribunal ruled in their favor.

Air Canada’s defense argued the chatbot was a “separate legal entity responsible for its own actions.” The tribunal rejected this argument, holding the airline accountable for information provided through its official channels. This case established an important precedent: companies cannot disclaim responsibility for their AI systems’ statements.

The ruling matters because it assigns liability for AI hallucinations to the deploying organization, not the AI system itself. Businesses implementing customer-facing AI must ensure accuracy or face legal consequences for fabricated information that customers reasonably rely upon.


Chicago Sun-Times: The Fake Book List That Made It to Print

In 2025, Chicago Sun-Times readers discovered their “Summer Reading List” included fake books attributed to real authors. Only 5 of the 15 titles were genuine works. The remaining 10 were fabrications with convincing descriptions.

The newspaper’s management explained the list came from another publisher that acknowledged using AI to generate it. Although the newspaper removed the list from its online edition, readers of the printed version expressed disappointment about paying for AI-generated content that hadn’t been fact-checked.

The fake titles sounded plausible. They matched the writing style and genre of their attributed authors. Descriptions included plot summaries, themes, and even fictional publication dates. Without checking ISBN numbers or publisher records, the books appeared legitimate.

This case shows how hallucinations can affect publishing and media, particularly when AI content gets incorporated without verification. The reputational damage to a newspaper that prides itself on editorial standards exceeded the minor time savings from AI-assisted content generation.


Medical AI Hallucinations: Whisper’s Dangerous Fabrications

OpenAI’s Whisper speech-to-text model, increasingly adopted in hospitals, has been found to hallucinate on many occasions. An Associated Press investigation revealed that Whisper invents false content in transcriptions, inserting fabricated words or entire phrases not present in audio recordings.

The errors included attributing race, violent rhetoric, and non-existent medical treatments to recordings. Although OpenAI advised against using Whisper in “high-risk domains,” over 30,000 medical workers continue using Whisper-powered tools to transcribe patient visits.

Medical transcription errors carry serious consequences. Wrong medication names, incorrect dosages, or fabricated symptoms in medical records can lead to improper treatment. When doctors rely on AI-generated transcripts without listening to original recordings, hallucinations enter patient health records as fact.

In July 2025, researchers from Flinders University found that leading AI models could be manipulated to produce dangerously false medical advice. Chatbots stated that sunscreen causes skin cancer and linked 5G to infertility, accompanied by convincing but entirely fabricated citations from reputable journals like The Lancet. Without robust fact-checking, this misinformation could spread through trusted platforms and endanger public health.


The Mathematical Hallucination: When AI Can’t Calculate

Large language models consistently struggle with mathematics, creating a specific category of hallucinations around numerical reasoning. When ChatGPT-4o was asked whether 2,089 is prime, it confidently stated the number is not prime because it equals 11 × 19 × 9.

This answer is wrong on multiple levels. First, 11 × 19 × 9 equals 1,881, not 2,089. Second, 2,089 is actually prime. The model admitted its mistake when corrected but then gave another wrong answer when the question was rephrased.
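
For contrast, the check itself takes only a few lines of ordinary code. This is a quick sketch (not part of the original exchange) showing why numerical claims are among the cheapest AI outputs to verify:

```python
# Quick verification of both claims: does 11 * 19 * 9 equal 2,089, and is 2,089 prime?
def is_prime(n: int) -> bool:
    if n < 2:
        return False
    i = 2
    while i * i <= n:  # only need to test divisors up to the square root
        if n % i == 0:
            return False
        i += 1
    return True

print(11 * 19 * 9)      # 1881, not 2089
print(is_prime(2089))   # True: 2,089 has no divisors other than 1 and itself
```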

Mathematical hallucinations occur because LLMs trained on text data don’t actually perform calculations. They pattern-match against similar mathematical expressions seen during training. For simple arithmetic, this works reasonably well. For less common calculations, the model guesses based on what “looks right” mathematically.

This limitation matters for any application requiring numerical accuracy: financial modeling, scientific research, engineering calculations, or even basic accounting. AI tools marketed as productivity enhancers can introduce serious errors if users don’t verify mathematical outputs.


The Visa Hallucination: Travel Advice That Stranded a Passenger

An Australian traveler made headlines after getting stuck at the airport while trying to travel to Chile. ChatGPT had assured him he didn’t need a visa to enter Chile, but he did. This hallucination has since been corrected, but not before causing real travel disruption and expense.

Travel requirements change frequently and vary based on passport nationality, length of stay, and purpose of visit. AI models trained on historical data may not reflect current visa policies. When travelers trust AI-generated advice without confirming with official sources, they risk denied boarding, deportation, or entry refusal.

This example shows how hallucinations extend beyond professional contexts into everyday consumer use cases. People increasingly query AI chatbots for practical life decisions where accuracy matters. Wrong travel advice, incorrect legal information, or bad medical guidance can all cause real harm.


Why AI Hallucinations Happen: The Technical Causes

Understanding the root causes helps explain why even sophisticated models hallucinate and what we can realistically expect from AI systems.

[Chart: 2025 hallucination rates by model: Gemini 0.7%, ChatGPT 1.5%, Claude 4-10%]

Prediction, Not Knowledge

Large language models don’t possess knowledge the way humans do. They predict the next word based on patterns learned from massive text datasets. If training data is sparse or contradictory for a given topic, the model fills gaps with plausible-sounding content that matches learned patterns.

Think of it as very sophisticated autocomplete. The model doesn’t verify whether statements are true. It generates what statistically should come next based on similar texts it saw during training.
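
The sketch below makes that concrete. It assumes the Hugging Face transformers library, with GPT-2 standing in as a small, freely downloadable example of a text-prediction model; the model only scores likely continuations, and nothing in this step checks whether any of them is true.

```python
# Minimal sketch of next-token prediction (assumes: pip install torch transformers).
# GPT-2 is used only as a small stand-in for larger chat models.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The first image of an exoplanet was taken by the"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, sequence_length, vocab_size)

# Probability distribution over the next token only; truth never enters the picture.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)

for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode([token_id.item()]):>15}  p={prob.item():.3f}")
```

Whichever continuation scores highest is simply the one most common in similar training text, which is exactly how a confident but false claim about a telescope's "firsts" can emerge.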

Training Data Limitations

LLMs learn from internet text, academic papers, books, and other sources. This training data contains both facts and fiction, accurate information and misinformation, current data and outdated content. The model can’t inherently distinguish between reliable and unreliable sources.

For specialized domains like law, medicine, or recent technical developments, training data may be limited. The model then relies more heavily on inference from related domains, increasing hallucination risk.

Lack of Real-Time Information

Most models have fixed training data cutoffs. GPT-4’s knowledge generally extends to April 2023, though some versions include more recent data through search integrations. Claude and other models have similar limitations. When asked about events or information after their training cutoff, models must guess or refuse to answer.

Even models with web search integration can hallucinate if search results are ambiguous or if the model misinterprets retrieved information.

Confidence Without Verification

AI models generate responses with consistent tone and formatting regardless of certainty. A completely fabricated answer gets presented with the same confidence as a well-established fact. The model doesn’t internally track which statements it’s certain about versus which it’s guessing.

This uniform confidence is particularly problematic because users can’t distinguish reliable from unreliable outputs without external verification.


How to Spot AI Hallucinations: Practical Detection Methods

Identifying fabricated content before it causes problems requires systematic verification approaches.

Verify Citations and Sources

Check that academic papers, legal cases, news articles, or studies actually exist. Search for exact titles, author names, and publication details. Many hallucinated citations sound plausible but don’t exist. DOI numbers for academic papers should resolve to real articles. Case law should appear in legal databases.
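
Part of that check can even be scripted. The sketch below assumes the Python requests library and the doi.org resolver's standard behavior of redirecting registered DOIs and returning 404 for unknown ones; a DOI that resolves is necessary but not sufficient evidence that a citation is real, since the title and authors still need to match.

```python
# Rough sketch: does a DOI resolve at doi.org? (assumes: pip install requests)
import requests

def doi_resolves(doi: str) -> bool:
    # doi.org redirects registered DOIs to the publisher and returns 404 otherwise.
    response = requests.head(f"https://doi.org/{doi}", allow_redirects=False, timeout=10)
    return 300 <= response.status_code < 400

print(doi_resolves("10.1038/s41586-020-2649-2"))          # a real DOI (the NumPy paper in Nature)
print(doi_resolves("10.9999/definitely-not-a-real-doi"))  # should print False
```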

Cross-Reference Facts

Don’t rely on AI alone for important information. Verify key facts through authoritative sources. If AI claims a specific statistic, find the original source. If it states a historical fact, confirm through reliable references.

Look for Inconsistencies

Hallucinations often contain internal contradictions or statements that don’t logically fit together. If something seems off or contradictory, investigate further. Real information tends toward consistency, while fabricated content may mix incompatible details.

Check Recent Information Carefully

AI models struggle most with recent events or rapidly changing information. Be especially skeptical of claims about current news, recent legal changes, or emerging scientific findings. These areas carry the highest hallucination risk.

Use Multiple AI Systems

Different models hallucinate differently. If something is important, check the same query across ChatGPT, Claude, and Gemini. Consistent answers across models suggest accuracy, while contradictory responses indicate uncertainty where hallucination is likely.
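
A skeleton of that workflow is sketched below. The ask_* callables are hypothetical placeholders for calls you would make with each vendor's SDK; the point is the pattern (same question, several models, then comparison), not any particular API.

```python
# Cross-model consistency check; the lambdas are stubs standing in for real API calls.
from typing import Callable, Dict

def cross_check(question: str, models: Dict[str, Callable[[str], str]]) -> Dict[str, str]:
    """Ask the same question of several models and collect the answers."""
    return {name: ask(question) for name, ask in models.items()}

answers = cross_check(
    "Does an Australian citizen need a visa to visit Chile as a tourist?",
    {
        "chatgpt": lambda q: "stub answer from model A",
        "claude":  lambda q: "stub answer from model B",
        "gemini":  lambda q: "stub answer from model C",
    },
)
for model, answer in answers.items():
    print(f"{model}: {answer}")
# Contradictory answers are the signal to stop and consult an authoritative source.
```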

Ask for Uncertainty

Request that AI indicate confidence levels or admit uncertainty. Prompts like “Only answer if you’re certain, otherwise say you don’t know” can reduce hallucinations. Well-designed models like Claude increasingly acknowledge uncertainty rather than guess.


The Current State: Are Hallucinations Improving?

Hallucination rates have dropped dramatically. The 21.8% hallucination rate in 2021 fell to just 0.7% in Google’s Gemini-2.0-Flash by 2025. That represents a 96% improvement through better training data, improved architecture, and techniques like Retrieval-Augmented Generation (RAG).

However, even low hallucination rates matter at scale. A 1.5% error rate means 15 fabrications per 1,000 responses. For applications handling millions of queries, that’s thousands of hallucinations daily.

Hallucination rates vary dramatically by domain. According to 2025 testing, AI models hallucinate legal information 6.4% of the time and programming content 5.2% of the time, while general knowledge queries show much lower error rates around 1-2%.

Research from organizations like Anthropic shows promise. Their interpretability research identified specific neural circuits that cause Claude to decline answering questions unless it has adequate information. Understanding these mechanisms points toward engineering solutions that make models more honest about uncertainty.

RAG (Retrieval-Augmented Generation) currently provides the most effective hallucination reduction, cutting false responses by approximately 71% on average. By grounding AI responses in retrieved documents from trusted sources, RAG ensures models work from verified information rather than relying solely on training data patterns.
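
At its core, the pattern looks like the sketch below: retrieve relevant passages from a trusted source, then instruct the model to answer only from them. The keyword retriever and two-line knowledge base are toy stand-ins (real systems typically use embedding-based vector search over curated documents), but the grounding step is the same.

```python
# Toy illustration of the RAG pattern: retrieve trusted passages, then build a grounded prompt.
KNOWLEDGE_BASE = [
    "Bereavement policy: discounts cannot be applied retroactively after purchase.",
    "Baggage policy: two checked bags are included on international fares.",
]

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Toy retriever: rank documents by keyword overlap with the query."""
    terms = set(query.lower().split())
    return sorted(docs, key=lambda d: len(terms & set(d.lower().split())), reverse=True)[:k]

def grounded_prompt(query: str, passages: list[str]) -> str:
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using ONLY the context below. If the context does not contain "
        f"the answer, say you don't know.\nContext:\n{context}\n\nQuestion: {query}"
    )

# This prompt, not the bare question, is what gets sent to the model.
print(grounded_prompt(
    "Can I apply for a bereavement discount after buying my ticket?",
    retrieve("bereavement discount after purchase", KNOWLEDGE_BASE),
))
```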


What This Means for AI Users and Developers

Understanding AI hallucinations isn’t just academic. It has practical implications for anyone using or building AI systems.

For Business Users

Never use AI-generated content without verification for high-stakes applications like legal documents, medical advice, financial analysis, or regulatory compliance. The time saved by AI gets lost many times over when hallucinations cause problems. Implement review processes where human experts verify AI outputs before they’re acted upon.

Consider AI as a draft generator rather than authoritative source. Let it produce initial content, then fact-check thoroughly. This workflow captures efficiency benefits while managing hallucination risks.

For Developers

Implement RAG systems that ground AI responses in verified documents. Connect models to authoritative knowledge bases relevant to your application domain. This prevents models from guessing when they should instead retrieve.

Fine-tune models on high-quality, domain-specific data to reduce gaps in knowledge. RLHF (Reinforcement Learning from Human Feedback) helps train models to prefer truthful responses over plausible-sounding fabrications.

Add confidence scoring to outputs, letting users know when the model is uncertain. Design prompts that encourage models to admit uncertainty rather than guess. Better to get “I don’t know” than convincing fiction.
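
One simple version of confidence scoring uses per-token log probabilities, which some APIs (OpenAI's Chat Completions among them) can return alongside the generated text. The sketch below is illustrative only: the thresholds are assumptions, and token-level confidence is a weak signal on its own, so it should route answers to review rather than certify them.

```python
# Sketch: turn per-token log probabilities into a crude confidence label.
import math
from typing import List

def average_token_probability(token_logprobs: List[float]) -> float:
    """Geometric-mean probability per generated token."""
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))

def confidence_label(token_logprobs: List[float], low: float = 0.5, high: float = 0.8) -> str:
    p = average_token_probability(token_logprobs)
    if p >= high:
        return "high confidence"
    if p >= low:
        return "medium confidence: consider review"
    return "low confidence: route to human review"

# Example with made-up logprob values for a short answer:
print(confidence_label([-0.05, -0.10, -0.02, -1.90]))  # medium confidence: consider review
```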

For Everyone

Maintain healthy skepticism about AI outputs. Verify important information through authoritative sources. Understand that current AI systems are impressive pattern matchers, not reliable fact databases. They should augment human intelligence, not replace human judgment.

The gap between AI capabilities and user expectations creates the most dangerous situations. When people trust AI outputs as facts without verification, hallucinations cause real damage. Education about AI limitations is as important as celebrating AI successes.


Conclusion: The Hallucination Challenge Continues

AI hallucinations represent the gap between what AI appears capable of and what it actually knows. Models that converse fluently, write convincingly, and answer confidently can seem omniscient. The reality is more modest: they’re extremely sophisticated prediction engines that sometimes fill knowledge gaps with fabrications.

The examples covered here demonstrate real costs of hallucinations: professional sanctions, financial losses, reputational damage, and erosion of trust. As AI adoption accelerates across industries, understanding hallucination risks becomes essential for responsible deployment.

Progress is real. Hallucination rates have dropped 96% in four years. Techniques like RAG provide practical mitigation. Research into AI interpretability promises future improvements. But for now, the challenge persists at low but non-zero rates that matter at scale.

The solution isn’t to avoid AI but to use it appropriately. Treat AI as a powerful drafting and brainstorming tool that requires human oversight. Verify important information. Build systems with hallucination safeguards. Maintain the human judgment layer that AI can’t yet replace.

As long as large language models generate text by predicting probable next words rather than reasoning from verified knowledge, some level of hallucination will persist. Understanding this limitation helps us capture AI’s benefits while avoiding its risks.


Frequently Asked Questions About AI Hallucinations

What is an AI hallucination?

An AI hallucination occurs when a large language model like ChatGPT, Claude, or Gemini generates information that sounds plausible and confident but is actually false, misleading, or completely fabricated. Unlike simple errors, hallucinations are presented with the same confidence as verified facts, making them difficult to detect without external verification. The term draws an analogy with human psychology, though AI hallucinations involve erroneously constructed responses rather than perceptual experiences.

Why do AI models hallucinate?

AI models hallucinate because they predict the next word based on statistical patterns from training data rather than actually “knowing” facts. When training data is sparse, contradictory, or the model lacks information about a topic, it fills gaps with plausible-sounding content that matches learned patterns. The model doesn’t verify truth or maintain awareness of which statements it’s certain about versus which it’s guessing. This fundamental architecture means some level of hallucination persists even in advanced models.

Which AI chatbot hallucinates the most?

According to 2025 benchmarks, hallucination rates vary significantly between models. Google’s Gemini-2.0-Flash-001 shows the lowest rate at 0.7%, while ChatGPT’s GPT-4o sits at approximately 1.5%, and Claude models range from 4.4% to 10.1% depending on the version. However, hallucination rates also vary by domain. All models hallucinate legal information and programming content at higher rates (5-7%) than general knowledge queries. The specific model and use case matter more than overall rankings.

Can you give examples of ChatGPT hallucinations?

Real ChatGPT hallucination examples include a U.S. lawyer who submitted court filings citing six completely fictitious legal cases that ChatGPT invented, resulting in sanctions and a 90-day suspension. When SCOTUSblog tested ChatGPT with Supreme Court questions, it falsely claimed Justice Ruth Bader Ginsburg dissented in Obergefell v. Hodges when she actually voted with the majority. ChatGPT told an Australian traveler he didn’t need a visa for Chile when he actually did, causing him to get stranded at the airport. In mathematical calculations, ChatGPT-4o incorrectly stated that 2,089 is not prime because it equals 11 × 19 × 9, which is wrong on multiple levels.

How can I detect AI hallucinations?

Detect AI hallucinations by systematically verifying outputs. Check that cited sources actually exist by searching for exact titles, authors, and publication details. Cross-reference important facts through authoritative sources rather than trusting AI alone. Look for internal inconsistencies or contradictions within responses. Be especially skeptical of recent information, as models struggle most with current events. Query the same question across multiple AI systems; consistent answers suggest accuracy while contradictions indicate uncertainty where hallucination is likely. Request that AI indicate confidence levels or admit uncertainty rather than guessing.

Are AI hallucinations getting better or worse?

AI hallucinations are dramatically improving. Hallucination rates dropped from 21.8% in 2021 to as low as 0.7% in Google’s Gemini-2.0-Flash by 2025, representing a 96% improvement. Better training data, improved model architecture, and techniques like Retrieval-Augmented Generation (RAG) have driven this progress. However, even low rates matter at scale. A 1.5% error rate means 15 fabrications per 1,000 responses, translating to thousands of hallucinations daily for applications handling millions of queries. Certain domains like legal information and programming still show higher error rates around 5-7%.

What industries are most affected by AI hallucinations?

Legal services face the highest impact, with 486 documented cases of lawyers submitting hallucinated content to courts as of 2025. Healthcare sees serious consequences from medical transcription errors and fabricated medical advice. Consulting and professional services experience reputational damage, as demonstrated by Deloitte’s $440,000 report containing phantom citations. Publishing and media face credibility issues when AI-generated content includes fake information, like the Chicago Sun-Times’ fabricated book list. Aviation and travel industries encounter customer service problems when chatbots provide incorrect policy information, as Air Canada discovered.

How do companies reduce AI hallucinations?

Companies reduce hallucinations primarily through Retrieval-Augmented Generation (RAG), which connects AI models to verified knowledge bases and cuts false responses by approximately 71%. Fine-tuning models on high-quality, domain-specific data fills knowledge gaps that lead to guessing. Reinforcement Learning from Human Feedback (RLHF) trains models to prefer truthful responses over plausible fabrications. Implementing confidence scoring lets users know when models are uncertain. Designing prompts that encourage models to admit “I don’t know” rather than guess prevents fabricated responses. Human review processes catch errors before AI outputs reach end users.

What is the difference between an AI error and a hallucination?

An AI error is a straightforward mistake where the model provides incorrect information, often due to outdated training data or misunderstanding a query. A hallucination involves the model confidently generating entirely fabricated information, such as inventing non-existent academic papers, creating fake legal cases, or citing sources that don’t exist. Hallucinations are more problematic because they’re presented with the same confidence as facts, include plausible details that make them seem real, and often mix with accurate information making detection difficult. Errors might be caught through basic fact-checking, while hallucinations require systematic verification of sources and citations.

Should I stop using AI because of hallucinations?

No, you shouldn’t stop using AI, but you should use it appropriately with awareness of hallucination risks. Treat AI as a powerful drafting and brainstorming tool that requires human oversight rather than an authoritative fact source. Never use AI-generated content without verification for high-stakes applications like legal documents, medical advice, financial analysis, or regulatory compliance. Implement review processes where experts verify AI outputs before action. Use AI to increase productivity and generate initial content, then fact-check thoroughly. Understanding limitations allows you to capture AI’s substantial benefits while managing risks through proper workflows.


Sources Referenced:

  • Anthropic Research – Claude Interpretability Studies
  • Damien Charlotin – AI Hallucination Cases Database
  • Vectara – LLM Hallucination Leaderboard
  • Associated Press – Whisper AI Investigation
  • Legal Case Documentation – U.S. Federal Courts