OpenAI's Latest Research: Why GPT-5 and Other LLMs Still Hallucinate

OpenAI has released a new research paper stating that even though large language models (LLMs) like GPT-5 have made significant progress, AI hallucinations remain a fundamental issue that may never be completely eliminated. The research team showed through experiments that models can confidently give completely incorrect answers to certain questions, and it proposed a reform of the evaluation mechanism to help reduce models' tendency to guess at random.

Researchers tested an AI chatbot with several questions, and every answer was wrong.

The researchers asked a widely used chatbot for the title of a certain doctoral dissertation and received three consecutive incorrect answers. When they then asked for the author's birthday, the chatbot again gave three different dates, all of them wrong.

The research indicates that AI models tend to answer with high confidence when asked about information that appears only rarely in their training data, yet those confident answers can be wildly incorrect.

The pre-training mechanism only learns the "surface of language" and does not understand factual accuracy.

The paper points out that models are pre-trained by predicting the next word over vast amounts of text, and that this data carries no "true or false" labels. In other words, the model learns only the surface form of language, not factual correctness.
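To make that objective concrete, here is a minimal, hypothetical sketch of the next-token prediction loss used in pre-training (a toy embedding-plus-linear model stands in for the real network; this is not OpenAI's actual code). The only supervision signal is which token comes next, so nothing in the loss distinguishes true statements from false ones.

```python
# Toy illustration of next-token prediction: the model is rewarded for
# predicting the next token in the corpus, not for being factually correct.
import torch
import torch.nn.functional as F

vocab_size, d_model = 100, 32                    # toy sizes, purely illustrative
embed = torch.nn.Embedding(vocab_size, d_model)  # stand-in for a real LLM
lm_head = torch.nn.Linear(d_model, vocab_size)

tokens = torch.randint(0, vocab_size, (1, 16))   # a stand-in "sentence"
hidden = embed(tokens[:, :-1])                   # context tokens
logits = lm_head(hidden)                         # scores for the next token

# Cross-entropy against the actual next tokens: the only supervision signal.
loss = F.cross_entropy(logits.reshape(-1, vocab_size), tokens[:, 1:].reshape(-1))
print(loss.item())
```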

As models grow larger, errors in regular patterns such as spelling or bracket matching gradually disappear.

However, high-randomness facts, such as a specific person's birthday, cannot be inferred from language patterns alone, so models remain prone to hallucinating them.

Current evaluations encourage AI models to "guess blindly," and the evaluation methods need to be reformed.

The research stresses that evaluation methods need a major overhaul: scoring should not focus merely on "right or wrong," but should heavily penalize confident-but-incorrect answers while rewarding the AI for honestly saying "I don't know." In other words, a wrong answer should cost the model more than admitting it doesn't know.

Likewise, an answer of "uncertain" should earn some credit rather than being scored as zero. Moreover, this cannot be achieved by bolting on a few token tests; the current evaluation system, which looks only at accuracy, has to be fundamentally overturned. If the evaluation method is not corrected, AI will keep guessing at random.
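A minimal sketch of the kind of scoring rule the paper argues for might look like the following (the specific point values and the `score` function are assumptions for illustration, not taken from the paper): confident wrong answers lose points, while abstaining earns partial credit.

```python
# Hypothetical scoring rule illustrating the proposed evaluation reform:
# wrong answers are penalized more heavily than honest abstention.
def score(answer: str, correct: str) -> float:
    if answer.strip().lower() in {"i don't know", "uncertain"}:
        return 0.3            # partial credit for abstaining (assumed value)
    if answer == correct:
        return 1.0            # full credit for a correct answer
    return -1.0               # confident but wrong costs more than abstaining

# Under this rule, random guessing on hard questions is a losing strategy,
# whereas an accuracy-only metric would still reward lucky guesses.
print(score("March 3", "June 5"))       # -1.0
print(score("I don't know", "June 5"))  #  0.3
print(score("June 5", "June 5"))        #  1.0
```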

The research ultimately concludes that reducing hallucinations has to start with the evaluation system, by building tests that genuinely encourage caution and honesty. Rather than demanding that AI "get it right every time," it is more important to set rules of the game that accept AI saying "I don't know."


This article, "OpenAI's Latest Research: Why GPT-5 and Other LLMs Still Hallucinate," first appeared on Chain News ABMedia.
