Fatal hallucinations, the search for GPU alternatives: large models still face these 10 major challenges

The release of ChatGPT, GPT-4, and other models has shown us the appeal of large language models (LLMs), along with the many challenges they face.

Image source: Generated by Unbounded AI

How can we make LLMs better? What problems still need to be solved? These questions have become important research topics in the field of AI.

In this article, computer scientist Chip Huyen lays out the challenges facing LLMs across 10 areas. The first two concern hallucinations and context length and structure; the others include, but are not limited to, multimodality, new architectures, and finding GPU alternatives.

Original address:

The following is a translation of the original text.

1. How to reduce hallucinations

Hallucination occurs when the text an LLM generates is fluent and natural but either unfaithful to the provided source content (intrinsic hallucination) and/or cannot be verified against it (extrinsic hallucination). The problem is widespread in LLMs.

It is therefore important to mitigate hallucinations and to develop metrics for measuring them, and many companies and institutions are working on the issue. Chip Huyen notes that there are already several ways to reduce hallucinations, such as adding more context to the prompt, using chain-of-thought prompting, or making the model's responses more concise.
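As a small illustration of the first two techniques (adding context and chain-of-thought), here is a minimal prompt-construction sketch; the function name and prompt wording are illustrative assumptions, not something from the original article:

```python
# Hypothetical sketch: reduce hallucination risk by grounding the prompt
# in supplied context and asking for step-by-step reasoning.

def build_grounded_prompt(question: str, context_passages: list[str]) -> str:
    """Assemble a prompt that (1) supplies supporting context and
    (2) asks the model to reason step by step before answering."""
    context_block = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(context_passages))
    return (
        "Answer the question using ONLY the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {question}\n"
        "Think step by step, then give a concise final answer."
    )
```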

Materials that can be referenced include:

  • A survey of hallucination in natural language generation:
  • How language model hallucinations can snowball:
  • An evaluation of ChatGPT on reasoning, hallucination, and interactivity:
  • Contrastive learning reduces hallucination in conversations:
  • Self-consistency improves chain-of-thought reasoning in language models:
  • Black-box hallucination detection for generative large language models:

2. Optimize context length and context structure

Another research focus for LLMs is context length, because a large model has to refer to the context when answering user questions, and the longer the context it can process, the more useful it is. For example, if we ask ChatGPT "Which is the best Vietnamese restaurant?", it needs to rely on the context to figure out whether the user means the best Vietnamese restaurant in Vietnam or the best one in the United States; the two answers are not the same.

Under this subsection, Chip Huyen presents several related papers.

The first is "SITUATEDQA: Incorporating Extra-Linguistic Contexts into QA", whose authors are both from the University of Texas at Austin. The paper introduces SITUATEDQA, an open-retrieval QA dataset; interested readers can check it out to learn more.

Chip Huyen notes that because the model learns from the context it is provided, this process is called in-context learning.

The second paper is "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks". It proposes RAG (Retrieval-Augmented Generation), which combines a pretrained language model with external knowledge to handle open-domain generative question answering and other knowledge-intensive tasks.

The RAG pipeline runs in two phases: a chunking (indexing) phase and a query phase.
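A minimal sketch of these two phases, assuming a toy in-memory index and a placeholder embedding function (all names and details here are illustrative, not from the paper):

```python
import numpy as np

# Chunking (indexing) phase: split documents into chunks and embed each chunk.
def chunk(text: str, size: int = 200) -> list[str]:
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(texts: list[str]) -> np.ndarray:
    # Placeholder embedding; a real system would use a trained text encoder.
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(texts), 64))

documents = ["...long document one...", "...long document two..."]
chunks = [c for doc in documents for c in chunk(doc)]
index = embed(chunks)                       # one vector per chunk

# Query phase: embed the question, retrieve the top-k chunks, build the prompt.
def retrieve(question: str, k: int = 2) -> list[str]:
    q = embed([question])[0]
    scores = index @ q                      # similarity between question and chunks
    return [chunks[i] for i in np.argsort(scores)[::-1][:k]]

question = "What does document one say?"
prompt = "Context:\n" + "\n".join(retrieve(question)) + f"\n\nQuestion: {question}"
```

In a real system the retrieved chunks would then be passed to the LLM together with the question.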

Based on research like this, many people assume that the longer the context, the more information the model can cram in, and the better its responses will be. Chip Huyen argues this is not entirely true.

How much context a model can use and how efficiently it uses that context are two completely different questions. What we should do is increase the efficiency with which the model processes context in parallel with increasing context length. For example, the paper "Lost in the Middle: How Language Models Use Long Contexts" shows that models are much better at using information at the beginning and end of the input than information in the middle.

3. Multimodal

Chip Huyen believes that multimodality is very important.

First, domains including healthcare, robotics, e-commerce, retail, gaming, entertainment, etc. require multimodal data. For example, medical prediction requires text content such as doctor's notes and patient questionnaires, as well as image information such as CT, X-ray, and MRI scans.

Second, multimodality promises a big boost in model performance: a model that understands both text and images should perform better than one that understands only text. At the same time, text-based models are so data-hungry that people have started to worry we will soon run out of internet data to train them on. Once the text is exhausted, we need to consider other data modalities.

Flamingo Architecture Diagram

Regarding multimodality, you can refer to the following content:

  • Paper 1 "Learning Transferable Visual Models From Natural Language Supervision":
  • Paper 2 "Flamingo: a Visual Language Model for Few-Shot Learning":
  • Paper 3 "BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models":
  • Paper 4 "Language Is Not All You Need: Aligning Perception with Language Models":
  • Paper 5 "Visual Instruction Tuning":
  • Google PaLM-E:
  • NVIDIA NeVA:

4. Make LLM faster and cheaper

GPT-3.5 was first released in late November 2022, and many people were concerned about its high cost of use. Yet in just half a year, the community had found models that come close to GPT-3.5 in performance while requiring only about 2% of its memory footprint.

Chip Huyen said that if you create something good enough, people will soon find a way to make it fast and cheap.

The following is a performance comparison of Guanaco 7B with models such as ChatGPT and GPT-4. But it must be emphasized that evaluating LLMs is very difficult.

Then, Chip Huyen listed model optimization and compression techniques:

  • Quantization: the most general model-optimization method to date. Quantization represents parameters with fewer bits, thereby reducing the size of the model; for example, weights stored as 32-bit floats can be represented with 16 bits, or even 4 bits;
  • Knowledge distillation: training a small model (the student) to imitate a larger model or an ensemble of models (the teacher);
  • Low-rank decomposition: the key idea is to replace high-dimensional tensors with lower-dimensional ones to reduce the number of parameters. For example, a 3x3 tensor can be decomposed into the product of a 3x1 and a 1x3 tensor, so that only 6 parameters are stored instead of 9 (a small sketch follows after the next paragraph);
  • Pruning.

All four methods remain popular; for example, Alpaca was trained with knowledge distillation, and QLoRA combines low-rank decomposition with quantization.
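As a rough illustration of two of these ideas (my sketch, not from the original article), the snippet below shows the 3x3 low-rank example and a naive 8-bit quantization; the numbers are arbitrary:

```python
import numpy as np

# Low-rank decomposition: replace a 3x3 weight block (9 parameters)
# with the product of a 3x1 and a 1x3 factor (6 stored parameters).
u = np.array([[1.0], [2.0], [3.0]])        # 3x1 factor
v = np.array([[0.5, -1.0, 2.0]])           # 1x3 factor
w_low_rank = u @ v                          # reconstructs a rank-1 3x3 block

# Naive quantization: store float32 weights as 8-bit integers plus one scale.
w = np.random.randn(3, 3).astype(np.float32)
scale = np.abs(w).max() / 127
w_int8 = np.round(w / scale).astype(np.int8)   # ~4x smaller than float32
w_dequant = w_int8.astype(np.float32) * scale  # approximate reconstruction
```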

5. Design a new model architecture

Since AlexNet in 2012, many architectures, including LSTM and seq2seq, have risen to popularity and then fallen into obsolescence. The Transformer, by contrast, has proved incredibly sticky: it has been around since 2017 and is still widely used today. How much longer this architecture will remain popular is hard to estimate.

However, it is not easy to develop a completely new architecture that surpasses the Transformer. Over the past six years, researchers have heavily optimized the Transformer, not only at the level of model architecture but also at the hardware level.

The lab led by computer scientist Chris Ré has done extensive research around S4 since 2021; for details, see the paper "Efficiently Modeling Long Sequences with Structured State Spaces". The Chris Ré lab has also invested heavily in developing new architectures, recently partnering with the startup Together on the Monarch Mixer architecture.

Their key idea is that in the existing Transformer architecture, the complexity of attention is quadratic in the sequence length, while the complexity of the MLP is quadratic in the model dimension; an architecture with sub-quadratic complexity would be more efficient.
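In rough back-of-the-envelope terms (my notation, not the paper's), for sequence length n and model dimension d the per-layer costs are:

```latex
% Approximate per-layer cost of a standard Transformer block,
% with sequence length n and model (hidden) dimension d:
\mathrm{Attention:}\ O(n^{2} d) \qquad \mathrm{MLP:}\ O(n d^{2})
% Sub-quadratic architectures aim to reduce the n^2 (and d^2) terms.
```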

6. Develop GPU alternatives

GPUs have dominated deep learning since AlexNet in 2012. In fact, one widely acknowledged reason for AlexNet's popularity is that it was the first paper to successfully use GPUs to train a neural network. Before GPUs, training a model of AlexNet's size would have required thousands of CPUs, whereas a handful of GPUs could do the job.

Over the past decade, both large corporations and startups have attempted to build new hardware for artificial intelligence. The most representative examples include, but are not limited to, Google's TPU, Graphcore's IPU, and chips from the AI chip company Cerebras. In addition, the AI chip startup SambaNova has raised more than $1 billion to develop new AI chips.

Another exciting direction is photonic chips, which use photons to move data around, enabling faster and more efficient computation. Several startups in this space have raised hundreds of millions of dollars, including Lightmatter ($270 million), Ayar Labs ($220 million), Lightelligence ($200 million+), and Luminous Compute ($115 million).

The following is a timeline of progress for the three main approaches to photonic matrix computation, taken from the paper "Photonic matrix multiplication lights up photonic accelerator and beyond". The three methods are plane light conversion (PLC), Mach-Zehnder interferometers (MZI), and wavelength division multiplexing (WDM).

7. Make agents more usable

Agents are LLMs that can take actions, such as browsing the internet, sending emails, or booking a room. Compared with the other research directions in this article, this one emerged relatively late and is still very new to everyone.

It is precisely because of this novelty and enormous potential that everyone has developed a feverish obsession with agents. Auto-GPT is currently the 25th most popular project on GitHub, and GPT-Engineer is another very popular one.

While this is both expected and exciting, it remains doubtful whether LLMs will be reliable and performant enough to be entrusted with taking actions.

One application that has already appeared, however, is using agents for social research. Some time ago, Stanford open-sourced the "virtual town" Smallville, in which 25 AI agents live: they hold jobs, gossip, organize social activities, make new friends, and even host a Valentine's Day party, and each town resident has a unique personality and backstory.

For more details, please refer to the paper below.

Paper address:

Probably the most famous startup in this space is Adept, founded by two Transformer co-authors and a former OpenAI VP; it has raised nearly $500 million to date. Last year it gave a demo showing its agent browsing the internet and adding a new account to Salesforce.


8. Improved Learning from Human Preferences

RLHF stands for Reinforcement Learning from Human Feedback. It would not be surprising if people found other ways to train LLMs; after all, RLHF still has many problems to solve. Chip Huyen lists the following three.

**How to represent human preferences mathematically?**

Currently, human preference is determined by comparison: human annotators judge whether response A is better than response B, but they do not indicate how much better A is than B.
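A common mathematical formulation for this kind of pairwise comparison is a Bradley-Terry style reward-model loss, as used for example to train InstructGPT's reward model (written here in my own notation as a sketch):

```latex
% x = prompt, y_w = preferred response, y_l = rejected response,
% r_theta = learned reward model, sigma = sigmoid.
\mathcal{L}(\theta) = -\,\mathbb{E}_{(x,\, y_w,\, y_l)}
  \Big[ \log \sigma\big( r_\theta(x, y_w) - r_\theta(x, y_l) \big) \Big]
% Only which response wins enters the loss, not how much better it is.
```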

**What are human preferences?**

Anthropic measures the quality of its models' responses along three axes: helpfulness, honesty, and harmlessness.

Paper address:

DeepMind has also tried to generate responses that satisfy the majority of people; see the paper below.

Paper address:

But to be clear, do we want an AI that can take a stand, or a generic AI that avoids any potentially controversial topics?

**Whose preferences are "human" preferences?**

Given differences in culture, religion, etc., there are many challenges in obtaining training data that adequately represents all potential users.

For example, among the labelers for OpenAI's InstructGPT data, the majority were Filipino and Bangladeshi, which may introduce bias due to geographic differences.

Source:

The research community is working on this, but data bias persists. For example, in the demographic breakdown of the OpenAssistant dataset, 201 of the 222 respondents (90.5%) identified as male.

9. Improve the efficiency of the chat interface

Since ChatGPT, there have been many discussions about whether chat is a suitable interface for all kinds of tasks, for example:

  • Natural language is the lazy user interface:
  • Why chatbots are not the future:
  • What types of questions require dialogue to answer?
  • The AI chat interface may become the main user interface for reading documentation:
  • Interact with LLM with minimal chat:

However, these discussions are not new. Many countries, especially in Asia, have used chat as an interface for super apps for about a decade.

  • Chat as a common interface for Chinese apps:

In 2016, when many thought apps were dead and chatbots were the future, the discussion heated up again:

  • About the chat interface:
  • Is the chatbot trend a huge misconception:
  • Bots will not replace apps, better apps will:

Chip Huyen said that he really likes the chat interface for the following reasons:

  • Chat is an interface that everyone can quickly learn to use, even those who have never had access to a computer or the Internet before.
  • The chat interface has little friction: even when typing is inconvenient, you can use voice instead of text.
  • Chat is also a very robust interface: you can make any request of it, and it will always give a reply, even if the reply is not good.

However, Chip Huyen thinks the chat interface still leaves room for improvement in some areas. He offers the following suggestions:

  1. Multiple messages per round

Currently, it is pretty much assumed that only one message can be sent per round. But that is not how people text in real life. A person usually needs multiple messages to complete a thought, because different data (such as pictures, locations, and links) need to be inserted along the way, because the user may have left something out of a previous message, or simply because they do not want to cram everything into one long paragraph.

  2. Multimodal input

In the domain of multimodal applications, most of the effort is spent on building better models, and little is spent on building better interfaces. In the case of Nvidia's NeVA chatbot, there may be room to improve the user experience.

address:

  3. Incorporate generative AI into workflows

Linus Lee articulates this well in his talk "AI-generated interfaces beyond chat". For example, if you want to ask a question about a column in a chart you're working on, you should be able to just point to that column and ask.

Video address:

  4. Editing and deleting information

It’s worth thinking about how editing or deleting user input can change the flow of a conversation with a chatbot.

10. Building an LLM for non-English languages

Current LLMs built with English as their first language do not carry over well to other languages, in terms of performance, latency, and speed. For related content, see the following articles:

Paper address:

Article address:

Chip Huyen said that several early readers of this article told him that they thought this direction should not be included for two reasons.

  1. This is not so much a research problem as a logistics problem: we already know how to do it, and someone just needs to invest the money and effort. This is not quite true, however. Most languages are low-resource languages: they have far less high-quality data than English or Chinese and may therefore require different techniques to train large language models. See the following articles:

Paper address:

Paper address:

  2. Pessimists believe that many languages will die out in the future and that the future internet will consist of two languages: English and Chinese.

The impact of AI tools, such as machine translation and chatbots, on language learning is unclear. Whether they help people learn new languages faster, or eliminate the need to learn new languages entirely, is unknown.

Summary

The problems discussed in this article vary in difficulty. The last one, for example, building LLMs for non-English languages, is achievable given enough time and resources.

The first problem, reducing hallucinations, will be much harder, because hallucination is just LLMs doing their probabilistic thing.

The fourth problem, making LLMs faster and cheaper, will never be completely solved. Progress has already been made in this area, and there will be more, but we will never reach perfection.

The fifth and sixth problems, new architectures and new hardware, are very challenging but inevitable over time. Because of the symbiotic relationship between architecture and hardware, where new architectures need to be optimized for general-purpose hardware and hardware needs to support general-purpose architectures, these problems may well be solved by the same company.

There are also problems that cannot be solved with technical knowledge alone. For example, the eighth problem, improving methods for learning from human preferences, may be more a policy question than a technical one. As for the ninth, improving the efficiency of the chat interface, it is more of a user-experience problem, and more people with non-technical backgrounds are needed to tackle it together.

If you want to look at these problems from other angles, Chip Huyen recommends reading the following paper.

Paper address:
