Fatal hallucinations, the search for GPU alternatives: large models still face these 10 major challenges
The release of ChatGPT, GPT-4, and their peers has shown us the appeal of large language models (LLMs), along with the many challenges they face.
How can LLMs be made better? What problems still need to be solved? These have become important research topics in the field of AI.
In this article, computer scientist Chip Huyen lays out the challenges facing LLMs from 10 angles. The first two concern hallucinations and in-context learning; the others include, but are not limited to, multimodality, new architectures, and finding GPU alternatives.
The following is a translation of the original text.
1. How to reduce hallucinations
The hallucination problem arises when the text generated by an LLM is fluent and natural but unfaithful to the source content (intrinsic hallucination) and/or cannot be verified from it (extrinsic hallucination). This problem is widespread in LLMs.
Mitigating hallucinations and developing metrics to measure them is therefore very important, and many companies and institutions are paying attention to this issue. Chip Huyen notes that there are already many ways to reduce hallucinations, such as adding more context to the prompt, using chain-of-thought prompting, or making the model's responses more concise.
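A minimal sketch of two of the prompting tactics mentioned above: grounding the prompt in supplied context and asking for a chain of thought. The `build_prompt` helper is hypothetical, not from any particular library.

```python
def build_prompt(question: str, context: str, chain_of_thought: bool = True) -> str:
    """Assemble a grounded prompt that discourages unsupported answers."""
    parts = [
        "Answer ONLY using the context below. "
        "If the context is insufficient, say \"I don't know\".",
        f"Context:\n{context}",
        f"Question: {question}",
    ]
    if chain_of_thought:
        # chain-of-thought instruction: ask the model to reason step by step
        parts.append("Think step by step, then give a final answer.")
    return "\n\n".join(parts)

prompt = build_prompt(
    question="When was the Transformer architecture introduced?",
    context="The Transformer was introduced in the 2017 paper "
            "'Attention Is All You Need'.",
)
print(prompt)
```

Restricting the model to the supplied context and giving it an explicit "I don't know" escape hatch are both common ways to discourage fabricated answers.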
Materials that can be referenced include:
2. Optimize context length and context structure
Another research focus for LLMs is context length, because large models need to refer to context when answering user questions, and the longer the context they can process, the more useful they are. For example, if we ask ChatGPT "Which is the best Vietnamese restaurant?", the model needs context to figure out whether the user means the best Vietnamese restaurant in Vietnam or the best Vietnamese restaurant in the United States; the answers are not the same.
Under this subsection, Chip Huyen presents several related papers.
The first is "SITUATEDQA: Incorporating Extra-Linguistic Contexts into QA", whose authors are both from the University of Texas at Austin. The paper introduces an open-retrieval QA dataset, SITUATEDQA; interested readers can check it out to learn more.
Chip Huyen notes that because the model learns from the context provided, this process is called in-context learning.
The RAG (retrieval-augmented generation) pipeline is divided into two phases: the chunking (also known as indexing) phase and the querying phase.
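The two RAG phases described above can be sketched with a toy example. Real systems use embedding models and vector databases; plain word overlap stands in for similarity here, and all names are illustrative.

```python
def chunk(document: str, size: int = 8) -> list[str]:
    """Phase 1 (chunking/indexing): split a document into fixed-size word chunks."""
    words = document.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def retrieve(query: str, chunks: list[str], k: int = 1) -> list[str]:
    """Phase 2 (querying): score chunks by word overlap with the query, keep top-k."""
    q = set(query.lower().split())
    scored = sorted(chunks, key=lambda c: len(q & set(c.lower().split())),
                    reverse=True)
    return scored[:k]

doc = ("Hanoi has many celebrated Vietnamese restaurants. "
       "The best pho is often found in small family-run shops.")
index = chunk(doc)
context = retrieve("best Vietnamese restaurant", index)
# the retrieved chunk is then prepended to the LLM prompt as context
```

The indexing phase runs offline, once per document; the querying phase runs at request time, which is why retrieval quality and speed are both active research areas.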
How much context a model can use and how efficiently it uses that context are two completely different questions. We should work on increasing the efficiency with which models process context in parallel with increasing context length. For example, the paper "Lost in the Middle: How Language Models Use Long Contexts" shows that models understand information at the beginning and end of the input much better than information in the middle.
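Given the "Lost in the Middle" finding, one common mitigation is to reorder retrieved documents so the highest-scoring ones sit at the two ends of the context and the weakest land in the middle. The heuristic below is a sketch, not from any particular library.

```python
def reorder_for_long_context(docs_by_score: list[str]) -> list[str]:
    """docs_by_score is sorted best-first; interleave toward both ends."""
    front, back = [], []
    for i, doc in enumerate(docs_by_score):
        # alternate documents between the front and the back of the context
        (front if i % 2 == 0 else back).append(doc)
    return front + back[::-1]

print(reorder_for_long_context(["d1", "d2", "d3", "d4", "d5"]))
# the two best docs, d1 and d2, end up at the ends; the weakest, d5, sits in the middle
```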
3. Multimodality
Chip Huyen believes that multimodality is very important.
First, domains including healthcare, robotics, e-commerce, retail, gaming, entertainment, etc. require multimodal data. For example, medical prediction requires text content such as doctor's notes and patient questionnaires, as well as image information such as CT, X-ray, and MRI scans.
Second, multimodality promises to greatly improve model performance: a model that can understand both text and images should perform better than one that only understands text. Meanwhile, text-only models are so data-hungry that people are starting to worry we will soon run out of internet text to train them on. Once the text is exhausted, we need to consider other data modalities.
Regarding multimodality, you can refer to the following content:
4. Make LLM faster and cheaper
GPT-3.5 was first released in late November 2022, and many people were concerned about its high cost of use. However, in just half a year, the community found models close to GPT-3.5 in performance with a memory footprint of only about 2% of GPT-3.5's.
Chip Huyen said that if you create something good enough, people will soon find a way to make it fast and cheap.
The four main model-compression techniques (quantization, knowledge distillation, low-rank factorization, and pruning) are still popular; Alpaca, for example, was trained with knowledge distillation, and QLoRA combines low-rank factorization with quantization.
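Quantization, the simplest of these compression techniques, can be illustrated in a few lines. This is a minimal sketch of symmetric 8-bit quantization: weights are mapped to int8 and back, trading a little precision for a roughly 4x smaller footprint than float32.

```python
def quantize(weights: list[float]) -> tuple[list[int], float]:
    """Map floats into [-127, 127] using a single shared scale factor."""
    scale = max(abs(w) for w in weights) / 127 or 1.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from int8 values."""
    return [v * scale for v in q]

w = [0.42, -1.27, 0.05, 0.9]
q, s = quantize(w)
w_hat = dequantize(q, s)
# per-weight rounding error is bounded by half the scale step
max_err = max(abs(a - b) for a, b in zip(w, w_hat))
```

Production schemes (e.g. the 4-bit quantization in QLoRA) are more sophisticated, with per-block scales and outlier handling, but the core idea of trading precision for memory is the same.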
5. Design a new model architecture
Since the release of AlexNet in 2012, many architectures, including LSTM and seq2seq, have risen to popularity and then become obsolete. The Transformer, by contrast, has been incredibly sticky: it has been around since 2017 and is still widely used today. How much longer it will remain popular is hard to estimate.
However, developing a completely new architecture that surpasses the Transformer is not easy. Over the past 6 years, researchers have heavily optimized the Transformer, not only at the model-architecture level but also at the hardware level.
The laboratory led by American computer scientist Chris Ré has conducted a lot of research around S4 in 2021. For more information, please refer to the paper "Efficiently Modeling Long Sequences with Structured State Spaces". In addition, the Chris Ré lab has invested heavily in the development of new architectures, and they recently partnered with startup Together to develop the Monarch Mixer architecture.
Their key idea is that in the existing Transformer architecture, the complexity of attention is quadratic in the sequence length, while the complexity of the MLP is quadratic in the model dimension; an architecture with lower, sub-quadratic complexity would be more efficient.
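A back-of-the-envelope check of this claim: per layer, attention cost grows quadratically with sequence length n, while the MLP cost grows quadratically with model dimension d. The constants below are rough approximations, not exact FLOP counts.

```python
def attention_flops(n: int, d: int) -> int:
    # QK^T score matrix plus the weighted sum over values: ~2 * n^2 * d
    return 2 * n * n * d

def mlp_flops(n: int, d: int) -> int:
    # two matmuls through the usual 4d hidden layer: ~16 * n * d^2
    return 16 * n * d * d

d = 1024
# at short sequences the MLP dominates; past n ~ 8d, attention takes over
print(attention_flops(512, d) < mlp_flops(512, d))     # True
print(attention_flops(65536, d) > mlp_flops(65536, d))  # True
```

Setting 2n²d = 16nd² gives a crossover at n = 8d, which is why attention only becomes the bottleneck at long context lengths.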
6. Develop GPU alternatives
GPUs have dominated deep learning since AlexNet's release in 2012. In fact, one widely recognized reason for AlexNet's popularity is that it was the first paper to successfully train a neural network on GPUs. Before GPUs, training a model of AlexNet's size would have required thousands of CPUs; a few GPUs could do the job.
Over the past decade, both large corporations and startups have attempted to create new hardware for artificial intelligence. The most representative ones include but are not limited to Google's TPU, Graphcore's IPU, and AI chip company Cerebras. Additionally, AI chip startup SambaNova raised more than $1 billion to develop new AI chips.
Another exciting direction is photonic chips, which use photons to move data around, enabling faster and more efficient computation. Several startups in this space have raised hundreds of millions of dollars, including Lightmatter ($270 million), Ayar Labs ($220 million), Lightelligence ($200 million+), and Luminous Compute ($115 million).
The following is a timeline of the progress of the three main approaches in photonic matrix computing, taken from the "Photonic matrix multiplication lights up photonic accelerator and beyond" paper. The three methods are planar light conversion (PLC), Mach-Zehnder interferometer (MZI) and wavelength division multiplexing (WDM).
7. Make agents usable
Agents are LLMs that can take actions, such as browsing the internet, sending emails, or booking a room. Compared with the other research directions in this article, this one appeared relatively late and is very new.
It is precisely because of this novelty and great potential that everyone has developed an intense obsession with agents. Auto-GPT is currently the 25th most popular project on GitHub, and GPT-Engineer is another very popular project.
While this is expected and exciting, it remains doubtful whether LLMs will be reliable and performant enough to be entrusted with the right to act.
One application that has already appeared, however, is applying agents to social-science research. A while ago, Stanford open-sourced Smallville, a "virtual town" inhabited by 25 AI agents: they hold jobs, gossip, organize social activities, make new friends, and even host a Valentine's Day party, and each town resident has a unique personality and backstory.
For more details, please refer to the following papers.
Probably the most famous startup in this space is Adept, founded by two Transformer co-authors and a former OpenAI VP, and has raised nearly $500 million to date. Last year, they did a demo showing how their agent could browse the internet and add a new account to Salesforce.
8. Improved Learning from Human Preferences
RLHF stands for Reinforcement Learning from Human Feedback. It would not be surprising if people find other ways to train LLMs; after all, RLHF still has many unsolved problems. Chip Huyen lists the following 3 points.
**How can human preferences be represented mathematically?**
Currently, human preferences are determined by comparison: human annotators determine whether response A is better than response B, but do not consider how much better response A is than response B.
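One common mathematical representation of such pairwise comparisons is the Bradley-Terry model used in RLHF reward modeling: the probability that response A beats response B depends only on the difference of their scalar rewards. Note that, as described above, it captures which response wins, not by how much.

```python
import math

def prob_a_preferred(reward_a: float, reward_b: float) -> float:
    """Bradley-Terry preference probability: P(A > B) = sigmoid(r_A - r_B)."""
    return 1.0 / (1.0 + math.exp(reward_b - reward_a))

print(round(prob_a_preferred(2.0, 1.0), 3))  # ~0.731
print(prob_a_preferred(1.0, 1.0))            # 0.5: equal rewards, a coin flip
```

A reward model trained on comparison labels maximizes the likelihood of the observed preferences under this formula, which is exactly why the margin between A and B never enters the data.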
**What are human preferences?**
Anthropic measures the quality of its models' responses along three axes: helpfulness, honesty, and harmlessness.
DeepMind also tries to generate responses that satisfy the majority. See this paper below.
But to be clear, do we want an AI that can take a stand, or a generic AI that avoids any potentially controversial topics?
**Whose preferences are "human" preferences?**
Given differences in culture, religion, etc., there are many challenges in obtaining training data that adequately represents all potential users.
For example, the labelers for OpenAI's InstructGPT data were mainly Filipino and Bangladeshi, which may introduce bias due to geographic differences.
The research community is also working on this, but data bias persists. For example, in the demographic distribution of the OpenAssistant dataset, 201 of the 222 respondents (90.5%) were male.
9. Improve the efficiency of the chat interface
Since ChatGPT, there have been many discussions about whether chat is a suitable interface for various tasks. For example:
However, these discussions are not new. Many countries, especially in Asia, have used chat as an interface for super apps for about a decade.
In 2016, when many thought apps were dead and chatbots were the future, the discussion became tense again:
Chip Huyen said that he really likes the chat interface for the following reasons:
However, Chip Huyen thinks the chat interface leaves room for improvement in some areas. He has the following suggestions
Currently, it is generally assumed that only one message can be sent per turn. But that is not how people text in real life. Usually, multiple messages are needed to complete a thought, because different data (such as pictures, locations, or links) must be inserted along the way, the user may have missed something in a previous message, or they simply do not want to cram everything into one long paragraph.
In the domain of multimodal applications, most of the effort is spent on building better models, and little is spent on building better interfaces. In the case of Nvidia's NeVA chatbot, there may be room to improve the user experience.
Linus Lee articulates this well in his talk "AI-generated interfaces beyond chat". For example, if you want to ask a question about a column in a chart you're working on, you should be able to just point to that column and ask.
It’s worth thinking about how editing or deleting user input can change the flow of a conversation with a chatbot.
10. Building an LLM for non-English languages
Current LLMs that treat English as their first language do not transfer well to other languages in terms of performance, latency, and speed. For related content, see the following articles:
Chip Huyen said that several early readers of this article told him that they thought this direction should not be included for two reasons.
The impact of AI tools, such as machine translation and chatbots, on language learning is unclear. Whether they help people learn new languages faster, or eliminate the need to learn new languages entirely, is unknown.
Summary
The problems mentioned in this article vary in difficulty. The last one, for example, building LLMs for non-English languages, is achievable given enough resources and time.
The first problem, reducing hallucinations, will be much harder, because hallucination is in some sense just a side effect of LLMs doing their probabilistic thing.
The fourth problem, making LLMs faster and cheaper, will never be completely solved. Some progress has already been made in this area, and more will come, but we will never reach perfection.
The fifth and sixth problems, new architectures and new hardware, are very challenging but inevitable over time. Because architecture and hardware are symbiotic, with new architectures needing to be optimized for commodity hardware and hardware needing to support common architectures, these problems may well be solved by the same company.
There are also problems that cannot be solved with technical knowledge alone. For example, the eighth problem, improving methods for learning from human preferences, may be more a policy issue than a technical one. The ninth, improving chat-interface efficiency, is more of a user-experience problem, and more people with non-technical backgrounds are needed to work on it.
If you want to look at these problems from other angles, Chip Huyen recommends reading the following paper.