Do large language models have non-linguistic reasoning abilities?


An Ars Technica article today explores whether large language models have non-linguistic reasoning abilities, citing researchers' findings that processing in a 'latent space' can help AI tackle tough logical problems. What's going on? Read on.

Large language models have achieved great success so far by using their transformer architecture to effectively predict the next word (i.e., language token) needed to respond to a given query. However, when it comes to complex reasoning tasks that require abstract logic, some researchers have found that working everything out through this 'language space' can cause problems, even for modern 'reasoning' models.
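For readers who want to see what that next-token loop looks like in practice, here is a minimal sketch using the Hugging Face transformers library; the "gpt2" checkpoint and the prompt are stand-ins chosen purely for illustration.

```python
# Minimal sketch of next-token prediction with an off-the-shelf causal LM.
# "gpt2" and the prompt are only placeholders for illustration.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tok("The capital of France is", return_tensors="pt")
out = model.generate(**ids, max_new_tokens=5, do_sample=False)
print(tok.decode(out[0]))  # the model extends the query one token at a time
```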

Now, researchers are trying to work around these problems by designing models that can work out potential logical solutions entirely in the 'latent space,' the hidden computational layer that exists before the transformer generates language. While this approach does not lead to earth-shattering changes in the reasoning ability of large language models, it does noticeably improve accuracy on certain types of logical problems and points to some interesting directions for new research.

Wait, what space?

Modern reasoning models (such as ChatGPT's o1) tend to work by generating a 'chain of thought.' In these models, each step of the logical process is represented as a sequence of natural language tokens and fed back through the model.
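The loop below is a rough, hypothetical sketch of that chain-of-thought cycle: call_model stands in for any LLM API, and the "Answer:" stopping convention is an assumption made for illustration, not something prescribed by the models discussed here.

```python
# Hypothetical sketch of a standard chain-of-thought loop: every intermediate
# step is natural-language text that gets appended to the prompt and fed back
# through the model.

def call_model(prompt: str) -> str:
    """Stand-in for any autoregressive LLM call; returns the next step as text."""
    raise NotImplementedError("plug in a real model API here")

def chain_of_thought(question: str, max_steps: int = 8) -> str:
    context = f"Question: {question}\nLet's think step by step.\n"
    for _ in range(max_steps):
        step = call_model(context)   # next reasoning step, as word tokens
        context += step + "\n"       # fed back through the model
        if step.strip().startswith("Answer:"):
            break
    return context
```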

In a new paper, the Meta AI research team and researchers from the University of California, San Diego, view this reliance on natural language and 'word tokens' as a 'fundamental constraint' of these reasoning models. This is because successfully completing reasoning tasks often requires complex planning around specific key tokens to find the correct logical path from numerous options.

The figure above illustrates how, in a standard model, each step must pass back through the transformer, in contrast to the COCONUT model's use of hidden "latent" states. (Image source: Training Large Language Models to Reason in a Continuous Latent Space)

The researchers write that in current chain-of-thought models, most word tokens are generated for 'textual coherence' and 'fluency' while 'contributing little to the actual reasoning process.' Instead, they suggest, 'ideally, large language models would be free to reason without any language constraints and then translate their findings into language only when necessary.'

To achieve this 'ideal,' the researchers describe a method for training large language models to reason in a continuous latent space, as the paper's title puts it. This 'latent space' is essentially made up of the set of 'hidden' intermediate token weightings that the model holds just before the transformer generates a human-readable, natural-language version of that internal state.
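To make the idea concrete, the sketch below pulls out that pre-output hidden state from an off-the-shelf causal language model (gpt2 is used only as a stand-in) and contrasts it with the token the output head would generate from it; this is an illustration of what 'latent space' refers to, not code from the paper.

```python
# Conceptual sketch of the "latent space": the hidden state the transformer
# holds just before its output head turns that internal state into a token.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # stand-in model for illustration
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tok("Every apple is a fruit, so", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

latent = out.hidden_states[-1][:, -1, :]   # last layer, last position: the latent state
next_token_logits = out.logits[:, -1, :]   # what the output head makes of that state
print(latent.shape, tok.decode(next_token_logits.argmax(-1)))
```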

In the researchers' COCONUT model (short for Chain of Continuous Thought), these hidden states are encoded as 'latent thoughts' that replace the individual written-out steps in a logical sequence during training and when processing a query. This, the researchers write, avoids the need to convert each step into natural language and 'frees the reasoning from being in language space,' producing an optimized reasoning path that they call a 'continuous thought.'
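Here is a rough sketch of that mechanism under the same stand-in model: instead of decoding each step into tokens, the last hidden state is appended back onto the input embeddings as a 'continuous thought,' and language is only produced at the end. This illustrates the idea as described, not the authors' implementation.

```python
# Rough sketch of the continuous-thought idea: feed the last hidden state back
# in as the next input embedding instead of decoding it into a word token.
# gpt2 is only a stand-in; this is not the paper's code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = tok("Every apple is a fruit. Every fruit is food. Is an apple food?",
             return_tensors="pt")
embeds = model.get_input_embeddings()(prompt["input_ids"])

num_latent_thoughts = 4
with torch.no_grad():
    for _ in range(num_latent_thoughts):
        out = model(inputs_embeds=embeds, output_hidden_states=True)
        thought = out.hidden_states[-1][:, -1:, :]    # continuous "thought"
        embeds = torch.cat([embeds, thought], dim=1)  # fed back, no token decoded

    # Only at the end is the latent reasoning translated back into language.
    final = model(inputs_embeds=embeds)
    answer_id = final.logits[:, -1, :].argmax(-1)

print(tok.decode(answer_id))
```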

A broader vision

While processing logic in the latent space brings some efficiency benefits, the more important finding is that this kind of model can "encode multiple potential next steps simultaneously." Rather than pursuing each logical option fully and one at a time in a "greedy" process, reasoning in the "latent space" allows a kind of instant backtracking that the researchers compare to a breadth-first search through a graph.

Even though the model is not explicitly trained for it, the researchers write, this emergent, simultaneous processing property shows up in testing. "Although the model may not initially make the correct decision, it can maintain many possible options within its continuous thoughts, guided by some implicit value function, and gradually eliminate incorrect paths through reasoning," they wrote.
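The toy comparison below illustrates the search analogy: a greedy process commits to one branch at a time and can dead-end, while a breadth-first search keeps several candidate paths alive at once. The graph is invented purely for illustration.

```python
# Toy illustration of greedy, one-branch-at-a-time search versus breadth-first
# search over a made-up logic graph.
from collections import deque

graph = {
    "start": ["A", "B"],
    "A": ["dead end"],
    "B": ["C"],
    "C": ["goal"],
    "dead end": [],
    "goal": [],
}

def greedy(graph, node="start"):
    path = [node]
    while graph[node]:
        node = graph[node][0]          # commits to the first option
        path.append(node)
    return path                        # may end at "dead end"

def breadth_first(graph, start="start", target="goal"):
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path                # first complete path to the goal
        for nxt in graph[path[-1]]:
            queue.append(path + [nxt])

print(greedy(graph))          # ['start', 'A', 'dead end']
print(breadth_first(graph))   # ['start', 'B', 'C', 'goal']
```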

This figure highlights some ways in which different models may fail in certain types of logical reasoning. (Image source: Training Large Language Models to Reason in a Continuous Latent Space)

On relatively simple tests of mathematical reasoning (GSM8K) or general reasoning (ProntoQA), this multi-path reasoning did not really improve COCONUT's accuracy over traditional chain-of-thought models. However, the researchers found the model did comparatively well on a set of randomly generated ProntoQA-style queries involving complex and winding sets of logical conditions (e.g., 'every apple is a fruit, every fruit is food,' etc.).

For these tasks, standard chain-of-thought reasoning models often got stuck down dead-end paths of reasoning, or even produced entirely made-up rules, when trying to resolve the logical chain. Previous research has also suggested that the 'verbalized' logical steps these chain-of-thought models output may actually rely on a latent reasoning process different from the one they share.
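For a sense of what such a query looks like, here is a small, invented sketch of a ProntoQA-style rule chain and a checker that follows the 'every X is a Y' links; the rules and the distractor branch are made up for illustration only.

```python
# Hedged sketch of a ProntoQA-style query: a chain of "every X is a Y" rules
# (plus a distractor) where answering requires following the right links.
# The rule set is invented for illustration only.
rules = {
    "apple": "fruit",
    "fruit": "food",
    "food": "thing",
    "rock": "mineral",     # distractor branch
}

def entails(rules: dict, start: str, target: str) -> bool:
    """Follow 'every X is a Y' links from start and see if we reach target."""
    current = start
    seen = set()
    while current in rules and current not in seen:
        seen.add(current)
        current = rules[current]
        if current == target:
            return True
    return False

print(entails(rules, "apple", "food"))     # True
print(entails(rules, "apple", "mineral"))  # False
```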

This new study joins a growing body of research aimed at understanding and harnessing how large language models work at the level of their underlying neural networks. Although such research has not yet produced a major breakthrough, the researchers believe that models pre-trained with this kind of 'continuous thought' from the outset could 'enable models to generalize more effectively across a wider range of reasoning scenarios.'
