The synergy between reasoning and acting in large language models
TL;DR
- An alternative prompting technique, called ReAct, is introduced to overcome limitations of chain-of-thought prompting.
- ReAct prompting merges the reasoning process of chain-of-thought prompting with the acting process familiar from reinforcement learning into a single system, and it outperforms other prompting approaches.
- ReAct prompting allows large language models to access up-to-date data, which can mitigate the hallucination problem.
- The relevant data provided through in-context learning is a significant factor in the success rate of large language models.
Overcoming chain-of-thought prompting challenges
Chain-of-thought (CoT) prompting is a groundbreaking technique for in-context learning: it enables large language models to engage in multi-step reasoning by thinking step by step. However, it can fail when the fixed elaboration in the CoT prompt does not generalize, for example because of limited training data. In such cases, large language models may improvise an answer, which can lead to hallucination.
To overcome this issue, researchers have proposed a more flexible form of chain-of-thought prompting. Their method, ReAct (short for Reasoning and Acting) (Yao et al., 2023), combines the original chain of thought with dynamic actions that depend on the current situation, as illustrated in Figure 1. This synergy allows large language models to track each intermediate reasoning step and make new decisions when the original plan is no longer applicable.

The researchers also provide an API through which large language models can access additional information, enabling them to generate more accurate responses. This eases the limitation of a knowledge base fixed at training time and unlocks more of the potential of large language models.
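To make the idea concrete, here is a minimal sketch of such a reason-then-act loop in Python. This is not the authors' implementation: `call_llm` and `execute_action` are hypothetical placeholders standing in for a text-completion endpoint and for the environment that carries out an action and returns an observation.

```python
# A minimal sketch of a ReAct-style loop, not the authors' implementation.
# `call_llm` and `execute_action` are hypothetical stand-ins: the first for
# any text-completion endpoint, the second for the environment that executes
# an action and returns an observation (one realization is sketched later).
import re

def call_llm(prompt: str) -> str:
    """Placeholder for a completion call to a large language model."""
    raise NotImplementedError

def execute_action(action: str, argument: str) -> str:
    """Placeholder: run one action against the environment, return text."""
    raise NotImplementedError

def react_loop(question: str, few_shot_prefix: str, max_steps: int = 7) -> str:
    """Interleave thoughts and actions until the model emits finish[answer]."""
    transcript = f"{few_shot_prefix}\nQuestion: {question}\n"
    for step in range(1, max_steps + 1):
        # Ask the model for the next "Thought ... Action ..." pair.
        completion = call_llm(transcript + f"Thought {step}:")
        transcript += f"Thought {step}:{completion}\n"
        match = re.search(r"(search|lookup|finish)\[(.*?)\]", completion)
        if match is None:
            continue  # no parseable action; let the model keep reasoning
        action, argument = match.groups()
        if action == "finish":
            return argument  # the model has committed to an answer
        # Feed the observation back so the next thought can use live data.
        observation = execute_action(action, argument)
        transcript += f"Observation {step}: {observation}\n"
    return "no answer within the step budget"
```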
When action meets rationale
The authors design three actions that the model can take to carry out a task: search, lookup, and finish (one possible implementation is sketched after this list).
- search: retrieves external data for a given keyword through an API.
- lookup: finds relevant context for a given keyword.
- finish: generates the answer from the provided context.
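One plausible way to realize these three actions, fleshing out the `execute_action` placeholder from the earlier sketch, is shown below. It assumes a hypothetical `fetch_page` helper standing in for whatever search API is available (the paper uses a Wikipedia API), and the `lookup` semantics here are only an illustration.

```python
# One plausible realization of the three actions; `fetch_page` is a
# hypothetical helper wrapping an external search API. `lookup` scans the
# most recently retrieved page, mimicking a Ctrl+F over the current context.
def fetch_page(keyword: str) -> str:
    """Placeholder for an external search API returning page text."""
    raise NotImplementedError

class ReActEnvironment:
    def __init__(self):
        self.current_page = ""  # text returned by the last search

    def search(self, keyword: str) -> str:
        # Retrieve external data for the keyword and keep it for lookups.
        self.current_page = fetch_page(keyword)
        sentences = self.current_page.split(". ")
        return ". ".join(sentences[:5])  # a short snippet as the observation

    def lookup(self, keyword: str) -> str:
        # Find relevant context: the first sentence mentioning the keyword.
        hits = [s for s in self.current_page.split(". ")
                if keyword.lower() in s.lower()]
        return hits[0] if hits else f"No mention of '{keyword}' on this page."

    def execute_action(self, action: str, argument: str) -> str:
        if action == "search":
            return self.search(argument)
        if action == "lookup":
            return self.lookup(argument)
        return argument  # "finish": the argument is the final answer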
By combining a chain of thought, an action plan, and live data, large language models can generate answers more accurately. The question in Figure 2 is tricky: the original CoT prompting, shown in Figure 1(b), cannot produce the correct answer (let alone standard prompting). Likewise, relying solely on actions does not produce a satisfactory response, as illustrated in Figure 1(c). The correct response is achieved with the ReAct prompting method, as displayed in Figure 1(d). Examples of prompt templates for each prompting method are provided in Figure 3.
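For a sense of what such a template can look like, below is an invented few-shot prefix in the ReAct style. It is an illustration only, not the paper's actual template from Figure 3; a prefix like this would be passed as the `few_shot_prefix` argument to the loop sketched earlier.

```python
# An illustrative few-shot ReAct prefix (invented example, not the paper's
# template). Each exemplar interleaves Thought, Action, and Observation.
REACT_PREFIX = """\
Solve a question answering task with interleaving Thought, Action, and
Observation steps. Actions are search[keyword], lookup[keyword], and
finish[answer].

Question: What is the capital of the country where the Rhine ends?
Thought 1: I need to find which country the Rhine ends in.
Action 1: search[Rhine]
Observation 1: The Rhine flows through the Netherlands and empties into
the North Sea.
Thought 2: The Rhine ends in the Netherlands, whose capital is Amsterdam.
Action 2: finish[Amsterdam]
"""
```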


Synergy evaluation
To compare the performance of ReAct and CoT, the authors categorize the results of solving multi-hop question answering problems (HotpotQA; see Figure 2 for an example question) into success and failure groups, each further divided into smaller categories, as listed in Table 1. ReAct prompting outperforms CoT prompting and can mitigate hallucination problems.

In the comparison between ReAct and Act, the authors evaluate performance on decision-making tasks using ALFWorld and WebShop (see Figure 4 for example prompts). They also compare their results against imitation learning (IL) and reinforcement learning (RL) methods from related work, as well as human expert benchmarks. Figure 5 presents the results in terms of score and success rate (SR); the findings suggest that ReAct prompting outperforms all other benchmarks except human experts.


In summary, chain of thought is a powerful technique that enables large language models to accomplish complex tasks. However, it may sometimes fail due to the limited knowledge of language models, which is typically fixed at training time. To address this, a group of researchers has proposed an alternative prompting idea called ReAct, which adds flexibility to prompting through three actions: search, lookup, and finish. When we pose a question to a large language model, it retrieves knowledge through a search-engine API and selects relevant context to generate an answer using a chain-of-thought approach. ReAct prompting demonstrates promising performance and can reduce the occurrence of hallucinated facts, which typically occur in original CoT prompting.