Self-consistency is a prompting technique within the larger field of Prompt Engineering designed to improve the accuracy and coherence of responses from Large Language Models (LLMs); the sources list it among the core techniques available to prompt engineers.
Here’s a breakdown of what the sources say about Self-consistency:
- Definition and Purpose: Self-consistency combines sampling and majority voting to generate diverse reasoning paths for a given prompt and then selects the most consistent answer across these paths. The primary goal is to improve the accuracy and coherence of the LLM’s responses, particularly for tasks that require reasoning.
- Mechanism: The process of Self-consistency involves the following steps:
- Generating diverse reasoning paths: The same prompt is provided to the LLM multiple times. To encourage the generation of different reasoning paths and perspectives, a high temperature setting is used during inference.
- Extracting the answer: The final answer is extracted from each of the generated responses.
- Choosing the most common answer: The most frequently occurring answer among the diverse reasoning paths is selected as the final, self-consistent answer.
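The three steps above can be sketched in a few lines of Python. Everything here is illustrative: the `Answer:` line convention, the helper names, and the hard-coded `paths` stand in for real completions sampled from an LLM at a high temperature.

```python
from collections import Counter

def extract_answer(response: str) -> str:
    # Step 2: pull the final answer out of a reasoning path. This sketch
    # assumes each response ends with a line like "Answer: 18".
    for line in reversed(response.strip().splitlines()):
        if line.startswith("Answer:"):
            return line.removeprefix("Answer:").strip()
    return ""

def self_consistent_answer(responses: list[str]) -> tuple[str, float]:
    # Step 3: majority vote over the extracted answers. The vote share
    # doubles as a pseudo-probability that the answer is correct.
    votes = Counter(extract_answer(r) for r in responses)
    answer, count = votes.most_common(1)[0]
    return answer, count / len(responses)

# Step 1 (simulated): three reasoning paths sampled for the same prompt.
paths = [
    "23 - 17 = 6, then 6 * 3 = 18.\nAnswer: 18",
    "3 * 23 = 69, and 69 - 17 = 52.\nAnswer: 52",
    "(23 - 17) * 3 = 18.\nAnswer: 18",
]
answer, confidence = self_consistent_answer(paths)
# answer == "18", with a 2/3 vote share despite one faulty path
```

In a real pipeline, `paths` would come from calling the model repeatedly with the same prompt and a temperature high enough (e.g. around 0.7 to 1.0) to produce genuinely different reasoning chains.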
- Relationship with Chain of Thought (CoT): Self-consistency is closely related to and often used in conjunction with Chain of Thought (CoT) prompting. As mentioned in the sources, while CoT prompting encourages the LLM to generate reasoning steps, it typically uses a “simple ‘greedy decoding’ strategy, limiting its effectiveness”. Self-consistency builds upon CoT by generating multiple Chains of Thought through sampling and then aggregating the results to arrive at a more reliable answer. Therefore, to achieve self-consistency, you would generate multiple CoT reasoning paths for the same question.
- Benefits:
- Improved Accuracy and Coherence: By considering multiple reasoning paths and selecting the most consistent answer, self-consistency can lead to more accurate and coherent responses from LLMs.
- Pseudo-probability of Correctness: Self-consistency can provide a “pseudo-probability likelihood of an answer being correct”.
- Considers Multiple Perspectives: The technique improves accuracy by taking into account “multiple perspectives”.
- Drawbacks:
- High Costs: The primary drawback of self-consistency is its “high costs”. Generating multiple responses from the LLM requires significantly more computational resources and time compared to generating a single response.
- Example: The sources give an example of using self-consistency in a system that classifies emails as “IMPORTANT” or “NOT IMPORTANT”. A zero-shot chain of thought prompt is sent to the LLM multiple times. Because of the friendly tone, word choice, and sarcasm in the email, the LLM may return inconsistent classifications depending on the model and temperature configuration. However, by generating many Chains of Thought and taking the most common answer (“IMPORTANT” in the example), a more consistently correct answer can be obtained.
- Best Practices: When using self-consistency with CoT prompting, it is important to ensure that the final answer can be clearly extracted from the generated reasoning steps and is separate from the reasoning.
In the larger context of Prompt Engineering, Self-consistency represents a more advanced technique to address the inherent probabilistic nature of LLMs and the limitations of relying on a single reasoning path. It acknowledges that LLMs might explore different ways of arriving at an answer and leverages this by promoting exploration and then enforcing consistency. The success of techniques like CoT and Self-consistency has paved the way for even more sophisticated reasoning strategies like Tree of Thoughts (ToT), which generalizes the concept of CoT and implicitly builds upon the idea of exploring multiple reasoning paths, similar in spirit to self-consistency.

While powerful, the increased computational cost highlights the trade-offs involved in choosing different prompt engineering techniques. It underscores the iterative nature of prompt engineering, where practitioners might experiment with various techniques and configurations to find the optimal approach for their specific task and resource constraints. Documenting these experiments, as suggested in the sources, would be crucial in understanding the effectiveness of self-consistency and other prompting methods.