# Prompting Techniques

Prompt engineering is the set of strategies used to communicate effectively with a large language model. These strategies guide the model toward generating the desired responses.

Here are some key Prompt Engineering techniques that are commonly used to optimize the performance of AI models:

## Zero-Shot Prompting

With zero-shot prompting, you ask the model a question or give an instruction directly without providing any examples or demonstrations. This technique works well for straightforward tasks.

LLMs today, such as GPT-4, are tuned to follow instructions and are trained on large amounts of data; so they are capable of performing some tasks “zero-shot.”

We tried a few zero-shot examples in the previous section. Here is one of the examples we used:

*Prompt:*

```
Classify the text into neutral, negative or positive.
Text: I think the vacation is okay.
Sentiment:
```

*Output:*

`Neutral`

Note that in the prompt above we didn’t provide the model with any examples of text alongside their classifications. The LLM already understands “sentiment” on its own; that is the zero-shot capability at work.
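A zero-shot setup is just a bare instruction plus the input. As a minimal sketch (the `classify_prompt` helper is illustrative, not from any particular library), the prompt can be assembled like this before being sent to the model:

```python
def classify_prompt(text: str) -> str:
    """Build a zero-shot sentiment-classification prompt: an instruction
    and the input, with no examples at all."""
    return (
        "Classify the text into neutral, negative or positive.\n"
        f"Text: {text}\n"
        "Sentiment:"
    )

prompt = classify_prompt("I think the vacation is okay.")
# The prompt is sent to the model as-is; the completion is expected
# to be a single label such as "Neutral".
```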

Instruction tuning has been shown to improve zero-shot learning (Wei et al., 2022). Instruction tuning is essentially the concept of finetuning models on datasets described via instructions. Furthermore, RLHF (reinforcement learning from human feedback) has been adopted to scale instruction tuning, wherein the model is aligned to better fit human preferences. This recent development powers models like ChatGPT. We will discuss all these approaches and methods in upcoming sections.

When zero-shot doesn’t work, it’s recommended to provide demonstrations or examples in the prompt which leads to few-shot prompting. In the next section, we demonstrate few-shot prompting.

## Few-Shot Prompting

In this technique, you provide the model with a few examples before asking the actual question or giving the task. It’s like a short training session and it can help the model understand more complex tasks.

While large language models demonstrate remarkable zero-shot capabilities, they still fall short on more complex tasks in the zero-shot setting. Few-shot prompting can be used as a technique to enable in-context learning, where we provide demonstrations in the prompt to steer the model to better performance. The demonstrations serve as conditioning for subsequent examples where we would like the model to generate a response.

According to Touvron et al. (2023), few-shot properties first appeared when models were scaled to a sufficient size (Kaplan et al., 2020).

Let’s demonstrate few-shot prompting via an example presented in Brown et al. (2020). In the example, the task is to correctly use a new word in a sentence.

*Prompt:*

```
A "whatpu" is a small, furry animal native to Tanzania. An example of a sentence that uses the word whatpu is:
We were traveling in Africa and we saw these very cute whatpus.
To do a "farduddle" means to jump up and down really fast. An example of a sentence that uses the word farduddle is:
```

*Output:*

`When we won the game, we all started to farduddle in celebration.`

We can observe that the model has somehow learned how to perform the task by providing it with just one example (i.e., 1-shot). For more difficult tasks, we can experiment with increasing the demonstrations (e.g., 3-shot, 5-shot, 10-shot, etc.).
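Mechanically, a few-shot prompt is just the demonstrations concatenated ahead of the unlabeled query, in a consistent format. A minimal sketch (the `few_shot_prompt` helper and the `//` separator format are illustrative choices, mirroring the classification example later in this section):

```python
def few_shot_prompt(demonstrations: list[tuple[str, str]], query: str) -> str:
    """Concatenate (input, label) demonstrations, then the unlabeled query.

    The demonstrations condition the model on the task and its output
    format (in-context learning); the model fills in the final label.
    """
    lines = [f"{text} // {label}" for text, label in demonstrations]
    lines.append(f"{query} //")  # leave the label blank for the model
    return "\n".join(lines)

demos = [
    ("This is awesome!", "Positive"),
    ("This is bad!", "Negative"),
    ("Wow that movie was rad!", "Positive"),
]
prompt = few_shot_prompt(demos, "What a horrible show!")
```

Increasing the shot count is then just a matter of appending more `(text, label)` pairs to `demos`.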

Following the findings from Min et al. (2022), here are a few more tips about demonstrations/exemplars when doing few-shot:

- The label space and the distribution of the input text specified by the demonstrations are both important (regardless of whether the labels are correct for individual inputs).
- The format you use also plays a key role in performance; even random labels are much better than no labels at all.
- Additional results show that selecting random labels from a true distribution of labels (instead of a uniform distribution) also helps.

Let’s try out a few examples. Let’s first try an example with random labels (meaning the labels Negative and Positive are randomly assigned to the inputs):

*Prompt:*

```
This is awesome! // Negative
This is bad! // Positive
Wow that movie was rad! // Positive
What a horrible show! //
```

*Output:*

`Negative`

We still get the correct answer, even though the labels have been randomized. Note that we also kept the format, which helps too. In fact, with further experimentation, it seems the newer GPT models we are experimenting with are becoming more robust to even random formats. Example:

*Prompt:*

```
Positive This is awesome!
This is bad! Negative
Wow that movie was rad!Positive
What a horrible show! --
```

*Output:*

`Negative`

There is no consistency in the format above but the model still predicted the correct label. We have to conduct a more thorough analysis to confirm if this holds for different and more complex tasks, including different variations of prompts.
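The random-label experiment above can be reproduced by shuffling the labels across demonstrations while keeping the format fixed, which is exactly what the Min et al. (2022) finding isolates. A small sketch (the variable names and seed are illustrative):

```python
import random

demos = [
    ("This is awesome!", "Positive"),
    ("This is bad!", "Negative"),
    ("Wow that movie was rad!", "Positive"),
]

# Randomly reassign labels to inputs while keeping the
# "text // label" demonstration format intact.
texts = [text for text, _ in demos]
labels = [label for _, label in demos]
rng = random.Random(0)  # fixed seed so the shuffle is reproducible
rng.shuffle(labels)

shuffled_demos = "\n".join(f"{t} // {l}" for t, l in zip(texts, labels))
prompt = shuffled_demos + "\nWhat a horrible show! //"
```

The label *space* (Positive/Negative) and the format are preserved; only the pairing of labels to inputs is randomized.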

### Limitations of Few-shot Prompting

Standard few-shot prompting works well for many tasks but is still not a perfect technique, especially when dealing with more complex reasoning tasks. Let’s demonstrate why this is the case. Do you recall the previous example where we provided the following task:

```
The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1.
A:
```

If we try this again, the model outputs the following:

`Yes, the odd numbers in this group add up to 107, which is an even number.`

This is not the correct response, which not only highlights the limitations of these systems but that there is a need for more advanced prompt engineering.

Let’s try to add some examples to see if few-shot prompting improves the results.

*Prompt:*

```
The odd numbers in this group add up to an even number: 4, 8, 9, 15, 12, 2, 1.
A: The answer is False.
The odd numbers in this group add up to an even number: 17, 10, 19, 4, 8, 12, 24.
A: The answer is True.
The odd numbers in this group add up to an even number: 16, 11, 14, 4, 8, 13, 24.
A: The answer is True.
The odd numbers in this group add up to an even number: 17, 9, 10, 12, 13, 4, 2.
A: The answer is False.
The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1.
A:
```

*Output:*

`The answer is True.`

That didn’t work. It seems like few-shot prompting is not enough to get reliable responses for this type of reasoning problem. The example above provides basic information on the task. If you take a closer look, the type of task we have introduced involves a few more reasoning steps. In other words, it might help if we break the problem down into steps and demonstrate that to the model. More recently, chain-of-thought (CoT) prompting has been popularized to address more complex arithmetic, commonsense, and symbolic reasoning tasks.

Overall, it seems that providing examples is useful for solving some tasks. When zero-shot prompting and few-shot prompting are not sufficient, it might mean that whatever was learned by the model isn’t enough to do well at the task. From here it is recommended to start thinking about fine-tuning your models or experimenting with more advanced prompting techniques. Up next we talk about one of the popular prompting techniques called chain-of-thought prompting which has gained a lot of popularity.

## Chain-of-Thought (CoT) Prompting

Chain-of-thought prompting is a technique that leverages a model’s ability to generate text conditioned on preceding text: by asking the model to produce intermediate reasoning steps before the final answer, it forms a coherent “chain of thought”. Instead of jumping straight to an answer, the model works through the problem step by step, and each step conditions the next. This method is particularly effective for complex tasks that require multi-step reasoning, such as arithmetic, commonsense, and symbolic reasoning problems.

Introduced in Wei et al. (2022), chain-of-thought (CoT) prompting enables complex reasoning capabilities through intermediate reasoning steps. You can combine it with few-shot prompting to get better results on more complex tasks that require reasoning before responding.

*Prompt:*

```
The odd numbers in this group add up to an even number: 4, 8, 9, 15, 12, 2, 1.
A: Adding all the odd numbers (9, 15, 1) gives 25. The answer is False.
The odd numbers in this group add up to an even number: 17, 10, 19, 4, 8, 12, 24.
A: Adding all the odd numbers (17, 19) gives 36. The answer is True.
The odd numbers in this group add up to an even number: 16, 11, 14, 4, 8, 13, 24.
A: Adding all the odd numbers (11, 13) gives 24. The answer is True.
The odd numbers in this group add up to an even number: 17, 9, 10, 12, 13, 4, 2.
A: Adding all the odd numbers (17, 9, 13) gives 39. The answer is False.
The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1.
A:
```

*Output:*

`Adding all the odd numbers (15, 5, 13, 7, 1) gives 41. The answer is False.`

Wow! We can see a perfect result when we provided the reasoning step. In fact, we can solve this task by providing even fewer examples, i.e., just one example seems enough:

*Prompt:*

```
The odd numbers in this group add up to an even number: 4, 8, 9, 15, 12, 2, 1.
A: Adding all the odd numbers (9, 15, 1) gives 25. The answer is False.
The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1.
A:
```

*Output:*

`Adding all the odd numbers (15, 5, 13, 7, 1) gives 41. The answer is False.`

Keep in mind that the authors claim that this is an emergent ability that arises with sufficiently large language models.
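The only mechanical difference from standard few-shot prompting is that each exemplar carries its rationale, not just its answer. A minimal sketch of building such a prompt (the `cot_prompt` helper and the exemplar tuple layout are illustrative):

```python
def cot_prompt(exemplars: list[tuple[str, str, str]], query: str) -> str:
    """Few-shot chain-of-thought prompt: each exemplar includes its
    reasoning ("rationale") before the final answer, so the model is
    steered to reason step by step on the query too."""
    parts = []
    for question, rationale, answer in exemplars:
        parts.append(f"{question}\nA: {rationale} The answer is {answer}.")
    parts.append(f"{query}\nA:")
    return "\n".join(parts)

exemplar = (
    "The odd numbers in this group add up to an even number: 4, 8, 9, 15, 12, 2, 1.",
    "Adding all the odd numbers (9, 15, 1) gives 25.",
    "False",
)
prompt = cot_prompt(
    [exemplar],
    "The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1.",
)
```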

## Zero-shot CoT Prompting

Zero-shot chain-of-thought (CoT) prompting builds on the general idea of chain-of-thought prompting, but in this case you don’t provide any hand-crafted reasoning examples to guide the model.

In other words, instead of demonstrating the reasoning with exemplars, you simply instruct the model to reason through the problem itself, and it generates its own intermediate steps before producing the final answer.

This approach is particularly useful when the task requires a sequence of steps that build on each other but you have no exemplars at hand to include in the prompt.

Image Source: Kojima et al. (2022)

One recent idea along these lines is zero-shot CoT, which essentially involves adding “Let’s think step by step” to the original prompt. Let’s try a simple problem and see how the model performs:

*Prompt:*

`I went to the market and bought 10 apples. I gave 2 apples to the neighbor and 2 to the repairman. I then went and bought 5 more apples and ate 1. How many apples did I remain with?`

*Output:*

`11 apples`

The answer is incorrect! Now let’s try with the special prompt.

*Prompt:*

```
I went to the market and bought 10 apples. I gave 2 apples to the neighbor and 2 to the repairman. I then went and bought 5 more apples and ate 1. How many apples did I remain with?
Let's think step by step.
```

*Output:*

```
First, you started with 10 apples.
You gave away 2 apples to the neighbor and 2 to the repairman, so you had 6 apples left.
Then you bought 5 more apples, so now you had 11 apples.
Finally, you ate 1 apple, so you would remain with 10 apples.
```

It’s impressive that this simple prompt is effective at this task. This is particularly useful where you don’t have too many examples to use in the prompt.
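The whole technique amounts to appending a fixed trigger phrase to the raw question. A minimal sketch (the helper name is illustrative; the trigger phrase is the one from Kojima et al. (2022)):

```python
COT_TRIGGER = "Let's think step by step."

def zero_shot_cot(question: str) -> str:
    """Zero-shot CoT: append the reasoning trigger to the raw question.
    No exemplars are needed; the trigger alone elicits step-by-step
    reasoning from the model."""
    return f"{question}\n{COT_TRIGGER}"

prompt = zero_shot_cot(
    "I went to the market and bought 10 apples. I gave 2 apples to the "
    "neighbor and 2 to the repairman. I then went and bought 5 more "
    "apples and ate 1. How many apples did I remain with?"
)
```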

## Automatic Chain-of-Thought (Auto-CoT)

Automatic chain-of-thought (Auto-CoT) is a technique for constructing chain-of-thought demonstrations automatically instead of writing them by hand.

Rather than manually crafting reasoning exemplars for few-shot CoT prompting, Auto-CoT uses the model itself, via zero-shot CoT, to generate the reasoning chains that serve as demonstrations.

The advantage of Auto-CoT is that it removes the manual effort of designing effective and diverse exemplars. The challenge is that automatically generated chains can contain mistakes, so the method must be careful about which questions it selects to build demonstrations from.

When applying chain-of-thought prompting with demonstrations, the process involves hand-crafting effective and diverse examples. This manual effort can lead to suboptimal solutions. Zhang et al. (2022) propose an approach to eliminate manual effort by leveraging LLMs with the “Let’s think step by step” prompt to generate reasoning chains for demonstrations one by one. This automatic process can still end up with mistakes in generated chains. To mitigate the effects of the mistakes, the diversity of demonstrations matters. This work proposes Auto-CoT, which samples questions with diversity and generates reasoning chains to construct the demonstrations.

Auto-CoT consists of two main stages:

- **Stage 1, question clustering**: partition questions of a given dataset into a few clusters.
- **Stage 2, demonstration sampling**: select a representative question from each cluster and generate its reasoning chain using zero-shot CoT with simple heuristics.

The simple heuristics could be length of questions (e.g., 60 tokens) and number of steps in rationale (e.g., 5 reasoning steps). This encourages the model to use simple and accurate demonstrations.
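The selection heuristics can be sketched as a simple filter (the 60-token and 5-step limits follow the description above; the whitespace tokenizer and the representation of a rationale as a list of step strings are simplifications for illustration):

```python
def passes_heuristics(question: str, rationale_steps: list[str],
                      max_tokens: int = 60, max_steps: int = 5) -> bool:
    """Auto-CoT's simple demonstration filters: prefer short questions
    and short rationales, which tend to make cleaner, more reliable
    demonstrations. Tokens are approximated by whitespace splitting."""
    short_question = len(question.split()) <= max_tokens
    short_rationale = len(rationale_steps) <= max_steps
    return short_question and short_rationale

ok = passes_heuristics(
    "If there are 3 cars in the parking lot and 2 more cars arrive, "
    "how many cars are in the parking lot?",
    ["There are 3 cars in the parking lot already.",
     "2 more arrive.",
     "Now there are 3 + 2 = 5 cars."],
)
```

A representative question from each cluster that passes the filter is kept, together with its zero-shot-CoT-generated rationale, as one demonstration.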

The process is illustrated below:

Image Source: Zhang et al. (2022)

Code for Auto-CoT is available here.

## Self-Consistency

Self-consistency is a method used to improve the reliability of answers generated by language models on reasoning tasks. Rather than trusting a single generated answer, multiple answers are sampled for the same prompt and compared, and agreement among them is used to select the final response.

The technique is based on the principle that complex problems often admit several distinct lines of reasoning that all converge on the correct answer, so agreement across samples is a useful signal of reliability.

Self-consistency can be particularly useful in situations where the accuracy of the model’s output matters. However, it should be noted that even a self-consistent model is not guaranteed to be correct.

Perhaps one of the more advanced techniques out there for prompt engineering is self-consistency. Proposed by Wang et al. (2022), self-consistency aims “to replace the naive greedy decoding used in chain-of-thought prompting”. The idea is to sample multiple, diverse reasoning paths through few-shot CoT, and use the generations to select the most consistent answer. This helps to boost the performance of CoT prompting on tasks involving arithmetic and commonsense reasoning.

Let’s try the following example for arithmetic reasoning:

*Prompt:*

`When I was 6 my sister was half my age. Now I'm 70 how old is my sister?`

*Output:*

`35`

The output is wrong! How may we improve this with self-consistency? Let’s try it out. We will use the few-shot exemplars from Wang et al. 2022 (Table 17):

*Prompt:*

```
Q: There are 15 trees in the grove. Grove workers will plant trees in the grove today. After they are done, there will be 21 trees. How many trees did the grove workers plant today?
A: We start with 15 trees. Later we have 21 trees. The difference must be the number of trees they planted. So, they must have planted 21 - 15 = 6 trees. The answer is 6.

Q: If there are 3 cars in the parking lot and 2 more cars arrive, how many cars are in the parking lot?
A: There are 3 cars in the parking lot already. 2 more arrive. Now there are 3 + 2 = 5 cars. The answer is 5.

Q: Leah had 32 chocolates and her sister had 42. If they ate 35, how many pieces do they have left in total?
A: Leah had 32 chocolates and Leah's sister had 42. That means there were originally 32 + 42 = 74 chocolates. 35 have been eaten. So in total they still have 74 - 35 = 39 chocolates. The answer is 39.

Q: Jason had 20 lollipops. He gave Denny some lollipops. Now Jason has 12 lollipops. How many lollipops did Jason give to Denny?
A: Jason had 20 lollipops. Since he only has 12 now, he must have given the rest to Denny. The number of lollipops he has given to Denny must have been 20 - 12 = 8 lollipops. The answer is 8.

Q: Shawn has five toys. For Christmas, he got two toys each from his mom and dad. How many toys does he have now?
A: He has 5 toys. He got 2 from mom, so after that he has 5 + 2 = 7 toys. Then he got 2 more from dad, so in total he has 7 + 2 = 9 toys. The answer is 9.

Q: There were nine computers in the server room. Five more computers were installed each day, from monday to thursday. How many computers are now in the server room?
A: There are 4 days from monday to thursday. 5 computers were added each day. That means in total 4 * 5 = 20 computers were added. There were 9 computers in the beginning, so now there are 9 + 20 = 29 computers. The answer is 29.

Q: Michael had 58 golf balls. On tuesday, he lost 23 golf balls. On wednesday, he lost 2 more. How many golf balls did he have at the end of wednesday?
A: Michael initially had 58 balls. He lost 23 on Tuesday, so after that he has 58 - 23 = 35 balls. On Wednesday he lost 2 more so now he has 35 - 2 = 33 balls. The answer is 33.

Q: Olivia has $23. She bought five bagels for $3 each. How much money does she have left?
A: She bought 5 bagels for $3 each. This means she spent 5 * $3 = $15. So she has $23 - $15 = $8 left. The answer is 8.

Q: When I was 6 my sister was half my age. Now I'm 70 how old is my sister?
A:
```

*Output 1:*

`When I was 6 my sister was half my age, so she was 3. Now I am 70, so she is 70 - 3 = 67. The answer is 67.`

*Output 2:*

`When the narrator was 6, his sister was half his age, which is 3. Now that the narrator is 70, his sister would be 70 - 3 = 67 years old. The answer is 67.`

*Output 3:*

`When I was 6 my sister was half my age, so she was 3. Now I am 70, so she is 70/2 = 35. The answer is 35.`

Computing the final answer involves a few steps (check out the paper for the details), but for the sake of simplicity we can see that a majority answer is already emerging, so that essentially becomes the final answer.
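The aggregation step above is, at its simplest, a majority vote over the final answers extracted from each sampled reasoning path. A minimal sketch (the `majority_answer` helper and the regex for pulling out the answer are illustrative simplifications of the marginalization described in the paper):

```python
import re
from collections import Counter

def majority_answer(generations: list[str]) -> str:
    """Self-consistency aggregation: extract the final answer from each
    sampled reasoning path and return the most frequent one."""
    answers = []
    for text in generations:
        # Assumes each path ends with "The answer is X." as in the exemplars.
        match = re.search(r"The answer is (\S+?)\.?$", text.strip())
        if match:
            answers.append(match.group(1))
    return Counter(answers).most_common(1)[0][0]

samples = [
    "When I was 6 my sister was half my age, so she was 3. "
    "Now I am 70, so she is 70 - 3 = 67. The answer is 67.",
    "Now that the narrator is 70, his sister would be "
    "70 - 3 = 67 years old. The answer is 67.",
    "Now I am 70, so she is 70/2 = 35. The answer is 35.",
]
final = majority_answer(samples)  # → "67"
```

With the three outputs above, two paths agree on 67, so the vote discards the faulty 70/2 = 35 path.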