The results showed that the open-source 6B parameter model exceeded the average few-shot performance of the GPT3-175B model on 15 out of 20 benchmarks.
Jean-Marc Mommessin informs us that “Researchers developed a simple prompting strategy that enables open-source LLMs with 30x fewer parameters to exceed the few-shot performance of GPT3-175B,” as reported in “Ask Me Anything: A simple strategy for prompting language models” (Simran Arora et al.). Methods like AMA may narrow things down for users facing the massive search space of possible prompts. The AMA approach aims to improve the in-context learning quality of smaller, open-source models and enables the use of imperfect prompts.
- Studying the pretraining corpus and the training procedure provides signals for how to format prompts.
- Multiple prompts are aggregated using weak supervision.
- The strategy is as follows:
- Identify the properties of effective prompts.
- Develop a strategy to scalably format task inputs according to the most effective prompt property.
- Aggregate the prompts.
The goal of this approach is to improve in-context learning for smaller and open-source models, hence the “30x fewer parameters.” Question-answering prompts had the highest performance. The authors used prompt chaining to format task inputs according to the most effective prompt property, and weak supervision to aggregate the outputs of multiple prompts for each input.
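To make the steps above concrete, here is a minimal sketch of the format-and-aggregate idea in Python. Everything here is hypothetical: the templates are invented, `ask_model` is a stub standing in for a call to a real open-source LLM, and a plain majority vote replaces the paper’s weak-supervision aggregator.

```python
from collections import Counter

# Hypothetical prompt templates that recast a claim-verification input
# as open-ended question-answering, the style AMA found most effective.
TEMPLATES = [
    "Context: {x}\nQuestion: Is the claim accurate? Answer yes or no.\nAnswer:",
    "{x}\nBased on the passage above, is the statement correct (yes/no)?",
    "Read the following.\n{x}\nIs this right? Reply yes or no.",
]

def ask_model(prompt: str) -> str:
    """Stub for an LLM call; a real system would query an open-source model.

    Toy heuristic so the sketch runs end to end: answer "yes" whenever
    the word "Paris" appears in the prompt.
    """
    return "yes" if "Paris" in prompt else "no"

def ama_predict(x: str) -> str:
    """Format the input with every template and aggregate the answers.

    AMA aggregates with weak supervision (modelling each prompt's accuracy
    and correlations); a majority vote is a much simpler stand-in.
    """
    votes = [ask_model(t.format(x=x)) for t in TEMPLATES]
    return Counter(votes).most_common(1)[0][0]
```

The point of aggregating is that no single imperfect prompt has to be right; the ensemble of noisy prompt “voters” can still be reliable.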
Here we see another article about crafting better prompts to improve LLM performance. Called prompt generation, the technique can be used to generate adversarial inputs and to automate prompt search.
The method is based on the idea of feature visualisation for image classifiers, which works as follows:
We take an input and apply gradient ascent until it maximises a particular class activation, such as “goldfish.” In the presented example, we see “prototype” inputs that maximise the classes “goldfish,” “monarch,” “tarantula,” and “flamingo.” To be clear, this is not the data the system was trained on, but an archetype of what the algorithm deems particularly “goldfish-like,” a sort of ‘Platonic ideal’ goldfish, in the network’s view.
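The optimisation loop behind feature visualisation can be sketched with a toy model. The code below uses a randomly initialised linear “classifier” (all names and sizes are invented for illustration); a real system would backpropagate through a deep network and add image regularizers, but the gradient-ascent core is the same.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 16))  # toy linear "classifier": 4 classes, 16-dim inputs

def class_score(x, c):
    """Score of input x for class c under the toy model."""
    return W[c] @ x

def visualise_class(c, steps=200, lr=0.1):
    """Gradient-ascend a random input until it maximises class c's score.

    For this linear model the gradient of W[c] @ x w.r.t. x is just W[c];
    a real image classifier would need autodiff for this step.
    """
    x = rng.normal(size=16)
    for _ in range(steps):
        x += lr * W[c]           # step along the class-score gradient
        x /= np.linalg.norm(x)   # keep the input bounded, like pixel clipping
    return x
```

After optimisation the input is the model’s “prototype” for the chosen class: it scores higher for that class than for any other.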
Here’s another example of feature visualisation for images:
This is reminiscent of findings with human subjects, where humans are exposed to visual exemplars and then asked to classify the familiarity of other exemplars, some identical, some not. Interestingly, those exemplars that were closest to a category average or prototype tended to be classified as more familiar, even if those exact exemplars had never been displayed to the subjects.
This Captain Kirk Grid shows various generative AI responses to “Captain James T. Kirk.” The three real Captain James T. Kirks are shown in the center (Paul Wesley, William Shatner, and Chris Pine). The three real Kirks together appear to the algorithm as a prototype, which results in the generated exemplars arranged around the edges appearing to blend features of all three.
Transferring the “feature visualisation” methodology into the language domain is difficult: inputs are discrete tokens, which are incompatible with gradient-based optimisation. However, these tokens are mapped to embeddings in a continuous space, which can be optimised while keeping the result close to the embeddings of legal tokens. Code for this process is available for those interested in more details.
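A minimal sketch of that continuous relaxation, using an invented toy vocabulary: optimise a vector in embedding space, then snap it to the nearest legal token embedding. The released code is more involved; this only illustrates the projection step.

```python
import numpy as np

rng = np.random.default_rng(1)
VOCAB = ["the", "cat", "prompt", "model", "goldfish"]  # toy vocabulary
E = rng.normal(size=(len(VOCAB), 8))                   # toy embedding table

def nearest_token(v):
    """Project a continuous embedding onto the closest legal token."""
    dists = np.linalg.norm(E - v, axis=1)
    return int(np.argmin(dists))

def optimise_soft_prompt(target, steps=50, lr=0.2):
    """Gradient-descend toward a target vector in embedding space, then
    snap the result to a real token, mimicking the trick of optimising
    embeddings while keeping them close to legal tokens.
    """
    v = rng.normal(size=8)
    for _ in range(steps):
        v -= lr * 2 * (v - target)   # gradient of ||v - target||^2
    return VOCAB[nearest_token(v)]
```

The projection back to the vocabulary is what turns a continuous optimisation result into an actual, usable prompt token.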
Here are some examples of the sorts of prompts the system comes up with: