Contrastive search in large language models addresses issues such as degenerative expressions and a lack of semantic coherence.
Degenerative expressions refer to the tendency of large language models to fall into repetitive or uninformative outputs. This often happens when the model generates long-form content and gets stuck in a loop of similar responses. Because these models are trained to maximize likelihood, they favor high-probability continuations, which frequently means repeating the same sentence structures and phrases.
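As a quick illustration (not taken from the sources discussed below), plain greedy decoding with a small off-the-shelf model such as GPT-2 through the Hugging Face transformers library readily shows this looping behaviour; the checkpoint and prompt here are only examples:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Illustrative checkpoint and prompt; any small causal LM shows a similar effect.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("DeepMind Company is", return_tensors="pt")
# Greedy decoding: always take the single most likely next token.
output_ids = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
# The continuation typically repeats the same phrase over and over,
# which is the degeneration problem described above.
```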
These models can only learn from their training data and will therefore reflect any flaws in it. They may grasp the immediate (local) context of a conversation but struggle with broader context, and they lack grounded real-world knowledge.
A lack of semantic coherence shows up as inconsistent responses, topic drift, and misinterpretation of context.
Inconsistent responses arise when the model contradicts itself, since it has only a limited memory of past interactions.
In topic drift, the model’s mind wanders, moving away from the original theme over the course of a session.
This happens because the model maximizes the probability of the immediate next word rather than maintaining overall topic coherence.
Finally, the model might not fully understand the context of a conversation, leading to responses that are irrelevant or nonsensical.
Contrastive Search to the Rescue. Contrastive search generates semantically coherent text while maintaining text diversity. At each decoding step, it restricts the choice to the k most probable tokens and picks the candidate that balances model confidence against a degeneration penalty, which measures how similar the candidate's representation is to the tokens already generated. Contrastive search was originally proposed alongside a contrastive training objective that calibrates the model's representation space: it maximizes the similarity between positive pairs of data samples and minimizes the similarity between negative pairs, aiming to obtain a more discriminative and robust feature space.
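Concretely, the selection rule weighs the model's probability for each candidate against that candidate's maximum cosine similarity to the context. Below is a minimal PyTorch sketch of one scoring step, assuming the caller has already collected the top-k probabilities and the hidden states of the corresponding candidate tokens; alpha = 0.6 is a commonly used default, not a prescribed value:

```python
import torch
import torch.nn.functional as F

def contrastive_score(top_k_probs, candidate_hidden, context_hidden, alpha=0.6):
    """Score the top-k candidates for one contrastive-search step (sketch).

    top_k_probs:      (k,)   probabilities of the k most likely next tokens
    candidate_hidden: (k, d) hidden state the model would produce for each candidate
    context_hidden:   (t, d) hidden states of the tokens generated so far
    Returns the index (0..k-1) of the candidate to emit.
    """
    # Degeneration penalty: a candidate's maximum cosine similarity to any
    # token representation already in the context.
    sim = F.cosine_similarity(
        candidate_hidden.unsqueeze(1), context_hidden.unsqueeze(0), dim=-1
    )                                   # shape (k, t)
    penalty = sim.max(dim=1).values     # shape (k,)
    # Trade model confidence off against the penalty with weight alpha.
    scores = (1.0 - alpha) * top_k_probs - alpha * penalty
    return int(scores.argmax())
```

A larger alpha pushes the search harder away from repetition; with alpha = 0 the rule reduces to greedy selection among the top-k candidates.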
The authors of the paper “Contrastive Search Is What You Need For Neural Text Generation” (Yixuan Su and Nigel Collier) argue that contrastive search significantly outperforms existing decoding methods and generates high-quality text. They also suggest that contrastive search could be used for open-domain knowledge probing of language models and for synthesizing training data.
In “Generating Human-level Text with Contrastive Search in Transformers,” Tian Lin introduces HuggingFace’s contrastive search implementation for both PyTorch and TensorFlow. The article compares deterministic with stochastic decoding methods, and greedy search with contrastive search.
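In transformers, contrastive search is enabled by passing penalty_alpha together with top_k to generate(). A minimal usage sketch follows, again with an illustrative checkpoint and the same example prompt as above:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2-large")
model = AutoModelForCausalLM.from_pretrained("gpt2-large")

inputs = tokenizer("DeepMind Company is", return_tensors="pt")
# penalty_alpha > 0 together with top_k > 1 switches generate() to
# contrastive search: alpha weighs the degeneration penalty, top_k sets
# the size of the candidate pool at each step.
output_ids = model.generate(
    **inputs, penalty_alpha=0.6, top_k=4, max_new_tokens=128
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Compared with the greedy output shown earlier, the contrastive-search continuation is typically far less repetitive while staying on topic.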
In the article “Generating Text With Contrastive Search vs GPT-3/ChatGPT,” Mario Filho presents experiments comparing the outputs of various generative text models. He concludes that while contrastive search improves the completions generated by smaller models, GPT-3 still outperforms them. He also suggests that the improvements seen with ChatGPT might be partly due to better search strategies during text generation.
Contrastive search is a promising method for improving the quality of text generated by neural language models, especially smaller ones. However, larger models such as GPT-3 and GPT-4 still seem to provide superior results.