How to generate text - Greedy search, Beam search, Top-K sampling, Top-p sampling

As ad-hoc decoding methods, top-p and top-K sampling seem to produce more fluent text than traditional greedy and beam search on open-ended language generation. There is evidence that the apparent flaws of greedy and beam search - mainly generating repetitive word sequences - are caused by the model (especially the way the model is trained) rather than the decoding method.
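For concreteness, here is a minimal sketch of all four decoding methods side by side, assuming the Hugging Face transformers library and GPT-2 as a stand-in model (the prompt and hyperparameter values are arbitrary, not from the original text):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("The meaning of life is", return_tensors="pt")

# Greedy search: always pick the single most probable next token.
greedy = model.generate(**inputs, max_new_tokens=40, do_sample=False,
                        pad_token_id=tokenizer.eos_token_id)

# Beam search: track the num_beams most probable partial sequences.
beam = model.generate(**inputs, max_new_tokens=40, num_beams=5, do_sample=False,
                      pad_token_id=tokenizer.eos_token_id)

# Top-K sampling: sample from only the 50 most probable tokens.
top_k = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_k=50,
                       pad_token_id=tokenizer.eos_token_id)

# Top-p (nucleus) sampling: sample from the smallest token set whose
# cumulative probability exceeds 0.95 (top_k=0 disables top-K filtering).
top_p = model.generate(**inputs, max_new_tokens=40, do_sample=True,
                       top_p=0.95, top_k=0, pad_token_id=tokenizer.eos_token_id)

for name, out in [("greedy", greedy), ("beam", beam), ("top-k", top_k), ("top-p", top_p)]:
    print(name, "->", tokenizer.decode(out[0], skip_special_tokens=True))
```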

The Difficulties of Text Generation using Autoregressive Language Models: A Brief Overview

While some criticize the autoregressive formulation on the grounds that people generally don't write purely autoregressively, there are authors who use this sort of technique to write entire books.
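To make "purely autoregressively" concrete: the model factorizes p(x) as a product of p(x_t | x_<t), so generation is a loop that samples one token at a time and feeds it back in as context. A minimal sketch of that loop, again assuming transformers and GPT-2 (the prompt and length are illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tokenizer("Once upon a time", return_tensors="pt").input_ids
with torch.no_grad():
    for _ in range(20):
        logits = model(ids).logits[:, -1, :]               # distribution over the next token only
        probs = torch.softmax(logits, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)  # sample x_t ~ p(x_t | x_<t)
        ids = torch.cat([ids, next_id], dim=-1)            # the sample becomes context for x_{t+1}
print(tokenizer.decode(ids[0]))
```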

“GPT learning has been great at capturing the underlying reality and maybe the weak point is the text generation” - Sutskever - https://www.youtube.com/watch?v=SjhIlw3Iffs

The Curious Case of Neural Text Degeneration - [figure: beam search text (blue) is less surprising than human text (orange)]

Why is human-written text not the most probable text? … people optimize against stating the obvious.
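The paper's proposed remedy, nucleus (top-p) sampling, follows from this observation: rather than maximizing probability, truncate the distribution to the smallest set of tokens whose cumulative mass exceeds p and sample within that set. A sketch of the filtering step in PyTorch (the vocabulary size and threshold are placeholders, not from the original text):

```python
import torch

def top_p_filter(logits: torch.Tensor, p: float = 0.9) -> torch.Tensor:
    # Sort tokens by probability, highest first.
    sorted_logits, sorted_idx = torch.sort(logits, descending=True)
    sorted_probs = torch.softmax(sorted_logits, dim=-1)
    cumulative = sorted_probs.cumsum(dim=-1)
    # Mask every token whose cumulative mass *before* it already exceeds p,
    # keeping the smallest set whose total probability exceeds p.
    mask = cumulative - sorted_probs > p
    sorted_logits[mask] = float("-inf")
    # Scatter back to the original vocabulary order.
    filtered = torch.full_like(logits, float("-inf"))
    filtered.scatter_(-1, sorted_idx, sorted_logits)
    return filtered

logits = torch.randn(50257)  # placeholder next-token logits (GPT-2 vocab size)
probs = torch.softmax(top_p_filter(logits, p=0.9), dim=-1)
next_id = torch.multinomial(probs, num_samples=1)
```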

GPT-3 has a habit of repeating its input.
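A common decode-time mitigation for repetition, sketched here with an open model via transformers since GPT-3 itself is only reachable through an API: block repeated n-grams and penalize tokens that have already appeared. Both options below are standard generate parameters; the prompt and values are arbitrary.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("In a shocking finding,", return_tensors="pt")

out = model.generate(
    **inputs,
    max_new_tokens=60,
    do_sample=True,
    top_p=0.95,
    no_repeat_ngram_size=3,   # never emit the same trigram twice
    repetition_penalty=1.2,   # down-weight tokens already in the context
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```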