This first step works admirably as a first pass, but unfortunately it still gives quite a few inaccurate or completely wrong responses. His solution is “stuffing context into the prompt.”
His key insight is to include extra information in the prompt as a hint, like an open-book test.
Shipper uses GPT Index, an open-source project of data structures designed for large language models. The entire source text of “Lenny’s Newsletter” is broken into chunks, and each chunk is converted into a GPT-3 embedding vector. OpenAI’s embeddings API allows measuring the relatedness of text strings, which facilitates search, clustering, recommendations, anomaly detection, diversity measurement, and classification.
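The indexing step can be sketched roughly as follows. This is not the GPT Index implementation; it is a minimal illustration where `embed` is a toy stand-in for OpenAI's embeddings API (a real version would call the API with a model such as `text-embedding-ada-002`), and the chunk size is an arbitrary choice:

```python
import hashlib

def embed(text: str) -> list[float]:
    # Toy stand-in for an embeddings API call: hashes each word into a
    # slot of a small fixed-size vector so the sketch runs offline.
    # A real system would request an embedding vector from OpenAI here.
    vec = [0.0] * 16
    for word in text.lower().split():
        vec[int(hashlib.md5(word.encode()).hexdigest(), 16) % 16] += 1.0
    return vec

def build_index(source_text: str, chunk_size: int = 200) -> list[tuple[str, list[float]]]:
    # Break the source text into fixed-size word chunks and pair each
    # chunk with its embedding vector, forming the searchable index.
    words = source_text.split()
    chunks = [" ".join(words[i:i + chunk_size])
              for i in range(0, len(words), chunk_size)]
    return [(chunk, embed(chunk)) for chunk in chunks]
```

In a production setup the chunking strategy (by paragraph, by token count, with overlap) matters quite a bit; GPT Index handles those details for you.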
Once we have this index, we can proceed: we get the embedding vector for the question and retrieve the indexed chunks closest to it in the embedding space. These chunks are then concatenated onto the prompt.
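The retrieval-and-stuffing step might look like this. Again a sketch, not the actual code: `stuff_prompt` and its prompt wording are hypothetical, cosine similarity is assumed as the closeness measure, and `embed_fn` stands in for whatever embedding call built the index:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: a common measure of closeness in embedding space.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

def stuff_prompt(question: str,
                 index: list[tuple[str, list[float]]],
                 embed_fn,
                 top_k: int = 2) -> str:
    # Embed the question, rank the indexed chunks by similarity, and
    # concatenate the closest chunks onto the prompt as context.
    q_vec = embed_fn(question)
    ranked = sorted(index, key=lambda item: cosine(q_vec, item[1]), reverse=True)
    context = "\n\n".join(chunk for chunk, _ in ranked[:top_k])
    return (f"Use the context below to answer the question.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}")
```

The resulting prompt is what finally goes to the model, so the “open-book” hint travels with every question.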
Link to the code.