So here’s an interesting question addressed by Katherine Miller in How Do We Fix and Update Large Language Models? A large language model is not a database; it is essentially a huge blob of clay that is trained on a bunch of facts and then makes inferences from those facts. Adding new information after training is complete, short of completely retraining the entire network (which can be quite costly), is not easy. The difficulty is that a neural network is a holistic entity with billions of connections, and it’s not as if each connection, or even a small subset of connections, corresponds to one particular fact. Given that ChatGPT’s training set ends in 2021, how are we to keep it up to date without spending millions of dollars on retraining every time we have a new set of facts to add?

Miller quotes Stanford Computer Science student Eric Anthony Mitchell, who notes a few solutions researchers are working on. Some of these solutions involve fine-tuning the weights of existing neural networks, while others approach the problem by connecting the network to a secondary, external system that is more akin to a database.

MEND, for example, “uses gradient decomposition to represent large language models’ gradients in a more compact way. We can then filter the neural connections to identify which need to get updated and which don’t.”
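To make the compactness point concrete, here is a minimal PyTorch sketch, not MEND’s actual code, of the observation it builds on: for a linear layer, the gradient of the loss with respect to the weight matrix is the outer product of two small vectors, so the gradient can be carried around by its factors instead of as a full matrix, and a small editor network can transform those factors into a targeted update. The EditorMLP, layer sizes, and step size below are purely illustrative; the real MEND trains such an editor across many example edits.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d_in, d_out = 8, 4
layer = nn.Linear(d_in, d_out, bias=False)

# Forward pass and a toy loss for a single example.
x = torch.randn(d_in)
y = layer(x)
loss = (y - torch.ones(d_out)).pow(2).sum()

# The compact representation: delta = dL/dy plus the input x.
(delta,) = torch.autograd.grad(loss, y, retain_graph=True)
(full_grad,) = torch.autograd.grad(loss, layer.weight)

# The full d_out x d_in gradient is exactly the outer product of the factors.
assert torch.allclose(full_grad, torch.outer(delta, x))


class EditorMLP(nn.Module):
    """Hypothetical stand-in for MEND's learned editor network."""

    def __init__(self, d_in, d_out, hidden=16):
        super().__init__()
        self.f_x = nn.Sequential(
            nn.Linear(d_in, hidden), nn.ReLU(), nn.Linear(hidden, d_in))
        self.f_delta = nn.Sequential(
            nn.Linear(d_out, hidden), nn.ReLU(), nn.Linear(hidden, d_out))

    def forward(self, delta, x):
        # Transform each factor, then rebuild a (still low-rank) weight update.
        return torch.outer(self.f_delta(delta), self.f_x(x))


editor = EditorMLP(d_in, d_out)
with torch.no_grad():
    update = editor(delta, x)
    layer.weight -= 0.1 * update  # apply the targeted edit to this layer only
```

The point of the sketch is only that the editor never has to see or produce a full gradient matrix; it works on the two small factors, which is what makes deciding “which connections need to get updated and which don’t” tractable.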

Two other approaches, SERAC and ConCoRD, don’t change the model weights at all; instead they make an external system responsible for information updates. The approach is likened to having a notebook in your back pocket that you check before saying anything. What happens when the notebook gets so big that you are essentially relying on it for all of your answers is an open research question; perhaps that’s the point at which the system gets retrained in its entirety.
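Here is a toy sketch of that notebook pattern, under the simplest possible assumptions: the edits live in a plain list, a crude token-overlap score stands in for SERAC’s learned scope classifier, and a matching edit is returned verbatim rather than being fed to a counterfactual model. The class and function names are illustrative, not taken from either paper.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Edit:
    question: str
    answer: str


class EditedModel:
    """Wrap a frozen base model with an external, always-checked edit memory."""

    def __init__(self, base_model: Callable[[str], str], threshold: float = 0.5):
        self.base_model = base_model
        self.edits: List[Edit] = []
        self.threshold = threshold

    def add_edit(self, question: str, answer: str) -> None:
        # Writing a new fact touches only the notebook, never the base weights.
        self.edits.append(Edit(question, answer))

    def _in_scope(self, query: str, edit: Edit) -> float:
        # Stand-in for a learned scope classifier: Jaccard overlap of tokens.
        q, e = set(query.lower().split()), set(edit.question.lower().split())
        return len(q & e) / max(len(q | e), 1)

    def __call__(self, query: str) -> str:
        # Check the notebook first; fall back to the frozen base model.
        scored = [(self._in_scope(query, e), e) for e in self.edits]
        if scored:
            score, best = max(scored, key=lambda t: t[0])
            if score >= self.threshold:
                return best.answer
        return self.base_model(query)


model = EditedModel(base_model=lambda q: "I was trained on data up to 2021.")
model.add_edit("Who won the 2022 World Cup?", "Argentina won the 2022 World Cup.")
print(model("Who won the 2022 World Cup?"))   # answered from the notebook
print(model("What is the capital of France?"))  # falls through to the base model
```

The failure mode the article mentions is visible even in this toy version: as `edits` grows, more and more queries score above the threshold, and the “model” is really just the notebook.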

A final intriguing question: how do we know when it’s time to update the model? One approach is to compare the contents of the network against an up-to-date data stream and somehow make updates when there is a conflict. “The grand challenge here is how to take an extremely capable model like GPT3 and set up an automated system so that it can read the news every day and incorporate whatever new information is in that data stream without losing any of its existing abilities or knowledge.”
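The detection half of that pipeline might look roughly like the sketch below, under the large simplifying assumptions that the news stream has already been distilled into question/answer pairs and that answers can be compared by simple string normalization. Everything here is hypothetical scaffolding rather than anything described in the article; the flagged conflicts would then be handed to an editor such as MEND or SERAC.

```python
from typing import Callable, Iterable, List, Tuple


def find_stale_facts(
    model: Callable[[str], str],
    stream: Iterable[Tuple[str, str]],
) -> List[Tuple[str, str, str]]:
    """Return (question, model_answer, current_answer) triples that conflict."""

    def normalize(s: str) -> str:
        return " ".join(s.lower().split())

    conflicts = []
    for question, current_answer in stream:
        model_answer = model(question)
        if normalize(model_answer) != normalize(current_answer):
            conflicts.append((question, model_answer, current_answer))
    return conflicts
```

Each conflict becomes a candidate edit; once the queue of candidate edits gets long enough, full retraining presumably starts to look attractive again.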
