Damien Benveniste provides us with two articles about the history and taxonomy of large language models. In a post on LinkedIn, he leads us from 2017’s “Attention Is All You Need” to 2018’s ELMo and BERT. He references Facebook’s XLM (2019), which demonstrated the use of transformers for cross-lingual language representation, and then of course GPT and ChatGPT. He provides us with great diagrams on transformers, including this one:

[Image: transformer diagram by Damien Benveniste]

In “The ChatGPT Models Family,” he provides us with some fascinating charts showing the taxonomy of various large language models. He notes that the GPT models (GPT-1, GPT-2, GPT-3) differ mostly in the size of the training data and the number of transformer blocks. GPT-1 has 12 transformer blocks and 117 million parameters, GPT-2 has 48 blocks and 1.5 billion parameters, and GPT-3 has 96 blocks and 175 billion parameters. He also tells us about some ChatGPT alternatives, including Google’s LaMDA and Meta’s PEER. He finishes by pointing out that since the original 2017 paper, not much has changed in large language models as far as the underlying transformer architecture goes.
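To see how the block counts relate to the parameter counts quoted above, here is a rough back-of-the-envelope sketch. It is not taken from Benveniste's article. It assumes the common rule of thumb that each transformer block holds about 12 × d_model² weights, plus a token-embedding matrix. The block counts are the ones given above; the hidden sizes (`d_model`) and vocabulary sizes are assumptions drawn from the published model configurations, and the function name `estimate_params` is purely illustrative.

```python
# Rough parameter-count estimate for a decoder-only transformer.
# Assumption: each block holds about 12 * d_model^2 weights
# (roughly 4 * d_model^2 for attention and 8 * d_model^2 for the
# feed-forward layer), plus a token-embedding matrix of
# vocab_size * d_model. Biases, layer norms, and position embeddings
# are ignored, so the result is only an approximation.

def estimate_params(n_blocks: int, d_model: int, vocab_size: int) -> int:
    block_params = 12 * n_blocks * d_model ** 2
    embedding_params = vocab_size * d_model
    return block_params + embedding_params

# n_blocks comes from the article; d_model and vocab_size are
# assumptions taken from the published model configurations.
MODELS = {
    "GPT-1": {"n_blocks": 12, "d_model": 768, "vocab_size": 40478},
    "GPT-2": {"n_blocks": 48, "d_model": 1600, "vocab_size": 50257},
    "GPT-3": {"n_blocks": 96, "d_model": 12288, "vocab_size": 50257},
}

if __name__ == "__main__":
    for name, cfg in MODELS.items():
        millions = estimate_params(**cfg) / 1e6
        print(f"{name}: ~{millions:,.0f} million parameters")
```

Under these assumptions the estimates land within a few percent of the quoted 117 million, 1.5 billion, and 175 billion figures. That fits the point that the three models share one architecture and differ mainly in scale.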

This reminds me of this graphic by Max Roser, which shows just how fast technology in general accelerates.

[Figure: Long-term timeline of technology, by Max Roser]

I have always found it fascinating that from the beginning of history until the early nineteenth century, the maximum speed at which humans could travel was essentially constant.

[Figure: Maximum Possible Speed in km/h Over Time, Rockets Excluded]

Throughout most of human history, the pace of technological change has been glacial. Now technologies are emerging that would have been unimaginable just a few decades ago. Roser explains his timeline chart:

The timeline begins at the center of the spiral. The first use of stone tools, 3.4 million years ago, marks the beginning of this history of technology. Each turn of the spiral then represents 200,000 years of history. It took 2.4 million years – 12 turns of the spiral – for our ancestors to control fire and use it for cooking.

To be able to visualize the inventions in the more recent past – the last 12,000 years – I had to unroll the spiral. I needed more space to be able to show when agriculture, writing, and the wheel were invented. During this period, technological change was faster, but it was still relatively slow: several thousand years passed between each of these three inventions.

From 1800 onwards, I stretched out the timeline even further to show the many major inventions that rapidly followed one after the other.

Given all of this, it’s fun to consider where this technology may take us in the next five or ten years. If history is any guide, we may well be at the start of a very fast-rising hockey stick.

 
