For those interested in plunging into the deep grass of how large language models and the like work, Damien Benveniste offers us a concise primer in Deep Neural Networks: All the Building Blocks.
He gives us an overview of loss functions, activation functions, and various architectures. He also offers us a useful timeline of work in the field along with links to relevant references.
He makes an interesting point: “Deep Learning requires much more of an ARCHITECT mindset than traditional Machine Learning. In a sense, part of the feature engineering work has been moved to the design of very specialized computational blocks using smaller units.” Discuss.
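As one way into that discussion, here is a minimal sketch of what "specialized computational blocks using smaller units" can look like in practice. It is not taken from Benveniste's primer. The helper names (`linear`, `relu`, `sequential`, `residual`) and the weights are made up for illustration, and the code uses plain Python lists rather than any real deep learning library.

```python
from typing import Callable, List

Vector = List[float]
Layer = Callable[[Vector], Vector]


def linear(weights: List[Vector], bias: Vector) -> Layer:
    """Small unit: an affine map y = W x + b (weights are illustrative)."""
    def apply(x: Vector) -> Vector:
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(weights, bias)]
    return apply


def relu(x: Vector) -> Vector:
    """Small unit: element-wise ReLU activation."""
    return [max(0.0, xi) for xi in x]


def sequential(*layers: Layer) -> Layer:
    """Combinator: feed the output of each layer into the next."""
    def apply(x: Vector) -> Vector:
        for layer in layers:
            x = layer(x)
        return x
    return apply


def residual(inner: Layer) -> Layer:
    """Combinator: add the input back onto the inner layer's output."""
    def apply(x: Vector) -> Vector:
        return [xi + yi for xi, yi in zip(x, inner(x))]
    return apply


# The "architect" step: a residual block assembled from the units above.
block = residual(sequential(
    linear([[1.0, -1.0], [0.5, 0.5]], [0.0, -1.0]),
    relu,
    linear([[2.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
))

output = block([3.0, 1.0])
print(output)  # [7.0, 2.0]
```

Notice that nothing here hand-crafts a feature from the input. The design effort goes into how the units are wired together, here a skip connection around two linear layers, which is roughly the shift the quote describes.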