The Perceptron: Where It All Began
In 1958, Frank Rosenblatt built the Mark I Perceptron - a machine that could learn to classify simple images. At its core was a linear classifier: a weighted sum of inputs passed through a hard threshold activation. The New York Times proclaimed it could one day walk, talk, and be conscious. The reality was more modest, but the seed was planted.
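The idea can be sketched in a few lines: a weighted sum through a threshold, with Rosenblatt's error-driven update rule. The AND gate below is an illustrative choice, not from the original account.

```python
# A minimal sketch of a Rosenblatt-style perceptron: a weighted sum of inputs
# passed through a hard threshold, trained with the perceptron learning rule.
# The AND-gate data is illustrative, not part of the historical account.

def predict(weights, bias, x):
    """Threshold activation: fire (1) if the weighted sum exceeds zero."""
    total = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if total > 0 else 0

def train_perceptron(samples, epochs=20, lr=0.1):
    """samples: list of (inputs, target) pairs with targets in {0, 1}."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(weights, bias, x)
            # Update only on mistakes: nudge weights toward the target.
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias

# Learn the (linearly separable) AND function.
and_gate = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(and_gate)
print([predict(w, b, x) for x, _ in and_gate])  # → [0, 0, 0, 1]
```

Because AND is linearly separable, the perceptron convergence theorem guarantees this loop settles on a correct set of weights.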
The XOR Problem and the First AI Winter
In 1969, Minsky and Papert proved that a single-layer perceptron cannot represent the XOR function, because XOR is not linearly separable: no single line through the input space puts the two classes on opposite sides. It was a devastating blow. Funding dried up, and neural network research entered its first winter. The solution was hiding in plain sight: stack multiple layers. But nobody knew how to train them.
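The bind can be made concrete: no single threshold unit computes XOR, yet two layers of hand-chosen weights do. The specific weights below are picked by hand for illustration, which mirrors the historical problem - the architecture existed, but no training procedure did.

```python
# A sketch of why stacking helps: XOR is not linearly separable, so one
# threshold unit cannot compute it, but two layers with hand-chosen weights
# can (XOR = OR AND NOT-AND). The weights here are illustrative.

def unit(weights, bias, x):
    """A single threshold unit: weighted sum through a hard step."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0

def xor_two_layer(x1, x2):
    h_or = unit((1, 1), -0.5, (x1, x2))        # fires on x1 OR x2
    h_nand = unit((-1, -1), 1.5, (x1, x2))     # fires on NOT (x1 AND x2)
    return unit((1, 1), -1.5, (h_or, h_nand))  # AND of the two hidden units

print([xor_two_layer(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])
# → [0, 1, 1, 0]
```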
Backpropagation: The Key to Deep Learning
The chain rule of calculus, applied systematically through a network, gives us backpropagation. Popularized by Rumelhart, Hinton, and Williams in 1986, it solved the credit assignment problem: determining which weights, in which layers, contributed to the error. Suddenly, multi-layer networks could learn.
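A minimal sketch of this, assuming a two-layer network with one sigmoid unit per layer and squared-error loss (all illustrative choices, not from the original text): each gradient is built by chaining local derivatives from the loss backward, and each can be checked against a finite-difference approximation.

```python
import math

# Backpropagation as the chain rule on a tiny two-layer network:
# y = sigmoid(w2 * sigmoid(w1 * x + b1) + b2), squared-error loss.
# Parameter values and shapes are illustrative.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(params, x):
    w1, b1, w2, b2 = params
    h = sigmoid(w1 * x + b1)   # hidden activation
    y = sigmoid(w2 * h + b2)   # output activation
    return h, y

def loss(params, x, target):
    _, y = forward(params, x)
    return 0.5 * (y - target) ** 2

def backprop(params, x, target):
    """Chain rule, applied layer by layer from the loss backward."""
    w1, b1, w2, b2 = params
    h, y = forward(params, x)
    dy = (y - target) * y * (1 - y)  # dL/d(pre-activation of output)
    dw2, db2 = dy * h, dy
    dh = dy * w2 * h * (1 - h)       # credit assigned to the hidden layer
    dw1, db1 = dh * x, dh
    return [dw1, db1, dw2, db2]

# Verify each analytic gradient against a finite-difference approximation.
params, x, target, eps = [0.5, -0.3, 0.8, 0.1], 1.0, 1.0, 1e-6
analytic = backprop(params, x, target)
for i, g in enumerate(analytic):
    bumped = list(params)
    bumped[i] += eps
    numeric = (loss(bumped, x, target) - loss(params, x, target)) / eps
    assert abs(g - numeric) < 1e-4
print("analytic gradients match the numerical check")
```

The numerical check is exactly how gradient implementations are still debugged today: if the chain-rule derivation is wrong anywhere, the two numbers disagree.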
The Modern Deep Learning Revolution
Three factors converged around 2012: massive datasets (ImageNet), powerful GPUs, and algorithmic innovations (ReLU, dropout, batch normalization). AlexNet won the ImageNet competition by a landslide, and deep learning has dominated AI ever since. Today, transformers power language models, diffusion models create images, and the field moves faster than ever.
The Building Block Remains the Same
Despite all the architectural innovation, the fundamental building block remains unchanged: a weighted sum, passed through a nonlinear activation, trained with gradient descent. Every neuron in GPT-4 is, at its core, a logistic regression unit. Understanding the simple case deeply is the best preparation for understanding the complex one.
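That building block can be written out directly - one neuron, a weighted sum through a sigmoid, trained with plain gradient descent on cross-entropy loss, which is exactly logistic regression. The toy dataset and learning rate below are illustrative choices.

```python
import math

# The section's building block as a sketch: weighted sum + sigmoid
# nonlinearity + gradient descent = one logistic regression unit.
# The tiny dataset and hyperparameters are illustrative.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy separable data: label is 1 when the second feature is large.
data = [((0.1, 0.2), 0), ((0.4, 0.1), 0), ((0.2, 0.9), 1), ((0.8, 0.8), 1)]

w, b, lr = [0.0, 0.0], 0.0, 1.0
for _ in range(500):
    for x, t in data:
        y = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
        # For cross-entropy loss, dL/d(weighted sum) is simply (y - t).
        w = [wi - lr * (y - t) * xi for wi, xi in zip(w, x)]
        b -= lr * (y - t)

preds = [round(sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)) for x, _ in data]
print(preds)  # → [0, 0, 1, 1]
```

Stack many of these units, and the update rule stays the same shape; only the bookkeeping (backpropagation) grows.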