Section 1
The Neuron
The simplest possible neural network: one neuron. Multiply inputs by weights, add them up, squish the sum through an activation function. That is it.
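The multiply, sum, squish recipe above can be sketched in a few lines of plain Python (the input, weight, and bias values here are illustrative, not from the text; sigmoid is used as the example squish):

```python
import math

def neuron(inputs, weights, bias):
    # Weighted sum of inputs plus bias, then the sigmoid "squish".
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Example with made-up numbers: z = (1.0 * 0.5) + (2.0 * -0.25) + 0.1 = 0.1
out = neuron([1.0, 2.0], [0.5, -0.25], 0.1)  # sigmoid(0.1) ≈ 0.525
```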
Section 2
Activation Functions
The “squish” functions that give networks their nonlinearity. Without them, any stack of layers would collapse into a single linear layer, so stacking would be pointless.
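The three classic squish functions can be written directly from their definitions:

```python
import math

def relu(z):
    return max(0.0, z)                   # pass positives through, zero out negatives

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))    # squashes any z into (0, 1)

def tanh(z):
    return math.tanh(z)                  # squashes any z into (-1, 1), zero-centered
```

Each takes a neuron's weighted sum `z` and bends it nonlinearly, which is exactly what lets stacked layers represent more than a single linear map.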
Section 3
Multi-Layer Perceptron (MLP)
Stack neurons in layers, connect every neuron in one layer to every neuron in the next, and the network can learn complex patterns.
Every neuron in one layer connects to every neuron in the next. The network has 4 + 6 + 4 + 2 = 16 neurons and (4×6) + (6×4) + (4×2) = 56 weight connections.
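The 4-6-4-2 network and its 56 connections can be sketched as a forward pass in plain Python (random weights, ReLU activation; biases are omitted here since the text only counts weight connections):

```python
import random

random.seed(0)
sizes = [4, 6, 4, 2]  # the 4-6-4-2 network from the text

# One weight matrix per pair of adjacent layers:
# each of the n neurons in the next layer gets m incoming weights.
weights = [[[random.uniform(-1, 1) for _ in range(m)] for _ in range(n)]
           for m, n in zip(sizes, sizes[1:])]

n_connections = sum(m * n for m, n in zip(sizes, sizes[1:]))  # (4*6)+(6*4)+(4*2) = 56

def forward(x):
    # Pass the input through each layer: weighted sum per neuron, then ReLU.
    for layer in weights:
        x = [max(0.0, sum(w * xi for w, xi in zip(neuron_w, x)))
             for neuron_w in layer]
    return x
```

Calling `forward` on any 4-number input produces 2 outputs, one per neuron in the final layer.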
Section 4
Convolutional Neural Network (CNN)
How a computer learns to see: filters slide across images, detecting edges, then shapes, then objects.
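The sliding-filter idea can be sketched in plain Python: a small kernel moves across the image, and each output value is the sum of elementwise products at that position. The tiny image and the vertical-edge kernel below are illustrative examples, not from the text:

```python
def conv2d(image, kernel):
    # Slide the kernel over every valid position of the image (no padding).
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    out = []
    for i in range(h - kh + 1):
        row = []
        for j in range(w - kw + 1):
            row.append(sum(image[i + a][j + b] * kernel[a][b]
                           for a in range(kh) for b in range(kw)))
        out.append(row)
    return out

# Toy image: dark left half, bright right half (an edge in the middle).
image = [[0, 0, 0, 1, 1, 1]] * 3
# Vertical-edge detector: responds where brightness changes left-to-right.
kernel = [[-1, 0, 1]] * 3
fmap = conv2d(image, kernel)  # the feature map peaks at the edge: [[0, 3, 3, 0]]
```

The resulting feature map is largest exactly where the dark-to-bright edge sits, which is the sense in which one filter "detects" one kind of visual pattern.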
Section 5
The Training Loop
How a network learns: make predictions, measure errors, adjust weights, repeat.
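The predict, measure, adjust, repeat cycle can be sketched with a one-weight model fitting y = 2x (the data, starting weight, and learning rate are made up for illustration):

```python
# Toy dataset where the true relationship is y = 2x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w = 0.0       # initial weight (the "knowledge" to be learned)
lr = 0.05     # learning rate: how large each adjustment is

for epoch in range(100):        # one epoch = one pass over the dataset
    for x, y in data:
        pred = w * x            # 1. make a prediction
        error = pred - y        # 2. measure how wrong it is
        grad = 2 * error * x    # 3. gradient of the squared loss w.r.t. w
        w -= lr * grad          # 4. adjust the weight against the gradient
```

After training, `w` has converged close to 2.0: the loop has discovered the pattern purely from predictions and errors.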
Section 6
Key Terms Glossary
Quick reference for the essential vocabulary of neural networks.
Epoch: One complete pass through the entire training dataset. Training typically takes 10-100+ epochs.
Batch: A subset of training data used per weight update step. Common sizes: 32, 64, 128, 256.
Learning Rate: How large each weight adjustment is. Too high: overshoots. Too low: learns too slowly.
Loss: A single number measuring how wrong the network's predictions are. Training minimizes this.
Gradient: The direction and magnitude to adjust each weight. Computed via backpropagation (chain rule).
Overfitting: When a network memorizes training data instead of learning general patterns. Fails on new data.
Activation: A nonlinear function applied after each neuron's weighted sum. Without it, deep networks collapse to one layer.
Weight: A learnable number on each connection. The network's “knowledge” is entirely stored in its weights.
Bias: An extra learnable offset added before the activation function. Shifts the decision boundary.
Dropout: Randomly disabling neurons during training. Prevents over-reliance on any single pathway.
Feature Map: The output of one convolutional filter applied to an image. Highlights one type of visual pattern.
Backpropagation: The algorithm that computes gradients layer by layer, from output back to input, using the chain rule.