Neural Network Anatomy

From a single neuron to convolutional vision — every part labeled

Illustration companion to Issue 6: Machines That Learn

Section 1

The Perceptron

The simplest possible neural network: one neuron. Multiply inputs by weights, add them up, squish through an activation function.

[Figure: a single perceptron. Four inputs x₁–x₄ connect through weights w₁=0.7, w₂=-0.3, w₃=0.5, w₄=0.2 (plus bias b=0.1) to a summation node Σ(wᵢ×xᵢ)+b, then an activation σ(z), producing output ŷ. Blue lines = positive weights, red lines = negative weights; thicker = larger magnitude.]

One neuron. Multiply inputs by weights, add them up, squish through activation. That is it.
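
The figure's computation can be sketched in a few lines of plain Python, using the weights and bias from the diagram; the input values are made up for illustration:

```python
import math

def sigmoid(z):
    # squish any real number into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def perceptron(x, w, b):
    # weighted sum of inputs plus bias, then activation
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)

w = [0.7, -0.3, 0.5, 0.2]   # weights from the diagram
b = 0.1                     # bias from the diagram
x = [1.0, 0.5, -1.0, 2.0]   # example inputs (made up)
y_hat = perceptron(x, w, b)
```

With these inputs the weighted sum is z = 0.55, so the sigmoid squishes the output to a little above 0.5.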

Section 2

Activation Functions

The “squish” functions that give networks their nonlinearity. Without them, stacking layers would be pointless.

[Figure: four activation functions.
  Sigmoid: f(x) = 1/(1+e^(-x)). Squishes to 0–1. Classic, but prone to vanishing gradients.
  Tanh: f(x) = tanh(x). Squishes to -1 to 1. Centered around zero.
  ReLU: f(x) = max(0, x). Dead simple. Fast. The modern default.
  Softmax: softmax(zᵢ) = e^(zᵢ) / Σⱼ e^(zⱼ). Turns raw scores into probabilities that sum to 1.0. Used for classification.]
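
All four fit in a short sketch in plain Python (tanh is already in the standard library); the raw scores fed to softmax are illustrative:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))   # squishes to (0, 1)

def relu(x):
    return max(0.0, x)                  # zero below 0, identity above

def softmax(zs):
    m = max(zs)                          # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

# tanh is built in: math.tanh squishes to (-1, 1), centered on zero
probs = softmax([2.0, 1.0, 0.5])         # raw scores -> probabilities
```

The softmax outputs always sum to 1.0 and preserve the ordering of the raw scores, so the largest score gets the largest probability.
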
Section 3

Multi-Layer Perceptron (MLP)

Stack neurons in layers, connect every neuron to every neuron in the next layer, and now you can learn complex patterns.

[Figure: a 4-layer multi-layer perceptron with 4 input, 6 hidden-1, 4 hidden-2, and 2 output neurons (4 → 6 → 4 → 2). Inputs x₁–x₄ feed forward to outputs ŷ₁, ŷ₂; the forward pass runs left to right, the backward pass (backpropagation) right to left. Node opacity indicates activation level (brighter = more active).]

Every neuron in one layer connects to every neuron in the next. The network has 4 + 6 + 4 + 2 = 16 neurons and (4×6) + (6×4) + (4×2) = 56 weight connections, plus one learnable bias per non-input neuron (12 in total).
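A forward pass through the 4 → 6 → 4 → 2 network can be sketched in plain Python; the random weight initialization and tanh activation here are illustrative choices, not something the diagram specifies:

```python
import math
import random

random.seed(0)

def layer(inputs, weights, biases):
    # one fully connected layer: z = Wx + b, then tanh activation
    return [math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def init(n_out, n_in):
    # random weights in [-1, 1], biases start at zero
    return ([[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)],
            [0.0] * n_out)

sizes = [4, 6, 4, 2]                         # 4 -> 6 -> 4 -> 2, as in the diagram
params = [init(o, i) for i, o in zip(sizes, sizes[1:])]

x = [0.5, -0.2, 0.8, 0.1]                    # example input (made up)
for W, b in params:
    x = layer(x, W, b)                       # forward pass, layer by layer

n_weights = sum(i * o for i, o in zip(sizes, sizes[1:]))
```

Counting the connections in code gives the same 56 weights as the arithmetic above.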

Section 4

Convolutional Neural Network (CNN)

How a computer learns to see: filters slide across images, detecting edges, then shapes, then objects.

[Figure: CNN architecture. An input image (224×224×3) passes through Conv Layer 1 (32 filters, 3×3; each filter detects one pattern), a 2×2 max pool with stride 2 (keeps the strongest signals, halves the size), Conv Layer 2 (64 filters; combines earlier features), Conv Layer 3 (128 filters; detects complex parts like eyes and ears), then a flatten step (to a 1D vector) and dense layers ending in a softmax output, e.g. cat: 92%, dog: 7%, bird: 1%.

What each layer learns: early layers pick up edges and gradients; middle layers corners, textures, curves, and stripes; deep layers object parts (eye, ear, wheel, nose); final layers whole objects (cat, car, person). Simple → complex: each layer builds on the one before it.

How convolution works: a small filter (3×3) slides across the entire image, multiplying and summing at each position. Key parameters: filter size (3×3 or 5×5), stride (how far the filter slides each step), and padding (how the borders are handled). Max pooling (2×2) keeps only the max value in each window.]
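
The sliding-filter and max-pooling operations can be sketched in plain Python; the tiny 4×4 image and vertical-edge filter are made-up examples:

```python
def conv2d(image, kernel, stride=1):
    # slide a k x k filter across a 2D image (no padding),
    # multiplying and summing at each position
    k = len(kernel)
    h, w = len(image), len(image[0])
    out = []
    for i in range(0, h - k + 1, stride):
        row = []
        for j in range(0, w - k + 1, stride):
            row.append(sum(kernel[a][b] * image[i + a][j + b]
                           for a in range(k) for b in range(k)))
        out.append(row)
    return out

def max_pool(image, size=2):
    # keep only the max value in each size x size window
    return [[max(image[i + a][j + b]
                 for a in range(size) for b in range(size))
             for j in range(0, len(image[0]) - size + 1, size)]
            for i in range(0, len(image) - size + 1, size)]

edge = [[1, 0, -1],                  # a classic vertical-edge filter
        [1, 0, -1],
        [1, 0, -1]]
img = [[5, 5, 0, 0] for _ in range(4)]   # bright left half, dark right half
features = conv2d(img, edge)             # responds strongly at the 5 -> 0 boundary
pooled = max_pool([[3, 1], [7, 4]])      # the figure's pooling example: keep the 7
```

Every window here straddles the bright-to-dark boundary, so the edge filter fires at every position of its feature map.
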
Section 5

The Training Loop

How a network learns: make predictions, measure errors, adjust weights, repeat.

[Figure: the training loop as a circle, repeated 10,000–1,000,000 times.
  1. Forward pass: prediction = model(input). Data flows forward through every layer.
  2. Loss function: loss = (pred - truth)². One number: how wrong were we?
  3. Backward pass: gradients = backprop(loss). Chain rule: blame flows backward to the weights that caused the error.
  4. Update weights: w = w - lr * grad. Nudge each weight to reduce the error.
Each loop makes the network slightly less wrong. Over millions of loops, it learns.]
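
The four steps can be sketched as a minimal loop, here fitting a single weight to the made-up target function y = 2x with squared loss and plain gradient descent:

```python
# tiny made-up dataset: y = 2x
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w = 0.0       # start knowing nothing
lr = 0.01     # learning rate

for epoch in range(1000):
    for x, truth in data:
        pred = w * x                      # 1. forward pass
        loss = (pred - truth) ** 2        # 2. loss: how wrong were we?
        grad = 2 * (pred - truth) * x     # 3. backward pass (chain rule)
        w -= lr * grad                    # 4. update: nudge w to reduce error
```

Each pass nudges w toward 2.0; after a thousand epochs it has effectively converged.
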
Section 6

Key Terms Glossary

Quick reference for the essential vocabulary of neural networks.

Epoch One complete pass through the entire training dataset. Training typically takes 10-100+ epochs.
Batch A subset of training data used per weight update step. Common sizes: 32, 64, 128, 256.
Learning Rate How large each weight adjustment is. Too high: overshoots. Too low: learns too slowly.
Loss A single number measuring how wrong the network's predictions are. Training minimizes this.
Gradient The direction and magnitude to adjust each weight. Computed via backpropagation (chain rule).
Overfitting When a network memorizes training data instead of learning general patterns. Fails on new data.
Activation A nonlinear function applied after each neuron's weighted sum. Without it, deep networks collapse to one layer.
Weight A learnable number on each connection. The network's “knowledge” is entirely stored in its weights.
Bias An extra learnable offset added before the activation function. Shifts the decision boundary.
Dropout Randomly disabling neurons during training. Prevents over-reliance on any single pathway.
Feature Map The output of one convolutional filter applied to an image. Highlights one type of visual pattern.
Backpropagation The algorithm that computes gradients layer by layer, from output back to input, using the chain rule.