Neural Network Anatomy

From a single neuron to convolutional vision — every part labeled

Illustration companion to Issue 6: Machines That Learn

Section 1

The Perceptron

The simplest possible neural network: one neuron. Multiply inputs by weights, add them up, squish through an activation function.

[Figure: a single perceptron. Four inputs x₁–x₄ connect through weights w₁=0.7, w₂=-0.3, w₃=0.5, w₄=0.2 (plus bias b=0.1) to a summation node Σ(wᵢ×xᵢ)+b, then an activation σ(z), producing output ŷ. Blue lines = positive weights, red lines = negative weights; thicker = larger magnitude.]

One neuron. Multiply inputs by weights, add them up, squish through activation. That is it.
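
The figure's computation can be sketched in a few lines of plain Python, using the weights and bias from the diagram; the input values are made up for illustration:

```python
import math

def sigmoid(z):
    # squish any real number into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def perceptron(x, w, b):
    # weighted sum of inputs plus bias, then activation
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)

w = [0.7, -0.3, 0.5, 0.2]   # weights from the diagram
b = 0.1                     # bias from the diagram
x = [1.0, 0.5, -1.0, 2.0]   # example inputs (made up)
y_hat = perceptron(x, w, b)
```

With these inputs the weighted sum is z = 0.55, so the sigmoid squishes the output to a little above 0.5.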

Section 2

Activation Functions

The “squish” functions that give networks their nonlinearity. Without them, stacking layers would be pointless.

[Figure: four activation functions.
  Sigmoid: f(x) = 1/(1+e^(-x)). Squishes to 0–1. Classic, but prone to vanishing gradients.
  Tanh: f(x) = tanh(x). Squishes to -1 to 1. Centered around zero.
  ReLU: f(x) = max(0, x). Dead simple. Fast. The modern default.
  Softmax: softmax(zᵢ) = e^(zᵢ) / Σⱼ e^(zⱼ). Turns raw scores into probabilities that sum to 1.0. Used for classification.]
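
All four fit in a short sketch in plain Python (tanh is already in the standard library); the raw scores fed to softmax are illustrative:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))   # squishes to (0, 1)

def relu(x):
    return max(0.0, x)                  # zero below 0, identity above

def softmax(zs):
    m = max(zs)                          # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

# tanh is built in: math.tanh squishes to (-1, 1), centered on zero
probs = softmax([2.0, 1.0, 0.5])         # raw scores -> probabilities
```

The softmax outputs always sum to 1.0 and preserve the ordering of the raw scores, so the largest score gets the largest probability.
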
Section 3

Multi-Layer Perceptron (MLP)

Stack neurons in layers, connect every neuron to every neuron in the next layer, and now you can learn complex patterns.

[Figure: a 4-layer multi-layer perceptron with 4 input, 6 hidden-1, 4 hidden-2, and 2 output neurons (4 → 6 → 4 → 2). Inputs x₁–x₄ feed forward to outputs ŷ₁, ŷ₂; the forward pass runs left to right, the backward pass (backpropagation) right to left. Node opacity indicates activation level (brighter = more active).]

Every neuron in one layer connects to every neuron in the next. The network has 4 + 6 + 4 + 2 = 16 neurons and (4×6) + (6×4) + (4×2) = 56 weight connections, plus one learnable bias per non-input neuron (12 in total).
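A forward pass through the 4 → 6 → 4 → 2 network can be sketched in plain Python; the random weight initialization and tanh activation here are illustrative choices, not something the diagram specifies:

```python
import math
import random

random.seed(0)

def layer(inputs, weights, biases):
    # one fully connected layer: z = Wx + b, then tanh activation
    return [math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def init(n_out, n_in):
    # random weights in [-1, 1], biases start at zero
    return ([[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)],
            [0.0] * n_out)

sizes = [4, 6, 4, 2]                         # 4 -> 6 -> 4 -> 2, as in the diagram
params = [init(o, i) for i, o in zip(sizes, sizes[1:])]

x = [0.5, -0.2, 0.8, 0.1]                    # example input (made up)
for W, b in params:
    x = layer(x, W, b)                       # forward pass, layer by layer

n_weights = sum(i * o for i, o in zip(sizes, sizes[1:]))
```

Counting the connections in code gives the same 56 weights as the arithmetic above.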

Section 4

Convolutional Neural Network (CNN)

How a computer learns to see: filters slide across images, detecting edges, then shapes, then objects.

[Figure: CNN architecture. An input image (224×224×3) passes through Conv Layer 1 (32 filters, 3×3; each filter detects one pattern), a 2×2 max pool with stride 2 (keeps the strongest signals, halves the size), Conv Layer 2 (64 filters; combines earlier features), Conv Layer 3 (128 filters; detects complex parts like eyes and ears), then a flatten step (to a 1D vector) and dense layers ending in a softmax output, e.g. cat: 92%, dog: 7%, bird: 1%.

What each layer learns: early layers pick up edges and gradients; middle layers corners, textures, curves, and stripes; deep layers object parts (eye, ear, wheel, nose); final layers whole objects (cat, car, person). Simple → complex: each layer builds on the one before it.

How convolution works: a small filter (3×3) slides across the entire image, multiplying and summing at each position. Key parameters: filter size (3×3 or 5×5), stride (how far the filter slides each step), and padding (how the borders are handled). Max pooling (2×2) keeps only the max value in each window.]
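
The sliding-filter and max-pooling operations can be sketched in plain Python; the tiny 4×4 image and vertical-edge filter are made-up examples:

```python
def conv2d(image, kernel, stride=1):
    # slide a k x k filter across a 2D image (no padding),
    # multiplying and summing at each position
    k = len(kernel)
    h, w = len(image), len(image[0])
    out = []
    for i in range(0, h - k + 1, stride):
        row = []
        for j in range(0, w - k + 1, stride):
            row.append(sum(kernel[a][b] * image[i + a][j + b]
                           for a in range(k) for b in range(k)))
        out.append(row)
    return out

def max_pool(image, size=2):
    # keep only the max value in each size x size window
    return [[max(image[i + a][j + b]
                 for a in range(size) for b in range(size))
             for j in range(0, len(image[0]) - size + 1, size)]
            for i in range(0, len(image) - size + 1, size)]

edge = [[1, 0, -1],                  # a classic vertical-edge filter
        [1, 0, -1],
        [1, 0, -1]]
img = [[5, 5, 0, 0] for _ in range(4)]   # bright left half, dark right half
features = conv2d(img, edge)             # responds strongly at the 5 -> 0 boundary
pooled = max_pool([[3, 1], [7, 4]])      # the figure's pooling example: keep the 7
```

Every window here straddles the bright-to-dark boundary, so the edge filter fires at every position of its feature map.
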
Section 5

The Training Loop

How a network learns: make predictions, measure errors, adjust weights, repeat.

[Figure: the training loop as a circle, repeated 10,000–1,000,000 times.
  1. Forward pass: prediction = model(input). Data flows forward through every layer.
  2. Loss function: loss = (pred - truth)². One number: how wrong were we?
  3. Backward pass: gradients = backprop(loss). Chain rule: blame flows backward to the weights that caused the error.
  4. Update weights: w = w - lr * grad. Nudge each weight to reduce the error.
Each loop makes the network slightly less wrong. Over millions of loops, it learns.]
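
The four steps can be sketched as a minimal loop, here fitting a single weight to the made-up target function y = 2x with squared loss and plain gradient descent:

```python
# tiny made-up dataset: y = 2x
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w = 0.0       # start knowing nothing
lr = 0.01     # learning rate

for epoch in range(1000):
    for x, truth in data:
        pred = w * x                      # 1. forward pass
        loss = (pred - truth) ** 2        # 2. loss: how wrong were we?
        grad = 2 * (pred - truth) * x     # 3. backward pass (chain rule)
        w -= lr * grad                    # 4. update: nudge w to reduce error
```

Each pass nudges w toward 2.0; after a thousand epochs it has effectively converged.
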
Section 6

Key Terms Glossary

Quick reference for the essential vocabulary of neural networks.

Epoch One complete pass through the entire training dataset. Training typically takes 10-100+ epochs.
Batch A subset of training data used per weight update step. Common sizes: 32, 64, 128, 256.
Learning Rate How large each weight adjustment is. Too high: overshoots. Too low: learns too slowly.
Loss A single number measuring how wrong the network's predictions are. Training minimizes this.
Gradient The direction and magnitude to adjust each weight. Computed via backpropagation (chain rule).
Overfitting When a network memorizes training data instead of learning general patterns. Fails on new data.
Activation A nonlinear function applied after each neuron's weighted sum. Without it, deep networks collapse to one layer.
Weight A learnable number on each connection. The network's “knowledge” is entirely stored in its weights.
Bias An extra learnable offset added before the activation function. Shifts the decision boundary.
Dropout Randomly disabling neurons during training. Prevents over-reliance on any single pathway.
Feature Map The output of one convolutional filter applied to an image. Highlights one type of visual pattern.
Backpropagation The algorithm that computes gradients layer by layer, from output back to input, using the chain rule.