How words become numbers — and why ‘king - man + woman = queen’
Illustration companion to Issue 7: Attention Is All You Need
1
Words as Points in Space
In an embedding space, every word is a point. Words with similar meanings cluster together — like stars forming constellations. This is a 2D projection of a high-dimensional space where proximity means similarity.
Animals
Countries
Professions
Emotions
Food
Technology
Words that appear in similar contexts end up near each other in the embedding space. The model was never told that “cat” and “dog” are both animals — it discovered this structure from patterns in text.
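The clustering idea can be sketched in code. This is a toy example with hand-picked 2-D coordinates (a real model learns hundreds of dimensions from text and is never given coordinates like these), but it shows how "nearest point" recovers the categories:

```python
import math

# Toy 2-D "embeddings", hand-picked for illustration only.
# In a real model these coordinates are learned, not chosen.
embeddings = {
    "cat":   (0.90, 0.80),
    "dog":   (0.85, 0.75),
    "paris": (-0.70, 0.60),
    "tokyo": (-0.75, 0.55),
    "happy": (0.10, -0.90),
}

def nearest(word):
    """Return the closest other word by Euclidean distance."""
    point = embeddings[word]
    return min(
        (w for w in embeddings if w != word),
        key=lambda w: math.dist(point, embeddings[w]),
    )

print(nearest("cat"))    # dog   -- the animals sit near each other
print(nearest("paris"))  # tokyo -- so do the cities
```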
2
Vector Arithmetic
The most stunning property of embeddings: you can do math with meanings. Subtract one concept, add another, and arrive at the correct answer — because directions in the space encode relationships.
The model was never taught these relationships. They emerged from reading billions of words. Directions in embedding space encode consistent semantic relationships — gender, geography, grammar, and more.
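A minimal sketch of the famous analogy, again with made-up vectors: one axis stands in for "royalty" and the other for "gender", so the subtraction cancels the male direction and the addition restores the female one:

```python
import math

# Hand-made 2-D vectors: axis 0 ~ "royalty", axis 1 ~ "gender".
# Real embedding spaces encode such directions implicitly, in many dimensions.
vecs = {
    "king":  [0.9,  0.9],
    "queen": [0.9, -0.9],
    "man":   [0.0,  0.9],
    "woman": [0.0, -0.9],
    "apple": [-0.8, 0.1],
}

def sub(a, b): return [x - y for x, y in zip(a, b)]
def add(a, b): return [x + y for x, y in zip(a, b)]

# king - man + woman
target = add(sub(vecs["king"], vecs["man"]), vecs["woman"])

# Nearest word to the result, excluding the three inputs.
best = min(
    (w for w in vecs if w not in {"king", "man", "woman"}),
    key=lambda w: math.dist(vecs[w], target),
)
print(best)  # queen
```

Subtracting "man" removes the male component, adding "woman" moves to the female side, and the royalty component carries through untouched.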
3
How a Word Becomes a Vector
Every word passes through the same pipeline: from text to token ID to a dense vector of numbers. Here is the journey of the word “cat”.
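The pipeline can be sketched in a few lines. The vocabulary, dimension, and random matrix here are stand-ins (a real model uses a learned tokenizer, ~50,000 tokens, and trained weights), but the two lookups are the whole story:

```python
import random

random.seed(0)
vocab = {"the": 0, "cat": 1, "sat": 2}   # tiny stand-in vocabulary
dim = 4                                   # real models use 768 or more

# Embedding matrix: one row of numbers per token.
# Random here; in a real model these values are learned during training.
matrix = [[random.uniform(-1, 1) for _ in range(dim)] for _ in vocab]

def embed(word):
    token_id = vocab[word]     # step 1: text -> token ID
    return matrix[token_id]    # step 2: token ID -> dense vector

vector = embed("cat")
print(len(vector))  # 4
```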
4
Similarity = Distance
How do we measure whether two words are similar? We compare their vectors. Cosine similarity measures the angle between two vectors: 1.0 means they point the same way, 0.0 means they are unrelated (at right angles).
This is how search engines find relevant results — by finding vectors close to your query. When you type a question, it gets embedded into the same space, and the engine returns documents whose vectors are nearest.
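Cosine similarity is just a dot product divided by the vector lengths. Here is a sketch of the search idea with invented 2-D "document" vectors; a real engine would embed actual text into hundreds of dimensions:

```python
import math

def cosine(a, b):
    """Cosine similarity: 1.0 = same direction, 0.0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy document vectors (made up for illustration).
docs = {
    "feline care guide":  [0.90, 0.10],
    "stock market recap": [0.10, 0.90],
}
query = [0.85, 0.15]  # pretend: the embedded query "how to feed a cat"

best = max(docs, key=lambda d: cosine(query, docs[d]))
print(best)  # feline care guide
```

The query is embedded into the same space as the documents, so "nearest vector" means "most relevant document".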
5
The Embedding Matrix
All embeddings live in one giant table: 50,000 rows (one per token in the vocabulary) by 768 columns (one per dimension). That is 38.4 million parameters — just for the first layer.
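The arithmetic is worth seeing directly (the ~153.6 MB figure assumes 32-bit floats, 4 bytes per parameter):

```python
vocab_size = 50_000   # rows: one per token in the vocabulary
dim = 768             # columns: one per embedding dimension

params = vocab_size * dim
print(f"{params:,}")                    # 38,400,000
print(f"{params * 4 / 1e6} MB")         # 153.6 MB at 4 bytes per parameter
```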
6
Beyond Words
The embedding trick is not limited to words. The same idea — turn anything into a vector — works for images, audio, and code. Every modality gets mapped into the same kind of high-dimensional space.
This is why multimodal models (like those that understand both text and images) are possible: everything lives in a shared space, so different kinds of data can be compared directly.
Part of From Turing to LLMs and Beyond — a 10-issue comic series about the history of computing.
This illustration accompanies Issue 7: Attention Is All You Need.