
The Embedding Galaxy

How words become numbers — and why ‘king - man + woman = queen’

Illustration companion to Issue 7: Attention Is All You Need

1

Words as Points in Space

In an embedding space, every word is a point. Words with similar meanings cluster together — like stars forming constellations. This is a 2D projection of a high-dimensional space where proximity means similarity.

[Figure: a 2D scatter of word embeddings. Related words form tight clusters: Animals (cat, dog, fish, bird, horse, lion, tiger, whale, mouse, rabbit), Countries (France, Germany, Japan, Brazil, India, China, USA, Italy, Spain), Professions (doctor, teacher, engineer, artist, lawyer, chef, pilot, nurse), Emotions (happy, sad, angry, joy, scared, calm, love, excited, hate), Food (pizza, pasta, sushi, bread, rice, apple, banana, chocolate), and Technology (computer, software, internet, code, algorithm, neural, machine), with vast empty space between unrelated concepts.]
Words that appear in similar contexts end up near each other in the embedding space. The model was never told that “cat” and “dog” are both animals — it discovered this structure from patterns in text.
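A toy sketch of this clustering, using tiny hand-made 4-dimensional vectors rather than learned ones (real embeddings have hundreds of dimensions and are trained from text; these vectors and words are purely illustrative):

```python
import math

# Hand-made toy "embeddings": the first two dimensions loosely track
# animal-ness, the last two food-ness. Learned embeddings would not be
# this tidy, but the clustering effect is the same.
embeddings = {
    "cat":   [0.9, 0.8, 0.1, 0.0],
    "dog":   [0.8, 0.9, 0.2, 0.1],
    "pizza": [0.1, 0.0, 0.9, 0.8],
    "pasta": [0.0, 0.1, 0.8, 0.9],
}

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def nearest(word):
    # Rank every other word by similarity to `word`; return the closest.
    others = [w for w in embeddings if w != word]
    return max(others, key=lambda w: cosine(embeddings[word], embeddings[w]))

print(nearest("cat"))    # "dog": the animal cluster wins
print(nearest("pizza"))  # "pasta": the food cluster wins
```

Nothing in the code says "cat and dog are animals"; the grouping falls out of the vectors alone, which is the point of the figure above.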
2

Vector Arithmetic

The most stunning property of embeddings: you can do math with meanings. Subtract one concept, add another, and the nearest word to the result is often the right answer — because directions in the space encode relationships.

The famous example: king - man + woman = queen. Subtracting "man" from "king" isolates a royalty concept; adding "woman" lands near "queen" (cosine similarity: 0.86). More examples follow the same pattern: Paris - France + Italy = Rome (the capital-of relationship), and bigger - big + small = smaller (the comparative form).
The model was never taught these relationships. They emerged from reading billions of words. Directions in embedding space encode consistent semantic relationships — gender, geography, grammar, and more.
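The analogy trick can be sketched with toy vectors. Here dimension 0 loosely stands for "royalty", dimension 1 for "male", and dimension 2 for "female"; the vectors and the distractor word are hand-made for illustration, not learned:

```python
# Toy 3-dimensional vectors: dim 0 ~ royalty, dim 1 ~ male, dim 2 ~ female.
vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "pizza": [0.0, 0.2, 0.2],  # unrelated distractor
}

def add(a, b): return [x + y for x, y in zip(a, b)]
def sub(a, b): return [x - y for x, y in zip(a, b)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb)

# king - man + woman: remove the "male" direction, add the "female" one.
target = add(sub(vectors["king"], vectors["man"]), vectors["woman"])

# Find the closest word to the result, excluding the input words
# (the standard convention in word-analogy evaluations).
candidates = [w for w in vectors if w not in ("king", "man", "woman")]
best = max(candidates, key=lambda w: cosine(target, vectors[w]))
print(best)  # queen
```

The subtraction cancels the "male" component while keeping "royalty", so adding "woman" steers the result toward "queen" rather than the distractor.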
3

How a Word Becomes a Vector

Every word passes through the same pipeline: from text to token ID to a dense vector of numbers. Here is the journey of the word “cat”.

Step 1: "cat" — the raw word in the input text.
Step 2: Token ID 9246 — look up "cat" in the vocabulary table.
Step 3: Embedding lookup — the table has 50,000 rows (one per token) and 768 columns; row 9246 holds the 768 values for "cat".
Step 4: The vector — 768 numbers, e.g. [0.23, -0.15, 0.87, 0.04, -0.62, ...].
Step 5: What the dimensions mean (conceptual) — some dimensions encode "animal-ness", some encode "size", some encode "domesticated vs wild". But most dimensions don't have clean human interpretations; they capture subtle statistical patterns.
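The pipeline above is, at its core, a table lookup. A minimal sketch, shrunk to a hypothetical 5-token vocabulary with 4 dimensions so it prints nicely (real models use ~50,000 tokens by 768 dimensions, and the values are learned, not random):

```python
import random

random.seed(0)  # reproducible "weights" for this demo

# Hypothetical mini-vocabulary; real tokenizers map ~50,000 tokens.
vocab = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4}
n_dims = 4  # stand-in for 768

# The embedding matrix: one row per token. Here filled with random
# numbers; during training these would be adjusted by gradient descent.
embedding_matrix = [
    [round(random.uniform(-1, 1), 2) for _ in range(n_dims)]
    for _ in vocab
]

def embed(word):
    token_id = vocab[word]             # Step 2: word -> token ID
    return embedding_matrix[token_id]  # Step 3: token ID -> row of the table

print(embed("cat"))  # Step 4: a list of n_dims numbers
```

Note there is no computation on the word itself: the word buys nothing but an index, and everything the model "knows" about "cat" sits in that one row.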
4

Similarity = Distance

How do we measure whether two words are similar? We compare their vectors. Cosine similarity measures the angle between two vectors: 1.0 means they point in the same direction (identical meaning), 0.0 means they are unrelated.

Comparing vectors (shown as heatmap bars over all 768 dimensions, on a scale from negative through zero to positive):

"cat" vs "kitten" — cosine similarity 0.89 (very similar; vectors close)
"cat" vs "dog" — cosine similarity 0.73 (related)
"cat" vs "computer" — cosine similarity 0.12 (unrelated; vectors far apart)
This is how search engines find relevant results — by finding vectors close to your query. When you type a question, it gets embedded into the same space, and the engine returns documents whose vectors are nearest.
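A toy version of that search loop, with hand-made 3-dimensional document vectors standing in for what a trained model would produce (the document titles and numbers are invented for illustration):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

# Hypothetical document vectors: in a real engine, each document is
# embedded by a trained model into the same space as queries.
doc_vectors = {
    "Adopting your first kitten": [0.9, 0.7, 0.0],
    "Dog training basics":        [0.7, 0.5, 0.2],
    "Intro to programming":       [0.0, 0.1, 0.9],
}

def search(query_vector, k=2):
    # Rank documents by cosine similarity to the query; return the top k.
    ranked = sorted(doc_vectors,
                    key=lambda d: cosine(query_vector, doc_vectors[d]),
                    reverse=True)
    return ranked[:k]

# A query like "cat care" would embed near the pet-related documents.
query = [0.85, 0.65, 0.05]
print(search(query))
```

Real systems scale this up with approximate nearest-neighbor indexes, but the ranking principle is exactly this: closest vectors first.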
5

The Embedding Matrix

All embeddings live in one giant table: 50,000 rows (one per token in the vocabulary) by 768 columns (one per dimension). That is 38.4 million parameters — just for the first layer.

The matrix: 50,000 rows (tokens) by 768 columns (dimensions). For example, "the" is token 1, "cat" is token 9246, and "quantum" is token 42157, with roughly 49,970 more rows besides. 50,000 x 768 = 38,400,000 parameters, just for the first layer of the Transformer. These numbers are LEARNED during training; nobody set them by hand. The model discovered what each dimension should mean.
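The arithmetic, plus a rough memory estimate (the 4-bytes-per-parameter figure assumes standard 32-bit floats; real deployments often use smaller formats):

```python
vocab_size = 50_000  # rows: one per token in the vocabulary
n_dims = 768         # columns: one per embedding dimension

params = vocab_size * n_dims
print(f"{params:,} parameters")  # 38,400,000 parameters

# At 4 bytes per 32-bit float, the embedding table alone is sizable.
bytes_fp32 = params * 4
print(f"{bytes_fp32 / 1e6:.1f} MB")  # 153.6 MB
```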
6

Beyond Words

The embedding trick is not limited to words. The same idea — turn anything into a vector — works for images, audio, and code. Every modality gets mapped into a similar high-dimensional space.

Every modality feeds into the embedding space as vectors:

Text: "The cat sat on" — words → tokens → vectors
Images: patches → vectors
Audio: spectrograms → vectors
Code: def hello(): print("hi") — tokens → vectors
This is why multimodal models (like those that understand both text and images) are possible: every modality is embedded into the same kind of space, so the model can relate them directly.