How "understanding" becomes [under][stand][ing] — and why it matters.
Interactive companion to Issue 7: Attention Is All You Need
Type anything below, or pick a pre-loaded example. The tokenizer will break your text into tokens using Byte-Pair Encoding (BPE) — the same algorithm behind GPT, Claude, and other LLMs.
Watch Byte-Pair Encoding build a vocabulary from scratch. It starts with individual characters, then repeatedly merges the most frequent adjacent pair into a new token.
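The merge loop above can be sketched in a few lines of Python. This is a toy trainer, not a production tokenizer (real BPE operates on bytes, pre-splits on whitespace, and records merge ranks): it treats the text as a character sequence and greedily merges the most frequent adjacent pair, left to right, for a fixed number of rounds.

```python
from collections import Counter

def train_bpe(text, num_merges):
    """Toy BPE: start from single characters, then repeatedly
    merge the most frequent adjacent pair into a new token."""
    tokens = list(text)
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break  # nothing left to merge
        (a, b), _count = pairs.most_common(1)[0]
        merges.append(a + b)
        # Rewrite the sequence, replacing every (a, b) pair with the new token.
        merged, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
                merged.append(a + b)
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return tokens, merges
```

On the string `"aaabdaaabac"`, three merges produce `aa`, then `aaa`, then `aaab`, leaving the sequence `["aaab", "d", "aaab", "a", "c"]` — frequent substrings collapse into single tokens exactly as the demo animates.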
Your text after BPE tokenization. Hover over any token to see its ID and frequency. Each unique token gets a distinct color.
The same text tokenized with different vocabulary sizes. A bigger vocabulary means fewer tokens — but the vocabulary table itself takes more memory.
Different kinds of text tokenize very differently. English prose is "cheap" because LLMs train mostly on English. Code, emoji, and other languages cost more tokens for the same meaning.
Every token maps to an integer ID, which then maps to an embedding vector — a point in high-dimensional space. Similar words end up near each other.
| Token | ID | Embedding (768-dim, first 20 shown) |
|---|---|---|
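The lookup itself is just indexing into a table. A minimal sketch with NumPy, using made-up sizes and token IDs (the 768 dimensions match the table above; the toy vocabulary size and the IDs for "under"/"stand"/"ing" are illustrative, and the random vectors stand in for learned weights):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim = 1000, 768          # toy vocabulary; real models use ~50k-200k tokens
embedding_table = rng.standard_normal((vocab_size, dim)).astype(np.float32)

token_ids = [318, 454, 278]          # hypothetical IDs for "under", "stand", "ing"
vectors = embedding_table[token_ids] # one 768-dim vector per token, shape (3, 768)
```

In a trained model this table is learned, so tokens that appear in similar contexts end up with nearby vectors; here the rows are random and only the lookup mechanics are real.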
LLM APIs charge per token. Here is what processing your text would cost across different models.
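The arithmetic behind the cost table is simple: tokens divided by a million, times the model's per-million-token rate. The prices below are placeholders, not any provider's real pricing (real rates differ for input vs. output tokens and change often):

```python
# Hypothetical per-million-token prices in USD; real prices vary by model and over time.
PRICE_PER_MTOK = {"model-a": 3.00, "model-b": 0.25}

def cost_usd(num_tokens: int, model: str) -> float:
    """Cost of processing num_tokens at the model's per-million-token rate."""
    return num_tokens / 1_000_000 * PRICE_PER_MTOK[model]
```

For example, a 1,500-token prompt at $3.00 per million tokens costs $0.0045 — which is why the same text can be several times cheaper or pricier depending on how many tokens it splits into.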