It is natural to assume a language model reads your prompt the way you do: as words. It does not. Before a model sees anything, your text is chopped into tokens, and that single fact explains a surprising number of the model's quirks: why usage is billed by the token, why the model miscounts letters, and why some languages cost more than others.

What a token is

A token is a chunk of text — sometimes a whole word, often a piece of one, sometimes just a few characters or a space. Common words like "the" are usually a single token. Longer or rarer words get split: "unbelievable" might become "un," "believ," and "able." Models use this subword scheme because it strikes a balance — a manageable vocabulary that can still represent any word, including ones never seen in training, by assembling them from pieces. A rough rule of thumb in English is that a token is about three-quarters of a word.
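
To make the splitting concrete, here is a minimal sketch of a subword tokenizer. It assumes a tiny hand-written vocabulary (TOY_VOCAB, invented for this example) and uses greedy longest-match, which is a simplification: production tokenizers learn their vocabulary and splitting rules from large amounts of text, and their actual splits may differ.

```python
# Toy vocabulary, invented for illustration. Real vocabularies are learned
# from data and typically hold tens of thousands of entries.
TOY_VOCAB = {"the", "un", "believ", "able", " "}


def tokenize(text, vocab=TOY_VOCAB):
    """Greedy longest-match split; unmatched characters become one-character tokens."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until something matches.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # Nothing in the vocabulary matched: fall back to a single character.
            tokens.append(text[i])
            i += 1
    return tokens


print(tokenize("the"))           # ['the']
print(tokenize("unbelievable"))  # ['un', 'believ', 'able']
print(tokenize("unable"))        # ['un', 'able']
```

The last call shows the payoff of the subword scheme: "unable" is not in the toy vocabulary as a whole word, yet it can still be represented by assembling pieces that are.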

Why the model "thinks" in tokens

Everything the model does happens in token units. It converts each token into a vector, processes the sequence, and predicts the next token — over and over — to generate text. It never actually manipulates letters or words as such; it manipulates these chunks. That is the root of a famous failure: ask a model how many times the letter "r" appears in a word and it often gets it wrong, because the word arrived as a couple of tokens, not a string of individual letters it can count. It is not stupid; it literally cannot see the spelling the way you can.
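
The gap between what you see and what the model receives can be sketched in a few lines. The word "strawberry", its two-piece split, and the ID numbers below are all assumptions made up for illustration; a real tokenizer may split the word differently and will use different IDs.

```python
# Hypothetical token-to-ID table; the split and the numbers are invented.
TOY_IDS = {"straw": 4821, "berry": 9307}


def model_view(tokens, ids=TOY_IDS):
    """Return the integer IDs, which stand in for what the model receives."""
    return [ids[t] for t in tokens]


word = "strawberry"
tokens = ["straw", "berry"]  # assumed split, for illustration only

# With access to the characters, counting a letter is trivial.
print(word.count("r"))     # 3

# The model's input is a short list of integers with no letters in it.
print(model_view(tokens))  # [4821, 9307]
```

Nothing in the list of IDs says how many times "r" occurs. Any spelling knowledge the model has must be learned indirectly during training rather than read off the input.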

Why tokens are the unit of money and memory

Tokens are also the currency. API pricing is per token, context windows are measured in tokens, and rate limits are often counted in tokens. This is not an arbitrary billing choice: tokens are the actual unit of computation, so they are the honest measure of work. It also means wording matters. A verbose prompt and a tight one that say the same thing can cost noticeably different amounts, because you pay per chunk, not per idea.
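
A back-of-the-envelope estimate shows how wording changes the bill. This sketch assumes the rough English ratio of about three-quarters of a word per token and a placeholder price of 1.0 per million tokens; neither is a real provider's figure, and an exact count requires running the provider's own tokenizer.

```python
import math

WORDS_PER_TOKEN = 0.75  # rough English rule of thumb, not an exact ratio


def estimate_tokens(text):
    """Estimate the token count from the whitespace-separated word count."""
    return math.ceil(len(text.split()) / WORDS_PER_TOKEN)


def estimate_cost(text, price_per_million_tokens):
    """Estimated input cost; the price is a placeholder supplied by the caller."""
    return estimate_tokens(text) * price_per_million_tokens / 1_000_000


verbose = (
    "I would be very grateful if you could please provide me "
    "with a summary of the following text"
)
tight = "Summarize the following text"

print(estimate_tokens(verbose))  # 24
print(estimate_tokens(tight))    # 6

# Placeholder price; substitute the real rate for the model you use.
for prompt in (verbose, tight):
    print(estimate_cost(prompt, price_per_million_tokens=1.0))
```

Both prompts ask for the same thing, yet the estimate for the verbose one is four times larger. Per request the difference is tiny; across millions of requests it adds up.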

The language tax

Tokenizer vocabularies are typically learned from text that is mostly English, so English packs efficiently, with few tokens per word. Many other languages, and especially non-Latin scripts, fragment into far more tokens for the same meaning. The practical consequence is real and a little unfair: the same sentence can cost several times more in one language than another, take up more of the context window, and run slower. As models go global, this hidden tax on non-English text is an active area of improvement.
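
One contributing factor can be shown without any tokenizer at all. Some tokenizers start from UTF-8 bytes and merge frequent byte sequences into tokens, and non-Latin scripts need more bytes per character to begin with. The sketch below measures only that byte overhead, as a rough proxy; it does not compute real token counts, and the sample greetings are chosen purely for illustration.

```python
def utf8_bytes_per_char(text):
    """Average number of UTF-8 bytes per character in the text."""
    return len(text.encode("utf-8")) / len(text)


# Sample greetings, chosen only to show different scripts.
samples = {
    "English": "hello",
    "Russian": "привет",
    "Hindi": "नमस्ते",
    "Chinese": "你好",
}

for language, word in samples.items():
    n_chars = len(word)
    n_bytes = len(word.encode("utf-8"))
    print(language, n_chars, n_bytes, utf8_bytes_per_char(word))
```

Latin letters take one byte each, Cyrillic two, and Devanagari and Chinese characters three. A tokenizer with few learned merges for a script has to spend more tokens covering those bytes, which is one way the same meaning ends up costing more.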

Why it matters

Once you know the model reads tokens, its odd behavior stops being mysterious. Counting characters, exact spelling, and precise formatting are hard for it because those operate below the token level. Cost, speed, and context limits all track token counts, not word counts. Keep prompts concise, expect weakness on letter-level tasks, and remember that the model is not reading your words — it is reading the pieces your words were broken into.

Analysis by GenZTech.