What are tokens and why should we care?
November 22, 2025
Part of the AI Coding Basics series
If you’ve used AI coding tools like Claude or ChatGPT, you’ve probably noticed they measure usage in “tokens” instead of words or characters. At first, this seems like an implementation detail you can ignore. But understanding tokens is actually the key to using AI tools effectively and keeping costs under control.
What are tokens anyway?
Tokens are the units AI models use to read and write text. Think of them as chunks of text that the model processes. A token might be a whole word like “cat”, part of a word like “ing” in “running”, or even a single character like a comma.
Here’s a concrete example. If we split “The cat sat on the mat” by words (the naive approach), we’d get:
┌─────┬─────┬─────┬────┬─────┬─────┐
│ The │ cat │ sat │ on │ the │ mat │
└─────┴─────┴─────┴────┴─────┴─────┘
6 unique words
But with tokenization, the same sentence becomes:
┌─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┐
│ The │ c │ at │ s │ at │ on │ the │ m │ at │
└─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┘
7 unique tokens: [The, c, at, s, on, the, m]
That’s 9 token instances, but only 7 unique tokens. Notice how “cat”, “sat”, and “mat” each get split into two tokens (“c” + “at”, “s” + “at”, “m” + “at”). The “at” token appears three times, and that reuse is exactly why tokenization is efficient: the model only needs to learn one representation for “at” and can reuse it across different words, instead of treating every word as completely unique.
For English text, a rough rule of thumb is that one token equals about 3-4 characters, or roughly 0.75 words. So “Hello world” is about 2 tokens, while a typical paragraph of 100 words is around 130-140 tokens.
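If you want a quick sanity check without installing anything, that rule of thumb translates directly into a tiny estimator. This is a rough sketch, not a real tokenizer, and the 4-characters-per-token ratio is only an approximation for English text:

```python
def rough_token_estimate(text: str) -> int:
    """Back-of-the-envelope estimate: roughly 4 characters per token for English."""
    return max(1, len(text) // 4)

print(rough_token_estimate("Hello world"))             # ~2
print(rough_token_estimate("The cat sat on the mat"))  # ~5
```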
The exact way text gets split into tokens depends on the model’s tokenizer, which is like a dictionary that maps text chunks to numbers the AI can process. Different models use different tokenizers, but the ratios are usually similar.
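To see real token IDs, you can play with an open-source tokenizer. The sketch below uses tiktoken, OpenAI’s tokenizer library; Claude uses a different tokenizer, so the exact IDs and splits will differ, but the encode/decode round trip works the same way:

```python
import tiktoken  # pip install tiktoken

# An OpenAI tokenizer, used here purely for illustration
enc = tiktoken.get_encoding("cl100k_base")

ids = enc.encode("The cat sat on the mat")
print(ids)                  # a list of integer token IDs
print(len(ids), "tokens")   # how many tokens the sentence costs
print(enc.decode(ids))      # decoding turns the IDs back into the original text

# Look at the individual text chunk each ID maps to
print([enc.decode([i]) for i in ids])
```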
Why tokens instead of words?
AI models work with numbers, not text. The tokenizer converts your text into a sequence of token IDs (numbers), the model processes those numbers and produces new tokens, and the tokenizer converts that output back into text. This approach lets models handle any language, code, special characters, and even emojis consistently.
Using tokens instead of words also means models can handle technical content better. In code, something like getUserById is split into meaningful chunks like get, User, By, Id rather than treated as one giant unknown word.
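You can check this yourself with the same tokenizer as above. The exact split of getUserById depends on the tokenizer, so treat the output in the comment as illustrative rather than guaranteed:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
pieces = [enc.decode([t]) for t in enc.encode("getUserById")]
print(pieces)  # subword chunks, e.g. ['get', 'User', 'By', 'Id'] (exact split varies by tokenizer)
```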
Why should developers care?
First, cost. Most AI services charge by the token. If you’re sending 50,000 tokens per request when you could send 5,000, you’re paying 10x more than necessary. Understanding tokens helps you write more efficient prompts and avoid dumping entire files when a focused excerpt would work better.
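A quick way to build that cost intuition is to multiply token counts by your provider’s per-token rates. The sketch below uses made-up example rates in dollars per million tokens; check your provider’s current pricing page before relying on the numbers:

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_rate_per_million: float,
                  output_rate_per_million: float) -> float:
    """Rough request cost in dollars, given per-million-token rates."""
    return (input_tokens / 1_000_000) * input_rate_per_million \
         + (output_tokens / 1_000_000) * output_rate_per_million

# Hypothetical rates for illustration only
bloated = estimate_cost(50_000, 1_000, input_rate_per_million=3.0, output_rate_per_million=15.0)
focused = estimate_cost(5_000, 1_000, input_rate_per_million=3.0, output_rate_per_million=15.0)
print(f"bloated prompt: ${bloated:.3f}, focused prompt: ${focused:.3f}")
```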
Second, context limits. Every AI model has a maximum context window measured in tokens. Claude Sonnet 4.5, for example, has a 200,000 token context window covering input and output combined. If you exceed that limit, the model can’t process your request. Knowing how tokens work helps you stay within these limits and structure your work accordingly.
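A simple pre-flight check helps here. This sketch assumes a 200,000 token window and reserves some room for the model’s reply; the count comes from tiktoken again, so for other models’ tokenizers it is only an approximation:

```python
import tiktoken

CONTEXT_LIMIT = 200_000      # example window size; check your model's documentation
RESERVED_FOR_OUTPUT = 8_000  # leave head-room for the model's reply

enc = tiktoken.get_encoding("cl100k_base")

def fits_in_context(prompt: str) -> bool:
    """Rough check that a prompt leaves enough room for the response."""
    return len(enc.encode(prompt)) <= CONTEXT_LIMIT - RESERVED_FOR_OUTPUT

print(fits_in_context("The cat sat on the mat"))  # True for anything this small
```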
Third, performance. More tokens means more processing time. A request with 100,000 tokens will take longer than one with 10,000 tokens. Being mindful of token usage leads to faster responses and a better development experience.
The practical takeaway
You don’t need to count every token, but developing an intuition for token costs helps you use AI tools more effectively. When you’re about to paste 5 files into a prompt, ask yourself if the AI really needs all of them. When you’re writing a long explanation, consider if a shorter version would work just as well.
Think of tokens like bandwidth. You have plenty of it, but that doesn’t mean you should waste it. Understanding this one concept will make you better at working with every AI coding tool you use.