
Why most people get these 3 AI terms wrong (and it's making their work worse)
Discover why AI hallucination isn't always bad, what your context window really means, and why tokens control everything. Stop making these common mistakes.
You've probably heard someone complain that the AI is "hallucinating again" and rolled your eyes. But here's the thing: an AI hallucination is when a large language model (LLM) perceives patterns or objects that are nonexistent, producing plausible but false statements as output.
Most people think this is always terrible. They're wrong.
The same mechanism that produces "wrong" answers is exactly what gives you creative writing, breakthrough ideas, and novel connections. Some researchers take an anthropomorphic perspective and posit that hallucinations arise from a tension between novelty and usefulness. For instance, Amabile and Pratt define human creativity as the production of novel and useful ideas. By extension, a focus on novelty in machine creativity can lead to the production of original but inaccurate responses—that is, falsehoods—whereas a focus on usefulness may result in memorized content lacking originality.
Understanding this changes everything about how you use AI tools. In this guide, you'll learn the three most misunderstood AI terms that are secretly controlling your results—and how to use them to your advantage.
What is AI hallucination really?
The word "hallucination" makes it sound broken. It's not. In creative and innovative fields, this behavior may prove beneficial.
Think about it: when ChatGPT writes a story, comes up with marketing angles, or brainstorms solutions to your problem, it's using the exact same process that sometimes produces factual errors. Research supports this: in one study, using ChatGPT helped people generate more creative ideas for various everyday and innovation-related problems than using no technology or a conventional web search (Google).
The real question isn't whether hallucination is good or bad. The question is: does your task need creativity or accuracy?
How do you control creativity versus accuracy?
Here's where it gets practical. You have more control than you think.
For maximum creativity: Turn off web search in ChatGPT or Claude. When the AI can only work with its training data, it makes more unexpected connections. Search in ChatGPT, while powerful in theory, introduces major limitations that often get in the way of deeper conversations, creative ideation and reliable outputs.
For maximum accuracy: Enable web search. The AI can ground its responses in current, factual information. But understand the tradeoff: in one study, ninety-four percent of ideas from those who used ChatGPT "shared overlapping concepts," while participants who combined their own ideas with web searches produced the most "unique concepts."
This isn't a bug in the system. It's a feature you can control.
What does context window actually mean?
The context window (or "context length") of a large language model (LLM) is the amount of text, in tokens, that the model can consider or "remember" at any one time. But most people think about it wrong.
It's not like a conversation where information builds up. A context window is the portion of information an AI model can use at one time when generating a response. It acts as the model's working memory.
Think of it like a whiteboard with fixed dimensions. You can write on it, erase parts, and write more. But once it's full, adding new information means something else disappears. When a prompt, conversation, document or code base exceeds an artificial intelligence model's context window, it must be truncated or summarized for the model to proceed.
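The whiteboard behavior can be sketched in a few lines of Python. Everything here (the message list, the characters-per-token estimate, the token budget) is an illustrative assumption, not any vendor's actual implementation:

```python
# A minimal sketch of context-window truncation. When the running total of
# tokens would exceed the budget, everything older is dropped, just like
# erasing the top of a full whiteboard.

def truncate_to_window(messages, budget, count_tokens):
    """Keep the most recent messages whose combined token count fits the budget."""
    kept = []
    total = 0
    for msg in reversed(messages):      # walk from newest to oldest
        cost = count_tokens(msg)
        if total + cost > budget:
            break                       # everything older falls off the board
        kept.append(msg)
        total += cost
    return list(reversed(kept))         # restore chronological order

# Crude token estimate: roughly 1 token per 4 characters (a common rule of thumb).
estimate = lambda text: max(1, len(text) // 4)

history = ["You are a helpful assistant.",
           "Summarize my project notes.",
           "Here are 5,000 words of notes..." * 50,   # one huge message
           "Now draft an email about the delay."]
window = truncate_to_window(history, budget=100, count_tokens=estimate)
```

Real systems count tokens with the model's actual tokenizer rather than a character heuristic, but the drop-the-oldest behavior is the same idea: with a 100-token budget, the huge notes message (and everything before it) falls out of the window.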
Why does your context window matter?
Context windows are the difference between an AI that feels attentive and one that seems forgetful. If the window is too small, the model might lose track of earlier messages and produce disjointed or contradictory answers.
Here's what happens in practice:
- Small context: The AI forgets your earlier instructions or context
- Large context: The AI can maintain coherence across longer conversations
- Overloaded context: Performance degrades due to "context rot"
Different models have wildly different context windows:
- GPT-3: about 4K tokens; later models grew to 8K, 32K and beyond
- Claude (from Anthropic): up to 200K tokens
- Google Gemini 1.5 Pro: 1 million tokens (up to 10 million in testing)
How do you manage context windows effectively?
When the AI starts giving weird, inconsistent answers, it's often because the context window is full. The solution is simple: start a new conversation.
Businesses and developers often use techniques like:
- Summarization: compressing earlier conversation turns while keeping key facts
- Prioritization: keeping critical information, such as account numbers or case IDs, in view
- Selective inclusion: dropping filler messages or irrelevant details to save space
You can do this too. Before starting a complex task, give the AI a clear, focused prompt rather than letting context build up randomly.
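Prioritization and selective inclusion can be sketched together in Python. This is a toy illustration under assumed message formats, a crude token estimate, and a hypothetical filler test, not a production implementation:

```python
# Sketch: always include pinned critical facts (prioritization), skip filler
# messages (selective inclusion), then fill the remaining token budget with
# the most recent messages.

def build_context(pinned_facts, history, budget, count_tokens, is_filler):
    """Return a context list: pinned facts first, then recent useful messages."""
    context = list(pinned_facts)                      # prioritization
    total = sum(count_tokens(m) for m in context)
    recent = []
    for msg in reversed(history):                     # newest first
        if is_filler(msg):                            # selective inclusion
            continue
        cost = count_tokens(msg)
        if total + cost > budget:
            break
        recent.append(msg)
        total += cost
    return context + list(reversed(recent))

estimate = lambda t: max(1, len(t) // 4)              # ~4 chars per token
filler = lambda t: t.strip().lower() in {"ok", "thanks", "got it"}

ctx = build_context(
    pinned_facts=["Case ID: 48211", "Customer: Acme Corp"],
    history=["ok", "The refund failed with error E42.", "thanks",
             "Please retry the refund."],
    budget=60,
    count_tokens=estimate,
    is_filler=filler,
)
```

The case ID and customer name stay in view no matter how long the conversation gets, while "ok" and "thanks" never consume budget.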
What are tokens and why do they control everything?
The smallest piece of information we use when we write is a single character, like a letter or number. Similarly, a token is the smallest piece of information an AI model uses.
But tokens aren't exactly words. A token can be a whole word, part of a word, or a piece of punctuation or another symbol. Statistically, a token works out to about three-quarters of a word.
This matters because tokens are fundamental to how large language models (LLMs) work: they're the standard unit of measure in LLM Land, and the limits of context windows are usually stated in tokens.
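The three-quarters rule gives you a quick way to estimate token counts without running a real tokenizer. This is only a rough approximation; real models use learned subword tokenizers (such as BPE), and their actual token boundaries differ:

```python
# Rough illustration of the "a token is about 3/4 of a word" rule of thumb.
# This estimates the count only; it does not reproduce real token boundaries.

def estimate_tokens(text: str) -> int:
    """Estimate token count as words / 0.75 (about 4 tokens per 3 words)."""
    words = len(text.split())
    return round(words / 0.75)

sentence = "Tokens are the standard unit of measure in LLM Land."
print(estimate_tokens(sentence))   # 10 words, so about 13 tokens
```

For precise counts, use the model's own tokenizer (for example, OpenAI's online Tokenizer tool mentioned below).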
Why do free plans have usage limits?
Every interaction with AI consumes tokens. When you send a prompt, that's tokens. When the AI responds, that's more tokens. The faster tokens can be processed, the faster models can learn and respond, so providers aim for the fastest processing time and lowest cost per token to optimize AI infrastructure and maximize revenue.
This is why free ChatGPT has usage limits. Each conversation literally costs money in computational resources.
How do you use fewer tokens?
Better prompting uses fewer tokens, letting you do more before hitting limits. Instead of:
"Can you please help me write a professional email to my client about the project delay and make it sound polite and apologetic while also explaining the technical reasons behind the delay?"
Try:
"Write a professional apology email: project delayed due to API integration issues."
The second version gets the same result with fewer tokens.
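You can see the savings with the same rough words-to-tokens estimate (an approximation; an actual tokenizer will give slightly different numbers):

```python
# Rough token estimate: about 4 tokens for every 3 words (approximation only;
# real tokenizers split on learned subword boundaries, not whitespace).

def estimate_tokens(text: str) -> int:
    return round(len(text.split()) / 0.75)

verbose = ("Can you please help me write a professional email to my client "
           "about the project delay and make it sound polite and apologetic "
           "while also explaining the technical reasons behind the delay?")
concise = ("Write a professional apology email: "
           "project delayed due to API integration issues.")

print(estimate_tokens(verbose), estimate_tokens(concise))
```

The concise prompt comes in at well under half the estimated tokens of the verbose one, which adds up quickly across a long session on a metered plan.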
What tools help you understand these concepts?
To see tokenization in action, check out OpenAI's Tokenizer. It shows exactly how your text gets broken down into tokens.
For understanding context windows, Anthropic's Claude documentation explains how to optimize your prompts for better performance.
Hugging Face's Tokenizer Playground lets you experiment with different models' tokenization approaches.
How do you put this knowledge to work?
Understanding these three concepts changes how you approach every AI interaction:
For creative tasks: Turn off web search, use focused prompts, and don't worry about "hallucination"—embrace it as the source of novel ideas.
For factual tasks: Enable web search, break complex queries into smaller parts, and start new conversations when context gets muddled.
For efficiency: Write concise prompts to save tokens, monitor your context window, and structure conversations to maintain coherence.
The difference between mediocre AI results and transformative ones often comes down to understanding these fundamentals. "If I just sit back and let ChatGPT do the work, I'm not taking the full advantage of what this tool has to offer. I can do better than that."
Most people use AI like a magic black box. Now you know what's really happening under the hood—and how to use that knowledge to get dramatically better results.