LLM API Cost Calculator
Last updated: April 2026 — Prices change frequently. Verify at provider pricing pages.
Assumes 1,000 input + 500 output tokens per request and 100 requests per day (30-day month).
| Model | Input / 1M tokens | Output / 1M tokens | Per Request | Daily | Monthly |
|---|---|---|---|---|---|
| Gemini 1.5 Flash | $0.075 | $0.300 | $0.000225 | $0.0225 | $0.68 |
| GPT-4o mini | $0.150 | $0.600 | $0.000450 | $0.0450 | $1.35 |
| Llama 3.3 70B (Groq) | $0.590 | $0.790 | $0.000985 | $0.0985 | $2.96 |
| Claude 3.5 Haiku | $0.800 | $4.000 | $0.002800 | $0.2800 | $8.40 |
| Gemini 1.5 Pro | $1.250 | $5.000 | $0.003750 | $0.3750 | $11.25 |
| Mistral Large | $2.000 | $6.000 | $0.005000 | $0.5000 | $15.00 |
| GPT-4o | $2.500 | $10.000 | $0.007500 | $0.7500 | $22.50 |
| o1-mini | $3.000 | $12.000 | $0.009000 | $0.9000 | $27.00 |
| Claude 3.5 Sonnet | $3.000 | $15.000 | $0.010500 | $1.0500 | $31.50 |
| GPT-4 Turbo | $10.000 | $30.000 | $0.025000 | $2.5000 | $75.00 |
| o1 | $15.000 | $60.000 | $0.045000 | $4.5000 | $135.00 |
| Claude 3 Opus | $15.000 | $75.000 | $0.052500 | $5.2500 | $157.50 |
How LLM API Pricing Works
Large language model APIs are priced per token — the fundamental unit of text that models process. Every provider splits billing into two categories: input tokens (what you send) and output tokens (what the model generates). Prices are quoted in dollars per million tokens, so a typical request costs fractions of a cent even on expensive models.
As of April 2026, input pricing spans more than two orders of magnitude — from $0.075 per million tokens (Gemini 1.5 Flash) to $15.00 per million (Claude 3 Opus, OpenAI o1), a 200x range. The spread reflects real differences in capability, context window size, and reasoning depth. Choosing the right model for your workload is the single most impactful cost optimization.
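The billing model above reduces to a one-line formula. A minimal Python sketch (the function name is illustrative; prices come from the table):

```python
def request_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """USD cost of one request, given prices quoted per million tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# GPT-4o at $2.50 input / $10.00 output, for 1,000 input + 500 output tokens:
cost = request_cost(1_000, 500, 2.50, 10.00)
print(f"${cost:.6f}")  # $0.007500 — matches the Per Request column above
```

With 1,000 input and 500 output tokens, this function reproduces every Per Request figure in the table.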
Input Tokens vs. Output Tokens
The input/output distinction is the most important concept in LLM cost modeling:
| Token Type | What it includes | Relative cost |
|---|---|---|
| Input | System prompt, conversation history, user message, retrieved documents (RAG) | 1x (baseline) |
| Output | Model's response text, tool call arguments, structured output JSON | 3–5x input price |
For most chat applications, outputs are 20–40% of total tokens but 60–80% of total cost because of this price differential. Applications with long outputs (code generation, document drafting) are dramatically more expensive than classification or extraction tasks where outputs are short.
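A quick sketch of that skew, assuming a representative chat turn of 1,500 input and 500 output tokens at Claude 3.5 Sonnet's listed prices:

```python
# Token counts here are illustrative; prices are Claude 3.5 Sonnet's
# $3.00 input / $15.00 output per million tokens from the table above.
input_tok, output_tok = 1_500, 500
in_cost = input_tok * 3.00 / 1e6      # $0.0045
out_cost = output_tok * 15.00 / 1e6   # $0.0075
token_share = output_tok / (input_tok + output_tok)  # 0.25  — 25% of tokens
cost_share = out_cost / (in_cost + out_cost)         # 0.625 — 62.5% of cost
print(f"outputs are {token_share:.0%} of tokens but {cost_share:.1%} of cost")
```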
Context Window and Its Cost Implications
Every LLM has a context window — the maximum number of tokens it can process in a single request (input + output combined). Larger context windows cost more per request when filled: sending a 100,000-token document to Claude 3 Opus costs $1.50 in input tokens alone (100,000 × $15 / 1,000,000). Context management is therefore a core engineering concern in production applications.
Strategies to reduce context costs: summarize conversation history instead of replaying it verbatim, use RAG (Retrieval-Augmented Generation) to send only the relevant document chunks, and leverage prompt caching for repeated prefixes. A well-engineered RAG pipeline can reduce per-request token counts by 70–90% compared to naive full-document approaches.
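To make the RAG saving concrete, here is a sketch comparing a naive 100,000-token document upload against an assumed 8,000-token retrieved-chunk budget, at Claude 3 Opus input pricing:

```python
OPUS_INPUT_PER_M = 15.00  # USD per million input tokens (from the table)

def input_cost(tokens):
    return tokens * OPUS_INPUT_PER_M / 1_000_000

full_doc = input_cost(100_000)  # naive: replay the whole document each request
rag = input_cost(8_000)         # RAG: only the relevant chunks (assumed budget)
print(f"full document: ${full_doc:.2f}  RAG: ${rag:.2f}  "
      f"saving: {1 - rag / full_doc:.0%}")
# full document: $1.50  RAG: $0.12  saving: 92%
```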
Prompt Caching — The Hidden Cost Saver
Prompt caching is a mechanism where the API stores a processed representation of a repeated prompt prefix. If you always start requests with the same 10,000-token system prompt, you can cache it and pay only a fraction (typically 10% of normal input price) on subsequent requests that hit the cache.
Anthropic charges $3.75 per million tokens to write to cache and $0.30 per million for cache reads (vs $3.00 for standard input). OpenAI offers automatic prompt caching with similar economics. For applications with large system prompts — customer support bots with product documentation, coding assistants with codebase context — caching can cut input costs by 80–90%.
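The cache economics can be sketched with the Anthropic rates quoted above, assuming a simple pattern of one cache write followed by reads on every later request (real hit rates and cache lifetimes vary):

```python
PREFIX_TOKENS = 10_000                     # repeated system-prompt prefix
STANDARD, WRITE, READ = 3.00, 3.75, 0.30   # USD per million tokens (Anthropic)

def uncached(n_requests):
    return n_requests * PREFIX_TOKENS * STANDARD / 1e6

def cached(n_requests):
    # One cache write on the first request, cache reads on the rest.
    return (PREFIX_TOKENS * WRITE
            + (n_requests - 1) * PREFIX_TOKENS * READ) / 1e6

for n in (1, 2, 100):
    print(f"{n:>3} requests  uncached ${uncached(n):.4f}  cached ${cached(n):.4f}")
# Caching is a slight loss on a single request ($0.0375 vs $0.0300) but wins
# from the second request on; at 100 requests it is $0.3345 vs $3.0000 (~89% off).
```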
Cost Optimization Strategies
The most effective cost levers in LLM applications, roughly in order of impact:
| Strategy | Potential Saving | Complexity |
|---|---|---|
| Right-size the model (use Flash/mini for simple tasks) | 50–95% | Low |
| Prompt caching for repeated prefixes | 40–80% | Medium |
| RAG instead of full-document context | 50–90% | High |
| Shorten system prompts | 10–30% | Low |
| Batch API (async, lower priority) | 50% | Low |
| Response streaming with early termination | 5–20% | Medium |
The most common mistake is defaulting to a frontier model for all tasks. A simple classification or intent-detection task that costs $0.002 per request on GPT-4o costs about $0.00012 on GPT-4o mini — roughly 17x cheaper, with nearly identical accuracy for that task class. Always evaluate which model tier a given task actually requires.
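The arithmetic behind right-sizing, using assumed token counts for a short classification request (600 input, 20 output) and the table's GPT-4o and GPT-4o mini prices:

```python
def cost(inp_tokens, out_tokens, price_in, price_out):
    """Per-request USD cost from per-million-token prices."""
    return (inp_tokens * price_in + out_tokens * price_out) / 1e6

gpt4o = cost(600, 20, 2.50, 10.00)   # ≈ $0.001700 per request
mini  = cost(600, 20, 0.15, 0.60)    # ≈ $0.000102 per request
print(f"GPT-4o ${gpt4o:.6f}  mini ${mini:.6f}  ratio {gpt4o / mini:.1f}x")
```

Because mini's input and output prices are both about 16.7x lower, the ratio holds for any input/output mix.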
Model Tiers and When to Use Each
Providers now offer distinct capability tiers. Matching the tier to the task is the fundamental cost optimization:
| Tier | Examples | Best for |
|---|---|---|
| Economy | GPT-4o mini, Gemini Flash, Claude Haiku, Llama 70B | Classification, extraction, summarization, simple Q&A, high-volume pipelines |
| Mid-range | GPT-4o, Claude Sonnet, Gemini Pro | Chat, code assistance, analysis, most production workloads |
| Frontier | o1, Claude Opus, GPT-4 Turbo | Complex reasoning, long-horizon planning, specialized research tasks |
Frequently Asked Questions
What is the difference between input tokens and output tokens?
Input tokens are the tokens in your prompt — everything you send to the model including system prompts, conversation history, and the current user message. Output tokens are the tokens in the model's response. Output tokens are consistently more expensive than input tokens because generating text is computationally more intensive than reading it. For example, GPT-4o charges $2.50 per million input tokens but $10.00 per million output tokens — a 4x difference.
Why do output tokens cost more than input tokens?
Generating tokens requires the model to run sequentially — each token is produced one at a time, with the model attending to all previous context at every step. Input processing can be done in parallel across the sequence. This autoregressive generation process is computationally intensive, which is why providers charge 3–5x more for output tokens than input tokens.
What is prompt caching and how does it reduce costs?
Prompt caching allows providers to store a computed representation of repeated prompt prefixes (like a long system prompt or document) so they don't need to reprocess them on every request. Anthropic charges $0.30 per million tokens for cache reads (vs $3.00 for standard input), a 90% reduction. OpenAI offers similar discounts. For applications with large, repeated system prompts, caching can reduce costs by 60–80%.
How do I estimate my token usage before building?
A rough rule of thumb: 1 token ≈ 0.75 English words, or about 4 characters. A typical paragraph (75 words) is about 100 tokens. A 2,000-word article is roughly 2,700 tokens. For structured outputs like JSON, token counts run a bit higher. Most providers offer tokenizer tools — OpenAI's tiktoken library and Anthropic's token-counting tools let you measure exact token counts for your specific prompts.
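Those rules of thumb can be wrapped in a rough estimator (a heuristic only — for exact counts use the tokenizer tools mentioned above; the function name is illustrative):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate from the ~4 chars and ~0.75 words per token rules."""
    by_chars = len(text) / 4
    by_words = len(text.split()) / 0.75
    return round((by_chars + by_words) / 2)  # average the two heuristics

paragraph = "word " * 75   # a 75-word sample paragraph
print(estimate_tokens(paragraph))  # 97 — close to the ~100-token estimate above
```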
Which LLM is most cost-effective for high-volume applications?
For high-volume, cost-sensitive applications: Gemini 1.5 Flash ($0.075 input / $0.30 output per million tokens) and GPT-4o mini ($0.15 / $0.60) are the most economical while still offering strong performance. For tasks requiring more capability, GPT-4o ($2.50 / $10.00) and Claude 3.5 Sonnet ($3.00 / $15.00) offer the best quality-to-cost ratio in the mid-range tier. Claude 3 Opus ($15.00 / $75.00) and o1 ($15.00 / $60.00) are best reserved for tasks where accuracy is paramount and cost is secondary.
Related Calculators
- YouTube Earnings Calculator — Estimate channel ad revenue by views and niche
- Meeting Cost Calculator — Calculate the real cost of meetings in salary time
- ROI Calculator — Calculate return on investment for any cost
- Break-Even Calculator — Find the point where AI costs are covered by value generated