LLM API Cost Calculator

Last updated: April 2026 — Prices change frequently. Verify at provider pricing pages.

Cost per request: $0.007500
Daily cost: $0.7500
Monthly cost (30 days): $22.50
Annual cost (365 days): $273.75
Model: GPT-4o (OpenAI)
Input tokens per request: 1,000
Output tokens per request: 500
Requests per day: 100

Input cost per request: 1,000 tokens × $2.50 / 1,000,000 = $0.002500
Output cost per request: 500 tokens × $10.00 / 1,000,000 = $0.005000
Cost per request: $0.002500 + $0.005000 = $0.007500
Daily cost: $0.007500 × 100 requests = $0.7500
Monthly cost (×30): $22.50
Annual cost (×365): $273.75
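The same arithmetic generalizes to any model in the comparison table that follows. A minimal Python sketch, assuming the prices quoted on this page (verify current rates before relying on them):

```python
# Per-million-token prices (USD) as quoted on this page; prices change
# frequently, so treat these as illustrative rather than authoritative.
PRICES = {
    "gpt-4o":           {"input": 2.50,  "output": 10.00},
    "gpt-4o-mini":      {"input": 0.15,  "output": 0.60},
    "gemini-1.5-flash": {"input": 0.075, "output": 0.30},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a single request."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

cost = request_cost("gpt-4o", 1_000, 500)
print(f"Per request:       ${cost:.6f}")             # $0.007500
print(f"Daily (100 req):   ${cost * 100:.4f}")       # $0.7500
print(f"Monthly (30 days): ${cost * 100 * 30:.2f}")  # $22.50
```
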
All Models — Cost Comparison (1,000 input / 500 output tokens, 100 req/day)
Model | Input / 1M | Output / 1M | Per Request | Daily | Monthly
Gemini 1.5 Flash | $0.075 | $0.300 | $0.000225 | $0.0225 | $0.67
GPT-4o mini | $0.150 | $0.600 | $0.000450 | $0.0450 | $1.35
Llama 3.3 70B (Groq) | $0.590 | $0.790 | $0.000985 | $0.0985 | $2.96
Claude 3.5 Haiku | $0.800 | $4.000 | $0.002800 | $0.2800 | $8.40
Gemini 1.5 Pro | $1.250 | $5.000 | $0.003750 | $0.3750 | $11.25
Mistral Large | $2.000 | $6.000 | $0.005000 | $0.5000 | $15.00
GPT-4o | $2.500 | $10.000 | $0.007500 | $0.7500 | $22.50
o1-mini | $3.000 | $12.000 | $0.009000 | $0.9000 | $27.00
Claude 3.5 Sonnet | $3.000 | $15.000 | $0.010500 | $1.0500 | $31.50
GPT-4 Turbo | $10.000 | $30.000 | $0.025000 | $2.5000 | $75.00
o1 | $15.000 | $60.000 | $0.045000 | $4.5000 | $135.00
Claude 3 Opus | $15.000 | $75.000 | $0.052500 | $5.2500 | $157.50

Prices change frequently. Always verify current pricing at OpenAI, Anthropic, Google, and other provider pages.

How LLM API Pricing Works

Large language model APIs are priced per token — the fundamental unit of text that models process. Every provider splits billing into two categories: input tokens (what you send) and output tokens (what the model generates). Prices are quoted per million tokens, making even expensive models practical for most applications.

As of April 2026, pricing spans more than two orders of magnitude, from $0.075 per million input tokens (Gemini 1.5 Flash) to $15.00 per million (Claude 3 Opus, OpenAI o1). The wide range reflects real differences in capability, context window size, and reasoning depth. Choosing the right model for your workload is the single most impactful cost optimization.

Input Tokens vs. Output Tokens

The input/output distinction is the most important concept in LLM cost modeling:

Token Type | What it includes | Relative cost
Input | System prompt, conversation history, user message, retrieved documents (RAG) | 1x (baseline)
Output | Model's response text, tool call arguments, structured output JSON | 3–5x input price

For most chat applications, outputs are 20–40% of total tokens but 60–80% of total cost because of this price differential. Applications with long outputs (code generation, document drafting) are dramatically more expensive than classification or extraction tasks where outputs are short.
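A quick check of that split, using the GPT-4o prices and the 1,000-input / 500-output scenario from the calculator above:

```python
IN_PRICE, OUT_PRICE = 2.50, 10.00   # GPT-4o, USD per million tokens
in_tok, out_tok = 1_000, 500        # calculator defaults above

in_cost = in_tok * IN_PRICE / 1e6       # $0.0025
out_cost = out_tok * OUT_PRICE / 1e6    # $0.0050

print(f"Output share of tokens: {out_tok / (in_tok + out_tok):.0%}")     # 33%
print(f"Output share of cost:   {out_cost / (in_cost + out_cost):.0%}")  # 67%
```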

Context Window and Its Cost Implications

Every LLM has a context window — the maximum number of tokens it can process in a single request (input + output combined). Larger context windows cost more per request when filled: sending a 100,000-token document to Claude 3 Opus costs $1.50 in input tokens alone (100,000 × $15 / 1,000,000). Context management is therefore a core engineering concern in production applications.

Strategies to reduce context costs: summarize conversation history instead of replaying it verbatim, use RAG (Retrieval-Augmented Generation) to send only the relevant document chunks, and leverage prompt caching for repeated prefixes. A well-engineered RAG pipeline can reduce per-request token counts by 70–90% compared to naive full-document approaches.
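As a rough sketch of the RAG saving, reusing the 100,000-token document and Claude 3 Opus input price from above (the chunk size and top-k values are hypothetical assumptions, and actual savings depend heavily on chunking and retrieval quality):

```python
OPUS_INPUT = 15.00                 # USD per million input tokens, from above

full_doc_tokens = 100_000          # the full-document example above
chunk_tokens, top_k = 800, 5       # hypothetical RAG settings
rag_tokens = chunk_tokens * top_k  # 4,000 tokens of retrieved context

full_cost = full_doc_tokens * OPUS_INPUT / 1e6   # $1.50
rag_cost = rag_tokens * OPUS_INPUT / 1e6         # $0.06
print(f"Full document: ${full_cost:.2f}, RAG: ${rag_cost:.2f}, "
      f"saving: {1 - rag_cost / full_cost:.0%}")
```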

Prompt Caching — The Hidden Cost Saver

Prompt caching is a mechanism where the API stores a processed representation of a repeated prompt prefix. If you always start requests with the same 10,000-token system prompt, you can cache it and pay only a fraction (typically 10% of normal input price) on subsequent requests that hit the cache.

Anthropic charges $3.75 per million tokens to write to cache and $0.30 per million for cache reads (vs $3.00 for standard input). OpenAI offers automatic prompt caching with similar economics. For applications with large system prompts — customer support bots with product documentation, coding assistants with codebase context — caching can cut input costs by 80–90%.
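A sketch of those cache economics, using the Anthropic rates quoted above and assuming a 10,000-token system prompt where every request after the first hits the cache before it expires:

```python
# Anthropic rates quoted above, USD per million input tokens.
STD, WRITE, READ = 3.00, 3.75, 0.30
PREFIX = 10_000  # tokens in the shared system prompt

def without_cache(n_requests):
    return n_requests * PREFIX * STD / 1e6

def with_cache(n_requests):
    # First request writes the cache; later ones read it. Assumes every
    # subsequent request hits the cache before it expires.
    return (PREFIX * WRITE + (n_requests - 1) * PREFIX * READ) / 1e6

for n in (2, 10, 1_000):
    a, b = without_cache(n), with_cache(n)
    print(f"{n:>5} requests: ${a:.4f} uncached vs ${b:.4f} cached "
          f"({1 - b / a:.0%} saved)")
```

At high request volumes the saving approaches the 90% read discount; at low volumes the cache-write premium eats into it.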

Cost Optimization Strategies

The most effective cost levers in LLM applications, roughly in order of impact:

Strategy | Potential Saving | Complexity
Right-size the model (use Flash/mini for simple tasks) | 50–95% | Low
Prompt caching for repeated prefixes | 40–80% | Medium
RAG instead of full-document context | 50–90% | High
Shorten system prompts | 10–30% | Low
Batch API (async, lower priority) | 50% | Low
Response streaming with early termination | 5–20% | Medium

The most common mistake is defaulting to a frontier model for all tasks. A simple classification or intent-detection task that costs about $0.002 per request on GPT-4o costs roughly $0.00012 on GPT-4o mini, about 17x cheaper with nearly identical accuracy for that task class (both the input and output prices differ by a factor of about 17). Always evaluate which model tier a given task actually requires.
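A quick verification of that ratio, assuming a hypothetical classification workload of 700 input and 20 output tokens per request:

```python
PRICES = {"gpt-4o": (2.50, 10.00), "gpt-4o-mini": (0.15, 0.60)}  # USD / 1M

def cost(model, in_tok, out_tok):
    p_in, p_out = PRICES[model]
    return (in_tok * p_in + out_tok * p_out) / 1e6

big = cost("gpt-4o", 700, 20)         # ≈ $0.00195
small = cost("gpt-4o-mini", 700, 20)  # ≈ $0.000117
print(f"GPT-4o: ${big:.6f}, GPT-4o mini: ${small:.6f}, ratio: {big / small:.1f}x")
```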

Model Tiers and When to Use Each

Providers now offer distinct capability tiers. Matching the tier to the task is the fundamental cost optimization:

Tier | Examples | Best for
Economy | GPT-4o mini, Gemini Flash, Claude Haiku, Llama 70B | Classification, extraction, summarization, simple Q&A, high-volume pipelines
Mid-range | GPT-4o, Claude Sonnet, Gemini Pro | Chat, code assistance, analysis, most production workloads
Frontier | o1, Claude Opus, GPT-4 Turbo | Complex reasoning, long-horizon planning, specialized research tasks
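In practice, tier matching often reduces to a small routing table. A minimal sketch; the task labels and model assignments here are illustrative assumptions, not any provider's API:

```python
# Hypothetical routing table; task labels and assignments are assumptions.
TIER_BY_TASK = {
    "classify": "gpt-4o-mini",  # economy
    "extract":  "gpt-4o-mini",  # economy
    "chat":     "gpt-4o",       # mid-range
    "code":     "gpt-4o",       # mid-range
    "research": "o1",           # frontier
}

def pick_model(task: str) -> str:
    # Unknown tasks fall back to mid-range rather than frontier.
    return TIER_BY_TASK.get(task, "gpt-4o")

print(pick_model("classify"))   # gpt-4o-mini
print(pick_model("research"))   # o1
```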

Frequently Asked Questions

What is the difference between input tokens and output tokens?

Input tokens are the tokens in your prompt — everything you send to the model including system prompts, conversation history, and the current user message. Output tokens are the tokens in the model's response. Output tokens are consistently more expensive than input tokens because generating text is computationally more intensive than reading it. For example, GPT-4o charges $2.50 per million input tokens but $10.00 per million output tokens — a 4x difference.

Why do output tokens cost more than input tokens?

Generating tokens requires the model to run sequentially — each token is produced one at a time, with the model attending to all previous context at every step. Input processing can be done in parallel across the sequence. This autoregressive generation process is computationally intensive, which is why providers charge 3–5x more for output tokens than input tokens.
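A toy sketch of that asymmetry (the model class below is a stand-in, not a real inference engine): the whole prompt is absorbed in one prefill pass, while each output token costs a full sequential decode step.

```python
import random

# Toy stand-in for a transformer; illustration only.
class ToyModel:
    eos = 0  # end-of-sequence token id

    def prefill(self, prompt_tokens):
        # All prompt tokens are processed together in one parallel pass.
        return list(prompt_tokens)  # stand-in for the KV cache

    def decode_step(self, state):
        # Each new token requires one full sequential model step that
        # attends to everything generated so far.
        token = random.randint(0, 9)
        state.append(token)
        return token, state

def generate(model, prompt_tokens, max_new_tokens):
    state = model.prefill(prompt_tokens)  # one pass, however long the prompt
    output = []
    for _ in range(max_new_tokens):       # one model step per output token
        token, state = model.decode_step(state)
        if token == model.eos:
            break
        output.append(token)
    return output

print(generate(ToyModel(), [5, 3, 7], 10))
```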

What is prompt caching and how does it reduce costs?

Prompt caching allows providers to store a computed representation of repeated prompt prefixes (like a long system prompt or document) so they don't need to reprocess them on every request. Anthropic charges $0.30 per million tokens for cache reads (vs $3.00 for standard input), a 90% reduction. OpenAI offers similar discounts. For applications with large, repeated system prompts, caching can reduce costs by 60–80%.

How do I estimate my token usage before building?

A rough rule of thumb: 1 token ≈ 0.75 English words, or about 4 characters. A typical paragraph (75 words) is about 100 tokens. A 2,000-word article is roughly 2,700 tokens (2,000 ÷ 0.75). For structured outputs like JSON, token counts run somewhat higher. Most providers offer tokenizer tools: OpenAI's tiktoken library and Anthropic's Claude token counter let you measure exact token counts for your specific prompts.
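For exact counts against OpenAI models, tiktoken (installable via pip) resolves the tokenizer for a given model name:

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.encoding_for_model("gpt-4o")  # resolves the model's encoding
text = "A typical paragraph of about seventy-five words runs to roughly one hundred tokens."
print(len(enc.encode(text)), "tokens")
```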

Which LLM is most cost-effective for high-volume applications?

For high-volume, cost-sensitive applications: Gemini 1.5 Flash ($0.075 input / $0.30 output per million tokens) and GPT-4o mini ($0.15 / $0.60) are the most economical while still offering strong performance. For tasks requiring maximum capability, GPT-4o ($2.50 / $10.00) and Claude 3.5 Sonnet ($3.00 / $15.00) offer the best quality-to-cost ratio among frontier models. Claude 3 Opus ($15.00 / $75.00) and o1 ($15.00 / $60.00) are best reserved for tasks where accuracy is paramount and cost is secondary.
