Your OpenAI invoice arrives. $2,400. You pay it. But you still can't answer the most important question: which part of your product burned the tokens?

Was it chat? Embeddings? That agent loop you shipped last sprint? The invoice doesn't say. And by the time finance flags the spike, the damage is already done.

Why monthly totals aren't enough

Most teams track LLM spend one of three ways:

  • Provider dashboard — shows total usage, not per-feature
  • Spreadsheet — manual, always outdated
  • Nothing — surprisingly common for teams under $5k/mo

None of these will tell you that, say, your agent endpoint consumed 52% of your budget while chat used only 28%. That per-feature breakdown is what shows you where optimization will actually pay off.

Step 1: Tag every API call

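Add a feature tag (and, ideally, a user ID) to every LLM request before it leaves your app. OpenAI doesn't break usage down by custom headers, so these pay off only if a proxy or gateway in front of the API records them, or if you carry the same tag into your own logs: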

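import OpenAI from "openai";
const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment
// messages and userId come from your request handler
const response = await openai.chat.completions.create(
  { model: "gpt-4o", messages },
  {
    // Second argument: per-request options; these headers travel with the call
    headers: {
      "X-Feature": "chat", // which product feature made the call
      "X-User-Id": userId, // who triggered it
    },
  },
);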
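Hand-typed tags drift: one call site writes "chat", another "Chat", and the per-feature totals split. One option is to build the tag object in a single place. The sketch below assumes a hypothetical `tagHeaders` helper and a hypothetical `Feature` union; a union type makes a misspelled feature name a compile error rather than a new row on your dashboard.

```typescript
// Hypothetical feature list: a typo like "chatt" becomes a compile error
type Feature = "chat" | "embeddings" | "agent";

// Type alias (not an interface) so it stays assignable to string-keyed header maps
type TagHeaders = { "X-Feature": Feature; "X-User-Id": string };

// Hypothetical helper: builds the per-request options object passed as the SDK's second argument
function tagHeaders(feature: Feature, userId: string): { headers: TagHeaders } {
  return { headers: { "X-Feature": feature, "X-User-Id": userId } };
}

// Usage: openai.chat.completions.create({ model, messages }, tagHeaders("chat", userId));
const opts = tagHeaders("agent", "user_123");
console.log(opts.headers["X-Feature"]); // agent
```

Reuse the same `Feature` type in your logging code so the tag on the request and the tag in the log can't disagree.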

Step 2: Log input + output tokens per call

Input and output tokens are billed at different rates, so record both, along with the feature tag:

const usage = response.usage; // typed as optional by the SDK
if (usage) {
  // log() and calculateCost() are your own helpers
  log({
    feature: "chat",
    model: "gpt-4o",
    input_tokens: usage.prompt_tokens,
    output_tokens: usage.completion_tokens,
    cost: calculateCost(usage, "gpt-4o"),
    timestamp: Date.now(),
  });
}
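Step 2 calls a `calculateCost` helper without defining it. Here is a minimal sketch, assuming you maintain a price table yourself. The per-million-token prices below are example figures for illustration only; verify them against OpenAI's current pricing page, since they change. This version also ignores cached-input and batch discounts.

```typescript
// Minimal subset of the usage object returned by chat completions
type Usage = { prompt_tokens: number; completion_tokens: number };

// Example USD prices per 1M tokens. ASSUMPTION: replace with current published prices.
const PRICES_PER_MILLION: Record<string, { input: number; output: number }> = {
  "gpt-4o": { input: 2.5, output: 10 },
  "gpt-4o-mini": { input: 0.15, output: 0.6 },
};

function calculateCost(usage: Usage, model: string): number {
  const price = PRICES_PER_MILLION[model];
  // Fail loudly: silently pricing an unknown model at $0 would hide spend
  if (!price) throw new Error(`No price configured for model ${model}`);
  // Input and output tokens are billed at different rates
  return (usage.prompt_tokens * price.input + usage.completion_tokens * price.output) / 1e6;
}

const cost = calculateCost({ prompt_tokens: 1200, completion_tokens: 300 }, "gpt-4o");
console.log(cost.toFixed(4)); // 0.0060
```

Throwing on an unknown model is a deliberate choice; if you'd rather never let cost tracking interrupt a request, log a warning and record the call with a null cost instead.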

Step 3: Aggregate by feature daily

Roll up logs into a daily view: feature → total tokens → total cost. This is the dashboard your engineering lead actually needs.
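In production this roll-up is usually a GROUP BY over your log table, but the logic is the same. The in-memory sketch below assumes log rows shaped like the ones in Step 2 and buckets them by UTC calendar day; `aggregateDaily`, `CallLog`, and `DailyTotal` are illustrative names, not part of any library.

```typescript
// One row per LLM call, with the same fields logged in Step 2
type CallLog = {
  feature: string;
  model: string;
  input_tokens: number;
  output_tokens: number;
  cost: number;
  timestamp: number; // ms since epoch, as from Date.now()
};

type DailyTotal = { day: string; feature: string; tokens: number; cost: number };

function aggregateDaily(logs: CallLog[]): DailyTotal[] {
  const totals = new Map<string, DailyTotal>();
  for (const entry of logs) {
    // Bucket by UTC calendar day, e.g. "2024-05-01"
    const day = new Date(entry.timestamp).toISOString().slice(0, 10);
    const key = `${day}|${entry.feature}`;
    const row = totals.get(key) ?? { day, feature: entry.feature, tokens: 0, cost: 0 };
    row.tokens += entry.input_tokens + entry.output_tokens;
    row.cost += entry.cost;
    totals.set(key, row);
  }
  // Oldest day first; within a day, most expensive feature first
  return Array.from(totals.values()).sort(
    (a, b) => a.day.localeCompare(b.day) || b.cost - a.cost,
  );
}

const noon = Date.UTC(2024, 4, 1, 12); // 1 May 2024, 12:00 UTC
const rows = aggregateDaily([
  { feature: "chat", model: "gpt-4o", input_tokens: 1000, output_tokens: 200, cost: 0.0045, timestamp: noon },
  { feature: "agent", model: "gpt-4o", input_tokens: 8000, output_tokens: 1000, cost: 0.03, timestamp: noon },
  { feature: "chat", model: "gpt-4o", input_tokens: 500, output_tokens: 100, cost: 0.00225, timestamp: noon },
]);
console.log(rows);
```

Here the agent row sorts first despite fewer calls, which is exactly the kind of imbalance a provider-level total hides.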

Step 4: Set spike alerts

Define thresholds per feature. If chat normally costs $40/day and suddenly hits $400, you want a Slack alert today — not on next month's invoice.
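A per-feature threshold check can be this small. The sketch below assumes hypothetical dollar limits and a hypothetical `findSpikes` function that you'd run on a schedule (hourly, say) against today's running totals from Step 3. Delivery is left out; each message could be posted to a Slack incoming webhook or any other channel.

```typescript
// Hypothetical daily limits in USD, set per feature from its normal spend
// (e.g. chat normally costs ~$40/day, so alert above $100)
const DAILY_THRESHOLDS = new Map<string, number>([
  ["chat", 100],
  ["embeddings", 20],
  ["agent", 150],
]);

type FeatureCost = { feature: string; cost: number };

// Returns one alert message per feature that is over its limit or has no limit configured
function findSpikes(today: FeatureCost[], thresholds: Map<string, number>): string[] {
  const alerts: string[] = [];
  for (const { feature, cost } of today) {
    const limit = thresholds.get(feature);
    if (limit === undefined) {
      // A new or mistagged feature is itself worth a look
      alerts.push(`${feature}: no threshold set, $${cost.toFixed(2)} so far today`);
    } else if (cost > limit) {
      alerts.push(`${feature}: $${cost.toFixed(2)} so far today, threshold $${limit.toFixed(2)}`);
    }
  }
  return alerts;
}

const alerts = findSpikes(
  [{ feature: "chat", cost: 400 }, { feature: "embeddings", cost: 5 }],
  DAILY_THRESHOLDS,
);
for (const message of alerts) console.log(message); // chat: $400.00 so far today, threshold $100.00
```

Checking running totals during the day, rather than once at midnight, is what turns a $400 day into a same-day alert instead of a line on the invoice.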

The agent loop problem

Agent chains are the silent budget killer. A loop that retries 12 times burns roughly 12× the expected tokens, and more if each pass resends a growing conversation history. Flag any agent call whose token count exceeds 3× your rolling average.
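The 3× rule can be sketched as a small in-memory monitor. The window size (last 50 calls) and the minimum of 5 samples before flagging are assumptions, as is the `AgentTokenMonitor` name; in a multi-process deployment you'd keep this state in shared storage rather than in memory.

```typescript
// Flags agent calls whose token count exceeds `multiplier` x the rolling
// average of recent non-flagged calls.
class AgentTokenMonitor {
  private history: number[] = [];
  private readonly windowSize: number;
  private readonly multiplier: number;
  private readonly minSamples: number;

  constructor(windowSize = 50, multiplier = 3, minSamples = 5) {
    this.windowSize = windowSize;
    this.multiplier = multiplier;
    this.minSamples = minSamples;
  }

  // Returns true if this call should be flagged
  record(tokens: number): boolean {
    const n = this.history.length;
    if (n >= this.minSamples) {
      const avg = this.history.reduce((sum, t) => sum + t, 0) / n;
      // Flagged calls are kept out of the baseline so a runaway loop can't raise its own bar
      if (tokens > this.multiplier * avg) return true;
    }
    this.history.push(tokens);
    if (this.history.length > this.windowSize) this.history.shift(); // drop the oldest sample
    return false;
  }
}

const monitor = new AgentTokenMonitor();
for (const t of [4000, 4200, 3900, 4100, 3800]) monitor.record(t); // build a baseline
console.log(monitor.record(4500)); // false
console.log(monitor.record(15000)); // true
```

The trade-off of excluding flagged calls: if an agent's normal usage genuinely shifts upward for good, it will keep alerting until you reset the baseline.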

Build vs. buy

You can build steps 1 to 4 in-house, or buy them: tools like TokenCurb, Helicone, and LangSmith solve this out of the box, each with different strengths.

If you just want a ballpark figure first, our free LLM Cost Calculator estimates your monthly spend in seconds.

What to do this week

  1. Pick one feature and add token logging today
  2. After a few days, reconcile your logged totals against your provider's usage dashboard
  3. Set one alert for your highest-cost endpoint
  4. Review agent loops for retry patterns

TokenCurb does all of this automatically — per-feature breakdown, spike alerts, and agent loop detection.

Join the waitlist →