Price Per Token
Cahl-Dee·17d ago

Measuring and optimizing LLM spend

Curious what's working for people here in practice. The tooling landscape is getting crowded (LiteLLM, Helicone, Langfuse, LangSmith, Bifrost, provider dashboards, custom middleware...), but actual workflows still seem pretty fragmented, or completely unhinged (read: caveman).

I'm interested across three use cases:

  1. LLMs powering apps / products

Are you tracking spend per request, per customer, per feature, or per successful outcome?
Gateway-level?
SDK wrapper?
Something custom?
Beyond just "swap to a cheaper model," is anyone doing real optimization like semantic caching, prompt compaction, adaptive context windows, confidence-based model escalation?
Has anyone gotten to actual cost-per-user-action metrics rather than just raw token totals?
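To make "cost-per-user-action" concrete, here's a minimal sketch of the kind of attribution I mean: tag each call with a (customer, feature) key and divide spend by successful outcomes instead of just summing tokens. The model names and prices below are invented placeholders, not any provider's real pricing.

```python
from collections import defaultdict

# Hypothetical (input, output) prices per 1M tokens -- real prices vary by provider.
PRICE_PER_MTOK = {"small-model": (0.15, 0.60), "big-model": (3.00, 15.00)}

class CostLedger:
    """Attribute spend to a (customer, feature) key rather than a raw token total."""
    def __init__(self):
        self.spend = defaultdict(float)    # (customer, feature) -> USD
        self.actions = defaultdict(int)    # (customer, feature) -> successful outcomes

    def record(self, customer, feature, model, in_tok, out_tok, succeeded):
        p_in, p_out = PRICE_PER_MTOK[model]
        cost = (in_tok * p_in + out_tok * p_out) / 1_000_000
        self.spend[(customer, feature)] += cost
        if succeeded:
            self.actions[(customer, feature)] += 1

    def cost_per_action(self, customer, feature):
        n = self.actions[(customer, feature)]
        return self.spend[(customer, feature)] / n if n else float("inf")

ledger = CostLedger()
ledger.record("acme", "summarize", "small-model", 10_000, 2_000, succeeded=True)
```

The interesting part isn't the arithmetic, it's deciding what counts as a "successful outcome" per feature; failed retries still land in the numerator, which is exactly what makes this number more honest than raw token totals.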

  2. LLMs inside agents (OpenClaw-style, tool-use loops, planner-executor setups)

This is where spend gets slippery. Lots of small calls that add up: planning, tool selection, heartbeat, memory updates, retries, self-correction.
Are you routing different parts of the loop to different models (cheap for heartbeat, expensive only for hard reasoning)?
Setting explicit budgets per run?
Anyone found good heuristics for stopping an agent from overthinking before the orchestration costs more than the task itself?
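For "explicit budgets per run," the simplest version I can imagine is a hard dollar cap checked before each model call, so the loop fails fast instead of quietly overthinking. A sketch (the cap and per-step costs are made up; nothing here is tied to a specific framework):

```python
class RunBudget:
    """Hard per-run spend cap: refuse the next call before it would exceed the budget."""
    def __init__(self, max_usd):
        self.max_usd = max_usd
        self.spent = 0.0

    def charge(self, usd):
        if self.spent + usd > self.max_usd:
            raise RuntimeError(
                f"run budget exceeded: {self.spent + usd:.4f} USD > {self.max_usd} USD"
            )
        self.spent += usd

budget = RunBudget(max_usd=0.05)
# Each loop iteration (planning, tool call, self-correction...) pays before calling out.
for step_cost in [0.01, 0.02, 0.01]:
    budget.charge(step_cost)
```

A softer variant would escalate to a cheaper model or force the agent to commit to an answer once, say, 80% of the budget is gone, rather than killing the run outright.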

  3. LLMs for software development

Feels under-discussed because coding workflows look cheap per interaction but can be massive in aggregate. Are you tracking spend per dev, per repo, per task type?
Aggressively filtering context before sending?
Using smaller models for first-pass edits and reserving bigger ones for architecture decisions?
Has anyone measured whether agentic coding actually reduces cost per merged change, or just increases throughput while also increasing spend?
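On that last question, the comparison I'd want is cost per merged change alongside throughput, measured for a baseline period versus an agentic period, since either number alone is misleading. A sketch of the metric (all figures below are invented):

```python
def period_stats(spend_usd, merged, weeks):
    """Outcome-level metrics for one period: dollars per merged change, merges per week."""
    return {
        "cost_per_merge": spend_usd / merged if merged else float("inf"),
        "merges_per_week": merged / weeks,
    }

# Hypothetical numbers: a 4-week window before and after adopting agentic coding.
baseline = period_stats(spend_usd=40.0, merged=40, weeks=4)
agentic = period_stats(spend_usd=300.0, merged=60, weeks=4)
```

With these made-up numbers, cost per merge jumps from $1 to $5 while throughput rises 50%, which is precisely the "more output, more spend" ambiguity the question is about; only the pair of metrics, plus some quality signal like revert rate, resolves it.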

What I'd love to hear:

  • Your stack and what you measure
  • Where spend surprised you
  • The optimization that actually moved the needle (and the one you thought would but didn't)
  • Whether you optimize for raw token cost, latency-adjusted cost, or cost per outcome
  • What's working for you?