Curious what's working for people here in practice. The tooling landscape is getting crowded (LiteLLM, Helicone, Langfuse, LangSmith, Bifrost, provider dashboards, custom middleware...), but actual workflows still seem pretty fragmented, or completely unhinged (read: caveman).
I'm interested in three use cases:
Are you tracking spend per request, per customer, per feature, or per successful outcome?
Gateway-level?
SDK wrapper?
Something custom?
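For the SDK-wrapper flavor, here's a minimal sketch of per-customer, per-feature cost attribution. Everything here is an assumption for illustration: the model names, the per-1K-token prices, and the `CostTracker` class are all hypothetical, not any real SDK's API.

```python
# Hypothetical sketch: attribute spend to (customer, feature) pairs at the
# point where you already wrap your model calls. Prices are made up.
from collections import defaultdict

PRICE_PER_1K_TOKENS = {"small-model": 0.0005, "large-model": 0.01}  # assumed rates

class CostTracker:
    def __init__(self):
        # (customer, feature) -> dollars spent
        self.spend = defaultdict(float)

    def record(self, customer, feature, model, tokens):
        cost = tokens / 1000 * PRICE_PER_1K_TOKENS[model]
        self.spend[(customer, feature)] += cost
        return cost

tracker = CostTracker()
tracker.record("acme", "summarize", "large-model", 2500)
tracker.record("acme", "summarize", "small-model", 800)
print(round(tracker.spend[("acme", "summarize")], 4))  # 0.0254
```

The same record call can roll up by request ID or user action instead; the hard part is plumbing the attribution keys through to wherever the token counts land.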
Beyond just "swap to a cheaper model," is anyone doing real optimization like semantic caching, prompt compaction, adaptive context windows, confidence-based model escalation?
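To make "semantic caching" concrete, here's a toy sketch of the idea: embed the prompt, and serve a cached response when a past prompt is similar enough. The bag-of-words `embed` is a deliberately crude stand-in for a real embedding model so the example stays runnable; the class name and the 0.8 threshold are assumptions.

```python
# Toy semantic-cache sketch. A real system would use a proper embedding
# model and a vector index; a word-count vector stands in here.
import math
from collections import Counter

def embed(text):
    # stand-in for a real embedding model
    return Counter(text.lower().replace("?", "").split())

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    def __init__(self, threshold=0.8):
        self.entries = []  # (embedding, cached response)
        self.threshold = threshold

    def get(self, prompt):
        q = embed(prompt)
        for emb, response in self.entries:
            if cosine(q, emb) >= self.threshold:
                return response  # cache hit: skip the paid model call
        return None

    def put(self, prompt, response):
        self.entries.append((embed(prompt), response))

cache = SemanticCache()
cache.put("what is the refund policy", "30 days, no questions asked")
print(cache.get("what is the refund policy please"))  # near-duplicate -> cache hit
```

The interesting production questions are the ones the toy hides: how high to set the threshold before you start serving stale or subtly wrong answers, and how to invalidate entries.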
Has anyone gotten to actual cost-per-user-action metrics rather than just raw token totals?
Agent workflows are where spend gets slippery. Lots of small calls that add up: planning, tool selection, heartbeat, memory updates, retries, self-correction.
Are you routing different parts of the loop to different models (cheap for heartbeat, expensive only for hard reasoning)?
Setting explicit budgets per run?
Has anyone found good heuristics for stopping an agent from overthinking before the orchestration costs more than the task itself?
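An explicit per-run budget is one blunt but effective answer to the overthinking question. A minimal sketch, assuming per-step cost estimates are available; the class names and dollar figures are illustrative, not from any framework.

```python
# Sketch: cap total spend per agent run and stop the loop early once the
# cap is hit, returning whatever partial progress was made.
class BudgetExceeded(Exception):
    pass

class RunBudget:
    def __init__(self, max_dollars):
        self.max_dollars = max_dollars
        self.spent = 0.0

    def charge(self, cost):
        self.spent += cost
        if self.spent > self.max_dollars:
            raise BudgetExceeded(
                f"run spent ${self.spent:.4f}, cap ${self.max_dollars}"
            )

def run_agent(step_costs, budget):
    completed = 0
    try:
        for cost in step_costs:  # estimated dollar cost of each loop step
            budget.charge(cost)
            completed += 1       # ...the actual model call would go here
    except BudgetExceeded:
        pass  # stop early rather than let orchestration outcost the task
    return completed

print(run_agent([0.01, 0.02, 0.05, 0.10], RunBudget(0.05)))  # 2
```

The same guard generalizes to step-count or wall-clock caps; the design choice is whether exceeding the budget fails the run or degrades it to a cheaper path.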
Agentic coding feels under-discussed because the workflows look cheap per interaction but can be massive in aggregate. Are you tracking spend per dev, per repo, per task type?
Aggressively filtering context before sending?
Using smaller models for first-pass edits and reserving bigger ones for architecture decisions?
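The cheap-first, escalate-when-unsure pattern can be sketched in a few lines. Both "models" below are stubs and the 0.7 confidence threshold is an assumption; real systems also have to decide where the confidence signal comes from (logprobs, a verifier pass, self-report).

```python
# Sketch of confidence-based escalation: run the small model first and only
# pay for the big one when confidence falls below a threshold. All names,
# stub behaviors, and thresholds here are hypothetical.
def cheap_model(task):
    # stub: pretend routine edits come back confident, design questions don't
    confidence = 0.9 if task["kind"] == "edit" else 0.3
    return {"answer": f"cheap:{task['kind']}", "confidence": confidence}

def expensive_model(task):
    return {"answer": f"expensive:{task['kind']}", "confidence": 0.95}

def route(task, threshold=0.7):
    result = cheap_model(task)
    if result["confidence"] >= threshold:
        return result["answer"]
    return expensive_model(task)["answer"]  # escalate only when unsure

print(route({"kind": "edit"}))          # cheap:edit
print(route({"kind": "architecture"}))  # expensive:architecture
```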
Has anyone measured whether agentic coding actually reduces cost per merged change, or just increases throughput while also increasing spend?
What I'd love to hear: