Price Per Token
arturoyo·16d ago

Measuring and optimizing LLM spend

optym.pro

Been working on exactly this problem for the past year. What we've found is that the biggest lever isn't really observability — it's automatic model routing.

Most teams are paying for a single model (usually GPT-4 class) for every call, when in practice 60–70% of those calls could be handled by a model 5–10x cheaper with equivalent output quality.

We built optym.pro around this idea — it's an OpenAI-compatible router that evaluates each request and routes it to the best model for that specific task, in real time. No code changes on your end, just swap the endpoint.
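For anyone wondering what "just swap the endpoint" means in practice: with an OpenAI-compatible API, the request body stays identical and only the base URL changes. A minimal sketch of that idea below — the router URL and the "auto" model name are assumptions for illustration, not optym.pro's documented values:

```python
import json

# Hypothetical router endpoint -- swap this in place of api.openai.com.
ROUTER_BASE_URL = "https://api.optym.pro/v1"

def build_chat_request(messages, model="auto"):
    """Build the same JSON body an OpenAI-style /chat/completions
    endpoint expects; nothing else in the calling code changes."""
    return {
        "url": f"{ROUTER_BASE_URL}/chat/completions",
        "body": json.dumps({"model": model, "messages": messages}),
    }

req = build_chat_request([{"role": "user", "content": "Summarize this doc."}])
print(req["url"])
```

With the official OpenAI Python SDK the equivalent would be passing `base_url=ROUTER_BASE_URL` when constructing the client, since the SDK exposes that parameter for exactly this kind of drop-in proxy.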

Teams spending $3K+/month on LLMs are seeing a 40–60% reduction in spend. Happy to share more details if useful.


1 comment

ellmanalex·16d ago

Looks great! It could be interesting to showcase this on our playground!
