Track token usage and cost for OpenRouter, Groq, and OpenAI — with FastAPI integration.
- ✅ Token tracking per request and per chat session
- ✅ Cost estimation (real model pricing)
- ✅ Multi-turn chat session management
- ✅ FastAPI integration with middleware
- ✅ Production logging
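Cost estimation comes down to multiplying prompt and completion token counts by per-token prices. A minimal standalone sketch of the idea (the pricing table, numbers, and `estimate_cost` helper below are illustrative placeholders, not the package's actual rates or API):

```python
# Illustrative per-1M-token prices in USD (placeholders, NOT real pricing).
PRICING = {
    "openai/gpt-4o-mini": {"input": 0.15, "output": 0.60},
}

def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Return the estimated USD cost for one request."""
    rates = PRICING[model]
    return (prompt_tokens * rates["input"]
            + completion_tokens * rates["output"]) / 1_000_000

cost = estimate_cost("openai/gpt-4o-mini", prompt_tokens=1000, completion_tokens=500)
print(f"${cost:.6f}")  # 1000*0.15/1e6 + 500*0.60/1e6 = $0.000450
```

The real library looks prices up per model ID; the arithmetic is the same either way.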
## Installation

```bash
pip install ai-token-monitor
```

## Quick Start

```python
from ai_token_monitor import TokenMonitor, ChatManager
from ai_token_monitor.utils import normalize_response, extract_reply

monitor = TokenMonitor()
chat_manager = ChatManager()

chat_id = chat_manager.create_chat("user_123")
chat_manager.add_message(chat_id, "user", "Hello!")

# After getting a response from OpenAI/OpenRouter:
data = normalize_response(raw_response)
answer = extract_reply(data)

monitor.track(data, model="openai/gpt-4o-mini", chat_manager=chat_manager, chat_id=chat_id)
print(monitor.summary())
```

## Configuration

Copy `.env.example` to `.env` and set:
```
OPENROUTER_API_KEY=your_key_here
```

Run the FastAPI example:

```bash
uvicorn examples.fastapi_app:app --reload
```

## Supported Models

### OpenRouter (free)

- deepseek/deepseek-r1:free
- deepseek/deepseek-v3:free
- meta-llama/llama-3.3-70b-instruct:free
- qwen/qwen-2.5-72b-instruct:free
- google/gemma-3-27b-it:free
- mistralai/devstral-2-2512:free
- xiaomi/mimo-v2-flash:free
- nvidia/llama-3.3-nemotron-nano-3b-v1:free
- stepfun/step-3-5-flash:free
- upstage/solar-pro-3:free

### OpenAI

- openai/gpt-5.4
- openai/gpt-5.4-pro
- openai/gpt-4o
- openai/gpt-4o-mini
- openai/o3
- openai/o4-mini

### Anthropic

- anthropic/claude-opus-4-6
- anthropic/claude-sonnet-4-6
- anthropic/claude-haiku-4-5

### Google

- google/gemini-3.1-pro-preview
- google/gemini-3.1-flash-lite
- google/gemini-3-flash

### DeepSeek

- deepseek/deepseek-v3-2
- deepseek/deepseek-r1

### Mistral

- mistralai/devstral-2-2512
- mistralai/mistral-large-2411
- mistralai/mixtral-8x7b-instruct

### Meta

- meta-llama/llama-3.3-70b-instruct
- meta-llama/llama-3.1-405b-instruct

### Qwen

- qwen/qwen3.5-plus
- qwen/qwen3-coder-next

### xAI

- x-ai/grok-4-1-fast
- x-ai/grok-3-beta

### Other

- bytedance/seed-1.6
- minimax/minimax-m2.1
- moonshot/kimi-k2.5
- z-ai/glm-5
- kwai/kat-coder-pro
- allenai/olmo-3.1-32b-think
## License

MIT
