Token-Aware Rate Limiting

Six production-ready cookbooks for rate limiting AI workloads with Valkey, from basic fixed windows to enterprise-grade hierarchical limits.

01 · Getting Started

Start Valkey, install deps, and build your first token-aware rate limiter in under 5 minutes.

Beginner · ~5 min · Python
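The simplest limiter the cookbook series starts from is a fixed window: count requests per key per time window and reject once a limit is hit. Below is a minimal in-memory sketch of that pattern; the class and parameter names are illustrative, and the production version in the cookbook would use Valkey's `INCR` and `EXPIRE` on a window-scoped key instead of a local dict.

```python
import time


class FixedWindowLimiter:
    """Fixed-window request limiter (in-memory stand-in for
    a Valkey INCR + EXPIRE pattern)."""

    def __init__(self, limit, window_s):
        self.limit = limit          # max requests per window
        self.window_s = window_s    # window length in seconds
        self.counters = {}          # (key, window_id) -> request count

    def allow(self, key, now=None):
        now = time.time() if now is None else now
        window_id = int(now // self.window_s)   # which window are we in?
        bucket = (key, window_id)
        count = self.counters.get(bucket, 0)
        if count >= self.limit:
            return False            # window exhausted: reject
        self.counters[bucket] = count + 1
        return True
```

With Valkey, `(key, window_id)` becomes a single key like `rl:{user}:{window_id}` incremented atomically, so multiple app instances share one counter.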
02 · Token-Aware Limiting

Count LLM tokens, not just requests. Dual limiting, output estimation, and post-call adjustment.

Intermediate · ~15 min · Python
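The dual-limiting idea above can be sketched as follows: enforce a request budget and a token budget together, reserve an output-token estimate before the LLM call, then reconcile against actual usage afterward. This is an illustrative in-memory sketch with made-up names (`try_reserve`, `adjust`); the cookbook's version would keep these counters in Valkey.

```python
class DualLimiter:
    """Token-aware limiter: enforces requests/window AND tokens/window.
    Reserves an output-token estimate up front, then reconciles with
    the actual usage after the LLM call."""

    def __init__(self, max_requests, max_tokens):
        self.max_requests = max_requests
        self.max_tokens = max_tokens
        self.requests = 0
        self.tokens = 0

    def try_reserve(self, prompt_tokens, est_output_tokens):
        est = prompt_tokens + est_output_tokens
        # Both budgets must have headroom, or the request is rejected.
        if self.requests + 1 > self.max_requests or self.tokens + est > self.max_tokens:
            return False
        self.requests += 1
        self.tokens += est
        return True

    def adjust(self, est_output_tokens, actual_output_tokens):
        # Post-call adjustment: refund (or charge) the estimate delta.
        self.tokens += actual_output_tokens - est_output_tokens
```

Reserving an estimate first prevents a burst of concurrent calls from collectively blowing the token budget before any of them finishes.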
03 · Agent Rate Limiting

Per-agent limits with token bucket, tool-weighted costs, concurrent agent slots, and budget tracking.

Intermediate · ~20 min · Python
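Two of the techniques this cookbook combines, the token bucket and tool-weighted costs, can be sketched together: the bucket refills continuously, and expensive tools (code execution, browsing) drain it faster than cheap ones. The tool names and weights below are illustrative assumptions, not the cookbook's actual values.

```python
class TokenBucket:
    """Per-agent token bucket; each tool call consumes a weighted cost."""

    TOOL_WEIGHTS = {"search": 1.0, "code_exec": 3.0, "browser": 5.0}  # illustrative

    def __init__(self, capacity, refill_per_s, now=0.0):
        self.capacity = capacity
        self.refill_per_s = refill_per_s
        self.level = capacity   # bucket starts full
        self.last = now

    def try_consume(self, tool, now):
        # Refill based on elapsed time, capped at capacity.
        self.level = min(self.capacity,
                         self.level + (now - self.last) * self.refill_per_s)
        self.last = now
        cost = self.TOOL_WEIGHTS.get(tool, 1.0)
        if cost > self.level:
            return False        # not enough budget for this tool right now
        self.level -= cost
        return True
```

Unlike a fixed window, the bucket allows short bursts up to `capacity` while still enforcing the long-run rate `refill_per_s`.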
04 · Hierarchical Limits

Cascading Org → Team → User → Agent → Model limits. All tiers checked in a single Valkey pipeline.

Advanced · ~25 min · Python
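The key property of the cascading check is that it is all-or-nothing: every tier must have headroom, and consumption commits to all tiers together, which is what batching the checks into one Valkey pipeline/transaction buys you. Here is a minimal in-memory sketch of that semantics (tier names and the `try_consume` API are illustrative):

```python
class HierarchicalLimiter:
    """Cascading limits: every tier (org, team, user, agent, model)
    must have headroom; consumption is all-or-nothing, mirroring a
    single Valkey pipeline/transaction."""

    def __init__(self, limits):
        self.limits = limits                          # tier -> max tokens
        self.used = {tier: 0 for tier in limits}      # tier -> consumed

    def try_consume(self, tokens):
        # First pass: check every tier with no partial consumption.
        if any(self.used[t] + tokens > self.limits[t] for t in self.limits):
            return False
        # Second pass: commit to all tiers together.
        for t in self.limits:
            self.used[t] += tokens
        return True
```

A request is rejected by its tightest ancestor: a user under their own limit can still be blocked because their team or org budget is exhausted.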
05 · Cost-Based Limiting

Dollar-amount budgets per window. Model-aware pricing, automatic downgrades, and spend tracking.

Advanced · ~20 min · Python
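Model-aware pricing with automatic downgrades can be sketched as a routing decision: price the request against the remaining dollar budget, and fall back to a cheaper model when the requested one is unaffordable. The model names, per-1K-token prices, and downgrade chain below are invented for illustration; real pricing varies by provider.

```python
# Illustrative per-1K-token prices and downgrade chain (not real pricing).
PRICES = {"gpt-large": 0.03, "gpt-small": 0.002}
DOWNGRADE = {"gpt-large": "gpt-small"}


class CostLimiter:
    """Dollar budget per window; downgrades to a cheaper model when
    the requested model would exceed the remaining budget."""

    def __init__(self, budget_usd):
        self.remaining = budget_usd

    def route(self, model, tokens):
        while model is not None:
            cost = PRICES[model] * tokens / 1000
            if cost <= self.remaining:
                self.remaining -= cost      # track spend against the budget
                return model
            model = DOWNGRADE.get(model)    # try the next cheaper model
        return None                         # nothing affordable: reject
```

Returning the chosen model (rather than a bare allow/deny) lets the caller transparently serve a degraded but nonzero response as the budget runs down.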
06 · Production Patterns

Retry-After headers, circuit breakers, graceful degradation, request queuing, and observability.

Advanced · ~30 min · Python
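As a small taste of the production patterns above: a 429 response should tell clients when to come back. For a fixed-window limiter the `Retry-After` value is just the time until the current window rolls over. A minimal sketch (the helper name is an assumption, not the cookbook's API):

```python
import math


def retry_after_s(window_s, now):
    """Seconds until the current fixed window resets; suitable for a
    Retry-After header on an HTTP 429 response."""
    remaining = window_s - (now % window_s)   # time left in this window
    return math.ceil(remaining)               # round up so clients never retry early
```

Rounding up matters: a client told to wait 0 seconds will immediately retry into the same exhausted window.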

🎮 Try It Live

Interactive dashboard with all 5 algorithms. Send requests, test bursts, simulate agents.
