6 production-ready cookbooks for rate limiting AI workloads with Valkey. From basic fixed windows to enterprise hierarchical limits.
Start Valkey, install deps, and build your first token-aware rate limiter in under 5 minutes.
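A minimal sketch of the fixed-window pattern the quickstart builds. Pure Python with an injectable clock standing in for Valkey's `INCR` + `EXPIRE`; class and parameter names are illustrative, not the cookbook's API:

```python
import time


class FixedWindowLimiter:
    """Allow at most `limit` requests per `window_s`-second window.

    In the Valkey version, each (key, window) bucket is an INCR on a
    key like `rl:{key}:{window}` with an EXPIRE of `window_s`.
    """

    def __init__(self, limit: int, window_s: int, clock=time.time):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self.counts = {}  # (key, window index) -> request count

    def allow(self, key: str) -> bool:
        window = int(self.clock()) // self.window_s
        bucket = (key, window)
        self.counts[bucket] = self.counts.get(bucket, 0) + 1
        return self.counts[bucket] <= self.limit
```

The window index resets automatically as the clock advances, which is exactly what the `EXPIRE` accomplishes server-side.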
Count LLM tokens, not just requests. Dual limiting, output estimation, and post-call adjustment.
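One way to sketch dual limiting with post-call adjustment: reserve the token estimate up front, then reconcile once the real usage is known. In-memory counters stand in for Valkey; all names here are illustrative:

```python
class TokenWindow:
    """Dual limit: cap both requests AND tokens per window."""

    def __init__(self, max_requests: int, max_tokens: int):
        self.max_requests = max_requests
        self.max_tokens = max_tokens
        self.requests = 0
        self.tokens = 0

    def reserve(self, estimated_tokens: int) -> bool:
        """Admit the call only if both limits still have headroom."""
        if self.requests + 1 > self.max_requests:
            return False
        if self.tokens + estimated_tokens > self.max_tokens:
            return False
        self.requests += 1
        self.tokens += estimated_tokens
        return True

    def adjust(self, estimated_tokens: int, actual_tokens: int) -> None:
        """Post-call adjustment: swap the estimate for the real count."""
        self.tokens += actual_tokens - estimated_tokens
```

When a call uses fewer tokens than estimated, the adjustment returns the difference to the window, so conservative estimates don't permanently starve later requests.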
Per-agent limits with token bucket, tool-weighted costs, concurrent agent slots, and budget tracking.
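A pure-Python sketch of a per-agent token bucket with tool-weighted costs; the capacity, refill rate, and tool weights below are made-up values for illustration:

```python
class AgentBucket:
    """Token bucket: refills continuously, tool calls spend weighted costs."""

    def __init__(self, capacity: float, refill_per_s: float, clock):
        self.capacity = capacity
        self.refill_per_s = refill_per_s
        self.clock = clock
        self.level = capacity  # start full
        self.last = clock()

    def try_spend(self, cost: float) -> bool:
        # Lazily refill based on elapsed time, capped at capacity.
        now = self.clock()
        self.level = min(self.capacity,
                         self.level + (now - self.last) * self.refill_per_s)
        self.last = now
        if cost > self.level:
            return False
        self.level -= cost
        return True


# Hypothetical weights: heavier tools drain the bucket faster.
TOOL_WEIGHTS = {"search": 1.0, "code_exec": 5.0}
```

Because the refill is computed lazily on each check, no background timer is needed; the same trick applies server-side when the bucket state lives in a Valkey hash.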
Cascading Org → Team → User → Agent → Model limits. All tiers checked in a single Valkey pipeline.
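The cascading check can be sketched as an all-or-nothing pass over the tiers, with an in-memory dict standing in for the single Valkey pipeline (or Lua script) that keeps it atomic; key names are illustrative:

```python
def check_hierarchy(counters: dict, limits: dict, tiers: list, cost: int = 1) -> bool:
    """All-or-nothing admission across Org -> Team -> User -> Agent -> Model.

    Every tier is checked first; counters are incremented only if all
    tiers pass, so a denial leaves no partial increments behind.
    """
    for tier in tiers:
        if counters.get(tier, 0) + cost > limits[tier]:
            return False
    for tier in tiers:
        counters[tier] = counters.get(tier, 0) + cost
    return True
```

The two-phase shape (check everything, then commit everything) is what makes the tightest tier the effective limit without leaking quota at the looser tiers.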
Dollar-amount budgets per window. Model-aware pricing, automatic downgrades, and spend tracking.
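Budget-aware routing might look like the sketch below: try the preferred model, downgrade to a cheaper one if it would blow the window's dollar budget, deny if even the fallback would. The prices and model names are placeholders, not real pricing:

```python
# Hypothetical per-1K-token prices; real pricing varies by provider.
PRICES = {"big-model": 0.03, "small-model": 0.002}


def route_request(spent: float, budget: float, tokens: int,
                  preferred: str = "big-model", fallback: str = "small-model"):
    """Return (chosen model, new spend), downgrading or denying on budget.

    Returns (None, spent) when even the fallback would exceed the budget.
    """
    for model in (preferred, fallback):
        cost = PRICES[model] * tokens / 1000
        if spent + cost <= budget:
            return model, spent + cost
    return None, spent
```

Tracking spend as a running counter per window maps naturally onto the same windowed counters used for requests and tokens.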
Retry-After headers, circuit breakers, graceful degradation, request queuing, and observability.
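For fixed windows, the Retry-After value a 429 response carries is just the time until the current window rolls over; a minimal sketch:

```python
import math


def retry_after(window_s: int, now: float) -> int:
    """Whole seconds until the current fixed window resets.

    Suitable as the value of a Retry-After header on a 429 response;
    clamped to at least 1 so clients never retry immediately.
    """
    elapsed = now % window_s
    return max(1, math.ceil(window_s - elapsed))
```

Sliding-window and token-bucket variants need different math (time until enough quota refills), but the header contract is the same.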