Token-Aware Rate Limiting

Six production-ready cookbooks for rate limiting AI workloads with Valkey, from basic fixed windows to enterprise-grade hierarchical limits.

01 · Getting Started

Start Valkey, install deps, and build your first token-aware rate limiter in under 5 minutes.

Beginner · ~5 min · Python
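The simplest limiter the cookbook series starts from is a fixed window: count requests per key per time window and reject once a limit is hit. Below is a minimal in-memory sketch of that pattern; the class and parameter names are illustrative, and the production version in the cookbook would use Valkey's `INCR` and `EXPIRE` on a window-scoped key instead of a local dict.

```python
import time


class FixedWindowLimiter:
    """Fixed-window request limiter (in-memory stand-in for
    a Valkey INCR + EXPIRE pattern)."""

    def __init__(self, limit, window_s):
        self.limit = limit          # max requests per window
        self.window_s = window_s    # window length in seconds
        self.counters = {}          # (key, window_id) -> request count

    def allow(self, key, now=None):
        now = time.time() if now is None else now
        window_id = int(now // self.window_s)   # which window are we in?
        bucket = (key, window_id)
        count = self.counters.get(bucket, 0)
        if count >= self.limit:
            return False            # window exhausted: reject
        self.counters[bucket] = count + 1
        return True
```

With Valkey, `(key, window_id)` becomes a single key like `rl:{user}:{window_id}` incremented atomically, so multiple app instances share one counter.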
02 · Token-Aware Limiting

Count LLM tokens, not just requests. Dual limiting, output estimation, and post-call adjustment.

Intermediate · ~15 min · Python
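The dual-limiting idea above can be sketched as follows: enforce a request budget and a token budget together, reserve an output-token estimate before the LLM call, then reconcile against actual usage afterward. This is an illustrative in-memory sketch with made-up names (`try_reserve`, `adjust`); the cookbook's version would keep these counters in Valkey.

```python
class DualLimiter:
    """Token-aware limiter: enforces requests/window AND tokens/window.
    Reserves an output-token estimate up front, then reconciles with
    the actual usage after the LLM call."""

    def __init__(self, max_requests, max_tokens):
        self.max_requests = max_requests
        self.max_tokens = max_tokens
        self.requests = 0
        self.tokens = 0

    def try_reserve(self, prompt_tokens, est_output_tokens):
        est = prompt_tokens + est_output_tokens
        # Both budgets must have headroom, or the request is rejected.
        if self.requests + 1 > self.max_requests or self.tokens + est > self.max_tokens:
            return False
        self.requests += 1
        self.tokens += est
        return True

    def adjust(self, est_output_tokens, actual_output_tokens):
        # Post-call adjustment: refund (or charge) the estimate delta.
        self.tokens += actual_output_tokens - est_output_tokens
```

Reserving an estimate first prevents a burst of concurrent calls from collectively blowing the token budget before any of them finishes.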
03 · Agent Rate Limiting

Per-agent limits with token bucket, tool-weighted costs, concurrent agent slots, and budget tracking.

Intermediate · ~20 min · Python
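Two of the techniques this cookbook combines, the token bucket and tool-weighted costs, can be sketched together: the bucket refills continuously, and expensive tools (code execution, browsing) drain it faster than cheap ones. The tool names and weights below are illustrative assumptions, not the cookbook's actual values.

```python
class TokenBucket:
    """Per-agent token bucket; each tool call consumes a weighted cost."""

    TOOL_WEIGHTS = {"search": 1.0, "code_exec": 3.0, "browser": 5.0}  # illustrative

    def __init__(self, capacity, refill_per_s, now=0.0):
        self.capacity = capacity
        self.refill_per_s = refill_per_s
        self.level = capacity   # bucket starts full
        self.last = now

    def try_consume(self, tool, now):
        # Refill based on elapsed time, capped at capacity.
        self.level = min(self.capacity,
                         self.level + (now - self.last) * self.refill_per_s)
        self.last = now
        cost = self.TOOL_WEIGHTS.get(tool, 1.0)
        if cost > self.level:
            return False        # not enough budget for this tool right now
        self.level -= cost
        return True
```

Unlike a fixed window, the bucket allows short bursts up to `capacity` while still enforcing the long-run rate `refill_per_s`.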
04 · Hierarchical Limits

Cascading Org → Team → User → Agent → Model limits. All tiers checked in a single Valkey pipeline.

Advanced · ~25 min · Python
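The key property of the cascading check is that it is all-or-nothing: every tier must have headroom, and consumption commits to all tiers together, which is what batching the checks into one Valkey pipeline/transaction buys you. Here is a minimal in-memory sketch of that semantics (tier names and the `try_consume` API are illustrative):

```python
class HierarchicalLimiter:
    """Cascading limits: every tier (org, team, user, agent, model)
    must have headroom; consumption is all-or-nothing, mirroring a
    single Valkey pipeline/transaction."""

    def __init__(self, limits):
        self.limits = limits                          # tier -> max tokens
        self.used = {tier: 0 for tier in limits}      # tier -> consumed

    def try_consume(self, tokens):
        # First pass: check every tier with no partial consumption.
        if any(self.used[t] + tokens > self.limits[t] for t in self.limits):
            return False
        # Second pass: commit to all tiers together.
        for t in self.limits:
            self.used[t] += tokens
        return True
```

A request is rejected by its tightest ancestor: a user under their own limit can still be blocked because their team or org budget is exhausted.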
05 · Cost-Based Limiting

Dollar-amount budgets per window. Model-aware pricing, automatic downgrades, and spend tracking.

Advanced · ~20 min · Python
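Model-aware pricing with automatic downgrades can be sketched as a routing decision: price the request against the remaining dollar budget, and fall back to a cheaper model when the requested one is unaffordable. The model names, per-1K-token prices, and downgrade chain below are invented for illustration; real pricing varies by provider.

```python
# Illustrative per-1K-token prices and downgrade chain (not real pricing).
PRICES = {"gpt-large": 0.03, "gpt-small": 0.002}
DOWNGRADE = {"gpt-large": "gpt-small"}


class CostLimiter:
    """Dollar budget per window; downgrades to a cheaper model when
    the requested model would exceed the remaining budget."""

    def __init__(self, budget_usd):
        self.remaining = budget_usd

    def route(self, model, tokens):
        while model is not None:
            cost = PRICES[model] * tokens / 1000
            if cost <= self.remaining:
                self.remaining -= cost      # track spend against the budget
                return model
            model = DOWNGRADE.get(model)    # try the next cheaper model
        return None                         # nothing affordable: reject
```

Returning the chosen model (rather than a bare allow/deny) lets the caller transparently serve a degraded but nonzero response as the budget runs down.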
06 · Production Patterns

Retry-After headers, circuit breakers, graceful degradation, request queuing, and observability.

Advanced · ~30 min · Python
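As a small taste of the production patterns above: a 429 response should tell clients when to come back. For a fixed-window limiter the `Retry-After` value is just the time until the current window rolls over. A minimal sketch (the helper name is an assumption, not the cookbook's API):

```python
import math


def retry_after_s(window_s, now):
    """Seconds until the current fixed window resets; suitable for a
    Retry-After header on an HTTP 429 response."""
    remaining = window_s - (now % window_s)   # time left in this window
    return math.ceil(remaining)               # round up so clients never retry early
```

Rounding up matters: a client told to wait 0 seconds will immediately retry into the same exhausted window.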

🎮 Try It Live

Interactive dashboard with all 5 algorithms. Send requests, test bursts, simulate agents.
