Three cookbooks for building semantic caches that can cut LLM costs by 60% or more, using vector similarity with the valkey-search module.
Cookbook 1: Build a basic semantic cache with FT.CREATE, HSET, and FT.SEARCH KNN. Embed incoming prompts, cache model responses, and return cached hits for semantically similar queries.
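The flow above can be sketched as command builders. This is a minimal sketch assuming the RediSearch-compatible syntax that valkey-search exposes; the index name `llmcache`, the `cache:` key prefix, the field names, and the toy 4-dimensional embedding are all illustrative, and a real deployment would use an actual embedding model.

```python
import hashlib
import struct

DIM = 4  # toy dimension for illustration; real models use e.g. 384 or 1536

def pack_vector(vec):
    """Serialize floats to the little-endian FLOAT32 blob that a
    HASH vector field expects."""
    return struct.pack(f"<{len(vec)}f", *vec)

def create_index_args(index="llmcache", prefix="cache:", dim=DIM):
    """Arguments for FT.CREATE: index HASH keys under `prefix` with
    an HNSW vector field for cosine-similarity search."""
    return ["FT.CREATE", index, "ON", "HASH", "PREFIX", "1", prefix,
            "SCHEMA", "embedding", "VECTOR", "HNSW", "6",
            "TYPE", "FLOAT32", "DIM", str(dim),
            "DISTANCE_METRIC", "COSINE"]

def cache_entry_args(prompt, response, vec, prefix="cache:"):
    """Arguments for HSET: store the prompt, the LLM response, and
    the prompt's embedding under a content-derived key."""
    key = prefix + hashlib.sha1(prompt.encode()).hexdigest()
    return ["HSET", key, "prompt", prompt, "response", response,
            "embedding", pack_vector(vec)]

def knn_query_args(vec, index="llmcache", k=1):
    """Arguments for FT.SEARCH with a KNN clause; DIALECT 2 is needed
    for the parameterized `$vec` syntax."""
    return ["FT.SEARCH", index,
            f"*=>[KNN {k} @embedding $vec AS score]",
            "PARAMS", "2", "vec", pack_vector(vec),
            "RETURN", "2", "response", "score",
            "DIALECT", "2"]
```

With a real client each argument list would be passed to something like `client.execute_command(*args)`; on a lookup, if the returned `score` (cosine distance) falls below a chosen threshold, serve the cached `response` instead of calling the LLM.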
Cookbook 2: Cache full conversation contexts, not just single prompts, with per-user isolation via TAG filters and hybrid search.
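A sketch of both ideas: flatten the whole conversation into one string before embedding, and pre-filter the KNN search by a `user_id` TAG field so one user never receives another user's cached answers. The field names and index name are assumptions; the schema would need `user_id TAG` alongside the vector field.

```python
import struct

def pack_vector(vec):
    """Serialize floats to the FLOAT32 blob used in vector fields."""
    return struct.pack(f"<{len(vec)}f", *vec)

def conversation_text(turns):
    """Flatten (role, content) turns into one string to embed, so the
    cache matches on the whole context, not just the latest prompt."""
    return "\n".join(f"{role}: {content}" for role, content in turns)

def user_knn_query_args(user_id, vec, index="llmcache", k=3):
    """Hybrid FT.SEARCH: the TAG clause restricts candidates to one
    user's entries before the KNN clause ranks them by distance."""
    return ["FT.SEARCH", index,
            f"(@user_id:{{{user_id}}})=>[KNN {k} @embedding $vec AS score]",
            "PARAMS", "2", "vec", pack_vector(vec),
            "DIALECT", "2"]
```

The hybrid query matters for correctness, not just relevance: two users can ask semantically identical questions whose answers depend on private context, so isolation has to happen inside the search, not after it.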
Cookbook 3: Operate the cache in production: threshold tuning, hit-rate monitoring, TTL strategies, memory management, cache invalidation, and cost tracking.
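The threshold, hit-rate, and cost-tracking pieces can be sketched together. The distance threshold and per-call cost below are hypothetical placeholders to be tuned on labeled query pairs; TTLs would be applied separately with EXPIRE on each cache key.

```python
class CacheStats:
    """Track cache hits/misses and estimate LLM spend avoided.
    The default per-call cost is a made-up placeholder."""

    def __init__(self, cost_per_llm_call=0.002):
        self.hits = 0
        self.misses = 0
        self.cost_per_llm_call = cost_per_llm_call

    def record(self, distance, threshold=0.15):
        """A KNN result counts as a hit only when its cosine distance
        is below the threshold; `None` means no candidate was found.
        Tune the threshold by sweeping it over labeled query pairs."""
        hit = distance is not None and distance < threshold
        if hit:
            self.hits += 1
        else:
            self.misses += 1
        return hit

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

    def estimated_savings(self):
        """Every hit is one LLM call not made."""
        return self.hits * self.cost_per_llm_call
```

Monitoring the hit rate is what makes threshold tuning safe: lowering the threshold reduces false hits (wrong answers served) at the cost of hit rate, so the two metrics should be watched together.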