Three cookbooks for building semantic caches that can cut LLM costs by 60% or more, using vector similarity with the valkey-search module.
Cookbook 1: Build a basic semantic cache with FT.CREATE, HSET, and FT.SEARCH KNN. Embed incoming prompts, cache model responses, and return cached hits for semantically similar queries.
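The flow above can be sketched as command builders. This is a minimal sketch assuming the RediSearch-compatible syntax that valkey-search exposes; the index name `llmcache`, the `cache:` key prefix, the field names, and the toy 4-dimensional embedding are all illustrative, and a real deployment would use an actual embedding model.

```python
import hashlib
import struct

DIM = 4  # toy dimension for illustration; real models use e.g. 384 or 1536

def pack_vector(vec):
    """Serialize floats to the little-endian FLOAT32 blob that a
    HASH vector field expects."""
    return struct.pack(f"<{len(vec)}f", *vec)

def create_index_args(index="llmcache", prefix="cache:", dim=DIM):
    """Arguments for FT.CREATE: index HASH keys under `prefix` with
    an HNSW vector field for cosine-similarity search."""
    return ["FT.CREATE", index, "ON", "HASH", "PREFIX", "1", prefix,
            "SCHEMA", "embedding", "VECTOR", "HNSW", "6",
            "TYPE", "FLOAT32", "DIM", str(dim),
            "DISTANCE_METRIC", "COSINE"]

def cache_entry_args(prompt, response, vec, prefix="cache:"):
    """Arguments for HSET: store the prompt, the LLM response, and
    the prompt's embedding under a content-derived key."""
    key = prefix + hashlib.sha1(prompt.encode()).hexdigest()
    return ["HSET", key, "prompt", prompt, "response", response,
            "embedding", pack_vector(vec)]

def knn_query_args(vec, index="llmcache", k=1):
    """Arguments for FT.SEARCH with a KNN clause; DIALECT 2 is needed
    for the parameterized `$vec` syntax."""
    return ["FT.SEARCH", index,
            f"*=>[KNN {k} @embedding $vec AS score]",
            "PARAMS", "2", "vec", pack_vector(vec),
            "RETURN", "2", "response", "score",
            "DIALECT", "2"]
```

With a real client each argument list would be passed to something like `client.execute_command(*args)`; on a lookup, if the returned `score` (cosine distance) falls below a chosen threshold, serve the cached `response` instead of calling the LLM.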
Cookbook 2: Cache full conversation contexts, not just single prompts, with per-user isolation via TAG filters and hybrid search.
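A sketch of both ideas: flatten the whole conversation into one string before embedding, and pre-filter the KNN search by a `user_id` TAG field so one user never receives another user's cached answers. The field names and index name are assumptions; the schema would need `user_id TAG` alongside the vector field.

```python
import struct

def pack_vector(vec):
    """Serialize floats to the FLOAT32 blob used in vector fields."""
    return struct.pack(f"<{len(vec)}f", *vec)

def conversation_text(turns):
    """Flatten (role, content) turns into one string to embed, so the
    cache matches on the whole context, not just the latest prompt."""
    return "\n".join(f"{role}: {content}" for role, content in turns)

def user_knn_query_args(user_id, vec, index="llmcache", k=3):
    """Hybrid FT.SEARCH: the TAG clause restricts candidates to one
    user's entries before the KNN clause ranks them by distance."""
    return ["FT.SEARCH", index,
            f"(@user_id:{{{user_id}}})=>[KNN {k} @embedding $vec AS score]",
            "PARAMS", "2", "vec", pack_vector(vec),
            "DIALECT", "2"]
```

The hybrid query matters for correctness, not just relevance: two users can ask semantically identical questions whose answers depend on private context, so isolation has to happen inside the search, not after it.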
Cookbook 3: Operate the cache in production: threshold tuning, hit-rate monitoring, TTL strategies, memory management, cache invalidation, and cost tracking.
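The threshold, hit-rate, and cost-tracking pieces can be sketched together. The distance threshold and per-call cost below are hypothetical placeholders to be tuned on labeled query pairs; TTLs would be applied separately with EXPIRE on each cache key.

```python
class CacheStats:
    """Track cache hits/misses and estimate LLM spend avoided.
    The default per-call cost is a made-up placeholder."""

    def __init__(self, cost_per_llm_call=0.002):
        self.hits = 0
        self.misses = 0
        self.cost_per_llm_call = cost_per_llm_call

    def record(self, distance, threshold=0.15):
        """A KNN result counts as a hit only when its cosine distance
        is below the threshold; `None` means no candidate was found.
        Tune the threshold by sweeping it over labeled query pairs."""
        hit = distance is not None and distance < threshold
        if hit:
            self.hits += 1
        else:
            self.misses += 1
        return hit

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

    def estimated_savings(self):
        """Every hit is one LLM call not made."""
        return self.hits * self.cost_per_llm_call
```

Monitoring the hit rate is what makes threshold tuning safe: lowering the threshold reduces false hits (wrong answers served) at the cost of hit rate, so the two metrics should be watched together.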