Cache LLM responses by meaning, not exact string match. Using vector similarity search with the valkey-search module, semantic caching can cut API costs by 60%+ on repetitive workloads and slash latency from seconds to milliseconds.
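The core loop is: embed the incoming prompt, run a nearest-neighbor search over previously cached prompts, and return the stored response if the closest match is within a similarity threshold. Below is a minimal sketch in Python using the valkey-py client. It assumes a Valkey server with valkey-search loaded and its RediSearch-style `FT.CREATE`/`FT.SEARCH` commands available; `embed()` and `call_llm()` are hypothetical placeholders for your embedding model and LLM provider, and `DIM` and `THRESHOLD` are illustrative values, not this repo's settings.

```python
# Minimal semantic-cache sketch (assumptions noted above; not this repo's code).
import hashlib

import numpy as np
import valkey

DIM = 384          # embedding dimensionality, model-dependent
THRESHOLD = 0.15   # max cosine *distance* (1 - similarity) to count as a hit

client = valkey.Valkey(host="localhost", port=6379)

def embed(text: str) -> np.ndarray:
    """Placeholder: return a float32 embedding from your model of choice."""
    raise NotImplementedError

def call_llm(prompt: str) -> str:
    """Placeholder: call your LLM provider."""
    raise NotImplementedError

# One-time setup: HNSW vector index over hash keys prefixed "cache:".
try:
    client.execute_command(
        "FT.CREATE", "idx:cache", "ON", "HASH", "PREFIX", "1", "cache:",
        "SCHEMA", "embedding", "VECTOR", "HNSW", "6",
        "TYPE", "FLOAT32", "DIM", str(DIM), "DISTANCE_METRIC", "COSINE",
    )
except valkey.exceptions.ResponseError:
    pass  # index already exists

def cached_completion(prompt: str) -> str:
    vec = embed(prompt).astype(np.float32).tobytes()
    # KNN=1: fetch the single closest cached prompt by cosine distance.
    res = client.execute_command(
        "FT.SEARCH", "idx:cache",
        "*=>[KNN 1 @embedding $vec AS dist]",
        "PARAMS", "2", "vec", vec,
        "RETURN", "2", "dist", "response",
        "DIALECT", "2",
    )
    if res[0] >= 1:  # reply shape: [count, key, [field, value, ...], ...]
        fields = dict(zip(res[2][::2], res[2][1::2]))
        if float(fields[b"dist"]) <= THRESHOLD:
            return fields[b"response"].decode()  # semantic cache hit
    # Cache miss: call the LLM, then store prompt, embedding, and response.
    answer = call_llm(prompt)
    key = "cache:" + hashlib.sha1(prompt.encode()).hexdigest()[:16]
    client.hset(key, mapping={"prompt": prompt, "embedding": vec, "response": answer})
    return answer
```

The threshold is the main tuning knob: lowering it makes matching stricter (fewer wrong-answer hits, lower hit rate), while raising it increases the hit rate at the risk of serving a cached answer to a semantically different question.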
Three guides, progressing from basic semantic caching to production multi-turn conversation caching.
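Multi-turn caching differs from the single-prompt case because the same question can mean different things in different conversations. One common approach (a sketch under that assumption, not necessarily the guides' exact method) is to embed recent conversation turns together with the new user message, so cache hits require similar context, not just a similar question. This builds on the hypothetical `cached_completion()` from the sketch above.

```python
# Sketch: context-aware cache keys for multi-turn conversations.
def conversation_key_text(history: list[dict], user_msg: str, window: int = 3) -> str:
    """Fold the last `window` turns into the text that gets embedded."""
    context = "\n".join(f"{t['role']}: {t['content']}" for t in history[-window:])
    return f"{context}\nuser: {user_msg}"

def cached_chat(history: list[dict], user_msg: str) -> str:
    # The embedded text now encodes conversation context, not just the
    # latest prompt, so lookups only hit within similar conversations.
    return cached_completion(conversation_key_text(history, user_msg))
```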
Type prompts, watch cache hits and misses, adjust similarity thresholds, and track cost savings in real time.
All cookbooks and demo code for semantic caching with valkey-search.