LangChain & Vercel AI Adapters
Drop semantic caching into LangChain or the Vercel AI SDK with one import. No check/store loops - the adapter handles hit/miss transparently.
Cookbook 01 showed the direct check() / store() API. For applications already using LangChain or the Vercel AI SDK, the package also ships first-class adapters that add semantic caching at the framework level, so hit/miss handling happens inside the framework's own model calls.
LangChain Adapter
BetterDBSemanticCache implements LangChain's BaseCache interface and is passed directly to any ChatModel or LLM constructor via the cache option.
Setup
import Valkey from 'iovalkey';
import { SemanticCache } from '@betterdb/semantic-cache';
import { BetterDBSemanticCache } from '@betterdb/semantic-cache/langchain';
import { ChatOpenAI } from '@langchain/openai';
import OpenAI from 'openai';

// Reads OPENAI_API_KEY from the environment
const openai = new OpenAI();
const client = new Valkey({ host: 'localhost', port: 6379 });

const cache = new SemanticCache({
  client,
  embedFn: async (text) => {
    const res = await openai.embeddings.create({
      model: 'text-embedding-3-small',
      input: text,
    });
    return res.data[0].embedding;
  },
  defaultThreshold: 0.15,
});

const model = new ChatOpenAI({
  model: 'gpt-4o-mini',
  cache: new BetterDBSemanticCache({ cache }),
});
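The `defaultThreshold: 0.15` setting is easiest to read as a distance cutoff. The sketch below assumes the score is cosine distance (0 for vectors pointing the same way, larger for less similar text) and that a lookup counts as a hit when the distance is at or below the threshold. That interpretation is an assumption, and `cosineDistance` and `isHit` are illustrative helpers, not part of the package.

```typescript
// Illustrative helpers only; not part of @betterdb/semantic-cache.

/** Cosine distance between two equal-length vectors: 1 - cos(angle). */
function cosineDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    throw new Error('a zero vector has no direction');
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

/** Assumed rule: a cached entry is a hit when its distance is within the threshold. */
function isHit(distance: number, threshold: number = 0.15): boolean {
  return distance <= threshold;
}

// Toy 3-dimensional "embeddings"; real embeddings have far more dimensions.
const stored = [1, 0, 0];
const paraphrase = [0.9, 0.1, 0];
const unrelated = [0, 1, 0];

console.log(isHit(cosineDistance(stored, paraphrase))); // true
console.log(isHit(cosineDistance(stored, unrelated))); // false
```

Under this reading, lowering the threshold trades hit rate for precision: fewer paraphrases match, but fewer unrelated prompts are served a cached answer.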
Usage
import { HumanMessage } from '@langchain/core/messages';

// First call - miss, calls OpenAI
const r1 = await model.invoke([new HumanMessage('What is Valkey?')]);
console.log(r1.content);
// "Valkey is an open-source, in-memory data structure store..."

// Second call with different phrasing - hit, served from Valkey
const r2 = await model.invoke([new HumanMessage('Explain what Valkey is.')]);
console.log(r2.content);
// "Valkey is an open-source, in-memory data structure store..." (same response, <1ms)

await client.quit();
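To make "the adapter handles hit/miss transparently" concrete, here is a minimal sketch of the pattern. It assumes the framework calls a `lookup(prompt, llmKey)` / `update(prompt, llmKey, value)` pair, which is the shape of LangChain's `BaseCache`. `SemanticStore`, `ToyStore`, `SketchCacheAdapter`, and `invokeWithCache` are hypothetical stand-ins written so the example runs without LangChain or Valkey; the real check() / store() signatures may differ.

```typescript
// Local stand-in for LangChain's Generation type.
type Generation = { text: string };

// Hypothetical minimal view of a semantic cache; the real API may differ.
interface SemanticStore {
  check(prompt: string): Promise<string | null>;
  store(prompt: string, response: string): Promise<void>;
}

// Toy store: equality on normalized text stands in for embedding similarity.
class ToyStore implements SemanticStore {
  private readonly entries = new Map<string, string>();

  private normalize(s: string): string {
    return s.toLowerCase().replace(/[^a-z0-9 ]/g, '').trim();
  }

  async check(prompt: string): Promise<string | null> {
    return this.entries.get(this.normalize(prompt)) ?? null;
  }

  async store(prompt: string, response: string): Promise<void> {
    this.entries.set(this.normalize(prompt), response);
  }
}

// Adapter: translates the framework's lookup/update calls into check/store.
class SketchCacheAdapter {
  private readonly backing: SemanticStore;

  constructor(backing: SemanticStore) {
    this.backing = backing;
  }

  async lookup(prompt: string, _llmKey: string): Promise<Generation[] | null> {
    const cached = await this.backing.check(prompt);
    return cached === null ? null : [{ text: cached }];
  }

  async update(prompt: string, _llmKey: string, value: Generation[]): Promise<void> {
    await this.backing.store(prompt, value.map((g) => g.text).join(''));
  }
}

// Roughly what a framework does on each call once a cache is configured.
async function invokeWithCache(
  cache: SketchCacheAdapter,
  llm: (prompt: string) => Promise<string>,
  prompt: string,
): Promise<string> {
  const hit = await cache.lookup(prompt, 'model-config');
  if (hit !== null) {
    return hit.map((g) => g.text).join(''); // hit: the LLM is never called
  }
  const text = await llm(prompt); // miss: call the LLM, then remember the answer
  await cache.update(prompt, 'model-config', [{ text }]);
  return text;
}
```

Because the lookup and update happen inside the framework's call path, application code only ever calls the model, as in the usage example above.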
Scoping by Model
By default, cache entries are shared across all models. Set filterByModel: true to scope entries to the specific LLM configuration - useful when different models produce meaningfully different responses to the same prompt:
const modelAwareCache = new BetterDBSemanticCache({
  cache,
  filterByModel: true, // a gpt-4o cache entry won't hit for gpt-4o-mini
});
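The effect of `filterByModel` can be pictured as an extra condition on the lookup. The sketch below is an in-memory illustration only: `ScopedEntries` and the `modelKey` field are hypothetical, exact prompt equality stands in for the similarity search, and how the adapter actually derives and stores a model identifier is not covered here.

```typescript
// Illustrative in-memory model; the real adapter stores entries in Valkey.
type Entry = { prompt: string; response: string; modelKey: string };

class ScopedEntries {
  private readonly entries: Entry[] = [];
  private readonly filterByModel: boolean;

  constructor(filterByModel: boolean) {
    this.filterByModel = filterByModel;
  }

  add(entry: Entry): void {
    this.entries.push(entry);
  }

  // With filtering on, an entry only matches when it was written by the same model.
  find(prompt: string, modelKey: string): string | null {
    const match = this.entries.find(
      (e) => e.prompt === prompt && (!this.filterByModel || e.modelKey === modelKey),
    );
    return match ? match.response : null;
  }
}

const shared = new ScopedEntries(false);
const scoped = new ScopedEntries(true);
for (const c of [shared, scoped]) {
  c.add({ prompt: 'What is Valkey?', response: 'from gpt-4o', modelKey: 'gpt-4o' });
}

console.log(shared.find('What is Valkey?', 'gpt-4o-mini')); // from gpt-4o
console.log(scoped.find('What is Valkey?', 'gpt-4o-mini')); // null
console.log(scoped.find('What is Valkey?', 'gpt-4o')); // from gpt-4o
```

The trade-off is hit rate: scoped entries are never reused across models, so each model warms its own portion of the cache.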
Vercel AI SDK Adapter
createSemanticCacheMiddleware wraps any Vercel AI SDK language model. Use it with wrapLanguageModel() from the ai package.
Setup
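import Valkey from 'iovalkey';
import OpenAI from 'openai';
import { SemanticCache } from '@betterdb/semantic-cache';
import { createSemanticCacheMiddleware } from '@betterdb/semantic-cache/ai';
import { wrapLanguageModel, generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

// Separate client for embeddings, named to avoid clashing with the AI SDK's `openai` provider
const embeddingClient = new OpenAI();
const client = new Valkey({ host: 'localhost', port: 6379 });

const cache = new SemanticCache({
  client,
  // Same embed function as in the LangChain setup
  embedFn: async (text) => {
    const res = await embeddingClient.embeddings.create({
      model: 'text-embedding-3-small',
      input: text,
    });
    return res.data[0].embedding;
  },
  defaultThreshold: 0.15,
});

const model = wrapLanguageModel({
  model: openai('gpt-4o-mini'),
  middleware: createSemanticCacheMiddleware({ cache }),
});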
Usage
// First call - miss
const { text: t1 } = await generateText({
  model,
  prompt: 'What is Valkey?',
});
console.log(t1);

// Second call - hit, returned from Valkey, no tokens consumed
const { text: t2 } = await generateText({
  model,
  prompt: 'Can you describe what Valkey is?',
});
console.log(t2); // same response as t1

await client.quit();
The middleware intercepts doGenerate() calls before they reach OpenAI. On a hit it returns the cached response directly - no chat-completion call and no completion tokens. What remains is the cost of embedding the prompt and the Valkey lookup.
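The interception can be sketched as a function that wraps the provider call. The types below are local stand-ins shaped like the AI SDK's `wrapGenerate({ doGenerate, params })` hook, not the SDK's real types (the real `params.prompt` is a structured message list, not a string). `createSketchCacheMiddleware` and `wrapSketchModel` are hypothetical names, and exact string match stands in for semantic matching.

```typescript
// Local stand-ins so the sketch runs without the `ai` package installed.
type GenerateParams = { prompt: string };
type GenerateResult = { text: string };
type WrapGenerateArgs = {
  doGenerate: () => Promise<GenerateResult>;
  params: GenerateParams;
};
type Middleware = {
  wrapGenerate: (args: WrapGenerateArgs) => Promise<GenerateResult>;
};

// Cache-aside middleware: answer from the store when possible, else call through.
function createSketchCacheMiddleware(store: Map<string, GenerateResult>): Middleware {
  return {
    wrapGenerate: async ({ doGenerate, params }) => {
      const cached = store.get(params.prompt);
      if (cached !== undefined) {
        return cached; // hit: the provider is never called
      }
      const result = await doGenerate(); // miss: fall through to the provider
      store.set(params.prompt, result);
      return result;
    },
  };
}

// Minimal equivalent of wrapping a model: route every call through the middleware.
function wrapSketchModel(
  provider: (params: GenerateParams) => Promise<GenerateResult>,
  middleware: Middleware,
): (params: GenerateParams) => Promise<GenerateResult> {
  return (params) =>
    middleware.wrapGenerate({ doGenerate: () => provider(params), params });
}
```

The key property is that `doGenerate` is passed in as a callback, so on a hit the middleware can simply decline to invoke it.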
Streaming: The Vercel AI SDK adapter caches generateText() calls only. Streaming responses via streamText() are not cached - accumulate the full response and call cache.store() manually if you need to cache streamed output.
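One way to accumulate and store streamed output is to pass the stream through a small async generator. This is a sketch under assumptions: `StoreOnly` is a hypothetical minimal view of the cache (the real store() signature may take more arguments), and `chunks` is any `AsyncIterable<string>`, for example the text stream returned by streamText().

```typescript
// Hypothetical minimal view of the cache; the real store() signature may differ.
interface StoreOnly {
  store(prompt: string, response: string): Promise<void>;
}

// Forwards each chunk to the caller as it arrives, then stores the joined text
// once the stream has been consumed to the end without error.
async function* cacheAfterStreaming(
  prompt: string,
  chunks: AsyncIterable<string>,
  cache: StoreOnly,
): AsyncGenerator<string> {
  let full = '';
  for await (const chunk of chunks) {
    full += chunk;
    yield chunk;
  }
  await cache.store(prompt, full);
}
```

If the consumer stops reading early or the stream throws, the code after the loop never runs, so a partial response is not written to the cache.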
Direct API vs Adapters
| | Direct API | LangChain Adapter | Vercel AI SDK Adapter |
|---|---|---|---|
| Framework dependency | None | @langchain/core | ai package |
| check() / store() control | Full | Automatic | Automatic |
| Works with any LLM client | Yes | LangChain models only | Vercel AI models only |
| Streaming support | Manual | Automatic (no cache) | No (generateText only) |
| Best for | Custom pipelines | LangChain agents | Next.js / AI SDK apps |
Use the direct API when you need fine-grained control - for example, to pass per-request TTLs or categories, or to handle uncertain hits differently. Use an adapter when you want zero-boilerplate caching for an existing LangChain or Vercel AI SDK application.
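As an illustration of handling uncertain hits differently, the sketch below splits lookups into three outcomes using a margin just inside the threshold. Everything here is hypothetical: the `CheckResult` shape, the `decide` helper, and the margin value are assumptions for the example, and the real check() result may expose different fields.

```typescript
// Hypothetical result shape for a direct lookup; the real check() result may differ.
type CheckResult =
  | { kind: 'miss' }
  | { kind: 'hit'; response: string; distance: number };

type Decision = 'serve' | 'verify' | 'regenerate';

// Hits well inside the threshold are served as-is; hits within `margin` of the
// threshold are flagged for extra handling; everything else goes to the LLM.
function decide(result: CheckResult, threshold: number, margin: number): Decision {
  if (result.kind === 'miss' || result.distance > threshold) {
    return 'regenerate';
  }
  return result.distance > threshold - margin ? 'verify' : 'serve';
}

console.log(decide({ kind: 'hit', response: 'cached', distance: 0.05 }, 0.15, 0.03)); // serve
console.log(decide({ kind: 'hit', response: 'cached', distance: 0.14 }, 0.15, 0.03)); // verify
console.log(decide({ kind: 'miss' }, 0.15, 0.03)); // regenerate
```

What "verify" means is up to the application, for example regenerating in the background or showing the cached answer with a lower-confidence label. This kind of branching is only possible when the caller sees the lookup result, which the adapters hide.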