# Prompt Caching
Prompt caching reduces latency and cost by reusing computed context from previous requests. When the beginning of your prompt matches a cached prefix, Stacknet skips re-processing those tokens.
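Conceptually, the cache maps token prefixes to precomputed state, and a request reuses the longest cached prefix it matches. The sketch below is purely illustrative (the names `prefixCache`, `KVState`, and `longestCachedPrefix` are invented for clarity, not Stacknet internals):

```typescript
// Illustrative sketch of prefix caching; real node internals differ.
// The cache maps a serialized token prefix to its precomputed state.
type KVState = { tokensProcessed: number };

const prefixCache = new Map<string, KVState>();

// Find the longest cached prefix of the incoming token sequence,
// so only the remaining suffix needs fresh computation.
function longestCachedPrefix(tokens: number[]): number {
  for (let len = tokens.length; len > 0; len--) {
    if (prefixCache.has(tokens.slice(0, len).join(','))) {
      return len;
    }
  }
  return 0;
}

// Simulate: a prior request populated the cache with a shared prefix.
const shared = [1, 2, 3, 4];
prefixCache.set(shared.join(','), { tokensProcessed: shared.length });

const request = [1, 2, 3, 4, 9, 10]; // same prefix, new suffix
const reused = longestCachedPrefix(request);
console.log(reused);                  // 4 tokens skipped
console.log(request.length - reused); // 2 tokens to process fresh
```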
## How it works

### What gets cached
| Cacheable | Not cacheable |
|---|---|
| System prompts | Dynamic user messages |
| Few-shot examples | Streaming deltas |
| Tool definitions | Random/temperature-dependent outputs |
| Static context documents | Cross-node state |
## Usage
Caching is automatic — no special configuration required. The network transparently caches and reuses prefixes when beneficial.
### Optimizing for cache hits
Structure your prompts with static content first:
```js
const response = await client.chat.completions.create({
  model: 'preview',
  messages: [
    // Static — will be cached after first request
    {
      role: 'system',
      content: 'You are a helpful assistant specialized in...'
    },
    // Static context — also cached
    {
      role: 'user',
      content: 'Here is the reference document: [long document]'
    },
    { role: 'assistant', content: 'I have read the document.' },
    // Dynamic — only this part changes between requests
    {
      role: 'user',
      content: userQuestion // Changes each request
    }
  ]
})
```

### Cache indicators
The response includes cache metadata:
```json
{
  "usage": {
    "prompt_tokens": 1500,
    "completion_tokens": 200,
    "prompt_tokens_details": {
      "cached_tokens": 1200
    }
  }
}
```

## Distributed caching
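Wherever a request is served, the `cached_tokens` field makes cache effectiveness easy to measure per request. A minimal sketch, assuming the response shape shown in the example above (adjust the field access if your SDK wraps it differently):

```typescript
// Compute the fraction of prompt tokens served from cache, using the
// usage metadata shape from the example response above (an assumption,
// not a verified SDK type).
interface Usage {
  prompt_tokens: number;
  completion_tokens: number;
  prompt_tokens_details?: { cached_tokens: number };
}

function cacheHitRate(usage: Usage): number {
  const cached = usage.prompt_tokens_details?.cached_tokens ?? 0;
  return usage.prompt_tokens > 0 ? cached / usage.prompt_tokens : 0;
}

const usage: Usage = {
  prompt_tokens: 1500,
  completion_tokens: 200,
  prompt_tokens_details: { cached_tokens: 1200 },
};

console.log(cacheHitRate(usage)); // 0.8
```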
Across Stacknet, caching operates at the node level:
- Each aISP node maintains its own semantic cache shard
- Warm affinity routing preferentially routes repeat requests to the same node
- Cache hits improve aISP efficiency and reduce user costs
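One way to picture warm affinity routing is deterministic node selection keyed on the static prompt prefix: identical prefixes always land on the same node, where that node's cache shard is already warm. This is an illustrative sketch, not the actual aISP routing algorithm:

```typescript
// Illustrative sketch of warm affinity routing (not the real aISP
// implementation): hash the static prefix so repeat requests with the
// same prefix route to the same node.
function hashPrefix(prefix: string): number {
  let h = 0;
  for (let i = 0; i < prefix.length; i++) {
    h = (h * 31 + prefix.charCodeAt(i)) >>> 0; // simple 32-bit rolling hash
  }
  return h;
}

function pickNode(staticPrefix: string, nodes: string[]): string {
  return nodes[hashPrefix(staticPrefix) % nodes.length];
}

const nodes = ['node-a', 'node-b', 'node-c'];
const systemPrompt = 'You are a helpful assistant specialized in...';

// The same prefix deterministically routes to the same node,
// keeping that node's cache shard warm.
console.log(pickNode(systemPrompt, nodes) === pickNode(systemPrompt, nodes)); // true
```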