
Prompt Caching

Prompt caching reduces latency and cost by reusing computed context from previous requests. When the beginning of your prompt matches a cached prefix, Stacknet skips re-processing those tokens.

How it works

```
Request 1:  [System prompt + context] + [User message A]
            Full computation → cache prefix stored

Request 2:  [System prompt + context] + [User message B]
            ┌───────────┴───────────┐
            │ Cached prefix         │ New tokens only
            │ (skip computation)    │ (computed fresh)
            └───────────┬───────────┘
                 Faster response
```
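The prefix-matching step above can be sketched as follows. This is an illustrative model, not Stacknet's internal implementation; the function name and token representation are assumptions for the example:

```typescript
// Illustrative sketch: given the token sequence stored by a previous
// request and a new request's tokens, count how many leading tokens
// match and can therefore be served from cache.
function cachedPrefixLength(cached: string[], request: string[]): number {
  let n = 0;
  while (n < cached.length && n < request.length && cached[n] === request[n]) {
    n++;
  }
  return n;
}

// Request 1 populates the cache; request 2 shares its prefix.
const request1 = ["SYS", "CTX", "CTX", "userA"];
const request2 = ["SYS", "CTX", "CTX", "userB"];

const hit = cachedPrefixLength(request1, request2);
console.log(hit); // 3 — only "userB" is computed fresh
```

Only the leading run of identical tokens counts: as soon as the sequences diverge, everything after that point must be recomputed.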

What gets cached

| Cacheable                | Not cacheable                        |
| ------------------------ | ------------------------------------ |
| System prompts           | Dynamic user messages                |
| Few-shot examples        | Streaming deltas                     |
| Tool definitions         | Random/temperature-dependent outputs |
| Static context documents | Cross-node state                     |

Usage

Caching is automatic — no special configuration required. The network transparently caches and reuses prefixes when beneficial.

Optimizing for cache hits

Structure your prompts with static content first:

```typescript
const response = await client.chat.completions.create({
  model: 'preview',
  messages: [
    // Static — will be cached after first request
    { role: 'system', content: 'You are a helpful assistant specialized in...' },
    // Static context — also cached
    { role: 'user', content: 'Here is the reference document: [long document]' },
    { role: 'assistant', content: 'I have read the document.' },
    // Dynamic — only this part changes between requests
    { role: 'user', content: userQuestion } // Changes each request
  ]
})
```
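To see why static-first ordering matters, consider how much of two consecutive requests actually matches. A sketch, using a hypothetical helper that is not part of the client library:

```typescript
// Illustrative: count how many leading messages two chat requests share.
// Only this leading run of identical messages can be reused from cache,
// so dynamic content should always come last.
type Message = { role: string; content: string };

function sharedMessagePrefix(a: Message[], b: Message[]): number {
  let n = 0;
  while (
    n < a.length && n < b.length &&
    a[n].role === b[n].role && a[n].content === b[n].content
  ) {
    n++;
  }
  return n;
}

// The static part is identical across requests.
const staticPart: Message[] = [
  { role: 'system', content: 'You are a helpful assistant specialized in...' },
  { role: 'user', content: 'Here is the reference document: [long document]' },
  { role: 'assistant', content: 'I have read the document.' },
];

const reqA = [...staticPart, { role: 'user', content: 'Question A' }];
const reqB = [...staticPart, { role: 'user', content: 'Question B' }];

console.log(sharedMessagePrefix(reqA, reqB)); // 3 — the entire static part is reusable
```

If the dynamic question were placed first instead, the shared prefix would be zero and nothing could be reused.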

Cache indicators

The response includes cache metadata:

```json
{
  "usage": {
    "prompt_tokens": 1500,
    "completion_tokens": 200,
    "prompt_tokens_details": {
      "cached_tokens": 1200
    }
  }
}
```
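You can use this metadata to track how effective your prompt structure is. A minimal sketch, assuming the field names shown above; the ratio calculation is illustrative:

```typescript
// Shape of the usage object from the response metadata above.
interface Usage {
  prompt_tokens: number;
  completion_tokens: number;
  prompt_tokens_details: { cached_tokens: number };
}

// Fraction of the prompt that was served from cache rather than recomputed.
function cacheHitRatio(usage: Usage): number {
  return usage.prompt_tokens_details.cached_tokens / usage.prompt_tokens;
}

const usage: Usage = {
  prompt_tokens: 1500,
  completion_tokens: 200,
  prompt_tokens_details: { cached_tokens: 1200 },
};

console.log(cacheHitRatio(usage)); // 0.8 — 80% of the prompt tokens were cached
```

A ratio that drops between requests usually means the static prefix changed, for example because a timestamp or session ID was injected near the top of the prompt.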

Distributed caching

Across Stacknet, caching operates at the node level:

  • Each aISP node maintains its own semantic cache shard
  • Warm affinity routing preferentially routes repeat requests to the same node
  • Cache hits improve aISP efficiency and reduce user costs
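Warm affinity routing can be pictured as mapping a request's static prefix to a node deterministically, so repeat requests land where the cache is already warm. A hypothetical sketch; the hash function and node list are illustrative, not Stacknet's actual routing scheme:

```typescript
// Simple rolling hash over the static prefix (illustrative only).
function hashPrefix(prefix: string): number {
  let h = 0;
  for (const ch of prefix) {
    h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  }
  return h;
}

// Map a prefix to a node: the same prefix always picks the same node,
// so that node's cache shard stays warm for repeat requests.
function pickNode(prefix: string, nodes: string[]): string {
  return nodes[hashPrefix(prefix) % nodes.length];
}

const nodes = ['node-a', 'node-b', 'node-c'];
const prefix = 'SYSTEM PROMPT + reference document';

// Deterministic: repeat requests with the same prefix route identically.
console.log(pickNode(prefix, nodes) === pickNode(prefix, nodes)); // true
```

Real deployments would also account for node load and failure, typically with consistent hashing so that adding or removing a node disturbs as few prefix-to-node assignments as possible.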