Mixture of Models (MoM)
MoM is Stacknet’s ensemble inference engine, and can be configured in any model layer. Instead of routing your prompt to a single model, MoM dispatches it to N candidate models in parallel; a judge model then evaluates the candidates and surfaces the best response. This improves answer quality at the cost of added latency.
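Conceptually, the dispatch-and-judge flow looks like the sketch below. This is an illustration of the pattern, not Stacknet’s actual implementation; the candidate models and the length-based judge are stand-ins.

```typescript
// Minimal sketch of dispatch-and-judge (illustrative only, not Stacknet's
// implementation). A "model" is any async function from prompt to answer;
// the judge returns the index of the best candidate answer.
type Model = (prompt: string) => Promise<string>;
type Judge = (prompt: string, answers: string[]) => Promise<number>;

async function mixtureOfModels(
  prompt: string,
  candidates: Model[],
  judge: Judge,
): Promise<string> {
  // Fan out: query all N candidates in parallel.
  const answers = await Promise.all(candidates.map((m) => m(prompt)));
  // Judge: evaluate the candidates and surface only the best one.
  const best = await judge(prompt, answers);
  return answers[best];
}

// Stub models and a toy length-based "judge" to demonstrate the control flow.
const stubA: Model = async () => 'short answer';
const stubB: Model = async () => 'a longer, more detailed answer';
const longest: Judge = async (_prompt, answers) =>
  answers.indexOf(answers.reduce((a, b) => (b.length > a.length ? b : a)));
```

With the stubs above, `mixtureOfModels('example prompt', [stubA, stubB], longest)` resolves to stubB’s answer, since the toy judge prefers the longer response.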
How MoM works
Trigger modes
MoM can be activated in three ways, including a reasoning-effort-based pattern that is compatible with the OpenAI API:
- Explicit: Set `"model": "mom-duce"` (prefix any layer with `mom-`)
- Flag: Add `"mom": true` to your request body
- Auto: Set `"reasoning_effort": "high"` with `"sequentialThinking": false`
The `mom-`-prefixed model names do not appear in the `/v1/models` list.
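For reference, the three trigger modes translate to request bodies like these. Field names come from the list above; the base model name `preview` and the prompt text are placeholders.

```typescript
// The three MoM trigger modes as request bodies. Field names are from the
// docs above; the base model name and prompt text are placeholders.
const messages = [{ role: 'user', content: 'Your question here' }];

// 1. Explicit: mom- prefix on the model name.
const explicit = { model: 'mom-duce', messages };

// 2. Flag: mom: true alongside any model.
const flagged = { model: 'preview', messages, mom: true };

// 3. Auto: high reasoning effort with sequential thinking disabled.
const auto = {
  model: 'preview',
  messages,
  reasoning_effort: 'high',
  sequentialThinking: false,
};
```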
Example
```
curl https://stacknet.magma-rpc.com/v1/chat/completions \
  -H "Authorization: Bearer gk_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mom-duce",
    "messages": [
      {"role": "user", "content": "Compare microservices vs monolith for a 5-person startup"}
    ]
  }'
```

Or with the flag:
```typescript
const response = await client.chat.completions.create({
  model: 'preview',
  messages: [{ role: 'user', content: 'Your complex question here' }],
  mom: true,
})
```

When to use MoM
Use MoM when:
- Making high-stakes decisions that benefit from multiple perspectives
- Complex reasoning where models may disagree
- Research and analysis where thoroughness matters more than speed
Skip MoM when:
- Simple queries with clear answers
- Latency-sensitive applications
- High-volume, low-complexity tasks
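A simple client-side gate based on these criteria might look like the following. The helper, its field names, and its thresholds are hypothetical illustrations, not part of Stacknet.

```typescript
// Hypothetical helper that decides whether to enable MoM for a request,
// mirroring the use/skip criteria above. Everything here is illustrative.
interface RequestProfile {
  highStakes: boolean;       // decision benefits from multiple perspectives
  latencySensitive: boolean; // user is waiting interactively
  complexity: 'low' | 'medium' | 'high';
}

function shouldUseMoM(p: RequestProfile): boolean {
  if (p.latencySensitive) return false;     // skip: latency-sensitive
  if (p.complexity === 'low') return false; // skip: simple queries, clear answers
  // use: high-stakes decisions or complex reasoning
  return p.highStakes || p.complexity === 'high';
}
```

A caller could then set `mom: shouldUseMoM(profile)` in the request body to toggle MoM per request.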