Mixture of Models (MoM)
MoM is Stacknet’s ensemble inference engine, and can be configured in any model layer. Instead of routing your prompt to a single model, MoM dispatches it to N candidate models in parallel; a judge model then evaluates the candidates and surfaces the best response. This improves answer quality at the cost of added latency.
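Conceptually, the dispatch-and-judge flow looks like the sketch below. This is an illustration of the pattern, not Stacknet’s actual implementation; the candidate models and the length-based judge are stand-ins.

```typescript
// Minimal sketch of dispatch-and-judge (illustrative only, not Stacknet's
// implementation). A "model" is any async function from prompt to answer;
// the judge returns the index of the best candidate answer.
type Model = (prompt: string) => Promise<string>;
type Judge = (prompt: string, answers: string[]) => Promise<number>;

async function mixtureOfModels(
  prompt: string,
  candidates: Model[],
  judge: Judge,
): Promise<string> {
  // Fan out: query all N candidates in parallel.
  const answers = await Promise.all(candidates.map((m) => m(prompt)));
  // Judge: evaluate the candidates and surface only the best one.
  const best = await judge(prompt, answers);
  return answers[best];
}

// Stub models and a toy length-based "judge" to demonstrate the control flow.
const stubA: Model = async () => 'short answer';
const stubB: Model = async () => 'a longer, more detailed answer';
const longest: Judge = async (_prompt, answers) =>
  answers.indexOf(answers.reduce((a, b) => (b.length > a.length ? b : a)));
```

With the stubs above, `mixtureOfModels('example prompt', [stubA, stubB], longest)` resolves to stubB’s answer, since the toy judge prefers the longer response.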
How MoM works
Trigger modes
MoM can be activated in three ways, including a reasoning-effort-based pattern that is compatible with the OpenAI API:
- Explicit: Set `"model": "mom-duce"` (prefix any layer with `mom-`)
- Flag: Add `"mom": true` to your request body
- Auto: Set `"reasoning_effort": "high"` with `"sequentialThinking": false`
The `mom-`-prefixed model names do not appear in the `/v1/models` list.
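For reference, the three trigger modes translate to request bodies like these. Field names come from the list above; the base model name `preview` and the prompt text are placeholders.

```typescript
// The three MoM trigger modes as request bodies. Field names are from the
// docs above; the base model name and prompt text are placeholders.
const messages = [{ role: 'user', content: 'Your question here' }];

// 1. Explicit: mom- prefix on the model name.
const explicit = { model: 'mom-duce', messages };

// 2. Flag: mom: true alongside any model.
const flagged = { model: 'preview', messages, mom: true };

// 3. Auto: high reasoning effort with sequential thinking disabled.
const auto = {
  model: 'preview',
  messages,
  reasoning_effort: 'high',
  sequentialThinking: false,
};
```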
Example
```
curl https://stacknet.magma-rpc.com/v1/chat/completions \
  -H "Authorization: Bearer gk_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mom-duce",
    "messages": [
      {"role": "user", "content": "Compare microservices vs monolith for a 5-person startup"}
    ]
  }'
```

Or with the flag:
```typescript
const response = await client.chat.completions.create({
  model: 'preview',
  messages: [{ role: 'user', content: 'Your complex question here' }],
  mom: true,
})
```

When to use MoM
Use MoM when:
- Making high-stakes decisions that benefit from multiple perspectives
- Complex reasoning where models may disagree
- Research and analysis where thoroughness matters more than speed
Skip MoM when:
- Simple queries with clear answers
- Latency-sensitive applications
- High-volume, low-complexity tasks
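A simple client-side gate based on these criteria might look like the following. The helper, its field names, and its thresholds are hypothetical illustrations, not part of Stacknet.

```typescript
// Hypothetical helper that decides whether to enable MoM for a request,
// mirroring the use/skip criteria above. Everything here is illustrative.
interface RequestProfile {
  highStakes: boolean;       // decision benefits from multiple perspectives
  latencySensitive: boolean; // user is waiting interactively
  complexity: 'low' | 'medium' | 'high';
}

function shouldUseMoM(p: RequestProfile): boolean {
  if (p.latencySensitive) return false;     // skip: latency-sensitive
  if (p.complexity === 'low') return false; // skip: simple queries, clear answers
  // use: high-stakes decisions or complex reasoning
  return p.highStakes || p.complexity === 'high';
}
```

A caller could then set `mom: shouldUseMoM(profile)` in the request body to toggle MoM per request.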