
Mixture of Models (MoM)

MoM is Stacknet’s ensemble inference engine, and can be enabled on any model layer. Instead of routing to a single model, MoM dispatches your prompt to N candidate models in parallel; a judge model then evaluates the candidates and surfaces the best response. This typically produces higher-quality answers at the cost of additional latency.
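The fan-out-and-judge flow can be sketched as follows. This is a conceptual illustration only, not Stacknet's internals: the candidate models and the judge's scoring are stubbed stand-ins.

```javascript
// Conceptual sketch of MoM: dispatch to N candidates in parallel,
// then have a judge pick the best response. Candidate models and the
// judge heuristic below are hypothetical stubs for illustration.
const candidates = [
  async (prompt) => `answer A to: ${prompt}`,
  async (prompt) => `answer B to: ${prompt}`,
  async (prompt) => `answer C to: ${prompt}`,
];

// Hypothetical judge: score each candidate and return the top one.
async function judge(responses) {
  const scored = responses.map((text) => ({ text, score: text.length }));
  scored.sort((a, b) => b.score - a.score);
  return scored[0].text;
}

async function mixtureOfModels(prompt) {
  // Fan out to all candidates concurrently, then surface the judged best.
  const responses = await Promise.all(candidates.map((m) => m(prompt)));
  return judge(responses);
}
```

Because all candidate calls run concurrently, end-to-end latency is roughly the slowest candidate plus the judge pass, rather than the sum of all calls.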

How MoM works

Trigger modes

MoM can be activated in three ways, including a reasoning-effort-based pattern that is OpenAI API-compatible:

  1. Explicit: Set "model": "mom-duce" (prefix any layer with mom-)
  2. Flag: Add "mom": true to your request body
  3. Auto: Set "reasoning_effort": "high" with "sequentialThinking": false

The prefixed mom- model names do not appear in the /v1/models list.
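The three trigger modes above differ only in the request body. A side-by-side sketch, using the field names from this page (the bare layer name "duce" is inferred from the "mom-duce" example and may differ in your deployment):

```javascript
// 1. Explicit: prefix the layer name with "mom-".
const explicitBody = {
  model: 'mom-duce',
  messages: [{ role: 'user', content: 'Your question' }],
};

// 2. Flag: keep the plain layer name and set "mom": true.
const flaggedBody = {
  model: 'duce', // assumed bare layer name; adjust for your deployment
  messages: [{ role: 'user', content: 'Your question' }],
  mom: true,
};

// 3. Auto: high reasoning effort with sequential thinking disabled.
const autoBody = {
  model: 'duce',
  messages: [{ role: 'user', content: 'Your question' }],
  reasoning_effort: 'high',
  sequentialThinking: false,
};
```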

Example

curl https://stacknet.magma-rpc.com/v1/chat/completions \
  -H "Authorization: Bearer gk_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mom-duce",
    "messages": [
      {"role": "user", "content": "Compare microservices vs monolith for a 5-person startup"}
    ]
  }'

Or with the flag:

const response = await client.chat.completions.create({
  model: 'preview',
  messages: [{ role: 'user', content: 'Your complex question here' }],
  mom: true,
})

When to use MoM

Use MoM when:

  • Making high-stakes decisions that benefit from multiple perspectives
  • Complex reasoning where models may disagree
  • Research and analysis where thoroughness matters more than speed

Skip MoM when:

  • Simple queries with clear answers
  • Latency-sensitive applications
  • High-volume, low-complexity tasks
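The guidance above can be encoded as a small routing helper. This is a hypothetical sketch (the flags and the helper name are not part of the Stacknet API): latency sensitivity vetoes MoM, otherwise high stakes or complex reasoning opt in.

```javascript
// Hypothetical helper applying the "when to use MoM" guidance:
// never for latency-sensitive calls; otherwise only when the task
// is high-stakes or involves complex reasoning.
function shouldUseMoM({
  highStakes = false,
  complexReasoning = false,
  latencySensitive = false,
} = {}) {
  if (latencySensitive) return false;
  return highStakes || complexReasoning;
}

// Example: spread the result into a request body as the "mom" flag.
const body = {
  model: 'preview',
  messages: [{ role: 'user', content: 'Your question' }],
  mom: shouldUseMoM({ highStakes: true }),
};
```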