# Multi-Model Teams

Each role in an ArkTeam can use a different model. You can mix Ollama, OpenAI, and Anthropic in a single pipeline — routing each step to the model best suited for that task.
Common use cases:
- Use a small local model for cheap, fast tasks (summarization, formatting) and a larger model only for reasoning-heavy steps
- Route to OpenAI for tasks that require structured JSON and Anthropic for long-form writing
- Pin a specific model version to a specific role for reproducibility
## How it works
Each inline role definition has its own model field. The agent runtime for that role auto-detects the provider from the model name — or you can specify it explicitly with AGENT_PROVIDER via an ArkAgent reference.
For inline roles, provider detection uses the same rules as everywhere else:

- `claude-*` → Anthropic
- `gpt-*` / `o*` → OpenAI
- anything else → OpenAI-compatible (reads `OPENAI_BASE_URL` for Ollama)
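The detection rules above can be sketched as a small shell function — a hypothetical illustration of the matching logic, not the CLI's actual implementation:

```shell
# Sketch of the provider-detection rules (function name is illustrative)
detect_provider() {
  case "$1" in
    claude-*)  echo anthropic ;;
    gpt-*|o*)  echo openai ;;
    *)         echo openai-compatible ;;
  esac
}

detect_provider claude-sonnet-4-20250514  # → anthropic
detect_provider gpt-4o                    # → openai
detect_provider llama3.2                  # → openai-compatible
```

Note that the `o*` pattern is a prefix match, so any model name starting with `o` (such as `o3`) routes to OpenAI.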
## Example: mixed-model content team

This pipeline uses `llama3.2` (Ollama) for the researcher and `gpt-4o` for the writer, since writing quality is more critical than research breadth:
```yaml
apiVersion: arkonis.dev/v1alpha1
kind: ArkTeam
metadata:
  name: mixed-content-team
  namespace: my-org
spec:
  output: "{{ .steps.writer.output }}"
  roles:
    - name: researcher
      model: llama3.2 # fast and cheap — Ollama
      systemPrompt: |
        You are a research assistant. Summarize key facts about the topic.
        Be concise — bullet points are fine.
    - name: writer
      model: gpt-4o # higher quality writing
      systemPrompt: |
        You are a skilled writer. Turn the research bullets into a polished
        500-word article for a technical audience.
  pipeline:
    - role: researcher
      inputs:
        prompt: "Research: {{ .input.topic }}"
    - role: writer
      dependsOn: [researcher]
      inputs:
        research: "{{ .steps.researcher.output }}"
        prompt: "Write a polished article from these notes."
```
To run this locally, both providers need to be available:
```shell
# Ollama running locally + OpenAI API key
OPENAI_BASE_URL=http://localhost:11434/v1 \
OPENAI_API_KEY=sk-... \
ark run mixed-content-team.yaml --watch \
  --input topic="distributed systems consensus"
```
The CLI auto-detects the provider per step: `llama3.2` sends requests to `OPENAI_BASE_URL`, `gpt-4o` sends to `api.openai.com`.
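To confirm the local side is reachable before a run, Ollama's OpenAI-compatible endpoint can be queried directly (11434 is Ollama's default port; this requires a running Ollama instance):

```shell
# Lists locally available models via Ollama's OpenAI-compatible API
curl -s http://localhost:11434/v1/models
```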
## Example: Anthropic + Ollama
```yaml
roles:
  - name: coordinator
    model: claude-sonnet-4-20250514 # Anthropic — strong reasoning
    systemPrompt: "You are the coordinator. Break down the task."
    canDelegate: [worker]
  - name: worker
    model: llama3.2 # Ollama — bulk work
    systemPrompt: "You are a worker. Execute the given task precisely."
    canDelegate: []
```
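Running this variant locally mirrors the earlier invocation. Note the file name, the input key, and the `ANTHROPIC_API_KEY` variable name below are assumptions — this page does not show the Anthropic credential setup:

```shell
# Ollama for the worker + Anthropic API key for the coordinator
OPENAI_BASE_URL=http://localhost:11434/v1 \
ANTHROPIC_API_KEY=sk-ant-... \
ark run coordinator-team.yaml --watch \
  --input task="draft the release notes"
```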
## Using ArkAgent references for per-role provider config

If a role needs explicit provider settings (e.g., a custom `OPENAI_BASE_URL`), define a standalone ArkAgent and reference it from the team:
```yaml
apiVersion: arkonis.dev/v1alpha1
kind: ArkAgent
metadata:
  name: local-worker
  namespace: my-org
spec:
  model: llama3.2:70b
  systemPrompt: "You are a local worker agent."
  # Provider env vars are set via agentExtraEnv in Helm or the arkonis-api-keys Secret
---
apiVersion: arkonis.dev/v1alpha1
kind: ArkTeam
metadata:
  name: hybrid-team
  namespace: my-org
spec:
  roles:
    - name: planner
      model: gpt-4o
      systemPrompt: "Plan the work."
    - name: executor
      arkAgent: local-worker # uses the separate ArkAgent with its own config
  pipeline:
    - role: planner
      inputs:
        prompt: ""
    - role: executor
      dependsOn: [planner]
      inputs:
        plan: ""
  output: ""
```
## Cost optimization patterns
**Use smaller models for structured extraction.** A 3B-parameter model can reliably extract JSON from text. Reserve larger models for generation tasks.

**Route summarization to the fastest model.** Summarization is one of the cheapest tasks to execute well — `llama3.2:1b` is often sufficient.

**Set per-role token limits.** Set tighter `maxTokensPerCall` limits on roles that use expensive models:
```yaml
roles:
  - name: analyst
    model: claude-sonnet-4-20250514
    limits:
      maxTokensPerCall: 2000 # tight limit on the expensive model
```
## See also
- Providers concept — provider auto-detection, Ollama, custom endpoints
- Cost Management guide — token budgets and daily limits
- Building a Pipeline guide — pipeline structure and template expressions