Multi-Model Teams

Each role in an ArkTeam can use a different model. You can mix Ollama, OpenAI, and Anthropic in a single pipeline — routing each step to the model best suited for that task.

Common use cases:

  • Use a small local model for cheap, fast tasks (summarization, formatting) and a larger model only for reasoning-heavy steps
  • Route to OpenAI for tasks that require structured JSON and Anthropic for long-form writing
  • Pin a specific model version to a specific role for reproducibility

How it works

Each inline role definition has its own model field. The agent runtime for that role auto-detects the provider from the model name — or you can specify it explicitly with AGENT_PROVIDER via an ArkAgent reference.

For inline roles, provider detection uses the same rules as everywhere else:

  • claude-* → Anthropic
  • gpt-*/o* → OpenAI
  • anything else → OpenAI-compatible (reads OPENAI_BASE_URL for Ollama)
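
For reference, the detection rules above can be sketched as a small function. This is an illustrative helper, not part of the Ark CLI itself:

```python
def detect_provider(model: str) -> str:
    """Map a model name to a provider using the rules above.

    Order matters: claude-* is checked before the gpt-*/o* rule,
    and anything unmatched falls through to the OpenAI-compatible
    path (which reads OPENAI_BASE_URL, e.g. for Ollama).
    """
    if model.startswith("claude-"):
        return "anthropic"
    if model.startswith("gpt-") or model.startswith("o"):
        return "openai"
    return "openai-compatible"
```

Note that the o* rule is a plain prefix match, so any model name beginning with "o" routes to OpenAI.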

Example: mixed-model content team

This pipeline uses llama3.2 (Ollama) for the researcher and gpt-4o for the writer, since writing quality is more critical than research breadth:

apiVersion: arkonis.dev/v1alpha1
kind: ArkTeam
metadata:
  name: mixed-content-team
  namespace: my-org
spec:
  output: "{{ .steps.writer.output }}"
  roles:
    - name: researcher
      model: llama3.2                # fast and cheap — Ollama
      systemPrompt: |
        You are a research assistant. Summarize key facts about the topic.
        Be concise — bullet points are fine.
    - name: writer
      model: gpt-4o                  # higher quality writing
      systemPrompt: |
        You are a skilled writer. Turn the research bullets into a polished
        500-word article for a technical audience.
  pipeline:
    - role: researcher
      inputs:
        prompt: "Research: {{ .input.topic }}"
    - role: writer
      dependsOn: [researcher]
      inputs:
        research: "{{ .steps.researcher.output }}"
        prompt: "Write a polished article from these notes."
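
The {{ .steps.researcher.output }} references are resolved before each step runs: the runtime substitutes prior step outputs into the next step's inputs. A minimal sketch of that substitution, assuming Go-template-style dotted paths into a context dictionary (the real renderer supports much more):

```python
import re

def render(template: str, ctx: dict) -> str:
    """Replace {{ .a.b.c }} references with values looked up in ctx."""
    def lookup(match: re.Match) -> str:
        value = ctx
        for key in match.group(1).split("."):
            value = value[key]
        return str(value)
    return re.sub(r"\{\{\s*\.([\w.]+)\s*\}\}", lookup, template)

# The writer step's inputs are rendered against the step context:
ctx = {"steps": {"researcher": {"output": "- fact one\n- fact two"}}}
research_input = render("{{ .steps.researcher.output }}", ctx)
```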

To run this locally, both providers need to be available:

# Ollama running locally + OpenAI API key
OPENAI_BASE_URL=http://localhost:11434/v1 \
OPENAI_API_KEY=sk-... \
  ark run mixed-content-team.yaml --watch \
  --input topic="distributed systems consensus"

The CLI auto-detects the provider per step: llama3.2 requests go to OPENAI_BASE_URL, while gpt-4o requests go to api.openai.com.



Example: Anthropic + Ollama

roles:
  - name: coordinator
    model: claude-sonnet-4-20250514   # Anthropic — strong reasoning
    systemPrompt: "You are the coordinator. Break down the task."
    canDelegate: [worker]
  - name: worker
    model: llama3.2                   # Ollama — bulk work
    systemPrompt: "You are a worker. Execute the given task precisely."
    canDelegate: []
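
The canDelegate lists form a delegation graph: the coordinator may hand work to the worker, and the worker delegates to no one. A quick sanity check that every delegation target names a defined role (an illustrative validator, not shipped with Ark):

```python
def validate_delegation(roles: list[dict]) -> list[str]:
    """Return error messages for canDelegate entries naming unknown roles."""
    names = {r["name"] for r in roles}
    errors = []
    for r in roles:
        for target in r.get("canDelegate", []):
            if target not in names:
                errors.append(f"{r['name']} delegates to unknown role {target!r}")
    return errors

roles = [
    {"name": "coordinator", "canDelegate": ["worker"]},
    {"name": "worker", "canDelegate": []},
]
assert validate_delegation(roles) == []
```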

Using ArkAgent references for per-role provider config

If a role needs explicit provider settings (e.g., a custom OPENAI_BASE_URL), define a standalone ArkAgent and reference it from the team:

apiVersion: arkonis.dev/v1alpha1
kind: ArkAgent
metadata:
  name: local-worker
  namespace: my-org
spec:
  model: llama3.2:70b
  systemPrompt: "You are a local worker agent."
  # Provider env vars are set via agentExtraEnv in Helm or the arkonis-api-keys Secret
---
apiVersion: arkonis.dev/v1alpha1
kind: ArkTeam
metadata:
  name: hybrid-team
  namespace: my-org
spec:
  roles:
    - name: planner
      model: gpt-4o
      systemPrompt: "Plan the work."
    - name: executor
      arkAgent: local-worker    # uses the separate ArkAgent with its own config
  pipeline:
    - role: planner
      inputs:
        prompt: "Plan: {{ .input.task }}"
    - role: executor
      dependsOn: [planner]
      inputs:
        plan: "{{ .steps.planner.output }}"
  output: "{{ .steps.executor.output }}"
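
Conceptually, resolving a role's effective model is a two-branch lookup: an inline model field wins when present, otherwise the referenced ArkAgent's spec supplies it. A sketch with hypothetical data shapes (the real controller reads these objects from the cluster):

```python
def resolve_model(role: dict, agents: dict[str, dict]) -> str:
    """Pick the model for a team role: inline field or ArkAgent reference."""
    if "model" in role:
        return role["model"]
    # Fall back to the referenced ArkAgent, e.g. the local-worker above.
    agent = agents[role["arkAgent"]]
    return agent["spec"]["model"]

agents = {"local-worker": {"spec": {"model": "llama3.2:70b"}}}
assert resolve_model({"name": "executor", "arkAgent": "local-worker"}, agents) == "llama3.2:70b"
assert resolve_model({"name": "planner", "model": "gpt-4o"}, agents) == "gpt-4o"
```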

Cost optimization patterns

Use smaller models for structured extraction. A 3B-parameter model can reliably extract JSON from text; reserve larger models for open-ended generation.

Route summarization to the fastest model. Summarization is one of the cheapest tasks to execute well — llama3.2:1b is often sufficient.

Set per-role token limits. Use limits.maxTokensPerCall to cap the steps that run on expensive models:

roles:
  - name: analyst
    model: claude-sonnet-4-20250514
    limits:
      maxTokensPerCall: 2000    # tight limit on the expensive model
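
The effect of maxTokensPerCall can be pictured as clamping the per-request completion budget before the call goes out. An illustrative sketch, with the limit field named as in the YAML above and a made-up default:

```python
def effective_max_tokens(role: dict, requested: int, default: int = 4096) -> int:
    """Clamp a requested completion budget to the role's maxTokensPerCall."""
    limit = role.get("limits", {}).get("maxTokensPerCall", default)
    return min(requested, limit)

analyst = {"limits": {"maxTokensPerCall": 2000}}
assert effective_max_tokens(analyst, requested=8000) == 2000   # clamped
assert effective_max_tokens(analyst, requested=1500) == 1500   # under the cap
```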

Apache 2.0 · ARKONIS