How It Works
A mental model for understanding ark-operator before diving into the details.
The core idea
Kubernetes uses Deployments to manage containers. ark-operator uses ArkAgents to manage LLM-based agents.
The analogy is direct:
| Kubernetes | ark-operator |
|---|---|
| Container image | Model + system prompt + MCP tools |
| Deployment | ArkAgent |
| Service | ArkService |
| ConfigMap | ArkSettings |
| CronJob / Ingress | ArkEvent |
When you define an ArkAgent, you declare the desired state: which model to use, what to tell it, what tools to give it, and how many replicas to run. The operator’s job is to make reality match that declaration — exactly like a Deployment controller reconciles pod replicas.
The operator reconcile loop
The operator watches your ArkAgent resources. When something changes — or on a periodic resync — it runs a reconcile:
```
Watch ArkAgent CR
    │
    ▼
Reconcile()
    ├── How many agent pods are currently running?
    ├── Compare with spec.replicas
    ├── Scale up → create new agent pods
    ├── Scale down → delete excess pods
    ├── Inject env vars: MODEL, SYSTEM_PROMPT, MCP_SERVERS
    ├── Run semantic liveness checks
    └── Update .status.readyReplicas
```
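The scale-up/scale-down decision at the heart of the loop is a simple diff between desired and observed state. A minimal Python sketch (function and pod names are illustrative, not ark-operator's actual code):

```python
def reconcile(desired_replicas, running_pods):
    """Sketch of the reconcile scaling decision.

    `running_pods` is a list of current pod names. Returns a list of
    (action, pod_name) pairs. Illustrative only -- the real operator
    manipulates Kubernetes objects, not strings.
    """
    diff = desired_replicas - len(running_pods)
    if diff > 0:
        # Scale up: create the missing agent pods.
        return [("create", f"agent-pod-{i}")
                for i in range(len(running_pods), desired_replicas)]
    if diff < 0:
        # Scale down: delete the excess pods.
        return [("delete", name) for name in running_pods[desired_replicas:]]
    return []  # already converged, nothing to do
```

When desired and observed state already match, the reconcile is a no-op, which is why the periodic resync is cheap.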
The operator never modifies your YAML. It only manages the backing Kubernetes resources (Deployments, Pods) that bring the desired state to life.
What runs inside an agent pod
Each agent pod runs the ark-runtime binary. Its job is simple: poll a task queue, call the LLM, and return results.
```
agent pod startup
    │
    ├── Read config from env vars (MODEL, SYSTEM_PROMPT, MCP_SERVERS, ...)
    ├── Connect to MCP tool servers
    └── Poll task queue (Redis Streams)
            │
            ▼
       Task arrives
            │
            ├── Build prompt from system prompt + task input
            ├── Call LLM provider (tool-use loop until model stops calling tools)
            └── Return result to queue
```
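The "tool-use loop until the model stops calling tools" step can be sketched as follows. This is a hypothetical illustration, not ark-runtime's code: `call_llm` stands in for the LLM provider client, `tools` for the connected MCP servers, and the message shapes are assumptions.

```python
def run_task(call_llm, tools, system_prompt, task_input, max_turns=10):
    """Sketch of the tool-use loop: call the LLM, execute any tool calls
    it requests, feed results back, and stop when it answers without
    requesting tools."""
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": task_input},
    ]
    for _ in range(max_turns):
        reply = call_llm(messages)
        tool_calls = reply.get("tool_calls", [])
        if not tool_calls:
            return reply["content"]  # model stopped calling tools: done
        messages.append(reply)
        for call in tool_calls:
            # Dispatch each requested tool and append its result.
            result = tools[call["name"]](**call["args"])
            messages.append({"role": "tool", "name": call["name"],
                             "content": result})
    raise RuntimeError("tool-use loop did not terminate")
```

The `max_turns` cap is a common safeguard against a model that never stops requesting tools.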
The operator injects all configuration as environment variables. The agent runtime has no knowledge of Kubernetes — it just reads env vars and processes tasks. This means you can run the same runtime code locally with `ark run`.
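A sketch of what that env-var-only configuration could look like on the runtime side. The variable names `MODEL`, `SYSTEM_PROMPT`, and `MCP_SERVERS` come from the diagrams above; the defaults and the comma-separated format for `MCP_SERVERS` are illustrative assumptions:

```python
import os

def load_agent_config(environ=os.environ):
    """Sketch: build the runtime config purely from env vars, so the
    same code works under the operator and locally.
    Not ark-runtime's actual parsing logic."""
    return {
        "model": environ["MODEL"],
        "system_prompt": environ.get("SYSTEM_PROMPT", ""),
        # Assumed format: comma-separated list of MCP server URLs.
        "mcp_servers": [s for s in environ.get("MCP_SERVERS", "").split(",") if s],
    }
```

Because the function takes any mapping, tests (and local runs) can pass a plain dict instead of the real environment.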
How tasks flow
```
External trigger (webhook / cron / ark trigger)
    │
    ▼
ArkEvent → dispatches ArkTeam run
    │
    ▼
ArkTeam controller → submits step tasks to Redis Streams
    │
    ▼
Agent pods poll their queue → call LLM → write result back
    │
    ▼
ArkTeam controller reads results → advances DAG → submits next steps
    │
    ▼
All steps done → ArkTeam phase = Succeeded → output available
```
The Redis Streams queue is the boundary between the operator and the agent pods. Trace context is propagated across this boundary, so a single OpenTelemetry trace spans the full path from trigger to final output.
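Propagating trace context across a queue boundary usually means carrying it inside the task payload, so the consumer resumes the producer's trace rather than starting a new one. A minimal sketch with an in-memory list standing in for Redis Streams (field names like `"trace"` are illustrative assumptions):

```python
def submit_task(queue, payload, trace_context):
    """Sketch: the producer attaches its trace context to the task so
    the trace continues on the consumer side of the queue."""
    queue.append({"payload": payload, "trace": dict(trace_context)})

def poll_task(queue):
    """Sketch: the consumer extracts the context and would resume the
    same OpenTelemetry trace before doing any work."""
    task = queue.pop(0)
    return task["payload"], task["trace"]
```

With a real tracer, the producer would inject a W3C `traceparent` header into the payload and the consumer would extract it; the mechanics are the same.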
ArkTeam: the execution primitive
An ArkTeam is the unit of work. It defines roles (agents) and either:
- A pipeline — a DAG of steps executed in declared order (like a CI workflow)
- Dynamic delegation — agents decide at runtime what to delegate to whom (like an org chart)
In pipeline mode, template expressions connect step outputs to the next step’s inputs:
```yaml
pipeline:
  - role: research
    inputs:
      prompt: ""
  - role: summarize
    dependsOn: [research]
    inputs:
      content: ""
```
The operator tracks each step’s phase (Pending → Running → Succeeded/Failed) and orchestrates the DAG. You never write scheduling logic.
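The core of that DAG orchestration is deciding which steps are runnable: a step is ready once every step it depends on has succeeded. A sketch of that rule (not the controller's actual code):

```python
def ready_steps(steps, done):
    """Sketch of DAG advancement: `steps` maps role -> list of
    dependencies (mirroring `dependsOn`), `done` is the set of roles
    whose phase is Succeeded. Returns the roles that may run now."""
    return [role for role, deps in steps.items()
            if role not in done and all(d in done for d in deps)]
```

Running this after each step completes, with the updated `done` set, walks the pipeline in dependency order without any hand-written scheduling logic.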
What the operator does not do
- It does not make LLM API calls — agent pods do
- It does not parse or validate LLM outputs (unless a step has
outputSchema) - It does not route external traffic — use
ArkServicefor that - It does not store agent memory — use
ArkMemoryfor that
Next steps
- ArkAgent concept — agent spec walkthrough
- ArkTeam concept — pipeline and dynamic delegation in depth
- Building a Pipeline — step-by-step guide