Cost Management
ark-operator enforces token budgets at the infrastructure level — not in application code. Limits apply to every agent pod the operator creates, regardless of what the agent does.
Per-run token budget
Set spec.maxTokens on an ArkTeam to hard-stop a run when the budget is exceeded:
spec:
  maxTokens: 100000 # input + output tokens across all steps
When the total crosses this threshold, the operator sets the team phase to Failed with reason BudgetExceeded. Steps already running are allowed to finish; no new steps are submitted.
The budget covers all roles combined. To limit a specific role, set limits.maxTokensPerCall on the inline role or the referenced ArkAgent.
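The stop rule described above can be modeled in a few lines. This is an illustrative sketch, not the operator's code: `run_steps` and its arguments are hypothetical names, the team-level `Succeeded` phase is an assumption, and steps are assumed to run one at a time, so the only overshoot comes from the step in flight when the budget is crossed.

```python
def run_steps(step_costs, max_tokens):
    """Model the per-run budget: stop submitting steps once the total exceeds it.

    step_costs: tokens (input + output) each step would consume, in order.
    Returns (phase, reason, total_tokens, steps_completed).
    """
    total = 0
    completed = 0
    for cost in step_costs:
        if total > max_tokens:
            # Budget already crossed: no new steps are submitted.
            return "Failed", "BudgetExceeded", total, completed
        # A submitted step is allowed to finish, even if it crosses the budget.
        total += cost
        completed += 1
    if total > max_tokens:
        return "Failed", "BudgetExceeded", total, completed
    # "Succeeded" as a team phase is an assumption made for this sketch.
    return "Succeeded", None, total, completed


print(run_steps([40000, 50000, 12341, 30000], max_tokens=100000))
```

In this model the third step finishes and pushes the total to 102,341, and the fourth step is never submitted. That is why a reported total can sit above the configured budget.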
Per-call limits on individual agents
spec:
  roles:
    - name: researcher
      model: llama3.2
      limits:
        maxTokensPerCall: 8000 # max tokens for any single LLM call
        timeoutSeconds: 120 # abandon task after this duration
These limits are enforced by the agent runtime, which reads them from the AGENT_MAX_TOKENS and AGENT_TIMEOUT_SECONDS environment variables that the operator injects into each pod.
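For a custom agent runtime, reading the injected limits might look like the sketch below. Only the two variable names come from this page. The `load_limits` helper, the assumption that both values are plain integer strings, and the treatment of an unset variable as "no limit" are illustrative guesses.

```python
import os


def load_limits(env=None):
    """Read per-call limits from the environment; None means the variable is unset."""
    env = os.environ if env is None else env
    max_tokens = env.get("AGENT_MAX_TOKENS")
    timeout = env.get("AGENT_TIMEOUT_SECONDS")
    return {
        # Assumed to carry the role's maxTokensPerCall value.
        "max_tokens_per_call": int(max_tokens) if max_tokens else None,
        # Assumed to carry the role's timeoutSeconds value.
        "timeout_seconds": int(timeout) if timeout else None,
    }


# Example with an explicit mapping instead of the real process environment.
print(load_limits({"AGENT_MAX_TOKENS": "8000", "AGENT_TIMEOUT_SECONDS": "120"}))
```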
Rolling daily budget
spec.limits.maxDailyTokens caps token usage over a rolling 24-hour window. When the cap is hit, the operator scales all role replicas to zero. The team cannot process new tasks until enough usage ages out of the window for the total to drop below the limit.
spec:
  maxTokens: 100000 # per-run hard stop
  limits:
    maxDailyTokens: 500000 # rolling 24h cap
The operator checks token usage against the rolling window every reconcile cycle. When the window clears, replicas are automatically restored to their configured count — no manual intervention required.
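The rolling-window behavior can be pictured with a small model. This is a sketch under assumptions, not the operator's implementation: the `RollingBudget` class, its method names, and the choice to track usage as timestamped events are all hypothetical.

```python
from collections import deque

WINDOW_SECONDS = 24 * 60 * 60  # rolling 24-hour window


class RollingBudget:
    def __init__(self, max_daily_tokens):
        self.max_daily_tokens = max_daily_tokens
        self.events = deque()  # (timestamp_seconds, tokens), oldest first

    def record(self, timestamp, tokens):
        self.events.append((timestamp, tokens))

    def usage(self, now):
        # Drop events that have aged out of the 24-hour window.
        while self.events and self.events[0][0] <= now - WINDOW_SECONDS:
            self.events.popleft()
        return sum(tokens for _, tokens in self.events)

    def desired_replicas(self, now, configured):
        # Zero while the cap is hit; the configured count once usage is below it.
        return 0 if self.usage(now) >= self.max_daily_tokens else configured


budget = RollingBudget(max_daily_tokens=500000)
budget.record(0, 300000)
budget.record(3600, 250000)
print(budget.desired_replicas(now=7200, configured=2))   # cap hit
print(budget.desired_replicas(now=86400, configured=2))  # first event has aged out
```

Calling `desired_replicas` on every reconcile cycle mirrors the check described above: the replica count returns to its configured value as soon as older usage leaves the window.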
Token usage tracking
Token usage is accumulated per step in ArkTeam.status:
status:
  steps:
    - name: researcher
      phase: Succeeded
      tokenUsage:
        inputTokens: 320
        outputTokens: 1240
        totalTokens: 1560
    - name: writer
      phase: Succeeded
      tokenUsage:
        inputTokens: 1240
        outputTokens: 612
        totalTokens: 1852
  totalTokenUsage:
    inputTokens: 1560
    outputTokens: 1852
    totalTokens: 3412
View current usage:
kubectl get arkteam content-pipeline -n my-org \
  -o jsonpath='{.status.totalTokenUsage}'
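To cross-check the totals against the per-step figures, the status can be summed outside the cluster. The sketch below is illustrative: `sum_step_usage` is a hypothetical helper, and the sample data is the status block from this section rewritten as JSON.

```python
import json


def sum_step_usage(status):
    """Add up tokenUsage across status.steps."""
    totals = {"inputTokens": 0, "outputTokens": 0, "totalTokens": 0}
    for step in status.get("steps", []):
        usage = step.get("tokenUsage", {})
        for key in totals:
            totals[key] += usage.get(key, 0)
    return totals


# Sample data: the status block from this section, as JSON.
status = json.loads("""
{
  "steps": [
    {"name": "researcher", "phase": "Succeeded",
     "tokenUsage": {"inputTokens": 320, "outputTokens": 1240, "totalTokens": 1560}},
    {"name": "writer", "phase": "Succeeded",
     "tokenUsage": {"inputTokens": 1240, "outputTokens": 612, "totalTokens": 1852}}
  ],
  "totalTokenUsage": {"inputTokens": 1560, "outputTokens": 1852, "totalTokens": 3412}
}
""")

computed = sum_step_usage(status)
print(computed == status["totalTokenUsage"])  # True for the sample above
```

Against a live team, the same dictionary could be built from the `status` field of `kubectl get arkteam content-pipeline -n my-org -o json`.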
Estimating costs before running
Use --dry-run to estimate token cost before a real run:
ark run team.yaml --provider mock --dry-run --input topic="test"
This parses the team definition, resolves templates with the provided inputs, and estimates prompt length — without making any API calls.
Practical limits for common models
| Model | Input price | Output price | Suggested maxTokensPerCall |
|---|---|---|---|
| llama3.2 (Ollama) | Free | Free | 8000 (no cost concern) |
| gpt-4o | ~$2.50/1M | ~$10/1M | 4000–8000 |
| claude-sonnet-4-20250514 | ~$3/1M | ~$15/1M | 4000–8000 |
| gpt-4o-mini | ~$0.15/1M | ~$0.60/1M | 8000–16000 |
These are rough estimates. Check provider pricing pages for current rates.
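The prices in the table translate into dollar figures with simple arithmetic. The sketch below uses the approximate rates listed above, not live pricing. `estimate_cost` and `worst_case_cost` are hypothetical helpers, and the worst case assumes every token in the budget is billed at the more expensive of the two rates.

```python
# Approximate USD per 1M tokens (input, output), copied from the table above.
PRICES = {
    "llama3.2": (0.0, 0.0),
    "gpt-4o": (2.50, 10.00),
    "claude-sonnet-4-20250514": (3.00, 15.00),
    "gpt-4o-mini": (0.15, 0.60),
}


def estimate_cost(model, input_tokens, output_tokens):
    """Estimated USD cost for a given token split."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def worst_case_cost(model, max_tokens):
    """Rough ceiling for a token budget: all tokens billed at the higher rate."""
    return max_tokens * max(PRICES[model]) / 1_000_000


# The 1560-input / 1852-output example run, priced as gpt-4o.
print(f"{estimate_cost('gpt-4o', 1560, 1852):.5f}")
# A maxTokens budget of 100000, priced as gpt-4o.
print(f"{worst_case_cost('gpt-4o', 100000):.2f}")
```

Because steps already running are allowed to finish, a run can end slightly above `maxTokens`, so the worst-case figure is an approximation, not a strict bound.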
BudgetExceeded error
When a run hits spec.maxTokens, the team phase is set to Failed with this condition:
status:
  phase: Failed
  conditions:
    - type: Ready
      status: "False"
      reason: BudgetExceeded
      message: "token budget exceeded: 102,341 > 100,000"
To retry with the budget unchanged, trigger the team again:
ark trigger content-pipeline -n my-org \
  --input '{"topic": "your topic"}'
To retry with a higher budget, patch spec.maxTokens first and then trigger the run:
kubectl patch arkteam content-pipeline -n my-org \
  --type=merge -p '{"spec":{"maxTokens":200000}}'
ark trigger content-pipeline -n my-org --input '{"topic": "your topic"}'
See also
- ArkTeam concept: maxTokens and limits.maxDailyTokens fields
- Observability concept: OTel metrics for token spend monitoring
- Environment Variables reference: AGENT_MAX_TOKENS