Cost Management

ark-operator enforces token budgets at the infrastructure level — not in application code. Limits apply to every agent pod the operator creates, regardless of what the agent does.


Per-run token budget

Set spec.maxTokens on an ArkTeam to hard-stop a run when the budget is exceeded:

spec:
  maxTokens: 100000   # input + output tokens across all steps

When the total crosses this threshold, the operator sets the team phase to Failed with reason BudgetExceeded. Steps already running are allowed to finish; no new steps are submitted.

The budget covers all roles combined. To limit a specific role, set limits.maxTokensPerCall on the inline role or the referenced ArkAgent.
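As a sketch, a per-role cap on a referenced agent might look like the following. The limits.maxTokensPerCall field is documented above; the apiVersion and surrounding manifest layout are assumptions for illustration only:

```yaml
# Hypothetical standalone agent manifest. limits.maxTokensPerCall caps each
# LLM call for this role only, while spec.maxTokens on the team caps the run.
apiVersion: ark.example.io/v1alpha1   # assumed API group/version
kind: ArkAgent
metadata:
  name: researcher
spec:
  model: llama3.2
  limits:
    maxTokensPerCall: 8000
```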


Per-call limits on individual agents

spec:
  roles:
    - name: researcher
      model: llama3.2
      limits:
        maxTokensPerCall: 8000     # max tokens for any single LLM call
        timeoutSeconds: 120        # abandon task after this duration

These limits are enforced by the agent runtime rather than the operator: the pod reads them from the AGENT_MAX_TOKENS and AGENT_TIMEOUT_SECONDS environment variables that the operator injects.
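For reference, the injected variables surface on the role's container roughly as follows. The values mirror the limits above; the exact container layout is an assumption:

```yaml
# Illustrative container env, as injected by the operator
env:
  - name: AGENT_MAX_TOKENS        # mirrors limits.maxTokensPerCall
    value: "8000"
  - name: AGENT_TIMEOUT_SECONDS   # mirrors limits.timeoutSeconds
    value: "120"
```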


Rolling daily budget

spec.limits.maxDailyTokens caps token usage over a rolling 24-hour window. When the cap is hit, the operator scales all role replicas to zero. The team cannot process new tasks until the window rotates and token usage drops below the limit.

spec:
  maxTokens: 100000          # per-run hard stop
  limits:
    maxDailyTokens: 500000   # rolling 24h cap

The operator checks token usage against the rolling window every reconcile cycle. When the window clears, replicas are automatically restored to their configured count — no manual intervention required.
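With the example values above, the two caps interact simply: a 500,000-token daily cap admits at most five full 100,000-token runs in any 24-hour window.

```shell
# Example values from the snippet above: daily cap / per-run cap
echo $((500000 / 100000))   # full budget-limited runs per rolling window
```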


Token usage tracking

Token usage is accumulated per step in ArkTeam.status:

status:
  steps:
    - name: researcher
      phase: Succeeded
      tokenUsage:
        inputTokens: 320
        outputTokens: 1240
        totalTokens: 1560
    - name: writer
      phase: Succeeded
      tokenUsage:
        inputTokens: 1240
        outputTokens: 612
        totalTokens: 1852
  totalTokenUsage:
    inputTokens: 1560
    outputTokens: 1852
    totalTokens: 3412

View current usage:

kubectl get arkteam content-pipeline -n my-org \
  -o jsonpath='{.status.totalTokenUsage}'
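The per-step counters always sum to totalTokenUsage. As a quick sanity check with the example numbers above:

```shell
# Sum the step-level counters from the status example above
input=$((320 + 1240))     # researcher + writer inputTokens
output=$((1240 + 612))    # researcher + writer outputTokens
echo "input=$input output=$output total=$((input + output))"
```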


Estimating costs before running

Use --dry-run to estimate token cost before a real run:

ark run team.yaml --provider mock --dry-run --input topic="test"

This parses the team definition, resolves templates with the provided inputs, and estimates prompt length — without making any API calls.


Practical limits for common models

Model                      Input price   Output price   Suggested maxTokensPerCall
llama3.2 (Ollama)          Free          Free           8000 (no cost concern)
gpt-4o                     ~$2.50/1M     ~$10/1M        4000–8000
claude-sonnet-4-20250514   ~$3/1M        ~$15/1M        4000–8000
gpt-4o-mini                ~$0.15/1M     ~$0.60/1M      8000–16000

These are rough estimates. Check provider pricing pages for current rates.
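To turn a token count into a dollar figure, multiply each counter by its per-token rate. For the 3,412-token run in the status example above, at the assumed gpt-4o rates:

```shell
# 1,560 input + 1,852 output tokens at ~$2.50/1M in, ~$10/1M out (assumed rates)
awk 'BEGIN { printf "~$%.4f\n", (1560 * 2.50 + 1852 * 10) / 1e6 }'
```

Even a multi-step run at gpt-4o rates costs fractions of a cent here; the budgets matter at scale, not per run.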


BudgetExceeded error

When a run hits spec.maxTokens, the team phase is set to Failed with this condition:

status:
  phase: Failed
  conditions:
    - type: Ready
      status: "False"
      reason: BudgetExceeded
      message: "token budget exceeded: 102,341 > 100,000"

Retry the run as-is (each run starts from a zero token count):

ark trigger content-pipeline -n my-org \
  --input '{"topic": "your topic"}'

Or patch the budget before retrying:

kubectl patch arkteam content-pipeline -n my-org \
  --type=merge -p '{"spec":{"maxTokens":200000}}'
ark trigger content-pipeline -n my-org --input '{"topic": "your topic"}'

Apache 2.0 · ARKONIS