Scaling Agents

ArkAgent replica count controls how many agent pods run in parallel. More replicas mean more concurrent task capacity — each pod handles up to spec.limits.maxConcurrentTasks tasks at once.


Manual scaling

Set spec.replicas in the ArkAgent:

spec:
  replicas: 5

The operator reconciles the backing Deployment to match. Scale via kubectl patch:

kubectl patch arkagent research-agent --type=merge -p '{"spec":{"replicas":10}}'

Or via the ark-dashboard scale control.

Range: 0–50. Set to 0 to drain the pool without deleting the resource.
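
The same merge patch can be sent programmatically. A minimal sketch using the official Kubernetes Python client — the API group, version, and plural (`ark.arkonis.io`, `v1alpha1`, `arkagents`) are assumptions here, so check them against your installed CRD; the actual API call is left commented so the snippet runs without cluster access:

```python
def replicas_patch(replicas: int) -> dict:
    """Merge-patch body that sets spec.replicas on an ArkAgent (valid range 0-50)."""
    if not 0 <= replicas <= 50:
        raise ValueError("ArkAgent replicas must be in the range 0-50")
    return {"spec": {"replicas": replicas}}

# With the kubernetes client installed and a kubeconfig loaded:
#
#   from kubernetes import client, config
#   config.load_kube_config()
#   api = client.CustomObjectsApi()
#   api.patch_namespaced_custom_object(
#       group="ark.arkonis.io",   # ASSUMED group/version/plural — verify via `kubectl api-resources`
#       version="v1alpha1",
#       namespace="default",
#       plural="arkagents",
#       name="research-agent",
#       body=replicas_patch(10),
#   )
```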


Scale to zero

Scale to zero to pause an agent and stop consuming API quota:

kubectl patch arkagent research-agent --type=merge -p '{"spec":{"replicas":0}}'

Pending tasks in the queue are preserved. When you scale back up, the pods resume processing from where they left off.


Per-pod concurrency

Each agent pod can process multiple tasks simultaneously. The default is 5 concurrent tasks per pod:

spec:
  limits:
    maxConcurrentTasks: 5   # tasks processed in parallel per pod

Total effective parallelism = replicas × maxConcurrentTasks.

A 2-replica agent with maxConcurrentTasks: 5 can process 10 tasks simultaneously.
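
The sizing arithmetic can be captured in a small helper — `replicas_for` is a hypothetical name for illustration, not part of any ARK tooling:

```python
import math

def effective_parallelism(replicas: int, max_concurrent_tasks: int) -> int:
    """Total tasks an agent pool can process simultaneously."""
    return replicas * max_concurrent_tasks

def replicas_for(target_parallelism: int, max_concurrent_tasks: int) -> int:
    """Smallest replica count that reaches the target, capped at the 0-50 range."""
    needed = math.ceil(target_parallelism / max_concurrent_tasks)
    return min(needed, 50)
```

For example, reaching 23-way parallelism at 5 tasks per pod requires 5 replicas.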


Daily token budget scaling

When spec.limits.maxDailyTokens is reached, the operator automatically scales all replicas to zero:

spec:
  limits:
    maxDailyTokens: 500000   # rolling 24h cap

No tasks are processed until the rolling 24-hour window clears. Replicas are automatically restored when usage drops below the limit. See Cost Management for details.
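
The rolling-window behaviour can be illustrated with a small tracker. This is a sketch of the idea under stated assumptions, not the operator's actual implementation:

```python
from collections import deque

WINDOW_SECONDS = 24 * 60 * 60

class TokenBudget:
    """Rolling 24h token counter; over_budget() mirrors the scale-to-zero trigger."""

    def __init__(self, max_daily_tokens: int):
        self.max_daily_tokens = max_daily_tokens
        self.events = deque()  # (timestamp, tokens) pairs, oldest first

    def record(self, now: float, tokens: int) -> None:
        self.events.append((now, tokens))

    def usage(self, now: float) -> int:
        # Drop events that have aged out of the 24h window, then sum the rest.
        while self.events and self.events[0][0] <= now - WINDOW_SECONDS:
            self.events.popleft()
        return sum(tokens for _, tokens in self.events)

    def over_budget(self, now: float) -> bool:
        return self.usage(now) >= self.max_daily_tokens
```

As older usage falls out of the window, `over_budget` flips back to false — which is when replicas would be restored.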


Resource limits on agent pods

spec.limits controls LLM-level limits (tokens, concurrency, timeout), not Kubernetes CPU/memory. To set Kubernetes resource requests and limits on agent pods, use the Helm agentResources values:

helm upgrade ark-operator arkonis/ark-operator \
  --set agentResources.requests.cpu=100m \
  --set agentResources.requests.memory=256Mi \
  --set agentResources.limits.cpu=500m \
  --set agentResources.limits.memory=512Mi
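
Equivalently, the same settings in a values file:

```yaml
# values.yaml — pass with: helm upgrade ark-operator arkonis/ark-operator -f values.yaml
agentResources:
  requests:
    cpu: 100m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 512Mi
```
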

Queue-depth autoscaling (planned — v0.10)

CPU and memory are poor proxies for agent load. What matters is task queue depth — how many tasks are waiting.

Queue-depth autoscaling via KEDA is planned for v0.10. This will let you define scale-up/scale-down triggers on Redis Streams queue length so agent replicas grow automatically as work arrives and shrink back when the queue drains.

Until then, manual scaling and the daily token budget mechanism are the available controls.


Inspecting queue depth

Check the Redis Streams queue length for a specific agent:

# Queue for a team role: <namespace>.<team>.<role>
redis-cli XLEN my-org.content-pipeline.researcher

Or view through the ark-dashboard which shows pending task counts per agent.
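
The queue name can also be derived in code. A sketch with a hypothetical `queue_key` helper; the redis-py call is left commented so the snippet runs without a Redis server:

```python
def queue_key(namespace: str, team: str, role: str) -> str:
    """Redis Streams key for a team role: <namespace>.<team>.<role>."""
    return f"{namespace}.{team}.{role}"

# With redis-py installed and a reachable server:
#
#   import redis
#   r = redis.Redis()
#   depth = r.xlen(queue_key("my-org", "content-pipeline", "researcher"))
```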


See also

Cost Management — daily token budgets and usage limits.

Apache 2.0 · ARKONIS