# Scaling Agents

The ArkAgent replica count controls how many agent pods run in parallel. More replicas mean more concurrent task capacity, up to `spec.limits.maxConcurrentTasks` tasks per pod.
## Manual scaling
Set `spec.replicas` in the ArkAgent:

```yaml
spec:
  replicas: 5
```
The operator reconciles the backing Deployment to match. Scale via `kubectl patch`:

```bash
kubectl patch arkagent research-agent --type=merge -p '{"spec":{"replicas":10}}'
```
Or via the ark-dashboard scale control.
Range: 0–50. Set to 0 to drain the pool without deleting the resource.
## Scale to zero
Scale to zero to pause an agent and stop consuming API quota:

```bash
kubectl patch arkagent research-agent --type=merge -p '{"spec":{"replicas":0}}'
```
Pending tasks in the queue are preserved. When you scale back up, the pods resume processing from where they left off.
## Per-pod concurrency
Each agent pod can process multiple tasks simultaneously. The default is 5 concurrent tasks per pod:

```yaml
spec:
  limits:
    maxConcurrentTasks: 5  # tasks processed in parallel per pod
```
Total effective parallelism = `replicas` × `maxConcurrentTasks`. A 2-replica agent with `maxConcurrentTasks: 5` can process 10 tasks simultaneously.
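For capacity planning, the arithmetic can be expressed as a small helper. This is an illustrative sketch, not part of Ark; the function names are invented, and the 0–50 cap mirrors the replica range documented above:

```python
import math


def effective_parallelism(replicas: int, max_concurrent_tasks: int) -> int:
    """Upper bound on tasks processed simultaneously across all pods."""
    return replicas * max_concurrent_tasks


def replicas_for(target_parallelism: int, max_concurrent_tasks: int) -> int:
    """Smallest replica count covering the target, clamped to the 0-50 range."""
    needed = math.ceil(target_parallelism / max_concurrent_tasks)
    return min(max(needed, 0), 50)


# A 2-replica agent with maxConcurrentTasks: 5 handles 10 tasks at once.
assert effective_parallelism(2, 5) == 10

# To sustain 23 concurrent tasks at 5 per pod, you need 5 replicas.
assert replicas_for(23, 5) == 5
```

Note that `replicas_for` clamps at 50, so a target beyond 50 × `maxConcurrentTasks` requires raising per-pod concurrency instead.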
## Daily token budget scaling
When `spec.limits.maxDailyTokens` is reached, the operator automatically scales all replicas to zero:

```yaml
spec:
  limits:
    maxDailyTokens: 500000  # rolling 24h cap
```

No tasks are processed until usage in the rolling 24-hour window drops back below the limit, at which point replicas are restored automatically. See Cost Management for details.
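Conceptually, the budget gate behaves like a rolling-window sum: usage events expire 24 hours after they occur, and processing resumes once the surviving total falls below the cap. A minimal Python sketch of that idea (this is not the operator's actual implementation; all names are illustrative):

```python
from collections import deque


class RollingTokenBudget:
    """Track token usage over a trailing 24h window and gate processing."""

    WINDOW_SECONDS = 24 * 60 * 60

    def __init__(self, max_daily_tokens: int):
        self.max_daily_tokens = max_daily_tokens
        self.events = deque()  # (timestamp, tokens) pairs, oldest first

    def record(self, now: float, tokens: int) -> None:
        self.events.append((now, tokens))

    def usage(self, now: float) -> int:
        # Drop events older than the window, then sum what remains.
        while self.events and self.events[0][0] <= now - self.WINDOW_SECONDS:
            self.events.popleft()
        return sum(tokens for _, tokens in self.events)

    def allowed(self, now: float) -> bool:
        """False models the operator holding replicas at zero."""
        return self.usage(now) < self.max_daily_tokens


budget = RollingTokenBudget(max_daily_tokens=500_000)
budget.record(now=0, tokens=500_000)     # budget exhausted at t=0
assert not budget.allowed(now=3600)      # still blocked an hour later
assert budget.allowed(now=25 * 3600)     # window cleared after 24h
```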
## Resource limits on agent pods
`spec.limits` controls LLM-level limits (tokens, concurrency, timeout), not Kubernetes CPU/memory. To set Kubernetes resource requests and limits on agent pods, use the Helm `agentResources` values:

```bash
helm upgrade ark-operator arkonis/ark-operator \
  --set agentResources.requests.cpu=100m \
  --set agentResources.requests.memory=256Mi \
  --set agentResources.limits.cpu=500m \
  --set agentResources.limits.memory=512Mi
```
## Queue-depth autoscaling (planned for v0.10)

CPU and memory are poor proxies for agent load. What matters is task queue depth: how many tasks are waiting.
Queue-depth autoscaling via KEDA is planned for v0.10. This will let you define scale-up/scale-down triggers on Redis Streams queue length so agent replicas grow automatically as work arrives and shrink back when the queue drains.
Until then, manual scaling and the daily token budget mechanism are the available controls.
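As a preview of what the planned integration might enable, here is a sketch of a standard KEDA `ScaledObject` using KEDA's existing `redis-streams` scaler. The resource names, Deployment target, consumer group, and trigger threshold are all hypothetical; nothing here is supported by Ark today:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: researcher-queue-scaler         # hypothetical
spec:
  scaleTargetRef:
    name: researcher-agent-deployment   # hypothetical backing Deployment
  minReplicaCount: 0
  maxReplicaCount: 10
  triggers:
    - type: redis-streams
      metadata:
        address: redis.my-org.svc:6379
        stream: my-org.content-pipeline.researcher
        consumerGroup: ark-agents       # hypothetical
        pendingEntriesCount: "10"       # scale up as backlog exceeds this
```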
## Inspecting queue depth
Check the Redis Streams queue length for a specific agent:

```bash
# Queue for a team role: <namespace>.<team>.<role>
redis-cli XLEN my-org.content-pipeline.researcher
```
Or view it in the ark-dashboard, which shows pending task counts per agent.
## See also

- Cost Management guide – `maxDailyTokens` budget scaling
- ArkAgent concept – `spec.replicas` and `spec.limits`
- Helm Values reference – `agentResources`