Bug
`defaultCpuLoad` in `worker.ts` uses `os.cpus()` to measure CPU utilization, but inside a Docker container `os.cpus()` returns host CPU counters, not the container's cgroup allocation. This causes the worker to report inflated load values to the LiveKit server, which intermittently refuses to dispatch jobs with:

```
failed to send job request {"error": "no servers available (received 1 responses)", "jobType": "JT_ROOM", "agentName": ""}
```
The worker registers successfully, and the Node.js side never marks itself as `WS_FULL` (since `loadThreshold` is `Infinity` in dev mode). However, the raw load value sent to the Go server appears to be interpreted independently of that threshold, causing the server to consider the worker unavailable.
Reproduction
- Run `@livekit/agents` v1.0.43 inside a Docker container (`oven/bun:1` base image)
- Host has many CPUs (tested with 24)
- Worker registers with LiveKit server v1.9.11
- Create a room: dispatch intermittently fails with "no servers available"
- Override `loadFunc: async () => 0` in `ServerOptions`: dispatch then works reliably
Root cause
`os.cpus().times` in Node.js/Bun is not cgroup-aware. Inside a container it reports counters for every host CPU, so the computed utilization reflects host-wide activity rather than the container's own usage against its CPU quota. This is a well-known Node.js limitation.
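For illustration, here is a minimal sketch of the common `os.cpus()` delta technique. It is an assumption about how such sampling typically works, not the actual `defaultCpuLoad` source, and `snapshotCpus` / `utilizationBetween` are hypothetical names. Because it sums `times` over every entry `os.cpus()` returns, the result inside a container describes the whole host.

```ts
import os from 'node:os';

// Hypothetical helper names. Cumulative CPU time (ms) summed over every CPU
// that os.cpus() reports; inside a container that means every host CPU.
type CpuSnapshot = { idle: number; total: number };

function snapshotCpus(): CpuSnapshot {
  let idle = 0;
  let total = 0;
  for (const { times } of os.cpus()) {
    idle += times.idle;
    total += times.user + times.nice + times.sys + times.idle + times.irq;
  }
  return { idle, total };
}

// Busy fraction (0..1) between two snapshots: 1 - Δidle / Δtotal.
function utilizationBetween(before: CpuSnapshot, after: CpuSnapshot): number {
  const total = after.total - before.total;
  const idle = after.idle - before.idle;
  if (total <= 0) return 0;
  return Math.min(1, Math.max(0, 1 - idle / total));
}

async function main(): Promise<void> {
  const before = snapshotCpus();
  await new Promise<void>((resolve) => setTimeout(resolve, 500));
  const after = snapshotCpus();
  // Both values describe the host, not the container's cgroup quota.
  console.log(`CPUs visible: ${os.cpus().length}`);
  console.log(`utilization: ${(utilizationBetween(before, after) * 100).toFixed(1)}%`);
}

void main();
```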
Suggested fix
Replace `os.cpus()` sampling with cgroup-aware CPU measurement when running inside a container:
- cgroup v2: Read `usage_usec` from `/sys/fs/cgroup/cpu.stat` and compute its delta against wall time and the CPU quota from `/sys/fs/cgroup/cpu.max`
- cgroup v1: Read `/sys/fs/cgroup/cpu/cpuacct.usage` (nanoseconds) and derive the quota from `/sys/fs/cgroup/cpu/cpu.cfs_quota_us` divided by `cpu.cfs_period_us`
- Detection: Check for `/.dockerenv` or parse `/proc/1/cgroup`
- Fallback: Use the current `os.cpus()` approach when not in a container
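A minimal TypeScript sketch of the suggested fix, under stated assumptions: only the `/sys/fs/cgroup` paths and the detection heuristic come from this issue; every function name (`containerCpuLoad`, `parseCpuMax`, etc.) is hypothetical and not part of `@livekit/agents`. It computes load as Δusage / (Δwall × allowed CPUs), clamped to 0..1, and returns `null` outside a container so the caller can keep the existing measurement.

```ts
import { existsSync, readFileSync } from 'node:fs';
import os from 'node:os';

// Hypothetical helpers; paths follow the suggested fix in this issue.
const V2_STAT = '/sys/fs/cgroup/cpu.stat';
const V2_MAX = '/sys/fs/cgroup/cpu.max';
const V1_USAGE = '/sys/fs/cgroup/cpu/cpuacct.usage';
const V1_QUOTA = '/sys/fs/cgroup/cpu/cpu.cfs_quota_us';
const V1_PERIOD = '/sys/fs/cgroup/cpu/cpu.cfs_period_us';

const read = (path: string): string => readFileSync(path, 'utf8').trim();

// Heuristic: /.dockerenv, or a container-looking path in /proc/1/cgroup.
// With a private cgroup v2 namespace that file often reads just "0::/".
function inContainer(): boolean {
  if (existsSync('/.dockerenv')) return true;
  try {
    return /docker|kubepods|containerd/.test(read('/proc/1/cgroup'));
  } catch {
    return false;
  }
}

// cgroup v2 cpu.max holds "<quota_us> <period_us>" or "max <period_us>".
// Returns the CPUs the quota allows, or null when unlimited.
function parseCpuMax(text: string): number | null {
  const [quota, period] = text.trim().split(/\s+/);
  if (quota === 'max') return null;
  const q = Number(quota);
  const p = Number(period);
  return q > 0 && p > 0 ? q / p : null;
}

// Cumulative CPU time in microseconds from cgroup v2 cpu.stat.
function parseUsageUsec(text: string): number {
  const line = text.split('\n').find((l) => l.startsWith('usage_usec '));
  if (line === undefined) throw new Error('usage_usec missing from cpu.stat');
  return Number(line.split(/\s+/)[1]);
}

// load = Δusage / (Δwall × allowed CPUs), clamped to [0, 1].
function cgroupLoad(usageDeltaUsec: number, wallDeltaUsec: number, cpus: number): number {
  if (wallDeltaUsec <= 0 || cpus <= 0) return 0;
  return Math.min(1, Math.max(0, usageDeltaUsec / (wallDeltaUsec * cpus)));
}

// Cgroup CPU time in microseconds, or null if no cgroup file is readable.
function cgroupUsageUsec(): number | null {
  if (existsSync(V2_STAT)) return parseUsageUsec(read(V2_STAT));
  if (existsSync(V1_USAGE)) return Number(read(V1_USAGE)) / 1000; // v1 is in ns
  return null;
}

// CPUs the container may use; without a quota, fall back to the visible CPU count.
function allowedCpus(): number {
  if (existsSync(V2_MAX)) {
    const n = parseCpuMax(read(V2_MAX));
    if (n !== null) return n;
  } else if (existsSync(V1_QUOTA) && existsSync(V1_PERIOD)) {
    const quota = Number(read(V1_QUOTA)); // -1 means unlimited
    const period = Number(read(V1_PERIOD));
    if (quota > 0 && period > 0) return quota / period;
  }
  return os.cpus().length;
}

const nowUsec = (): number => Number(process.hrtime.bigint() / 1000n);

// Samples twice, intervalMs apart. Returns null outside a container or when no
// cgroup file is readable, so the caller can fall back to os.cpus().
async function containerCpuLoad(intervalMs = 500): Promise<number | null> {
  if (!inContainer()) return null;
  const usage0 = cgroupUsageUsec();
  if (usage0 === null) return null;
  const wall0 = nowUsec();
  await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  const usage1 = cgroupUsageUsec();
  if (usage1 === null) return null;
  return cgroupLoad(usage1 - usage0, nowUsec() - wall0, allowedCpus());
}

void containerCpuLoad().then((load) => console.log('container CPU load:', load));
```

Assuming `loadFunc` expects a fraction between 0 and 1 (not confirmed here), a worker could return this value and fall back to the existing `os.cpus()`-based measurement when it is `null`. Sampling twice per call adds `intervalMs` of latency; a long-lived sampler that keeps the previous reading would avoid that.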
Workaround
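Override `loadFunc` in `ServerOptions` to bypass the default measurement:

```ts
import { cli, ServerOptions } from '@livekit/agents';

cli.runApp(new ServerOptions({
  agent: import.meta.filename,
  // Always report zero load so the server never sees the worker as busy
  loadFunc: async () => 0,
}));
```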
Environment
- `@livekit/agents`: 1.0.43
- LiveKit server: 1.9.11
- Runtime: Bun 1.3.8 (inside `oven/bun:1` Docker image)
- Host: Linux 6.12.70, 24 CPUs