problem
On a CloudStack 4.22.1.0 deployment with two management servers behind a load balancer, VM and system-VM console access fails intermittently. The browser receives a generic Apache-style "Internal Server Error" page. Reopening the console sometimes works and sometimes fails, with no change to the environment.
The failure has been isolated to the console session token validation path. Every surrounding component has been verified healthy. Pinning the browser directly to a single management server (bypassing the load balancer) does not resolve it, so this is not simply LB source-IP rewriting.
Browser shows stock "Internal Server Error" (Apache-style) page intermittently when opening a console.
CPVM (/var/log/cloud/cloud.out) on failure:
Session has already been used, cannot connect
External authenticator failed authentication request for vm with sid
ERROR ConsoleProxyNoVNCHandler ... Failed to create viewer ...
com.cloud.consoleproxy.AuthenticationException: External authenticator failed request
Also observed earlier: org.eclipse.jetty.websocket.api.CloseException: TimeoutException: Idle timeout expired: 300000/300000 ms.
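To gauge how often each symptom occurs, the CPVM log can be scanned for the messages quoted above. The following is a minimal sketch, assuming the messages appear verbatim as quoted in this report; the signature names and the function name are hypothetical labels chosen for illustration.

```python
import re
from collections import Counter

# Failure messages quoted in this report. The exact wording is taken from
# the CPVM log excerpts above and may differ in other versions.
SIGNATURES = {
    "session_reused": re.compile(r"Session has already been used"),
    "external_auth_failed": re.compile(r"External authenticator failed"),
    "viewer_failed": re.compile(r"Failed to create viewer"),
    "idle_timeout": re.compile(r"Idle timeout expired"),
}


def count_signatures(lines):
    """Count how many log lines match each failure signature."""
    counts = Counter()
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

It can be fed an open file handle, for example `count_signatures(open("/var/log/cloud/cloud.out", errors="replace"))`. Note that both "External authenticator failed" lines count toward the same signature.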
KEY DIAGNOSTIC FINDING
The cloud.console_session table records console_endpoint_creator_address as the load balancer IP
(10.125.128.38) and, in some rows, as the client IP (172.31.1.204), never as a real management
server IP (.37/.39). A subset of these rows never reach acquired/removed, and those rows correlate
with the failures. Neither .38 nor the client is a valid validation target, which appears to be
why those sessions are never acquired.
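The check behind this finding can be sketched as a filter over rows exported from cloud.console_session. This is only an illustration: the full management server addresses (10.125.128.37 and 10.125.128.39) are assumed from the ".37/.39" shorthand, the sample rows and the timestamp are made up, and the function names are hypothetical.

```python
from ipaddress import ip_address


def is_valid_creator(address, management_ips):
    """True if the recorded creator address is one of the management servers."""
    return ip_address(address) in {ip_address(ip) for ip in management_ips}


def suspicious_sessions(rows, management_ips):
    """Rows that were never acquired and whose creator is not a management server."""
    return [
        row
        for row in rows
        if row["acquired"] is None
        and not is_valid_creator(row["console_endpoint_creator_address"], management_ips)
    ]


# Assumed full addresses for the ".37/.39" management servers.
MANAGEMENT_IPS = ["10.125.128.37", "10.125.128.39"]

# Hypothetical rows: ids and the timestamp are invented for illustration.
SAMPLE_ROWS = [
    {"id": 1, "acquired": None, "console_endpoint_creator_address": "10.125.128.38"},
    {"id": 2, "acquired": None, "console_endpoint_creator_address": "172.31.1.204"},
    {"id": 3, "acquired": "2025-01-01 00:00:00", "console_endpoint_creator_address": "10.125.128.37"},
]
```

With the sample data, rows 1 and 2 (load balancer and client as creator) are flagged, while row 3 shows the shape a healthy row would be expected to have.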
On failure, NOTHING is logged on either management server (management-server.log filtered for
console|authentication|failed|console_session shows only cluster heartbeat). On success, the
minting MS logs the full createConsoleEndpoint -> Compose console url -> Adding allowed session
-> ConsoleAccessAuthenticationCommand flow. So failing validations are being rejected at the
CPVM and never reach a real MS.
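The comparison between a failing and a succeeding attempt can be made mechanical by checking which stages of the flow appear in a slice of management-server.log. A rough sketch, assuming the stage names listed above occur literally in the log lines (the function name is hypothetical):

```python
# Stage names as listed in this report, in the order they are logged
# on a successful attempt.
STAGES = [
    "createConsoleEndpoint",
    "Compose console url",
    "Adding allowed session",
    "ConsoleAccessAuthenticationCommand",
]


def missing_stages(lines):
    """Return the stages of the console flow that never appear in the given lines."""
    seen = set()
    for line in lines:
        lowered = line.lower()
        for stage in STAGES:
            if stage.lower() in lowered:
                seen.add(stage)
    return [stage for stage in STAGES if stage not in seen]
```

For a failing attempt as described here, all four stages would be reported missing on both management servers; for a succeeding attempt, the list would be empty on the minting MS.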
RULED OUT (verified, not assumed)
CPVM health: up 3+ days, ports 80/8001/8080 bound by the cloud Java process (no stray Apache),
actively serving noVNC. curl of /resource/noVNC/vnc.html returns 200.
Network/MTU: DF ping at 1472 bytes succeeds 0% loss between client and CPVM; path MTU is full 1500.
Management servers: both Up in mshost; both reachable on 8250 from CPVM; clocks within ~1s
(verified date -u).
KVM host clock: within ~1s of MS.
VNC port: virsh vncdisplay v-1902-VM = :4 (5904), exactly matching the port the MS hands out —
no stale-port mismatch.
cluster.node.IP: correctly set per node; no rogue/extra management instance.
CPVM rebuild: destroyed/recreated multiple times — no effect.
Pinning browser directly to a single MS (https://:8080/client), bypassing the LB:
still fails. (So this is NOT solely LB source-IP rewriting / cross-MS in-memory token, despite
PR #7094, "Handle console session in multiple management servers", being present and the console_session table being populated.)
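Two of the checks above rest on simple arithmetic: a 1472-byte ICMP payload plus the 8-byte ICMP header and a 20-byte IPv4 header (no options) fills a 1500-byte MTU, and VNC display :N listens on TCP port 5900 + N. A small sketch to make both explicit (the function names are hypothetical):

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo header


def icmp_packet_size(payload_bytes):
    """Total IPv4 packet size for an ICMP echo with the given payload."""
    return payload_bytes + ICMP_HEADER_BYTES + IPV4_HEADER_BYTES


def vnc_tcp_port(display):
    """Map a VNC display such as ':4' or '127.0.0.1:4' to its TCP port."""
    number = int(display.rsplit(":", 1)[-1])
    return 5900 + number
```

Here `icmp_packet_size(1472)` gives 1500 and `vnc_tcp_port(":4")` gives 5904, matching the values reported for the DF ping and for v-1902-VM.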
WHY THIS LOOKS LIKE A BUG, NOT MISCONFIGURATION
console_endpoint_creator_address is being populated with the load balancer IP and sometimes
the client IP, rather than the processing management server's own IP. Those values are not valid
validation targets.
Bypassing the load balancer (direct-to-MS) does not fix it, so source-IP rewriting by the LB is
not a complete explanation.
The result is intermittent, single-use-token "already used / external authenticator failed"
rejections at the CPVM, with no corresponding log on any management server.
versions
CloudStack version: 4.22.1.0-shapeblue0
Hypervisor: KVM (host agent cloudstack-agent 4.22.1.0-shapeblue0, matched to MS)
Management servers: 2 nodes
mshost table: both nodes Up
Load balancer: HAProxy (fronts the management/UI tier; NOT a management server)
SSL: disabled (consoleproxy.sslEnabled=false, no consoleproxy.url.domain)
The steps to reproduce the bug
Deploy CloudStack 4.22.1.0 with two management servers behind a load balancer fronting the management/UI tier. KVM hypervisor, SSL disabled.
From a client on a different subnet, open the CloudStack UI through the load balancer and click "View Console" on a running VM or system VM.
Repeat opening the console several times.
Observe that console access succeeds intermittently — some attempts load the noVNC console, others return a generic "Internal Server Error" page in the browser.
On a failing attempt, check the CPVM log (/var/log/cloud/cloud.out):
Session has already been used, cannot connect
External authenticator failed authentication request for vm with sid
com.cloud.consoleproxy.AuthenticationException: External authenticator failed request
On a failing attempt, check both management-server logs — NOTHING is logged on either MS (no createConsoleEndpoint, no ConsoleAccessAuthenticationCommand). On a succeeding attempt, the full flow IS logged on the minting MS.
Inspect the cloud.console_session table:
SELECT id,uuid,acquired,removed,console_endpoint_creator_address,client_address FROM cloud.console_session ORDER BY created DESC LIMIT 10;
Note that console_endpoint_creator_address is recorded as the load balancer IP, and in some rows the client IP — never a real management server IP. Rows with these creator addresses are the ones that never reach 'acquired'/'removed', and these correlate with the failures.
What to do about it?
console_endpoint_creator_address is being populated with the load balancer IP (and sometimes the client IP) instead of the processing management server's own management IP. Since the CPVM validates the one-time console token against the recorded creator address, and neither the LB nor the client can service that validation callback, those sessions are never acquired and the CPVM rejects the connection — producing the intermittent HTTP 500.
Expected: console_endpoint_creator_address should be the management server that processed the createConsoleEndpoint call (a real MS IP from the host setting), regardless of whether the request arrived via a load balancer.
Additional settings: console.session.cleanup.retention.hours=240, consoleproxy.session.timeout=300000, consoleproxy.session.max=50, novnc.console.default=true, novnc.console.sourceip.check.enabled=false