# Feature Request: Add ResponseHeaderTimeout and Retry Logic to containerd Image Pull HTTP Transport #13006

@ZHIXIN-SUN

Description

What we observed

During AKS node initialization, a container image pull stalled when the registry server became temporarily unresponsive. The HTTP client waited approximately 2 minutes 47 seconds before timing out, with 0 bytes received. An immediate retry after the failure succeeded in just 1.6 seconds.

This wait time is too long. The client should be able to detect unresponsive connections much sooner.

Where the issue is

The CRI image pull HTTP request flows through this chain:

PullImage
  └─ pullRequestReporterRoundTripper.RoundTrip()   // counts active requests & bytes
       └─ http.Transport.RoundTrip()                // actual HTTP request goes out here
            └─ newTransport() (image_pull.go L569)  // DialContext.Timeout=30s, but NO ResponseHeaderTimeout

newTransport() in internal/cri/server/images/image_pull.go#L569-L581 creates the http.Transport that actually sends the request. Once the TCP connection and TLS handshake succeed, nothing bounds the wait for response headers, so the client hangs until the OS TCP stack gives up (roughly 2-3 minutes).

What we propose

Add ResponseHeaderTimeout to newTransport():

func newTransport() *http.Transport {
    return &http.Transport{
        Proxy: http.ProxyFromEnvironment,
        DialContext: (&net.Dialer{
            Timeout:       30 * time.Second,
            KeepAlive:     30 * time.Second,
            FallbackDelay: 300 * time.Millisecond,
        }).DialContext,
        MaxIdleConns:          10,
        IdleConnTimeout:       30 * time.Second,
        TLSHandshakeTimeout:   10 * time.Second,
        ExpectContinueTimeout: 5 * time.Second,
        ResponseHeaderTimeout: 30 * time.Second, // <-- add this
    }
}

This allows the client to fail fast and retry sooner, instead of waiting minutes on a stalled connection.
