# Feature Request: Add `ResponseHeaderTimeout` and Retry Logic to containerd Image Pull HTTP Transport #13006

## What we observed

During AKS node initialization, an image pull stalled when the registry server became temporarily unresponsive: the HTTP client waited approximately **2 minutes 47 seconds** before timing out, with 0 bytes received. An immediate retry after the failure succeeded in just **1.6 seconds**.

This wait time is too long. The client should be able to detect unresponsive connections much sooner.

## Where the issue is

The CRI image pull HTTP request flows through this chain:

```
PullImage
  └─ pullRequestReporterRoundTripper.RoundTrip()   // counts active requests & bytes
       └─ http.Transport.RoundTrip()                // actual HTTP request goes out here
            // this transport is built by newTransport() (image_pull.go L569):
            // net.Dialer Timeout=30s, but NO ResponseHeaderTimeout
```

`newTransport()` in [`internal/cri/server/images/image_pull.go#L569-L581`](https://github.com/containerd/containerd/blob/9860888666f7e96a37d0a412ee80be065ea74903/internal/cri/server/images/image_pull.go#L570-L580) creates the `http.Transport` that actually sends the request. It leaves `ResponseHeaderTimeout` unset, and a zero value means no limit. Once the TCP connection and TLS handshake succeed, nothing bounds the wait for response headers, so the client hangs until the OS TCP stack gives up (~2-3 minutes).

## What we propose

Add `ResponseHeaderTimeout` to `newTransport()`:

```go
func newTransport() *http.Transport {
    return &http.Transport{
        Proxy: http.ProxyFromEnvironment,
        DialContext: (&net.Dialer{
            Timeout:       30 * time.Second,
            KeepAlive:     30 * time.Second,
            FallbackDelay: 300 * time.Millisecond,
        }).DialContext,
        MaxIdleConns:          10,
        IdleConnTimeout:       30 * time.Second,
        TLSHandshakeTimeout:   10 * time.Second,
        ExpectContinueTimeout: 5 * time.Second,
        ResponseHeaderTimeout: 30 * time.Second, // proposed addition: bound the wait for response headers
    }
}
```

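This allows the client to fail fast and retry sooner, instead of waiting minutes on a stalled connection. Because `ResponseHeaderTimeout` only covers the wait for response headers after the request has been written, and not the time spent reading the response body, it should not cut off large layer downloads that are still making progress.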

