Create a plugin to integrate with systemd watchdog for daemon health#13188

Open
chilkaditya wants to merge 2 commits into containerd:main from chilkaditya:systemd-watchdog-integration-#10329

Conversation

@chilkaditya

@chilkaditya chilkaditya commented Apr 8, 2026

This PR adds a watchdog plugin to containerd that integrates with the systemd watchdog.
The plugin performs lightweight internal health checks and sends WATCHDOG=1 notifications only when the daemon is healthy. If containerd becomes unresponsive, notifications stop and systemd restarts the service.
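The "notify only when healthy" rule described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `healthCheck`, `pingIfHealthy`, `notifyWatchdog`, and `errSimulated` are hypothetical names, and the sketch assumes the notification socket is a Unix datagram socket whose path comes from the `NOTIFY_SOCKET` environment variable that systemd sets for the service.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
)

// healthCheck is a hypothetical check signature for this sketch; it is not
// the type used in the PR.
type healthCheck func() error

// errSimulated is used by the demo in main to show a failing check.
var errSimulated = errors.New("simulated store failure")

// notifyWatchdog sends a single WATCHDOG=1 datagram to the systemd
// notification socket at socketPath (normally the value of NOTIFY_SOCKET).
func notifyWatchdog(socketPath string) error {
	conn, err := net.Dial("unixgram", socketPath)
	if err != nil {
		return err
	}
	defer conn.Close()
	_, err = conn.Write([]byte("WATCHDOG=1"))
	return err
}

// pingIfHealthy runs every check in order and calls notify only when all of
// them succeed. It reports whether a notification was sent.
func pingIfHealthy(checks []healthCheck, notify func() error) (bool, error) {
	for _, check := range checks {
		if err := check(); err != nil {
			return false, fmt.Errorf("health check failed: %w", err)
		}
	}
	if err := notify(); err != nil {
		return false, fmt.Errorf("notify failed: %w", err)
	}
	return true, nil
}

func main() {
	socket := os.Getenv("NOTIFY_SOCKET")
	notify := func() error {
		if socket == "" {
			// Not running under systemd: only report what would happen.
			fmt.Println("NOTIFY_SOCKET unset; would send WATCHDOG=1")
			return nil
		}
		return notifyWatchdog(socket)
	}
	healthy := func() error { return nil }
	failing := func() error { return errSimulated }

	// All checks pass, so a notification is sent.
	sent, err := pingIfHealthy([]healthCheck{healthy}, notify)
	fmt.Println(sent, err)

	// One check fails, so the notification is skipped.
	sent, err = pingIfHealthy([]healthCheck{healthy, failing}, notify)
	fmt.Println(sent, err)
}
```

A real plugin would call something like `pingIfHealthy` from a ticker loop. The systemd documentation recommends sending the keep-alive at about half of the configured watchdog interval, so that one slow check does not immediately trigger a restart.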

The health checks cover the metadata store and the content store. MetadataPlugin stores all persistent state: container records, image manifests, snapshot references, and lease data. ContentPlugin stores the actual image layer blobs.

How is the health check done?

  • Metadata store: Uses a BoltDB read transaction (mdb.View) to verify that the database is accessible and its lock is available, and to detect potential deadlocks or I/O stalls.
  • Content store: Uses content.Store.Walk to ensure the content service and backend storage are responsive. The walk exits early after the first item, making the check lightweight while still exercising the full request path.
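The two checks above might look roughly like the sketch below. It is an illustration under stated assumptions, not the PR's implementation: `viewer` and `walker` are simplified, hypothetical stand-ins for the metadata database and the content store (the real BoltDB and containerd signatures differ), `fakeDB` and `fakeStore` exist only to make the example runnable, and the timeout wrapper is an added assumption about how a stall could be turned into an error.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// viewer is a hypothetical stand-in for a read-only transaction entry point
// such as the mdb.View call mentioned above.
type viewer interface {
	View(fn func() error) error
}

// walker is a hypothetical stand-in for the content store's Walk method.
type walker interface {
	Walk(ctx context.Context, fn func(item string) error) error
}

// errStopWalk is a sentinel used to end the walk after the first item.
var errStopWalk = errors.New("stop walk")

const (
	demoTimeout  = 2 * time.Second
	shortTimeout = 50 * time.Millisecond
)

// withTimeout runs check in a goroutine and fails if it does not return in
// time, so a stalled store surfaces as an error.
func withTimeout(timeout time.Duration, check func(ctx context.Context) error) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	done := make(chan error, 1) // buffered so the goroutine can exit after a timeout
	go func() { done <- check(ctx) }()
	select {
	case err := <-done:
		return err
	case <-ctx.Done():
		return fmt.Errorf("health check timed out after %s: %w", timeout, ctx.Err())
	}
}

// checkMetadata opens an empty read transaction to exercise the database lock.
func checkMetadata(db viewer, timeout time.Duration) error {
	return withTimeout(timeout, func(context.Context) error {
		return db.View(func() error { return nil })
	})
}

// checkContent walks the store and stops at the first item it sees.
func checkContent(store walker, timeout time.Duration) error {
	return withTimeout(timeout, func(ctx context.Context) error {
		err := store.Walk(ctx, func(string) error { return errStopWalk })
		if errors.Is(err, errStopWalk) {
			return nil // the walk reached an item, so the store responded
		}
		return err // nil for an empty store, otherwise a real failure
	})
}

// fakeDB and fakeStore are in-memory fakes used only by this demo.
type fakeDB struct{ err error }

func (d fakeDB) View(fn func() error) error {
	if d.err != nil {
		return d.err
	}
	return fn()
}

type fakeStore struct {
	items   []string
	stall   bool
	visited *int // optional counter of items passed to the callback
}

func (s fakeStore) Walk(ctx context.Context, fn func(string) error) error {
	if s.stall {
		<-ctx.Done() // simulate a backend that never answers
		return ctx.Err()
	}
	for _, item := range s.items {
		if s.visited != nil {
			*s.visited++
		}
		if err := fn(item); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	fmt.Println("metadata:", checkMetadata(fakeDB{}, demoTimeout))
	fmt.Println("content:", checkContent(fakeStore{items: []string{"a", "b"}}, demoTimeout))
	fmt.Println("stalled content:", checkContent(fakeStore{stall: true}, shortTimeout))
}
```

Returning a sentinel error from the walk callback is one common way to stop an iteration early; treating that sentinel as success keeps the check cheap even when the store holds many blobs.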

I tested this in my setup by setting WatchdogSec=60s in container.service.d.
Fixes #10329


@samuelkarp
Member

FAIL - does not have a valid DCO

CI is failing because you did not sign-off your commit. Please add the appropriate Signed-off-by line to indicate your acceptance of the Developer Certificate of Origin.

@mxpv
Member

mxpv commented Apr 8, 2026

Prior art:
#10623
#11897
#11960

@chilkaditya force-pushed the systemd-watchdog-integration-#10329 branch 2 times, most recently from 48b2843 to 9b03453 on April 9, 2026 05:39
Signed-off-by: chilkaditya <apurkahini@gmail.com>
@chilkaditya force-pushed the systemd-watchdog-integration-#10329 branch from 9b03453 to 67b6f61 on April 9, 2026 05:52
Signed-off-by: chilkaditya <apurkahini@gmail.com>
Copilot AI review requested due to automatic review settings April 10, 2026 09:24

Projects

Status: Needs Triage


4 participants