[adapters] checkpoint-suspend on SIGTERM #6482

Open: swanandx wants to merge 1 commit into main from checkpoint-on-sigterm

swanandx (Member) commented Jun 16, 2026

Infra-initiated termination (node drain, eviction) doesn't involve the runner, so the pipeline does not stop gracefully.

To solve this, we add an env var FELDERA_CLEAN_SHUTDOWN_ON_SIGTERM: when set, SIGTERM runs the same checkpoint-and-suspend as /suspend before stopping.

Disabled by default.
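
A minimal sketch of the opt-in gate described above, not the code in this PR. The `SigtermAction` type and `sigterm_action` function are hypothetical names used only for illustration, and the rule that the variable counts as "set" when it is present and non-empty is an assumption; the description only says "when set".

```rust
use std::env;

/// Hypothetical type: what the process does when it receives SIGTERM.
#[derive(Debug, PartialEq)]
enum SigtermAction {
    /// Run the same checkpoint-and-suspend path as `/suspend`, then stop.
    CheckpointAndSuspend,
    /// Keep the existing SIGTERM behavior (feature disabled, the default).
    ExistingBehavior,
}

/// Decide the action from the raw value of the environment variable.
/// Assumption: "set" means present and non-empty.
fn sigterm_action(env_value: Option<&str>) -> SigtermAction {
    match env_value {
        Some(v) if !v.is_empty() => SigtermAction::CheckpointAndSuspend,
        _ => SigtermAction::ExistingBehavior,
    }
}

fn main() {
    // Read the variable once at startup; an unset variable maps to `None`.
    let value = env::var("FELDERA_CLEAN_SHUTDOWN_ON_SIGTERM").ok();
    println!("{:?}", sigterm_action(value.as_deref()));
}
```

Taking the value as an `Option<&str>` keeps the decision testable without mutating the process environment.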

Tested with the local runner and FELDERA_CLEAN_SHUTDOWN_ON_SIGTERM=1: started a pipeline, then sent kill -TERM <pid>.

Breaking Changes?

Not a breaking change.

mythical-fred left a comment

Useful feature, and the refactor of trigger_suspend looks faithful to the original. A few blocking items before approval:

  • New user-visible env var FELDERA_CLEAN_SHUTDOWN_ON_SIGTERM with no documentation. This belongs in docs.feldera.com/operations/ (Kubernetes/deployment section), together with guidance on terminationGracePeriodSeconds.
  • New signal-shutdown behavior with no tests and no "Manual testing" section in the PR description. At minimum a brief description of how this was exercised (e.g. kill -TERM on a running pipeline and observed checkpoint completing) is needed.

Non-blocking observations inline (K8s grace-period interaction, unbounded wait loop, SIGHUP/SIGQUIT no longer handled when disable_signals() is called, expect() defensiveness, macro-hygiene workaround).
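
To illustrate the grace-period and unbounded-wait observations: Kubernetes sends SIGKILL once terminationGracePeriodSeconds elapses, so a wait for the checkpoint is safer with a deadline. The following is a sketch under stated assumptions, not the loop in server.rs. `WaitOutcome` and `wait_for_checkpoint` are hypothetical names, and completion is modeled as a message on a standard-library channel.

```rust
use std::sync::mpsc::{self, Receiver, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Hypothetical result of waiting for the checkpoint to finish.
#[derive(Debug, PartialEq)]
enum WaitOutcome {
    Completed,
    TimedOut,
    /// The sender was dropped without signalling completion.
    WorkerGone,
}

/// Wait for a completion signal, but never longer than `budget`.
/// The budget should stay below the pod's termination grace period.
fn wait_for_checkpoint(done: &Receiver<()>, budget: Duration) -> WaitOutcome {
    match done.recv_timeout(budget) {
        Ok(()) => WaitOutcome::Completed,
        Err(RecvTimeoutError::Timeout) => WaitOutcome::TimedOut,
        Err(RecvTimeoutError::Disconnected) => WaitOutcome::WorkerGone,
    }
}

fn main() {
    let (tx, rx) = mpsc::channel();
    // Stand-in for the checkpoint: a thread that finishes after 50 ms.
    let worker = thread::spawn(move || {
        thread::sleep(Duration::from_millis(50));
        let _ = tx.send(());
    });
    let outcome = wait_for_checkpoint(&rx, Duration::from_secs(2));
    worker.join().unwrap();
    println!("{:?}", outcome);
}
```

On `TimedOut` the caller can log and exit instead of being killed mid-wait, which makes the failure visible in the logs.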

Inline comment threads on crates/adapters/src/server.rs: 5 (2 marked Outdated)

Infra-initiated termination (node drain, eviction) doesn't involve the
runner, so the pipeline does not stop gracefully.

To solve this, we add an env var FELDERA_CLEAN_SHUTDOWN_ON_SIGTERM: when set, SIGTERM runs the same checkpoint-and-suspend as /suspend before stopping.

disabled by default.

Signed-off-by: Swanand Mulay <73115739+swanandx@users.noreply.github.com>
@swanandx swanandx force-pushed the checkpoint-on-sigterm branch from cbae24b to b00816c Compare June 16, 2026 07:36
2 participants