Skip to content

refactor(tower-runtime): structured AppCompletion / Status::Failed plumbing#274

Merged
bradhe merged 1 commit into
developfrom
feature/tow-2020-surface-waiter-closed-cause
May 14, 2026
Merged

refactor(tower-runtime): structured AppCompletion / Status::Failed plumbing#274
bradhe merged 1 commit into
developfrom
feature/tow-2020-surface-waiter-closed-cause

Conversation

@bradhe
Copy link
Copy Markdown
Contributor

@bradhe bradhe commented May 12, 2026

Summary

LocalApp::start previously spawned execute_local_app with a oneshot::Sender<i32>; every ? early-return in that task dropped the sender silently, and LocalApp::status() reported the resulting closed channel as Error::WaiterClosed — losing the actual cause (env join errors, tower_uv::Error variants, panics, etc.).

This PR replaces the channel payload with an internal AppCompletion enum and a new Status::Failed(AppFailure) shape that preserves the structured error.

Changes

  • AppCompletion (internal to local.rs) has Exit(i32), Cancelled, Failed(AppFailure) variants. The spawn wrapper in LocalApp::start runs inner_execute_local_app inside AssertUnwindSafe(...).catch_unwind() and always sends an AppCompletion before exiting — a closed waiter channel can no longer mask a real failure.

  • Status::Failed(AppFailure) replaces the old Status::Failed { error_code: String, error_message: String }. AppFailure has:

    • Runtime(Error) — structured tower_runtime error.
    • Panic(String)catch_unwind payload (the panic loses type info, so a string is the most we can carry).
    • Platform { error_code, error_message } — opaque strings for callers that construct Status::Failed from outside tower_runtime (e.g. Kubernetes pod-status reconciliation in downstream consumers).

    Stringification (mapping Error variants to a stable error_code, debug-formatting the message) is the consumer's responsibility, not the runtime's.

  • Status::Cancelled is a new terminal variant for explicit cancellation via the CancellationToken. Previously cancellation was represented as Crashed { code: -1 }, indistinguishable from an app that legitimately exits with code 255 / -1 or an IO error in wait_for_process. inner_execute_local_app now returns Err(Error::Cancelled) on every cancellation path (four prologue checks + three post-wait_for_process checks that disambiguate cancellation from a child's genuine non-zero exit), and the spawn wrapper special-cases that into AppCompletion::Cancelled.

  • Error gains Clone + PartialEq + Eq (all variants are unit, so this is free) so Status can keep deriving Clone + PartialEq, which is required by status caching in LocalApp::status() and by existing integration tests.

  • tower-cmd (the only other in-crate consumer of Status::Failed) is updated for the new variant shape and prints a dedicated "cancelled" message for Status::Cancelled.

What stays the same

  • uv venv / uv sync non-zero exits still surface as Status::Crashed { code } (verified by test_abort_on_dependency_installation_failure). Status::Failed is reserved for platform failures where the child process never ran.

  • Error::WaiterClosed is preserved but now only fires when the spawn was aborted or the runtime was dropped — both real bugs worth surfacing as-is.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 12, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1afb8566-688c-4132-8e40-33a99aaeaf0b

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feature/tow-2020-surface-waiter-closed-cause

Comment @coderabbitai help to get the list of available commands and usage tips.

…umbing

`LocalApp::start` previously spawned `execute_local_app` with a
`oneshot::Sender<i32>`; every `?` early-return in that task dropped the
sender silently, and `LocalApp::status()` reported the resulting closed
channel as `Error::WaiterClosed` — losing the actual cause (env join
errors, `tower_uv::Error` variants, panics, etc.).

Replace the channel payload with an internal `AppCompletion` enum and a
new `Status::Failed(AppFailure)` shape that preserves the structured
error. Highlights:

  - `AppCompletion` (internal) has `Exit(i32)`, `Cancelled`,
    `Failed(AppFailure)` variants. The spawn wrapper in `LocalApp::start`
    runs `inner_execute_local_app` inside `AssertUnwindSafe(...).catch_unwind()`
    and always sends an `AppCompletion` before exiting, so a closed waiter
    channel can no longer mask a real failure.

  - `Status::Failed(AppFailure)` replaces the old
    `Status::Failed { error_code, error_message }` strings.
    `AppFailure` has:
      * `Runtime(Error)` — structured runtime error
      * `Panic(String)` — `catch_unwind` payload
      * `Platform { error_code, error_message }` — opaque strings for
        callers that construct `Status::Failed` from outside
        `tower_runtime` (e.g. Kubernetes pod-status reconciliation).
    Stringification (`error_code` / `error_message`) is the consumer's
    responsibility, not the runtime's.

  - `Status::Cancelled` is a new terminal variant for explicit
    cancellation via the `CancellationToken`. Previously cancellation
    was represented as `Crashed { code: -1 }`, indistinguishable from
    an app that legitimately exits with code 255 / -1 or an IO error
    in `wait_for_process`. `inner_execute_local_app` now returns
    `Err(Error::Cancelled)` on every cancellation path (4 prologue
    checks + 3 post-`wait_for_process` checks that disambiguate
    cancellation from a child's genuine non-zero exit), and the spawn
    wrapper special-cases that into `AppCompletion::Cancelled`.

  - `Error` gains `Clone + PartialEq + Eq` (all variants are unit, so
    this is free) so `Status` can keep deriving `Clone + PartialEq`,
    which is required by status caching in `LocalApp::status()` and by
    existing integration tests.

  - `tower-cmd` (the only other in-crate consumer of `Status::Failed`)
    is updated for the new variant shape and prints a dedicated
    "cancelled" message for `Status::Cancelled`.

  - The integration test `test_abort_on_dependency_installation_failure`
    keeps asserting that `uv sync` non-zero exits surface as
    `Status::Crashed { code }` — `Status::Failed` is reserved for
    platform failures where the child never ran.

`Error::WaiterClosed` now only fires when the spawn was aborted or the
runtime was dropped — both real bugs worth surfacing as-is.
@bradhe bradhe force-pushed the feature/tow-2020-surface-waiter-closed-cause branch from ca64d62 to 744f6d1 Compare May 13, 2026 09:53
@bradhe bradhe changed the title fix(tower-runtime): surface real cause of WaiterClosed via AppCompletion channel refactor(tower-runtime): structured AppCompletion / Status::Failed plumbing May 13, 2026
@bradhe bradhe requested review from sammuti and socksy May 13, 2026 13:08
@bradhe bradhe merged commit fb2ced6 into develop May 14, 2026
33 checks passed
@bradhe bradhe deleted the feature/tow-2020-surface-waiter-closed-cause branch May 14, 2026 11:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants