[fix](cloud) Avoid recycle the instance which has successor#62234
Merged
gavinchou merged 1 commit intoapache:masterfrom Apr 9, 2026
Merged
Conversation
Because the successor instance may still need the data
Contributor
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
Contributor
Author
|
run buildall |
Contributor
Cloud UT Coverage ReportIncrement line coverage Increment coverage report
|
Contributor
|
/review |
Hastyshell
approved these changes
Apr 9, 2026
Contributor
|
PR approved by anyone and no changes requested. |
Contributor
There was a problem hiding this comment.
Found 1 issue.
- High:
cloud/src/recycler/recycler.cppnow skips KV deletion whenever the successor instance key still exists, but I could not find any production path that ever clearssuccessor_instance_idon the deleted source instance after the handoff finishes.SnapshotManager::decouple_instance()only clearssource_instance_id/source_snapshot_idon the successor, and the onlyset_successor_instance_id()usage in this repo is the test setup. That means a deleted source instance will keep returning early here for as long as its successor instance exists, so its meta/versioned KV ranges are never reclaimed. This fixes the immediate data-loss concern by turning it into a permanent recycle leak.
Critical checkpoint conclusions:
- Goal of the task: protect metadata still needed by a rollback successor. Current code does not achieve that safely end-to-end because it blocks recycler progress on successor existence alone, and there is no test proving eventual cleanup.
- Small/clear/focused: the patch is small, but the lifecycle condition is incomplete.
- Concurrency: no new in-memory lock ordering issue found; this path uses txn-kv operations only.
- Lifecycle management: this is the main problem. The deleted-source/successor lifecycle is not closed because
successor_instance_idis never cleared in production code. - Configuration: no new config.
- Compatibility: no FE/BE or storage-format compatibility issue introduced.
- Parallel code paths: no equivalent recycler path appears to handle this cleanup elsewhere.
- Conditional checks: the new guard lacks a proven terminal condition; mere successor existence is too broad.
- Test coverage: insufficient. Existing test coverage only exercises resource-manager refresh behavior, not deleted-instance recycling or eventual cleanup.
- Observability: logging is acceptable for diagnosing this path.
- Transaction/persistence: yes, this changes persistent cleanup behavior, and the new condition can leave persistent KV data behind indefinitely.
- Data write/modification safety: not safe from a cleanup-progress perspective because deleted instances can become unrecyclable forever.
- FE/BE variable propagation: not applicable.
- Performance: repeated recycler scans on permanently skipped deleted instances can accumulate avoidable background work.
- Other issues: none beyond the finding above.
Recommend guarding on the actual remaining dependency, or clearing successor_instance_id when the successor no longer needs the source instance metadata, and adding a recycler test for that lifecycle.
github-actions Bot
pushed a commit
that referenced
this pull request
Apr 9, 2026
Because the successor instance may still need the data
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
Because the successor instance may still need the data
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)