Skip to content

Revert merging of targeted check and flaky check into one#102148

Merged
alexey-milovidov merged 3 commits intomasterfrom
revert-merge-targeted-flaky-check
Apr 10, 2026
Merged

Revert merging of targeted check and flaky check into one#102148
alexey-milovidov merged 3 commits intomasterfrom
revert-merge-targeted-flaky-check

Conversation

@alexey-milovidov
Copy link
Copy Markdown
Member

PR #99513 merged the stateless targeted check and flaky check into a single combined job. This caused all flaky check options — including Thread Fuzzer — to be applied to the targeted check, making it non-representative of the normal test environment.

This reverts the merge by restoring:

  • Separate Stateless tests (arm_asan_ubsan, targeted) job that runs coverage-relevant tests 5 times without Thread Fuzzer
  • Stateless tests (amd_binary, flaky check) job that was removed
  • Removes the Stateless tests (arm_asan_ubsan, flaky check) job that replaced both

Changelog category (leave one):

  • CI Fix or Improvement (changelog entry is not required)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

...

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

PR #99513 merged the stateless targeted check and flaky check into a
single combined job. This was a bad decision because all flaky check
options — including Thread Fuzzer — got applied to the targeted check,
making it non-representative of the normal test environment.

This reverts the merge by restoring:
- Separate `Stateless tests (arm_asan_ubsan, targeted)` job that runs
  coverage-relevant tests 5 times without Thread Fuzzer
- `Stateless tests (amd_binary, flaky check)` job that was removed
- Removes the `Stateless tests (arm_asan_ubsan, flaky check)` job that
  replaced both

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh Bot commented Apr 8, 2026

Workflow [PR], commit [0a2d468]

Summary:


AI Review

Summary

This PR restores separation between targeted and flaky check stateless jobs and reintroduces amd_binary, flaky check. There is one blocker in ci/jobs/functional_tests.py: the targeted-check soft time limit can still be bypassed when the remaining budget is already 0 before the first loop iteration, allowing another unbounded run_tests invocation.

Findings

❌ Blockers

  • [ci/jobs/functional_tests.py:600-622] Targeted-check time budget enforcement is incomplete. The guard runs only for cnt > 0, so when global_time_limit is already exhausted before iteration 0, iteration 0 still executes with global_time_limit=0, and run_tests converts that to timeout=None (no timeout). This can run a full extra clickhouse-test set after budget exhaustion.
    • Suggested fix: enforce if is_targeted_check and global_time_limit <= 0: break before every run_tests call (including the first iteration), or move recomputation/check outside the cnt > 0 gate.
ClickHouse Rules
Item Status Notes
Deletion logging
Serialization versioning
Core-area scrutiny
No test removal
Experimental gate
No magic constants
Backward compatibility
SettingsChangesHistory.cpp
PR metadata quality
Safe rollout
Compilation time
No large/binary files
Final Verdict
  • Status: ⚠️ Request changes
  • Minimum required action: fix the targeted-check time-limit guard so no run_tests invocation starts once the targeted soft budget is exhausted.

@clickhouse-gh clickhouse-gh Bot added the pr-ci label Apr 8, 2026
Comment thread ci/jobs/functional_tests.py Outdated
tests_to_run = list(tests) if tests else tests

for cnt in range(run_sets_cnt):
if global_time_limit > 0 and run_sets_cnt > 1:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The time-limit guard for targeted checks can be bypassed when the initial budget is already exhausted.

global_time_limit is initialized before the loop, and the per-iteration recomputation/break check is gated by if global_time_limit > 0 and run_sets_cnt > 1:. If setup consumed the entire targeted budget (global_time_limit == 0 before the first iteration), this condition is skipped and run_tests is still invoked with global_time_limit=0 (interpreted as no timeout in run_tests).

That means we can start another full clickhouse-test run even after the soft limit is exhausted.

Please move the targeted-check recomputation + exhaustion check outside the global_time_limit > 0 gate (or explicitly break when is_targeted_check and global_time_limit <= 0) before calling run_tests.

@alexey-milovidov alexey-milovidov requested a review from fm4v April 9, 2026 03:10
@fm4v
Copy link
Copy Markdown
Member

fm4v commented Apr 9, 2026

Disabling thread fuzzer here: #102038

alexey-milovidov and others added 2 commits April 9, 2026 20:37
…ed-flaky-check

# Conflicts:
#	.github/workflows/pull_request.yml
When `global_time_limit` is already 0 after initial setup (e.g. setup
consumed the entire budget), the guard `if global_time_limit > 0` was
never entered, allowing all subsequent test iterations to run without
a time check.

Change the loop guard to check on every iteration after the first
(`cnt > 0`) regardless of the initial budget value, so that exhausted
budgets are caught before starting extra iterations.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@alexey-milovidov alexey-milovidov merged commit 5f31c78 into master Apr 10, 2026
163 of 164 checks passed
@alexey-milovidov alexey-milovidov deleted the revert-merge-targeted-flaky-check branch April 10, 2026 02:08
@robot-ch-test-poll4 robot-ch-test-poll4 added the pr-synced-to-cloud The PR is synced to the cloud repo label Apr 10, 2026
Onyx2406 added a commit to Onyx2406/ClickHouse that referenced this pull request Apr 17, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-ci pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants