The bug we're hitting is in a different subsystem: the Usage Sanity Check (UsageSanityChecker.java). Specifically the "snapshot after removed" check, whose query is roughly:
SELECT count(*)
FROM cloud_usage.cloud_usage cu
JOIN cloud.snapshots s ON cu.usage_id = s.id
WHERE cu.usage_type = 9
AND cu.start_date > s.removed;
That is, it counts usage records in cloud_usage.cloud_usage whose start_date is AFTER the time the snapshot was soft-deleted in cloud.snapshots. This check drives the "Usage Sanity Check failed" mailer and feeds our Prometheus exporter.
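To make the condition concrete, here is a minimal sketch that runs the same query against a throwaway in-memory SQLite database. The two-table schema, the sample rows, and the helper names (`build_demo_db`, `count_snapshot_usage_after_removed`) are assumptions made up for illustration; the real tables live in MySQL and have many more columns.

```python
import sqlite3


def build_demo_db():
    """Build a hypothetical, heavily reduced stand-in for the two schemas."""
    conn = sqlite3.connect(":memory:")
    # Attach two extra in-memory databases so the schema-qualified names resolve.
    conn.execute("ATTACH DATABASE ':memory:' AS cloud")
    conn.execute("ATTACH DATABASE ':memory:' AS cloud_usage")
    conn.execute(
        "CREATE TABLE cloud.snapshots (id INTEGER PRIMARY KEY, removed TEXT)"
    )
    conn.execute(
        "CREATE TABLE cloud_usage.cloud_usage ("
        "id INTEGER PRIMARY KEY, usage_id INTEGER, "
        "usage_type INTEGER, start_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO cloud.snapshots (id, removed) VALUES (?, ?)",
        [
            (1, "2024-01-10 00:00:00"),  # soft-deleted snapshot
            (2, None),                   # snapshot that still exists
        ],
    )
    conn.executemany(
        "INSERT INTO cloud_usage.cloud_usage (usage_id, usage_type, start_date) "
        "VALUES (?, ?, ?)",
        [
            (1, 9, "2024-01-09 00:00:00"),  # starts before removal: not counted
            (1, 9, "2024-01-11 00:00:00"),  # starts after removal: counted
            (2, 9, "2024-01-11 00:00:00"),  # removed is NULL: not counted
            (1, 6, "2024-01-12 00:00:00"),  # different usage_type: not counted
        ],
    )
    return conn


def count_snapshot_usage_after_removed(conn):
    """Run the 'snapshot after removed' query quoted above and return the count."""
    row = conn.execute(
        """
        SELECT count(*)
        FROM cloud_usage.cloud_usage cu
        JOIN cloud.snapshots s ON cu.usage_id = s.id
        WHERE cu.usage_type = 9
          AND cu.start_date > s.removed
        """
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    print(count_snapshot_usage_after_removed(build_demo_db()))
```

In this toy data only one row satisfies the condition. Snapshots that were never removed drop out because a comparison against NULL is never true.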
On the running 4.22.1.0 (which includes 83ce006) we still see:
| check | count |
| --- | --- |
| snapshot_after_removed | 160 |
| volume_after_removed | 36 |
| template_after_removed | 0 |
| vm_after_destroyed | 0 |
Originally posted by @PPisz in #13398 (comment)