Add checkpoint info to /stats API endpoint, add checkpoint info UI#6003
Add checkpoint info to /stats API endpoint, add checkpoint info UI#6003Karakatiza666 wants to merge 6 commits intomainfrom
Conversation
mythical-fred
left a comment
There was a problem hiding this comment.
Two design questions, see inline.
crates/adapters/src/controller.rs
Outdated
|
|
||
| /// Converts epoch milliseconds to a `DateTime<Utc>`, falling back to | ||
| /// `Utc::now()` if the value is 0 or invalid. | ||
| fn epoch_ms_to_datetime(epoch_ms: i64) -> DateTime<Utc> { |
There was a problem hiding this comment.
If epoch_ms is 0/invalid, returning Utc::now() will make the UI show a fresh timestamp even when we don’t actually know the start time. Consider returning an Option (and omitting the field) or a sentinel so the UI can show "unknown" instead of a misleading "just now".
There was a problem hiding this comment.
The epoch_ms should not happen at runtime; updated the comments and the code to reflect that. Since this is a hard rule I opted for an explicit crash if this occurs, LMK if you think it's better to bubble it up as an error
crates/adapters/src/controller.rs
Outdated
| let epoch_ms = self.checkpoint_started_epoch_ms(); | ||
| let started_at = epoch_ms_to_datetime(epoch_ms); | ||
| CheckpointActivity::InProgress { | ||
| sequence_number: 0, |
There was a problem hiding this comment.
sequence_number: 0 looks like a placeholder. If we can’t plumb the real seq here, consider making this field optional or dropping it from CheckpointActivity::InProgress for now to avoid exposing bogus data.
There was a problem hiding this comment.
Dropped sequence_number because it is not available in all cases, and we don't have a business case to display it for an in-progress checkpoint anyway
|
please engage with anna for the UX |
|
Some feedback on UI:
|
9f168a8 to
301b664
Compare
blp
left a comment
There was a problem hiding this comment.
I looked at the Rust code and it seems OK. I'll leave approval for whoever reviews rest of the PR when it becomes ready for review.
crates/adapters/src/controller.rs
Outdated
| /// Is the circuit thread still restoring from a checkpoint (this includes the journal replay phase)? | ||
| restoring: AtomicBool, | ||
|
|
||
| /// Wall-clock epoch millis when the checkpoint entered the Delayed or |
There was a problem hiding this comment.
I think that it is premature optimization to use AtomicI64 for these. I think that Mutex<Option<DateTime<Utc>>> would be cleaner.
Signed-off-by: Karakatiza666 <bulakh.96@gmail.com>
…timestamp Signed-off-by: Karakatiza666 <bulakh.96@gmail.com>
Signed-off-by: Karakatiza666 <bulakh.96@gmail.com>
…rted: Mutex<Option<DateTime<Utc>>> Signed-off-by: Karakatiza666 <bulakh.96@gmail.com>
0fa760f to
a76fc27
Compare
|
The current design is intended to be merged as-is; the nicer design from our design team will be implemented when ready and accepted. |
| &self, | ||
| suspend_error: Result<(), SuspendError>, | ||
| checkpoint_activity: CheckpointActivity, | ||
| permanent_checkpoint_errors: Option<Vec<PermanentSuspendError>>, |
There was a problem hiding this comment.
this type is awfully specific, is this the only type of error possible?
There was a problem hiding this comment.
PermanentSuspendError ? Yes, it covers a few possibilities. This is a type of errors that don't ever allow checkpointing the pipeline.
| /// Wall-clock time when the checkpoint entered the Delayed or | ||
| /// Barriers state. `None` means "not delayed". Set by the circuit thread | ||
| /// in `set_checkpoint_coordination`, read by the HTTP thread. | ||
| checkpoint_delay_started: Mutex<Option<DateTime<Utc>>>, |
There was a problem hiding this comment.
I am not an expert in Rust concurrency, someone else will have to comment whether this is the right type to use here
crates/adapters/src/server.rs
Outdated
| next_seq: u64, | ||
|
|
||
| /// Sequence number of the currently in-flight checkpoint, if any. | ||
| /// This is only valid when exactly one checkpoint runs at a time. |
There was a problem hiding this comment.
Is it possible for many checkpoints to be running at the same time? I thought it's not.
There was a problem hiding this comment.
We only write one checkpoint at a time.
| /// | ||
| /// This tracks transient checkpoint failures (e.g. I/O errors during | ||
| /// writing). A subsequent successful checkpoint will not clear this | ||
| /// field — it always reflects the *last* failure that occurred. |
There was a problem hiding this comment.
the field should be named "last_failure", but I don't know if you can rename it
| Delayed { | ||
| /// Why the checkpoint cannot proceed yet. | ||
| reasons: Vec<TemporarySuspendError>, | ||
| /// When the delay started (ISO 8601). |
There was a problem hiding this comment.
What ISO 8601? It's a DateTime, 8601 is a serialization format. Maybe you want to say "serialized as ISO 8601"
There was a problem hiding this comment.
This comment is applied as a description in openapi.json, but adding "serialized as" makes it less ambiguous either way
| Used storage: {humanSize(metrics.at(-1)?.s.toNumber() ?? 0)} | ||
| <div class="flex flex-nowrap items-center justify-between px-4 pb-2"> | ||
| <span> | ||
| <span class="hidden sm:inline">Used storage:</span> |
There was a problem hiding this comment.
I don't know what these two spans are, in the pictures you attached I only see one
There was a problem hiding this comment.
One is shown on desktop screens, one (shorter) is shown when on mobile
| if (anchorOffscreen) { | ||
| const focusOffscreen = | ||
| tracked.focus.row < visible.min || tracked.focus.row > visible.max | ||
| const focusOffscreen = tracked.focus.row < visible.min || tracked.focus.row > visible.max |
There was a problem hiding this comment.
why these formatting changes?
Isn't the formatting made by a tool?
There was a problem hiding this comment.
It is. This one probably was committed without running through the formatter
|
|
||
| /** | ||
| * Optional settings for tweaking Feldera internals. | ||
| * |
There was a problem hiding this comment.
looks to me that some of these changes should have been in other PRs.
Is this a sign that our tooling for generating these gen.ts files is not complete?
There was a problem hiding this comment.
When people update openapi.json they don't usually re-generate typescript bindings from it; so when I periodically re-generate it it often contains cumulative additions from multiple PRs
| }) | ||
| } | ||
| fetchCheckpoints() | ||
| const interval = setInterval(fetchCheckpoints, 2_000) |
There was a problem hiding this comment.
Are there other polling loops that can be combined with this one?
I don't know if we tested scaling of the manager and ui with lots of pipelines and ui clients.
There was a problem hiding this comment.
I don't see a reason to combine this loop with any other. This is the only access for this endpoint in web-console
|
|
||
| /** Extracts the timestamp from a UUID v7 (first 48 bits = unix ms). | ||
| * Returns null for non-v7 UUIDs. */ | ||
| const uuidV7Timestamp = (uuid: string): string | null => { |
There was a problem hiding this comment.
don't you have a library with utility functions?
this would belong there
There was a problem hiding this comment.
I sometimes don't put utiliities there when there's only a single use for them. Will move
mythical-fred
left a comment
There was a problem hiding this comment.
Two blockers.
-
This PR changes behavior in both the adapters/API layer and the web-console, but it still adds no unit or integration coverage. The new checkpoint activity/status paths, the extra polling loop, and the new UI states all need tests before this is safe to merge. For the UI, please add Vitest + @testing-library/svelte coverage rather than relying only on manual screenshots.
-
This is a user-visible API/UI change (new checkpoint fields, new checkpoint status presentation, new checkpoint actions), but there is still no corresponding docs update under docs.feldera.com/. That is a hard rule for user-visible changes.
Signed-off-by: Karakatiza666 <bulakh.96@gmail.com>
Signed-off-by: Karakatiza666 <bulakh.96@gmail.com>
|
Thanks for the improvements. |

Testing:
Tested the UI using the mock API value simulator. Did not test the adapters and pipeline-manager changes