The pipeline /checkpoint API is synchronous: it triggers a checkpoint and returns only when the checkpoint succeeds or fails. A checkpoint can take a long time, so it would be better to instead have it return when the checkpoint has been initiated and then use an asynchronous mechanism to report that the checkpoint is complete.
Design proposal
Serial numbers can work OK for this kind of API. For example, "/checkpoint" can return the next serial number. Then we'd add a few items to status reports:
last_succeeded, the serial number of the last checkpoint operation that succeeded.
last_failed, the serial number of the last checkpoint operation that failed, along with the error message associated with that failure.
This serves the usual goal of a caller, which is to find out whether a checkpoint has been successfully written since the time it was requested. If seq is the serial number that /checkpoint returned, the answer is yes when last_succeeded >= seq.
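As a rough illustration of how a caller might interpret these fields, here is a minimal Python sketch. The `CheckpointStatus` class, the `last_failed_error` field name, and the `checkpoint_outcome` helper are all hypothetical; the proposal only names last_succeeded and last_failed. The failure branch also assumes that checkpoint operations complete in serial-number order, which the proposal does not state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckpointStatus:
    # Hypothetical shape for the checkpoint part of a status report.
    # None means no checkpoint has succeeded (or failed) yet.
    last_succeeded: Optional[int] = None
    last_failed: Optional[int] = None
    last_failed_error: Optional[str] = None  # assumed field name


def checkpoint_outcome(seq: int, status: CheckpointStatus) -> str:
    """Classify the checkpoint requested with serial number `seq`."""
    # A checkpoint at or after seq succeeded, so one has been written
    # since the request was made.
    if status.last_succeeded is not None and status.last_succeeded >= seq:
        return "succeeded"
    # Assumption: operations complete in serial order, so a failure at or
    # after seq with no success at or after seq means the request failed.
    if status.last_failed is not None and status.last_failed >= seq:
        return f"failed: {status.last_failed_error}"
    # Neither field has reached seq yet: keep polling.
    return "pending"
```

A caller would keep the serial number returned by /checkpoint and re-evaluate `checkpoint_outcome` on each status report until the result is no longer "pending". Checking success first means a later successful checkpoint satisfies the caller even if an earlier attempt failed.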