feat(coderd/healthcheck): allow configuring database hc threshold#10623
Conversation
| LatencyMs int64 `json:"latency_ms"` | ||
| ThresholdMs int64 `json:"threshold_ms"` |
There was a problem hiding this comment.
self-review: I can make these ints if preferred, but int64 is slightly less annoying to use here.
| DB database.Store | ||
| // TODO: support getting this over HTTP? | ||
| DERPMap *tailcfg.DERPMap | ||
| AccessURL *url.URL | ||
| Client *http.Client | ||
| APIKey string | ||
| AccessURL AccessURLReportOptions | ||
| Database DatabaseReportOptions | ||
| DerpHealth derphealth.ReportOptions | ||
| Websocket WebsocketReportOptions |
There was a problem hiding this comment.
self-review: Refactored this struct to directly pass in the individual checker options.
| if options.HealthcheckTimeout == 0 { | ||
| options.HealthcheckTimeout = 30 * time.Second | ||
| options.HealthcheckTimeout = options.DeploymentValues.Healthcheck.Timeout.Value() | ||
| } | ||
| if options.HealthcheckRefresh == 0 { | ||
| options.HealthcheckRefresh = 10 * time.Minute | ||
| options.HealthcheckRefresh = options.DeploymentValues.Healthcheck.Refresh.Value() |
There was a problem hiding this comment.
self-review: Also plumbed through the healthcheck timeout and refresh and made them configurable. Addresses part of #8969 but I can break this into a separate PR if preferred.
|
We should also show a hint on the database health check page on how to configure this interval. |
Right now they'll show up by default under Deployment > Observability. I can move them to a separate option group and move them to a separate page if required. I might need some design help with adding the hint to the health page though, so I'll defer that to a separate PR. |
|
I think this is fine for now. A hint could be added in a separate PR. Thank you. |
| Reachable bool `json:"reachable"` | ||
| Latency string `json:"latency"` | ||
| LatencyMS int64 `json:"latency_ms"` | ||
| ThresholdMS int64 `json:"threshold_ms"` |
There was a problem hiding this comment.
Should we add the Threshold string for consistency? I don't know what is the original though behind string+int64 pairs.
There was a problem hiding this comment.
Me either, and the fact that I don't know why this particular fence exists makes me pause before knocking it down. I do know that _ms was added post-hoc for the UI (8ee500c) so I suspect it was simply a case of not renaming a field in the API for compatibility purposes.
Instead of parsing the Go string representation (latency) it's probably better for the UI to just create the durations from millis and then display them however is deemed best. So I would argue in favour of leaving out threshold entirely.
mtojek
left a comment
There was a problem hiding this comment.
This is in good shape to get merged 👍
| # database exceeds this threshold over 5 attempts, the database is considered | ||
| # unhealthy. The default value is 15ms. | ||
| # (default: 15ms, type: duration) | ||
| thresholdDatabase: 15ms |
There was a problem hiding this comment.
I think this would be smoother as databaseThreshold vs thresholdDatabase (same with flags, etc)

Fixes #9381:
Also takes care of part of #8969: