Skip to content

feat(coderd): add provisioner_daemons to /debug/health endpoint#11393

Merged
johnstcn merged 18 commits into
mainfrom
cj/provisionerd-healthcheck
Jan 8, 2024
Merged

feat(coderd): add provisioner_daemons to /debug/health endpoint#11393
johnstcn merged 18 commits into
mainfrom
cj/provisionerd-healthcheck

Conversation

@johnstcn

@johnstcn johnstcn commented Jan 3, 2024

Copy link
Copy Markdown
Member

Adds a healthcheck for provisioner daemons to /debug/health endpoint.

Part of #10676

@johnstcn johnstcn self-assigned this Jan 3, 2024
@johnstcn johnstcn force-pushed the cj/provisionerd-healthcheck branch from 1b2bc9a to 53ed901 Compare January 4, 2024 16:33
@johnstcn johnstcn changed the base branch from main to cj/util-apiversion January 4, 2024 16:34
Base automatically changed from cj/util-apiversion to main January 5, 2024 10:22
@johnstcn johnstcn force-pushed the cj/provisionerd-healthcheck branch from 53ed901 to b83013b Compare January 5, 2024 12:19
@johnstcn johnstcn changed the title WIP: add version healthcheck for provisioner daemons feat(coderd): add provisioner_daemons to /debug/health endpoint Jan 5, 2024
@johnstcn johnstcn marked this pull request as ready for review January 5, 2024 12:31

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you understand why this keeps getting marked as changed?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's the timestamp on the files changing; I haven't yet figured out a way to ignore this.

Comment thread coderd/healthcheck/provisioner.go Outdated
Comment thread coderd/healthcheck/provisioner.go Outdated
Comment thread coderd/healthcheck/provisioner.go Outdated
Comment thread coderd/healthcheck/provisioner.go Outdated
Comment thread coderd/healthcheck/provisioner.go Outdated
Comment thread coderd/healthcheck/healthcheck.go Outdated
Comment thread coderd/healthcheck/provisioner.go Outdated
Comment thread coderd/coderd.go Outdated
Comment thread coderd/healthcheck/provisioner.go Outdated

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking out loud, is it possible that a daemon reports an error, but after that, the last seen isn't updated. Then after stale interval the error disappears, and perhaps nobody ever notices it?

Also, don't we want to apply the same rules to r.ProvisionerDaemons?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible that a daemon reports an error, but after that, the last seen isn't updated. Then after stale interval the error disappears, and perhaps nobody ever notices it?

Yes, and this is by design. If a provisioner daemon connects at some point, has a transient error, and then never heartbeats, I don't think it makes sense to warn about this.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fair. I was entertaining the possibility of some error state resulting in the provisioner exiting, crashing or plain stopping communication with server. But I didn't have anything concrete in mind, so this is fine.

Comment thread coderd/coderd.go Outdated
Comment thread coderd/healthcheck/provisioner.go Outdated

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels more like Config instead of Deps, but I'll leave it up to you.

Comment thread coderd/healthcheck/provisioner.go Outdated
@johnstcn johnstcn force-pushed the cj/provisionerd-healthcheck branch from 62d42e8 to 19e896a Compare January 8, 2024 09:10
@johnstcn johnstcn merged commit 04fd96a into main Jan 8, 2024
@johnstcn johnstcn deleted the cj/provisionerd-healthcheck branch January 8, 2024 09:29
@github-actions github-actions Bot locked and limited conversation to collaborators Jan 8, 2024
@mtojek

mtojek commented Jan 8, 2024

Copy link
Copy Markdown
Member

@johnstcn Nice contribution! Could you please link it with the umbrella issue and fill in the PR description?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants