Merged
Conversation
Codecov Report
@@ Coverage Diff @@
## main #102 +/- ##
============================================
- Coverage 81.59% 81.28% -0.31%
Complexity 232 232
============================================
Files 28 28
Lines 788 791 +3
Branches 57 57
============================================
Hits 643 643
- Misses 97 100 +3
Partials 48 48
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
This comment has been minimized.
This comment has been minimized.
surajpuvvada
approved these changes
Aug 26, 2021
tim-mwangi
approved these changes
Aug 26, 2021
Contributor
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
After deploying the previous probe changes in a more complex environment with a service mesh, instrumentation and constrained resources (limit around 400m instead of the default 2 cpus). I found the default values were a little ambitious. The grpc-based check is significantly more sensitive to resource constraints then the previous checks, which from my perspective is a good thing and reflective of its increased fidelity - that is, the health check reflects the response time the service clients would see.
In order to account for that, I took the following changes:
As a separate exercise, it might be worth exploring why the cpu usage on attribute service is so bursty, as it does really minimal compute. It appears to be correlated with a synchronized burst of traffic, taking quite a bit of time to stabilize to its baseline (significantly longer than any call should take). Below log, with some temporary messaging enabled for illustration, shows this behavior.
Testing
Deployed and tuned this a bit in a test environment running with a mesh and constrained resources.