Codecov Report
Attention: Patch coverage is

Additional details and impacted files

```
@@            Coverage Diff             @@
##           master    #7484      +/-   ##
==========================================
- Coverage   81.58%   81.50%   -0.08%
==========================================
  Files         357      357
  Lines       27243    27294      +51
==========================================
+ Hits        22227    22247      +20
- Misses       3820     3831      +11
- Partials     1196     1216      +20
```
Also, please add a more descriptive release note entry.
```go
pr := "fail"
if _, ok := status.FromError(err); ok {
	pr = "drop"
}
return res, func() {
	targetPicksMetric.Record(p.metricsRecorder, 1, p.grpcTarget, p.rlsServerTarget, cpw.target, pr)
}, err
```
How about making a helper since this logic is used in two places now:

```go
// errToPickResult is a helper function which converts the error value returned
// by Pick() to a string that represents the pick result.
func errToPickResult(err error) string {
	if err == nil {
		return "complete"
	}
	if errors.Is(err, balancer.ErrNoSubConnAvailable) {
		return "queue"
	}
	if _, ok := status.FromError(err); ok {
		return "drop"
	}
	return "fail"
}
```

And we need to handle the balancer.ErrNoSubConnAvailable case in your code, don't we? Otherwise, we will wrongly mark picks that a child policy wanted to be queued as "fail".
And if the decision is to not record a metric for queued picks, we need to make sure that we ignore that explicitly after we handle the balancer.ErrNoSubConnAvailable case, as in the helper I described above.
Oh, for some reason I thought balancer.ErrNoSubConnAvailable could only be emitted from this RLS layer. I forgot it could also come from the child in these error cases. I will try your helper out.
Also, how do you plan to test these changes?
Thanks for the pass; great suggestions. I plan on testing this with unit tests for more specific scenarios/counters, and e2e with RLS deployed as a top-level balancer of a channel, with an OpenTelemetry component configured. I can then verify that OpenTelemetry eventually emits the expected metrics atoms for the RLS balancer metrics.
```go
rf := func() {
	targetPicksMetric.Record(p.metricsRecorder, 1, p.grpcTarget, p.rlsServerTarget, cpw.target, pr)
}
if pr == "queue" {
	// Don't record metrics for queued Picks.
	rf = func() {}
}
```
Here and down below, I would pull the check for queued picks inside the callback:

```go
rf := func() {
	if pr == "queue" {
		// Don't record metrics for queued Picks.
		return
	}
	targetPicksMetric.Record(p.metricsRecorder, 1, p.grpcTarget, p.rlsServerTarget, cpw.target, pr)
}
```
The above approach would mean that for completed and failed picks, we incur the extra cost of running the `if pr == "queue" { ... }` conditional, and branch statements can be expensive with CPU pipelining. I'm not too concerned about it, though; I prefer the way the code looks when you pull it in. But if you or @dfawley think the cost is not worth the ergonomics, I'm fine with it.
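The closure pattern being discussed can be sketched with stdlib-only stand-ins. `recordPick` is a hypothetical placeholder for `targetPicksMetric.Record`; the point is that the queue check lives inside the returned done callback rather than swapping in an empty callback at the call site.

```go
package main

import "fmt"

// picks counts recorded pick results; recordPick is a stand-in for
// targetPicksMetric.Record in this sketch.
var picks = map[string]int{}

func recordPick(result string) { picks[result]++ }

// doneCallback builds the pick's done callback with the queue check
// folded inside, as suggested in the review.
func doneCallback(pr string) func() {
	return func() {
		if pr == "queue" {
			// Don't record metrics for queued picks.
			return
		}
		recordPick(pr)
	}
}

func main() {
	for _, pr := range []string{"complete", "queue", "fail", "drop"} {
		doneCallback(pr)()
	}
	fmt.Println(picks["complete"], picks["queue"], picks["fail"], picks["drop"])
}
```

The trade-off mentioned above is visible here: every non-queued pick pays for one extra branch, but the caller no longer needs two callback variants.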
Doug said he doesn't mind either, so I switched to this.
This PR adds the implementation for RLS metrics.
RELEASE NOTES: