Codecov Report
Attention: Patch coverage is

Additional details and impacted files

```
@@            Coverage Diff             @@
##           master    #7484      +/-   ##
==========================================
- Coverage   81.58%   81.50%   -0.08%
==========================================
  Files         357      357
  Lines       27243    27294      +51
==========================================
+ Hits        22227    22247      +20
- Misses       3820     3831      +11
- Partials     1196     1216      +20
```
Also, please add a more descriptive release note entry.
```go
pr := "fail"
if _, ok := status.FromError(err); ok {
	pr = "drop"
}
return res, func() {
	targetPicksMetric.Record(p.metricsRecorder, 1, p.grpcTarget, p.rlsServerTarget, cpw.target, pr)
}, err
```
How about making a helper since this logic is used in two places now:

```go
// errToPickResult is a helper function which converts the error value returned
// by Pick() to a string that represents the pick result.
func errToPickResult(err error) string {
	if err == nil {
		return "complete"
	}
	if errors.Is(err, balancer.ErrNoSubConnAvailable) {
		return "queue"
	}
	if _, ok := status.FromError(err); ok {
		return "drop"
	}
	return "fail"
}
```

And we need to handle the balancer.ErrNoSubConnAvailable case in your code, don't we? Otherwise, we will wrongly mark picks that a child policy wanted to be queued as "fail".
And if the decision is to not record a metric for queued picks, we need to make sure that we ignore that explicitly after we handle the balancer.ErrNoSubConnAvailable case, as in the helper I described above.
Oh, for some reason I thought balancer.ErrNoSubConnAvailable could only be emitted from this RLS layer. I forgot it could also come from the child in these error cases. I will try your helper out.
Also, how do you plan to test these changes?
Thanks for the pass; great suggestions. I plan on testing this with unit tests for more specific scenarios/counters, and e2e with RLS deployed as a top-level balancer of a channel, with an OpenTelemetry component configured. I can then verify that OpenTelemetry eventually emits the expected metrics atoms for the RLS balancer metrics.
```go
rf := func() {
	targetPicksMetric.Record(p.metricsRecorder, 1, p.grpcTarget, p.rlsServerTarget, cpw.target, pr)
}
if pr == "queue" {
	// Don't record metrics for queued Picks.
	rf = func() {}
}
```
Here and down below, I would pull the check for queued picks inside the callback:

```go
rf := func() {
	if pr == "queue" {
		// Don't record metrics for queued Picks.
		return
	}
	targetPicksMetric.Record(p.metricsRecorder, 1, p.grpcTarget, p.rlsServerTarget, cpw.target, pr)
}
```
The above approach would mean that for completed and failed picks, we incur the extra cost of running the `if pr == "queue" { ... }` conditional, and branch statements can be expensive with CPU pipelining. I'm not too concerned about it, though; I prefer the way the code looks when you pull it in. But if you or @dfawley think the cost is not worth the ergonomics, I'm fine with it.
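The closure pattern being discussed can be sketched with stdlib-only stand-ins. `recordPick` is a hypothetical placeholder for `targetPicksMetric.Record`; the point is that the queue check lives inside the returned done callback rather than swapping in an empty callback at the call site.

```go
package main

import "fmt"

// picks counts recorded pick results; recordPick is a stand-in for
// targetPicksMetric.Record in this sketch.
var picks = map[string]int{}

func recordPick(result string) { picks[result]++ }

// doneCallback builds the pick's done callback with the queue check
// folded inside, as suggested in the review.
func doneCallback(pr string) func() {
	return func() {
		if pr == "queue" {
			// Don't record metrics for queued picks.
			return
		}
		recordPick(pr)
	}
}

func main() {
	for _, pr := range []string{"complete", "queue", "fail", "drop"} {
		doneCallback(pr)()
	}
	fmt.Println(picks["complete"], picks["queue"], picks["fail"], picks["drop"])
}
```

The trade-off mentioned above is visible here: every non-queued pick pays for one extra branch, but the caller no longer needs two callback variants.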
Doug said he doesn't mind either, so I switched to this.
This PR adds the implementation for RLS metrics.
RELEASE NOTES: