refactor(gax): implement clean stateless channel refresh & retry mapping#13342
refactor(gax): implement clean stateless channel refresh & retry mapping#13342lqiu96 wants to merge 4 commits into
Conversation
|
|
||
| t = new com.google.api.gax.retrying.ServerStreamingAttemptException( | ||
| cause, | ||
| true, // canResume = true |
There was a problem hiding this comment.
probably should be resumptionStrategy.canResume() instead of hard coding true
There was a problem hiding this comment.
Code Review
This pull request introduces support for dynamic mTLS certificate rotation across both gRPC and HTTP/JSON transports by enabling automatic channel refreshes upon detecting certificate changes or encountering UnauthenticatedException errors. Key changes include adding fingerprint-based certificate checks to ChannelPool, introducing a reference-counted RefreshingHttpJsonChannel, and propagating TransportChannel through ApiCallContext to allow retry algorithms to trigger refreshes. The review identified several critical issues: a potential NullPointerException in GrpcCallContext.withTransportChannel due to a shadowed variable, an infinite loop risk in RefreshingHttpJsonChannel.getRetainedEntry() during shutdown, ignored configuration paths in empty if blocks for GOOGLE_API_CERTIFICATE_CONFIG, and a channel resurrection bug in refresh() after shutdown.
| retrySettings, | ||
| retryableCodes, | ||
| endpointContext, | ||
| transportChannel.isDirectPath()); | ||
| transportChannel.isDirectPath(), | ||
| inputChannel); |
There was a problem hiding this comment.
The parameter of withTransportChannel was renamed from transportChannel to inputChannel, but the method body still references transportChannel.isDirectPath(). This resolves to the nullable class field this.transportChannel instead of the new parameter, which will cause a NullPointerException if the field is null, or use the wrong channel's properties if it is not.
| retrySettings, | |
| retryableCodes, | |
| endpointContext, | |
| transportChannel.isDirectPath()); | |
| transportChannel.isDirectPath(), | |
| inputChannel); | |
| retrySettings, | |
| retryableCodes, | |
| endpointContext, | |
| inputChannel.isDirectPath(), | |
| inputChannel); |
| private ChannelEntry getRetainedEntry() { | ||
| while (true) { | ||
| ChannelEntry entry = activeEntry.get(); | ||
| if (entry.retain()) { | ||
| return entry; | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
If entry.retain() returns false (which happens when the channel is shutting down), this loop will spin infinitely at 100% CPU because activeEntry is never updated and will continue to return the same shutting-down entry. Throwing an IllegalStateException when the active entry cannot be retained prevents this infinite loop.
private ChannelEntry getRetainedEntry() {
while (true) {
ChannelEntry entry = activeEntry.get();
if (entry.retain()) {
return entry;
}
if (entry == activeEntry.get()) {
throw new IllegalStateException("Channel has been shut down");
}
}
}| String configPath = System.getenv("GOOGLE_API_CERTIFICATE_CONFIG"); | ||
| if (configPath != null && !configPath.isEmpty()) { | ||
| java.io.File configFile = new java.io.File(configPath); | ||
| if (configFile.exists() && !configFile.isDirectory()) { | ||
| // If explicit config exists, check it | ||
| } | ||
| } |
There was a problem hiding this comment.
The environment variable GOOGLE_API_CERTIFICATE_CONFIG is checked, but the inner if block is empty, meaning the configuration is completely ignored. If the config path points to a valid certificate file, its path should be returned.
| String configPath = System.getenv("GOOGLE_API_CERTIFICATE_CONFIG"); | |
| if (configPath != null && !configPath.isEmpty()) { | |
| java.io.File configFile = new java.io.File(configPath); | |
| if (configFile.exists() && !configFile.isDirectory()) { | |
| // If explicit config exists, check it | |
| } | |
| } | |
| String configPath = System.getenv("GOOGLE_API_CERTIFICATE_CONFIG"); | |
| if (configPath != null && !configPath.isEmpty()) { | |
| java.io.File configFile = new java.io.File(configPath); | |
| if (configFile.exists() && !configFile.isDirectory()) { | |
| return configFile.getAbsolutePath(); | |
| } | |
| } |
| String configPath = System.getenv("GOOGLE_API_CERTIFICATE_CONFIG"); | ||
| if (configPath != null && !configPath.isEmpty()) { | ||
| java.io.File configFile = new java.io.File(configPath); | ||
| if (configFile.exists() && !configFile.isDirectory()) { | ||
| // If it is JSON or PEM, we try to resolve it | ||
| } | ||
| } |
There was a problem hiding this comment.
The environment variable GOOGLE_API_CERTIFICATE_CONFIG is checked, but the inner if block is empty, meaning the configuration is completely ignored. If the config path points to a valid certificate file, its path should be returned.
| String configPath = System.getenv("GOOGLE_API_CERTIFICATE_CONFIG"); | |
| if (configPath != null && !configPath.isEmpty()) { | |
| java.io.File configFile = new java.io.File(configPath); | |
| if (configFile.exists() && !configFile.isDirectory()) { | |
| // If it is JSON or PEM, we try to resolve it | |
| } | |
| } | |
| String configPath = System.getenv("GOOGLE_API_CERTIFICATE_CONFIG"); | |
| if (configPath != null && !configPath.isEmpty()) { | |
| java.io.File configFile = new java.io.File(configPath); | |
| if (configFile.exists() && !configFile.isDirectory()) { | |
| return configFile.getAbsolutePath(); | |
| } | |
| } |
| synchronized (lock) { | ||
| String certPath = getWorkloadCertPath(); | ||
| if (certPath == null) { |
There was a problem hiding this comment.
If refresh() is called after shutdown(), it will create a new active channel entry, resurrecting the channel pool. We should check if shutdownRequested is true and return early.
public void refresh() {
synchronized (lock) {
if (activeEntry.get().shutdownRequested.get()) {
return;
}
String certPath = getWorkloadCertPath();| if (context instanceof ApiCallContext) { | ||
| TransportChannel transportChannel = ((ApiCallContext) context).getTransportChannel(); | ||
| if (transportChannel != null && transportChannel.shouldRefresh()) { | ||
| transportChannel.refresh(); |
There was a problem hiding this comment.
IIUC, we're still doing the refresh within the retry here.
What's changing is that as opposed to ApiResultRetryAlgo, we're attempting the refresh inside the CallbackChainRetryingFuture which I assume is one layer below .
Additionally: We might introduce a circular dependency of the retry layer on the rpc layer while the latter already depends on the former.
| * Returns true if a certificate rotation has been detected on disk and the transport channel | ||
| * should be refreshed, or false otherwise. | ||
| */ | ||
| default boolean shouldRefresh() { |
There was a problem hiding this comment.
does this need to be on the interface? Is there any transport specific logic here?
POC code to have a separation of refreshing vs retrying