Expose GRPC_OPENSSL_CLEANUP_TIMEOUT to control shutdown grace period#39297
Expose GRPC_OPENSSL_CLEANUP_TIMEOUT to control shutdown grace period#39297EngHabu wants to merge 10 commits intogrpc:masterfrom
Conversation
Signed-off-by: Haytham Abuelfutuh <haytham@afutuh.com>
|
@drfloob mind taking a look? |
|
Does this 2 seconds timeout occur if |
With
I agree, however, in a situation where the process is shutting down (the whole container is shutting down), I'm less concerned about leaking resources than I'm about delaying the shutdown. |
gtcooke94
left a comment
There was a problem hiding this comment.
I'm good with this from a security perspective, let's get other C++ reviewers to check how they want the env var - definitely at least define a constant for this and add some documentation
Signed-off-by: Haytham Abuelfutuh <haytham@afutuh.com>
|
Thank you, @gtcooke94! |
|
I'm actually from Python team, since this is Core change, @yashykt can you help review or assign? |
|
@XuanWang-Amos can you assign it to @yashykt to make sure they get pinged? or have it pop up in their dashboard or something :-) |
| // Name of the environment variable controlling OpenSSL cleanup timeout. | ||
| // This variable allows users to specify the timeout (in seconds) for OpenSSL | ||
| // resource cleanup during gRPC shutdown. If not set, a default timeout is used. | ||
| #define GRPC_OPENSSL_CLEANUP_TIMEOUT_ENV "GRPC_OPENSSL_CLEANUP_TIMEOUT" |
There was a problem hiding this comment.
| #define GRPC_OPENSSL_CLEANUP_TIMEOUT_ENV "GRPC_OPENSSL_CLEANUP_TIMEOUT" | |
| #define GRPC_ARG_OPENSSL_CLEANUP_TIMEOUT_ENV "grpc.openssl_cleanup_timeout" |
| // Retrieve the OpenSSL cleanup timeout from the environment variable. | ||
| // This allows users to override the default cleanup timeout for OpenSSL | ||
| // resource deallocation during gRPC shutdown. | ||
| const char* env = std::getenv(GRPC_OPENSSL_CLEANUP_TIMEOUT_ENV); |
There was a problem hiding this comment.
let's use grpc_core::GetEnv here
| const char* env = std::getenv(GRPC_OPENSSL_CLEANUP_TIMEOUT_ENV); | ||
| int timeout_sec = 2; | ||
| if (env != nullptr) { | ||
| timeout_sec = std::stoi(env); |
Signed-off-by: Haytham Abuelfutuh <haytham@afutuh.com>
Signed-off-by: Haytham Abuelfutuh <haytham@afutuh.com>
|
@yashykt thank you for the review. Please see the updated commits. |
|
Looks like our Bazel build failed, we might need change some [5,568 / 5,672] Compiling src/core/ext/transport/chttp2/transport/chttp2_transport.cc; 12s remote-cache, processwrapper-sandbox ... (16 actions, 15 running)
ERROR: /var/local/jenkins/grpc/BUILD:4209:16: Compiling src/core/tsi/ssl_transport_security.cc failed: (Exit 1): clang-15 failed: error executing CppCompile command (from target //:tsi_ssl_credentials)
(cd /root/.cache/bazel/_bazel_root/954bb7512d44d20015390af6e76121c6/sandbox/processwrapper-sandbox/2728/execroot/com_github_grpc_grpc && \
exec env - \
FUZZINTRO_OUTDIR=/src \
GRPC_BAZEL_RUNTIME=1 \
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/out \
PWD=/proc/self/cwd \
/usr/local/bin/clang-15 -U_FORTIFY_SOURCE -fstack-protector -Wall -Wthread-safety -Wself-assign -Wunused-but-set-parameter -Wno-free-nonheap-object -fcolor-diagnostics -fno-omit-frame-pointer '-std=c++17' -MD -MF bazel-out/k8-fastbuild/bin/_objs/tsi_ssl_credentials/ssl_transport_security.pic.d '-frandom-seed=bazel-out/k8-fastbuild/bin/_objs/tsi_ssl_credentials/ssl_transport_security.pic.o' -fPIC -DCARES_STATICLIB -iquote . -iquote bazel-out/k8-fastbuild/bin -iquote external/com_google_absl -iquote bazel-out/k8-fastbuild/bin/external/com_google_absl -iquote external/com_google_googleapis -iquote bazel-out/k8-fastbuild/bin/external/com_google_googleapis -iquote external/com_google_protobuf -iquote bazel-out/k8-fastbuild/bin/external/com_google_protobuf -iquote external/com_github_cares_cares -iquote bazel-out/k8-fastbuild/bin/external/com_github_cares_cares -iquote external/zlib -iquote bazel-out/k8-fastbuild/bin/external/zlib -iquote external/boringssl -iquote bazel-out/k8-fastbuild/bin/external/boringssl -Ibazel-out/k8-fastbuild/bin/external/com_google_googleapis -Ibazel-out/k8-fastbuild/bin/external/com_google_protobuf/src/google/protobuf/_virtual_imports/any_proto -Ibazel-out/k8-fastbuild/bin/external/com_google_protobuf/third_party/utf8_range/_virtual_includes/utf8_range -Ibazel-out/k8-fastbuild/bin -isystem include -isystem bazel-out/k8-fastbuild/bin/include -isystem src/core/ext/upb-gen -isystem bazel-out/k8-fastbuild/bin/src/core/ext/upb-gen -isystem src/core/ext/upbdefs-gen -isystem bazel-out/k8-fastbuild/bin/src/core/ext/upbdefs-gen -isystem third_party/address_sorting/include -isystem bazel-out/k8-fastbuild/bin/third_party/address_sorting/include -isystem external/com_github_cares_cares/include -isystem bazel-out/k8-fastbuild/bin/external/com_github_cares_cares/include -isystem external/com_github_cares_cares -isystem bazel-out/k8-fastbuild/bin/external/com_github_cares_cares -isystem external/zlib -isystem bazel-out/k8-fastbuild/bin/external/zlib -isystem external/boringssl/src/include -isystem bazel-out/k8-fastbuild/bin/external/boringssl/src/include -Wno-typedef-redefinition -DGRPC_BAZEL_BUILD '-std=c++17' -c src/core/tsi/ssl_transport_security.cc -o bazel-out/k8-fastbuild/bin/_objs/tsi_ssl_credentials/ssl_transport_security.pic.o -no-canonical-prefixes -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' '-fmodule-name=//:tsi_ssl_credentials' '-fmodule-map-file=bazel-out/k8-fastbuild/bin/tsi_ssl_credentials.cppmap' -Xclang -fno-cxx-modules -fmodules-strict-decluse -Wprivate-header '-fmodule-map-file=external/local_config_cc/module.modulemap' '-fmodule-map-file=bazel-out/k8-fastbuild/bin/channel_arg_names.cppmap' '-fmodule-map-file=bazel-out/k8-fastbuild/bin/config_vars.cppmap' '-fmodule-map-file=bazel-out/k8-fastbuild/bin/gpr.cppmap' '-fmodule-map-file=bazel-out/k8-fastbuild/bin/grpc_base.cppmap' '-fmodule-map-file=bazel-out/k8-fastbuild/bin/grpc_core_credentials_header.cppmap' '-fmodule-map-file=bazel-out/k8-fastbuild/bin/grpc_credentials_util.cppmap' '-fmodule-map-file=bazel-out/k8-fastbuild/bin/grpc_public_hdrs.cppmap' '-fmodule-map-file=bazel-out/k8-fastbuild/bin/grpc_security_base.cppmap' '-fmodule-map-file=bazel-out/k8-fastbuild/bin/ref_counted_ptr.cppmap' '-fmodule-map-file=bazel-out/k8-fastbuild/bin/transport_auth_context.cppmap' '-fmodule-map-file=bazel-out/k8-fastbuild/bin/tsi_base.cppmap' '-fmodule-map-file=bazel-out/k8-fastbuild/bin/tsi_ssl_session_cache.cppmap' '-fmodule-map-file=bazel-out/k8-fastbuild/bin/src/core/channel_args.cppmap' '-fmodule-map-file=bazel-out/k8-fastbuild/bin/src/core/error.cppmap' '-fmodule-map-file=bazel-out/k8-fastbuild/bin/src/core/grpc_crl_provider.cppmap' '-fmodule-map-file=bazel-out/k8-fastbuild/bin/src/core/grpc_transport_chttp2_alpn.cppmap' '-fmodule-map-file=bazel-out/k8-fastbuild/bin/src/core/load_file.cppmap' '-fmodule-map-file=bazel-out/k8-fastbuild/bin/src/core/ref_counted.cppmap' '-fmodule-map-file=bazel-out/k8-fastbuild/bin/src/core/slice.cppmap' '-fmodule-map-file=bazel-out/k8-fastbuild/bin/src/core/sync.cppmap' '-fmodule-map-file=bazel-out/k8-fastbuild/bin/src/core/tsi_ssl_types.cppmap' '-fmodule-map-file=bazel-out/k8-fastbuild/bin/src/core/useful.cppmap' '-fmodule-map-file=bazel-out/k8-fastbuild/bin/external/com_google_absl/absl/base/core_headers.cppmap' '-fmodule-map-file=bazel-out/k8-fastbuild/bin/external/com_google_absl/absl/log/check.cppmap' '-fmodule-map-file=bazel-out/k8-fastbuild/bin/external/com_google_absl/absl/log/log.cppmap' '-fmodule-map-file=bazel-out/k8-fastbuild/bin/external/com_google_absl/absl/status/status.cppmap' '-fmodule-map-file=bazel-out/k8-fastbuild/bin/external/com_google_absl/absl/status/statusor.cppmap' '-fmodule-map-file=bazel-out/k8-fastbuild/bin/external/com_google_absl/absl/strings/strings.cppmap' '-fmodule-map-file=bazel-out/k8-fastbuild/bin/external/boringssl/src/crypto.cppmap' '-fmodule-map-file=bazel-out/k8-fastbuild/bin/external/boringssl/src/ssl.cppmap')
# Configuration: 7108c862523f59b4e742a7c9e976566182409b50b4e555417e1ded7aa0ed47dc
# Execution platform: @@internal_platforms_do_not_use//host:host
Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
src/core/tsi/ssl_transport_security.cc:71:10: error: module //:tsi_ssl_credentials does not depend on a module exporting 'src/core/util/env.h'
#include "src/core/util/env.h"
^
1 error generated.
[5,585 / 5,672] checking cached actions
INFO: Elapsed time: 358.554s, Critical Path: 55.70s
INFO: 5585 processes: 2863 internal, 2722 processwrapper-sandbox.
ERROR: Build did NOT complete successfully |
Signed-off-by: Haytham Abuelfutuh <haytham@afutuh.com>
|
@yashykt fixed the build issue. Mind helping me run CI again and merge the change? |
|
@yashykt I see it failed again... Mind sharing the new errors? |
|
I think we're still getting a similar build error - is the normal bazel build working locally for you? |
| "//src/core:tsi/ssl/key_logging/ssl_key_logging.h", | ||
| "//src/core:tsi/ssl_transport_security.h", | ||
| "//src/core:tsi/ssl_transport_security_utils.h", | ||
| "//src/core:env", |
There was a problem hiding this comment.
Ah, this dependency needs to get added to the deps section, not the hdrs section
There was a problem hiding this comment.
🤦🏽♂️ thank you... updated..
locally running this succeeds
>bazel build //:tsi_ssl_credentials
INFO: Running bazel wrapper (see //tools/bazel for details), bazel version 8.0.1 will be used instead of system-wide bazel installation.
Starting local Bazel server (8.0.1) and connecting to it...
WARNING: WORKSPACE support will be removed in Bazel 9 (late 2025), please migrate to Bzlmod, see https://bazel.build/external/migration.
WARNING: /private/var/tmp/_bazel_haytham/99b9644c7936a3395988ff63192eef6e/external/io_opentelemetry_cpp/api/BUILD:62:10: target '@@io_opentelemetry_cpp//api:with_abseil' is deprecated: The value of this flag is ignored. Bazel builds always depend on Abseil for its pre-adopted `std::` types. You should remove this flag from your build command.
INFO: Analyzed target //:tsi_ssl_credentials (146 packages loaded, 4184 targets configured).
INFO: Found 1 target...
Target //:tsi_ssl_credentials up-to-date:
bazel-bin/libtsi_ssl_credentials.a
INFO: Elapsed time: 96.622s, Critical Path: 18.44s
INFO: 1505 processes: 250 action cache hit, 6 internal, 1499 darwin-sandbox.
INFO: Build completed successfully, 1505 total actions
Signed-off-by: Haytham Abuelfutuh <haytham@afutuh.com>
|
@yashykt mind rerunning CI |
|
@yashykt I assume this is some internal process you can help me drive :-) |
|
Thanks for fixing all the errors, I'll get this merged |
This addresses this fairly annoying issue at shutdown of any python process that uses aio.
The issue and some workarounds are explained collectively in the following issues:
#38679
#33342
#22365
label: Python
Let me dig deeper into the repro and root cause and why this PR.
1. The repro
There are two issues you see here:
If you read the issues above, you are told you should initialize aio by adding:
Early on the main thread.
This, correctly, addresses the first issue (empty error messages) but leaves the second issue still there. If you notice in the exported methods, there is also
shutdown_grpc_aioso we attempted to do so, unsuccessfully, in various places:a. Just in the main thread before main() exits
b. As an atexit handler
2. What's going on
To debug the issue, we recompiled grpc with tracing and additional logs:
and noticed the following:
The delta between these two logs are exactly 2 seconds. This is triggered in the following function:
grpc/src/core/tsi/ssl_transport_security.cc
Line 196 in 20901a4
It waits (for 2 seconds) for
g_intializationsto get to 0 before timing out.while (g_initializations != 0) {
void grpc_shutdown(void) { is the only place that reasonably decrements it. This gets called if we call
shutdown_grpc_aio, so why isn't it working?!Looking at the code, it seems reasonable to suggest that if you are going to call
init_grpc_aiothen you must make sureshutdown_grpc_aiois also called.3. What's this PR
Regardless of the proper way to handle shutdown, I think enforcing a constant 2seconds may not always be the desired grace period for all applications. This PR allows that grace period to be configurable and continues to default to 2 seconds.
After the PR: