Impl sys.audithook#7960
Conversation
📝 WalkthroughWalkthroughThis PR implements Python's sys.audit() hook mechanism in RustPython. It adds audit hook storage to the VM, introduces sys.audit() and sys.addaudithook() as public APIs, and integrates conditional audit event emission across socket, syslog, marshal, time, monitoring, and frame introspection operations. ChangesPython Audit Hook System
Estimated Code Review Effort🎯 3 (Moderate) | ⏱️ ~25 minutes Suggested Reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
📦 Library DependenciesThe following Lib/ modules were modified. Here are their dependencies: [x] test: cpython/Lib/test/test_sys.py (TODO: 7) dependencies: dependent tests: (233 tests)
[x] lib: cpython/Lib/bdb.py dependencies:
dependent tests: (1 tests)
Legend:
|
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/stdlib/src/syslog.rs`:
- Around line 98-104: When auto-opening, the code currently emits the
"syslog.syslog" audit before calling openlog(); change the order so that if
host_syslog::is_open() is false you first emit an audit for "syslog.openlog"
(using the same vm.sys_module.get_attr("audit", vm) and audit.call pattern, e.g.
audit.call((vm.ctx.new_str("syslog.openlog"), /* priority or args as appropriate
*/, vm.ctx.none()), vm) or similar to mirror how syslog.syslog is audited), then
call openlog(OpenLogArgs::default(), vm), and only after openlog succeeds emit
the "syslog.syslog" audit and call audit.call for syslog.syslog and then proceed
to call the syslog handler; locate the logic around
vm.sys_module.get_attr("audit", vm), host_syslog::is_open(), openlog(..., vm)
and the audit.call for "syslog.syslog" to implement this reorder.
- Around line 56-66: The audit payload currently calls args.ident.flatten(),
which conflates "argument omitted" and "explicit None" so openlog(None, ...) is
treated like omitted — change the audit_ident construction to use the outer
Option (args.ident.clone()) so omitted vs explicit None are distinct: if
args.ident is None (argument omitted) fall back to get_argv(vm) as before, if
args.ident is Some(None) produce vm.ctx.none() (explicit None), and if
args.ident is Some(Some(s)) convert s into a PyObjectRef; update the code path
that builds audit_ident (the closure using args.ident.flatten(),
vm.sys_module.get_attr("audit", vm), and get_argv(vm)) to implement this
branching.
In `@crates/vm/src/stdlib/sys.rs`:
- Around line 982-984: The current code checks for "audit" via
vm.sys_module.get_attr and calls it (e.g., the block using
vm.sys_module.get_attr("audit", vm) and audit.call(..., vm)), which allows
bypassing registered audit hooks if sys.audit was removed or replaced; replace
these dynamic attribute lookups with a direct call to run_audit_hooks(...)
passing the same arguments (e.g., the "sys._getframe" name and frame_ref) so
registered hooks always run; apply the same change to the analogous block around
the later occurrence (the code at the other audit call site referenced in the
comment).
In `@crates/vm/src/vm/thread.rs`:
- Line 733: new_thread() currently initializes audit_hooks with
RefCell::new(vec![]), dropping any hooks registered on the parent VM; update
new_thread (in vm::thread.rs) to preserve the parent VM's audit_hooks by cloning
or sharing the existing audit_hooks RefCell contents instead of creating an
empty vec (e.g., derive the new thread's audit_hooks from self.audit_hooks or
the parent ThreadVM's audit_hooks via clone() or by using an Rc/Arc to share the
container), so that sys.addaudithook() handlers registered on the parent are
observed in spawned threads.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 229a247f-bc9d-4092-9c7d-0db605941ce1
⛔ Files ignored due to path filters (4)
Lib/test/audit-tests.pyis excluded by!Lib/**Lib/test/test_audit.pyis excluded by!Lib/**Lib/test/test_bdb.pyis excluded by!Lib/**Lib/test/test_sys_setprofile.pyis excluded by!Lib/**
📒 Files selected for processing (8)
crates/stdlib/src/socket.rscrates/stdlib/src/syslog.rscrates/vm/src/stdlib/marshal.rscrates/vm/src/stdlib/sys.rscrates/vm/src/stdlib/sys/monitoring.rscrates/vm/src/stdlib/time.rscrates/vm/src/vm/mod.rscrates/vm/src/vm/thread.rs
| let ident = match args.ident.clone().flatten() { | ||
| Some(args) => Some(args.to_cstring(vm)?), | ||
| None => get_argv(vm).map(|argv| argv.to_cstring(vm)).transpose()?, | ||
| } | ||
| .map(|ident| ident.into_boxed_c_str()); | ||
|
|
||
| if let Ok(audit) = vm.sys_module.get_attr("audit", vm) { | ||
| let audit_ident: PyObjectRef = args.ident.flatten().map_or_else( | ||
| || get_argv(vm).map_or_else(|| vm.ctx.none(), Into::into), | ||
| Into::into, | ||
| ); |
There was a problem hiding this comment.
Preserve ident=None semantics instead of falling back to argv.
args.ident.flatten() merges “argument omitted” and “explicit None”. That makes openlog(None, ...) behave like omitted ident in audit payload (and ident derivation), which diverges from expected behavior.
Suggested fix
- let ident = match args.ident.clone().flatten() {
- Some(args) => Some(args.to_cstring(vm)?),
- None => get_argv(vm).map(|argv| argv.to_cstring(vm)).transpose()?,
- }
+ let ident = match args.ident.clone() {
+ OptionalArg::Present(Some(s)) => Some(s.to_cstring(vm)?),
+ OptionalArg::Present(None) => None,
+ OptionalArg::Missing => get_argv(vm).map(|argv| argv.to_cstring(vm)).transpose()?,
+ }
.map(|ident| ident.into_boxed_c_str());
if let Ok(audit) = vm.sys_module.get_attr("audit", vm) {
- let audit_ident: PyObjectRef = args.ident.flatten().map_or_else(
- || get_argv(vm).map_or_else(|| vm.ctx.none(), Into::into),
- Into::into,
- );
+ let audit_ident: PyObjectRef = match args.ident {
+ OptionalArg::Present(Some(s)) => s.into(),
+ OptionalArg::Present(None) => vm.ctx.none(),
+ OptionalArg::Missing => get_argv(vm).map_or_else(|| vm.ctx.none(), Into::into),
+ };📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| let ident = match args.ident.clone().flatten() { | |
| Some(args) => Some(args.to_cstring(vm)?), | |
| None => get_argv(vm).map(|argv| argv.to_cstring(vm)).transpose()?, | |
| } | |
| .map(|ident| ident.into_boxed_c_str()); | |
| if let Ok(audit) = vm.sys_module.get_attr("audit", vm) { | |
| let audit_ident: PyObjectRef = args.ident.flatten().map_or_else( | |
| || get_argv(vm).map_or_else(|| vm.ctx.none(), Into::into), | |
| Into::into, | |
| ); | |
| let ident = match args.ident.clone() { | |
| OptionalArg::Present(Some(s)) => Some(s.to_cstring(vm)?), | |
| OptionalArg::Present(None) => None, | |
| OptionalArg::Missing => get_argv(vm).map(|argv| argv.to_cstring(vm)).transpose()?, | |
| } | |
| .map(|ident| ident.into_boxed_c_str()); | |
| if let Ok(audit) = vm.sys_module.get_attr("audit", vm) { | |
| let audit_ident: PyObjectRef = match args.ident { | |
| OptionalArg::Present(Some(s)) => s.into(), | |
| OptionalArg::Present(None) => vm.ctx.none(), | |
| OptionalArg::Missing => get_argv(vm).map_or_else(|| vm.ctx.none(), Into::into), | |
| }; |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/stdlib/src/syslog.rs` around lines 56 - 66, The audit payload
currently calls args.ident.flatten(), which conflates "argument omitted" and
"explicit None" so openlog(None, ...) is treated like omitted — change the
audit_ident construction to use the outer Option (args.ident.clone()) so omitted
vs explicit None are distinct: if args.ident is None (argument omitted) fall
back to get_argv(vm) as before, if args.ident is Some(None) produce
vm.ctx.none() (explicit None), and if args.ident is Some(Some(s)) convert s into
a PyObjectRef; update the code path that builds audit_ident (the closure using
args.ident.flatten(), vm.sys_module.get_attr("audit", vm), and get_argv(vm)) to
implement this branching.
| if let Ok(audit) = vm.sys_module.get_attr("audit", vm) { | ||
| audit.call((vm.ctx.new_str("syslog.syslog"), priority, msg.clone()), vm)?; | ||
| } | ||
|
|
||
| if !host_syslog::is_open() { | ||
| openlog(OpenLogArgs::default(), vm)?; | ||
| } |
There was a problem hiding this comment.
Emit syslog.openlog audit before first syslog.syslog when auto-opening.
Right now syslog.syslog is audited before the implicit openlog() call. That inverts expected event ordering on first use.
Suggested fix
- if let Ok(audit) = vm.sys_module.get_attr("audit", vm) {
- audit.call((vm.ctx.new_str("syslog.syslog"), priority, msg.clone()), vm)?;
- }
-
if !host_syslog::is_open() {
openlog(OpenLogArgs::default(), vm)?;
}
+
+ if let Ok(audit) = vm.sys_module.get_attr("audit", vm) {
+ audit.call((vm.ctx.new_str("syslog.syslog"), priority, msg.clone()), vm)?;
+ }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/stdlib/src/syslog.rs` around lines 98 - 104, When auto-opening, the
code currently emits the "syslog.syslog" audit before calling openlog(); change
the order so that if host_syslog::is_open() is false you first emit an audit for
"syslog.openlog" (using the same vm.sys_module.get_attr("audit", vm) and
audit.call pattern, e.g. audit.call((vm.ctx.new_str("syslog.openlog"), /*
priority or args as appropriate */, vm.ctx.none()), vm) or similar to mirror how
syslog.syslog is audited), then call openlog(OpenLogArgs::default(), vm), and
only after openlog succeeds emit the "syslog.syslog" audit and call audit.call
for syslog.syslog and then proceed to call the syslog handler; locate the logic
around vm.sys_module.get_attr("audit", vm), host_syslog::is_open(), openlog(...,
vm) and the audit.call for "syslog.syslog" to implement this reorder.
| if let Ok(audit) = vm.sys_module.get_attr("audit", vm) { | ||
| audit.call((vm.ctx.new_str("sys._getframe"), frame_ref.to_owned()), vm)?; | ||
| } |
There was a problem hiding this comment.
Call audit hooks directly instead of resolving sys.audit dynamically.
These paths currently skip auditing when sys.audit is missing/replaced, even if hooks are registered. Use run_audit_hooks(...) directly to avoid bypass through attribute mutation.
Also applies to: 995-997
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/vm/src/stdlib/sys.rs` around lines 982 - 984, The current code checks
for "audit" via vm.sys_module.get_attr and calls it (e.g., the block using
vm.sys_module.get_attr("audit", vm) and audit.call(..., vm)), which allows
bypassing registered audit hooks if sys.audit was removed or replaced; replace
these dynamic attribute lookups with a direct call to run_audit_hooks(...)
passing the same arguments (e.g., the "sys._getframe" name and frame_ref) so
registered hooks always run; apply the same change to the analogous block around
the later occurrence (the code at the other audit call site referenced in the
comment).
| asyncio_running_loop: RefCell::new(None), | ||
| asyncio_running_task: RefCell::new(None), | ||
| callable_cache: self.callable_cache.clone(), | ||
| audit_hooks: RefCell::new(vec![]), |
There was a problem hiding this comment.
Preserve audit hooks when creating thread VMs.
new_thread() resets audit_hooks to empty, so hooks added via sys.addaudithook() in the parent VM are not observed in spawned threads. This breaks threaded audit dispatch behavior.
Suggested fix
- audit_hooks: RefCell::new(vec![]),
+ audit_hooks: RefCell::new(self.audit_hooks.borrow().clone()),📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| audit_hooks: RefCell::new(vec![]), | |
| audit_hooks: RefCell::new(self.audit_hooks.borrow().clone()), |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/vm/src/vm/thread.rs` at line 733, new_thread() currently initializes
audit_hooks with RefCell::new(vec![]), dropping any hooks registered on the
parent VM; update new_thread (in vm::thread.rs) to preserve the parent VM's
audit_hooks by cloning or sharing the existing audit_hooks RefCell contents
instead of creating an empty vec (e.g., derive the new thread's audit_hooks from
self.audit_hooks or the parent ThreadVM's audit_hooks via clone() or by using an
Rc/Arc to share the container), so that sys.addaudithook() handlers registered
on the parent are observed in spawned threads.
ref: #7956
Summary by CodeRabbit