Implement collections.defaultdict in rust#8132
Conversation
📝 WalkthroughWalkthrough
Changescollections.defaultdict
Sequence Diagram(s)sequenceDiagram
participant Caller
participant PyDefaultDict
participant default_factory
participant PyDict
Caller->>PyDefaultDict: __getitem__(missing key)
PyDefaultDict->>PyDefaultDict: __missing__(key)
alt default_factory is None
PyDefaultDict-->>Caller: raise KeyError
else default_factory callable
PyDefaultDict->>default_factory: call()
default_factory-->>PyDefaultDict: default value
PyDefaultDict->>PyDict: setdefault(key, default value)
PyDefaultDict-->>Caller: default value
end
Caller->>PyDefaultDict: __or__(other dict)
PyDefaultDict->>PyDict: update(other)
PyDefaultDict-->>Caller: new PyDefaultDict with merged entries and factory
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
📦 Library DependenciesThe following Lib/ modules were modified. Here are their dependencies: [ ] lib: cpython/Lib/collections dependencies:
dependent tests: (330 tests)
[x] lib: cpython/Lib/dataclasses.py dependencies:
dependent tests: (98 tests)
Legend:
|
af01eb8 to
9fd4a09
Compare
9fd4a09 to
4857fbb
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/vm/src/stdlib/_collections.rs`:
- Around line 784-789: In the `__missing__` method, the current implementation
uses `setdefault` to store the factory result, but this can return a
pre-existing value if the factory function mutates the dict during execution.
Replace the `setdefault` call with a direct assignment using the dict's set
method instead, ensuring the factory result is unconditionally stored and
returned regardless of any re-entrant mutations that occur during factory
execution.
- Around line 878-897: The defaultdict init method does not validate that only
the expected number of positional arguments are provided. After the second
take_positional() call in the init method (which consumes the default_factory
and the optional mapping), add a validation check to ensure args.args is empty.
If args.args is not empty, return a TypeError with a message indicating the
maximum number of positional arguments expected (2) and the actual number
provided. This will prevent the method from silently accepting extra positional
arguments beyond default_factory and the optional mapping/iterable parameter.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: b95fbf25-a429-4183-bede-2143ea5aa274
⛔ Files ignored due to path filters (3)
Lib/collections/__init__.pyis excluded by!Lib/**Lib/collections/_defaultdict.pyis excluded by!Lib/**Lib/test/test_dataclasses/__init__.pyis excluded by!Lib/**
📒 Files selected for processing (2)
crates/vm/src/builtins/dict.rscrates/vm/src/stdlib/_collections.rs
| fn __missing__(&self, key: PyObjectRef, vm: &VirtualMachine) -> PyResult { | ||
| let factory = self.default_factory.read().clone(); | ||
| if let Some(f) = factory { | ||
| let value = f.call((), vm)?; | ||
| self.dict.setdefault(key, value.into(), vm) | ||
| } else { |
There was a problem hiding this comment.
__missing__ should assign unconditionally, not via setdefault
Line 788 uses setdefault, which can return a pre-inserted value if the factory mutates the dict re-entrantly. defaultdict.__missing__ must store and return the factory result.
Proposed fix
#[pymethod]
fn __missing__(&self, key: PyObjectRef, vm: &VirtualMachine) -> PyResult {
let factory = self.default_factory.read().clone();
if let Some(f) = factory {
let value = f.call((), vm)?;
- self.dict.setdefault(key, value.into(), vm)
+ self.dict.inner_setitem(&*key, value.clone(), vm)?;
+ Ok(value)
} else {
Err(vm.new_key_error(key))
}
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| fn __missing__(&self, key: PyObjectRef, vm: &VirtualMachine) -> PyResult { | |
| let factory = self.default_factory.read().clone(); | |
| if let Some(f) = factory { | |
| let value = f.call((), vm)?; | |
| self.dict.setdefault(key, value.into(), vm) | |
| } else { | |
| fn __missing__(&self, key: PyObjectRef, vm: &VirtualMachine) -> PyResult { | |
| let factory = self.default_factory.read().clone(); | |
| if let Some(f) = factory { | |
| let value = f.call((), vm)?; | |
| self.dict.inner_setitem(&*key, value.clone(), vm)?; | |
| Ok(value) | |
| } else { | |
| Err(vm.new_key_error(key)) | |
| } | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/vm/src/stdlib/_collections.rs` around lines 784 - 789, In the
`__missing__` method, the current implementation uses `setdefault` to store the
factory result, but this can return a pre-existing value if the factory function
mutates the dict during execution. Replace the `setdefault` call with a direct
assignment using the dict's set method instead, ensuring the factory result is
unconditionally stored and returned regardless of any re-entrant mutations that
occur during factory execution.
| fn init(zelf: PyRef<Self>, mut args: Self::Args, vm: &VirtualMachine) -> PyResult<()> { | ||
| let default_factory = args.take_positional().map_or(Ok(None), |factory| { | ||
| let is_none = factory.is(&vm.ctx.none()); | ||
|
|
||
| if !is_none && !factory.is_callable() { | ||
| Err(vm.new_type_error("first argument must be callable or None")) | ||
| } else if is_none { | ||
| Ok(None) | ||
| } else { | ||
| Ok(Some(factory)) | ||
| } | ||
| })?; | ||
|
|
||
| *zelf.default_factory.write() = default_factory; | ||
|
|
||
| zelf.dict.update( | ||
| OptionalArg::from_option(args.take_positional()), | ||
| args.kwargs, | ||
| vm, | ||
| )?; |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Inspect how Initializer(FuncArgs) callers enforce full argument consumption.
# 1) Find other Initializer impls with FuncArgs and how they reject extra args.
rg -n -C3 'impl Initializer for .*{[[:space:][:print:]]*type Args = FuncArgs;' crates/vm/src
# 2) Look for explicit checks of leftover positional args after take_positional().
rg -n -C3 'take_positional\(\)|remaining|is_empty\(\)|too many positional|takes .* positional' crates/vm/src/stdlib crates/vm/src/builtins crates/vm/src/function
# 3) Inspect defaultdict init call path for automatic leftover-arg validation.
rg -n -C3 'trait Initializer|fn init\(|py_new\(|FuncArgs' crates/vm/srcRepository: RustPython/RustPython
Length of output: 50569
🏁 Script executed:
# Find the Initializer trait definition
rg -n 'trait Initializer' crates/vm/src --type rust -A 10
# Look at how py_new/init is called and if remaining args are validated
rg -n 'fn init\(' crates/vm/src/stdlib/_collections.rs -B 2 -A 15
# Search for other defaultdict-like Initializer impls that might show the pattern
rg -n 'impl Initializer' crates/vm/src/stdlib crates/vm/src/builtins --type rust -A 8 | head -100Repository: RustPython/RustPython
Length of output: 8999
🏁 Script executed:
# See the complete defaultdict.__init__ implementation
sed -n '876,897p' crates/vm/src/stdlib/_collections.rs
# Check what happens in slot_init wrapper - does it validate remaining args?
sed -n '1647,1680p' crates/vm/src/types/slot.rs
# See how dict.update handles the args
rg -n 'fn update' crates/vm/src/stdlib/_collections.rs -A 20 | head -60
# Check if FuncArgs has validation or if take_positional enforces limits
rg -n 'fn take_positional|impl FuncArgs' crates/vm/src/function.rs -A 5 | head -40Repository: RustPython/RustPython
Length of output: 2164
🏁 Script executed:
# Find FuncArgs definition and bind method
rg -n 'struct FuncArgs|impl FuncArgs' crates/vm/src --type rust -A 3 | head -50
# Find take_positional method definition
rg -n 'fn take_positional' crates/vm/src --type rust -B 2 -A 8
# Check the bind method that converts FuncArgs
rg -n 'fn bind' crates/vm/src --type rust -B 2 -A 10 | head -80Repository: RustPython/RustPython
Length of output: 4589
🏁 Script executed:
# Check FromArgs implementation for FuncArgs
rg -n 'impl FromArgs for FuncArgs|fn from_args' crates/vm/src/function/argument.rs -A 15
# Also check if there's a FromArgs impl that validates
rg -n 'trait FromArgs' crates/vm/src -A 8 | head -40Repository: RustPython/RustPython
Length of output: 5310
🏁 Script executed:
# Check if defaultdict uses slot_init or a custom init
rg -n 'fn slot_init|fn init' crates/vm/src/stdlib/_collections.rs -B 3 -A 1
# Also verify how slot_init is invoked - does it check leftover args AFTER init?
sed -n '1652,1690p' crates/vm/src/types/slot.rsRepository: RustPython/RustPython
Length of output: 1909
defaultdict.init silently accepts extra positional arguments
The implementation does not reject positional arguments beyond default_factory and the optional mapping/iterable. It calls take_positional() twice (lines 879, 894) but leaves any 3rd+ argument in the FuncArgs struct. Due to how FuncArgs::from_args consumes the entire struct in slot_init's binding step, the validation check at the framework level cannot catch these leftovers.
To match CPython's behavior, add a check after the second take_positional() to reject any remaining args:
Example fix
zelf.dict.update(
OptionalArg::from_option(args.take_positional()),
args.kwargs,
vm,
)?;
if !args.args.is_empty() {
return Err(vm.new_type_error(format!(
"defaultdict expected at most 2 positional arguments, got {}",
2 + args.args.len()
)));
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/vm/src/stdlib/_collections.rs` around lines 878 - 897, The defaultdict
init method does not validate that only the expected number of positional
arguments are provided. After the second take_positional() call in the init
method (which consumes the default_factory and the optional mapping), add a
validation check to ensure args.args is empty. If args.args is not empty, return
a TypeError with a message indicating the maximum number of positional arguments
expected (2) and the actual number provided. This will prevent the method from
silently accepting extra positional arguments beyond default_factory and the
optional mapping/iterable parameter.
|
|
||
| impl PyDefaultDict { | ||
| fn __or__(lhs: PyObjectRef, rhs: PyObjectRef, vm: &VirtualMachine) -> PyResult { | ||
| Ok(if let Some(zelf) = lhs.downcast_ref::<Self>() { |
There was a problem hiding this comment.
@bschoenmaeckers I'd love to get your review on this as well if you don't mind. I had lots of issues with implementing this correctly for some reason, I'm sure it can be simplifiers
Summary
Summary by CodeRabbit
collections.defaultdictclass supporting default factory functions, merge operations via the|operator, and standard dictionary operations.