Skip to content

Implement collections.defaultdict in rust#8132

Open
ShaharNaveh wants to merge 2 commits into
RustPython:mainfrom
ShaharNaveh:impl-defaultdict
Open

Implement collections.defaultdict in rust#8132
ShaharNaveh wants to merge 2 commits into
RustPython:mainfrom
ShaharNaveh:impl-defaultdict

Conversation

@ShaharNaveh

@ShaharNaveh ShaharNaveh commented Jun 19, 2026

Copy link
Copy Markdown
Contributor

Summary

Summary by CodeRabbit

  • New Features
    • Added collections.defaultdict class supporting default factory functions, merge operations via the | operator, and standard dictionary operations.

@coderabbitai

coderabbitai Bot commented Jun 19, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

📝 Walkthrough

Walkthrough

PyDict::setdefault and PyDict::update are made pub(crate). A new PyDefaultDict struct backed by PyDict is added to the _collections stdlib module, registered as collections.defaultdict, implementing default_factory accessors, __missing__, __copy__, __reduce__, __or__, and the required Initializer, Representable, AsMapping, and AsNumber traits.

Changes

collections.defaultdict

Layer / File(s) Summary
PyDict crate-visibility prerequisites
crates/vm/src/builtins/dict.rs
setdefault and update methods changed from private to pub(crate) so the new defaultdict implementation can delegate to them.
PyDefaultDict class and trait wiring
crates/vm/src/stdlib/_collections.rs
Imports extended with FuncArgs and ToPyObject; PyDefaultDict added as collections.defaultdict with default_factory accessors, __missing__ (calls factory or raises KeyError), __copy__, __reduce__, and __or__ (merges via dict.update); wired with DefaultConstructor, Initializer, Representable, AsMapping, and AsNumber.

Sequence Diagram(s)

sequenceDiagram
  participant Caller
  participant PyDefaultDict
  participant default_factory
  participant PyDict

  Caller->>PyDefaultDict: __getitem__(missing key)
  PyDefaultDict->>PyDefaultDict: __missing__(key)
  alt default_factory is None
    PyDefaultDict-->>Caller: raise KeyError
  else default_factory callable
    PyDefaultDict->>default_factory: call()
    default_factory-->>PyDefaultDict: default value
    PyDefaultDict->>PyDict: setdefault(key, default value)
    PyDefaultDict-->>Caller: default value
  end

  Caller->>PyDefaultDict: __or__(other dict)
  PyDefaultDict->>PyDict: update(other)
  PyDefaultDict-->>Caller: new PyDefaultDict with merged entries and factory
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐇 Hop hop, the dict gets a friend,
A defaultdict born with factory in hand,
__missing__ calls when keys are astray,
pub(crate) opens the path for the day,
Rust and Python dance — hip hooray! 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 73.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title 'Implement collections.defaultdict in rust' directly and clearly summarizes the main change: adding defaultdict functionality to the collections module in Rust.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

github-actions Bot commented Jun 19, 2026

Copy link
Copy Markdown
Contributor

📦 Library Dependencies

The following Lib/ modules were modified. Here are their dependencies:

[ ] lib: cpython/Lib/collections
[x] lib: cpython/Lib/_collections_abc.py
[x] test: cpython/Lib/test/test_collections.py (TODO: 1)
[x] test: cpython/Lib/test/test_deque.py (TODO: 2)
[x] test: cpython/Lib/test/test_defaultdict.py
[x] test: cpython/Lib/test/test_ordered_dict.py (TODO: 7)

dependencies:

  • collections (native: _collections, _weakref, itertools, sys)
    • _collections_abc
    • warnings (native: _contextvars, _thread, _warnings, builtins, sys)
    • _collections_abc, abc, annotationlib, copy, heapq, keyword, operator, reprlib

dependent tests: (330 tests)

  • collections: test_annotationlib test_array test_asyncio test_bisect test_builtin test_c_locale_coercion test_call test_collections test_configparser test_contains test_context test_copy test_csv test_ctypes test_defaultdict test_deque test_descr test_dict test_dictviews test_embed test_enum test_exception_group test_file test_fileinput test_fileio test_frame test_funcattrs test_functools test_genericalias test_hash test_httpservers test_inspect test_io test_ipaddress test_iter test_iterlen test_json test_logging test_math test_monitoring test_ordered_dict test_pathlib test_patma test_pickle test_plistlib test_pprint test_pydoc test_random test_reprlib test_richcmp test_set test_shelve test_sqlite3 test_statistics test_string test_struct test_sys test_traceback test_tuple test_types test_typing test_unittest test_urllib test_userdict test_userlist test_userstring test_weakref test_weakset test_with
    • ast: test_ast test_compile test_compiler_codegen test_dis test_fstring test_future_stmt test_peepholer test_peg_generator test_site test_ssl test_type_comments test_ucn test_unparse
      • annotationlib: test_grammar test_type_annotations test_type_params
      • dbm.dumb: test_dbm_dumb
      • inspect: test_abc test_argparse test_asyncgen test_buffer test_clinic test_code test_coroutines test_decimal test_generators test_ntpath test_operator test_posixpath test_signal test_turtle test_yield_from test_zipimport test_zipimport_support test_zoneinfo
      • pyclbr: test_pyclbr
      • traceback: test_asyncio test_code_module test_contextlib test_contextlib_async test_dictcomps test_exceptions test_http_cookiejar test_importlib test_listcomps test_pyexpat test_setcomps test_socket test_subprocess test_threadedtempfile test_threading test_unittest
    • asyncio: test_asyncio test_os test_pdb
    • concurrent.futures._base: test_concurrent_futures
    • dbm.sqlite3: test_dbm_sqlite3
    • difflib: test_difflib test_profile test_sys_settrace
    • dis: test__opcode test_compiler_assemble test_dtrace test_opcache test_positional_only_arg test_type_cache
      • bdb: test_bdb
      • modulefinder: test_importlib test_modulefinder
      • trace: test_trace
    • email.feedparser: test_email
    • http.client: test_docxmlrpc test_hashlib test_unicodedata test_urllib2 test_wsgiref test_xmlrpc
      • urllib.request: test_sax test_urllib2_localnet test_urllib2net test_urllibnet
    • idlelib: test_idle
    • importlib.metadata: test_importlib
    • inspect:
      • cmd: test_cmd
      • dataclasses: test__colorize test_ctypes test_regrtest
      • pkgutil: test_pkgutil test_pyrepl test_runpy
      • rlcompleter: test_pyrepl test_rlcompleter
    • logging: test_support
      • hashlib: test_hmac test_smtplib test_tarfile
      • multiprocessing.util: test_compileall test_concurrent_futures
      • venv: test_venv
    • multiprocessing: test_fcntl test_memoryview test_multiprocessing_main_handling test_re
    • platform: test__locale test__osx_support test_baseexception test_cmath test_ctypes test_mimetypes test_platform test_posix test_shutil test_strptime test_sysconfig test_time test_winreg
    • pprint: test_htmlparser test_sys_setprofile
      • pickle: test_bool test_bytes test_bz2 test_codecs test_concurrent_futures test_ctypes test_email test_enumerate test_fractions test_http_cookies test_itertools test_list test_lzma test_memoryio test_minidom test_picklebuffer test_pickletools test_range test_slice test_str test_structseq test_super test_type_aliases test_unittest test_uuid test_xml_dom_minicompat test_xml_etree test_xpickle test_zipfile test_zlib test_zoneinfo
    • queue: test_android test_dummy_thread test_sched
    • selectors: test_selectors
      • socket: test_epoll test_exception_hierarchy test_ftplib test_httplib test_imaplib test_kqueue test_largefile test_mailbox test_mmap test_poplib test_pty test_smtpnet test_socketserver test_stat test_timeout test_urllib_response
      • subprocess: test_atexit test_audit test_cmd_line test_cmd_line_script test_ctypes test_faulthandler test_file_eintr test_gc test_gzip test_json test_launcher test_msvcrt test_osx_env test_peg_generator test_poll test_py_compile test_pyrepl test_quopri test_repl test_script_helper test_select test_tempfile test_unittest test_utf8_mode test_wait3 test_webbrowser test_zipfile
    • shlex: test_shlex
    • shutil: test_filecmp test_glob test_importlib test_string_literals test_unicode_file
      • ctypes.util: test_ctypes
      • ensurepip: test_ensurepip
      • pathlib: test_importlib test_pathlib test_tomllib test_tools test_winapi test_zipapp test_zstd
      • tempfile: test_cprofile test_doctest test_generated_cases test_importlib test_linecache test_pkg test_pstats test_pyrepl test_tabnanny test_termios test_tokenize test_winconsoleio test_zipfile64
      • zipfile: test_zipfile
    • statistics:
      • random: test_complex test_devpoll test_email test_float test_grp test_heapq test_int test_long test_numeric_tower test_pow test_pwd test_queue test_sort test_strtod test_thread
    • string: test_email test_fnmatch test_pyrepl test_secrets test_string
    • threading: test_concurrent_futures test_ctypes test_fork1 test_importlib test_ioctl test_pyrepl test_robotparser test_syslog test_threading_local
      • dummy_threading: test_dummy_threading
      • sysconfig: test_asdl_parser test_tools
    • traceback:
      • timeit: test_timeit
    • tracemalloc: test_tracemalloc
    • urllib.parse: test_urlparse
    • wave: test_wave

[x] lib: cpython/Lib/dataclasses.py
[x] test: cpython/Lib/test/test_dataclasses (TODO: 1)

dependencies:

  • dataclasses

dependent tests: (98 tests)

  • dataclasses: test__colorize test_copy test_ctypes test_enum test_genericalias test_patma test_pprint test_pydoc test_regrtest test_typing test_zoneinfo
    • pprint: test_htmlparser test_ssl test_sys_setprofile test_unittest
      • pdb: test_pdb
      • pickle: test_annotationlib test_argparse test_array test_ast test_asyncio test_bool test_builtin test_bytes test_bz2 test_codecs test_collections test_concurrent_futures test_configparser test_coroutines test_csv test_ctypes test_decimal test_defaultdict test_deque test_descr test_dict test_dictviews test_email test_enumerate test_exceptions test_fractions test_functools test_generators test_http_cookies test_importlib test_inspect test_io test_ipaddress test_iter test_itertools test_list test_logging test_lzma test_memoryio test_memoryview test_minidom test_opcache test_operator test_ordered_dict test_os test_pathlib test_pickle test_picklebuffer test_pickletools test_platform test_plistlib test_positional_only_arg test_posix test_random test_range test_re test_set test_shelve test_slice test_socket test_statistics test_str test_string test_structseq test_super test_time test_trace test_tuple test_turtle test_type_aliases test_type_params test_types test_unittest test_uuid test_xml_dom_minicompat test_xml_etree test_xpickle test_zipfile test_zlib test_zoneinfo
    • pstats: test_profile test_pstats

Legend:

  • [+] path exists in CPython
  • [x] up-to-date, [ ] outdated

@ShaharNaveh ShaharNaveh marked this pull request as ready for review June 19, 2026 15:05

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/vm/src/stdlib/_collections.rs`:
- Around line 784-789: In the `__missing__` method, the current implementation
uses `setdefault` to store the factory result, but this can return a
pre-existing value if the factory function mutates the dict during execution.
Replace the `setdefault` call with a direct assignment using the dict's set
method instead, ensuring the factory result is unconditionally stored and
returned regardless of any re-entrant mutations that occur during factory
execution.
- Around line 878-897: The defaultdict init method does not validate that only
the expected number of positional arguments are provided. After the second
take_positional() call in the init method (which consumes the default_factory
and the optional mapping), add a validation check to ensure args.args is empty.
If args.args is not empty, return a TypeError with a message indicating the
maximum number of positional arguments expected (2) and the actual number
provided. This will prevent the method from silently accepting extra positional
arguments beyond default_factory and the optional mapping/iterable parameter.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: b95fbf25-a429-4183-bede-2143ea5aa274

📥 Commits

Reviewing files that changed from the base of the PR and between fe2a7db and 5e25861.

⛔ Files ignored due to path filters (3)
  • Lib/collections/__init__.py is excluded by !Lib/**
  • Lib/collections/_defaultdict.py is excluded by !Lib/**
  • Lib/test/test_dataclasses/__init__.py is excluded by !Lib/**
📒 Files selected for processing (2)
  • crates/vm/src/builtins/dict.rs
  • crates/vm/src/stdlib/_collections.rs

Comment on lines +784 to +789
fn __missing__(&self, key: PyObjectRef, vm: &VirtualMachine) -> PyResult {
let factory = self.default_factory.read().clone();
if let Some(f) = factory {
let value = f.call((), vm)?;
self.dict.setdefault(key, value.into(), vm)
} else {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

__missing__ should assign unconditionally, not via setdefault

Line 788 uses setdefault, which can return a pre-inserted value if the factory mutates the dict re-entrantly. defaultdict.__missing__ must store and return the factory result.

Proposed fix
         #[pymethod]
         fn __missing__(&self, key: PyObjectRef, vm: &VirtualMachine) -> PyResult {
             let factory = self.default_factory.read().clone();
             if let Some(f) = factory {
                 let value = f.call((), vm)?;
-                self.dict.setdefault(key, value.into(), vm)
+                self.dict.inner_setitem(&*key, value.clone(), vm)?;
+                Ok(value)
             } else {
                 Err(vm.new_key_error(key))
             }
         }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
fn __missing__(&self, key: PyObjectRef, vm: &VirtualMachine) -> PyResult {
let factory = self.default_factory.read().clone();
if let Some(f) = factory {
let value = f.call((), vm)?;
self.dict.setdefault(key, value.into(), vm)
} else {
fn __missing__(&self, key: PyObjectRef, vm: &VirtualMachine) -> PyResult {
let factory = self.default_factory.read().clone();
if let Some(f) = factory {
let value = f.call((), vm)?;
self.dict.inner_setitem(&*key, value.clone(), vm)?;
Ok(value)
} else {
Err(vm.new_key_error(key))
}
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/vm/src/stdlib/_collections.rs` around lines 784 - 789, In the
`__missing__` method, the current implementation uses `setdefault` to store the
factory result, but this can return a pre-existing value if the factory function
mutates the dict during execution. Replace the `setdefault` call with a direct
assignment using the dict's set method instead, ensuring the factory result is
unconditionally stored and returned regardless of any re-entrant mutations that
occur during factory execution.

Comment on lines +878 to +897
fn init(zelf: PyRef<Self>, mut args: Self::Args, vm: &VirtualMachine) -> PyResult<()> {
let default_factory = args.take_positional().map_or(Ok(None), |factory| {
let is_none = factory.is(&vm.ctx.none());

if !is_none && !factory.is_callable() {
Err(vm.new_type_error("first argument must be callable or None"))
} else if is_none {
Ok(None)
} else {
Ok(Some(factory))
}
})?;

*zelf.default_factory.write() = default_factory;

zelf.dict.update(
OptionalArg::from_option(args.take_positional()),
args.kwargs,
vm,
)?;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Inspect how Initializer(FuncArgs) callers enforce full argument consumption.
# 1) Find other Initializer impls with FuncArgs and how they reject extra args.
rg -n -C3 'impl Initializer for .*{[[:space:][:print:]]*type Args = FuncArgs;' crates/vm/src

# 2) Look for explicit checks of leftover positional args after take_positional().
rg -n -C3 'take_positional\(\)|remaining|is_empty\(\)|too many positional|takes .* positional' crates/vm/src/stdlib crates/vm/src/builtins crates/vm/src/function

# 3) Inspect defaultdict init call path for automatic leftover-arg validation.
rg -n -C3 'trait Initializer|fn init\(|py_new\(|FuncArgs' crates/vm/src

Repository: RustPython/RustPython

Length of output: 50569


🏁 Script executed:

# Find the Initializer trait definition
rg -n 'trait Initializer' crates/vm/src --type rust -A 10

# Look at how py_new/init is called and if remaining args are validated
rg -n 'fn init\(' crates/vm/src/stdlib/_collections.rs -B 2 -A 15

# Search for other defaultdict-like Initializer impls that might show the pattern
rg -n 'impl Initializer' crates/vm/src/stdlib crates/vm/src/builtins --type rust -A 8 | head -100

Repository: RustPython/RustPython

Length of output: 8999


🏁 Script executed:

# See the complete defaultdict.__init__ implementation
sed -n '876,897p' crates/vm/src/stdlib/_collections.rs

# Check what happens in slot_init wrapper - does it validate remaining args?
sed -n '1647,1680p' crates/vm/src/types/slot.rs

# See how dict.update handles the args
rg -n 'fn update' crates/vm/src/stdlib/_collections.rs -A 20 | head -60

# Check if FuncArgs has validation or if take_positional enforces limits
rg -n 'fn take_positional|impl FuncArgs' crates/vm/src/function.rs -A 5 | head -40

Repository: RustPython/RustPython

Length of output: 2164


🏁 Script executed:

# Find FuncArgs definition and bind method
rg -n 'struct FuncArgs|impl FuncArgs' crates/vm/src --type rust -A 3 | head -50

# Find take_positional method definition
rg -n 'fn take_positional' crates/vm/src --type rust -B 2 -A 8

# Check the bind method that converts FuncArgs
rg -n 'fn bind' crates/vm/src --type rust -B 2 -A 10 | head -80

Repository: RustPython/RustPython

Length of output: 4589


🏁 Script executed:

# Check FromArgs implementation for FuncArgs
rg -n 'impl FromArgs for FuncArgs|fn from_args' crates/vm/src/function/argument.rs -A 15

# Also check if there's a FromArgs impl that validates
rg -n 'trait FromArgs' crates/vm/src -A 8 | head -40

Repository: RustPython/RustPython

Length of output: 5310


🏁 Script executed:

# Check if defaultdict uses slot_init or a custom init
rg -n 'fn slot_init|fn init' crates/vm/src/stdlib/_collections.rs -B 3 -A 1

# Also verify how slot_init is invoked - does it check leftover args AFTER init?
sed -n '1652,1690p' crates/vm/src/types/slot.rs

Repository: RustPython/RustPython

Length of output: 1909


defaultdict.init silently accepts extra positional arguments

The implementation does not reject positional arguments beyond default_factory and the optional mapping/iterable. It calls take_positional() twice (lines 879, 894) but leaves any 3rd+ argument in the FuncArgs struct. Due to how FuncArgs::from_args consumes the entire struct in slot_init's binding step, the validation check at the framework level cannot catch these leftovers.

To match CPython's behavior, add a check after the second take_positional() to reject any remaining args:

Example fix
zelf.dict.update(
    OptionalArg::from_option(args.take_positional()),
    args.kwargs,
    vm,
)?;

if !args.args.is_empty() {
    return Err(vm.new_type_error(format!(
        "defaultdict expected at most 2 positional arguments, got {}",
        2 + args.args.len()
    )));
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/vm/src/stdlib/_collections.rs` around lines 878 - 897, The defaultdict
init method does not validate that only the expected number of positional
arguments are provided. After the second take_positional() call in the init
method (which consumes the default_factory and the optional mapping), add a
validation check to ensure args.args is empty. If args.args is not empty, return
a TypeError with a message indicating the maximum number of positional arguments
expected (2) and the actual number provided. This will prevent the method from
silently accepting extra positional arguments beyond default_factory and the
optional mapping/iterable parameter.


impl PyDefaultDict {
fn __or__(lhs: PyObjectRef, rhs: PyObjectRef, vm: &VirtualMachine) -> PyResult {
Ok(if let Some(zelf) = lhs.downcast_ref::<Self>() {

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bschoenmaeckers I'd love to get your review on this as well if you don't mind. I had lots of issues with implementing this correctly for some reason, I'm sure it can be simplifiers

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant