Skip to content

basic hostenv impl#7604

Open
youknowone wants to merge 2 commits intoRustPython:mainfrom
youknowone:hostenv
Open

basic hostenv impl#7604
youknowone wants to merge 2 commits intoRustPython:mainfrom
youknowone:hostenv

Conversation

@youknowone
Copy link
Copy Markdown
Member

@youknowone youknowone commented Apr 14, 2026

Summary by CodeRabbit

  • Chores
    • Reorganized internal platform-specific utilities into a dedicated host environment module for improved code organization and maintainability.

youknowone and others added 2 commits April 13, 2026 23:05
Move os, crt_fd, fileutils, windows, macros modules from
rustpython-common into the new rustpython-host-env crate.
This isolates host OS API wrappers behind a crate boundary
with zero Python runtime dependency.

- Add crates/host_env to workspace
- Drop nix, windows-sys, widestring deps from common
- Wire vm and stdlib to depend on rustpython-host-env
- Migrate all imports from common::{os,crt_fd,fileutils,windows}
  to rustpython_host_env::
Agent-Logs-Url: https://github.com/RustPython/RustPython/sessions/48d1e64d-37ce-409f-b511-8e61a349665c

Co-authored-by: youknowone <69878+youknowone@users.noreply.github.com>
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 14, 2026

📝 Walkthrough

Walkthrough

A new rustpython-host_env crate is introduced to consolidate platform-specific host environment utilities and system calls previously scattered across rustpython-common. Dependencies across the codebase are updated to reference the new module paths, with rustpython-common cleaned up by removing platform-specific dependencies and public module exports.

Changes

Cohort / File(s) Summary
New rustpython-host_env crate foundation
Cargo.toml, crates/host_env/Cargo.toml, crates/host_env/src/lib.rs
Created new workspace crate with platform-specific dependencies and conditional module declarations for os, crt_fd, fileutils, fcntl, select, syslog, termios, posix, shm, signal, time, msvcrt, nt, winapi modules.
rustpython-host_env Unix/POSIX modules
crates/host_env/src/fcntl.rs, crates/host_env/src/posix.rs, crates/host_env/src/select.rs, crates/host_env/src/shm.rs, crates/host_env/src/signal.rs, crates/host_env/src/syslog.rs, crates/host_env/src/termios.rs
Introduced Unix/POSIX wrapper functions around libc syscalls: fcntl, ioctl, flock, lockf helpers; POSIX fd inheritability; cross-platform select abstraction; POSIX shared-memory wrappers; signal timeval converters; syslog integration with global state; termios terminal I/O wrappers.
rustpython-host_env Windows/MSVC modules
crates/host_env/src/msvcrt.rs, crates/host_env/src/nt.rs, crates/host_env/src/winapi.rs, crates/host_env/src/time.rs
Added Windows-specific wrappers: MSVCRT character I/O and file locking; NT permission/chmod utilities; Win32 API helpers (ACP, process handle, error, version); time utilities including timezone and conversions.
rustpython-common cleanup
crates/common/Cargo.toml, crates/common/src/lib.rs
Removed platform-specific target dependencies (nix, widestring, windows-sys) and unexported previously public modules (macros, crt_fd, fileutils, os, windows).
rustpython-stdlib import updates
crates/stdlib/src/faulthandler.rs, crates/stdlib/src/fcntl.rs, crates/stdlib/src/mmap.rs, crates/stdlib/src/openssl.rs, crates/stdlib/src/overlapped.rs, crates/stdlib/src/posixshmem.rs, crates/stdlib/src/select.rs, crates/stdlib/src/socket.rs, crates/stdlib/src/syslog.rs, crates/stdlib/src/termios.rs
Updated module imports from rustpython-common to rustpython-host_env; replaced local select implementation and syslog state management with host_env wrappers; simplified error-handling paths by delegating to wrapper functions.
rustpython-vm dependency setup
crates/vm/Cargo.toml
Added rustpython-host_env workspace dependency.
rustpython-vm core module updates
crates/vm/src/lib.rs, crates/vm/src/exceptions.rs, crates/vm/src/function/fspath.rs, crates/vm/src/ospath.rs, crates/vm/src/vm/vm_new.rs, crates/vm/src/windows.rs
Exported host_env as public re-export; updated errno/error handling and os string conversion to use host_env sources; switched Windows-specific utilities import paths.
rustpython-vm stdlib module updates
crates/vm/src/stdlib/_codecs.rs, crates/vm/src/stdlib/_ctypes/function.rs, crates/vm/src/stdlib/_io.rs, crates/vm/src/stdlib/_signal.rs, crates/vm/src/stdlib/_winapi.rs, crates/vm/src/stdlib/msvcrt.rs, crates/vm/src/stdlib/nt.rs, crates/vm/src/stdlib/os.rs, crates/vm/src/stdlib/posix.rs, crates/vm/src/stdlib/posix_compat.rs, crates/vm/src/stdlib/sys.rs, crates/vm/src/stdlib/time.rs, crates/vm/src/stdlib/winreg.rs, crates/vm/src/stdlib/winsound.rs
Comprehensive import migration from rustpython-common to rustpython-host_env across Windows wide-string conversions, errno management, file descriptor operations, and platform-specific syscall wrappers.
Example and CLI updates
examples/generator.rs, examples/package_embed.rs, src/lib.rs
Updated exit-code helper calls to use host_env module instead of common module.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

Poem

🐰 A Warren of Wrappers

Scattered helpers found a home so neat,
In host_env's burrow, cozy and complete!
From common's scattered warren we did roam,
Platform whispers now have a unified home. 🏡✨

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 67.07% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title 'basic hostenv impl' is vague and lacks specificity. It uses generic phrasing that doesn't clearly convey the scope or nature of the changes, making it difficult for reviewers scanning history to understand the PR's primary purpose. Revise the title to be more descriptive, such as 'Extract host environment utilities into new rustpython-host_env crate' or 'Refactor platform-specific code to host_env module', to clearly communicate the main change.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@youknowone youknowone marked this pull request as ready for review April 14, 2026 11:59
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

🧹 Nitpick comments (3)
crates/host_env/src/msvcrt.rs (2)

58-60: char to u16 cast silently truncates supplementary plane characters.

Characters above U+FFFF (supplementary planes) will be truncated when cast to u16. While this matches Windows' 16-bit wchar_t limitation, the silent data loss could be surprising.

📝 Consider documenting or validating BMP range
 pub fn putwch(c: char) {
+    // Note: Windows wchar_t is 16-bit; only BMP characters (U+0000-U+FFFF) supported
+    debug_assert!(c as u32 <= 0xFFFF, "putwch only supports BMP characters");
     unsafe { suppress_iph!(_putwch(c as u16)) };
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/host_env/src/msvcrt.rs` around lines 58 - 60, The putwch function
silently truncates non-BMP characters when casting char to u16; locate putwch
(and its call to suppress_iph!(_putwch(...))) and add validation or
documentation: check if c as u32 <= 0xFFFF before calling _putwch, and if not
either map to a replacement (e.g. '\u{FFFD}') or return/propagate an error/log a
warning (choose consistent behavior with your crate's error handling);
alternatively add a clear doc comment on putwch stating it accepts only BMP
characters and that callers must handle supplementary-plane code points
themselves.

36-52: Panic on invalid Unicode could be problematic in edge cases.

char::from_u32 will return None for surrogate code units (0xD800-0xDFFF). If _getwch/_getwche returns a surrogate (which shouldn't happen for console input, but could occur with unusual input methods), this will panic.

Consider returning a Result or using a replacement character instead of panicking.

♻️ Alternative: Return Result or use replacement char
 pub fn getwch() -> String {
     let value = unsafe { _getwch() };
-    char::from_u32(value)
-        .unwrap_or_else(|| panic!("invalid unicode {value:`#x`} from _getwch"))
-        .to_string()
+    char::from_u32(value)
+        .unwrap_or(char::REPLACEMENT_CHARACTER)
+        .to_string()
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/host_env/src/msvcrt.rs` around lines 36 - 52, The current
implementations of getwch and getwche panic if char::from_u32 returns None
(surrogate/invalid codepoints); to avoid this, stop panicking and normalize
invalid values to the Unicode replacement character instead so the API stays
unchanged: in getwch and getwche replace the unwrap_or_else panic path with a
fallback to '\u{FFFD}' before .to_string(); keep getche as-is (it returns raw
bytes). Ensure you remove the panic messages and any panic-causing closures in
functions getwch and getwche so invalid input yields U+FFFD rather than
aborting.
crates/host_env/src/fcntl.rs (1)

57-75: Function named lockf uses flock-style flags, which could cause confusion.

The function is named lockf but accepts a cmd parameter that's checked against libc::LOCK_NB (a flock-style flag). POSIX lockf() uses different command constants (F_LOCK, F_ULOCK, F_TLOCK, F_TEST), while LOCK_NB is from flock().

This conflation of APIs could confuse callers. Consider renaming to fcntl_lock or documenting that this expects flock-style flags.

📝 Suggested rename for clarity
+/// Performs file locking via fcntl using flock-style flags.
+/// If `cmd` includes `LOCK_NB`, uses non-blocking `F_SETLK`; otherwise uses blocking `F_SETLKW`.
 #[cfg(not(any(target_os = "wasi", target_os = "redox")))]
-pub fn lockf(fd: i32, cmd: i32, lock: &libc::flock) -> io::Result<i32> {
+pub fn fcntl_lock(fd: i32, cmd: i32, lock: &libc::flock) -> io::Result<i32> {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/host_env/src/fcntl.rs` around lines 57 - 75, The function lockf
currently mixes flock-style flags (checks libc::LOCK_NB on the cmd parameter)
with fcntl operations (using F_SETLK/F_SETLKW), which is confusing and
potentially incorrect; either rename the function to something like fcntl_lock
(or fcntl_lockf) and update its docstring to state it expects flock-style
nonblocking flag semantics, or change the cmd handling to accept POSIX lockf
commands (F_LOCK/F_TLOCK/F_ULOCK/F_TEST) and map them appropriately to
F_SETLK/F_SETLKW; locate the function named lockf and its cmd parameter and
adjust the name and/or the flag checks and documentation accordingly so callers
aren’t misled by the mixed APIs.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/host_env/src/select.rs`:
- Around line 171-176: sec_to_timeval can produce negative tv_usec when sec is
negative which yields invalid timeval for select(); update sec_to_timeval so
negative timeouts are handled by clamping to a non-negative timeval (e.g., if
sec <= 0.0 return timeval { tv_sec: 0, tv_usec: 0 }), otherwise compute tv_sec =
sec.trunc() as _ and tv_usec = (sec.fract() * 1e6) as _ (ensuring tv_usec is
non-negative and fits the timeval field); reference sec_to_timeval and timeval
(and select()) when making the change.

In `@crates/host_env/src/shm.rs`:
- Around line 5-6: Replace the infallible conversion using try_into().unwrap()
with a direct cast on FreeBSD: change the conversion of mode in the
#[cfg(target_os = "freebsd")] branch so that the local variable `mode` is cast
to the target type (since `libc::c_uint` and `libc::mode_t` are both u32 on
FreeBSD) rather than calling `try_into().unwrap()`, updating the `mode` binding
used later in functions in this module (shm.rs) accordingly.

In `@crates/host_env/src/syslog.rs`:
- Around line 62-64: The implementation of pub const fn log_mask(pri: i32)
shifts the wrong operand; change the shift to compute 1 << pri (not pri << 1) so
LOG_MASK(0) yields 1 and all masks are correct. Update log_mask to perform the
shift with a literal one of the correct integer type (e.g., 1i32 << pri or (1 as
i32) << pri) and keep the function signature and const semantics intact. Ensure
the return type stays i32 and handle pri as-is (no additional checks needed
here).
- Around line 35-43: syslog() and closelog() have race conditions: ensure
syslog() holds the global_ident read lock while calling libc::syslog() so the
ident Box<CStr> cannot be freed concurrently, and change closelog() to acquire
the global_ident write lock and perform the is_open() check and drop the ident
while holding that write lock (preventing an openlog() from racing in between).
Also correct log_mask() to use the POSIX LOG_MASK formula (return 1 << pri, not
pri << 1). Update references: openlog (stores GlobalIdent), syslog
(acquire/read-lock around libc::syslog call), closelog (do is_open check and
drop under write lock), global_ident (use its locks accordingly), and log_mask
(use 1 << pri).

In `@crates/host_env/src/time.rs`:
- Around line 24-27: get_tz_info currently ignores GetTimeZoneInformation's
return value and returns a zeroed TIME_ZONE_INFORMATION on failure; change
get_tz_info to return a
Result<windows_sys::Win32::System::Time::TIME_ZONE_INFORMATION, std::io::Error>
(or similar error type), call GetTimeZoneInformation and check its return value
against TIME_ZONE_ID_INVALID, and on failure call GetLastError to construct and
return an Err(std::io::Error::from_raw_os_error(...)); update all callers in
crates/vm/src/stdlib/time.rs (the wrapper function and the functions altzone,
timezone, daylight, tzname and the platform module) to handle the Result by
propagating errors or mapping the Ok value to the previous behavior so failures
are not silently converted to a zeroed struct. Ensure unique symbols:
get_tz_info, wrapper function in stdlib/time.rs, altzone, timezone, daylight,
tzname, and the platform module are updated accordingly.
- Around line 31-45: checked_tm_from_struct_time currently only checks
tm.tm_wday < 0 causing a potential out-of-bounds when asctime_from_tm indexes
WDAY_NAME; update checked_tm_from_struct_time to validate the full range for
tm.tm_wday (>=0 && <=6) and ensure tm.tm_mon is validated for 0..=11 as well so
both WDAY_NAME[tm.tm_wday as usize] and MON_NAME[tm.tm_mon as usize] cannot
panic; locate and update the checks around tm.tm_wday and tm.tm_mon in the
checked_tm_from_struct_time function accordingly.

In `@crates/stdlib/src/openssl.rs`:
- Around line 1671-1681: The fallback arm of the map_err on the
rustpython_host_env::fileutils::fopen call currently returns
vm.new_os_error(e.to_string()) which drops the original errno and prevents
mapping to specific OSError subtypes; change that fallback to return the
original IO error converted to a Python exception by calling
e.to_pyexception(vm) so errno is preserved (i.e., replace
vm.new_os_error(e.to_string()) with e.to_pyexception(vm) inside the map_err
closure).

In `@crates/stdlib/src/select.rs`:
- Around line 106-108: The select path is reading the wrong Windows error
namespace; update the error retrieval in the host_select::select() helper
(crates/host_env/src/select.rs) to call WSAGetLastError() on Windows and
construct the io::Error from that code instead of using
GetLastError()/io::Error::last_os_error(); ensure the returned Error codes
propagate into select.rs::select() and socket.rs::sock_select() so the
interrupted-retry logic can detect Winsock-specific errors (e.g.,
WSAEINTR/WSAEINPROGRESS) correctly. Use conditional compilation for Windows
targets to call the WinSock API (WSAGetLastError) and fallback to the existing
path on non-Windows platforms so behavior is unchanged elsewhere.

In `@crates/vm/src/stdlib/msvcrt.rs`:
- Around line 46-54: The wide-char wrappers (getwch, getche, getwche) currently
treat MSVCRT UTF-16 code units as Rust char and can panic on surrogate halves;
update the plumbing to handle UTF-16 code units instead of Unicode scalars:
change the host_msvcrt side bindings for _getwch/_getwche/_putwch/_ungetwch to
use u16/u32 (i.e., raw UTF-16 code units), stop calling char::from_u32 on the
returned values, and in the VM wrappers (getwch, getche, getwche) either return
the raw u16 code units (Vec<u16>) or safely decode sequences using
std::char::decode_utf16 to produce a String without panicking (handling Err by
using replacement U+FFFD); likewise, when sending to putwch/ungetwch, accept
UTF-16 code units (u16/u32) rather than truncating Rust char.

---

Nitpick comments:
In `@crates/host_env/src/fcntl.rs`:
- Around line 57-75: The function lockf currently mixes flock-style flags
(checks libc::LOCK_NB on the cmd parameter) with fcntl operations (using
F_SETLK/F_SETLKW), which is confusing and potentially incorrect; either rename
the function to something like fcntl_lock (or fcntl_lockf) and update its
docstring to state it expects flock-style nonblocking flag semantics, or change
the cmd handling to accept POSIX lockf commands (F_LOCK/F_TLOCK/F_ULOCK/F_TEST)
and map them appropriately to F_SETLK/F_SETLKW; locate the function named lockf
and its cmd parameter and adjust the name and/or the flag checks and
documentation accordingly so callers aren’t misled by the mixed APIs.

In `@crates/host_env/src/msvcrt.rs`:
- Around line 58-60: The putwch function silently truncates non-BMP characters
when casting char to u16; locate putwch (and its call to
suppress_iph!(_putwch(...))) and add validation or documentation: check if c as
u32 <= 0xFFFF before calling _putwch, and if not either map to a replacement
(e.g. '\u{FFFD}') or return/propagate an error/log a warning (choose consistent
behavior with your crate's error handling); alternatively add a clear doc
comment on putwch stating it accepts only BMP characters and that callers must
handle supplementary-plane code points themselves.
- Around line 36-52: The current implementations of getwch and getwche panic if
char::from_u32 returns None (surrogate/invalid codepoints); to avoid this, stop
panicking and normalize invalid values to the Unicode replacement character
instead so the API stays unchanged: in getwch and getwche replace the
unwrap_or_else panic path with a fallback to '\u{FFFD}' before .to_string();
keep getche as-is (it returns raw bytes). Ensure you remove the panic messages
and any panic-causing closures in functions getwch and getwche so invalid input
yields U+FFFD rather than aborting.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 32126555-6507-4aa3-9b49-7cd150eda503

📥 Commits

Reviewing files that changed from the base of the PR and between 2b1b0ba and ee34181.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (56)
  • Cargo.toml
  • crates/common/Cargo.toml
  • crates/common/src/lib.rs
  • crates/host_env/Cargo.toml
  • crates/host_env/src/crt_fd.rs
  • crates/host_env/src/fcntl.rs
  • crates/host_env/src/fileutils.rs
  • crates/host_env/src/lib.rs
  • crates/host_env/src/macros.rs
  • crates/host_env/src/msvcrt.rs
  • crates/host_env/src/nt.rs
  • crates/host_env/src/os.rs
  • crates/host_env/src/posix.rs
  • crates/host_env/src/select.rs
  • crates/host_env/src/shm.rs
  • crates/host_env/src/signal.rs
  • crates/host_env/src/syslog.rs
  • crates/host_env/src/termios.rs
  • crates/host_env/src/time.rs
  • crates/host_env/src/winapi.rs
  • crates/host_env/src/windows.rs
  • crates/stdlib/Cargo.toml
  • crates/stdlib/src/faulthandler.rs
  • crates/stdlib/src/fcntl.rs
  • crates/stdlib/src/mmap.rs
  • crates/stdlib/src/openssl.rs
  • crates/stdlib/src/overlapped.rs
  • crates/stdlib/src/posixshmem.rs
  • crates/stdlib/src/select.rs
  • crates/stdlib/src/socket.rs
  • crates/stdlib/src/syslog.rs
  • crates/stdlib/src/termios.rs
  • crates/vm/Cargo.toml
  • crates/vm/src/exceptions.rs
  • crates/vm/src/function/fspath.rs
  • crates/vm/src/lib.rs
  • crates/vm/src/ospath.rs
  • crates/vm/src/stdlib/_codecs.rs
  • crates/vm/src/stdlib/_ctypes/function.rs
  • crates/vm/src/stdlib/_io.rs
  • crates/vm/src/stdlib/_signal.rs
  • crates/vm/src/stdlib/_winapi.rs
  • crates/vm/src/stdlib/msvcrt.rs
  • crates/vm/src/stdlib/nt.rs
  • crates/vm/src/stdlib/os.rs
  • crates/vm/src/stdlib/posix.rs
  • crates/vm/src/stdlib/posix_compat.rs
  • crates/vm/src/stdlib/sys.rs
  • crates/vm/src/stdlib/time.rs
  • crates/vm/src/stdlib/winreg.rs
  • crates/vm/src/stdlib/winsound.rs
  • crates/vm/src/vm/vm_new.rs
  • crates/vm/src/windows.rs
  • examples/generator.rs
  • examples/package_embed.rs
  • src/lib.rs
💤 Files with no reviewable changes (2)
  • crates/common/Cargo.toml
  • crates/common/src/lib.rs

Comment on lines +171 to +176
pub fn sec_to_timeval(sec: f64) -> timeval {
timeval {
tv_sec: sec.trunc() as _,
tv_usec: (sec.fract() * 1e6) as _,
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Negative timeout values could produce invalid timeval.

If sec is negative, sec.fract() will also be negative, resulting in a negative tv_usec value. This could cause undefined behavior when passed to select().

🛡️ Proposed fix to handle negative values
 pub fn sec_to_timeval(sec: f64) -> timeval {
+    // Clamp negative values to zero to avoid invalid timeval
+    let sec = sec.max(0.0);
     timeval {
         tv_sec: sec.trunc() as _,
         tv_usec: (sec.fract() * 1e6) as _,
     }
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
pub fn sec_to_timeval(sec: f64) -> timeval {
timeval {
tv_sec: sec.trunc() as _,
tv_usec: (sec.fract() * 1e6) as _,
}
}
pub fn sec_to_timeval(sec: f64) -> timeval {
// Clamp negative values to zero to avoid invalid timeval
let sec = sec.max(0.0);
timeval {
tv_sec: sec.trunc() as _,
tv_usec: (sec.fract() * 1e6) as _,
}
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/host_env/src/select.rs` around lines 171 - 176, sec_to_timeval can
produce negative tv_usec when sec is negative which yields invalid timeval for
select(); update sec_to_timeval so negative timeouts are handled by clamping to
a non-negative timeval (e.g., if sec <= 0.0 return timeval { tv_sec: 0, tv_usec:
0 }), otherwise compute tv_sec = sec.trunc() as _ and tv_usec = (sec.fract() *
1e6) as _ (ensuring tv_usec is non-negative and fits the timeval field);
reference sec_to_timeval and timeval (and select()) when making the change.

Comment on lines +5 to +6
#[cfg(target_os = "freebsd")]
let mode = mode.try_into().unwrap();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

In Rust libc bindings on FreeBSD, what is the modeparameter type forshm_open, and can converting from libc::c_uintviaTryInto fail?

💡 Result:

In Rust's libc bindings on FreeBSD, the mode parameter for shm_open is of type libc::mode_t, which is u32 (type alias). The libc::shm_open function is declared as pub unsafe extern "C" fn shm_open(name: *const c_char, oflag: c_int, ...) -> c_int, using variadic arguments (...), so the mode value passed after oflag must be compatible with FreeBSD's C mode_t (also u32). Since libc::c_uint is also u32 on FreeBSD, converting from libc::c_uint to libc::mode_t via TryInto cannot fail, as it is a zero-cost type alias conversion between identical underlying types (u32 to u32).

Citations:


🏁 Script executed:

# Find and examine the shm.rs file
fd -type f -name "shm.rs" | head -20

Repository: RustPython/RustPython

Length of output: 235


🏁 Script executed:

# Find the shm.rs file in the repository
fd shm.rs

Repository: RustPython/RustPython

Length of output: 91


🏁 Script executed:

# Read the shm.rs file to see the function and its context
cat -n crates/host_env/src/shm.rs

Repository: RustPython/RustPython

Length of output: 801


🏁 Script executed:

# Check the libc crate definitions for FreeBSD mode_t and c_uint
# First, let's see what's in the Cargo.lock or Cargo.toml to find the libc version
fd -name "Cargo.toml" | xargs grep -l "libc" | head -5

Repository: RustPython/RustPython

Length of output: 295


🏁 Script executed:

# Find Cargo.toml files that reference libc
find . -name "Cargo.toml" -type f | head -5

Repository: RustPython/RustPython

Length of output: 202


🏁 Script executed:

# Check the main Cargo.toml and the host_env Cargo.toml for libc dependency
grep -r "libc" ./Cargo.toml ./crates/host_env/Cargo.toml 2>/dev/null | head -10

Repository: RustPython/RustPython

Length of output: 192


🏁 Script executed:

# Let's verify the actual type definitions for FreeBSD
# Check if there's any documentation or test that shows the types
rg "mode_t|c_uint" crates/host_env/ -A 2 -B 2

Repository: RustPython/RustPython

Length of output: 933


Simplify FreeBSD mode conversion using a direct cast.

The try_into().unwrap() is unnecessary since both libc::c_uint and libc::mode_t are u32 on FreeBSD, making the conversion infallible. Use a direct cast instead for clarity:

Proposed fix
 pub fn shm_open(name: &CStr, flags: libc::c_int, mode: libc::c_uint) -> io::Result<libc::c_int> {
     #[cfg(target_os = "freebsd")]
-    let mode = mode.try_into().unwrap();
+    let mode = mode as libc::mode_t;
 
     let fd = unsafe { libc::shm_open(name.as_ptr(), flags, mode) };
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
#[cfg(target_os = "freebsd")]
let mode = mode.try_into().unwrap();
pub fn shm_open(name: &CStr, flags: libc::c_int, mode: libc::c_uint) -> io::Result<libc::c_int> {
#[cfg(target_os = "freebsd")]
let mode = mode as libc::mode_t;
let fd = unsafe { libc::shm_open(name.as_ptr(), flags, mode) };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/host_env/src/shm.rs` around lines 5 - 6, Replace the infallible
conversion using try_into().unwrap() with a direct cast on FreeBSD: change the
conversion of mode in the #[cfg(target_os = "freebsd")] branch so that the local
variable `mode` is cast to the target type (since `libc::c_uint` and
`libc::mode_t` are both u32 on FreeBSD) rather than calling
`try_into().unwrap()`, updating the `mode` binding used later in functions in
this module (shm.rs) accordingly.

Comment on lines +35 to +43
pub fn openlog(ident: Option<Box<CStr>>, logoption: i32, facility: i32) {
let ident = match ident {
Some(ident) => GlobalIdent::Explicit(ident),
None => GlobalIdent::Implicit,
};
let mut locked_ident = global_ident().write().expect("syslog lock poisoned");
unsafe { libc::openlog(ident.as_ptr(), logoption, facility) };
*locked_ident = Some(ident);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

cat crates/host_env/src/syslog.rs

Repository: RustPython/RustPython

Length of output: 1698


🌐 Web query:

POSIX syslog openlog ident pointer lifetime semantics

💡 Result:

According to POSIX.1-2024 (IEEE Std 1003.1-2024), the openlog function takes a const char *ident parameter, which is a pointer to a null-terminated string used as an identifier prepended to syslog messages. The specification states: "The application shall ensure that the string pointed to by ident remains valid during the syslog calls that will prepend this identifier; however, it is unspecified whether changes made to the string will change the identifier prepended by later syslog calls." This means the pointer is stored and used across subsequent syslog calls until closelog or another openlog is called. The application is responsible for ensuring the string's lifetime extends at least until those syslog calls complete (i.e., do not free or invalidate the memory while the syslog connection uses it). Implementations typically store the pointer directly without copying the string, so the memory must persist. GNU C Library documentation confirms: "the string pointer ident will be retained internally by the Syslog routines. You must not free the memory that ident points to." Linux and other man pages note similar behavior: the pointer is stored as-is, and invalidating it leads to undefined results. Best practice: Pass a static string literal or ensure the string outlives all relevant syslog calls (e.g., until closelog).

Citations:


🌐 Web query:

syslog LOG_MASK macro definition 1 << pri

💡 Result:

The LOG_MASK macro in syslog.h is defined as #define LOG_MASK(pri) (1 << (pri)). This creates a bitmask for a specific syslog priority level (0-7), used with setlogmask to control which priority messages are logged. For example, LOG_MASK(LOG_ERR) evaluates to 1 << 3, enabling only error-level messages.

Citations:


Hold the ident lock during syslog() and move the is_open() check inside closelog()'s write lock.

Per POSIX.1-2024, the ident pointer passed to openlog() is retained by libc and must remain valid across subsequent syslog() calls. The current code has two races:

  1. syslog() calls libc::syslog() without holding the lock on global_ident, so a concurrent closelog() can drop the Box<CStr> while libc still uses the pointer.
  2. closelog() checks is_open() (releasing the lock) before acquiring the write lock to drop the ident, allowing a concurrent openlog() to race in between.
Proposed fixes
 pub fn syslog(priority: i32, msg: &CStr) {
+    let _ident_guard = global_ident().read().expect("syslog lock poisoned");
     let cformat = c"%s";
     unsafe { libc::syslog(priority, cformat.as_ptr(), msg.as_ptr()) };
 }

 pub fn closelog() {
-    if is_open() {
-        let mut locked_ident = global_ident().write().expect("syslog lock poisoned");
+    let mut locked_ident = global_ident().write().expect("syslog lock poisoned");
+    if locked_ident.is_some() {
         unsafe { libc::closelog() };
         *locked_ident = None;
     }
 }

Also, log_mask() implements the wrong formula. The syslog LOG_MASK macro is 1 << pri, not pri << 1. For example, log_mask(5) should evaluate to 32, not 10.

 pub const fn log_mask(pri: i32) -> i32 {
-    pri << 1
+    1 << pri
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/host_env/src/syslog.rs` around lines 35 - 43, syslog() and closelog()
have race conditions: ensure syslog() holds the global_ident read lock while
calling libc::syslog() so the ident Box<CStr> cannot be freed concurrently, and
change closelog() to acquire the global_ident write lock and perform the
is_open() check and drop the ident while holding that write lock (preventing an
openlog() from racing in between). Also correct log_mask() to use the POSIX
LOG_MASK formula (return 1 << pri, not pri << 1). Update references: openlog
(stores GlobalIdent), syslog (acquire/read-lock around libc::syslog call),
closelog (do is_open check and drop under write lock), global_ident (use its
locks accordingly), and log_mask (use 1 << pri).

Comment on lines +62 to +64
pub const fn log_mask(pri: i32) -> i32 {
pri << 1
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's check the current state of the syslog.rs file
cat -n crates/host_env/src/syslog.rs | head -80

Repository: RustPython/RustPython

Length of output: 2174


🏁 Script executed:

# Search for any usages of log_mask in the codebase
rg "log_mask" --type rust -B 2 -A 2

Repository: RustPython/RustPython

Length of output: 578


🌐 Web query:

syslog LOG_MASK macro definition standard

💡 Result:

The LOG_MASK macro is defined in the POSIX standard <syslog.h> header as a macro that expands to an expression of type int when its argument pri (of type int) is provided. It generates a bitmask for a specific priority level pri, used as part of the maskpri argument to setlogmask to control which syslog priority levels are logged. Standard POSIX definition (IEEE Std 1003.1): LOG_MASK(pri) - A mask for priority pri. Common implementation across systems (BSD/glibc/OpenBSD/Haiku): #define LOG_MASK(pri) (1 << (pri)) This sets the bit corresponding to the priority value (0-7 for LOG_EMERG to LOG_DEBUG), enabling logging only for messages at that exact priority when passed to setlogmask. The default mask allows all priorities. It is often used with LOG_UPTO(pri), which masks all priorities from LOG_EMERG (0) through pri.

Citations:


LOG_MASK() shifts the wrong operand.

The syslog mask macro per POSIX standard is 1 << pri, not pri << 1. The current implementation returns 0 for LOG_MASK(0) instead of 1, and produces incorrect masks for all other priorities.

Proposed fix
 pub const fn log_mask(pri: i32) -> i32 {
-    pri << 1
+    1 << pri
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
pub const fn log_mask(pri: i32) -> i32 {
pri << 1
}
pub const fn log_mask(pri: i32) -> i32 {
1 << pri
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/host_env/src/syslog.rs` around lines 62 - 64, The implementation of
pub const fn log_mask(pri: i32) shifts the wrong operand; change the shift to
compute 1 << pri (not pri << 1) so LOG_MASK(0) yields 1 and all masks are
correct. Update log_mask to perform the shift with a literal one of the correct
integer type (e.g., 1i32 << pri or (1 as i32) << pri) and keep the function
signature and const semantics intact. Ensure the return type stays i32 and
handle pri as-is (no additional checks needed here).

Comment on lines +24 to +27
pub fn get_tz_info() -> windows_sys::Win32::System::Time::TIME_ZONE_INFORMATION {
let mut info = unsafe { core::mem::zeroed() };
unsafe { windows_sys::Win32::System::Time::GetTimeZoneInformation(&mut info) };
info
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

For Win32 GetTimeZoneInformation, what return value indicates failure, and should callers check GetLastError in that case?

💡 Result:

The return value indicating failure for Win32 GetTimeZoneInformation is TIME_ZONE_ID_INVALID (0xFFFFFFFF). Callers should check GetLastError in that case to get extended error information, such as for out-of-memory errors.

Citations:


🏁 Script executed:

# Read the time.rs file to verify the current code state
cat -n crates/host_env/src/time.rs

Repository: RustPython/RustPython

Length of output: 1776


🏁 Script executed:

# Search for all usages of get_tz_info to understand call sites
rg "get_tz_info" --type rs -B 2 -A 2

Repository: RustPython/RustPython

Length of output: 91


🏁 Script executed:

# Search for all usages of get_tz_info
rg "get_tz_info" -A 2 -B 2

Repository: RustPython/RustPython

Length of output: 3079


Handle GetTimeZoneInformation failure explicitly.

The return value is currently ignored; on failure this returns a zeroed struct, which silently produces incorrect timezone behavior. The Windows API signals failure with TIME_ZONE_ID_INVALID and requires checking GetLastError for extended error information.

All call sites in crates/vm/src/stdlib/time.rs (wrapper function, altzone, timezone, daylight, tzname, and platform module) currently expect a direct struct return and will need to be updated to handle the Result type.

Proposed fix
+use std::io;
+
 #[cfg(target_env = "msvc")]
 #[cfg(not(target_arch = "wasm32"))]
-pub fn get_tz_info() -> windows_sys::Win32::System::Time::TIME_ZONE_INFORMATION {
+pub fn get_tz_info() -> io::Result<windows_sys::Win32::System::Time::TIME_ZONE_INFORMATION> {
     let mut info = unsafe { core::mem::zeroed() };
-    unsafe { windows_sys::Win32::System::Time::GetTimeZoneInformation(&mut info) };
-    info
+    let status = unsafe { windows_sys::Win32::System::Time::GetTimeZoneInformation(&mut info) };
+    if status == windows_sys::Win32::System::Time::TIME_ZONE_ID_INVALID {
+        Err(io::Error::last_os_error())
+    } else {
+        Ok(info)
+    }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/host_env/src/time.rs` around lines 24 - 27, get_tz_info currently
ignores GetTimeZoneInformation's return value and returns a zeroed
TIME_ZONE_INFORMATION on failure; change get_tz_info to return a
Result<windows_sys::Win32::System::Time::TIME_ZONE_INFORMATION, std::io::Error>
(or similar error type), call GetTimeZoneInformation and check its return value
against TIME_ZONE_ID_INVALID, and on failure call GetLastError to construct and
return an Err(std::io::Error::from_raw_os_error(...)); update all callers in
crates/vm/src/stdlib/time.rs (the wrapper function and the functions altzone,
timezone, daylight, tzname and the platform module) to handle the Result by
propagating errors or mapping the Ok value to the previous behavior so failures
are not silently converted to a zeroed struct. Ensure unique symbols:
get_tz_info, wrapper function in stdlib/time.rs, altzone, timezone, daylight,
tzname, and the platform module are updated accordingly.

Comment on lines +31 to +45
pub fn asctime_from_tm(tm: &libc::tm) -> String {
const WDAY_NAME: [&str; 7] = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"];
const MON_NAME: [&str; 12] = [
"Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
];
format!(
"{} {}{:>3} {:02}:{:02}:{:02} {}",
WDAY_NAME[tm.tm_wday as usize],
MON_NAME[tm.tm_mon as usize],
tm.tm_mday,
tm.tm_hour,
tm.tm_min,
tm.tm_sec,
tm.tm_year + 1900
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
rg -n --type rust 'asctime_from_tm\('
rg -n --type rust -C 4 'tm_wday|tm_mon'

Repository: RustPython/RustPython

Length of output: 1784


🏁 Script executed:

rg -n --type rust 'asctime_from_tm' --max-count 20

Repository: RustPython/RustPython

Length of output: 392


🏁 Script executed:

rg -n --type rust 'tm_wday|tm_mon' -C 3 --max-count 30

Repository: RustPython/RustPython

Length of output: 13157


🌐 Web query:

libc::tm tm_wday tm_mon range valid values documentation

💡 Result:

In the C standard library (libc) struct tm, the fields tm_wday and tm_mon have the following valid ranges: tm_wday: days since Sunday [0, 6] (0=Sunday, 1=Monday, ..., 6=Saturday) tm_mon: months since January [0, 11] (0=January, 1=February, ..., 11=December) These ranges are standard across C (cppreference.com citing C11/C17/C23), POSIX (Open Group Base Specifications), and libc man pages (Linux/Ubuntu). Functions like mktime may normalize or ignore tm_wday values outside this range, but the documented valid input/output range is 0-6 for tm_wday and 0-11 for tm_mon.

Citations:


🏁 Script executed:

sed -n '580,600p' crates/vm/src/stdlib/time.rs
sed -n '600,620p' crates/vm/src/stdlib/time.rs
sed -n '350,430p' crates/vm/src/stdlib/time.rs

Repository: RustPython/RustPython

Length of output: 4219


🏁 Script executed:

sed -n '440,450p' crates/vm/src/stdlib/time.rs

Repository: RustPython/RustPython

Length of output: 448


Add upper bound validation for tm_wday in checked_tm_from_struct_time.

While asctime_from_tm call sites do pass through checked_tm_from_struct_time validation, the bounds check for tm_wday is incomplete. Line 442 only checks tm.tm_wday < 0 but omits the upper bound—should also validate tm.tm_wday > 6 to prevent panic when indexing WDAY_NAME[7]. Similarly, confirm tm_mon validation correctly enforces [0, 11] range (line 425 looks correct).

Validation gaps in crates/vm/src/stdlib/time.rs

Line 442-444 (incomplete):

if tm.tm_wday < 0 {
    return Err(vm.new_value_error("day of week out of range"));
}

Should be:

if tm.tm_wday < 0 || tm.tm_wday > 6 {
    return Err(vm.new_value_error("day of week out of range"));
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/host_env/src/time.rs` around lines 31 - 45,
checked_tm_from_struct_time currently only checks tm.tm_wday < 0 causing a
potential out-of-bounds when asctime_from_tm indexes WDAY_NAME; update
checked_tm_from_struct_time to validate the full range for tm.tm_wday (>=0 &&
<=6) and ensure tm.tm_mon is validated for 0..=11 as well so both
WDAY_NAME[tm.tm_wday as usize] and MON_NAME[tm.tm_mon as usize] cannot panic;
locate and update the checks around tm.tm_wday and tm.tm_mon in the
checked_tm_from_struct_time function accordingly.

Comment on lines +1671 to +1681
rustpython_host_env::fileutils::fopen(path.as_path(), "rb").map_err(|e| match e
.kind()
{
std::io::ErrorKind::NotFound => vm
.new_os_subtype_error(
vm.ctx.exceptions.file_not_found_error.to_owned(),
Some(libc::ENOENT),
e.to_string(),
)
.upcast(),
_ => vm.new_os_error(e.to_string()),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect the current load_dh_params mapping.
fd -a 'openssl\.rs$' . -x sed -n '1666,1682p' {}

# Inspect the generic OSError constructor.
rg -n -C3 --type=rust '\b(pub\s+)?fn new_os_error\b'

# Inspect how std::io::Error is normally translated into Python exceptions.
rg -n -C4 --type=rust 'raw_os_error\(\)|to_pyexception\(.*vm\)|into_pyexception\(.*vm\)|file_not_found_error|permission_error'

Repository: RustPython/RustPython

Length of output: 50378


Replace vm.new_os_error() with e.to_pyexception() in the fallback arm to preserve errno.

The _ => vm.new_os_error(e.to_string()) branch discards the errno code from the std::io::Error, mapping failures like EACCES, EISDIR, and ENOTDIR to a plain OSError instead of their proper subtypes (PermissionError, IsADirectoryError, NotADirectoryError). The codebase uses e.to_pyexception(vm) throughout (_io.rs, nt.rs, _winapi.rs) to preserve this information for correct exception type selection.

Suggested fix
-                    _ => vm.new_os_error(e.to_string()),
+                    _ => e.to_pyexception(vm),
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
rustpython_host_env::fileutils::fopen(path.as_path(), "rb").map_err(|e| match e
.kind()
{
std::io::ErrorKind::NotFound => vm
.new_os_subtype_error(
vm.ctx.exceptions.file_not_found_error.to_owned(),
Some(libc::ENOENT),
e.to_string(),
)
.upcast(),
_ => vm.new_os_error(e.to_string()),
rustpython_host_env::fileutils::fopen(path.as_path(), "rb").map_err(|e| match e
.kind()
{
std::io::ErrorKind::NotFound => vm
.new_os_subtype_error(
vm.ctx.exceptions.file_not_found_error.to_owned(),
Some(libc::ENOENT),
e.to_string(),
)
.upcast(),
_ => e.to_pyexception(vm),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/stdlib/src/openssl.rs` around lines 1671 - 1681, The fallback arm of
the map_err on the rustpython_host_env::fileutils::fopen call currently returns
vm.new_os_error(e.to_string()) which drops the original errno and prevents
mapping to specific OSError subtypes; change that fallback to return the
original IO error converted to a Python exception by calling
e.to_pyexception(vm) so errno is preserved (i.e., replace
vm.new_os_error(e.to_string()) with e.to_pyexception(vm) inside the map_err
closure).

Comment on lines +106 to +108
let mut tv = timeout.map(host_select::sec_to_timeval);
let res =
vm.allow_threads(|| host_select::select(nfds, &mut r, &mut w, &mut x, tv.as_mut()));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
sed -n '1,220p' crates/host_env/src/select.rs
printf '\n--- host_select call sites ---\n'
rg -n "host_select::select|use rustpython_host_env::select" crates/stdlib/src/select.rs crates/stdlib/src/socket.rs
printf '\n--- error retrieval in host wrapper ---\n'
rg -n "last_os_error|WSAGetLastError" crates/host_env/src/select.rs

Repository: RustPython/RustPython

Length of output: 4825


Use WSAGetLastError() for WinSock error retrieval on Windows.

The host_select::select() helper in crates/host_env/src/select.rs calls io::Error::last_os_error() on error, which reads GetLastError() instead of WSAGetLastError(). WinSock errors live in a separate error namespace; using the wrong retrieval function will produce incorrect error codes on Windows. This breaks error handling in both select.rs::select() (line 108) and socket.rs::sock_select() (line 2813), affecting the interrupted-retry logic.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/stdlib/src/select.rs` around lines 106 - 108, The select path is
reading the wrong Windows error namespace; update the error retrieval in the
host_select::select() helper (crates/host_env/src/select.rs) to call
WSAGetLastError() on Windows and construct the io::Error from that code instead
of using GetLastError()/io::Error::last_os_error(); ensure the returned Error
codes propagate into select.rs::select() and socket.rs::sock_select() so the
interrupted-retry logic can detect Winsock-specific errors (e.g.,
WSAEINTR/WSAEINPROGRESS) correctly. Use conditional compilation for Windows
targets to call the WinSock API (WSAGetLastError) and fallback to the existing
path on non-Windows platforms so behavior is unchanged elsewhere.

Comment on lines +46 to +54
host_msvcrt::getwch()
}
#[pyfunction]
fn getche() -> Vec<u8> {
let c = unsafe { _getche() };
vec![c as u8]
host_msvcrt::getche()
}
#[pyfunction]
fn getwche() -> String {
let c = unsafe { _getwche() };
char::from_u32(c).unwrap().to_string()
host_msvcrt::getwche()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
sed -n '1,180p' crates/host_env/src/msvcrt.rs
printf '\n--- wide-char wrapper sites ---\n'
rg -n "getwch|getwche|putwch|ungetwch|char::from_u32|as u16|_ungetwch" crates/host_env/src/msvcrt.rs crates/vm/src/stdlib/msvcrt.rs

Repository: RustPython/RustPython

Length of output: 5075


The wide-char wrappers incorrectly model UTF-16 console I/O as Unicode scalars.

crates/host_env/src/msvcrt.rs treats _getwch, _getwche, _putwch, and _ungetwch as operating on Unicode code points via char, but MSVCRT APIs work with UTF-16 code units. _getwch and _getwche return u32 values that char::from_u32() cannot handle when they are UTF-16 surrogate code units (0xD800–0xDFFF). An emoji like 😀 arrives as a surrogate pair; the first half will panic. Similarly, putwch and ungetwch truncate char to u16/u32, losing supplementary plane input entirely.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/msvcrt.rs` around lines 46 - 54, The wide-char wrappers
(getwch, getche, getwche) currently treat MSVCRT UTF-16 code units as Rust char
and can panic on surrogate halves; update the plumbing to handle UTF-16 code
units instead of Unicode scalars: change the host_msvcrt side bindings for
_getwch/_getwche/_putwch/_ungetwch to use u16/u32 (i.e., raw UTF-16 code units),
stop calling char::from_u32 on the returned values, and in the VM wrappers
(getwch, getche, getwche) either return the raw u16 code units (Vec<u16>) or
safely decode sequences using std::char::decode_utf16 to produce a String
without panicking (handling Err by using replacement U+FFFD); likewise, when
sending to putwch/ungetwch, accept UTF-16 code units (u16/u32) rather than
truncating Rust char.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants