# Plan: Create `rustpython-host_env` crate

## Context

RustPython controls host OS access via the `host_env` feature flag, enforced by `#[cfg(feature = "host_env")]` scattered across hundreds of locations. If a `cfg` is forgotten, host code leaks into sandbox builds silently.

By isolating host OS API wrappers into a dedicated crate, **the crate boundary itself becomes the sandbox guarantee**.

Key constraint: this crate has **zero Python runtime dependency**. All Python-level bindings must be added by the consumer (vm/stdlib).

## Current State

### Already Python-free host abstractions in `crates/common/src/`:

- `os.rs`: errno handling, exit_code, winerror_to_errno, OsStr ffi conversions
- `crt_fd.rs`: CRT file descriptor abstraction (Owned/Borrowed types, open/read/write/close)
- `fileutils.rs`: fstat, fopen, Windows StatStruct
- `windows.rs`: ToWideString, FromWideString traits
- `macros.rs`: `suppress_iph!` macro (MSVC invalid parameter handler suppression)

### Pure host functions embedded in vm/stdlib modules:

These files mix Python bindings with pure host API calls. The host parts should be extracted:

**`vm/src/stdlib/posix.rs`** (2908 lines):
- `set_inheritable(fd, inheritable)`: pure nix fcntl wrapper
- `getgroups_impl()`: pure libc/nix wrapper
- `get_right_permission()`, `get_permissions()`: pure permission logic
- 400+ libc constant re-exports (`#[pyattr] use libc::*`)

**`vm/src/stdlib/nt.rs`** (2301 lines):
- `win32_hchmod()`, `win32_lchmod()`, `fchmod_impl()`: pure Windows API calls (currently return PyResult, should return io::Result)
- Spawn mode constants, `O_*` flags

**`vm/src/stdlib/_signal.rs`** (729 lines):
- `timeval_to_double()`, `double_to_timeval()`, `itimerval_to_tuple()`: pure math
- 30+ signal/timer constants

**`vm/src/stdlib/time.rs`** (1616 lines):
- `asctime_from_tm()`: pure string formatting
- `get_tz_info()`: pure Windows API
- Time unit constants (`SEC_TO_MS`, `MS_TO_US`, etc.)
- `duration_since_system_now()`: host clock access (currently takes vm, can return io::Result instead)

**`vm/src/stdlib/msvcrt.rs`**:
- `getch()`, `getwch()`, `getche()`, `getwche()`, `kbhit()`, `setmode_binary()`: all pure host
- Locking constants (`LK_UNLCK`, `LK_LOCK`, etc.)

**`vm/src/stdlib/_winapi.rs`** (2180 lines):
- `GetACP()`, `GetCurrentProcess()`, `GetLastError()`, `GetVersion()`: pure host
- 100+ Windows API constants

**`vm/src/stdlib/os.rs`** (2395 lines):
- `fs_metadata()`: pure `std::fs` wrapper
- libc flag constants (`O_APPEND`, `O_CREAT`, etc.)

## Dependency Graph (After)

```
rustpython-host_env (NEW: zero Python dep, independent of common)
├── Dependencies: libc, nix (unix), windows-sys (win), widestring (win), rustpython-wtf8
├── From common: os, crt_fd, fileutils, windows, macros
└── Extracted from vm/stdlib: posix, nt, signal, time, msvcrt, winapi, socket, mmap, ...

rustpython-common (NO host_env dependency: pure algorithmic code only)
└── cformat, float_ops, hash, int, str, encodings, etc.

rustpython-vm
├── rustpython-common
├── rustpython-host_env (optional, feature = "host_env")
├── libc (retained for type definitions & constants used inline in #[pyattr])
└── Python bindings call host_env for actual OS operations

rustpython-stdlib
├── rustpython-vm, rustpython-common
├── rustpython-host_env (optional, feature = "host_env")
└── libc, nix, socket2, memmap2 (retained for now; future migration target)
```

`common` and `host_env` are fully independent: no dependency in either direction.

## Phase 1: Create the crate and move modules from common

Create `crates/host_env/`, **move** the host modules out of common, and delete them from common. Because the two crates stay independent, common does not re-export the moved modules.
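Before the file-level changes, here is a minimal sketch of the split this crate is meant to enforce. It is not the actual RustPython code: the inline `host_env` module, the `fs_metadata` signature, and the `PyOsError` stand-in are assumptions made for illustration. The host layer returns a plain `io::Result`, and only the consumer knows how to turn an error into something Python-shaped.

```rust
/// Stand-in for the Python-free crate (hypothetical inline module).
pub mod host_env {
    use std::{fs, io, path::Path};

    /// Pure `std::fs` wrapper: no VM handle, plain `io::Result`.
    /// The signature is an assumption for this sketch.
    pub fn fs_metadata(path: &Path, follow_symlinks: bool) -> io::Result<fs::Metadata> {
        if follow_symlinks {
            fs::metadata(path)
        } else {
            fs::symlink_metadata(path)
        }
    }
}

/// Hypothetical stand-in for a Python-level `OSError`.
#[derive(Debug)]
pub struct PyOsError {
    pub kind: std::io::ErrorKind,
    pub errno: Option<i32>,
    pub message: String,
}

/// Consumer-side conversion: the only place that knows about "Python" errors.
pub fn os_error_from_io(err: std::io::Error) -> PyOsError {
    PyOsError {
        kind: err.kind(),
        errno: err.raw_os_error(),
        message: err.to_string(),
    }
}

/// Consumer-side binding: calls the host layer, then maps the error.
pub fn py_file_size(path: &std::path::Path) -> Result<u64, PyOsError> {
    host_env::fs_metadata(path, true)
        .map(|meta| meta.len())
        .map_err(os_error_from_io)
}
```

In the real VM the conversion step would build a Python exception through the interpreter. The point of the sketch is only that nothing inside `host_env` needs to know that step exists.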
### New files:

**`crates/host_env/Cargo.toml`:**
```toml
[package]
name = "rustpython-host_env"
description = "Host OS API abstractions for RustPython (zero Python dependency)"
version.workspace = true
edition.workspace = true

[dependencies]
rustpython-wtf8 = { workspace = true }
libc = { workspace = true }
num-traits = { workspace = true }
cfg-if = { workspace = true }

[target.'cfg(unix)'.dependencies]
nix = { workspace = true }

[target.'cfg(windows)'.dependencies]
widestring = { workspace = true }
windows-sys = { workspace = true, features = [
    "Win32_Foundation",
    "Win32_Globalization",
    "Win32_Networking_WinSock",
    "Win32_Storage_FileSystem",
    "Win32_System_Console",
    "Win32_System_Ioctl",
    "Win32_System_LibraryLoader",
    "Win32_System_SystemServices",
    "Win32_System_Time",
] }
```

**`crates/host_env/src/lib.rs`:**
```rust
#[macro_use]
mod macros;
pub use macros::*;

pub mod os;

#[cfg(any(unix, windows, target_os = "wasi"))]
pub mod crt_fd;

#[cfg(any(not(target_arch = "wasm32"), target_os = "wasi"))]
pub mod fileutils;

#[cfg(windows)]
pub mod windows;

// New modules, extracted from vm/stdlib (Phase 2)
#[cfg(unix)]
pub mod posix;
#[cfg(windows)]
pub mod nt;
pub mod signal;
pub mod time;
#[cfg(windows)]
pub mod msvcrt;
#[cfg(windows)]
pub mod winapi;
```

**Modules moved from common**: `os.rs`, `crt_fd.rs`, `fileutils.rs`, `windows.rs`, `macros.rs`

### Modified files:

**`Cargo.toml` (workspace root):**
- Add `"crates/host_env"` to `members` under `[workspace]`
- Add `rustpython-host_env = { path = "crates/host_env" }` to `[workspace.dependencies]`

**`crates/common/Cargo.toml`:**
- Remove `nix`, `windows-sys`, `widestring` from direct dependencies
- Keep `libc` for type definitions (`wchar_t` in `str.rs`)
- No `host_env` feature or dependency: common stays purely algorithmic

**`crates/common/src/lib.rs`:**
- Remove `pub mod os`, `pub mod crt_fd`, `pub mod fileutils`, `pub mod windows` declarations
- Remove `#[macro_use] mod macros` and `suppress_iph!` macro (moved to host_env)
- Delete the source files: `os.rs`, `crt_fd.rs`, `fileutils.rs`, `windows.rs`, `macros.rs`

**`crates/vm/Cargo.toml`:**
```toml
[features]
host_env = ["rustpython-host_env"]

[dependencies]
rustpython-host_env = { workspace = true, optional = true }
```

**`crates/stdlib/Cargo.toml`:**
```toml
[features]
host_env = ["rustpython-vm/host_env", "rustpython-host_env"]

[dependencies]
rustpython-host_env = { workspace = true, optional = true }
```

### Verification:
```bash
cargo check -p rustpython-host_env
cargo test
cargo check -p rustpython-vm --no-default-features --features compiler,gc  # sandbox build
```

## Phase 2: Extract host functions from vm/stdlib modules

Extract pure host API functions and constants from vm's stdlib modules into new modules within `host_env`.

### New modules in `crates/host_env/src/`:

**`posix.rs`**, extracted from `vm/src/stdlib/posix.rs`:
```rust
use nix::unistd::{Gid, Uid};
use std::os::fd::BorrowedFd;

pub fn set_inheritable(fd: BorrowedFd<'_>, inheritable: bool) -> nix::Result<()> {
    use nix::fcntl;
    let flags = fcntl::FdFlag::from_bits_truncate(fcntl::fcntl(fd, fcntl::FcntlArg::F_GETFD)?);
    let mut new_flags = flags;
    new_flags.set(fcntl::FdFlag::FD_CLOEXEC, !inheritable);
    // Only issue the second syscall when the flag actually changes.
    if flags != new_flags {
        fcntl::fcntl(fd, fcntl::FcntlArg::F_SETFD(new_flags))?;
    }
    Ok(())
}

pub fn getgroups() -> nix::Result<Vec<Gid>> { ... }
pub fn get_right_permission(mode: u32, file_owner: Uid, file_group: Gid) -> nix::Result<Permissions> { ... }
```

**`nt.rs`**, extracted from `vm/src/stdlib/nt.rs`:
```rust
pub fn win32_hchmod(handle: HANDLE, mode: u32) -> io::Result<()> { ... }
pub fn win32_lchmod(path: &OsStr, mode: u32) -> io::Result<()> { ... }
```

**`signal.rs`**, extracted from `vm/src/stdlib/_signal.rs`:
```rust
pub fn timeval_to_double(tv: &libc::timeval) -> f64 { ... }
pub fn double_to_timeval(val: f64) -> libc::timeval { ... }
pub fn itimerval_to_tuple(it: &libc::itimerval) -> (f64, f64) { ... }
```

**`time.rs`**, extracted from `vm/src/stdlib/time.rs`:
```rust
use std::{io, time::Duration};

pub const SEC_TO_MS: i64 = 1000;
pub const MS_TO_US: i64 = 1000;
// ...

pub fn asctime_from_tm(tm: &libc::tm) -> String { ... }
pub fn duration_since_system_now() -> io::Result<Duration> { ... }

#[cfg(windows)]
pub fn get_tz_info() -> TIME_ZONE_INFORMATION { ... }
```

**`msvcrt.rs`**, extracted from `vm/src/stdlib/msvcrt.rs`:
```rust
pub fn getch() -> Vec<u8> { ... }
pub fn getwch() -> String { ... }
pub fn kbhit() -> i32 { ... }
pub fn setmode_binary(fd: crt_fd::Borrowed<'_>) { ... }

pub const LK_UNLCK: i32 = 0;
pub const LK_LOCK: i32 = 1;
// ...
```

**`winapi.rs`**, extracted from `vm/src/stdlib/_winapi.rs`:
```rust
pub fn get_acp() -> u32 { ... }
pub fn get_current_process() -> HANDLE { ... }
pub fn get_last_error() -> u32 { ... }
pub fn get_version() -> u32 { ... }
// + Windows API constants
```

### Modified vm/stdlib files:

Each file is updated to call `rustpython_host_env::` instead of inlining the host calls:

```rust
// BEFORE (vm/src/stdlib/posix.rs)
pub fn set_inheritable(fd: BorrowedFd<'_>, inheritable: bool) -> nix::Result<()> {
    use nix::fcntl;
    // ... 10 lines of nix API calls
}

// AFTER (vm/src/stdlib/posix.rs)
pub use rustpython_host_env::posix::set_inheritable;
```

## Phase 3: vm/stdlib import migration

All `common::os`, `common::crt_fd`, `common::fileutils`, `common::windows` imports must be updated to `rustpython_host_env::`.
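To track progress on this migration, one option is a small checker that flags lines still naming one of the four moved paths. The following is a minimal sketch under stated assumptions: the function name `find_stale_imports` is hypothetical, it works on source text passed in as a string, and it uses plain substring matching rather than parsing Rust, so it reports only the first match of each path per line and can be fooled by comments or string literals.

```rust
/// Module paths that the plan moves out of `common`.
const MOVED_PATHS: [&str; 4] = [
    "common::os",
    "common::crt_fd",
    "common::fileutils",
    "common::windows",
];

/// Returns `(line_number, moved_path)` for each line that still names a moved path.
/// Line numbers are 1-based.
pub fn find_stale_imports(source: &str) -> Vec<(usize, &'static str)> {
    let mut hits = Vec::new();
    for (idx, line) in source.lines().enumerate() {
        for path in MOVED_PATHS {
            if let Some(pos) = line.find(path) {
                // Require an identifier boundary after the match so that
                // `common::os_extra` is not reported as `common::os`.
                let next = line[pos + path.len()..].chars().next();
                let at_boundary = !matches!(next, Some(c) if c.is_alphanumeric() || c == '_');
                if at_boundary {
                    hits.push((idx + 1, path));
                }
            }
        }
    }
    hits
}
```

Because `rustpython_common::os` and `crate::vm::common::os` both contain `common::os`, the same four patterns cover every spelling listed in the tables, while `rustpython_host_env::os` is left alone.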
### Import migration targets (vm), ~20 files:

| File | Current | New |
|------|---------|-----|
| `ospath.rs` | `rustpython_common::crt_fd` | `rustpython_host_env::crt_fd` |
| `stdlib/os.rs` | `common::crt_fd`, `common::os::*` | `rustpython_host_env::` |
| `stdlib/nt.rs` | `common::windows::*`, `common::crt_fd::*` | `rustpython_host_env::` |
| `stdlib/_io.rs` | `common::crt_fd::Offset`, `common::fileutils::fstat` | `rustpython_host_env::` |
| `stdlib/_signal.rs` | `common::crt_fd::*`, `common::fileutils::fstat` | `rustpython_host_env::` |
| `stdlib/posix.rs` | `common::os::*`, `common::crt_fd::Offset` | `rustpython_host_env::` |
| `stdlib/_ctypes/function.rs` | `rustpython_common::os::get_errno` | `rustpython_host_env::os::` |
| `stdlib/_codecs.rs` | `common::windows::ToWideString` | `rustpython_host_env::windows::` |
| `stdlib/sys.rs`, `winreg.rs`, `winsound.rs` | `common::windows::ToWideString` | `rustpython_host_env::windows::` |
| `windows.rs` | `rustpython_common::windows::ToWideString` | `rustpython_host_env::windows::` |
| `exceptions.rs` | `common::os::ErrorExt`, `common::os::winerror_to_errno` | `rustpython_host_env::os::` |

### Import migration targets (stdlib), ~7 files:

| File | Current | New |
|------|---------|-----|
| `socket.rs` | `common::os::ErrorExt`, `common::os::errno_io_error` | `rustpython_host_env::os::` |
| `mmap.rs` | `rustpython_common::crt_fd` | `rustpython_host_env::crt_fd` |
| `faulthandler.rs` | `rustpython_common::os::{get_errno, set_errno}` | `rustpython_host_env::os::` |
| `posixshmem.rs` | `common::os::errno_io_error` | `rustpython_host_env::os::` |
| `termios.rs` | `common::os::ErrorExt` | `rustpython_host_env::os::` |
| `overlapped.rs` | `crate::vm::common::os::winerror_to_errno` | `rustpython_host_env::os::` |
| `openssl.rs` | `rustpython_common::fileutils::fopen` | `rustpython_host_env::fileutils::` |

### External consumers:

| File | Current | New |
|------|---------|-----|
| `src/lib.rs` | `rustpython_vm::common::os::exit_code` | `rustpython_host_env::os::exit_code` |
| `examples/*.rs` | `vm::common::os::exit_code` | Keep via re-export |

## Phase 4 (Future): Extract host functions from stdlib modules

Same pattern as Phase 2, but for `crates/stdlib/src/` modules. These modules use `libc`, `nix`, `socket2`, and `memmap2` directly and heavily. Extract the pure host layer into `host_env`.

**Target modules and what goes into host_env:**

| stdlib module | host_env module | What to extract |
|---------------|----------------|-----------------|
| `socket.rs` (3498 lines) | `host_env::socket` | Socket creation, bind, connect, address conversion, cmsg helpers, poll wrappers. Re-export `socket2` types. |
| `mmap.rs` (1625 lines) | `host_env::mmap` | mmap/munmap wrappers, madvise, msync. Re-export `memmap2` types. |
| `select.rs` (745 lines) | `host_env::select` | select/poll/epoll/kqueue wrappers via libc/nix. |
| `posixsubprocess.rs` (537 lines) | `host_env::subprocess` | fork_exec, pipe, dup2, close-on-exec logic. |
| `multiprocessing.rs` (1152 lines) | `host_env::multiprocessing` | Semaphore operations (sem_open/wait/post/unlink via libc). |
| `fcntl.rs` (220 lines) | `host_env::fcntl` | fcntl, ioctl, flock wrappers. |
| `faulthandler.rs` (1333 lines) | `host_env::faulthandler` | Signal handler registration, stack dump via libc write. |
| `locale.rs` (332 lines) | `host_env::locale` | strcoll, strxfrm, setlocale wrappers. |
| `resource.rs` (194 lines) | `host_env::resource` | getrusage, getrlimit, setrlimit wrappers. |
| `grp.rs` (103 lines) | `host_env::grp` | getgrent/setgrent/endgrent, Group lookup via nix. |
| `syslog.rs` (148 lines) | `host_env::syslog` | openlog, syslog, closelog, setlogmask wrappers. |
| `posixshmem.rs` (52 lines) | `host_env::shm` | shm_open, shm_unlink wrappers. |
| `termios.rs` (280 lines) | `host_env::termios` | Terminal attribute get/set via termios crate. |

After this, `nix`, `socket2`, `memmap2`, `rustix` are removed from stdlib's direct dependencies. Only `host_env` provides them.

## Phase 5: Lint enforcement

Five layers of enforcement:

### Layer 1: Crate boundary (compile-time, absolute)

The strongest guarantee. If a crate doesn't list `rustpython-host_env` in its `[dependencies]`, it cannot name, and therefore cannot call, any host_env function. Cargo and the compiler already enforce this; no extra tooling is needed.

**Pure crates (no host_env dependency allowed):**
- `rustpython-common`
- `rustpython-compiler`, `rustpython-compiler-core`, `rustpython-compiler-source`
- `rustpython-codegen`
- `rustpython-literal`
- `rustpython-sre_engine`
- `rustpython-wtf8`
- `rustpython-derive`, `rustpython-derive-impl`

CI check:
```bash
# Verify pure crates don't depend on host_env
for crate in common compiler compiler-core compiler-source codegen literal sre_engine wtf8 derive derive-impl; do
    if rg 'rustpython-host_env' "crates/$crate/Cargo.toml"; then
        echo "ERROR: $crate should not depend on host_env"
        exit 1
    fi
done
```

### Layer 2: clippy disallowed_methods (compile-time, configurable)

Block direct host API usage in vm/stdlib. Force all host access through `host_env`.
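To make the intent of this layer concrete, here is a minimal sketch of what "all host access through `host_env`" looks like from the consumer side. The `host_env_sketch::env` and `host_env_sketch::process` modules and the `describe_host` function are hypothetical names used only for illustration; the plan does not define these wrappers.

```rust
/// Hypothetical stand-in for wrappers that would live in `rustpython-host_env`.
/// Inside that crate, direct `std::env` / `std::process` calls remain allowed.
pub mod host_env_sketch {
    pub mod env {
        use std::ffi::{OsStr, OsString};

        /// Thin wrapper over `std::env::var_os`.
        pub fn var_os(key: &OsStr) -> Option<OsString> {
            std::env::var_os(key)
        }
    }

    pub mod process {
        /// Thin wrapper over `std::process::id`.
        pub fn id() -> u32 {
            std::process::id()
        }
    }
}

/// Consumer-side code (vm/stdlib in the plan): every host call goes through
/// the wrapper module, so no `std::env::*` or `std::process::*` call appears here.
pub fn describe_host(var: &str) -> String {
    let value = host_env_sketch::env::var_os(std::ffi::OsStr::new(var));
    format!("pid={} {}={:?}", host_env_sketch::process::id(), var, value)
}
```

With the lint denied, replacing either wrapper call in `describe_host` with the underlying `std` call would be reported by clippy, while the wrapper bodies stay legal because the wrapper crate carries its own configuration.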
**Workspace-level `clippy.toml`** (project root):
```toml
disallowed-methods = [
    # Filesystem
    { path = "std::fs::read", reason = "use rustpython_host_env for host filesystem access" },
    { path = "std::fs::write", reason = "use rustpython_host_env" },
    { path = "std::fs::read_to_string", reason = "use rustpython_host_env" },
    { path = "std::fs::read_dir", reason = "use rustpython_host_env" },
    { path = "std::fs::create_dir", reason = "use rustpython_host_env" },
    { path = "std::fs::create_dir_all", reason = "use rustpython_host_env" },
    { path = "std::fs::remove_file", reason = "use rustpython_host_env" },
    { path = "std::fs::remove_dir", reason = "use rustpython_host_env" },
    { path = "std::fs::metadata", reason = "use rustpython_host_env" },
    { path = "std::fs::symlink_metadata", reason = "use rustpython_host_env" },
    { path = "std::fs::canonicalize", reason = "use rustpython_host_env" },
    { path = "std::fs::File::open", reason = "use rustpython_host_env" },
    { path = "std::fs::File::create", reason = "use rustpython_host_env" },
    { path = "std::fs::OpenOptions::open", reason = "use rustpython_host_env" },
    # Environment
    { path = "std::env::var", reason = "use rustpython_host_env" },
    { path = "std::env::var_os", reason = "use rustpython_host_env" },
    { path = "std::env::set_var", reason = "use rustpython_host_env" },
    { path = "std::env::remove_var", reason = "use rustpython_host_env" },
    { path = "std::env::vars", reason = "use rustpython_host_env" },
    { path = "std::env::vars_os", reason = "use rustpython_host_env" },
    { path = "std::env::current_dir", reason = "use rustpython_host_env" },
    { path = "std::env::set_current_dir", reason = "use rustpython_host_env" },
    { path = "std::env::temp_dir", reason = "use rustpython_host_env" },
    # Process
    { path = "std::process::Command::new", reason = "use rustpython_host_env" },
    { path = "std::process::exit", reason = "use rustpython_host_env" },
    { path = "std::process::abort", reason = "use rustpython_host_env" },
    { path = "std::process::id", reason = "use rustpython_host_env" },
    # Network
    { path = "std::net::TcpStream::connect", reason = "use rustpython_host_env" },
    { path = "std::net::TcpListener::bind", reason = "use rustpython_host_env" },
    { path = "std::net::UdpSocket::bind", reason = "use rustpython_host_env" },
]
```

**`crates/host_env/clippy.toml`** (overrides: host_env is allowed to use everything):
```toml
disallowed-methods = []
```

Clippy resolves `clippy.toml` by walking up from the crate directory, so `host_env`'s local config takes precedence over the workspace root.

**Workspace `Cargo.toml`:**
```toml
[workspace.lints.clippy]
disallowed_methods = "deny"
```

### Layer 3: Sandbox build verification (CI)

Build without the `host_env` feature to catch host code that is missing its feature gate:

```bash
cargo check -p rustpython-vm --no-default-features --features compiler,gc
cargo check -p rustpython-stdlib --no-default-features --features compiler
```

### Layer 4: Whitelist-based module audit (CI script)

Maintain a whitelist of modules in vm/stdlib that are known to NOT use host_env. Any change that adds a `rustpython_host_env` import to a whitelisted module triggers CI failure.

```bash
# .ci/host_env_whitelist.txt: modules that must stay host-free
# vm modules:
crates/vm/src/stdlib/_abc.rs
crates/vm/src/stdlib/_collections.rs
crates/vm/src/stdlib/_functools.rs
crates/vm/src/stdlib/_operator.rs
crates/vm/src/stdlib/_sre.rs
crates/vm/src/stdlib/_stat.rs
crates/vm/src/stdlib/_string.rs
crates/vm/src/stdlib/errno.rs
crates/vm/src/stdlib/gc.rs
crates/vm/src/stdlib/itertools.rs
crates/vm/src/stdlib/marshal.rs

# Check:
while IFS= read -r file; do
    if rg 'rustpython_host_env' "$file" 2>/dev/null; then
        echo "ERROR: $file is whitelisted as host-free but imports host_env"
        exit 1
    fi
done < .ci/host_env_whitelist.txt
```

The inverse is also useful: list all files that ARE allowed to use host_env, and reject any new file that uses it without being on the list. This catches accidental host API usage in new modules.

### Layer 5: `#![no_std]` for pure crates

After removing host modules from `common`, it could potentially become `#![no_std]` unconditionally (it already has `#![cfg_attr(not(feature = "std"), no_std)]`). This is the strongest possible guarantee: no `std::fs`, `std::env`, `std::net`, `std::process` available at all.

Candidate crates for unconditional `#![no_std]`:
- `rustpython-literal`
- `rustpython-wtf8`
- `rustpython-compiler-source`

### Summary of enforcement layers

| Layer | What it catches | Strength | Cost |
|-------|----------------|----------|------|
| Crate boundary | Missing host_env dependency | Absolute: compile error | Zero: automatic |
| clippy disallowed_methods | Direct std::fs/env/net usage | Strong: clippy deny | Low: clippy.toml config |
| Sandbox build | Missing `#[cfg(feature = "host_env")]` | Strong: compile error | Low: CI job |
| Module whitelist | Unintended host_env usage in pure modules | Medium: CI script | Low: maintain whitelist |
| `#![no_std]` | Any std usage in pure crates | Absolute: compile error | Medium: may need refactoring |

## Risk Assessment

| Risk | Level | Mitigation |
|------|-------|------------|
| Target modules have Python type dependencies | **Low** | Verified: only `libc`, `nix`, `windows-sys`, `rustpython-wtf8` |
| Internal cross-references break on move | **Low** | `crt_fd`, `os`, `fileutils`, `windows` all move together; `crate::` paths stay valid |
| `suppress_iph!` macro `$crate` resolution | **Medium** | `$crate` automatically resolves to new crate; `__macro_private` moves alongside |
| Breaking external consumers | **Medium** | Clean break: consumers must update `common::os` to `host_env::os`. No re-export shim. |
| Scope of Phase 2 extraction | **Medium** | Start with clearly pure functions; mixed functions can be migrated incrementally |
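
As an illustration of the "clearly pure" functions that the last row recommends starting with, the timeval conversions named in Phase 2 can be sketched with no OS calls at all. This is a minimal sketch, not the code to be moved: `TimeVal` and `ITimerVal` are hypothetical stand-ins for `libc::timeval` and `libc::itimerval` (whose field widths vary by platform), and `SEC_TO_US` is derived here from the two constants listed in the plan.

```rust
pub const SEC_TO_MS: i64 = 1000;
pub const MS_TO_US: i64 = 1000;
/// Derived for this sketch from the two constants above.
pub const SEC_TO_US: i64 = SEC_TO_MS * MS_TO_US;

/// Stand-in for `libc::timeval` (assumption: fixed 64-bit fields).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TimeVal {
    pub tv_sec: i64,
    pub tv_usec: i64,
}

/// Stand-in for `libc::itimerval`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ITimerVal {
    pub it_interval: TimeVal,
    pub it_value: TimeVal,
}

/// Seconds plus microseconds as one floating-point number of seconds.
pub fn timeval_to_double(tv: &TimeVal) -> f64 {
    tv.tv_sec as f64 + tv.tv_usec as f64 / SEC_TO_US as f64
}

/// Splits seconds into whole seconds and a microsecond remainder.
/// Sub-microsecond precision is truncated.
pub fn double_to_timeval(val: f64) -> TimeVal {
    let secs = val.floor();
    let frac = val - secs;
    TimeVal {
        tv_sec: secs as i64,
        tv_usec: (frac * SEC_TO_US as f64) as i64,
    }
}

/// Returns `(delay, interval)`, the order Python's `signal.getitimer` uses.
pub fn itimerval_to_tuple(it: &ITimerVal) -> (f64, f64) {
    (
        timeval_to_double(&it.it_value),
        timeval_to_double(&it.it_interval),
    )
}
```

None of these functions takes a VM handle or returns a Python object, which is what makes them low-risk first candidates for the move.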