Skip to content

Latest commit

 

History

History
494 lines (390 loc) · 20 KB

File metadata and controls

494 lines (390 loc) · 20 KB

Plan: Create rustpython-host_env crate

Context

RustPython controls host OS access via the host_env feature flag, enforced by #[cfg(feature = "host_env")] scattered across hundreds of locations. If a cfg is forgotten, host code leaks into sandbox builds silently.

By isolating host OS API wrappers into a dedicated crate, the crate boundary itself becomes the sandbox guarantee. Key constraint: this crate has zero Python runtime dependency. All Python-level bindings must be added by the consumer (vm/stdlib).

Current State

Already Python-free host abstractions in crates/common/src/:

  • os.rs — errno handling, exit_code, winerror_to_errno, OsStr ffi conversions
  • crt_fd.rs — CRT file descriptor abstraction (Owned/Borrowed types, open/read/write/close)
  • fileutils.rs — fstat, fopen, Windows StatStruct
  • windows.rs — ToWideString, FromWideString traits
  • macros.rssuppress_iph! macro (MSVC invalid parameter handler suppression)

Pure host functions embedded in vm/stdlib modules:

These files mix Python bindings with pure host API calls. The host parts should be extracted:

vm/src/stdlib/posix.rs (2908 lines):

  • set_inheritable(fd, inheritable) — pure nix fcntl wrapper
  • getgroups_impl() — pure libc/nix wrapper
  • get_right_permission(), get_permissions() — pure permission logic
  • 400+ libc constant re-exports (#[pyattr] use libc::*)

vm/src/stdlib/nt.rs (2301 lines):

  • win32_hchmod(), win32_lchmod(), fchmod_impl() — pure Windows API calls (currently return PyResult, should return io::Result)
  • Spawn mode constants, O_* flags

vm/src/stdlib/_signal.rs (729 lines):

  • timeval_to_double(), double_to_timeval(), itimerval_to_tuple() — pure math
  • 30+ signal/timer constants

vm/src/stdlib/time.rs (1616 lines):

  • asctime_from_tm() — pure string formatting
  • get_tz_info() — pure Windows API
  • Time unit constants (SEC_TO_MS, MS_TO_US, etc.)
  • duration_since_system_now() — host clock access (currently takes vm, can return io::Result instead)

vm/src/stdlib/msvcrt.rs:

  • getch(), getwch(), getche(), getwche(), kbhit(), setmode_binary() — all pure host
  • Locking constants (LK_UNLCK, LK_LOCK, etc.)

vm/src/stdlib/_winapi.rs (2180 lines):

  • GetACP(), GetCurrentProcess(), GetLastError(), GetVersion() — pure host
  • 100+ Windows API constants

vm/src/stdlib/os.rs (2395 lines):

  • fs_metadata() — pure std::fs wrapper
  • libc flag constants (O_APPEND, O_CREAT, etc.)

Dependency Graph (After)

rustpython-host_env  (NEW — zero Python dep, independent of common)
├── Dependencies: libc, nix (unix), windows-sys (win), widestring (win), rustpython-wtf8
├── From common: os, crt_fd, fileutils, windows, macros
└── Extracted from vm/stdlib: posix, nt, signal, time, msvcrt, winapi, socket, mmap, ...

rustpython-common  (NO host_env dependency — pure algorithmic code only)
└── cformat, float_ops, hash, int, str, encodings, etc.

rustpython-vm
├── rustpython-common
├── rustpython-host_env (optional, feature = "host_env")
├── libc (retained for type definitions & constants used inline in #[pyattr])
└── Python bindings call host_env for actual OS operations

rustpython-stdlib
├── rustpython-vm, rustpython-common
├── rustpython-host_env (optional, feature = "host_env")
└── libc, nix, socket2, memmap2 (retained for now — future migration target)

common and host_env are fully independent — no dependency in either direction.

Phase 1: Create the crate and move modules from common

Create crates/host_env/, move host modules from common, and update common to re-export.

New files:

crates/host_env/Cargo.toml:

[package]
name = "rustpython-host_env"
description = "Host OS API abstractions for RustPython (zero Python dependency)"
version.workspace = true
edition.workspace = true

[dependencies]
rustpython-wtf8 = { workspace = true }
libc = { workspace = true }
num-traits = { workspace = true }
cfg-if = { workspace = true }

[target.'cfg(unix)'.dependencies]
nix = { workspace = true }

[target.'cfg(windows)'.dependencies]
widestring = { workspace = true }
windows-sys = { workspace = true, features = [
    "Win32_Foundation",
    "Win32_Globalization",
    "Win32_Networking_WinSock",
    "Win32_Storage_FileSystem",
    "Win32_System_Console",
    "Win32_System_Ioctl",
    "Win32_System_LibraryLoader",
    "Win32_System_SystemServices",
    "Win32_System_Time",
] }

crates/host_env/src/lib.rs:

#[macro_use]
mod macros;
pub use macros::*;

pub mod os;

#[cfg(any(unix, windows, target_os = "wasi"))]
pub mod crt_fd;

#[cfg(any(not(target_arch = "wasm32"), target_os = "wasi"))]
pub mod fileutils;

#[cfg(windows)]
pub mod windows;

// New modules — extracted from vm/stdlib (Phase 2)
#[cfg(unix)]
pub mod posix;
#[cfg(windows)]
pub mod nt;
pub mod signal;
pub mod time;
#[cfg(windows)]
pub mod msvcrt;
#[cfg(windows)]
pub mod winapi;

Modules moved from common: os.rs, crt_fd.rs, fileutils.rs, windows.rs, macros.rs

Modified files:

Cargo.toml (workspace root):

  • Add "crates/host_env" to [workspace.members]
  • Add rustpython-host_env = { path = "crates/host_env" } to [workspace.dependencies]

crates/common/Cargo.toml:

  • Remove nix, windows-sys, widestring from direct dependencies
  • Keep libc for type definitions (wchar_t in str.rs)
  • No host_env feature or dependency — common stays purely algorithmic

crates/common/src/lib.rs:

  • Remove pub mod os, pub mod crt_fd, pub mod fileutils, pub mod windows declarations
  • Remove #[macro_use] mod macros and suppress_iph! macro (moved to host_env)
  • Delete the source files: os.rs, crt_fd.rs, fileutils.rs, windows.rs, macros.rs

crates/vm/Cargo.toml:

[features]
host_env = ["rustpython-host_env"]

[dependencies]
rustpython-host_env = { workspace = true, optional = true }

crates/stdlib/Cargo.toml:

[features]
host_env = ["rustpython-vm/host_env", "rustpython-host_env"]

[dependencies]
rustpython-host_env = { workspace = true, optional = true }

Verification:

cargo check -p rustpython-host_env
cargo test
cargo check -p rustpython-vm --no-default-features --features compiler,gc   # sandbox build

Phase 2: Extract host functions from vm/stdlib modules

Extract pure host API functions and constants from vm's stdlib modules into new modules within host_env.

New modules in crates/host_env/src/:

posix.rs — extracted from vm/src/stdlib/posix.rs:

use std::os::fd::BorrowedFd;

pub fn set_inheritable(fd: BorrowedFd<'_>, inheritable: bool) -> nix::Result<()> {
    use nix::fcntl;
    let flags = fcntl::FdFlag::from_bits_truncate(fcntl::fcntl(fd, fcntl::FcntlArg::F_GETFD)?);
    let mut new_flags = flags;
    new_flags.set(fcntl::FdFlag::FD_CLOEXEC, !inheritable);
    if flags != new_flags {
        fcntl::fcntl(fd, fcntl::FcntlArg::F_SETFD(new_flags))?;
    }
    Ok(())
}

pub fn getgroups() -> nix::Result<Vec<nix::unistd::Gid>> { ... }
pub fn get_right_permission(mode: u32, file_owner: Uid, file_group: Gid) -> nix::Result<Permissions> { ... }

nt.rs — extracted from vm/src/stdlib/nt.rs:

pub fn win32_hchmod(handle: HANDLE, mode: u32) -> io::Result<()> { ... }
pub fn win32_lchmod(path: &OsStr, mode: u32) -> io::Result<()> { ... }

signal.rs — extracted from vm/src/stdlib/_signal.rs:

pub fn timeval_to_double(tv: &libc::timeval) -> f64 { ... }
pub fn double_to_timeval(val: f64) -> libc::timeval { ... }
pub fn itimerval_to_tuple(it: &libc::itimerval) -> (f64, f64) { ... }

time.rs — extracted from vm/src/stdlib/time.rs:

pub const SEC_TO_MS: i64 = 1000;
pub const MS_TO_US: i64 = 1000;
// ...

pub fn asctime_from_tm(tm: &libc::tm) -> String { ... }
pub fn duration_since_system_now() -> io::Result<Duration> { ... }

#[cfg(windows)]
pub fn get_tz_info() -> TIME_ZONE_INFORMATION { ... }

msvcrt.rs — extracted from vm/src/stdlib/msvcrt.rs:

pub fn getch() -> Vec<u8> { ... }
pub fn getwch() -> String { ... }
pub fn kbhit() -> i32 { ... }
pub fn setmode_binary(fd: crt_fd::Borrowed<'_>) { ... }

pub const LK_UNLCK: i32 = 0;
pub const LK_LOCK: i32 = 1;
// ...

winapi.rs — extracted from vm/src/stdlib/_winapi.rs:

pub fn get_acp() -> u32 { ... }
pub fn get_current_process() -> HANDLE { ... }
pub fn get_last_error() -> u32 { ... }
pub fn get_version() -> u32 { ... }
// + Windows API constants

Modified vm/stdlib files:

Each file is updated to call rustpython_host_env:: instead of inlining the host calls:

// BEFORE (vm/src/stdlib/posix.rs)
pub fn set_inheritable(fd: BorrowedFd<'_>, inheritable: bool) -> nix::Result<()> {
    use nix::fcntl;
    // ... 10 lines of nix API calls
}

// AFTER (vm/src/stdlib/posix.rs)
pub use rustpython_host_env::posix::set_inheritable;

Phase 3: vm/stdlib import migration

All common::os, common::crt_fd, common::fileutils, common::windows imports must be updated to rustpython_host_env::.

Import migration targets (vm) — ~20 files:

File Current New
ospath.rs rustpython_common::crt_fd rustpython_host_env::crt_fd
stdlib/os.rs common::crt_fd, common::os::* rustpython_host_env::
stdlib/nt.rs common::windows::*, common::crt_fd::* rustpython_host_env::
stdlib/_io.rs common::crt_fd::Offset, common::fileutils::fstat rustpython_host_env::
stdlib/_signal.rs common::crt_fd::*, common::fileutils::fstat rustpython_host_env::
stdlib/posix.rs common::os::*, common::crt_fd::Offset rustpython_host_env::
stdlib/_ctypes/function.rs rustpython_common::os::get_errno rustpython_host_env::os::
stdlib/_codecs.rs common::windows::ToWideString rustpython_host_env::windows::
stdlib/sys.rs, winreg.rs, winsound.rs common::windows::ToWideString rustpython_host_env::windows::
windows.rs rustpython_common::windows::ToWideString rustpython_host_env::windows::
exceptions.rs common::os::ErrorExt, common::os::winerror_to_errno rustpython_host_env::os::

Import migration targets (stdlib) — ~7 files:

File Current New
socket.rs common::os::ErrorExt, common::os::errno_io_error rustpython_host_env::os::
mmap.rs rustpython_common::crt_fd rustpython_host_env::crt_fd
faulthandler.rs rustpython_common::os::{get_errno, set_errno} rustpython_host_env::os::
posixshmem.rs common::os::errno_io_error rustpython_host_env::os::
termios.rs common::os::ErrorExt rustpython_host_env::os::
overlapped.rs crate::vm::common::os::winerror_to_errno rustpython_host_env::os::
openssl.rs rustpython_common::fileutils::fopen rustpython_host_env::fileutils::

External consumers:

File Current New
src/lib.rs rustpython_vm::common::os::exit_code rustpython_host_env::os::exit_code
examples/*.rs vm::common::os::exit_code Keep via re-export

Phase 4 (Future): Extract host functions from stdlib modules

Same pattern as Phase 2, but for crates/stdlib/src/ modules. These modules heavily use libc, nix, socket2, memmap2 directly. Extract the pure host layer into host_env.

Target modules and what goes into host_env:

stdlib module host_env module What to extract
socket.rs (3498 lines) host_env::socket Socket creation, bind, connect, address conversion, cmsg helpers, poll wrappers. Re-export socket2 types.
mmap.rs (1625 lines) host_env::mmap mmap/munmap wrappers, madvise, msync. Re-export memmap2 types.
select.rs (745 lines) host_env::select select/poll/epoll/kqueue wrappers via libc/nix.
posixsubprocess.rs (537 lines) host_env::subprocess fork_exec, pipe, dup2, close-on-exec logic.
multiprocessing.rs (1152 lines) host_env::multiprocessing Semaphore operations (sem_open/wait/post/unlink via libc).
fcntl.rs (220 lines) host_env::fcntl fcntl, ioctl, flock wrappers.
faulthandler.rs (1333 lines) host_env::faulthandler Signal handler registration, stack dump via libc write.
locale.rs (332 lines) host_env::locale strcoll, strxfrm, setlocale wrappers.
resource.rs (194 lines) host_env::resource getrusage, getrlimit, setrlimit wrappers.
grp.rs (103 lines) host_env::grp getgrent/setgrent/endgrent, Group lookup via nix.
syslog.rs (148 lines) host_env::syslog openlog, syslog, closelog, setlogmask wrappers.
posixshmem.rs (52 lines) host_env::shm shm_open, shm_unlink wrappers.
termios.rs (280 lines) host_env::termios Terminal attribute get/set via termios crate.

After this, nix, socket2, memmap2, rustix are removed from stdlib's direct dependencies. Only host_env provides them.

Phase 5: Lint enforcement

Three layers of enforcement, from strongest to lightest:

Layer 1: Crate boundary (compile-time, absolute)

The strongest guarantee. If a crate doesn't list rustpython-host_env in its [dependencies], it physically cannot call any host_env function. This is already enforced by Rust's module system.

Pure crates (no host_env dependency allowed):

  • rustpython-common
  • rustpython-compiler, rustpython-compiler-core, rustpython-compiler-source
  • rustpython-codegen
  • rustpython-literal
  • rustpython-sre_engine
  • rustpython-wtf8
  • rustpython-derive, rustpython-derive-impl

CI check:

# Verify pure crates don't depend on host_env
for crate in common compiler compiler-core compiler-source codegen literal sre_engine wtf8 derive derive-impl; do
  if rg 'rustpython-host_env' "crates/$crate/Cargo.toml"; then
    echo "ERROR: $crate should not depend on host_env"
    exit 1
  fi
done

Layer 2: clippy disallowed_methods (compile-time, configurable)

Block direct host API usage in vm/stdlib. Force all host access through host_env.

Workspace-level clippy.toml (project root):

disallowed-methods = [
    # Filesystem
    { path = "std::fs::read", reason = "use rustpython_host_env for host filesystem access" },
    { path = "std::fs::write", reason = "use rustpython_host_env" },
    { path = "std::fs::read_to_string", reason = "use rustpython_host_env" },
    { path = "std::fs::read_dir", reason = "use rustpython_host_env" },
    { path = "std::fs::create_dir", reason = "use rustpython_host_env" },
    { path = "std::fs::create_dir_all", reason = "use rustpython_host_env" },
    { path = "std::fs::remove_file", reason = "use rustpython_host_env" },
    { path = "std::fs::remove_dir", reason = "use rustpython_host_env" },
    { path = "std::fs::metadata", reason = "use rustpython_host_env" },
    { path = "std::fs::symlink_metadata", reason = "use rustpython_host_env" },
    { path = "std::fs::canonicalize", reason = "use rustpython_host_env" },
    { path = "std::fs::File::open", reason = "use rustpython_host_env" },
    { path = "std::fs::File::create", reason = "use rustpython_host_env" },
    { path = "std::fs::OpenOptions::open", reason = "use rustpython_host_env" },

    # Environment
    { path = "std::env::var", reason = "use rustpython_host_env" },
    { path = "std::env::var_os", reason = "use rustpython_host_env" },
    { path = "std::env::set_var", reason = "use rustpython_host_env" },
    { path = "std::env::remove_var", reason = "use rustpython_host_env" },
    { path = "std::env::vars", reason = "use rustpython_host_env" },
    { path = "std::env::vars_os", reason = "use rustpython_host_env" },
    { path = "std::env::current_dir", reason = "use rustpython_host_env" },
    { path = "std::env::set_current_dir", reason = "use rustpython_host_env" },
    { path = "std::env::temp_dir", reason = "use rustpython_host_env" },

    # Process
    { path = "std::process::Command::new", reason = "use rustpython_host_env" },
    { path = "std::process::exit", reason = "use rustpython_host_env" },
    { path = "std::process::abort", reason = "use rustpython_host_env" },
    { path = "std::process::id", reason = "use rustpython_host_env" },

    # Network
    { path = "std::net::TcpStream::connect", reason = "use rustpython_host_env" },
    { path = "std::net::TcpListener::bind", reason = "use rustpython_host_env" },
    { path = "std::net::UdpSocket::bind", reason = "use rustpython_host_env" },
]

crates/host_env/clippy.toml (overrides — host_env is allowed to use everything):

disallowed-methods = []

Clippy resolves clippy.toml by walking up from the crate directory, so host_env's local config takes precedence over the workspace root.

Workspace Cargo.toml:

[workspace.lints.clippy]
disallowed_methods = "deny"

Layer 3: Sandbox build verification (CI)

Build without host_env feature to catch any code that accidentally compiles without the feature gate:

cargo check -p rustpython-vm --no-default-features --features compiler,gc
cargo check -p rustpython-stdlib --no-default-features --features compiler

Layer 4: Whitelist-based module audit (CI script)

Maintain a whitelist of modules in vm/stdlib that are known to NOT use host_env. Any change that adds a rustpython_host_env import to a whitelisted module triggers CI failure.

# .ci/host_env_whitelist.txt — modules that must stay host-free
# vm modules:
crates/vm/src/stdlib/_abc.rs
crates/vm/src/stdlib/_collections.rs
crates/vm/src/stdlib/_functools.rs
crates/vm/src/stdlib/_operator.rs
crates/vm/src/stdlib/_sre.rs
crates/vm/src/stdlib/_stat.rs
crates/vm/src/stdlib/_string.rs
crates/vm/src/stdlib/errno.rs
crates/vm/src/stdlib/gc.rs
crates/vm/src/stdlib/itertools.rs
crates/vm/src/stdlib/marshal.rs

# Check:
while IFS= read -r file; do
  if rg 'rustpython_host_env' "$file" 2>/dev/null; then
    echo "ERROR: $file is whitelisted as host-free but imports host_env"
    exit 1
  fi
done < .ci/host_env_whitelist.txt

The inverse is also useful — list all files that ARE allowed to use host_env, and reject any new file that uses it without being on the list. This catches accidental host API usage in new modules.

Layer 5: #![no_std] for pure crates

After removing host modules from common, it could potentially become #![no_std] unconditionally (it already has #![cfg_attr(not(feature = "std"), no_std)]). This is the strongest possible guarantee — no std::fs, std::env, std::net, std::process available at all.

Candidate crates for unconditional #![no_std]:

  • rustpython-literal
  • rustpython-wtf8
  • rustpython-compiler-source

Summary of enforcement layers

Layer What it catches Strength Cost
Crate boundary Missing host_env dependency Absolute — compile error Zero — automatic
clippy disallowed_methods Direct std::fs/env/net usage Strong — clippy deny Low — clippy.toml config
Sandbox build Missing #[cfg(feature = "host_env")] Strong — compile error Low — CI job
Module whitelist Unintended host_env usage in pure modules Medium — CI script Low — maintain whitelist
#![no_std] Any std usage in pure crates Absolute — compile error Medium — may need refactoring

Risk Assessment

Risk Level Mitigation
Target modules have Python type dependencies Low Verified: only libc, nix, windows-sys, rustpython-wtf8
Internal cross-references break on move Low crt_fd, os, fileutils, windows all move together; crate:: paths stay valid
suppress_iph! macro $crate resolution Medium $crate automatically resolves to new crate; __macro_private moves alongside
Breaking external consumers Medium Clean break — consumers must update common::os to host_env::os. No re-export shim.
Scope of Phase 2 extraction Medium Start with clearly pure functions; mixed functions can be migrated incrementally