
rustpython-host_env for better sandbox support? #7575

@youknowone

Description

Summary

If we fully isolate the host_env support code into its own crate, would the sandbox feature become easier to maintain, with lints enforcing the boundary? I think so. Is it worth doing? I'm not sure.

Plan: Create rustpython-host-env crate

Context

RustPython controls host OS access via the host_env feature flag, enforced by #[cfg(feature = "host_env")] scattered across hundreds of locations. If a cfg is forgotten, host code leaks into sandbox builds silently.

By isolating host OS API wrappers into a dedicated crate, the crate boundary itself becomes the sandbox guarantee. Key constraint: this crate has zero Python runtime dependency. All Python-level bindings must be added by the consumer (vm/stdlib).
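To make the failure mode concrete, here is a minimal single-file sketch. Only the `host_env` feature name comes from the plan; the function names `getcwd` and `getpid_leaky` are hypothetical and not taken from RustPython.

```rust
// Correctly gated: this function only exists when the `host_env` feature is on,
// so a sandbox build cannot reach the host's working directory through it.
#[cfg(feature = "host_env")]
pub fn getcwd() -> std::io::Result<std::path::PathBuf> {
    std::env::current_dir()
}

// Gate forgotten: nothing stops this from compiling into a sandbox build.
// The compiler gives no warning, which is the silent leak described above.
pub fn getpid_leaky() -> u32 {
    std::process::id()
}
```

If `getpid_leaky` lived in a separate crate that sandbox builds never link, the missing attribute would not matter: the call site itself would fail to compile.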

Current State

Already Python-free host abstractions in crates/common/src/:

  • os.rs — errno handling, exit_code, winerror_to_errno, OsStr ffi conversions
  • crt_fd.rs — CRT file descriptor abstraction (Owned/Borrowed types, open/read/write/close)
  • fileutils.rs — fstat, fopen, Windows StatStruct
  • windows.rs — ToWideString, FromWideString traits
  • macros.rs — suppress_iph! macro (MSVC invalid parameter handler suppression)

Pure host functions embedded in vm/stdlib modules:

These files mix Python bindings with pure host API calls. The host parts should be extracted:

vm/src/stdlib/posix.rs (2908 lines):

  • set_inheritable(fd, inheritable) — pure nix fcntl wrapper
  • getgroups_impl() — pure libc/nix wrapper
  • get_right_permission(), get_permissions() — pure permission logic
  • 400+ libc constant re-exports (#[pyattr] use libc::*)

vm/src/stdlib/nt.rs (2301 lines):

  • win32_hchmod(), win32_lchmod(), fchmod_impl() — pure Windows API calls (currently return PyResult, should return io::Result)
  • Spawn mode constants, O_* flags
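The PyResult to io::Result change noted above can be sketched as two layers. This is an illustration under assumptions, not the actual nt.rs code: `set_readonly_host` is a portable std-only stand-in for a function like win32_lchmod(), and `PyOsError` is a hypothetical placeholder for whatever exception type the VM raises.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical stand-in for a Python-level OSError; not a RustPython type.
#[derive(Debug)]
pub struct PyOsError {
    pub errno: Option<i32>,
    pub message: String,
}

/// Host layer: knows nothing about Python and reports failures as io::Error.
pub fn set_readonly_host(path: &Path, readonly: bool) -> io::Result<()> {
    let mut perms = fs::metadata(path)?.permissions();
    perms.set_readonly(readonly);
    fs::set_permissions(path, perms)
}

/// Binding layer (vm/stdlib side): converts the io::Error into the
/// Python-facing error at the boundary.
pub fn chmod_binding(path: &str, readonly: bool) -> Result<(), PyOsError> {
    set_readonly_host(Path::new(path), readonly).map_err(|e| PyOsError {
        errno: e.raw_os_error(),
        message: e.to_string(),
    })
}
```

Keeping the conversion in the binding means the host function never needs a VM handle, which is what lets it move to a crate with no Python dependency.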

vm/src/stdlib/_signal.rs (729 lines):

  • timeval_to_double(), double_to_timeval(), itimerval_to_tuple() — pure math
  • 30+ signal/timer constants

vm/src/stdlib/time.rs (1616 lines):

  • asctime_from_tm() — pure string formatting
  • get_tz_info() — pure Windows API
  • Time unit constants (SEC_TO_MS, MS_TO_US, etc.)
  • duration_since_system_now() — host clock access (currently takes vm, can return io::Result instead)
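As a sketch of the last item, a vm-free duration_since_system_now() could look like the following. It assumes the function measures elapsed time since the Unix epoch; the current RustPython implementation may differ in detail.

```rust
use std::io;
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Host clock access with no VM parameter. A clock set before the epoch is
/// reported as an io::Error for the caller to translate.
pub fn duration_since_system_now() -> io::Result<Duration> {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map_err(|e| io::Error::new(io::ErrorKind::Other, e))
}
```

The Python binding would then map the io::Error to whichever exception it raises today.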

vm/src/stdlib/msvcrt.rs:

  • getch(), getwch(), getche(), getwche(), kbhit(), setmode_binary() — all pure host
  • Locking constants (LK_UNLCK, LK_LOCK, etc.)

vm/src/stdlib/_winapi.rs (2180 lines):

  • GetACP(), GetCurrentProcess(), GetLastError(), GetVersion() — pure host
  • 100+ Windows API constants

vm/src/stdlib/os.rs (2395 lines):

  • fs_metadata() — pure std::fs wrapper
  • libc flag constants (O_APPEND, O_CREAT, etc.)

Dependency Graph (After)

rustpython-host-env  (NEW — zero Python dep, independent of common)
├── Dependencies: libc, nix (unix), windows-sys (win), widestring (win), rustpython-wtf8
├── From common: os, crt_fd, fileutils, windows, macros
└── Extracted from vm/stdlib: posix, nt, signal, time, msvcrt, winapi, socket, mmap, ...

rustpython-common  (NO host_env dependency — pure algorithmic code only)
└── cformat, float_ops, hash, int, str, encodings, etc.

rustpython-vm
├── rustpython-common
├── rustpython-host-env (optional, feature = "host_env")
├── libc (retained for type definitions & constants used inline in #[pyattr])
└── Python bindings call host_env for actual OS operations

rustpython-stdlib
├── rustpython-vm, rustpython-common
├── rustpython-host-env (optional, feature = "host_env")
└── libc, nix, socket2, memmap2 (retained for now — future migration target)

common and host_env are fully independent — no dependency in either direction.

Phase 1: Create the crate and move modules from common

Create crates/host_env/, move the host modules there from common, and delete them from common. Common does not re-export them.

New files:

crates/host_env/Cargo.toml:

[package]
name = "rustpython-host-env"
description = "Host OS API abstractions for RustPython (zero Python dependency)"
version.workspace = true
edition.workspace = true

[dependencies]
rustpython-wtf8 = { workspace = true }
libc = { workspace = true }
num-traits = { workspace = true }
cfg-if = { workspace = true }

[target.'cfg(unix)'.dependencies]
nix = { workspace = true }

[target.'cfg(windows)'.dependencies]
widestring = { workspace = true }
windows-sys = { workspace = true, features = [
    "Win32_Foundation",
    "Win32_Globalization",
    "Win32_Networking_WinSock",
    "Win32_Storage_FileSystem",
    "Win32_System_Console",
    "Win32_System_Ioctl",
    "Win32_System_LibraryLoader",
    "Win32_System_SystemServices",
    "Win32_System_Time",
] }

crates/host_env/src/lib.rs:

#[macro_use]
mod macros;
pub use macros::*;

pub mod os;

#[cfg(any(unix, windows, target_os = "wasi"))]
pub mod crt_fd;

#[cfg(any(not(target_arch = "wasm32"), target_os = "wasi"))]
pub mod fileutils;

#[cfg(windows)]
pub mod windows;

// New modules — extracted from vm/stdlib (Phase 2)
#[cfg(unix)]
pub mod posix;
#[cfg(windows)]
pub mod nt;
pub mod signal;
pub mod time;
#[cfg(windows)]
pub mod msvcrt;
#[cfg(windows)]
pub mod winapi;

Modules moved from common: os.rs, crt_fd.rs, fileutils.rs, windows.rs, macros.rs

Modified files:

Cargo.toml (workspace root):

  • Add "crates/host_env" to [workspace.members]
  • Add rustpython-host-env = { path = "crates/host_env" } to [workspace.dependencies]

crates/common/Cargo.toml:

  • Remove nix, windows-sys, widestring from direct dependencies
  • Keep libc for type definitions (wchar_t in str.rs)
  • No host_env feature or dependency — common stays purely algorithmic

crates/common/src/lib.rs:

  • Remove pub mod os, pub mod crt_fd, pub mod fileutils, pub mod windows declarations
  • Remove #[macro_use] mod macros and suppress_iph! macro (moved to host_env)
  • Delete the source files: os.rs, crt_fd.rs, fileutils.rs, windows.rs, macros.rs

crates/vm/Cargo.toml:

[features]
host_env = ["rustpython-host-env"]

[dependencies]
rustpython-host-env = { workspace = true, optional = true }

crates/stdlib/Cargo.toml:

[features]
host_env = ["rustpython-vm/host_env", "rustpython-host-env"]

[dependencies]
rustpython-host-env = { workspace = true, optional = true }

Verification:

cargo check -p rustpython-host-env
cargo test
cargo check -p rustpython-vm --no-default-features --features compiler,gc   # sandbox build

Phase 2: Extract host functions from vm/stdlib modules

Extract pure host API functions and constants from vm's stdlib modules into new modules within host_env.

New modules in crates/host_env/src/:

posix.rs — extracted from vm/src/stdlib/posix.rs:

use std::os::fd::BorrowedFd;

pub fn set_inheritable(fd: BorrowedFd<'_>, inheritable: bool) -> nix::Result<()> {
    use nix::fcntl;
    let flags = fcntl::FdFlag::from_bits_truncate(fcntl::fcntl(fd, fcntl::FcntlArg::F_GETFD)?);
    let mut new_flags = flags;
    new_flags.set(fcntl::FdFlag::FD_CLOEXEC, !inheritable);
    if flags != new_flags {
        fcntl::fcntl(fd, fcntl::FcntlArg::F_SETFD(new_flags))?;
    }
    Ok(())
}

pub fn getgroups() -> nix::Result<Vec<nix::unistd::Gid>> { ... }
pub fn get_right_permission(mode: u32, file_owner: Uid, file_group: Gid) -> nix::Result<Permissions> { ... }

nt.rs — extracted from vm/src/stdlib/nt.rs:

pub fn win32_hchmod(handle: HANDLE, mode: u32) -> io::Result<()> { ... }
pub fn win32_lchmod(path: &OsStr, mode: u32) -> io::Result<()> { ... }

signal.rs — extracted from vm/src/stdlib/_signal.rs:

pub fn timeval_to_double(tv: &libc::timeval) -> f64 { ... }
pub fn double_to_timeval(val: f64) -> libc::timeval { ... }
pub fn itimerval_to_tuple(it: &libc::itimerval) -> (f64, f64) { ... }
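The bodies elided above are plain arithmetic. A possible shape is sketched below, using local `TimeVal` and `ITimerVal` structs as stand-ins for libc::timeval and libc::itimerval so the example needs no external crate. The (value, interval) ordering of the tuple is an assumption modelled on CPython's getitimer.

```rust
/// Stand-in for libc::timeval (field types vary by platform in libc).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct TimeVal {
    pub tv_sec: i64,
    pub tv_usec: i64,
}

/// Stand-in for libc::itimerval.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct ITimerVal {
    pub it_interval: TimeVal,
    pub it_value: TimeVal,
}

/// Seconds plus microseconds as a single floating-point number of seconds.
pub fn timeval_to_double(tv: &TimeVal) -> f64 {
    tv.tv_sec as f64 + tv.tv_usec as f64 / 1_000_000.0
}

/// Split a number of seconds into whole seconds and microseconds.
/// Uses floor, so the microsecond part is never negative.
pub fn double_to_timeval(val: f64) -> TimeVal {
    let secs = val.floor();
    let usec = (val - secs) * 1_000_000.0;
    TimeVal { tv_sec: secs as i64, tv_usec: usec as i64 }
}

/// Assumed order: (time until next expiry, repeat interval).
pub fn itimerval_to_tuple(it: &ITimerVal) -> (f64, f64) {
    (timeval_to_double(&it.it_value), timeval_to_double(&it.it_interval))
}
```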

time.rs — extracted from vm/src/stdlib/time.rs:

pub const SEC_TO_MS: i64 = 1000;
pub const MS_TO_US: i64 = 1000;
// ...

pub fn asctime_from_tm(tm: &libc::tm) -> String { ... }
pub fn duration_since_system_now() -> io::Result<Duration> { ... }

#[cfg(windows)]
pub fn get_tz_info() -> TIME_ZONE_INFORMATION { ... }
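For asctime_from_tm(), a sketch of the string formatting is shown below. The `Tm` struct is a local stand-in for libc::tm with only the fields the formatter reads, and the layout follows the classic C asctime format; whether RustPython's version matches it exactly is an assumption.

```rust
/// Stand-in for the subset of libc::tm used here.
pub struct Tm {
    pub tm_sec: i32,
    pub tm_min: i32,
    pub tm_hour: i32,
    pub tm_mday: i32,
    pub tm_mon: i32,  // 0 = January
    pub tm_year: i32, // years since 1900
    pub tm_wday: i32, // 0 = Sunday
}

const WDAY_NAMES: [&str; 7] = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"];
const MON_NAMES: [&str; 12] = [
    "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
];

/// Formats like C asctime without the trailing newline.
/// Assumes tm_wday and tm_mon are already in range; out-of-range values panic.
pub fn asctime_from_tm(tm: &Tm) -> String {
    format!(
        "{} {}{:>3} {:02}:{:02}:{:02} {}",
        WDAY_NAMES[tm.tm_wday as usize],
        MON_NAMES[tm.tm_mon as usize],
        tm.tm_mday,
        tm.tm_hour,
        tm.tm_min,
        tm.tm_sec,
        1900 + tm.tm_year
    )
}
```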

msvcrt.rs — extracted from vm/src/stdlib/msvcrt.rs:

pub fn getch() -> Vec<u8> { ... }
pub fn getwch() -> String { ... }
pub fn kbhit() -> i32 { ... }
pub fn setmode_binary(fd: crt_fd::Borrowed<'_>) { ... }

pub const LK_UNLCK: i32 = 0;
pub const LK_LOCK: i32 = 1;
// ...

winapi.rs — extracted from vm/src/stdlib/_winapi.rs:

pub fn get_acp() -> u32 { ... }
pub fn get_current_process() -> HANDLE { ... }
pub fn get_last_error() -> u32 { ... }
pub fn get_version() -> u32 { ... }
// + Windows API constants

Modified vm/stdlib files:

Each file is updated to call rustpython_host_env:: instead of inlining the host calls:

// BEFORE (vm/src/stdlib/posix.rs)
pub fn set_inheritable(fd: BorrowedFd<'_>, inheritable: bool) -> nix::Result<()> {
    use nix::fcntl;
    // ... 10 lines of nix API calls
}

// AFTER (vm/src/stdlib/posix.rs)
pub use rustpython_host_env::posix::set_inheritable;

Phase 3: vm/stdlib import migration

All common::os, common::crt_fd, common::fileutils, common::windows imports must be updated to rustpython_host_env::.

Import migration targets (vm) — ~20 files:

| File | Current | New |
|---|---|---|
| ospath.rs | rustpython_common::crt_fd | rustpython_host_env::crt_fd |
| stdlib/os.rs | common::crt_fd, common::os::* | rustpython_host_env:: |
| stdlib/nt.rs | common::windows::*, common::crt_fd::* | rustpython_host_env:: |
| stdlib/_io.rs | common::crt_fd::Offset, common::fileutils::fstat | rustpython_host_env:: |
| stdlib/_signal.rs | common::crt_fd::*, common::fileutils::fstat | rustpython_host_env:: |
| stdlib/posix.rs | common::os::*, common::crt_fd::Offset | rustpython_host_env:: |
| stdlib/_ctypes/function.rs | rustpython_common::os::get_errno | rustpython_host_env::os:: |
| stdlib/_codecs.rs | common::windows::ToWideString | rustpython_host_env::windows:: |
| stdlib/sys.rs, winreg.rs, winsound.rs | common::windows::ToWideString | rustpython_host_env::windows:: |
| windows.rs | rustpython_common::windows::ToWideString | rustpython_host_env::windows:: |
| exceptions.rs | common::os::ErrorExt, common::os::winerror_to_errno | rustpython_host_env::os:: |

Import migration targets (stdlib) — ~7 files:

| File | Current | New |
|---|---|---|
| socket.rs | common::os::ErrorExt, common::os::errno_io_error | rustpython_host_env::os:: |
| mmap.rs | rustpython_common::crt_fd | rustpython_host_env::crt_fd |
| faulthandler.rs | rustpython_common::os::{get_errno, set_errno} | rustpython_host_env::os:: |
| posixshmem.rs | common::os::errno_io_error | rustpython_host_env::os:: |
| termios.rs | common::os::ErrorExt | rustpython_host_env::os:: |
| overlapped.rs | crate::vm::common::os::winerror_to_errno | rustpython_host_env::os:: |
| openssl.rs | rustpython_common::fileutils::fopen | rustpython_host_env::fileutils:: |

External consumers:

| File | Current | New |
|---|---|---|
| src/lib.rs | rustpython_vm::common::os::exit_code | rustpython_host_env::os::exit_code |
| examples/*.rs | vm::common::os::exit_code | Keep via re-export |

Phase 4 (Future): Extract host functions from stdlib modules

Same pattern as Phase 2, but for crates/stdlib/src/ modules. These modules heavily use libc, nix, socket2, memmap2 directly. Extract the pure host layer into host_env.

Target modules and what goes into host_env:

| stdlib module | host_env module | What to extract |
|---|---|---|
| socket.rs (3498 lines) | host_env::socket | Socket creation, bind, connect, address conversion, cmsg helpers, poll wrappers. Re-export socket2 types. |
| mmap.rs (1625 lines) | host_env::mmap | mmap/munmap wrappers, madvise, msync. Re-export memmap2 types. |
| select.rs (745 lines) | host_env::select | select/poll/epoll/kqueue wrappers via libc/nix. |
| posixsubprocess.rs (537 lines) | host_env::subprocess | fork_exec, pipe, dup2, close-on-exec logic. |
| multiprocessing.rs (1152 lines) | host_env::multiprocessing | Semaphore operations (sem_open/wait/post/unlink via libc). |
| fcntl.rs (220 lines) | host_env::fcntl | fcntl, ioctl, flock wrappers. |
| faulthandler.rs (1333 lines) | host_env::faulthandler | Signal handler registration, stack dump via libc write. |
| locale.rs (332 lines) | host_env::locale | strcoll, strxfrm, setlocale wrappers. |
| resource.rs (194 lines) | host_env::resource | getrusage, getrlimit, setrlimit wrappers. |
| grp.rs (103 lines) | host_env::grp | getgrent/setgrent/endgrent, Group lookup via nix. |
| syslog.rs (148 lines) | host_env::syslog | openlog, syslog, closelog, setlogmask wrappers. |
| posixshmem.rs (52 lines) | host_env::shm | shm_open, shm_unlink wrappers. |
| termios.rs (280 lines) | host_env::termios | Terminal attribute get/set via termios crate. |

After this, nix, socket2, memmap2, rustix are removed from stdlib's direct dependencies. Only host_env provides them.

Phase 5: Lint enforcement

Five layers of enforcement, from strongest to lightest:

Layer 1: Crate boundary (compile-time, absolute)

The strongest guarantee. If a crate doesn't list rustpython-host-env in its [dependencies], it cannot name any host_env item, so a stray call fails to compile. Cargo's dependency resolution already enforces this; no extra tooling is needed.

Pure crates (no host_env dependency allowed):

  • rustpython-common
  • rustpython-compiler, rustpython-compiler-core, rustpython-compiler-source
  • rustpython-codegen
  • rustpython-literal
  • rustpython-sre_engine
  • rustpython-wtf8
  • rustpython-derive, rustpython-derive-impl

CI check:

# Verify pure crates don't depend on host_env
for crate in common compiler compiler-core compiler-source codegen literal sre_engine wtf8 derive derive-impl; do
  if rg 'rustpython-host-env' "crates/$crate/Cargo.toml"; then
    echo "ERROR: $crate should not depend on host_env"
    exit 1
  fi
done
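The same check could also be written in Rust, for example inside a test or a helper binary. The sketch below works on manifest text passed in as strings; the function names are hypothetical, and unlike the rg loop it ignores comment lines.

```rust
/// Returns true if the manifest text mentions rustpython-host-env
/// outside of a comment line.
pub fn depends_on_host_env(manifest: &str) -> bool {
    manifest
        .lines()
        .map(str::trim)
        .filter(|line| !line.starts_with('#'))
        .any(|line| line.contains("rustpython-host-env"))
}

/// Given (crate name, Cargo.toml contents) pairs for the pure crates,
/// returns the names of those that violate the rule.
pub fn offending_crates<'a>(manifests: &[(&'a str, &str)]) -> Vec<&'a str> {
    let mut offenders = Vec::new();
    for &(name, text) in manifests {
        if depends_on_host_env(text) {
            offenders.push(name);
        }
    }
    offenders
}
```

A real version would read each crates/<name>/Cargo.toml from disk and exit non-zero when the returned list is non-empty.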

Layer 2: clippy disallowed_methods (compile-time, configurable)

Block direct host API usage in vm/stdlib. Force all host access through host_env.

Workspace-level clippy.toml (project root):

disallowed-methods = [
    # Filesystem
    { path = "std::fs::read", reason = "use rustpython_host_env for host filesystem access" },
    { path = "std::fs::write", reason = "use rustpython_host_env" },
    { path = "std::fs::read_to_string", reason = "use rustpython_host_env" },
    { path = "std::fs::read_dir", reason = "use rustpython_host_env" },
    { path = "std::fs::create_dir", reason = "use rustpython_host_env" },
    { path = "std::fs::create_dir_all", reason = "use rustpython_host_env" },
    { path = "std::fs::remove_file", reason = "use rustpython_host_env" },
    { path = "std::fs::remove_dir", reason = "use rustpython_host_env" },
    { path = "std::fs::metadata", reason = "use rustpython_host_env" },
    { path = "std::fs::symlink_metadata", reason = "use rustpython_host_env" },
    { path = "std::fs::canonicalize", reason = "use rustpython_host_env" },
    { path = "std::fs::File::open", reason = "use rustpython_host_env" },
    { path = "std::fs::File::create", reason = "use rustpython_host_env" },
    { path = "std::fs::OpenOptions::open", reason = "use rustpython_host_env" },

    # Environment
    { path = "std::env::var", reason = "use rustpython_host_env" },
    { path = "std::env::var_os", reason = "use rustpython_host_env" },
    { path = "std::env::set_var", reason = "use rustpython_host_env" },
    { path = "std::env::remove_var", reason = "use rustpython_host_env" },
    { path = "std::env::vars", reason = "use rustpython_host_env" },
    { path = "std::env::vars_os", reason = "use rustpython_host_env" },
    { path = "std::env::current_dir", reason = "use rustpython_host_env" },
    { path = "std::env::set_current_dir", reason = "use rustpython_host_env" },
    { path = "std::env::temp_dir", reason = "use rustpython_host_env" },

    # Process
    { path = "std::process::Command::new", reason = "use rustpython_host_env" },
    { path = "std::process::exit", reason = "use rustpython_host_env" },
    { path = "std::process::abort", reason = "use rustpython_host_env" },
    { path = "std::process::id", reason = "use rustpython_host_env" },

    # Network
    { path = "std::net::TcpStream::connect", reason = "use rustpython_host_env" },
    { path = "std::net::TcpListener::bind", reason = "use rustpython_host_env" },
    { path = "std::net::UdpSocket::bind", reason = "use rustpython_host_env" },
]

crates/host_env/clippy.toml (overrides — host_env is allowed to use everything):

disallowed-methods = []

Clippy resolves clippy.toml by walking up from the crate directory, so host_env's local config takes precedence over the workspace root.

Workspace Cargo.toml:

[workspace.lints.clippy]
disallowed_methods = "deny"
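What Layer 2 means for vm/stdlib code can be sketched in one file. The `host_env` module below is a stand-in for the separate rustpython_host_env crate, and `host_env::fs::read_to_string` and `load_source` are hypothetical names, not an existing API. The `#[allow]` attribute plays the role that the crate-local clippy.toml plays in the plan.

```rust
use std::io;
use std::path::Path;

/// Stand-in for the rustpython_host_env crate, where direct std::fs use is allowed.
#[allow(clippy::disallowed_methods)]
mod host_env {
    pub mod fs {
        use std::io;
        use std::path::Path;

        /// Thin wrapper: the only place that touches std::fs directly.
        pub fn read_to_string(path: &Path) -> io::Result<String> {
            std::fs::read_to_string(path)
        }
    }
}

// vm/stdlib side. Under the proposed config, clippy would reject this form:
//
//     fn load_source(path: &Path) -> io::Result<String> {
//         std::fs::read_to_string(path) // disallowed: use rustpython_host_env
//     }
//
// The accepted form routes the call through the host layer instead.
fn load_source(path: &Path) -> io::Result<String> {
    host_env::fs::read_to_string(path)
}
```

Because the wrapper lives behind the optional dependency, a sandbox build has no `host_env::fs` to call, and the lint keeps contributors from bypassing it in host-enabled builds.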

Layer 3: Sandbox build verification (CI)

Build without the host_env feature to catch host code that is missing its feature gate:

cargo check -p rustpython-vm --no-default-features --features compiler,gc
cargo check -p rustpython-stdlib --no-default-features --features compiler

Layer 4: Whitelist-based module audit (CI script)

Maintain a whitelist of modules in vm/stdlib that are known to NOT use host_env. Any change that adds a rustpython_host_env import to a whitelisted module triggers CI failure.

# .ci/host_env_whitelist.txt — modules that must stay host-free
# vm modules:
crates/vm/src/stdlib/_abc.rs
crates/vm/src/stdlib/_collections.rs
crates/vm/src/stdlib/_functools.rs
crates/vm/src/stdlib/_operator.rs
crates/vm/src/stdlib/_sre.rs
crates/vm/src/stdlib/_stat.rs
crates/vm/src/stdlib/_string.rs
crates/vm/src/stdlib/errno.rs
crates/vm/src/stdlib/gc.rs
crates/vm/src/stdlib/itertools.rs
crates/vm/src/stdlib/marshal.rs

# Check:
while IFS= read -r file; do
  if rg 'rustpython_host_env' "$file" 2>/dev/null; then
    echo "ERROR: $file is whitelisted as host-free but imports host_env"
    exit 1
  fi
done < .ci/host_env_whitelist.txt

The inverse is also useful — list all files that ARE allowed to use host_env, and reject any new file that uses it without being on the list. This catches accidental host API usage in new modules.
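A minimal sketch of that inverse check follows. It takes (path, source text) pairs and an allowlist of paths, and returns the files that use host_env without being listed. The function name and the in-memory input shape are assumptions made so the example stays self-contained.

```rust
use std::collections::BTreeSet;

/// Returns the paths of files that mention rustpython_host_env
/// but are not on the allowlist.
pub fn unlisted_host_env_users<'a>(
    files: &[(&'a str, &str)],
    allowlist: &BTreeSet<&str>,
) -> Vec<&'a str> {
    let mut unlisted = Vec::new();
    for &(path, source) in files {
        if source.contains("rustpython_host_env") && !allowlist.contains(path) {
            unlisted.push(path);
        }
    }
    unlisted
}
```

A CI job would feed it the contents of every file under crates/vm/src/stdlib and fail when the result is non-empty, so a new module has to be added to the list deliberately.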

Layer 5: #![no_std] for pure crates

After removing host modules from common, it could potentially become #![no_std] unconditionally (it already has #![cfg_attr(not(feature = "std"), no_std)]). This is the strongest possible guarantee — no std::fs, std::env, std::net, std::process available at all.

Candidate crates for unconditional #![no_std]:

  • rustpython-literal
  • rustpython-wtf8
  • rustpython-compiler-source

Summary of enforcement layers

| Layer | What it catches | Strength | Cost |
|---|---|---|---|
| Crate boundary | Missing host_env dependency | Absolute — compile error | Zero — automatic |
| clippy disallowed_methods | Direct std::fs/env/net usage | Strong — clippy deny | Low — clippy.toml config |
| Sandbox build | Missing #[cfg(feature = "host_env")] | Strong — compile error | Low — CI job |
| Module whitelist | Unintended host_env usage in pure modules | Medium — CI script | Low — maintain whitelist |
| #![no_std] | Any std usage in pure crates | Absolute — compile error | Medium — may need refactoring |

Risk Assessment

| Risk | Level | Mitigation |
|---|---|---|
| Target modules have Python type dependencies | Low | Verified: only libc, nix, windows-sys, rustpython-wtf8 |
| Internal cross-references break on move | Low | crt_fd, os, fileutils, windows all move together; crate:: paths stay valid |
| suppress_iph! macro $crate resolution | Medium | $crate automatically resolves to new crate; __macro_private moves alongside |
| Breaking external consumers | Medium | Clean break — consumers must update common::os to host_env::os. No re-export shim. |
| Scope of Phase 2 extraction | Medium | Start with clearly pure functions; mixed functions can be migrated incrementally |
