Commit ab9b32e

committed
update
1 parent a9610d2 commit ab9b32e

11 files changed

+421
-50
lines changed

_toc.yml

Lines changed: 6 additions & 1 deletion
@@ -22,7 +22,12 @@ parts:
2222

2323
- caption: Python Security
2424
chapters:
25-
- file: security
25+
- file: security/secintro
26+
- file: security/fundamentals
27+
sections:
28+
- file: security/executionmodel
29+
- file: security/threatmodel
30+
- file: security/attacklandscape
2631
- file: security/traversalattacks
2732

2833
- caption: Python Basics

images/bytecode.png

69.9 KB

images/commonvulnerabilities.png

87.7 KB

images/pythonexecution.png

54 KB

images/threatmodel.png

77 KB

security.md

Lines changed: 0 additions & 49 deletions
This file was deleted.

security/attacklandscape.md

Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
1+
# Python Attack Landscape
2+
3+
Python's popularity, ease of use, and widespread adoption make it an attractive target for malicious actors. It is commonly pre-installed or readily available on developer workstations, servers, and cloud environments, giving attackers an easily accessible tool for misuse.
4+
5+
Python itself does not implement **privilege separation** within the interpreter to limit the attack surface when executing code. Once an attacker gains the ability to run arbitrary Python code, they inherit the full privileges of the process running the interpreter — typically those of the user account executing the program. This often grants broad access to the filesystem, network, environment variables, and other system resources.
6+
7+
While true privilege separation cannot be reliably enforced *inside* Python (due to the large attack surface of the interpreter), it can be applied externally by running Python code within a sandboxed environment (e.g., using containers, seccomp filters, virtual machines, or tools like GraalVM isolates). However, no sandbox is 100% secure — escapes and bypasses remain possible, as demonstrated in various real-world vulnerabilities.
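
As a first layer of the external isolation described above, untrusted code can at least be moved out of the main process. The following is a minimal sketch, not a sandbox: `run_isolated`, `ALLOWED_ENV`, and `DEMO_SECRET` are hypothetical names chosen for illustration, and the child process can still reach the filesystem and network unless it is also wrapped in a container, seccomp filter, or VM.

```python
import os
import subprocess
import sys

# Assumption: only these variables are needed for the child interpreter to start.
ALLOWED_ENV = ("PATH", "LANG", "LD_LIBRARY_PATH", "SYSTEMROOT")


def run_isolated(code: str, timeout: float = 5.0) -> subprocess.CompletedProcess:
    """Run `code` in a separate interpreter with an allow-listed environment.

    This only limits what the child inherits from the parent. It does NOT
    restrict filesystem or network access; use OS-level sandboxing for that.
    """
    env = {name: os.environ[name] for name in ALLOWED_ENV if name in os.environ}
    return subprocess.run(
        [sys.executable, "-I", "-c", code],  # -I: isolated mode (ignores PYTHON* variables and user site)
        env=env,
        capture_output=True,
        text=True,
        timeout=timeout,   # raises subprocess.TimeoutExpired if the child hangs
        check=False,
    )


os.environ["DEMO_SECRET"] = "s3cr3t"  # hypothetical secret held by the parent process
result = run_isolated("import os; print('DEMO_SECRET' in os.environ)")
print(result.stdout.strip())
```

The allow-list is the design choice worth noting: the child receives only what it is explicitly given, rather than everything minus a deny-list.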
8+
9+
Vulnerabilities in Python programs are commonly exposed through:
10+
11+
- **Untrusted or malicious input**
12+
Attackers can supply malicious data via APIs, corrupted files, or insufficiently sanitised user input. Examples include malformed XML/JSON/YAML triggering parser bugs, poisoned data files, or insecure REST endpoints lacking proper validation and security controls.
13+
14+
- **Insecure third-party modules**
15+
Dependencies from PyPI or other sources may contain vulnerabilities, supply-chain compromises, or malicious code.
16+
17+
A particularly significant risk is Python's ability to **execute arbitrary code provided as data**. This capability underpins many injection and remote code execution (RCE) attacks, such as unsafe deserialisation (e.g., via the `pickle` module), `eval()`/`exec()` misuse, or dynamic code loading from untrusted sources.
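
To make "code provided as data" concrete, the sketch below shows that unpickling alone can trigger a function call, and contrasts it with `ast.literal_eval`, which accepts only literals. The `Payload` class is a hypothetical stand-in for attacker-controlled data, and it calls the harmless `print` where a real attack would call something like `os.system`.

```python
import ast
import pickle


class Payload:
    """Hypothetical attacker-controlled object."""

    def __reduce__(self):
        # On load, pickle calls the returned callable with these arguments.
        return (print, ("code ran during pickle.loads()",))


blob = pickle.dumps(Payload())
loaded = pickle.loads(blob)  # calls print(...): the data was executed as code

# Safer pattern for simple literals: ast.literal_eval never calls functions.
value = ast.literal_eval("[1, 2, {'a': 3}]")

rejected = False
try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError:
    rejected = True  # a function call is not a literal, so it is refused
```

For data from untrusted sources, a format that cannot express code, such as JSON, avoids this class of problem entirely.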
18+
19+
## Security Vulnerabilities Specific to Python Software
20+
21+
Beyond general environmental risks during execution, certain threat vectors are inherent to Python applications and often stem from gaps in the **Software Development Life Cycle (SDLC)**:
22+
23+
- **Lack of security auditing**
24+
Many Python codebases receive little or no internal security auditing, static/dynamic analysis, or architectural/design reviews tailored to the target environment. As a result, vulnerabilities can persist undetected from development through to production for years.
25+
26+
- **Insufficient security awareness**
27+
A significant number of Python developers lack formal training in secure coding practices. This leads to the introduction of common weaknesses. Modern AI-assisted coding tools can exacerbate the issue by generating code riddled with security flaws that readily become exploitable vulnerabilities.
28+
29+
- **Unreviewed code**
30+
Substantial portions of Python code — across development, staging, and production — undergo no security-focused peer review. Static Application Security Testing (SAST) is rarely performed comprehensively; when it is, coverage is often limited to only a minimal set of issues.
31+
32+
- **Over-privileged accounts and processes**
33+
Developers, testers, CI/CD pipelines, and automated processes frequently operate with excessively broad permissions. Upon compromise, attackers can exploit these elevated privileges for lateral movement, data exfiltration, persistence, or full system takeover.
34+
35+
The combination of Python's powerful dynamic execution features and longstanding systemic weaknesses in development practices — such as minimal code review, inadequate dependency management, and reliance on over-privileged accounts — creates a particularly fertile environment for compromise.
36+
37+
38+
![common Python vulnerabilities](../images/commonvulnerabilities.png)
39+
40+
:::{tip}
41+
Many of these weaknesses can be detected with the **open-source** Python SAST tool [**Python Code Audit**](https://nocomplexity.com/codeaudit/).
42+
:::

security/executionmodel.md

Lines changed: 57 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,57 @@
1+
# Python Execution Model
2+
3+
Specific Python security vulnerabilities are hard to understand without knowing how Python runs code. This section covers bytecode compilation and the Python Virtual Machine (PVM).
4+
5+
To understand which security threats are relevant to Python code, knowledge of the internal workings of the Python interpreter is required.
6+
7+
:::{note}
8+
Understanding how Python source files (`*.py`) are executed is crucial from a security point of view.
9+
:::
10+
11+
12+
The default and most widely used interpreter is CPython, distributed by the Python Software Foundation (PSF). The core CPython source code cannot be easily compromised; multiple defensive measures are taken to prevent weaknesses in the code and to protect against supply chain attacks. While trust is essential, verification is better: the open-source nature of Python allows anyone to inspect the code or validate the PSF’s processes if required.
13+
14+
Many security-focused Unix distributions and the BSD systems build CPython from source. Some distributions use reproducible builds to minimise the risk of supply chain attacks on the CPython code.
15+
16+
While Python is an interpreted language, it involves a compilation step where source code is converted into bytecode (`.pyc`). This bytecode is then executed by the Python Virtual Machine (PVM).
17+
18+
![Python Execution Model](../images/pythonexecution.png)
19+
20+
21+
Detailed Execution Steps:
22+
23+
1. **Source Code**: Python source code is stored in a file with a `.py` extension.
24+
25+
+++
26+
27+
2. **Compilation to Bytecode**: When the program is run, the Python compiler first translates the source code into an intermediate format called bytecode, checking for syntax errors along the way. Bytecode is a set of instructions tailored for the Python Virtual Machine; it is a platform-independent representation that is not specific to any operating system or hardware. When a module is imported, its bytecode is saved as a `.pyc` file in a `__pycache__` directory so that it loads faster in future runs. If a script is run directly, e.g. `python myscript.py`, Python compiles it to bytecode in memory only, and no bytecode is saved after execution. So `.pyc` files are only created when a file is imported and bytecode caching is enabled (which it is by default).
28+
29+
+++
30+
31+
3. **Execution by the Python Virtual Machine (PVM)**: The bytecode is then passed to the Python Virtual Machine (PVM), which acts as an interpreter. The PVM reads the bytecode instructions one at a time and carries out each one by running the corresponding C routine inside the interpreter, which is itself already compiled to machine code for the computer's CPU.
32+
33+
+++
34+
35+
4. **Execution and Output**: The computer's CPU executes the interpreter's machine code, which interacts with the hardware (memory, I/O devices, etc.) and produces the final results or output of the program.
36+
37+
38+
This flow ensures that Python is platform-independent at the bytecode level, as any system with a compatible PVM can run the same bytecode.
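
The caching behaviour in step 2 can be observed directly. The sketch below assumes a throwaway module named `hello.py` (a hypothetical file created in a temporary directory) and uses `py_compile` to do explicitly what an import does implicitly.

```python
import importlib.util
import pathlib
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    source = pathlib.Path(tmp) / "hello.py"  # hypothetical module
    source.write_text("print('hello')\n")

    # Where the import system would cache this module's bytecode.
    expected = importlib.util.cache_from_source(str(source))

    # Compile explicitly; importing the module would do the same implicitly.
    written = py_compile.compile(str(source), doraise=True)

    same_path = pathlib.Path(written) == pathlib.Path(expected)
    in_pycache = pathlib.Path(written).parent.name == "__pycache__"

    # A .pyc file starts with a magic number tied to the interpreter version.
    header = pathlib.Path(written).read_bytes()[:4]
    has_magic = header == importlib.util.MAGIC_NUMBER
```

From a security point of view, the relevant detail is that anyone who can write to a `__pycache__` directory may be able to influence what gets executed on the next import.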
39+
40+
41+
CPython is the reference implementation written in C. CPython functions as both a compiler and an interpreter. It first compiles Python source code into an intermediate format called bytecode, which is then executed by the Python Virtual Machine (PVM), a runtime environment written in C.
42+
43+
Python code goes through parsing, compilation, and execution. Bytecode is the intermediate representation.
44+
45+
CPython's virtual machine is stack-based. This means that it executes instructions using a **Last-In, First-Out (LIFO) stack data structure** to store and retrieve data, rather than using general-purpose registers.
46+
47+
Inside the CPython VM, each function call has an associated "frame," which contains a specific "evaluation stack" (or "value stack"). This is where most of the work happens.
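
Frames can be inspected from Python itself. The sketch below uses two hypothetical functions, `outer` and `inner`, to show that each call gets its own frame linked to its caller, and reads the evaluation-stack depth the compiler reserved for one of them. It relies on `inspect.currentframe()`, which is available in CPython but not guaranteed on other implementations.

```python
import inspect


def inner():
    frame = inspect.currentframe()  # frame object for this call to inner()
    caller = frame.f_back           # frame of the function that called us
    return frame.f_code.co_name, caller.f_code.co_name


def outer():
    return inner()


names = outer()

# co_stacksize is the evaluation-stack depth reserved for each inner() frame.
stack_slots = inner.__code__.co_stacksize
```

The same introspection that makes debugging easy is also why code running inside the interpreter can look at, and sometimes alter, the state of its callers.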
48+
49+
The Python Virtual Machine (PVM), which is part of the **CPython interpreter**, is written in the **C programming language** and uses the standard **C runtime library** to make calls to the operating system. Python code therefore does not talk to the operating system directly (which would not be platform-independent); instead, the PVM includes an implementation layer written in C that uses the underlying operating system's own Application Programming Interface (API) to perform tasks on its behalf.
50+
51+
So Python's execution model combines compilation with interpretation. Bytecode is the intermediate representation between the high-level Python source code and machine code, and it is what the PVM executes.
52+
Running bytecode is more efficient than interpreting the source code directly.
53+
54+
The example below shows how Python code is compiled to bytecode:
55+
![bytecode](../images/bytecode.png)
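
The same kind of listing can be produced with the standard `dis` module. This is a small sketch using a hypothetical one-line program; the exact opcode names printed depend on the Python version.

```python
import dis

source = "total = price * quantity"          # hypothetical one-line program
code = compile(source, "<example>", "exec")  # parse and compile; nothing runs yet

# Each instruction pushes values onto, or pops them from, the evaluation stack.
opnames = [ins.opname for ins in dis.get_instructions(code)]

dis.dis(code)  # human-readable listing; opcodes vary between Python versions
```

Note that `compile()` only produces a code object. The names `price` and `quantity` are never looked up unless the code object is later executed.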
56+
57+

security/fundamentals.md

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,6 @@
1+
# Python Security Fundamentals
2+
3+
This chapter provides a comprehensive exploration of the security mechanics inherent in Python development. We begin by deconstructing the Python Execution Model, establishing how the underlying runtime influences the safety of your code. Building on this technical foundation, we define a formal Python Threat Model to identify trust boundaries and protect vital assets.
4+
5+
The focus then shifts outward to the Python Attack Landscape, examining the external risks posed by the modern software supply chain and third-party dependencies. By understanding this environment, we categorise specific Python Security Threats.
6+

security/secintro.md

Lines changed: 63 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,63 @@
1+
# Introduction
2+
3+
Python is one of the most widely used programming languages worldwide, valued for its readable syntax and extensive ecosystem. Its accessibility makes it suitable for a broad audience, including occasional programmers, academic researchers and professional developers.
4+
5+
Python plays a central role in modern computing. It powers some of the world’s largest websites and web applications and is a key driver of advances in artificial intelligence and machine learning. Its comprehensive libraries and adaptability have established it as a standard tool across scientific research and data-intensive disciplines.
6+
7+
In addition to its academic and research applications, Python is deeply integrated into the operations of millions of companies and supports a vast number of software systems and applications globally. This combination of versatility, scalability, and community support underpins Python’s position as a foundational technology in contemporary computing.
8+
9+
In today’s digital world, cybersecurity remains a critical concern. This applies equally to using or creating Python software: preventing vulnerabilities starts with a solid architecture, but even well-written code—including AI-generated code—is not secure by default.
10+
11+
Validating Python code for potential vulnerabilities is therefore essential, whether you are writing your own programs or relying on code developed by others.
12+
13+
:::{hint}
14+
Security is a process that must be embedded within your development flow at every stage, from design through to production, maintenance, and operations. For those using or developing Python programs, this means that security should, at a minimum, be integrated into the initial design or architecture.
15+
:::
16+
17+
:::{admonition} TL;DR - If you are short on time
18+
:class: important
19+
The bare minimum to do as a programmer is:
20+
* Follow and use the [Python Secure Coding Guidelines](https://nocomplexity.com/documents/codeaudit/securecoding.html).
21+
22+
and:
23+
24+
* Practice [Security by design](https://nocomplexity.com/documents/securitybydesign/sdlc.html).
25+
:::
26+
27+
28+
:::{tip}
29+
If you are a professional Python programmer, make sure you become familiar with basic security threats and the measures used to prevent security risks.
30+
31+
Read and use: [The Open Security Reference Architecture](https://nocomplexity.com/documents/securityarchitecture/introduction.html)
32+
33+
:::
34+
35+
36+
## Python Security Model
37+
38+
Developing secure programs in Python requires understanding its strengths and limitations in handling security. Below are some critical insights and best practices to help you build secure Python applications.
39+
40+
### Key Recommendations
41+
42+
- **Follow a Security Checklist**: Start with a comprehensive security checklist, such as this [simple security checklist](https://nocomplexity.com/documents/securityarchitecture/prevention/simple-checklists.html), to identify and mitigate potential vulnerabilities.
43+
44+
- **Understand Privilege Separation**:
45+
Python, by design, does not implement internal privilege separation, which increases the attack surface. If an attacker can execute arbitrary Python code, they effectively gain every privilege of the process running the interpreter.
46+
47+
**Solution**: Implement privilege separation outside Python by running it within a sandboxed environment. This reduces risks by isolating the Python process from critical system resources.
48+
49+
### Bytecode Concerns
50+
51+
- **Lack of Bytecode Safety**: CPython does not verify the safety of bytecode. If an attacker can execute arbitrary bytecode, they could import and run sensitive code. However, this should be considered a secondary concern since code execution implies significant compromise.
52+
53+
- **Avoid Building Sandboxes in CPython**: Creating a secure sandbox inside CPython is impractical due to the extensive attack surface. Python's rich introspection capabilities (e.g., the `inspect` module) and features that execute code on demand make it challenging to enforce strict isolation.
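
A classic illustration of why in-process sandboxes fail is sketched below. It assumes a naive "sandbox" that strips all builtins before calling `eval`. The expression uses no names at all, only attribute access on a literal, yet it still reaches a large part of the interpreter's class hierarchy.

```python
# An attempted "sandbox": evaluate an expression with no builtins available.
restricted_globals = {"__builtins__": {}}

# No names are used, only attribute access starting from an empty tuple.
expr = "().__class__.__base__.__subclasses__()"
classes = eval(expr, restricted_globals)  # demo only; never eval untrusted input

# Every direct subclass of `object` is now reachable from inside the "sandbox".
class_names = {cls.__name__ for cls in classes}
```

From a list like this, an attacker typically searches for a class whose methods expose file or module access, which is why removing builtins does not provide isolation.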
54+
55+
### Examples of Potential Exploits
56+
57+
- A literal string like `'\N{Snowman}'` automatically imports the `unicodedata` module.
58+
- Code designed to log warnings can potentially be abused to execute malicious code.
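
The first example can be checked with a short experiment. The sketch below assumes CPython and runs a fresh child interpreter so that the result is not affected by modules already loaded in the current process; the variable names are illustrative only.

```python
import subprocess
import sys

# Child program: compile a string literal containing a \N{...} escape and
# report whether unicodedata was loaded before and after compilation.
child = (
    "import sys\n"
    "before = 'unicodedata' in sys.modules\n"
    "compile(r\"'\\N{SNOWMAN}'\", '<demo>', 'eval')\n"
    "print(before, 'unicodedata' in sys.modules)\n"
)

result = subprocess.run(
    [sys.executable, "-c", child],
    capture_output=True,
    text=True,
    check=True,
)
before, after = result.stdout.split()
print(f"loaded before compile: {before}, loaded after compile: {after}")
```

The point is that merely compiling a string literal, without executing anything, can cause a module import as a side effect.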
59+
60+
### Secure Development Approach
61+
62+
For robust security, do not rely on Python itself for sandboxing. Instead, encapsulate CPython in a dedicated, externally managed sandbox. This design ensures that the Python environment remains isolated, significantly reducing the risk of exploitation.
63+
