Commit 648cd34

Initial commit
0 parents  commit 648cd34

File tree

6 files changed

+482
-0
lines changed

.gitignore

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
1+
/pyext-myrustlib/target/
2+
/pyext-myrustlib/**/*.rs.bk
3+
/pyext-myrustlib/Cargo.lock
4+
.cache
5+
.benchmarks
6+
__pycache__
7+
myrustlib.so
8+
/pyext-myrustlib/.gitignore

README.md

Lines changed: 377 additions & 0 deletions
@@ -0,0 +1,377 @@
1+
# Speed up your Python using Rust
2+
3+
![Rust](https://www.rust-lang.org/logos/rust-logo-blk.svg)
4+
5+
## What is Rust?
6+
7+
**Rust** is a systems programming language that runs blazingly fast, prevents segfaults, and guarantees thread safety.
8+
9+
Featuring
10+
11+
* zero-cost abstractions
12+
* move semantics
13+
* guaranteed memory safety
14+
* threads without data races
15+
* trait-based generics
16+
* pattern matching
17+
* type inference
18+
* minimal runtime
19+
* efficient C bindings
20+
21+
> Taken from: rust-lang.org
22+
23+
## Why does it matter for a Python developer?
24+
25+
The best description of Rust I have heard comes from [**Elias**](https://github.com/dlight), a member and the **Rust Guru** of the [**Rust Brazil Telegram Group**](https://t.me/rustlangbr):
26+
27+
> **Rust is** a language that allows you to build high level abstractions, but without giving up low level control - that is, control of how data is represented in memory, control of which threading model you want to use etc.
28+
> **Rust is** a language that can usually detect, during compilation, the worst parallelism and memory management errors (such as accessing data on different threads without synchronization, or using data after they have been deallocated), but gives you an escape hatch in case you really know what you're doing.
29+
> **Rust is** a language that, because it has no runtime, can be used to integrate with any runtime; you can write a native extension in Rust that is called by a Node.js program, or by a Python program, or by a program in Ruby, Lua etc. and, on the other hand, you can script a program in Rust using these languages. -- "Elias Gabriel Amaral da Silva"
30+
31+
32+
![PyRust](https://user-images.githubusercontent.com/458654/32692578-9a424482-c701-11e7-8ea5-09c71612b96c.png)
33+
34+
There are a bunch of Rust packages out there to help you extend Python with Rust.
35+
36+
I can mention [Milksnake](https://github.com/getsentry/milksnake), created by Armin Ronacher (the creator of Flask), and also [PyO3](https://github.com/PyO3/pyo3), the Rust bindings for the Python interpreter.
37+
38+
> See a complete reference list at the bottom.
39+
40+
For this post I am going to use [Rust Cpython](https://github.com/dgrunwald/rust-cpython), as it is the only one I have tested and found straightforward to use.
41+
42+
**Pros:** It is really easy to write Rust functions and import them from Python, and as you will see from the benchmarks it is worth it in terms of performance.
43+
44+
**Cons:** Distributing your **project/lib/framework** requires the Rust module to be compiled on the target system, because environments and architectures vary. That adds a **compiling** stage which you don't have when installing Pure Python libraries. You can make it easier using [setuptools-rust](https://pypi.python.org/pypi/setuptools-rust) or using [Milksnake](https://github.com/getsentry/milksnake) to embed binary data in Python Wheels.
45+
46+
## Python is sometimes slow
47+
48+
Yes, Python is known for being "slow" in some cases, and the good news is that, depending on your project goals and priorities, this doesn't really matter. For most projects this detail will not be very important.
50+
51+
However, you may face the **rare** case where a single function or module takes too much time and is identified as the bottleneck of your project's performance. This often happens with string parsing and image processing.
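
One way to confirm which function is the bottleneck before rewriting anything is the standard-library profiler. The sketch below is only an illustration: `slow_parse`, `run_job` and `profile` are hypothetical stand-ins for your own code and are not part of this post's example.

```python
import cProfile
import io
import pstats


def slow_parse(text):
    # Hypothetical hot spot: a character-by-character loop in pure Python.
    return sum(1 for c1, c2 in zip(text, text[1:]) if c1 == c2)


def run_job():
    # Hypothetical workload that calls the hot spot once.
    return slow_parse("abCCdeFF" * 50000)


def profile(func):
    """Run func under cProfile; return (result, text report of the top entries)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func()
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the most expensive call chains come first.
    stats.sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()


if __name__ == "__main__":
    result, report = profile(run_job)
    print(report)
```

If one function dominates the cumulative time in the report, it is a candidate for the kind of rewrite shown in the rest of this post.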
52+
53+
## Example
54+
55+
Let's say you have a Python function which does some kind of string processing. Take the following easy example of `counting pairs of repeated chars`, but keep in mind that this example can be reproduced with other `string processing` functions or any other generally slow process in Python.
56+
57+
58+
```bash
# How many pairs of consecutive repeated chars are in the given string?
abCCdeFFghiJJklmnopqRRstuVVxyZZ... {millions of chars here}
  1   2    3        4    5   6
```
63+
64+
Python is fairly slow at processing large strings, so you can use `pytest-benchmark` to compare a `Pure Python (with Iterator Zipping)` function versus a `Regexp` implementation.
65+
66+
```bash
# Using a Python3.6 environment
$ pip3 install pytest pytest-benchmark
```
71+
72+
Then write a new Python program called `doubles.py`
73+
74+
```python
import re
import string
import random

# Python ZIP version
def count_doubles(val):
    total = 0
    for c1, c2 in zip(val, val[1:]):
        if c1 == c2:
            total += 1
    return total


# Python REGEXP version
double_re = re.compile(r'(?=(.)\1)')

def count_doubles_regex(val):
    return len(double_re.findall(val))


# Benchmark it
# generate 1M of random letters to test it
val = ''.join(random.choice(string.ascii_letters) for i in range(1000000))

def test_pure_python(benchmark):
    benchmark(count_doubles, val)

def test_regex(benchmark):
    benchmark(count_doubles_regex, val)
```
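
Before benchmarking, it can help to check that both implementations count the same thing. The sketch below repeats the two functions from `doubles.py` so that it runs on its own. The lookahead `(?=...)` is what lets the regex count overlapping pairs. The newline case at the end is an edge case I am adding for illustration; it does not affect the benchmark, which only uses random letters.

```python
import re

double_re = re.compile(r'(?=(.)\1)')


def count_doubles(val):
    total = 0
    for c1, c2 in zip(val, val[1:]):
        if c1 == c2:
            total += 1
    return total


def count_doubles_regex(val):
    return len(double_re.findall(val))


# The sample string from the example above has six repeated pairs.
sample = "abCCdeFFghiJJklmnopqRRstuVVxyZZ"
assert count_doubles(sample) == count_doubles_regex(sample) == 6

# Overlapping pairs are counted by both versions: "aaa" contains "aa" twice.
assert count_doubles("aaa") == count_doubles_regex("aaa") == 2

# Edge case: "." does not match a newline by default, so the two versions
# disagree on repeated line breaks.
assert count_doubles("\n\n") == 1
assert count_doubles_regex("\n\n") == 0
```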
105+
106+
Run **pytest** to compare:
107+
108+
109+
```bash
$ pytest doubles.py
================================================================================= test session starts ==================================================================================
platform linux -- Python 3.6.0, pytest-3.2.3, py-1.4.34, pluggy-0.4.0
benchmark: 3.1.1 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /Projects/rustpy, inifile:
plugins: benchmark-3.1.1
collected 2 items

doubles.py ..


--------------------------------------------------------------------------------- benchmark: 2 tests --------------------------------------------------------------------------------
Name (time in ms)  Min             Max             Mean            StdDev          Median          IQR             Outliers  OPS             Rounds  Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_regex         24.6824 (1.0)   32.3960 (1.0)   27.0167 (1.0)   1.8610 (1.0)    27.2148 (1.0)   2.9345 (4.55)   16;1      37.0141 (1.0)   36      1
test_pure_python   51.4964 (2.09)  62.5680 (1.93)  52.8334 (1.96)  2.3630 (1.27)   52.2846 (1.92)  0.6444 (1.0)    1;2       18.9274 (0.51)  20      1
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Legend:
  Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
  OPS: Operations Per Second, computed as 1 / Mean
=============================================================================== 2 passed in 4.10 seconds ===============================================================================
```
134+
135+
Let's take the `Median` (in ms) for comparison:

- **Regexp** - 27.2148 **<-- less is better**
- **Python Zip** - 52.2846
139+
140+
# Extending Python with Rust
141+
142+
# Create a new crate
143+
144+
> A **crate** is what Rust packages are called.
145+
146+
You need Rust installed (the recommended way is https://www.rustup.rs/).
147+
148+
> I used `rustc 1.21.0`
149+
150+
151+
In the same folder run:
152+
153+
```bash
cargo new pyext-myrustlib
```
156+
157+
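It creates a new Rust project in that same folder called `pyext-myrustlib`, containing the `Cargo.toml` (cargo is the Rust package manager) and also a `src/lib.rs` (where we write our library implementation). Note that newer versions of cargo create a binary project by default, so there you would run `cargo new --lib pyext-myrustlib` to get a `src/lib.rs`.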
158+
159+
# Edit Cargo.toml
160+
161+
It will use the `rust-cpython` crate as a dependency and tell cargo to generate a `dylib` to be imported from Python:
162+
163+
```toml
[package]
name = "pyext-myrustlib"
version = "0.1.0"
authors = ["Bruno Rocha <rochacbruno@gmail.com>"]

[lib]
name = "myrustlib"
crate-type = ["dylib"]

[dependencies.cpython]
version = "0.1"
features = ["extension-module"]
```
177+
178+
# Edit src/lib.rs
179+
180+
What we need to do:

1) Import all macros from the `cpython` crate
2) Bring the `Python` and `PyResult` types from `cpython` into our lib scope
3) Write the `count_doubles` function implementation in `Rust`. Note that this is very similar to the Pure Python version, except that:

    * It takes a `Python` as first argument, a token that represents access to the Python interpreter while the `Python GIL` is held
    * It receives `val` as a `&str` reference
    * It returns a `PyResult`, which is a type that allows raising Python exceptions
    * It ends with `Ok(total)`: **Result** is an enum type that represents either success (`Ok`) or failure (`Err`), and because our function is declared to return `PyResult<u64>`, the compiler infers that `Ok(total)` has that type, with `total` being a `u64`

4) Using the `py_module_initializer!` macro we register new attributes on the lib, including the `__doc__`, and we also add the `count_doubles` attribute referencing our `Rust implementation of the function`

    * Pay attention to the names **lib**myrustlib, **initlib**myrustlib and **PyInit**_myrustlib, each of which is suffixed by our library name (defined in Cargo.toml)
    * We also use the `try!` macro, which returns early with the error if a call fails; the closest Python analogy is letting an exception propagate to the caller, rather than catching it with `try.. except`
    * Return `Ok(())` - the `()` is the unit type (an empty tuple), the closest equivalent of `None` in Python
195+
196+
```rust
// 1) Import all macros from the `cpython` crate
#[macro_use]
extern crate cpython;

// 2) Bring the types we need into scope
use cpython::{Python, PyResult};

// 3) Same logic as the Pure Python version
fn count_doubles(_py: Python, val: &str) -> PyResult<u64> {
    let mut total = 0u64;

    for (c1, c2) in val.chars().zip(val.chars().skip(1)) {
        if c1 == c2 {
            total += 1;
        }
    }

    Ok(total)
}

// 4) Register the module attributes
py_module_initializer!(libmyrustlib, initlibmyrustlib, PyInit_myrustlib, |py, m| {
    try!(m.add(py, "__doc__", "This module is implemented in Rust"));
    try!(m.add(py, "count_doubles", py_fn!(py, count_doubles(val: &str))));
    Ok(())
});
```
221+
222+
Now let's build it with cargo:
223+
224+
```bash
$ cd pyext-myrustlib
$ cargo build --release
    Finished release [optimized] target(s) in 0.0 secs

$ ls -la target/release/libmyrustlib*
target/release/libmyrustlib.d
target/release/libmyrustlib.so*  <-- Our dylib is here
```
232+
233+
Now let's copy the generated `.so` lib to the same folder where our `doubles.py` is:
234+
235+
> NOTE: On **Fedora** you should get a `.so`; on other systems you may get a `.dylib`, which you can rename by changing the extension to `.so`.
236+
237+
```bash
$ cd ..
$ ls
doubles.py pyext-myrustlib/

$ cp pyext-myrustlib/target/release/libmyrustlib.so myrustlib.so

$ ls
doubles.py myrustlib.so pyext-myrustlib/
```
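
The copy step depends on the platform's library naming. The helper below is a sketch that assumes the directory layout used in this post and the usual Cargo output names (`libmyrustlib.so` on Linux, `libmyrustlib.dylib` on macOS, `myrustlib.dll` on Windows, where Python expects a `.pyd` file). The names `built_library_name`, `importable_name` and `install_extension` are hypothetical helpers, not part of any library.

```python
import shutil
import sys
from pathlib import Path


def built_library_name(lib_name, platform=sys.platform):
    """Return the file name Cargo is assumed to produce for a dylib crate."""
    if platform.startswith("win"):
        return f"{lib_name}.dll"
    if platform == "darwin":
        return f"lib{lib_name}.dylib"
    return f"lib{lib_name}.so"


def importable_name(lib_name, platform=sys.platform):
    """Return the file name Python can import as an extension module."""
    suffix = ".pyd" if platform.startswith("win") else ".so"
    return f"{lib_name}{suffix}"


def install_extension(crate_dir, lib_name, dest_dir="."):
    """Copy the release build next to the Python code and return its path."""
    source = Path(crate_dir) / "target" / "release" / built_library_name(lib_name)
    target = Path(dest_dir) / importable_name(lib_name)
    shutil.copy2(source, target)
    return target


if __name__ == "__main__":
    # Assumes `cargo build --release` has already been run in pyext-myrustlib.
    print(install_extension("pyext-myrustlib", "myrustlib"))
```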
247+
248+
> Having `myrustlib.so` in the same folder, or added to your Python path, allows it to be imported directly, as if it were a Python module.
249+
250+
251+
# Importing from Python and comparing the results
252+
253+
Now edit your `doubles.py`, importing our `Rust implemented` version and adding a `benchmark` for it.
254+
255+
256+
```python
import re
import string
import random
import myrustlib   #  <-- Import the Rust implemented module (myrustlib.so)


def count_doubles(val):
    """Count repeated pairs of chars in a string"""
    total = 0
    for c1, c2 in zip(val, val[1:]):
        if c1 == c2:
            total += 1
    return total


double_re = re.compile(r'(?=(.)\1)')


def count_doubles_regex(val):
    return len(double_re.findall(val))


val = ''.join(random.choice(string.ascii_letters) for i in range(1000000))


def test_pure_python(benchmark):
    benchmark(count_doubles, val)


def test_regex(benchmark):
    benchmark(count_doubles_regex, val)


def test_rust(benchmark):   #  <-- Benchmark the Rust version
    benchmark(myrustlib.count_doubles, val)
```
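
Since the benchmark only measures speed, a quick agreement check against the pure Python version is a reasonable safeguard. This is a sketch: `check_agreement` is a hypothetical helper, and the import is guarded so that the snippet still runs when `myrustlib.so` has not been built.

```python
import random
import string


def count_doubles(val):
    """Reference implementation (same result as the loop in doubles.py)."""
    return sum(1 for c1, c2 in zip(val, val[1:]) if c1 == c2)


def check_agreement(candidate, reference=count_doubles, rounds=20, size=1000, seed=0):
    """Return True if candidate matches reference on random letter strings."""
    rng = random.Random(seed)
    for _ in range(rounds):
        val = ''.join(rng.choice(string.ascii_letters) for _ in range(size))
        if candidate(val) != reference(val):
            return False
    return True


try:
    import myrustlib  # the extension built above, if present
except ImportError:
    myrustlib = None

if myrustlib is not None:
    print("Rust matches Python:", check_agreement(myrustlib.count_doubles))
else:
    print("myrustlib not found; build and copy it first")
```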
294+
295+
# Benchmark
296+
297+
```bash
$ pytest doubles.py
================================================================================= test session starts ==================================================================================
platform linux -- Python 3.6.0, pytest-3.2.3, py-1.4.34, pluggy-0.4.0
benchmark: 3.1.1 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /Projects/rustpy, inifile:
plugins: benchmark-3.1.1
collected 3 items

doubles_rust.py ...


--------------------------------------------------------------------------------- benchmark: 3 tests ---------------------------------------------------------------------------------
Name (time in ms)  Min              Max              Mean             StdDev           Median           IQR              Outliers  OPS             Rounds  Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_rust          2.5555 (1.0)     2.9296 (1.0)     2.6085 (1.0)     0.0521 (1.0)     2.5935 (1.0)     0.0456 (1.0)     53;23     383.3661 (1.0)  382     1
test_regex         25.6049 (10.02)  27.2190 (9.29)   25.8876 (9.92)   0.3543 (6.80)    25.7664 (9.93)   0.3020 (6.63)    4;3       38.6285 (0.10)  40      1
test_pure_python   52.9428 (20.72)  56.3666 (19.24)  53.9732 (20.69)  0.9248 (17.75)   53.6220 (20.68)  1.4899 (32.70)   6;0       18.5277 (0.05)  20      1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Legend:
  Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
  OPS: Operations Per Second, computed as 1 / Mean
=============================================================================== 3 passed in 5.19 seconds ===============================================================================
```
323+
324+
Let's take the `Median` (in ms) for comparison:

- **Rust** - 2.5935 **<-- less is better**
- **Regexp** - 25.7664
- **Python Zip** - 53.6220
329+
330+
In this benchmark, the Rust implementation is about **10x** faster than the Python Regex version and about **21x** faster than the Pure Python version.
331+
332+
> It is interesting that the **Regex** version is only 2x faster than Pure Python :)
333+
334+
> NOTE: These numbers make sense only for this particular scenario; for other cases the comparison may be different.
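
The speedup figures come straight from dividing the medians. A small sketch of that arithmetic, using the values copied from the benchmark table above (`speedup` is a hypothetical helper name):

```python
# Median times in milliseconds, copied from the benchmark table above.
medians_ms = {
    "rust": 2.5935,
    "regex": 25.7664,
    "pure_python": 53.6220,
}


def speedup(slow, fast):
    """How many times faster `fast` is compared to `slow`."""
    return medians_ms[slow] / medians_ms[fast]


print(f"Rust vs regex:        {speedup('regex', 'rust'):.1f}x")
print(f"Rust vs pure Python:  {speedup('pure_python', 'rust'):.1f}x")
print(f"Regex vs pure Python: {speedup('pure_python', 'regex'):.1f}x")
```

The ratios come out at roughly 9.9, 20.7 and 2.1, which is where the rounded "10x", "21x" and "2x" come from.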
335+
336+
# Conclusion
337+
338+
`Rust` may not be the `general purpose language` of choice **yet** because of its level of complexity, and may not be the best choice **yet** for writing common simple `applications` such as `web` sites and `test automation` scripts.
339+
340+
However, for `specific parts` of the project where Python is known to be the bottleneck and your natural choice would be implementing a `C/C++` extension, writing this extension in Rust seems easier and more maintainable.
341+
342+
There are still many improvements to come in Rust and lots of other crates offering `Python <--> Rust` integration. Even if you are not including the language in your tool belt right now, it is really worth keeping an eye on it for the future!
343+
344+
## Credits
345+
346+
The examples in this publication are inspired by the `Extending Python with Rust` talk by **Samuel Cormier-Iijima** at **Pycon Canada**.
347+
video here: https://www.youtube.com/watch?v=-ylbuEzkG4M
348+
349+
And also by `My Python is a little Rust-y` by **Dan Callahan** at **Pycon Montreal**.
350+
video here: https://www.youtube.com/watch?v=3CwJ0MH-4MA
351+
352+
Other references:
353+
354+
- https://github.com/mitsuhiko/snaek
355+
- https://github.com/PyO3/pyo3
356+
- https://pypi.python.org/pypi/setuptools-rust
357+
- https://github.com/mckaymatt/cookiecutter-pypackage-rust-cross-platform-publish
358+
- http://jakegoulding.com/rust-ffi-omnibus/
359+
- https://github.com/urschrei/polylabel-rs/blob/master/src/ffi.rs
360+
- https://bheisler.github.io/post/calling-rust-in-python/
361+
- https://github.com/saethlin/rust-lather
362+
363+
Join Community:
364+
365+
Join the Rust community; you can find group links at https://www.rust-lang.org/en-US/community.html
366+
367+
**If you speak Portuguese** I recommend joining https://t.me/rustlangbr, and there is also http://bit.ly/canalrustbr on YouTube.
369+
370+
## Author
371+
372+
**Bruno Rocha**
373+
- Senior Quality Engineer at **Red Hat**
374+
- Teaching Python at CursoDePython.com.br
375+
- Fellow Member of Python Software Foundation
376+
377+
More info: http://about.me/rochacbruno and http://brunorocha.org
