Skip to content

Commit a926119

Browse files
committed
docs/ref/speed_python: Update and make more hardware-neutral.
Move hardware-specific optimizations to the very end of document, and add visible note that it gives an example for Pyboard. Remove references to specific hardware technologies, so the doc can be more naturally used across ports. Various markup updates to adhere to the latest docs conventions.
1 parent dd16e21 commit a926119

File tree

1 file changed

+56
-46
lines changed

1 file changed

+56
-46
lines changed

docs/reference/speed_python.rst

Lines changed: 56 additions & 46 deletions
Original file line numberDiff line numberDiff line change
@@ -1,9 +1,11 @@
1-
Maximising Python Speed
2-
=======================
1+
Maximising MicroPython Speed
2+
============================
3+
4+
.. contents::
35

46
This tutorial describes ways of improving the performance of MicroPython code.
57
Optimisations involving other languages are covered elsewhere, namely the use
6-
of modules written in C and the MicroPython inline ARM Thumb-2 assembler.
8+
of modules written in C and the MicroPython inline assembler.
79

810
The process of developing high performance code comprises the following stages
911
which should be performed in the order listed.
@@ -17,6 +19,7 @@ Optimisation steps:
1719
* Improve the efficiency of the Python code.
1820
* Use the native code emitter.
1921
* Use the viper code emitter.
22+
* Use hardware-specific optimisations.
2023

2124
Designing for speed
2225
-------------------
@@ -50,7 +53,7 @@ once only and not permitted to grow in size. This implies that the object persis
5053
for the duration of its use: typically it will be instantiated in a class constructor
5154
and used in various methods.
5255

53-
This is covered in further detail :ref:`Controlling garbage collection <gc>` below.
56+
This is covered in further detail :ref:`Controlling garbage collection <controlling_gc>` below.
5457

5558
Buffers
5659
~~~~~~~
@@ -60,8 +63,8 @@ used for communication with a device. A typical driver will create the buffer in
6063
constructor and use it in its I/O methods which will be called repeatedly.
6164

6265
The MicroPython libraries typically provide support for pre-allocated buffers. For
63-
example, objects which support stream interface (e.g., file or UART) provide ``read()``
64-
method which allocate new buffer for read data, but also a ``readinto()`` method
66+
example, objects which support stream interface (e.g., file or UART) provide `read()`
67+
method which allocates new buffer for read data, but also a `readinto()` method
6568
to read data into an existing buffer.
6669

6770
Floating Point
@@ -79,14 +82,14 @@ Arrays
7982
~~~~~~
8083

8184
Consider the use of the various types of array classes as an alternative to lists.
82-
The ``array`` module supports various element types with 8-bit elements supported
83-
by Python's built in ``bytes`` and ``bytearray`` classes. These data structures all store
85+
The `array` module supports various element types with 8-bit elements supported
86+
by Python's built in `bytes` and `bytearray` classes. These data structures all store
8487
elements in contiguous memory locations. Once again to avoid memory allocation in critical
8588
code these should be pre-allocated and passed as arguments or as bound objects.
8689

87-
When passing slices of objects such as ``bytearray`` instances, Python creates
90+
When passing slices of objects such as `bytearray` instances, Python creates
8891
a copy which involves allocation of the size proportional to the size of slice.
89-
This can be alleviated using a ``memoryview`` object. ``memoryview`` itself
92+
This can be alleviated using a `memoryview` object. `memoryview` itself
9093
is allocated on heap, but is a small, fixed-size object, regardless of the size
9194
of slice it points too.
9295

@@ -97,19 +100,19 @@ of slice it points too.
97100
mv = memoryview(ba) # small object is allocated
98101
func(mv[30:2000]) # a pointer to memory is passed
99102
100-
A ``memoryview`` can only be applied to objects supporting the buffer protocol - this
103+
A `memoryview` can only be applied to objects supporting the buffer protocol - this
101104
includes arrays but not lists. Small caveat is that while memoryview object is live,
102105
it also keeps alive the original buffer object. So, a memoryview isn't a universal
103106
panacea. For instance, in the example above, if you are done with 10K buffer and
104107
just need those bytes 30:2000 from it, it may be better to make a slice, and let
105108
the 10K buffer go (be ready for garbage collection), instead of making a
106109
long-living memoryview and keeping 10K blocked for GC.
107110

108-
Nonetheless, ``memoryview`` is indispensable for advanced preallocated buffer
109-
management. ``.readinto()`` method discussed above puts data at the beginning
111+
Nonetheless, `memoryview` is indispensable for advanced preallocated buffer
112+
management. `readinto()` method discussed above puts data at the beginning
110113
of buffer and fills in entire buffer. What if you need to put data in the
111114
middle of existing buffer? Just create a memoryview into the needed section
112-
of buffer and pass it to ``.readinto()``.
115+
of buffer and pass it to `readinto()`.
113116

114117
Identifying the slowest section of code
115118
---------------------------------------
@@ -118,8 +121,7 @@ This is a process known as profiling and is covered in textbooks and
118121
(for standard Python) supported by various software tools. For the type of
119122
smaller embedded application likely to be running on MicroPython platforms
120123
the slowest function or method can usually be established by judicious use
121-
of the timing ``ticks`` group of functions documented
122-
`here <http://docs.micropython.org/en/latest/pyboard/library/time.html>`_.
124+
of the timing ``ticks`` group of functions documented in `utime`.
123125
Code execution time can be measured in ms, us, or CPU cycles.
124126

125127
The following enables any function or method to be timed by adding an
@@ -130,9 +132,9 @@ The following enables any function or method to be timed by adding an
130132
def timed_function(f, *args, **kwargs):
131133
myname = str(f).split(' ')[1]
132134
def new_func(*args, **kwargs):
133-
t = time.ticks_us()
135+
t = utime.ticks_us()
134136
result = f(*args, **kwargs)
135-
delta = time.ticks_diff(time.ticks_us(), t)
137+
delta = utime.ticks_diff(utime.ticks_us(), t)
136138
print('Function {} Time = {:6.3f}ms'.format(myname, delta/1000))
137139
return result
138140
return new_func
@@ -170,7 +172,7 @@ by caching the object in a local variable:
170172
This avoids the need repeatedly to look up ``self.ba`` and ``obj_display.framebuffer``
171173
in the body of the method ``bar()``.
172174

173-
.. _gc:
175+
.. _controlling_gc:
174176

175177
Controlling garbage collection
176178
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -182,42 +184,19 @@ process known as garbage collection reclaims the memory used by these redundant
182184
objects and the allocation is then tried again - a process which can take several
183185
milliseconds.
184186

185-
There are benefits in pre-empting this by periodically issuing ``gc.collect()``.
187+
There may be benefits in pre-empting this by periodically issuing `gc.collect()`.
186188
Firstly doing a collection before it is actually required is quicker - typically on the
187189
order of 1ms if done frequently. Secondly you can determine the point in code
188190
where this time is used rather than have a longer delay occur at random points,
189191
possibly in a speed critical section. Finally performing collections regularly
190192
can reduce fragmentation in the heap. Severe fragmentation can lead to
191193
non-recoverable allocation failures.
192194

193-
Accessing hardware directly
194-
~~~~~~~~~~~~~~~~~~~~~~~~~~~
195-
196-
This comes into the category of more advanced programming and involves some knowledge
197-
of the target MCU. Consider the example of toggling an output pin on the Pyboard. The
198-
standard approach would be to write
199-
200-
.. code:: python
201-
202-
mypin.value(mypin.value() ^ 1) # mypin was instantiated as an output pin
203-
204-
This involves the overhead of two calls to the ``Pin`` instance's ``value()``
205-
method. This overhead can be eliminated by performing a read/write to the relevant bit
206-
of the chip's GPIO port output data register (odr). To facilitate this the ``stm``
207-
module provides a set of constants providing the addresses of the relevant registers.
208-
A fast toggle of pin ``P4`` (CPU pin ``A14``) - corresponding to the green LED -
209-
can be performed as follows:
210-
211-
.. code:: python
212-
213-
BIT14 = const(1 << 14)
214-
stm.mem16[stm.GPIOA + stm.GPIO_ODR] ^= BIT14
215-
216195
The Native code emitter
217196
-----------------------
218197

219-
This causes the MicroPython compiler to emit ARM native opcodes rather than
220-
bytecode. It covers the bulk of the Python language so most functions will require
198+
This causes the MicroPython compiler to emit native CPU opcodes rather than
199+
bytecode. It covers the bulk of the MicroPython functionality, so most functions will require
221200
no adaptation (but see below). It is invoked by means of a function decorator:
222201

223202
.. code:: python
@@ -276,7 +255,7 @@ Viper provides pointer types to assist the optimiser. These comprise
276255
* ``ptr32`` Points to a 32 bit machine word.
277256

278257
The concept of a pointer may be unfamiliar to Python programmers. It has similarities
279-
to a Python ``memoryview`` object in that it provides direct access to data stored in memory.
258+
to a Python `memoryview` object in that it provides direct access to data stored in memory.
280259
Items are accessed using subscript notation, but slices are not supported: a pointer can return
281260
a single item only. Its purpose is to provide fast random access to data stored in contiguous
282261
memory locations - such as data stored in objects which support the buffer protocol, and
@@ -330,3 +309,34 @@ The following example illustrates the use of a ``ptr16`` cast to toggle pin X1 `
330309
A detailed technical description of the three code emitters may be found
331310
on Kickstarter here `Note 1 <https://www.kickstarter.com/projects/214379695/micro-python-python-for-microcontrollers/posts/664832>`_
332311
and here `Note 2 <https://www.kickstarter.com/projects/214379695/micro-python-python-for-microcontrollers/posts/665145>`_
312+
313+
Accessing hardware directly
314+
---------------------------
315+
316+
.. note::
317+
318+
Code examples in this section are given for the Pyboard. The techniques
319+
described however may be applied to other MicroPython ports too.
320+
321+
This comes into the category of more advanced programming and involves some knowledge
322+
of the target MCU. Consider the example of toggling an output pin on the Pyboard. The
323+
standard approach would be to write
324+
325+
.. code:: python
326+
327+
mypin.value(mypin.value() ^ 1) # mypin was instantiated as an output pin
328+
329+
This involves the overhead of two calls to the `Pin` instance's :meth:`~machine.Pin.value()`
330+
method. This overhead can be eliminated by performing a read/write to the relevant bit
331+
of the chip's GPIO port output data register (odr). To facilitate this the ``stm``
332+
module provides a set of constants providing the addresses of the relevant registers.
333+
A fast toggle of pin ``P4`` (CPU pin ``A14``) - corresponding to the green LED -
334+
can be performed as follows:
335+
336+
.. code:: python
337+
338+
import machine
339+
import stm
340+
341+
BIT14 = const(1 << 14)
342+
machine.mem16[stm.GPIOA + stm.GPIO_ODR] ^= BIT14

0 commit comments

Comments
 (0)