1- Maximising Python Speed
2- =======================
1+ Maximising MicroPython Speed
2+ ============================
3+
4+ .. contents::
35
46This tutorial describes ways of improving the performance of MicroPython code.
57Optimisations involving other languages are covered elsewhere, namely the use
6- of modules written in C and the MicroPython inline ARM Thumb-2 assembler.
8+ of modules written in C and the MicroPython inline assembler.
79
810The process of developing high performance code comprises the following stages
911which should be performed in the order listed.
@@ -17,6 +19,7 @@ Optimisation steps:
1719* Improve the efficiency of the Python code.
1820* Use the native code emitter.
1921* Use the viper code emitter.
22+ * Use hardware-specific optimisations.
2023
2124Designing for speed
2225-------------------
@@ -50,7 +53,7 @@ once only and not permitted to grow in size. This implies that the object persis
5053for the duration of its use: typically it will be instantiated in a class constructor
5154and used in various methods.
5255
53- This is covered in further detail :ref: `Controlling garbage collection <gc >` below.
56+ This is covered in further detail in :ref:`Controlling garbage collection <controlling_gc>` below.
5457
5558Buffers
5659~~~~~~~
@@ -60,8 +63,8 @@ used for communication with a device. A typical driver will create the buffer in
6063constructor and use it in its I/O methods which will be called repeatedly.
6164
6265The MicroPython libraries typically provide support for pre-allocated buffers. For
63- example, objects which support stream interface (e.g., file or UART) provide `` read() ` `
64- method which allocate new buffer for read data, but also a `` readinto() ` ` method
66+ example, objects which support the stream interface (e.g., file or UART) provide a `read()`
67+ method which allocates a new buffer for the data read, but also a `readinto()` method
6568to read data into an existing buffer.
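
As a minimal sketch of this pattern, the following reads into a buffer that is allocated once. It assumes ``io.BytesIO`` as a stand-in for a real stream such as a UART; the names ``stream``, ``buf`` and ``read_chunk`` are illustrative only.

```python
import io

# Stand-in for a UART or file object (assumption for illustration only).
stream = io.BytesIO(b"hello world, this is sensor data")

# Allocated once, e.g. in a driver's constructor, and reused on every read.
buf = bytearray(8)

def read_chunk():
    # readinto() fills the existing buffer and returns the number of
    # bytes read, so no new bytes object is created for the data.
    return stream.readinto(buf)

n = read_chunk()
```

After the call, ``buf`` holds the first ``n`` bytes of the stream, and the same buffer can be passed to later reads.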
6669
6770Floating Point
@@ -79,14 +82,14 @@ Arrays
7982~~~~~~
8083
8184Consider the use of the various types of array classes as an alternative to lists.
82- The `` array ` ` module supports various element types with 8-bit elements supported
83- by Python's built in `` bytes `` and `` bytearray ` ` classes. These data structures all store
85+ The `array` module supports various element types, with 8-bit elements supported
86+ by Python's built-in `bytes` and `bytearray` classes. These data structures all store
8487elements in contiguous memory locations. Once again to avoid memory allocation in critical
8588code these should be pre-allocated and passed as arguments or as bound objects.
8689
87- When passing slices of objects such as `` bytearray ` ` instances, Python creates
90+ When passing slices of objects such as `bytearray` instances, Python creates
8891a copy, which involves an allocation proportional to the size of the slice.
89- This can be alleviated using a `` memoryview `` object. `` memoryview ` ` itself
92+ This can be alleviated using a `memoryview` object. `memoryview` itself
9093is allocated on the heap, but is a small, fixed-size object, regardless of the size
9194of the slice it points to.
9295
@@ -97,19 +100,19 @@ of slice it points too.
97100     mv = memoryview(ba)  # small object is allocated
98101     func(mv[30:2000])  # a pointer to memory is passed
99102
100- A `` memoryview ` ` can only be applied to objects supporting the buffer protocol - this
103+ A `memoryview` can only be applied to objects supporting the buffer protocol - this
101104includes arrays but not lists. One caveat is that while a memoryview object is live,
102105it also keeps the original buffer object alive. So, a memoryview isn't a universal
103106panacea. For instance, in the example above, if you are done with the 10K buffer and
104107just need bytes 30:2000 from it, it may be better to make a slice and let
105108the 10K buffer go (be ready for garbage collection), instead of making a
106109long-lived memoryview and keeping the 10K buffer from being collected.
107110
108- Nonetheless, `` memoryview ` ` is indispensable for advanced preallocated buffer
109- management. `` . readinto()` ` method discussed above puts data at the beginning
111+ Nonetheless, `memoryview` is indispensable for advanced preallocated buffer
112+ management. The `readinto()` method discussed above puts data at the beginning
110113of the buffer and fills the entire buffer. What if you need to put data in the
111114middle of an existing buffer? Just create a memoryview into the needed section
112- of buffer and pass it to `` . readinto()` `.
115+ of the buffer and pass it to `readinto()`.
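
A short sketch of reading into the middle of a buffer follows. As an assumption for illustration, ``io.BytesIO`` stands in for a real stream object, and the buffer size and offsets are arbitrary.

```python
import io

# Stand-in for a UART or file object (assumption for illustration only).
stream = io.BytesIO(b"ABCDEFGH")

buf = bytearray(16)    # pre-allocated buffer, initially all zero bytes
mv = memoryview(buf)   # small fixed-size object referring to buf

# Slicing a memoryview does not copy the data: the slice refers to
# bytes 4..11 of buf, so readinto() writes directly into that region.
n = stream.readinto(mv[4:12])
```

Only the targeted region of ``buf`` is modified; the bytes before and after it are left untouched.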
113116
114117Identifying the slowest section of code
115118---------------------------------------
@@ -118,8 +121,7 @@ This is a process known as profiling and is covered in textbooks and
118121(for standard Python) supported by various software tools. For the type of
119122smaller embedded application likely to be running on MicroPython platforms
120123the slowest function or method can usually be established by judicious use
121- of the timing ``ticks `` group of functions documented
122- `here <http://docs.micropython.org/en/latest/pyboard/library/time.html >`_.
124+ of the timing ``ticks`` group of functions documented in `utime`.
123125Code execution time can be measured in ms, us, or CPU cycles.
124126
125127The following enables any function or method to be timed by adding an
@@ -130,9 +132,9 @@ The following enables any function or method to be timed by adding an
130132     def timed_function(f, *args, **kwargs):
131133         myname = str(f).split(' ')[1]
132134         def new_func(*args, **kwargs):
133-             t = time.ticks_us()
135+             t = utime.ticks_us()
134136             result = f(*args, **kwargs)
135-             delta = time.ticks_diff(time.ticks_us(), t)
137+             delta = utime.ticks_diff(utime.ticks_us(), t)
136138             print('Function {} Time = {:6.3f}ms'.format(myname, delta/1000))
137139             return result
138140         return new_func
@@ -170,7 +172,7 @@ by caching the object in a local variable:
170172 This avoids the need to repeatedly look up ``self.ba`` and ``obj_display.framebuffer``
171173in the body of the method ``bar()``.
172174
173- .. _ gc :
175+ .. _controlling_gc:
174176
175177Controlling garbage collection
176178~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -182,42 +184,19 @@ process known as garbage collection reclaims the memory used by these redundant
182184objects and the allocation is then tried again - a process which can take several
183185milliseconds.
184186
185- There are benefits in pre-empting this by periodically issuing `` gc.collect() ` `.
187+ There may be benefits in pre-empting this by periodically issuing `gc.collect()`.
186188Firstly doing a collection before it is actually required is quicker - typically on the
187189order of 1ms if done frequently. Secondly you can determine the point in code
188190where this time is used rather than have a longer delay occur at random points,
189191possibly in a speed critical section. Finally performing collections regularly
190192can reduce fragmentation in the heap. Severe fragmentation can lead to
191193non-recoverable allocation failures.
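
One way to place collections at a known point is sketched below. The function ``process``, the loop bounds and the interval of 10 iterations are assumptions chosen for illustration; a suitable interval depends on how quickly the application allocates memory.

```python
import gc

buf = bytearray(1)  # pre-allocated, so the work itself does not allocate

def process(sample, buf):
    # Hypothetical unit of work: store the low byte of the sample.
    buf[0] = sample & 0xFF

for sample in range(100):
    process(sample, buf)
    # Collect at a chosen, non-critical point rather than letting a
    # collection be triggered by a failed allocation elsewhere.
    if sample % 10 == 9:
        gc.collect()
```

Collecting between units of work keeps the delay out of any speed critical section inside ``process``.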
192194
193- Accessing hardware directly
194- ~~~~~~~~~~~~~~~~~~~~~~~~~~~
195-
196- This comes into the category of more advanced programming and involves some knowledge
197- of the target MCU. Consider the example of toggling an output pin on the Pyboard. The
198- standard approach would be to write
199-
200- .. code :: python
201-
202- mypin.value(mypin.value() ^ 1 ) # mypin was instantiated as an output pin
203-
204- This involves the overhead of two calls to the ``Pin `` instance's ``value() ``
205- method. This overhead can be eliminated by performing a read/write to the relevant bit
206- of the chip's GPIO port output data register (odr). To facilitate this the ``stm ``
207- module provides a set of constants providing the addresses of the relevant registers.
208- A fast toggle of pin ``P4 `` (CPU pin ``A14 ``) - corresponding to the green LED -
209- can be performed as follows:
210-
211- .. code :: python
212-
213- BIT14 = const(1 << 14 )
214- stm.mem16[stm.GPIOA + stm.GPIO_ODR ] ^= BIT14
215-
216195The Native code emitter
217196-----------------------
218197
219- This causes the MicroPython compiler to emit ARM native opcodes rather than
220- bytecode. It covers the bulk of the Python language so most functions will require
198+ This causes the MicroPython compiler to emit native CPU opcodes rather than
199+ bytecode. It covers the bulk of the MicroPython functionality, so most functions will require
221200no adaptation (but see below). It is invoked by means of a function decorator:
222201
223202.. code:: python
@@ -276,7 +255,7 @@ Viper provides pointer types to assist the optimiser. These comprise
276255* ``ptr32 `` Points to a 32 bit machine word.
277256
278257The concept of a pointer may be unfamiliar to Python programmers. It has similarities
279- to a Python `` memoryview ` ` object in that it provides direct access to data stored in memory.
258+ to a Python `memoryview` object in that it provides direct access to data stored in memory.
280259Items are accessed using subscript notation, but slices are not supported: a pointer can return
281260a single item only. Its purpose is to provide fast random access to data stored in contiguous
282261memory locations - such as data stored in objects which support the buffer protocol, and
@@ -330,3 +309,34 @@ The following example illustrates the use of a ``ptr16`` cast to toggle pin X1 `
330309 A detailed technical description of the three code emitters may be found
331310on Kickstarter here `Note 1 <https://www.kickstarter.com/projects/214379695/micro-python-python-for-microcontrollers/posts/664832 >`_
332311and here `Note 2 <https://www.kickstarter.com/projects/214379695/micro-python-python-for-microcontrollers/posts/665145 >`_
312+
313+ Accessing hardware directly
314+ ---------------------------
315+
316+ .. note::
317+
318+ Code examples in this section are given for the Pyboard. The techniques
319+ described however may be applied to other MicroPython ports too.
320+
321+ This comes into the category of more advanced programming and involves some knowledge
322+ of the target MCU. Consider the example of toggling an output pin on the Pyboard. The
323+ standard approach would be to write
324+
325+ .. code:: python
326+
327+     mypin.value(mypin.value() ^ 1)  # mypin was instantiated as an output pin
328+
329+ This involves the overhead of two calls to the `Pin` instance's :meth:`~machine.Pin.value()`
330+ method. This overhead can be eliminated by performing a read/write to the relevant bit
331+ of the chip's GPIO port output data register (odr). To facilitate this the ``stm``
332+ module provides a set of constants providing the addresses of the relevant registers.
333+ A fast toggle of pin ``P4`` (CPU pin ``A14``) - corresponding to the green LED -
334+ can be performed as follows:
335+
336+ .. code:: python
337+
338+     import machine
339+     import stm
340+
341+     BIT14 = const(1 << 14)
342+     machine.mem16[stm.GPIOA + stm.GPIO_ODR] ^= BIT14