<html>
<head>
  <title>BPL Pickle Support</title>
</head>
<body>

<img src="../../../c++boost.gif"
     alt="c++boost.gif (8819 bytes)"
     align="center"
     width="277" height="86">

</body>
<hr>
<h1>BPL Pickle Support</h1>

Pickle is a Python module for object serialization, also known
as persistence, marshalling, or flattening.

<p>
It is often necessary to save the contents of an object to a file and
restore them later. One approach to this problem is to write a pair of
functions that write and read the data in a special file format. A
powerful alternative is to use Python's pickle module. Exploiting
Python's capacity for introspection, the pickle module recursively
converts nearly arbitrary Python objects into a stream of bytes that
can be written to a file.
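<p>
As a minimal sketch of that round trip (plain Python 3, independent of
BPL; the Point class and the file name point.pkl are made up for this
illustration), pickle.dump writes an object to a file as a stream of
bytes and pickle.load reads an equivalent object back:

<pre>
import os
import pickle
import tempfile


class Point:
    """Hypothetical plain Python class; pickle needs no special methods for it."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(1, 2)

# Write the object to a file as a stream of bytes ...
path = os.path.join(tempfile.gettempdir(), "point.pkl")
with open(path, "wb") as f:
    pickle.dump(p, f)

# ... and later reconstruct an equivalent object from that file.
with open(path, "rb") as f:
    q = pickle.load(f)

os.remove(path)
print(q.x, q.y)  # prints: 1 2
</pre>

The restored object is a new instance with the same attribute values,
not the original object itself.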

<p>
The Boost Python Library supports the pickle module by emulating the
interface implemented by Jim Fulton's ExtensionClass module, which is
included in the Zope distribution
(<a href="http://www.zope.org/">http://www.zope.org/</a>).
This interface is similar to the one for regular Python classes, which
is described in detail in the Python Library Reference for pickle:

<blockquote>
  <a href="http://www.python.org/doc/current/lib/module-pickle.html"
  >http://www.python.org/doc/current/lib/module-pickle.html</a>
</blockquote>

<hr>
<h1>The BPL Pickle Interface</h1>

At the user level, the BPL pickle interface involves three special
methods:

<dl>
<dt>
<strong>__getinitargs__</strong>
<dd>
  When an instance of a BPL extension class is pickled, the pickler
  checks whether the instance has a __getinitargs__ method. This method
  must return a Python tuple. When the instance is restored by the
  unpickler, the contents of this tuple are used as the arguments for
  the class constructor.

  <p>
  If __getinitargs__ is not defined, the class constructor is called
  without arguments.

<p>
<dt>
<strong>__getstate__</strong>

<dd>
  When an instance of a BPL extension class is pickled, the pickler
  checks whether the instance has a __getstate__ method. This method
  should return a Python object representing the state of the instance.

  <p>
  If __getstate__ is not defined, the instance's __dict__ is pickled
  (if it is not empty).

<p>
<dt>
<strong>__setstate__</strong>

<dd>
  When an instance of a BPL extension class is restored by the
  unpickler, it is first constructed using the result of
  __getinitargs__ as arguments (see above). The unpickler then
  checks whether the new instance has a __setstate__ method. If so,
  this method is called with the result of __getstate__ (a Python
  object) as its argument.

  <p>
  If __setstate__ is not defined, the result of __getstate__ must be
  a Python dictionary. The items of this dictionary are added to
  the instance's __dict__.
</dl>
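<p>
Standard Python 3 pickle no longer consults __getinitargs__ (Python 2
honored it only for classic classes), so the following is a pure-Python
sketch that reproduces the protocol above through the __reduce__ hook:
the returned tuple tells the unpickler to call the class with the
__getinitargs__ tuple and then pass the __getstate__ result to
__setstate__, or merge it into __dict__ when __setstate__ is absent.
InitArgsPickleMixin and Account are hypothetical names, and this is not
how BPL itself is implemented.

<pre>
import pickle


class InitArgsPickleMixin:
    """Hypothetical helper that emulates the __getinitargs__ / __getstate__ /
    __setstate__ protocol described above using Python 3's __reduce__ hook."""

    __slots__ = ()

    def __reduce__(self):
        getinitargs = getattr(self, "__getinitargs__", None)
        args = getinitargs() if getinitargs is not None else ()
        # Only a __getstate__ defined by a user class counts (not object's).
        defines_getstate = any(
            "__getstate__" in vars(k) for k in type(self).__mro__ if k is not object
        )
        if defines_getstate:
            state = self.__getstate__()
        else:
            # Default: pickle the instance __dict__, but only if it is not empty.
            state = dict(getattr(self, "__dict__", {})) or None
        # On load, pickle calls type(self)(*args); if state is not None it then
        # calls __setstate__(state), or updates __dict__ if there is no __setstate__.
        return (type(self), args, state)


class Account(InitArgsPickleMixin):
    # Slots stand in for C++ member data, which never appears in __dict__.
    __slots__ = ("owner", "balance", "__dict__")

    def __init__(self, owner, balance=0):
        self.owner = owner
        self.balance = balance

    def __getinitargs__(self):
        # This tuple becomes the constructor arguments when unpickling.
        return (self.owner, self.balance)


a = Account("alice", 100)
a.note = "added by the user"  # lands in the instance __dict__
b = pickle.loads(pickle.dumps(a))
print(b.owner, b.balance, b.note)  # prints: alice 100 added by the user
</pre>

Because owner and balance live in slots, they survive only because
__getinitargs__ passes them back to the constructor; the user-added note
travels in the default __dict__ state.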

If both __getstate__ and __setstate__ are defined, the Python object
returned by __getstate__ need not be a dictionary. The two methods are
free to use any picklable representation of the state, as long as
__setstate__ can interpret what __getstate__ returns.
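<p>
In standard Python 3, pickle honors __getstate__ and __setstate__
directly, so this freedom can be shown without any emulation. In the
sketch below (Interval is a hypothetical class) the state is a plain
tuple rather than a dictionary. Note that Python 3 creates the restored
object without calling __init__ before passing the tuple to
__setstate__, which differs from the constructor call in the BPL
protocol.

<pre>
import pickle


class Interval:
    """Hypothetical class whose pickled state is a tuple, not a dictionary."""

    def __init__(self, lo, hi):
        self.lo = lo
        self.hi = hi

    def __getstate__(self):
        # Any picklable object will do, as long as __setstate__ understands it.
        return (self.lo, self.hi)

    def __setstate__(self, state):
        # Unpack exactly what __getstate__ produced.
        self.lo, self.hi = state


r = pickle.loads(pickle.dumps(Interval(2, 5)))
print(r.lo, r.hi)  # prints: 2 5
</pre>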

<hr>
<h1>Pitfalls and Safety Guards</h1>

In BPL extension modules with many extension classes, providing
complete pickle support for all classes would add significant
overhead. In general, complete pickle support should only be implemented
for extension classes that will eventually be pickled. However, the
author of a BPL extension module might not anticipate correctly which
classes need pickle support. Unfortunately, the pickle protocol
described above has two important pitfalls that the end user of a BPL
extension module might not be aware of:

<dl>
<dt>
<strong>Pitfall 1:</strong>
Neither __getinitargs__ nor __getstate__ is defined.

<dd>
  In this situation the unpickler calls the class constructor without
  arguments and then adds the __dict__ that was pickled by default to
  that of the new instance.

  <p>
  However, most C++ classes wrapped with the BPL will have member data
  that are not restored correctly by this procedure. To alert the user
  to this problem, a safety guard is provided. If neither __getinitargs__
  nor __getstate__ is defined, the BPL checks whether the class has an
  attribute __dict_defines_state__. If this attribute is not defined,
  an exception is raised:

<pre>
  RuntimeError: Incomplete pickle support (__dict_defines_state__ not set)
</pre>

  In the rare cases where this is not the desired behavior, the safety
  guard can be disabled deliberately. For example, in C++:

<pre>
  class_builder&lt;your_class&gt; py_your_class(your_module, "your_class");
  // Override the __dict_defines_state__ safety guard.
  py_your_class.dict_defines_state();
</pre>

  The safety guard can also be overridden at the Python level, e.g.:

<pre>
  import your_bpl_module

  class your_class(your_bpl_module.your_class):
      __dict_defines_state__ = 1
</pre>

<p>
<dt>
<strong>Pitfall 2:</strong>
__getstate__ is defined and the instance's __dict__ is not empty.

<dd>
  The author of a BPL extension class might provide a __getstate__
  method without considering the possibility that:

  <p>
  <ul>
  <li>
  the class is used as a base class. Most likely the __dict__ of
  instances of the derived class needs to be pickled in order to
  restore the instances correctly.

  <p>
  <li>
  the user adds items to the instance's __dict__ directly. Again,
  the __dict__ of the instance then needs to be pickled.
  </ul>
  <p>

  To alert the user to this far from obvious problem, a safety guard is
  provided. If __getstate__ is defined and the instance's __dict__ is
  not empty, the BPL checks whether the class has an attribute
  __getstate_manages_dict__. If this attribute is not defined, an
  exception is raised:

<pre>
  RuntimeError: Incomplete pickle support (__getstate_manages_dict__ not set)
</pre>

  To resolve this problem, first make sure that the __getstate__ and
  __setstate__ methods manage the instance's __dict__ correctly. Note
  that this can be done at either the C++ or the Python level. Then
  override the safety guard deliberately, e.g. in C++:

<pre>
  class_builder&lt;your_class&gt; py_your_class(your_module, "your_class");
  // Override the __getstate_manages_dict__ safety guard.
  py_your_class.getstate_manages_dict();
</pre>

  In Python:

<pre>
  import your_bpl_module

  class your_class(your_bpl_module.your_class):
      __getstate_manages_dict__ = 1

      def __getstate__(self):
          # Your code here: return the state, including self.__dict__.
          raise NotImplementedError

      def __setstate__(self, state):
          # Your code here: restore the state, including self.__dict__.
          raise NotImplementedError
</pre>
</dl>
| 203 | +</dl> |
| 204 | + |
| 205 | +<hr> |
| 206 | +<h1>Practical Advice</h1> |
| 207 | + |
| 208 | +<ul> |
| 209 | +<li> |
| 210 | + Avoid using __getstate__ if the instance can also be reconstructed |
| 211 | + by way of __getinitargs__. This automatically avoids Pitfall 2. |
| 212 | + |
| 213 | +<p> |
| 214 | +<li> |
| 215 | + If __getstate__ is required, include the instance's __dict__ in the |
| 216 | + Python object that is returned. |
| 217 | +</ul> |
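<p>
The second piece of advice can be sketched in plain Python 3, where
pickle calls __getstate__ and __setstate__ directly. In this
hypothetical example, Matrix uses __slots__ to stand in for C++ member
data, and its __getstate__ returns that data together with a copy of
the instance __dict__, so that both an attribute of a derived class and
an attribute added by the user survive the round trip.

<pre>
import pickle


class Matrix:
    """Hypothetical stand-in for a wrapped C++ class."""

    # Slots stand in for C++ member data, which is not stored in __dict__.
    __slots__ = ("rows", "cols", "__dict__")
    # Mirrors the BPL convention only; plain Python pickle ignores it.
    __getstate_manages_dict__ = 1

    def __init__(self, rows=0, cols=0):
        self.rows = rows
        self.cols = cols

    def __getstate__(self):
        # Include a copy of the instance __dict__ alongside the member data.
        return (self.rows, self.cols, dict(self.__dict__))

    def __setstate__(self, state):
        self.rows, self.cols, d = state
        self.__dict__.update(d)


class LabeledMatrix(Matrix):
    def __init__(self, rows=0, cols=0, label=""):
        Matrix.__init__(self, rows, cols)
        self.label = label  # lives in the __dict__ of the derived instance


m = LabeledMatrix(2, 3, "weights")
m.comment = "added directly by the user"
m2 = pickle.loads(pickle.dumps(m))
print(m2.rows, m2.cols, m2.label, m2.comment)
# prints: 2 3 weights added directly by the user
</pre>

Had __getstate__ returned only (self.rows, self.cols), label and comment
would be silently lost on unpickling, which is exactly Pitfall 2.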

<hr>
<address>
Author: Ralf W. Grosse-Kunstleve, March 2001
</address>
</html>