# Extensions

OpenML-Python provides an extension interface to connect machine
learning libraries other than scikit-learn to OpenML. Please check the
`api_extensions`{.interpreted-text role="ref"} and use the scikit-learn
extension in
`openml.extensions.sklearn.SklearnExtension`{.interpreted-text
role="class"} as a starting point.

## List of extensions

Here is a list of currently maintained OpenML extensions:

- `openml.extensions.sklearn.SklearnExtension`{.interpreted-text
  role="class"}
- [openml-keras](https://github.com/openml/openml-keras)
- [openml-pytorch](https://github.com/openml/openml-pytorch)
- [openml-tensorflow (for tensorflow
  2+)](https://github.com/openml/openml-tensorflow)

## Connecting new machine learning libraries

### Content of the Library

To leverage support from the community and to tap into the potential of
OpenML, interfacing with popular machine learning libraries is
essential. The OpenML-Python package is capable of downloading meta-data
and results (data, flows, runs), regardless of the library that was used
to upload them. However, in order to simplify the process of uploading
flows and runs from a specific library, an additional interface can be
built. The OpenML-Python team does not have the capacity to develop and
maintain such interfaces on its own. For this reason, we have built an
extension interface that allows others to contribute back. Building a
suitable extension therefore requires an understanding of the current
OpenML-Python support.

The
`sphx_glr_examples_20_basic_simple_flows_and_runs_tutorial.py`{.interpreted-text
role="ref"} tutorial shows how scikit-learn currently works with
OpenML-Python as an extension. The *sklearn* extension packaged with the
[openml-python](https://github.com/openml/openml-python) repository can
be used as a template/benchmark to build the new extension.

#### API

- The extension scripts must import the [openml]{.title-ref} package
  and be able to interface with any function from the OpenML-Python
  `api`{.interpreted-text role="ref"}.
- The extension has to be defined as a Python class and must inherit
  from `openml.extensions.Extension`{.interpreted-text role="class"}.
- This class needs to override the functions from [class
  Extension]{.title-ref} as required.
- The redefined functions should have adequate and appropriate
  docstrings. The
  `openml.extensions.sklearn.SklearnExtension`{.interpreted-text
  role="class"} API is a good example to follow.

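As a sketch of these requirements, a minimal extension class could look like the following. All names containing `mylibrary` are placeholders, and a stand-in base class replaces `openml.extensions.Extension` so the snippet stays self-contained:

```python
# Sketch of the class-level requirements above. In a real extension you
# would inherit from openml.extensions.Extension; a stand-in base class
# is used here so the snippet is self-contained.
class Extension:  # stand-in for openml.extensions.Extension
    pass


class MyLibraryExtension(Extension):
    """Hypothetical extension connecting `mylibrary` to OpenML."""

    @classmethod
    def can_handle_flow(cls, flow) -> bool:
        """Accept only flows whose dependency field names `mylibrary`."""
        return "mylibrary" in (getattr(flow, "dependencies", "") or "")

    @classmethod
    def can_handle_model(cls, model) -> bool:
        """Accept only objects defined in the `mylibrary` package."""
        return type(model).__module__.split(".")[0] == "mylibrary"
```

The remaining methods of the interface (serialization, running models, and so on) would be overridden on this class in the same way.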
#### Interfacing with OpenML-Python

Once the new extension class has been defined,
`openml.extensions.register_extension`{.interpreted-text role="meth"}
must be called to allow OpenML-Python to interface with the new
extension.

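The registration mechanism can be sketched as follows. This is stand-in code kept self-contained for illustration; the real function is `openml.extensions.register_extension`, which roughly appends the class to a registry that OpenML-Python scans when matching a flow or model to an extension:

```python
# Stand-in sketch of the registration mechanism (illustrative, not the
# real openml.extensions module).
extensions = []  # stand-in for the registry inside openml.extensions


def register_extension(extension_cls):
    """Add an extension class to the registry (mirrors the real API)."""
    extensions.append(extension_cls)


def get_extension_by_model(model):
    """Return an instance of the first extension that accepts `model`."""
    for extension_cls in extensions:
        if extension_cls.can_handle_model(model):
            return extension_cls()
    return None
```

This is why `can_handle_flow` and `can_handle_model` (described below) matter: they are how OpenML-Python decides which registered extension to use.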
The following methods should be implemented. Although the documentation
in the [Extension]{.title-ref} interface should always be leading, here
we list some additional information and best practices. The
`openml.extensions.sklearn.SklearnExtension`{.interpreted-text
role="class"} API is a good example to follow. Note that most methods
are relatively simple and can be implemented in a few lines of code.

- General setup (required)
    - `can_handle_flow`{.interpreted-text role="meth"}: Takes as
      argument an OpenML flow, and checks whether it can be handled
      by the current extension. The OpenML database consists of many
      flows, from various workbenches (e.g., scikit-learn, Weka, mlr).
      This method is called before a model is deserialized.
      Typically, the flow-dependency field is used to check whether
      the specific library is present, and no unknown libraries are
      present there.
    - `can_handle_model`{.interpreted-text role="meth"}: Similar to
      `can_handle_flow`{.interpreted-text role="meth"}, except that in
      this case a Python object is given. As such, in many cases, this
      method can be implemented by checking whether the object adheres
      to a certain base class.
- Serialization and De-serialization (required)
    - `flow_to_model`{.interpreted-text role="meth"}: Deserializes the
      OpenML flow into a model (if the library can indeed handle the
      flow). This method has an important interplay with
      `model_to_flow`{.interpreted-text role="meth"}. Running these
      two methods in succession should result in exactly the same
      model (or flow). This property can be used for unit testing
      (e.g., build a model with hyperparameters, make predictions on a
      task, serialize it to a flow, deserialize it back, make it
      predict on the same task, and check whether the predictions are
      exactly the same). The example in the scikit-learn interface
      might seem daunting, but note that some complicated design
      choices were made there that allow for all sorts of interesting
      research questions. It is probably good practice to start easy.
    - `model_to_flow`{.interpreted-text role="meth"}: The inverse of
      `flow_to_model`{.interpreted-text role="meth"}. Serializes a
      model into an OpenML flow. The flow should preserve the class,
      the library version, and the tunable hyperparameters.
    - `get_version_information`{.interpreted-text role="meth"}: Returns
      a tuple with the version information of the important libraries.
    - `create_setup_string`{.interpreted-text role="meth"}: No longer
      used, and will be deprecated soon.
- Performing runs (required)
    - `is_estimator`{.interpreted-text role="meth"}: Gets as input a
      class, and checks whether it has the status of estimator in the
      library (typically, whether it has a train method and a predict
      method).
    - `seed_model`{.interpreted-text role="meth"}: Sets a random seed
      on the model.
    - `_run_model_on_fold`{.interpreted-text role="meth"}: One of the
      main requirements for a library to generate run objects for the
      OpenML server. Obtains a train split (with labels) and a test
      split (without labels); the goal is to train a model on the
      train split and return the predictions on the test split. On top
      of the actual predictions, the class probabilities should also
      be determined. For classifiers that do not return class
      probabilities, this can just be the one-hot-encoded predicted
      label. The predictions will be evaluated on the OpenML server.
      Additional information can also be returned, for example,
      user-defined measures (such as runtime information, as this
      cannot be inferred on the server). Additionally, information
      about a hyperparameter optimization trace can be provided.
    - `obtain_parameter_values`{.interpreted-text role="meth"}:
      Obtains the hyperparameters of a given model and their current
      values. Please note that in the case of a hyperparameter
      optimization procedure (e.g., random search), you should only
      return the hyperparameters of this procedure (e.g., the
      hyperparameter grid, budget, etc.) and that the chosen model
      will be inferred from the optimization trace.
    - `check_if_model_fitted`{.interpreted-text role="meth"}: Checks
      whether the train method of the model has been called (and as
      such, whether the predict method can be used).
- Hyperparameter optimization (optional)
    - `instantiate_model_from_hpo_class`{.interpreted-text
      role="meth"}: If a given run has recorded the hyperparameter
      optimization trace, then this method can be used to
      reinstantiate the model with the hyperparameters of a given
      hyperparameter optimization iteration. It has some similarities
      with `flow_to_model`{.interpreted-text role="meth"} (as this
      method also sets the hyperparameters of a model). Note that
      although this method must be defined, it is not necessary to
      implement any logic if hyperparameter optimization is not
      supported; simply raise a [NotImplementedError]{.title-ref} in
      that case.

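The serialization round-trip property described above can be illustrated with a toy sketch. `ToyModel`, `model_to_flow`, and `flow_to_model` below are illustrative stand-ins, not the real OpenML-Python API:

```python
from dataclasses import dataclass


# Toy illustration of the flow_to_model / model_to_flow round-trip
# property: serializing a model and deserializing the result should
# reproduce the model exactly.
@dataclass
class ToyModel:
    n_estimators: int = 10
    learning_rate: float = 0.1


def model_to_flow(model: ToyModel) -> dict:
    # Preserve the class identity and the tunable hyperparameters.
    return {
        "class_name": type(model).__qualname__,
        "parameters": {
            "n_estimators": model.n_estimators,
            "learning_rate": model.learning_rate,
        },
    }


def flow_to_model(flow: dict) -> ToyModel:
    return ToyModel(**flow["parameters"])


model = ToyModel(n_estimators=50, learning_rate=0.05)
assert flow_to_model(model_to_flow(model)) == model  # round-trip holds
```

A unit test built on this property (serialize, deserialize, compare predictions) is a cheap way to keep an extension honest as the underlying library evolves.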
### Hosting the library

Each extension created should be a stand-alone repository, compatible
with the [OpenML-Python
repository](https://github.com/openml/openml-python). The extension
repository should work off-the-shelf with *OpenML-Python* installed.

Create a [public GitHub
repo](https://docs.github.com/en/github/getting-started-with-github/create-a-repo)
with the following directory structure:

    | [repo name]
    | |-- [extension name]
    | | |-- __init__.py
    | | |-- extension.py
    | | |-- config.py (optional)

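As a sketch, the layout above could be scaffolded like this. The repo and package names are placeholders for `[repo name]` and `[extension name]`, and `MyLibraryExtension` is a hypothetical class name; the generated `__init__.py` shows the typical registration wiring:

```python
from pathlib import Path

# Scaffold the directory layout above ("openml-mylibrary" and
# "openml_mylibrary" are placeholder names).
pkg = Path("openml-mylibrary") / "openml_mylibrary"
pkg.mkdir(parents=True, exist_ok=True)
(pkg / "extension.py").touch()

# A typical __init__.py registers the extension as soon as the
# package is imported.
(pkg / "__init__.py").write_text(
    "from openml.extensions import register_extension\n"
    "\n"
    "from .extension import MyLibraryExtension\n"
    "\n"
    "register_extension(MyLibraryExtension)\n"
)
```

Registering on import means users only need to `import` the extension package for OpenML-Python to start resolving that library's flows and models.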
### Recommended

- Test cases to keep the extension up to date with upstream
  [openml-python]{.title-ref} changes.
- Documentation of the extension API, especially if any new
  functionality is added to OpenML-Python's extension design.
- Examples to show how the new extension interfaces and works with
  OpenML-Python.
- Create a PR to add the new extension to the OpenML-Python API
  documentation.

Happy contributing!