Extend extensions page (#1080)
* started working on additional information for extension

* extended documentation

* final pass over extensions

* Update doc/extensions.rst

Co-authored-by: Matthias Feurer <feurerm@informatik.uni-freiburg.de>

* Update doc/extensions.rst

Co-authored-by: Matthias Feurer <feurerm@informatik.uni-freiburg.de>

* changes suggested by MF

* Update doc/extensions.rst

Co-authored-by: PGijsbers <p.gijsbers@tue.nl>

* Update doc/extensions.rst

Co-authored-by: PGijsbers <p.gijsbers@tue.nl>

* Update doc/extensions.rst

Co-authored-by: PGijsbers <p.gijsbers@tue.nl>

* added info to optional method

* fix documentation building

* updated doc

Co-authored-by: Matthias Feurer <feurerm@informatik.uni-freiburg.de>
Co-authored-by: PGijsbers <p.gijsbers@tue.nl>
3 people authored May 18, 2021
commit 79e647df81e98e41ab4e65a27f928e3e328db4ed
6 changes: 1 addition & 5 deletions doc/contributing.rst
@@ -25,8 +25,4 @@ In particular, a few ways to contribute to openml-python are:

* Visit one of our `hackathons <https://meet.openml.org/>`_.

* Contribute to another OpenML project, such as `the main OpenML project <https://github.com/openml/OpenML/blob/main/CONTRIBUTING.md>`_.

.. _extensions:


* Contribute to another OpenML project, such as `the main OpenML project <https://github.com/openml/OpenML/blob/master/CONTRIBUTING.md>`_.
86 changes: 82 additions & 4 deletions doc/extensions.rst
@@ -27,9 +27,14 @@ Connecting new machine learning libraries
Content of the Library
~~~~~~~~~~~~~~~~~~~~~~

To leverage support from the community and to tap in the potential of OpenML, interfacing
with popular machine learning libraries is essential. However, the OpenML-Python team does
not have the capacity to develop and maintain such interfaces on its own. For this, we
To leverage support from the community and to tap into the potential of OpenML,
interfacing with popular machine learning libraries is essential.
The OpenML-Python package is capable of downloading meta-data and results (data,
flows, runs), regardless of the library that was used to upload them.
However, in order to simplify the process of uploading flows and runs from a
specific library, an additional interface can be built.
The OpenML-Python team does not have the capacity to develop and maintain such
interfaces on its own. For this reason, we
have built an extension interface that allows others to contribute back. Building a suitable
extension therefore requires an understanding of the current OpenML-Python support.

@@ -48,7 +53,7 @@ API
* This class needs to have all the functions from `class Extension` overloaded as required.
* The redefined functions should have adequate and appropriate docstrings. The
:class:`~openml.extensions.sklearn.SklearnExtension` API
is a good benchmark to follow.
is a good example to follow.
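As a starting point, such a class might look like the following sketch. Everything here is illustrative: `MyLibraryExtension` and the `mylib` checks are invented, and the real base class, `openml.extensions.Extension`, is stubbed out so the snippet stays self-contained.

```python
# Hypothetical skeleton of a new extension class.  In real code this
# would subclass openml.extensions.Extension; a stand-in base class is
# used here so the sketch runs on its own.
class Extension:  # stand-in for openml.extensions.Extension
    pass


class MyLibraryExtension(Extension):
    """Extension for a hypothetical machine learning library ``mylib``."""

    @classmethod
    def can_handle_flow(cls, flow):
        """Check whether the flow's dependency field mentions ``mylib``."""
        return "mylib" in getattr(flow, "dependencies", "")

    @classmethod
    def can_handle_model(cls, model):
        """Check whether the model object comes from ``mylib``."""
        return type(model).__module__.startswith("mylib")
```

Each overloaded method carries a docstring, as recommended above; the remaining required methods would be added in the same style.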


Interfacing with OpenML-Python
@@ -57,6 +62,79 @@ Once the new extension class has been defined, the method
:meth:`openml.extensions.register_extension` must be called to allow OpenML-Python to
interface with the new extension.
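The registration step itself is a single call. The sketch below mimics the registry-and-dispatch mechanism with stand-in functions, since the real entry point is :meth:`openml.extensions.register_extension`; all names below (`extensions`, `get_extension_by_flow`, `FakeSklearnExtension`) are simplified illustrations, not the actual implementation.

```python
# Illustrative stand-in for how extension registration and dispatch work.
# In real code you would call openml.extensions.register_extension(...);
# this mock only demonstrates the mechanism.
extensions = []  # stand-in for the global extension registry


def register_extension(extension_cls):
    """Add an extension class to the registry."""
    extensions.append(extension_cls)


def get_extension_by_flow(flow):
    """Return an instance of the first registered extension that can handle the flow."""
    for extension_cls in extensions:
        if extension_cls.can_handle_flow(flow):
            return extension_cls()
    return None


class FakeSklearnExtension:
    @classmethod
    def can_handle_flow(cls, flow):
        # Typically the flow's dependency field is inspected.
        return "sklearn" in flow.get("dependencies", "")


register_extension(FakeSklearnExtension)
flow = {"name": "sklearn.tree.DecisionTreeClassifier",
        "dependencies": "sklearn>=0.24"}
assert isinstance(get_extension_by_flow(flow), FakeSklearnExtension)
```

This is why :meth:`can_handle_flow` must be conservative: the first extension that claims a flow will be used to deserialize it.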

The following methods should be implemented. Although the documentation of
the `Extension` interface is always leading, we list some additional
information and best practices here.
The :class:`~openml.extensions.sklearn.SklearnExtension` API
is a good example to follow. Note that most methods are relatively simple and can be implemented in a few lines of code.

* General setup (required)

* :meth:`can_handle_flow`: Takes an OpenML flow as argument and checks
whether it can be handled by the current extension. The OpenML database
contains flows from various workbenches (e.g., scikit-learn, Weka,
mlr). This method is called before a model is deserialized.
Typically, the flow's dependency field is checked to verify that the
specific library is listed and that no unknown libraries are present.
* :meth:`can_handle_model`: Similar to :meth:`can_handle_flow`, except that
in this case a Python object is given. In many cases, this method can be
implemented by checking whether the object inherits from a certain base class.
* Serialization and De-serialization (required)

* :meth:`flow_to_model`: deserializes the OpenML Flow into a model (if the
library can indeed handle the flow). This method has an important interplay
with :meth:`model_to_flow`.
Running these two methods in succession should result in exactly the same
model (or flow). This property can be used for unit testing (e.g., build a
model with hyperparameters, make predictions on a task, serialize it to a flow,
deserialize it back, make it predict on the same task, and check whether the
predictions are exactly the same.)
The example in the scikit-learn interface might seem daunting, but note that
some complicated design choices were made there that allow for all sorts of
interesting research questions. It is good practice to start simple.
* :meth:`model_to_flow`: The inverse of :meth:`flow_to_model`. Serializes a
model into an OpenML Flow. The flow should preserve the class, the library
version, and the tunable hyperparameters.
* :meth:`get_version_information`: Return a tuple with the version information
of the important libraries.
* :meth:`create_setup_string`: No longer used, and will be deprecated soon.
* Performing runs (required)

* :meth:`is_estimator`: Takes a class as input and checks whether it has the
status of an estimator in the library (typically, whether it has a train method
and a predict method).
* :meth:`seed_model`: Sets a random seed on the model.
* :meth:`_run_model_on_fold`: One of the main requirements for a library to
generate run objects for the OpenML server. Obtains a train split (with
labels) and a test split (without labels) and the goal is to train a model
on the train split and return the predictions on the test split.
In addition to the actual predictions, the class probabilities should also be
determined.
For classifiers that do not return class probabilities, these can be the
one-hot-encoded predicted labels.
The predictions will be evaluated on the OpenML server.
Additional information can also be returned, for example, user-defined
measures (such as runtime information, as this cannot be inferred on the
server).
Additionally, information about a hyperparameter optimization trace can be
provided.
* :meth:`obtain_parameter_values`: Obtains the hyperparameters of a given
model and their current values. Please note that in the case of a hyperparameter
optimization procedure (e.g., random search), you should only return the
hyperparameters of this procedure (e.g., the hyperparameter grid, budget,
etc.); the chosen model will be inferred from the optimization trace.
* :meth:`check_if_model_fitted`: Check whether the train method of the model
has been called (and as such, whether the predict method can be used).
* Hyperparameter optimization (optional)

* :meth:`instantiate_model_from_hpo_class`: If a given run has recorded the
hyperparameter optimization trace, then this method can be used to
reinstantiate the model with hyperparameters of a given hyperparameter
optimization iteration. Has some similarities with :meth:`flow_to_model` (as
this method also sets the hyperparameters of a model).
Note that although this method must be present, it is not necessary to implement
any logic if hyperparameter optimization is not supported; in that case, simply
raise a `NotImplementedError`.
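The serialization round-trip property described above (model to flow and back yields an equivalent model) is the backbone of extension unit tests. The following self-contained illustration uses an invented `ToyModel` and toy dictionary-based (de)serialization standing in for :meth:`model_to_flow` and :meth:`flow_to_model`; a real extension would serialize to and from OpenML flow objects.

```python
# Toy illustration of the model -> flow -> model round-trip property.
# ToyModel, model_to_flow, and flow_to_model are invented for this sketch.
class ToyModel:
    def __init__(self, max_depth=3, learning_rate=0.1):
        self.max_depth = max_depth
        self.learning_rate = learning_rate

    def get_params(self):
        return {"max_depth": self.max_depth,
                "learning_rate": self.learning_rate}


def model_to_flow(model):
    """Serialize: preserve the class name and the tunable hyperparameters."""
    return {"class": type(model).__name__, "parameters": model.get_params()}


def flow_to_model(flow):
    """Deserialize: rebuild the model from the stored hyperparameters."""
    return ToyModel(**flow["parameters"])


model = ToyModel(max_depth=5, learning_rate=0.01)
roundtrip = flow_to_model(model_to_flow(model))
assert roundtrip.get_params() == model.get_params()
```

Running both directions in succession and comparing hyperparameters (or, more thoroughly, predictions on the same task) gives a cheap, library-independent correctness check for a new extension.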

Hosting the library
~~~~~~~~~~~~~~~~~~~
4 changes: 3 additions & 1 deletion doc/usage.rst
@@ -77,7 +77,9 @@ Docker

It is also possible to try out the latest development version of ``openml-python`` with docker:

``docker run -it openml/openml-python``
.. code:: bash

docker run -it openml/openml-python

See the `openml-python docker documentation <https://github.com/openml/openml-python/blob/main/docker/readme.md>`_ for more information.
