
Commit d0f31b9

Docs/mkdoc (openml#1379)
Co-authored-by: SubhadityaMukherjee <msubhaditya@gmail.com> Co-authored-by: Subhaditya Mukherjee <26865436+SubhadityaMukherjee@users.noreply.github.com>
1 parent c66d22a commit d0f31b9

35 files changed

Lines changed: 2025 additions & 846 deletions

.github/workflows/docs.yaml

Lines changed: 35 additions & 44 deletions
@@ -22,48 +22,39 @@ jobs:
2222
build-and-deploy:
2323
runs-on: ubuntu-latest
2424
steps:
25-
- uses: actions/checkout@v4
26-
- name: Setup Python
27-
uses: actions/setup-python@v5
28-
with:
29-
python-version: 3.8
30-
- name: Install dependencies
31-
run: |
25+
- uses: actions/checkout@v4
26+
- name: Setup Python
27+
uses: actions/setup-python@v5
28+
with:
29+
python-version: 3.8
30+
- name: Install dependencies
31+
run: |
3232
pip install -e .[docs,examples]
33-
- name: Make docs
34-
run: |
35-
cd doc
36-
make html
37-
- name: Check links
38-
run: |
39-
cd doc
40-
make linkcheck
41-
- name: Pull latest gh-pages
42-
if: (contains(github.ref, 'develop') || contains(github.ref, 'main')) && github.event_name == 'push'
43-
run: |
44-
cd ..
45-
git clone https://github.com/openml/openml-python.git --branch gh-pages --single-branch gh-pages
46-
- name: Copy new doc into gh-pages
47-
if: (contains(github.ref, 'develop') || contains(github.ref, 'main')) && github.event_name == 'push'
48-
run: |
49-
branch_name=${GITHUB_REF##*/}
50-
cd ../gh-pages
51-
rm -rf $branch_name
52-
cp -r ../openml-python/doc/build/html $branch_name
53-
- name: Push to gh-pages
54-
if: (contains(github.ref, 'develop') || contains(github.ref, 'main')) && github.event_name == 'push'
55-
run: |
56-
last_commit=$(git log --pretty=format:"%an: %s")
57-
cd ../gh-pages
58-
branch_name=${GITHUB_REF##*/}
59-
git add $branch_name/
60-
git config --global user.name 'Github Actions'
61-
git config --global user.email 'not@mail.com'
62-
git remote set-url origin https://x-access-token:${{ secrets.GITHUB_TOKEN }}@github.com/${{ github.repository }}
63-
# Only commit and push if there are changes
64-
if ! git diff --cached --quiet; then
65-
git commit -m "$last_commit"
66-
git push
67-
else
68-
echo "Branch is up to date with origin/gh-pages, no need to update docs. Skipping."
69-
fi
33+
- name: Make docs
34+
run: |
35+
mkdocs build
36+
- name: Deploy to GitHub Pages
37+
env:
38+
CI: false
39+
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
40+
PAGES_BRANCH: gh-pages
41+
if: (contains(github.ref, 'develop') || contains(github.ref, 'main')) && github.event_name == 'push'
42+
run: |
43+
# mkdocs gh-deploy --force
44+
git config user.name doc-bot
45+
git config user.email doc-bot@openml.com
46+
current_version=$(git tag | sort --version-sort | tail -n 1)
47+
# This block will rename previous retitled versions
48+
retitled_versions=$(mike list -j | jq ".[] | select(.title != .version) | .version" | tr -d '"')
49+
for version in $retitled_versions; do
50+
mike retitle "${version}" "${version}"
51+
done
52+
53+
echo "Deploying docs for ${current_version}"
54+
mike deploy \
55+
--push \
56+
--title "${current_version} (latest)" \
57+
--update-aliases \
58+
"${current_version}" \
59+
"latest" \
60+
-b $PAGES_BRANCH origin/$PAGES_BRANCH
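
The version selection and retitle filter in the deploy step above can be hard to read as a shell pipeline. The following is a minimal Python sketch of the same two ideas, not part of the workflow itself. The helper names (`version_key`, `latest_version`, `retitled_versions`), the tag values, and the sample entries are all made up for illustration; the only fields assumed from `mike list -j` are the `version` and `title` keys that the jq filter already uses. The numeric sort key only approximates `sort --version-sort` and will not order pre-release suffixes the same way.

```python
import re


def version_key(tag):
    """Sort key built from the numeric parts of a tag, e.g. 'v0.15.1' -> (0, 15, 1)."""
    return tuple(int(part) for part in re.findall(r"\d+", tag))


def latest_version(tags):
    """Rough stand-in for `git tag | sort --version-sort | tail -n 1`."""
    return max(tags, key=version_key)


def retitled_versions(entries):
    """Stand-in for the jq filter: keep versions whose title differs from the version."""
    return [entry["version"] for entry in entries if entry["title"] != entry["version"]]


# Hypothetical inputs, only for illustration.
tags = ["v0.9.0", "v0.14.2", "v0.10.1"]
entries = [
    {"version": "v0.14.2", "title": "v0.14.2 (latest)"},
    {"version": "v0.10.1", "title": "v0.10.1"},
]

print(latest_version(tags))        # v0.14.2
print(retitled_versions(entries))  # ['v0.14.2']
```

A plain string sort would place `v0.9.0` after `v0.14.2`, which is why both the workflow and this sketch compare version components numerically.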

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -2,6 +2,7 @@
22
doc/generated
33
examples/.ipynb_checkpoints
44
venv
5+
.uv-lock
56

67
# Byte-compiled / optimized / DLL files
78
__pycache__/

docs/contributing.md

Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
1+
# Contributing
2+
3+
Contribution to the OpenML package is highly appreciated in all forms.
4+
In particular, a few ways to contribute to openml-python are:
5+
6+
- A direct contribution to the package, by means of improving the
7+
code, documentation or examples. To get started, see [this
8+
file](https://github.com/openml/openml-python/blob/main/CONTRIBUTING.md)
9+
with details on how to set up your environment to develop for
10+
openml-python.
11+
- A contribution to an openml-python extension. An extension package
12+
allows OpenML to interface with a machine learning package (such
13+
as scikit-learn or keras). These extensions are hosted in separate
14+
repositories and may have their own guidelines. For more
15+
information, see also [extensions](extensions.md).
16+
- Bug reports. If something doesn't work for you or is cumbersome,
17+
please open a new issue to let us know about the problem. See
18+
[this
19+
section](https://github.com/openml/openml-python/blob/main/CONTRIBUTING.md).
20+
- [Cite OpenML](https://www.openml.org/cite) if you use it in a
21+
scientific publication.
22+
- Visit one of our [hackathons](https://www.openml.org/meet).
23+
- Contribute to another OpenML project, such as [the main OpenML
24+
project](https://github.com/openml/OpenML/blob/master/CONTRIBUTING.md).

docs/extensions.md

Lines changed: 179 additions & 0 deletions
@@ -0,0 +1,179 @@
1+
# Extensions
2+
3+
OpenML-Python provides an extension interface to connect machine
4+
learning libraries other than scikit-learn to OpenML. Please check the
5+
`api_extensions`{.interpreted-text role="ref"} and use the scikit-learn
6+
extension in
7+
`openml.extensions.sklearn.SklearnExtension`{.interpreted-text
8+
role="class"} as a starting point.
9+
10+
## List of extensions
11+
12+
Here is a list of currently maintained OpenML extensions:
13+
14+
- `openml.extensions.sklearn.SklearnExtension`{.interpreted-text
15+
role="class"}
16+
- [openml-keras](https://github.com/openml/openml-keras)
17+
- [openml-pytorch](https://github.com/openml/openml-pytorch)
18+
- [openml-tensorflow (for tensorflow
19+
2+)](https://github.com/openml/openml-tensorflow)
20+
21+
## Connecting new machine learning libraries
22+
23+
### Content of the Library
24+
25+
To leverage support from the community and to tap into the potential of
26+
OpenML, interfacing with popular machine learning libraries is
27+
essential. The OpenML-Python package is capable of downloading meta-data
28+
and results (data, flows, runs), regardless of the library that was used
29+
to upload it. However, in order to simplify the process of uploading
30+
flows and runs from a specific library, an additional interface can be
31+
built. The OpenML-Python team does not have the capacity to develop and
32+
maintain such interfaces on its own. For this reason, we have built an
33+
extension interface to allow others to contribute back. Building a
34+
suitable extension therefore requires an understanding of the
35+
current OpenML-Python support.
36+
37+
The
38+
`sphx_glr_examples_20_basic_simple_flows_and_runs_tutorial.py`{.interpreted-text
39+
role="ref"} tutorial shows how scikit-learn currently works with
40+
OpenML-Python as an extension. The *sklearn* extension packaged with the
41+
[openml-python](https://github.com/openml/openml-python) repository can
42+
be used as a template/benchmark to build the new extension.
43+
44+
#### API
45+
46+
- The extension scripts must import the [openml]{.title-ref} package
47+
and be able to interface with any function from the OpenML-Python
48+
`api`{.interpreted-text role="ref"}.
49+
- The extension has to be defined as a Python class and must inherit
50+
from `openml.extensions.Extension`{.interpreted-text role="class"}.
51+
- This class needs to have all the functions from [class
52+
Extension]{.title-ref} overloaded as required.
53+
- The redefined functions should have adequate and appropriate
54+
docstrings. The [Sklearn Extension API
55+
:class:\`openml.extensions.sklearn.SklearnExtension.html]{.title-ref}
56+
is a good example to follow.
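
The inheritance requirement above can be illustrated with a small sketch. It deliberately does not import `openml`: `ExtensionSketch` is a hypothetical stand-in for `openml.extensions.Extension` with only two methods and simplified signatures, and `ToyModel`, `ToyExtension`, and the dict-shaped flow are invented for the example. It only shows why every abstract method needs an override, under the assumption that the real base class behaves like a Python abstract base class.

```python
from abc import ABC, abstractmethod


class ExtensionSketch(ABC):
    """Hypothetical stand-in for the real base class (two methods only)."""

    @abstractmethod
    def can_handle_flow(self, flow):
        """Return True if this extension can deserialize the given flow."""

    @abstractmethod
    def can_handle_model(self, model):
        """Return True if this extension can serialize the given model."""


class ToyModel:
    """Made-up base class of a made-up machine learning library."""


class ToyExtension(ExtensionSketch):
    def can_handle_flow(self, flow):
        # Assumption: the flow is a plain dict with a 'dependencies' string.
        return flow.get("dependencies", "").startswith("toylib")

    def can_handle_model(self, model):
        # Checking against a library base class is often enough.
        return isinstance(model, ToyModel)


class IncompleteExtension(ExtensionSketch):
    # can_handle_model is missing on purpose.
    def can_handle_flow(self, flow):
        return False


extension = ToyExtension()
print(extension.can_handle_model(ToyModel()))  # True

try:
    IncompleteExtension()
except TypeError:
    print("IncompleteExtension cannot be instantiated")
```

A real extension would subclass `openml.extensions.Extension` instead and follow the signatures documented there.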
57+
58+
#### Interfacing with OpenML-Python
59+
60+
Once the new extension class has been defined,
61+
`openml.extensions.register_extension`{.interpreted-text role="meth"}
62+
must be called to allow OpenML-Python to interface with the new extension.
63+
64+
The following methods should get implemented. Although the documentation
65+
in the [Extension]{.title-ref} interface should always be leading, here
66+
we list some additional information and best practices. The [Sklearn
67+
Extension API
68+
:class:\`openml.extensions.sklearn.SklearnExtension.html]{.title-ref} is
69+
a good example to follow. Note that most methods are relatively simple
70+
and can be implemented in several lines of code.
71+
72+
- General setup (required)
73+
- `can_handle_flow`{.interpreted-text role="meth"}: Takes as
74+
argument an OpenML flow, and checks whether this can be handled
75+
by the current extension. The OpenML database consists of many
76+
flows, from various workbenches (e.g., scikit-learn, Weka, mlr).
77+
This method is called before a model is deserialized.
78+
Typically, the flow-dependency field is used to check whether
79+
the specific library is present, and no unknown libraries are
80+
present there.
81+
- `can_handle_model`{.interpreted-text role="meth"}: Similar to
82+
`can_handle_flow`{.interpreted-text role="meth"}, except that in
83+
this case a Python object is given. As such, in many cases, this
84+
method can be implemented by checking whether this adheres to a
85+
certain base class.
86+
- Serialization and De-serialization (required)
87+
- `flow_to_model`{.interpreted-text role="meth"}: deserializes the
88+
OpenML Flow into a model (if the library can indeed handle the
89+
flow). This method has an important interplay with
90+
`model_to_flow`{.interpreted-text role="meth"}. Running these
91+
two methods in succession should result in exactly the same
92+
model (or flow). This property can be used for unit testing
93+
(e.g., build a model with hyperparameters, make predictions on a
94+
task, serialize it to a flow, deserialize it back, make it
95+
predict on the same task, and check whether the predictions are
96+
exactly the same.) The example in the scikit-learn interface
97+
might seem daunting, but note that here some complicated design
98+
choices were made, that allow for all sorts of interesting
99+
research questions. It is probably good practice to start easy.
100+
- `model_to_flow`{.interpreted-text role="meth"}: The inverse of
101+
`flow_to_model`{.interpreted-text role="meth"}. Serializes a
102+
model into an OpenML Flow. The flow should preserve the class,
103+
the library version, and the tunable hyperparameters.
104+
- `get_version_information`{.interpreted-text role="meth"}: Return
105+
a tuple with the version information of the important libraries.
106+
- `create_setup_string`{.interpreted-text role="meth"}: No longer
107+
used, and will be deprecated soon.
108+
- Performing runs (required)
109+
- `is_estimator`{.interpreted-text role="meth"}: Gets as input a
110+
class, and checks whether it has the status of estimator in the
111+
library (typically, whether it has a train method and a predict
112+
method).
113+
- `seed_model`{.interpreted-text role="meth"}: Sets a random seed
114+
to the model.
115+
- `_run_model_on_fold`{.interpreted-text role="meth"}: One of the
116+
main requirements for a library to generate run objects for the
117+
OpenML server. Obtains a train split (with labels) and a test
118+
split (without labels) and the goal is to train a model on the
119+
train split and return the predictions on the test split. On top
120+
of the actual predictions, also the class probabilities should
121+
be determined. For classifiers that do not return class
122+
probabilities, this can just be the one-hot encoded predicted label.
123+
The predictions will be evaluated on the OpenML server. Also,
124+
additional information can be returned, for example,
125+
user-defined measures (such as runtime information, as this
126+
cannot be inferred on the server). Additionally, information about
127+
a hyperparameter optimization trace can be provided.
128+
- `obtain_parameter_values`{.interpreted-text role="meth"}:
129+
Obtains the hyperparameters of a given model and the current
130+
values. Please note that in the case of a hyperparameter
131+
optimization procedure (e.g., random search), you should only
132+
return the hyperparameters of this procedure (e.g., the
133+
hyperparameter grid, budget, etc) and that the chosen model will
134+
be inferred from the optimization trace.
135+
- `check_if_model_fitted`{.interpreted-text role="meth"}: Check
136+
whether the train method of the model has been called (and as
137+
such, whether the predict method can be used).
138+
- Hyperparameter optimization (optional)
139+
- `instantiate_model_from_hpo_class`{.interpreted-text
140+
role="meth"}: If a given run has recorded the hyperparameter
141+
optimization trace, then this method can be used to
142+
reinstantiate the model with hyperparameters of a given
143+
hyperparameter optimization iteration. Has some similarities
144+
with `flow_to_model`{.interpreted-text role="meth"} (as this
145+
method also sets the hyperparameters of a model). Note that
146+
although this method is required, it is not necessary to
147+
implement any logic if hyperparameter optimization is not
148+
implemented. Simply raise a [NotImplementedError]{.title-ref}
149+
then.
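
Two points from the list above lend themselves to a short sketch: the round-trip property of `model_to_flow` and `flow_to_model`, and the one-hot fallback for classifiers without class probabilities. The code below is a toy illustration under stated assumptions and is not the scikit-learn extension's implementation. `ToyTree`, the dict-shaped flow, the `toylib==1.0` version string, and the `one_hot` helper are all hypothetical; a real extension would produce `OpenMLFlow` objects instead.

```python
from dataclasses import dataclass


@dataclass
class ToyTree:
    """Made-up model with two tunable hyperparameters."""
    depth: int = 3
    seed: int = 0


def model_to_flow(model):
    """Serialize the class, library version, and hyperparameters (toy dict flow)."""
    return {
        "class": type(model).__name__,
        "version": "toylib==1.0",  # assumed version string
        "parameters": {"depth": model.depth, "seed": model.seed},
    }


def flow_to_model(flow):
    """Inverse of model_to_flow for the toy flow format."""
    if flow["class"] != "ToyTree":
        raise ValueError(f"cannot handle flow for class {flow['class']}")
    return ToyTree(**flow["parameters"])


def one_hot(predictions, classes):
    """Turn hard label predictions into one-hot 'probability' rows."""
    return [[1.0 if label == cls else 0.0 for cls in classes] for label in predictions]


# Round trip: model -> flow -> model should give back an equal model,
# and serializing the restored model should give back an equal flow.
model = ToyTree(depth=5, seed=42)
flow = model_to_flow(model)
restored = flow_to_model(flow)
print(restored == model)               # True
print(model_to_flow(restored) == flow)  # True

print(one_hot(["b", "a"], ["a", "b", "c"]))
# [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
```

The same round-trip check can serve as a unit test for a real extension, comparing predictions before and after the round trip rather than the objects themselves.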
150+
151+
### Hosting the library
152+
153+
Each extension created should be a stand-alone repository, compatible
154+
with the [OpenML-Python
155+
repository](https://github.com/openml/openml-python). The extension
156+
repository should work off-the-shelf with *OpenML-Python* installed.
157+
158+
Create a [public GitHub
159+
repo](https://docs.github.com/en/github/getting-started-with-github/create-a-repo)
160+
with the following directory structure:
161+
162+
| [repo name]
163+
| |-- [extension name]
164+
| | |-- __init__.py
165+
| | |-- extension.py
166+
| | |-- config.py (optionally)
167+
168+
### Recommended
169+
170+
- Test cases to keep the extension up to date with the
171+
[openml-python]{.title-ref} upstream changes.
172+
- Documentation of the extension API, especially if any new
173+
functionality is added to OpenML-Python's extension design.
174+
- Examples to show how the new extension interfaces and works with
175+
OpenML-Python.
176+
- Create a PR to add the new extension to the OpenML-Python API
177+
documentation.
178+
179+
Happy contributing!

docs/index.md

Lines changed: 89 additions & 0 deletions
@@ -0,0 +1,89 @@
1+
# OpenML
2+
3+
**Collaborative Machine Learning in Python**
4+
5+
Welcome to the documentation of the OpenML Python API, a connector to
6+
the collaborative machine learning platform
7+
[OpenML.org](https://www.openml.org). The OpenML Python package allows
8+
you to use datasets and tasks from OpenML together with scikit-learn and
9+
share the results online.
10+
11+
## Example
12+
13+
```python
14+
import openml
15+
from sklearn import impute, tree, pipeline
16+
17+
# Define a scikit-learn classifier or pipeline
18+
clf = pipeline.Pipeline(
19+
steps=[
20+
('imputer', impute.SimpleImputer()),
21+
('estimator', tree.DecisionTreeClassifier())
22+
]
23+
)
24+
# Download the OpenML task for the pendigits dataset with 10-fold
25+
# cross-validation.
26+
task = openml.tasks.get_task(32)
27+
# Run the scikit-learn model on the task.
28+
run = openml.runs.run_model_on_task(clf, task)
29+
# Publish the experiment on OpenML (optional, requires an API key.
30+
# You can get your own API key by signing up to OpenML.org)
31+
run.publish()
32+
print(f'View the run online: {run.openml_url}')
33+
```
34+
35+
Find more examples in the sidebar on the left.
36+
37+
## How to get OpenML for Python
38+
39+
You can install the OpenML package via `pip` (we recommend using a virtual environment):
40+
41+
```bash
42+
python -m pip install openml
43+
```
44+
45+
For more advanced installation information, please see the
46+
["Introduction"](../examples/20_basic/introduction_tutorial.py) example.
47+
48+
49+
## Further information
50+
51+
- [OpenML documentation](https://docs.openml.org/)
52+
- [OpenML client APIs](https://docs.openml.org/APIs/)
53+
- [OpenML developer guide](https://docs.openml.org/Contributing/)
54+
- [Contact information](https://www.openml.org/contact)
55+
- [Citation request](https://www.openml.org/cite)
56+
- [OpenML blog](https://medium.com/open-machine-learning)
57+
- [OpenML twitter account](https://twitter.com/open_ml)
58+
59+
## Contributing
60+
61+
Contribution to the OpenML package is highly appreciated. Please see the
62+
["Contributing"][contributing] page for more information.
63+
64+
## Citing OpenML-Python
65+
66+
If you use OpenML-Python in a scientific publication, we would
67+
appreciate a reference to our JMLR-MLOSS paper
68+
["OpenML-Python: an extensible Python API for OpenML"](https://www.jmlr.org/papers/v22/19-920.html):
69+
70+
=== "Bibtex"
71+
72+
```bibtex
73+
@article{JMLR:v22:19-920,
74+
author = {Matthias Feurer and Jan N. van Rijn and Arlind Kadra and Pieter Gijsbers and Neeratyoy Mallik and Sahithya Ravi and Andreas Müller and Joaquin Vanschoren and Frank Hutter},
75+
title = {OpenML-Python: an extensible Python API for OpenML},
76+
journal = {Journal of Machine Learning Research},
77+
year = {2021},
78+
volume = {22},
79+
number = {100},
80+
pages = {1--5},
81+
url = {http://jmlr.org/papers/v22/19-920.html}
82+
}
83+
```
84+
85+
=== "MLA"
86+
87+
Feurer, Matthias, et al.
88+
"OpenML-Python: an extensible Python API for OpenML."
89+
_Journal of Machine Learning Research_ 22.100 (2021):1−5.
