Commit 8eac076

[WIP] Restructuring the examples section (openml#785)
* Restructuring the examples section.
* Introducing new placeholder examples.
* Excluding from the flake8 check the examples with lengthy descriptions.
1 parent dcac17e commit 8eac076

22 files changed

Lines changed: 214 additions & 71 deletions

examples/20_basic/README.txt

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+Introductory Examples
+=====================
+
+Introductory examples to the usage of the OpenML python connector.

examples/introduction_tutorial.py renamed to examples/20_basic/introduction_tutorial.py

Lines changed: 3 additions & 3 deletions
@@ -1,8 +1,8 @@
 """
-Introduction
-============
+Setup
+=====
 
-An introduction to OpenML, followed up by a simple example.
+An example how to set up OpenML-Python followed up by a simple example.
 """
 ############################################################################
 # OpenML is an online collaboration platform for machine learning which allows
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+"""
+========
+Datasets
+========
+
+A basic tutorial on how to list and download datasets.
+"""
+############################################################################
+import openml
+
+############################################################################
+# List datasets
+# =============
+
+datasets_df = openml.datasets.list_datasets(output_format='dataframe')
+print(datasets_df.head(n=10))
+
+############################################################################
+# Download a dataset
+# ==================
+
+first_dataset_id = int(datasets_df['did'].iloc[0])
+dataset = openml.datasets.get_dataset(first_dataset_id)
+
+# Print a summary
+print("This is dataset '%s', the target feature is '%s'" %
+      (dataset.name, dataset.default_target_attribute))
+print("URL: %s" % dataset.url)
+print(dataset.description[:500])
Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
+"""
+Flows and Runs
+==============
+
+A simple tutorial on how to train/run a model and how to upload the results.
+"""
+
+import openml
+from sklearn import ensemble, neighbors
+
+############################################################################
+# Train a machine learning model
+# ==============================
+#
+# .. warning:: This example uploads data. For that reason, this example
+# connects to the test server at test.openml.org. This prevents the main
+# server from crowding with example datasets, tasks, runs, and so on.
+
+openml.config.start_using_configuration_for_example()
+
+# NOTE: We are using dataset 20 from the test server: https://test.openml.org/d/20
+dataset = openml.datasets.get_dataset(20)
+X, y, categorical_indicator, attribute_names = dataset.get_data(
+    dataset_format='array',
+    target=dataset.default_target_attribute
+)
+clf = neighbors.KNeighborsClassifier(n_neighbors=3)
+clf.fit(X, y)
+
+############################################################################
+# Running a model on a task
+# =========================
+
+task = openml.tasks.get_task(119)
+clf = ensemble.RandomForestClassifier()
+run = openml.runs.run_model_on_task(clf, task)
+print(run)
+
+############################################################################
+# Publishing the run
+# ==================
+
+myrun = run.publish()
+print("Run was uploaded to http://test.openml.org/r/" + str(myrun.run_id))
+print("The flow can be found at http://test.openml.org/f/" + str(myrun.flow_id))
+
+############################################################################
+openml.config.stop_using_configuration_for_example()
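
Note on the hunk above: the tutorial switches to the test server at the top and switches back only on its last line, so an exception in between (for example during run.publish()) would skip the reset. One way to guard against that is a small context manager. The following is a minimal sketch, not part of this commit or of OpenML-Python: FakeConfig is a hypothetical stand-in for openml.config so the snippet runs offline, and example_configuration is a hypothetical helper name.

```python
import contextlib


class FakeConfig:
    """Hypothetical stand-in for openml.config so the sketch runs offline."""

    def __init__(self):
        self.server = "main"

    def start_using_configuration_for_example(self):
        self.server = "test"

    def stop_using_configuration_for_example(self):
        self.server = "main"


@contextlib.contextmanager
def example_configuration(config):
    """Switch to the example configuration and always switch back."""
    config.start_using_configuration_for_example()
    try:
        yield config
    finally:
        # Runs even if the body raises, so the previous settings are restored.
        config.stop_using_configuration_for_example()


# Normal use: the test configuration is active only inside the block.
config = FakeConfig()
with example_configuration(config):
    inside = config.server
after = config.server

# Failure case: the configuration is still reset after an exception.
failed = FakeConfig()
try:
    with example_configuration(failed):
        raise RuntimeError("simulated upload failure")
except RuntimeError:
    pass
```

With the real library, the same helper could presumably be called with openml.config as its argument, since the two functions it relies on are the ones used in the tutorial.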
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+"""
+=======
+Studies
+=======
+
+This is only a placeholder so far.
+"""

examples/30_extended/README.txt

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+In-Depth Examples
+=================
+
+Extended examples for the usage of the OpenML python connector.
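
Note on the directory names: the numeric prefixes in 20_basic and 30_extended suggest that sections are meant to be ordered by prefix, with gaps left for future sections. The commit does not say how the gallery is ordered, so this is an assumption. A minimal sketch of such an ordering, using a hypothetical helper gallery_order:

```python
import re


def gallery_order(directories):
    """Sort directory names by their leading integer prefix (hypothetical helper)."""

    def prefix(name):
        match = re.match(r"(\d+)_", name)
        # Names without a numeric prefix sort after all numbered ones.
        return int(match.group(1)) if match else float("inf")

    return sorted(directories, key=lambda name: (prefix(name), name))


ordered = gallery_order(["30_extended", "20_basic"])
print(ordered)
```

Comparing the prefix as an integer rather than as text keeps the order stable if a section such as 100_advanced is added later.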
File renamed without changes.
Lines changed: 1 addition & 0 deletions
@@ -14,6 +14,7 @@
 # **********
 #
 # * List datasets
+#
 # * Use the output_format parameter to select output type
 # * Default gives 'dict' (other option: 'dataframe')
 

File renamed without changes.

examples/flows_and_runs_tutorial.py renamed to examples/30_extended/flows_and_runs_tutorial.py

Lines changed: 1 addition & 1 deletion
@@ -132,7 +132,7 @@
 # The run may be stored offline, and the flow will be stored along with it:
 run.to_filesystem(directory='myrun')
 
-# They made later be loaded and uploaded
+# They may be loaded and uploaded at a later time
 run = openml.runs.OpenMLRun.from_filesystem(directory='myrun')
 run.publish()
 
