# %% [markdown]
# A basic tutorial on how to list, load and visualize datasets.
#
# In general, we recommend working with tasks, so that the results can
# be easily reproduced. Furthermore, the results can be compared to existing results
# at OpenML. However, for the purposes of this tutorial, we are going to work with
# the datasets directly.
# %%
import openml
# %% [markdown]
# ## List datasets stored on OpenML
# %%
datasets_df = openml.datasets.list_datasets()
print(datasets_df.head(n=10))
# %% [markdown]
# ## Download a dataset
# %%
# Iris dataset https://www.openml.org/d/61
dataset = openml.datasets.get_dataset(dataset_id=61)
# Print a summary
print(
    f"This is dataset '{dataset.name}', the target feature is '{dataset.default_target_attribute}'"
)
print(f"URL: {dataset.url}")
print(dataset.description[:500])
# %% [markdown]
# ## Load a dataset
# * `X` - a dataframe where each row represents one example with
#   the corresponding feature values
# * `y` - the class label for each example
# * `categorical_indicator` - a list of booleans indicating which features are categorical
# * `attribute_names` - the names of the features for the examples (X) and
#   target feature (y)
# %%
X, y, categorical_indicator, attribute_names = dataset.get_data(
    target=dataset.default_target_attribute
)
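# %% [markdown]
# A minimal sketch of how `categorical_indicator` can be paired with a list of
# feature names, assuming both lists have the same length and order.
# `split_feature_names` is a hypothetical helper written for illustration, not
# part of the OpenML API, and the example lists passed to it are made up.
# %%
def split_feature_names(names, is_categorical):
    """Return (categorical, other) feature names from two parallel lists."""
    categorical = [name for name, flag in zip(names, is_categorical) if flag]
    other = [name for name, flag in zip(names, is_categorical) if not flag]
    return categorical, other


# Illustrative values only; with a loaded dataset you would pass the
# `attribute_names` and `categorical_indicator` returned by `get_data`.
example_categorical, example_other = split_feature_names(
    ["length", "width", "color"], [False, False, True]
)
print(f"Categorical: {example_categorical}, other: {example_other}")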
# %% [markdown]
# ## Visualize the dataset
# %%
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Join the features and the target so the pair plot can color points by class.
iris_plot = sns.pairplot(pd.concat([X, y], axis=1), hue="class")
plt.show()