Adding helper functions to support ColumnTransformer #982
Changes from 6 commits
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -40,7 +40,8 @@ | |
| from openml.flows import OpenMLFlow | ||
| from openml.flows.functions import assert_flows_equal | ||
| from openml.runs.trace import OpenMLRunTrace | ||
| from openml.testing import TestBase, SimpleImputer, CustomImputer, cat, cont | ||
| from openml.testing import TestBase, SimpleImputer, CustomImputer | ||
| from openml.extensions.sklearn import cat, cont | ||
|
|
||
|
|
||
| this_directory = os.path.dirname(os.path.abspath(__file__)) | ||
|
|
@@ -2183,16 +2184,6 @@ def test_failed_serialization_of_custom_class(self): | |
| # for lower versions | ||
| from sklearn.preprocessing import Imputer as SimpleImputer | ||
|
|
||
| class CustomImputer(SimpleImputer): | ||
| pass | ||
|
|
||
| def cont(X): | ||
| return X.dtypes != "category" | ||
|
|
||
| def cat(X): | ||
| return X.dtypes == "category" | ||
|
|
||
| import sklearn.metrics | ||
| import sklearn.tree | ||
| from sklearn.pipeline import Pipeline, make_pipeline | ||
| from sklearn.compose import ColumnTransformer | ||
|
|
@@ -2215,3 +2206,37 @@ def cat(X): | |
| raise AttributeError(e) | ||
| else: | ||
| raise Exception(e) | ||
|
|
||
| @unittest.skipIf( | ||
| LooseVersion(sklearn.__version__) < "0.20", | ||
| reason="columntransformer introduction in 0.20.0", | ||
| ) | ||
| def test_setupid_with_column_transformer(self): | ||
| """Test to check if inclusion of ColumnTransformer in a pipeline is treated as a new | ||
| flow each time. | ||
| """ | ||
| import sklearn.compose | ||
| from sklearn.svm import SVC | ||
|
|
||
| def column_transformer_pipe(task_id): | ||
| task = openml.tasks.get_task(task_id) | ||
| # make columntransformer | ||
| preprocessor = sklearn.compose.ColumnTransformer( | ||
| transformers=[ | ||
| ("num", StandardScaler(), cont), | ||
| ("cat", OneHotEncoder(handle_unknown="ignore"), cat), | ||
| ] | ||
| ) | ||
| # make pipeline | ||
| clf = SVC(gamma="scale", random_state=1) | ||
| pipe = make_pipeline(preprocessor, clf) | ||
| # run task | ||
| run = openml.runs.run_model_on_task(pipe, task, avoid_duplicate_runs=True) | ||
|
Collaborator
This seems to fail on several jobs (but strangely not all-but-one, perhaps due to race conditions?). And shouldn't that be correct?
Contributor (Author)
That's a good point that I missed.
Contributor (Author)
Also, @mfeurer is cleaning the test servers, and he recommended waiting this week out before judging if this error is a result of those changes. Nevertheless, I made the push with the change that should be there in any case.
Collaborator
Okay we'll wait.
I imagine the times it did pass are those where a previous upload had been deleted by the clean up script in between.
Collaborator
I just had a look at this PR and it appears that this error and an error building the docs are the only things that hold back merging this PR? @Neeratyoy could you please have a look into this?
||
| run.publish() | ||
| new_run = openml.runs.get_run(run.run_id) | ||
| return new_run.setup_id | ||
|
|
||
| setup1 = column_transformer_pipe(11) # only categorical | ||
| setup2 = column_transformer_pipe(23) # only numeric | ||
|
|
||
| self.assertEqual(setup1, setup2) | ||
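For readers unfamiliar with callable column selectors, the following is a minimal standalone sketch of how the `cat` and `cont` helpers from the diff behave when passed to `ColumnTransformer`. The toy DataFrame, its column names, and the `sparse_threshold=0` setting (used here only to get a dense array that is easy to inspect) are assumptions made for illustration; they are not part of the PR, and no OpenML server calls are involved.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def cont(X):
    # Boolean mask over columns: True for every non-categorical column.
    return X.dtypes != "category"


def cat(X):
    # Boolean mask over columns: True for every categorical column.
    return X.dtypes == "category"


# Hypothetical toy data: two numeric columns and one categorical column.
X = pd.DataFrame(
    {
        "height": [1.0, 2.0, 3.0, 4.0],
        "weight": [10.0, 20.0, 30.0, 40.0],
        "colour": pd.Categorical(["red", "blue", "red", "green"]),
    }
)

# The selectors are passed as callables, so the columns are resolved
# from the data at fit time instead of being hard-coded by name or index.
preprocessor = ColumnTransformer(
    transformers=[
        ("num", StandardScaler(), cont),
        ("cat", OneHotEncoder(handle_unknown="ignore"), cat),
    ],
    sparse_threshold=0,  # assumption: force dense output for inspection
)

Xt = preprocessor.fit_transform(X)
print(Xt.shape)  # 2 scaled numeric columns + 3 one-hot columns
```

Because the column selection is deferred to fit time, the same pipeline definition can be reused on datasets with different column layouts, which appears to be what the test relies on when it compares setup ids across two tasks.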