Dataframe run on task #777
| Original file line number | Diff line number | Diff line change |
|---|---|---|
@@ -762,6 +762,7 @@ def _get_external_version_string( | |
| # requirements for their subcomponents. The external version string is a | ||
| # sorted concatenation of all modules which are present in this run. | ||
| model_package_name = model.__module__.split('.')[0] | ||
| module = importlib.import_module(model_package_name) | ||
| model_package_version_number = module.__version__ # type: ignore | ||
| external_version = self._format_external_version( | ||
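The hunk above derives the external version string from the top-level package of the model's module. A minimal sketch of that lookup follows. It assumes a `name==version` output format (the PR's `_format_external_version` may format it differently), and `external_version_for` is a hypothetical helper name, not part of openml-python.

```python
import importlib
from typing import Any


def external_version_for(model: Any) -> str:
    """Hypothetical helper: build a 'package==version' string for a model.

    The 'name==version' format is an assumption for illustration.
    """
    # e.g. 'sklearn.tree._classes' -> 'sklearn'
    package_name = model.__module__.split('.')[0]
    module = importlib.import_module(package_name)
    # Not every package defines __version__, so fall back instead of raising
    # (the diff reads module.__version__ directly and silences mypy instead).
    version = getattr(module, '__version__', 'unknown')
    return '{}=={}'.format(package_name, version)
```

For example, `external_version_for(json.JSONEncoder())` resolves `json.encoder` to the `json` package and reports its version.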
@@ -1512,10 +1513,11 @@ def _prediction_to_probabilities(y: np.ndarray, classes: List[Any]) -> np.ndarra | |
| if not isinstance(classes, list): | ||
| raise ValueError('please convert model classes to list prior to ' | ||
| 'calling this fn') | ||
| result = np.zeros((len(y), len(classes)), dtype=np.float32) | ||
| for obs, prediction_idx in enumerate(y): | ||
| result[obs][prediction_idx] = 1.0 | ||
| return result | ||
| # DataFrame allows more accurate mapping of classes as column names | ||
| result = pd.DataFrame(0, index=np.arange(len(y)), columns=classes, dtype=np.float32) | ||
| for obs, prediction in enumerate(y): | ||
| result.loc[obs, prediction] = 1.0 | ||
| return result.to_numpy() | ||
| if isinstance(task, OpenMLSupervisedTask): | ||
| if y_train is None: | ||
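The rewritten `_prediction_to_probabilities` in the hunk above labels the one-hot columns with the class names instead of positional indices. Below is a standalone sketch of the same idea. The membership check is an addition that is not in the PR: assigning through `.loc` with a label that is not an existing column appends a new column, so an unexpected prediction would otherwise widen the matrix silently.

```python
from typing import Any, List, Sequence

import numpy as np
import pandas as pd


def prediction_to_probabilities(y: Sequence[Any], classes: List[Any]) -> np.ndarray:
    """One-hot encode predicted labels, one column per class label."""
    if not isinstance(classes, list):
        raise ValueError('please convert model classes to list prior to calling this fn')
    result = pd.DataFrame(0, index=np.arange(len(y)), columns=classes, dtype=np.float32)
    for obs, prediction in enumerate(y):
        # .loc with an unknown label would append a new column,
        # so reject labels that are not among the known classes.
        if prediction not in result.columns:
            raise ValueError('unknown class label: {!r}'.format(prediction))
        result.loc[obs, prediction] = 1.0
    return result.to_numpy()
```

Because the lookup is by label, string classes such as `'yes'`/`'no'` work directly, whereas the earlier `np.zeros` version needed predictions that were already integer column indices.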
@@ -1573,6 +1575,11 @@ def _prediction_to_probabilities(y: np.ndarray, classes: List[Any]) -> np.ndarra | |
| else: | ||
| model_classes = used_estimator.classes_ | ||
| # to handle the case when dataset is numpy and categories are encoded | ||
| # however the class labels stored in task are still categories | ||
| if isinstance(y_train, np.ndarray) and isinstance(task.class_labels[0], str): | ||
| model_classes = [task.class_labels[i] for i in model_classes] | ||
| modelpredict_start_cputime = time.process_time() | ||
| modelpredict_start_walltime = time.time() | ||
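The hunk above handles a NumPy dataset whose targets were label-encoded: the fitted estimator's `classes_` then holds integer codes while `task.class_labels` holds the category strings. A minimal sketch of that decoding step, assuming (as the diff's list comprehension implies) that each code is a position in `class_labels`; `decode_model_classes` is a hypothetical name.

```python
from typing import Any, List, Sequence

import numpy as np


def decode_model_classes(model_classes: Sequence[Any], class_labels: List[Any],
                         y_train: Any) -> List[Any]:
    """Map integer-encoded model classes back to the task's label strings."""
    # With NumPy training targets the estimator was fit on integer codes,
    # so each entry of model_classes indexes into class_labels.
    if isinstance(y_train, np.ndarray) and isinstance(class_labels[0], str):
        return [class_labels[i] for i in model_classes]
    # Otherwise the estimator already saw the real labels.
    return list(model_classes)
```

For instance, an estimator that only saw codes 0 and 2 during training would have its classes decoded to `['cat', 'emu']` for the labels `['cat', 'dog', 'emu']`.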
@@ -1601,9 +1608,16 @@ def _prediction_to_probabilities(y: np.ndarray, classes: List[Any]) -> np.ndarra | |
| try: | ||
| proba_y = model_copy.predict_proba(X_test) | ||
| except AttributeError: | ||
| except AttributeError: # predict_proba is not available when probability=False | ||
| if task.class_labels is not None: | ||
| proba_y = _prediction_to_probabilities(pred_y, list(task.class_labels)) | ||
| if isinstance(y_train, np.ndarray) and isinstance(task.class_labels[0], str): | ||
| # mapping (decoding) the predictions to the categories | ||
| # creating a separate copy to not change the expected pred_y type | ||
| preds = [task.class_labels[pred] for pred in pred_y] | ||
| proba_y = _prediction_to_probabilities(preds, model_classes) | ||
| else: | ||
| proba_y = _prediction_to_probabilities(pred_y, model_classes) | ||
| else: | ||
| raise ValueError('The task has no class labels') | ||
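The hunk above falls back to one-hot "probabilities" built from hard predictions when the model has no `predict_proba`, decoding encoded predictions into a separate list so that `pred_y` keeps its type. The sketch below reproduces that flow with a hypothetical `HardLabelModel` standing in for such an estimator; `predict_probabilities` and the `encoded` flag are likewise illustrative names, not openml-python API.

```python
from typing import Any, List

import numpy as np
import pandas as pd


class HardLabelModel:
    """Hypothetical estimator that offers predict() but no predict_proba()."""

    def __init__(self, code: int) -> None:
        self.code = code

    def predict(self, X: Any) -> np.ndarray:
        # Always predicts the same integer-encoded class.
        return np.full(len(X), self.code)


def predict_probabilities(model: Any, X: Any, class_labels: List[str],
                          encoded: bool) -> np.ndarray:
    pred_y = model.predict(X)
    try:
        return np.asarray(model.predict_proba(X))
    except AttributeError:  # the model has no predict_proba
        # Decode into a separate list so pred_y keeps its original type.
        preds = [class_labels[p] for p in pred_y] if encoded else list(pred_y)
        one_hot = pd.DataFrame(0.0, index=np.arange(len(preds)), columns=class_labels)
        for obs, label in enumerate(preds):
            one_hot.loc[obs, label] = 1.0
        return one_hot.to_numpy()
```

One design caveat: wrapping the call in `try/except AttributeError` also swallows an `AttributeError` raised *inside* a real `predict_proba`; checking `hasattr(model, 'predict_proba')` first avoids that.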
@@ -1619,10 +1633,13 @@ def _prediction_to_probabilities(y: np.ndarray, classes: List[Any]) -> np.ndarra | |
| # then we need to add a column full of zeros into the probabilities | ||
| # for class 3 because the rest of the library expects that the | ||
| # probabilities are ordered the same way as the classes are ordered). | ||
| proba_y_new = np.zeros((proba_y.shape[0], len(task.class_labels))) | ||
| # DataFrame allows more accurate mapping of classes as column names | ||
| proba_y_new = pd.DataFrame(0, index=np.arange(proba_y.shape[0]), | ||
| columns=task.class_labels, dtype=np.float32) | ||
| for idx, model_class in enumerate(model_classes): | ||
| proba_y_new[:, model_class] = proba_y[:, idx] | ||
| proba_y = proba_y_new | ||
| proba_y_new.loc[:, model_class] = proba_y[:, idx] | ||
Collaborator
Would it perhaps not be clearer to:
Advantages:
Contributor
That's a good spot, thanks. However, right at the end I needed to convert the probability array to NumPy, since `_run_task_get_arffcontent` appears to require a NumPy array as the probability matrix.
Collaborator
You could also change that function to accept a DataFrame; it appears easier and safer to work with.
Collaborator
Hey, I just checked, but I think you have not yet updated the return value type annotation for the functions
Contributor
Updated it, thanks for pointing it out.
Collaborator
That appears to have been wrong for two years; you can just update the docstring.
Collaborator
Thanks for the update. I think Pieter's original suggestion is not yet addressed. Do you think you could check whether that's still possible?
Contributor
Is this resolved now, then?
| proba_y = proba_y_new.to_numpy() | ||
| if proba_y.shape[1] != len(task.class_labels): | ||
Collaborator
This is a repeated clause from line 1666.
Contributor
Yes, you're right. I was scratching my head too. I think it makes sense to remove the redundant
Collaborator
Sounds good to me.
| message = "Estimator only predicted for {}/{} classes!".format( | ||
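The last hunk pads and reorders the probability columns so that they follow `task.class_labels`, filling classes the model never saw during training with zeros, and then converts back to NumPy for `_run_task_get_arffcontent`. The sketch below swaps in `DataFrame.reindex` as an alternative to the PR's column-by-column `.loc` loop; it is not what the PR does, and `align_probabilities` is a hypothetical name.

```python
from typing import Any, Sequence

import numpy as np
import pandas as pd


def align_probabilities(proba: np.ndarray, model_classes: Sequence[Any],
                        class_labels: Sequence[Any]) -> np.ndarray:
    """Reorder and pad probability columns to match the task's class order."""
    unknown = set(model_classes) - set(class_labels)
    if unknown:
        # reindex would silently drop these columns, so fail loudly instead.
        raise ValueError('model predicted unknown classes: {}'.format(sorted(unknown, key=str)))
    frame = pd.DataFrame(proba, columns=list(model_classes))
    # Classes missing at training time get a column of zeros.
    aligned = frame.reindex(columns=list(class_labels), fill_value=0.0)
    return aligned.to_numpy()
```

With this approach the output always has `len(class_labels)` columns, so a second shape check placed after the remapping could never trigger.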