Merged
sklearn model fit check
Neeratyoy committed Oct 28, 2020
commit d224845a327e9e5e754f8e3b7a9c2ffc5c7fbe8f
16 changes: 16 additions & 0 deletions openml/extensions/sklearn/extension.py
@@ -1652,6 +1652,22 @@ def _prediction_to_probabilities(y: np.ndarray, model_classes: List[Any]) -> pd.

user_defined_measures = OrderedDict() # type: 'OrderedDict[str, float]'

try:
    # check if model is fitted
    # 'predict' internally calls sklearn.utils.validation.check_is_fitted for every
    # model-specific attribute it expects, thus offering a more robust check than
    # a generic simplified call of check_is_fitted(model_copy)
    from sklearn.exceptions import NotFittedError

    model_copy.predict(X_train)
    warnings.warn(
        "The model is already fitted!"
        " This might cause inconsistency in comparison of results."
    )
except NotFittedError:
    # model is not fitted, as is required
    pass
Collaborator

I thought in the Python call we discussed that perhaps we would check this at the first call to run_model_on_task?
In either case I would extract this to a separate method _raise_warning_if_fitted to make sure the functions don't get too big (they already are).
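
A minimal sketch of what the suggested extraction could look like. Only the name `_raise_warning_if_fitted` comes from the comment above; the signature, the boolean return value, and the `DummyClassifier` demo are assumptions made for illustration, not the code that was eventually pushed.

```python
import warnings
from typing import Any

from sklearn.dummy import DummyClassifier
from sklearn.exceptions import NotFittedError


def _raise_warning_if_fitted(model: Any, X: Any) -> bool:
    """Warn if `model` already appears to be fitted.

    Hypothetical helper: returns True when the warning was raised.
    """
    try:
        # predict() raises NotFittedError on an unfitted scikit-learn estimator
        model.predict(X)
    except NotFittedError:
        # model is not fitted, as is required
        return False
    warnings.warn(
        "The model is already fitted!"
        " This might cause inconsistency in comparison of results."
    )
    return True


# Toy data and model, used only to exercise the helper
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 1, 0, 1]
model = DummyClassifier(strategy="most_frequent")

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    fitted_before = _raise_warning_if_fitted(model, X)  # unfitted model
    model.fit(X, y)
    fitted_after = _raise_warning_if_fitted(model, X)  # fitted model
```

As in the diff, any exception other than `NotFittedError` raised by `predict` would still propagate to the caller.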

Contributor Author

Alright, will make it into a function and push.

As for its placement, I reconsidered: whether run_model_on_task or run_flow_on_task is called, the call ultimately reduces to this function, so I went ahead and placed the snippet here.

Collaborator

run_model_on_task actually calls run_flow_on_task. That said, we would then need to add a function to the extension interface that indicates whether a model is already fitted; otherwise we can't check it in a general way.

Collaborator

@mfeurer do you think this is something we want? Or do we just leave it to the extension devs to implement a warning if they see fit?

Collaborator

@mfeurer mfeurer Nov 2, 2020

I like the idea of having this function as a callback that can be implemented by the extension devs. And yes, I expected this function to be called from the run_model_on_task function.
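
To make the callback idea concrete, here is a rough, library-independent sketch. Every name in it (`ExtensionSketch`, `check_if_model_fitted`, `ToyExtension`, `ToyModel`, `run_model_on_task_sketch`) is hypothetical and chosen for illustration; it does not reproduce the actual OpenML extension interface or run functions.

```python
import warnings
from abc import ABC, abstractmethod
from typing import Any


class ExtensionSketch(ABC):
    """Hypothetical stand-in for the extension interface."""

    @abstractmethod
    def check_if_model_fitted(self, model: Any) -> bool:
        """Return True if `model` has already been fitted."""


class ToyModel:
    """Toy model that records whether fit() has been called."""

    def __init__(self) -> None:
        self.is_fitted = False

    def fit(self) -> "ToyModel":
        self.is_fitted = True
        return self


class ToyExtension(ExtensionSketch):
    """Extension devs decide how 'fitted' is detected for their library."""

    def check_if_model_fitted(self, model: Any) -> bool:
        return bool(getattr(model, "is_fitted", False))


def run_model_on_task_sketch(model: Any, extension: ExtensionSketch) -> bool:
    """Library-agnostic entry point: ask the extension, warn if fitted."""
    already_fitted = extension.check_if_model_fitted(model)
    if already_fitted:
        warnings.warn(
            "The model is already fitted!"
            " This might cause inconsistency in comparison of results."
        )
    # ... the actual run would continue here ...
    return already_fitted


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    fresh = run_model_on_task_sketch(ToyModel(), ToyExtension())
    refit = run_model_on_task_sketch(ToyModel().fit(), ToyExtension())
```

With this shape the generic run function never needs to know how a given library marks a model as fitted; that knowledge stays inside each extension.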


try:
    # for measuring runtime. Only available since Python 3.3
    modelfit_start_cputime = time.process_time()