tutorials/deeplearning/README.md (+66 −63)
@@ -1,3 +1,4 @@
+##AUTO-GENERATED - DO NOT EDIT##
 # Classification and Regression with H2O Deep Learning
 
 * Introduction
@@ -55,7 +56,7 @@ We start with a small dataset representing red and black dots on a plane, arrang
 We visualize the nature of H2O Deep Learning (DL), H2O's tree methods (GBM/DRF) and H2O's generalized linear modeling (GLM) by plotting the decision boundary between the red and black spirals:
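For readers who want to reproduce this kind of plot, here is a minimal sketch of one way to draw such a boundary. It assumes a data frame `df` with coordinate columns `x` and `y` and a two-level `class` column, and a running H2O instance; the tutorial's own plotting helper may differ.

```r
## Sketch only: draw a classifier's decision boundary by predicting on a
## dense grid of (x, y) points. Column names x, y, class are assumptions.
plot_boundary <- function(model, df, resolution = 100) {
  xs  <- seq(min(df$x), max(df$x), length.out = resolution)
  ys  <- seq(min(df$y), max(df$y), length.out = resolution)
  pts <- expand.grid(x = xs, y = ys)
  pred <- as.data.frame(h2o.predict(model, as.h2o(pts)))$predict
  z <- matrix(as.numeric(as.factor(pred)), nrow = resolution)
  ## the boundary lies where the predicted class flips (between levels 1 and 2)
  contour(xs, ys, z, levels = 1.5, drawlabels = FALSE)
  points(df$x, df$y,
         col = c("black", "red")[as.numeric(as.factor(df$class))],
         pch = 16, cex = 0.5)
}
```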

 Let's compare the training error with the validation and test set errors:
 ```r
-h2o.performance(m3, train=T) ## sampled training data (from model building)
-h2o.performance(m3, valid=T) ## sampled validation data (from model building)
-h2o.performance(m3, data=train) ## full training data
-h2o.performance(m3, data=valid) ## full validation data
-h2o.performance(m3, data=test) ## full test data
+h2o.performance(m3, train=T) ## sampled training data (from model building)
+h2o.performance(m3, valid=T) ## sampled validation data (from model building)
+h2o.performance(m3, newdata=train) ## full training data
+h2o.performance(m3, newdata=valid) ## full validation data
+h2o.performance(m3, newdata=test) ## full test data
 ```
 
 To confirm that the reported confusion matrix on the validation set (here, the test set) was correct, we make a prediction on the test set and compare the confusion matrices explicitly:
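As a concrete illustration of that check, a minimal sketch, assuming `m3`, `test`, and the `response` column-name variable used elsewhere in the tutorial:

```r
## Sketch only: confront the reported confusion matrix with one computed
## from explicit predictions on the test set.
pred <- h2o.predict(m3, test)   ## predicted class + per-class probabilities
h2o.confusionMatrix(m3, test)   ## confusion matrix as reported by the model
## manual cross-tabulation of predicted vs. actual labels
## (fine for a sketch; avoid pulling very large frames into R)
table(as.data.frame(pred)$predict, as.data.frame(test)[[response]])
```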
@@ -282,15 +283,15 @@ hyper_params <- list(
 )
 hyper_params
 grid<- h2o.grid(
-  "deeplearning",
-  model_id="dl_grid",
+  algorithm="deeplearning",
+  grid_id="dl_grid",
   training_frame=sampled_train,
   validation_frame=valid,
   x=predictors,
   y=response,
   epochs=10,
   stopping_metric="misclassification",
-  stopping_tolerance=1e-2, ## stop when logloss does not improve by >=1% for 2 scoring events
+  stopping_tolerance=1e-2, ## stop when misclassification does not improve by >=1% for 2 scoring events
   stopping_rounds=2,
   score_validation_samples=10000, ## downsample validation set for faster scoring
   score_duty_cycle=0.025, ## don't score more than 2.5% of the wall time
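The `hyper_params` list that feeds this call is cut off in the hunk above; a hypothetical example of its shape, with illustrative values rather than the tutorial's actual ones:

```r
## Illustrative only: candidate values for the grid search above.
hyper_params <- list(
  hidden = list(c(32, 32, 32), c(64, 64)),  ## candidate network architectures
  input_dropout_ratio = c(0, 0.05),         ## candidate input dropout rates
  l1 = c(1e-5, 1e-3),                       ## candidate L1 regularization strengths
  l2 = c(1e-5, 1e-3)                        ## candidate L2 regularization strengths
)
```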
@@ -310,75 +311,77 @@ grid
 Let's see which model had the lowest validation error:
 
 ```r
-## Find the best model and its full set of parameters (clunky for now, will be improved)
+## Find the best model and its full set of parameters
+grid@summary_table[1,]
+best_model<- h2o.getModel(grid@model_ids[[1]])
+best_model
+
 print(best_model@allparameters)
-best_err<-sorted_scores$misclassification[1]
-print(best_err)
+print(h2o.performance(best_model, valid=T))
+print(h2o.logloss(best_model, valid=T))
 ```
 
 ### Random Hyper-Parameter Search
-Often, hyper-parameter search for more than 4 parameters can be done more efficiently with random parameter search than with grid search. Basically, chances are good to find one of many good models in less time than performing an exhaustive grid search. We simply build `N` models with parameters drawn randomly from user-specified distributions (here, uniform). For this example, we use the adaptive learning rate and focus on tuning the network architecture and the regularization parameters.
+Often, hyper-parameter search for more than 4 parameters can be done more efficiently with random parameter search than with grid search. Basically, chances are good to find one of many good models in less time than performing an exhaustive grid search. We simply build up to `max_models` models with parameters drawn randomly from user-specified distributions (here, uniform). For this example, we use the adaptive learning rate and focus on tuning the network architecture and the regularization parameters. Note that we keep the `grid_id` the same, which extends the original grid search, so we get one "leaderboard" across both searches. We also let the grid search stop automatically once the performance at the top of the leaderboard no longer changes much, i.e., once the search has converged.
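A minimal sketch of what that random search can look like in the R API; the `max_models` budget, seed, and stopping values here are illustrative assumptions, while `hyper_params`, `sampled_train`, `valid`, `predictors`, and `response` are the objects used earlier:

```r
## Sketch only: random search that reuses grid_id "dl_grid" so its models
## join the earlier grid's leaderboard. Budget/seed values are illustrative.
search_criteria <- list(
  strategy = "RandomDiscrete",
  max_models = 100,                  ## stop after at most 100 models
  seed = 1234567,
  stopping_metric = "misclassification",
  stopping_tolerance = 1e-2,         ## stop once the best models stop improving by >=1%
  stopping_rounds = 5
)
grid <- h2o.grid(
  algorithm = "deeplearning",
  grid_id = "dl_grid",               ## same id as before: extends the first search
  training_frame = sampled_train,
  validation_frame = valid,
  x = predictors,
  y = response,
  epochs = 1,
  hyper_params = hyper_params,
  search_criteria = search_criteria
)
```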

tutorials/deeplearning/deeplearning.R (+69 −66)
@@ -1,3 +1,4 @@
+##AUTO-GENERATED - DO NOT EDIT##
 ## Classification and Regression with H2O Deep Learning
 #
 #* Introduction
@@ -14,11 +15,11 @@
 #* Deep Learning Tips & Tricks
 #
 ### Introduction
-#This tutorial shows how a H2O [Deep Learning](http://en.wikipedia.org/wiki/Deep_learning) model can be used to do supervised classification and regression. This tutorial covers usage of H2O from R. A python version of this tutorial will be available as well in a separate document. This file is available in plain R, R markdown and regular markdown formats, and the plots are available as PDF files. All documents are available [on Github](https://github.com/h2oai/h2o-world-2015-training/raw/master/tutorials/deeplearning/).
+#This tutorial shows how an H2O [Deep Learning](http://en.wikipedia.org/wiki/Deep_learning) model can be used to do supervised classification and regression. This tutorial covers usage of H2O from R. A Python version of this tutorial will be available as well in a separate document. This file is available in plain R, R markdown and regular markdown formats, and the plots are available as PDF files. All documents are available [on Github](.).
 #
 #If run from plain R, execute R in the directory of this script. If run from RStudio, be sure to setwd() to the location of this script. h2o.init() starts H2O in R's current working directory. h2o.importFile() looks for files from the perspective of where H2O was started.
 #
-#More examples and explanations can be found in our [H2O Deep Learning booklet](http://h2o.ai/resources/) and on our [H2O Github Repository](http://github.com/h2oai/h2o-3/). The PDF slide deck can be found [on Github](https://github.com/h2oai/h2o-world-2015-training/raw/master/tutorials/deeplearning/H2ODeepLearning.pdf).
+#More examples and explanations can be found in our [H2O Deep Learning booklet](http://h2o.ai/resources/) and on our [H2O Github Repository](http://github.com/h2oai/h2o-3/). The PDF slide deck can be found [on Github](./H2ODeepLearning.pdf).
 #
 #### H2O R Package
 #
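For the setwd()/h2o.init() setup note above, a minimal sketch; the working directory path and data file name are placeholder assumptions:

```r
## Sketch only: start a local H2O instance from this script's directory.
setwd("~/tutorials/deeplearning")   ## placeholder path to this script
library(h2o)
h2o.init(nthreads = -1)             ## use all available cores
## h2o.importFile() resolves relative paths from where H2O was started, e.g.:
## df <- h2o.importFile("mydata.csv")   ## hypothetical file name
```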
@@ -48,7 +49,7 @@ example(h2o.deeplearning)
 #
 #We visualize the nature of H2O Deep Learning (DL), H2O's tree methods (GBM/DRF) and H2O's generalized linear modeling (GLM) by plotting the decision boundary between the red and black spirals:

 #Let's compare the training error with the validation and test set errors:
 #
-h2o.performance(m3, train=T) ## sampled training data (from model building)
-h2o.performance(m3, valid=T) ## sampled validation data (from model building)
-h2o.performance(m3, data=train) ## full training data
-h2o.performance(m3, data=valid) ## full validation data
-h2o.performance(m3, data=test) ## full test data
+h2o.performance(m3, train=T) ## sampled training data (from model building)
+h2o.performance(m3, valid=T) ## sampled validation data (from model building)
+h2o.performance(m3, newdata=train) ## full training data
+h2o.performance(m3, newdata=valid) ## full validation data
+h2o.performance(m3, newdata=test) ## full test data
 #
 #To confirm that the reported confusion matrix on the validation set (here, the test set) was correct, we make a prediction on the test set and compare the confusion matrices explicitly:
 #
@@ -245,15 +246,15 @@ hyper_params <- list(
 )
 hyper_params
 grid<- h2o.grid(
-  "deeplearning",
-  model_id="dl_grid",
+  algorithm="deeplearning",
+  grid_id="dl_grid",
   training_frame=sampled_train,
   validation_frame=valid,
   x=predictors,
   y=response,
   epochs=10,
   stopping_metric="misclassification",
-  stopping_tolerance=1e-2, ## stop when logloss does not improve by >=1% for 2 scoring events
+  stopping_tolerance=1e-2, ## stop when misclassification does not improve by >=1% for 2 scoring events
   stopping_rounds=2,
   score_validation_samples=10000, ## downsample validation set for faster scoring
   score_duty_cycle=0.025, ## don't score more than 2.5% of the wall time
@@ -271,71 +272,73 @@ grid
 #
 #Let's see which model had the lowest validation error:
 #
-## Find the best model and its full set of parameters (clunky for now, will be improved)
+## Find the best model and its full set of parameters
+grid@summary_table[1,]
+best_model<- h2o.getModel(grid@model_ids[[1]])
+best_model
+
 print(best_model@allparameters)
-best_err<-sorted_scores$misclassification[1]
-print(best_err)
+print(h2o.performance(best_model, valid=T))
+print(h2o.logloss(best_model, valid=T))
 #
 #### Random Hyper-Parameter Search
-#Often, hyper-parameter search for more than 4 parameters can be done more efficiently with random parameter search than with grid search. Basically, chances are good to find one of many good models in less time than performing an exhaustive grid search. We simply build `N` models with parameters drawn randomly from user-specified distributions (here, uniform). For this example, we use the adaptive learning rate and focus on tuning the network architecture and the regularization parameters.
+#Often, hyper-parameter search for more than 4 parameters can be done more efficiently with random parameter search than with grid search. Basically, chances are good to find one of many good models in less time than performing an exhaustive grid search. We simply build up to `max_models` models with parameters drawn randomly from user-specified distributions (here, uniform). For this example, we use the adaptive learning rate and focus on tuning the network architecture and the regularization parameters. Note that we keep the `grid_id` the same, which extends the original grid search, so we get one "leaderboard" across both searches. We also let the grid search stop automatically once the performance at the top of the leaderboard no longer changes much, i.e., once the search has converged.
 # epochs=100, ## for real parameters: set high enough to get to convergence
-epochs=1,
-stopping_metric="misclassification",
-stopping_tolerance=1e-2, ## stop when logloss does not improve by >=1% for 2 scoring events
-stopping_rounds=2,
-score_validation_samples=10000, ## downsample validation set for faster scoring
-score_duty_cycle=0.025, ## don't score more than 2.5% of the wall time
-max_w2=10, ## can help improve stability for Rectifier
 best_model<- h2o.getModel(grid@model_ids[[1]]) ## model with lowest classification error (on validation, since it was available during training)
 h2o.confusionMatrix(best_model,valid=T)
 best_params<-best_model@allparameters
+best_params$activation
 best_params$hidden
+best_params$input_dropout_ratio
 best_params$l1
 best_params$l2
-best_params$input_dropout_ratio
 #
 ####Checkpointing
 #Let's continue training the manually tuned model from before, for 2 more epochs. Note that since many important parameters such as `epochs, l1, l2, max_w2, score_interval, train_samples_per_iteration, input_dropout_ratio, hidden_dropout_ratios, score_duty_cycle, classification_stop, regression_stop, variable_importances, force_load_balance` can be modified between checkpoint restarts, it is best to specify as many parameters as possible explicitly.
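As an illustration of such a checkpoint restart, a minimal sketch; the model ids and the repeated parameter values here are assumptions, not the tutorial's exact settings:

```r
## Sketch only: continue a trained model via checkpointing.
m_continued <- h2o.deeplearning(
  model_id = "dl_model_tuned_continued",  ## hypothetical new id
  checkpoint = "dl_model_tuned",          ## hypothetical id of the model to resume
  training_frame = sampled_train,
  validation_frame = valid,
  x = predictors,
  y = response,
  epochs = 2,                             ## epoch target for the continued run
  l1 = 1e-5, l2 = 1e-5, max_w2 = 10       ## repeat key settings explicitly
)
```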