
Commit eabcab2

Update to h2o-3 with stochastic grid search.
1 parent ed565a5 commit eabcab2

5 files changed

Lines changed: 275 additions & 261 deletions


tutorials/deeplearning/README.md

Lines changed: 66 additions & 63 deletions
@@ -1,3 +1,4 @@
+##AUTO-GENERATED - DO NOT EDIT##
 # Classification and Regression with H2O Deep Learning
 
 * Introduction
@@ -55,7 +56,7 @@ We start with a small dataset representing red and black dots on a plane, arrang
 We visualize the nature of H2O Deep Learning (DL), H2O's tree methods (GBM/DRF) and H2O's generalized linear modeling (GLM) by plotting the decision boundary between the red and black spirals:
 
 ```r
-setwd("~/h2o-world-2015-training/tutorials/deeplearning") ##For RStudio
+setwd("~/h2o-tutorials/tutorials/deeplearning") ##For RStudio
 spiral <- h2o.importFile(path = normalizePath("../data/spiral.csv"))
 grid <- h2o.importFile(path = normalizePath("../data/grid.csv"))
 # Define helper to plot contours
@@ -246,11 +247,11 @@ summary(m3)
 Let's compare the training error with the validation and test set errors
 
 ```r
-h2o.performance(m3, train=T) ## sampled training data (from model building)
-h2o.performance(m3, valid=T) ## sampled validation data (from model building)
-h2o.performance(m3, data=train) ## full training data
-h2o.performance(m3, data=valid) ## full validation data
-h2o.performance(m3, data=test) ## full test data
+h2o.performance(m3, train=T) ## sampled training data (from model building)
+h2o.performance(m3, valid=T) ## sampled validation data (from model building)
+h2o.performance(m3, newdata=train) ## full training data
+h2o.performance(m3, newdata=valid) ## full validation data
+h2o.performance(m3, newdata=test) ## full test data
 ```
 
 To confirm that the reported confusion matrix on the validation set (here, the test set) was correct, we make a prediction on the test set and compare the confusion matrices explicitly:
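That consistency check is easy to express in base R. The sketch below is our own illustration with made-up labels (`truth`, `pred`, `cm`, and `err` are not part of the tutorial's H2O code): it builds a confusion matrix from explicit predictions and derives the misclassification rate from it.

```r
## Illustration only (base R, made-up data; not H2O code):
## a confusion matrix and its misclassification rate from explicit predictions.
truth <- factor(c("red", "red", "black", "black", "black"))
pred  <- factor(c("red", "black", "black", "black", "red"), levels = levels(truth))
cm  <- table(truth, pred)           ## rows = actual class, columns = predicted class
err <- 1 - sum(diag(cm)) / sum(cm)  ## off-diagonal entries are the errors
err                                 ## 2 of 5 labels are wrong, i.e. 0.4
```

The same quantity is what H2O reports as the `Error` column of its confusion matrix.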
@@ -282,15 +283,15 @@ hyper_params <- list(
 )
 hyper_params
 grid <- h2o.grid(
-  "deeplearning",
-  model_id="dl_grid",
+  algorithm="deeplearning",
+  grid_id="dl_grid",
   training_frame=sampled_train,
   validation_frame=valid,
   x=predictors,
   y=response,
   epochs=10,
   stopping_metric="misclassification",
-  stopping_tolerance=1e-2, ## stop when logloss does not improve by >=1% for 2 scoring events
+  stopping_tolerance=1e-2, ## stop when misclassification does not improve by >=1% for 2 scoring events
   stopping_rounds=2,
   score_validation_samples=10000, ## downsample validation set for faster scoring
   score_duty_cycle=0.025, ## don't score more than 2.5% of the wall time
@@ -310,75 +311,77 @@ grid
 Let's see which model had the lowest validation error:
 
 ```r
-## Find the best model and its full set of parameters (clunky for now, will be improved)
-scores <- cbind(as.data.frame(unlist((lapply(grid@model_ids, function(x)
-  { h2o.confusionMatrix(h2o.performance(h2o.getModel(x),valid=T))$Error[8] })) )), unlist(grid@model_ids))
-names(scores) <- c("misclassification","model")
-sorted_scores <- scores[order(scores$misclassification),]
-head(sorted_scores)
-best_model <- h2o.getModel(as.character(sorted_scores$model[1]))
+grid <- h2o.getGrid("dl_grid",sort_by="err",decreasing=FALSE)
+grid
+
+## See what other "sort_by" criteria are allowed
+grid <- h2o.getGrid("dl_grid",sort_by="wrong_thing",decreasing=FALSE)
+
+## Sort by logloss
+h2o.getGrid("dl_grid",sort_by="logloss",decreasing=FALSE)
+
+## Find the best model and its full set of parameters
+grid@summary_table[1,]
+best_model <- h2o.getModel(grid@model_ids[[1]])
+best_model
+
 print(best_model@allparameters)
-best_err <- sorted_scores$misclassification[1]
-print(best_err)
+print(h2o.performance(best_model, valid=T))
+print(h2o.logloss(best_model, valid=T))
 ```
 
 ### Random Hyper-Parameter Search
-Often, hyper-parameter search for more than 4 parameters can be done more efficiently with random parameter search than with grid search. Basically, chances are good to find one of many good models in less time than performing an exhaustive grid search. We simply build `N` models with parameters drawn randomly from user-specified distributions (here, uniform). For this example, we use the adaptive learning rate and focus on tuning the network architecture and the regularization parameters.
+For more than four hyper-parameters, random search is often more efficient than exhaustive grid search: chances are good that we find one of many good models in far less time. We simply build up to `max_models` models with parameters drawn randomly from user-specified distributions (here, uniform). For this example, we use the adaptive learning rate and focus on tuning the network architecture and the regularization parameters. Note that we keep the `grid_id` the same, which extends the original grid search and gives us a single "leaderboard" across both searches. We also let the grid search stop automatically once the performance at the top of the leaderboard no longer changes much, i.e., once the search has converged.
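To build intuition for what a `RandomDiscrete` search does, here is a base-R sketch of drawing one random candidate from discrete hyper-parameter lists. The names `hp_space` and `draw_one` and the draw logic are our own simplified assumption for illustration, not H2O's actual implementation, and no H2O cluster is needed to run it.

```r
## Sketch only: draw one random combination from discrete hyper-parameter lists.
hp_space <- list(
  activation = c("Rectifier", "Tanh", "Maxout"),
  hidden     = list(c(20, 20), c(50, 50), c(30, 30, 30)),
  l1         = seq(0, 1e-4, 1e-6)
)
set.seed(1234567)                     ## a seed makes the search reproducible
draw_one <- function(hp) lapply(hp, function(v) v[[sample(length(v), 1)]])
candidate <- draw_one(hp_space)       ## one random combination
str(candidate)
```

A random search simply repeats such draws up to `max_models` times (or until a time/convergence limit), training one model per draw.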
 
 ```r
-models <- c()
-for (i in 1:10) {
-  rand_activation <- c("TanhWithDropout", "RectifierWithDropout")[sample(1:2,1)]
-  rand_numlayers <- sample(2:5,1)
-  rand_hidden <- c(sample(10:50,rand_numlayers,T))
-  rand_l1 <- runif(1, 0, 1e-3)
-  rand_l2 <- runif(1, 0, 1e-3)
-  rand_dropout <- c(runif(rand_numlayers, 0, 0.6))
-  rand_input_dropout <- runif(1, 0, 0.5)
-  dlmodel <- h2o.deeplearning(
-    model_id=paste0("dl_random_model_", i),
-    training_frame=sampled_train,
-    validation_frame=valid,
-    x=predictors,
-    y=response,
-    # epochs=100, ## for real parameters: set high enough to get to convergence
-    epochs=1,
-    stopping_metric="misclassification",
-    stopping_tolerance=1e-2, ## stop when logloss does not improve by >=1% for 2 scoring events
-    stopping_rounds=2,
-    score_validation_samples=10000, ## downsample validation set for faster scoring
-    score_duty_cycle=0.025, ## don't score more than 2.5% of the wall time
-    max_w2=10, ## can help improve stability for Rectifier
-
-    ### Random parameters
-    activation=rand_activation,
-    hidden=rand_hidden,
-    l1=rand_l1,
-    l2=rand_l2,
-    input_dropout_ratio=rand_input_dropout,
-    hidden_dropout_ratios=rand_dropout
-  )
-  models <- c(models, dlmodel)
-}
+hyper_params <- list(
+  activation=c("Rectifier","Tanh","Maxout","RectifierWithDropout","TanhWithDropout","MaxoutWithDropout"),
+  hidden=list(c(20,20),c(50,50),c(30,30,30),c(25,25,25,25)),
+  input_dropout_ratio=c(0,0.05),
+  l1=seq(0,1e-4,1e-6),
+  l2=seq(0,1e-4,1e-6)
+)
+hyper_params
+
+## Stop once the top 5 models are within 1% of each other (i.e., the windowed average varies less than 1%)
+search_criteria = list(strategy = "RandomDiscrete", max_runtime_secs = 36000, max_models = 1000, seed=1234567, stopping_rounds=5, stopping_tolerance=1e-2)
+dl_random_grid <- h2o.grid(
+  algorithm="deeplearning",
+  grid_id = "dl_grid",
+  training_frame=sampled_train,
+  validation_frame=valid,
+  x=predictors,
+  y=response,
+  epochs=1,
+  stopping_metric="logloss",
+  stopping_tolerance=1e-2, ## stop when logloss does not improve by >=1% for 2 scoring events
+  stopping_rounds=2,
+  score_validation_samples=10000, ## downsample validation set for faster scoring
+  score_duty_cycle=0.025, ## don't score more than 2.5% of the wall time
+  max_w2=10, ## can help improve stability for Rectifier
+  hyper_params = hyper_params,
+  search_criteria = search_criteria
+)
+grid <- h2o.getGrid("dl_grid",sort_by="logloss",decreasing=FALSE)
+grid
+
+grid@summary_table[1,]
+best_model <- h2o.getModel(grid@model_ids[[1]]) ## model with lowest logloss
+best_model
 ```
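The `stopping_rounds=5, stopping_tolerance=1e-2` search criterion above compares windowed averages of the leaderboard's best metric. The base-R sketch below is our own reading of that semantics for illustration (the function `has_converged` and its exact windowing are assumptions, not H2O's implementation): stop once the average of the last `rounds` best values improves by less than `tol` relative to the previous window.

```r
## Sketch only (assumption, not H2O's exact rule): declare convergence once the
## windowed average of the best metric improves by less than `tol` relative to
## the previous window of `rounds` scoring events (lower metric = better).
has_converged <- function(best_history, rounds = 5, tol = 1e-2) {
  if (length(best_history) < 2 * rounds) return(FALSE)
  recent   <- mean(tail(best_history, rounds))
  previous <- mean(head(tail(best_history, 2 * rounds), rounds))
  (previous - recent) / previous < tol
}
## Still improving quickly -> keep searching; flat history -> stop.
has_converged(c(0.30, 0.25, 0.22, 0.21, 0.20, 0.199, 0.199, 0.198, 0.198, 0.198))
has_converged(rep(0.198, 10))
```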
 
-We continue to look for the model with the lowest validation misclassification rate:
+Let's look at the model with the lowest validation misclassification rate:
 
 ```r
-best_err <- 1 ##start with the best reference model from the grid search above, if available
-for (i in 1:length(models)) {
-  err <- h2o.confusionMatrix(h2o.performance(models[[i]],valid=T))$Error[8]
-  if (err < best_err) {
-    best_err <- err
-    best_model <- models[[i]]
-  }
-}
+grid <- h2o.getGrid("dl_grid",sort_by="err",decreasing=FALSE)
+best_model <- h2o.getModel(grid@model_ids[[1]]) ## model with lowest classification error (on validation, since it was available during training)
 h2o.confusionMatrix(best_model,valid=T)
 best_params <- best_model@allparameters
+best_params$activation
 best_params$hidden
+best_params$input_dropout_ratio
 best_params$l1
 best_params$l2
-best_params$input_dropout_ratio
 ```
 
 ###Checkpointing

tutorials/deeplearning/convert.sh

Lines changed: 4 additions & 2 deletions
@@ -1,8 +1,10 @@
 #!/bin/bash
 
 ## Turn R Markdown into regular Markdown
-sed -e 's/```{r.*}/```r/' deeplearning.Rmd > deeplearning.md
+echo "##AUTO-GENERATED - DO NOT EDIT##" > deeplearning.md
+sed -e 's/```{r.*}/```r/' deeplearning.Rmd >> deeplearning.md
 cp deeplearning.md README.md
 
 ## Turn R Markdown into plain R
-sed -e '1,\%```{r.*}%s:^:#:;/^```/,\%```{r.*}%s:^:#:;/```/d' deeplearning.Rmd > deeplearning.R
+echo "##AUTO-GENERATED - DO NOT EDIT##" > deeplearning.R
+sed -e '1,\%```{r.*}%s:^:#:;/^```/,\%```{r.*}%s:^:#:;/```/d' deeplearning.Rmd >> deeplearning.R
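The first sed one-liner above rewrites each R Markdown chunk header into a plain fenced code block. The same substitution can be shown in R itself; the variables `rmd` and `md` below are a made-up example (we assume simple, unnested fences).

```r
## The sed substitution, shown in R for illustration: rewrite an R Markdown
## chunk header such as "```{r, echo=TRUE}" into a plain fence "```r".
rmd <- c("```{r, echo=TRUE}", "plot(spiral)", "```")
md  <- sub("```\\{r.*\\}", "```r", rmd)   ## only the fence line matches
md
```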

tutorials/deeplearning/deeplearning.R

Lines changed: 69 additions & 66 deletions
@@ -1,3 +1,4 @@
+##AUTO-GENERATED - DO NOT EDIT##
 ## Classification and Regression with H2O Deep Learning
 #
 #* Introduction
@@ -14,11 +15,11 @@
 #* Deep Learning Tips & Tricks
 #
 ### Introduction
-#This tutorial shows how a H2O [Deep Learning](http://en.wikipedia.org/wiki/Deep_learning) model can be used to do supervised classification and regression. This tutorial covers usage of H2O from R. A python version of this tutorial will be available as well in a separate document. This file is available in plain R, R markdown and regular markdown formats, and the plots are available as PDF files. All documents are available [on Github](https://github.com/h2oai/h2o-world-2015-training/raw/master/tutorials/deeplearning/).
+#This tutorial shows how an H2O [Deep Learning](http://en.wikipedia.org/wiki/Deep_learning) model can be used to do supervised classification and regression. This tutorial covers usage of H2O from R; a Python version will be available as well in a separate document. This file is available in plain R, R Markdown and regular Markdown formats, and the plots are available as PDF files. All documents are available [on Github](.).
 #
 #If run from plain R, execute R in the directory of this script. If run from RStudio, be sure to setwd() to the location of this script. h2o.init() starts H2O in R's current working directory. h2o.importFile() looks for files from the perspective of where H2O was started.
 #
-#More examples and explanations can be found in our [H2O Deep Learning booklet](http://h2o.ai/resources/) and on our [H2O Github Repository](http://github.com/h2oai/h2o-3/). The PDF slide deck can be found [on Github](https://github.com/h2oai/h2o-world-2015-training/raw/master/tutorials/deeplearning/H2ODeepLearning.pdf).
+#More examples and explanations can be found in our [H2O Deep Learning booklet](http://h2o.ai/resources/) and on our [H2O Github Repository](http://github.com/h2oai/h2o-3/). The PDF slide deck can be found [on Github](./H2ODeepLearning.pdf).
 #
 #### H2O R Package
 #
@@ -48,7 +49,7 @@ example(h2o.deeplearning)
 #
 #We visualize the nature of H2O Deep Learning (DL), H2O's tree methods (GBM/DRF) and H2O's generalized linear modeling (GLM) by plotting the decision boundary between the red and black spirals:
 #
-setwd("~/h2o-world-2015-training/tutorials/deeplearning") ##For RStudio
+setwd("~/h2o-tutorials/tutorials/deeplearning") ##For RStudio
 spiral <- h2o.importFile(path = normalizePath("../data/spiral.csv"))
 grid <- h2o.importFile(path = normalizePath("../data/grid.csv"))
 # Define helper to plot contours
@@ -215,11 +216,11 @@ summary(m3)
 #
 #Let's compare the training error with the validation and test set errors
 #
-h2o.performance(m3, train=T) ## sampled training data (from model building)
-h2o.performance(m3, valid=T) ## sampled validation data (from model building)
-h2o.performance(m3, data=train) ## full training data
-h2o.performance(m3, data=valid) ## full validation data
-h2o.performance(m3, data=test) ## full test data
+h2o.performance(m3, train=T) ## sampled training data (from model building)
+h2o.performance(m3, valid=T) ## sampled validation data (from model building)
+h2o.performance(m3, newdata=train) ## full training data
+h2o.performance(m3, newdata=valid) ## full validation data
+h2o.performance(m3, newdata=test) ## full test data
 #
 #To confirm that the reported confusion matrix on the validation set (here, the test set) was correct, we make a prediction on the test set and compare the confusion matrices explicitly:
 #
@@ -245,15 +246,15 @@ hyper_params <- list(
 )
 hyper_params
 grid <- h2o.grid(
-  "deeplearning",
-  model_id="dl_grid",
+  algorithm="deeplearning",
+  grid_id="dl_grid",
   training_frame=sampled_train,
   validation_frame=valid,
   x=predictors,
   y=response,
   epochs=10,
   stopping_metric="misclassification",
-  stopping_tolerance=1e-2, ## stop when logloss does not improve by >=1% for 2 scoring events
+  stopping_tolerance=1e-2, ## stop when misclassification does not improve by >=1% for 2 scoring events
   stopping_rounds=2,
   score_validation_samples=10000, ## downsample validation set for faster scoring
   score_duty_cycle=0.025, ## don't score more than 2.5% of the wall time
@@ -271,71 +272,73 @@ grid
 #
 #Let's see which model had the lowest validation error:
 #
-## Find the best model and its full set of parameters (clunky for now, will be improved)
-scores <- cbind(as.data.frame(unlist((lapply(grid@model_ids, function(x)
-  { h2o.confusionMatrix(h2o.performance(h2o.getModel(x),valid=T))$Error[8] })) )), unlist(grid@model_ids))
-names(scores) <- c("misclassification","model")
-sorted_scores <- scores[order(scores$misclassification),]
-head(sorted_scores)
-best_model <- h2o.getModel(as.character(sorted_scores$model[1]))
+grid <- h2o.getGrid("dl_grid",sort_by="err",decreasing=FALSE)
+grid
+
+## See what other "sort_by" criteria are allowed
+grid <- h2o.getGrid("dl_grid",sort_by="wrong_thing",decreasing=FALSE)
+
+## Sort by logloss
+h2o.getGrid("dl_grid",sort_by="logloss",decreasing=FALSE)
+
+## Find the best model and its full set of parameters
+grid@summary_table[1,]
+best_model <- h2o.getModel(grid@model_ids[[1]])
+best_model
+
 print(best_model@allparameters)
-best_err <- sorted_scores$misclassification[1]
-print(best_err)
+print(h2o.performance(best_model, valid=T))
+print(h2o.logloss(best_model, valid=T))
 #
 #### Random Hyper-Parameter Search
-#Often, hyper-parameter search for more than 4 parameters can be done more efficiently with random parameter search than with grid search. Basically, chances are good to find one of many good models in less time than performing an exhaustive grid search. We simply build `N` models with parameters drawn randomly from user-specified distributions (here, uniform). For this example, we use the adaptive learning rate and focus on tuning the network architecture and the regularization parameters.
-#
-models <- c()
-for (i in 1:10) {
-  rand_activation <- c("TanhWithDropout", "RectifierWithDropout")[sample(1:2,1)]
-  rand_numlayers <- sample(2:5,1)
-  rand_hidden <- c(sample(10:50,rand_numlayers,T))
-  rand_l1 <- runif(1, 0, 1e-3)
-  rand_l2 <- runif(1, 0, 1e-3)
-  rand_dropout <- c(runif(rand_numlayers, 0, 0.6))
-  rand_input_dropout <- runif(1, 0, 0.5)
-  dlmodel <- h2o.deeplearning(
-    model_id=paste0("dl_random_model_", i),
-    training_frame=sampled_train,
-    validation_frame=valid,
-    x=predictors,
-    y=response,
-    # epochs=100, ## for real parameters: set high enough to get to convergence
-    epochs=1,
-    stopping_metric="misclassification",
-    stopping_tolerance=1e-2, ## stop when logloss does not improve by >=1% for 2 scoring events
-    stopping_rounds=2,
-    score_validation_samples=10000, ## downsample validation set for faster scoring
-    score_duty_cycle=0.025, ## don't score more than 2.5% of the wall time
-    max_w2=10, ## can help improve stability for Rectifier
+#For more than four hyper-parameters, random search is often more efficient than exhaustive grid search: chances are good that we find one of many good models in far less time. We simply build up to `max_models` models with parameters drawn randomly from user-specified distributions (here, uniform). For this example, we use the adaptive learning rate and focus on tuning the network architecture and the regularization parameters. Note that we keep the `grid_id` the same, which extends the original grid search and gives us a single "leaderboard" across both searches. We also let the grid search stop automatically once the performance at the top of the leaderboard no longer changes much, i.e., once the search has converged.
+#
+hyper_params <- list(
+  activation=c("Rectifier","Tanh","Maxout","RectifierWithDropout","TanhWithDropout","MaxoutWithDropout"),
+  hidden=list(c(20,20),c(50,50),c(30,30,30),c(25,25,25,25)),
+  input_dropout_ratio=c(0,0.05),
+  l1=seq(0,1e-4,1e-6),
+  l2=seq(0,1e-4,1e-6)
+)
+hyper_params
 
-    ### Random parameters
-    activation=rand_activation,
-    hidden=rand_hidden,
-    l1=rand_l1,
-    l2=rand_l2,
-    input_dropout_ratio=rand_input_dropout,
-    hidden_dropout_ratios=rand_dropout
-  )
-  models <- c(models, dlmodel)
-}
+## Stop once the top 5 models are within 1% of each other (i.e., the windowed average varies less than 1%)
+search_criteria = list(strategy = "RandomDiscrete", max_runtime_secs = 36000, max_models = 1000, seed=1234567, stopping_rounds=5, stopping_tolerance=1e-2)
+dl_random_grid <- h2o.grid(
+  algorithm="deeplearning",
+  grid_id = "dl_grid",
+  training_frame=sampled_train,
+  validation_frame=valid,
+  x=predictors,
+  y=response,
+  epochs=1,
+  stopping_metric="logloss",
+  stopping_tolerance=1e-2, ## stop when logloss does not improve by >=1% for 2 scoring events
+  stopping_rounds=2,
+  score_validation_samples=10000, ## downsample validation set for faster scoring
+  score_duty_cycle=0.025, ## don't score more than 2.5% of the wall time
+  max_w2=10, ## can help improve stability for Rectifier
+  hyper_params = hyper_params,
+  search_criteria = search_criteria
+)
+grid <- h2o.getGrid("dl_grid",sort_by="logloss",decreasing=FALSE)
+grid
+
+grid@summary_table[1,]
+best_model <- h2o.getModel(grid@model_ids[[1]]) ## model with lowest logloss
+best_model
 #
-#We continue to look for the model with the lowest validation misclassification rate:
-#
-best_err <- 1 ##start with the best reference model from the grid search above, if available
-for (i in 1:length(models)) {
-  err <- h2o.confusionMatrix(h2o.performance(models[[i]],valid=T))$Error[8]
-  if (err < best_err) {
-    best_err <- err
-    best_model <- models[[i]]
-  }
-}
+#Let's look at the model with the lowest validation misclassification rate:
+#
+grid <- h2o.getGrid("dl_grid",sort_by="err",decreasing=FALSE)
+best_model <- h2o.getModel(grid@model_ids[[1]]) ## model with lowest classification error (on validation, since it was available during training)
 h2o.confusionMatrix(best_model,valid=T)
 best_params <- best_model@allparameters
+best_params$activation
 best_params$hidden
+best_params$input_dropout_ratio
 best_params$l1
 best_params$l2
-best_params$input_dropout_ratio
 #
 ####Checkpointing
 #Let's continue training the manually tuned model from before, for 2 more epochs. Note that since many important parameters such as `epochs, l1, l2, max_w2, score_interval, train_samples_per_iteration, input_dropout_ratio, hidden_dropout_ratios, score_duty_cycle, classification_stop, regression_stop, variable_importances, force_load_balance` can be modified between checkpoint restarts, it is best to specify as many parameters as possible explicitly.
