|
| 1 | +--- |
| 2 | +title: Import data from R Datasets package |
| 3 | +description: Import data from the R Datasets package in your SAP HANA, express edition instance |
| 4 | +primary_tag: products>sap-hana\, express-edition |
| 5 | +tags: [ tutorial>beginner, products>sap-hana\, express-edition, topic>machine-learning ] |
| 6 | +--- |
| 7 | +## Prerequisites |
| 8 | +- Proficiency: beginner |
| 9 | + |
| 10 | +### You will learn |
| 11 | + |
| 12 | +As part of the [R Datasets package](https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/00Index.html), you can get access to over a hundred sample datasets to address many Machine learning scenarios. |
| 13 | + |
| 14 | +In this tutorial, you will learn how to proceed to import the sample dataset into your SAP HANA, express edition instance. |
| 15 | + |
| 16 | +## Details |
| 17 | + |
| 18 | +### Time to Complete |
| 19 | +**10 Min**. |
| 20 | + |
| 21 | +[ACCORDION-BEGIN [Prerequisite: ](Prepare your environment)] |
| 22 | + |
| 23 | +The steps detailed in this tutorial and the related links will assume that you have completed the following tutorial: |
| 24 | + |
| 25 | +- [Install a SQL query tool for SAP HANA, express edition](https://www.sap.com/developer/tutorials/mlb-hxe-tools-sql.html) |
| 26 | +- [Prepare your SAP HANA, express edition instance for Machine Learning](https://www.sap.com/developer/tutorials/mlb-hxe-setup-basic.html) |
| 27 | +- [Configure the SAP HANA R integration with SAP HANA, express edition](https://www.sap.com/developer/tutorials/mlb-hxe-setup-r.html) |
| 28 | + |
| 29 | +[ACCORDION-END] |
| 30 | + |
| 31 | +[ACCORDION-BEGIN [Prerequisite : ](Create a dedicated schema)] |
| 32 | + |
| 33 | +In addition, it is a good practice to separate data into different schema based on their origin. |
| 34 | + |
| 35 | +In this tutorial, you will be using the **`R_DATA`** schema to load the SAP Predictive Analytics sample data. |
| 36 | + |
| 37 | +If you have already created the schema, move to the next step. |
| 38 | + |
| 39 | +Connect to the **HXE** tenant using the **`ML_USER`** user credentials and execute the following SQL statement: |
| 40 | + |
| 41 | +```SQL |
| 42 | +CREATE SCHEMA R_DATA; |
| 43 | +``` |
| 44 | + |
| 45 | +[ACCORDION-END] |
| 46 | + |
| 47 | +[ACCORDION-BEGIN [Info : ](Supported Data Types)] |
| 48 | + |
| 49 | +The data structure supported to exchange data between the SAP HANA database and the R environment is the R data frame, which has a similar data structure to a column table in the SAP HANA database. |
| 50 | + |
| 51 | +The supported data types are listed below: |
| 52 | + |
| 53 | +| R Type | SAP HANA SQL Type | |
| 54 | +|:----------------------|---------------------| |
| 55 | +| numeric (integer) | TINYINT <br> SMALLINT <br> INTEGER |
| 56 | +| numeric (double) | REAL <br> DOUBLE <br> FLOAT <br> FLOAT(p) <br> DECIMAL <br> DECIMAL(p,s) <br> BIGINT |
| 57 | +| character / factor | VARCHAR <br> NVARCHAR <br> CLOB <br> NCLOB |
| 58 | +| Date | DATE |
| 59 | +| Date/Time (`POSIXct`) | TIMESTAMP <br> SECONDDATE |
| 60 | +| Raw | VARBINARY <br> BLOB |
| 61 | + |
| 62 | +For example, the `ts` R type is not supported. |
| 63 | + |
| 64 | +[ACCORDION-END] |
| 65 | + |
| 66 | +[ACCORDION-BEGIN [Step 1: ](Get The Dataset Structure)] |
| 67 | + |
| 68 | +Before being able to import the data inside of your SAP HANA, express edition instance you will need to create the dataset table. |
| 69 | + |
| 70 | +The [R Datasets package documentation](https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/00Index.html) doesn't always provide the details to create the corresponding table as data type are not always documented. |
| 71 | + |
| 72 | +However, as most datasets are in fact available as data frame or vectors, and sometime time series, you can easily retrieve the structure and details about the data types. |
| 73 | + |
| 74 | +The following script will give you the information for the Iris dataset. |
| 75 | + |
| 76 | +Connect to the **HXE** tenant using the **`ML_USER`** user credentials and execute the following SQL statement: |
| 77 | + |
| 78 | +```SQL |
| 79 | +SET SCHEMA R_DATA; |
| 80 | + |
| 81 | +-- Uncomment the drop statement is you want to run it from scratch |
| 82 | +-- DROP TABLE COLUMN_LIST; |
| 83 | +-- DROP PROCEDURE GET_COLUMN_NAME; |
| 84 | +-- DROP PROCEDURE DISPLAY_COLUMN_NAME; |
| 85 | + |
| 86 | +CREATE COLUMN TABLE COLUMN_LIST ( |
| 87 | + "name" VARCHAR(5000), |
| 88 | + "type" VARCHAR(5000) |
| 89 | +); |
| 90 | + |
| 91 | + |
| 92 | +CREATE PROCEDURE GET_COLUMN_NAME(OUT col_list "COLUMN_LIST") |
| 93 | +LANGUAGE RLANG AS |
| 94 | +BEGIN |
| 95 | + library(datasets) |
| 96 | + |
| 97 | + data(iris) |
| 98 | + |
| 99 | + n <- c(colnames(as.data.frame(iris))) |
| 100 | + t <- sapply(as.data.frame(iris), class) |
| 101 | + |
| 102 | + col_name <- data.frame("name" = n) |
| 103 | + col_type <- data.frame("type" = t) |
| 104 | + |
| 105 | + |
| 106 | + col_list <- cbind(as.data.frame(col_name, col_type)) |
| 107 | +END; |
| 108 | + |
| 109 | + |
| 110 | +CREATE PROCEDURE DISPLAY_COLUMN_NAME() |
| 111 | +AS BEGIN |
| 112 | + CALL GET_COLUMN_NAME(col_list); |
| 113 | + INSERT INTO COLUMN_LIST SELECT * FROM :col_list; |
| 114 | +END; |
| 115 | + |
| 116 | +CALL DISPLAY_COLUMN_NAME(); |
| 117 | +SELECT * FROM COLUMN_LIST; |
| 118 | +``` |
| 119 | + |
| 120 | +The expected output should look like this: |
| 121 | + |
| 122 | + |
| 123 | +| Name | Type | |
| 124 | +|:---------------|-------------:| |
| 125 | +| `Sepal.Length` | numeric |
| 126 | +| `Sepal.Width` | numeric |
| 127 | +| `Petal.Length` | numeric |
| 128 | +| `Petal.Width` | numeric |
| 129 | +| `Species` | factor |
| 130 | + |
| 131 | +And you can deduce the following table structure: |
| 132 | + |
| 133 | +``` |
| 134 | +CREATE COLUMN TABLE IRIS ( |
| 135 | + "Sepal.Length" DOUBLE, |
| 136 | + "Sepal.Width" DOUBLE, |
| 137 | + "Petal.Length" DOUBLE, |
| 138 | + "Petal.Width" DOUBLE, |
| 139 | + "Species" VARCHAR(5000) |
| 140 | +); |
| 141 | +``` |
| 142 | + |
| 143 | +[ACCORDION-END] |
| 144 | + |
| 145 | +[ACCORDION-BEGIN [Step 2 : ](Import the Iris Dataset)] |
| 146 | + |
| 147 | +Now that we have the dataset structure, it is fairly easy to import the data running the following SQL. |
| 148 | + |
| 149 | +```sql |
| 150 | +SET SCHEMA R_DATA; |
| 151 | + |
| 152 | +-- Uncomment the drop statement is you want to run it from scratch |
| 153 | +-- DROP TABLE IRIS; |
| 154 | +-- DROP PROCEDURE LOAD_IRIS; |
| 155 | +-- DROP PROCEDURE DISPLAY_IRIS; |
| 156 | + |
| 157 | +CREATE COLUMN TABLE IRIS ( |
| 158 | + "Sepal.Length" DOUBLE, |
| 159 | + "Sepal.Width" DOUBLE, |
| 160 | + "Petal.Length" DOUBLE, |
| 161 | + "Petal.Width" DOUBLE, |
| 162 | + "Species" VARCHAR(5000) |
| 163 | +); |
| 164 | + |
| 165 | +CREATE PROCEDURE LOAD_IRIS(OUT iris "IRIS") |
| 166 | +LANGUAGE RLANG AS |
| 167 | +BEGIN |
| 168 | + library(datasets) |
| 169 | + data(iris) |
| 170 | + iris <- cbind(as.data.frame(iris)) |
| 171 | +END; |
| 172 | + |
| 173 | +CREATE PROCEDURE DISPLAY_IRIS() |
| 174 | +AS BEGIN |
| 175 | + CALL LOAD_IRIS(iris); |
| 176 | + INSERT INTO IRIS SELECT * FROM :iris; |
| 177 | +END; |
| 178 | + |
| 179 | +CALL DISPLAY_IRIS(); |
| 180 | +SELECT * FROM IRIS; |
| 181 | +``` |
| 182 | + |
| 183 | +The Dataset package provide data in many format, with the most common one being a data frame. |
| 184 | + |
| 185 | +For example the `ability.cov` is a covariance matrix, and the script used before will still work thanks to the use of the `as.data.frame` method. |
| 186 | + |
| 187 | +Running the step 1 SQL, you can build the following script to load the Air Quality data. |
| 188 | + |
| 189 | +```sql |
| 190 | +SET SCHEMA R_DATA; |
| 191 | + |
| 192 | +-- Uncomment the drop statement is you want to run it from scratch |
| 193 | +-- DROP TABLE ABILITY; |
| 194 | +-- DROP PROCEDURE LOAD_ABILITY; |
| 195 | +-- DROP PROCEDURE DISPLAY_ABILITY; |
| 196 | + |
| 197 | +CREATE COLUMN TABLE ABILITY ( |
| 198 | + "cov.general" DOUBLE, |
| 199 | + "cov.picture" DOUBLE, |
| 200 | + "cov.blocks" DOUBLE, |
| 201 | + "cov.maze" DOUBLE, |
| 202 | + "cov.reading" DOUBLE, |
| 203 | + "cov.vocab" DOUBLE, |
| 204 | + "center" DOUBLE, |
| 205 | + "n.obs" DOUBLE |
| 206 | +); |
| 207 | + |
| 208 | +CREATE PROCEDURE LOAD_ABILITY(OUT ability "ABILITY") |
| 209 | +LANGUAGE RLANG AS |
| 210 | +BEGIN |
| 211 | + library(datasets) |
| 212 | + data(ability.cov) |
| 213 | + ability <- cbind(as.data.frame(ability.cov)) |
| 214 | +END; |
| 215 | + |
| 216 | +CREATE PROCEDURE DISPLAY_ABILITY() |
| 217 | +AS BEGIN |
| 218 | + CALL LOAD_ABILITY(ability); |
| 219 | + INSERT INTO ABILITY SELECT * FROM :ability; |
| 220 | +END; |
| 221 | + |
| 222 | +CALL DISPLAY_ABILITY(); |
| 223 | +SELECT * FROM ABILITY; |
| 224 | +``` |
| 225 | + |
| 226 | +[ACCORDION-END] |
0 commit comments