Skip to content

Commit 5ebb570

Browse files
committed
mlb week 6
1 parent 1746a10 commit 5ebb570

2 files changed

Lines changed: 227 additions & 1 deletion

File tree

tutorials/mlb-hxe-import-data-pa/mlb-hxe-import-data-pa.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -9,7 +9,7 @@ tags: [ tutorial>beginner, products>sap-hana\, express-edition, topic>machine-le
99

1010
### You will learn
1111

12-
As Part of the [SAP Predictive Analytics documentation](https://help.sap.com/viewer/p/SAP_PREDICTIVE_ANALYTICS), you can download sample datasets to address many Machine learning scenarios.
12+
As part of the [SAP Predictive Analytics documentation](https://help.sap.com/viewer/p/SAP_PREDICTIVE_ANALYTICS), you can download sample datasets to address many Machine learning scenarios.
1313

1414
In this tutorial, you will learn how to import all the SAP Predictive Analytics sample dataset into your SAP HANA, express edition instance.
1515

Lines changed: 226 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,226 @@
1+
---
2+
title: Import data from R Datasets package
3+
description: Import data from the R Datasets package in your SAP HANA, express edition instance
4+
primary_tag: products>sap-hana\, express-edition
5+
tags: [ tutorial>beginner, products>sap-hana\, express-edition, topic>machine-learning ]
6+
---
7+
## Prerequisites
8+
- Proficiency: beginner
9+
10+
### You will learn
11+
12+
As part of the [R Datasets package](https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/00Index.html), you can get access to over a hundred sample datasets to address many Machine learning scenarios.
13+
14+
In this tutorial, you will learn how to proceed to import the sample dataset into your SAP HANA, express edition instance.
15+
16+
## Details
17+
18+
### Time to Complete
19+
**10 Min**.
20+
21+
[ACCORDION-BEGIN [Prerequisite: ](Prepare your environment)]
22+
23+
The steps detailed in this tutorial and the related links will assume that you have completed the following tutorial:
24+
25+
- [Install a SQL query tool for SAP HANA, express edition](https://www.sap.com/developer/tutorials/mlb-hxe-tools-sql.html)
26+
- [Prepare your SAP HANA, express edition instance for Machine Learning](https://www.sap.com/developer/tutorials/mlb-hxe-setup-basic.html)
27+
- [Configure the SAP HANA R integration with SAP HANA, express edition](https://www.sap.com/developer/tutorials/mlb-hxe-setup-r.html)
28+
29+
[ACCORDION-END]
30+
31+
[ACCORDION-BEGIN [Prerequisite : ](Create a dedicated schema)]
32+
33+
In addition, it is a good practice to separate data into different schema based on their origin.
34+
35+
In this tutorial, you will be using the **`R_DATA`** schema to load the SAP Predictive Analytics sample data.
36+
37+
If you have already created the schema, move to the next step.
38+
39+
Connect to the **HXE** tenant using the **`ML_USER`** user credentials and execute the following SQL statement:
40+
41+
```SQL
42+
CREATE SCHEMA R_DATA;
43+
```
44+
45+
[ACCORDION-END]
46+
47+
[ACCORDION-BEGIN [Info : ](Supported Data Types)]
48+
49+
The data structure supported to exchange data between the SAP HANA database and the R environment is the R data frame, which has a similar data structure to a column table in the SAP HANA database.
50+
51+
The supported data types are listed below:
52+
53+
| R Type | SAP HANA SQL Type |
54+
|:----------------------|---------------------|
55+
| numeric (integer) | TINYINT <br> SMALLINT <br> INTEGER
56+
| numeric (double) | REAL <br> DOUBLE <br> FLOAT <br> FLOAT(p) <br> DECIMAL <br> DECIMAL(p,s) <br> BIGINT
57+
| character / factor | VARCHAR <br> NVARCHAR <br> CLOB <br> NCLOB
58+
| Date | DATE
59+
| Date/Time (`POSIXct`) | TIMESTAMP <br> SECONDDATE
60+
| Raw | VARBINARY <br> BLOB
61+
62+
For example, the `ts` R type is not supported.
63+
64+
[ACCORDION-END]
65+
66+
[ACCORDION-BEGIN [Step 1: ](Get The Dataset Structure)]
67+
68+
Before being able to import the data inside of your SAP HANA, express edition instance you will need to create the dataset table.
69+
70+
The [R Datasets package documentation](https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/00Index.html) doesn't always provide the details to create the corresponding table as data type are not always documented.
71+
72+
However, as most datasets are in fact available as data frame or vectors, and sometime time series, you can easily retrieve the structure and details about the data types.
73+
74+
The following script will give you the information for the Iris dataset.
75+
76+
Connect to the **HXE** tenant using the **`ML_USER`** user credentials and execute the following SQL statement:
77+
78+
```SQL
79+
SET SCHEMA R_DATA;
80+
81+
-- Uncomment the drop statement is you want to run it from scratch
82+
-- DROP TABLE COLUMN_LIST;
83+
-- DROP PROCEDURE GET_COLUMN_NAME;
84+
-- DROP PROCEDURE DISPLAY_COLUMN_NAME;
85+
86+
CREATE COLUMN TABLE COLUMN_LIST (
87+
"name" VARCHAR(5000),
88+
"type" VARCHAR(5000)
89+
);
90+
91+
92+
CREATE PROCEDURE GET_COLUMN_NAME(OUT col_list "COLUMN_LIST")
93+
LANGUAGE RLANG AS
94+
BEGIN
95+
library(datasets)
96+
97+
data(iris)
98+
99+
n <- c(colnames(as.data.frame(iris)))
100+
t <- sapply(as.data.frame(iris), class)
101+
102+
col_name <- data.frame("name" = n)
103+
col_type <- data.frame("type" = t)
104+
105+
106+
col_list <- cbind(as.data.frame(col_name, col_type))
107+
END;
108+
109+
110+
CREATE PROCEDURE DISPLAY_COLUMN_NAME()
111+
AS BEGIN
112+
CALL GET_COLUMN_NAME(col_list);
113+
INSERT INTO COLUMN_LIST SELECT * FROM :col_list;
114+
END;
115+
116+
CALL DISPLAY_COLUMN_NAME();
117+
SELECT * FROM COLUMN_LIST;
118+
```
119+
120+
The expected output should look like this:
121+
122+
123+
| Name | Type |
124+
|:---------------|-------------:|
125+
| `Sepal.Length` | numeric
126+
| `Sepal.Width` | numeric
127+
| `Petal.Length` | numeric
128+
| `Petal.Width` | numeric
129+
| `Species` | factor
130+
131+
And you can deduce the following table structure:
132+
133+
```
134+
CREATE COLUMN TABLE IRIS (
135+
"Sepal.Length" DOUBLE,
136+
"Sepal.Width" DOUBLE,
137+
"Petal.Length" DOUBLE,
138+
"Petal.Width" DOUBLE,
139+
"Species" VARCHAR(5000)
140+
);
141+
```
142+
143+
[ACCORDION-END]
144+
145+
[ACCORDION-BEGIN [Step 2 : ](Import the Iris Dataset)]
146+
147+
Now that we have the dataset structure, it is fairly easy to import the data running the following SQL.
148+
149+
```sql
150+
SET SCHEMA R_DATA;
151+
152+
-- Uncomment the drop statement is you want to run it from scratch
153+
-- DROP TABLE IRIS;
154+
-- DROP PROCEDURE LOAD_IRIS;
155+
-- DROP PROCEDURE DISPLAY_IRIS;
156+
157+
CREATE COLUMN TABLE IRIS (
158+
"Sepal.Length" DOUBLE,
159+
"Sepal.Width" DOUBLE,
160+
"Petal.Length" DOUBLE,
161+
"Petal.Width" DOUBLE,
162+
"Species" VARCHAR(5000)
163+
);
164+
165+
CREATE PROCEDURE LOAD_IRIS(OUT iris "IRIS")
166+
LANGUAGE RLANG AS
167+
BEGIN
168+
library(datasets)
169+
data(iris)
170+
iris <- cbind(as.data.frame(iris))
171+
END;
172+
173+
CREATE PROCEDURE DISPLAY_IRIS()
174+
AS BEGIN
175+
CALL LOAD_IRIS(iris);
176+
INSERT INTO IRIS SELECT * FROM :iris;
177+
END;
178+
179+
CALL DISPLAY_IRIS();
180+
SELECT * FROM IRIS;
181+
```
182+
183+
The Dataset package provide data in many format, with the most common one being a data frame.
184+
185+
For example the `ability.cov` is a covariance matrix, and the script used before will still work thanks to the use of the `as.data.frame` method.
186+
187+
Running the step 1 SQL, you can build the following script to load the Air Quality data.
188+
189+
```sql
190+
SET SCHEMA R_DATA;
191+
192+
-- Uncomment the drop statement is you want to run it from scratch
193+
-- DROP TABLE ABILITY;
194+
-- DROP PROCEDURE LOAD_ABILITY;
195+
-- DROP PROCEDURE DISPLAY_ABILITY;
196+
197+
CREATE COLUMN TABLE ABILITY (
198+
"cov.general" DOUBLE,
199+
"cov.picture" DOUBLE,
200+
"cov.blocks" DOUBLE,
201+
"cov.maze" DOUBLE,
202+
"cov.reading" DOUBLE,
203+
"cov.vocab" DOUBLE,
204+
"center" DOUBLE,
205+
"n.obs" DOUBLE
206+
);
207+
208+
CREATE PROCEDURE LOAD_ABILITY(OUT ability "ABILITY")
209+
LANGUAGE RLANG AS
210+
BEGIN
211+
library(datasets)
212+
data(ability.cov)
213+
ability <- cbind(as.data.frame(ability.cov))
214+
END;
215+
216+
CREATE PROCEDURE DISPLAY_ABILITY()
217+
AS BEGIN
218+
CALL LOAD_ABILITY(ability);
219+
INSERT INTO ABILITY SELECT * FROM :ability;
220+
END;
221+
222+
CALL DISPLAY_ABILITY();
223+
SELECT * FROM ABILITY;
224+
```
225+
226+
[ACCORDION-END]

0 commit comments

Comments
 (0)