title: "Perform Chunking Analysis using rxDataStep (Data Science Deep Dive) | Microsoft Docs"
2
+
title: "Perform Chunking Analysis using rxDataStep| Microsoft Docs"
3
3
ms.custom: ""
4
-
ms.date: "10/03/2016"
4
+
ms.date: "05/03/2017"
5
5
ms.prod: "sql-server-2016"
6
6
ms.reviewer: ""
7
7
ms.suite: ""
@@ -19,115 +19,111 @@ author: "jeannt"
19
19
ms.author: "jeannt"
20
20
manager: "jhubbard"
21
21
---
22
-
# Lesson 3-3 - Perform Chunking Analysis using rxDataStep
23
-
The *rxDataStep* function can be used to process data in chunks, rather than requiring that the entire dataset be loaded into memory and processed at one time, as in traditional R. The way it works is that you read the data in chunks and use R functions to process each chunk of data in turn, and then write the summary results for each chunk to a common [!INCLUDE[ssNoVersion](../../includes/ssnoversion-md.md)] data source.
24
-
25
-
In this lesson, you'll practice this technique by using the *table* function in R to compute a contingency table.
26
-
27
-
> [!TIP]
28
-
> This example is meant for instructional purposes only. If you need to tabulate real-world data sets, we recommend that you use the *rxCrossTabs* or *rxCube* functions built into **RevoScaleR**, which are optimized for this sort of operation.
29
-
30
-
## Partition Data by Values
31
-
32
-
1. First, create a custom R function named *ProcessChunk* that calls the *table* function on each chunk of data.
33
-
34
-
```R
35
-
ProcessChunk<-function( dataList) {
36
-
# Convert the input list to a data frame and compute contingency table
37
-
chunkTable<- table(as.data.frame(dataList))
38
-
39
-
# Convert table output to a data frame with a single row
40
-
varNames<- names(chunkTable)
41
-
varValues<-as.vector(chunkTable)
42
-
dim(varValues)<-c(1, length(varNames))
43
-
chunkDF<-as.data.frame(varValues)
44
-
names(chunkDF)<-varNames
45
-
46
-
# Return the data frame, which has a single row
47
-
return( chunkDF )
48
-
}
49
-
```
50
-
51
-
52
-
2. Set the compute context to the server.
22
+
# Perform Chunking Analysis using rxDataStep
23
+
24
+
The **rxDataStep** function can be used to process data in chunks, rather than requiring that the entire dataset be loaded into memory and processed at one time, as in traditional R. The way it works is that you read the data in chunks and use R functions to process each chunk of data in turn, and then write the summary results for each chunk to a common [!INCLUDE[ssNoVersion](../../includes/ssnoversion-md.md)] data source.
25
+
26
+
In this lesson, you'll practice this technique by using the `table` function in R, to compute a contingency table.
27
+
28
+
> [!TIP]
29
+
> This example is meant for instructional purposes only. If you need to tabulate real-world data sets, we recommend that you use the **rxCrossTabs** or **rxCube** functions in **RevoScaleR**, which are optimized for this sort of operation.
30
+
31
+
## Partition Data by Values
32
+
33
+
1. First, create a custom R function that calls the *table* function on each chunk of data, and name it `ProcessChunk`.
34
+
35
+
```R
36
+
ProcessChunk<-function( dataList) {
37
+
# Convert the input list to a dataframe and compute contingency table
38
+
chunkTable<- table(as.data.frame(dataList))
39
+
40
+
# Convert table output to a data frame with a single row
41
+
varNames<-names(chunkTable)
42
+
varValues<-as.vector(chunkTable)
43
+
dim(varValues)<-c(1, length(varNames))
44
+
chunkDF<-as.data.frame(varValues)
45
+
names(chunkDF) <-varNames
46
+
47
+
# Return the data frame, which has a single row
48
+
return( chunkDF )
49
+
}
50
+
```
51
+
52
+
2. Set the compute context to the server.
53
53
54
-
```R
55
-
rxSetComputeContext( sqlCompute )
56
-
```
54
+
```R
55
+
rxSetComputeContext( sqlCompute )
56
+
```
57
57
58
-
3. You'll define a SQL Server data source to hold the data you're processing. Start by assigning a SQL query to a variable.
58
+
3. You'll define a SQL Server data source to hold the data you're processing. Start by assigning a SQL query to a variable.
59
59
60
-
```R
61
-
dayQuery<-"SELECT DayOfWeek FROM AirDemoSmallTest"
62
-
```
60
+
```R
61
+
dayQuery<-"SELECT DayOfWeek FROM AirDemoSmallTest"
If you ran *rxGetVarInfo* on this data source, you'd see that it contains just the single column: *Var 1: DayOfWeek, Type: factor, no factor levels available*
74
74
75
-
5. Before applying this factor variable to the source data, create a separate table to hold the intermediate results. Again, you just use the *RxSqlServerData* function to define the data, and delete any existing tables of the same name.
5. Before applying this factor variable to the source data, create a separate table to hold the intermediate results. Again, you just use the RxSqlServerData function to define the data, and delete any existing tables of the same name.
82
76
83
-
7. Now you'll call the custom function *ProcessChunk* function to transform the data as it is read, by using it as the *transformFunc* argument to the *rxDataStep* function.
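
The chunk-then-combine idea behind this lesson can be tried locally in base R, without SQL Server or **RevoScaleR**. The following is only a minimal sketch under stated assumptions: `localData`, `chunkIds`, and the chunk size of 4 are hypothetical stand-ins for the chunks that the data step would hand to the transform function, and `split`/`lapply` stand in for the chunked read. It shows why each chunk is reduced to a single-row data frame: the per-chunk rows share the same columns, so they can be stacked and summed.

```R
# Hypothetical local data standing in for the DayOfWeek column.
# The factor levels are fixed up front so that every chunk
# produces the same set of columns, even if a level is absent.
days <- c("Monday", "Tuesday", "Wednesday")
localData <- data.frame(
  DayOfWeek = factor(rep(days, times = c(5, 3, 2)), levels = days)
)

# Same logic as the lesson's ProcessChunk function
ProcessChunk <- function(dataList) {
  # Convert the input list to a data frame and compute a contingency table
  chunkTable <- table(as.data.frame(dataList))

  # Convert the table output to a data frame with a single row
  varNames <- names(chunkTable)
  varValues <- as.vector(chunkTable)
  dim(varValues) <- c(1, length(varNames))
  chunkDF <- as.data.frame(varValues)
  names(chunkDF) <- varNames

  chunkDF
}

# Assumption: simulate chunked reading by splitting rows into groups of 4
chunkIds <- ceiling(seq_len(nrow(localData)) / 4)
chunks <- split(localData, chunkIds)

# One summary row per chunk, stacked into a single data frame
perChunk <- do.call(rbind, lapply(chunks, ProcessChunk))

# Combine the per-chunk counts into the final tabulation
totals <- colSums(perChunk)
print(perChunk)
print(totals)
```

Summing the stacked rows gives the same counts as calling `table` on the whole column at once, while each call to `ProcessChunk` only ever sees one chunk of data.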
[Lesson 4: Analyze Data in Local Compute Context (Data Science Deep Dive)](../../advanced-analytics/r-services/lesson-4-analyze-data-in-local-compute-context-data-science-deep-dive.md)
124
-
125
-
## Previous Step
126
-
[Create New SQL Server Table using rxDataStep (Data Science Deep Dive)](../../advanced-analytics/r-services/lesson-3-2-create-new-sql-server-table-using-rxdatastep.md)