You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: docs/relational-databases/polybase/polybase-configure-mongodb.md
+61-20Lines changed: 61 additions & 20 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -25,19 +25,17 @@ If you haven't installed PolyBase, see [PolyBase installation](polybase-installa
25
25
26
26
## Configure an External Table
27
27
28
-
To query the data from an MongoDB data source, you must create external tables to reference the external data. This section provides sample code to create these external tables.
29
-
28
+
To query the data from an MongoDB data source, you must create external tables to reference the external data. This section provides sample code to create these external tables.
29
+
30
30
We recommend creating statistics on external table columns, especially the ones used for joins, filters and aggregates, for optimal query performance.
1. Create a master key on the database. This is required to encrypt the credential secret.
42
40
43
41
```sql
@@ -58,16 +56,16 @@ These objects will create in this section:
58
56
1. Create an external data source with [CREATE EXTERNAL DATA SOURCE](../../t-sql/statements/create-external-data-source-transact-sql.md). Specify external data source location and credentials for the MongoDB data source.
59
57
60
58
```sql
61
-
/* LOCATION: Server DNS name or IP address.
62
-
* PUSHDOWN: specify whether computation should be pushed down to the source. ON by default.
63
-
* CREDENTIAL: the database scoped credential, created above.
64
-
*/
65
-
CREATE EXTERNAL DATA SOURCE MongoDBInstance
66
-
WITH (
67
-
LOCATION = '<vendor>://<server>[:<port>]',
68
-
-- PUSHDOWN = ON | OFF,
69
-
, CREDENTIAL = MongoDBCredentials
70
-
);
59
+
/* LOCATION: Location string should be of format '<vendor>://<server>[:<port>]'.
60
+
* PUSHDOWN: specify whether computation should be pushed down to the source. ON by default.
61
+
* CREDENTIAL: the database scoped credential, created above.
62
+
*/
63
+
CREATE EXTERNAL DATA SOURCE MongoInstance
64
+
WITH (
65
+
LOCATION = mongodb://MongoServer,
66
+
-- PUSHDOWN = ON | OFF,
67
+
CREDENTIAL = TeradataCredentials
68
+
);
71
69
```
72
70
73
71
1. Create schemas for external data
@@ -78,7 +76,7 @@ These objects will create in this section:
78
76
```
79
77
80
78
1. Create external tables that represents data stored in external MongoDB system [CREATE EXTERNAL TABLE](../../t-sql/statements/create-external-table-transact-sql.md).
81
-
79
+
82
80
```sql
83
81
/* LOCATION: MongoDB table/view in '<database_name>.<schema_name>.<object_name>' format
84
82
* DATA_SOURCE: the external data source, created above.
@@ -103,9 +101,52 @@ These objects will create in this section:
103
101
CREATE STATISTICS OrdersOrderKeyStatistics ON MongoDB.orders(O_ORDERKEY) WITH FULLSCAN;
104
102
```
105
103
106
-
##Flattening
104
+
## Flattening
105
+
flattening is enable for nested and repeated data from MongoDB document collections. User is required to enable create an external table and explicitly specify a relational schema over MongoDB document collections that may have nested and/or repeated data. We will enable auto-schema detection over mongo document collections in future milestones.
106
+
JSON nested/repeated data types will be flattened as follows
107
+
108
+
* Object: unordered key/value collection enclosed in curly braces (nested)
109
+
110
+
- We will create a table column for each object key
111
+
112
+
* Column Name: objectname_keyname
113
+
114
+
* Array: ordered values, separated by commas, enclosed in square brackets (repeated)
115
+
116
+
- We will add a new table row for each array item
117
+
118
+
- We will create a column per array to store the array item index
119
+
120
+
* Column Name: arrayname_index
121
+
122
+
* Data Type: bigint
123
+
124
+
There are several potential issues with this technique, two of them being:
125
+
126
+
* an empty repeated field will effectively mask the data contained in the flat fields of the same record
127
+
128
+
* the presence of multiple repeated fields can result in an explosion of the number of produced rows
129
+
130
+
As an example, we evaluate the MongoDB sample dataset restaurant collection stored in non-relational JSON format. Each restaurant has a nested address field and an array of grades it was assigned on different days. The figure below illustrates a typical restaurant with nested address and nested-repeated grades.
@@ -58,16 +57,17 @@ These objects will create in this section:
58
57
1. Create an external data source with [CREATE EXTERNAL DATA SOURCE](../../t-sql/statements/create-external-data-source-transact-sql.md). Specify external data source location and credentials for the Oracle data source.
59
58
60
59
```sql
61
-
/* LOCATION: Server DNS name or IP address.
62
-
* PUSHDOWN: specify whether computation should be pushed down to the source. ON by default.
63
-
* CREDENTIAL: the database scoped credential, created above.
64
-
*/
65
-
CREATE EXTERNAL DATA SOURCE OracleInstance
66
-
WITH (
67
-
LOCATION = '<vendor>://<server>[:<port>]',
68
-
-- PUSHDOWN = ON | OFF,
69
-
CREDENTIAL = OracleCredentials
70
-
);
60
+
/* LOCATION: Location string should be of format '<vendor>://<server>[:<port>]'.
61
+
* PUSHDOWN: specify whether computation should be pushed down to the source. ON by default.
62
+
* CREDENTIAL: the database scoped credential, created above.
@@ -58,16 +57,17 @@ These objects will create in this section:
58
57
1. Create an external data source with [CREATE EXTERNAL DATA SOURCE](../../t-sql/statements/create-external-data-source-transact-sql.md).Specify external data source location and credentials for SQL Server.
59
58
60
59
```sql
61
-
/* LOCATION: Server DNS name or IP address.
62
-
* PUSHDOWN: specify whether computation should be pushed down to the source. ON by default.
63
-
* CREDENTIAL: the database scoped credential, created above.
64
-
*/
65
-
CREATE EXTERNAL DATA SOURCE SqlServerInstance
66
-
WITH (
67
-
LOCATION = '<vendor>://<server>[:<port>]',
68
-
-- PUSHDOWN = ON | OFF,
69
-
CREDENTIAL = SqlServerCredentials
70
-
);
60
+
/* LOCATION: Location string should be of format '<vendor>://<server>[:<port>]'.
61
+
* PUSHDOWN: specify whether computation should be pushed down to the source. ON by default.
62
+
* CREDENTIAL: the database scoped credential, created above.
@@ -60,16 +59,17 @@ These objects will create in this section:
60
59
1. Create an external data source with [CREATE EXTERNAL DATA SOURCE](../../t-sql/statements/create-external-data-source-transact-sql.md).Specify external data source location and credentials for Teradata.
61
60
62
61
```sql
63
-
/* LOCATION: Server DNS name or IP address.
64
-
* PUSHDOWN: specify whether computation should be pushed down to the source. ON by default.
65
-
* CREDENTIAL: the database scoped credential, created above.
66
-
*/
67
-
CREATE EXTERNAL DATA SOURCE TeradataInstance
68
-
WITH (
69
-
LOCATION = '<vendor>://<server>[:<port>]',
70
-
-- PUSHDOWN = ON | OFF,
71
-
CREDENTIAL = TeradataCredentials
72
-
);
62
+
/* LOCATION: Location string should be of format '<vendor>://<server>[:<port>]'.
63
+
* PUSHDOWN: specify whether computation should be pushed down to the source. ON by default.
64
+
* CREDENTIAL: the database scoped credential, created above.
4. On the Server Configuration Page, configure the **SQL Server PolyBase Engine Service** and SQL Server PolyBase Data Movement Service to run under the same account.
69
71
@@ -85,7 +87,9 @@ Once you have installed PolyBase as either stand alone or in a scale-out group,
85
87
<!--SQL Server 2019-->
86
88
::: moniker range=">= sql-server-ver15"
87
89
88
-
After installation, you must [enable the PolyBase feature](#enable).
90
+
> **IMPORTANT!**
91
+
>
92
+
> After installation, you must [enable the PolyBase feature](#enable).
Starting with SQL Server 2019 CTP 2.0, you must enable PolyBase after installation using the following Transact-SQL command:
159
+
Once you are done with the installation, Polybase must be enabled to access it's features. connect to SQL Server 2019 CTP 2.0, you must enable PolyBase after installation using the following Transact-SQL command:
0 commit comments