SQL Server big data clusters

Prerequisites

Kubernetes cluster configuration and the kubectl command-line utility
curl utility
sqlcmd utility
bcp utility
Azure Data Studio or SQL Server Management Studio
SQL Server 2019 big data cluster
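
As a quick sanity check before running the samples, the command-line prerequisites above can be looked up on PATH. This is a minimal sketch: the executable names (kubectl, curl, sqlcmd, bcp) are assumed to match how the tools are installed on your machine, and the helper find_missing_tools is a hypothetical name used only for illustration. It does not check for Azure Data Studio, SQL Server Management Studio, or the cluster itself.

```python
import shutil

# Executable names are assumptions: the tools may be installed under
# different names or locations on your system.
REQUIRED_TOOLS = ["kubectl", "curl", "sqlcmd", "bcp"]


def find_missing_tools(tools, which=shutil.which):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Missing from PATH: " + ", ".join(missing))
    else:
        print("All command-line prerequisites were found on PATH.")
```

The lookup function is passed in as a parameter so the check can be exercised without the tools being installed.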

Installation instructions for a SQL Server 2019 big data cluster can be found here.

Samples Setup

Before you begin, download the sample database backup file and save it locally. Then run the CMD script bootstrap-sample-db.cmd or the shell script bootstrap-sample-db.sh, depending on your platform. The script restores the database on the SQL Server master instance, runs the bootstrap-sample-db.sql script, creates the required database objects, exports the web_clickstreams and inventory tables to CSV files, and uploads the web_clickstreams CSV file to HDFS inside the SQL Server 2019 big data cluster.
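
The platform choice described above can be sketched as follows. This only selects the script name and does not invoke it, because the arguments the script expects are not listed in this README. Treating every non-Windows platform as a shell-script platform is an assumption, and bootstrap_script_name is a hypothetical helper name.

```python
import platform


def bootstrap_script_name(system=None):
    """Pick the bootstrap script for a platform.system() value.

    Assumption: Windows uses the CMD script and every other platform
    uses the shell script.
    """
    system = system or platform.system()
    if system == "Windows":
        return "bootstrap-sample-db.cmd"
    return "bootstrap-sample-db.sh"


if __name__ == "__main__":
    print("Bootstrap script for this platform:", bootstrap_script_name())
```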

data-pool

Data ingestion using Spark

Connect to the master instance in your SQL Server big data cluster and the SQL Server big data cluster endpoint, and follow the steps in data-pool/data-ingestion-spark.sql.

Data ingestion using SQL

Connect to the master instance in your SQL Server big data cluster and execute the steps in data-pool/data-ingestion-sql.sql.
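
If you prefer the command line to a graphical client, one possible way to submit the script is with sqlcmd. The sketch below only builds the argument list; the server address and login are placeholders you must replace, and build_sqlcmd_args is a hypothetical helper. Note that -i runs the whole file in one pass, whereas the sample is written as a series of steps, so running it step by step in Azure Data Studio or SQL Server Management Studio may be closer to the intended workflow.

```python
import shlex


def build_sqlcmd_args(server, user, script_path, database=None):
    """Build a sqlcmd argument list that runs a script file.

    -S is the server, -U the login, -i the input file, -d the database.
    No -P is passed: sqlcmd then prompts for the password or reads it
    from the SQLCMDPASSWORD environment variable.
    """
    args = ["sqlcmd", "-S", server, "-U", user, "-i", script_path]
    if database:
        args += ["-d", database]
    return args


if __name__ == "__main__":
    # Placeholder values (assumptions): substitute your own endpoint and login.
    args = build_sqlcmd_args(
        "<master-instance-host>,<port>",
        "<login>",
        "data-pool/data-ingestion-sql.sql",
    )
    print(shlex.join(args))
```

The same helper could be reused for the other .sql samples by changing the script path.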

data-virtualization

External table over HDFS

Connect to the master instance in your SQL Server big data cluster and execute the steps in data-virtualization/external-table-hdfs.sql.

External table over Oracle

To run this sample script, you need the following:

An Oracle instance and credentials
The inventory table, created in Oracle using the data-virtualization/inventory-oracle.sql script
The inventory.csv file generated by the bootstrap-sample-db script, imported into a table in Oracle

Connect to the master instance in your SQL Server big data cluster and execute the steps in data-virtualization/external-table-oracle.sql.

machine-learning

SQL Server ML Services on master instance

Connect to the master instance in your SQL Server big data cluster and execute the steps in machine-learning/sql/book-category-r-ml.sql.

Spark ML

Connect to the SQL Server big data cluster endpoint, and run the notebook files machine-learning/spark/1-data-prep.ipynb and machine-learning/spark/2-build-ml-model.ipynb cell by cell.
