SQL Server big data clusters

Prerequisites

  1. Kubernetes cluster configuration and the kubectl command-line utility
  2. curl utility
  3. sqlcmd utility
  4. bcp utility
  5. Azure Data Studio or SQL Server Management Studio
  6. SQL Server 2019 big data cluster

Installation instructions for SQL Server 2019 big data cluster can be found here.
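
Before running the samples, it can help to confirm that the command-line prerequisites are reachable. The following is a minimal sketch, not part of the samples themselves: it assumes the tools are installed under their usual lowercase executable names (kubectl, curl, sqlcmd, bcp), and the helper name find_missing_tools is hypothetical.

```python
import shutil

# Command-line tools named in the prerequisites list. The executable
# names are an assumption: they are taken to match the lowercase tool names.
REQUIRED_TOOLS = ["kubectl", "curl", "sqlcmd", "bcp"]


def find_missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing from PATH: " + ", ".join(missing))
    else:
        print("All command-line prerequisites were found on PATH.")
```

This only checks that the executables exist on PATH. It does not verify the Kubernetes cluster configuration or that a big data cluster is deployed.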

Samples Setup

Before you begin, download the sample database backup file and save it locally. Then run the CMD script bootstrap-sample-db.cmd or the shell script bootstrap-sample-db.sh, depending on your platform. The script restores the database on the SQL Server master instance, executes the bootstrap-sample-db.sql script, creates the required database objects, exports the web_clickstreams and inventory tables to CSV files, and uploads the web_clickstreams CSV file to HDFS inside the SQL Server 2019 big data cluster.
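
The platform choice described above could be sketched as follows. This is illustrative only: the helper name bootstrap_script_name is hypothetical, and the sketch assumes that Windows uses the CMD script while every other platform uses the shell script. The arguments the scripts expect are not covered here, so the sketch only selects the file name and does not run it.

```python
import platform


def bootstrap_script_name(system=None):
    """Pick the bootstrap script file name for the given or current platform."""
    system = system or platform.system()
    # Assumption: Windows runs the CMD script; all other platforms
    # run the shell script.
    if system == "Windows":
        return "bootstrap-sample-db.cmd"
    return "bootstrap-sample-db.sh"


if __name__ == "__main__":
    print(bootstrap_script_name())
```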

data-pool

Data ingestion using Spark

Connect to the master instance in your SQL Server big data cluster and the SQL Server big data cluster endpoint, and follow the steps in data-pool/data-ingestion-spark.sql.

Data ingestion using SQL

Connect to the master instance in your SQL Server big data cluster and execute the steps in data-pool/data-ingestion-sql.sql.

data-virtualization

External table over HDFS

Connect to the master instance in your SQL Server big data cluster and execute the steps in data-virtualization/external-table-hdfs.sql.

External table over Oracle

To execute this sample script, you will need the following:

  1. An Oracle instance and credentials
  2. An inventory table in Oracle, created using the data-virtualization/inventory-oracle.sql script
  3. The inventory.csv file generated by the bootstrap-sample-db script, imported into a table in Oracle

Connect to the master instance in your SQL Server big data cluster and execute the steps in data-virtualization/external-table-oracle.sql.

machine-learning

SQL Server ML Services on master instance

Connect to the master instance in your SQL Server big data cluster and execute the steps in machine-learning/sql/book-category-r-ml.sql.

Spark ML

Connect to the SQL Server big data cluster endpoint, and run the notebook files machine-learning/spark/1-data-prep.ipynb and machine-learning/spark/2-build-ml-model.ipynb cell by cell.