- Kubernetes cluster configuration and the kubectl command-line utility
- curl utility
- sqlcmd utility
- bcp utility
- Azure Data Studio or SQL Server Management Studio
- SQL Server 2019 big data cluster
Installation instructions for SQL Server 2019 big data cluster can be found here.
Before you begin, download the sample database backup file and save it locally. Then run the CMD script bootstrap-sample-db.cmd or the shell script bootstrap-sample-db.sh, depending on your platform. The script restores the database on the SQL Server master instance, executes the bootstrap-sample-db.sql script to create the required database objects, exports the web_clickstreams and inventory tables to CSV files, and uploads the web_clickstreams CSV file to HDFS inside the SQL Server 2019 big data cluster.
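To make the export and upload steps of the bootstrap script more concrete, here is a minimal Python sketch that only builds (and prints) the bcp and curl command lines; it does not reproduce the actual bootstrap-sample-db script, and it skips the restore step. The database name `sales`, the server addresses, the user names, the HDFS path `/clickstream_data/...`, and the gateway URL layout `https://<gateway>:30443/gateway/default/webhdfs/v1/...` are all assumptions for illustration; check them against your cluster and the script itself. The upload uses the standard WebHDFS `op=CREATE` REST call.

```python
import shlex
from urllib.parse import quote


def bcp_export_command(database, table, out_file, server, user, password):
    """Build a bcp call that exports <database>.dbo.<table> as comma-separated text."""
    return [
        "bcp", f"{database}.dbo.{table}", "out", out_file,
        "-S", server, "-U", user, "-P", password,
        "-c",   # character (text) format
        "-t,",  # comma as the field terminator
    ]


def webhdfs_upload_command(gateway, hdfs_path, local_file, user, password):
    """Build a curl call that PUTs a local file into HDFS via the WebHDFS REST API.

    The /gateway/default/webhdfs/v1 prefix is an assumed gateway layout.
    """
    url = (f"https://{gateway}/gateway/default/webhdfs/v1{quote(hdfs_path)}"
           "?op=CREATE&overwrite=true")
    return [
        "curl", "-k",       # accept the cluster's self-signed certificate
        "-L",               # follow the redirect WebHDFS returns for CREATE
        "-u", f"{user}:{password}",
        "-X", "PUT",
        "-H", "Content-Type: application/octet-stream",
        "-T", local_file,   # send the file as the request body
        url,
    ]


if __name__ == "__main__":
    # Hypothetical values: replace with your master instance, gateway, and credentials.
    master, gateway = "10.0.0.4,31433", "10.0.0.5:30443"
    for table in ("web_clickstreams", "inventory"):
        print(shlex.join(bcp_export_command(
            "sales", table, f"{table}.csv", master, "sa", "<password>")))
    print(shlex.join(webhdfs_upload_command(
        gateway, "/clickstream_data/web_clickstreams.csv",
        "web_clickstreams.csv", "root", "<password>")))
```

Printing the commands instead of running them keeps the sketch safe to try; once the values look right you could pass each list to `subprocess.run(cmd, check=True)`.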
Connect to both the master instance and the big data cluster endpoint of your SQL Server big data cluster, then follow the steps in data-pool/data-ingestion-spark.sql.
Connect to the master instance in your SQL Server big data cluster and execute the steps in data-pool/data-ingestion-sql.sql.
Connect to the master instance in your SQL Server big data cluster and execute the steps in data-virtualization/external-table-hdfs.sql.
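The samples are written to be stepped through interactively, but for scripts that only need the master instance (such as data-pool/data-ingestion-sql.sql and data-virtualization/external-table-hdfs.sql) you could also run a whole file with sqlcmd. The following Python sketch is an assumption, not part of the sample, and running a file end to end may not suit every script. It builds the sqlcmd call with `-i` (input file) and `-b` (exit with an error code when a batch fails), and passes the password through the `SQLCMDPASSWORD` environment variable instead of `-P` so it does not appear in the process list. The server address and credentials are placeholders.

```python
import os
import shlex
import subprocess


def sqlcmd_script_command(server, user, password, script_path, database=None):
    """Return (argv, env) for running a .sql file against the master instance."""
    argv = ["sqlcmd", "-S", server, "-U", user, "-b", "-i", script_path]
    if database is not None:
        argv += ["-d", database]  # initial database for the session
    # sqlcmd reads the password from SQLCMDPASSWORD when -P is not given.
    env = dict(os.environ, SQLCMDPASSWORD=password)
    return argv, env


def run_script(server, user, password, script_path, database=None, dry_run=True):
    """Print the command (dry run) or execute it, raising if sqlcmd fails."""
    argv, env = sqlcmd_script_command(server, user, password, script_path, database)
    if dry_run:
        print(shlex.join(argv))
        return None
    return subprocess.run(argv, env=env, check=True)
```

For example, `run_script("10.0.0.4,31433", "sa", "<password>", "data-virtualization/external-table-hdfs.sql")` prints the command; pass `dry_run=False` to execute it.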
To run this sample script, you will need the following:
- An Oracle instance and credentials for it
- The inventory table in Oracle, created with the data-virtualization/inventory-oracle.sql script
- The data from the inventory.csv file generated by the bootstrap-sample-db script, imported into that Oracle table
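One way to do the import is a small Python script using the python-oracledb driver; this is an assumed approach (SQL*Loader would work as well), not the one the sample prescribes. The sketch assumes inventory.csv is comma-separated with no header row, that its columns are in the same order as the Oracle inventory table, and that Oracle can implicitly convert the string values to the column types (date columns, if any, may need explicit conversion). Empty fields are sent as NULL.

```python
import csv


def build_insert(table, ncols):
    """INSERT statement with positional binds :1, :2, ... as used by python-oracledb."""
    binds = ", ".join(f":{i}" for i in range(1, ncols + 1))
    return f"INSERT INTO {table} VALUES ({binds})"


def read_rows(lines):
    """Parse headerless CSV lines into tuples, mapping empty fields to None (NULL)."""
    return [tuple(v if v != "" else None for v in row)
            for row in csv.reader(lines) if row]


def import_rows(connection, table, rows, batch_size=1000):
    """Insert rows in batches with executemany and commit; returns the row count."""
    if not rows:
        return 0
    sql = build_insert(table, len(rows[0]))  # table name is trusted input here
    with connection.cursor() as cursor:
        for start in range(0, len(rows), batch_size):
            cursor.executemany(sql, rows[start:start + batch_size])
    connection.commit()
    return len(rows)


def import_csv_file(user, password, dsn, path, table="inventory"):
    """Connect with python-oracledb (assumed installed) and load the CSV file."""
    import oracledb  # imported lazily so the helpers above work without the driver
    with oracledb.connect(user=user, password=password, dsn=dsn) as connection:
        with open(path, newline="") as f:
            return import_rows(connection, table, read_rows(f))
```

A call such as `import_csv_file("<user>", "<password>", "<host>:1521/<service>", "inventory.csv")` loads the file; the DSN format and credentials are placeholders for your Oracle instance.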
Connect to the master instance in your SQL Server big data cluster and execute the steps in data-virtualization/external-table-oracle.sql.
Connect to the master instance in your SQL Server big data cluster and execute the steps in machine-learning/sql/book-category-r-ml.sql.
Connect to the SQL Server big data cluster endpoint, then open the notebooks machine-learning/spark/1-data-prep.ipynb and machine-learning/spark/2-build-ml-model.ipynb and run them cell by cell.