Commit 45135cc

Merge pull request MicrosoftDocs#7759 from HeidiSteen/heidist-quickstart
File relocation and TOC updates
2 parents 54dad31 + df9267d commit 45135cc

16 files changed

Lines changed: 128 additions & 100 deletions

.openpublishing.redirection.json

Lines changed: 19 additions & 4 deletions
Original file line number | Diff line number | Diff line change
@@ -10,9 +10,24 @@
1010
"redirect_url": "/sql/relational-databases/polybase/polybase-guide",
1111
"redirect_document_id": false
1212
},
13+
{
14+
"source_path": "docs/advanced-analytics/r/sqldev-train-and-save-a-model-using-t-sql.md",
15+
"redirect_url": "/sql/advanced-analytics/tutorials/sqldev-train-and-save-a-model-using-t-sql",
16+
"redirect_document_id": true
17+
},
18+
{
19+
"source_path": "docs/advanced-analytics/tutorials/wrap-python-in-tsql-stored-procedure.md",
20+
"redirect_url": "/sql/advanced-analytics/tutorials/demo-data-iris-in-sql",
21+
"redirect_document_id": false
22+
},
23+
{
24+
"source_path": "docs/advanced-analytics/tutorials/sqldev-download-the-sample-data.md",
25+
"redirect_url": "/sql/advanced-analytics/tutorials/demo-data-nyctaxi-in-sql",
26+
"redirect_document_id": false
27+
},
1328
{
1429
"source_path": "docs/advanced-analytics/tutorials/sqldev-py1-download-the-sample-data.md",
15-
"redirect_url": "/sql/advanced-analytics/tutorials/sqldev-download-the-sample-data",
30+
"redirect_url": "/sql/advanced-analytics/tutorials/demo-data-nyctaxi-in-sql",
1631
"redirect_document_id": false
1732
},
1833
{
@@ -197,7 +212,7 @@
197212
},
198213
{
199214
"source_path": "docs/advanced-analytics/r/sqldev-import-data-to-sql-server-using-powershell.md",
200-
"redirect_url": "/sql/advanced-analytics/tutorials/sqldev-download-the-sample-data",
215+
"redirect_url": "/sql/advanced-analytics/tutorials/demo-data-nyctaxi-in-sql",
201216
"redirect_document_id": false
202217
},
203218
{
@@ -1203,7 +1218,7 @@
12031218
},
12041219
{
12051220
"source_path": "docs/advanced-analytics/r-services/step-1-download-the-sample-data-in-database-advanced-analytics-tutorial.md",
1206-
"redirect_url": "/sql/advanced-analytics/tutorials/sqldev-download-the-sample-data",
1221+
"redirect_url": "/sql/advanced-analytics/tutorials/demo-data-nyctaxi-in-sql",
12071222
"redirect_document_id": false
12081223
},
12091224
{
@@ -1223,7 +1238,7 @@
12231238
},
12241239
{
12251240
"source_path": "docs/advanced-analytics/r-services/step-5-train-and-save-a-model-using-t-sql.md",
1226-
"redirect_url": "/sql/advanced-analytics/r/sqldev-train-and-save-a-model-using-t-sql",
1241+
"redirect_url": "/sql/advanced-analytics/tutorials/sqldev-train-and-save-a-model-using-t-sql",
12271242
"redirect_document_id": false
12281243
},
12291244
{
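
Several hunks above repoint existing redirects from `sqldev-download-the-sample-data` to `demo-data-nyctaxi-in-sql`, which looks like a way to avoid redirect chains once the old target itself becomes a redirect source. A minimal sketch of how such chains could be detected is shown below. It is not part of the commit. The function names are hypothetical, and the mapping from `source_path` to URL (drop `docs/`, drop `.md`, prefix `/sql/`) is an assumption inferred from the entries in this diff rather than a documented rule.

```python
def source_path_to_url(source_path: str) -> str:
    """Map 'docs/a/b.md' to '/sql/a/b'.

    Assumed convention, inferred from the entries in the diff above.
    """
    path = source_path
    if path.startswith("docs/"):
        path = path[len("docs/"):]
    if path.endswith(".md"):
        path = path[:-len(".md")]
    return "/sql/" + path


def find_redirect_chains(entries):
    """Return (source_path, redirect_url) pairs whose target is itself redirected."""
    redirected_urls = {source_path_to_url(e["source_path"]) for e in entries}
    return [
        (e["source_path"], e["redirect_url"])
        for e in entries
        if e["redirect_url"] in redirected_urls
    ]


# Two entries from the diff: the first as it stood before the commit,
# the second newly added for the renamed NYC Taxi article.
before = [
    {
        "source_path": "docs/advanced-analytics/tutorials/sqldev-py1-download-the-sample-data.md",
        "redirect_url": "/sql/advanced-analytics/tutorials/sqldev-download-the-sample-data",
    },
    {
        "source_path": "docs/advanced-analytics/tutorials/sqldev-download-the-sample-data.md",
        "redirect_url": "/sql/advanced-analytics/tutorials/demo-data-nyctaxi-in-sql",
    },
]

# The same entries with the first one repointed, as the commit does.
after = [
    dict(before[0], redirect_url="/sql/advanced-analytics/tutorials/demo-data-nyctaxi-in-sql"),
    before[1],
]

print(find_redirect_chains(before))  # the py1 entry is flagged as a chain
print(find_redirect_chains(after))   # nothing is flagged
```

With the old target, the first entry would redirect to a page that is itself redirected. After repointing, both entries resolve in a single hop.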

docs/advanced-analytics/r/set-up-a-data-science-client.md

Lines changed: 1 addition & 1 deletion
@@ -89,7 +89,7 @@ If your code requires packages that are not installed by default with SQL Server
8989
9090
## 4 - Test connections
9191

92-
SQL Server must be enabled for [remote connections](https://docs.microsoft.com/sql/database-engine/configure-windows/view-or-configure-remote-server-connection-options-sql-server.md) and you must have permissions, including a user login and a database to connect to. The following steps assume the demo database, [NYCTaxi_Sample](../tutorials/sqldev-download-the-sample-data.md) and Windows authentication.
92+
SQL Server must be enabled for [remote connections](https://docs.microsoft.com/sql/database-engine/configure-windows/view-or-configure-remote-server-connection-options-sql-server.md) and you must have permissions, including a user login and a database to connect to. The following steps assume the demo database, [NYCTaxi_Sample](../tutorials/demo-data-nyctaxi-in-sql.md) and Windows authentication.
9393

9494
As a verification step, use a built-in tool and RevoScaleR to confirm connectivity to the remote server.
9595

docs/advanced-analytics/toc.yml

Lines changed: 31 additions & 31 deletions
@@ -70,6 +70,32 @@
7070
- name: Tutorials
7171
href: tutorials/machine-learning-services-tutorials.md
7272
items:
73+
- name: Data
74+
items:
75+
- name: Iris data set
76+
href: tutorials/demo-data-iris-in-sql.md
77+
- name: NYC Taxi data set
78+
href: tutorials/demo-data-nyctaxi-in-sql.md
79+
- name: Python
80+
href: tutorials/sql-server-python-tutorials.md
81+
items:
82+
- name: Train and use your first model
83+
href: tutorials/train-score-using-python-in-tsql.md
84+
- name: Learn in-database analytics
85+
href: tutorials/sqldev-in-database-python-for-sql-developers.md
86+
items:
87+
- name: Import data
88+
href: tutorials/sqldev-py2-import-data-to-sql-server-using-powershell.md
89+
- name: Explore and visualize data
90+
href: tutorials/sqldev-py3-explore-and-visualize-the-data.md
91+
- name: Create data features
92+
href: tutorials/sqldev-py4-create-data-features-using-t-sql.md
93+
- name: Train and save the model
94+
href: tutorials/sqldev-py5-train-and-save-a-model-using-t-sql.md
95+
- name: Operationalize the model
96+
href: tutorials/sqldev-py6-operationalize-the-model.md
97+
- name: Create a model using revoscalepy
98+
href: tutorials/use-python-revoscalepy-to-create-model.md
7399
- name: R
74100
href: tutorials/sql-server-r-tutorials.md
75101
items:
@@ -78,15 +104,13 @@
78104
- name: Learn in-database analytics
79105
href: tutorials/sqldev-in-database-r-for-sql-developers.md
80106
items:
81-
- name: 1 - Set up demo data
82-
href: tutorials/sqldev-download-the-sample-data.md
83-
- name: 2 - Visualize data
107+
- name: 1 - Visualize data
84108
href: tutorials/sqldev-explore-and-visualize-the-data.md
85-
- name: 3 - Create data features
109+
- name: 2 - Create data features
86110
href: tutorials/sqldev-create-data-features-using-t-sql.md
87-
- name: 4 - Train and save to SQL
88-
href: r/sqldev-train-and-save-a-model-using-t-sql.md
89-
- name: 5 - Predict outcomes
111+
- name: 3 - Train and save to SQL
112+
href: tutorials/sqldev-train-and-save-a-model-using-t-sql.md
113+
- name: 4 - Predict outcomes
90114
href: tutorials/sqldev-operationalize-the-model.md
91115
- name: Data science end-to-end walkthrough
92116
href: tutorials/walkthrough-data-science-end-to-end-walkthrough.md
@@ -138,30 +162,6 @@
138162
href: tutorials/deepdive-move-data-between-sql-server-and-xdf-file.md
139163
- name: Create a simple simulation
140164
href: tutorials/deepdive-create-a-simple-simulation.md
141-
- name: Python
142-
href: tutorials/sql-server-python-tutorials.md
143-
items:
144-
- name: Train and use your first model
145-
items:
146-
- name: Create the Iris dataset
147-
href: tutorials/wrap-python-in-tsql-stored-procedure.md
148-
- name: Train a model and score data
149-
href: tutorials/train-score-using-python-in-tsql.md
150-
- name: Learn in-database analytics
151-
href: tutorials/sqldev-in-database-python-for-sql-developers.md
152-
items:
153-
- name: Import data
154-
href: tutorials/sqldev-py2-import-data-to-sql-server-using-powershell.md
155-
- name: Explore and visualize data
156-
href: tutorials/sqldev-py3-explore-and-visualize-the-data.md
157-
- name: Create data features
158-
href: tutorials/sqldev-py4-create-data-features-using-t-sql.md
159-
- name: Train and save the model
160-
href: tutorials/sqldev-py5-train-and-save-a-model-using-t-sql.md
161-
- name: Operationalize the model
162-
href: tutorials/sqldev-py6-operationalize-the-model.md
163-
- name: Create a model using revoscalepy
164-
href: tutorials/use-python-revoscalepy-to-create-model.md
165165
- name: Samples
166166
href: https://github.com/Microsoft/sql-server-samples
167167
- name: Solutions
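
The TOC changes above replace `href` values that point at files this commit moves or renames. As a rough illustration only, the sketch below walks an already-parsed TOC (nested lists of dicts with `name`, `href`, and `items` keys, mirroring the YAML shown) and reports entries whose target is now a redirect source. The helper names are hypothetical, and parsing `toc.yml` itself is left out because it would need a YAML library.

```python
# Directory that contains toc.yml, taken from the diff header above.
TOC_DIR = "docs/advanced-analytics/"


def iter_hrefs(nodes):
    """Yield (name, href) for every local href in a nested TOC structure."""
    for node in nodes:
        href = node.get("href")
        if href and not href.startswith("http"):
            yield node["name"], href
        yield from iter_hrefs(node.get("items", []))


def stale_hrefs(toc, redirected_sources):
    """Return TOC entries whose href points at a redirected source file."""
    return [
        (name, href)
        for name, href in iter_hrefs(toc)
        if TOC_DIR + href in redirected_sources
    ]


# A fragment of the pre-commit TOC, built from the removed lines in the hunk.
old_toc = [
    {
        "name": "Learn in-database analytics",
        "href": "tutorials/sqldev-in-database-r-for-sql-developers.md",
        "items": [
            {"name": "1 - Set up demo data",
             "href": "tutorials/sqldev-download-the-sample-data.md"},
            {"name": "2 - Visualize data",
             "href": "tutorials/sqldev-explore-and-visualize-the-data.md"},
            {"name": "4 - Train and save to SQL",
             "href": "r/sqldev-train-and-save-a-model-using-t-sql.md"},
        ],
    },
]

# source_path values added to .openpublishing.redirection.json in this commit.
redirected = {
    "docs/advanced-analytics/r/sqldev-train-and-save-a-model-using-t-sql.md",
    "docs/advanced-analytics/tutorials/wrap-python-in-tsql-stored-procedure.md",
    "docs/advanced-analytics/tutorials/sqldev-download-the-sample-data.md",
}

for name, href in stale_hrefs(old_toc, redirected):
    print(f"stale: {name} -> {href}")
```

Run against the old TOC fragment, this flags the "Set up demo data" and "Train and save to SQL" entries, which are the two R tutorial entries the hunk removes or repoints.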

docs/advanced-analytics/tutorials/wrap-python-in-tsql-stored-procedure.md renamed to docs/advanced-analytics/tutorials/demo-data-iris-in-sql.md

Lines changed: 27 additions & 16 deletions
@@ -1,37 +1,41 @@
11
---
2-
title: Create the Iris dataset in SQL Server | Microsoft Docs
2+
title: Iris demo data set for SQL Server | Microsoft Docs
33
Description: Create a database containing the Iris dataset and a table for storing models. This dataset is used in exercises showing how to wrap Python code in a SQL Server stored procedure.
44
ms.prod: sql
55
ms.technology: machine-learning
66

7-
ms.date: 10/15/2018
7+
ms.date: 10/19/2018
88
ms.topic: tutorial
99
author: HeidiSteen
1010
ms.author: heidist
1111
manager: cgronlun
1212
---
13-
# Create the Iris dataset in SQL Server
13+
# Iris demo data for SQL Server
1414
[!INCLUDE[appliesto-ss-xxxx-xxxx-xxx-md-winonly](../../includes/appliesto-ss-xxxx-xxxx-xxx-md-winonly.md)]
1515

16-
In this exercise, prepare a SQL Server database containing tables for both [Iris](https://en.wikipedia.org/wiki/Iris_flower_data_set) data and model storage. You'll need these objects for the [next exercise](train-score-using-python-in-tsql.md) where you learn how to embed Python code in a stored procedure and write the results to a SQL Server table.
16+
In this exercise, prepare a SQL Server database containing tables for the [Iris flower data set](https://en.wikipedia.org/wiki/Iris_flower_data_set) and model storage. Iris data is included in both the R and Python distributions installed by SQL Server. It's used in machine learning tutorials for SQL Server.
1717

1818
To complete this exercise, you should have [SQL Server Management Studio](https://docs.microsoft.com/sql/ssms/download-sql-server-management-studio-ssms?view=sql-server-2017) or another tool that can run T-SQL queries.
1919

20+
Tutorials and quickstarts using this data set include the following:
21+
22+
+ [Use a Python model in SQL Server for training and scoring](train-score-using-python-in-tsql.md)
23+
2024
## Prepare the database and tables
2125

2226
1. Start SQL Server Management Studio, and open a new **Query** window.
2327

2428
2. Create a new database for this project, and change the context of your **Query** window to use the new database.
2529

2630
```sql
27-
CREATE DATABASE sqlpy
31+
CREATE DATABASE irissql
2832
GO
29-
USE sqlpy
33+
USE irissql
3034
GO
3135
```
3236

3337
> [!TIP]
34-
> If you're new to SQL Server, or are working on a server you own, a common mistake is to log in and start working without noticing that you are in the **master** database. To be sure that you are using the correct database, always specify the context using the `USE <database name>` statement (for example, `use sqlpy`).
38+
> If you're new to SQL Server, or are working on a server you own, a common mistake is to log in and start working without noticing that you are in the **master** database. To be sure that you are using the correct database, always specify the context using the `USE <database name>` statement (for example, `use irissql`).
3539
3640
3. Add some empty tables: one to store the data, and one to store the models you train. Later, you will use the models table to store serialized models generated in Python script.
3741
@@ -68,13 +72,13 @@ To complete this exercise, you should have [SQL Server Management Studio](https:
6872
6973
## Populate the table
7074
71-
To move the training data from Python into a SQL Server table is a multistep process:
75+
You can obtain built-in Iris data from either R or Python. This step uses Python to load the data into a data frame, and then insert it into a table in the database. Moving training data from an external session into a SQL Server table is a multistep process:
7276
73-
+ You design a stored procedure that gets the data you want.
74-
+ You execute the stored procedure to actually get the data.
75-
+ You use an INSERT statement to specify where the retrieved data should be saved.
77+
+ Design a stored procedure that gets the data you want.
78+
+ Execute the stored procedure to actually get the data.
79+
+ Construct an INSERT statement to specify where the retrieved data should be saved.
7680
77-
1. Create the following stored procedure that includes Python code.
81+
1. Create the following stored procedure that includes Python code to load the data.
7882
7983
```sql
8084
CREATE PROCEDURE get_iris_dataset
@@ -109,15 +113,22 @@ To move the training data from Python into a SQL Server table is a multistep pro
109113
> [!TIP]
110114
> To modify the stored procedure later, you don't need to drop and recreate it. Use the [ALTER PROCEDURE](https://docs.microsoft.com/sql/t-sql/statements/alter-procedure-transact-sql) statement.
111115

112-
3. To verify that the data was loaded correctly, you can run some simple queries:
116+
117+
## Query data for verification
118+
119+
As a validation step, run a query to confirm the data was uploaded.
120+
121+
1. In Object Explorer, under Databases, right-click the **irissql** database, and start a new query.
122+
123+
2. Run some simple queries:
113124

114125
```sql
115126
SELECT TOP(10) * FROM iris_data;
116127
SELECT COUNT(*) FROM iris_data;
117128
```
118129

119-
In the next lesson, you will create a machine learning model and save it to a table, and then use the model to generate predicted outcomes.
130+
## Next steps
120131

121-
## Next lesson
132+
In the following lesson, you will create a machine learning model and save it to a table, and then use the model to generate predicted outcomes.
122133

123-
[Train a Python model and generate scores in SQL Server](../tutorials/train-score-using-python-in-tsql.md)
134+
+ [Use a Python model in SQL Server for training and scoring](train-score-using-python-in-tsql.md)

docs/advanced-analytics/tutorials/sqldev-download-the-sample-data.md renamed to docs/advanced-analytics/tutorials/demo-data-nyctaxi-in-sql.md

Lines changed: 14 additions & 12 deletions
@@ -4,7 +4,7 @@ description: Instructions for downloading New York City taxi sample data and cre
44
ms.prod: sql
55
ms.technology: machine-learning
66

7-
ms.date: 10/02/2018
7+
ms.date: 10/19/2018
88
ms.topic: tutorial
99
author: HeidiSteen
1010
ms.author: heidist
@@ -13,17 +13,13 @@ manager: cgronlun
1313
# NYC Taxi demo data for SQL Server
1414
[!INCLUDE[appliesto-ss-xxxx-xxxx-xxx-md-winonly](../../includes/appliesto-ss-xxxx-xxxx-xxx-md-winonly.md)]
1515

16-
This article explains how to obtain sample data for R and Python tutorials for in-database analytics in SQL Server.
16+
This article explains how to set up a sample database consisting of public data from the [New York City Taxi and Limousine Commission](http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml). This data is used in several R and Python tutorials for in-database analytics in SQL Server. The sample data is one percent of the public data set. On your system, the database backup file is slightly over 90 MB, providing 1.7 million rows in the primary data table.
1717

18-
Data originates from the [NYC Taxi and Limousine Commission](http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml) public data set. We took a snapshot of the dataset and captured one percent of the available data for our demo database. On your system, the database backup file is slightly over 90 MB, providing 1.7 million rows in the primary data table.
18+
To complete this exercise, you should have [SQL Server Management Studio](https://docs.microsoft.com/sql/ssms/download-sql-server-management-studio-ssms?view=sql-server-2017) or another tool that can restore a database backup file and run T-SQL queries.
1919

20-
When you are finished with the steps in this article, the **NYCTaxi_Sample** database is available on your local instance, providing demo data for hands-on learning. The database name must be **NYCTaxi_Sample** if you want to run the demo scripts with no modification.
20+
Tutorials and quickstarts using this data set include the following:
2121

22-
## Prerequisites
23-
24-
You need an internet connection, local administrative rights on the computer, and a database engine instance.
25-
26-
It helps to have [SQL Server Management Studio](https://docs.microsoft.com/sql/ssms/download-sql-server-management-studio-ssms) or another tool to verify object creation.
22+
+ [Use a Python model in SQL Server for training and scoring](train-score-using-python-in-tsql.md)
2723

2824
## Download demo database
2925

@@ -56,22 +52,28 @@ The following table summarizes the objects created in the NYC Taxi demo database
5652
|**Object name**|**Object type**|**Description**|
5753
|----------|------------------------|---------------|
5854
|**NYCTaxi_Sample** | database |Created by the create-db-tb-upload-data.sql script. Creates a database and two tables:<br /><br />dbo.nyctaxi_sample table: Contains the main NYC Taxi dataset. A clustered columnstore index is added to the table to improve storage and query performance. The 1% sample of the NYC Taxi dataset is inserted into this table.<br /><br />dbo.nyc_taxi_models table: Used to persist the trained advanced analytics model.|
59-
|**fnCalculateDistance** |scalar-valued function | Created by the fnCalculateDistance.sql script. Calculates the direct distance between pickup and dropoff locations. This function is used in [Create data features](sqldev-create-data-features-using-t-sql.md), [Train and save a model](../r/sqldev-train-and-save-a-model-using-t-sql.md) and [Operationalize the R model](sqldev-operationalize-the-model.md).|
55+
|**fnCalculateDistance** |scalar-valued function | Created by the fnCalculateDistance.sql script. Calculates the direct distance between pickup and dropoff locations. This function is used in [Create data features](sqldev-create-data-features-using-t-sql.md), [Train and save a model](sqldev-train-and-save-a-model-using-t-sql.md) and [Operationalize the R model](sqldev-operationalize-the-model.md).|
6056
|**fnEngineerFeatures** |table-valued function | Created by the fnEngineerFeatures.sql script. Creates new data features for model training. This function is used in [Create data features](sqldev-create-data-features-using-t-sql.md) and [Operationalize the R model](sqldev-operationalize-the-model.md).|
6157
|**PlotHistogram** |stored procedure | Created by the PlotHistogram.sql script. Calls an R function to plot the histogram of a variable and then returns the plot as a binary object. This stored procedure is used in [Explore and visualize data](sqldev-explore-and-visualize-the-data.md).|
6258
|**PlotInOutputFiles** |stored procedure| Created by the PlotInOutputFiles.sql script. Creates a graphic using an R function and then saves the output as a local PDF file. This stored procedure is used in [Explore and visualize data](sqldev-explore-and-visualize-the-data.md).|
6359
|**PersistModel** |stored procedure | Created by the PersistModel.sql script. Takes a model that has been serialized in a varbinary data type, and writes it to the specified table. |
6460
|**PredictTip** |stored procedure |Created by the PredictTip.sql script. Calls the trained model to create predictions using the model. The stored procedure accepts a query as its input parameter and returns a column of numeric values containing the scores for the input rows. This stored procedure is used in [Operationalize the R model](sqldev-operationalize-the-model.md).|
6561
|**PredictTipSingleMode** |stored procedure| Created by the PredictTipSingleMode.sql script. Calls the trained model to create predictions using the model. This stored procedure accepts a new observation as input, with individual feature values passed as in-line parameters, and returns a value that predicts the outcome for the new observation. This stored procedure is used in [Operationalize the R model](sqldev-operationalize-the-model.md).|
66-
|**TrainTipPredictionModel** |stored procedure|Created by the TrainTipPredictionModel.sql script. Trains a logistic regression model by calling an R package. The model predicts the value of the tipped column, and is trained using a randomly selected 70% of the data. The output of the stored procedure is the trained model, which is saved in the table nyc_taxi_models. This stored procedure is used in [Train and save a model](../r/sqldev-train-and-save-a-model-using-t-sql.md).|
62+
|**TrainTipPredictionModel** |stored procedure|Created by the TrainTipPredictionModel.sql script. Trains a logistic regression model by calling an R package. The model predicts the value of the tipped column, and is trained using a randomly selected 70% of the data. The output of the stored procedure is the trained model, which is saved in the table nyc_taxi_models. This stored procedure is used in [Train and save a model](sqldev-train-and-save-a-model-using-t-sql.md).|
6763

6864
## Query data for verification
6965

7066
As a validation step, run a query to confirm the data was uploaded.
7167

7268
1. In Object Explorer, under Databases, right-click the **NYCTaxi_Sample** database, and start a new query.
7369

74-
2. Run **`select * from dbo.nyctaxi_sample`** to return all 1.7 million rows.
70+
2. Run some simple queries:
71+
72+
```sql
73+
SELECT TOP(10) * FROM dbo.nyctaxi_sample;
74+
SELECT COUNT(*) FROM dbo.nyctaxi_sample;
75+
```
76+
The database contains 1.7 million rows.
7577

7678
## Next steps
7779
