Skip to content

Commit 9cd1351

Browse files
authored
Merge pull request #16342 from garyericson/07-28-nyctaxi-tutorial
SQLML: Clean up Python tutorial
2 parents c6266e4 + b448b47 commit 9cd1351

6 files changed

Lines changed: 410 additions & 335 deletions

docs/machine-learning/tutorials/demo-data-nyctaxi-in-sql.md

Lines changed: 6 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -9,7 +9,7 @@ ms.topic: tutorial
99
author: dphansen
1010
ms.author: davidph
1111
ms.custom: seo-lt-2019
12-
monikerRange: ">=sql-server-2016||>=sql-server-linux-ver15||=sqlallproducts-allversions"
12+
monikerRange: ">=sql-server-2016||>=sql-server-linux-ver15||>=azuresqldb-mi-current||=sqlallproducts-allversions"
1313
---
1414
# NYC Taxi demo data for SQL Server Python and R tutorials
1515
[!INCLUDE [SQL Server](../../includes/applies-to-version/sqlserver.md)]
@@ -34,6 +34,11 @@ File size is approximately 90 MB.
3434
>To restore the sample database on [SQL Server Big Data Clusters](../../big-data-cluster/big-data-cluster-overview.md), download [NYCTaxi_Sample.bak](https://sqlmldoccontent.blob.core.windows.net/sqlml/NYCTaxi_Sample.bak) and follow the directions in [Restore a database into the SQL Server big data cluster master instance](../../big-data-cluster/data-ingestion-restore-database.md).
3535
::: moniker-end
3636

37+
::: moniker range=">=azuresqldb-mi-current||=sqlallproducts-allversions"
38+
>[!NOTE]
39+
>To restore the sample database on [Machine Learning Services in Azure SQL Managed Instance (preview)](/azure/azure-sql/managed-instance/machine-learning-services-overview), follow the instructions in [Quickstart: Restore a database to Azure SQL Managed Instance](/azure/azure-sql/managed-instance/restore-sample-database-quickstart) using the NYC Taxi demo database .bak file: [https://sqlmldoccontent.blob.core.windows.net/sqlml/NYCTaxi_Sample.bak](https://sqlmldoccontent.blob.core.windows.net/sqlml/NYCTaxi_Sample.bak).
40+
::: moniker-end
41+
3742
1. Click [NYCTaxi_Sample.bak](https://sqlmldoccontent.blob.core.windows.net/sqlml/NYCTaxi_Sample.bak) to download the database backup file.
3843

3944
2. Copy the file to C:\Program files\Microsoft SQL Server\MSSQL-instance-name\MSSQL\Backup folder.
Lines changed: 53 additions & 30 deletions
Original file line numberDiff line numberDiff line change
@@ -1,64 +1,87 @@
11
---
2-
title: "Python + T-SQL: Develop model"
3-
description: Learn how to embed Python code in SQL Server stored procedures and T-SQL functions.
2+
title: "Python tutorial: Predict NYC taxi fares with binary classification"
3+
description: In this five-part tutorial series, you'll learn how to embed Python code in SQL Server stored procedures and T-SQL functions with SQL machine learning to predict NYC taxi fares using binary classification.
44
ms.prod: sql
5-
ms.technology: machine-learning-services
5+
ms.technology: machine-learning
66

7-
ms.date: 10/29/2018
7+
ms.date: 07/28/2020
88
ms.topic: tutorial
99
author: dphansen
1010
ms.author: davidph
1111
ms.custom: seo-lt-2019
12-
monikerRange: ">=sql-server-2017||>=sql-server-linux-ver15||=sqlallproducts-allversions"
12+
monikerRange: ">=sql-server-2017||>=sql-server-linux-ver15||>=azuresqldb-mi-current||=sqlallproducts-allversions"
1313
---
14-
# Tutorial: Python data analytics for SQL developers
14+
15+
# Python tutorial: Predict NYC taxi fares with binary classification
1516
[!INCLUDE [SQL Server](../../includes/applies-to-version/sqlserver.md)]
1617

17-
In this tutorial for SQL programmers, learn about Python integration by building and deploying a Python-based machine learning solution using a [NYCTaxi_sample](demo-data-nyctaxi-in-sql.md) database on SQL Server. You'll use T-SQL, SQL Server Management Studio, and a database engine instance with [Machine Learning Services](../install/sql-machine-learning-services-windows-install.md) and Python language support.
18+
::: moniker range=">=sql-server-2017||>=sql-server-linux-ver15||=sqlallproducts-allversions"
19+
In this five-part tutorial series for SQL programmers, you'll learn about Python integration in [SQL Server Machine Learning Services](../sql-server-machine-learning-services.md) or on [Big Data Clusters](../../big-data-cluster/machine-learning-services.md).
20+
::: moniker-end
1821

19-
This tutorial introduces you to Python functions used in a data modeling workflow. Steps include data exploration, building and training a binary classification model, and model deployment. You'll use sample data from the New York City Taxi and Limosine Commission, and the model you will build predicts whether a trip is likely to result in a tip based on the time of day, distance travelled, and pick-up location.
22+
::: moniker range=">=azuresqldb-mi-current||=sqlallproducts-allversions"
23+
In this five-part tutorial series for SQL programmers, you'll learn about Python integration in [Machine Learning Services in Azure SQL Managed Instance (preview)](/azure/azure-sql/managed-instance/machine-learning-services-overview).
24+
::: moniker-end
2025

21-
All of the Python code used in this tutorial is wrapped in stored procedures that you create and run in Management Studio.
26+
You'll build and deploy a Python-based machine learning solution using a sample database on SQL Server. You'll use T-SQL, Azure Data Studio or SQL Server Management Studio, and a database instance with SQL machine learning and Python language support.
2227

23-
> [!NOTE]
24-
> This tutorial is available in both R and Python. For the R version, see [In-database analytics for R developers](sqldev-in-database-r-for-sql-developers.md).
28+
This tutorial series introduces you to Python functions used in a data modeling workflow. Parts include data exploration, building and training a binary classification model, and model deployment. You'll use sample data from the New York City Taxi and Limousine Commission, and the model you'll build predicts whether a trip is likely to result in a tip based on the time of day, distance traveled, and pick-up location.
2529

26-
## Overview
30+
In the first part of this series, you'll install the prerequisites and restore the sample database. In parts two and three, you'll develop some Python scripts in a notebook to prepare your data and train a machine learning model. Then, in part three, you'll run those Python scripts inside the database using T-SQL stored procedures.
2731

28-
The process of building a machine learning solution is a complex one that can involve multiple tools, and the coordination of subject matter experts across several phases:
32+
In this article, you'll:
2933

30-
+ obtaining and cleaning data
31-
+ exploring the data and building features useful for modeling
32-
+ training and tuning the model
33-
+ deployment to production
34+
> [!div class="checklist"]
35+
> + Install prerequisites
36+
> + Restore the sample database
3437
35-
Development and testing of the actual code is best performed using a dedicated development environment. However, after the script is fully tested, you can easily deploy it to [!INCLUDE[ssNoVersion](../../includes/ssnoversion-md.md)] using [!INCLUDE[tsql](../../includes/tsql-md.md)] stored procedures in the familiar environment of [!INCLUDE[ssManStudio](../../includes/ssmanstudio-md.md)]. Wrapping external code in stored procedures is the primary mechanism for operationalizing code in SQL Server.
38+
In [part two](sqldev-py3-explore-and-visualize-the-data.md), you'll explore the sample data and generate some plots.
3639

37-
Whether you are a SQL programmer new to Python, or a Python developer new to SQL, this multi-part tutorial introduces a typical workflow for conducting in-database analytics with Python and SQL Server.
40+
In [part three](sqldev-py4-create-data-features-using-t-sql.md), you'll learn how to create features from raw data by using a Transact-SQL function. You'll then call that function from a stored procedure to create a table that contains the feature values.
3841

39-
+ [Lesson 1: Explore and visualize the data using Python](sqldev-py3-explore-and-visualize-the-data.md)
42+
In [part four](sqldev-py5-train-and-save-a-model-using-t-sql.md), you'll load the modules and call the necessary functions to create and train the model using a SQL Server stored procedure.
4043

41-
+ [Lesson 2: Create data features using custom SQL functions](sqldev-py4-create-data-features-using-t-sql.md)
44+
In [part five](sqldev-py6-operationalize-the-model.md), you'll learn how to operationalize the models that you trained and saved in part four.
4245

43-
+ [Lesson 3: Train and save a Python model using T-SQL](sqldev-py5-train-and-save-a-model-using-t-sql.md)
46+
> [!NOTE]
47+
> This tutorial is available in both R and Python. For the R version, see [In-database analytics for R developers](sqldev-in-database-r-for-sql-developers.md).
4448
45-
+ [Lesson 4: Predict potential outcomes using a Python model in a stored procedure](sqldev-py6-operationalize-the-model.md)
49+
## Prerequisites
4650

47-
After the model has been saved to the database, you can call the model for predictions from [!INCLUDE[tsql](../../includes/tsql-md.md)] by using stored procedures.
51+
::: moniker range=">=sql-server-2017||>=sql-server-linux-ver15||=sqlallproducts-allversions"
52+
+ Install [SQL Server Machine Learning Services with Python](../install/sql-machine-learning-services-windows-install.md#verify-installation)
53+
::: moniker-end
4854

49-
## Prerequisites
55+
+ [Grant permissions to execute Python scripts](../security/user-permission.md)
5056

51-
+ [SQL Server Machine Learning Services with Python](../install/sql-machine-learning-services-windows-install.md#verify-installation)
57+
+ Restore the [NYC Taxi demo database](demo-data-nyctaxi-in-sql.md)
5258

53-
+ [Permissions](../security/user-permission.md)
59+
All tasks can be done using [!INCLUDE[tsql](../../includes/tsql-md.md)] stored procedures in Azure Data Studio or [!INCLUDE[ssManStudio](../../includes/ssmanstudio-md.md)].
5460

55-
+ [NYC Taxi demo database](demo-data-nyctaxi-in-sql.md)
61+
This tutorial series assumes familiarity with basic database operations such as creating databases and tables, importing data, and writing SQL queries. It does not assume you know Python and all Python code is provided.
5662

57-
All tasks can be done using [!INCLUDE[tsql](../../includes/tsql-md.md)] stored procedures in [!INCLUDE[ssManStudio](../../includes/ssmanstudio-md.md)].
63+
## Overview
64+
65+
The process of building a machine learning solution is a complex one that can involve multiple tools, and the coordination of subject matter experts across several phases:
5866

59-
This tutorial assumes familiarity with basic database operations such as creating databases and tables, importing data, and writing SQL queries. It does not assume you know Python. As such, all Python code is provided.
67+
+ obtaining and cleaning data
68+
+ exploring the data and building features useful for modeling
69+
+ training and tuning the model
70+
+ deployment to production
71+
72+
Development and testing of the actual code is best performed using a dedicated development environment. However, after the script is fully tested, you can easily deploy it to [!INCLUDE[ssNoVersion](../../includes/ssnoversion-md.md)] using [!INCLUDE[tsql](../../includes/tsql-md.md)] stored procedures in the familiar environment of Azure Data Studio or [!INCLUDE[ssManStudio](../../includes/ssmanstudio-md.md)]. Wrapping external code in stored procedures is the primary mechanism for operationalizing code in SQL Server.
73+
74+
Whether you are a SQL programmer new to Python, or a Python developer new to SQL, this five-part tutorial series introduces a typical workflow for conducting in-database analytics with Python and SQL Server.
75+
76+
After the model has been saved to the database, you can call the model for predictions from [!INCLUDE[tsql](../../includes/tsql-md.md)] by using stored procedures.
6077

6178
## Next steps
6279

80+
In this article, you:
81+
82+
> [!div class="checklist"]
83+
> + Installed prerequisites
84+
> + Restored the sample database
85+
6386
> [!div class="nextstepaction"]
6487
> [Explore and visualize the data using Python](sqldev-py3-explore-and-visualize-the-data.md)

docs/machine-learning/tutorials/sqldev-py3-explore-and-visualize-the-data.md

Lines changed: 33 additions & 38 deletions
Original file line numberDiff line numberDiff line change
@@ -1,22 +1,35 @@
11
---
2-
title: "Python + T-SQL: Explore data"
3-
description: Tutorial showing how to embed Python in SQL Server stored procedures and T-SQL functions
2+
title: "Python tutorial: Explore and visualize data"
3+
description: In part two of this five-part tutorial series, you'll explore sample data and generate some plots in preparation for using binary classification in Python with SQL machine learning.
44
ms.prod: sql
5-
ms.technology: machine-learning-services
5+
ms.technology: machine-learning
66

7-
ms.date: 11/01/2018
7+
ms.date: 07/29/2020
88
ms.topic: tutorial
99
author: dphansen
1010
ms.author: davidph
1111
ms.custom: seo-lt-2019
12-
monikerRange: ">=sql-server-2017||>=sql-server-linux-ver15||=sqlallproducts-allversions"
12+
monikerRange: ">=sql-server-2017||>=sql-server-linux-ver15||>=azuresqldb-mi-current||=sqlallproducts-allversions"
1313
---
14-
# Explore and visualize the data
14+
15+
# Python tutorial: Explore and visualize data
1516
[!INCLUDE [SQL Server](../../includes/applies-to-version/sqlserver.md)]
1617

17-
This article is part of a tutorial, [In-database Python analytics for SQL developers](sqldev-in-database-python-for-sql-developers.md).
18+
In part two of this five-part tutorial series, you'll explore the sample data and generate some plots. Later, you'll learn how to serialize graphics objects in Python, and then deserialize those objects and make plots.
19+
20+
In this article, you'll:
21+
22+
> [!div class="checklist"]
23+
> + Review the sample data
24+
> + Create plots using Python in T-SQL
25+
26+
In [part one](sqldev-in-database-python-for-sql-developers.md), you installed the prerequisites and restored the sample database.
27+
28+
In [part three](sqldev-py4-create-data-features-using-t-sql.md), you'll learn how to create features from raw data by using a Transact-SQL function. You'll then call that function from a stored procedure to create a table that contains the feature values.
29+
30+
In [part four](sqldev-py5-train-and-save-a-model-using-t-sql.md), you'll load the modules and call the necessary functions to create and train the model using a SQL Server stored procedure.
1831

19-
In this step, you explore the sample data and generate some plots. Later, you learn how to serialize graphics objects in Python, and then deserialize those objects and make plots.
32+
In [part five](sqldev-py6-operationalize-the-model.md), you'll learn how to operationalize the models that you trained and saved in part four.
2033

2134
## Review the data
2235

@@ -145,13 +158,10 @@ The stored procedure returns a serialized Python `figure` object as a stream of
145158
0xFFD8FFE000104A4649...
146159
```
147160

148-
149161
4. From a [Python client](../python/setup-python-client-tools-sql.md), you can now connect to the SQL Server instance that generated the binary plot objects, and view the plots.
150162

151-
To do this, run the following Python code, replacing the server name, database name, and credentials as appropriate. Make sure the Python version is the same on the client and the server. Also make sure that the Python libraries on your client (such as matplotlib) are the same or higher version relative to the libraries installed on the server.
163+
To do this, run the following Python code, replacing the server name, database name, and credentials as appropriate (for Windows authentication, replace the `UID` and `PWD` parameters with `Trusted_Connection=True`). Make sure the Python version is the same on the client and the server. Also make sure that the Python libraries on your client (such as matplotlib) are the same or higher version relative to the libraries installed on the server.
152164

153-
**Using SQL Server authentication:**
154-
155165
```python
156166
%matplotlib notebook
157167
import pyodbc
@@ -167,36 +177,21 @@ The stored procedure returns a serialized Python `figure` object as a stream of
167177
print("The plots are saved in directory: ",os.getcwd())
168178
```
169179

170-
**Using Windows authentication:**
171-
172-
```python
173-
%matplotlib notebook
174-
import pyodbc
175-
import pickle
176-
import os
177-
cnxn = pyodbc.connect('DRIVER=SQL Server;SERVER={SERVER_NAME};DATABASE={DB_NAME};Trusted_Connection=True;')
178-
cursor = cnxn.cursor()
179-
cursor.execute("EXECUTE [dbo].[PyPlotMatplotlib]")
180-
tables = cursor.fetchall()
181-
for i in range(0, len(tables)):
182-
fig = pickle.loads(tables[i][0])
183-
fig.savefig(str(i)+'.png')
184-
print("The plots are saved in directory: ",os.getcwd())
185-
```
186-
187-
5. If the connection is successful, you should see a message like the following:
180+
5. If the connection is successful, you should see a message like the following:
188181

189-
*The plots are saved in directory: xxxx*
182+
*The plots are saved in directory: xxxx*
190183

191-
6. The output file is created in the Python working directory. To view the plot, locate the Python working directory, and open the file. The following image shows a plot saved on the client computer.
184+
6. The output file is created in the Python working directory. To view the plot, locate the Python working directory, and open the file. The following image shows a plot saved on the client computer.
192185

193-
![Tip amount vs Fare amount](media/sqldev-python-sample-plot.png "Tip amount vs Fare amount")
194-
195-
## Next step
186+
![Tip amount vs Fare amount](media/sqldev-python-sample-plot.png "Tip amount vs Fare amount")
196187

197-
[Create data features using T-SQL](sqldev-py5-train-and-save-a-model-using-t-sql.md)
188+
## Next steps
198189

199-
## Previous step
190+
In this article, you:
200191

201-
[Download the NYC Taxi data set](demo-data-nyctaxi-in-sql.md)
192+
> [!div class="checklist"]
193+
> + Reviewed the sample data
194+
> + Created plots using Python in T-SQL
202195

196+
> [!div class="nextstepaction"]
197+
> [Create data features using T-SQL](sqldev-py5-train-and-save-a-model-using-t-sql.md)

0 commit comments

Comments
 (0)