|
1 | 1 | --- |
2 | | -title: "Python + T-SQL: Develop model" |
3 | | -description: Learn how to embed Python code in SQL Server stored procedures and T-SQL functions. |
| 2 | +title: "Python tutorial: Predict NYC taxi fares with binary classification" |
| 3 | +description: In this five-part tutorial series, you'll learn how to embed Python code in SQL Server stored procedures and T-SQL functions with SQL machine learning to predict NYC taxi fares using binary classification. |
4 | 4 | ms.prod: sql |
5 | | -ms.technology: machine-learning-services |
| 5 | +ms.technology: machine-learning |
6 | 6 |
|
7 | | -ms.date: 10/29/2018 |
| 7 | +ms.date: 07/28/2020 |
8 | 8 | ms.topic: tutorial |
9 | 9 | author: dphansen |
10 | 10 | ms.author: davidph |
11 | 11 | ms.custom: seo-lt-2019 |
12 | | -monikerRange: ">=sql-server-2017||>=sql-server-linux-ver15||=sqlallproducts-allversions" |
| 12 | +monikerRange: ">=sql-server-2017||>=sql-server-linux-ver15||>=azuresqldb-mi-current||=sqlallproducts-allversions" |
13 | 13 | --- |
14 | | -# Tutorial: Python data analytics for SQL developers |
| 14 | + |
| 15 | +# Python tutorial: Predict NYC taxi fares with binary classification |
15 | 16 | [!INCLUDE [SQL Server](../../includes/applies-to-version/sqlserver.md)] |
16 | 17 |
|
17 | | -In this tutorial for SQL programmers, learn about Python integration by building and deploying a Python-based machine learning solution using a [NYCTaxi_sample](demo-data-nyctaxi-in-sql.md) database on SQL Server. You'll use T-SQL, SQL Server Management Studio, and a database engine instance with [Machine Learning Services](../install/sql-machine-learning-services-windows-install.md) and Python language support. |
| 18 | +::: moniker range=">=sql-server-2017||>=sql-server-linux-ver15||=sqlallproducts-allversions" |
| 19 | +In this five-part tutorial series for SQL programmers, you'll learn about Python integration in [SQL Server Machine Learning Services](../sql-server-machine-learning-services.md) or on [Big Data Clusters](../../big-data-cluster/machine-learning-services.md). |
| 20 | +::: moniker-end |
18 | 21 |
|
19 | | -This tutorial introduces you to Python functions used in a data modeling workflow. Steps include data exploration, building and training a binary classification model, and model deployment. You'll use sample data from the New York City Taxi and Limosine Commission, and the model you will build predicts whether a trip is likely to result in a tip based on the time of day, distance travelled, and pick-up location. |
| 22 | +::: moniker range=">=azuresqldb-mi-current||=sqlallproducts-allversions" |
| 23 | +In this five-part tutorial series for SQL programmers, you'll learn about Python integration in [Machine Learning Services in Azure SQL Managed Instance (preview)](/azure/azure-sql/managed-instance/machine-learning-services-overview). |
| 24 | +::: moniker-end |
20 | 25 |
|
21 | | -All of the Python code used in this tutorial is wrapped in stored procedures that you create and run in Management Studio. |
| 26 | +You'll build and deploy a Python-based machine learning solution using a sample database on SQL Server. You'll use T-SQL, Azure Data Studio or SQL Server Management Studio, and a database instance with SQL machine learning and Python language support. |
22 | 27 |
|
23 | | -> [!NOTE] |
24 | | -> This tutorial is available in both R and Python. For the R version, see [In-database analytics for R developers](sqldev-in-database-r-for-sql-developers.md). |
| 28 | +This tutorial series introduces you to Python functions used in a data modeling workflow. Parts include data exploration, building and training a binary classification model, and model deployment. You'll use sample data from the New York City Taxi and Limousine Commission, and the model you'll build predicts whether a trip is likely to result in a tip based on the time of day, distance traveled, and pick-up location. |
25 | 29 |
|
26 | | -## Overview |
| 30 | +In the first part of this series, you'll install the prerequisites and restore the sample database. In parts two and three, you'll develop some Python scripts in a notebook to prepare your data and train a machine learning model. Then, in part three, you'll run those Python scripts inside the database using T-SQL stored procedures. |
27 | 31 |
|
28 | | -The process of building a machine learning solution is a complex one that can involve multiple tools, and the coordination of subject matter experts across several phases: |
| 32 | +In this article, you'll: |
29 | 33 |
|
30 | | -+ obtaining and cleaning data |
31 | | -+ exploring the data and building features useful for modeling |
32 | | -+ training and tuning the model |
33 | | -+ deployment to production |
| 34 | +> [!div class="checklist"] |
| 35 | +> + Install prerequisites |
| 36 | +> + Restore the sample database |
34 | 37 |
|
35 | | -Development and testing of the actual code is best performed using a dedicated development environment. However, after the script is fully tested, you can easily deploy it to [!INCLUDE[ssNoVersion](../../includes/ssnoversion-md.md)] using [!INCLUDE[tsql](../../includes/tsql-md.md)] stored procedures in the familiar environment of [!INCLUDE[ssManStudio](../../includes/ssmanstudio-md.md)]. Wrapping external code in stored procedures is the primary mechanism for operationalizing code in SQL Server. |
| 38 | +In [part two](sqldev-py3-explore-and-visualize-the-data.md), you'll explore the sample data and generate some plots. |
36 | 39 |
|
37 | | -Whether you are a SQL programmer new to Python, or a Python developer new to SQL, this multi-part tutorial introduces a typical workflow for conducting in-database analytics with Python and SQL Server. |
| 40 | +In [part three](sqldev-py4-create-data-features-using-t-sql.md), you'll learn how to create features from raw data by using a Transact-SQL function. You'll then call that function from a stored procedure to create a table that contains the feature values. |
38 | 41 |
|
39 | | -+ [Lesson 1: Explore and visualize the data using Python](sqldev-py3-explore-and-visualize-the-data.md) |
| 42 | +In [part four](sqldev-py5-train-and-save-a-model-using-t-sql.md), you'll load the modules and call the necessary functions to create and train the model using a SQL Server stored procedure. |
40 | 43 |
|
41 | | -+ [Lesson 2: Create data features using custom SQL functions](sqldev-py4-create-data-features-using-t-sql.md) |
| 44 | +In [part five](sqldev-py6-operationalize-the-model.md), you'll learn how to operationalize the models that you trained and saved in part four. |
42 | 45 |
|
43 | | -+ [Lesson 3: Train and save a Python model using T-SQL](sqldev-py5-train-and-save-a-model-using-t-sql.md) |
| 46 | +> [!NOTE] |
| 47 | +> This tutorial is available in both R and Python. For the R version, see [In-database analytics for R developers](sqldev-in-database-r-for-sql-developers.md). |
44 | 48 |
|
45 | | -+ [Lesson 4: Predict potential outcomes using a Python model in a stored procedure](sqldev-py6-operationalize-the-model.md) |
| 49 | +## Prerequisites |
46 | 50 |
|
47 | | -After the model has been saved to the database, you can call the model for predictions from [!INCLUDE[tsql](../../includes/tsql-md.md)] by using stored procedures. |
| 51 | +::: moniker range=">=sql-server-2017||>=sql-server-linux-ver15||=sqlallproducts-allversions" |
| 52 | ++ Install [SQL Server Machine Learning Services with Python](../install/sql-machine-learning-services-windows-install.md#verify-installation) |
| 53 | +::: moniker-end |
48 | 54 |
|
49 | | -## Prerequisites |
| 55 | ++ [Grant permissions to execute Python scripts](../security/user-permission.md) |
50 | 56 |
|
51 | | -+ [SQL Server Machine Learning Services with Python](../install/sql-machine-learning-services-windows-install.md#verify-installation) |
| 57 | ++ Restore the [NYC Taxi demo database](demo-data-nyctaxi-in-sql.md) |
52 | 58 |
|
53 | | -+ [Permissions](../security/user-permission.md) |
| 59 | +All tasks can be done using [!INCLUDE[tsql](../../includes/tsql-md.md)] stored procedures in Azure Data Studio or [!INCLUDE[ssManStudio](../../includes/ssmanstudio-md.md)]. |
54 | 60 |
|
55 | | -+ [NYC Taxi demo database](demo-data-nyctaxi-in-sql.md) |
| 61 | +This tutorial series assumes familiarity with basic database operations such as creating databases and tables, importing data, and writing SQL queries. It does not assume you know Python and all Python code is provided. |
56 | 62 |
|
57 | | -All tasks can be done using [!INCLUDE[tsql](../../includes/tsql-md.md)] stored procedures in [!INCLUDE[ssManStudio](../../includes/ssmanstudio-md.md)]. |
| 63 | +## Overview |
| 64 | + |
| 65 | +The process of building a machine learning solution is a complex one that can involve multiple tools, and the coordination of subject matter experts across several phases: |
58 | 66 |
|
59 | | -This tutorial assumes familiarity with basic database operations such as creating databases and tables, importing data, and writing SQL queries. It does not assume you know Python. As such, all Python code is provided. |
| 67 | ++ obtaining and cleaning data |
| 68 | ++ exploring the data and building features useful for modeling |
| 69 | ++ training and tuning the model |
| 70 | ++ deployment to production |
| 71 | + |
| 72 | +Development and testing of the actual code is best performed using a dedicated development environment. However, after the script is fully tested, you can easily deploy it to [!INCLUDE[ssNoVersion](../../includes/ssnoversion-md.md)] using [!INCLUDE[tsql](../../includes/tsql-md.md)] stored procedures in the familiar environment of Azure Data Studio or [!INCLUDE[ssManStudio](../../includes/ssmanstudio-md.md)]. Wrapping external code in stored procedures is the primary mechanism for operationalizing code in SQL Server. |
| 73 | + |
| 74 | +Whether you are a SQL programmer new to Python, or a Python developer new to SQL, this five-part tutorial series introduces a typical workflow for conducting in-database analytics with Python and SQL Server. |
| 75 | + |
| 76 | +After the model has been saved to the database, you can call the model for predictions from [!INCLUDE[tsql](../../includes/tsql-md.md)] by using stored procedures. |
60 | 77 |
|
61 | 78 | ## Next steps |
62 | 79 |
|
| 80 | +In this article, you: |
| 81 | + |
| 82 | +> [!div class="checklist"] |
| 83 | +> + Installed prerequisites |
| 84 | +> + Restored the sample database |
| 85 | +
|
63 | 86 | > [!div class="nextstepaction"] |
64 | 87 | > [Explore and visualize the data using Python](sqldev-py3-explore-and-visualize-the-data.md) |
0 commit comments