| title | Machine Learning on SQL Server Big Data Clusters |
|---|---|
| titleSuffix | SQL Server Big Data Clusters |
| description | Machine Learning guide for SQL Server Big Data Clusters. |
| author | HugoMSFT |
| ms.author | hudequei |
| ms.reviewer | wiassaf |
| ms.date | 10/05/2021 |
| ms.service | sql |
| ms.subservice | machine-learning-bdc |
| ms.topic | conceptual |
[!INCLUDESQL Server 2019]
This article explains how to use [!INCLUDEbig-data-clusters-nover] for machine learning scenarios.
[!INCLUDEbig-data-clusters-banner-retirement]
[!INCLUDEbig-data-clusters-nover] enables machine learning scenarios and solutions through two technology stacks: SQL Server Machine Learning Services and Apache Spark ML.
[!INCLUDEbig-data-clusters-nover] offers machine learning capabilities inside the SQL Server engine through the established SQL Server Machine Learning Services technology stack, which enables high-performance, in-database machine learning inference and scoring.
For machine learning scenarios based on big data, hosting the data in HDFS and processing it with Apache Spark ML is more cost-effective, scalable, and powerful.
These machine learning capabilities enable applications and solutions such as fraud detection, forecasting, churn prediction, and general classification and regression tasks. It is still important to choose the technology that best fits each scenario; the following table compares the two stacks.
| Aspect | SQL Server Machine Learning Services | Apache Spark ML |
|---|---|---|
| Data placement | Leverages tabular data locality on SQL Server. Premium data tier. | Scalable big data tier using HDFS; supports unstructured, semi-structured, and structured data. |
| Best for | Low-latency inference and scoring scenarios | 1. Distributed batch training and scoring of machine learning models on big data. 2. ETL sinks and large-scale data preparation and featurization for ML. |
| Feeds | ML-powered BI dashboards, reports, and applications that require low latency | Batch-scored data can be promoted to SQL Server to drive ML-powered scenarios |
| Latency | Low latency required | Higher latency acceptable |
| Read more | Run Python and R scripts with Machine Learning Services on SQL Server Big Data Clusters | Introducing Spark Machine Learning on SQL Server Big Data Clusters |
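As a rough illustration, the comparison above can be read as a simple decision rule. The following Python sketch is not part of any SQL Server or Spark API; the `Scenario` class, its field names, and the `suggest_stack` function are hypothetical names introduced here, and the rule is a simplification of the table rather than official guidance.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """Hypothetical workload description; the field names are illustrative only."""
    needs_low_latency: bool  # for example, interactive dashboards or applications
    big_data_on_hdfs: bool   # data is hosted in HDFS rather than in SQL Server tables


def suggest_stack(scenario: Scenario) -> str:
    """Map a scenario to the stack that the comparison table points to."""
    if scenario.big_data_on_hdfs:
        if scenario.needs_low_latency:
            # "Feeds" row: batch score in Spark, then promote results to SQL Server.
            return "Apache Spark ML, with batch-scored data promoted to SQL Server"
        # "Best for" row: distributed batch training and scoring on big data.
        return "Apache Spark ML"
    # Tabular data already in SQL Server benefits from data locality.
    return "SQL Server Machine Learning Services"


if __name__ == "__main__":
    print(suggest_stack(Scenario(needs_low_latency=True, big_data_on_hdfs=False)))
    print(suggest_stack(Scenario(needs_low_latency=False, big_data_on_hdfs=True)))
```

Real workloads usually weigh more factors than these two flags, such as data volume, cost, and team skills, so treat the result as a starting point for the discussion rather than a final answer.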
For more information, see [Introducing [!INCLUDEbig-data-clusters-nover]](big-data-cluster-overview.md).