Commit 9749a58

20221102 edit pass, fix broken links
1 parent e4f1b9a commit 9749a58

2 files changed

Lines changed: 56 additions & 56 deletions

docs/machine-learning/tutorials/demo-data-nyctaxi-in-sql.md

Lines changed: 20 additions & 19 deletions
@@ -3,8 +3,7 @@ title: NYC Taxi demo data for tutorials
 description: Create a database containing the New York City taxi sample data. This dataset is used in R and Python tutorials for SQL Server Machine Learning Services.
 ms.prod: sql
 ms.technology: machine-learning-services
-
-ms.date: 10/31/2018
+ms.date: 11/02/2022
 ms.topic: tutorial
 author: WilliamDAssafMSFT
 ms.author: wiassaf
@@ -16,7 +15,7 @@ monikerRange: ">=sql-server-2016||>=sql-server-linux-ver15||>=azuresqldb-mi-curr
 
 This article explains how to set up a sample database consisting of public data from the [New York City Taxi and Limousine Commission](http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml). This data is used in several R and Python tutorials for in-database analytics on SQL Server. To make the sample code run quicker, we created a representative 1% sampling of the data. On your system, the database backup file is slightly over 90 MB, providing 1.7 million rows in the primary data table.
 
-To complete this exercise, you should have [SQL Server Management Studio](../../ssms/download-sql-server-management-studio-ssms.md?view=sql-server-2017&preserve-view=true) or another tool that can restore a database backup file and run T-SQL queries.
+To complete this exercise, you should have [SQL Server Management Studio (SSMS)](../../ssms/download-sql-server-management-studio-ssms.md?view=sql-server-2017&preserve-view=true) or another tool that can restore a database backup file and run T-SQL queries.
 
 Tutorials and quickstarts using this data set include the following:
 
@@ -25,34 +24,34 @@ Tutorials and quickstarts using this data set include the following:
 
 ## Download files
 
-The sample database is a SQL Server 2016 BAK file hosted by Microsoft. You can restore it on SQL Server 2016 and later. File download begins immediately when you click the link.
+The sample database is a SQL Server 2016 BAK file hosted by Microsoft. You can restore it on SQL Server 2016 and later. File download begins immediately when you open the link.
 
 File size is approximately 90 MB.
 
 ::: moniker range=">=sql-server-ver15||>=sql-server-linux-ver15"
 >[!NOTE]
->To restore the sample database on [SQL Server Big Data Clusters](../../big-data-cluster/big-data-cluster-overview.md), download [NYCTaxi_Sample.bak](https://sqlmldoccontent.blob.core.windows.net/sqlml/NYCTaxi_Sample.bak) and follow the directions in [Restore a database into the SQL Server big data cluster master instance](../../big-data-cluster/data-ingestion-restore-database.md).
+>To restore the sample database on [SQL Server Big Data Clusters](../../big-data-cluster/big-data-cluster-overview.md), download [NYCTaxi_Sample.bak](https://aka.ms/sqlmldocument/NYCTaxi_Sample.bak) and follow the directions in [Restore a database into the SQL Server big data cluster master instance](../../big-data-cluster/data-ingestion-restore-database.md).
 ::: moniker-end
 
 ::: moniker range=">=azuresqldb-mi-current"
 >[!NOTE]
->To restore the sample database on [Machine Learning Services in Azure SQL Managed Instance](/azure/azure-sql/managed-instance/machine-learning-services-overview), follow the instructions in [Quickstart: Restore a database to Azure SQL Managed Instance](/azure/azure-sql/managed-instance/restore-sample-database-quickstart) using the NYC Taxi demo database .bak file: [https://sqlmldoccontent.blob.core.windows.net/sqlml/NYCTaxi_Sample.bak](https://sqlmldoccontent.blob.core.windows.net/sqlml/NYCTaxi_Sample.bak).
+>To restore the sample database on [Machine Learning Services in Azure SQL Managed Instance](/azure/azure-sql/managed-instance/machine-learning-services-overview), follow the instructions in [Quickstart: Restore a database to Azure SQL Managed Instance](/azure/azure-sql/managed-instance/restore-sample-database-quickstart) using the NYC Taxi demo database .bak file: [https://aka.ms/sqlmldocument/NYCTaxi_Sample.bak](https://aka.ms/sqlmldocument/NYCTaxi_Sample.bak).
 ::: moniker-end
 
-1. Click [NYCTaxi_Sample.bak](https://sqlmldoccontent.blob.core.windows.net/sqlml/NYCTaxi_Sample.bak) to download the database backup file.
+1. Download the [NYCTaxi_Sample.bak](https://aka.ms/sqlmldocument/NYCTaxi_Sample.bak) database backup file.
 
-2. Copy the file to C:\Program files\Microsoft SQL Server\MSSQL-instance-name\MSSQL\Backup folder.
+2. Copy the file to `C:\Program files\Microsoft SQL Server\MSSQL-instance-name\MSSQL\Backup` or similar path, for your instance's default `Backup` folder.
 
-3. In Management Studio, right-click **Databases** and select **Restore Files and File Groups**.
+3. In SSMS, right-click **Databases** and select **Restore Files and File Groups**.
 
-4. Enter *NYCTaxi_Sample* as the database name.
+4. Enter `NYCTaxi_Sample` as the database name.
 
-5. Click **From device** and then open the file selection page to select the backup file. Click **Add** to select NYCTaxi_Sample.bak.
+5. Select **From device** and then open the file selection page to select the `NYCTaxi_Sample.bak` backup file. Select **Add** to select `NYCTaxi_Sample.bak`.
 
-6. Select the **Restore** checkbox and click **OK** to restore the database.
+6. Select the **Restore** checkbox and select **OK** to restore the database.
 
 ## Review database objects
-
+
 Confirm the database objects exist on the [!INCLUDE[ssNoVersion](../../includes/ssnoversion-md.md)] instance using [!INCLUDE[ssManStudioFull](../../includes/ssmanstudiofull-md.md)]. You should see the database, tables, functions, and stored procedures.
 
 ![rsql_devtut_BrowseTables](media/rsql-devtut-browsetables.png "rsql_devtut_BrowseTables")
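
The SSMS restore steps in this hunk (steps 1 through 6 under "Download files") could also be scripted. The following is a minimal T-SQL sketch, not part of the commit, assuming the backup file sits at the path given in step 2. `MSSQL-instance-name` is the article's own placeholder and has to be replaced with the folder name on your instance. The variable name `@bak` is illustrative only.

```sql
-- Assumption: NYCTaxi_Sample.bak was copied to the default Backup folder from step 2.
-- "MSSQL-instance-name" is the article's placeholder; substitute your instance's folder.
DECLARE @bak nvarchar(260) =
    N'C:\Program files\Microsoft SQL Server\MSSQL-instance-name\MSSQL\Backup\NYCTaxi_Sample.bak';

-- List the logical data and log files recorded in the backup before restoring.
RESTORE FILELISTONLY FROM DISK = @bak;

-- Restore under the database name used in step 4; STATS = 10 reports progress every 10%.
RESTORE DATABASE NYCTaxi_Sample
FROM DISK = @bak
WITH RECOVERY, STATS = 10;
```

If the file locations recorded in the backup do not exist on your server, the restore would likely need a `WITH MOVE` clause per file, using the logical names that `RESTORE FILELISTONLY` returns. Those names are not given in the article, so they are left out here.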
@@ -63,7 +62,7 @@ The following table summarizes the objects created in the NYC Taxi demo database
 
 |**Object name**|**Object type**|**Description**|
 |----------|------------------------|---------------|
-|**NYCTaxi_Sample** | database | Creates a database and two tables:<br /><br />dbo.nyctaxi_sample table: Contains the main NYC Taxi dataset. A clustered columnstore index is added to the table to improve storage and query performance. The 1% sample of the NYC Taxi dataset is inserted into this table.<br /><br />dbo.nyc_taxi_models table: Used to persist the trained advanced analytics model.|
+|**NYCTaxi_Sample** | database | Creates a database and two tables:<br /><br />`dbo.nyctaxi_sample` table: Contains the main NYC Taxi dataset. A clustered columnstore index is added to the table to improve storage and query performance. The 1% sample of the NYC Taxi dataset is inserted into this table.<br /><br />`dbo.nyc_taxi_models` table: Used to persist the trained advanced analytics model.|
 |**fnCalculateDistance** |scalar-valued function | Calculates the direct distance between pickup and dropoff locations. This function is used in [Create data features](r-taxi-classification-create-features.md), [Train and save a model](r-taxi-classification-train-model.md) and [Operationalize the R model](r-taxi-classification-deploy-model.md).|
 |**fnEngineerFeatures** |table-valued function | Creates new data features for model training. This function is used in [Create data features](r-taxi-classification-create-features.md) and [Operationalize the R model](r-taxi-classification-deploy-model.md).|
 
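
To check the objects listed in the table above without browsing Object Explorer, a catalog query along these lines may help. This is a sketch rather than part of the article. It assumes the database was restored under the name `NYCTaxi_Sample` and relies only on the standard `sys.objects` and `sys.schemas` catalog views. The column aliases are illustrative.

```sql
USE NYCTaxi_Sample;
GO

-- List user-created tables, functions, and stored procedures in the restored database.
-- Type codes: U = user table, FN = scalar function, IF/TF = table-valued function,
-- P = stored procedure.
SELECT s.name      AS schema_name,
       o.name      AS object_name,
       o.type_desc AS object_type
FROM sys.objects AS o
JOIN sys.schemas AS s
    ON s.schema_id = o.schema_id
WHERE o.is_ms_shipped = 0
  AND o.type IN ('U', 'FN', 'IF', 'TF', 'P')
ORDER BY o.type_desc, o.name;
```

Because the article says the stored procedures are created by script in the tutorials, they may not appear in the output until those tutorials have been run.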
@@ -72,27 +71,28 @@ Stored procedures are created using R and Python script found in various tutoria
 
 |**Stored procedure**|**Language**|**Description**|
 |-------------------------|------------|---------------|
-|**RxPlotHistogram** |R | Calls the RevoScaleR rxHistogram function to plot the histogram of a variable and then returns the plot as a binary object. This stored procedure is used in [Explore and visualize data](r-taxi-classification-explore-data.md).|
-|**RPlotRHist** |R| Creates a graphic using the Hist function and saves the output as a local PDF file. This stored procedure is used in [Explore and visualize data](r-taxi-classification-explore-data.md).|
-|**RxTrainLogitModel** |R| Trains a logistic regression model by calling an R package. The model predicts the value of the tipped column, and is trained using a randomly selected 70% of the data. The output of the stored procedure is the trained model, which is saved in the table nyc_taxi_models. This stored procedure is used in [Train and save a model](r-taxi-classification-train-model.md).|
+|**RxPlotHistogram** |R | Calls the RevoScaleR `rxHistogram` function to plot the histogram of a variable and then returns the plot as a binary object. This stored procedure is used in [Explore and visualize data](r-taxi-classification-explore-data.md).|
+|**RPlotRHist** |R| Creates a graphic using the `Hist` function and saves the output as a local PDF file. This stored procedure is used in [Explore and visualize data](r-taxi-classification-explore-data.md).|
+|**RxTrainLogitModel** |R| Trains a logistic regression model by calling an R package. The model predicts the value of the `tipped` column, and is trained using a randomly selected 70% of the data. The output of the stored procedure is the trained model, which is saved in the table `dbo.nyc_taxi_models`. This stored procedure is used in [Train and save a model](r-taxi-classification-train-model.md).|
 |**RxPredictBatchOutput** |R | Calls the trained model to create predictions using the model. The stored procedure accepts a query as its input parameter and returns a column of numeric values containing the scores for the input rows. This stored procedure is used in [Predict potential outcomes](r-taxi-classification-deploy-model.md).|
 |**RxPredictSingleRow** |R| Calls the trained model to create predictions using the model. This stored procedure accepts a new observation as input, with individual feature values passed as in-line parameters, and returns a value that predicts the outcome for the new observation. This stored procedure is used in [Predict potential outcomes](r-taxi-classification-deploy-model.md).|
 
 ## Query the data
 
 As a validation step, run a query to confirm the data was uploaded.
 
-1. In Object Explorer, under Databases, right-click the **NYCTaxi_Sample** database, and start a new query.
+1. In Object Explorer, under **Databases**, right-click the **NYCTaxi_Sample** database, and start a new query.
 
 2. Run some simple queries:
 
 ```sql
 SELECT TOP(10) * FROM dbo.nyctaxi_sample;
 SELECT COUNT(*) FROM dbo.nyctaxi_sample;
 ```
+
 The database contains 1.7 million rows.
 
-3. Within the database is a **nyctaxi_sample** table that contains the data set. The table has been optimized for set-based calculations with the addition of a [columnstore index](../../relational-databases/indexes/columnstore-indexes-overview.md). Run this statement to generate a quick summary on the table.
+3. Within the database is a `dbo.nyctaxi_sample` table that contains the data set. The table has been optimized for set-based calculations with the addition of a [columnstore index](../../relational-databases/indexes/columnstore-indexes-overview.md). Run this statement to generate a quick summary on the table.
 
 ```sql
 SELECT DISTINCT [passenger_count]
@@ -102,6 +102,7 @@ The database contains 1.7 million rows.
 GROUP BY [passenger_count]
 ORDER BY AvgFares DESC
 ````
+
 Results should be similar to those showing in the following screenshot.
 
 ![Table summary information](media/nyctaxidatatablesummary.png "Query results")