SQLR

Predictive Maintenance Template with SQL Server ML Services

Introduction
- System Requirements
- Workflow Automation
Step 1: Data Preparation
Step 2: Feature Engineering
Step 3A: Train/test Regression Models
Step 3B: Train/test Binary Classification Models
Step 3C: Train/test Multi-class Classification Models
Step 4: Production Scoring

INTRODUCTION

This template demonstrates how to build and deploy predictive maintenance models to predict asset failures using [SQL Server ML Services]https://docs.microsoft.com/en-us/sql/advanced-analytics/what-is-sql-server-machine-learning).

Three modeling solutions are provided in this template to accomplish the following tasks:

Regression: Predict the Remaining Useful Life (RUL) of an asset, or Time to Failure (TTF).
Binary classification: Predict whether an asset will fail within a certain time frame (e.g. days).
Multi-class classification: Predict whether an asset will fail in any one of multiple time windows: For example, asset fails in window [1, w0] days; asset fails in the window [w0+1,w1] days; asset will not fail within w1 days.

The time units mentioned above can be replaced by working hours, cycles, mileage, transactions, etc. based on the actual scenario.

This template uses the example of simulated aircraft engine run-to-failure events to demonstrate the predictive maintenance modeling process. The implicit assumption of modeling data as done below is that the asset of interest has a progressing degradation pattern, which is reflected in the asset's sensor measurements. By examining the asset's sensor values over time, the machine learning algorithm can learn the relationship between the sensor values and changes in sensor values to the historical failures in order to predict failures in the future. We suggest examining the data format and going through all three steps of the template before replacing the data with your own.

The template is divided into three separate steps, and each step is implemented in SQL Stored Procedures. The R development code was directly wrapped within stored procedures.

The SQL procedures can be executed in SQL Server environment (such as SQL Server Management Studio) and invoked by any applications. We demonstrated the end-to-end execution using a PowerShell script.

SYSTEM REQUIREMENTS

To run the scripts, you must prepare the following environment:

An instance of SQL Server 2016 (Enterprise or Developer edition) CTP 3 or later, with SQL Server ML Services installed and configured
A SQL login and password. The SQL login must have permissions to execute R scripts
A database on the instance in which the login has been granted the permission to create and execute stored procedures
The "plyr" and "zoo" R packages are needed. To install packages into the ML Services environment please read here: https://docs.microsoft.com/en-us/sql/advanced-analytics/r/install-additional-r-packages-on-sql-server

WORKFLOW AUTOMATION

The following graph shows the overall work flow. The blue block represents each step of the PM template. Each step will interact with SQL server, either perform SQL table operations or invoking R through stored procedures.

The end-to-end workflow is fully automated by using a PowerShell script. To learn how to run the script, open a PowerShell command prompt, and type:

Get-Help .\SQLR-Predictive-Maintenance.ps1

To train and evaluate the models, you may run it as:

SQLR-Predictive-Maintenance.ps1 -ServerName [SQL Server instance name] -DBName [database name]

To score the production data, you may specify the -Score option:

SQLR-Predictive-Maintenance.ps1 -ServerName [SQL Server instance name] -DBName [database name] -Score

The following chart shows the workflow. In the chart, the blue parallelogram represents the action to take. Before each step, the user will have the choice to continue, skip or exit.

STEP 1: DATA PREPARATION

The first step is to create the tables in the specified database that are used for the training data, testing data, and trained models. The training data, testing data, and "ground truth" dataset provided as .CSV files in the current working directory. The corresponding tables in SQL Server are populated by using the bcp utility to bulk load the data from the files.

After the raw data is uploaded into SQL tables, we will label the train and test data.

For Regression models, adding column RUL to represent how many more cycles an in-service engine will last before it fails.
For Binary classification models, adding column label1 to represent whether this engine is going to fail within w1 cycles.
For Multi-class classification models, adding column label2 to represent whether this engine is going to fail within the window [1, w0] cycles or to fail within the window [w0+1, w1] cycles, or it will not fail within w1 cycles. We used the following values in the template: w1 = 30, w0 = 15.

The files used in this step are:

File	Description
..\Data\PM_train.csv	Raw training data, aircraft engine run-to-failure data
..\Data\PM_test.csv	Raw testing data, aircraft engine operating data without failure events recorded
..\Data\PM_truth.csv	Ground truth data, containing the information for each engine in testing data
DataProcessing\create_table.sql	T-SQL script to create SQL table for train, test, truth and model tables
DataProcessing\data_labeling.sql	T-SQL script for data labeling

Output of this step: Six tables are created in the SQL Server database:

PM_train: Table for training data
PM_test: Table for test data
PM_truth: Table containing "ground truth" data
PM_models: Table for storing trained models
Labeled_train_data: Train data table with labels added
Labeled_test_data: Test data table with labels added

STEP 2: FEATURE ENGINEERING

This step focuses on data processing and feature engineering. The following tasks are performed in this step:

Feature engineering: Create aggregated features (such as rolling means and standard deviations), perform feature normalization.

The files related to this step are:

File	Description
DataProcessing\feature_engineering.sql	Stored procedure for feature engineering

Output of this step: Four tables are created in the SQL Server database:

train_Features: Training data table with added features
test_Features: Testing data table with added features
train_Features_Normalized: Normalized training data table with added features
test_Features_Normalized: Normalized testing data table with added features

STEP 3A: TRAIN AND EVALUATE REGRESSION MODELS

In this step, features are selected based on correlation, regression models are trained and evaluated, and the trained models are saved in the database. Scores and performance metrics that result from evaluating the model against the test data are also saved in the database. The regression models are created using these machine learning methods:

Decision Forest Regression
Boosted Decision Tree Regression
Poisson Regression
Neural Network Regression

The files related to this step are:

File	Description
Regression\train_regression_model.sql	Train Regression models
Regression\test_regression_models.sql	Evaluate models on test data and save scores and the performance metrics into SQL table

Output of this step:

PM_models table: Four rows will be added with one column for model_name and another one to record the serialized trained model.
- regression_rf: Name of model using Decision Forest Regression
- regression_btree: Name of model using Boosted Decision Tree Regression
- regression_glm: Name of model using Poisson Regression
- regression_nn: Name of model using Neural Network Regression
Regression_prediction: Predictions for test data for each model
Regression_metrics: Metrics measured for each model

STEP 3B: TRAIN AND EVALUATE BINARY CLASSIFICATION MODELS

In this step, features are selected based on correlation, multiple binary classification models are trained and evaluated. The models are saved in the database after training. The scores and performance metrics from evaluating the trained models on test data are saved in the database as well. Models are trained using these four machine learning methods:

Two-Class Logistic Regression
Two-Class Boosted Decision Tree
Two-Class Decision Forest
Two-Class Neural Network

The files related to this step are:

File	Description
BinaryClassification\train_binaryclass_model.sql	Train Binary Classification model
BinaryClassification\test_binaryclass_models.sql	Test models and save the performance metrics into SQL tables

Output of this step:

PM_models table: Four rows will be added, with one column for model_name and another one to record the serialized trained model.
- binaryclass_rf: Name of model using Two-Class Logistic Regression
- binaryclass_btree: Name of model using Two-Class Boosted Decision Tree
- binaryclass_logit: Name of model using Two-Class Decision Forest
- binaryclass_nn: Name of model using for Two-Class Neural Network
Binaryclass_prediction: Predictions for test data for each model
Binaryclass_metrics: Metrics measured for each model

STEP 3C: TRAIN/TEST MULTI-CLASS CLASSIFICATION MODELS

In this step, features are selected based on correlation, multiple multi-class classification models are trained and evaluated. The models are saved in the database after training. The scores and performance metrics from evaluating the trained models on test data are saved in the database as well.

In this step, we train and evaluate two multi-class classification models, using these algorithms:

Multi-class Decision Forest
Multi-class Neural Network

We also train and evaluate two ordinal regression models using the Two-Class Logistic Regression and Two-Class Neural Network algorithms as the base model:

Ordinal regression models using Two-Class Logistic Regression
Ordinal regression models using Two-Class Neural Network

The files related to this step are:

File	Description
MultiClassification\train_multiclass_model.sql	Train Multi-class Classification models
MultiClassification\test_multiclass_models.sql	Evaluate models on test data and save the scores and performance metrics into SQL tables

Output of this step:

PM_models table: Four rows will be added with one column for model_name and another one to record the serialized trained model.
- multiclass_rf: Name of model using Multiclass Decision Forest
- multiclass_btree: Name of model using Ordinal regression on Two-Class Logistic Regression
- multiclass_nn: Name of model using Multiclass Neural Network
- multiclass_mn: Name of model using Ordinal regression on Two-Class Neural Network
Multiclass_prediction: Predictions for test data for each model
Multiclass_metrics: Metrics measured for each model

STEP 4: Production Scoring

In this step, we show how we call the stored procedures to make predictions on new data. For demo purpose, the data used for scoring is taken from testing dataset with engine id as 2 and 3.

Step1: Call the data preparation SQL script: If data is local CSV file, do the follwing:

create PM_Score table in SQL Server using DataProcessing\create_table_score.sql
Upload the data to PM_Score table using bcp utility

Output: SQL table PM_score, the raw data for scoring

Step2. Call the feature engineering SQL script: DataProcessing\feature_engineering_scoring.sql

Output: SQL table score_Features_Normalized, the data with new features and normalized

Step 3a. Call the Regression model SQL script: Regression\score_regression_model.sql

Output: SQL table Regression_score_[model_name], scoring result for regression model

Step 3b. Call the SQL script for Binary classification model: BinaryClassification\score_binaryclass_model.sql

Output: SQL table Binaryclass_score_[model_name], scoring result for binaryclassification model

Step 3c. Call the SQL script for Multiclass classification model: MultiClassification\score_multiclass_model.sql

Output: SQL table Multiclass_score_[model_name], scoring result for multiclass classification model

Name		Name	Last commit message	Last commit date
parent directory ..
BinaryClassification		BinaryClassification
DataProcessing		DataProcessing
MultiClassification		MultiClassification
Regression		Regression
PredictiveMaintenanceHelp.url		PredictiveMaintenanceHelp.url
PredictiveMaintenanceSetup.ps1		PredictiveMaintenanceSetup.ps1
README.md		README.md
SQLR-Predictive-Maintenance.ps1		SQLR-Predictive-Maintenance.ps1
workflow_architect.png		workflow_architect.png
workflow_automation.png		workflow_automation.png

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

Predictive Maintenance Template with SQL Server ML Services

INTRODUCTION

SYSTEM REQUIREMENTS

WORKFLOW AUTOMATION

STEP 1: DATA PREPARATION

STEP 2: FEATURE ENGINEERING

STEP 3A: TRAIN AND EVALUATE REGRESSION MODELS

STEP 3B: TRAIN AND EVALUATE BINARY CLASSIFICATION MODELS

STEP 3C: TRAIN/TEST MULTI-CLASS CLASSIFICATION MODELS

STEP 4: Production Scoring

FilesExpand file tree

SQLR

Directory actions

More options

Directory actions

More options

Latest commit

History

SQLR

Folders and files

parent directory

README.md

Predictive Maintenance Template with SQL Server ML Services

INTRODUCTION

SYSTEM REQUIREMENTS

WORKFLOW AUTOMATION

STEP 1: DATA PREPARATION

STEP 2: FEATURE ENGINEERING

STEP 3A: TRAIN AND EVALUATE REGRESSION MODELS

STEP 3B: TRAIN AND EVALUATE BINARY CLASSIFICATION MODELS

STEP 3C: TRAIN/TEST MULTI-CLASS CLASSIFICATION MODELS

STEP 4: Production Scoring