Skip to content

Latest commit

 

History

History
 
 

README.md

Predictive Maintenance Template with SQL Server ML Services


  • Introduction
    • System Requirements
    • Workflow Automation
  • Step 1: Data Preparation
  • Step 2: Feature Engineering
  • Step 3A: Train/test Regression Models
  • Step 3B: Train/test Binary Classification Models
  • Step 3C: Train/test Multi-class Classification Models
  • Step 4: Production Scoring

INTRODUCTION


This template demonstrates how to build and deploy predictive maintenance models to predict asset failures using [SQL Server ML Services]https://docs.microsoft.com/en-us/sql/advanced-analytics/what-is-sql-server-machine-learning).

Three modeling solutions are provided in this template to accomplish the following tasks:

  • Regression: Predict the Remaining Useful Life (RUL) of an asset, or Time to Failure (TTF).
  • Binary classification: Predict whether an asset will fail within a certain time frame (e.g. days).
  • Multi-class classification: Predict whether an asset will fail in any one of multiple time windows: For example, asset fails in window [1, w0] days; asset fails in the window [w0+1,w1] days; asset will not fail within w1 days.

The time units mentioned above can be replaced by working hours, cycles, mileage, transactions, etc. based on the actual scenario.

This template uses the example of simulated aircraft engine run-to-failure events to demonstrate the predictive maintenance modeling process. The implicit assumption of modeling data as done below is that the asset of interest has a progressing degradation pattern, which is reflected in the asset's sensor measurements. By examining the asset's sensor values over time, the machine learning algorithm can learn the relationship between the sensor values and changes in sensor values to the historical failures in order to predict failures in the future. We suggest examining the data format and going through all three steps of the template before replacing the data with your own.

The template is divided into three separate steps, and each step is implemented in SQL Stored Procedures. The R development code was directly wrapped within stored procedures.

The SQL procedures can be executed in SQL Server environment (such as SQL Server Management Studio) and invoked by any applications. We demonstrated the end-to-end execution using a PowerShell script.

SYSTEM REQUIREMENTS


To run the scripts, you must prepare the following environment:

  • An instance of SQL Server 2016 (Enterprise or Developer edition) CTP 3 or later, with SQL Server ML Services installed and configured
  • A SQL login and password. The SQL login must have permissions to execute R scripts
  • A database on the instance in which the login has been granted the permission to create and execute stored procedures
  • The "plyr" and "zoo" R packages are needed. To install packages into the ML Services environment please read here: https://docs.microsoft.com/en-us/sql/advanced-analytics/r/install-additional-r-packages-on-sql-server

WORKFLOW AUTOMATION


The following graph shows the overall work flow. The blue block represents each step of the PM template. Each step will interact with SQL server, either perform SQL table operations or invoking R through stored procedures.

Architect of E2E work flow

The end-to-end workflow is fully automated by using a PowerShell script. To learn how to run the script, open a PowerShell command prompt, and type:

Get-Help .\SQLR-Predictive-Maintenance.ps1 

To train and evaluate the models, you may run it as:

SQLR-Predictive-Maintenance.ps1 -ServerName [SQL Server instance name] -DBName [database name] 

To score the production data, you may specify the -Score option:

SQLR-Predictive-Maintenance.ps1 -ServerName [SQL Server instance name] -DBName [database name] -Score

The following chart shows the workflow. In the chart, the blue parallelogram represents the action to take. Before each step, the user will have the choice to continue, skip or exit.

Work flow

STEP 1: DATA PREPARATION


The first step is to create the tables in the specified database that are used for the training data, testing data, and trained models. The training data, testing data, and "ground truth" dataset provided as .CSV files in the current working directory. The corresponding tables in SQL Server are populated by using the bcp utility to bulk load the data from the files.

After the raw data is uploaded into SQL tables, we will label the train and test data.

  • For Regression models, adding column RUL to represent how many more cycles an in-service engine will last before it fails.
  • For Binary classification models, adding column label1 to represent whether this engine is going to fail within w1 cycles.
  • For Multi-class classification models, adding column label2 to represent whether this engine is going to fail within the window [1, w0] cycles or to fail within the window [w0+1, w1] cycles, or it will not fail within w1 cycles. We used the following values in the template: w1 = 30, w0 = 15.

The files used in this step are:

File Description
..\Data\PM_train.csv Raw training data, aircraft engine run-to-failure data
..\Data\PM_test.csv Raw testing data, aircraft engine operating data without failure events recorded
..\Data\PM_truth.csv Ground truth data, containing the information for each engine in testing data
DataProcessing\create_table.sql T-SQL script to create SQL table for train, test, truth and model tables
DataProcessing\data_labeling.sql T-SQL script for data labeling

Output of this step: Six tables are created in the SQL Server database:

  • PM_train: Table for training data
  • PM_test: Table for test data
  • PM_truth: Table containing "ground truth" data
  • PM_models: Table for storing trained models
  • Labeled_train_data: Train data table with labels added
  • Labeled_test_data: Test data table with labels added

STEP 2: FEATURE ENGINEERING


This step focuses on data processing and feature engineering. The following tasks are performed in this step:

  • Feature engineering: Create aggregated features (such as rolling means and standard deviations), perform feature normalization.

The files related to this step are:

File Description
DataProcessing\feature_engineering.sql Stored procedure for feature engineering

Output of this step: Four tables are created in the SQL Server database:

  • train_Features: Training data table with added features
  • test_Features: Testing data table with added features
  • train_Features_Normalized: Normalized training data table with added features
  • test_Features_Normalized: Normalized testing data table with added features

STEP 3A: TRAIN AND EVALUATE REGRESSION MODELS


In this step, features are selected based on correlation, regression models are trained and evaluated, and the trained models are saved in the database. Scores and performance metrics that result from evaluating the model against the test data are also saved in the database. The regression models are created using these machine learning methods:

  • Decision Forest Regression
  • Boosted Decision Tree Regression
  • Poisson Regression
  • Neural Network Regression

The files related to this step are:

File Description
Regression\train_regression_model.sql Train Regression models
Regression\test_regression_models.sql Evaluate models on test data and save scores and the performance metrics into SQL table

Output of this step:

  • PM_models table: Four rows will be added with one column for model_name and another one to record the serialized trained model.
    • regression_rf: Name of model using Decision Forest Regression
    • regression_btree: Name of model using Boosted Decision Tree Regression
    • regression_glm: Name of model using Poisson Regression
    • regression_nn: Name of model using Neural Network Regression
  • Regression_prediction: Predictions for test data for each model
  • Regression_metrics: Metrics measured for each model

STEP 3B: TRAIN AND EVALUATE BINARY CLASSIFICATION MODELS


In this step, features are selected based on correlation, multiple binary classification models are trained and evaluated. The models are saved in the database after training. The scores and performance metrics from evaluating the trained models on test data are saved in the database as well. Models are trained using these four machine learning methods:

  • Two-Class Logistic Regression
  • Two-Class Boosted Decision Tree
  • Two-Class Decision Forest
  • Two-Class Neural Network

The files related to this step are:

File Description
BinaryClassification\train_binaryclass_model.sql Train Binary Classification model
BinaryClassification\test_binaryclass_models.sql Test models and save the performance metrics into SQL tables

Output of this step:

  • PM_models table: Four rows will be added, with one column for model_name and another one to record the serialized trained model.
    • binaryclass_rf: Name of model using Two-Class Logistic Regression
    • binaryclass_btree: Name of model using Two-Class Boosted Decision Tree
    • binaryclass_logit: Name of model using Two-Class Decision Forest
    • binaryclass_nn: Name of model using for Two-Class Neural Network
  • Binaryclass_prediction: Predictions for test data for each model
  • Binaryclass_metrics: Metrics measured for each model

STEP 3C: TRAIN/TEST MULTI-CLASS CLASSIFICATION MODELS


In this step, features are selected based on correlation, multiple multi-class classification models are trained and evaluated. The models are saved in the database after training. The scores and performance metrics from evaluating the trained models on test data are saved in the database as well.

In this step, we train and evaluate two multi-class classification models, using these algorithms:

  • Multi-class Decision Forest
  • Multi-class Neural Network

We also train and evaluate two ordinal regression models using the Two-Class Logistic Regression and Two-Class Neural Network algorithms as the base model:

  • Ordinal regression models using Two-Class Logistic Regression
  • Ordinal regression models using Two-Class Neural Network

The files related to this step are:

File Description
MultiClassification\train_multiclass_model.sql Train Multi-class Classification models
MultiClassification\test_multiclass_models.sql Evaluate models on test data and save the scores and performance metrics into SQL tables

Output of this step:

  • PM_models table: Four rows will be added with one column for model_name and another one to record the serialized trained model.
    • multiclass_rf: Name of model using Multiclass Decision Forest
    • multiclass_btree: Name of model using Ordinal regression on Two-Class Logistic Regression
    • multiclass_nn: Name of model using Multiclass Neural Network
    • multiclass_mn: Name of model using Ordinal regression on Two-Class Neural Network
  • Multiclass_prediction: Predictions for test data for each model
  • Multiclass_metrics: Metrics measured for each model

STEP 4: Production Scoring

In this step, we show how we call the stored procedures to make predictions on new data. For demo purpose, the data used for scoring is taken from testing dataset with engine id as 2 and 3.

Step1: Call the data preparation SQL script: If data is local CSV file, do the follwing:

  • create PM_Score table in SQL Server using DataProcessing\create_table_score.sql
  • Upload the data to PM_Score table using bcp utility

Output: SQL table PM_score, the raw data for scoring

Step2. Call the feature engineering SQL script: DataProcessing\feature_engineering_scoring.sql

Output: SQL table score_Features_Normalized, the data with new features and normalized

Step 3a. Call the Regression model SQL script: Regression\score_regression_model.sql

Output: SQL table Regression_score_[model_name], scoring result for regression model

Step 3b. Call the SQL script for Binary classification model: BinaryClassification\score_binaryclass_model.sql

Output: SQL table Binaryclass_score_[model_name], scoring result for binaryclassification model

Step 3c. Call the SQL script for Multiclass classification model: MultiClassification\score_multiclass_model.sql

Output: SQL table Multiclass_score_[model_name], scoring result for multiclass classification model