- Introduction
- System Requirements
- Workflow Automation
- Step 1: Data Preparation
- Step 2: Feature Engineering
- Step 3A: Train/test Regression Models
- Step 3B: Train/test Binary Classification Models
- Step 3C: Train/test Multi-class Classification Models
- Step 4: Production Scoring
This template demonstrates how to build and deploy predictive maintenance models to predict asset failures using [SQL Server ML Services]https://docs.microsoft.com/en-us/sql/advanced-analytics/what-is-sql-server-machine-learning).
Three modeling solutions are provided in this template to accomplish the following tasks:
- Regression: Predict the Remaining Useful Life (RUL) of an asset, or Time to Failure (TTF).
- Binary classification: Predict whether an asset will fail within a certain time frame (e.g. days).
- Multi-class classification: Predict whether an asset will fail in any one of multiple time windows: For example, asset fails in window [1, w0] days; asset fails in the window [w0+1,w1] days; asset will not fail within w1 days.
The time units mentioned above can be replaced by working hours, cycles, mileage, transactions, etc. based on the actual scenario.
This template uses the example of simulated aircraft engine run-to-failure events to demonstrate the predictive maintenance modeling process. The implicit assumption of modeling data as done below is that the asset of interest has a progressing degradation pattern, which is reflected in the asset's sensor measurements. By examining the asset's sensor values over time, the machine learning algorithm can learn the relationship between the sensor values and changes in sensor values to the historical failures in order to predict failures in the future. We suggest examining the data format and going through all three steps of the template before replacing the data with your own.
The template is divided into three separate steps, and each step is implemented in SQL Stored Procedures. The R development code was directly wrapped within stored procedures.
The SQL procedures can be executed in SQL Server environment (such as SQL Server Management Studio) and invoked by any applications. We demonstrated the end-to-end execution using a PowerShell script.
To run the scripts, you must prepare the following environment:
- An instance of SQL Server 2016 (Enterprise or Developer edition) CTP 3 or later, with SQL Server ML Services installed and configured
- A SQL login and password. The SQL login must have permissions to execute R scripts
- A database on the instance in which the login has been granted the permission to create and execute stored procedures
- The "plyr" and "zoo" R packages are needed. To install packages into the ML Services environment please read here: https://docs.microsoft.com/en-us/sql/advanced-analytics/r/install-additional-r-packages-on-sql-server
The following graph shows the overall work flow. The blue block represents each step of the PM template. Each step will interact with SQL server, either perform SQL table operations or invoking R through stored procedures.
The end-to-end workflow is fully automated by using a PowerShell script. To learn how to run the script, open a PowerShell command prompt, and type:
Get-Help .\SQLR-Predictive-Maintenance.ps1
To train and evaluate the models, you may run it as:
SQLR-Predictive-Maintenance.ps1 -ServerName [SQL Server instance name] -DBName [database name]
To score the production data, you may specify the -Score option:
SQLR-Predictive-Maintenance.ps1 -ServerName [SQL Server instance name] -DBName [database name] -Score
The following chart shows the workflow. In the chart, the blue parallelogram represents the action to take. Before each step, the user will have the choice to continue, skip or exit.
The first step is to create the tables in the specified database that are used for the training data, testing data, and trained models. The training data, testing data, and "ground truth" dataset provided as .CSV files in the current working directory. The corresponding tables in SQL Server are populated by using the bcp utility to bulk load the data from the files.
After the raw data is uploaded into SQL tables, we will label the train and test data.
- For Regression models, adding column RUL to represent how many more cycles an in-service engine will last before it fails.
- For Binary classification models, adding column label1 to represent whether this engine is going to fail within w1 cycles.
- For Multi-class classification models, adding column label2 to represent whether this engine is going to fail within the window [1, w0] cycles or to fail within the window [w0+1, w1] cycles, or it will not fail within w1 cycles. We used the following values in the template: w1 = 30, w0 = 15.
The files used in this step are:
| File | Description |
|---|---|
| ..\Data\PM_train.csv | Raw training data, aircraft engine run-to-failure data |
| ..\Data\PM_test.csv | Raw testing data, aircraft engine operating data without failure events recorded |
| ..\Data\PM_truth.csv | Ground truth data, containing the information for each engine in testing data |
| DataProcessing\create_table.sql | T-SQL script to create SQL table for train, test, truth and model tables |
| DataProcessing\data_labeling.sql | T-SQL script for data labeling |
Output of this step: Six tables are created in the SQL Server database:
- PM_train: Table for training data
- PM_test: Table for test data
- PM_truth: Table containing "ground truth" data
- PM_models: Table for storing trained models
- Labeled_train_data: Train data table with labels added
- Labeled_test_data: Test data table with labels added
This step focuses on data processing and feature engineering. The following tasks are performed in this step:
- Feature engineering: Create aggregated features (such as rolling means and standard deviations), perform feature normalization.
The files related to this step are:
| File | Description |
|---|---|
| DataProcessing\feature_engineering.sql | Stored procedure for feature engineering |
Output of this step: Four tables are created in the SQL Server database:
- train_Features: Training data table with added features
- test_Features: Testing data table with added features
- train_Features_Normalized: Normalized training data table with added features
- test_Features_Normalized: Normalized testing data table with added features
In this step, features are selected based on correlation, regression models are trained and evaluated, and the trained models are saved in the database. Scores and performance metrics that result from evaluating the model against the test data are also saved in the database. The regression models are created using these machine learning methods:
- Decision Forest Regression
- Boosted Decision Tree Regression
- Poisson Regression
- Neural Network Regression
The files related to this step are:
| File | Description |
|---|---|
| Regression\train_regression_model.sql | Train Regression models |
| Regression\test_regression_models.sql | Evaluate models on test data and save scores and the performance metrics into SQL table |
Output of this step:
- PM_models table: Four rows will be added with one column for model_name and another one to record the serialized trained model.
- regression_rf: Name of model using Decision Forest Regression
- regression_btree: Name of model using Boosted Decision Tree Regression
- regression_glm: Name of model using Poisson Regression
- regression_nn: Name of model using Neural Network Regression
- Regression_prediction: Predictions for test data for each model
- Regression_metrics: Metrics measured for each model
In this step, features are selected based on correlation, multiple binary classification models are trained and evaluated. The models are saved in the database after training. The scores and performance metrics from evaluating the trained models on test data are saved in the database as well. Models are trained using these four machine learning methods:
- Two-Class Logistic Regression
- Two-Class Boosted Decision Tree
- Two-Class Decision Forest
- Two-Class Neural Network
The files related to this step are:
| File | Description |
|---|---|
| BinaryClassification\train_binaryclass_model.sql | Train Binary Classification model |
| BinaryClassification\test_binaryclass_models.sql | Test models and save the performance metrics into SQL tables |
Output of this step:
- PM_models table: Four rows will be added, with one column for model_name and another one to record the serialized trained model.
- binaryclass_rf: Name of model using Two-Class Logistic Regression
- binaryclass_btree: Name of model using Two-Class Boosted Decision Tree
- binaryclass_logit: Name of model using Two-Class Decision Forest
- binaryclass_nn: Name of model using for Two-Class Neural Network
- Binaryclass_prediction: Predictions for test data for each model
- Binaryclass_metrics: Metrics measured for each model
In this step, features are selected based on correlation, multiple multi-class classification models are trained and evaluated. The models are saved in the database after training. The scores and performance metrics from evaluating the trained models on test data are saved in the database as well.
In this step, we train and evaluate two multi-class classification models, using these algorithms:
- Multi-class Decision Forest
- Multi-class Neural Network
We also train and evaluate two ordinal regression models using the Two-Class Logistic Regression and Two-Class Neural Network algorithms as the base model:
- Ordinal regression models using Two-Class Logistic Regression
- Ordinal regression models using Two-Class Neural Network
The files related to this step are:
| File | Description |
|---|---|
| MultiClassification\train_multiclass_model.sql | Train Multi-class Classification models |
| MultiClassification\test_multiclass_models.sql | Evaluate models on test data and save the scores and performance metrics into SQL tables |
Output of this step:
- PM_models table: Four rows will be added with one column for model_name and another one to record the serialized trained model.
- multiclass_rf: Name of model using Multiclass Decision Forest
- multiclass_btree: Name of model using Ordinal regression on Two-Class Logistic Regression
- multiclass_nn: Name of model using Multiclass Neural Network
- multiclass_mn: Name of model using Ordinal regression on Two-Class Neural Network
- Multiclass_prediction: Predictions for test data for each model
- Multiclass_metrics: Metrics measured for each model
In this step, we show how we call the stored procedures to make predictions on new data. For demo purpose, the data used for scoring is taken from testing dataset with engine id as 2 and 3.
Step1: Call the data preparation SQL script: If data is local CSV file, do the follwing:
- create PM_Score table in SQL Server using DataProcessing\create_table_score.sql
- Upload the data to PM_Score table using bcp utility
Output: SQL table PM_score, the raw data for scoring
Step2. Call the feature engineering SQL script: DataProcessing\feature_engineering_scoring.sql
Output: SQL table score_Features_Normalized, the data with new features and normalized
Step 3a. Call the Regression model SQL script: Regression\score_regression_model.sql
Output: SQL table Regression_score_[model_name], scoring result for regression model
Step 3b. Call the SQL script for Binary classification model: BinaryClassification\score_binaryclass_model.sql
Output: SQL table Binaryclass_score_[model_name], scoring result for binaryclassification model
Step 3c. Call the SQL script for Multiclass classification model: MultiClassification\score_multiclass_model.sql
Output: SQL table Multiclass_score_[model_name], scoring result for multiclass classification model

