This project explores, and provides a template for, taking a model from MLflow to deployment on AWS SageMaker. It uses a Kinematic Movement Dataset from Kaggle, which is used to predict activity from a phone's sensors.
- [Project Overview]
- [Installation]
- [Usage]
- [Project Structure]
- [Acknowledgments]
This project is an end-to-end pipeline from model training to deployment. It achieves the following objectives:
- Training a Linear Regression model, an XGBoost model, and a stacked model that combines the linear regressor and the XGBoost model.
- Testing and evaluating these models.
- Deploying the model with Amazon SageMaker.
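The stacked model can be illustrated with a small, dependency-free sketch. This is not the project's code: it assumes the linear regressor and the boosted model are both base models whose out-of-fold predictions are combined by a linear meta-learner, and a one-split regression tree (decision stump) stands in for XGBoost so the example runs with only the standard library. The function names are hypothetical.

```python
from statistics import fmean
from typing import Callable, List, Sequence, Tuple

Predictor = Callable[[float], float]


def fit_linear(xs: Sequence[float], ys: Sequence[float]) -> Predictor:
    """Ordinary least squares on one feature: y ~ a + b * x."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    a = my - b * mx
    return lambda x: a + b * x


def fit_stump(xs: Sequence[float], ys: Sequence[float]) -> Predictor:
    """One-split regression tree: a tiny stand-in for XGBoost, NOT XGBoost itself."""
    pairs = sorted(zip(xs, ys))
    best: Tuple[float, float, float, float] = (float("inf"), pairs[0][0], fmean(ys), fmean(ys))
    for i in range(1, len(pairs)):
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        ml, mr = fmean(left), fmean(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if sse < best[0]:
            best = (sse, (pairs[i - 1][0] + pairs[i][0]) / 2, ml, mr)
    _, thr, ml, mr = best
    return lambda x: ml if x < thr else mr


def fit_stacked(xs: Sequence[float], ys: Sequence[float], k: int = 5) -> Predictor:
    """Stack a linear model and a tree model under a linear meta-learner."""
    n = len(xs)
    oof_lin: List[float] = [0.0] * n
    oof_tree: List[float] = [0.0] * n
    # 1) Out-of-fold predictions: each point is predicted by base models
    #    that did not see it during fitting.
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        lin, tree = fit_linear(train_x, train_y), fit_stump(train_x, train_y)
        for i in test_idx:
            oof_lin[i], oof_tree[i] = lin(xs[i]), tree(xs[i])
    # 2) Meta-learner: OLS of y on the two OOF columns via centered 2x2 normal equations.
    m1, m2, my = fmean(oof_lin), fmean(oof_tree), fmean(ys)
    d1 = [p - m1 for p in oof_lin]
    d2 = [p - m2 for p in oof_tree]
    dy = [y - my for y in ys]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        w1, w2 = 1.0, 0.0  # degenerate case: fall back to the linear base model
    else:
        w1 = (s22 * s1y - s12 * s2y) / det
        w2 = (s11 * s2y - s12 * s1y) / det
    bias = my - w1 * m1 - w2 * m2
    # 3) Refit the base models on all data; the meta-learner combines their outputs.
    lin, tree = fit_linear(xs, ys), fit_stump(xs, ys)
    return lambda x: bias + w1 * lin(x) + w2 * tree(x)
```

With real libraries, scikit-learn's `StackingRegressor` does the out-of-fold step itself, e.g. with `LinearRegression()` and `xgboost.XGBRegressor()` as `estimators` and a `final_estimator` for the meta-learner.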
- Python
- Pandas
- scikit-learn
- XGBoost
- Docker
- AWS - SageMaker
- AWS - ECR
- AWS - S3
This project can be cloned, but it is highly recommended to use it as a guideline. The steps are listed under "Running the Application".
- Step 1 - Create a virtual environment.
- Step 2 - Install all necessary libraries.
- Step 3 - Run the training pipeline with `python run_training.py`.
- Step 4 - View the MLflow dashboard on port 5000 with `mlflow ui --port 5000`.
- Step 5 - Upload the model artifact to S3.
- Step 6 - Create the Dockerfile.
- Step 7 - Push the Docker image to ECR.
- Step 8 - Create the SageMaker endpoint.
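Step 5 can be sketched as follows, assuming the artifact is uploaded as a single gzipped tarball: SageMaker's `ModelDataUrl` points at a `.tar.gz` in S3, which it extracts into `/opt/ml/model` inside the container. The sketch assumes the MLflow model directory (the one containing the `MLmodel` file) is what gets packaged; the function names, bucket, and key are hypothetical, and the upload uses boto3's `upload_file`.

```python
import tarfile
from pathlib import Path


def package_model(model_dir: str, out_path: str = "model.tar.gz") -> str:
    """Tar and gzip the contents of model_dir, placing its files at the archive root."""
    src = Path(model_dir)
    with tarfile.open(out_path, "w:gz") as tar:
        for item in sorted(src.iterdir()):
            # arcname keeps paths relative, so SageMaker extracts them directly
            # into /opt/ml/model instead of into a nested folder.
            tar.add(item, arcname=item.name)
    return out_path


def upload_to_s3(archive: str, bucket: str, key: str) -> str:
    """Upload the archive with boto3 and return its S3 URI (requires AWS credentials)."""
    import boto3  # imported lazily so packaging works without boto3 installed

    boto3.client("s3").upload_file(archive, bucket, key)
    return f"s3://{bucket}/{key}"
```

The returned `s3://...` URI is the value a SageMaker model definition would use as its `ModelDataUrl`.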
Make sure your AWS credentials are set up with `aws configure`.
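With credentials configured, a deployed endpoint can be queried from Python. A sketch, assuming the container serves MLflow's scoring server, which accepts JSON in the `dataframe_split` format (MLflow 2.x); the endpoint name and column names passed in are hypothetical. The call itself uses the `sagemaker-runtime` client's `invoke_endpoint`.

```python
import json
from typing import Any, Dict, Sequence


def build_payload(columns: Sequence[str], rows: Sequence[Sequence[float]]) -> str:
    """Serialize rows in MLflow's "dataframe_split" JSON input format."""
    for row in rows:
        if len(row) != len(columns):
            raise ValueError(f"expected {len(columns)} values, got {len(row)}")
    body: Dict[str, Any] = {
        "dataframe_split": {"columns": list(columns), "data": [list(r) for r in rows]}
    }
    return json.dumps(body)


def predict(endpoint_name: str, payload: str) -> Any:
    """Send the payload to a SageMaker endpoint and decode the JSON response."""
    import boto3  # lazy import: only needed when actually calling AWS

    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=payload,
    )
    return json.loads(response["Body"].read())
```

Validating the row width before sending keeps malformed requests from reaching the endpoint at all.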
project-root/
│
├── check_kaggle.py # Accesses the dataset from Kaggle.
├── data_ingestion.py # Facilitates data ingestion using check_kaggle.py.
├── data_cleaning.py # Cleans the data after ingestion.
├── log_df.py # Logs the cleaned DataFrame to MLflow.
├── model_tuning.py # Hyperparameter search for the XGBoost model.
├── model_stacking.py # Trains all the models and evaluates them.
├── run_training.py # Runs the training pipeline.
├── run_inference.py # Runs the inference pipeline.
├── deploy_sagemaker_initialrun.py # Runs the first deployment.
├── deploy_sagemaker.py # Runs later deployments; replaces the model in the existing endpoint.
├── dockerfile # Dockerfile for the selected model.
└── README.md # Details about the project
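The split between `deploy_sagemaker_initialrun.py` and `deploy_sagemaker.py` mirrors how the SageMaker API works: a new model and endpoint configuration are registered each time, then the endpoint is either created (first run) or updated to the new configuration. The sketch below assumes the scripts use boto3 directly; the function name, `"AllTraffic"` variant name, and `ml.m5.large` instance type are illustrative assumptions. `image_uri` would be the ECR image and `model_data_url` the S3 artifact URI.

```python
import time
from typing import Any


def deploy(sm: Any, *, endpoint: str, image_uri: str, model_data_url: str,
           role_arn: str, first_run: bool, instance_type: str = "ml.m5.large") -> str:
    """Register a model and endpoint config, then create or update the endpoint.

    `sm` is a boto3 SageMaker client, e.g. boto3.client("sagemaker").
    Returns the name of the endpoint configuration that was created.
    """
    suffix = time.strftime("%Y%m%d-%H%M%S")  # config names must be unique
    model_name = f"{endpoint}-model-{suffix}"
    config_name = f"{endpoint}-config-{suffix}"
    sm.create_model(
        ModelName=model_name,
        PrimaryContainer={"Image": image_uri, "ModelDataUrl": model_data_url},
        ExecutionRoleArn=role_arn,
    )
    sm.create_endpoint_config(
        EndpointConfigName=config_name,
        ProductionVariants=[{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "InstanceType": instance_type,
            "InitialInstanceCount": 1,
        }],
    )
    if first_run:
        # First deployment: the endpoint does not exist yet.
        sm.create_endpoint(EndpointName=endpoint, EndpointConfigName=config_name)
    else:
        # Later deployments: point the running endpoint at the new config.
        sm.update_endpoint(EndpointName=endpoint, EndpointConfigName=config_name)
    return config_name
```

Passing the client in as a parameter keeps the logic testable with a stand-in object; with a real client, `sm.get_waiter("endpoint_in_service").wait(EndpointName=endpoint)` blocks until the endpoint is ready.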
Thanks to the MLflow and AWS SageMaker documentation.