Build an end-to-end ML solution that predicts whether a machine will fail in the near future, using the provided dataset. Focus on clean engineering and deployment craftsmanship; the modelling can be simple if it is defensible.

Predictive Maintenance – Machine Failure Prediction

This project is a complete end-to-end machine learning solution that predicts whether a machine will fail within the next 24 hours using the Microsoft Azure Predictive Maintenance dataset.


🔗 Dataset

Source: Kaggle - Microsoft Azure Predictive Maintenance


🚀 Project Structure

├── data/
│   ├── raw/               # Raw CSV files from Kaggle
│   └── output/            # Prediction results
├── docs/
│   ├── Short_Technical_Report.pdf
│   └── Technical_Design_Document.pdf
├── notebooks/
│   └── EDA.ipynb          # Exploratory Data Analysis
├── src/
│   ├── data_loader.py     # Data loading utilities
│   ├── preprocess.py      # Feature engineering & transformation
│   ├── model.py           # ML model pipeline definition
│   ├── train.py           # Training script
│   └── predict.py         # Prediction script
├── test_telemetry.csv     # Sample test input
├── test_machines.csv      # Sample machine metadata
├── artifacts/
│   └── model.pkl          # Trained model (saved after training)
├── requirements.txt       # Python dependencies
├── Dockerfile             # For containerizing the app
├── app.py                 # FastAPI app for deployment
├── train_pipeline.py      # Production Training Pipeline
└── README.md              # Project documentation

📄 Documentation

This project includes the following technical documents in the docs/ directory:

  • Short_Technical_Report.pdf: A brief summary of the project, methodology, and results.
  • Technical_Design_Document.pdf: A detailed document outlining the system architecture, data processing, modeling approach, and deployment strategy.

📊 Notebooks

The notebooks/EDA.ipynb file contains exploratory data analysis (EDA), including:

  • Dataset overview
  • Missing value checks
  • Visualizations of telemetry variables
  • Machine metadata insights
  • Failure type distributions

This notebook helped inform the feature engineering and preprocessing steps.
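The windowed aggregation that came out of the EDA can be illustrated with a minimal pandas sketch (a toy example, not the project's code; column names follow the Kaggle telemetry schema, and the actual feature logic lives in src/preprocess.py):

```python
import pandas as pd

# Toy telemetry: hourly readings for one machine (schema mirrors the
# Kaggle telemetry file: machineID, datetime, volt, rotate, ...).
telemetry = pd.DataFrame({
    "machineID": [1] * 6,
    "datetime": pd.date_range("2015-01-01", periods=6, freq="h"),
    "volt": [170.0, 165.0, 172.0, 168.0, 171.0, 169.0],
    "rotate": [440.0, 455.0, 448.0, 452.0, 450.0, 447.0],
})

# 3-hour rolling mean/std per machine, the aggregation style used in this
# project (window sizes and statistics here are illustrative).
agg = (
    telemetry
    .sort_values(["machineID", "datetime"])
    .set_index("datetime")
    .groupby("machineID")[["volt", "rotate"]]
    .rolling("3h")
    .agg(["mean", "std"])
)
print(agg)
```

Each row then carries a short history of the signal rather than a single instantaneous reading, which is what makes near-term failure patterns learnable.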

🛠️ How to Run

1. Create and Activate a Virtual Environment

It is highly recommended to use a virtual environment to manage project dependencies and avoid conflicts.

On Windows:

python -m venv venv
venv\Scripts\activate

On macOS/Linux:

python3 -m venv venv
source venv/bin/activate

2. Install Dependencies

Once your virtual environment is activated, install the required packages:

pip install -r requirements.txt

Note – ModuleNotFoundError: No module named 'src'
If you get this error, run scripts as modules from the project root:

python -m src.train
python -m src.predict

Or install the package locally so plain calls work:

pip install -e .
python src/predict.py
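For `pip install -e .` to work, the project root needs packaging metadata. A minimal pyproject.toml along these lines would do it (a sketch; the repository's actual metadata may differ):

```toml
[build-system]
requires = ["setuptools>=61"]
build-backend = "setuptools.build_meta"

[project]
name = "predictive-maintenance"
version = "0.1.0"

[tool.setuptools]
packages = ["src"]
```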

3. src/train.py — Minimal Training Script (For Quick Tests)

This is a lightweight training script for quick development/testing.

Run with:

python src/train.py

What it does:

  • Loads raw telemetry, machine, and failure data.

  • Merges and preprocesses data.

  • Trains the model and saves it to model.pkl.

  • Prints classification report to console.
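The steps above condense to roughly the following flow (a self-contained sketch with synthetic data standing in for the raw CSVs; everything except the sklearn calls is illustrative):

```python
import pickle
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the merged telemetry/machine features and the
# 24-hour failure label produced by the real loading/preprocessing steps.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

model = RandomForestClassifier(class_weight="balanced", random_state=0)
model.fit(X_tr, y_tr)

with open("model.pkl", "wb") as f:   # train.py saves the fitted model here
    pickle.dump(model, f)

print(classification_report(y_te, model.predict(X_te)))
```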

train_pipeline.py — Full Production Training Pipeline

This script includes full metrics logging and directory structure, recommended for production.

Run with:

python train_pipeline.py

What it does:

  • Loads and preprocesses all raw data.

  • Splits data into training and testing sets.

  • Trains the model using build_model().

  • Saves the model to artifacts/model.pkl.

  • Generates and saves classification metrics to artifacts/metrics.json.
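The metrics step can be sketched as follows: `classification_report` with `output_dict=True` returns a plain dict that serialises cleanly to JSON (the labels here are toy values; in train_pipeline.py they come from the held-out test split):

```python
import json
import os
from sklearn.metrics import classification_report

# Toy true/predicted labels standing in for the test-split results.
y_true = [0, 0, 1, 1, 0, 1]
y_pred = [0, 1, 1, 1, 0, 0]

report = classification_report(y_true, y_pred, output_dict=True)

os.makedirs("artifacts", exist_ok=True)
with open("artifacts/metrics.json", "w") as f:
    json.dump(report, f, indent=2)
```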

4. Make Predictions on Test Data

python src/predict.py

Output will be saved to data/output/predictions.csv.
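The prediction step amounts to loading the pickled model and writing one row per machine. In this sketch a toy model is trained inline so the snippet runs on its own; predict.py instead loads the artifact saved by training, and the `failure_24h` column name is illustrative:

```python
import os
import pickle
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Toy model pickled in memory; stands in for the model.pkl file on disk.
X = np.random.default_rng(1).normal(size=(20, 3))
toy = RandomForestClassifier(random_state=0).fit(X, (X[:, 0] > 0).astype(int))
blob = pickle.dumps(toy)

# predict.py equivalent: unpickle, predict, write CSV.
model = pickle.loads(blob)
preds = pd.DataFrame({
    "machineID": range(1, 21),
    "failure_24h": model.predict(X),   # hypothetical output column name
})

os.makedirs("data/output", exist_ok=True)
preds.to_csv("data/output/predictions.csv", index=False)
```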

Deployment with Docker

This project includes a Dockerfile to easily build and deploy the application as a container. This is the recommended way to run the prediction service in a production environment.

1. Build the Docker Image

First, ensure Docker is running on your machine. Then, from the project's root directory (the one containing the Dockerfile), run the following command to build the image:

docker build -t predictive-maintenance-app .
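A Dockerfile along these lines is what the build expects (a sketch; the repository's actual file, base image, and server command may differ):

```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
```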

2. Run the Docker Container

Once the image is built, run the following command to start the container. This will launch the FastAPI application, and the API will be accessible on port 8000.

docker run -d -p 8000:8000 --name prediction-api predictive-maintenance-app

3. Test the Prediction Endpoint

You can now send requests to the API. Use curl to send the test data and get a prediction. Make sure you run this command from the same directory that contains test_telemetry.csv and test_machines.csv.

curl -X POST -F "file_telemetry=@test_telemetry.csv" -F "file_machines=@test_machines.csv" http://localhost:8000/predict

The API will return a JSON response with the machine IDs and their failure predictions.

Modeling Approach

  • Aggregation: 3-hour rolling windows over the telemetry signals
  • Features: aggregated telemetry signals, machine model, machine age
  • Target: binary label for machine failure within the next 24 hours
  • Model: RandomForestClassifier with balanced class weights
  • Pipeline: StandardScaler + RandomForest wrapped in an sklearn Pipeline
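The pipeline described above can be sketched as follows (`build_model()` is the factory referenced in the training steps; the hyperparameters shown here are illustrative, not the project's tuned values):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

def build_model() -> Pipeline:
    """StandardScaler + RandomForest with balanced class weights."""
    return Pipeline([
        ("scaler", StandardScaler()),
        ("clf", RandomForestClassifier(
            n_estimators=100,          # illustrative hyperparameters
            class_weight="balanced",   # compensates for rare failure labels
            random_state=42,
        )),
    ])

model = build_model()
```

Wrapping scaling and the classifier in one Pipeline keeps preprocessing and inference in a single picklable object, so train and predict cannot drift apart.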

Evaluation Metric

Model performance is reported via classification_report, including:

  • Accuracy

  • Precision

  • Recall

  • F1 Score

Trade-offs

  • Focused on end-to-end reproducibility and engineering clarity.
  • Not tuned for highest possible accuracy.
  • RandomForest chosen for simplicity and interpretability.
