Build an end-to-end ML solution that predicts whether a machine will fail in the near future, using the provided dataset. Focus on clean engineering and deployment craftsmanship; the modelling can be simple if it is defensible.

Predictive Maintenance – Machine Failure Prediction

This project is a complete end-to-end machine learning solution that predicts whether a machine will fail within the next 24 hours using the Microsoft Azure Predictive Maintenance dataset.


🔗 Dataset

Source: Kaggle - Microsoft Azure Predictive Maintenance


🚀 Project Structure

├── data/
│   ├── raw/               # Raw CSV files from Kaggle
│   └── output/            # Prediction results
├── docs/
│   ├── Short_Technical_Report.pdf
│   └── Technical_Design_Document.pdf
├── notebooks/
│   └── EDA.ipynb          # Exploratory Data Analysis
├── src/
│   ├── data_loader.py     # Data loading utilities
│   ├── preprocess.py      # Feature engineering & transformation
│   ├── model.py           # ML model pipeline definition
│   ├── train.py           # Training script
│   └── predict.py         # Prediction script
├── test_telemetry.csv     # Sample test input
├── test_machines.csv      # Sample machine metadata
├── artifacts/
│   └── model.pkl          # Trained model (saved after training)
├── requirements.txt       # Python dependencies
├── Dockerfile             # For containerizing the app
├── app.py                 # FastAPI app for deployment
├── train_pipeline.py      # Production Training Pipeline
└── README.md              # Project documentation

📄 Documentation

This project includes the following technical documents in the docs/ directory:

  • Short_Technical_Report.pdf: A brief summary of the project, methodology, and results.
  • Technical_Design_Document.pdf: A detailed document outlining the system architecture, data processing, modeling approach, and deployment strategy.

📊 Notebooks

The notebooks/EDA.ipynb file contains exploratory data analysis (EDA), including:

  • Dataset overview
  • Missing value checks
  • Visualizations of telemetry variables
  • Machine metadata insights
  • Failure type distributions

This notebook helped inform the feature engineering and preprocessing steps.
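The windowed aggregation that came out of the EDA can be illustrated with a minimal pandas sketch (a toy example, not the project's code; column names follow the Kaggle telemetry schema, and the actual feature logic lives in src/preprocess.py):

```python
import pandas as pd

# Toy telemetry: hourly readings for one machine (schema mirrors the
# Kaggle telemetry file: machineID, datetime, volt, rotate, ...).
telemetry = pd.DataFrame({
    "machineID": [1] * 6,
    "datetime": pd.date_range("2015-01-01", periods=6, freq="h"),
    "volt": [170.0, 165.0, 172.0, 168.0, 171.0, 169.0],
    "rotate": [440.0, 455.0, 448.0, 452.0, 450.0, 447.0],
})

# 3-hour rolling mean/std per machine, the aggregation style used in this
# project (window sizes and statistics here are illustrative).
agg = (
    telemetry
    .sort_values(["machineID", "datetime"])
    .set_index("datetime")
    .groupby("machineID")[["volt", "rotate"]]
    .rolling("3h")
    .agg(["mean", "std"])
)
print(agg)
```

Each row then carries a short history of the signal rather than a single instantaneous reading, which is what makes near-term failure patterns learnable.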

🛠️ How to Run

1. Create and Activate a Virtual Environment

It is highly recommended to use a virtual environment to manage project dependencies and avoid conflicts.

On Windows:

python -m venv venv
venv\Scripts\activate

On macOS/Linux:

python3 -m venv venv
source venv/bin/activate

2. Install Dependencies

Once your virtual environment is activated, install the required packages:

pip install -r requirements.txt

Note – ModuleNotFoundError: No module named 'src'
If you get this error, run scripts as modules from the project root:

python -m src.train
python -m src.predict

Or install the package locally so plain calls work:

pip install -e .
python src/predict.py
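For `pip install -e .` to work, the project root needs packaging metadata. A minimal pyproject.toml along these lines would do it (a sketch; the repository's actual metadata may differ):

```toml
[build-system]
requires = ["setuptools>=61"]
build-backend = "setuptools.build_meta"

[project]
name = "predictive-maintenance"
version = "0.1.0"

[tool.setuptools]
packages = ["src"]
```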

3. src/train.py — Minimal Training Script (For Quick Tests)

This is a lightweight training script for quick development/testing.

Run with:

python src/train.py

What it does:

  • Loads raw telemetry, machine, and failure data.

  • Merges and preprocesses data.

  • Trains the model and saves it to model.pkl.

  • Prints classification report to console.
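The steps above condense to roughly the following flow (a self-contained sketch with synthetic data standing in for the raw CSVs; everything except the sklearn calls is illustrative):

```python
import pickle
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the merged telemetry/machine features and the
# 24-hour failure label produced by the real loading/preprocessing steps.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

model = RandomForestClassifier(class_weight="balanced", random_state=0)
model.fit(X_tr, y_tr)

with open("model.pkl", "wb") as f:   # train.py saves the fitted model here
    pickle.dump(model, f)

print(classification_report(y_te, model.predict(X_te)))
```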

train_pipeline.py — Full Production Training Pipeline

This script includes full metrics logging and directory structure, recommended for production.

Run with:

python train_pipeline.py

What it does:

  • Loads and preprocesses all raw data.

  • Splits data into training and testing sets.

  • Trains the model using build_model().

  • Saves the model to artifacts/model.pkl.

  • Generates and saves classification metrics to artifacts/metrics.json.
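The metrics step can be sketched as follows: `classification_report` with `output_dict=True` returns a plain dict that serialises cleanly to JSON (the labels here are toy values; in train_pipeline.py they come from the held-out test split):

```python
import json
import os
from sklearn.metrics import classification_report

# Toy true/predicted labels standing in for the test-split results.
y_true = [0, 0, 1, 1, 0, 1]
y_pred = [0, 1, 1, 1, 0, 0]

report = classification_report(y_true, y_pred, output_dict=True)

os.makedirs("artifacts", exist_ok=True)
with open("artifacts/metrics.json", "w") as f:
    json.dump(report, f, indent=2)
```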

4. Make Predictions on Test Data

python src/predict.py

Output will be saved to data/output/predictions.csv.
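The prediction step amounts to loading the pickled model and writing one row per machine. In this sketch a toy model is trained inline so the snippet runs on its own; predict.py instead loads the artifact saved by training, and the `failure_24h` column name is illustrative:

```python
import os
import pickle
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Toy model pickled in memory; stands in for the model.pkl file on disk.
X = np.random.default_rng(1).normal(size=(20, 3))
toy = RandomForestClassifier(random_state=0).fit(X, (X[:, 0] > 0).astype(int))
blob = pickle.dumps(toy)

# predict.py equivalent: unpickle, predict, write CSV.
model = pickle.loads(blob)
preds = pd.DataFrame({
    "machineID": range(1, 21),
    "failure_24h": model.predict(X),   # hypothetical output column name
})

os.makedirs("data/output", exist_ok=True)
preds.to_csv("data/output/predictions.csv", index=False)
```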

Deployment with Docker

This project includes a Dockerfile to easily build and deploy the application as a container. This is the recommended way to run the prediction service in a production environment.

1. Build the Docker Image

First, ensure Docker is running on your machine. Then, from the project's root directory (the one containing the Dockerfile), run the following command to build the image:

docker build -t predictive-maintenance-app .
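A Dockerfile along these lines is what the build expects (a sketch; the repository's actual file, base image, and server command may differ):

```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
```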

2. Run the Docker Container

Once the image is built, run the following command to start the container. This will launch the FastAPI application, and the API will be accessible on port 8000.

docker run -d -p 8000:8000 --name prediction-api predictive-maintenance-app

3. Test the Prediction Endpoint

You can now send requests to the API. Use curl to send the test data and get a prediction. Make sure you run this command from the same directory that contains test_telemetry.csv and test_machines.csv.

curl -X POST -F "file_telemetry=@test_telemetry.csv" -F "file_machines=@test_machines.csv" http://localhost:8000/predict

The API will return a JSON response with the machine IDs and their failure predictions.

Modeling Approach

  • Aggregation: 3-hour rolling windows over the telemetry signals
  • Features: aggregated telemetry signals, machine model, machine age
  • Target: binary label for machine failure within the next 24 hours
  • Model: RandomForestClassifier with balanced class weights
  • Pipeline: StandardScaler + RandomForest wrapped in an sklearn Pipeline
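The pipeline described above can be sketched as follows (`build_model()` is the factory referenced in the training steps; the hyperparameters shown here are illustrative, not the project's tuned values):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

def build_model() -> Pipeline:
    """StandardScaler + RandomForest with balanced class weights."""
    return Pipeline([
        ("scaler", StandardScaler()),
        ("clf", RandomForestClassifier(
            n_estimators=100,          # illustrative hyperparameters
            class_weight="balanced",   # compensates for rare failure labels
            random_state=42,
        )),
    ])

model = build_model()
```

Wrapping scaling and the classifier in one Pipeline keeps preprocessing and inference in a single picklable object, so train and predict cannot drift apart.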

Evaluation Metric

Model performance is reported via classification_report, including:

  • Accuracy

  • Precision

  • Recall

  • F1 Score

Trade-offs

  • Focused on end-to-end reproducibility and engineering clarity.
  • Not tuned for highest possible accuracy.
  • RandomForest chosen for simplicity and interpretability.
