A data-driven project aimed at predicting the extent of structural damage in wildfire events based on various building and environmental factors. The project utilizes machine learning techniques for classification and provides a user-friendly web interface for predictions.
- Objective: Predict structural damage severity during wildfire incidents using building characteristics and geographical attributes.
- Type of Problem: Classification problem (predicting categorical damage levels).
The dataset includes the following features:
- CAL FIRE Unit: The region under CAL FIRE jurisdiction.
- Structure Type: Type of structure (e.g., single-family residence, commercial, etc.).
- Roof Construction: Material used for the roof (e.g., asphalt, metal, tile, etc.).
- Vent Screen: Mesh size of vent screens (e.g., <= 1/8", none, etc.).
- Exterior Siding: Type of exterior siding material (e.g., wood, stucco, etc.).
- Window Pane: Type of window glazing (e.g., single-pane, double-pane, etc.).
- Incident Start Date: Date of the wildfire event.
- Latitude & Longitude: Geographic coordinates of the structure.
- Damage: Target variable indicating the extent of structural damage.
Key Observations:
- Certain materials and construction types exhibit higher vulnerability to wildfires.
- Location-based factors may influence damage severity.
- Feature encoding was performed for categorical variables.
- Imported Libraries: Pandas, NumPy, Scikit-learn, Matplotlib, Seaborn.
- Dataset Loading: Data was read, cleaned, and structured for modeling.
- Missing Values: Checked and handled missing values appropriately.
- Exploratory Data Analysis (EDA):
  - Visualized feature distributions and their relationship with damage levels.
  - Identified potential correlations and risk factors.
- Handled Incident Start Date: Because the models cannot consume raw date values, the date was split into separate Incident Start Year, Month, and Day columns.
- Categorical Encoding: Encoded categorical variables such as CAL FIRE Unit, Structure Type, Roof Construction, Vent Screen, Exterior Siding, Window Pane, and Damage.
- Scaling: Standardized numerical features (Latitude, Longitude).
- Data Splitting: Split the dataset into training and testing sets for model evaluation.
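
The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the function name `preprocess`, the 80/20 split, and the use of one-hot encoding for every categorical feature are assumptions (the project tree lists both one-hot and ordinal encoders, but which columns use which is not stated). The sketch also fits the scaler on the training split only, which avoids leaking test-set statistics and may differ from the order the notebook follows.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

CATEGORICAL = ["CAL FIRE Unit", "Structure Type", "Roof Construction",
               "Vent Screen", "Exterior Siding", "Window Pane"]
NUMERIC = ["Latitude", "Longitude"]


def preprocess(df: pd.DataFrame, test_size: float = 0.2, seed: int = 42):
    """Return train/test splits plus the fitted target encoder and scaler."""
    df = df.copy()

    # Replace the raw date with numeric year / month / day columns
    dates = pd.to_datetime(df.pop("Incident Start Date"))
    df["Incident Start Year"] = dates.dt.year
    df["Incident Start Month"] = dates.dt.month
    df["Incident Start Day"] = dates.dt.day

    # Encode the target labels as integers
    target_encoder = LabelEncoder()
    y = target_encoder.fit_transform(df.pop("Damage"))

    # One-hot encode the categorical features (assumed encoding choice)
    X = pd.get_dummies(df, columns=CATEGORICAL)

    # Stratify so each damage level keeps its share in both splits
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )

    # Standardize coordinates using statistics from the training split only
    scaler = StandardScaler()
    X_train, X_test = X_train.copy(), X_test.copy()
    X_train[NUMERIC] = scaler.fit_transform(X_train[NUMERIC])
    X_test[NUMERIC] = scaler.transform(X_test[NUMERIC])

    return X_train, X_test, y_train, y_test, target_encoder, scaler
```

The fitted `target_encoder` and `scaler` are returned so they can be persisted alongside the model and reused at prediction time.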
Implemented the following classification models:
- Logistic Regression
- Decision Tree Classifier
- Random Forest Classifier
- Gradient Boosting Classifier
- XGBoost Classifier
- Evaluation Metrics:
  - Accuracy Score
  - Precision, Recall, and F1-Score
  - Confusion Matrix
- The models were compared, and the best-performing model was selected based on classification accuracy and recall for severe damage cases.
- Confusion Matrix: To understand misclassifications.
- Feature Importance: Identified key predictors of structural damage.
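
The comparison described above might look something like the sketch below. The helper names (`candidate_models`, `compare_models`, `top_features`), the hyperparameters, and the `severe_label` argument are all illustrative assumptions; the README does not state the settings used or how accuracy and severe-damage recall were weighed against each other. XGBoost is treated as an optional dependency here.

```python
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix, recall_score)
from sklearn.tree import DecisionTreeClassifier


def candidate_models(seed: int = 42) -> dict:
    """Build the classifiers listed above with mostly default settings."""
    models = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        "Random Forest": RandomForestClassifier(random_state=seed),
        "Gradient Boosting": GradientBoostingClassifier(random_state=seed),
    }
    try:
        from xgboost import XGBClassifier  # skipped if xgboost is not installed
        models["XGBoost"] = XGBClassifier(random_state=seed)
    except ImportError:
        pass
    return models


def compare_models(models, X_train, y_train, X_test, y_test, severe_label):
    """Fit each model and collect the metrics named in the list above."""
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        predicted = model.predict(X_test)
        results[name] = {
            "accuracy": accuracy_score(y_test, predicted),
            # Recall restricted to the (encoded) severe-damage class
            "severe_recall": recall_score(
                y_test, predicted, labels=[severe_label],
                average="macro", zero_division=0,
            ),
            "confusion_matrix": confusion_matrix(y_test, predicted),
            "report": classification_report(y_test, predicted, zero_division=0),
        }
    return results


def top_features(model, feature_names, k: int = 5):
    """Rank features of a fitted tree-based model by importance."""
    ranked = sorted(zip(feature_names, model.feature_importances_),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:k]
```

`top_features` only applies to models exposing `feature_importances_` (the tree-based ones); logistic regression would need its coefficients inspected instead.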
The best-performing model, a decision tree classifier, was saved and deployed as a web application:
- Model Saved:

      import joblib
      joblib.dump(best_model, 'wildfire_damage_model.pkl')
- Web Application:
  - Backend developed using FastAPI.
  - Frontend implemented with HTML, CSS, and JavaScript for user input and predictions.
project/
├── api/
│ ├── main.py # FastAPI backend for handling predictions
├── data/
│ ├── california-wildfire-dataset.csv # Dataset used for model training
├── frontend/
│ ├── index.html # Web interface for the application
│ ├── styles.css # Styling for the web page
│ ├── script.js # Form handling and API interaction
├── model/
│ ├── Structural Damage Classification.ipynb # Notebook for EDA and model training
│ ├── decision_tree_model.pkl # Saved machine learning model
├── encoded_data_columns.pkl
├── label_encoder.pkl
├── model_scaler.pkl
├── one_hot_encoders.pkl
├── ordinal_encoder.pkl
├── scaler.pkl
├── README.md # Project documentation
- Navigate to the api/ directory.
- Run the main.py file using Python.
- The backend will start at http://localhost:8000.
- Open the index.html file in the frontend/ directory.
- Fill out the form with structure details and submit to get the predicted damage classification.
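
For checking the backend without a browser, the request the form submits can be approximated from Python using only the standard library. This is a sketch under assumptions: the `/predict` route and every payload field name are hypothetical stand-ins and should be replaced with whatever `api/main.py` and `script.js` actually use.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/predict"  # "/predict" is an assumed route


def build_request(structure: dict, url: str = API_URL) -> urllib.request.Request:
    """Build a JSON POST request carrying one structure's details."""
    body = json.dumps(structure).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def request_prediction(structure: dict, url: str = API_URL, timeout: float = 10.0):
    """Send the request to a running backend and return the decoded JSON reply."""
    with urllib.request.urlopen(build_request(structure, url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Made-up example values; the keys are hypothetical field names
example_structure = {
    "structure_type": "Single Family Residence",
    "roof_construction": "Asphalt",
    "exterior_siding": "Wood",
    "window_pane": "Single Pane",
    "latitude": 38.5,
    "longitude": -122.8,
}
```

With the backend running, `request_prediction(example_structure)` would return the server's JSON response; the shape of that response depends on the actual API.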
- Improve model accuracy by incorporating additional environmental variables.
- Implement GIS-based analysis for better risk assessment.
- Deploy the application to a cloud service for real-time wildfire damage predictions.
