This repository contains a credit risk analysis pipeline using machine learning models to predict loan default likelihood. The dataset consists of borrower information, loan details, and credit history. The project includes data preprocessing, exploratory data analysis (EDA), model training, evaluation, and visualization.
- Handles missing values and outliers
- Encodes categorical features
- Splits data into training and testing sets
- Summary statistics and outlier detection
- Scatter matrix and box plots to identify trends
- Parallel category diagrams to analyze categorical feature relationships
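The preprocessing and outlier-handling steps listed above can be sketched without any dependencies. This is a minimal illustration, not the notebook's code: the toy records, the field names (`income`, `home`, `default`), and the helper functions are all hypothetical, and the notebook may well use pandas and scikit-learn instead.

```python
import random
from statistics import median, quantiles

# Hypothetical toy records; field names are illustrative, not the dataset's schema.
rows = [
    {"income": 42000.0, "home": "RENT", "default": 0},
    {"income": None, "home": "OWN", "default": 0},
    {"income": 58000.0, "home": "MORTGAGE", "default": 1},
    {"income": 61000.0, "home": "RENT", "default": 0},
    {"income": 950000.0, "home": "OWN", "default": 1},
    {"income": 39000.0, "home": "RENT", "default": 1},
    {"income": 47000.0, "home": "MORTGAGE", "default": 0},
    {"income": 52000.0, "home": "OWN", "default": 0},
]

def impute_median(rows, key):
    """Replace missing values of `key` with the median of the observed values."""
    fill = median(r[key] for r in rows if r[key] is not None)
    return [{**r, key: fill if r[key] is None else r[key]} for r in rows]

def clip_iqr(rows, key, k=1.5):
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR] to the nearest fence."""
    q1, _, q3 = quantiles([r[key] for r in rows], n=4)
    lo, hi = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [{**r, key: min(max(r[key], lo), hi)} for r in rows]

def one_hot(rows, key):
    """Replace a categorical column with one 0/1 column per category."""
    cats = sorted({r[key] for r in rows})
    out = []
    for r in rows:
        enc = {k: v for k, v in r.items() if k != key}
        enc.update({f"{key}_{c}": int(r[key] == c) for c in cats})
        out.append(enc)
    return out

def train_test_split(rows, test_frac=0.25, seed=0):
    """Shuffle a copy of the rows and hold out the last `test_frac` share for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_frac))
    return shuffled[:-n_test], shuffled[-n_test:]

clean = one_hot(clip_iqr(impute_median(rows, "income"), "income"), "home")
train, test = train_test_split(clean)
```

For brevity the sketch imputes and clips before splitting. In a real pipeline, computing the median and the IQR fences on the training set only avoids leaking test-set information into the model.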
The project implements three classification models to predict loan default likelihood:
- K-Nearest Neighbors (KNN)
- Logistic Regression (LR)
- XGBoost Classifier
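To make the idea behind the simplest of these models concrete, here is a small dependency-free sketch of K-Nearest Neighbors used as a default-probability estimator. It is an illustration only: the function name, the two scaled features, and the toy data are hypothetical, and the notebook presumably relies on library implementations of all three models.

```python
import math
from collections import Counter

def knn_default_probability(train_X, train_y, query, k=3):
    """Estimate P(default) as the share of the k nearest training points labelled 1."""
    # Pair each training point's Euclidean distance to the query with its label.
    dists = sorted((math.dist(x, query), y) for x, y in zip(train_X, train_y))
    votes = Counter(y for _, y in dists[:k])
    return votes[1] / k

# Hypothetical, already-scaled features: (loan-to-income ratio, scaled interest rate)
train_X = [(0.10, 0.30), (0.15, 0.35), (0.20, 0.40),
           (0.55, 0.80), (0.60, 0.75), (0.65, 0.90)]
train_y = [0, 0, 0, 1, 1, 1]  # 1 = defaulted

p_low = knn_default_probability(train_X, train_y, (0.12, 0.32))
p_high = knn_default_probability(train_X, train_y, (0.58, 0.82))
```

Because KNN is distance-based, features on different scales need to be standardized first; otherwise the feature with the largest range dominates the neighbor search.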
Each model is evaluated with the following metrics and diagnostics:
- Classification Report (Precision, Recall, F1-score)
- ROC Curve & AUC Score
- Reliability Plot & Brier Score
- Feature Importance Analysis
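As a rough guide to what the numeric metrics above measure, the sketch below computes precision, recall, F1, ROC AUC, and the Brier score by hand. The function names, the 0.5 decision threshold, and the toy labels and probabilities are assumptions made for illustration; the notebook most likely obtains these values from a library.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for the positive (default) class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def brier_score(y_true, probs):
    """Mean squared difference between predicted probability and outcome (lower is better)."""
    return sum((p - t) ** 2 for t, p in zip(y_true, probs)) / len(y_true)

# Hypothetical labels (1 = default) and predicted default probabilities.
y_true = [0, 0, 1, 1, 0, 1]
probs = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
y_pred = [int(p >= 0.5) for p in probs]  # assumed 0.5 threshold

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
auc = roc_auc(y_true, probs)
brier = brier_score(y_true, probs)
```

AUC depends only on how the predictions rank borrowers, while the Brier score also penalizes probabilities that are poorly calibrated, which is why the two are reported side by side.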
- The dataset used for this analysis is credit_risk_dataset.csv, sourced from Kaggle. A link to the dataset will be added later.
- It includes borrower demographics, loan details, employment history, and previous credit behavior.
📂 credit-risk-analysis
├── README.md # Project documentation
├── credit_risk_dataset.csv # Dataset used in the analysis
├── docs/ # Additional project documentation
├── images/ # Generated plots and analysis figures
│ ├── FeatureImportance.png
│ ├── ROC.png
│ └── RealiabilityPlot.png
└── notebook.ipynb # Jupyter Notebook with full analysis
- Clone the repository:
    git clone https://github.com/yourusername/credit-risk-analysis.git
    cd credit-risk-analysis
- Open the Jupyter Notebook and run the cells to process data, train models, and analyze results.
The project includes various visualizations such as:
- Scatter Matrix: Identifying trends and outliers.
- Box Plots: Analyzing loan grades and default probability.
- Parallel Category Diagram: Understanding relationships between categorical features.
- ROC Curve: Model performance comparison.
- Feature Importance Plot: Identifying key predictors of loan default.
For detailed documentation, refer to the docs/ directory or the provided LaTeX/PDF documentation.
- [Your Name]
This project is licensed under the MIT License.