diff --git a/docs/ML/projects/cardiovascular_disease_prediction.md b/docs/ML/projects/cardiovascular_disease_prediction.md
new file mode 100644
index 00000000..0cf36a6b
--- /dev/null
+++ b/docs/ML/projects/cardiovascular_disease_prediction.md
@@ -0,0 +1,311 @@
+# Cardiovascular Disease Prediction
+
+### AIM
+
+To predict the risk of cardiovascular disease based on lifestyle factors.
+
+### DATASET LINK
+
+[https://www.kaggle.com/datasets/alphiree/cardiovascular-diseases-risk-prediction-dataset](https://www.kaggle.com/datasets/alphiree/cardiovascular-diseases-risk-prediction-dataset)
+
+### MY NOTEBOOK LINK
+
+[https://www.kaggle.com/code/sid4ds/cardiovascular-disease-risk-prediction](https://www.kaggle.com/code/sid4ds/cardiovascular-disease-risk-prediction)
+
+### LIBRARIES NEEDED
+
+??? quote "LIBRARIES USED"
+
+ - pandas
+ - numpy
+ - scikit-learn (>=1.5.0 for TunedThresholdClassifierCV)
+ - matplotlib
+ - seaborn
+ - joblib
+
+---
+
+### DESCRIPTION
+
+!!! info "What is the requirement of the project?"
+    - This project aims to predict the risk of cardiovascular disease (CVD) based on data people provide about their lifestyle factors. Predicting the risk in advance can minimize the number of cases that reach a terminal stage.
+
+??? info "Why is it necessary?"
+ - CVD is one of the leading causes of death globally. Using machine learning models to predict risk of CVD can be an important tool in helping the people affected by it.
+
+??? info "How is it beneficial and used?"
+ - Doctors can use it as a second opinion to support their diagnosis. It also acts as a fallback mechanism in rare cases where the diagnosis is not obvious.
+ - People (patients in particular) can track their risk of CVD based on their own lifestyle and schedule an appointment with a doctor in advance to mitigate the risk.
+
+??? info "How did you start approaching this project? (Initial thoughts and planning)"
+ - Going through previous research and articles related to the problem.
+ - Data exploration to understand the features. Using data visualization to check their distributions.
+ - Identifying key metrics for the problem based on ratio of target classes.
+ - Feature engineering and selection based on EDA.
+ - Setting up a framework for easier testing of multiple models.
+ - Analysing results of models using confusion matrix.
+
+??? info "Mention any additional resources used (blogs, books, chapters, articles, research papers, etc.)."
+ - Research paper: [Integrated Machine Learning Model for Comprehensive Heart Disease Risk Assessment Based on Multi-Dimensional Health Factors](https://eajournals.org/ejcsit/vol11-issue-3-2023/integrated-machine-learning-model-for-comprehensive-heart-disease-risk-assessment-based-on-multi-dimensional-health-factors/)
+ - Public notebook: [Cardiovascular-Diseases-Risk-Prediction](https://www.kaggle.com/code/avdhesh15/cardiovascular-diseases-risk-prediction)
+
+---
+
+### EXPLANATION
+
+#### DETAILS OF THE DIFFERENT FEATURES
+
+| **Feature Name** | **Description** | **Type** | **Values/Range** |
+|------------------|-----------------|----------|------------------|
+| General_Health | "Would you say that in general your health is—" | Categorical | [Poor, Fair, Good, Very Good, Excellent] |
+| Checkup | "About how long has it been since you last visited a doctor for a routine checkup?" | Categorical | [Never, 5 or more years ago, Within last 5 years, Within last 2 years, Within the last year] |
+| Exercise | "Did you participate in any physical activities like running, walking, or gardening?" | Categorical | [Yes, No] |
+| Skin_Cancer | Respondents that reported having skin cancer | Categorical | [Yes, No] |
+| Other_Cancer | Respondents that reported having any other types of cancer | Categorical | [Yes, No] |
+| Depression | Respondents that reported having a depressive disorder | Categorical | [Yes, No] |
+| Diabetes | Respondents that reported having diabetes. If yes, specify the type. | Categorical | [Yes, No, No pre-diabetes or borderline diabetes, Yes but female told only during pregnancy] |
+| Arthritis | Respondents that reported having arthritis | Categorical | [Yes, No] |
+| Sex | Respondent's gender | Categorical | [Male, Female] |
+| Age_Category | Respondent's age range | Categorical | ['18-24', '25-34', '35-44', '45-54', '55-64', '65-74', '75-80', '80+'] |
+| Height_(cm) | Respondent's height in cm | Numerical | Measured in cm |
+| Weight_(kg) | Respondent's weight in kg | Numerical | Measured in kg |
+| BMI | Respondent's Body Mass Index in kg/m² | Numerical | Measured in kg/m² |
+| Smoking_History | Respondent's smoking history | Categorical | [Yes, No] |
+| Alcohol_Consumption | Number of days of alcohol consumption in a month | Numerical | Integer values |
+| Fruit_Consumption | Number of servings of fruit consumed in a month | Numerical | Integer values |
+| Green_Vegetables_Consumption | Number of servings of green vegetables consumed in a month | Numerical | Integer values |
+| FriedPotato_Consumption | Number of servings of fried potato consumed in a month | Numerical | Integer values |
+
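Several of these categorical features are ordinal (e.g. `General_Health` and `Age_Category`), so they can be encoded as integers that preserve category order. A minimal sketch using pandas — the category orderings come from the table above, but the exact encoding used in the notebook may differ:

```python
import pandas as pd

# Toy rows mimicking two ordinal columns from the dataset
df = pd.DataFrame({
    "General_Health": ["Poor", "Good", "Excellent"],
    "Age_Category": ["18-24", "55-64", "80+"],
})

# Order-preserving integer codes (0 = lowest category)
health_order = ["Poor", "Fair", "Good", "Very Good", "Excellent"]
age_order = ["18-24", "25-34", "35-44", "45-54", "55-64", "65-74", "75-80", "80+"]

df["General_Health"] = pd.Categorical(
    df["General_Health"], categories=health_order, ordered=True
).codes
df["Age_Category"] = pd.Categorical(
    df["Age_Category"], categories=age_order, ordered=True
).codes

print(df["General_Health"].tolist())  # [0, 2, 4]
print(df["Age_Category"].tolist())    # [0, 4, 7]
```

Unlike one-hot encoding, this keeps the "Poor < Fair < ... < Excellent" ordering visible to tree-based models.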
+---
+
+#### WHAT I HAVE DONE
+
+=== "Step 1"
+
+ Exploratory Data Analysis
+
+ - Summary statistics
+ - Data visualization for numerical feature distributions
+ - Target splits for categorical features
+
+=== "Step 2"
+
+ Data cleaning and Preprocessing
+
+ - Regrouping rare categories
+ - Categorical feature encoding
+ - Outlier clipping for numerical features
+
+=== "Step 3"
+
+ Feature engineering and selection
+
+ - Combining original features based on domain knowledge
+ - Discretizing numerical features
+
+=== "Step 4"
+
+ Modeling
+
+    - Holdout dataset created for model testing
+ - Models trained: Logistic Regression, Decision Tree, Random Forest, AdaBoost, HistGradient Boosting, Multi-Layer Perceptron
+ - Class imbalance handled through:
+ - Class weights, when supported by model architecture
+ - Threshold tuning using TunedThresholdClassifierCV
+    - Metric for model-tuning: F2-score (weighted harmonic mean of precision and recall, with twice the weight given to recall)
+
+=== "Step 5"
+
+ Result analysis
+
+ - Confusion matrix using predictions made on holdout test set
+
+---
+
+#### PROJECT TRADE-OFFS AND SOLUTIONS
+
+=== "Trade Off 1"
+
+ **Accuracy vs Recall:**
+    Data is extremely imbalanced, with only ~8% of samples in the positive class. This makes accuracy unsuitable as a metric for this problem. It is critical to correctly identify the positive samples, so the focus must be on recall. However, prioritizing recall lowers overall accuracy, since some negative samples may be predicted as positive.
+
+ - **Solution**: Prediction threshold for models is tuned using F2-score to create a balance between precision and recall, with more importance given to recall. This maintains overall accuracy at an acceptable level while boosting recall.
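    For reference, the F2-score is the $F_\beta$ measure with $\beta = 2$, which weights recall twice as heavily as precision:

    $$
    F_\beta = (1 + \beta^2) \cdot \frac{\text{precision} \cdot \text{recall}}{\beta^2 \cdot \text{precision} + \text{recall}},
    \qquad
    F_2 = \frac{5 \cdot \text{precision} \cdot \text{recall}}{4 \cdot \text{precision} + \text{recall}}
    $$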
+
+---
+
+### SCREENSHOTS
+
+!!! success "Project workflow"
+
+ ``` mermaid
+ graph LR
+      A[EDA] --> B[Data Cleaning & Preprocessing];
+      B --> C[Feature Engineering & Selection];
+      C --> D[Modeling + Threshold Tuning];
+      D --> E[Result Analysis];
+ ```
+
+??? tip "Numerical feature distributions"
+
+ === "Height_(cm)"
+ 
+
+ === "Weight_(kg)"
+ 
+
+ === "BMI"
+ 
+
+ === "Alcohol"
+ 
+
+ === "Fruit"
+ 
+
+ === "Vegetable"
+ 
+
+    === "Fried Potato"
+ 
+
+??? tip "Correlations"
+
+ === "Pearson"
+ 
+
+ === "Spearman's Rank"
+ 
+
+ === "Kendall-Tau"
+ 
+
+---
+
+### MODELS USED AND THEIR ACCURACIES
+
+| Model + Feature set | Accuracy (%) | Recall (%) |
+|-------|----------|-----|
+| Logistic Regression + Original | 76.29 | 74.21 |
+| Logistic Regression + Extended | 76.27 | 74.41 |
+| Logistic Regression + Selected | 72.66 | 78.09 |
+| Decision Tree + Original | 72.76 | 78.61 |
+| Decision Tree + Extended | 74.09 | 76.69 |
+| Decision Tree + Selected | 75.52 | 73.61 |
+| Random Forest + Original | 73.97 | 77.33 |
+| Random Forest + Extended | 74.10 | 76.61 |
+| Random Forest + Selected | 74.80 | 74.05 |
+| AdaBoost + Original | 76.03 | 74.49 |
+| AdaBoost + Extended | 74.99 | 76.25 |
+| AdaBoost + Selected | 74.76 | 75.33 |
+| Multi-Layer Perceptron + Original | **76.91** | 72.81 |
+| **Multi-Layer Perceptron + Extended** | 73.26 | **79.01** |
+| Multi-Layer Perceptron + Selected | 74.86 | 75.05 |
+| Hist-Gradient Boosting + Original | 75.98 | 73.49 |
+| Hist-Gradient Boosting + Extended | 75.63 | 74.73 |
+| Hist-Gradient Boosting + Selected | 74.40 | 75.85 |
+
+#### MODELS COMPARISON GRAPHS
+
+!!! tip "Logistic Regression"
+
+ === "LR Original"
+ 
+
+ === "LR Extended"
+ 
+
+ === "LR Selected"
+ 
+
+??? tip "Decision Tree"
+
+ === "DT Original"
+ 
+
+ === "DT Extended"
+ 
+
+ === "DT Selected"
+ 
+
+??? tip "Random Forest"
+
+ === "RF Original"
+ 
+
+ === "RF Extended"
+ 
+
+ === "RF Selected"
+ 
+
+??? tip "Ada Boost"
+
+ === "AB Original"
+ 
+
+ === "AB Extended"
+ 
+
+ === "AB Selected"
+ 
+
+??? tip "Multi-Layer Perceptron"
+
+ === "MLP Original"
+ 
+
+ === "MLP Extended"
+ 
+
+ === "MLP Selected"
+ 
+
+??? tip "Hist-Gradient Boosting"
+
+ === "HGB Original"
+ 
+
+ === "HGB Extended"
+ 
+
+ === "HGB Selected"
+ 
+
+---
+
+### CONCLUSION
+
+#### WHAT YOU HAVE LEARNED
+
+!!! tip "Insights gained from the data"
+    - General Health, Age and co-morbidities (such as Diabetes & Arthritis) are the most indicative features of CVD risk.
+
+??? tip "Improvements in understanding machine learning concepts"
+    - Learned and implemented probability prediction with a tuned decision threshold, which gives more accurate results than predicting with each model's default threshold.
+
+??? tip "Challenges faced and how they were overcome"
+    - Deciding the correct evaluation metric, given the imbalanced nature of the dataset. Since the positive class is more important, recall was used as the final metric for ranking models.
+    - The F2-score was used to tune each model's prediction threshold, maintaining a balance between precision and recall and thereby keeping overall accuracy acceptable.
+
+---
+
+#### USE CASES OF THIS MODEL
+
+=== "Application 1"
+
+    - Doctors can use it as a second opinion when assessing a new patient. A model trained on previous patients' cases can be used to predict the risk.
+
+=== "Application 2"
+
+ - People (patients in particular) can use this tool to track the risk of CVD based on their own lifestyle factors and take preventive measures when the risk is high.
+
+---
+
+#### FEATURES PLANNED BUT NOT IMPLEMENTED
+
+=== "Feature 1"
+
+    - Alternative gradient-boosting implementations such as XGBoost, CatBoost and LightGBM were not tried, since none of the tree-ensemble models (Random Forest, AdaBoost, Hist-Gradient Boosting) were among the best performers. Skipping them avoids additional dependencies.
+
diff --git a/docs/ML/projects/cardiovascular_disease_prediction/README.md b/docs/ML/projects/cardiovascular_disease_prediction/README.md
deleted file mode 100644
index 48136f89..00000000
--- a/docs/ML/projects/cardiovascular_disease_prediction/README.md
+++ /dev/null
@@ -1,222 +0,0 @@
-# Cardiovascular Disease Prediction
-
-## AIM
-
-To predict the risk of cardiovascular disease based on lifestyle factors.
-
-## DATASET LINK
-
-[Cardiovascular Diseases Risk Prediction Dataset - Kaggle](https://www.kaggle.com/datasets/alphiree/cardiovascular-diseases-risk-prediction-dataset)
-
-## MY NOTEBOOK LINK
-
-[Cardiovascular Disease Risk Prediction](https://www.kaggle.com/code/sid4ds/cardiovascular-disease-risk-prediction)
-
-## DESCRIPTION
-
-* What is the requirement of the project?
-This project aims to predict the risk of cardivascular diseases (CVD) based on data provided by people about their lifestyle factors. Predicting the risk in advance can minimize cases which reach a terminal stage.
-
-* Why is it necessary?
-CVD is one of the leading causes of death globally. Using machine learning models to predict risk of CVD can be an important tool in helping the people affected by it.
-
-* How is it beneficial and used?
- * Doctors can use it as a second opinion to support their diagnosis. It also acts as a fallback mechanism in rare cases where the diagnosis is not obvious.
- * People (patients in particular) can track their risk of CVD based on their own lifestyle and schedule an appointment with a doctor in advance to mitigate the risk.
-
-* How did you start approaching this project? (Initial thoughts and planning)
- * Going through previous research and articles related to the problem.
- * Data exploration to understand the features. Using data visualization to check their distributions.
- * Identifying key metrics for the problem based on ratio of target classes.
- * Feature engineering and selection based on EDA.
- * Setting up a framework for easier testing of multiple models.
- * Analysing results of models using confusion matrix.
-
-* Mention any additional resources used (blogs, books, chapters, articles, research papers, etc.).
- * Research paper: [Integrated Machine Learning Model for Comprehensive Heart Disease Risk Assessment Based on Multi-Dimensional Health Factors](https://eajournals.org/ejcsit/vol11-issue-3-2023/integrated-machine-learning-model-for-comprehensive-heart-disease-risk-assessment-based-on-multi-dimensional-health-factors/)
- * Public notebook: [Cardiovascular-Diseases-Risk-Prediction](https://www.kaggle.com/code/avdhesh15/cardiovascular-diseases-risk-prediction)
-
-## EXPLANATION
-
-### DETAILS OF THE DIFFERENT FEATURES
-
-Dataset consists of self-reported entries by people based on prompts for each factor. The prompts and type for each feature are:
-
-1. **General_Health**: "Would you say that in general your health is—"
-Categorical: [Poor, Fair, Good, Very Good, Excellent]
-2. **Checkup**: "About how long has it been since you last visited a doctor for a routine checkup?"
-Categorical: [Never, 5 or more years ago, Within last 5 years, Within last 2 years, Within the last year]
-3. **Exercise**: "During the past month, other than your regular job, did you participate in any physical activities or exercises such as running, calisthenics, golf, gardening, or walking for exercise?"
-Categorical: [Yes, No]
-4. **Skin_Cancer**: Respondents that reported having skin cancer
-Categorical: [Yes, No]
-5. **Other_Cancer**: Respondents that reported having any other types of cancer
-Categorical: [Yes, No]
-6. **Depression**: Respondents that reported having a depressive disorder (including depression, major depression, dysthymia, or minor depression)
-Categorical: [Yes, No]
-7. **Diabetes**: Respondents that reported having a diabetes. If yes, what type of diabetes it is/was.
-Categorical: [Yes, No, No pre-diabetes or borderline diabetes, Yes but female told only during pregnancy]
-8. **Arthritis**: Respondents that reported having an Arthritis
-Categorical: [Yes, No]
-9. **Sex**: Respondent's Gender
-Categorical: [Yes, No]
-10. **Age_Category**: Respondent's age range
-Categorical: '18-24' to '80+'
-11. **Height_(cm)**: Respondent's height in cm
-Numerical
-12. **Weight_(kg)**: Respondent's weight in kg
-Numerical
-13. **BMI**: Respondent's Body Mass Index in kg/cm^2
-Numerical
-14. **Smoking_History**: Respondent's that reported having a history of smoking
-Categorical: [Yes, No]
-15. **Alcohol_Consumption**: Number of days of alcohol consumption in a month
-Numerical
-16. **Fruit_Consumption**: Number of servings of fruit consumed in a month
-Numerical
-17. **Green_Vegetables_Consumption**: Number of servings of green vegetables consumed in a month
-Numerical
-18. **FriedPotato_Consumption**: Number of servings of fried potato consumed in a month
-Numerical
-
-### WHAT I HAVE DONE
-
-1. **Exploratory Data Analysis**:
- * Summary statistics
- * Data visualization for numerical feature distributions
- * Target splits for categorical features
-2. **Data cleaning and Preprocessing**:
- * Regrouping rare categories
- * Categorical feature encoding
- * Outlier clipping for numerical features
-3. **Feature engineering and selection**:
- * Combining original features based on domain knowledge
- * Discretizing numerical features
-4. **Modeling**:
- * Holdout dataset created or model testing
- * Models trained: Logistic Regression, Decision Tree, Random Forest, AdaBoost, HistGradient Boosting, Multi-Layer Perceptron
- * Class imbalance handled through:
- * class weights, when supported by model architecture
- * threshold tuning using TunedThresholdClassifierCV
- * Metric for model-tuning: F2-score (harmonic weighted mean of precision and recall, with twice the weightage for recall)
-5. **Result analysis**: confusion matrix using predictions made on holdout test set
-
-### PROJECT TRADE-OFFS AND SOLUTIONS
-
-**Accuracy vs Recall**
-Data is extremely imbalanced, with only ~8% representing the positive class. This makes accuracy unsuitable as a metric for our problem. It is critical to correctly predict all the positive samples, due to which we must focus on recall. However, this lowers the overall accuracy since some negative samples may be predicted as positive.
-
-* **Solution**: Prediction threshold for models is tuned using F2-score to create a balance between precision and recall, with more importance given to recall. This maintains overall accuracy at an acceptable level while boosting recall.
-
-### LIBRARIES NEEDED
-
-Libraries required for the project:
-
-* pandas
-* numpy
-* scikit-learn (>=1.5.0 for TunedThresholdClassifierCV)
-* matplotlib
-* seaborn
-* joblib
-
-### SCREENSHOTS
-
-**Numerical feature distributions**:
-
-
-
-
-
-
-
-
-**Correlations**:
-Pearson:
-
-
-Spearman's rank:
-
-
-Kendall-Tau:
-
-
-### MODELS USED AND THEIR ACCURACIES
-
-| Model + Feature set | Accuracy (%) | Recall (%) |
-|-------|----------|-----|
-| Logistic Regression + Original | 76.29 | 74.21 |
-| Logistic Regression + Extended | 76.27 | 74.41 |
-| Logistic Regression + Selected | 72.66 | 78.09 |
-| Decision Tree + Original | 72.76 | 78.61 |
-| Decision Tree + Extended | 74.09 | 76.69 |
-| Decision Tree + Selected | 75.52 | 73.61 |
-| Random Forest + Original | 73.97 | 77.33 |
-| Random Forest + Extended | 74.10 | 76.61 |
-| Random Forest + Selected | 74.80 | 74.05 |
-| AdaBoost + Original | 76.03 | 74.49 |
-| AdaBoost + Extended | 74.99 | 76.25 |
-| AdaBoost + Selected | 74.76 | 75.33 |
-| Multi-Layer Perceptron + Original | **76.91** | 72.81 |
-| **Multi-Layer Perceptron + Extended** | 73.26 | **79.01** |
-| Multi-Layer Perceptron + Selected | 74.86 | 75.05 |
-| Hist-Gradient Boosting + Original | 75.98 | 73.49 |
-| Hist-Gradient Boosting + Extended | 75.63 | 74.73 |
-| Hist-Gradient Boosting + Selected | 74.40 | 75.85 |
-
-### MODELS COMPARISON GRAPHS
-
-* **Logistic Regression**:
-  
-
-* **Decision Tree**:
-  
-
-* **Random Forest**:
-  
-
-* **AdaBoost**:
-  
-
-* **Multi-Layer Perceptron**:
-  
-
-* **Hist-Gradient Boosting**:
-  
-
-## CONCLUSION
-
-### WHAT YOU HAVE LEARNED
-
-* Insights gained from the data:
-General Health, Age and Co-morbities (such as Diabetes & Arthritis) are the most indicative features for CVD risk.
-* Improvements in understanding machine learning concepts:
-Learned and implemented the concept of predicting probability and tuning the prediction threshold for more accurate results, compared to directly predicting with the default thresold for models.
-* Challenges faced and how they were overcome:
-Deciding the correct metric for evaluation of models due to imbalanced nature of the dataset. Since positive class is more important, Recall was used as the final metric for ranking models. F2-score was used to tune the threshold for models to maintain a balance between precision and recall, thereby maintaining overall accuracy.
-
-### USE CASES OF THIS MODEL
-
-1. Doctors can use it as a second opinion when assessing a new patient. Model trained on cases from previous patients can be used to predict the risk.
-2. People (patients in particular) can use this tool to track the risk of CVD based on their own lifestyle factors and take preventive measures when the risk is high.
-
-### HOW TO INTEGRATE THIS MODEL IN REAL WORLD
-
-* The model uses data based on lifestyle factors without using any private identifiers.
-* A simple web interface using Streamlit can be used to input required data from the user.
-* Input data is preprocessed according to the steps taken before model training and the extended features are created.
-* Best model from the experiments i.e., "Multi-Layer Perceptron + Extended features" can be exported as a joblib file and loaded into Streamlit interface for inference.
-* Results on new data can be stored to monitor if the model maintains high recall, as intended in the experiments.
-
-### FEATURES PLANNED BUT NOT IMPLEMENTED
-
-* Different implementations of gradient-boosting models such as XGBoost, CatBoost, LightGBM, etc. were not implemented since none of the tree ensemble models such as Random Forest, AdaBoost or Hist-Gradient Boosting were among the best performers. Hence, avoid additional dependencies based on such models.
-
-### NAME
-
-**Siddhant Tiwari**
-
-[](https://www.linkedin.com/in/siddhant-tiwari-ds)
-
-#### Happy Coding 🧑💻
-
-### Show some ❤️ by 🌟 this repository!
diff --git a/docs/ML/projects/cardiovascular_disease_prediction/assets/cm_adaboost_extended.png b/docs/ML/projects/cardiovascular_disease_prediction/assets/cm_adaboost_extended.png
deleted file mode 100644
index e058c2ef..00000000
Binary files a/docs/ML/projects/cardiovascular_disease_prediction/assets/cm_adaboost_extended.png and /dev/null differ
diff --git a/docs/ML/projects/cardiovascular_disease_prediction/assets/cm_adaboost_original.png b/docs/ML/projects/cardiovascular_disease_prediction/assets/cm_adaboost_original.png
deleted file mode 100644
index f0f8ff12..00000000
Binary files a/docs/ML/projects/cardiovascular_disease_prediction/assets/cm_adaboost_original.png and /dev/null differ
diff --git a/docs/ML/projects/cardiovascular_disease_prediction/assets/cm_adaboost_selected.png b/docs/ML/projects/cardiovascular_disease_prediction/assets/cm_adaboost_selected.png
deleted file mode 100644
index 8f85eff3..00000000
Binary files a/docs/ML/projects/cardiovascular_disease_prediction/assets/cm_adaboost_selected.png and /dev/null differ
diff --git a/docs/ML/projects/cardiovascular_disease_prediction/assets/cm_decisiontree_extended.png b/docs/ML/projects/cardiovascular_disease_prediction/assets/cm_decisiontree_extended.png
deleted file mode 100644
index 7d075fb1..00000000
Binary files a/docs/ML/projects/cardiovascular_disease_prediction/assets/cm_decisiontree_extended.png and /dev/null differ
diff --git a/docs/ML/projects/cardiovascular_disease_prediction/assets/cm_decisiontree_original.png b/docs/ML/projects/cardiovascular_disease_prediction/assets/cm_decisiontree_original.png
deleted file mode 100644
index f615b5b8..00000000
Binary files a/docs/ML/projects/cardiovascular_disease_prediction/assets/cm_decisiontree_original.png and /dev/null differ
diff --git a/docs/ML/projects/cardiovascular_disease_prediction/assets/cm_decisiontree_selected.png b/docs/ML/projects/cardiovascular_disease_prediction/assets/cm_decisiontree_selected.png
deleted file mode 100644
index 77b91e1b..00000000
Binary files a/docs/ML/projects/cardiovascular_disease_prediction/assets/cm_decisiontree_selected.png and /dev/null differ
diff --git a/docs/ML/projects/cardiovascular_disease_prediction/assets/cm_histgradient_extended.png b/docs/ML/projects/cardiovascular_disease_prediction/assets/cm_histgradient_extended.png
deleted file mode 100644
index 79a4096c..00000000
Binary files a/docs/ML/projects/cardiovascular_disease_prediction/assets/cm_histgradient_extended.png and /dev/null differ
diff --git a/docs/ML/projects/cardiovascular_disease_prediction/assets/cm_histgradient_original.png b/docs/ML/projects/cardiovascular_disease_prediction/assets/cm_histgradient_original.png
deleted file mode 100644
index 487f3874..00000000
Binary files a/docs/ML/projects/cardiovascular_disease_prediction/assets/cm_histgradient_original.png and /dev/null differ
diff --git a/docs/ML/projects/cardiovascular_disease_prediction/assets/cm_histgradient_selected.png b/docs/ML/projects/cardiovascular_disease_prediction/assets/cm_histgradient_selected.png
deleted file mode 100644
index 1ac0af56..00000000
Binary files a/docs/ML/projects/cardiovascular_disease_prediction/assets/cm_histgradient_selected.png and /dev/null differ
diff --git a/docs/ML/projects/cardiovascular_disease_prediction/assets/cm_logistic_extended.png b/docs/ML/projects/cardiovascular_disease_prediction/assets/cm_logistic_extended.png
deleted file mode 100644
index 4ebfde6e..00000000
Binary files a/docs/ML/projects/cardiovascular_disease_prediction/assets/cm_logistic_extended.png and /dev/null differ
diff --git a/docs/ML/projects/cardiovascular_disease_prediction/assets/cm_logistic_original.png b/docs/ML/projects/cardiovascular_disease_prediction/assets/cm_logistic_original.png
deleted file mode 100644
index c39d0e4e..00000000
Binary files a/docs/ML/projects/cardiovascular_disease_prediction/assets/cm_logistic_original.png and /dev/null differ
diff --git a/docs/ML/projects/cardiovascular_disease_prediction/assets/cm_logistic_selected.png b/docs/ML/projects/cardiovascular_disease_prediction/assets/cm_logistic_selected.png
deleted file mode 100644
index bfc03a2e..00000000
Binary files a/docs/ML/projects/cardiovascular_disease_prediction/assets/cm_logistic_selected.png and /dev/null differ
diff --git a/docs/ML/projects/cardiovascular_disease_prediction/assets/cm_mlpnn_extended.png b/docs/ML/projects/cardiovascular_disease_prediction/assets/cm_mlpnn_extended.png
deleted file mode 100644
index 5cfa4b55..00000000
Binary files a/docs/ML/projects/cardiovascular_disease_prediction/assets/cm_mlpnn_extended.png and /dev/null differ
diff --git a/docs/ML/projects/cardiovascular_disease_prediction/assets/cm_mlpnn_original.png b/docs/ML/projects/cardiovascular_disease_prediction/assets/cm_mlpnn_original.png
deleted file mode 100644
index cb5bd3cc..00000000
Binary files a/docs/ML/projects/cardiovascular_disease_prediction/assets/cm_mlpnn_original.png and /dev/null differ
diff --git a/docs/ML/projects/cardiovascular_disease_prediction/assets/cm_mlpnn_selected.png b/docs/ML/projects/cardiovascular_disease_prediction/assets/cm_mlpnn_selected.png
deleted file mode 100644
index 41c8d08d..00000000
Binary files a/docs/ML/projects/cardiovascular_disease_prediction/assets/cm_mlpnn_selected.png and /dev/null differ
diff --git a/docs/ML/projects/cardiovascular_disease_prediction/assets/cm_randomforest_extended.png b/docs/ML/projects/cardiovascular_disease_prediction/assets/cm_randomforest_extended.png
deleted file mode 100644
index 7415fc8d..00000000
Binary files a/docs/ML/projects/cardiovascular_disease_prediction/assets/cm_randomforest_extended.png and /dev/null differ
diff --git a/docs/ML/projects/cardiovascular_disease_prediction/assets/cm_randomforest_original.png b/docs/ML/projects/cardiovascular_disease_prediction/assets/cm_randomforest_original.png
deleted file mode 100644
index 325241aa..00000000
Binary files a/docs/ML/projects/cardiovascular_disease_prediction/assets/cm_randomforest_original.png and /dev/null differ
diff --git a/docs/ML/projects/cardiovascular_disease_prediction/assets/cm_randomforest_selected.png b/docs/ML/projects/cardiovascular_disease_prediction/assets/cm_randomforest_selected.png
deleted file mode 100644
index d49df9d1..00000000
Binary files a/docs/ML/projects/cardiovascular_disease_prediction/assets/cm_randomforest_selected.png and /dev/null differ
diff --git a/docs/ML/projects/cardiovascular_disease_prediction/assets/dist_alcohol.png b/docs/ML/projects/cardiovascular_disease_prediction/assets/dist_alcohol.png
deleted file mode 100644
index 24281ab1..00000000
Binary files a/docs/ML/projects/cardiovascular_disease_prediction/assets/dist_alcohol.png and /dev/null differ
diff --git a/docs/ML/projects/cardiovascular_disease_prediction/assets/dist_bmi.png b/docs/ML/projects/cardiovascular_disease_prediction/assets/dist_bmi.png
deleted file mode 100644
index 1b9e1f72..00000000
Binary files a/docs/ML/projects/cardiovascular_disease_prediction/assets/dist_bmi.png and /dev/null differ
diff --git a/docs/ML/projects/cardiovascular_disease_prediction/assets/dist_friedpotato.png b/docs/ML/projects/cardiovascular_disease_prediction/assets/dist_friedpotato.png
deleted file mode 100644
index d4eabaef..00000000
Binary files a/docs/ML/projects/cardiovascular_disease_prediction/assets/dist_friedpotato.png and /dev/null differ
diff --git a/docs/ML/projects/cardiovascular_disease_prediction/assets/dist_fruit.png b/docs/ML/projects/cardiovascular_disease_prediction/assets/dist_fruit.png
deleted file mode 100644
index d94eb04b..00000000
Binary files a/docs/ML/projects/cardiovascular_disease_prediction/assets/dist_fruit.png and /dev/null differ
diff --git a/docs/ML/projects/cardiovascular_disease_prediction/assets/dist_height.png b/docs/ML/projects/cardiovascular_disease_prediction/assets/dist_height.png
deleted file mode 100644
index 98d6ba05..00000000
Binary files a/docs/ML/projects/cardiovascular_disease_prediction/assets/dist_height.png and /dev/null differ
diff --git a/docs/ML/projects/cardiovascular_disease_prediction/assets/dist_vegetable.png b/docs/ML/projects/cardiovascular_disease_prediction/assets/dist_vegetable.png
deleted file mode 100644
index c2e2c1db..00000000
Binary files a/docs/ML/projects/cardiovascular_disease_prediction/assets/dist_vegetable.png and /dev/null differ
diff --git a/docs/ML/projects/cardiovascular_disease_prediction/assets/dist_weight.png b/docs/ML/projects/cardiovascular_disease_prediction/assets/dist_weight.png
deleted file mode 100644
index fe8f924d..00000000
Binary files a/docs/ML/projects/cardiovascular_disease_prediction/assets/dist_weight.png and /dev/null differ
diff --git a/docs/ML/projects/cardiovascular_disease_prediction/assets/kendall_correlation.png b/docs/ML/projects/cardiovascular_disease_prediction/assets/kendall_correlation.png
deleted file mode 100644
index a7495993..00000000
Binary files a/docs/ML/projects/cardiovascular_disease_prediction/assets/kendall_correlation.png and /dev/null differ
diff --git a/docs/ML/projects/cardiovascular_disease_prediction/assets/pearson_correlation.png b/docs/ML/projects/cardiovascular_disease_prediction/assets/pearson_correlation.png
deleted file mode 100644
index 7b3847d9..00000000
Binary files a/docs/ML/projects/cardiovascular_disease_prediction/assets/pearson_correlation.png and /dev/null differ
diff --git a/docs/ML/projects/cardiovascular_disease_prediction/assets/spearman_correlation.png b/docs/ML/projects/cardiovascular_disease_prediction/assets/spearman_correlation.png
deleted file mode 100644
index 361f0b4f..00000000
Binary files a/docs/ML/projects/cardiovascular_disease_prediction/assets/spearman_correlation.png and /dev/null differ
diff --git a/docs/ML/projects/health_insurance_cross_sell_prediction.md b/docs/ML/projects/health_insurance_cross_sell_prediction.md
new file mode 100644
index 00000000..b81a28d6
--- /dev/null
+++ b/docs/ML/projects/health_insurance_cross_sell_prediction.md
@@ -0,0 +1,305 @@
+# Health Insurance Cross-Sell Prediction
+
+### AIM
+
+To predict whether a Health Insurance customer would be interested in buying Vehicle Insurance.
+
+### DATASET LINK
+
+[https://www.kaggle.com/datasets/anmolkumar/health-insurance-cross-sell-prediction](https://www.kaggle.com/datasets/anmolkumar/health-insurance-cross-sell-prediction)
+
+### MY NOTEBOOK LINK
+
+[https://www.kaggle.com/code/sid4ds/insurance-cross-sell-prediction-eda-modeling](https://www.kaggle.com/code/sid4ds/insurance-cross-sell-prediction-eda-modeling)
+
+### LIBRARIES NEEDED
+
+??? quote "LIBRARIES USED"
+
+ - pandas
+ - numpy
+ - scikit-learn (>=1.5.0 for TunedThresholdClassifierCV)
+ - xgboost
+ - catboost
+ - lightgbm
+ - matplotlib
+ - seaborn
+ - joblib
+
+---
+
+### DESCRIPTION
+
+!!! info "Why is it necessary?"
+    - This project aims to predict the likelihood of cross-selling Vehicle insurance to existing Health insurance customers. This is extremely helpful for companies, which can plan their communication strategy accordingly to reach out to those customers and optimise their business model and revenue.
+
+??? info "How did you start approaching this project? (Initial thoughts and planning)"
+ - Going through previous research and articles related to the problem.
+ - Data exploration to understand the features. Using data visualization to check their distributions.
+    - Identifying key metrics for the problem based on the ratio of target classes: ROC-AUC & Matthews Correlation Coefficient (MCC) instead of Accuracy.
+
+??? info "Mention any additional resources used (blogs, books, chapters, articles, research papers, etc.)."
+ - Feature Engineering: [Tutorial notebook](https://www.kaggle.com/code/milankalkenings/feature-engineering-tutorial)
+ - Public notebook: [Vehicle Insurance EDA and boosting models](https://www.kaggle.com/code/yashvi/vehicle-insurance-eda-and-boosting-models)
+
+---
+
+### EXPLANATION
+
+#### DETAILS OF THE DIFFERENT FEATURES
+
+| **Feature Name** | **Description** | **Type**| **Values/Range** |
+|------------------|-----------------|---------|------------------|
+| id| Unique ID for the customer | Numerical | Unique numerical values |
+| Gender | Binary gender of the customer | Binary | [0: Male, 1: Female] |
+| Age | Numerical age of the customer | Numerical | Measured in years|
+| Driving_License | Indicates if the customer has a Driving License| Binary | [0: No, 1: Yes] |
+| Region_Code | Unique code for the customer's region | Numerical | Unique numerical values |
+| Previously_Insured| Indicates if the customer already has Vehicle Insurance| Binary | [0: No, 1: Yes] |
+| Vehicle_Age | Age of the vehicle categorized as ordinal values | Categorical | [< 1 year, 1-2 years, > 2 years] |
+| Vehicle_Damage | Indicates if the vehicle was damaged in the past | Binary | [0: No, 1: Yes] |
+| Annual_Premium | Amount to be paid as premium over the year | Numerical | Measured in currency |
+| Policy_Sales_Channel | Anonymized code for the channel of customer outreach | Numerical | Unique numerical codes representing various channels |
+| Vintage | Number of days the customer has been associated with the company | Numerical | Measured in days |
+
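The binary and ordinal features in the table above can be encoded with plain mappings. A minimal sketch, assuming the category strings shown (the exact spellings should be checked against the dataset):

```python
import pandas as pd

# Hypothetical rows mirroring the schema above; values are illustrative only.
df = pd.DataFrame({
    "Gender": ["Male", "Female"],
    "Vehicle_Age": ["< 1 Year", "> 2 Years"],
    "Vehicle_Damage": ["Yes", "No"],
})

# Binary features map to {0, 1}; the ordinal Vehicle_Age maps to ordered
# integers instead of being one-hot encoded, preserving its natural order.
df["Gender"] = df["Gender"].map({"Male": 0, "Female": 1})
df["Vehicle_Damage"] = df["Vehicle_Damage"].map({"No": 0, "Yes": 1})
df["Vehicle_Age"] = df["Vehicle_Age"].map(
    {"< 1 Year": 0, "1-2 Year": 1, "> 2 Years": 2}
)
```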
+---
+
+#### WHAT I HAVE DONE
+
+=== "Step 1"
+
+ Exploratory Data Analysis
+
+ - Summary statistics
+ - Data visualization for numerical feature distributions
+ - Target splits for categorical features
+
+=== "Step 2"
+
+ Data cleaning and Preprocessing
+
+ - Removing duplicates
+ - Categorical feature encoding
+
+=== "Step 3"
+
+ Feature engineering and selection
+
+ - Discretizing numerical features
+ - Feature selection based on model-based feature importances and statistical tests.
+
+=== "Step 4"
+
+ Modeling
+
+    - Holdout dataset created for model testing
+ - Setting up a framework for easier testing of multiple models.
+ - Models trained: Logistic Regression, Linear Discriminant Analysis, Quadratic Discriminant Analysis, Gaussian Naive-Bayes, Decision Tree, Random Forest, AdaBoost, Multi-Layer Perceptron, XGBoost, CatBoost, LightGBM
+ - Class imbalance handled through:
+ - Class weights, when supported by model architecture
+ - Threshold tuning using TunedThresholdClassifierCV
+    - Metric for model-tuning: F1-score (harmonic mean of precision and recall)
+
+=== "Step 5"
+
+ Result analysis
+
+ - Predictions made on holdout test set
+ - Models compared based on classification report and chosen metrics: ROC-AUC and MCC.
+
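The steps above can be sketched as a small comparison framework. This is a sketch on synthetic data with only two of the listed models; the real notebook trains the full model list on the actual dataset:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import matthews_corrcoef, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in data with a heavily imbalanced target (~12% positive class).
X, y = make_classification(
    n_samples=2000, n_features=8, weights=[0.88], random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Class weights handle imbalance where the model architecture supports them.
models = {
    "logreg": LogisticRegression(class_weight="balanced", max_iter=1000),
    "tree": DecisionTreeClassifier(class_weight="balanced", max_depth=5),
}
scores = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    scores[name] = {
        "roc_auc": roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]),
        "mcc": matthews_corrcoef(y_te, model.predict(X_te)),
    }
```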
+---
+
+#### PROJECT TRADE-OFFS AND SOLUTIONS
+
+=== "Trade Off 1"
+ Accuracy vs Recall & Precision
+
+    Data is heavily imbalanced, with only ~12% of samples in the positive class. This makes accuracy unsuitable as a metric for this problem. The goal is to correctly identify all positive samples, so recall is the priority. However, raising recall lowers overall accuracy, since some negative samples get predicted as positive.
+
+ - **Solution**: Prediction threshold for models is tuned using F1-score to create a balance between precision and recall. This maintains overall accuracy at an acceptable level while boosting recall.
+
+---
+
+#### SCREENSHOTS
+
+!!! success "Project workflow"
+
+    ``` mermaid
+    graph LR
+      A[EDA] --> B[Data cleaning and preprocessing];
+      B --> C[Feature engineering and selection];
+      C --> D[Model training and threshold tuning];
+      D --> E[Evaluation with ROC-AUC and MCC];
+    ```
+
+??? tip "Feature distributions (Univariate Analysis)"
+
+ === "Age"
+ 
+
+ === "License"
+ 
+
+ === "Region Code"
+ 
+
+ === "Previously Insured"
+ 
+
+    === "Vehicle Age"
+ 
+
+    === "Vehicle Damage"
+ 
+
+ === "Annual Premium"
+ 
+
+ === "Policy Channel"
+ 
+
+ === "Vintage"
+ 
+
+??? tip "Engineered Features"
+
+ === "Age Group"
+ 
+
+ === "Policy Group"
+ 
+
+??? tip "Feature Distributions (Bivariate Analysis)"
+
+ === "Pair Plots"
+ 
+
+ === "Spearman-Rank Correlation"
+ 
+
+ === "Point-Biserial Correlation"
+ 
+
+ === "Tetrachoric Correlation"
+ 
+
+??? tip "Feature Selection"
+
+ === "Point-Biserial Correlation"
+ 
+
+ === "ANOVA F-Test"
+ 
+
+ === "Tetrachoric Correlation"
+ 
+
+ === "Chi-Squared Test of Independence"
+ 
+
+ === "Mutual Information"
+ 
+
+ === "XGBoost Feature Importances"
+ 
+
+ === "Extra Trees Feature Importances"
+ 
+
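The engineered *Age Group* and *Policy Group* features shown above come from discretizing numerical features. A sketch with `pd.cut`; the bin edges and labels here are illustrative assumptions, not the notebook's exact bins:

```python
import pandas as pd

# Hypothetical customer ages; real values come from the Age column.
age = pd.Series([21, 27, 45, 62], name="Age")

# Discretize into ordered groups; right edges are inclusive by default.
age_group = pd.cut(
    age,
    bins=[0, 25, 40, 60, 100],
    labels=["<=25", "26-40", "41-60", "60+"],
)
print(age_group.tolist())
```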
+---
+
+#### MODELS USED AND THEIR PERFORMANCE
+
+The best threshold found during threshold tuning is also listed for each model.
+
+| Model + Feature set | ROC-AUC | MCC | Best threshold |
+|:-------|:----------:|:-----:|:-----:|
+| Logistic Regression + Original | 0.8336 | 0.3671 | 0.65 |
+| Logistic Regression + Extended | 0.8456 | 0.3821 | 0.66 |
+| Logistic Regression + Reduced | 0.8455 | 0.3792 | 0.67 |
+| Logistic Regression + Minimal | 0.8177 | 0.3507 | 0.60 |
+| Linear DA + Original | 0.8326 | 0.3584 | 0.19 |
+| Linear DA + Extended | 0.8423 | 0.3785 | 0.18 |
+| Linear DA + Reduced | 0.8421 | 0.3768 | 0.18 |
+| Linear DA + Minimal | 0.8185 | 0.3473 | 0.15 |
+| Quadratic DA + Original | 0.8353 | 0.3779 | 0.45 |
+| Quadratic DA + Extended | 0.8418 | 0.3793 | 0.54 |
+| Quadratic DA + Reduced | 0.8422 | 0.3807 | 0.44 |
+| Quadratic DA + Minimal | 0.8212 | 0.3587 | 0.28 |
+| Gaussian Naive Bayes + Original | 0.8230 | 0.3879 | 0.78 |
+| Gaussian Naive Bayes + Extended | 0.8242 | 0.3914 | 0.13 |
+| Gaussian Naive Bayes + Reduced | 0.8240 | 0.3908 | 0.08 |
+| Gaussian Naive Bayes + Minimal | 0.8055 | 0.3605 | 0.15 |
+| K-Nearest Neighbors + Original | 0.7819 | 0.3461 | 0.09 |
+| K-Nearest Neighbors + Extended | 0.7825 | 0.3469 | 0.09 |
+| K-Nearest Neighbors + Reduced | 0.7710 | 0.3405 | 0.01 |
+| K-Nearest Neighbors + Minimal | 0.7561 | 0.3201 | 0.01 |
+| Decision Tree + Original | 0.8420 | 0.3925 | 0.67 |
+| Decision Tree + Extended | 0.8420 | 0.3925 | 0.67 |
+| Decision Tree + Reduced | 0.8419 | 0.3925 | 0.67 |
+| Decision Tree + Minimal | 0.8262 | 0.3683 | 0.63 |
+| Random Forest + Original | 0.8505 | 0.3824 | 0.70 |
+| Random Forest + Extended | 0.8508 | 0.3832 | 0.70 |
+| Random Forest + Reduced | 0.8508 | 0.3820 | 0.71 |
+| Random Forest + Minimal | 0.8375 | 0.3721 | 0.66 |
+| Extra-Trees + Original | 0.8459 | 0.3770 | 0.70 |
+| Extra-Trees + Extended | 0.8504 | 0.3847 | 0.71 |
+| Extra-Trees + Reduced | 0.8515 | 0.3836 | 0.72 |
+| Extra-Trees + Minimal | 0.8337 | 0.3682 | 0.67 |
+| AdaBoost + Original | 0.8394 | 0.3894 | 0.83 |
+| AdaBoost + Extended | 0.8394 | 0.3894 | 0.83 |
+| AdaBoost + Reduced | 0.8404 | 0.3839 | 0.84 |
+| AdaBoost + Minimal | 0.8269 | 0.3643 | 0.86 |
+| Multi-Layer Perceptron + Original | 0.8512 | 0.3899 | 0.22 |
+| Multi-Layer Perceptron + Extended | 0.8528 | 0.3865 | 0.23 |
+| Multi-Layer Perceptron + Reduced | 0.8517 | 0.3892 | 0.23 |
+| Multi-Layer Perceptron + Minimal | 0.8365 | 0.3663 | 0.21 |
+| XGBoost + Original | 0.8585 | 0.3980 | 0.68 |
+| XGBoost + Extended | 0.8585 | 0.3980 | 0.68 |
+| XGBoost + Reduced | 0.8584 | 0.3967 | 0.68 |
+| XGBoost + Minimal | 0.8459 | 0.3765 | 0.66 |
+| CatBoost + Original | 0.8579 | 0.3951 | 0.46 |
+| CatBoost + Extended | 0.8578 | 0.3981 | 0.45 |
+| CatBoost + Reduced | 0.8577 | 0.3975 | 0.45 |
+| CatBoost + Minimal | 0.8449 | 0.3781 | 0.42 |
+| LightGBM + Original | 0.8587 | 0.3978 | 0.67 |
+| LightGBM + Extended | 0.8587 | 0.3976 | 0.67 |
+| **LightGBM + Reduced** | **0.8587** | **0.3983** | 0.67 |
+| LightGBM + Minimal | 0.8462 | 0.3753 | 0.66 |
+
+---
+
+### CONCLUSION
+
+#### WHAT YOU HAVE LEARNED
+
+!!! tip "Insights gained from the data"
+ 1. *Previously_Insured*, *Vehicle_Damage*, *Policy_Sales_Channel* and *Age* are the most informative features for predicting cross-sell probability.
+ 2. *Vintage* and *Driving_License* have no predictive power. They are not included in the best model.
+
+??? tip "Improvements in understanding machine learning concepts"
+ 1. Implemented threshold-tuning for more accurate results.
+ 2. Researched and utilized statistical tests for feature selection.
+
+??? tip "Challenges faced and how they were overcome"
+    1. Shortlisting the appropriate statistical tests for bivariate analysis and feature selection.
+    2. Deciding the correct metrics for model evaluation, given the imbalanced nature of the dataset: F1-score was used for threshold-tuning, while ROC-AUC and MCC were used for model comparison.
+
+---
+
+#### USE CASES OF THIS MODEL
+
+=== "Application 1"
+
+ - Companies can use customer data to predict which customers to target for cross-sell marketing. This saves cost and effort for the company, and protects uninterested customers from unnecessary marketing calls.
+
+---
+
+#### FEATURES PLANNED BUT NOT IMPLEMENTED
+
+=== "Feature 1"
+
+ - Complex model-ensembling through stacking or hill-climbing was not implemented due to significantly longer training time.
+
diff --git a/docs/ML/projects/insurance_cross_sell_prediction/README.md b/docs/ML/projects/insurance_cross_sell_prediction/README.md
deleted file mode 100644
index 43e05ec5..00000000
--- a/docs/ML/projects/insurance_cross_sell_prediction/README.md
+++ /dev/null
@@ -1,235 +0,0 @@
-# Health Insurance Cross-sell Prediction
-
-## AIM
-
-To predict whether a Health Insurance customer would be interested in buying Vehicle Insurance.
-
-## DATASET LINK
-
-[Health Insurance Cross Sell Prediction Dataset - Kaggle](https://www.kaggle.com/datasets/anmolkumar/health-insurance-cross-sell-prediction)
-
-## MY NOTEBOOK LINK
-
-[Insurance Cross-sell Prediction: EDA + Modeling](https://www.kaggle.com/code/sid4ds/insurance-cross-sell-prediction-eda-modeling)
-
-## DESCRIPTION
-
-* Why is the project necessary?
-This project aims to predict the chances of cross-selling Vehicle insurance to existing Health insurance customers. This would be extremely helpful for companies because they can then accordingly plan communication strategy to reach out to those customers and optimise their business model and revenue.
-
-* How did you start approaching this project? (Initial thoughts and planning)
- * Going through previous research and articles related to the problem.
- * Data exploration to understand the features. Using data visualization to check their distributions.
- * Identifying key metrics for the problem based on ratio of target classes - ROC-AUC & Matthew's Correlation Coefficient (MCC) instead of Accuracy.
-
-* Mention any additional resources used (blogs, books, chapters, articles, research papers, etc.).
- * Feature Engineering: [Tutorial notebook](https://www.kaggle.com/code/milankalkenings/feature-engineering-tutorial)
- * Public notebook: [Vehicle Insurance EDA and boosting models](https://www.kaggle.com/code/yashvi/vehicle-insurance-eda-and-boosting-models)
-
-## EXPLANATION
-
-### DETAILS OF THE DIFFERENT FEATURES
-
-1. id: numerical unique ID for the customer
-2. Gender: binary gender of the customer
-3. Age: numerical age of the customer
-4. Driving_License: binary - 0: no DL, 1: has DL
-5. Region_Code: numerical unique code for the region of the customer
-6. Previously_Insured: binary - 1: already has Vehicle Insurance, 0: doesn't have Vehicle Insurance
-7. Vehicle_Age: ordinal categorical age of the vehicle
-8. Vehicle_Damage: binary - 1: vehicle damaged in the past, 0: vehicle not damaged in the past.
-9. Annual_Premium: numerical amount to be paid as premium over the year
-10. Policy_Sales_Channel: numerical anonymized code for the channel of outreaching to the customer ie. Different Agents, Over Mail, Over Phone, In Person, etc.
-11. Vintage: numerical number of days the Customer has been associated with the company
-
-### WHAT I HAVE DONE
-
-1. **Exploratory Data Analysis**:
- * Summary statistics
- * Data visualization for numerical feature distributions
- * Target splits for categorical features
-2. **Data cleaning and Preprocessing**:
- * Removing duplicates
- * Categorical feature encoding
-3. **Feature engineering and selection**:
- * Discretizing numerical features
- * Feature selection based on model-based feature importances and statistical tests.
-4. **Modeling**:
- * Holdout dataset created or model testing
- * Setting up a framework for easier testing of multiple models.
- * Models trained: Logistic Regression, Linear Discriminant Analysis, Quadratic Discriminant Analysis, Gaussian Naive-Bayes, Decision Tree, Random Forest, AdaBoost, Multi-Layer Perceptron, XGBoost, CatBoost, LightGBM
- * Class imbalance handled through:
- * class weights, when supported by model architecture
- * threshold tuning using TunedThresholdClassifierCV
- * Metric for model-tuning: F1-score (harmonic weighted mean of precision and recall)
-5. **Result analysis**:
- * Predictions made on holdout test set
- * Models compared based on classification report and chosen metrics: ROC-AUC and MCC.
-
-### PROJECT TRADE-OFFS AND SOLUTIONS
-
-**Accuracy vs Recall & Precision**
-Data is heavily imbalanced, with only ~12% representing the positive class. This makes accuracy unsuitable as a metric for our problem. Our goal is to correctly predict all the positive samples, due to which we must focus on recall. However, this lowers the overall accuracy since some negative samples may be predicted as positive.
-
-* **Solution**: Prediction threshold for models is tuned using F1-score to create a balance between precision and recall. This maintains overall accuracy at an acceptable level while boosting recall.
-
-### LIBRARIES NEEDED
-
-Libraries required for the project:
-
-* pandas
-* numpy
-* scikit-learn (>=1.5.0 for TunedThresholdClassifierCV)
-* xgboost
-* catboost
-* lightgbm
-* matplotlib
-* seaborn
-* joblib
-
-### SCREENSHOTS
-
-**Feature distributions (Univariate analysis)**:
-
-
-
-
-
-
-
-
-
-
-**Engineered features**:
-
-
-
-**Bivariate analysis**:
-Pairplots:
-
-Spearman-rank correlation:
-
-Point-Biserial correlation:
-
-Tetrachoric correlation:
-
-
-**Feature selection**:
-Point-biserial correlation:
-
-ANOVA F-test:
-
-Tetrachoric correlation:
-
-Chi-squared test of independence:
-
-Mutual information:
-
-XGBoost feature importances:
-
-ExtraTrees feature importances:
-
-
-### MODELS USED AND THEIR PERFORMANCE
-
-(best threshold after thresholdtuning is also mentioned)
-
-| Model + Feature set | ROC-AUC | MCC | Best threshold |
-|:-------|:----------:|:-----:|:-----:|
-| Logistic Regression + Original | 0.8336 | 0.3671 | 0.65 |
-| Logistic Regression + Extended | 0.8456 | 0.3821 | 0.66 |
-| Logistic Regression + Reduced | 0.8455 | 0.3792 | 0.67 |
-| Logistic Regression + Minimal | 0.8177 | 0.3507 | 0.60 |
-| Linear DA + Original | 0.8326 | 0.3584 | 0.19 |
-| Linear DA + Extended | 0.8423 | 0.3785 | 0.18 |
-| Linear DA + Reduced | 0.8421 | 0.3768 | 0.18 |
-| Linear DA + Minimal | 0.8185 | 0.3473 | 0.15 |
-| Quadratic DA + Original | 0.8353 | 0.3779 | 0.45 |
-| Quadratic DA + Extended | 0.8418 | 0.3793 | 0.54 |
-| Quadratic DA + Reduced | 0.8422 | 0.3807 | 0.44 |
-| Quadratic DA + Minimal | 0.8212 | 0.3587 | 0.28 |
-| Gaussian Naive Bayes + Original | 0.8230 | 0.3879 | 0.78 |
-| Gaussian Naive Bayes + Extended | 0.8242 | 0.3914 | 0.13 |
-| Gaussian Naive Bayes + Reduced | 0.8240 | 0.3908 | 0.08 |
-| Gaussian Naive Bayes + Minimal | 0.8055 | 0.3605 | 0.15 |
-| K-Nearest Neighbors + Original | 0.7819 | 0.3461 | 0.09 |
-| K-Nearest Neighbors + Extended | 0.7825 | 0.3469 | 0.09 |
-| K-Nearest Neighbors + Reduced | 0.7710 | 0.3405 | 0.01 |
-| K-Nearest Neighbors + Minimal | 0.7561 | 0.3201 | 0.01 |
-| Decision Tree + Original | 0.8420 | 0.3925 | 0.67 |
-| Decision Tree + Extended | 0.8420 | 0.3925 | 0.67 |
-| Decision Tree + Reduced | 0.8419 | 0.3925 | 0.67 |
-| Decision Tree + Minimal | 0.8262 | 0.3683 | 0.63 |
-| Random Forest + Original | 0.8505 | 0.3824 | 0.70 |
-| Random Forest + Extended | 0.8508 | 0.3832 | 0.70 |
-| Random Forest + Reduced | 0.8508 | 0.3820 | 0.71 |
-| Random Forest + Minimal | 0.8375 | 0.3721 | 0.66 |
-| Extra-Trees + Original | 0.8459 | 0.3770 | 0.70 |
-| Extra-Trees + Extended | 0.8504 | 0.3847 | 0.71 |
-| Extra-Trees + Reduced | 0.8515 | 0.3836 | 0.72 |
-| Extra-Trees + Minimal | 0.8337 | 0.3682 | 0.67 |
-| AdaBoost + Original | 0.8394 | 0.3894 | 0.83 |
-| AdaBoost + Extended | 0.8394 | 0.3894 | 0.83 |
-| AdaBoost + Reduced | 0.8404 | 0.3839 | 0.84 |
-| AdaBoost + Minimal | 0.8269 | 0.3643 | 0.86 |
-| Multi-Layer Perceptron + Original | 0.8512 | 0.3899 | 0.22 |
-| Multi-Layer Perceptron + Extended | 0.8528 | 0.3865 | 0.23 |
-| Multi-Layer Perceptron + Reduced | 0.8517 | 0.3892 | 0.23 |
-| Multi-Layer Perceptron + Minimal | 0.8365 | 0.3663 | 0.21 |
-| XGBoost + Original | 0.8585 | 0.3980 | 0.68 |
-| XGBoost + Extended | 0.8585 | 0.3980 | 0.68 |
-| XGBoost + Reduced | 0.8584 | 0.3967 | 0.68 |
-| XGBoost + Minimal | 0.8459 | 0.3765 | 0.66 |
-| CatBoost + Original | 0.8579 | 0.3951 | 0.46 |
-| CatBoost + Extended | 0.8578 | 0.3981 | 0.45 |
-| CatBoost + Reduced | 0.8577 | 0.3975 | 0.45 |
-| CatBoost + Minimal | 0.8449 | 0.3781 | 0.42 |
-| LightGBM + Original | 0.8587 | 0.3978 | 0.67 |
-| LightGBM + Extended | 0.8587 | 0.3976 | 0.67 |
-| **LightGBM + Reduced** | **0.8587** | **0.3983** | 0.67 |
-| LightGBM + Minimal | 0.8462 | 0.3753 | 0.66 |
-
-## CONCLUSION
-
-### WHAT YOU HAVE LEARNED
-
-* Insights gained from the data:
-
-1. *Previously_Insured*, *Vehicle_Damage*, *Policy_Sales_Channel* and *Age* are the most informative features for predicting cross-sell probability.
-2. *Vintage* and *Driving_License* have no predictive power. They are not included in the best model.
-
-* Improvements in understanding machine learning concepts:
-
-1. Implemented threshold-tuning for more accurate results.
-2. Researched and utilized statistical tests for feature selection.
-
-* Challenges faced and how they were overcome:
-
-1. Shortlisting the apropriate statistical test for bivariate analysis and feature selection.
-2. Deciding the correct metric for evaluation of models due to imbalanced nature of the dataset. F1-score was used for threshold-tuning. ROC-AUC score and MCC were used for model comparison.
-
-### USE CASES OF THIS MODEL
-
-Companies can use customer data to predict which customers to target for cross-sell marketing. This saves cost and effort for the company, and protects uninterested customers from unnecessary marketing calls.
-
-### HOW TO INTEGRATE THIS MODEL IN REAL WORLD
-
-* The model uses data based on customer data without any private identifiers.
-* Since *Vintage* was eliminated from the feature set, even new customers can be analyzed through this model.
-* Input data is preprocessed according to the steps taken before model training and the extended features are created.
-* The best model: *LightGBM + Reduced-features* can be used for inference.
-* Results on new data can be stored to monitor if the model maintains performance level. Once a significant number of new customers have been processed, system can be evaluated for model-drift and retrained if required.
-
-### FEATURES PLANNED BUT NOT IMPLEMENTED
-
-* Complex model-ensembling through stacking or hill-climbing was not implemented due to significantly longer training time.
-
-### NAME
-
-**Siddhant Tiwari**
-
-[](https://www.linkedin.com/in/siddhant-tiwari-ds)
-
-#### Happy Coding 🧑💻
-
-### Show some ❤️ by 🌟 this repository!
diff --git a/docs/ML/projects/insurance_cross_sell_prediction/assets/01_featdist_age.png b/docs/ML/projects/insurance_cross_sell_prediction/assets/01_featdist_age.png
deleted file mode 100644
index eba7b073..00000000
Binary files a/docs/ML/projects/insurance_cross_sell_prediction/assets/01_featdist_age.png and /dev/null differ
diff --git a/docs/ML/projects/insurance_cross_sell_prediction/assets/02_featdist_license.png b/docs/ML/projects/insurance_cross_sell_prediction/assets/02_featdist_license.png
deleted file mode 100644
index a4b74c75..00000000
Binary files a/docs/ML/projects/insurance_cross_sell_prediction/assets/02_featdist_license.png and /dev/null differ
diff --git a/docs/ML/projects/insurance_cross_sell_prediction/assets/03_featdist_regioncode.png b/docs/ML/projects/insurance_cross_sell_prediction/assets/03_featdist_regioncode.png
deleted file mode 100644
index fd6d4ad7..00000000
Binary files a/docs/ML/projects/insurance_cross_sell_prediction/assets/03_featdist_regioncode.png and /dev/null differ
diff --git a/docs/ML/projects/insurance_cross_sell_prediction/assets/04_featdist_previnsured.png b/docs/ML/projects/insurance_cross_sell_prediction/assets/04_featdist_previnsured.png
deleted file mode 100644
index bbed142e..00000000
Binary files a/docs/ML/projects/insurance_cross_sell_prediction/assets/04_featdist_previnsured.png and /dev/null differ
diff --git a/docs/ML/projects/insurance_cross_sell_prediction/assets/05_featdist_vehicleage.png b/docs/ML/projects/insurance_cross_sell_prediction/assets/05_featdist_vehicleage.png
deleted file mode 100644
index a1d1a3ee..00000000
Binary files a/docs/ML/projects/insurance_cross_sell_prediction/assets/05_featdist_vehicleage.png and /dev/null differ
diff --git a/docs/ML/projects/insurance_cross_sell_prediction/assets/06_featdist_vehicledamage.png b/docs/ML/projects/insurance_cross_sell_prediction/assets/06_featdist_vehicledamage.png
deleted file mode 100644
index 16c393d4..00000000
Binary files a/docs/ML/projects/insurance_cross_sell_prediction/assets/06_featdist_vehicledamage.png and /dev/null differ
diff --git a/docs/ML/projects/insurance_cross_sell_prediction/assets/07_featdist_premium.png b/docs/ML/projects/insurance_cross_sell_prediction/assets/07_featdist_premium.png
deleted file mode 100644
index 75b087a4..00000000
Binary files a/docs/ML/projects/insurance_cross_sell_prediction/assets/07_featdist_premium.png and /dev/null differ
diff --git a/docs/ML/projects/insurance_cross_sell_prediction/assets/08_featdist_policychannel.png b/docs/ML/projects/insurance_cross_sell_prediction/assets/08_featdist_policychannel.png
deleted file mode 100644
index 9d02cff3..00000000
Binary files a/docs/ML/projects/insurance_cross_sell_prediction/assets/08_featdist_policychannel.png and /dev/null differ
diff --git a/docs/ML/projects/insurance_cross_sell_prediction/assets/09_featdist_vintage.png b/docs/ML/projects/insurance_cross_sell_prediction/assets/09_featdist_vintage.png
deleted file mode 100644
index 71a7c690..00000000
Binary files a/docs/ML/projects/insurance_cross_sell_prediction/assets/09_featdist_vintage.png and /dev/null differ
diff --git a/docs/ML/projects/insurance_cross_sell_prediction/assets/10_featengg_agegroup.png b/docs/ML/projects/insurance_cross_sell_prediction/assets/10_featengg_agegroup.png
deleted file mode 100644
index d890b53a..00000000
Binary files a/docs/ML/projects/insurance_cross_sell_prediction/assets/10_featengg_agegroup.png and /dev/null differ
diff --git a/docs/ML/projects/insurance_cross_sell_prediction/assets/11_featengg_policygroup.png b/docs/ML/projects/insurance_cross_sell_prediction/assets/11_featengg_policygroup.png
deleted file mode 100644
index 798ab95a..00000000
Binary files a/docs/ML/projects/insurance_cross_sell_prediction/assets/11_featengg_policygroup.png and /dev/null differ
diff --git a/docs/ML/projects/insurance_cross_sell_prediction/assets/12_bivariate_pairplots.png b/docs/ML/projects/insurance_cross_sell_prediction/assets/12_bivariate_pairplots.png
deleted file mode 100644
index b6a1692f..00000000
Binary files a/docs/ML/projects/insurance_cross_sell_prediction/assets/12_bivariate_pairplots.png and /dev/null differ
diff --git a/docs/ML/projects/insurance_cross_sell_prediction/assets/13_bivariate_spearmancorr.png b/docs/ML/projects/insurance_cross_sell_prediction/assets/13_bivariate_spearmancorr.png
deleted file mode 100644
index 5342d7b8..00000000
Binary files a/docs/ML/projects/insurance_cross_sell_prediction/assets/13_bivariate_spearmancorr.png and /dev/null differ
diff --git a/docs/ML/projects/insurance_cross_sell_prediction/assets/14_bivariate_pointbiserial.png b/docs/ML/projects/insurance_cross_sell_prediction/assets/14_bivariate_pointbiserial.png
deleted file mode 100644
index 6e82d7c2..00000000
Binary files a/docs/ML/projects/insurance_cross_sell_prediction/assets/14_bivariate_pointbiserial.png and /dev/null differ
diff --git a/docs/ML/projects/insurance_cross_sell_prediction/assets/15_bivariate_tetrachoric.png b/docs/ML/projects/insurance_cross_sell_prediction/assets/15_bivariate_tetrachoric.png
deleted file mode 100644
index a70fa2f8..00000000
Binary files a/docs/ML/projects/insurance_cross_sell_prediction/assets/15_bivariate_tetrachoric.png and /dev/null differ
diff --git a/docs/ML/projects/insurance_cross_sell_prediction/assets/16_featselect_pointbiserial.png b/docs/ML/projects/insurance_cross_sell_prediction/assets/16_featselect_pointbiserial.png
deleted file mode 100644
index b976f955..00000000
Binary files a/docs/ML/projects/insurance_cross_sell_prediction/assets/16_featselect_pointbiserial.png and /dev/null differ
diff --git a/docs/ML/projects/insurance_cross_sell_prediction/assets/17_featselect_anova.png b/docs/ML/projects/insurance_cross_sell_prediction/assets/17_featselect_anova.png
deleted file mode 100644
index c14f97aa..00000000
Binary files a/docs/ML/projects/insurance_cross_sell_prediction/assets/17_featselect_anova.png and /dev/null differ
diff --git a/docs/ML/projects/insurance_cross_sell_prediction/assets/18_featselect_tetrachoric.png b/docs/ML/projects/insurance_cross_sell_prediction/assets/18_featselect_tetrachoric.png
deleted file mode 100644
index 1ab1bb25..00000000
Binary files a/docs/ML/projects/insurance_cross_sell_prediction/assets/18_featselect_tetrachoric.png and /dev/null differ
diff --git a/docs/ML/projects/insurance_cross_sell_prediction/assets/19_featselect_chisquared.png b/docs/ML/projects/insurance_cross_sell_prediction/assets/19_featselect_chisquared.png
deleted file mode 100644
index b57d9ed4..00000000
Binary files a/docs/ML/projects/insurance_cross_sell_prediction/assets/19_featselect_chisquared.png and /dev/null differ
diff --git a/docs/ML/projects/insurance_cross_sell_prediction/assets/20_featselect_mutualinfo.png b/docs/ML/projects/insurance_cross_sell_prediction/assets/20_featselect_mutualinfo.png
deleted file mode 100644
index 93a8fab0..00000000
Binary files a/docs/ML/projects/insurance_cross_sell_prediction/assets/20_featselect_mutualinfo.png and /dev/null differ
diff --git a/docs/ML/projects/insurance_cross_sell_prediction/assets/21_featselect_xgbfimp.png b/docs/ML/projects/insurance_cross_sell_prediction/assets/21_featselect_xgbfimp.png
deleted file mode 100644
index 67c28c4a..00000000
Binary files a/docs/ML/projects/insurance_cross_sell_prediction/assets/21_featselect_xgbfimp.png and /dev/null differ
diff --git a/docs/ML/projects/insurance_cross_sell_prediction/assets/22_featselect_xtreesfimp.png b/docs/ML/projects/insurance_cross_sell_prediction/assets/22_featselect_xtreesfimp.png
deleted file mode 100644
index 1e47d7a0..00000000
Binary files a/docs/ML/projects/insurance_cross_sell_prediction/assets/22_featselect_xtreesfimp.png and /dev/null differ
diff --git a/docs/ML/projects/used_cars_price_prediction.md b/docs/ML/projects/used_cars_price_prediction.md
new file mode 100644
index 00000000..e96484d8
--- /dev/null
+++ b/docs/ML/projects/used_cars_price_prediction.md
@@ -0,0 +1,227 @@
+# Used Cars Price Prediction
+
+### AIM
+
+Predicting the prices of used cars based on their configuration and previous usage.
+
+### DATASET LINK
+
+[https://www.kaggle.com/datasets/avikasliwal/used-cars-price-prediction](https://www.kaggle.com/datasets/avikasliwal/used-cars-price-prediction)
+
+### MY NOTEBOOK LINK
+
+[https://www.kaggle.com/code/sid4ds/used-cars-price-prediction/](https://www.kaggle.com/code/sid4ds/used-cars-price-prediction/)
+
+### LIBRARIES NEEDED
+
+??? quote "LIBRARIES USED"
+
+ - pandas
+ - numpy
+    - scikit-learn (>=1.3.0 for TargetEncoder)
+ - xgboost
+ - catboost
+ - matplotlib
+ - seaborn
+
+---
+
+### DESCRIPTION
+
+!!! info "Why is it necessary?"
+ - This project aims to predict the prices of used cars listed on an online marketplace based on their features and usage by previous owners. This model can be used by sellers to estimate an approximate price for their cars when they list them on the marketplace. Buyers can use the model to check if the listed price is fair when they decide to buy a used vehicle.
+
+??? info "How did you start approaching this project? (Initial thoughts and planning)"
+ - Researching previous projects and articles related to the problem.
+ - Data exploration to understand the features.
+ - Identifying different preprocessing strategies for different feature types.
+    - Choosing key metrics for the problem: Root Mean Squared Error (RMSE) for error estimation and R2-Score for goodness of fit.
+
+??? info "Mention any additional resources used (blogs, books, chapters, articles, research papers, etc.)."
+ - [Dealing with features that have high cardinality](https://towardsdatascience.com/dealing-with-features-that-have-high-cardinality-1c9212d7ff1b)
+ - [Target-encoding Categorical Variables](https://towardsdatascience.com/dealing-with-categorical-variables-by-using-target-encoder-a0f1733a4c69)
+ - [Cars Price Prediction](https://www.kaggle.com/code/khotijahs1/cars-price-prediction)
+
+---
+
+### EXPLANATION
+
+#### DETAILS OF THE DIFFERENT FEATURES
+
+| **Feature Name** | **Description** | **Type** | **Values/Range** |
+|------------------|-----------------|----------|------------------|
+| Name | Car model | Categorical | Names of car models |
+| Location | City where the car is listed for sale | Categorical | Names of cities|
+| Year | Year of original purchase of car | Numerical | Years (e.g., 2010, 2015, etc.) |
+| Kilometers_Driven | Odometer reading of the car | Numerical | Measured in kilometers |
+| Fuel_Type| Fuel type of the car | Categorical | [Petrol, Diesel, CNG, Electric, etc.] |
+| Transmission | Transmission type of the car | Categorical | [Automatic, Manual] |
+| Owner_Type | Ownership history of the car | Categorical | [First, Second, Third, Fourth & Above] |
+| Mileage | Fuel efficiency of the car | Numerical | Measured in km/l or km/kg |
+| Engine | Engine capacity of the car | Numerical | Measured in CC (Cubic Centimeters) |
+| Power | Engine power output of the car | Numerical | Measured in BHP (Brake Horsepower) |
+| Seats | Seating capacity of the car | Numerical | Whole numbers |
+| New_Price| Original price of the car at the time of purchase | Numerical | Measured in currency |
+
+---
+
+#### WHAT I HAVE DONE
+
+=== "Step 1"
+
+ Exploratory Data Analysis
+
+ - Summary statistics
+ - Data visualization for numerical feature distributions
+ - Target splits for categorical features
+
+=== "Step 2"
+
+ Data cleaning and Preprocessing
+
+ - Removing rare categories of brands
+ - Removing outliers for numerical features and target
+ - Categorical feature encoding for low-cardinality features
+ - Target encoding for high-cardinality categorical features (in model pipeline)
+
+=== "Step 3"
+
+ Feature engineering and selection
+
+ - Extracting brand name from model name for a lower-cardinality feature.
+ - Converting categorical Owner_Type to numerical Num_Previous_Owners.
+ - Feature selection based on model-based feature importances and statistical tests.
+
+=== "Step 4"
+
+ Modeling
+
+ - Holdout dataset created for model testing
+ - Setting up a framework for easier testing of multiple models.
+    - Models trained: Linear Regression, K-Nearest Neighbors, Decision Tree, Random Forest, AdaBoost, Multi-Layer Perceptron, XGBoost and CatBoost.
+ - Models were ensembled using Simple and Weighted averaging.
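+
+    Both ensembling schemes reduce to averaging prediction arrays; a sketch with toy numbers (the actual weights were chosen from validation performance):
+
+    ```python
+    import numpy as np
+
+    # Per-model predictions on the same holdout samples (illustrative values)
+    preds = {
+        "xgb": np.array([5.2, 7.1, 3.9]),
+        "cat": np.array([5.0, 7.4, 4.1]),
+        "rf":  np.array([4.8, 6.9, 4.3]),
+    }
+
+    # Simple average: equal weight for every model
+    simple = np.mean(list(preds.values()), axis=0)
+
+    # Weighted average: better models get larger weights (weights sum to 1)
+    weights = {"xgb": 0.5, "cat": 0.3, "rf": 0.2}
+    weighted = sum(w * preds[m] for m, w in weights.items())
+    ```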
+
+=== "Step 5"
+
+ Result analysis
+
+ - Predictions made on holdout test set
+ - Models compared based on chosen metrics: RMSE and R2-Score.
+ - Visualized predicted prices vs actual prices to analyze errors.
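+
+    The two metrics can be computed as follows (toy values for illustration):
+
+    ```python
+    import numpy as np
+    from sklearn.metrics import mean_squared_error, r2_score
+
+    y_true = np.array([4.5, 6.0, 3.2, 8.1])   # actual prices
+    y_pred = np.array([4.8, 5.5, 3.0, 7.9])   # model predictions
+
+    rmse = np.sqrt(mean_squared_error(y_true, y_pred))  # error in price units
+    r2 = r2_score(y_true, y_pred)                       # fraction of variance explained
+    ```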
+
+---
+
+#### PROJECT TRADE-OFFS AND SOLUTIONS
+
+=== "Trade Off 1"
+
+ **Training time & Model complexity vs Reducing error**
+
+ - **Solution:** Limiting depth and number of estimators for tree-based models. Overfitting detection and early stopping mechanism for neural network training.
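+
+    The early-stopping idea for the neural network can be sketched with scikit-learn's `MLPRegressor` (synthetic data; hyperparameters are illustrative, not the ones used in the notebook):
+
+    ```python
+    import numpy as np
+    from sklearn.neural_network import MLPRegressor
+
+    rng = np.random.default_rng(0)
+    X = rng.normal(size=(200, 4))
+    y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + rng.normal(scale=0.1, size=200)
+
+    # early_stopping holds out a validation split and stops training once
+    # the validation score stops improving for n_iter_no_change epochs
+    mlp = MLPRegressor(
+        hidden_layer_sizes=(32,),
+        early_stopping=True,
+        validation_fraction=0.1,
+        n_iter_no_change=10,
+        max_iter=500,
+        random_state=0,
+    )
+    mlp.fit(X, y)
+    ```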
+
+---
+
+#### SCREENSHOTS
+
+!!! success "Project workflow"
+
+ ``` mermaid
+ graph LR
+      A[Dataset] --> B[EDA];
+      B --> C[Cleaning and Preprocessing];
+      C --> D[Feature Engineering];
+      D --> E[Model Training];
+      E --> F[Ensembling];
+      F --> G[Evaluation on Holdout];
+ ```
+
+??? tip "Data Exploration"
+
+ === "Price"
+ 
+
+ === "Year"
+ 
+
+ === "KM Driven"
+ 
+
+ === "Engine"
+ 
+
+ === "Power"
+ 
+
+ === "Mileage"
+ 
+
+ === "Seats"
+ 
+
+??? tip "Feature Selection"
+
+ === "Feature Correlation"
+ 
+
+ === "Target Correlation"
+ 
+
+ === "Mutual Information"
+ 
+
+---
+
+#### MODELS USED AND THEIR PERFORMANCE
+
+| Model | RMSE | R2-Score |
+|:-----|:-----:|:-----:|
+| Linear Regression | 3.5803 | 0.7915 |
+| K-Nearest Neighbors | 2.8261 | 0.8701 |
+| Decision Tree | 2.6790 | 0.8833 |
+| Random Forest | 2.4619 | 0.9014 |
+| AdaBoost | 2.3629 | 0.9092 |
+| Multi-layer Perceptron | 2.6255 | 0.8879 |
+| XGBoost w/o preprocessing | 2.1649 | 0.9238 |
+| **XGBoost with preprocessing** | **2.0987** | **0.9284** |
+| CatBoost w/o preprocessing | 2.1734 | 0.9232 |
+| Simple average ensemble | 2.2804 | 0.9154 |
+| Weighted average ensemble | 2.1296 | 0.9262 |
+
+---
+
+### CONCLUSION
+
+#### WHAT YOU HAVE LEARNED
+
+!!! tip "Insights gained from the data"
+ 1. Features related to car configuration such as Power, Engine and Transmission are some of the most informative features. Usage-related features such as Year and current Mileage are also important.
+ 2. Seating capacity and Number of previous owners had relatively less predictive power. However, none of the features were candidates for removal.
+
+??? tip "Improvements in understanding machine learning concepts"
+ 1. Implemented target-encoding for high-cardinality categorical features.
+ 2. Designed pipelines to avoid data leakage.
+ 3. Ensembling models using prediction averaging.
+
+??? tip "Challenges faced and how they were overcome"
+ 1. Handling mixed feature types in preprocessing pipelines.
+ 2. Regularization and overfitting detection to reduce training time while maintaining performance.
+
+---
+
+#### USE CASES OF THIS MODEL
+
+=== "Application 1"
+
+ - Sellers can use the model to estimate an approximate price for their cars when they list them on the marketplace.
+
+=== "Application 2"
+
+ - Buyers can use the model to check if the listed price is fair when they decide to buy a used vehicle.
+
+---
+
+#### FEATURES PLANNED BUT NOT IMPLEMENTED
+
+=== "Feature 1"
+
+ - Complex model-ensembling through stacking or hill-climbing was not implemented due to significantly longer training time.
+
diff --git a/docs/ML/projects/used_cars_price_prediction/README.md b/docs/ML/projects/used_cars_price_prediction/README.md
deleted file mode 100644
index c8d74104..00000000
--- a/docs/ML/projects/used_cars_price_prediction/README.md
+++ /dev/null
@@ -1,170 +0,0 @@
-# Used Cars Price Prediction
-
-## AIM
-
-Predicting the prices of used cars based on their configuration and previous usage.
-
-## DATASET LINK
-
-[Used Cars Price Prediction Dataset - Kaggle](https://www.kaggle.com/datasets/avikasliwal/used-cars-price-prediction)
-
-## MY NOTEBOOK LINK
-
-[Used Cars Price Prediction](https://www.kaggle.com/code/sid4ds/used-cars-price-prediction/)
-
-## DESCRIPTION
-
-* Why is the project necessary?
-This project aims to predict the prices of used cars listed on an online marketplace based on their features and usage by previous owners. This model can be used by sellers to estimate an approximate price for their cars when they list them on the marketplace. Buyers can use the model to check if the listed price is fair when they decide to buy a used vehicle.
-
-* How did you start approaching this project? (Initial thoughts and planning)
- * Researching previous projects and articles related to the problem.
- * Data exploration to understand the features.
- Identifying different preprocessing strategies for different feature types.
- * Choosing key metrics for the problem - Root Mean Squared Error (for error estimation), R2-Score (for model explainability)
-
-* Mention any additional resources used (blogs, books, chapters, articles, research papers, etc.).
- * [Dealing with features that have high cardinality](https://towardsdatascience.com/dealing-with-features-that-have-high-cardinality-1c9212d7ff1b)
- * [Target-encoding Categorical Variables](https://towardsdatascience.com/dealing-with-categorical-variables-by-using-target-encoder-a0f1733a4c69)
- * [Cars Price Prediction](https://www.kaggle.com/code/khotijahs1/cars-price-prediction)
-
-## EXPLANATION
-
-### DETAILS OF THE DIFFERENT FEATURES
-
-1. Name: car model
-2. Location: city where the car is listed for sale
-3. Year: year of original purchase of car
-4. Kilometers_Driven: odometer reading of the car
-5. Fuel_Type: fuel type of the car - petrol, diesel, CNG, etc.
-6. Transmission: transmission type of the car - automatic vs manual
-7. Owner_Type: number of previous owners of the car
-8. Mileage: current mileage (distance per unit of fuel) provided by the car
-9. Engine: engine capacity of the car in CC
-10. Power: engine power output in BHP
-11. Seats: seating capacity of the car
-12. New_Price: original price of the car at the time of purchase
-
-### WHAT I HAVE DONE
-
-1. **Exploratory Data Analysis**:
- * Summary statistics
- * Data visualization for numerical feature distributions
- * Target splits for categorical features
-2. **Data cleaning and Preprocessing**:
- * Removing rare categories of brands
- * Removing outliers for numerical features and target
- * Categorical feature encoding for low-cardinality features
- * Target encoding for high-cardinality categorical features (in model pipeline)
-3. **Feature engineering and selection**:
- * Extracting brand name from model name for a lower-cardinality feature.
- * Converting categorical Owner_Type to numerical Num_Previous_Owners.
- * Feature selection based on model-based feature importances and statistical tests.
-4. **Modeling**:
- * Holdout dataset created for model testing
- * Setting up a framework for easier testing of multiple models.
- * Models trained: LLinear Regression, K-Nearest Neighbors, Decision Tree, Random Forest, AdaBoost, Multi-Layer Perceptron, XGBoost and CatBoost.
- * Models were ensembled using Simple and Weighted averaging.
-5. **Result analysis**:
- * Predictions made on holdout test set
- * Models compared based on chosen metrics: RMSE and R2-Score.
- * Visualized predicted prices vs actual prices to analyze errors.
-
-### PROJECT TRADE-OFFS AND SOLUTIONS
-
-**Training time & Model complexity vs Reducing error**
-
-* Solution: Limiting depth and number of estimators for tree-based models. Overfitting detection and early stopping mechanism for neural network training.
-
-### LIBRARIES NEEDED
-
-Libraries required for the project:
-
-* pandas
-* numpy
-* scikit-learn (>=1.5.0 required for Target Encoding)
-* xgboost
-* catboost
-* matplotlib
-* seaborn
-
-### SCREENSHOTS
-
-**Data exploration**:
-
-
-
-
-
-
-
-
-**Feature selection**:
-Feature correlation:
-
-Target correlation:
-
-Mutual information:
-
-
-### MODELS USED AND THEIR PERFORMANCE
-
-| Model | RMSE | R2-Score
-|:-----|:-----:|:-----:
-| Linear Regression | 3.5803 | 0.7915
-| K-Nearest Neighbors | 2.8261 | 0.8701
-| Decision Tree | 2.6790 | 0.8833
-| Random Forest | 2.4619 | 0.9014
-| AdaBoost | 2.3629 | 0.9092
-| Multi-layer Perceptron | 2.6255 | 0.8879
-| XGBoost w/o preprocessing | 2.1649 | 0.9238
-| **XGBoost with preprocessing** | **2.0987** | **0.9284**
-| CatBoost w/o preprocessing | 2.1734 | 0.9232
-| Simple average ensemble | 2.2804 | 0.9154
-| Weighted average ensemble | 2.1296 | 0.9262
-
-## CONCLUSION
-
-### WHAT YOU HAVE LEARNED
-
-* Insights gained from the data:
-
-1. Features related to car configuration such as Power, Engine and Transmission are some of the most informative features. Usage-related features such as Year and current Mileage are also important.
-2. Seating capacity and Number of previous owners had relatively less predictive power. However, none of the features were candidates for removal.
-
-* Improvements in understanding machine learning concepts:
-
-1. Implemented target-encoding for high-cardinality categorical features.
-2. Designed pipelines to avoid data leakage.
-3. Ensembling models using prediction averaging.
-
-* Challenges faced:
-
-1. Handling mixed feature types in preprocessing pipelines.
-2. Regularization and overfitting detection to reduce training time while maintaining performance.
-
-### USE CASES OF THIS MODEL
-
-* Sellers can use the model to estimate an approximate price for their cars when they list them on the marketplace.
-* Buyers can use the model to check if the listed price is fair when they decide to buy a used vehicle.
-
-### HOW TO INTEGRATE THIS MODEL IN REAL WORLD
-
-* The model can be used without any private identifiers.
-* Input data is preprocessed according to the steps taken before model training and the extended features are created.
-* The best model: *XGBoost with Preprocessing* can be used for inference.
-* Results on new data can be stored to monitor if the model maintains performance level. Once a significant number of sale transactions have been processed, system can be evaluated for model-drift and retrained if required.
-
-### FEATURES PLANNED BUT NOT IMPLEMENTED
-
-* Complex model-ensembling through stacking or hill-climbing was not implemented due to significantly longer training time.
-
-### NAME
-
-**Siddhant Tiwari**
-
-[](https://www.linkedin.com/in/siddhant-tiwari-ds)
-
-#### Happy Coding 🧑💻
-
-### Show some ❤️ by 🌟 this repository!
diff --git a/docs/ML/projects/used_cars_price_prediction/assets/featdist_engine.png b/docs/ML/projects/used_cars_price_prediction/assets/featdist_engine.png
deleted file mode 100644
index 7ee87da6..00000000
Binary files a/docs/ML/projects/used_cars_price_prediction/assets/featdist_engine.png and /dev/null differ
diff --git a/docs/ML/projects/used_cars_price_prediction/assets/featdist_kmdriven.png b/docs/ML/projects/used_cars_price_prediction/assets/featdist_kmdriven.png
deleted file mode 100644
index 970e6930..00000000
Binary files a/docs/ML/projects/used_cars_price_prediction/assets/featdist_kmdriven.png and /dev/null differ
diff --git a/docs/ML/projects/used_cars_price_prediction/assets/featdist_mileage.png b/docs/ML/projects/used_cars_price_prediction/assets/featdist_mileage.png
deleted file mode 100644
index 35cf6445..00000000
Binary files a/docs/ML/projects/used_cars_price_prediction/assets/featdist_mileage.png and /dev/null differ
diff --git a/docs/ML/projects/used_cars_price_prediction/assets/featdist_power.png b/docs/ML/projects/used_cars_price_prediction/assets/featdist_power.png
deleted file mode 100644
index 5a6a2c88..00000000
Binary files a/docs/ML/projects/used_cars_price_prediction/assets/featdist_power.png and /dev/null differ
diff --git a/docs/ML/projects/used_cars_price_prediction/assets/featdist_seats.png b/docs/ML/projects/used_cars_price_prediction/assets/featdist_seats.png
deleted file mode 100644
index 4c690746..00000000
Binary files a/docs/ML/projects/used_cars_price_prediction/assets/featdist_seats.png and /dev/null differ
diff --git a/docs/ML/projects/used_cars_price_prediction/assets/featdist_year.png b/docs/ML/projects/used_cars_price_prediction/assets/featdist_year.png
deleted file mode 100644
index 4c4ac1df..00000000
Binary files a/docs/ML/projects/used_cars_price_prediction/assets/featdist_year.png and /dev/null differ
diff --git a/docs/ML/projects/used_cars_price_prediction/assets/featselect_corrfeatures.png b/docs/ML/projects/used_cars_price_prediction/assets/featselect_corrfeatures.png
deleted file mode 100644
index b2773f3f..00000000
Binary files a/docs/ML/projects/used_cars_price_prediction/assets/featselect_corrfeatures.png and /dev/null differ
diff --git a/docs/ML/projects/used_cars_price_prediction/assets/featselect_corrtarget.png b/docs/ML/projects/used_cars_price_prediction/assets/featselect_corrtarget.png
deleted file mode 100644
index ff92f02e..00000000
Binary files a/docs/ML/projects/used_cars_price_prediction/assets/featselect_corrtarget.png and /dev/null differ
diff --git a/docs/ML/projects/used_cars_price_prediction/assets/featselect_mutualinfo.png b/docs/ML/projects/used_cars_price_prediction/assets/featselect_mutualinfo.png
deleted file mode 100644
index 01331c12..00000000
Binary files a/docs/ML/projects/used_cars_price_prediction/assets/featselect_mutualinfo.png and /dev/null differ
diff --git a/docs/ML/projects/used_cars_price_prediction/assets/preds_ada.png b/docs/ML/projects/used_cars_price_prediction/assets/preds_ada.png
deleted file mode 100644
index 1a72bcfd..00000000
Binary files a/docs/ML/projects/used_cars_price_prediction/assets/preds_ada.png and /dev/null differ
diff --git a/docs/ML/projects/used_cars_price_prediction/assets/preds_cb.png b/docs/ML/projects/used_cars_price_prediction/assets/preds_cb.png
deleted file mode 100644
index 169a4f85..00000000
Binary files a/docs/ML/projects/used_cars_price_prediction/assets/preds_cb.png and /dev/null differ
diff --git a/docs/ML/projects/used_cars_price_prediction/assets/preds_dt.png b/docs/ML/projects/used_cars_price_prediction/assets/preds_dt.png
deleted file mode 100644
index bfa11f72..00000000
Binary files a/docs/ML/projects/used_cars_price_prediction/assets/preds_dt.png and /dev/null differ
diff --git a/docs/ML/projects/used_cars_price_prediction/assets/preds_knn.png b/docs/ML/projects/used_cars_price_prediction/assets/preds_knn.png
deleted file mode 100644
index 42a65b19..00000000
Binary files a/docs/ML/projects/used_cars_price_prediction/assets/preds_knn.png and /dev/null differ
diff --git a/docs/ML/projects/used_cars_price_prediction/assets/preds_lr.png b/docs/ML/projects/used_cars_price_prediction/assets/preds_lr.png
deleted file mode 100644
index 99358ff6..00000000
Binary files a/docs/ML/projects/used_cars_price_prediction/assets/preds_lr.png and /dev/null differ
diff --git a/docs/ML/projects/used_cars_price_prediction/assets/preds_mlp.png b/docs/ML/projects/used_cars_price_prediction/assets/preds_mlp.png
deleted file mode 100644
index b0f1f610..00000000
Binary files a/docs/ML/projects/used_cars_price_prediction/assets/preds_mlp.png and /dev/null differ
diff --git a/docs/ML/projects/used_cars_price_prediction/assets/preds_prexgb.png b/docs/ML/projects/used_cars_price_prediction/assets/preds_prexgb.png
deleted file mode 100644
index 366a2334..00000000
Binary files a/docs/ML/projects/used_cars_price_prediction/assets/preds_prexgb.png and /dev/null differ
diff --git a/docs/ML/projects/used_cars_price_prediction/assets/preds_rf.png b/docs/ML/projects/used_cars_price_prediction/assets/preds_rf.png
deleted file mode 100644
index 68421bcf..00000000
Binary files a/docs/ML/projects/used_cars_price_prediction/assets/preds_rf.png and /dev/null differ
diff --git a/docs/ML/projects/used_cars_price_prediction/assets/preds_xgb.png b/docs/ML/projects/used_cars_price_prediction/assets/preds_xgb.png
deleted file mode 100644
index f1432a54..00000000
Binary files a/docs/ML/projects/used_cars_price_prediction/assets/preds_xgb.png and /dev/null differ
diff --git a/docs/ML/projects/used_cars_price_prediction/assets/results_summary.png b/docs/ML/projects/used_cars_price_prediction/assets/results_summary.png
deleted file mode 100644
index ee640889..00000000
Binary files a/docs/ML/projects/used_cars_price_prediction/assets/results_summary.png and /dev/null differ
diff --git a/docs/ML/projects/used_cars_price_prediction/assets/target_dist.png b/docs/ML/projects/used_cars_price_prediction/assets/target_dist.png
deleted file mode 100644
index fa79b3bd..00000000
Binary files a/docs/ML/projects/used_cars_price_prediction/assets/target_dist.png and /dev/null differ
diff --git a/docs/NLP/projects/Email_Spam_Detection/images/Confusion Matrix - AdaBoost.png b/docs/NLP/projects/Email_Spam_Detection/images/Confusion Matrix - AdaBoost.png
deleted file mode 100644
index cea45f81..00000000
Binary files a/docs/NLP/projects/Email_Spam_Detection/images/Confusion Matrix - AdaBoost.png and /dev/null differ
diff --git a/docs/NLP/projects/Email_Spam_Detection/images/Confusion Matrix - Decision Tree.png b/docs/NLP/projects/Email_Spam_Detection/images/Confusion Matrix - Decision Tree.png
deleted file mode 100644
index 678468e0..00000000
Binary files a/docs/NLP/projects/Email_Spam_Detection/images/Confusion Matrix - Decision Tree.png and /dev/null differ
diff --git a/docs/NLP/projects/Email_Spam_Detection/images/Confusion Matrix - Naive Bayes.png b/docs/NLP/projects/Email_Spam_Detection/images/Confusion Matrix - Naive Bayes.png
deleted file mode 100644
index d2372578..00000000
Binary files a/docs/NLP/projects/Email_Spam_Detection/images/Confusion Matrix - Naive Bayes.png and /dev/null differ
diff --git a/docs/NLP/projects/Email_Spam_Detection/images/Confusion Matrix - Random Forest.png b/docs/NLP/projects/Email_Spam_Detection/images/Confusion Matrix - Random Forest.png
deleted file mode 100644
index ffbbb73f..00000000
Binary files a/docs/NLP/projects/Email_Spam_Detection/images/Confusion Matrix - Random Forest.png and /dev/null differ
diff --git a/docs/NLP/projects/Email_Spam_Detection/images/Confusion Matrix - SVM.png b/docs/NLP/projects/Email_Spam_Detection/images/Confusion Matrix - SVM.png
deleted file mode 100644
index 32313728..00000000
Binary files a/docs/NLP/projects/Email_Spam_Detection/images/Confusion Matrix - SVM.png and /dev/null differ
diff --git a/docs/NLP/projects/Email_Spam_Detection/images/Model accracy comparison.png b/docs/NLP/projects/Email_Spam_Detection/images/Model accracy comparison.png
deleted file mode 100644
index b605b0d2..00000000
Binary files a/docs/NLP/projects/Email_Spam_Detection/images/Model accracy comparison.png and /dev/null differ
diff --git a/docs/NLP/projects/Email_Spam_Detection/README.md b/docs/NLP/projects/email_spam_detection.md
similarity index 85%
rename from docs/NLP/projects/Email_Spam_Detection/README.md
rename to docs/NLP/projects/email_spam_detection.md
index 5be4710e..15bf34b5 100644
--- a/docs/NLP/projects/Email_Spam_Detection/README.md
+++ b/docs/NLP/projects/email_spam_detection.md
@@ -21,7 +21,6 @@ To develop a machine learning-based system that classifies email content as spam
- scikit-learn
- matplotlib
- seaborn
-
---
@@ -115,7 +114,7 @@ The dataset contains features like word frequency, capital letter counts, and ot
### SCREENSHOTS
-!!! success "Project structure or tree diagram"
+!!! success "Project flowchart"
``` mermaid
graph LR
@@ -127,15 +126,22 @@ The dataset contains features like word frequency, capital letter counts, and ot
E -->|Retry| C;
```
-??? tip "Visualizations and EDA of different features"
+??? tip "Confusion Matrix"
- === "Feature Correlation Heatmap"
- 
+ === "SVM"
+ 
-??? example "Model performance graphs"
+ === "Naive Bayes"
+ 
- === "Model Comparison"
- 
+ === "Decision Tree"
+ 
+
+ === "AdaBoost"
+ 
+
+ === "Random Forest"
+ 
---
@@ -156,7 +162,7 @@ The dataset contains features like word frequency, capital letter counts, and ot
!!! tip "Models Comparison Graphs"
=== "Accuracy Comparison"
- 
+ 
---
@@ -196,18 +202,3 @@ The dataset contains features like word frequency, capital letter counts, and ot
- Integration of deep learning models (LSTM) for improved accuracy.
----
-
-### **DEVELOPER**
-***Insha Khan***
-
-[LinkedIn](https://www.linkedin.com/in/insha-khan-4087532a4/){ .md-button }
-[GitHub](https://www.github.com/ikcod){ .md-button }
-
-##### Happy Coding 🤓
-#### Show some ❤️ by 🌟 this repository!
-
-
-
-
-
diff --git a/docs/NLP/projects/twitter_sentiment_analysis/README.md b/docs/NLP/projects/twitter_sentiment_analysis.md
similarity index 91%
rename from docs/NLP/projects/twitter_sentiment_analysis/README.md
rename to docs/NLP/projects/twitter_sentiment_analysis.md
index ee8eb79c..99b0ac73 100644
--- a/docs/NLP/projects/twitter_sentiment_analysis/README.md
+++ b/docs/NLP/projects/twitter_sentiment_analysis.md
@@ -156,21 +156,21 @@ To analyze sentiment in Twitter data using natural language processing technique
??? tip "Visualizations and EDA of different features"
=== "Sentiment Distribution"
- 
+ 
??? example "Model performance graphs"
=== "LR Confusion Matrix"
- 
+ 
=== "LR ROC Curve"
- 
+ 
=== "Naive Bayes Confusion Matrix"
- 
+ 
=== "Naive Bayes ROC Curve"
- 
+ 
---
@@ -189,9 +189,9 @@ To analyze sentiment in Twitter data using natural language processing technique
!!! tip "Models Comparison Graphs"
=== "LSTM Accuracy"
- 
+ 
=== "LSTM Loss"
- 
+ 
---
@@ -235,13 +235,3 @@ To analyze sentiment in Twitter data using natural language processing technique
- Couldn't do it with SVM (Support Vector Machine) and Random Forest due to computational/system requirements.
----
-
-### **DEVELOPER**
-***Laya***
-
-[LinkedIn](https://www.linkedin.com/in/laya-reddy-092911245){ .md-button }
-[GitHub](https://www.github.com/devil-90){ .md-button }
-
-##### Happy Coding 🧑💻
-#### Show some ❤️ by 🌟 this repository!
diff --git a/docs/NLP/projects/twitter_sentiment_analysis/images/confusion_matrix_logistic_regression.png b/docs/NLP/projects/twitter_sentiment_analysis/images/confusion_matrix_logistic_regression.png
deleted file mode 100644
index 5c2a2265..00000000
Binary files a/docs/NLP/projects/twitter_sentiment_analysis/images/confusion_matrix_logistic_regression.png and /dev/null differ
diff --git a/docs/NLP/projects/twitter_sentiment_analysis/images/confusion_matrix_naive_bayes.png b/docs/NLP/projects/twitter_sentiment_analysis/images/confusion_matrix_naive_bayes.png
deleted file mode 100644
index 0095a279..00000000
Binary files a/docs/NLP/projects/twitter_sentiment_analysis/images/confusion_matrix_naive_bayes.png and /dev/null differ
diff --git a/docs/NLP/projects/twitter_sentiment_analysis/images/lstm_accuracy.jpg b/docs/NLP/projects/twitter_sentiment_analysis/images/lstm_accuracy.jpg
deleted file mode 100644
index 53581982..00000000
Binary files a/docs/NLP/projects/twitter_sentiment_analysis/images/lstm_accuracy.jpg and /dev/null differ
diff --git a/docs/NLP/projects/twitter_sentiment_analysis/images/lstm_loss.jpg b/docs/NLP/projects/twitter_sentiment_analysis/images/lstm_loss.jpg
deleted file mode 100644
index 6abb092b..00000000
Binary files a/docs/NLP/projects/twitter_sentiment_analysis/images/lstm_loss.jpg and /dev/null differ
diff --git a/docs/NLP/projects/twitter_sentiment_analysis/images/roc_curve_logistic_regression.png b/docs/NLP/projects/twitter_sentiment_analysis/images/roc_curve_logistic_regression.png
deleted file mode 100644
index 985f1e9c..00000000
Binary files a/docs/NLP/projects/twitter_sentiment_analysis/images/roc_curve_logistic_regression.png and /dev/null differ
diff --git a/docs/NLP/projects/twitter_sentiment_analysis/images/roc_curve_naive_bayes.png b/docs/NLP/projects/twitter_sentiment_analysis/images/roc_curve_naive_bayes.png
deleted file mode 100644
index 81437f35..00000000
Binary files a/docs/NLP/projects/twitter_sentiment_analysis/images/roc_curve_naive_bayes.png and /dev/null differ
diff --git a/docs/NLP/projects/twitter_sentiment_analysis/images/sentiment_distribution.png b/docs/NLP/projects/twitter_sentiment_analysis/images/sentiment_distribution.png
deleted file mode 100644
index 0332afd6..00000000
Binary files a/docs/NLP/projects/twitter_sentiment_analysis/images/sentiment_distribution.png and /dev/null differ
diff --git a/docs/OpenCV/projects/music_genre_classification_model.md b/docs/OpenCV/projects/music_genre_classification_model.md
new file mode 100644
index 00000000..a419553e
--- /dev/null
+++ b/docs/OpenCV/projects/music_genre_classification_model.md
@@ -0,0 +1,288 @@
+# Music Genre Classification Model
+
+### AIM
+
+To develop a precise and effective music genre classification model using Convolutional Neural Networks (CNN), Support Vector Machines (SVM), Random Forest and XGBoost Classifier algorithms for the Kaggle GTZAN Dataset Music Genre Classification.
+
+### DATASET LINK
+
+[https://www.kaggle.com/datasets/andradaolteanu/gtzan-dataset-music-genre-classification/data](https://www.kaggle.com/datasets/andradaolteanu/gtzan-dataset-music-genre-classification/data)
+
+### MY NOTEBOOK LINK
+
+[https://colab.research.google.com/drive/1j8RZccP2ee5XlWEFSkTyJ98lFyNrezHS?usp=sharing](https://colab.research.google.com/drive/1j8RZccP2ee5XlWEFSkTyJ98lFyNrezHS?usp=sharing)
+
+### LIBRARIES NEEDED
+
+??? quote "LIBRARIES USED"
+
+ - librosa
+ - matplotlib
+ - pandas
+ - sklearn
+ - seaborn
+ - numpy
+ - scipy
+ - xgboost
+
+---
+
+### DESCRIPTION
+
+!!! info "What is the requirement of the project?"
+ - The objective of this research is to develop a precise and effective music genre classification model using Convolutional Neural Networks (CNN), Support Vector Machines (SVM), Random Forest and XGBoost algorithms for the Kaggle GTZAN Dataset Music Genre Classification.
+
+??? info "Why is it necessary?"
+ - Music genre classification has several real-world applications, including music recommendation, content-based music retrieval, and personalized music services. However, the task of music genre classification is challenging due to the subjective nature of music and the complexity of audio signals.
+
+??? info "How is it beneficial and used?"
+ - **For User:** Provides more personalised music
+ - **For Developers:** A recommendation system for songs that are of interest to the user
+    - **For Business:** Able to charge a premium for the more personalised recommendation services provided
+
+??? info "How did you start approaching this project? (Initial thoughts and planning)"
+    - Initially studied how different sounds are structured.
+    - Learned how to represent a sound signal in 2D format on graphs using the librosa library.
+    - Came to know about various audio features such as:
+ - Mel-frequency cepstral coefficients (MFCC)
+ - Chromagram
+ - Spectral Centroid
+ - Zero-crossing rate
+ - BPM - Beats Per Minute
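+
+    These features can be extracted with librosa; a small sketch on a synthetic tone (a real GTZAN clip would be read with `librosa.load`):
+
+    ```python
+    import numpy as np
+    import librosa
+
+    # 1-second 440 Hz sine wave as a stand-in for an audio clip
+    sr = 22050
+    y = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
+
+    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)        # timbre
+    chroma = librosa.feature.chroma_stft(y=y, sr=sr)          # 12 pitch classes
+    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)  # "centre of mass"
+    zcr = librosa.feature.zero_crossing_rate(y)               # noisiness
+    tempo, beats = librosa.beat.beat_track(y=y, sr=sr)        # BPM estimate
+    ```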
+
+??? info "Mention any additional resources used (blogs, books, chapters, articles, research papers, etc.)."
+ - [https://scholarworks.calstate.edu/downloads/73666b68n](https://scholarworks.calstate.edu/downloads/73666b68n)
+ - [https://www.kaggle.com/datasets/andradaolteanu/gtzan-dataset-music-genre-classification/data](https://www.kaggle.com/datasets/andradaolteanu/gtzan-dataset-music-genre-classification/data)
+ - [https://towardsdatascience.com/music-genre-classification-with-python-c714d032f0d8](https://towardsdatascience.com/music-genre-classification-with-python-c714d032f0d8)
+
+---
+
+### EXPLANATION
+
+#### DETAILS OF THE DIFFERENT FEATURES
+
+    The dataset has three components: audio files, spectrogram images, and two CSV feature files.
+
+    - genres_original
+    - images_original
+    - features_3_sec.csv
+    - features_30_sec.csv
+
+- The classes in `genres_original`
+
+    ['blues', 'classical', 'country', 'disco', 'hiphop', 'jazz', 'metal', 'pop', 'reggae', 'rock']
+    Each genre has 100 WAV files
+
+- The classes in `images_original`
+
+    ['blues', 'classical', 'country', 'disco', 'hiphop', 'jazz', 'metal', 'pop', 'reggae', 'rock']
+    Each genre has 100 PNG files
+
+- There are 60 features in `features_3_sec.csv`
+
+- There are 60 features in `features_30_sec.csv`
+
+---
+
+#### WHAT I HAVE DONE
+
+=== "Step 1"
+
+    - Created visual representations of the data to help understand it
+
+=== "Step 2"
+
+ - Found strong relationships between independent features and dependent feature using correlation.
+
+=== "Step 3"
+
+ - Performed Exploratory Data Analysis on data.
+
+=== "Step 4"
+
+    - Used different classification techniques: KNN, SVM, Random Forest and XGBoost.
+
+=== "Step 5"
+
+ - Compared various models and used best performance model to make predictions.
+
+=== "Step 6"
+
+    - Used accuracy and the confusion matrix for evaluating each model's performance.
+
+=== "Step 7"
+
+    - Visualized the best model's performance using the matplotlib and seaborn libraries.
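+
+    The visualization step can be sketched with a seaborn confusion-matrix heatmap (toy labels; the genre names come from the dataset):
+
+    ```python
+    import matplotlib
+    matplotlib.use("Agg")  # headless backend for scripts
+    import matplotlib.pyplot as plt
+    import seaborn as sns
+    from sklearn.metrics import confusion_matrix
+
+    genres = ["blues", "classical", "rock"]
+    y_true = ["blues", "rock", "classical", "rock", "blues"]
+    y_pred = ["blues", "rock", "classical", "blues", "blues"]
+
+    # Rows = actual genre, columns = predicted genre
+    cm = confusion_matrix(y_true, y_pred, labels=genres)
+    ax = sns.heatmap(cm, annot=True, cmap="Blues",
+                     xticklabels=genres, yticklabels=genres)
+    ax.set(xlabel="Predicted", ylabel="Actual")
+    plt.savefig("confusion_matrix.png")
+    ```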
+
+---
+
+#### PROJECT TRADE-OFFS AND SOLUTIONS
+
+=== "Trade Off 1"
+
+    How do you visualize an audio signal?
+
+ - **Solution**:
+
+        - **_librosa_**: the go-to Python library for audio analysis
+        - **Plotting Graphs**: with the necessary libraries in place, I started plotting the audio signals
+        - **Spectrogram**: a visual representation of the spectrum of frequencies of a signal as it varies with time. When applied to an audio signal, spectrograms are sometimes called sonographs, voiceprints, or voicegrams. Here the frequency axis is converted to a logarithmic scale.
+
+=== "Trade Off 2"
+
+ Features that help classify the data
+
+ - **Solution**:
+
+ - **Feature Engineering**: What are the features present in audio signals
+ - **Spectral Centroid**: Indicates where the ”centre of mass” for a sound is located and is calculated as the weighted mean of the frequencies present in the sound.
+ - **Mel-Frequency Cepstral Coefficients**: The Mel frequency cepstral coefficients (MFCCs) of a signal are a small set of features (usually about 10–20) which concisely describe the overall shape of a spectral envelope. It models the characteristics of the human voice.
+ - **Chroma Frequencies**: Chroma features are an interesting and powerful representation for music audio in which the entire spectrum is projected onto 12 bins representing the 12 distinct semitones (or chroma) of the musical octave.
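+
+    The spectral centroid definition above (weighted mean of the frequencies present in the sound) is easy to verify in plain numpy; for a pure 440 Hz tone the centroid lands on 440 Hz:
+
+    ```python
+    import numpy as np
+
+    sr = 22050
+    t = np.arange(sr) / sr
+    signal = np.sin(2 * np.pi * 440 * t)  # pure 440 Hz tone, 1 second
+
+    # One-frame magnitude spectrum and its frequency axis
+    spectrum = np.abs(np.fft.rfft(signal))
+    freqs = np.fft.rfftfreq(len(signal), d=1 / sr)
+
+    # Spectral centroid = magnitude-weighted mean of the frequencies
+    centroid = np.sum(freqs * spectrum) / np.sum(spectrum)
+    ```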
+
+=== "Trade Off 3"
+
+ Performing EDA on the CSV files
+
+ - **Solution**:
+
+        - **Tool Selection**: Used the correlation matrix on the features_30_sec.csv dataset to identify the most strongly correlated features
+ - **Visualization Best Practices**: Followed best practices such as using appropriate chart types (e.g., box plots for BPM data, PCA plots for correlations), adding labels and titles, and ensuring readability.
+ - **Iterative Refinement**: Iteratively refined visualizations based on feedback and self-review to enhance clarity and informativeness.
+
+=== "Trade Off 4"
+
+ Implementing Machine Learning Models
+
+ - **Solution**:
+
+ - **Cross-validation**: Used cross-validation techniques to ensure the reliability and accuracy of the analysis results.
+ - **Collaboration with Experts**: Engaged with Music experts and enthusiasts to validate the findings and gain additional perspectives.
+ - **Contextual Understanding**: Interpreted results within the context of the music, considering factors such as mood of the users, surrounding, and specific events to provide meaningful and actionable insights.
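+
+    The cross-validation step can be sketched as follows (synthetic vectors standing in for the extracted audio features):
+
+    ```python
+    import numpy as np
+    from sklearn.model_selection import cross_val_score
+    from sklearn.pipeline import make_pipeline
+    from sklearn.preprocessing import StandardScaler
+    from sklearn.svm import SVC
+
+    rng = np.random.default_rng(42)
+    X = rng.normal(size=(100, 8))        # stand-in feature vectors
+    y = rng.integers(0, 4, size=100)     # stand-in genre labels
+
+    # 5-fold CV gives a more reliable accuracy estimate than a single split
+    clf = make_pipeline(StandardScaler(), SVC())
+    scores = cross_val_score(clf, X, y, cv=5)
+    mean_acc = scores.mean()
+    ```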
+
+---
+
+### SCREENSHOTS
+
+!!! success "Project workflow"
+
+ ``` mermaid
+ graph LR
+    A[GTZAN Dataset] --> B[Audio EDA];
+    B --> C[Feature Extraction];
+    C --> D[Model Training];
+    D --> E[Model Comparison];
+    E --> F[Genre Prediction];
+ ```
+
+??? tip "Visualizations and EDA of different features"
+
+ === "Harm Perc"
+ 
+
+ === "Sound Wave"
+ 
+
+ === "STFT"
+ 
+
+ === "Pop Mel-Spec"
+ 
+
+ === "Blues Mel-Spec"
+ 
+
+ === "Spec Cent"
+ 
+
+ === "Spec Rolloff"
+ 
+
+ === "MFCC"
+ 
+
+    === "Chromagram"
+ 
+
+ === "Corr Heatmap"
+ 
+
+ === "BPM Boxplot"
+ 
+
+ === "PCA Scatter Plot"
+ 
+
+ === "Confusion Matrix"
+ 
+
+---
+
+### MODELS USED AND THEIR ACCURACIES
+
+| Model | Accuracy |
+|------------------------------|------------|
+| KNN |0.80581 |
+| Random Forest |0.81415 |
+| XGBoost (Cross Gradient Booster) |0.90123 |
+| SVM |0.75409 |
+
+---
+
+#### MODELS COMPARISON GRAPHS
+
+!!! tip "Models Comparison Graphs"
+
+ === "ACC Plot"
+ 
+
+---
+
+### CONCLUSION
+
+    The accuracy plots compare the performance of the different models.
+    The XGBoost classifier gives the most accurate predictions of a track's genre.
+
+#### WHAT YOU HAVE LEARNED
+
+!!! tip "Insights gained from the data"
+    - Discovered a new library, librosa, that helps visualize audio signals
+ - Discovered new features related to audio like STFT, MFCC, Spectral Centroid, Spectral Rolloff
+ - Gained a deeper understanding of the features of different genres of music
+
+??? tip "Improvements in understanding machine learning concepts"
+ - Enhanced knowledge of data cleaning and preprocessing techniques to handle real-world datasets.
+ - Improved skills in exploratory data analysis (EDA) to extract meaningful insights from raw data.
+ - Learned how to use visualization tools to effectively communicate data-driven findings.
+
+---
+
+#### USE CASES OF THIS MODEL
+
+=== "Application 1"
+
+ **User Personalisation**
+
+    - It can be used to provide more personalised music recommendations for users based on their taste in music or the various genres they listen to. This personalised experience can support 'Premium'-based business models.
+
+=== "Application 2"
+
+    **Compatibility Between Users**
+
+    - Based on users' musical taste and the genres they listen to, we can identify behaviour patterns and connect similar users who could become friends. This increases social interaction within the app.
+
+---
+
+#### FEATURES PLANNED BUT NOT IMPLEMENTED
+
+=== "Feature 1"
+
+    - **Real-time Compatibility Tracking**
+
+        - Implementing a real-time tracking system to view compatibility between users.
+
+=== "Feature 2"
+
+ - **Predictive Analytics**
+
+        - Using advanced machine learning algorithms to predict the next song the user is likely to listen to.
+
diff --git a/docs/OpenCV/projects/music_genre_classification_model/README.md b/docs/OpenCV/projects/music_genre_classification_model/README.md
deleted file mode 100644
index 5316f9c5..00000000
--- a/docs/OpenCV/projects/music_genre_classification_model/README.md
+++ /dev/null
@@ -1,224 +0,0 @@
-# Music Genre Classification Model
-
-## AIM
-
-The aim of this project is to develop a precise and effective music genre classification model using Convolutional Neural Networks (CNN), Support Vector Machines (SVM), Random Forest and XGBoost Classifier algorithms for the Kaggle GTZAN Dataset Music Genre Classification.
-
-## DATASET LINK
-
-[GTZAN Dataset](https://www.kaggle.com/datasets/andradaolteanu/gtzan-dataset-music-genre-classification/data)
-## MY NOTEBOOK LINK
-
-[Music Genre Classification Model](https://colab.research.google.com/drive/1j8RZccP2ee5XlWEFSkTyJ98lFyNrezHS?usp=sharing)
-
-
-## DESCRIPTION
-
-- What is the requirement of the project?
- - The objective of this research is to develop a precise and effective music genre classification model using Convolutional Neural Networks (CNN), Support Vector Machines (SVM), Random Forest and XGBoost algorithms for the Kaggle GTZAN Dataset Music Genre Classification.
-
-- Why is it necessary?
- - Music genre classification has several real-world applications, including music recommendation, content-based music retrieval, and personalized music services. However, the task of music genre classification is challenging due to the subjective nature of music and the complexity of audio signals.
-
-- How is it beneficial and used?
- - **For User :** Provides more personalised music
- - **For Developers:** A recommendation system for songs that are of interest to the user
- - **For Business:** Able to charge premium for the more personalised and recommendation services provided
-
-
-- How did you start approaching this project? (Initial thoughts and planning)
- - Initially how the different sounds are structured.
- - Learned how to represent sound signal in 2D format on graphs using the librosa library.
- - Came to know about the various features of sound like
- * Mel-frequency cepstral coefficients (MFCC)
- * Chromagram
- * Spectral Centroid
- * Zero-crossing rate
- * BPM - Beats Per Minute
-
-- Mention any additional resources used (blogs, books, chapters, articles, research papers, etc.).
- - https://scholarworks.calstate.edu/downloads/73666b68n
- - https://www.kaggle.com/datasets/andradaolteanu/gtzan-dataset-music-genre-classification/data
- - https://towardsdatascience.com/music-genre-classification-with-python-c714d032f0d8
-
-
-## EXPLANATION
-
-### DETAILS OF THE DIFFERENT FEATURES
-
-There are 3 different types of the datasets.
-
-- genres_original
-- images_original
-- features_3_sec.csv
-- feature_30_sec.csv
-
-The features in `genres_original`
-['blues', 'classical', 'country', 'disco', 'hiphop', 'jazz', 'metal', 'pop', 'reggae', 'rock']
-Each and every genre has 100 WAV files
-
-The features in `genres_original`
-['blues', 'classical', 'country', 'disco', 'hiphop', 'jazz', 'metal', 'pop', 'reggae', 'rock']
-Each and every genre has 100 PNG files
-
-There are 60 features in `features_3_sec.csv`
-
-There are 60 features in `features_30_sec.csv`
-
-
-### WHAT I HAVE DONE
-
-* Created data visual reprsentation of the data to help understand the data
-* Found strong relationships between independent features and dependent feature using correlation.
-* Performed Exploratory Data Analysis on data.
-* Used different Classification techniques like SVM, Random Forest,
-* Compared various models and used best performance model to make predictions.
-* Used Mean Squared Error and R2 Score for evaluating model's performance.
-* Visualized best model's performance using matplotlib and seaborn library.
-
-### PROJECT TRADE-OFFS AND SOLUTIONS
-
-
-1. **Trade-off 1**: How do you visualize audio signal
- - **Solution**:
- - **_librosa_**: It is the mother of all audio file libraries
- - **Plotting Graphs**: As I have the necessary libraries to visualize the data. I started plotting the audio signals
- - **Spectogram**:A spectrogram is a visual representation of the spectrum of frequencies of a signal as it varies with time. When applied to an audio signal, spectrograms are sometimes called sonographs, voiceprints, or voicegrams. Here we convert the frequency axis to a logarithmic one.
-
-2. **Trade-off 2**: Features that help classify the data
- - **Solution**:
- - **Feature Engineering**: What are the features present in audio signals
- - **Spectral Centroid**: Indicates where the ”centre of mass” for a sound is located and is calculated as the weighted mean of the frequencies present in the sound.
- - **Mel-Frequency Cepstral Coefficients**: The Mel frequency cepstral coefficients (MFCCs) of a signal are a small set of features (usually about 10–20) which concisely describe the overall shape of a spectral envelope. It models the characteristics of the human voice.
- - **Chroma Frequencies**: Chroma features are an interesting and powerful representation for music audio in which the entire spectrum is projected onto 12 bins representing the 12 distinct semitones (or chroma) of the musical octave.
-
-3. **Trade-off 3**: Performing EDA on the CSV files
- - **Solution**:
- - **Tool Selection**: Used the correlation matrix on the features_30_sec.csv dataset to extract most related datasets
- - **Visualization Best Practices**: Followed best practices such as using appropriate chart types (e.g., box plots for BPM data, PCA plots for correlations), adding labels and titles, and ensuring readability.
- - **Iterative Refinement**: Iteratively refined visualizations based on feedback and self-review to enhance clarity and informativeness.
-
-4. **Trade-off 4**: Implementing Machine Learning Models
- - **Solution**:
- - **Cross-validation**: Used cross-validation techniques to ensure the reliability and accuracy of the analysis results.
- - **Collaboration with Experts**: Engaged with Music experts and enthusiasts to validate the findings and gain additional perspectives.
- - **Contextual Understanding**: Interpreted results within the context of the music, considering factors such as mood of the users, surrounding, and specific events to provide meaningful and actionable insights.
-
-### LIBRARIES NEEDED
-
-- librosa
-- matplotlib
-- pandas
-- sklearn
-- seaborn
-- numpy
-- scipy
-- xgboost
-
-
-### SCREENSHOTS
-
-
-
-  |
-  |
-
-
-  |
-  |
-
-
-  |
-  |
-
-
-  |
-  |
-
-
-  |
-
-
-  |
-  |
-
-
-  |
-
-
-
-### MODELS USED AND THEIR ACCURACIES
-
-| Model | Accuracy |
-|------------------------------|------------|
-| KNN |0.80581 |
-| Random Forest |0.81415 |
-| Cross Gradient Booster |0.90123 |
-| SVM |0.75409 |
-
-### MODELS COMPARISON GRAPHS
-
-
-
-  |
-
-
-
-### CONCLUSION
-
-* Here we can see that Accuracy plots of the different models
-* Here, XGB Classifier can predict most accurate results for predicting the Genre of the music
-
-### WHAT YOU HAVE LEARNED
-
-- **Insights gained from the data**:
- - Discovered a new library that help visualize audio signal
- - Discovered new features related to audio like STFT, MFCC, Spectral Centroid, Spectral Rolloff
- - Gained a deeper understanding of the features of different genres of music
-
-- **Improvements in understanding machine learning concepts**:
- - Enhanced knowledge of data cleaning and preprocessing techniques to handle real-world datasets.
- - Improved skills in exploratory data analysis (EDA) to extract meaningful insights from raw data.
- - Learned how to use visualization tools to effectively communicate data-driven findings.
-
-### USE CASES OF THIS MODEL
-
-1. **Application 1: User Personalisation**:
- - **Explanation**: Can be used to provide more personalised music recommendation for users based on their taste in music or the various genres they listen to. This personalisation experience can be used to develop 'Premium' based business models
-
-2. **Application 2: Compatability Between Users**:
- - **Explanation**: Based on the musical taste and the genres they listen we can identify the user behaviour and pattern come with similar users who can be friends with. This increases social interaction within the app.
-
-### HOW TO INTEGRATE THIS MODEL IN REAL WORLD
-
-1. Use API to collect user information
-2. Deploy the model using appropriate tools (e.g., Flask, Docker)
-3. Monitor and maintain the model in production
-
-### FEATURES PLANNED BUT NOT IMPLEMENTED
-
-- **Feature 1: Real-time Compatability Tracking**:
- - **Description**: Implementing a real-time tracking system to view compatability between users
- - **Reason it couldn't be implemented**: Lack of access to live data streams and the complexity of integrating real-time data processing.
-
-- **Feature 2: Predictive Analytics**:
- - **Description**: Using advanced machine learning algorithms to predict the next song the users is likely to listen to.
- - **Reason it couldn't be implemented**: Constraints in computational resources and the need for more sophisticated modeling techniques that were beyond the current scope of the project.
-
-### YOUR NAME
-*Filbert Shawn*
-
-[](https://www.linkedin.com/in/filbert-shawn-1a694a256/)
-
-
-#### Happy Coding 🧑💻
-### Show some ❤️ by 🌟 this repository!
-
diff --git a/docs/OpenCV/projects/music_genre_classification_model/assets/accplot.png b/docs/OpenCV/projects/music_genre_classification_model/assets/accplot.png
deleted file mode 100644
index 01524901..00000000
Binary files a/docs/OpenCV/projects/music_genre_classification_model/assets/accplot.png and /dev/null differ
diff --git a/docs/OpenCV/projects/music_genre_classification_model/assets/b_p_m _boxplot.jpg b/docs/OpenCV/projects/music_genre_classification_model/assets/b_p_m _boxplot.jpg
deleted file mode 100644
index 0e356155..00000000
Binary files a/docs/OpenCV/projects/music_genre_classification_model/assets/b_p_m _boxplot.jpg and /dev/null differ
diff --git a/docs/OpenCV/projects/music_genre_classification_model/assets/blues _mel-_spec.jpg b/docs/OpenCV/projects/music_genre_classification_model/assets/blues _mel-_spec.jpg
deleted file mode 100644
index 0574396b..00000000
Binary files a/docs/OpenCV/projects/music_genre_classification_model/assets/blues _mel-_spec.jpg and /dev/null differ
diff --git a/docs/OpenCV/projects/music_genre_classification_model/assets/chromogram.jpg b/docs/OpenCV/projects/music_genre_classification_model/assets/chromogram.jpg
deleted file mode 100644
index a2d5ec66..00000000
Binary files a/docs/OpenCV/projects/music_genre_classification_model/assets/chromogram.jpg and /dev/null differ
diff --git a/docs/OpenCV/projects/music_genre_classification_model/assets/conf matrix.png b/docs/OpenCV/projects/music_genre_classification_model/assets/conf matrix.png
deleted file mode 100644
index 0def89ba..00000000
Binary files a/docs/OpenCV/projects/music_genre_classification_model/assets/conf matrix.png and /dev/null differ
diff --git a/docs/OpenCV/projects/music_genre_classification_model/assets/corr _heatmap.jpg b/docs/OpenCV/projects/music_genre_classification_model/assets/corr _heatmap.jpg
deleted file mode 100644
index 709aeb8a..00000000
Binary files a/docs/OpenCV/projects/music_genre_classification_model/assets/corr _heatmap.jpg and /dev/null differ
diff --git a/docs/OpenCV/projects/music_genre_classification_model/assets/harm&_perc.jpg b/docs/OpenCV/projects/music_genre_classification_model/assets/harm&_perc.jpg
deleted file mode 100644
index c32c4e35..00000000
Binary files a/docs/OpenCV/projects/music_genre_classification_model/assets/harm&_perc.jpg and /dev/null differ
diff --git a/docs/OpenCV/projects/music_genre_classification_model/assets/m_f_c_c.jpg b/docs/OpenCV/projects/music_genre_classification_model/assets/m_f_c_c.jpg
deleted file mode 100644
index 723ce14c..00000000
Binary files a/docs/OpenCV/projects/music_genre_classification_model/assets/m_f_c_c.jpg and /dev/null differ
diff --git a/docs/OpenCV/projects/music_genre_classification_model/assets/p_c_a _scattert.jpg b/docs/OpenCV/projects/music_genre_classification_model/assets/p_c_a _scattert.jpg
deleted file mode 100644
index 99da2711..00000000
Binary files a/docs/OpenCV/projects/music_genre_classification_model/assets/p_c_a _scattert.jpg and /dev/null differ
diff --git a/docs/OpenCV/projects/music_genre_classification_model/assets/pop _mel-_spec.jpg b/docs/OpenCV/projects/music_genre_classification_model/assets/pop _mel-_spec.jpg
deleted file mode 100644
index 629da2eb..00000000
Binary files a/docs/OpenCV/projects/music_genre_classification_model/assets/pop _mel-_spec.jpg and /dev/null differ
diff --git a/docs/OpenCV/projects/music_genre_classification_model/assets/sound _wave.jpg b/docs/OpenCV/projects/music_genre_classification_model/assets/sound _wave.jpg
deleted file mode 100644
index fe437ccb..00000000
Binary files a/docs/OpenCV/projects/music_genre_classification_model/assets/sound _wave.jpg and /dev/null differ
diff --git a/docs/OpenCV/projects/music_genre_classification_model/assets/spec _cent.jpg b/docs/OpenCV/projects/music_genre_classification_model/assets/spec _cent.jpg
deleted file mode 100644
index b98964d5..00000000
Binary files a/docs/OpenCV/projects/music_genre_classification_model/assets/spec _cent.jpg and /dev/null differ
diff --git a/docs/OpenCV/projects/music_genre_classification_model/assets/spec _rolloff.jpg b/docs/OpenCV/projects/music_genre_classification_model/assets/spec _rolloff.jpg
deleted file mode 100644
index b663ebf7..00000000
Binary files a/docs/OpenCV/projects/music_genre_classification_model/assets/spec _rolloff.jpg and /dev/null differ
diff --git a/docs/OpenCV/projects/music_genre_classification_model/assets/stft.jpg b/docs/OpenCV/projects/music_genre_classification_model/assets/stft.jpg
deleted file mode 100644
index 8862efd5..00000000
Binary files a/docs/OpenCV/projects/music_genre_classification_model/assets/stft.jpg and /dev/null differ
diff --git a/docs/Pre-Processing/blogs/min_max_scaler/README.md b/docs/Pre-Processing/blogs/min_max_scaler.md
similarity index 90%
rename from docs/Pre-Processing/blogs/min_max_scaler/README.md
rename to docs/Pre-Processing/blogs/min_max_scaler.md
index c2f80931..0ffc8dfe 100644
--- a/docs/Pre-Processing/blogs/min_max_scaler/README.md
+++ b/docs/Pre-Processing/blogs/min_max_scaler.md
@@ -1,4 +1,4 @@
-# MinMaxScaler
+# Min Max Scaler
A custom implementation of a MinMaxScaler class for scaling numerical data in a pandas DataFrame. The class scales the features to a specified range, typically between 0 and 1.
@@ -44,11 +44,11 @@ A custom implementation of a MinMaxScaler class for scaling numerical data in a
## Use Case
-
+
## Output
-
+
## Installation
diff --git a/docs/Pre-Processing/blogs/min_max_scaler/images/output.png b/docs/Pre-Processing/blogs/min_max_scaler/images/output.png
deleted file mode 100644
index 4d542ffc..00000000
Binary files a/docs/Pre-Processing/blogs/min_max_scaler/images/output.png and /dev/null differ
diff --git a/docs/Pre-Processing/blogs/min_max_scaler/images/use_case.png b/docs/Pre-Processing/blogs/min_max_scaler/images/use_case.png
deleted file mode 100644
index c2857a3c..00000000
Binary files a/docs/Pre-Processing/blogs/min_max_scaler/images/use_case.png and /dev/null differ
diff --git a/docs/Pre-Processing/blogs/ordinal_encoder/README.md b/docs/Pre-Processing/blogs/ordinal_encoder.md
similarity index 87%
rename from docs/Pre-Processing/blogs/ordinal_encoder/README.md
rename to docs/Pre-Processing/blogs/ordinal_encoder.md
index 09061153..34f90cd1 100644
--- a/docs/Pre-Processing/blogs/ordinal_encoder/README.md
+++ b/docs/Pre-Processing/blogs/ordinal_encoder.md
@@ -1,4 +1,4 @@
-# OrdinalEncoder
+# Ordinal Encoder
A custom implementation of an OrdinalEncoder class for encoding categorical data into ordinal integers using a pandas DataFrame. The class maps each unique category to an integer based on the order of appearance.
@@ -41,12 +41,12 @@ A custom implementation of an OrdinalEncoder class for encoding categorical data
## Use Case
-
+
## Output
-
+
## Installation
-No special installation is required. Just ensure you have `pandas` installed in your Python environment.
\ No newline at end of file
+No special installation is required. Just ensure you have `pandas` installed in your Python environment.
diff --git a/docs/Pre-Processing/blogs/ordinal_encoder/images/output.png b/docs/Pre-Processing/blogs/ordinal_encoder/images/output.png
deleted file mode 100644
index 3fd534eb..00000000
Binary files a/docs/Pre-Processing/blogs/ordinal_encoder/images/output.png and /dev/null differ
diff --git a/docs/Pre-Processing/blogs/ordinal_encoder/images/use_case.png b/docs/Pre-Processing/blogs/ordinal_encoder/images/use_case.png
deleted file mode 100644
index ebedda2b..00000000
Binary files a/docs/Pre-Processing/blogs/ordinal_encoder/images/use_case.png and /dev/null differ
diff --git a/docs/Pre-Processing/blogs/standard_scaler/README.md b/docs/Pre-Processing/blogs/standard_scaler.md
similarity index 88%
rename from docs/Pre-Processing/blogs/standard_scaler/README.md
rename to docs/Pre-Processing/blogs/standard_scaler.md
index ce2b2150..6c5dc659 100644
--- a/docs/Pre-Processing/blogs/standard_scaler/README.md
+++ b/docs/Pre-Processing/blogs/standard_scaler.md
@@ -1,4 +1,4 @@
-# StandardScaler
+# Standard Scaler
A custom implementation of a StandardScaler class for scaling numerical data in a pandas DataFrame or NumPy array. The class scales the features to have zero mean and unit variance.
@@ -49,12 +49,12 @@ A custom implementation of a StandardScaler class for scaling numerical data in
## Use Case
-
+
## Output
-
+
## Installation
-No special installation is required. Just ensure you have `pandas` and `numpy` installed in your Python environment.
\ No newline at end of file
+No special installation is required. Just ensure you have `pandas` and `numpy` installed in your Python environment.
diff --git a/docs/Pre-Processing/blogs/standard_scaler/images/output.png b/docs/Pre-Processing/blogs/standard_scaler/images/output.png
deleted file mode 100644
index 6a8af2f2..00000000
Binary files a/docs/Pre-Processing/blogs/standard_scaler/images/output.png and /dev/null differ
diff --git a/docs/Pre-Processing/blogs/standard_scaler/images/use_case.png b/docs/Pre-Processing/blogs/standard_scaler/images/use_case.png
deleted file mode 100644
index f386310c..00000000
Binary files a/docs/Pre-Processing/blogs/standard_scaler/images/use_case.png and /dev/null differ
diff --git a/docs/Pre-Processing/projects/bangladesh_premier_league_analysis.md b/docs/Pre-Processing/projects/bangladesh_premier_league_analysis.md
new file mode 100644
index 00000000..f91e64eb
--- /dev/null
+++ b/docs/Pre-Processing/projects/bangladesh_premier_league_analysis.md
@@ -0,0 +1,320 @@
+# Bangladesh Premier League Analysis
+
+### AIM
+
+The main goal of this project is to analyze the performance of Bangladesh Premier League players and to identify the top 5 players in different fields such as bowling, batting, toss wins, highest run-scorers, and man-of-the-match awards.
+
+### DATASET LINK
+
+[https://www.kaggle.com/abdunnoor11/bpl-data](https://www.kaggle.com/abdunnoor11/bpl-data)
+
+### MY NOTEBOOK LINK
+
+[https://colab.research.google.com/drive/1equud2jwKnmE1qbbTJLsi2BbjuA7B1Si?usp=sharing](https://colab.research.google.com/drive/1equud2jwKnmE1qbbTJLsi2BbjuA7B1Si?usp=sharing)
+
+### LIBRARIES NEEDED
+
+??? quote "LIBRARIES USED"
+
+ - matplotlib
+ - pandas
+ - sklearn
+ - seaborn
+ - numpy
+ - scipy
+ - xgboost
+ - Tensorflow
+ - Keras
+
+---
+
+### DESCRIPTION
+
+!!! info "What is the requirement of the project?"
+ - This project aims to analyze player performance data from the Bangladesh Premier League (BPL) to classify players into categories such as best, good, average, and poor based on their performance.
+ - The analysis provides valuable insights for players and coaches, highlighting who needs more training and who requires less, which can aid in strategic planning for future matches.
+
+??? info "Why is it necessary?"
+ - Analyzing player performance helps in understanding strengths and weaknesses, which can significantly reduce the chances of losing and increase the chances of winning future matches.
+ - It aids in making informed decisions about team selection and match strategies.
+
+??? info "How is it beneficial and used?"
+ - **For Players:** Provides feedback on their performance, helping them to improve specific aspects of their game.
+ - **For Coaches:** Helps in identifying areas where players need improvement, which can be focused on during training sessions.
+ - **For Team Management:** Assists in strategic decision-making regarding player selection and match planning.
+ - **For Fans and Analysts:** Offers insights into player performances and trends over the league, enhancing the understanding and enjoyment of the game.
+
+??? info "How did you start approaching this project? (Initial thoughts and planning)"
+    - Perform initial data exploration to understand the structure and contents of the dataset.
+    - Learn about the topic by searching for related content such as `what is a league`, `about the Bangladesh Premier League`, and `its players`.
+    - Learn about the features in detail by searching on Google or Quora.
+
+??? info "Mention any additional resources used (blogs, books, chapters, articles, research papers, etc.)."
+ - Articles on cricket analytics from websites such as ESPNcricinfo and Cricbuzz.
+ - [https://www.linkedin.com/pulse/premier-league-202223-data-analysis-part-i-ayomide-aremu-cole-iwn4e/](https://www.linkedin.com/pulse/premier-league-202223-data-analysis-part-i-ayomide-aremu-cole-iwn4e/)
+ - [https://analyisport.com/insights/how-is-data-used-in-the-premier-league/](https://analyisport.com/insights/how-is-data-used-in-the-premier-league/)
+
+---
+
+### EXPLANATION
+
+#### DETAILS OF THE DIFFERENT FEATURES
+
+    There are 3 different datasets.
+
+ - Batsman Dataset
+ - Bowler Dataset
+ - BPL (Bangladesh Premier League) Dataset
+
+- There are 12 features in `Batsman Dataset`
+
+| Feature Name | Description|
+|--------------|------------|
+| id | All matches unique id |
+| season | Season |
+| match_no | Number of matches |
+| date | Date of Play |
+| player_name | Player Name |
+| comment | How did the batsman get out? |
+| R | Batsman's run |
+| B | How many balls faced the batsman? |
+| M | How long their innings was in minutes? |
+| fours | Fours |
+| sixs | Sixes |
+| SR | Strike rate |
+
+- There are 12 features in `Bowler Dataset`
+
+| Feature Name | Description|
+|--------------|------------|
+| id | All matches unique id |
+| season | Season |
+| match_no | Number of matches |
+| date | Date of Play |
+| player_name | Player Name |
+| O | Overs |
+| M | middle overs |
+| R | Runs |
+| W | Wickets |
+| ECON | The average number of runs they have conceded per over bowled |
+| WD | Wide balls |
+| NB | No balls |
+
+- There are 19 features in `BPL Dataset`
+
+| Feature Name | Description|
+|--------------|------------|
+| id | All matches unique id |
+| season | Season |
+| match_no | Number of matches |
+| date | Date of Play |
+| team_1 | First Team |
+| team_1_score | First Team Score |
+| team_2 | Second Team |
+| team_2_score | Second Team Score |
+| player_of_match | Player of the match |
+| toss_winner | Which team won the toss? |
+| toss_decision | Toss winner team decision |
+| winner | Match Winner |
+| venue | Venue |
+| city | City |
+| win_by_wickets | Win by wickets |
+| win_by_runs | Win by runs |
+| result | Result of the match |
+| umpire_1 | First Umpire Name |
+| umpire_2 | Second Umpire Name |
+
+---
+
+#### WHAT I HAVE DONE
+
+=== "Step 1"
+
+ - Performed Exploratory Data Analysis on data.
+
+=== "Step 2"
+
+ - Created data visualisations to understand the data in a better way.
+
+=== "Step 3"
+
+ - Found strong relationships between independent features and dependent feature using correlation.
+
+=== "Step 4"
+
+    - Handled missing values using strong correlations, dropping unnecessary ones.
+
+=== "Step 5"
+
+    - Used different regression techniques like Linear Regression, Ridge Regression, Lasso Regression, and deep neural networks to predict the dependent feature in the most suitable manner.
+
+=== "Step 6"
+
+ - Compared various models and used best performance model to make predictions.
+
+=== "Step 7"
+
+ - Used Mean Squared Error and R2 Score for evaluating model's performance.
+
+=== "Step 8"
+
+ - Visualized best model's performance using matplotlib and seaborn library.
+
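The evaluation described in Steps 5 to 7 can be sketched as follows. This is a minimal example with synthetic data standing in for the engineered BPL features; the shapes, coefficients, and model choice are hypothetical:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Hypothetical numeric stand-in for the engineered BPL features.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))
y = X @ rng.normal(size=8) + rng.normal(scale=0.5, size=300)

# Hold out a test split, fit one of the candidate regressors,
# and score it with MSE and R2 as in Step 7.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
pred = model.predict(X_test)

mse = mean_squared_error(y_test, pred)
r2 = r2_score(y_test, pred)
print(mse, r2)
```

The same loop, repeated over Linear, Ridge, Lasso, and tree-based regressors, produces the comparison table reported below.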
+---
+
+#### PROJECT TRADE-OFFS AND SOLUTIONS
+
+=== "Trade Off 1"
+
+ Handling missing and inconsistent data entries.
+
+ - **Solution**
+ - **Data Imputation**: For missing numerical values, I used techniques such as mean, median, or mode imputation based on the distribution of the data.
+ - **Data Cleaning**: For inconsistent entries, I standardized the data by removing duplicates, correcting typos, and ensuring uniform formatting.
+ - **Dropping Irrelevant Data**: In cases where the missing data was extensive and could not be reliably imputed, I decided to drop those rows/columns to maintain data integrity.
+
+=== "Trade Off 2"
+
+ Extracting target variables from the dataset.
+
+ - **Solution**
+ - **Feature Engineering**: Created new features that could serve as target variables, such as aggregating player statistics to determine top performers.
+ - **Domain Knowledge**: Utilized cricket domain knowledge to identify relevant metrics (e.g., strike rate, economy rate) and used them to define target variables.
+ - **Label Encoding**: For categorical target variables (e.g., player categories like best, good, average, poor), I used label encoding techniques to convert them into numerical format for analysis.
+
+=== "Trade Off 3"
+
+ Creating clear and informative visualizations that effectively communicate the findings.
+
+ - **Solution**
+ - **Tool Selection**: Used powerful visualization tools like Matplotlib and Seaborn in Python, which provide a wide range of customization options.
+ - **Visualization Best Practices**: Followed best practices such as using appropriate chart types (e.g., bar charts for categorical data, scatter plots for correlations), adding labels and titles, and ensuring readability.
+ - **Iterative Refinement**: Iteratively refined visualizations based on feedback and self-review to enhance clarity and informativeness.
+
+=== "Trade Off 4"
+
+ Correctly interpreting the results to provide actionable insights.
+
+ - **Solution**
+ - **Cross-validation**: Used cross-validation techniques to ensure the reliability and accuracy of the analysis results.
+ - **Collaboration with Experts**: Engaged with cricket experts and enthusiasts to validate the findings and gain additional perspectives.
+ - **Contextual Understanding**: Interpreted results within the context of the game, considering factors such as player roles, match conditions, and historical performance to provide meaningful and actionable insights.
+
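The imputation and label-encoding steps from Trade Offs 1 and 2 can be sketched with pandas and scikit-learn. The column values below are invented for illustration, not taken from the actual BPL files:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical slice of the batsman dataset with gaps and categories.
df = pd.DataFrame({
    "R": [34, None, 58, 12, None],           # runs, numeric
    "SR": [120.5, 98.0, None, 75.0, 101.2],  # strike rate
    "category": ["best", "good", "poor", "good", "average"],
})

# Impute numeric columns with the median (robust to skewed scores).
for col in ["R", "SR"]:
    df[col] = df[col].fillna(df[col].median())

# Encode the categorical target into integers for modelling.
df["category_enc"] = LabelEncoder().fit_transform(df["category"])

print(df)
```

Rows where imputation would not be reliable (e.g. nearly empty records) are dropped instead, as noted in Trade Off 1.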
+---
+
+### SCREENSHOTS
+
+!!! success "Project workflow"
+
+ ``` mermaid
+ graph LR
+ A[Start] --> B{Error?};
+ B -->|Yes| C[Hmm...];
+ C --> D[Debug];
+ D --> B;
+ B ---->|No| E[Yay!];
+ ```
+
+??? tip "Visualizations and EDA of different features"
+
+ === "Top 5 Player Of Match"
+ 
+
+ === "Top 5 Batsman Runners"
+ 
+
+ === "Top 5 Four Runners"
+ 
+
+ === "Top 5 Overs"
+ 
+
+ === "Top 5 Runs"
+ 
+
+ === "Top 5 Umpires"
+ 
+
+ === "Top 5 Wickets"
+ 
+
+ === "Toss Winners"
+ 
+
+---
+
+### MODELS USED AND THEIR ACCURACIES
+
+| Model | MSE | R2 |
+|------------------------------|------------|------------|
+| Random Forest Regression | 19.355984 | 0.371316 |
+| Gradient Boosting Regression | 19.420494 | 0.369221 |
+| XG Boost Regression | 21.349168 | 0.306577 |
+| Ridge Regression | 26.813981 | 0.129080 |
+| Linear Regression | 26.916888 | 0.125737 |
+| Deep Neural Network | 27.758216 | 0.098411 |
+| Decision Tree Regression | 29.044533 | 0.056631 |
+
+---
+
+#### MODELS COMPARISON GRAPHS
+
+!!! tip "Models Comparison Graphs"
+
+ === "RF Regression Plot"
+ 
+
+ === "Conclusion Graph"
+ 
+
+---
+
+### CONCLUSION
+
+The R2 Score and Mean Squared Error are best for Random Forest Regression.
+The deep neural network does not reach the minimum possible Mean Squared Error.
+The Random Forest Regression model predicts the Bangladesh Premier League winning team most accurately, achieving the highest performance among the compared models.
+
+#### WHAT YOU HAVE LEARNED
+
+!!! tip "Insights gained from the data"
+ - Identified key performance indicators for players in the Bangladesh Premier League, such as top scorers, best bowlers, and players with the most man of the match awards.
+ - Discovered trends and patterns in player performances that could inform future strategies and training programs.
+ - Gained a deeper understanding of the distribution of player performances across different matches and seasons.
+
+??? tip "Improvements in understanding machine learning concepts"
+ - Enhanced knowledge of data cleaning and preprocessing techniques to handle real-world datasets.
+ - Improved skills in exploratory data analysis (EDA) to extract meaningful insights from raw data.
+ - Learned how to use visualization tools to effectively communicate data-driven findings.
+
+---
+
+#### USE CASES OF THIS MODEL
+
+=== "Application 1"
+
+ **Team Selection and Strategy Planning**
+
+ - Coaches and team managers can use the model to analyze player performance data and make informed decisions about team selection and match strategies. By identifying top performers and areas for improvement, the model can help optimize team composition and tactics for future matches.
+
+=== "Application 2"
+
+ **Player Performance Monitoring and Training**
+
+ - The model can be used to track player performance over time and identify trends in their performance. This information can be used by coaches to tailor training programs to address specific weaknesses and enhance overall player development. By monitoring performance metrics, the model can help ensure that players are continuously improving.
+
+---
+
+#### FEATURES PLANNED BUT NOT IMPLEMENTED
+
+=== "Feature 1"
+
+ **Real-time Performance Tracking**
+
+ - Implementing a real-time tracking system to update player performance metrics during live matches.
+
+=== "Feature 2"
+
+ **Advanced Predictive Analytics**
+
+ - Using advanced machine learning algorithms to predict future player performances and match outcomes.
+
diff --git a/docs/Pre-Processing/projects/bangladesh_premier_league_analysis/README.md b/docs/Pre-Processing/projects/bangladesh_premier_league_analysis/README.md
deleted file mode 100644
index 7138fa68..00000000
--- a/docs/Pre-Processing/projects/bangladesh_premier_league_analysis/README.md
+++ /dev/null
@@ -1,249 +0,0 @@
-# Bangladesh Premier League Analysis
-
-## AIM
-
-The main goal of the project is to analyze the performance of the bangladesh players in their premier league and obtaining the top 5 players in all of them in different fields like bowling, batting, toss_winner, highest runner, man of the match, etc.
-
-## DATASET LINK
-
-https://www.kaggle.com/abdunnoor11/bpl-data
-
-## MY NOTEBOOK LINK
-
-https://colab.research.google.com/drive/1equud2jwKnmE1qbbTJLsi2BbjuA7B1Si?usp=sharing
-
-
-## DESCRIPTION
-
-- What is the requirement of the project?
- - This project aims to analyze player performance data from the Bangladesh Premier League (BPL) to classify players into categories such as best, good, average, and poor based on their performance.
- - The analysis provides valuable insights for players and coaches, highlighting who needs more training and who requires less, which can aid in strategic planning for future matches.
-
-- Why is it necessary?
- - Analyzing player performance helps in understanding strengths and weaknesses, which can significantly reduce the chances of losing and increase the chances of winning future matches.
- - It aids in making informed decisions about team selection and match strategies.
-
-- How is it beneficial and used?
- - **For Players:** Provides feedback on their performance, helping them to improve specific aspects of their game.
- - **For Coaches:** Helps in identifying areas where players need improvement, which can be focused on during training sessions.
- - **For Team Management:** Assists in strategic decision-making regarding player selection and match planning.
- - **For Fans and Analysts:** Offers insights into player performances and trends over the league, enhancing the understanding and enjoyment of the game.
-
-- How did you start approaching this project? (Initial thoughts and planning)
- - Perform initial data exploration to understand the structure and contents of the dataset.
- - To learn about the topic and searching the related content like `what is league`, `About bangladesh league`, `their players` and much more.
- - Learn about the features in details by searching on the google or quora.
-
-- Mention any additional resources used (blogs, books, chapters, articles, research papers, etc.).
- - Articles on cricket analytics from websites such as ESPNcricinfo and Cricbuzz.
- - https://www.linkedin.com/pulse/premier-league-202223-data-analysis-part-i-ayomide-aremu-cole-iwn4e/
- - https://analyisport.com/insights/how-is-data-used-in-the-premier-league/
-
-
-## EXPLANATION
-
-### DETAILS OF THE DIFFERENT FEATURES
-
-There are 3 different types of the datasets.
-
-- Batsman Dataset
-- Bowler Dataset
-- BPL (Bangladesh Premier League) Dataset
-
-There are 12 features in `Batsman Dataset`
-
-| Feature Name | Description|
-|--------------|------------|
-| id | All matches unique id |
-| season | Season |
-| match_no | Number of matches |
-| date | Date of Play |
-| player_name | Player Name |
-| comment | How did the batsman get out? |
-| R | Batsman's run |
-| B | How many balls faced the batsman? |
-| M | How long their innings was in minutes? |
-| fours | Fours |
-| sixs | Sixes |
-| SR | Strike rate |
-
-There are 12 features in `Bowler Dataset`
-
-| Feature Name | Description|
-|--------------|------------|
-| id | All matches unique id |
-| season | Season |
-| match_no | Number of matches |
-| date | Date of Play |
-| player_name | Player Name |
-| O | Overs |
-| M | middle overs |
-| R | Runs |
-| W | Wickets |
-| ECON | The average number of runs they have conceded per over bowled |
-| WD | Wide balls |
-| NB | No balls |
-
-There are 19 features in `BPL Dataset`
-
-| Feature Name | Description|
-|--------------|------------|
-| id | All matches unique id |
-| season | Season |
-| match_no | Number of matches |
-| date | Date of Play |
-| team_1 | First Team |
-| team_1_score | First Team Score |
-| team_2 | Second Team |
-| team_2_score | Second Team Score |
-| player_of_match | Which team won the toss? |
-| toss_winner | Which team won the toss? |
-| toss_decision | Toss winner team decision |
-| winner | Match Winner |
-| venue | Venue |
-| city | City |
-| win_by_wickets | Win by wickets. |
-| win_by_runs | Win by runs |
-| result | Result of the winner |
-| umpire_1 | First Umpire Name |
-| umpire_2 | Second Umpire Name |
-
-### WHAT I HAVE DONE
-
-* Performed Exploratory Data Analysis on data.
-* Created data visualisations to understand the data in a better way.
-* Found strong relationships between independent features and dependent feature using correlation.
-* Handled missing values using strong correlations,dropping unnecessary ones.
-* Used different Regression techniques like Linear Regression,Ridge Regression,Lasso Regression and deep neural networks to predict the dependent feature in most suitable manner.
-* Compared various models and used best performance model to make predictions.
-* Used Mean Squared Error and R2 Score for evaluating model's performance.
-* Visualized best model's performance using matplotlib and seaborn library.
-
-### PROJECT TRADE-OFFS AND SOLUTIONS
-
-
-1. **Trade-off 1**: Handling missing and inconsistent data entries.
- - **Solution**:
- - **Data Imputation**: For missing numerical values, I used techniques such as mean, median, or mode imputation based on the distribution of the data.
- - **Data Cleaning**: For inconsistent entries, I standardized the data by removing duplicates, correcting typos, and ensuring uniform formatting.
- - **Dropping Irrelevant Data**: In cases where the missing data was extensive and could not be reliably imputed, I decided to drop those rows/columns to maintain data integrity.
-
-2. **Trade-off 2**: Extracting target variables from the dataset.
- - **Solution**:
- - **Feature Engineering**: Created new features that could serve as target variables, such as aggregating player statistics to determine top performers.
- - **Domain Knowledge**: Utilized cricket domain knowledge to identify relevant metrics (e.g., strike rate, economy rate) and used them to define target variables.
- - **Label Encoding**: For categorical target variables (e.g., player categories like best, good, average, poor), I used label encoding techniques to convert them into numerical format for analysis.
-
-3. **Trade-off 3**: Creating clear and informative visualizations that effectively communicate the findings.
- - **Solution**:
- - **Tool Selection**: Used powerful visualization tools like Matplotlib and Seaborn in Python, which provide a wide range of customization options.
- - **Visualization Best Practices**: Followed best practices such as using appropriate chart types (e.g., bar charts for categorical data, scatter plots for correlations), adding labels and titles, and ensuring readability.
- - **Iterative Refinement**: Iteratively refined visualizations based on feedback and self-review to enhance clarity and informativeness.
-
-4. **Trade-off 4**: Correctly interpreting the results to provide actionable insights.
- - **Solution**:
- - **Cross-validation**: Used cross-validation techniques to ensure the reliability and accuracy of the analysis results.
- - **Collaboration with Experts**: Engaged with cricket experts and enthusiasts to validate the findings and gain additional perspectives.
- - **Contextual Understanding**: Interpreted results within the context of the game, considering factors such as player roles, match conditions, and historical performance to provide meaningful and actionable insights.
-
-### LIBRARIES NEEDED
-
-- matplotlib
-- pandas
-- sklearn
-- seaborn
-- numpy
-- scipy
-- xgboost
-- Tensorflow
-- Keras
-
-### SCREENSHOTS
-
-
-
-
-
-
-
-
-
-
-### MODELS USED AND THEIR ACCURACIES
-
-| Model | MSE | R2 |
-|------------------------------|------------|------------|
-| Random Forest Regression | 19.355984 | 0.371316 |
-| Gradient Boosting Regression | 19.420494 | 0.369221 |
-| XG Boost Regression | 21.349168 | 0.306577 |
-| Ridge Regression | 26.813981 | 0.129080 |
-| Linear Regression | 26.916888 | 0.125737 |
-| Deep Neural Network | 27.758216 | 0.098411 |
-| Decision Tree Regression | 29.044533 | 0.056631 |
-
-### MODELS COMPARISON GRAPHS
-
-
-
-
-## CONCLUSION
-
-* Here we can see that R2 Score and Mean Absolute Error is best for Random Forest Regression.
-* By Using Neural network, We cannot get the minimum Mean Squared Error value possible.
-* Here, Random Forest Regression model can predict most accurate results for predicting bangladesh premier league winning team which is the highest model performance in comparison with other Models.
-
-
-
-### WHAT YOU HAVE LEARNED
-
-- **Insights gained from the data**:
- - Identified key performance indicators for players in the Bangladesh Premier League, such as top scorers, best bowlers, and players with the most man of the match awards.
- - Discovered trends and patterns in player performances that could inform future strategies and training programs.
- - Gained a deeper understanding of the distribution of player performances across different matches and seasons.
-
-- **Improvements in understanding machine learning concepts**:
- - Enhanced knowledge of data cleaning and preprocessing techniques to handle real-world datasets.
- - Improved skills in exploratory data analysis (EDA) to extract meaningful insights from raw data.
- - Learned how to use visualization tools to effectively communicate data-driven findings.
-
-### USE CASES OF THIS MODEL
-
-1. **Application 1: Team Selection and Strategy Planning**:
- - **Explanation**: Coaches and team managers can use the model to analyze player performance data and make informed decisions about team selection and match strategies. By identifying top performers and areas for improvement, the model can help optimize team composition and tactics for future matches.
-
-2. **Application 2: Player Performance Monitoring and Training**:
- - **Explanation**: The model can be used to track player performance over time and identify trends in their performance. This information can be used by coaches to tailor training programs to address specific weaknesses and enhance overall player development. By monitoring performance metrics, the model can help ensure that players are continuously improving.
-
-### HOW TO INTEGRATE THIS MODEL IN REAL WORLD
-
-1. Prepare the data pipeline
-2. Deploy the model using appropriate tools (e.g., Flask, Docker)
-3. Monitor and maintain the model in production
-
-### FEATURES PLANNED BUT NOT IMPLEMENTED
-
-- **Feature 1: Real-time Performance Tracking**:
- - **Description**: Implementing a real-time tracking system to update player performance metrics during live matches.
- - **Reason it couldn't be implemented**: Lack of access to live data streams and the complexity of integrating real-time data processing.
-
-- **Feature 2: Advanced Predictive Analytics**:
- - **Description**: Using advanced machine learning algorithms to predict future player performances and match outcomes.
- - **Reason it couldn't be implemented**: Constraints in computational resources and the need for more sophisticated modeling techniques that were beyond the current scope of the project.
-
-### YOUR NAME
-*Avdhesh Varshney*
-
-[](https://www.linkedin.com/in/avdhesh-varshney/)
-
-
-#### Happy Coding 🧑💻
-### Show some ❤️ by 🌟 this repository!
-
diff --git a/docs/Pre-Processing/projects/bangladesh_premier_league_analysis/assets/conclusion-_graph.png b/docs/Pre-Processing/projects/bangladesh_premier_league_analysis/assets/conclusion-_graph.png
deleted file mode 100644
index 737b20f5..00000000
Binary files a/docs/Pre-Processing/projects/bangladesh_premier_league_analysis/assets/conclusion-_graph.png and /dev/null differ
diff --git a/docs/Pre-Processing/projects/bangladesh_premier_league_analysis/assets/r_f_regression_plot.png b/docs/Pre-Processing/projects/bangladesh_premier_league_analysis/assets/r_f_regression_plot.png
deleted file mode 100644
index 2a1d3e7a..00000000
Binary files a/docs/Pre-Processing/projects/bangladesh_premier_league_analysis/assets/r_f_regression_plot.png and /dev/null differ
diff --git a/docs/Pre-Processing/projects/bangladesh_premier_league_analysis/assets/top-5-_batsman-_runners.png b/docs/Pre-Processing/projects/bangladesh_premier_league_analysis/assets/top-5-_batsman-_runners.png
deleted file mode 100644
index 8118eadc..00000000
Binary files a/docs/Pre-Processing/projects/bangladesh_premier_league_analysis/assets/top-5-_batsman-_runners.png and /dev/null differ
diff --git a/docs/Pre-Processing/projects/bangladesh_premier_league_analysis/assets/top-5-_four-_runners.png b/docs/Pre-Processing/projects/bangladesh_premier_league_analysis/assets/top-5-_four-_runners.png
deleted file mode 100644
index 5aa1e599..00000000
Binary files a/docs/Pre-Processing/projects/bangladesh_premier_league_analysis/assets/top-5-_four-_runners.png and /dev/null differ
diff --git a/docs/Pre-Processing/projects/bangladesh_premier_league_analysis/assets/top-5-_overs.png b/docs/Pre-Processing/projects/bangladesh_premier_league_analysis/assets/top-5-_overs.png
deleted file mode 100644
index af118ce7..00000000
Binary files a/docs/Pre-Processing/projects/bangladesh_premier_league_analysis/assets/top-5-_overs.png and /dev/null differ
diff --git a/docs/Pre-Processing/projects/bangladesh_premier_league_analysis/assets/top-5-_player-of-_match.png b/docs/Pre-Processing/projects/bangladesh_premier_league_analysis/assets/top-5-_player-of-_match.png
deleted file mode 100644
index edfcbb4e..00000000
Binary files a/docs/Pre-Processing/projects/bangladesh_premier_league_analysis/assets/top-5-_player-of-_match.png and /dev/null differ
diff --git a/docs/Pre-Processing/projects/bangladesh_premier_league_analysis/assets/top-5-_runs.png b/docs/Pre-Processing/projects/bangladesh_premier_league_analysis/assets/top-5-_runs.png
deleted file mode 100644
index 53b8b30e..00000000
Binary files a/docs/Pre-Processing/projects/bangladesh_premier_league_analysis/assets/top-5-_runs.png and /dev/null differ
diff --git a/docs/Pre-Processing/projects/bangladesh_premier_league_analysis/assets/top-5-_umpires.png b/docs/Pre-Processing/projects/bangladesh_premier_league_analysis/assets/top-5-_umpires.png
deleted file mode 100644
index 7d913bbe..00000000
Binary files a/docs/Pre-Processing/projects/bangladesh_premier_league_analysis/assets/top-5-_umpires.png and /dev/null differ
diff --git a/docs/Pre-Processing/projects/bangladesh_premier_league_analysis/assets/top-5-_wickets.png b/docs/Pre-Processing/projects/bangladesh_premier_league_analysis/assets/top-5-_wickets.png
deleted file mode 100644
index 7a20b716..00000000
Binary files a/docs/Pre-Processing/projects/bangladesh_premier_league_analysis/assets/top-5-_wickets.png and /dev/null differ
diff --git a/docs/Pre-Processing/projects/bangladesh_premier_league_analysis/assets/toss-_winners.png b/docs/Pre-Processing/projects/bangladesh_premier_league_analysis/assets/toss-_winners.png
deleted file mode 100644
index 7a073e2b..00000000
Binary files a/docs/Pre-Processing/projects/bangladesh_premier_league_analysis/assets/toss-_winners.png and /dev/null differ