About the Project: The project relates to applying predictive analytics on customer churn. A major telecom company’s postpaid business of voice-only plans is struggling to maintain its strong foothold in local market because of:
- High churn rate amongst customers leading to a revenue decline of ~500k USD every month
- Decline in overall customer base (high churn rate combined with low acquisition rate), leading to a decline in total market share
- Build a classification model to predict churners one month in advance
- Identify key churn drivers
Company CEO believes that existing models can predict churners precisely, but it’s too late to take any retention actions, as customer usage has significantly declined by then.
- Perform initial data preparation a. Number of customers with zero monthly revenue? b. Number of customers with missing values percentage > 5%? c. Remove outliers for columns ‘UniqueSubs’ and ‘DirectorAssistedCalls’ is any.
- Perform exploratory analysis to analyse customer churn a. Do customers with high overage minutes also have high revenue? b. Does high number of active subscribers lead to low monthly revenue? c. Does credit rating have an impact on churn rate?
- Create additional features to help predict churn a. Percent of current active subs over total subs b. Percent of recurrent charge to monthly charge c. Percent of overage minutes over total monthly minutes
- Build classification model to predict customer churn a. Build a simple logistic regression model to predict churn and evaluate model accuracy on test data set b. Build Random Forest classifier to compare model accuracy over the logistic regression model c. Identify most important features impacting churn (Model evaluation metrics to be used: GINI, AUC, Precision and Recall)
- Use the ‘Prediction Data’ provided to predict churners using the best model identified in step 4
- Calculate lift chart and total monthly revenue saved by targeting top 10-20% of the customers using your best predictive model
1. Exploratory data analysis
a. Missing value identification and treatment, outlier detection, etc.
b. Univariate analysis - histograms to check distribution, box/violin plot, summary stats such as mean, median, mode, etc.
c. Correlation analysis, scatter plots
d. Time analysis to monitor trend and seasonality
2. Feature engineering
a. Target variable creation (dependent variable) - align with the objectives
b. New feature creation based on business and the objectives
c. Trend variables such as moving average, st. dev. over time, etc.
3. ML model build
a. Model type – classification vs regression
b. Algorithm type – logistic regression, linear regression, Random forest, etc.
c. Train and test data split (e.g., 70:30 split)
d. Hyperparameter tuning for model optimization
e. Feature importance
4. Model evaluation
a. Model evaluation metrics
b. Regression – R-Square, Mean Absolute Error (MAE), Mean Square Error
c. Classification – Confusion metrics, precision, recall, GINI, AUC, etc.
d. Lift and Gain charts
e. Prediction data set for model validation
