
This repository contains Python implementations of statistical hypothesis tests applied to synthetic business sales data. It covers parametric and non-parametric tests that support data-driven decision-making in business analytics, along with visualizations and assumption checks.


Nasif-ctrl/Hypothesis-Testing-in-Business-Analytics-using-Python


Hypothesis Testing in Business Analytics using Python

📝 Overview

This project explores a suite of statistical hypothesis tests, encompassing both parametric and non-parametric methods, to evaluate differences between groups, relationships between variables, and distribution characteristics.

Statistical Tests Applied

| Parametric Tests | Non-Parametric Tests |
| --- | --- |
| Two-Sample T-Test (Independent Groups) | Mann-Whitney U Test |
| Paired T-Test (Before vs. After Campaign) | Wilcoxon Signed-Rank Test |
| One-Way ANOVA | Kruskal-Wallis H Test |
| Pearson Correlation Coefficient | Spearman Rank Correlation Coefficient |
| | Chi-Square Test of Independence |
| | Chi-Square Goodness-of-Fit Test |
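
As a rough illustration of how the tests listed above map onto SciPy calls, the sketch below runs each one on small synthetic samples. The arrays, counts, and seed are made up for this example and are not taken from sales_data; the notebook in this repository may structure its calls differently.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)  # fixed seed so the sketch is reproducible

# Hypothetical synthetic samples (NOT the repository's sales_data.csv).
group_a = rng.normal(loc=100, scale=15, size=60)
group_b = rng.normal(loc=105, scale=15, size=60)
group_c = rng.normal(loc=110, scale=15, size=60)
before = rng.normal(loc=200, scale=20, size=60)
after = before + rng.normal(loc=10, scale=5, size=60)

p_values = {}

# Two independent groups: parametric test and its rank-based counterpart.
p_values["two-sample t-test"] = stats.ttest_ind(group_a, group_b).pvalue
p_values["Mann-Whitney U"] = stats.mannwhitneyu(
    group_a, group_b, alternative="two-sided"
).pvalue

# Paired measurements (before vs. after).
p_values["paired t-test"] = stats.ttest_rel(before, after).pvalue
p_values["Wilcoxon signed-rank"] = stats.wilcoxon(before, after).pvalue

# Three or more independent groups.
p_values["one-way ANOVA"] = stats.f_oneway(group_a, group_b, group_c).pvalue
p_values["Kruskal-Wallis H"] = stats.kruskal(group_a, group_b, group_c).pvalue

# Association between two numeric variables.
_, p_values["Pearson correlation"] = stats.pearsonr(before, after)
_, p_values["Spearman correlation"] = stats.spearmanr(before, after)

# Chi-square test of independence on a hypothetical 2x2 contingency table.
contingency = np.array([[30, 20], [25, 25]])
_, p_independence, _, _ = stats.chi2_contingency(contingency)
p_values["chi-square independence"] = p_independence

# Chi-square goodness-of-fit against equal expected counts (the default).
observed_counts = np.array([50, 30, 20])
_, p_values["chi-square goodness-of-fit"] = stats.chisquare(observed_counts)

for name, p in p_values.items():
    print(f"{name:28s} p = {p:.4f}")
```

Each call returns a test statistic together with a p-value; only the p-values are collected here to keep the comparison compact.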

🧪 Each test is applied considering its assumptions and conditions of validity, supported by preliminary checks such as the Shapiro-Wilk test for normality and Levene’s test for homogeneity of variances.
🧪 The visual representation of data through histograms, Q-Q plots and scatter plots facilitates a clearer understanding of the underlying distributions and assists in the selection of appropriate statistical techniques.
🧪 The dataset used in this study was synthetically generated to simulate realistic sales and regional data patterns. Coding was conducted using Google Colab, a cloud-based Python environment that allows seamless integration with Google Drive and simplifies code execution and data handling. Jupyter Notebook could also be used as an alternative.
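
The assumption checks mentioned above can feed directly into the choice of test. The following is a minimal sketch of one possible decision rule, not necessarily the one used in the notebook: the helper name `choose_two_sample_test` is hypothetical, and falling back to Welch's t-test when Levene's test rejects equal variances is an assumption added here for illustration.

```python
import numpy as np
from scipy import stats

ALPHA = 0.05  # 5% significance level, as used throughout this README


def choose_two_sample_test(x, y, alpha=ALPHA):
    """Pick a two-sample test from assumption checks; return (name, p_value)."""
    # Shapiro-Wilk: H0 is that the sample comes from a normal distribution.
    normal_x = stats.shapiro(x)[1] > alpha
    normal_y = stats.shapiro(y)[1] > alpha
    # Levene: H0 is that the groups have equal variances.
    equal_var = stats.levene(x, y)[1] > alpha

    if normal_x and normal_y:
        # equal_var=False switches ttest_ind to Welch's t-test (assumption:
        # Welch is an acceptable fallback; the README does not mention it).
        p_value = stats.ttest_ind(x, y, equal_var=equal_var).pvalue
        name = "Student t-test" if equal_var else "Welch t-test"
    else:
        # Normality rejected for at least one group: use the rank-based test.
        p_value = stats.mannwhitneyu(x, y, alternative="two-sided").pvalue
        name = "Mann-Whitney U"
    return name, p_value


rng = np.random.default_rng(7)

# Hypothetical revenue-like samples for two regions.
region_1 = rng.normal(loc=500, scale=50, size=80)
region_2 = rng.normal(loc=510, scale=50, size=80)
print(choose_two_sample_test(region_1, region_2))

# Strongly right-skewed samples, where normality should be rejected.
skewed_1 = rng.exponential(scale=100, size=200)
skewed_2 = rng.exponential(scale=120, size=200)
print(choose_two_sample_test(skewed_1, skewed_2))
```

Histograms and Q-Q plots remain useful alongside these checks, since with large samples Shapiro-Wilk can reject normality for deviations that are too small to matter in practice.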

📂 Contents

| File Name | File Type | Description |
| --- | --- | --- |
| README | MD | Read this before anything else |
| sales_data | CSV | Dataset generated for analysis |
| Code_Hypothesis_Tests_in_Python | IPYNB | Python Notebook |

▶️ How to Execute the Program

Before executing the program, download the IPYNB file (Code_Hypothesis_Tests_in_Python) from this repository. Afterwards, follow these steps:

If you are using Google Colab:

• Open a browser and go to https://colab.research.google.com.
• Click on File > Upload Notebook.
• Select and open the downloaded IPYNB file.
• Click on the run button adjacent to each code snippet to run the code.

If you are using Jupyter Notebook:

• If you don’t have Anaconda or Jupyter Notebook installed, visit: https://www.anaconda.com and download the installer appropriate for your OS.
• After downloading, double-click on the downloaded file and follow the on-screen instructions to complete the installation process.
• Locate and run the program ‘Anaconda Prompt’.
• Navigate to the folder containing the downloaded IPYNB file and start Jupyter Notebook from there. For instance, if the file is in a folder called PythonCode on Local Disk (D:), change to that folder (cd /d D:\PythonCode) and then enter jupyter notebook at the D:\PythonCode> prompt.
• After opening the IPYNB file, select the code snippets and click on Run to run the code.

🔍 Observation

Dataset Summary: The dataset consists of 360 entries across 8 columns, including region, campaign_type, before_sales, after_sales, ad_spend, and revenue, allowing for comprehensive statistical analysis of marketing effectiveness.

Notable Statistical Test Insights:
✅ Paired t-test showed a significant difference between before and after sales (p < 0.000001), which is consistent with the campaigns having an effect on sales.
❌ Independent t-test found no significant revenue difference between Asia and Europe (p = 0.8708).
❌ One-way ANOVA revealed no significant revenue variation among Email, Social Media, and TV campaigns (p = 0.5067).
❌ Pearson correlation showed no significant linear relationship between ad spend and revenue; a rank-based alternative such as Spearman correlation may be more suitable if the relationship is monotonic but non-linear.
❌ Chi-square test of independence found no significant association between gender and email open rates.
✅ Chi-square goodness-of-fit test showed that campaign types are not equally distributed, indicating intentional or skewed targeting.
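
To illustrate why a rank-based correlation can behave differently from Pearson's coefficient, the sketch below uses made-up data in which revenue grows exponentially with ad spend. The variable names echo the dataset's columns, but the values are synthetic and say nothing about the actual results reported above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical data: revenue is a noise-free, strictly increasing but
# strongly non-linear function of ad spend.
ad_spend = rng.uniform(1, 10, size=100)
revenue = np.exp(ad_spend)

# Pearson measures the strength of a LINEAR relationship.
pearson_r, pearson_p = stats.pearsonr(ad_spend, revenue)

# Spearman measures the strength of a MONOTONIC relationship (it is the
# Pearson correlation of the ranks).
spearman_rho, spearman_p = stats.spearmanr(ad_spend, revenue)

print(f"Pearson r    = {pearson_r:.3f}")
print(f"Spearman rho = {spearman_rho:.3f}")
```

Because the relationship here is perfectly monotonic, the ranks of the two variables agree exactly and Spearman's coefficient reaches its maximum, while Pearson's coefficient is pulled down by the curvature. If the real relationship is not monotonic at all, neither coefficient will capture it, and a scatter plot is the better first step.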

These observations help in refining marketing strategies and choosing appropriate statistical methods for deeper insights.

📌 Things to Keep in Mind

  • All tests use a 5% significance level (α = 0.05), following common convention.
  • Ensure your data doesn't contain NaN values when performing tests or plotting.
  • Modify the file path if you're reading the dataset from your own Drive in Colab.
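
The first two points above can be sketched as follows. The helper `decide` and the six-row sample are hypothetical and exist only for this example; with a pandas DataFrame, `DataFrame.dropna` would play the role of the NumPy mask used here.

```python
import numpy as np
from scipy import stats

ALPHA = 0.05  # 5% significance level


def decide(p_value, alpha=ALPHA):
    """Translate a p-value into a decision about the null hypothesis H0."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


# Hypothetical paired measurements with missing values (NaN).
before_sales = np.array([120.0, 135.0, np.nan, 150.0, 142.0, 128.0])
after_sales = np.array([131.0, 140.0, 155.0, np.nan, 151.0, 139.0])

# For a paired test, drop a pair if EITHER value is missing so that the
# remaining observations stay aligned.
complete = ~(np.isnan(before_sales) | np.isnan(after_sales))
before_clean = before_sales[complete]
after_clean = after_sales[complete]

result = stats.ttest_rel(before_clean, after_clean)
dropped = len(before_sales) - len(before_clean)
print(f"{dropped} pairs dropped, p = {result.pvalue:.4f}: {decide(result.pvalue)}")
```

Dropping incomplete pairs before testing keeps the behaviour explicit, rather than relying on how a particular function treats NaN values.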
