This SQL script performs data cleaning and exploratory data analysis on a layoffs table to prepare it for further insights. The process includes the following steps:
- Layoffs Dataset: layoffs.csv
- Download the `layoffs.csv` file.
- Import the CSV file into a MySQL database.
- Execute the following SQL script.
- A staging table (`layoffs_staging2`) is created as a copy of the original `layoffs` table.
- The script identifies and removes duplicate rows.
- Leading and trailing whitespace is removed from text columns.
- Inconsistencies in the `industry` and `country` columns are addressed.
- The `date` column is converted to a proper DATE format.
- Null and blank values in key columns are handled.
- Redundant columns are dropped.
- The final cleaned dataset is stored in `layoffs_staging2`.
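
The cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the script itself: it assumes MySQL 8+ (for window functions), that `date` was imported as text in `m/d/Y` form, and that the table has the columns `company`, `location`, `total_laid_off`, `percentage_laid_off`, `stage`, and `funds_raised_millions` in addition to the `industry`, `country`, and `date` columns named above. The helper column `row_num` and the specific values being standardized (`'Crypto'`, a trailing period in `country`) are hypothetical examples.

```sql
-- 1. Staging copy with a helper column used to flag duplicates.
--    Column names other than industry, country and `date` are assumptions.
CREATE TABLE layoffs_staging2 LIKE layoffs;
ALTER TABLE layoffs_staging2 ADD COLUMN row_num INT;

INSERT INTO layoffs_staging2
SELECT l.*,
       ROW_NUMBER() OVER (
           PARTITION BY company, location, industry, total_laid_off,
                        percentage_laid_off, `date`, stage, country,
                        funds_raised_millions
       ) AS row_num
FROM layoffs AS l;

-- 2. Every copy after the first in a group of identical rows has row_num > 1.
DELETE FROM layoffs_staging2
WHERE row_num > 1;

-- 3. Strip leading and trailing whitespace from text columns.
UPDATE layoffs_staging2
SET company  = TRIM(company),
    industry = TRIM(industry),
    country  = TRIM(country);

-- 4. Standardize inconsistent labels (example values are hypothetical).
UPDATE layoffs_staging2
SET industry = 'Crypto'
WHERE industry LIKE 'Crypto%';

UPDATE layoffs_staging2
SET country = TRIM(TRAILING '.' FROM country);

-- 5. Convert the text dates, then change the column type.
UPDATE layoffs_staging2
SET `date` = STR_TO_DATE(`date`, '%m/%d/%Y');

ALTER TABLE layoffs_staging2
MODIFY COLUMN `date` DATE;

-- 6. One possible way to handle blanks: turn them into NULL, then
--    backfill industry from another row of the same company.
UPDATE layoffs_staging2
SET industry = NULL
WHERE industry = '';

UPDATE layoffs_staging2 AS t1
JOIN layoffs_staging2 AS t2
  ON t1.company = t2.company
SET t1.industry = t2.industry
WHERE t1.industry IS NULL
  AND t2.industry IS NOT NULL;

-- 7. Drop the helper column once it is no longer needed.
ALTER TABLE layoffs_staging2
DROP COLUMN row_num;
```

If the client runs in safe-update mode (the MySQL Workbench default), the `UPDATE` and `DELETE` statements without a key in the `WHERE` clause may be rejected until that mode is turned off for the session.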
The script also includes exploratory data analysis to provide insights into the cleaned data. Key areas of investigation include:
- Identifying companies and industries with the highest layoff numbers.
- Analyzing layoffs as a percentage of company size.
- Examining the relationship between funds raised and layoffs.
- Calculating monthly layoff trends.
- Determining the top 5 companies with the most layoffs per year.
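
A few of these analyses might look like the queries below. They are a sketch under assumptions rather than the script's actual queries: the columns `company`, `total_laid_off`, `percentage_laid_off`, and `funds_raised_millions` are assumed names, `date` is assumed to already be a DATE column, and MySQL 8+ is assumed for the common table expressions and `DENSE_RANK()`.

```sql
-- Companies with the highest total layoffs (assumed column names).
SELECT company,
       SUM(total_laid_off) AS company_total
FROM layoffs_staging2
GROUP BY company
ORDER BY company_total DESC
LIMIT 10;

-- Companies that laid off their entire workforce, ordered by funds raised.
-- Assumes percentage_laid_off stores a fraction, so 1 means 100%.
SELECT company,
       percentage_laid_off,
       funds_raised_millions
FROM layoffs_staging2
WHERE percentage_laid_off = 1
ORDER BY funds_raised_millions DESC;

-- Monthly layoff totals.
SELECT DATE_FORMAT(`date`, '%Y-%m') AS layoff_month,
       SUM(total_laid_off)          AS monthly_total
FROM layoffs_staging2
WHERE `date` IS NOT NULL
GROUP BY layoff_month
ORDER BY layoff_month;

-- Top 5 companies by layoffs within each year.
WITH company_year AS (
    SELECT company,
           YEAR(`date`)        AS layoff_year,
           SUM(total_laid_off) AS year_total
    FROM layoffs_staging2
    WHERE `date` IS NOT NULL
    GROUP BY company, YEAR(`date`)
),
ranked AS (
    SELECT company,
           layoff_year,
           year_total,
           DENSE_RANK() OVER (
               PARTITION BY layoff_year
               ORDER BY year_total DESC
           ) AS year_rank
    FROM company_year
)
SELECT company, layoff_year, year_total, year_rank
FROM ranked
WHERE year_rank <= 5
ORDER BY layoff_year, year_rank;
```

Because `DENSE_RANK()` gives tied companies the same rank, a year can return more than five rows; `ROW_NUMBER()` would cap each year at exactly five at the cost of breaking ties arbitrarily.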
