Spending Predictor

A machine learning project to predict customer spending patterns using sales transaction data.

🔗 Database

Kaggle Database by Vineet Bahl: https://www.kaggle.com/datasets/thedevastator/analyzing-customer-spending-habits-to-improve-sa

📊 Project Overview

This project analyzes sales transaction data to build predictive models for customer spending behavior. The pipeline includes data cleaning, exploratory data analysis, feature engineering, and machine learning model training.

🚀 Quick Start

Prerequisites

Python 3.13+
Jupyter Lab
Required Python packages (see Installation)

Installation

Clone the repository

git clone <repository-url>
cd spending-predictor

Install required packages

pip3 install pandas matplotlib seaborn scikit-learn joblib --user --break-system-packages

Start Jupyter Lab
```
jupyter-lab
```

📁 Project Structure

spending-predictor/
├── data/
│   ├── raw/
│   │   └── SalesForCourse_quizz_table.csv
│   └── processed/
│       └── cleaned_data.csv
├── notebooks/
│   ├── 01_eda.ipynb
│   └── 02_preprocess.ipynb
├── src/
│   ├── preprocess.py
│   └── train.py
├── models/
│   └── spending_predictor.joblib
├── outputs/
└── README.md

🔧 Usage

Data Preprocessing

from src.preprocess import clean_data, save_clean_data

# Load raw data
df = pd.read_csv('data/raw/SalesForCourse_quizz_table.csv')

# Clean and preprocess
cleaned_df = clean_data(df)

# Save processed data
save_clean_data(cleaned_df)

Model Training

from src.train import prepare_features, train_model, evaluate_model

# Prepare features
X, y = prepare_features(cleaned_df)

# Train model
model = train_model(X_train, y_train, model_type='linear')

# Evaluate
metrics = evaluate_model(model, X_test, y_test)

Running the Pipeline

# Run preprocessing
python src/preprocess.py

# Run training
python src/train.py

🛠️ Dependencies

pandas (2.3.1): Data manipulation and analysis
matplotlib (3.10.3): Data visualization
seaborn (0.13.2): Statistical data visualization
scikit-learn (1.7.1): Machine learning algorithms
joblib (1.5.1): Model persistence and parallel processing
numpy (2.3.2): Numerical computing
jupyter-lab (4.4.5): Interactive development environment

📊 Data

The project uses sales transaction data containing:

Customer demographics (age, gender, location)
Product information (category, sub-category)
Transaction details (date, quantity, revenue, profit)
Geographic information (country, state)

🆘 Support

Check me out on irla.dev!

Note: This project is designed for educational and research purposes. Always validate model predictions in real-world applications.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Spending Predictor

🔗 Database

📊 Project Overview

🚀 Quick Start

Prerequisites

Installation

📁 Project Structure

🔧 Usage

Data Preprocessing

Model Training

Running the Pipeline

🛠️ Dependencies

📊 Data

🆘 Support

About

Uh oh!

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
data		data
models		models
notebooks		notebooks
outputs/figures		outputs/figures
src		src
.gitignore		.gitignore
README.md		README.md

Ddomir/Revenue-Predictor

Folders and files

Latest commit

History

Repository files navigation

Spending Predictor

🔗 Database

📊 Project Overview

🚀 Quick Start

Prerequisites

Installation

📁 Project Structure

🔧 Usage

Data Preprocessing

Model Training

Running the Pipeline

🛠️ Dependencies

📊 Data

🆘 Support

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages