End-to-end OLTP → Staging → Data Warehouse implementation for a retail and rental business. The project demonstrates dimensional modeling, SCD Type 2 handling, incremental and late-arriving fact loads, analytical aggregates, and automated ETL validation.
This repository contains a deterministic full-build SQL pipeline, Python utilities for execution and testing, and schema documentation.
-
zagi_dw_full_build.sql
A single, ordered SQL script that:- Resets and rebuilds the OLTP schema
- Creates staging (DS) and warehouse (DW) schemas
- Loads SCD Type 2 dimensions
- Generates the calendar dimension
- Loads fact tables via an intermediate fact table (IFT)
- Builds analytical aggregates and snapshots
- Defines ETL refresh and validation procedures
-
db_example.py
Sanitized example database configuration for managing MySQL connections. Credentials and host details are intentionally redacted. Used by the ETL test harness. -
test_etl.py
Automated ETL test runner that:- Executes ETL refresh procedures
- Compares DS vs DW row counts
- Logs results to
ETL_Test_Logfor auditability
-
diagrams/- Core schema diagram: OLTP → DS → DW (no aggregates)
- Full schema diagram: Includes aggregates, snapshots, and ETL metadata
The warehouse follows a layered design:
-
OLTP layer
Normalized transactional tables for sales and rentals -
Staging layer (DS)
Cleansed, conformed dimensions with SCD Type 2 handling and an intermediate fact table -
Data warehouse (DW)
Star schema centered onRevenueFact -
Analytical layer
Pre-aggregated fact tables and daily snapshots for performance and reporting
OLTP SOURCE TABLES
(vendor, category, product,
salestransaction, rentaltransaction, rentvia, soldvia)
|
v
+----------------------------+
| STAGING (DS) LAYER |
|----------------------------|
| - CustomerDimension (SCD2) |
| - StoreDimension (SCD2) |
| - ProductDimension (SCD2) |
| - CalendarDimension |
| - IFT (Intermediate Fact) |
+----------------------------+
|
| Initial Load
| DailyFactLoad
| LateArrivingFactLoad
v
+----------------------------+
| DATA WAREHOUSE (DW) |
|----------------------------|
| - RevenueFact |
| - CustomerDimension (SCD2) |
| - StoreDimension (SCD2) |
| - ProductDimension (SCD2) |
| - CalendarDimension |
+----------------------------+
|
v
+----------------------------+
| ANALYTICAL LAYER |
|----------------------------|
| - RevenueAggByRegion |
| - RevenueAggByCustomerZip |
| - RevenueAggByProductCat |
| - Daily_Store_Snapshot |
+----------------------------+
|
v
+----------------------------+
| ETL VALIDATION |
|----------------------------|
| - test_etl.py |
| - ETL_Test_Log |
| - DS vs DW row checks |
+----------------------------+
Two schema diagrams are provided to distinguish between warehouse layers:
Shows OLTP sources, staging (DS) SCD Type 2 dimensions, and the primary star schema
centered on RevenueFact. This represents the logical and physical foundation
of the warehouse.
- Core Schema (OLTP → DS → DW)
Shows the complete analytical layer, including revenue aggregates, daily store snapshots, and ETL metadata tables used for validation and monitoring.
- Full Warehouse Schema (Analytical Layer)
- Customer, Store, and Product dimensions retain full history
- Facts join to the correct dimensional version via validity windows
- Normalizes sales and rental transactions into a unified fact structure
- Simplifies fact loading and late-arriving data handling
DailyFactLoadfor routine loadsLateArrivingFactLoadfor delayed transactions
- Pre-aggregated revenue tables
- Daily store performance snapshots
- DS vs DW row-count checks
- Duplicate detection
- Calendar integrity checks
- Results logged to
ETL_Test_Log
Run the full build script in MySQL 8.0+ to create and populate the OLTP, staging (DS), and data warehouse (DW) schemas:
SOURCE zagi_oltp_to_dw_full_pipeline.sql;Execute the Python ETL test harness to verify successful loads and log results:
python3 test_etl.py