Skip to content

End-to-end OLTP to data warehouse implementation with SCD Type 2 dimensions, incremental ETL, and analytical aggregates.

Notifications You must be signed in to change notification settings

Thooms-coder/zagi-data-warehouse

Repository files navigation

ZAGI Data Warehouse (ZAGI_DW)

End-to-end OLTP → Staging → Data Warehouse implementation for a retail and rental business. The project demonstrates dimensional modeling, SCD Type 2 handling, incremental and late-arriving fact loads, analytical aggregates, and automated ETL validation.

This repository contains a deterministic full-build SQL pipeline, Python utilities for execution and testing, and schema documentation.


Repository Contents

  • zagi_dw_full_build.sql
    A single, ordered SQL script that:

    • Resets and rebuilds the OLTP schema
    • Creates staging (DS) and warehouse (DW) schemas
    • Loads SCD Type 2 dimensions
    • Generates the calendar dimension
    • Loads fact tables via an intermediate fact table (IFT)
    • Builds analytical aggregates and snapshots
    • Defines ETL refresh and validation procedures
  • db_example.py
    Sanitized example database configuration for managing MySQL connections. Credentials and host details are intentionally redacted. Used by the ETL test harness.

  • test_etl.py
    Automated ETL test runner that:

    • Executes ETL refresh procedures
    • Compares DS vs DW row counts
    • Logs results to ETL_Test_Log for auditability
  • diagrams/

    • Core schema diagram: OLTP → DS → DW (no aggregates)
    • Full schema diagram: Includes aggregates, snapshots, and ETL metadata

Architecture Overview

The warehouse follows a layered design:

  • OLTP layer
    Normalized transactional tables for sales and rentals

  • Staging layer (DS)
    Cleansed, conformed dimensions with SCD Type 2 handling and an intermediate fact table

  • Data warehouse (DW)
    Star schema centered on RevenueFact

  • Analytical layer
    Pre-aggregated fact tables and daily snapshots for performance and reporting


ETL Flow Overview

OLTP SOURCE TABLES
(vendor, category, product,
 salestransaction, rentaltransaction, rentvia, soldvia)
        |
        v
+----------------------------+
|  STAGING (DS) LAYER        |
|----------------------------|
| - CustomerDimension (SCD2) |
| - StoreDimension (SCD2)    |
| - ProductDimension (SCD2)  |
| - CalendarDimension        |
| - IFT (Intermediate Fact)  |
+----------------------------+
        |
        |  Initial Load
        |  DailyFactLoad
        |  LateArrivingFactLoad
        v
+----------------------------+
|  DATA WAREHOUSE (DW)       |
|----------------------------|
| - RevenueFact              |
| - CustomerDimension (SCD2) |
| - StoreDimension (SCD2)    |
| - ProductDimension (SCD2)  |
| - CalendarDimension        |
+----------------------------+
        |
        v
+----------------------------+
|  ANALYTICAL LAYER          |
|----------------------------|
| - RevenueAggByRegion       |
| - RevenueAggByCustomerZip  |
| - RevenueAggByProductCat   |
| - Daily_Store_Snapshot     |
+----------------------------+
        |
        v
+----------------------------+
|  ETL VALIDATION            |
|----------------------------|
| - test_etl.py              |
| - ETL_Test_Log             |
| - DS vs DW row checks      |
+----------------------------+

Schema Documentation

Two schema diagrams are provided to distinguish between warehouse layers:

Core Schema

Shows OLTP sources, staging (DS) SCD Type 2 dimensions, and the primary star schema centered on RevenueFact. This represents the logical and physical foundation of the warehouse.

  • Core Schema (OLTP → DS → DW)

ZAGI Core Schema

Full Schema (with Aggregates and Snapshots)

Shows the complete analytical layer, including revenue aggregates, daily store snapshots, and ETL metadata tables used for validation and monitoring.

  • Full Warehouse Schema (Analytical Layer)

ZAGI Full Schema


Key Design Features

SCD Type 2 Dimensions

  • Customer, Store, and Product dimensions retain full history
  • Facts join to the correct dimensional version via validity windows

Intermediate Fact Table (IFT)

  • Normalizes sales and rental transactions into a unified fact structure
  • Simplifies fact loading and late-arriving data handling

Incremental and Late-Arriving Fact Loads

  • DailyFactLoad for routine loads
  • LateArrivingFactLoad for delayed transactions

Analytical Optimization

  • Pre-aggregated revenue tables
  • Daily store performance snapshots

Automated ETL Validation

  • DS vs DW row-count checks
  • Duplicate detection
  • Calendar integrity checks
  • Results logged to ETL_Test_Log

Execution

1. Build the Data Warehouse

Run the full build script in MySQL 8.0+ to create and populate the OLTP, staging (DS), and data warehouse (DW) schemas:

SOURCE zagi_oltp_to_dw_full_pipeline.sql;

2. Run ETL Validation Tests

Execute the Python ETL test harness to verify successful loads and log results:

python3 test_etl.py

About

End-to-end OLTP to data warehouse implementation with SCD Type 2 dimensions, incremental ETL, and analytical aggregates.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages