🚀 Sequential Causality in Commodity Price Prediction

Overview

This repository contains notebooks exploring how prediction performance changes when observable market data is modeled as a sequence of intraday observations rather than as a single daily observation.

Developed during the MITSUI&CO. Commodity Prediction Challenge, this work investigates whether modeling the sequential flow of information across global markets improves prediction performance.

📊 View the Notebook on Kaggle


🎯 The Big Idea

Problem

The dataset gives every observation recorded during a day the same date stamp, as if information from all of the following markets were available simultaneously:

Japanese Metals Exchange
London Metals Exchange
US Stock Market
Forex

This ignores a fundamental reality: markets operate sequentially across time zones, and each market has access to different information when trading decisions are made.

Our Sequential Causality Solution

We restructure daily observations into multiple sequential states that match the actual information flow:

Day 1: State 1 → JPX opens (only has overnight info)
Day 1: State 2 → LME opens (now sees JPX results)
Day 1: State 3 → US opens (now sees JPX + LME results)

Key Insight: By training on these sequential states, the model can learn causal relationships:

  • ✅ JPX → LME causality: Model observes JPX state followed by LME state
  • ✅ LME → US causality: Model observes LME state followed by US state
  • ✅ Information accumulation: Each state represents realistic market information
  • ✅ 3x training samples: Three observations per day instead of one

📚 What This Notebook Does

The notebook provides a side-by-side comparison of two approaches:

1. Sequential Intraday Approach

  • Restructures 1,961 daily observations → ~5,883 sequential samples
  • Each day creates 3 intraday states (JPX phase → LME phase → US/FX phase)
  • Model learns temporal causality through sequential training
  • Uses forward-fill to maintain complete market state at each phase
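The restructuring described above can be sketched with pandas. This is a minimal illustration, not the notebook's actual code: it assumes a daily frame indexed by `date_id` whose feature columns start with a market prefix (`JPX`, `LME`, `US`, `FX`), and the names `PHASES` and `to_sequential_states` are hypothetical.

```python
import numpy as np
import pandas as pd

# Assumed column prefixes revealed at each phase; adjust to the real columns.
PHASES = [("JPX",), ("LME",), ("US", "FX")]


def to_sequential_states(daily: pd.DataFrame) -> pd.DataFrame:
    """Expand one row per day into three rows per day, one per market phase."""
    daily = daily.sort_index()
    frames = []
    seen = ()
    for phase, prefixes in enumerate(PHASES, start=1):
        seen = seen + prefixes
        state = daily.copy()
        # Hide markets that have not yet traded at this phase of the day.
        hidden = [c for c in state.columns if not c.startswith(seen)]
        if hidden:
            state[hidden] = np.nan
        state["phase"] = phase
        frames.append(state)

    out = pd.concat(frames).rename_axis("date_id").reset_index()
    out = out.sort_values(["date_id", "phase"], kind="stable").reset_index(drop=True)

    # Forward-fill so hidden markets carry their most recent known value.
    feature_cols = [c for c in out.columns if c not in ("date_id", "phase")]
    out[feature_cols] = out[feature_cols].ffill()
    return out
```

With this layout, the JPX-phase row for a given day holds that day's JPX values alongside the previous day's LME, US, and FX values. The first day's early phases keep `NaN` for markets with no earlier value to carry forward.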

2. Daily Baseline Approach

  • Uses original 1,961 daily observations (traditional 1-row-per-day)
  • All market information aggregated into single daily snapshot
  • Standard modeling approach for comparison

Comparison Analysis

The notebook quantitatively compares both approaches across:

  • Prediction accuracy metrics (MAE, RMSE)
  • Statistical distributions (correlation, error patterns)
  • Target-specific performance (JPX/LME/US markets)
  • Visual analysis (time series plots, error distributions, heatmaps)
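As a rough illustration of the metrics listed above, the following sketch computes MAE and RMSE for two sets of predictions plus the correlation between them. The function name `compare_predictions` and its argument names are hypothetical, and the notebook may compute these quantities differently.

```python
import numpy as np


def compare_predictions(y_true, pred_a, pred_b):
    """Return MAE and RMSE for two prediction vectors and their correlation."""
    y_true, pred_a, pred_b = (
        np.asarray(v, dtype=float) for v in (y_true, pred_a, pred_b)
    )

    def mae(pred):
        return float(np.mean(np.abs(y_true - pred)))

    def rmse(pred):
        return float(np.sqrt(np.mean((y_true - pred) ** 2)))

    return {
        "mae_a": mae(pred_a),
        "mae_b": mae(pred_b),
        "rmse_a": rmse(pred_a),
        "rmse_b": rmse(pred_b),
        # Pearson correlation between the two approaches' predictions.
        "corr_ab": float(np.corrcoef(pred_a, pred_b)[0, 1]),
    }
```

A correlation near 1 between the two prediction vectors suggests that the restructuring changes little for that target, whatever the individual error metrics say.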

Key Results: Lag Forecast Validation

Sequential Model Reduces MAE by About 4% at t+1 and t+4

Our lag forecast analysis (notebook-prediction-comparison/) compares the Sequential and Daily approaches across four forecast horizons using the competition's ground truth labels:

Sequential Model Advantage

| Forecast Horizon | Sequential MAE | Daily MAE | Improvement |
|---|---|---|---|
| t+1 (1 day ahead) | 0.01716 | 0.01785 | +4.0% ✅ |
| t+2 (2 days ahead) | 0.02285 | 0.02268 | -0.7% |
| t+3 (3 days ahead) | 0.02495 | 0.02518 | +0.9% ✅ |
| t+4 (4 days ahead) | 0.02625 | 0.02727 | +3.9% ✅ |
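The Improvement column can be reproduced from the two MAE columns. The formula below, (daily MAE - sequential MAE) / sequential MAE, is an inference from the published figures rather than something the README states; it matches all four rows when rounded to one decimal place.

```python
# (sequential MAE, daily MAE) per horizon, copied from the table above.
results = {
    "t+1": (0.01716, 0.01785),
    "t+2": (0.02285, 0.02268),
    "t+3": (0.02495, 0.02518),
    "t+4": (0.02625, 0.02727),
}


def improvement_pct(sequential_mae, daily_mae):
    """Relative MAE gap in percent; positive means Sequential has lower error."""
    return 100.0 * (daily_mae - sequential_mae) / sequential_mae


for horizon, (seq, daily) in results.items():
    print(f"{horizon}: {improvement_pct(seq, daily):+.1f}%")
```

Note that this uses the sequential MAE as the denominator. Dividing by the daily MAE instead would give slightly smaller percentages.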

Key Findings:

  • ✅ Sequential has lower MAE at three of the four horizons, with the largest gains at the short (t+1) and long (t+4) ends; the Daily baseline is marginally better at t+2
  • ✅ Performance validated across 424 target time series (106 commodities × 4 forecast horizons)
  • ✅ Both approaches show similar degradation patterns as forecast horizon extends
  • 📊 View full lag analysis →

🔬 Research Question

Does restructuring daily data into sequential intraday states improve model performance?

This notebook provides empirical evidence through:

  • Identical algorithms (Recursive Least Squares with feature engineering)
  • Fair comparison (same features, parameters, validation methodology)
  • Quantitative metrics (statistical tests, visual analysis)
  • Reproducible methodology (fully documented process)
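For readers unfamiliar with Recursive Least Squares, here is a minimal textbook version with a forgetting factor. It is a sketch under stated assumptions, not the notebook's implementation: the class name, the `forgetting` and `delta` defaults, and the single-output form are all illustrative choices.

```python
import numpy as np


class RecursiveLeastSquares:
    """Online linear regression updated one sample at a time."""

    def __init__(self, n_features, forgetting=0.99, delta=1000.0):
        self.w = np.zeros(n_features)          # current weight estimate
        self.P = delta * np.eye(n_features)    # inverse-covariance estimate
        self.lam = forgetting                  # 1.0 means no forgetting

    def predict(self, x):
        return float(self.w @ np.asarray(x, dtype=float))

    def update(self, x, y):
        """Fold in one (x, y) pair and return the pre-update prediction error."""
        x = np.asarray(x, dtype=float)
        Px = self.P @ x
        gain = Px / (self.lam + x @ Px)
        error = y - self.w @ x
        self.w = self.w + gain * error
        # P stays symmetric, so x^T P equals (P x)^T.
        self.P = (self.P - np.outer(gain, Px)) / self.lam
        return float(error)
```

Because each update costs only a few matrix-vector products, the same model can be fed either one row per day or three rows per day, which is what makes it convenient for a like-for-like comparison.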

🛠️ How to Use This Notebook

Option 1: Run on Kaggle (Recommended)

  1. Open the notebook: Kaggle Notebook
  2. Fork the notebook: Click "Copy & Edit" to create your own version
  3. Run all cells: The notebook will automatically download competition data
  4. Explore outputs: View comparisons, metrics, and visualizations

Advantages: No setup required, competition data automatically available, free compute resources

Option 2: Run Locally

  1. Clone this repository:

    git clone https://github.com/PatrickRutledge/Mitsui-Public-Notebook.git
    cd Mitsui-Public-Notebook
  2. Download competition data:

    • Visit Competition Data Page
    • Download: train.csv, train_labels.csv, target_pairs.csv, test.csv
    • Place in ./mitsui-commodity-prediction-challenge/ directory
  3. Install dependencies:

    pip install pandas numpy scikit-learn matplotlib seaborn tqdm
  4. Run the notebook:

    jupyter notebook "mitsui-daily-to intraday_public.ipynb"
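Before launching the notebook locally, it can help to confirm that the files from step 2 are where the notebook expects them. The helper below is a hypothetical convenience, not part of the repository; the directory and file names are the ones listed in step 2.

```python
from pathlib import Path

EXPECTED_FILES = ["train.csv", "train_labels.csv", "target_pairs.csv", "test.csv"]


def missing_files(data_dir="./mitsui-commodity-prediction-challenge"):
    """Return the expected competition files that are not present in data_dir."""
    root = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    print("All files found." if not missing else f"Missing: {missing}")
```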

Option 3: Just Read the Analysis

If you want to understand the methodology without running code:

  1. Read the Sequential Causality README for theoretical foundation
  2. View the Kaggle Notebook for full analysis with outputs

📊 Notebook Structure

Part 1: Sequential Intraday Approach (Sections 1-8)

  1. Environment Setup - Data loading and configuration
  2. Data Loading - Competition dataset preparation
  3. Intraday Restructuring - Convert daily → sequential states
  4. Feature Engineering - Temporal features, lag features, technical indicators
  5. Validation - Walk-forward validation with ground truth comparison
  6. Training - Recursive Least Squares with online learning
  7. Prediction - Generate forecasts for sequential approach
  8. Visualization - Model performance analysis
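The walk-forward validation in step 5 can be pictured as an expanding-window split in which the model only ever trains on rows that precede the rows it is tested on. The generator below is a generic sketch; the notebook's exact scheme (window sizes, step length) is not described here, so `initial_train` and `step` are assumed parameters.

```python
def walk_forward_splits(n_samples, initial_train, step=1):
    """Yield (train_indices, test_indices) with training strictly before testing."""
    start = initial_train
    while start < n_samples:
        stop = min(start + step, n_samples)
        # Train on everything seen so far, test on the next block.
        yield range(0, start), range(start, stop)
        start = stop


for train_idx, test_idx in walk_forward_splits(n_samples=10, initial_train=6, step=2):
    print(f"train 0..{train_idx[-1]}, test {test_idx[0]}..{test_idx[-1]}")
```

For the sequential dataset, choosing `initial_train` and `step` as multiples of 3 keeps the three states of a day together on the same side of each split.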

Part 2: Daily Baseline Approach (Section 9)

  • Same pipeline applied to original daily data (no restructuring)
  • Identical algorithms and features for fair comparison

Part 3: Comparison Analysis (Sections 10-11)

  • Section 10: Statistical comparison (correlation, MAE, RMSE, distributions)
  • Section 11: Key findings and insights for practitioners

Visualizations

  • Global information flow diagram
  • Metrics comparison dashboard
  • Error distribution analysis
  • Time series prediction plots
  • Correlation heatmaps

🎓 Key Learnings

This research demonstrates:

✅ Reproducible methodology for evaluating temporal data restructuring
✅ Fair comparison using identical algorithms and features
✅ Quantitative framework for decision-making
✅ Transparent analysis of trade-offs
✅ Practical insights for time series practitioners

Whether sequential restructuring improves performance depends on:

  • Correlation between approaches (high = minimal impact)
  • Target-specific patterns (some markets benefit more)
  • Computational constraints (sequential = 3x training time)
  • Feature engineering quality (can traditional daily match sequential?)

📚 Additional Resources


🀝 Contributing

This research is shared as an educational resource. If you:

  • Find this helpful: Please star ⭐ the repository
  • Have suggestions: Open an issue or discussion
  • Build on this work: We'd love to hear about it!
  • Spot errors: Please let us know so we can correct them

📝 Citation

If you use this work in your research, please reference:

Patrick Rutledge (2025). Sequential Causality in Commodity Price Prediction.
MITSUI&CO. Commodity Prediction Challenge.
https://github.com/PatrickRutledge/Mitsui-Public-Notebook

⚖️ License

This project is shared for educational purposes. The competition data is subject to Kaggle Competition Rules.


🙏 Acknowledgments

  • MITSUI&CO. for hosting this challenging competition
  • AlpacaTech Co., Ltd. for technical support and data creation
  • APILayer for providing Forex data via Exchange Rates API
  • Kaggle community for inspiration and learning resources

Questions? Ideas? Feedback?
Open an issue or start a discussion. We're here to learn together! 🚀
