This repository contains notebooks exploring the effects of treating observable market data as daily observations versus intraday observations.
Developed during the MITSUI&CO. Commodity Prediction Challenge, this work investigates whether modeling the sequential flow of information across global markets improves prediction performance.
View the Notebook on Kaggle
The dataset stamps every observation from a given day with the same date, as if information from all of these markets were available simultaneously:
Japanese Metals Exchange
London Metals Exchange
US Stock Market
Forex
This ignores a fundamental reality: markets operate sequentially across time zones, and each market has access to different information when trading decisions are made.
We restructure daily observations into multiple sequential states that match the actual information flow:
Day 1, State 1: JPX opens (only has overnight info)
Day 1, State 2: LME opens (now sees JPX results)
Day 1, State 3: US opens (now sees JPX + LME results)
Key Insight: By training on these sequential states, the model can learn causal relationships:
- JPX → LME causality: Model observes JPX state followed by LME state
- LME → US causality: Model observes LME state followed by US state
- Information accumulation: Each state represents realistic market information
- 3x training samples: Three observations per day instead of one
The notebook provides a side-by-side comparison of two approaches:
Sequential approach:
- Restructures 1,961 daily observations → ~5,883 sequential samples
- Each day creates 3 intraday states (JPX phase → LME phase → US/FX phase)
- Model learns temporal causality through sequential training
- Uses forward-fill to maintain complete market state at each phase
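
The restructuring described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the column names (`jpx_close`, `lme_close`, `us_close`), the `PHASES` mapping, and the helper `to_sequential_states` are hypothetical. It assumes each phase reveals that day's values for its own market, while columns not yet observed carry the most recent earlier value via forward-fill.

```python
import numpy as np
import pandas as pd

# Hypothetical mapping: which columns become known at each intraday phase.
PHASES = {
    1: ["jpx_close"],  # State 1: JPX phase
    2: ["lme_close"],  # State 2: LME phase
    3: ["us_close"],   # State 3: US/FX phase
}


def to_sequential_states(daily: pd.DataFrame) -> pd.DataFrame:
    """Expand one row per day into three rows per day, one per phase."""
    frames = []
    revealed = []
    for state, cols in PHASES.items():
        revealed = revealed + cols
        # Start with everything unknown, then reveal what this phase has seen.
        frame = pd.DataFrame(np.nan, index=daily.index, columns=daily.columns)
        frame[revealed] = daily[revealed]
        frame["state"] = state
        frames.append(frame)

    out = pd.concat(frames).rename_axis("day").reset_index()
    out = out.sort_values(["day", "state"]).reset_index(drop=True)

    # Forward-fill so each row holds the latest known value of every column.
    value_cols = list(daily.columns)
    out[value_cols] = out[value_cols].ffill()
    return out


# Illustrative data only: two days of made-up prices.
daily = pd.DataFrame(
    {
        "jpx_close": [1.0, 2.0],
        "lme_close": [10.0, 20.0],
        "us_close": [100.0, 200.0],
    },
    index=pd.to_datetime(["2024-01-01", "2024-01-02"]),
)
states = to_sequential_states(daily)
print(len(states))  # 6 rows: 2 days x 3 states
```

On the first day, columns for markets that have not yet traded stay missing because there is nothing earlier to fill from; how the notebook handles that edge is not shown here.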
Daily approach:
- Uses original 1,961 daily observations (traditional 1-row-per-day)
- All market information aggregated into single daily snapshot
- Standard modeling approach for comparison
The notebook quantitatively compares both approaches across:
- Prediction accuracy metrics (MAE, RMSE)
- Statistical distributions (correlation, error patterns)
- Target-specific performance (JPX/LME/US markets)
- Visual analysis (time series plots, error distributions, heatmaps)
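
For reference, the two accuracy metrics named above have standard definitions, shown here as a small NumPy sketch. The function names and the sample arrays are illustrative and are not taken from the notebook; passing `axis=0` gives one value per target column, which is one way to look at target-specific performance.

```python
import numpy as np


def mae(y_true, y_pred, axis=None):
    """Mean absolute error; axis=0 returns one value per target column."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(diff), axis=axis)


def rmse(y_true, y_pred, axis=None):
    """Root mean squared error; penalizes large errors more than MAE does."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean(diff ** 2, axis=axis))


# Illustrative data only: 3 time steps x 2 targets.
y_true = np.array([[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]])
y_pred = np.array([[3.0, 1.0], [-4.0, 1.0], [0.0, 1.0]])

print(mae(y_true, y_pred))           # overall MAE
print(mae(y_true, y_pred, axis=0))   # MAE per target
print(rmse(y_true, y_pred, axis=0))  # RMSE per target
```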
Our lag forecast analysis (notebook-prediction-comparison/) compares the Sequential and Daily approaches across four forecast horizons using the competition's ground truth labels:
| Forecast Horizon | Sequential MAE | Daily MAE | Improvement |
|---|---|---|---|
| t+1 (1 day ahead) | 0.01716 | 0.01785 | +4.0% |
| t+2 (2 days ahead) | 0.02285 | 0.02268 | -0.7% |
| t+3 (3 days ahead) | 0.02495 | 0.02518 | +0.9% |
| t+4 (4 days ahead) | 0.02625 | 0.02727 | +3.9% |
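
The table does not state how the Improvement column is computed. The sketch below assumes it is the MAE gap measured relative to the Sequential MAE, which is an inference from the printed values rather than something documented; under that assumption the four percentages are reproduced. The function name `improvement_pct` is ours.

```python
# MAE values copied from the table: (Sequential, Daily) per horizon.
results = {
    "t+1": (0.01716, 0.01785),
    "t+2": (0.02285, 0.02268),
    "t+3": (0.02495, 0.02518),
    "t+4": (0.02625, 0.02727),
}


def improvement_pct(sequential_mae: float, daily_mae: float) -> float:
    """Assumed formula: positive when Sequential has the lower MAE."""
    return (daily_mae - sequential_mae) / sequential_mae * 100.0


for horizon, (seq, daily) in results.items():
    print(f"{horizon}: {improvement_pct(seq, daily):+.1f}%")
# t+1: +4.0%
# t+2: -0.7%
# t+3: +0.9%
# t+4: +3.9%
```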
Key Findings:
- Sequential has lower MAE than Daily at three of the four horizons, with the largest gains at t+1 (+4.0%) and t+4 (+3.9%); at t+2 it is marginally worse (-0.7%)
- Performance validated across 424 target time series (106 commodities × 4 forecast horizons)
- Both approaches show similar degradation patterns as forecast horizon extends
- View the full lag analysis in notebook-prediction-comparison/
Does restructuring daily data into sequential intraday states improve model performance?
This notebook provides empirical evidence through:
- Identical algorithms (Recursive Least Squares with feature engineering)
- Fair comparison (same features, parameters, validation methodology)
- Quantitative metrics (statistical tests, visual analysis)
- Reproducible methodology (fully documented process)
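
Recursive Least Squares updates a linear model one observation at a time, which is what makes online learning practical. The class below is a textbook sketch with a forgetting factor, not the notebook's implementation; the class name and the `forgetting` and `delta` parameters are assumptions chosen for illustration.

```python
import numpy as np


class RecursiveLeastSquares:
    """Textbook RLS with exponential forgetting (illustrative only)."""

    def __init__(self, n_features: int, forgetting: float = 0.99, delta: float = 1000.0):
        self.w = np.zeros(n_features)         # current weight estimate
        self.P = np.eye(n_features) * delta   # inverse-covariance estimate
        self.lam = forgetting                 # 1.0 means no forgetting

    def predict(self, x) -> float:
        return float(self.w @ np.asarray(x, dtype=float))

    def update(self, x, y: float) -> float:
        """Consume one (x, y) pair and return the pre-update prediction error."""
        x = np.asarray(x, dtype=float)
        Px = self.P @ x
        gain = Px / (self.lam + x @ Px)
        error = y - self.w @ x
        self.w = self.w + gain * error
        self.P = (self.P - np.outer(gain, Px)) / self.lam
        return float(error)


# Illustrative usage on synthetic, noise-free data: y = 2*x0 - 3*x1.
rng = np.random.default_rng(0)
model = RecursiveLeastSquares(n_features=2, forgetting=1.0)
for _ in range(200):
    x = rng.normal(size=2)
    model.update(x, 2.0 * x[0] - 3.0 * x[1])
print(model.w)
```

With `forgetting=1.0` the estimate matches a lightly regularized batch least-squares fit computed incrementally; values slightly below 1 discount older observations, which may suit markets whose relationships drift over time.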
- Open the notebook: Kaggle Notebook
- Fork the notebook: Click "Copy & Edit" to create your own version
- Run all cells: The notebook will automatically download competition data
- Explore outputs: View comparisons, metrics, and visualizations
Advantages: No setup required, competition data automatically available, free compute resources
- Clone this repository:

      git clone https://github.com/PatrickRutledge/Mitsui-Public-Notebook.git
      cd Mitsui-Public-Notebook

- Download competition data:
  - Visit Competition Data Page
  - Download: train.csv, train_labels.csv, target_pairs.csv, test.csv
  - Place in ./mitsui-commodity-prediction-challenge/ directory

- Install dependencies:

      pip install pandas numpy scikit-learn matplotlib seaborn tqdm

- Run the notebook:

      jupyter notebook "mitsui-daily-to intraday_public.ipynb"

If you want to understand the methodology without running code:
- Read the Sequential Causality README for theoretical foundation
- View the Kaggle Notebook for full analysis with outputs
- Environment Setup - Data loading and configuration
- Data Loading - Competition dataset preparation
- Intraday Restructuring - Convert daily → sequential states
- Feature Engineering - Temporal features, lag features, technical indicators
- Validation - Walk-forward validation with ground truth comparison
- Training - Recursive Least Squares with online learning
- Prediction - Generate forecasts for sequential approach
- Visualization - Model performance analysis
- Same pipeline applied to original daily data (no restructuring)
- Identical algorithms and features for fair comparison
- Section 10: Statistical comparison (correlation, MAE, RMSE, distributions)
- Section 11: Key findings and insights for practitioners
- Global information flow diagram
- Metrics comparison dashboard
- Error distribution analysis
- Time series prediction plots
- Correlation heatmaps
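
Walk-forward validation, listed among the pipeline steps above, scores each time step using only data that came before it. A generic sketch with lag features follows. The helper names, the choice of lags, and the use of an ordinary least-squares refit at every step are assumptions for illustration; the notebook's own pipeline may differ.

```python
import numpy as np
import pandas as pd


def add_lag_features(series: pd.Series, lags=(1, 2, 3)) -> pd.DataFrame:
    """Build lagged copies of `series`; row t only sees values from before t."""
    feats = pd.DataFrame({f"lag_{k}": series.shift(k) for k in lags})
    feats["target"] = series
    return feats.dropna()


def walk_forward_mae(frame: pd.DataFrame, min_train: int = 50) -> float:
    """Expanding-window evaluation: fit on rows [0, t), predict row t."""
    X = frame.drop(columns="target").to_numpy()
    y = frame["target"].to_numpy()
    X = np.column_stack([np.ones(len(X)), X])  # add an intercept column
    errors = []
    for t in range(min_train, len(y)):
        coef, *_ = np.linalg.lstsq(X[:t], y[:t], rcond=None)
        errors.append(abs(y[t] - X[t] @ coef))
    return float(np.mean(errors))


# Illustrative usage on a synthetic random walk.
rng = np.random.default_rng(0)
series = pd.Series(np.cumsum(rng.normal(scale=0.01, size=300)))
frame = add_lag_features(series)
print(walk_forward_mae(frame))
```

Refitting from scratch at each step keeps the sketch short; an online learner such as Recursive Least Squares avoids the repeated refit by updating its weights after each new observation.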
This research demonstrates:
- Reproducible methodology for evaluating temporal data restructuring
- Fair comparison using identical algorithms and features
- Quantitative framework for decision-making
- Transparent analysis of trade-offs
- Practical insights for time series practitioners
Whether sequential restructuring improves performance depends on:
- Correlation between approaches (high = minimal impact)
- Target-specific patterns (some markets benefit more)
- Computational constraints (sequential = 3x training time)
- Feature engineering quality (can traditional daily match sequential?)
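
The first factor can be checked directly: if the two approaches produce nearly identical predictions, restructuring is unlikely to change results much. A minimal sketch, assuming two aligned prediction arrays (the names `pred_sequential` and `pred_daily` and their values are placeholders):

```python
import numpy as np


def prediction_correlation(pred_a, pred_b) -> float:
    """Pearson correlation between two aligned sets of predictions."""
    a = np.asarray(pred_a, dtype=float).ravel()
    b = np.asarray(pred_b, dtype=float).ravel()
    # Skip positions where either approach has no prediction.
    mask = ~(np.isnan(a) | np.isnan(b))
    return float(np.corrcoef(a[mask], b[mask])[0, 1])


# Placeholder values, not real model output.
pred_sequential = np.array([0.010, -0.020, 0.015, np.nan, 0.005])
pred_daily = np.array([0.012, -0.018, 0.011, 0.002, 0.004])
print(prediction_correlation(pred_sequential, pred_daily))
```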
- Competition: MITSUI&CO. Commodity Prediction Challenge
- Dataset: Competition Data
- Main Notebook: Interactive Kaggle Version
- Lag Analysis: Forecast Horizon Validation | Documentation
- Writeup: Sequential Causality: Full Analysis on Kaggle - Comprehensive methodology & results
- Theory: Sequential Causality README
This research is shared as an educational resource. If you:
- Find this helpful: Please star the repository
- Have suggestions: Open an issue or discussion
- Build on this work: We'd love to hear about it!
- Spot errors: Please let us know so we can correct them
If you use this work in your research, please reference:
Patrick Rutledge (2025). Sequential Causality in Commodity Price Prediction.
MITSUI&CO. Commodity Prediction Challenge.
https://github.com/PatrickRutledge/Mitsui-Public-Notebook
This project is shared for educational purposes. The competition data is subject to Kaggle Competition Rules.
- MITSUI&CO. for hosting this challenging competition
- AlpacaTech Co., Ltd. for technical support and data creation
- APILayer for providing Forex data via Exchange Rates API
- Kaggle community for inspiration and learning resources
Questions? Ideas? Feedback?
Open an issue or start a discussion. We're here to learn together!
