This repository documents our journey through the QRT Challenge, a competitive machine learning challenge focused on predicting the performance of daily asset allocations in systematic trading. As students venturing into quantitative finance for the first time, we embraced this challenge as a learning laboratory—prioritizing exploration, experimentation, and skill development over final ranking.
Note on Reproducibility: The datasets used in this challenge are proprietary and not publicly available. This repository serves as a portfolio piece demonstrating our methodology, code organization, and learning process rather than a fully reproducible pipeline.
Objective: Predict whether a given asset allocation will generate positive or negative returns on the following trading day.
Data: 180,245 training observations with 20-day historical windows of allocation returns, volume-weighted liquidity behavior, and turnover metrics.
Evaluation: Binary classification accuracy on the sign of next-day returns.
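Concretely, the evaluation boils down to checking whether the predicted sign of the next-day return matches the realised sign. A minimal sketch of that metric (the `sign_accuracy` helper and the toy numbers are our own illustration, not challenge code):

```python
import numpy as np

def sign_accuracy(y_true, y_pred):
    """Fraction of days where the predicted sign matches the realised return sign."""
    return np.mean(np.sign(y_true) == np.sign(y_pred))

# Toy example: realised next-day returns vs. model outputs
returns = np.array([0.012, -0.004, 0.007, -0.010])
preds = np.array([0.5, -0.2, -0.1, -0.3])
print(sign_accuracy(returns, preds))  # 3 of 4 signs agree -> 0.75
```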
Finishing 50th out of 104 teams in this highly selective competition is an achievement we can be proud of. The event brought together 100+ students from the country's most prestigious engineering schools—CentraleSupélec, École Polytechnique, ENSAE, Télécom Paris, Mines Paris, and École des Ponts ParisTech—with participants further filtered through a CV-based selection process due to limited spots.
Competing against this caliber of talent pushed us to our limits:
- We invested 100+ hours outside of coursework, driven purely by curiosity and the challenge
- We debugged complex issues involving temporal data and learned to navigate the pitfalls of model evaluation in real-world settings
- We developed real research team dynamics
This project represented our first deep dive into quantitative finance and time series prediction. We had to:
- Understand domain-specific concepts (allocation weights, turnover, signed volumes)
- Grasp the nuances of financial data where signal-to-noise ratios are notoriously low
Key Takeaway: Stepping outside our comfort zone taught us how to rapidly acquire domain knowledge and apply it to real-world problems.
Rather than pursuing a single approach, we deliberately explored multiple methodologies:
- Linear Models: Ridge, ElasticNet, and Huber regression with extensive feature selection
- Tree-Based Models: XGBoost and LightGBM with hyperparameter optimization
- Deep Learning: LSTM networks for capturing temporal dependencies
- Ensemble Methods: Model stacking and weighted averaging
- Specialized Approaches: Market-specific, risk-focused, and portfolio-technical models
Key Takeaway: This breadth-first exploration taught us when different model families excel and fail—invaluable intuition for future projects.
One of our most valuable lessons came from getting it wrong:
- We initially achieved impressive validation scores that didn't translate to the leaderboard
- We discovered subtle data leakage issues in our preprocessing pipeline
- We learned the critical importance of proper time series cross-validation
- We experienced firsthand how feature engineering can inadvertently introduce future information
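The two fixes that closed our validation/leaderboard gap were (1) fitting preprocessing inside each training fold rather than on the full dataset, and (2) using splits that respect time ordering. A minimal sketch on synthetic data (the variable names and data are illustrative, not our pipeline):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import TimeSeriesSplit
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X = np.random.default_rng(1).normal(size=(200, 5))
y = (X[:, 0] > 0).astype(int)

# Leaky: a scaler fitted on ALL rows lets validation statistics contaminate training.
# X_scaled = StandardScaler().fit_transform(X)   # <- the kind of mistake we made at first

# Safe: the scaler is re-fitted on each training fold via a Pipeline, and
# TimeSeriesSplit guarantees validation data always lies strictly in the future.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
cv = TimeSeriesSplit(n_splits=5)
fold_scores = []
for train_idx, val_idx in cv.split(X):
    assert train_idx.max() < val_idx.min()  # no look-ahead into the validation window
    model.fit(X[train_idx], y[train_idx])
    fold_scores.append(model.score(X[val_idx], y[val_idx]))
print(fold_scores)
```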
Beyond domain knowledge, we significantly strengthened our technical capabilities:
| Skill Area | Specific Achievements |
|---|---|
| Feature Engineering | Created 40+ engineered feature groups |
| Model Architecture | Implemented custom wrappers for systematic cross-validation and preprocessing integration |
| Hyperparameter Optimization | Conducted extensive grid searches and Bayesian optimization via Optuna |
| Ensemble Techniques | Tested linear, XGBoost, and weighted averaging approaches for model combination |
| Production Mindset | Organized code into modular components for reproducibility |
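For the grid-search side of the hyperparameter work, the pattern is captured by scikit-learn's `GridSearchCV` combined with time-ordered folds; Optuna replaces the exhaustive grid with a Bayesian search over the same objective. A hedged sketch on synthetic data (the grid values are illustrative, not the ones we tuned):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

# Synthetic stand-in for the challenge features and binary sign target
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

grid = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},  # regularisation strengths to scan
    cv=TimeSeriesSplit(n_splits=4),            # keep folds chronological during tuning
    scoring="accuracy",
)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```

The key detail is passing the time-series splitter into the search itself, so tuning never scores a model on past data.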
QRT_challenge/
├── preprocessing/ # Data cleaning, normalization, feature engineering
│ ├── preprocess.py
│ └── feature_engineering.py
├── models/ # Custom model classes with built-in CV
│ └── models.py
├── training/ # Training pipelines and experiment tracking
│ └── train.py
├── datasets/ # Data loading and manipulation utilities
│ └── datasets.py
├── notebooks/
│ ├── feature_engineering/ # Feature selection and correlation analysis
│ ├── linear/ # Linear model experiments
│ ├── xgboost/ # Gradient boosting optimization
│ ├── lgbm/ # LightGBM experiments
│ ├── ensembling/ # Model combination strategies
│ └── specialized_models/ # Domain-specific approaches
├── evaluation/ # Performance metrics and validation
│ └── eval.py
└── ressources/ # Documentation and learning materials
├── demarche_expliquee.md
└── explication_markowitz.md