malerbe/QRT_challenge

QRT Challenge - Exploring Quantitative Finance Through Competition

Team

Project Overview

This repository documents our journey through the QRT Challenge, a competitive machine learning challenge focused on predicting the performance of daily asset allocations in systematic trading. As students venturing into quantitative finance for the first time, we embraced this challenge as a learning laboratory—prioritizing exploration, experimentation, and skill development over final ranking.

Note on Reproducibility: The datasets used in this challenge are proprietary and not publicly available. This repository serves as a portfolio piece demonstrating our methodology, code organization, and learning process rather than a fully reproducible pipeline.

The Challenge

Objective: Predict whether a given asset allocation will generate positive or negative returns on the following trading day.

Data: 180,245 training observations with 20-day historical windows of allocation returns, volume-weighted liquidity behavior, and turnover metrics.

Evaluation: Binary classification accuracy on the sign of next-day returns.
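The task above can be framed as a tiny end-to-end baseline. This sketch uses synthetic stand-in data (the real dataset is proprietary, so shapes and feature meanings here are illustrative assumptions), a chronological train/test split, and scores on sign accuracy:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the proprietary data: each row is a 20-day
# historical window of allocation returns (hypothetical construction).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
# Label: sign of a noisy "next-day return" loosely tied to the last day
y = (X[:, -1] + 0.5 * rng.normal(size=1000) > 0).astype(int)

# Chronological split: never shuffle time-ordered financial data
split = 800
model = LogisticRegression().fit(X[:split], y[:split])
acc = accuracy_score(y[split:], model.predict(X[split:]))
print(f"baseline sign accuracy: {acc:.3f}")
```

A trivial baseline like this is useful mainly as a sanity floor: with the low signal-to-noise ratios typical of financial data, anything meaningfully above 50% is already informative.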


Final ranking

Finishing 50th out of 104 teams in this highly selective competition is an achievement we can be proud of. The event brought together 100+ students from France's most prestigious engineering schools—CentraleSupélec, École Polytechnique, ENSAE, Télécom Paris, Mines Paris, and École des Ponts ParisTech—with participants further filtered through a CV-based selection process due to limited spots.

See QRT's post about the competition

Competing against this caliber of talent pushed us to our limits:

  • We invested 100+ hours outside of coursework, driven purely by curiosity and the challenge
  • We debugged complex issues involving temporal data and learned to navigate the pitfalls of model evaluation in real-world settings
  • We developed real research team dynamics

What We Learned

1. Navigating Unknown Territory

This project represented our first deep dive into quantitative finance and time series prediction. We had to:

  • Understand domain-specific concepts (allocation weights, turnover, signed volumes)
  • Grasp the nuances of financial data where signal-to-noise ratios are notoriously low

Key Takeaway: Stepping outside our comfort zone taught us how to rapidly acquire domain knowledge and apply it to real-world problems.


2. The Value of Systematic Experimentation

Rather than pursuing a single approach, we deliberately explored multiple methodologies:

  • Linear Models: Ridge, ElasticNet, and Huber regression with extensive feature selection
  • Tree-Based Models: XGBoost and LightGBM with hyperparameter optimization
  • Deep Learning: LSTM networks for capturing temporal dependencies
  • Ensemble Methods: Model stacking and weighted averaging
  • Specialized Approaches: Market-specific, risk-focused, and portfolio-technical models

Key Takeaway: This breadth-first exploration taught us when different model families excel and fail—invaluable intuition for future projects.
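The breadth-first comparison above can be sketched as a single loop that evaluates several model families on identical chronological folds. The data is synthetic and the model set is a simplified scikit-learn stand-in (the repo's actual XGBoost/LightGBM/LSTM configurations are not reproduced here):

```python
import numpy as np
from sklearn.linear_model import Ridge, ElasticNet, HuberRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import TimeSeriesSplit

# Toy data standing in for the proprietary features (shapes illustrative),
# with deliberately low signal-to-noise, as in financial returns.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 20))
y = X @ rng.normal(size=20) + rng.normal(scale=5.0, size=500)

models = {
    "ridge": Ridge(alpha=1.0),
    "elasticnet": ElasticNet(alpha=0.1),
    "huber": HuberRegressor(max_iter=500),
    "gbm": GradientBoostingRegressor(n_estimators=100),
}

# Same chronological folds for every family keeps the comparison fair.
cv = TimeSeriesSplit(n_splits=4)
for name, model in models.items():
    accs = []
    for train_idx, test_idx in cv.split(X):
        model.fit(X[train_idx], y[train_idx])
        pred = model.predict(X[test_idx])
        # Score on the sign of the prediction, matching the challenge metric
        accs.append(np.mean(np.sign(pred) == np.sign(y[test_idx])))
    print(f"{name:>10}: mean sign accuracy = {np.mean(accs):.3f}")
```

Holding the folds fixed across families is the key design choice: it isolates model differences from split differences, which matters when the signal is weak.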


3. Learning from Mistakes: Data Leakage & Validation

One of our most valuable lessons came from getting it wrong:

  • We initially achieved impressive validation scores that didn't translate to the leaderboard
  • We discovered subtle data leakage issues in our preprocessing pipeline
  • We learned the critical importance of proper time series cross-validation
  • We experienced firsthand how feature engineering can inadvertently introduce future information
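The leakage lesson above boils down to one pattern: fit all preprocessing inside each training fold, and split chronologically. A minimal sketch with scikit-learn (synthetic data; the repo's actual preprocessing pipeline is not shown):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

# Toy stand-in data (the real dataset is proprietary).
rng = np.random.default_rng(7)
X = rng.normal(size=(600, 20))
y = (rng.normal(size=600) > 0).astype(int)

# Wrong: StandardScaler().fit_transform(X) before splitting leaks test-fold
# statistics into training. Right: put preprocessing inside a Pipeline so it
# is re-fit on each training fold only, and use TimeSeriesSplit so every
# validation fold lies strictly after its training data.
pipe = make_pipeline(StandardScaler(), LogisticRegression())
scores = cross_val_score(pipe, X, y, cv=TimeSeriesSplit(n_splits=5))
print(f"fold accuracies: {np.round(scores, 3)}")
```

With purely random labels as here, fold accuracies should hover near 0.5; a large gap above that in a real pipeline is itself a warning sign of leakage.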

4. Technical Skills Developed

Beyond domain knowledge, we significantly strengthened our technical capabilities:

| Skill Area | Specific Achievements |
| --- | --- |
| Feature Engineering | Created 40+ engineered feature groups |
| Model Architecture | Implemented custom wrappers for systematic cross-validation and preprocessing integration |
| Hyperparameter Optimization | Conducted extensive grid searches as well as Bayesian optimization via Optuna |
| Ensemble Techniques | Tested linear, XGBoost, and weighted-averaging approaches for model combination |
| Production Mindset | Organized code into modular components for reproducibility |
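The grid-search pattern from the table can be sketched with scikit-learn's `GridSearchCV` over chronological folds (synthetic data; the actual search spaces and Optuna studies used in the repo are not reproduced here):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

# Toy stand-in data (hypothetical shapes).
rng = np.random.default_rng(3)
X = rng.normal(size=(400, 20))
y = X[:, 0] + rng.normal(scale=2.0, size=400)

# Chronological folds keep the search honest on time-ordered data:
# every hyperparameter is scored only on out-of-time validation folds.
search = GridSearchCV(
    Ridge(),
    param_grid={"alpha": [0.01, 0.1, 1.0, 10.0, 100.0]},
    cv=TimeSeriesSplit(n_splits=4),
    scoring="neg_mean_squared_error",
)
search.fit(X, y)
print("best alpha:", search.best_params_["alpha"])
```

Bayesian optimization (e.g. via Optuna) follows the same shape, replacing the exhaustive grid with an `objective(trial)` function that samples hyperparameters adaptively; the time-series CV inside stays identical.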

Project Structure

QRT_challenge/
├── preprocessing/              # Data cleaning, normalization, feature engineering
│   ├── preprocess.py
│   └── feature_engineering.py
├── models/                     # Custom model classes with built-in CV
│   └── models.py
├── training/                   # Training pipelines and experiment tracking
│   └── train.py
├── datasets/                   # Data loading and manipulation utilities
│   └── datasets.py
├── notebooks/                  
│   ├── feature_engineering/    # Feature selection and correlation analysis
│   ├── linear/                 # Linear model experiments
│   ├── xgboost/                # Gradient boosting optimization
│   ├── lgbm/                   # LightGBM experiments
│   ├── ensembling/             # Model combination strategies
│   └── specialized_models/     # Domain-specific approaches
├── evaluation/                 # Performance metrics and validation
│   └── eval.py
└── ressources/                 # Documentation and learning materials
    ├── demarche_expliquee.md
    └── explication_markowitz.md

About

Submission code for QRT Grand Data Challenge 2025
