Generate realistic synthetic OHLCV (Open, High, Low, Close, Volume) data for backtesting trading strategies without overfitting to historical data.
Traditional synthetic data generators (like the original QuantGAN) generate only log returns, a single column. But for support/resistance and breakout strategies, you need:
- Full OHLC structure to detect S/R levels (highs and lows matter!)
- Candle patterns (wicks, body ratios) for rejection signals
- Volume for breakout confirmation
- Volatility clustering (periods of calm followed by explosions)
This modified QuantGAN generates 6 correlated features that preserve these relationships:
| Feature | What It Captures |
|---|---|
| `log_return` | Price movement direction and magnitude |
| `body_ratio` | Bullish/bearish sentiment (-1 to 1) |
| `upper_wick` | Rejection from highs (resistance) |
| `lower_wick` | Rejection from lows (support) |
| `range_pct` | Volatility / candle size |
| `volume_norm` | Volume patterns (spikes on breakouts) |
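As a rough illustration, the features can be derived from raw candles along these lines. This is a sketch only: `preprocess_ohlcv.py` is authoritative, and the exact formulas (especially the `volume_norm` window) are assumptions here.

```python
# Illustrative feature extraction; exact definitions are assumptions,
# not the repo's code - see preprocess_ohlcv.py for the real ones.
import numpy as np
import pandas as pd

def extract_features(df: pd.DataFrame) -> pd.DataFrame:
    rng = (df["high"] - df["low"]).replace(0, np.nan)   # guard flat candles
    body_top = df[["open", "close"]].max(axis=1)
    body_bot = df[["open", "close"]].min(axis=1)
    return pd.DataFrame({
        "log_return": np.log(df["close"]).diff(),          # direction and magnitude
        "body_ratio": (df["close"] - df["open"]) / rng,    # -1 (bearish) .. 1 (bullish)
        "upper_wick": (df["high"] - body_top) / rng,       # rejection from highs
        "lower_wick": (body_bot - df["low"]) / rng,        # rejection from lows
        "range_pct": (df["high"] - df["low"]) / df["open"],              # candle size
        "volume_norm": df["volume"] / df["volume"].rolling(288).mean(),  # vs ~1 day of 5m bars
    }).dropna()
```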
```bash
cd QuantGAN
pip install -r requirements.txt
```

For GPU training (recommended):

```bash
pip install torch --index-url https://download.pytorch.org/whl/cu118  # CUDA 11.8
# or
pip install torch --index-url https://download.pytorch.org/whl/cu121  # CUDA 12.1
```
```bash
# Preprocess BTC data
python preprocess_ohlcv.py ../testdata/BTCUSDT_5m_365d.csv \
    --output_dir preprocessed \
    --seq_length 64

# Preprocess ETH data
python preprocess_ohlcv.py ../testdata/ETHUSDT_5m_365d.csv \
    --output_dir preprocessed \
    --seq_length 64
```

This creates:
- `preprocessed/BTCUSDT_5m_365d_sequences.npy` - Training sequences
- `preprocessed/BTCUSDT_5m_365d_features.csv` - Feature data
- `preprocessed/BTCUSDT_5m_365d_params.json` - Transform parameters
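A quick sanity check on the preprocessed sequences before training (assuming the array layout is `[n_sequences, seq_length, n_features]`):

```python
import numpy as np

seqs = np.load("preprocessed/BTCUSDT_5m_365d_sequences.npy")
print(seqs.shape)                        # expected (N, 64, 6) with --seq_length 64
print(seqs.dtype, np.isnan(seqs).any()) # dtype and NaN check
```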
```bash
# Train on BTC (GPU recommended)
python train_ohlcv.py --name BTCUSDT_5m_365d \
    --epochs 200 \
    --batch_size 64 \
    --cuda

# Train on ETH
python train_ohlcv.py --name ETHUSDT_5m_365d \
    --epochs 200 \
    --batch_size 64 \
    --cuda
```

Training creates checkpoints in `checkpoints/<name>_<timestamp>/`:
- `netG_best.pth` - Best generator (lowest loss)
- `netG_final.pth` - Final generator
- `train_info.json` - Model configuration
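To inspect a run before generating, something like the following works (assuming `netG_best.pth` is a plain PyTorch state dict; the directory name contains your run's timestamp):

```python
import json
import torch

run_dir = "checkpoints/BTCUSDT_5m_365d_tcn_<timestamp>"  # substitute your run

with open(f"{run_dir}/train_info.json") as f:
    print(json.load(f))    # hyperparameters the run was trained with

state = torch.load(f"{run_dir}/netG_best.pth", map_location="cpu")
print(list(state)[:5])     # first few parameter names, if it's a state dict
```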
Training time estimates:
- CPU: ~2-4 hours for 200 epochs
- GPU (RTX 3080+): ~15-30 minutes
```bash
# Generate 10 synthetic BTC datasets (each ~6400 candles ≈ 22 days of 5m data)
python generate_ohlcv.py \
    --checkpoint_dir checkpoints/BTCUSDT_5m_365d_tcn_<timestamp> \
    --n_sequences 100 \
    --n_datasets 10 \
    --initial_price 50000 \
    --output_dir ../testdata/synthetic \
    --output_prefix BTCUSDT_synthetic

# Generate synthetic ETH
python generate_ohlcv.py \
    --checkpoint_dir checkpoints/ETHUSDT_5m_365d_tcn_<timestamp> \
    --n_sequences 100 \
    --n_datasets 10 \
    --initial_price 3500 \
    --output_dir ../testdata/synthetic \
    --output_prefix ETHUSDT_synthetic
```

Each output file is a standard OHLCV CSV:

```
time,open,high,low,close,volume
1702000000,50000.00,50125.50,49875.25,50100.00,123.45678
1702000300,50100.00,50200.00,50050.00,50175.50,145.67890
...
```
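A minimal sanity check on a generated file (the file name below is a placeholder for whatever your `--output_prefix` produced):

```python
import pandas as pd

df = pd.read_csv("../testdata/synthetic/BTCUSDT_synthetic_0.csv")  # placeholder name

# OHLC invariants: high bounds the candle body from above, low from below
assert (df["high"] >= df[["open", "close"]].max(axis=1)).all()
assert (df["low"] <= df[["open", "close"]].min(axis=1)).all()

# 5-minute bars: unix timestamps should advance in 300-second steps
assert (df["time"].diff().dropna() == 300).all()
```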
The GAN uses a Temporal Convolutional Network (TCN) architecture with:
- Dilated convolutions for capturing long-range patterns (S/R levels that persist)
- Self-attention for learning global dependencies (market regime awareness)
- Spectral normalization for stable WGAN-GP training
- OHLCV constraint loss to ensure valid candle relationships
Generator:

```
Noise (64-dim) → Project → [ResBlock + Attention] × 4 → OHLCV Features (6-dim)
```

Discriminator:

```
OHLCV Features (6-dim) → [ResBlock + Attention] × 4 → Pool → Real/Fake Score
```
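For intuition, here is a minimal sketch of what one spectral-normalized, dilated residual block can look like; the class and argument names are illustrative, not the repo's actual modules:

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

class TCNResBlock(nn.Module):
    """Illustrative residual block: two dilated causal convolutions,
    each wrapped in spectral normalization for WGAN-GP stability."""
    def __init__(self, channels: int, dilation: int, kernel_size: int = 3):
        super().__init__()
        pad = (kernel_size - 1) * dilation   # left-pad so the output stays causal
        self.pad = nn.ConstantPad1d((pad, 0), 0.0)
        self.conv1 = spectral_norm(nn.Conv1d(channels, channels, kernel_size, dilation=dilation))
        self.conv2 = spectral_norm(nn.Conv1d(channels, channels, kernel_size, dilation=dilation))
        self.act = nn.LeakyReLU(0.2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, seq_len)
        h = self.act(self.conv1(self.pad(x)))
        h = self.act(self.conv2(self.pad(h)))
        return x + h                         # residual connection
```

Stacking such blocks with dilations 1, 2, 4, ... grows the receptive field exponentially, which is why adding layers (`--n_layers`) extends the temporal horizon the model can capture.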
The main training knobs:

```bash
# --model_type: 'tcn' (default) or 'lstm' for slower but potentially better long-term patterns
# --hidden_dim: larger = more capacity, slower training
# --n_layers: more layers = longer temporal receptive field
# --noise_dim: more noise = more diversity in generations
# --lr_g: lower the learning rate if training is unstable
# --n_critic: D updates per G update (WGAN-GP)
python train_ohlcv.py --name BTCUSDT_5m_365d \
    --model_type tcn \
    --hidden_dim 256 \
    --n_layers 6 \
    --noise_dim 128 \
    --lr_g 0.00005 \
    --n_critic 5 \
    --epochs 500
```
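For context on `--n_critic` and the stability flags, this is roughly what the WGAN-GP gradient penalty computes (a generic sketch; `netD` and `lambda_gp` are placeholder names, not the repo's):

```python
import torch

def gradient_penalty(netD, real: torch.Tensor, fake: torch.Tensor,
                     lambda_gp: float = 10.0) -> torch.Tensor:
    """Penalize the critic when its gradient norm on real/fake
    interpolates deviates from 1 (the WGAN-GP regularizer)."""
    batch = real.size(0)
    eps = torch.rand(batch, 1, 1, device=real.device)   # per-sample mixing weight
    interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
    score = netD(interp)
    grads = torch.autograd.grad(
        outputs=score, inputs=interp,
        grad_outputs=torch.ones_like(score),
        create_graph=True,                               # penalty is backpropagated
    )[0]
    grad_norm = grads.reshape(batch, -1).norm(2, dim=1)
    return lambda_gp * ((grad_norm - 1) ** 2).mean()
```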
To resume an interrupted run, point at the saved checkpoints:

```bash
python train_ohlcv.py --name BTCUSDT_5m_365d \
    --resume_G checkpoints/.../netG_epoch_100.pth \
    --resume_D checkpoints/.../netD_epoch_100.pth \
    --epochs 200
```

To make generation reproducible, fix the seed:

```bash
python generate_ohlcv.py \
    --checkpoint_dir checkpoints/... \
    --n_datasets 5 \
    --seed 42   # same seed = same synthetic data
```
Tips for better results:

- Train on more data: the more real data, the better the GAN learns its patterns
- Longer sequences: try `--seq_length 128` for longer temporal dependencies
- Multiple assets: train separate models for BTC and ETH, or combine them
- Validate visually: plot synthetic vs. real data to check for obvious issues
- Statistical tests: compare return distributions, autocorrelations, etc. (see the sketch below)
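As a starting point for those statistical tests, a two-sample KS test on returns plus a lag-1 autocorrelation of absolute returns catches the most common failures (file names below are placeholders):

```python
import numpy as np
import pandas as pd
from scipy.stats import ks_2samp

def log_returns(df: pd.DataFrame) -> np.ndarray:
    return np.diff(np.log(df["close"].to_numpy()))

def lag1_autocorr(x: np.ndarray) -> float:
    return float(np.corrcoef(x[:-1], x[1:])[0, 1])

real = log_returns(pd.read_csv("../testdata/BTCUSDT_5m_365d.csv"))
synth = log_returns(pd.read_csv("../testdata/synthetic/BTCUSDT_synthetic_0.csv"))  # placeholder

stat, pval = ks_2samp(real, synth)          # compare the return distributions
print(f"KS statistic={stat:.4f} p={pval:.4f}")

# Volatility clustering should survive generation: absolute returns of
# both real and synthetic data should be positively autocorrelated.
print("real  |r| lag-1 autocorr:", lag1_autocorr(np.abs(real)))
print("synth |r| lag-1 autocorr:", lag1_autocorr(np.abs(synth)))
```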
**Training is unstable (loss oscillating wildly)**

- Reduce learning rates: `--lr_g 0.00005 --lr_d 0.00005`
- Increase critic steps: `--n_critic 10`

**Generated data looks too smooth/boring**

- Train longer
- Increase the noise dimension: `--noise_dim 128`
- Check whether your preprocessing lost volatility
**Generated data has invalid OHLC (high < low)**

- This should be handled by `features_to_ohlcv()`, but if it persists, increase `--lambda_constraint` (a clamp like the sketch below can patch bad candles downstream)
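Any `features_to_ohlcv()`-style reconstruction must satisfy `high >= max(open, close)` and `low <= min(open, close)`; an illustrative clamp (not the repo's implementation) that enforces this:

```python
import numpy as np

def enforce_ohlc_invariants(o: np.ndarray, h: np.ndarray,
                            l: np.ndarray, c: np.ndarray):
    """Clamp candles so high/low always bound the candle body."""
    h = np.maximum.reduce([h, o, c])
    l = np.minimum.reduce([l, o, c])
    return o, h, l, c
```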
**Out of memory on GPU**

- Reduce the batch size: `--batch_size 32` or `--batch_size 16`
- Reduce the hidden dimension: `--hidden_dim 64`
MIT License - see LICENSE for details. Use freely, but keep the attribution!