A machine learning system for generating, optimizing, and backtesting trading strategies expressed in a Domain Specific Language (DSL). The platform combines synthetic market data generation, deep learning models (LSTM, GRU, Transformer), and a backtesting engine to produce optimized strategy configurations.
- Domain Specific Language (DSL): A structured way to describe a trading playbook using clear parameters such as entry signals, exit rules, and risk controls. The DSL files in this project capture the "what" of a strategy without hard-coding the logic.
- Strategy: A concrete set of trading decisions derived from a DSL plus market data. Strategies define how and when trades happen once the DSL rules are interpreted by the engine.
- Backtest: A replay of historical or simulated market data to see how a strategy would have performed. Backtesting surfaces metrics like returns and drawdowns before you risk live capital.
- Optimization: The process of tweaking DSL parameters to strike a better balance between profit and risk while respecting safety constraints.
- Synthetic Market Data: Generate realistic OHLCV candlestick data with multiple market conditions
- Market Conditions: Trending (up/down), Volatile, Ranging, Breakout, Reversal, Consolidation
- Dataset Management: Save, load, and manage multiple datasets with custom naming
- Batch Processing: Generate 100+ samples with varied market conditions
- Multiple Architectures: LSTM, GRU, and Transformer models
- Feature Extraction: Technical indicators (RSI, MACD, Bollinger Bands, EMA, ATR)
- Training Pipeline: Automated training with progress tracking and visualization
- Model Persistence: Save and load trained models for future use
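Of the indicators listed under Feature Extraction, RSI is a convenient one to illustrate. The sketch below uses Wilder's smoothing, the conventional RSI definition, in plain Python so it has no dependencies. Whether `feature_extraction.py` uses this exact variant is not stated in this README, and the function name `rsi` is illustrative.

```python
def rsi(closes, period=14):
    """Relative Strength Index with Wilder's smoothing.

    Returns a list the same length as `closes`; the first `period` entries
    are None because the indicator needs `period` price changes to start.
    """
    if len(closes) <= period:
        return [None] * len(closes)

    changes = [closes[i] - closes[i - 1] for i in range(1, len(closes))]
    gains = [max(c, 0.0) for c in changes]
    losses = [max(-c, 0.0) for c in changes]

    # Seed the averages with a simple mean over the first `period` changes.
    avg_gain = sum(gains[:period]) / period
    avg_loss = sum(losses[:period]) / period

    def to_rsi(gain, loss):
        if loss == 0:
            return 100.0  # no losses in the window (also used for a flat window)
        return 100.0 - 100.0 / (1.0 + gain / loss)

    values = [None] * period + [to_rsi(avg_gain, avg_loss)]
    for gain, loss in zip(gains[period:], losses[period:]):
        # Wilder's smoothing: an exponential average with alpha = 1 / period.
        avg_gain = (avg_gain * (period - 1) + gain) / period
        avg_loss = (avg_loss * (period - 1) + loss) / period
        values.append(to_rsi(avg_gain, avg_loss))
    return values


rising = [float(p) for p in range(1, 31)]
rsi_values = rsi(rising, period=14)
```

A value below the DSL's `oversold` threshold or above its `overbought` threshold is what signals such as "RSI Oversold" and "RSI Overbought" would typically test.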
- Comprehensive DSL Structure:
    {
      "risk_management": {
        "stop_loss": 0.02,
        "take_profit": 0.05,
        "position_size": 0.20,
        "trailing_stop": 0.015
      },
      "position_management": {
        "max_positions": 3,
        "cooldown_period": 5
      },
      "entry_signals": ["RSI Oversold", "MACD Bullish Cross"],
      "exit_signals": ["RSI Overbought", "Trailing Stop"],
      "indicators": {
        "rsi": {"period": 14, "oversold": 35, "overbought": 65},
        "macd": {"fast": 12, "slow": 26, "signal": 9},
        "ema": {"short": 50, "long": 200},
        "atr": {"period": 14, "multiplier": 2.0}
      }
    }
- Smart Optimization: Prioritizes positive returns over Sharpe ratio
- Risk Management: Constrains position sizing (max 25%) and drawdown
- Warning System: Detects misleading results (high Sharpe with negative returns)
- Parameter Comparison: Side-by-side comparison of baseline vs optimized strategies
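One way to apply the risk constraints to a DSL before it is backtested is to clamp the relevant fields into their allowed ranges. The sketch below is an assumption about how that could be done, not the project's implementation: the helper `clamp_risk_parameters` and the `RISK_BOUNDS` table are hypothetical, and the bounds are the position-size and stop-loss limits quoted in this README.

```python
import copy
import json

# Bounds taken from the limits quoted in this README; enforcing them by
# clamping (rather than rejecting the DSL) is an assumption of this sketch.
RISK_BOUNDS = {
    "position_size": (0.05, 0.25),
    "stop_loss": (0.01, 0.03),
}


def clamp_risk_parameters(dsl):
    """Return a copy of `dsl` with risk parameters forced into RISK_BOUNDS."""
    clamped = copy.deepcopy(dsl)
    risk = clamped["risk_management"]
    for name, (low, high) in RISK_BOUNDS.items():
        risk[name] = min(max(risk[name], low), high)
    return clamped


# A DSL fragment whose position size exceeds the 25% ceiling.
dsl = json.loads(
    '{"risk_management": {"stop_loss": 0.02, "take_profit": 0.05,'
    ' "position_size": 0.40, "trailing_stop": 0.015}}'
)
safe_dsl = clamp_risk_parameters(dsl)
```

Working on a deep copy keeps the baseline DSL untouched, which matters when baseline and optimized versions are later compared side by side.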
- Comprehensive Metrics: Returns, Sharpe ratio, Max drawdown, Win rate
- Signal Mapping: Automatic conversion of DSL signals to trading actions
- Position Management: Supports multiple positions, cooldown periods, trailing stops
- Transaction Costs: Configurable commission rates
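Signal mapping can be pictured as a lookup from DSL signal names to conditions on the current indicator values. The sketch below is illustrative only: the registry `SIGNAL_RULES`, the function `decide_action`, the indicator key names, and the choice to fire when any listed signal is true are all assumptions, not the contents of `signal_mapping.py`.

```python
# Hypothetical registry: DSL signal name -> predicate over (indicator values, DSL).
SIGNAL_RULES = {
    "RSI Oversold": lambda ind, dsl: ind["rsi"] < dsl["indicators"]["rsi"]["oversold"],
    "RSI Overbought": lambda ind, dsl: ind["rsi"] > dsl["indicators"]["rsi"]["overbought"],
    "MACD Bullish Cross": lambda ind, dsl: (
        ind["macd_prev"] <= ind["macd_signal_prev"] and ind["macd"] > ind["macd_signal"]
    ),
}


def decide_action(dsl, indicators, in_position):
    """Return "buy", "sell", or "hold" for one candle."""

    def any_fired(names):
        # Signal names without a registered rule are ignored in this sketch.
        return any(SIGNAL_RULES[n](indicators, dsl) for n in names if n in SIGNAL_RULES)

    if in_position and any_fired(dsl["exit_signals"]):
        return "sell"
    if not in_position and any_fired(dsl["entry_signals"]):
        return "buy"
    return "hold"


dsl = {
    "entry_signals": ["RSI Oversold", "MACD Bullish Cross"],
    "exit_signals": ["RSI Overbought", "Trailing Stop"],
    "indicators": {"rsi": {"period": 14, "oversold": 35, "overbought": 65}},
}
snapshot = {
    "rsi": 30.0,
    "macd": 0.1,
    "macd_signal": 0.2,
    "macd_prev": 0.0,
    "macd_signal_prev": 0.1,
}
action = decide_action(dsl, snapshot, in_position=False)
```

Keeping the rules in a dictionary means a new signal name can be supported by adding one entry, without touching the decision logic.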
- Model Training Center: Train and track ML models with real-time progress
- DSL Generator: Create and modify trading strategies with intuitive UI
- Backtesting Lab: Test strategies on various market conditions
- Optimization Suite: Optimize DSL parameters with visual feedback
- Asset Manager: View and export datasets, models, and DSLs
ML-generateDSL/
├── src/
│ ├── data_generator.py # Synthetic market data pipeline
│ ├── backtesting.py # Backtesting engine & strategy execution
│ ├── feature_extraction.py # Technical indicator calculations
│ ├── ml_models.py # Neural network architectures
│ ├── training_pipeline.py # Model training & optimization orchestration
│ ├── signal_mapping.py # DSL signal to trading strategy mapping
│ ├── dsl_comparison.py # Strategy comparison utilities
│ └── visualization.py # Plotting and visualization helpers
├── app.py / dashboard.py # Streamlit entry points
├── app_fast.py # Lightweight dashboard variant
├── quickstart.py # Scriptable CLI workflow
├── requirements.txt # Python dependencies
└── README.md
Generated artifacts are intentionally excluded from Git to keep commits lean:
- data/ – training sessions, feature matrices, and evaluation logs
- datasets/ – exported synthetic market datasets (JSON/CSV)
- models/ – trained ML checkpoints
- dsls/ – saved DSL configurations
Create these folders as needed. The apps will automatically populate them when running locally.
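If you prefer to create the artifact folders up front, a few lines of Python are enough. The helper name `ensure_artifact_dirs` is hypothetical; the folder names are the ones listed above.

```python
from pathlib import Path

ARTIFACT_DIRS = ("data", "datasets", "models", "dsls")


def ensure_artifact_dirs(root="."):
    """Create the artifact folders under `root` if they do not exist yet."""
    created = []
    for name in ARTIFACT_DIRS:
        path = Path(root) / name
        # exist_ok=True makes the call safe to repeat.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling `ensure_artifact_dirs()` from the repository root creates all four folders and leaves existing ones untouched.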
git clone https://github.com/anisirji/ML-Generate_DSL.git
cd ML-Generate_DSL
python3 -m venv .venv
source .venv/bin/activate  # On Windows use: .venv\Scripts\activate
pip install -r requirements.txt
streamlit run dashboard.py

Additional entry points (app.py, app_fast.py, quickstart.py) expose thinner shells around the same core modules. Run python app.py --help for CLI usage.
- Navigate to "Model Training" → "Generate Data"
- Set dataset name and parameters (samples, candles, conditions)
- Click "Generate" and "Save Dataset"
- Select saved dataset in "Train Model" tab
- Configure model architecture (LSTM/GRU/Transformer)
- Set training parameters (epochs, batch size, learning rate)
- Start training and monitor progress
- Go to "DSL Configuration"
- Set risk management parameters (stop loss, take profit, position size)
- Configure entry/exit signals and technical indicators
- Save DSL configuration
- Load market data in "Backtesting Lab"
- Select baseline DSL and trained model
- Run optimization (50-200 iterations)
- Review warnings and optimization quality metrics
- Save optimized DSL if results are satisfactory
- Run backtest with both baseline and optimized DSLs
- Compare metrics (returns, Sharpe, drawdown, win rate)
- Export results for further analysis
The system includes comprehensive warnings for dangerous optimization results:
- Misleading Sharpe Ratio: Detects high Sharpe with negative returns
- Risk Explosion: Warns when drawdown increases significantly
- Oversized Positions: Alerts for position sizes >30%
- Win Rate Paradox: Identifies when higher win rate yields lower returns
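The four checks above can be sketched as simple comparisons between baseline and optimized metrics. This is a minimal illustration, not the project's code: the function name `optimization_warnings`, the metric key names, the Sharpe cutoff of 1.0, and the 1.5x drawdown factor are assumptions. Only the 30% position-size alert level comes from this README. Metrics are assumed to be fractions, with drawdown given as a positive magnitude.

```python
def optimization_warnings(baseline, optimized, dsl):
    """Compare two metric dicts and return a list of warning labels."""
    warnings = []

    # High Sharpe with a losing strategy (the cutoff of 1.0 is an assumption).
    if optimized["sharpe"] > 1.0 and optimized["total_return"] < 0:
        warnings.append("Misleading Sharpe Ratio")

    # Drawdown grew by more than 50% (the factor is an assumption).
    if optimized["max_drawdown"] > 1.5 * baseline["max_drawdown"]:
        warnings.append("Risk Explosion")

    # Position size above the 30% alert level.
    if dsl["risk_management"]["position_size"] > 0.30:
        warnings.append("Oversized Positions")

    # More winning trades, yet less money made.
    if (
        optimized["win_rate"] > baseline["win_rate"]
        and optimized["total_return"] < baseline["total_return"]
    ):
        warnings.append("Win Rate Paradox")

    return warnings


baseline = {"total_return": 0.05, "sharpe": 0.8, "max_drawdown": 0.10, "win_rate": 0.50}
optimized = {"total_return": -0.02, "sharpe": 1.4, "max_drawdown": 0.18, "win_rate": 0.60}
risky_dsl = {"risk_management": {"position_size": 0.35}}
found = optimization_warnings(baseline, optimized, risky_dsl)
```

In this example all four warnings fire, which is the kind of result that should not be saved as an optimized DSL.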
- CandlestickGenerator: Creates realistic OHLCV data
- DatasetGenerator: Manages training/test dataset creation
- Market condition simulation with configurable parameters
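To give a feel for condition-driven candle generation, the sketch below builds OHLCV candles from a random walk whose drift and volatility depend on the market condition. It is written in plain Python to stay dependency-free. The function `generate_candles`, the `CONDITION_PRESETS` table, and every numeric value are assumptions for illustration and do not describe the actual `CandlestickGenerator`.

```python
import math
import random

# Hypothetical presets: (per-candle drift, per-candle volatility) of log returns.
CONDITION_PRESETS = {
    "trending_up": (0.002, 0.010),
    "trending_down": (-0.002, 0.010),
    "volatile": (0.0, 0.030),
    "ranging": (0.0, 0.005),
}


def generate_candles(condition, n_candles=100, start_price=100.0, seed=None):
    """Return a list of synthetic OHLCV candles (dicts) for one market condition."""
    drift, vol = CONDITION_PRESETS[condition]
    rng = random.Random(seed)
    candles = []
    prev_close = start_price
    for _ in range(n_candles):
        open_ = prev_close  # each candle opens at the previous close
        close = open_ * math.exp(rng.gauss(drift, vol))  # geometric random walk
        # Wicks extend past the candle body by a random non-negative fraction.
        high = max(open_, close) * (1 + abs(rng.gauss(0, vol / 2)))
        low = min(open_, close) * (1 - abs(rng.gauss(0, vol / 2)))
        candles.append(
            {
                "open": open_,
                "high": high,
                "low": low,
                "close": close,
                "volume": rng.randint(1_000, 10_000),
            }
        )
        prev_close = close
    return candles


candles = generate_candles("trending_up", n_candles=50, seed=42)
```

Passing a fixed `seed` makes a dataset reproducible, which is useful when the same candles must be replayed for both a baseline and an optimized DSL.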
- TradingStrategy: Executes DSL-based trading logic
- Backtester: Simulates trading with position management
- BacktestResult: Comprehensive performance metrics
- LSTM, GRU, Transformer architectures
- Configurable layers, units, dropout
- Support for multi-step prediction
- Safe optimization avoiding TensorFlow/PyTorch conflicts
- Prioritizes returns over Sharpe ratio
- Conservative parameter constraints
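A return-first objective with conservative parameter ranges can be sketched as a small random search. Everything here is an assumption made for illustration: the names `score`, `random_search`, and `PARAM_BOUNDS`, the use of random search itself, and the take-profit range. The position-size and stop-loss ranges follow the limits quoted in this README, and `toy_evaluate` merely stands in for a real backtest.

```python
import random

# Conservative search ranges; the take-profit range is an assumption.
PARAM_BOUNDS = {
    "position_size": (0.05, 0.25),
    "stop_loss": (0.01, 0.03),
    "take_profit": (0.02, 0.10),
}


def score(metrics):
    """Sort key: total return first, Sharpe ratio only as a tie-breaker."""
    return (round(metrics["total_return"], 4), metrics["sharpe"])


def random_search(evaluate, iterations=50, seed=None):
    """Sample parameters within PARAM_BOUNDS and keep the best-scoring set."""
    rng = random.Random(seed)
    best_params, best_metrics, best_key = None, None, None
    for _ in range(iterations):
        params = {k: rng.uniform(lo, hi) for k, (lo, hi) in PARAM_BOUNDS.items()}
        metrics = evaluate(params)
        key = score(metrics)
        if best_key is None or key > best_key:
            best_params, best_metrics, best_key = params, metrics, key
    return best_params, best_metrics


def toy_evaluate(params):
    # Stand-in for a backtest: returns peak at position_size = 0.15.
    ret = 0.10 - abs(params["position_size"] - 0.15)
    return {"total_return": ret, "sharpe": 1.0}


best_params, best_metrics = random_search(toy_evaluate, iterations=100, seed=0)
```

Because the key compares return before Sharpe, a candidate with a high Sharpe ratio and a negative return can never outrank one with a positive return.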
- Total Return: Overall profit/loss percentage
- Sharpe Ratio: Risk-adjusted returns
- Max Drawdown: Maximum peak-to-trough decline
- Win Rate: Percentage of profitable trades
- Total Trades: Number of executed trades
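The metrics above have standard definitions, sketched here in plain Python over an equity curve and a list of per-trade profits. The function names are illustrative, and the annualization factor of 252 periods assumes daily candles; neither is taken from `backtesting.py`.

```python
import math
import statistics


def total_return(equity):
    """Overall profit or loss as a fraction of starting equity."""
    return equity[-1] / equity[0] - 1.0


def sharpe_ratio(equity, periods_per_year=252, risk_free_rate=0.0):
    """Annualized Sharpe ratio of per-period returns."""
    returns = [b / a - 1.0 for a, b in zip(equity, equity[1:])]
    if len(returns) < 2:
        return 0.0
    excess = [r - risk_free_rate / periods_per_year for r in returns]
    spread = statistics.stdev(excess)
    if spread == 0:
        return 0.0  # flat curve: the ratio is undefined, report 0
    return statistics.mean(excess) / spread * math.sqrt(periods_per_year)


def max_drawdown(equity):
    """Largest peak-to-trough decline, as a positive fraction of the peak."""
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def win_rate(trade_pnls):
    """Fraction of trades with positive profit."""
    if not trade_pnls:
        return 0.0
    return sum(1 for pnl in trade_pnls if pnl > 0) / len(trade_pnls)


equity = [100.0, 110.0, 99.0, 120.0]
trade_pnls = [10.0, -11.0, 21.0]
summary = {
    "total_return": total_return(equity),
    "sharpe": sharpe_ratio(equity),
    "max_drawdown": max_drawdown(equity),
    "win_rate": win_rate(trade_pnls),
    "total_trades": len(trade_pnls),
}
```

Reporting drawdown and return alongside Sharpe is what makes it possible to spot a strategy that looks good on a risk-adjusted basis while still losing money.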
- Position sizing constraints (5-25% max)
- Stop loss limits (1-3% max)
- Trailing stop implementation
- Maximum concurrent positions
- Cooldown periods between trades
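Two of these controls are easy to show in isolation. The sketch below assumes a long position, a trailing stop expressed as a fraction of the highest price since entry, and a cooldown counted in candles since the last exit. The function names and those interpretations are assumptions, not the project's implementation.

```python
def trailing_stop_exit(prices, entry_index, trailing_stop):
    """Return the index at which a long position's trailing stop fires, or None.

    The stop trails the highest price seen since entry by `trailing_stop`
    (a fraction, e.g. 0.015 for 1.5%).
    """
    highest = prices[entry_index]
    for i in range(entry_index + 1, len(prices)):
        highest = max(highest, prices[i])
        if prices[i] <= highest * (1.0 - trailing_stop):
            return i
    return None


def can_open_position(open_positions, candles_since_last_exit, max_positions, cooldown_period):
    """Apply the max-concurrent-positions and cooldown rules before a new entry."""
    return open_positions < max_positions and candles_since_last_exit >= cooldown_period


prices = [100.0, 102.0, 105.0, 104.5, 103.0, 106.0]
exit_index = trailing_stop_exit(prices, entry_index=0, trailing_stop=0.015)
```

Here the high of 105.0 puts the stop at 103.425, so the position closes on the candle priced 103.0 and misses the later move to 106.0, which is the usual trade-off of a tight trailing stop.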
- Datasets: JSON, CSV, Excel formats
- Models: H5 (Keras), metadata JSON
- DSLs: JSON configuration files
- Results: CSV backtest reports
Contributions are welcome! Please feel free to submit pull requests or open issues for bugs and feature requests.
This project is open source and available under the MIT License.
Built with:
- Streamlit for the web interface
- TensorFlow/Keras for deep learning
- Pandas for data manipulation
- Plotly for interactive visualizations
- NumPy for numerical computations
Note: This is a research and educational tool. Always perform thorough testing and validation before using any trading strategy in live markets.