This is a statistical arbitrage system for futures pairs trading. It focuses on mean-reverting relationships, entry/exit timing, and risk management.
The system identifies and exploits temporary mispricings between cointegrated instruments, using cointegration tests (Engle-Granger, Johansen), z-score signals, and ML enhancements to refine trading signals. It includes functionality for:
- Pair selection and cointegration testing
- Spread calculation with dynamic hedge ratios
- Signal generation with ML-based enhancements
- Risk management with adaptive position sizing
- Paper trading with performance monitoring
- Comprehensive backtesting with transaction costs
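As a rough illustration of the spread and signal steps listed above, the following sketch estimates a hedge ratio by ordinary least squares, builds the spread, and converts a rolling z-score into an action. It is not the project's implementation: the function names, the 2.0/0.5 thresholds, and the choice of a plain trailing window are all assumptions made for the example. Calling `hedge_ratio` on a trailing slice of prices is one simple way to get a ratio that changes over time.

```python
from statistics import mean, stdev


def hedge_ratio(y, x):
    """OLS slope of y on x; call it on a trailing window to re-estimate over time."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)  # zero if x is constant; not handled here
    return cov / var


def spread_series(y, x, beta):
    """Spread of the pair: y minus beta units of x."""
    return [b - beta * a for a, b in zip(x, y)]


def rolling_zscore(spread, window):
    """Z-score of each point against the trailing window ending at that point.

    window must be at least 2 so that a sample standard deviation exists.
    """
    out = [None] * len(spread)  # None until a full window is available
    for i in range(window - 1, len(spread)):
        w = spread[i - window + 1 : i + 1]
        sd = stdev(w)
        out[i] = (spread[i] - mean(w)) / sd if sd > 0 else 0.0
    return out


def signal(z, entry=2.0, exit_=0.5):
    """Map a z-score to an action; thresholds are illustrative, not the project's."""
    if z is None:
        return "wait"
    if z >= entry:
        return "short_spread"  # spread is rich: sell y, buy beta * x
    if z <= -entry:
        return "long_spread"   # spread is cheap: buy y, sell beta * x
    if abs(z) <= exit_:
        return "flat"          # spread has reverted: close any open position
    return "hold"
```

The window only uses data up to and including the current bar, so the z-score does not look ahead.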
For detailed system documentation, refer to these resources:
- System Architecture Diagram: Visual representation of system components
- PAIRS Design Document: Overall system design and architecture
- Data Flow Architecture: Data flow between system components
- Docker Architecture: Containerized distributed processing
- Next Steps: Current priorities and immediate focus items
- Implementation Status: Current state of implementation
- Implementation Notes: Notes on implementation details
- Cointegration Framework: Comprehensive cointegration framework design
- Statistical Methods: Statistical methods for cointegration testing
- Johansen Test Implementation: Detailed Johansen test implementation
- Engle-Granger Test Implementation: Detailed Engle-Granger test implementation
- Statistical Validation Methods: Methods for validating cointegration relationships
- Z-Score Strategy Implementation: Comprehensive Z-Score strategy documentation
- Cointegration Framework: Detailed component interactions
- Statistical Methods: Mathematical foundations
- Statistical Validation: Ensuring robustness
- Z-Score Strategy: Core strategy implementation
- Strategy Variants: Different strategy variations
- Backtest Implementation Guide: Detailed implementation guide
- Intraday Backtest: Intraday backtesting specifics
- Paper Trading Guide: Guide for paper trading setup and usage
- Configuration Guide: Configuration options and parameters
- Intraday ML System User Guide: End-user guide for ML system
- Troubleshooting Guide: Solutions for common issues
- IB Connection Troubleshooting: Interactive Brokers connectivity help
For a complete overview of all documentation, refer to the Documentation Guide.
The system provides a unified command-line interface through main.py with the following commands:
Find and analyze cointegrated pairs from your futures data:
python main.py analyze-pairs --tickers GC SI ZB ZN --min-correlation 0.7 --timeframe 1hour
Test trading strategies on historical data:
python main.py backtest --pairs GC SI ZB ZN --start-date 2023-01-01 --end-date 2023-12-31
Test intraday-specific strategies with enhanced features:
python main.py intraday-backtest --pairs GC SI --start-date 2023-01-01 --end-date 2023-12-31 --timeframe 5min --use-ml
Train machine learning models to enhance trading signals:
python main.py train-models --pair GC_SI --start-date 2023-01-01 --end-date 2023-12-31 --timeframe 5min
Train a market regime classifier for adaptive parameter selection:
python main.py train-regime-classifier --tickers GC SI ZB ZN --timeframe 1day --n-regimes 3
Optimize strategy parameters for different market regimes:
python main.py optimize-parameters --pairs-file output/pairs_analysis.json --start-date 2023-01-01 --end-date 2023-12-31 --n-regimes 3
Test your strategy in a simulated environment:
python main.py paper-trade --config config/paper_trading.json --capital 100000 --test-mode
Prepare data for analysis and backtesting:
python main.py process-data --symbols GC SI ZB ZN --start-date 2020-01-01 --end-date 2023-12-31 --timeframe 5min
Launch the web API for monitoring and control:
python main.py api --host 0.0.0.0 --port 8000
Start a Celery worker for background tasks:
python main.py worker --concurrency 4 --queue default
To get started:
- Clone the repository
- Install dependencies:
  pip install -r requirements.txt
- Process your historical data:
  python main.py process-data --symbols GC SI ZB ZN --start-date 2020-01-01 --end-date 2023-12-31
- Analyze pairs:
  python main.py analyze-pairs --tickers GC SI ZB ZN
- Run a backtest:
  python main.py intraday-backtest --pairs GC SI --start-date 2023-01-01 --end-date 2023-12-31
- Train ML models:
  python main.py train-models --pair GC_SI --start-date 2023-01-01 --end-date 2023-12-31
- Start paper trading:
  python main.py paper-trade --config config/paper_trading.json --test-mode
A typical workflow might look like:
- Process your futures data
- Find cointegrated pairs with the pair analyzer
- Run backtests to validate the strategy
- Train ML models to enhance signal generation
- Optimize parameters for different market regimes
- Run paper trading with the optimized strategy
- Monitor performance and refine the strategy
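The workflow above could be scripted as a sequence of `main.py` invocations. The sketch below is one way to do that, under a few assumptions: the subcommands and flags are copied from the examples in this README, the helper names `build_commands` and `run_workflow` are hypothetical, and it assumes the pair analysis step leaves its results at `output/pairs_analysis.json`, which this README does not state. The final monitoring step has no command and is left out.

```python
import subprocess
import sys

# Subcommands and flags copied from this README's examples, in workflow order.
WORKFLOW = [
    ["process-data", "--symbols", "GC", "SI", "ZB", "ZN",
     "--start-date", "2020-01-01", "--end-date", "2023-12-31"],
    ["analyze-pairs", "--tickers", "GC", "SI", "ZB", "ZN"],
    ["intraday-backtest", "--pairs", "GC", "SI",
     "--start-date", "2023-01-01", "--end-date", "2023-12-31"],
    ["train-models", "--pair", "GC_SI",
     "--start-date", "2023-01-01", "--end-date", "2023-12-31"],
    # Assumes the analysis results are available at this path.
    ["optimize-parameters", "--pairs-file", "output/pairs_analysis.json",
     "--start-date", "2023-01-01", "--end-date", "2023-12-31", "--n-regimes", "3"],
    ["paper-trade", "--config", "config/paper_trading.json", "--test-mode"],
]


def build_commands(python=sys.executable, entry="main.py"):
    """Return the full argument list for every workflow step."""
    return [[python, entry, *args] for args in WORKFLOW]


def run_workflow(dry_run=True):
    """Print each step; execute it only when dry_run is False."""
    for cmd in build_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError, stopping at the first failing step
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_workflow(dry_run=True)
```

Defaulting to a dry run keeps the script safe to try before any data has been processed.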
The system expects futures data in the following structure:
/data/processed/: Processed futures data
/data/models/: Trained ML models
/data/results/: Backtest results and analysis
/config/: System and strategy configuration files
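If these directories do not exist yet, a small helper can create them. This is a sketch only: `ensure_layout` is a hypothetical name, and it assumes the paths above are relative to the project root rather than the filesystem root.

```python
from pathlib import Path

# Directory layout from this README, taken as relative to the project root (assumption).
LAYOUT = {
    "data/processed": "Processed futures data",
    "data/models": "Trained ML models",
    "data/results": "Backtest results and analysis",
    "config": "System and strategy configuration files",
}


def ensure_layout(root):
    """Create any missing directories under root and return the ones created."""
    root = Path(root)
    created = []
    for rel in LAYOUT:
        path = root / rel
        if not path.is_dir():
            path.mkdir(parents=True, exist_ok=True)
            created.append(path)
    return created
```

Calling `ensure_layout(".")` from the repository root is idempotent: a second call creates nothing and returns an empty list.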
The system consists of several components:
- Asset Classes: Abstractions for futures, equities, and other asset types
- Pair Trading: Pair selection, spread analytics, and cointegration testing
- Signal Generation: Signal processing, z-score calculation, and filtering
- ML Enhancements: Feature engineering, model training, intraday signal enhancement
- Paper/Live Trading: Order execution, position management, performance tracking
- Risk Management: Position sizing, stop loss management, exposure control
- Backtesting: Backtesting engine, strategy optimization, performance metrics
- Infrastructure: Docker containers, Celery tasks, monitoring dashboard
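To make the Risk Management component more concrete, here is a minimal sketch of volatility-scaled position sizing with a z-score stop. It is an illustration under stated assumptions, not the project's sizing logic: the function names, the one-sigma risk budget rule, the contract cap, and the 3.5 stop level are all hypothetical.

```python
def position_size(capital, risk_fraction, spread_vol, point_value, max_contracts):
    """Contracts sized so a one-sigma spread move costs about risk_fraction of capital.

    spread_vol is the spread's standard deviation in price points and
    point_value is the currency value of one point per contract.
    """
    if spread_vol <= 0 or point_value <= 0:
        return 0
    risk_budget = capital * risk_fraction
    contracts = int(risk_budget // (spread_vol * point_value))
    # Exposure control: never exceed the configured cap.
    return max(0, min(contracts, max_contracts))


def stop_hit(current_z, side, stop_z=3.5):
    """True when the spread has moved further against the position than stop_z.

    side is "long_spread" (hurt by falling z) or "short_spread" (hurt by rising z).
    """
    if side == "long_spread":
        return current_z <= -stop_z
    if side == "short_spread":
        return current_z >= stop_z
    return False
```

Because the size shrinks as spread volatility rises, the same risk budget buys fewer contracts in turbulent regimes, which is one simple reading of "adaptive" sizing.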
For a visual representation of the system architecture, see System Architecture Diagram.
The system uses Docker containers for distributed task processing:
# Start the Docker containers
./scripts/start-containers.ps1
# Stop the Docker containers
./scripts/stop-containers.ps1
# Submit a task to the system
./scripts/submit-task.ps1 -TaskType train-models -Pair GC_SI -Timeframe 1hour
For details on the Docker-based architecture, see Docker Architecture.
We are actively working on improving code quality by addressing technical debt in these areas:
- Large Files Refactoring:
  - Breaking down large files into smaller, focused modules
  - Implementing design patterns to improve code organization
- Complex Functions Simplification:
  - Extracting helper methods from complex functions
  - Applying the single responsibility principle
- Duplicate Code Elimination:
  - Using template method pattern and inheritance
  - Creating base classes and mixins for common functionality
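For readers unfamiliar with the template method pattern mentioned above, the sketch below shows the general shape: a base class fixes the order of steps once, and subclasses override only the step that differs. The class names and the threshold rule are hypothetical and are not taken from the codebase.

```python
from abc import ABC, abstractmethod


class BaseBacktest(ABC):
    """Template method: run() fixes the step order; subclasses fill in the steps."""

    def run(self, values):
        data = self.prepare(values)
        signals = self.generate_signals(data)
        return self.summarize(signals)

    def prepare(self, values):
        # Default hook; subclasses may override to clean or resample data.
        return list(values)

    @abstractmethod
    def generate_signals(self, data):
        """Return one signal per data point: -1, 0, or 1."""

    def summarize(self, signals):
        # Shared reporting logic lives in one place instead of every subclass.
        return {"n_signals": sum(1 for s in signals if s != 0)}


class ThresholdBacktest(BaseBacktest):
    """Variant that only supplies the signal rule."""

    def __init__(self, threshold):
        self.threshold = threshold

    def generate_signals(self, data):
        return [
            -1 if v > self.threshold else 1 if v < -self.threshold else 0
            for v in data
        ]
```

A new variant then needs only its own `generate_signals`, so preparation and reporting code is not duplicated across strategies.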
For details on our technical debt resolution plan, see Technical Debt Analysis.
Guidelines for contributing to the project:
- Use naming conventions consistent with the existing code
- Add comprehensive tests for new features
- Document your changes in the appropriate documentation files
- Follow the code structure and patterns established in the project
- Run tests before submitting changes
This project is proprietary and not licensed for public use. All rights reserved.