This document serves as a place to capture implementation decisions, thought processes, and notes during development. It's especially useful when implementations are interrupted or when context is lost between sessions.
Decision: We are reorienting our implementation approach to focus on a Minimal Viable Trading Strategy (MVTS) to enable early validation and profitability testing.
Rationale:
- Current development path has created a complex codebase with many features without validating core profitability
- Early validation will allow us to identify which enhancements actually improve returns
- Iterative implementation-validation cycles will ensure development resources are directed toward profitable improvements
- Simplifying the initial implementation will accelerate time-to-validation
MVTS Architecture:
- Basic Z-Score strategy with simplified pair selection
- Static hedge ratio calculation (deferring Kalman filter complexity)
- Simple position sizing and risk management
- Essential transaction cost modeling
- Streamlined paper trading environment
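The first two architecture items can be illustrated with a minimal sketch: one full-sample OLS hedge ratio, then a rolling z-score of the resulting spread. The function names, the 20-bar window, and the toy price series are assumptions made for illustration and are not taken from the codebase.

```python
import math
import statistics


def static_hedge_ratio(y, x):
    """Full-sample OLS slope of y on x (with intercept), used as a fixed hedge ratio."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def rolling_zscore(spread, window):
    """Z-score of each spread value against its trailing `window` observations.

    Entries are None until a full window exists or when the window has zero variance.
    """
    scores = []
    for i in range(len(spread)):
        if i + 1 < window:
            scores.append(None)
            continue
        sample = spread[i + 1 - window : i + 1]
        mean = statistics.fmean(sample)
        std = statistics.stdev(sample)
        scores.append((spread[i] - mean) / std if std > 0 else None)
    return scores


# Toy prices (hypothetical): y is roughly 2 * x plus a small oscillating deviation.
x = [10.0 + 0.1 * t for t in range(60)]
y = [2.0 * xi + 0.5 * math.sin(t / 3.0) for t, xi in enumerate(x)]

beta = static_hedge_ratio(y, x)
spread = [yi - beta * xi for yi, xi in zip(y, x)]
z = rolling_zscore(spread, window=20)
```

A fixed ratio like this is cheap to validate, which fits the goal of deferring the Kalman filter until the basic strategy has been tested.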
Implementation Approach:
- Focus on small, focused modules with clear interfaces
- Prioritize functionality over optimization initially
- Implement only what's needed for basic validation
- Create clear performance metrics for comparing iterations
- Document assumptions and design decisions for later reference
Validation Framework:
- 2-week validation cycles for each significant implementation
- Clear performance metrics to evaluate improvements
- Benchmark comparisons to assess relative performance
- Detailed transaction and signal logs for analysis
Current State:
- Basic implementation of Linear Kalman Filter is in place in src/cointegration/kalman_filter.py
- Extended Kalman Filter has been started for non-linear models
- Helper functions for estimating time-varying hedge ratios are implemented
- Visualization tools for Kalman filter outputs are available
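For readers unfamiliar with the technique, here is a minimal sketch of a scalar linear Kalman filter that tracks a time-varying hedge ratio. It is not the code in src/cointegration/kalman_filter.py. It assumes a random-walk state for the ratio, an observation model without an intercept, and hypothetical noise settings `q` and `r`.

```python
def kalman_hedge_ratio(y, x, q=1e-4, r=1e-2, beta0=0.0, p0=1.0):
    """Track beta_t in y_t = beta_t * x_t + noise, with beta_t following a random walk.

    q is the assumed state (process) noise variance, r the observation noise variance.
    Returns the filtered hedge ratio after each observation.
    """
    beta, p = beta0, p0
    betas = []
    for yt, xt in zip(y, x):
        # Predict: a random walk leaves the mean unchanged; uncertainty grows by q.
        p = p + q
        # Update: compare the observation with the prediction beta * x.
        innovation = yt - beta * xt
        s = xt * p * xt + r          # innovation variance
        k = p * xt / s               # Kalman gain
        beta = beta + k * innovation
        p = (1.0 - k * xt) * p
        betas.append(beta)
    return betas


# Toy data (hypothetical): the true ratio is a constant 1.5.
x = [10.0 + 0.1 * t for t in range(50)]
y = [1.5 * xi for xi in x]
betas = kalman_hedge_ratio(y, x)
```

The plain covariance update `(1 - k * x) * p` is the simplest form; a production filter would need a more careful treatment of numerical stability.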
Next Steps:
- Complete implementation of additional state space models in the Extended Kalman Filter
- Add online parameter estimation for adaptive models
- Implement diagnostics for filter performance evaluation
- Add comprehensive test coverage for all Kalman filter variants
- Create integration with spread calculation components
- Optimize performance for large datasets
- Ensure proper documentation of mathematical foundations
Implementation Challenges:
- Need to ensure numerical stability in the filter updates
- Must handle edge cases like missing data points
- Performance optimization needed for real-time applications
- Need to validate against known reference implementations
Current Approach:
- Running the system in paper trading mode using run_ml_paper_trader.py
- Using ML-enhanced signals with dynamic adaptation based on market regimes
- Tracking performance metrics compared to baseline statistical strategy
Key Considerations:
- Position sizing follows prop firm requirements (max 2% risk per trade)
- Paper trading environment simulates real execution with realistic slippage
- Performance metrics include Sharpe ratio, max drawdown, and win rate
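The metrics and the 2% risk cap named above can be sketched as follows. This is an illustrative version only: it assumes a zero risk-free rate, 252 periods per year, and sizing by distance to a stop price. All function names are hypothetical.

```python
import math
import statistics


def sharpe_ratio(returns, periods_per_year=252):
    """Annualized Sharpe ratio, assuming a zero risk-free rate."""
    std = statistics.stdev(returns)
    if std == 0:
        return 0.0
    return statistics.fmean(returns) / std * math.sqrt(periods_per_year)


def max_drawdown(equity):
    """Largest peak-to-trough decline, as a positive fraction of the peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def win_rate(trade_pnls):
    """Fraction of closed trades with strictly positive PnL."""
    if not trade_pnls:
        return 0.0
    return sum(1 for pnl in trade_pnls if pnl > 0) / len(trade_pnls)


def position_size(equity, entry_price, stop_price, risk_fraction=0.02):
    """Units to trade so that a move from entry to stop loses risk_fraction of equity."""
    risk_per_unit = abs(entry_price - stop_price)
    if risk_per_unit == 0:
        raise ValueError("entry and stop prices must differ")
    return equity * risk_fraction / risk_per_unit
```

For example, `max_drawdown([100, 120, 90, 110])` is 0.25, because the fall from 120 to 90 is a quarter of the peak.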
Next Steps:
- Complete 4-week paper trading validation period
- Analyze performance metrics and compare to backtest expectations
- Implement any necessary refinements based on paper trading results
Current Approach:
- Implementing consistent error handling framework across all components
- Following the phased approach outlined in intraday_ml_next_steps.md
- Currently in Phase 1: Apply error handling to all data processing components
Key Considerations:
- Each component should handle its specific error cases
- Critical operations need retry mechanisms
- All errors should be properly logged and reported
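One way to combine the retry and logging considerations is a small decorator. This is a sketch, not the project's error handling framework. The name `retry`, its parameters, and the linear backoff are assumptions.

```python
import logging
import time
from functools import wraps

logger = logging.getLogger(__name__)


def retry(max_attempts=3, delay_seconds=0.0, exceptions=(Exception,)):
    """Retry the wrapped function on the listed exceptions, logging every failure.

    The final failure is re-raised so the caller can still handle or report it.
    """
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions as exc:
                    logger.warning(
                        "%s failed (attempt %d/%d): %s",
                        func.__name__, attempt, max_attempts, exc,
                    )
                    if attempt == max_attempts:
                        raise
                    # Linear backoff: wait longer after each failed attempt.
                    time.sleep(delay_seconds * attempt)
        return wrapper
    return decorator
```

Restricting `exceptions` to transient error types (for example `ConnectionError`) keeps programming errors from being retried silently.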
Next Steps:
- Complete Phase 1 of error handling implementation
- Move to Phase 2: Apply error handling to all ML training components
- Ensure all error logs are captured and accessible for monitoring
Decision: Use XGBoost for intraday signal classification with feature-based approach rather than deep learning
Rationale:
- XGBoost provides better interpretability
- Works well with tabular financial data
- Less prone to overfitting with our data size
- Faster training and inference than deep learning approaches
Alternative Considered:
- LSTM/RNN approach for sequence modeling
- Rejected due to higher complexity, more difficult interpretation, and slower inference time
Decision: Implement a separate market regime classifier rather than integrating regime detection into the main model
Rationale:
- Cleaner separation of concerns
- Can retrain regime detection independently
- Easier to validate and test
Alternative Considered:
- Multi-task learning approach with shared features
- Rejected due to increased model complexity and potential training instability
Decision: Use incremental processing with daily updates rather than full reprocessing
Rationale:
- More efficient for large datasets
- Reduces computational load
- Supports faster adaptation to new data
Alternative Considered:
- Full reprocessing approach for consistency
- Rejected due to computational inefficiency with 15 years of data across 130 tickers
Potential Approach:
- Containerize the system for consistent deployment
- Use managed services for data storage and processing
- Implement auto-scaling for ML training components
Open Questions:
- How to handle data synchronization between local and cloud environments?
- What is the most cost-effective approach for cloud infrastructure?
- Should we use a hybrid approach with critical components on dedicated hardware?
Potential Approach:
- Implement a portfolio allocation framework across multiple strategy variants
- Use hierarchical risk management to control overall exposure
- Develop correlation-based allocation to maximize diversification
Open Questions:
- How to balance computational resources across multiple strategies?
- What is the optimal rebalancing frequency for the strategy portfolio?
- Should strategies share the same data pipeline or have independent ones?
We've successfully implemented the following Celery task modules:
- Backtest Tasks
  - Implemented run_backtest and run_intraday_backtest functions in src/tasks/backtest_tasks.py
  - Connected them to the actual backtest functionality via run_intraday_backtest.py
  - Used dynamic configuration generation to adapt to different parameter sets
  - Implemented comprehensive progress tracking and error handling
- Optimization Tasks
  - Implemented optimize_parameters function in src/tasks/optimization_tasks.py
  - Connected to parameter optimization functionality via run_intraday_parameter_optimization.py
  - Added support for quick mode optimization
  - Implemented proper result collection and return values
For all task implementations, we followed a consistent pattern:
- Task State Management
  - Update task state at key points in execution
  - Include relevant metadata in each state update
  - Use descriptive status messages (loading_data, running_backtest, etc.)
- Configuration Handling
  - Accept configuration path as an optional parameter
  - Generate temporary configuration files when needed
  - Timestamp and organize results in dedicated directories
- Error Handling
  - Implement comprehensive try/except blocks
  - Log detailed error information
  - Return appropriate error responses
  - Update task state on failure
- Results Processing
  - Process and format results for API consumption
  - Include file paths to generated resources (plots, configs)
  - Provide summary statistics when available
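The pattern above can be sketched without a running Celery worker. The helper below only assumes a task object exposing `update_state(state=..., meta=...)`, which is how a bound Celery task reports progress. `run_with_progress`, `RecordingTask`, and the custom state name `TASK_ERROR` are hypothetical and are not taken from src/tasks/.

```python
import logging

logger = logging.getLogger(__name__)


class RecordingTask:
    """Stand-in for a bound task; it records state updates instead of storing them."""

    def __init__(self):
        self.updates = []

    def update_state(self, state=None, meta=None):
        self.updates.append((state, meta))


def run_with_progress(task, steps):
    """Run (status, callable) steps in order, reporting progress before each one.

    On failure the error is logged, the task state is updated, and an error
    response is returned instead of raising.
    """
    results = {}
    total = len(steps)
    for index, (status, step) in enumerate(steps, start=1):
        task.update_state(
            state="PROGRESS",
            meta={"status": status, "step": index, "total": total},
        )
        try:
            results[status] = step()
        except Exception as exc:
            logger.exception("Step %s failed", status)
            # "TASK_ERROR" is a custom state name chosen for this sketch.
            task.update_state(
                state="TASK_ERROR",
                meta={"status": status, "error": str(exc)},
            )
            return {"status": "error", "failed_step": status, "error": str(exc)}
    return {"status": "success", "results": results}
```

Passing the task in as a parameter keeps the helper testable with a plain recorder object.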
The next priority is to enhance the API endpoints in src/api/main.py to better integrate with the implemented task functions:
- Add additional validation for input parameters
- Improve error responses with more detailed information
- Add documentation strings for all API endpoints
- Implement additional endpoints for task management (cancel, pause, resume)
We have successfully completed all the priority tasks related to the Celery task and API implementation:
- Added run_backtest and run_intraday_backtest in src/tasks/backtest_tasks.py
- Connected to the actual backtesting implementation via run_intraday_backtest.py
- Implemented comprehensive error handling and progress tracking
- Added configuration generation with appropriate defaults
- Added optimize_parameters in src/tasks/optimization_tasks.py
- Connected to the optimization implementation via run_intraday_parameter_optimization.py
- Added support for configuration customization and quick mode
- Implemented proper result collection and error handling
- Improved the FastAPI endpoints in src/api/main.py
- Added comprehensive input validation with Pydantic models
- Improved error responses with detailed information
- Added documentation strings for all API endpoints
- Implemented new endpoints for task management:
  - /tasks for listing all tasks with filtering
  - /tasks/cancel for canceling running tasks
  - /tasks/intraday-backtest for dedicated intraday backtesting
  - /system/status for system monitoring
Throughout the implementation, we maintained consistent patterns:
- Proper State Management
  - All tasks update their state with detailed progress information
  - Error states include comprehensive error details
  - Results include paths to generated files and summaries
- Validation and Error Handling
  - API endpoints perform thorough input validation
  - All functions have comprehensive try/except blocks
  - Error messages are detailed and actionable
- Configurability
  - All tasks support external configuration files
  - Default values are provided when configuration is missing
  - Dynamic configuration generation based on input parameters
With the priority tasks completed, the focus can shift to:
- Running integration tests to verify the component interactions
- Testing the containerized environment with the implemented tasks
- Potentially implementing additional optimization algorithms
- Adding more unit tests for the implemented components
The test data generation script (scripts/generate_test_data.py) has been enhanced to support:
- Intraday and Daily Data: Can generate data at multiple timeframes (1d, 1h, 5min)
- Controlled Cointegration: Allows specific hedge ratios and mean reversion parameters
- Regime Shifts: Can simulate sudden changes in relationship for testing adaptation
- CLI Interface: Provides command-line options for customization
- Visualization: Can plot pairs and their relationship for visual inspection
The test data is designed to mimic real futures pairs with specific cointegration properties while being completely controlled for reproducible testing. Data is saved in the standard format expected by the system.
Important considerations:
- Default pairs include: CL-HO, GC-SI, ZC-ZW, ES-NQ, ZN-ZB
- All data includes OHLCV format for compatibility with existing code
- The script also generates a pairs configuration file with correct parameters for testing
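The idea of controlled cointegration with an optional regime shift can be sketched in a few lines. This is not scripts/generate_test_data.py. The function name, parameters, and the AR(1) spread model are assumptions chosen to keep the example short.

```python
import random


def generate_cointegrated_pair(n, hedge_ratio=1.5, mean_reversion=0.1, noise=0.5,
                               shift_at=None, shifted_ratio=None, seed=42):
    """Return (x, y) where y = ratio * x + spread and the spread mean-reverts.

    x is a random walk; the spread is an AR(1) process pulled toward zero at
    rate `mean_reversion`. From bar `shift_at` onward, `shifted_ratio` replaces
    `hedge_ratio` to simulate a sudden change in the relationship.
    """
    rng = random.Random(seed)  # fixed seed makes the data reproducible
    x, y = [], []
    level, spread = 100.0, 0.0
    for t in range(n):
        level += rng.gauss(0.0, 1.0)
        spread += -mean_reversion * spread + rng.gauss(0.0, noise)
        shifted = shift_at is not None and shifted_ratio is not None and t >= shift_at
        ratio = shifted_ratio if shifted else hedge_ratio
        x.append(level)
        y.append(ratio * level + spread)
    return x, y


x, y = generate_cointegrated_pair(500)
x_shift, y_shift = generate_cointegrated_pair(200, shift_at=100, shifted_ratio=2.0)
```

Because the generator owns its own `random.Random` instance, two calls with the same seed return identical series regardless of other random usage in the test run.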
Container integration tests in tests/integration/test_containers.py focus on:
- Container Startup: Ensures all required containers start correctly
- Container Communication: Tests inter-container communication via Celery tasks
- Volume Persistence: Validates data persists across container restarts
- API Availability: Tests that the API container responds to requests
- Error Detection: Checks container logs for error conditions
These tests require Docker to be installed and available. The tests use both unittest and pytest frameworks to provide different testing approaches.
The task testing framework (in tests/tasks/) follows these principles:
- Mock-Based Testing: Uses mock objects when actual components aren't available
- Fall-Through Implementation: Can use real implementations when available
- Test Data Integration: Leverages the test data generation script
- Multiple Test Strategies: Includes unit tests and integration tests
Key considerations when extending these tests:
- Always provide graceful fallbacks when dependencies are missing
- Tests should be isolated and not rely on global state
- Use temporary directories for test file storage
API testing (tests/api/test_api_endpoints.py) implements two testing strategies:
- External Testing: Tests API as a black box using requests library
- Flask Test Client: Tests using Flask's built-in test client for deeper testing
Both approaches have advantages, with external testing better for integration testing and the Flask client better for unit testing.
The automated test execution script (tests/run_automated_tests.py) has been implemented to provide a comprehensive framework for CI/CD integration. Key features include:
- Flexible Test Selection: Tests can be selected by type (unit, integration, API, etc.) and filtered by include/exclude patterns.
- Docker Integration: Tests can be run in a Docker container for consistent execution environment.
- Comprehensive Reporting: Results are saved in JSON format and can be automatically converted to HTML reports.
- Test Type Management: The system supports multiple test types with specialized execution methods:
  - Unit tests (via unittest and pytest)
  - Integration tests
  - API tests
  - Performance benchmarks
  - Container tests
- Error Handling: Robust error handling with detailed logging and timeout protection for long-running tests.
- Results Analysis: Summary metrics include total tests, passing/failing counts, execution time, and detailed error reporting.
The implementation follows these design principles:
- Modularity: Each test type has its own execution logic.
- Configurability: All aspects of test execution can be configured via command-line arguments.
- Detailed Reporting: Test results contain enough detail for debugging failures.
- CI/CD Integration: Exit codes follow standard conventions for CI/CD pipeline integration.
Usage example:
# Run all tests
python -m tests.run_automated_tests
# Run only unit tests
python -m tests.run_automated_tests --type=unit
# Run tests with HTML report generation
python -m tests.run_automated_tests --report
# Run selected tests with timeout
python -m tests.run_automated_tests --include=intraday,backtest --timeout=600
# Run tests in Docker
python -m tests.run_automated_tests --docker
The script can be integrated with CI/CD systems like Jenkins, GitHub Actions, or GitLab CI to provide automated testing on code changes.
The optimization benchmark system (tests/benchmark/test_optimization_benchmark.py) has been implemented to evaluate and compare the performance of different parameter optimization algorithms. Key features include:
- Multi-Algorithm Benchmarking: Side-by-side comparison of grid search and genetic algorithm optimizers.
- Parameter Space Scaling: Tests scaling behavior with different parameter space sizes (small, medium, large).
- Regime-Specific Optimization Benchmarks: Evaluation of regime detection and regime-specific optimization with different regime counts.
- Population Size Analysis: Tests genetic algorithm performance with different population sizes.
- Memory Profiling: Measures memory consumption for each optimization approach.
The benchmarks are designed to work with synthetic data generated specifically for testing, ensuring reproducible results while simulating real-world conditions. Benchmark scenarios include:
- Grid Search Benchmarks: Testing the exhaustive search approach with different parameter space sizes.
- Genetic Algorithm Benchmarks: Testing the evolutionary approach with different parameter spaces and population sizes.
- Regime Optimization Benchmarks: Testing market regime detection and regime-specific parameter optimization.
- Algorithm Comparison: Direct comparison of grid search vs. genetic algorithms on identical problems.
Implementation design principles:
- Graceful Degradation: The benchmarks work even if the actual implementations are not available (using mock classes).
- Isolated Environment: Tests use synthetic data and don't depend on existing datasets.
- Comprehensive Metrics: Captures runtime, memory usage, and scaling behavior.
- Integration with Benchmark Framework: Uses the existing BenchmarkRunner class for consistent measurement.
The benchmark system can be run individually or as part of the complete benchmark suite with:
# Run just optimization benchmarks
python -m tests.benchmark.test_optimization_benchmark
# Run specific optimization benchmark
python -m tests.benchmark.test_optimization_benchmark --benchmark grid
# Run as part of complete benchmark suite
python -m tests.benchmark.test_run_benchmarks --tests optimization
The performance test suite (tests/performance/) has been implemented to provide comprehensive performance testing for all system components. Key features include:
- Modular Architecture: Separate test modules for data processing, model training, trading execution, API, and system-level performance.
- Configurable Test Runner: A central runner script with CLI arguments for controlling test execution.
- Multiple Report Formats: Results are saved in both JSON and HTML formats with visualization.
- Resource Measurement: Tests track execution time, memory usage, and I/O operations.
- Scalable Test Data: Tests can run with different data sizes from small to production-level.
The performance test suite is designed to identify performance bottlenecks and verify that new code changes don't negatively impact system performance. Several key components have been implemented:
The data processing performance tests are fully implemented with:
- Format Comparison: Tests data loading from different formats (CSV, Parquet, HDF5) for speed and memory efficiency.
- Feature Calculation: Measures the performance of various feature engineering operations.
- Data Merging: Tests the efficiency of data joining and merging operations.
- Scaling Behavior: Examines how performance scales with increasing data volume.
The ML model training performance tests analyze:
- Model Comparison: Benchmarks different model types (Linear Regression, Random Forest, Gradient Boosting, XGBoost) for training time, prediction time, and memory usage.
- Hyperparameter Tuning: Compares the performance of grid search and random search algorithms.
- Feature Selection: Tests various feature selection methods (SelectKBest, RFE, SelectFromModel) for time and memory efficiency.
- Scaling Analysis: Examines how model training performance scales with increasing data size.
- ML Component Integration: Provides a framework for testing system-specific ML components with graceful degradation when components are not available.
The trading system performance tests analyze:
- Signal Generation: Measures the speed and memory usage of signal generation across different pairs and market conditions.
- Position Management: Tests the performance of trade execution and position tracking operations.
- Scaling Behavior: Analyzes how the trading system performance scales with an increasing number of pairs.
- Mock Components: Implements mock versions of trading components for testing when actual components are not available.
- Synthetic Data Generation: Creates realistic trading data with cointegration properties for consistent testing.
The framework uses several profiling techniques:
- Context managers for timing code blocks
- Decorators for memory profiling
- Tracemalloc for detailed memory tracking
- Statistical aggregation for reliable results
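As an illustration of the first and third techniques, the sketch below combines timing and `tracemalloc` peak-memory tracking in one context manager. The name `measure` and the result layout are assumptions, and the actual framework may keep timing and memory profiling separate.

```python
import time
import tracemalloc
from contextlib import contextmanager


@contextmanager
def measure(label, results):
    """Record wall-clock time and peak traced memory for the enclosed block.

    Not nestable: tracemalloc is started and stopped around each block.
    """
    tracemalloc.start()
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        _current, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        results[label] = {"seconds": elapsed, "peak_bytes": peak}


results = {}
with measure("build_list", results):
    data = [i * i for i in range(100_000)]
```

Note that tracing allocations slows the measured code, so timing figures taken this way are best compared only with each other.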
Performance reports include:
- Execution time for each operation
- Memory usage patterns
- Visual comparisons via charts and graphs
- System and environment information
This comprehensive approach ensures that performance metrics are tracked consistently and provides early warning of performance regressions.
The performance benchmark testing framework has been implemented to provide consistent, repeatable performance measurements for critical system components. This framework serves several purposes:
- Performance Optimization: Identifies bottlenecks in data processing and algorithmic implementations
- Implementation Comparison: Allows comparison of different implementation approaches
- Scaling Analysis: Measures how performance scales with different data volumes
- Resource Usage: Monitors memory consumption for resource planning
The framework consists of these key components:
The core of the framework is the BenchmarkRunner class which provides:
- Consistent execution timing with configurable repetitions
- Statistical analysis of results (mean, median, std dev)
- Result serialization to JSON and CSV formats
- Implementation comparison capabilities
This class has been designed with a simple API to make adding new benchmarks easy:
# Create runner
runner = BenchmarkRunner("my_benchmark")
# Run benchmark for a function
runner.run_benchmark(func, *args, **kwargs)
# Compare multiple implementations
runner.compare_implementations([func1, func2], args_list, kwargs_list)
# Save results
runner.save_results()
For consistent benchmarking, we've implemented controlled test data generation with:
- Parameterized cointegration properties
- Multiple data frequencies (daily, hourly, 5-min)
- Optional regime shifts for testing adaptivity
- Consistent random seed option for reproducibility
The framework generates comprehensive HTML reports with:
- Execution time tables for all benchmarks
- Comparison charts for competing implementations
- Memory usage profiles
- Scaling analysis visualizations
Several key decisions were made during implementation:
- Statistical Summary Over Raw Times: Rather than just using raw timing, we calculate mean, median, and standard deviation to account for system variability.
- Isolated Test Environment: Tests create their own isolated data to avoid dependencies on existing datasets.
- Graceful Fallbacks: If components are missing (e.g., the actual BacktestEngine), the benchmarks use mock implementations to still test the framework itself.
- Modular Design: Each benchmark type is in its own module, allowing selective execution and easy addition of new benchmarks.
- No Side Effects: Benchmarks clean up after themselves and don't modify the existing system state.
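The first decision, summarizing repeated runs instead of trusting one timing, can be sketched as below. `time_repeated` is a hypothetical helper and does not reproduce the BenchmarkRunner API.

```python
import statistics
import time


def time_repeated(func, *args, repetitions=5, **kwargs):
    """Call func repeatedly and summarize the wall-clock times in seconds."""
    samples = []
    for _ in range(repetitions):
        start = time.perf_counter()
        func(*args, **kwargs)
        samples.append(time.perf_counter() - start)
    return {
        "mean": statistics.fmean(samples),
        "median": statistics.median(samples),
        # Standard deviation needs at least two samples.
        "stdev": statistics.stdev(samples) if len(samples) > 1 else 0.0,
        "repetitions": repetitions,
    }


stats = time_repeated(sorted, list(range(10_000, 0, -1)), repetitions=5)
```

A large gap between mean and median usually points to an outlier run, for example one affected by a cold cache.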
Planned improvements to the benchmark framework:
- CI Integration: Add automation to run benchmarks on PRs to catch performance regressions
- Historical Tracking: Build a database of performance over time to track long-term trends
- System Resource Monitoring: Add CPU/memory/disk I/O monitoring during benchmarks
- Distributed Testing: Support for testing distributed processing performance
When utilizing the benchmark framework:
- Always test with realistic data volumes
- Compare multiple implementation approaches for key algorithms
- Test both small-scale and large-scale data to understand scaling properties
- Run benchmarks before and after major changes to detect regressions
- Use memory profiling for resource-intensive operations to plan infrastructure needs
The migration to Docker containers was completed to improve deployment consistency and enable better resource management. Key decisions included:
- Container Structure: Multiple services are organized in separate containers:
  - Main application container
  - Database container
  - Worker containers for parallel processing
  - Monitoring container
- Volume Management: Persistent data is stored in named volumes:
  - data-volume: For market data and processed data
  - model-volume: For trained ML models
  - config-volume: For configuration files
- Network Configuration: Services are connected through a custom Docker network:
  - Internal communication uses service names as hostnames
  - Only specific ports are exposed to the host system
- Resource Constraints: Each container has defined resource limits:
  - Worker containers: 2 CPU cores, 4GB RAM
  - Database container: 1 CPU core, 2GB RAM
  - Main application: 2 CPU cores, 4GB RAM
The data pipeline was optimized to reduce processing time and memory usage:
- Chunked Processing: Large datasets are processed in chunks to avoid memory issues.
- Parallel Processing: Added multiprocessing for independent data transformations:
  - Feature calculation is distributed across worker processes
  - Each worker handles a subset of the symbols
- Caching Strategy: Implemented a two-level caching approach:
  - Memory cache for frequently accessed data
  - Disk cache for preprocessed data
- On-Demand Processing: Changed from batch processing to on-demand processing:
  - Data is processed when requested rather than all at once
  - Intermediate results are cached for reuse
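A two-level cache with on-demand computation might look like the sketch below. The class name, the key format, and the use of pickle files are assumptions, not a description of the pipeline's actual cache.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path


class TwoLevelCache:
    """Look up values in memory first, then on disk, computing only on a full miss."""

    def __init__(self, cache_dir):
        self._memory = {}
        self._dir = Path(cache_dir)
        self._dir.mkdir(parents=True, exist_ok=True)

    def _path(self, key):
        # Hash the key so arbitrary strings map to safe file names.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self._dir / f"{digest}.pkl"

    def get_or_compute(self, key, compute):
        if key in self._memory:
            return self._memory[key]
        path = self._path(key)
        if path.exists():
            with path.open("rb") as fh:
                value = pickle.load(fh)  # only load files this process tree wrote
        else:
            value = compute()
            with path.open("wb") as fh:
                pickle.dump(value, fh)
        self._memory[key] = value
        return value


with tempfile.TemporaryDirectory() as tmp:
    calls = []

    def compute_features():
        calls.append(1)
        return {"rows": 3}

    cache = TwoLevelCache(tmp)
    first = cache.get_or_compute("CL-HO:features", compute_features)
    second = cache.get_or_compute("CL-HO:features", compute_features)  # memory hit
    from_disk = TwoLevelCache(tmp).get_or_compute("CL-HO:features", compute_features)
```

The sketch has no eviction or invalidation; a real memory level would need a size bound.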
The Position Manager was extracted from the IntradayMLPaperTrader class to improve code organization, maintainability, and enable more focused testing. This is part of our technical debt reduction effort.
- Component Extraction: Identified position management functionality in IntradayMLPaperTrader and moved it to a dedicated class following single responsibility principle.
- Interface Design: Created a clean API for position management operations:
  - Position entry/exit
  - Position tracking
  - Risk management
  - Performance monitoring
- Configuration Handling: Position Manager accepts configuration at both the class level and the pair level, allowing for global and pair-specific settings.
- Dependency Injection: The Position Manager receives a reference to the paper trader rather than directly instantiating it, making testing easier and reducing coupling.
- Core Position Operations:
  - add_pair(): Add trading pairs to be managed
  - execute_signals(): Execute trading signals for pairs
  - _enter_position(): Enter new positions based on signals
  - _exit_position(): Exit positions with proper logging
- Risk Management:
  - check_stop_losses(): Implement stop loss based on z-score thresholds
  - check_take_profits(): Implement take profit based on mean reversion
  - check_holding_limits(): Enforce maximum position holding time
  - check_correlation_breakdown(): Exit positions when correlation weakens
- Position Tracking:
  - get_positions(): Get all current positions
  - get_position(): Get position for a specific pair
  - get_position_history(): Retrieve closed positions history
- Advanced Features:
  - adjust_position_size(): Dynamically adjust position size based on volatility
  - track_position_performance(): Record detailed performance metrics
  - analyze_position_risk(): Calculate risk metrics for open positions
  - get_position_summary(): Generate aggregate position reports
  - monitor_all_positions(): Comprehensive position monitoring system
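To show why injecting the trader helps testing, here is a deliberately small sketch. It borrows the method names `check_stop_losses()` and `get_positions()` from the list above, but the signatures, the `FakeTrader` stand-in, its `submit` method, and the default threshold of 4.0 are all assumptions, not the real PositionManager interface.

```python
class FakeTrader:
    """Stand-in for the paper trader; it records orders instead of executing them."""

    def __init__(self):
        self.orders = []

    def submit(self, pair, action):
        self.orders.append((pair, action))


class SketchPositionManager:
    """Minimal position tracker that receives its trader from the caller."""

    def __init__(self, trader, stop_loss_z=4.0):
        self._trader = trader            # injected, never constructed here
        self._stop_loss_z = stop_loss_z
        self._positions = {}             # pair -> "long" or "short"

    def enter(self, pair, side):
        self._positions[pair] = side
        self._trader.submit(pair, f"open_{side}")

    def check_stop_losses(self, zscores):
        """Close every open pair whose absolute z-score reached the stop threshold."""
        stopped = []
        for pair, side in list(self._positions.items()):
            z = zscores.get(pair)
            if z is not None and abs(z) >= self._stop_loss_z:
                del self._positions[pair]
                self._trader.submit(pair, f"close_{side}")
                stopped.append(pair)
        return stopped

    def get_positions(self):
        return dict(self._positions)


trader = FakeTrader()
manager = SketchPositionManager(trader)
manager.enter("CL-HO", "short")
manager.enter("GC-SI", "long")
stopped = manager.check_stop_losses({"CL-HO": 4.3, "GC-SI": -1.2})
```

Because the manager only calls `submit` on whatever it was given, a unit test can assert on the recorded orders without starting a paper trading session.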
The integration with IntradayMLPaperTrader will follow these steps:
- Initialization: IntradayMLPaperTrader will initialize a PositionManager instance
- Signal Routing: Trading signals will be routed to PositionManager
- Callback Handling: Position updates from PaperTrader will be forwarded to PositionManager
- Monitoring Integration: PositionManager monitoring will feed into the dashboard
The Position Manager will be tested using:
- Unit Tests: Testing individual methods with mocked dependencies
- Integration Tests: Testing interactions with PaperTrader
- Scenario Tests: Testing different market conditions and position management scenarios
Next Steps:
- Complete unit tests for the PositionManager
- Integrate PositionManager with IntradayMLPaperTrader
- Refactor IntradayMLPaperTrader to use the new component
- Update documentation to reflect the new design
The Johansen test has been implemented as a standalone function in src/cointegration/cointegration_tests.py. Key implementation details:
- The function supports multivariate cointegration testing with configurable deterministic terms.
- Results include trace statistics, critical values, eigenvalues, and the number of cointegrating relations.
- Implementation uses statsmodels' coint_johansen function with proper error handling.
- Input validation ensures the function receives at least two time series.
- Results are formatted as a nested dictionary for easy consumption by other components.
This implementation closes a critical gap identified in the audit and provides a robust foundation for identifying multiple cointegrating relationships.
The Engle-Granger two-step cointegration test has been implemented as a standalone function in src/cointegration/cointegration_tests.py. Key implementation details:
- The function accepts two price series and returns comprehensive cointegration results.
- The implementation follows the two-step process: regression to find hedge ratio, then ADF test on residuals.
- Results include ADF statistic, p-value, critical values, cointegration flag, and hedge ratio.
- The function handles log price transformation internally and aligns series indices automatically.
- Results include half-life calculation for mean reversion speed assessment.
This implementation provides a standardized approach to the Engle-Granger test and ensures consistent results across different components.
The rolling cointegration analysis function has been enhanced in src/cointegration/cointegration_tests.py to provide more robust stability assessment. Key enhancements:
- Added support for testing multiple window sizes to evaluate relationship stability.
- Implemented metrics to measure consistency across different window sizes:
  - Window consistency: How often different window sizes agree on cointegration status
  - Hedge ratio stability: Consistency of hedge ratios across window sizes
  - Cointegration frequency: Percentage of windows showing cointegration
- Added comprehensive validation to ensure sufficient data and proper window sizes.
- Enhanced error handling with informative warnings and error messages.
- Returns rich metadata about the cointegration relationship stability over time.
These enhancements enable more robust pair selection by ensuring cointegration relationships are stable across different time horizons.
The out-of-sample validation in the test_cointegration() function has been significantly enhanced to provide more robust assessment of cointegration relationship stability. Key enhancements:
- Comprehensive Stability Metrics: Implemented multiple measures to assess the stability of the cointegration relationship between training and validation periods:
  - Consistency of cointegration finding between periods
  - Half-life ratio stability assessment
  - R-squared ratio for model quality comparison
  - Overall stability score that combines multiple factors
- Statistical Significance Testing: Added rigorous statistical testing across validation data:
  - Added normality testing for residuals using Shapiro-Wilk test
  - Implemented stationarity consistency checks across subperiods
  - Added mean and variance stability assessment by comparing first and second halves of validation period
- Advanced Validation Processing:
  - Adaptive validation based on data length (more metrics when more data is available)
  - Extreme deviation detection to identify potential issues
  - Enhanced error handling with proper warnings when data is insufficient
- Rich Metadata Return: Results include detailed information for decision-making:
  - Complete ADF test results with critical values
  - Comprehensive stability metrics in nested dictionary format
  - Normalized statistics for easy interpretation
  - Comparison metrics between training and validation periods
This enhanced validation provides a much more robust framework for assessing the quality of potential trading pairs, helping to filter out relationships that might appear cointegrated in-sample but break down out-of-sample.
The calculate_half_life() function has been significantly enhanced for better robustness and more comprehensive results. Key improvements:
-
Enhanced Return Format: Instead of returning a single float value, the function now returns a dictionary with multiple validation metrics:
- Half-life value for mean reversion speed
- R-squared of the regression for model quality assessment
- Model validity flag based on statistical significance
- Residual normality flag based on Shapiro-Wilk test
- Hurst exponent for time series memory assessment
-
Hurst Exponent Calculation: Added calculation of the Hurst exponent to provide an additional verification of mean-reversion properties:
- H < 0.5 indicates mean-reversion (desirable)
- H = 0.5 indicates random walk
- H > 0.5 indicates trending behavior
Improved Robustness:
- Added maximum half-life cap to prevent unrealistic values from near unit-root processes
- Implemented proper input validation with type checking
- Added comprehensive error handling with try-except blocks
- Added data length validation with appropriate warnings
Enhanced Regression Analysis:
- Proper alignment of vectors for regression
- Statistical significance testing of the regression coefficient
- Validation of regression residuals for normality
These improvements make half-life estimates more reliable and better validated, which is critical because trading signals based on mean reversion depend directly on them.
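To make the dictionary-style return concrete, here is a minimal NumPy-only sketch of a half-life estimate with supporting diagnostics. It is not the project's calculate_half_life(): the name half_life_summary, the default cap of 252, the t-statistic cutoff of -2.0, and the key names are all assumptions for illustration. The half-life comes from the regression Δs_t = a + b·s_{t-1} as -ln(2)/b, and the Hurst exponent is estimated from how the standard deviation of lagged differences scales with the lag. The Shapiro-Wilk normality flag is omitted because it needs SciPy.

```python
import numpy as np


def half_life_summary(spread, max_half_life=252.0, t_threshold=-2.0):
    """Estimate mean-reversion half-life plus simple diagnostics.

    max_half_life and t_threshold are illustrative defaults (assumptions).
    """
    s = np.asarray(spread, dtype=float)
    if s.ndim != 1 or s.size < 40:
        raise ValueError("spread must be 1-D with at least 40 observations")

    lagged, delta = s[:-1], np.diff(s)
    x = lagged - lagged.mean()
    sxx = float(x @ x)
    if sxx == 0.0:
        raise ValueError("spread is constant; half-life is undefined")

    # OLS fit of delta_t = a + b * s_{t-1}; b < 0 implies mean reversion.
    b = float(x @ (delta - delta.mean())) / sxx
    a = delta.mean() - b * lagged.mean()
    resid = delta - (a + b * lagged)
    rss = float(resid @ resid)
    tss = float(((delta - delta.mean()) ** 2).sum())
    r_squared = 1.0 - rss / tss if tss > 0 else 0.0

    # Conventional OLS t-statistic for b (a rough screen, not a formal ADF test).
    se_b = np.sqrt(rss / (delta.size - 2) / sxx)
    t_stat = b / se_b if se_b > 0 else 0.0

    # Cap the half-life so near unit-root fits cannot return huge values.
    half_life = min(-np.log(2) / b, max_half_life) if b < 0 else max_half_life

    # Hurst exponent: slope of log(std of lagged differences) against log(lag).
    lags = np.arange(2, 20)
    tau = np.array([np.std(s[lag:] - s[:-lag]) for lag in lags])
    hurst = np.polyfit(np.log(lags), np.log(np.maximum(tau, 1e-12)), 1)[0]

    return {
        "half_life": float(half_life),
        "r_squared": float(r_squared),
        "is_valid": bool(b < 0 and t_stat < t_threshold),
        "hurst": float(hurst),
    }
```

Because the return type is a dictionary rather than a float, any caller that previously used the result as a number would need to read the half_life entry instead.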
The basic Z-Score Strategy Backtest has been implemented in src/backtest/zscore_strategy_backtest.py. This implementation provides a complete foundation for testing pair trading strategies based on the z-score of the spread between cointegrated asset pairs. Key implementation details:
Comprehensive Strategy Design: The implementation follows the design outlined in PAIRS_DESIGN.md with:
- Entry signals at z-score thresholds beyond ±2.0 (configurable)
- Exit signals when z-score reverts to ±0.5 (configurable)
- Stop-loss implementation at extreme z-score levels
- Maximum holding period constraints
- Transaction cost modeling including commissions and slippage
Spread Calculation & Z-Score Computation:
- Supports multiple calculation methods for z-scores (rolling window, exponential weighted moving average, full history)
- Implements proper hedge ratio calculation using OLS regression
- Supports log price transformation for improved stationarity properties
- Includes proper error handling and validation
Position Management:
- Implements position tracking with entry and exit logic
- Supports various exit conditions (target reached, stop loss, maximum holding period)
- Tracks holding periods for time-based exit decisions
- Handles position sizing based on account size and risk parameters
Performance Analysis:
- Calculates key performance metrics (returns, Sharpe ratio, drawdown, win rate, etc.)
- Generates detailed trade history with entry/exit information
- Implements visualization methods for backtest results
- Supports saving results to various formats for further analysis
Integration & Usability:
- Provides a simple helper function, run_zscore_backtest(), for easy execution
- Designed to work seamlessly with the existing data pipeline and cointegration testing framework
- Includes comprehensive documentation for all functions and parameters
This implementation satisfies all requirements for the Phase 1 basic z-score strategy backtesting component and provides a solid foundation for more advanced strategies in subsequent phases. The design emphasizes modularity, allowing for easy extension with more sophisticated entry/exit rules, adaptive parameters, and additional risk management techniques in future phases.
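The entry, exit, stop-loss, and maximum-holding-period rules listed above can be sketched as a small state machine over a rolling z-score. This is an illustration under stated assumptions, not the code in src/backtest/zscore_strategy_backtest.py: the function names, the stop level of 4.0, and the 20-bar holding limit are hypothetical, and transaction costs and position sizing are left out.

```python
import numpy as np


def rolling_zscore(spread, window):
    """Z-score of each point against the trailing `window` observations."""
    s = np.asarray(spread, dtype=float)
    z = np.full(s.shape, np.nan)
    for t in range(window - 1, s.size):
        w = s[t - window + 1 : t + 1]
        sd = w.std(ddof=1)
        if sd > 0:
            z[t] = (s[t] - w.mean()) / sd
    return z


def zscore_positions(z, entry_z=2.0, exit_z=0.5, stop_z=4.0, max_hold=20):
    """Return +1 (long spread), -1 (short spread), or 0 (flat) per bar."""
    positions = np.zeros(len(z), dtype=int)
    current, held = 0, 0
    for t, zt in enumerate(z):
        if current == 0:
            # Enter only beyond the entry threshold but inside the stop level.
            # NaN z-scores fail every comparison, so no trade is opened on them.
            if entry_z <= zt < stop_z:
                current, held = -1, 0   # spread is rich: short it
            elif -stop_z < zt <= -entry_z:
                current, held = 1, 0    # spread is cheap: buy it
        else:
            held += 1
            adverse = zt * -current     # z-score measured in the entry direction
            reverted = adverse <= exit_z
            stopped = adverse >= stop_z
            if reverted or stopped or held >= max_hold:
                current, held = 0, 0
        positions[t] = current
    return positions
```

Keeping the signal logic separate from the z-score calculation means the rolling window could be swapped for an exponentially weighted or full-history estimate without touching the position rules.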
The statistical methods required for cointegration analysis have been implemented in src/cointegration/statistical_methods.py with the following key components:
- Full implementation of Johansen's maximum likelihood procedure for testing cointegration
- Support for different deterministic trend specifications (-1 to 3)
- Both trace and maximum eigenvalue test statistics
- P-value calculation with proper critical values
- Extraction and normalization of cointegrating vectors
- Human-readable conclusions for easier interpretation
- Implementation of the two-step Engle-Granger procedure
- Multiple regression methods (OLS, Dynamic OLS, Total Least Squares)
- Proper residual analysis with ADF test
- Customizable ADF test options (trend, maxlag, autolag)
- Half-life calculation for mean-reversion speed
- Comprehensive validation utilities in src/cointegration/validation_utils.py
- Functions to generate synthetic data with known cointegration properties
- Validation against different parameter combinations
- Comparison with external library implementations
- Visualization tools for validation results
- Comprehensive test suite in tests/unit/cointegration/test_statistical_methods.py
- Tests for all major functions with synthetic data
- Input validation tests
- Edge case handling
This implementation satisfies the critical requirements for Phase 1 of the project, providing the statistical foundation for the pairs trading framework. The code includes proper error handling, detailed documentation, and follows academic standards for statistical rigor.
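As an illustration of validating against synthetic data with known cointegration properties, the sketch below generates a pair whose true hedge ratio is known and recovers it with the OLS regression that forms the first step of the Engle-Granger procedure. The function names and parameter defaults are assumptions for this example and do not come from validation_utils.py; the second step (an ADF test on the residuals) is not shown.

```python
import numpy as np


def make_cointegrated_pair(n=1000, beta=1.5, alpha=2.0, phi=0.8, seed=0):
    """Generate x (a random walk) and y = alpha + beta * x + stationary noise."""
    rng = np.random.default_rng(seed)
    x = np.cumsum(rng.normal(size=n))      # integrated of order 1
    eps = rng.normal(scale=0.5, size=n)
    u = np.zeros(n)
    for t in range(1, n):
        u[t] = phi * u[t - 1] + eps[t]     # stationary AR(1) spread, |phi| < 1
    y = alpha + beta * x + u
    return x, y


def ols_hedge_ratio(x, y):
    """Regress y on x with an intercept; return (intercept, slope, residuals)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones_like(x), x])
    (intercept, slope), *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - (intercept + slope * x)
    return float(intercept), float(slope), residuals


x, y = make_cointegrated_pair()
intercept, slope, residuals = ols_hedge_ratio(x, y)
print(f"estimated hedge ratio: {slope:.3f} (true value 1.5)")
```

Because the true beta is known, a test can assert that the estimate lands close to it, and the residuals can then be passed to whichever stationarity test the framework uses.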
- Created comprehensive test fixtures for:
- Configuration setup
- Model initialization and loading
- Test data generation
- Behavior recording for enhance_signals method
- Behavior recording for apply_intraday_adaptations method
- Edge cases for model training and prediction
Extract feature calculation into a separate IntradayFeatureGenerator class:
- Move the calculate_features method and related helpers
- Create a clean interface between feature generation and signal enhancement
Extract model training into a separate IntradayModelTrainer class:
- Move all train_* methods into this class
- Create a standardized interface for model training and evaluation
Extract prediction functionality into a separate IntradayPredictionEngine class:
- Move all predict_* methods into this class
- Establish clear input/output contracts
Decompose the enhance_signals method (231 lines, complexity 26) into smaller, focused methods:
- apply_ml_filtering
- apply_technical_filters
- optimize_entry_timing
- optimize_exit_timing
- adjust_for_volume_patterns
Decompose the apply_intraday_adaptations method (181 lines, complexity 23) into smaller, focused methods:
- adapt_to_time_of_day
- adapt_to_market_regime
- adapt_to_volatility_conditions
- adapt_to_liquidity_conditions
- Create IntradayFeatureGenerator class
- Refactor and simplify enhance_signals method
- Create IntradayModelTrainer class
- Refactor and simplify apply_intraday_adaptations method
- Create IntradayPredictionEngine class
- Finalize and validate the refactored structure
- Verify behavior consistency before and after refactoring
- Use test fixtures to capture behavior of original implementation
- Compare outputs of refactored implementation against original
- Ensure all functionality is preserved
- Maintain test coverage throughout refactoring process
- Code complexity metrics
- Reduce method length (target < 50 lines per method)
- Reduce cyclomatic complexity (target < 15 per method)
- Test coverage (maintain or improve)
- Performance benchmarks (should not significantly impact execution time)
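One way to check behavior consistency before and after refactoring is to record the outputs of the original implementation on fixed inputs and compare the refactored code against that recording. The sketch below shows the idea with toy stand-ins: original_enhance and refactored_enhance are hypothetical placeholders, not the real enhance_signals method, and the helper names are assumptions.

```python
import numpy as np


def record_behavior(func, cases):
    """Run func on every named case and keep the outputs as arrays."""
    return {name: np.asarray(func(*args)) for name, args in cases.items()}


def find_regressions(recorded, func, cases, rtol=1e-9):
    """Return names of cases where func no longer matches the recording."""
    failed = []
    for name, args in cases.items():
        out = np.asarray(func(*args))
        expected = recorded[name]
        if out.shape != expected.shape or not np.allclose(
            out, expected, rtol=rtol, atol=0.0, equal_nan=True
        ):
            failed.append(name)
    return failed


# Toy "original": keep a signal only when volume is at least 100.
def original_enhance(signal, volume):
    return [s if v >= 100 else 0 for s, v in zip(signal, volume)]


# Toy "refactored" version, split into a helper plus a thin wrapper.
def _volume_mask(volume):
    return np.asarray(volume) >= 100


def refactored_enhance(signal, volume):
    return np.asarray(signal) * _volume_mask(volume)


cases = {
    "mixed_volume": ([1, -1, 1], [150, 50, 100]),
    "all_thin": ([1, 1], [10, 20]),
}
recorded = record_behavior(original_enhance, cases)
regressions = find_regressions(recorded, refactored_enhance, cases)
```

An empty regressions list means the refactored code reproduces the recorded behavior on these cases; the recording only protects the inputs it covers, so edge cases need their own entries.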
Date: [Current Date]
The tests for the Kalman filter implementation are failing with the following error:
ValueError: The shape of all parameters is not consistent. Please re-check their values.
This error is raised by the pykalman library's _determine_dimensionality function when a KalmanFilter object is initialized. The function detects inconsistent dimensions among the parameters provided.
- The error occurs during initialization of the KalmanFilter object in the estimate_timevarying_hedge_ratio function
- The specific error is raised when the dimensions of the parameters don't match during initialization:
if not np.all(np.array(candidates) == candidates[0]): raise ValueError("The shape of all parameters is not consistent. Please re-check their values.")
- The variables being checked include:
- The exogenous variables matrix (X)
- The observation covariance matrix
- Potentially other matrices related to the Kalman filter state space model
- Parameter Shape Alignment: Ensure all matrix parameters have consistent dimensions by explicitly reshaping them before passing to KalmanFilter
- Configuration Adjustment: Modify the KalmanFilter configuration to accept the current shapes or provide appropriate dimension hints
- Input Data Preprocessing: Apply necessary transformations to input data before passing to the Kalman filter
- Library Compatibility: Verify that we're using the correct version of pykalman and that our usage matches expectations
- Debug the estimate_timevarying_hedge_ratio function to identify the exact dimensionality issue
- Fix the parameter shapes to ensure consistency
- Update tests to use the corrected implementation
- Add specific test cases to verify that the dimension handling is robust
- This issue affects all tests that use the Kalman filter implementation
- Resolution is required before proceeding with comprehensive testing of the Kalman filter functionality
- Coordination with Agent 1 (Implementation Agent) is needed to align implementation and testing approaches
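For the Parameter Shape Alignment option, the sketch below builds every parameter with an explicit shape for a two-state model (time-varying slope and intercept) observed through y_t = beta_t * x_t + alpha_t, then checks the shapes before any filter is constructed. The dictionary keys match pykalman's KalmanFilter keyword arguments, but the function names, the delta and obs_var defaults, and the assumption that the hedge-ratio model has exactly this form are illustrative and may differ from estimate_timevarying_hedge_ratio.

```python
import numpy as np


def build_hedge_ratio_kf_params(x, delta=1e-4, obs_var=1.0):
    """Build explicitly shaped parameters for a [slope, intercept] state model."""
    x = np.asarray(x, dtype=float).ravel()
    n = x.size
    # One 1x2 observation matrix per time step: [x_t, 1].
    observation_matrices = np.column_stack([x, np.ones(n)]).reshape(n, 1, 2)
    return {
        "n_dim_obs": 1,
        "n_dim_state": 2,
        "initial_state_mean": np.zeros(2),
        "initial_state_covariance": np.eye(2),
        "transition_matrices": np.eye(2),
        "transition_covariance": delta / (1 - delta) * np.eye(2),
        "observation_matrices": observation_matrices,
        "observation_covariance": np.array([[obs_var]]),  # 2-D, not a scalar
    }


def shape_mismatches(params):
    """List parameters whose trailing dimensions disagree with the model sizes."""
    n_state, n_obs = params["n_dim_state"], params["n_dim_obs"]
    expected = {
        "initial_state_mean": (n_state,),
        "initial_state_covariance": (n_state, n_state),
        "transition_matrices": (n_state, n_state),
        "transition_covariance": (n_state, n_state),
        "observation_matrices": (n_obs, n_state),
        "observation_covariance": (n_obs, n_obs),
    }
    problems = []
    for name, shape in expected.items():
        # Time-varying parameters carry a leading time axis, so only the
        # trailing dimensions are compared.
        actual = np.asarray(params[name]).shape[-len(shape):]
        if actual != shape:
            problems.append(f"{name}: expected trailing shape {shape}, got {actual}")
    return problems


params = build_hedge_ratio_kf_params(np.linspace(10.0, 20.0, 50))
problems = shape_mismatches(params)
```

If the check passes, the dictionary should be usable as KalmanFilter(**params) with observations y of shape (n, 1); running the same check inside the unit tests would surface a mismatch with a parameter name attached rather than pykalman's generic message.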