Overfitting / Curve-Fitting
Definition
Overfitting (curve-fitting in trading) is when a model or strategy fits historical data so closely that it captures noise rather than signal. The model performs brilliantly on the data it was trained or tested on but fails on new data. It is the defining failure mode of naive trading-system development and is addressed by walk-forward validation, out-of-sample testing, and parameter parsimony.
In-depth: Overfitting / Curve-Fitting
Overfitting is a central problem in machine learning and quantitative trading. The intuition: any dataset contains a mix of signal (real patterns) and noise (random variation). A model with enough complexity can fit both, but fitting noise is counterproductive because noise does not repeat. The fit appears to work brilliantly on the data the model was developed on, then fails dramatically on new data.
In trading, overfitting takes specific forms:
**Parameter overfitting:**
- A strategy with 10 parameters is optimised on 5 years of data.
- Optimisation finds a parameter combination that produces 60% returns at 5% drawdown on the optimisation data.
- Live deployment produces -20% in the first year.
- Diagnosis: the 'optimal' parameters were chasing noise. The search space was too large relative to the data, so optimisation found parameters that fit historical noise rather than a persistent edge.
**Strategy overfitting:**
- A developer iterates through 50 strategy variations and picks the best performer on historical data.
- Even if each variation is developed honestly (no peeking), the selection step introduces selection bias: among 50 strategies, the best is almost certainly benefiting in part from noise.
- This is the trading equivalent of p-hacking.
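
The selection effect can be illustrated with a small simulation. This is a minimal sketch, not a real backtest: every "strategy" below is pure noise with a true mean return of zero, and the function name `best_of_n_mean_return`, the 1% daily volatility, and the 1250-day history are all assumptions chosen for illustration.

```python
import random
import statistics


def best_of_n_mean_return(n_strategies=50, n_days=1250, daily_vol=0.01, seed=0):
    """Simulate strategies with zero true edge.

    Returns (best, median) of the per-strategy mean daily return.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(n_strategies):
        # Each hypothetical strategy is noise only: true mean return is zero.
        returns = [rng.gauss(0.0, daily_vol) for _ in range(n_days)]
        means.append(statistics.fmean(returns))
    return max(means), statistics.median(means)


best, median = best_of_n_mean_return()
print(f"best of 50: {best:.5f} per day, median: {median:.5f} per day")
```

Under these assumptions the standard error of each mean is about 0.01 / sqrt(1250) ≈ 0.00028 per day, so the best of 50 noise-only strategies will typically sit a couple of standard errors above zero even though none has any edge.
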
**Feature overfitting:**
- An ML model with 100 features is trained on 1000 historical examples.
- The model finds spurious correlations between random features and outcomes.
- Predictions look strong in-sample but fail out-of-sample.
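
A rough way to see how spurious correlations arise is to generate features that are random by construction and pick the one that correlates best with a random outcome. The sketch below assumes Gaussian noise for both features and outcomes. The helper names `pearson` and `spurious_feature_demo` are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def spurious_feature_demo(n_features=100, n_samples=1000, seed=1):
    """Pick the noise feature with the largest in-sample |correlation|,
    then measure the same feature on fresh (out-of-sample) noise.

    Returns (in_sample_r, out_of_sample_r) for that feature.
    """
    rng = random.Random(seed)

    def noise():
        return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]

    target_in, target_out = noise(), noise()
    best_in, best_out = 0.0, 0.0
    for _ in range(n_features):
        feat_in, feat_out = noise(), noise()
        r_in = pearson(feat_in, target_in)
        if abs(r_in) > abs(best_in):
            # Selection happens on in-sample data only.
            best_in = r_in
            best_out = pearson(feat_out, target_out)
    return best_in, best_out


in_r, out_r = spurious_feature_demo()
print(f"in-sample r = {in_r:+.3f}, out-of-sample r = {out_r:+.3f}")
```

Because the winning feature is chosen for its in-sample correlation, that figure is biased away from zero. The out-of-sample figure for the same feature carries no such selection bias.
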
Indicators of overfitting (warnings during development):
1. Performance drops substantially between the optimisation period and the out-of-sample test.
2. Parameter values are highly specific (e.g. lookback = 73 bars rather than a round number like 50 or 100), which suggests optimisation chased a noise peak.
3. Performance is highly sensitive to small parameter changes (e.g. changing lookback from 73 to 75 cuts returns in half).
4. Trade count is very low: fewer trades means less statistical confidence in performance metrics.
5. Performance varies dramatically across different time periods within the test data, which suggests the strategy works in some regimes but not others and the optimisation period happened to favour the strategy's regime.
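
The trade-count warning (item 4) can be made concrete with a t-statistic on the mean per-trade return. This is a minimal sketch that assumes trade returns are independent and identically distributed, which real trades often are not. The function name `mean_return_t_stat` and the toy trade lists are illustrative, not real results.

```python
import math
import statistics


def mean_return_t_stat(trade_returns):
    """t-statistic of the mean trade return: mean / (stdev / sqrt(n))."""
    n = len(trade_returns)
    if n < 2:
        raise ValueError("need at least two trades")
    mean = statistics.fmean(trade_returns)
    stdev = statistics.stdev(trade_returns)  # sample standard deviation
    return mean / (stdev / math.sqrt(n))


# Illustrative toy data: the same win/loss pattern, repeated 5 and 50 times.
few_trades = [0.02, -0.01] * 5    # 10 trades
many_trades = [0.02, -0.01] * 50  # 100 trades

print(f"10 trades:  t = {mean_return_t_stat(few_trades):.2f}")
print(f"100 trades: t = {mean_return_t_stat(many_trades):.2f}")
```

The average return per trade is identical in both lists. Only the larger sample gives a t-statistic comfortably above the conventional threshold of about 2.
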
Mitigations:
**Walk-forward optimisation:** see the WFO entry — the primary defence against parameter overfitting.
**Out-of-sample testing:** reserve 20-30% of historical data, never use it for parameter selection or strategy iteration. Test final strategy on this data; if it works, that's stronger evidence than in-sample performance.
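
A minimal sketch of the reservation step follows. It assumes the data is a time-ordered sequence and that the held-out portion is the most recent slice, a common choice but an assumption here. `chronological_split` is a hypothetical helper name.

```python
def chronological_split(bars, holdout_fraction=0.25):
    """Split time-ordered data into (in_sample, out_of_sample).

    The out-of-sample part is the most recent `holdout_fraction` of the data
    and should not be touched until the strategy is final.
    """
    if not 0.0 < holdout_fraction < 1.0:
        raise ValueError("holdout_fraction must be between 0 and 1")
    cut = int(len(bars) * (1.0 - holdout_fraction))
    return bars[:cut], bars[cut:]


# Stand-in for 100 time-ordered bars (integers used only as placeholders).
in_sample, out_of_sample = chronological_split(list(range(100)), 0.25)
print(len(in_sample), len(out_of_sample))  # 75 25
```

Splitting by time rather than at random keeps future information out of the development set.
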
**Parameter parsimony:** prefer strategies with fewer parameters. A strategy with 3 parameters is much harder to overfit than one with 15 parameters because the search space is exponentially smaller.
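
The size difference is easy to quantify for a grid search. The sketch below assumes each parameter is tested at 10 candidate values. That number, and the helper name `grid_size`, are illustrative.

```python
def grid_size(n_parameters, values_per_parameter=10):
    """Number of parameter combinations in a full grid search."""
    return values_per_parameter ** n_parameters


print(grid_size(3))   # 1000
print(grid_size(15))  # 1000000000000000
```

With the same amount of historical data, the 15-parameter search gives the optimiser a trillion times more combinations in which to find one that fits noise.
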
**Robustness testing:** vary each parameter ±20% from its 'optimal' value and verify the strategy still works. If performance collapses with small parameter changes, the strategy is probably overfit.
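
One way to sketch this check in code is shown below. It assumes a user-supplied `backtest` callable that maps a parameter dictionary to a single score. `parameter_sensitivity` and `toy_backtest` are hypothetical names, and the toy scoring function is a stand-in, not a real strategy. Integer parameters such as bar counts would need rounding, which is omitted here.

```python
def parameter_sensitivity(backtest, params, rel_shift=0.20):
    """Re-run `backtest` with each parameter shifted by -/+ rel_shift.

    Returns {name: (score_low, score_base, score_high)}.
    """
    base_score = backtest(params)
    report = {}
    for name, value in params.items():
        # Shift one parameter at a time, leaving the others at their base value.
        low = {**params, name: value * (1.0 - rel_shift)}
        high = {**params, name: value * (1.0 + rel_shift)}
        report[name] = (backtest(low), base_score, backtest(high))
    return report


def toy_backtest(params):
    # Hypothetical stand-in: score peaks at lookback = 50 and decays smoothly.
    return 1.0 - abs(params["lookback"] - 50.0) / 50.0


report = parameter_sensitivity(toy_backtest, {"lookback": 50.0})
for name, (low, base, high) in report.items():
    print(f"{name}: -20% -> {low:.2f}, base -> {base:.2f}, +20% -> {high:.2f}")
```

A smooth decline on either side of the base value, as in this toy case, suggests a robust parameter. A sharp collapse would be the warning sign described above.
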
**Multi-market validation:** test the strategy on currency pairs it was not optimised for. A trend-following strategy optimised on EURUSD should still produce positive expectancy on GBPUSD, USDJPY, AUDUSD with the same parameters (different magnitude is expected; sign should be the same).
**Regime decomposition:** test performance in different regimes (high-volatility vs low-volatility months; trending vs chop years). A strategy that requires specific regimes to work may be overfit to historical periods that happened to provide those regimes.
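
Once each period has a regime label, the decomposition is a grouped average. The sketch below assumes the labels have already been assigned by some separate rule. `mean_return_by_regime` is a hypothetical helper and the numbers are made up for illustration.

```python
from collections import defaultdict
import statistics


def mean_return_by_regime(labelled_returns):
    """Average return per regime from (regime_label, period_return) pairs."""
    buckets = defaultdict(list)
    for regime, period_return in labelled_returns:
        buckets[regime].append(period_return)
    return {regime: statistics.fmean(rets) for regime, rets in buckets.items()}


# Illustrative monthly returns with made-up values and labels.
sample = [
    ("high_vol", 0.04),
    ("high_vol", 0.02),
    ("low_vol", -0.01),
    ("low_vol", -0.03),
]
print(mean_return_by_regime(sample))
```

A result like this toy one, positive in one regime and negative in the other, is the pattern that calls for a closer look at how often each regime occurred in the optimisation period.
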
**Strategy logic plausibility:** beyond statistics, ask whether the strategy has a plausible economic rationale. 'Buy on Tuesdays at 10:37 GMT' has no rationale; backtest performance is almost certainly overfit. 'Buy on London open breakouts when volatility is below recent average' has a rationale; backtest performance is more credible.
For commercial EAs: vendors should disclose their development methodology, walk-forward results, and parameter sensitivity analysis. Vendors who do not respond to these questions or only show in-sample backtest results should be treated with significant skepticism.