Neither strategy beat buy and hold over the test period, and the gap is large. That said, this is still a useful result: it tells us something concrete about where mean reversion works and where it doesn't, which is the point of running the backtest in the first place.
SPY compounded to roughly 7x over the period, while both strategies ended close to where they started: the Z-Score strategy finished with a growth multiple slightly below 1.0 (a small net loss over 15 years) and Bollinger Bands finished just above 1.0.
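For reference, an ending multiple converts into an annual growth rate as `multiple ** (1 / years) - 1`. The sketch below assumes simple daily returns as input; the helper names `equity_curve` and `cagr` are illustrative and not taken from the original backtest. By that arithmetic, a 7x multiple over 15 years is roughly 13.9% a year, while a multiple near 1.0 is roughly 0% a year.

```python
def equity_curve(daily_returns, start=1.0):
    """Compound simple daily returns into an equity curve that starts at `start`."""
    curve = [start]
    for r in daily_returns:
        curve.append(curve[-1] * (1.0 + r))
    return curve


def cagr(multiple, years):
    """Annualised growth rate implied by an ending multiple over `years` years."""
    return multiple ** (1.0 / years) - 1.0


# Illustrative only: plug in the rough multiples quoted above.
print(f"SPY, ~7x over 15 years: {cagr(7.0, 15):.1%} per year")
print(f"A strategy ending near 1.0x: {cagr(1.0, 15):.1%} per year")
```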
The reason is pretty simple. SPY spent most of 2010 to 2024 in a strong bull market. A mean reversion strategy that shorts when price looks stretched is essentially betting against the trend, and in a market that kept making new highs year after year, those short bets kept losing. The long signals worked reasonably well since they were at least pointing in the right direction, but the losses from the short side outweighed them over the full period.
Both strategies were underwater for almost the entire test period. The Z-Score strategy in particular shows a drawdown that deepens more or less continuously from 2011 onwards, getting close to 50% by 2024. Bollinger Bands follows a similar path but stays slightly shallower for most of it.
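The drawdown here is presumably measured against the running peak of each equity curve, which is the usual definition; the original computation isn't shown. A minimal sketch under that assumption (positive equity values only):

```python
def drawdown(equity):
    """Fractional distance below the running peak at each point (0.0 at a new high)."""
    peak = float("-inf")
    out = []
    for value in equity:
        peak = max(peak, value)
        out.append(value / peak - 1.0)
    return out


def max_drawdown(equity):
    """Deepest drawdown over the whole curve, as a negative fraction."""
    return min(drawdown(equity))


curve = [1.0, 1.2, 0.9, 1.1, 0.6]
print(drawdown(curve))
print(max_drawdown(curve))
```

One consequence worth keeping in mind: a 50% drawdown needs a 100% gain just to get back to the previous peak, since 0.5 × 2 = 1.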
The sharp drop around 2020 is the COVID crash. SPY fell hard and then recovered very quickly. Both strategies would have gone long on the way down, which is fine, but the speed of the recovery and the rally that followed meant the short signals that fired after the rebound did a lot of damage.
What is notable is that the drawdown never really recovers at any point. A strategy with a genuine edge might draw down and then claw back. Here it just keeps getting worse, which suggests there is no persistent edge on this asset over this period.
The z-score crosses the entry thresholds very frequently, meaning the strategy was in and out of positions constantly. This matters because every position change incurs transaction costs, and with a small edge per trade those costs add up fast over 15 years of daily trading.
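To see how frequent position changes eat into returns, here is a minimal sketch that charges a fixed cost per unit of position change. The cost level (`cost_bps`) and the convention that a position chosen at day t's close earns day t+1's return are assumptions for illustration; the post doesn't say how costs were modelled.

```python
def net_returns(positions, asset_returns, cost_bps=2.0):
    """
    positions[t] is the position (-1, 0 or +1) chosen at the close of day t and
    held over day t + 1, so it earns asset_returns[t + 1].
    Each unit of position change costs cost_bps basis points (a flip from
    short to long counts as 2 units). Returns one net return per holding day.
    """
    cost = cost_bps / 10_000
    net = []
    prev = 0  # assume the strategy starts flat
    for t in range(len(positions) - 1):
        turnover = abs(positions[t] - prev)
        net.append(positions[t] * asset_returns[t + 1] - cost * turnover)
        prev = positions[t]
    return net


# Toy example: the same gross moves, with and without costs.
pos = [1, 1, -1, 0]
rets = [0.0, 0.01, 0.02, -0.01]
print(net_returns(pos, rets, cost_bps=0.0))
print(net_returns(pos, rets, cost_bps=10.0))
```

Because the charge applies on every change, the annual drag grows roughly linearly with turnover: for example, 100 units of position change a year at a hypothetical 5 bps each costs about 5% a year before any edge is earned.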
You can also see that the z-score spends a lot of time elevated above zero, which reflects the upward trend in SPY. In a trending market the rolling mean is always playing catch-up to the price, so the z-score stays positive and keeps generating short signals that turn out to be wrong.
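The "playing catch-up" effect can be made concrete. On a perfectly straight uptrend, the latest price always sits above its own trailing mean by the same amount, so the z-score settles at a constant positive value instead of oscillating around zero. The sketch below assumes a 20-day window and the sample standard deviation; the original window isn't stated.

```python
import math
import statistics


def rolling_zscore(prices, window):
    """z_t = (p_t - trailing mean) / trailing sample std over the last `window` prices."""
    z = [None] * len(prices)  # None during the warm-up period
    for t in range(window - 1, len(prices)):
        chunk = prices[t - window + 1 : t + 1]
        z[t] = (prices[t] - statistics.mean(chunk)) / statistics.stdev(chunk)
    return z


# A noiseless uptrend: the price rises by exactly 1 every day.
trend = [100.0 + i for i in range(60)]
z = rolling_zscore(trend, window=20)

# Closed form for a straight line with a window of n days:
# ((n - 1) / 2) / sqrt(n * (n + 1) / 12)
n = 20
closed_form = ((n - 1) / 2) / math.sqrt(n * (n + 1) / 12)
print(round(z[-1], 3), round(closed_form, 3))
```

For a 20-day window the constant is about 1.61, and it approaches √3 ≈ 1.73 as the window grows. Real prices add noise on top of that positive baseline, so crossings of the upper threshold (short signals) should be far more common than crossings of the lower one.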
Zooming into 2023 and 2024 makes the problem easy to see. The red short signals appear all along the upper band as SPY kept climbing through its post-2022 recovery. Most of those shorts would have lost money. The green long signals at the lower band are more accurate in this window, picking up some of the pullbacks, but there are far fewer of them and they don't make up for the short side losses.
This is probably the more informative of the two charts. Each point shows the Sharpe ratio calculated over the previous 6 months, so it gives a sense of whether the strategy was working at any given time rather than just over the full period.
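A trailing Sharpe ratio like the one plotted can be sketched as below. The 126-trading-day window (roughly six months), the √252 annualisation and the zero risk-free rate are all assumptions; the post doesn't state which conventions it used.

```python
import math
import statistics


def rolling_sharpe(daily_returns, window=126, periods_per_year=252):
    """
    Annualised Sharpe ratio over the trailing `window` returns at each day,
    assuming a zero risk-free rate. Days in the warm-up period, or whose
    window has zero volatility, are None.
    """
    out = [None] * len(daily_returns)
    for t in range(window - 1, len(daily_returns)):
        chunk = daily_returns[t - window + 1 : t + 1]
        sd = statistics.stdev(chunk)
        if sd > 0:
            out[t] = statistics.mean(chunk) / sd * math.sqrt(periods_per_year)
    return out


# Toy example with a short window so the output is easy to inspect.
print(rolling_sharpe([0.01, -0.005] * 4, window=4))
```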
Both strategies spend roughly as much time below zero as above it. There is a window around late 2015 to 2017 where both strategies show consistently positive Sharpe ratios, which lines up with a period where equity markets were choppier and more range-bound. That is exactly the kind of environment mean reversion is suited to. But from 2018 onwards the Sharpe is mostly negative, with a particularly bad stretch around the COVID period.
The two lines track each other closely throughout, which makes sense since they are two versions of the same basic idea.
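The close tracking follows from how the bands are usually built. With a rolling mean m, a rolling standard deviation s and a multiplier k, a close above the upper band `m + k*s` is exactly the condition `(p - m) / s > k`, i.e. a z-score above k (and likewise below the lower band means a z-score below -k). The sketch assumes a 20-day window and k = 2, which are common defaults but not confirmed by the post; any remaining differences between the two strategies presumably come from parameters or exit rules that aren't described here.

```python
import statistics


def bollinger_bands(prices, window=20, k=2.0):
    """(lower, middle, upper) per day: rolling mean +/- k * rolling sample std."""
    bands = [None] * len(prices)  # None during the warm-up period
    for t in range(window - 1, len(prices)):
        chunk = prices[t - window + 1 : t + 1]
        mid = statistics.mean(chunk)
        sd = statistics.stdev(chunk)
        bands[t] = (mid - k * sd, mid, mid + k * sd)
    return bands


def band_signal(price, band):
    """-1 (short) above the upper band, +1 (long) below the lower band, else 0."""
    lower, _middle, upper = band
    if price > upper:
        return -1
    if price < lower:
        return 1
    return 0


# Toy series: a quiet oscillation followed by a jump on the last day.
prices = [100.0, 101.0] * 15 + [110.0]
bands = bollinger_bands(prices)
print(band_signal(prices[-1], bands[-1]))  # the jump closes above the upper band
```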
Mean reversion needs a range-bound market to work. Prices have to actually come back to the mean, and in a strong trend they often don't, at least not quickly enough to make the trade worthwhile before costs eat the profit.
SPY from 2010 to 2024 was about as bad an environment for this approach as you could pick. The Fed held rates near zero for much of that period, which helped push equity valuations higher, and the index compounded strongly with only brief interruptions.
This doesn't mean mean reversion is a bad strategy. It tends to work better in a few specific situations:
- On individual stocks, where price dislocations from earnings, news, or sector rotations are more common and more likely to reverse
- On pairs of correlated assets, where you trade the spread between them rather than the outright direction
- On shorter timeframes, where intraday reversion happens faster than daily signals can capture
The takeaway is that a naive daily mean reversion strategy that takes short signals on a trending index ETF did not work over this 15-year bull market. Some natural next steps:
- Remove short signals and test long-only mean reversion to avoid fighting the uptrend
- Test on individual stocks where mean reversion tends to be more reliable
- Widen the entry thresholds to reduce trading frequency and lower the cost drag
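The first and third ideas can both be expressed as knobs on the signal logic. The state machine below is a hypothetical reconstruction, since the post doesn't give its exact entry and exit rules: go long when the z-score drops below `-entry`, go short when it rises above `entry` if `allow_short` is set, and go flat once it crosses back through the exit level. Counting position changes gives a quick proxy for the cost drag.

```python
def positions_from_zscore(z, entry=2.0, exit_level=0.0, allow_short=True):
    """
    Hypothetical signal rules: go long when z < -entry, go short (only if
    allow_short) when z > entry, and return to flat once z crosses back
    through -exit_level (longs) or +exit_level (shorts). None (warm-up) means flat.
    """
    pos, out = 0, []
    for value in z:
        if value is None:
            pos = 0
        elif pos == 0:
            if value < -entry:
                pos = 1
            elif allow_short and value > entry:
                pos = -1
        elif pos == 1 and value >= -exit_level:
            pos = 0
        elif pos == -1 and value <= exit_level:
            pos = 0
        out.append(pos)
    return out


def count_changes(positions):
    """Number of days on which the position differs from the day before (starting flat)."""
    return sum(1 for prev, cur in zip([0] + positions[:-1], positions) if prev != cur)


z = [None, 2.5, 1.0, -0.1, -2.5, -1.0, 0.2, 2.1]
for label, kwargs in [
    ("baseline", {}),
    ("long-only", {"allow_short": False}),
    ("wider entry", {"entry": 2.4}),
]:
    pos = positions_from_zscore(z, **kwargs)
    print(label, pos, count_changes(pos))
```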

