“…if this [probability] calculus be condemned, then the whole of sciences must also be condemned.” – Henri Poincaré
One of my personal gripes with a lot of folks in the financial industry is that they don’t understand statistics, don’t make an effort to report them, and don’t realize the value statistics have in determining the efficacy of their lauded strategies.
We’ve all seen a tweet or read an article, written by some financial-analyst bigwig, about why the bull market will continue to rally, or putting up price performance predictions for the next 3 months, 6 months, year, etc. For example, I might see a graph showing a technical indicator whose signal preceded a pullback, or someone reporting a bounce off of a Fibonacci retracement level, or (this is a made-up example) a claim that when September is up, the market is up by an average of 10% six months later. All of these examples are missing the context needed to determine whether these signals are worthwhile investment triggers: statistics.
It should be standard practice in the financial realm, just as it is in the scientific realm, to disregard any measurement reported without an associated error or confidence interval. Without that information, the reported measurement could be extremely precise or entirely unreliable, and it is safest to assume the worst case of low reliability when dealing with unknowns.
As an example of the fallacy of using only averages to guide your investment decisions, suppose you choose to invest in Starbucks whenever it announces its pumpkin spice latte, on the assumption that the increase in sales will provide a seasonal boost to the company’s valuation. Below is a compilation of the SBUX price change through year end after each annual release of the pumpkin spice latte.
On average, opening a long SBUX position in August after Starbucks releases the PSL appears to be a foolproof way to give a portfolio’s returns a nice 8% boost into year end. But looking only at that average misses one of the most obvious drawbacks to this plan: the price performance varies wildly from year to year. The standard deviation of this strategy is a ludicrous 18.27%, meaning that one can expect, with 95% confidence, that the returns from this strategy would fall anywhere between -28.44% and 44.64%. On the high end, that would be phenomenal, but the probability of getting a return at that level or better is only about 2.5% (the upper tail of a two-sided 95% interval). It should be self-evident that expecting a 45% return with a roughly 2.5% chance of success is not the right way to manage a portfolio. Turning to my preferred perspective when managing my portfolio, which is looking at how much I could lose, it would be, bluntly, a terrible idea to risk as much as a 28% loss for an 8% average gain. And in reality, the lower end of the return spectrum actually came to fruition: the minimum return ended up being -39.20%, which effectively cancels out the maximum return of 39.82%.
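If you want to run this arithmetic yourself, here’s a minimal Python sketch. The `psl_returns` list is placeholder data standing in for the table above, not the actual SBUX figures; swap in the real values and the mechanics are identical.

```python
import statistics

# Placeholder data: hypothetical Aug-to-year-end SBUX returns (%) after each
# PSL release. Substitute the actual values from the table above.
psl_returns = [12.4, -39.2, 25.0, 3.1, 39.8, -8.5, 18.2, 7.7, 22.9, 0.6]

mean = statistics.mean(psl_returns)
stdev = statistics.stdev(psl_returns)  # sample standard deviation

# A ~95% interval under a normality assumption: mean +/- 2 standard deviations,
# leaving roughly 2.5% of outcomes in each tail
low, high = mean - 2 * stdev, mean + 2 * stdev
print(f"mean return:   {mean:.2f}%")
print(f"std dev:       {stdev:.2f}%")
print(f"~95% interval: [{low:.2f}%, {high:.2f}%]")
```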
This is an excellent example of why it should be industry standard to report baseline statistical information alongside any performance metric. Yes, the S&P 500 has averaged ~10% annual returns over the past 60 years. But a portfolio fully invested in SPY would have seen a -38.49% return in 2008. Using statistics helps properly assess whether or not a strategy, indicator, etc. is a worthwhile tool for trying to play the stock market.
For another example, I recently saw a Twitter post claiming that, after a 1% gap upward by the S&P 500, the index was higher 3 months later 100% of the time. On its face, this sounds like a very compelling signal for being mid-term bullish on the index. However, an immediate follow-up question should be: is this significantly different from opening a long position on a randomly selected day? By knowing the standard deviation of the returns in each case, it is possible to determine whether a trade signal is actually doing something, or whether the signal is likely not predictive and any outsized gains in the past have just been a matter of luck.
Comparing the actual performance of this strategy to choosing a day at random shows that its success has likely been a matter of luck. Over the past 10 years, there have been 39 instances of a 1% gap up by the S&P 500 that was held through the trading day (meaning days where a strong bullish push to start the day did not end up fading). First off, it turns out that tweet was not true: the S&P 500 was higher 89.74% of the time, not 100%. Still, an 89.74% success rate is not bad, especially with a 9.48% average gain. The standard deviation of this gap-up strategy is, however, not as impressive: at 8.04%, it is almost as large as the mean itself.
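For the curious, here is a sketch of how one might reproduce this kind of scan. It assumes a hypothetical CSV of daily S&P 500 prices (`spx_daily.csv` with Date, Open, Close columns), interprets “held through the day” as closing at or above the open, and uses ~63 trading days as a stand-in for 3 months.

```python
import pandas as pd

# Hypothetical input: a CSV of daily S&P 500 data with Date, Open, Close columns
df = (pd.read_csv("spx_daily.csv", parse_dates=["Date"])
        .sort_values("Date")
        .reset_index(drop=True))

prev_close = df["Close"].shift(1)
gap_up = df["Open"] >= prev_close * 1.01   # opened at least 1% above prior close
held = df["Close"] >= df["Open"]           # the opening push did not fade intraday
signal = gap_up & held

# Forward return roughly 3 months (~63 trading days) later
fwd = df["Close"].shift(-63) / df["Close"] - 1
strat = fwd[signal].dropna() * 100

print(f"instances:   {len(strat)}")
print(f"win rate:    {(strat > 0).mean():.2%}")
print(f"mean return: {strat.mean():.2f}%")
print(f"std dev:     {strat.std():.2f}%")   # sample std dev (ddof=1)
```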
In the same time frame, the S&P 500 was up 3 months later 73.85% of the time for any randomly selected day, with an average return of 2.88% and a standard deviation of 6.78%. In scientific-method terms, this is effectively the control trial, and the strategy of going long on the S&P 500 after a 1% gap up is the interventional trial. When two observations differ by two or more standard deviations, we can declare the difference statistically significant, meaning it is highly likely to reflect a real effect rather than happenstance. In this case, the results of the control (randomly choosing a day to open a long position) and the gap-up strategy (opening a long position after a gap up of over 1%) are well within 2 standard deviations of each other.
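Plugging the reported numbers into that two-standard-deviation rule of thumb makes the conclusion concrete:

```python
# Figures reported above (as percentages)
strat_mean, strat_std = 9.48, 8.04   # long after a held 1% gap up
ctrl_mean,  ctrl_std  = 2.88, 6.78   # long on a randomly selected day

diff = strat_mean - ctrl_mean        # 6.60 percentage points

# The two-standard-deviation rule of thumb described above; taking the
# larger of the two standard deviations is the conservative choice
threshold = 2 * max(strat_std, ctrl_std)

print(f"difference in means: {diff:.2f} pp")
print(f"2-sigma threshold:   {threshold:.2f} pp")
print(f"significant:         {diff >= threshold}")   # False -- within the noise
```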
Perhaps the main takeaway from this is that, short of yielding insanely outsized gains, the vast majority of short-term trading strategies are unlikely to be effective beyond getting lucky. One way to increase confidence in a short-term strategy, however, is to get a larger sample size, which would shorten the error bars if the strategy is actually consistently effective (see the sketch below). Without fully overlapping errors (e.g. if the standard deviation of the gap-up strategy were only 1%), it becomes much less likely that the two results are statistically equivalent, which would lend credence to a claim that a certain signal/indicator is actually predictive and useful.

Soap box time: without these errors being reported, WHICH NEARLY ALL OF WALL STREET NEGLECTS TO DO, I will lend zero credence to a claim until I’ve properly performed the statistical analysis myself, and I’m a busy person, so I just don’t listen to the finance bros. Analyzing statistics really does provide a vast amount of context for investors, and it is, hyperbolically speaking, criminal that people in the financial industry are not required to have a legitimate understanding of and respect for statistics before being allowed to handle other people’s money. A closing bit of advice: think for yourself; trust but verify.
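To put numbers on that sample-size point: the error bar on an average tightens with the square root of the number of observations, so quadrupling the sample halves the uncertainty. A quick sketch, using the gap-up strategy’s figures from above:

```python
import math

stdev, n = 8.04, 39   # the gap-up strategy's std dev and sample size from above

# The uncertainty on a *mean* shrinks with sample size: SE = stdev / sqrt(n)
for size in (n, 4 * n, 16 * n):
    se = stdev / math.sqrt(size)
    print(f"n = {size:4d}: standard error of the mean = {se:.2f}%")
```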