With the release of the 2009-2010 NBA schedule yesterday, we saw the first wave of predictions for the upcoming season. In making predictions for the Blazers or any other team, the first thing most people consider, whether explicitly or implicitly, is the team's record the previous season. The previous season serves as a baseline in our minds, and then we try to figure out whether the team will improve or get worse. By and large, this is a sensible way of thinking.
However, an additional consideration that should go into our thought process--but that we often forget--is a phenomenon called regression to the mean. Regression to the mean is a technical term in probability and statistics that refers to the fact that, left to themselves, things tend to return to normal, whatever that is. The term was coined by Francis Galton, inventor of regression, when he noticed that the offspring of very tall parents tended to be shorter than their parents--at least that's the story they always tell in statistics classes. Regression to the mean in NBA wins would imply that teams that win a lot of games in one season are more likely to win fewer games the following season, and conversely that teams that win few games in one season are more likely to win more games the following season. In other words, a 60 win team in 2009 is more likely to win 55-59 games than 60-64 games in 2010, while a 25 win team in 2009 is more likely to win 26-30 games than 21-25 games in 2010. Does this happen in the NBA? Indeed it does. In fact, regression to the mean is quite pronounced and holds even after controlling for the average age of teams. Details below.
Regression to the Mean in wins from Season to Season, the NBA from 1956 to 2009
It is actually fairly easy to assess the importance of regression to the mean in wins from one season to the next in the NBA. Team season records are readily available at basketball-reference.com or databasebasketball.com, and the empirical question is straightforward: do teams tend to win fewer games the following season if they are above the mean number of wins (41)? There are a variety of ways to answer this question analytically, all of which point toward regression to the mean being quite robust. To demonstrate that this occurs, I have simply graphed the average change in the number of wins as a function of wins in the previous season for all teams from 1956 to 2009:
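For anyone who wants to reproduce this kind of graph, here is a minimal Python sketch of the calculation behind it. The function name, the (team, season, wins) layout, and the toy records are all assumptions made for illustration; the records are made up and are not real NBA results. The sketch also ignores franchise relocations and shortened seasons, which a real analysis would need to handle.

```python
from collections import defaultdict
from statistics import mean


def average_change_by_prior_wins(records):
    """Map prior-season wins to the mean change in wins the next season.

    records: iterable of (team, season, wins) tuples (hypothetical layout).
    """
    wins = {(team, season): w for team, season, w in records}
    changes = defaultdict(list)
    for (team, season), prior in wins.items():
        following = wins.get((team, season + 1))
        if following is not None:  # skip a team's last season in the data
            changes[prior].append(following - prior)
    return {prior: mean(deltas) for prior, deltas in sorted(changes.items())}


# Made-up toy records, NOT real NBA results.
toy_records = [
    ("AAA", 2001, 60), ("AAA", 2002, 54), ("AAA", 2003, 50),
    ("BBB", 2001, 25), ("BBB", 2002, 31),
    ("CCC", 2001, 60), ("CCC", 2002, 56),
]
avg_change = average_change_by_prior_wins(toy_records)
```

Plotting the keys of `avg_change` on the x-axis against its values on the y-axis gives the kind of graph described here: points below zero to the right of 41 wins and above zero to the left would indicate regression to the mean.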
The graph shows that teams that won 40 or 41 games the previous season win about 40 to 41 games on average the next season, because the average change in the number of wins is about 0. As teams move away from average in either direction, however, regression to the mean occurs. Teams that win 46 games in a season tend to win about 2 fewer games the following season (44). In contrast, teams that win 37 games in one season win, on average, about 3 more games the next season (40). In addition, the farther a team moves away from the mean, the stronger the pull of the mean. Teams that win 60 games win, on average, about 6 fewer games the following season (54), while 25 win teams tend to win about 6 more games the following season (31).
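One way to read these numbers is to ask how much of its distance from 41 wins a team keeps the following season. The short sketch below does that arithmetic for the four points quoted in the text. The helper name `retained_fraction` is my own, and the pairs are the rounded averages from the paragraph, not fitted values, so treat the result as a rough illustration.

```python
LEAGUE_MEAN = 41  # mean wins used in the post (half of an 82-game season)

# (wins in one season, average wins the following season), as quoted in the text
quoted_points = [(60, 54), (46, 44), (37, 40), (25, 31)]


def retained_fraction(prior, following, league_mean=LEAGUE_MEAN):
    """Share of the distance from the league mean that survives to next season."""
    return (following - league_mean) / (prior - league_mean)


fractions = {prior: retained_fraction(prior, following)
             for prior, following in quoted_points}

for prior, frac in fractions.items():
    print(f"{prior} wins: keeps {frac:.0%} of its distance from {LEAGUE_MEAN}")
```

A fraction of 1 would mean no regression at all, and a fraction of 0 would mean teams return all the way to 41 wins. Every quoted point falls strictly between the two.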
In common language, this graph shows that bad teams tend to get better and elite teams tend to get worse. A fairly sensible implication of this pattern is that, as many have suggested, going from 54 to 60 wins is "harder" than going from 40 wins to 46 wins. Why is this true? There are a variety of possible reasons, but the most important one is probably luck. Teams that do well tend to avoid injuries, have favorable schedules, and win close games. Teams with bad luck (injuries, bad chemistry, or bad bounces) tend to see their fortunes brighten the following season simply because average luck is more likely than bad luck. I was fairly confident that I would see evidence of regression to the mean in the data, but the strength, regularity, and linearity of the pattern surprised me. I figured that the graph would be fairly flat around the middle, with teams with 35 to 47 wins not regressing to the mean much, but truly elite and terrible teams regressing to the mean quite strongly.
Regression to the Mean in wins from Season to Season, the NBA from 1980 to 2009
To check on the robustness of this pattern, consider the graph below, which restricts the analysis to seasons from 1980 to 2009. Though the graphs look almost identical, they are run on different data, which is one indication of how regular this pattern is:
Is it all about Age?
One might wonder if this trend is simply a reflection of something that we've talked about before: age. That is, is the tendency for bad teams to get better and elite teams to get worse simply a reflection of the fact that "bad" teams are really just younger teams and elite teams are full of veterans? The short answer is no. While it is true that older teams tend to get worse and younger teams tend to get better (with the break-even point being an average age of 27 years), regression to the mean is still strong after controlling for age. In other words, teams with an average age greater than 27 tend to have a worse record the following season, but the more games they won the previous season, the larger the drop tends to be.
To illustrate this, the graph below shows the average change in number of wins as a function of previous wins, controlling for a team's average age in the previous season. (For those who care, the y-axis is actually the residuals from a regression of change in wins on the age of a team in the previous season.) That the slope of the line in this graph is less steep indicates that age was driving some--but not all--of the pattern in the previous graphs.
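For those who do care, the residual step mentioned in the parenthetical can be sketched in a few lines of Python. This is a plain one-variable least-squares fit written out by hand. The function name and the toy ages and win changes are assumptions for illustration only (the numbers are made up), and it is not necessarily the exact procedure used for the graph.

```python
from statistics import mean


def residualize(y, x):
    """Residuals from a one-variable least-squares fit of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # What is left of y after removing the part predicted by x.
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Made-up toy data, NOT real NBA figures.
prior_age = [24.0, 26.0, 27.0, 28.0, 30.0]      # average team age last season
change_in_wins = [5.0, 1.0, 0.0, -2.0, -4.0]    # change in wins this season
age_adjusted_change = residualize(change_in_wins, prior_age)
```

Averaging `age_adjusted_change` by previous-season wins, rather than the raw change in wins, gives an age-adjusted version of the earlier graphs. By construction the residuals average zero and are uncorrelated with age, so any remaining slope against previous wins cannot be attributed to a linear age effect.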
So what does this mean for the Blazers in 2009-2010?
Since the Blazers were both very young and very good in 2009, what should we expect in 2010? The youthfulness of the Blazers suggests that they should improve, but improving from 54 wins is very difficult. In particular, teams that have won 54 games have won an average of 51 games the following season. On the other hand, teams with an average age of 24 to 25 win about 5 additional games the following season. Quick and dirty regressions of wins on a set of dummy variables for wins the previous season and age the previous season yield a prediction of 54-56 wins for the Blazers in 2010, depending on some minor technical assumptions. While I do not believe that these are the only factors one should consider in projecting the Blazers' season in 2010, I also would not ignore them. If you think that the Blazers are going to win more than 54-56 games in 2010, it should be because you believe the addition of Andre Miller and the improvement of Oden and other players will make up for the normal regression to the mean that occurs in the NBA.
Lastly, for those of you who are not interested in averages, graphs, regressions, and whatnot, below is a list of the team records for all teams following a season of 54 wins (so, LAL won 54 games in 62, but 43 games in 63):
Team | Year | Wins | Age prior year
As you can see, some 54 win teams improved, but more got worse. In addition, last year's Blazers, with an average age of 24.5, were far younger than all previous 54 win teams. Thus, there is no perfect historical analogy for the current team.
Nonetheless, seeing the strength of regression to the mean in the NBA has probably made me a bit more skeptical about the Blazers' chances of winning 60 games in 2010 (my original prediction), and it has had an even bigger impact on the way I will think about the rest of the league in 2010. Anyway, this is far from the final word on making projections for the coming season. It's just an interesting pattern that is easy to document and that I thought the Blazersedge community might find interesting. Does this change your outlook on the Blazers in 2009-10? For other teams? Why or why not?
Any alternative explanations, comments, questions, or suggestions for additional analysis?