Predicting Wins in 2009-2010, Consider "Regression to the Mean"
With the release of the 2009-2010 NBA schedule yesterday, we saw the first wave of predictions for the upcoming season. In making predictions for the Blazers or any other team the first thing most people consider, whether explicitly or implicitly, is a team's record the previous season. The previous season serves as a baseline in our mind and then we try to figure out if the team will improve or get worse. By and large, this is a sensible way of thinking.
However, an additional consideration that should go into our thought process--but that we often forget--is a phenomenon called regression to the mean. Regression to the mean is a technical term in probability and statistics that refers to the fact that, left to themselves, things tend to return to normal, whatever that is. The term was coined by Francis Galton, the inventor of regression, when he noticed that the offspring of very tall parents tended to be shorter than their parents--at least that's the story they always tell in statistics classes. Regression to the mean in NBA wins would imply that teams that win a lot of games in one season are more likely to win fewer games the following season, and conversely that teams that win few games in one season are more likely to win more games the following season. In other words, a 60-win team in 2009 is more likely to win 55-59 games than 60-64 in 2010, while a 25-win team in 2000 is more likely to win 26-30 games than 21-25 games in 2001. Does this happen in the NBA? Indeed it does. In fact, regression to the mean is quite pronounced and holds even when controlling for the average age of teams. Details below.
Regression to the Mean in wins from Season to Season, the NBA from 1956 to 2009
It is actually fairly easy to assess the importance of regression to the mean in wins from one season to the next in the NBA. Team season records are readily available at basketball-reference.com or databasebasketball.com, and the empirical question is straightforward: do teams tend to win fewer games the following season if they are above the mean number of wins (41)? There are a variety of ways to answer this question analytically, all of which point toward regression to the mean being quite robust. To demonstrate that this occurs, I have simply graphed the average change in the number of wins as a function of wins in the previous season for all teams from 1956 to 2009:
The graph shows that teams that won 40 or 41 games the previous season win about 40 to 41 games on average the next season, because the average change in the number of wins is about 0. As teams move away from average in either direction, however, regression to the mean occurs. Teams that win 46 games in a season tend to win about 2 fewer games the following season (44). In contrast, teams that win 37 games in one season win, on average, about 3 more games the next season (40). In addition, the farther a team moves away from the mean, the stronger the pull of the mean. 60-win teams win, on average, about 6 fewer games the following season (54), while 25-win teams tend to win about 6 more games the following season (31).
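For anyone who wants to reproduce this kind of graph, here is a minimal sketch of the calculation. The three teams and their win totals are made up for illustration, the function name is hypothetical, and grouping previous wins into 10-win bins is my simplification here (the graph uses individual win totals).

```python
from collections import defaultdict

# Hypothetical records for illustration only: (team, season, wins).
# Real data would come from a source such as basketball-reference.com.
records = [
    ("AAA", 2007, 60), ("AAA", 2008, 55), ("AAA", 2009, 52),
    ("BBB", 2007, 25), ("BBB", 2008, 32), ("BBB", 2009, 30),
    ("CCC", 2007, 41), ("CCC", 2008, 40), ("CCC", 2009, 43),
]


def average_change_by_previous_wins(records, bin_width=10):
    """Average change in wins, grouped by a bin of the previous season's wins."""
    wins = {(team, season): w for team, season, w in records}
    changes = defaultdict(list)
    for (team, season), prev in wins.items():
        nxt = wins.get((team, season + 1))
        if nxt is None:
            continue  # no following season on record for this team
        bin_start = (prev // bin_width) * bin_width
        changes[bin_start].append(nxt - prev)
    return {b: sum(c) / len(c) for b, c in sorted(changes.items())}


result = average_change_by_previous_wins(records)
for bin_start, avg in result.items():
    print(f"{bin_start}-{bin_start + 9} wins: average change {avg:+.1f}")
```

Plotting the bin against the average change would give the kind of downward-sloping line described above, assuming real data behaves like the graph.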
In common language, this graph shows that bad teams tend to get better and elite teams tend to get worse. A fairly sensible implication of this pattern is that, as many have suggested, going from 54 to 60 wins is "harder" than going from 40 wins to 46 wins. Why is this true? There are a variety of possible reasons, but the most important one is probably luck. Teams that do well tend to have avoided injuries, had favorable schedules, and won close games. Teams with bad luck (injuries, bad chemistry, or bad bounces) tend to see their fortunes brighten the following season simply because average luck is more likely than bad luck. I was fairly confident that I would see evidence of regression to the mean in the data, but the strength, regularity, and linearity of the pattern surprised me. I figured that the graph would be fairly flat around the middle, with teams with 35 to 47 wins not regressing to the mean much, but truly elite and terrible teams regressing to the mean quite strongly.
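To see how luck alone can produce this pattern, here is a rough simulation sketch. Everything in it is an assumption for illustration: each simulated team has a fixed "true" win probability drawn from a bell curve centred on .500, every game is an independent coin flip at that probability, and teams do not actually play each other (so the league is not zero-sum as the real NBA is). Talent is identical in both seasons, so any change in wins comes from luck.

```python
import random


def simulate_season(p_win, games=82, rng=random):
    """Wins in a season where each game is won independently with probability p_win."""
    return sum(rng.random() < p_win for _ in range(games))


def mean_change(pairs):
    """Average of (second season wins - first season wins)."""
    return sum(second - first for first, second in pairs) / len(pairs)


rng = random.Random(2009)  # fixed seed so the run is repeatable
pairs = []
for _ in range(3000):
    # Assumed talent distribution, clipped to keep probabilities sensible.
    p = min(0.85, max(0.15, rng.gauss(0.5, 0.12)))
    # Same talent in both seasons; only the luck differs.
    pairs.append((simulate_season(p, rng=rng), simulate_season(p, rng=rng)))

good = [pr for pr in pairs if pr[0] >= 55]
bad = [pr for pr in pairs if pr[0] <= 27]
good_change = mean_change(good)
bad_change = mean_change(bad)
print(f"{len(good)} teams with 55+ wins: average change {good_change:+.1f}")
print(f"{len(bad)} teams with 27 or fewer wins: average change {bad_change:+.1f}")
```

Under these assumptions the 55+ win group declines on average and the 27-or-fewer group improves, even though no team's underlying quality changed. That is only a toy model, but it is consistent with luck being a large part of the story.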
Regression to the Mean in wins from Season to Season, the NBA from 1980 to 2009
To check the robustness of this pattern, consider the graph below, which restricts the analysis to seasons from 1980 to 2009. Though the graphs look almost identical, they are based on different samples, which is one indication of how regular this pattern is:
Is it all about Age?
One might wonder if this trend is simply a reflection of something that we've talked about before: age. That is, is the tendency for bad teams to get better and elite teams to get worse simply a reflection of the fact that "bad" teams are really just younger teams and elite teams are full of veterans? The short answer is no. While it is true that older teams tend to get worse and younger teams tend to get better (with the break-even point being an average age of 27 years), regression to the mean is still strong when controlling for age. In other words, teams with an average age greater than 27 tend to have a worse record the following season, but the more games they won the previous season, the larger the drop.
To illustrate this, the graph below shows the average change in number of wins as a function of previous wins, controlling for a team's average age in the previous season. (For those that care, the y-axis is actually the residuals from a regression of change in wins on the age of a team in the previous season). That the slope of the line in this graph is less steep indicates that age was driving some--but not all--of the pattern in the previous graphs.
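For readers curious what "residuals from a regression of change in wins on age" looks like in practice, here is a minimal sketch. The six team-seasons are made up for illustration, and `ols_fit` and `residuals` are hypothetical helper names, not the code behind the graph.

```python
def ols_fit(x, y):
    """Intercept and slope of a one-variable least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residuals(x, y):
    """What is left of y after removing the part a straight line in x explains."""
    intercept, slope = ols_fit(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Hypothetical team-seasons: average age, previous wins, change in wins.
ages = [24.5, 26.0, 27.0, 28.5, 30.0, 31.0]
prev_wins = [54, 30, 41, 60, 50, 45]
changes = [4, 6, 0, -7, -5, -4]

# Step 1: strip out the part of the change in wins that age accounts for.
age_adjusted = residuals(ages, changes)
# Step 2: see how the age-adjusted change still varies with previous wins.
intercept, slope = ols_fit(prev_wins, age_adjusted)
print(f"slope of age-adjusted change on previous wins: {slope:.3f}")
```

A negative slope in step 2 would mean regression to the mean survives the age adjustment, which is what the graph suggests for the real data.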
So what does this mean for the Blazers in 2009-2010?
Since the Blazers were both very young and very good in 2009, what should we expect in 2010? The youthfulness of the Blazers suggests that they should improve, but improving from 54 wins is very difficult. In particular, teams that have won 54 games have won an average of 51 games the following season. On the other hand, teams with an average age of 24 to 25 win about 5 additional games the following season. Quick and dirty regressions of wins on a set of dummy variables for wins the previous season and age the previous season yield a prediction of 54-56 wins for the Blazers in 2010, depending on some minor technical assumptions. While I do not believe that these are the only factors one should consider in projecting the Blazers' season in 2010, I also would not ignore them. If you think that the Blazers are going to win more than 54-56 games in 2010, it should be because you believe the addition of Andre Miller and the improvement of Oden and other players will make up for the normal regression to the mean that occurs in the NBA.
Lastly, for those of you who are not interested in averages, graphs, regressions, and whatnot, below is a list of the records of all teams in the season following a 54-win season (so, LAL won 54 games in 62, but 43 games in 63):
| Team | Year | Wins | Age prior year |
| PH1 | 1960 | 48 | 26.43747 |
| LAL | 1963 | 43 | 26.49471 |
| BOS | 1968 | 48 | 30.05789 |
| NYK | 1969 | 60 | 26.63787 |
| CHI | 1974 | 47 | 29.21364 |
| BOS | 1976 | 44 | 29.32505 |
| WA1 | 1979 | 39 | 28.55113 |
| LAL | 1981 | 57 | 27.44369 |
| LAL | 1984 | 62 | 28.10951 |
| PHI | 1986 | 45 | 28.51308 |
| DEN | 1988 | 44 | 28.99218 |
| DET | 1988 | 63 | 27.97651 |
| PHO | 1990 | 55 | 27.21177 |
| UTA | 1991 | 55 | 29.1215 |
| CLE | 1993 | 47 | 29.28346 |
| CHA | 1997 | 51 | 29.47461 |
| DET | 1997 | 37 | 28.57562 |
| IND | 1999 | 56 | 31.22052 |
| ORL | 1999 | 41 | 29.17436 |
| MIA | 1999 | 52 | 30.66639 |
| DET | 2004 | 54 | 27.88333 |
| DET | 2005 | 64 | 28.37027 |
| PHO | 2006 | 61 | 28.01781 |
As you can see, some 54-win teams improved, but more got worse. In addition, last year's Blazers, with an average age of 24.5, were far younger than all previous 54-win teams. Thus, there is no perfect historical analogy for the current team.
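For those who would rather check the arithmetic than eyeball the table, here is a short sketch that tallies the win totals listed above. The variable names are just for illustration.

```python
# Wins in the season after a 54-win season, copied from the table above.
next_season_wins = [
    48, 43, 48, 60, 47, 44, 39, 57, 62, 45, 44, 63,
    55, 55, 47, 51, 37, 56, 41, 52, 54, 64, 61,
]
baseline = 54

mean_wins = sum(next_season_wins) / len(next_season_wins)
improved = sum(w > baseline for w in next_season_wins)
declined = sum(w < baseline for w in next_season_wins)
unchanged = sum(w == baseline for w in next_season_wins)

print(f"teams: {len(next_season_wins)}, average wins: {mean_wins:.1f}")
print(f"improved: {improved}, declined: {declined}, unchanged: {unchanged}")
```

On these 23 team-seasons the average comes out to 51.0 wins, with 9 teams improving, 13 declining, and 1 repeating at 54.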
Nonetheless, seeing the strength of regression to the mean in the NBA has probably made me a bit more skeptical about the Blazers' chances of winning 60 games in 2010 (my original prediction), and it has had an even bigger impact on the way I will think about the rest of the league in 2010. Anyway, this is far from the final word on making projections for the coming season. It's just an interesting, easy-to-document pattern that I thought the Blazersedge community might enjoy. Does this change your outlook on the Blazers in 2009-10? For other teams? Why or why not?
Any alternative explanations, comments, questions, or suggestions for additional analysis?
Comments
I applaud your willingness to
wade into meaningful stats. You’re well ahead of me there.
Another way of stating regression to the mean is that it is hard to become really good, and even harder to maintain that level.
"I'm a man, but I can change.....if I have to......I guess." - Red Green
So getting above par (improving) means return to par is more likely than not. What comes up must come down, but how far up is different for each team. If three teams get 54 wins, one is already on their way down and is passing 54 going the wrong direction. One is getting better and passing 54 on the way up. The third peaked the previous year at 54 and so will be part of the batch going back down the next year. 2 go down for 1 going up.
How can you analyze the zenith of each teams rise? Wins spent above .500? Peak wins and time to return to .500? Age relative to peak wins? Championships per team that include somebody on the roster named Brandon Roy?
I don’t know any statistics jokes, but I looked one up and included it here:
Statistics play an important role in genetics. For instance, statistics prove that numbers of offspring is an inherited trait. If your parent didn’t have any kids, odds are you won’t either.
The cowards never started
The weak died along the way
Only the strong survived
They were the Trailblazers
by lukeyhere on Aug 5, 2009 11:45 AM PDT
A very interesting suggestion...
What you are suggesting is that there is some momentum to the trajectory of a franchise, with teams oscillating back and forth. You are right that this would produce a regression towards the mean type pattern in the data.
It’s not entirely clear how best to assess if this happens, but I did a quick check and I was surprised to see the opposite: the more "good" teams improved from year 1 to year 2, the more they regressed in year 3. Conversely, the more bad teams deteriorated from year 1 to year 2, the more they improved in year 3. For example, the Phoenix Suns won 62 games in 2004, 54 games in 2005, and then 61 games in 2006. In contrast, Orlando won 41 games in 1997, improved to 54 games in 1998, and then fell back to 41 wins in 1999.
I don’t want to look at past teams stats i want a real prediction not some where between 54 and 56 wins. and no lakers stats this is a blazer blog. if you went to church would you talk about how the devil improved? no, i personally think 58 wins and a trip to the western conference finals, anything else will just be disappointing. (see short and concise)
This comment
was utterly silly.
The Michael Ruffin of BlazersEdge, cuz Amlmart said so.
by BlazersOrBust on Aug 5, 2009 2:41 PM PDT
Regression to the mean is not applicable here, though your graphs
might lead you to believe this. Specifically, when looking at all teams as a whole as compared to previous seasons, number of wins will be equally far from the mean from year to year. So if Cleveland’s wins go down, Portland’s might go up to maintain that “equally far from the mean” phenomenon.
Also, you cannot predict an individual team’s wins based on regression to the mean – if a team knows it’s not going to make the playoffs, they may not play as hard, skewing their win total for both them and their opponents.
Also, you should have a control sample when trying to establish regression to the mean, which is pretty much impossible in the NBA…
Patty Mills - PG of the future. Book it.
by Blazerholic on Aug 5, 2009 1:14 PM PDT
well, i think you are using the term in a more technical way than I intended
what you described in the first paragraph is simply a reason that teams tend to revert toward the average record.
While it’s true that using a control sample is valuable for correcting for statistical regression to the mean, it’s not necessary here.
Seems to me like several of the teams that did improve after 54 win seasons
went on to win championships, either that year or the following.
Normally you might be right
But barring any major injuries we should improve. For one the number of ridiculous comebacks should drop.
"Good evening Blazer fans, wherever you may be!"-Bill Schonely
Player injury, Player movement, Draft talent, and Salary cap
would seem to be big reasons for teams getting better or worse.
— How badly did Gilbert Arenas getting injured hurt the Wizards?
— Boston traded to put together the Big Three and went from worst to first
— Drafting Tim Duncan made San Antonio go from worst to first
— Orlando would have liked to hold on to Hedo, but he went where the money was better
Because of these various factors, a team from one year can have very little bearing on the team from the previous year. For example, San Antonio won a lot last year, but they have upgraded with trades, and should win more. The team name “San Antonio Spurs” has stayed the same but the team make-up has not. So does regression to the mean even apply when you’re not talking about the same teams?
I am simply assuming the units are the same and letting the data tell me if there is regression toward the mean in those units. If the units were completely unrelated, you’d be less likely to see regression towards the mean.
Isn't it the opposite?
I think that one of the biggest reasons why regression to mean happens is because of player turnover. If teams completely changed every year, there would be 100% regression to mean (every team could expect to win about 41 wins next year regardless of what happened in previous years). If teams are stable and are able to retain their key players, the regression effect should be reduced.
by trk on Aug 6, 2009 1:11 PM PDT
Is it possible
that you are mixing up parity and regression to the mean? I think what polisam said is the correct way to think about it with every team being a unit, while it seems that you are looking at the league as a unit.
Life is exhausting when you are this stupid.
I will talk about DeJuan Blair no more forever
Still question whether league wide regression around mean is applicable
If teams completely changed every year, there would be 100% regression to mean
A test of this statement’s validity is to examine the inverse. That is, if teams stayed exactly the same, would they produce the exact same results from year to year? I’m not trying to suggest that we are trying to oversimplify the complex factors on performance. However, I question whether regression to the mean can be validly applied.
Regression to a mean suggests that point results from an experiment are not a reliable final indicator, since an occasional extreme result might be misconstrued as the typical answer. And rather, that results will “regress to a mean” after a series of experiments. In general you are keeping other factors constant. When applying regression to the mean across the entire league, there just seem to be too many variables for there to be any regression to a mean.
The Clippers are “always” going to be the Clippers. So just because they had a 20 (?) loss season does not mean that they will get up to a 62 win season someday to balance it out. Similarly, salary cap or not, some teams always seem to contend for championships. [Celtics, Spurs and L@kers]. And because they had a 60 (?) win season does not mean that they will have a 22 loss season to balance it out. Maybe each team has its own mean that it regresses about.
Or for that matter for performance with constantly fluid set of conditions, bell curves are considerably more likely than a flat mean, where there a few really good, a few really bad, and the bulk falling in the middle. Could one apply some form of regression to a bell curve?
Lastly I would put forward that there is a “hysteresis effect”. Teams that are good remain good for several years, and they strive to “keep their window open”. Teams that are bad remain bad for several years, as they have “patience while rebuilding”. If a young, up-and-coming team won 49 games, one could certainly expect it to win more (and not regress to 41). The 90-91 Bulls won 61 games, and then won 67 games the following season. This hysteresis skews regression to the mean.
Maybe a given team that does not have much change might regress around some mean relevant for itself. The 72 win Bulls team won 69 the following season, and 62 wins the year later. They went lower each year, but the mean was still way over 54, and 54 wins probably would have been considered under-performance. So I can’t see how league wide regression around 41 could reasonably apply to assess future performance of a given team.
I believe we will win more games
As long as major injuries and/or major trades etc. don’t happen.
S
The Princess of Blazersedge
Sports do not build character. They reveal it. - Casey Dillon Stengel
Factors I see
1. Age (as you mentioned). As you say, controlling for age it is still there, but it appears slightly less. Therefore, your conclusion that it is not “all about age” is manifestly sound, but it still appears to be at least one factor.
2. The draft. Bad teams get better players coming in, good teams usually don’t. Not every top rookie has an impact his first year, but most impact rookies join bad teams, not good teams.
3. Salary cap. Usually, good teams are good because they have good players, and good players are usually paid well. Thus, good teams usually have less room to maneuver/sign free agents/make advantageous trades. They may even have to let good players go.
4. Injuries. A team that wins a lot of games has often been relatively fortunate on the injury front. There’s a good chance they won’t be as fortunate the next year.
5. GM paralysis. A team that won 55 games is much less likely to have major personnel changes than a team that won 25. You don’t want to mess up a good thing. This can be wise — but sometimes you will pass up a deal you should have made.
Those are all the factors I see. I don’t think there is any magical regression to the mean, I think it is driven by factors such as these. Most of them don’t apply to this team this year — we aren’t an older team, we did have the cap space to upgrade, we weren’t exceptionally lucky on injuries last year, and we have the depth to absorb some this year.
We weren’t able to add significant talent in the draft, though it’s possible we picked up a rotation role player, but since we had four rookies last year, I don’t think we are going to be contributing to draft-driven parity this year.
We’ll be fine, and win 61-63.
Nice work on this post. I think if you controlled for games lost to injury for stars and significant rotation players (20+ mpg), you would see that line flatten out quite a bit. But I suspect that would be quite a lot of work for limited benefit.
When I rule the world, everyone will know how to use Excel.
by jscot on Aug 5, 2009 1:47 PM PDT
Factors cause
regression to the mean shows an overall picture, but the things you mentioned should account for it. That’s what should guide us into predictions and not a line drawn down and showing a trend.
I'm a really really ridiculously good looking orange mocha frappaccino drinking manhammer sandwich
I like this post a lot.
You can measure skill and talent with your eyes, but productivity is shown through statistics.
I recced both you and PoliSam
his original post was an excellent springboard for exactly this kind of incisive counter-commentary. You made all the points I wish I had thought of during the ten seconds that I was considering extenuating factors.
The Michael Ruffin of BlazersEdge, cuz Amlmart said so.
by BlazersOrBust on Aug 5, 2009 2:44 PM PDT
I rec'd him, too
He identified reality. The cause of reality is open to discussion (and testing, if someone wants to do the work of testing it).
When I rule the world, everyone will know how to use Excel.
Injuries.
I could be way off, but I kinda figured we were lucky last year with regard to injuries. Maybe my view is skewed because I’m holding my breath anytime Roy has the ball and anything short of him blowing out a knee is a good thing.
Did we actually have more injuries last year than normal? I smell a graph. Of course it would need to be weighted by the value of that player, as a game lost by Roy weighs heavier than a game lost by Shav. I smell a new stat. Player Injury Relative to Value or PIRV. I smell something else…
The cowards never started
The weak died along the way
Only the strong survived
They were the Trailblazers
Martell....
Greg for a short period
Roy for an even shorter period
LaMarcus for a short stint
Blake also out for some games.
Those are big names for this team, and for them to be sitting out games can only hurt our overall production.
For example: How we almost beat Cleveland in Cleveland without LaMarcus… that could have been a very different game if he were playing.
Big D from Blog-A-Bull - "Pritchard is such a genius that teams just give him players for free."
Greg Oden - The only other rookie with more than 500 points, 400 rebounds, and 65 blocks in under 1400 minutes played. Since 1946
by FiveOhThree-RipCity!! on Aug 5, 2009 3:39 PM PDT
some good points, but a couple of counterpoints
you are right that many things can cause the pattern in the data that I produced above. The draft is a really good one that I missed. Player movement as a result of the salary cap is another, though I would note that the pattern existed before the cap was instituted.
More importantly, I would make a pretty sharp distinction between things that are observable ex ante, like previous average age and the draft and things that are unpredictable or immeasurable ex ante, like injuries. While you can certainly measure injuries in the previous season, you cannot do so for the following season. The occurrence of injuries in 2010 is pretty much magical in my view.
I also think you are underappreciating the effect of luck on win total a bit. How many games are won because of a blown call by a ref? Because of a hot shooting night by a star? Intuition says that those things tend to balance out over the course of the season, but part of the intuition for regression to the mean comes from the fact that luck rarely balances out perfectly. Indeed, the probability that a fair coin will land heads exactly 5,000,000 times in 10,000,000 tosses is pretty close to zero.
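The coin claim is easy to check. Here is a small sketch that computes the probability of an exact 50/50 split, working in logarithms because the binomial coefficient involved is astronomically large. The function name is hypothetical, and the comparison value is the standard large-n approximation sqrt(2 / (pi * n)).

```python
import math


def prob_exactly_half_heads(n_tosses):
    """P(exactly n/2 heads in n fair tosses), computed in log space."""
    if n_tosses % 2:
        return 0.0  # an odd number of tosses cannot split evenly
    k = n_tosses // 2
    # log of C(n, k) / 2**n, using lgamma(m + 1) = log(m!)
    log_p = (math.lgamma(n_tosses + 1) - 2 * math.lgamma(k + 1)
             - n_tosses * math.log(2))
    return math.exp(log_p)


p = prob_exactly_half_heads(10_000_000)
approx = math.sqrt(2 / (math.pi * 10_000_000))
print(f"exact split in 10,000,000 tosses: {p:.6f} (approximation: {approx:.6f})")
```

Both numbers come out to roughly 0.00025, or about 1 chance in 4,000, so a perfectly balanced run of luck is the exception rather than the rule.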
Here’s a random factoid: there is a sort of regression to the mean effect within seasons. That is, NBA teams that win by more than expected in one game, tend to revert back to their underlying strength. There is almost no evidence of “streakiness” within seasons for teams and this is true even though there are injuries, which should drive the data towards “streakiness.”
One way to control for luck, at least somewhat
would be to look at point differential rather than win/loss.
When I rule the world, everyone will know how to use Excel.
I would love to see the exact same analysis, but instead of focusing on W-L record focus on point differential
And see how much age plays in to that as well.
My one concern, how much statistical correlation is there between point differential and W-L? I suspect over the sample size we are talking about .99 or better, meaning that it will essentially show us the exact same thing.
But that is just a gut feeling and would actually need to be tested to be proved.
There isn't going to be a .99 correlation between point differential and W-L, partially due to rounding
(i.e. you can’t have a fractional win, even though your pythagorean w-l projection suggests you should).
But there is a strong correlation. I took the W-L records for every team in the databasebasketball.com data set (data ends after the 2007-2008 season) and looked at the correlation between team winning % and Pythagorean winning %. There is some question about the proper exponent to use in the equation, so I looked at both 14 and 16.5. Both had an R^2 value of .926. Or, in common talk, point differential explained 92.6% of team wins and losses, which seems to mean that about 6 games per season are up to the vagaries of chance.
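For anyone unfamiliar with the Pythagorean formula, here is a minimal sketch of it with the two exponents mentioned above. The 100 points scored and 96 allowed per game are made-up numbers for illustration, and the function name is hypothetical. The last lines simply redo the back-of-envelope "about 6 games" arithmetic.

```python
def pythagorean_win_pct(points_for, points_against, exponent=14.0):
    """Expected winning percentage: PF**x / (PF**x + PA**x).

    Written in terms of the ratio PA / PF so large point totals
    do not produce huge intermediate numbers.
    """
    ratio = points_against / points_for
    return 1.0 / (1.0 + ratio ** exponent)


# Hypothetical team: scores 100 and allows 96 points per game.
for exponent in (14.0, 16.5):
    pct = pythagorean_win_pct(100, 96, exponent)
    print(f"exponent {exponent}: {pct:.3f} win %, about {82 * pct:.1f} wins")

# The share of variance not explained by point differential,
# scaled to an 82-game season.
unexplained_games = (1 - 0.926) * 82
print(f"unexplained share of an 82-game season: {unexplained_games:.1f} games")
```

The larger exponent pushes a winning team's expected percentage slightly higher, which is why the choice between 14 and 16.5 matters a little at the extremes but barely changes the overall fit.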
I was thinking about this more
and I think that the draft is probably the biggest factor.
You will always have aberrations, the #1 pick who adds nothing to his team, the #25 pick who becomes an instant star, etc.
But over many seasons, if you look at the changes in wins based on draft position, I think you’ll see that is the biggest factor. For instance, it has been well documented that the team with the #1 pick, on average, gains 11 wins the following season.
Are all 11 of those wins attributable to that draft pick? Hard to say. But if they are, you have just wiped out most of what you are seeing here at the lower end of the scale.
It would be very difficult to measure. Sure, we added Greg, but we also made a significant roster move (dumping Zach) and Greg didn’t play. So our improvement two years ago was completely independent of Greg. Last year, we added Greg and he played, so we should see positive impact — but we also added Nic and Rudy, a 24 and 25 pick, and they made major contributions. So I don’t know how you really measure draft impact.
When I rule the world, everyone will know how to use Excel.
We also gained
12 wins when Greg had his rookie season (even though it wasn’t the year immediately after his draft). Weird
"I'm tired" -Me
Baker's dozen
we knew what you meant.
When I rule the world, everyone will know how to use Excel.
The draft is definitely huge on the bottom side (for the losers)
but not nearly as important at the top (the winners). A very basic indicator for this claim is that there is more reversion towards the mean for teams with fewer wins than teams with a lot of wins. The draft is going to affect all teams, but would not explain why 65 win teams would regress more than 55 win teams.
I suppose a 55 win team is marginally more likely
to draft a player who will help than a 65 win team would.
So we might expect to see some impact from the draft even at that level, but you are right that it wouldn’t (on average) be very much.
It might be more complicated than that, though.
The 20 win team loses all of their games to the 65 win team, but might win 1 of 4 against the 55 win team. If the draft turns them into a 30 win team, the odds are still pretty good that they only win 1 of four against the 55 win team, but their chances of winning one of their home games against the 65 win team are arguably much better than the year before.
In other words, strengthening the bottom teams impacts the good teams, but may have a greater impact on the overall record of the very best teams. To win a very high number of games, you probably need a lot of games against bottom feeders who aren’t a real threat to beat you.
So I’m guessing that the draft is non-negligible even on the top side — but probably not the main factor.
When I rule the world, everyone will know how to use Excel.
Well, as far as "luck" goes,
there is good and bad luck WITHIN an individual season, so that would reduce the likelihood that it would differentiate seasons. Regression to the mean, a model ultimately based on a large number of random factors, is a pretty rough predictor for more specific events. The more depth of understanding you have of the underlying factors, the better predictions you can make, which may or may not show a “regression to the mean”, or tendency toward mediocrity.
Basically, I can agree that “regression to the mean” is a legitimate factor in how the Blazers (or any team) do next year. I just think other factors like how Oden plays and the addition of Miller are way more powerful variables that will mask the “regression to the mean” effect, plus numerous others that lots of fans could list. Just assuming that the negatives will counterbalance the positives toward the mean is ultimately a rather superficial and simplistic presumption, when we have quite a bit of information available on each player, understandings of their compatibility, etc…
And then we can go further into more philosophical issues, such as the ultimate adequacy of random events as the determinants of all that occurs, including human experience, life…. Interesting material, possible junk drawer OT sometime.
I wonder how much luck could be quantified for a team in a given season
Record in close games is often cited as a lucky statistic, but what else could we use to determine if the Blazers were particularly lucky last season?
Team wide games missed to injury
Star players’ games missed to injury
Number of back to backs (and other schedule quirks)
Number of games against an opponent in the 2nd game of a back to back
Free throw percentage against
all compared to the mean, plus the aforementioned record in close games statistic.
There are probably some other ways to measure luck in basketball, but that’s a good place to start. It would be interesting to see a “luck factor”. I wonder if we were more lucky or unlucky last season.
"It’s a good ol’ fashioned Rip City beat down!"
those are some really good suggestions
I think number of close games won, injuries, and opponent free throw percentage are all relatively good indicators of luck. If you really wanted to know how much bad luck affected a team as a result of injury, you would probably want to weight the number of games missed by something like win shares, adjusted plus-minus, or whatever number you think would give you a good estimate of the effect of the injury.
I wish I remembered where I have seen this number
but the Blazers are something like .800 in more than 40 games decided by 3 points or less since Brandon Roy has joined the team. It’s definitely statistically significant. Would you chalk that up to Type I error?
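One way to put a number on that is an exact binomial tail probability. The sketch below assumes 32 wins in 40 close games (my stand-in for "something like .800 in more than 40 games") and treats every close game as a fair coin flip. Both are assumptions, and the function name is hypothetical.

```python
from math import comb


def binomial_upper_tail(wins, games, p=0.5):
    """P(at least `wins` successes in `games` independent trials)."""
    return sum(
        comb(games, k) * p ** k * (1 - p) ** (games - k)
        for k in range(wins, games + 1)
    )


# Assumed record for illustration: 32 wins in 40 close games.
p_value = binomial_upper_tail(32, 40)
print(f"chance of 32+ wins in 40 coin-flip games: {p_value:.6f}")
```

The tail probability is well under 0.001, so a single pre-chosen team doing this by pure chance would be very unlikely. The caveat is that picking out the one team with the best close-game record after the fact, from a league of 30 over many seasons, makes a fluke considerably more plausible than that number suggests.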
The Michael Ruffin of BlazersEdge, cuz Amlmart said so.
by BlazersOrBust on Aug 6, 2009 7:53 AM PDT
One measure of luck I have seen
is comparing the Pythagorean wins vs actual wins
just throwing that out there.
Life is exhausting when you are this stupid.
I will talk about DeJuan Blair no more forever
and in that case
we finished 2 games lower than would be expected by Pythagorean wins, so we were generally unlucky.
"It’s a good ol’ fashioned Rip City beat down!"
It's a good cautionary stat
but it’s nothing to base predictions for an individual team on. I get why you put it up there, but I’m not sure whether it’s appropriate for this team.
I'm a really really ridiculously good looking orange mocha frappaccino drinking manhammer sandwich
Nice research, and very well done
I still think the Blazers are gonna win more than 60 games this year. Jscot pretty much took my reasons: all the things that usually cause such a regression to the mean don’t apply to us very strongly.
You can measure skill and talent with your eyes, but productivity is shown through statistics.
Nice, someone is breaking out the Minitab ;-)
"I'm addicted to polo y'all...respect my fresh" - Travis25Outlaw
Comment Summary thus far:
“Wow! Those are some good stats. I think they make a lot of sense. I don’t like them though, because they indicate my team won’t win as many games next year. Since I want them to win more games, I will say that my team will defy those odds and win more games next season.”
:)
Yes! Yes! In the face!
by LeafHawk on Aug 5, 2009 3:29 PM PDT
nice well laid out post
However, the graph I would have liked to see is the raw data: each team’s wins vs. the change in wins the following year. That is, plot the data, not just the mean of the data. Those two graphs tell very different stories. One is your story, which is true: on average all teams will tend to be average (because it is a closed system; for every win there is one loss). However, by plotting the raw data for each individual team you might see that 2/3 of losing teams improve while 1/3 get worse, that of the teams at 41-41 half get better and half get worse, and that 2/3 of winning teams get worse while 1/3 get better. So my point is that there is a lot of wiggle room around the trend, and that when you only look at the mean of means you will always find the mean.
by NWfan on Aug 5, 2009 3:50 PM PDT
Yeah...
There were three 54 win teams last year. The Spurs hit 54 on a decline from 56. The Nuggets peaked at 54 after rising from 50 and will decline again. That’s 2 teams out of 3 on the decline. The third must be rising past 54 or the universe will explode.
The cowards never started
The weak died along the way
Only the strong survived
They were the Trailblazers
Very cool
I haven’t read other comments, but in favor of improvement I list:
1.) Oden improvement=overall team improvement.
2.) Miller
3.) Hard workers and unwillingness to relax.
4.) End of season domination
5.) A summer of sitting out play-offs they shouldn’t have been sitting
6.) Roy. I think he’s probably one of the most underrated superstars to ever be in the league, and yes… I’m giving him superstar status. People consistently place him top 6 and don’t seem to really think about what that means. He’s been rookie of the year and voted 2x all star by coaches who passed up other amazing talent. He dominates and WILLS games to wins. He’s had a summer to simmer. He wants to win. He could show off like other stars, but chooses to play smartly and win. I’ve also undervalued him for three years in a row despite believing he’d be good.
Cool stats though. Here’s hoping i’m right :)
"Fernandez, to my eyes, is the Blazer who walks that walk most comfortably. A lot of Portland's fans (egged on, dare I say, by their local broadcasters) lament things like how Ron Artest or Yao Ming get to hit Brandon Roy's arms.
But I suspect Fernandez sees all that and thinks: We get to hit arms! Cool!"
http://myespn.go.com/blogs/truehoop/0-39-135/On-Playoff-Experience.html
Tenuous correlation to individual teams and seasons but interesting from a macro view
Although the point is hard to argue with as a reminder that momentum is often overstated season to season, and that historically the opposite effect is realized.
I would be interested to see similar analysis for other pro (and major college) sports. This tendency in the NBA is consistent with the relative parity found here (again, relative). I wonder what more dynasty-oriented sports such as college BB & FB, MLB, and Premier League would show.
You could argue that the ‘gravity’ exerted pulling teams towards .500 is strongest in the NBA, with measures such as the cap, luxury tax, and revenue sharing.
MLB vs NBA
Given that one of the graphs that showed the very even regression to the mean was from 1980 and beyond, I think it’s hard to call MLB more dynastic than the NBA. Of the 30 finals contested since 1980, 28 have been won by 6 teams.
the nba has the least parity in recent history of any major american pro sport. i can’t speak to premier league but i would agree that college football and basketball are very unlikely to show regression to the mean, at least among the elite programs.
Regression towards the mean is interesting
Blazerholic hit a lot of good points. I’ll just say a couple of other things as to why this isn’t as appropriate as the historical examples of regression to the mean (or even as appropriate as the “Sophomore Slump” example in Wikipedia about single player performances after a rookie season).
True regression towards the mean occurs due to measurement error being symmetrically distributed around the center of the population. It requires a high amount of measurement error… what I mean by measurement error is the difference between an observable score or measurement and the concept of a true score. Now, how does this make sense for something like height, which we can clearly measure physically with reasonable precision? In the Galton story, our construct isn’t actual height.. it is the influence of parents’ height on their children’s height. Because a multitude of other factors add noise when trying to predict height from one generation to the next, Galton found that the grand population mean of height is a better predictor of offspring height than the degree to which the parents’ heights diverge from the population mean. A single indicator of height (the parents’ measurements) is more likely to reflect anomalies that are statistically significantly different from the average of the distribution, but it is just one of many factors that produce the actual height of the offspring (recessive genetic factors, mutations, and environment being the most major other predictors of height in addition to dominant genetic inheritance).
There are some analogies to sports. The team of the following year is going to have some relatedness to the team of the previous year… mostly.. certainly the Blazers will, as they have a solid core and very few major changes. But there are a million issues with assuming regression towards the mean is our biggest obstacle to improving next season and that we can use methods like Galton developed to predict that effect. Among them.. teams’ win numbers are dependent upon each other. For us to win, other teams have to lose. No one has to be taller because I was born short. Laws of central tendency will tend to show this happening, but the heights of two unrelated people are essentially independent of each other, while every team influences every other team’s win total in as direct a way as possible. Also, while there is certainly an amount of luck to winning some NBA games, there are a lot of games where luck didn’t really play into it. Again, for regression towards the mean to be a factor next season, we would have to assume a large amount of error in the number of wins we got… and that this error over all teams and through all time would be symmetric around zero (teams winning lucky will balance with teams losing unlucky, and over time each team will show a balance in winning and losing luck games) and that it is not related to true ability (that winners and losers are more equal in close games and the final result is not based on who the better team was). I know there are some games that are decided by luck, but I don’t think it is very many.
Finally, the biggest obstacle to the Blazers increasing their win totals is much more related (imo) to how their improvements (through changing the team and through development) compare to the improvement every team we are playing most of our games against. Esp. a team like the Thunder.
"...the primary focus of all obstacles is to induce labor, so progression can be born." - LiL C
Regression to Mean is not an obstacle
> But there are a million issues with assuming regression towards the mean is our biggest obstacle to improving next season
I’m pretty sure that’s not what he’s saying. After all, regression to the mean is just an observation of effects of effects, not a cause.
What I do see the post saying is that the myriad other influences on wins – injuries, other teams’ injuries, schedule, grumpiness, sweat, refs, fans, weather etc – combine to have a greater than expected influence.
We want player selection and development to dominate. As fans we’ll spend more time talking about that anyway. After all, who wants weather to influence an indoor sport? But it does.
Cheers, Alistair
Yeah, that was pretty much my message
I can see why introducing the term regression to the mean had a distracting effect… but I was basically pointing up all of the uncertainties that can influence the course of a season.
And that those uncertainties tend
to have a negative influence on teams that were good last year and a positive influence on teams that were bad last year.
It’s a good message. But I think you are right that “regression to the mean” was distracting.
When I rule the world, everyone will know how to use Excel.
Yeah, If I had it to do over
I would have simply presented the pattern in the data and then said that the forces that produce statistical regression to the mean are likely to be among the factors producing that pattern.
yes,
that the results above conflate a “pure” statistical regression to the mean with less pristine factors, such as the mutual dependence of team wins on each other. So, surely the graphs above are not a measurement of the strength of pure statistical regression to the mean. That’s one reason that I put in bold that age and regression to the mean were not the only factors that I would consider..
On the other hand, I disagree with most of the second half of your second paragraph. I think there is a large amount of fundamental error in the number of wins a team gets—at least enough to consider when making projections from one season to the next. Suppose each team’s true win total in year 1 is x +/- 4 wins, which I don’t think is absurd at all, then you’d definitely want to consider it in year two.
(I’d also note that regression to the mean does not have to be the result of measurement error; it can be produced by sampling variability and other types of random error).
I think my definition of measurement error (which perhaps would be better called observation error) encompasses both of those other types of error.
And I’m willing to agree that +/- 4 is a completely reasonable estimate of the “random” wins.. esp. in terms of predictions. I still don’t think that you will find the average number of wins for the entire NBA to be a significant predictor of a team’s following year success.
"...the primary focus of all obstacles is to induce labor, so progression can be born." - LiL C
can check that with simulations
> And I’m willing to agree that +/- 4 is a completely reasonable estimate of the "random" wins.. esp. in terms of predictions. I still don’t think that you will find the average number of wins for the entire NBA to be a significant predictor of a team’s following year success.
I am not sure what you mean by the second sentence. The claim was that wins in year 1 are a significant predictor of the change in wins from year 1 to year 2. To test if +/- 4 is enough noise to make this true, I generated simulated data with teams having a “true” win total (or propensity) from 25 to 50 games and then added 4 games of noise, literally a normally distributed random variable with mean 0 and standard deviation of 4. I then created a simulated second season (reseeding the random number generator), subtracted the first season from the second and regressed the difference in wins on win total in year 1. Win total in year 1 was a significant predictor of the change in win total. (the coefficient was -.133).
On the other hand, if by +/- 4 wins, you had in mind a standard deviation of 2 games, so that +/-4 is the “margin of error” as reported by polling organizations, then you’d get a much weaker effect. Something you’d detect with enough data, but only worth a game for the elite teams. For what it’s worth, one would need about 6 games of noise (r.v. with mean=0, sd=6) to produce the results in the nba data.
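The simulation described above can be sketched roughly as follows. This is a reconstruction under stated assumptions, not the original code: the number of simulated teams, the uniform draw of "true" win totals between 25 and 50, the seed, and the function name are all illustrative choices, so the slope it produces need not match the -.133 reported above. For this setup the slope should sit near -σ²/(Var(true wins) + σ²), where σ is the noise standard deviation.

```python
import random


def simulate_slope(n_teams=5000, noise_sd=4.0, seed=0):
    """Slope from regressing (year 2 wins - year 1 wins) on year 1 wins.

    Each simulated team keeps the same 'true' win total in both seasons;
    only the random noise differs, so any slope is pure regression to the mean.
    """
    rng = random.Random(seed)
    year1, change = [], []
    for _ in range(n_teams):
        true_wins = rng.uniform(25, 50)              # assumed range of true quality
        w1 = true_wins + rng.gauss(0, noise_sd)      # observed wins, season 1
        w2 = true_wins + rng.gauss(0, noise_sd)      # observed wins, season 2
        year1.append(w1)
        change.append(w2 - w1)

    # Ordinary least squares slope, computed by hand to avoid dependencies.
    mean_x = sum(year1) / n_teams
    mean_y = sum(change) / n_teams
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(year1, change))
    sxx = sum((x - mean_x) ** 2 for x in year1)
    return sxy / sxx


if __name__ == "__main__":
    for sd in (2.0, 4.0, 6.0):
        print(sd, round(simulate_slope(noise_sd=sd), 3))
```

More noise gives a steeper negative slope, and with no noise at all the slope is zero, which is the point of the exercise: the effect comes entirely from the random component of wins.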
only problem with your approach
is that two randomly produced win totals would have a lower correlation than a team’s win total would…which would produce a great regression to the mean effect. You should use a regression equation with a noise element to predict the second season win totals from the simulation on the first season.
Plus, you have created a restriction of range problem where teams cannot change beyond the limits you set on it.
but Bravo for breaking out the simulation solution
And all I meant by “I still don’t think that you will find the average number of wins for the entire NBA to be a significant predictor of a team’s following year success.” is that the stronger the regression to the mean artifact, the more likely the average number of wins for every NBA team is as good or a better predictor of one team’s year 2 performance than that same team’s previous year win totals.
"...the primary focus of all obstacles is to induce labor, so progression can be born." - LiL C
> is that two randomly produced win totals would have a lower correlation than a team’s win total would…which would produce a great regression to the mean effect. You should use a regression equation with a noise element to predict the second season win totals from the simulation on the first season.
> Plus, you have created a restriction of range problem where teams cannot change beyond the limits you set on it.
The simulation is an example of pure statistical regression to the mean. It simply illustrates how much regression to the mean there would be if teams did not improve at all, but each season’s true quality was measured with error by the wins. It was a way to show that +/- 4 games of noise is enough noise to matter, not a claim about what happens in the NBA.
If I introduced the possibility that teams can also improve at random, that would actually increase the amount of observed regression to the mean.
Regression to the mean
is all fine and dandy when talking about large samples, but we’re talking about a sample of ONE here. Despite all the nice graphs and examples, I don’t think you can really apply it with any accuracy on the “micro” level. Statistics is all about the “macro”.
Far more important than generalized tendencies over years and many different teams is the maturation of the Blazers young players and their ability to become a cohesive team (i.e. chemistry).
Interesting post, but I just don’t think it can be applied.
Duct tape makes you smart.
Dynamics Indicate the Opposite
I love stats and appreciate your analysis. And I agree that it is possible that the Blazers will win fewer games, not more. But I think it likely that they will win more, for the reasons so many other commenters have suggested.
I think that there is something in year to year dynamics that is not captured in your graphs. It would be improved if the “shape” of 5 year win streams were analyzed.
I’m going to play a bit of devil’s advocate here. If regression to the mean was fully in effect, all teams would get stuck at 41 wins. Looking at your middle graph, start with the Blazers 21 win season. The graph tells me that the next year they would win 8 more, for 29 wins. And the following year they would win 5 more, for 34 wins. Then they’d add 3 more, for 37 wins. They’d slowly creep up from there to 41 wins. And they’d get stuck. Meanwhile, all the good teams would head towards the center. All teams would end up with 41 wins.
So the fact that there is dispersion between 20 win teams and 60 win teams demonstrates that regression to the mean is not in effect! Something else is going on.
I think that teams sustain winning, at least over a 5 or so year period. I’ve done no statistical analysis, but here are a few facts and figures that support this concept. Some teams are consistently good. The Lakers have won 63% of their games in the 49 years since they moved to LA. The Spurs have won 60% since they came into the NBA. Boston has been in the league for 63 years, and has won 59%. At the other end of the spectrum, the Grizz have won 33%, the Bobcats 35%, the Clips 36%, and the Raptors 41% since coming into the league.
I think regression to the mean works for time series that are random, with one time period’s outcome having no effect on the next. Think rolls of the dice. Getting snake eyes on one throw has no effect on the next throw. But the NBA is different. There is “momentum” carried from year to year. It’s as if, after you threw snake eyes, those sides of the dice became lighter and more likely to end up on top.
So I agree with your basic point, but believe that there is a dynamic analysis that analyzes time series with “momentum” that would be more appropriate.
by Blaz06Draft on Aug 6, 2009 5:22 PM PDT
> I’m going to play a bit of devil’s advocate here. If regression to the mean was fully in effect, all teams would get stuck at 41 wins. Looking at your middle graph, start with the Blazers 21 win season. The graph tells me that the next year they would win 8 more, for 29 wins. And the following year they would win 5 more, for 34 wins. Then they’d add 3 more, for 37 wins. They’d slowly creep up from there to 41 wins. And they’d get stuck. Meanwhile, all the good teams would head towards the center. All teams would end up with 41 wins.
> So the fact that there is dispersion between 20 win teams and 60 win teams demonstrates that regression to the mean is not in effect! Something else is going on.
Hypothetically speaking, one would observe a regression to the mean if every team had a different “true” win total. Say the Lakers are a 55 win a year franchise, the Grizzlies a 25 win a year franchise and so on. If you took each team’s true win total and added wins at random or noise wins, drawn from a normal distribution with a mean of 0 and a standard deviation of 4, you’d observe regression to the mean.
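A small Python sketch of this hypothetical shows both things at once: the league stays spread out from season to season, and yet teams coming off a high win total tend to drop while teams coming off a low one tend to rise. Everything specific here is an assumption made for illustration: 30 franchises with fixed true totals evenly spaced from 25 to 57 wins, 200 simulated seasons, the 50-win and 32-win cutoffs, and the function names. It also ignores the constraint that league-wide wins must equal losses.

```python
import random
import statistics


def simulate_league(n_teams=30, n_seasons=200, noise_sd=4.0, seed=1):
    """Observed wins per season for franchises with fixed 'true' win totals."""
    rng = random.Random(seed)
    # Hypothetical franchise quality: evenly spaced from 25 to 57 wins (mean 41).
    true_wins = [25 + 32 * i / (n_teams - 1) for i in range(n_teams)]
    return [[t + rng.gauss(0, noise_sd) for t in true_wins]
            for _ in range(n_seasons)]


def summarize(seasons, high=50, low=32):
    """League spread in the first and last season, plus the average
    next-season change for teams coming off a high or a low win total."""
    good, bad = [], []
    for prev, nxt in zip(seasons, seasons[1:]):
        for w_prev, w_next in zip(prev, nxt):
            if w_prev >= high:
                good.append(w_next - w_prev)
            elif w_prev <= low:
                bad.append(w_next - w_prev)
    return {
        "spread_first": statistics.pstdev(seasons[0]),
        "spread_last": statistics.pstdev(seasons[-1]),
        "mean_change_good": statistics.mean(good),
        "mean_change_bad": statistics.mean(bad),
    }


if __name__ == "__main__":
    for name, value in summarize(simulate_league()).items():
        print(name, round(value, 2))
```

The spread does not collapse toward 41 because each team regresses toward its own true total rather than toward the league average, so dispersion between 20 win and 60 win teams is compatible with regression to the mean.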
Regarding larger trends over a larger number of years: there’s actually not that much evidence for them in the data. The number of wins two years prior is only barely a significant predictor of wins. Wins from three seasons ago are completely unrelated to wins in the current season, controlling for wins one year and two years ago.
54 wins, for most teams
would probably be their peak, hence the regression.
call me a homer, but I’d say we’re just getting started.
Yellow Mamba FTW!
Nerd Alert!
Get ready for your swirly, Sam. Hope you brought a comb.
(jkjkjkjkjkjkjkjkjk, don’t kickban me!!)
Life is hilarious.

























