In celebration of the ongoing Sloan Sports Analytics Conference, here's a Mailbag question about statistics.
You said in a [recent game review] comment that understanding how to use and communicate stats was a hidden key to your success. We also have the Sloan Conference going on this week and Neil Olshey is there. Could you talk about analytics and how you feel it's shaping the league? What stats do you use most and how do you decide how to communicate them?
It's a great question. It's also a vast question, even in the edited-down form we're presenting here. I'll try to break it down in three parts: sports professionals, fans, and media. In each case a mantra will hold that I've shared before: Statistics make a wonderful (these days irreplaceable) servant but a poor master.
My first exposure to the significance of statistical analysis came in the '80s, when the three-point shot was a relatively new thing in the NBA. Up until that point the lesson had been "Closer is Better" when it came to shooting. Long jumpers were the last resort. I was watching a national broadcast one day and the play-by-play guy was echoing this sentiment, that shooting from so far out was a bad-percentage play. The color analyst must have had some math in his background because he pointed out that shooting 33% on three-point shots yields the same points per attempt as shooting 50% on regular two-point shots. At first my young mind couldn't grasp this. "33% is really bad!" But being a precocious math genius myself I grabbed a big pencil and a piece of paper and figured out that he was actually right. Boom! Statistical analysis changes the game...and one boy's perceptions.
Few back then could have dreamed how far number-crunching would progress. Adjusted plus-minus, mapping of player tendencies...the NBA is like a 24-hour statistical buffet nowadays. It can't be ignored. Any team that's not taking advantage is going to fall behind. Properly used, information is power. If the other guy has more information than you, he has potential access to more power. Stats are here to stay. Showing up without them is like a carpenter showing up to work without a hammer.
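For anyone who wants to redo the big-pencil math, here's a rough sketch of the color analyst's point. It assumes the only thing that matters is make rate times shot value, ignoring fouls, rebounds, and everything else that happens on a real possession. The function names are mine, made up for illustration.

```python
# Points per attempt = make rate x points per make.
# Simplifying assumption: no fouls, offensive rebounds, or turnovers.

def points_per_attempt(make_rate: float, shot_value: int) -> float:
    """Average points scored per shot attempt at the given make rate."""
    return make_rate * shot_value


def break_even_three_rate(two_point_rate: float) -> float:
    """Three-point make rate that matches a given two-point make rate
    in points per attempt (solve 3 * x = 2 * two_point_rate for x)."""
    return two_point_rate * 2 / 3


if __name__ == "__main__":
    twos = points_per_attempt(0.50, 2)
    threes = points_per_attempt(1 / 3, 3)
    print(f"50% on twos:     {twos:.2f} points per attempt")
    print(f"33.3% on threes: {threes:.2f} points per attempt")
    print(f"Break-even three-point rate vs. 50% twos: {break_even_three_rate(0.50):.1%}")
```

Strictly speaking the break-even mark is one in three, or 33.3%, so the "33%" from the broadcast is a rounded figure.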
But statistics are a tool, not the tool. The carpenter needs a hammer, but you can't just go around hammering everything and calling it good.
Statistics allow critical examination of theories or conventional wisdom, much as with the three-pointer argument long ago. To me that's their strongest point. If they can help you see the game a different way they've given you more avenues for evaluation. Every team needs access to those avenues whether they choose to take them or not.
That everyone will believe and act on some kind of theory is a given. You want to know what that theory is and why it makes sense, though. That's what stats are for. They're a birthplace and testing ground for ideas.
I'm not sure statisticians have made the leap to showing that numbers go beyond the descriptive into the prescriptive realm, though. Analytics gurus are doing a better and better job of describing what's happening on the court but that doesn't necessarily mean they're able to tell us how to use that information to make the right decisions in the future. Actually I'm not sure this should be a goal at all, but most folks who talk about stats seem to assume it as the Holy Grail. The musical montage training session is the most recognizable trope in sports movies but a close second is the bespectacled guru who finally cracks the code revealing the magic number that turns the underdog into a champion. No matter how you refine and parse the numbers, I'm not sure that code exists.
For example, the numbers may tell you that LeBron James has X level of success when he goes right and Y level when he goes left, and X is greater than Y. That's an accurate description of what's going on. The obvious prescription based on that analysis is "Force him to the left!" But those numbers don't tell you whether forcing him left will actually succeed. What if defenders have noticed this tendency all along and have been forcing him left already...in other words this is as "left" as he can practically be forced and his performance is still way better than anybody can believe? You haven't altered anything there. Or what if the defender goes to the extreme and oversells the right-hand defense entirely, allowing LeBron to dribble down the lane with his left hand and throw down a dunk when he would have shot a jumper under a normal defense? Now you've gone backwards. Either way, maybe the final analysis is that LeBron is LeBron and no numbers are going to change the fact that he's better than you. Do you still want to know the tendencies? Yes. But there's no magic code to the future, nor any number that'll change the laws of nature and guarantee your success.
It's near heresy in some circles, but there's also a practical limitation to the utility of statistical information. A player can no doubt keep one or two broad trends in mind when facing an opponent. It's unlikely that he'll be able to retain and access complex formulas while running up and down the floor. If description is the goal then data for data's sake is fine. Theoretically any formula can show something. Putting that something into practical use is another matter.
I envision the relationship between statistician and team management like that between the Science Officer and Captain on the Enterprise. What does Spock do? He reads the sensor data, parses it, then gives Kirk the best view possible of what's going on in front of the ship, opening up the possible courses of action for the Captain. Spock doesn't make those decisions himself, nor would the ship be better off if he did. He can't replace Kirk. He doesn't have the right answers. He just makes sure the Captain sees the useful variables before making a decision.
The best work in the field, at least to my perception, comes from analysts who approach the work as scientists do: thorough and rigorous, skeptical of theories instead of falling in love with them. Finding more data, better ways to sort data, stronger bridges between data sets...these are irreplaceable functions of the statistical analyst. Becoming a "basketball guru" or decision-maker is secondary. Not that this is impossible. Spock could become captain and he'd probably make a fine one. But then he's captain and not an analyst anymore. If he takes the center chair and refuses to think or decide in any other mode but that of science officer his tenure will be unsuccessful. More importantly, he needs more than a grasp of numbers to make the ship run properly.
In the end sports are a human endeavor. That's why they engage us so thoroughly. You can quantify human behavior, even predict it with a reasonable degree of accuracy if you have access to enough personal and environmental data. But you can never sum it up wholly with a formula, no matter how precise. An analyst in love with a can't-miss stat or formula should be viewed the same way as a scout in love with a can't-miss player. (And believe me, this happens with numbers as easily as it does with people. Those numerals become their own religion.) There's no such thing as "can't miss" in either field. That's why they need each other...and all the relevant data they can get their hands on.
The development and display of statistics have smartened up fans to a degree unheard of a couple of generations ago. But as with professionals, the data itself doesn't matter if it's not used well.
People able to use statistics meaningfully operate in the same way those scientific pros do. They look at the numbers and draw conclusions from them, understanding that no single theory will explain the entire set of circumstances in play. Less convincing people reverse that process, developing their theory (or passion or argument) and then finding the stats to support it. The difference in results is huge.
We're all familiar with correlation not equating to causation. "I get up at 4:00 a.m. every day. The sun rises 1 to 3 hours after I awaken without fail, 100% of the time. I am beginning to suspect that I cause the sun to rise..." With basketball numbers dangling out there on any number of websites it's easy to get seduced. You think something is true, a few numbers fall into place that agree with you, therefore they seem indicative. Without being scientific about it at all, I'd guesstimate that 4 out of 10 uses of statistics by "regular folks" contain a fatal flaw and at least 8 out of 10 carry some omission which calls them into question even if the data presented is true as far as it goes.
Almost without exception, if someone is trying to use statistics to end a debate instead of informing it or adding nuance to it, their assertion is oversimplified and incorrect. Even if a statistic is simple--recording how many offensive rebounds a player gets each game, for example--the environment surrounding that stat is complex. Using that number to extrapolate how a player will perform in a different environment, or how that player compares to another player in a different environment, is not a simple matter. You can achieve a reasonable estimation but it's not iron-clad...certainly not iron-clad enough to end all debate on the subject. To fully explain the importance and permutations of even the basic stats--let alone their application--requires a paper at least, not two sentences and a "So there!"
Stats can be used well. In fact they're probably the fairest means of debate for those not professionally trained (or at least very astute) in physical observation. Two sets of eyes can see radically different things but 50% is 50% for both people. But to use a stat well you have to understand what it measures, how that fits into the game, and the environmental factors influencing that fit. Then you have to decide if those things actually mesh with the argument you're trying to make.
Every year we fight the single-game plus-minus battle on this site as people want to use one game's number to show one player had a better night than another or, worse, that one player is better than another overall. Or we get tidbits from Blazers Broadcasting that "Only two small forwards in the league have X points and Y rebounds per game: LeBron James and Nicolas Batum!" The implication is clear: that Batum and James are in a class by themselves and that Batum is creeping ever closer to elite status. It's a classic shady use of stats to create an effect or an argument. Why cite those two stats and not an across-the-board comparison? If they're that indicative, how come they're not linked on a regular basis and players quantified thereby? How far is the distance between #1 and #2 and how many other non-small-forward NBA players fit in that gap? At what point in the season are you taking these measurements and which way are the numbers trending? The impression based on the numbers presented is that Batum is having a great season. In reality there's a debate whether he's having an upwardly-trending year even compared to his own statistics, let alone closing the LeBron Gap. It's the equivalent of Spock saying, "Captain, there are three Klingon cruisers off the bow and we should kick their butts! Whoo!" The data may be accurate as far as it goes but the conclusion may not be warranted or helpful. The agenda is driving the stats instead of vice-versa.
In short, stats have helped the fan base in general but they've also made it easier to destroy meaningful conversation as people draw upon them for authority without really using them correctly.
Since neither a degree in Journalism nor a Doctorate of Blogology equates to a PhD in Statistics, media folks experience the same glories and vices when using stats as other non-experts do. It quickly becomes clear who's able to use stats well and who isn't but, as with all media-related things nowadays, inaccurate snippets are often taken out of context and repeated as truth until they become so.
Media gurus can't ignore statistics any more than professional NBA folks can. It's impossible to describe the action without some kind of stat. They certainly form the backbone of my analysis, especially for game previews. I watch a few games of a given opponent but there's no way I can cover the same expanse that the numbers do.
To me, though, statistical research is like wearing underwear. People should generally know that you do it and really shouldn't perceive that you aren't doing it. But you don't need to show your skivvies every ten minutes just to prove you got dressed correctly. That distracts from the story rather than helping it.
The media serves its purpose best when it translates incredibly complex matters into a cohesive story that even a novice could grasp without losing the fidelity the expert requires. In other words it's not my job to tell you that J.J. Hickson has a 27.7% defensive rebounding percentage this season. It's my job to determine why that particular fact is important to the matchup at hand and explain the ways in which it could affect the game. Ideally the novice grandmother watching with her 8-year-old grandson would go, "I get it!" while the stats gurus among us would say, "That's a fairly accurate description."
Media types go off the rails when they ignore stats in favor of their own observations alone. I make plenty of observations with my eyes but I also cross-check with the numbers. If there's a difference there, I want to know why. Sometimes after further review my observation is wrong (and I'm relieved I got educated before I spouted it publicly). More often than not, though, the discrepancy between what I've observed and what the numbers are saying leads me to an explanation that I never would have considered. I had a great conversation a couple weeks ago when a national-professional-type person said, "The stats say this about LaMarcus Aldridge and that's different than most everybody else in the league. Why?" We got, like, ten e-mail paragraphs out of a couple numbers and that simple question. I'd never bothered to even consider the matter before. That's great use of stats.
Media types also go off the rails when they use stats to show how smart they are instead of using them to tell the story. My friend (and inestimably great journalist) Rob Neyer once said something that set my heart flying. It was the equivalent of, "I'm sick of reading: 'Dwyane Wade shot 53.26% last night while holding Kyrie Irving to 41.05% shooting as the Heat managed 39.4% on three-pointers against the Cavaliers' opponent average of 35.68% for the season. This raised Miami's True Shooting Percentage to a league-leading 54.97%, a 2.89% lead over the nearest competitor.' What are all of those decimal points doing in there besides completely disrupting the flow of the narrative?"
Decimal points have their place in specific comparative studies. But for normal folks accurate rounding is just fine. In fact I've noticed that the more numbers somebody puts after the decimal in their layman's argument the more likely they are to be hiding something than showing something...appearing to be more of an authority than they actually are while citing stats in sketchy fashion. (Acting like a math bully slapping people across the face with "superior" numeric knowledge is another certain clue that someone is compensating.) Being more precise doesn't do any good if you're using data for a purpose it's not designed for. Those extra decimal places just make you more precisely wrong. For statisticians the decimal points are the point. For the rest of us conclusions reached and stories told on the basis of the numbers are more important. Readers are free to do their own research on data as desired, but first you have to show them why and how it's important.
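If you want to see how little the story loses, here's a small sketch that rounds the figures from that decimal-laden example to whole percentages. The numbers come from the sample sentence above; the labels and the helper name are my own shorthand, not anybody's official stat names.

```python
# Figures taken from the example sentence; labels are illustrative only.
quoted_figures = {
    "Wade shooting": 53.26,
    "Irving shooting": 41.05,
    "Heat three-point shooting": 39.4,
    "Cavaliers' opponent three-point average": 35.68,
    "Heat True Shooting Percentage": 54.97,
}


def for_readers(value: float) -> str:
    """Format a percentage to the nearest whole number for narrative use."""
    return f"{value:.0f}%"


if __name__ == "__main__":
    for label, value in quoted_figures.items():
        print(f"{label}: {value}% reads as {for_readers(value)}")
```

The full-precision numbers are still there for anyone doing a specific comparative study. The rounded ones are what belong in the sentence.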
My job, then, is to understand the complexities of the game and the numbers just like a GM or Coach would, but then to go the opposite direction of the professional executive. Instead of using my wisdom to make a single, highly complex command decision I use it to distill the data down into many simple-yet-probable possibilities and lay them out before you, inviting you to join in with your own conclusions and to speculate which of the available courses will be most important to that lofty professional who actually makes the decision.
This is what I mean by stats and the communication thereof being critical to my work. If I don't know the data I can't see the possibilities. Absent the work of statisticians I'd be blind as a bat. If I can't distill those possibilities into understandable form then seeing them does me no good in this context.
That's probably 87.25% more free-form thought on statistics than you wanted, but oh well. Keep those questions coming to the e-mail address below.
Also don't forget that we really, REALLY want to send underprivileged kids to the Blazers vs. Warriors game on April 17th and we need your help! It's easy and not expensive at all but the deadline for purchasing tickets is March 24th. CLICK HERE for details!