# Statistics and Analysis in the NBA

Dave takes a look at the role of statistics in the pantheon of analysis options in the NBA. How important are stats to understanding the game and what are their limits?

In a stroke of serendipity Sam Tongue opened up a discussion on analytics over the weekend.  Today's Mailbag continues in that vein, taking a look at stats.

Dave,

Sometimes you're at odds with the more statistically minded Bedgers.  Do you rely on the eye test more?  How valuable are stats in your appraisals?

BL

Statistics form one of the planks upon which good, modern basketball analysis is built.  In fact, advanced statistical work is the single most significant development in basketball analysis in the last decade.  Stats are indispensable.  Count me firmly in the pro-statistics camp.  They're a heavy part of my work here.

If you are going to rely on a single method of evaluation, rely on stats.  It's the best approach, hands down.  But telling you statistical analysis is the single best approach to basketball evaluation is like telling you that if you're going to predict a winner in the NBA Draft Lottery, go with the team holding the most ping-pong ball combinations.  The statement is correct.  You're going to be right more often guessing the team in #1 position will win the lottery than you will with any other method.  But that doesn't make your guess infallible, nor does it draw a complete picture of the lottery system.
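To put rough numbers on the lottery comparison, here is a minimal sketch.  The combination counts are an assumption brought in for illustration (14 seeds sharing 1,000 combinations, with the top seed holding 250); they do not come from anything above, so swap in whatever odds apply to the year you care about.

```python
# Assumed combination counts per lottery seed, out of 1,000.
# Illustrative only: these figures are not taken from the article.
COMBINATIONS = [250, 199, 156, 119, 88, 63, 43, 28, 17, 11, 8, 7, 6, 5]


def top_pick_odds(combinations):
    """Convert raw combination counts into probabilities of winning the top pick."""
    total = sum(combinations)
    return [count / total for count in combinations]


odds = top_pick_odds(COMBINATIONS)

# The best single guess is the seed holding the most combinations.
best_guess = max(range(len(odds)), key=odds.__getitem__)
p_right = odds[best_guess]
p_wrong = 1 - p_right

print(f"Best single guess: seed #{best_guess + 1}")
print(f"Chance that guess is right: {p_right:.1%}")
print(f"Chance that guess is wrong: {p_wrong:.1%}")
```

Under those assumed odds, the smartest available guess still misses far more often than it hits.  Being the best method and being a reliable one are two different things.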

I deeply admire, and daily build upon, the work of the great statisticians in this field.  They're pioneers.  I've also noticed that, like explorers, they're seldom satisfied with what they've done.  They're always looking to refine, to improve, to head on to the next steps.  Good statisticians understand the limits of their data as well as its revelations.  They seldom claim, "Well...we've found it now!  This is it, the ultimate answer!  Case closed, time to quit!"

Consumers of the work of those statisticians tend to take the opposite approach.  They want conclusions instead of probabilities, answers instead of vistas.  They want to build houses upon statistical grounds too narrow to provide the foundation.  At times numbers almost become a new religion.  I'm slightly familiar with the stereotypical hallmarks of religious fervor.  They include things like, "Here is THE answer.  Here is THE [slice of] data that clearly shows it.  If you don't see it this way you're ignorant.  If you want to invoke any other methodology you're a heretic."

I resist this kind of claim for any statistical formulation.  No single number, set of numbers, or formula can support it.  Numbers can open up doors to better answers about basketball.  They cannot provide THE answer on their own.  They are not infallible...or rather they cannot be used infallibly.

Numbers themselves are neutral, without judgment.  But as soon as a human being tries to pick them up and do something with them--necessary to the evaluation process--you introduce an interpretative element which is prey to the fallacies of all human judgment.  In other words, the numbers and formulas may not be lacking, but the conclusions you're drawing from them always are in some way or another.

You should definitely use stats to broaden your understanding of the game.  But when you're tempted to consider statistics the ONLY approach to basketball evaluation, consider the following:

1.  A few years ago Deadspin published a couple of famous articles, "An Assist for Nick Van Exel" and "The Confessions of an NBA Scorekeeper."  They chronicled how badly official scorekeeping in the NBA can go.  Phantom assists, blocks, and rebounds were par for the course, skewed by the whim of the scorekeeper or the directives of teams wanting to make statistical news.  Those stories make for a shocking read.

That shock does not encompass the whole scorekeeping story.  No doubt most of the raw data coming out of NBA games is fine.  But it's not sacrosanct.  Creating the data set that brilliant statisticians fiddle with is, at its root, a human--and thus imperfect--process.  Whether charting the NBA is science or art, the specter of "entertainment" encroaches on all elements of the game, including its stats.

2.  One of the most interesting and difficult processes in statistical work is deciding what to measure and how to measure it.  Here, too, humanity enters the equation.

Of how much value is a rebound?  Around such questions doctoral theses are formed.  But even asking the question imparts some value to the event.  A rebound must be more precious than your average dribble, for example.  We're charting one and not the other.  But is that assumption true?  And by how much?

Even if we accept the original assumption, how do we choose to measure rebounds?  Thirty years ago the answer would have been some variation of, "Dude...count the player's rebounds."  Even the simplest alternatives, like per-minute stats and per-position sorting, change the game significantly, introducing variables that bring sharper definition to specific targets while loosening the connection to actual events.  Then you have rebounding rate of teammates when a guy is on the floor, rebounds the opponent doesn't get when a guy is on the floor, rebounds compared to the guy who replaces the guy on the floor, and so on.  Who decides which version applies best?  The numbers themselves aren't going to answer that.  They may be objective, but the decisions you make in sorting and the narrative you tell after reviewing them--subjective endeavors both--give them meaning and utility.
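A small sketch shows how much the choice of yardstick matters.  Everything in it is hypothetical: the three players, their season totals, and the `available_rebounds` field (a stand-in for "rebounds up for grabs while the player was on the floor") are invented for illustration, and the three measures are simplified versions of per-game, per-36-minute, and rebound-rate figures rather than any official formula.

```python
from dataclasses import dataclass


@dataclass
class PlayerLine:
    """Hypothetical season totals for one player (all numbers invented)."""
    name: str
    games: int
    minutes: float
    rebounds: int
    available_rebounds: int  # rebounds up for grabs while he was on the floor


players = [
    PlayerLine("Starter A", games=80, minutes=2880, rebounds=800, available_rebounds=5600),
    PlayerLine("Reserve B", games=80, minutes=1440, rebounds=480, available_rebounds=2700),
    PlayerLine("Big C", games=60, minutes=1800, rebounds=550, available_rebounds=2900),
]


def per_game(p):
    return p.rebounds / p.games


def per_36(p):
    return p.rebounds / p.minutes * 36


def rebound_share(p):
    return p.rebounds / p.available_rebounds


MEASURES = {
    "per game": per_game,
    "per 36 minutes": per_36,
    "share of available": rebound_share,
}

# Find which player each measure ranks first.
leaders = {label: max(players, key=fn).name for label, fn in MEASURES.items()}

for label, name in leaders.items():
    print(f"Best rebounder ({label}): {name}")
```

With these made-up lines, each measure crowns a different player.  None of the arithmetic is wrong; the disagreement comes entirely from deciding what to divide by, and that decision is a human one.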

3.  Even the most rigorous, reasoned processes will produce hiccups.  These are the texture, maybe the purpose, of the exercise.  Devising a system to tell everyone that LeBron James was the best player in the league last year is like inventing vanilla ice cream.  Yup.  Good flavor.  No points for you.

Ideally you want a result that confirms what most people already know in obvious areas (LeBron and Kobe good at scoring, Howard and Love good at rebounding, etc.) but values unexpected players higher or lower than commonly predicted.  This is your genius moment.  Those differentiation markers make your name.  People get to use your work to claim that Player X is way more valuable than everybody thinks; Player Y is way less.

OR...it could be that no formulation or order can explain a complex process perfectly.  Those players might not be better or worse outside the narrow scope of your work.  Maybe the aberration is found more in your process than in their performance.  The same moment that brings your greatest glory could be your greatest downfall.

How does one decide which is which, whether the hiccup is a bonus or a glitch, a new view on reality or an outlying result to be disregarded?  How do you determine how important the hiccup is, what significance it carries for your analysis?  Again, human judgment comes into play.

4.  So far we've been talking mostly about the high-level professional realm of stats work.  The subjectivity gets magnified a hundredfold when casual folks take up the cause.

Let's say your logic reads:  "My team needs more rebounding.  Player X is a good rebounder.  Thus Player X is really going to help my team."  You're going to invoke stats in the argument, but which of the statements is purely statistical?

The "good rebounder" line trends most towards stats because you're likely using a statistical formulation in order to define "good".  Even then, though, you have to choose an approach, as mentioned above. "Good" by what measures?  Do they lend themselves to the broad claim of "goodness"?  Is it the type of goodness that your team needs at this point in time?  Why is that player a "good" rebounder?  Will those underlying reasons be duplicated with a change in scenery and system?  Even the simplest claim demands further examination beyond the numbers.

Your team needing more rebounding probably has a stat line involved, but value judgments leak in more heavily here.  Who's to say you need more rebounding as opposed to more scoring or defense?  And aren't those concepts linked?  What if your "good rebounder" is also a limited defender who lets more shots go in?  Is he more valuable than a great defender who doesn't rebound himself but will cause more misses for everybody else?  Or what if you get better rebounding but it turns out it wasn't that important after all?  The logic started with an inherently speculative assertion, assumed its veracity, and pursued relevant stats accordingly.  If that first, speculative assertion turns out to be untrue (or even less true) the statistical work that follows loses its relevance.

Naturally the "really going to help my team" statement is the most subjective of all.  But for most casual folks this is the whole point...the thing they're hoping to show above all.  The least statistical portion of the argument actually carries the most importance even though the argument is supposedly about stats.

In these cases, the stats are like a hot dog wrapped in a soft bun of subjective judgment.  You can't bite the former without getting a mouthful of the latter first.

None of these four points invalidates the usefulness of statistics.  Again: they're the single most valuable tool we have in evaluating the game.  But these points do illustrate that there's nothing sacrosanct about the statistical approach.  It's not above criticism or evaluation, nor is it beyond the same impulses that contaminate other forms of evaluation.  It may be less susceptible to them, but they're still there.

Statistical approaches need to be checked and balanced by other factors--observation, experience, "wisdom", or whatever you term them--the same way statistics check and balance those factors.  Most of us do not have the same access to those observational/experiential tools that we do to stats, but that doesn't invalidate their importance or existence.

The key take-away for our purposes is that statistics, like every other analytic approach, make a great place to start conversation but a poor conversation ender.  Saying, "This stat appears to say this..." and then checking that conclusion versus other stats/methodologies, environmental data, experience, and common sense--putting the stat in context--makes for superior analysis.  Saying, "This stat says THIS, end of story!"--or worse, making that kind of claim and then rationalizing away or converting all other sources of data to conform--is no more valid when using statistics than it would be with any other source of data.

In short, I am not at all at odds with stats or their use in analysis.  I sometimes find myself at odds with what people try to do with stats, especially if they claim that numbers on a page tell the whole story about a complex, and ultimately quite human, endeavor.  We're farther along statistically than we've ever been in the history of basketball and we'll get farther still.  They still haven't invented the stat that explains it all, though.  Until they do, more refinement, cross-checking, and employing every source of data at our disposal will remain a necessity.

Keep those questions coming to blazersub@gmail.com with "Mailbag" somewhere in the subject line!

--Dave (blazersub@gmail.com)