Numbers don’t lie on Hall of Fame chances

To write about the NBA is to almost automatically get sucked into a very specific nerd vortex of websites, which offer everything from advanced stats to shot charts to five-man unit data and have become increasingly popular in recent years. Basketball has become the sport of choice for analytics, and though much of that data remains proprietary and behind closed doors, there's enough at least somewhat specialized data out there for the general public to access. 

A few weeks ago, in one of the who knows how many hours I've spent (wasted?) trolling said sites, I stumbled upon a portion of basketball-reference.com that I hadn't ever noticed before: the Hall of Fame Probability section. The page provides two charts, one for active NBA players and another for both active and retired NBA players, giving the probability of each being elected into the Basketball Hall of Fame.

The page is a welcome distraction from defensive ratings and miserable shooting percentages on midrange jumpers, and it's no surprise that among all active players, Kobe Bryant is the only one with a 100 percent chance of making the Hall of Fame, according to the model. The nine active players with a greater than 90 percent chance of making it are as follows: Bryant, Tim Duncan (99.99 percent), Kevin Garnett (99.90), LeBron James (99.87), Dwyane Wade (99.78), Dirk Nowitzki (98.28), Paul Pierce (98.28), Jason Kidd (95.25) and Ray Allen (92.36).

On the more comprehensive list of players both active and retired, 59 players boast a probability of greater than 90 percent, and of those who are retired, only Shaquille O'Neal (100 percent) and Allen Iverson (99.8 percent) have not been voted into the Hall of Fame. Neither is yet eligible, though, having not yet been retired for five years. In fact, the predictor is so accurate that only six of the 73 retired players with probabilities greater than 70 percent have not yet been enshrined: O'Neal, Iverson, Gary Payton (88.26 percent), Jo Jo White (84.49), Chris Webber (74.59) and Willie Naulls (71.73). O'Neal, Webber and Iverson will likely make it when they become eligible, and odds are that Payton will be announced as a Hall of Famer as a member of this year's class, the first for which he is eligible. White, who retired in 1981, and Naulls, who retired in 1966, are the only true omissions, and so for the model, 71 out of 73 isn't half bad.

So now, onto how the thing works. Basketball-Reference offers a detailed discussion of the model and the regression run to yield it, which incorporated seven variables: height (in inches), last season played indicator (1 if 1959-60 or before, 0 otherwise), points per game, rebounds per game, assists per game, All-Star game selections and championships won. The probabilities were calculated for all players who have played a minimum of 400 games. For active players, the probability reflects their chances of being elected to the Hall of Fame if they were to retire today. Thus, probabilities for guys like LeBron, Chris Paul (89.09 percent), Kevin Durant (71.18) and Carmelo Anthony (80.57) should likely increase over time.
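
For the curious, a logistic regression of this kind boils down to a weighted sum of the seven inputs, squashed into a probability between 0 and 1. The Python sketch below shows only that general shape. The function name, the intercept and every weight are placeholders invented for illustration; they are not Basketball-Reference's fitted values. Only the signs follow what the site and Paine describe: production, All-Star nods, titles and a pre-1960 retirement help, while extra height hurts.

```python
import math

# Placeholder weights, invented purely for illustration. These are NOT the
# fitted coefficients of the Basketball-Reference model.
HYPOTHETICAL_COEFFICIENTS = {
    "height_in": -0.2,           # negative: shorter players get a "handicap"
    "pre_1960": 3.0,             # 1 if last season was 1959-60 or before, else 0
    "points_pg": 0.3,
    "rebounds_pg": 0.3,
    "assists_pg": 0.4,
    "all_star_selections": 0.8,
    "championships": 0.7,
}
HYPOTHETICAL_INTERCEPT = 2.0     # also a placeholder


def hall_of_fame_probability(player):
    """Return a logistic-regression probability for a dict of the seven inputs."""
    score = HYPOTHETICAL_INTERCEPT + sum(
        weight * player[name]
        for name, weight in HYPOTHETICAL_COEFFICIENTS.items()
    )
    # The logistic function maps any real-valued score into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-score))


# A made-up stat line, not a real player.
example_player = {
    "height_in": 78,
    "pre_1960": 0,
    "points_pg": 20.0,
    "rebounds_pg": 6.0,
    "assists_pg": 5.0,
    "all_star_selections": 5,
    "championships": 1,
}
print(round(hall_of_fame_probability(example_player), 2))
```

Because the weights are invented, the number this prints means nothing on its own. The point is the mechanism: add an All-Star selection or a title and the probability climbs, which is why the figures for players in mid-career should keep rising.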

In order to find out more about the model, I got in touch with Neil Paine of basketball-reference.com, and an edited version of our email Q&A appears below:

FSN: Where did the idea of doing this originate? It seems pretty unique.

PAINE: The thought of gauging Hall of Fame chances has been around for a while, certainly since (baseball writer and statistician) Bill James was writing about it and inventing metrics pertaining to it in the 1980s. … As for the particulars of this implementation – Justin Kubatko, who created Basketball-Reference, has a background as an academic and industrial statistician, and the technique used here, logistic regression, is very common in that field, so it just seemed like a natural direction to take Hall of Fame debates in.

FSN: Do you do this for all sports, or just basketball?

PAINE: Basketball is the only sport for which we have a probability model per se. In baseball, we have Bill James' Hall of Fame stats, which are a set of four generally useful gauges for how a player stacks up to HoF standards, and we also have Jay Jaffe's JAWS system. But those are all about eyeballing a guy's numbers against historical benchmarks and getting a vague feel for his chances that way, rather than putting an actual probability estimate on them.

In theory, though, we could apply the same statistical technique to baseball in the future. It's just a matter of finding the specific set of stats that tends to track most closely with the way HoF voters have historically behaved.

FSN: Did you talk at all to actual Hall of Fame voters in the process of devising this?

PAINE: No, it's all done through a statistical model. As a group, the voters actually tend to exhibit very predictable decision-making patterns when it comes to whether a given player is deemed Hall-worthy, and those patterns align closely with just a handful of career stats and accomplishments. Each individual voter may differ slightly, and take more or less information into account, but when taken in the aggregate their behavior can be predicted quite accurately.

FSN: How long have you been calculating the probabilities? Have you been successful?

PAINE: It's been eight years now, if memory serves. We added the feature pretty early on in Basketball-Reference's lifetime. In terms of success, it's always tough to say whether a probability was accurate or not after the fact, but our model would correctly classify 83 percent of actual Hall of Famers as such, and would correctly "deny" 98 percent of actual non-Hall of Famers.
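
The two percentages Paine cites are what statisticians call sensitivity (the share of actual Hall of Famers the model flags) and specificity (the share of non-Hall of Famers it correctly denies). Here is a minimal Python sketch of how such rates could be computed. The function name, the 50 percent cutoff and the toy data are all assumptions for illustration; the article does not say what cutoff Basketball-Reference used.

```python
def classification_rates(inducted, probabilities, cutoff=0.5):
    """Return (sensitivity, specificity) for model probabilities.

    `cutoff` is an assumed threshold: a player counts as "predicted in"
    when his probability is at least this high. Both groups (inducted and
    not inducted) must be present, or the divisions below will fail.
    """
    predicted = [p >= cutoff for p in probabilities]
    pairs = list(zip(inducted, predicted))
    true_pos = sum(1 for actual, pred in pairs if actual and pred)
    false_neg = sum(1 for actual, pred in pairs if actual and not pred)
    true_neg = sum(1 for actual, pred in pairs if not actual and not pred)
    false_pos = sum(1 for actual, pred in pairs if not actual and pred)
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Toy data, invented for illustration: four inductees, six non-inductees.
toy_inducted = [True, True, True, True, False, False, False, False, False, False]
toy_probabilities = [0.99, 0.95, 0.80, 0.30, 0.02, 0.05, 0.10, 0.01, 0.20, 0.60]

sensitivity, specificity = classification_rates(toy_inducted, toy_probabilities)
print(sensitivity, specificity)
```

In this toy set the model misses one of the four inductees and wrongly admits one of the six others, so both rates fall short of 100 percent, just as the real model's 83 and 98 percent do.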

FSN: What kind of feedback do you get about it, if any?

PAINE: We get a lot of people thanking us for building the model, but we also get many critiques, where fans will ask why we did or didn't include a certain variable in the final model. Right now, the most common complaint is that MVP voting is not a predictor in the current model, which is why you see Steve Nash's probability as just 56 percent. Of course, you and I know that no MVP winner has ever been denied Hall of Fame enshrinement, but the computer doesn't – it just looks for strong relationships in the data that it knows are not due to random noise. There have been so few MVPs over the years, and so many other, stronger factors that overlap with the tendency to vote in MVPs, that the model couldn't say for sure that MVP voting was a significant factor in predicting Hall voting. The good news, though, is that we can re-run the model with new data from the past few years, and things might change. MVP voting might appear to be a significant factor again.

Since 2005, I'd say the only real misses among players whose cases weren't either mitigated by the ABA (Artis Gilmore, Mel Daniels) or a great college career (Ralph Sampson), neither of which the model can account for, are Reggie Miller and Chris Mullin. The model did not expect them to be inducted, although I'd argue that they are outliers, representing major departures from the voters' usual patterns.

FSN: The variables you use in the regression are interesting. Most I'd expect, but height and last season played seem slightly more interesting than the rest. How did you choose to include those, and were their effects what you expected them to be?

PAINE: Justin used a model-building technique that either adds or subtracts variables until it comes up with the set of variables that are statistically significant and, at the same time, actually add to the model's predictive value. Most of the relationships make sense – scoring, rebounding, and assists are good, as are All-Star nods and championships won – and the others are in there to represent voter behavior. For whatever reason, it was quite a bit easier to get into the Hall as a player who retired before 1960, which needed to be accounted for. You see this in many sports, where earlier players are highly over-represented in the Hall of Fame, relative to modern ones. Height is significant because voters tend to give smaller players a statistical "handicap", whereby they need less production than taller players to achieve the same level of HoF probability.
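
The add-or-subtract procedure Paine describes is generally known as stepwise selection. The Python sketch below is a deliberately simplified, forward-only version: it only adds variables, and it judges them with whatever scoring function it is handed rather than with significance tests. The function name, the `score` callback and the toy "usefulness" numbers are all hypothetical, and this is not Kubatko's actual procedure.

```python
def forward_selection(candidates, score, min_gain=0.0):
    """Greedily add whichever variable most improves score(selected).

    `score` is a caller-supplied function that takes a list of variable
    names and returns a number where higher is better. Selection stops
    when no remaining variable improves the score by more than `min_gain`.
    """
    selected = []
    remaining = list(candidates)
    best = score(selected)
    while remaining:
        # Score every model formed by adding one more variable.
        trials = [(score(selected + [name]), name) for name in remaining]
        top_score, top_name = max(trials, key=lambda trial: trial[0])
        if top_score - best <= min_gain:
            break  # nothing left adds enough predictive value
        selected.append(top_name)
        remaining.remove(top_name)
        best = top_score
    return selected


# Toy scoring rule, invented for illustration: each variable contributes a
# fixed amount of "usefulness", minus a penalty of 1 per variable included.
toy_usefulness = {"all_star_selections": 8, "points_pg": 5, "jersey_number": 0}


def toy_score(names):
    return sum(toy_usefulness[name] for name in names) - len(names)


print(forward_selection(toy_usefulness, toy_score))
```

With these toy numbers the loop keeps the two informative variables and leaves out the useless one, because adding it would cost more than it contributes. A real stepwise routine would also go back and drop variables that stop pulling their weight once others are in the model.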

FSN: The model was devised using a pool of 750 players. How did you choose the 750? 

PAINE: That group was chosen because it represented all players in history who had at least 400 career NBA games and had been eligible for at least one Hall of Fame election. Four hundred games was chosen because almost every Hall of Famer who played in the NBA appeared in at least that many games. The only exceptions are guys like Maurice Stokes and Drazen Petrovic, who had their careers cut short for tragic reasons and therefore wouldn't really be representative for the sample we were trying to assemble.

FSN: Have there been any results/predictions that the model yielded that surprised you? Any players who ended up with a much higher probability than you'd expected? Much lower?

PAINE: Nash's low probability in the current model is always surprising at a glance, although it makes sense when you realize MVP voting isn't a factor. … There are also a number of active players whose probabilities seem quite a bit higher than they should: Chris Bosh (89 percent), Vince Carter (78 percent), Tracy McGrady (58 percent). I'd be shocked if any of those three ever came anywhere close to the Hall. But, fortunately, those are the model's exceptions, not the rule.


Follow Joan Niesen on Twitter.