LOVE, WAR & the great MVP debate

Just a month or so ago, it seemed that Mike Trout would cruise to another Most Valuable Player award. Trout’s Angels were in first place, and he was quite obviously the best player in the American League. Again.

Now, though? Not so much.

Monday, Owen Watson made the case for Josh Donaldson, who seems to have somehow become an even better player since joining the Blue Jays.

Now, maybe you’re like me and maybe you’re not, but when I start thinking about something like this, the first thing I do – because it’s easy, and because I’m generally as lazy as the next fellow – is look up the players’ Wins Above Replacement on FanGraphs. I mean, it’s really easy: from the landing page, if you click on Leaders, the result is major leaguers ranked by WAR. You can then sort by league if you want, but that’s not really necessary since the top three are Bryce Harper, Trout and Donaldson. Or Donaldson and Trout, depending on which day you’re clicking.

Of course you don’t have to rely on FanGraphs, and I wouldn’t. There’s also Baseball-Reference.com’s version of WAR, and of course Baseball Prospectus has its own, too. I just begin with FanGraphs because it’s the quickest and I’m the laziest.

My biggest "problem" with WAR is that a) there’s not a single version I trust more than others, and b) I just can’t stand the acronym because the older I get, the less tolerance I have for violence. A few years ago, I half-heartedly suggested a new metric that would combine the various versions and be called Wins+.

Of course that didn’t go anywhere. But earlier this week, Joe Posnanski, in the midst of a much larger discussion about statistics and MVP voting, echoed my concerns:

OK, now we get to the two WARs. I have gone back and forth about there being two distinct versions of WAR. For a while there, I hoped that they would come together a bit more — it seemed to me that it didn’t help the credibility of either version of the statistic to have such divergent results.

Now, I think differently — I think the two statistics should break apart even more. And I think they should have different names. I understand, as Tom Tango says, that they are just two different methods for the same framework (and technically they have different names, fWAR and bWAR). But I don’t completely buy it. I think they take two different approaches to valuing players, especially pitchers (bWAR looks at run prevention, fWAR looks at strikeouts, walks and homers). They value defense somewhat differently. I now believe they should break apart, become two completely different statistics. One can be called WAR. The other can be called PEACE (Price Effective Above Common Earthling).

Hey, I think I’m going to do that. Baseball Reference WAR will still be called WAR. Fangraphs will be known as PEACE.

But what about Baseball Prospectus? They’ve gotta be in the mix, too! So here’s my brilliant new idea: Combine all three versions and name (with a nod to Joe) our new metric Leveraged Outstanding Value Earthling. It’s better because it’s more inclusive and it’s got one fewer character for those column headings!

So here are your major-league LOVE leaders, as of Wednesday morning:

8 Bryce Harper
7 Mike Trout
7 Paul Goldschmidt
7 Josh Donaldson
6 Manny Machado

Oh, I almost forgot one more thing about LOVE: I round them to the nearest integer, for the simple reason that 7.8 vs. 7.6 is a distinction without a difference; the tools simply aren’t precise enough to make 0.2 even worth mentioning.

Except I’m probably wrong to do this, in this particular discussion. What if you had two players whose LOVE were 7.6 and 7.4? Rounding would show them as being 1 LOVE apart (8 vs. 7), when the actual difference was far smaller (percentage-wise). So, my apologies. In this context, we probably do need our tenths. So here’s the same list but with (slightly) more accuracy:

7.7 Bryce Harper
7.3 Mike Trout
7.1 Paul Goldschmidt
6.8 Josh Donaldson
5.7 Manny Machado
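For the tinkerers, here's a rough sketch of the arithmetic. It assumes LOVE is a plain average of the three WAR flavors, which is my guess at what "combine" means and nothing official; the function names love and rounding_gap are made up for illustration. The only numbers in it are the LOVE figures listed above and the 7.6 vs. 7.4 example.

```python
from statistics import mean


def love(fwar, bwar, warp):
    """Hypothetical LOVE: a plain average of the three WAR versions.

    Averaging is an assumption; the article only says "combine."
    """
    return mean([fwar, bwar, warp])


def rounding_gap(a, b):
    """Return (gap in tenths, gap after rounding each value to an integer)."""
    true_gap = round(abs(a - b), 1)
    rounded_gap = abs(round(a) - round(b))
    return true_gap, rounded_gap


# LOVE to one decimal place, copied from the list above.
love_tenths = {
    "Bryce Harper": 7.7,
    "Mike Trout": 7.3,
    "Paul Goldschmidt": 7.1,
    "Josh Donaldson": 6.8,
    "Manny Machado": 5.7,
}

# Rounding each value to the nearest integer gives the first list.
love_rounded = {name: round(value) for name, value in love_tenths.items()}

# The 7.6 vs. 7.4 case: 0.2 apart in tenths, a full point apart once rounded.
print(rounding_gap(7.6, 7.4))
print(love_rounded)
```

Rounding turns Trout, Goldschmidt and Donaldson into a three-way tie at 7, even though Trout and Donaldson sit half a point apart in tenths. Meanwhile the hypothetical 7.6 and 7.4 pair, only 0.2 apart, lands on different integers.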

Again, all this really tells us is that even if we believe in LOVE, it’s hardly all we need. And if you believe MGL, we shouldn’t believe in LOVE at all. Not when we’re discussing MVP candidates, anyway:

So what is WAR good for and why was it "invented?" Mostly it was invented as a way to combine all aspects of a player’s performance – offense, defense, base running, etc. – on a common scale. It was also invented to be able to estimate player talent and to project future performance. For that it is nearly perfect. The reason it ignores context is because we know that context is not part of a player’s skill set to any significant degree. Which also means that context-non-neutral performance is not predictive – if we want to project future performance, we need a metric that strips out context – hence WAR.

But, for MVP discussions? It is a terrible metric for the aforementioned reasons. Again, regardless of how you define MVP caliber performance, almost everyone is in agreement that it includes and needs context, precisely that which WAR disdains and ignores. Now, obviously WAR will correlate very highly with non-context-neutral performance. That goes without saying. It would be unlikely that a player who is a legitimate MVP candidate does not have a high WAR. It would be equally unlikely that a player with a high WAR did not specifically contribute to lots of runs and wins and to his team’s success in general. But that doesn’t mean that WAR is a good metric to use for MVP considerations. Batting average correlates well with overall offensive performance and pitcher wins correlate well with good pitching performance, but we would hardly use those two stats to determine who was the better overall batter or pitcher. And to say, for example, that Trout is the proper MVP and not Cabrera because Trout was 1 or 2 WAR better than Miggy, without looking at context, is an absurd and disingenuous argument.

I don’t really disagree with any of this.

Much.

We do return to those hoary old discussions of the definition of "value," though. If a player hits 50 solo homers because his last-place team had the league’s worst on-base percentage, does that mean he didn’t have any value?

In the past, I’ve tried to square this circle by arguing that he still had a great deal of value; at the very least, 50 times that season, he gave his team’s fans something to cheer about.

Of course that’s an extreme example. MGL mentions a player who draws four walks in a game his team loses, 10-1. Did that player have any value in that game? No, not much. That one’s not so extreme, but then again no player’s going to have many games like that. I wouldn’t rely on LOVE, nor would I rely on (to name some metrics MGL mentions) WPA or WPA in winning games or some adjusted RE27 or whatever. Neither would Mickey. Ultimately, he simply argues that when making your MVP argument, "make sure you can justify your position in a rational, logical, and accurate fashion."

That last part’s easy, but "rational" and "logical" are both open to interpretation.

I happen to think WAR or LOVE are perfectly fine metrics ... but only at the beginning of the process. Hell, you gotta start somewhere. And there are lots worse places to start than Trout and Donaldson. But the ballot’s got spots for 10 players. So it’s not like you’re going to miss a great candidate because he’s only third on the LOVE list. Indeed, I would give Manny Machado a long look this season, and Goldschmidt in the other league.

And you know what? It seems that MVP voters have been doing exactly this sort of thing in recent years. Oh, I don’t know about WPA in winning games or whatever. But they are treating the MVP as a "story award" ... after winnowing down the list with WAR. Here’s Posnanski again:

What I found out, I must admit, is not especially interesting or surprising. By breaking down MVP voting over the last 50 years in several unscientific ways, I found that WAR has changed the voting in some ways but not in others. Brilliant, right? What I mean is, I don’t see WAR leaders becoming MVPs any more often than they did before the statistic gained favor. From 1975-1984, for example, 11 Fangraphs WAR leaders won the MVP award — and that was obviously decades before the statistic was even invented. Over the last 10 years, nine Fangraphs WAR leaders have won the MVP award. So I don’t think the WAR impact has been that direct.

But where I think WAR has entirely changed the landscape is in getting rid of quirky MVP winners. Since 2008, which is just about when WAR and similar complex statistics started to become mainstream, every single MVP has finished Top 5 in WAR. In fact, every single winner except Miguel Cabrera finished first or second in WAR. Cabrera finished either third or fourth the two years he won, so he wasn’t exactly an outlier either.

Today, it would just be too difficult for most voters to justify giving the things to Justin Morneau and Ryan Howard, as they did in 2006. MGL laments the use of WAR in these discussions, but it really does seem that the voters are using it well, or at least better than we might have expected (I’m still not sold on Cabrera over Trout, but whatever).

And getting back to the original question, what about Donaldson and Trout? FanGraphs tracks a ton of "context" metrics, and ... Well, now everything’s open to debate. Because Trout hasn’t been real "clutchy" this season ... but neither has Bryce Harper. You look at these numbers, and suddenly Donaldson’s looking better than Trout and Anthony Rizzo’s looking better than Harper. Or maybe, considering LOVE and CLUTCH, Goldschmidt’s looking better than everybody.

Everything counts, friends. But you still gotta decide what counts for what, and everybody’s going to come up with different answers.

Postscript: Speaking of different answers, Buster Posey’s on our list, too ... if you believe the pitch-framing metrics, which to this point only BP is using in its version of WAR. So when does everyone else start counting that, too?