Will the Red Sox get enough value from closer Craig Kimbrel?

Well, we have a pretty good idea of what Craig Kimbrel is worth at this point. Kimbrel – one of the prime contenders for “best active reliever in baseball” – has swapped his Atlanta address for sunny San Diego and then again for Boston, all within 2015. Boston paid a pretty price for the fireballer, sending four prospects to the Padres for the honor of having Kimbrel on their team. The Padres had previously sent Cameron Maybin, Matt Wisler and Jordan Paroubeck (and oh yeah, Carlos Quentin, who was immediately DFAed) to the Braves for Kimbrel, along with the most important piece of the deal, relieving the Braves of the contract of Melvin Upton Jr. (né B.J.).

Apparently, at least two Major League general managers think that Kimbrel is worth a lot. And the Twitterati reaction was fast and severe in both cases. The usual lines emerged. Closers are the mashed potatoes of a Major League roster. They taste so good, but provide such limited nutritional value for the calories. At the time of Kimbrel’s deal from the Braves to the Padres, because Upton was considered to be dead weight, it was calculated that the Padres had essentially signed Kimbrel to a contract in the neighborhood of $16.5 million per year. Not bad for a guy who works only 65 innings a year. Now the Red Sox had gone and traded for the same guy and given up four guys to do it. This was obviously a classic case of a team overpaying for saves.

Or was it?

There’s no question that Kimbrel makes the Red Sox a better team, but is this an efficient use of resources on the part of the Red Sox to trade prospects for a closer?

Warning! Gory Mathematical Details Ahead!

Trying to determine whether a closer deserves a salary anywhere near the best starting pitchers always starts with an old argument. The starter will pitch 200 innings, and the closer only 65, but the closer will pitch them in really important situations in the ninth inning. In statistical terms, it’s a discussion of the fact that relievers – at least in the modern bullpen – have a job description that is based largely on leverage. Perhaps it’s the only job in baseball which is entirely contingent on leverage.

At this time, I would invite everyone to insert their favorite critique about the inefficiency of the design of the closer role. Yes, teams undervalue tie game situations. Yes, there’s a case to be made to develop closers who pitch more than just one-inning stints. Yes, the three-run save is only slightly less difficult than the Wes Littleton 27-run save. But despite all of these inefficiencies, there’s a good case to be made that the closer is essentially a creature of the leverage index. He does pitch in some very important ninth innings. While starters are inserted at the beginning of a game, before the shape of the game has been revealed (it might be a nail-biter or a laugher) and a hitter can only come to bat when it’s his turn, the closer knows that he’s not going out there unless it’s a high-tension moment. What a closer does (or doesn’t do) over the course of a season will have a much greater effect on how many games his team wins than if he had done the exact same things in the fifth inning. Or if he were inserted into just random innings in a game. We need to correct for that.

In general, we tend to view player transactions through the lens of WAR. WAR, by its very nature, seeks to strip the context out of a player’s results, although the major WAR indices are all aware that for relievers, that’s a little silly. In general, we find that in WAR for relievers, there is an adjustment made so that a pitcher’s WAR is inflated by a factor that is halfway between 1.00 (average leverage) and the average leverage that he faced in the games that he threw. If he normally faced a leverage value of 2.00, his WAR (for his pitching components) would be inflated by a factor of 1.5. The problem here is that while closers do pitch in 40-50 save situations each year, they also pitch in games where they are just getting work in or are filling an inning. (Here’s Kimbrel’s game log for last season. There are a few decidedly non-save situations.) Those “extra” innings aren’t really what teams are paying closers for, and they are generally low-leverage, but they mean that the pitcher’s “average” leverage will decrease. In other words, WAR under-values closers, even with its adjustment, and here I don’t think it’s a good metric for what we really want to measure. Closers are hired to save games, even if the save stat is silly.
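To make that adjustment concrete, here is a minimal sketch of the halfway-to-average multiplier described above. The function names are mine, the innings-weighted average is a simplifying assumption, and the inning counts and leverage values for the "full season" are made up purely to show how low-leverage work drags the multiplier down.

```python
def leverage_multiplier(avg_leverage):
    """Halfway between 1.00 (average leverage) and the pitcher's average leverage."""
    return (1.0 + avg_leverage) / 2.0


def average_leverage(stints):
    """Innings-weighted average leverage over (innings, leverage) pairs.

    Weighting by innings is an assumption made for this sketch.
    """
    total_innings = sum(innings for innings, _ in stints)
    return sum(innings * li for innings, li in stints) / total_innings


# The example from the text: a pitcher who normally faces leverage of 2.00.
print(leverage_multiplier(2.0))  # 1.5

# Hypothetical closer: 45 innings in save situations at leverage 2.0,
# plus 20 "getting work in" innings at leverage 0.5 (made-up numbers).
save_only = [(45, 2.0)]
full_season = [(45, 2.0), (20, 0.5)]

print(leverage_multiplier(average_leverage(save_only)))
print(leverage_multiplier(average_leverage(full_season)))
```

With these invented inputs, the multiplier falls from 1.5 on the save situations alone to roughly 1.27 once the filler innings are averaged in, which is the dilution the paragraph above is complaining about.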

Instead, I’d suggest that we need to focus our evaluations of closers on save situations and think of closers in terms of the win probability that they are (sorta) responsible for. Even at a more base level, we can look at a simple stat, blown saves. We know that a save that is turned into a blown save means that a team’s win probability goes from 100 percent to 0 if the blown save is a loss. And we’ll assume that it ends up at 50 percent if the closer simply lets the game be tied on his watch, rather than saving the game. Using this sort of framework, can we make a case for Kimbrel or any other elite closer as worth a major investment?

(Note: There’s probably a case to be made that adding Kimbrel has secondary value in that Koji Uehara can move into the eighth inning, and it means one more good arm in the bullpen so that John Farrell might feel more comfortable lifting a tired starter earlier… before he gets into trouble. It’s likely that this value is positive, but we’re going to ignore that for now.)

One problem with declaring Kimbrel “worth it” is the idea that we have to hold a certain skepticism that Kimbrel is actually an elite-level pitcher. It’s a strange thing to say, but relief pitchers tend to face a few hundred batters a year, far fewer than is generally accepted as a reliable sample for many of our key performance metrics for pitchers. Kimbrel has built up enough of a sample over the last few years that we can feel pretty good about his bona fides, but that’s a big problem with relievers in general. Because there can be huge swings in a small sample size, it’s impossible to know whether Kimbrel will give the Boston faithful a Pavarotti concert on the mound this season or merely a Foo Fighters’ one. And perhaps some other pitcher will simply have an amazing (small sample fluke) season and suddenly spending all of those resources on Kimbrel will seem a little silly.

And then there’s the argument that most competent relievers could handle a save situation. Most relievers get through single innings unscathed and the only thing that separates a closer from a “regular” reliever is that the closer does it in the ninth rather than the seventh, and for his work, he gets an “S” after his name. Seems a shame to give him all the credit when two other relievers also pitched in with scoreless innings. Saves are about when you pitch, rather than how well you pitch.

Indeed, here’s a chart showing “save” rates, meaning how often teams that enter the seventh, eighth, and ninth innings ahead by three or fewer runs preserve the lead.

While ninth-inning pitchers (usually the closer, although that could be a starter finishing a complete game) are usually pretty good at protecting small leads, they are only about 1 percentage point better than their eighth-inning brethren and a little more than 2 percentage points better than the seventh-inning guys. Sure, every little bit helps, but – in theory, anyway – cloning the seventh-inning guy and sticking him in the ninth inning (so the rest of the bullpen isn’t affected) would only result in an extra 2.5-percentage-point chance of a blown save. The average team has about 40 save situations per year, so he would blow one extra save per year. Figuring that a blown save might turn a win into either a tie game or a straight-up loss, that one blown save is actually worth around .75 wins (if we assume an even split between the two and that dropping into a tie is worth a “half” win).

I think these sorts of comparisons miss the point though. We can look at the aggregate results, but miss the variation within those groups. Using data from 2010-2014, I looked at all situations in which a reliever entered the ninth inning with a “save” opportunity (his team was leading, but by fewer than 3 runs), and looked at how often he converted those saves (got three outs and did not cough up the lead). Among the thirty most common “closers” (had the most save opportunities) in each year, performances varied from Jose Valverde’s perfect record in 2011 to Heath Bell’s 2012 season in which he had a 63 percent success rate. A good performance from a closer is around a 90 percent success rate. A bad one is around 80 percent.

These are all anointed closers and yet performance can vary widely. The difference between an 80-percent closer and a 90-percent closer would be about 4 blown saves for the average team. Again, assuming an even split between wins that turn into losses and wins that turn into ties (which we will value at half a win), that’s a swing of 3 games in the standings. Yikes!
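The blown-save arithmetic used so far can be sketched in a few lines. This is only a restatement of the assumptions already on the table (about 40 save situations, an even split between blown saves that become losses and those that become ties, and a tie valued at half a win); the function and parameter names are mine.

```python
def blown_save_cost(loss_share=0.5, tie_win_prob=0.5):
    """Expected wins lost per blown save.

    A blown save that becomes a loss costs a full win; one that leaves the
    game tied costs (1 - tie_win_prob). loss_share is the fraction of blown
    saves that end as outright losses (the text assumes an even split).
    """
    return loss_share * 1.0 + (1.0 - loss_share) * (1.0 - tie_win_prob)


def wins_lost(save_opps, rate_gap, **kwargs):
    """Return (extra blown saves, wins lost) for a given gap in save rate."""
    extra_blown = save_opps * rate_gap
    return extra_blown, extra_blown * blown_save_cost(**kwargs)


# Seventh-inning clone vs. the typical ninth-inning pitcher: 2.5-point gap.
print(wins_lost(40, 0.025))

# An 80-percent closer vs. a 90-percent closer: 10-point gap.
print(wins_lost(40, 0.10))
```

The first call works out to about one extra blown save and three-quarters of a win; the second to about four blown saves and three wins. Changing `loss_share` or `tie_win_prob` shows how sensitive those figures are to the even-split assumption.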

Suddenly, we really want to make sure that we have one of the good ones in the back end of our bullpen. There’s just one tiny little problem. Is save rate (and here, I’m just using saves / save opportunity, with no adjustment for difficulty) a reliable stat, particularly in a sample size like 40 save opportunities in a season? I used the Kuder-Richardson reliability formula that I have used elsewhere to answer the question. Even at a sample of 40 save opportunities, the reliability for save rate barely budged above zero. Remember how we used to think that all pitchers were league average when it came to BABIP and that any variations were the product of luck? This is the same sort of finding. Among closers, there’s so much noise around performance that it’s nigh impossible to tell (using save rate anyway) who’s a good closer and who’s not.
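For a rough feel for how much noise that is, here is a back-of-the-envelope check. It is not the Kuder-Richardson calculation; it simply treats every save opportunity as an independent coin flip with a fixed success probability (a simplifying assumption) and uses a hypothetical "true" 85-percent closer, halfway between the good and bad seasons described earlier.

```python
import math


def save_rate_sd(true_rate, opportunities):
    """Standard deviation of an observed save rate under a binomial model."""
    return math.sqrt(true_rate * (1.0 - true_rate) / opportunities)


# Hypothetical closer whose true talent is an 85-percent save rate,
# observed over 40 save opportunities.
sd = save_rate_sd(0.85, 40)
print(round(sd, 3))  # 0.056
```

Under those assumptions, one standard deviation is about 5.6 percentage points, so an 80-percent season and a 90-percent season both sit within a single standard deviation of the same underlying talent.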

There’s an important mistake that people often make in this sort of argument, mistaking the unreliability of our current measures for the impossibility of their ever being measured. Let’s for a moment acknowledge what we know to be true. There are differences between relief pitchers in how good they are at their jobs. There’s Saint Mariano on one end of the continuum and Joe “Oh God” Borowski on the other. (At least that’s what I said every time he came into a game in 2007 for the Indians.)

We acknowledge that there’s a lot of noise, but “a lot of noise” doesn’t mean that there’s absolutely no signal in there. That was the mistake of hardcore DIPS theory. A lot of noise means that there would be years when Borowski was better than Rivera at doing their shared job, but of course, when I tell you that I’d rather have Rivera closing for me than Borowski, you’d understand why.

It’s possible that Craig Kimbrel next year would have an 80-percent save percentage season just by dumb luck. Maybe he’ll have a 90-percent season (or better!), and there’s a lot riding on that small sample size (again, we might assume three games in the standings between those two points). Now it’s just a matter of how confident you can feel in your ability to discern the signal of which bin he’s likely to end up in. We don’t know which way luck will break, but if we can either do some #GoryMath or use all that touchy-feely stuff, and figure out (or guess) that Kimbrel has a 33-percent better chance of ending up in the “good bin” than the bad bin, then he’s worth about a win more than a replacement-level closer, even setting aside that he’s likely better than the other seventh- or eighth-inning guys that the Red Sox could have gotten. And probably Koji Uehara. Suddenly, valuing him at $5-7 million more per year than the average closer makes a little more sense.
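One way to read that arithmetic, sketched below: treat the 33 percent as extra probability of landing in the good bin relative to the comparison closer, and multiply it by the three-win gap between the bins. That interpretation is an assumption on my part, as are the names in the code.

```python
# Swing in the standings between an 80-percent and a 90-percent season,
# as estimated earlier in the article.
WINS_BETWEEN_BINS = 3.0


def expected_edge_in_wins(prob_edge, wins_between_bins=WINS_BETWEEN_BINS):
    """Expected extra wins from being prob_edge more likely to land in the good bin."""
    return prob_edge * wins_between_bins


print(round(expected_edge_in_wins(0.33), 2))  # 0.99
```

That is where "about a win" comes from. A smaller edge scales the figure down proportionally, so the conclusion leans heavily on how confident you are in that 33 percent.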

The Problem Is Those Error Bars

Looked at from a probabilistic standpoint, Craig Kimbrel is probably a pretty good investment. There’s a general agreement that he’s probably one of the best relievers in the game, which means he’s probably one of the best bets to put up a good save conversion rate, and that’s worth a lot if he’s able to do it. The problem is that unlike signing a third baseman or a shortstop to stick in left field, where the results would be much more predicta… oh right… well, position players do tend to be more secure bets. The Red Sox essentially bought a high-volatility asset, but one with significant upside. And while it’s easy to critique them for placing such a risky bet, we have to remember that, despite being the ****ing Red Sox, they do not have infinite resources, nor do they have limitless opportunities to nab players that have the potential to bring them three wins’ worth of value.

The Red Sox paid a steep price for Kimbrel, and they’ll pay him a salary of $11.25 million this year, and $13 million in the two seasons after that. And they might get burned on the whole deal. But to win in baseball, you sometimes have to live with some high-variance strategies. Suppose that a team felt that it had a roster with 84-win talent and was choosing between a player who was a guaranteed 2-win upgrade and one who might bring them nothing or might bring them four wins. Eighty-six wins probably doesn’t secure a playoff spot, but 88 might, and it’s only by signing the higher-variance guy that they have a chance to make the playoffs. All playoff teams get a little lucky. The trick is structuring your team so that if the lucky wind blows your way, your sails are positioned in such a way that it will really move you.
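The 84-win thought experiment can be laid out as a toy calculation. The 50/50 split between the risky player's two outcomes and the hard 88-win playoff cutoff are assumptions made for illustration; the text only says 88 wins "might" be enough.

```python
def expected_wins(base_wins, outcomes):
    """Probability-weighted win total over (probability, added_wins) pairs."""
    return sum(p * (base_wins + added) for p, added in outcomes)


def prob_reaching(base_wins, outcomes, threshold):
    """Probability that the final win total meets or beats the threshold."""
    return sum(p for p, added in outcomes if base_wins + added >= threshold)


BASE = 84        # roster talent from the example
THRESHOLD = 88   # assumed playoff cutoff

safe = [(1.0, 2)]               # guaranteed 2-win upgrade
risky = [(0.5, 0), (0.5, 4)]    # assumed coin flip: nothing or four wins

for name, option in (("safe", safe), ("risky", risky)):
    print(name, expected_wins(BASE, option), prob_reaching(BASE, option, THRESHOLD))
```

Both options average out to 86 wins under these assumptions, but only the risky one ever reaches the cutoff. That is the whole case for tolerating variance when the prize sits above your expected win total.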

The Red Sox want to contend, so this move makes sense in the way that a lot of prospects-for-veterans trades make sense. The Red Sox probably overpaid in terms of prospects (everyone does) but let’s not undervalue what they got back in the form of three years of, likely, the best closer in baseball at what looks to be a reasonable salary. The problem is those error bars.