Deserved Run Average: A better way to determine the Cy Young winners

We are rapidly approaching awards season, when writers and fans alike decide which player has been "the best" in his respective category.

The Cy Young Award has been given since 1956, and is widely viewed as a definitive benchmark for an elite pitching career. The award is conferred by the Baseball Writers' Association of America (BBWAA), different members of which are assigned to vote for each of the BBWAA's various awards.

How does one best pick a Cy Young winner? Not surprisingly, there is no consensus. The most commonly expressed view is that while a pitcher's statistics are important, writers need to consider a variety of other factors, including things that statistics might not capture. Factors cited by writers have included a pitcher's perceived strength of schedule, the quality of the defense behind him, the stadiums in which he competed, and more ephemeral indicators like "wins" or "leadership."

While no one statistic can capture everything, I do think we now have at least one statistic that captures most factors that are legitimately considered by a Cy Young voter. That statistic is Deserved Run Average (DRA). DRA was introduced earlier this year at Baseball Prospectus, and has been continually refined since then.

DRA is unique among pitching statistics in that it is explicitly designed to address the two things most important to a conscientious Cy Young voter: 1) How well did a pitcher actually pitch? 2) How much was a pitcher unfairly penalized by factors beyond his control?

How well did the pitcher actually pitch?
Traditionally, analysts have focused on a pitcher's ERA. It's not a terrible place to start, but it increasingly seems like an obsolete one.

The problems with ERA are well known. The decision to assign errors rests in the discretion of each team's home scorer, and the rules that declare runs "earned" and "unearned" are arcane. One can work around this by using raw runs allowed per nine innings, but 1) virtually nobody does this, and 2) both earned and unearned runs are too blunt an instrument by themselves to tell us enough about what the pitcher, himself, was responsible for.

DRA addresses this in part by focusing on "run expectancy," rather than the actual runs that happened to cross the plate in each game. In plain language, DRA looks at the average effect on scoring of various types of on-base events rather than whether each individual runner actually scored. This takes a lot of luck out of the equation, and focuses (correctly) on how many bases the pitcher actually gave up to batters.

Thus, if a pitcher gives up a lot of doubles and home runs, DRA expects him to give up more runs. If a pitcher is primarily giving up singles and other forms of weaker contact, DRA expects him to give up fewer runs. By using DRA, analysts no longer need to speculate about whether a pitcher "stranded more runners than expected" or "seemed to have bad luck at the wrong times." DRA compensates for much of this and does so in a consistent way for all pitchers.
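
To make the idea concrete, here is a minimal sketch of scoring a pitcher by the average run value of the events he allowed rather than by the runs that actually crossed the plate. The run values below are rough, illustrative figures in the spirit of published linear-weights tables; they are assumptions for this example, not the values or the model DRA itself uses, and the two pitchers are hypothetical.

```python
# Illustrative average run value of each event, relative to league average.
# These figures are assumptions for this sketch, NOT DRA's internal values.
RUN_VALUES = {
    "single": 0.47,
    "double": 0.77,
    "triple": 1.04,
    "home_run": 1.40,
    "walk": 0.31,
    "out": -0.27,
}


def expected_runs_above_average(events):
    """Sum the average run value of every event a pitcher allowed.

    `events` maps an event name to how many times it happened.
    A higher total means the pitcher "deserved" to allow more runs.
    """
    return sum(RUN_VALUES[name] * count for name, count in events.items())


# Two hypothetical pitchers: each faced 100 batters and allowed 25 hits.
singles_pitcher = {"single": 22, "double": 3, "home_run": 0, "walk": 5, "out": 70}
power_pitcher = {"single": 12, "double": 8, "home_run": 5, "walk": 5, "out": 70}

for label, events in [("mostly singles", singles_pitcher),
                      ("doubles and homers", power_pitcher)]:
    print(f"{label}: {expected_runs_above_average(events):+.2f} runs vs. average")
```

Both pitchers allowed the same number of hits, but the one who gave up more extra-base hits is expected to allow more runs, regardless of how many runners happened to be stranded.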

As a result of this, when it comes to past performance, DRA better estimates run expectancy than any other sabermetric statistic. Our testing indicates that DRA can explain about 70 percent of the runs pitchers allowed on their watch. Fielding Independent Pitching (FIP), another fine statistic, can explain only 50 percent of those runs. Other useful statistics, such as xFIP and SIERA, explain even less. FIP, xFIP, and SIERA all have their uses in predicting future performance, but the Cy Young Award is about how well a pitcher already pitched, and in that respect, DRA's performance is unmatched in publicly released statistics.
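
One common way to put a number on "explains X percent" is the squared correlation between a metric and the runs pitchers actually allowed. Whether that is exactly the test behind the figures above is an assumption on my part; the sketch below only shows how such a comparison could be run. The function name and the sample numbers are made up for illustration and are not real pitcher data.

```python
def share_explained(metric, actual):
    """Squared Pearson correlation between a metric and actual runs allowed.

    Returns a value between 0 and 1; closer to 1 means the metric tracks
    the actual run totals more closely.
    """
    n = len(metric)
    mean_m = sum(metric) / n
    mean_a = sum(actual) / n
    cov = sum((m - mean_m) * (a - mean_a) for m, a in zip(metric, actual))
    var_m = sum((m - mean_m) ** 2 for m in metric)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    return cov * cov / (var_m * var_a)


# Made-up numbers purely to exercise the function; not real pitcher data.
actual_ra9 = [3.0, 4.0, 5.0, 6.0]
close_metric = [3.1, 3.9, 5.2, 5.8]   # tracks actual runs closely
loose_metric = [4.0, 3.5, 5.5, 4.5]   # tracks actual runs only loosely

print(f"close metric: {share_explained(close_metric, actual_ra9):.0%}")
print(f"loose metric: {share_explained(loose_metric, actual_ra9):.0%}")
```

Run over a full season of pitchers, a comparison like this is what lets you say one statistic describes past run prevention better than another.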

Was the pitcher unfairly penalized by external factors?
There is no question that factors like a pitcher's ballpark, quality of defense and strength of schedule affect his ERA. Pretending otherwise, which is what relying solely on ERA would do, is neither realistic nor fair. Thus, writers are quite correct to look behind a pitcher's ERA (or RA9) and analyze contributing factors beyond the runs that were charged to each pitcher.

The problem is that it is impossible for any human being (certainly including me) to individually make those adjustments in a way that is both accurate and fair to all pitchers. Very few of us watch every game, and no one has the capability to remember every play and the complete effect of every player even if he/she did watch every game. Therefore, we end up making ad hoc adjustments that might be based on good ideas, but end up arbitrary. For example, in 2013, Hisashi Iwakuma might have been very good, but he admittedly also pitched in a pitcher-friendly park and got to face the Astros a lot. Many writers concluded that "some" of his sparkling ERA was undeserved. The definition of "some" naturally would vary from writer to writer, to the extent any precise discount was settled upon at all. Similarly, an Indians starter (take your pick, really) in recent years has played in a batter-friendly park and has generally had a poor defense behind him, so a writer could just assume his ERA should have been better to some undetermined extent.

Again, DRA comes to the rescue. By putting all events for a given season into a self-contained model, DRA can and does control for these factors, and most importantly, it controls for them in a consistent way across all pitchers. That doesn't mean that you have to fully agree with (or even completely understand) how DRA makes these adjustments. What matters above all is that, by applying the same standard to all pitchers, DRA allows these adjustments to be consistent.

The good news is that DRA accounts for virtually all of the factors that researchers have shown affect pitchers the most. Good catcher framing can save or cost a pitcher six or more runs in a season. Bad defense can have a similar effect. A pitcher's mix of stadiums over a season can affect run-scoring by over 20 runs. It's also much easier to be a reliever than a starter, which means relievers deserve to give up more runs than they actually do. Finally, DRA controls for pitchers who have had the benefit of pitching at home, the benefit (or disadvantage) of pitching in certain temperatures, and the strength of the batters each pitcher has faced.

All of this information is broken down for each pitcher in our DRA Runs table at Baseball Prospectus (currently, from 1998 to the present). This means there is no longer any need to speculate on the likely effect of a pitcher's ballparks, his quality of opponents, or how much better he would have been with a different defense. DRA estimates all of that for you. (Because pitchers are trying to prevent runs, remember that pitchers with negative estimates in a column have benefitted from that factor, while those with positive run values have been hurt by it.)

Most importantly, although the DRA Runs table estimates how many runs a pitcher gained or lost from each factor, DRA ultimately gives you what you want: a Deserved Run Average for each pitcher, right alongside his ERA so you can compare the two. DRA is scaled to Runs Allowed per nine innings (RA9), rather than Earned Run Average (ERA), but that distinction ultimately makes no difference, because lower is still better. What DRA tells you is our best estimate of how many runs each pitcher deserved to give up. And the question during Cy Young consideration is exactly that: Which pitcher has been the most deserving of baseball's highest pitching award?
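
For readers who want the scale spelled out, here is a short sketch of the two conventional rate statistics mentioned above. Both divide runs by innings and multiply by nine; the only difference is whether unearned runs are counted. The function names and the sample pitcher are hypothetical.

```python
def ra9(runs_allowed, innings_pitched):
    """All runs allowed (earned and unearned) per nine innings."""
    return 9 * runs_allowed / innings_pitched


def era(earned_runs, innings_pitched):
    """Earned runs only, per nine innings."""
    return 9 * earned_runs / innings_pitched


# Hypothetical pitcher: 200 innings, 70 total runs, 62 of them earned.
innings, total_runs, earned_runs = 200, 70, 62

print(f"RA9: {ra9(total_runs, innings):.2f}")
print(f"ERA: {era(earned_runs, innings):.2f}")
```

Because RA9 counts every run, it can never be lower than ERA for the same pitcher, so a DRA will usually look a little higher than the ERA sitting next to it. The ordering of pitchers by DRA is what matters.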

I won't presume to tell people how they "must" cast their vote for a Cy Young Award, but I do think that the best process really should start with a pitcher's DRA, and in particular his Pitcher Wins Above Replacement Player (PWARP). PWARP is our calculation here at Baseball Prospectus of how much win value each pitcher ultimately brought to his team by virtue of his DRA and the number of innings he pitched. If you have a bugaboo about WARP a/k/a WAR, then just look at the best DRAs for qualified starters, and you'll end up in pretty much the same place.

Who would DRA vote for?
In the National League, as of Labor Day, DRA thinks the clear choice is Zack Greinke (2.00 DRA). Clayton Kershaw (who ranks second in DRA at 2.19) has a better FIP than Greinke, but we've already discussed why FIP is not the best choice for evaluating the quality of past performance. The two pitchers are certainly close overall, and Greinke has benefitted from several runs of superior catcher framing. But even taking that into account, as of now, Greinke still has suppressed hitters better than anyone else. Jake Arrieta, in third place, has had an outstanding season (2.30 DRA), but he has allowed hitters to be more productive than Greinke or Kershaw, and has benefitted from pitching in friendlier stadiums.

In the American League, DRA gives the nod to Sonny Gray, whose 2.49 DRA is the best in the league and one of the few bright spots for the Oakland A's this season. Even adjusting for the pitcher-friendliness of his parks this year, Gray has prevented runs at the best rate. Dallas Keuchel has been tremendous, but comes in at a clear second (2.73 DRA). Keuchel has benefitted from better catcher framing and equally friendly ballparks. David Price, with an excellent 2.79 DRA, comes in third.

I stress, as I always do, that others have every right to a different opinion, particularly if they, unlike me, have a Cy Young ballot. But I think the voting process would be more understandable, more consistent and probably of better quality if a pitcher's DRA were a strong component of writers' respective decisions.

Jonathan Judge is an author of Baseball Prospectus.