Sabermetric Mining – Hitter BABIP

Date: August 10, 2012
Original Source: Full Spectrum Baseball
Synopsis: This weekly Sabermetric Mining feature will look at a different statistic each week, explaining it and identifying how to leverage it for fantasy purposes. This time I took a look at BABIP for hitters, what components go into creating a baseline expectation for each player, and how the stat can be used by fantasy players.

You can forgive the screenwriters for not referencing the stat back in 1988, but “batted ball luck” as interpreted through the lens of Batting Average on Balls In Play (BABIP) is at the crux of the success and struggles of many baseball players.

“Know what the difference between hitting .250 and .300 is? It’s 25 hits. 25 hits in 500 at bats is 50 points, okay? There’s 6 months in a season, that’s about 25 weeks. That means if you get just one extra flare a week – just one – a gorp… you get a groundball, you get a groundball with eyes… you get a dying quail, just one more dying quail a week… and you’re in Yankee Stadium.” – Crash Davis, Bull Durham
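Crash’s arithmetic holds up. Here is a minimal Python check, assuming a .250 hitter means 125 hits in his 500 at-bats and that the season is the 25 weeks he cites:

```python
def batting_average(hits: int, at_bats: int) -> float:
    """Plain batting average: hits divided by at-bats."""
    return hits / at_bats


def fmt(avg: float) -> str:
    """Format an average the baseball way, e.g. 0.25 -> '.250'."""
    return f"{avg:.3f}".lstrip("0")


AT_BATS = 500
WEEKS = 25               # "about 25 weeks" in a six-month season
EXTRA_HITS_PER_WEEK = 1  # "just one more dying quail a week"

base_hits = 125          # assumption: a .250 hitter over 500 at-bats
extra_hits = WEEKS * EXTRA_HITS_PER_WEEK

before = batting_average(base_hits, AT_BATS)
after = batting_average(base_hits + extra_hits, AT_BATS)
print(extra_hits, fmt(before), fmt(after))  # 25 .250 .300
```

One extra hit a week really is the whole 50-point gap.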

It is certainly possible that our friend Crash was simply displaying sour grapes about being the type of player who sustains a low BABIP, but it is also possible that he was correct in attributing his low batting average to some degree of randomness. While we do not have access to Davis’ career numbers, aside from his 247 career minor league home runs, we do have access to this information for every active ballplayer. As such, if we can understand BABIP and how it impacts Batting Average (AVG), we can use the stat as a tool in our fantasy endeavors. For this article I will examine BABIP as it pertains to hitters, since BABIP is one of the core elements of batting average, a standard category in most fantasy leagues.

The Stat
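BABIP – As mentioned, the acronym stands for Batting Average on Balls In Play: the batting average a player posts once home runs and strikeouts are stripped out (sacrifice flies, which do not count as at-bats, are added back in as balls in play). The calculation is rather simple, and is as follows:

BABIP = (H - HR) / (AB - K - HR + SF)

where H is hits, HR is home runs, AB is at-bats, K is strikeouts, and SF is sacrifice flies.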
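The formula BABIP = (H - HR) / (AB - K - HR + SF) translates directly into code. This is a minimal sketch of that standard definition; the stat line in the usage example is hypothetical, not any real player’s:

```python
def babip(h: int, hr: int, ab: int, k: int, sf: int = 0) -> float:
    """Batting Average on Balls In Play: (H - HR) / (AB - K - HR + SF)."""
    balls_in_play = ab - k - hr + sf
    if balls_in_play <= 0:
        raise ValueError("no balls in play to measure")
    # Home runs come out of both numerator and denominator, strikeouts
    # come out of the denominator, and sacrifice flies are added back in.
    return (h - hr) / balls_in_play


# Hypothetical season: 150 H, 25 HR, 550 AB, 100 K, 5 SF.
line = dict(h=150, hr=25, ab=550, k=100, sf=5)
print(f"AVG   {150 / 550:.3f}")
print(f"BABIP {babip(**line):.3f}")
```

Here 125 non-homer hits over 430 balls in play works out to roughly .291, a bit below the .296 league figure for 2012, even though the batting average is .273.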


For reference, the league BABIP has fluctuated between .293 and .303 over the last decade and sits at .296 for 2012. So a player with a .350 BABIP might be the recipient of some luck and due for regression. However, players can also show a genuine ability to perform better or worse than the league average in this metric, so it is important to look at a few factors before deciding whether a player is due for regression.

Factors Affecting BABIP
Speed – Speed can impact BABIP in a very obvious way, as a fast player should be able to run out more hits than a slower player. On a chopper to third, for example, Michael Bourn may be safe by a step while David Ortiz is out by a mile.

Defense – If a batter hits a grounder into the hole, forcing the shortstop to range far to his right and make an off-balance throw, there is a good chance it goes for a hit. But what if the batter happens to hit that ball toward Brendan Ryan instead of Derek Jeter? The odds that a play is made certainly increase. The same goes for a shallow fly ball hit in the direction of a Ben Revere instead of a Lucas Duda. Over time, we can expect the effects of defense to “even out” so that a player’s BABIP will normalize. That said, since certain players and teams exhibit skill in the area of defense, a team like Oakland may expect a lower BABIP because it plays in a division with strong defensive teams, meaning at least 54 of its 162 games take place against teams that would be expected to limit opponent BABIP.

Batted Ball Profile – An infield fly is not going to get you a hit, except maybe in extremely windy conditions. A line drive, on the other hand, is tough for a defender to handle, and more often than not reaches its destination too fast for anyone to make a play on it. Fly balls, meanwhile, are easier to turn into outs than ground balls. All of these things seem logical and obvious, but for some time they were ignored in BABIP analysis. As batted ball data becomes more readily available, tools have been developed to identify a player’s expected BABIP (xBABIP) based on the types of balls he puts in play. A laser show like Joey Votto (30.2% line drive rate, LD%) should see a higher BABIP than fly ball machine Jay Bruce (45.2% fly ball rate, FB%) because he hits balls that are more difficult to field. Thus, we have to keep a player’s batted ball profile in mind when evaluating BABIPs that differ from the league average.

Luck – Luck is mostly what Crash Davis is referring to in the quote above, but you should probably think of it in terms of “randomness,” which is a more karma-neutral phrase. It is difficult to narrate or define randomness in terms of BABIP, except to say that things like defense, park factors, weather conditions, and other elements of randomness should, in theory, even out over time. Still, this might not happen over sample sizes as small as a single season, so we can think of randomness as our key indicator in trying to identify buy or sell candidates. If speed, defense, and batted ball profile have all been accounted for, mentally if not statistically, then any remaining gap between actual BABIP and the BABIP we would expect is ready to be exploited.

Career BABIP – This is not a factor affecting BABIP so much as a way of describing it, but it is a good element to keep in mind. Since each player establishes his own baseline BABIP, once he has been around long enough we can compare his current BABIP to his career level, on the assumption that the factors above have evened out over a multi-season sample. Thus, we can expect BABIP to regress to the player’s own mean rather than the league mean. For example, David Wright’s .372 BABIP looks like a strong candidate for regression, but probably not all the way to the .296 league average; his established career mark of .343 is the more sensible target.
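The idea of regressing toward a player’s own mean can be sketched in a few lines of Python. This is an illustration under stated assumptions: whether a player has “been around long enough” is left to the caller (passing None for the career mark means not enough history), and the Wright figures are the ones quoted in the paragraph above.

```python
from typing import Optional

LEAGUE_BABIP_2012 = 0.296  # league mark cited in this article


def regression_target(career_babip: Optional[float],
                      league_babip: float = LEAGUE_BABIP_2012) -> float:
    """Mean a BABIP should drift toward.

    Established players regress toward their own career mark; players
    without enough history (career_babip=None) fall back to the league mean.
    """
    return league_babip if career_babip is None else career_babip


def expected_change(current: float, target: float) -> float:
    """Positive means BABIP should fall; negative means it should rise."""
    return current - target


wright_now, wright_career = 0.372, 0.343
vs_league = expected_change(wright_now, regression_target(None))
vs_career = expected_change(wright_now, regression_target(wright_career))
print(f"gap to league {vs_league:.3f}, gap to career {vs_career:.3f}")
```

Measured against his career, Wright’s expected give-back is roughly .029 of BABIP rather than the .076 a league-average target would suggest.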

How to Use
With all of the factors I have mentioned that can impact BABIP, how can fantasy owners possibly weigh everything quickly to identify buy and sell candidates? Unfortunately, there is not yet a quick, readily available xBABIP stat on the leaderboards. Luckily, smarter minds than mine are working on it. Back in 2009, Beyond The Boxscore introduced an xBABIP calculator, which has since been amended at FanGraphs. What this latest tool attempts to do is take a player’s batted ball data, apply the league-average rate of success for each type of batted ball, and then determine an expected BABIP.
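The approach described here, weighting a player’s batted ball mix by league-average success rates, can be sketched as follows. This is a simplified illustration, not the FanGraphs calculator itself: the per-type league rates are hypothetical placeholders, home runs are not separated out of the fly balls, and the two hitter profiles are invented to contrast a line-drive hitter with a fly ball hitter.

```python
from typing import Dict

# Hypothetical league-average BABIP for each batted-ball type.
# Placeholders only: substitute the current season's league figures.
LEAGUE_RATES: Dict[str, float] = {
    "LD": 0.720,    # line drives
    "GB": 0.240,    # ground balls
    "OFFB": 0.140,  # outfield fly balls
    "IFFB": 0.020,  # infield flies (pop-ups)
}


def xbabip(ld: float, gb: float, fb: float, iffb_share: float,
           rates: Dict[str, float] = LEAGUE_RATES) -> float:
    """Expected BABIP from a batted-ball mix.

    ld, gb, fb: shares of batted balls (should sum to roughly 1.0).
    iffb_share: fraction of fly balls that are infield flies.
    Simplification: home runs are not removed from the fly balls.
    """
    if abs(ld + gb + fb - 1.0) > 0.01:
        raise ValueError("LD + GB + FB should sum to about 1.0")
    iffb = fb * iffb_share
    offb = fb - iffb
    return (ld * rates["LD"] + gb * rates["GB"]
            + offb * rates["OFFB"] + iffb * rates["IFFB"])


liner = xbabip(ld=0.30, gb=0.40, fb=0.30, iffb_share=0.05)  # hypothetical
flyer = xbabip(ld=0.18, gb=0.37, fb=0.45, iffb_share=0.15)  # hypothetical
print(f"line-drive hitter {liner:.3f}, fly ball hitter {flyer:.3f}")
```

With these placeholder rates, the line-drive profile comes out around .352 and the fly ball profile around .273, the same direction as the Votto-versus-Bruce contrast under Batted Ball Profile.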

So your first option, if you have the time and the desire to look at players on an individual basis, is to follow the FanGraphs link I used above and scroll to the “Appendix” section. There, Jeff Zimmerman has kindly provided a link to download his xBABIP calculator, along with instructions for how to use it.

My process for analyzing BABIP is usually to sort by the highest or lowest BABIP players, check their current BABIPs against their career norms, and then, if they seem like a potential regression candidate, plug them into Jeff’s xBABIP calculator to see if their xBABIP backs up my analysis. Allow me to walk through a few examples to illustrate.
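That three-step process can be sketched as a simple screen. This is a minimal illustration, assuming the xBABIP values have already been computed elsewhere (for example, in the calculator) and entered by hand; the two thresholds are hypothetical cutoffs, and the inputs are the figures quoted in the player examples that follow.

```python
from dataclasses import dataclass
from typing import List, Tuple

CAREER_GAP = 0.020  # hypothetical: smallest gap from career BABIP worth a look
XBABIP_GAP = 0.015  # hypothetical: smallest gap from xBABIP that "confirms" it


@dataclass
class Hitter:
    name: str
    babip: float
    career_babip: float
    xbabip: float  # taken from an xBABIP calculator


def screen(hitters: List[Hitter]) -> List[Tuple[str, str, bool]]:
    """Return (name, expected direction, xBABIP agrees?) for flagged hitters."""
    flagged = []
    # Step 1: sort from highest to lowest BABIP.
    for h in sorted(hitters, key=lambda p: p.babip, reverse=True):
        # Step 2: compare the current BABIP with the career norm.
        career_gap = h.babip - h.career_babip
        if abs(career_gap) < CAREER_GAP:
            continue
        direction = "due to fall" if career_gap > 0 else "due to rise"
        # Step 3: does xBABIP point the same way by a meaningful margin?
        x_gap = h.babip - h.xbabip
        agrees = abs(x_gap) >= XBABIP_GAP and (x_gap > 0) == (career_gap > 0)
        flagged.append((h.name, direction, agrees))
    return flagged


players = [
    Hitter("Carlos Gonzalez", 0.377, 0.352, 0.309),
    Hitter("Melky Cabrera", 0.389, 0.310, 0.322),
    Hitter("Jose Bautista", 0.217, 0.270, 0.257),
    Hitter("Cameron Maybin", 0.257, 0.313, 0.263),
]
for row in screen(players):
    print(row)
```

Maybin is the one flagged hitter whose xBABIP does not back up the career-based read, which mirrors the surprise described in his entry.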

Carlos Gonzalez – CarGo is a hard-hitting batter with great speed, but is his .377 BABIP a significant outlier due for regression? Looking at his career numbers, you notice that he has never had a BABIP below .318 and his career mark is a healthy .352. Since his batted ball data is also more or less in line with his career norms, it appears that, while high, CarGo’s BABIP is only due for a modest regression. The xBABIP calculator projects a .309 mark, confirming that he is due for some regression, but not quite to a league average mark. Thus, while Gonzalez may see his .326 AVG decrease a bit, he should still remain a challenge for the batting crown.

Melky Cabrera – Melky has been heavily analyzed over the past two years due to a seemingly unsustainable breakout. Our BABIP tools confirm the narrative that Cabrera is probably not experiencing an enormous improvement in his hit tool, but rather making small improvements and getting a bit lucky. Cabrera’s career mark of .310 is above average, and his relatively low fly ball rate supports the idea that he should be an above-average player in this category. However, his 2012 performance is too extreme to sustain, and the xBABIP calculator pegs him for a .322 clip, much lower than his current .389 level. Cabrera has tallied enough hits to remain in the batting crown conversation, but he should probably be flagged for regression heading into 2013, especially if he is priced as a category stud for AVG.

Jose Bautista – As the ultimate fly ball hitter, Joey Bats is a player we would generally expect to have a lower BABIP. But a .217 mark is exceptionally low, and it is the second time in three years he has had a BABIP of .233 or less. With a .270 career mark, is Jose one of the unluckiest batters in baseball, or has his home-runs-or-nothing approach simply broken new ground when it comes to BABIP? xBABIP actually leans toward the latter, to a degree, indicating he should have expected a mark of only .257, although this seemingly minor difference accounts for 10 hits so far this year and would raise his AVG from .244 to .274. So yes, Bautista has been a bit unlucky, but you own him for the power numbers anyway, and he will not be available as a buy low.
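The two Bautista figures can be cross-checked with back-of-the-envelope arithmetic. The article does not report his balls in play or at-bats, so this sketch backs out the sample sizes the quoted numbers imply; the helper names are illustrative only.

```python
def implied_denominator(extra_hits: float, rate_before: float,
                        rate_after: float) -> float:
    """Sample size over which `extra_hits` moves a rate from before to after."""
    return extra_hits / (rate_after - rate_before)


def hits_from_gap(actual: float, expected: float, balls_in_play: float) -> float:
    """Hits gained if BABIP had matched expectation over the same balls in play."""
    return (expected - actual) * balls_in_play


EXTRA_HITS = 10  # as quoted in the text
bip = implied_denominator(EXTRA_HITS, 0.217, 0.257)  # BABIP -> xBABIP
ab = implied_denominator(EXTRA_HITS, 0.244, 0.274)   # AVG before -> after
print(f"implied balls in play ~{bip:.0f}, implied at-bats ~{ab:.0f}")
print(f"hits recovered over {bip:.0f} BIP: {hits_from_gap(0.217, 0.257, bip):.1f}")
```

Roughly 250 balls in play against roughly 333 at-bats is internally consistent, since strikeouts and home runs are removed from the balls-in-play count, so the 10-hit and .030 AVG figures hang together.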

Cameron Maybin – Maybin is speedy, and while he does not square up many balls for line drives, he also puts a lot of balls on the ground. So how does he come out with just a .257 BABIP, especially with a career .313 mark? Maybin has been tabbed as a potential second-half sleeper due to strong run and steal potential, so a BABIP analysis should also point to a significant jump from his .213 AVG, right? Curiously, xBABIP only likes Maybin as a .263 man this year. Since his batted ball data has stayed the same as last year, when his xBABIP was .327, we can attribute at least part of that drop to the league-wide decline in batting average on ground balls, from .182 to .159. Maybin is an odd case: I was shocked to see the xBABIP results after he passed through all my normal filters for a regression candidate, a warning that we need to be thorough in situations where we plan to be aggressive, as I have been with Maybin for the second half (oops?).
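Under a simple weighted-average view of xBABIP, with a hitter’s batted ball mix held fixed, a change in the league rate for one batted-ball type moves his xBABIP by his share of that type times the change. A minimal sketch: the .182 and .159 ground ball figures and the .327 and .263 xBABIPs come from the text, while the 55% ground ball rate is a hypothetical stand-in, not Maybin’s actual number.

```python
def xbabip_shift(type_share: float, old_rate: float, new_rate: float) -> float:
    """Change in a weighted-average xBABIP when one league rate moves.

    Assumes xBABIP = sum(share_of_type * league_rate_for_type), so with the
    batted-ball mix held fixed only this type's term changes.
    """
    return type_share * (new_rate - old_rate)


GB_SHARE = 0.55                # hypothetical ground ball rate
OLD_GB, NEW_GB = 0.182, 0.159  # league batting average on ground balls
OLD_X, NEW_X = 0.327, 0.263    # last year's and this year's xBABIP

shift = xbabip_shift(GB_SHARE, OLD_GB, NEW_GB)
explained = shift / (NEW_X - OLD_X)
print(f"ground ball shift {shift:+.4f}, share of total drop {explained:.0%}")
```

With that placeholder, the ground ball change accounts for roughly a fifth of the .064 drop, which fits “at least part”; the remainder would have to come from elsewhere in the model.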

Potential Sell High – These batters have the highest BABIPs in the league. Look at the type of player they are, their career norms, and utilize the xBABIP Calculator to help identify players due to regress to a lower BABIP, and thus a lower batting average.

Potential Buy Low – These batters have the lowest BABIPs in the league. Look at the type of player they are, their career norms, and utilize the xBABIP Calculator to help identify players due to regress to a higher BABIP, and thus a higher batting average.

Hopefully this article has helped explain some of the elements that go into a player’s batting average on balls in play. I should note that analysts are still studying how the quality of contact (hard-hit versus soft-hit balls, for example) affects BABIP, work that will hopefully give us an even more robust understanding of BABIP and “ball luck” in the near future. Knowing which types of players can sustain high BABIPs can help identify batting average outliers, and the xBABIP tool is an excellent resource for identifying “luck” (randomness) in player performance.

I’m new here, so come get to know me on Twitter @BlakeMurphyODC.

All stats courtesy of FanGraphs, for games through August 6 (sorry, going on vacation).
