Last week YoDaddyWags suggested a correlation between MiLB BABIP and MLB BABIP while discussing Yan Gomes's expected 2014 numbers, using a 26 point reduction as noted here. Some thought the correlation wasn't strong enough to use as a predictor, and I figured it could use a little expansion to see what unfolds in a larger sample. With that in mind, I took all current position players with 1,000 or more plate appearances and compared their MLB and MiLB stats to see what I could find. I ended up with a sample size of 324, with no guarantee that I didn't miss any. If they didn't play in MLB or for an affiliate in 2013, I didn't include them. Even if they're retiring or only saw brief MiLB time this year, they're included. Hello, Manny Ramirez. Players such as Yoenis Cespedes weren't included for lack of MiLB stats, and neither was Ichiro Suzuki, since the Japanese leagues might not be the best comp.
The first thing that stood out as I began to compile the data was that my range seemed to be much less predictable than the data that was reviewed in the Pirates study, which I'll get to in a minute. But once I had all players entered I came to a stunning result - the average regression was, wait for it, 26 points. Exactly as predicted in the Pirates study. The other freaky thing? Yan Gomes's MLB BABIP is exactly 26 points below his MiLB career average. You can't write this stuff. It is safe to say that, yes, on average a player's BABIP should be expected to fall that much when making the jump to MLB. But, of course, it isn't that simple.
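For anyone who wants to check the arithmetic, here is a minimal sketch of how the "points of regression" figure can be computed. The function names are my own, and the second player in the sample list is a made-up placeholder, not a real stat line. Only the Gomes numbers (.346 MiLB, .320 MLB) come from this post.

```python
from statistics import mean

def babip_change_points(milb_babip, mlb_babip):
    """Change in BABIP from MiLB to MLB in points (thousandths).

    Negative means regression, positive means progression.
    """
    return round((mlb_babip - milb_babip) * 1000)

def average_change_points(players):
    """Average change across (milb_babip, mlb_babip) pairs."""
    return mean(babip_change_points(milb, mlb) for milb, mlb in players)

# Yan Gomes, using the figures quoted in the post.
print(babip_change_points(0.346, 0.320))  # -26

# Placeholder sample: Gomes plus one invented hitter who gained 10 points.
sample = [(0.346, 0.320), (0.300, 0.310)]
print(average_change_points(sample))
```

Run over the full list of 324 hitters, the same averaging step is what would produce the 26 point figure.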
First, 40 players either saw their BABIP increase or remain the same (2 remained the same). That is roughly 12% of all players or greater than 1 in 10. Of those players, the range in MiLB BABIP was .278-.360, with only three of those being greater than Yan Gomes' .346. Those three? Joey Votto (.351), Derek Jeter (.352) and Austin Jackson (.360). I think we can go ahead and assume that Yan's MLB numbers will be below .346.
Second, the range is enormous - 46 point progression to 87 point regression. That is a swing of more than 130 points. It isn't fair to peg it quite that way since there are some outliers. The top and bottom 5 look like this (negative indicates regression in BABIP):
I don't know what Chris Johnson is eating, but we're tossing his stats along with Wieters, Hairston and Holliday. Without those four, the groupings come together much more nicely and leave us with a range of 98 points - much more reasonable but hardly something you could set in stone. For the record, removing the outliers does nothing to change the average of 26 points. That still stands.
Third, there could be variance by handedness, since it has been suggested that LHHs have an advantage out of the box. Separating by handedness, and discounting switch hitters, moves the BABIP bar very little. Range movement is inconsequential and the average moves to 29 points for LHHs and 24 points for RHHs, with .306 and .302 the respective MLB BABIP numbers for each. That means LHHs see some advantage, but not markedly so. A better study might be to factor in speed, but drawing that line is arbitrary at this point, so the numbers would need to be weighted somehow. And SBs aren't a perfect measure of speed, though I will use them later.
I think the most assured predictor, short of finding comps, is the fourth study. I broke out the old school box plot to find the mass group, the middle half of the numbers. We know the average regression is 26 points, but we now find that half of all major league hitters see a regression somewhere between 11 and 38 points, a 27 point range. Yeah, I was hoping for a 26 point range, too. We can't win them all, I guess. And, honestly, this is where you see most of your league average and "good" players. There aren't a ton of top names that fall in the middle half in BABIP regression, meaning that if you're predicting a player that you don't view as having superstar potential, this is where you'll want to look. Some of the players that fall in the top quarter (least regression or actual progression)? Holliday, Cano, Carlos Gonzalez, Miguel Cabrera. Those guys are just in the top 10. The rest? Stanton, Votto, McCutchen, Manny Ramirez. That's top 30. You get more ordinary the further you go. In fairness, the bottom quarter, the most regressive, doesn't show any particular trend.
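The "middle half" is just the span between the first and third quartiles, the box in a box plot. Here is a rough sketch of that step using Python's standard library. The function name is mine, the nine regression values are invented placeholders (regression counted as positive points here), and the quartile method is an assumption, since different tools draw the box edges slightly differently.

```python
from statistics import quantiles

def middle_half(regression_points):
    """Return (q1, q3), the edges of the middle 50% of the values.

    Uses the 'inclusive' quartile method; other methods can shift
    the edges by a point or two on small samples.
    """
    q1, _median, q3 = quantiles(regression_points, n=4, method="inclusive")
    return q1, q3

# Invented placeholder values, NOT the real 324-player data set.
# Positive = points of BABIP lost going from MiLB to MLB.
placeholder = [-5, 2, 11, 20, 26, 30, 38, 50, 87]

low, high = middle_half(placeholder)
print(f"middle half: {low:.0f} to {high:.0f} points, width {high - low:.0f}")
```

The point of using quartiles instead of the full range is that a handful of extreme players can't stretch the box the way they stretch the min-to-max swing.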
But what does this mean for Yan Gomes in particular? A rough spitball would tell us that his regression from .346 to .320 is exactly average and perfectly in line with what we might expect. Wholly unsurprising, really. You could also predict, using the middle range of 11 to 38 points, that his MLB BABIP might fall anywhere from .308 to .335. Even with a consistent average of .320, you could see deviations in this range without causing any particular alarm. But is simply using the league-wide average the best look? Probably not. With the data at hand, let's find some better comps.
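The .308 to .335 window is nothing fancier than subtracting both ends of the middle-half range from the MiLB number. A quick sketch, with a function name of my own choosing:

```python
def projected_mlb_range(milb_babip, low_regression_pts, high_regression_pts):
    """Project an MLB BABIP window from a MiLB BABIP and a regression range.

    Regression is given in points (thousandths), positive = BABIP lost.
    Returns (floor, ceiling) rounded to three decimals.
    """
    ceiling = round(milb_babip - low_regression_pts / 1000, 3)
    floor = round(milb_babip - high_regression_pts / 1000, 3)
    return floor, ceiling

# Gomes's .346 MiLB BABIP with the 11-38 point middle half.
print(projected_mlb_range(0.346, 11, 38))  # (0.308, 0.335)
```

Swap in a different regression range and the same function gives the window for any other set of comps.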
Finally, I took all of those players with an MiLB BABIP similar to Yan's .346 (I used 10 points in either direction) and then isolated the slower players. I removed anyone with more than 100 SBs in their career, as well as some with fewer than 100 who are young, like Kipnis. It's an imperfect system, but Yan has stolen exactly 6 bases since leaving college, so I wasn't worried about accidentally removing a few middle ground speedsters. This left me with a 37 point average regression on 55 players and a middle half range of 25-49 points. Not encouraging. Thinking it might get better, I further isolated RH and S hitters. That gave 27 players, an average of 36 and a range of, wait, 25-49. Crap. The sample size was cut in half and nothing changed. Isolating further to only RHHs gave an average of 35 and a range of 28-43.
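If you want to try the comp filter yourself, here is a minimal sketch of the idea. Everything named here (the `Hitter` record, `find_comps`, the four "Player" rows) is hypothetical and made up for illustration. It also only covers the mechanical cuts: the by-hand removal of young players with fewer than 100 SBs isn't captured.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Hitter:
    name: str
    milb_babip: float
    mlb_babip: float
    career_sb: int
    bats: str  # "R", "L", or "S"

def find_comps(hitters, target_milb_babip, window_pts=10, max_sb=100, bats=None):
    """Keep hitters within window_pts of the target MiLB BABIP who have
    at most max_sb career steals, optionally limited to a set of
    handedness codes such as {"R"} or {"R", "S"}."""
    comps = []
    for h in hitters:
        gap_pts = round(abs(h.milb_babip - target_milb_babip) * 1000)
        if gap_pts > window_pts or h.career_sb > max_sb:
            continue
        if bats is not None and h.bats not in bats:
            continue
        comps.append(h)
    return comps

def average_regression_points(hitters):
    """Average points of BABIP lost from MiLB to MLB (positive = loss)."""
    return mean(round((h.milb_babip - h.mlb_babip) * 1000) for h in hitters)

# Invented placeholder rows, NOT real players or real stats.
pool = [
    Hitter("Player A", 0.350, 0.315, 12, "R"),
    Hitter("Player B", 0.340, 0.300, 250, "R"),  # too fast
    Hitter("Player C", 0.320, 0.300, 5, "L"),    # outside the 10 point window
    Hitter("Player D", 0.356, 0.320, 40, "S"),
]

slow_comps = find_comps(pool, 0.346)
print([h.name for h in slow_comps], average_regression_points(slow_comps))
print([h.name for h in find_comps(pool, 0.346, bats={"R"})])
```

Each extra filter shrinks the pool, which is exactly the trade-off that showed up above: tighter comps, smaller sample.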
What does all this mean? As with most things, it is better to aim for a range than for a particular number. The 26 point average looks good over a nice sample size, but the range of 11-38 looks a lot better, especially when you compare it against players who more closely comp to the one you're predicting. The closest comp to Yan's MiLB numbers, especially when factoring in BB/K ratio, seems to be Delmon Young, who netted a 34 point regression. I'd take that number into consideration along with the 26 point average. Yan's already dropped 26 points in BABIP and I wouldn't be surprised to see him drop another 10 on average, meaning next year might range down around .300 to even out his current average of .320. I wouldn't expect him to improve upon that number, though. Expect to see something between .305 and .315. Unless, of course, he really is a Yanimal.