Jhonny Peralta is Cold Like the Weather
Following up on my earlier Fanshot, here's a look at temperature's effect on Jhonny Peralta vs. the rest of the AL. At Jay's request, I looked at batting average, BB/K, and OPS.
I took Retrosheet's AL data from 2003 (the year Peralta entered the league) to 2008. Here's how the American League fared in terms of batting average:
and here's how Peralta fared:
In case they're hard to see, the AL had an r-squared of .47 and Peralta an r-squared of .18. Not much, but they're both positive. I also had Excel give me the equation for the trendlines:
AL: average = 0.0007*temperature + 0.2202
Peralta: average = 0.0028*temperature + 0.0484
If I'm interpreting that correctly (and there's no guarantee I am), that means the temperature has four times the effect on Peralta as it does on the average AL hitter.
Next, here's BB/K for the AL:
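The "four times" figure is just the ratio of the two slopes. A minimal sketch of that arithmetic, using only the coefficients quoted above; the 10-degree step and the example temperatures are arbitrary choices for illustration, not values taken from the data:

```python
# Trendline coefficients copied from the Excel equations quoted above.
AL_SLOPE, AL_INTERCEPT = 0.0007, 0.2202
PERALTA_SLOPE, PERALTA_INTERCEPT = 0.0028, 0.0484


def predicted_average(slope, intercept, temperature):
    """Batting average that a fitted line predicts at a given temperature."""
    return slope * temperature + intercept


# Ratio of the slopes: how much faster Peralta's line rises per degree.
slope_ratio = PERALTA_SLOPE / AL_SLOPE

# Change in predicted average for a 10-degree rise (illustrative step size).
al_change_per_10 = AL_SLOPE * 10
peralta_change_per_10 = PERALTA_SLOPE * 10

print(f"slope ratio: {slope_ratio:.1f}")
print(f"AL change per 10 degrees: {al_change_per_10:.3f}")
print(f"Peralta change per 10 degrees: {peralta_change_per_10:.3f}")

# Example temperatures chosen for illustration only.
for temp in (50, 70, 90):
    al = predicted_average(AL_SLOPE, AL_INTERCEPT, temp)
    peralta = predicted_average(PERALTA_SLOPE, PERALTA_INTERCEPT, temp)
    print(f"{temp} degrees: AL {al:.3f}, Peralta {peralta:.3f}")
```

The ratio only compares how steep the two lines are. It says nothing about how well either line fits its data.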
and for Peralta:
The AL graph had a distinctive curve to it, so I decided to fit a polynomial trendline. As you can see, the average AL hitter's BB/K ratio improves in extreme temperatures, while Peralta's gets worse. I'm not sure what, if anything, that means.
Finally, here's a look at the AL OPS:
and Peralta's:
The AL had an r-squared of .39 and Peralta .06. The line equations are:
AL: OPS = 0.0017*temperature + 0.636
Peralta: OPS = 0.0042*temperature + 0.4071
This time - again, if I'm interpreting things correctly - the temperature has 2.5 times the influence on Peralta's OPS as it has on the average AL hitter's.
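Another way to read the two OPS lines is to ask where they cross, that is, the temperature at which both equations predict the same OPS. The sketch below does that arithmetic with the coefficients quoted above. The function name is my own, and with an r-squared of .06 on Peralta's line the result is arithmetic on the trendlines, not a claim about Peralta himself.

```python
# OPS trendline coefficients copied from the equations quoted above.
AL_SLOPE, AL_INTERCEPT = 0.0017, 0.636
PERALTA_SLOPE, PERALTA_INTERCEPT = 0.0042, 0.4071


def crossover_temperature(slope_a, intercept_a, slope_b, intercept_b):
    """Temperature at which two straight lines predict the same value.

    Solves slope_a * t + intercept_a == slope_b * t + intercept_b for t.
    """
    if slope_a == slope_b:
        raise ValueError("parallel lines never cross")
    return (intercept_b - intercept_a) / (slope_a - slope_b)


# Ratio of the slopes, the source of the "2.5 times" figure.
slope_ratio = PERALTA_SLOPE / AL_SLOPE

# Temperature where Peralta's line meets the league line.
crossover = crossover_temperature(
    PERALTA_SLOPE, PERALTA_INTERCEPT, AL_SLOPE, AL_INTERCEPT
)

print(f"slope ratio: {slope_ratio:.2f}")
print(f"lines cross at about {crossover:.1f} degrees")
```

Below the crossing point Peralta's line sits under the league line, and above it his line sits higher.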
Comments
Huge rec and a reminder to myself to look at this again when not in bed.
Ben Francisco: An Outfielder only on baseball cards and roster sheets.
by westbrook on May 2, 2009 2:33 AM EDT reply actions 0 recs
My last stats class was in ’75. All I can say without the aid of more coffee is: Wow! Nice work.
by LeftyCatcher on May 2, 2009 8:18 AM EDT reply actions 0 recs
That is one more than I’ve ever taken. My last real math class was my senior year at St. Ignatius in 1998. Antonelli’s Analysis, not exactly life changing. So, yeah, interpretation would be nice; however, it sure as hell looks impressive, and I’m sure a fair amount of work went into it, so thanks.
Il faut d'abord durer.
by CU Adam on May 2, 2009 11:54 AM EDT up reply actions 0 recs
Mr. or Mrs.?
Carmona for Cy Young 2009
by danvail on May 2, 2009 12:19 PM EDT up reply actions 0 recs
Mr. The Coach. Fall was great, every time someone didn’t do their homework, we’d just ask what the defensive gameplan was for Friday night. Sort of like with Gus Caliguire, who could talk about Carlos Baerga or any other 2B for hours.
Il faut d'abord durer.
by CU Adam on May 2, 2009 12:41 PM EDT up reply actions 0 recs
So the consensus is that Peralta hits poorly in cold weather, right? I majored in Philosophy.
by piersall on May 2, 2009 9:31 AM EDT reply actions 0 recs
Nice. The high BB/K rate at the extreme temperatures is intriguing.
Did you control for number of PAs for each temperature? That should remove some of the outliers in Jhonny’s charts – perhaps you can get a tighter result on the R-squared. For instance, the 100-degree data point for Jhonny has a .000 OPS, but no value for BB/K – that leads me to assume he has no K’s for 100-degree ABs, so very few ABs.
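One possible way to account for the number of PAs is to weight each temperature's data point by its PA count when fitting the line, so a point built on two plate appearances counts far less than one built on two hundred. This is a sketch of that idea only, not what the post's Excel trendline does. The PA counts and most of the OPS values below are invented for illustration; only the .000 OPS at 100 degrees echoes the point mentioned above.

```python
def weighted_linear_fit(xs, ys, weights):
    """Least-squares line through (x, y) points, each scaled by a weight.

    Returns (slope, intercept).
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    x_mean = sum(w * x for w, x in zip(weights, xs)) / total
    y_mean = sum(w * y for w, y in zip(weights, ys)) / total
    sxx = sum(w * (x - x_mean) ** 2 for w, x in zip(weights, xs))
    if sxx == 0:
        raise ValueError("need at least two distinct weighted x values")
    sxy = sum(
        w * (x - x_mean) * (y - y_mean) for w, x, y in zip(weights, xs, ys)
    )
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


# Invented example: one OPS value per temperature plus a made-up PA count.
temps = [50, 60, 70, 80, 100]
ops = [0.600, 0.650, 0.700, 0.750, 0.000]
plate_appearances = [40, 120, 200, 150, 2]

unweighted = weighted_linear_fit(temps, ops, [1] * len(temps))
weighted = weighted_linear_fit(temps, ops, plate_appearances)

print("unweighted slope, intercept:", unweighted)
print("PA-weighted slope, intercept:", weighted)
```

With equal weights the single tiny-sample point can drag the whole line around; with PA weights it barely moves it.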
"Ignorance more frequently begets confidence than does knowledge..." C. Darwin
by Spidey on May 2, 2009 10:51 AM EDT reply actions 0 recs
Interesting analysis. If you’ve got the raw data in a spreadsheet, I’d love to take a look at it; I think I have an idea that would help remove those small sample outliers at the ends of the graph.
I'm not really into Song of Hiawatha.
by sarcasmdave on May 2, 2009 12:17 PM EDT reply actions 0 recs
These are all extremely weak correlations, and as we know, correlation does not imply causation. This apparent trend, even if it is real, could simply be the result of players’ tendency to start the year a little rusty. Look at the first graph: the variance is much wider in the beginning of the year.
Carmona for Cy Young 2009
by danvail on May 2, 2009 12:20 PM EDT reply actions 0 recs
*Still a cool look at some data though, can you post the excel spreadsheets?
Carmona for Cy Young 2009
by danvail on May 2, 2009 12:22 PM EDT up reply actions 0 recs
This is definitely a cool idea, thanks. So we can expect him to put up huge numbers in another month. Problem fixed.
"Lotta heart in Cleveland." - Ian Hunter
by Denver Tribe Fan on May 2, 2009 12:27 PM EDT reply actions 0 recs
I like this, but I think what it suggests is that Peralta is pretty normal when it comes to temperature/performance. The one chart that makes Peralta look different is the BB/K chart, but with that chart, the effect is minimal and the explanatory power represented by Peralta’s curve is also not far removed from zero. It’s great to see it laid out, though.
by APV on May 2, 2009 12:29 PM EDT reply actions 0 recs
I take it you used equally sized temperature bins all across the spectrum? It might clean things up a bit if, instead of constraining the bins to be equally sized in temperature, you made them equally sized in terms of number of plate appearances. So, in other words, sort the plate appearances by temperature, and divide them evenly into however many groups. That way you’ll be less likely to have the bloated variability at the extreme temperatures, where there’s less data (and if it’s still there it might tell you something substantive, instead of simply reflecting a smaller number of PAs per data point).
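The sort-then-split idea above can be sketched in a few lines. This is an illustration under assumptions: each plate appearance is a hypothetical (temperature, outcome) pair with outcome 1 for a hit and 0 otherwise, the function names are my own, and the sample data is invented.

```python
def equal_count_bins(plate_appearances, n_bins):
    """Sort (temperature, outcome) pairs by temperature and split them
    into n_bins groups whose sizes differ by at most one.

    Plate appearances at the same temperature can land in neighbouring
    bins, since the split is by count, not by temperature value.
    """
    if not 0 < n_bins <= len(plate_appearances):
        raise ValueError("n_bins must be between 1 and the number of PAs")
    ordered = sorted(plate_appearances, key=lambda pa: pa[0])
    base, extra = divmod(len(ordered), n_bins)
    bins, start = [], 0
    for i in range(n_bins):
        size = base + (1 if i < extra else 0)  # spread the remainder
        bins.append(ordered[start:start + size])
        start += size
    return bins


def summarize(bin_):
    """Mean temperature, PA count, and hit rate for one bin."""
    temps = [t for t, _ in bin_]
    outcomes = [o for _, o in bin_]
    return {
        "mean_temp": sum(temps) / len(temps),
        "pa": len(bin_),
        "hit_rate": sum(outcomes) / len(outcomes),
    }


# Invented sample: (game-time temperature, 1 if hit else 0).
sample = [(48, 0), (95, 1), (72, 1), (70, 0), (55, 0),
          (81, 1), (68, 0), (74, 1), (66, 0), (77, 0)]

for group in equal_count_bins(sample, 3):
    print(summarize(group))
```

Every bin now rests on roughly the same number of plate appearances, so the points at the temperature extremes are no noisier than the ones in the middle.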
I think the uneven data issue might have something to do with the shape of the BB/K curves also. In Jhonny’s case, the small samples at the ends give you a bunch of zeros, which pulls the curve down. On the league graph, reducing the sample size gives you more variability, but because the measure is a ratio of two positive quantities, the variability is asymmetrical. That is, there’s more room to skew the ratio up than down, which will tend to pull the curve up when you’re fitting the curve by minimizing squared errors.
Ideally you probably want to use a log scale for a ratio measure, which will make things more symmetrical. It also will give you the same curve for BB/K and K/BB (just upside-down), which using a raw ratio scale doesn’t do. Once you’ve fit the regression line on the log scale, you can always transform back to raw ratios, to make reading the graph more intuitive.
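A small sketch of why the log scale is symmetric for a ratio like BB/K. The counts are invented, and the helper name is my own. Note that the log is undefined when either count is zero, so those tiny-sample points would need separate handling.

```python
import math


def log_ratio(walks, strikeouts):
    """Natural log of BB/K; needs both counts to be positive."""
    if walks <= 0 or strikeouts <= 0:
        raise ValueError("log ratio needs positive walk and strikeout counts")
    return math.log(walks / strikeouts)


# Invented counts: twice as many K's, twice as many BB's, and an even split.
examples = [(10, 20), (20, 10), (15, 15)]

for walks, strikeouts in examples:
    raw = walks / strikeouts
    logged = log_ratio(walks, strikeouts)
    # Flipping the ratio to K/BB only flips the sign on the log scale.
    flipped = log_ratio(strikeouts, walks)
    # exp() undoes the log, returning to the raw ratio for plotting.
    back = math.exp(logged)
    print(f"BB/K={raw:.2f}  log={logged:+.3f}  log(K/BB)={flipped:+.3f}  back={back:.2f}")
```

On the raw scale the first two examples sit at 0.5 and 2.0, unequal distances from 1. On the log scale they sit the same distance either side of zero.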
If you wanted to send me your raw data, I might play around with it a little…
by Logodaedalus on May 2, 2009 5:06 PM EDT reply actions 0 recs
I don’t know what is happening here but Jhonny’s bb/k looks like a frown, so I assume it is bad.
I become an expert simply by doing something.
by fwembt on May 2, 2009 11:27 PM EDT reply actions 5 recs
For those who asked for it, here's the spreadsheet
as a Google Doc. If the link doesn’t work, let me know.
http://spreadsheets.google.com/ccc?key=rGbDl4Vc3sOKfsGfeKH-Osw
In case it’s not obvious, on each sheet the first column is the temperature and the last column is the statistic in question.
by kanka on May 3, 2009 10:34 PM EDT reply actions 0 recs
Thanks! I’m sure a few stat-heads here will figure out some ways to turn these data into lies.
I'm not really into Song of Hiawatha.
by sarcasmdave on May 3, 2009 10:53 PM EDT up reply actions 0 recs
Cool, thanks
Carmona for Cy Young 2009
by danvail on May 4, 2009 7:25 AM EDT up reply actions 0 recs
I disagree with your “2.5x great effect on OPS” conclusion because that regression is an extremely weak predictor (as a single variable model should be in this instance). The model is so weak that it’s basically useless – as evidenced by the stochastic, non-trending data points.
by joeee on May 4, 2009 4:29 PM EDT reply actions 0 recs
To expand -
R^2 values show how much variability in the data is explained by the regression. An R^2 value of 6% is very weak in this instance, so Jhonny’s OPS V TEMP coefficient is totally bogus.
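For anyone who wants to see what that number is made of, here is a minimal sketch of a one-variable least-squares fit and its R^2, computed as 1 minus the ratio of the unexplained variation to the total variation. The function names are my own and the data points are invented, not taken from the spreadsheet.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line through the points: (slope, intercept)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def r_squared(xs, ys):
    """Share of the variation in ys that the fitted line accounts for."""
    slope, intercept = linear_fit(xs, ys)
    y_mean = sum(ys) / len(ys)
    # Variation left over after the line has done its best.
    ss_residual = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    # Variation around the plain average, with no line at all.
    ss_total = sum((y - y_mean) ** 2 for y in ys)
    return 1 - ss_residual / ss_total


# Invented example: OPS by temperature with a lot of scatter.
temps = [45, 55, 65, 75, 85, 95]
ops = [0.710, 0.590, 0.820, 0.640, 0.780, 0.700]

slope, intercept = linear_fit(temps, ops)
print(f"slope={slope:.4f} intercept={intercept:.3f} r2={r_squared(temps, ops):.2f}")
```

A low R^2 means the line leaves most of the scatter unexplained, whatever its slope happens to be.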
RE: single variable model
This is the more important point. Until you consider the myriad factors for hitting performance, any easy relationship between OPS and temp is probably bogus. Off the top of my head, here is an example of a correlated nuisance factor that could disappear if you did a rigorous multivariable regression: temperature also corresponds to number of reps seen in the season. Colder temps mean beginnings and ends – break in and break down periods, respectively. Who’s to say that fluctuation isn’t merely a function of repetitions before your body breaks down which happens to correlate to temperature?
In fact, the conclusion I draw from these data is that there is no rhyme or reason to Jhonny’s performance, and his slumps are random. I only say all this because you clearly have the skills and resources to do a really great job with this stuff – Excel is a great tool, and MINITAB is better if you have it. But until then, this is stuck in the basement of number-massaging-to-rationalize-a-bad-team. Which, as evidenced by recent discussions, we’re all tired of.
by joeee on May 4, 2009 4:47 PM EDT up reply actions 1 recs
Thanks for the feedback
I seriously doubted that there would be a direct relationship with the coefficients ("2.5x great effect on OPS"), so thanks for clearing that up.
As you pointed out, I have the foundation and desire to dive into heavy baseball statistical analysis. But I’m still a beginner in many ways. (Plus I need to dust off my copy of Minitab.) So, I’m very fortunate that there’s a lot of intelligent feedback in these comments.
by kanka on May 4, 2009 6:07 PM EDT up reply actions 0 recs
No prob – I’m glad you interpreted my tone correctly. I meant my remarks as constructive criticism, not cheap-shotting, and with the internet, you really never know how you come across.
I’m sure I have much more to gain from your work than you do from my comment. For the life of me, I can’t figure out how you even got temperature data for individual players. I even dug on retrosheet and couldn’t figure it out. Did you have to cross reference each date with the average temp of the city the game was played in? I highly doubt you used such inelegant methods as what I just suggested. Care to share your secret?
by joeee on May 4, 2009 6:25 PM EDT up reply actions 0 recs
The Retrosheet play-by-play (event) files include temperature data, among a lot of other things. You can use software tools available at Retrosheet to extract the data into a more convenient form (although you could in theory open the event file in any text editor).
by FredOx on May 5, 2009 9:20 AM EDT up reply actions 0 recs
Fred's correct
BEVENT.EXE creates a play-by-play file from Retrosheet, and BGAME.EXE creates a file that includes gametime temperature.
I imported those two files into Access, so I could combine them in a query (or a complicated series of queries, since I took the quick-and-dirty approach). The nice thing is that each line of both files contained a gameid field, so I could use that to combine the two.
Once I got the data the way I wanted, I copied it over to Excel so I could create the graphs.
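For anyone without Access, the same join can be sketched in Python. This is an illustration only: the column names, game ids, batter ids, and rows below are all invented stand-ins, not the actual field layout that BEVENT.EXE or BGAME.EXE produce, so the real files would need their own column mapping.

```python
import csv
import io

# Hypothetical stand-in for the game-level file (one row per game).
GAMES_CSV = """game_id,temperature
GAME001,45
GAME002,85
"""

# Hypothetical stand-in for the play-by-play file (one row per event).
EVENTS_CSV = """game_id,batter,event
GAME001,batter_a,strikeout
GAME001,batter_b,single
GAME002,batter_a,walk
GAME003,batter_a,double
"""


def join_on_game_id(events_text, games_text):
    """Attach each game's temperature to every event from that game.

    Events whose game has no temperature row are dropped, which matches
    an inner join in a database query.
    """
    temperature_by_game = {
        row["game_id"]: int(row["temperature"])
        for row in csv.DictReader(io.StringIO(games_text))
    }
    joined = []
    for row in csv.DictReader(io.StringIO(events_text)):
        temperature = temperature_by_game.get(row["game_id"])
        if temperature is None:
            continue  # no matching game row
        joined.append({**row, "temperature": temperature})
    return joined


for record in join_on_game_id(EVENTS_CSV, GAMES_CSV):
    print(record)
```

The joined rows can then be written back out with csv.DictWriter and opened in Excel for graphing.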
by kanka on May 5, 2009 10:37 AM EDT up reply actions 0 recs
(Psst … kanka … nobody uses the “subject” here … maybe you haven’t noticed …)
Well, that's just, like, your opinion, man.
by Jay on May 5, 2009 5:52 PM EDT up reply actions 0 recs
Oh, I guess you're right
I mean… whoops….
by kanka on May 6, 2009 8:57 AM EDT up reply actions 2 recs
Perhaps you should retitle this “Jhonny Peralta is Zero Degrees Kelvin”
by ShawnK on May 7, 2009 2:14 PM EDT reply actions 0 recs
