
Hafner, Hope and Small Samples

The brightest light in what has been a dark beginning to the Tribe's season has been the performance of Travis Hafner. As has been well chronicled, Hafner was a major question mark coming into this season following a regression in 2007, a dismal 2008, and off-season shoulder surgery. The stakes are magnified by the $57M the Indians owe him through 2012. A repeat of last year would all but guarantee the Indians the largest contractual albatross in club history. A return to pre-2007 levels, on the other hand, would be a huge boost to what should be a good offensive lineup even in Pronk's absence.

As we are pretty much all aware, Hafner has begun the season with 3 HRs, a double and two singles in his first 24 plate appearances (spanning 5 games). This is great. But does it give us any indication that Pronk is back? One of the things I found most striking about Jay's interview with Antonetti was the repeated references to the need for a large enough sample to assess performance, and to the difficulty of analyzing performance from small samples (hello, bullpen). This is a fairly well-studied topic in sabermetric research, with several studies examining how many plate appearances are needed before current-season performance reliably predicts full-season performance (i.e., how many ABs do you need before something like SLG or OBP becomes reliable?). At 24 plate appearances, we are about an order of magnitude short of what is needed to predict what Hafner will do the rest of the season.
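To put a rough number on that noise: if each PA is treated as an independent on-base/out trial (a simplifying assumption), and using 0.350 as an illustrative on-base rate (not Hafner's actual figure), the binomial standard error of an observed on-base rate over n PAs is

$$
\mathrm{SE}(\widehat{\text{OBP}}) = \sqrt{\frac{p(1-p)}{n}}, \qquad
\sqrt{\frac{0.35 \times 0.65}{24}} \approx 0.097, \qquad
\sqrt{\frac{0.35 \times 0.65}{240}} \approx 0.031.
$$

So over 24 PAs a true .350 on-base hitter lands anywhere from roughly .250 to .450 about two times in three, and a tenfold larger sample shrinks that band only by a factor of about three.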

But that doesn't mean we can't ask questions about his performance thus far. As it turns out, my real-life profession regularly involves asking questions strongly constrained by sample size (morphological variability in the fossil record). And while sample size is a major obstacle, it is possible to ask simple questions even given the limited information available to us.

So that's what I want to do: ask a simple question about Hafner 2009. And my question is this: what is the likelihood of observing Hafner's 2009 performance across 24 plate appearances in the 2008, 2007, or 2006 Hafner seasons?


To do this, I did a few things. First, I coded Hafner's plate appearances over the past three years by result: how many outs, walks, singles, doubles, triples and HRs (for simplicity's sake, I'm excluding sacrifices, HBP, etc.). I then built a distribution of performance over 24 plate appearances for each of Hafner's past three seasons by repeatedly drawing 24 plate appearances at random from that season's full sample. For example, in 2006 Hafner had 563 plate appearances, 100 of which were BBs, 66 of which were singles ... 42 of which were HRs. Each random draw of 24 of those plate appearances gives one simulated line against which I can compare Hafner's actual 24 plate appearances this season. To compare them, I simply calculated the OPS across those 24 plate appearances. I then repeated this 10,000 times for each season.
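For concreteness, here is the OPS arithmetic for Hafner's 2009 line under the same simplification (no HBP or sacrifices, so AB = PA - BB). The post doesn't list his walks. The 4 walks used here are an assumption, chosen because they reproduce the 1.217 OPS quoted in the comments:

$$
\text{OBP} = \frac{H + BB}{PA} = \frac{6 + 4}{24} \approx 0.417, \qquad
\text{SLG} = \frac{TB}{PA - BB} = \frac{2(1) + 1(2) + 3(4)}{24 - 4} = \frac{16}{20} = 0.800,
$$
$$
\text{OPS} = \text{OBP} + \text{SLG} \approx 1.217.
$$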

For those worried about my sanity, this only takes about 5 minutes and 10 lines of code.  This is essentially a "bootstrap" test for those with a little stats background.
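Here is a minimal Python sketch of the procedure described above. The author's own version was written in Matlab (per the comments), so this is an illustration rather than the author's code. The outcome labels (`"out"`, `"bb"`, `"1b"`, ...) and the function names are hypothetical. OPS is computed with HBP and sacrifices ignored, as in the post, so AB = PA - BB. One judgment call: `random.Random.sample` draws the 24 PAs without replacement, while a textbook bootstrap draws with replacement (`choices`). With 24 draws from a season of several hundred PAs, the two should give very similar distributions.

```python
import random

# Total bases credited for each plate-appearance outcome (walks and outs: none).
TOTAL_BASES = {"out": 0, "bb": 0, "1b": 1, "2b": 2, "3b": 3, "hr": 4}
HITS = {"1b", "2b", "3b", "hr"}


def ops(outcomes):
    """OPS for a list of PA outcomes, ignoring HBP and sacrifices."""
    pa = len(outcomes)
    walks = outcomes.count("bb")
    hits = sum(1 for o in outcomes if o in HITS)
    at_bats = pa - walks
    obp = (hits + walks) / pa
    # SLG is undefined with zero at-bats (all walks); treat it as 0 there.
    slg = sum(TOTAL_BASES[o] for o in outcomes) / at_bats if at_bats else 0.0
    return obp + slg


def expand(counts):
    """Turn {"out": 300, "bb": 100, ...} into a flat list of individual PAs."""
    return [outcome for outcome, n in counts.items() for _ in range(n)]


def simulate(season_counts, n_pa=24, trials=10_000, seed=None):
    """OPS of `trials` random n_pa-sized samples drawn from one season's PAs."""
    rng = random.Random(seed)
    season = expand(season_counts)
    # sample() draws without replacement; rng.choices(season, k=n_pa) would
    # give the with-replacement (classic bootstrap) variant instead.
    return [ops(rng.sample(season, n_pa)) for _ in range(trials)]


# Hafner's 2009 line from the post: 3 HR, 1 2B, 2 1B in 24 PA. The walk
# count is not given; 4 walks is an assumption (it reproduces the 1.217
# OPS quoted in the comments), leaving 14 outs.
observed = ops(expand({"out": 14, "bb": 4, "1b": 2, "2b": 1, "hr": 3}))
print(round(observed, 3))  # 1.217
```

Feeding each season's coded outcome counts to `simulate` produces the 10,000-value distribution that the observed OPS is then compared against.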

As a reminder, I'm interested in what the likelihood is of observing Hafner's 2009 performance in 24 plate appearances based on his 2008, 2007 or 2006 performance.

Here are the results:

Compared against his 2008 performance (for which he put up a stellar .628 OPS in 233 PAs), Hafner's 2009 performance was better than all but 111 of the 10,000 trials. In other words, his numbers thus far would put him in the 99th percentile of expectations based on his 2008 performance (see distribution below). With only about 1.1% of trials matching or beating his actual line, we can make a pretty strong claim that Hafner's performance is better than you would expect based on his 2008 numbers.
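The comparison step reduces to counting. In this Python sketch (function names hypothetical), the share of simulated OPS values at or above the observed value is a one-sided empirical p-value, and its complement is the percentile. OPS over 24 PAs can take only a limited set of values, so exact ties with the observed OPS are plausible. Counting ties as "at least as good," as done here, is the conservative choice. The toy list is built to mirror the 2008 result (111 of 10,000 trials at or above the observed line). It is not the actual simulation output.

```python
def tail_share(observed, simulated):
    """One-sided empirical p-value: share of trials at or above `observed`."""
    return sum(1 for s in simulated if s >= observed) / len(simulated)


def percentile(observed, simulated):
    """Percent of trials strictly below `observed`."""
    return 100 * sum(1 for s in simulated if s < observed) / len(simulated)


# Toy stand-in for 10,000 simulated OPS values: 9,889 below 1.217, 111 above.
toy = [0.600] * 9_889 + [1.300] * 111
print(tail_share(1.217, toy))  # 0.0111
print(percentile(1.217, toy))  # 98.89
```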

[Figure: distribution of simulated 24-PA OPS based on Hafner's 2008 season]

Compared against his 2007 numbers, Hafner's 2009 OPS exceeds the simulated OPS in 9,065 of the 10,000 trials. Although this is pretty far into the right half of the distribution (see below), roughly 9% of trials still did as well or better, so, to be conservative, we can't say that his performance thus far differs from what you would expect from his 2007 numbers.

[Figure: distribution of simulated 24-PA OPS based on Hafner's 2007 season]

Finally, compared against the 2006 season, Hafner's 2009 OPS exceeds the simulated OPS in 6,570 of the 10,000 trials. This is pretty much smack in the middle of the simulated distribution.

[Figure: distribution of simulated 24-PA OPS based on Hafner's 2006 season]

So what does this mean? It provides some evidence, even given the limited data available to us, that Pronk is back. This in no way suggests he is going to continue on this path, and it in no way predicts what he will do the remainder of the season. But it suggests his performance thus far is pretty inconsistent with his performance last season, which, since he was terrible, is reason for hope.

 

Comments

Best use of five minutes, ever?

by Voltaire on Apr 13, 2009 5:05 PM EDT

This is awesome. I wish I had any clue at all how to do things like this. Will you teach me?

I'm *always* in the driver's seat, cugino -- Chuck

by Turkmenbashi on Apr 13, 2009 5:07 PM EDT

It’s actually very easy. Basically:

  • enter your data into something (you can do this in excel)
  • randomize the order of your data
  • draw 24 things and calculate whatever metric you are comparing (OPS in this case)
  • compare your randomly simulated data (what you’ve just calculated) with your observed data
  • save that information
  • repeat a lot of times

by APV on Apr 13, 2009 5:19 PM EDT

this is the best thing i’ve seen since my sister-in-law showed me a peep in the microwave yesterday.

by Brick. on Apr 13, 2009 5:25 PM EDT

zomg. you had me until the “in the microwave part”. what a letdown….

this is the best thing i’ve seen since my sister-in-law showed me a peep

by oaksterdam on Apr 13, 2009 7:42 PM EDT

I should add – there are a few assumptions with this approach. The first is that each plate appearance is independent. This probably isn’t actually true, but it is difficult to get around and I’m guessing it doesn’t have a major impact on the final result. It would be possible to consider Hafner’s performance in 24 consecutive plate appearances, but that would necessitate me entering the exact sequence of Hafner’s plate appearances, which I have no desire to do. The second is that the level of competition Hafner has faced is equivalent to that of previous seasons. This is potentially a more substantive problem, but with the limited performance available to us in 2009, can’t really be corrected.

by APV on Apr 13, 2009 5:28 PM EDT

so, you didn’t take 24-in-a-row chunks. just any random 24 of the total.

by Brick. on Apr 13, 2009 5:35 PM EDT

yes – to do 24 consecutive I would have had to enter data for each season in sequence and not just as aggregate data

by APV on Apr 13, 2009 5:36 PM EDT

You don’t have an intern for that sort of thing?

Though I look right at home, I still feel like an exile

by Manhattan Tribe Fan on Apr 13, 2009 5:37 PM EDT

if only my dog Lola had the necessary dexterity in her fingers. or a brain larger than a plum.

by APV on Apr 13, 2009 5:39 PM EDT

I can’t get around this. These aren’t 24 randomly selected at-bats. I think you did great work here but it would be significantly more meaningful the other way.

by jakesinger777 on Apr 13, 2009 6:49 PM EDT

It’s computationally more challenging. Also, you can make the argument that what I did is more conservative. By making it 24 consecutive plate appearances the problem of level of competition becomes a lot more severe.

by APV on Apr 13, 2009 6:51 PM EDT

Here’s one paper on the topic. I still think it’s not going to make a huge difference in Adam’s study, though. It would broaden the distribution curves a bit, but I doubt enough to change his conclusion.

by dgcambridge on Apr 13, 2009 7:12 PM EDT

In 2006, Pronk had 541 stretches of 24 consecutive PAs. Of these, 198 (about 36.6%) produced an OPS above 1.217.

After his next 5 PAs of 2009, of course, Pronk’s OPS had dropped to 0.985. Changing the 2006 data to stretches of 29 PAs shows that 2006 Pronk had an OPS above 0.985 in 323 of 536 samples (60.3%).

by FredOx on Apr 14, 2009 11:03 AM EDT
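The consecutive-PA alternative FredOx describes needs the season in game order rather than as totals. Below is a Python sketch with hypothetical function names, using a made-up, shuffled 564-PA season as a stand-in for real game logs. A season of N ordered PAs has N - k + 1 windows of k consecutive PAs, so 564 PAs give 541 windows of 24 and 536 windows of 29. Overlapping windows share most of their PAs, so those values are far from independent of one another.

```python
import random

# Total bases per outcome; a nonzero value also marks a hit.
TOTAL_BASES = {"out": 0, "bb": 0, "1b": 1, "2b": 2, "3b": 3, "hr": 4}


def ops(outcomes):
    """OPS for a list of PA outcomes, ignoring HBP and sacrifices."""
    walks = outcomes.count("bb")
    hits = sum(1 for o in outcomes if TOTAL_BASES[o] > 0)
    at_bats = len(outcomes) - walks
    obp = (hits + walks) / len(outcomes)
    slg = sum(TOTAL_BASES[o] for o in outcomes) / at_bats if at_bats else 0.0
    return obp + slg


def window_ops(season, k=24):
    """OPS of every run of k consecutive PAs in an ordered season."""
    return [ops(season[i:i + k]) for i in range(len(season) - k + 1)]


def share_above(season, k, threshold):
    """Fraction of k-PA windows whose OPS is strictly above `threshold`."""
    values = window_ops(season, k)
    return sum(1 for v in values if v > threshold) / len(values)


# Made-up ordered season (NOT Hafner's real sequence), for illustration only.
rng = random.Random(0)
season = ["out"] * 380 + ["bb"] * 80 + ["1b"] * 65 + ["2b"] * 25 + ["hr"] * 14
rng.shuffle(season)
print(len(window_ops(season, 24)), len(window_ops(season, 29)))  # 541 536
print(share_above(season, 24, 1.217))
```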

Quick question (for my sake): is it clear based on my description what I did and how it addresses my question?

by APV on Apr 13, 2009 5:31 PM EDT

the post is clear. this sentance is not.

by Brick. on Apr 13, 2009 5:32 PM EDT

nor is this sentence.

by Brick. on Apr 13, 2009 5:32 PM EDT

I thought so.

by Manhattan Tribe Fan on Apr 13, 2009 5:36 PM EDT

Wow, Adam. I picked a bad week to stop sniffing glue. Awesome, man.

by afh4 on Apr 13, 2009 5:44 PM EDT

what are you, some kind of nerd?

by Cap'n Snegiryov on Apr 13, 2009 5:46 PM EDT

And I just noticed Brick’s fanshot. Hope. And remember, even though we’ve only won 1 game, we’re only 4 games under .500. As I used to repeat mantra-like as an undergrad, getting behind early just gives you more time to make it up.

by APV on Apr 13, 2009 5:47 PM EDT

2.5 games out with 156 to go!

by dgcambridge on Apr 13, 2009 5:49 PM EDT

Me: lazily cherry pick an encouraging quote from an article I was reading and link it.

You: do substantive analysis, compile and present it.

by Brick. on Apr 13, 2009 5:50 PM EDT

good community = quality + depth

LGT = good community

by APV on Apr 13, 2009 5:51 PM EDT

along those same lines: web 2.0 + indians fans = LGT > any other analysis you can find in the MSM

kind of funny how this went up the same day as jay’s fanshot of the morning journal article, that is, seeing your work thrown up against that “analysis.” you produced something in five minutes that the LMJ guy couldn’t even dream of doing; no wonder the interwebs have the mainstream writers going into a full-blown epistemological crisis. nice job btw.

by Cap'n Snegiryov on Apr 13, 2009 6:08 PM EDT

Good stuff! I’m going to show this to my stats class tomorrow — we’re talking about probability distributions right now.

by Buckeye Brad on Apr 13, 2009 5:51 PM EDT

well…you can tell them there are a few logical alternatives to what I did. Instead of creating a distribution based on 10,000 random draws of 24 (which is the simplest approach computationally), I could have created distributions based on every possible combination of 24 plate appearances (arguably more accurate, but in practice probably not different at all). I also could have looked at every possible combination of 24 consecutive plate appearances (but see above for the data difficulties).

by APV on Apr 13, 2009 5:57 PM EDT
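APV's "every possible combination" option is more tractable than it sounds. OPS depends only on how many of each outcome land in the 24-PA subset. So instead of listing all C(563, 24) subsets, one can enumerate outcome counts and weight each by the number of subsets that produce it (a multivariate hypergeometric count). Below is a Python sketch under the post's simplifications. The function names are hypothetical, and exact fractions are used so that ties with the observed OPS are detected exactly. With six outcome types and 24 PAs, the loop visits 118,755 count patterns.

```python
from fractions import Fraction
from math import comb

# Total bases per outcome type (walks and outs: none).
TOTAL_BASES = {"out": 0, "bb": 0, "1b": 1, "2b": 2, "3b": 3, "hr": 4}


def ops_from_counts(k):
    """Exact OPS from outcome counts, e.g. {"out": 14, "bb": 4, "hr": 3, ...}."""
    pa = sum(k.values())
    walks = k.get("bb", 0)
    hits = sum(k.get(o, 0) for o in ("1b", "2b", "3b", "hr"))
    at_bats = pa - walks
    obp = Fraction(hits + walks, pa)
    total_bases = sum(TOTAL_BASES[o] * n for o, n in k.items())
    slg = Fraction(total_bases, at_bats) if at_bats else Fraction(0)
    return obp + slg


def compositions(total, parts):
    """Yield every tuple of `parts` non-negative ints summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def exact_tail(season_counts, observed, n_pa=24):
    """P(OPS >= observed) over all n_pa-PA subsets of the season (no replacement)."""
    names = list(season_counts)
    all_subsets = comb(sum(season_counts.values()), n_pa)
    at_or_above = 0
    for ks in compositions(n_pa, len(names)):
        ways = 1
        for name, k in zip(names, ks):
            ways *= comb(season_counts[name], k)  # comb() is 0 when k > n
        if ways and ops_from_counts(dict(zip(names, ks))) >= observed:
            at_or_above += ways
    return Fraction(at_or_above, all_subsets)
```

Passing a season's coded counts and `ops_from_counts` of the 2009 line gives the exact tail share. As APV suggests, this should be very close to what 10,000 random draws estimate.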

you’re just going to have to redo it anyway when he goes yard twice off of Greinke tonight.

by Brick. on Apr 13, 2009 6:03 PM EDT

Seriously, and this brings up another point. Watching the past couple games (I guess 3, since Pronk didn’t play in one of them), it occurred to me. I had total confidence in Pronk coming up in that situation yesterday and I couldn’t remember how long it’d been since I felt that way. Then it hit me. I never actually lost that utter faith in the man, it was more like each time he failed anew over the past 2 years a small piece of it died within me, but never completely.

by jakesinger777 on Apr 13, 2009 6:07 PM EDT

I’m not so sold yet. He looked like he was wincing and taking a lot of time in between pitches in the ABs I’ve seen. He could just grimace and wince because he’s a bad dude, but I’m still fearful he’s hurting. Also, he got a day off Saturday – I don’t know if it was to protect him from Halladay, basic rest, or something more ominous.

by joeee on Apr 13, 2009 6:10 PM EDT

Basic rest. The plan is to play him five games a week this early in the season.

by Voltaire on Apr 13, 2009 10:04 PM EDT

I agree. He was atrocious, but I always was thinking “This is the at-bat where he destroys the ball and turns it around.” It was the opposite of Casey Blake, who even when he was doing well, I thought: “This guy is going to strike out looking.”

by OddlyGaussian on Apr 13, 2009 6:20 PM EDT

Rec because I had the EXACT SAME thought process. “Oh, Hafner will take care of this” vs. “Man, Blake’s going to screw this up,” even long past the point where I’d seen enough of both of them to prove otherwise.

by zempf on Apr 13, 2009 10:36 PM EDT

This raises a real issue, though, which is the ‘sensitivity’ of this analysis to sample size. For example, an 0-4 at this point is going to make a significant difference in what Hafner’s performance looks like at the moment. And I’m also not totally convinced on Hafner. I watched him in 4 straight games the penultimate week of spring training and came away decidedly unimpressed. But I’m hopeful.

by APV on Apr 13, 2009 6:38 PM EDT

Dry desert air.

by fleerdon on Apr 13, 2009 6:48 PM EDT

I’m going to blame it on Scott Lewis’s elbow.

by Manhattan Tribe Fan on Apr 13, 2009 6:56 PM EDT

A good community knows how to use the word “penultimate.”

by Manhattan Tribe Fan on Apr 13, 2009 6:56 PM EDT

you know last night was your fault, right?

by Brick. on Apr 14, 2009 12:01 PM EDT

Yes. And the spring cold which I developed pretty rapidly late last night is my instant karma justice

by APV on Apr 14, 2009 12:09 PM EDT

Excellent analysis. Out of curiosity, what software/language did you do this with?

by OddlyGaussian on Apr 13, 2009 6:22 PM EDT

I almost always work in Matlab, just because it’s what I’m familiar/comfortable with, and I like its graphical output capabilities.

by APV on Apr 13, 2009 6:35 PM EDT

My buddy asked me yesterday whether he should pick up Pronk in fantasy baseball. This is way better than my answer of “Do it. He’s got that scary mystique back.”

by supermarioelia on Apr 13, 2009 6:29 PM EDT

This is good news, I think. Though I was made to understand there would be no math.

B-Man would be proud, Adam.

Il faut d'abord durer. (First, one must endure.)

by CU Adam on Apr 13, 2009 7:26 PM EDT

rec for B-man reference. I spent a lot of time in that classroom.

by APV on Apr 13, 2009 7:29 PM EDT

All the time I spent in that classroom was either detention or some theology class being held there for some reason. There’s a reason I became a lawyer. I wish this sort of stuff made any kind of intuitive sense to me. You explain it well and simply, though, and that is very appreciated.

by CU Adam on Apr 13, 2009 7:47 PM EDT

A college buddy of mine said he once drew a full-body Chief Wahoo over an entire one-page B-man quiz. B-man graded each problem individually and gave him a C-minus.

by fleerdon on Apr 14, 2009 9:41 AM EDT

I was horrified to receive his first quiz after I misinterpreted his study tip of “Know Definitions” to be “No Definitions”.

by The DiaTriber on Apr 14, 2009 9:56 AM EDT

hahaha, i remember that. classic B-Man

by Roger Dorn on Apr 14, 2009 12:07 PM EDT

I wanted to repost this here, since APV did an excellent job with his analysis and this is another way of answering the “Is Pronk back?” question.

Well, a quick look at Hit Tracker shows that his line-drive HR on 4/10/09 against TOR was harder hit by speed off the bat (115.8 mph) than the one he hit yesterday (113.1 mph), but yesterday’s went farther of the two: 421 ft true distance versus 379 ft.

He hit one of his five HRs from last season harder (116.2 mph), on 4/4/08 in Oakland.

None of his 24 HRs in 2007 were hit harder than either of the two listed above from this year. Going back to 2006, he had 7 HRs hit at 113.0 mph or above, and only 2 HRs hit harder than the 4/10/09 line drive. 2006 was the year he hit 42 HRs.

All in all, Wedge is pretty spot on regarding Hafner’s power showing already this early in the season. It’s pretty impressive that he has hit those two most recent HRs as hard as he has.

by hans on Apr 13, 2009 7:54 PM EDT

Cool.

Can Pronk pitch?

by Buzz on Apr 14, 2009 10:31 AM EDT
