Yesterday, I spent a couple of hours answering question #2 of the Oilers Hackathon.
“Predict next season’s even strength save percentage of the goaltenders listed in appendix A.”
My approach to this question was to keep my analysis fairly short and simple. I give the illusion of precision, but the reality is a bunch of guesswork. A goalie’s save percentage has a lot to do with luck and his teammates, and a bit to do with his skill and his age.
Appendix A comprises a substantial number of NHL goalies. I decided to look at the average of the sample population for the past three years and use that as my starting point for each goalie. That number was 0.922.
I then pulled the even strength save percentage for the past three seasons for each goalie and computed the average. If the average was within 0.003 of the overall average (0.919 to 0.925), I simply assumed the goalie would regress to the mean for the upcoming season. For Appendix A, this applied to 50% of the goalies.
Regression was the overall theme of my methodology: that invisible force that pulls the great ones back to earth and props the bottom feeders up with a bit of hope.
For those goalies outside of that range, I assumed their skill level was consistently better or worse than average. Roughly 25% were above average and 25% were below average.
I broke the rest of the goalies into two groups. The top/bottom three goalies were assumed to regress 25% towards the mean (my so-called “Henrik Lundqvist/Steve Mason rule”)*. If a goalie had a three-year average even strength save percentage of 0.930, my estimate for the upcoming season was 0.928 = 0.930 - ((0.930 - 0.922) x 25%).
* Poor Steve Mason. He is the first person I think of when I think of shoddy goaltending (well, perhaps Red Light Racicot comes to mind a little bit quicker). I checked Mason’s numbers and there were a couple of guys in Appendix A that were worse. Trade him to Detroit and he would magically improve and be one of the great comeback stories. Play in Detroit long enough and some media folks would start talking about the Hall of Fame.
I assumed a 50% regression towards the mean of 0.922 for the rest of the population. My hope was to end up with an overall average save percentage of 0.922 for the upcoming year. That is in fact what I ended up with. I note that my 25% and 50% regression assumptions are not backed by any empirical data. I do not think that extra effort would result in a materially better estimate.
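For anyone who wants to play with the numbers, here is a minimal sketch of the rules described above, written in Python. The function name `project`, the `is_extreme` flag, and the rounding to three decimals are assumptions made for illustration and are not taken from the original analysis. The only figures from the post are the 0.922 mean, the 0.003 band, and the 25% and 50% regression shares.

```python
# Sketch of the regression-to-the-mean rules from the post.
# Names and rounding choices are hypothetical, for illustration only.

LEAGUE_MEAN = 0.922    # three-year average of the Appendix A sample (from the post)
BAND = 0.003           # goalies this close to the mean are simply set to the mean
EXTREME_SHARE = 0.25   # regression share for the top/bottom three goalies
DEFAULT_SHARE = 0.50   # regression share for everyone else outside the band


def project(three_year_avg, is_extreme=False):
    """Return a projected even strength save percentage.

    three_year_avg: the goalie's average over the past three seasons.
    is_extreme: True if the goalie is one of the top/bottom three.
    """
    gap = three_year_avg - LEAGUE_MEAN
    # Round before comparing so floating point noise does not push
    # a boundary value such as 0.925 outside the band.
    if round(abs(gap), 4) <= BAND:
        return LEAGUE_MEAN
    share = EXTREME_SHARE if is_extreme else DEFAULT_SHARE
    # Move the goalie part of the way back towards the mean.
    return round(three_year_avg - gap * share, 3)


if __name__ == "__main__":
    # The worked example from the post: 0.930 regressed 25% towards 0.922.
    print(project(0.930, is_extreme=True))
    # The same average under the 50% rule, and one inside the band.
    print(project(0.930))
    print(project(0.920))
```

The `is_extreme` flag is passed in by hand because ranking the goalies requires the full Appendix A list, which is not reproduced here.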
That is about it for question #2. Question #3 on team goal differentials should be a bit more interesting. I have not given much thought to my strategy yet, but would like to do something a bit more involved. Hopefully there will be some existing research out there that will be of use.