The Oilers announced the second running of their contest to allow amateur stat-heads the opportunity to play around with their proprietary data. Up for grabs are 10 prizes, ranging from signed jerseys to a chance to work with the Oilers’ analytics group.
From the confidentiality acknowledgement:
“I have and will learn confidential or proprietary information relating to the past, existing and contemplated operations of RSC and the Edmonton Oilers Hockey Club (the “Oilers”), including but not limited to personal, business, hockey, financial, strategic, statistical and investment information about RSC and the Oilers, their respective employees, subsidiaries, parent companies or affiliates, and the websites containing the information in the Hackathon package (the “Websites”) (collectively, the “Confidential Information”).”
I signed up for the contest. I thought it would be an interesting chance to get a sense of how the Oilers use advanced stats to manage their business. To accompany my entry, I am going to write a few blog posts outlining my experience.
Having received the Confidential Information, I am left somewhat disappointed. It is simply a large collection of unprocessed data. Most of it is readily available on the internet. I don’t see a single piece of information in the package that would be consistent with what they suggest we would receive other than a list of websites they find useful (which was interesting to see). Perhaps there is more to come in the future.
From the rules and regulations, the Oilers want me to answer four questions.
- Predict next regular season’s points/game for the players listed in appendix A.
- Predict next season’s even strength save percentage of the goaltenders listed in appendix A.
- Predict the goal differential per regular season game ((goals for less goals against) divided by games played) for all thirty teams for the upcoming season.
- Conduct a predictive analysis of your choice on some dimension of potential value to the Oilers. The analysis must be testable in the upcoming season and will be judged on its difficulty, accuracy, clarity, and value.
Given the data provided, we are left blind as to why the Oilers think these questions are important and how they have attempted to answer the questions in the past. What I don’t understand about the first three questions is how the Oilers end up with something that will influence their actions in the future. Either they are barking up the wrong tree, or this is just a bit of fun busywork for fans.
As mc79hockey points out, the first two questions are going to be extremely difficult to predict in a shortened season. If the Oilers are going to judge my prediction against a shortened 2012-2013 season, I will need a fair bit of luck on my side.
Given the reliance on luck, I will not put much effort into this prediction. After a quick look at the existing research, I will apply some sort of factor to a starting points-per-game assumption for each player. Intuitively, players peak by age 24, so I will likely assume a lower points-per-game figure across the board, except for players aged 23 or younger. Perhaps I will try to earn some bonus points by further reducing my estimate for those players who did not play in Europe during the lockout.
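As a rough sketch of what that adjustment could look like in Python: the two factors below are placeholder values picked purely for illustration, not numbers from any research, and the `Skater` fields and `project_ppg` name are hypothetical.

```python
from dataclasses import dataclass

# Placeholder factors for illustration only; not derived from any data.
DECLINE_FACTOR = 0.95     # applied to players aged 24 or older
NO_EUROPE_FACTOR = 0.97   # extra reduction for players who sat out the lockout


@dataclass
class Skater:
    name: str
    age: int
    starting_ppg: float        # starting points-per-game assumption
    played_in_europe: bool     # played in Europe during the lockout?


def project_ppg(skater: Skater) -> float:
    """Scale the starting points-per-game figure by age and lockout activity."""
    ppg = skater.starting_ppg
    if skater.age >= 24:
        # Past the assumed peak: expect a lower figure.
        ppg *= DECLINE_FACTOR
    if not skater.played_in_europe:
        # No game action during the lockout: shave a little more off.
        ppg *= NO_EUROPE_FACTOR
    return ppg
```

A player aged 23 or younger who played in Europe keeps his starting figure unchanged; everyone else is scaled down by one or both factors.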
For the second question, even strength save percentage, I have read a fair bit about how little separates goalies in the NHL. For the most part, save percentage is driven by the team and not the goalie. For this one, I will probably start with the league average save percentage and use it to pull each goalie's 2011-2012 figure toward the mean. I will also look at career averages for the few goalies that consistently beat or fall short of the league average. Call it the Henrik Lundqvist and Steve Mason adjustments.
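One simple way to express that "pulling force" is a weighted blend of the observed figure and a target, where the weight grows with the number of shots faced. This is only a sketch under assumptions: the `regress_to_mean` name and the `prior_shots` parameter (how many shots' worth of weight the target gets) are hypothetical, and the numbers in the usage lines are made up.

```python
def regress_to_mean(observed: float, shots: int, target: float, prior_shots: int) -> float:
    """Blend an observed save percentage with a target value.

    target      -- value to pull toward (league average for most goalies,
                   career average for the few consistent outliers)
    prior_shots -- assumed strength of the pull, expressed in shots
    """
    weight = shots / (shots + prior_shots)
    return weight * observed + (1 - weight) * target


# Made-up example: a goalie at .930 on 1,000 shots, pulled toward a .920 target.
estimate = regress_to_mean(0.930, 1000, 0.920, prior_shots=1000)
print(round(estimate, 3))
```

The larger `prior_shots` is, the more every goalie's estimate collapses toward the target, which is the behaviour you want if goalies really are hard to tell apart.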
The third question asks for a prediction of the goal differential for all 30 teams. Intuitively this is an important question as goal differential will drive the number of points a team has. Teams that score more goals than they allow make the playoffs. Teams that allow more goals hit the golf course with the Oilers in early spring.
The fundamental question that is missing is: what drives goal differential? I expect I will use Corsi, PDO or something similar to build up my expectation of goal differential. In a shortened season, it will be a crapshoot.
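For reference, here is a minimal Python sketch of the quantities involved, using the common definitions: PDO as shooting percentage plus save percentage, and Corsi as a share of shot attempts. The last function is one possible way to turn shot totals into an expected goal differential, by assuming every team shoots and saves at the league average rate; that is an illustrative assumption, not a settled method, and all function names are hypothetical.

```python
def goal_diff_per_game(goals_for: int, goals_against: int, games_played: int) -> float:
    """(Goals for less goals against) divided by games played."""
    return (goals_for - goals_against) / games_played


def pdo(goals_for: int, shots_for: int, goals_against: int, shots_against: int) -> float:
    """Shooting percentage plus save percentage, as fractions (about 1.0 on average)."""
    shooting_pct = goals_for / shots_for
    save_pct = 1 - goals_against / shots_against
    return shooting_pct + save_pct


def corsi_for_pct(attempts_for: int, attempts_against: int) -> float:
    """Share of all shot attempts (on goal, missed, blocked) taken by the team."""
    return attempts_for / (attempts_for + attempts_against)


def expected_goal_diff_per_game(shots_for_pg: float, shots_against_pg: float,
                                league_shooting_pct: float) -> float:
    """Assumes league-average shooting and save percentages for every team,
    so only the shot differential drives the expected goal differential."""
    return (shots_for_pg - shots_against_pg) * league_shooting_pct
```

Under that assumption, a team with a PDO well above 1.0 would be expected to post a worse goal differential next season than it did last season, and vice versa.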
The fourth question is a chance for me to put together some sort of predictive analysis for their consideration. At this point, I do not know what I will do here, but it will likely be in the spirit of something fairly quick. There is a layer of dust on some power play work I attempted a couple of years ago. Maybe I use that as a predictor of where each team will end up in the standings.
Next week I think I will answer the first couple of questions. I will report back on how I ended up making my predictions.