I am now three quarters of the way through the Oilers Hackathon contest questions. I received an email last week letting me know that I am competing against almost 500 other entrants. The deadline of February 15th is subject to change pending the resolution of the CBA negotiations. Translation: we might cancel the contest!! That would be a nice piece of PR for the Oil.
One interesting piece of the email was an indication that the 10 winners would be required to sign the confidentiality agreement before receiving their prize. The implication is that we aren’t subject to the CA at this point in time. I am going to revise my post on question #1 and discuss the nature of the players in Appendix A and what I think the Oilers are doing. I will hold off on actually listing the players for now.
For question #3, I will lay out exactly what I did and include my predictions for each team. Below is question 3.
Predict the goal differential per regular season game ((goals for less goals against) divided by games played) for all thirty teams for the upcoming season.
I have a tendency to jump in and try to solve a question before taking some time to think about it and look at what might be out in the public domain. Such was the case here, as I made three different attempts at answering before looking around. There was some interesting stuff out there, but I couldn’t find a magic pill that would guide me to a good prediction. I made a fourth prediction. All four will be discussed in this post.
The fundamental challenge to this question is that there are so many things that go into goals scored for a team. Off the top of my head:
- Shots directed towards the goal
- Shots on goal
- Shooting percentage
- Number of power plays
- Effectiveness of the power play
- Change in team personnel
- Game situation (close game vs. a blowout)
For goals against, just spin that list around.
This question suggests to me that the Oilers are looking at statistical analysis from the wrong direction. To be curious about goal differential is not really conducive to actual decision making. Let’s say a contestant perfectly predicts the goal differential for all thirty teams. What next?
The Oilers would be much better off looking at one or more of the factors that lead to goals being scored. That work might lead to a subtle change in strategy or a change in their roster. Bye-bye Ben Eager and Darcy Hordichuk! The glass-half-full view is that there seem to be some easy fixes for the Oilers that an analytics department could identify. Unfortunately, the glass seems to be well below half full, with management not terribly interested in original thinking.
This question brought to bear my limitations as they relate to statistical analysis. I simply do not have the ability to pull together and manipulate large amounts of data. It is one part lack of computer skills and one part deficiency in how I think about problems. I tip my hat to those out there who can really data mine and get us a little closer to answering some of the questions out there.
For my answer, I at least attempted to leave any personal biases at the door. I avoided using my gut or tweaking the estimates to make them fit my eye a little better.
To summarize, I made four attempts to predict the upcoming season’s goal differential per game for each team.
- Simply used the average Goal Differential (“GD”) for the past three years as my estimate.
- Adjusted the three-year average GD by the full GD movement between the last two seasons.
- Adjusted the three-year average GD by 50% of the GD movement between the last two seasons.
- Started with last year’s goal differential per game for each team and adjusted each team by a random amount, based on the +/- range of team GD changes year to year.
At the beginning I did not have the above methodologies in mind. I started out by defining goals for and goals against: shots on goal x shooting percentage. This left me with four variables to estimate: shots for, shooting percentage, shots against, and save percentage.
I decided to take a look at PDO over time for each team. From behindthenet.ca:
“PDO is the sum of “On-Ice Shooting Percentage” and “On-Ice Save Percentage” while a player was on the ice. It regresses very heavily to the mean in the long-run: a team or player well above 1000 has generally played in good luck and should expect to drop going forward and vice-versa.”
PDO is also used at a team level. What I wanted to do was take a look over time to get a sense if teams had a consistent PDO or not.* If so, perhaps that would solve my shooting and saves percentage estimates in my equation. For each team, I looked at their PDO for the past three seasons and past seven seasons (For Anaheim I made a mistake and looked at the past six seasons. I was too lazy to rebuild their data with the missing season).
* I bastardized PDO a bit by framing it as a percentage: 100% equals 1000 PDO, 99% equals PDO of 990, etc. That’s what I get for doing the work and then looking up the precise definition of PDO.
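To make the definition concrete, here is a minimal sketch of a team-level PDO calculation; the season totals below are made up for illustration, not actual NHL data:

```python
def team_pdo(goals_for, shots_for, goals_against, shots_against):
    """Team-level PDO: on-ice shooting percentage plus on-ice save
    percentage, scaled so that league average sits around 1000."""
    shooting_pct = goals_for / shots_for
    save_pct = 1 - goals_against / shots_against
    return round((shooting_pct + save_pct) * 1000)

# Hypothetical season line: 230 GF on 2400 SF, 200 GA on 2300 SA
print(team_pdo(230, 2400, 200, 2300))  # -> 1009, a touch above average
```

Expressed as a percentage the way I did it, the same team would sit at roughly 100.9%.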
I found this chart quite interesting. Team PDO appears to be fairly consistent over time. Some teams are consistently below 100% and some are above.* This ruled out any thoughts I may have had of assuming a constant shooting and save percentage across all teams (which would have left shots for and against as the key variable).
* Detroit is most interesting to me. They consistently have a PDO less than 100% (1000) yet are among the top teams each year. The recipe is fairly simple: outshoot your opponents significantly.
Teams that saw their PDO improve relative to their seven year average: Vancouver, Boston, Washington, Rangers, Detroit, St. Louis, Los Angeles, Phoenix, Dallas, Carolina, Colorado, Tampa, Minnesota and Toronto.
Teams that saw their PDO decline: Edmonton, Islanders, Blue Jackets, Winnipeg, Florida, Ottawa, Anaheim, Calgary, Montreal, New Jersey, Buffalo, Philadelphia, Pittsburgh, San Jose and Chicago.
There are exceptions, but the above is directionally a decent overview of each team’s performance over the past three years. For my analysis, this led me to the conclusion that I am better off using a team’s shooting and save percentage for the past three years vs. the past seven years.
I then took a look at each team’s shots for and against, again looking at three and seven years. I did not compile this work into a summary to paste into the post, so you will have to take my word for what it shows.
Unlike PDO, the number of shots taken changes a fair bit over time and year to year. My brain did try to form patterns, looking for teams improving their shot totals and climbing up the standings. Generally speaking, increasing shots for or decreasing shots against is a recipe for an improved goal differential. Stop the presses!!
For the purposes of my answer, I was only interested in deciding between a three-year and seven-year average. I went with the three-year average, which is consistent with my use of the three-year PDO (actually, I am using three-year shooting and save percentages).
Summarizing this bit of work, all I am really doing is using the average of the prior three years as my estimate for the upcoming year. The attached chart summarizes with a comparison to each team’s actual ranking last year. GD/G equals goal differential per game. It equals the three year average: (shots for) x (shooting percentage) – (shots against) x (1-save percentage).
Originally, I wanted to adjust the above predicted GD/G for each team by some factor. I looked at the change in PDO over time and assumed the larger variances would repeat in the upcoming year. The end result was not all that different than the above chart and I had absolutely nothing to support my factors. So I scrapped it.
I do like the fact the estimate shows a bit of regression for teams like Washington, Chicago, New Jersey, St. Louis and Edmonton. That said, the regression in my estimate seems way too simple. I would be surprised to see Washington and Chicago move back to the top of the standings. I would be even more surprised to see Edmonton go right back to the basement.
The goal, however, is not to make an estimate that fits my eye. The next chart shows the 2011-2012 goal differential rankings against 2010-2011. It does show that there will be a handful of teams that move significantly from one year to the next. It confirms that I am wrong to even attempt to eyeball a predicted list for reasonableness.
So I moved on to my second attempt at a prediction. I started with the same foundation of the goal differential for the past three years. I then pulled the year over year movement in goal differential over the past two seasons. To derive my second estimate, I assumed the 2012-2013 goal differential change would be the same as it was last season.
The attached really could be summarized as ‘anti-regression’. Teams that made a big move in one direction are assumed to do so again. St. Louis rockets to the top of the rankings, while Ottawa, Colorado and Edmonton make big jumps up the standings. On the other side of the ledger, Tampa zooms all the way down to the bottom of the standings. It really does not make any sense.
Attempt number two is out and I was on to my third attempt at a prediction. I again used the goal differential for the past three years and the movement in goal differential from last season. This time I assumed the 2012-2013 goal differential movement would be half of what it was last season. Thought process simplified: my second attempt was crazy; maybe cutting the movement in half would be way less crazy.
The above does result in much smaller movements from the prior year. Top teams remain at the top and bottom teams stay near the bottom. This approach seems to look a bit more reasonable than simply assuming the same movement as the prior year. But does that mean it actually makes sense? Probably not. To the scrap pile with this attempt.
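Attempts two and three differ only in how much of last season’s movement gets carried forward, which a single factor can capture. The figures below are illustrative, not any team’s actual numbers:

```python
def adjusted_estimate(three_year_avg_gd, last_season_move, factor):
    """Attempt 2 uses factor=1.0 (repeat last season's GD/G movement in
    full); attempt 3 uses factor=0.5 (carry forward half of it)."""
    return three_year_avg_gd + factor * last_season_move

# Hypothetical team: three-year average GD/G of 0.10, which improved
# by 0.30 goals per game between the last two seasons.
print(adjusted_estimate(0.10, 0.30, 1.0))  # attempt 2: about 0.40
print(adjusted_estimate(0.10, 0.30, 0.5))  # attempt 3: about 0.25
```

With factor=1.0 every big mover keeps moving; with factor=0.5 the movement is damped but still pointed in the same direction, which is why both attempts ended up on the scrap pile.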
At this point, I had spent a few hours working through the numbers and a couple of hours writing about it. Then Christmas came and I put my computer away for a couple of days. Coming back to my work today, I decided to throw out everything I have done. I leave the first three attempts in this post in hopes of being ridiculed. If another contestant reads this post and it helps their thinking, that would also be fine with me.
Where I finally landed was that I have no ability to predict how teams will fare next year. The only solution to that is to take out of my hands any ability to influence the numbers. Enter stage right, a random number generator.
I assembled a list of teams based on their 2011-2012 goal differential figures and included the movement from the prior year. The movement in the league ranged from -0.65 to +0.75 goals per game. I learned that if your version of Excel is old enough, the “RANDBETWEEN” function doesn’t exist, so I went -0.65 to +0.65 as my range of possible movement for the upcoming season.
The only influence I put on the process was the decision that the top four teams could not have a better goal differential per game in 2012-2013 and the bottom four teams could not fare any worse. Those eight teams are highlighted in yellow. My thinking is that there is a natural floor and ceiling to how good or bad a team can be. I can’t fathom a team having a goal differential per game greater than 1.0, for example.
For the other 22 teams, I first used a simple random draw to determine whether each team’s GD would increase or decrease from the previous season, then ran the random number generator to set the size of each movement.
The only constraint to the process was that the sum of my team goal differentials needed to be very close to zero. The first time I ran it I had an overall league goal differential of 2.25. This of course is impossible. So I hit the F9 key a couple hundred times until the sum was 0.02. Close enough.
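As a sketch of this final methodology: draw a random movement for each team, clamp the constrained teams at the top and bottom, and re-roll the whole league until it nets out near zero, much like mashing F9. The toy six-team league and its numbers below are placeholders, not the actual chart data:

```python
import random

def random_projection(last_gd, top, bottom, span=0.65, tol=0.05, seed=None):
    """last_gd: {team: last season's GD per game}.
    top: teams whose GD/G may not improve; bottom: teams whose GD/G
    may not worsen. Re-draws the whole league until total GD/G is
    within tol of zero, since league-wide GD must net to nothing."""
    rng = random.Random(seed)
    while True:
        projection = {}
        for team, gd in last_gd.items():
            move = rng.uniform(-span, span)
            if team in top:
                move = -abs(move)   # elite teams cannot get better
            elif team in bottom:
                move = abs(move)    # basement teams cannot get worse
            projection[team] = gd + move
        if abs(sum(projection.values())) < tol:
            return projection

# Toy six-team league standing in for the real thirty
last_gd = {"A": 0.60, "B": 0.35, "C": 0.10, "D": -0.05, "E": -0.40, "F": -0.60}
proj = random_projection(last_gd, top={"A"}, bottom={"F"}, seed=7)
```

The `while` loop plays the role of my couple hundred F9 presses: it keeps re-drawing until the league balances to within the tolerance.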
The resulting list has 12 of the 16 playoff teams from 2011-2012 returning to the dance. Stevie Yzerman’s Tampa Lightning shake off a poor season to finish 12th overall. The list projects a big drop off for San Jose and the defending Stanley Cup Champion Los Angeles Kings. Sadly, it also predicts a return to the basement for our Oilers. That may earn me negative points on my Hackathon entry. Who is projected to go #1 overall next year?
To summarize, I made four attempts to predict the upcoming season’s goal differential per game for each team.
- Simply used the average GD for the past three years as my estimate. This approach is OK, but fails to account for teams on the rise or on the decline.
- Adjusted the three-year average GD by the full GD movement between the last two seasons. This “anti-regression” approach is clearly incorrect.
- Adjusted the three-year average GD by 50% of the GD movement between the last two seasons. This approach is less wrong than #2, but still wrong.
- Started with last year’s goal differential per game for each team and adjusted each team by a random amount, based on the +/- range of GD change year to year. The top four teams’ GD cannot improve and the bottom four teams’ GD cannot get worse. This approach assumes that it is impossible to predict future GD per game and results in an almost bias-free prediction.
Unless I come back and revisit this question, I am going to submit the table above that represents methodology number four. I will get going on question #4 early in the New Year and will write about it in a couple of weeks. I hope everyone is having a nice holiday season.