The Leaguevine Blog


Power Rankings in Ultimate

Posted on October 8th, 2012 by Mark "Spike" Liu
This is a guest post by tournament director Chris Schaffner.

The Question

What is the best way to rank ultimate teams based on their performance? This fundamental question can be asked both for a whole season and for a single tournament. While USA Ultimate's use of a season-based ranking algorithm has led to considerable discussion, in this post I will focus on the simpler case of ranking teams at a tournament, where factors such as home-field advantage and rosters changing over a season play a smaller role. One possibility, currently used in the Swissdraw format, is that teams earn "swiss points" at the completion of every game. The number of swiss points awarded depends on the point differential (also called the margin) of the game. So far, we have been using the following table to convert point differentials into swiss points:



There are a number of advantages in using the innovative Swissdraw format compared to the more common pool format. All teams can potentially play each other. The Swissdraw format is designed so that teams of similar strength match up quickly, and within a few rounds the ranking of the teams represents their level of play. This guarantees attractive games against different opponents of comparable strength. While I am personally a big fan of the Swissdraw format, I will argue that the currently used ranking system has a problem: teams are awarded the same number of swiss points for a given point differential, regardless of the strength of the opponent. This drawback has particularly bad consequences in big divisions with a wide spread in level of play, where even in later rounds of Swissdraw teams can still make big jumps in the ranking by winning or losing by big margins. In this post, I would like to suggest another method of ranking the teams which will make the Swissdraw format work even better.

Power Rankings

This method was suggested as early as 1976 by Leake in the context of ranking American college football teams. A nice mathematical explanation of it can be found in Chapter 4 of Ken Massey's undergraduate thesis.

The basic assumption is that every team can be assigned a numerical value representing its strength (or power) so that the point differential in a game is the difference in strength of the participating teams. For example, if Team Alice wins against Team Danny with a score of 15-10, this result could be explained by assigning a strength of +2.5 to Team Alice and a strength of -2.5 to Team Danny. (Of course, any two numbers with a difference of +5 would work, but let us try to keep the numbers as small as possible in absolute value.)

If there are more teams with many games played among them, it will become more difficult to assign strengths to the teams, but we can nevertheless try to optimize these numbers so that they fit the outcomes as well as possible. In fact, this problem is well-studied in the area of mathematical regression.
In more mathematical terms, we assume that the game outcome $y_{ij}$ between teams $i$ and $j$ is the difference of their respective strengths, $\beta_i - \beta_j$, plus an error term $\epsilon_{ij}$ that is independent and identically normally distributed for every game. Expressed in matrix form, we can write
$$y = X\beta + \epsilon$$
where $y$ is a column vector containing all game margins, and each row of the matrix $X$ is all-zero except for a $+1$ in column $i$ and a $-1$ in column $j$. The column vector $\beta$ denotes the strengths of the teams and $\epsilon$ is the column vector of normally distributed error terms.
There exist efficient methods (such as the least-squares method) to compute a strength vector $\beta$ that minimizes the sum of the squared errors:
$$\lVert y - X\beta \rVert^2.$$
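To make the matrix formulation concrete, here is a minimal sketch in Python. It assumes NumPy is available; the function name `power_ratings` and the `(team_i, team_j, margin)` input format are my own choices for illustration and are not taken from any existing Swissdraw software. It builds the matrix with one row per game and lets `numpy.linalg.lstsq` find the strengths.

```python
import numpy as np


def power_ratings(teams, games):
    """Least-squares strengths from (team_i, team_j, margin) tuples.

    margin is team_i's score minus team_j's score.
    """
    column = {team: k for k, team in enumerate(teams)}
    X = np.zeros((len(games), len(teams)))
    y = np.zeros(len(games))
    for row, (team_i, team_j, margin) in enumerate(games):
        X[row, column[team_i]] = 1.0   # +1 in column i
        X[row, column[team_j]] = -1.0  # -1 in column j
        y[row] = margin
    # X is rank-deficient (adding a constant to every strength changes
    # nothing), so lstsq returns the minimum-norm solution. Its strengths
    # sum to zero within each group of teams connected by games.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return {team: float(b) for team, b in zip(teams, beta)}


# The 15-10 game from the text: Alice comes out near +2.5, Danny near -2.5.
print(power_ratings(["Alice", "Danny"], [("Alice", "Danny", 15 - 10)]))
```

Picking the minimum-norm solution is one way of keeping the numbers "as small as possible"; pinning one team's strength to zero would give the same differences between teams.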

Example

Let us consider a simple example based on a made-up tournament with six teams. The game outcomes reflect the imaginary fact that their level of play is about evenly spread out among the top three teams and the bottom three teams where Team Alice is the best and Team Fred the worst team.
The results of the first round are as follows:



resulting in the following strength values and swiss points:


PowerRank denotes the team's rank according to their strength, whereas SwissRank is the team's rank according to the amount of Swiss points earned so far.

All game outcomes can be perfectly explained with those strengths. After the first round (and assuming no prior knowledge of the strength of these teams), it is impossible to compare Team Alice with Team Bob, because there is no connection between them yet.
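Whether two teams can be compared at all comes down to whether a chain of games links them. As a rough sketch, this can be checked with a small union-find over the games played so far. The function name `connected_groups` and the pairings below are hypothetical, chosen only for illustration; they are not the pairings from the example tables.

```python
def connected_groups(teams, games):
    """Group teams that are linked by a chain of played games (union-find)."""
    parent = {team: team for team in teams}

    def find(team):
        while parent[team] != team:
            parent[team] = parent[parent[team]]  # path halving
            team = parent[team]
        return team

    for team_i, team_j in games:
        parent[find(team_i)] = find(team_j)

    groups = {}
    for team in teams:
        groups.setdefault(find(team), []).append(team)
    return list(groups.values())


teams = ["Alice", "Bob", "Charlie", "Danny", "Eve", "Fred"]
# Hypothetical pairings, NOT the ones from the example tables.
round_1 = [("Alice", "Danny"), ("Bob", "Eve"), ("Charlie", "Fred")]
round_2 = [("Alice", "Bob"), ("Charlie", "Danny"), ("Eve", "Fred")]

print(len(connected_groups(teams, round_1)))            # 3 separate pairs
print(len(connected_groups(teams, round_1 + round_2)))  # 1: all comparable
```

With these pairings, strengths after round 1 are only meaningful within each pair, while after round 2 every team is linked to every other one.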

After a second round with the following results:



we can compute the following strength values, Swiss points and corresponding ranks:


Notice that Team Charlie would be ranked first if sorted according to swiss points, as it had the largest margin of +5 in the second round among the winners of the first round. Analogously, Team Danny would be ranked last. However, the new method re-evaluates all previous games in light of the latest results, with the (correct) outcome that assigning the biggest strength to Alice gives the best explanation of the results. In fact, the power ranks after only two rounds already reflect the order of teams we had in mind when making up the results.


The seventh column (entitled "predicted margin") is the difference in current strength of the two teams involved in a particular game, which can be interpreted as the margin predicted by the strengths. The values in the last column are the squared differences between the observed and predicted margins. If such a value is high, the model could not predict this game outcome well. Hence, big values stand for surprising game outcomes.
The least-squares procedure tries to find strength values that minimize the sum of the surprise values in the last column.
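The predicted margin and the surprise value are simple to compute once strengths are known. The sketch below uses made-up strengths and games (only the 15-10 game between Alice and Danny comes from the text above); the function name `evaluate_games` is hypothetical.

```python
def evaluate_games(strengths, games):
    """Return (surprise, team_i, team_j, actual, predicted) rows,
    sorted with the most surprising game first."""
    rows = []
    for team_i, team_j, actual in games:
        predicted = strengths[team_i] - strengths[team_j]
        surprise = (actual - predicted) ** 2
        rows.append((surprise, team_i, team_j, actual, predicted))
    return sorted(rows, key=lambda row: row[0], reverse=True)


# Illustrative strengths, NOT the values from the example tables.
strengths = {"Alice": 2.5, "Danny": -2.5, "Eve": -1.0}
games = [("Alice", "Danny", 5), ("Alice", "Eve", 6)]

for surprise, team_i, team_j, actual, predicted in evaluate_games(strengths, games):
    print(f"{team_i} vs {team_j}: actual {actual:+}, "
          f"predicted {predicted:+.2f}, surprise {surprise:.2f}")
```

Here the Alice vs Eve game is the more surprising one: the strengths predict a margin of +3.5 but the game ended at +6, giving a surprise value of 6.25, whereas the Alice vs Danny game is predicted exactly.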

Playing one more round with results:



gives:


By now the teams are clearly separated in strength. However, notice that the Swiss points still do not reflect the strengths of the teams correctly. Sorting by number of wins as the first criterion (and Swiss points as the second) would put Team Alice in first place (it is the only team with three wins), but it would still place Team Eve ahead of Team Danny (both have one win and two losses).

Let us examine the example graphically in the following chart. Clicking on series in the legend will toggle its visibility. Clicking on particular points in the chart will show detailed explanations how that strength was obtained.
[Chart: Strength of each team over Rounds 1-3 (x-axis: Rounds, y-axis: Strength). Series: Team Alice, Team Bob, Team Charlie, Team Danny, Team Eve, Team Fred. Tooltip values captured with the page: Team Alice 3 (1st), Team Bob 2.33 (2nd), Team Charlie 2.33 (2nd), Team Danny -2.33 (4th), Team Eve -2.33 (4th), Team Fred -3 (6th).]
For comparison, let us consider the graph of average Swiss points:
[Chart: Average Swiss scores of each team over rounds R1-R3 (x-axis: Rounds, y-axis: Average Swiss Scores). Series: Team Alice, Team Bob, Team Charlie, Team Danny, Team Eve, Team Fred. Tooltip values captured with the page: Team Alice 18 (2nd), Team Bob 17 (3rd), Team Charlie 20 (1st), Team Danny 10 (6th), Team Eve 13 (4th), Team Fred 12 (5th).]
The Swiss score does not reflect the correct order of the teams, neither after round 2 nor after round 3. In contrast, the power ranking "gets it right" already after two rounds. The frequent crossings of the lines indicate that teams make jumps in their placement, as illustrated here: (click on series in the legend to toggle visibility of the power ranks)
[Chart: Ranks of each team over rounds R1-R3 (x-axis: Rounds, y-axis: Rank), with one Swiss-rank series and one power-rank series per team for Team Alice, Team Bob, Team Charlie, Team Danny, Team Eve and Team Fred. Tooltip values captured with the page: Team Alice 2, Team Bob 3, Team Charlie 1, Team Danny 6, Team Eve 4, Team Fred 5.]


Here are the final evaluations of the games, based on the teams' strengths after Round 3, sorted by the upset/surprise value.


The first line can be read as follows: according to the strengths computed from all game results, the biggest surprise of all games happened in the round-2 game in which Team Charlie won against Team Danny with a margin of +5, whereas the model predicted a margin of only +3.73.

Equipped with all this knowledge you can dive into the power scores for Windmill Windup and Wisconsin Swiss provided in the Further Analysis section at the end.

Conclusion

There exists a wide variety of sports ranking systems. Most of these systems do not reveal the details of their algorithms. The power-rating system presented here is a very basic variant that I suggest using for ranking teams, e.g. in the Swissdraw phase of an ultimate tournament. As outlined in Chapter 4 of Massey's thesis, the system could be extended in various ways to account for things like home-field advantage, blow-out scores etc. As illustrated in the example above, it converges faster than swiss points to the real ranking of the teams.

Here is a list of pros and cons compared to the currently used swiss-point system:
Pros:
  1. Your strength depends on the performance of your opponents. A large win against a strong opponent counts more than against a weak one.
  2. Converges faster to the "real" ranking, smaller jumps in rankings from one round to the next.
  3. Strengths say more about teams than swiss points: the difference in strength of two teams directly predicts the game outcome and point differential, which also gives us a sense of which results are "surprising". This information might be of interest to spectators at a tournament or when reporting on it afterwards.
Cons:
  1. Difficult to understand
  2. Games of previous rounds are "re-evaluated", so you never have a certain amount of points for sure.
  3. Your strength depends on the performance of your opponents.
I think that the first concern can be mitigated by providing interactive graphs like the ones above to help explain to the teams how their strength was computed. The other two disadvantages are inherent to the system.

I am very curious to hear what you think about the suggested power-ratings in Ultimate. Do you see more advantages and disadvantages? Please leave your comments below.

Further Analysis

To see how these power rankings apply to events in the past, please take a look at the in-depth analyses of 5 popular tournaments:

Further Discussion

If you want to get involved in discussions about ranking teams and devising optimal ranking algorithms, please take a second to join this new Google Group: https://groups.google.com/forum/#!forum/sport-stats

References:

  • Leake, R. J. (1976), "A Method for Ranking Teams with an Application to 1974 College Football," Management Science in Sports, North Holland.
  • Massey, Kenneth (1997), "Statistical Models Applied to the Rating of Sports Teams", Honors Project: Mathematics, Bluefield College. http://masseyratings.com/theory/massey97.pdf
Christian Schaffner has been managing the Swissdraw schedule and scores since 2009 at Windmill Windup, Europe's largest grass tournament. Since then, he has been involved in promoting and advancing the Swissdraw format for Ultimate tournaments. In his everyday life, he is a researcher in quantum cryptology at the University of Amsterdam.