The Elo rating system originated in the mid-1900s and has since been predominantly used in chess rankings. On occasion Elo has been used for other sports, or for video games (e.g. League of Legends used the system until just recently). The system itself is very basic; it’s entirely based on results against other players. Simply put, your new rating is determined by the rating of your opponent, your expected performance given your opponent’s rating and your own, and your actual result (win, loss, draw, or anywhere in between) in the game itself.
Let’s get right into the math before discussing how it can be applied to tennis. First off, there are two numbers that can basically be picked arbitrarily: the maximum possible change per match (called the K-factor, typically set between 16 and 32), and the average rating (typically 1,000, 1,500, or some similarly round number). The system itself can be broken down into two formulas, one for your expected result and one for your new rating. First some declarations:
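Let $R_A$ and $R_B$ represent the ratings of Player A and Player B, respectively.
Let $E_A$ and $E_B$ represent the expected scores of Player A and Player B, respectively (between 0 and 1, i.e. between a loss and a win).
Let $S_A$ and $S_B$ represent the actual scores of Player A and Player B, respectively.
So the expected scores ($E_A$ or $E_B$) are based off of the logistic curve (i.e. players’ expected scores should fall along the logistic curve). In the standard formulation, which uses a scale of 400 rating points:
$$E_A = \frac{1}{1 + 10^{(R_B - R_A)/400}}, \qquad E_B = \frac{1}{1 + 10^{(R_A - R_B)/400}}$$
Given the expected score, we can easily calculate the new rating by combining it with the actual result, where $K$ is the K-factor and $R'_A$, $R'_B$ are the new ratings:
$$R'_A = R_A + K\,(S_A - E_A), \qquad R'_B = R_B + K\,(S_B - E_B)$$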
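To make the two formulas concrete, here is a minimal Python sketch. It assumes the standard Elo formulation (a logistic curve in base 10 with a scale of 400 rating points) and a K-factor of 32; the function names `expected_score` and `updated_rating` are mine, chosen for illustration only.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of Player A against Player B, between 0 and 1.

    Assumes the standard Elo logistic curve with a 400-point scale.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def updated_rating(rating: float, expected: float, actual: float, k: float = 32.0) -> float:
    """New rating: the old rating moved by at most k toward the actual result."""
    return rating + k * (actual - expected)


# Example: a 1600-rated player beats a 1400-rated player.
r_a, r_b = 1600.0, 1400.0
e_a = expected_score(r_a, r_b)  # about 0.76
e_b = expected_score(r_b, r_a)  # about 0.24, the two always sum to 1

new_r_a = updated_rating(r_a, e_a, actual=1.0)  # the favorite gains a little
new_r_b = updated_rating(r_b, e_b, actual=0.0)  # the underdog loses the same amount

print(f"E_A = {e_a:.3f}, E_B = {e_b:.3f}")
print(f"new R_A = {new_r_a:.1f}, new R_B = {new_r_b:.1f}")
```

Note that because the expected scores sum to 1 and the actual scores sum to 1, whatever one player gains the other loses, so the average rating across the pool stays put.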
Now let’s think about how to apply this model to tennis. First, remember what we are trying to do: our model needs not one but two separate Elo ratings for each player, one on serve and one on return. Second, we need to recognize how the inputs and outputs will work. For Elo we need the players’ current ratings as input. However, we cannot use the serve rating for both players; it is not logically sound to compare serve to serve. Remember that when one player is serving, the other player has to return, i.e. each Elo calculation pairs one player’s serve rating with the other player’s return rating (and we then swap the roles to get the other half of the puzzle). Let us summarize this below in an example where we calculate Player A’s serve rating and Player B’s return rating:
- $R_A$: Player A’s current serve rating
- $R_B$: Player B’s current return rating
- $S_A$: Player A’s percentage won on serve
- $S_B$: Player B’s percentage won on return (which equals $1 - S_A$)
- $R'_A$: Player A’s new serve rating
- $R'_B$: Player B’s new return rating
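The serve/return pairing listed above can be sketched in Python as follows. This is only an illustration under stated assumptions: the standard 400-point logistic curve, a K-factor of 32, a starting rating of 1,500, and percentages expressed as fractions between 0 and 1. The names `PlayerRatings`, `update_serve_return`, and `update_match` are hypothetical, not part of any library.

```python
from dataclasses import dataclass


@dataclass
class PlayerRatings:
    """Two separate Elo ratings per player (1500 is an assumed starting value)."""
    serve: float = 1500.0
    ret: float = 1500.0


def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Standard Elo expected score of A against B (assumed 400-point scale)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def update_serve_return(serve_rating: float, return_rating: float,
                        serve_pct: float, k: float = 32.0) -> tuple[float, float]:
    """Pair one player's serve rating with the opponent's return rating.

    serve_pct is the fraction of service points the server won; the
    returner's actual score is 1 - serve_pct.
    """
    e_serve = expected_score(serve_rating, return_rating)
    e_return = 1.0 - e_serve
    new_serve = serve_rating + k * (serve_pct - e_serve)
    new_return = return_rating + k * ((1.0 - serve_pct) - e_return)
    return new_serve, new_return


def update_match(a: PlayerRatings, b: PlayerRatings,
                 a_serve_pct: float, b_serve_pct: float,
                 k: float = 32.0) -> tuple[PlayerRatings, PlayerRatings]:
    """Run the update twice, swapping the server and returner roles."""
    a_serve, b_ret = update_serve_return(a.serve, b.ret, a_serve_pct, k)
    b_serve, a_ret = update_serve_return(b.serve, a.ret, b_serve_pct, k)
    return PlayerRatings(a_serve, a_ret), PlayerRatings(b_serve, b_ret)


# Example: A wins 65% of points on serve, B wins 58% of points on serve.
player_a, player_b = update_match(PlayerRatings(), PlayerRatings(), 0.65, 0.58)
print(player_a, player_b)
```

All four new ratings are computed from the pre-match ratings, so the order of the two half-updates does not matter.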
All in all it’s a very simple model when you look at it. The only caveat here is that, unlike chess Elo ratings, we are not using wins and losses to determine the actual and expected scores, but rather a percentage.
And that’s all there is to it! Elo is fairly simple in its essence, and for the most part it gives a good rough estimate of skill in one-on-one games. Yet it has its limitations. For example, Elo does not take time into account: a player may have won a match ten years ago, but that is clearly not a strong indicator of their skill today. Another limitation is that it gives no indication of variance (something that the Glicko-2 model offers). That being said, there are variants of Elo that may work when adjusted. Specifically, in a future post, I will discuss Elo++, developed by Yannis Sismanis for a Kaggle competition. Elo++ offers solutions to several of the problems encountered with basic Elo, and we shall explore those soon.