Difference between revisions of "Elo rating"

Latest revision as of 00:44, 3 October 2024

The Elo rating system is a system for quantifying the relative strength of players in zero-sum games such as Hex, originally developed by Arpad Elo on behalf of the United States Chess Federation around 1960.

Assumptions

The odds of a player X winning against a player Y are defined as the ratio of X's number of wins to Y's number of wins, averaged over a large number of games between X and Y. For example, if X wins 75% of the time, X's odds of winning are 3 : 1 (or 3/1).

The Elo rating system is based on the assumption that odds are multiplicative, i.e., if the odds of X winning against Y are a : b and the odds of Y winning against Z are b : c, then the odds of X winning against Z are a : c.
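As a small illustration of this assumption, the following sketch chains two sets of odds. The function name `chain_odds` is hypothetical and chosen only for this example; odds a : b are represented as the single fraction a/b.

```python
from fractions import Fraction

def chain_odds(odds_xy: Fraction, odds_yz: Fraction) -> Fraction:
    """Odds of X beating Z, assuming odds are multiplicative."""
    return odds_xy * odds_yz

# X beats Y with odds 3 : 1, and Y beats Z with odds 2 : 1.
odds_xz = chain_odds(Fraction(3, 1), Fraction(2, 1))

# Odds of a : b correspond to a winning probability of a / (a + b).
prob_xz = odds_xz / (odds_xz + 1)
print(odds_xz, prob_xz)  # prints: 6 6/7
```

Under the assumption, X would therefore be expected to beat Z about 86% of the time.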

Of course this assumption is not strictly true in practice, for a number of reasons. For example, if player X has a particular weakness that Y knows how to exploit, and player Y has a weakness that Z knows how to exploit, and player Z has a weakness that X knows how to exploit, it is quite possible that in matchups between two players, X on average loses to Y, Y on average loses to Z, and Z on average loses to X. Nevertheless, the assumption that odds are multiplicative seems to hold at least approximately in many practical situations.
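To see why such a cycle cannot be represented by any assignment of ratings, write R_X, R_Y, R_Z for the three players' ratings (notation introduced here only for illustration). A player who loses on average must have the lower rating, so the cycle would require:

```latex
R_X - R_Y < 0, \qquad R_Y - R_Z < 0, \qquad R_Z - R_X < 0
\;\Longrightarrow\;
(R_X - R_Y) + (R_Y - R_Z) + (R_Z - R_X) < 0 .
```

The left-hand side of the last inequality is identically 0, which is a contradiction. In such a situation the ratings can only approximate the actual odds.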

The Elo rating is a relative rating. Specifically, the odds of X winning against Y are meant to be reflected in the difference between X's and Y's Elo rating. By design, Elo ratings do not have an absolute meaning, as the strengths of players are always calibrated in relation to that of other players they have played against. For example, there is no absolute definition of how good a player must be to have an Elo rating of 800 — and indeed, this can differ from game to game (e.g. Hex vs. chess), organization to organization, website to website, and may also change over time. It would be very difficult to objectively compare the strength of a player from 1942 to that of a player from 2020, since it is not likely that they have ever played against each other or against a common opponent. Moreover, players' strength changes over time. However, the statement "player X's Elo rating is 100 points higher than that of player Y" theoretically has a well-defined meaning (given the assumption on odds, above).

Definition

By definition, the Elo rating is calibrated in such a way that a difference of 400 Elo points corresponds to odds of 10 : 1.

In formulas, this means that the difference between two players' Elo ratings is 400 times the base-10 logarithm of the odds. Thus, if the odds of player X winning against player Y are a : b, then player X's true Elo rating exceeds that of player Y by 400 log₁₀(a/b). Conversely, if player X's Elo rating exceeds that of player Y by N points, the odds of X winning against Y are 10^(N/400) : 1.
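The two conversions can be sketched in a few lines of Python. The function names below are hypothetical and serve only to illustrate the formulas.

```python
import math

def rating_difference(a: float, b: float) -> float:
    """Elo difference implied by odds a : b, i.e. 400 * log10(a / b)."""
    return 400 * math.log10(a / b)

def odds_from_difference(n: float) -> float:
    """Odds (read as 'value : 1') implied by a rating difference of n points."""
    return 10 ** (n / 400)

print(round(rating_difference(3, 1)))       # 191
print(round(odds_from_difference(400), 6))  # 10.0
```

For example, a player who wins 75% of the time (odds 3 : 1) should be rated about 191 points above the opponent.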

The following table translates various relative Elo ratings (rating differences) to the corresponding odds and probabilities of winning:

Relative Elo rating	Odds of winning	Probability of winning
0	1.0000 = 1 : 1	50%
25	1.1548 ≈ 7 : 6	54%
50	1.3335 ≈ 4 : 3	57%
75	1.5399 ≈ 3 : 2	61%
100	1.7783 ≈ 7 : 4	64%
125	2.0535 ≈ 2 : 1	67%
150	2.3714 ≈ 7 : 3	70%
200	3.1623 ≈ 3 : 1	76%
250	4.2170 ≈ 4 : 1	81%
300	5.6234 ≈ 6 : 1	85%
350	7.4989 ≈ 7 : 1	88%
400	10.000 = 10 : 1	91%
500	17.783 ≈ 18 : 1	95%
600	31.623 ≈ 32 : 1	97%
700	56.234 ≈ 56 : 1	98%
800	100.00 = 100 : 1	99%
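The rows of this table can be recomputed from the definition. The sketch below is one way to do so; `table_row` is a hypothetical helper, and the fractional approximations (such as 7 : 6) are not reproduced.

```python
def table_row(diff: int) -> tuple[int, float, float]:
    """Return (rating difference, odds, winning probability) for one table row."""
    odds = 10 ** (diff / 400)
    prob = odds / (odds + 1)
    return diff, odds, prob

# Print a few rows in roughly the same layout as the table.
for diff in (0, 100, 200, 400, 800):
    d, odds, prob = table_row(diff)
    print(f"{d:>4}  {odds:8.4f}  {prob:4.0%}")
```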

Calculating Elo ratings

A player's "true" Elo rating at a given moment (in relation to other players) is an abstract quantity that we could only calculate if we had perfect knowledge of that player's odds of winning against other players at that exact moment in time, as well as the other players' true Elo ratings. Since this is not possible, Elo ratings are instead calculated by an approximate method. This is done by updating both players' Elo ratings after each game based on three factors: the two players' previous Elo ratings and who won the game. The update is done by a simple formula that causes the calculated Elo rating to approximately track the "true" Elo rating over time.

Motivation

To explain how the method works, consider a player X playing against an opponent Y. Let us assume that initially, X's Elo rating is 100 points higher than that of Y. According to the above formula or table, this means that X is expected to win against Y about 64% of the time. Now suppose the players play a number of games. How should X's Elo rating be updated? This should be done in such a way that on average: (1) if X actually wins 64% of the time as predicted, then X's Elo rating should not change; (2) if X wins more than 64% of the time, X's Elo rating should increase; and (3) if X wins less than 64% of the time, X's Elo rating should decrease.

We achieve this by wagering some of X's Elo points on the outcome of each game. In each game, X will have the chance to win a certain number of points, say 10 points. If X's winning probability is truly 64%, this means that X will win 6.4 points on average. To compensate for this, X must pay 6.4 points at the beginning of each game. Now if X actually wins 64% of the games, then X wins on average the same number of Elo points that X paid, and X's rating does not change. But if X wins more than 64% of the games, then X wins on average more points than X paid, and X's rating increases. Similarly, if X wins less than 64% of the games, X's rating decreases.
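The wagering argument can be checked numerically. The following sketch computes X's average rating change per game; the function name and parameters are hypothetical, and `actual_p` stands for X's real long-run winning frequency.

```python
def expected_gain(k: float, predicted_p: float, actual_p: float) -> float:
    """Average rating change per game.

    X always pays k * predicted_p up front and wins k points
    with probability actual_p.
    """
    return k * actual_p - k * predicted_p

as_predicted = expected_gain(10, 0.64, 0.64)  # X wins exactly as often as predicted
better = expected_gain(10, 0.64, 0.70)        # X wins more often than predicted
worse = expected_gain(10, 0.64, 0.55)         # X wins less often than predicted
print(as_predicted, better, worse)
```

The first value is zero, the second is positive (about 0.6 points per game), and the third is negative (about 0.9 points per game), matching the three cases described above.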

The number of points that X can win in each game is called the K-factor of player X. In the above example, X's K-factor was 10. A larger K-factor causes ratings to change more quickly, but it also causes larger random fluctuations in ratings. A smaller K-factor causes ratings to change more slowly, but it also causes smaller random fluctuations. For this reason, it is common to give different K-factors to different players. Typically, new players will be given larger K-factors so that they have a chance to reach their "true" rating quickly. Established players will be given smaller K-factors, because it is likely that they are already in the vicinity of their true ratings, so it is better to minimize random ups and downs.
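The trade-off can be observed in a small simulation. The sketch below makes several simplifying assumptions that are not part of the rating system itself: X plays an equally strong opponent whose rating is held fixed, X truly wins 50% of the games, and a seeded random generator is used for reproducibility. All names are hypothetical.

```python
import random
import statistics

def simulate_spread(k: float, games: int = 20000, seed: int = 1) -> float:
    """Standard deviation of X's rating offset over many simulated games.

    The offset is X's rating minus the (fixed) opponent rating; since the
    players are equally strong, the "true" offset is 0.
    """
    rng = random.Random(seed)
    offset = 0.0
    history = []
    for _ in range(games):
        odds = 10 ** (offset / 400)
        p = odds / (odds + 1)
        offset -= p * k             # X pays p * k points before the game
        if rng.random() < 0.5:      # X truly wins half of the games
            offset += k             # X receives k points for a win
        history.append(offset)
    return statistics.pstdev(history)

print(simulate_spread(10), simulate_spread(40))
```

In this setup the rating with K-factor 40 fluctuates noticeably more around its true value than the rating with K-factor 10.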

Since ratings are updated after each individual game, it does not matter if X plays against Y many times or just once, or indeed if X plays against many different opponents.

Summary of rating update

In summary, this is how X's Elo rating is updated after playing against Y:

Let k be X's K-factor.
Let N be the difference of X's and Y's Elo ratings prior to the game (i.e., positive if X's rating was higher, and negative if Y's rating was higher).
Calculate X's predicted odds of winning: odds = 10^(N/400)
Calculate X's predicted winning probability: p = odds / (odds + 1)
Take pk Elo points from X.
If X wins, give k Elo points to X.
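The steps above can be sketched as a single function. This is a minimal illustration, not the implementation used by any particular site; the function and parameter names are hypothetical.

```python
def update_rating(rating_x: float, rating_y: float, k: float, x_won: bool) -> float:
    """Return X's new rating after one game against Y (no draws)."""
    n = rating_x - rating_y        # rating difference before the game
    odds = 10 ** (n / 400)         # X's predicted odds of winning
    p = odds / (odds + 1)          # X's predicted winning probability
    new_rating = rating_x - p * k  # take p * k points from X
    if x_won:
        new_rating += k            # give k points to X for a win
    return new_rating

print(update_rating(1500, 1500, 20, True))   # 1510.0
print(update_rating(1500, 1500, 20, False))  # 1490.0
```

When the ratings are equal, p is 0.5, so a win gains k/2 points and a loss costs k/2 points.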

Of course, Y's rating is also updated according to the same method. If X and Y have the same K-factor, the result is that X receives exactly the same number of points that Y loses, and vice versa, so the total number of Elo points in the world remains constant. However, if X and Y have different K-factors, the player with the higher K-factor can win or lose additional points.
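The effect of equal and unequal K-factors on the total number of points can be illustrated as follows. The helper names are hypothetical, and the ratings and K-factors are arbitrary example values.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Predicted winning probability of a player rated r_a against one rated r_b."""
    odds = 10 ** ((r_a - r_b) / 400)
    return odds / (odds + 1)

def play_game(rx: float, ry: float, kx: float, ky: float, x_won: bool) -> tuple[float, float]:
    """Update both players' ratings after one game, each with their own K-factor."""
    new_x = rx - expected_score(rx, ry) * kx + (kx if x_won else 0)
    new_y = ry - expected_score(ry, rx) * ky + (0 if x_won else ky)
    return new_x, new_y

# Same K-factor: X gains what Y loses, so the total stays at 3100.
print(sum(play_game(1600, 1500, 20, 20, True)))

# Different K-factors: X (k = 40) gains 20 points while Y (k = 20) loses only 10.
print(play_game(1500, 1500, 40, 20, True))  # (1520.0, 1490.0)
```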

In games where a draw is a possible outcome, such as chess, the case of a draw is usually dealt with by giving X k/2 Elo points in the last step. This makes sense, because it is reasonable to consider a draw to be half-way between a win and a loss. In the case of Hex, no draw is possible, so this issue does not arise.
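One way to express this is to replace the win/loss flag by a numeric result: 1 for a win, 0.5 for a draw, and 0 for a loss. The sketch below assumes this convention; the function and parameter names are hypothetical.

```python
def update_with_result(rating_x: float, rating_y: float, k: float, result: float) -> float:
    """Return X's new rating; result is 1.0 (win), 0.5 (draw) or 0.0 (loss)."""
    odds = 10 ** ((rating_x - rating_y) / 400)
    p = odds / (odds + 1)
    # Take p * k points from X, then give back k, k/2 or 0 points.
    return rating_x - p * k + result * k

print(update_with_result(1500, 1500, 20, 0.5))  # 1500.0
```

A draw between equally rated players leaves both ratings unchanged, whereas a higher-rated player who draws loses some points.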

Details of specific Hex sites that use Elo ratings

Board Game Arena: The K-factor is k = 60 for the first 10 games, k = 40 for the next 10 games, and k = 20 thereafter. There are special rules for ratings under 100: Players rated below 100 can never lose Elo points, and win at least 1 Elo point for their first game. Once a player has reached Elo 100, they cannot go below 100 again, so 100 points is basically the baseline rating. The strongest players are rated in the 900s.
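The K-factor schedule described above could be written as follows. This is only a sketch based on the description here, not Board Game Arena's actual code: the function name is hypothetical, the boundaries assume that `games_played` counts the games completed before the current one, and the special rules for ratings under 100 are not modeled.

```python
def bga_k_factor(games_played: int) -> int:
    """K-factor for a player's next game, given the number of games already played."""
    if games_played < 10:
        return 60  # first 10 games
    if games_played < 20:
        return 40  # next 10 games
    return 20      # thereafter

print([bga_k_factor(n) for n in (0, 9, 10, 19, 20)])  # [60, 60, 40, 40, 20]
```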

