Difference between revisions of "Smart Game Format"

Revision as of 23:52, 6 December 2022

The SGF file format is designed to store game records of board games for two players. It's a text-only, tree-based format.

Games stored in SGF format can easily be emailed, posted or processed with text-based tools.

The main purposes of SGF are to store records of played games and to provide features for storing annotated and analyzed games (e.g. board markup, variations).

SGF files consist of pairs of properties and property values, each of which describes a feature of the game.

Description of the file format

The SGF file format is a text-based format (not a binary format). The description of the format can be separated into two parts:

1. Syntactic rules. This governs how SGF files are parsed. At this level, the format describes a very generic representation of abstract trees of key-value dictionaries. It can be used for all kinds of tree-like data and is not necessarily limited to game trees.

2. Semantic rules. This governs how specific keys and values should be interpreted in the context of game trees, and more specifically in the context of Hex. It also limits the set of possible keys and the types of values that may be associated with them.

The official SGF specification mixes syntactic and semantic concepts; for example, it distinguishes two different syntactic text types, Text and SimpleText, and it gives semantic rules for which of these types must be used in a given context and how they are to be parsed. By contrast, here we give an (equivalent) description that strictly separates syntax from semantics. This allows the file format to be parsed without any semantic knowledge, and it allows semantic properties to be checked without any knowledge of parsing. It also allows the same parser to be re-used for other, more general kinds of tree-like data.

Lexical structure

When reading SGF, the text is first converted to a sequence of lexical tokens. There are 8 different kinds of token:

  • left parenthesis '('
  • right parenthesis ')'
  • semicolon ';'
  • left square bracket '['
  • right square bracket ']'
  • colon ':'
  • a key, which is a sequence of one or more upper-case ASCII letters
  • a literal string, which is any sequence of:
    • any characters except ':', ']', and '\'
    • two-character escape sequences, which consist of '\' followed by any character

A literal string starts immediately after a '[' or ':' token, and extends until the next unescaped ']' or ':'. No characters except ':', ']', and '\' have any special syntactic meaning in literal strings, and in particular, if '[', '(', ')', or ';' appear in a literal string, they are not interpreted as separate tokens. Literal strings may include whitespace, including newlines, and these are preserved. Whitespace is also preserved at the beginning or end of literal strings. If an unescaped '\' is immediately followed by a newline, both are removed. Semantic rules may place further limits on the string data, and may prohibit certain kinds of strings from including whitespace.

All of the tokens are expressed in the ASCII character set, except for literal string data, which can use any character set. Newlines can be encoded as NL, CR, CRNL, or NLCR. The interpretation of literal string data is further described by semantic rules. White space before or after tokens is ignored except when it is part of a literal string.

In current applications, keys consist of one or two upper-case ASCII letters, and some implementations may not recognize keys that are longer than 2 letters.
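
The lexical rules above can be sketched as a small tokenizer. This is only an illustrative sketch, not part of any SGF tool: the function names `tokenize` and `_read_literal` and the token representation (pairs of kind and value) are assumptions made for this example. It always emits a literal string after '[' or ':', even an empty one, and leaves it to a later stage to decide whether `[]` is a tuple with no components.

```python
def _read_literal(text, i):
    """Read a literal string starting at index i.

    Returns (value, index of the terminating unescaped ':' or ']').
    """
    buf = []
    n = len(text)
    while i < n and text[i] not in ":]":
        if text[i] == "\\" and i + 1 < n:
            nxt = text[i + 1]
            i += 2
            if nxt in "\r\n":
                # Backslash followed by a newline: both are removed. Also
                # swallow the second character of a CRNL or NLCR newline.
                if i < n and text[i] in "\r\n" and text[i] != nxt:
                    i += 1
            else:
                # Any other escaped character stands for itself.
                buf.append(nxt)
        else:
            buf.append(text[i])
            i += 1
    return "".join(buf), i


def tokenize(text):
    """Convert SGF text into a list of (kind, value) tokens."""
    tokens = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch.isspace():
            i += 1  # whitespace between tokens is ignored
        elif ch in "();":
            tokens.append((ch, ch))
            i += 1
        elif "A" <= ch <= "Z":
            j = i
            while j < n and "A" <= text[j] <= "Z":
                j += 1
            tokens.append(("KEY", text[i:j]))
            i = j
        elif ch == "[":
            tokens.append(("[", "["))
            i += 1
            # Inside brackets: literal strings separated by ':' until ']'.
            while True:
                value, i = _read_literal(text, i)
                tokens.append(("STRING", value))
                if i >= n:
                    raise ValueError("unterminated '['")
                tokens.append((text[i], text[i]))
                i += 1
                if text[i - 1] == "]":
                    break
        else:
            raise ValueError(f"unexpected character {ch!r} at offset {i}")
    return tokens
```

For example, `tokenize("AP[HexGui:0.9]")` yields a key `AP`, a '[', the strings `HexGui` and `0.9` separated by a ':' token, and a ']'.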

Syntactic structure

An SGF file describes a finitely branching ordered tree, possibly with more than one root (technically a forest). Moreover, at each node of the tree, there is a dictionary, which is a mapping from keys to certain kinds of structured values. We begin by describing the encoding of dictionaries.

Dictionaries

A tuple consists of the token '[', zero or more literal strings that are separated by ':', and the token ']'. Examples of tuples are:

[]
[value]
[value1:value2]
[value1:value2:value3]
[Values may be arbitrary strings of characters,
including newlines and   other whitespace. 
Be aware\: the characters '\:', '\]', and '\\' must be escaped.
Other characters \m\a\y be escaped but this is optional. Sequences
such as \n have no special meaning; this is just another way to 
write the letter n.
]

A binding consists of a key followed by one or more tuples. Examples are:

FF[4]
AP[HexGui:0.9]
AB[a1][a2][a3]
C[This is a comment!]

Where a key is followed by more than one tuple, the data is intended to be unordered. In other words, the following are two ways of expressing exactly the same data:

AB[a1][a2]
AB[a2][a1]

The data within each tuple is ordered. For example, the following are distinct:

AP[name:version]
AP[version:name]

The semantic rules place further restrictions on how many tuples are allowed after certain keys, and how many components are allowed in certain tuples. Current applications only permit 1 or 2 components per tuple.

A dictionary consists of zero or more bindings. The keys in any one dictionary must be distinct, and their ordering is not significant. Example:

AP[HexGui:0.9]FF[4]GM[11]SZ[11]
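
Going in the other direction, a dictionary held in memory can be written back out as text. The following is a minimal sketch under assumed conventions: a dictionary is a Python `dict` mapping each key to a list of tuples, and each tuple is a Python tuple of strings. The names `escape`, `format_binding` and `format_dictionary` are hypothetical.

```python
def escape(s):
    """Escape the three characters that are special in literal strings."""
    # The backslash must be handled first, so that the backslashes added
    # for ']' and ':' are not escaped a second time.
    return s.replace("\\", "\\\\").replace("]", "\\]").replace(":", "\\:")


def format_binding(key, tuples):
    """Format one binding: a key followed by one or more tuples."""
    return key + "".join(
        "[" + ":".join(escape(part) for part in t) + "]" for t in tuples
    )


def format_dictionary(d):
    """Format a dictionary (key -> list of tuples) as SGF text."""
    return "".join(format_binding(key, tuples) for key, tuples in d.items())


root = {
    "AP": [("HexGui", "0.9")],
    "FF": [("4",)],
    "GM": [("11",)],
    "SZ": [("11",)],
}
print(format_dictionary(root))  # AP[HexGui:0.9]FF[4]GM[11]SZ[11]
```

Because a Python `dict` cannot hold the same key twice, the requirement that keys be distinct is satisfied automatically by this representation.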

Tree structure

A node in the tree consists of the token ';' followed by a dictionary (remember that dictionaries can be empty). Here are some examples of nodes:

;AP[HexGui:0.9]FF[4]GM[11]SZ[11]
;B[i3]
;
;AB[f4][g2]PL[B]

A tree is given by the following grammar:

tree ::= '(' node+ tree* ')'

Here, node+ means a sequence of one or more nodes, and tree* means a sequence of zero or more trees. Trees are interpreted as follows: the tree

( node₁ node₂ node₃ ... nodeₙ tree₁ ... treeₖ )

has a single root node₁ with a single child node₂, which has a single child node₃ and so on until nodeₙ, which has k children tree₁ ... treeₖ:

Tree1.png

Note that it is possible that k = 0, in which case nodeₙ is a leaf; it is also possible that n = 1, in which case the entire tree is a leaf. Here are some examples:

Code                             Tree
(a)                              Tree2.png
(a b c)                          Tree3.png
(a (b (c)))                      Tree3.png
(a (b) (c))                      Tree4.png
(a b (c) (d e (f (g) (h)) (i)))  Tree5.png

Finally, an SGF file holds a sequence of one or more trees. (The idea is that a single file may hold more than one game record, each with its own root. However, in practice, most SGF files contain exactly one tree, and most software that reads SGF files will ignore all but the first tree.)

To conclude this section, here is an example of a syntactically (but not semantically) well-formed SGF file representing the last tree in the table above (Tree5.png).

Each node has a dictionary with a single key NN holding the node's label.

(;NN[a];NN[b](;NN[c])(;NN[d];NN[e](;NN[f](;NN[g])(;NN[h]))(;NN[i])))
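
The grammar can be turned into a recursive-descent parser fairly directly. The sketch below is one possible way to do it and makes several assumptions: the class name `Parser`, the representation of a tree as a `dict` with `"nodes"` and `"children"`, and the decision to read `[]` as a 0-tuple are all choices made for this example. To stay short, it reads characters directly rather than using a separate tokenizer, and it removes a backslash-newline pair only when the newline is a single character.

```python
class Parser:
    def __init__(self, text):
        self.text = text
        self.pos = 0

    def peek(self):
        """Skip whitespace; return the next character, or '' at the end."""
        while self.pos < len(self.text) and self.text[self.pos].isspace():
            self.pos += 1
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def expect(self, ch):
        if self.peek() != ch:
            raise ValueError(f"expected {ch!r} at offset {self.pos}")
        self.pos += 1

    def parse_file(self):
        """file ::= tree+"""
        trees = [self.parse_tree()]
        while self.peek() == "(":
            trees.append(self.parse_tree())
        if self.peek():
            raise ValueError(f"unexpected text at offset {self.pos}")
        return trees

    def parse_tree(self):
        """tree ::= '(' node+ tree* ')'"""
        self.expect("(")
        nodes = [self.parse_node()]
        while self.peek() == ";":
            nodes.append(self.parse_node())
        children = []
        while self.peek() == "(":
            children.append(self.parse_tree())
        self.expect(")")
        return {"nodes": nodes, "children": children}

    def parse_node(self):
        """node ::= ';' followed by a dictionary (key -> list of tuples)."""
        self.expect(";")
        props = {}
        while "A" <= self.peek() <= "Z":
            start = self.pos
            while self.pos < len(self.text) and "A" <= self.text[self.pos] <= "Z":
                self.pos += 1
            key = self.text[start:self.pos]
            if key in props:
                raise ValueError(f"duplicate key {key!r} in node")
            tuples = [self.parse_tuple()]
            while self.peek() == "[":
                tuples.append(self.parse_tuple())
            props[key] = tuples
        return props

    def parse_tuple(self):
        self.expect("[")
        parts, buf = [], []
        while True:
            if self.pos >= len(self.text):
                raise ValueError("unterminated '['")
            ch = self.text[self.pos]
            self.pos += 1
            if ch == "\\" and self.pos < len(self.text):
                nxt = self.text[self.pos]
                self.pos += 1
                if nxt not in "\r\n":  # backslash-newline is dropped
                    buf.append(nxt)
            elif ch in ":]":
                parts.append("".join(buf))
                buf = []
                if ch == "]":
                    break
            else:
                buf.append(ch)
        return () if parts == [""] else tuple(parts)


def shape(tree):
    """Render a parsed tree of NN-labelled nodes in the shorthand used above."""
    parts = [node["NN"][0][0] for node in tree["nodes"]]
    parts += [shape(child) for child in tree["children"]]
    return "(" + " ".join(parts) + ")"


sgf = "(;NN[a];NN[b](;NN[c])(;NN[d];NN[e](;NN[f](;NN[g])(;NN[h]))(;NN[i])))"
print(shape(Parser(sgf).parse_file()[0]))  # (a b (c) (d e (f (g) (h)) (i)))
```

The helper `shape` is only there to check the result: it prints the parsed tree in the same shorthand as the table of examples.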

Semantic rules

Each key accepts specific types of values specified below. There are also rules about which node a given key may appear in, and which keys can or can't be combined with other keys within a node.

The SGF file format only allows 1-tuples (simple values, example [11]), 2-tuples (composite values, example [name:version]), and in some cases 0-tuples (empty values, example []). As a special case, for keys that expect a simple value, the original SGF specification stipulates that ':' may appear unescaped in a literal string. Since this makes it impossible to parse a file without semantic information, we stipulate instead that if a key expects a simple value but receives a composite value, it should interpret the composite value as if its literal strings had been concatenated into a single literal string with ':' inserted between each string. In this way, the result is equivalent to the original SGF specification, but parsing can be done independently of the semantic interpretation.
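
As a small illustration of this rule, suppose a parser has already split a value into its literal strings (the tuple-of-strings representation and the function name `as_simple_value` are assumptions of this sketch). Recovering the simple value is then a single join:

```python
def as_simple_value(parts):
    """Interpret a parsed tuple of literal strings as one simple value.

    A tuple with several components is read as if the ':' separators
    had been part of the text.
    """
    return ":".join(parts)


# C[Time: 12:30] is parsed syntactically as a 3-tuple ...
parts = ("Time", " 12", "30")
# ... but since C expects a simple value, it is read as one string.
print(as_simple_value(parts))  # Time: 12:30
```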

Some common value types are:

  • Number. Example: [11].
  • Cell. Examples: [a1], [f6], [ab28]. A cell is denoted by a column label (one or more letters) and a row label (one or more digits). If there are more than 26 columns, they are labeled by base-26 alphabet numbers, i.e., the next columns after 'z' are 'aa', 'ab', 'ac', etc.
  • Move. This is either a cell name, or one of the special moves 'swap-sides', 'swap-pieces', 'pass', 'resign', 'forfeit'.
  • Text. This is a simple value consisting of arbitrary text, except that whitespace characters other than newlines (example: tab, vertical tab) are converted to spaces.
  • Simpletext. This is a simple value consisting of arbitrary text, except that whitespace characters including newlines are converted to spaces.
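
The Cell, Text and Simpletext types above might be handled along the following lines. This is a sketch with hypothetical function names. It assumes that column letters are lower-case (as in all the examples), that column 'a' has index 1, and that a two-character newline (CRNL or NLCR) in a Simpletext value becomes a single space; none of these details is fixed by the description above.

```python
import re

_CELL = re.compile(r"([a-z]+)([0-9]+)")


def parse_cell(s):
    """Convert a cell name such as 'ab28' to (column, row) numbers."""
    m = _CELL.fullmatch(s)
    if m is None:
        raise ValueError(f"not a cell name: {s!r}")
    column = 0
    for ch in m.group(1):
        # Letters act as digits 1..26, so 'z' is 26 and 'aa' is 27.
        column = column * 26 + (ord(ch) - ord("a") + 1)
    return column, int(m.group(2))


def normalize_text(s):
    """Text: whitespace other than newlines becomes a space."""
    return re.sub(r"[^\S\r\n]", " ", s)


def normalize_simpletext(s):
    """Simpletext: all whitespace, including newlines, becomes a space."""
    return re.sub(r"\r\n|\n\r|\s", " ", s)


print(parse_cell("f6"))    # (6, 6)
print(parse_cell("ab28"))  # (28, 28)
```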

The SGF format defines a large number of keys, but many are not relevant to Hex, or are not supported by commonly used software. We only list the most commonly supported keys below. Others can be found at the official specifications site. Also, users and applications are free to define their own private keys, as long as they do not clash with existing ones. A possible convention is for private keys to start with 'X'. Applications that read SGF files should ignore keys that they do not know about, and if possible, should preserve them (i.e., when writing the same file again).

Note that the players are called B and W (black and white), regardless of the colors actually used in the original game. See also conventions for more information on how player colors relate to cell numbering.

Root keys

The following keys may appear at the root node of a game. They describe global attributes of the game, such as board size.

  • AP. Value: composite name : version. This identifies the name and version of the application (typically software) that generated the SGF file. Example: AP[HexGui:0.10].
  • FF. Value: integer. This identifies the version of the SGF file format, currently 4. Example: FF[4].
  • GM. Value: integer. This identifies the game. The value for Hex is 11. Example: GM[11].
  • SZ. Value: integer, or composite integer : integer. This identifies the board size. For non-square boards, the number of columns is given before the number of rows, e.g. SZ[6:7] for a board with 6 columns and 7 rows (i.e., the distance between the white edges is smaller than the distance between the black edges). If the number of rows and columns is equal, it must be given as a single integer, e.g. SZ[11].
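
The two forms of SZ could be read as follows. This is a sketch only: the function name `parse_size` and the tuple-of-strings input are assumptions, and a real reader might prefer to accept a redundant value such as SZ[7:7] rather than reject it.

```python
def parse_size(parts):
    """Convert the components of an SZ value to (columns, rows)."""
    if len(parts) == 1:
        n = int(parts[0])
        return n, n
    if len(parts) == 2:
        columns, rows = int(parts[0]), int(parts[1])
        if columns == rows:
            # Square boards must be written with a single integer.
            raise ValueError("square board given as columns:rows")
        return columns, rows
    raise ValueError("SZ expects one or two components")


print(parse_size(("11",)))     # (11, 11)
print(parse_size(("6", "7")))  # (6, 7)
```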

The following keys are not currently supported by HexGui:

  • PB, PW. Value: simpletext. The name of the black player and white player, respectively. Example: PB[Bill LeBoeuf].
  • RE. Value: simpletext. The result of the game. If given, it must be one of the following: 'B+' or 'W+' for a black or white win, respectively, 'Void' for no result (such as suspended play), '?' for an unknown result. Optionally, the method of winning can be specified after '+', as follows: 'B+R', 'B+Resign', 'W+R', or 'W+Resign' for win by resigning, 'B+T', 'B+Time', 'W+T', 'W+Time' for a win on time, 'B+F', 'B+Forfeit', 'W+F', or 'W+Forfeit' for a win by forfeit.
  • DT. Value: date. The date on which the game was played, in the format 'YYYY-MM-DD'. There is support for partial dates and date ranges; see the official specifications site for details.
  • EV. Value: simpletext. The name of the event, e.g., tournament. Example: EV[2022 Mind Sports Olympiad]
  • GC. Value: text. Background information on the game, or a summary of the game itself. This free-form text is not usually interpreted by software.
  • SO. Value: simpletext. The source of the game (e.g., book). This can be used to identify the website and table number for games played online. Example: SO[BGA 123456789]
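
The values of RE listed above are regular enough to be checked with a single pattern. In this sketch, the function name `parse_result` and the returned pair (winner, method) are inventions for the example; only the accepted strings come from the list above.

```python
import re

_RESULT = re.compile(r"([BW])\+(R|Resign|T|Time|F|Forfeit)?|Void|\?")

_METHODS = {"R": "resign", "T": "time", "F": "forfeit"}


def parse_result(value):
    """Return (winner, method) for a valid RE value.

    winner is 'B', 'W' or None; method is 'resign', 'time', 'forfeit',
    'void', 'unknown', or None when no method is given.
    """
    m = _RESULT.fullmatch(value)
    if m is None:
        raise ValueError(f"invalid result: {value!r}")
    if value == "Void":
        return None, "void"
    if value == "?":
        return None, "unknown"
    method = m.group(2)
    # The long and short forms share their first letter.
    return m.group(1), _METHODS[method[0]] if method else None


print(parse_result("B+Resign"))  # ('B', 'resign')
print(parse_result("W+"))        # ('W', None)
```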

Node keys

The following keys may appear at any node in a game. They describe attributes of the particular move or node. There are two kinds of nodes: move nodes and setup nodes. A move node holds one move by one player, including special moves such as 'swap-sides' or 'resign'. A setup node exists to set up a board position, for example, a special starting position for a game or puzzle, or a position that is used to explain some point in a game comment. Any node that does not define a B or W key is a setup node. The root node is always a setup node.

  • B, W. Value: move. A move by the black, respectively white, player. There can be at most one B or W key at a given node. Examples: B[a3], W[swap-pieces], B[resign]. A node that has no B or W key is a setup node.
  • PL. Value: 'B' or 'W'. Sets the player whose turn it is after the current move. This is especially used in conjunction with setup nodes, but can also be used for move nodes, say in certain handicap situations where a player gets two moves in a row.
  • AB, AW, AE. Value: list of cells. These are only allowed in setup nodes (including the root node of the game). In other words, they cannot be combined with keys B or W. The values of AB, AW, and AE are lists of cells to be occupied by black, white, or empty, respectively. The cell contents overwrite whatever was there before. In particular, AE can be used to empty a previously occupied cell. A setup node usually also receives a PL key to define whose turn it is. Example: AB[e7][e8][e9]AW[a6][b6]AE[f8][g6][g7]PL[B].
  • C. Value: text. A human-readable comment for the given node. Comments are free-form, but it is good style to avoid referring to physical board directions since it is not known how the viewer will orient the board. So instead of "left edge" or "bottom edge", it might be better to refer to the "A-edge" or "11-edge". It probably makes sense to refer to the players as Black and White, regardless of what the original colors were. Example: C[White is already connected to the K-edge.]
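
The distinction between move nodes and setup nodes, and the effect of AB, AW and AE, can be sketched as follows. The representation is assumed for this example: a node is a `dict` from keys to lists of tuples of strings, and a board is a `dict` from cell names to 'B' or 'W', with empty cells absent. The function names are hypothetical.

```python
def is_setup_node(node):
    """A node that defines neither B nor W is a setup node."""
    return "B" not in node and "W" not in node


def apply_setup(board, node):
    """Return a new board with the node's AB, AW and AE values applied."""
    if not is_setup_node(node) and any(k in node for k in ("AB", "AW", "AE")):
        raise ValueError("AB, AW and AE cannot be combined with B or W")
    new_board = dict(board)  # leave the caller's board untouched
    for key, colour in (("AB", "B"), ("AW", "W")):
        for (cell,) in node.get(key, []):
            new_board[cell] = colour  # overwrites whatever was there
    for (cell,) in node.get("AE", []):
        new_board.pop(cell, None)  # emptying an empty cell is harmless
    return new_board


# The example node AB[e7][e8][e9]AW[a6][b6]AE[f8][g6][g7]PL[B]:
node = {
    "AB": [("e7",), ("e8",), ("e9",)],
    "AW": [("a6",), ("b6",)],
    "AE": [("f8",), ("g6",), ("g7",)],
    "PL": [("B",)],
}
before = {"e7": "W", "f8": "W", "c3": "B"}
after = apply_setup(before, node)
print(sorted(after.items()))
```

In this run, e7 changes from white to black, f8 is emptied, and c3 is left alone because the node does not mention it.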

The following key is only partially supported by HexGui:

  • LB. Value: list of composite cell : simpletext. This assigns (preferably short) labels to cells. Example: LB[a1:x][a2:y][a3:z]

Example

Here is a small but complete example of a game with two branches, some comments, and a setup node:

(;AP[HexGui:0.10.GIT]FF[4]GM[11]SZ[7]C[Example game]
 ;B[c5]C[This opening is too strong. White will definitely swap it.]
 ;W[swap-pieces];B[c4];W[c5];B[a6];W[c6]C[Good.]
 ;B[a7];W[b5];B[a5];W[b3]
 (;B[d2]C[See the next variation for what happens if Black plays b4.]
  ;W[b4];B[d4];W[e5];B[resign];)
 (;B[b4];W[d2]
  ;AB[a2][b2][c1][d1][d4][d5][e1][e5][f1][f5][g5]
  C[Note that White is connected by templates, requiring only the area shown.])
)

See Also

  • Coordinates
  • Conventions

External links

You can find more info in the Official Specifications Site.