Difference between revisions of "Smart Game Format"

From HexWiki
Jump to: navigation, search
m (inner link + headings)
(Started adding some details on the SGF format. Not finished.)
Line 6: Line 6:
  
 
SGF files are comprised of pairs of ''properties'' and ''property values'', each of which describes a feature of the game.
 
SGF files are comprised of pairs of ''properties'' and ''property values'', each of which describes a feature of the game.
 +
 +
== Description of the file format ==
 +
 +
The SGF file format is a text-based format (not a binary format). The description of the format can be separated into two parts:
 +
 +
1. Syntactic rules. This governs how SGF files are parsed. At this level, the format describes a very generic representation of abstract trees of key-value dictionaries. It can be used for all kinds of tree-like data and is not necessarily limited to game trees.
 +
 +
2. Semantic rules. This governs how specific keys and values should be interpreted in the context of game trees, and more specifically in the context of Hex. It also limits the set of possible keys and the types of values that may be associated with them.
 +
 +
The [http://www.red-bean.com/sgf/ official SGF specification] mixes syntactic and semantics concepts; for example, it distinguishes two different syntactic text types, Text and SimpleText, and it gives semantic rules for which of these types must be used in a given context and how they are to be parsed. By contrast, here we give an (equivalent) description that strictly separates syntax from semantics. This allows the file format to be parsed without any semantic knowledge, and it allows semantic properties to be checked without any knowledge of parsing. It also allows the same parser to be re-used for other, more general kinds of tree-like data.
 +
 +
=== Lexical structure ===
 +
 +
When reading SGF, the text is first converted to a sequence of lexical tokens. There are 8 different kinds of token:
 +
 +
* left parenthesis '('
 +
* right parenthesis ')'
 +
* semicolon ';'
 +
* left square bracket '['
 +
* right square bracket ']'
 +
* colon ':'
 +
* a key, which is a sequence of one or more upper-case ASCII letters
 +
* literal string, which is any sequence of:
 +
** characters except ':', ']', and '\'
 +
** two-character escape sequences, which consist of '\' followed by any character
 +
 +
A literal string starts immediately after a '[' or ':' token, and extends until the next unescaped ']' or ':'. No characters except ':', ']', and '\' have any special syntactic meaning in literal strings, and in particular, if '[', '(', ')', or ';' appear in a literal string, they are not interpreted as separate tokens. Literal strings may include whitespace, including newlines, and these are preserved. Whitespace is also preserved at the beginning or end of literal strings. If an unescaped '\' is immediately followed by a newline, both are removed. Semantic rules may place further limits on the string data, and may prohibit certain kinds of strings from including whitespace.
 +
 +
All of the tokens are expressed in the ASCII character set, except for literal string data, which can use any character set. The interpretation of this data is further described by semantic rules. White space before or after tokens is ignored except when it is part of a literal string. 
 +
 +
In current applications, keys consist of one or two upper-case ASCII letters, and some implementations may not recognize keys that are longer than 2 letters.
 +
 +
=== Syntactic structure ===
 +
 +
An SGF file describes a finitely branching ordered tree, possibly with more than one root (technically a forest). Moreover, at each node of the tree, there is a dictionary, which is a mapping from keys to certain kinds of structured values. We begin by describing the encoding of dictionaries.
 +
 +
==== Dictionaries ====
 +
 +
A ''tuple'' consists of the token '[', zero or more literal strings that are separated by ':', and the token ']'. Examples of tuples are:
 +
 +
[]
 +
[value]
 +
[value1:value2]
 +
[value1:value2:value3]
 +
[Values may be arbitrary strings of characters,
 +
including newlines and  other whitespace.
 +
Be aware\: the characters '\:', '\]', and '\\' must be escaped.
 +
Other characters \m\a\y be escaped but this is optional. Sequences
 +
such as \n have no special meaning; this is just another way to
 +
write the letter n.
 +
]
 +
 +
A ''binding'' consists of a key followed by one or more tuples. Examples are:
 +
 +
FF[4]
 +
AP[HexGui:0.9]
 +
AB[a1][a2][a3]
 +
C[This is a comment!]
 +
 +
Where a key is followed by more than one tuple, the data is intended to be unordered. In other words, the following are two ways of expressing exactly the same data:
 +
 +
AB[a1][a2]
 +
AB[a2][a1]
 +
 +
The data within each tuple is ordered. For example, the following are distinct:
 +
 +
AP[name:version]
 +
AP[version:name]
 +
 +
The semantic rules place further restrictions on how many tuples are allowed after certain keywords, and how many components are allowed in certain tuples. Current applications only permit 1 or 2 components per tuple.
 +
 +
A ''dictionary'' consists of zero more more bindings. The keys in any one dictionary ''must'' be distinct. Example:
 +
 +
AP[HexGui:0.9]FF[4]GM[11]SZ[11]
 +
 +
==== Tree structure ====
 +
 +
A ''node'' in the tree consists of the token ';' followed by a dictionary (remember that dictionaries can be empty).
 +
Here are some examples of nodes:
 +
 +
;AP[HexGui:0.9]FF[4]GM[11]SZ[11]
 +
;B[i3]
 +
;
 +
;AB[f4][g2]PL[B]
 +
 +
A ''tree'' is given by the following grammar:
 +
 +
tree ::= '(' node+ tree* ')'
 +
 +
Here, <code>node+</code> means a sequence of one or more nodes, and <code>tree*</code> means a sequence of zero or more trees. Trees are interpreted as follows: the tree
 +
 +
( node₁ node₂ node₃ ... nodeₙ tree₁ ... treeₖ )
 +
 +
has a single root <code>node₁</code> with a single child <code>node₂</code>, which has a single child <code>node₃</code> and so on until <code>nodeₙ</code>, which has ''k'' children <code>tree₁ ... treeₖ</code>. Note that it is possible that ''k'' = 0, in which case <code>nodeₙ</code> is a leaf; it is also possible that ''n'' = 1, in which case the entire tree is a leaf. Here are some examples:
 +
 +
(to be continued).
 +
 +
=== Semantic rules ===
 +
 +
(to be written).
  
 
== See Also ==
 
== See Also ==

Revision as of 06:41, 6 December 2022

The SGF file format is designed to store game records of board games for two players. It's a text-only, tree-based format.

Games stored in SGF format can easily be emailed, posted or processed with text-based tools.

The main purposes of SGF are to store records of played games and to provide features for storing annotated and analyzed games (e.g. board markup, variations).

SGF files are comprised of pairs of properties and property values, each of which describes a feature of the game.

Description of the file format

The SGF file format is a text-based format (not a binary format). The description of the format can be separated into two parts:

1. Syntactic rules. This governs how SGF files are parsed. At this level, the format describes a very generic representation of abstract trees of key-value dictionaries. It can be used for all kinds of tree-like data and is not necessarily limited to game trees.

2. Semantic rules. This governs how specific keys and values should be interpreted in the context of game trees, and more specifically in the context of Hex. It also limits the set of possible keys and the types of values that may be associated with them.

The official SGF specification mixes syntactic and semantics concepts; for example, it distinguishes two different syntactic text types, Text and SimpleText, and it gives semantic rules for which of these types must be used in a given context and how they are to be parsed. By contrast, here we give an (equivalent) description that strictly separates syntax from semantics. This allows the file format to be parsed without any semantic knowledge, and it allows semantic properties to be checked without any knowledge of parsing. It also allows the same parser to be re-used for other, more general kinds of tree-like data.

Lexical structure

When reading SGF, the text is first converted to a sequence of lexical tokens. There are 8 different kinds of token:

  • left parenthesis '('
  • right parenthesis ')'
  • semicolon ';'
  • left square bracket '['
  • right square bracket ']'
  • colon ':'
  • a key, which is a sequence of one or more upper-case ASCII letters
  • literal string, which is any sequence of:
    • characters except ':', ']', and '\'
    • two-character escape sequences, which consist of '\' followed by any character

A literal string starts immediately after a '[' or ':' token, and extends until the next unescaped ']' or ':'. No characters except ':', ']', and '\' have any special syntactic meaning in literal strings, and in particular, if '[', '(', ')', or ';' appear in a literal string, they are not interpreted as separate tokens. Literal strings may include whitespace, including newlines, and these are preserved. Whitespace is also preserved at the beginning or end of literal strings. If an unescaped '\' is immediately followed by a newline, both are removed. Semantic rules may place further limits on the string data, and may prohibit certain kinds of strings from including whitespace.

All of the tokens are expressed in the ASCII character set, except for literal string data, which can use any character set. The interpretation of this data is further described by semantic rules. White space before or after tokens is ignored except when it is part of a literal string.

In current applications, keys consist of one or two upper-case ASCII letters, and some implementations may not recognize keys that are longer than 2 letters.

Syntactic structure

An SGF file describes a finitely branching ordered tree, possibly with more than one root (technically a forest). Moreover, at each node of the tree, there is a dictionary, which is a mapping from keys to certain kinds of structured values. We begin by describing the encoding of dictionaries.

Dictionaries

A tuple consists of the token '[', zero or more literal strings that are separated by ':', and the token ']'. Examples of tuples are:

[]
[value]
[value1:value2]
[value1:value2:value3]
[Values may be arbitrary strings of characters,
including newlines and   other whitespace. 
Be aware\: the characters '\:', '\]', and '\\' must be escaped.
Other characters \m\a\y be escaped but this is optional. Sequences
such as \n have no special meaning; this is just another way to 
write the letter n.
]

A binding consists of a key followed by one or more tuples. Examples are:

FF[4]
AP[HexGui:0.9]
AB[a1][a2][a3]
C[This is a comment!]

Where a key is followed by more than one tuple, the data is intended to be unordered. In other words, the following are two ways of expressing exactly the same data:

AB[a1][a2]
AB[a2][a1]

The data within each tuple is ordered. For example, the following are distinct:

AP[name:version]
AP[version:name]

The semantic rules place further restrictions on how many tuples are allowed after certain keywords, and how many components are allowed in certain tuples. Current applications only permit 1 or 2 components per tuple.

A dictionary consists of zero more more bindings. The keys in any one dictionary must be distinct. Example:

AP[HexGui:0.9]FF[4]GM[11]SZ[11]

Tree structure

A node in the tree consists of the token ';' followed by a dictionary (remember that dictionaries can be empty). Here are some examples of nodes:

;AP[HexGui:0.9]FF[4]GM[11]SZ[11]
;B[i3]
;
;AB[f4][g2]PL[B]

A tree is given by the following grammar:

tree ::= '(' node+ tree* ')'

Here, node+ means a sequence of one or more nodes, and tree* means a sequence of zero or more trees. Trees are interpreted as follows: the tree

( node₁ node₂ node₃ ... nodeₙ tree₁ ... treeₖ )

has a single root node₁ with a single child node₂, which has a single child node₃ and so on until nodeₙ, which has k children tree₁ ... treeₖ. Note that it is possible that k = 0, in which case nodeₙ is a leaf; it is also possible that n = 1, in which case the entire tree is a leaf. Here are some examples:

(to be continued).

Semantic rules

(to be written).

See Also

Coordinates

External links

You can find more info in the Official Specifications Site.