CSF File Format
Revision as of 18:13, 4 December 2021
CSF files hold stringtables for RA2/YR (also for Generals/ZH and probably others).
For more information about what a CSF file is, go to the CSF page.
On this page you will find a guide to how the format is built up.
The Header
The header of a CSF file is 0x18 bytes long.
It is built up like this:
Offset | Type | Description |
---|---|---|
0x0 | char[4] | " FSC": CSF header identifier. If this is not " FSC", the game will not load the file. |
0x4 | DWORD | CSF Version: the version number of the CSF format. RA2, YR, Generals, ZH and the BFME series use version 3. Nox uses version 2. Nothing is known about the actual difference between the versions. Thanks to Siberian GRemlin for providing this information. |
0x8 | DWORD | NumLabels: the total number of labels in the stringtable. |
0xC | DWORD | NumStrings: the total number of string pairs in the stringtable. A string pair is made up of a Unicode Value and an ASCII ExtraValue. A label can contain more than one such pair, but only the first pair's Value is ever actually used by the game. |
0x10 | DWORD | (unused): the game does not read this field. A program that reads CSF files can store its own information tag here. |
0x14 | DWORD | Language: the language value for this stringtable. See below for a list. |
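The header layout above can be read with a few lines of C++. The following is a minimal sketch, not the game's own loader: the names `CsfHeader`, `readDword` and `parseCsfHeader` are hypothetical, and it assumes that DWORDs are stored in little-endian byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical in-memory form of the 0x18-byte header (identifier omitted).
struct CsfHeader {
    std::uint32_t version;
    std::uint32_t numLabels;
    std::uint32_t numStrings;
    std::uint32_t unused;
    std::uint32_t language;
};

// Reads a DWORD at `offset`, assuming little-endian byte order.
inline std::uint32_t readDword(const std::vector<std::uint8_t>& buf, std::size_t offset) {
    assert(offset + 4 <= buf.size());  // callers must bounds-check first
    return static_cast<std::uint32_t>(buf[offset])
         | static_cast<std::uint32_t>(buf[offset + 1]) << 8
         | static_cast<std::uint32_t>(buf[offset + 2]) << 16
         | static_cast<std::uint32_t>(buf[offset + 3]) << 24;
}

// Fills `out` and returns true if `buf` starts with a complete header
// carrying the " FSC" identifier; returns false otherwise.
inline bool parseCsfHeader(const std::vector<std::uint8_t>& buf, CsfHeader& out) {
    const std::size_t kHeaderSize = 0x18;
    if (buf.size() < kHeaderSize || std::memcmp(buf.data(), " FSC", 4) != 0) {
        return false;
    }
    out.version    = readDword(buf, 0x4);
    out.numLabels  = readDword(buf, 0x8);
    out.numStrings = readDword(buf, 0xC);
    out.unused     = readDword(buf, 0x10);
    out.language   = readDword(buf, 0x14);
    return true;
}
```

A caller would load the whole file into a `std::vector<std::uint8_t>`, call `parseCsfHeader`, and then continue reading label data at offset 0x18.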
Language
The language DWORD can have the following values (others will be recognized as "Unknown"):
0 = US (English)*
1 = UK (English)
2 = German*
3 = French*
4 = Spanish
5 = Italian
6 = Japanese
7 = Jabberwockie
8 = Korean*
9 = Chinese*
>9 = Unknown
* RA2/YR has been released in this language.
Labels
After the header, the label data follows.
A label can be considered an entry in the stringtable (e.g. "GUI:OK" is a label).
Each label has a name (ASCII string, e.g. "NAME:MTNK") and zero or more string pairs. As mentioned above, a string pair is made up of a Unicode Value (e.g. "Grizzly Tank") and an ASCII ExtraValue (no example in the original ra2.csf/ra2md.csf, not used by the game).
Now let's come to how the data is stored in the CSF file:
Label header
The label data begins with a label header, which is built up like this:
Offset | Type | Description |
---|---|---|
0x0 | char[4] | " LBL": label identifier. If this is not " LBL", the game will not recognize the following data as label data and will read the next 4 bytes instead. |
0x4 | DWORD | Number of string pairs: the number of string pairs associated with this label. The usual value is 1. |
0x8 | DWORD | LabelNameLength: the size of the label name that follows. |
0xC | char[LabelNameLength] | LabelName: a non-zero-terminated string that is as long as the DWORD at 0x8 says. If it is longer, the rest will be cut off. |
The first label in ra2md.csf can be found at 0x18.
Note: Spaces, tabs and line breaks are stripped from the label's name, so they cannot be used. Nevertheless, the ra2.csf and ra2md.csf files that come with the game contain a label named "gui:password entry box label".
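As an illustration of the label header layout, here is a hedged C++ sketch. `CsfLabelHeader`, `readDword` and `parseLabelHeader` are hypothetical names, and little-endian DWORDs are assumed. Unlike the game, which reads on when the identifier does not match, this sketch simply reports failure.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical in-memory form of a label header.
struct CsfLabelHeader {
    std::uint32_t numPairs;  // number of string pairs that follow
    std::string name;        // ASCII label name, e.g. "GUI:OK"
};

// Reads a DWORD at `offset`, assuming little-endian byte order.
inline std::uint32_t readDword(const std::vector<std::uint8_t>& buf, std::size_t offset) {
    assert(offset + 4 <= buf.size());  // callers must bounds-check first
    return static_cast<std::uint32_t>(buf[offset])
         | static_cast<std::uint32_t>(buf[offset + 1]) << 8
         | static_cast<std::uint32_t>(buf[offset + 2]) << 16
         | static_cast<std::uint32_t>(buf[offset + 3]) << 24;
}

// Parses a label header at `pos`. On success, fills `out`, moves `pos`
// to the first byte after the label name and returns true.
inline bool parseLabelHeader(const std::vector<std::uint8_t>& buf, std::size_t& pos,
                             CsfLabelHeader& out) {
    if (pos > buf.size() || buf.size() - pos < 0xC) return false;
    if (std::memcmp(buf.data() + pos, " LBL", 4) != 0) return false;
    const std::uint32_t numPairs = readDword(buf, pos + 0x4);
    const std::uint32_t nameLength = readDword(buf, pos + 0x8);
    if (buf.size() - pos - 0xC < nameLength) return false;  // name is truncated
    out.numPairs = numPairs;
    out.name.assign(reinterpret_cast<const char*>(buf.data()) + pos + 0xC, nameLength);
    pos += 0xC + nameLength;
    return true;
}
```

After a successful call, `pos` points at the first string pair of the label, so the value data can be read from there.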
Values
Directly after the label header, the value data (string pairs) follows.
This is how it is built up:
Offset | Type | Description |
---|---|---|
0x0 | char[4] | " RTS" or "WRTS": identifier. " RTS" means that there is no ExtraValue for this label. "WRTS" means that data for the ExtraValue follows the Value data (see below). |
0x4 | DWORD | ValueLength: the length, in characters, of the Unicode string (the Value) that follows. |
0x8 | byte[ValueLength*2] | Value: the encoded Value of the label. This is ValueLength*2 bytes long because the Value is a Unicode string, i.e. every character is a 16-bit word instead of a byte. To decode the Value to a Unicode string, apply a bitwise NOT to every byte of the value data (equivalently, subtract each byte from 0xFF; see below for an example). |
0x8+ValueLength*2 | DWORD | ExtraValueLength: the length of the ExtraValue that follows. Only present if the identifier is "WRTS". |
0x8+ValueLength*2+0x4 | char[ExtraValueLength] | ExtraValue: like the label name, a non-zero-terminated string that is as long as ExtraValueLength says. If it is longer, the rest will be cut off. Only present if the identifier is "WRTS". |
Decoding the value
To decode the value to a Unicode string, apply a bitwise NOT to every byte of the value data (equivalently, subtract each byte from 0xFF).
An example in C++:
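    // ValueLength counts 16-bit characters, so the data is twice as many bytes.
    int ValueDataLength = ValueLength << 1;
    for (int i = 0; i < ValueDataLength; ++i) {
        ValueData[i] = ~ValueData[i];  // bitwise NOT decodes (and re-encodes) a byte
    }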
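Combining the value layout with the decoding step, a complete string pair could be read roughly as follows. This is only a sketch under stated assumptions: `CsfStringPair`, `readDword` and `parseStringPair` are hypothetical names, DWORDs are assumed to be little-endian, and each decoded character is assumed to be a little-endian 16-bit code unit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical in-memory form of one string pair.
struct CsfStringPair {
    std::u16string value;    // decoded Value
    std::string extraValue;  // ExtraValue, empty if the identifier was " RTS"
    bool hasExtraValue;
};

// Reads a DWORD at `offset`, assuming little-endian byte order.
inline std::uint32_t readDword(const std::vector<std::uint8_t>& buf, std::size_t offset) {
    assert(offset + 4 <= buf.size());  // callers must bounds-check first
    return static_cast<std::uint32_t>(buf[offset])
         | static_cast<std::uint32_t>(buf[offset + 1]) << 8
         | static_cast<std::uint32_t>(buf[offset + 2]) << 16
         | static_cast<std::uint32_t>(buf[offset + 3]) << 24;
}

// Parses one string pair at `pos`. On success, fills `out`, moves `pos`
// past the pair and returns true. On failure, `pos` is left unchanged.
inline bool parseStringPair(const std::vector<std::uint8_t>& buf, std::size_t& pos,
                            CsfStringPair& out) {
    std::size_t p = pos;
    if (p > buf.size() || buf.size() - p < 0x8) return false;
    const bool hasExtra = std::memcmp(buf.data() + p, "WRTS", 4) == 0;
    if (!hasExtra && std::memcmp(buf.data() + p, " RTS", 4) != 0) return false;
    const std::uint32_t valueLength = readDword(buf, p + 0x4);
    p += 0x8;
    if ((buf.size() - p) / 2 < valueLength) return false;  // Value is truncated

    std::u16string value;
    value.reserve(valueLength);
    for (std::size_t i = 0; i < valueLength; ++i) {
        // Bitwise NOT of each stored byte yields the original byte.
        const std::uint8_t lo = static_cast<std::uint8_t>(~buf[p + 2 * i]);
        const std::uint8_t hi = static_cast<std::uint8_t>(~buf[p + 2 * i + 1]);
        value.push_back(static_cast<char16_t>(lo | (hi << 8)));
    }
    p += 2 * static_cast<std::size_t>(valueLength);

    std::string extra;
    if (hasExtra) {
        if (buf.size() - p < 4) return false;
        const std::uint32_t extraLength = readDword(buf, p);
        p += 4;
        if (buf.size() - p < extraLength) return false;  // ExtraValue is truncated
        extra.assign(reinterpret_cast<const char*>(buf.data()) + p, extraLength);
        p += extraLength;
    }

    out.value = value;
    out.extraValue = extra;
    out.hasExtraValue = hasExtra;
    pos = p;
    return true;
}
```

Because the NOT operation is its own inverse, a writer can encode a Value with the same per-byte operation before storing it.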
Special case of decoding the value
Although the value is a Unicode string, the game.fnt file built into Red Alert 2 and Yuri's Revenge mistakenly treats the code points 128-159 (0x80-0x9F) as Windows-1252 code points. The game therefore renders these characters as Windows-1252 characters unless the modder replaces game.fnt with a font file that uses the correct Unicode code points. As a result, some mods and editors save such characters, for example ‘’“”•, in their Windows-1252 encoding.
When writing a tool that processes CSF files, it is recommended to let modders decide whether to treat 128-159 as Windows-1252 (compatible with the original RA2 font file) or as Unicode (the correct way, but modders must then supply a correct font file).
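A tool could expose this choice as an option applied after decoding. The sketch below is one possible approach, not an established implementation: `C1Handling` and `applyC1Handling` are hypothetical names, and it assumes the code page in question is Windows-1252, which is the code page that places ‘’“”• at 0x91-0x95. Positions that Windows-1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) are passed through unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical option: how a tool interprets code units 0x80-0x9F.
enum class C1Handling { KeepAsUnicode, TreatAsWindows1252 };

// Returns `value` with code units 0x80-0x9F replaced by the Unicode
// characters that Windows-1252 assigns to those byte values, if requested.
inline std::u16string applyC1Handling(std::u16string value, C1Handling mode) {
    // Windows-1252 mapping for 0x80-0x9F; undefined slots map to themselves.
    static const char16_t kCp1252[32] = {
        0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
        0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
        0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
        0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178};
    if (mode == C1Handling::KeepAsUnicode) return value;
    for (char16_t& ch : value) {
        if (ch >= 0x80 && ch <= 0x9F) {
            const std::size_t index = ch - 0x80;
            assert(index < 32);  // guaranteed by the range check above
            ch = kCp1252[index];
        }
    }
    return value;
}
```

With `TreatAsWindows1252`, a stored 0x0093 becomes U+201C (a left double quotation mark), which suits display in an editor. A tool that writes files for the original font would need the inverse mapping when saving.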