Difference between revisions of "CSF File Format"

Revision as of 05:33, 14 February 2024

CSF files hold stringtables for RA2/YR (also for Generals/ZH and probably others).
For more information about what a CSF file is, go to the CSF page.

On this page you will find a guide to how the format is built up.

The Header

The header of a CSF file is 0x18 bytes long.
It is built up like this:

Offset	Type	Description
0x0	char[4]	" FSC" CSF header identifier. If this is not " FSC", the game will not load the file. " FSC" is "CSF " with its bytes in reverse (little-endian) order.
0x4	DWORD	CSF Version The version number of the CSF format. RA2, YR, Generals, ZH and the BFME series use version 3. Nox uses version 2. Nothing is known about the actual difference between the versions. Thanks to Siberian GRemlin for providing this information (see here)!
0x8	DWORD	NumLabels The total number of labels in the stringtable.
0xC	DWORD	NumStrings The total number of string pairs in the stringtable. A string pair is made up of a Unicode Value and an ASCII ExtraValue. A label can contain more than one such pair, but the game only ever uses the Value of the first pair.
0x10	DWORD	(unused) The game does not use this value. A program that reads or writes CSF files may store its own information tag here if it needs one.
0x14	DWORD	Language The language value for this stringtable. See below for a list.
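
As a rough illustration of the table above, the header can be unpacked with Python's struct module. The function name parse_csf_header and the dictionary keys are made up for this sketch and do not come from any existing tool.

```python
import struct

HEADER_SIZE = 0x18  # six 4-byte fields

def parse_csf_header(data: bytes) -> dict:
    """Unpack the 0x18-byte CSF header (hypothetical helper for illustration)."""
    if len(data) < HEADER_SIZE:
        raise ValueError("file is too short to contain a CSF header")
    # "<" = little-endian, "4s" = 4 raw bytes, "I" = unsigned 32-bit DWORD
    magic, version, num_labels, num_strings, unused, language = struct.unpack_from(
        "<4s5I", data, 0
    )
    if magic != b" FSC":
        raise ValueError("not a CSF file: identifier is %r" % magic)
    return {
        "version": version,
        "num_labels": num_labels,
        "num_strings": num_strings,
        "unused": unused,
        "language": language,
    }

# Header bytes of the Mental Omega example shown in the Example section
example = bytes.fromhex(
    "20465343" "03000000" "5b0f0000" "5b0f0000" "00000000" "00000000"
)
print(parse_csf_header(example))
```

The sketch rejects a wrong identifier because the game refuses to load such a file as well.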

Language

The language DWORD can have the following values (others will be recognized as "Unknown"):

 0 = US (English)*
 1 = UK (English)
 2 = German*
 3 = French*
 4 = Spanish
 5 = Italian
 6 = Japanese
 7 = Jabberwockie
 8 = Korean*
 9 = Chinese*
>9 = Unknown

* RA2/YR has been released in this language.
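
A program that displays the language could keep the list above in a lookup table. This is only a sketch; the names CSF_LANGUAGES and language_name are hypothetical.

```python
# Language DWORD values as listed above
CSF_LANGUAGES = {
    0: "US (English)",
    1: "UK (English)",
    2: "German",
    3: "French",
    4: "Spanish",
    5: "Italian",
    6: "Japanese",
    7: "Jabberwockie",
    8: "Korean",
    9: "Chinese",
}

def language_name(value: int) -> str:
    """Return the language name for a header value, or "Unknown" for others."""
    return CSF_LANGUAGES.get(value, "Unknown")

print(language_name(2))   # German
print(language_name(42))  # Unknown
```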

Labels

After the header, the label data follows.

A label can be considered an entry in the stringtable (e.g. "GUI:OK" is a label).
Each label has a name (ASCII string, e.g. "NAME:MTNK") and zero or more string pairs. As mentioned above, a string pair is made up of a Unicode Value (e.g. "Grizzly Tank") and an ASCII ExtraValue (no example in the original ra2.csf/ra2md.csf, not used by the game).

The data is stored in the CSF file as follows:

Label header

The label data begins with a label header, which is built up like this:

Offset	Type	Description
0x0	char[4]	" LBL" Label identifier. If this is not " LBL", the game will not recognize the following data as label data and will read the next 4 bytes instead. " LBL" means "Label".
0x4	DWORD	Number of string pairs This is the number of string pairs associated with this label. Usual value is 1.
0x8	DWORD	LabelNameLength This value holds the size of the label name that follows.
0xC	char[LabelNameLength]	LabelName A non-zero-terminated string that is as long as the DWORD at 0x8 says. If it is longer, the rest will be cut off.

The first label in ra2md.csf can be found at 0x18.
Note: Spaces, tabs and line breaks are stripped from the label's name, so they cannot be used. Even so, the ra2.csf and ra2md.csf files that come with the game contain a label named "gui:password entry box label".
Note: The label name is case-insensitive. If a label name appears more than once, the game loads the last occurrence.
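
The label header table can be read along these lines. This is a minimal sketch with a hypothetical function name, parse_label_header. Unlike the game, which skips ahead 4 bytes when the identifier is wrong, the sketch simply raises an error.

```python
import struct

def parse_label_header(data: bytes, offset: int):
    """Return (num_pairs, label_name, next_offset) for the label at offset."""
    magic, num_pairs, name_len = struct.unpack_from("<4sII", data, offset)
    if magic != b" LBL":
        raise ValueError("no label identifier at offset 0x%X" % offset)
    name_start = offset + 0xC
    # The name is not zero-terminated; its length comes from the DWORD at 0x8
    name = data[name_start:name_start + name_len].decode("ascii")
    return num_pairs, name, name_start + name_len

# First label of the Mental Omega example shown in the Example section
example = bytes.fromhex("204c424c" "01000000" "0c000000") + b"MSG:PingInfo"
print(parse_label_header(example, 0))  # (1, 'MSG:PingInfo', 24)
```

Because label names are case-insensitive and the last duplicate wins, a reader might store the results in a dictionary keyed by the upper-cased name, so that later entries overwrite earlier ones.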

Values

Directly after the label header, the value data (string pairs) follows.
This is how it is built up:

Offset	Type	Description
0x0	char[4]	" RTS" or "WRTS" Identifier. " RTS" means that there is no Extra Value for this label. "WRTS" means that after the Value data, data for the Extra Value follows (see below). Everything else is invalid. " RTS" means "string" ("STR " reversed), "WRTS" means "string wide" ("STRW" reversed).
0x4	DWORD	ValueLength This holds the length, in characters, of the Unicode string (the Value) that follows.
0x8	byte[ValueLength*2]	Value This holds the encoded Value of the label. Note that this is ValueLength*2 bytes long, because the value is a Unicode (UTF-16-LE) string stored with every bit inverted, i.e. every character is a word instead of a byte. To decode the value to a Unicode string, apply a bitwise NOT to every byte of the value data (or subtract it from 0xFF, see below for an example).
0x8+ValueLength*2	DWORD	ExtraValueLength This holds the length of the extra value string that follows. This and the following line only exist if the identifier is "WRTS" and not " RTS".
0x8+ValueLength*2+0x4	char[ExtraValueLength]	ExtraValue Like the label name, a non-zero-terminated string that is as long as ExtraValueLength says. If it is longer, the rest will be cut off.
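
A string pair laid out as in the table above could be read as follows. This is a sketch under the assumption that the caller already knows the offset of the identifier; the function name parse_value is hypothetical.

```python
import struct

def parse_value(data: bytes, offset: int):
    """Return (value, extra_value, next_offset) for the string pair at offset.

    extra_value is None when the identifier is " RTS".
    """
    magic, value_len = struct.unpack_from("<4sI", data, offset)
    if magic not in (b" RTS", b"WRTS"):
        raise ValueError("invalid value identifier %r" % magic)
    start = offset + 8
    end = start + value_len * 2  # two bytes per character
    # Invert every byte, then read the result as UTF-16-LE
    value = bytes(b ^ 0xFF for b in data[start:end]).decode("utf-16-le")
    extra = None
    if magic == b"WRTS":
        (extra_len,) = struct.unpack_from("<I", data, end)
        extra = data[end + 4:end + 4 + extra_len].decode("ascii")
        end += 4 + extra_len
    return value, extra, end

# The "WRTS" pair of the label VOX:ceva001 from the Example section
example = bytes.fromhex(
    "57525453" "0c000000"
    "9974b5abe50059b0b492cfadc797c6a02eacfba36ab1fdcf"
    "08000000"
) + b"ceva001c"
value, extra, next_offset = parse_value(example, 0)
print(value, extra, next_offset)
```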

Decoding the value

To decode the value to a Unicode string, apply a bitwise NOT to every byte of the value data (or subtract each byte from 0xFF).
An example in C++:

int ValueDataLength = ValueLength << 1;
for(int i = 0; i < ValueDataLength; ++i) {
  ValueData[i] = ~ValueData[i];
}
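
The same operation can be sketched in Python. Since a bitwise NOT undoes itself, the one helper serves for both decoding and encoding. The function names below are made up for illustration.

```python
def invert_bytes(data: bytes) -> bytes:
    """Bitwise NOT of every byte; applying it twice restores the input."""
    return bytes(b ^ 0xFF for b in data)

def decode_value(value_data: bytes) -> str:
    """Turn raw Value data from a CSF file into a Python string."""
    return invert_bytes(value_data).decode("utf-16-le")

def encode_value(text: str) -> bytes:
    """Turn a Python string into raw Value data for a CSF file."""
    return invert_bytes(text.encode("utf-16-le"))

# Value data of the label MSG:PingInfo from the Example section
raw = bytes.fromhex("afff96ff91ff98ffdfffc2ffdfffdaff9bff92ff8cff")
print(decode_value(raw))  # Ping = %dms
```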

Special case of decoding the value

Although the value is a Unicode string, in Red Alert 2 and Yuri's Revenge the built-in game.fnt file mistakenly treats Unicode code points as Windows-1252 code points. For code points 128-159 (0x80-0x9F), the game therefore displays Windows-1252 characters, unless the modder replaces game.fnt with a font that uses the correct Unicode code points. Some mods and some editors save such characters, for example ‘’“”•, with their Windows-1252 encoding. See https://i18nqa.com/debug/table-iso8859-1-vs-windows-1252.html for the difference.

When reading CSF files, it is recommended to let the user decide whether to treat these characters (0x80-0x9F) as Windows-1252 (as some mods mistakenly store the wrong code point) or as Unicode (which requires a correct font file). When saving a CSF file, however, always write these characters with their correct Unicode code points, because in the original game.fnt file every affected character except the Trade Mark Sign ™ has its font data at the correct Unicode code point.

Example: a CSF file may mistakenly contain a ’ character at code point 0x92 (Windows-1252). When saving, this character should be corrected to code point 0x2019.
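
The correction described above might look like this in Python. The sketch relies on Python's built-in cp1252 codec, which leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined, so those code points are kept unchanged. The function name fix_cp1252_code_points is hypothetical.

```python
def fix_cp1252_code_points(text: str) -> str:
    """Map code points 0x80-0x9F, read as Windows-1252, to Unicode code points."""
    out = []
    for ch in text:
        cp = ord(ch)
        if 0x80 <= cp <= 0x9F:
            try:
                ch = bytes([cp]).decode("cp1252")
            except UnicodeDecodeError:
                pass  # undefined in Windows-1252, keep the character as it is
        out.append(ch)
    return "".join(out)

fixed = fix_cp1252_code_points("It\x92s")
print(hex(ord(fixed[2])))  # 0x2019
```

A reader that follows the recommendation above would apply this only when the user chooses the Windows-1252 interpretation.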

Summary and Example

Summary

The format of a CSF file can be summarized like this.

**CSF Format**
FSC	Ver	NumL	NumS	null	Lang
LBL	NumP	LenL	L
RTS	LenS	S
WRTS	LenS	S		LenX	X

Parameter Length

Except for LabelName (L), StringValue (S) and ExtraValue (X), all fields and identifiers are 4 bytes long.

Encoding

StringValue (S) is UTF-16-LE with every bit inverted. Identifiers, LabelName (L) and ExtraValue (X) are ASCII, but the identifiers are stored in little-endian (LE) order, i.e. the lowest byte comes first. All other parameters are 4-byte integers in LE order.
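
The byte order of the identifiers becomes visible when each one is read as a little-endian DWORD and its bytes are printed from the highest to the lowest. The helper name identifier_as_text is made up for this small sketch.

```python
import struct

def identifier_as_text(tag: bytes) -> str:
    """Read a 4-byte identifier as a little-endian DWORD, highest byte first."""
    (as_dword,) = struct.unpack("<I", tag)
    return as_dword.to_bytes(4, "big").decode("ascii")

for tag in (b" FSC", b" LBL", b" RTS", b"WRTS"):
    print(repr(tag.decode("ascii")), "->", repr(identifier_as_text(tag)))
```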

Useful Parameters

Usually we only need " LBL", " RTS"/"WRTS", LabelName (L), StringValue (S), LabelNameLength (LenL) and StringValueLength (LenS). LenL and LenS are needed to cut L and S out of the data at exactly the right length while reading the file.

Useless Parameters

The other parameters, namely " FSC", CSFVersion (Ver), NumLabels (NumL), NumStrings (NumS), UnusedSet (null), Language (Lang), NumOfStringPairs (NumP), ExtraValueLength (LenX) and ExtraValue (X), are of little use when reading the strings.

Most of the time, the games do not read or use Ver, null, Lang, LenX and X; when writing a CSF file, just set Ver=3, Lang=0 and null=0x00000000.
" FSC" and NumP are normally constant: just set NumP=1 and leave it unchanged.
NumL and NumS can be calculated from L and S when writing the CSF file.
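
Following these rules, a writer can be sketched in a few lines. It assumes one " RTS" pair per label and no ExtraValue; the function name build_csf and its parameters are hypothetical.

```python
import struct

def build_csf(strings: dict, version: int = 3, language: int = 0) -> bytes:
    """Serialize {label_name: value} into CSF bytes, one " RTS" pair per label."""
    body = bytearray()
    for name, value in strings.items():
        name_bytes = name.encode("ascii")
        encoded = bytes(b ^ 0xFF for b in value.encode("utf-16-le"))
        # Label header: identifier, NumP=1, LenL, then the label name
        body += struct.pack("<4sII", b" LBL", 1, len(name_bytes)) + name_bytes
        # LenS counts 16-bit characters, i.e. half the byte length
        body += struct.pack("<4sI", b" RTS", len(encoded) // 2) + encoded
    # With exactly one pair per label, NumL and NumS are both the label count
    count = len(strings)
    header = struct.pack("<4s5I", b" FSC", version, count, count, 0, language)
    return header + bytes(body)

csf_bytes = build_csf({"MSG:PingInfo": "Ping = %dms"})
print(len(csf_bytes))  # 78
```

The label and value bytes produced for MSG:PingInfo match the corresponding lines of the example further down this page.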

Example

This is an excerpt from stringtable09.csf, extracted from expandmo98.mix of Mental Omega version 3.3.6.

Before decode

(line breaks have been inserted before each identifier)

\x20\x46\x53\x43\x03\x00\x00\x00\x5b\x0f\x00\x00\x5b\x0f\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x20\x4c\x42\x4c\x01\x00\x00\x00\x0c\x00\x00\x00\x4d\x53\x47\x3a\x50\x69\x6e\x67\x49\x6e\x66\x6f
\x20\x52\x54\x53\x0b\x00\x00\x00\xaf\xff\x96\xff\x91\xff\x98\xff\xdf\xff\xc2\xff\xdf\xff\xda\xff\x9b\xff\x92\xff\x8c\xff
\x20\x4c\x42\x4c\x01\x00\x00\x00\x0c\x00\x00\x00\x4d\x53\x47\x3a\x43\x72\x69\x74\x69\x63\x61\x6c
\x20\x52\x54\x53\x04\x00\x00\x00\x0b\x7e\x82\xab\xac\x9d\x04\xae
…
\x20\x4c\x42\x4c\x01\x00\x00\x00\x0b\x00\x00\x00\x56\x4f\x58\x3a\x63\x65\x76\x61\x30\x30\x31
\x57\x52\x54\x53\x0c\x00\x00\x00\x99\x74\xb5\xab\xe5\x00\x59\xb0\xb4\x92\xcf\xad\xc7\x97\xc6\xa0\x2e\xac\xfb\xa3\x6a\xb1\xfd\xcf\x08\x00\x00\x00\x63\x65\x76\x61\x30\x30\x31\x63
…
\x20\x4c\x42\x4c\x01\x00\x00\x00\x0e\x00\x00\x00\x4e\x61\x6d\x65\x3a\x53\x6f\x76\x46\x69\x6e\x61\x6c\x65
\x20\x52\x54\x53\x04\x00\x00\x00\x30\x7d\xab\x7f\x37\x81\xc5\xa8

After decode

 FSC[3][3931][3931][0][0]
 LBL[1][12]MSG:PingInfo
 RTS[11]Ping = %dms
 LBL[1][12]MSG:Critical
 RTS[4]致命打击
…
 LBL[1][11]VOX:ceva001
WRTS[12]警告：侦测到核弹发射井。[8]ceva001c
…
 LBL[1][14]Name:SovFinale
 RTS[4]苏联终场

Decode Script

Python

# coding=UTF-8
import os
# set the folder that contains the CSF files
csf_path=r"C:\Users\Keigetsu\Documents"
# csf_path=r"/storage/emulated/0/1/CSF"
out_json_dir=os.path.join(csf_path, "CSF.json")
out_ini_dir=os.path.join(csf_path, "CSF.ini")
# create the output json file and write the opening brace (overwrite mode)
with open(out_json_dir, "w", encoding="UTF-8") as f:
    f.write("{\n")
# empty the output ini file (overwrite mode)
with open(out_ini_dir, "w", encoding="UTF-8") as f:
    pass
# list the paths of all CSF files in the folder
csf_dir_list=list()
for fn in os.listdir(csf_path):
    if os.path.splitext(fn)[1]==".csf":
        csf_dir_list.append(os.path.join(csf_path, fn))

def intb(b: bytes) -> int:
    """convert little-endian CSF bytes to an integer"""
    return int.from_bytes(b, "little")

# read the CSF files in the list
need_comma_json=False
for csf_dir in csf_dir_list:
    # file name without extension
    fnnx=os.path.splitext(os.path.basename(csf_dir))[0]
    # read CSF content as bytes
    with open(csf_dir, "rb") as f:
        csf=f.read()
    # split the CSF content into labels at each " LBL" identifier
    csf_list=csf.split(b" LBL")[1:]
    csf_dict=dict()
    # split every label into key and value at " RTS"/"WRTS"
    for i in range(len(csf_list)):
        if b" RTS" in csf_list[i]:
            csf_list[i]=csf_list[i].split(b" RTS")
        else:
            csf_list[i]=csf_list[i].split(b"WRTS")
        # LabelNameLength and ValueLength are 4-byte DWORDs, so read all 4 bytes
        csf_key=csf_list[i][0][8:8+intb(csf_list[i][0][4:8])].decode("ASCII")
        csf_val_b=csf_list[i][1][4:4+2*intb(csf_list[i][1][0:4])]
        csf_val=bytes([0xff-b for b in csf_val_b]).decode("UTF-16-LE")
        csf_val=csf_val.replace("/n", "\n").replace("\n", "\\n")
        # store the key-value pair in a dictionary
        csf_dict[csf_key]=csf_val
    # write the json file in append mode
    with open(out_json_dir, "a", encoding="UTF-8") as f:
        if need_comma_json:
            f.write(",\n")
        f.write("    \""+fnnx+"\":\n    {\n")
        need_comma_line=False
        for key in csf_dict:
            if need_comma_line:
                f.write(",\n")
            f.write("        \""+key+"\": \""+csf_dict[key].replace("\"", "\\\"")+"\"")
            need_comma_line=True
        f.write("\n    }")
        need_comma_json=True
    # write the ini file in append mode
    with open(out_ini_dir, "a", encoding="UTF-8") as f:
        f.write("["+fnnx+"]\n")
        for key in csf_dict:
            f.write(key+"\t=\t"+csf_dict[key]+"\n")
        f.write("\n")
# write the closing brace of the json file in append mode
with open(out_json_dir, "a", encoding="UTF-8") as f:
    f.write("\n}")
