SCI Specifications: Chapter 2 - Resource files

From SCI Wiki
Revision as of 19:17, 2 December 2015 by Andrew Branscom (talk | contribs) (→‎Decompression algorithm LZW)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigationJump to search

Resource files

Table of Contents SCI0 resources SCI1 resources Decompression algorithms

with significant contributions from Petr Vyhnak and Vladimir Gneushev

In order to allow games to be both distributeable and playable from several floppy disks, SCI was designed to support multi-volume data. The data itself could therefore be spread into separate files, with some of the more common resources present in more than one of them. The global index for these files was a "resource.map" file, which was read during startup and present on the same disk as the interpreter itself. This file contained a linear lookup table that mapped resource type/number tuples to a set of resource number/ offset tuples, which they could subsequently be read from.

SCI0 resources

SCI0 resource.map

The SCI0 map file format is pretty simple: It consists of 6-byte entries, terminated by the sequence 0xffff ffff ffff. The first 2 bytes, interpreted as little endian 16 bit integer, encode resource type (high 5 bits) and number (low 11 bits). The next 4 bytes are a 32 bit LE integer that contains the resource file number in the high 6 bits, and the absolute offset within the file in the low 26 bits. SCI0 performs a linear search to find the resource; however, multiple entries may match the search, since resources may be present more than once (the inverse mapping is not injective).

SCI0 resource.<nr>

SCI0 resource entries start with a four-tuple of little endian 16 bit words, which we will call (id, comp_size, decomp_size, method). id has the usual SCI0 semantics (high 5 are the resource type, low 11 are its number). comp_size and decomp_size are the size of the compressed and the decompressed resource, respectively. The compressed size actually starts counting at the record position of decomp_size, so it counts four bytes in addition to the actual content. method, finally, is the compression method used to store the data.

SCI1 resources

SCI1 resource.map

The SCI1 resource.map starts with an array of 3-byte structures where the 1st byte is the resource type (0x80 ... 0x91) and next 2 bytes (interpreted as little-endian 16 bit integer) represent the absolute offset of the resource's lookup table (within resource.map). This first array is terminated by a 3-byte entry with has 0xFF as a type and the offset pointing to the first byte after the last resource type's lookup table. SCI1 first goes through this list to find the start of list for the correct resource type and remember this offset and the offset from the next entry to know where it ends. The resulting interval contains a sorted list of 6-byte structures, where the first LE 16 bit integer is the resource number, and the next LE 32 bit integer contains the resource file number in its high 4 bits and the absolute resource offset (in the indicated resource file) in its low 28 bits. Because the list is sorted and its length is known, Sierra SCI can use binary search to locate the resource ID it is looking for.

SCI1 resource.<nr>

Later versions of SCI1 changed the resource file structure slightly: The resource header now contains a byte describing the resource's type, and a four-tuple (res_nr, comp_size, decomp_size, method), where comp_size, decomp_size, and method have the same meanings as before (with the exception of method referring to different algorithms), while res_nr is simply the resource's number.

Decompression algorithms

The decompression algorithms used in SCI are as follows:

Table 2-1. SCI0 compression algorithms

method algorithm
0 uncompressed
1 LZW
2 HUFFMAN

Table 2-2. SCI01 compression algorithms

method algorithm
0 uncompressed
1 LZW
2 COMP3
3 HUFFMAN

Table 2-3. SCI1.0 compression algorithms

method algorithm
0 uncompressed
1 LZW
2 COMP3
3 UNKNOWN-0
4 UNKNOWN-1

Table 2-4. SCI1.1 compression algorithms

method algorithm
0 uncompressed
18 DCL-EXPLODE
19 DCL-EXPLODE
20 DCL-EXPLODE

As reported by Vladimir Gneushev, SCI32 uses STACpack (as described in RFC 1974) explicitly, determining whether there is a need for compression by comparing the size of the compressed data block with that of the uncompressed.

Decompression algorithm LZW

The LZW algorithm itself, when used for compression or decompression in an apparatus (sic) designed for compression and decompression, has been patented by Unisys in Japan, Europe, and the United States. Fortunately, FreeSCI only needs LZW decompression, which means that it does not match the description of the apparatus as given above. (Further, patents on software are (at the time of this writing) not enforceable in Europe, where the FreeSCI implementation of the LZW decompressor was written).

WriteMe.

Decompression algorithm HUFFMAN

This is an implementation of a simple huffman token decoder, which looks up tokens in a huffman tree. A huffman tree is a hollow binary search tree. This means that all inner nodes, usually including the root, are empty, and have two siblings. The tree's leaves contain the actual information.

FUNCTION get_next_bit(): Boolean;
/* This function reads the next bit from the input stream. Reading starts at the MSB. */


FUNCTION get_next_byte(): Byte
VAR
    i: Integer;
    literal: Byte;
BEGIN
    literal := 0;
    FOR i := 0 to 7 DO
        literal := (literal << 1) | get_next_bit();
    OD
    RETURN literal;
END


FUNCTION get_next_char(nodelist : Array of Nodes, index : Integer): (Char, Boolean)
VAR
    left, right: Integer;
    literal : Char;
    node : Node;
BEGIN
    Node := nodelist[index];

    IF node.siblings == 0 THEN
	RETURN (node.value, False);
    ELSE BEGIN
       left := (node.siblings & 0xf0) >> 4;
       right := (node.siblings & 0x0f);

       IF get_next_bit() THEN BEGIN
	   IF right == 0 THEN /* Literal token */
	       literal := ByteToChar(get_next_byte());

	       RETURN (literal, True);
	   ELSE
	       RETURN get_next_char(nodelist, index + right)
        END ELSE
	        RETURN get_next_char(nodelist, index + left)
    END
END

The function get_next_char() is executed until its second return value is True (i.e. if a value was read directly from the input stream) while the first return value equals a certain terminator character, which is the first byte stored in the compressed resource:

Offset Name Meaning
0 terminator Terminator character
1 nodes Number of nodes
2 + i*2 nodelist[i].value Value of node #i (0 ≤ i < nodes)
3 + i*2 nodelist[i].siblings Sibling nodes of node #i
2 + nodes*2 data[] The actual compressed data

where nodelist[0] is the root node.

Decompression algorithm COMP3

WriteMe.

Decompression algorithm DCL-EXPLODE

originally by Petr Vyhnak

This algorithm matches one or more of the UNKNOWN algorithms.

This algorithm is based on the Deflate algorithm described in the Internet RFC 1951 (see also RFC 1950 for related material). The algorithm is quite similar to the explode algorithm (ZIP method #6 - implode ) but there are differences.

	/* The first 2 bytes are parameters */

P1 = ReadByte(); /* 0 or 1 */
	/* I think this means 0=binary and 1=ascii file, but in RESOURCEs I saw always 0 */

P2 = ReadByte();
	/* must be 4,5 or 6 and it is a parameter for the decompression algorithm */


/* Now, a bit stream follows, which is decoded as described below: */


LOOP:
     read 1 bit (take bits from the lowest value (LSB) to the MSB i.e. bit 0, bit 1 etc ...)
         - if the bit is 0 read 8 bits and write it to the output as it is.
         - if the bit is 1 we have here a length/distance pair:
                 - decode a number with Hufmman Tree #1; variable bit length, result is 0x00 .. 0x0F -> L1
                   if L1 <= 7:
                         LENGTH = L1 + 2
                   if L1 > 7
                         read more (L1-7) bits -> L2
                         LENGTH = L2 + M[L1-7] + 2

                 - decode another number with Hufmann Tree #2 giving result 0x00..0x3F -> D1
                   if LENGTH == 2
                         D1 = D1 << 2
                         read 2 bits -> D2
                   else
                         D1 = D1 << P2  // the parameter 2
                         read P2 bits -> D2

                   DISTANCE = (D1 | D2) + 1

                 - now copy LENGTH bytes from (output_ptr-DISTANCE) to output_ptr
END LOOP

The algorithm terminates as soon as it runs out of bits. The data structures used are as follows:

M

M is a constant array defined as M[0] = 7, M[n+1] = M[n]+ 2n. That means M[1] = 8, M[2] = 0x0A, M[3] = 0x0E, M[4] = 0x16, M[5] = 0x26, etc.

Huffman Tree #1

The first huffman tree (the Section called Decompression algorithm HUFFMAN) contains the length values. It is described by the following table:

value (hex) code (binary)
0 101
1 11
2 100
3 011
4 0101
5 0100
6 0011
7 0010 1
8 0010 0
9 0001 1
a 0001 0
b 0000 11
c 0000 10
d 0000 01
e 0000 001
f 0000 000

where bits should be read from the left to the right.

Huffman Tree #2

The second huffman code tree contains the distance values. It can be built from the following table:

value (hex) code (binary)
00 11
01 1011
02 1010
03 1001 1
04 1001 0
05 1000 1
06 1000 0
07 0111 11
08 0111 10
09 0111 01
0a 0111 00
0b 0110 11
0c 0110 10
0d 0110 01
0e 0110 00
0f 0101 11
10 0101 10
11 0101 01
12 0101 00
13 0100 11
14 0100 10
15 0100 01
16 0100 001
17 0100 000
18 0011 111
19 0011 110
1a 0011 101
1b 0011 100
1c 0011 011
1d 0011 010
1e 0011 001
1f 0011 000
20 0010 111
21 0010 110
22 0010 101
23 0010 100
24 0010 011
25 0010 010
26 0010 001
27 0010 000
28 0001 111
29 0001 110
2a 0001 101
2b 0001 100
2c 0001 011
2d 0001 010
2e 0001 001
2f 0001 000
30 0000 1111
31 0000 1110
32 0000 1101
33 0000 1100
34 0000 1011
35 0000 1010
36 0000 1001
37 0000 1000
38 0000 0111
39 0000 0110
3a 0000 0101
3b 0000 0100
3c 0000 0011
3d 0000 0010
3e 0000 0001
3f 0000 0000

where bits should be read from the left to the right.

Huffman Tree #3

This tree describes literal values for ASCII mode, which adds another compression step to the algorithm.

value (hex) code (binary)
00 0000 1001 001
01 0000 0111 1111
02 0000 0111 1110
03 0000 0111 1101
04 0000 0111 1100
05 0000 0111 1011
06 0000 0111 1010
07 0000 0111 1001
08 0000 0111 1000
09 0001 1101
0a 0100 011
0b 0000 0111 0111
0c 0000 0111 0110
0d 0100 010
0e 0000 0111 0101
0f 0000 0111 0100
10 0000 0111 0011
11 0000 0111 0010
12 0000 0111 0001
13 0000 0111 0000
14 0000 0110 1111
15 0000 0110 1110
16 0000 0110 1101
17 0000 0110 1100
18 0000 0110 1011
19 0000 0110 1010
1a 0000 0010 0100 1
1b 0000 0110 1001
1c 0000 0110 1000
1d 0000 0110 0111
1e 0000 0110 0110
1f 0000 0110 0101
20 1111
21 0000 1010 01
22 0001 1100
23 0000 0110 0100
24 0000 1010 00
25 0000 0110 0011
26 0000 1001 11
27 0001 1011
28 0100 001
29 0100 000
2a 0001 1010
2b 0000 1101 1
2c 0011 111
2d 1001 01
2e 0011 110
2f 0001 1001
30 0011 101
31 1001 00
32 0011 100
33 0011 011
34 0011 010
35 0011 001
36 0001 1000
37 0011 000
38 0010 111
39 0001 0111
3a 0001 0110
3b 0000 0110 0010
3c 0000 1001 000
3d 0010 110
3e 0000 1101 0
3f 0000 1000 111
40 0000 0110 0001
41 1000 11
42 0010 101
43 1000 10
44 1000 01
45 1110 1
46 0010 100
47 0001 0101
48 0001 0100
49 1000 00
4a 0000 1000 110
4b 0000 1100 1
4c 0111 11
4d 0010 011
4e 0111 10
4f 0111 01
50 0010 010
51 0000 1000 101
52 0111 00
53 0110 11
54 0110 10
55 0010 001
56 0000 1100 0
57 0001 0011
58 0000 1011 1
59 0000 1011 0
5a 0000 1000 100
5b 0001 0010
5c 0000 1000 011
5d 0000 1010 1
5e 0000 0110 0000
5f 0001 0001
60 0000 0101 1111
61 1110 0
62 0110 01
63 0110 00
64 0101 11
65 1101 1
66 0101 10
67 0101 01
68 0101 00
69 1101 0
6a 0000 1000 010
6b 0010 000
6c 1100 1
6d 0100 11
6e 1100 0
6f 1011 1
70 0100 10
71 0000 1001 10
72 1011 0
73 1010 1
74 1010 0
75 1001 1
76 0001 0000
77 0001 111
78 0000 1111
79 0000 1110
7a 0000 1001 01
7b 0000 1000 001
7c 0000 1000 000
7d 0000 0101 1110
7e 0000 0101 1101
7f 0000 0101 1100
80 0000 0010 0100 0
81 0000 0010 0011 1
82 0000 0010 0011 0
83 0000 0010 0010 1
84 0000 0010 0010 0
85 0000 0010 0001 1
86 0000 0010 0001 0
87 0000 0010 0000 1
88 0000 0010 0000 0
89 0000 0001 1111 1
8a 0000 0001 1111 0
8b 0000 0001 1110 1
8c 0000 0001 1110 0
8d 0000 0001 1101 1
8e 0000 0001 1101 0
8f 0000 0001 1100 1
90 0000 0001 1100 0
91 0000 0001 1011 1
92 0000 0001 1011 0
93 0000 0001 1010 1
94 0000 0001 1010 0
95 0000 0001 1001 1
96 0000 0001 1001 0
97 0000 0001 1000 1
98 0000 0001 1000 0
99 0000 0001 0111 1
9a 0000 0001 0111 0
9b 0000 0001 0110 1
9c 0000 0001 0110 0
9d 0000 0001 0101 1
9e 0000 0001 0101 0
9f 0000 0001 0100 1
a0 0000 0001 0100 0
a1 0000 0001 0011 1
a2 0000 0001 0011 0
a3 0000 0001 0010 1
a4 0000 0001 0010 0
a5 0000 0001 0001 1
a6 0000 0001 0001 0
a7 0000 0001 0000 1
a8 0000 0001 0000 0
a9 0000 0000 1111 1
aa 0000 0000 1111 0
ab 0000 0000 1110 1
ac 0000 0000 1110 0
ad 0000 0000 1101 1
ae 0000 0000 1101 0
af 0000 0000 1100 1
b0 0000 0101 1011
b1 0000 0101 1010
b2 0000 0101 1001
b3 0000 0101 1000
b4 0000 0101 0111
b5 0000 0101 0110
b6 0000 0101 0101
b7 0000 0101 0100
b8 0000 0101 0011
b9 0000 0101 0010
ba 0000 0101 0001
bb 0000 0101 0000
bc 0000 0100 1111
bd 0000 0100 1110
be 0000 0100 1101
bf 0000 0100 1100
c0 0000 0100 1011
c1 0000 0100 1010
c2 0000 0100 1001
c3 0000 0100 1000
c4 0000 0100 0111
c5 0000 0100 0110
c6 0000 0100 0101
c7 0000 0100 0100
c8 0000 0100 0011
c9 0000 0100 0010
ca 0000 0100 0001
cb 0000 0100 0000
cc 0000 0011 1111
cd 0000 0011 1110
ce 0000 0011 1101
cf 0000 0011 1100
d0 0000 0011 1011
d1 0000 0011 1010
d2 0000 0011 1001
d3 0000 0011 1000
d4 0000 0011 0111
d5 0000 0011 0110
d6 0000 0011 0101
d7 0000 0011 0100
d8 0000 0011 0011
d9 0000 0011 0010
da 0000 0011 0001
db 0000 0011 0000
dc 0000 0010 1111
dd 0000 0010 1110
de 0000 0010 1101
df 0000 0010 1100
e0 0000 0000 1100 0
e1 0000 0010 1011
e2 0000 0000 1011 1
e3 0000 0000 1011 0
e4 0000 0000 1010 1
e5 0000 0010 1010
e6 0000 0000 1010 0
e7 0000 0000 1001 1
e8 0000 0000 1001 0
e9 0000 0010 1001
ea 0000 0000 1000 1
eb 0000 0000 1000 0
ec 0000 0000 0111 1
ed 0000 0000 0111 0
ee 0000 0010 1000
ef 0000 0000 0110 1
f0 0000 0000 0110 0
f1 0000 0000 0101 1
f2 0000 0010 0111
f3 0000 0010 0110
f4 0000 0010 0101
f5 0000 0000 0101 0
f6 0000 0000 0100 1
f7 0000 0000 0100 0
f8 0000 0000 0011 1
f9 0000 0000 0011 0
fa 0000 0000 0010 1
fb 0000 0000 0010 0
fc 0000 0000 0001 1
fd 0000 0000 0001 0
fe 0000 0000 0000 1
ff 0000 0000 0000 0

where bits should be read from the left to the right.

Decompression algorithm UNKNOWN

The algorithms listed as UNKNOWN-x have not yet been mapped to actual algorithms but are known to be used by the games. For some of them, it is possible that they match one of the algorithms described above, but have not yet been added to FreeSCI in an appropriate way (refer to DCL-EXPLODE for a good example).

 

< Previous: Chapter 1 - IntroductionNext: Chapter 3 - The Graphics subsystem >