Decompression algorithms

The decompression algorithms used in SCI are as follows:

Table 2-1. SCI0 compression algorithms

methodalgorithm
0uncompressed
1LZW
2HUFFMAN

Table 2-2. SCI01 compression algorithms

methodalgorithm
0uncompressed
1LZW
2COMP3
3HUFFMAN

Table 2-3. SCI1.0 compression algorithms

methodalgorithm
0uncompressed
1LZW
2COMP3
3UNKNOWN-0
4UNKNOWN-1

Table 2-4. SCI1.1 compression algorithms

methodalgorithm
0uncompressed
18UNKNOWN-2
19UNKNOWN-3
20UNKNOWN-4

Decompression algorithm LZW

The LZW algorithm itself, when used for compression or decompression in an apparatus (sic) designed for compression and decompression, has been patented by Unisys in Japan, Europe, and the United States. Fortunately, FreeSCI only needs LZW decompression, which means that it does not match the description of the apparatus as given above. (Further, patents on software are (at the time of this writing) not enforceable in Europe, where the FreeSCI implementation of the LZW decompressor was written).

WriteMe.

Decompression algorithm HUFFMAN

This is an implementation of a simple huffman token decoder, which looks up tokens in a huffman tree. A huffman tree is a hollow binary search tree. This means that all inner nodes, usually including the root, are empty, and have two siblings. The tree's leaves contain the actual information.
FUNCTION get_next_bit(): Boolean;
/* This function reads the next bit from the input stream. Reading starts at the MSB. */


FUNCTION get_next_byte(): Byte
VAR
    i: Integer;
    literal: Byte;
BEGIN
    literal := 0;
    FOR i := 0 to 7 DO
        literal := (literal << 1) | get_next_bit();
    OD
    RETURN literal;
END


FUNCTION get_next_char(nodelist : Array of Nodes, index : Integer): (Char, Boolean)
VAR
    left, right: Integer;
    literal : Char;
    node : Node;
BEGIN
    Node := nodelist[index];

    IF node.siblings == 0 THEN
	RETURN (node.value, False);
    ELSE BEGIN
       left := (node.siblings & 0xf0) >> 4;
       right := (node.siblings & 0x0f);

       IF get_next_bit() THEN BEGIN
	   IF right == 0 THEN /* Literal token */
	       literal := ByteToChar(get_next_byte());

	       RETURN (literal, True);
	   ELSE
	       RETURN get_next_char(nodelist, index + right)
        END ELSE
	        RETURN get_next_char(nodelist, index + left)
    END
END
	
The function get_next_char() is executed until its second return value is True (i.e. if a value was read directly from the input stream) while the first return value equals a certain terminator character, which is the first byte stored in the compressed resource:

OffsetNameMeaning
0terminatorTerminator character
1nodesNumber of nodes
2 + i*2nodelist[i].valueValue of node #i (0 ≤ i < nodes)
3 + i*2nodelist[i].siblingsSibling nodes of node #i
2 + nodes*2data[]The actual compressed data

where nodelist[0] is the root node.

Decompression algorithm COMP3

WriteMe.

Decompression algorithm DCL-EXPLODE

originally by Petr Vyhnak

Note

This algorithm matches one or more of the UNKNOWN algorithms.

This algorithm is based on the Deflate algorithm described in the Internet RFC 1951 (see also RFC 1950 for related material).

The algorithm is quite similar to the explode algorithm (ZIP method #6 - implode ) but there are differences.
	/* The first 2 bytes are parameters */

P1 = ReadByte(); /* 0 or 1 */
	/* I think this means 0=binary and 1=ascii file, but in RESOURCEs I saw always 0 */

P2 = ReadByte();
	/* must be 4,5 or 6 and it is a parameter for the decompression algorithm */


/* Now, a bit stream follows, which is decoded as described below: */


LOOP:
     read 1 bit (take bits from the lowest value (LSB) to the MSB i.e. bit 0, bit 1 etc ...)
         - if the bit is 0 read 8 bits and write it to the output as it is.
         - if the bit is 1 we have here a length/distance pair:
                 - decode a number with Hufmman Tree #1; variable bit length, result is 0x00 .. 0x0F -> L1
                   if L1 <= 7:
                         LENGTH = L1 + 2
                   if L1 > 7
                         read more (L1-7) bits -> L2
                         LENGTH = L2 + M[L1-7] + 2

                 - decode another number with Hufmann Tree #2 giving result 0x00..0x3F -> D1
                   if LENGTH == 2
                         D1 = D1 << 2
                         read 2 bits -> D2
                   else
                         D1 = D1 << P2  // the parameter 2
                         read P2 bits -> D2

                   DISTANCE = (D1 | D2) + 1

                 - now copy LENGTH bytes from (output_ptr-DISTANCE) to output_ptr
END LOOP

	
The algorithm terminates as soon as it runs out of bits. The data structures used are as follows:

M

M is a constant array defined as M[0] = 7, M[n+1] = M[n]+ 2^n. That means M[1] = 8, M[2] = 0x0A, M[3] = 0x0E, M[4] = 0x16, M[5] = 0x26, etc.

Huffman Tree #1

The first huffman tree (the Section called Decompression algorithm HUFFMAN) contains the length values. It is described by the following table:

value (hex)code (binary)
0101
111
2100
3011
40101
50100
60011
70010 1
80010 0
90001 1
a0001 0
b0000 11
c0000 10
d0000 01
e0000 001
f0000 000

where bits should be read from the left to the right.

Huffman Tree #2

The second huffman code tree contains the distance values. It can be built from the following table:

value (hex)code (binary)
0011
011011
021010
031001 1
041001 0
051000 1
061000 0
070111 11
080111 10
090111 01
0a0111 00
0b0110 11
0c0110 10
0d0110 01
0e0110 00
0f0101 11
100101 10
110101 01
120101 00
130100 11
140100 10
150100 01
160100 001
170100 000
180011 111
190011 110
1a0011 101
1b0011 100
1c0011 011
1d0011 010
1e0011 001
1f0011 000
200010 111
210010 110
220010 101
230010 100
240010 011
250010 010
260010 001
270010 000
280001 111
290001 110
2a0001 101
2b0001 100
2c0001 011
2d0001 010
2e0001 001
2f0001 000
300000 1111
310000 1110
320000 1101
330000 1100
340000 1011
350000 1010
360000 1001
370000 1000
380000 0111
390000 0110
3a0000 0101
3b0000 0100
3c0000 0011
3d0000 0010
3e0000 0001
3f0000 0000

where bits should be read from the left to the right.

Huffman Tree #3

This tree describes literal values for ASCII mode, which adds another compression step to the algorithm.

value (hex)code (binary)
000000 1001 001
010000 0111 1111
020000 0111 1110
030000 0111 1101
040000 0111 1100
050000 0111 1011
060000 0111 1010
070000 0111 1001
080000 0111 1000
090001 1101
0a0100 011
0b0000 0111 0111
0c0000 0111 0110
0d0100 010
0e0000 0111 0101
0f0000 0111 0100
100000 0111 0011
110000 0111 0010
120000 0111 0001
130000 0111 0000
140000 0110 1111
150000 0110 1110
160000 0110 1101
170000 0110 1100
180000 0110 1011
190000 0110 1010
1a0000 0010 0100 1
1b0000 0110 1001
1c0000 0110 1000
1d0000 0110 0111
1e0000 0110 0110
1f0000 0110 0101
201111
210000 1010 01
220001 1100
230000 0110 0100
240000 1010 00
250000 0110 0011
260000 1001 11
270001 1011
280100 001
290100 000
2a0001 1010
2b0000 1101 1
2c0011 111
2d1001 01
2e0011 110
2f0001 1001
300011 101
311001 00
320011 100
330011 011
340011 010
350011 001
360001 1000
370011 000
380010 111
390001 0111
3a0001 0110
3b0000 0110 0010
3c0000 1001 000
3d0010 110
3e0000 1101 0
3f0000 1000 111
400000 0110 0001
411000 11
420010 101
431000 10
441000 01
451110 1
460010 100
470001 0101
480001 0100
491000 00
4a0000 1000 110
4b0000 1100 1
4c0111 11
4d0010 011
4e0111 10
4f0111 01
500010 010
510000 1000 101
520111 00
530110 11
540110 10
550010 001
560000 1100 0
570001 0011
580000 1011 1
590000 1011 0
5a0000 1000 100
5b0001 0010
5c0000 1000 011
5d0000 1010 1
5e0000 0110 0000
5f0001 0001
600000 0101 1111
611110 0
620110 01
630110 00
640101 11
651101 1
660101 10
670101 01
680101 00
691101 0
6a0000 1000 010
6b0010 000
6c1100 1
6d0100 11
6e1100 0
6f1011 1
700100 10
710000 1001 10
721011 0
731010 1
741010 0
751001 1
760001 0000
770001 111
780000 1111
790000 1110
7a0000 1001 01
7b0000 1000 001
7c0000 1000 000
7d0000 0101 1110
7e0000 0101 1101
7f0000 0101 1100
800000 0010 0100 0
810000 0010 0011 1
820000 0010 0011 0
830000 0010 0010 1
840000 0010 0010 0
850000 0010 0001 1
860000 0010 0001 0
870000 0010 0000 1
880000 0010 0000 0
890000 0001 1111 1
8a0000 0001 1111 0
8b0000 0001 1110 1
8c0000 0001 1110 0
8d0000 0001 1101 1
8e0000 0001 1101 0
8f0000 0001 1100 1
900000 0001 1100 0
910000 0001 1011 1
920000 0001 1011 0
930000 0001 1010 1
940000 0001 1010 0
950000 0001 1001 1
960000 0001 1001 0
970000 0001 1000 1
980000 0001 1000 0
990000 0001 0111 1
9a0000 0001 0111 0
9b0000 0001 0110 1
9c0000 0001 0110 0
9d0000 0001 0101 1
9e0000 0001 0101 0
9f0000 0001 0100 1
a00000 0001 0100 0
a10000 0001 0011 1
a20000 0001 0011 0
a30000 0001 0010 1
a40000 0001 0010 0
a50000 0001 0001 1
a60000 0001 0001 0
a70000 0001 0000 1
a80000 0001 0000 0
a90000 0000 1111 1
aa0000 0000 1111 0
ab0000 0000 1110 1
ac0000 0000 1110 0
ad0000 0000 1101 1
ae0000 0000 1101 0
af0000 0000 1100 1
b00000 0101 1011
b10000 0101 1010
b20000 0101 1001
b30000 0101 1000
b40000 0101 0111
b50000 0101 0110
b60000 0101 0101
b70000 0101 0100
b80000 0101 0011
b90000 0101 0010
ba0000 0101 0001
bb0000 0101 0000
bc0000 0100 1111
bd0000 0100 1110
be0000 0100 1101
bf0000 0100 1100
c00000 0100 1011
c10000 0100 1010
c20000 0100 1001
c30000 0100 1000
c40000 0100 0111
c50000 0100 0110
c60000 0100 0101
c70000 0100 0100
c80000 0100 0011
c90000 0100 0010
ca0000 0100 0001
cb0000 0100 0000
cc0000 0011 1111
cd0000 0011 1110
ce0000 0011 1101
cf0000 0011 1100
d00000 0011 1011
d10000 0011 1010
d20000 0011 1001
d30000 0011 1000
d40000 0011 0111
d50000 0011 0110
d60000 0011 0101
d70000 0011 0100
d80000 0011 0011
d90000 0011 0010
da0000 0011 0001
db0000 0011 0000
dc0000 0010 1111
dd0000 0010 1110
de0000 0010 1101
df0000 0010 1100
e00000 0000 1100 0
e10000 0010 1011
e20000 0000 1011 1
e30000 0000 1011 0
e40000 0000 1010 1
e50000 0010 1010
e60000 0000 1010 0
e70000 0000 1001 1
e80000 0000 1001 0
e90000 0010 1001
ea0000 0000 1000 1
eb0000 0000 1000 0
ec0000 0000 0111 1
ed0000 0000 0111 0
ee0000 0010 1000
ef0000 0000 0110 1
f00000 0000 0110 0
f10000 0000 0101 1
f20000 0010 0111
f30000 0010 0110
f40000 0010 0101
f50000 0000 0101 0
f60000 0000 0100 1
f70000 0000 0100 0
f80000 0000 0011 1
f90000 0000 0011 0
fa0000 0000 0010 1
fb0000 0000 0010 0
fc0000 0000 0001 1
fd0000 0000 0001 0
fe0000 0000 0000 1
ff0000 0000 0000 0

where bits should be read from the left to the right.

Decompression algorithm UNKNOWN

The algorithms listed as UNKNOWN-x have not yet been mapped to actual algorithms but are known to be used by the games. For some of them, it is possible that they match one of the algorithms described above, but have not yet been added to FreeSCI in an appropriate way (refer to DCL-EXPLODE for a good example).