Pro Pinball Music Reverse Engineering

2020-03-15

This was an absolute pain of a codec to reverse. It’s a simple ADPCM format, but instead of using standard step and index tables, it precalculates a lookup table and uses that. I’m not sure why as the whole purpose of ADPCM was okay-compression and fast decode.

This was done in all of the four games, at least on their PC versions.

ADPCM Tables

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24


/*
 * Dumped from the binaries:
 * - FantasticJourney.exe - 0x794D2, DGROUP:0x47A4D2
 * - BigRaceUSA.exe       - 0x9B8AA, DGROUP:0x49C4AA
 * - Timeshock!.exe       - 0x8506A, DGROUP:0x485C6A
 */
const int8_t fj_index_table[8] = {
    -1, -1, -1, -1, 1, 2, 3, 4
};

/*
 * Dumped from the binaries:
 * - FantasticJourney.exe - 0x79458, DGROUP:0x47A458
 * - BigRaceUSA.exe       - 0x9B830, DGROUP:0x49C430
 * - Timeshock!.exe       - 0x84FF0, DGROUP:0x485BF0
 */
const int16_t fj_step_table[61] = {
       1,    1,   1,      1,     2,     2,     3,     3,    4,      5,
       6,    7,   8,     10,    12,    14,    16,    20,    24,    28,
      32,   40,  48,     56,    64,    80,    96,   112,   128,   160,
     192,  224,  256,   320,   384,   448,   512,   640,   768,   896,
    1024, 1280, 1536,  1792,  2048,  2560,  3072,  3584,  4096,  5120,
    6144, 7168, 8192, 10240, 12288, 14336, 16384, 20480, 24576, 28672, 0
};

Decoder

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12


/* Decode 8 samples at a time using the static table. */
uint32_t decode_dynamic(uint32_t next, int16_t *samples, uint32_t prev)
{
    for(int i = 0; i < 8; ++i) 
    	uint16_t index = (uint16_t)(((prev >> 16) & 0xFF00) | (next & 0x00FF));
    	prev = fj_sound_table[index] + (uint16_t)prev;
    	*samples++ = (int16_t)prev;
    	next >>= 4;
	}

	return prev;
}

This doesn’t look like much a ADPCM decoder, but if it’s nop’d-out all the music and sound effects stop. To find this, poke around the executables starting at the DirectSound calls.

Looking at XREFs to fj_sound_table, the only place it’s written to is in sub_44F790:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52


void __cdecl sub_44F790()
{
  int v0; // ebp
  int v1; // eax
  signed __int16 v2; // cx
  int v3; // eax
  unsigned __int16 v4; // di
  int v5; // edx
  signed int v6; // [esp+0h] [ebp-24h]        
  signed int v7; // [esp+4h] [ebp-20h]        
  __int16 v8; // [esp+8h] [ebp-1Ch]

  v7 = -7;
  do
  {
    v6 = 0;
    v0 = 0;
    v8 = v7 & 0xF;
    do
    {
      if ( v7 < 0 )
        v1 = -v7;
      else
        v1 = v7;
      v2 = fj_adpcm_index_table2[v1] + v6;    
      if ( v2 >= 0 )
      {
        if ( v2 >= 61 )
          v2 = 60;
      }
      else
      {
        v2 = 0;
      }
      v3 = 0;
      v4 = fj_adpcm_step_table[v0] * v7;      
      do
      {
        v5 = v3 | v8;
        v3 += 16;
        fj_sound_table[v5] = (v2 << 24) + v4; 
      }
      while ( v3 != 256 );
      ++v0;
      ++HIBYTE(v8);
      ++v6;
    }
    while ( v6 < 61 );
    ++v7;
  }
  while ( v7 < 8 );
}

This does look like an ADPCM decoder, albeit an odd one. Looking at the innermost loop, the first part of it can be simplified to:

1
2


v2 = fj_adpcm_index_table[abs(v7)] + v6;
v2 = clip(v2, 0, 60);

This looks familiar. v2 is the step index, v6 is the previous step index, and v7 is presumably the nibble.

1
2


step_index = fj_adpcm_index_table[abs(nibble)] + prev_step_index;
step_index = clip(step_index, 0, 60);

The next part is a bit strange. It’s a simple loop, but what is it actually doing? It looks like this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12


v3 = 0;
v4 = fj_adpcm_step_table[v0] * v7;      
do
{
  v5 = v3 | v8;
  v3 += 16;
  fj_sound_table[v5] = (v2 << 24) + v4; 
}
while ( v3 != 256 );
++v0;
++HIBYTE(v8);
++v6;

From this, observe that:

0 ≤ v0 ≤ 60, meaning it’s a step index. This also explains why it’s being used to index fj_adpcm_step_table;
v7 is a sign-extended nibble;
v4 the sample difference, a.k.a. step * nibble;
v8 is the actual nibble, with the sign-extend lopped off;
0 ≤ v3 ≤ 256, in steps of 16. Notice that 16 == 1 << 4. By extension:
- 0 >> 4 == 0, and 256 >> 4 == 16, suggesting that v3 is an ADPCM nibble right shifted by 4, and we’re looping over each value.

Notice the ++HIBYTE(v8). We were slightly wrong about v8: only the lower byte contains the nibble. The high byte is looped from 0 to 60, meaning it is probably a step index.

Now, look at fj_sound_table; specifically how it’s indexed. fj_sound_table is indexed by v5, where v5 == v3 | v8, where v8 == step_index | v7 & 0xF). In essence, the format of the key is:

1

v5 = step_index << 8 | (nibble1 & 0xF << 4) | nibble0`

This type of indexing heavily suggests a multi-dimensional array, and in fact, it is:

1

uint32_t fj_sound_table[15616];

becomes:

1

uint32_t fj_sound_table[61][16][16];

We’re half-way there. We know the layout of the table, but not its contents. This is the easy part:

1

fj_sound_table[v5] = (v2 << 24) + v4;

Observe:

The << 24 tells us that it’s at least a 32-bit type. Assume it is because this is 32-bit game;
v2 is a step index, and easily fits in 8-bits.
v4 is the sample difference, and is a 16-bit quantity. This is either a bitfield, or a struct. I’m going with a struct:

1
2
3
4
5
6


typedef struct FjTable
{
	int16_t diff;
	uint8_t zero;
	uint8_t step_index;
} FjTable;

Don’t believe me? Check out the tables yourself.

The full-reversed function is:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20


static FjTable fj_sound_table[61][16][16];

void fj_fill_sound_table()
{
    for(int8_t i = -7; i < 8; ++i) {
        int nib0 = i & 0x0F;

        for(uint8_t index = 0; index < 61; ++index) {
            int next_index = clip(fj_adpcm_index_table2[abs(i)] + index, 0, 60);
            int16_t diff   = fj_adpcm_step_table[index] * i;

            for(int8_t nib1 = 0; nib1 < 16; ++nib1) {
                FjTable *t = &fj_sound_table[index][nib1][nib0];
                t->diff       = diff;
                t->zero       = 0;
                t->step_index = (uint8_t)next_index;
            }
        }
    }
}

From here, it’s trivial to reverse the decoder. The completed decoder, from FFmpeg is:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19


static inline int16_t adpcm_ima_cunning_expand_nibble(ADPCMChannelStatus *c, int8_t nibble)
{
    int step_index;
    int predictor;
    int step;

    nibble = sign_extend(nibble & 0xF, 4);

    step = ff_adpcm_ima_cunning_step_table[c->step_index];
    step_index = c->step_index + ff_adpcm_ima_cunning_index_table[abs(nibble)];
    step_index = av_clip(step_index, 0, 60);

    predictor = c->predictor + step * nibble;

    c->predictor = av_clip_int16(predictor);
    c->step_index = step_index;

    return c->predictor;
}

There’s actually a minor bug in this that I missed. Notice the abs(nibble) being used to index the index table. This is in the range 0 ≤ abs(nibble) ≤ 8, but there’s only 8 elements in the index table, leading to a potential buffer overrun. This was caught by FFmpeg’s automated fuzzing¹.

As explained here:

ff_adpcm_ima_cunning_index_table[abs(nibble)] is wrong in the case where nibble == -8.

If you take the unsigned nibble, and apply f():
  f(x) = 16 - x if x > 8 else x & 0x7
you’ll get the same value as abs() applied with the signed nibble, except for this one case (abs(-8) == 8, f(8) == 0).

The fix was to simply extend the index table with an extra -1:

1
2
3


const int8_t ff_adpcm_ima_cunning_index_table[9] = {
    -1, -1, -1, -1, 1, 2, 3, 4, -1
};

https://ffmpeg.org/pipermail/ffmpeg-devel/2020-May/263432.html ↩︎