Old 07 October 2011, 16:46   #8
Kroah
Quote:
Originally Posted by copse
Is there any chance you could give some highlights of your approach? How much time do you spend reverse engineering and how often? Are there any tips that you'd give for how to streamline the process? Perhaps, just get it to recompile and then play with changing things? Or under a debugger/WinUAE do the same?
Well, I'll try to sum up the process.

I use the following Windows software:
- Steem Engine with its extremely powerful debugger (sorry, but the WinUAE debugger is awful and unstable)
- IDA (something similar to ReSource) to disassemble
- UltraEdit to edit some files in hex
- Visual Studio C# Express

Even though each game is specific, I usually follow this general guideline:
1) I run the game up to the part I want to disassemble, trying to get as much data loaded as I can (some games load everything at start, others load gradually).

2) Then I save the state and dump the RAM to a file.
Working from a dump (the loaded code) rather than directly from the binary executable has several advantages:

- It bypasses any protected loader and/or decryption stage the game may have.
- Both IDA and the emulator share the same memory reference. No need to convert or relocate addresses between them.
- Pointers point to real data whatever the addressing mode is (relative or absolute).
- Structures in IDA can be applied to these data.
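
To illustrate the "same memory reference" point, here is a minimal Python sketch. It assumes the dump file starts at address 0, so that a file offset equals an address in the emulator, and that the CPU is a 68000 (big-endian, 24-bit address bus). The helper names are made up for this example.

```python
import struct

def read_word(dump, address):
    """Read a 16-bit big-endian value at the given address."""
    return struct.unpack_from(">H", dump, address)[0]

def read_long(dump, address):
    """Read a 32-bit big-endian value at the given address."""
    return struct.unpack_from(">L", dump, address)[0]

def follow_pointer(dump, pointer_address):
    """Return the address stored at pointer_address.

    The 68000 only uses 24 address bits, so the top byte is masked off.
    No relocation is needed: the value is already a valid dump offset.
    """
    return read_long(dump, pointer_address) & 0x00FFFFFF
```

With a real dump, `read_word(dump, follow_pointer(dump, some_address))` reads the first word of whatever a pointer variable refers to, using the same addresses shown in the debugger.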

This method is ideal if the objective is to understand the game (not to get "ready to assemble" code). I think the fastest way to do a remake is to first understand the game logic and structures, then port it to another language. Porting 10k lines of asm without understanding any of it is a real nightmare.

3) I run IDA and load the dump.

4) Using the emulator stack, I look for the entry point of the code and ask IDA to disassemble from there. Usually up to 75% of the code gets disassembled this way. The remaining parts are jump tables, interrupt routines and lazily loaded code. They will be disassembled later.

5) The big analysis phase starts now. Routines, variables and structures are identified and named. It's very important to label the input/output variables of each function, because they show up at every call site and reveal the meaning and type of the variables passed in.

6) First step: top-down analysis. If applicable to the game, I look for the main game loop (update, draw, update, draw...). I run the game up to the main game loop (for a platform game, for example, I load the first level and move around a little), then look at the stack. I take the first return address on the stack and point IDA at it. If I see something that looks like a game loop, nice. Otherwise I take the next return address on the stack, and so on.
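
The stack walk can be sketched in Python as a heuristic over the RAM dump. This is an illustration only, not what the debugger does: it assumes a 68000 dump starting at address 0 and treats a stack value as a likely return address when the instruction just before it is a JSR or BSR. Only the most common encodings are checked, and the function names are invented here.

```python
import struct

def looks_like_return_address(dump, addr):
    """Heuristic: does a JSR or BSR instruction end exactly at addr?"""
    if addr % 2 or addr < 6 or addr > len(dump):
        return False
    def word(a):
        return struct.unpack_from(">H", dump, a)[0]
    w6, w4, w2 = word(addr - 6), word(addr - 4), word(addr - 2)
    if w6 == 0x4EB9:                                   # JSR abs.L
        return True
    if w4 in (0x6100, 0x4EBA, 0x4EB8) or 0x4EA8 <= w4 <= 0x4EAF:
        return True                                    # BSR.W, JSR d16(PC), JSR abs.W, JSR d16(An)
    if 0x4E90 <= w2 <= 0x4E97:                         # JSR (An)
        return True
    return (w2 >> 8) == 0x61 and (w2 & 0xFF) != 0      # BSR.S

def candidate_return_addresses(dump, sp, depth=64):
    """Scan `depth` words upward from the stack pointer.

    The 68000 stack grows downward, so the most recent return
    address is the first hit (lowest stack slot).
    """
    found = []
    for slot in range(sp, min(sp + depth * 2, len(dump) - 3), 2):
        value = struct.unpack_from(">L", dump, slot)[0] & 0x00FFFFFF
        if looks_like_return_address(dump, value):
            found.append((slot, value))
    return found
```

Saved registers and local data on the stack can produce false positives, so each candidate still has to be checked by eye in the disassembly.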

7) With the main game loop identified, I take one function call in IDA, NOP it in the emulator and interpret the result (e.g. the status bar disappears). If it is an easy function (drawing the score, for example), I try to find the low-level routines (PlaySound, DrawGfx, ReadInput) it uses. These are called very often, are easy to locate and don't need to be understood in detail. Sometimes 5-10% of the code is drawing functions covering every possible case (sprite size, location on the screen, pre-shifted gfx, etc.).
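
For reference, "NOPing" a call just means overwriting the call instruction with 68000 NOP opcodes ($4E71). In practice this is done live in the emulator's debugger; the Python sketch below applies the same patch to a copy of the dump, purely as an illustration. The function name is made up, and the caller has to supply the instruction length (6 bytes for a JSR to an absolute long address, for example).

```python
NOP = b"\x4e\x71"  # 68000 NOP opcode

def nop_out(dump, address, length):
    """Overwrite `length` bytes at `address` with NOPs.

    `dump` must be a mutable bytearray. Returns the original bytes
    so the patch can be undone later.
    """
    if address % 2 or length % 2:
        raise ValueError("68000 instructions are word-aligned")
    original = bytes(dump[address:address + length])
    dump[address:address + length] = NOP * (length // 2)
    return original
```

Keeping the original bytes makes it easy to restore the call once its effect has been observed.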

8) Second step: bottom-up analysis. For each low-level function identified, I use the cross references in IDA to list all calls to it and then name the variables passed in. As stated above, the input parameters (x, y) can be used to identify many (many!) of the variables passed.
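
IDA's cross references do this properly for every call form. As a rough idea of what a cross reference is, the Python sketch below searches a dump for one specific encoding only: JSR to an absolute long address (opcode $4EB9 followed by the 32-bit target). It assumes a 68000 dump starting at address 0, and the function name is invented for the example.

```python
import struct

def find_absolute_calls(dump, target):
    """Return the address of every `JSR target` in absolute long form."""
    pattern = b"\x4e\xb9" + struct.pack(">L", target)
    hits = []
    pos = dump.find(pattern)
    while pos != -1:
        if pos % 2 == 0:          # instructions are word-aligned
            hits.append(pos)
        pos = dump.find(pattern, pos + 1)
    return hits
```

Calls made with BSR, PC-relative JSR or through a register are not found this way, and a match inside data is possible, which is why a real disassembler is the better tool.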

9) I take a part of the game I know very well and try to understand as much of its code as possible (naming, commenting) without getting stuck on it. Then I take another one. Like a puzzle, it's easier to build several small parts and link them together later than to build around one part only.
I often use the emulator to set breakpoints on read/write access to a variable, or to NOP a call and see the result. This helps a lot in understanding the code.

10) I decode the loading part of the data and gfx (disk access, decryption, unpacking) and write a C# program to extract them from the original disk. This will be the base of a viewer or a remake. This can be hard because of a custom file system, an unknown compression routine or a difficult decryption algorithm. Still, even those are just code that can be disassembled.

11) Now if I want to remake a part of the game, I have everything needed. The disassembled code is broadly understood, the data are extracted, the structured objects are known.
I begin with the main loop and port it to the new language. I stub all the important functions it calls and then convert them one by one. The difficulty is to convert the routines while refactoring AND keeping the same behavior. That's why it's best to refactor step by step: first remove the spaghetti code (add 'if', 'else' and 'for', remove jumps), then structure the data (use local variables, remove global variables, use the identified structures).
Some functions will still be ported line by line from asm to get the exact same result, mainly routines with binary operations (random number generator, optimized math).
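
As an example of a line-by-line port, here is a Python sketch of a made-up 16-bit routine (it is hypothetical, not taken from Speedball 2 or any other game). Each line mirrors one 68000 instruction, and every result is masked to 16 bits because Python integers do not wrap around the way a .w register does.

```python
MASK16 = 0xFFFF

def rol16(value, count):
    """Rotate a 16-bit value left, like the 68000 ROL.W instruction."""
    count %= 16
    value &= MASK16
    return ((value << count) | (value >> (16 - count))) & MASK16

def next_random(seed):
    """Hypothetical RNG step, ported one instruction at a time."""
    d0 = seed & MASK16             # move.w seed,d0
    d0 = rol16(d0, 3)              # rol.w  #3,d0
    d0 = (d0 + 0x1234) & MASK16    # add.w  #$1234,d0 (wraps at 16 bits)
    d0 ^= 0x5A5A                   # eor.w  #$5A5A,d0
    return d0                      # move.w d0,seed
```

Forgetting a single mask is a typical source of the small discrepancies that only show up many frames later.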

12) To find incorrect behavior, the remake is run side by side with the emulator and frame by frame. As soon as a discrepancy is noticed, the previous frame is run step by step until the divergence is found.
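
The comparison itself is simple once both sides can produce a snapshot per frame. The Python sketch below assumes each frame is a dict of named values (positions, score, and so on). How those snapshots are captured from the emulator and from the remake is left out, and the names are hypothetical.

```python
def first_divergence(reference_frames, remake_frames):
    """Return (frame_index, differing_keys) for the first mismatching
    frame, or None if all compared frames match."""
    for index, (ref, new) in enumerate(zip(reference_frames, remake_frames)):
        diff = sorted(k for k in ref.keys() | new.keys()
                      if ref.get(k) != new.get(k))
        if diff:
            return index, diff
    return None
```

The returned frame index says which frame to rerun step by step, and the differing keys say which variables to watch.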

For Speedball 2, I disassembled and analyzed the code for ~40 hours. The remake took me about 20 hours to code and 20 hours to track down discrepancies ('<' instead of '<=' for example), for a total of ~80 hours over 3 weeks.
A full remake should take another 50 hours because there are a lot of screens with menus, sounds to rip, etc., not to mention the Amiga gfx decoding.

Hope you liked the read,
Cheers

Last edited by Kroah; 07 October 2011 at 18:08.