Small IRA Tutorial
Many years ago I wrote this tutorial in German on a1k.org. I have quickly translated everything so that it can be useful to more people.
There was also a second part, about modifying the resulting source to fix a problem in it, but that is not directly related to IRA.
---
This example is based on a request from a user who wanted to get rid of the flickering date/time display in the screen bar of "BootMan11" (a 20-year-old boot manager for OS2.0+). So I demonstrated how to use a reassembler to fix a program.
We download the program from Aminet (http://aminet.net/search?query=bootman11) and unpack it. The main program resides in BootMan/s/BootMan and has a size of 21580 bytes. We copy it into our empty work directory.
First you should make IRA generate a config file, because we need it to iteratively improve the output of the reassembly. The option -preproc exists for this purpose. In contrast to a simple single-pass reassembly (without a config file), IRA then also tries to follow the program flow and marks only those parts of the program as code which are referenced by already known code, beginning from the starting point. The rest is regarded as data. That is not always correct, of course (for example when function pointers are used), but we can fix that in the config file later on. Code:
frank@albireo ira -a -preproc BootMan Code:
MACHINE 68000
First view of the output: Code:
SECTION S_0,CODE
If we provide IRA with the correct small-data setting it will automatically create labels for all these references, which makes the output much more readable. Therefore we have to make three settings in the config file:
Code:
SECTION S_1,DATA
Now we add two new lines to BootMan.cnf: Code:
BASEREG A4
ira -a -config BootMan
Caution! It would be a rather tragic mistake to confuse the two options later on, because -preproc would overwrite your config file. (Translator's annotation: no longer true with recent IRA versions!)
As you might notice, the output has changed. Now we have many new labels inside the data section, and in the code you will see e.g. LAB_036E(A4) instead of 2960(A4). Code:
MOVE.L A7,LAB_036E(A4) ;0028: 294f0b90
BASEREG SECSTRT_1,A4
To assemble the IRA output I use vasm (vasmm68k_mot) in Devpac mode. This is required to make sure that the assembler performs no optimisations and treats escape characters (\) in strings correctly. (Translator's annotation: newer vasm versions ignore escape characters by default, like Devpac, so -no-opt would suffice.)
Now it becomes laborious, depending on how much of a perfectionist you are. You should inspect the reassembled output at least once from beginning to end to fix undetected code and data regions in the config file. Then rerun IRA with this config file and check again for more regions. Repeat until you have found everything. In the end it should no longer be a problem to modify the resulting source without breaking it.
The first suspicious location in our example is at offset $10fc: Code:
RTS ;10fa: 4e75 Code:
CODE $00000A60 - $000010FC Code:
CODE $00000A60 - $00001280
But what's that? Code:
MOVEA.L #$00dff016,A0 ;1522: 207c00dff016
Of course, you can make IRA reassemble such illegal instructions as well by using the compatibility option -compat=b. Code:
TEXT $00001700 - $00001704
Important: Regions defined by TEXT directives cannot cross labels! Code:
DC.L $0000206f,$0004226f ;1a82
Hence, the code starts at $1a84 and not at $1a8c, which you should fix in the config file.
Now it becomes interesting. Especially in compiler output, but also in optimised assembler programs you will often see this: Code:
CMPI.L #$00000008,D0 ;1cce: 0c8000000008
We see a sequence of eight 16-bit offsets for jumping pc-relative to LAB_010D: $e,$12,$1c,$26,$36,$40,$50,$60. The first opcode after that is $7000 (a MOVEQ #0,D0). Hence the jump table extends to offset $1cf2.
IRA knows 8-, 16- and 32-bit jump tables. The directives are JMPB, JMPW and JMPL and expect, as usual, a start and an end offset for the region. Obviously we select JMPW in this case.
Frequently, jump table offsets are relative to the table's start address. But many C compilers, as in this case, branch relative to the table's start address + 2. To support this case IRA allows an optional third argument for the JMPx directive, which designates the base address that the jump table offsets are relative to. So we make the following entry in BootMan.cnf: Code:
JMPW $00001CE2 - $00001CF2 @ $00001CE4 Code:
LAB_010C:
We continue like this through the whole output, until we reach the point where everything seems perfectly reassembled. The final BootMan.cnf looks like this: Code:
MACHINE 68000 |
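To make the jump table arithmetic from the tutorial easier to check, here is a minimal Python sketch (not part of IRA) that decodes a table of signed big-endian offsets and adds them to the base address. The function name `jump_table_targets` and its parameters are made up for illustration. The table bytes are rebuilt from the eight offsets quoted in the text rather than read from the BootMan file.

```python
import struct

def jump_table_targets(table_bytes, base, width=2):
    """Decode signed big-endian offsets (hypothetical helper) and
    return the absolute targets, i.e. base + offset for each entry."""
    fmt = {1: "b", 2: "h", 4: "l"}[width]   # like JMPB / JMPW / JMPL
    count = len(table_bytes) // width
    offsets = struct.unpack(">%d%s" % (count, fmt), table_bytes[:count * width])
    return [base + off for off in offsets]

# The eight 16-bit offsets quoted in the tutorial, packed as they would
# appear in the file between $1ce2 and $1cf2.
table = struct.pack(">8h", 0x0E, 0x12, 0x1C, 0x26, 0x36, 0x40, 0x50, 0x60)

# The table starts at $1ce2, but the compiler branches relative to start + 2.
targets = jump_table_targets(table, base=0x1CE4)
print(["$%04x" % t for t in targets])
```

With the base set to $1ce4 the first target is $1cf2, the address directly behind the table where the MOVEQ #0,D0 sits. With the table start $1ce2 as base, it would point into the last table entry instead, which is why the third argument of the JMPW directive matters here.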
Thanks, first thing that comes to mind is the options and how/why/when to use them?
Why not create the config file by default? I’m interested in the 2nd part too. You wrote BASEREG 4; did you mean A4? Is it possible to use two labels instead of hex values, as in your example CODE $00000A60 - $00001280? How can I be sure to have the right value for BASEOFF? |
Quote:
Most (all?) options are shown, together with a short description, when IRA is started without arguments. The most important ones are -a, -preproc and -config, as shown in the example above.
-m680x0 has to be used when code for a CPU other than the 68000 should be reassembled.
-oldstyle and -newstyle select the syntax.
-binary is used when reassembling raw binary files instead of hunk-format executables. This is more difficult, because the reassembler lacks relocation information. You will usually spend more time differentiating real pointers from constants (using the PTRS and NOPTRS directives).
-entry defines the starting point of the code, which is usually only important for binary files (hunk executables always start with the first instruction in the first section).
-keepzh keeps empty sections in the output. Only required if you want to make sure the reassembled program is identical.
-compat=[b][i] enables compatibility with bad code/assemblers. 'b' recognises bit instructions accessing a bit number > 7 in a byte, and 'i' recognises immediate byte addressing modes where the MSB is not zero.
The rest is rather rare.
|
Is there a way to tell IRA to recognise offset calls into the OS and replace all of them with LVO names? The same for hardware registers: are they recognized and replaced with EQUs?
|
For hardware registers this is easy, for library calls I do not see how this can be implemented in a way that always correctly guesses the correct library that has to be used for certain _LVO offsets.
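As a rough illustration of why the hardware register case is the easy one, here is a small Python sketch that post-processes reassembled text and replaces absolute custom chip addresses with register names. It is not an IRA feature. The function name and the `$00dffxxx` operand format are assumptions based on the output shown in the tutorial, and the table only holds a handful of registers.

```python
import re

CUSTOM_BASE = 0xDFF000

# A few custom chip registers (offset from $DFF000). A real table
# would contain the complete register list.
CUSTOM_REGS = {
    0x002: "DMACONR",
    0x016: "POTGOR",
    0x01C: "INTENAR",
    0x01E: "INTREQR",
    0x096: "DMACON",
    0x09A: "INTENA",
    0x09C: "INTREQ",
    0x180: "COLOR00",
}

# Assumed operand format: 32-bit hex constants such as $00dff016.
ABS_ADDR = re.compile(r"\$00(dff[0-9a-f]{3})\b", re.IGNORECASE)

def name_custom_registers(line):
    """Replace known custom chip addresses in one source line (hypothetical helper)."""
    def repl(match):
        name = CUSTOM_REGS.get(int(match.group(1), 16) - CUSTOM_BASE)
        return name if name else match.group(0)   # leave unknown addresses alone
    return ABS_ADDR.sub(repl, line)

def equ_lines():
    """EQU definitions that the renamed source would need."""
    return ["%s\tEQU\t$%06x" % (name, CUSTOM_BASE + off)
            for off, name in sorted(CUSTOM_REGS.items())]

print(name_custom_registers("\tMOVEA.L\t#$00dff016,A0\t;1522: 207c00dff016"))
```

This works because an absolute address identifies the register unambiguously. A library offset such as -30(A6) does not, since its meaning depends on which library base A6 holds at that moment.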
|
@kamelito I have a Python script (cheapres.py) which does that. I was planning on releasing it on GitHub.
|
The Sherlock disassembler does this. It is like IRA: if it finds $4 followed by a call, it knows that these are exec calls. You can then tell it that A6 points to the intuition base, and all calls will be replaced with LVOs until A6 changes.
http://translate.googleusercontent.c...-s45ii-oPjW0KA More readable in French http://obligement.free.fr/articles/sherlock.php |
Quote:
Please link it here when ready :) |
I had a C++ version at some point, but I rewrote it from scratch in Python. The script can make wrong assumptions, but it works very well 99% of the time. I've been using this approach for 15+ years and it has always worked well.
It can detect library strings when opening libraries, change the labels to "xxxbase" and then change the offsets when A6 is loaded with this base. The base is reset when encountering RTS, so there's little chance of a mistake. At worst, the offsets remain (and the tool comments "unknown"). You can then figure out the unknown calls manually by setting the proper library base, and repeat the operation until there aren't any "unknown" left.
I recently added some "formal execution" that follows the library base through registers until it is loaded into A6.
I'll try to publish the GitHub repository later. I don't want to create a repository only for this tool.
Note: CFOU! used it to reverse-engineer EOB / EOB2 to create AGA versions. I also use it a lot to clarify what the hell a DOS game is doing when creating a whdload slave.
BTW one thing I hate is when IRA fails to disassemble an executable. First, it doesn't support overlaid executables (d68k does, but the output sucks), at least for the non-overlaid parts. Then sometimes, on some executables, it just creates a bigger and bigger file. When you stop it, you get correct output at first, then a never-ending lot of junk (dc.b). I usually edit it out.
Anyway, I'm generally not using IRA to be able to reassemble the code (although it works well). I use it to understand what the code does, and to apply patches at given offsets (that's why the -a option is a must-have for me).
I don't want to hijack this thread. I'll try the configuration tricks. I always used IRA in single pass. |
Quote:
Famous last words. :) |
lol. I think I have covered RTE as well :) but if someone does a BRA to a routine just after the code and then changes A6, the next routine could get the wrong lib offsets
- if there is no RTS/RTE in between (meaning spaghetti code with another BRA)
- if A6 isn't reloaded by the routine
In real life, I may have encountered such a situation a few times, and I have used the tool on hundreds of executables. Add to that the fact that a lot of system-friendly games are coded in C, and C compilers always reload the A6 base prior to calling the system. The odds aren't against me at least. |
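The heuristic discussed in the posts above (remember which library base was loaded into A6, rename the offsets, forget the base on RTS/RTE) can be sketched in a few lines of Python. This is only an illustration, not the actual cheapres.py. The instruction format, the "xxxbase" label convention and the function name `annotate` are assumptions, and the LVO table is a tiny excerpt.

```python
import re

# Tiny excerpt of LVO offsets. A real tool would load complete tables.
LVO = {
    "exec": {-552: "OpenLibrary", -414: "CloseLibrary",
             -198: "AllocMem", -210: "FreeMem"},
    "dos": {-30: "Open", -36: "Close", -42: "Read", -48: "Write"},
}

# Assumed source format: labels already renamed to "<lib>base".
LOAD_BASE = re.compile(r"MOVEA?\.L\s+(\w+?)base,A6", re.IGNORECASE)
LOAD_EXEC = re.compile(r"MOVEA?\.L\s+\$?0*4(?:\.W)?,A6", re.IGNORECASE)
CALL_A6 = re.compile(r"\b(?:JSR|JMP)\s+(-\d+)\(A6\)", re.IGNORECASE)
RETURN = re.compile(r"^\s*(?:RTS|RTE)\b", re.IGNORECASE)

def annotate(lines):
    """Replace numeric library offsets with _LVO names where A6 is known."""
    lib = None                      # library currently assumed to be in A6
    out = []
    for line in lines:
        load = LOAD_BASE.search(line)
        if load:
            lib = load.group(1).lower()
        elif LOAD_EXEC.search(line):
            lib = "exec"            # A6 loaded from address 4
        call = CALL_A6.search(line)
        if call:
            name = LVO.get(lib, {}).get(int(call.group(1)))
            if name:
                line = line[:call.start(1)] + "_LVO" + name + line[call.end(1):]
            else:
                line += "\t; unknown"
        elif RETURN.match(line):
            lib = None              # reset at the end of a routine
        out.append(line)
    return out

source = [
    "\tMOVEA.L\t4.W,A6",
    "\tJSR\t-552(A6)",
    "\tMOVEA.L\tdosbase,A6",
    "\tJSR\t-30(A6)",
    "\tRTS",
    "\tJSR\t-48(A6)",
]
print("\n".join(annotate(source)))
```

The last call stays numeric and is marked "unknown", because the base was dropped at the RTS. The failure case described above (a BRA into a routine that relies on a stale A6, with no RTS/RTE in between) would still be renamed with the previous library's names by this sketch.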
When you are done with reassembly and start to analyse the source, the SYMBOL directive might become useful. You can replace any label by a symbol name of your choice, by specifying a name and the program offset.
A good example for this tutorial might be Great Giana Sisters, which I reassembled and analysed 10 years ago for fun. It also shows how to deal with a raw binary program, starting at $1000 with the program entry at $102c. In the config file you can see that I have lots of PTRS directives to define program addresses (which are otherwise not known to the reassembler), and all the symbols which I added during the analysis of the code.
The ripped binary (from a cracked disk): https://server.owl.de/~frank/download/gianacode.1000
The config file: https://server.owl.de/~frank/download/gianacode.cnf
The reassembled source: https://server.owl.de/~frank/download/gianacode.asm |
Apologies if this sounds stupid or something, but what exactly is IRA?!? I mean, it's the title of the thread but nobody explains or links to whatever it is.
Again, sorry if I missed the obvious. |
I think I get it, and I'm starting to like it because of the Windows version ;)
One feature could be the ability to set the indentation used with -a, so that all offsets are aligned and the output is more readable.
Is there a way to change the data size of dc.x (.b, .w, .l)? Is it also possible to switch the values between binary, hex and decimal?
Say A5 points to $DFF000, then for something like move.w D6,98(A5), is there a way for IRA to put the Amiga chipset EQU instead?
vasm does not seem to like labels starting with a ., like jmp .label
Update: the labels were .begin, .divs, .mulu and .modu. Maybe those words are reserved, or is it something else? Getting rid of the . fixed the issue. |
Thank you phx for this. Now I can patch MEmacs to run with a black background and white/green text so that it is a lot easier on the eyes.
|
just a few python tools based on IRA disassembly output on my github:
https://github.com/jotd666/amiga68ktools
tools/cheapres.py is the one which can put LVO names on IRA-resourced code. Note that my approach was to disassemble first and then apply my tool to the disassembled text, so it may appear to conflict with the config file approach phx presented here. But if you run cheapres.py after each IRA pass, that is going to work too.
There's also a 68k code checker, which I advertised several years ago and which made it possible to find hard-to-find CPU-dependent loops in some games, and also self-modifying code. |