English Amiga Board - Small IRA Tutorial

Many years ago I wrote this tutorial in German on a1k.org. I have quickly translated it so that it can be useful to more people.
There was also a second part about modifying the resulting source to fix a problem in the program, but it is not directly related to IRA.

---

This example is based on a request from a user who wanted to get rid of the flickering date/time display in the screen bar of "BootMan11" (a 20-year-old boot manager for OS2.0+). So I demonstrated how to use a reassembler to fix a program.

We download the program from Aminet (http://aminet.net/search?query=bootman11) and unpack it. The main program resides in BootMan/s/BootMan and has a size of 21580 bytes. We copy it into our empty work directory.

First you should make IRA generate a config file, because we need it to iteratively improve the output of the reassembly. The option -preproc exists for this purpose. In contrast to a simple single-pass reassembly (without a config file), IRA also tries to follow the program flow and marks only those parts of the program as code which are referenced by already known code, beginning at the starting point. The rest is regarded as data. That's not always correct, of course (for example when function pointers are used), but we can fix that in the config file later on.

Code:

frank@albireo ira -a -preproc BootMan



IRA V2.07 (Jun  6 2012)

(c)1993-95 Tim Ruehsen (SiliconSurfer/PHANTASM)

(c)2009-2012 Frank Wille





SOURCE : BootMan

TARGET : BootMan.asm

MACHINE: MC68000

OFFSET : $00000000

Pass 0: scanning for data in code

Areas:   58  

CodeAdrs: 0   CodeAdrMax: 128

CodeArea[0]: 00000000 - 00000254

CodeArea[1]: 00000260 - 00000a5e

CodeArea[2]: 00000a60 - 000010fc

CodeArea[3]: 0000110e - 00001280

CodeArea[4]: 00001294 - 00001458

CodeArea[5]: 00001464 - 000016fe

CodeArea[6]: 00001854 - 000019ac

CodeArea[7]: 00001a48 - 00001a82

CodeArea[8]: 00001a8c - 00001b94

CodeArea[9]: 00001b98 - 00001bda

CodeArea[10]: 00001c18 - 00001ce2

CodeArea[11]: 00001d5a - 00001d62

CodeArea[12]: 00001d70 - 00001e62

CodeArea[13]: 00001e70 - 00001f5c

CodeArea[14]: 00001f6a - 00001fee

CodeArea[15]: 00001ff4 - 00002242

CodeArea[16]: 00002244 - 000022b6

CodeArea[17]: 000022e4 - 000023a6

CodeArea[18]: 000023a8 - 000023d6

CodeArea[19]: 000023d8 - 000023e6

CodeArea[20]: 000023ec - 00002444

CodeArea[21]: 00002448 - 00002546

CodeArea[22]: 00002594 - 000025da

CodeArea[23]: 000025dc - 0000262a

CodeArea[24]: 0000262c - 000026bc

CodeArea[25]: 000026c8 - 00002be2

CodeArea[26]: 00002c3c - 00002c78

CodeArea[27]: 00002c7c - 00002e22

CodeArea[28]: 00002e2c - 00002e3c

CodeArea[29]: 00002e48 - 00002e76

CodeArea[30]: 00002eac - 000032b6

CodeArea[31]: 000032c0 - 000033ce

CodeArea[32]: 000033d4 - 00003428

CodeArea[33]: 0000343c - 0000346a

CodeArea[34]: 0000346c - 000034e2

CodeArea[35]: 000034e8 - 000037c6

CodeArea[36]: 000037cc - 000038fc

CodeArea[37]: 0000390a - 00003a3e

CodeArea[38]: 00003a40 - 00003aa2

CodeArea[39]: 00003ab8 - 00003b1c

CodeArea[40]: 00003b68 - 00003c64

CodeArea[41]: 00003e6c - 00003e72

CodeArea[42]: 00003f1c - 00003f26

CodeArea[43]: 00003f3c - 00003fa4

CodeArea[44]: 00003fa8 - 00003fd2

CodeArea[45]: 00003fd4 - 00003fe6

CodeArea[46]: 000041f8 - 00004262

CodeArea[47]: 00004264 - 0000427a

CodeArea[48]: 0000427c - 00004296

CodeArea[49]: 00004298 - 000042da

CodeArea[50]: 000042dc - 0000435a

CodeArea[51]: 0000435c - 000043a2

CodeArea[52]: 000043a4 - 0000440e

CodeArea[53]: 00004410 - 00004462

CodeArea[54]: 00004464 - 00004496

CodeArea[55]: 00004498 - 000044b2

CodeArea[56]: 000044b4 - 000044ca

CodeArea[57]: 000044da - 000045b6

CodeArea[58]: 000045b8 - 000045b8

CodeArea[59]: 00005364 - 00005364





Pass 1: 100%

Pass 2: correcting labels

Pass 2: writing mnemonics

100%

The generated config file, BootMan.cnf, looks like this and defines all the detected code regions:

Code:

MACHINE 68000

ENTRY $00000000

OFFSET $00000000

CODE $00000000 - $00000254

CODE $00000260 - $00000A5E

CODE $00000A60 - $000010FC

CODE $0000110E - $00001280

CODE $00001294 - $00001458

CODE $00001464 - $000016FE

CODE $00001854 - $000019AC

CODE $00001A48 - $00001A82

CODE $00001A8C - $00001B94

CODE $00001B98 - $00001BDA

CODE $00001C18 - $00001CE2

CODE $00001D5A - $00001D62

CODE $00001D70 - $00001E62

CODE $00001E70 - $00001F5C

CODE $00001F6A - $00001FEE

CODE $00001FF4 - $00002242

CODE $00002244 - $000022B6

CODE $000022E4 - $000023A6

CODE $000023A8 - $000023D6

CODE $000023D8 - $000023E6

CODE $000023EC - $00002444

CODE $00002448 - $00002546

CODE $00002594 - $000025DA

CODE $000025DC - $0000262A

CODE $0000262C - $000026BC

CODE $000026C8 - $00002BE2

CODE $00002C3C - $00002C78

CODE $00002C7C - $00002E22

CODE $00002E2C - $00002E3C

CODE $00002E48 - $00002E76

CODE $00002EAC - $000032B6

CODE $000032C0 - $000033CE

CODE $000033D4 - $00003428

CODE $0000343C - $0000346A

CODE $0000346C - $000034E2

CODE $000034E8 - $000037C6

CODE $000037CC - $000038FC

CODE $0000390A - $00003A3E

CODE $00003A40 - $00003AA2

CODE $00003AB8 - $00003B1C

CODE $00003B68 - $00003C64

CODE $00003E6C - $00003E72

CODE $00003F1C - $00003F26

CODE $00003F3C - $00003FA4

CODE $00003FA8 - $00003FD2

CODE $00003FD4 - $00003FE6

CODE $000041F8 - $00004262

CODE $00004264 - $0000427A

CODE $0000427C - $00004296

CODE $00004298 - $000042DA

CODE $000042DC - $0000435A

CODE $0000435C - $000043A2

CODE $000043A4 - $0000440E

CODE $00004410 - $00004462

CODE $00004464 - $00004496

CODE $00004498 - $000044B2

CODE $000044B4 - $000044CA

CODE $000044DA - $000045B6

CODE $000045B8 - $000045B8

CODE $00005364 - $00005364

END

You should always use the option -a to add the program offset as a comment to each line. It is important to know that these offsets run consecutively over the whole program. They are not section offsets, but offsets from the program start, with all sections treated as one contiguous block.
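To illustrate the difference between program offsets and section offsets, here is a minimal Python sketch. It is not part of IRA. It only uses the two section starts that are visible in the listings of this tutorial (SECSTRT_0 at $0000 and SECSTRT_1 at $45b8); the function name `locate` is made up for this example, and the program may contain further sections that are not shown here.

```python
# Section starts as program offsets, taken from the reassembly in this
# tutorial. Only the two sections visible there are listed (assumption:
# there may be more).
SECTIONS = [("S_0", 0x0000), ("S_1", 0x45B8)]

def locate(program_offset):
    """Return (section name, offset inside that section) for a program offset."""
    name, start = SECTIONS[0]
    for sec_name, sec_start in SECTIONS:
        if program_offset >= sec_start:
            name, start = sec_name, sec_start
    return name, program_offset - start

print(locate(0x0028))   # an instruction near the start of the code section
print(locate(0x5118))   # the operand of "LEA LAB_0314,A3"
```

So the $5118 in an -a comment or in a config file is always counted from the program start, even though the location lies $0b60 bytes into the data section.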

First view of the output:

Code:

        SECTION S_0,CODE



SECSTRT_0:

        MOVEM.L D1-D6/A0-A6,-(A7)       ;0000: 48e77efe

        MOVEA.L A0,A2                   ;0004: 2448

        MOVE.L  D0,D2                   ;0006: 2400

        LEA     SECSTRT_1,A4            ;0008: 49f9000045b8

        MOVEA.L ABSEXECBASE.W,A6        ;000e: 2c780004

        LEA     LAB_0314,A3             ;0012: 47f900005118

        MOVEQ   #0,D1                   ;0018: 7200

        MOVE.L  #$00000093,D0           ;001a: 203c00000093

        BRA.S   LAB_0001                ;0020: 6002

LAB_0000:

        MOVE.L  D1,(A3)+                ;0022: 26c1

LAB_0001:

        DBF     D0,LAB_0000             ;0024: 51c8fffc

        MOVE.L  A7,2960(A4)             ;0028: 294f0b90

        MOVE.L  A6,2952(A4)             ;002c: 294e0b88

You will quickly notice that the program was compiled from a C source (arguments are passed on the stack, code is somewhat inefficient) and that it uses the small-data model. The data in section S_1 (starting with SECSTRT_1) are referenced via base register A4.

If we provide IRA with the correct small-data setting it will automatically create labels for all these references, which makes the output much more readable. Therefore we have to make three settings in the config file:

The base register (BASEREG).
The base address of the small data section (BASEADR).
The offset from the start of the small data section to the address held in the base register (BASEOFF).

The base register is A4. The small data section's base address is SECSTRT_1. There are no small-data references before this label. Now we look for SECSTRT_1 in the output:

Code:

        SECTION S_1,DATA



SECSTRT_1:

        DS.L    1                       ;45b8

As you can see, the program offset of the small data section is $45b8. Now we just need to determine BASEOFF. It is needed whenever the base register doesn't point directly to the start of the small data section, but somewhere into the middle of it. This is usually done because it allows you to access the whole 64K region (offsets -32768 to 32767) with base-relative addressing modes. So in most cases BASEOFF will be 32766, but in this case it is 0 (probably because the data was known to be small). There are no negative offsets in the output and A4 is loaded with SECSTRT_1 directly. Therefore we can omit BASEOFF, as it is zero.
Now we add two new lines to BootMan.cnf:

Code:

BASEREG A4

BASEADR $45b8

After this addition we rerun the reassembler, but this time with option -config instead of -preproc, because we want to use the already existing config file, BootMan.cnf.

ira -a -config BootMan

Caution! It would be a rather tragic mistake to confuse the two options at a later point, because -preproc would overwrite your config file.
(Translator's annotation: no longer true with recent IRA versions!)

As you might notice, the output has changed. Now we have many new labels inside the data section. And in the code you will see e.g. LAB_036E(A4) instead of 2960(A4).

Code:

        MOVE.L  A7,LAB_036E(A4)         ;0028: 294f0b90

        MOVE.L  A6,LAB_036C(A4)         ;002c: 294e0b88
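The arithmetic behind these new labels can be checked by hand. The following Python sketch is only an illustration and not IRA's actual implementation: it assumes that the base register holds BASEADR + BASEOFF and that the 16-bit displacement is signed. The function names are invented for this example.

```python
BASEADR = 0x45B8   # start of the small data section (SECSTRT_1)
BASEOFF = 0        # A4 points directly at the section start in this program

def to_signed16(value):
    """Interpret a 16-bit displacement word as a signed number."""
    return value - 0x10000 if value & 0x8000 else value

def small_data_target(disp_word, baseadr=BASEADR, baseoff=BASEOFF):
    """Program offset addressed by disp(A4), assuming A4 = baseadr + baseoff."""
    return baseadr + baseoff + to_signed16(disp_word)

print(hex(small_data_target(0x0B90)))  # 2960(A4) -> 0x5148
print(hex(small_data_target(0x0B88)))  # 2952(A4) -> 0x5140

# With the common BASEOFF of 32766, a displacement of -32766 would
# reach the very first byte of the section:
print(hex(small_data_target(0x8002, baseoff=32766)))  # -> 0x45b8
```

According to this calculation, 2960(A4) and 2952(A4) address the program offsets $5148 and $5140, which is where the two new labels have to be located.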

Not all assemblers can handle this label(An) syntax! A label used as a base displacement should indicate to the assembler that this is a small-data addressing mode, for which it creates the appropriate HUNK_DREL16 relocations in an object file. For simplicity, IRA uses the BASEREG directive, known from AsmOne, which is also supported by PhxAss and vasm:

        BASEREG SECSTRT_1,A4

To assemble IRA output I am using vasm (vasmm68k_mot) in Devpac mode. This is required to make sure that the assembler performs no optimisations and treats escape characters (\) in strings correctly.
(Translator's annotation: newer vasm versions ignore escape characters by default, like Devpac, so -no-opt would suffice.)

Now it becomes laborious, depending on how much of a perfectionist you are. You should inspect the reassembled output at least once from beginning to end and fix undetected code and data regions in the config file. Then rerun IRA with this config file and check again for more regions. Repeat until you have found everything. In the end it should no longer be a problem to modify the resulting source without breaking it.

The first suspicious location in our example is at offset $10fc:

Code:

        RTS                             ;10fa: 4e75

LAB_0074:

        DC.L    $bfec0b64,$650012d6,$7000302c,$0bcc4480 ;10fc

        DC.W    $4e75

LAB_0075:

        MOVE.L  A7,D0                   ;110e: 200f

A code block was not recognised, probably because it is not used at all (or is only referenced through a pointer). Nevertheless it is code and we should declare it as such. So we change the following two lines in BootMan.cnf

Code:

CODE $00000A60 - $000010FC

CODE $0000110E - $00001280

into:

Code:

CODE $00000A60 - $00001280

...which makes the gap between $10fc and $110e disappear. Feel free to rerun IRA with the new config before you continue to inspect the output.

But what's that?

Code:

        MOVEA.L #$00dff016,A0           ;1522: 207c00dff016

        DC.W    $0810                   ;1528

        DC.W    $000a                   ;152a

        BNE.W   LAB_009B                ;152c: 660000fa

An instruction was not reassembled? Exactly! It encodes "BTST #10,(A0)", although BTST can only address bytes in memory, so the CPU would interpret the #10 as #2. Some developers write #10 nevertheless, and it even happens to work when the target is a word, as in this case: the access to register POTGOR ($dff016) for the right mouse button reads the upper byte of the word, and bit 2 of that byte is bit 10 of the word.
Of course, you can make IRA reassemble such illegal instructions as well, by using the compatibility option

-compat=b
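Why "BTST #10" on a memory operand still tests the intended bit can be shown with a few lines of Python. This is only a model of the behaviour described above (bit number taken modulo 8 for memory operands, big-endian byte order), not an emulator; the function names are invented for this example.

```python
def btst_memory_bit(bit_number):
    """For a memory operand the CPU uses the bit number modulo 8."""
    return bit_number % 8

def btst_byte(memory, address, bit_number):
    """Model of BTST #n,(address) on a bytes object; True if the bit is set."""
    return bool(memory[address] & (1 << btst_memory_bit(bit_number)))

# A big-endian word with only bit 10 set: $0400, stored as bytes 04 00.
word = (1 << 10).to_bytes(2, "big")

print(btst_memory_bit(10))      # -> 2
print(btst_byte(word, 0, 10))   # -> True, bit 2 of the upper byte is set
```

The byte at the word's address is its upper byte (bits 15 to 8), so bit 2 of that byte is bit 10 of the word. A clean source would simply write "BTST #2,(A0)".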

Code:



LAB_00A0:

        DC.L    $53756e00               ;1700

Here the automatic text recognition didn't work. It should have been the string "Sun", but IRA does not recognise very short strings or strings which contain non-standard ASCII codes (i.e. ISO-8859-...). You can ignore that, because it doesn't break the reassembled code. But if you want to have a pretty string you can always add a TEXT directive to the config file:

TEXT $00001700 - $00001704

Important: Regions defined by TEXT directives cannot cross labels!
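If you are unsure whether a DC.L line hides a string, you can quickly decode it yourself. A minimal Python sketch (the helper name is made up for this example), relying only on the fact that the 68k stores longwords big-endian:

```python
def longwords_to_bytes(*longwords):
    """Turn DC.L values into the bytes they occupy in memory (big-endian)."""
    return b"".join(lw.to_bytes(4, "big") for lw in longwords)

raw = longwords_to_bytes(0x53756E00)
print(raw)                                   # -> b'Sun\x00'
print(raw.rstrip(b"\x00").decode("ascii"))   # -> Sun
```

The four bytes at $1700 are "Sun" plus the terminating zero byte, which matches the region $1700 - $1704 given in the TEXT directive.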

Code:

        DC.L    $0000206f,$0004226f     ;1a82

        DC.W    $0008

LAB_00F9:

        MOVEM.L D2-D3/A2,-(A7)          ;1a8c: 48e73020

Above LAB_00F9 there seems to be code which IRA didn't recognise as such. The two leading zero bytes are confusing; they certainly do not belong to that function. You will see this frequently, and it is most likely caused by 32-bit alignment in the original source.
Hence the code starts at $1a84 and not at $1a8c, which you should fix in the config file.

Now it becomes interesting. Especially in compiler output, but also in optimised assembler programs you will often see this:

Code:

        CMPI.L  #$00000008,D0           ;1cce: 0c8000000008

        BCC.W   LAB_010E                ;1cd4: 64000084

        ADD.W   D0,D0                   ;1cd8: d040

        MOVE.W  LAB_010C(PC,D0.W),D0    ;1cda: 303b0006

        JMP     LAB_010D(PC,D0.W)       ;1cde: 4efb0004

LAB_010C:

        DC.W    $000e                   ;1ce2

LAB_010D:

        DC.L    $0012001c,$00260036,$00400050,$00607000 ;1ce4

        DC.L    $60660806,$0001675e,$7000605c,$08060002

        DC.L    $67547000,$60520806,$0002674a,$08060001

        DC.L    $67447000,$60420806,$0003673a,$70006038

        DC.L    $08060003,$670a0806,$00016704,$70006028

        DC.L    $08060003,$67200806,$0002671a,$70006018

        DC.L    $08060003,$67100806,$0002670a,$08060001

        DC.L    $67047000

        DC.W    $6002

LAB_010E:

        MOVEQ   #-1,D0                  ;1d5a: 70ff

        MOVEM.L (A7)+,D2/D6-D7/A3/A5-A6 ;1d5c: 4cdf68c4

        RTS                             ;1d60: 4e75

That's a jump table. It starts at offset $1ce2. An experienced 68k coder will also quickly notice where it ends. Otherwise you have to try it with different sizes and look at the reassembly.
We see a sequence of eight 16-bit offsets for PC-relative jumps based on LAB_010D: $e,$12,$1c,$26,$36,$40,$50,$60. The first opcode after that is $7000 (a MOVEQ #0,D0). Hence the jump table extends until offset $1cf2.
IRA knows 8-, 16- and 32-bit jump tables. The directives are JMPB, JMPW and JMPL and expect, as usual, a start- and end-offset for the region. Obviously we select JMPW in this case.

Frequently, jump table offsets are relative to the table's start address. But many C compilers, as in this case, branch relative to the table's start address + 2. To support this case IRA allows an optional third argument for the JMPx directive, which designates the base address the table offsets are relative to. So we make the following entry in BootMan.cnf:

Code:

JMPW $00001CE2 - $00001CF2 @ $00001CE4

CODE $00001CF2 - $00001D62

The next instruction is at $1cf2, so we also had to adapt the following CODE directive. Now the data block from above looks like this:

Code:

LAB_010C:

        DC.W    (LAB_010E)-(LAB_010C+2) ;1ce2: 000e

        DC.W    (LAB_010F)-(LAB_010C+2) ;1ce4: 0012

        DC.W    (LAB_0110)-(LAB_010C+2) ;1ce6: 001c

        DC.W    (LAB_0111)-(LAB_010C+2) ;1ce8: 0026

        DC.W    (LAB_0112)-(LAB_010C+2) ;1cea: 0036

        DC.W    (LAB_0113)-(LAB_010C+2) ;1cec: 0040

        DC.W    (LAB_0114)-(LAB_010C+2) ;1cee: 0050

        DC.W    (LAB_0115)-(LAB_010C+2) ;1cf0: 0060

LAB_010E:

        MOVEQ   #0,D0                   ;1cf2: 7000

        BRA.S   LAB_0117                ;1cf4: 6066

LAB_010F:

        BTST    #1,D6                   ;1cf6: 08060001

        BEQ.S   LAB_0116                ;1cfa: 675e

        MOVEQ   #0,D0                   ;1cfc: 7000

        BRA.S   LAB_0117                ;1cfe: 605c

LAB_0110:

        BTST    #2,D6                   ;1d00: 08060002

        BEQ.S   LAB_0116                ;1d04: 6754

        ...

You must find such jump tables in the output! Otherwise you will never reach a perfect reassembly and later modifications to the source will corrupt these tables.
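If you are not yet used to spotting the end of such a table, you can let a small script do the arithmetic. The following Python sketch is not something IRA does; it applies a simple heuristic of my own as an assumption: the code follows directly after the table, so the table cannot extend beyond its lowest jump target. The bytes are copied from the DC.W/DC.L lines at $1ce2 above.

```python
import struct

# Bytes starting at $1ce2, as shown in the unprocessed listing.
TABLE_BYTES = bytes.fromhex("000e0012001c00260036004000500060" "70006066")

def decode_jump_table(data, table_start, base):
    """Read big-endian signed 16-bit offsets relative to 'base'.

    Heuristic (assumption): stop as soon as the lowest jump target
    seen so far is reached, because code starts there.
    Returns (list of targets, end offset of the table).
    """
    targets = []
    pos = 0
    end = table_start + len(data)
    while table_start + pos < end and pos + 2 <= len(data):
        (offset,) = struct.unpack_from(">h", data, pos)
        target = base + offset
        targets.append(target)
        if target > table_start:
            end = min(end, target)
        pos += 2
    return targets, table_start + pos

targets, table_end = decode_jump_table(TABLE_BYTES, 0x1CE2, 0x1CE4)
print([hex(t) for t in targets])
print(hex(table_end))   # -> 0x1cf2
```

With the base $1ce4 the first target is $1cf2, which is also where the table ends, and the following targets $1cf6 and $1d00 match the labels LAB_010F and LAB_0110 in the reassembled output. If you try the table's start address $1ce2 as base instead, all targets are off by two bytes, which is a good hint that the base is wrong.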

We continue like this through the whole output, until we reach the point where everything seems perfectly reassembled. The final BootMan.cnf looks like this:

Code:

MACHINE 68000

ENTRY $00000000

OFFSET $00000000

BASEREG A4

BASEADR $45B8

CODE $00000000 - $00000254

CODE $00000260 - $00000A5E

CODE $00000A60 - $000016FE

CODE $0000183C - $00001A0A

CODE $00001A1C - $00001A82

CODE $00001A84 - $00001BDA

CODE $00001BDC - $00001CE2

JMPW $00001CE2 - $00001CF2 @ $00001CE4

CODE $00001CF2 - $00001D62

CODE $00001D64 - $00001FEE

CODE $00001FF0 - $00002242

CODE $00002244 - $000022B6

CODE $000022E4 - $000023A6

CODE $000023A8 - $000023D6

CODE $000023D8 - $000023E6

CODE $000023E8 - $00002546

TEXT $0000257E - $000025DA

CODE $000025DC - $0000262A

CODE $0000262C - $000026BC

TEXT $000026BC - $000026C0

CODE $000026C8 - $00002BE2

TEXT $00002C0E - $00002C12

CODE $00002C3C - $00002E22

CODE $00002E24 - $00002E76

CODE $00002E78 - $000033CE

CODE $000033D4 - $0000346A

CODE $0000346C - $000034E2

CODE $000034E4 - $000037C6

CODE $000037C8 - $00003A3E

CODE $00003A40 - $00003B1C

JMPW $00003B1C - $00003B3E @ $3B1E

CODE $00003B3E - $00003C64

JMPW $00003C64 - $00003CB6 @ $3C66

CODE $00003CB6 - $00003F26

CODE $00003F28 - $00003FD2

CODE $00003FD4 - $00003FE6

CODE $00003FE8 - $00004112

TEXT $00004138 - $0000413C

CODE $0000413C - $00004164

CODE $00004176 - $00004262

CODE $00004264 - $0000427A

CODE $0000427C - $00004296

CODE $00004298 - $000042DA

CODE $000042DC - $0000435A

CODE $0000435C - $000043A2

CODE $000043A4 - $0000440E

CODE $00004410 - $00004462

CODE $00004464 - $00004496

CODE $00004498 - $000044B2

CODE $000044B4 - $000044CA

CODE $000044DA - $000045B6

CODE $000045B8 - $000045B8

TEXT $00004650 - $00004660

TEXT $0000471A - $00004730

TEXT $00004846 - $0000484A

TEXT $000048E6 - $00004902

TEXT $000049A4 - $000049A8

TEXT $000049E2 - $000049F0

TEXT $00004A08 - $00004C74

TEXT $000050C4 - $000050D6

CODE $00005364 - $00005364

END

Now the reassembler output, BootMan.asm, has reached a state where we can freely modify and optimize it. The resulting executable will still work.
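Because the config file is edited by hand over many iterations, a small consistency check can catch typos before the next IRA run. The following Python sketch is not part of IRA. It assumes only the line format visible in the listings above (directive, start offset, "-", end offset), and it assumes that regions are meant to be listed in ascending order without overlaps, as they are in this file. The function names are invented for this example.

```python
import re

# Matches lines like "CODE $00000A60 - $000016FE" or
# "JMPW $00001CE2 - $00001CF2 @ $00001CE4" (the "@ base" part is ignored).
LINE_RE = re.compile(
    r"^(CODE|TEXT|JMPB|JMPW|JMPL)\s+\$([0-9A-Fa-f]+)\s*-\s*\$([0-9A-Fa-f]+)"
)

def parse_regions(config_text):
    """Return a list of (directive, start, end) for all region lines."""
    regions = []
    for line in config_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            kind, start, end = match.groups()
            regions.append((kind, int(start, 16), int(end, 16)))
    return regions

def find_problems(regions):
    """Report regions that end before they start or overlap a previous one."""
    problems = []
    prev_end = 0
    for kind, start, end in regions:
        if end < start:
            problems.append(f"{kind} ${start:08X}: end before start")
        if start < prev_end:
            problems.append(f"{kind} ${start:08X}: overlaps previous region")
        prev_end = max(prev_end, end)
    return problems

SAMPLE = """
CODE $00001BDC - $00001CE2
JMPW $00001CE2 - $00001CF2 @ $00001CE4
CODE $00001CF2 - $00001D62
CODE $00001D64 - $00001FEE
"""

print(find_problems(parse_regions(SAMPLE)))   # -> []
```

To check the real file, read BootMan.cnf into a string and pass it to parse_regions() instead of SAMPLE.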