English Amiga Board


24 March 2020, 17:14   #1
phx
Small IRA Tutorial

Many years ago I wrote this tutorial in German on a1k.org. I have quickly translated everything, so that it can be useful to more people.
There was also a second part, about modifying the resulting source to fix a problem in it, but that is not directly related to IRA.

---

This example is based on a request from a user who wanted to get rid of the flickering date/time display in the screen bar of "BootMan11" (a 20-year-old boot manager for OS2.0+). So I demonstrated how to use a reassembler to fix a program.

We download the program from Aminet (http://aminet.net/search?query=bootman11) and unpack it. The main program resides in BootMan/s/BootMan and has a size of 21580 bytes. We copy it into our empty work directory.

First you should make IRA generate a config file, because we need it to iteratively improve the output of the reassembly. The option -preproc exists for this purpose. In contrast to a simple single-pass reassembly (without config file), IRA then also tries to follow the program flow and marks only those parts of the program as code which are referenced by already known code, beginning at the entry point. The rest is regarded as data. That is not always correct, of course (for example when function pointers are used), but we can fix that in the config file later on.

Code:
frank@albireo ira -a -preproc BootMan

IRA V2.07 (Jun  6 2012)
(c)1993-95 Tim Ruehsen (SiliconSurfer/PHANTASM)
(c)2009-2012 Frank Wille


SOURCE : BootMan
TARGET : BootMan.asm
MACHINE: MC68000
OFFSET : $00000000
Pass 0: scanning for data in code
Areas:   58  
CodeAdrs: 0   CodeAdrMax: 128
CodeArea[0]: 00000000 - 00000254
CodeArea[1]: 00000260 - 00000a5e
CodeArea[2]: 00000a60 - 000010fc
CodeArea[3]: 0000110e - 00001280
CodeArea[4]: 00001294 - 00001458
CodeArea[5]: 00001464 - 000016fe
CodeArea[6]: 00001854 - 000019ac
CodeArea[7]: 00001a48 - 00001a82
CodeArea[8]: 00001a8c - 00001b94
CodeArea[9]: 00001b98 - 00001bda
CodeArea[10]: 00001c18 - 00001ce2
CodeArea[11]: 00001d5a - 00001d62
CodeArea[12]: 00001d70 - 00001e62
CodeArea[13]: 00001e70 - 00001f5c
CodeArea[14]: 00001f6a - 00001fee
CodeArea[15]: 00001ff4 - 00002242
CodeArea[16]: 00002244 - 000022b6
CodeArea[17]: 000022e4 - 000023a6
CodeArea[18]: 000023a8 - 000023d6
CodeArea[19]: 000023d8 - 000023e6
CodeArea[20]: 000023ec - 00002444
CodeArea[21]: 00002448 - 00002546
CodeArea[22]: 00002594 - 000025da
CodeArea[23]: 000025dc - 0000262a
CodeArea[24]: 0000262c - 000026bc
CodeArea[25]: 000026c8 - 00002be2
CodeArea[26]: 00002c3c - 00002c78
CodeArea[27]: 00002c7c - 00002e22
CodeArea[28]: 00002e2c - 00002e3c
CodeArea[29]: 00002e48 - 00002e76
CodeArea[30]: 00002eac - 000032b6
CodeArea[31]: 000032c0 - 000033ce
CodeArea[32]: 000033d4 - 00003428
CodeArea[33]: 0000343c - 0000346a
CodeArea[34]: 0000346c - 000034e2
CodeArea[35]: 000034e8 - 000037c6
CodeArea[36]: 000037cc - 000038fc
CodeArea[37]: 0000390a - 00003a3e
CodeArea[38]: 00003a40 - 00003aa2
CodeArea[39]: 00003ab8 - 00003b1c
CodeArea[40]: 00003b68 - 00003c64
CodeArea[41]: 00003e6c - 00003e72
CodeArea[42]: 00003f1c - 00003f26
CodeArea[43]: 00003f3c - 00003fa4
CodeArea[44]: 00003fa8 - 00003fd2
CodeArea[45]: 00003fd4 - 00003fe6
CodeArea[46]: 000041f8 - 00004262
CodeArea[47]: 00004264 - 0000427a
CodeArea[48]: 0000427c - 00004296
CodeArea[49]: 00004298 - 000042da
CodeArea[50]: 000042dc - 0000435a
CodeArea[51]: 0000435c - 000043a2
CodeArea[52]: 000043a4 - 0000440e
CodeArea[53]: 00004410 - 00004462
CodeArea[54]: 00004464 - 00004496
CodeArea[55]: 00004498 - 000044b2
CodeArea[56]: 000044b4 - 000044ca
CodeArea[57]: 000044da - 000045b6
CodeArea[58]: 000045b8 - 000045b8
CodeArea[59]: 00005364 - 00005364


Pass 1: 100%
Pass 2: correcting labels
Pass 2: writing mnemonics
100%
The generated config file, BootMan.cnf, looks like this and defines all the detected code regions:
Code:
MACHINE 68000
ENTRY $00000000
OFFSET $00000000
CODE $00000000 - $00000254
CODE $00000260 - $00000A5E
CODE $00000A60 - $000010FC
CODE $0000110E - $00001280
CODE $00001294 - $00001458
CODE $00001464 - $000016FE
CODE $00001854 - $000019AC
CODE $00001A48 - $00001A82
CODE $00001A8C - $00001B94
CODE $00001B98 - $00001BDA
CODE $00001C18 - $00001CE2
CODE $00001D5A - $00001D62
CODE $00001D70 - $00001E62
CODE $00001E70 - $00001F5C
CODE $00001F6A - $00001FEE
CODE $00001FF4 - $00002242
CODE $00002244 - $000022B6
CODE $000022E4 - $000023A6
CODE $000023A8 - $000023D6
CODE $000023D8 - $000023E6
CODE $000023EC - $00002444
CODE $00002448 - $00002546
CODE $00002594 - $000025DA
CODE $000025DC - $0000262A
CODE $0000262C - $000026BC
CODE $000026C8 - $00002BE2
CODE $00002C3C - $00002C78
CODE $00002C7C - $00002E22
CODE $00002E2C - $00002E3C
CODE $00002E48 - $00002E76
CODE $00002EAC - $000032B6
CODE $000032C0 - $000033CE
CODE $000033D4 - $00003428
CODE $0000343C - $0000346A
CODE $0000346C - $000034E2
CODE $000034E8 - $000037C6
CODE $000037CC - $000038FC
CODE $0000390A - $00003A3E
CODE $00003A40 - $00003AA2
CODE $00003AB8 - $00003B1C
CODE $00003B68 - $00003C64
CODE $00003E6C - $00003E72
CODE $00003F1C - $00003F26
CODE $00003F3C - $00003FA4
CODE $00003FA8 - $00003FD2
CODE $00003FD4 - $00003FE6
CODE $000041F8 - $00004262
CODE $00004264 - $0000427A
CODE $0000427C - $00004296
CODE $00004298 - $000042DA
CODE $000042DC - $0000435A
CODE $0000435C - $000043A2
CODE $000043A4 - $0000440E
CODE $00004410 - $00004462
CODE $00004464 - $00004496
CODE $00004498 - $000044B2
CODE $000044B4 - $000044CA
CODE $000044DA - $000045B6
CODE $000045B8 - $000045B8
CODE $00005364 - $00005364
END
You should always use the option -a, which adds the program offset as a comment to each line. It is important to know that these offsets run consecutively over the whole program. They are not section offsets, but offsets from the program start, as if all sections formed one contiguous block.
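
To make the difference between program offsets and section offsets concrete, here is a minimal Python sketch (not part of IRA). The section table is an assumption: it lists only the two section starts that are visible in this tutorial (S_0 at $0000 and S_1 at $45b8), and the helper name is made up for illustration. A real program may have more sections.

```python
import bisect

# Section starts as offsets from the program start. Assumption: only the
# two sections shown in this tutorial's listings are included here.
SECTIONS = [(0x0000, "S_0"), (0x45B8, "S_1")]

def to_section_offset(program_offset):
    """Return (section name, offset inside that section)."""
    starts = [start for start, _ in SECTIONS]
    index = bisect.bisect_right(starts, program_offset) - 1
    start, name = SECTIONS[index]
    return name, program_offset - start

# $4650 is the start of one of the TEXT regions in the final config file.
name, offset = to_section_offset(0x4650)
print(f"{name}+${offset:04x}")
```

The comments written by -a and all addresses in the config file use the left-hand kind of value (the program offset), never the section-relative one.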

First view of the output:
Code:
        SECTION S_0,CODE

SECSTRT_0:
        MOVEM.L D1-D6/A0-A6,-(A7)       ;0000: 48e77efe
        MOVEA.L A0,A2                   ;0004: 2448
        MOVE.L  D0,D2                   ;0006: 2400
        LEA     SECSTRT_1,A4            ;0008: 49f9000045b8
        MOVEA.L ABSEXECBASE.W,A6        ;000e: 2c780004
        LEA     LAB_0314,A3             ;0012: 47f900005118
        MOVEQ   #0,D1                   ;0018: 7200
        MOVE.L  #$00000093,D0           ;001a: 203c00000093
        BRA.S   LAB_0001                ;0020: 6002
LAB_0000:
        MOVE.L  D1,(A3)+                ;0022: 26c1
LAB_0001:
        DBF     D0,LAB_0000             ;0024: 51c8fffc
        MOVE.L  A7,2960(A4)             ;0028: 294f0b90
        MOVE.L  A6,2952(A4)             ;002c: 294e0b88
You will quickly notice that the program was compiled from a C source (arguments are passed on the stack, code is somewhat inefficient) and that it uses the small-data model. The data in section S_1 (starting with SECSTRT_1) are referenced via base register A4.

If we provide IRA with the correct small-data setting it will automatically create labels for all these references, which makes the output much more readable. Therefore we have to make three settings in the config file:
  1. The base register (BASEREG).
  2. The base address of the small data section (BASEADR).
  3. The offset from the start of the small data section to the address the base register points to (BASEOFF).
The base register is A4. The small data section's base address is SECSTRT_1. There are no small-data references before this label. Now we look for SECSTRT_1 in the output:
Code:
        SECTION S_1,DATA

SECSTRT_1:
        DS.L    1                       ;45b8
As you can see, the program offset of the small data section is $45b8. Now we just need to determine BASEOFF. It is needed whenever the base register doesn't point directly to the start of the small data section, but somewhere into the middle of it. This is usually done because it allows you to access a whole 64K region (displacements -32768 to 32767) with base-relative addressing modes. So in most cases BASEOFF will be 32766, but in this case it is 0 (probably because the data was known to be small). There are no negative offsets in the output and A4 is loaded with SECSTRT_1 directly. Therefore we can also omit BASEOFF, as it is zero.
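
The effect of BASEOFF on the reachable range can be checked with a little arithmetic. This is only a sketch in Python, assuming (as described above) that the base register holds the section start plus BASEOFF and that the displacement is a signed 16-bit value. The function name is hypothetical.

```python
def reachable_section_offsets(baseoff):
    """Section-relative offsets reachable with a signed 16-bit displacement.

    Assumes the base register points to section start + baseoff.
    Offsets below the section start are clipped to 0.
    """
    lowest = baseoff - 32768
    highest = baseoff + 32767
    return max(lowest, 0), highest

for baseoff in (0, 32766):
    first, last = reachable_section_offsets(baseoff)
    print(f"BASEOFF {baseoff:5d}: offsets {first}..{last} ({last - first + 1} bytes)")
```

With BASEOFF 0 only the first 32K of the section are reachable, which is enough for BootMan. With 32766 nearly the full 64K are covered.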
Now we add two new lines to BootMan.cnf:
Code:
BASEREG A4
BASEADR $45b8
After this addition we rerun the reassembler, but this time with option -config instead of -preproc, because we want to use the already existing config file, BootMan.cnf.
ira -a -config BootMan

Caution! It would be a rather tragic mistake to confuse the two options at a later point, because -preproc would overwrite your config file.
(Translator's annotation: no longer true with recent IRA versions!)

As you might notice, the output has changed. Now we have many new labels inside the data section. And in the code you will see e.g. LAB_036E(A4) instead of 2960(A4).
Code:
        MOVE.L  A7,LAB_036E(A4)         ;0028: 294f0b90
        MOVE.L  A6,LAB_036C(A4)         ;002c: 294e0b88
Not all assemblers can handle this label(An) syntax! A label used as a base displacement should indicate to the assembler that this is a small data addressing mode, which creates the appropriate HUNK_DREL16 relocations in an object file. For simplicity, IRA uses the BASEREG directive, known from AsmOne, which is also supported by PhxAss and vasm:
        BASEREG SECSTRT_1,A4
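
How a displacement like 2960(A4) turns into a label can be followed by hand. The Python sketch below only redoes that addition with the values from the config file. It assumes the base register holds BASEADR + BASEOFF, and the function name is made up.

```python
BASEADR = 0x45B8   # program offset of SECSTRT_1, from BootMan.cnf
BASEOFF = 0        # omitted in the config file, so zero

def small_data_target(displacement):
    """Program offset addressed by displacement(A4)."""
    return BASEADR + BASEOFF + displacement

for disp in (2960, 2952):
    print(f"{disp}(A4) -> ${small_data_target(disp):04x}")
```

Under these assumptions the two results are the program offsets at which LAB_036E and LAB_036C should appear in the data section of the new output.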

To assemble IRA output I am using vasm (vasmm68k_mot) in Devpac mode. This is required to make sure that the assembler performs no optimisations and treats escape characters (\) in strings correctly.
(Translator's annotation: newer vasm versions ignore escape characters by default, like Devpac, so -no-opt would suffice.)

Now it becomes laborious, depending on how much of a perfectionist you are. You should inspect the reassembled output at least once from beginning to end, to fix undetected code and data regions in the config file. Then rerun IRA with this config file and check again for more regions. Repeat until you have found everything. In the end it should no longer be a problem to modify the resulting source without breaking it.

The first suspicious location in our example is at offset $10fc:
Code:
        RTS                             ;10fa: 4e75
LAB_0074:
        DC.L    $bfec0b64,$650012d6,$7000302c,$0bcc4480 ;10fc
        DC.W    $4e75
LAB_0075:
        MOVE.L  A7,D0                   ;110e: 200f
A code block was not recognised, probably because it is not used at all (or is only referenced through a pointer). Nevertheless it is code and we should declare it as such. So we change the following two lines from BootMan.cnf

Code:
CODE $00000A60 - $000010FC
CODE $0000110E - $00001280
into:
Code:
CODE $00000A60 - $00001280
...which makes the gap between $10fc and $110e disappear. Feel free to rerun IRA with the new config before you continue to inspect the output.
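
Gaps like this one are good candidates for a closer look, and they can be listed mechanically. The following Python sketch is not an IRA feature. It parses CODE lines in the format shown above and prints the holes between consecutive regions. The function name and the sample (the first four CODE lines of the generated config) are chosen for illustration, and not every gap is code: many are genuine data.

```python
import re

CODE_LINE = re.compile(r"^CODE\s+\$([0-9A-Fa-f]+)\s*-\s*\$([0-9A-Fa-f]+)")

def find_gaps(config_text):
    """Return (start, end) pairs of the holes between consecutive CODE regions."""
    regions = []
    for line in config_text.splitlines():
        match = CODE_LINE.match(line.strip())
        if match:
            regions.append((int(match.group(1), 16), int(match.group(2), 16)))
    regions.sort()
    gaps = []
    for (_, prev_end), (next_start, _) in zip(regions, regions[1:]):
        if next_start > prev_end:
            gaps.append((prev_end, next_start))
    return gaps

SAMPLE = """\
CODE $00000000 - $00000254
CODE $00000260 - $00000A5E
CODE $00000A60 - $000010FC
CODE $0000110E - $00001280
"""

for start, end in find_gaps(SAMPLE):
    print(f"${start:04x} - ${end:04x}  ({end - start} bytes)")
```

The last gap reported for the sample is the 18-byte hole at $10fc discussed above.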

But what's that?
Code:
        MOVEA.L #$00dff016,A0           ;1522: 207c00dff016
        DC.W    $0810                   ;1528
        DC.W    $000a                   ;152a
        BNE.W   LAB_009B                ;152c: 660000fa
An instruction was not reassembled? Exactly! It encodes a "BTST #10,(A0)", although BTST can only address bytes in memory. The CPU takes the bit number modulo 8 there, so the #10 is interpreted as #2. Some developers write the #10 nevertheless and get away with it when the target address is a word, like in this case, which accesses register POTGOR ($dff016) for the right mouse button: the byte at the word's address is its high byte, so bit 2 of that byte is bit 10 of the word.
Of course, you can make IRA reassemble such illegal instructions as well, by using the compatibility option -compat=b.
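
Why the out-of-range bit number still hits the intended bit can be verified with a few lines of Python. This is only an illustration of the arithmetic. The function names and the example register value are made up.

```python
def effective_bit(bit_number, memory_operand):
    """Bit actually tested by BTST: modulo 8 in memory, modulo 32 in a data register."""
    return bit_number % 8 if memory_operand else bit_number % 32

def high_byte(word):
    """The byte stored at the word's own address on the big-endian 68k."""
    return (word >> 8) & 0xFF

word_value = 0x0400                     # example word with only bit 10 set
bit = effective_bit(10, memory_operand=True)
print(bit)                                         # 2
print(bool(high_byte(word_value) & (1 << bit)))    # True
```

The same trick would fail for a bit in the low byte of a word, because BTST on (A0) never looks at the byte at address A0+1.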

Code:
LAB_00A0:
        DC.L    $53756e00               ;1700
Here the automatic text recognition didn't work. It should have been the string "Sun", but IRA does not recognise very short strings or strings which contain non-standard ASCII codes (i.e. ISO-8859-...). You can ignore that, because it doesn't break the reassembled code. But if you want to have a pretty string you can always add a TEXT directive to the config file:
TEXT $00001700 - $00001704

Important: Regions defined by TEXT directives cannot cross labels!
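
If you are unsure whether a DC.L hides a string, decoding it by hand is quick. A small Python sketch (the helper name is made up) for the longword above:

```python
def longword_to_text(value):
    """Split a big-endian longword into four bytes and show the printable ones."""
    raw = value.to_bytes(4, "big")
    return "".join(chr(b) if 32 <= b < 127 else f"\\x{b:02x}" for b in raw)

print(longword_to_text(0x53756E00))
```

This prints the three letters followed by the terminating zero byte, which is why the TEXT region above covers four bytes, from $1700 up to (but not including) $1704.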


Code:
        DC.L    $0000206f,$0004226f     ;1a82
        DC.W    $0008
LAB_00F9:
        MOVEM.L D2-D3/A2,-(A7)          ;1a8c: 48e73020
Right above LAB_00F9 there seems to be code which IRA didn't recognise as such. The two leading zero bytes are irritating, as they certainly do not belong to that function. You will see this frequently, and it is most likely caused by 32-bit alignment in the original source.
Hence, the code starts at $1a84 and not at $1a8c, which you should fix in the config file.
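
The start address can be double-checked by flattening the DC.L/DC.W dump into words with their offsets. The Python sketch below does that for the lines above. Skipping leading zero words is only a heuristic chosen for this illustration (a zero word can also be part of a real instruction), and the helper name is made up.

```python
def words_with_offsets(start, longs, trailing_words=()):
    """Flatten a DC.L/DC.W dump into (offset, 16-bit word) pairs."""
    words = []
    for value in longs:
        words += [value >> 16, value & 0xFFFF]
    words += list(trailing_words)
    return [(start + 2 * i, word) for i, word in enumerate(words)]

dump = words_with_offsets(0x1A82, [0x0000206F, 0x0004226F], [0x0008])

# Heuristic: treat leading zero words as alignment padding.
code_start = next(offset for offset, word in dump if word != 0)
print(f"${code_start:04x}", "aligned" if code_start % 4 == 0 else "unaligned")
```

The first non-zero word sits on a 32-bit boundary, which fits the alignment explanation.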


Now it becomes interesting. Especially in compiler output, but also in optimised assembler programs you will often see this:
Code:
        CMPI.L  #$00000008,D0           ;1cce: 0c8000000008
        BCC.W   LAB_010E                ;1cd4: 64000084
        ADD.W   D0,D0                   ;1cd8: d040
        MOVE.W  LAB_010C(PC,D0.W),D0    ;1cda: 303b0006
        JMP     LAB_010D(PC,D0.W)       ;1cde: 4efb0004
LAB_010C:
        DC.W    $000e                   ;1ce2
LAB_010D:
        DC.L    $0012001c,$00260036,$00400050,$00607000 ;1ce4
        DC.L    $60660806,$0001675e,$7000605c,$08060002
        DC.L    $67547000,$60520806,$0002674a,$08060001
        DC.L    $67447000,$60420806,$0003673a,$70006038
        DC.L    $08060003,$670a0806,$00016704,$70006028
        DC.L    $08060003,$67200806,$0002671a,$70006018
        DC.L    $08060003,$67100806,$0002670a,$08060001
        DC.L    $67047000
        DC.W    $6002
LAB_010E:
        MOVEQ   #-1,D0                  ;1d5a: 70ff
        MOVEM.L (A7)+,D2/D6-D7/A3/A5-A6 ;1d5c: 4cdf68c4
        RTS                             ;1d60: 4e75
That's a jump table. It starts at offset $1ce2. An experienced 68k coder will also quickly notice where it ends: the CMPI.L #8,D0 followed by BCC only lets the index values 0 to 7 through, so the table has eight entries. Otherwise you have to try different sizes and look at the reassembly.
We see a sequence of eight 16-bit offsets for jumping pc-relatively to LAB_010D: $e,$12,$1c,$26,$36,$40,$50,$60. The first opcode after that is $7000 (a MOVEQ #0,D0). Hence the jump table extends until offset $1cf2.
IRA knows 8-, 16- and 32-bit jump tables. The directives are JMPB, JMPW and JMPL and expect, as usual, a start and end offset for the region. Obviously we select JMPW in this case.

Frequently jump table offsets are used to jump relative to the table's start address. But many C compilers, as in this case, branch relative to the table's start address + 2. To support this case IRA allows an optional third argument for the JMPx directive, which designates the relative jump table base address. So we make the following entry in BootMan.cnf:
Code:
JMPW $00001CE2 - $00001CF2 @ $00001CE4
CODE $00001CF2 - $00001D62
The next instruction is at $1cf2, so we also had to adapt the following CODE directive. Now the data block from above looks like this:
Code:
LAB_010C:
        DC.W    (LAB_010E)-(LAB_010C+2) ;1ce2: 000e
        DC.W    (LAB_010F)-(LAB_010C+2) ;1ce4: 0012
        DC.W    (LAB_0110)-(LAB_010C+2) ;1ce6: 001c
        DC.W    (LAB_0111)-(LAB_010C+2) ;1ce8: 0026
        DC.W    (LAB_0112)-(LAB_010C+2) ;1cea: 0036
        DC.W    (LAB_0113)-(LAB_010C+2) ;1cec: 0040
        DC.W    (LAB_0114)-(LAB_010C+2) ;1cee: 0050
        DC.W    (LAB_0115)-(LAB_010C+2) ;1cf0: 0060
LAB_010E:
        MOVEQ   #0,D0                   ;1cf2: 7000
        BRA.S   LAB_0117                ;1cf4: 6066
LAB_010F:
        BTST    #1,D6                   ;1cf6: 08060001
        BEQ.S   LAB_0116                ;1cfa: 675e
        MOVEQ   #0,D0                   ;1cfc: 7000
        BRA.S   LAB_0117                ;1cfe: 605c
LAB_0110:
        BTST    #2,D6                   ;1d00: 08060002
        BEQ.S   LAB_0116                ;1d04: 6754
        ...
You must find such jump tables in the output! Otherwise you will never reach a perfect reassembly and later modifications to the source will corrupt these tables.
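
If you want to verify a suspected jump table before editing the config file, you can compute its end and its targets by hand. The Python sketch below does this for the table at $1ce2, using the eight entries from the hex dump and the base address $1ce4 from the JMPW line. Variable and function names are made up.

```python
TABLE_START = 0x1CE2
TABLE_BASE = 0x1CE4          # offsets are relative to table start + 2
ENTRIES = [0x000E, 0x0012, 0x001C, 0x0026, 0x0036, 0x0040, 0x0050, 0x0060]

def to_signed16(word):
    """Interpret a 16-bit word as a signed displacement."""
    return word - 0x10000 if word & 0x8000 else word

targets = [TABLE_BASE + to_signed16(entry) for entry in ENTRIES]
table_end = TABLE_START + 2 * len(ENTRIES)

print(f"table ends at ${table_end:04x}")
print(" ".join(f"${target:04x}" for target in targets))
```

The first three targets match the offsets of LAB_010E, LAB_010F and LAB_0110 in the listing above. If the computed targets land in the middle of instructions instead, the table size or base address is probably wrong.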

We continue like this through the whole output, until we reach the point where everything seems perfectly reassembled. The final BootMan.cnf looks like this:
Code:
MACHINE 68000
ENTRY $00000000
OFFSET $00000000
BASEREG A4
BASEADR $45B8
CODE $00000000 - $00000254
CODE $00000260 - $00000A5E
CODE $00000A60 - $000016FE
CODE $0000183C - $00001A0A
CODE $00001A1C - $00001A82
CODE $00001A84 - $00001BDA
CODE $00001BDC - $00001CE2
JMPW $00001CE2 - $00001CF2 @ $00001CE4
CODE $00001CF2 - $00001D62
CODE $00001D64 - $00001FEE
CODE $00001FF0 - $00002242
CODE $00002244 - $000022B6
CODE $000022E4 - $000023A6
CODE $000023A8 - $000023D6
CODE $000023D8 - $000023E6
CODE $000023E8 - $00002546
TEXT $0000257E - $000025DA
CODE $000025DC - $0000262A
CODE $0000262C - $000026BC
TEXT $000026BC - $000026C0
CODE $000026C8 - $00002BE2
TEXT $00002C0E - $00002C12
CODE $00002C3C - $00002E22
CODE $00002E24 - $00002E76
CODE $00002E78 - $000033CE
CODE $000033D4 - $0000346A
CODE $0000346C - $000034E2
CODE $000034E4 - $000037C6
CODE $000037C8 - $00003A3E
CODE $00003A40 - $00003B1C
JMPW $00003B1C - $00003B3E @ $3B1E
CODE $00003B3E - $00003C64
JMPW $00003C64 - $00003CB6 @ $3C66
CODE $00003CB6 - $00003F26
CODE $00003F28 - $00003FD2
CODE $00003FD4 - $00003FE6
CODE $00003FE8 - $00004112
TEXT $00004138 - $0000413C
CODE $0000413C - $00004164
CODE $00004176 - $00004262
CODE $00004264 - $0000427A
CODE $0000427C - $00004296
CODE $00004298 - $000042DA
CODE $000042DC - $0000435A
CODE $0000435C - $000043A2
CODE $000043A4 - $0000440E
CODE $00004410 - $00004462
CODE $00004464 - $00004496
CODE $00004498 - $000044B2
CODE $000044B4 - $000044CA
CODE $000044DA - $000045B6
CODE $000045B8 - $000045B8
TEXT $00004650 - $00004660
TEXT $0000471A - $00004730
TEXT $00004846 - $0000484A
TEXT $000048E6 - $00004902
TEXT $000049A4 - $000049A8
TEXT $000049E2 - $000049F0
TEXT $00004A08 - $00004C74
TEXT $000050C4 - $000050D6
CODE $00005364 - $00005364
END
Now the reassembler output, BootMan.asm, has reached a state where we can freely modify and optimize it. The resulting executable will still work.

Last edited by phx; 24 March 2020 at 19:22. Reason: BASEREG A4

24 March 2020, 18:02   #2
kamelito
Thanks. The first thing that comes to mind is the options: how, why and when to use them?
Why not make the conf file by default?
I’m interested in the 2nd part too.

You wrote BASEREG 4, you meant A4?
Is it possible to use two labels instead of hex values? Like in your example CODE $00000A60 - $00001280?

How can I be sure to have the right value for BASEOFF?

Last edited by kamelito; 24 March 2020 at 18:16.

24 March 2020, 19:22   #3
phx
Quote:
Originally Posted by kamelito
Thanks. The first thing that comes to mind is the options: how, why and when to use them?
You mean command line options?
Most (all?) options are shown, together with a short description, when IRA is started without arguments. The most important ones are -a, -preproc and -config, as shown in the example above.
-m680x0 has to be used when code for a CPU other than the 68000 is to be reassembled. -oldstyle and -newstyle select the syntax.
-binary is used when reassembling raw binary files instead of hunk-format executables. This is more difficult, because the reassembler lacks relocation information. You will usually spend more time differentiating real pointers from constants (using the PTRS and NOPTRS directives).
-entry defines the starting point of the code, which is usually only important for binary files (hunk executables always start with the first instruction in the first section).
-keepzh keeps empty sections in the output. Only required if you want to make sure the reassembled program is identical.
-compat=[b][i]. Compatibility with bad code/assemblers. 'b' recognises bit instructions accessing a bit number > 7 in a byte, and 'i' recognises immediate byte addressing modes where the MSB is not zero.
The rest is rarely needed.

Quote:
Why not make the conf file by default?
Without preprocessing (following the program flow) the config file would be worthless. Without -preproc you do not get much more than a better disassembler, which might nevertheless be useful for some quick checks.

Quote:
I’m interested in the 2nd part too.
Ok. I might prepare that later.

Quote:
You wrote BASEREG 4 you meant A4?
Right! Will fix that.

Quote:
Is it possible to use two labels instead of hex values? Like in your example CODE $00000A60 - $00001280?
That doesn't make much sense, unless you define your own labels with the SYMBOL directive. Label names may change with every new run.

Quote:
How can I be sure to have the right value for BASEOFF?
Usually small data addressing will start at the beginning of a section. When A4 is loaded with an address pointing to the start of the section + 32766, then you know that BASEOFF is 32766.

24 March 2020, 20:18   #4
kamelito
Is there a way to tell IRA to recognise calls to the OS by their offsets and replace all of them with LVO names? Same for hardware registers: are they recognised and replaced with EQUs?

24 March 2020, 20:22   #5
StingRay
For hardware registers this is easy. For library calls I do not see how this could be implemented in a way that always guesses the correct library for a given _LVO offset.

24 March 2020, 20:32   #6
jotd
@kamelito I have a python script (cheapres.py) which does that. I was planning on releasing it on github

24 March 2020, 20:33   #7
kamelito
The Sherlock disassembler does this. It is like IRA: if it finds $4 followed by a call, it knows that these are exec calls. You can then tell it that A6 points to the intuition base, and all calls will be replaced with LVO names until A6 changes.
http://translate.googleusercontent.c...-s45ii-oPjW0KA

More readable in French
http://obligement.free.fr/articles/sherlock.php

Last edited by kamelito; 24 March 2020 at 20:38.

24 March 2020, 20:41   #8
kamelito
Quote:
Originally Posted by jotd
@kamelito I have a python script (cheapres.py) which does that. I was planning on releasing it on github
Nice, I suppose integrating the logic of your script into IRA is complicated.
Please link it here when ready.

Last edited by kamelito; 24 March 2020 at 20:52.

24 March 2020, 20:50   #9
StingRay
Quote:
Originally Posted by kamelito
The Sherlock disassembler does this. It is like IRA: if it finds $4 followed by a call, it knows that these are exec calls. You can then tell it that A6 points to the intuition base, and all calls will be replaced with LVO names until A6 changes.
I have not said it is impossible, but I'm quite sure it cannot be done 100% correctly in an automatic way. You have to follow/trace the complete code to be able to use the correct library offsets reliably.

24 March 2020, 20:51   #10
jotd
I had a C++ version at some point, but I rewrote it from scratch in Python. The script can make wrong assumptions, but it generally works very well, 99% of the time. I've been using this approach for 15+ years and it has always worked well.

It can detect library strings when opening libraries, change the labels to "xxxbase" and then change the offsets when A6 is loaded with this base. The base is reset when encountering RTS, so there's little chance that there's a mistake. At worst, the offsets remain (and the tool comments "unknown"). You can then figure out the unknown calls manually by setting the proper library base, and repeat the operation until there aren't any "unknown" left.
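
To illustrate the idea, here is a strongly simplified Python sketch of such a heuristic. It is not taken from cheapres.py: the regular expressions, the "xxxbase" naming in the sample and the tiny LVO table are assumptions made for this example only.

```python
import re

# Tiny excerpt of library vector offsets, for illustration only.
LVO = {
    "exec": {-552: "_LVOOpenLibrary", -414: "_LVOCloseLibrary", -198: "_LVOAllocMem"},
    "dos": {-30: "_LVOOpen", -36: "_LVOClose", -48: "_LVOWrite"},
}

# A6 loaded from a label named "<lib>base" (or IRA's ABSEXECBASE).
LOAD_A6 = re.compile(r"MOVEA?\.L\s+(?:ABS)?(\w+?)BASE\S*,A6", re.IGNORECASE)
DEST_A6 = re.compile(r",A6\s*(;|$)")                  # any other write to A6
CALL_A6 = re.compile(r"(JSR\s+)(-\d+)\(A6\)", re.IGNORECASE)
RETURN = re.compile(r"\b(RTS|RTE)\b", re.IGNORECASE)

def annotate(lines):
    """Replace JSR -n(A6) by JSR _LVOName(A6) while the library in A6 is known."""
    current = None                       # library A6 is believed to hold
    result = []
    for line in lines:
        load = LOAD_A6.search(line)
        if load:
            current = load.group(1).lower()
        elif DEST_A6.search(line):
            current = None               # A6 overwritten with something unknown
        call = CALL_A6.search(line)
        if call:
            name = LVO.get(current, {}).get(int(call.group(2)))
            if name:
                line = CALL_A6.sub(rf"\g<1>{name}(A6)", line)
            else:
                line += "  ; unknown"
        if RETURN.search(line):
            current = None               # forget the base at the end of a routine
        result.append(line)
    return result

SAMPLE = [
    "        MOVEA.L ABSEXECBASE.W,A6",
    "        JSR     -552(A6)",
    "        MOVEA.L dosbase(A4),A6",
    "        JSR     -48(A6)",
    "        RTS",
    "        JSR     -30(A6)",
]
for line in annotate(SAMPLE):
    print(line)
```

The last call in the sample stays untouched and is marked as unknown, because the base was forgotten at the RTS. That mirrors the "at worst, the offsets remain" behaviour described above.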

I recently added some "formal execution" that follows library base into registers, until it is set to A6.

I'll try to publish the github repository later. I don't want to create a repository only for this tool.

Note: CFOU! used it to reverse-engineer EOB / EOB2 to create AGA versions. I also use it a lot to clarify what the hell a DOS game is doing when creating a whdload slave.

BTW one thing I hate is when IRA fails to disassemble an executable. First, it doesn't support overlaid executables, not even for the non-overlaid parts (d68k does, but the output sucks). Then sometimes, on some executables, it just creates a bigger and bigger file. When you stop it, you get correct output at first, then a never-ending lot of junk (dc.b). I usually edit it out. Anyway, I'm generally not using IRA to be able to reassemble the code (although that works well). I use it to understand what the code does, and to apply patches at given offsets (that's why the -a option is a must-have for me).

I don't want to hijack this thread. I'll try the configuration tricks. I always used IRA in single pass.

Last edited by jotd; 24 March 2020 at 21:00.

24 March 2020, 20:59   #11
StingRay
Quote:
Originally Posted by jotd
The base is reset when encountering RTS, so there's little chance that there's a mistake.

Famous last words.

24 March 2020, 21:14   #12
jotd
lol. I think I have covered RTE as well, but if someone does a BRA to a routine just after the code and then changes A6, the next routine could get the wrong lib offsets

- if there is no RTS/RTE in between (which means spaghetti code with another BRA)
- if A6 isn't reloaded by the routine

In real life, I may have encountered such a situation a few times. And I have used the tool on hundreds of executables. Add to that the fact that a lot of system-friendly games are coded in C, and C compilers always reload the A6 base before calling the system. The odds aren't against me at least.

24 March 2020, 22:49   #13
phx
Quote:
Originally Posted by kamelito
Is there a way to tell IRA to recognise calls to the OS by their offsets and replace all of them with LVO names?
The original, non-portable, IRA release (V1.05) included a post-processor called "irapost" which did that. I never used it and it was left behind. It's probably better to rewrite it from scratch, like jotd did.

Quote:
Same for hardware registers: are they recognised and replaced with EQUs?
Yes. IRA has symbols for all custom-chip registers, CIA registers and exception vectors.

25 March 2020, 13:15   #14
phx
When you are done with reassembly and start to analyse the source, the SYMBOL directive might become useful. You can replace any label by a symbol name of your choice, by specifying a name and the program offset.

A good example for this tutorial might be Great Giana Sisters, which I reassembled and analysed 10 years ago for fun. It also shows how to deal with a raw binary program, starting at $1000 with the program entry at $102c. In the config file you see that I have lots of PTRS directives to define program addresses (which are otherwise not known to the reassembler) and all the symbols which I added during analysis of the code.

The ripped binary (from a cracked disk): https://server.owl.de/~frank/download/gianacode.1000
The config file: https://server.owl.de/~frank/download/gianacode.cnf
The reassembled source: https://server.owl.de/~frank/download/gianacode.asm

26 March 2020, 00:58   #15
Asle
Apologies if this sounds stupid or something, but what exactly is IRA? I mean, it's the title of the thread, but nobody explains or links to whatever it is.
Again, sorry if I missed the obvious.

26 March 2020, 01:33   #16
phx
http://aminet.net/package/dev/asm/ira

27 March 2020, 17:56   #17
kamelito
I think I get it, and I am starting to like it because of the Windows version.
One feature could be the ability to set the indentation used with -a, so that all offsets are aligned and the output is more readable.

Is there a way to change the data size of dc.x (.b, .w, .l)?
Is it also possible to switch the values between binary, hex and decimal?

Say A5 points to $DFF000, then for something like move.w D6,98(A5), is there a way for IRA to put in the Amiga chipset EQU instead?

vasm does not seem to like labels with a leading ".", like jmp .label. Update: the labels were .begin, .divs, .mulu and .modu. Maybe those words are reserved, or is it something else? Getting rid of the "." fixed the issues.

Last edited by kamelito; 28 March 2020 at 16:14.

27 March 2020, 21:22   #18
redblade
Thank you phx for this. Now I can patch MEmacs to run with a black background and white/green text, so that it is a lot easier on the eyes.

27 March 2020, 21:24   #19
redblade
Quote:
Originally Posted by phx
A good example for this tutorial might be Great Giana Sisters, which I reassembled and analysed 10 years ago for fun. It also shows how to deal with a raw binary program, starting at $1000 with the program entry at $102c. In the config file you see that I have lots of PTRS directives to define program addresses (which are otherwise not known to the reassembler) and all the symbols which I added during analysis of the code.

The ripped binary (from a cracked disk): https://server.owl.de/~frank/download/gianacode.1000
The config file: https://server.owl.de/~frank/download/gianacode.cnf
The reassembled source: https://server.owl.de/~frank/download/gianacode.asm
Were you the person who did Syndicate? I remember ending up on a Syndicate source code on some site.

28 March 2020, 17:54   #20
jotd
just a few python tools based on IRA disassembly output on my github:

https://github.com/jotd666/amiga68ktools

tools/cheapres.py is the one which can put LVO names on IRA resourced code.

Note that my approach was to first disassemble, then apply my tool to the disassembled text, so it may appear to conflict with the config file approach phx presented here. But if you run cheapres.py after each IRA pass, that is going to work too.

There's also a 68k code checker, which I advertised several years ago and which made it possible to find hard-to-find CPU-dependent loops in some games, and also self-modifying code.