adda / suba Vs. lea - Page 3

Bruce Abbott · 30 September 2022, 03:51

Quote:

Originally Posted by meynaf

Then you don't need to encode the instructions manually. Just use opt 0 to turn all optimizations off, either at the command line or in the source itself.

I always turn optimization off because I don't want the assembler doing stuff behind my back. This is particularly important for cases like this where exact encoding is required.

Sometimes the only way to get exactly what you want is to write it in hex or create a macro for the instruction. For example the Barfly assembler 'optimizes' cmp to cmpi when the source is immediate, even with all optimizations turned off! Some compilers write word values into byte operands, which results in the (unused) upper byte being set to $ff when the signed byte value is negative. This becomes a problem if you are trying to create an identical executable from a disassembly, as the assembler may not do the same.
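
To make the byte-immediate case concrete, here is a minimal Python sketch. It assumes the standard 68000 encoding of move.b #imm,d0 (opcode word $103C followed by one extension word); the helper name encode_move_b_imm_d0 is made up for illustration and does not come from any assembler. It shows two encodings of move.b #-1,d0 that differ only in the unused upper byte:

```python
import struct

MOVE_B_IMM_D0 = 0x103C  # opcode word for move.b #imm,d0

def encode_move_b_imm_d0(value, sign_extend_upper):
    """Encode move.b #value,d0 as two big-endian words.

    sign_extend_upper=True mimics a tool that writes the immediate as a
    sign-extended word; False writes a zero upper byte instead.
    """
    if not -128 <= value <= 255:
        raise ValueError("byte immediate out of range")
    low = value & 0xFF
    if sign_extend_upper and value < 0:
        ext = 0xFF00 | low
    else:
        ext = low
    return struct.pack(">HH", MOVE_B_IMM_D0, ext)

zero_upper = encode_move_b_imm_d0(-1, sign_extend_upper=False)
ff_upper = encode_move_b_imm_d0(-1, sign_extend_upper=True)
print(zero_upper.hex())  # 103c00ff
print(ff_upper.hex())    # 103cffff
```

Both forms load the same byte into d0, but a byte-for-byte comparison of the two executables reports a difference at that extension word.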

meynaf · 30 September 2022, 08:28

Quote:

Originally Posted by Bruce Abbott

I always turn optimization off because I don't want the assembler doing stuff behind my back. This is particularly important for cases like this where exact encoding is required.

Sometimes the only way to get exactly what you want is to write it in hex or create a macro for the instruction. For example the Barfly assembler 'optimizes' cmp to cmpi when the source is immediate, even with all optimizations turned off! Some compilers write word values into byte operands, which results in the (unused) upper byte being set to $ff when the signed byte value is negative. This becomes a problem if you are trying to create an identical executable from a disassembly, as the assembler may not do the same.

Exact encoding is rarely needed. You first want a program that works, don't you ?
Assemblers aren't compilers, they don't do important stuff behind your back. If they do, then they are broken.

Strictly identical exe is sometimes just impossible anyway, due to different ordering in reloc tables.

Turning optimization on has the advantage of choosing best options for branch sizes. Doing that by hand is a pita and sometimes close to impossible when using macros.

Resourcing code isn't that much different. First, reassemble with no opts. Then, compare with original exe and find out what the differences are. With PhxAss very few things can happen. So yeah, that $ff in negative byte immediates has disappeared. It's not a big deal, really. It won't prevent your program from working (and if it does, you know there is a checksum !).

Then simply turning optimizations on can earn many kilobytes and your executable is already better than the original one !

Photon · 30 September 2022, 23:39

As a reflection on the original discussion, you can know a few things that make instructions slower, without having to look up the cycle count for a specific CPU:

Immediate values that don't use the "q" suffix increase instruction length, and so execution time. E.g. move.w #imm,d0.
The same can be true for ea offsets, but not often. E.g. move.w d8(An,Rn),d0.
Modifying an address register is always longword-wide, which makes word-size instructions slower than their data register versions, except on the higher Motorola CPUs.
Extra instruction words mean extra memory accesses, some of which may be blocked by the higher-priority parts of the chipset, sometimes for a short time, sometimes for a very long time. This doesn't happen if the instruction words are already in the instruction cache/prefetch buffer.
Instructions performing r/w to RAM can similarly be blocked, but not if the data is already in the data cache. Normally, Chip RAM is not cached.
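
The instruction-length points above can be checked by counting words. A minimal Python sketch, using hand-assembled 68000 encodings chosen purely for illustration (the table and the size_in_bytes helper are not from any tool). Cycle counts are left out on purpose, since they depend on the CPU:

```python
# Hand-assembled 68000 encodings (big-endian words), for illustration only.
ENCODINGS = {
    "moveq #1,d0": [0x7001],
    "move.w #1,d0": [0x303C, 0x0001],
    "move.l #1,d0": [0x203C, 0x0000, 0x0001],
    "move.w (a0),d0": [0x3010],
    "move.w 4(a0),d0": [0x3028, 0x0004],
    "move.w 4(a0,d1.w),d0": [0x3030, 0x1004],
}

def size_in_bytes(words):
    """Each instruction word is 16 bits, so size is two bytes per word."""
    return 2 * len(words)

for text, words in ENCODINGS.items():
    hexwords = " ".join(f"{w:04x}" for w in words)
    print(f"{text:24} {size_in_bytes(words)} bytes  {hexwords}")
```

Every extra word in the right-hand column is one more fetch that can be held up by the chipset when the code runs from Chip RAM without a cache.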

Bruce Abbott · 30 September 2022, 23:57

Quote:

Originally Posted by meynaf

Exact encoding is rarely needed. You first want a program that works, don't you ?

Except in cases like this, which are not exactly rare here. As for in general, I'm just saying what I do - not suggesting it's the 'best' way for everyone.

Quote:

Assemblers aren't compilers, they don't do important stuff behind your back. If they do, then they are broken.

So Barfly assembler is broken. And VASM too, because it trims zeros off the end of your code without being told to (and sometimes gets it wrong). Actually most assemblers are broken in one way or another. I use ProAsm, which has several bugs that I had to patch to get correct code generation (worst one was setting the wrong data register in certain 68020 instructions).

Quote:

Strictly identical exe is sometimes just impossible anyway, due to different ordering in reloc tables.

I don't worry about reloc tables because the order doesn't matter. To compare reassembled code to the original I wrote a program that relocates them before comparison. When my disassembler is working properly the results are identical, which is how I check it for accuracy.
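
The relocate-then-compare idea can be sketched roughly as follows. This is not the actual program, only a minimal Python illustration under simplifying assumptions: a single hunk, 32-bit big-endian relocations, and an arbitrary fake base address. The names relocate and same_after_relocation are hypothetical.

```python
import struct

def relocate(code, reloc_offsets, base=0x10000):
    """Return a copy of code with `base` added to each 32-bit big-endian
    longword found at the given offsets (single-hunk simplification)."""
    out = bytearray(code)
    for off in reloc_offsets:
        (value,) = struct.unpack_from(">I", out, off)
        struct.pack_into(">I", out, off, (value + base) & 0xFFFFFFFF)
    return bytes(out)

def same_after_relocation(code_a, relocs_a, code_b, relocs_b):
    """Compare two hunks after applying their own relocation lists."""
    return relocate(code_a, relocs_a) == relocate(code_b, relocs_b)

# jsr $0000000c / jmp $0000000c / rts, with relocated longwords at 2 and 8.
code = bytes.fromhex("4eb90000000c4ef90000000c4e75")

print(same_after_relocation(code, [2, 8], code, [8, 2]))  # True
print(same_after_relocation(code, [2, 8], code, [2]))     # False
```

Because each relocation is applied independently, the order of the table entries has no effect on the result, while a missing or extra entry still shows up as a difference.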

Quote:

Turning optimization on has the advantage of choosing best options for branch sizes. Doing that by hand is a pita and sometimes close to impossible when using macros.

You are right. Actually asm is a pain in general. But I like to know what size my branches are, as trying to keep them short makes my code tighter. Also the assembler takes longer when it has to modify branch sizes, and sometimes you need code to be a certain size (eg. branch tables).

Of course I always have the option of getting the assembler to do optimization if I am too lazy (rarely) or to make the final release code tighter.

Quote:

Resourcing code isn't that much different.

Resourcing code accurately enough to handle different instruction sizes is not that easy. It's not unusual to find PC relative offsets in data words, which are not detectable without a close examination of the code that uses them. If the code size changes they may point to the wrong place.

Quote:

First, reassemble with no opts. Then, compare with original exe and find out what the differences are.

This also isn't that easy if the code size changes. Everything after the change is offset by some amount and a straight binary comparison fails from there on.

Quote:

With PhxAss very few things can happen. So yeah, that $ff in negative byte immediates has disappeared. It's not a big deal, really. It won't prevent your program from working (and if it does, you know there is a checksum !).

True provided that the disassembly correctly identified code and data. If it didn't you could end up with 'code' that assembles into incorrect data. While you shouldn't let that happen, carefully inspecting every line for accuracy can be quite time-consuming. Often I just want a 'quick and dirty' disassembly so I can generate labels for debugging.

Quote:

Then simply turning optimizations on can earn many kilobytes and your executable is already better than the original one !

True, though my code doesn't generally have many branches that can be optimized. I find the best optimization is to review the code and ask "is this really the best way to do it?".

meynaf · 01 October 2022, 09:16

Quote:

Originally Posted by Bruce Abbott

Except in cases like this, which are not exactly rare here. As for in general, I'm just saying what I do - not suggesting it's the 'best' way for everyone.

True. We have different use cases.

Quote:

Originally Posted by Bruce Abbott

So Barfly assembler is broken. And VASM too, because it trims zeros off the end of your code without being told to (and sometimes gets it wrong). Actually most assemblers are broken in one way or another. I use ProAsm, which has several bugs that I had to patch to get correct code generation (worst one was setting the wrong data register in certain 68020 instructions).

I think VASM has an option to turn that "trim zeroes" off. Something to build kick 1.x compatible exes.

Quote:

Originally Posted by Bruce Abbott

I don't worry about reloc tables because the order doesn't matter. To compare reassembled code to the original I wrote a program that relocates them before comparison. When my disassembler is working properly the results are identical, which is how I check it for accuracy.

Great idea. Too bad it would be more complicated for me, since I'm far from always disassembling Amiga code.

Quote:

Originally Posted by Bruce Abbott

You are right. Actually asm is a pain in general. But I like to know what size my branches are, as trying to keep them short makes my code tighter. Also the assembler takes longer when it has to modify branch sizes, and sometimes you need code to be a certain size (eg. branch tables).

I cannot always know what size my branches are, as I'm often using macros and conditional assembly (a common situation in an include file whose features only get assembled if they are used).
For branch tables you can turn optimizations off and on in the source itself, local to some part of the code (at least PhxAss can).
I find turning opts off useful only for the initial comparison when resourcing and for faster assembly of large programs, the latter being much less important with uae-jit.

Quote:

Originally Posted by Bruce Abbott

Of course I always have the option of getting the assembler to do optimization if I am too lazy (rarely) or to make the final release code tighter.

It depends on the size. I don't think you'd like to hand-optimize every branch in a 1 MB resourced executable...

Quote:

Originally Posted by Bruce Abbott

Resourcing code accurately enough to handle different instruction sizes is not that easy. It's not unusual to find PC relative offsets in data words, which are not detectable without a close examination of the code that uses them. If the code size changes they may point to the wrong place.

Right, but I've never seen the code size change with PhxAss opt 0.

Quote:

Originally Posted by Bruce Abbott

This also isn't that easy if the code size changes. Everything after the change is offset by some amount and a straight binary comparison fails from there on.

I know, but it has never happened to me with optimizations turned off (or when it did, it was a sign I'd missed something important).

Quote:

Originally Posted by Bruce Abbott

True provided that the disassembly correctly identified code and data. If it didn't you could end up with 'code' that assembles into incorrect data. While you shouldn't let that happen, carefully inspecting every line for accuracy can be quite time-consuming. Often I just want a 'quick and dirty' disassembly so I can generate labels for debugging.

That depends on what you want to do with the reassembled code.
Usually it's for making big alterations, so you will have to check the code line by line. Actually I do all the code/data separation by hand.
If it's just to generate labels, I fail to see the usefulness: these labels are for the most part number-based and are meaningless.

Quote:

Originally Posted by Bruce Abbott

True, though my code doesn't generally have many branches that can be optimized. I find the best optimization is to review the code and ask "is this really the best way to do it?".

Well, branches aren't the only thing that gets optimized. If, say, you use a structure and access its first member via name(An) (which is supposed to be recommended), the assembler will emit 0(An) instead of (An) with optimizations off. Other zero constants may appear as well.
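
A minimal Python sketch of that zero-displacement case, assuming the standard 68000 encodings of move.w d16(a0),d0 ($3028 plus a displacement word) and move.w (a0),d0 ($3010). The function name and the node_next offset are hypothetical, and the optimize_zero flag only stands in for whatever option a given assembler provides:

```python
import struct

def encode_move_w_a0_to_d0(disp, optimize_zero):
    """Encode move.w disp(a0),d0 as big-endian bytes.

    With optimize_zero=True a zero displacement is emitted as the
    shorter move.w (a0),d0 form.
    """
    if not -0x8000 <= disp <= 0x7FFF:
        raise ValueError("displacement does not fit in 16 bits")
    if disp == 0 and optimize_zero:
        return struct.pack(">H", 0x3010)      # move.w (a0),d0
    return struct.pack(">Hh", 0x3028, disp)   # move.w d16(a0),d0

node_next = 0  # hypothetical structure offset of the first member

print(encode_move_w_a0_to_d0(node_next, optimize_zero=False).hex())  # 30280000
print(encode_move_w_a0_to_d0(node_next, optimize_zero=True).hex())   # 3010
```

The unoptimized form costs one extra word for every access to the first member, and that adds up across a large source.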

Karlos · 01 October 2022, 14:20

Some really useful info guys. Manuals are one thing, real world examples sometimes another. Apologies for the nerd-snipe.
