18 March 2023, 05:45 | #1 |
Registered User
Join Date: May 2013
Location: Grimstad / Norway
Posts: 839
|
Do assembler-language optimizing 'compilers' exist? (Whole-program optimizer)
Background: There have been a few programs that are said to be "line-by-line" CPU-to-CPU translations. I think "Head Over Heels" (or some other isometric game) on the Amiga supposedly did this. There also exist CPU-translation assemblers that you can feed (say) Z80 assembler code and they will spit out a 68000 code translation.
I have myself spent a lot of time with the originally ST-based Xformer 6502 emulator, and I could envision taking its inner workings and using them to turn one 6502 opcode into a set of 68000 opcodes that replicates it. All the assemblers I have seen and heard about will, at best, only look at a single opcode or a pair of opcodes and replace them with more time/space-optimal forms (apart from branch distances, but I'd classify that as single in this context).

I am wondering if there are any assemblers, for any architecture, that treat the input source as a "generic" language and do "dead code" optimization like your typical advanced C compiler. In the context of using Xformer as a translation model, I know there is a lot of 6502 flag special-casing that generates a lot of instructions, very few of which will actually be used by a later instruction. If the assembler worked as an optimizing compiler, it would detect that (say, inside a code block) a write to D4 (this could be the 6502 carry flag, for example) is followed by miscellaneous code with 3 further writes to D4 before a read (or 68000 flag-generation dependency) happens, and thus all but the last write can be safely removed.

It would be interesting if there were such an assembler, both to possibly get even better optimized code, and as a debugging tool, since it could be used to show code that can't possibly be reached or that is redundant/not being utilized. |
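No such assembler is named in this thread, but the dead-write pass described above can be sketched in a few lines. This is a toy illustration, not real Xformer output: the 68k mnemonics, the register sets, and the use of D4 as the emulated carry flag are all invented for the example, and the pass tracks only one register within a single straight-line block (a real pass would need liveness across branches and the 68000 CCR too).

```python
# Toy dead-store elimination for one register in a straight-line block.
# Each instruction is (mnemonic, registers-written, registers-read).

def eliminate_dead_writes(code, reg):
    """Remove writes to `reg` that are overwritten before any read.

    The final write is conservatively kept, since something after the
    block might read it.
    """
    keep = []
    pending_write = None          # index in `keep` of the last unread write
    for op, writes, reads in code:
        if reg in reads:
            pending_write = None  # the last write is used after all
        if reg in writes:
            if pending_write is not None:
                keep[pending_write] = None   # previous write was dead
            keep.append((op, writes, reads))
            pending_write = len(keep) - 1
        else:
            keep.append((op, writes, reads))
    return [ins for ins in keep if ins is not None]

# Four writes to d4 (the emulated carry), only the last one read:
block = [
    ("moveq #0,d4", {"d4"}, set()),
    ("add.l d1,d0", {"d0"}, {"d0", "d1"}),
    ("scs d4",      {"d4"}, set()),
    ("lsr.l #1,d2", {"d2"}, {"d2"}),
    ("scs d4",      {"d4"}, set()),
    ("moveq #1,d4", {"d4"}, set()),
    ("tst.b d4",    set(),  {"d4"}),
]
optimized = eliminate_dead_writes(block, "d4")
# The three earlier writes to d4 are removed; only 'moveq #1,d4' survives.
```

The hard part in practice is not this loop but computing the read/write sets honestly, including condition-code effects and memory aliasing, across all control-flow paths.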
18 March 2023, 08:18 | #2 |
Registered User
Join Date: Apr 2017
Location: France
Posts: 567
|
Good question. Don't know if it exists, but training an AI on this would do wonders, I believe. Code as input, speed and result as output.
|
18 March 2023, 09:51 | #3 | |
Registered User
Join Date: Jan 2019
Location: Germany
Posts: 3,216
|
Quote:
The problem is that the very first step would be creating an abstract representation of the input program, and assembler is a pretty bad language for expressing an algorithm - which is also why it is not a very popular language for complex problems. It is (relatively) easy to translate a C program to 68K or to x64, but translating 68K to x64 or vice versa is a much harder problem. For example, if a function is called on the 68K, is it necessary to emulate the condition codes and registers d1/a0-a1 from the result, or could these just be omitted because they are "by convention" scratch registers? Without seeing all possible calls to that function, it is impossible to decide.

Side remark: This is pretty much the reason why porting ARexx to AOS4 was not really done. I attempted it, but ARexx is an assembler program and some functions return multiple results in multiple registers. It is a pile of assembler lines, but not a very structured code base.

Thus, to answer the OP's question: No, I am not aware of any such program, and it would be hard to write one. There are a couple of peep-hole optimizers, but that is several levels below the complexity of the problem. |
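The scratch-register question above can be made concrete with a small sketch. The call sites, mnemonics and register sets here are invented for illustration; the point is only that the answer depends on the union of all callers, so no local analysis can decide it.

```python
# Is the callee's value in `reg` dead after the call, or does some
# caller consume it? Each call site is the list of instructions that
# follow the jsr, as (mnemonic, registers-written, registers-read).

def reg_live_after_call(call_sites, reg):
    """True if any caller reads `reg` after the call before writing it,
    i.e. the value the callee leaves in `reg` cannot be discarded."""
    for after_call in call_sites:
        for op, writes, reads in after_call:
            if reg in reads:
                return True      # this caller consumes the register
            if reg in writes:
                break            # overwritten first: dead at this site
    return False

site_a = [("move.l d0,(a2)+", {"a2"}, {"d0", "a2"}),
          ("moveq #0,d1",     {"d1"}, set())]        # d1 clobbered
site_b = [("add.l d1,d0",     {"d0"}, {"d0", "d1"})]  # d1 read!

# With only site_a visible, d1 looks like a disposable scratch register;
# site_b proves the function actually returns a second result in d1.
```

Miss one call site (a jump table, a patched vector, self-modifying code) and the "optimization" silently corrupts a result - which is exactly the ARexx situation described above.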
|
18 March 2023, 10:53 | #4 | |
Registered User
Join Date: Apr 2017
Location: France
Posts: 567
|
Quote:
I think an AI would be able to easily detect patterns that can be optimized. An AI is "just" statistics. But I agree that the training would perhaps not be simple to set up. The main rule would be to produce the exact same output as the original. On a complex system like the Amiga with the custom chips, so with more possibilities, I'm sure it would give incredible results.

The real limitation I see, however, is the slowness of the 68000. It's perhaps impossible to gather enough stats in a reasonable span of time on real hardware. Perhaps with a very good and accurate 68000 emulator running at x100, so that all response times would be divided by the exact same factor, it would be OK for pure 68000 parts.

Last edited by TEG; 18 March 2023 at 11:03. |
|
18 March 2023, 11:07 | #5 |
Amiga user
Join Date: Nov 2008
Location: Sofia / Bulgaria
Posts: 456
|
Not sure if it is related to your question, but Electrostatic - http://aminet.net/package/misc/emu/Electrostatic
translates whole Atari 2600 6502-based programs into Amiga programs. With Electrostatic, I was able to convert the game Vault Assault - http://aminet.net/game/misc/VaultAssault.lha - as well as 100 demos - http://aminet.net/demo/euro/Atari2600Demos.lha |
18 March 2023, 11:21 | #6 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,128
|
The problem with this kind of optimisation is that sloppy assembler is harder to diagnose than sloppy C, for example. What may at first appear to be some odd mistake may actually be some clever arrangement of instructions to avoid or leverage some side effect.
It's also much harder to perform trivial optimisations like common subexpression elimination when you are dealing with actual registers (rather than the abstract ones you get in many RTL intermediates, which are used before the compiler performs actual register allocation for the target CPU).

Years ago there was a genetic-algorithm approach to producing "expert" optimisations at a small function scale for Intel processors. It was able to produce some highly performant code for relatively simple input requirements. Applying it at scale to whole programs is infeasible, of course. Applying it one function at a time to critical inner loops may work, but it's equally likely to fail, since the optimisation to function A may interfere with the optimisation to function B when they are called in close proximity to each other, for all kinds of reasons (impacts on cache, branch prediction, etc.). |
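The search-based idea mentioned above (often called superoptimization) can be sketched with an exhaustive search in place of the genetic algorithm. The three-operation machine below is invented purely for illustration - real superoptimizers search real ISA subsets and verify candidates far more rigorously than a handful of test inputs:

```python
# Toy superoptimizer: enumerate short instruction sequences and keep
# the first one whose behaviour matches the target on all test inputs.
from itertools import product

OPS = {
    "double": lambda x: x * 2,
    "inc":    lambda x: x + 1,
    "neg":    lambda x: -x,
}

def run(seq, x):
    """Execute an op sequence left to right on a single value."""
    for op in seq:
        x = OPS[op](x)
    return x

def superoptimize(target, tests, max_len=4):
    """Shortest op sequence agreeing with `target` on all `tests`."""
    for length in range(1, max_len + 1):
        for seq in product(OPS, repeat=length):
            if all(run(seq, x) == target(x) for x in tests):
                return list(seq)
    return None

# Cheapest way to compute 2x + 2 on this machine: inc then double,
# i.e. 2 * (x + 1) - found by search, not by pattern matching.
best = superoptimize(lambda x: 2 * x + 2, tests=range(-5, 6))
```

The combinatorial explosion is the point of the post above: the search space grows exponentially with sequence length, which is why this only ever worked at the scale of tiny functions, and why matching on a finite test set (rather than proving equivalence) can still let wrong candidates through.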
18 March 2023, 13:36 | #7 | |
Registered User
Join Date: Jan 2021
Location: Germany
Posts: 18
|
Quote:
|
|
18 March 2023, 16:44 | #8 |
Registered User
Join Date: Jul 2014
Location: Finland
Posts: 1,176
|
To invoke DEC again....
The DEC VMS operating system's development toolchain includes this. The original version of VMS ran on a machine called the VAX, a 32-bit memory-to-memory CISC machine, which was later superseded by the Alpha, a 64-bit register-to-register RISC machine. The VMS assembler is called MACRO; when they ported to Alpha, they kept it and released it as MACRO32, a compiler generating Alpha code (with limitations such as user-mode-only code and no 64-bit pointers). In addition you got MACRO64 for writing /real/ Alpha code.

To what extent MACRO32 does optimization I'm not sure, as I never wrote code in it, but from what I understand it uses the same DEC compiler backend as the other languages (C/C++, Pascal, Fortran, etc.). |
18 March 2023, 17:03 | #9 |
Registered User
Join Date: Jan 2019
Location: Germany
Posts: 3,216
|
I fully understand, but look: AI is a sophisticated pattern matcher using non-linear networks. It is not a precise algorithm that could ensure correct results, and that is exactly the problem here. The network cannot analyze code, call graphs, register allocation or code flow. It just looks at patterns of instructions and replaces them with other patterns. This approach works nicely for natural language, since humans are error-tolerant, but computer algorithms are not. Either a register allocation is correct, or it is not. A network cannot ensure mathematically correct results; it can only ensure "probably correct results according to many rounds of training". AI with neural networks works great if the output is allowed to have a certain level of fuzziness, but it is not great if the output needs to be 100% correct.
|
18 March 2023, 17:18 | #10 | |
Registered User
Join Date: Apr 2017
Location: France
Posts: 567
|
Quote:
I notice you say "it is not great" not "it is impossible" |
|
18 March 2023, 18:41 | #11 |
Registered User
Join Date: Jan 2019
Location: Germany
Posts: 3,216
|
Let's make an analogy to make my point a bit clearer. Suppose we had an experienced engineer who would hand-optimize the assembler output using full-program optimization. Would you believe that, no matter how experienced, such an engineer operates error-free and generates code that is 100% reliable?
Well, I'm sure I wouldn't. My brain is a much more sophisticated network than anything we can model today, and yet it (I) make mistakes, sometimes mistakes that are hard to track down and hard to resolve. A network would perform similarly (or actually, even worse). It would generate, in the best case, an output you cannot trust - in the same sense that I generate assembler output you (or at least I) do not trust. It requires a long and, again, sophisticated debugging session, which requires understanding how a program is typically used. Even then, the rule applies that there are only two types of programs: those that are outdated, and those that are buggy. |
18 March 2023, 19:04 | #12 |
Ex nihilo nihil
Join Date: Oct 2017
Location: CH
Posts: 4,856
|
^
Also, to put differently what has been said in some of the previous posts: 2-pass assembling (low-level language) leaves less room for optimisation than 7-pass compiling (high-level language)...
18 March 2023, 19:32 | #13 |
Registered User
Join Date: Apr 2017
Location: France
Posts: 567
|
@Thomas
Clearly we do not agree on this point, especially on the training stage of an AI, how it can be configured and the result it has to output. But I'm not going to argue any further, because the thread would be completely hijacked.
18 March 2023, 20:37 | #14 | |
Registered User
Join Date: May 2013
Location: Grimstad / Norway
Posts: 839
|
Quote:
If you are talking about inferring intention from the code and then rewriting it more optimally, then that is beyond the scope of what I was looking for. I was only thinking about dead code (of which an interpreting emulator inherently will produce a lot - and as such I dreaded to see what a cross-CPU assembler would produce). (Part of this originated from me pondering more automatic translation of ROM-based coin-op games to the Amiga, without wanting to drag along too much "support" code.) |
|