18 March 2023, 05:45 | #1 |
Registered User
Join Date: May 2013
Location: Grimstad / Norway
Posts: 839
|
Do assembler-language optimizing 'compilers' exist? (Whole-program optimizer)
Background: There have been a few programs that are said to be "line-by-line" CPU-to-CPU translations. I think "Head Over Heels" (or some other isometric game) on the Amiga supposedly did this. There also exist CPU-translation assemblers that you can feed (say) Z80 assembler code and they will spit out a 68000 code translation.
I have myself spent a lot of time with the originally ST-based Xformer 6502 emulator, and I could envision taking its inner workings and using them to turn one 6502 opcode into a set of 68000 opcodes that replicates it. All the assemblers I have seen and heard about will, at best, only look at a single opcode or a pair of opcodes and replace them with more time/space-optimal forms (apart from branch distances, but I'd classify that as single in this context).

I am wondering if there are any assemblers, for any architecture, that treat the input source as a "generic" language and do "dead code" optimization like your typical advanced C compiler. In the context of using Xformer as a translation model, I know there is a lot of 6502 flag special-casing that generates a lot of instructions, very few of which will actually be used by a later instruction. If the assembler worked as an optimizing compiler, it would detect that (say, inside a code block) a write to D4 (this could be the 6502 carry flag, for example) is followed by miscellaneous code with 3 further writes to D4 before a read (or 68000 flag-generation dependency) happens, and thus all but the last write can be safely removed.

It would be interesting if there were such an assembler, both to possibly get even better optimized code, and as a debugging tool, since it could be used to show code that can't possibly be reached or that is redundant/not being utilized. |
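No such assembler is named in this thread, but the dead-write pass described above can be sketched in a few lines. This is a toy illustration, not real Xformer output: the 68k mnemonics, the register sets, and the use of D4 as the emulated carry flag are all invented for the example, and the pass tracks only one register within a single straight-line block (a real pass would need liveness across branches and the 68000 CCR too).

```python
# Toy dead-store elimination for one register in a straight-line block.
# Each instruction is (mnemonic, registers-written, registers-read).

def eliminate_dead_writes(code, reg):
    """Remove writes to `reg` that are overwritten before any read.

    The final write is conservatively kept, since something after the
    block might read it.
    """
    keep = []
    pending_write = None          # index in `keep` of the last unread write
    for op, writes, reads in code:
        if reg in reads:
            pending_write = None  # the last write is used after all
        if reg in writes:
            if pending_write is not None:
                keep[pending_write] = None   # previous write was dead
            keep.append((op, writes, reads))
            pending_write = len(keep) - 1
        else:
            keep.append((op, writes, reads))
    return [ins for ins in keep if ins is not None]

# Four writes to d4 (the emulated carry), only the last one read:
block = [
    ("moveq #0,d4", {"d4"}, set()),
    ("add.l d1,d0", {"d0"}, {"d0", "d1"}),
    ("scs d4",      {"d4"}, set()),
    ("lsr.l #1,d2", {"d2"}, {"d2"}),
    ("scs d4",      {"d4"}, set()),
    ("moveq #1,d4", {"d4"}, set()),
    ("tst.b d4",    set(),  {"d4"}),
]
optimized = eliminate_dead_writes(block, "d4")
# The three earlier writes to d4 are removed; only 'moveq #1,d4' survives.
```

The hard part in practice is not this loop but computing the read/write sets honestly, including condition-code effects and memory aliasing, across all control-flow paths.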
18 March 2023, 08:18 | #2 |
Registered User
Join Date: Apr 2017
Location: France
Posts: 567
|
Good question. Don't know if it exists, but training an AI on this would do wonders, I believe. Code as input, speed and result as output.
|
18 March 2023, 09:51 | #3 | |
Registered User
Join Date: Jan 2019
Location: Germany
Posts: 3,216
|
Quote:
The problem is that the very first step would be creating an abstract representation of the input program, and assembler is a pretty bad language for expressing an algorithm - which is also why it is not a very popular language for complex problems. It is (relatively) easy to translate a C program to 68K or to x64, but translating 68K to x64 or vice versa is a much harder problem. For example, if a function is called on the 68K, is it necessary to emulate the condition codes and registers d1/a0-a1 from the result, or could these just be omitted because they are "by convention" scratch registers? Without seeing all possible calls to that function, it is impossible to decide.

Side remark: This is pretty much the reason why porting ARexx to AOS4 was not really done. I attempted it, but ARexx is an assembler program and some functions return multiple results in multiple registers. It is a pile of assembler lines, but not a very structured code base.

Thus, to answer the OP's question: No, I am not aware of any such program, and it would be hard to write one. There are a couple of peep-hole optimizers, but that is several levels below the complexity of the problem. |
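The scratch-register question above can be made concrete with a small sketch. The call sites, mnemonics and register sets here are invented for illustration; the point is only that the answer depends on the union of all callers, so no local analysis can decide it.

```python
# Is the callee's value in `reg` dead after the call, or does some
# caller consume it? Each call site is the list of instructions that
# follow the jsr, as (mnemonic, registers-written, registers-read).

def reg_live_after_call(call_sites, reg):
    """True if any caller reads `reg` after the call before writing it,
    i.e. the value the callee leaves in `reg` cannot be discarded."""
    for after_call in call_sites:
        for op, writes, reads in after_call:
            if reg in reads:
                return True      # this caller consumes the register
            if reg in writes:
                break            # overwritten first: dead at this site
    return False

site_a = [("move.l d0,(a2)+", {"a2"}, {"d0", "a2"}),
          ("moveq #0,d1",     {"d1"}, set())]        # d1 clobbered
site_b = [("add.l d1,d0",     {"d0"}, {"d0", "d1"})]  # d1 read!

# With only site_a visible, d1 looks like a disposable scratch register;
# site_b proves the function actually returns a second result in d1.
```

Miss one call site (a jump table, a patched vector, self-modifying code) and the "optimization" silently corrupts a result - which is exactly the ARexx situation described above.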
|
18 March 2023, 10:53 | #4 | |
Registered User
Join Date: Apr 2017
Location: France
Posts: 567
|
Quote:
I think an AI would be able to easily detect patterns that can be optimized. An AI is "just" statistics. But I agree that the training would perhaps not be simple to set up. The main rule would be to produce the exact same output as the original. On a complex system like the Amiga with the custom chips, so with more possibilities, I'm sure it would give incredible results.

The real limitation I see, however, is the slowness of the 68000. It's perhaps impossible to gather enough stats in a reasonable span of time on real hardware. Perhaps with a very good and accurate 68000 emulator running at x100, so that all response times would be divided by the exact same factor, it would be OK for pure 68000 parts.

Last edited by TEG; 18 March 2023 at 11:03. |
|
18 March 2023, 11:07 | #5 |
Amiga user
Join Date: Nov 2008
Location: Sofia / Bulgaria
Posts: 456
|
Not sure if it is related to your question, but Electrostatic - http://aminet.net/package/misc/emu/Electrostatic
translates whole Atari 2600 6502-based programs into Amiga programs. With Electrostatic, I was able to convert the game Vault Assault - http://aminet.net/game/misc/VaultAssault.lha - as well as 100 demos - http://aminet.net/demo/euro/Atari2600Demos.lha |
18 March 2023, 11:21 | #6 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,128
|
The problem with this kind of optimisation is that sloppy assembler is harder to diagnose than sloppy C, for example. What may at first appear to be some odd mistake may actually be some clever arrangement of instructions to avoid or leverage some side effect.
It's also much harder to perform trivial optimisations like common subexpression elimination when you are dealing with actual registers (rather than the abstract ones you get in many RTL intermediates, which are used before the compiler performs actual register allocation for the target CPU).

Years ago there was a genetic-algorithm approach to producing "expert" optimisations at a small function scale for Intel processors. It was able to produce some highly performant code for relatively simple input requirements. Applying it at scale to whole programs is infeasible, of course. Applying it one function at a time to critical inner loops may work, but it's equally likely to fail, since the optimisation to function A may interfere with the optimisation to function B when they are called in close proximity to each other, for all kinds of reasons (impacts on cache, branch prediction, etc.). |
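The search-based idea mentioned above (often called superoptimization) can be sketched with an exhaustive search in place of the genetic algorithm. The three-operation machine below is invented purely for illustration - real superoptimizers search real ISA subsets and verify candidates far more rigorously than a handful of test inputs:

```python
# Toy superoptimizer: enumerate short instruction sequences and keep
# the first one whose behaviour matches the target on all test inputs.
from itertools import product

OPS = {
    "double": lambda x: x * 2,
    "inc":    lambda x: x + 1,
    "neg":    lambda x: -x,
}

def run(seq, x):
    """Execute an op sequence left to right on a single value."""
    for op in seq:
        x = OPS[op](x)
    return x

def superoptimize(target, tests, max_len=4):
    """Shortest op sequence agreeing with `target` on all `tests`."""
    for length in range(1, max_len + 1):
        for seq in product(OPS, repeat=length):
            if all(run(seq, x) == target(x) for x in tests):
                return list(seq)
    return None

# Cheapest way to compute 2x + 2 on this machine: inc then double,
# i.e. 2 * (x + 1) - found by search, not by pattern matching.
best = superoptimize(lambda x: 2 * x + 2, tests=range(-5, 6))
```

The combinatorial explosion is the point of the post above: the search space grows exponentially with sequence length, which is why this only ever worked at the scale of tiny functions, and why matching on a finite test set (rather than proving equivalence) can still let wrong candidates through.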
18 March 2023, 13:36 | #7 | |
Registered User
Join Date: Jan 2021
Location: Germany
Posts: 18
|
Quote:
|
|
18 March 2023, 16:44 | #8 |
Registered User
Join Date: Jul 2014
Location: Finland
Posts: 1,176
|
To invoke DEC again....
The DEC VMS operating system's development toolchain includes this. The original version of VMS ran on a machine called the VAX, a 32-bit memory-to-memory CISC machine, which was later superseded by the Alpha, a 64-bit register-to-register RISC machine. The VMS assembler is called MACRO; when they ported to Alpha, they kept it and released it as MACRO32, a compiler generating Alpha code (with limitations such as user-mode-only code and no 64-bit pointers). In addition you got MACRO64 for writing /real/ Alpha code.

To what extent MACRO32 does optimization I'm not sure, as I never wrote code in it, but from what I understand it uses the same DEC compiler backend as the other languages (C/C++, Pascal, Fortran, etc.). |
18 March 2023, 17:03 | #9 |
Registered User
Join Date: Jan 2019
Location: Germany
Posts: 3,216
|
I fully understand, but look: AI is a sophisticated pattern matcher using non-linear networks. It is not a precise algorithm that could ensure correct results, and that is exactly the problem here. The network cannot analyze code, call graphs, register allocation or code flow. It just looks at patterns of instructions and replaces them with other patterns. This approach works nicely for natural language, since humans are error-tolerant, but computer algorithms are not. Either a register allocation is correct, or it is not. A network cannot ensure mathematically correct results; it can only ensure "probably correct results according to many rounds of training". AI with neural networks works great if the output is allowed to have a certain level of fuzziness, but it is not great if the output needs to be 100% correct.
|
18 March 2023, 17:18 | #10 | |
Registered User
Join Date: Apr 2017
Location: France
Posts: 567
|
Quote:
I notice you say "it is not great" not "it is impossible" |
|
18 March 2023, 18:41 | #11 |
Registered User
Join Date: Jan 2019
Location: Germany
Posts: 3,216
|
Let's make an analogy to make my point a bit clearer. Suppose we had an experienced engineer who would hand-optimize the assembler output using full-program optimization. Would you believe that, no matter how experienced, such an engineer operates error-free and generates code that is 100% reliable?
Well, I'm sure I wouldn't. My brain is a much more sophisticated network than anything we can model today, and yet it (I) make mistakes, sometimes mistakes that are hard to track down and hard to resolve. A network would perform similarly (or actually, even worse). It would generate, in the best case, an output you cannot trust - in the same sense that I generate assembler output you (or at least I) do not trust. It requires a long and, again, sophisticated debugging session, which requires understanding how a program is typically used. Even then, the rule applies that there are only two types of programs: those that are outdated, and those that are buggy. |
18 March 2023, 19:04 | #12 |
Ex nihilo nihil
Join Date: Oct 2017
Location: CH
Posts: 4,856
|
^
Also, to put differently what has been said in some of the previous posts: 2-pass assembling (low-level language) leaves less room for optimisation than 7-pass compiling (high-level language)...
18 March 2023, 19:32 | #13 |
Registered User
Join Date: Apr 2017
Location: France
Posts: 567
|
@Thomas
Clearly we do not agree on this point, especially on the training stage of an AI, how it can be configured and the result it has to output. But I'm not going to argue any further, because the thread would be completely hijacked.
18 March 2023, 20:37 | #14 | |
Registered User
Join Date: May 2013
Location: Grimstad / Norway
Posts: 839
|
Quote:
If you are talking about inferring intention from the code and then rewriting it more optimally, then that is beyond the scope of what I was looking for. I was only thinking about dead code (of which an interpreting emulator inherently will produce a lot - and as such I dreaded to see what a cross-CPU assembler would produce). (Part of this originated from me pondering more automatic translation of ROM-based coin-op games to the Amiga, without wanting to drag along too much "support" code.) |
|