Fancy a tool that speeds up AMOS executables?

saimo · 16 May 2023, 21:46

Towards the end of the development of Ring around the World, which is written mostly in AMOS Professional, I wrote a tool that speeds up the game executable produced by the AMOS Professional Compiler by applying various optimizations to conditionals, branches, peeking/poking, arrays, etc.
The tool is general purpose: it loads an AMOS executable, patches it and saves the resulting executable. Please note that it isn't magical: since its optimizations are at machine language level only, it won't help (much), for example, when the load is mostly elsewhere (e.g. blitting), the frame rate is locked, and so on.

The tool itself is written in AMOS, but every now and then I get the itch to rewrite it in assembly and release it, and every time I tell myself that it isn't worth the effort. This time around I thought I'd ask for your opinion and run some practical tests.
So, I'd like to ask:
a) do you know of any games/demos/apps that (supposedly) would benefit from such a tool? - I'd like to try the tool and see if it's actually useful in real world cases (other than RatW, that is);
b) would you be interested in the tool? - I'd like to know if there is a potential audience.
Thanks in advance for your feedback.

Samurai_Crow · 17 May 2023, 04:00

At one point I thought that making a library of a compiler backend of ECX or its EEC fork would be a way to improve other compilers. Now that GCC 13.1 is being ported to 68k I've shifted toward thinking that LibGCCJIT used as a static compiler optimizer and backend library would do even better.

At the end of the day, I'd like to be able to combine forces between AmigaE, AmosPro and Blitz to come up with something useful between them. Maybe W2C2 could have its ANSI C backend replaced with GCC's as well for a bytecode experience.

Regarding peephole optimization, just being able to use VAsm as the code generator would go a long way in that regard.

Without the original devs, who has the time?

andy2004 · 22 May 2023, 15:41

The only app I can think of to use the speed-up tool on would be DMC (Disk Magazine Creator). I don't know if it would make it faster or not.
Note: it's compressed with Crunchmania, so it would need unpacking first.

Thomas Richter · 22 May 2023, 17:50

Quote:

Originally Posted by saimo

b) would you be interested in the tool? - I'd like to know if there is a potential audience.
Thanks in advance for your feedback.

Note that there are similar tools. If you want to have a look, there is "Hunk" on Aminet, which contains a couple of tiny script files (called "Hoppers") that apply such peephole optimizations for popular compilers. The Hoppers are scripted, so everyone can write their own. However, the overall benefit is really quite minor.

saimo · 22 May 2023, 19:00

Quote:

Originally Posted by andy2004

The only app I can think of to use the speed-up tool on would be DMC (Disk Magazine Creator). I don't know if it would make it faster or not.
Note: it's compressed with Crunchmania, so it would need unpacking first.

Thanks for the suggestion. I gave the demo version* a shot - optimized executable attached.
For a GUI-oriented program, I'd say it's probably impossible to spot differences in speed (unless there are non-interactive calculation-heavy parts).

*I couldn't bother searching for the full version.

saimo · 22 May 2023, 19:07

Quote:

Originally Posted by Thomas Richter

Note that there are similar tools. If you want to have a look, there is "Hunk" on Aminet, which contains a couple of tiny script files (called "Hoppers") that apply such peephole optimizations for popular compilers. The Hoppers are scripted, so everyone can write their own. However, the overall benefit is really quite minor.

I didn't know about your tool. I had a very quick glance at some Hoppers and it looks really cool.

It's certainly way more advanced than the tool I've proposed here.

saimo · 08 June 2023, 19:26

Just for the record...

Yesterday, while updating Ring around the World, I decided to measure the speedup brought by the tool to the most CPU-intensive routine of the game, i.e. the routine that calculates the shortest path from one location to another. In the case of the longest path allowed by the map, the optimized routine took about 13% less time on a stock Amiga 500 (corrected from the 8.7% I first posted): not exactly a major improvement, but still not too bad.

[YouTube video]

For completeness, this is the routine:

Code:

   Fill WMA To MA_ROWSIZE*MA_HEIGHT+WMA,$80008000

   MCRV_TILEX=TILEX : MCRV_TILEY=TILEY : Call MCRA_GETTILEINDEX
   If OB_FLAGS(Param) and 1
      Doke WMA+TMA-MA_ADDRESS,0
   Else
      Dec D
   End If

   MT_CHECK_TILE:

   If TMA=TTMA Then Goto MT_WALK_PATH
   TWMA=WMA+TMA-MA_ADDRESS
   Inc D

   If Deek(TWMA-MA_ROWSIZE) and $8000
      If OB_FLAGS(Deek(TMA-MA_ROWSIZE)) and 1
         Loke WQAW,TMA-MA_ROWSIZE : Add WQAW,4
         Doke TWMA-MA_ROWSIZE,D
      End If
   End If

   If Deek(TWMA-2) and $8000
      If OB_FLAGS(Deek(TMA-2)) and 1
         Loke WQAW,TMA-2 : Add WQAW,4
         Doke TWMA-2,D
      End If
   End If

   If Deek(TWMA+2) and $8000
      If OB_FLAGS(Deek(TMA+2)) and 1
         Loke WQAW,TMA+2 : Add WQAW,4
         Doke TWMA+2,D
      End If
   End If

   If Deek(TWMA+MA_ROWSIZE) and $8000
      If OB_FLAGS(Deek(TMA+MA_ROWSIZE)) and 1
         Loke WQAW,TMA+MA_ROWSIZE : Add WQAW,4
         Doke TWMA+MA_ROWSIZE,D
      End If
   End If

   MT_CHECK_NEXT_TILE:

   If WQAR-WQAW
      TMA=Leek(WQAR) : Add WQAR,4
      Goto MT_CHECK_TILE
   End If

   If DTF=0
      Dec DTF

      WQAR=WQAS
      WQAW=WQAR

      TMA=DTMA-MA_ROWSIZE-2
      TWMA=WMA+TMA-MA_ADDRESS
      If Deek(TWMA) and $8000
         If OB_FLAGS(Deek(TMA)) and 1
            Doke TWMA,0
            Loke WQAW,TMA : Add WQAW,4
         End If
      End If

      Add TMA,4
      Add TWMA,4
      If Deek(TWMA) and $8000
         If OB_FLAGS(Deek(TMA)) and 1
            Doke TWMA,0
            Loke WQAW,TMA : Add WQAW,4
         End If
      End If

      Add TMA,MA_ROWSIZE*2
      Add TWMA,MA_ROWSIZE*2
      If Deek(TWMA) and $8000
         If OB_FLAGS(Deek(TMA)) and 1
            Doke TWMA,0
            Loke WQAW,TMA : Add WQAW,4
         End If
      End If

      Add TMA,-4
      Add TWMA,-4
      If Deek(TWMA) and $8000
         If OB_FLAGS(Deek(TMA)) and 1
            Doke TWMA,0
            Loke WQAW,TMA : Add WQAW,4
         End If
      End If

      D=0
      Goto MT_CHECK_NEXT_TILE

   End If

   Dec RC
   Goto MT_LEAVE

   MT_WALK_PATH:

Note: the flickering of the character in the main part of the video is due to the fact that the source code relies on a certain optimization, so the non-optimized executable sometimes uses wrong values. More precisely, some offsets used to calculate the character's animation frame indexes are stored in Areg(), and the source code assumes that AMOS does not modify Areg(). In this case, the Call command in the non-optimized executable does modify them, whereas the optimized executable does not, because a specific command-line switch instructs the tool not to copy back into Areg() the values left in the CPU address registers by the called machine language routine.
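The routine above looks like a breadth-first flood fill over the tile map: a work map pre-filled with $8000 "not visited" markers, a FIFO queue of tile addresses (read/write pointers WQAR/WQAW), and four neighbour checks per tile (up, left, right, down). For readers who want the underlying technique in isolation, here is a minimal C sketch of a grid BFS under those assumptions. It computes plain step distances and deliberately does not reproduce the routine's own bookkeeping (the D counter, the DTF branch, the path walk); all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UNVISITED 0x8000u  /* bit 15 set = tile not reached yet (cf. Fill ...,$80008000) */

/* Hypothetical sketch of a grid breadth-first search.
   walkable[i] != 0 means tile i can be entered (stands in for OB_FLAGS(...) and 1).
   Assumes the map has a border of non-walkable tiles and that start lies inside it,
   so the four neighbours of any queued tile are always valid indexes.
   dist and queue must both hold w*h entries.
   Returns the number of steps from start to target, or -1 if unreachable. */
int bfs_distance(const uint8_t *walkable, uint16_t *dist, size_t *queue,
                 size_t w, size_t h, size_t start, size_t target)
{
    assert(w >= 3 && h >= 3 && target < w * h);
    assert(start >= w && start < w * (h - 1) && start % w != 0 && start % w != w - 1);

    for (size_t i = 0; i < w * h; i++)
        dist[i] = UNVISITED;

    size_t rd = 0, wr = 0;              /* FIFO read/write indexes (cf. WQAR/WQAW) */
    dist[start] = 0;
    queue[wr++] = start;

    /* up, left, right, down: the same four neighbours the AMOS routine checks */
    const ptrdiff_t step[4] = { -(ptrdiff_t)w, -1, 1, (ptrdiff_t)w };

    while (rd != wr) {
        size_t t = queue[rd++];
        if (t == target)
            return dist[t];
        for (int k = 0; k < 4; k++) {
            size_t n = (size_t)((ptrdiff_t)t + step[k]);
            if ((dist[n] & UNVISITED) && walkable[n]) {
                dist[n] = (uint16_t)(dist[t] + 1);
                queue[wr++] = n;        /* each tile is queued at most once */
            }
        }
    }
    return -1;
}
```

On a 5x5 map whose 3x3 interior is walkable, going from the top-left interior tile (index 6) to the bottom-right one (index 18) takes 4 steps; a border tile is reported as unreachable (-1).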

EDIT: added video link.

alain.treesong · 08 June 2023, 21:05

Hi Saimo,
Very interesting. I have a lot of code in AMOS Pro, so we can apply your tool to a lot of code snippets and see the differences.
I sometimes publish things here:
https://github.com/alain-treesong/amiga_coding_in_amos
and I have a few demos here:
https://demozoo.org/groups/111822/
So we can test a lot of cases, I think.

Tell me if you are interested.

See u

saimo · 09 June 2023, 00:35

Quote:

Originally Posted by alain.treesong

Hi Saimo,
Very interesting. I have a lot of code in AMOS Pro, so we can apply your tool to a lot of code snippets and see the differences.
I sometimes publish things here:
https://github.com/alain-treesong/amiga_coding_in_amos
and I have a few demos here:
https://demozoo.org/groups/111822/
So we can test a lot of cases, I think.

Tell me if you are interested.

See u

Thanks for the examples! If you provide me with the uncompressed executables, I'll post the optimized ones. If you feel like it, add some basic benchmark code so that the difference can actually be measured.

EDIT2: to elaborate on the post I made in a hurry, late at night, when I should have been in bed: I downloaded the latest demo and it turned out that no optimizations were possible, which is what happens with compressed executables; since I really should have gone to bed, I gave up. I also downloaded one of the sources and, from a quick glance at it, it was clear that the tool would not help much, as the core loop is dominated by FP maths and polygon rendering, neither of which the tool touches. Anyway, later I'll give all the sources a spin and report back.

saimo · 09 June 2023, 00:36

Doh, I totally forgot to post the video that shows the test

[YouTube video]

Doh2: the 8.7% figure is bogus! The optimized code takes 87% (see where the figure came from?) of the time taken by the unoptimized code, so it saves 13% of the time (a speed increase of roughly 15%).

This is what happens when doing things in a hurry and while falling asleep...
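"Takes 87% of the time" can be expressed in two different ways: the optimized code saves 13% of the run time, while the amount of work done per second grows by 1/0.87 - 1, roughly 15%. A small C helper makes the distinction explicit (the function names are illustrative):

```c
#include <assert.h>

/* Percentage of run time saved by the optimized code.
   Both times must use the same unit (frames, seconds, ...). */
double time_saved_pct(double t_orig, double t_opt)
{
    assert(t_orig > 0.0 && t_opt > 0.0);
    return 100.0 * (t_orig - t_opt) / t_orig;
}

/* Percentage increase in speed (work done per unit of time). */
double speed_gain_pct(double t_orig, double t_opt)
{
    assert(t_orig > 0.0 && t_opt > 0.0);
    return 100.0 * (t_orig / t_opt - 1.0);
}
```

With t_orig = 100 and t_opt = 87, time_saved_pct returns 13.0 (up to floating-point rounding) and speed_gain_pct returns about 14.94.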

saimo · 09 June 2023, 15:06

@alain.treesong

OK, I've now had a look at your example sources. I chose SimpleCube.amos as the test case because the rendering part (which APEO, the optimizer, ignores) is minimal and it uses some arrays (which APEO accelerates) in the inner loop.

CODE

I modified the code as follows:

Code:

Set Buffer 12
Rem Simple wire 3d cube  
Rem
Rem Aghnar / Agima may 2022

Degree 
NDP=8
CX=160
CY=128
CZ=256*5
Dim X(NDP),Y(NDP),Z(NDP)
Dim XE(NDP),YE(NDP)
Dim C(1024),S(1024)
'
Global X(),Y(),Z(),XE(),YE(),C(),S(),NDP,AX,AY,AZ,CX,CY,CZ
Global FI
'
Screen Open 0,320,256,2,Lowres
Screen Display 0,128,40,320,256
Paper 0 : Hide On : Flash Off : Curs Off : Cls 
Palette $0,$666

' Simple horizontal line using copper to enhance the scene 
Set Rainbow 0,0,16,"","",""
Rain(0,0)=$CCC
Rain(0,1)=$444
Rain(0,2)=$111
Rainbow 0,0,270,16

Ink 1 : Pen 1
Double Buffer : Autoback 0

' Trigo table and definition of the 8 points for the cube
For I=0 To 1023
   C(I)=Qcos(I,256) : S(I)=Qsin(I,256)
Next I
'
For I=1 To NDP
   Read X(I),Y(I),Z(I)
Next 
'
Data -1,-1,-1
Data 1,-1,-1
Data 1,1,-1
Data -1,1,-1
Data -1,-1,1
Data 1,-1,1
Data 1,1,1
Data -1,1,1
'
AX=0
AY=0
AZ=0

' Timing start 

TS=Timer

' Main loop : rotation on the 3 axis 

Repeat 
   
   Add AX,-1,0 To 1023
   Add AY,1,0 To 1023
   Add AZ,-1,0 To 1023
   
   RENDER_CUBE
   
   Screen Swap 
   FI=Timer
   
Until AX=0

'Benchmark report

ET=Timer-TS+1
Print "elapsed time:";ET;" frames"
Print "speed: ";(1024*50.0)/ET;" fps"
Screen Swap 
Wait Key 

Procedure RENDER_CUBE
   
   CAX=C(AX)
   SAX=S(AX)
   CAY=C(AY)
   SIY=S(AY)
   CAZ=C(AZ)
   SAZ=S(AZ)
   
   ' Rotation and projection of the 8 points. 
   ' A lot of optimizations are possible here : 
   ' - inlining, no array (xe(1) replaced by xe1) 
   ' - the fact that the initial x,y z values are 1 or -1   
   
   ULC=10
   While ULC

      I=NDP
      While I
         
         ' rotation %X  
         X=X(I)*256
         Y=Y(I)*CAX+Z(I)*SAX
         Z=-Y(I)*SAX+Z(I)*CAX
         '  
         ' rotation %Y
         X2=X*CAY+Z*SIY
         Z=-X*SIY+Z*CAY
         '  
         ' rotation %Z
         X2=X2/256
         X=X2*CAZ+Y*SAZ
         Y=-X2*SAZ+Y*CAZ
         '
         ' Projection 
         D=CZ+Z/256
         XE(I)=CX+X/D
         YE(I)=CY+Y/D
         
         Dec I
      Wend 
      
      Dec ULC
   Wend 
   
   Repeat : Until Timer-FI
   
   ' Draw all 12 lines  
   Blitter Clear 0,0
   Turbo Draw XE(2),YE(2) To XE(6),YE(6),1,1
   Turbo Draw XE(6),YE(6) To XE(5),YE(5),1,1
   Turbo Draw XE(5),YE(5) To XE(1),YE(1),1,1
   Turbo Draw XE(1),YE(1) To XE(2),YE(2),1,1
   Turbo Draw XE(5),YE(5) To XE(8),YE(8),1,1
   Turbo Draw XE(8),YE(8) To XE(7),YE(7),1,1
   Turbo Draw XE(7),YE(7) To XE(6),YE(6),1,1
   Turbo Draw XE(1),YE(1) To XE(4),YE(4),1,1
   Turbo Draw XE(4),YE(4) To XE(3),YE(3),1,1
   Turbo Draw XE(3),YE(3) To XE(2),YE(2),1,1
   Turbo Draw XE(3),YE(3) To XE(7),YE(7),1,1
   Turbo Draw XE(8),YE(8) To XE(4),YE(4),1,1
   
End Proc

A key change is that I replaced

Screen Swap
Wait Vbl

with

Repeat : Until Timer-FI
<rendering code>
Screen Swap
FI=Timer

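This makes it possible to use all the available CPU cycles and thus get the best possible performance on underpowered machines: the calculations for the next frame start right after Screen Swap, and the program waits only if it is about to draw before a new frame has begun (Wait Vbl, instead, just wastes time doing nothing).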
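To see why the Wait Vbl version wastes time once a frame no longer fits in one vertical blank, here is a deliberately simplified timing model in C. It is a sketch under stated assumptions, not a description of AMOS internals: time is counted in hundredths of a frame, `calc` is the cost of the rotation/projection work and `draw` the cost of drawing plus Screen Swap, Wait Vbl is assumed to block until the next vertical blank, Timer is assumed to advance by one at each vertical blank, and nothing else is assumed to block.

```c
#include <assert.h>

#define FRAME 100L   /* one vertical blank period, in hundredths of a frame */

/* Loop style 1: calc, draw, Screen Swap, Wait Vbl.
   Returns the total time taken by n iterations. */
long total_time_wait_vbl(long n, long calc, long draw)
{
    assert(n >= 0 && calc >= 0 && draw >= 0);
    long t = 0;
    for (long i = 0; i < n; i++) {
        t += calc + draw;
        t = (t / FRAME + 1) * FRAME;   /* Wait Vbl: block until the next vertical blank */
    }
    return t;
}

/* Loop style 2: calc, Repeat : Until Timer-FI, draw, Screen Swap, FI=Timer. */
long total_time_timer_check(long n, long calc, long draw)
{
    assert(n >= 0 && calc >= 0 && draw >= 0);
    long t = 0;
    for (long i = 0; i < n; i++) {
        long fi = t / FRAME;           /* FI=Timer, read right after the previous swap */
        t += calc;                     /* next frame's maths starts immediately */
        if (t / FRAME == fi)           /* no vertical blank since the swap yet: wait for one */
            t = (fi + 1) * FRAME;
        t += draw;
    }
    return t;
}
```

With calc = 100 and draw = 20 (1.2 frames of work), ten iterations take 2000 units (20 frames, i.e. 25 fps) in the Wait Vbl model and 1200 units (12 frames, about 41.7 fps) in the timer-check model. With calc = 50 and draw = 20, both settle at one iteration per frame, which is consistent with the change making no difference for scenes that already run at 50 fps.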

Then, given that the code already ran at 50 fps even on a stock A500, I forced the rotation and projection calculations to repeat 10 times artificially (the While ULC... Wend loop).

Then, I replaced the inner For...Next with While...Wend, because the Compiler generates very inefficient code for the former (it should always be replaced by While...Wend or Repeat...Until).

Finally, I added some code to measure the performance.

Attached is the bootable .adf with the test executables.

COMPILING AND OPTIMIZING

I compiled the code and then I created an optimized executable.
The result of the optimization was:
[screenshot of the APEO optimization report]

That means that APEO:
* optimized the global routine that handles the accesses to arrays;
* optimized the Colour() and Colour routines (for some reason, the Compiler seems to always include them in the executables, even when, like in this case, they are not used);
* optimized two divisions by a power of 2 (the /256 in X2=X2/256 and D=CZ+Z/256; this optimization is not beneficial on 68000, though).

The While...Wend and Repeat...Until loops I added were not optimized because I wrote them in a way for which the Compiler already produces its best output.

Bootable .adf with the executables attached here.

BENCHMARKING

I ran the executables using a stock A500 configuration in WinUAE 5.0.0 and got these results:
* normal version: 4253 frames -> 12.038 fps;
* optimized version: 4089 frames -> 12.521 fps.
The optimized version took 164 frames (3.28 seconds) less.
The gain is minimal (about 3.9% less time, i.e. roughly 4% more fps), but, after all, there wasn't much to optimize in the first place. Anyway, a minimal gain is better than no gain.
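The figures above can be reproduced from the frame counts. The benchmark's own formula divides the 1024 rendered frames by the elapsed Timer ticks, at 50 ticks per second (PAL); here is the same arithmetic as a small C sketch (the helper names are illustrative):

```c
#include <assert.h>

#define TIMER_HZ 50.0   /* Timer ticks per second, as assumed by the benchmark's (1024*50.0)/ET */

/* Average frames per second when `frames` frames took `ticks` Timer ticks. */
double bench_fps(long frames, long ticks)
{
    assert(frames >= 0 && ticks > 0);
    return frames * TIMER_HZ / ticks;
}

/* Convert a number of Timer ticks to seconds. */
double ticks_to_seconds(long ticks)
{
    return ticks / TIMER_HZ;
}
```

bench_fps(1024, 4253) gives about 12.0386 and bench_fps(1024, 4089) about 12.5214, and the 164 saved ticks are 3.28 seconds. In relative terms, the optimized run takes 164/4253, about 3.9%, less time, which corresponds to a frame rate about 4.0% higher (4253/4089 is about 1.040).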

alain.treesong · 09 June 2023, 22:02

@saimo
Great.
I don't worry about the low optimization rate here because it is a single test.

Some questions :
1. Arrays are very slow in AMOS, as you said, so I generally replace them with simple variables: XE(1) becomes, for example, xe1, XE(0) becomes xe0, etc. I sometimes use external parsers written in Java to do that, but it remains tedious and the resulting code is verbose. I see arrays in the screenshot, but can your tool handle this case (replacing arrays when possible, or making arrays as efficient as simple variables)?
2. The Pro Compiler (2.0) already claims that it optimizes multiplications/divisions by powers of two by replacing them with logical shifts. That is why I generally use 2^n when possible. Isn't that the case in the code produced by the AMOS Pro Compiler 2.0?
3. Thx for the tips about While/Wend etc. In fact, the idea is generally to render scenes at 50 fps, so Wait Vbl is enough. For slower scenes, though, it's interesting.
4. Will you publish your great tool (if it isn't already the case) ?

Very nice to speak about Amos code in 2023 :-)

Edit 1: I suppose from the screenshot that you optimize Amreg(). This is a great idea because it is a lot used in game with sprites and bobs and it is slow.
Another thing that is slow is the (quite powerful) Rain command. Using big or multiple rainbows is very slow; I suppose that's because computing the copper list is slow. It would be cool to optimize that, if it's not too difficult.

Edit 2 : The 64k intro (yes! and the pixelated world) are compressed using shrinkler but the other ones are not compressed

Retro1234 · 09 June 2023, 22:29

Can you try this? I'm curious
https://eab.abime.net/showthread.php...11#post1122111

Also some people claim Amos The Creator Compiler produces faster executables - any thoughts on this?

saimo · 09 June 2023, 23:01

@alain.treesong

Quote:

Originally Posted by alain.treesong

1. Arrays are very slow in AMOS, as you said, so I generally replace them with simple variables: XE(1) becomes, for example, xe1, XE(0) becomes xe0, etc. I sometimes use external parsers written in Java to do that, but it remains tedious and the resulting code is verbose.

Indeed using normal variables gives a huge boost.
For critical single-index arrays, it's best to use Areg(), Dreg() and Amreg() (if possible).

Quote:

I see arrays in the screenshot, but can your tool handle this case (replacing arrays when possible, or making arrays as efficient as simple variables)?

Nope. The tool optimizes the code that calculates the address of an item within an array, discarding the safety checks (they're pretty useless for compiled code). More precisely, this is what the tool does (straight from the comments in the code):

Code:

The assignment A(...)=... gets compiled as follows:
    <fetched/calculated value to assign gets put in d3>
    move.l d3,-(a3)
    <fetched/calculated index of first dimension gets put in d3>  
    move.l d3,-(a3)
    ...
    <fetched/calculated index of last dimension gets put in d3>
    move.l d3,-(a3)
    lea.l  *(a6),a0
    jsr    *(a4)
    move.l (a3)+,(a0)
where:
 * a0 ends up pointing to the address of the array descriptor;  
 * the code at *(a4) is the routine that performs the safety and type
   checks, and puts in a0 the address where the value is to be stored.

The array descriptor is (offset: content):
   0: number of dimensions
   1: log2(item size)
 2-3: maximum index of first dimension
 4-5: number of items in previous dimensions (for first dimension: 1)
 ...: ...
 ...: maximum index of last dimension
 ...: number of items in previous dimensions

The routine is:
    move.l  (a0),d0  ;2010      get array descriptor address    
    beq.w   #$0024   ;6700 0024 if array undefined...  
    movea.l d0,a0    ;2040      get array descriptor address  
    move.b  (a0)+,d3 ;1618      get number of dimensions  
    move.b  (a0)+,d4 ;1818      get log2 of item size
    moveq.l #0,d0    ;7000      clear high word
    moveq.l #0,d2    ;7400      reset number of items to skip from array beginning  
.l  move.w  (a0)+,d0 ;3018      get maximum index
    move.l  (a3)+,d1 ;221b      get desired index
    cmp.l   d0,d1    ;b280      check index against maximum possible  
    bhi.w   *        ;6200 **** if index too big...
    mulu.w  (a0)+,d1 ;c2d8      calculate number of items relative to previous dimensions
    add.l   d1,d2    ;d481      update number of items to skip  
    subq.b  #1,d3    ;5303      check next dimension  
    bne.b   .l       ;66ee      if dimensions not over...
    lsl.l   d4,d2    ;e9aa      calculate offset of item as index<<log2(item size)  
    adda.l  d2,a0    ;d1c2      calculate address of item
    rts              ;4e75

This code replaces the routine with:
    movea.l (a0),a0  ;2050 get array descriptor address  
    move.b  (a0)+,d3 ;1618 get number of dimensions  
    move.b  (a0)+,d4 ;1818 get log2 of item size  
    moveq.l #0,d0    ;7000 clear high word  
    moveq.l #0,d2    ;7400 reset number of items to skip from array beginning  
.l  addq.w  #2,a0    ;5448 skip maximum index  
    move.l  (a3)+,d1 ;221b get desired index  
    mulu.w  (a0)+,d1 ;c2d8 calculate number of items relative to previous dimensions  
    add.l   d1,d2    ;d481 update number of items to skip  
    subq.b  #1,d3    ;5303 check next dimension  
    bne.b   .l       ;66f4 if dimensions not over...  
    lsl.l   d4,d2    ;e9aa calculate offset of item as index<<log2(item size)  
    adda.l  d2,a0    ;d1c2 calculate address of item  
    rts              ;4e75
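
The address calculation described by those comments can be modelled in C to show exactly what the patch removes. This is a sketch assuming a hypothetical struct layout that mirrors the descriptor (per dimension: maximum index and number of items per step), with the indexes already popped into an array in descriptor order. It uses 32-bit arithmetic, whereas the 68000 code uses the 16-bit mulu.w. Note how the checked version's single unsigned comparison also rejects negative indexes, as the cmp.l/bhi.w pair does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C model of the array descriptor: one entry per dimension. */
struct dim {
    uint16_t max_index;  /* maximum index allowed in this dimension */
    uint16_t stride;     /* number of items spanned by one step (1 for the first) */
};

struct array_desc {
    uint8_t ndims;           /* number of dimensions */
    uint8_t log2_item_size;  /* e.g. 2 for 4-byte items */
    const struct dim *dims;  /* ndims entries */
};

/* Checked version (like the original routine): returns the byte offset of
   the item, or -1 if an index is out of range. Casting to unsigned turns
   negative indexes into huge values, so one comparison catches both cases. */
long item_offset_checked(const struct array_desc *a, const int32_t *idx)
{
    assert(a != NULL && idx != NULL);
    uint32_t items = 0;
    for (unsigned i = 0; i < a->ndims; i++) {
        if ((uint32_t)idx[i] > a->dims[i].max_index)
            return -1;
        items += (uint32_t)idx[i] * a->dims[i].stride;
    }
    return (long)(items << a->log2_item_size);
}

/* Unchecked version (like the patched routine): trusts every index. */
long item_offset_unchecked(const struct array_desc *a, const int32_t *idx)
{
    assert(a != NULL && idx != NULL);
    uint32_t items = 0;
    for (unsigned i = 0; i < a->ndims; i++)
        items += (uint32_t)idx[i] * a->dims[i].stride;
    return (long)(items << a->log2_item_size);
}
```

For a hypothetical two-dimensional array of 4-byte items with maximum indexes 3 and 4 (descriptor entries {3, 1} and {4, 4}), indexes (2, 3) give byte offset (2*1 + 3*4) << 2 = 56 in both versions, while (4, 0) and (-1, 0) are rejected only by the checked one.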

Quote:

2. The Pro Compiler (2.0) already claims that it optimizes multiplications/divisions by powers of two by replacing them with logical shifts. That is why I generally use 2^n when possible. Isn't that the case in the code produced by the AMOS Pro Compiler 2.0?

The Compiler produces this code regardless of the shift count:

Code:

   moveq.l     #<count>,d0
   asr/lsl.l   d0,d3

That's wasteful when the shift is 16, as multiplications/divisions by 65536 are done much more cheaply with swap.w + clr.w/ext.l, and also when the count is between 1 and 8, as asr/lsl are faster on 68020, 68030 and 68040 when the count is an immediate argument. APEO fixes/mitigates that.
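The swap-based replacements are easy to check in C by emulating the 68000 operations on a 32-bit register value. (On the 68000 family an immediate shift count can only be 1 to 8, which is why a 16-bit shift otherwise needs the count in a register, as in the compiled code above.) This sketch models swap.w, clr.w and ext.l and compares the results with plain shifts; since asr.l rounds toward minus infinity, unlike C's integer division, the reference uses floor division.

```c
#include <assert.h>
#include <stdint.h>

/* swap.w: exchange the two 16-bit halves of a 32-bit data register. */
uint32_t swap_w(uint32_t d)
{
    return (d << 16) | (d >> 16);
}

/* swap.w + clr.w: same result as lsl.l #16 (multiply by 65536, modulo 2^32). */
uint32_t lsl16_via_swap_clr(uint32_t d)
{
    return swap_w(d) & 0xFFFF0000u;
}

/* swap.w + ext.l: same result as asr.l #16 (divide by 65536, rounding toward -infinity). */
int32_t asr16_via_swap_ext(int32_t d)
{
    int32_t low = (int32_t)(swap_w((uint32_t)d) & 0xFFFFu);  /* the old high word */
    return low >= 0x8000 ? low - 0x10000 : low;              /* ext.l: sign-extend */
}

/* Reference asr.l #16 as floor division, avoiding implementation-defined
   right shifts of negative values. */
int32_t asr16_reference(int32_t d)
{
    int64_t v = d;
    return (int32_t)(v >= 0 ? v / 65536 : -((-v + 65535) / 65536));
}

/* Returns 1 if both swap-based sequences agree with plain shifts for every
   value in [lo, hi] taken in steps of `step`, else 0. */
int swap_tricks_match(int32_t lo, int32_t hi, int32_t step)
{
    assert(lo <= hi && step > 0);
    for (int64_t v = lo; v <= hi; v += step) {
        uint32_t u = (uint32_t)(int32_t)v;
        if (lsl16_via_swap_clr(u) != (uint32_t)(u << 16))
            return 0;
        if (asr16_via_swap_ext((int32_t)v) != asr16_reference((int32_t)v))
            return 0;
    }
    return 1;
}
```

For example, swap_tricks_match(-200000, 200000, 7) returns 1, and asr16_via_swap_ext(-65537) is -2, the same as asr.l #16 would give.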

Quote:

4. Will you publish your great tool (if it isn't already the case) ?

Nope, sorry :/ The purpose of this thread was precisely to find out whether the tool received enough interest to justify writing a proper version, writing the documentation, preparing a page and maintaining everything in the future (I'm used to updating all my stuff constantly, as you can see from the history of all my projects at https://retream.itch.io), and the answer has been that there is basically no interest.
But thanks for your interest!

Quote:

Edit 1: I suppose from the screenshot that you optimize Amreg(). This is a great idea because it is a lot used in game with sprites and bobs and it is slow.

Yes, APEO optimizes Amreg() as well. If I remember correctly, Amreg() uses the same complicated routine as Areg() and Dreg() to calculate the item address and do the safety checks. APEO replaces all of that with hand-made address calculation code; more precisely:

Code:

COMPILED
    move.l #$80000000,d1 ;223c 8000 0000 set flag
    bsr.w  *             ;6100 ****      call address calculation routine
    move.w (a0),d3       ;3610           read item value
    ext.l  d3            ;48c3           sign-extend value
    rts                  ;4e75

OPTIMIZED
    add.l  d3,d3         ;d683      calculate item offset
    lea.l  -$186e(a5),a0 ;41ed e792 calculate address of Amreg()
    move.w (a0,d3.l),d3  ;3630 3800 read item value
    ext.l  d3            ;48c3      sign-extend value
    rts                  ;4e75
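
The optimized sequence is plain indexed addressing. A small C emulation on a big-endian byte image (the 68000 is big-endian) shows the three steps: double the index to get a byte offset, read the 16-bit word at base + offset, and sign-extend it. The memory image and base offset are hypothetical stand-ins for the real address of Amreg(); as in the patched code, there is no upper bounds check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Emulates the optimized Amreg() read on a hypothetical big-endian memory image. */
int32_t read_word_item(const uint8_t *mem, size_t base, int32_t index)
{
    assert(mem != NULL && index >= 0);           /* no upper bound check, as in the patch */
    size_t addr = base + (size_t)index * 2u;     /* add.l d3,d3: index -> byte offset */
    uint32_t w = ((uint32_t)mem[addr] << 8) | mem[addr + 1];   /* move.w (a0,d3.l),d3 */
    return w >= 0x8000u ? (int32_t)w - 0x10000 : (int32_t)w;  /* ext.l d3 */
}
```

If the word stored at index 3 is $FFFE, read_word_item returns -2, just as move.w followed by ext.l would.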

Quote:

Another thing that is slow is the (quite powerful) Rain command. Using big or multiple rainbows is very slow; I suppose that's because computing the copper list is slow. It would be cool to optimize that, if it's not too difficult.

AMOS has hundreds of functions and the potential for optimizations is almost boundless. I focused on the most common stuff and I don't plan to add more optimizations to the tool. In the future, I might add something if the need arises when making a new game (I still haven't figured out which game to make with [YouTube video]).

Quote:

Edit 2 : The 64k intro (yes! and the pixelated world) are compressed using shrinkler but the other ones are not compressed

I'll have a look at them later

saimo · 09 June 2023, 23:05

@Retro1234

Quote:

Originally Posted by Retro1234

Can you try this? I'm curious
https://eab.abime.net/showthread.php...11#post1122111

APEO can't optimize anything: is the executable compressed? Has it been produced by the AMOS Professional Compiler?
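The thread doesn't describe how APEO finds the code to patch, so here is one plausible approach, sketched in C purely as an illustration and not as APEO's actual method: scan the executable for the array routine's opcode words (from the array-routine listing earlier in the thread) and overwrite a match with the shorter unchecked version. The bhi.w displacement, shown as **** in the listing, is treated as a wildcard. It also illustrates why compressed executables can't be optimized: their code does not appear in plain form, so no pattern matches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WILDCARD 0xFFFFu                      /* matches any word */
#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Original array routine, from the listing (bhi.w displacement unknown). */
static const uint16_t orig_sig[] = {
    0x2010, 0x6700, 0x0024, 0x2040, 0x1618, 0x1818, 0x7000, 0x7400,
    0x3018, 0x221b, 0xb280, 0x6200, WILDCARD, 0xc2d8, 0xd481, 0x5303,
    0x66ee, 0xe9aa, 0xd1c2, 0x4e75
};

/* Replacement without the safety checks, from the same listing. */
static const uint16_t patched[] = {
    0x2050, 0x1618, 0x1818, 0x7000, 0x7400, 0x5448, 0x221b, 0xc2d8,
    0xd481, 0x5303, 0x66f4, 0xe9aa, 0xd1c2, 0x4e75
};

uint16_t get_be16(const uint8_t *p) { return (uint16_t)((p[0] << 8) | p[1]); }
void put_be16(uint8_t *p, uint16_t v) { p[0] = (uint8_t)(v >> 8); p[1] = (uint8_t)v; }

/* Patches every occurrence in buf and returns how many were patched.
   68000 code is word-aligned, so only even offsets are tried.
   The bytes left after the new rts become dead code. */
int patch_array_routine(uint8_t *buf, size_t len)
{
    assert(buf != NULL);
    const size_t sig_bytes = COUNT(orig_sig) * 2;
    int count = 0;
    for (size_t off = 0; off + sig_bytes <= len; off += 2) {
        size_t i = 0;
        while (i < COUNT(orig_sig) &&
               (orig_sig[i] == WILDCARD || get_be16(buf + off + 2 * i) == orig_sig[i]))
            i++;
        if (i < COUNT(orig_sig))
            continue;                           /* no match at this offset */
        for (size_t k = 0; k < COUNT(patched); k++)
            put_be16(buf + off + 2 * k, patched[k]);
        count++;
        off += sig_bytes - 2;                   /* resume after the patched routine */
    }
    return count;
}
```

Running it twice on the same buffer patches the routine the first time and finds nothing the second time, since the patched code no longer matches the original signature.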

Quote:

Also some people claim Amos The Creator Compiler produces faster executables - any thoughts on this?

No idea - I never used that compiler.

Retro1234 · 09 June 2023, 23:10

Yeah, it was probably compressed. I'll see if I can find the source, thanks.

Retro1234 · 09 June 2023, 23:18

I started work on a program to convert AMOS to Blitz, but I never finished it. Blitz in general is "faster" at blitting Bobs.

saimo · 09 June 2023, 23:49

@alain.treesong

Quote:

Edit 2 : The 64k intro (yes! and the pixelated world) are compressed using shrinkler but the other ones are not compressed

I'll have a look at them later

Here are the optimized demos MerryHappy, happy21, Happy2022 and NewImpact. The archive also includes the optimization reports.
I don't think the optimizations make any difference: the demos don't seem to do anything CPU-intensive and are probably frame-locked.
Notes:
* I noticed the programs can be broken with CTRL-C: if you need speed, use Comp Test Off;
* I had to remove the original executable from the NewImpact ADF as there wasn't enough space.

Karlos · 09 June 2023, 23:56

Could AMOS be transpiled to C? I appreciate that one can't just magically do this without a runtime library to provide equivalent functionality for all the graphics and audio features that the language provides out of the box, but in principle, is there anything about the language, some fundamental impedance mismatch, that prevents automated conversion to C?

saimo · 10 June 2023, 10:08

Quote:

Originally Posted by Karlos

Could AMOS be transpiled to C? I appreciate that one can't just magically do this without a runtime library to provide equivalent functionality for all the graphics and audio features that the language provides out of the box, but in principle, is there anything about the language, some fundamental impedance mismatch, that prevents automated conversion to C?

I'd say it would be quite straightforward to translate from AMOS to C (and the executable produced by the C compiler would surely be much more efficient than the AMOS one).
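To give an idea of what such a translation could look like, here is a hypothetical hand translation to C of the rotation and projection loop body from the SimpleCube benchmark earlier in the thread. It covers only the integer maths; commands like Turbo Draw and Screen Swap would need the runtime library Karlos mentions. Assumptions: sines and cosines are passed in pre-scaled by 256, like the C() and S() tables, and AMOS integer division is taken to truncate toward zero like C's /, which may not match the compiled code for negative values (it implements /256 with a shift).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Cosines/sines of the three angles, scaled by 256 (cf. CAX, SAX, CAY, SIY, CAZ, SAZ). */
typedef struct { int32_t cax, sax, cay, siy, caz, saz; } rot_t;

enum { CX = 160, CY = 128, CZ = 256 * 5 };

/* Rotates one cube corner (px, py, pz in {-1, 1}) and projects it to screen. */
void project_point(int32_t px, int32_t py, int32_t pz, const rot_t *r,
                   int32_t *xe, int32_t *ye)
{
    assert(r != NULL && xe != NULL && ye != NULL);
    /* rotation around X */
    int32_t x = px * 256;
    int32_t y = py * r->cax + pz * r->sax;
    int32_t z = -py * r->sax + pz * r->cax;
    /* rotation around Y */
    int32_t x2 = x * r->cay + z * r->siy;
    z = -x * r->siy + z * r->cay;
    /* rotation around Z */
    x2 = x2 / 256;
    x = x2 * r->caz + y * r->saz;
    y = -x2 * r->saz + y * r->caz;
    /* projection */
    int32_t d = CZ + z / 256;
    assert(d > 0);
    *xe = CX + x / d;
    *ye = CY + y / d;
}
```

With no rotation (all cosines 256, all sines 0), the corner (1, 1, 1) projects to (202, 170) and (-1, -1, -1) to (96, 64).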

