English Amiga Board


Old 16 May 2023, 21:46   #1
saimo
Registered User
 
 
Join Date: Aug 2010
Location: Italy
Posts: 854
Fancy a tool that speeds up AMOS executables?

Towards the end of the development of Ring around the World, which is written mostly in AMOS Professional, I wrote a tool that speeds up the game executable produced by the AMOS Professional Compiler by applying various optimizations to conditionals, branches, peeking/poking, arrays, etc.
The tool is general purpose: it loads an AMOS executable, patches it and saves the resulting executable. Please note that it isn't magical: since its optimizations are at machine language level only, it won't help (much), for example, when the load is mostly elsewhere (e.g. blitting), the frame rate is locked, and so on.

The tool itself is written in AMOS, but every now and then I get the itch to rewrite it in assembly and release it - and every time I tell myself that it isn't worth the effort. This time around I thought I'd ask your opinion about it and run some practical tests.
So, I'd like to ask:
a) do you know of any games/demos/apps that (supposedly) would benefit from such a tool? - I'd like to try the tool and see if it's actually useful in real world cases (other than RatW, that is);
b) would you be interested in the tool? - I'd like to know if there is a potential audience.
Thanks in advance for your feedback.
Old 17 May 2023, 04:00   #2
Samurai_Crow
Total Chaos forever!
 
 
Join Date: Aug 2007
Location: Waterville, MN, USA
Age: 49
Posts: 2,197
At one point I thought that making a library of a compiler backend of ECX or its EEC fork would be a way to improve other compilers. Now that GCC 13.1 is being ported to 68k I've shifted toward thinking that LibGCCJIT used as a static compiler optimizer and backend library would do even better.

At the end of the day, I'd like to be able to combine forces between AmigaE, AmosPro and Blitz to come up with something useful between them. Maybe W2C2 could have its ANSI C backend replaced with GCC's as well for a bytecode experience.

Regarding peephole optimization, just being able to use VAsm as the code generator would go a long way in that regard.

Without the original devs, who has the time?
Old 22 May 2023, 15:41   #3
andy2004
Zone Friend
 
Join Date: May 2006
Location: Hampshire
Age: 49
Posts: 276
The only app I can think of to use the speed-up tool on would be DMC (Disk Magazine Creator). I don't know if it would make it faster or not.
Note: it's compressed with Crunchmania, so it would need unpacking first.
Old 22 May 2023, 17:50   #4
Thomas Richter
Registered User
 
Join Date: Jan 2019
Location: Germany
Posts: 3,302
Quote:
Originally Posted by saimo View Post
b) would you be interested in the tool? - I'd like to know if there is a potential audience.
Thanks in advance for your feedback.
Note that there are similar tools like this. If you want to have a look, there is "Hunk" in Aminet which contains a couple of tiny script files (called "Hoppers") that apply such peep-hole optimizations for popular compilers. The Hoppers are scripted, so everyone can write their own. However, the overall benefit is really quite minor.
Old 22 May 2023, 19:00   #5
saimo
Registered User
 
 
Join Date: Aug 2010
Location: Italy
Posts: 854
Quote:
Originally Posted by andy2004 View Post
The only app I can think of to use the speed-up tool on would be DMC (Disk Magazine Creator). I don't know if it would make it faster or not.
Note: it's compressed with Crunchmania, so it would need unpacking first.
Thanks for the suggestion. I gave the demo version* a shot - optimized executable attached.
For a GUI-oriented program, I'd say it's probably impossible to spot differences in speed (unless there are non-interactive calculation-heavy parts).

*I couldn't be bothered to search for the full version.
Attached Files
File Type: lha DMC-optimized.exe.lha (90.7 KB, 39 views)

Last edited by saimo; 10 June 2023 at 13:31.
Old 22 May 2023, 19:07   #6
saimo
Registered User
 
 
Join Date: Aug 2010
Location: Italy
Posts: 854
Quote:
Originally Posted by Thomas Richter View Post
Note that there are similar tools like this. If you want to have a look, there is "Hunk" in Aminet which contains a couple of tiny script files (called "Hoppers") that apply such peep-hole optimizations for popular compilers. The Hoppers are scripted, so everyone can write their own. However, the overall benefit is really quite minor.
I didn't know about your tool. I had a very quick glance at some hoppers and it looks really cool. It's certainly way more advanced than the tool I've proposed here.
Old 08 June 2023, 19:26   #7
saimo
Registered User
 
 
Join Date: Aug 2010
Location: Italy
Posts: 854
Just for the record...

Yesterday, while updating Ring around the World, I decided to measure the speedup brought by the tool to the most CPU-intensive routine of the game, i.e. the routine that calculates the shortest path to go from one location to another. In the case of the longest path allowed by the map, the speedup amounted to about 13% on a stock Amiga 500 (an earlier version of this post said 8.7%): not exactly a major improvement, but still not too bad.

[Embedded YouTube video]

For completeness, this is the routine:
Code:
   Fill WMA To MA_ROWSIZE*MA_HEIGHT+WMA,$80008000

   MCRV_TILEX=TILEX : MCRV_TILEY=TILEY : Call MCRA_GETTILEINDEX
   If OB_FLAGS(Param) and 1
      Doke WMA+TMA-MA_ADDRESS,0
   Else
      Dec D
   End If

   MT_CHECK_TILE:

   If TMA=TTMA Then Goto MT_WALK_PATH
   TWMA=WMA+TMA-MA_ADDRESS
   Inc D

   If Deek(TWMA-MA_ROWSIZE) and $8000
      If OB_FLAGS(Deek(TMA-MA_ROWSIZE)) and 1
         Loke WQAW,TMA-MA_ROWSIZE : Add WQAW,4
         Doke TWMA-MA_ROWSIZE,D
      End If
   End If

   If Deek(TWMA-2) and $8000
      If OB_FLAGS(Deek(TMA-2)) and 1
         Loke WQAW,TMA-2 : Add WQAW,4
         Doke TWMA-2,D
      End If
   End If

   If Deek(TWMA+2) and $8000
      If OB_FLAGS(Deek(TMA+2)) and 1
         Loke WQAW,TMA+2 : Add WQAW,4
         Doke TWMA+2,D
      End If
   End If

   If Deek(TWMA+MA_ROWSIZE) and $8000
      If OB_FLAGS(Deek(TMA+MA_ROWSIZE)) and 1
         Loke WQAW,TMA+MA_ROWSIZE : Add WQAW,4
         Doke TWMA+MA_ROWSIZE,D
      End If
   End If

   MT_CHECK_NEXT_TILE:

   If WQAR-WQAW
      TMA=Leek(WQAR) : Add WQAR,4
      Goto MT_CHECK_TILE
   End If

   If DTF=0
      Dec DTF

      WQAR=WQAS
      WQAW=WQAR

      TMA=DTMA-MA_ROWSIZE-2
      TWMA=WMA+TMA-MA_ADDRESS
      If Deek(TWMA) and $8000
         If OB_FLAGS(Deek(TMA)) and 1
            Doke TWMA,0
            Loke WQAW,TMA : Add WQAW,4
         End If
      End If

      Add TMA,4
      Add TWMA,4
      If Deek(TWMA) and $8000
         If OB_FLAGS(Deek(TMA)) and 1
            Doke TWMA,0
            Loke WQAW,TMA : Add WQAW,4
         End If
      End If

      Add TMA,MA_ROWSIZE*2
      Add TWMA,MA_ROWSIZE*2
      If Deek(TWMA) and $8000
         If OB_FLAGS(Deek(TMA)) and 1
            Doke TWMA,0
            Loke WQAW,TMA : Add WQAW,4
         End If
      End If

      Add TMA,-4
      Add TWMA,-4
      If Deek(TWMA) and $8000
         If OB_FLAGS(Deek(TMA)) and 1
            Doke TWMA,0
            Loke WQAW,TMA : Add WQAW,4
         End If
      End If

      D=0
      Goto MT_CHECK_NEXT_TILE

   End If

   Dec RC
   Goto MT_LEAVE

   MT_WALK_PATH:
Note: the flickering of the character in the main part of the video occurs because the source code relies on a certain optimization, so the non-optimized code sometimes uses wrong values. More precisely, some offsets used to calculate the character's animation frame indexes are stored in Areg(), and the source code assumes that AMOS does not modify Areg(). In the non-optimized executable the Call command does modify them, whereas in the optimized executable it does not, because the tool is instructed by a specific command line switch not to return in Areg() the values stored in the CPU address registers by the called machine language routine.
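
For readers who do not read AMOS, the search phase of the routine above looks like a queue-based flood fill over the tile map: unvisited cells carry a marker ($8000 in the work map), walkable neighbours are stamped with the counter D and written at the queue's write pointer (WQAW), and tiles are fetched from its read pointer (WQAR). The Python sketch below is a simplified model of that idea, not a translation of the game's code: the grid representation, the function name and the use of per-step distances instead of the D counter are assumptions made for illustration, and the diagonal fallback around the destination is left out.

```python
from collections import deque

UNVISITED = -1  # plays the role of the $8000 marker in the work map


def flood_fill_steps(walkable, start, goal):
    """Breadth-first search on a 4-connected grid (illustrative only).

    walkable: list of rows of booleans (True = tile can be entered).
    start, goal: (x, y) tuples.
    Returns the number of steps from start to goal, or None if unreachable.
    """
    height, width = len(walkable), len(walkable[0])
    dist = [[UNVISITED] * width for _ in range(height)]
    sx, sy = start
    dist[sy][sx] = 0
    queue = deque([start])  # its two ends mirror the WQAR/WQAW pointers
    while queue:
        x, y = queue.popleft()
        if (x, y) == goal:
            return dist[y][x]
        # Same neighbour order as the AMOS code: up, left, right, down.
        for nx, ny in ((x, y - 1), (x - 1, y), (x + 1, y), (x, y + 1)):
            if 0 <= nx < width and 0 <= ny < height:
                if dist[ny][nx] == UNVISITED and walkable[ny][nx]:
                    dist[ny][nx] = dist[y][x] + 1
                    queue.append((nx, ny))
    return None
```

Each tile is enqueued at most once, so the cost grows with the number of reachable tiles; the per-tile work is dominated by the neighbour tests, which is the kind of conditional and peek/poke code the tool targets.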

EDIT: added video link.

Last edited by saimo; 09 June 2023 at 13:27.
Old 08 June 2023, 21:05   #8
alain.treesong
Aghnar
 
Join Date: Jan 2019
Location: France
Posts: 155
Hi Saimo,
Very interesting. I have a lot of code in AMOS Pro, so we can apply your tool to a lot of code snippets and see the differences.
I sometimes publish things here:
https://github.com/alain-treesong/amiga_coding_in_amos
and I have a few demos here:
https://demozoo.org/groups/111822/
So we can test a lot of cases, I think.

Tell me if you are interested.

See u
Old 09 June 2023, 00:35   #9
saimo
Registered User
 
 
Join Date: Aug 2010
Location: Italy
Posts: 854
Quote:
Originally Posted by alain.treesong View Post
Hi Saimo,
Very interesting. I have a lot of code in AMOS Pro, so we can apply your tool to a lot of code snippets and see the differences.
I sometimes publish things here:
https://github.com/alain-treesong/amiga_coding_in_amos
and I have a few demos here:
https://demozoo.org/groups/111822/
So we can test a lot of cases, I think.

Tell me if you are interested.

See u
Thanks for the examples! If you provide me with the uncompressed executables, I'll post the optimized ones. If you feel like it, add some basic benchmark code so that the difference can actually be measured.

EDIT2: elaborating on the post above, which I made in a hurry, late at night, when I should have been in bed. I downloaded the latest demo and it turned out that no optimizations were possible, which is what happens with compressed executables; since I really should have gone to bed, I gave up. I also downloaded one of the sources and, from a quick glance at it, it was clear that the tool would not help much, as the core loop is dominated by FP maths and polygon rendering, neither of which the tool touches. Anyway, later I'll give all the sources a spin and report back.

Last edited by saimo; 09 June 2023 at 13:21.
Old 09 June 2023, 00:36   #10
saimo
Registered User
 
 
Join Date: Aug 2010
Location: Italy
Posts: 854
Doh, I totally forgot to post the video that shows the test

[Embedded YouTube video]

Doh2: the 8.7% figure is bogus! The optimized code takes 87% (see where the figure came from?) of the unoptimized code time, so the speedup is 13%.
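
The two figures are easy to mix up, so here is the arithmetic spelled out as a small Python sketch. The function names are made up for illustration. Note that 13% is the reduction in run time; expressed as extra work done per unit of time, the same measurement comes out slightly higher.

```python
def time_reduction(optimized_time, original_time):
    """Fraction of the original run time that was saved."""
    return 1.0 - optimized_time / original_time


def throughput_gain(optimized_time, original_time):
    """Fractional increase in work done per unit of time."""
    return original_time / optimized_time - 1.0


# The optimized routine takes 87% of the unoptimized time.
saved = time_reduction(87, 100)     # the 13% figure quoted above
faster = throughput_gain(87, 100)   # roughly 15%
print(f"time saved: {saved:.1%}, throughput gain: {faster:.1%}")
```

Only the 87% ratio comes from the measurement; the units passed in do not matter as long as both times use the same one.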

This is what happens when doing things in a hurry and while falling asleep...

Last edited by saimo; 09 June 2023 at 13:29.
Old 09 June 2023, 15:06   #11
saimo
Registered User
 
 
Join Date: Aug 2010
Location: Italy
Posts: 854
@alain.treesong

OK, I have now had a look at your example sources. I chose SimpleCube.amos as the test case because the rendering part (which APEO ignores) is minimal and it uses some arrays (which get accelerated by APEO) in the inner loop.


CODE

I modified the code as follows:

Code:
Set Buffer 12
Rem Simple wire 3d cube  
Rem
Rem Aghnar / Agima may 2022

Degree 
NDP=8
CX=160
CY=128
CZ=256*5
Dim X(NDP),Y(NDP),Z(NDP)
Dim XE(NDP),YE(NDP)
Dim C(1024),S(1024)
'
Global X(),Y(),Z(),XE(),YE(),C(),S(),NDP,AX,AY,AZ,CX,CY,CZ
Global FI
'
Screen Open 0,320,256,2,Lowres
Screen Display 0,128,40,320,256
Paper 0 : Hide On : Flash Off : Curs Off : Cls 
Palette $0,$666

' Simple horizontal line using copper to enhance the scene 
Set Rainbow 0,0,16,"","",""
Rain(0,0)=$CCC
Rain(0,1)=$444
Rain(0,2)=$111
Rainbow 0,0,270,16

Ink 1 : Pen 1
Double Buffer : Autoback 0

' Trigo table and definition of the 8 points for the cube
For I=0 To 1023
   C(I)=Qcos(I,256) : S(I)=Qsin(I,256)
Next I
'
For I=1 To NDP
   Read X(I),Y(I),Z(I)
Next 
'
Data -1,-1,-1
Data 1,-1,-1
Data 1,1,-1
Data -1,1,-1
Data -1,-1,1
Data 1,-1,1
Data 1,1,1
Data -1,1,1
'
AX=0
AY=0
AZ=0

' Timing start 

TS=Timer

' Main loop : rotation on the 3 axis 

Repeat 
   
   Add AX,-1,0 To 1023
   Add AY,1,0 To 1023
   Add AZ,-1,0 To 1023
   
   RENDER_CUBE
   
   Screen Swap 
   FI=Timer
   
Until AX=0

'Benchmark report

ET=Timer-TS+1
Print "elapsed time:";ET;" frames"
Print "speed: ";(1024*50.0)/ET;" fps"
Screen Swap 
Wait Key 

Procedure RENDER_CUBE
   
   CAX=C(AX)
   SAX=S(AX)
   CAY=C(AY)
   SIY=S(AY)
   CAZ=C(AZ)
   SAZ=S(AZ)
   
   ' Rotation and projection of the 8 points. 
   ' A lot of optimizations are possible here : 
   ' - inlining, no array (xe(1) replaced by xe1) 
   ' - the fact that the initial x,y z values are 1 or -1   
   
   ULC=10
   While ULC

      I=NDP
      While I
         
         ' rotation %X  
         X=X(I)*256
         Y=Y(I)*CAX+Z(I)*SAX
         Z=-Y(I)*SAX+Z(I)*CAX
         '  
         ' rotation %Y
         X2=X*CAY+Z*SIY
         Z=-X*SIY+Z*CAY
         '  
         ' rotation %Z
         X2=X2/256
         X=X2*CAZ+Y*SAZ
         Y=-X2*SAZ+Y*CAZ
         '
         ' Projection 
         D=CZ+Z/256
         XE(I)=CX+X/D
         YE(I)=CY+Y/D
         
         Dec I
      Wend 
      
      Dec ULC
   Wend 
   
   Repeat : Until Timer-FI
   
   ' Draw all 12 lines  
   Blitter Clear 0,0
   Turbo Draw XE(2),YE(2) To XE(6),YE(6),1,1
   Turbo Draw XE(6),YE(6) To XE(5),YE(5),1,1
   Turbo Draw XE(5),YE(5) To XE(1),YE(1),1,1
   Turbo Draw XE(1),YE(1) To XE(2),YE(2),1,1
   Turbo Draw XE(5),YE(5) To XE(8),YE(8),1,1
   Turbo Draw XE(8),YE(8) To XE(7),YE(7),1,1
   Turbo Draw XE(7),YE(7) To XE(6),YE(6),1,1
   Turbo Draw XE(1),YE(1) To XE(4),YE(4),1,1
   Turbo Draw XE(4),YE(4) To XE(3),YE(3),1,1
   Turbo Draw XE(3),YE(3) To XE(2),YE(2),1,1
   Turbo Draw XE(3),YE(3) To XE(7),YE(7),1,1
   Turbo Draw XE(8),YE(8) To XE(4),YE(4),1,1
   
End Proc
A key change is that I replaced

Screen Swap
Wait Vbl


with

Repeat : Until Timer-FI
<rendering code>
Screen Swap
FI=Timer


This makes it possible to use all the available CPU cycles and thus get the best performance possible on underpowered machines (Wait Vbl, instead, just wastes time doing nothing).

Then, given that the code already ran at 50 fps even on a stock A500, I forced the rotation and projection calculations to artificially repeat 10 times (While ULC... Wend loop).

Then, I replaced the inner For...Next with While...Wend because the former compiles terribly (it should always be replaced by While...Wend or Repeat...Until).

Finally, I added some code to measure the performance.

Attached is the bootable .adf with the test executables.


COMPILING AND OPTIMIZING

I compiled the code and then I created an optimized executable.
The result of the optimization was:



That means that APEO:
* optimized the global routine that handles the accesses to arrays;
* optimized the Colour() and Colour routines (for some reason, the Compiler seems to always include them in the executables, even when, like in this case, they are not used);
* optimized two divisions by a power of 2 (the /256 in X2=X2/256 and D=CZ+Z/256; this optimization is not beneficial on 68000, though).

The While...Wend and Repeat...Until loops I added were not optimized because I wrote them in a way that the Compiler already produces its best output.

Bootable .adf with the executables attached here.


BENCHMARKING

I ran the executables using a stock A500 configuration in WinUAE 5.0.0 and got these results:
* normal version: 4253 frames -> 12.038 fps;
* optimized version: 4089 frames -> 12.521 fps.
The optimized version took 164 frames = 3.28 seconds less.
The gain is minimal (less than 4%), but, after all, there wasn't much to optimize in the first place. Anyway, a minimal gain is better than no gain.
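
As a cross-check, the figures above can be recomputed from the two elapsed times with the same formula the benchmark code uses, (1024*50.0)/ET. This is only a sketch: the constant and function names are illustrative, and the 50 ticks per second come from the 50.0 in the AMOS report line.

```python
TICKS_PER_SECOND = 50    # Timer resolution assumed by the AMOS report line
FRAMES_RENDERED = 1024   # AX goes through 1024 values before returning to 0


def fps(elapsed_ticks):
    """Same formula as the AMOS report line: (1024*50.0)/ET."""
    return FRAMES_RENDERED * TICKS_PER_SECOND / elapsed_ticks


normal_ticks, optimized_ticks = 4253, 4089
saved_ticks = normal_ticks - optimized_ticks
saved_seconds = saved_ticks / TICKS_PER_SECOND
time_saved_pct = 100 * saved_ticks / normal_ticks
fps_gain_pct = 100 * (fps(optimized_ticks) / fps(normal_ticks) - 1)

print(fps(normal_ticks), fps(optimized_ticks))
print(saved_ticks, saved_seconds, time_saved_pct, fps_gain_pct)
```

Measured as time saved, the gain is a little under 4%; measured as frame rate, it is almost exactly 4%. Both describe the same 164-tick difference.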
Attached Files
File Type: zip test.zip (487.7 KB, 24 views)
Old 09 June 2023, 22:02   #12
alain.treesong
Aghnar
 
Join Date: Jan 2019
Location: France
Posts: 155
@saimo
Great.
I don't worry about the low optimization rate here because it is a single test.

Some questions :
1. Arrays are very slow in AMOS, as you said, so I generally replace them with simple variables: XE(1) becomes xe1, XE(0) becomes xe0, and so on. I sometimes use external parsers written in Java to do that, but it remains tedious and the resulting code is verbose. I see arrays in the screenshot, but can your tool handle this case (replacing arrays where possible, or optimizing array access to be as efficient as using single variables)?
2. The Pro compiler (2.0) already claims that it optimizes multiplications and divisions by a power of two by replacing them with shifts. That is why I generally use 2^n when possible. Isn't that the case in the code produced by the AMOS Pro Compiler 2.0?
3. Thanks for the tips about While/Wend etc. In general the idea is to render the scene at 50 fps, so Wait Vbl is enough. For slower scenes, though, your approach is interesting.
4. Will you publish your great tool (if you haven't already)?

Very nice to speak about Amos code in 2023 :-)


Edit 1: I gather from the screenshot that you optimize Amreg(). This is a great idea, because it is used a lot in games with sprites and bobs, and it is slow.
Another thing that is slow is the (quite powerful) Rain command. Using big rains or multiple rains is very slow. I suppose this is because the resulting copperlist is slow to compute. It would be cool to optimize that, if it's not too difficult.

Edit 2 : The 64k intro (yes! and the pixelated world) are compressed using shrinkler but the other ones are not compressed

Last edited by alain.treesong; 09 June 2023 at 22:12.
Old 09 June 2023, 22:29   #13
Retro1234
Phone Homer
 
 
Join Date: Jun 2006
Location: 5150
Posts: 5,809
Can you try this? I'm curious.
https://eab.abime.net/showthread.php...11#post1122111

Also some people claim Amos The Creator Compiler produces faster executables - any thoughts on this?
Old 09 June 2023, 23:01   #14
saimo
Registered User
 
 
Join Date: Aug 2010
Location: Italy
Posts: 854
@alain.treesong

Quote:
Originally Posted by alain.treesong View Post
1. Arrays are very slow in AMOS, as you said, so I generally replace them with simple variables: XE(1) becomes xe1, XE(0) becomes xe0, and so on. I sometimes use external parsers written in Java to do that, but it remains tedious and the resulting code is verbose.
Indeed using normal variables gives a huge boost.
For critical single-index arrays, it's best to use Areg(), Dreg() and Amreg() (if possible).

Quote:
I see arrays in the screenshot, but can your tool handle this case (replacing arrays where possible, or optimizing array access to be as efficient as using single variables)?
Nope. The tool optimizes the code that calculates the address of an item within an array, discarding the safety checks (they're pretty useless for compiled code). More precisely, this is what the tool does (straight from the comments in the code):

Code:
The assignment A(...)=... gets compiled as follows:
    <fetched/calculated value to assign gets put in d3>
    move.l d3,-(a3)
    <fetched/calculated index of first dimension gets put in d3>  
    move.l d3,-(a3)
    ...
    <fetched/calculated index of last dimension gets put in d3>
    move.l d3,-(a3)
    lea.l  *(a6),a0
    jsr    *(a4)
    move.l (a3)+,(a0)
where:
 * a0 ends up pointing to the address of the array descriptor;  
 * the code at *(a4) is the routine that performs the safety and type
   checks, and puts in a0 the address where the value is to be stored.

The array descriptor is (offset: content):
   0: number of dimensions
   1: log2(item size)
 2-3: maximum index of first dimension
 4-5: number of items in previous dimensions (for first dimension: 1)
 ...: ...
 ...: maximum index of last dimension
 ...: number of items in previous dimensions

The routine is:
    move.l  (a0),d0  ;2010      get array descriptor address    
    beq.w   #$0024   ;6700 0024 if array undefined...  
    movea.l d0,a0    ;2040      get array descriptor address  
    move.b  (a0)+,d3 ;1618      get number of dimensions  
    move.b  (a0)+,d4 ;1818      get log2 of item size
    moveq.l #0,d0    ;7000      clear high word
    moveq.l #0,d2    ;7400      reset number of items to skip from array beginning  
.l  move.w  (a0)+,d0 ;3018      get maximum index
    move.l  (a3)+,d1 ;221b      get desired index
    cmp.l   d0,d1    ;b280      check index against maximum possible  
    bhi.w   *        ;6200 **** if index too big...
    mulu.w  (a0)+,d1 ;c2d8      calculate number of items relative to previous dimensions
    add.l   d1,d2    ;d481      update number of items to skip  
    subq.b  #1,d3    ;5303      check next dimension  
    bne.b   .l       ;66ee      if dimensions not over...
    lsl.l   d4,d2    ;e9aa      calculate offset of item as index<<log2(item size)  
    adda.l  d2,a0    ;d1c2      calculate address of item
    rts              ;4e75

This code replaces the routine with:
    movea.l (a0),a0  ;2050 get array descriptor address  
    move.b  (a0)+,d3 ;1618 get number of dimensions  
    move.b  (a0)+,d4 ;1818 get log2 of item size  
    moveq.l #0,d0    ;7000 clear high word  
    moveq.l #0,d2    ;7400 reset number of items to skip from array beginning  
.l  addq.w  #2,a0    ;5448 skip maximum index  
    move.l  (a3)+,d1 ;221b get desired index  
    mulu.w  (a0)+,d1 ;c2d8 calculate number of items relative to previous dimensions  
    add.l   d1,d2    ;d481 update number of items to skip  
    subq.b  #1,d3    ;5303 check next dimension  
    bne.b   .l       ;66f4 if dimensions not over...  
    lsl.l   d4,d2    ;e9aa calculate offset of item as index<<log2(item size)  
    adda.l  d2,a0    ;d1c2 calculate address of item  
    rts              ;4e75
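
To make the effect of the patch concrete, here is a small Python model of the address calculation, written only from the comments and the listings above. It is a sketch under stated assumptions, not the tool's code: the function and parameter names are invented, the descriptor is represented as a list of (maximum index, multiplier) pairs rather than raw bytes, and the order in which indexes are paired with descriptor entries is assumed. With check_bounds=False it behaves like the replacement routine, where the maximum index is skipped and MULU.W uses only the low 16 bits of the index.

```python
def item_offset(dims, indexes, log2_item_size, check_bounds=True):
    """Byte offset of an array item from the start of the array data.

    dims: list of (max_index, multiplier) pairs, one per dimension, where
          multiplier is the number of items in the previous dimensions.
    indexes: one index per dimension, paired with dims in order (assumed).
    """
    items_to_skip = 0
    for (max_index, multiplier), index in zip(dims, indexes):
        if check_bounds and not 0 <= index <= max_index:
            # The original routine branches to an error handler here; its
            # compare is unsigned, so negative indexes fail as well.
            raise IndexError(f"index {index} exceeds maximum {max_index}")
        # MULU.W multiplies the low word of the index by the multiplier.
        items_to_skip += (index & 0xFFFF) * multiplier
    return items_to_skip << log2_item_size


# Hypothetical example: Dim A(9,4) with 4-byte items, reading A(3,2).
print(item_offset([(9, 1), (4, 10)], [3, 2], 2))
```

The saving per access is the compare and branch inside the loop plus the undefined-array test at the top, at the price of no longer catching out-of-range indexes at run time.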
Quote:
2. The Pro compiler (2.0) already claims that it optimizes multiplications and divisions by a power of two by replacing them with shifts. That is why I generally use 2^n when possible. Isn't that the case in the code produced by the AMOS Pro Compiler 2.0?
The Compiler produces this code regardless of the shift count:
Code:
   moveq.l     #<count>,d0
   asr/lsl.l   d0,d3
That's wasteful when the shift is 16, as multiplications/divisions by 65536 are done much more cheaply with swap.w + clr.w/ext.l, and also when the count is between 1 and 8, as asr/lsl are faster on 68020, 68030 and 68040 when the count is an immediate argument. APEO fixes/mitigates that.
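
The equivalence behind the shift-by-16 case can be checked with a few lines of Python that model 32-bit registers as unsigned integers. This is an illustration of the instruction semantics only (helper names are mine, and it says nothing about cycle counts): SWAP followed by CLR.W gives the same result as a left shift by 16, and SWAP followed by EXT.L gives the same result as an arithmetic right shift by 16.

```python
MASK32 = 0xFFFFFFFF


def swap(x):
    """SWAP: exchange the two 16-bit halves of a 32-bit register."""
    return ((x << 16) | (x >> 16)) & MASK32


def clr_w(x):
    """CLR.W: zero the low word, keep the high word."""
    return x & 0xFFFF0000


def ext_l(x):
    """EXT.L: sign-extend the low word to 32 bits."""
    low = x & 0xFFFF
    return (low | 0xFFFF0000) if low & 0x8000 else low


def lsl16(x):
    """Reference: logical shift left by 16 on a 32-bit register."""
    return (x << 16) & MASK32


def asr16(x):
    """Reference: arithmetic shift right by 16 on a 32-bit register."""
    signed = x - (1 << 32) if x & 0x80000000 else x
    return (signed >> 16) & MASK32


samples = (0, 1, 0x1234, 0x00012345, 0x7FFFFFFF,
           0x80000000, 0xFFFF8000, 0xFFFFFFFF)
for value in samples:
    assert clr_w(swap(value)) == lsl16(value)
    assert ext_l(swap(value)) == asr16(value)
```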

Quote:
4. Will you publish your great tool (if you haven't already)?
Nope, sorry :/ The purpose of this thread was precisely to verify whether the tool received enough interest to justify writing a proper version, writing the documentation, preparing a page and maintaining everything in the future (as I tend to update all my stuff constantly, as you can see from the history of all my projects at https://retream.itch.io), and the answer has been that there is basically no interest.
But thanks for your interest!

Quote:
Edit 1: I gather from the screenshot that you optimize Amreg(). This is a great idea, because it is used a lot in games with sprites and bobs, and it is slow.
Yes, APEO optimizes Amreg() as well. If I remember correctly, Amreg() uses the same complicated routine as Areg() and Dreg() to calculate the item address and do safety checks. APEO replaces everything with hand-made address calculation code - more precisely:

Code:
COMPILED
    move.l #$80000000,d1 ;223c 8000 0000 set flag
    bsr.w  *             ;6100 ****      call address calculation routine
    move.w (a0),d3       ;3610           read item value
    ext.l  d3            ;48c3           sign-extend value
    rts                  ;4e75

OPTIMIZED
    add.l  d3,d3         ;d683      calculate item offset
    lea.l  -$186e(a5),a0 ;41ed e792 calculate address of Amreg()
    move.w (a0,d3.l),d3  ;3630 3800 read item value
    ext.l  d3            ;48c3      sign-extend value
    rts                  ;4e75
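
In higher-level terms, the optimized read boils down to "fetch the signed 16-bit word at base + 2*index". The Python sketch below models just that, using a bytes object as a stand-in for the memory at -$186e(a5); the function name and the buffer are assumptions for illustration. As in the optimized listing, there is no range check on the index.

```python
import struct


def read_amreg(amreg_area, index):
    """Model of the optimized read: word at base + 2*index, sign-extended.

    amreg_area: bytes-like stand-in for the Amreg() storage (hypothetical).
    """
    offset = index + index  # add.l d3,d3
    # ">h" is a big-endian signed 16-bit value, matching move.w + ext.l
    # on the big-endian 68000.
    (value,) = struct.unpack_from(">h", amreg_area, offset)
    return value


# Three made-up register values packed as big-endian words.
area = struct.pack(">3h", 5, -2, 300)
print(read_amreg(area, 1))
```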
Quote:
Another thing that is slow is the (quite powerful) Rain command. Using big rains or multiple rains is very slow. I suppose this is because the resulting copperlist is slow to compute. It would be cool to optimize that, if it's not too difficult.
AMOS has hundreds of functions and the potential for optimizations is almost boundless. I focused on the most common stuff. I don't plan to add more optimizations to the tool. In future, I might add something if the need arises when making a new game (I still haven't figured out which game to make with [embedded YouTube video]).

Quote:
Edit 2 : The 64k intro (yes! and the pixelated world) are compressed using shrinkler but the other ones are not compressed
I'll have a look at them later

Last edited by saimo; 09 June 2023 at 23:50.
Old 09 June 2023, 23:05   #15
saimo
Registered User
 
 
Join Date: Aug 2010
Location: Italy
Posts: 854
@Retro1234

Quote:
Originally Posted by Retro1234 View Post
APEO can't optimize anything: is the executable compressed? Has it been produced by the AMOS Professional Compiler?

Quote:
Also some people claim Amos The Creator Compiler produces faster executables - any thoughts on this?
No idea - I never used that compiler.
Old 09 June 2023, 23:10   #16
Retro1234
Phone Homer
 
 
Join Date: Jun 2006
Location: 5150
Posts: 5,809
Yeah, it was probably compressed. I'll see if I can find the source, thanks.
Old 09 June 2023, 23:18   #17
Retro1234
Phone Homer
 
 
Join Date: Jun 2006
Location: 5150
Posts: 5,809
I started work on a program to convert AMOS to Blitz, but I never finished it. Blitz in general is "faster" at blitting Bobs.
Old 09 June 2023, 23:49   #18
saimo
Registered User
 
 
Join Date: Aug 2010
Location: Italy
Posts: 854
@alain.treesong
Quote:
Quote:
Edit 2 : The 64k intro (yes! and the pixelated world) are compressed using shrinkler but the other ones are not compressed
I'll have a look at them later
Here are the optimized demos MerryHappy, happy21, Happy2022 and NewImpact. The archive also includes the optimization reports.
I don't think the optimizations make any difference: the demos don't seem to do anything CPU-intensive and are probably frame-locked.
Notes:
* I noticed the programs can be broken with CTRL-C: if you need speed, use Comp Test Off;
* I had to remove the original executable from the NewImpact ADF as there wasn't enough space.
Attached Files
File Type: zip Agima_demos.zip (972.2 KB, 29 views)
Old 09 June 2023, 23:56   #19
Karlos
Alien Bleed
 
 
Join Date: Aug 2022
Location: UK
Posts: 4,413
Could AMOS be transpiled to C? I appreciate that one can't just magically do this without a runtime library to provide equivalent functionality for all the graphics and audio features that the language provides out of the box, but in principle, is there anything about the language, some fundamental impedance mismatch, that prevents automated conversion to C?
Old 10 June 2023, 10:08   #20
saimo
Registered User
 
 
Join Date: Aug 2010
Location: Italy
Posts: 854
Quote:
Originally Posted by Karlos View Post
Could AMOS be transpiled to C? I appreciate that one can't just magically do this without a runtime library to provide equivalent functionality for all the graphics and audio features that the language provides out of the box, but in principle, is there anything about the language, some fundamental impedance mismatch, that prevents automated conversion to C?
I'd say it would be quite straightforward to translate from AMOS to C (and the executable produced by the C compiler would surely be much more efficient than the AMOS one).

Last edited by saimo; 10 June 2023 at 12:44. Reason: Fixed typo.