English Amiga Board


Old 27 April 2019, 01:35   #161
ross
Per aspera ad astra

 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 49
Posts: 2,472
Quote:
Originally Posted by PeterJ View Post
and it uses a lot of lsr #8 and lsl #8, so the above is just perfect
For lsr #8 the trick is similar:

Code:
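    ; on entry: d0.w = $xx00
    moveq   #0,d1           ; clear d1 once, before the hot code
    ....

    move.w  d0,-(sp)        ; push d0.w; the high byte ($xx) lands at (sp)
    move.b  (sp)+,d1        ; read that byte; a byte access via sp still moves sp by 2
    ....

    ; result: d1.w = $00xx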
In this case you can use the stack (so no spare A register or memory location is needed), but you need a D register whose upper bits you must never touch.
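A quick way to check the byte order is to model it. The sketch below is a Python model, not 68000 code; the helper name `lsr8_via_stack` and the two-byte `stack` buffer are made up for illustration. It assumes only that the 68000 is big-endian and that `move.b` into a data register replaces bits 0-7 and nothing else.

```python
def lsr8_via_stack(d0, d1):
    """Model move.w d0,-(sp) followed by move.b (sp)+,d1."""
    stack = bytearray(2)
    # move.w d0,-(sp): big-endian, so the high byte sits at the lower address
    stack[0] = (d0 >> 8) & 0xFF
    stack[1] = d0 & 0xFF
    # move.b (sp)+,d1: reads the byte at sp, replaces only the low byte of d1
    return (d1 & 0xFFFFFF00) | stack[0]


print(hex(lsr8_via_stack(0xFF56, 0)))
```

With d1 cleared beforehand the result is the old high byte of d0.w; if the upper bits of d1 hold anything else, they survive into the result, which is why they must stay untouched.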
Old 27 April 2019, 05:51   #162
NorthWay
Registered User
 
Join Date: May 2013
Location: Grimstad / Norway
Posts: 642
[lsl #8]
My personal preference is to start the program with "clr.l -(sp)" and match it before end with "move.l (sp)+,d0", and then use pairs of
move.b dX,(sp)
move.w (sp),dX
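For anyone who wants to see why this pair behaves like lsl #8 on the low byte, here is a small Python model (an illustration only, not 68000 code; `lsl8_via_stack` and `scratch` are hypothetical names). It assumes the long at (sp) was zeroed earlier by clr.l -(sp) and that nothing else has written to it.

```python
def lsl8_via_stack(dx):
    """Model move.b dX,(sp) followed by move.w (sp),dX."""
    scratch = bytearray(4)                  # the long zeroed by clr.l -(sp)
    scratch[0] = dx & 0xFF                  # move.b dX,(sp): low byte goes to the byte at sp
    word = (scratch[0] << 8) | scratch[1]   # move.w (sp),dX: big-endian word read
    return (dx & 0xFFFF0000) | word         # only the low word of dX is replaced


print(hex(lsl8_via_stack(0x56)))
```

The byte at 1(sp) is never written, so the low byte of the word read back is always zero.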
Old 27 April 2019, 09:15   #163
PeterJ
Registered User

 
Join Date: Feb 2015
Location: Copehagen
Posts: 36
Quote:
Originally Posted by ross View Post
For lsr #8 the trick is similar:

Code:
    d0.w=xx00
    moveq   #0,d1
    ....

    move.w  d0,-(sp)
    move.b  (sp)+,d1
    ....
    
    d1.w=00xx
In this case you can use the stack (so no spare A register or memory location is needed), but you need a D register whose upper bits you must never touch.

I just tried with $ff56 and the result was $ff.

Isn't it only with movem.w that the upper word is cleared or set depending on bit 15?

Last edited by PeterJ; 27 April 2019 at 09:30. Reason: edit just add some stuf
Old 27 April 2019, 10:21   #164
ross
Per aspera ad astra

 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 49
Posts: 2,472
Quote:
Originally Posted by NorthWay View Post
[lsl #8]
My personal preference is to start the program with "clr.l -(sp)" and match it before end with "move.l (sp)+,d0", and then use pairs of
move.b dX,(sp)
move.w (sp),dX

You just have to be careful not to use it in a nested routine (inside a called routine, (sp) points at the return address rather than at the cleared long), so your sentence should read "to start the subroutine with".


Quote:
Originally Posted by PeterJ View Post
i just tried with $ff56 and the result was $ff
As it should (I simply wrote d0=$xx00 because the low bits are lost anyway, so they can be anything).
But judging from your next sentence, did you perhaps mean the asr instruction?
Quote:
is it not only if you use movem.w that it clear or set the upper word depending of bit15?
Regardless, movem deals with words (or longs) and never with bytes.
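To illustrate the difference being asked about, here is a small Python model (not 68000 code; both helper names are invented for this sketch): movem.w into a data register sign-extends the word to 32 bits, while move.b only replaces the low byte.

```python
def movem_w_load(word):
    """Model movem.w <ea>,Dn: the loaded word is sign-extended to 32 bits."""
    word &= 0xFFFF
    return (word | 0xFFFF0000) if (word & 0x8000) else word


def move_b_load(dn, byte):
    """Model move.b <ea>,Dn: only bits 0-7 of Dn change."""
    return (dn & 0xFFFFFF00) | (byte & 0xFF)


print(hex(movem_w_load(0xFF56)), hex(move_b_load(0, 0xFF)))
```

So the $ff seen in the test is simply the high byte of $ff56 copied into a cleared register, with no sign extension involved.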
Old 29 April 2019, 12:36   #165
hooverphonique
ex. demoscener "Bigmama"

 
Join Date: Jun 2012
Location: Fyn / Denmark
Posts: 1,062
Quote:
Originally Posted by PeterJ View Post
and it uses a lot of lsr #8 and lsl #8, so the above is just perfect
If they are used in hot code, maybe the solution is to refactor the necessity for these shifts away completely
Old 14 August 2019, 20:51   #166
NorthWay
Registered User
 
Join Date: May 2013
Location: Grimstad / Norway
Posts: 642
Quote:
Originally Posted by NorthWay View Post
once upon a time there was a thing called the GNU(gcc?) super-optimizer
I found this reference to it: https://courses.cs.washington.edu/co...s/massalin.pdf
Old 17 August 2019, 00:08   #167
Photon
Moderator

 
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 4,818
Quote:
Originally Posted by NorthWay View Post
A factor from my experience is that optimized C runs 10x slower than optimized Assembly, and that C++ cannot be fully reduced to C for an application. (This is up to about 68020/80386, where actually Pascal in some cases had a lower factor than C. Since the popularity of C, this may have changed, but not measured by me.)

Later, on-chip caches affected performance more than the number and time length of instructions, and this allowed utility applications to not be bogged down and reduce this factor.

But even after this hardware acceleration (circa 1990), applications such as games and demos would never use C (or C++) in time-critical sections for another half a decade, as we know.

It's true to this day that any high-level language (or one posing as such!) will always be beaten by a great margin by "simply" writing the program in Assembly. (The advantage of truly portable languages is of course the portability and less code to write, if you're not using macros.)

All this to make clear that there is no language level higher than Assembly that will ever generate as efficient (or small) code as writing it in Assembly

It's self-evident. But just to give factors for the performance loss paid. The compiler doesn't know what you're trying to do, so it can't deliver the perfect translation.
Old 04 February 2020, 15:19   #168
sparhawk
Registered User

 
Join Date: Sep 2019
Location: Essen/Germany
Age: 51
Posts: 304
Maybe there is a faster way to clear the upper word of a register?

Replace this (16 cycles):
Code:
    and.l   #$ffff,d0
With this (12 cycles):
Code:
    moveq   #0,d1
    move.w  d0,d1
    move.l  d1,d0
I try to avoid the second move where possible by arranging the registers appropriately, in which case the count goes down to 8 cycles.

Also 12 cycles, but only one register is needed:
Code:
    swap    d0
    clr.w   d0
    swap    d0
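The three sequences can be checked against each other with a small Python model (a sketch, not 68000 code; the function names are made up, and the cycle figures are simply the ones quoted above, not measured here).

```python
MASK32 = 0xFFFFFFFF


def swap(d):
    """swap Dn: exchange the upper and lower words."""
    return ((d << 16) | (d >> 16)) & MASK32


def clear_upper_and(d0):
    """and.l #$ffff,d0"""
    return d0 & 0xFFFF


def clear_upper_moves(d0):
    """moveq #0,d1 / move.w d0,d1 / move.l d1,d0"""
    d1 = 0                                   # moveq #0,d1
    d1 = (d1 & 0xFFFF0000) | (d0 & 0xFFFF)   # move.w d0,d1
    return d1                                # move.l d1,d0


def clear_upper_swaps(d0):
    """swap d0 / clr.w d0 / swap d0"""
    d0 = swap(d0)
    d0 &= 0xFFFF0000                         # clr.w d0
    return swap(d0)


# Cycle counts as quoted in the post
CYCLES = {"and.l": 16, "moves": 4 + 4 + 4, "swaps": 4 + 4 + 4}

for value in (0x12345678, 0xFFFF8000, 0):
    print(hex(clear_upper_and(value)),
          hex(clear_upper_moves(value)),
          hex(clear_upper_swaps(value)))
```

All three variants leave the same value in d0; they differ only in cycle count and in whether a second register is needed.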

Last edited by sparhawk; 04 February 2020 at 15:24.
Old 04 February 2020, 16:51   #169
ross
Per aspera ad astra

 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 49
Posts: 2,472
Quote:
Originally Posted by sparhawk View Post
Maybe there is a faster way to clear the upper word of a register?
If you know for sure that it's a positive value (bit 15 clear), you can use ext.l dx (4 cycles).

But usually I keep a register with the upper part zeroed outside the main loop and then move data only into the lower part.
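A tiny Python model (illustrative only, `ext_l` is a made-up helper name) shows why the ext.l shortcut depends on the sign of the word: ext.l copies bit 15 into bits 16-31, so it clears the upper word only when bit 15 is 0.

```python
def ext_l(dx):
    """Model ext.l Dn: sign-extend the low word into the upper word."""
    low = dx & 0xFFFF
    return (low | 0xFFFF0000) if (low & 0x8000) else low


print(hex(ext_l(0xABCD1234)), hex(ext_l(0xABCD8000)))
```

For a word with bit 15 set the upper word becomes $ffff instead of $0000, which is the general case the shortcut cannot cover.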
Old 04 February 2020, 16:53   #170
sparhawk
Registered User

 
Join Date: Sep 2019
Location: Essen/Germany
Age: 51
Posts: 304
Quote:
Originally Posted by ross View Post
If you know for sure that's a positive value, you can ext.l dx (4 cycles)

Yes, that would be the obvious solution. But it depends; in the general case, I can't know that.


I usually do a lot of prototyping in Easy68k and see if I can find faster solutions as it tells me the cycle count, which is IMO a great feature for that.
Old 21 February 2020, 14:03   #171
Antiriad_UK
OCS forever!

 
Join Date: Mar 2019
Location: Birmingham, UK
Posts: 190
One I saw in another thread, which made me scratch my head for a while, is using add dx,dx to simultaneously test and clear a "flag". For example, you have a loop in which you set a flag from 0 to 1 if something occurred. Then at the end of the loop you check the flag to see whether you need to loop again, and reset the flag (I did this in a sorting routine).

Instead of:
Code:
.loop:
	moveq	#0,d0			;reset flag
	...
	;If something occurred, flag it
	moveq	#1,d0
	...
	;Do we need to loop again?
	tst.w	d0
	bne.s	.loop
Do this:
Code:
	moveq	#0,d0			;reset flag once
.loop:
	...
	;If something occurred, flag it
	moveq	#-128,d0		;set flag = $80 ($ffffff80)
	...
	;Do we need to loop again? Also reset flag
	add.b	d0,d0
	bcs.s	.loop
Can do the same thing with subq and bmi, but I liked the use of carry
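For anyone puzzling over it, the byte arithmetic can be modelled in a few lines of Python (a sketch, not 68000 code; `add_b_self` is a hypothetical helper that returns the new register value and the carry flag).

```python
def add_b_self(d0):
    """Model add.b d0,d0: returns (new d0, carry)."""
    low = d0 & 0xFF
    total = low + low
    carry = total > 0xFF                    # carry out of bit 7
    return (d0 & 0xFFFFFF00) | (total & 0xFF), carry


flagged, carry = add_b_self(0xFFFFFF80)     # after moveq #-128,d0
print(hex(flagged), carry)
```

With the flag set, $80 + $80 = $100, so the low byte becomes 0 and the carry is set; the upper 24 bits stay $ffffff, which is harmless because only the byte is tested. With the flag byte already 0, the add leaves it at 0 and the carry clear, so the loop exits.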
Old 21 February 2020, 14:50   #172
hooverphonique
ex. demoscener "Bigmama"

 
Join Date: Jun 2012
Location: Fyn / Denmark
Posts: 1,062
Quote:
Originally Posted by Antiriad_UK View Post
Code:
    moveq    #0,d0            ;reset flag once
.loop:
    ...
    ;If something occurred, flag it
    moveq    #-128,d0        ;set flag = $80 ($ffffff80)
    ...
    ;Do we need to loop again? Also reset flag
    add.b    d0,d0
    bcs.s    .loop
I stared at this for a while without seeing how it would reset the flag (carry), but I suppose you meant "reset" in the sense of returning d0 to zero?
Old 21 February 2020, 15:00   #173
Pixelfill
Registered User

 
Join Date: Dec 2019
Location: Preston
Posts: 28
Quote:
Originally Posted by hooverphonique View Post
I stared at this for a while without seeing how it would reset the flag (carry), but I suppose you meant "reset" in the sense of returning d0 to zero?
I may be wide of the mark here, but I'm guessing that adding the bytes -128 and -128 gives -256, which overflows a byte (carry) and leaves d0.b set to 00?
The key part, I believe, is not the add.b but the fact that d0 contains $xxxxxx80 beforehand from the moveq.

forgive me if I'm wrong as I'm just starting out.

Mike
Old 21 February 2020, 15:05   #174
Antiriad_UK
OCS forever!

 
Join Date: Mar 2019
Location: Birmingham, UK
Posts: 190
Quote:
Originally Posted by hooverphonique View Post
I stared at this for a while without seeing how it would reset the flag (carry), but I suppose you meant "reset" in the sense of returning d0 to zero?
Yes, it resets d0.b to 0, so you can save a whole 4 cycles per loop
Old 21 February 2020, 15:42   #175
hooverphonique
ex. demoscener "Bigmama"

 
Join Date: Jun 2012
Location: Fyn / Denmark
Posts: 1,062
Quote:
Originally Posted by Pixelfill View Post
I may be wide of the mark here, but I'm guessing adding bytes -128 to -128 results in -256, therefore an overflow beyond a byte (carry) and also results in d0.b set to 00?
the key part I believe is not the add.b. but the fact that d0 contains $xxxxxx80 beforehand from the moveq
Yes, you're right, and it's also where I arrived at, hence the last part of my previous comment
Old 02 March 2020, 16:05   #176
VladR
Registered User

 
Join Date: Dec 2019
Location: North Dakota
Posts: 142
Quote:
Originally Posted by Photon View Post
A factor from my experience is that optimized C runs 10x slower than optimized Assembly, and that C++ cannot be fully reduced to C for an application. (This is up to about 68020/80386, where actually Pascal in some cases had a lower factor than C. Since the popularity of C, this may have changed, but not measured by me.)
I hear you. I myself spent a long time on the Jaguar being dumbfounded, after each build (there is a linker option to view the resulting ASM), by pages of ASM code (per 1-2 lines of C code) generated by the C compiler.


Quote:
Originally Posted by Photon View Post
It's true to this day that any high-level language (or one posing as such!) will always be beaten by a great margin by "simply" writing the program in Assembly. (The advantage of truly portable languages is of course the portability and less code to write, if you're not using macros.)
Unfortunately, that is not true anymore



I burnt through a great heap of money over the last two years designing Higgs, which is slightly lower-level than C but which I designed to be identical in speed to hand-written ASM. (The current estimate is between $150,000 and $200,000, rising daily as I keep working on it alongside the game, so keep this in mind before you ask for a free download: it's literally like asking someone to donate an average American house.)


Current features are:
- full access to all registers and ASM instructions
- choice of WorkingRegister to use by Higgs if the feature requires it
- global/local variables/constants
- byte/word/long access via .bwl (default is long, so no need to specify .l)
- arrays
- structures
- typecasting (word/long)
- conditions
- loops (continue + break)
- blocks {} allowing to pollute the name-space only within the current block
- debug printing
- function declarations with parameters (your choice of registers or global or local variables)
- function call with or without parameters
- local functions (invisible to outside world) like in Pascal
- push/pop stack syntax
- basic math operations (signed var1 = var2 * var3), (var3 += var1)



All of the features above are possible to implement (Higgs is written in C#) with the exact same instruction footprint as if you wrote it manually in Asm.

Some common C features like switch or do-while are high on my to-do list - I somehow managed to write the game without them, to my surprise, so they simply didn't get implemented yet.
On-Demand Inlining (i.e. only when you want it, though you can still force it always) is in my Top 5. On the 6502 target I have Unroll Loop; this still needs to be implemented for the 68000 target.


You still have to think in terms of byte/word/long access and still have to prefer registers to variables (but don't have to if you don't feel like it). You are solely responsible for contents of registers, but if you want - you have an option to code using just variables.



Primarily, this targets .68000.
Most of the features are implemented also for a RISC backend (Jaguar's GPU and DSP processors). I also have a .6502 and .6502C targets (though those are currently simplest).



Once networking gets enabled in core for Vampire (and I can start deploying builds to my V4), I will make .68080 target, eventually with AMMX support.


Quick example:
Code:

 ; Arrays of structures are supposed to be accessed sequentially
 ; each time you simply advance the pointer via Next () which is a simple add.l #StructSizeOf,ptrStruct


  array SLaserShot LaserShots [MaxLaserShots]   ; Player's lasers
  SLaserShot.UseRegister (a2) ; Use this register for access



 Animate_LaserShots:
 {
     ; Animate (localZ + camY) Already Active LaserShots
    { ; Player's LS
         register d7:lpMain
          ; Keep d1 as WorldSpeed, since SLaserShot_UpdateZ requires d1 as input
         register d1:WorldSpeed  ; PlayerSpeed + LS_Speed
         register d2:CurrentPlayerSpeed
         CurrentPlayerSpeed = PlayerSpeed >> #3

         SLaserShot.InitRegister (LaserShots)
         loop (lpMain = #MaxLaserShots)
         {
             WorldSpeed = CurrentPlayerSpeed + SLaserShot.Speed
           ; print2H (SLaserShot.camY,SLaserShot.camZ,#110,#50)
             if.l (SLaserShot.IsActive == #1)
             {
                 if.l (SLaserShot.FrameDeactivate <= Frame:d0)
                 { ; Disable LS if it travelled too far
                     SLaserShot.IsActive = #0
                 }
                 else
                 { ; LS can still remain active
                      SLaserShot_UpdateZ () ; Update Z
                      SLaserShot_UpdateY () ; Update Y (after Z, so it is sync'ed)
                 }
             }
             SLaserShot.Next ()
       }
    }
 rts
 }
Quote:
Originally Posted by Photon View Post
All this to make clear that there is no language level higher than Assembly that will ever generate as efficient (or small) code as writing it in Assembly

It's self-evident. But just to give factors for the performance loss paid. The compiler doesn't know what you're trying to do, so it can't deliver the perfect translation.
Not true for my Higgs.
Granted, it's lower level than C as it's not supposed to be completely safe and idiot-proof, like C is.


But it's far easier to add/remove Higgs code compared to ASM. The mental effort required for pure ASM (nested irregular conditions, etc.) makes it hard to simply discard the code you wrote. In Higgs, I don't even think about that - I simply delete the code and rewrite from scratch. Let the compiler insert all the jump labels and figure out the proper comparison/Bcc instruction based on the parameters.




Quote:
Originally Posted by Photon View Post
It's self-evident. But just to give factors for the performance loss paid. The compiler doesn't know what you're trying to do, so it can't deliver the perfect translation.
Real-world example of my Higgs.
On Atari Jaguar, 98% of code was written for 68000 and only 4 KB in RISC (3D transform and rasterizer loop).


So, quite literally, everything else is 68000. That's:
- input,
- Z-sorting,
- culling World track mesh,
- double-buffering,
- creating doublebuffered polygon list for RISC GPU,
- strafing physics,
- collision detection,
- camera,
- full 8-state AI,
- spawning enemies,
- procedural random generation of enemy RPG parameters,
- HUD,
- managing Jaguar's ObjectProcessor list (and related IRQ),
- damage equations.
And about two dozen things I didn't think of right this moment.


On the Jag, about 90% of that was rewritten into Higgs (I started with 100% ASM and gradually rewrote more parts as I kept adding Higgs features; on Amiga it is 100% Higgs), yet benchmarks showed that it took only 10% of the frame time on the 13.3 MHz 68000.


Meaning, I could still run the full game logic ten times per frame and keep 60 fps. So even if the Motorola were 10x slower, at about 1.3 MHz, the logic should still fit within a frame. Now that's funny
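The back-of-the-envelope arithmetic behind that claim can be written out as follows. This is only a sketch using the figures quoted in the post (13.3 MHz, 60 fps, 10% of the frame); the variable names are made up, and it ignores bus contention and anything else that a real Jaguar would add.

```python
CLOCK_HZ = 13_300_000   # 68000 clock as quoted in the post
FPS = 60
LOGIC_SHARE = 0.10      # fraction of the frame spent in game logic, per the post

cycles_per_frame = CLOCK_HZ / FPS
logic_cycles = cycles_per_frame * LOGIC_SHARE
# Slowest clock at which the same logic would still fit into one frame
min_clock_hz = logic_cycles * FPS

print(round(cycles_per_frame), round(logic_cycles), round(min_clock_hz))
```

That works out to roughly 221,667 cycles per frame, of which about 22,167 go to the game logic, so a clock of about 1.33 MHz would be the break-even point under these assumptions.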
Old 03 March 2020, 15:11   #177
phx
Natteravn

 
Join Date: Nov 2009
Location: Herford / Germany
Posts: 1,602
Quote:
Originally Posted by VladR View Post
during last two years on designing Higgs, which is slightly lower-level than C, but I designed it to be identical in speed to hand-written ASM.
Higgs looks like a really interesting high-level assembler language, which might be useful to speed up development.

But to claim that it reaches identical speed to hand-optimized assembler cannot be true, so I have to defend Photon's statement here. You always have to make compromises when translating a high-level (even the lowest high-level) language into machine code. Give me a program generated by Higgs and I (and many other coders here) will always be able to show you sequences which allow optimization.
Old 03 March 2020, 20:17   #178
VladR
Registered User

 
Join Date: Dec 2019
Location: North Dakota
Posts: 142
Quote:
Originally Posted by phx View Post
Higgs looks like a really interesting high-level assembler language, which might be useful to speed up development.

But to claim that it reaches identical speed to hand-optimized assembler cannot be true, so I have to defend Photon's statement here. You always have to make compromises when translating a high-level (even the lowest high-level) language into machine code. Give me a program generated by Higgs and I (and many other coders here) will always be able to show you sequences which allow optimization.
I really like the term HighLevel Assembler - after all, the baseline is the vasm source code file, where the Higgs Parser merely inserts new lines.


That's how it started - first with macros, then macro modifications at compile-time, and eventually parsing expressions and simple commands (loops, conditions, blocks, etc.).

Yeah, I probably wouldn't use the term "hand-optimized". Rather, I use "hand-written", meaning the same efficiency as if I wrote it by hand in ASM (though, it is certainly possible to write a slightly faster version, if you are willing to bastardize the code to the point it's unreadable later).


It's always possible, in ASM, to rearrange and rewrite certain combinations of instructions to save some cycles (as this thread has demonstrated probably dozens of times).


But, that creates unmaintainable code (long-term). You save 4 cycles by abusing some fluke register dependency, and when you need to change the code, boom. You burn half day debugging wth is going on




I'm sure we all did the same thing:
- you write version 1 - it works, it is nicely documented or even self-documented
- you spot something, make version 2 and it saves some cycles
- you do the same and have version 3
- 3 months later you make some change elsewhere that breaks some of the dependencies brought by optimizations (because you now use higher 16 bits or whatever else it is).


Now, it is possible to implement a final optimizer pass that goes over the code, examines the register state and replaces certain combinations of ops with different, faster ones (like those mentioned in this thread).
That would indeed be useful for the 68000, but since I now focus on Vampire and 68040-68060, it's not really critical for me.






That raises the question: is there already some kind of optimizer like this for the 68000? Something that would analyze the code and find combos of ops that are safe to replace with faster ones?
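To make the idea concrete, here is a deliberately naive sketch of such a pass in Python. It is purely hypothetical and is neither Higgs nor any existing tool: it knows a single rule taken from this thread (and.l #$ffff,dN becomes swap/clr.w/swap), matches only lines without trailing comments, and does no register or flag analysis at all, which a real pass would need before a replacement could be called safe.

```python
import re

# Hypothetical single rule: and.l #$ffff,dN -> swap dN / clr.w dN / swap dN
AND_MASK = re.compile(r"^\s*and\.l\s+#\$(?:0000)?ffff,(d[0-7])\s*$", re.IGNORECASE)


def peephole(lines):
    """Return a new instruction list with the one known pattern replaced."""
    out = []
    for line in lines:
        match = AND_MASK.match(line)
        if match:
            reg = match.group(1)
            out += [f"    swap    {reg}",
                    f"    clr.w   {reg}",
                    f"    swap    {reg}"]
        else:
            out.append(line)        # anything unrecognised passes through untouched
    return out


for line in peephole(["    and.l   #$ffff,d0", "    rts"]):
    print(line)
```

A usable optimizer would carry a table of such rules plus the conditions under which each is valid, for example that a scratch register is free or that a value is known to be positive.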
Old 03 March 2020, 20:38   #179
StingRay
move.l #$c0ff33,throat

 
Join Date: Dec 2005
Location: Berlin/Joymoney
Posts: 6,383
Quote:
Originally Posted by VladR View Post
(though, it is certainly possible to write a slightly faster version, if you are willing to bastardize the code to the point it's unreadable later).
Optimising doesn't necessarily equal unreadable!


Quote:
Originally Posted by VladR View Post
But, that creates unmaintainable code (long-term).
And neither does it mean the code is unmaintainable.
Old 03 March 2020, 20:55   #180
mr.spiv
Registered User
 
Join Date: Aug 2006
Location: Finland
Age: 47
Posts: 108
Quote:
Originally Posted by VladR View Post
But, that creates unmaintainable code (long-term). You save 4 cycles by abusing some fluke register dependency, and when you need to change the code, boom. You burn half day debugging wth is going on
Somehow I found myself here