English Amiga Board

68000 code optimisations (https://eab.abime.net/showthread.php?t=57587)

frank_b 05 February 2011 22:24

Oops, I just edited instead of creating a new post :) Anyway:

The blitter should be about the same speed as the CPU for a clear. It should be around 4 cycles per word.

Some machines don't have a blitter so it's good to know these things :D
Besides, you might need to quickly copy or clear a fast RAM buffer some time :)

There is a drawback to the movem method of clearing memory. Can you guess what it is?

Agreed on the CPU flags. There is another use case for this, however: the seemingly redundant adda/suba etc. instructions are useful because they don't set the CPU flags.

You can do a beq blah, perform some adda/suba address register calculations, and then do a subsequent beq later without an additional compare in between. You can also conditionally call a subroutine that doesn't affect the CPU flags, based on a branch from a previous compare, add, etc.

I'm assuming everyone knows the power-of-two remainder trick, but I thought I'd post it anyway.

Bravo for the thread. It's an interesting topic. :)

StingRay 06 February 2011 00:12

Quote:

Originally Posted by pmc (Post 734615)
But on a 68000-equipped Amiga this wouldn't be faster than using the blitter to clear memory, surely?

Pretty common method to clear the screen on 68000 is this:
Clear the upper half using the blitter, clear the bottom half using the CPU [movem.l dx-ax,-(a7)]

Quote:

Originally Posted by frank_b (Post 734623)
There is a draw back to the movem method to clear memory. Can you guess what it is?

Depending on the size of memory you want to clear and the number of registers used in the movem loop, you may have to clear the last bytes with some extra move.b/.w/.l instructions.

frank_b 06 February 2011 00:28

Quote:

Originally Posted by StingRay (Post 734639)
Pretty common method to clear the screen on 68000 is this:
Clear the upper half using the blitter, clear the bottom half using the CPU [movem.l dx-ax,-(a7)]

Depending on the size of memory you want to clear and the number of registers used in the movem loop, you may have to clear the last bytes with some extra move.b/.w/.l instructions.

True. The drawback I was hinting at was interrupt latency, however.

The blitter should only use 1 DMA slot with a clear (D only). There should be some concurrency with the CPU as long as bitplane DMA isn't active. I'll need to check this next time I'm near a real Amiga. :)

pmc 06 February 2011 10:11

Quote:

Originally Posted by frank_b
The blitter should be about the same speed as the cpu for a clear. It should be around 4 cycles per word.

OK, thanks. :great

Quote:

Originally Posted by frank_b
Some machines don't have a blitter so it's good to know these things

True. :)

Quote:

Originally Posted by frank_b
Besides you might need to quickly copy or clear a fast RAM buffer some time

and true again. :D

Quote:

Originally Posted by StingRay
Pretty common method to clear the screen on 68000 is this:
Clear the upper half using the blitter, clear the bottom half using the CPU [movem.l dx-ax,-(a7)]

OK, good to know. One thing though: I take it you have to save/restore the stack pointer (i.e. where a7 points) before/after the movem clear operation, so as not to kill the stack and crash...?

frank_b 06 February 2011 12:21

Quote:

Originally Posted by pmc (Post 734708)
OK, good to know. One thing though, I take it that you have to save / restore the stack (what I mean is: where a7 points to) before / after the movem clear operation so as not to kill the stack --> crash...?

The 68k has two stack pointers (USP and SSP), remember ;)

pmc 06 February 2011 12:29

Quote:

Originally Posted by frank_b
The 68k has two stack pointers remember

True for a third time! :D

TheDarkCoder 15 February 2011 12:05

Quote:

Originally Posted by Toni Wilen (Post 734371)
"Undocumented un-optimization": bset x,dn (and friends) take 2 cycles more if x >= 16.

interesting!
does it happen only on 68000?

Toni Wilen 16 February 2011 08:52

Quote:

Originally Posted by TheDarkCoder (Post 736777)
interesting!
does it happen only on 68000?

I assume it is another limitation caused by the 68000/010 being internally pseudo-32-bit (all registers are 2x16-bit, the ALU is 16-bit, etc.), so most 32-bit operations take longer than 8/16-bit operations.

68020+ are fully 32-bit and shouldn't have these kinds of restrictions. (But internal pipelining and buffering, even when caches are disabled, make this kind of timing measurement really tricky, if not impossible.)

pmc 20 April 2012 11:21

little optimisation - maybe obvious, maybe not but here goes...

Instead of:

Code:

        lea     vals(pc),a0
        move.w  (a0)+,d0
        move.w  (a0),d1
        ext.l   d0              ; sign-extend word to long
        ext.l   d1

this:

Code:

        lea     vals(pc),a0
        movem.w (a0),d0-d1      ; loads both words and sign-extends them to .l

gets you the ext.l's for free :)

StingRay 20 April 2012 14:10

Yes, often used optimisation because movem sign extends. :) Not always what you want but often quite nifty indeed.

h0ffman 28 April 2012 02:31

Wow, there are some serious tips in this thread! Really need to get back into some code soon!

Samurai_Crow 28 April 2012 03:59

Yes there are! I wonder how many of these tricks also apply to the '020+. I think the vasm optimizations are documented for different processor generations too.

Leffmann 28 April 2012 19:31

Simple code size optimization:
Code:

        move.l  (A0),D0
        and.l   #$3F,D0
-->
        moveq   #$3F,D0
        and.l   (A0),D0


mark_k 28 April 2012 22:08

A possible optimisation, but note that TAS sets the condition codes differently. Useful if you want to set bit 7 of a data register and don't care about its previous contents.

BSET #7,Dn → TAS Dn
ORI.B #$80,Dn → TAS Dn

The only condition code affected by BSET is Z (set if bit 7 was 0, cleared otherwise). For TAS, N is set if bit 7 was already 1. Z is set if Dn.B was 0. V and C are cleared.
For ORI, condition codes are set similarly to TAS, except they refer to the "after" value, whereas the TAS condition codes refer to the "before" value. So Z will never be set with ORI.B #$80,Dn.

The TAS instruction isn't generally/reliably usable when accessing memory on the Amiga, due to the locked read-modify-write cycle it uses. But since there's no memory access when the operand is a data register using it is okay in that case.

Samurai_Crow 28 April 2012 23:06

Quote:

Originally Posted by Leffmann (Post 815222)
Simple code size optimization:
Code:

        move.l  (A0),D0
        and.l   #$3F,D0
-->
        moveq   #$3F,D0
        and.l   (A0),D0


I had forgotten that moveq takes a larger immediate than addq or subq (-128 to 127, versus 1 to 8). But you are correct!

Galahad/FLT 28 April 2012 23:24

Quote:

Originally Posted by mark_k (Post 815245)
The TAS instruction isn't generally/reliably usable when accessing memory on the Amiga, due to the locked read-modify-write cycle it uses. But since there's no memory access when the operand is a data register using it is okay in that case.

It's just not advised to use TAS at all on the Amiga... EVER!

Photon 28 April 2012 23:37

Quote:

Originally Posted by pmc (Post 813791)
Code:

        lea     vals(pc),a0
        movem.w (a0),d0-d1      ; loads both words and sign-extends them to .l

gets you the ext.l's for free :)

Indeed it does, and
Code:

        movem.w Vals(PC),d0-d1
saves a further 4 cycles. (Yeah, yeah, I know it's obvious.)

Mainly replied to say that movem has an overhead which makes it break even at a count of 3 registers. Here, 2 are faster only because of the desired sign extends.

The instructions take the cycles they take, and there are no instruction-reordering optimizations on the 68000, apart from the prefetch after the write to BLTSIZ and the hard-to-predict odd-cycle alignment wait of instructions that take 6/10/14 etc. cycles.

A simple one, for when you have a loop loading registers from memory: back up the value after the end of the data, then pre-poke a magic exit value there (say, zero or a negative number) instead of checking an end address or a loop counter with DBF. Since you're loading the registers anyway, a simple bmi.s Done instead of dbf Dn,KeepOn saves 2 cycles.

The same is true for other branches inside loops; you may save 4 cycles for 50% of the branches if all branches jump outside the loop. More, if there is a bias toward either true or false.

Optimized an unrolled loop yesterday from 64 cycles to 51.75 cycles average :nuts

Thorham 29 April 2012 11:12

Quote:

Originally Posted by Galahad/FLT (Post 815256)
It's just not advised to use TAS at all on the Amiga... EVER!

Not even when not accessing memory? Doesn't seem like it could hurt then.

Anyway, if you want to use something like tas, just use bset/bclr instead. It will first test the specified bit and then set/clear that bit.

mark_k 29 April 2012 14:08

Quote:

Originally Posted by Galahad/FLT (Post 815256)
Its just not advised to use TAS at all on Amiga... EVER!

Have you checked whether that applies when the operand is a data register? I'm pretty sure it doesn't, though maybe someone with access to a logic analyser could check for sure.

The problems arise when the TAS operand refers to memory. It probably can't be used in chip RAM (or slow $C00000 RAM). Some true fast RAM expansions might not support read-modify-write cycles either.

Apparently Commodore's Janus PC Bridgeboard software uses TAS with a memory operand. But in that case, presumably the memory is on the bridgeboard, so the software knows the R-M-W cycle is supported.

Thorham 29 April 2012 16:47

What I want to know is why TAS shouldn't be used on memory. It doesn't seem like it could hurt. I've actually tried it (in chip RAM) and it didn't seem to cause any weird behavior.


All times are GMT +2.