Oops, I just edited instead of creating a new post :) Anyway...

The blitter should be about the same speed as the CPU for a clear, around 4 cycles per word. Some machines don't have a blitter, so it's good to know these things :D Besides, you might need to quickly copy or clear a fast RAM buffer some time :)

There is a drawback to the movem method of clearing memory. Can you guess what it is?

Agreed on the CPU flags. There is another use case for this, however. The seemingly redundant adda/suba etc. instructions are useful because they don't set the CPU flags. You can do a beq blah, perform some adda/suba address register calculations, and then do a subsequent beq later without an additional compare in the middle. You can also have a subroutine that doesn't affect the CPU flags and call it depending on a branch from a previous compare, add, etc.

I'm assuming everyone knows the remainder power of two trick, but I thought I'd post it anyway. Bravo for the thread. It's an interesting topic. :)
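Assuming "the remainder power of two trick" means the usual replacement of a modulo by a power of two with an AND mask, here is a small C model of it. It is written in C rather than 68000 assembly so it can be checked anywhere, and the function names are made up for illustration:

```c
#include <stdint.h>

/* Remainder by a power of two: for an unsigned x and n = 2^k,
 * x % n == x & (n - 1). On the 68000 this replaces a slow DIVU
 * with a single AND (e.g. AND.L #15,D0 for a 32-bit x % 16). */
static inline uint32_t mod_pow2(uint32_t x, uint32_t n)
{
    /* n must be a non-zero power of two; this is not checked here */
    return x & (n - 1u);
}

/* Caveat for signed values: C's % truncates toward zero, so
 * -5 % 16 == -5, while the mask gives 11. The mask result is usually
 * what you want for wrap-around indices, but it is not the same
 * operation as a signed remainder. */
static inline int32_t mod_pow2_signed_mask(int32_t x, int32_t n)
{
    return (int32_t)((uint32_t)x & (uint32_t)(n - 1));
}
```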
Clear the upper half using the blitter and the bottom half using the CPU [movem.l dx-ax,-(a7)].
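For the CPU half, a C model of the movem.l clear pattern, assuming the registers have been zeroed beforehand. BLOCK = 12 is a hypothetical register count (the real number depends on how many registers you can free up), and the pointer p stands in for a7 in the quoted movem.l dx-ax,-(a7). The leftover that doesn't fill a whole block is cleared one long at a time:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical number of zeroed registers stored per MOVEM.L. */
#define BLOCK 12

/* Zero nlongs longwords, BLOCK at a time, walking down from the end
 * of the buffer the way the -(An) predecrement addressing mode does. */
static void clear_downward(uint32_t *buf, size_t nlongs)
{
    uint32_t *p = buf + nlongs;          /* start just past the end */
    size_t blocks = nlongs / BLOCK;
    size_t rest = nlongs % BLOCK;

    while (blocks--) {                   /* one MOVEM.L regs,-(An) */
        p -= BLOCK;
        for (size_t i = 0; i < BLOCK; i++)
            p[i] = 0;
    }
    while (rest--)                       /* leftover longs, one at a time */
        *--p = 0;
}
```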
The blitter should only use 1 DMA slot with a clear (D only). There should be some concurrency with the CPU as long as bitplane DMA isn't active. I'll need to check this next time I'm near a real Amiga. :)
Does it happen only on the 68000?
The 68020+ are fully 32-bit and shouldn't have these kinds of restrictions. (But internal pipelining and buffering, even when caches are disabled, make this kind of timing measurement really tricky, if not impossible.)
A little optimisation, maybe obvious, maybe not, but here goes...
Instead of: Code:
lea vals(pc),An Code:
lea vals(pc),An
Yes, an often-used optimisation, because movem.w sign-extends. :) Not always what you want, but often quite nifty indeed.
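To show why the sign extension matters, a small C model (function names hypothetical): MOVEM.W to registers sign-extends every word to the full 32 bits, like MOVE.W followed by EXT.L, whereas a plain MOVE.W only writes the low word of the data register.

```c
#include <stdint.h>
#include <stddef.h>

/* Model of MOVEM.W <ea>,Dn-Dm: each 16-bit word from memory is
 * sign-extended to fill the whole 32-bit register. */
static void movem_w_load(const int16_t *src, int32_t *regs, size_t count)
{
    for (size_t i = 0; i < count; i++)
        regs[i] = (int32_t)src[i];       /* implicit sign extension */
}

/* Model of MOVE.W <ea>,Dn: only the low word is replaced; the upper
 * 16 bits of the register keep their previous contents. */
static uint32_t move_w_load(uint32_t reg, int16_t word)
{
    return (reg & 0xFFFF0000u) | (uint16_t)word;
}
```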
Wow, there are some serious tips in this thread! Really need to get back into some code soon!
Yes, there are! I wonder how many of these tricks also apply to the '020+. I think the VAsm optimizations are also documented for different processor generations.
Simple code size optimization:
Code:
move.l (A0), D0 --> moveq #$3F, D0
A possible optimisation, but note that TAS sets the condition codes differently. Useful if you want to set bit 7 of a data register and don't care about its previous contents.
BSET #7,Dn → TAS Dn
ORI.B #$80,Dn → TAS Dn

The only condition code affected by BSET is Z (set if bit 7 was 0, cleared otherwise).

For TAS, N is set if bit 7 was already 1, Z is set if Dn.B was 0, and V and C are cleared.

For ORI, the condition codes are set similarly to TAS, except they refer to the "after" value, whereas the TAS condition codes refer to the "before" value. So Z will never be set with ORI.B #$80,Dn.

The TAS instruction isn't generally/reliably usable when accessing memory on the Amiga, due to the locked read-modify-write cycle it uses. But since there's no memory access when the operand is a data register, using it is okay in that case.
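A C model of the three condition-code behaviours described above, to make them easier to compare side by side. It only covers a data-register operand, does not model the X flag, and ccr_t, tas, ori_b_80 and bset7 are hypothetical helper names, not any real API:

```c
#include <stdint.h>
#include <stdbool.h>

/* Condition codes left behind by the operation (X not modeled). */
typedef struct { bool n, z, v, c; } ccr_t;

/* TAS Dn: N and Z come from the byte BEFORE bit 7 is set; V, C cleared. */
static uint8_t tas(uint8_t b, ccr_t *cc)
{
    cc->n = (b & 0x80) != 0;
    cc->z = (b == 0);
    cc->v = false;
    cc->c = false;
    return (uint8_t)(b | 0x80);
}

/* ORI.B #$80,Dn: N and Z come from the byte AFTER the OR; V, C cleared.
 * Bit 7 of the result is always 1, so N is always set and Z never is. */
static uint8_t ori_b_80(uint8_t b, ccr_t *cc)
{
    uint8_t r = (uint8_t)(b | 0x80);
    cc->n = (r & 0x80) != 0;
    cc->z = (r == 0);
    cc->v = false;
    cc->c = false;
    return r;
}

/* BSET #7,Dn: only Z changes (set if bit 7 was 0 before);
 * N, V and C keep whatever values they already had. */
static uint8_t bset7(uint8_t b, ccr_t *cc)
{
    cc->z = (b & 0x80) == 0;
    return (uint8_t)(b | 0x80);
}
```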
Code:
movem.w Vals(PC),d0-d1

Mainly replied to say that movem has an overhead which makes it break even at a count of 3 registers. Here, 2 are faster only because of the desired sign extension. The instructions take the cycles they take, and there are no instruction-reordering optimizations on the 68000, apart from the prefetch after the write to BLTSIZ and the hard-to-know odd-cycle alignment wait of instructions that take 6/10/14 etc. cycles.

A simple one, for when you have a loop loading registers from memory, is to back up the location you are about to overwrite, then pre-poke a magic exit value there (say, zero or negative) instead of checking the end address or a loop counter/DBF. Since you're loading the registers anyway, a simple bmi.s Done instead of dbf Dn,KeepOn saves 2 cycles. The same is true for other branches inside loops: you may save 4 cycles for 50% of the branches if all branches jump outside the loop, and more if there is a bias toward either true or false.

Optimized an unrolled loop yesterday from 64 cycles to 51.75 cycles average :nuts
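The pre-poked exit value trick, sketched in C under a few assumptions: all real entries are non-negative, the buffer has one spare slot after the last entry that may be overwritten temporarily, and sum_with_sentinel is a hypothetical example routine. The loop exit tests the sign of the value just loaded, which is what bmi.s Done does on the 68000 right after the move:

```c
#include <stdint.h>
#include <stddef.h>

/* Sentinel-terminated loop: the exit test reuses the flags that loading
 * the value already produced, instead of a counter (DBF) or an
 * end-address compare. */
static int32_t sum_with_sentinel(int16_t *vals, size_t count)
{
    int16_t saved = vals[count];   /* back up the slot after the end */
    vals[count] = -1;              /* pre-poke the magic exit value  */

    int32_t sum = 0;
    for (const int16_t *p = vals;; p++) {
        int16_t v = *p;            /* MOVE.W (An)+,Dn sets N as a side effect */
        if (v < 0)                 /* BMI.S Done */
            break;
        sum += v;
    }

    vals[count] = saved;           /* restore the original contents */
    return sum;
}
```

The cost is that the data must be writable and the sentinel value must never occur as a real entry.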
Anyway, if you want to use something like TAS, just use bset/bclr instead. They first test the specified bit and then set/clear that bit.
The problems arise when the TAS operand refers to memory. It probably can't be used in chip RAM (or slow $C00000 RAM). Some true fast RAM expansions might not support read-modify-write cycles either. Apparently Commodore's Janus PC Bridgeboard software uses TAS with a memory operand. But in that case, presumably the memory is on the Bridgeboard, so the software knows the R-M-W cycle is supported.
What I want to know is why TAS shouldn't be used on memory. It doesn't seem like it could hurt. I've actually tried it (in chip RAM) and it didn't seem to cause any weird behavior.