26 November 2019, 14:24   #32
Don_Adan
Quote:
Originally Posted by a/b
Nice catch, it's a 4-cycle gain in the outer loop.

About lea vs. move/adda (after c_h label):
- 000/010: it's the same, 12 vs. 4+8 cycles
- 020/030: guessing it's the same
- 040: lea is faster
- 060: guessing lea is faster since all those should be "1 cycle", and 1 < 2
All in all, I'd use lea.

And yeah, -(a7) at the end. I left that intentionally; the whole movem situation has to be looked over since you probably don't have to preserve d2-d7/a2-a6 (the original code didn't, but I put all the regs there because it's mixed with C).
I never tested lea vs. move/adda; my 68k optimisation mentor told me that in practice the move/adda version is the fastest. Anyway, I checked my assembler book, and this version can also be used:
move.w d2,a4
add.l a2,a4
It is perhaps 2 cycles faster, but again, I have not tested it.
The (SP) access can be removed too, but that needs 2 extra swap instructions, so on the 68000 the speed will perhaps be the same.
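
For reference, here is a sketch of the three address computations being compared, written side by side. The operands of variants 1 and 2 are assumptions (the routine after the c_h label is not shown in this post); the registers are borrowed from the move.w/add.l version above. The cycle counts in the comments are plain 68000 figures as I read them from the instruction timing tables, not measurements.

```
; All three leave a4 = a2 + sign-extended d2.w

; 1) lea (assumed form)
        lea     0(a2,d2.w),a4   ; 12 cycles

; 2) move/adda (assumed form)
        movea.l a2,a4           ; 4 cycles
        adda.w  d2,a4           ; 8 cycles, d2.w is sign-extended

; 3) variant from the assembler book
        movea.w d2,a4           ; 4 cycles, sign-extends d2.w to 32 bits
        adda.l  a2,a4           ; 8 cycles with a register source
```

By those table figures all three come to 12 cycles on a 68000, so any real difference would have to be confirmed by measurement. On the 040/060 the ranking may differ, as noted in the quote.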

Last edited by Don_Adan; 26 November 2019 at 14:41.