Perhaps it should, yes. Alas this movem addressing mode isn't allowed.
|
Well, there are two minor issues there. The first is that movem.l a0-a2,(a4)+ doesn't actually exist, so you'd need to use movem.l a0-a2,(a4) and then update the value of A4 by hand. The second is that you're indeed correct: movem is only faster if you use a certain number of registers. I'm not sure off the top of my head how many, but it's either three or four IIRC (and that's not counting the cost of updating A4 in this case).
Edit: didn't see meynaf's post when I started writing this, sorry for the double info. |
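For reference, a minimal sketch of the workaround described above. The register list is the one from the post; using lea for the pointer update is just one option (an adda would do as well).

```asm
; "movem.l a0-a2,(a4)+" is not a legal register-to-memory form,
; so store through (a4) and advance the pointer by hand.
        movem.l a0-a2,(a4)      ; a0, a1, a2 land at a4+0, a4+4, a4+8
        lea     3*4(a4),a4      ; 3 registers * 4 bytes = 12
```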
If you have a series of moves to perform, you can add the total offset to a4 first and then use movem.l xxx,-(a4).
Slightly harder to maintain though. The gain doesn't seem too significant. |
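A rough sketch of that idea, assuming three groups of three longwords (36 bytes in total). The register lists are made up for illustration. With predecrement, movem still leaves the lowest-numbered register at the lowest address, so the memory layout matches what plain (a4) stores would give.

```asm
; Assumption: 3 groups * 3 longwords = 36 bytes to fill, starting at a4.
        lea     36(a4),a4       ; jump to the end of the area once
        movem.l d4-d6,-(a4)     ; last group,   lands at original a4+24
        movem.l d1-d3,-(a4)     ; middle group, lands at original a4+12
        movem.l a0-a2,-(a4)     ; first group,  lands at original a4+0
; a4 is back at its original value here.
```

The groups have to be written last-to-first, which is the "harder to maintain" part.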
An optimization of mine was brought to my attention yesterday :) It's nothing advanced but maybe it fits. It goes under basic ALU operations really, which we could make a list of.
not = neg; sub #1
For example, if a number is negative and should be used as a loop count (e.g. for dbf), not.w d0 negates it and subtracts 1 in a single instruction. |
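A small sketch of the dbf case, assuming d0.w holds -n with n >= 1 on entry (the label name is arbitrary):

```asm
; d0.w = -n on entry, n >= 1 (assumption for this sketch)
        not.w   d0              ; d0.w = n-1, because not(x) = -x-1
.loop:
        ; loop body goes here, executed n times
        dbf     d0,.loop        ; dbf falls through once the count passes 0
```

The two-instruction equivalent would be neg.w d0 followed by subq.w #1,d0.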
A typical use case is strlen:
Code:
move.l a0,d0 |
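The listing above is cut off after its first line. One common way to write strlen with this trick looks like the sketch below; it is not necessarily the poster's exact code, and the register usage is an assumption.

```asm
; in:  a0 = pointer to a NUL-terminated string
; out: d0 = length; a0 is left pointing one past the NUL
        move.l  a0,d0           ; d0 = start
.scan:  tst.b   (a0)+           ; test a byte, then step past it
        bne.s   .scan           ; loop until the NUL has been consumed
        sub.l   a0,d0           ; d0 = start - (start+len+1) = -(len+1)
        not.l   d0              ; d0 = (len+1) - 1 = len
```

The final not.l replaces a neg.l plus subq.l #1, which is where the saving comes from.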
Quote:
It would actually be interesting to see similar cases where a chunk of code can be completely omitted by planning ahead! |
Question:
I have this '020 code:
Code:
moveq.l #127,d5
Code:
move.l (127*4,a0),d0
Is my optimisation useful? Or I could use another register:
Code:
lea 127*4(a0),a1 |
Yes, this is faster:
Code:
move.l (127*4,a0),d0 |
Thanks, that's what I thought.
|
I was asked to optimize some 68020 code for work (yes, I know, that's great).
The original code shifts D1:D0 right by D3 bits.
Code:
If we have:
D1 = $12345678
D0 = $9ABCDEF0
D3 = 24 (easier to understand what it does)
In the end we get:
D1 = $00000012
D0 = $3456789A
Of course, one trivial optimization is to replace SUBQ+BGE with DBF, but that only speeds it up a bit. My idea was to get rid of the loop, with the help of extra registers that I could spare.
Code:
addq.l #1,d3 ; loop counter is one off
Can anyone propose further improvements on that one? |
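For readers following along, here is a hedged sketch of what a loop-free version can look like. It is not the poster's listing (which is cut off above). It assumes a logical (zero-fill) shift, a count in the range 0..32 already adjusted for the off-by-one, and d2 and d4 as free scratch registers.

```asm
; in:  d1:d0 = 64-bit value (d1 = high longword), d3 = shift count, 0..32
; out: d1:d0 shifted right by d3 with zero fill; d2 and d4 are trashed
        moveq   #32,d2
        sub.l   d3,d2           ; d2 = 32 - count
        move.l  d1,d4           ; keep a copy of the high longword
        lsr.l   d3,d0           ; low longword drops its bottom bits
        lsr.l   d3,d1           ; high longword, zero-filled from the top
        lsl.l   d2,d4           ; the bits that cross from high to low
        or.l    d4,d0           ; merge them into the top of d0
```

With the example values (d3 = 24): d0 becomes $0000009A, d1 becomes $00000012, d4 becomes $34567800, and the or gives d0 = $3456789A. Register shift counts are taken modulo 64, so a count of 0 makes the lsl.l shift by 32 and contribute nothing, but counts above 32 would need separate handling.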
Quote:
Code:
addq.w #1,d3 ; loop counter is one off
EDIT: I had wasted a register... |
This is a standard 64-bit shift-right operation, which you find in any m68k C-compiler's clib.
For example:
Code:
tst.w d3 |
Quote:
EDIT: ah, the inputs are reversed, and in jotd's version the counter is +1.
It is best to use this, with the proper register inputs :D |
Perhaps this works? Didn't do much testing (looked fine with d3=24, 16, 8, 4, 0).
Code:
; addq.w #1,d3 ; include this if needed (e.g. d3=23 for 24 shifts) |
a/b, what about the asr part? There's only one shift in that version. How can it give a correct result for d1?
ross's micro-optimization looks good:
Code:
moveq #32,d5
My target is a 68020 CPU. |
Ah, you need both d0 and d1. OK, I thought you only needed d0.
How about this (020+, as you mentioned)?
Code:
; addq.l #1,d3 ; include this if needed (e.g. d3=23 for 24 shifts) |
Maybe this is the fastest, if the shift count is limited and the 'right' d3 is used:
Code:
moveq #32,d2
About the 68020+: sometimes the speed is the same even if you use immediate values, but I think that in this case it is in fact faster anyway (and uses less memory). |
It would be brilliant if a/b's solution worked, but I don't think it does. I did a quick check with d0:d1=$12345678:abcdef0 shifted by 7 and the result was $f02468ac:00000008.
EDIT: Wow... ross and I posted in the same minute again. How likely is that? :) |
I wanted to look into bitfield instructions but thought they didn't cover registers as sources.
That looks & reads great, but maybe it's too good to be true. Great to see so many answers to my question. Thanks all. |
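On the register question: the 68020 bitfield instructions do take a data register as their operand, with bit-field offset 0 being the most significant bit. A tiny illustration (the value in d1 is just an example, and this is not a/b's listing):

```asm
; d1 = $12345678 on entry (example value)
        bfextu  d1{0:8},d2      ; top 8 bits, zero-extended: d2 = $00000012
        bfextu  d1{8:24},d2     ; remaining 24 bits:         d2 = $00345678
```

Whether a bitfield version actually beats the plain shift-and-or approach on a given CPU would need measuring.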