05 June 2021, 18:42 | #201 |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,358
|
Perhaps it should, yes. Alas this movem addressing mode isn't allowed.
|
05 June 2021, 18:46 | #202 |
Registered User
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,438
|
Well, there are two minor issues there. The first is that movem.l a0-a2,(a4)+ doesn't actually exist, so you'd need to use movem.l a0-a2,(a4) and then update the value of A4 by hand. The second is that you're indeed correct - movem is only faster if you use a certain number of registers. I'm not sure off the top of my head how many, but it's either three or four IIRC (and that's not counting the cost of updating A4 in this case).
Edit: didn't see meynaf's post when I started writing this, sorry for the double info. |
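To put rough numbers on the break-even point for a plain 68000, here is a small sketch in Python (not from the thread). It assumes the cycle counts as I recall them from the 68000 timing tables: 12 cycles for move.l Rn,(An)+, 8+8n cycles for movem.l with n registers to (An), and 8 cycles for a lea d16(An),An that advances the pointer by hand. The function names are made up for illustration, wait states are ignored, and none of this applies to the 68020 and up.

```python
# Rough 68000 cycle model. All figures are assumptions taken from memory of
# the 68000 timing tables; check them before relying on the result.
MOVE_L_POSTINC = 12   # move.l Rn,(An)+
MOVEM_BASE = 8        # movem.l <list>,(An) costs 8 + 8 per register
MOVEM_PER_REG = 8
LEA_UPDATE = 8        # lea d16(An),An to advance the pointer afterwards


def cycles_moves(n):
    """Cost of n separate move.l Rn,(A4)+ instructions."""
    return n * MOVE_L_POSTINC


def cycles_movem(n, update_pointer=True):
    """Cost of one movem.l with n registers, optionally plus the manual update."""
    total = MOVEM_BASE + MOVEM_PER_REG * n
    if update_pointer:
        total += LEA_UPDATE
    return total


def break_even(update_pointer):
    """Smallest register count for which movem is strictly faster."""
    n = 1
    while cycles_movem(n, update_pointer) >= cycles_moves(n):
        n += 1
    return n
```

Under these assumptions movem wins from three registers when A4 needs no update, and only from five when the lea has to be paid for each time (three registers cost 36 cycles as moves against 40 for movem plus lea).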
06 June 2021, 00:08 | #203 |
This cat is no more
Join Date: Dec 2004
Location: FRANCE
Age: 52
Posts: 8,382
|
If you have a series of moves to perform, you can add the total offset to a4 up front, then use movem.l xxx,-(a4).
Slightly harder to maintain though. The gain doesn't seem too significant. |
02 August 2021, 12:47 | #204 |
Moderator
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,674
|
An optimization of mine was brought to my attention yesterday. It's nothing advanced but maybe it fits. It goes under basic ALU operations really, which we could make a list of.
not = neg;sub #1
For example, if a number is negative and should be used for a loop count (e.g. dbf), not.w d0 negates it and subtracts 1 in a single instruction. |
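The identity is easy to check outside the assembler. Below is a minimal sketch in Python, not from the thread, that models 16-bit register values with a mask; the helper names are made up for illustration.

```python
MASK16 = 0xFFFF  # model a .w register value as an unsigned 16-bit number


def not_w(x):
    """not.w: invert all 16 bits."""
    return ~x & MASK16


def neg_w(x):
    """neg.w: two's complement negation, truncated to 16 bits."""
    return -x & MASK16


def neg_then_sub1(x):
    """neg.w followed by subq.w #1, the two-instruction equivalent."""
    return (neg_w(x) - 1) & MASK16


def dbf_iterations(count):
    """A dbf loop runs its body count+1 times (count as unsigned 16-bit)."""
    return (count & MASK16) + 1
```

With d0 = -10 (that is, $FFF6 as a word), not_w gives 9, and a dbf loop started with 9 runs its body ten times.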
02 August 2021, 13:54 | #205 |
Registered User
Join Date: Jan 2019
Location: Germany
Posts: 3,324
|
A typical use case is strlen:
Code:
        move.l  a0,d0
.loop:  tst.b   (a0)+
        bne.s   .loop
        sub.l   a0,d0
        not.l   d0
|
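Why the final not.l yields the length: after the loop a0 points one byte past the terminating NUL, so start minus end is -(len+1), and inverting that gives len. A small Python model of the five instructions, as a sketch only (the `memory` bytes object and the function name are assumptions for illustration):

```python
MASK32 = 0xFFFFFFFF


def strlen_68k(memory, a0):
    """Model of the strlen loop for a NUL-terminated string at offset a0."""
    d0 = a0                       # move.l a0,d0
    while True:
        byte = memory[a0]         # tst.b (a0)+ reads the byte...
        a0 += 1                   # ...and post-increments a0
        if byte == 0:             # bne.s .loop falls through on the NUL
            break
    d0 = (d0 - a0) & MASK32       # sub.l a0,d0 -> start - end = -(len + 1)
    d0 = ~d0 & MASK32             # not.l d0    -> (len + 1) - 1 = len
    return d0
```

For the bytes `b"xxhello\0"` and a start offset of 2, a0 ends at 8, d0 becomes -6, and the not turns that into 5.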
02 August 2021, 14:02 | #206 |
Moderator
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,674
|
Yep, and languages can keep count during string operations - this avoids running this counting loop even once (strlen simply loads the count attached to the string and returns.)
It would actually be interesting to see similar cases where a chunk of code can be completely omitted by planning ahead! |
25 May 2022, 16:39 | #207 |
This cat is no more
Join Date: Dec 2004
Location: FRANCE
Age: 52
Posts: 8,382
|
Question:
I have this '020 code:
Code:
        moveq.l #127,d5
        moveq.l #126,d3
        move.l  (a0,d5.l*4),d0
        sub.l   (a0,d3.l*4),d0
Code:
        move.l  (127*4,a0),d0
        sub.l   (126*4,a0),d0
Is my optimisation useful? Or I could use another register:
Code:
        lea     127*4(a0),a1
        move.l  (A1),d0
        sub.l   -(A1),d0        ; pre-decrementing to get offset 126*4
Last edited by jotd; 25 May 2022 at 18:52. |
25 May 2022, 18:11 | #208 |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 54
Posts: 4,498
|
Yes, this is faster:
Code:
        move.l  (127*4,a0),d0
        sub.l   (126*4,a0),d0
Last edited by ross; 25 May 2022 at 18:16. |
25 May 2022, 18:50 | #209 |
This cat is no more
Join Date: Dec 2004
Location: FRANCE
Age: 52
Posts: 8,382
|
Thanks, that's what I thought.
|
08 June 2022, 21:46 | #210 |
This cat is no more
Join Date: Dec 2004
Location: FRANCE
Age: 52
Posts: 8,382
|
I was asked to optimize some 68020 code for work (yes, I know, that's great).
The original code shifts the 64-bit value D1:D0 right by D3+1 bits.
Code:
GO_ON:
        ASR.L   #1,D1
        ROXR.L  #1,D0
        SUBQ.L  #1,D3
        BGE     GO_ON
If we have:
D1 = $12345678
D0 = $9ABCDEF0
D3 = 23 (24 shifts, easier to understand what it does)
In the end we get:
D1 = $00000012
D0 = $3456789A
Of course, one trivial optimization is to replace SUBQ+BGE by DBF, but it only speeds things up a bit. My idea was to get rid of the loop, with the help of extra registers that I could spare:
Code:
        addq.l  #1,d3           ; loop counter is one off
        lsr.l   d3,d0
        moveq.l #0,d5
        bset    d3,d5
        subq.l  #1,d5           ; generate 1111s mask
        move.l  d1,d2
        and.l   d5,d2
        asr.l   d3,d1
        sub.l   #32,d3
        neg.l   d3              ; shift = 32-shift
        lsl.l   d3,d2
        or.l    d2,d0
Can anyone propose further improvements on that one? |
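To check that the loop-free sequence matches the loop, both can be modelled in a few lines of Python. This is only a sketch, not from the thread: register values are plain integers masked to 32 bits, the helper names are made up, and the loop-free model is only meant for total shifts of 1 to 31. Note that the loop body runs D3+1 times (SUBQ/BGE exits only once D3 goes negative), so a counter of 23 gives a 24-bit shift.

```python
MASK32 = 0xFFFFFFFF


def asr32(value, count):
    """asr.l: arithmetic shift right of a 32-bit value (count 0..31)."""
    signed = value - (1 << 32) if value & 0x80000000 else value
    return (signed >> count) & MASK32


def shift_loop(d1, d0, d3):
    """The original loop: ASR.L #1,D1 / ROXR.L #1,D0, executed D3+1 times."""
    while True:
        x = d1 & 1                      # ASR.L #1,D1 leaves the dropped bit in X
        d1 = asr32(d1, 1)
        d0 = (x << 31) | (d0 >> 1)      # ROXR.L #1,D0 rotates X into bit 31
        d3 -= 1                         # SUBQ.L #1,D3
        if d3 < 0:                      # BGE GO_ON not taken once D3 is negative
            break
    return d1, d0


def shift_no_loop(d1, d0, d3):
    """The loop-free sequence, for a total shift (d3 + 1) of 1..31."""
    n = d3 + 1                          # addq.l #1,d3
    d0 >>= n                            # lsr.l d3,d0
    mask = (1 << n) - 1                 # moveq/bset/subq: n low bits set
    d2 = d1 & mask                      # bits that cross from D1 into D0
    d1 = asr32(d1, n)                   # asr.l d3,d1
    d2 = (d2 << (32 - n)) & MASK32      # lsl.l by 32-n
    return d1, d0 | d2                  # or.l d2,d0
```

With D1 = $12345678, D0 = $9ABCDEF0 and a counter of 23, both functions return ($00000012, $3456789A).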
08 June 2022, 22:17 | #211 |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 54
Posts: 4,498
|
Micro optimization..
Code:
        addq.w  #1,d3           ; loop counter is one off
        lsr.l   d3,d0
        moveq   #0,d5
        bset    d3,d5
        subq.l  #1,d5           ; generate 1111s mask
        move.l  d1,d2
        and.l   d5,d2
        asr.l   d3,d1
        moveq   #32,d5
        sub.w   d3,d5           ; shift = 32-shift
        lsl.l   d5,d2
        or.l    d2,d0
EDIT: I had wasted a register...
Last edited by ross; 08 June 2022 at 22:29. |
08 June 2022, 22:18 | #212 |
Natteravn
Join Date: Nov 2009
Location: Herford / Germany
Posts: 2,551
|
This is a standard 64-bit shift-right operation, which you find in any m68k C-compiler's clib.
For example:
Code:
        tst.w   d3
        beq     .2
        moveq   #32,d2
        sub.l   d3,d2
        bgt.b   .1
        move.l  d0,d1
        neg.l   d2
        add.l   d0,d0
        subx.l  d0,d0
        asr.l   d2,d1
        bra.b   .2
.1:     move.l  d0,d4
        lsr.l   d3,d1
        lsl.l   d2,d4
        asr.l   d3,d0
        or.l    d4,d1
.2:     rts
|
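As far as I can tell, this routine differs from the loop-free attempts above in two ways: the high word is in d0 and the low word in d1, and counts of 32 and more are handled by a separate path that fills d0 with the sign using add.l/subx.l. The Python model below is a sketch of my reading of the code, not something from the thread; the function names are made up, and counts are assumed to be in the range 0 to 63.

```python
MASK32 = 0xFFFFFFFF


def asr32(value, count):
    """asr.l: arithmetic shift right of a 32-bit value (count 0..31)."""
    signed = value - (1 << 32) if value & 0x80000000 else value
    return (signed >> count) & MASK32


def asr64_clib(d0, d1, d3):
    """Model of the routine: d0 = high word, d1 = low word, d3 = count 0..63."""
    if d3 & 0xFFFF == 0:                # tst.w d3 / beq .2
        return d0, d1
    d2 = 32 - d3                        # moveq #32,d2 / sub.l d3,d2
    if d2 > 0:                          # bgt.b .1: count below 32
        d4 = (d0 << d2) & MASK32        # move.l d0,d4 / lsl.l d2,d4
        d1 = (d1 >> d3) | d4            # lsr.l d3,d1 / or.l d4,d1
        d0 = asr32(d0, d3)              # asr.l d3,d0
        return d0, d1
    d1 = d0                             # move.l d0,d1: low word takes the high word
    d2 = -d2                            # neg.l d2: remaining shift is count - 32
    x = d0 >> 31                        # add.l d0,d0 moves the sign bit into X
    d0 = -x & MASK32                    # subx.l d0,d0 leaves 0 or $FFFFFFFF
    d1 = asr32(d1, d2)                  # asr.l d2,d1
    return d0, d1


def asr64_reference(d0, d1, count):
    """Reference result using Python's arbitrary-precision signed shift."""
    value = (d0 << 32) | d1
    if value & (1 << 63):
        value -= 1 << 64
    value = (value >> count) & 0xFFFFFFFFFFFFFFFF
    return value >> 32, value & MASK32
```

The add.l/subx.l pair is the part worth remembering: it turns the sign bit into a full word of zeros or ones without a branch.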
08 June 2022, 22:31 | #213 | |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 54
Posts: 4,498
|
Quote:
EDIT: ah, the inputs are reversed, and in jotd's version the counter is off by one (+1).
It is best to use this, with the proper register inputs.
Last edited by ross; 08 June 2022 at 22:42. |
|
08 June 2022, 23:03 | #214 | |
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 56
Posts: 2,050
|
Quote:
|
|
08 June 2022, 23:25 | #215 |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,068
|
Perhaps this works? Didn't do much testing (looked fine with d3=24, 16, 8, 4, 0).
Code:
;       addq.w  #1,d3           ; include this if needed (e.g. d3=23 for 24 shifts)
        moveq   #0,d2
        bset    d3,d2
        subq.l  #1,d2           ; mask
        eor.l   d0,d1
        and.l   d2,d1
        or.l    d1,d0
        ror.l   d3,d0
Last edited by a/b; 09 June 2022 at 00:12. Reason: more shorterer+fasterer (if it works ><) |
09 June 2022, 00:21 | #216 |
This cat is no more
Join Date: Dec 2004
Location: FRANCE
Age: 52
Posts: 8,382
|
a/b, what about the asr part? There's only one shift in that version. How can it give a correct result for d1?
ross, the micro optimisation looks good:
Code:
        moveq   #32,d5
        sub.w   d3,d5           ; shift = 32-shift
My target is a 68020 CPU. |
09 June 2022, 00:44 | #217 |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,068
|
Ah, you need both d0 and d1. OK, I thought you only needed d0.
How about this (020+, as you mentioned)?
Code:
;       addq.l  #1,d3           ; include this if needed (e.g. d3=23 for 24 shifts)
        moveq   #32,d2
        sub.l   d3,d2
        bfins   d1,d0{d2:d3}
        rol.l   d2,d0
        lsr.l   d3,d1
|
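The idea here, as I read it, is to overwrite the low d3 bits of d0 (which would be shifted out anyway) with the low d3 bits of d1, and then rotate so they end up at the top. Below is a minimal Python model of that reading for shift counts 1 to 31; it is a sketch with made-up helper names, not a statement about exact bfins behaviour in corner cases such as a width of 0. Note that the sequence uses lsr on d1, so unlike the original ASR loop it does not sign-extend the high word.

```python
MASK32 = 0xFFFFFFFF


def rol32(value, count):
    """rol.l: rotate a 32-bit value left by count bits."""
    count %= 32
    return ((value << count) | (value >> (32 - count))) & MASK32


def shift_bfins(d1, d0, d3):
    """Model for shift counts 1..31: d1 = high word, d0 = low word."""
    d2 = 32 - d3                        # moveq #32,d2 / sub.l d3,d2
    low_mask = (1 << d3) - 1
    # bfins d1,d0{d2:d3}: the field starts d2 bits below the MSB and is d3
    # bits wide, i.e. the low d3 bits of d0; it receives the low d3 bits of d1.
    d0 = (d0 & ~low_mask & MASK32) | (d1 & low_mask)
    d0 = rol32(d0, d2)                  # rol.l d2,d0: same as rotating right by d3
    d1 >>= d3                           # lsr.l d3,d1 (logical, not arithmetic)
    return d1, d0


def lsr64_reference(d1, d0, count):
    """Reference: unsigned 64-bit shift right of d1:d0."""
    value = ((d1 << 32) | d0) >> count
    return value >> 32, value & MASK32
```

With d1 = $12345678, d0 = $9ABCDEF0 and a shift of 24, the insert gives $9A345678 and the rotate turns that into $3456789A, with d1 ending up as $00000012.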
09 June 2022, 00:47 | #218 |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 54
Posts: 4,498
|
Maybe this is the fastest if the shift count is limited and the 'right' d3 is used:
Code:
        moveq   #32,d2
        sub.l   d3,d2
        move.l  d1,d4
        lsr.l   d3,d0
        lsl.l   d2,d4
        asr.l   d3,d1
        or.l    d4,d0
About the 68020+: sometimes the speed is the same even if you use immediate values, but I think that in this case it is faster anyway (and uses less memory). |
09 June 2022, 00:47 | #219 | ||
Natteravn
Join Date: Nov 2009
Location: Herford / Germany
Posts: 2,551
|
Indeed, I overlooked that. Then they are not comparable.
Quote:
I extracted it from vclib, exchanged d2 and d3 and removed the prolog and epilog, which include the movem. The bra.b was for the movem.
Quote:
If a/b's solution works it would be brilliant. But I don't think it does. Did a quick check with d0:d1=$12345678:abcdef0 shifted by 7 and the result was $f02468ac:00000008.
EDIT: Wow... ross and I posted in the same minute again. How likely is that?
Last edited by phx; 09 June 2022 at 00:49. Reason: Strange... |
||
09 June 2022, 00:49 | #220 |
This cat is no more
Join Date: Dec 2004
Location: FRANCE
Age: 52
Posts: 8,382
|
I wanted to look into bitfield instructions but thought they didn't cover registers as sources.
That looks & reads great, but maybe it's too good to be true. Great to see so many answers for my question. Thanks all. |