English Amiga Board


Old 05 June 2021, 18:42   #201
meynaf
son of 68k
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
Perhaps it should, yes. Alas this movem addressing mode isn't allowed.
Old 05 June 2021, 18:46   #202
roondar
Registered User
 
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,409
Well, there are two minor issues there. The first is that movem.l a0-a2,(a4)+ doesn't actually exist, so you'd need to use movem.l a0-a2,(a4) and then update the value of A4 by hand. The second is that you're indeed correct: movem is only faster if you use a certain number of registers. I'm not sure off the top of my head how many, but it's either three or four IIRC (and that's not counting the cost of updating A4 in this case).

Edit: didn't see meynaf's post when I started writing this, sorry for the double info.
Old 06 June 2021, 00:08   #203
jotd
This cat is no more
 
Join Date: Dec 2004
Location: FRANCE
Age: 52
Posts: 8,162
If you have a series of moves to perform, you can first add the total offset to a4 and then use movem.l xxx,-(a4).

Slightly harder to maintain though. The gain doesn't seem too significant.
Old 02 August 2021, 12:47   #204
Photon
Moderator
 
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,602
An optimization of mine was brought to my attention yesterday. It's nothing advanced, but maybe it fits. It really belongs under basic ALU operations, which we could make a list of.

not = neg;sub #1

For example, if a number is negative and should be used for a loop count (e.g. dbf), not.w d0 negates it and subtracts 1 in a single instruction.
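The identity is easy to check exhaustively for word size. Here is a minimal Python sketch, assuming 16-bit wrap-around as in not.w; the helper names `not_w` and `neg_then_sub1` are made up for the illustration.

```python
MASK16 = 0xFFFF

def not_w(x):
    """Model of not.w: one's complement of the low 16 bits."""
    return ~x & MASK16

def neg_then_sub1(x):
    """Model of neg.w followed by subq.w #1."""
    return (-x - 1) & MASK16

# The two sequences agree for every 16-bit value.
assert all(not_w(x) == neg_then_sub1(x) for x in range(0x10000))

# A count stored negated, e.g. -8 ($FFF8), becomes 7,
# which is what dbf needs for 8 passes.
count = not_w(-8 & MASK16)
print(count)  # 7
```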
Old 02 August 2021, 13:54   #205
Thomas Richter
Registered User
 
Join Date: Jan 2019
Location: Germany
Posts: 3,215
A typical use case is strlen:

Code:
 move.l a0,d0
.loop:
 tst.b (a0)+
 bne.s .loop
 sub.l a0,d0
 not.l d0
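To see why the final not.l yields the length, it may help to follow the register values. The Python sketch below is only a model of the loop above; the buffer contents and the name `strlen_model` are assumptions for the example.

```python
MASK32 = 0xFFFFFFFF

def strlen_model(memory, start):
    """Follow the register values of the posted strlen loop."""
    d0 = start                      # move.l a0,d0
    a0 = start
    while True:
        byte = memory[a0]           # tst.b (a0)+ reads the byte ...
        a0 += 1                     # ... and post-increments a0
        if byte == 0:               # bne.s .loop falls through on the terminator
            break
    d0 = (d0 - a0) & MASK32         # sub.l a0,d0 gives -(len+1)
    d0 = ~d0 & MASK32               # not.l d0 gives len
    return d0

memory = bytearray(b"....Amiga\0")  # hypothetical buffer, string starts at offset 4
print(strlen_model(memory, 4))      # 5
```

Since a0 ends up one byte past the terminator, the subtraction leaves -(len+1), and not turns that into len without a separate neg and subq.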
Old 02 August 2021, 14:02   #206
Photon
Moderator
 
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,602
Quote:
Originally Posted by Thomas Richter
A typical use case is strlen:
Yep, and languages can keep a count during string operations. This avoids running the counting loop at all (strlen simply loads the count attached to the string and returns).

It would actually be interesting to see similar cases where a chunk of code can be omitted completely by planning ahead!
Old 25 May 2022, 16:39   #207
jotd
This cat is no more
 
Join Date: Dec 2004
Location: FRANCE
Age: 52
Posts: 8,162
Question:

I have this '020 code

Code:
     moveq.l     #127,d5
     moveq.l     #126,d3
     move.l     (a0,d5.l*4),d0
     sub.l (a0,d3.l*4),d0
As d5 and d3 are clobbered just afterwards, we don't really need the values there, so I figured that I could write:


Code:
     move.l     (127*4,a0),d0
     sub.l (126*4,a0),d0
But the moveq is very quick, and now I have 16-bit offsets instead of registers (although the *4 operation is done at assembly time).

Is my optimisation useful?

Or I could use another register:

Code:
     lea     127*4(a0),a1
     move.l  (a1),d0
     sub.l   -(a1),d0   ; pre-decrement to reach offset 126*4
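All three variants above read the same two longwords. The Python sketch below only models the address arithmetic, under the assumption that memory is a list with one entry per longword; the function names and test data are invented for the illustration.

```python
def read_long(memory, address):
    """Read one longword from a list that holds one entry per 4 bytes."""
    return memory[address // 4]

def indexed(memory, a0):
    d5, d3 = 127, 126                         # moveq #127,d5 / moveq #126,d3
    return read_long(memory, a0 + d5 * 4) - read_long(memory, a0 + d3 * 4)

def displacement(memory, a0):
    # (127*4,a0) and (126*4,a0): offsets folded by the assembler
    return read_long(memory, a0 + 127 * 4) - read_long(memory, a0 + 126 * 4)

def lea_predecrement(memory, a0):
    a1 = a0 + 127 * 4                         # lea 127*4(a0),a1
    d0 = read_long(memory, a1)                # move.l (a1),d0
    a1 -= 4                                   # -(a1) steps back one longword
    return d0 - read_long(memory, a1)         # sub.l -(a1),d0

memory = [i * i for i in range(256)]          # made-up table contents
a0 = 16
assert indexed(memory, a0) == displacement(memory, a0) == lea_predecrement(memory, a0)

# 127*4 = 508 fits comfortably in a signed 16-bit displacement.
assert -32768 <= 127 * 4 <= 32767
```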

Last edited by jotd; 25 May 2022 at 18:52.
Old 25 May 2022, 18:11   #208
ross
Defendit numerus
 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
Yes, this is faster:

Code:
     move.l  (127*4,a0),d0
     sub.l   (126*4,a0),d0
Also on a bare 68k (EDIT: of course, even if it were only (ax,dx.l),d0, since the plain 68000 has no scaled index).

Last edited by ross; 25 May 2022 at 18:16.
Old 25 May 2022, 18:50   #209
jotd
This cat is no more
 
Join Date: Dec 2004
Location: FRANCE
Age: 52
Posts: 8,162
Thanks, that's what I thought.
Old 08 June 2022, 21:46   #210
jotd
This cat is no more
 
Join Date: Dec 2004
Location: FRANCE
Age: 52
Posts: 8,162
I was asked to optimize some 68020 code for work (yes, I know, that's great).

The original code shifts the 64-bit pair D1:D0 to the right, one bit per pass; the loop runs D3+1 times.

Code:
GO_ON:
    ASR.L #1,D1
    ROXR.L #1,D0
    SUBQ.L #1,D3
    BGE GO_ON

If we have:

D1 = $12345678
D0 = $9ABCDEF0

D3 = 23, i.e. a shift of 24 bits (easier to understand what it does)

In the end we get:

D1 = $00000012
D0 = $3456789A

Of course, one trivial optimization is to replace SUBQ+BGE by DBF, but that only speeds things up a bit. My idea was to get rid of the loop, with the help of extra registers that I could spare:

Code:
        addq.l  #1,d3   ; loop counter is one off
        lsr.l   d3,d0
        moveq.l #0,d5
        bset    d3,d5
        subq.l  #1,d5   ; generate 1111s mask
        move.l  d1,d2
        and.l   d5,d2
        asr.l   d3,d1
        sub.l   #32,d3
        neg.l   d3              ; shift = 32-shift
        lsl.l   d3,d2   
        or.l    d2,d0
The code is faster for D3 > 2. It's 5 times faster when D3=20, so it's already great.

Can anyone propose further improvements on that one?
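For anyone who wants to check the equivalence without an emulator, here is a rough Python model of both sequences. It is a sketch under assumptions: registers are plain 32-bit integers, only the data flow is modelled (no timing), and the loop-free code is only exercised while D3+1 stays in 1..31. Note that entering with D3 = 23 gives 24 passes.

```python
MASK32 = 0xFFFFFFFF

def asr32(value, count):
    """asr.l: arithmetic shift right of a 32-bit value."""
    if value & 0x80000000:
        value -= 1 << 32
    return (value >> count) & MASK32

def shift_loop(d1, d0, d3):
    """The original loop: ASR/ROXR once per pass, D3+1 passes."""
    while True:
        x = d1 & 1                      # asr.l #1,d1 leaves the old bit 0 in X
        d1 = asr32(d1, 1)
        d0 = (d0 >> 1) | (x << 31)      # roxr.l #1,d0 rotates X into bit 31
        d3 -= 1                         # subq.l #1,d3
        if d3 < 0:                      # bge GO_ON not taken
            return d1, d0

def shift_no_loop(d1, d0, d3):
    """The loop-free sequence, valid while D3+1 stays in 1..31."""
    d3 += 1                             # addq.l #1,d3
    d0 >>= d3                           # lsr.l d3,d0
    d5 = (1 << d3) - 1                  # moveq #0,d5 / bset d3,d5 / subq.l #1,d5
    d2 = d1 & d5                        # move.l d1,d2 / and.l d5,d2
    d1 = asr32(d1, d3)                  # asr.l d3,d1
    d3 = 32 - d3                        # sub.l #32,d3 / neg.l d3
    d2 = (d2 << d3) & MASK32            # lsl.l d3,d2
    d0 |= d2                            # or.l d2,d0
    return d1, d0

d1, d0 = shift_no_loop(0x12345678, 0x9ABCDEF0, 23)
print(f"{d1:08X} {d0:08X}")             # 00000012 3456789A

# Both versions agree for every count the loop-free code supports.
for d3 in range(31):
    for hi in (0x12345678, 0x80000001, 0xFFFFFFFF):
        assert shift_loop(hi, 0x9ABCDEF0, d3) == shift_no_loop(hi, 0x9ABCDEF0, d3)
```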
Old 08 June 2022, 22:17   #211
ross
Defendit numerus
 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
Quote:
Originally Posted by jotd
Can anyone propose further improvements on that one?
A micro-optimization:
Code:
        addq.w  #1,d3   ; loop counter is one off
        lsr.l   d3,d0
        moveq   #0,d5
        bset    d3,d5
        subq.l  #1,d5   ; generate 1111s mask
        move.l  d1,d2
        and.l   d5,d2
        asr.l   d3,d1
        moveq   #32,d5
        sub.w   d3,d5   ; shift = 32-shift
        lsl.l   d5,d2   
        or.l    d2,d0
But it probably only helps on the 68000.

EDIT: I had wasted a register...

Last edited by ross; 08 June 2022 at 22:29.
Old 08 June 2022, 22:18   #212
phx
Natteravn
 
phx's Avatar
 
Join Date: Nov 2009
Location: Herford / Germany
Posts: 2,496
This is a standard 64-bit shift-right operation, which you find in any m68k C-compiler's clib.
For example:
Code:
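        tst.w   d3              ; shift count zero?
        beq     .2              ; yes: nothing to do
        moveq   #32,d2
        sub.l   d3,d2           ; d2 = 32 - count
        bgt.b   .1              ; count < 32
        move.l  d0,d1           ; count >= 32: low = old high
        neg.l   d2              ; d2 = count - 32
        add.l   d0,d0           ; sign bit of high goes into X
        subx.l  d0,d0           ; high = 0 or -1 (sign extension)
        asr.l   d2,d1           ; low = old high >> (count - 32)
        bra.b   .2
.1:     move.l  d0,d4
        lsr.l   d3,d1           ; low >>= count
        lsl.l   d2,d4           ; bits leaving high, moved to the top
        asr.l   d3,d0           ; high >>= count (arithmetic)
        or.l    d4,d1           ; merge them into low
.2:     rts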
EDIT: Don't know if this is faster than the jotd/ross version. Probably not. Too lazy to count.
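The routine above can be cross-checked against plain 64-bit arithmetic. The Python below is a model only, assuming d0 holds the high longword and d1 the low one (as the code reads), and a count of 0..63; `asr64_pair` and `reference` are names invented for the sketch.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def asr32(value, count):
    """asr.l: arithmetic shift right of a 32-bit value."""
    if value & 0x80000000:
        value -= 1 << 32
    return (value >> count) & MASK32

def asr64_pair(d0, d1, d3):
    """Model of the clib-style routine: d0 = high, d1 = low, d3 = count."""
    if d3 == 0:                                  # tst.w d3 / beq .2
        return d0, d1
    d2 = 32 - d3                                 # moveq #32,d2 / sub.l d3,d2
    if d2 > 0:                                   # bgt.b .1 (count < 32)
        d4 = (d0 << d2) & MASK32                 # move.l d0,d4 / lsl.l d2,d4
        d1 = (d1 >> d3) | d4                     # lsr.l d3,d1 / or.l d4,d1
        d0 = asr32(d0, d3)                       # asr.l d3,d0
    else:                                        # count >= 32
        d1 = asr32(d0, -d2)                      # move.l d0,d1 / neg.l d2 / asr.l d2,d1
        d0 = MASK32 if d0 & 0x80000000 else 0    # add.l d0,d0 / subx.l d0,d0
    return d0, d1

def reference(d0, d1, count):
    """Signed 64-bit shift done with Python integers."""
    value = (d0 << 32) | d1
    if value & (1 << 63):
        value -= 1 << 64
    value = (value >> count) & MASK64
    return value >> 32, value & MASK32

for count in range(64):
    for hi in (0x12345678, 0x80000000, 0xFFFFFFFF):
        assert asr64_pair(hi, 0x9ABCDEF0, count) == reference(hi, 0x9ABCDEF0, count)
```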
Old 08 June 2022, 22:31   #213
ross
Defendit numerus
 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
Quote:
Originally Posted by phx
EDIT: Don't know if this is faster than the jotd/ross version. Probably not. Too lazy to count.
For sure it's more generic (supports shifts >32), I have to try it ...

EDIT: ah, the inputs are reversed, and in jotd's version the counter needs +1.
It is best to use this one, with the proper register inputs.

Last edited by ross; 08 June 2022 at 22:42.
Old 08 June 2022, 23:03   #214
Don_Adan
Registered User
 
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 55
Posts: 1,959
Quote:
Originally Posted by phx
This is a standard 64-bit shift-right operation, which you find in any m68k C-compiler's clib.
For example:
Code:
        tst.w   d3
        beq     .2
        moveq   #32,d2
        sub.l   d3,d2
        bgt.b   .1
        move.l  d0,d1
        neg.l   d2
        add.l   d0,d0
        subx.l  d0,d0
        asr.l   d2,d1
        bra.b   .2
.1:     move.l  d0,d4
        lsr.l   d3,d1
        lsl.l   d2,d4
        asr.l   d3,d0
        or.l    d4,d1
.2:     rts
EDIT: Don't know if this is faster than the jotd/ross version. Probably not. Too lazy to count.
If this is for C compilers then change "bra.b .2" to "rts".
Old 08 June 2022, 23:25   #215
a/b
Registered User
 
Join Date: Jun 2016
Location: europe
Posts: 1,039
Perhaps this works? Didn't do much testing (looked fine with d3=24, 16, 8, 4, 0).
Code:
;	addq.w	#1,d3		; include this if needed (e.g. d3=23 for 24 shifts)
	moveq	#0,d2
	bset	d3,d2
	subq.l	#1,d2		; mask
	eor.l	d0,d1
	and.l	d2,d1
	or.l	d1,d0
	ror.l	d3,d0

Last edited by a/b; 09 June 2022 at 00:12. Reason: more shorterer+fasterer (if it works ><)
Old 09 June 2022, 00:21   #216
jotd
This cat is no more
 
Join Date: Dec 2004
Location: FRANCE
Age: 52
Posts: 8,162
a/b, what about the asr part? There's only one shift in that version. How can it give a correct result for d1?

ross's micro-optimization looks good:

Code:
        moveq   #32,d5
        sub.w   d3,d5   ; shift = 32-shift
Why is that better than sub.l #32,d3 only on the 68000? Isn't moveq + sub register faster in all cases? Plus you're eliminating the neg instruction. That looks marginally faster to me.

My target is a 68020 CPU
Old 09 June 2022, 00:44   #217
a/b
Registered User
 
Join Date: Jun 2016
Location: europe
Posts: 1,039
Ah, you need both d0 and d1. OK, I thought you only needed d0.

How about this (020+, as you mentioned)?
Code:
;	addq.l	#1,d3		; include this if needed (e.g. d3=23 for 24 shifts)
	moveq	#32,d2
	sub.l	d3,d2
	bfins	d1,d0{d2:d3}
	rol.l	d2,d0
	lsr.l	d3,d1
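A quick way to sanity-check the bit-field version is to model it. The Python below is a sketch resting on two assumptions: that a register bit-field offset is counted from bit 31 downwards (so offset 32-n with width n covers the low n bits), and that the count stays in 1..31. `bfins_reg`, `rol32` and `shift_bfins` are hypothetical helper names.

```python
MASK32 = 0xFFFFFFFF

def bfins_reg(source, dest, offset, width):
    """bfins Dn,Dm{offset:width} for a field that does not wrap
    (offset + width <= 32); offset 0 is bit 31."""
    shift = 32 - offset - width
    field_mask = ((1 << width) - 1) << shift
    return (dest & ~field_mask & MASK32) | ((source << shift) & field_mask)

def rol32(value, count):
    """rol.l with the count reduced modulo 32."""
    count %= 32
    return ((value << count) | (value >> (32 - count))) & MASK32

def shift_bfins(d1, d0, d3):
    """d1 = high, d0 = low, d3 = real shift count (1..31)."""
    d2 = 32 - d3                        # moveq #32,d2 / sub.l d3,d2
    d0 = bfins_reg(d1, d0, d2, d3)      # low d3 bits of d1 replace low d3 bits of d0
    d0 = rol32(d0, d2)                  # rotate left by 32-d3 = rotate right by d3
    d1 >>= d3                           # lsr.l d3,d1 (zero fill, not sign fill)
    return d1, d0

d1, d0 = shift_bfins(0x12345678, 0x9ABCDEF0, 24)
print(f"{d1:08X} {d0:08X}")             # 00000012 3456789A

# Matches a plain 64-bit logical shift for every count in 1..31.
for n in range(1, 32):
    whole = ((0x12345678 << 32) | 0x9ABCDEF0) >> n
    assert shift_bfins(0x12345678, 0x9ABCDEF0, n) == (whole >> 32, whole & MASK32)

# With the sign bit set, lsr.l zero-fills where the original asr.l would sign-fill.
assert shift_bfins(0x80000000, 0, 4)[0] == 0x08000000
```

In this model the sequence only differs from the original loop when d1 is negative, because of lsr.l versus asr.l on the high longword.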
Old 09 June 2022, 00:47   #218
ross
Defendit numerus
 
ross's Avatar
 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
Maybe this is the fastest, if the shift count is limited and the 'right' d3 is used:

Code:
    moveq   #32,d2
    sub.l   d3,d2
    move.l  d1,d4
    lsr.l   d3,d0
    lsl.l   d2,d4
    asr.l   d3,d1
    or.l    d4,d0
It is a specialized version of the generic one (using your register assignments).

About the 68020+: sometimes the speed is the same even if you use immediate values, but I think that in this case it is faster anyway (and uses less memory).
Old 09 June 2022, 00:47   #219
phx
Natteravn
 
Join Date: Nov 2009
Location: Herford / Germany
Posts: 2,496
Quote:
Originally Posted by ross
For sure it's more generic (supports shifts >32)
Indeed, I overlooked that. Then they are not comparable.

Quote:
ah, the inputs are reversed,
Missed that too. Usually the lower register is the MSW in 64-bit register pairs. I like big-endian.

Quote:
Originally Posted by Don_Adan
If this is for C compilers then change "bra.b .2" to "rts".
I extracted it from vclib, exchanged d2 and d3 and removed the prologue and epilogue, which include a movem. The bra.b was for the movem.

Quote:
Originally Posted by jotd
Why is that better than sub.l #32,d3 only on the 68000? Isn't moveq + sub register faster in all cases?
I would always prefer moveq+sub over sub.l-immediate as well. It also saves two bytes.

If a/b's solution works it would be brilliant. But I don't think it does. I did a quick check with d0:d1=$12345678:abcdef0 shifted by 7 and the result was $f02468ac:00000008.
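That check can be reproduced with a small Python model of a/b's first (eor/and/or/ror) version. Two assumptions here: the seven-digit value is read as $0ABCDEF0 (only its low 7 bits matter for d0 in this test), and `ab_first_version` is just a name for the sketch.

```python
MASK32 = 0xFFFFFFFF

def ror32(value, count):
    """ror.l with the count reduced modulo 32."""
    count %= 32
    return ((value >> count) | (value << (32 - count))) & MASK32

def ab_first_version(d0, d1, d3):
    d2 = (1 << d3) - 1                  # moveq #0,d2 / bset d3,d2 / subq.l #1,d2
    d1 ^= d0                            # eor.l d0,d1
    d1 &= d2                            # and.l d2,d1
    d0 |= d1                            # or.l d1,d0
    d0 = ror32(d0, d3)                  # ror.l d3,d0
    return d0, d1

d0, d1 = ab_first_version(0x12345678, 0x0ABCDEF0, 7)
print(f"{d0:08x}:{d1:08x}")             # f02468ac:00000008

# What d0 should hold if d1 is the upper longword (jotd's layout):
expected_d0 = ((0x12345678 >> 7) | (0x0ABCDEF0 << 25)) & MASK32
print(f"{expected_d0:08x}")             # e02468ac
```

In this model the or.l can only set bits in the low field, never clear them, which is why the top nibble comes out as f instead of e.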

EDIT: Wow... ross and I posted in the same minute again. How likely is that?

Last edited by phx; 09 June 2022 at 00:49. Reason: Strange...
Old 09 June 2022, 00:49   #220
jotd
This cat is no more
 
Join Date: Dec 2004
Location: FRANCE
Age: 52
Posts: 8,162
I wanted to look into bitfield instructions, but thought they didn't accept registers as sources.

That looks & reads great, but maybe it's too good to be true.

Great to see so many answers to my question. Thanks all.