Preservation of registers - Page 2

guy lateur · 24 October 2018, 00:47

Quote:

Originally Posted by PeterK

the contents of d0 is unknown here, too. This could cause random memory trashing instead of terminating your text string.

Thanks, nice catch!

Corrected code:

Code:

;-- copy chars into textBuffer and zero-terminate it
;in:    a0:     source pointer
;in:    d0:     # characters to copy
copyToBuffer:
    move.l  4.w,a6              ;keep lib base in a6; a6 might get changed (?)
    move.l  d0,d2               ;CopyMem() does not necessarily preserve d0
    lea     textBuffer,a1
    jsr     _LVOCopyMem(a6) 
    lea     textBuffer,a1       ;CopyMem() does not necessarily preserve a1
    add.l   d2,a1
    move.b  #asciiNULL,(a1)
.return:
    rts

guy lateur · 24 October 2018, 00:48

Quote:

Originally Posted by Don_Adan

Yes, I know. You can check DOS.lzx from:

http://wt.exotica.org.uk/test.html

Ok, thanks for the tip!

Don_Adan · 24 October 2018, 01:38

Quote:

Originally Posted by guy lateur

Thanks, nice catch!

Corrected code:

Code:

;-- copy chars into textBuffer and zero-terminate it
;in:    a0:     source pointer
;in:    d0:     # characters to copy
copyToBuffer:
    move.l  4.w,a6              ;keep lib base in a6; a6 might get changed (?)
    move.l  d0,d2               ;CopyMem() does not necessarily preserve d0
    lea     textBuffer,a1
    jsr     _LVOCopyMem(a6) 
    lea     textBuffer,a1       ;CopyMem() does not necessarily preserve a1
    add.l   d2,a1
    move.b  #asciiNULL,(a1)
.return:
    rts

I haven't coded in a long time, but to me your code looks too long. You can use something like this, if you want:

Code:

;-- copy chars into textBuffer and zero-terminate it
;in:    a0:     source pointer
;in:    d0:     # characters to copy
copyToBuffer:
    move.l  4.w,a6              ;keep lib base in a6; a6 might get changed (?)
    lea     textBuffer,a1
    move.b  #asciiNULL,(a1,D0.L)
    JMP     _LVOCopyMem(a6)

guy lateur · 24 October 2018, 01:59

Quote:

Originally Posted by Don_Adan

I haven't coded in a long time, but to me your code looks too long. You can use something like this, if you want:

Code:

;-- copy chars into textBuffer and zero-terminate it
;in:    a0:     source pointer
;in:    d0:     # characters to copy
copyToBuffer:
    move.l  4.w,a6              ;keep lib base in a6; a6 might get changed (?)
    lea     textBuffer,a1
    move.b  #asciiNULL,(a1,D0.L)
    JMP     _LVOCopyMem(a6)

*m68k assembly noob alert*

Oh, the register indirect with index and offset addressing mode, you mean? Thanks, you're absolutely right!

No idea how I could've missed that, tbh..

jlin_au · 24 October 2018, 08:31

Couple of comments:
1) it would be good practice to check the return code in d0 and handle it appropriately after calling a system library routine
2) the "JMP _LVOCopyMem(a6)" in the example above instead of a "JSR ...." assumes that the stack contains the valid return address of the routine that called your code and that your routine has left the stack in a clean state! Otherwise !!!!!!

StingRay · 24 October 2018, 08:36

Quote:

2) the "JMP _LVOCopyMem(a6)" in the example above instead of a "JSR ...." assumes that the stack contains the valid return address of the routine that called your code and that your routine has left the stack in a clean state! Otherwise !!!!!

jmp _LVOCopyMem(a6) is perfectly fine here! It's just a shorter (and faster) way to do jsr+rts.

Edit: And CopyMem has no return code!

ross · 24 October 2018, 09:12

Actually it can be further optimized, knowing that #asciiNULL==0x00

guy lateur · 24 October 2018, 11:17

Quote:

Originally Posted by StingRay

jmp _LVOCopyMem(a6) is perfectly fine here! It's just a shorter (and faster) way to do jsr+rts.

For the record: my code had a jsr instead of a jmp. Don actually changed it to jmp (thanks!), but I paid no further attention to it until now. So for clarity: when is it ok to use jmp instead of jsr? I generally never touch the stack (yet), so am I always clear to use jmp then? If I put the program counter on the stack myself, is it still better/shorter/faster to do a jmp?

On a side note: where do you get your info about how many cycles and how many bytes a given instruction takes? Because it's not documented (at least not consistently) in any books I have. Is there an online source I could check?

guy lateur · 24 October 2018, 11:18

Quote:

Originally Posted by ross

Actually it can be further optimized, knowing that #asciiNULL==0x00

How? Because I don't think moveq supports this addressing mode. Clr.b? And.b with 0? Eor with self?

hooverphonique · 24 October 2018, 11:37

Quote:

Originally Posted by guy lateur

For the record: my code had a jsr instead of a jmp. Don actually changed it to jmp (thanks!), but I paid no further attention to it until now. So for clarity: when is it ok to use jmp instead of jsr? I generally never touch the stack (yet), so am I always clear to use jmp then? If I put the program counter on the stack myself, is it still better/shorter/faster to do a jmp?

when you have some code that ends in the following pattern

Code:

jsr/bsr xyz
rts

you can always change that to

Code:

jmp/bra xyz

provided that "xyz" is a subroutine ending in rts (which the library functions always are). If you think about it, you can probably quickly figure out why it is so
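
A minimal sketch of why this works, using hypothetical labels (caller, viaJsr, viaJmp) and R1..R3 as names for the return addresses; "xyz" stands for any subroutine ending in rts:

```asm
caller:
    bsr     viaJsr              ;pushes return address R1
    bsr     viaJmp              ;R1: pushes return address R2
    rts                         ;R2

viaJsr:
    bsr     xyz                 ;pushes R3; stack now holds R3, R1
    rts                         ;R3: pops R1, back in caller

viaJmp:
    bra     xyz                 ;pushes nothing; stack still holds only R2

xyz:
    nop                         ;stand-in for the real work
    rts                         ;pops R3 when reached from viaJsr, R2 when reached from viaJmp
```

This only holds if the stack pointer at the jmp/bra is the same as it was on entry to your routine, i.e. anything you pushed yourself has been popped again.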

guy lateur · 24 October 2018, 11:52

Quote:

Originally Posted by hooverphonique

when you have some code that ends in the following pattern

Code:

jsr xyz
rts

you can always change that to

Code:

jmp xyz

provided that "xyz" is a subroutine ending in rts (which the library functions always are). If you think about it, you can probably quickly figure out why it is so

I see, that makes sense. So you're actually 'abusing' the rts contained in the called function to serve in the calling function. Quite clever, indeed!

ross · 24 October 2018, 12:06

Quote:

Originally Posted by guy lateur

How? Because I don't think moveq supports this addressing mode. Clr.b? And.b with 0? Eor with self?

clr.b (a1,d0.l)
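
Putting this together with the jmp suggestion, the whole routine might look like the sketch below. It assumes, as in the earlier posts, that textBuffer has room for d0+1 bytes and that _LVOCopyMem is defined by the usual exec includes. The clr.b form does not need the immediate extension word that move.b #asciiNULL carries.

```asm
;-- copy chars into textBuffer and zero-terminate it
;in:    a0:     source pointer
;in:    d0:     # characters to copy
;assumes textBuffer holds at least d0+1 bytes
copyToBuffer:
    move.l  4.w,a6              ;exec base
    lea     textBuffer,a1       ;destination for CopyMem()
    clr.b   (a1,d0.l)           ;terminator at textBuffer+d0, written while d0/a1 are still valid
    jmp     _LVOCopyMem(a6)     ;tail call: CopyMem()'s rts returns to our caller
```

Writing the terminator before the copy is what removes the need to save d0 and reload a1 afterwards.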

Quote:

Originally Posted by guy lateur

I see, that makes sense. So you're actually 'abusing' the rts contained in the called function to serve in the calling function. Quite clever, indeed!

Yes, and this can be optimized even further.

Suppose you have something like this:

Code:

maincode:
 register_setup
 bsr calc_subroutine
 blabla_code1
 bsr calc_subroutine
 blabla_code2
 bsr calc_subroutine
 rts

calc_subroutine:
 code_code
 rts

New code can be:

Code:

maincode:
 register_setup
 bsr calc_subroutine
 blabla_code1
 bsr calc_subroutine
 blabla_code2
calc_subroutine:
 code_code
 rts

Code smaller and faster

guy lateur · 24 October 2018, 12:20

Quote:

Originally Posted by ross

clr.b (a1,d0.l)

Quote:

Originally Posted by ross

Yes, and this can be optimized even further.
<snip>
Code smaller and faster

Alright, thanks a lot for the tips, much appreciated!

NorthWay · 24 October 2018, 16:48

Some assemblers can optimize away all those. BAsm does AFAIR. (If you tell it to.)

ross · 24 October 2018, 16:58

Quote:

Originally Posted by NorthWay

Some assemblers can optimize away all those. BAsm does AFAIR. (If you tell it to.)

Yes, BAsm is great and really fast.
But it is not perfect at optimizing code (no other assembler is perfect either).
I've had some 'wrong' optimizations from it that left me with code that no longer worked.
So use it only as a 'suggestion' and then optimize manually, or write optimized code from the start.

NorthWay · 24 October 2018, 18:16

If you don't have a Vampire (just not optimal) then you can take subroutine runs and turn them on their head:

jsr sub1
jsr sub2
jsr sub3
rts
->
pea sub3
pea sub2
jmp sub1

guy lateur · 24 October 2018, 18:25

Quote:

Originally Posted by NorthWay

If you don't have a Vampire (just not optimal) then you can take subroutine runs and turn them on their head:

jsr sub1
jsr sub2
jsr sub3
rts
->
pea sub3
pea sub2
jmp sub1

Oh cool, so you're kinda 'stringing' those calls together by pushing their addresses on the stack, rather than always coming back to the calling routine. Nice! And no, I don't have a Vampire (yet)..
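
If I read it right, the stack evolves as in the sketch below. sub1..sub3 are the placeholder names from the example, chain is a hypothetical label for the calling routine, and the stack contents are listed top first:

```asm
chain:
    pea     sub3                ;stack: sub3, caller's return address
    pea     sub2                ;stack: sub2, sub3, caller's return address
    jmp     sub1                ;run sub1 without pushing anything

sub1:
    nop                         ;stand-in for real work
    rts                         ;pops sub2, so execution continues in sub2
sub2:
    nop
    rts                         ;pops sub3
sub3:
    nop
    rts                         ;pops the caller's return address, leaving chain
```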

ross · 24 October 2018, 18:55

Not if you use an absolute addressing mode, as in the proposed example: jsr $absolute and pea $absolute take the same number of cycles on the 68000, and the rts needs to be executed anyway.

The jsr offset(pc) and pea offset(pc) case is different: there pea is faster than its jsr/bsr counterpart.

EDIT: well, considering the two messages in sequence (optimization by the assembler, then the pea/jsr pair), it is clear why that code is considered faster.
It is simply the assembler that optimizes it (if it can), making it pc-relative.
But it is always better to understand what is behind it.
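
Written out by hand, the pc-relative form would look something like this sketch. chainPcRel is a hypothetical label, sub1..sub3 are the placeholder names from the example, and the targets are assumed to be within 16-bit displacement range:

```asm
chainPcRel:
    pea     sub3(pc)            ;16-bit displacement instead of a 32-bit absolute address
    pea     sub2(pc)
    bra     sub1                ;relative branch instead of jmp to an absolute address

sub1:
    nop                         ;stand-in for real work
    rts                         ;continues in sub2
sub2:
    nop
    rts                         ;continues in sub3
sub3:
    nop
    rts                         ;returns to the original caller
```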

guy lateur · 24 October 2018, 19:22

Quote:

Originally Posted by ross

jsr $absolute and pea $absolute are the same cycles on 68000

What is a reliable source for this kind of information? I found this (http://oldwww.nvg.ntnu.no/amiga/MC68...000timing.HTML), but I find it a bit strangely organised. Maybe that's just me, though, not really understanding all the nuances and subtleties this entails.

What would be great is some kind of interactive version of this. So I just type in my instruction, and it comes back with timing information (maybe broken down into address calculation time + the rest) and preferably also size information. Now thát would be a handy tool!

ross · 24 October 2018, 19:32

I use this https://wiki.neogeodev.org/index.php...ctions_timings (which basically refers to what you have linked)
or the official Motorola manuals.



