English Amiga Board


Old 01 January 2017, 03:52   #1
Galahad/FLT
Going nowhere

Join Date: Oct 2001
Location: United Kingdom
Age: 45
Posts: 7,037
Quickest code....

Which of the following is quicker in a loop?

move.w d1,(a0)
addq.w #4,a0

or

move.l d1,(a0)+

This is for the 68000.

Last edited by Galahad/FLT; 01 January 2017 at 04:39.
Old 01 January 2017, 04:23   #2
Pat the Cat
Banned

 
Join Date: Dec 2016
Location: Nottingham, UK
Posts: 481
I could be wrong, but I think it depends partly on what kind of RAM it's running in, and also on how many bitplanes are being displayed.

Technically, the second is moving more data. The first only moves a word and then increments the address register by 4.

Are you sure these two bits of code are doing the same thing? It doesn't look like it from here.
Old 01 January 2017, 04:42   #3
Galahad/FLT
If I hadn't missed out the + on the end of the second code example, then yes, it would have been the same.

The first code moves just the colour value word into a copperlist, then skips over the colour register word to get to the next colour value.

The second code would have the colour register and the colour value together in the data register and do a straight longword move, with no need to add anything to A0 because, by using (a0)+, it's already at the next correct location.

So folks, I'm under the impression that using words is quicker on the 68000, but is that the case with this code?
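
To make the two variants concrete, here is a small Python sketch (a model, not Amiga code) that applies both write patterns to a byte buffer standing in for a copperlist. It assumes each copper MOVE is a register word followed by a data word, uses 0x0180 (COLOR00) purely as an example register, and assumes A0 starts on the first colour value word in the word variant; the function names are made up for illustration.

```python
import struct

COLOR00 = 0x0180  # example register offset; any colour register would do


def write_words(copper: bytearray, colours: list[int]) -> None:
    """Model of: move.w d1,(a0) / addq.w #4,a0 (register words already in place)."""
    a0 = 2  # assumption: a0 starts on the data word of the first copper MOVE
    for colour in colours:
        struct.pack_into(">H", copper, a0, colour)  # 16-bit big-endian write
        a0 += 4  # skip over the next register word


def write_longs(copper: bytearray, colours: list[int], reg: int = COLOR00) -> None:
    """Model of: move.l d1,(a0)+ with register and colour packed into d1."""
    a0 = 0
    for colour in colours:
        d1 = (reg << 16) | colour  # register word high, colour word low
        struct.pack_into(">I", copper, a0, d1)  # 32-bit big-endian write
        a0 += 4  # the (a0)+ post-increment


colours = [0x0000, 0x0F00, 0x0FFF]

# The word variant relies on pre-filled register words; the long variant rewrites them.
by_words = bytearray(struct.pack(">H", COLOR00) + b"\x00\x00") * len(colours)
by_longs = bytearray(4 * len(colours))
write_words(by_words, colours)
write_longs(by_longs, colours)
print(by_words == by_longs)
```

Under those assumptions both buffers end up identical; the only difference is whether the register words are written once up front or rewritten on every pass.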
Old 01 January 2017, 04:58   #4
Pat the Cat
Looks like it, according to:

http://oldwww.nvg.ntnu.no/amiga/MC68...000timing.HTML

Longwords take twice as long to write, as the data bus is only 16 bits wide on a 68000, so the write has to happen twice. Whereas the addq #4 to an address register is done very quickly, quicker than writing a word to a memory address. The second write needed for a longword is going to slow things down some. Shouldn't it be addq #2, to increment the address pointer?

Like I said, it does depend to a certain extent on where the code runs. It's not called "fast" RAM for nothing.

The HRM (Hardware Reference Manual) explains those issues, with bus contention in chip RAM, a lot more clearly than I can.
Old 01 January 2017, 05:25   #5
Galahad/FLT
As I'm sure you're aware, copperlists can only reside in chip mem.
Old 01 January 2017, 11:00   #6
Toni Wilen
WinUAE developer
 
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 43
Posts: 22,023
move.w dn,(an) takes 8 cycles (prefetch and write).
addaq #x,an (addq to an address register, word or long) takes 8 cycles (prefetch and idle).

move.l dn,(an)+ takes 12 cycles (two writes and prefetch).

Because both variants begin with the same cycle usage (3 back-to-back memory accesses), move.l can never be slower, even if DMA steals cycles.
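
As a rough cross-check of these figures, the per-iteration totals can be added up in a few lines of Python. The cycle counts are the ones quoted in this post; the dictionary layout and the function name are only illustrative, and DMA contention is deliberately ignored.

```python
# Nominal 68000 cycle counts as quoted above (no DMA contention modelled).
CYCLES = {
    "move.w dn,(an)": 8,    # prefetch + one word write
    "addq #x,an": 8,        # prefetch + idle cycles
    "move.l dn,(an)+": 12,  # two word writes + prefetch
}


def total_cycles(instructions: list[str], repeats: int = 1) -> int:
    """Sum the nominal cycles of an instruction sequence over `repeats` passes."""
    return repeats * sum(CYCLES[name] for name in instructions)


word_variant = ["move.w dn,(an)", "addq #x,an"]
long_variant = ["move.l dn,(an)+"]

print(total_cycles(word_variant))  # 16 cycles per iteration
print(total_cycles(long_variant))  # 12 cycles per iteration
```

On these numbers the longword version saves 4 nominal cycles per copperlist entry before any bus contention is considered.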
Old 01 January 2017, 11:19   #7
Galahad/FLT
Lovely......
Old 01 January 2017, 16:20   #8
LaBodilsen
 
Posts: n/a
Toni is of course right.

I tested this:

rept 8000
move.w d1,(a0)
addq.w #4,a0
endr

and

rept 8000
move.l d1,(a0)+
endr

and the latter was about 16 scanlines faster.

Even with 6 bitplane DMA, the latter is still faster. Funny, as I would have thought the first was faster.

Last edited by LaBodilsen; 01 January 2017 at 16:27.
 
Old 01 January 2017, 16:34   #9
Toni Wilen
A possible reason for the confusion is that addq.w #x,Dn is 4 cycles but addaq.w #x,An is 8 cycles. Address register operations are always long-wide, which needs more than one internal operation (the ALU is only 16-bit), and any word values need to be internally sign-extended to long first.
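
The sign-extension step can be illustrated with a short Python model. This is only a sketch of the arithmetic, not of the 68000's internals, and the function names are hypothetical.

```python
def sign_extend_16(value: int) -> int:
    """Interpret the low 16 bits of `value` as a signed word."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def add_word_to_address(an: int, word: int) -> int:
    """Add a word operand to a 32-bit address register: the word is
    sign-extended first and the whole register is updated."""
    return (an + sign_extend_16(word)) & 0xFFFFFFFF


print(hex(add_word_to_address(0x00020000, 0x0004)))  # 0x20004
print(hex(add_word_to_address(0x00020000, 0x8000)))  # 0x18000: $8000 acts as -32768
```

With addq's range of 1 to 8 the sign never comes into play, but the addition still covers the full 32-bit register.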
Old 01 January 2017, 17:31   #10
Pat the Cat
Banned

 
Join Date: Dec 2016
Location: Nottingham, UK
Posts: 481
Quote:
Originally Posted by LaBodilsen View Post
Toni is of course right.

I tested this:

rept 8000
move.w d1,(a0)
addq.w #4,a0
endr

and

rept 8000
move.l d1,(a0)+
endr

and the latter was about 16 scanlines faster.

Even with 6 bitplane DMA, the latter is still faster. Funny, as I would have thought the first was faster.
Yes, that's the real test... write code that loops lots, and time them.

Turn interlace on, as well as 6 bitplanes (EHB or HAM), and try playing sound at the same time, and the CPU crawls if running code from chip RAM. At least, it always did in my code.

Quote:
Originally Posted by Toni Wilen View Post
A possible reason for the confusion is that addq.w #x,Dn is 4 cycles but addaq.w #x,An is 8 cycles. Address register operations are always long-wide, which needs more than one internal operation (the ALU is only 16-bit), and any word values need to be internally sign-extended to long first.
Ah... That's where I got tangled.

Because you can only ADDQ values of 1 to 8, I thought they translated as separate opcodes, which would mean the operation takes place entirely inside the CPU, with no bus access required.

If the CPU has to read the value to be added, that slows down the operation... but... even more...

... The fact remains that, to move the same amount of data (word versus longword), you really have to add a second write to the first set of code anyway!

Last edited by Pat the Cat; 01 January 2017 at 17:39.
Old 01 January 2017, 18:23   #11
Toni Wilen
Quote:
Originally Posted by Pat the Cat View Post
Yes, that's the real test... write code that loops lots, and time them.
This is now "basic" knowledge and practically fully known; there is no need for testing anymore.

Quote:
Because you can only ADDQ values of 1 to 8, I thought they translated as separate opcodes, which would mean the operation takes place entirely inside the CPU, with no bus access required.
They are separate opcodes (add.w #x,An is even slower due to the extra memory read). The answer is still the same: the 68000 is internally 16-bit, so 32-bit operations mean multiple 16-bit operations.
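
To put the options side by side, the sketch below tabulates the nominal cost per copperlist entry for three loop bodies. The 8 and 12 cycle figures for move and addq are the ones quoted earlier in the thread; the 12 cycles for add.w #x,An comes from the standard 68000 timing tables rather than from this thread, so treat it as an assumption. DMA contention is not modelled.

```python
# Nominal 68000 cycle counts; DMA contention is not modelled.
MOVE_W_DN_AN = 8        # move.w dn,(an), quoted in the thread
MOVE_L_DN_AN_POST = 12  # move.l dn,(an)+, quoted in the thread
ADDQ_AN = 8             # addq #x,an, quoted in the thread
ADD_W_IMM_AN = 12       # add.w #x,an: assumed, includes the extra immediate fetch

LOOP_BODIES = {
    "move.w + addq #4,a0": MOVE_W_DN_AN + ADDQ_AN,
    "move.w + add.w #4,a0": MOVE_W_DN_AN + ADD_W_IMM_AN,
    "move.l d1,(a0)+": MOVE_L_DN_AN_POST,
}

# Print the loop bodies from cheapest to most expensive.
for body, cycles in sorted(LOOP_BODIES.items(), key=lambda item: item[1]):
    print(f"{body:<22} {cycles:>2} cycles per entry")
```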
 


All times are GMT +2.

