68000 optimisation

Galahad/FLT · 19 June 2016, 13:48

Need to optimize something, but don't want to waste a lot of effort on parts that won't yield proper benefits.

Code in question is lots of ADDA.L #$X,Ax, especially in loops.

Am I going to get much of a saving for the processor if I change them all to
LEA x(Ax),Ax instead?

ReadOnlyCat · 19 June 2016, 14:46

Quote:

Originally Posted by Galahad/FLT

Need to optimize something, but don't want to waste a lot of effort on parts that won't yield proper benefits.

Code in question is lots of ADDA.L #$X,Ax, especially in loops.

Am I going to get much of a saving for the processor if I change them all to
LEA x(Ax),Ax instead?

According to http://oldwww.nvg.ntnu.no/amiga/MC68...timjmpetc.HTML, LEA would take the same number of cycles in this case, that is 8.
Even ADDQ is not faster on address registers, but it would be on data registers (4 cycles).

What kind of values do you add to these registers? Maybe there is a way to use (Ax)+ instead? It would help if you had an example with the initial setting of the data registers and the loop.
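To illustrate the (Ax)+ idea: when the step equals the size of the access (1, 2 or 4 bytes), the postincrement addressing mode updates the pointer as part of the access itself. A minimal sketch, assuming a hypothetical loop that sums consecutive words through a0. The routine names, register choices, table size and Devpac/vasm-style syntax are all made up for illustration, and the cycle counts are the nominal 68000 table timings without DMA contention.

```asm
; Hypothetical routines: sum 16 consecutive words starting at 'table'.

SumSlow:
        lea     table(pc),a0    ; a0 -> first word
        moveq   #16-1,d7        ; dbf exits after 16 passes
        moveq   #0,d0           ; running sum
.loop   add.w   (a0),d0         ; 8 cycles
        addq.l  #2,a0           ; 8 more cycles just to step the pointer
        dbf     d7,.loop
        rts

SumFast:
        lea     table(pc),a0
        moveq   #16-1,d7
        moveq   #0,d0
.loop   add.w   (a0)+,d0        ; 8 cycles, a0 += 2 is part of the access
        dbf     d7,.loop
        rts

table:  dcb.w   16,1            ; 16 words of dummy data
```

This only helps when the pointer is stepped by the operand size; larger or irregular steps still need an explicit add.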

Don_Adan · 19 June 2016, 15:03

Quote:

Originally Posted by ReadOnlyCat

According to http://oldwww.nvg.ntnu.no/amiga/MC68...timjmpetc.HTML, LEA would take the same number of cycles in this case, that is 8.
Even ADDQ is not faster on address registers, but it would be on data registers (4 cycles).

What kind of values do you add to these registers? Maybe there is a way to use (Ax)+ instead? It would help if you had an example with the initial setting of the data registers and the loop.

Lea is faster than adda.l on the 68000.
He can use add.w, the same speed as lea, or addq.l for the shortest code.

ReadOnlyCat · 19 June 2016, 15:19

Quote:

Originally Posted by Don_Adan

Lea is faster than adda.l on the 68000.
He can use add.w, the same speed as lea, or addq.l for the shortest code.

Are you sure? Both are listed as 8 cycles on the site I linked to.

Toni Wilen · 19 June 2016, 15:38

add.l #x,reg can't be 8 cycles. Count the number of memory fetches needed..

ReadOnlyCat · 19 June 2016, 16:06

Quote:

Originally Posted by Toni Wilen

add.l #x,reg can't be 8 cycles. Count the number of memory fetches needed..

Damn, what a dummy. I read the cycle count but forgot to add operation timing + effective address computation.

Indeed, just fetching the operands would take 8 cycles by itself.
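Counting it out, as a sketch: every word the 68000 reads from memory costs at least 4 clocks, so the instruction length alone sets a floor on the timing. The register a0 and the constant 1000 below are arbitrary illustrations; the totals are the nominal figures from the Motorola timing tables and assume no wait states or DMA contention.

```asm
; 32-bit immediate: opcode + two extension words = 3 word fetches
        adda.l  #1000,a0        ; 6 bytes, 16 cycles (3 reads)

; 16-bit immediate, sign-extended to 32 bits before the add
        adda.w  #1000,a0        ; 4 bytes, 12 cycles (2 reads)

; 16-bit displacement: opcode + one extension word = 2 word fetches
        lea     1000(a0),a0     ; 4 bytes,  8 cycles (2 reads)
```

The lea form is only a valid replacement while the constant fits a signed 16-bit displacement (-32768 to 32767); larger steps still need the long immediate or a register operand.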

Lesson of the day: do not surf the EAB before a morning shower.

Asman · 19 June 2016, 21:38

@Galahad/FLT
Please post more code lines.
If you have a spare dx register, I would load it with moveq #x,dx before the loop and then, in the loop, use
add.l dx,a0
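A sketch of how that could look, assuming a hypothetical loop that steps a0 by 40 bytes per pass. The routine name, the buffer, the step, the use of d1 as the spare register and the clr.w standing in for the real work are all invented for illustration; cycle counts are nominal table timings.

```asm
; Hypothetical loop: touch one word per 40-byte row, 200 rows.
ClearColumn:
        lea     buffer,a0       ; a0 -> start of buffer
        moveq   #40,d1          ; step, loaded once outside the loop
        move.w  #200-1,d7       ; 200 passes
.loop   clr.w   (a0)            ; placeholder for the real work
        adda.l  d1,a0           ; 2 bytes, 8 cycles (1 read)
        dbf     d7,.loop
        rts

buffer: ds.b    40*200          ; scratch area for the example
```

moveq only covers -128 to 127 (sign-extended to 32 bits); for larger steps the register would have to be loaded with move.w or move.l instead, which costs nothing extra inside the loop.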

Photon · 17 July 2016, 17:57

On 68000, there's no faster way than lea d16(An),An.

Not even addq (but you will save 2 bytes of code).

Toni Wilen · 17 July 2016, 19:03

addq.l #x,an can be faster than lea because it is a memory cycle + 2x idle cycle combination (lea is 2x memory cycle), so DMA can use the second cycle without slowing down the CPU.

Photon · 20 August 2016, 00:29

Quote:

Originally Posted by Toni Wilen

addq.l #x,an can be faster than lea because it is a memory cycle + 2x idle cycle combination (lea is 2x memory cycle), so DMA can use the second cycle without slowing down the CPU.

Of course, however you will need more than 4 bitplanes on, or a combination of bitplanes and the eccentric BLTPRI=0 mode (or equally eccentric minterms). Otherwise you won't get memory access (MA) cycles untimely stolen-/granted-or-not and the CPU is simply either locked out or not.

Lea is a normal MA for the instruction, and an MA for the offset, while addq is a normal MA for the instruction barring prefetch, followed by a 2 cycle internal operation which affects nothing but the CPU internal state. Naturally you should cut down on MA where possible, but correct blits and not hampering the CPU is the larger optimization.
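Side by side, the difference in bus usage looks like this. A sketch only: a0 and the step of 4 are arbitrary, and the figures are the nominal 68000 table timings, not measurements under any particular Amiga DMA load.

```asm
; Both take 8 clocks on an uncontended bus; they differ in bus usage.
        lea     4(a0),a0        ; 4 bytes: 2 word reads
        addq.l  #4,a0           ; 2 bytes: 1 word read, rest is internal
```

addq only covers steps of 1 to 8, so it cannot replace lea for larger displacements.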

I was about to post something about this doublespeed addq anomaly in WinUAE vs. real Amiga, but I tried it in emu now and saw you fixed that.

Good work
