English Amiga Board


Old 01 November 2020, 13:57   #1
Ernst Blofeld
<optimized out>
 
Join Date: Sep 2020
Location: <optimized out>
Posts: 321
General question about efficiency and WORDs vs LONGs

I've moved on to the next step of my little toy project and I'm now drawing polygons. I've rasterized them using Bresenham, storing the edge points in an array, and I'm drawing lines between matching points, currently using a LineTo function that calls my SetPixel function over and over.

I know that's bad, and I've started writing a DrawLineFaster function for the special case of horizontal lines. This will write either a WORD or a LONG at a time. My gut feeling is that LONGs will be quicker, same number of writes to memory but only half as many instruction fetches.

The tradeoff, as far as I can see, is just that the lookup tables I think I'll need for the bits at the start and end that don't fall on WORD / LONG boundaries will have to be twice as big for the LONG version.

Does this sound sensible, or does it sound like I've not understood something?

Edit: OCS, standard Amiga 500, deliberately writing in C only, deliberately not using the blitter.

Last edited by Ernst Blofeld; 01 November 2020 at 14:48.
Old 01 November 2020, 14:43   #2
Samurai_Crow
Total Chaos forever!
 
Join Date: Aug 2007
Location: Waterville, MN, USA
Age: 49
Posts: 2,186
You haven't told us your system configuration. An AGA machine has 32-bit busses. OCS and ECS chipsets have 16-bit Chip RAM. While the blitter-based polygon filler is not as fast as a modern system, filling multiple polygons in one step can help it gain some time.
Old 01 November 2020, 14:48   #3
Ernst Blofeld
<optimized out>
 
Join Date: Sep 2020
Location: <optimized out>
Posts: 321
OCS, standard Amiga 500, deliberately writing in C only, deliberately not using the blitter.
Old 01 November 2020, 14:55   #4
Samurai_Crow
Total Chaos forever!
 
Join Date: Aug 2007
Location: Waterville, MN, USA
Age: 49
Posts: 2,186
The 32 bit write won't be any faster or slower than 16 bit writes because the Chip RAM bus accesses give the same number of 16 bit writes regardless of the size being written to them and will delay the second word just as long as if there was another instruction there.
Old 01 November 2020, 15:12   #5
Ernst Blofeld
<optimized out>
 
Join Date: Sep 2020
Location: <optimized out>
Posts: 321
Ok, thanks, I'll stick to words.
Old 01 November 2020, 15:41   #6
chb
Registered User
 
Join Date: Dec 2014
Location: germany
Posts: 439
Quote:
Originally Posted by Ernst Blofeld View Post
I know that's bad, and I've started writing a DrawLineFaster function for the special case of horizontal lines. This will write either a WORD or a LONG at a time. My gut feeling is that LONGs will be quicker, same number of writes to memory but only half as many instruction fetches.
Yes, correct IMHO.

Quote:
Originally Posted by Samurai_Crow View Post
The 32 bit write won't be any faster or slower than 16 bit writes because the Chip RAM bus accesses give the same number of 16 bit writes regardless of the size being written to them and will delay the second word just as long as if there was another instruction there.
No, because the number of instructions fetched is lower, as EB wrote. To write out two words from a register (a typical solid polygon span filler), the following
Code:
move.l d0,(a0)+
needs only 75% of the memory accesses (1 fetch and 2 data writes, 3 in total) compared to
Code:
move.w d0,(a0)+
move.w d0,(a0)+
which needs 2 fetches and 2 data writes (4 in total).
The fastest way would be to use the movem instruction for long horizontal lines; then the difference in instruction fetch overhead between word and long also vanishes. But I do not know how you can trick the compiler into using movem.
Old 01 November 2020, 15:59   #7
Samurai_Crow
Total Chaos forever!
 
Join Date: Aug 2007
Location: Waterville, MN, USA
Age: 49
Posts: 2,186
I guess it depends on whether your code is held in Chip RAM.
Old 01 November 2020, 16:02   #8
Ernst Blofeld
<optimized out>
 
Join Date: Sep 2020
Location: <optimized out>
Posts: 321
There will also be the loop control instructions, which will be executed twice as often for words as for longs, and the special case of the start and end being the same address won't happen as often with words. I know I can unroll the loop into a switch statement with fall-throughs to remove the loop control instructions, but I don't know how significant the special case will be.
Old 01 November 2020, 17:50   #9
roondar
Registered User
 
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,410
As far as I've been able to tell, using longwords is faster than using words for 32 bit writes, even on the A500. It doesn't appear to matter if the code is in chip memory or fast memory, though if DMA is really busy and code is in fast memory the speed difference does diminish.

There are two reasons for longwords being faster. The first is, as both you and chb pointed out, that instruction fetches take time. The second is that fast memory does not actually make the 68000 run faster; it merely stops it being slowed down under heavy DMA use (such as more than 4 bitplanes on the screen). What helps to understand why longwords are still faster in that case is that any slowdown due to chip memory being busy affects word and longword writes equally, but the longword writes require fewer instruction fetches and so still end up faster.

For reference, here are the cycle counts and memory access counts:
Code:
68000 move instruction cycle use / memory accesses
cycles instruction               memory accesses
  8    move.w d0,(a0)+            2 (1r/1w)
 12    move.l d0,(a0)+            3 (1r/2w)
 64    movem.w d0-d7/a1-a6,(a0)  16 (2r/14w)
120    movem.l d0-d7/a1-a6,(a0)  30 (2r/28w)

Note that the movem.l is still slightly faster than the movem.w for the same 
amount of memory, though it's not by much.
Now, it is certainly possible that there are scenarios in which longword reads/writes to memory are less efficient than word ones, but I've not found them myself to date.
Old 01 November 2020, 18:08   #10
Ernst Blofeld
<optimized out>
 
Join Date: Sep 2020
Location: <optimized out>
Posts: 321
Thanks everyone, I've now written it and it seems to work. Not surprisingly, it is at least 10x the speed of the version that set each pixel individually.

Code:
/* leftEnd[n]: pixels n..31 of a longword set (pixel 0 is the most significant bit). */
static const ULONG leftEnd [] = {
    0xffffffff, 0x7fffffff, 0x3fffffff, 0x1fffffff, 0x0fffffff, 0x07ffffff, 0x03ffffff, 0x01ffffff,
    0x00ffffff, 0x007fffff, 0x003fffff, 0x001fffff, 0x000fffff, 0x0007ffff, 0x0003ffff, 0x0001ffff,
    0x0000ffff, 0x00007fff, 0x00003fff, 0x00001fff, 0x00000fff, 0x000007ff, 0x000003ff, 0x000001ff,
    0x000000ff, 0x0000007f, 0x0000003f, 0x0000001f, 0x0000000f, 0x00000007, 0x00000003, 0x00000001
};

/* rightEnd[n]: pixels 0..n of a longword set. */
static const ULONG rightEnd [] = {
    0x80000000, 0xc0000000, 0xe0000000, 0xf0000000, 0xf8000000, 0xfc000000, 0xfe000000, 0xff000000,
    0xff800000, 0xffc00000, 0xffe00000, 0xfff00000, 0xfff80000, 0xfffc0000, 0xfffe0000, 0xffff0000,
    0xffff8000, 0xffffc000, 0xffffe000, 0xfffff000, 0xfffff800, 0xfffffc00, 0xfffffe00, 0xffffff00,
    0xffffff80, 0xffffffc0, 0xffffffe0, 0xfffffff0, 0xfffffff8, 0xfffffffc, 0xfffffffe, 0xffffffff
};

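/* Draws row y from x0 to x1 inclusive in pen.colour, one loop iteration per bitplane.
   Expects x0 <= x1; the switches handle spans touching at most 12 longwords. */
static void DrawHorizontalLine(const UWORD y, const UWORD x0, const UWORD x1) {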
	UWORD start = x0 >> 5;
	UWORD end = x1 >> 5;

	ULONG left = leftEnd[x0 & 0x001f];
	ULONG right = rightEnd[x1 & 0x001f];

	ULONG * p = ((ULONG *) currentBuffer) + y * ROW_SIZE_IN_LONGS + start;

	if (start == end) {
		ULONG m = left & right;

		for (UWORD i = 1; i < DISPLAY_NUM_COLOURS; i += i, p += DISPLAY_WIDTH_IN_LONGS) {
			if (pen.colour & i)
				*p |= m;
			else
				*p &= ~m;
		}
	} else {
		for (UWORD i = 1; i < DISPLAY_NUM_COLOURS; i += i, p += DISPLAY_WIDTH_IN_LONGS) {
			ULONG * q = p;

			if (pen.colour & i) {
				*q++ |= left;
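				/* Unrolled fill of the whole longwords between the two ends; the cases fall through on purpose. */
				switch (end - start) {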
					case 11: *q++ = 0xffffffff;
					case 10: *q++ = 0xffffffff;
					case  9: *q++ = 0xffffffff;
					case  8: *q++ = 0xffffffff;
					case  7: *q++ = 0xffffffff;
					case  6: *q++ = 0xffffffff;
					case  5: *q++ = 0xffffffff;
					case  4: *q++ = 0xffffffff;
					case  3: *q++ = 0xffffffff;
					case  2: *q++ = 0xffffffff;
				}
				*q |= right;
			} else {
				*q++ &= ~left;
				switch (end - start) {
					case 11: *q++ = 0x00000000;
					case 10: *q++ = 0x00000000;
					case  9: *q++ = 0x00000000;
					case  8: *q++ = 0x00000000;
					case  7: *q++ = 0x00000000;
					case  6: *q++ = 0x00000000;
					case  5: *q++ = 0x00000000;
					case  4: *q++ = 0x00000000;
					case  3: *q++ = 0x00000000;
					case  2: *q++ = 0x00000000;
				}
				*q &= ~right;
			}
		}
	}
}

Last edited by Ernst Blofeld; 01 November 2020 at 19:25.
Old 01 November 2020, 20:43   #11
alkis
Registered User
 
Join Date: Dec 2010
Location: Athens/Greece
Age: 53
Posts: 719
I don't know if you are already aware, but the source code for Deluxe Paint I has been released.
Maybe you could pick up an idea or two from there.
https://computerhistory.org/blog/ele...ly-source-code
 

