01 November 2020, 13:57 | #1 |
<optimized out>
Join Date: Sep 2020
Location: <optimized out>
Posts: 321
|
General question about efficiency and WORDs vs LONGs
I've moved onto the next step of my little toy project and I'm now drawing polygons. I've rasterized them using Bresenham storing the edge points in an array and I'm drawing lines between matching points, currently using a LineTo function that calls my SetPixel function over and over.
I know that's bad, and I've started writing a DrawLineFaster function for the special case of horizontal lines. This will write either a WORD or a LONG at a time. My gut feeling is that LONGs will be quicker: same number of writes to memory, but only half as many instruction fetches. The tradeoff, as far as I can see, is just that the lookup tables I think will be needed for the bits at the start and end that don't fit within the WORD / LONG boundaries will have to be twice as big for the LONG version. Does this sound sensible, or does it sound like I've not understood something? Edit: OCS, standard Amiga 500, deliberately writing in C only, deliberately not using the blitter. Last edited by Ernst Blofeld; 01 November 2020 at 14:48. |
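For readers following along, the edge masks described in the question can also be computed with shifts instead of being read from tables. The sketch below is only an illustration: the function names are hypothetical, and it assumes bit 31 (or bit 15) is the leftmost pixel, as on Amiga bitplanes. On a 68000 the cost of a shift grows with the shift count, so a table lookup may still be the faster choice; that would need measuring.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers, not from the thread.
   x is the pixel position inside the LONG/WORD, 0 = leftmost (MSB). */

/* Pixels x..31 set: mask for the first LONG of a span. */
static uint32_t left_mask32(unsigned x)
{
    assert(x < 32u);
    return 0xFFFFFFFFu >> x;
}

/* Pixels 0..x set: mask for the last LONG of a span. */
static uint32_t right_mask32(unsigned x)
{
    assert(x < 32u);
    return 0xFFFFFFFFu << (31u - x);
}

/* The same two masks for WORD-sized writes. */
static uint16_t left_mask16(unsigned x)
{
    assert(x < 16u);
    return (uint16_t)(0xFFFFu >> x);
}

static uint16_t right_mask16(unsigned x)
{
    assert(x < 16u);
    return (uint16_t)(0xFFFFu << (15u - x));
}
```

A span that starts and ends inside the same LONG uses `left_mask32(x0 & 31) & right_mask32(x1 & 31)`. As tables, the WORD version needs two tables of 16 two-byte entries and the LONG version two tables of 32 four-byte entries.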
01 November 2020, 14:43 | #2 |
Total Chaos forever!
Join Date: Aug 2007
Location: Waterville, MN, USA
Age: 49
Posts: 2,186
|
You haven't told us your system configuration. An AGA machine has 32-bit busses. OCS and ECS chipsets have 16-bit Chip RAM. While the blitter-based polygon filler is not as fast as a modern system, filling multiple polygons in one step can help it gain some time.
|
01 November 2020, 14:48 | #3 | |
<optimized out>
Join Date: Sep 2020
Location: <optimized out>
Posts: 321
|
Quote:
|
|
01 November 2020, 14:55 | #4 |
Total Chaos forever!
Join Date: Aug 2007
Location: Waterville, MN, USA
Age: 49
Posts: 2,186
|
A 32 bit write won't be any faster or slower than two 16 bit writes, because Chip RAM is accessed 16 bits at a time: the bus performs the same number of 16 bit writes regardless of the operand size, and the second word is delayed just as long as if there were another instruction there.
|
01 November 2020, 15:12 | #5 | |
<optimized out>
Join Date: Sep 2020
Location: <optimized out>
Posts: 321
|
Quote:
|
|
01 November 2020, 15:41 | #6 | ||
Registered User
Join Date: Dec 2014
Location: germany
Posts: 439
|
Quote:
Quote:
Code:
move.l d0,(a0)+
Code:
move.w d0,(a0)+
move.w d0,(a0)+
The fastest way would be using the movem instruction for long horizontal lines - then the instruction fetch overhead word vs. long also vanishes. But I do not know how you can trick the compiler into using movem. |
||
01 November 2020, 15:59 | #7 | |
Total Chaos forever!
Join Date: Aug 2007
Location: Waterville, MN, USA
Age: 49
Posts: 2,186
|
Quote:
|
|
01 November 2020, 16:02 | #8 | |
<optimized out>
Join Date: Sep 2020
Location: <optimized out>
Posts: 321
|
Quote:
|
|
01 November 2020, 17:50 | #9 |
Registered User
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,410
|
As far as I've been able to tell, using longwords is faster than using words for 32 bit writes, even on the A500. It doesn't appear to matter if the code is in chip memory or fast memory, though if DMA is really busy and code is in fast memory the speed difference does diminish.
There are two reasons for longwords being faster. The first is, as both you and chb pointed out, that instruction fetching takes time. The second is that fast memory does not actually make the 68000 run faster; it merely doesn't slow it down in case of heavy DMA use (such as more than 4 bitplanes on the screen). What helps to understand why longwords are still faster in that case is that any slowdown due to chip memory being busy will affect both word and longword writes equally, but the longword writes require fewer instruction fetches and so still end up faster. For reference, here are the cycle counts and memory access counts:
Code:
68000 move instruction cycle use / memory accesses
cycles   instruction                  memory accesses
     8   move.w  d0,(a0)+              2 (1r/1w)
    12   move.l  d0,(a0)+              3 (1r/2w)
    64   movem.w d0-d7/a1-a6,(a0)     16 (2r/14w)
   120   movem.l d0-d7/a1-a6,(a0)     30 (2r/28w)
Note that the movem.l is still slightly faster than the movem.w for the same amount of memory, though it's not by much. |
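To make the comparison concrete, the figures in the table can be turned into a small worked calculation. This is only a sketch: the function and constant names are made up for illustration, and it ignores loop overhead, the cost of loading the registers, address adjustment between movem instructions, and any DMA contention. It assumes fully unrolled code and a byte count that divides evenly.

```c
#include <assert.h>

/* Cycle counts taken from the table above (68000, no wait states). */
enum {
    CYCLES_MOVE_W   = 8,    /* move.w d0,(a0)+ writes 2 bytes            */
    CYCLES_MOVE_L   = 12,   /* move.l d0,(a0)+ writes 4 bytes            */
    CYCLES_MOVEM_W  = 64,   /* movem.w, 14 registers, writes 28 bytes    */
    CYCLES_MOVEM_L  = 120   /* movem.l, 14 registers, writes 56 bytes    */
};

/* Hypothetical helpers: cycles needed to write `bytes` bytes. */
static unsigned cycles_move_w(unsigned bytes)
{
    assert(bytes % 2u == 0u);
    return (bytes / 2u) * CYCLES_MOVE_W;
}

static unsigned cycles_move_l(unsigned bytes)
{
    assert(bytes % 4u == 0u);
    return (bytes / 4u) * CYCLES_MOVE_L;
}

static unsigned cycles_movem_w(unsigned bytes)
{
    assert(bytes % 28u == 0u);
    return (bytes / 28u) * CYCLES_MOVEM_W;
}

static unsigned cycles_movem_l(unsigned bytes)
{
    assert(bytes % 56u == 0u);
    return (bytes / 56u) * CYCLES_MOVEM_L;
}
```

For 56 bytes (14 longwords) this gives 224 cycles with move.w, 168 with move.l, 128 with two movem.w and 120 with one movem.l, which matches the ordering described above.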
01 November 2020, 18:08 | #10 |
<optimized out>
Join Date: Sep 2020
Location: <optimized out>
Posts: 321
|
Thanks everyone, I've now written it and it seems to work. Not surprisingly it is at least 10 x the speed of the version that set each pixel individually.
Code:
/* Mask for the first LONG of a span, indexed by x0 & 31 (bit 31 is the leftmost pixel). */
static const ULONG leftEnd [] =
{
    0xffffffff, 0x7fffffff, 0x3fffffff, 0x1fffffff,
    0x0fffffff, 0x07ffffff, 0x03ffffff, 0x01ffffff,
    0x00ffffff, 0x007fffff, 0x003fffff, 0x001fffff,
    0x000fffff, 0x0007ffff, 0x0003ffff, 0x0001ffff,
    0x0000ffff, 0x00007fff, 0x00003fff, 0x00001fff,
    0x00000fff, 0x000007ff, 0x000003ff, 0x000001ff,
    0x000000ff, 0x0000007f, 0x0000003f, 0x0000001f,
    0x0000000f, 0x00000007, 0x00000003, 0x00000001
};

/* Mask for the last LONG of a span, indexed by x1 & 31. */
static const ULONG rightEnd [] =
{
    0x80000000, 0xc0000000, 0xe0000000, 0xf0000000,
    0xf8000000, 0xfc000000, 0xfe000000, 0xff000000,
    0xff800000, 0xffc00000, 0xffe00000, 0xfff00000,
    0xfff80000, 0xfffc0000, 0xfffe0000, 0xffff0000,
    0xffff8000, 0xffffc000, 0xffffe000, 0xfffff000,
    0xfffff800, 0xfffffc00, 0xfffffe00, 0xffffff00,
    0xffffff80, 0xffffffc0, 0xffffffe0, 0xfffffff0,
    0xfffffff8, 0xfffffffc, 0xfffffffe, 0xffffffff
};

static void DrawHorizontalLine(const UWORD y, const UWORD x0, const UWORD x1)
{
    UWORD start = x0 >> 5;
    UWORD end = x1 >> 5;
    ULONG left = leftEnd[x0 & 0x001f];
    ULONG right = rightEnd[x1 & 0x001f];
    ULONG * p = ((ULONG *) currentBuffer) + y * ROW_SIZE_IN_LONGS + start;

    if (start == end)
    {
        /* Whole span lies inside one LONG. */
        ULONG m = left & right;
        for (UWORD i = 1; i < DISPLAY_NUM_COLOURS; i += i, p += DISPLAY_WIDTH_IN_LONGS)
        {
            if (pen.colour & i)
                *p |= m;
            else
                *p &= ~m;
        }
    }
    else
    {
        /* One pass per bitplane: i is the colour bit for that plane. */
        for (UWORD i = 1; i < DISPLAY_NUM_COLOURS; i += i, p += DISPLAY_WIDTH_IN_LONGS)
        {
            ULONG * q = p;
            if (pen.colour & i)
            {
                *q++ |= left;
                /* Every case deliberately falls through: (end - start - 1) full LONGs are written. */
                switch (end - start)
                {
                    case 11: *q++ = 0xffffffff;
                    case 10: *q++ = 0xffffffff;
                    case 9: *q++ = 0xffffffff;
                    case 8: *q++ = 0xffffffff;
                    case 7: *q++ = 0xffffffff;
                    case 6: *q++ = 0xffffffff;
                    case 5: *q++ = 0xffffffff;
                    case 4: *q++ = 0xffffffff;
                    case 3: *q++ = 0xffffffff;
                    case 2: *q++ = 0xffffffff;
                }
                *q |= right;
            }
            else
            {
                *q++ &= ~left;
                /* Same deliberate fall-through, clearing instead of setting. */
                switch (end - start)
                {
                    case 11: *q++ = 0x00000000;
                    case 10: *q++ = 0x00000000;
                    case 9: *q++ = 0x00000000;
                    case 8: *q++ = 0x00000000;
                    case 7: *q++ = 0x00000000;
                    case 6: *q++ = 0x00000000;
                    case 5: *q++ = 0x00000000;
                    case 4: *q++ = 0x00000000;
                    case 3: *q++ = 0x00000000;
                    case 2: *q++ = 0x00000000;
                }
                *q &= ~right;
            }
        }
    }
}
Last edited by Ernst Blofeld; 01 November 2020 at 19:25. |
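One way to gain confidence in a masked span fill like the one above is to compare it against the slow per-pixel approach on a host machine. The sketch below is not the poster's code: `fill_span`, `set_pixel`, `spans_match_reference` and `WIDTH_IN_LONGS` are hypothetical names, it handles a single bitplane only, it computes the masks with shifts instead of tables, and it uses a plain loop where the original uses an unrolled switch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define WIDTH_IN_LONGS 10u  /* assumption: one 320-pixel-wide bitplane row */

/* Reference: set one pixel; bit 31 of each LONG is the leftmost pixel. */
static void set_pixel(uint32_t *row, unsigned x)
{
    row[x >> 5] |= 0x80000000u >> (x & 31u);
}

/* Fast path: set pixels x0..x1 inclusive using edge masks. */
static void fill_span(uint32_t *row, unsigned x0, unsigned x1)
{
    unsigned start = x0 >> 5;
    unsigned end = x1 >> 5;
    uint32_t left = 0xFFFFFFFFu >> (x0 & 31u);
    uint32_t right = 0xFFFFFFFFu << (31u - (x1 & 31u));

    assert(x0 <= x1 && x1 < WIDTH_IN_LONGS * 32u);

    if (start == end) {
        row[start] |= left & right;   /* span fits in one LONG */
        return;
    }
    row[start] |= left;               /* partial first LONG */
    for (unsigned i = start + 1u; i < end; i++)
        row[i] = 0xFFFFFFFFu;         /* full LONGs in the middle */
    row[end] |= right;                /* partial last LONG */
}

/* Returns 1 if fill_span agrees with per-pixel drawing for every span. */
static int spans_match_reference(void)
{
    uint32_t fast[WIDTH_IN_LONGS];
    uint32_t slow[WIDTH_IN_LONGS];
    const unsigned width = WIDTH_IN_LONGS * 32u;

    for (unsigned x0 = 0; x0 < width; x0++) {
        for (unsigned x1 = x0; x1 < width; x1++) {
            memset(fast, 0, sizeof fast);
            memset(slow, 0, sizeof slow);
            fill_span(fast, x0, x1);
            for (unsigned x = x0; x <= x1; x++)
                set_pixel(slow, x);
            if (memcmp(fast, slow, sizeof fast) != 0)
                return 0;
        }
    }
    return 1;
}
```

Calling `spans_match_reference()` once from a test program exercises every start and end position, including spans that sit inside one LONG and spans that cross several.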
01 November 2020, 20:43 | #11 |
Registered User
Join Date: Dec 2010
Location: Athens/Greece
Age: 53
Posts: 719
|
I don't know if you are already aware, but the source code of Deluxe Paint I has been released.
Maybe you could pick up an idea or two from there. https://computerhistory.org/blog/ele...ly-source-code |