English Amiga Board


Go Back   English Amiga Board > Coders > Coders. General

 
 
Thread Tools
Old 21 November 2019, 17:14   #1
TCH
Newbie Amiga programmer

 
Join Date: Jun 2012
Location: Front of my A500+
Age: 33
Posts: 165
XOR fill optimization and "zig-zag" horizontal edges

I wrote a XOR fill routine in C, but i have two problems with it.

First, all horizontal edges are becoming "zig-zag" (obviously, since it will toggle on each sequential high bits). Is there any "standard" method to avoid that? The naive approach, what would search for the end of the sequence would be very slow.

Second, the filling is also very slow. I know, it should be rewritten in assembly, but this is only the PoC code; i'll do that later, when the algorithm is done. Now, i would like to speed up the algorithm itself. For example, is there any arithmetical/logical trick to spare the innermost cycle what iterates over the bits of the current word?

Here is the code:
Code:
dstptr16 = (UWORD *)TempArea;
while (th-- > 0)
{
	t = w;
	pattern16 = 0x0000;
	while (t-- > 0)
	{
		read16 = *dstptr16;
		mask16 = 0x8000;
		fill16 = 0x0000;
		while (mask16 != 0)
		{
			if ((read16 & mask16) != 0)
			{
				pattern16 = ~pattern16;
			}
			fill16 |= pattern16 & mask16;
			mask16 >>= 1;
		}
		*dstptr16++ = fill16;
	}
	dstptr16 -= w;
	dstptr16 += RowSize;
}

Last edited by TCH; 21 November 2019 at 17:16. Reason: remove tab
TCH is offline  
Old 21 November 2019, 19:04   #2
a/b
Registered User

 
Join Date: Jun 2016
Location: europe
Posts: 155
I'd use lookup tables, 128K for 16-data and 64K for mode toggle (similar to blitter fill carry bit). If that's too much, then 256B for 8-bit data. First and last word/byte you can handle individually in a loop, or mask the table data (similar to blitter first/last work mask).

Other than that (no tables at all)... I'd split the loop into 3 parts: lead in to get to a word aligned boundary, middle part where I'd check if *dstptr16 is 0 and if so I'd fill/skip all 16 bits in a single operation, and lead out for the remaining non-aligned bits.

If you are filling entire lines (e.g. 0-319), then it's all aligned and no need for first/last word stuff.
a/b is offline  
Old 21 November 2019, 19:41   #3
TCH
Newbie Amiga programmer

 
Join Date: Jun 2012
Location: Front of my A500+
Age: 33
Posts: 165
Somehow i forgot about the tables as a solution; thank you, for reminding me. The 16-bit table indeed would be too much, but the 8-bit one will work.

As for the second option, i am already done with that, actually that was my first approach. But it is only good for convex polygons, or concave ones which are wider than 16 pixels. If i have two or more edges in a single read, then i still have to iterate the bits.

So, i think i go for the split table solution now, but i am still open to suggestions.
TCH is offline  
Old 22 November 2019, 13:18   #4
TCH
Newbie Amiga programmer

 
Join Date: Jun 2012
Location: Front of my A500+
Age: 33
Posts: 165
So, i've ended up with these two times two tables:
Code:
const UBYTE xorfill_fill_pattern[2][256] =
{
	{
		0b00000000, 0b00000001, 0b00000011, 0b00000011, 0b00000111, 0b00000111, 0b00000110, 0b00000111, 0b00001111, 0b00001111, 0b00001110, 0b00001111, 0b00001100, 0b00001101, 0b00001111, 0b00001111,
		0b00011111, 0b00011111, 0b00011110, 0b00011111, 0b00011100, 0b00011101, 0b00011111, 0b00011111, 0b00011000, 0b00011001, 0b00011011, 0b00011011, 0b00011111, 0b00011111, 0b00011110, 0b00011111,
		0b00111111, 0b00111111, 0b00111110, 0b00111111, 0b00111100, 0b00111101, 0b00111111, 0b00111111, 0b00111000, 0b00111001, 0b00111011, 0b00111011, 0b00111111, 0b00111111, 0b00111110, 0b00111111,
		0b00110000, 0b00110001, 0b00110011, 0b00110011, 0b00110111, 0b00110111, 0b00110110, 0b00110111, 0b00111111, 0b00111111, 0b00111110, 0b00111111, 0b00111100, 0b00111101, 0b00111111, 0b00111111,
		0b01111111, 0b01111111, 0b01111110, 0b01111111, 0b01111100, 0b01111101, 0b01111111, 0b01111111, 0b01111000, 0b01111001, 0b01111011, 0b01111011, 0b01111111, 0b01111111, 0b01111110, 0b01111111,
		0b01110000, 0b01110001, 0b01110011, 0b01110011, 0b01110111, 0b01110111, 0b01110110, 0b01110111, 0b01111111, 0b01111111, 0b01111110, 0b01111111, 0b01111100, 0b01111101, 0b01111111, 0b01111111,
		0b01100000, 0b01100001, 0b01100011, 0b01100011, 0b01100111, 0b01100111, 0b01100110, 0b01100111, 0b01101111, 0b01101111, 0b01101110, 0b01101111, 0b01101100, 0b01101101, 0b01101111, 0b01101111,
		0b01111111, 0b01111111, 0b01111110, 0b01111111, 0b01111100, 0b01111101, 0b01111111, 0b01111111, 0b01111000, 0b01111001, 0b01111011, 0b01111011, 0b01111111, 0b01111111, 0b01111110, 0b01111111,
		0b11111111, 0b11111111, 0b11111110, 0b11111111, 0b11111100, 0b11111101, 0b11111111, 0b11111111, 0b11111000, 0b11111001, 0b11111011, 0b11111011, 0b11111111, 0b11111111, 0b11111110, 0b11111111,
		0b11110000, 0b11110001, 0b11110011, 0b11110011, 0b11110111, 0b11110111, 0b11110110, 0b11110111, 0b11111111, 0b11111111, 0b11111110, 0b11111111, 0b11111100, 0b11111101, 0b11111111, 0b11111111,
		0b11100000, 0b11100001, 0b11100011, 0b11100011, 0b11100111, 0b11100111, 0b11100110, 0b11100111, 0b11101111, 0b11101111, 0b11101110, 0b11101111, 0b11101100, 0b11101101, 0b11101111, 0b11101111,
		0b11111111, 0b11111111, 0b11111110, 0b11111111, 0b11111100, 0b11111101, 0b11111111, 0b11111111, 0b11111000, 0b11111001, 0b11111011, 0b11111011, 0b11111111, 0b11111111, 0b11111110, 0b11111111,
		0b11000000, 0b11000001, 0b11000011, 0b11000011, 0b11000111, 0b11000111, 0b11000110, 0b11000111, 0b11001111, 0b11001111, 0b11001110, 0b11001111, 0b11001100, 0b11001101, 0b11001111, 0b11001111,
		0b11011111, 0b11011111, 0b11011110, 0b11011111, 0b11011100, 0b11011101, 0b11011111, 0b11011111, 0b11011000, 0b11011001, 0b11011011, 0b11011011, 0b11011111, 0b11011111, 0b11011110, 0b11011111,
		0b11111111, 0b11111111, 0b11111110, 0b11111111, 0b11111100, 0b11111101, 0b11111111, 0b11111111, 0b11111000, 0b11111001, 0b11111011, 0b11111011, 0b11111111, 0b11111111, 0b11111110, 0b11111111,
		0b11110000, 0b11110001, 0b11110011, 0b11110011, 0b11110111, 0b11110111, 0b11110110, 0b11110111, 0b11111111, 0b11111111, 0b11111110, 0b11111111, 0b11111100, 0b11111101, 0b11111111, 0b11111111
	},
	{
		0b11111111, 0b11111111, 0b11111110, 0b11111111, 0b11111100, 0b11111101, 0b11111111, 0b11111111, 0b11111000, 0b11111001, 0b11111011, 0b11111011, 0b11111111, 0b11111111, 0b11111110, 0b11111111,
		0b11110000, 0b11110001, 0b11110011, 0b11110011, 0b11110111, 0b11110111, 0b11110110, 0b11110111, 0b11111111, 0b11111111, 0b11111110, 0b11111111, 0b11111100, 0b11111101, 0b11111111, 0b11111111,
		0b11100000, 0b11100001, 0b11100011, 0b11100011, 0b11100111, 0b11100111, 0b11100110, 0b11100111, 0b11101111, 0b11101111, 0b11101110, 0b11101111, 0b11101100, 0b11101101, 0b11101111, 0b11101111,
		0b11111111, 0b11111111, 0b11111110, 0b11111111, 0b11111100, 0b11111101, 0b11111111, 0b11111111, 0b11111000, 0b11111001, 0b11111011, 0b11111011, 0b11111111, 0b11111111, 0b11111110, 0b11111111,
		0b11000000, 0b11000001, 0b11000011, 0b11000011, 0b11000111, 0b11000111, 0b11000110, 0b11000111, 0b11001111, 0b11001111, 0b11001110, 0b11001111, 0b11001100, 0b11001101, 0b11001111, 0b11001111,
		0b11011111, 0b11011111, 0b11011110, 0b11011111, 0b11011100, 0b11011101, 0b11011111, 0b11011111, 0b11011000, 0b11011001, 0b11011011, 0b11011011, 0b11011111, 0b11011111, 0b11011110, 0b11011111,
		0b11111111, 0b11111111, 0b11111110, 0b11111111, 0b11111100, 0b11111101, 0b11111111, 0b11111111, 0b11111000, 0b11111001, 0b11111011, 0b11111011, 0b11111111, 0b11111111, 0b11111110, 0b11111111,
		0b11110000, 0b11110001, 0b11110011, 0b11110011, 0b11110111, 0b11110111, 0b11110110, 0b11110111, 0b11111111, 0b11111111, 0b11111110, 0b11111111, 0b11111100, 0b11111101, 0b11111111, 0b11111111,
		0b10000000, 0b10000001, 0b10000011, 0b10000011, 0b10000111, 0b10000111, 0b10000110, 0b10000111, 0b10001111, 0b10001111, 0b10001110, 0b10001111, 0b10001100, 0b10001101, 0b10001111, 0b10001111,
		0b10011111, 0b10011111, 0b10011110, 0b10011111, 0b10011100, 0b10011101, 0b10011111, 0b10011111, 0b10011000, 0b10011001, 0b10011011, 0b10011011, 0b10011111, 0b10011111, 0b10011110, 0b10011111,
		0b10111111, 0b10111111, 0b10111110, 0b10111111, 0b10111100, 0b10111101, 0b10111111, 0b10111111, 0b10111000, 0b10111001, 0b10111011, 0b10111011, 0b10111111, 0b10111111, 0b10111110, 0b10111111,
		0b10110000, 0b10110001, 0b10110011, 0b10110011, 0b10110111, 0b10110111, 0b10110110, 0b10110111, 0b10111111, 0b10111111, 0b10111110, 0b10111111, 0b10111100, 0b10111101, 0b10111111, 0b10111111,
		0b11111111, 0b11111111, 0b11111110, 0b11111111, 0b11111100, 0b11111101, 0b11111111, 0b11111111, 0b11111000, 0b11111001, 0b11111011, 0b11111011, 0b11111111, 0b11111111, 0b11111110, 0b11111111,
		0b11110000, 0b11110001, 0b11110011, 0b11110011, 0b11110111, 0b11110111, 0b11110110, 0b11110111, 0b11111111, 0b11111111, 0b11111110, 0b11111111, 0b11111100, 0b11111101, 0b11111111, 0b11111111,
		0b11100000, 0b11100001, 0b11100011, 0b11100011, 0b11100111, 0b11100111, 0b11100110, 0b11100111, 0b11101111, 0b11101111, 0b11101110, 0b11101111, 0b11101100, 0b11101101, 0b11101111, 0b11101111,
		0b11111111, 0b11111111, 0b11111110, 0b11111111, 0b11111100, 0b11111101, 0b11111111, 0b11111111, 0b11111000, 0b11111001, 0b11111011, 0b11111011, 0b11111111, 0b11111111, 0b11111110, 0b11111111
	}
};

const BOOL xorfill_filling_change[2][256] =
{
	{
		FALSE,  TRUE,  TRUE, FALSE,  TRUE, FALSE, FALSE,  TRUE,  TRUE, FALSE, FALSE,  TRUE, FALSE,  TRUE,  TRUE, FALSE,
		 TRUE, FALSE, FALSE,  TRUE, FALSE,  TRUE,  TRUE, FALSE, FALSE,  TRUE,  TRUE, FALSE,  TRUE, FALSE, FALSE,  TRUE,
		 TRUE, FALSE, FALSE,  TRUE, FALSE,  TRUE,  TRUE, FALSE, FALSE,  TRUE,  TRUE, FALSE,  TRUE, FALSE, FALSE,  TRUE,
		FALSE,  TRUE,  TRUE, FALSE,  TRUE, FALSE, FALSE,  TRUE,  TRUE, FALSE, FALSE,  TRUE, FALSE,  TRUE,  TRUE, FALSE,
		 TRUE, FALSE, FALSE,  TRUE, FALSE,  TRUE,  TRUE, FALSE, FALSE,  TRUE,  TRUE, FALSE,  TRUE, FALSE, FALSE,  TRUE,
		FALSE,  TRUE,  TRUE, FALSE,  TRUE, FALSE, FALSE,  TRUE,  TRUE, FALSE, FALSE,  TRUE, FALSE,  TRUE,  TRUE, FALSE,
		FALSE,  TRUE,  TRUE, FALSE,  TRUE, FALSE, FALSE,  TRUE,  TRUE, FALSE, FALSE,  TRUE, FALSE,  TRUE,  TRUE, FALSE,
		 TRUE, FALSE, FALSE,  TRUE, FALSE,  TRUE,  TRUE, FALSE, FALSE,  TRUE,  TRUE, FALSE,  TRUE, FALSE, FALSE,  TRUE,
		 TRUE, FALSE, FALSE,  TRUE, FALSE,  TRUE,  TRUE, FALSE, FALSE,  TRUE,  TRUE, FALSE,  TRUE, FALSE, FALSE,  TRUE,
		FALSE,  TRUE,  TRUE, FALSE,  TRUE, FALSE, FALSE,  TRUE,  TRUE, FALSE, FALSE,  TRUE, FALSE,  TRUE,  TRUE, FALSE,
		FALSE,  TRUE,  TRUE, FALSE,  TRUE, FALSE, FALSE,  TRUE,  TRUE, FALSE, FALSE,  TRUE, FALSE,  TRUE,  TRUE, FALSE,
		 TRUE, FALSE, FALSE,  TRUE, FALSE,  TRUE,  TRUE, FALSE, FALSE,  TRUE,  TRUE, FALSE,  TRUE, FALSE, FALSE,  TRUE,
		FALSE,  TRUE,  TRUE, FALSE,  TRUE, FALSE, FALSE,  TRUE,  TRUE, FALSE, FALSE,  TRUE, FALSE,  TRUE,  TRUE, FALSE,
		 TRUE, FALSE, FALSE,  TRUE, FALSE,  TRUE,  TRUE, FALSE, FALSE,  TRUE,  TRUE, FALSE,  TRUE, FALSE, FALSE,  TRUE,
		 TRUE, FALSE, FALSE,  TRUE, FALSE,  TRUE,  TRUE, FALSE, FALSE,  TRUE,  TRUE, FALSE,  TRUE, FALSE, FALSE,  TRUE,
		FALSE,  TRUE,  TRUE, FALSE,  TRUE, FALSE, FALSE,  TRUE,  TRUE, FALSE, FALSE,  TRUE, FALSE,  TRUE,  TRUE, FALSE
	},
	{
		 TRUE, FALSE, FALSE,  TRUE, FALSE,  TRUE,  TRUE, FALSE, FALSE,  TRUE,  TRUE, FALSE,  TRUE, FALSE, FALSE,  TRUE,
		FALSE,  TRUE,  TRUE, FALSE,  TRUE, FALSE, FALSE,  TRUE,  TRUE, FALSE, FALSE,  TRUE, FALSE,  TRUE,  TRUE, FALSE,
		FALSE,  TRUE,  TRUE, FALSE,  TRUE, FALSE, FALSE,  TRUE,  TRUE, FALSE, FALSE,  TRUE, FALSE,  TRUE,  TRUE, FALSE,
		 TRUE, FALSE, FALSE,  TRUE, FALSE,  TRUE,  TRUE, FALSE, FALSE,  TRUE,  TRUE, FALSE,  TRUE, FALSE, FALSE,  TRUE,
		FALSE,  TRUE,  TRUE, FALSE,  TRUE, FALSE, FALSE,  TRUE,  TRUE, FALSE, FALSE,  TRUE, FALSE,  TRUE,  TRUE, FALSE,
		 TRUE, FALSE, FALSE,  TRUE, FALSE,  TRUE,  TRUE, FALSE, FALSE,  TRUE,  TRUE, FALSE,  TRUE, FALSE, FALSE,  TRUE,
		 TRUE, FALSE, FALSE,  TRUE, FALSE,  TRUE,  TRUE, FALSE, FALSE,  TRUE,  TRUE, FALSE,  TRUE, FALSE, FALSE,  TRUE,
		FALSE,  TRUE,  TRUE, FALSE,  TRUE, FALSE, FALSE,  TRUE,  TRUE, FALSE, FALSE,  TRUE, FALSE,  TRUE,  TRUE, FALSE,
		FALSE,  TRUE,  TRUE, FALSE,  TRUE, FALSE, FALSE,  TRUE,  TRUE, FALSE, FALSE,  TRUE, FALSE,  TRUE,  TRUE, FALSE,
		 TRUE, FALSE, FALSE,  TRUE, FALSE,  TRUE,  TRUE, FALSE, FALSE,  TRUE,  TRUE, FALSE,  TRUE, FALSE, FALSE,  TRUE,
		 TRUE, FALSE, FALSE,  TRUE, FALSE,  TRUE,  TRUE, FALSE, FALSE,  TRUE,  TRUE, FALSE,  TRUE, FALSE, FALSE,  TRUE,
		FALSE,  TRUE,  TRUE, FALSE,  TRUE, FALSE, FALSE,  TRUE,  TRUE, FALSE, FALSE,  TRUE, FALSE,  TRUE,  TRUE, FALSE,
		 TRUE, FALSE, FALSE,  TRUE, FALSE,  TRUE,  TRUE, FALSE, FALSE,  TRUE,  TRUE, FALSE,  TRUE, FALSE, FALSE,  TRUE,
		FALSE,  TRUE,  TRUE, FALSE,  TRUE, FALSE, FALSE,  TRUE,  TRUE, FALSE, FALSE,  TRUE, FALSE,  TRUE,  TRUE, FALSE,
		FALSE,  TRUE,  TRUE, FALSE,  TRUE, FALSE, FALSE,  TRUE,  TRUE, FALSE, FALSE,  TRUE, FALSE,  TRUE,  TRUE, FALSE,
		 TRUE, FALSE, FALSE,  TRUE, FALSE,  TRUE,  TRUE, FALSE, FALSE,  TRUE,  TRUE, FALSE,  TRUE, FALSE, FALSE,  TRUE
	},
};
And a new xorfiller code using these:
Code:
th = h;
while (th-- > 0)
{
	t = w;
	filling = FALSE;
	while (t-- > 0)
	{
		ft = *fillptr;
		*fillptr++ = xorfill_fill_pattern[filling][ft];
		filling = xorfill_filling_change[filling][ft];
	}
	fillptr += Stepper;
}
It turned out, that the pure 8-bit filling is faster by some percent than reading in and splitting 16-bit values (i guess the same would be true for the 32-bit version), so it ended up like this. I guess there is nothing what could give more speed to this one, is there?
TCH is offline  
Old 22 November 2019, 19:35   #5
a/b
Registered User

 
Join Date: Jun 2016
Location: europe
Posts: 155
1. Change counters from [x,1] to [x-1,0] in hope that your code will get optimized to use dbf. Maybe using pre-decrement (and >=) instead will also do the trick.

2. Unroll inner loop 8/16/32/... times (and adjust t accordingly).

3. Access to 2D tables, so double index, is generally slow because you have to calculate the offset instead of using a simple indexing mode (ax,dx.w).
It really depends what the compiler produces. Maybe doing some manual calcs instead would be better, like use 256 instead of TRUE (fill_change table is words, not bytes) and then (xorfill_fill_pattern+filling)[ft], and define both 2D tables as 1D with 512 entries. Or perhaps try to merge indices. In asm it's simple:
Code:
 moveq #0,d0  ; filling = false;
 moveq #w-1,d1   ; t = w-1
inner:
 move.b (a0),d0   ; merge filling and ft
 move.b (a1,d0.w),(a0)+  ; fill_pattern (byte table, 512 entries)
if 020+
 move.w (a2,d0.w*2),d0
else
 add.w d0,d0
 move.w (a2,d0.w),d0  ; fill_change (word table 0 or 256, 512 entries)
endif
 dbf d1,inner
...
EDIT: In c (context switch took too long ;\):
Code:
 filling += *fillptr;
 *fillptr++ = xorfill_fill_pattern[filling];
 filling = xorfill_filling_change[filling];

Last edited by a/b; 22 November 2019 at 19:50.
a/b is offline  
Old 22 November 2019, 23:47   #6
TCH
Newbie Amiga programmer

 
Join Date: Jun 2012
Location: Front of my A500+
Age: 33
Posts: 165
1. Thanks, it gained almost 1%. Well, "who does not appreciate the few, does not deserve the lot".

2. Unroll how? Variable
w
can contain anything from 1 to 40. If i unroll it for example 8 times, then the minimum of pixels i must copy will become 64.

3. Worked like a charm, 11% faster. Thank you.
TCH is offline  
Old 23 November 2019, 01:00   #7
Antiriad_UK
Registered User

 
Join Date: Mar 2019
Location: Birmingham, UK
Posts: 142
On my list of things I maybe need is a 68000 line draw and fill that works like blitter horizontal fill. Liking the info in this thread!
Antiriad_UK is offline  
Old 23 November 2019, 05:07   #8
a/b
Registered User

 
Join Date: Jun 2016
Location: europe
Posts: 155
Something like:
Code:
filling = 0;  t1 = w>>3;  t2 = w&7;  // unrolled 8 times
while (t1-- > 0)
  { // 8*8 = 64 pixels
  filling += *fillptr;  *fillptr++ = xorfill_fill_pattern[filling];  filling = xorfill_filling_change[filling];
  filling += *fillptr;  *fillptr++ = xorfill_fill_pattern[filling];  filling = xorfill_filling_change[filling];
  // repeat 6 more times...
  }
while (t2-- > 0)
  { filling += *fillptr;  *fillptr++ = xorfill_fill_pattern[filling];  filling = xorfill_filling_change[filling]; }
If there're mostly shorter spans to fill, maybe unrolling say 4 times (32 pixels) would be faster.
a/b is offline  
Old 23 November 2019, 12:16   #9
TCH
Newbie Amiga programmer

 
Join Date: Jun 2012
Location: Front of my A500+
Age: 33
Posts: 165
@Antiriad_UK:

Soon, the line drawing algorithm i use will be posted on EAB too.

@a/b:
Oh, so you meant unrolling in a way as
memcpy()
works if the bus is wider than 8-bit...i am such stupid, i should have realize that.

Thanks for the explanation, we gained some more speed with this approach. I thought it further and came up with this macro-cascade
Code:
#define t_fill1x() filling += *fillptr; *fillptr++ = xorfill_fill_pattern[filling]; filling = xorfill_filling_change[filling];
#define t_fill2x() t_fill1x(); t_fill1x();
#define t_fill4x() t_fill2x(); t_fill2x();
#define t_fill8x() t_fill4x(); t_fill4x();
#define t_fill16x() t_fill8x(); t_fill8x();
#define t_fill32x() t_fill16x(); t_fill16x();
#define filling_cycle(NUM) t = tw##NUM; while (--t >= 0) { t_fill##NUM##x(); }
and this code
Code:
fillptr = TempArea;
th = h;
tw32 = w >> 5;
tw16 = (w >> 4) & 1;
tw8 = (w >> 3) & 1;
tw4 = (w >> 2) & 1;
tw2 = (w >> 1) & 1;
tw1 = w & 1;
while (--th >= 0)
{
	filling = 0x0000;
	filling_cycle(32);
	filling_cycle(16);
	filling_cycle(8);
	filling_cycle(4);
	filling_cycle(2);
	filling_cycle(1);
	fillptr += Stepper;
}
While this means around 1.4 kB of unrolled code, the gained 7.6% of speed worth it, i thinks. Thank you.
TCH is offline  
Old 23 November 2019, 14:04   #10
hmn
Registered User

 
Join Date: Nov 2016
Location: DE
Posts: 16
Could you combine the tables? That would spare one array access.

Code:
// entry: (filling << 8) | pattern
extern UWORD lut[512];			
[...]
filllpat = lut[(fillpat & 0xff00) | *fillptr];
*fillptr++ = fillpat & 0xff;
Compiles to (GCC 6, -O2 -S):

Code:
	and.l #65280,%d0
	or.b (%a0),%d0
	add.l %d0,%d0
	move.w (%a2,%d0.l),%d0
	move.b %d0,(%a0)+
Quick hand optimization:
Code:
	move.b (%a0),%d0
	add.w %d0,%d0
	move.w (%a2,%d0.l),%d0
	move.b %d0,(%a0)+
DISCLAIMER: Just an idea - I have not verified any of this code.
hmn is offline  
Old 23 November 2019, 14:42   #11
TCH
Newbie Amiga programmer

 
Join Date: Jun 2012
Location: Front of my A500+
Age: 33
Posts: 165
That unfortunately only made it ~4% slower.
TCH is offline  
Old 23 November 2019, 17:35   #12
a/b
Registered User

 
Join Date: Jun 2016
Location: europe
Posts: 155
Macro approch above will bomb on 020/030 because of small icache size (256B).
You can also replace while(--t >= 0) with if(t) or if(tw##NUM) and remove t, since t is either 0 or 1. On faster CPUs and in asm a simple btst #x,dy (where dy = w) would probably be as fast too (and no set-up code required for tw vars).
a/b is offline  
Old 23 November 2019, 18:06   #13
TCH
Newbie Amiga programmer

 
Join Date: Jun 2012
Location: Front of my A500+
Age: 33
Posts: 165
Thank you for pointing out about the cache. However, it means, that i have to keep the
while
cycle on the top level. Changed macros to this
Code:
#define t_fill1x() filling += *fillptr; *fillptr++ = xorfill_fill_pattern[filling]; filling = xorfill_filling_change[filling];
#define t_fill2x() t_fill1x(); t_fill1x();
#define t_fill4x() t_fill2x(); t_fill2x();
#define t_fill8x() t_fill4x(); t_fill4x();
#define filling_fork(NUM) if (tw##NUM != 0) { t_fill##NUM##x(); }
and code to this
Code:
fillptr = TempArea;
th = h;
tw8 = w >> 3;
tw4 = w & 4;
tw2 = w & 2;
tw1 = w & 1;
while (--th >= 0)
{
	filling = 0x0000;
	t = tw8;
	while (--t >= 0)
	{
		t_fill8x();
	}
	filling_fork(4);
	filling_fork(2);
	filling_fork(1);
	fillptr += Stepper;
}
According to my benchmarks, it did not impacted the speed at all.

Last edited by TCH; 24 November 2019 at 00:42. Reason: fixing typo
TCH is offline  
Old 24 November 2019, 08:17   #14
deimos
Registered User

 
Join Date: Jul 2018
Location: Londonish / UK
Posts: 489
Quote:
Originally Posted by TCH View Post
I wrote a XOR fill routine in C
Can I ask why you chose not to use a scan line fill instead?
deimos is offline  
Old 24 November 2019, 11:10   #15
TCH
Newbie Amiga programmer

 
Join Date: Jun 2012
Location: Front of my A500+
Age: 33
Posts: 165
You mean i iterate through a list which contains y, x0, x1 coordinates and fill up scanlines between x0 and x1 on line y? Well, i already tried that, but to get a list like that, i had to add all the dots' coordinates which were generated by the line drawing algorithm into a big multi dimensional list and then order them, remove duplicates, etc. It was very slow.

Or what do you mean by scan line fill?
TCH is offline  
Old 24 November 2019, 11:22   #16
deimos
Registered User

 
Join Date: Jul 2018
Location: Londonish / UK
Posts: 489
Quote:
Originally Posted by TCH View Post
You mean i iterate through a list which contains y, x0, x1 coordinates and fill up scanlines between x0 and x1 on line y? Well, i already tried that, but to get a list like that, i had to add all the dots' coordinates which were generated by the line drawing algorithm into a big multi dimensional list and then order them, remove duplicates, etc. It was very slow.

Or what do you mean by scan line fill?
That is what I mean by scan line fill, but I didn't find it necessary to order points or remove duplicates when I did mine. I think the only difference is that you'd have to split your concave polygons.
deimos is offline  
Old 24 November 2019, 11:54   #17
TCH
Newbie Amiga programmer

 
Join Date: Jun 2012
Location: Front of my A500+
Age: 33
Posts: 165
Well, yes, you are right, if you only have convex polygons, you'll always have only two dots per line. This did not occurred in my mind when i tried to do scanfills, thank you for pointing out.

However i have no idea, how to split them up, so it seems i have to read some code.
TCH is offline  
 


Currently Active Users Viewing This Thread: 1 (0 members and 1 guests)
 
Thread Tools

Similar Threads
Thread Thread Starter Forum Replies Last Post
"Diabolik" & "Dylan Dog" & "Tex" & "Time Runners" series DamienD request.Old Rare Games 11 11 November 2019 09:30
"Voices8" 8 Channel Soundtracker "DemoSongI" song - "This is the Amiga with 8 Voices" DemosongIHunter request.Music 44 27 October 2019 20:04
"Screech!! v2.41" & "Screech!! [AGA] v2.51" - "HD install" --> "ADFs" DamienD request.Old Rare Games 39 03 April 2019 23:53
"Reminder "Lincs Amiga User Group aka "LAG" Meet Sat 5th of January 2013" rockape News 4 30 January 2013 01:06
Glitchy horizontal lines in Vanish's "Monoxide" demo plus list of older reports Maren support.WinUAE 34 07 January 2011 13:53

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off

Forum Jump


All times are GMT +2. The time now is 02:19.


Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2019, vBulletin Solutions Inc.
Page generated in 0.08188 seconds with 14 queries