Approaching tactics of softblitting - Page 2

Samurai_Crow · 09 January 2022, 22:23

When an image needs shifting, there is part of an image that needs to be made transparent pixels because the left and right edges have no source images being shifted in on one side and only image portions shifted in but no source on the other. Meynaf is referring to those cases as corner cases, in part because shifting bits in 68000 Assembly requires special care when shifting more than 16 bits in either direction. The C compiler generates that code for you.

An interleaved bitplane display uses horizontal modulo registers to allow the bitplane rows to be stacked in memory vertically. On OCS that severely limits the display width because the modulo registers can skip a maximum of 1024 bits from one row to the next. That means if you have a 5 bitplane display, the maximum width a display can be is 256 pixels. Using a shallower palette depth helps with that by reducing the number of bitplanes to skip using the modulo. Also, ECS has a 15-bit modulo instead of 10-bit so it can handle much wider displays with this configuration.

The way an interleaved display looks in memory is row 0 bitplane 0 is followed in memory by row 0 bitplane 1, followed by row 0 bitplane 2, up to row 0 bitplane d-1 where d is the screen depth. After that, you start over with row 1 for all bitplanes, then row 2 for all bitplanes, all the way up to row h-1 for all bitplanes where h is the display height.
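
The layout described above can be put into a small C sketch. The function names are hypothetical, `bytes_per_row` is the size of one row of one bitplane, and the non-interleaved variant assumes the planes are allocated back to back, which is an assumption and not something the thread states.

```c
#include <stddef.h>

/* Interleaved: the rows of all planes alternate, so plane `plane` of
   row `y` sits (y * depth + plane) rows after the start of the bitmap. */
size_t interleaved_offset(size_t bytes_per_row, size_t depth,
                          size_t y, size_t plane)
{
    return (y * depth + plane) * bytes_per_row;
}

/* Non-interleaved (assumed back-to-back planes): each plane is one
   block of `height` rows, followed by the next plane. */
size_t planar_offset(size_t bytes_per_row, size_t height,
                     size_t y, size_t plane)
{
    return (plane * height + y) * bytes_per_row;
}
```

For a 320 pixel wide display (40 bytes per row) with 5 bitplanes, row 0 of plane 1 starts 40 bytes in when interleaved, and row 1 of plane 0 starts 200 bytes in.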

The reason for interleaved bitplanes is blitting speed: all bitplanes can be processed as one tall bitplane. The disadvantage is that an interleaved "cookie-cutter" masked blit requires the mask plane to be duplicated in height for all bitplanes to get the speed advantage, thus costing a lot of chip memory.

TCH · 09 January 2022, 22:49

Quote:

Originally Posted by Samurai_Crow

When an image needs shifting, there is part of an image that needs to be made transparent pixels because the left and right edges have no source images being shifted in on one side and only image portions shifted in but no source on the other. Meynaf is referring to those cases as corner cases, in part because shifting bits in 68000 Assembly requires special care when shifting more than 16 bits in either direction. The C compiler generates that code for you.

Okay, i get it now, a mask of ones at the top is needed, because shifting the mask will shift in zeroes and wrecks transparency. Thank you both.
However, when i asked about handling the edges separately i did not mean this. Currently my code iterates over all the rows, but within a row it handles the left and right edges outside the inner loop, because unlike the rest, they need one read/write. I don't know whether this is the correct way to handle it or whether it can be handled another way.

Quote:

Originally Posted by Samurai_Crow

An interleaved bitplane display uses horizontal modulo registers to allow the bitplane rows to be stacked in memory vertically. On OCS that severely limits the display width because the modulo registers can skip a maximum of 1024 bits from one row to the next. That means if you have a 5 bitplane display, the maximum width a display can be is 256 pixels. Using a shallower palette depth helps with that by reducing the number of bitplanes to skip using the modulo. Also, ECS has a 15-bit modulo instead of 10-bit so it can handle much wider displays with this configuration.

Okay, so the Amiga can handle interleaved displays too. I did not know that. But now i could search it up and found SA_Interleaved in the manual of OpenScreen(). Thanks for shedding the light.

Quote:

Originally Posted by Samurai_Crow

The way an interleaved display looks in memory is row 0 bitplane 0 is followed in memory by row 0 bitplane 1, followed by row 0 bitplane 2, up to row 0 bitplane d-1 where d is the screen depth. After that, you start over with row 1 for all bitplanes, then row 2 for all bitplanes, all the way up to row h-1 for all bitplanes where h is the display height.

I knew what an interleaved display is; what i totally did not get is that the Blitter supports both.

Quote:

Originally Posted by Samurai_Crow

The reason for interleaved bitplanes is blitting speed: all bitplanes can be processed as one tall bitplane. The disadvantage is that an interleaved "cookie-cutter" masked blit requires the mask plane to be duplicated in height for all bitplanes to get the speed advantage, thus costing a lot of chip memory.

And that the screen can be only 256 pixels wide on OCS if i want to blit like this?

roondar · 09 January 2022, 23:01

Quote:

Originally Posted by TCH

And that the screen can be only 256 pixels wide on OCS if i want to blit like this?

You can blit on screens up to 1008 pixels wide on OCS systems and using the new ECS/AGA Blitter registers up to 32768 pixels wide on those. Interleaved vs non-interleaved blitting does not change these maximums, so the screen can be pretty much as wide as you like.

TCH · 09 January 2022, 23:09

If the type of blitting does not change that, then what did this mean?

Quote:

Originally Posted by Samurai_Crow

An interleaved bitplane display uses horizontal modulo registers to allow the bitplane rows to be stacked in memory vertically. On OCS that severely limits the display width because the modulo registers can skip a maximum of 1024 bits from one row to the next. That means if you have a 5 bitplane display, the maximum width a display can be is 256 pixels. Using a shallower palette depth helps with that by reducing the number of bitplanes to skip using the modulo. Also, ECS has a 15-bit modulo instead of 10-bit so it can handle much wider displays with this configuration.

Either i miss something trivial, or the terminology i use differ from the conventional...

roondar · 09 January 2022, 23:32

Quote:

Originally Posted by TCH

If the type of blitting does not change that, then what did this mean? Either i miss something trivial, or the terminology i use differ from the conventional...

Ah, I hadn't read that part.. Well, that info is not correct, the Blitter & Bitplane Modulo values are signed 16-bit values measured in bytes, meaning you can skip up to 32768 bytes per line (on OCS), which is far more than 1024 bits/pixels.

Samurai_Crow · 10 January 2022, 00:53

Quote:

Originally Posted by roondar

Ah, I hadn't read that part.. Well, that info is not correct, the Blitter & Bitplane Modulo values are signed 16-bit values measured in bytes, meaning you can skip up to 32768 bytes per line (on OCS), which is far more than 1024 bits/pixels.

Are you sure about that? It was one significant thing that was added in ECS that wasn't in OCS. Bitmaps bigger than 1024 pixels horizontally require ECS.

meynaf · 10 January 2022, 07:48

Quote:

Originally Posted by TCH

What do you mean by "adjust" and "overflow"?

It's simply about altering transparency mask to remove pixels out of target region, so that they are not blitted (if they were, part of the graphic would be visible out of the wanted region, i.e. overflow).

Quote:

Originally Posted by TCH

I cannot even imagine, how can i skip the handling of the left and right edge at each line...

You don't need to. But it's possible to just alter a few parameters so main loop is still used.
Something like this :
- init phase : setup for first word
- main loop
- exit phase : setup for last word, return to main loop for 1 more iteration
Perhaps this is just easier to do in asm than in C.
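
For what it's worth, the init / main loop / exit structure listed above can also be written in C. This is only a sketch under assumptions: `blit_row`, `first_mask` and `last_mask` are hypothetical names, the row is a plain masked copy without shifting, and `words` must be at least 1.

```c
#include <stdint.h>
#include <stddef.h>

/* Masked copy of one row of `words` 16-bit words. Destination bits
   outside first_mask (leftmost word) and last_mask (rightmost word)
   are preserved; interior words are copied whole. */
void blit_row(uint16_t *dst, const uint16_t *src, size_t words,
              uint16_t first_mask, uint16_t last_mask)
{
    uint16_t mask = first_mask;      /* init phase: set up for the first word */
    size_t count = words - 1;        /* main loop covers all words but the last */
    int exit_phase_done = 0;

    for (;;) {
        for (; count > 0; count--) { /* main loop */
            *dst = (uint16_t)((*dst & ~mask) | (*src & mask));
            dst++;
            src++;
            mask = 0xFFFF;           /* words after the first are written whole */
        }
        if (exit_phase_done)
            break;
        mask &= last_mask;           /* exit phase: set up for the last word ... */
        count = 1;                   /* ... and run the main loop one more time */
        exit_phase_done = 1;
    }
}
```

When the row is a single word, the main loop is skipped the first time round and both edge masks are combined for the one remaining iteration.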

roondar · 10 January 2022, 10:40

Quote:

Originally Posted by Samurai_Crow

Are you sure about that? It was one significant thing that was added in ECS that wasn't in OCS. Bitmaps bigger than 1024 pixels horizontally require ECS.

100% positive. The reason for the 1024 pixel limit is purely due to the BLTSIZE register only supporting a width of 64 words, it has nothing to do with the modulo values (which can go up much further). ECS & AGA update this with the BLTSIZV & BLTSIZH values.

So you can blit a maximum of 1024 pixels wide on OCS, but that can be on a bitmap wider than 1024 pixels due to use of modulos.
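
To make the 64 word limit concrete, here is a small C sketch of how an OCS BLTSIZE value is packed. The helper name `bltsize` is hypothetical; the field layout (height in bits 15-6, width in words in bits 5-0, with a field value of 0 meaning the maximum) is the documented register format.

```c
#include <assert.h>
#include <stdint.h>

/* Pack an OCS BLTSIZE value.
   bits 15-6: height in lines (1..1024, 1024 is stored as 0)
   bits  5-0: width in words  (1..64,   64   is stored as 0) */
uint16_t bltsize(unsigned height_lines, unsigned width_words)
{
    assert(height_lines >= 1 && height_lines <= 1024);
    assert(width_words >= 1 && width_words <= 64);
    return (uint16_t)(((height_lines & 0x3FF) << 6) | (width_words & 0x3F));
}
```

With only 6 bits for the width, 64 words (64 * 16 = 1024 pixels) is the widest single blit, regardless of how wide the bitmap itself is.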

TCH · 10 January 2022, 11:05

Quote:

Originally Posted by roondar

Ah, I hadn't read that part.. Well, that info is not correct, the Blitter & Bitplane Modulo values are signed 16-bit values measured in bytes, meaning you can skip up to 32768 bytes per line (on OCS), which is far more than 1024 bits/pixels.

Quote:

Originally Posted by roondar

100% positive. The reason for the 1024 pixel limit is purely due to the BLTSIZE register only supporting a width of 64 words, it has nothing to do with the modulo values (which can go up much further). ECS & AGA update this with the BLTSIZV & BLTSIZH values.

So you can blit a maximum of 1024 pixels wide on OCS, but that can be on a bitmap wider than 1024 pixels due to use of modulos.

Okay, now it's clear, thanks.

Also, you've mentioned utilizing the CPU for "clearing" on the previous page; what did you mean by that? Zeroing out everything in a square or AND-ing the mask there? (Also, does it gain performance in DPF mode only or does it in SPF mode too?)

Quote:

Originally Posted by meynaf

It's simply about altering transparency mask to remove pixels out of target region, so that they are not blitted (if they were, part of the graphic would be visible out of the wanted region, i.e. overflow).

Okay, so it's the shifting in ones from the left part, got it, thanks.

Quote:

Originally Posted by meynaf

You don't need to. But it's possible to just alter a few parameters so main loop is still used.
Something like this :
- init phase : setup for first word
- main loop
- exit phase : setup for last word, return to main loop for 1 more iteration
Perhaps this is just easier to do in asm than in C.

Well, my main loop looks like exactly this. Then i did not screw it up at all.

@topic:
So, to sum it up, the blitter is faster with interleaved blitting than continuous, even if it needs a trick to "duplicate" the mask for it?
Are any benchmarks, sources or tutorials available about that?

roondar · 10 January 2022, 11:12

Quote:

Originally Posted by TCH

Okay, now it's clear, thanks.

Also, you've mentioned utilizing the CPU for "clearing" on the previous page; what did you mean by that? Zeroing out everything in a square or AND-ing the mask there? (Also, does it gain performance in DPF mode only or does it in SPF mode too?)

I mean zeroing out everything in a square (though technically you can use any 16 bit pattern, so it doesn't need to be zero). This is faster if you do part of it with the CPU while you also run the Blitter to do the rest, because the Blitter only uses half of the available cycles when running in clear mode. The CPU can slot into the other half.

It is faster to do it this way in any display mode.

TCH · 10 January 2022, 11:21

Quote:

Originally Posted by roondar

I mean zeroing out everything in a square (though technically you can use any 16 bit pattern, so it doesn't need to be zero). This is faster if you do part of it with the CPU while you also run the Blitter to do the rest, because the Blitter only uses half of the available cycles when running in clear mode. The CPU can slot into the other half.

It is faster to do it this way in any display mode.

So, technically i can do the masking with the CPU and then i can blit with the Blitter as if there would not be any background? Or not masking, just overwriting it with a constant register?

roondar · 10 January 2022, 11:26

Quote:

Originally Posted by TCH

So, technically i can do the masking with the CPU and then i can blit with the Blitter as if there would not be any background? Or not masking, just overwriting it with a constant register?

This is only faster when you overwrite with a constant register and only works if the Blitter is doing the same operation (i.e. D channel only blit with BLTADAT set to the pattern you wish to use to clear).

Basically, on the 68000, any form of masking/copying is always much slower with the CPU than the Blitter. The clearing of data is an exception because you can take advantage of both the fact that the Blitter in that specific case only uses half the available cycles (for copy/cookie-cut this is not the case) and the fact that the 68000 can write constant values to memory using move.l or movem.l fairly quickly.

TCH · 10 January 2022, 11:37

The Blitter doing the same means, it is also doing clearing? So, just like in your tutorial, one half of the clearing is done by the Blitter and other half of the clearing is done by the CPU?

Is movem faster? Because for that, one would need to save a lot of registers. For instance this 68k code:

Code:

; a0 = pointer in bitplane #0 at x, y
; d0 = line length in longwords (>= 1)
; d1 = number of lines * number of bitplanes (>= 1)
; d2 = line modulo in bytes
; trashes: d1, d3, d4

zero_out:		moveq	#0,d3			; d3 = fill value
			subq.w	#1,d1			; dbra runs count+1 times
zero_out_0:		move.w	d0,d4
			subq.w	#1,d4
zero_out_1:		move.l	d3,(a0)+		; clear one longword
			dbra	d4,zero_out_1
			add.l	d2,a0			; skip to the next line
			dbra	d1,zero_out_0
			rts

Would it be slower or faster with movem, even counting the register saving?

(Also, is this the reason an interleaved approach is faster: only one modulo, not two?)

roondar · 10 January 2022, 11:52

Quote:

Originally Posted by TCH

The Blitter doing the same means, it is also doing clearing? So, just like in your tutorial, one half of the clearing is done by the Blitter and other half of the clearing is done by the CPU?

Basically, yes (though the exact split that performs best is probably not 50/50 on 68000 as the CPU has slightly more overhead, not sure off the top of my head what the best split is for clearing - best for you to experiment a bit, I'd say). Though most approaches using Blitter clearing I've seen so far tend to not split the blit by line but rather by bitplane.

Quote:

Is movem faster? Because for that, one would need to save a lot of registers. <<...>> Would be slower or faster with movem, even counting the register saving?

(Also, is this the reason of an interleaved approach is faster: only one modulo, not two?)

Whether or not movem.l is quicker depends on the size of the area you want to clear. For a big area like a full screen clear, it's almost certainly quicker. For something smaller like clearing space used by a bob, it's probably quicker to use move.l instead.

In general though, on 68000 performance is mostly gained by (partially) unrolling loops. For instance, if you know the area cleared will always be a multiple of 16 lines, it's normally best for performance to unroll the loop 16 times so that the amount of dbne's executed is as small as possible.
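
The unrolling idea carries over to C, although a compiler may or may not produce the same code a hand-written 68000 loop would. The sketch below makes several assumptions that are not from the thread: a fixed row width of 2 longwords, an unroll factor of 4 instead of 16, a line count that is a multiple of 4, and the hypothetical names `clear_line` and `clear_unrolled`.

```c
#include <stdint.h>
#include <stddef.h>

/* Write one row of 2 longwords (assumed fixed width) and step to the
   next row; stride_longs = row width plus modulo, in longwords. */
static uint32_t *clear_line(uint32_t *p, uint32_t pattern, size_t stride_longs)
{
    p[0] = pattern;
    p[1] = pattern;
    return p + stride_longs;
}

/* Clear `lines` rows, 4 rows per loop iteration, so the loop branch is
   taken once per 4 rows. `lines` must be a multiple of 4. */
void clear_unrolled(uint32_t *p, size_t lines, size_t stride_longs,
                    uint32_t pattern)
{
    for (size_t n = lines / 4; n > 0; n--) {
        p = clear_line(p, pattern, stride_longs);
        p = clear_line(p, pattern, stride_longs);
        p = clear_line(p, pattern, stride_longs);
        p = clear_line(p, pattern, stride_longs);
    }
}
```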

About interleaving: the reason for an interleaved approach being faster is normally that you only need to set up the expensive parts of the blit (calculate address/shift values, set up pointers) once over all the planes, rather than once per plane. Normally you'd not need an additional modulo for non-interleaved blitting, though. In fact, the modulo value for interleaved and non-interleaved blitting are normally the same. It's the height that changes
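
As a rough illustration of "same modulo, different height", the following C sketch computes the blit parameters for both layouts. `BlitPlan` and `plan_blit` are hypothetical names, and the sketch assumes a plain rectangular blit where `row_bytes` is the width of one bitplane row and `blit_bytes` is the width of the blitted area.

```c
/* Parameters for blitting a rectangle into all planes of a bitmap. */
typedef struct {
    unsigned blits;          /* number of separate blits to set up */
    unsigned lines_per_blit; /* height of each blit in lines */
    unsigned modulo_bytes;   /* bytes skipped after each blitted line */
} BlitPlan;

BlitPlan plan_blit(unsigned row_bytes, unsigned blit_bytes,
                   unsigned height, unsigned depth, int interleaved)
{
    BlitPlan p;
    p.modulo_bytes = row_bytes - blit_bytes;    /* identical for both layouts */
    p.blits = interleaved ? 1 : depth;          /* set up once, or once per plane */
    p.lines_per_blit = interleaved ? height * depth : height;
    return p;
}
```

For a 16 line, 4 byte wide object on a 40 byte wide, 5 plane bitmap, this gives one blit of 80 lines when interleaved and five blits of 16 lines otherwise, both with a modulo of 36.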

TCH · 10 January 2022, 12:27

Quote:

Originally Posted by roondar

Basically, yes (though the exact split that performs best is probably not 50/50 on 68000 as the CPU has slightly more overhead, not sure off the top of my head what the best split is for clearing - best for you to experiment a bit, I'd say).

...

Whether or not movem.l is quicker depends on the size of the area you want to clear. For a big area like a full screen clear, it's almost certainly quicker. For something smaller like clearing space used by a bob, it's probably quicker to use move.l instead.

In general though, on 68000 performance is mostly gained by (partially) unrolling loops. For instance, if you know the area cleared will always be a multiple of 16 lines, it's normally best for performance to unroll the loop 16 times so that the amount of dbne's executed is as small as possible.

I see, thank you for the suggestions, i'll keep them in mind, especially the unrolling part.

Quote:

Originally Posted by roondar

Though most approaches using Blitter clearing I've seen so far tend to not split the blit by line but rather by bitplane.

How so? If interleaved is faster, then why don't they use it?

Quote:

Originally Posted by roondar

About interleaving: the reason for an interleaved approach being faster is normally that you only need to set up the expensive parts of the blit (calculate address/shift values, set up pointers) once over all the planes, rather than once per plane. Normally you'd not need an additional modulo for non-interleaved blitting, though. In fact, the modulo value for interleaved and non-interleaved blitting are normally the same. It's the height that changes

I don't get it. Either i have something wrong, or i don't understand what you are saying. I need to clarify this, so i need to illustrate it:
AFAIK, for continuous blitting, depending on the approach, we would need either two modulos:

Code:

Continuous #1:
==============
******************************************************************
* Bitplane #0                                                    *
******************************************************************
|                                                                |
|                                                                |
|                    00000000 < + line modulo                    |
|                    00000000 < + line modulo                    |
|                    00000000 < + line modulo                    |
|                    00000000 < + plane modulo                   |
|                                                                |
|                                                                |
******************************************************************
* Bitplane #1                                                    *
******************************************************************
|                                                                |
|                                                                |
|                    11111111 < + line modulo                    |
|                    11111111 < + line modulo                    |
|                    11111111 < + line modulo                    |
|                    11111111 < + plane modulo                   |
|                                                                |
|                                                                |
******************************************************************
* Bitplane #2                                                    *
******************************************************************
|                                                                |
|                                                                |
|                    22222222 < + line modulo                    |
|                    22222222 < + line modulo                    |
|                    22222222 < + line modulo                    |
|                    22222222                                    |
|                                                                |
|                                                                |
******************************************************************

or if we iterate it by plane to line, then either two modulo or a restoring point, pointing to the next line:

Code:

Continuous #2:
==============
******************************************************************
* Bitplane #0                                                    *
******************************************************************
|                                                                |
|                                                                |
|                    00000000 < + plane modulo                   |
|                    00000000 < + plane modulo                   |
|                    00000000 < + plane modulo                   |
|                    00000000 < + plane modulo                   |
|                                                                |
|                                                                |
******************************************************************
* Bitplane #1                                                    *
******************************************************************
|                                                                |
|                                                                |
|                    11111111 < + plane modulo                   |
|                    11111111 < + plane modulo                   |
|                    11111111 < + plane modulo                   |
|                    11111111 < + plane modulo                   |
|                                                                |
|                                                                |
******************************************************************
* Bitplane #2                                                    *
******************************************************************
|                                                                |
|                                                                |
|                    22222222 < = next line / - modulo B         |
|                    22222222 < = next line / - modulo B         |
|                    22222222 < = next line / - modulo B         |
|                    22222222                                    |
|                                                                |
|                                                                |
******************************************************************

While interleaved, it's just always the next line:

Code:

Interleaved:
============
******************************************************************
|                                                                |
|                                                                |
|                                                                |
|                                                                |
|                                                                |
|                                                                |
|                    00000000 < + line modulo                    |
|                    11111111 < + line modulo                    |
|                    22222222 < + line modulo                    |
|                    00000000 < + line modulo                    |
|                    11111111 < + line modulo                    |
|                    22222222 < + line modulo                    |
|                    00000000 < + line modulo                    |
|                    11111111 < + line modulo                    |
|                    22222222 < + line modulo                    |
|                    00000000 < + line modulo                    |
|                    11111111 < + line modulo                    |
|                    22222222                                    |
|                                                                |
|                                                                |
|                                                                |
|                                                                |
|                                                                |
|                                                                |
******************************************************************

For me it seems, that the expensive parts (the handling of each line's beginning and end) are the same at every line, but i might be wrong. Where is the error in my approach?

roondar · 10 January 2022, 14:51

Right, I see what you mean...

The thing is, the Blitter doesn't have two modulos per channel, only one. So my translation into a soft-blitting approach also didn't. Instead, when blitting non-interleaved, you normally update the Blitter pointers between planes (i.e. you blit all lines of plane 1, set the pointers for plane 2, blit that, etc). I was assuming your soft-blitting code worked in a similar way.

But yes, on the CPU, you could use a second modulo to achieve the same, but you can also use separate calls using recalculated pointers.

TCH · 10 January 2022, 21:35

A-ha, okay, now i get it, thanks.

No, my algorithm was actually the "continuous #2" approach, based on what i read in this forum topic.
So, either i use a continuous display and then blit each bitplane by a separate call, or i use an interleaved one, but with height x bitplanes as the number of lines.

A last stupid question:
For the "stacked" mask with the "tall" interleaved blit, if i have this mask:

the "stacked" mask itself is needed to be in interleaved format too, right?
So it will look like this:

I hope i got that correctly.

roondar · 10 January 2022, 22:21

Quote:

Originally Posted by TCH

A-ha, okay, now i get it, thanks.

No, my algorithm was actually the "continuous #2" approach, based on what i read in this forum topic.
So, either i use a continuous display and then blit each bitplane by a separate call, or i use an interleaved one, but with height x bitplanes as the number of lines.

Note that there is nothing wrong with the approach you chose, it just wasn't what I had expected. If you want to use two modulos like that with the CPU blit code, well... Why not?

Quote:

A last stupid question:
For the "stacked" mask with the "tall" interleaved blit, if i have this mask:
<<...>>
the "stacked" mask itself is needed to be in interleaved format too, right?
So it will look like this:
<<...>>
I hope i got that correctly.

Yup, that's exactly right for blitting in interleaved mode with the Blitter. Note here that, of course, a CPU based approach can do things slightly differently and only load the mask once per (line*planes) and reuse the already loaded mask for planes beyond the first.
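
Since the mask pictures did not survive, here is a hedged C sketch of what "stacking" a mask for an interleaved Blitter blit amounts to: every mask row is repeated once per bitplane. The function name `stack_mask` and its parameters are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* Expand a single-plane mask of `height` rows into an interleaved
   "stacked" mask of height * depth rows: each mask row is written
   `depth` times in a row, matching the interleaved row order.
   `stacked` must have room for height * depth * words_per_row words. */
void stack_mask(uint16_t *stacked, const uint16_t *mask,
                size_t words_per_row, size_t height, size_t depth)
{
    for (size_t y = 0; y < height; y++)
        for (size_t plane = 0; plane < depth; plane++)
            for (size_t w = 0; w < words_per_row; w++)
                *stacked++ = mask[y * words_per_row + w];
}
```

A CPU blit does not need this copy at all: it can read row `y` of the original mask once and reuse it for every plane of that row.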

TCH · 11 January 2022, 18:33

Quote:

Originally Posted by roondar

Note that there is nothing wrong with the approach you chose, it just wasn't what I had expected. If you want to use two modulos like that with the CPU blit code, well... Why not?

Because it's slower than your approach.

I don't want to use two modulos, i'm just trying to figure out the fastest approach.

Quote:

Originally Posted by roondar

Yup, that's exactly right for blitting in interleaved mode with the Blitter. Note here that, of course, a CPU based approach can do things slightly differently and only load the mask once per (line*planes) and reuse the already loaded mask for planes beyond the first.

Yep, i know, the CPU can do anything we tell it.

This is a bit off here (hardblit question), but related: i've read, that the Blitter does the masking blit with the following formula:

DEST = (DEST & ~MASK) | (SRC & MASK)

I suspect that the answer is no, but is the double masking mandatory? If i have my source already masked and the mask already inverted, then can it be just simply:

DEST = (DEST & MASK) | SRC

?

roondar · 11 January 2022, 20:18

Quote:

Originally Posted by TCH

This is a bit off here (hardblit question), but related: i've read, that the Blitter does the masking blit with the following formula:

DEST = (DEST & ~MASK) | (SRC & MASK)

I suspect that the answer is no, but is the double masking mandatory? If i have my source already masked and the mask already inverted, then can it be just simply:

DEST = (DEST & MASK) | SRC

?

If the source is already masked out, then you can indeed do that. Mind you, given you still need 3 sources and one destination this would not speed up anything as the Blitter speed is primarily based on the number of channels enabled. In other words, keeping the same number of channels, but changing the minterm will not affect speed.
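
To see that both formulas use the same three inputs and differ only in the minterm, the C sketch below derives the minterm byte from each formula. It assumes the usual channel assignment (A = mask, B = source, C = destination) and the standard LF bit numbering, where bit n of the minterm is the output for the input combination with A in bit 2, B in bit 1 and C in bit 0 of n. The function names are hypothetical.

```c
#include <stdint.h>

typedef unsigned (*BlitFn)(unsigned a, unsigned b, unsigned c);

/* DEST = (DEST & ~MASK) | (SRC & MASK), with a = MASK */
unsigned cookie_cut(unsigned a, unsigned b, unsigned c)
{
    return (c & ~a) | (b & a);
}

/* DEST = (DEST & MASK) | SRC, with a = already inverted mask and
   b = source that has already been masked */
unsigned premasked(unsigned a, unsigned b, unsigned c)
{
    return (c & a) | b;
}

/* Build the 8-bit minterm by evaluating f on all 8 input combinations. */
uint8_t minterm(BlitFn f)
{
    uint8_t lf = 0;
    for (unsigned n = 0; n < 8; n++) {
        unsigned a = (n >> 2) & 1, b = (n >> 1) & 1, c = n & 1;
        if (f(a, b, c) & 1)
            lf |= (uint8_t)(1u << n);
    }
    return lf;
}
```

The first formula comes out as the familiar $CA cookie-cut minterm and the second as $EC. Both still read A, B and C and write D, which is why the change does not make the blit any faster.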
