English Amiga Board


Old 08 January 2022, 13:51   #1
TCH
Newbie Amiga programmer
 
Join Date: Jun 2012
Location: Front of my A500+
Age: 38
Posts: 372
Approaching tactics of softblitting

I would like to know the experts' opinion on this: what is the fastest approach in general?

- How should the blitting phases be done: mask and copy in one step or two?
- How should the blitting segments be done: edges (which do not align to word boundaries) separately from the inner parts, or at the same time?
- How should the blitting cycles be done: copy one plane at a time, or all the planes at the same time?
- How should the blitting flow be done: in columns (16 or 32 bits wide) or in rows?

I am not looking for exact solutions or sources, but for approaches, arguments, and maybe benchmarks.
Old 08 January 2022, 14:01   #2
Samurai_Crow
Total Chaos forever!
 
Join Date: Aug 2007
Location: Waterville, MN, USA
Age: 49
Posts: 2,186
The main advantage to soft blitting is that you can reuse the mask for all the bitplanes at once. Does this answer your question at least partially?
Old 08 January 2022, 14:11   #3
TCH
Newbie Amiga programmer
 
Join Date: Jun 2012
Location: Front of my A500+
Age: 38
Posts: 372
Two out of four. So all planes should be blitted at the same time with masking and copying in one step. Thanks.
Old 08 January 2022, 14:22   #4
Samurai_Crow
Total Chaos forever!
 
Join Date: Aug 2007
Location: Waterville, MN, USA
Age: 49
Posts: 2,186
Blitting columns of 32 bits mainly has an advantage on AGA and maybe the A3000. Even then it needs 32 bit alignment if at all possible. Register usage on the 020+ comes into play also. The blitter just does rows.
Old 08 January 2022, 14:26   #5
jotd
This cat is no more
 
Join Date: Dec 2004
Location: FRANCE
Age: 52
Posts: 8,161
I use a mixed CPU/blitter routine, but the CPU only clears/restores planes while the blitter is blitting another plane.

That avoids the hassle of handling shifting (which I don't support); the CPU just restores or clears at byte-aligned or word-aligned positions.

My "blit_character" routine clears the character at the old position and blits the current one, so the CPU copy and the blit run at the same time (less time wasted on blitter waits).
Old 08 January 2022, 14:29   #6
TCH
Newbie Amiga programmer
 
Join Date: Jun 2012
Location: Front of my A500+
Age: 38
Posts: 372
@Samurai_Crow:
Isn't
Code:
move.l (a0)+,(a1)+
faster than
Code:
move.w (a0)+,(a1)+
move.w (a0)+,(a1)+
even on a 68000 with 16-bit Chip RAM?

Anyway, so columns are not recommended; thanks for answering that. Now the only question that remains is whether the edges should be blitted before/after the inner parts, or at the same time, before/after each line. That means either calculations and/or stack operations for the padding on every line, or two additional blits. Which is faster?

@jotd:
I plan to do that too, a mixed approach, although not yet sure what to task the CPU with...
How do you avoid shifting?

Last edited by TCH; 08 January 2022 at 14:32. Reason: another answer arrived
Old 08 January 2022, 18:36   #7
roondar
Registered User
 
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,408
I did a mixed CPU/Blitter approach for AGA systems (well, the A1200 at any rate) a while back. You can get the source code and a description of how I did it here: http://powerprograms.nl/amiga/cpu-blit-assist.html. Obviously, if you don't want to use the source (as you wrote in the OP), you can still check out the idea behind it all in detail in the article.

However, if you want to soft-blit the best tactic speedwise is probably to use a 68020+ with Fast RAM, do all the soft-blitting in Fast RAM (i.e. have all GFX in Fast RAM, including copies of the screen buffers) and use a dirty rectangles approach to see which parts have changed, then copy over the dirty rectangles. Done well, that should be much faster than the approach I used.
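To make the dirty-rectangles idea concrete, here is a minimal C sketch. It assumes a work buffer (which would live in Fast RAM) and a screen buffer with the same layout, and rectangles aligned to 16-bit words; the DirtyRect type and the flush_dirty_rects name are made up for illustration and are not taken from the article.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical word-aligned rectangle: position and size in words/rows. */
typedef struct { size_t x_word, y, w_words, h; } DirtyRect;

/* Copy only the changed areas from the work buffer to the screen buffer.
   Both buffers are assumed to be row_words words wide. */
void flush_dirty_rects(uint16_t *screen, const uint16_t *work, size_t row_words,
                       const DirtyRect *rects, size_t count)
{
    for (size_t r = 0; r < count; r++) {
        const DirtyRect *d = &rects[r];
        for (size_t y = d->y; y < d->y + d->h; y++) {
            size_t off = y * row_words + d->x_word;
            memcpy(screen + off, work + off, d->w_words * sizeof(uint16_t));
        }
    }
}
```

Everything is drawn into the work buffer first; only the rectangles that were touched this frame get copied across, so the slow Chip RAM is written once per changed word.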

Basically, the advantage of my approach is that you don't need Fast RAM

Edit: even on 68000 a 32 bit move will be faster than two 16 bit moves, however, unless you're using the CPU just to clear (not copy) parts of the screen I don't think soft-blitting will ever be faster on a 68000 based system. Of course, should I be wrong about that I'd love to know that too
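A rough cycle count backs this up. The sketch below uses the figures from the 68000 instruction timing tables (12 cycles for move.w (a0)+,(a1)+ and 20 for move.l (a0)+,(a1)+) and assumes no DMA contention, which real Chip RAM will not always grant; copy_cycles is just a hypothetical helper for the arithmetic.

```c
/* Idealised 68000 timings: every bus access costs 4 cycles.
   move.w (a0)+,(a1)+ : 1 opcode fetch + 1 read  + 1 write  = 12 cycles
   move.l (a0)+,(a1)+ : 1 opcode fetch + 2 reads + 2 writes = 20 cycles */
enum { MOVE_W_CYCLES = 12, MOVE_L_CYCLES = 20 };

/* Cycles needed to copy n_longs longwords, either with one move.l
   or with two move.w per longword (loop overhead ignored). */
unsigned long copy_cycles(unsigned long n_longs, int use_long_moves)
{
    return use_long_moves ? n_longs * MOVE_L_CYCLES
                          : n_longs * 2ul * MOVE_W_CYCLES;
}
```

So one longword costs 20 cycles with move.l against 24 with two move.w; the saving is the second opcode fetch.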

Last edited by roondar; 08 January 2022 at 19:02. Reason: Missed that the OP didn't need source
Old 08 January 2022, 20:21   #8
meynaf
son of 68k
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
I don't know if my method is the fastest, but it is fast enough for my use and it is the most flexible - it can blit anything, any place.


Quote:
Originally Posted by TCH View Post
- How should the blitting phases be done: mask and copy in one step or two?
Depends what you call "mask".
I precompute the mask from the wanted transparency color upon loading the gfx data. Then it is simply reused for every plane.
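As an illustration of precomputing the mask at load time, here is a minimal C sketch. It assumes non-interleaved planes of 16-bit words and a mask where 1 means opaque (the opposite polarity works just as well); build_mask and its parameters are hypothetical names, not the actual loader.

```c
#include <stddef.h>
#include <stdint.h>

/* planes[p] points to plane p, each plane holding `words` 16-bit words.
   A pixel is transparent when its colour index equals `transparent`.
   Writes one mask word per data word, with 1 bits on opaque pixels. */
void build_mask(const uint16_t *const planes[], int depth, size_t words,
                unsigned transparent, uint16_t *mask)
{
    for (size_t i = 0; i < words; i++) {
        uint16_t m = 0;
        for (int p = 0; p < depth; p++) {
            /* Bit p of the transparent colour, spread over all 16 pixels. */
            uint16_t expect = ((transparent >> p) & 1u) ? 0xFFFFu : 0x0000u;
            /* Any plane that differs from it makes the pixel opaque. */
            m |= (uint16_t)(planes[p][i] ^ expect);
        }
        mask[i] = m;
    }
}
```

For transparent colour 0 this reduces to ORing all the planes together.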


Quote:
Originally Posted by TCH View Post
- How should the blitting segments be done: edges (which do not align to word boundaries) separately from the inner parts, or at the same time?
A little bit of special case at start of line, a little bit of special case at the end, but it's just mask adjusting to remove what's not in the area.


Quote:
Originally Posted by TCH View Post
- How should the blitting cycles be done: copy one plane at a time, or all the planes at the same time?
Copy one plane at a time only for current line, i.e. follow the layout of interleaved bitplanes.
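For anyone unsure what the two layouts mean in memory, here is a small C sketch of where row y of plane p lives in each of them. The function names are hypothetical; row_bytes is the width of one plane row in bytes.

```c
#include <stddef.h>

/* Contiguous (ACBM-style): all rows of plane 0, then all rows of plane 1... */
size_t offset_contiguous(size_t y, size_t p, size_t row_bytes, size_t height)
{
    return (p * height + y) * row_bytes;
}

/* Interleaved: row 0 of every plane, then row 1 of every plane... */
size_t offset_interleaved(size_t y, size_t p, size_t row_bytes, size_t depth)
{
    return (y * depth + p) * row_bytes;
}
```

With the interleaved layout, walking straight through memory visits every plane of a line before moving on to the next line, which is why a line-by-line blit follows it naturally.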


Quote:
Originally Posted by TCH View Post
- How should the blitting flow be done: in columns (16 or 32 bits wide) or in rows?
Rows.
Old 09 January 2022, 00:52   #9
TCH
Newbie Amiga programmer
 
Join Date: Jun 2012
Location: Front of my A500+
Age: 38
Posts: 372
Quote:
Originally Posted by roondar View Post
I did a mixed CPU/Blitter approach for AGA systems (well, the A1200 at any rate) a while back. You can get the source code and a description of how I did it here: http://powerprograms.nl/amiga/cpu-blit-assist.html.
Wow, thank you, this article was extremely useful. I've learned a lot. Most importantly, that on the A500 I will not be able to gain any extra performance by splitting up the blitting work between the CPU and the Blitter...
Splitting up the objects themselves was an interesting idea; I had planned to split the list of objects instead: while the Blitter blits, say, 5 objects (a random number), the CPU could blit the sixth. I don't know if that would work, though...
Quote:
Originally Posted by roondar View Post
Obviously, if you don't want to use the source (as you wrote in the OP), you can still check out the idea behind it all in detail in the article.
Oh, I will study the sources, thank you. I just don't want to simply use somebody else's sources; I would like to learn this.
Quote:
Originally Posted by roondar View Post
However, if you want to soft-blit the best tactic speedwise is probably to use a 68020+ with Fast RAM, do all the soft-blitting in Fast RAM (i.e. have all GFX in Fast RAM, including copies of the screen buffers) and use a dirty rectangles approach to see which parts have changed, then copy over the dirty rectangles. Done well, that should be much faster than the approach I used.
No, I was aiming for stock A500 "support". Maybe stock A1200 too, later.
Quote:
Originally Posted by roondar View Post
Basically, the advantage of my approach is that you don't need Fast RAM
Good, because stock machines do not have that.
Quote:
Originally Posted by roondar View Post
Edit: even on 68000 a 32 bit move will be faster than two 16 bit moves, however, unless you're using the CPU just to clear (not copy) parts of the screen I don't think soft-blitting will ever be faster on a 68000 based system. Of course, should I be wrong about that I'd love to know that too
No, no, I only wanted to use the CPU as a "supplementary" blitter, not to do the entire blitting with it.
I suspected that a 32-bit move would be faster than two 16-bit ones, even on a stock A500. Still, based on your article, whatever I tried would fail, so I guess on an A500 I should stick to the Blitter. It is still worth a try on the A1200. Thanks for the insight.
Quote:
Originally Posted by meynaf View Post
Depends what you call "mask".
I precompute the mask from the wanted transparency color upon loading the gfx data. Then it is simply reused for every plane.
Of course, the mask is pre-computed. By "mask" I meant a one-bitplane image which contains zeroes where the object is opaque and ones in the rest of the bitmap, so when I AND the screen data with this bitmap, all the bits under the object are zeroed out, and I can then OR the actual image onto the screen bitmap. I think that is what is called a mask.
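Written out in C for a single 16-bit word, that AND/OR step might look like the sketch below. It assumes the mask polarity described above (0 under the object, 1 elsewhere); cookie_cut is a hypothetical name, and the extra AND on the image is only a safety net in case the image has stray bits outside the object.

```c
#include <stdint.h>

/* Punch a hole into the screen word with the mask, then OR the image in. */
uint16_t cookie_cut(uint16_t screen, uint16_t image, uint16_t mask)
{
    uint16_t hole = (uint16_t)(screen & mask);            /* background kept where mask = 1 */
    uint16_t obj  = (uint16_t)(image & (uint16_t)~mask);  /* object only where mask = 0 */
    return (uint16_t)(hole | obj);
}
```

The same mask word is used for the matching word of every bitplane; only screen and image change from plane to plane.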
Quote:
Originally Posted by meynaf View Post
A little bit of special case at start of line, a little bit of special case at the end, but it's just mask adjusting to remove what's not in the area.
That's OK, but will it be faster if it's done in the same place as the rest of the line than if it is handled separately? Branches eat up lots of cycles, so if there are no special cases in the line copy (because they're handled elsewhere), might it be faster? Just asking.
Quote:
Originally Posted by meynaf View Post
Copy one plane at a time only for current line, i.e. follow the layout of interleaved bitplanes.
Thanks for the tip, but why interleaved? My images are in ACBM format, not ILBM, so they are not interleaved, but contiguous.
Also, Samurai_Crow suggested above that I should copy all bitplanes per line and not all lines per bitplane. Or did I misunderstand, or is his approach good for ACBM and yours for ILBM?
Quote:
Originally Posted by meynaf View Post
Rows.
Ack. Thanks.

@topic:
I've come up with a generic softblit function in C. It works so far, but of course it is far from optimized or well designed; it's just a PoC. Now at least I can point to parts of the algorithm and ask questions.
The file: http://oscomp.hu/depot/amiga_softblit.c
So, back to the starting point about plane copying, because I have two conflicting suggestions: all lines in one bitplane, or all bitplanes in one line?

Last edited by TCH; 09 January 2022 at 00:53. Reason: last paragraph
Old 09 January 2022, 01:24   #10
Samurai_Crow
Total Chaos forever!
 
Join Date: Aug 2007
Location: Waterville, MN, USA
Age: 49
Posts: 2,186
My suggestion of reusing the mask is because the blitter itself can't do that. As the screen depth becomes deeper, the blitter wastes more time rereading the mask. The best it can do is have the same mask duplicated vertically in memory and then have the source bitplanes stacked vertically in memory the same way. Using a modulo register trick allows the blitter to blit all the planes in one pass. The CPU can achieve similar results from a single noninterleaved mask plane by holding the mask in a register and only flushing it out after having processed all of the bitplanes for that row. Either way is accessible to the CPU but if you are using the blitter for the same bobs anyway, you can benefit its performance more using interleaved blitting.
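The CPU variant, holding the mask in a register while all bitplanes of the row are processed, could be sketched in C like this. The sketch assumes word-aligned data, per-plane row pointers and a mask where 1 means opaque; blit_row_all_planes is a made-up name. A compiler will normally keep m and keep in registers, which is the point of the exercise.

```c
#include <stddef.h>
#include <stdint.h>

/* Blit one word-aligned row of `words` words into `depth` planes.
   dst[p] and src[p] point at the start of this row in plane p;
   mask is a single plane shared by all bitplanes. */
void blit_row_all_planes(uint16_t *const dst[], const uint16_t *const src[],
                         const uint16_t *mask, int depth, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        const uint16_t m = mask[i];          /* mask word is read once */
        const uint16_t keep = (uint16_t)~m;  /* background bits to preserve */
        for (int p = 0; p < depth; p++)      /* and reused for every plane */
            dst[p][i] = (uint16_t)((dst[p][i] & keep) | (src[p][i] & m));
    }
}
```

The Blitter would fetch the mask again for each plane; here it costs one read per word regardless of screen depth.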
Old 09 January 2022, 08:57   #11
meynaf
son of 68k
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
Quote:
Originally Posted by TCH View Post
That's OK, but will it be faster if it's done in the same place as the rest of the line than if it is handled separately? Branches eat up lots of cycles, so if there are no special cases in the line copy (because they're handled elsewhere), might it be faster? Just asking.
The special cases are not handled inside the loop. It's just special work done at start to setup for the first iteration and special work done at end to setup for the last one.


Quote:
Originally Posted by TCH View Post
Thanks for the tip, but why interleaved? My images are in ACBM format, not ILBM, so they are not interleaved, but contiguous.
Also, Samurai_Crow suggested above that I should copy all bitplanes per line and not all lines per bitplane. Or did I misunderstand, or is his approach good for ACBM and yours for ILBM?
My images are ILBM, so...
Copying bitplanes one by one also has an issue : it forces you to either use a back-buffer or do frame flipping. Otherwise the beam might catch up and an incomplete image with totally wrong colors is shown.
Old 09 January 2022, 11:25   #12
TCH
Newbie Amiga programmer
 
Join Date: Jun 2012
Location: Front of my A500+
Age: 38
Posts: 372
Quote:
Originally Posted by Samurai_Crow View Post
My suggestion of reusing the mask is because the blitter itself can't do that. As the screen depth becomes deeper, the blitter wastes more time rereading the mask. The best it can do is have the same mask duplicated vertically in memory and then have the source bitplanes stacked vertically in memory the same way. Using a modulo register trick allows the blitter to blit all the planes in one pass. The CPU can achieve similar results from a single noninterleaved mask plane by holding the mask in a register and only flushing it out after having processed all of the bitplanes for that row. Either way is accessible to the CPU but if you are using the blitter for the same bobs anyway, you can benefit its performance more using interleaved blitting.
Does that mean that the Blitter can only work with interleaved images, not contiguous ones? Is the trick you described faster than a "normal" Blitter blit?
Quote:
Originally Posted by meynaf View Post
The special cases are not handled inside the loop. It's just special work done at start to setup for the first iteration and special work done at end to setup for the last one.
How? The shifting at the start must be done on each line, mustn't it?
Quote:
Originally Posted by meynaf View Post
My images are ILBM, so...
Copying bitplanes one by one also has an issue : it forces you to either use a back-buffer or do frame flipping. Otherwise the beam might catch up and an incomplete image with totally wrong colors is shown.
Doesn't that mean that if I do it line by line and the beam catches up, partial images will be shown, although with correct colours?
Old 09 January 2022, 11:49   #13
meynaf
son of 68k
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
Quote:
Originally Posted by TCH View Post
How? The shifting at the start must be done on each line, mustn't it?
The special cases are for masking out start/end of line, not for shifting - which is of course done for every data regardless of its position.
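A possible C sketch of shifting every word of a row, with the bits pushed out of one word carried into the next. This is an illustration and not the actual routine; shift_row is a made-up name, shift is assumed to be x mod 16 (0..15), and the output needs room for n + 1 words.

```c
#include <stddef.h>
#include <stdint.h>

/* Shift a row of n source words right by `shift` pixels (0..15).
   out must hold n + 1 words: the last one receives the spilled bits. */
void shift_row(const uint16_t *src, size_t n, unsigned shift, uint16_t *out)
{
    uint16_t carry = 0;  /* bits that fell out of the previous word */
    for (size_t i = 0; i < n; i++) {
        /* Place the word in the upper half of 32 bits, already shifted. */
        uint32_t wide = (uint32_t)src[i] << (16u - shift);
        out[i] = (uint16_t)(carry | (wide >> 16));
        carry  = (uint16_t)wide;  /* low half spills into the next word */
    }
    out[n] = carry;
}
```

The image data and the mask go through the same shift, so the loop body is identical for every word; the first and last words differ only in which mask bits are allowed through.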


Quote:
Originally Posted by TCH View Post
Doesn't that mean that if I do it line by line and the beam catches up, partial images will be shown, although with correct colours?
Somewhat. The current, partially done line will be wrong because some planes are done and others are not (this wouldn't happen with chunky pixels), but the image will show a lot fewer artifacts.
The goal, however, is rather to leave more time to finish the blit. If you do it plane after plane, then all planes minus one must be finished by the time the beam reaches the first line. If you do it line by line, the beam instead has to catch up with the line currently being worked on.
Old 09 January 2022, 14:33   #14
roondar
Registered User
 
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,408
Quote:
Originally Posted by TCH View Post
I suspected that a 32-bit move would be faster than two 16-bit ones, even on a stock A500. Still, based on your article, whatever I tried would fail, so I guess on an A500 I should stick to the Blitter. It is still worth a try on the A1200. Thanks for the insight.
To clarify, on the A500 combining Blitter & CPU won't gain you performance for copying or cookie-cut operations, but you can use the CPU + Blitter to make clear operations much faster (which can be useful, for instance, in Dual Playfield mode). This wasn't the focus of the article, so it wasn't in there, but I thought it might be useful information.
Old 09 January 2022, 14:51   #15
Samurai_Crow
Total Chaos forever!
 
Join Date: Aug 2007
Location: Waterville, MN, USA
Age: 49
Posts: 2,186
Quote:
Originally Posted by TCH View Post
Does that mean that the Blitter can only work with interleaved images, not contiguous ones? Is the trick you described faster than a "normal" Blitter blit?
"Normal" blitting needs a separate blit for each bitplane but interleaved blitting uses one tall blit for the entire image. The time savings comes from the blitter interrupt causing the CPU to have to set up additional blit operations when they could have been combined.
Old 09 January 2022, 17:24   #16
TCH
Newbie Amiga programmer
 
Join Date: Jun 2012
Location: Front of my A500+
Age: 38
Posts: 372
Quote:
Originally Posted by meynaf View Post
The special cases are for masking out start/end of line, not for shifting - which is of course done for every data regardless of its position.
But you have to shift the first word of each line of the image by X mod 16 to the right, to align it with the X coordinate on the screen, don't you?
Quote:
Originally Posted by meynaf View Post
The goal, however, is rather to leave more time to finish the blit. If you do it plane after plane, then all planes minus one must be finished by the time the beam reaches the first line. If you do it line by line, the beam instead has to catch up with the line currently being worked on.
And what about the "tall" blit Samurai_Crow is talking about: what does the program race against there?
BTW, is double buffering recommended on a stock A500 to avoid tearing, or is it not recommended because it takes up a lot of time?
Quote:
Originally Posted by roondar View Post
To clarify, on the A500 combining Blitter & CPU won't gain you performance for copying or cookie-cut operations, but you can use the CPU + Blitter to make clear operations much faster (which can be useful, for instance, in Dual Playfield mode). This wasn't the focus of the article, so it wasn't in there, but I thought it might be useful information.
It is, but what do you mean by "clearing"? Zeroing out everything in a rectangle, or AND-ing the mask there? (Also, does it gain performance in DPF mode only, or in SPF mode too?)
Quote:
Originally Posted by Samurai_Crow View Post
"Normal" blitting needs a separate blit for each bitplane but interleaved blitting uses one tall blit for the entire image. The time savings comes from the blitter interrupt causing the CPU to have to set up additional blit operations when they could have been combined.
I'm not sure I understand. So a "normal" blit blits one plane, but is that plane contiguous or interleaved? Also, this "interleaved" blit: what does it mean, what is interleaved here? The lines themselves, or the planes, with the trick being that the Blitter blits them as if they were lines?

Sorry if I'm asking stupid questions, but I've just begun learning about the Blitter and screen handling, and the register layouts and descriptions are anything but clear.
Old 09 January 2022, 17:54   #17
meynaf
son of 68k
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
Quote:
Originally Posted by TCH View Post
But you have to shift the first word of each line of the image by X mod 16 to the right, to align it with the X coordinate on the screen, don't you?
Why the heck would you want to shift only the first word ?
It means the rest of the line will not be blitted at the correct place !


Quote:
Originally Posted by TCH View Post
And what about the "tall" blit Samurai_Crow is talking about: what does the program race against there?
That's blitter stuff, we're talking about cpu blit here. With cpu it does not make much of a difference anyway.


Quote:
Originally Posted by TCH View Post
BTW, is double buffering recommended on a stock A500 to avoid tearing, or is it not recommended because it takes up a lot of time?
I can't tell, that depends on the application.
Old 09 January 2022, 18:02   #18
TCH
Newbie Amiga programmer
 
Join Date: Jun 2012
Location: Front of my A500+
Age: 38
Posts: 372
Quote:
Originally Posted by meynaf View Post
Why the heck would you want to shift only the first word ?
It means the rest of the line will not be blitted at the correct place !
I did not say that the rest should not be shifted; of course it should. I think I no longer understand what you mean by the special cases. Could you please check my PoC code and point out which part is unnecessary? In the code I had to handle both the left and the right edge words outside the inner loop, but still inside the big loop.
Old 09 January 2022, 19:33   #19
meynaf
son of 68k
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
Quote:
Originally Posted by TCH View Post
I did not say that the rest should not be shifted; of course it should. I think I no longer understand what you mean by the special cases. Could you please check my PoC code and point out which part is unnecessary? In the code I had to handle both the left and the right edge words outside the inner loop, but still inside the big loop.
Sorry, I'm not a big fan of C, especially with zero comments inside.
Left and right edge only need to adjust transparency masks so that they don't overflow, that's all for me.
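One way to read "adjust the masks so they don't overflow": the shifted mask is ANDed with a first-word mask and a last-word mask, much like the Blitter's BLTAFWM/BLTALWM registers, so that nothing outside the blit area is touched. A minimal C sketch under that assumption (all names are hypothetical; bit 15 is the leftmost pixel of a word and width must be at least 1):

```c
#include <stddef.h>
#include <stdint.h>

/* Bits allowed in the first word of a blit starting at pixel x. */
uint16_t first_word_mask(unsigned x)
{
    return (uint16_t)(0xFFFFu >> (x & 15u));
}

/* Bits allowed in the last word of a blit of `width` pixels starting at x. */
uint16_t last_word_mask(unsigned x, unsigned width)
{
    unsigned last_pixel = x + width - 1u;
    return (uint16_t)(0xFFFFu << (15u - (last_pixel & 15u)));
}

/* Clip an already shifted mask row of `words` words to the blit area.
   Only the two edge words are touched; the inner loop stays unchanged. */
void clip_row_mask(uint16_t *mask, size_t words, unsigned x, unsigned width)
{
    mask[0]         &= first_word_mask(x);
    mask[words - 1] &= last_word_mask(x, width);
}
```

The two edge masks depend only on x and the width, so they can be computed once per blit rather than once per line.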
Old 09 January 2022, 20:50   #20
TCH
Newbie Amiga programmer
 
Join Date: Jun 2012
Location: Front of my A500+
Age: 38
Posts: 372
What do you mean by "adjust" and "overflow"?

I cannot even imagine how I can skip the handling of the left and right edges on each line...