English Amiga Board


Old 15 June 2022, 15:49   #1
VladR
Registered User
 
Join Date: Dec 2019
Location: North Dakota
Posts: 741
OCS Blitter Speed (FrameBuffer Clear)

How much of a frame time [in %] on NTSC OCS does it take for Blitter to clear:
1 Bitplane
2 Bitplanes
4 Bitplanes
6 Bitplanes

Each BP being (320x200) = 8,000 Bytes.

From a quick search it would appear that Blitter on OCS is apparently slower than CPU version, but if that is the case it wouldn't be a huge issue for me because I could simply run other parts of 3D pipeline in parallel (like I do on Jaguar where I initiate the clear at beginning of the frame and in parallel do 3D transform and clipping).

Of course, in a 6-BP EHB Mode I might not get many cycles for parallel execution, but it should still be faster in the end (meaning, if I lock framerate to 20 fps, I should still finish everything faster when working in parallel compared to a slightly faster CPU clear).
Old 15 June 2022, 15:54   #2
VladR
Registered User
 
Join Date: Dec 2019
Location: North Dakota
Posts: 741
I just found this:
Thread: https://eab.abime.net/showthread.php?t=103515
Quote:
Originally Posted by BigT
I was messing around with clearing a single bitplane in ASM on A500 OCS ie 320x256=10240 bytes. My findings were as follows:

Code:
178 scanlines  clr.l    (a0)+ dbra loop
118 scanlines  clr.l    (a0)+ dbra loop unrolled x16 clr.l statements
 73 scanlines  move.l   d1,(a0)+ unrolled x16
 56 scanlines  movem.l  d1-d6/a2-a3,-(a0) unrolled x2
 50 scanlines  blitter D channel clear
 27 scanlines  Blitter D + movem.l combination
What I was most surprised by was how close in performance a movem.l operation was to the blitter. I had always thought the blitter was much faster....
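To put the quoted table on a common footing, here is a minimal Python sketch that converts each scanline count into bytes cleared per scanline and into speed relative to the plain blitter clear. The scanline counts and the 10,240-byte bitplane size come from the quoted post; the dictionary keys and function names are made up for illustration.

```python
# Convert BigT's scanline timings into throughput and relative speed.
BITPLANE_BYTES = 320 * 256 // 8  # 10,240 bytes, as stated in the quoted post

# Scanline counts copied from the quoted table (labels are shortened, hypothetical names).
TIMINGS = {
    "clr.l dbra loop": 178,
    "clr.l unrolled x16": 118,
    "move.l unrolled x16": 73,
    "movem.l unrolled x2": 56,
    "blitter D clear": 50,
    "blitter D + movem.l": 27,
}

def bytes_per_scanline(scanlines: int) -> float:
    """Average number of bytes cleared during one scanline."""
    return BITPLANE_BYTES / scanlines

def speedup_vs(method: str, baseline: str = "blitter D clear") -> float:
    """How many times faster `method` is than `baseline` (>1 means faster)."""
    return TIMINGS[baseline] / TIMINGS[method]

for name, lines in TIMINGS.items():
    print(f"{name:22s} {bytes_per_scanline(lines):6.1f} bytes/line  "
          f"{speedup_vs(name):.2f}x blitter")
```

By these figures the movem.l loop runs at roughly 0.89x the speed of the blitter clear, which matches the "surprisingly close" remark in the quote.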
Old 15 June 2022, 15:58   #3
VladR
Registered User
 
Join Date: Dec 2019
Location: North Dakota
Posts: 741
So, if it takes 50 scanlines out of 256 to clear 10,240 Bytes, then for 6 BPs in EHB, it would take:
6 * 8,000 [Bytes] = 48,000 Bytes
48,000 / 10,240 = 4.6875

4.6875 * 50 [scanlines] = 234.375 scanlines

234.375 scanlines - that's almost an entire frame: 235/256 = ~92% of frame time ?


Does that sound about right that it would actually take that long just to clear FrameBuffer ?
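The same scaling can be written out for all four bitplane counts asked about in the first post. This is only a sketch: it assumes the blitter clear time scales linearly with byte count and reuses the 256-scanline frame from the estimate above, and the function name is hypothetical.

```python
# Scale BigT's measured blitter clear (50 scanlines for 10,240 bytes)
# to 320x200 buffers of various depths. Linear scaling is an assumption.
MEASURED_BYTES = 320 * 256 // 8    # 10,240 bytes, one 320x256 bitplane
MEASURED_SCANLINES = 50            # blitter D channel clear, from the quoted table
BYTES_PER_PLANE = 320 * 200 // 8   # 8,000 bytes
FRAME_SCANLINES = 256              # frame length used in the estimate above

def clear_scanlines(bitplanes: int) -> float:
    """Estimated scanlines needed to blitter-clear `bitplanes` planes."""
    total_bytes = bitplanes * BYTES_PER_PLANE
    return total_bytes / MEASURED_BYTES * MEASURED_SCANLINES

for planes in (1, 2, 4, 6):
    lines = clear_scanlines(planes)
    pct = 100 * lines / FRAME_SCANLINES
    print(f"{planes} BP: {lines:.1f} scanlines, {pct:.1f}% of frame")
```

Any slowdown from bitplane DMA contention is not modeled here, so the deeper modes are likely optimistic.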

The use case here is a 3D starfield - I am wondering where exactly the threshold is at which it still makes performance sense to erase the last frame's stars individually instead of doing a brute-force clear.
Approach 1: BruteForce FB Clear - much faster DrawPixel version (that does not do AND masking), and gives us the advantage of being able to draw anything else (as it will be cleared next frame automatically)
Approach 2: No FB Clear - much slower DrawPixel version with AND masking, plus we have to clear last frame's pixels - effectively ~halving pixel throughput (but gaining the time it would take to clear the FB)

Depending on the benchmark numbers, the final scene complexity will quite differ and I am quite curious about the numbers for each approach (and the threshold).
Old 15 June 2022, 17:48   #4
a/b
Registered User
 
Join Date: Jun 2016
Location: europe
Posts: 1,039
Blitter clear is good if you can do it 100% in parallel and never have to wait for it to finish. Otherwise, a cpu+blitter split is faster. A blitclear is 0 sources, 1 destination, so it has an idle state where it would normally read, and it's not running at full speed like it would with 1+ sources, meaning there could be some unused dma slots depending on what the cpu and other dma are doing.
Best would be to benchmark it against a cpu dot clear, I'd guess. Another option is an adaptive approach: if you're under a certain number of stars you clear them individually with the cpu, otherwise blit and/or cpu clear.
Old 15 June 2022, 20:05   #5
VladR
Registered User
 
Join Date: Dec 2019
Location: North Dakota
Posts: 741
Quote:
Originally Posted by a/b
Blitter clear is good if you can do it 100% in parallel and never have to wait for it to finish. Otherwise, a cpu+blitter split is faster.
Yeah, I don't think I should have to wait for Blitter, because at the very least, I have 2 full frames worth of work (maybe 3 -> 20 fps).
As long as there are no framedrops, 20 fps is plenty smooth. But there must be enough performance buffer for the worst CPU spikes (worst clipping scenarios plus AI spike)...

What I'm realizing right now is that with 6 Bitplanes, even Blitter will be affected by DMA (though it still has higher priority than CPU), so it might even take longer than ~93% of frame time to clear all 6 BPs ?

Quote:
Originally Posted by a/b
Another option is an adaptive approach: if you're under a certain number of stars you clear them individually with the cpu, otherwise blit and/or cpu clear.
You mean, like, Level of Detail ? Perhaps as an option, yes. With the obvious impact on framerate...
Old 15 June 2022, 20:39   #6
paraj
Registered User
 
Join Date: Feb 2017
Location: Denmark
Posts: 1,099
Quote:
Originally Posted by VladR
Yeah, I don't think I should have to wait for Blitter, because at the very least, I have 2 full frames worth of work (maybe 3 -> 20 fps).
As long as there are no framedrops, 20 fps is plenty smooth. But there must be enough performance buffer for the worst CPU spikes (worst clipping scenarios plus AI spike)...

What I'm realizing right now is that with 6 Bitplanes, even Blitter will be affected by DMA (though it still has higher priority than CPU), so it might even take longer than ~93% of frame time to clear all 6 BPs ?


You mean, like, Level of Detail ? Perhaps as an option, yes. With the obvious impact on framerate...

I'm used to PAL, so the numbers might be a bit off, but roughly: in NTSC you have 262 scanlines with 223 usable DMA slots per scanline = 58426. 320x200x6 BPL uses 24000 (~41%). Clearing the screen using the blitter also takes 24000, leaving you very little (~10K) for anything else. Timewise the clearing would indeed take 48K CCKs (@~3.5MHz), almost a complete frame, though every other DMA slot would be open (if it isn't used for display). You're starting to see why very few (if any) games used EHB in game.
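The slot budget in this post can be reproduced with a few lines of Python. This is a sketch using only the rough figures given above (262 scanlines, 223 usable slots per line, and two CCKs per word for a destination-only blit); the constant names are invented for illustration.

```python
# Rough NTSC DMA slot budget for a 320x200 6-bitplane (EHB) screen,
# using the approximate figures from the post above.
SCANLINES_NTSC = 262
SLOTS_PER_LINE = 223                  # rough usable DMA slots per scanline
TOTAL_SLOTS = SCANLINES_NTSC * SLOTS_PER_LINE

WORDS_PER_LINE = 320 // 16            # 20 words per bitplane line
DISPLAY_WORDS = WORDS_PER_LINE * 200 * 6   # bitplane fetches per frame
BLIT_WORDS = DISPLAY_WORDS                 # one D-channel write per word cleared
BLIT_CCKS = BLIT_WORDS * 2                 # D-only blit: one idle + one write cycle

remaining = TOTAL_SLOTS - DISPLAY_WORDS - BLIT_WORDS

print(f"total slots   : {TOTAL_SLOTS}")
print(f"display       : {DISPLAY_WORDS} ({100 * DISPLAY_WORDS / TOTAL_SLOTS:.1f}%)")
print(f"blitter clear : {BLIT_WORDS} slots over {BLIT_CCKS} CCKs")
print(f"left over     : {remaining}")
```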

For the cut-off a/b mentioned, it's not so much LOD as considering the trade-off. Simplified example, assuming a stock A500: clearing 8000 bytes using only the blitter takes 4000 * 4 = 16000 cycles at 7MHz. Say you've optimized clearing a pixel down to a single "and.b dN, ofs(AM)" instruction taking 16 cycles at 7MHz; the switch-off point where it's better to use the blitter would then be 1000 stars. You'd use method 2 (clear with CPU) if the number of displayed stars was less than 1000.
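The trade-off can be sketched as a small selection function. The cycle costs are the simplified figures from this post (4 CPU cycles per word for the blitter, 16 cycles per star for the CPU); the function names and return strings are hypothetical, and real code would measure both costs rather than assume them.

```python
# Break-even between a full blitter clear and clearing stars one by one.
BLIT_CYCLES_PER_WORD = 4    # 7 MHz cycles per word cleared (simplified figure)
CPU_CYCLES_PER_STAR = 16    # one and.b per star (simplified figure)

def blitter_clear_cycles(buffer_bytes: int) -> int:
    """Cycles for a blitter-only clear of `buffer_bytes` bytes."""
    return (buffer_bytes // 2) * BLIT_CYCLES_PER_WORD

def break_even_stars(buffer_bytes: int) -> int:
    """Star count at which per-star clearing costs as much as the full clear."""
    return blitter_clear_cycles(buffer_bytes) // CPU_CYCLES_PER_STAR

def choose_clear(num_stars: int, buffer_bytes: int = 8000) -> str:
    """Pick the cheaper method for this frame's star count."""
    if num_stars < break_even_stars(buffer_bytes):
        return "cpu_per_star"
    return "blitter"

print(break_even_stars(8000))   # one 320x200 bitplane
print(choose_clear(300), choose_clear(2500))
```

This ignores the fact that the blitter can run in parallel with the CPU, so it compares elapsed cycles only, as the post does.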
Old 15 June 2022, 23:29   #7
VladR
Registered User
 
Join Date: Dec 2019
Location: North Dakota
Posts: 741
Quote:
Originally Posted by paraj
I'm used to PAL, so the numbers might be a bit off, but roughly: in NTSC you have 262 scanlines with 223 usable DMA slots per scanline = 58426. 320x200x6 BPL uses 24000 (~41%). Clearing the screen using the blitter also takes 24000, leaving you very little (~10K) for anything else. Timewise the clearing would indeed take 48K CCKs (@~3.5MHz), almost a complete frame, though every other DMA slot would be open (if it isn't used for display).
Ouch. Thanks for confirming.

Quote:
Originally Posted by paraj
You're starting to see why very few (if any) games used EHB in game..
Well, actually, not really. For double-buffering, it's not that huge of a deal. Yes, it sucks having one full frame destroyed by clearing framebuffer, but we can still design the game around 30 or 20 fps. For EHB, probably 20 fps, because EHB takes 24,000 DMA slots just for display. But, even then, we still have 2 full frames worth of CPU time (well, 2 frames of ~54% anyway - kinda like just one full frame at 4 BPLs).
Probably can't move too many huge SW sprites around the screen with this much CPU throughput, but surely plenty of games can be designed around that.
Of course, it's a major complication for dev, so that must have played some role, I guess.

Quote:
Originally Posted by paraj
For the cut-off a/b mentioned, it's not so much LOD as considering the trade-off. Simplified example, assuming a stock A500: clearing 8000 bytes using only the blitter takes 4000 * 4 = 16000 cycles at 7MHz. Say you've optimized clearing a pixel down to a single "and.b dN, ofs(AM)" instruction taking 16 cycles at 7MHz; the switch-off point where it's better to use the blitter would then be 1000 stars. You'd use method 2 (clear with CPU) if the number of displayed stars was less than 1000.
Let's talk specific numbers for EHB (6 BPLs):

236c: ClearPixel
230c: DrawPixel (No AND Mask - assumes FB was cleared)
390c: DrawPixel (Clears all bits first, writes only 1s)

0.54*119,333 = 64,439c (available cycles after DMA given ~54% utilization)

Let's just round the clearing of framebuffer to full frame (for comparison purposes).
How much can we do in the same time ?
64,439 = PixelCount * (236+230)
PixelCount = 64,439 / 466 = 138 - this is the threshold for EHB

So, in the same time it takes to Clear the FrameBuffer, 138 individual pixels can be cleared and drawn anew.
Well, slightly less in reality, because those 138 pixels need to be read from an array, which obviously costs additional cycles.
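The threshold calculation can be parameterized so the array-read overhead mentioned above is easy to plug in. This is a sketch built on the numbers in this post; the `fetch_overhead` parameter and its example value are pure assumptions, not measurements.

```python
# EHB threshold: how many pixels can be cleared and redrawn in the CPU time
# that one full frame provides (the post rounds the blitter clear to one frame).
CYCLES_PER_FRAME = 119_333   # CPU cycles per NTSC frame, figure from the post
CPU_SHARE = 0.54             # fraction left to the CPU with 6 bitplanes, from the post
CLEAR_PIXEL = 236            # cycles, ClearPixel
DRAW_PIXEL = 230             # cycles, DrawPixel without AND mask

def available_cycles() -> int:
    """CPU cycles usable in one frame after display DMA."""
    return int(CPU_SHARE * CYCLES_PER_FRAME)

def pixel_threshold(fetch_overhead: int = 0) -> int:
    """Pixels that fit in one frame; fetch_overhead is a hypothetical per-pixel cost."""
    per_pixel = CLEAR_PIXEL + DRAW_PIXEL + fetch_overhead
    return available_cycles() // per_pixel

print(available_cycles())
print(pixel_threshold())                    # no array-read overhead
print(pixel_threshold(fetch_overhead=20))   # assumed 20 cycles per array read
```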
Old 16 June 2022, 12:01   #8
Tigerskunk
Inviyya Dude!
 
Join Date: Sep 2016
Location: Amiga Island
Posts: 2,770
Aren't you coding on the Vamp?

Why are you interested in the Blitter then? It would be like the biggest bottleneck of all time if you used that instead of a simple CPU clear routine.
Old 16 June 2022, 14:28   #9
VladR
Registered User
 
Join Date: Dec 2019
Location: North Dakota
Posts: 741
Quote:
Originally Posted by Tigerskunk
Aren't you coding on the Vamp?
Yes, but it's not exclusive anymore. Since I got a fantastic remote job a year ago, I had to temporarily put my big Vampire project on hold (for as long as I keep the job). And I discovered the wonderful world of OCS. It's kinda like Atari 800XL on steroids - it has Blitter, 32 registers, Copper, expandable RAM, OS and a giant variety of upgrades (up to the level of Jaguar and probably above with 060, MIPS-wise).

It's a better target to remake 8-bit games than Jaguar. Not to mention the absence of the general Jaguar toxic hostility omnipresent in jag forums.

Here, we can actually have a technical conversation, people are willing to share what they know without belittling newbies (like me) and OCS gives a coder so many options how to implement things.

Still, I wish there was a 160x200 native resolution for OCS...

Quote:
Originally Posted by Tigerskunk
Why are you interested in the Blitter then? It would be like the biggest bottleneck of all time if you used that instead of a simple CPU clear routine.
Because on Jaguar, at the start of the frame, I initiate the FrameBuffer Clear, and in parallel, start doing the 3d pipeline.
If you break the pipeline into discrete batches, it's possible to time it perfectly with Blitter just having finished clearing (without ever having to wait).

There's no EHB 3D game on OCS but that doesn't mean one can't be made. As long as one understands the constraints, it's doable.


Basically - if I lock the framerate to 20 fps and have 3 full frames, I get a cycle budget of 3 * 0.54 * 119,333 = 3 * 64,439c = 193,317c.

Question is - how much 3D can you do in 193,317c ? I don't know yet, but that's what we're trying to figure out now.
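For comparing locked framerates, the budget can be expressed as a function of the target fps. This sketch reuses the 54% CPU share and 119,333 cycles per frame from the posts above and assumes a 60 Hz refresh; the function name is hypothetical.

```python
# CPU cycle budget per rendered frame when the framerate is locked
# to an integer divisor of the display refresh rate.
CYCLES_PER_FRAME = 119_333   # CPU cycles per NTSC frame, figure from the thread
CPU_SHARE = 0.54             # CPU share with 6 bitplanes, figure from the thread

def cycle_budget(target_fps: int, refresh_hz: int = 60) -> int:
    """Cycles available per rendered frame at the given locked framerate."""
    display_frames = refresh_hz // target_fps        # e.g. 3 at 20 fps
    per_display_frame = int(CPU_SHARE * CYCLES_PER_FRAME)
    return display_frames * per_display_frame

for fps in (60, 30, 20):
    print(f"{fps} fps: {cycle_budget(fps)} cycles")
```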