15 June 2022, 15:49 | #1 |
Registered User
Join Date: Dec 2019
Location: North Dakota
Posts: 741
|
OCS Blitter Speed (FrameBuffer Clear)
How much of a frame time [in %] on NTSC OCS does it take for Blitter to clear:
1 Bitplane
2 Bitplanes
4 Bitplanes
6 Bitplanes
Each BP being (320x200) = 8,000 Bytes. From a quick search it would appear that the Blitter on OCS is slower than a CPU clear, but if that is the case it wouldn't be a huge issue for me, because I could simply run other parts of the 3D pipeline in parallel (like I do on Jaguar, where I initiate the clear at the beginning of the frame and do 3D transform and clipping in parallel). Of course, in a 6-BP EHB mode I might not get many cycles for parallel execution, but it should still be faster in the end (meaning, if I lock the framerate to 20 fps, I should still finish everything faster when working in parallel compared to a slightly faster CPU clear). |
15 June 2022, 15:54 | #2 | |
Registered User
Join Date: Dec 2019
Location: North Dakota
Posts: 741
|
I just found this:
Thread: https://eab.abime.net/showthread.php?t=103515 Quote:
|
|
15 June 2022, 15:58 | #3 |
Registered User
Join Date: Dec 2019
Location: North Dakota
Posts: 741
|
So, if it takes 50 scanlines out of 256 to clear 10,240 Bytes, then for 6 BPs in EHB, it would take:
6 * 8,000 [Bytes] = 48,000 Bytes
48,000 / 10,240 = 4.6875
4.6875 * 50 [scanlines] = 234.375 scanlines
234.375 scanlines - that's almost the entire frame - like 235/256 = ~92% of frame time ? Does that sound about right, that it would actually take that long just to clear the FrameBuffer ?
The use case here is a 3D starfield - I am wondering where exactly the threshold is where it still makes performance sense to just erase last frame's stars instead of a brute-force clear.
Approach 1: BruteForce FB Clear - much faster DrawPixel version (that does not do AND masking), and it gives us the advantage of drawing anything else (as it will be cleared next frame automatically)
Approach 2: No FB Clear - much slower DrawPixel version with AND masking, plus we have to clear last frame's pixels - effectively ~halving pixel throughput (but gaining the time it would take to clear the FB)
Depending on the benchmark numbers, the final scene complexity will differ quite a bit, and I am quite curious about the numbers for each approach (and the threshold). |
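For reference, the extrapolation above can be sketched in a few lines of Python. This is only a rough sketch: it assumes the 50-scanlines-per-10,240-bytes figure scales linearly with buffer size and uses the 256-line frame from the post. The function name `clear_scanlines` is made up for illustration.

```python
FRAME_SCANLINES = 256  # frame height used in the post (an assumption, not measured)

def clear_scanlines(bitplanes, bytes_per_plane=8000,
                    ref_bytes=10240, ref_scanlines=50):
    """Linearly scale the reference measurement to the requested buffer size."""
    total_bytes = bitplanes * bytes_per_plane
    return total_bytes / ref_bytes * ref_scanlines

for planes in (1, 2, 4, 6):
    lines = clear_scanlines(planes)
    share = 100 * lines / FRAME_SCANLINES
    print(f"{planes} BP: {lines:.1f} scanlines, about {share:.0f}% of the frame")
```

Real timings will differ once bitplane DMA starts competing for slots, so treat the output as a first estimate only.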
15 June 2022, 17:48 | #4 |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,039
|
Blitter clear is good if you can do it 100% in parallel and never have to wait for it to finish. Otherwise, a cpu+blitter split is faster. Blitclear is 0 sources, 1 destination, so it has an idle state when it's supposed to read, and it's not running at full speed like it would with 1+ sources, meaning there could be some unused dma slots depending on what the cpu and other dma are doing.
Best would be to benchmark it against cpu dot clear, I'd guess. Another option is an adaptive approach: if you're under a certain number of stars you clear them individually with cpu, otherwise blit and/or cpu clear. |
15 June 2022, 20:05 | #5 | |
Registered User
Join Date: Dec 2019
Location: North Dakota
Posts: 741
|
Quote:
As long as there are no framedrops, 20 fps is plenty smooth. But there must be enough performance buffer for the worst CPU spikes (worst clipping scenarios plus AI spike)... What I'm realizing right now is that with 6 Bitplanes, even the Blitter will be affected by DMA (though it still has higher priority than the CPU), so it might take even longer than ~92% of frame time to clear all 6 BPs ?
You mean, like, Level of Detail ? Perhaps as an option, yes. With the obvious impact on framerate... |
|
15 June 2022, 20:39 | #6 | |
Registered User
Join Date: Feb 2017
Location: Denmark
Posts: 1,099
|
Quote:
I'm used to PAL, so the numbers might be a bit off, but roughly: in NTSC you have 262 scanlines with 223 usable DMA slots per scanline = 58426 slots per frame. 320x200x6 BPL uses 24000 of them (~41%). Clearing the screen using the blitter also takes 24000, leaving you very little (~10K) for anything else. Timewise the clearing would indeed take 48K CCKs (@~3.5MHz), i.e. almost a complete frame, though every other DMA slot would be open (if it isn't used for display). You're starting to see why very few (if any) games used EHB in-game..
For the cut-off a/b mentioned, it's not so much LOD as considering the trade-off. Simplified example assuming a stock A500: clearing 8000 bytes using only the blitter takes 4000 * 4 7MHz cycles. Say you've optimized clearing a pixel down to a single "and.b dN, ofs(An)" instruction taking 16 7MHz cycles; then the switch-over point where it's better to use the blitter would be 1000 stars. You'd use method 2 (clear with CPU) if the number of displayed stars was less than 1000. |
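The slot accounting and the break-even estimate above can be written out as a small Python sketch. All constants are the rough figures quoted in the post (223 usable slots per line, 4 CPU cycles per blitter-cleared word, 16 cycles per star), not measured values, and the names are made up for illustration.

```python
SCANLINES = 262        # NTSC scanlines per frame, as quoted in the post
SLOTS_PER_LINE = 223   # usable DMA slots per scanline, as quoted in the post

slots_per_frame = SCANLINES * SLOTS_PER_LINE        # 58426
words_per_plane = 320 * 200 // 16                   # 4000 words = 8000 bytes
display_slots = 6 * words_per_plane                 # bitplane fetches for 6 planes
clear_slots = 6 * words_per_plane                   # one write slot per cleared word
left = slots_per_frame - display_slots - clear_slots

def break_even_stars(clear_bytes=8000, blit_cycles_per_word=4,
                     cpu_cycles_per_star=16):
    """Star count at which a per-star CPU clear costs as much as a blitter clear."""
    blit_cycles = (clear_bytes // 2) * blit_cycles_per_word
    return blit_cycles // cpu_cycles_per_star

print(f"display uses {100 * display_slots / slots_per_frame:.0f}% of the slots")
print(f"slots left after display and clear: {left}")
print(f"break-even for one bitplane: {break_even_stars()} stars")
```

Changing `cpu_cycles_per_star` to the cost of a real clear routine moves the cut-off accordingly.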
|
15 June 2022, 23:29 | #7 | |||
Registered User
Join Date: Dec 2019
Location: North Dakota
Posts: 741
|
Quote:
Quote:
Probably can't move too many huge SW sprites around the screen with this much CPU throughput, but surely plenty of games can be designed around that. Of course, it's a major complication for dev, so that must have played some role, I guess. Quote:
236c: ClearPixel
230c: DrawPixel (No AND Mask - assumes FB was cleared)
390c: DrawPixel (Clears all bits first, writes only 1s)
0.54 * 119,333 = 64,439c (available cycles after DMA given ~54% utilization)
Let's just round the clearing of the framebuffer to a full frame (for comparison purposes). How much can we do in the same time ?
64,439 = PixelCount * (236+230)
PixelCount = 64,439 / 466 = 138 - this is the threshold for EHB
So, during the same time it takes to clear the FrameBuffer, 138 individual pixels can be cleared and drawn anew. Well, slightly less in reality, because those 138 pixels need to be read from an array, which costs additional cycles obviously. |
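The threshold calculation above, as a small Python sketch. The cycle counts and the 0.54 factor are taken straight from the post; the constant and function names are made up for illustration, and the cost of reading star coordinates from an array is deliberately left out, as in the post.

```python
CPU_CYCLES_PER_FRAME = 119_333  # per-frame cycle count used in the post
CPU_SHARE = 0.54                # fraction of cycles left after DMA, per the post
CLEAR_PIXEL = 236               # cycles for ClearPixel, per the post
DRAW_PIXEL = 230                # cycles for DrawPixel without AND mask, per the post

def available_cycles():
    """CPU cycles usable in one frame, truncated to a whole number."""
    return int(CPU_SHARE * CPU_CYCLES_PER_FRAME)

def pixel_threshold():
    """How many pixels can be cleared and redrawn in one frame's worth of cycles."""
    return available_cycles() // (CLEAR_PIXEL + DRAW_PIXEL)

print(available_cycles(), "cycles,", pixel_threshold(), "pixels")
```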
|||
16 June 2022, 12:01 | #8 |
Inviyya Dude!
Join Date: Sep 2016
Location: Amiga Island
Posts: 2,770
|
Aren't you coding on the Vamp?
Why are you interested in the Blitter then? It would be like the biggest bottleneck of all time if you used that instead of a simple CPU clear routine. |
16 June 2022, 14:28 | #9 | |
Registered User
Join Date: Dec 2019
Location: North Dakota
Posts: 741
|
Yes, but it's not exclusive anymore. Since I got a fantastic remote job a year ago, I had to temporarily put my big Vampire project on hold (for as long as I keep the job). And I discovered the wonderful world of OCS. It's kinda like Atari 800XL on steroids - it has Blitter, 32 registers, Copper, expandable RAM, OS and a giant variety of upgrades (up to the level of Jaguar and probably above with 060, MIPS-wise).
It's a better target to remake 8-bit games than Jaguar. Not to mention the absence of the general Jaguar toxic hostility omnipresent in jag forums. Here, we can actually have a technical conversation, people are willing to share what they know without belittling newbies (like me) and OCS gives a coder so many options how to implement things. Still, I wish there was a 160x200 native resolution for OCS... Quote:
If you break the pipeline into discrete batches, it's possible to time it perfectly with the Blitter just having finished clearing (without ever having to wait). There's no EHB 3D game on OCS, but that doesn't mean one can't be made. As long as one understands the constraints, it's doable. Basically - if I lock the framerate to 20 fps and have 3 full frames, I get a 3 * 0.54 * 119,333 = 3 * 64,439c = 193,317 cycle budget. Question is - how much 3D can you do in 193,317c ? I don't know yet, but that's what we're trying to figure out now. |
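The cycle budget for a locked framerate can be sketched like this in Python. It assumes a 60 Hz display refresh (so 20 fps means 3 display frames per update) and reuses the 0.54 and 119,333 figures from the post; `cycle_budget` is a hypothetical helper name.

```python
def cycle_budget(target_fps, refresh_hz=60,
                 cycles_per_frame=119_333, cpu_share=0.54):
    """Return (display frames per update, CPU cycles available per update)."""
    frames_per_update = refresh_hz // target_fps
    per_frame = int(cpu_share * cycles_per_frame)  # truncated, as in the post
    return frames_per_update, frames_per_update * per_frame

frames, budget = cycle_budget(20)
print(f"{frames} frames per update, {budget} cycles")
```

Trying other targets such as `cycle_budget(30)` or `cycle_budget(15)` shows how the budget changes with the lock.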
|