English Amiga Board


16 April 2020, 23:39   #1
olleharstedt
Registered User

 
Join Date: Mar 2020
Location: Hamburg
Posts: 20
Will a faster CPU make the blitter obsolete?

I had an argument with a very knowledgeable Amiga engineer, but didn't really understand the reasoning. As I understand it, the blitter is fully asynchronous, right? And if you turn off multitasking, you can use it fully. So will a faster CPU, like a 040 or 060, really make the blitter obsolete, since it can't keep up with the speed of the CPU? Does it take too much time to prep the blitter for each "job"? What's your perspective here?
17 April 2020, 00:20   #2
Samurai_Crow
Total Chaos forever!

 
Join Date: Aug 2007
Location: Ft. Collins, CO USA
Age: 45
Posts: 1,488
If you're referring to the Vampire, yes. SDRAM only goes a certain speed and even a 128-bit bus can only provide so much bandwidth.

The gains are temporary though. A Vampire stand-alone v4 has DDR3 which provides much more bandwidth than the SDRAM of a Vampire v2. In some theoretical future Amiga-like chipset, the chip RAM could use GDDR6 instead of SDRAM and DDR4 or 5 for the CPU and its fast RAM.

In short, it all depends on the bandwidth.
17 April 2020, 03:15   #3
ReadOnlyCat
Code Kitten

 
Join Date: Aug 2015
Location: Montreal/Canadia
Age: 48
Posts: 1,175
Quote:
Originally Posted by Samurai_Crow
If you're referring to the Vampire, yes. SDRAM only goes a certain speed and even a 128-bit bus can only provide so much bandwidth.
I actually think Olle is referring to regular 040, 060 CPUs:

Quote:
Originally Posted by olleharstedt
So will a faster CPU, like 040, 060, really make the blitter obsolete, since it can't keep up with the speed of the CPU? Does it take too much time to prep the blitter for each "job"? What's your perspective here?
The answer is "yes" for some tasks, and "no" for others.

It is true that the 040 and 060 outperform the Blitter at its own tasks, but they are bandwidth-limited by the Chip RAM bus: they can compute much faster than the Blitter, but only at full speed in Fast RAM. When they want to touch Chip RAM, they have to slow down considerably. Even then, however, they will generally still operate faster than the Blitter.

However, the Blitter + CPU combination may be faster than CPU alone since the CPU can work in Fast RAM without stealing any cycles from the Blitter which will be working in Chip RAM.

It really depends on what you want to do with the Blitter and whether it is worth spending the corresponding CPU cycles on the task instead.
If the Blitter is fast enough to do the task you want in less than one frame, then you should probably be using it rather than the CPU.
Why? Because, even though the CPU would be much faster, it would be wasting most of its time waiting for the Chip RAM bus to allow it to push data. In that case, it is better to let the Blitter do the job and have the CPU do something else at full speed in Fast RAM.
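
To make the serial-versus-parallel argument concrete, here is a minimal sketch in C. Everything in it is an assumption for illustration: the `FrameCost` struct, the function names, and any numbers you feed in are hypothetical, and the model ignores Blitter setup cost and bus contention.

```c
#include <assert.h>

/* Hypothetical cost model: all times are in microseconds per frame.
   The fields are placeholders, not measurements. */
typedef struct {
    double cpu_fast_work;  /* CPU work done entirely in Fast RAM       */
    double cpu_chip_blit;  /* the blit, if the CPU does it in Chip RAM */
    double blitter_blit;   /* the same blit, if the Blitter does it    */
} FrameCost;

/* CPU does everything itself: the two jobs run one after the other. */
double cpu_only_time(FrameCost c)
{
    assert(c.cpu_fast_work >= 0 && c.cpu_chip_blit >= 0);
    return c.cpu_fast_work + c.cpu_chip_blit;
}

/* Blitter works in Chip RAM while the CPU works in Fast RAM:
   the frame is finished when the slower of the two is done. */
double cpu_plus_blitter_time(FrameCost c)
{
    assert(c.cpu_fast_work >= 0 && c.blitter_blit >= 0);
    return c.cpu_fast_work > c.blitter_blit ? c.cpu_fast_work
                                            : c.blitter_blit;
}
```

With made-up costs of 12000 (Fast RAM work), 5000 (CPU blit) and 9000 (Blitter blit), the CPU alone needs 17000 while the combination needs 12000, even though the Blitter is the slower of the two at the blit itself.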
17 April 2020, 10:14   #4
olleharstedt_wo
Registered User

 
Join Date: Apr 2020
Location: Hamburg, DE
Posts: 10
Quote:
Originally Posted by Samurai_Crow
[...] In short, it all depends on the bandwidth.

Hm, OK, so what's the bandwidth to the Vampire video memory? The video memory in Amiga is chipmem, right?
17 April 2020, 10:16   #5
olleharstedt_wo
Registered User

 
Join Date: Apr 2020
Location: Hamburg, DE
Posts: 10
Quote:
Originally Posted by ReadOnlyCat
The answer is "yes" for some tasks, and "no" for others. [...]

Thanks!
18 April 2020, 01:24   #6
VladR
Registered User

 
Join Date: Dec 2019
Location: North Dakota
Posts: 235
This is an extremely generic question. You really need to run detailed benchmarks yourself:

1. Implement both codepaths (Blitter, SW) in hand-optimized Assembler (for the love of god, avoid patently absurd atrocities like C or SDL for this purpose)
2. Benchmark

Done! Simple, eh?
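
For what the two steps could look like as a harness, here is a rough sketch. It is written in C only to show the measuring loop; the routines being timed would be the hand-written assembler codepaths. `seconds_per_call`, `codepath_fn` and `software_clear` are hypothetical names, and `clock()` from standard C is coarse, so on real hardware a finer timer would be the better choice.

```c
#include <assert.h>
#include <string.h>
#include <time.h>

/* Any codepath under test is wrapped as a function with no arguments. */
typedef void (*codepath_fn)(void);

/* Stand-in for a software codepath: clears a small buffer. */
unsigned char demo_buffer[4096];

void software_clear(void)
{
    memset(demo_buffer, 0, sizeof demo_buffer);
}

/* Runs fn `iterations` times and returns the average seconds per call. */
double seconds_per_call(codepath_fn fn, int iterations)
{
    assert(fn != 0 && iterations > 0);
    clock_t start = clock();
    for (int i = 0; i < iterations; i++)
        fn();
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC / iterations;
}
```

Calling `seconds_per_call(software_clear, 1000)` and the same for a Blitter-driven routine gives two numbers to compare for that one use case, which is the whole point of the exercise.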


I can, however, give you an example from Jaguar (~somewhat comparable to 68060) where I implemented both codepaths for my flatshading 3D engine.

Jag's Blitter was significantly faster than the 26.6 MHz RISC GPU (which also has 64-bit RAM access).

I have completely parallelized all 3D pipeline stages to completely hide the Blitter latency (a dozen refactors, really). So clearing a 640x240x16-bit framebuffer (300 KB!) was, really, free: no Blitter wait, as the GPU continued to process the 3D scene.

On some reasonably complex scenes, my Blitter wait time was under 8%, still at 60 fps.

The way to do that is:
foreach scanline in polygon
- DrawScanlineViaBlitter
- Compute EndPoints of Next Scanline

Meaning, while the Blitter is busy drawing the scanline, the RISC GPU is busy (in parallel) traversing the edges of the polygon and computing the points for the next scanline (plus clipping).
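
The loop above can be sketched in C as follows. This is not the Jaguar code: the "blitter" here is a hypothetical stand-in that remembers one pending span and only draws it when `blitter_wait()` is called, which imitates hardware that finishes later while the processor keeps going. All names and the toy shape in `compute_endpoints` are assumptions.

```c
#include <assert.h>

#define WIDTH  16
#define HEIGHT 8

unsigned char framebuffer[HEIGHT][WIDTH];

/* Hypothetical blitter stand-in: holds one queued span. */
static struct { int busy, y, x0, x1; } pending;

/* Finish the queued span, if any (real code would poll a busy flag). */
void blitter_wait(void)
{
    if (pending.busy) {
        for (int x = pending.x0; x <= pending.x1; x++)
            framebuffer[pending.y][x] = 1;
        pending.busy = 0;
    }
}

/* Queue one horizontal span; returns immediately. */
void blitter_start_span(int y, int x0, int x1)
{
    assert(!pending.busy);
    assert(y >= 0 && y < HEIGHT && x0 >= 0 && x1 < WIDTH && x0 <= x1);
    pending.busy = 1;
    pending.y = y;
    pending.x0 = x0;
    pending.x1 = x1;
}

/* Toy edge walker: a shape that narrows by one pixel per side per line. */
void compute_endpoints(int y, int *x0, int *x1)
{
    *x0 = y;
    *x1 = WIDTH - 1 - y;
}

/* Draw scanlines y_top..y_bottom; returns the number of spans issued. */
int fill_shape(int y_top, int y_bottom)
{
    int x0, x1, spans = 0;
    compute_endpoints(y_top, &x0, &x1);
    for (int y = y_top; y <= y_bottom; y++) {
        blitter_start_span(y, x0, x1);          /* blitter draws this line */
        if (y < y_bottom)
            compute_endpoints(y + 1, &x0, &x1); /* CPU prepares next line  */
        blitter_wait();                         /* sync before next start  */
        spans++;
    }
    return spans;
}
```

The key ordering is that the endpoint computation for line y+1 sits between starting and waiting for line y, so the wait only costs time when the edge walking finishes before the blit does.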


Obviously, it goes without saying that to beat the 26.6 MHz RISC GPU with a 680X0 you would need a really fast 680X0. For sure, a mere 50 MHz 030 would still be very much behind, and the Blitter would be loooong done with the current scanline.

I did implement a SW codepath on jag's 13.3 MHz 68000, but it was over an order of magnitude slower than the 26.6 MHz RISC GPU, which despite the 3-stage RISC pipeline, still executed at a rate of ~0.7 ops per cycle. So, good luck getting a performance like that from anything other than 68060...
In other words, even if the Jaguar's 68000 was running at 133 MHz, it still wouldn't be able to do the rasterizing faster than the 26.6 MHz RISC GPU (for same scenes).

If I recall correctly, the break-even point was somewhere around 400 MHz (~30:1 ratio). Yes, you would need roughly a 400 MHz 68000 to beat the 26.6 MHz RISC in polygon drawing.
To be fair, that wasn't my first version of the polygon rasterizer. More like the 10th, so it was really fast - forget floating point, forget even fixed point! It was all just an integer codepath which executed fully from within the internal 4 KB GPU cache without any external RAM access for computation (only drawing). Not an easy thing to do on a RISC. Just go and see how much code you can fit into 4 KB (less than 2,048 instructions - more like 1,500).
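
The figures quoted in this post can be sanity-checked with a few lines of C. The helper names are hypothetical, and the 2 bytes per instruction is an assumption read off the post's own numbers (4 KB holding at most 2,048 instructions).

```c
#include <assert.h>

/* Bytes needed for a framebuffer of the given size and depth. */
unsigned long framebuffer_bytes(unsigned w, unsigned h, unsigned bits_per_pixel)
{
    assert(bits_per_pixel % 8 == 0);
    return (unsigned long)w * h * (bits_per_pixel / 8);
}

/* How many fixed-size instructions fit in a cache of cache_bytes. */
unsigned instructions_in_cache(unsigned cache_bytes, unsigned bytes_per_instruction)
{
    assert(bytes_per_instruction > 0);
    return cache_bytes / bytes_per_instruction;
}

/* Ratio between two clock speeds given in MHz. */
double clock_ratio(double fast_mhz, double slow_mhz)
{
    assert(slow_mhz > 0.0);
    return fast_mhz / slow_mhz;
}
```

`framebuffer_bytes(640, 240, 16)` gives 307,200 bytes, which is exactly 300 KB; `instructions_in_cache(4096, 2)` gives 2,048; and `clock_ratio(400.0, 13.3)` comes out at just over 30, matching the ~30:1 figure.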


Of course, the above is based on Jaguar's Blitter. I never benchmarked Amiga's Blitter and probably never will.


The principle stays the same: do the work and benchmark it yourself on the target HW. Then you'll have exact, scientific data for your particular use case.

Real Quick and Easy

Last edited by VladR; 18 April 2020 at 01:42.
18 April 2020, 02:02   #7
VladR
Registered User

 
Join Date: Dec 2019
Location: North Dakota
Posts: 235
Going back to your original question - whether a faster CPU can make the Blitter obsolete - of course it can, it all depends on speed.

Circling back to my Atari Jaguar example - we need a roughly 400 MHz 68000 to equal the 26.6 MHz RISC in polygon rasterizing.

The RISC GPU is still significantly slower than Jaguar's Blitter, though.

I don't have exact data at hand here, but we definitely would have to cross the 500 MHz threshold for a 68000 to at least match the 26.6 MHz RISC + Blitter combo.


A 68060 would probably only need to be somewhere around ~133 MHz to match it, though.
21 April 2020, 14:31   #8
AmigaHope
Registered User
 
Join Date: Sep 2006
Location: New Sandusky
Posts: 725
I think OP was talking about the Amiga blitter in particular, and the answer is yes, it's pretty much obsolete if you have an 030/25 or better on an A3000 or AGA system, and an 040 or better on a non-A3000 ECS or OCS system.

The issue is that the blitter is slow even on OCS, and the blitter on A3000 ECS and AGA is not any faster despite the better bandwidth to chip RAM. The CPU can easily outperform the blitter on pretty much any operation, albeit with the hit of talking to chipmem (this is why on A3000 ECS and AGA the win is bigger for CPU). So if you want to e.g. draw the most bitmap objects possible, you'd go with CPU instead of blitter.

The one caveat is that you can still use the blitter to offload some tasks from the CPU that are still within the blitter's capability to do in one frame. For instance, one common approach was to use the blitter to clear a bitmap in the background before the CPU drew on it.
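
A background clear of that kind could be set up roughly like this. It is a sketch in C under stated assumptions: `MockBlitter` is a hypothetical stand-in struct so the code runs on any host, and `make_bltsize` and `start_clear` are made-up helper names. The values follow the OCS/ECS register layout (BLTCON0 bit 8 enables channel D and the low byte is the minterm; BLTSIZE packs the height into bits 15-6 and the width in words into bits 5-0, with the maximum values encoded as 0). Waiting for the blitter to be idle before touching the registers is not modelled here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mock of the few blitter registers used for a clear.
   On real hardware these live in the custom chip register block. */
typedef struct {
    uint16_t bltcon0;
    uint16_t bltcon1;
    uint32_t bltdpt;   /* destination pointer (Chip RAM address) */
    uint16_t bltdmod;  /* destination modulo in bytes            */
    uint16_t bltsize;  /* writing this starts the blit           */
} MockBlitter;

/* Pack height (1..1024 lines) and width (1..64 words) into BLTSIZE. */
uint16_t make_bltsize(unsigned height, unsigned width_words)
{
    assert(height >= 1 && height <= 1024);
    assert(width_words >= 1 && width_words <= 64);
    return (uint16_t)(((height & 0x3FF) << 6) | (width_words & 0x3F));
}

/* Program a destination-only blit with minterm 0, i.e. write zeros. */
void start_clear(MockBlitter *b, uint32_t chip_addr,
                 unsigned height, unsigned width_words)
{
    assert(b != 0);
    b->bltcon0 = 0x0100;   /* channel D only, minterm $00 */
    b->bltcon1 = 0;
    b->bltdmod = 0;        /* contiguous bitplane         */
    b->bltdpt  = chip_addr;
    b->bltsize = make_bltsize(height, width_words);
}
```

For one 320x256 bitplane (20 words per line), `make_bltsize(256, 20)` yields 0x4014. After the BLTSIZE write the CPU is free to do other work, and only needs to wait for the blitter before it draws into that same bitplane.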

Essentially, the blitter is still a win if it can do the operation you want in the time that the CPU is busy doing other stuff, but if the limiting factor in your code is how fast data is being manipulated in chip memory, then the CPU will do it better than the blitter.

Also shame on Commodore for not upgrading the blitter in AGA. =(
21 April 2020, 15:23   #9
olleharstedt_wo
Registered User

 
Join Date: Apr 2020
Location: Hamburg, DE
Posts: 10
Yeah, clearing a bitmap is something I can see being useful.

> Also shame on Commodore for not upgrading the blitter in AGA. =(

Agree. :|
21 April 2020, 17:46   #10
zero
Registered User

 
Join Date: Jun 2016
Location: UK
Posts: 350
Even an 030 can outperform the blitter sometimes, or even the A1200's 020 in very limited circumstances.

The key is fitting everything in the cache or at least in fast RAM if you have it.

But it depends on the application; it might be better to use the blitter and CPU together if you can.
21 April 2020, 20:08   #11
AmigaHope
Registered User
 
Join Date: Sep 2006
Location: New Sandusky
Posts: 725
tl;dr the blitter is a win on 68000 systems. It's still a win on OCS and non-A3000 ECS systems with fastmem if you can avoid overdraw. On A3000 and AGA it is almost always a losing proposition to use the blitter for anything beyond very simple tasks with very carefully crafted code.
21 April 2020, 20:19   #12
roondar
Registered User

 
Join Date: Jul 2015
Location: The Netherlands
Posts: 2,068
AGA without fast memory definitely benefits from the Blitter. The 68020 is not fast enough to beat it without having fast memory.
21 April 2020, 23:57   #13
aeberbach
Registered User

 
Join Date: Mar 2019
Location: Melbourne, Australia
Posts: 219
I used to use a little program to turn off the blitter when using a 68020 on my old A2500. In a terminal you could see the bitplanes scrolling separately, so white text on a dark background showed coloured fringing while moving. Letting the 68020 take care of it completely fixed that. So I think your answer is definitely "sometimes".