05 March 2024, 11:38 | #41
Computer Nerd
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,796
05 March 2024, 12:01 | #42
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,234
|
Quote:
However, even if the chip RAM were chunky, the bandwidth would still be weak and you would be dealing with uncached memory accesses. Rendering tends to be per-pixel, and you have a slow bus. Heaven help you if you want to do any transparency or other effects that require reading a pixel back and replacing it with a new value.
|
05 March 2024, 13:35 | #43
Registered User
Join Date: Jan 2019
Location: Germany
Posts: 3,247
|
Quote:
But that's the point: you don't render with the CPU if you don't have to. With C2P, you are burning CPU cycles.
|
05 March 2024, 14:17 | #44 |
Computer Nerd
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,796
|
|
05 March 2024, 14:28 | #45 |
Thalion Webshrine
Join Date: Jan 2004
Location: Oxford
Posts: 14,396
|
Mr. Moderator: can someone clean up this thread, backtrack to where it left the technical discussion about C2P speed on AGA/ECS, and put everything after that into any other Wot-if thread? Or the bin? Ta.
|
05 March 2024, 14:33 | #46 |
Registered User
Join Date: Jan 2019
Location: Germany
Posts: 3,247
|
What would you answer if the original question were "how fast can you copy memory"? The answer is "it depends". Here the answer is "at the speed of the available bandwidth". The *real* answer is: "avoid it if you can, because it will always cost time."
05 March 2024, 14:47 | #47 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,234
|
@ThoR
I think the point that you should not be using the CPU to do what your video hardware could/should do is very well understood, but the fact is we are dealing with a specific problem domain where, for whatever reason, we are dealing with CPU rendering. That could be for any reason: video decode, an oldschool FPS game, an emulator, etc. If you are calculating a frame pixel by pixel, potentially drawing things that aren't easily broken down into simple sequential planar writes, you are probably going to end up using C2P.

Even when you can do planar writes in spans, just look at TFX. The single biggest bottleneck was trying to render to chip RAM. Rendering the same planar data in fast RAM and moving said data to chip dramatically improved the speed. If you are generating chunky pixels, you have to pay a conversion tax to get them displayed, and the less you have to pay per frame the better.

Back to the original question: it generally consumes 100% CPU (as in, you can't get anything else meaningful done in the meantime). The only variable is how long it takes per frame.
05 March 2024, 16:04 | #48
Computer Nerd
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,796
|
Quote:
Yes, but why use C2P when you don't need it? Seems a bit obvious. |
|
10 March 2024, 02:27 | #49 |
Moderator
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,625
|
As a rough baseline, converting and copying a ready 320x200x8bpp buffer takes roughly one frame of CPU time, or maybe a bit less on a fast 060. This means that there's little or no time left to render something interesting in 8 bits at full frame rate.
Graphics cards on a bus with no acceleration would be better or worse depending on the bus bandwidth vs. the specific Amiga model's chip RAM bandwidth from the CPU; it reverts to copy speed. Motorola CPU cards with a VDU as north bridge would bypass the CPU-external bus and are the only way to beat bus copy speed. I don't know if SCSI modules could be adapted to become VDUs, or if they're fast enough, but I estimate 1 in 20 or fewer Blizzard cards have modules. If the TF1260 could be modded, this would be more promising as it has sold in high numbers.

For 3D and calculations, the fastest way is ofc a GPU with everything on it, the CPU just passing object data and handing off all rendering to that PU. Unfortunately this comes at great cost (maybe not now?), but the bigger problem is that it must be standardized and devs incentivized to make games and other real-time applications for it that need full framerate. A cheap GPU, say 300 EUR, would probably take 10 years to reach a wide enough audience. Big-box Amigas bar the A2000 are very rare, and the question is where the GPU would sit in the much more abundant wedge Amigas.
10 March 2024, 03:58 | #50
Registered User
Join Date: Dec 2019
Location: North Dakota
Posts: 741
|
Quote:
1. Allow the user the option of having 50% of the screen covered by HUD (cockpit view in racing games, flight/space simulators), reducing that cost by half and leaving the user with two options: 50/60 fps (but with half the screen spent on HUD) or 30 fps (full-screen).
2. Just optimize/tweak the 3D scene to a 25/30 fps lock (but no frame drops!). Yes, this means that most of the time some CPU headroom sits unused, reserved for the rare spikes (which would otherwise drop the framerate below 25/30). But plenty of recent racing games had a 30 fps lock on consoles and that's fine.
|
10 March 2024, 04:20 | #51
Registered User
Join Date: Dec 2019
Location: North Dakota
Posts: 741
|
Quote:
One of the features is that it allows running the benchmark without _LVOWritePixelArray (CyberGraphXBase) - meaning it benchmarks raw CPU throughput without any chip-RAM malarkey (you just don't see it on screen, but the frame is fully rendered internally in a loop). Is the RTG driver doing something else during that call other than C2P? The benchmark spits out two numbers - with and without the RTG call - so it's very easy to instantly find out how long it takes. Obviously, no VSYNC. I'm just wondering if _LVOWritePixelArray() does a bunch of other things that would skew the C2P results. Do cards like the ZZ9000 even do C2P? Don't they bypass C2P/chip RAM completely with their own [presumably] RTG video-out? Maybe I'm mixing that up with the PiStorm, though...
|
10 March 2024, 05:28 | #52 |
Registered User
Join Date: Jul 2017
Location: San Jose
Posts: 664
|
Assuming one manages to transfer around 10 MB/s to the RTG card across the Zorro III bus, a 320x200 image costs ca. 6.4 ms to transfer, leaving ca. 10 ms for the game to render everything at 60 fps.
One thing that could have helped greatly is if the graphics card could do DMA transfers on its own via bus mastering. This way you could hide the transfer while the CPU is already rendering the next frame (assuming the transfer does not saturate the fastmem bandwidth). I'm not aware of any VGA chip of that era that could do this, though (with maybe the exception of the S3 ViRGE?) - this is generally something that came up later with the first 3D cards. In the case of C2P, the equivalent would have been a DMA engine that can fetch from fastmem into chipmem fully autonomously and do the conversion on the fly.
10 March 2024, 06:06 | #53 | |
Registered User
Join Date: Mar 2018
Location: Hastings, New Zealand
Posts: 2,649
|
Quote:
But this requires specialized hardware. The way things go these days, it would be stupidly expensive and hard to buy, then go out of production due to lack of chips or interest from the designer. IMO it's better to concentrate on developing code that works with all existing hardware. That way anybody with an Amiga can make use of it. Maybe it's not as efficient as dedicated hardware, but it has the advantages of being 'free' and a lot more inclusive. It is also truer to the retro spirit. We can imagine it being done back in 1992, showing what the Amiga was really capable of!

I suspect that many games using chunky pixels are still not fully optimized. For example, Doom does C2P on the whole screen even when running in a smaller game window. What it should be doing is not redrawing the (static) border, and updating the status panel at a lower rate (perhaps using the blitter and bitplane graphics). With these changes I bet the frame rate could be increased significantly.
|
10 March 2024, 06:35 | #54
Registered User
Join Date: Mar 2018
Location: Hastings, New Zealand
Posts: 2,649
|
Quote:
What's more important is having a consistent frame rate that you can tune your reactions to. I had Quake on my A3000 with 50MHz 060, and what spoiled it was dramatic slowdowns when engaging enemies - just when you didn't need it. Maybe less particle stuff and reduced detail in the enemies would have helped. But of course nobody tried this because it would mean modifying game assets, a much harder job than just increasing hardware performance. When the frame rate is locked there is more incentive to make the code consistently keep up, rather than just trying to get the fastest speed you can. I suspect the desire for even higher frame rates largely comes from wanting to eliminate annoying slowdowns. |
|
10 March 2024, 07:10 | #55
Registered User
Join Date: Dec 2019
Location: North Dakota
Posts: 741
|
Quote:
The problem isn't necessarily that the code is unoptimized (of course, there's a lot of that). The problem is that, for a given HW, the scene complexity was always judged "playable" based on the least complex 3D scene/room. Then, like in your example, you add enemies in Quake (or a more complex room), and the framerate obviously drops drastically, as no two 3D-engine frames are equally CPU-intensive. The solution is to benchmark the entire game, pick the slowest room, and lock the framerate to that (or butcher the scene complexity in such rooms). But of course, since other rooms could run at 50-200% higher framerates, nobody does that, and then the framerate is all over the place, resulting in a grossly suboptimal experience for many of us...

Technically, Carmack did a significant optimization in Quake with the beam-tree approach, which halved the framerate in the best-case scenario but avoided brutal framedrops (though their definition of "brutal" differs from ours). Still, they should have raised the min. req., as even on my Pentium it quickly became unplayable...

As a funny anecdote, I really enjoyed Quake 1 completely for the first time only on PS4, where I could play it in Full HD without framedrops. A quarter century later, but hey...
|
10 March 2024, 08:27 | #56
Registered User
Join Date: Dec 2019
Location: Ur, Atlantis
Posts: 1,974
|
All threads lead to Doom
Quote:
Forcing the lowest-denominator frame lock would be a pretty crazy move since hardware setups were different, most people prefer to have it maxed wherever possible, and generally it wouldn't be as bad as you describe. And you could of course control it via console anyway (cl_maxfps if I remember correctly). |
|
10 March 2024, 09:30 | #57
HOL/FTP busy bee
Join Date: Sep 2006
Location: Germany
Age: 46
Posts: 31,728
|
10 March 2024, 09:42 | #58
Registered User
Join Date: Jan 2019
Location: Germany
Posts: 3,247
|
Quote:
No graphics card does C2P, because that operation does not make sense in a chunky world. P96 has a primitive for P2C, and that conversion is offered by many graphics cards, as the window API also needs something similar to expand 1-bitplane graphics (for example, for drawing text) into a chunky frame buffer.
|
10 March 2024, 09:51 | #59
Registered User
Join Date: Jan 2019
Location: Germany
Posts: 3,247
|
Quote:
What is a lot more practical is to have a chunky mode directly in Denise. It would be the same amount of data to fetch, and it would be even simpler, as it would not have to interleave the accesses for the individual bitplanes.
||
10 March 2024, 11:02 | #60 |
Computer Nerd
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,796
|
|