English Amiga Board


Old 05 March 2024, 11:38   #41
Thorham
Computer Nerd
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,796
Quote:
Originally Posted by Thomas Richter View Post
Yes, you can achieve "copy speed", but that's because "copy speed" is slow. The trick is to avoid the copy in the first place, and let the chipset do the work. But that's not possible due to an architecture bound to planar.
This isn't another what if thread. It just somehow got derailed. This is the original post:
Quote:
Originally Posted by lmimmfn View Post
I'm curious about the most generic C2P routine (not optimized for edge cases etc.): how much CPU does it consume across the different Amiga range and CPUs? I realise the Chip RAM throughput on 16-bit machines is only half that of 32-bit ones, but does anyone have any benchmarks on CPU performance across the Amiga range and CPUs?

I thought it might be interesting vs top-level Intel performance.
Thanks
Old 05 March 2024, 12:01   #42
Karlos
Alien Bleed
 
Join Date: Aug 2022
Location: UK
Posts: 4,234
Quote:
Originally Posted by Thomas Richter View Post
Yes, you can achieve "copy speed", but that's because "copy speed" is slow. The trick is to avoid the copy in the first place, and let the chipset do the work. But that's not possible due to an architecture bound to planar.
That's a given, which is the reason for the original question.

However, even if the chip ram were chunky, the bandwidth is still weak and you are dealing with uncached memory accesses. Rendering tends to be per pixel, you have a slow bus. Heaven help you if you want to do any transparency or other effects that require reading a pixel and replacing it with a new value.
Old 05 March 2024, 13:35   #43
Thomas Richter
Registered User
 
Join Date: Jan 2019
Location: Germany
Posts: 3,247
Quote:
Originally Posted by Karlos View Post
However, even if the chip ram were chunky, the bandwidth is still weak and you are dealing with uncached memory accesses. Rendering tends to be per pixel, you have a slow bus. Heaven help you if you want to do any transparency or other effects that require reading a pixel and replacing it with a new value.

But that's the point - you don't render with the CPU if you don't have to. With C2P, you are burning CPU cycles.
Old 05 March 2024, 14:17   #44
Thorham
Computer Nerd
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,796
Quote:
Originally Posted by Thomas Richter View Post
But that's the point - you don't render with the CPU if you don't have to. With C2P, you are burning CPU cycles.
What does that have to do with the original question?
Old 05 March 2024, 14:28   #45
alexh
Thalion Webshrine
 
Join Date: Jan 2004
Location: Oxford
Posts: 14,396
Mr.Moderator. Can someone clean up this thread, backtrack to where it left the technical discussions about C2P speed on AGA/ECS and put everything after that into any other Wot-if thread? Or the bin? Ta.
Old 05 March 2024, 14:33   #46
Thomas Richter
Registered User
 
Join Date: Jan 2019
Location: Germany
Posts: 3,247
Quote:
Originally Posted by Thorham View Post
What does that have to do with the original question?

What would you answer if the original question were "how fast can you copy memory"? The answer is "it depends". Here the answer is "it is limited by the available bandwidth". The *real* answer is: "avoid it if you can, because it will always cost time."
Old 05 March 2024, 14:47   #47
Karlos
Alien Bleed
 
Join Date: Aug 2022
Location: UK
Posts: 4,234
@ThoR

I think the point that you should not be using the CPU to do what your video hardware could/should is very well understood, but the fact is we are dealing with a specific problem domain where, for whatever reason, we are dealing with CPU rendering. That could be for any reason: Video decode, an oldschool FPS game, an emulator, etc.

If you are calculating a frame, pixel by pixel, potentially drawing things that aren't easily broken down into simple sequential planar writes, you are probably going to end up using C2P. Even when you can do planar writes in spans, just look at TFX. The single biggest bottleneck was trying to render to chip ram. Rendering the same planar data in fast ram and moving said data to chip dramatically improved the speed.

If you are generating chunky pixels, you now have to pay a conversion tax to get them displayed and the less you have to pay for any given number the better.

Back to the original question, it generally consumes 100% CPU (as in you can't get anything else meaningful done in the meantime). The only variable is, how long does it take, per frame?
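
For anyone following along who hasn't seen what the conversion itself does, here is a deliberately naive reference sketch in Python. It is not one of the optimized routines actually used on Amiga, just the bit shuffling spelled out; the function name `c2p` and its parameters are made up for illustration, and it assumes the width is a multiple of 8.

```python
def c2p(chunky, width, height, depth=8):
    """Convert a chunky buffer (one byte per pixel) into `depth` bitplanes.

    Returns a list of bytearrays, one per plane. Within each plane byte the
    leftmost pixel sits in the most significant bit, as on Amiga bitplanes.
    """
    if width % 8:
        raise ValueError("width must be a multiple of 8")
    if len(chunky) != width * height:
        raise ValueError("buffer size does not match width * height")

    row_bytes = width // 8
    planes = [bytearray(row_bytes * height) for _ in range(depth)]

    for i, pixel in enumerate(chunky):
        byte_index = i // 8        # valid across rows because width % 8 == 0
        mask = 0x80 >> (i % 8)     # leftmost pixel -> bit 7
        for plane in range(depth):
            # Bit `plane` of the pixel value goes into bitplane `plane`.
            if pixel & (1 << plane):
                planes[plane][byte_index] |= mask
    return planes
```

Every output byte in every plane depends on 8 different source pixels, which is why this is more expensive than a plain copy: the data has to be transposed bit by bit on top of being moved.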
Old 05 March 2024, 16:04   #48
Thorham
Computer Nerd
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,796
Quote:
Originally Posted by Thomas Richter View Post
What would you answer if the original question were "how fast can you copy memory"?
But that's not the question. The question is about C2P CPU time consumption on various Amiga systems. This can just be measured per system.

Quote:
Originally Posted by Thomas Richter View Post
The *real* answer is: "avoid it if you can, because it will always cost time."
Yes, but why use C2P when you don't need it? Seems a bit obvious.
Old 10 March 2024, 02:27   #49
Photon
Moderator
 
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,625
As a rough baseline, converting a ready 320x200x8bpp buffer takes roughly 1 frame of CPU time, or maybe a bit less on a fast 060. This means that there's little or no time left to render something interesting in 8 bits at full frame rate.

Graphics cards on a bus with no acceleration would be better or worse depending on the bus bandwidth vs. specific Amiga model motherboard chip RAM bandwidth from the CPU. Reverts to copyspeed.

Motorola CPU cards with a VDU as North Bridge would bypass the CPU-external bus, and that is the only way to beat bus copyspeed. IDK if SCSI modules could be adapted to become VDUs or if they're fast enough, but I estimate 1 in 20 or fewer Blizzard cards have modules. If the TF1260 could be modded this could be more promising, as it has sold in high numbers.

For 3D and calculations, the fastest way is ofc a GPU with everything on it, and CPU just passing object data and handing off all rendering to that PU. Unfortunately this results in a great cost (maybe not now?) but the bigger problem is that it must be standardized and devs incentivized to make games and other real-time applications for it that need full framerate. A cheap GPU, say 300 EUR would probably take 10 years to reach out to a wide enough audience. Big box Amigas bar A2000 are very rare, and the question is where the GPU would sit in the much more abundant wedge Amigas.
Old 10 March 2024, 03:58   #50
VladR
Registered User
 
Join Date: Dec 2019
Location: North Dakota
Posts: 741
Quote:
Originally Posted by Photon View Post
As a rough baseline, converting a ready 320x200x8bpp buffer takes roughly 1 frame of CPU time, or maybe a bit less on a fast 060. This means that there's little or no time left to render something interesting in 8 bits at full frame rate.
Yes, but there are 2 solutions to this problem:
1. Give the user the option of having 50% of the screen covered by a HUD (cockpit view in racing games, flight/space simulators), thus reducing that cost to 50% and leaving the user with 2 options: 50/60 fps (but with half the screen spent on HUD) and 30 fps (full-screen).

2. Just optimize/tweak the 3D scene to a 25/30 fps lock (but no frame-drops!). Yes, this means that most of the time the CPU sits partly unused, held in reserve to handle the rare spikes (which would otherwise drop the framerate below 25/30). But plenty of recent racing games had a 30-fps lock on consoles and that's fine.
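
To put rough numbers on the two options, here is a small back-of-the-envelope sketch. It assumes a 50 Hz display, takes the "roughly 1 frame" baseline above as 20 ms for a full-screen conversion, and assumes the C2P cost scales linearly with the converted area. All three are assumptions for illustration, not measurements, and the function name is made up.

```python
def render_budget_ms(display_hz, frames_per_update, c2p_full_ms, converted_fraction):
    """Milliseconds left for actual rendering per displayed update.

    display_hz         -- display refresh rate (assumed 50 for PAL)
    frames_per_update  -- 1 for a 50 fps lock, 2 for a 25 fps lock, ...
    c2p_full_ms        -- assumed cost of converting the full screen
    converted_fraction -- share of the screen that is converted (0.0 to 1.0)
    """
    period_ms = 1000.0 / display_hz * frames_per_update
    return period_ms - c2p_full_ms * converted_fraction


# Option 1: 50 fps, half of the screen is static HUD and not converted.
print(render_budget_ms(50, 1, 20.0, 0.5))

# Option 2: 25 fps lock, full screen converted every update.
print(render_budget_ms(50, 2, 20.0, 1.0))
```

Under these assumptions option 1 leaves about 10 ms per update for rendering and option 2 about 20 ms, so the 25 fps lock buys twice the rendering time per frame at the price of half the frame rate.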
Old 10 March 2024, 04:20   #51
VladR
Registered User
 
Join Date: Dec 2019
Location: North Dakota
Posts: 741
Quote:
Originally Posted by Thomas Richter View Post
Add real measurements - what about the same engine on an RTG graphics card - with chunky memory and a chunky blitter? That is something CBM could have done, if they just hadn't been asleep. Reinterpreting the planar data from Agnus in chunky is not exactly rocket science.
I do have a question about that. During the last 2 weekends I've configured my WinUAE dev set-up on my new PC and restarted working on the RTG Benchmark Demo (which runs on my flatshader engine that I have so far tested only on Vampire V2/V4 - though the code is 040-only ATM).

One of the features is that the benchmark can be run without _LVOWritePixelArray (CyberGraphXBase) - meaning it benchmarks raw CPU throughput without any chipram malarkey (you just don't see it on screen, but the frame is fully rendered internally in a loop).

Is the RTG driver doing something else during that call other than C2P ?

The benchmark spits out 2 numbers - with and without the RTG call, so it's very easy to instantly find out how long it takes. Obviously, no VSYNC.

I'm just wondering if the _LVOWritePixelArray () doesn't do a bunch of other things that would skew the C2P results?
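
The two-number approach can be sketched like this. It is only an illustration of the measurement idea in Python, not the actual benchmark code: `render` and `present` are placeholder callables standing in for the engine's frame loop and the display call, and the `clock` parameter is there so the timing source can be swapped.

```python
import time


def time_frames(render, present=None, frames=100, clock=time.perf_counter):
    """Average time per frame for render(), optionally followed by present()."""
    start = clock()
    for _ in range(frames):
        frame = render()
        if present is not None:
            present(frame)
    return (clock() - start) / frames


def present_cost(render, present, frames=100, clock=time.perf_counter):
    """Per-frame cost of the display call: time with it minus time without it."""
    without = time_frames(render, None, frames, clock)
    with_present = time_frames(render, present, frames, clock)
    return with_present - without
```

Note that the difference is the cost of the whole display call, including whatever it does internally besides the conversion, so on its own it only gives an upper bound on the C2P time.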



Do cards like ZZ9000 even do C2P ? Don't they bypass C2P / chipram completely with their own [presumably] RTG video-out ? Maybe I'm mixing that with the PiStorm, though...
Old 10 March 2024, 05:28   #52
pipper
Registered User
 
Join Date: Jul 2017
Location: San Jose
Posts: 664
Assuming one manages to transfer around 10 MB/s to the RTG card across the ZIII bus, a 320x200 image costs ca. 6 ms to transfer, leaving ca. 10 ms for the game to render everything at 60 fps.
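
The arithmetic behind those figures, as a quick sketch. It assumes 8 bits per pixel and reads 10 MB/s as 10,000,000 bytes per second; the function names are made up for illustration.

```python
def transfer_ms(width, height, bytes_per_pixel, bandwidth_bytes_per_s):
    """Time to push one frame across the bus, in milliseconds."""
    return width * height * bytes_per_pixel / bandwidth_bytes_per_s * 1000.0


def remaining_ms(frame_rate_hz, used_ms):
    """Time left in one frame period after `used_ms` has been spent."""
    return 1000.0 / frame_rate_hz - used_ms


copy_ms = transfer_ms(320, 200, 1, 10_000_000)   # 64,000 bytes per frame
print(round(copy_ms, 1))                          # 6.4
print(round(remaining_ms(60, copy_ms), 1))        # 10.3
```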
One thing that could have helped greatly is if the graphics card could do DMA transfers on its own via busmastering. This way you could hide the transfer while the CPU already renders the next frame (assuming that the transfer does not saturate the fastmem bandwidth). I’m not aware of any VGA chip of that era that could do this, though (with maybe the exception of the S3 ViRGE?) - this is generally something that came up later with the first 3D cards.

In the case of C2P, the equivalent would have been a DMA engine that can fetch from Fastmem into chipmem fully autonomously and do the conversion on the fly.
Old 10 March 2024, 06:06   #53
Bruce Abbott
Registered User
 
Join Date: Mar 2018
Location: Hastings, New Zealand
Posts: 2,649
Quote:
Originally Posted by pipper View Post
In the case of C2P, the equivalent would have been a DMA engine that can fetch from Fastmem into chipmem fully autonomously and do the conversion on the fly.
Would be great to have an Akiko style c2p converter that did that. Or it could just have DMA on the output side, then the CPU simply has to stuff it with chunky pixels which get (slowly) written to ChipRAM via DMA while the CPU bus is freed up for other stuff.

But this requires specialized hardware. The way things go these days it would be stupidly expensive and hard to buy, then go out of production due to lack of chips or interest from the designer.

IMO it's better to concentrate on developing code that works with all existing hardware. That way anybody with an Amiga can make use of it. Maybe it's not as efficient as dedicated hardware, but it has the advantages of being 'free' and a lot more inclusive. It is also truer to the retro spirit. We can imagine it being done back in 1992, showing what the Amiga was really capable of!

I suspect that many games using chunky pixels are still not fully optimized. For example Doom does c2p on the whole screen even when running in a smaller game window. What it should be doing is not redrawing the (static) border, and updating the status panel at a lower rate (perhaps using the blitter and bitplane graphics). With these changes I bet the frame rate could be increased significantly.
Old 10 March 2024, 06:35   #54
Bruce Abbott
Registered User
 
Bruce Abbott's Avatar
 
Join Date: Mar 2018
Location: Hastings, New Zealand
Posts: 2,649
Quote:
Originally Posted by VladR View Post
2. Just optimize/tweak the 3D scene to a 25/30 fps lock (but no frame-drops!). Yes, this means that most of the time the CPU sits partly unused, held in reserve to handle the rare spikes (which would otherwise drop the framerate below 25/30). But plenty of recent racing games had a 30-fps lock on consoles and that's fine.
25 fps is plenty high enough for most games IMO. Even 17 fps (three 50Hz frames) is fine for games like Doom.

What's more important is having a consistent frame rate that you can tune your reactions to. I had Quake on my A3000 with 50MHz 060, and what spoiled it was dramatic slowdowns when engaging enemies - just when you didn't need it. Maybe less particle stuff and reduced detail in the enemies would have helped. But of course nobody tried this because it would mean modifying game assets, a much harder job than just increasing hardware performance.

When the frame rate is locked there is more incentive to make the code consistently keep up, rather than just trying to get the fastest speed you can. I suspect the desire for even higher frame rates largely comes from wanting to eliminate annoying slowdowns.
Old 10 March 2024, 07:10   #55
VladR
Registered User
 
Join Date: Dec 2019
Location: North Dakota
Posts: 741
Quote:
Originally Posted by Bruce Abbott View Post
25 fps is plenty high enough for most games IMO. Even 17 fps (three 50Hz frames) is fine for games like Doom.

What's more important is having a consistent frame rate that you can tune your reactions to. I had Quake on my A3000 with 50MHz 060, and what spoiled it was dramatic slowdowns when engaging enemies - just when you didn't need it. Maybe less particle stuff and reduced detail in the enemies would have helped. But of course nobody tried this because it would mean modifying game assets, a much harder job than just increasing hardware performance.

When the frame rate is locked there is more incentive to make the code consistently keep up, rather than just trying to get the fastest speed you can. I suspect the desire for even higher frame rates largely comes from wanting to eliminate annoying slowdowns.
Yes, 17 fps in Doom, if it was frame-locked, would be plenty. But the first framedrop always killed it for me, even 30 yrs ago.

The problem isn't necessarily that the code is unoptimized (of course, there's a lot of that).
The problem is that, for a given HW, a game was always judged "playable" on the basis of its least complex 3D scene/room.
Then, like in your example, you add enemies in Quake (or more complex room), and framerate obviously drops drastically, as there's no such thing as two equally CPU-intensive 3D-engine frames.

The solution is to run benchmark on entire game, pick the slowest room, lock the framerate to that (or butcher the scene complexity on such rooms).
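
As a sketch of what that would look like: take the worst measured frame time per room, find the slowest room, and lock to the highest rate of the form refresh / n that this room can still hold. The room names and timings below are invented purely for illustration, as are the function names, and a 50 Hz display is assumed.

```python
import math


def locked_fps(worst_frame_ms, display_hz=50):
    """Highest rate of the form display_hz / n that the worst frame still meets."""
    # Number of whole display refreshes the worst frame needs.
    n = max(1, math.ceil(worst_frame_ms * display_hz / 1000.0))
    return display_hz / n


def pick_lock(room_frame_ms, display_hz=50):
    """Return (slowest room, frame rate to lock the whole game to)."""
    slowest = max(room_frame_ms, key=room_frame_ms.get)
    return slowest, locked_fps(room_frame_ms[slowest], display_hz)


# Hypothetical worst-case frame times per room, in milliseconds.
rooms = {"corridor": 18.0, "arena": 38.0, "courtyard": 55.0}
print(pick_lock(rooms))
```

With these made-up numbers the courtyard needs three 50 Hz refreshes per frame, so the whole game would be locked to about 17 fps, even though the corridor alone could run at 50.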

But, of course, since other rooms could run at 50-200% higher framerate, nobody does that and then the framerate is all over the place, resulting in grossly suboptimal experience for many of us...

Technically, Carmack did a significant optimization with Quake using the BeamTree approach, which halved the framerate in the best-case scenario but avoided brutal framedrops (though their definition of "brutal" differs from ours).
Still, they should have raised the min.req. as even on my Pentium, it became quickly unplayable...
As a funny anecdote, I really enjoyed Quake 1 completely for the first time only on PS4, as I could play it in FullHD without framedrops. ~Quarter century later, but hey...
Old 10 March 2024, 08:27   #56
dreadnought
Registered User
 
Join Date: Dec 2019
Location: Ur, Atlantis
Posts: 1,974
All threads lead to Doom
Quote:
Originally Posted by VladR View Post
As a funny anecdote, I really enjoyed Quake 1 completely for the first time only on PS4, as I could play it in FullHD without framedrops. ~Quarter century later, but hey...
Whatever rocks your boat, but we played Quake 1/2 competitively in the late 90s on whatever hardware/connection was available (mostly low-end Pentiums and Celerons) and still had a hell of a time.

Forcing a lowest-common-denominator frame lock would be a pretty crazy move since hardware setups were different, most people prefer to have it maxed wherever possible, and generally it wouldn't be as bad as you describe. And you could of course control it via console anyway (cl_maxfps if I remember correctly).
Old 10 March 2024, 09:30   #57
TCD
HOL/FTP busy bee

 
Join Date: Sep 2006
Location: Germany
Age: 46
Posts: 31,728
Quote:
Originally Posted by VladR View Post
Technically, Carmack did a significant optimization with Quake using the BeamTree approach, which halved the framerate in the best-case scenario but avoided brutal framedrops (though their definition of "brutal" differs from ours).
Quite an interesting article about that by Michael Abrash: https://www.bluesnews.com/abrash/chap64.shtml
Old 10 March 2024, 09:42   #58
Thomas Richter
Registered User
 
Join Date: Jan 2019
Location: Germany
Posts: 3,247
Quote:
Originally Posted by VladR View Post
I'm just wondering if the _LVOWritePixelArray () doesn't do a bunch of other things that would skew the C2P results?
I do not know what CGFX does. I can only tell you what P96 does. The corresponding P96 function creates a transient chunky bitmap from your source data and then runs into BltBitMapRastPort, which performs the usual clipping at layer boundaries. Within each rectangle to copy, it runs into BltBitMap(), which at the low level performs a memory copy if the target is chunky, or a C2P conversion if the target is planar.


Quote:
Originally Posted by VladR View Post
Do cards like ZZ9000 even do C2P ? Don't they bypass C2P / chipram completely with their own [presumably] RTG video-out ? Maybe I'm mixing that with the PiStorm, though...
No graphics card does C2P because that operation does not make sense in a chunky world. P96 has a primitive for P2C, and that conversion is offered by many graphics cards, as the Windows API also needs something similar to expand 1-bitplane wide graphics (for example for drawing text) to a chunky frame buffer.
Old 10 March 2024, 09:51   #59
Thomas Richter
Registered User
 
Join Date: Jan 2019
Location: Germany
Posts: 3,247
Quote:
Originally Posted by pipper View Post
One thing that could have helped greatly is if the graphics card could do DMA transfers on its own via busmastering. This way you could hide the transfer while the CPU already renders the next frame (assuming that the transfer does not saturate the fastmem bandwidth). I’m not aware of any VGA chip of that era that could do this, though (with maybe the exception of the S3 ViRGE?) - this is generally something that came up later with the first 3D cards.
Back then, DMA from the host was a relatively rare feature, and of the VGA chips I had written drivers for, none had support for that. There is a single exception, namely the A2410 card, which could do busmastering and let the TMS34010 access Amiga Zorro RAM through DMA - at least in principle. In practice, the DMA logic on the board is broken and it does not work.



Quote:
Originally Posted by pipper View Post
In the case of C2P, the equivalent would have been a DMA engine that can fetch from Fastmem into chipmem fully autonomously and do the conversion on the fly.
What you describe is a blitter mode that does such a conversion, though the Amiga blitter is relatively poorly prepared for that. For a (speedy) C2P conversion, it would need to have 8 destination channels, not only one. Of course you can use the blitter "as is" for C2P running serially over all bitplanes by moving the right bits out of the source - and I had even done this a while ago for VideoEasel. The result is not very fast, but it runs in parallel to the CPU which can do more useful things while the blitter is busy.


What is a lot more practical is to have a chunky mode directly in Denise. It is the same amount of data it would have to fetch, and it would be even simpler as it would not have to interleave the accesses for the individual bitplanes.
Old 10 March 2024, 11:02   #60
Thorham
Computer Nerd
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,796
Quote:
Originally Posted by dreadnought View Post
All threads lead to Doom
What's this obsession with Doom about anyway?