#1
Registered User
Join Date: May 2018
Location: Ireland
Posts: 691
How much CPU does C2P consume?
I'm curious how much CPU the most generic C2P routine (not optimized for edge cases etc.) consumes across the different Amiga models and CPUs. I realise the Chip RAM throughput is only half on 16-bit machines vs. 32-bit ones, but does anyone have any benchmarks of CPU performance across the Amiga range and CPUs?
I thought it might be interesting to compare against top-level Intel performance. Thanks
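To put the 16-bit vs. 32-bit point into numbers, here is a minimal sketch of how much planar data one full frame is and how many chip bus writes it takes to store it. The screen sizes are illustrative assumptions (320x256 with 8 bitplanes for AGA, 5 bitplanes for OCS/ECS), and the sketch only counts writes; it ignores DMA contention and wait states entirely.

```python
# Rough size of one full planar frame and how many chip-bus writes it needs.
# The screen sizes below are illustrative assumptions, not measurements.

def planar_frame_bytes(width: int, height: int, planes: int) -> int:
    """Chip RAM used by one planar frame: 1 bit per pixel in each bitplane."""
    return width * height * planes // 8

def bus_writes(frame_bytes: int, bus_width_bits: int) -> int:
    """Writes needed to store the frame when each write moves one bus-width word."""
    return frame_bytes // (bus_width_bits // 8)

aga_frame = planar_frame_bytes(320, 256, 8)   # 256 colours, 32-bit chip bus (A1200/A4000)
ocs_frame = planar_frame_bytes(320, 256, 5)   # 32 colours, 16-bit chip bus (OCS/ECS)

print(aga_frame, bus_writes(aga_frame, 32))   # 81920 20480
print(ocs_frame, bus_writes(ocs_frame, 16))   # 51200 25600
print(bus_writes(aga_frame, 16))              # 40960: the same AGA-sized frame over a 16-bit bus
```

Before any conversion work is counted, a 16-bit bus needs twice as many writes for the same amount of data.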
#2
Registered User
Join Date: Sep 2007
Location: Melbourne/Australia
Posts: 4,416
Almost nothing for an 060 (esp. overclocked), most of my old ports run at about the same FPS for both AGA and RTG.
#3
Registered User
Join Date: Jan 2019
Location: Germany
Posts: 3,307
Well, C2P takes all the CPU time it can get, of course. But if the target is chip memory, the bottleneck is typically the interface to the chip memory, if that is what you are asking. That does not mean it comes for free compared to a "native chunky" display. In the latter case, the CPU would only render parts of the screen (the objects being animated), but with C2P it would typically convert the entire frame buffer. Running C2P on only an object or part of the frame also has overhead.
This is not a matter of "Intel vs. 68k"; it is more a matter of data organization and the available bandwidth. So would a native chunky display be faster, even with the same bottleneck? Yes, but not because the C2P would go away, but because you would not have to touch the entire screen and would move less data around. Would it be faster with more chip memory bandwidth? Yes, most definitely, and at that point C2P, not the memory bandwidth, would be the bottleneck. With the bandwidth available, it makes no difference whether you copy an entire frame planar to planar or convert an entire frame from chunky to planar with C2P, but that's not because the conversion is "for free"; it's because the chip memory is so slow.
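For reference, here is a minimal, deliberately naive sketch of what a C2P routine computes: each chunky pixel is one byte (a colour index), and every group of 8 pixels becomes one byte in each bitplane, with the leftmost pixel in the most significant bit. It makes the point above concrete, since every pixel of the frame is touched once whether or not it changed. Real 68k routines are far more clever (they shuffle bits between 32-bit registers with merge/shift steps instead of looping per bit); the function name `c2p` is just illustrative.

```python
from __future__ import annotations

def c2p(chunky: bytes, width: int, height: int, planes: int = 8) -> list[bytearray]:
    """Naive chunky-to-planar conversion.

    chunky: one byte (colour index) per pixel, row-major.
    Returns one bytearray per bitplane, each width*height/8 bytes long;
    the leftmost pixel of each group of 8 ends up in bit 7.
    """
    if width % 8:
        raise ValueError("width must be a multiple of 8")
    plane_size = width * height // 8
    bitplanes = [bytearray(plane_size) for _ in range(planes)]
    for offset in range(plane_size):           # one output byte per plane per 8 pixels
        group = chunky[offset * 8:offset * 8 + 8]
        for p in range(planes):
            byte = 0
            for pixel in group:                 # collect bit p of each pixel, left to right
                byte = (byte << 1) | ((pixel >> p) & 1)
            bitplanes[p][offset] = byte
    return bitplanes

planes = c2p(bytes([0, 1, 2, 3, 4, 5, 6, 7]), width=8, height=1)
print([hex(p[0]) for p in planes])
# ['0x55', '0x33', '0xf', '0x0', '0x0', '0x0', '0x0', '0x0']
```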
#4
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 56
Posts: 2,039
You can check this old c2p thread and results for NONE (no writes to chip ram).
https://eab.abime.net/showthread.php...475#post967475
#5
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,355
Quote:
There is a lot to be gained with partial c2p. Without it, my HOMM2 port would just crawl.
Quote:
Then, whether it is rendered through copymem or c2p does not matter (on an 060, that is). With a significant amount of graphics operations, using a buffer in fastmem is probably faster than rendering directly to chipmem, btw - even if we had native chunky.
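The post doesn't say how the HOMM2 port implements partial c2p. One possible approach, sketched here purely as an assumption, is to keep a copy of the previous chunky frame and only convert (and write to chip RAM) the 8-pixel groups that changed. The function names are hypothetical, the bitplanes are assumed to start out matching `previous`, and a real 68k version would compare longwords rather than Python slices.

```python
def convert_group(group, planes):
    """Return one planar byte per bitplane for 8 chunky pixels (leftmost pixel -> bit 7)."""
    result = []
    for p in range(planes):
        byte = 0
        for pixel in group:
            byte = (byte << 1) | ((pixel >> p) & 1)
        result.append(byte)
    return result


def partial_c2p(chunky, previous, bitplanes, planes=8):
    """Convert only the 8-pixel groups that differ from the previous frame.

    chunky:    current frame, one byte per pixel (bytes or bytearray)
    previous:  bytearray copy of the last converted frame, updated in place
    bitplanes: one bytearray per plane (len(chunky) // 8 bytes each), updated in place
    Returns the number of groups that were converted.
    """
    converted = 0
    for offset in range(len(chunky) // 8):
        start = offset * 8
        group = chunky[start:start + 8]
        if group == previous[start:start + 8]:
            continue  # unchanged: skip both the conversion and the chip RAM writes
        for p, byte in enumerate(convert_group(group, planes)):
            bitplanes[p][offset] = byte
        previous[start:start + 8] = group
        converted += 1
    return converted


# A 16x1 "screen": two groups of 8 pixels, starting from an all-zero frame.
planes = [bytearray(2) for _ in range(8)]
previous = bytearray(16)
frame = bytearray(16)
frame[9] = 3                                   # change one pixel in the second group
print(partial_c2p(frame, previous, planes))    # 1
print(partial_c2p(frame, previous, planes))    # 0: nothing changed since the last call
```

The saving grows with how little of the screen changes per frame, which fits the strategy-game case mentioned in the post.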
#6
Registered User
Join Date: Jan 2019
Location: Germany
Posts: 3,307
How can rendering things twice (first to fast mem, then c2p to chip) be possibly faster than rendering only once (CPU directly to chip)?
#7
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,062
If you read from the buffer, or write the same pixel multiple times...
#8
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,355
Quote:
For example (taking 60ns fastmem, a 50MHz 68030 and 32-bit chipmem), let's have 3 memory accesses (8+8+8), which is not much actually, and then a copy to chip (8+26), for 58 in total. Now compare that to direct rendering (26+26+26 = 78). More memory accesses, but faster. And that's a small example, on a rather modest CPU, with few operations and without counting data caches (which won't cache chipmem).
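The arithmetic in this example generalises to any number of accesses per pixel. Here is a minimal sketch, assuming the figures (8 for fastmem, 26 for chipmem) are CPU cycles per access and that the final copy costs one fast read plus one chip write, as in the post; data caches are ignored, as the post also does.

```python
FAST, CHIP = 8, 26   # per-access costs from the post (assumed to be CPU cycles)

def buffered_cost(accesses: int) -> int:
    """Render in a fastmem buffer, then copy to chipmem (one fast read + one chip write)."""
    return accesses * FAST + (FAST + CHIP)

def direct_cost(accesses: int) -> int:
    """Render straight into chipmem: every access pays the chipmem price."""
    return accesses * CHIP

for n in range(1, 5):
    print(n, buffered_cost(n), direct_cost(n))
# 1 42 26
# 2 50 52
# 3 58 78
# 4 66 104
```

Under these assumptions the buffer already wins at two accesses per pixel, which matches the point in post #7 about reading back from the buffer or writing the same pixel several times.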
#9
titan sucks!
![]() Join Date: Dec 2012
Location: munich/germany
Posts: 54
https://github.com/Kalmalyzer/kalms-...ee/main/normal
At the top of many of the .s files, Kalms documents how much CPU time his routines need.
#10
Computer Nerd
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 48
Posts: 3,840
Quote:
#11
Piotr
Join Date: Jul 2013
Location: Lodz/Poland
Age: 40
Posts: 207
Dumb question - would it be possible to design an accelerator board with dedicated chip and memory to perform c2p externally? Does it make sense? ;-)
#12
Registered User
Join Date: Jun 2010
Location: PL?
Posts: 2,875
Quote:
Also, this could be interesting: https://eab.abime.net/showthread.php?t=105664 - the RP2040 seems to be a perfect solution for small Amiga improvements. The RP2040 can probably be fast enough to deal with the RGA bus (so it could sit on top of Denise and provide some functionality comparable to Indivision).
#13
Piotr
Join Date: Jul 2013
Location: Lodz/Poland
Age: 40
Posts: 207
Wow! Thanks for the links
#14
Registered User
Join Date: Mar 2018
Location: Hastings, New Zealand
Posts: 2,719
Some real-world results on my A1200 with a Blizzard 1230IV (50MHz 030) and 60ns RAM, running the DoomAttack timedemo at the standard window size (2 steps down from full-screen) with various copy routines:-
- "c2p_optimized" (normal AGA c2p routine): 10.1 fps
- "fake chunky" (copy FastRAM to ChipRAM): 10.85 fps, 7% faster
- "fake RTG" (copy FastRAM to FastRAM): 12.65 fps, 25% faster
- "Fake RTG" direct (just rendering to FastRAM): 13.27 fps, 31% faster
Conclusions:
- On a 50MHz 030 the c2p overhead is minimal. The improvement from having a hardware chunky mode in the AGA chipset would be practically unnoticeable (at least for Doom and similar games).
- Using a 32-bit graphics card on the local CPU bus could potentially increase Doom's frame rate by up to 31%, which is a significant but not amazing improvement. Programs that spend less time calculating would benefit more.
As a comparison, here are some selected Doom frame rates on various PCs:-
- i386 SX25 WDC VGA ISA: 3.4 fps
- Am386 DX40 (MX83) ISA: 8.24 fps
- 486DX2/66 miro 1H10AD VLB: 10.02 fps
- 486DX2/66 CL-GD5428 1MB VLB: 10.3 fps
- P100 TVGA 8800CS 512KB ISA: 11.82 fps
- P100 Stealth II S220 4MB PCI: 15.71 fps
- P100 Trident TVGA 8900D ISA: 32.35 fps
- P100 Bali 32 1MB PCI: 73.66 fps
And here are some more on various systems:-
- NeXTstation 68040 33MHz 2-bit grayscale: 9.8 fps
- SPARCstation IPX MB86903 40MHz Sun GX: 10.9 fps
- Pentium-60 Compaq Qvision 2000+ MGA PCI: 11.8 fps
- Amiga 1200 68040 40MHz AGA: 13.4 fps
- Pentium 75 S3 PCI: 23.2 fps
- Amiga 1200 68060 50MHz AGA: 24.6 fps
- Pentium-120 Diamond Viper SE PCI: 27.4 fps
On faster PCs the frame rates vary greatly depending on the graphics card and bus settings. The limit for ISA bus cards appears to be 33 fps, though most were around 15 fps, and some VLB and PCI machines were even slower despite having a fast 486 or Pentium CPU.
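The percentages in the DoomAttack results can be reproduced from the frame rates, and converting them to milliseconds per frame shows how much time the c2p and chip RAM writes take in this particular test. This is a small sketch using only the posted numbers; the labels are the post's, while the `RESULTS` name and helper functions are illustrative.

```python
# DoomAttack timedemo, A1200 + Blizzard 1230IV (50MHz 030), figures as posted
RESULTS = {
    "c2p_optimized": 10.1,
    "fake chunky": 10.85,
    "fake RTG": 12.65,
    "fake RTG direct": 13.27,
}

def speedup_percent(fps: float, baseline: float) -> float:
    """How much faster than the baseline, in percent."""
    return (fps / baseline - 1) * 100

def frame_ms(fps: float) -> float:
    """Average time per frame in milliseconds."""
    return 1000 / fps

base = RESULTS["c2p_optimized"]
for name, fps in RESULTS.items():
    print(f"{name:16} {fps:6.2f} fps {frame_ms(fps):6.1f} ms/frame +{speedup_percent(fps, base):4.1f}%")

# Per-frame time spent on c2p + chip RAM writes, vs. rendering straight to FastRAM
print(round(frame_ms(base) - frame_ms(RESULTS["fake RTG direct"]), 1))   # 23.7
```

So in this test roughly 24 ms of a roughly 99 ms frame goes to c2p plus getting the picture into chip RAM.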
#15
Registered User
Join Date: Jan 2019
Location: Germany
Posts: 3,307
Quote:
So what you conclude from this "fake comparison" is wrong. It is not "oh, we don't need chunky"; it is rather "speed up the damn chipset". What you would need to compare is "planar over a properly made chip RAM interface" vs. "chunky over a properly made chip RAM interface", and that would measure the overhead of a stupid conversion that could have been avoided if CBM had invested in the chips instead of "read my lips - no new chips". So now go back and measure "c2p from fast to fast" vs. "direct rendering into fast". That gives you the right numbers for decision-making about graphics modes.
#16
Registered User
Join Date: Aug 2020
Location: Namestovo/Slovakia
Posts: 17
Quote:
Do you have results for an A1200 with Fast RAM only, and could you publish them?
#17
Computer Nerd
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 48
Posts: 3,840
#18
Registered User
Join Date: Mar 2018
Location: Hastings, New Zealand
Posts: 2,719
Quote:
Quote:
Quote:
AGA did a good job of what it was designed to do. Everybody at Commodore (engineers and management) agreed on that; the only issue was how long it took to get out the door. AAA was a different story. I think Commodore should have killed it early on and concentrated on the low-end AA (AGA) instead. For the high end they should have just put RTG into the OS and let third-party graphics cards fill the gap. But most of the engineers didn't want that; they wanted to go up-market with a 'VGA killer' chipset, rather than down-market where Commodore's strength lay.
Quote:
#19
Registered User
Join Date: Mar 2018
Location: Hastings, New Zealand
Posts: 2,719
Quote:
If you mean everything being in FastRAM, including Doom code and data, screen memory and ROM, that's the 'Fake RTG direct' test, where it just renders to FastRAM and that's it (no screen copy, no c2p). |
#20
Registered User
Join Date: May 2023
Location: Norwich
Posts: 429
Quote:
Everyone at Amstrad thought the same about the GX4000 console. When you're on the inside of a project, tunnel vision can easily blind you to mistakes that are obvious to outside observers.