11 December 2023, 14:34 | #121 | |
Registered User
Join Date: Aug 2010
Location: Italy
Posts: 855
|
|
|
11 December 2023, 15:37 | #122 |
Registered User
Join Date: Aug 2010
Location: Italy
Posts: 855
|
And here are the builds to verify whether my guess is correct and, if so, what produces the best performance.
The latest reports above refer to a version that simply keeps the data cache on all the time and uses the cache-inhibited, precise exception model for the low 16 MB. These builds adopt other solutions. The archive contains the following files:
* PVE-IA
* PVE-IO
* PVE-IU
* PVE-PA (this is the one that gave the previous results)
* PVE-PO
* PVE-PU
The naming scheme works like this:
* I = cache-inhibited, Imprecise exception model for the low 16 MB
* P = cache-inhibited, Precise exception model for the low 16 MB
* A = data cache Always on
* O = data cache off during voxel rendering and On while rotating & copying the raster from FAST RAM to CHIP RAM
* U = data cache off during voxel rendering and Unmanaged while rotating & copying the raster from FAST RAM to CHIP RAM
If these tests run without glitches, the next step will be trying the copyback mode for FAST RAM, although I don't expect it to make any difference given that, at least for now, there are basically no repeated reads and writes on the same variables.
Last edited by saimo; 11 December 2023 at 20:59. Reason: Removed attachment as I provided a newer version later. |
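The naming scheme is regular enough to decode mechanically. Below is a small Python sketch with hypothetical names (`decode_build`, `EXCEPTION_MODEL`, `DATA_CACHE`); the 'C' first letter comes from the EDIT in post #125, where the copyback builds use the imprecise exception model only.

```python
# First letter: how the low 16 MB is mapped (plus copyback for 'C' builds).
EXCEPTION_MODEL = {
    "I": "cache-inhibited, imprecise exception model for the low 16 MB",
    "P": "cache-inhibited, precise exception model for the low 16 MB",
    # 'C' builds (post #125): copyback for FAST RAM, imprecise model only.
    "C": "copyback for FAST RAM, imprecise exception model for the low 16 MB",
}

# Second letter: data cache policy.
DATA_CACHE = {
    "A": "data cache always on",
    "O": "data cache off during voxel rendering, on while rotating & copying to CHIP RAM",
    "U": "data cache off during voxel rendering, unmanaged while rotating & copying to CHIP RAM",
}


def decode_build(name):
    """Decode a build name such as 'PVE-IA' (or just 'IA')."""
    code = name.rsplit("-", 1)[-1].upper()
    if len(code) != 2 or code[0] not in EXCEPTION_MODEL or code[1] not in DATA_CACHE:
        raise ValueError(f"unrecognised build name: {name!r}")
    return EXCEPTION_MODEL[code[0]], DATA_CACHE[code[1]]


for build in ("PVE-IA", "PVE-PO"):
    print(build, decode_build(build))
```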
11 December 2023, 16:48 | #123 |
Registered User
Join Date: Jan 2019
Location: Finland
Posts: 654
|
IO and PO glitch out the same way as in #102; the rest of the tests are glitch-free.
Build | Test | 50 MHz (fps) | 100 MHz (fps)
IA    | FB   | 23.705       | 40.519
IA    | BB   | 26.454       | 47.636
IU    | FB   | 27.138       | 46.826
IU    | BB   | 31.984       | 50.068
PU    | FB   | 26.000       | 43.163
PU    | BB   | 31.880       | 50.068
|
11 December 2023, 18:03 | #124 | |
Registered User
Join Date: Aug 2010
Location: Italy
Posts: 855
|
* the problem has been found and fixed;
* the imprecise model gives slightly better performance;
* turning the cache off during the voxel rendering is better (unlike on the 68030);
* I managed to screw something up yet again - but my head is spinning too much at the moment due to lack of sleep, so I'll get back to fiddling with the cache later...
P.S. I know that the .068 in the 50.068 figures looks wrong. It kind of is. This is why: after a frame has been rendered, the code waits for the buffering to make a raster available for rendering (if one is not available yet). When the machine can render frames faster than they can be displayed, there is no available raster, as the previously rendered one is waiting to be copied to CHIP RAM and the other one has just been rendered, so the code halts for a while. At the 256th frame the benchmark exits without waiting for the buffering anymore, so it exits early and the figure goes beyond 50 fps. On the other hand, if it were artificially made to wait, doing nothing, for the buffering to complete (there are two levels of double buffering: one for the CHIP RAM buffers and one for the FAST RAM buffers), the figure would be lower than 50 fps.
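A toy Python model of the effect described in the P.S. It is not the program's code: it assumes a 50 Hz (PAL) display that takes at most one finished raster per vertical blank, and it collapses the two levels of double buffering into a hypothetical `slots` count of rasters the renderer may fill ahead of the display. The function name and parameters are made up for illustration.

```python
import math


def benchmark_fps(render_time, frames=256, display_hz=50, slots=2, drain=False):
    """Toy model of the frame counter described above (all names hypothetical).

    render_time: seconds needed to render one frame
    slots: rasters the renderer may fill ahead of the display (an assumption)
    drain: if True, stop the clock only once the last frame has been taken
           by the display, instead of right after it has been rendered
    """
    period = 1 / display_hz
    t = 0.0
    shown = []  # vblank index at which each frame was taken by the display
    for i in range(frames):
        if i >= slots:
            # No free raster: wait until the frame rendered `slots` ago is taken.
            t = max(t, shown[i - slots] * period)
        t += render_time
        # The display takes at most one finished frame per vblank.
        tick = math.ceil(t / period - 1e-9)
        if shown:
            tick = max(tick, shown[-1] + 1)
        shown.append(tick)
    elapsed = shown[-1] * period if drain else t
    return frames / elapsed


print(benchmark_fps(0.001))              # fast renderer, clock stops after render
print(benchmark_fps(0.001, drain=True))  # fast renderer, clock waits for display
print(benchmark_fps(0.05))               # slow renderer, about 20 fps
```

With a render time far below one frame period, the model reports slightly more than 50 fps because the clock stops right after the 256th render; with `drain=True` it reports exactly 50. The real program has an extra buffering stage that this model ignores, which is consistent with the remark that waiting would give less than 50 fps.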
|
11 December 2023, 20:58 | #125 |
Registered User
Join Date: Aug 2010
Location: Italy
Posts: 855
|
Fixed the bug that made the IO and PO versions act up - it wasn't related to caches, but to this sequence of instructions...
    move.l  (sp)+,d0
    movem.l (sp)+,d3-d7/a1-a5
... which should have been instead...
    movem.l (sp)+,d3-d7/a1-a5
    move.l  (sp)+,d0
This stupid mistake was caused by an aborted idea that I forgot to undo completely.
New archive attached.
EDIT: I have also added to the archive 3 builds that use copyback for FAST RAM ('C' in the name); given the previous results, they use only the imprecise exception model for the low 16 MB.
Last edited by saimo; 11 December 2023 at 22:21. Reason: Removed archive as I provided a newer version later. |
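Why the order matters: the stack is last-in, first-out, so the pops must mirror the pushes in reverse. The toy Python model below assumes (the post does not show it) that the routine pushed d0 first and then the d3-d7/a1-a5 block, which is the layout the corrected epilogue implies. The helper names are hypothetical; each one mimics one addressing mode, with index 0 of the list standing for the longword at sp.

```python
MOVEM_REGS = ["d3", "d4", "d5", "d6", "d7", "a1", "a2", "a3", "a4", "a5"]


def push_long(stack, value):
    """move.l value,-(sp): the new longword becomes the top of the stack (index 0)."""
    stack.insert(0, value)


def push_movem(stack, regs, values):
    """movem.l regs,-(sp): the lowest register ends up at the lowest address (top)."""
    stack[0:0] = [values[r] for r in regs]


def pop_long(stack):
    """move.l (sp)+,reg: read the longword at sp, then advance sp."""
    return stack.pop(0)


def pop_movem(stack, regs):
    """movem.l (sp)+,regs: load the registers from ascending addresses."""
    return {r: stack.pop(0) for r in regs}


def epilogue(correct_order):
    saved = {r: f"saved_{r}" for r in ["d0"] + MOVEM_REGS}
    stack = []
    # Assumed prologue (not shown in the post): d0 first, then the register block.
    push_long(stack, saved["d0"])
    push_movem(stack, MOVEM_REGS, saved)
    if correct_order:
        regs = pop_movem(stack, MOVEM_REGS)
        regs["d0"] = pop_long(stack)
    else:
        regs = {"d0": pop_long(stack)}
        regs.update(pop_movem(stack, MOVEM_REGS))
    assert not stack  # sp ends up balanced either way
    return regs, saved


regs, saved = epilogue(correct_order=False)
print(regs["d0"], regs["d3"], regs["a5"])  # saved_d3 saved_d4 saved_d0
```

In the wrong order the stack pointer still ends up balanced, but every register receives its neighbour's saved value, so nothing fails at the point of the mistake itself.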
11 December 2023, 21:56 | #126 |
Registered User
Join Date: Jan 2019
Location: Finland
Posts: 654
|
Build | Test | 50 MHz (fps) | 100 MHz (fps)
IO    | FB   | 28.014       | 48.012
PO    | FB   | 26.797       | 44.214
CU    | FB   | 27.254       | 46.998
CU    | BB   | 32.040       | 50.068
CO    | FB   | 28.134       | 48.192
BlindBenchmark crash-reboots on the IO, PO, and CO tests. |
11 December 2023, 22:21 | #127 | ||
Registered User
Join Date: Aug 2010
Location: Italy
Posts: 855
|
Winner combination:
* imprecise exception model for the low 16 MB
* copyback for the rest of the address space
* cache off during voxel rendering
* cache on during the FAST RAM -> CHIP RAM transfer
I'd be surprised if it weren't the same on other 68060 machines too.
Updated archive attached.
Last edited by saimo; 17 December 2023 at 23:25. Reason: Removed attachment as I provided a newer version later. |
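The winner can be read straight off Aardvark's FB figures from posts #123 and #126. A short Python sketch (the names `FB_RESULTS` and `best_build` are hypothetical); BB figures are left out because several of them sit at the 50.068 ceiling and some BB runs crashed.

```python
# FB figures (fps) reported by Aardvark for his TF1260 in posts #123 and #126.
FB_RESULTS = {
    "IA": {50: 23.705, 100: 40.519},
    "IU": {50: 27.138, 100: 46.826},
    "PU": {50: 26.000, 100: 43.163},
    "IO": {50: 28.014, 100: 48.012},
    "PO": {50: 26.797, 100: 44.214},
    "CU": {50: 27.254, 100: 46.998},
    "CO": {50: 28.134, 100: 48.192},
}


def best_build(results, mhz):
    """Return the build name with the highest FB figure at the given clock."""
    return max(results, key=lambda build: results[build][mhz])


for mhz in (50, 100):
    build = best_build(FB_RESULTS, mhz)
    print(f"{mhz} MHz: {build} ({FB_RESULTS[build][mhz]} fps)")
```

CO (copyback plus the imprecise model, cache off while rendering and on while copying) comes out on top at both clock speeds.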
||
16 December 2023, 15:39 | #128 |
Registered User
Join Date: Aug 2010
Location: Italy
Posts: 855
|
klx300r ran more tests. Overall, the results are mixed. The only thing in common is that the imprecise exception model for the low 16 MB is best.
RESULTS RECAP Code:
A4000 + Cyberstorm MK III
Build | BB (fps) | FB (fps)
CA    | 37.724   | 32.632
CO    | 32.296   | 28.998
CU    | 32.926   | 28.262
IA    | 38.461   | 33.156 (best)
IO    | 32.926   | 28.998
IU    | crash
PA    | 38.461   | 32.959
PO    | 32.926   | 28.841
PU    | crash
Best data cache settings:
* writethrough
* always on
A1200 + TerribleFire TF1260 (klx300r's)
Build | BB (fps) | FB (fps)
CA    | 25.921   | 23.255
CO    | 31.932   | 28.017
IA    | 26.514   | 23.708
IO    | 31.936   | 28.021 (best)
Best data cache settings:
* writethrough
* off during voxel rendering and on while rotating & copying the raster from FAST RAM to CHIP RAM
A1200 + TerribleFire TF1260 (Aardvark's)
Build | BB (fps) | FB (fps)
CO    | -        | 28.134 (best)
CU    | 32.040   | 27.254
IO    | -        | 28.014
PA    | 26.479   | 22.855
PO    | -        | 26.797
Best data cache settings:
* copyback
* off during voxel rendering and on while rotating & copying the raster from FAST RAM to CHIP RAM
While the exception model makes only a small difference (e.g. on the CSMKIII, IA FB 33.156 vs PA FB 32.959), the difference between keeping the cache always on or not is much bigger and "inverted": on the CSMKIII it's best to keep it always on, whereas on the TF1260 it's best to toggle it:
* CSMKIII: IA FB - IO FB = 33.156 - 28.998 = 4.158 fps
* TF1260 (klx300r's): IO FB - IA FB = 28.021 - 23.708 = 4.313 fps
* TF1260 (Aardvark's): PO FB - PA FB = 26.797 - 22.855 = 3.942 fps
I must say it bothers me that there isn't a single best setting. I'll add a command line switch to let the user choose whether to have the cache toggled or not.
On another note, a couple of days ago I measured that on an A1200 without accelerator / FAST RAM the copy & rotate operation takes about 273 rasterlines. Given that the screen is 200 lines tall and that the operation starts right after the 200th line, it ends when the beam is drawing line 273 - (312 - 200) = 161 of the screen, i.e. it manages to race the beam.
Therefore, given that even a stock A1200 can do it, double buffering for the CHIP RAM rasters is totally useless, so I have removed it - that not only simplifies the code and makes it lighter, but, above all, reduces the refresh lag by 1 frame. |
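The rasterline arithmetic above can be checked in Python. The end-of-copy line uses the post's own formula; the per-row check adds an assumption the post does not state: that the copy produces the screen rows top to bottom at a uniform rate, with row 0 drawn at the first line of the frame (the same simplification the formula makes). Function names are hypothetical.

```python
PAL_LINES = 312    # rasterlines per PAL frame (figure from the post)
SCREEN_ROWS = 200  # height of the screen in rows
COPY_LINES = 273   # measured duration of copy & rotate, in rasterlines


def end_line(copy_lines=COPY_LINES, rows=SCREEN_ROWS, frame_lines=PAL_LINES):
    """Screen row the beam is drawing when the copy finishes (the post's formula)."""
    return copy_lines - (frame_lines - rows)


def races_the_beam(copy_lines=COPY_LINES, rows=SCREEN_ROWS, frame_lines=PAL_LINES):
    """True if every row is copied before the beam starts drawing it.

    Assumes (hypothetically) a uniform copy rate, rows produced top to bottom.
    """
    lines_per_row = copy_lines / rows
    head_start = frame_lines - rows  # lines left in the frame after row 200
    for row in range(rows):
        copied_at = (row + 1) * lines_per_row  # lines after the copy starts
        beam_at = head_start + row             # when the beam reaches this row
        if copied_at > beam_at:
            return False
    return True


print(end_line(), races_the_beam())  # 161 True
```

Finishing at row 161 on its own only shows that the copy ends before the bottom of the screen; under the uniform-rate assumption every row is also ready before the beam reaches it (about 1.37 lines per row against the beam's 1 line per row, with a 112-line head start).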
16 December 2023, 16:03 | #129 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,499
|
Ah, to race the beam...
Kids today have no idea what that even means. |
16 December 2023, 17:38 | #130 |
Retro Freak
Join Date: Nov 2001
Location: Slovenia
Age: 51
Posts: 1,665
|
So next is a Comanche port to the Amiga?
|
16 December 2023, 19:27 | #131 |
Registered User
Join Date: Oct 2007
Location: ManCave, Canada
Posts: 1,646
|
@ saimo
awesome to see the improvements |
16 December 2023, 23:19 | #132 |
Registered User
Join Date: Aug 2010
Location: Italy
Posts: 855
|
@Karlos & @tomcat666
A port? Boooring... I like to make new stuff! Also, I don't even know if I'll ever do anything with this thing here.
@klx300r
Your support helped make them
@paraj @Aardvark @Lunda @modrobert
I'd be happy to put your real names in the credits section of the manual. If you like the idea, please PM me.
Last edited by saimo; 17 December 2023 at 14:34. |
17 December 2023, 23:23 | #133 |
Registered User
Join Date: Aug 2010
Location: Italy
Posts: 855
|
New version with a number of changes.
DATA CACHE HANDLING
By default, the data cache and the 68030 data cache burst are always on, and the data cache works in writethrough mode. The new command line switches below make it possible to change that.
CACHECOPYBACK=CC/S: make the 68040/68060 data cache work in copyback mode
CACHESWITCHING=CS/S: switch off the 68030 data cache burst or the 68040/68060 data cache while rendering the voxel
BUFFERING
The CHIP RAM raster is no longer double buffered.
BENCHMARK
There is only one benchmark option now (RUNBENCHMARK=RB/S). The benchmark works differently from before: it renders 256 frames and copies them to CHIP RAM without displaying anything and without synchronizing with the raster beam, so the figures represent exclusively and precisely the performance of the graphics generation. Therefore, on powerful machines the figures can be crazily high.
Any help with filling in this table is welcome.
Code:
                             |          DATA CACHE MODE          |
AMIGA  | ACCELERATOR BOARD   |   D    |   C    |   S    |  C+S   | NOTE
-------+---------------------+--------+--------+--------+--------+------
1200   | ?                   |        |   -    |   -    |   -    | 1
1200   | Blizzard 1230 IV    | 21.129 |   -    | 21.241 |   -    | 2
1200   | Blizzard 1260       |        |        |        |        | 3
1200   | PiStorm32           |        |        |        |        | 4
1200   | TerribleFire TF1260 |        |        |        |        | 3
1200   | TerribleFire TF1260 |        |        |        |        | 5
4000   | Cyberstorm MK III   |        |        |        |        | 3
CD³²   | The Beast 030       |        |        |        |        | 6
DATA CACHE MODE:
D = Default (always on + writethrough)
C = Copyback
S = Switching
NOTE
1. FAST RAM only
2. 68030 50 MHz, RAM 60 ns
3. 68060 50 MHz
4. Raspberry Pi 3 A+
5. 68060 100 MHz
6. 68030 70 MHz, SRAM
Commands for the D, C, S and C+S columns:
> PVE RB
> PVE RB CC
> PVE RB CS
> PVE RB CC CS
MMU HANDLING
I have moved the specific code out to some general-purpose functions that I added to the private library of functions I use for various projects. While at it, I improved some key aspects of the library. Unless I introduced new bugs, the program should be more stable and compatible than ever.
Last edited by saimo; 18 December 2023 at 18:10. Reason: Removed attachment as I provided a newer version later. |
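A rough Python sketch of how the two new cache switches combine with the CPU type. It only mimics the KEYWORD=ALIAS/S template style shown in this post (case-insensitive keywords, /S switches); it is not AmigaDOS ReadArgs(), and `parse_template`, `read_switches` and `cache_plan` are hypothetical names whose per-CPU rules simply restate the post.

```python
TEMPLATE = "RUNBENCHMARK=RB/S,CACHECOPYBACK=CC/S,CACHESWITCHING=CS/S"


def parse_template(template):
    """Map each keyword and alias (upper-cased) to the item's first name.

    Only /S (boolean switch) items are handled in this sketch.
    """
    keywords = {}
    for item in template.split(","):
        names, _, modifier = item.partition("/")
        if modifier.upper() != "S":
            raise ValueError(f"unsupported template item: {item}")
        aliases = [name.upper() for name in names.split("=")]
        for alias in aliases:
            keywords[alias] = aliases[0]
    return keywords


def read_switches(template, args):
    """Return {primary keyword: bool}; keywords are matched case-insensitively."""
    keywords = parse_template(template)
    switches = {primary: False for primary in keywords.values()}
    for arg in args:
        key = arg.upper()
        if key not in keywords:
            raise ValueError(f"unknown argument: {arg}")
        switches[keywords[key]] = True
    return switches


def cache_plan(switches, cpu):
    """Restate the post's rules for a given CPU ('68030', '68040' or '68060')."""
    if cpu == "68030":
        # CACHECOPYBACK applies to the 68040/68060 only.
        return {
            "write_mode": "writethrough",
            "while_rendering": "burst off" if switches["CACHESWITCHING"] else "burst on",
        }
    return {
        "write_mode": "copyback" if switches["CACHECOPYBACK"] else "writethrough",
        "while_rendering": "data cache off" if switches["CACHESWITCHING"] else "data cache on",
    }


print(cache_plan(read_switches(TEMPLATE, ["RB", "CC", "CS"]), "68060"))
```

Keeping the template string as the single source of truth means adding another switch only requires extending `TEMPLATE`.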
18 December 2023, 00:03 | #134 |
Registered User
Join Date: Jan 2019
Location: Finland
Posts: 654
|
TF1260 results (fps):
Benchmark  | 50 MHz | 100 MHz
RB         | 10.094 | 17.610
RB CC      |  9.612 | 16.835
RB CS      | 28.122 | 48.448
RB CC CS * | 16.950 | 22.737
* "PVE CC CS" doesn't work and only shows a black screen. The "RB CC CS" benchmark does finish, but displays some graphical glitches during the test. |
18 December 2023, 00:28 | #135 | |
Registered User
Join Date: Aug 2010
Location: Italy
Posts: 855
|
Thanks and sorry! Last edited by saimo; 18 December 2023 at 01:16. |
|
18 December 2023, 02:50 | #136 |
Registered User
Join Date: Jan 2019
Location: Finland
Posts: 654
|
|
18 December 2023, 07:43 | #137 |
Registered User
Join Date: Jul 2023
Location: Domsjö/Sweden
Posts: 57
|
The Beast 030
RB: 29.240
RB CS: 29.260 |
18 December 2023, 09:38 | #138 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,499
|
Instead of a Comanche-type clone, I wonder if this could be used for a voxel-based game more like Zeewolf?
|
18 December 2023, 11:28 | #139 |
Registered User
Join Date: Aug 2010
Location: Italy
Posts: 855
|
|
18 December 2023, 11:30 | #140 |
Registered User
Join Date: Aug 2010
Location: Italy
Posts: 855
|
|