English Amiga Board > Coders > Coders. Asm / Hardware

11 December 2023, 14:34   #121
saimo
Registered User
Join Date: Aug 2010
Location: Italy
Posts: 855
Quote:
Originally Posted by paraj
Seems good! 37.858 (bb) and 30.656 (fb) with my normal config and no glitches!
Quote:
Originally Posted by Aardvark
No glitches on TF1260 either.

BB 50mhz 26.479
FB 50mhz 22.855
BB 100mhz 47.814
FB 100mhz 38.163
Glad to see some sane figures and, finally, the code performing decently on the 68060. I hope that some more speed can be achieved with more sophisticated handling of the data cache.

11 December 2023, 15:37   #122
saimo
Registered User
Join Date: Aug 2010
Location: Italy
Posts: 855
And here are the builds to verify whether my guess is correct and, if so, what produces the best performance.
The last reports above refer to a version that simply keeps the data cache on all the time and uses the cache-inhibited, precise exception model for the low 16 MB. These builds adopt other solutions.

The files included in the archive are:
* PVE-IA
* PVE-IO
* PVE-IU
* PVE-PA (this is the one that gave the previous results)
* PVE-PO
* PVE-PU

The naming scheme goes like this:
* I = cache-inhibited, Imprecise exception model for the low 16 MB
* P = cache-inhibited, Precise exception model for the low 16 MB
* A = data cache Always on
* O = data cache off during voxel rendering and On while rotating&copying the raster from FAST RAM to CHIP RAM
* U = data cache off during voxel rendering and Unmanaged while rotating&copying the raster from FAST RAM to CHIP RAM
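As a quick reference, the naming scheme can be expressed as a small lookup. This is only an illustration in Python: `decode_build`, `EXCEPTION_MODEL` and `DATA_CACHE` are hypothetical names, and the sketch covers only the six builds listed above.

```python
from typing import Tuple

# First letter after "PVE-": how the low 16 MB is mapped.
EXCEPTION_MODEL = {
    "I": "cache-inhibited, imprecise exception model",
    "P": "cache-inhibited, precise exception model",
}

# Second letter: what happens to the data cache in each phase of a frame.
DATA_CACHE = {
    "A": "always on",
    "O": "off during voxel rendering, on while rotating & copying to CHIP RAM",
    "U": "off during voxel rendering, unmanaged while rotating & copying to CHIP RAM",
}

def decode_build(name: str) -> Tuple[str, str]:
    """Decode a test build name such as 'PVE-IO' into its two settings."""
    prefix, sep, code = name.partition("-")
    if (prefix != "PVE" or not sep or len(code) != 2
            or code[0] not in EXCEPTION_MODEL or code[1] not in DATA_CACHE):
        raise ValueError(f"unexpected build name: {name!r}")
    return EXCEPTION_MODEL[code[0]], DATA_CACHE[code[1]]
```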

If these tests run without glitches, the next step will be trying the copyback mode for FAST RAM, although I don't expect it to make any difference given that, at least for now, there are basically no repeated reads and writes on the same variables.

Last edited by saimo; 11 December 2023 at 20:59. Reason: Removed attachment as I provided a newer version later.

11 December 2023, 16:48   #123
Aardvark
Registered User
Join Date: Jan 2019
Location: Finland
Posts: 654
IO and PO glitch out the same way as in #102; the rest of the tests are glitchless.

IA-FB 50mhz 23.705
IA-FB 100mhz 40.519
IA-BB 50mhz 26.454
IA-BB 100mhz 47.636

IU-FB 50mhz 27.138
IU-FB 100mhz 46.826
IU-BB 50mhz 31.984
IU-BB 100mhz 50.068

PU-FB 50mhz 26.000
PU-FB 100mhz 43.163
PU-BB 50mhz 31.880
PU-BB 100mhz 50.068

11 December 2023, 18:03   #124
saimo
Registered User
Join Date: Aug 2010
Location: Italy
Posts: 855
Quote:
Originally Posted by Aardvark
IO and PO glitch out the same way as in #102; the rest of the tests are glitchless.

IA-FB 50mhz 23.705
IA-FB 100mhz 40.519
IA-BB 50mhz 26.454
IA-BB 100mhz 47.636

IU-FB 50mhz 27.138
IU-FB 100mhz 46.826
IU-BB 50mhz 31.984
IU-BB 100mhz 50.068

PU-FB 50mhz 26.000
PU-FB 100mhz 43.163
PU-BB 50mhz 31.880
PU-BB 100mhz 50.068
Your speedy (and much appreciated) report tells us that:
* the problem has been found and fixed;
* the imprecise model gives a slightly better performance;
* turning the cache off during the voxel rendering is better (unlike on the 68030);
* I managed to screw something up yet again - but my head is spinning too much at the moment due to lack of sleep, so I'll get back to fiddling with the cache later...

P.S. I know that 50.068 looks wrong. It kind of is. Here is why: after a frame has been rendered, the code waits for the buffering to make a raster available for rendering (if none is available yet). When the machine can render frames faster than they can be displayed, no raster is available, because the previously rendered one is waiting to be copied to CHIP RAM and the other one has just been rendered, so the code halts for a while. At the 256th frame the benchmark stops without waiting for the buffering anymore, so it finishes early and the figure goes beyond 50 fps. On the other hand, if it were artificially made to wait, doing nothing, for the buffering to complete (there is a double double buffering: one for the CHIP RAM buffers and one for the FAST RAM buffers), the figure would be lower than 50 fps.
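The effect described in the P.S. can be reproduced with a toy model. This is a Python sketch under simplifying assumptions that are not taken from the program: a single pool of `buffers` rasters (the real code has two pools), a constant render time per frame, each frame shown for exactly one display period with no vertical-sync alignment, and a timer that stops as soon as the last frame has been rendered. `estimated_benchmark_fps` is a hypothetical name, and the model is not expected to reproduce the exact figures above.

```python
def estimated_benchmark_fps(frames: int, render_time: float,
                            frame_period: float = 1 / 50,
                            buffers: int = 2) -> float:
    """Toy model: frames / time at which the last frame finished rendering."""
    rendered = []   # time at which each frame finished rendering
    released = []   # time at which each frame's raster was freed after display
    finish = 0.0
    for i in range(frames):
        start = finish
        if i >= buffers:
            # Reuse the raster of frame i - buffers: wait until it was displayed.
            start = max(start, released[i - buffers])
        finish = start + render_time
        rendered.append(finish)
        # A frame is displayed for one period, once it is ready and once the
        # previous frame has been displayed.
        show_start = max(finish, released[-1] if released else 0.0)
        released.append(show_start + frame_period)
    # The timer stops as soon as the last frame is rendered, without waiting
    # for the frames still queued for display.
    return frames / rendered[-1]
```

With a render time shorter than the 20 ms display period, the model reports slightly more than 50 fps, because the last frames never wait for the display; with a render time longer than the display period, it simply reports the render rate.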

11 December 2023, 20:58   #125
saimo
Registered User
Join Date: Aug 2010
Location: Italy
Posts: 855
Fixed the bug that made the IO and PO versions act up - it wasn't related to caches, but to this sequence of instructions...

move.l (sp)+,d0
movem.l (sp)+,d3-d7/a1-a5

... which should have been instead...

movem.l (sp)+,d3-d7/a1-a5
move.l (sp)+,d0

This stupid mistake was caused by an aborted idea that I forgot to undo completely.
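The order matters because the stack is last-in, first-out: the pops must mirror the pushes in reverse. The Python sketch below simulates both epilogues. It assumes, as the fixed order implies, that the prologue executed move.l d0,-(sp) before movem.l d3-d7/a1-a5,-(sp); the function names and register values are made up for the demonstration.

```python
# Registers saved with movem, in movem order (lowest register first).
SAVED = ["d3", "d4", "d5", "d6", "d7", "a1", "a2", "a3", "a4", "a5"]

def move_push(stack, regs, reg):
    """move.l reg,-(sp): push one longword."""
    stack.append(regs[reg])

def move_pop(stack, regs, reg):
    """move.l (sp)+,reg: pop one longword."""
    regs[reg] = stack.pop()

def movem_push(stack, regs, reg_list):
    """movem.l list,-(sp): the lowest register ends up at the lowest address,
    i.e. on top of the stack, so the list is pushed in reverse order."""
    for reg in reversed(reg_list):
        stack.append(regs[reg])

def movem_pop(stack, regs, reg_list):
    """movem.l (sp)+,list: the lowest register is loaded first, from the top."""
    for reg in reg_list:
        regs[reg] = stack.pop()

def run(fixed_epilogue: bool):
    """Save d0 and SAVED, trash them, restore them; return (final, initial)."""
    regs = {"d0": 0xD0}
    regs.update({reg: 0x100 + i for i, reg in enumerate(SAVED)})
    initial = dict(regs)
    stack = []
    move_push(stack, regs, "d0")        # move.l d0,-(sp)
    movem_push(stack, regs, SAVED)      # movem.l d3-d7/a1-a5,-(sp)
    for reg in regs:                    # the routine body trashes the registers
        regs[reg] = 0
    if fixed_epilogue:
        movem_pop(stack, regs, SAVED)   # movem.l (sp)+,d3-d7/a1-a5
        move_pop(stack, regs, "d0")     # move.l (sp)+,d0
    else:
        move_pop(stack, regs, "d0")     # move.l (sp)+,d0: gets d3's slot
        movem_pop(stack, regs, SAVED)   # every register shifted by one slot
    assert not stack                    # sp ends up balanced either way
    return regs, initial
```

Note that the buggy order leaves the stack pointer balanced, so nothing crashes immediately: the registers are just silently shuffled, which fits the glitches seen in the IO and PO builds.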

New archive attached.

EDIT: I have also added to the archive 3 builds that use copyback for FAST RAM ('C' in the name); given the previous results, they use only the imprecise exception model for the low 16 MB.

Last edited by saimo; 11 December 2023 at 22:21. Reason: Removed archive as I provided a newer version later.

11 December 2023, 21:56   #126
Aardvark
Registered User
Join Date: Jan 2019
Location: Finland
Posts: 654
IO-FB 50mhz 28.014
IO-FB 100mhz 48.012

PO-FB 50mhz 26.797
PO-FB 100mhz 44.214

CU-FB 50mhz 27.254
CU-FB 100mhz 46.998
CU-BB 50mhz 32.040
CU-BB 100mhz 50.068

CO-FB 50mhz 28.134
CO-FB 100mhz 48.192

BlindBenchmark crash-reboots on IO, PO, and CO tests.

11 December 2023, 22:21   #127
saimo
Registered User
Join Date: Aug 2010
Location: Italy
Posts: 855
Quote:
Originally Posted by Aardvark
IO-FB 50mhz 28.014
IO-FB 100mhz 48.012

PO-FB 50mhz 26.797
PO-FB 100mhz 44.214

CU-FB 50mhz 27.254
CU-FB 100mhz 46.998
CU-BB 50mhz 32.040
CU-BB 100mhz 50.068

CO-FB 50mhz 28.134
CO-FB 100mhz 48.192
Wonderful!
Winner combination:
* imprecise exception model for the low 16 MB
* copyback for the rest of the address space
* cache off during voxel rendering
* cache on during the FAST RAM -> CHIP RAM transfer

I'd be surprised if it weren't the same also on other 68060 machines.

Quote:
BlindBenchmark crash-reboots on IO, PO, and CO tests.
Whooops, I forgot to move a label needed by that test so that it comes before the move.l d0,-(sp) that matches the move.l (sp)+,d0 mentioned above.

Updated archive attached.

Last edited by saimo; 17 December 2023 at 23:25. Reason: Removed attachment as I provided a newer version later.

16 December 2023, 15:39   #128
saimo
Registered User
Join Date: Aug 2010
Location: Italy
Posts: 855
klx300r ran some more tests. Overall, the results are mixed. The only thing they have in common is that the imprecise exception model for the low 16 MB is best.

RESULTS RECAP

Code:
A4000 + Cyberstorm MK III

CA BB: 37.724
CA FB: 32.632
CO BB: 32.296
CO FB: 28.998
CU BB: 32.926
CU FB: 28.262
IA BB: 38.461
IA FB: 33.156 best
IO BB: 32.926
IO FB: 28.998
IU crash
PA BB: 38.461
PA FB: 32.959
PO BB: 32.926
PO FB: 28.841
PU crash

Best data cache settings:
 * writethrough
 * always on


A1200 + TerribleFire TF1260 (klx300r's)

CA BB: 25.921
CA FB: 23.255
CO BB: 31.932
CO FB: 28.017
IA BB: 26.514
IA FB: 23.708
IO BB: 31.936
IO FB: 28.021 best

Best data cache settings:
 * writethrough
 * off during voxel rendering and On while rotating&copying the raster from FAST RAM to CHIP RAM


A1200 + TerribleFire TF1260 (Aardvark's)

CO FB: 28.134 best
CU FB: 27.254
CU BB: 32.040
IO FB: 28.014
PA BB: 26.479
PA FB: 22.855
PO FB: 26.797

Best data cache settings:
 * copyback
 * off during voxel rendering and On while rotating&copying the raster from FAST RAM to CHIP RAM
Copyback works better only in a single case, and only by a slight margin (CO FB - IO FB = 28.134-28.014 = 0.120 fps).
Instead, the difference between keeping the cache always on or not is much bigger and "inverted", as on the CSMKIII it's best to keep it always on, whereas on the TF1260 it's best to toggle it:
* CSIII: IA FB - IO FB = 33.156-28.998 = 4.158 fps
* TF1260 (klx300r): IO FB - IA FB = 28.021-23.708 = 4.313 fps
* TF1260 (Aardvark's): PO FB - PA FB = 26.797-22.855 = 3.942 fps
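The differences above can be rechecked from the recap figures with a few lines of Python. The dictionary only copies numbers already posted; `RECAP` and `gap` are hypothetical names.

```python
# FB (framebuffer) figures copied from the recap above, in fps.
RECAP = {
    "CSMKIII": {"IA FB": 33.156, "IO FB": 28.998},
    "TF1260 (klx300r)": {"IA FB": 23.708, "IO FB": 28.021},
    "TF1260 (Aardvark)": {"PA FB": 22.855, "PO FB": 26.797,
                          "IO FB": 28.014, "CO FB": 28.134},
}

def gap(board: str, a: str, b: str) -> float:
    """fps difference a - b on the given board, rounded to 3 decimals."""
    return round(RECAP[board][a] - RECAP[board][b], 3)
```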

I must say it bothers me that there isn't a single best setting.
I'll add a command line switch to let the user choose whether to have the cache toggled or not.

On another note, a couple of days ago I measured that on an A1200 without accelerator / FAST RAM the copy&rotate operation takes about 273 rasterlines. Given that a PAL frame is 312 rasterlines, that the screen is 200 lines tall and that the operation starts right after the 200th line, the operation ends when the beam is drawing line 273-(312-200) = 161 of the next frame, i.e. it manages to race the beam. Therefore, given that even a stock A1200 can do it, double buffering for the CHIP RAM rasters is totally useless, so I have removed it: that not only simplifies the code and makes it lighter, but, above all, reduces the refresh lag by 1 frame.
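The race-the-beam argument can be checked row by row. The Python sketch below follows the simplification used in the arithmetic above (screen row 0 treated as the first line of the frame) and additionally assumes, hypothetically, that the copy fills rows top to bottom at a constant rate; the real rotate&copy may write in a different order. All names are illustrative.

```python
PAL_FRAME_LINES = 312   # rasterlines per PAL frame, as used in the post
SCREEN_LINES = 200      # height of the display
COPY_LINES = 273        # measured copy&rotate time on a stock A1200

def copy_end_line(copy_lines: int = COPY_LINES,
                  screen_lines: int = SCREEN_LINES,
                  frame_lines: int = PAL_FRAME_LINES) -> int:
    """Line the beam is drawing (in the next frame) when a copy that starts
    right after the last screen line ends."""
    return copy_lines - (frame_lines - screen_lines)

def races_the_beam(copy_lines: int = COPY_LINES,
                   screen_lines: int = SCREEN_LINES,
                   frame_lines: int = PAL_FRAME_LINES) -> bool:
    """True if every row is written before the beam starts displaying it.

    Time is measured in rasterlines from the start of the copy.
    """
    lead = frame_lines - screen_lines   # lines until the beam is back at row 0
    for row in range(screen_lines):
        written_at = copy_lines * (row + 1) / screen_lines  # constant-rate assumption
        shown_at = lead + row
        if written_at > shown_at:
            return False
    return True
```

Under these assumptions the last row is complete 273 lines after the start, while the beam only reaches it at 112 + 199 = 311, so the copy stays ahead with some margin.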

16 December 2023, 16:03   #129
Karlos
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,499
Ah, to race the beam...

Kids today have no idea what that even means.

16 December 2023, 17:38   #130
tomcat666
Retro Freak
Join Date: Nov 2001
Location: Slovenia
Age: 51
Posts: 1,665
So is a Comanche port to the Amiga next?

16 December 2023, 19:27   #131
klx300r
Registered User
Join Date: Oct 2007
Location: ManCave, Canada
Posts: 1,646

@ saimo


awesome to see the improvements

16 December 2023, 23:19   #132
saimo
Registered User
Join Date: Aug 2010
Location: Italy
Posts: 855
@Karlos

&


@tomcat666

A port? Boooring... I like to make new stuff!
Also, I don't even know if I'll ever do anything with this thing here.


@klx300r

Your support helped make them.


@paraj @Aardvark @Lunda @modrobert

I'd be happy to put your real names in the credits section of the manual. If you like the idea, please PM me.

Last edited by saimo; 17 December 2023 at 14:34.

17 December 2023, 23:23   #133
saimo
Registered User
Join Date: Aug 2010
Location: Italy
Posts: 855
New version with a number of changes.


DATA CACHE HANDLING

By default, the data cache and the 68030 data cache burst are always on and in writethrough mode. By means of the new command line switches below it is possible to make the data cache work differently.

CACHECOPYBACK=CC/S: make the 68040/68060 data cache work in copyback mode
CACHESWITCHING=CS/S: switch off the 68030 data cache burst or the 68040/68060 data cache while rendering the voxel
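The effect of the two switches can be summarized per phase of a frame. The Python sketch below only restates the rules given above; the phase names and CPU strings are hypothetical, and treating CC as having no effect on a 68030 is an assumption (it matches the '-' entries in the table below), not the program's code.

```python
from typing import Dict

def cache_plan(cpu: str, copyback: bool = False,
               switching: bool = False) -> Dict[str, str]:
    """Data cache state during voxel rendering ("render") and during the
    rotate & copy of the raster to CHIP RAM ("copy")."""
    if cpu == "68030":
        # CC is not applicable here: assumed to be ignored on the 68030.
        on = "on, writethrough, burst on"
        # CS only switches off the data cache burst while rendering.
        render = "on, writethrough, burst off" if switching else on
        return {"render": render, "copy": on}
    if cpu in ("68040", "68060"):
        on = "on, " + ("copyback" if copyback else "writethrough")
        # CS turns the whole data cache off while rendering the voxel.
        return {"render": "off" if switching else on, "copy": on}
    raise ValueError(f"unsupported CPU: {cpu}")
```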


BUFFERING

The CHIP RAM raster is no longer double buffered.


BENCHMARK

There is only one benchmark option now (RUNBENCHMARK=RB/S). The benchmark works differently from before: it renders and copies to CHIP RAM 256 frames without displaying anything and without synchronizing with the raster beam, so that the figures represent exclusively and precisely the performance related to the generation of graphics. Therefore, on powerful machines the figures can be crazily high.
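The measurement principle (render and copy 256 frames back to back, with no display and no beam synchronization, then divide by the elapsed time) can be sketched as follows. This is Python for illustration only: `render_frame` and `copy_to_chip` are hypothetical stand-ins, and `time.perf_counter` replaces whatever timer the Amiga program actually uses.

```python
import time
from typing import Callable

def run_benchmark(render_frame: Callable[[], None],
                  copy_to_chip: Callable[[], None],
                  frames: int = 256) -> float:
    """Render and copy `frames` frames back to back; return frames per second."""
    start = time.perf_counter()
    for _ in range(frames):
        render_frame()   # hypothetical: draw the voxel landscape into FAST RAM
        copy_to_chip()   # hypothetical: rotate & copy the raster to CHIP RAM
    elapsed = time.perf_counter() - start
    # Guard against a zero interval on very coarse clocks.
    return frames / max(elapsed, 1e-9)
```

Since nothing waits for the display, the result is bounded only by rendering and copying speed, which is why fast machines can report very high figures.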

Any help with filling this table is welcome

Code:
                             |          DATA CACHE MODE          |
-----------------------------+--------+--------+--------+--------+------
 AMIGA | ACCELERATOR BOARD   |      D |      C |      S |    C+S | NOTE
-------+---------------------+--------+--------+--------+--------+------
  1200 | ?                   |        |      - |      - |      - | 1
  1200 | Blizzard 1230 IV    | 21.129 |      - | 21.241 |      - | 2
  1200 | Blizzard 1260       |        |        |        |        | 3
  1200 | PiStorm32           |        |        |        |        | 4
  1200 | TerribleFire TF1260 |        |        |        |        | 3
  1200 | TerribleFire TF1260 |        |        |        |        | 5
  4000 | Cyberstorm MK III   |        |        |        |        | 3
  CD³² | The Beast 030       |        |        |        |        | 6

DATA CACHE MODE:
 D = Default (always on + writethrough)
 C = Copyback
 S = Switching

NOTE
 1. FAST RAM only
 2. 68030 50 MHz, RAM 60 ns
 3. 68060 50 MHz
 4. Raspberry Pi 3 A+
 5. 68060 100 MHz
 6. 68030 70 MHz, SRAM
If you feel like giving it a go, these are all the possible combinations:
> PVE RB
> PVE RB CC
> PVE RB CS
> PVE RB CC CS
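The four runs map one-to-one onto the D, C, S and C+S columns of the table. A small hypothetical Python helper that generates them in the same order as listed above:

```python
from itertools import product
from typing import List, Tuple

def benchmark_runs() -> List[Tuple[str, str]]:
    """Return (command line, table column) for every CC/CS combination."""
    runs = []
    for cs, cc in product((False, True), repeat=2):
        args = ["PVE", "RB"]
        if cc:
            args.append("CC")   # CACHECOPYBACK
        if cs:
            args.append("CS")   # CACHESWITCHING
        column = "+".join(c for c, on in (("C", cc), ("S", cs)) if on) or "D"
        runs.append((" ".join(args), column))
    return runs
```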


MMU HANDLING

I have moved the specific code out to some general-purpose functions I added to the private library of functions I use for various projects. While at it, I improved some key aspects of the library. Unless I introduced new bugs, the program should be more stable and compatible than ever.

Last edited by saimo; 18 December 2023 at 18:10. Reason: Removed attachment as I provided a newer version later.

18 December 2023, 00:03   #134
Aardvark
Registered User
Join Date: Jan 2019
Location: Finland
Posts: 654
TF1260 50mhz
RB 10.094
RB CC 9.612
RB CS 28.122
*RB CC CS 16.950

TF1260 100mhz
RB 17.610
RB CC 16.835
RB CS 48.448
*RB CC CS 22.737

* "PVE CC CS" doesn't work and only shows black screen. The "RB CC CS" benchmark does finish, but displays some graphical glitches during the test.

18 December 2023, 00:28   #135
saimo
Registered User
Join Date: Aug 2010
Location: Italy
Posts: 855
Quote:
Originally Posted by Aardvark
TF1260 50mhz
RB 10.094
RB CC 9.612
RB CS 28.122
*RB CC CS 16.950

TF1260 100mhz
RB 17.610
RB CC 16.835
RB CS 48.448
*RB CC CS 22.737

* "PVE CC CS" doesn't work and only shows black screen. The "RB CC CS" benchmark does finish, but displays some graphical glitches during the test.
Whooops, function pointer initialization bug. Bugfix coming soon. EDIT: fixed version attached to my previous post.
Thanks and sorry!

Last edited by saimo; 18 December 2023 at 01:16.

18 December 2023, 02:50   #136
Aardvark
Registered User
Join Date: Jan 2019
Location: Finland
Posts: 654
Quote:
Originally Posted by saimo
Whooops, function pointer initialization bug. Bugfix coming soon. EDIT: fixed version attached to my previous post.
Thanks and sorry!
CC CS is still broken.

18 December 2023, 07:43   #137
Lunda
Registered User
Join Date: Jul 2023
Location: Domsjö/Sweden
Posts: 57
The Beast 030
RB: 29.240
RB CS: 29.260

18 December 2023, 09:38   #138
Karlos
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,499
Instead of a Comanche-type clone, I wonder if this could be used for a voxel-based game more like Zeewolf?

18 December 2023, 11:28   #139
saimo
Registered User
Join Date: Aug 2010
Location: Italy
Posts: 855
Quote:
Originally Posted by Aardvark
CC CS is still broken.
My hope is that the late-night, zombie-state bugfix wasn't 100% correct. I'll check later.

18 December 2023, 11:30   #140
saimo
Registered User
Join Date: Aug 2010
Location: Italy
Posts: 855
Quote:
Originally Posted by Lunda
The Beast 030
RB: 29.240
RB CS: 29.260
Many thanks! The figures are now in the manual.