21 June 2023, 16:42 | #21 |
Registered User
Join Date: Aug 2010
Location: Italy
Posts: 854
|
I just noticed that 4 values for the CMYW model in the colors table were wrong: I fixed them and uploaded a new archive.
While at it, given the request above, I whipped up an AMOS Professional program that shows how to set up a PED81C screen and to perform some basic operations on it - hopefully, this will be easy to understand and also open the door to AMOS programmers. The program source is included in the archive. Code:
'-----------------------------------------------------------------------------
'$VER: PED81C example 1.3 (28.11.2023) (c) 2023 RETREAM
'Legal terms: please refer to the accompanying documentation.
'www.retream.com/PED81C
'contact@retream.com
'-----------------------------------------------------------------------------

'-----------------------------------------------------------------------------
'DESCRIPTION
'This shows how to set up a PED81C screen and to perform some basic operations
'on it.
'Screen features:
' * equivalent to a 319x256 LORES screen
' * 160 dots wide raster
' * single buffer
' * blanked border
' * 64-bit bitplanes fetch mode
' * CMYW color model
'
'NOTES
'The code is written to be readable, not to be general-purpose/optimal.
'-----------------------------------------------------------------------------

'-----------------------------------------------------------------------------
'GLOBAL VARIABLES

Global RASTERADDRESS,RASTERWIDTH,RASTERHEIGHT,RASTERSIZE
RASTERWIDTH=160
RASTERHEIGHT=256
RASTERSIZE=RASTERWIDTH*RASTERHEIGHT

'-----------------------------------------------------------------------------
'MAIN

'Initialize everything.

_INITIALIZE_AMOS_ENVIRONMENT
_INITIALIZE_SCREEN

'If the initialization succeeded, load a picture into the raster and, in case
'of success, execute a simple effect on it.

If Param
   _LOAD_PICTURE_INTO_RASTER["picture-160x256.raw"]
   If Param
      _TURN_DISPLAY_DMA_ON[0]
      _RANDOMIZE_RASTER
      _TURN_DISPLAY_DMA_OFF
   End If
End If

'Deinitialize everything.

_DEINITIALIZE_SCREEN
_RESTORE_AMOS_ENVIRONMENT

'-----------------------------------------------------------------------------
'ROUTINES

Procedure _ALLOCATE_BITPLANE[BANKINDEX,SIZE]
   '--------------------------------------------------------------------------
   'DESCRIPTION
   'Allocates a CHIP RAM buffer to be used as a bitplane.
   '
   'INPUT
   'BANKINDEX = index of bank to use
   'SIZE      = size [bytes] of bitplane
   '
   'OUTPUT
   '64-bit-aligned bitplane address (0 = error)
   '
   'WARNINGS
   'The buffer must be freed with Erase BANKINDEX or Erase All.
   '--------------------------------------------------------------------------

   Trap Reserve As Chip Data BANKINDEX,SIZE+8
   If Errtrap=0 Then A=(Start(BANKINDEX)+7) and $FFFFFFF8
End Proc[A]

Procedure _DEINITIALIZE_SCREEN
   '--------------------------------------------------------------------------
   'DESCRIPTION
   'Deinitializes the screen.
   '
   'WARNINGS
   'Can be called only if the display is off.
   '--------------------------------------------------------------------------

   Erase All
   Doke $DFF1FC,0 : Rem FMODE
End Proc

Procedure _INITIALIZE_AMOS_ENVIRONMENT
   '--------------------------------------------------------------------------
   'DESCRIPTION
   'Ensures the program cannot be interrupted or brought to back, and turns
   'off the AMOS video system.
   '--------------------------------------------------------------------------

   Break Off
   Amos Lock
   Comp Test Off
   Auto View Off
   Update Off
   Copper Off
   _TURN_DISPLAY_DMA_OFF
End Proc

Procedure _INITIALIZE_SCREEN
   '--------------------------------------------------------------------------
   'DESCRIPTION
   'Initializes the screen.
   '
   'OUTPUT
   '-1/0 = OK/error
   '
   'WARNINGS
   '_DEINITIALIZE_SCREEN[] must be called also in case of failure.
   '
   'NOTES
   'Sets RASTERADDRESS.
   '--------------------------------------------------------------------------

   'Allocate the raster.

   _ALLOCATE_BITPLANE[10,RASTERSIZE] : If Param=0 Then Pop Proc[0]
   RASTERADDRESS=Param

   'Allocate and fill the selector bitplanes.

   _ALLOCATE_BITPLANE[11,RASTERSIZE] : If Param=0 Then Pop Proc[0]
   B3A=Param
   Fill B3A To B3A+RASTERSIZE,$55555555
   _ALLOCATE_BITPLANE[12,RASTERSIZE] : If Param=0 Then Pop Proc[0]
   B4A=Param
   Fill B4A To B4A+RASTERSIZE,$33333333

   'Set the chipset.

   DIWSTRTX=$81+(160-RASTERWIDTH)
   DIWSTRTY=$2C+(128-RASTERHEIGHT/2)
   DIWSTRT=((DIWSTRTY and $FF)*256) or((DIWSTRTX+1) and $FF)
   DIWSTOPX=DIWSTRTX+RASTERWIDTH*2
   DIWSTOPY=DIWSTRTY+RASTERHEIGHT
   DIWSTOP=((DIWSTOPY and $FF)*256) or(DIWSTOPX and $FF)
   DIWHIGH=((DIWSTOPX and $100)*32) or(DIWSTOPY and $700) or((DIWSTRTX and $100)/8) or(DIWSTRTY/256)
   DDFSTRT=(DIWSTRTX-17)/2
   DDFSTOP=DDFSTRT+RASTERWIDTH-8
   Doke $DFF092,DDFSTRT
   Doke $DFF094,DDFSTOP
   Doke $DFF08E,DIWSTRT
   Doke $DFF090,DIWSTOP
   Doke $DFF1E4,DIWHIGH
   Doke $DFF100,$4241 : Rem BPLCON0
   Doke $DFF102,$10 : Rem BPLCON1
   Doke $DFF104,$224 : Rem BPLCON2
   Doke $DFF108,0 : Rem BPLMOD1
   Doke $DFF10A,0 : Rem BPLMOD2
   Doke $DFF1FC,$3 : Rem FMODE

   'Set COLORxx.

   Doke $DFF106,$20 : Rem BPLCON3
   Doke $DFF180,0
   Doke $DFF182,$88
   Doke $DFF184,$88
   Doke $DFF186,$FF
   Doke $DFF188,0
   Doke $DFF18A,$808
   Doke $DFF18C,$808
   Doke $DFF18E,$F0F
   Doke $DFF190,0
   Doke $DFF192,$880
   Doke $DFF194,$880
   Doke $DFF196,$FF0
   Doke $DFF198,0
   Doke $DFF19A,$888
   Doke $DFF19C,$888
   Doke $DFF19E,$FFF
   Doke $DFF106,$220 : Rem BPLCON3
   Doke $DFF180,0
   Doke $DFF182,0
   Doke $DFF184,0
   Doke $DFF188,0
   Doke $DFF18A,0
   Doke $DFF18C,0
   Doke $DFF190,0
   Doke $DFF192,0
   Doke $DFF194,0
   Doke $DFF198,0
   Doke $DFF19A,0
   Doke $DFF19C,0
   Doke $DFF106,$20 : Rem BPLCON3

   'Build a Copperlist that sets the bitplanes pointers.

   Cop Movel $E0,RASTERADDRESS
   Cop Movel $E4,RASTERADDRESS
   Cop Movel $E8,B3A
   Cop Movel $EC,B4A
   Cop Swap
End Proc[-1]

Procedure _LOAD_PICTURE_INTO_RASTER[FILEPATH$]
   '--------------------------------------------------------------------------
   'DESCRIPTION
   'Loads a raw 8-bit chunky picture into the raster, ensuring that its size
   'is correct.
   '
   'INPUT
   'FILEPATH$ = path of picture file
   '
   'OUTPUT
   '-1/0 = OK/error
   '--------------------------------------------------------------------------

   Trap Open In 1,FILEPATH$ : If Errtrap Then Pop Proc[0]
   L=Lof(1)
   Close(1)
   If L<>RASTERSIZE Then Pop Proc[0]
   Trap Bload FILEPATH$,RASTERADDRESS
End Proc[Errtrap=0]

Procedure _RANDOMIZE_RASTER
   '--------------------------------------------------------------------------
   'DESCRIPTION
   'Randomizes the raster by swapping 16 dots per frame, until a mouse button
   'is pressed.
   '--------------------------------------------------------------------------

   XM=RASTERWIDTH-1
   YM=RASTERHEIGHT-1
   Repeat
      C=16
      While C
         X0=Rnd(XM)
         Y0=Rnd(YM)
         X1=Rnd(XM)
         Y1=Rnd(YM)
         A0=Y0*RASTERWIDTH+X0+RASTERADDRESS
         A1=Y1*RASTERWIDTH+X1+RASTERADDRESS
         'Swap the two dots: C0 holds the value originally stored at A0.
         C0=Peek(A0)
         Poke A0,Peek(A1)
         Poke A1,C0
         Dec C
      Wend
      _WAIT_SCREEN_BOTTOM
   Until Mouse Click
End Proc

Procedure _RESTORE_AMOS_ENVIRONMENT
   '--------------------------------------------------------------------------
   'DESCRIPTION
   'Restores the AMOS environment.
   '--------------------------------------------------------------------------

   Copper On
   Update On
   Auto View On
   Amos Unlock
   Break On
   _TURN_DISPLAY_DMA_ON[$20]
End Proc

Procedure _TURN_DISPLAY_DMA_OFF
   '--------------------------------------------------------------------------
   'DESCRIPTION
   'Disables the bitplanes, Copper and sprites DMA.
   '--------------------------------------------------------------------------

   _WAIT_SCREEN_BOTTOM
   Doke $DFF096,$3A0 : Rem DMACON
End Proc

Procedure _TURN_DISPLAY_DMA_ON[SSPRITESFLAG]
   '--------------------------------------------------------------------------
   'DESCRIPTION
   'Enables the bitplanes and Copper DMA.
   '
   'INPUT
   'SSPRITESFLAG = $20/0 = turn / do not turn sprites on
   '
   'WARNINGS
   'The chipset must have been set up properly.
   '--------------------------------------------------------------------------

   _WAIT_SCREEN_BOTTOM
   Doke $DFF096,$8380 or SSPRITESFLAG : Rem DMACON
End Proc

Procedure _WAIT_SCREEN_BOTTOM
   '--------------------------------------------------------------------------
   'DESCRIPTION
   'Waits for the bottom of the screen.
   '--------------------------------------------------------------------------

   While Deek($DFF004) and $3 : Wend
   Repeat : Until(Leek($DFF004) and $3FF00)>$12C00
End Proc
Last edited by saimo; 29 November 2023 at 13:07. Reason: Updated source code. |
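The alignment step in _ALLOCATE_BITPLANE can be checked in isolation. The Python sketch below is not part of the archive and its function names are made up for illustration; it mirrors the AMOS expression (Start(BANKINDEX)+7) and $FFFFFFF8 and checks that the 8 spare bytes reserved are enough for the aligned buffer to fit in the bank, whatever address the bank starts at.

```python
def align_up_64bit(address: int) -> int:
    """Round a 32-bit address up to the next multiple of 8 bytes."""
    return (address + 7) & 0xFFFFFFF8


def aligned_bitplane(bank_start: int, size: int) -> int:
    """Return the aligned address inside a bank reserved with size + 8 bytes."""
    reserved = size + 8
    aligned = align_up_64bit(bank_start)
    # The aligned buffer must end inside the reserved block.
    assert aligned + size <= bank_start + reserved
    return aligned


if __name__ == "__main__":
    raster_size = 160 * 256
    # Hypothetical bank start addresses, chosen only to exercise the rounding.
    for start in (0x20000, 0x20001, 0x20007):
        print(hex(start), "->", hex(aligned_bitplane(start, raster_size)))
```

At most 7 bytes are skipped by the rounding, so reserving SIZE+8 bytes always leaves room for the whole bitplane.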
02 July 2023, 22:37 | #22 |
Registered User
Join Date: Jun 2017
Location: Finland
Posts: 367
|
This is cool!
|
28 November 2023, 23:42 | #23 |
Registered User
Join Date: Aug 2010
Location: Italy
Posts: 854
|
I have just released a little update, accompanied by the PED81C Voxel Engine (PVE), i.e. a new demo. If you can't be bothered trying it yourself, you can see it in this video - but beware: YouTube's video compression degraded the visual quality (especially the color saturation and brightness).
Details about PVE straight from the manual: Code:
--------------------------------------------------------------------------------
OVERVIEW

PVE is an experiment to test the graphical quality and computational performance
of the PED81C system. It lets the user move freely around a typical voxel
landscape.

--------------------------------------------------------------------------------
GETTING STARTED

PVE requires:
 * Amiga computer
 * AGA chipset
 * 80 kB of CHIP RAM
 * 4 MB of FAST RAM
 * PAL SHRES support
 * digital joystick and keyboard
 * 2.1 MB of storage space

To install PVE, unpack the LhA archive to any directory of your choice.

To start PVE, open the program directory and double-click the program icon from
Workbench or execute the program from shell.

Shell arguments:
CACHECOPYBACK=CC/S:  make the 68040/68060 data cache work in copyback mode
CACHESWITCHING=CS/S: switch off the 68030 data cache burst or the 68040/68060
                     data cache while rendering the voxel
RUNBENCHMARK=RB/S:   benchmark graphics rendering

If your monitor / graphics card / scan doubler do(es) not support SHRES, the
colors will look off or even not show at all. In such case, to hopefully fix the
colors a bit, try the staggered lines option.

--------------------------------------------------------------------------------
CONTROLS

PVE is controlled by joystick (in the game port) and keyboard.

JOYSTICK | KEYBOARD | SPLASH SCREEN               | VOXEL SCREEN
---------+----------+-----------------------------+----------------------------
[UP]     |          |                             | move forwards
[DOWN]   |          |                             | move backwards
[LEFT]   |          |                             | turn left
[RIGHT]  |          |                             | turn right
[FIRE1]  |          | go to voxel screen          | accelerate
         | [F1]     | turn staggered lines on/off | turn staggered lines on/off
         | [F2]     | turn fps indicator on/off   | turn fps indicator on/off
         | [ESCAPE] | quit to AmigaOS             | go to splash screen

--------------------------------------------------------------------------------
MISCELLANEOUS

* The staggered lines shift the odd lines by 1 SHRES pixel to the right.
  On systems which handle SHRES correctly, that will reduce the jailbars effect
  (but give the screen a kind of wavy look).
  On systems which handle SHRES as HIRES (for example: MNT's VA2000 graphics
  card and Irix Labs' ScanPlus AGA - contrary to how it was originally
  marketed - display only the even or odd columns of pixels, so only reds and
  blues or greens and grays show), that helps improve the colors a bit (giving
  the screen a kind of scanline effect).
  On other systems, the results are unpredictable, but the option is still worth
  a try.

* The number shown in the top-left corner of the voxel screen is the fps
  indicator, which reports the number of frames rendered in the last second.

* The map wraps around at its edges.

--------------------------------------------------------------------------------
BENCHMARK

The performance of graphics rendering can be measured by means of the command
line RUNBENCHMARK option. Measuring the performance makes it possible to find
the best settings for any given machine.

On 68030 machines, the best settings can be found by running PVE from shell as
follows:
> PVE RUNBENCHMARK
> PVE RUNBENCHMARK CACHESWITCHING

On 68040 and 68060 machines, the best settings can be found by running PVE from
shell as follows (between parentheses are the shortened forms):
> PVE RUNBENCHMARK
> PVE RUNBENCHMARK CACHECOPYBACK
> PVE RUNBENCHMARK CACHESWITCHING
> PVE RUNBENCHMARK CACHECOPYBACK CACHESWITCHING

The benchmark makes PVE render 256 frames while rotating the camera by 360°,
quit to AmigaOS and print the results to the standard output as follows:
 * number of frames rendered;
 * elapsed time in seconds;
 * number of frames rendered per second.

During the benchmark, nothing shows.
The elapsed time depends on the power of the machine. On very slow machines, it
might take quite a while (e.g. on a machine that renders at 4 fps, the duration
will be 256/4 = 64 seconds).

This table shows the results of various benchmarks expressed in fps.

                            |           DATA CACHE MODE           |
------+---------------------+--------+--------+---------+---------+-----
AMIGA | EXPANSION BOARD     | D      | C      | S       | C+S     | NOTE
------+---------------------+--------+--------+---------+---------+-----
1200  | ?                   |  6.401 | -      | -       | -       | 1
1200  | Blizzard 1230 IV    | 21.129 | -      |  21.241 | -       | 2
1200  | Blizzard 1260       | 11.770 | 11.770 |  29.047 |         | 3
1200  | PiStorm32           | 78.120 | 78.768 | 127.936 |         | 4
1200  | TerribleFire TF1260 | 10.094 |  9.612 |  28.122 |         | 3
1200  | TerribleFire TF1260 | 17.610 | 16.835 |  48.448 |         | 5
3000+ | BFG9060             | 11.004 | 11.004 |  31.011 |         | 3
3000+ | BFG9060             | 19.114 | 19.114 |  53.422 |         | 5
4000  | Cyberstorm MK III   |        |        |         |         | 3
4000  | Warp Engine         | 12.120 | 12.120 |  36.861 |         | 6
4000T | CyberStorm PPC      | 17.611 | 17.611 |  34.641 |         | 7
CD³²  | The Beast 030       | 29.240 | -      |  29.260 | -       | 8

DATA CACHE MODE:
D = Default (always on + writethrough)
C = Copyback
S = Switching

NOTE
1. 68020 14.19 MHz, FAST RAM only
2. 68030 50 MHz, RAM 60 ns
3. 68060 50 MHz
4. Raspberry Pi 3 A+
5. 68060 100 MHz
6. 68060 80 MHz
7. 68060 60 MHz
8. 68030 70 MHz, SRAM

--------------------------------------------------------------------------------
TECHNICAL NOTES

* Rendering is done by columns, from bottom to top and then left to right, but
  the data is written to FAST RAM raster sequentially (therefore, in practice,
  the raster is rotated clockwise by 90°).
* The graphics in the FAST RAM raster are rotated and copied to a PED81C raster
  in CHIP RAM while the bitplanes DMA fetch is inactive. The rotation executes
  partially/entirely (depending on the CPU) in parallel with the writes to CHIP
  RAM.
* Rendering and buffering are totally asynchronous, so that the CPU must never
  wait and can run at full speed all the time (unless it is so fast that it
  renders the frames faster than they are shown).
* The code applies a depth of 256 steps per column, so it evaluates 256*128 =
  32768 dots per frame (and then renders only those which are actually visible).
* The screen resolution is 1020x200 SHRES pixels, which correspond to 255x200
  LORES-sized dots and to 128x200 logical dots.
* The screen resolution can be changed by redefining the width and height
  constants in the code and reassembling it.
* The program supports only maps of 1024x1024 pixels, but it can be made to
  support maps of other sizes by redefining the width and height constants in
  the code and reassembling it.
* The code is 100% assembly.
* The code is mostly optimized for 68030.
* The handling of the user input and of the camera is decoupled from the
  graphics rendering and executes every frame.
* The height of the camera adapts automatically to that of the dot in the map it
  is at, but it can be made user-controllable and its maximum value can be
  increased almost to the point that the landscape disappears at the bottom of
  the screen.
* The map color and height data are stored in separate files, but at load time
  they are merged in a single buffer consisting of <color, height> couples.
* The map requires 2 MB of FAST RAM.
* The program takes over the system entirely and returns to AmigaOS cleanly.

--------------------------------------------------------------------------------
BACKSTORY

After a hiatus from programming of several months (due to a computer-unrelated
project), I decided to finally create something for PED81C because I had made
nothing with it other than a few little examples, I wanted to test its graphical
quality and computational performance, and... I felt like having some good fun.
After some inconclusive mental wandering, the idea of making a voxel engine came
to mind for unknown reasons (I had never dabbled with voxel before).
When the engine was mature enough I decided to distribute PVE publicly (which
initially was not planned).
Code:
In general, given a raster which is RASTERWIDTH dots wide and RASTERHEIGHT dots
tall, the values to write to the chipset registers in order to create a centered
screen can be calculated as follows:
 * SCREENWIDTH  = RASTERWIDTH * 8
 * SCREENHEIGHT = RASTERHEIGHT
 * DIWSTRTX     = $81 + (160 - SCREENWIDTH / 8)
 * DIWSTRTY     = $2c + (128 - SCREENHEIGHT / 2)
 * DIWSTRT      = ((DIWSTRTY & $ff) << 8) | ((DIWSTRTX + 1) & $ff)
 * DIWSTOPX     = DIWSTRTX + SCREENWIDTH / 4
 * DIWSTOPY     = DIWSTRTY + SCREENHEIGHT
 * DIWSTOP      = ((DIWSTOPY & $ff) << 8) | (DIWSTOPX & $ff)
 * DIWHIGH      = ((DIWSTOPX & $100) << 5) | (DIWSTOPY & $700) |
                  ((DIWSTRTX & $100) >> 3) | (DIWSTRTY >> 8)
 * DDFSTRT      = (DIWSTRTX - 17) / 2
 * DDFSTOP      = DDFSTRT + SCREENWIDTH / 8 - 8
Last edited by saimo; 18 December 2023 at 23:32. Reason: Updated manual text. |
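The register formulas above translate directly into a few lines of code. The following Python sketch is only an illustration (the function name and the returned dictionary are made up here), and it assumes that every division in the formulas is an integer division, as it is in the AMOS example.

```python
def ped81c_window(raster_width: int, raster_height: int) -> dict:
    """Compute the display window/fetch register values for a centered screen."""
    screen_width = raster_width * 8
    screen_height = raster_height
    diwstrt_x = 0x81 + (160 - screen_width // 8)
    diwstrt_y = 0x2C + (128 - screen_height // 2)
    diwstop_x = diwstrt_x + screen_width // 4
    diwstop_y = diwstrt_y + screen_height
    ddfstrt = (diwstrt_x - 17) // 2
    return {
        "DIWSTRT": ((diwstrt_y & 0xFF) << 8) | ((diwstrt_x + 1) & 0xFF),
        "DIWSTOP": ((diwstop_y & 0xFF) << 8) | (diwstop_x & 0xFF),
        "DIWHIGH": ((diwstop_x & 0x100) << 5) | (diwstop_y & 0x700)
                   | ((diwstrt_x & 0x100) >> 3) | (diwstrt_y >> 8),
        "DDFSTRT": ddfstrt,
        "DDFSTOP": ddfstrt + screen_width // 8 - 8,
    }


if __name__ == "__main__":
    # The 160x256 raster used by the AMOS example.
    for name, value in ped81c_window(160, 256).items():
        print(f"{name} = ${value:04X}")
```

For the 160x256 raster of the AMOS example this gives DIWSTRT = $2C82, DIWSTOP = $2CC1, DIWHIGH = $2100, DDFSTRT = $0038 and DDFSTOP = $00D0.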
29 November 2023, 07:01 | #24 |
HOL/FTP busy bee
Join Date: Sep 2006
Location: Germany
Age: 46
Posts: 31,922
|
I watched it yesterday evening and it's very impressive. What sort of setup/machine was the video recorded on?
|
29 November 2023, 10:38 | #25 |
Registered User
Join Date: Aug 2010
Location: Italy
Posts: 854
|
|
29 November 2023, 10:39 | #26 |
HOL/FTP busy bee
Join Date: Sep 2006
Location: Germany
Age: 46
Posts: 31,922
|
Okay, thank you for the info
|
29 November 2023, 17:56 | #27 |
Registered User
Join Date: Feb 2017
Location: Denmark
Posts: 1,189
|
Looks neat on a real monitor and very distinct. Only getting around 16-20fps on my B1260/50Mhz though, slow chip mem access speed really hurts :/
|
29 November 2023, 18:28 | #28 | |
Registered User
Join Date: Aug 2010
Location: Italy
Posts: 854
|
@TCD
You're welcome! @paraj Nice to hear! Thanks for the test and the report Quote:
That said, the FAST RAM -> CHIP RAM copy loop is fine-tuned for my card. I made tens of tests and tried the weirdest solutions, and eventually found out that the best code was this: Code:
   move.w  #RASTERSIZE/(13*4)-1,d7   ;number of 13-longword blocks, minus 1 for dbf
.CopyDots
   move.l  (a6)+,d0                  ;read 13 longwords from FAST RAM
   move.l  (a6)+,d1
   move.l  (a6)+,d2
   move.l  (a6)+,d3
   move.l  (a6)+,d4
   move.l  (a6)+,d5
   move.l  (a6)+,d6
   movea.l (a6)+,a0
   movea.l (a6)+,a1
   movea.l (a6)+,a2
   movea.l (a6)+,a3
   movea.l (a6)+,a4
   movea.l (a6)+,a5
   movem.l d0-d6/a0-a5,(a7)          ;write them to CHIP RAM in one go
   adda.w  #13*4,a7
   dbf     d7,.CopyDots
   rept    (RASTERSIZE//(13*4))/4    ;copy the leftover longwords
   move.l  (a6)+,(a7)+
   endr
Do you happen to know which is the best strategy? |
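To see how the loop above splits the raster, here is a small Python sketch (not the author's code; the function names are invented for illustration). It assumes that `//` in the `rept` expression is the assembler's modulo operator and that the raster size is a multiple of 4 bytes. The 128x200 size used in the demo run is taken from the "128x200 logical dots" figure in the manual.

```python
def plan_copy(raster_size: int, block_longwords: int = 13) -> tuple:
    """Return (full blocks, leftover longwords) for a block-wise copy."""
    block_bytes = block_longwords * 4
    full_blocks = raster_size // block_bytes
    tail_longwords = (raster_size % block_bytes) // 4
    return full_blocks, tail_longwords


def copy_raster(src: bytes, block_longwords: int = 13) -> bytes:
    """Copy src the same way: whole blocks first, then single longwords."""
    block_bytes = block_longwords * 4
    full_blocks, tail_longwords = plan_copy(len(src), block_longwords)
    dst = bytearray()
    pos = 0
    for _ in range(full_blocks):      # the dbf loop (movem.l writes)
        dst += src[pos:pos + block_bytes]
        pos += block_bytes
    for _ in range(tail_longwords):   # the rept block (move.l copies)
        dst += src[pos:pos + 4]
        pos += 4
    return bytes(dst)


if __name__ == "__main__":
    size = 128 * 200
    print(plan_copy(size))
    data = bytes(range(256)) * (size // 256)
    print(copy_raster(data) == data)
```

With a 128x200 raster (25600 bytes) the plan is 492 blocks of 52 bytes plus 4 leftover longwords, so the loop counter loaded into d7 would be 491.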
|
29 November 2023, 18:39 | #29 | |
Registered User
Join Date: Feb 2017
Location: Denmark
Posts: 1,189
|
Quote:
Chip writes are just slower than they need to be on my card (around 5.3M/s) regardless of what you do (don't think move16 works to chipram, but haven't tried). To maximize performance you want to do aligned long word writes to chipmem, then interleave computations that don't cause cache misses while the write(s) [up to 4] complete. There are ~30 cycles or so to do stuff per write. Obviously this isn't easy to do productively in a general case, so most of the time it's spent C2Ping "for free" |
|
30 November 2023, 00:47 | #30 | |
Registered User
Join Date: Aug 2010
Location: Italy
Posts: 854
|
Quote:
Could you try the attached test program, please? It's the same program, but with the FAST RAM -> CHIP RAM copy disabled, so that the stats printed out at the end will tell us the speed of rendering alone - and thus, indirectly, the impact of the writes to CHIP RAM. After clicking the left mouse button in the splash screen all you'll see is the screen flickering madly: after a few seconds, simply click the right mouse button to put an end to the headache-inducing show.
In the meantime, I received the results from tests made on other 68060 boards:
* A1200 + TF1260: 14.21 fps
* A4000 + Cyberstorm MK III: 18.80 fps
Last edited by saimo; 01 December 2023 at 02:23. Reason: Removed attachment, as I provided a new test version. |
|
30 November 2023, 02:18 | #31 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,413
|
Mmm, delicious gory details.
I'm sure I saw another recent thread where it was noted that movem was performing less well than expected on 040/060, so might there be a correlation here? |
30 November 2023, 09:05 | #32 | |
Registered User
Join Date: Aug 2010
Location: Italy
Posts: 854
|
Quote:
Anyway, in the sleepless night that followed I realized that something better than fiddling with instructions can be done. The current (triple) buffering was devised for when (i.e. initially) rendering was done directly to CHIP RAM, but now that graphics are rendered in FAST RAM that isn't optimal anymore. I'm going to rework it so that the writes to CHIP RAM happen while the bitplanes are not being displayed (more precisely, I'll have the copy start right after the last line has been displayed) - that, hopefully, will improve performance. Also, I thought that I can unroll the core loop of the renderer a bit and still have it fit the 68020 and 68030 cache, and save about 4.5 cycles per source dot - if it works out, that should give a 0.25-0.75 fps (rough estimate) increase on my 68030 machine. |
|
30 November 2023, 13:27 | #33 |
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,413
|
I've basically posted this exact thing somewhere else, but why not have a routine targeted for each CPU that you can make a realistic specific optimisation for and then just detect which CPU is in use on startup and assign the relevant function address to a pointer somewhere? Sure, doing an indirect jump to the function is going to cost a few more cycles but it's presumably nothing compared to the work in the loop itself.
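The per-CPU dispatch suggested above could look roughly like this. It is a Python sketch of the idea only: the routine names and the CPU labels are hypothetical, and in the real program the "pointer" would be a function address picked once at startup.

```python
from typing import Callable, Dict

Renderer = Callable[[int], str]


def render_generic(frame: int) -> str:
    """Fallback routine for CPUs without a tuned version (hypothetical)."""
    return f"generic renderer, frame {frame}"


def render_68030(frame: int) -> str:
    """Stand-in for a routine tuned for the 68030 (hypothetical)."""
    return f"68030 renderer, frame {frame}"


def render_68060(frame: int) -> str:
    """Stand-in for a routine tuned for the 68060 (hypothetical)."""
    return f"68060 renderer, frame {frame}"


RENDERERS: Dict[str, Renderer] = {
    "68030": render_68030,
    "68060": render_68060,
}


def select_renderer(cpu: str) -> Renderer:
    """Pick the routine once; callers then go through the returned reference."""
    return RENDERERS.get(cpu, render_generic)


if __name__ == "__main__":
    render = select_renderer("68060")   # done once, at startup
    for frame in range(3):
        print(render(frame))            # indirect call inside the main loop
```

The lookup happens once, so the only per-frame cost is the indirect call itself.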
|
30 November 2023, 13:59 | #34 | |
Registered User
Join Date: Aug 2010
Location: Italy
Posts: 854
|
Quote:
By the way, the loop unrolling optimization worked as expected: it provided 1 extra fps for the rendering code (22.2 -> 23.2 frames rendered per second) and a 0.7 fps overall improvement (20.2 -> 20.9 fps). It's a pity that the outer loop misses fitting in the cache as well by only a few bytes (it's 266 bytes now).
The buffering strategy change, instead, is only on paper, and I'll be able to work on it only later - real life work got in the way :/
*The data cache burst is turned on and off as needed.
Last edited by saimo; 30 November 2023 at 18:18. |
|
30 November 2023, 17:50 | #35 | |
Registered User
Join Date: Feb 2017
Location: Denmark
Posts: 1,189
|
Quote:
It's only a little bit faster. 208/554 not touching controls with PVE-B, 208/498 moving a bit vs. 70/205 and 161/433 with copy active. |
|
30 November 2023, 18:17 | #36 | |
Registered User
Join Date: Aug 2010
Location: Italy
Posts: 854
|
Quote:
Astonishing results!
Without copy:
208/554 > 18.77 fps
208/498 > 20.88 fps
With copy:
70/205 > 17.07 fps
161/433 > 18.59 fps
It looks like the copy costs about 1.7 to 2.3 fps, which is quite similar to what I get on my machine - and this aligns well with what we said above regarding CHIP RAM access. In other words, it's the renderer code that is slower! Now, this is weird: that it wasn't optimal was a given (I only avoided some register conflicts, without taking into account pOEP and sOEP), but that it would perform worse is a big surprise. And there isn't much room for improvement, as the core code is only a bunch of basic instructions for a total of 46 bytes. The new minimally-unrolled loop version should help a bit. Maybe I'll give it a thought later - now I'll deal with the CHIP RAM writes / buffering. |
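The fps figures above can be reproduced with a short calculation. The sketch below assumes that each pair is "frames rendered / elapsed time in 1/50 s ticks" (i.e. PAL frames); that reading is an inference from the numbers, which match it to two decimals, and is not stated in the posts.

```python
TICKS_PER_SECOND = 50  # assumption: elapsed time is counted in PAL frames


def fps(frames: int, ticks: int) -> float:
    """Frames rendered per second, given the elapsed time in 1/50 s ticks."""
    return frames * TICKS_PER_SECOND / ticks


if __name__ == "__main__":
    without_copy = [(208, 554), (208, 498)]   # idle, moving
    with_copy = [(70, 205), (161, 433)]       # idle, moving
    for (f0, t0), (f1, t1) in zip(without_copy, with_copy):
        a, b = fps(f0, t0), fps(f1, t1)
        print(f"{a:.2f} fps without copy, {b:.2f} fps with copy, "
              f"cost {a - b:.2f} fps")
```

Under that assumption the copy costs about 1.70 fps when idle and about 2.29 fps when moving, in line with the "1.7 to 2.3 fps" estimate.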
|
30 November 2023, 18:36 | #37 |
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 56
Posts: 2,030
|
From old tests, a good c2p routine is faster than a plain copy from fast to chip on Cyberstorm 060 boards.
|
01 December 2023, 02:22 | #38 |
Registered User
Join Date: Aug 2010
Location: Italy
Posts: 854
|
New version:
* reworked buffering system, so that the FAST RAM -> CHIP RAM copy happens when there is no bitplanes fetch;
* optimized renderer core loop;
* replaced calls to exec's CacheControl() with custom code.
The new buffering strategy didn't bring the improvement I hoped for - but, still, it's an improvement.
The last change is due to the fact that, on the emulated 68060 system I've set up, CacheControl() didn't enable the branch cache and the store buffer. I don't know if it's because my installation of AmigaOS 3.9 (the same I use for the 68030, with just 680x0.library and 68060.library in place) was not sane or the function doesn't fully support the 68060 cache at all. If this was actually a problem, the performance should be better now.
Can you guys give the attached executable a shot and let me know how it runs on your 68060 machines and if it finally performs better than on my 68030, please?
This is how it looks on my machine:
Notes:
* the red bar indicates the time spent copying the rendered graphics to the video buffer in CHIP RAM;
* the scanline-ish look is due to the fact that my machine's video output goes through the ScanPlus AGA scandoubler, which does not support the SHRES resolution (it displays it as HIRES, skipping the even columns of pixels), so the program adopts a workaround to get decent (kind of) colors - the workaround consists in shifting every other line one SHRES pixel to the right, so that the even lines show greens and grays and the odd lines show reds and blues;
* as you can see, now it runs at 21 fps most of the time.
Last edited by saimo; 02 December 2023 at 23:03. Reason: Removed attachment, as I provided a newer version later. |
01 December 2023, 08:13 | #39 |
Registered User
Join Date: Feb 2017
Location: Denmark
Posts: 1,189
|
About the same. Stationary: 17.4fps, Moving about: 19.3.
Last edited by paraj; 01 December 2023 at 08:33. |
01 December 2023, 08:23 | #40 |
Alien Breeder
Join Date: Dec 2007
Location: Szigetszentmiklos / Hungary
Age: 46
Posts: 1,112
|
Awesome stuff. Looking forward to seeing where it ends up being used.
Keep up the great work! |