#1 |
Mostly Harmless
Join Date: Aug 2004
Location: Northern Ireland
Posts: 1,045
|
WHDLoad blitter speed tests - needs you!
A WHDLoad slave that does some blitter speed tests will be in The Zone shortly. The intention is to investigate one possible cause of the annoying slowdowns many WHDLoad games experience on certain processors. 68040, I'm looking at you.
A sequence of blits, of the "cookie cutter" type typically found in games, is run with different sizes (16x16 to 1008x1008!), different types of blitter waiting code and different settings for the CPU caches. The time taken for each blit is measured using the CIA timers, stored, and finally output to a text file "bspeed.txt" when all tests are complete. The zip file in The Zone also includes some Excel spreadsheets containing the results from my A1200-060 and some pretty graphs to help analysis.
So here's your bit: run "WHDLoad BlitterSpeed.slave" and post, PM or email me your bspeed.txt files along with your machine specifications. AGA users can also see the effect of the FMODE register on blitter speed by passing the CUSTOM1 parameter to the slave with values of 0, 1, 2 or 3.
You will need a 68020 or better, 0.5MB chip RAM and ~2MB fast RAM in order to run the tests. They will take 5-10 minutes to complete and the screen will flash colours to let you know it's still running. I'm especially interested in 68040 results, but the more the merrier! |
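To give a feel for the range of blit sizes mentioned above, here is a small Python sketch (not taken from the slave) of how a blit's dimensions pack into the classic OCS BLTSIZE register: 10 bits of height in lines and 6 bits of width in 16-pixel words, with 0 standing for the maximum in each field. It assumes the tests use BLTSIZE rather than the ECS BLTSIZV/BLTSIZH pair, which the post doesn't say, and the function name is made up for illustration.

```python
def bltsize(width_px, height):
    """Pack a blit size into a 16-bit OCS BLTSIZE value.

    Assumption: plain BLTSIZE is used (height in bits 15-6, width in
    words in bits 5-0). A field value of 0 means the maximum
    (1024 lines / 64 words).
    """
    if width_px % 16 != 0:
        raise ValueError("blit width must be a whole number of 16-pixel words")
    width_words = width_px // 16
    if not 1 <= width_words <= 64 or not 1 <= height <= 1024:
        raise ValueError("size out of range for BLTSIZE")
    return ((height & 0x3FF) << 6) | (width_words & 0x3F)


# The smallest and largest sizes quoted in the post.
for w, h in [(16, 16), (1008, 1008)]:
    print(f"{w}x{h}: BLTSIZE = ${bltsize(w, h):04X}")
```

Both ends of the quoted range fit in a single BLTSIZE write: 1008 pixels is 63 words, one short of the 64-word maximum.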
#2 |
Moderator
Join Date: Jul 2004
Location: Norwich, Norfolk, UK
Age: 36
Posts: 11,160
|
Will test it for you either tonight or tomorrow...
|
#3 |
Longplayer
|
Have set the test going on my BPPC 040/25.
Config is A1200-BlizzPPC 040/25, Mediator+Voodoo3/net/sb128/tv. OS39 + custom OS39 ROM including the HOGWAITBLIT patch, if that makes a difference. As the miggy hasn't been on for so long, the battery has flattened and lost its settings, so I have just forced 60ns memory mode from the command line since I can't see the boot menu to set it. CHIPNOCACHE WHD option set for compatibility. One normal and 4 custom1 tests.
Edit: 1 normal test added with chip cacheability enabled, just in case it affects anything.
Last edited by Mad-Matt; 27 February 2021 at 14:32. |
#4 |
Registered User
Join Date: Jan 2005
Location: 62-France
Age: 55
Posts: 413
|
hello,
Tested on a1200dbox + 1260/50 + 48MB fast + P96 + AOS3.9 BB2. Blizzkick enabled with some modules running. custom=1. Just used WHDLoad BlitterSpeed.slave and waited for the end. Interested to know the result. |
#5 |
Global Caturator
Join Date: Aug 2004
Location: Porando
Age: 42
Posts: 6,074
|
Cool, will run the test and I do hope it'll help solve the slowdown problem, tho I'm on 030
---=== EDITED ===--- Done... Tested on Amiga 1200 with Blizzard 030/50MHz 32MB; OS 3.9 BB2 with FBlit & SystemPatch.
Last edited by Shoonay; 13 May 2008 at 12:47. |
#6 |
Registered User
Join Date: Aug 2007
Location: Budapest/Hungary
Posts: 13
|
My config:
A1200 Bppc 68040/33MHz /603+/233MHz, 2MB Chip, 128MB Fast RAM, Mediator/Voodoo etc. Chip RAM cacheability is disabled in the WHDLoad config for compatibility. OS: 3.9, Kickstart 3.1 with blitzkick. Hope I can help you. |
#7 |
Moderator
Join Date: Jul 2004
Location: Norwich, Norfolk, UK
Age: 36
Posts: 11,160
|
Okay, done, courtesy of my A1200 with Apollo 040/33, and 32MB Fast RAM. WHDLoad 16.8 was used, and no tooltypes were set. If you want me to test it with any tooltypes just give me a shout.
|
#8 |
Mostly Harmless
Join Date: Aug 2004
Location: Northern Ireland
Posts: 1,045
|
@madmatt & Graham: do you run PAL Amigas ?
|
#9 |
Moderator
Join Date: Jul 2004
Location: Norwich, Norfolk, UK
Age: 36
Posts: 11,160
|
Yep.
|
#10 |
.
Join Date: Oct 2004
Location: Ioannina/Greece
Posts: 5,040
|
Here are my results: PAL, A4000D and cs-ppc 060/50MHz... no tooltypes whatsoever... chipmem cache disabled as usual for 040/060...
|
#11 |
Longplayer
|
#12 |
Mostly Harmless
Join Date: Aug 2004
Location: Northern Ireland
Posts: 1,045
|
Firstly, thanks all for taking the time to run my little test.
From the results, it seems the standard blitter wait code supplied with WHDLoad is the best of the bunch. It may be possible to improve on it (I don't know) but it's not a bad start in any case, and it also proves Wepl's idea to avoid blitter slowdown was right. Not that that was ever really in doubt.
I've attached a spreadsheet to this message containing some further analysis of the times reported for the 'A' code, i.e. the standard WHDLoad blitter wait. There are some interesting results...
Some lessons learned then:
Perhaps these are obvious, but at least now there are hard numbers to back them up! These tests just enable or disable the instruction / data caches wholesale. For further investigation it might be a worthwhile experiment to run a test with the two caches enabled & disabled separately and with the data cache running in different modes, especially for 040 machines running out of chipmem. But whatever you do, don't take my word for this. Look at the numbers yourself and see if you agree! |
#13 |
2 contact me: email only!
Join Date: May 2001
Location: Auckland / New Zealand
Posts: 3,170
|
It would be nice if there was an easy way to do the equivalent of a jsr BlitWait directly into WHDLoad running in fast memory, so that WHDLoad could install the best patch for each processor rather than the code being put into each slave.
The global blitter code could be tweaked for each CPU and the poor slave programmer wouldn't have to worry about which blitwait code to run! And there would be only one place to update to improve it. |
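As a rough illustration of the "one place to update" idea, the Python sketch below keeps a single per-CPU table with a fallback to the standard routine. It is purely hypothetical: WHDLoad exposes no such table as far as this thread says, the names are invented, and the only figures used are the ones reported elsewhere in the thread (2 "tst.b _ciaa" in the standard BLITWAIT macro, and 4 or 5 looking better on a 1260).

```python
# Hypothetical central tuning table: number of dummy "tst.b _ciaa" reads
# per pass of the blitwait loop. Not part of WHDLoad itself.
STANDARD_CIA_READS = 2            # the standard BLITWAIT macro, per this thread
TUNED_CIA_READS = {68060: 4}      # 4 is one of the values reported for a 1260


def cia_reads_for(cpu):
    """Return the tuned read count for a CPU, or the standard one."""
    return TUNED_CIA_READS.get(cpu, STANDARD_CIA_READS)


for cpu in (68020, 68030, 68040, 68060):
    print(cpu, cia_reads_for(cpu))
```

A slave would then only ever call the one entry point, and improving the wait for a given CPU would mean changing one table entry.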
#14 |
move.w #$4489,$dff07e
Join Date: Sep 2005
Location: Norfolk, UK
Age: 41
Posts: 2,344
|
That's a great idea Codetapper
|
#15 |
WinUAE developer
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 48
Posts: 25,813
|
Can I see the source code please?
|
#16 |
CaptainM68K-SPS France
|
Hi girv, here is mine, a bit late.
|
#17 |
Moderator
Join Date: Nov 2001
Location: Germany
Posts: 829
|
First, Girv, many thanks for doing the investigation on this topic. This will help to clear things up, to make the right patches and to optimize the installs.
Quote:
Best would be to send the slave source. I would also like to include the slave in the whdload-dev package because I think it is of general interest.
I would also like to add a wait routine using the stop instruction, which should be the fastest, although it may be hard to use it for the general patch case (routine not tested by me, maybe small corrections required...):
Code:
init:
	move.w	#$7fff,_custom+intena		;disable all interrupts
	lea	int,a0
	move.l	a0,$6c				;level 3 autovector
	move.w	#INTF_SETCLR|INTF_BLIT|INTF_INTEN,_custom+intena
	...
int:
	move.w	#INTF_BLIT,_custom+intreq	;acknowledge the blitter interrupt
	tst.w	_custom+intreqr
	rte
blitwait:
	stop	#$2000				;sleep until an interrupt arrives
	btst	#DMAB_BLTDONE-8,_custom+dmaconr	;should be obsolete, but for security
	bne	blitwait			;blitter still busy, wait again
	move	#$2700,sr			;avoid interrupt occurring before stop
	rts
Another question is how many DMA channels are used in your examples. I think this will have a noticeable effect on the results. I would expect much more influence from the blitwait routine if all 4 channels are used, and less impact if only one or two channels are used.
The next point is the impact of BLITHOG/BLTPRI.
Quote:
Quote:
It seems so. To confirm that, please try the wait routine using the stop instruction. I would expect that using this gives the same speed (at least for the large blits) on all machines. Quote:
Quote:
Enabling caches in chipmem may cause other problems. Also, there are boards out there (040/060) which do not support cacheability of chipmem (CHIPNOCACHE must be used). Quote:
Quote:
|
#18 |
Mostly Harmless
Join Date: Aug 2004
Location: Northern Ireland
Posts: 1,045
|
Source code is now in The Zone. It's probably best if this is reviewed anyway.
It does contain a pretty clear-cut example of how to use the CIA timers in linked mode to time longer periods than one 16-bit counter will allow. This is the first serious use of the CIAs I've ever made and I'm quite pleased with how it's gone.
I've also attached results from my Blizzard 1260 50MHz card, running versions of the standard WHDLoad BLITWAIT macro with different numbers (0-5) of "tst.b _ciaa" in the loop. Blitwait loops were running from fastmem with all caches enabled.
As you can see, for blits sized 16x16 - 64x64, which I guess are the most common found in games, there is a definite 10+% blitter speedup to be had on 060 machines by including 4 or 5 "tst.b _ciaa" instructions instead of the standard 2. Adding more increases performance on larger blits too but the gains aren't so dramatic, so perhaps 4 or 5 is the best compromise for these machines? |
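For anyone post-processing the numbers, the linked-timer arithmetic can be sketched in Python as below. This is not the slave's code, just the idea under stated assumptions: timer A free-runs down from $FFFF, timer B counts timer A underflows, so the pair behaves like one 32-bit down counter; both snapshots are assumed to have been read consistently; and the tick rates are the usual CIA clock figures of 709379 Hz (PAL) and 715909 Hz (NTSC). All function names are invented for the example.

```python
CIA_HZ = {"PAL": 709_379, "NTSC": 715_909}   # CIA timer tick rates in Hz


def combine(timer_b, timer_a):
    """Join two 16-bit counter snapshots into one 32-bit value.

    Assumes timer B (high word) counts underflows of timer A (low word).
    """
    return ((timer_b & 0xFFFF) << 16) | (timer_a & 0xFFFF)


def elapsed_ticks(start, end):
    """Ticks between two snapshots of a 32-bit DOWN counter, wrap-safe."""
    return (start - end) & 0xFFFFFFFF


def ticks_to_us(ticks, system="PAL"):
    """Convert CIA ticks to microseconds for the given video system."""
    return ticks * 1_000_000 / CIA_HZ[system]


start = combine(0xFFFF, 0x0010)
end = combine(0xFFFE, 0xFFF0)      # timer A has underflowed once in between
ticks = elapsed_ticks(start, end)
print(ticks, round(ticks_to_us(ticks, "PAL"), 2))
```

The PAL/NTSC distinction matters here because the same tick count converts to a slightly different time on each system.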
#19 |
Mostly Harmless
Join Date: Aug 2004
Location: Northern Ireland
Posts: 1,045
|
|
#20 |
Mostly Harmless
Join Date: Aug 2004
Location: Northern Ireland
Posts: 1,045
|
I did a quick test on my 060: the STOP routine is quicker for 128x128, gaining about 6-8% over the standard WHDLoad blitwait with caches on and 15-20% with caches off in chipmem. It's much slower for smaller sizes - I guess it's the interrupt raising/handling overhead.
|