English Amiga Board


Old 23 December 2014, 20:20   #1
Michael
A1260T/PPC/BV/SCSI/NET

Join Date: Jan 2013
Location: Moscow / Russia
Posts: 616
Puzzle, ROM & Resident modules, performance tuning.

I have run a few SysSpeed tests on my real A1200 with a BlizzardPPC,
and I can't understand what is going on: it is the same system each time, with only a slightly different software configuration (ROM modules).

In theory, with the ROM in fast memory things should improve, and they do on most tests. But why do the PPC memory read speeds drop?

And why does using loadresident give a massive boost to some intuition/gfx operations?

Any ideas?

Column 1 - intuition and graphics libs loaded using loadresident, no fast ROM
Column 2 - basic config, no fast ROM
Column 3 - ROM kicked to fast memory with blizkick
Column 4 - ROM in fast memory via boot menu



Attached Thumbnails: SS_intuition.png, SS_graphics.png, SS_cpu.png
Old 06 January 2015, 11:51   #2
Michael
A1260T/PPC/BV/SCSI/NET

Join Date: Jan 2013
Location: Moscow / Russia
Posts: 616
So, no ideas?
Old 11 January 2015, 02:37   #3
Lonewolf10
AMOS Extensions Developer
Join Date: Jun 2007
Location: near Cambridge, UK
Age: 38
Posts: 1,917
Sorry, not a clue.

Maybe someone else will respond with an answer or suggestion?
Old 11 January 2015, 12:00   #4
Toni Wilen
WinUAE developer
 
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 42
Posts: 19,924
Do other benchmarks show similar changes? I'd never trust a single benchmark to actually measure what you (or even the benchmark itself) think it measures.
Old 11 January 2015, 16:05   #5
voxel
Amiga Nuts!
Join Date: Sep 2006
Location: Le Mayet d'Ecole, 03800, FRANCE
Posts: 162
hello ^^)

Don't forget the overhead introduced by context switching between the 68k side and the PPC side when 68k software (SysSpeed) tests the PPC side...
Old 11 January 2015, 19:52   #6
Michael
A1260T/PPC/BV/SCSI/NET

Join Date: Jan 2013
Location: Moscow / Russia
Posts: 616
We know that different benchmarks show different results; the tests were run several times to exclude any bogus results that drop out or spike.
What's interesting is that when the same library (code) is placed in fast memory by different methods, it has a dramatic impact on some test results. Some tests improve by more than 80%, while others of the same nature drop in performance by more than 5%.
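
For what it's worth, the "run it several times and throw out the bogus results" step can be sketched as follows. This is only an illustration in Python, not how SysSpeed works internally; the function names and all scores are made up for the example.

```python
from statistics import median

def robust_score(runs):
    """Median of repeated benchmark runs; a few outliers do not move it."""
    if not runs:
        raise ValueError("need at least one run")
    return median(runs)

def percent_change(baseline, candidate):
    """Relative change of candidate versus baseline, in percent."""
    return (candidate - baseline) / baseline * 100.0

# Hypothetical scores (higher is better), NOT taken from the screenshots.
baseline_runs = [100.0, 101.0, 99.0, 140.0, 100.0]  # one bogus spike
resident_runs = [180.0, 182.0, 181.0, 179.0, 60.0]  # one bogus drop

base = robust_score(baseline_runs)
res = robust_score(resident_runs)
print(f"{percent_change(base, res):+.1f}%")
```

The median is used rather than the mean so that a single run that spikes or drops out does not shift the figure being compared.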
Old 12 January 2015, 04:03   #7
matthey
Banned
 
Join Date: Jan 2010
Location: Kansas
Posts: 1,284
Quote:
Originally Posted by Michael View Post
We know that different benchmarks show different results; the tests were run several times to exclude any bogus results that drop out or spike.
What's interesting is that when the same library (code) is placed in fast memory by different methods, it has a dramatic impact on some test results. Some tests improve by more than 80%, while others of the same nature drop in performance by more than 5%.
I also don't trust SysSpeed, although it is generally a good benchmark program. There could be an issue with alignment (PPC is sensitive) or context switching (the PPC+68k combination is crazy complex) with this particular test. The memory copy speed is a clue that the benchmark result may be incorrect. There is no slowdown for memory copy, but I believe the copy speed would be reduced if the load speed were reduced: a load/store architecture separates the load from the store, so the copy time would be the sum of the two or worse. CISC can be hardware-optimized for memory copy; my 68060@75MHz can copy up to 37MB/s according to SysSpeed, which is better than your PPC.

Your memory write speeds are also suspicious. You get the best MB/s with byte writes instead of 4x larger word (32 bit on PPC) writes? This could also be alignment-related, as a byte write is always aligned while the others may incur misalignment penalties that grow with the data size. A larger data size for read, write or copy is generally faster on any CPU, including my 68060. Writes are tricky, though, as the more common copyback caching can be slower than writethrough caching.
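
To illustrate the alignment guess: an access of N bytes is naturally aligned when its address is a multiple of N, so a byte access can never be misaligned, while wider accesses can be. A minimal Python sketch, assuming natural alignment and a hypothetical window of start addresses (whether SysSpeed's buffers are actually misaligned is not known):

```python
def is_aligned(address, size_bytes):
    """True if an access of size_bytes at address is naturally aligned."""
    return address % size_bytes == 0

def misaligned_fraction(size_bytes, addresses):
    """Share of the given start addresses that are misaligned for this size."""
    addresses = list(addresses)
    bad = sum(1 for a in addresses if not is_aligned(a, size_bytes))
    return bad / len(addresses)

# Hypothetical: every possible start address in a 64-byte window.
window = range(64)
for size in (1, 2, 4):
    print(size, misaligned_fraction(size, window))
```

This only counts how often a misaligned start address is possible for each access size; it says nothing about what a misaligned access costs on a given CPU.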

Benchmarking is an art of science. While the SysSpeed author, Torsten Bach, is no doubt a good programmer, he may not have the technical background to understand the hardware, or the documentation may be lacking.
Old 12 January 2015, 17:34   #8
Michael
A1260T/PPC/BV/SCSI/NET

Join Date: Jan 2013
Location: Moscow / Russia
Posts: 616
@matthey
Your comments offer a few logical explanations for why some things differ, but those are hardware dependent.

1. How do you compare 68k CPU copy against PPC if there is no equivalent test in SS?
2. A lot will depend on the bus speed, and the results are about right.
3. The memory speed tests might have flaws (or get cached at some point) or something. But why is Readb better when the ROM is not kicked into fast RAM? That's odd.
4. And most curious is why drawing ellipses shows an enormous difference. The lib is in fast memory in both cases, just put there by different means.

So the question is: is it worth kicking the ROM to fast memory?
Yes, it obviously improves performance.

But what if we put the ROM libs in the same fast memory as residents?
Why does the system gain a lot of performance on some tests, while a few tests show a slight slowdown?
Old 12 January 2015, 23:08   #9
matthey
Banned
 
Join Date: Jan 2010
Location: Kansas
Posts: 1,284
Quote:
Originally Posted by Michael View Post
@matthey
Your comments offer a few logical explanations for why some things differ, but those are hardware dependent.

1. How do you compare 68k CPU copy against PPC if there is no equivalent test in SS?
The best copy speed comes from whatever method gives the highest MB/s in the fastest memory type (fast memory). On my 68060 this is Fast2Fast16 (using a MOVE16 loop) in SysSpeed (results could be higher with more efficient code).

operation    PPC      68060
read08       79.53    38.34
read16       80.67    53.02
read32       79.18    65.45
best_r       80.67    65.45

write08      73.88    30.97
write16      61.67    48.18
write32      48.03    48.18
best_wr      73.88    48.18

copy08       28.49    17.99
copy16       33.09    26.46
copy32       29.15    29.95
best_c       29.15    37.03

The 1st column is the operation and the data size, in bits, used in the loop.
The 2nd column is your PPC results in SysSpeed (MB/s).
The 3rd column is my CSMK3 68060@75MHz results in SysSpeed (MB/s).
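
As a rough cross-check of the "copy is the sum of load and store" idea from post #7, the 32-bit read and write figures in the table above can be combined into an expected copy rate. This is a back-of-the-envelope sketch in Python that assumes the load and the store never overlap and that caches play no part; `copy_estimate` is a name invented for the example.

```python
def copy_estimate(read_mb_s, write_mb_s):
    """Copy rate if each byte is first loaded, then stored, with no overlap.

    Time per MB copied = time to read 1 MB + time to write 1 MB.
    """
    return 1.0 / (1.0 / read_mb_s + 1.0 / write_mb_s)

# 32-bit figures from the table above, in MB/s:
# (read32, write32, measured copy32)
results = {
    "PPC": (79.18, 48.03, 29.15),
    "68060": (65.45, 48.18, 29.95),
}

for cpu, (read, write, measured) in results.items():
    est = copy_estimate(read, write)
    print(f"{cpu}: estimated {est:.2f} MB/s, measured {measured:.2f} MB/s")
```

Under those assumptions the PPC estimate lands within about 1 MB/s of the measured copy32 figure, which fits the load-plus-store picture; the 68060 measures somewhat higher than its estimate.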

Better results than what SysSpeed achieves are probably possible for both processors. Data alignment, code alignment, loop unroll length, instructions used, caching and MMU settings, timing problems etc. can all affect these numbers.

PPC, ARM and most RISC data sizes are byte (8 bits), half word (16 bits), word (32 bits), double word (64 bits).
The 68k data sizes are byte (8 bits), word (16 bits), long word (32 bits), quad word (64 bits).
x86/x86_64 data sizes are byte (8 bits), word (16 bits), double word (32 bits), quad word (64 bits).

Quote:
Originally Posted by Michael View Post
2. A lot will depend on the bus speed, and the results are about right.
Bus width is mighty important too. A hardware technical person could probably tell you the theoretical maximum bandwidth in MB/s based on bus speed and width. That doesn't mean the CPU can reach or sustain it.
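
The usual form of that calculation is bus clock times bytes per transfer, divided by the number of clocks each transfer takes. A small Python sketch; the 50 MHz, 32-bit, 2-clock figures are purely hypothetical placeholders, not the real parameters of any accelerator mentioned in this thread.

```python
def peak_bandwidth_mb_s(bus_mhz, bus_width_bits, clocks_per_transfer):
    """Theoretical peak in MB/s (1 MB = 1,000,000 bytes here)."""
    transfers_per_second = bus_mhz * 1_000_000 / clocks_per_transfer
    bytes_per_transfer = bus_width_bits // 8
    return transfers_per_second * bytes_per_transfer / 1_000_000

# Hypothetical bus: 50 MHz, 32 bits wide, 2 clocks per transfer.
print(peak_bandwidth_mb_s(50, 32, 2))
```

Burst modes, wait states and refresh all change the effective clocks per transfer, so a figure like this is a ceiling rather than a prediction.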

Quote:
Originally Posted by Michael View Post
3. The memory speed tests might have flaws (or get cached at some point) or something. But why is Readb better when the ROM is not kicked into fast RAM? That's odd.
SysSpeed flawed? Maybe you should use SysInfo.

Quote:
Originally Posted by Michael View Post
4. And most curious is why drawing ellipses shows an enormous difference. The lib is in fast memory in both cases, just put there by different means.
Alignment was one guess but it could be many others.

Quote:
Originally Posted by Michael View Post
So the question is: is it worth kicking the ROM to fast memory?
Yes, it obviously improves performance.

But what if we put the ROM libs in the same fast memory as residents?
Why does the system gain a lot of performance on some tests, while a few tests show a slight slowdown?
A programmer likely messed something up. Take whatever gives the best overall real-world performance and report the problem to code enforcement.
Old 14 September 2017, 12:07   #10
tom256
Registered User
 
Join Date: Dec 2011
Location: Poland
Posts: 121
Small update to this topic.

While reading the MC68030 docs I discovered that using the MMU adds a delay of 1 clock when accessing mapped memory.
In effect, mapping Kickstart with the MMU would lengthen the access time to the soft-loaded copy in memory. Maybe some of these tools work this way, causing additional memory access delays for graphics.library and affecting performance.
Old 14 September 2017, 21:49   #11
Toni Wilen
WinUAE developer
 
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 42
Posts: 19,924
The 68030 documentation says "The MMU completely overlaps address translation time with other processing activity when the translation is resident in the ATC".

I guess that description may be a bit too optimistic, and there can be situations where translation slows down memory accesses.