15 November 2016, 15:41 | #1 |
Total Chaos forever!
Join Date: Aug 2007
Location: Waterville, MN, USA
Age: 49
Posts: 2,223
|
Sorting benchmark
Could someone please compile this code on an AmigaOne with a G3 or Efika and post the results on this thread?
Code:
/*-----------------------------------------------------------------------*/
/* CPU Stress test (Gunnar von Boehn)

   Feel free to do with the code what you want.

   The test is small but stresses the following CPU features:
   - DataCache
   - Conditional Code execution / Branch prediction
   - Loop acceleration
   - Memory Hazard Detection
   = These are all important CPU features that are stressed.

   The test was taken from the internal test cases of the APOLLO CPU project.

   Compile with O2
   E.g: gcc -o sort -O2 sortbench.c
*/

# include <stdio.h>
# include <stdlib.h>
# include <float.h>
# include <sys/time.h>

# define HLINE "-------------------------------------------------------------\n"

int count[32]={
  524799,2098175,4720127,8390655,
  13109759,18877439,25693695,33558527,
  42471935,52433919,63444479,75503615,
  88611327,102767615,117972479,134225919,
  151527935,169878527,189277695,209725439,
  231221759,253766655,277360127,302002175,
  327692799,354431999,382219775,411056127,
  440941055,471874559,503856639,536887295};

/* Wall-clock time in seconds */
double mysecond()
{
  struct timeval tp;
  struct timezone tzp;
  int i;

  i = gettimeofday(&tp,&tzp);
  return ( (double) tp.tv_sec + (double) tp.tv_usec * 1.e-6 );
}

/* 68K Code
loop2:
  move.l D2,D1        ;                                     1 -
  move.l (a0)+,D2     ; 2nd value in register D2            1 1
  cmp.l D1,D2         ; Compare values                      2 1
  bge noswap          ; Branch if greater than or equal to  3 -
doswap:
  exg D1,D2           ;                                     3 1
noswap:
  move.l D1,-8(a0)    ; Store 1st ordered values            4 1
  subq.l #1,D6
  bge d6,loop2        ; inner loop                          4 1
*/

/* Bubble sort written to mirror the 68K register usage above */
void sort(int * data, int size){
  int D6,D7,D1,D2,D3;
  int * A0;

  D7 = size-2;
loop1:
  A0 = data;
  D6 = D7;
  D2 = *A0++;
loop2:
  D1 = D2;
  D2 = *A0++;
  if( D1>D2 ){
    D3=D1; D1=D2; D2=D3;
  }
  *(A0-2) = D1;
  D6--;
  if( D6 > -1) goto loop2;
  *(A0-1) = D2;
  D7--;
  if( D7 > -1) goto loop1;
}

void bench(int size1){
  int i;
  double time1, time2;
  int * data;
  int * A0;
  int size=size1*1024;
  int loops, loopsmax;

  data =malloc(size*4);   // Malloc enough for the array: 1K to 32K elements of 4 bytes each
  time1= mysecond();
  loopsmax=count[31]/count[size1-1];
  for(loops=loopsmax; loops>0 ; loops--){
    A0 = data;
    for (i=size; i>0 ; i--){   // refill in descending order before each sort
      *A0++=i;
    }
    sort(data, size);
  }
  time2= mysecond();
  printf("%2i K Element : %6.2f MB/sec\n",size/1024, loopsmax*count[size1-1]/1024*8/(time2-time1)/1024 );
  free(data);
}

int main(void) {
  int i;

  printf(HLINE);
  printf("SORTBENCH 1.1 (Gunnar von Boehn)\n");
  printf("Its a CPU benchmark that stresses CPU, DCache and branch prediction.\n");
  printf(HLINE);
  for (i=1; i<=32; i=i*2){
    bench(i);
  }
  return 0;
}
|
15 November 2016, 18:48 | #2 |
Targ Explorer
|
No efika or G3 here.
Tested on my AmigaOne X1000
-------------------------------------------------------------
SORTBENCH 1.1 (Gunnar von Boehn)
Its a CPU benchmark that stresses CPU, DCache and branch prediction.
-------------------------------------------------------------
 1 K Element : 2084.88 MB/sec
 2 K Element : 2096.05 MB/sec
 4 K Element : 2105.75 MB/sec
 8 K Element : 2134.63 MB/sec
16 K Element : 2112.34 MB/sec
32 K Element : 1712.92 MB/sec
|
15 November 2016, 20:24 | #3 |
Banned
Join Date: Jan 2010
Location: Kansas
Posts: 1,284
|
Someone is going to need a bigger graph. More interesting would be MB/s/MHz though. It shows which processors are weak (like most ARM and ColdFire) and gives a much better comparison of processor design. Compiler performance is a big part of this test as with DMIPS. One bug or missed optimization can make a huge difference as we saw in another thread with vbcc DMIPS compiled code.
|
15 November 2016, 20:38 | #4 |
Registered User
Join Date: Feb 2012
Location: #DrainTheSwamp
Posts: 4,545
|
compiler version makes a difference - (here on my peecee amd phenom II X4 955 3.2 ghz)
i686-w64-mingw32-gcc-5.4.0.exe -> 32 K Element : 6076.98 MB/sec
i686-pc-mingw32-gcc-4.7.3.exe -> 32 K Element : 4870.25 MB/sec

sort-gcc-4.7.3.exe: PE32 executable (console) Intel 80386 (stripped to external PDB), for MS Windows
sort-gcc-5.4.0.exe: PE32 executable (console) Intel 80386 (stripped to external PDB), for MS Windows

attached my binaries - compiled using cygwin/mingw32 [ -o sort -O2 sortbench.c ]
#1) can anyone attach an amiga os binary? the linker of my crosscompiler does not work :/
Last edited by emufan; 15 November 2016 at 20:55.
|
16 November 2016, 22:44 | #5 |
Amigan
Join Date: Feb 2012
Location: London
Posts: 1,323
|
4 year old Intel I7 3770 3.4GHz running Windows:
Linux VM: 32 K Element : 14682.84 MB/sec
Cygwin: 32 K Element : 14657.00 MB/sec
Windows 10 bash: 32 K Element : 14563.97 MB/sec
(pretty consistent)

A4000 68060 CyberStorm MkII gcc 2.95.3:
32 K Element : 34.87 MB/sec
Compiled with m68k-amigaos-gcc -noixemul -m68020-60 -o sort_060 -O2 sortbench.c -lm

Ouch!
Amiga executable attached.
|
16 November 2016, 23:17 | #6 | ||
Registered User
Join Date: Feb 2012
Location: #DrainTheSwamp
Posts: 4,545
|
Quote:
Quote:
#1) but using ur compiler syntax, I was able to make one on my own, thanks, the "-noixemul -lm" did the trick
#2) had to change %6.2f into %6f in the printf, otherwise i had "%6.2f MB/sec" in results list. I only have winuae here, so those results make no sense; but i get "915 MB/sec" in my fast as possible 030/882 setup
#3) attached 68000 and 68020-060 AmigaOS binaries.
Last edited by emufan; 16 November 2016 at 23:48.
|
||
17 November 2016, 00:23 | #7 | ||
Banned
Join Date: Jan 2010
Location: Kansas
Posts: 1,284
|
Quote:
http://www.apollo-core.com/sortbench...age=benchmarks Quote:
ARM Cortex A4 0.30 MB/s/MHz
ColdFire v3 MCF5329 0.44 MB/s/MHz
Raspberry Pi ARM 1176JZF-S 0.652 MB/s/MHz
ARM Feroceon 88FR131 0.69 MB/s/MHz
IBM Power 6 0.69 MB/s/MHz
Intel Atom 0.84 MB/s/MHz
IBM Power 7 1.16 MB/s/MHz
AmigaOne X1000 PA6T-1682M 1.19 MB/s/MHz
PPC G4 7447 1.26 MB/s/MHz
68060 1.60 MB/s/MHz (1.87 MB/s/MHz with assembler optimizations)
Intel Core 2 Duo 2.61 MB/s/MHz
Intel i7 3770 4.32 MB/s/MHz (or more if smaller element size helps)

A modern clocked 68060 with modern die size and modern cache sizes would likely outperform everything here except modern x86_64 processors in this single core benchmark. It should even be possible for performance to improve some with a modern die shrink (shorter timings for instructions and addressing modes). The Apollo core claims even better performance than the 68060. The claim at the link I posted would give 3.70 MB/s/MHz which would be faster than an Intel Core 2 Duo for this test and would be even more impressive. This is why I suggested MB/s/MHz and using a smaller element size for comparison but nobody ever listened to me.
|
||
17 November 2016, 07:18 | #8 | |
Total Chaos forever!
Join Date: Aug 2007
Location: Waterville, MN, USA
Age: 49
Posts: 2,223
|
Hi matthey,
Quote:
The AMD64 instruction set brought registers that were only used for segment pointers that hung lifeless in 32-bit enhanced code back into circulation as general-purpose registers bringing the total up to 16. Of course the '080 has an additional bank of 8 bringing the total for the '080 up to 24, as you have probably heard. |
|
17 November 2016, 09:03 | #9 | |
Apollo Team
Join Date: May 2014
Location: not far
Posts: 381
|
Quote:
Last edited by TuKo; 17 November 2016 at 09:15. |
|
17 November 2016, 09:19 | #10 |
Total Chaos forever!
Join Date: Aug 2007
Location: Waterville, MN, USA
Age: 49
Posts: 2,223
|
Here are the results for my Core2 Duo-based Mac Mini running 64-bit Debian Linux at 1.83 GHz:
-------------------------------------------------------------
SORTBENCH 1.1 (Gunnar von Boehn)
Its a CPU benchmark that stresses CPU, DCache and branch prediction.
-------------------------------------------------------------
 1 K Element : 6646.58 MB/sec
 2 K Element : 6817.39 MB/sec
 4 K Element : 6903.55 MB/sec
 8 K Element : 6927.06 MB/sec
16 K Element : 6876.59 MB/sec
32 K Element : 6868.39 MB/sec

Results are from the same executable compiled with GCC 4.9.2.

-edit-
The results using -mtune=core2 are shown below. Slightly better than before.

samuraicrow@SamsMacMini:~/Downloads$ ./sort2
-------------------------------------------------------------
SORTBENCH 1.1 (Gunnar von Boehn)
Its a CPU benchmark that stresses CPU, DCache and branch prediction.
-------------------------------------------------------------
 1 K Element : 6697.62 MB/sec
 2 K Element : 6809.94 MB/sec
 4 K Element : 6917.67 MB/sec
 8 K Element : 6947.43 MB/sec
16 K Element : 6934.64 MB/sec
32 K Element : 6934.44 MB/sec

@matthey
Attached is a new chart for you with a newer Gold2core candidate measured against the efficiency of your Core2Duo.
Last edited by Samurai_Crow; 17 November 2016 at 12:19. Reason: updated results with tuned executable
|
17 November 2016, 12:16 | #11 | ||
Banned
Join Date: Jan 2010
Location: Kansas
Posts: 1,284
|
Quote:
Quote:
Results will vary significantly due to many factors. My point was to produce a rough idea of the peak in-cache performance of different processors, which is roughly comparable. My numbers and info were based on the web site I linked to (and numbers given in this thread) and I do not know how reliable they are other than the 68060 numbers from my Amiga. There probably is a significant difference between early Core 2 Duos with small caches and later die shrink versions with larger caches. Results can vary significantly by API/ABI used as well. Samurai's results are significantly higher for example. They would be ~3.80 MB/s/MHz. The Core 2 Duo is a strong and efficient (for x86_64 architecture) processor. |
||
17 November 2016, 12:22 | #12 |
Total Chaos forever!
Join Date: Aug 2007
Location: Waterville, MN, USA
Age: 49
Posts: 2,223
|
Ninja'd by matthey's post. Check the edit of my previous post.
-edit-
Whoops. I see now that it was from the website that you got the results. I wonder what compiler was used to generate such poor results of the Core2Duo that should have been faster than my Mac Mini by spec.
-edit2-
Found the problem. All the results on the website were from Sortbench 1.0 while the source in this thread was Sortbench 1.1.
Last edited by Samurai_Crow; 17 November 2016 at 12:53. Reason: Noted source of information correctly.
|
17 November 2016, 16:11 | #13 | |
Registered User
Join Date: Feb 2008
Location: RNO
Posts: 1,010
|
Quote:
As already said, there's some difference which compiler you use, here are with gcc 2 and 5:

Pegasos1 G3/600MHz MorphOS gcc2:
-------------------------------------------------------------
SORTBENCH 1.1 (Gunnar von Boehn)
Its a CPU benchmark that stresses CPU, DCache and branch prediction.
-------------------------------------------------------------
 1 K Element : 736.16 MB/sec
 2 K Element : 736.15 MB/sec
 4 K Element : 734.83 MB/sec
 8 K Element : 734.24 MB/sec
16 K Element : 670.56 MB/sec
32 K Element : 649.91 MB/sec

Pegasos1 G3/600MHz MorphOS gcc5:
-------------------------------------------------------------
SORTBENCH 1.1 (Gunnar von Boehn)
Its a CPU benchmark that stresses CPU, DCache and branch prediction.
-------------------------------------------------------------
 1 K Element : 1102.51 MB/sec
 2 K Element : 1103.06 MB/sec
 4 K Element : 1101.12 MB/sec
 8 K Element : 1099.38 MB/sec
16 K Element : 984.12 MB/sec
32 K Element : 948.57 MB/sec

And then one test with my "Amiga laptop"...

PowerBook G4/1667MHz MorphOS gcc5:
-------------------------------------------------------------
SORTBENCH 1.1 (Gunnar von Boehn)
Its a CPU benchmark that stresses CPU, DCache and branch prediction.
-------------------------------------------------------------
 1 K Element : 3082.05 MB/sec
 2 K Element : 3082.15 MB/sec
 4 K Element : 3059.91 MB/sec
 8 K Element : 3069.99 MB/sec
16 K Element : 2877.81 MB/sec
32 K Element : 2807.30 MB/sec
|
|
17 November 2016, 20:04 | #14 | ||
Amigan
Join Date: Feb 2012
Location: London
Posts: 1,323
|
Quote:
Quote:
It would be interesting to see if GCC v6 makes a difference. This guy claims to have 6.2.0 working. I compiled v6 for 68k ages ago but without Amiga patches. |
||
17 November 2016, 21:52 | #15 |
Total Chaos forever!
Join Date: Aug 2007
Location: Waterville, MN, USA
Age: 49
Posts: 2,223
|
Neat link about GCC 6.2 building. Thanks!
|
27 November 2016, 18:28 | #16 |
Amigan
Join Date: Feb 2012
Location: London
Posts: 1,323
|
|
27 November 2016, 22:06 | #17 |
Total Chaos forever!
Join Date: Aug 2007
Location: Waterville, MN, USA
Age: 49
Posts: 2,223
|
I didn't try to build it. It doesn't seem like his patches are public or completely tested.
|
23 April 2017, 03:20 | #18 |
kLiker
Join Date: Mar 2011
Location: Brno / Czech Republic
Posts: 371
|
Can someone please make and share WarpOS executable?
|