English Amiga Board


Old 15 November 2016, 15:41   #1
Samurai_Crow
Total Chaos forever!

Join Date: Aug 2007
Location: Ft. Collins, CO USA
Age: 43
Posts: 1,088
Sorting benchmark

Could someone please compile this code on an AmigaOne with a G3 or Efika and post the results on this thread?

Code:
/*-----------------------------------------------------------------------*/
/*
 CPU  Stress test 
 (Gunnar von Boehn) Feel free to do with the code what you want.

 The test is small but stresses the following CPU features:
 - DataCache
 - Conditional Code execution / Branch prediction
 - Loop acceleration  
 - Memory Hazard Detection

 = These are all important CPU features that are stressed.
 The test was taken from the internal test cases of the APOLLO CPU project.


 Compile with O2 
 E.g: gcc -o sort -O2 sortbench.c
 
 */
# include <stdio.h>
# include <stdlib.h>
# include <float.h>
# include <sys/time.h>
# define HLINE "-------------------------------------------------------------\n"



int count[32]={ 524799,2098175,4720127,8390655,
          13109759,18877439,25693695,33558527,
          42471935,52433919,63444479,75503615,
          88611327,102767615,117972479,134225919,
          151527935,169878527,189277695,209725439,
          231221759,253766655,277360127,302002175,
          327692799,354431999,382219775,411056127,
          440941055,471874559,503856639,536887295};


 
double mysecond() {
        struct timeval tp;
        struct timezone tzp;
        int i;

        i = gettimeofday(&tp,&tzp);
        return ( (double) tp.tv_sec + (double) tp.tv_usec * 1.e-6 );
}

/* 
 68K Code
  loop2:
       move.l  D2,D1        ;                                           1  -  
       move.l  (a0)+,D2      ; 2nd value in register D2             1  1                      
       cmp.l   D1,D2        ; Compare values                            2  1 
       bge     noswap       ; Branch if greater than or equal to        3  -         
 doswap:
       exg     D1,D2        ;                                       3  1
 noswap:
       move.l  D1,-8(a0)    ; Store 1st ordered values                  4  1            
       subq.l  #1,D6
       bge     d6,loop2     ; inner loop                   4  1
*/

void sort(int * data, int size){
    int D6,D7,D1,D2,D3; int * A0;   
    D7 = size-2;
loop1:
    A0 = data;
    D6 = D7;
    D2 = *A0++;
loop2:
    D1 = D2;
    D2 = *A0++;
    if( D1>D2 ){
      D3=D1; D1=D2; D2=D3;
    }
    *(A0-2) = D1;
    D6--; if( D6 > -1) goto loop2;
    *(A0-1) = D2;
    D7--; if( D7 > -1) goto loop1;
}

void bench(int size1){
    int i; double time1, time2;
    int * data;
    int * A0;
    int size=size1*1024;
    int loops, loopsmax;

    data = malloc(size*sizeof(int)); // Allocate the array: 1K to 32K ints (4 KB to 128 KB with 4-byte ints)
    if (data == NULL) { printf("malloc failed\n"); return; }
    time1= mysecond(); 

    loopsmax=count[31]/count[size1-1];
    for(loops=loopsmax; loops>0 ; loops--){

      A0 = data;
      for (i=size; i>0 ; i--){
        *A0++=i;
      }
      sort(data, size);
    }
    time2= mysecond(); 

    printf("%2i K Element :  %6.2f MB/sec\n",size/1024, loopsmax*count[size1-1]/1024*8/(time2-time1)/1024 );
    free(data);
}
int main(void) {
    int i;

    printf(HLINE);
    printf("SORTBENCH 1.1 (Gunnar von Boehn)\n");
    printf("Its a CPU benchmark that stresses CPU, DCache and branch prediction.\n");
    printf(HLINE);

    /* Array sizes of 1K, 2K, 4K, 8K, 16K and 32K elements */
    for (i=1; i<=32; i=i*2){
       bench(i);
    }
    return 0;
}
We want to gauge against the current experimental Gold2 Core on the Vampire.
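
For anyone who finds the goto-based sort() in the listing hard to follow: it appears to be a plain bubble sort that carries the running maximum in a register. Below is a minimal sketch of the same algorithm with structured loops. The name sort_structured is hypothetical (not part of SORTBENCH), and unlike the original it does nothing for arrays shorter than two elements. A compiler may generate different code for it, so it is probably not a drop-in replacement when comparing benchmark numbers.

```c
/* Hypothetical structured rewrite of the benchmark's sort().
   Register names from the original are noted in the comments. */
void sort_structured(int *data, int size)
{
    int pass, k;

    for (pass = size - 2; pass >= 0; pass--) {   /* D7: outer loop counter */
        int *p = data;                           /* A0 */
        int hi = *p++;                           /* D2: largest value seen so far */

        for (k = pass; k >= 0; k--) {            /* D6: inner loop counter */
            int lo = hi;                         /* D1 */
            hi = *p++;
            if (lo > hi) {                       /* keep the larger value in hi */
                int tmp = lo; lo = hi; hi = tmp;
            }
            p[-2] = lo;                          /* store the smaller value */
        }
        p[-1] = hi;                              /* maximum lands at the end */
    }
}
```

Each outer pass bubbles the largest remaining value to the end of the unsorted part, so pass k needs one comparison fewer than the pass before it.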
Attached Thumbnails: sortbench.png
Old 15 November 2016, 18:48   #2
DDNI
Targ Explorer

Join Date: Mar 2006
Location: Northern Ireland
Age: 44
Posts: 5,181
No efika or G3 here.
Tested on my AmigaOne X1000

-------------------------------------------------------------
SORTBENCH 1.1 (Gunnar von Boehn)
Its a CPU benchmark that stresses CPU, DCache and branch prediction.
-------------------------------------------------------------
1 K Element : 2084.88 MB/sec
2 K Element : 2096.05 MB/sec
4 K Element : 2105.75 MB/sec
8 K Element : 2134.63 MB/sec
16 K Element : 2112.34 MB/sec
32 K Element : 1712.92 MB/sec
Old 15 November 2016, 20:24   #3
matthey
Banned
 
Join Date: Jan 2010
Location: Kansas
Posts: 1,284
Quote:
Originally Posted by DDNI View Post
No efika or G3 here.
Tested on my AmigaOne X1000

32 K Element : 1712.92 MB/sec
Someone is going to need a bigger graph. More interesting would be MB/s/MHz though. It shows which processors are weak (like most ARM and ColdFire) and gives a much better comparison of processor design. Compiler performance is a big part of this test as with DMIPS. One bug or missed optimization can make a huge difference as we saw in another thread with vbcc DMIPS compiled code.
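
Since the MB/sec numbers are what everyone is comparing, here is a rough sketch of how bench() seems to arrive at them. The count[] table is not explained in the source. The formula below is only my reading of it, checked against a few entries: entry k-1 matches (n-1)(n+2)/2 for n = 1024*k elements, which is the number of inner-loop iterations plus outer-loop iterations of sort(). The printf then credits 8 bytes per counted iteration, presumably one 4-byte load and one 4-byte store. The function names are hypothetical.

```c
/* Iterations sort() performs on n elements (n >= 2):
   n(n-1)/2 inner iterations plus (n-1) outer ones = (n-1)(n+2)/2.
   This reproduces count[0] = 524799 for n = 1024. */
long sort_iterations(long n)
{
    return (n - 1) * (n + 2) / 2;
}

/* MB/sec the way bench() prints it: integer division by 1024,
   8 bytes credited per iteration, then divide by elapsed seconds
   and by 1024 again. `iterations` stands for loopsmax*count[size1-1]. */
double sortbench_mb_per_sec(long iterations, double seconds)
{
    return (double)(iterations / 1024 * 8) / seconds / 1024.0;
}
```

Because loopsmax is chosen as count[31]/count[size1-1], every array size does roughly the same total number of iterations, so the run time per size is comparable.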
Old 15 November 2016, 20:38   #4
emufan
Registered User
 
Join Date: Feb 2012
Location: #DrainTheSwamp
Posts: 4,546
compiler version makes a difference - (here on my peecee amd phenom II X4 955 3.2 ghz)

i686-w64-mingw32-gcc-5.4.0.exe -> 32 K Element : 6076.98 MB/sec
i686-pc-mingw32-gcc-4.7.3.exe -> 32 K Element : 4870.25 MB/sec

sort-gcc-4.7.3.exe: PE32 executable (console) Intel 80386 (stripped to external PDB), for MS Windows
sort-gcc-5.4.0.exe: PE32 executable (console) Intel 80386 (stripped to external PDB), for MS Windows

attached my binaries - compiled using cygwin/mingw32 [ -o sort -O2 sortbench.c ]

#1) can anyone attach an amiga os binary? the linker of my crosscompiler does not work :/
Attached Files
File Type: zip sortbench-win32.zip (16.2 KB, 53 views)

Last edited by emufan; 15 November 2016 at 20:55.
Old 16 November 2016, 22:44   #5
nogginthenog
Amigan

 
Join Date: Feb 2012
Location: London
Posts: 624
4 year old Intel I7 3770 3.4GHz running Windows:
Linux VM: 32 K Element : 14682.84 MB/sec
Cygwin: 32 K Element : 14657.00 MB/sec
Windows 10 bash: 32 K Element : 14563.97 MB/sec
(pretty consistent)

A4000 68060 CyberStorm MkII gcc 2.95.3:
32 K Element : 34.87 MB/sec

Compiled with m68k-amigaos-gcc -noixemul -m68020-60 -o sort_060 -O2 sortbench.c -lm

Ouch! Amiga executable attached.
Attached Files
File Type: zip scsi_060.zip (317 Bytes, 50 views)
Old 16 November 2016, 23:17   #6
emufan
Quote:
Originally Posted by nogginthenog View Post
4 year old Intel I7 3770 3.4GHz running Windows:
Linux VM: 32 K Element : 14682.84 MB/sec
Cygwin: 32 K Element : 14657.00 MB/sec
Windows 10 bash: 32 K Element : 14563.97 MB/sec
(pretty consistent)
pretty awesome - while my amd burns down 120 wattage

Quote:
Ouch! Amiga executable attached.
thanks - but this is just a log of sortbench, not the amiga binary

#1) but using ur compiler syntax, I was able to make one on my own, thanks, the "-noixemul -lm" did the trick

#2) had to change %6.2f into %6f in the printf, otherwise i had "%6.2f MB/sec" in results list.
I only have winuae here, so those results make no sense; but i get "915 MB/sec" in my fast as possible 030/882 setup

#3) attached 68000 and 68020-060 AmigaOS binaries.
Attached Files
File Type: zip sort-000-020.zip (15.6 KB, 55 views)

Last edited by emufan; 16 November 2016 at 23:48.
Old 17 November 2016, 00:23   #7
matthey
Quote:
Originally Posted by nogginthenog View Post
4 year old Intel I7 3770 3.4GHz running Windows:
Linux VM: 32 K Element : 14682.84 MB/sec
Cygwin: 32 K Element : 14657.00 MB/sec
Windows 10 bash: 32 K Element : 14563.97 MB/sec
(pretty consistent)
Modern x86_64 processors are strong at this test. They did well in test results posted at the following link also.

http://www.apollo-core.com/sortbench...age=benchmarks

Quote:
Originally Posted by nogginthenog View Post
A4000 68060 CyberStorm MkII gcc 2.95.3:
32 K Element : 34.87 MB/sec

Compiled with m68k-amigaos-gcc -noixemul -m68020-60 -o sort_060 -O2 sortbench.c -lm

Ouch!
The 68060 results are actually impressive. 32k elements is too big to fit in the DCache and is a poor size to choose when comparing the performance of older processors. A smaller number of elements were chosen for comparison in the results I linked above. The performance in the cache should scale linearly and different processors can be compared by looking at MB/s/MHz. My 68060@75MHz CSMKIII scored ~120 MB/s with outdated compilers and I achieved ~140 MB/s with assembler optimizations when the DCache could hold all the data.
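
To make the cache argument concrete, here is a small sketch that computes the array footprint for each test size and checks it against a data cache size. The 8 KB figure for the 68060 DCache is an assumption brought in from outside this thread, and the helper names are hypothetical. The check ignores everything else competing for cache space (stack, other data), so treat it as an upper bound on what fits.

```c
#include <stddef.h>

/* Bytes occupied by the array for a test of kilo_elements * 1024 ints. */
size_t working_set_bytes(size_t kilo_elements)
{
    return kilo_elements * 1024u * sizeof(int);
}

/* Returns 1 if the whole array fits in a data cache of dcache_bytes.
   Example assumption: dcache_bytes = 8192 for a 68060. */
int fits_in_dcache(size_t kilo_elements, size_t dcache_bytes)
{
    return working_set_bytes(kilo_elements) <= dcache_bytes;
}
```

With 4-byte ints the 1K test needs 4 KB and the 32K test needs 128 KB, so under the 8 KB assumption only the smallest sizes run entirely from cache.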

ARM Cortex A4 0.30 MB/s/MHz
ColdFire v3 MCF5329 0.44 MB/s/MHz
Raspberry Pi ARM 1176JZF-S 0.652 MB/s/MHz
ARM Feroceon 88FR131 0.69 MB/s/MHz
IBM Power 6 0.69 MB/s/MHz
Intel Atom 0.84 MB/s/MHz
IBM Power 7 1.16 MB/s/MHz
AmigaOne X1000 PA6T-1682M 1.19 MB/s/MHz
PPC G4 7447 1.26 MB/s/MHz
68060 1.60 MB/s/MHz (1.87 MB/s/MHz with assembler optimizations)
Intel Core 2 Duo 2.61 MB/s/MHz
Intel i7 3770 4.32 MB/s/MHz (or more if smaller element size helps)

A modern clocked 68060 with modern die size and modern cache sizes would likely outperform everything here except modern x86_64 processors in this single core benchmark. It should even be possible for performance to improve some with a modern die shrink (shorter timings for instructions and addressing modes). The Apollo core claims even better performance than the 68060. The claim at the link I posted would give 3.70 MB/s/MHz which would be faster than an Intel Core 2 Duo for this test and would be even more impressive. This is why I suggested MB/s/MHz and using a smaller element size for comparison but nobody ever listened to me.
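
The MB/s/MHz figures are just throughput divided by clock speed. A minimal sketch, with hypothetical function names, in case anyone wants to normalise their own results the same way. The second helper assumes in-cache performance scales linearly with clock, which is the assumption stated above and not something this code verifies.

```c
/* Throughput normalised by clock speed (MB/s/MHz). */
double mb_per_sec_per_mhz(double mb_per_sec, double clock_mhz)
{
    return mb_per_sec / clock_mhz;
}

/* Projected in-cache throughput at another clock speed,
   ASSUMING performance scales linearly with MHz. */
double projected_mb_per_sec(double efficiency, double clock_mhz)
{
    return efficiency * clock_mhz;
}
```

Using numbers from this thread: about 120 MB/s on a 68060 at 75 MHz gives 120 / 75 = 1.60 MB/s/MHz, and 14682.84 MB/s on the i7 3770 at 3400 MHz gives roughly 4.32 MB/s/MHz.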
Old 17 November 2016, 07:18   #8
Samurai_Crow
Hi matthey,

Quote:
A modern clocked 68060 with modern die size and modern cache sizes would likely outperform everything here except modern x86_64 processors in this single core benchmark. It should even be possible for performance to improve some with a modern die shrink (shorter timings for instructions and addressing modes). The Apollo core claims even better performance than the 68060. The claim at the link I posted would give 3.70 MB/s/MHz which would be faster than an Intel Core 2 Duo for this test and would be even more impressive.
It shouldn't be too surprising. I think Gunnar is figuring out many of the same cache optimizations that are used by Intel. One additional advantage that the '060 and '080 share is that the code density is slightly higher than most of the others.

The AMD64 instruction set brought registers that were only used for segment pointers that hung lifeless in 32-bit enhanced code back into circulation as general-purpose registers bringing the total up to 16. Of course the '080 has an additional bank of 8 bringing the total for the '080 up to 24, as you have probably heard.
Old 17 November 2016, 09:03   #9
TuKo
Apollo Team

Join Date: May 2014
Location: not far
Posts: 240
Quote:
ARM Cortex A4 0.30 MB/s/MHz
ColdFire v3 MCF5329 0.44 MB/s/MHz
Raspberry Pi ARM 1176JZF-S 0.652 MB/s/MHz
ARM Feroceon 88FR131 0.69 MB/s/MHz
IBM Power 6 0.69 MB/s/MHz
Intel Atom 0.84 MB/s/MHz
IBM Power 7 1.16 MB/s/MHz
AmigaOne X1000 PA6T-1682M 1.19 MB/s/MHz
PPC G4 7447 1.26 MB/s/MHz
68060 1.60 MB/s/MHz (1.87 MB/s/MHz with assembler optimizations)
Intel Core 2 Duo 2.61 MB/s/MHz
Intel i7 3770 4.32 MB/s/MHz (or more if smaller element size helps)
Thanks for these results, they are very interesting ! Can you please give us model details for Core 2 Duo ?

Last edited by TuKo; 17 November 2016 at 09:15.
Old 17 November 2016, 09:19   #10
Samurai_Crow
Here are the results for my Core2 Duo-based Mac Mini running 64-bit Debian Linux at 1.83 GHz:
-------------------------------------------------------------
SORTBENCH 1.1 (Gunnar von Boehn)
Its a CPU benchmark that stresses CPU, DCache and branch prediction.
-------------------------------------------------------------
1 K Element : 6646.58 MB/sec
2 K Element : 6817.39 MB/sec
4 K Element : 6903.55 MB/sec
8 K Element : 6927.06 MB/sec
16 K Element : 6876.59 MB/sec
32 K Element : 6868.39 MB/sec

Results are from the same executable compiled with GCC 4.9.2 .

-edit-
The results using -mtune=core2 are shown below. Slightly better than before.

samuraicrow@SamsMacMini:~/Downloads$ ./sort2
-------------------------------------------------------------
SORTBENCH 1.1 (Gunnar von Boehn)
Its a CPU benchmark that stresses CPU, DCache and branch prediction.
-------------------------------------------------------------
1 K Element : 6697.62 MB/sec
2 K Element : 6809.94 MB/sec
4 K Element : 6917.67 MB/sec
8 K Element : 6947.43 MB/sec
16 K Element : 6934.64 MB/sec
32 K Element : 6934.44 MB/sec

@matthey

Attached is a new chart for you with a newer Gold2core candidate measured against the efficiency of your Core2Duo.
Attached Thumbnails: Sortbench2chart.png

Last edited by Samurai_Crow; 17 November 2016 at 12:19. Reason: updated results with tuned executable
Old 17 November 2016, 12:16   #11
matthey
Quote:
Originally Posted by Samurai_Crow View Post
It shouldn't be too surprising. I think Gunnar is figuring out many of the same cache optimizations that are used by Intel. One additional advantage that the '060 and '080 share is that the code density is slightly higher than most of the others.
I'm not sure code density makes much of a difference for this benchmark. The code is relatively small and should fit in the ICache of all but the oldest processors. The Data takes the same space in the DCache of all processors. Furthermore, if Gunnar cared about code density then maybe he wouldn't have abandoned research and enhancements which improve it.

Quote:
Originally Posted by Samurai_Crow View Post
The AMD64 instruction set brought registers that were only used for segment pointers that hung lifeless in 32-bit enhanced code back into circulation as general-purpose registers bringing the total up to 16. Of course the '080 has an additional bank of 8 bringing the total for the '080 up to 24, as you have probably heard.
The 68080, as Gunnar calls it, isn't using any of those extra registers for this benchmark. They are unlikely to ever be used by any compiler. This benchmark is testing only the decades-old API and stack-based ABI on a more modern 68k CPU design. The AMD64/x86_64 enhancements of 64-bit registers and an improved ABI passing function arguments in registers are being utilized.

Quote:
Originally Posted by TuKo View Post
Thanks for these results, they are very interesting ! Can you please give us model details for Core 2 Duo ?
Results will vary significantly due to many factors. My point was to produce a rough idea of the peak performance in cache of different processors which is roughly comparable. My numbers and info were based on the web site I linked to (and numbers given in this thread), and I do not know how reliable they are other than the 68060 numbers from my Amiga. There probably is a significant difference between early Core 2 Duos with small caches and later die shrink versions with larger caches. Results can vary significantly by API/ABI used as well. Samurai's results are significantly higher for example. They would be ~3.80 MB/s/MHz. The Core 2 Duo is a strong and efficient (for x86_64 architecture) processor.
Old 17 November 2016, 12:22   #12
Samurai_Crow
Ninja'd by matthey's post. Check the edit of my previous post.

-edit-
Whoops. I see now that it was from the website that you got the results. I wonder what compiler was used to generate such poor results of the Core2Duo that should have been faster than my Mac Mini by spec.

-edit2-
Found the problem. All the results on the website were from Sortbench 1.0 while the source in this thread was Sortbench 1.1.

Last edited by Samurai_Crow; 17 November 2016 at 12:53. Reason: Noted source of information correctly.
Old 17 November 2016, 16:11   #13
jPV
Registered User
Join Date: Feb 2008
Location: RNO
Posts: 601
Quote:
Originally Posted by Samurai_Crow View Post
Could someone please compile this code on an AmigaOne with a G3 or Efika and post the results on this thread?
Is Pegasos 1 with G3/600MHz and MorphOS close enough?

As already said, which compiler you use makes some difference; here are the results with gcc 2 and 5:

Pegasos1 G3/600MHz MorphOS gcc2:

-------------------------------------------------------------
SORTBENCH 1.1 (Gunnar von Boehn)
Its a CPU benchmark that stresses CPU, DCache and branch prediction.
-------------------------------------------------------------
1 K Element : 736.16 MB/sec
2 K Element : 736.15 MB/sec
4 K Element : 734.83 MB/sec
8 K Element : 734.24 MB/sec
16 K Element : 670.56 MB/sec
32 K Element : 649.91 MB/sec


Pegasos1 G3/600MHz MorphOS gcc5:

-------------------------------------------------------------
SORTBENCH 1.1 (Gunnar von Boehn)
Its a CPU benchmark that stresses CPU, DCache and branch prediction.
-------------------------------------------------------------
1 K Element : 1102.51 MB/sec
2 K Element : 1103.06 MB/sec
4 K Element : 1101.12 MB/sec
8 K Element : 1099.38 MB/sec
16 K Element : 984.12 MB/sec
32 K Element : 948.57 MB/sec


And then one test with my "Amiga laptop"...

PowerBook G4/1667MHz MorphOS gcc5:

-------------------------------------------------------------
SORTBENCH 1.1 (Gunnar von Boehn)
Its a CPU benchmark that stresses CPU, DCache and branch prediction.
-------------------------------------------------------------
1 K Element : 3082.05 MB/sec
2 K Element : 3082.15 MB/sec
4 K Element : 3059.91 MB/sec
8 K Element : 3069.99 MB/sec
16 K Element : 2877.81 MB/sec
32 K Element : 2807.30 MB/sec
Old 17 November 2016, 20:04   #14
nogginthenog
Quote:
Originally Posted by emufan View Post
thanks - but this is just a log of sortbench, not the amiga binary
Doh!

Quote:
#1) but using ur compiler syntax, I was able to make one on my own, thanks, the "-noixemul -lm" did the trick

#2) had to change %6.2f into %6f in the printf, otherwise i had "%6.2f MB/sec" in results list.
I got similar results until I added -lm. -m68020-60 made little difference.
It would be interesting to see if GCC v6 makes a difference. This guy claims to have 6.20 working. I compiled v6 for 68k ages ago but without Amiga patches.
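
For builds where printf's floating-point conversions are not available (as reported above when linking without -lm), one possible workaround is to print the value with integer conversions only. This is just a sketch under that assumption. The helper name format_mb_per_sec is hypothetical, it only handles non-negative values, and it relies on snprintf being present in the C library in use.

```c
#include <stdio.h>

/* Writes mb_per_sec with two decimals into buf using only %ld conversions,
   giving the same layout as "%6.2f" for non-negative values.
   Returns whatever snprintf returns. */
int format_mb_per_sec(char *buf, size_t len, double mb_per_sec)
{
    long hundredths = (long)(mb_per_sec * 100.0 + 0.5);  /* round to 0.01 */

    return snprintf(buf, len, "%3ld.%02ld",
                    hundredths / 100, hundredths % 100);
}
```

The resulting string can then be printed with "%s" in place of the "%6.2f" conversion in bench().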
Old 17 November 2016, 21:52   #15
Samurai_Crow
Neat link about GCC 6.2 building. Thanks!
Old 27 November 2016, 18:28   #16
nogginthenog
Quote:
Originally Posted by Samurai_Crow View Post
Neat link about GCC 6.2 building. Thanks!
Did you try to build it? It doesn't look like Stefan has published his patches.
Old 27 November 2016, 22:06   #17
Samurai_Crow
I didn't try to build it. It doesn't seem like his patches are public or completely tested.
Old 23 April 2017, 03:20   #18
jack-3d
kLiker
 
Join Date: Mar 2011
Location: Brno / Czech Republic
Posts: 348
Can someone please make and share WarpOS executable?