English Amiga Board


Old 07 April 2023, 18:14   #1
bebbo
bye
 
Join Date: Jun 2016
Location: Some / Where
Posts: 680
Looking for C/C++ programs to benchmark compilers

I am looking for sources of C (and maybe C++) programs that can be used to compare the aspects of the generated code. Such a program should have only one source file and not use any parameters, to keep the scripts simple.


I want to compare size and execution time - anything else?


To compare I'll use Vamos, WinUAE and real Amigas. Also an archive with the executables plus a script to run them all should be available at the end.


Any interest in contributing such a test program?


Attach it here or mail it to me.


THX
Old 17 April 2023, 22:41   #2
bebbo
bye
 
Join Date: Jun 2016
Location: Some / Where
Posts: 680
I'm starting with
* sieve
* tscp182

and the first results are here: https://franke.ms/bench2/chart.html
Old 17 April 2023, 23:04   #3
Thomas Richter
Registered User
 
Join Date: Jan 2019
Location: Germany
Posts: 3,233
Sorry, I do not understand. Code generated by what? The code generated by an Amiga C compiler does not depend on the (real or virtual) environment it was compiled on; only the execution speed differs. vamos itself does not generate any code. It is an interpreter around Musashi, except for all OS calls, which are executed in Python on the native machine. WinUAE does have a JIT.
Old 17 April 2023, 23:39   #4
bebbo
bye
 
Join Date: Jun 2016
Location: Some / Where
Posts: 680
Quote:
Originally Posted by Thomas Richter View Post
Sorry, I do not understand. Code generated by what?
by the different compilers

Quote:
Originally Posted by Thomas Richter View Post
The code generated by an Amiga C compiler does not depend on the (real or virtual) environment it was compiled on, only the execution speed differs.
hopefully, yes :-)
Quote:
Originally Posted by Thomas Richter View Post
vamos itself does not generate any code. It is an interpreter around Musashi, except for all OS calls, which are executed in Python on the native machine. WinUAE does have a JIT.
And vamos is used to count the CPU cycles for the programs.


Fewer cycles used is better.





If someone provides a more reasonable text for what I am doing, I might consider using that text...


btw: the benchmark sources and all generated programs can be downloaded here: https://franke.ms/bench2/bench2.zip
Old 18 April 2023, 08:57   #5
Thomas Richter
Registered User
 
Join Date: Jan 2019
Location: Germany
Posts: 3,233
Quote:
Originally Posted by bebbo View Post
by the different compilers
WinUAE, vamos and a real machine are not "compilers". They can all execute compilers. The compiled code does depend on the compiler, but not on the environment in which it was compiled.


Quote:
Originally Posted by bebbo View Post
hopefully, yes :-)
Definitely yes.


Quote:
Originally Posted by bebbo View Post

And vamos is used to count the cpu cycles for the programs.
Not really. First, I am not aware that Musashi has this option, but even if it had, the result would be wrong. From the 68020 onwards, the number of cycles spent by a processor on an instruction no longer depends on the instruction alone. It depends on what is in the cache, and on whether the instruction can partially overlap with the previous and the next instruction.



Cycle counting is a very bad way to learn about software performance. Look at the source code, and learn about algorithmic complexity and big-O notation.
Old 18 April 2023, 09:35   #6
alkis
Registered User
 
Join Date: Dec 2010
Location: Athens/Greece
Age: 53
Posts: 719
Thomas, I am pretty sure Bebbo knows what a compiler is, since, you know... he ported gcc to the amigaos target.

Anyway, the point of the chart is to compare the code produced by various compilers. It's a "let's see which compiler comes up with the best code" situation.

Cycle counting from vamos for the 68000 seems pretty good in my experiments.

For example
Code:
vamos -v prime 1000000 1
10:18:27.122       main:   INFO:  done. exit code=0
10:18:27.123       main:   INFO:  total cycles: 50779208
10:18:27.123       main:   INFO:  vamos is exiting
Counted 78498 primes up to 1000000. (Did it 1 times)
which for an A500 (PAL) translates to 7.something seconds
Code:
50779208/7.09e6
	~7.16208857545839210155
Now, if I fire up fs-uae and emulate an A500 and run the code, it does run in 7.something seconds.

The benefit of using vamos cycle counting is that you "measure" the same number no matter what your host machine is. I get the above number on a Ryzen 9, on an Intel machine and on a Raspberry Pi. So if any future run gives a different number, it is because the compiler-produced code changed. And let's say you are working on tuning a compiler; then you could see whether your modifications drive the numbers down or up. I think that's the objective here.
Old 18 April 2023, 17:16   #7
bebbo
bye
 
Join Date: Jun 2016
Location: Some / Where
Posts: 680
Quote:
Originally Posted by alkis View Post
Thomas, I am pretty sure Bebbo knows what a compiler is, since, you know... he ported gcc to the amigaos target.


Alkis, thank you for stepping in. There are those who have learned to ask politely, and those who have not.


Back to the topics - I omit the rants...


1. Does the same compiler produce identical code when running on different platforms?


To be precise: it's not the same compiler, it is only compiled from the same sources. I don't have an example at hand (I think it was on a 32-bit RasPi), but consider that the compiler sometimes has to choose between statements that it considers equivalent; different memory addresses and the resulting hashes can then lead to different results.


But that's not a topic here.




2. Is cycle counting a good idea?

I agree with Alkis: It's a good idea for simple CPUs like the 68000.
But what about more complex CPUs, which contain caches and whatever else?
From my point of view it is a good idea there too, because the cycles per instruction are the essential basis that the compiler can use to select the best instructions from its point of view. In some compilers - like gcc - you can further model the CPU, and that model can then be used to schedule the instructions.
In this respect, the total cycles per program are still a good indication, even though they will not match the real values perfectly. One can still run these tests on real systems at any time and evaluate those results. TBD.



That's a reasonable topic for me.



3. For me it's interesting to observe different compilers plus the evolution of the gcc compiler and maybe more compilers like LLVM - if I can get these to work.



For example, if you look at SIEVE, you find that -Os from gcc-6.5.0b is slower than -Os from gcc-10.2.1b. This effect can also be observed from gcc-9.5.0-elf to gcc 10.4.0-elf. The difference results from the fact that as of version 10 the built-in function memset is also recognized with -Os and -O2 and memset is significantly faster than the loops generated by the compiler. So backporting this change might be an option.
It also shows that the old gcc-2.95.3 does a really good job.



Looking at TSCP182 is also interesting.

For -O2 and -O3, gcc-6.5.0-elf yields faster code than all its successors. That's something I want to find the reason for.
Or compare gcc-13av2 and gcc-13 (both experimental branches for the Amiga), which differ only in the provided cost model.


Maybe there is a benchmark where a recent gcc version provides a quantum leap in performance?


... next is fixing gcc-2.95.3-elf for tscp182...
Old 18 April 2023, 17:31   #8
AnimaInCorpore
Registered User
 
Join Date: Nov 2012
Location: Willich/Germany
Posts: 232
https://netlib.org/benchmark/linpackc
Old 18 April 2023, 17:41   #9
bebbo
bye
 
Join Date: Jun 2016
Location: Some / Where
Posts: 680
Quote:
Originally Posted by AnimaInCorpore View Post
Thank you, interesting.
Uses ~2MB stack and a lot of floating point stuff, hm. Maybe not so ideal for the 68000?

EDIT: err, those are static variables^^
Old 19 April 2023, 20:22   #10
bebbo
bye
 
Join Date: Jun 2016
Location: Some / Where
Posts: 680
I added clang-17 - the experimental m68k target of llvm - and got sieve to work. The tscp182 benchmark fails with an internal error...
Old 19 April 2023, 21:10   #11
alkis
Registered User
 
Join Date: Dec 2010
Location: Athens/Greece
Age: 53
Posts: 719
Double-precision floating-point performance and some I/O.

Code:
/* The Computer Language Benchmarks Game
 * https://salsa.debian.org/benchmarksgame-team/benchmarksgame/

   contributed by Greg Buchholz
*/

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
  int w, h, bit_num = 0;
  char byte_acc = 0;
  int i, iter = 50;
  double x, y, limit = 2.0;
  double Zr, Zi, Cr, Ci, Tr, Ti;

  if (argc != 2)
    w = h = 100;
  else
    w = h = atoi(argv[1]);

  printf("P4\n%d %d\n", w, h);

  for (y = 0; y < h; ++y) {
    for (x = 0; x < w; ++x) {
      Zr = Zi = Tr = Ti = 0.0;
      Cr = (2.0 * x / w - 1.5);
      Ci = (2.0 * y / h - 1.0);

      for (i = 0; i < iter && (Tr + Ti <= limit * limit); ++i) {
        Zi = 2.0 * Zr * Zi + Ci;
        Zr = Tr - Ti + Cr;
        Tr = Zr * Zr;
        Ti = Zi * Zi;
      }

      byte_acc <<= 1;
      if (Tr + Ti <= limit * limit)
        byte_acc |= 0x01;

      ++bit_num;

      if (bit_num == 8) {
        putc(byte_acc, stdout);
        byte_acc = 0;
        bit_num = 0;
      } else if (x == w - 1) {
        byte_acc <<= (8 - w % 8);
        putc(byte_acc, stdout);
        byte_acc = 0;
        bit_num = 0;
      }
    }
  }
}
Sample compile & run
Code:
m68k-amigaos-gcc -mcrt=nix13 -O3 -o mandelbrot mandelbrot.c -lm
vamos -v ./mandelbrot >foo
22:02:38.801       main:   INFO:  done. exit code=0
22:02:38.801       main:   INFO:  total cycles: 693865522
22:02:38.801       main:   INFO:  vamos is exiting

file foo
foo: Netpbm image data, size = 100 x 100, rawbits, bitmap
The produced 'foo' can be viewed with 'xdg-open foo'. The output file foo from the amigaos-gcc-6.5 build matches the output file foo from the Linux gcc 11.3.0 build.
 

