English Amiga Board


Go Back   English Amiga Board > Coders > Coders. General

 
 
Thread Tools
Old 09 July 2023, 21:27   #421
Don_Adan
Registered User
 
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 55
Posts: 1,975
Thanks. I dont coding from long time, but after quick check, I dont know why You cleared BSS section matrix, because this is always 0, if this is one shoot program. And clearing working buff via move byte command is not very effective.
Don_Adan is offline  
Old 09 July 2023, 21:49   #422
alkis
Registered User
 
Join Date: Dec 2010
Location: Athens/Greece
Age: 53
Posts: 719
Quote:
Originally Posted by paraj View Post
@alkis: Not to stoke the fires, and just out of interest, could you also post the comparable C code (just the important loop/function will do). Could be interesting to see what the compiler did better (for learning, not to hate on your code).
Sure, I'll just put the whole thing so it can be compilable.

Code:
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned char ptype;

int isqrt(int x) {
  int q = 1, r = 0;
  while (q <= x) {
    q <<= 2;
  }
  while (q > 1) {
    int t;
    q >>= 2;
    t = x - r - q;
    r >>= 1;
    if (t >= 0) {
      x = t;
      r += q;
    }
  }
  return r;
}

// #define COMPUTE_SUM

int computePrimes(ptype *table, int n) {
  int ret = 1; // This is our counter. Starts at 1 to account for number 2
  int i;       //  which is prime, even being even...how odd :P
  int sq = isqrt(n + n) >> 1;
  int step;
  int j;
#ifdef COMPUTE_SUM
  unsigned long long sum64 = 2;
  unsigned int sumh = 0, suml = 2, t;
#endif

  for (i = 0; i <= sq; ++i) {
    if (table[i] == 0) {
      // This is a prime
      ++ret;
      step = 2 * i + 3;
#ifdef COMPUTE_SUM
      sum64 += step;
      t = suml;
      suml += step;
      if (suml < t)
        ++sumh;
#endif
      j = i + step;   // we leave the first occurance as zero
      while (j < n) { // in case we want to print them later.
        table[j] = 1;
        j += step;
      }
    }
  }
  --n; // fixes a bug when N+1 is prime and gets counted
  for (; i < n; ++i)
    if (table[i] == 0) {
#ifdef COMPUTE_SUM
      sum64 += 2 * i + 3;
      t = suml;
      suml += 2 * i + 3;
      if (suml < t)
        ++sumh;
#endif
      ++ret;
    }
#ifdef COMPUTE_SUM
  printf("%16llx\n%08x%08x\n", sum64, sumh, suml);
#endif
  return ret;
}

int main(int argc, char **argv) {
  int num, howMuchMem, n, loops, i;
  ptype *table;

  if (argc <= 2) {
    fprintf(stderr,
            "Usage: %s limit [number of times to run] -- will compute primes "
            "up to num\n",
            argv[0]);
    return 5;
  }
  if (argc == 3) {
    loops = atol(argv[2]);
  } else {
    loops = 1;
  }
  num = atol(argv[1]);
  if (num & 1) {
    num += 1;
  }
  num >>= 1;
  howMuchMem = num * sizeof(ptype);
  table = (ptype *)malloc(howMuchMem);
  if (!table) {
    fprintf(stderr, "Out of memory\n");
    return 20;
  }

  for (i = 0; i < loops; ++i) {
    // printf("Zeroing memory...\n");
    memset(table, 0, howMuchMem);
    // printf("Computing primes...\n");
    n = computePrimes(table, num);
  }
  printf("Counted %d primes up to %d. (Run %d times)\n", n, num * 2, loops);
  free(table);
  return 0;
}
compiles with:
m68k-amigaos-gcc -O3 -mcrt=nix13 -o prime prime.c
alkis is offline  
Old 09 July 2023, 22:14   #423
alkis
Registered User
 
Join Date: Dec 2010
Location: Athens/Greece
Age: 53
Posts: 719
Quote:
Originally Posted by Don_Adan View Post
Thanks. I dont coding from long time, but after quick check, I dont know why You cleared BSS section matrix, because this is always 0, if this is one shoot program. And clearing working buff via move byte command is not very effective.
New clear. Movem.l 32 bytes at a time, unrolled once to go 64bytes per loop.
Code:
          lea.l        guard,a0
          move.w       #7813-1,d5
          moveq        #0,d0
          moveq        #0,d1
          moveq        #0,d2
          moveq        #0,d3
          moveq        #0,d4
          move.l       d0,a1
          move.l       d0,a2
          move.l       d0,a3

.nclr     MOVEM.L      D0-D4/A1-A3,-(A0)                            ; 8 x 4 = 32 bytes
          MOVEM.L      D0-D4/A1-A3,-(A0)                            ; 8 x 4 again for 64 bytes	
          dbra         d5,.nclr
vamos -v sieve
23:09:56.731 main: INFO: done. exit code=0
23:09:56.731 main: INFO: total cycles: 53939790
23:09:56.732 main: INFO: vamos is exiting
Number of primes 78498

Down from 68736530. Gcc still comes up on top with 49823902
alkis is offline  
Old 10 July 2023, 00:24   #424
Muzza
Registered User
 
Muzza's Avatar
 
Join Date: Sep 2019
Location: Sydney
Posts: 357
Quote:
Originally Posted by Hypex View Post
Thanks for testing. I'm not sure what jeq and jlt are. That's not a common mnemonic.
Pseudo op-codes.
They are replaced with the shortest kind of branch possible, but this is not always known at compile time.

Quote:
Originally Posted by Hypex View Post
Except it's branching (or jumping) to a routine, which then jumps to another, comes back and returns. My code branches direct to assumed routines.
I did it like this to represent the ; 0 ; + ; - comments you added to your original, which I assume were there to place-hold some kind of work. Without them the compiler would optimize away the entire routine because it doesn't actually do anything. Taking that into account, the ASM it produces is identical to yours.

Change the functions pos/neg/zero to some small inlined calculation that gets re-used in multiple places, and you'd have a nice example of the benefits of an optimizing compiler.
Muzza is offline  
Old 10 July 2023, 01:12   #425
Don_Adan
Registered User
 
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 55
Posts: 1,975
Quote:
Originally Posted by alkis View Post
New clear. Movem.l 32 bytes at a time, unrolled once to go 64bytes per loop.
Code:
          lea.l        guard,a0
          move.w       #7813-1,d5
          moveq        #0,d0
          moveq        #0,d1
          moveq        #0,d2
          moveq        #0,d3
          moveq        #0,d4
          move.l       d0,a1
          move.l       d0,a2
          move.l       d0,a3

.nclr     MOVEM.L      D0-D4/A1-A3,-(A0)                            ; 8 x 4 = 32 bytes
          MOVEM.L      D0-D4/A1-A3,-(A0)                            ; 8 x 4 again for 64 bytes	
          dbra         d5,.nclr
vamos -v sieve
23:09:56.731 main: INFO: done. exit code=0
23:09:56.731 main: INFO: total cycles: 53939790
23:09:56.732 main: INFO: vamos is exiting
Number of primes 78498

Down from 68736530. Gcc still comes up on top with 49823902
If I understand correctly Your routine, then You used 0 byte for prime and 1 for no prime. If You replace this for 0 as no prime and 1 as prime, then in final loop You can perhaps replace tst.b (a0)+ with tst.w (a0)+. And final loop will be 2 times faster.

Also I dont remember if add.l d6,d1 is fastest than addq.l #1,d1. For me addq.l version is much more readable.
Don_Adan is offline  
Old 10 July 2023, 03:28   #426
alkis
Registered User
 
Join Date: Dec 2010
Location: Athens/Greece
Age: 53
Posts: 719
Quote:
Originally Posted by Don_Adan View Post
If I understand correctly Your routine, then You used 0 byte for prime and 1 for no prime. If You replace this for 0 as no prime and 1 as prime, then in final loop You can perhaps replace tst.b (a0)+ with tst.w (a0)+. And final loop will be 2 times faster.

Also I dont remember if add.l d6,d1 is fastest than addq.l #1,d1. For me addq.l version is much more readable.
I am missing something on your first suggestion. Table element is byte. How can I check 2 values with one check when using words?

For the add.l vs addq, I think they are both 8 cycles according to https://wiki.neogeodev.org/index.php...ctions_timings
alkis is offline  
Old 10 July 2023, 08:57   #427
kamelito
Zone Friend
 
kamelito's Avatar
 
Join Date: May 2006
Location: France
Posts: 1,801
It reminds me of this video and guess what Assembly was not the winner nor C.
[ Show youtube player ]
kamelito is offline  
Old 10 July 2023, 09:03   #428
meynaf
son of 68k
 
meynaf's Avatar
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
This doesn't look too well :
Code:
.while	cmp.l	d7,d3
	bge.b	.endw
	move.b	d6,(a1,d3.l)
	add.l	d2,d3
	bra.b	.while
.endw
Perhaps you could try something like :
Code:
 lea (a1,d7.l),a3       ; put that out of for loop
 lea (a1,d3.l),a2
 bra.b .do              ; remove that if first test always returns to .while
.while
 move.b d6,(a2)
 add.l d2,a2
.do
 cmp.l a3,a2
 blo.b .while
meynaf is offline  
Old 10 July 2023, 09:56   #429
Bruce Abbott
Registered User
 
Bruce Abbott's Avatar
 
Join Date: Mar 2018
Location: Hastings, New Zealand
Posts: 2,581
Quote:
Originally Posted by alkis View Post
For example, I coded a program that counts all primes up to 1 million. I did it in asm and then I did it in C (I am using beboo's gcc 6.5). C wins. I am sure if a godlike assembly programmer (like most other guys in this forum are) would write the asm version asm would come on top. The other day I thought, "Huh, what would sas c give?". I compiled with sas and my asm was better. So ranking is:
1) gcc 6.5
2) average (or below) asm programer
3) sas c

For reference here are the timings...
How did you count the cycles?
Bruce Abbott is offline  
Old 10 July 2023, 10:03   #430
alkis
Registered User
 
Join Date: Dec 2010
Location: Athens/Greece
Age: 53
Posts: 719
Quote:
Originally Posted by Bruce Abbott View Post
How did you count the cycles?
I run the executable through vamos on linux. With -v it reports the cycles needed for the execution.
alkis is offline  
Old 10 July 2023, 10:14   #431
alkis
Registered User
 
Join Date: Dec 2010
Location: Athens/Greece
Age: 53
Posts: 719
Quote:
Originally Posted by meynaf View Post
This doesn't look too well :
Code:
.while	cmp.l	d7,d3
	bge.b	.endw
	move.b	d6,(a1,d3.l)
	add.l	d2,d3
	bra.b	.while
.endw
Perhaps you could try something like :
Code:
 lea (a1,d7.l),a3       ; put that out of for loop
 lea (a1,d3.l),a2
 bra.b .do              ; remove that if first test always returns to .while
.while
 move.b d6,(a2)
 add.l d2,a2
.do
 cmp.l a3,a2
 blo.b .while
vamos -v sieve
11:12:02.255 main: INFO: done. exit code=0
11:12:02.255 main: INFO: total cycles: 42054466
11:12:02.256 main: INFO: vamos is exiting
Number of primes 78498

Asm on top now.
alkis is offline  
Old 10 July 2023, 11:17   #432
Thorham
Computer Nerd
 
Thorham's Avatar
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,762
Quote:
Originally Posted by kamelito View Post
It reminds me of this video and guess what Assembly was not the winner nor C.
He's wrong. Assembly language is the fastest. Assembly language is translated to machine code directly, and machine code is all a CPU knows. So how is assembly language not the fastest?
Thorham is offline  
Old 10 July 2023, 11:46   #433
meynaf
son of 68k
 
meynaf's Avatar
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
Quote:
Originally Posted by Thorham View Post
So how is assembly language not the fastest?
Easy : it wasn't in the test.
Frankly this test looks completely bogus. Java in top 5, serious ?
meynaf is offline  
Old 10 July 2023, 12:39   #434
Karlos
Alien Bleed
 
Karlos's Avatar
 
Join Date: Aug 2022
Location: UK
Posts: 4,165
Quote:
Originally Posted by Thorham View Post
He's wrong. Assembly language is the fastest. Assembly language is translated to machine code directly, and machine code is all a CPU knows. So how is assembly language not the fastest?
It is possible for JIT to beat ahead of time compiled or assembled code for the very simple reason that lots of "variables" at compile time and branches based on them end up being invariant at runtime and a good JIT only optimises the paths actually taken. If you have never heard of it i suggest you read up on HP's Project Dynamo in which a JIT emulation of their own PA-RISC CPU ran a number of benchmarks faster than the host CPU did, despite the fact it was the same chip! Think of it as a 68K emulator running in a 68K outperforming the actual CPU for the sane task.

The reason for this apparent over-unity performance wasn't bad measurement or a dysfunction if reality. It was just that the compiled benchmark code was further optimisable at runtime.

Java, as annoying as it is, does have a very mature JIT on x64, so i would not be surprised for it to run some high level code faster than some native binary.
Karlos is offline  
Old 10 July 2023, 13:05   #435
Thorham
Computer Nerd
 
Thorham's Avatar
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,762
Quote:
Originally Posted by meynaf View Post
Java in top 5, serious ?
Depends on the JIT compiler.

Quote:
Originally Posted by Karlos View Post
It is possible for JIT to beat ahead of time compiled or assembled code
It still ends up being native machine code (which can be directly translated to ASM).

Quote:
Originally Posted by Karlos View Post
Think of it as a 68K emulator running in a 68K outperforming the actual CPU for the sane task.
Yeah, I wanna see that.

The reason why ASM is fastest is because machine code is all the CPU knows. A CPU can't run anything else. That you can use various techniques to improve performance doesn't change this.
Thorham is offline  
Old 10 July 2023, 13:16   #436
meynaf
son of 68k
 
meynaf's Avatar
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
Quote:
Originally Posted by Thorham View Post
Depends on the JIT compiler.
It sure does, but java run-time is far from being lightweight.
Every java app i've tried so far wasn't impressive by its speed...
So perhaps a good JIT can offset things, but putting that... thing in top 5 of two hundred languages ? Nah.

While there are some optimizations JIT can do from information only available at run-time, it has to be much faster than static optimization and therefore misses many opportunities.
In addition, we all know benchmarks have a strong tendency to do useless things that can be completely removed.

But anyone who believes JIT compiler can beat native speed is welcome to help me improve my VM.
meynaf is offline  
Old 10 July 2023, 13:26   #437
Thorham
Computer Nerd
 
Thorham's Avatar
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,762
Quote:
Originally Posted by meynaf View Post
Every java app i've tried so far wasn't impressive by its speed...
Depends on the code quality. I have a DDS decoder in C# that's based on someone else's code. I managed to make the code three times as fast. I also found another DDS decoder in C# that was about ten times slower than mine!

Quote:
Originally Posted by meynaf View Post
But anyone who believes JIT compiler can beat native speed is welcome to help me improve my VM.
You can't beat native speed. It's impossible.
Thorham is offline  
Old 10 July 2023, 13:28   #438
kamelito
Zone Friend
 
kamelito's Avatar
 
Join Date: May 2006
Location: France
Posts: 1,801
Quote:
Originally Posted by Thorham View Post
He's wrong. Assembly language is the fastest. Assembly language is translated to machine code directly, and machine code is all a CPU knows. So how is assembly language not the fastest?
You know that it not the machine code produced but the code being C or assembly no the end result at the machine level that the person who did the assembly code was not able to write.
kamelito is offline  
Old 10 July 2023, 13:41   #439
meynaf
son of 68k
 
meynaf's Avatar
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
Quote:
Originally Posted by Thorham View Post
Depends on the code quality. I have a DDS decoder in C# that's based on someone else's code. I managed to make the code three times as fast. I also found another DDS decoder in C# that was about ten times slower than mine!
Well, it's C# vs C#. Still not impressed, it's so easy to kill performance in HLLs.


Quote:
Originally Posted by Thorham View Post
You can't beat native speed. It's impossible.
I know. I was just sarcarstic.
meynaf is offline  
Old 10 July 2023, 13:45   #440
meynaf
son of 68k
 
meynaf's Avatar
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
Quote:
Originally Posted by kamelito View Post
You know that it not the machine code produced but the code being C or assembly no the end result at the machine level that the person who did the assembly code was not able to write.
That nobody was able to write decent x86 code isn't surprising.
The fact x86 (and many others) asm is just horrible has led to the (false) belief compilers do a better job. But it's easy to be the best when nobody seriously challenges you.
meynaf is offline  
 


Currently Active Users Viewing This Thread: 1 (0 members and 1 guests)
 
Thread Tools

Similar Threads
Thread Thread Starter Forum Replies Last Post
Chat GPT Amiga Assembly rcman Coders. Asm / Hardware 3 26 March 2023 20:24
An Amiga coder was banned without a reason - is it ok? litwr project.EAB 1 18 June 2021 20:38
Beginning Amiga Assembly Tutorial(s) Curbie Coders. Asm / Hardware 15 29 May 2020 00:21
Beginning Amiga Assembly Programming Hewitson Coders. Tutorials 32 09 October 2012 18:25
Amiga Assembly sources - Please help! W4r3DeV1L Amiga scene 21 16 July 2008 08:13

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off

Forum Jump


All times are GMT +2. The time now is 02:13.

Top

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2024, vBulletin Solutions Inc.
Page generated in 0.17531 seconds with 16 queries