English Amiga Board


05 February 2017, 09:41   #1
meynaf
son of 68k
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
the multi-cpu code density contest

Hi all asm lovers,


I have an unusual request. It's about comparing various CPU families, but this time not on performance benchmarks at all: on code size instead (both number of bytes and number of instructions).
I'd like to recruit people who have mastered some asm other than 68k (and are willing to help, obviously), since I can do the 68k version of just about everything but I don't code in anything else anymore.

Perhaps OT/Technical would have been a better place, but I wouldn't like this thread to get pruned, so I set it up here.

The comparisons are purely for academic purposes. Everything that can be found on the Net shows only the final sizes of compiler-generated programs, when it shows anything at all. That misses the point: included libraries, debug info and so on can make the exe grow, and compiler settings and compiler quality can have dramatic effects as well.
Detailed comparisons, with instructions and encodings, would tell us more.

So it's all about writing and showing real-life code samples of significant size (I'd say 20-40 instructions should be enough), written for as many CPUs as possible.

Anyone who thinks 68k can easily be beaten on code size or number of instructions by something else (x86, 6502, ARM, whatever you see fit) is more than welcome.
What do you all think about this idea? Crazy? Fancy participating, then?

05 February 2017, 13:47   #2
idrougge
Registered User
 
Join Date: Sep 2007
Location: Stockholm
Posts: 4,337
Try starting a thread on the Stack Overflow "Code golf" forum.

05 February 2017, 13:56   #3
meynaf
I don't have an account there, and I won't create one just to find out that nobody is able to help.

05 February 2017, 14:04   #4
idrougge
Code golf is a forum dedicated to code size optimisation. If you're going to find help somewhere, that's where you'll find it. Or Pouët.

05 February 2017, 15:43   #5
Thorham
Computer Nerd
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,762
Quote:
Originally Posted by meynaf
I don't have an account there, and I won't create one just to find out that nobody is able to help.
Log in with Google.

Idrougge does have a point. This forum is really dedicated to 68k, so I wouldn't be surprised if not much gets posted.

05 February 2017, 16:03   #6
meynaf
Quote:
Originally Posted by Thorham
Log in with Google.
You can't "log in" with Google; you can only import an account from there, so I'd still need to create a Stack Overflow account. And all this to probably not find anyone able to help, at least not more than here. It's not just about code optimising.


Quote:
Originally Posted by Thorham
Idrougge does have a point. This forum is really dedicated to 68k, so I wouldn't be surprised if not much gets posted.
Well, it's to move the off-topic discussion with litwr somewhere else...

Anyway, if this thread isn't the right place, then just let it go.
If nobody can help, I will see that for myself; I don't need you, idrougge or anyone else to tell me.

06 February 2017, 08:49   #7
meynaf
OK, as I wish to recruit, I must at least give people something to do.

Code:
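; a0=source, a1-a4=dest
 move.w #1999,d0        ; 2000 passes x 16 bytes = 32000 bytes of source
.loop
 movem.l (a0)+,d1-d4    ; d1=w0:w1 d2=w2:w3 d3=w4:w5 d4=w6:w7
 move.l d1,d5
 swap d5                ; d5=w1:w0
 move.w d3,d5           ; d5=w1:w5
 move.l d5,(a2)+
 move.l d1,d5
 swap d3                ; d3=w5:w4
 move.w d3,d5           ; d5=w0:w4
 move.l d5,(a1)+
 move.l d2,d5
 swap d5                ; d5=w3:w2
 move.w d4,d5           ; d5=w3:w7
 move.l d5,(a4)+
 move.l d2,d5
 swap d4                ; d4=w7:w6
 move.w d4,d5           ; d5=w2:w6
 move.l d5,(a3)+
 dbf d0,.loop           ; dbf counts d0 down to -1, hence 1999 for 2000 passes
 rts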
The above code shows that x86 would just plain suck at doing this:
- it has post-increment, but only for one source and one target,
- it can do the same as swap, but with a longer instruction,
- it can't mix 16- and 32-bit code without prefixes, which make the code even longer,
- the above code uses 6 data and 5 address registers (11 in total), a lot more than the 8 general-purpose registers that 32-bit x86 can provide.

Some smart ass may want to prove to me that x86 is better than 68k: just prove me wrong with a shorter x86 version of this code.

Or do it for ARM or whatever CPU. A 6502 version should be fun: it doesn't even have enough RAM to make it work at all...

For those who want to know, this routine does ST-to-Amiga screen conversion in less than half a frame on a plain A1200 (thanks to 32-bit chipmem accesses; a 68020 can't reach that speed on ECS). A much shorter approach exists, using direct 16-bit moves, but it's quite a lot slower, especially because the writes target chipmem.
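
For anyone porting this to another CPU, a plain C model of the data movement may help as a reference to check results against. This is only a sketch, not part of the original routine: the function name `deinterleave4` and the `plane` array (standing in for a1-a4) are made up for illustration. It works on 16-bit words, so host endianness doesn't matter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference model (names invented for illustration).
 * src:   `passes` groups of 8 words, w0..w7, as read by movem.l
 * plane: four output buffers standing in for a1..a4, 2 words per pass */
static void deinterleave4(const uint16_t *src, uint16_t *const plane[4],
                          size_t passes)
{
    assert(src != NULL && plane != NULL);
    for (size_t i = 0; i < passes; i++) {
        const uint16_t *w = src + 8 * i;
        for (int p = 0; p < 4; p++) {
            plane[p][2 * i]     = w[p];     /* first word of the stored long */
            plane[p][2 * i + 1] = w[p + 4]; /* second word of the stored long */
        }
    }
}
```

Calling it with `passes` set to 2000 covers the same 32000 source bytes as the 68k loop.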

Oh, OK. As it's supposed to be 68k-only here (?), 68k people can make their own optimizing attempts as well.
(There is one way to grab a few clocks, but it's for 020/030 only and it'd make the code longer.)

06 February 2017, 11:38   #8
idrougge
You can't recruit Intel or ARM guys in an Amiga forum.

06 February 2017, 11:50   #9
meynaf
Quote:
Originally Posted by idrougge
You can't recruit Intel or ARM guys in an Amiga forum.
There is at least one.
Read what litwr says in this thread.

Amiga people can also know other things.

06 February 2017, 12:08   #10
Thorham
You can do it a little shorter like this (if I didn't make a mistake):
Code:
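    move.w  #1999,d0
.loop
    movem.l (a0)+,d1-d4     ; d1=w0:w1 d2=w2:w3 d3=w4:w5 d4=w6:w7

    swap    d3              ; d3=w5:w4
    eor.w   d1,d3           ; d3=w5:(w4^w1)
    eor.w   d3,d1           ; d1=w0:w4
    move.l  d1,(a1)+
    eor.w   d1,d3           ; d3=w5:w1
    swap    d3              ; d3=w1:w5
    move.l  d3,(a2)+

    swap    d4              ; d4=w7:w6
    eor.w   d2,d4           ; d4=w7:(w6^w3)
    eor.w   d4,d2           ; d2=w2:w6
    move.l  d2,(a3)+
    eor.w   d2,d4           ; d4=w7:w3
    swap    d4              ; d4=w3:w7
    move.l  d4,(a4)+

    dbra    d0,.loop
    rts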
Might be faster on a plain A1200. Reading two more registers and unrolling once may also increase speed a bit.

Last edited by Thorham; 06 February 2017 at 12:17.

06 February 2017, 12:28   #11
meynaf
Looks like it will work

06 February 2017, 12:48   #12
Thorham
Quote:
Originally Posted by meynaf
Looks like it will work
Cool

This is longer, but should be faster because of more pipelining:
Code:
    move.w  #999,d0         ; 1000 passes x 32 bytes (loop body unrolled once)
.loop
    movem.l (a0)+,d1-d4

    swap    d3
    eor.w   d1,d3
    eor.w   d3,d1
    move.l  d1,(a1)+
    eor.w   d1,d3
    swap    d3
    move.l  d3,(a2)+

    swap    d4
    eor.w   d2,d4
    eor.w   d4,d2
    move.l  d2,(a3)+
    eor.w   d2,d4
    swap    d4

    movem.l (a0)+,d1-d3/d5  ; load the next group first (d5 stands in for d4)...
    move.l  d4,(a4)+        ; ...then store the pending d4

    swap    d3
    eor.w   d1,d3
    eor.w   d3,d1
    move.l  d1,(a1)+
    eor.w   d1,d3
    swap    d3
    move.l  d3,(a2)+

    swap    d5
    eor.w   d2,d5
    eor.w   d5,d2
    move.l  d2,(a3)+
    eor.w   d2,d5
    swap    d5
    move.l  d5,(a4)+

    dbra    d0,.loop
    rts

06 February 2017, 12:48   #13
robinsonb5
Registered User
 
Join Date: Mar 2012
Location: Norfolk, UK
Posts: 1,153
Interesting idea - when I get a chance I might try a ZPU version just for fun. (ZPU code is often ridiculously verbose but each instruction only occupies 1 byte.)

06 February 2017, 12:58   #14
idrougge
Quote:
Originally Posted by meynaf
There is at least one.
Read what litwr says in this thread.

Amiga people can also know other things.
Sure you can, but if someone went to a PC forum and tried to recruit 68k coders, wouldn't you say the competition was weighted in favour of Intel?

06 February 2017, 13:11   #15
robinsonb5
Quote:
Originally Posted by idrougge
Sure you can, but if someone went to a PC forum and tried to recruit 68k coders, wouldn't you say the competition was weighted in favour of Intel?
True, but 68K has been obsolete for years, so the chances are much higher that 68K coders will have experience of other architectures.

06 February 2017, 13:12   #16
meynaf
Quote:
Originally Posted by Thorham
This is longer, but should be faster because of more pipelining
This is what I had in mind when I mentioned a way to grab a few clocks that's 020/030-only and makes the code longer.


Quote:
Originally Posted by robinsonb5
Interesting idea - when I get a chance I might try a ZPU version just for fun. (ZPU code is often ridiculously verbose but each instruction only occupies 1 byte.)
That would be very cool


Quote:
Originally Posted by idrougge
Sure you can, but if someone went to a PC forum and tried to recruit 68k coders, wouldn't you say the competition was weighted in favour of Intel?
True, but no place is really free of bias, and there has to be one place to collect the results.

And as 68k is the best, it will win in the end anyway, no? So this is the best place.

That said, you can go to Stack Overflow or whatever site, ask them, and bring the results back here. I just don't want to do it myself.


Quote:
Originally Posted by robinsonb5
True, but 68K has been obsolete for years, so the chances are much higher that 68K coders will have experience of other architectures.
+1

Last edited by meynaf; 06 February 2017 at 13:13. Reason: cross-posting

06 February 2017, 13:21   #17
Thorham
Quote:
Originally Posted by meynaf
This is what I had in mind when I mentioned a way to grab a few clocks that's 020/030-only and makes the code longer.
Yeah, it's a pretty obvious one. You could also read more registers with movem.

Quote:
Originally Posted by meynaf
And as 68k is the best, it will win in the end anyway, no? So this is the best place.
Obviously

06 February 2017, 15:51   #18
idrougge
Quote:
Originally Posted by robinsonb5
True, but 68K has been obsolete for years, so the chances are much higher that 68K coders will have experience of other architectures.
Assembly language has been obsolete for just as long.

06 February 2017, 16:01   #19
Thorham
Quote:
Originally Posted by idrougge
Assembly language has been obsolete for just as long.

06 February 2017, 16:48   #20
litwr
Registered User
 
Join Date: Mar 2016
Location: Ozherele
Posts: 229
This example is a bit too big. I'm not sure I can find enough time for it.
I have another, much simpler one. We have two 64-bit unsigned integers A and B. How do we find which is bigger? With x86_64 we just use CMP RAX,RBX. With x86 we need 2 registers for each number, for example EAX:EBX for A and ECX:EDX for B. 680x0 may use D0:D1 for A and D2:D3 for B. The registers must not be modified. Start!
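
For reference, the usual way to do this with 32-bit halves is to compare the high halves first and look at the low halves only when the high ones are equal. On a flag-based CPU that generally maps to a compare, a branch-if-not-equal and a second compare. A C sketch of the logic follows, assuming the first register of each pair holds the high half; the names `cmp_u64_halves` and `check_against_native` are made up for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Returns -1, 0 or 1 for A < B, A == B, A > B, where A = a_hi:a_lo and
 * B = b_hi:b_lo are unsigned 64-bit values held as 32-bit halves.
 * The inputs are only read, never modified. */
static int cmp_u64_halves(uint32_t a_hi, uint32_t a_lo,
                          uint32_t b_hi, uint32_t b_lo)
{
    if (a_hi != b_hi)                  /* high halves decide when they differ */
        return a_hi < b_hi ? -1 : 1;
    if (a_lo != b_lo)                  /* otherwise the low halves decide */
        return a_lo < b_lo ? -1 : 1;
    return 0;
}

/* Cross-check against the host's native 64-bit comparison. */
static void check_against_native(uint64_t a, uint64_t b)
{
    int expected = (a > b) - (a < b);
    assert(cmp_u64_halves((uint32_t)(a >> 32), (uint32_t)a,
                          (uint32_t)(b >> 32), (uint32_t)b) == expected);
}
```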
BTW, 680x0 can't match ARM on the division algorithm: the loop takes only 3 ARM instructions! Then again, 680x0 has hardware division...