English Amiga Board


07 February 2017, 09:15   #41
litwr
Registered User
Join Date: Mar 2016
Location: Ozherele
Posts: 229
Quote:
Originally Posted by matthey
+ division starts at the most significant end, making it faster
Thanks. Only this point really matters. But it is true only for right-shift division; left-shift division is generally faster.
Quote:
Originally Posted by meynaf
But OK, I'll give that 68k code. The 68020 has tools that are very powerful when you know how to use them (prepare to be shocked):
Code:
bfset (a0){d0:1}   ; set the 1-bit field at bit offset d0 from a0 (offset 0 = MSB of the byte at a0)
Fascinating! However, I am afraid it may be a bit slow on the 68020.

07 February 2017, 09:22   #42
meynaf
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
Quote:
Originally Posted by Thorham
Yeah, that's pretty short. I keep forgetting those bitfield instructions for some reason. However, is it faster (the greater-than-5-byte case is 22 cycles)?
Single-bit access can't be a 5-byte case.

07 February 2017, 09:28   #43
idrougge
Registered User
Join Date: Sep 2007
Location: Stockholm
Posts: 4,332
Quote:
Originally Posted by AnimaInCorpore
I wouldn't say "obsolete" but "less important".
Just like the 68k. There are probably people out there maintaining 68k/Coldfire code in all kinds of projects, but desktop applications, games and even demos are written in high-level languages today. The percentage of PC coders who know and use x86 assembly is much smaller than the amount of Amiga coders who know 68k assembly.

07 February 2017, 09:41   #44
meynaf
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
Quote:
Originally Posted by idrougge
The percentage of PC coders who know and use x86 assembly is much smaller than the amount of Amiga coders who know 68k assembly.
That's very true!
So there is little point in seeking asm help on some PC forum.

07 February 2017, 09:42   #45
idrougge
Registered User
Join Date: Sep 2007
Location: Stockholm
Posts: 4,332
I never said you should seek help on a PC forum.

07 February 2017, 09:50   #46
meynaf
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
Quote:
Originally Posted by idrougge
I never said you should seek help on a PC forum.
Stack Overflow is a PC forum.

07 February 2017, 10:11   #47
Thorham
Computer Nerd
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,751
Quote:
Originally Posted by litwr
However, I am afraid it may be a bit slow on the 68020.
Just benched it on a 68030 (same cycle times as the 68020). It's the same speed as this:
Code:
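    move.l  d0,d1           ; d1 = bit offset
    lsr.l   #5,d1           ; d1 = offset / 32 (longword index)
    bset    d0,(a0,d1.w*4)  ; memory BSET is byte-sized: bit number is d0 mod 8
Very nice instructions, those bitfield instructions.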
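To see exactly which bit each version touches, here is a small Python model. It assumes the documented 68020 conventions: a BFSET bit offset counts from the most significant bit of the base byte, whereas BSET with a memory destination is a byte operation whose bit number is taken modulo 8 and counted from the least significant bit. The function names are hypothetical, chosen only for this sketch.

```python
from __future__ import annotations


def bfset_target(offset: int) -> tuple[int, int]:
    """(byte offset from a0, bit number in that byte) set by BFSET (a0){offset:1}.

    Bit-field offsets run from the MSB of the base byte: offset 0 is bit 7
    of byte 0, offset 7 is bit 0 of byte 0, offset 8 is bit 7 of byte 1.
    """
    return offset >> 3, 7 - (offset & 7)


def bset_scaled_target(offset: int) -> tuple[int, int]:
    """(byte offset from a0, bit number) set by
        move.l d0,d1 / lsr.l #5,d1 / bset d0,(a0,d1.w*4)
    A memory BSET works on one byte, using bit number d0 mod 8 (LSB = 0).
    Assumes offset < 2**20, so d1.w stays positive after sign extension.
    """
    return (offset >> 5) * 4, offset & 7


for off in (0, 9, 37):
    print(off, bfset_target(off), bset_scaled_target(off))
```

In this model the two sequences never set the same bit (the bit numbers would have to satisfy 7 - b == b), so a drop-in BSET replacement for the bitfield version would need a byte index of offset >> 3 and a bit number of 7 - (offset & 7).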

Quote:
Originally Posted by meynaf
Single-bit access can't be a 5-byte case
Thought it referred to the accessed bytes.

Quote:
Originally Posted by idrougge
Just like the 68k. There are probably people out there maintaining 68k/Coldfire code in all kinds of projects, but desktop applications, games and even demos are written in high-level languages today. The percentage of PC coders who know and use x86 assembly is much smaller than the amount of Amiga coders who know 68k assembly.
Of course, but assembly language isn't obsolete. Knowledge of assembly language is still critically important. We simply can't do without it yet. As long as computers have CPUs with instruction sets like the ones we have now, we can't do without assembly language knowledge.

Quote:
Originally Posted by meynaf
Stack Overflow is a PC forum.
It's a programming forum. Not everything on Stack Overflow is peecee-related. Most of it is language-related.

07 February 2017, 10:20   #48
idrougge
Registered User
Join Date: Sep 2007
Location: Stockholm
Posts: 4,332
Quote:
Originally Posted by Thorham
Of course, but assembly language isn't obsolete. Knowledge of assembly language is still critically important. We simply can't do without it yet. As long as computers have CPUs with instruction sets like the ones we have now, we can't do without assembly language knowledge.
Do I strike you as so stupid as to suggest that? Take my statement with a grain of salt.

Quote:
Originally Posted by meynaf
Stack Overflow is a PC forum.
StackOverflow is a lot of forums. I mainly use it as a Mac/iOS programming forum, but Codegolf is… well, check for yourself instead of just being stubborn.

07 February 2017, 10:27   #49
litwr
Registered User
Join Date: Mar 2016
Location: Ozherele
Posts: 229
Quote:
Originally Posted by matthey
Theoretically, any 68k CPU with a 16-bit data bus could be faster when adding/subtracting a 32-bit number in memory, but memory was likely already fast enough (and the 68k slow enough) that it made little if any difference. I would love to hear anywhere you read that it made a difference and how much, though.
Just read the manuals.
[68000] ADDI.L #imm,Dn: 16 cycles; MOVE.L #imm,Dn: 12 cycles.
With little-endian order the two timings should be equal.
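One way to read the "should be equal for LE" argument: on a 16-bit bus the 32-bit immediate arrives as two words, and the low half has to be added first because it produces the carry into the high half. With big-endian order the low word is the second one fetched. The Python toy model below shows only that ordering effect; it is not a model of the 68000's actual microcode, and the function names are made up.

```python
import struct


def imm_words(imm: int, big_endian: bool) -> list:
    """The two 16-bit words a 16-bit bus fetches for a 32-bit immediate,
    in fetch (ascending address) order."""
    order = ">" if big_endian else "<"
    return list(struct.unpack(order + "HH", struct.pack(order + "I", imm)))


def add_word_serial(reg: int, imm: int, big_endian: bool) -> tuple:
    """Add a 32-bit immediate to a 32-bit register with a 16-bit adder.

    Returns (sum, words_needed): words_needed is how many immediate words
    must have been fetched before the low-half add, which produces the
    carry for the high half, can begin.
    """
    words = imm_words(imm, big_endian)
    low_index = 1 if big_endian else 0      # position of the low half in fetch order
    low, high = words[low_index], words[1 - low_index]
    lo_sum = (reg & 0xFFFF) + low
    carry = lo_sum >> 16
    hi_sum = ((reg >> 16) + high + carry) & 0xFFFF
    return (hi_sum << 16) | (lo_sum & 0xFFFF), low_index + 1
```

In this model a big-endian fetch has to wait for both words before the first add, while little-endian can start after the first one; whether real hardware would turn that into the 4-cycle difference quoted above is a separate question.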

07 February 2017, 10:44   #50
Thorham
Computer Nerd
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,751
Quote:
Originally Posted by idrougge
Do I strike you as so stupid as to suggest that?
You didn't suggest it, you just outright wrote it:
Quote:
Originally Posted by idrougge
Assembly language has been obsolete for just as long.
Quote:
Originally Posted by idrougge
StackOverflow is a lot of forums.
No, it's not. It's one forum which is part of StackExchange: http://stackexchange.com/

07 February 2017, 10:54   #51
litwr
Registered User
Join Date: Mar 2016
Location: Ozherele
Posts: 229
Quote:
Originally Posted by matthey
+ more human-readable hex/binaries and text in memory
DEC disappeared still claiming that octal has better readability. They didn't support hexadecimal to the end. What stupidity! Motorola made a similar mistake...
BTW, StackOverflow is one of the best IT forums. You may even ask about 8-bit Commodores there.

07 February 2017, 12:52   #52
grond
Registered User
Join Date: Jun 2015
Location: Germany
Posts: 1,918
Quote:
Originally Posted by meynaf
Code:
; a0=source, a1-a4=dest
 move.w #1999,d0
.loop
 movem.l (a0)+,d1-d4
 move.l d1,d5
 swap d5
 move.w d3,d5
 move.l d5,(a2)+
 move.l d1,d5
 swap d3
 move.w d3,d5
 move.l d5,(a1)+
 move.l d2,d5
 swap d5
 move.w d4,d5
 move.l d5,(a4)+
 move.l d2,d5
 swap d4
 move.w d4,d5
 move.l d5,(a3)+
 dbf d0,.loop
 rts
If this code is about code density, it can be done better:
Code:
; a0=source, a1-a4=dest
 move.l a0,a5
 adda.w #32000,a5      ; end of source: 2000 iterations x 16 bytes
.loop
 movem.w (a0)+,d0-d7   ; 8 source words w0..w7, sign-extended into d0-d7
 swap d0
 swap d1
 move.w d4,d0          ; d0 = (w0,w4)
 move.w d5,d1          ; d1 = (w1,w5)
 move.l d0,(a1)+
 swap d2
 move.l d1,(a2)+
 move.w d6,d2          ; d2 = (w2,w6)
 swap d3
 move.l d2,(a3)+
 move.w d7,d3          ; d3 = (w3,w7)
 move.l d3,(a4)+
 cmpa.l a0,a5
 bne .loop             ; stop once a0 reaches the end address
 rts
I haven't counted cycles, but it might also be faster. Of course, optimising for speed is a different topic altogether, especially if the destination is in chipmem. For this reason I don't think it makes much sense to analyse one aspect, i.e. code density, without the others (execution speed on a real-life system).
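Tracing the registers, both loops above perform the same word shuffle: each 16-byte source chunk holds words w0..w7, and destination k (a1..a4) receives the longword (w_k, w_(k+4)). A small Python reference model of that shuffle could be used to check a rewrite against; the names are hypothetical and the model ignores chip/fast memory timing entirely.

```python
import struct


def shuffle_chunk(chunk: bytes) -> list:
    """Four big-endian longwords written for one 16-byte source chunk:
    destination k (a1..a4) gets (w_k, w_(k+4))."""
    w = struct.unpack(">8H", chunk)
    return [struct.pack(">HH", w[k], w[k + 4]) for k in range(4)]


def shuffle(src: bytes, iterations: int = 2000) -> list:
    """Run the loop body `iterations` times; returns the four destination buffers."""
    dests = [bytearray() for _ in range(4)]
    for i in range(iterations):
        for k, longword in enumerate(shuffle_chunk(src[16 * i:16 * i + 16])):
            dests[k] += longword
    return [bytes(d) for d in dests]
```

With the default 2000 iterations the loop consumes 32000 source bytes and writes 8000 bytes to each destination, which is the span the loop's end test has to cover.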

07 February 2017, 14:02   #53
meynaf
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
Quote:
Originally Posted by idrougge
StackOverflow is a lot of forums. I mainly use it as a Mac/iOS programming forum, but Codegolf is… well, check for yourself instead of just being stubborn.
Well, let me be more precise then: Stack Overflow is mostly a PC forum.
(And if you think they're really helpful in our case, just go and ask them instead of wasting my time here.)


Quote:
Originally Posted by Thorham
Thought it referred to the accessed bytes.
You don't need to access more than 1 byte if you want to get 1 bit.


Quote:
Originally Posted by litwr
DEC disappeared still claiming that octal has better readability. They didn't support hexadecimal to the end. What stupidity! Motorola made a similar mistake...
Endianness has nothing to do with octal. Octal is unreadable, little-endian is unreadable; if you really want to compare, it's Intel who made the mistake.
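To make the readability point concrete, the Python sketch below prints one 32-bit value as a byte-wise memory dump in each byte order, and the same value in octal. `dump` is a hypothetical helper; `bytes.hex` with a separator needs Python 3.8 or later.

```python
import struct


def dump(value: int, big_endian: bool) -> str:
    """Bytes of a 32-bit value as a hex dump shows them, lowest address first."""
    return struct.pack(">I" if big_endian else "<I", value).hex(" ")


v = 0x12345678
print(dump(v, True))    # 12 34 56 78  (68k order: reads like the constant)
print(dump(v, False))   # 78 56 34 12  (x86 order: bytes reversed)
print(f"{v:o}")         # 2215053170   (octal digits do not line up with bytes)
```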


Quote:
Originally Posted by litwr
BTW, StackOverflow is one of the best IT forums. You may even ask about 8-bit Commodores there.
You may ask many things in many places. Getting meaningful replies is something else.


Quote:
Originally Posted by grond
I haven't counted cycles but it might also be faster. Of course, optimising for speed is a different topic altogether, especially if the destination is in chipmem. For this reason I don't think it makes much sense to analyse one aspect, i.e. code density, without the others (execution speed on a real life system).
It was about doing it fully 32-bit. If you use 16-bit memory accesses, the loop is just 4 instructions!

07 February 2017, 14:12   #54
grond
Registered User
Join Date: Jun 2015
Location: Germany
Posts: 1,918
Quote:
Originally Posted by meynaf
It was about doing it fully 32-bit. If you use 16-bit memory accesses, the loop is just 4 instructions!
Well, if you change or augment the rules mid-game... I assumed the destination was in chipmem and the source in fast. Otherwise using movem wouldn't have been the best decision anyway.

07 February 2017, 14:20   #55
phx
Natteravn
Join Date: Nov 2009
Location: Herford / Germany
Posts: 2,496
Quote:
Originally Posted by meynaf
Endianness has nothing to do with octal. Octal is unreadable, little-endian is unreadable; if you really want to compare, it's Intel who made the mistake.
I agree. No sane person would design a new CPU as little-endian, except for compatibility with x86, the PCI bus, etc.

The x86 inherited little-endian byte order from Intel's 8-bit CPUs, where it makes sense to read the least significant bytes from memory first when doing operations on them. But there is no reason for it on real 32/64-bit CPUs, except compatibility with earlier models.
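The 8-bit argument can be shown with a short Python sketch of a multi-byte add done one byte at a time. With little-endian storage the loop walks upward from the lowest address, and each carry is ready exactly when the next, more significant byte is read; with big-endian storage the same loop would have to start at the highest address. The function name is hypothetical.

```python
def add_bytewise_le(a: bytes, b: bytes) -> bytes:
    """Add two equal-length little-endian integers one byte at a time,
    like an 8-bit CPU with an add-with-carry instruction."""
    if len(a) != len(b):
        raise ValueError("operands must have the same length")
    out = bytearray()
    carry = 0
    for x, y in zip(a, b):          # lowest address = least significant byte
        s = x + y + carry
        out.append(s & 0xFF)
        carry = s >> 8
    return bytes(out)               # final carry is dropped (wraps modulo 2**(8*n))


x, y = 0x12345678, 0x0FEDCBA9
r = add_bytewise_le(x.to_bytes(4, "little"), y.to_bytes(4, "little"))
print(hex(int.from_bytes(r, "little")))   # 0x22222221
```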

Little-endian alone is a reason for me to stay away from a CPU. I never did much with ARM either, because it is mostly used in LE mode.

07 February 2017, 14:25   #56
meynaf
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
Quote:
Originally Posted by grond
Well, if you change or augment the rules mid-game... I assumed the destination was in chipmem and the source in fast. Otherwise using movem wouldn't have been the best decision anyway.
Perhaps I was just a little unclear about it, sorry.
The destination is in chipmem and the source in fast.
It's just that reads and writes must be performed with 32-bit width, or it becomes meaningless (see my 4x move.w explanation in previous posts).

07 February 2017, 14:48   #57
Thorham
Computer Nerd
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,751
Quote:
Originally Posted by meynaf
You don't need to access more than 1 byte if you want to get 1 bit.
I meant that once the field spans 5 bytes you'd get a penalty, similar to shifting by more than 8 bits. The documentation says 14 cycles when fewer than 5 bytes are involved, and 22 cycles when 5 bytes are.
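Whether the 5-byte timing case can apply depends on how many bytes the field covers, which follows from the bit position within the first byte plus the width. A quick Python check (the function name is hypothetical) agrees with meynaf: a 1-bit field never leaves its byte, and only fields of 26 bits or more can reach 5 bytes.

```python
def bitfield_span_bytes(offset: int, width: int) -> int:
    """How many bytes a 68020 bit field {offset:width} covers in memory.

    Only the position within the first byte (offset mod 8) matters;
    Python's & also gives the right answer for negative offsets.
    """
    if not 1 <= width <= 32:
        raise ValueError("bit field width must be 1..32")
    return ((offset & 7) + width + 7) // 8


# A 1-bit field always fits in one byte, whatever the offset.
assert all(bitfield_span_bytes(o, 1) == 1 for o in range(-64, 64))
# The 5-byte case needs a wide field that does not start on a byte boundary.
print(max(bitfield_span_bytes(o, 25) for o in range(8)))        # 4
print(max(bitfield_span_bytes(o, 26) for o in range(8)))        # 5
print(bitfield_span_bytes(0, 32), bitfield_span_bytes(1, 32))   # 4 5
```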

07 February 2017, 15:01   #58
grond
Registered User
Join Date: Jun 2015
Location: Germany
Posts: 1,918
Quote:
Originally Posted by meynaf
It's just that reads and writes must be performed with 32-bit width, or it becomes meaningless
Why would doing 16-bit reads from fast RAM and 32-bit writes to chip RAM be meaningless? I understand you are investigating code density, but you added the extra condition of using 32-bit moves for the writes because they go to chipmem on a machine with 32-bit chipmem. Your four word-size moves example violates this condition. My code does not, and it shows better code density and possibly even better speed on some 68k CPUs. Too proud to admit this?

07 February 2017, 16:02   #59
matthey
Banned
Join Date: Jan 2010
Location: Kansas
Posts: 1,284
Quote:
Originally Posted by Thorham
You mean 68060?
Yes, and likely any other superscalar 68k CPU.

Quote:
Originally Posted by Thorham
Why exactly? If it's the instruction ordering, then wouldn't that be irrelevant because of the pipelining (slow chipmem writes)?
It is about instruction scheduling and forwarding. Your code has many dependencies. The 68060 is an in-order superscalar CPU, so it cannot reorder the instructions it executes in pairs (OoO execution also benefits from better superscalar scheduling). The result calculated by the first instruction of a pair cannot be used as a source by the second instruction, because it has not been completed yet. Superscalar dual execution of the code could save a few cycles, allowing chip mem writes to start sooner. This is no different from other optimizations.

Quote:
Originally Posted by Thorham
Anyway, I optimize code for 68020s and 68030s because they need it far more than 68060s, and I don't know a whole lot about 68060 optimization (I only know something about instruction ordering). Especially when a plain A1200 is your target, 68060 optimization isn't relevant anymore.
It is possible to produce fairly optimal code for the 68020-68060. Code for the 68020-68030 can usually be instruction-scheduled for the 68060 with little if any slowdown (see below for my attempt).

Quote:
Originally Posted by Thorham
Also, it was about code size
I wasn't criticizing your code. It was the shortest, even for the 68060.

Quote:
Originally Posted by Thorham
Code:
    move.w  #1999,d0 ; pOEP
.loop
    movem.l (a0)+,d1-d4 ; pOEP only
    swap    d3 ; pOEP only
    eor.w   d1,d3 ; pOEP
    eor.w   d3,d1 ; pOEP (dependency)
    move.l  d1,(a1)+ ; pOEP (only .l forwarded)
    eor.w   d1,d3 ; sOEP only
    swap    d3 ; pOEP only
    move.l  d3,(a2)+ ; pOEP
    swap    d4 ; pOEP only
    eor.w   d2,d4 ; pOEP
    eor.w   d4,d2 ; pOEP (dependency)
    move.l  d2,(a3)+  ; pOEP (only .l forwarded)
    eor.w   d2,d4 ; sOEP
    swap    d4 ; pOEP only
    move.l  d4,(a4)+ ; pOEP
    dbra    d0,.loop ; pOEP only
    rts  ; pOEP only
pOEP = primary integer pipe
sOEP = secondary integer pipe

The optimum would be every other instruction issuing in the sOEP, although that is rarely possible. Some instructions can't issue in the sOEP on the 68060 and don't even allow an sOEP instruction at the same time, like MOVEM, SWAP (an oversight/mistake, as it could and should have been allowed), MUL and DIV. There isn't much room to reschedule your code. This is just the nature of the EOR exchange algorithm, which does more calculations.
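The pairing rules described above can be illustrated with a toy Python model of in-order dual issue. It knows only two rules, the pOEP-only restriction and register dependencies between the two instructions of a pair; real 68060 pairing has more conditions that are ignored here, and every name is made up for the sketch. The four instructions come from the rescheduled version of meynaf's loop in this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insn:
    text: str
    writes: frozenset
    reads: frozenset
    poep_only: bool = False   # cannot issue in the sOEP and blocks pairing


def insn(text, writes="", reads="", poep_only=False):
    """Build an Insn from space-separated register names."""
    return Insn(text, frozenset(writes.split()), frozenset(reads.split()), poep_only)


def issue_cycles(prog):
    """Toy in-order dual-issue count: each cycle the next instruction goes to
    the pOEP, and the one after it joins in the sOEP only if neither is
    pOEP-only and the second does not touch a register the first writes."""
    cycles = i = 0
    while i < len(prog):
        cycles += 1
        first = prog[i]
        if i + 1 < len(prog):
            second = prog[i + 1]
            if (not first.poep_only and not second.poep_only
                    and not first.writes & (second.reads | second.writes)):
                i += 2
                continue
        i += 1
    return cycles


def swap_sequence(swap_poep_only):
    return [
        insn("move.l d1,d5", "d5", "d1"),
        insn("swap d1", "d1", "d1", swap_poep_only),
        insn("move.w d3,d1", "d1", "d3 d1"),
        insn("swap d3", "d3", "d3", swap_poep_only),
    ]


print(issue_cycles(swap_sequence(True)))    # 4: every SWAP issues alone
print(issue_cycles(swap_sequence(False)))   # 2: both pairs dual-issue
```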

Quote:
Originally Posted by meynaf
Code:
; a0=source, a1-a4=dest
 move.w #1999,d0 ; pOEP
.loop
 movem.l (a0)+,d1-d4 ; pOEP only
 move.l d1,d5 ; pOEP
 swap d5 ; pOEP only
 move.w d3,d5 ; pOEP
 move.l d5,(a2)+ ; pOEP
 move.l d1,d5 ; sOEP
 swap d3 ; pOEP only
 move.w d3,d5 ; pOEP
 move.l d5,(a1)+ ; pOEP
 move.l d2,d5 ; sOEP
 swap d5 ; pOEP only
 move.w d4,d5 ; pOEP
 move.l d5,(a4)+ ; pOEP
 move.l d2,d5 ; sOEP
 swap d4 ; pOEP only
 move.w d4,d5 ; pOEP
 move.l d5,(a3)+ ; pOEP
 dbf d0,.loop ; pOEP only
 rts ; pOEP only
Meynaf's code is not much better but has more opportunities to reschedule. The 68060 has optimizations for MOVE.L which help his code. I believe I found 2 instructions which can be removed from his code as follows.

Code:
; a0=source, a1-a4=dest
 move.w #1999,d0 ; pOEP
.loop
 movem.l (a0)+,d1-d4 ; pOEP only
 move.l d1,d5 ; pOEP
 swap d1 ; pOEP only
 move.w d3,d1 ; pOEP
 swap d3 ; pOEP only
 move.l d1,(a2)+ ; pOEP
 move.w d3,d5 ; sOEP
 move.l d5,(a1)+ ; pOEP
 move.l d2,d5 ; sOEP
 swap d5 ; pOEP only
 move.w d4,d5 ; pOEP
 swap d4 ; pOEP only
 move.l d5,(a4)+ ; pOEP
 move.w d4,d2 ; sOEP
 move.l d2,(a3)+ ; pOEP
 dbf d0,.loop ; pOEP only
 rts ; pOEP only
On a superscalar CPU like the 68060 but with SWAP available in the sOEP (I believe the Apollo core would qualify), we would have much better dual execution as the following code shows.

Code:
; a0=source, a1-a4=dest
 move.w #1999,d0 ; pOEP
.loop
 movem.l (a0)+,d1-d4 ; pOEP only
 move.l d1,d5 ; pOEP
 swap d1 ; sOEP
 move.w d3,d1 ; pOEP
 swap d3 ; sOEP
 move.l d1,(a2)+ ; pOEP
 move.w d3,d5 ; sOEP
 move.l d5,(a1)+ ; pOEP
 move.l d2,d5 ; sOEP
 swap d5 ; pOEP
 move.w d4,d5 ; pOEP (dependency)
 swap d4 ; sOEP
 move.l d5,(a4)+ ; pOEP
 move.w d4,d2 ; sOEP
 move.l d2,(a3)+ ; pOEP
 dbf d0,.loop ; pOEP only
 rts ; pOEP only
Such a simple mistake as not supporting SWAP in the sOEP was very costly here. SWAP is common and simple, and its result can be forwarded, so not allowing it in the sOEP is right up there with removing the multiply with 64-bit result from the 68060 as a brain fart.

Last edited by matthey; 07 February 2017 at 16:23.

07 February 2017, 16:53   #60
litwr
Registered User
Join Date: Mar 2016
Location: Ozherele
Posts: 229
Quote:
Originally Posted by Thorham
Just benched it on a 68030 (same cycle times as 68020). It's the same speed as this:
Code:
    move.l  d0,d1
    lsr.l   #5,d1
    bset    d0,(a0,d1.w*4)
Very nice instructions those bitfield instructions.
It's nice in theory, but in practice... Motorola always had oddities like CLR, which looks fine but on the 68000 works like a slow read-modify-write instruction. I am aware that on the 68020 CLR works properly. However, this illustrates the common fact that Motorola was always too theoretical and forced users of its CPUs to use somewhat raw and bulky instructions. Bit instructions were also part of the now-forgotten NS 320xx ISA. BTW, it was the first 32-bit CPU on a single chip.
The 386 code:
Code:
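mov ebx,eax          ;2   ebx = bit number
shl ebx,5            ;3   ebx = bit number * 32
bts [esi+4*ebx],eax  ;12  with a memory operand, a register bit offset is a signed offset into a bit string (not reduced mod 32)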
17 ticks and 9 bytes.
BTW, BE is a horror! It is even worse than octal. Decimal is horrific too, but IBM implemented FP BCD in its latest mainframes... We live in an imperfect world. C'est la vie. Some people confuse external and internal representations. We have the same shame with Unicode.

Last edited by litwr; 07 February 2017 at 17:22.