07 February 2017, 09:15 | #41 |
Registered User
Join Date: Mar 2016
Location: Ozherele
Posts: 229
|
Thanks. Only this point has real importance. But it is true for right shift division only; left shift division is generally faster.
Fascinating! However, I am afraid that it may be a bit slow on the 68020. |
07 February 2017, 09:22 | #42 |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,355
|
|
07 February 2017, 09:28 | #43 |
Registered User
Join Date: Sep 2007
Location: Stockholm
Posts: 4,357
|
Just like the 68k. There are probably people out there maintaining 68k/Coldfire code in all kinds of projects, but desktop applications, games and even demos are written in high-level languages today. The percentage of PC coders who know and use x86 assembly is much smaller than the percentage of Amiga coders who know 68k assembly.
|
07 February 2017, 09:41 | #44 |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,355
|
|
07 February 2017, 09:42 | #45 |
Registered User
Join Date: Sep 2007
Location: Stockholm
Posts: 4,357
|
I never said you should seek help on a PC forum.
|
07 February 2017, 09:50 | #46 |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,355
|
|
07 February 2017, 10:11 | #47 | |
Computer Nerd
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 48
Posts: 3,847
|
Just benched it on a 68030 (same cycle times as 68020). It's the same speed as this:
Code:
	move.l	d0,d1
	lsr.l	#5,d1
	bset	d0,(a0,d1.w*4)
Thought it referred to the accessed bytes.
Quote:
It's a programming forum. Not everything on stack overflow is peecee related. Most of it is language related. |
|
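For readers following the bit-set discussion, here is a minimal C sketch of the indexing idea behind the three-instruction sequence in the post above: shift the bit number right by 5 to pick a 32-bit word, and use the low bits to pick the bit inside it. The names `bitmap_set` and `bitmap_test` are hypothetical, and numbering bits from the least significant end of each word is an assumption. On the 68k, BSET with a memory destination works on a single byte and takes the bit number modulo 8, so this models the word-indexed bitmap idea, not the exact memory access of the listing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: set bit n in a bitmap stored as 32-bit words.
   n >> 5 selects the word (the lsr.l #5 step), n & 31 the bit inside it. */
static void bitmap_set(uint32_t *bitmap, size_t words, uint32_t n)
{
    size_t index = n >> 5;                     /* word index: n / 32 */
    assert(index < words);                     /* bitmap must be large enough */
    bitmap[index] |= UINT32_C(1) << (n & 31);  /* bit position: n % 32 */
}

/* Hypothetical helper: return 1 if bit n is set, 0 otherwise. */
static int bitmap_test(const uint32_t *bitmap, size_t words, uint32_t n)
{
    size_t index = n >> 5;
    assert(index < words);
    return (int)((bitmap[index] >> (n & 31)) & 1u);
}
```

Calling `bitmap_set(bm, 4, 37)` on a zeroed four-word array sets bit 5 of `bm[1]`, since 37 = 32 + 5.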
07 February 2017, 10:20 | #48 | |
Registered User
Join Date: Sep 2007
Location: Stockholm
Posts: 4,357
|
Quote:
StackOverflow is a lot of forums. I mainly use it as a Mac/iOS programming forum, but Codegolf is… well, check for yourself instead of just being stubborn. |
|
07 February 2017, 10:27 | #49 | |
Registered User
Join Date: Mar 2016
Location: Ozherele
Posts: 229
|
Quote:
[68000] ADDI.l #,Dn: 16 cycles; MOVE.l #,Dn: 12 cycles.
The timing should be equal for LE. |
|
07 February 2017, 10:44 | #50 |
Computer Nerd
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 48
Posts: 3,847
|
You didn't suggest it, you just outright wrote it.
No, it's not. It's one forum which is part of StackExchange: http://stackexchange.com/ |
07 February 2017, 10:54 | #51 |
Registered User
Join Date: Mar 2016
Location: Ozherele
Posts: 229
|
DEC disappeared claiming octals have better readability. They didn't support hexadecimals to the end. What stupidity! Motorola made a similar mistake...
BTW StackOverflow is one of the best IT forums. You may even ask about the 8-bit Commodore there. |
07 February 2017, 12:52 | #52 | |
Registered User
Join Date: Jun 2015
Location: Germany
Posts: 1,926
|
Quote:
Code:
; a0=source, a1-a4=dest
	move.l	a0,a5
	adda.w	#2000,a5
.loop
	movem.w	(a0)+,d0-d7
	swap	d0
	swap	d1
	move.w	d4,d0
	move.w	d5,d1
	move.l	d0,(a1)+
	swap	d2
	move.l	d1,(a2)+
	move.w	d6,d2
	swap	d3
	move.l	d2,(a3)+
	move.w	d7,d3
	move.l	d3,(a4)+
	cmpa.l	a0,a5
	bpl	.loop
	rts |
|
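As a reading aid for the listing above, the data movement of one loop pass can be modelled in C. This is a sketch under assumptions: the function name `split_words` and its parameters are hypothetical, and it works on values, not on the byte layout in memory. Each pass reads eight 16-bit words w0..w7 and writes one 32-bit value to each of the four destinations. MOVEM.W sign-extends each word into its register, but the following SWAP and MOVE.W overwrite both halves, so the sign extension does not show up in the result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C model of the movem.w loop: src holds groups of eight
   16-bit words w0..w7; each group yields one 32-bit value per destination. */
static void split_words(const uint16_t *src, size_t groups,
                        uint32_t *d1, uint32_t *d2,
                        uint32_t *d3, uint32_t *d4)
{
    assert(src && d1 && d2 && d3 && d4);
    for (size_t g = 0; g < groups; g++) {
        const uint16_t *w = src + 8 * g;        /* movem.w (a0)+,d0-d7 */
        d1[g] = ((uint32_t)w[0] << 16) | w[4];  /* swap d0 / move.w d4,d0 */
        d2[g] = ((uint32_t)w[1] << 16) | w[5];  /* swap d1 / move.w d5,d1 */
        d3[g] = ((uint32_t)w[2] << 16) | w[6];  /* swap d2 / move.w d6,d2 */
        d4[g] = ((uint32_t)w[3] << 16) | w[7];  /* swap d3 / move.w d7,d3 */
    }
}
```

The point of the construction is that all reads are word-sized while every write is a single 32-bit store.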
07 February 2017, 14:02 | #53 | ||||
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,355
|
Quote:
(And if you think they're really helpful in our case, just go and ask them instead of wasting my time here.) You don't need to access more than 1 byte if you want to get 1 bit. Quote:
Quote:
Quote:
|
||||
07 February 2017, 14:12 | #54 |
Registered User
Join Date: Jun 2015
Location: Germany
Posts: 1,926
|
Well, if you change or augment the rules mid-game... I assumed the destination was in chipmem and the source in fast. Otherwise using movem wouldn't have been the best decision anyway.
|
07 February 2017, 14:20 | #55 | |
Natteravn
Join Date: Nov 2009
Location: Herford / Germany
Posts: 2,546
|
Quote:
The x86 inherited little-endian from its 8-bit predecessors, where it makes sense to read the least significant bytes first from memory when doing operations on them. But there is no reason for it on real 32/64-bit CPUs, except for compatibility with former models. Little-endian alone is a reason for me to stay away from a CPU. I never did much with ARM either, because it is mostly used in LE mode. |
|
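To make the byte-order argument in the post above concrete, here is a small C sketch. All function names are hypothetical. The first two functions show how the same 32-bit value is laid out in memory in each convention. The third shows the 8-bit case the post refers to: when a CPU adds multi-byte numbers one byte at a time, it has to start at the least significant byte so the carry can propagate, and with little-endian storage that byte sits at the lowest address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store v with the most significant byte first (big-endian, as on the 68k). */
static void store_be32(uint8_t out[4], uint32_t v)
{
    assert(out);
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}

/* Store v with the least significant byte first (little-endian, as on x86). */
static void store_le32(uint8_t out[4], uint32_t v)
{
    assert(out);
    out[0] = (uint8_t)v;
    out[1] = (uint8_t)(v >> 8);
    out[2] = (uint8_t)(v >> 16);
    out[3] = (uint8_t)(v >> 24);
}

/* Add two little-endian numbers of len bytes, one byte at a time.
   Walking upward from address 0 visits the bytes in carry order. */
static void add_le(uint8_t *acc, const uint8_t *addend, size_t len)
{
    assert(acc && addend);
    unsigned carry = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned sum = (unsigned)acc[i] + addend[i] + carry;
        acc[i] = (uint8_t)sum;   /* keep the low 8 bits */
        carry = sum >> 8;        /* pass the overflow to the next byte */
    }
}
```

With a full-width ALU the whole value is loaded and added in one operation, so the walking order no longer matters, which is the point being made about 32/64-bit CPUs.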
07 February 2017, 14:25 | #56 | |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,355
|
Quote:
The destination is in chipmem and the source in fast. It's just that reads and writes must be performed with 32-bit width, or it becomes meaningless (see my 4x move.w explanation in previous posts). |
|
07 February 2017, 14:48 | #57 |
Computer Nerd
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 48
Posts: 3,847
|
|
07 February 2017, 15:01 | #58 |
Registered User
Join Date: Jun 2015
Location: Germany
Posts: 1,926
|
Why would doing 16-bit reads from fast and 32-bit writes to chip be meaningless? I understand you are investigating code density but made the extra condition to use 32-bit moves for the writes because they are to chipmem on a 32-bit chipmem machine. Your four word-size moves example violates this condition. My code does not, and shows better code density and possibly even better speed on some 68k. Too proud to admit this?
|
07 February 2017, 16:02 | #59 | ||||
Banned
Join Date: Jan 2010
Location: Kansas
Posts: 1,284
|
Yes, and likely any other superscalar 68k CPU.
Quote:
Quote:
I wasn't criticizing your code. It was the shortest, even for the 68060.
Quote:
sOEP = secondary integer pipe
Optimum would be every other instruction being sOEP, although that is rarely possible. There are some instructions which can't be sOEP in the 68060 and don't even allow an sOEP instruction at the same time, like MOVEM, SWAP (an oversight/mistake, as it could and should have been), MUL and DIV. There isn't much room to reschedule your code. This is just the nature of the EOR exchange algorithm, which does more calculations.
Quote:
Code:
; a0=source, a1-a4=dest
	move.w	#1999,d0	; pOEP
.loop
	movem.l	(a0)+,d1-d4	; pOEP only
	move.l	d1,d5		; pOEP
	swap	d1		; pOEP only
	move.w	d3,d1		; pOEP
	swap	d3		; pOEP only
	move.l	d1,(a2)+	; pOEP
	move.w	d3,d5		; sOEP
	move.l	d5,(a1)+	; pOEP
	move.l	d2,d5		; sOEP
	swap	d5		; pOEP only
	move.w	d4,d5		; pOEP
	swap	d4		; pOEP only
	move.l	d5,(a4)+	; pOEP
	move.w	d4,d2		; sOEP
	move.l	d2,(a3)+	; pOEP
	dbf	d0,.loop	; pOEP only
	rts			; pOEP only
Code:
; a0=source, a1-a4=dest
	move.w	#1999,d0	; pOEP
.loop
	movem.l	(a0)+,d1-d4	; pOEP only
	move.l	d1,d5		; pOEP
	swap	d1		; sOEP
	move.w	d3,d1		; pOEP
	swap	d3		; sOEP
	move.l	d1,(a2)+	; pOEP
	move.w	d3,d5		; sOEP
	move.l	d5,(a1)+	; pOEP
	move.l	d2,d5		; sOEP
	swap	d5		; pOEP
	move.w	d4,d5		; pOEP (dependency)
	swap	d4		; sOEP
	move.l	d5,(a4)+	; pOEP
	move.w	d4,d2		; sOEP
	move.l	d2,(a3)+	; pOEP
	dbf	d0,.loop	; pOEP only
	rts			; pOEP only
Last edited by matthey; 07 February 2017 at 16:23. |
||||
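The register shuffling in the two movem.l listings above can be checked against a value-level C model. This is a sketch under assumptions: `split_longs` and its parameters are hypothetical names, and each source longword is treated as a high:low pair of words as a big-endian CPU would load it. Per pass, four longwords holding the words w0..w7 go in and four longwords come out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C model of the movem.l loop: src holds groups of four
   32-bit values (w0:w1, w2:w3, w4:w5, w6:w7); each group yields one
   32-bit value per destination. */
static void split_longs(const uint32_t *src, size_t groups,
                        uint32_t *d1, uint32_t *d2,
                        uint32_t *d3, uint32_t *d4)
{
    assert(src && d1 && d2 && d3 && d4);
    for (size_t g = 0; g < groups; g++) {
        const uint32_t *l = src + 4 * g;             /* movem.l (a0)+,d1-d4 */
        uint32_t a = l[0], b = l[1], c = l[2], d = l[3];
        d1[g] = (a & 0xFFFF0000u) | (c >> 16);       /* w0:w4 -> (a1)+ */
        d2[g] = (a << 16) | (c & 0x0000FFFFu);       /* w1:w5 -> (a2)+ */
        d3[g] = (b & 0xFFFF0000u) | (d >> 16);       /* w2:w6 -> (a3)+ */
        d4[g] = (b << 16) | (d & 0x0000FFFFu);       /* w3:w7 -> (a4)+ */
    }
}
```

In this model both reads and writes are 32 bits wide, and the word pairs written to each destination are w0:w4, w1:w5, w2:w6 and w3:w7.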
07 February 2017, 16:53 | #60 | |
Registered User
Join Date: Mar 2016
Location: Ozherele
Posts: 229
|
Quote:
The 386 code:
Code:
	mov ebx,eax		;2
	shl ebx,5		;3
	bts [esi+4*ebx],eax	;12
BTW, BE is a horror! It is even worse than octals. Decimals are horrific too, but IBM implemented FP BCD in its latest mainframes... We have an imperfect world. C'est la vie. Some people confuse the external and internal representations. We have the same shame with Unicode.
Last edited by litwr; 07 February 2017 at 17:22. |
|