#121 | |
Registered User
Join Date: May 2016
Location: Rostock/Germany
Posts: 132
|
Quote:
Code:
        move    #999,d0
.loop
        load    (a0)+,E0        ;a0 b0 c0 d0 (.w)
        load    (a0)+,E1        ;a1 b1 c1 d1
        load    (a0)+,E2        ;a2 b2 c2 d2
        load    (a0)+,E3        ;a3 b3 c3 d3
        transhi E0-E3,E4:E5     ;E4: a0 a1 a2 a3 E5: b0 b1 b2 b3
        translo E0-E3,E6:E7     ;E6: c0 c1 c2 c3 E7: d0 d1 d2 d3
                                ;TRANS has latency, 1 cyc lost in this example
        store   E4,(a1)+        ;
        store   E5,(a2)+        ;
        store   E6,(a3)+        ;
        store   E7,(a4)+        ;inner loop assembles to 10 * 32 Bit
        dbf     d0,.loop        ;plus move, dbf = 12 * 32 Bit
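The comments in the loop above describe a 4x4 transpose of words: four rows (a, b, c, d) go in, and four columns come out. As a rough model only, assuming nothing about the real instructions beyond what those comments say, the data movement can be sketched in Python. The function name `transpose4x4` is made up for this illustration.

```python
def transpose4x4(rows):
    """Turn four 4-element rows into four 4-element columns."""
    if len(rows) != 4 or any(len(r) != 4 for r in rows):
        raise ValueError("expected a 4x4 block")
    # zip(*rows) pairs up the first items, then the second items, and so on.
    return [list(col) for col in zip(*rows)]

# One row per "load", labelled the same way as the comments in the loop.
rows = [
    ["a0", "b0", "c0", "d0"],   # E0
    ["a1", "b1", "c1", "d1"],   # E1
    ["a2", "b2", "c2", "d2"],   # E2
    ["a3", "b3", "c3", "d3"],   # E3
]
e4, e5, e6, e7 = transpose4x4(rows)
print(e4)   # the "a" column, as in the transhi comment
print(e7)   # the "d" column, as in the translo comment
```

This only models where each word ends up; it says nothing about timing or encoding size.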
#122 |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,355
|
Unfortunately for you this thread is about code density. And your example shows no benefit at all in this area.
Last edited by prowler; 01 March 2017 at 22:11. Reason: Cleanup. |
#123 |
Banned
Join Date: Aug 2005
Location: London / Sydney
Age: 47
Posts: 20,420
|
Why do all these threads keep going downhill; it's usually the same people arguing???
I really don't have time to be reading through 7 pages of bickering between you guys... I need to prepare for a new job over the weekend so... Closed for now until another GM has time to review.
#124 |
Global Moderator
Join Date: Aug 2008
Location: Sidcup, England
Posts: 10,300
|
Done, thanks Damien!
Thread reopened. Now, let's try again, shall we, guys?
Last edited by prowler; 03 March 2017 at 22:24. Reason: typo.
#125 |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,355
|
Ok, let's try again.
I have a new case to submit. It's a complicated one, to be done, of course, in a minimal amount of code. This is a real-life case, but discussing the details would probably lead to endless OT. Here is pseudo-code, as I guess an explanation wouldn't be clear enough:
Code:
 flag=0
start:
 struct = data          ; some rel(pc) array of structs with sizeof =8
loop:
 x = struct[5] >>4      ; -- -- -- -- -- x-
 v1=table[x*2]          ; attn: table too far for d8(pc,ix) but ok for d16(pc)
 v2=table[x*2 +1]
 if flag and v2>=0 goto skip    ; v2>=0 is bit #7 test
 cc = (v2<0)            ; set condition (passed to call func via some reg)
 if flag then cc=0
 var = (v1&128) + (v2&15)       ; but bits 6,5,4 of var are "don't care"
 call func (struct,cc,var)
skip:                   ; value in var is unimportant if we skip
 struct = struct +8     ; next item (sizeof =8)
 if struct[0]<>0 or struct[1]<>0 or struct[2]<>0 or struct[3]<>0 goto loop
 if flag goto error
 flag = 1
 goto start
error:
Note: pseudo-code is of course not optimal. Use as few registers as you can.
Last edited by meynaf; 03 March 2017 at 13:36. Reason: fixed two mistakes in pseudo code
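For anyone who wants to check an assembly attempt against the pseudo-code, here is a small Python reference model. It is only a sketch of how the pseudo-code reads: the names `scan` and `func`, the use of byte strings for `data` and `table`, and the test data are all assumptions made for this illustration, and the returned string simply stands in for reaching the `error:` label.

```python
def scan(data, table, func):
    """Two-pass walk over 8-byte records, mirroring the pseudo-code.

    data  -- bytes: 8-byte records, ended by a record whose first
             four bytes are all zero
    table -- bytes: (v1, v2) pairs indexed by the high nibble of byte 5
    func  -- hypothetical callback, called as func(offset, cc, var)
    """
    for flag in (False, True):
        pos = 0
        while True:
            x = data[pos + 5] >> 4
            v1 = table[x * 2]
            v2 = table[x * 2 + 1]
            v2_negative = bool(v2 & 0x80)      # "v2<0" is a bit #7 test
            if not flag or v2_negative:        # otherwise: goto skip
                cc = v2_negative and not flag  # second pass forces cc=0
                var = (v1 & 128) + (v2 & 15)   # bits 6,5,4 are don't care
                func(pos, cc, var)
            pos += 8                           # next item (sizeof = 8)
            if not any(data[pos:pos + 4]):     # four zero bytes end the list
                break
    return "error"                             # both passes fell through

calls = []
data = bytes([1, 0, 0, 0, 0, 0x10, 0, 0,    # record 0: x = 1
              1, 0, 0, 0, 0, 0x20, 0, 0,    # record 1: x = 2
              0, 0, 0, 0, 0, 0, 0, 0])      # terminator
table = bytes([0x00, 0x00,    # x = 0 (unused here)
               0x80, 0x03,    # x = 1: v2 >= 0
               0x00, 0x85])   # x = 2: v2 < 0
result = scan(data, table, lambda pos, cc, var: calls.append((pos, cc, var)))
print(calls)
```

In this model the first pass calls `func` for every record, and the second pass only for records whose v2 has bit 7 set, always with cc cleared.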
#126 | ||||
Registered User
Join Date: Mar 2016
Location: Ozherele
Posts: 229
|
Sorry for the delay, I was a bit sick.
Quote:
I can also mention the very expensive PDP/VAX-11 ISA. Moto used it as a pattern. It is interesting that the "mighty" VAX-11/730 can be outperformed by a 6502 @4MHz! Look at the pi-spigot results for proof. Moto was too close to this madness.
Quote:
Quote:
Quote:
It is too big to be a sport event. |
#127 | |||||||
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,355
|
Quote:
BSET isn't slow. And you can't criticize Moto for this one - x86 has had it since the 386. Starting with the 68040, both are just 1 clock.
Quote:
x86 is full of bugs (like the old opcode F0 0F C7 C8 that simply hung the first Pentiums, like a 02 on the 6502). The 80286 was so buggy that it was impossible to get out of protected mode...
Quote:
Quote:
Quote:
And the user looking in data files will find them very unreadable if in LE. I did it just too many times. Now you want to check some file contents. Like a WAV. Even though the format itself is LE, being LE will still cause trouble when you compare e.g. "RIFF" (52494646). On 68k it's a simple cmpi.l #"RIFF",(a0)+. On x86 "RIFF" may well translate to 46464952 and then you get wrong code.
Quote:
Quote:
Big asm code shouldn't be a problem with a decent instruction set. And this is the problem with x86's so-called good code density. It's good only for very small code - as soon as it becomes larger, it starts to suck. HOMM2 on x86 is 1.5MB of code, on 68k it's 0.9MB (even though the compiler did a really poor job - it could have been half that size). For your "sport event", it needs to be big enough to put some pressure on the register file. What would a c2p look like on x86, for example?
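On the four-character tag comparison mentioned earlier in this post: the four ASCII bytes of "RIFF" give a different 32-bit constant depending on the byte order used to read them. The short Python sketch below shows both values, and a byte-wise check that does not depend on byte order at all. The helper name `has_tag` is made up for this illustration.

```python
tag = b"RIFF"

# The same four bytes read as one 32-bit number in each byte order.
as_big_endian = int.from_bytes(tag, "big")
as_little_endian = int.from_bytes(tag, "little")
print(f"{as_big_endian:08X}")      # 52494646
print(f"{as_little_endian:08X}")   # 46464952

def has_tag(buf, wanted):
    """Compare the first bytes of buf with a tag, byte by byte."""
    return buf[:len(wanted)] == wanted

# A WAV header starts with "RIFF", a little-endian size field, then "WAVE".
header = b"RIFF" + (36).to_bytes(4, "little") + b"WAVE"
print(has_tag(header, b"RIFF"))
```

Whether an assembler accepts a quoted four-character constant and which order it emits the bytes in depends on the tool, so the constant is worth checking in a hex dump.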
#128 | ||
Registered User
Join Date: Mar 2016
Location: Ozherele
Posts: 229
|
Quote:
Quote:
I agree some assemblers for LE may have this problem, but it is just representation. There is no problem writing "RIFF" with a properly configured x86 assembler. ML level is the level of the legendary coders of the 50s...
Last edited by litwr; 05 May 2017 at 19:44.
#129 | |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,355
|
Of course it had to be just a write but this is only an implementation problem.
I'm not saying the implementation of 68k is good. I'm just saying the instruction set is good (at least, good enough for asm use). And anyway this was fixed in 68020 (or 68010? I don't remember). Damned English. Again, it's just an old 68000 problem.
Quote:
Now... perhaps I should remind you that this is a thread about code density. So when will you write code?
#130 |
Banned
Join Date: Jan 2010
Location: Kansas
Posts: 1,284
|
Dr. Vince Weaver finally updated his code density web site and documentation using some improvements suggested in this thread.
http://www.deater.net/weave/vmwprod/asm/ll/
The 68k has the best code density for the LZSS decoder and 2nd best for total size. Thanks to Vince, to all who contributed code and to the 68k designers for one of the greatest CPU architectures of all time.
#131 | |
Registered User
Join Date: May 2014
Location: inside the emulator
Posts: 377
|
Quote:
Note that the ARM and x86 code can be improved, probably the 68k and others too. |
#132 | ||
Banned
Join Date: Jan 2010
Location: Kansas
Posts: 1,284
|
Quote:
Quote:
http://eab.abime.net/showthread.php?t=85474
http://eab.abime.net/showthread.php?t=87205
#133 |
Registered User
Join Date: May 2014
Location: inside the emulator
Posts: 377
|
The GCC patches are promising I must say, haven't looked at VBCC lately but any improvement is nice.
Given how clean the 68k architecture is, it's strange IMHO that compilers should have any problem generating good code. A compiler that uses the (mostly legacy) quirky instructions of x86 instead of a "RISC" subset having problems, sure - but 68k? The only real "problem" is the split D and A register sets and that's not too hard to work with...
--
I have never been good at compression/decompression code, however the LZSS decompression code in the logo routine(s) feels odd. "Feels" is the right word as I haven't really analysed it.
#134 | |
Banned
Join Date: Jan 2010
Location: Kansas
Posts: 1,284
|
Vince added RISC-V ISAs to the comparison.
http://www.deater.net/weave/vmwprod/asm/ll/
He said there was room for improvement for RISC-V code. So far the results come up a little short of the code density hype and claims, although RV64C appears to have pretty good code density for a 64 bit CPU. RV64C is beating arm64 (ARMv8 AArch64). All other RISC-V variants are unimpressive in code density and make me wonder why they even bothered.
Quote:
#135 |
Computer Nerd
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 48
Posts: 3,840
|
#136 |
Registered User
Join Date: May 2014
Location: inside the emulator
Posts: 377
|
Of course :P
However, even on x86 with all support and "optimization" (quotes for a reason...) the generated code quality is generally lacking, especially for size optimized code. Enable vectorization and increase optimization -> useless vectorization of scalar integer code that is extremely bloated and runs slower than the most naive integer code due to setup overheads. Beginning to think I'm getting older - as I'm starting to long for less complex compilers...
#137 |
Moderator
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,655
|
Intel will win because it still supports special-purpose 8-bit CPU instructions that do more than other 8-bit CPUs. 16-bit is fluffier with the exception of mul/div, and 32-bit RISC CPUs are the worst. Even with Thumb they can't quite get there. There are 64-bit etc. CPUs too, ofc ;-)
#138 |
Registered User
Join Date: May 2013
Location: Grimstad / Norway
Posts: 852
|
Back to that peculiar obsession with the size of the LZSS decompression loop. I sat down and tinkered with how I would do it natively if size was all I cared about and I could arrange data as I like. 34 bytes:
Code:
get_bits
        move.b  (a3)+,d5
get_bit
        roxr.b  #1,d5
        beq.s   get_bits
        bcs.s   string
literal
        move.b  (a3)+,(a2)+
        bra.s   check
string
        move.w  (a3)+,d0        ; 4(negative) + 12(negative)
        move.w  d0,d1
        or.w    d2,d0           ; $F000
        sub.w   d3,d1           ; 2<<12
        lea     (a2,d0.w),a0
copyloop
        move.b  (a0)+,(a2)+
        add.w   d4,d1           ; 1<<12
        bcc.s   copyloop
check
        cmp.l   a2,a1           ; swap order? might need to adjust a1 by 1
        bcc.s   get_bit
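For readers less familiar with LZSS, here is a generic decoder sketched in Python. It is not a translation of the loop above. The layout is an assumption chosen to resemble it: one control byte per 8 tokens, read from the low bit up, a set bit meaning a 16-bit big-endian match token (4-bit length field, 12-bit distance field), a clear bit meaning a literal byte. Unlike the asm, the fields here are stored as plain positive numbers, and `MIN_MATCH` and the function name are made up for this illustration.

```python
MIN_MATCH = 2  # assumed: shortest match worth spending a 16-bit token on

def lzss_decode(src, out_len):
    """Decode src until out_len bytes have been produced."""
    out = bytearray()
    pos = 0
    while len(out) < out_len:
        control = src[pos]              # 8 flag bits, consumed LSB first
        pos += 1
        for bit in range(8):
            if len(out) >= out_len:
                break
            if (control >> bit) & 1:    # match: copy from earlier output
                word = (src[pos] << 8) | src[pos + 1]
                pos += 2
                length = (word >> 12) + MIN_MATCH
                distance = (word & 0x0FFF) + 1
                start = len(out) - distance
                # Byte-by-byte copy, so a match may overlap its own output.
                for i in range(length):
                    out.append(out[start + i])
            else:                       # literal: copy one byte through
                out.append(src[pos])
                pos += 1
    return bytes(out[:out_len])

# Three literals, then one match: distance 3, length 6.
packed = bytes([0x08, ord("a"), ord("b"), ord("c"), 0x40, 0x02])
unpacked = lzss_decode(packed, 9)
print(unpacked)

# One literal, then a match with distance 1: behaves like run-length coding.
run = lzss_decode(bytes([0x02, ord("x"), 0x30, 0x00]), 6)
print(run)
```

The byte-by-byte copy matters: a block copy would break matches that overlap the bytes they are producing, which is the same reason the asm uses a simple `copyloop`.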
#139 | |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 54
Posts: 4,491
|
Quote:
To be fair, D2, D3 and A1 need to be initialized and consume code space (with an escape token A1 can be omitted). And this code does not work on 68000 machines.
[EDIT: there is even a subtle initialization bug..]
Regards, ross
Last edited by ross; 19 August 2017 at 23:49. Reason: []
#140 |
Registered User
Join Date: May 2013
Location: Grimstad / Norway
Posts: 852
|
I know, but I said it was about the loop itself. That was the only thing that was counted for some reason, and so I cut out all init etc.
If I cared about a more realistic total code size then I would arrange it differently. Chances are I would care about speed too. And if you are willing to drop in-buffer overwrite de-compression then you can separate literals and control/length+distance bits and read out the control bits ("get_bits") 16 at a time.
Last edited by NorthWay; 20 August 2017 at 03:03.