GCC 6.2 toolchain for AmigaOS 3 - Page 2

bebbo · 11 January 2017, 20:44

Quote:

Originally Posted by nogginthenog

Late home today due to a train drivers' strike in Southern England. Two hours to get home from work :-(

This seems to be the cause of the problems.
Error: Unknown pseudo-op: `.section'

Confirmed that m68k-amigaos-as does not support .section

Example:

Code:

paul@debian:cd ~/source/gcc6.2/amigaos-cross-toolchain/submodules/libnix/sources/stubs/stubs

/opt/m68k-amigaos/bin/m68k-amigaos-gcc __dtor_list__.c
/tmp/cctMPpYM.s: Assembler messages:
/tmp/cctMPpYM.s:3: Error: Unknown pseudo-op:  `.section'

/opt/m68k-amigaos/bin/m68k-amigaos-gcc -S __dtor_list__.c
cat __dtor_list__.s
#NO_APP
        .globl  ___DTOR_LIST__
        .section        .bss
        .align  2
___DTOR_LIST__:
        .skip 8

I will try to investigate some more tomorrow.
Thanks for the great work Bebbo, we appreciate it :-)

You need the patched binutils 2.14, which is included in the toolchain:

Code:

./toolchain-m68k --gcc 6 --binutils 2.14

... it's just a start.

Bebbo

alkis · 18 January 2017, 17:33

Can we see some generated code?

For reference, here is a small function, strcpy.c:

Code:

char *strcpy(char *dst, const char *src) {
  char *ret = dst;
  while(*dst++=*src++)
    ;
  return ret;
}

With the amiga-gcc cross-compiler 3.4.0:

Code:

m68k-amigaos-gcc -fno-builtin -S strcpy.c -c -fverbose-asm -fomit-frame-pointer -O3

The resulting strcpy.s

Code:

_strcpy:
        movel sp@(4),a0 ;# dst, dst
        movel sp@(8),a1 ;# src, src
        movel a0,d1     ;# dst, ret
        .even
L2:
        moveb a1@+,d0   ;#, tmp36
        moveb d0,a0@+   ;# tmp36,
        jne L2  ;#
        movel d1,d0     ;# ret, <result>
        rts

And with gcc-m68k 5.4.0 (not for Amiga):

Code:

strcpy:
        move.l %a2,-(%sp)       |,
        move.l 8(%sp),%a0       | dst, dst
        move.l 12(%sp),%a2      | src, src
        move.l %a0,%a1  | dst, ivtmp.12
.L2:
        move.b (%a2)+,%d0       | MEM[base: src_8, offset: 4294967295B], D.1040
        move.b %d0,(%a1)+       | D.1040, MEM[base: _14, offset: 0B]
        jne .L2 |
        move.l %a0,%d0  |,
        move.l (%sp)+,%a2       |,
        rts

Can we see the relevant strcpy.s from 6.x please?
Thanks.

nogginthenog · 18 January 2017, 20:09

I don't have libnix built yet but GCC 6.2.1 produces this:

Code:

_strcpy:
        move.l a2,-(sp) |,
        move.l a0,d0    | dst, dst
        move.l a0,a2    | dst, ivtmp.11
.L2:
        move.b (a1)+,d1 | MEM[base: src_8, offset: 4294967295B], _9
        move.b d1,(a2)+ | _9, MEM[base: _14, offset: 0B]
        jne .L2 |
        move.l (sp)+,a2 |,
        rts

matthey · 18 January 2017, 21:15

Quote:

Originally Posted by nogginthenog

I don't have libnix built yet but GCC 6.2.1 produces this:

Code:

_strcpy:
        move.l a2,-(sp) |,
        move.l a0,d0    | dst, dst
        move.l a0,a2    | dst, ivtmp.11
.L2:
        move.b (a1)+,d1 | MEM[base: src_8, offset: 4294967295B], _9
        move.b d1,(a2)+ | _9, MEM[base: _14, offset: 0B]
        jne .L2 |
        move.l (sp)+,a2 |,
        rts

This is not using the AT&T ABI so is not comparable to what alkis posted. What option did you use to get it to pass arguments in registers?

The code is flawed, of course (as usual with GCC). Using register arguments as above, with a0=dst and a1=src, it should probably be the following (inlining would be better where possible):

Code:

_strcpy:
        move.l a0,d0
.L2:
        move.b (a1)+,(a0)+
        jne .L2
        rts

nogginthenog · 19 January 2017, 13:15

Quote:

Originally Posted by matthey

This is not using the AT&T ABI so is not comparable to what alkis posted. What option did you use to get it to pass arguments in registers?

Same as alkis.

matthey · 19 January 2017, 17:49

Quote:

Originally Posted by matthey

This is not using the AT&T ABI so is not comparable to what alkis posted. What option did you use to get it to pass arguments in registers?

Quote:

Originally Posted by nogginthenog

Same as alkis.

Very strange that the AT&T (stack args) ABI would not be used, especially with the same options. The Amiga Geek Gadgets guys tried to introduce a more efficient ABI using registers in the unofficial Amiga GCC versions up to 3.4.0 (a good idea, but buggy in my experience, as was the RTD support). The official GCC maintainers refused to consider anything but the AT&T ABI for the 68k, including customizable register arguments for functions, while they supported several ABIs for x86/x86_64 for years (regparm and fastcall among a wide selection). No new ABI was introduced even for the ColdFire, although the MOV3Q instruction was introduced primarily for popping the arguments off the stack since RTD was removed (because RTD support was buggy in GCC?). I don't know if the failed support for the 68k/CF is because of incompetence or bias, but it is probably some of both, on the part of the GCC developers and the 68k/CF developers.

alkis · 19 January 2017, 17:51

Quote:

Originally Posted by matthey

...
The code is flawed of course (as usual with GCC) ..

Are you taking a stand that C compilers are flawed on the Amiga?

Or... what code does your favorite Amiga C compiler produce???

matthey · 19 January 2017, 22:37

Quote:

Originally Posted by alkis

Are you taking a stand that C compilers are flawed on the Amiga?

The code generation is flawed in comparison to what modern compilers are capable of.

Quote:

Originally Posted by alkis

Or... what code does your favorite Amiga C compiler produce???

Vbcc uses assembler inlines by default, giving the following code for strcpy():

Code:

   move.l a0,d0
.l1:
   move.b (a1)+,(a0)+
   bne .l1

Beautiful! Perfect! Short, sweet and inlined. However, if you compile the C code above, you get the following:

Code:

strcpy:
   movem.l a2-a3,-(sp)
   movea.l ($10,sp),a3
   movea.l ($c,sp),a2
   move.l a2,d0
   movea.l a3,a1
   addq.l #1,a3
   movea.l a2,a0
   addq.l #1,a2
   move.b (a1),(a0)
   beq.b .l2
.l1:
   movea.l a3,a1
   addq.l #1,a3
   movea.l a2,a0
   addq.l #1,a2
   move.b (a1),(a0)
   bne.b .l1
.l2:
   movem.l (sp)+,a2-a3
   rts

Doh! Epic fail! Dr. Barthelmann and his vbcc compiler are capable of more, but realistically, as I told Frank Wille, it is not going to happen without support for the 68k and Amiga. Who wants to waste time on a dead platform for a handful of people?

Thorham · 20 January 2017, 01:34

Wow, that's bad. I don't think even old compilers like SAS/C produce that kind of code. Just wow

I certainly know which compiler to avoid now. What a shame

matthey · 20 January 2017, 02:16

Quote:

Originally Posted by Thorham

Wow, that's bad. I don't think even old compilers like SAS/C produce that kind of code. Just wow

I certainly know which compiler to avoid now. What a shame

It is embarrassing and frustrating. Vbcc can generate some of the best code and some of the worst. I don't know what happened here; it was generating better code in an earlier version. It is capable of using advanced instructions and addressing modes like move.b (a1)+,(a0)+, but then so is GCC. Maybe there is already a fix for some of the problems, but more problems seem to creep back in with both vbcc and GCC. Too much complexity and not enough developers any more. Sad.

Thorham · 20 January 2017, 02:36

At least that means SAS/C isn't useless

Here's what SAS/C produces, for those who are interested.

Code:

              SECTION      text,CODE
__code:
@strcopy:
              MOVE.L         A2,-(A7)                 ;2f0a 
___strcopy__1:
              MOVE.L         A1,A2                    ;2449 
___strcopy__2:
              MOVE.B         (A0)+,D0                 ;1018 
              MOVE.B         D0,(A1)+                 ;12c0 
              BNE.B          ___strcopy__2            ;66fa 
___strcopy__3:
              MOVE.L         A2,D0                    ;200a 
___strcopy__4:
              MOVE.L         (A7)+,A2                 ;245f 
              RTS                                     ;4e75 
__const:
__strings:
              XDEF           @strcopy
              END

alkis · 20 January 2017, 08:28

Seems like gcc-3.4.0 produces the most efficient code then, as it avoids saving/restoring a register to the stack.

Thorham · 20 January 2017, 09:45

Quote:

Originally Posted by alkis

Seems like gcc-3.4.0 produces the most efficient code then, as it avoids saving/restoring a register to the stack.

For code this trivial compilers should produce:

Code:

strcpy
    move.l  a0,d0
.loop
    move.b  (a1)+,(a0)+
    bne.s   .loop

    rts

Or better:

Code:

strcpy
    move.l  a0,d0
.loop
    move.b  (a1)+,(a0)+
    beq.s   .end
    move.b  (a1)+,(a0)+
    beq.s   .end
    move.b  (a1)+,(a0)+
    beq.s   .end
    move.b  (a1)+,(a0)+
    bne.s   .loop
.end
    rts

alkis · 20 January 2017, 10:50

Well, with

Code:

m68k-amigaos-gcc -S -O3 strcpy.c -fomit-frame-pointer -funroll-all-loops

it unrolls it 8 times

Code:

_strcpy:
        movel sp@(4),a1
        movel sp@(8),a0
        movel a1,d1
L2:
        moveb a0@+,d0
        moveb d0,a1@+
        jeq L12
        moveb a0@+,d0
        moveb d0,a1@+
        jeq L12
        moveb a0@+,d0
        moveb d0,a1@+
        jeq L12
        moveb a0@+,d0
        moveb d0,a1@+
        jeq L12
        moveb a0@+,d0
        moveb d0,a1@+
        jeq L12
        moveb a0@+,d0
        moveb d0,a1@+
        jeq L12
        moveb a0@+,d0
        moveb d0,a1@+
        jeq L12
        moveb a0@+,d0
        moveb d0,a1@+
        jne L2
        .even
L12:
        movel d1,d0
        rts

I don't understand why the peephole optimiser doesn't convert

Code:

        moveb a0@+,d0
        moveb d0,a1@+

to moveb a0@+,a1@+ since d0 is dead afterwards, but hey...

wawa · 20 January 2017, 11:20

Not that I understand much, but for the sake of it, here are the results from both aros68k compilers I'm currently using:

Code:

strcpy:
        move.l 4(%sp),%d0       | dst, dst
        move.l 8(%sp),%a1       | src, src
        move.l %d0,%a0  | dst, ivtmp.11
.L2:
        move.b (%a1)+,%d1       | MEM[base: src_8, offset: 4294967295B], _9
        move.b %d1,(%a0)+       | _9, MEM[base: _14, offset: 0B]
        jne .L2 |
        rts
        .size   strcpy, .-strcpy
        .ident  "GCC: (GNU) 6.1.0"

Code:

strcpy:
        move.l 4(%sp),%d0       | dst, dst
        move.l 8(%sp),%a1       | src, ivtmp.9
        move.l %d0,%a0  | dst, dst
.L2:
        move.b (%a1)+,%d1       | MEM[base: D.779_19, offset: 0B], D.759
        move.b %d1,(%a0)+       | D.759, MEM[base: dst_1, offset: 0B]
        jne .L2 |
        rts
        .size   strcpy, .-strcpy
        .ident  "GCC: (GNU) 4.6.4"

hooverphonique · 20 January 2017, 14:13

Quote:

Originally Posted by Thorham

For code this trivial compilers should produce:

Code:

strcpy
    move.l  a0,d0

what is

Code:

strcpy
    move.l  a0,d0

for?

idrougge · 20 January 2017, 17:15

Quote:

Originally Posted by Thorham

Code:

strcpy
    move.l  a0,d0
.loop
    move.b  (a1)+,(a0)+
    beq.s   .end
    move.b  (a1)+,(a0)+
    beq.s   .end
    move.b  (a1)+,(a0)+
    beq.s   .end
    move.b  (a1)+,(a0)+
    bne.s   .loop
.end
    rts

What is gained by unrolling one loop into four loops?

Samurai_Crow · 20 January 2017, 17:20

Quote:

Originally Posted by idrougge

What is gained by unrolling one loop into four loops?

Loop unrolling of this kind (the idea behind Duff's device) is not cache-friendly and offers no speed benefit in this case.

Thorham · 20 January 2017, 18:01

Quote:

Originally Posted by alkis

Well, with m68k-amigaos-gcc -S -O3 strcpy.c -fomit-frame-pointer -funroll-all-loops it unrolls it 8 times

That's something, at least.

Quote:

Originally Posted by alkis

I don't understand why the peephole optimiser doesn't convert

Code:

        moveb a0@+,d0
        moveb d0,a1@+

to moveb a0@+,a1@+ since d0 is dead afterwards, but hey...

Yeah, that's pretty shitty and makes no sense.

Quote:

Originally Posted by hooverphonique

what is

Code:

strcpy
    move.l  a0,d0

for?

D0 is the return value.

Quote:

Originally Posted by idrougge

What is gained by unrolling one loop into four loops?

On the 68020 and 68030, taken branches cost 8 cycles while not-taken byte branches cost 4 cycles, so in the unrolled loop only one branch in four is taken while copying.

matthey · 20 January 2017, 18:24

Quote:

Originally Posted by alkis

Well, with

Code:

m68k-amigaos-gcc -S -O3 strcpy.c -fomit-frame-pointer -funroll-all-loops

it unrolls it 8 times

Code:

_strcpy:
        movel sp@(4),a1
        movel sp@(8),a0
        movel a1,d1
L2:
        moveb a0@+,d0
        moveb d0,a1@+
        jeq L12
        moveb a0@+,d0
        moveb d0,a1@+
        jeq L12
        moveb a0@+,d0
        moveb d0,a1@+
        jeq L12
        moveb a0@+,d0
        moveb d0,a1@+
        jeq L12
        moveb a0@+,d0
        moveb d0,a1@+
        jeq L12
        moveb a0@+,d0
        moveb d0,a1@+
        jeq L12
        moveb a0@+,d0
        moveb d0,a1@+
        jeq L12
        moveb a0@+,d0
        moveb d0,a1@+
        jne L2
        .even
L12:
        movel d1,d0
        rts

I expect it is generally better to inline strcpy() where it is called than to unroll strcpy() after a costly push/bsr/rts/pop (assuming strings are not excessively long). This is what vbcc does with its assembler inlines.

Quote:

Originally Posted by alkis

I don't understand why the peephole optimiser doesn't convert

Code:

        moveb a0@+,d0
        moveb d0,a1@+

to moveb a0@+,a1@+ since d0 is dead afterwards, but hey...

That is not a peephole optimization. Further analysis outside of this code snippet is needed to conclude that the optimization can be done (it is not guaranteed to be an equivalent replacement). Peephole optimizations that take two instructions as input are aggressive too. Vasm only looks at one instruction at most (though it can output multiple instructions), and it is currently the best 68k peephole optimizing assembler.

@wawa
The aros68k GCC compiler is doing a good job here. It should be able to merge the two move.b instructions inside the loop, but overall it is good.

Quote:

Originally Posted by hooverphonique

what is

Code:

strcpy
    move.l  a0,d0

for?

strcpy() returns a pointer to the destination. The return value is rarely used because the caller already has that pointer and it does not change, but that is how the function is defined.

Code:

char *strcpy(char *dst, const char *src);

It is much more useful to return a pointer to the end of the copied string (its terminating NUL), as stpcpy() does.

Code:

char *stpcpy(char *dst, const char *src);

char *stpcpy(char *dst, const char *src)
{
    while ((*dst++ = *src++) != '\0')
        ;
    return dst - 1;   /* points at the copied terminating NUL */
}

This originated with Lattice C (the predecessor of SAS/C) on the Amiga, spread to GNU and BSD, and became part of POSIX (POSIX.1-2008) but not C99 or C11.
