English Amiga Board

English Amiga Board (https://eab.abime.net/index.php)
-   Coders. C/C++ (https://eab.abime.net/forumdisplay.php?f=118)
-   -   GCC 6.2 toolchain for AmigaOS 3 (https://eab.abime.net/showthread.php?t=85474)

bebbo 11 January 2017 20:44

Quote:

Originally Posted by nogginthenog (Post 1133858)
Late home today due to a train drivers' strike in Southern England. 2 hours to get home from work :-(

This seems to be the cause of the problems.
Error: Unknown pseudo-op: `.section'

Confirmed that m68k-amigaos-as does not support .section

Example:
Code:

paul@debian:cd ~/source/gcc6.2/amigaos-cross-toolchain/submodules/libnix/sources/stubs/stubs

/opt/m68k-amigaos/bin/m68k-amigaos-gcc __dtor_list__.c
/tmp/cctMPpYM.s: Assembler messages:
/tmp/cctMPpYM.s:3: Error: Unknown pseudo-op:  `.section'

/opt/m68k-amigaos/bin/m68k-amigaos-gcc -S __dtor_list__.c
cat __dtor_list__.s
#NO_APP
        .globl  ___DTOR_LIST__
        .section        .bss
        .align  2
___DTOR_LIST__:
        .skip 8

I will try to investigate some more tomorrow.
Thanks for the great work, Bebbo; we appreciate it :-)


You need the patched binutils 2.14, which is included in the toolchain:
Code:

./toolchain-m68k --gcc 6 --binutils 2.14
... it's just a start.

Bebbo

alkis 18 January 2017 17:33

Can we see some generated code?

For reference, here is a small function, strcpy.c:

Code:

char *strcpy(char *dst, const char *src) {
  char *ret = dst;
  while(*dst++=*src++)
    ;
  return ret;
}

With the amiga-gcc 3.4.0 cross-compiler:
Code:

m68k-amigaos-gcc -fno-builtin -S strcpy.c -c -fverbose-asm -fomit-frame-pointer -O3
The resulting strcpy.s
Code:

_strcpy:
        movel sp@(4),a0 ;# dst, dst
        movel sp@(8),a1 ;# src, src
        movel a0,d1    ;# dst, ret
        .even
L2:
        moveb a1@+,d0  ;#, tmp36
        moveb d0,a0@+  ;# tmp36,
        jne L2  ;#
        movel d1,d0    ;# ret, <result>
        rts

And with gcc-m68k 5.4.0 (not Amiga-specific):
Code:

strcpy:
        move.l %a2,-(%sp)      |,
        move.l 8(%sp),%a0      | dst, dst
        move.l 12(%sp),%a2      | src, src
        move.l %a0,%a1  | dst, ivtmp.12
.L2:
        move.b (%a2)+,%d0      | MEM[base: src_8, offset: 4294967295B], D.1040
        move.b %d0,(%a1)+      | D.1040, MEM[base: _14, offset: 0B]
        jne .L2 |
        move.l %a0,%d0  |,
        move.l (%sp)+,%a2      |,
        rts

Can we see the relevant strcpy.s from 6.x please?
Thanks.

nogginthenog 18 January 2017 20:09

I don't have libnix built yet, but GCC 6.2.1 produces this:

Code:

_strcpy:
        move.l a2,-(sp) |,
        move.l a0,d0    | dst, dst
        move.l a0,a2    | dst, ivtmp.11
.L2:
        move.b (a1)+,d1 | MEM[base: src_8, offset: 4294967295B], _9
        move.b d1,(a2)+ | _9, MEM[base: _14, offset: 0B]
        jne .L2 |
        move.l (sp)+,a2 |,
        rts


matthey 18 January 2017 21:15

Quote:

Originally Posted by nogginthenog (Post 1135693)
I don't have libnix built yet, but GCC 6.2.1 produces this:

Code:

_strcpy:
        move.l a2,-(sp) |,
        move.l a0,d0    | dst, dst
        move.l a0,a2    | dst, ivtmp.11
.L2:
        move.b (a1)+,d1 | MEM[base: src_8, offset: 4294967295B], _9
        move.b d1,(a2)+ | _9, MEM[base: _14, offset: 0B]
        jne .L2 |
        move.l (sp)+,a2 |,
        rts


This is not using the AT&T ABI, so it is not comparable to what alkis posted. What option did you use to get it to pass arguments in registers?

The code is flawed, of course (as usual with GCC). Using register arguments as above, with a0=dst and a1=src, it should probably be the following (inlining would be better where possible):

Code:

_strcpy:
        move.l a0,d0
.L2:
        move.b (a1)+,(a0)+
        jne .L2
        rts


nogginthenog 19 January 2017 13:15

Quote:

Originally Posted by matthey (Post 1135714)
This is not using the AT&T ABI, so it is not comparable to what alkis posted. What option did you use to get it to pass arguments in registers?

Same as alkis.

matthey 19 January 2017 17:49

Quote:

Originally Posted by matthey (Post 1135714)
This is not using the AT&T ABI, so it is not comparable to what alkis posted. What option did you use to get it to pass arguments in registers?

Quote:

Originally Posted by nogginthenog (Post 1135851)
Same as alkis.

Very strange that the AT&T (stack args) ABI would not be used, especially with the same options. The Amiga Geek Gadgets guys tried to introduce a more efficient ABI using registers in the unofficial Amiga GCC versions up to 3.4.0 (good idea but buggy in my experience as was the RTD support). The official GCC maintainers refused to consider anything but the AT&T ABI for the 68k, including customizable register arguments for functions, while they supported several ABIs for the x86/x86_64 for years (regparm and fastcall among a wide selection). No new ABI was introduced even for the ColdFire although the MOV3Q was introduced primarily for popping the arguments off the stack since RTD was removed (because RTD support was buggy in GCC?). I don't know if the failed support for the 68k/CF is because of incompetence or bias but probably some of both by the GCC developers and the 68k/CF developers.

alkis 19 January 2017 17:51

Quote:

Originally Posted by matthey (Post 1135714)
...
The code is flawed of course (as usual with GCC) ..

Are you taking a stand that C compilers are flawed on the Amiga? :)

Or... what code does your favorite Amiga C compiler produce???

matthey 19 January 2017 22:37

Quote:

Originally Posted by alkis (Post 1135942)
Are you taking a stand that C compilers are flawed on the Amiga? :)

The code generation is flawed in comparison to what modern compilers are capable of.

Quote:

Originally Posted by alkis (Post 1135942)
Or... what code does your favorite Amiga C compiler produce???

Vbcc uses assembler inlines by default, giving the following code for strcpy():

Code:

  move.l a0,d0
.l1:
  move.b (a1)+,(a0)+
  bne .l1

Beautiful! Perfect! Short, sweet and inlined. However, if you compile the C code above, you get the following:

Code:

strcpy:
  movem.l a2-a3,-(sp)
  movea.l ($10,sp),a3
  movea.l ($c,sp),a2
  move.l a2,d0
  movea.l a3,a1
  addq.l #1,a3
  movea.l a2,a0
  addq.l #1,a2
  move.b (a1),(a0)
  beq.b .l2
.l1:
  movea.l a3,a1
  addq.l #1,a3
  movea.l a2,a0
  addq.l #1,a2
  move.b (a1),(a0)
  bne.b .l1
.l2:
  movem.l (sp)+,a2-a3
  rts

Doh! Epic fail! Dr. Barthelmann and his vbcc compiler are capable of more, but realistically, as I told Frank Wille, it is not going to happen without support for the 68k and the Amiga. Who wants to waste time on a dead platform for a handful of people?

Thorham 20 January 2017 01:34

Wow, that's bad. I don't think even old compilers like SAS/C produce that kind of code. Just wow :rolleyes

I certainly know which compiler to avoid now. What a shame :(

matthey 20 January 2017 02:16

Quote:

Originally Posted by Thorham (Post 1136024)
Wow, that's bad. I don't think even old compilers like SAS/C produce that kind of code. Just wow :rolleyes

I certainly know which compiler to avoid now. What a shame :(

It is embarrassing and frustrating. Vbcc can generate some of the best code and some of the worst. I don't know what happened here. It was generating better code in an earlier version. It is capable of using advanced instructions and addressing modes like move.b (a1)+,(a0)+ but then so is GCC. Maybe there is a fix already for some of the problems but more problems seem to creep back in with both vbcc and GCC. Too much complexity and not enough developers any more. Sad :sad.

Thorham 20 January 2017 02:36

At least that means SAS/C isn't useless ;)

Here's what SAS/C produces, for those who are interested.
Code:

              SECTION      text,CODE
__code:
@strcopy:
              MOVE.L        A2,-(A7)                ;2f0a
___strcopy__1:
              MOVE.L        A1,A2                    ;2449
___strcopy__2:
              MOVE.B        (A0)+,D0                ;1018
              MOVE.B        D0,(A1)+                ;12c0
              BNE.B          ___strcopy__2            ;66fa
___strcopy__3:
              MOVE.L        A2,D0                    ;200a
___strcopy__4:
              MOVE.L        (A7)+,A2                ;245f
              RTS                                    ;4e75
__const:
__strings:
              XDEF          @strcopy
              END


alkis 20 January 2017 08:28

It seems gcc 3.4.0 produces the most efficient code, then, as it avoids saving/restoring a register on the stack.

Thorham 20 January 2017 09:45

Quote:

Originally Posted by alkis (Post 1136057)
It seems gcc 3.4.0 produces the most efficient code, then, as it avoids saving/restoring a register on the stack.

For code this trivial, compilers should produce:
Code:

strcpy
    move.l  a0,d0
.loop
    move.b  (a1)+,(a0)+
    bne.s  .loop

    rts

Or better:
Code:

strcpy
    move.l  a0,d0
.loop
    move.b  (a1)+,(a0)+
    beq.s  .end
    move.b  (a1)+,(a0)+
    beq.s  .end
    move.b  (a1)+,(a0)+
    beq.s  .end
    move.b  (a1)+,(a0)+
    bne.s  .loop
.end
    rts
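The same 4x unrolling can also be written in C. This is a sketch only: `strcpy_unrolled4` is a hypothetical name, and whether a given compiler turns each step into a single memory-to-memory move.b is up to its code generator, which is what this thread is about.

```c
#include <assert.h>

/* strcpy() with the copy loop unrolled four times, mirroring the
   hand-written 68k version: each step copies one byte and leaves the
   loop as soon as the terminating NUL has been copied. */
char *strcpy_unrolled4(char *dst, const char *src)
{
    char *ret = dst;                 /* strcpy() returns the original dst */
    assert(dst != 0 && src != 0);
    for (;;) {
        if ((*dst++ = *src++) == '\0') break;
        if ((*dst++ = *src++) == '\0') break;
        if ((*dst++ = *src++) == '\0') break;
        if ((*dst++ = *src++) == '\0') break;
    }
    return ret;
}
```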


alkis 20 January 2017 10:50

Well, with

Code:

m68k-amigaos-gcc -S -O3 strcpy.c -fomit-frame-pointer -funroll-all-loops
it unrolls it 8 times

Code:

_strcpy:
        movel sp@(4),a1
        movel sp@(8),a0
        movel a1,d1
L2:
        moveb a0@+,d0
        moveb d0,a1@+
        jeq L12
        moveb a0@+,d0
        moveb d0,a1@+
        jeq L12
        moveb a0@+,d0
        moveb d0,a1@+
        jeq L12
        moveb a0@+,d0
        moveb d0,a1@+
        jeq L12
        moveb a0@+,d0
        moveb d0,a1@+
        jeq L12
        moveb a0@+,d0
        moveb d0,a1@+
        jeq L12
        moveb a0@+,d0
        moveb d0,a1@+
        jeq L12
        moveb a0@+,d0
        moveb d0,a1@+
        jne L2
        .even
L12:
        movel d1,d0
        rts

I don't understand why the peephole optimiser doesn't convert
Code:

        moveb a0@+,d0
        moveb d0,a1@+

to moveb a0@+,a1@+ since d0 is dead afterwards, but hey...

wawa 20 January 2017 11:20

Not that I understand much, but for the sake of it, here are the results from both aros68k compilers I'm currently using (GCC 6.1.0 and GCC 4.6.4):

Code:

strcpy:
        move.l 4(%sp),%d0 | dst, dst
        move.l 8(%sp),%a1 | src, src
        move.l %d0,%a0 | dst, ivtmp.11
.L2:
        move.b (%a1)+,%d1 | MEM[base: src_8, offset: 4294967295B], _9
        move.b %d1,(%a0)+ | _9, MEM[base: _14, offset: 0B]
        jne .L2 |
        rts
        .size strcpy, .-strcpy
        .ident "GCC: (GNU) 6.1.0"

Code:

strcpy:
        move.l 4(%sp),%d0 | dst, dst
        move.l 8(%sp),%a1 | src, ivtmp.9
        move.l %d0,%a0 | dst, dst
.L2:
        move.b (%a1)+,%d1 | MEM[base: D.779_19, offset: 0B], D.759
        move.b %d1,(%a0)+ | D.759, MEM[base: dst_1, offset: 0B]
        jne .L2 |
        rts
        .size strcpy, .-strcpy
        .ident "GCC: (GNU) 4.6.4"

hooverphonique 20 January 2017 14:13

Quote:

Originally Posted by Thorham (Post 1136065)
For code this trivial, compilers should produce:
Code:

strcpy
    move.l  a0,d0


what is
Code:

strcpy
    move.l  a0,d0

for?

idrougge 20 January 2017 17:15

Quote:

Originally Posted by Thorham (Post 1136065)
Code:

strcpy
    move.l  a0,d0
.loop
    move.b  (a1)+,(a0)+
    beq.s  .end
    move.b  (a1)+,(a0)+
    beq.s  .end
    move.b  (a1)+,(a0)+
    beq.s  .end
    move.b  (a1)+,(a0)+
    bne.s  .loop
.end
    rts


What is gained by unrolling one loop into four loops?

Samurai_Crow 20 January 2017 17:20

Quote:

Originally Posted by idrougge (Post 1136139)
What is gained by unrolling one loop into four loops?

Duff's device is not cache friendly and offers no speed benefit in this case.
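For reference, Duff's device is a specific way of unrolling a copy whose length is known in advance: a switch jumps into the middle of the unrolled loop body to handle the leftover bytes on the first pass. Since strcpy() does not know its length until it reaches the terminating NUL, the 4x loop above is plain unrolling with an early exit rather than Duff's device. A minimal C sketch (`duff_copy` is a hypothetical name):

```c
#include <assert.h>
#include <stddef.h>

/* Duff's device: copy `count` bytes with the loop unrolled 8x. The switch
   enters the do/while body part-way through so that the first pass copies
   count % 8 bytes (or 8 if that is 0); every later pass copies 8. */
void duff_copy(char *to, const char *from, size_t count)
{
    assert(to != NULL && from != NULL);
    if (count == 0)
        return;                      /* the device itself needs count >= 1 */
    size_t passes = (count + 7) / 8;
    switch (count % 8) {
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--passes > 0);
    }
}
```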

Thorham 20 January 2017 18:01

Quote:

Originally Posted by alkis (Post 1136075)
Well, with m68k-amigaos-gcc -S -O3 strcpy.c -fomit-frame-pointer -funroll-all-loops it unrolls it 8 times

That's something, at least.

Quote:

Originally Posted by alkis (Post 1136075)
I don't understand why the peephole optimiser doesn't convert
Code:

        moveb a0@+,d0
        moveb d0,a1@+

to moveb a0@+,a1@+ since d0 is dead afterwards, but hey...

Yeah, that's pretty shitty and makes no sense.

Quote:

Originally Posted by hooverphonique (Post 1136104)
what is
Code:

strcpy
    move.l  a0,d0

for?

D0 is the return value.

Quote:

Originally Posted by idrougge (Post 1136139)
What is gained by unrolling one loop into four loops?

On the 68020 and 68030, taken branches take 8 cycles, while not-taken byte (.s) branches take 4 cycles.
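Using the cycle counts quoted above, the branch overhead per four bytes copied works out roughly as follows (this ignores the move instructions, which are the same in both versions, and the final loop exit):

```latex
\begin{aligned}
\text{rolled loop:}\quad & 4 \times 8 = 32 \ \text{cycles (four taken \texttt{bne.s})}\\
\text{unrolled}\times 4\text{:}\quad & 3 \times 4 + 1 \times 8 = 20 \ \text{cycles (three not-taken \texttt{beq.s}, one taken \texttt{bne.s})}
\end{aligned}
```

That is a saving of 12 cycles per four bytes, or 3 cycles per byte.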

matthey 20 January 2017 18:24

Quote:

Originally Posted by alkis (Post 1136075)
Well, with m68k-amigaos-gcc -S -O3 strcpy.c -fomit-frame-pointer -funroll-all-loops it unrolls it 8 times

I expect it is generally better to inline strcpy() where it is called than to unroll strcpy() behind a costly push/bsr/rts/pop sequence (assuming strings are not excessively long). This is what vbcc does with its assembler inlines.

Quote:

Originally Posted by alkis (Post 1136075)
I don't understand why the peephole optimiser doesn't convert
Code:

        moveb a0@+,d0
        moveb d0,a1@+

to moveb a0@+,a1@+ since d0 is dead afterwards, but hey...

That is not a peephole optimization. Further analysis outside of this code snippet is needed to conclude that the optimization can be done: d0 must be dead on every path after the pair, including the branch targets (otherwise it is not an equivalent replacement). Peephole optimizations that take two instructions as input are aggressive, too. Vasm only looks at one instruction at a time (though it can output multiple instructions), and it is currently the best 68k peephole-optimizing assembler.
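To illustrate why this needs more than a two-instruction window, here is a toy C model (all names hypothetical; address operands are not modelled) that merges the load/store pair only after checking that the data register is dead on every path that follows, including branch targets, and that nothing branches to the second instruction. It treats d0 as live at rts because it carries the return value. Flags are not an issue: move.b sets the condition codes from the byte moved in both forms.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy subset of 68k instructions (hypothetical model, not a real IR). */
typedef enum {
    LOAD,    /* move.b (aS)+,dN                        */
    STORE,   /* move.b dN,(aD)+                        */
    COPY,    /* move.b (aS)+,(aD)+  (result of merge)  */
    MOVE_D,  /* move.l dS,dN                           */
    BEQ,     /* beq target, else fall through          */
    BNE,     /* bne target, else fall through          */
    RTS,     /* return; d0 carries the result          */
    NOP      /* placeholder left behind by a merge     */
} Kind;

typedef struct {
    Kind kind;
    int dread;   /* data register read (0..7 = d0..d7), or -1 */
    int dwrite;  /* data register written, or -1              */
    int target;  /* branch target index for BEQ/BNE, else -1  */
} Insn;

#define MAX_INSNS 64

/* Is `reg` read on some path from `pc` before being overwritten?
   Follows both sides of every conditional branch (depth-first). */
static bool is_live(const Insn *code, int n, int reg, int pc, bool *seen)
{
    for (;;) {
        if (pc < 0 || pc >= n)
            return true;             /* unknown code: be conservative */
        if (seen[pc])
            return false;            /* already explored from here */
        seen[pc] = true;
        const Insn *in = &code[pc];
        if (in->dread == reg)
            return true;
        if (in->kind == RTS)
            return reg == 0;         /* d0 is the return value */
        if (in->dwrite == reg)
            return false;            /* overwritten: dead on this path */
        if ((in->kind == BEQ || in->kind == BNE) &&
            is_live(code, n, reg, in->target, seen))
            return true;
        pc++;                        /* fall-through successor */
    }
}

static bool is_branch_target(const Insn *code, int n, int pc)
{
    for (int i = 0; i < n; i++)
        if ((code[i].kind == BEQ || code[i].kind == BNE) && code[i].target == pc)
            return true;
    return false;
}

/* Merge "move.b (aS)+,dN / move.b dN,(aD)+" into "move.b (aS)+,(aD)+".
   The second slot becomes a NOP so branch indices stay valid.
   Returns the number of pairs merged. */
int merge_byte_copies(Insn *code, int n)
{
    int merged = 0;
    assert(n <= MAX_INSNS);
    for (int i = 0; i + 1 < n; i++) {
        Insn *ld = &code[i], *st = &code[i + 1];
        if (ld->kind != LOAD || st->kind != STORE || ld->dwrite != st->dread)
            continue;
        if (is_branch_target(code, n, i + 1))
            continue;                /* something jumps between the two */
        bool seen[MAX_INSNS] = { false };
        if (is_live(code, n, ld->dwrite, i + 2, seen))
            continue;                /* dN still needed somewhere */
        *ld = (Insn){ COPY, -1, -1, -1 };
        *st = (Insn){ NOP, -1, -1, -1 };
        merged++;
    }
    return merged;
}
```

Modelling alkis' loop with two copies of the pair (jeq to movel d1,d0 followed by rts), both pairs can be merged. If the branch target returned d0 directly instead, the very same two instructions could not be touched, and that is exactly the information a two-instruction window does not have.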

@wawa
The aros68k GCC compiler is doing a good job here. It should be able to merge the two move.b lines inside the loop, but overall it is good.

Quote:

Originally Posted by hooverphonique (Post 1136104)
what is
Code:

strcpy
    move.l  a0,d0

for?

strcpy() returns a pointer to the destination. The return value is rarely used because the caller already has that pointer and it doesn't change, but that is how the function is defined.

Code:

char *strcpy(char *dst, const char *src);
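Because strcpy() returns its first argument, calls can be nested. A small sketch using the standard strcpy() and strcat(); `join_path` is a hypothetical helper, and the caller must supply a large enough buffer:

```c
#include <assert.h>
#include <string.h>

/* Builds "dir/file" in buf. strcpy() and strcat() both return their
   first argument, so the calls can be nested. Each strcat() has to
   rescan buf from the start to find the current end of the string. */
char *join_path(char *buf, const char *dir, const char *file)
{
    assert(buf != 0 && dir != 0 && file != 0);
    return strcat(strcat(strcpy(buf, dir), "/"), file);
}
```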
It is much more useful to return a pointer to the end of the string, as stpcpy() does.

Code:

char *stpcpy(char *dst, const char *src);

char *stpcpy(char *dst, const char *src)
{
  while ((*dst++ = *src++) != '\0')
    ;                /* copy up to and including the NUL */
  return (dst-1);    /* dst is one past the NUL; back up to it */
}

This came from SAS/C and spread to BSD; it became part of POSIX (POSIX.1-2008) but not C99 or C11.
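The end pointer is what makes stpcpy() handy for building strings: each copy can start where the previous one stopped, so nothing is rescanned as with repeated strcat(). A sketch, assuming `my_stpcpy` (the stpcpy() above, renamed so it cannot clash with a <string.h> declaration) and a hypothetical `concat_all` helper; the buffer must be large enough:

```c
#include <assert.h>

/* Same behaviour as stpcpy(): copy src including its NUL and return a
   pointer to the copied NUL in dst. */
static char *my_stpcpy(char *dst, const char *src)
{
    while ((*dst++ = *src++) != '\0')
        ;
    return dst - 1;
}

/* Concatenates a NULL-terminated list of strings into buf and returns a
   pointer to the final NUL. Every byte is written exactly once. */
char *concat_all(char *buf, const char *const *parts)
{
    char *end = buf;
    assert(buf != 0 && parts != 0);
    *end = '\0';                     /* empty list gives an empty string */
    for (; *parts != 0; parts++)
        end = my_stpcpy(end, *parts);
    return end;
}
```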


All times are GMT +2.