English Amiga Board - Optimizing polygonfill bitcopy

Page 3 of 6


TCH	26 November 2019 23:25

@a/b, @Don_Adan:
Thanks, this latest version is finally faster than the C one, by ~3.6%.

deimos

27 November 2019 11:38

Quote:

Originally Posted by TCH (Post 1361610)

@a/b, @Don_Adan:
Thanks, this latest version is finally faster than the C one, by ~3.6%.

Am I missing something, or is the C version within 3 to 4% of the current best assembly version? Which C compiler and what flags?

Antiriad_UK

27 November 2019 11:49

Makes me want to ditch assembler and go to C (which I'm better at anyway, lol).

ross	27 November 2019 12:17

This is a rather particular case, because the compiler generates perfect code in the main loop.
But yes, the latest GCC is very good on 68k (I only looked at the generated code; laziness kept me from installing it for Amiga).
I assume it's GCC, because I compiled on x86 and the generated ASM code is pretty much the same :)

Don_Adan

27 November 2019 12:33

You can check this version if you want, but it's probably the same speed.

Code:

_PolygonBitmapToPlanes32:
        movem.l d2-d7/a2-a6,-(a7)

        move.l  d2,a6
        add.l   a6,a6
        add.l   a6,a6           ; a6 = Modulo<<2 = (BitplaneSize-Width)<<2
        add.w   d1,d2           ; d2 = Modulo+Width = BitplaneSize
        mulu.w  d4,d2
        lsl.l   #2,d2           ; longwords to bytes
        sub.l   d3,d2           ; d2 = Depth*BitplaneSize-RowSize
        subq.w  #1,d0           ; Height--;
        subq.w  #1,d1           ; Width--;
        subq.w  #1,d4           ; Depth--;
        swap    d4
        move.w  d1,d4           ; d4 = Depth-1 (high word) | Width-1 (low word)

        ext.l   d0              ; clears the high word only if d0.w >= 0; if it can be negative, use and.l #$ffff,d0
c_h:
        swap    d0              ; low word = pattern offset
        move.w  d0,a4
        add.l   a2,a4           ; a4 = pattern row for this line
        eor.w   #8<<2,d0        ; alternate between 0 and 8<<2
        move.l  d4,d1
        swap    d1              ; PlaneCounter = Depth-1;
c_p:
        movea.l a1,a3           ; SrcPtr = TempArea;
        move.l  (a4)+,a5        ; CurrentPattern
        move.w  d4,d5           ; WidthCounter = Width-1;
c_w:
        move.l  (a0),d7
        move.l  a5,d6
        eor.l   d7,d6
        and.l   (a3)+,d6
        eor.l   d7,d6
        move.l  d6,(a0)+        ; *DestPtr++ = (*DestPtr&~Temp)|(CurrentPattern&Temp);
        dbf     d5,c_w          ; if (--WidthCounter >= 0) goto c_w;

        adda.l  a6,a0           ; DestArea += (BitplaneSize-Width)<<2;
        dbf     d1,c_p          ; if (--PlaneCounter >= 0) goto c_p;

        sub.l   d2,a0           ; DestArea += RowSize-Depth*BitplaneSize;
        adda.l  d3,a1           ; TempArea += RowSize;
        swap    d0              ; row counter back to the low word for dbf
        dbf     d0,c_h          ; if (--Height >= 0) goto c_h;

        movem.l (a7)+,d2-d7/a2-a6
        rts
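For comparison with the C version discussed in the thread, here is a hypothetical C rendering of what the routine above does. It is not TCH's original C code, which is not shown on this page; the function name, parameter order and types are assumptions inferred from the register comments (d0 = Height, d1 = Width in longwords, d2 = Modulo in longwords, d3 = RowSize in bytes, d4 = Depth, a0/a1/a2 = destination, mask and pattern pointers, with the pattern table assumed to be two rows of 8 longwords because of the `eor.w #8<<2,d0`). The loops use the count-down style that maps naturally onto dbf.

```c
#include <stdint.h>

/* Hypothetical C rendering of _PolygonBitmapToPlanes32 (names are made up).
   width and modulo are in longwords (modulo = BitplaneSize - width),
   row_size is in bytes, patterns holds two rows of 8 longwords.
   height, width and depth must be >= 1, as with the dbf loops. */
void polygon_bitmap_to_planes32(int16_t height, int16_t width,
                                uint32_t modulo, uint32_t row_size,
                                int16_t depth, uint32_t *dest,
                                const uint32_t *temp,
                                const uint32_t *patterns)
{
    int16_t h = height - 1;              /* dbf-style counters: count-1 .. 0 */
    unsigned pattern_offset = 0;         /* 0 or 8 longwords, like eor.w #8<<2,d0 */
    do {
        const uint32_t *pat = patterns + pattern_offset;
        uint32_t *d = dest;
        int16_t p = depth - 1;
        do {
            const uint32_t *src = temp;  /* same mask row for every plane */
            uint32_t pattern = *pat++;
            int16_t w = width - 1;
            do {
                uint32_t old = *d;
                /* eor/and/eor merge == (old & ~mask) | (pattern & mask) */
                *d++ = ((pattern ^ old) & *src++) ^ old;
            } while (--w >= 0);
            d += modulo;                 /* same row, next bitplane */
        } while (--p >= 0);
        /* next row: both destination and mask advance by row_size bytes */
        dest = (uint32_t *)((uint8_t *)dest + row_size);
        temp = (const uint32_t *)((const uint8_t *)temp + row_size);
        pattern_offset ^= 8;
    } while (--h >= 0);
}
```

The xor/and/xor merge is the same trick the assembly uses: a set mask bit selects the pattern bit and a clear one keeps the old destination bit, without needing an inverted copy of the mask.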

TCH	27 November 2019 12:50

@deimos:
Yes, bebbo's GCC 6 produces these fast results with -O2.

@Don_Adan:
It's around 0.1% faster than your previous version, thanks.

deimos

27 November 2019 13:10

Quote:

Originally Posted by TCH (Post 1361691)

@deimos:
Yes, bebbo's GCC 6 produces these fast results with -O2.

I'd be interested to see if GCC 8.3 can do even better. Would it be hard to try?

TCH	27 November 2019 13:30

No idea. Where can I get GCC 8.3?

ross	27 November 2019 13:33

Quote:

Originally Posted by TCH (Post 1361696)

No idea. Where can I get GCC 8.3?

https://github.com/BartmanAbyss/vscode-amiga-debug

Antiriad_UK

27 November 2019 13:35

In the Bartman lecture, he said they went through various compilers and that 8.3 was pretty sweet: https://www.twitch.tv/videos/468413972?t=02h20m09s

I'd probably still stick with assembler; it's part of the charm of retro coding on an A500 :)

ross	27 November 2019 13:37

Quote:

Originally Posted by Antiriad_UK (Post 1361698)

I'd probably still stick with assembler; it's part of the charm of retro coding on an A500 :)

This :agree

deimos

27 November 2019 13:37

Quote:

Originally Posted by TCH (Post 1361696)

No idea. Where can I get GCC 8.3?

What ross said, but here's the original thread about it too: http://eab.abime.net/showthread.php?t=98525

It's not as established as bebbo's, and I'm not sure if I find the VS Code integration all that useful, but 8 > 6?

Don_Adan

27 November 2019 14:14

Quote:

Originally Posted by TCH (Post 1361691)

@deimos:
Yes, bebbo's GCC 6 produces these fast results with -O2.

@Don_Adan:
It's around 0.1% faster than your previous version, thanks.

Interesting. Something must be 2-4 cycles faster: (SP) vs. 2 swaps, and/or lea vs. add/move?

TCH	27 November 2019 14:35

@ross, @deimos:

This seems to be Windows-only. Am I missing something?

@Don_Adan:
I think it's because you eliminated the stack operations for d2. As for d0, it can be negative, as it is a coordinate.
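Whether d0.w can be negative matters because the routine keeps the pattern offset in the high word of d0 and the row counter in the low word, swapping them every row. `ext.l` sign-extends, so a low word with bit 15 set fills the high word with $FFFF and the first pattern offset becomes -1 instead of 0; `and.l #$ffff` zero-extends and always starts at offset 0. A minimal C model of that, treating a data register as a `uint32_t` (the helper names are made up for illustration):

```c
#include <stdint.h>

/* Models a 68k data register as a uint32_t. */

static uint32_t swap_words(uint32_t r)     /* like swap Dn */
{
    return (r << 16) | (r >> 16);
}

static uint32_t ext_l(uint32_t r)          /* like ext.l Dn: sign-extend low word */
{
    return (r & 0x8000u) ? (r | 0xFFFF0000u) : (r & 0xFFFFu);
}

static uint32_t and_ffff(uint32_t r)       /* like and.l #$ffff,Dn: zero-extend */
{
    return r & 0xFFFFu;
}

/* Pattern offset the loop sees after the first swap at c_h. */
static uint16_t first_pattern_offset(uint32_t d0)
{
    return (uint16_t)swap_words(d0);
}
```

With d0.w = $FFFE, `ext.l` yields a first offset of $FFFF, while `and.l #$ffff` yields 0, which is why the comment in the code suggests switching instructions if d0.w can be negative.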

deimos

27 November 2019 14:37

Quote:

Originally Posted by TCH (Post 1361709)

@ross, @deimos:

This seems to be Windows-only. Am I missing something?

No, all the cool kids use Windows nowadays.

ross	27 November 2019 15:10

Quote:

Originally Posted by TCH (Post 1361709)

@ross, @deimos:

This seems to be Windows-only. Am I missing something?

No, you have to suffer :)

hooverphonique

27 November 2019 15:24

Quote:

Originally Posted by deimos (Post 1361695)

I'd be interested to see if GCC 8.3 can do even better. Would it be hard to try?

According to Bartman, it can: after benchmarking GCC 4, 6 and 8 (I think; or maybe it was 6/7/8), he concluded that only 8 was good enough for demomaking, hence the GCC 8.3 VS Code project.

EDIT: Basically what Antiriad said :D

a/b	27 November 2019 15:28

Here is the thing with C vs. asm. And it's not meant as a critique of Bartman, the original poster, or anyone else. It's good to have more people working on the Amiga, regardless of the language.
Take a simple 16->1 loop. You can write it in C in at least 16 different ways: for, while, do/while, predecrement, postdecrement, ...
It *does* matter. You cannot simply write C code and assume the compiler will produce optimal code. You kind of have to give it hints, as has been demonstrated in this thread and the other one (XOR fill optimization), by adjusting the counter to lead the compiler to use dbf.
And experienced asm coders generally do that on the fly. They see what the 'optimal' code should look like in asm and write similar C constructs. Less experienced people don't do that, and the output can be moderately slower, even if the compiler is pretty good.
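As an illustration of that point (not code from the thread), here are two equivalent ways to write a 16-iteration loop in C. The second mirrors dbf's "decrement, loop until -1" shape with a 16-bit counter preloaded with count-1, which is the kind of hint described above. Whether a particular GCC build actually emits dbf for either form depends on the version and flags, so check the generated assembly (for example with `-S`).

```c
#include <stdint.h>

/* Up-counting loop: natural in C; the compiler may keep an index and
   compare it against the limit, and may or may not turn it into dbf. */
uint32_t sum_for(const uint32_t *src)
{
    uint32_t sum = 0;
    for (int i = 0; i < 16; i++)
        sum += src[i];
    return sum;
}

/* dbf-shaped loop: 16-bit counter starts at count-1, is decremented at
   the bottom, and the loop runs until it reaches -1. */
uint32_t sum_dbf_style(const uint32_t *src)
{
    uint32_t sum = 0;
    int16_t n = 16 - 1;
    do {
        sum += *src++;
    } while (--n != -1);
    return sum;
}
```

Both functions compute the same result; only the loop shape offered to the compiler differs.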

deimos

27 November 2019 15:32

Quote:

Originally Posted by TCH (Post 1361709)

This seems to be Windows-only. Am I missing something?

If you don't do Windows, and if your code can run and output two numbers (C vs. asm) for a valid comparison, then I don't mind doing it for you, as long as it's that easy.

deimos

27 November 2019 15:40

Quote:

Originally Posted by a/b (Post 1361720)

Here is the thing with C vs. asm. And it's not meant as a critique of Bartman, the original poster, or anyone else. It's good to have more people working on the Amiga, regardless of the language.
Take a simple 16->1 loop. You can write it in C in at least 16 different ways: for, while, do/while, predecrement, postdecrement, ...
It *does* matter. You cannot simply write C code and assume the compiler will produce optimal code. You kind of have to give it hints, as has been demonstrated in this thread and the other one (XOR fill optimization), by adjusting the counter to lead the compiler to use dbf.
And experienced asm coders generally do that on the fly. They see what the 'optimal' code should look like in asm and write similar C constructs. Less experienced people don't do that, and the output can be moderately slower, even if the compiler is pretty good.

Been there.

Even us sub-optimal people can rewrite our C code so that mostly decent assembly is produced. But we need constant reminders to not use complex indexes into arrays instead of pointers that increment, modulos instead of working around the end of arrays, etc. etc.

But it usually only matters for the small percentage of code that makes up the hot spots, which is easy to forget.
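To make the "pointers instead of indexes, wrap instead of modulo" advice concrete, here is a small sketch (not from the thread) that sums `count` entries of a ring buffer starting at `start`. The first version computes a modulo on every access; on a plain 68000, which has no 32-bit divide instruction, a `size_t` modulo typically means a library call per element. The second walks a pointer and wraps it explicitly at the end of the array. Function names are hypothetical, and the wrap version assumes `start < size`.

```c
#include <stddef.h>
#include <stdint.h>

/* Index plus modulo on every access: simple, but a division per element. */
uint32_t ring_sum_modulo(const uint16_t *buf, size_t size,
                         size_t start, size_t count)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < count; i++)
        sum += buf[(start + i) % size];
    return sum;
}

/* Incrementing pointer with an explicit wrap at the end of the array.
   Assumes start < size. */
uint32_t ring_sum_wrap(const uint16_t *buf, size_t size,
                       size_t start, size_t count)
{
    const uint16_t *p = buf + start;
    const uint16_t *end = buf + size;
    uint32_t sum = 0;
    while (count--) {
        sum += *p++;
        if (p == end)
            p = buf;            /* wrap around instead of using % */
    }
    return sum;
}
```

Both give the same sums; the second replaces the per-element division with a compare and an occasional pointer reset.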

All times are GMT +2.
