Thread: 68k details
07 September 2018, 09:12   #442
alpine9000
Registered User
 
Join Date: Mar 2016
Location: Australia
Posts: 881
Quote:
Originally Posted by meynaf
To give an example, I will do something concrete.

My main position is that 68k has a better ISA than x86. I'm not talking about the implementation.
So here's the 68k code I was reluctant to give:
Code:
; in: d1=x, d2=y, d4=dx, d5=dy (signed deltas); d1/d2 are passed on to setpixel
draw_line
 movem.l d0/d4-d7/a0-a2,-(a7)
 movea.w #1,a0
 move.l a0,a1			; a0=a1=1 (dir)
 tst.l d4
 bpl.s .abs1
 neg.l d4
 subq.l #2,a0			; will be -1
.abs1
 tst.l d5
 bpl.s .abs2
 neg.l d5
 subq.l #2,a1
.abs2
 move.l d4,d6			; x counter
 cmp.l d4,d5
 blo.s .max
 move.l d5,d6			; y counter
.max
 move.l d6,d0			; loop cntr
 move.l d6,a2			; save for addy
 lsr.l #1,d6			; rounding to avoid last pixel effect
 move.l d6,d7			; d6=x cntr, d7=y cntr
 bra.s .yp
.loop
 sub.l d4,d6
 bgt.s .xp
 add.l a0,d1			; step x
 add.l a2,d6
.xp
 sub.l d5,d7
 bgt.s .yp
 add.l a1,d2			; step y
 add.l a2,d7
.yp
 bsr.s setpixel
 dbf d0,.loop
 movem.l (a7)+,d0/d4-d7/a0-a2
 rts
Now waiting for an equivalent x86 (or whatever) version...
Just for fun, I thought I would see what GCC could do with this. I grabbed the algorithm from here and quickly hacked it to conform to your original spec:

Code:
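void putpixel(int x, int y, int c);

/* Bresenham, first octant only (0 <= dy <= dx). Plots dx pixels,
   stopping one short of the endpoint (x0+dx, y0+dy). */
void drawline(register int x0 asm("d1"),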
             register int y0 asm("d2"),
             register int c asm("d3"),
             register int dx asm("d4"),
             register int dy asm("d5"))
{
    int p, x, y, x1;

    x=x0;
    y=y0;
    x1=x0+dx;

    p=2*dy-dx;

    while(x<x1)
    {
        if(p>=0)
        {
            putpixel(x,y,c);
            y=y+1;
            p=p+2*dy-2*dx;
        }
        else
        {
            putpixel(x,y,c);
            p=p+2*dy;
        }
        x=x+1;
    }
}
Compiled with (-Os = smallest code, please):

Code:
m68k-amigaosvasm-gcc -fomit-frame-pointer -Os -S line.c
And it generated:

Code:
_drawline:
        movem.l a3/a2/d7/d6/d5/d4/d3/d2,-(sp)
        move.l d1,d7
        move.l d1,a3
        add.l d4,a3
        add.l d5,d5
        move.l d5,d6
        sub.l d4,d6
        add.l d4,d4
        lea _putpixel,a2
_.L2:
        cmp.l d7,a3
        jgt _.L5
        movem.l (sp)+,d2/d3/d4/d5/d6/d7/a2/a3
        rts
_.L5:
        tst.l d6
        jlt _.L3
        move.l d3,-(sp)
        move.l d2,-(sp)
        move.l d7,-(sp)
        jsr (a2)
        addq.l #1,d2
        add.l d5,d6
        sub.l d4,d6
_.L6:
        lea (12,sp),sp
        addq.l #1,d7
        jra _.L2
_.L3:
        move.l d3,-(sp)
        move.l d2,-(sp)
        move.l d7,-(sp)
        jsr (a2)
        add.l d5,d6
        jra _.L6
That's similar in line count to your hand-optimised example.
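For a like-for-like comparison it may help to see the 68k routine from the quote in C, since the Bresenham loop above only covers one octant. Below is a rough C sketch of the same counter-based stepping (not meynaf's code): `setpixel` here is a hypothetical stub that just records coordinates, and the comments map each variable to the 68k register it stands in for. One deliberate deviation: the counters start at `(len + 1) / 2` instead of `len / 2`. With the original halving, a one-step axis-aligned line such as dx = 0, dy = 1 starts the x counter at 0, so `sub.l d4,d6` leaves it at 0, `bgt.s .xp` is not taken and x steps as well; rounding up avoids that, and it only changes the start value when `len` is odd.

```c
#include <assert.h>

/* Hypothetical stand-in for setpixel: records pixels instead of drawing. */
#define MAX_PIXELS 1024
static int px[MAX_PIXELS], py[MAX_PIXELS];
static int npix;

static void setpixel(int x, int y, int c)
{
    (void)c;                     /* colour (d3) is not used by the stub */
    assert(npix < MAX_PIXELS);
    px[npix] = x;
    py[npix] = y;
    npix++;
}

/* Same stepping scheme as the 68k draw_line: x, y = d1, d2; c = d3;
   dx, dy = d4, d5 (signed). Plots max(|dx|, |dy|) + 1 pixels. */
static void draw_line(int x, int y, int c, int dx, int dy)
{
    int sx = 1, sy = 1;          /* a0, a1: direction, +1 or -1 */
    if (dx < 0) { dx = -dx; sx = -1; }
    if (dy < 0) { dy = -dy; sy = -1; }

    int len = dx > dy ? dx : dy; /* a2, also the dbf count in d0 */
    int ex = (len + 1) / 2;      /* d6 (the 68k code uses len / 2) */
    int ey = ex;                 /* d7 */

    setpixel(x, y, c);           /* the 68k loop is entered at .yp */
    for (int i = 0; i < len; i++) {
        ex -= dx;
        if (ex <= 0) { x += sx; ex += len; }  /* bgt.s .xp not taken */
        ey -= dy;
        if (ey <= 0) { y += sy; ey += len; }  /* bgt.s .yp not taken */
        setpixel(x, y, c);
    }
}
```

For example, `draw_line(10, 10, 1, -4, 2)` plots five pixels, from (10,10) to (6,12), including both endpoints.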

I didn't have time to confirm that the C is correct (only spent 1 minute on this), but it's interesting either way.
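A quick way to check it on a host compiler: drop the Amiga-specific `asm("dN")` register bindings, keep the loop unchanged, and supply a hypothetical `putpixel` stub that records coordinates. Tracing it this way shows the loop plots `dx` pixels and stops short of the endpoint `(x0 + dx, y0 + dy)`, whereas the 68k routine plots `max(|dx|, |dy|) + 1` pixels including the endpoint; it is also only meaningful for `0 <= dy <= dx`.

```c
#include <assert.h>

/* Hypothetical putpixel stub: records pixels instead of drawing them. */
#define MAX_PIXELS 1024
static int px[MAX_PIXELS], py[MAX_PIXELS];
static int npix;

static void putpixel(int x, int y, int c)
{
    (void)c;                     /* colour is not used by the stub */
    assert(npix < MAX_PIXELS);
    px[npix] = x;
    py[npix] = y;
    npix++;
}

/* The C drawline from this post without the asm("dN") register
   bindings. Only valid for 0 <= dy <= dx (first octant). */
static void drawline(int x0, int y0, int c, int dx, int dy)
{
    int p, x, y, x1;

    x = x0;
    y = y0;
    x1 = x0 + dx;

    p = 2 * dy - dx;

    while (x < x1) {
        if (p >= 0) {
            putpixel(x, y, c);
            y = y + 1;
            p = p + 2 * dy - 2 * dx;
        } else {
            putpixel(x, y, c);
            p = p + 2 * dy;
        }
        x = x + 1;
    }
}
```

For example, `drawline(0, 0, 1, 4, 2)` plots (0,0), (1,1), (2,1) and (3,2), but not (4,2).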