Quote:
Originally Posted by meynaf
To show the example I will do something concrete.
My main position is that the 68k has a better ISA than x86. I'm not talking about the implementation.
So here's the 68k code I was reluctant to give:
Code:
draw_line
    movem.l d0/d4-d7/a0-a2,-(a7)
    movea.w #1,a0
    move.l a0,a1 ; a0=a1=1 (dir)
    tst.l d4
    bpl.s .abs1
    neg.l d4
    subq.l #2,a0 ; will be -1
.abs1
    tst.l d5
    bpl.s .abs2
    neg.l d5
    subq.l #2,a1
.abs2
    move.l d4,d6 ; x counter
    cmp.l d4,d5
    blo.s .max
    move.l d5,d6 ; y counter
.max
    move.l d6,d0 ; loop counter
    move.l d6,a2 ; save for the add-back below
    lsr.l #1,d6 ; rounding to avoid last pixel effect
    move.l d6,d7 ; d6=x cntr, d7=y cntr
    bra.s .yp
.loop
    sub.l d4,d6
    bgt.s .xp
    add.l a0,d1 ; step x
    add.l a2,d6
.xp
    sub.l d5,d7
    bgt.s .yp
    add.l a1,d2 ; step y
    add.l a2,d7
.yp
    bsr.s setpixel
    dbf d0,.loop
    movem.l (a7)+,d0/d4-d7/a0-a2
    rts
Now waiting for equivalent x86 (or whatever) version...
Just for fun I thought I would see what GCC could do with this. I grabbed the algorithm from
here and quickly hacked it to conform to your original spec:
Code:
/* putpixel is assumed to be defined elsewhere. */
extern void putpixel(int x, int y, int c);

void drawline(register int x0 asm("d1"),
              register int y0 asm("d2"),
              register int c asm("d3"),
              register int dx asm("d4"),
              register int dy asm("d5"))
{
    int p, x, y, x1;
    x=x0;
    y=y0;
    x1=x0+dx;
    p=2*dy-dx;    /* initial decision value */
    while(x<x1)
    {
        if(p>=0)
        {
            putpixel(x,y,c);
            y=y+1;
            p=p+2*dy-2*dx;
        }
        else
        {
            putpixel(x,y,c);
            p=p+2*dy;
        }
        x=x+1;
    }
}
Compiled with (-Os = optimise for size):
Code:
m68k-amigaosvasm-gcc -fomit-frame-pointer -Os -S line.c
and it generated:
Code:
_drawline:
    movem.l a3/a2/d7/d6/d5/d4/d3/d2,-(sp)
    move.l d1,d7
    move.l d1,a3
    add.l d4,a3
    add.l d5,d5
    move.l d5,d6
    sub.l d4,d6
    add.l d4,d4
    lea _putpixel,a2
_.L2:
    cmp.l d7,a3
    jgt _.L5
    movem.l (sp)+,d2/d3/d4/d5/d6/d7/a2/a3
    rts
_.L5:
    tst.l d6
    jlt _.L3
    move.l d3,-(sp)
    move.l d2,-(sp)
    move.l d7,-(sp)
    jsr (a2)
    addq.l #1,d2
    add.l d5,d6
    sub.l d4,d6
_.L6:
    lea (12,sp),sp
    addq.l #1,d7
    jra _.L2
_.L3:
    move.l d3,-(sp)
    move.l d2,-(sp)
    move.l d7,-(sp)
    jsr (a2)
    add.l d5,d6
    jra _.L6
That is similar in line count to your hand-optimised example.
I didn't have time to confirm that the C is correct (I only spent a minute on this), but it's interesting either way.
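For a closer like-for-like comparison: the C above only steps x upwards and y by +1, so it covers lines with 0 <= dy <= dx, while the quoted 68k routine handles any direction. Below is a rough sketch of the same two-counter idea in plain C. It is not a line-for-line port and has not been run through m68k GCC. The names draw_line_c and record_pixel and the plot callback are made up for illustration, and the step test uses "< 0" where the 68k listing's bgt steps on "<= 0", so tie-breaking may differ by a pixel here and there.

```c
/* Hypothetical stand-in for setpixel/putpixel: it only records what was drawn. */
int pixel_count, last_x, last_y;

void record_pixel(int x, int y, int c)
{
    (void)c;            /* colour is ignored by this recorder */
    pixel_count++;
    last_x = x;
    last_y = y;
}

/* Sketch of an any-direction line using two error counters, one per axis.
   (x, y) is the start point, (dx, dy) the signed extent, c the colour.
   plot is an assumed callback, not part of the original posts. */
void draw_line_c(int x, int y, int c, int dx, int dy,
                 void (*plot)(int, int, int))
{
    int sx = 1, sy = 1;                 /* step direction per axis */
    int n, ex, ey, i;

    if (dx < 0) { dx = -dx; sx = -1; }  /* work with |dx|, remember the sign */
    if (dy < 0) { dy = -dy; sy = -1; }  /* same for |dy| */

    n = dx > dy ? dx : dy;              /* number of steps = longer axis */
    ex = ey = n >> 1;                   /* start both counters at half, for rounding */

    plot(x, y, c);                      /* first pixel */
    for (i = 0; i < n; i++) {
        ex -= dx;
        if (ex < 0) { x += sx; ex += n; }   /* time to step in x */
        ey -= dy;
        if (ey < 0) { y += sy; ey += n; }   /* time to step in y */
        plot(x, y, c);
    }
}
```

With the recorder above, draw_line_c(0, 0, 1, 4, 2, record_pixel) plots five pixels and finishes at (4, 2). Compiling something like this with -Os would make the size comparison fairer, since it then does the same job as the hand-written routine.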