02 September 2014, 14:14 | #81 |
XoXo/Tasko Developer
Join Date: Dec 2013
Location: Munich
Age: 48
Posts: 450
|
Ah! And is this new method as accurate as the divs-version?
|
02 September 2014, 14:23 | #82 |
Glastonbridge Software
Join Date: Jan 2012
Location: Edinburgh/Scotland
Posts: 2,243
|
with the multiplication by 258 it should be accurate to 2 parts in 65535 (255*255*258/256 = 65533.0078...)
that should be good enough for anyone |
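To put a number on that claim, here is a minimal C sketch. It assumes the new method replaces the division by 255 with a multiply by 258 followed by a 16-bit right shift (a swap on the 68000). The names `div255_approx` and `max_div255_error` are made up for illustration and do not come from this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Approximate x / 255 as (x * 258) >> 16.
   258/65536 is slightly larger than 1/255, so the result either
   matches the exact quotient or is one too high. */
uint32_t div255_approx(uint32_t x)
{
    assert(x <= 255u * 255u);   /* only claimed for products of two bytes */
    return (x * 258u) >> 16;
}

/* Largest difference from exact integer division over every
   possible product of two 8-bit values (0 .. 255*255). */
uint32_t max_div255_error(void)
{
    uint32_t worst = 0;
    for (uint32_t x = 0; x <= 255u * 255u; x++) {
        uint32_t err = div255_approx(x) - x / 255u;
        if (err > worst)
            worst = err;
    }
    return worst;
}
```

Under these assumptions the approximation is never more than one step away from the exact quotient, and 255*255 still maps to 255.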
02 September 2014, 14:27 | #83 |
XoXo/Tasko Developer
Join Date: Dec 2013
Location: Munich
Age: 48
Posts: 450
|
But it's more accurate and faster than the method with the lsr #8 (instead of the div by 255), right? An lsr takes twice as many cycles as the number of bits shifted.
|
02 September 2014, 15:11 | #84 | |
Computer Nerd
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 48
Posts: 3,810
|
Quote:
On the 68000 shifts take up more cycles per bit. |
|
02 September 2014, 15:13 | #85 |
Glastonbridge Software
Join Date: Jan 2012
Location: Edinburgh/Scotland
Posts: 2,243
|
on 68000, lsr takes 6+2n cycles, swap takes 4 cycles, so yes it is both more accurate and faster.
on 68020+, however, lsr and swap take the same time. But then 68000 doesn't have muls.l anyway. muls.l is much slower than muls.w, too. We could instead multiply alpha by 129 and use a muls.w followed by a left shift. |
02 September 2014, 15:20 | #86 |
XoXo/Tasko Developer
Join Date: Dec 2013
Location: Munich
Age: 48
Posts: 450
|
|
02 September 2014, 15:31 | #87 |
Glastonbridge Software
Join Date: Jan 2012
Location: Edinburgh/Scotland
Posts: 2,243
|
well it would be either lsl #1 followed by swap, or lsr #7 and lsr #8. We'd need to shift right 15 bits instead of 16.
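As a sanity check on the two variants, here is a small C sketch. It assumes alpha is 0..254 (255 being handled separately) and the pixel difference is -255..255. `high_word`, `scale_258`, `scale_129` and `variants_agree` are illustrative names, not routines from this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Upper 16 bits of a 32-bit value read as a signed number, i.e.
   what swap leaves in the low word: floor(v / 65536). Written
   without shifting negative values so the C stays portable. */
int32_t high_word(int32_t v)
{
    return v >= 0 ? v / 65536 : -((-v + 65535) / 65536);
}

/* 258 variant: alpha*258 can exceed 32767, so it does not fit a
   signed word and this models a 32-bit multiply followed by swap. */
int32_t scale_258(int32_t diff, int32_t alpha)
{
    assert(alpha >= 0 && alpha <= 254 && diff >= -255 && diff <= 255);
    return high_word(diff * (alpha * 258));
}

/* 129 variant: alpha*129 <= 32766 fits a signed word, so a 16x16
   multiply is enough; doubling stands in for lsl.l #1 before swap. */
int32_t scale_129(int32_t diff, int32_t alpha)
{
    assert(alpha >= 0 && alpha <= 254 && diff >= -255 && diff <= 255);
    int32_t factor = alpha * 129;      /* at most 254*129 = 32766 */
    return high_word(diff * factor * 2);
}

/* Returns 1 if both variants give the same result for every input. */
int variants_agree(void)
{
    for (int32_t alpha = 0; alpha <= 254; alpha++)
        for (int32_t diff = -255; diff <= 255; diff++)
            if (scale_258(diff, alpha) != scale_129(diff, alpha))
                return 0;
    return 1;
}
```

The doubling followed by taking the upper word is an overall right shift of 15, which is why the two forms produce identical results.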
|
02 September 2014, 15:40 | #88 |
XoXo/Tasko Developer
Join Date: Dec 2013
Location: Munich
Age: 48
Posts: 450
|
Ok, I tried it this way:
Code:
moveq #0,d0
moveq #0,d2
move.b d2,d0
lsl.w #7,d2
add.w d0,d2
moveq #0,d0
move.b (a0)+,d0
move.b (a1),d1
sub.l d1,d0
muls.w d2,d0
lsl.l #1,d0
swap d0
add.w d1,d0
move.b d0,(a1)+
|
02 September 2014, 15:57 | #89 |
XoXo/Tasko Developer
Join Date: Dec 2013
Location: Munich
Age: 48
Posts: 450
|
|
02 September 2014, 15:59 | #90 |
Glastonbridge Software
Join Date: Jan 2012
Location: Edinburgh/Scotland
Posts: 2,243
|
can't see why not. it would help to have a more detailed description of what isn't working.
those first two moveq #0s are inside the loop, right?
Multiplying by 258 or multiplying by 129 followed by a left shift should be the same. I hope that seems obvious. Lsl is no different to Asl, the sign is only important in right shifts. 254*129 = 32766 < 32768 so that shouldn't overflow. 255 is handled elsewhere. hmm...
Last edited by Mrs Beanbag; 02 September 2014 at 16:05. |
02 September 2014, 16:09 | #91 |
XoXo/Tasko Developer
Join Date: Dec 2013
Location: Munich
Age: 48
Posts: 450
|
Yes they are inside. Here's what it looks like. None of the transparent pixels are drawn.
|
02 September 2014, 16:13 | #92 |
Glastonbridge Software
Join Date: Jan 2012
Location: Edinburgh/Scotland
Posts: 2,243
|
and the correct result?
|
02 September 2014, 16:23 | #93 |
XoXo/Tasko Developer
Join Date: Dec 2013
Location: Munich
Age: 48
Posts: 450
|
Here:
|
02 September 2014, 16:35 | #94 |
Registered User
Join Date: Dec 2013
Location: Lake Havasu City, AZ
Posts: 741
|
If you guys don't mind burning 64K of memory, this would be way faster using a simple lookup table.
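A minimal C sketch of one way such a table could look. The layout is an assumption, since the post does not specify one: 256 alpha rows by 256 values, one byte each, which comes to exactly 64K. `mul_table`, `init_mul_table` and `blend_lut` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* 256 * 256 bytes = 64K: mul_table[a][v] = round(a * v / 255). */
static uint8_t mul_table[256][256];

/* Fill the table once, before any blending is done. */
void init_mul_table(void)
{
    for (unsigned a = 0; a < 256; a++)
        for (unsigned v = 0; v < 256; v++)
            mul_table[a][v] = (uint8_t)((a * v + 127u) / 255u);
}

/* Blend one 8-bit channel with two table reads and one add,
   so there is no multiply or divide in the per-pixel path. */
uint8_t blend_lut(uint8_t src, uint8_t dst, uint8_t alpha)
{
    unsigned sum = mul_table[alpha][src] + mul_table[255 - alpha][dst];
    assert(sum <= 255u);   /* the two rounded terms never add up past 255 */
    return (uint8_t)sum;
}
```

Whether this actually beats the multiply on a given CPU depends on memory and cache behaviour, so it would need to be measured.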
|
02 September 2014, 17:07 | #95 |
XoXo/Tasko Developer
Join Date: Dec 2013
Location: Munich
Age: 48
Posts: 450
|
How can I measure the speed? It's running in an OS app. Reading the seconds and micros fields in IntuitionBase did not reveal much. The routine seems to be too fast for Intuition to update those fields in between.
|
02 September 2014, 17:19 | #96 |
Registered User
Join Date: Dec 2010
Location: Athens/Greece
Age: 53
Posts: 720
|
call it a gazillion times
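A rough sketch of that idea in plain standard C: run the routine many times and divide the total by the iteration count. It uses `clock()` from `<time.h>`; how fine-grained that is in a particular Amiga compiler's runtime is an assumption that would need checking. `time_per_call` and `dummy_blend` are made-up names, and `dummy_blend` is only a stand-in for the real drawing routine.

```c
#include <assert.h>
#include <time.h>

/* Stand-in workload; the real blend routine would go here. */
static volatile unsigned long call_count;

void dummy_blend(void)
{
    call_count++;
}

/* Run work() `iterations` times and return the average CPU time
   per call in seconds, so a routine too fast for a coarse timer
   still adds up to a measurable total. */
double time_per_call(void (*work)(void), long iterations)
{
    assert(work != 0 && iterations > 0);
    clock_t start = clock();
    for (long i = 0; i < iterations; i++)
        work();
    clock_t end = clock();
    return ((double)(end - start) / CLOCKS_PER_SEC) / (double)iterations;
}
```

For example, `time_per_call(dummy_blend, 3000)` runs the stand-in 3000 times; the iteration count should be raised until the total time is comfortably larger than the timer resolution.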
|
02 September 2014, 17:38 | #97 |
Glastonbridge Software
Join Date: Jan 2012
Location: Edinburgh/Scotland
Posts: 2,243
|
so where is D2 being read from memory?
|
02 September 2014, 17:41 | #98 |
XoXo/Tasko Developer
Join Date: Dec 2013
Location: Munich
Age: 48
Posts: 450
|
AH! It was cleared after the read! Thus all partially transparent pixels became totally transparent. I am now measuring the speed.
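For reference, a hedged C model of what the per-pixel arithmetic should compute once alpha is loaded before it is scaled. It assumes one 8-bit channel and alpha in 0..254 (255 being handled elsewhere); `blend_129` is an illustrative name, and the instruction notes in the comments are an approximate mapping rather than a literal translation.

```c
#include <assert.h>
#include <stdint.h>

/* One channel of the blend, following the instruction order:
   scale alpha by 129, take the difference, multiply, double,
   keep the upper word and add the destination back. */
uint8_t blend_129(uint8_t src, uint8_t dst, uint8_t alpha)
{
    assert(alpha <= 254);                           /* 255 is handled elsewhere */
    int32_t factor = (int32_t)alpha * 128 + alpha;  /* lsl.w #7 then add.w: alpha*129 */
    int32_t diff   = (int32_t)src - (int32_t)dst;   /* sub.l d1,d0 */
    int32_t prod   = diff * factor * 2;             /* muls.w d2,d0 then lsl.l #1 */
    /* swap: upper word = floor(prod / 65536), without shifting a negative */
    int32_t high = prod >= 0 ? prod / 65536 : -((-prod + 65535) / 65536);
    return (uint8_t)(dst + high);                   /* add.w d1,d0 then move.b */
}
```

If the factor is stuck at 0, as happens when the alpha register is cleared after being loaded, this returns `dst` unchanged for every pixel, which matches the symptom of no transparent pixels being drawn.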
|
02 September 2014, 18:01 | #99 |
XoXo/Tasko Developer
Join Date: Dec 2013
Location: Munich
Age: 48
Posts: 450
|
It works, but it turns out to be 5 seconds slower (66 vs 71) at 3000 iterations, at least here in FS-UAE.
ps: I think this is because for the emulation there is no difference between muls.w and muls.l, so the additional lsl.l just adds time. @Toni Wilen: What do you mean?
Last edited by AGS; 02 September 2014 at 18:09. |
02 September 2014, 18:38 | #100 |
XoXo/Tasko Developer
Join Date: Dec 2013
Location: Munich
Age: 48
Posts: 450
|
@Mrs Beanbag
I compared speed with the original muls/divs version and found that the muls/divs version is faster than the new optimized variant by 3 seconds.
And additionally, when we draw the same alpha picture onto the screen over and over again, the result is surprising. Left is the variant with muls and divs, and right the new optimized variant. Seems like something is going wrong?
Last edited by AGS; 02 September 2014 at 18:59. |
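One thing that might be worth checking here (a guess, nothing in this thread confirms it) is how the two variants round when the pixel difference is negative. DIVS truncates toward zero, while taking the upper word after a multiply rounds toward minus infinity. The C sketch below compares the two steps; `step_divs` and `step_swap` are hypothetical names, and it assumes the optimized variant is arithmetically equal to multiplying by alpha*258 and keeping the upper word.

```c
#include <assert.h>
#include <stdint.h>

/* muls/divs style: C integer division truncates toward zero,
   the same way DIVS does. */
int32_t step_divs(int32_t diff, int32_t alpha)
{
    assert(alpha >= 0 && alpha <= 255);
    return diff * alpha / 255;
}

/* multiply/swap style: the upper word is floor(x / 65536),
   so negative results are rounded away from zero. */
int32_t step_swap(int32_t diff, int32_t alpha)
{
    assert(alpha >= 0 && alpha <= 254);
    int32_t prod = diff * alpha * 258;
    return prod >= 0 ? prod / 65536 : -((-prod + 65535) / 65536);
}
```

With a difference of -1 and alpha 1, `step_divs` gives 0 but `step_swap` gives -1, while for +1 both give 0. If the same picture is drawn repeatedly, such an asymmetry could let darker source pixels keep pulling the destination down while brighter ones stall, which might explain part of the visible difference.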