d0/d1 is essentially what I'm doing with d0/a1 (but without a muls; my 26-byter was doing this with a muls). You can eliminate the multiplication by increasing counter1 by counter2's value and then decreasing counter2 by 2. counter1 then holds the product.
0*512 = 0, d = +511+2
1*511 = 511, d = +511
2*510 = 1020, d = +509
3*509 = 1527, d = +507
...
becomes
511+2 0
511 0+511
509 0+511+509=1020
507 0+511+509+507=1527
...
You still have to initialize and increment both counters, so it only eliminates the multiplication (-2 bytes). But since you need the product in 2 registers (d2 and d3), you still need an additional move, so it goes back to 40 bytes. However, it's faster.
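To sanity-check the arithmetic, here is a rough sketch of the same idea in Python rather than 68k asm. The function name, the variable names and the default of 512 steps are my own assumptions for illustration, not the actual routine; `product` plays the role of counter1 and `delta` the role of counter2.

```python
def products_by_addition(n=512):
    """Yield i * (n - i) for i = 0 .. n-1 using only add/subtract.

    Hypothetical helper for illustration; n=512 is assumed from the
    0*512, 1*511, ... table above.
    """
    product = 0      # counter1: starts at 0 * n
    delta = n - 1    # counter2: first difference, 1*(n-1) - 0*n
    for _ in range(n):
        yield product
        product += delta   # increase counter1 by counter2's value
        delta -= 2         # then decrease counter2 by 2


if __name__ == "__main__":
    # Print the first few values next to the multiplied ones.
    for i, p in zip(range(4), products_by_addition()):
        print(i, p, i * (512 - i))
```

It works because consecutive products differ by (i+1)*(n-i-1) - i*(n-i) = n - 2i - 1, so the difference itself drops by 2 every step.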
Last edited by a/b; 24 March 2021 at 09:23.