23 August 2014, 12:43 | #41 | |
old bearded fool
Join Date: Jan 2010
Location: Bangkok
Age: 56
Posts: 775
|
Quote:
Didn't know there was a system call for it. |
|
23 August 2014, 13:41 | #42 |
Join Date: Jul 2008
Location: Sweden
Posts: 2,269
|
Personally I would skip the BCD and just do long-form division on 32-bit integers. You can extract up to 4 digits with each run of 2 divisions, so it's reasonably fast. Finding the digits by means of subtracting 10^n is even faster, and both methods are trivial to extend for integers of any length.
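For anyone who wants to see the subtract-10^n idea spelled out, here is a minimal C sketch. It is only a model under assumptions: the names `u32_to_dec_sub` and `powers_of_ten` are made up for illustration, and a 68k version would walk a table of longwords with cmp/sub rather than index an array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the thread): writes the decimal digits of
   value into buf using only compares and subtractions, no division.
   buf must have room for 10 digits plus the terminating zero. */
static void u32_to_dec_sub(uint32_t value, char *buf)
{
    static const uint32_t powers_of_ten[10] = {
        1000000000u, 100000000u, 10000000u, 1000000u, 100000u,
        10000u, 1000u, 100u, 10u, 1u
    };
    int started = 0;
    char *out = buf;

    assert(buf != NULL);

    for (int i = 0; i < 10; i++) {
        char digit = '0';
        /* count how many times this power of ten fits (at most 9) */
        while (value >= powers_of_ten[i]) {
            value -= powers_of_ten[i];
            digit++;
        }
        /* skip leading zeros, but always emit the units digit */
        if (digit != '0' || started || i == 9) {
            *out++ = digit;
            started = 1;
        }
    }
    *out = '\0';
}
```

Each digit costs at most nine subtractions, and extending to longer integers only means a longer table and multi-word compare/subtract.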
|
23 August 2014, 14:11 | #43 | |
old bearded fool
Join Date: Jan 2010
Location: Bangkok
Age: 56
Posts: 775
|
Quote:
Could you explain that with some sample code? Whenever I search it's usually crappy little endian x86 code showing up, or Atmel. |
|
23 August 2014, 14:40 | #44 |
Join Date: Jul 2008
Location: Sweden
Posts: 2,269
|
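For example, like this:
Code:
Print
        move.l  sp, a0          ; A0 = end of the buffer
        sub     #12, sp         ; reserve 12 bytes on the stack
        sf      -(a0)           ; write the terminating zero byte
.loop
        clr.l   d1
        swap    d0
        move.w  d0, d1          ; D1 = high word of the value
        divu.w  #10, d1         ; D1 = remainder:quotient
        move.w  d1, d0          ; store the high quotient
        swap    d0
        move.w  d0, d1          ; D1 = remainder:low word
        divu.w  #10, d1
        move.w  d1, d0          ; D0 = value / 10
        swap    d1              ; remainder is the next digit
        add.b   #'0', d1
        move.b  d1, -(a0)
        tst.l   d0
        bne     .loop
        ; A0 now points to ASCII string
        ; (use it here, before the buffer is released)
        add     #12, sp
        rts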
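For readers wondering why the two divisions never overflow, here is a rough C model of the routine above. It is a sketch under assumptions, not a translation: `div10_step` and `u32_to_dec` are invented names, and the caller supplies the buffer instead of it living on the stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Models one pass of the loop: divides *value by 10 with two 32-by-16-bit
   divisions (what DIVU.W provides) and returns the remainder. */
static unsigned div10_step(uint32_t *value)
{
    assert(value != NULL);

    uint32_t hi = *value >> 16;          /* swap d0 / move.w d0,d1 */
    uint32_t lo = *value & 0xFFFFu;

    uint32_t q_hi = hi / 10u;            /* first divu.w #10,d1 */
    uint32_t r_hi = hi % 10u;

    /* the remainder becomes the high word of the second dividend;
       since r_hi < 10 the second quotient always fits in 16 bits */
    uint32_t mid  = (r_hi << 16) | lo;
    uint32_t q_lo = mid / 10u;           /* second divu.w #10,d1 */
    uint32_t r    = mid % 10u;

    *value = (q_hi << 16) | q_lo;
    return (unsigned)r;
}

/* Fills buf from the end, like move.b d1,-(a0) in the routine.
   buf must hold 11 bytes; returns a pointer to the first digit. */
static char *u32_to_dec(uint32_t value, char *buf)
{
    char *p = buf + 10;
    *p = '\0';
    do {
        *--p = (char)('0' + div10_step(&value));
    } while (value != 0);
    return p;
}
```

The same chaining works for integers of any length: keep feeding each remainder in as the high word of the next 16-bit chunk.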
23 August 2014, 14:45 | #45 |
old bearded fool
Join Date: Jan 2010
Location: Bangkok
Age: 56
Posts: 775
|
Thanks! It was when/how to do the 'swap' I needed to see, will test soon, still fiddling with alkis code.
|
15 September 2014, 22:02 | #46 | |
68k
Join Date: Sep 2005
Location: Somewhere
Posts: 828
|
Quote:
Code:
        lea     score(pc),a0    ;8c
        move.b  (a0),d3         ;8c
        moveq   #$f,d1          ;4c
        and.b   d1,d3           ;4c
        move.w  (a0)+,d0        ;8c, now a0 points to digits
        move.b  d0,d2           ;4c
        and.w   d1,d2           ;4c
        lsr.w   #4,d0           ;14c
        and.w   d0,d1           ;4c
        move.b  d3,d0           ;4c

;if ascii then uncomment (take extra 20c)
;
;       move.w  #$3030,d3       ;8c
;       add.w   d3,d0           ;4c
;       add.b   d3,d1           ;4c
;       add.b   d3,d2           ;4c

        move.w  d0,(a0)+        ;8c
        move.b  d1,(a0)+        ;8c
        move.b  d2,(a0)+        ;8c

; = 86c (or 106c with ascii version)

score:  dc.w    $1234           ; score in bcd format
digits: dc.l    0,0
|
15 September 2014, 23:07 | #47 |
Registered User
Join Date: Jan 2012
Location: USA
Posts: 372
|
If the blitter is expected to run, or if bitplane DMA uses more than 4 planes in a chip/slow RAM system, it might be interesting to compare algorithms by how often they touch memory rather than by how many cycles they take.
Leffman's code has some time-consuming DIVUs, but it leaves plenty of DMA cycles free. Somewhat related: MULS and DIVS instructions can be paired, with a MULS immediately followed by a DIVS, to create a nice memory-access-free window of up to 238 cycles, provided that all data is already in Dx registers. This is possible because the MULS instruction prefetches the DIVS instruction before beginning internal execution, while the DIVS instruction does its prefetch cycle at the end, following its internal execution.
16 September 2014, 09:57 | #48 | |
68k
Join Date: Sep 2005
Location: Somewhere
Posts: 828
|
Quote:
For example, this routine has one read and one write and takes 114c:
Code:
        lea     score(pc),a0
        move.w  (a0)+,d0
        move.w  #$f0f0,d1
        and.w   d0,d1           ; d1 = high nibble of each byte
        eor.w   d1,d0           ; d0 = low nibble of each byte
        move.w  d1,d2           ; .w rather than .b so no stale bits of d2 survive the rol
        rol.w   #4,d2
        ror.w   #4,d1
        move.b  d0,d2
        ror.w   #8,d0
        move.b  d0,d1
        swap    d1
        move.w  d2,d1

;for ascii uncomment (extra 16c)
;       add.l   #$30303030,d1

        move.l  d1,(a0)+

;=114c (with ascii take 130c)

score:  dc.w    $1234
digits: dc.l    0,0
Last edited by Asman; 16 September 2014 at 10:09. Reason: added ascii version
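To check what ends up in the long that gets written, here is a small C sketch of the result the routine above is meant to produce. It models the outcome, not the instruction sequence; `bcd16_spread` and `store_be32` are hypothetical names, and the explicit byte store stands in for the 68000's big-endian move.l.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Spreads the four nibbles of a packed-BCD word into the four bytes of a
   long, e.g. 0x1234 becomes 0x01020304. With ascii != 0 each byte is
   biased by '0', like the optional add.l #$30303030,d1. */
static uint32_t bcd16_spread(uint16_t bcd, int ascii)
{
    uint32_t b = bcd;
    uint32_t spread = ((b & 0xF000u) << 12)
                    | ((b & 0x0F00u) << 8)
                    | ((b & 0x00F0u) << 4)
                    |  (b & 0x000Fu);
    if (ascii)
        spread += 0x30303030u;   /* no carries: every byte is at most 9 */
    return spread;
}

/* Stores the long most significant byte first, as the 68000 would. */
static void store_be32(uint32_t v, uint8_t *out)
{
    assert(out != NULL);
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}
```

Because every digit sits in its own byte, the ASCII bias can be added to all four at once with a single 32-bit add.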
|
16 September 2014, 11:38 | #49 |
Glastonbridge Software
Join Date: Jan 2012
Location: Edinburgh/Scotland
Posts: 2,243
|
Of course, we can divide by 10,000 with a divu.w for a maximum long input of 655359999 (beyond that the quotient no longer fits in 16 bits), then convert each half of the result separately (i.e. a recursive approach).
Also, one could divide by 1,000,000 by shifting right by 4 and then dividing by 62500; I'll leave correcting the remainder as an exercise for the student.
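A quick C sketch of both tricks, for anyone who wants to check the numbers. The function names are invented for illustration, and the remainder fix-up in `divmod_million` is one possible answer to the exercise, not necessarily the one the poster had in mind.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Splits value into two 4-digit halves with a single 32/16 division.
   On the 68000, divu.w overflows if the quotient exceeds 65535, hence
   the input limit of 65535 * 10000 + 9999 = 655359999. */
static void split_10000(uint32_t value, uint16_t *high, uint16_t *low)
{
    assert(high != NULL && low != NULL);
    assert(value <= 655359999u);
    *high = (uint16_t)(value / 10000u);
    *low  = (uint16_t)(value % 10000u);
}

/* Divides by 1,000,000 = 16 * 62500 using a shift and one 32/16 division.
   The divisor 62500 fits in 16 bits and the quotient is at most 4294. */
static void divmod_million(uint32_t value, uint16_t *quot, uint32_t *rem)
{
    assert(quot != NULL && rem != NULL);
    uint32_t shifted = value >> 4;
    uint32_t q = shifted / 62500u;
    uint32_t r = shifted % 62500u;
    *quot = (uint16_t)q;
    /* remainder correction: scale back by 16 and restore the 4 bits
       that were shifted out */
    *rem = (r << 4) | (value & 0xFu);
}
```

The correction works because value = 16 * shifted + low4 and shifted = 62500 * q + r, so value = 1,000,000 * q + (16 * r + low4), with the bracketed part always below 1,000,000.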
16 September 2014, 19:42 | #50 | |
Registered User
Join Date: Jan 2012
Location: USA
Posts: 372
|
Quote:
Now consider the rol.w #4, d2 instruction. It runs one prefetch cycle at the beginning of execution, followed by a number of internal operations that rotate the data. For rol.w #4, d2, the number of internal cycles is 10. The rol instruction is unusual in that it has cycles that don't require memory access. Most of the time, though, you can assume an instruction spends all its time accessing memory for things like instruction prefetch and operand reads and writes.
If the number of bitplanes for the display is four or fewer, DMA for bitplane display can usually overlap with CPU memory accesses, since the CPU begins a memory access cycle by placing an address on the address bus and does not transfer data during the first two cycles of that access. Bitplane DMA can occur during those first two cycles.
Things change as you add more bitplanes or use the blitter. Instructions that need to access memory are more often made to wait if bitplane DMA or blitter DMA blocks the CPU in the two cycles after the address is placed on the bus. This is where instructions that have internal operations can make a difference: if an internal operation overlaps with DMA, there is less slowdown.
The page http://nemesis.hacking-cult.org/Mega...tion/Yacht.txt gives data bus usage for the 68000 CPU. It can be a little confusing. Make sure to ignore whitespace and pipes when reading the bus usage strings. For example, EORI.L #$55555555, d0 runs like this: npnpnpnn
Each 'n' represents two cycles of internal processing. Most of the time this is when the CPU puts an address on the bus before a memory access. The 'p' means prefetch. The last two 'n's in the instruction represent four CPU cycles of internal processing. System DMA can occur during any of the 'n's without slowing the CPU down.
|
22 November 2021, 06:38 | #51 |
Registered User
Join Date: Sep 2019
Location: Finland
Posts: 361
|
I just recently realized (woke up in the middle of the night) that I can get rid of a DIVU from inside a loop by using BCD. I have a two digit line counter that gets displayed, so I converted this:
Code:
        lea     .pos(pc),a0
        move    d6,d0
        divu    #10,d0
        or.b    #'0',d0
        move.b  d0,(a0)
        swap    d0
        or.b    #'0',d0
        move.b  d0,1(a0)
to this:
Code:
        lea     .pos(pc),a0
        move.w  d6,d0           * $00XY
        lsl.w   #4,d0           * $0XY0
        lsr.b   #4,d0           * $0X0Y
        or.w    #$3030,d0
        move    d0,(a0)
A lookup table would be another alternative but this shall suffice.
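The nibble shuffle in the second version can be modelled in C as below. This is a sketch under assumptions: the function names are made up, and `bcd8_increment` only illustrates how a counter might be kept in BCD in the first place (the post does not show that part; on the 68000 an ABCD instruction would be the natural choice).

```c
#include <assert.h>
#include <stdint.h>

/* Turns a packed-BCD byte $XY into the two ASCII characters 'X','Y'
   packed in a 16-bit word, mirroring lsl.w #4 / lsr.b #4 / or.w #$3030. */
static uint16_t bcd8_to_ascii2(uint8_t bcd)
{
    uint16_t w = bcd;                                   /* $00XY */
    w = (uint16_t)(w << 4);                             /* $0XY0 */
    /* lsr.b #4 only touches the low byte: $0XY0 becomes $0X0Y */
    w = (uint16_t)((w & 0xFF00u) | ((w & 0x00FFu) >> 4));
    return (uint16_t)(w | 0x3030u);
}

/* Hypothetical BCD increment, wrapping from 99 to 00. */
static uint8_t bcd8_increment(uint8_t bcd)
{
    assert((bcd & 0x0Fu) <= 9u && (bcd >> 4) <= 9u);    /* valid BCD only */
    unsigned next = bcd + 1u;
    if ((next & 0x0Fu) > 9u)
        next += 6u;                                     /* carry into the tens digit */
    if (next > 0x99u)
        next = 0u;
    return (uint8_t)next;
}
```

The conversion relies on the counter already being in BCD, so the division moves out of the display loop and into a cheap decimal adjust at increment time.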