English Amiga Board


Old 23 August 2014, 12:43   #41
modrobert
Quote:
Originally Posted by alkis
You can always use the OS.
Code:
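* converts d0 long number to decimal ascii at (a0)
decimalconvert
    movem.l d0/a0-a3,-(sp)
    move.l d0,savelongvalue     ; RawDoFmt reads its arguments from memory
    lea.l savelongvalue(pc),a1  ; a1 = data stream (the long to print)
    move.l a0,a3                ; a3 = PutChData, used here as output pointer
    lea.l formatString(pc),a0   ; a0 = format string
    lea.l stuffChar(pc),a2      ; a2 = PutChProc, called once per character
    CALLEXEC RawDoFmt           ; the usual macro loads a6 with ExecBase
    movem.l (sp)+,d0/a0-a3
    rts

stuffChar:
    move.b  d0,(a3)+            ; store character, advance output pointer
    rts

savelongvalue dc.l 0
formatString  dc.b '%10ld',0    ; 10 characters wide, right-aligned
    EVEN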
Thanks a lot! Will single step through the process and see how the OS does it.

Didn't know there was a system call for it.
Old 23 August 2014, 13:41   #42
Leffmann
Personally I would skip the BCD and just do long-form division on 32-bit integers. You can extract up to 4 digits with each run of 2 divisions, so it's reasonably fast. Finding the digits by means of subtracting 10^n is even faster, and both methods are trivial to extend for integers of any length.
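As a rough illustration of the subtraction method (a sketch only, not code posted in this thread): the routine name SubDigits, the Pow10 table and the register assignments are all assumptions made for this example. It takes an unsigned long in d0 and writes ten zero-padded ASCII digits plus a terminating zero to (a0).

```asm
; Sketch: unsigned 32-bit d0 -> 10 ASCII digits at (a0), zero-padded.
; Trashes d0-d2/a0-a1. Names and registers are hypothetical.
SubDigits:
	lea	Pow10(pc),a1
.next	move.l	(a1)+,d1	; current power of ten
	beq.s	.done		; a zero entry marks the end of the table
	moveq	#'0'-1,d2	; digit, pre-decremented
.sub	addq.b	#1,d2
	sub.l	d1,d0		; keep subtracting until a borrow occurs
	bcc.s	.sub
	add.l	d1,d0		; undo the subtraction that borrowed
	move.b	d2,(a0)+	; store this digit
	bra.s	.next
.done	clr.b	(a0)		; terminating zero
	rts

Pow10	dc.l	1000000000,100000000,10000000,1000000
	dc.l	100000,10000,1000,100,10,1,0
```

Each digit costs at most ten subtractions and no DIVU at all. Extending it to longer integers would mean a longer table and multi-word subtraction with SUBX.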
Old 23 August 2014, 14:11   #43
modrobert
Quote:
Originally Posted by Leffmann
Personally I would skip the BCD and just do long-form division on 32-bit integers. You can extract up to 4 digits with each run of 2 divisions, so it's reasonably fast. Finding the digits by means of subtracting 10^n is even faster, and both methods are trivial to extend for integers of any length.
Please treat me like an idiot, because it's the truth (at least in this case).

Could you explain that with some sample code?

Whenever I search it's usually crappy little endian x86 code showing up, or Atmel.
Old 23 August 2014, 14:40   #44
Leffmann
For example, like this:
Code:
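Print	move.l	sp, a0
	sub	#12, sp		; reserve 12 bytes: up to 10 digits + terminator
	sf	-(a0)		; string terminator (0)

.loop	clr.l	d1
	swap	d0
	move.w	d0, d1		; d1 = high word of the number
	divu.w	#10, d1		; d1 = remainder:quotient
	move.w	d1, d0		; keep the high word of the quotient
	swap	d0
	move.w	d0, d1		; d1 = previous remainder:low word
	divu.w	#10, d1		; cannot overflow, the remainder is < 10
	move.w	d1, d0		; d0 = number / 10
	swap	d1		; d1.w = number mod 10
	add.b	#'0', d1
	move.b	d1, -(a0)	; digits are stored from last to first

	tst.l	d0
	bne	.loop

	; A0 now points to ASCII string (use it here, before SP is restored)

	add	#12, sp
	rts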
We're working in base 2^16, but other than that it's really no different from what we did with pen and paper at school. But I don't want to stir up bad memories.
Old 23 August 2014, 14:45   #45
modrobert
Thanks! It was when/how to do the 'swap' I needed to see. Will test soon; still fiddling with alkis's code.
Old 15 September 2014, 22:02   #46
Asman
Quote:
Originally Posted by phx
This is the code I'm currently using:
Code:
8       subq.l  #8,sp
4       move.l  sp,a0
12      move.b  Score(a4),d0
4       moveq   #15,d1
4       and.b   d0,d1
8       move.b  d1,(a0)+
14      lsr.b   #4,d0
8       move.b  d0,(a0)+
12      move.b  Score+1(a4),d0
4       moveq   #15,d1
4       and.b   d0,d1
8       move.b  d1,(a0)+
14      lsr.b   #4,d0
8       move.b  d0,(a0)
28      add.l   #'0000',(sp)
---
140
My second approach to a 4-digit BCD score routine.

Code:
    lea score(pc),a0 ;8c
    move.b  (a0),d3 ;8c
    moveq   #$f,d1  ;4c
    and.b   d1,d3   ;4c
    move.w  (a0)+,d0    ;8c, now a0 points to digits
    move.b  d0,d2   ;4c
    and.w   d1,d2   ;4c
    lsr.w   #4,d0   ;14c
    and.w   d0,d1   ;4c
    move.b  d3,d0   ;4c

    ;if ascii then uncomment (take extra 20c)
    ;
    ;move.w  #$3030,d3   ;8c
    ;add.w   d3,d0       ;4c
    ;add.b   d3,d1       ;4c
    ;add.b   d3,d2       ;4c
    
    move.w  d0,(a0)+    ;8c
    move.b  d1,(a0)+    ;8c
    move.b  d2,(a0)+    ;8c
                               ; = 86c  (or 106c with ascii version)

score:  dc.w    $1234 ; score in bcd format
digits: dc.l    0,0
It's hard for me to beat DonAdan's table approach, and very hard to beat Codetapper's 8-byte BCD version, but I will try.
Old 15 September 2014, 23:07   #47
mc6809e
If the blitter is expected to run, or if bitplane DMA uses more than 4 planes in a chip/slow RAM system, it might be interesting to compare algorithms based on how often they touch memory rather than by how many cycles they take.

Leffmann's code has some time-consuming DIVUs, but the code leaves plenty of DMA cycles free.

Somewhat related:

MULS and DIVS instructions can be paired with a MULS immediately followed by a DIVS to create a nice memory access free window of up to 238 cycles, provided that all data is already in Dx registers.

This is possible because the MULS instruction first prefetches the DIVS instruction before beginning internal execution while the DIVS instruction does its prefetch cycle at the end following its internal execution.
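A minimal sketch of such a pair, assuming the operands are already in d0-d2 (the register choice is an assumption), that d2 is non-zero and that the quotient fits in 16 bits:

```asm
; Sketch: d0.w = (d0.w * d1.w) / d2.w, signed.
; Per the description above, the only bus accesses are the prefetch
; at the start of the MULS and the prefetch at the end of the DIVS.
	muls.w	d1,d0		; d0.l = d0.w * d1.w
	divs.w	d2,d0		; d0 = remainder:quotient
```

How long the window actually is depends on the operand values, since both instructions have data-dependent timing.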
Old 16 September 2014, 09:57   #48
Asman
Quote:
Originally Posted by mc6809e
If the blitter is expected to run, or if bitplane DMA uses more than 4 planes in a chip/slow RAM system, it might be interesting to compare algorithms based on how often they touch memory rather than by how many cycles they take.
I don't get it. Does that mean a routine with fewer reads/writes to chip/slow RAM will be faster? How can I check this?

for example this routine has one read and one write and takes 114c

Code:
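    lea score(pc),a0
    move.w  (a0)+,d0    ; d0 = $1234, a0 now points to digits
    move.w  #$f0f0,d1
    and.w   d0,d1       ; d1 = $1030
    eor.w   d1,d0       ; d0 = $0204
    move.w  d1,d2       ; d2 = $1030 (.w so the upper byte of d2 is defined)
    rol.w   #4,d2       ; d2 = $0301
    ror.w   #4,d1       ; d1 = $0103
    move.b  d0,d2       ; d2 = $0304
    ror.w   #8,d0       ; d0 = $0402
    move.b  d0,d1       ; d1 = $0102
    swap    d1
    move.w  d2,d1       ; d1 = $01020304
;for ascii uncomment (extra 16c )
;   add.l #$30303030,d1 
    move.l  d1,(a0)+
;=114c (with ascii take 130c)

score:  dc.w    $1234
digits: dc.l    0,0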

Last edited by Asman; 16 September 2014 at 10:09. Reason: added ascii version
Old 16 September 2014, 11:38   #49
Mrs Beanbag
of course we can divide by 10,000 with a divu.w for a maximum Long input of 655359999, then convert each half of the result separately (i.e. a recursive approach).

also one could divide by 1,000,000 by shifting right by 4 and then divide by 62500, i'll leave correcting the remainder as an exercise to the student
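Both ideas can be sketched roughly as follows. The routine names Split10k and Div1M and the register assignments are assumptions for illustration, not code from this thread. Split10k relies on DIVU setting the overflow flag (and leaving d0 untouched) when the quotient does not fit in 16 bits; Div1M shows one way to correct the remainder after the shift.

```asm
; in:  d0.l = unsigned value, expected to be <= 655359999
; out: d0.w = value / 10000, d1.w = value mod 10000, V clear
;      or V set and d0 unchanged if the value was too large
Split10k:
	divu.w	#10000,d0	; d0 = remainder:quotient
	bvs.s	.out		; quotient did not fit in 16 bits
	move.l	d0,d1
	swap	d1		; d1.w = remainder (also clears V)
.out	rts

; in:  d0.l = any unsigned 32-bit value
; out: d0.w = value / 1000000 (0..4294), d1.l = value mod 1000000
; trashes d2
Div1M:
	moveq	#15,d1
	and.w	d0,d1		; d1.l = the 4 bits the shift will drop
	lsr.l	#4,d0		; d0 = value / 16
	divu.w	#62500,d0	; d0 = r:q, since 1000000 = 16 * 62500
	move.l	d0,d2
	clr.w	d2
	swap	d2		; d2.l = r = (value / 16) mod 62500
	lsl.l	#4,d2		; r * 16, low 4 bits are now zero
	or.l	d2,d1		; remainder = r * 16 + dropped bits
	rts
```

After Split10k, the quotient (up to five digits) and the remainder (exactly four digits) each fit in a word, so each half can be converted with 16-bit divisions only.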
Old 16 September 2014, 19:42   #50
mc6809e
Quote:
Originally Posted by Asman
I don't get it. Does that mean a routine with fewer reads/writes to chip/slow RAM will be faster? How can I check this?

for example this routine has one read and one write and takes 114c
You have to include prefetch cycles for each instruction if your code is in chipram. The and.w d0,d1 instruction in your code, for example, has one prefetch cycle so even though the work is being done inside registers, the instruction still needs to prefetch the next instruction from memory before it can finish.

Now consider the rol.w #4, d2 instruction. The instruction runs one prefetch cycle at the beginning of execution, then a number of internal operations rotate the data. For rol.w #4, d2, the number of internal cycles is equal to 10.

The rol instruction is unusual in that it has cycles that don't require memory access. Most of the time, though, you can assume an instruction spends all its time accessing memory for things like instruction prefetch and operand reads and writes. And if the number of bitplanes for the display is four or fewer, bitplane DMA can usually overlap with CPU memory accesses: the CPU begins a memory access cycle by placing an address on the address bus, and no data is transferred during the first two cycles of the access. Bitplane DMA can occur during those first two cycles.

Things change as you add more bitplanes or use the blitter. Instructions that need to access memory are more often made to wait if bitplane DMA or blitter DMA blocks the CPU in the two cycles after the address is placed on the bus.

This is where instructions that have internal operations can make a difference. If internal operation overlaps with DMA, then there is less slowdown.

The page
http://nemesis.hacking-cult.org/Mega...tion/Yacht.txt gives data bus usage for the 68000 CPU.

It can be a little confusing. Make sure to ignore whitespace and pipes when trying to comprehend bus usages.

For example, EORI.L #$55555555, d0 runs like this:

npnpnpnn

Each 'n' represents two cycles of internal processing. Most of the time this is when the CPU puts an address on the bus before a memory access. The 'p' means prefetch. The last two 'n's in the instruction represent four CPU cycles of internal processing. System DMA can occur during any of the 'n's without slowing the CPU down.
Old 22 November 2021, 06:38   #51
koobo
I just recently realized (woke up in the middle of the night) that I can get rid of a DIVU from inside a loop by using BCD. I have a two digit line counter that gets displayed, so I converted this:

Code:
        lea	.pos(pc),a0		
	move	d6,d0
	divu	#10,d0
	or.b	#'0',d0
	move.b	d0,(a0)
	swap	d0
	or.b	#'0',d0
	move.b	d0,1(a0)
to this:

Code:
	lea	.pos(pc),a0		
	move.w	d6,d0	* $00XY
	lsl.w	#4,d0	* $0XY0
	lsr.b	#4,d0	* $0X0Y
	or.w	#$3030,d0
	move	d0,(a0)
Neat
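The shift trick assumes the counter in d6 is already packed BCD ($00-$99) rather than binary. A minimal sketch of keeping it that way, assuming the counter only ever goes up by one (the label NextLine and the use of d1 as scratch are assumptions):

```asm
; d6.b = line counter kept as packed BCD, $00..$99
NextLine:
	moveq	#1,d1
	andi	#$ef,ccr	; clear X so ABCD adds exactly 1
	abcd	d1,d6		; decimal add: $09 -> $10, $99 -> $00 with carry
	rts
```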
A lookup table would be another alternative but this shall suffice.
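For comparison, a lookup-table version for a binary counter might look like this. It is only a sketch: ShowLine, Pos, Digits and the use of a1 are assumed names, and d6 is assumed to hold a value from 0 to 99.

```asm
; d6.w = binary line counter, 0..99. Trashes d0/a0-a1.
ShowLine:
	lea	Pos(pc),a0
	lea	Digits(pc),a1
	move.w	d6,d0
	add.w	d0,d0			; two ASCII bytes per table entry
	move.b	0(a1,d0.w),(a0)+	; tens digit
	move.b	1(a1,d0.w),(a0)		; ones digit
	rts

Pos	dc.b	'00',0
	even
Digits	dc.b	'00010203040506070809'
	dc.b	'10111213141516171819'
	dc.b	'20212223242526272829'
	dc.b	'30313233343536373839'
	dc.b	'40414243444546474849'
	dc.b	'50515253545556575859'
	dc.b	'60616263646566676869'
	dc.b	'70717273747576777879'
	dc.b	'80818283848586878889'
	dc.b	'90919293949596979899'
```

The table costs 200 bytes, which is the trade-off against the five-instruction BCD version. Copying byte by byte means neither Pos nor Digits has to be word-aligned.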