English Amiga Board


Go Back   English Amiga Board > Coders > Coders. Asm / Hardware

 
 
Thread Tools
Old 19 March 2019, 15:51   #1
mcgeezer
Registered User
 
Join Date: Oct 2017
Location: Sunderland, England
Posts: 2,702
Fast divide by 50 or 60 without tables?

Hi all.... I know you just love these little programming conundrums I conjure up from Rygar so here's another one I've ran into.

For the count down timer at the top of the play window I use a divide by 50 routine which does avoid using any divide instructions but at a horendous cost of using 20KB of tables/ram, so I'm trying to optimise memory use and have my eye on this.

The current routine is like this...

Code:
lea	DIV50_TABLE(pc),a0
	move.w	(a0,d0*4),d0
	tst.b	d0
	beq.s	.draw_sec
	bra.s	.draw_msec    ;Draw millisecond or second counter

DIV50_TABLE:				; 20KB!!!
		REPT	200				; 200 Seconds with remainders!
		dc.b	REPTN,0
		dc.b	REPTN,1
		dc.b	REPTN,2
		dc.b	REPTN,3
		dc.b	REPTN,4
		dc.b	REPTN,5
		dc.b	REPTN,6
		dc.b	REPTN,7
		dc.b	REPTN,8
		dc.b	REPTN,9
		dc.b	REPTN,10
		dc.b	REPTN,11
		dc.b	REPTN,12
		dc.b	REPTN,13
		dc.b	REPTN,14
		dc.b	REPTN,15
		dc.b	REPTN,16
		dc.b	REPTN,17
		dc.b	REPTN,18
		dc.b	REPTN,19
		dc.b	REPTN,20
		dc.b	REPTN,21
		dc.b	REPTN,22
		dc.b	REPTN,23
		dc.b	REPTN,24
		dc.b	REPTN,25
		dc.b	REPTN,26
		dc.b	REPTN,27
		dc.b	REPTN,28
		dc.b	REPTN,29
		dc.b	REPTN,30
		dc.b	REPTN,31
		dc.b	REPTN,32
		dc.b	REPTN,33
		dc.b	REPTN,34
		dc.b	REPTN,35
		dc.b	REPTN,36
		dc.b	REPTN,37
		dc.b	REPTN,38
		dc.b	REPTN,39
		dc.b	REPTN,40
		dc.b	REPTN,41
		dc.b	REPTN,42
		dc.b	REPTN,43
		dc.b	REPTN,44
		dc.b	REPTN,45
		dc.b	REPTN,46
		dc.b	REPTN,47
		dc.b	REPTN,48
		dc.b	REPTN,49
		ENDR
Does anyone have any ideas how I can get a better balance of speed over memory, if push comes to shove I'll have to take a cpu hit and use a DIV instruction as I can't justify 20Kb table ram.

Any ideas welcome!

Geezer
mcgeezer is offline  
Old 19 March 2019, 16:01   #2
mcgeezer
Registered User
 
Join Date: Oct 2017
Location: Sunderland, England
Posts: 2,702
I've just thought that this doesn't even have to be accurate... I could simply divide by 64 if I have to...

That's at least one way to do it.
mcgeezer is offline  
Old 19 March 2019, 16:01   #3
roondar
Registered User
 
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,408
My assumption here is that you're dividing the frame counter by 50. If that is correct, there is a way of avoiding a division by using a multiple counters.

I.E. Instead of having a frame counter and dividing it by 50 (or 60), have several counters: one that just counts frames (as you'll likely need it somewhere), one which is added to each frame and resets every 50 (or 60) and one which is added to only once every 50 (or 60) frames.

This way you have one counter for the seconds and one for each 0,02/0,016* seconds.

Note: this is not a full algorithm (as you still have to combine the two into one result), but it may be useful as the start of one.

Edit: you could then simply shift the resulting 'frame counter' to the left the appropriate count (i.e. 1 for a 50Hz display and 4 for a 60Hz display) and use that as the source of the 'fractional digits'.

*) Actually 0,166666...7 but this'll do
roondar is offline  
Old 19 March 2019, 16:07   #4
mcgeezer
Registered User
 
Join Date: Oct 2017
Location: Sunderland, England
Posts: 2,702
Quote:
Originally Posted by roondar View Post
My assumption here is that you're dividing the frame counter by 50. If that is correct, there is a way of avoiding a division by using a multiple counters.

)

Yeah that's right, 50 frames per second.
mcgeezer is offline  
Old 19 March 2019, 16:10   #5
arcanist
Registered User
 
Join Date: Dec 2017
Location: Austin, TX
Age: 41
Posts: 405
Yes, counting up (or down) with a fixed increment is the way to go.

For arbitrary (constant) divisors I'd start two counters from 0. On each frame add ($10000 / <divisor>) to one of them, for the fractional part. When the add.w overflows increment the other counter, for the whole number.
arcanist is online now  
Old 19 March 2019, 16:18   #6
roondar
Registered User
 
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,408
Quote:
Originally Posted by arcanist View Post
Yes, counting up (or down) with a fixed increment is the way to go.

For arbitrary (constant) divisors I'd start two counters from 0. On each frame add ($10000 / <divisor>) to one of them, for the fractional part. When the add.w overflows increment the other counter, for the whole number.
That is a nice method. I'll have to remember that, it's a much more general version of my idea.
roondar is offline  
Old 19 March 2019, 16:53   #7
zero
Registered User
 
Join Date: Jun 2016
Location: UK
Posts: 428
Could you use a CIA timer and just subtract 1 every time it overflows?
zero is offline  
Old 19 March 2019, 17:26   #8
Gorf
Registered User
 
Gorf's Avatar
 
Join Date: May 2017
Location: Munich/Bavaria
Posts: 2,294
(1/64)+(1/256)+(1/2048) = 0.02
Is that close enough to 1/50 = 0.02 ?
;-)

(1/64)+(1/1024)=0.0166
1/60 = 0.016666
Gorf is offline  
Old 19 March 2019, 19:33   #9
Leffmann
 
Join Date: Jul 2008
Location: Sweden
Posts: 2,269
Do you actually have to optimize it? The cost of a 16-bit division on 020 is like 50 CPU cycles. At ~14.2 MHz that's like 1/20th of a single scanline, to look at it in those terms. There's practically no gain from optimizing it.

But if it has to be faster, just add 1 to a counter each frame, and when you hit 50 you have a second. I think that's what Roondar is saying too Counting with fractions is always helpful for these things.
Leffmann is offline  
Old 19 March 2019, 20:23   #10
ross
Defendit numerus
 
ross's Avatar
 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
Code:
    mulu.w  #1311,d0    ;(1/50)*2^16
    lsr.l   #8,d0       ;d0.w=s|frac
Cheers
ross is offline  
Old 19 March 2019, 21:10   #11
mcgeezer
Registered User
 
Join Date: Oct 2017
Location: Sunderland, England
Posts: 2,702
Quote:
Originally Posted by Leffmann View Post
Do you actually have to optimize it? The cost of a 16-bit division on 020 is like 50 CPU cycles. At ~14.2 MHz that's like 1/20th of a single scanline, to look at it in those terms. There's practically no gain from optimizing it.
Yes I do. Currently it uses a 20Kb table so it isn't the cycles I'm worried about it's the memory use.

Anyway... some great suggestions here all of which will help.

Appreciated gents!
mcgeezer is offline  
Old 19 March 2019, 22:13   #12
roondar
Registered User
 
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,408
Quote:
Originally Posted by Leffmann View Post
But if it has to be faster, just add 1 to a counter each frame, and when you hit 50 you have a second. I think that's what Roondar is saying too Counting with fractions is always helpful for these things.
Yeah, that is what I meant. And then keep two counters so you have the fraction and number of seconds separately. But I was in a bit of a hurry so it came out more like a brain dump than a carefully crafted post
roondar is offline  
Old 20 March 2019, 01:06   #13
alpine9000
Registered User
 
Join Date: Mar 2016
Location: Australia
Posts: 881
Interesting to see what GCC generates for divide by 50:

Code:
 
        lsr.w #1,d0
        mulu.w #5243,d0
        clr.w d0
        swap d0
        lsr.w #1,d0

Last edited by alpine9000; 20 March 2019 at 01:11. Reason: spelling
alpine9000 is offline  
Old 20 March 2019, 08:44   #14
ross
Defendit numerus
 
ross's Avatar
 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
Quote:
Originally Posted by alpine9000 View Post
interesting to see what gcc generates for divide by 50:

Code:
 
        lsr.w #1,d0
        mulu.w #5243,d0
        clr.w d0
        swap d0
        lsr.w #1,d0
5243/1311=4

I often use reciprocals when I can for divisions.
ross is offline  
Old 20 March 2019, 09:23   #15
alpine9000
Registered User
 
Join Date: Mar 2016
Location: Australia
Posts: 881
Quote:
Originally Posted by ross View Post
5243/1311=4

I often use reciprocals when I can for divisions.
I'm not great at cycle counting, but your solution and the gcc one are a pretty similar number of cycles (due to the large number of bits you shift) ? Except your solution is much more compact!
alpine9000 is offline  
Old 20 March 2019, 10:07   #16
ross
Defendit numerus
 
ross's Avatar
 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
Quote:
Originally Posted by alpine9000 View Post
I'm not great at cycle counting, but your solution and the gcc one are a pretty similar number of cycles (due to the large number of bits you shift) ? Except your solution is much more compact!
On 000 is practically same cycles.
But on 020+ (mcgeezer environment), thanks to barrel shifter, mine is faster.

And there another advantage: on upper register part you have directly the integer result.
On the lower the 1/2^16 fractional part, that you can scale with a shift or table to make it decimal
(the lsr stuff this in byte position and expels rounding errors, just to make it consistent with the original code).

GCC solution is for a simple, integer only, result.
ross is offline  
Old 20 March 2019, 11:01   #17
DanScott
Lemon. / Core Design
 
DanScott's Avatar
 
Join Date: Mar 2016
Location: Tier 5
Posts: 1,211
add.l #1310,Counter
move.w Counter,d0 ; d0 = Seconds passed.. accurate enough
DanScott is offline  
Old 20 March 2019, 11:59   #18
Hewitson
Registered User
 
Hewitson's Avatar
 
Join Date: Feb 2007
Location: Melbourne, Australia
Age: 41
Posts: 3,772
The only issue I have with some of the above solutions is that they provide very little clue as to what they do. Where possible I prefer to use code that I know I'll understand in the future, and would be obvious to someone if they were to perform a disassembly.
Hewitson is offline  
Old 20 March 2019, 13:18   #19
mcgeezer
Registered User
 
Join Date: Oct 2017
Location: Sunderland, England
Posts: 2,702
Quote:
Originally Posted by Hewitson View Post
The only issue I have with some of the above solutions is that they provide very little clue as to what they do. .
Then I guess all you have to do is ask the poster to elaborate on their code.

Last edited by mcgeezer; 20 March 2019 at 13:26.
mcgeezer is offline  
Old 20 March 2019, 13:23   #20
mcgeezer
Registered User
 
Join Date: Oct 2017
Location: Sunderland, England
Posts: 2,702
Quote:
Originally Posted by DanScott View Post
add.l #1310,Counter
move.w Counter,d0 ; d0 = Seconds passed.. accurate enough
What about the remainder value though?

As 1/50th of a second equals 1310 I need to effectively get that back to an integer value so that I know which value to plot.

Geezer
mcgeezer is offline  
 


Currently Active Users Viewing This Thread: 1 (0 members and 1 guests)
 
Thread Tools

Similar Threads
Thread Thread Starter Forum Replies Last Post
Are there any compressors/tools that re-arrange/block-divide data(pictures)? NorthWay Coders. General 15 15 May 2019 21:18
Fast multiply / divide by 64? mcgeezer Coders. Asm / Hardware 10 06 April 2018 19:29
DivIDE Type Device for Amstrad CPC 464 manic23 request.Other 9 16 August 2014 10:55
Use of 4MB PCMCIA Fast Flash Memory as Fast RAM in A1200 nkarytia support.Hardware 10 16 September 2011 13:37
ERROR: Dalek Attack (Integer Divide By Zero) Hungry Horace project.Killergorilla's WHD packs 18 20 September 2009 22:17

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off

Forum Jump


All times are GMT +2. The time now is 22:00.

Top

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2024, vBulletin Solutions Inc.
Page generated in 0.11637 seconds with 15 queries