English Amiga Board


29 November 2023, 18:41   #1
paraj
Registered User
Join Date: Feb 2017
Location: Denmark
Posts: 1,205
Cycle accurate delay loop

Say you have some classic 7MHz code that does a standard, stupid busy loop:
Code:
Loop:     DBF D0,Loop
Assume it's running from ROM (so no memory access contention to worry about). How could you best replace it with something that takes (roughly) the same amount of time on a wide range of 020+ Amigas, even for smallish counts (e.g. 32)?


This came up when I wanted to quickly extract the A1000 startup sound as a more normal executable. This seems to work OK (empirically) on my machine:

Code:
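        lsr.w   #1,\1           ; macro body, \1 = count register: halve the count (found empirically)
.\@
        tst.b   $bfe001         ; dummy CIA-A read, paced by the E clock
        dbf     \1,.\@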
But CIA access speed varies. In principle, I think doing CIA accesses as fast as possible should work great: the CIAs run at 1/10th of 7MHz, so one access per iteration running from fast RAM should cost ~10 7MHz cycles, the same as one DBF. But apparently you can't do them back to back, or maybe my accelerator card can't...
29 November 2023, 18:46   #2
ross
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 54
Posts: 4,491
CIA Timer?
29 November 2023, 19:00   #3
paraj
Registered User
Join Date: Feb 2017
Location: Denmark
Posts: 1,205
Quote:
Originally Posted by ross
CIA Timer?
That'll work. It could be set up beforehand, but what values do I need? Is this the best we can do?
29 November 2023, 19:16   #4
Thomas Richter
Registered User
Join Date: Jan 2019
Location: Germany
Posts: 3,307
timer.device TR_ADDREQUEST and DoIO()?
29 November 2023, 19:21   #5
ross
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 54
Posts: 4,491
Quote:
Originally Posted by paraj
That'll work. It could be set up beforehand, but what values do I need? Is this the best we can do?
I normally take the greatest common divisor of the values that I will use as delays and load it into the timer as its period (a kind of prescaler).
Then I pass the multiplier as a parameter and do a DBF on that value, with an inner loop checking the underflow bit.
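
A minimal sketch of that scheme, assuming the hardware has been taken over and CIA-B timer A is free (under the OS it would have to be allocated through ciab.resource first). UNIT_TICKS, timer_setup and timer_wait are hypothetical names, and 14 ticks is only an example unit. Note that reading the ICR clears every flag in it, not just the timer A one, so this is not safe if something else relies on CIA-B interrupts.

```asm
CIAB_TALO	equ	$bfd400
CIAB_TAHI	equ	$bfd500
CIAB_ICR	equ	$bfdd00
CIAB_CRA	equ	$bfde00
UNIT_TICKS	equ	14		; hypothetical unit: 14 E-clock ticks, about 20us

; one-time setup: timer A free-running, underflowing once per unit (trashes D0)
timer_setup
	move.b	#%00000001,CIAB_ICR	; mask the timer A interrupt, we only poll
	move.b	CIAB_CRA,d0
	and.b	#%11000000,d0		; keep unrelated bits; stop timer, continuous, E clock
	move.b	d0,CIAB_CRA
	move.b	#UNIT_TICKS&$ff,CIAB_TALO
	move.b	#UNIT_TICKS>>8,CIAB_TAHI	; timer is stopped, so this also loads the counter
	bset	#0,CIAB_CRA		; START
	rts

; < D0.w: number of units to wait, minus one (DBF convention)
timer_wait
	bset	#4,CIAB_CRA		; LOAD strobe: restart the current period
	tst.b	CIAB_ICR		; reading ICR discards stale flags (all of them)
.poll
	btst	#0,CIAB_ICR		; timer A underflow?
	beq.s	.poll
	dbf	d0,.poll
	rts
```

Restarting the period on entry means the first unit is a full one; without the LOAD strobe the first underflow could arrive anywhere within a unit.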
29 November 2023, 19:45   #6
paraj
Registered User
Join Date: Feb 2017
Location: Denmark
Posts: 1,205
Quote:
Originally Posted by Thomas Richter
timer.device TR_ADDREQUEST and DoIO()?

33 iterations of the loop take <50usec (about 10 7MHz cycles each). I'll test (later), but I'm guessing it will be way off for small numbers.


Quote:
Originally Posted by ross
I normally take the greatest common divisor of the values that I will use as delays and load it into the timer as its period (a kind of prescaler).
Then I pass the multiplier as a parameter and do a DBF on that value, with an inner loop checking the underflow bit.

Seems very reasonable. Bonus points for solutions that don't require setup, though. Even if you do allow setup, when you use the CIA for small numbers the setup/check overhead becomes a factor...


A difference of e.g. 10 CIA ticks doesn't matter too much, but if anyone can do better...
29 November 2023, 19:53   #7
Toni Wilen
WinUAE developer
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,570
Manual audio mode and audio interrupts?
29 November 2023, 20:01   #8
paraj
Registered User
Join Date: Feb 2017
Location: Denmark
Posts: 1,205
Quote:
Originally Posted by Toni Wilen
Manual audio mode and audio interrupts?
Love it, but not for general use (it's already a bit ugly for the A1000 startup sound, as that uses all channels, though only one at a time).
29 November 2023, 21:43   #9
jotd
This cat is no more
Join Date: Dec 2004
Location: FRANCE
Age: 52
Posts: 8,369
Here's what I use (Harry provided that code ages ago):

Code:
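; < D0.w: the count the original code passed to its "DBF D0,x" loop
; > D0.w: $FFFF, as after a completed DBF
emulate_dbf
	swap	D0
	clr.w	D0
	swap	D0		; clear the upper word: DIVU takes a 32-bit dividend
	divu.w	#$28,D0		; treat $28 (40) DBF iterations as one raster line
	swap	D0
	clr.w	D0
	swap	D0		; drop the remainder left in the upper word
	bsr	beamdelay
	move.w	#$FFFF,d0
	rts
; < D0.w: number of vertical position changes to wait for, minus one
beamdelay
.bd_loop1
	move.w	d0,-(a7)
	move.b	$dff006,d0	; VPOS (low 8 bits of the beam line)
.bd_loop2
	cmp.b	$dff006,d0
	beq.s	.bd_loop2	; spin until the line number changes
	move.w	(a7)+,d0
	dbf	d0,.bd_loop1
	rts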
I often use beamdelay alone with pre-computed values, mostly for the keyboard (2) or after a DMA write (4 to 7, depending on how it really sounds).
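
For reference, a minimal usage sketch of the two routines above. The count of 1000 is a made-up example, not taken from any particular game. On a PAL machine one raster line is 227 colour clocks, i.e. 454 7MHz cycles or about 45 DBF iterations of 10 cycles each, so the $28 (40) divisor errs slightly on the long side.

```asm
	; original code:	move.w	#1000,d0
	;		.l	dbf	d0,.l
	move.w	#1000,d0	; hypothetical count taken from the old loop
	bsr	emulate_dbf	; 1000/40 = 25, so beamdelay waits for 26 line changes

	; calling beamdelay directly with a pre-computed value
	moveq	#2,d0		; the value quoted above for the keyboard
	bsr	beamdelay	; DBF runs D0+1 times: 3 line changes
```

The first line change arrives after only part of a line, so the real delay is somewhere between D0 and D0+1 full lines.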

Those busy waits end up having an impact on performance, so rewriting the code (with CIA timers) would be better, but that's not trivial for existing game code without source!
30 November 2023, 17:54   #10
paraj
Registered User
Join Date: Feb 2017
Location: Denmark
Posts: 1,205
Yes, I've seen your beam delay function quite a bit. I trust that the $28 divisor works well in practice, given how often you guys have used it. And yeah, when doing WHDLoad stuff it's not easy to know in general which CIA timers are available, though I often see one used for disk stuff that I imagine would be free later on (though you never know).

I guess it should also be possible to use the HPOS part to improve the precision a bit (though in the end it probably won't matter much). Mostly idle curiosity.
01 December 2023, 19:43   #11
Bruce Abbott
Registered User
Join Date: Mar 2018
Location: Hastings, New Zealand
Posts: 2,719
Quote:
Originally Posted by paraj
Say you have some classic 7MHz code that does a standard, stupid busy loop:
Code:
Loop:     DBF D0,Loop
Assume it's running from ROM
Which ROM in which machine?
01 December 2023, 20:33   #12
Photon
Moderator
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,657
Quote:
Originally Posted by paraj
That'll work. It could be set up beforehand, but what values do I need? Is this the best we can do?
Umm. The dbf value is an empirical value, so that was the best they could do.

ROM, custom register, or chip RAM access speed is not a reliable enough reference.

Raster timing is not enough either; it will fail in Productivity mode.

CIA clock timing is the only accurate option for waits longer than 2-3us, if you don't use audio interrupts/requests.

Busy-waiting is still lazy and wastes resources, but you can use a one-shot CIA timer. Set the value according to the HRM. The difference between NTSC and PAL is small enough that you can use the larger of the two values.
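
A minimal sketch of the one-shot approach, assuming the hardware has been taken over and CIA-B timer A is free; cia_delay is a hypothetical name. The CIA timers count the E clock (709379 Hz on PAL, 715909 Hz on NTSC), which is the 7MHz CPU clock divided by 10, and a looping DBF takes 10 cycles on a 68000, so the original DBF count can be used as the tick count more or less directly.

```asm
CIAB_TALO	equ	$bfd400
CIAB_TAHI	equ	$bfd500
CIAB_ICR	equ	$bfdd00
CIAB_CRA	equ	$bfde00

; < D0.w: original DBF count (roughly one E-clock tick per iteration)
; > D0.w: $FFFF, as after a completed DBF; D1 is trashed
cia_delay
	move.b	#%00000001,CIAB_ICR	; mask the timer A interrupt, we only poll
	move.b	CIAB_CRA,d1
	and.b	#%11000000,d1		; keep unrelated bits, count E clock, START clear
	or.b	#%00001000,d1		; RUNMODE = one-shot
	move.b	d1,CIAB_CRA
	tst.b	CIAB_ICR		; reading ICR discards stale flags (all of them)
	move.b	d0,CIAB_TALO		; latch low byte
	lsr.w	#8,d0
	move.b	d0,CIAB_TAHI		; one-shot: writing the high byte loads and starts
.wait
	btst	#0,CIAB_ICR		; timer A underflow?
	beq.s	.wait
	move.w	#$ffff,d0
	rts
```

For very small counts the setup and polling overhead dominates, so the result is only approximate there.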