English Amiga Board


14 October 2018, 05:50   #1
MethodGit
Junior Member
Join Date: Dec 2002
Location: The Streets
Age: 34
Posts: 2,729
Hunter's trackloader is driving me bonkers.

First off, yes, it is I again. I felt flummoxed enough to ask for help once more, even though my coding and understanding skills have improved somewhat since the last time I asked for assistance (and given that I took a long break from anything Amiga-related, that's saying something).

It concerns the 1991 Activision title Hunter. Its trackloader works okay on a standard A500 setup, but try to run it on anything more powerful, say a 68020 A1200, and the trackloader promptly keels over and gets stuck in a cha-cha-cha loop before it can even load the title card. Yes, I tried it in Cycle Exact mode as well: no dice. (Funny how it's never an issue during the first few seconds with the red and black display.)

And no, it's not copy-protection related. Besides the IPF, I tried all of the existing scene cracks of the game, and they all exhibited the same problem. (With the IPF I had to change an "addq.l #6,a7" to "addq.l #8,a7" to get around a stack-related issue on 68020+ Amigas, then painstakingly step through the decryption routine in the WinUAE debugger to make sure it didn't unpack garbage, just to confirm that yes, it still wouldn't load.) That includes the "AGA-fixed" edition by Nomad, though that's probably because it falls over before it even reaches the part the crack had to change to make it work on AGA machines.

After some messing around, I can confirm that it's definitely related to the speed of the CPU. Selecting "68020" by itself triggers the problem quite easily, and it can only really be avoided by lowering the CPU speed drastically ("1x" in Cycle Exact mode, or -90% to -80% without CE). Lo and behold, it then works as well as it does on a bog-standard A500.

For the record, this seems to be where it gets stuck, starting from $10d86 in memory (A5 is pointing to $806, and A6 is pointing to $DFF000):
Code:
10d86 = bsr 10d4e				61 c6
10d88 = move.w #$c000,d0			30 3c c0 00
10d8c = mulu.w d4,d4				c8 c4			<- D4 is always zero during this part, so it's probably just a delay tactic. But alas, not a very good one.
10d8e = and.w #$2,1e(a6)			02 6e 00 02 00 1e
10d94 = dbne.w d0,10d8c				56 c8 ff f6		<- the possibly problematic part: if it falls through to $10d98 with D0 at either FFFF or anything from 8000 upwards, it's a bad outcome.
10d98 = move.w #$4000,24(a6)			3d 7c 40 00 00 24
10d9e = tst.w d0				4a 40
10da0 = rts					4e 75

10d4e = move.w #$4000,24(a6)			3d 7c 40 00 00 24
10d54 = move.l 160a(a5),20(a6)			2d 6d 16 0a 00 20
10d5a = move.w #$6a00,9e(a6)			3d 7c 6a 00 00 9e
10d60 = move.w #$9500,9e(a6)			3d 7c 95 00 00 9e
10d66 = move.w 160e(a5),7e(a6)			3d 6d 16 0e 00 7e
10d6c = move.w 1608(a5),d0			30 2d 16 08
10d70 = addq.w #$1,d0				52 40
10d72 = or.w #$8000,d0				00 40 80 00
10d76 = move.w #$2,9c(a6)			3d 7c 00 02 00 9c
10d7c = move.w d0,24(a6)			3d 40 00 24
10d80 = move.w d0,24(a6)			3d 40 00 24
10d84 = rts					4e 75
I've tried various delay tricks in the code, such as DBF loops and a VPOSR check, without any success. I tend to get quite moody when I feel defeated at every turn, and it's getting quite late where I am, so I'm biting the bullet and asking one of the experts on here to point out where I'm going wrong and what can be done to make the trackloader behave correctly on Amigas of all shapes, sizes and, yes, speeds. Heck, it would be nice to confirm whether it's technically the fault of WinUAE, the game code, or something else altogether. (But if anyone does provide some code, try to present it in a form I can understand straight away, like the way I write things. I don't go by the MC68000 instruction manual for a living and don't use labels for everything.)

Obligatory "pretty please with a cherry on top", natch.
14 October 2018, 10:45   #2
ross
Omnia fert aetas

Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 48
Posts: 1,234
A wrong way to check that the track has been read completely.

You exit from dbne.w d0,10d8c in two ways:
- if D0 underflows, i.e. after a fixed number of passes and therefore a fixed number of CPU cycles (mulu is there to waste cycles);
- if DSKBLK is set, as reported by the Z flag from the preceding AND with INTREQR.

On fast processors the loop times out before the track has been completely read, because mulu executes faster.

EDIT: the funny thing is that this routine could also fail in the opposite situation, that is, on very slow processors (impossible in practice, because there is no 68000 Amiga clocked below 7 MHz, but it could potentially happen).
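Both failure modes can be reproduced with a rough Python model of the original wait loop at $10d8c-$10d94 plus the caller's tst.w/bmi. This is only a sketch: the per-pass time (`iter_ns`) and the moment DSKBLK appears (`dma_ns`) are hypothetical inputs, not measured figures, and real pass times also vary with chip-bus DMA contention. The slow case fails because tst.w treats any D0 from $8000 upwards as negative, i.e. the read finished before $4000 passes had elapsed.

```python
def original_wait(iter_ns: int, dma_ns: int) -> tuple[int, bool]:
    """Model of: move.w #$c000,d0 / mulu / and.w #2,$1e(a6) / dbne d0 / tst.w d0.

    iter_ns: hypothetical time for one pass of the loop, in nanoseconds.
    dma_ns:  hypothetical time until DSKBLK shows up in INTREQR.
    Returns the final low word of D0 and whether tst.w sees it as negative.
    """
    d0 = 0xC000                     # move.w #$c000,d0
    elapsed = 0
    while True:
        elapsed += iter_ns          # one pass: mulu + and.w + dbne
        if elapsed >= dma_ns:       # AND result non-zero: dbne exits, no decrement
            break
        d0 = (d0 - 1) & 0xFFFF      # dbne decrements the low word of D0...
        if d0 == 0xFFFF:            # ...and falls through when it wraps to -1
            break
    return d0, bool(d0 & 0x8000)    # tst.w d0: bit 15 set -> bmi = "error"


# Hypothetical pass times: fast CPU, roughly A500-like, and much slower.
for label, iter_ns in [("fast", 600), ("A500-ish", 9000), ("slow", 20000)]:
    d0, bad = original_wait(iter_ns, 211_000_000)
    print(f"{label:9} d0=${d0:04X} error={bad}")
```

With these made-up figures the fast case times out with D0 = $FFFF, the middle case succeeds, and the slow case exits early with bit 15 still set, so both extremes end up on the bmi path.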

Last edited by ross; 14 October 2018 at 11:54.
14 October 2018, 13:17   #3
MethodGit
Junior Member
Join Date: Dec 2002
Location: The Streets
Age: 34
Posts: 2,729
So what would be a good solution? Spam ten or twenty mulus instead? I've never been good at understanding the Amiga's interrupts, diskloading routines or cycle-handling, unfortunately.
14 October 2018, 14:10   #4
ross
Omnia fert aetas

Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 48
Posts: 1,234
Quote:
Originally Posted by MethodGit
So what would be a good solution? Spam ten or twenty mulus instead? I've never been good at understanding the Amiga's interrupts, diskloading routines or cycle-handling, unfortunately.
I suppose you want an in-line solution, so you have to use a small trick.

I'm in a hurry, so no comments yet, but maybe I'll add some later.

Code:
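00010D86  61C6                            bsr.b   $10d4e          ; set up and start disk DMA (unchanged)
00010D88  7007                            moveq   #7,d0
00010D8A  4840                            swap    d0              ; d0 = $00070000 time-out count
00010D8C  7802                        .l  moveq   #2,d4           ; DSKBLK bit mask
00010D8E  C86E 001E                       and.w   $1e(a6),d4      ; INTREQR: Z=1 while DSKBLK is clear
00010D92  6604                            bne.b   .df             ; DSKBLK set: track read finished
00010D94  5380                            subq.l  #1,d0
00010D96  6AF4                            bpl.b   .l              ; keep waiting until d0 goes negative
00010D98  3D7C 4000 0024              .df move.w  #$4000,$24(a6)  ; DSKLEN: switch disk DMA off
00010D9E  4A80                            tst.l   d0              ; N=1 only after a time-out
00010DA0  4E75                            rts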
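A rough Python model of the replacement loop shows why it is far less CPU-sensitive. It is a sketch under stated assumptions: `iter_ns` and `dma_ns` are hypothetical inputs, and `bus_floor_ns` assumes that one pass can never be shorter than the ~564 ns chip-bus read of INTREQR, which is the floor ross's reasoning relies on.

```python
def patched_wait(iter_ns: int, dma_ns: int, start: int = 0x70000,
                 bus_floor_ns: int = 564) -> tuple[int, bool]:
    """Model of: moveq #7,d0 / swap d0 / moveq #2,d4 / and.w $1e(a6),d4 /
    bne.b .df / subq.l #1,d0 / bpl.b .l / tst.l d0.

    iter_ns:      hypothetical CPU time for one pass, in nanoseconds.
    dma_ns:       hypothetical time until DSKBLK shows up in INTREQR.
    bus_floor_ns: assumed minimum cost of the chip-bus INTREQR read (~2/C2).
    """
    per_pass = max(iter_ns, bus_floor_ns)  # a pass can't beat the chip-bus read
    d0 = start                             # $00070000
    elapsed = 0
    while True:
        elapsed += per_pass
        if elapsed >= dma_ns:              # DSKBLK set: bne.b .df
            break
        d0 -= 1                            # subq.l #1,d0
        if d0 < 0:                         # bpl.b .l falls through: time-out
            break
    return d0, d0 < 0                      # tst.l d0: N set only on time-out


print(patched_wait(100, 258_000_000))      # very fast CPU, worst-case read
print(patched_wait(5920, 258_000_000))     # roughly A500-like pass time
print(patched_wait(564, 10**12))           # DSKBLK never arrives
```

Only the third call reports an error: however fast the CPU, the bus floor stretches $70001 passes to just over 258 ms, and tst.l only goes negative when the counter has actually run out.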
14 October 2018, 14:56   #5
StingRay
move.l #$c0ff33,throat

Join Date: Dec 2005
Location: Berlin/Joymoney
Posts: 6,146
Quote:
Originally Posted by MethodGit
So what would be a good solution? Spam ten or twenty mulus instead?

Think about what you want to do and ask yourself if that is a reliable way to fix problems caused by CPU dependent loops. You may answer your question yourself then.
14 October 2018, 16:11   #6
ross
Omnia fert aetas

Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 48
Posts: 1,234
Ok, got some time now to comment my code.
At first glance you might ask: "isn't your code CPU dependent too?"
Answer: "no, or rather, only relatively!"

This code depends on the internal bus timing: even if you ignore the cost of every other instruction in the main wait loop, the INTREQR read on the chip bus guarantees the routine a minimum duration.
EDIT: before giving up and returning an error it takes quite a long time on an A500 (~2.7 s); on a fast, cached 030+ it takes practically the time calculated below.

Some math (taking a PAL Amiga as reference, but the concept is the same on NTSC):

- the internal bus clock is 3546895 Hz (C2 clock), but the processor can access only the even cycles, so an INTREQR read takes a fixed (2/C2) = ~564 ns

- a floppy revolution takes ~200 ms, but that is not a valid figure on its own because there is more to consider
(how many bits the game really reads through DMA, how many parts (sectors) there are on the track, how big the gap is)

- disassembling the loader we have a DMA request of $19CE words and 10 sectors per track (512 bytes per sector), so let's continue with the math

- $19CE words require ($19CE*16*2 µs) = ~211 ms (the time the DMA needs to read from disk after the sync is found)

- then there are two possible penalties:
1) DMA starts immediately after a sync, so you have to wait a whole sector (1/10 track, ~20 ms) for the next sync;
2) DMA starts at the beginning of the gap, which can be as big as $19CE-(512*10)-syncs-headers = ~$5B0 words -> ~47 ms;

- take the worse of the two and add it to the reading time: ~(211+47) = 258 ms

- finally, how many CPU bus cycles fit in this time? (0.258 s / 564 ns) = ~457550 -> $6FB4E -> rounded up to $70000
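The same budget can be re-run as a small Python calculation. The 2 µs bit cell, the $5B0-word gap and the 20 ms sector penalty are the figures from the list above; the function name and the final round-up to a value that `moveq #n,d0` / `swap d0` can load are just an illustration of how $70000 falls out.

```python
import math

C2_HZ = 3_546_895      # PAL chip-bus (C2) clock quoted above
BITCELL_US = 2         # DD floppy MFM bit cell: 2 us per bit

def timeout_count(dma_words: int = 0x19CE, gap_words: int = 0x5B0,
                  sector_ms: float = 20.0):
    read_ms = dma_words * 16 * BITCELL_US / 1000       # time to DMA the track
    gap_ms = gap_words * 16 * BITCELL_US / 1000        # worst-case gap before sync
    worst_ms = read_ms + max(gap_ms, sector_ms)        # worse of the two penalties
    bus_read_s = 2 / C2_HZ                             # one CPU chip-bus slot, ~564 ns
    passes = math.ceil(worst_ms / 1000 / bus_read_s)   # loop passes at best speed
    # moveq #n,d0 / swap d0 can only load n<<16 (n = 0..127), so round up
    n = math.ceil(passes / 0x10000)
    return read_ms, gap_ms, worst_ms, passes, n << 16

read_ms, gap_ms, worst_ms, passes, d0 = timeout_count()
print(f"read {read_ms:.1f} ms + gap {gap_ms:.1f} ms = {worst_ms:.1f} ms")
print(f"{passes} bus-bound passes -> d0 = ${d0:X}")
```

Using the exact 257.984 ms instead of the rounded 258 ms gives ~457,520 passes rather than ~457,550, which still rounds up to $70000.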


Now you can see where my [moveq #7,d0 / swap d0] comes from, and you can understand why this routine is not arbitrary and requires a bit of Amiga knowledge to write
(at least for an in-line patched version)

Cheers!

PS: not that I would use such a technique myself; in my own loader I use the CIA timers, but for a dirty patch it can be fine.
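A CIA timer is a 16-bit down-counter clocked by the E-clock (709,379 Hz on PAL, 715,909 Hz on NTSC), so a single period tops out at about 92 ms on PAL. The sketch below, assuming the ~258 ms budget from above, works out how many timer underflows such a time-out needs. The function name and the equal-split strategy are my own illustration, not ross's loader.

```python
import math

E_CLOCK_PAL_HZ = 709_379   # CIA timer clock on PAL: CPU clock / 10

def cia_timeout_plan(timeout_ms: float, max_count: int = 0xFFFF):
    """Split a time-out into equal CIA timer periods that fit in 16 bits."""
    ticks = math.ceil(timeout_ms / 1000 * E_CLOCK_PAL_HZ)   # total E-clock ticks
    periods = math.ceil(ticks / max_count)                   # underflows to count
    per_period = math.ceil(ticks / periods)                  # value to load each time
    return ticks, periods, per_period

ticks, periods, per_period = cia_timeout_plan(258)
print(ticks, periods, per_period)
```

The point of the design is that the tick count depends only on the E-clock, so the same numbers hold on a 68000 and on a 68060; the loader would poll the timer's underflow flag alongside DSKBLK and give up after the computed number of underflows.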

Last edited by ross; 14 October 2018 at 19:07. Reason: cosmetic
15 October 2018, 02:44   #7
MethodGit
Junior Member
Join Date: Dec 2002
Location: The Streets
Age: 34
Posts: 2,729
Well, I tried out your code, Ross. I tested it quickly with the Fairlight crack before worrying about the other cracks or an original.

It dawned on me that I probably should've shown at least what happens after D0 is word-tested at the end of that routine:

Code:
10b3c = bsr 10cd2		61 00 01 94
10b40 = movea.l 1618(a5),a4	28 6d 16 18
10b44 = subq.b #$1,1620(a5)	53 2d 16 20
10b48 = bpl 10b5e		6a 14
10b4a = subq.b #$1,1621(a5)	53 2d 16 21
10b4e = bmi 10b82		6b 32
10b50 = st.b 1626(a5)		50 ed 16 26
10b54 = bsr 10be4		61 00 00 8e
10b58 = move.b #$2,1620(a5)	1b 7c 00 02 16 20
10b5e = move.l a4,1618(a5)	2b 4c 16 18
10b62 = bsr 10d86		61 00 02 22	<--- this is the routine explained earlier.
10b66 = bmi 10b3c		6b d4		<--- if D0 is seen as negative, branches back to $10b3c.  We want it to carry on forwards.
10b68 = movea.l 1610(a5),a2	24 6d 16 10
10b6c = jsr (a2)		4e 92
10b6e = bmi 10b3c		6b cc
10b70 = addq.w #$1,1624(a5)	52 6d 16 24
10b74 = bsr 10cb0		61 00 01 3a
10b78 = subq.w #$1,162a(a5)	53 6d 16 2a
10b7c = bpl 10b58		6a da
10b7e = moveq #$0,d4		78 00
10b80 = rts			4e 75
I understand that BMI stands for "Branch on MInus (Negative)" (thanks to this page). That's why I wanted to avoid a 'negative' word result anywhere between 8000 and FFFF, which is difficult when the D0 counter can finish literally anywhere once the floppy revolution's been detected.

Anyhow, on a quickstart A1200 WinUAE config, starting from $70000 seemed to result in a negative D0 all the time, so I tried increasing it to $78000, and this worked out better. (This required me to find some space elsewhere on the disk to branch to, to make room for a "move.l #$78000,d0". The start of the copylock routine seemed ideal: part of it already appeared to have been wiped from the ADF, so it's probably not accessed anyway, and FLT took a different approach to inserting the serial at the correct address. But I digress!) The only problem is that it now causes the opposite effect: the trackloader fails on an A500 setup.

You're right in that your implementation is still largely at the mercy of the CPU and whatever speed it might be set to on any given day. I did think about testing the processor bits in AttnFlags, setting an appropriate flag somewhere in low memory, and then choosing a different high starting value for D0 depending on which flag was set, but that could end up being a very long list if I also had to take into account all the various CPUs and other bits of hardware an Amiga could have in any combination. In regards to CIA timers, I was consulting this page just now (specifically the part about the timers, natch) and wondered if that's exactly what I'm supposed to use, but I'm racking my brain over how to implement it correctly in the game's code, especially if I'm still supposed to use and.w against $dff01e (INTREQR) at some point. Do I have to check both CIAs and both timers for this purpose, or will one of them suffice? Also, what does "<<3" mean in move.b #1<<3,$bfee01-$bfd100(a5)? Evidently not "less than less than 3".
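For the "<<3" question: `<<` is the assembler's shift-left operator, evaluated at assembly time, so `#1<<3` is simply `#8`, a mask with only bit 3 set. In CIA control register A, bit 3 is RUNMODE (1 = one-shot). The address is written as an offset from a5, which implies that code keeps a5 pointing at $BFD100 (CIA-B port B); that is an inference from the expression itself, not something stated in the thread. The same constants evaluated in Python:

```python
CIAA_CRA = 0xBFEE01       # CIA-A control register A
CIAB_PRB = 0xBFD100       # CIA-B port B, apparently the base held in a5
RUNMODE = 1 << 3          # CRA bit 3: 1 = one-shot, 0 = continuous

offset = CIAA_CRA - CIAB_PRB   # what "$bfee01-$bfd100" assembles to
print(f"1<<3 = {RUNMODE} = %{RUNMODE:08b}")
print(f"$bfee01-$bfd100 = ${offset:X}")
```

So the instruction is just move.b #8,$1d01(a5), written symbolically so the reader can see which bit and which register are meant.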

(Just out of curiosity, btw, I wanted to see what happens if you simply NOP out the BMI at $10b66. While it seems to allow the tracks to load, obvious glitches occur, such as the title card not popping up and the in-game graphics going through varying shades of groovy colours and what not. So yeah, can't get away with that one.)

Quote:
Originally Posted by StingRay
Think about what you want to do and ask yourself if that is a reliable way to fix problems caused by CPU dependent loops. You may answer your question yourself then.
Eeehh, I wasn't expecting it to be a viable solution. I only wondered about it because I recall seeing a replacement trackloader in an earlier crack of another game use a lot of random NOPs here and there, and I assumed they were there to create the delays needed to time the CPU and disk drive right. But it *was* one of those late-80s cracks, so it would probably fail on newer processors anyway.
15 October 2018, 10:11   #8
ross
Omnia fert aetas

Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 48
Posts: 1,234
You did not look carefully at my code...
I've used tst.l d0. When can this become negative? Only if the "time-out" period has passed!

I've only described the tricky part, not the trivial one...
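The difference between the two tests is just which bit lands in the N flag: tst.w copies bit 15 of the low word, tst.l copies bit 31. With a counter that starts at $70000, a perfectly healthy "time left" value can still have bit 15 set, so a leftover tst.w/bmi would call it an error. A tiny Python illustration (the helper is hypothetical, not code from the game):

```python
def n_flag(d0: int, size: str) -> bool:
    """N flag after tst.b / tst.w / tst.l on a 32-bit data register."""
    bits = {"b": 8, "w": 16, "l": 32}[size]
    return bool(((d0 & 0xFFFFFFFF) >> (bits - 1)) & 1)   # copy the sign bit

for d0 in (0x0006C000, 0x00008000, 0xFFFFFFFF):
    print(f"${d0:08X}  tst.w N={n_flag(d0, 'w')}  tst.l N={n_flag(d0, 'l')}")
```

Only $FFFFFFFF, the value left after a genuine time-out, is negative for tst.l.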
15 October 2018, 10:18   #9
ross
Omnia fert aetas

Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 48
Posts: 1,234
Quote:
Originally Posted by MethodGit
(Just out of curiosity, btw, I wanted to see what happens if you simply NOP out the BMI at $10b66. While it seems to allow the tracks to load, obvious glitches occur, such as the title card not popping up and the in-game graphics going through varying shades of groovy colours and what not. So yeah, can't get away with that one.)
So you can understand why it is not possible to create a "real" patch without a little deeper knowledge.

The problems with fast processors are not only in the loader, but in various other parts of the code too.
15 October 2018, 11:23   #10
roondar
Registered User

 
Join Date: Jul 2015
Location: The Netherlands
Posts: 554
Ah yes, the joys of using CPU instruction timings to delay things. Or, perhaps better called: woe to anyone using a faster processor.

I still wonder why people regularly coded delays based on the duration of specific CPU instructions. Even back in the 1980s faster CPUs were available for the Amiga. I recall reading warnings against such delay loops even in fairly non-technical magazines. Perhaps I'm underestimating the number of people who coded directly on an A500.

Anyway, I tend to use either raster timing (VPOSR) or timer/VBL interrupts to get CPU-independent delays. I find waiting on VPOSR the easiest to implement, though I do dislike busy-waiting loops.
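Raster timing works because the beam position advances at a fixed rate regardless of the CPU: the 9-bit line number comes from VPOSR bit 0 (V8) plus the high byte of VHPOSR. Below is a minimal Python sketch of the bookkeeping, assuming PAL, ~64 µs per line and a 313-line frame (lines 0 to 312; interlaced short frames have 312). The function names are hypothetical, and the method only works if the position is sampled at least once per frame, otherwise a whole frame of wrap-around goes unnoticed.

```python
LINE_US = 64             # PAL scanline length, ~64 us
LINES_PER_FRAME = 313    # assumed PAL long frame (lines 0..312)

def lines_needed(delay_us: int) -> int:
    """Delay expressed in whole scanlines, rounded up."""
    return -(-delay_us // LINE_US)

def lines_elapsed(vpos_samples, lines_per_frame=LINES_PER_FRAME) -> int:
    """Scanlines passed between successive beam-position samples."""
    total = 0
    prev = vpos_samples[0]
    for v in vpos_samples[1:]:
        total += (v - prev) % lines_per_frame   # wrap from line 312 back to 0
        prev = v
    return total

print(lines_needed(258_000))              # ~258 ms time-out in scanlines
print(lines_elapsed([300, 310, 2, 5]))    # crosses the end of a frame
```

A wait loop would keep sampling the line number, add up `lines_elapsed` and stop once it reaches `lines_needed`, which takes the same real time on any CPU.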
15 October 2018, 11:51   #11
ross
Omnia fert aetas

Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 48
Posts: 1,234
The habit of making CPU-timed waits was embarrassing even in the early '80s...
There are so many precise time sources on the Amiga: CIA E-clock, CIA hsync, CIA TOD, VPOS, VBL, bus C2, and practically all of them are synchronized!
Maybe just laziness?
15 October 2018, 12:17   #12
roondar
Registered User

 
Join Date: Jul 2015
Location: The Netherlands
Posts: 554
Quote:
Originally Posted by ross
The habit of making CPU-timed waits was embarrassing even in the early '80s...
There are so many precise time sources on the Amiga: CIA E-clock, CIA hsync, CIA TOD, VPOS, VBL, bus C2, and practically all of them are synchronized!
Maybe just laziness?
Thinking about it some more, it may simply be that people 'graduated' from the 8-bit systems (C64, Spectrum, CPC) and were simply not aware of the problems involved with a system that also included models with much faster CPUs. On the 8-bits, while not 'neat and tidy', a CPU-based loop is often good enough. Plus, many effects on the 8-bit systems did require delaying the CPU with just the right number of NOPs.


That said, laziness is obviously still an option
15 October 2018, 13:55   #13
StingRay
move.l #$c0ff33,throat

Join Date: Dec 2005
Location: Berlin/Joymoney
Posts: 6,146
Quote:
Originally Posted by MethodGit
Eeehh, I wasn't expecting it to be a viable solution.

Why did you even consider it then?
15 October 2018, 14:04   #14
MethodGit
Junior Member
Join Date: Dec 2002
Location: The Streets
Age: 34
Posts: 2,729
Quote:
Originally Posted by ross
You did not look carefully at my code...
I've used tst.l d0. When can this become negative? Only if the "time-out" period has passed!

I've only described the tricky part, not the trivial one...
Well, ain't I a freaking imbecile. I compared your replacement code to the original and assumed the differences ended where move.w #$4000,24(a6) began. That's what I get for not looking at that little tst more closely!

So cool, *now* it works good and all. I was testing various processors with it up to 68060, and I wondered.... one could just start as high up as $80000000 if they wanted to and then it'd be good for all floppy drives and configurations, right?

Quote:
Originally Posted by StingRay
Why did you even consider it then?
Read my last message again. It explains my train of thought at the time.
15 October 2018, 14:19   #15
StingRay
move.l #$c0ff33,throat

Join Date: Dec 2005
Location: Berlin/Joymoney
Posts: 6,146
Quote:
Read my last message again. It explains my train of thought at the time.

Right... Good to see you're still the same old MethodGit you've always been.
15 October 2018, 16:58   #16
MethodGit
Junior Member
Join Date: Dec 2002
Location: The Streets
Age: 34
Posts: 2,729
Quote:
Originally Posted by StingRay
Right... Good to see you're still the same old MethodGit you've always been.
Because I had to remind you to re-read something you chose to ignore the first time? Give me a break.
15 October 2018, 17:41   #17
ross
Omnia fert aetas

Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 48
Posts: 1,234
Quote:
Originally Posted by MethodGit
That's what I get for not looking at that little tst more closely!
When you write in assembly you can't overlook anything; the devil is in the details...

Quote:
So cool, *now* it works good and all. I was testing various processors with it up to 68060, and I wondered.... one could just start as high up as $80000000 if they wanted to and then it'd be good for all floppy drives and configurations, right?
Yes and no... I've detailed the reasoning behind this value.
It needs to be as low as possible (for the slow machines) but high enough to decouple the value from any particular processor and tie it to a consistent timer.

Sure, you can use a larger value (using moveq #8,d0 is practically the same, and even safer), but if you use absurdly high values (like the one you proposed), then before an error is signaled you would literally spend hours waiting on an A500.
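To put "hours" into numbers, here is a rough Python estimate of how long ross's loop runs on a PAL A500 before giving up. The 42-cycle pass (moveq 4 + and.w 12 + untaken bne.b 8 + subq.l 8 + taken bpl.b 10) is an assumption taken from standard 68000 instruction timings with no chip-bus contention; it does reproduce ross's ~2.7 s figure. Note that $80000000 cannot be loaded with moveq/swap and would need a move.l.

```python
CPU_HZ_PAL = 7_093_790     # PAL A500 68000 clock
CYCLES_PER_PASS = 42       # assumed 68000 cost of one pass of ross's loop

def a500_timeout_seconds(start: int) -> float:
    """Seconds before the loop gives up, for a start value from 0 to $80000000."""
    passes = start + 1                      # subq.l/bpl runs until D0 reaches -1
    return passes * CYCLES_PER_PASS / CPU_HZ_PAL

for start in (0x70000, 0x80000, 0x80000000):
    secs = a500_timeout_seconds(start)
    print(f"${start:X}: {secs:.2f} s ({secs / 3600:.2f} h)")
```

Under these assumptions $70000 gives up after about 2.7 s and $80000 after about 3.1 s, while $80000000 would keep an A500 waiting for roughly three and a half hours before reporting the error.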