14 October 2018, 04:50 | #1 |
Junior Member
Join Date: Dec 2002
Location: The Streets
Age: 40
Posts: 2,731
|
Hunter's trackloader is driving me bonkers.
First off, it is I again. Yes, I felt flummoxed enough to ask for some help again despite my coding/understanding skills having improved somewhat since the last time I asked for assistance on something (and given that I went on a long break without dealing with anything Amiga-related, that's saying something).
It concerns the 1991 Activision title Hunter. Its trackloader seems to work okay under a standard A500 setup, but try to run it on anything more significant than that - say, a 68020 A1200 - and said trackloader promptly keels over and gets stuck in a cha-cha-cha loop before it can even load the title card. Yes, I tried it in Cycle Exact mode as well - no dice. (Funny how it's never an issue during the first opening seconds with the red and black display.)
And no, it's not copy-protection related. As well as the IPF, I tried all of the existing scene cracks of the game, and they all exhibited the same problem. (For the IPF, I had to change an "addq.l #6,a7" to "addq.l #8,a7" to get around a stack-related issue on 68020+ Amigas, and then painstakingly step through the decryption routine in the WinUAE debugger to ensure it didn't unpack garbage, just to confirm that yes, it wouldn't load anyway.) Yes, even the "AGA-fixed" edition by Nomad fails, though that's probably because it falls over before it can even get to the part that had to be changed to make it work on AGA machines.
After some messing around, I can confirm that it's definitely related to the speed/frequency of the CPU. Selecting "68020" by itself can cause issues quite easily, and this can only really be rectified by significantly lowering the CPU speed to fairly slow figures ("1x" in Cycle Exact mode, or -90 to -80% without CE). Lo and behold, it'll then work as fine as it does on a bog-standard A500.
For the record, this seems to be where it gets stuck, starting from $10d86 in memory (A5 is pointing to $806, and A6 is pointing to $DFF000): Code:
10d86 = bsr 10d4e                    61 c6
10d88 = move.w #$c000,d0             30 3c c0 00
10d8c = mulu.w d4,d4                 c8 c4               <- D4 is always empty during this part, so it's probably just a delay tactic. But alas, not a very good one.
10d8e = and.w #$2,1e(a6)             02 6e 00 02 00 1e
10d94 = dbne.w d0,10d8c              56 c8 ff f6         <- the possible problematic part - if it moves to $10d98 with a D0 result of either FFFF or anything above 8000, it's a bad outcome.
10d98 = move.w #$4000,24(a6)         3d 7c 40 00 00 24
10d9e = tst.w d0                     4a 40
10da0 = rts                          4e 75

10d4e = move.w #$4000,24(a6)         3d 7c 40 00 00 24
10d54 = move.l 160a(a5),20(a6)       2d 6d 16 0a 00 20
10d5a = move.w #$6a00,9e(a6)         3d 7c 6a 00 00 9e
10d60 = move.w #$9500,9e(a6)         3d 7c 95 00 00 9e
10d66 = move.w 160e(a5),7e(a6)       3d 6d 16 0e 00 7e
10d6c = move.w 1608(a5),d0           30 2d 16 08
10d70 = addq.w #$1,d0                52 40
10d72 = or.w #$8000,d0               00 40 80 00
10d76 = move.w #$2,9c(a6)            3d 7c 00 02 00 9c
10d7c = move.w d0,24(a6)             3d 40 00 24
10d80 = move.w d0,24(a6)             3d 40 00 24
10d84 = rts                          4e 75

Obligatory "pretty please with a cherry on top", natch. |
14 October 2018, 09:45 | #2 |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,479
|
A wrong way to check that the track has been read completely.
You can exit from dbne.w d0,10d8c in two ways:
- if d0 underflows (that is, when a fixed number of loop passes has elapsed; the mulu is there only to waste cycles)
- if DSKBLK is set, via the Z flag left by the preceding INTREQR read
On fast processors the track is not completely read, because the mulu executes faster and the counter runs out first.
EDIT: the funny thing is that this routine could also fail in the opposite situation, that is, on very slow processors (impossible in practice because there is no 68000 Amiga clocked below 7MHz, but potentially it could happen).
Last edited by ross; 14 October 2018 at 10:54. |
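The two exit paths can be modelled in a few lines. This is only a rough Python sketch of the loop at $10d8c, not the game's code: `passes_until_dskblk` is a hypothetical stand-in for CPU speed (a faster CPU completes more loop passes before the disk DMA finishes), and no real cycle timings are assumed.

```python
def original_wait(passes_until_dskblk):
    """Model the dbne loop at $10d8c; return (d0, retry).

    passes_until_dskblk is a hypothetical parameter: the loop pass on which
    the DSKBLK bit first reads as set. Faster CPUs mean larger values.
    """
    d0 = 0xC000                       # move.w #$c000,d0
    passes = 0
    while True:
        passes += 1
        if passes >= passes_until_dskblk:
            break                     # dbne: condition true, exit without decrementing
        d0 = (d0 - 1) & 0xFFFF        # dbne: decrement the low word of d0
        if d0 == 0xFFFF:
            break                     # counter underflowed: time-out
    retry = bool(d0 & 0x8000)         # tst.w d0, then bmi in the caller
    return d0, retry


for passes in (10_000, 30_000, 100_000):
    d0, retry = original_wait(passes)
    print(f"DSKBLK on pass {passes:>6}: d0=${d0:04X} retry={retry}")
```

In this model only the middle case is accepted. Too many passes exhaust the counter (the fast-CPU failure), and too few leave d0 at $8000 or above, which tst.w also reports as negative (the slow-CPU case from the edit).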
14 October 2018, 12:17 | #3 |
Junior Member
Join Date: Dec 2002
Location: The Streets
Age: 40
Posts: 2,731
|
So what would be a good solution? Spam ten or twenty mulus instead? I've never been good at understanding the Amiga's interrupts, diskloading routines or cycle-handling, unfortunately. |
14 October 2018, 13:10 | #4 | |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,479
|
Quote:
I'm in a hurry, so no comments added, but maybe I'll add some later. Code:
00010D86 61C6                    bsr.b   $10d4e
00010D88 7007                    moveq   #7,d0
00010D8A 4840                    swap    d0
00010D8C 7802             .l     moveq   #2,d4
00010D8E C86E 001E               and.w   $1e(a6),d4
00010D92 6604                    bne.b   .df
00010D94 5380                    subq.l  #1,d0
00010D96 6AF4                    bpl.b   .l
00010D98 3D7C 4000 0024   .df    move.w  #$4000,$24(a6)
00010D9E 4A80                    tst.l   d0
00010DA0 4E75                    rts
|
|
14 October 2018, 13:56 | #5 |
move.l #$c0ff33,throat
Join Date: Dec 2005
Location: Berlin/Joymoney
Posts: 6,863
|
|
14 October 2018, 15:11 | #6 |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,479
|
Ok, got some time now to comment my code.
With a superficial look you could ask: "isn't your code also CPU dependent?" Response: "no, or better, only relatively!" This code is internal bus timing dependent: even if you ignore the timing of all the instructions in the main wait loop, every pass contains an INTREQR read, so my routine is guaranteed a minimum time to last.
EDIT: before giving up and throwing an error it takes a lot of time on an A500 (~2.7s); on a fast cached 030+ it takes practically the time explained below.
Some math (taking a PAL Amiga as reference, but the concept is the same on NTSC):
- the internal bus clock is 3546895Hz (C2 clock), but the processor can access only the even cycles, so an INTREQR read requires a fixed (1/C2*2)=~564ns
- a floppy revolution takes ~200ms, but this is not a usable figure because there is more to consider (how many bits the game really requests through DMA, how many parts (sectors) there are on a track, how big the gap is)
- disassembling, we have a DMA request of $19CE words and 10 sectors/track (512 bytes/sector), so on with the math
- $19CE words require ($19ce*16*2µs)=~211ms (the time the DMA needs to read from disk after the sync is found)
- then there are two possible penalties: 1) you start reading immediately after the sync, so you need a whole sector (1/10 track, ~20ms) before the next sync; 2) you start reading at the start of the gap, which can be as big as $19CE-(512*10)-syncs-headers=~$5B0 words -> ~47ms
- take the worse of the two and add it to the reading time: ~(211+47)=258ms
- finally, how many CPU bus cycles are there in this time? (0.258s/564ns)=~457550->$6fb1c->$70000
Now you can see where my [moveq #7,d0 / swap d0] comes from, and you can understand why this routine is not arbitrary and requires a bit of Amiga knowledge to write (at least for an online patched version).
Cheers!
PS: not that I would use such a technique; in fact in my loader I use the CIA timers, but for a dirty patch it can be fine.
Last edited by ross; 14 October 2018 at 18:07. Reason: cosmetic |
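The arithmetic behind the $70000 starting value can be re-run as a quick check. This is a sketch under stated assumptions: the constants are the figures quoted in the post, the bit cell is taken as 2 µs (the standard MFM bit time on a double-density disk, and what the ~211ms figure implies), and all variable names are made up for illustration.

```python
C2_HZ = 3_546_895          # PAL C2 bus clock, as quoted in the post
BIT_CELL_S = 2e-6          # assumed MFM bit cell on a DD disk: 2 microseconds
DMA_WORDS = 0x19CE         # DMA length found in the disassembly
GAP_WORDS = 0x5B0          # worst-case gap estimate from the post

bus_read_s = 2 / C2_HZ                  # one chip-register read, ~564 ns
read_s = DMA_WORDS * 16 * BIT_CELL_S    # ~211 ms of disk DMA
gap_s = GAP_WORDS * 16 * BIT_CELL_S     # ~47 ms worst-case wait for the sync
worst_case_s = read_s + gap_s           # ~258 ms
min_passes = worst_case_s / bus_read_s  # loop passes needed to cover that time

# Round up to the next multiple of $10000 so that moveq + swap can load it.
start = (int(min_passes) // 0x10000 + 1) * 0x10000

print(f"bus read    : {bus_read_s * 1e9:.0f} ns")
print(f"worst case  : {worst_case_s * 1e3:.0f} ms")
print(f"min passes  : ${int(min_passes):X}")
print(f"start value : ${start:X}")
```

With the unrounded constants the pass count comes out slightly different from the hand-rounded figure in the post, but it still rounds up to the same $70000.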
15 October 2018, 01:44 | #7 |
Junior Member
Join Date: Dec 2002
Location: The Streets
Age: 40
Posts: 2,731
|
Well I tried out your code, Ross. Tested it quickly with the Fairlight crack before I worry about the other cracks or an original.
It dawned on me that I probably should've shown at least what happens after D0 is word-tested at the end of that routine: Code:
10b3c = bsr 10cd2                 61 00 01 94
10b40 = movea.l 1618(a5),a4       28 6d 16 18
10b44 = subq.b #$1,1620(a5)       53 2d 16 20
10b48 = bpl 10b5e                 6a 14
10b4a = subq.b #$1,1621(a5)       53 2d 16 21
10b4e = bmi 10b82                 6b 32
10b50 = st.b 1626(a5)             50 ed 16 26
10b54 = bsr 10be4                 61 00 00 8e
10b58 = move.b #$2,1620(a5)       1b 7c 00 02 16 20
10b5e = move.l a4,1618(a5)        2b 4c 16 18
10b62 = bsr 10d86                 61 00 02 22    <--- this is the routine explained earlier.
10b66 = bmi 10b3c                 6b d4          <--- if D0 is seen as negative, branches back to $10b3c. We want it to carry on forwards.
10b68 = movea.l 1610(a5),a2       24 6d 16 10
10b6c = jsr (a2)                  4e 92
10b6e = bmi 10b3c                 6b cc
10b70 = addq.w #$1,1624(a5)       52 6d 16 24
10b74 = bsr 10cb0                 61 00 01 3a
10b78 = subq.w #$1,162a(a5)       53 6d 16 2a
10b7c = bpl 10b58                 6a da
10b7e = moveq #$0,d4              78 00
10b80 = rts                       4e 75

Anyhow, on a quickstart A1200 WinUAE config, starting from $70000 seemed to result in a negative D0 all the time, so I tried increasing it to $78000 and this worked out better. (This required me to find some space elsewhere on the disk to branch to in order to have room for a "move.l #$78000,d0" - the start of the copylock routine seemed ideal, as part of the beginning of it already appeared to have been wiped from the ADF, so it's probably not accessed anyway, and FLT took a different approach to inserting the serial in the correct address, but I digress!) Only problem is that now it's caused the opposite effect of the trackloader failing on an A500 setup. You're right in that your implementation is still largely at the mercy of the CPU and what speed it might be set to on any given day.
I did think about testing for different processor bits in AttnFlags, setting the appropriate flag somewhere within low memory, and then setting up differing high numbers for D0 depending on what flag it found in what spot, but that might end up being a very long list if I was required to also take into account all the various processors, CPUs and other bits of hardware that an Amiga could end up having any potential combination of.
In regards to CIA timers, I was consulting this page just now (specifically the part about the timers, natch) and wondered if that's exactly what I'm supposed to use, but I'm wracking my brain thinking about how to correctly implement it in the game's code, especially if I'm supposed to still use and.w against $dff01e (INTREQR) at some point. Do I have to check against both CIAs and both timers for this purpose, or will one of them suffice? Also, what does "<<3" mean in move.b #1<<3,$bfee01-$bfd100(a5)? Evidently not "less than less than 3".
(Just out of curiosity btw, I wanted to see what happens if you simply NOP out the BMI at $10b66, and while it seems to allow the tracks to load, obvious glitches occur, such as the title card not popping up and the in-game graphics going through varying shades of groovy colours and what not. So yeah, can't get away with that one.)
Eeehh, I wasn't expecting it to be a viable solution. I only wondered about it because I recall seeing a replacement trackloader in one previous crack of another game implement a lot of random NOPs here and there, and I assumed they were intended to cause the necessary delays to time the CPU and disk drive right. But it *was* one of those late 80's cracks so it would probably fail against newer processors anyway. |
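On the "<<3" question: in assembler expressions `<<` is normally a shift-left operator evaluated at assembly time, so `#1<<3` is simply the constant 8, i.e. a byte with only bit 3 set. The same operator exists in Python, which makes for a quick check. The name `BIT` is made up for illustration, and the guess that a5 holds $bfd100 in that snippet is an assumption based on the offset expression alone.

```python
BIT = 3                      # hypothetical name for the bit number being set
mask = 1 << BIT              # shift the value 1 left by three bit positions

print(mask)                  # decimal value of the immediate operand
print(f"{mask:08b}")         # the same byte written out bit by bit

# The displacement is a constant expression too: if a5 holds $bfd100
# (an assumption), then $bfee01-$bfd100(a5) addresses $bfee01.
offset = 0xBFEE01 - 0xBFD100
print(hex(offset))
```

Writing the operand as `1<<3` rather than `8` just documents which bit number is meant.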
15 October 2018, 09:11 | #8 |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,479
|
You did not look carefully at my code..
I've used tst.l d0. When can this become negative? Only if the "time-out" time has passed! I've only described the tricky part, not the trivial one.. |
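The difference between the two tests is easy to see with a small model. This is a hedged Python sketch, not code from the game: the helper names are invented, and the pass count of 20,000 is an arbitrary example rather than a measured value.

```python
def tst_w_negative(d0):
    """Sign as seen by tst.w: bit 15 of the low word."""
    return bool(d0 & 0x8000)


def tst_l_negative(d0):
    """Sign as seen by tst.l: bit 31 of the whole register."""
    return bool(d0 & 0x8000_0000)


def patched_d0_after(passes):
    """d0 after `passes` subq.l #1,d0 steps from $70000, as a 32-bit value."""
    return (0x70000 - passes) & 0xFFFF_FFFF


ok = patched_d0_after(20_000)            # DSKBLK arrived, counter still positive
timed_out = patched_d0_after(0x70001)    # counter ran past zero to -1

print(f"${ok:08X}: tst.w negative={tst_w_negative(ok)}, tst.l negative={tst_l_negative(ok)}")
print(f"${timed_out:08X}: tst.w negative={tst_w_negative(timed_out)}, tst.l negative={tst_l_negative(timed_out)}")
```

With a long counter, the low word passes through values of $8000 and above many times on the way down, so a leftover tst.w can report a perfectly good read as a failure. Only tst.l reflects the real time-out.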
15 October 2018, 09:18 | #9 | |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,479
|
Quote:
There are problems with fast processors not only in the loader but in various parts of the code. |
|
15 October 2018, 10:23 | #10 |
Registered User
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,423
|
Ah yes, the joys of using CPU instruction timings to delay things. Or, perhaps better called: woe to anyone using a faster processor.
I still wonder why people regularly coded delays based on the duration of specific CPU instructions. Even back in the 1980s faster CPUs were available for the Amiga. I recall reading warnings against using such delay loops even in fairly non-technical magazines. Perhaps I'm underestimating the number of people who coded directly on an A500. Anyway, I tend to use either raster timing (VPOSR) or timer/VBL interrupts to get CPU-independent delays. I find waiting on VPOSR to be the easiest to implement, though I do dislike busy-waiting loops. |
15 October 2018, 10:51 | #11 |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,479
|
The habit of making CPU-timed waits was embarrassing even in the early '80s...
There are so many precise time sources on the Amiga: CIA E-clock, CIA hsync, CIA TOD, VPOS, VBL, bus C2, and practically all of them are synchronized! Maybe it was only laziness? |
15 October 2018, 11:17 | #12 | |
Registered User
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,423
|
Quote:
That said, laziness is obviously still an option |
|
15 October 2018, 12:55 | #13 |
move.l #$c0ff33,throat
Join Date: Dec 2005
Location: Berlin/Joymoney
Posts: 6,863
|
|
15 October 2018, 13:04 | #14 | |
Junior Member
Join Date: Dec 2002
Location: The Streets
Age: 40
Posts: 2,731
|
Quote:
move.w #$4000,24(a6) began. That's what I get for not looking at that little tst more closely! So cool, *now* it works good and all. I was testing various processors with it up to 68060, and I wondered.... one could just start as high up as $80000000 if they wanted to and then it'd be good for all floppy drives and configurations, right? Read my last message again. It explains my train of thought at the time. |
|
15 October 2018, 13:19 | #15 | |
move.l #$c0ff33,throat
Join Date: Dec 2005
Location: Berlin/Joymoney
Posts: 6,863
|
Quote:
Right... Good to see you're still the same old MethodGit you've always been. |
|
15 October 2018, 15:58 | #16 |
Junior Member
Join Date: Dec 2002
Location: The Streets
Age: 40
Posts: 2,731
|
|
15 October 2018, 16:41 | #17 | ||
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,479
|
Quote:
Quote:
It needs to be as low as possible (for the slow machines) but high enough to unbind the value from any particular processor and tie it to a consistent timer. Sure, you can use a larger value (so using moveq #8,d0 is practically the same, even safer), but if you use absurdly high values (like the one you have proposed) then you would literally spend hours waiting on an A500 before an error is signaled. |
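To put rough numbers on that trade-off, the time-out can be estimated for a few starting values. This is a back-of-the-envelope Python sketch: it assumes the time-out scales linearly with the starting value, takes the ~2.7s A500 figure and the C2 clock from earlier in the thread, and uses invented function names.

```python
C2_HZ = 3_546_895            # PAL C2 bus clock quoted earlier in the thread
A500_TIMEOUT_S = 2.7         # A500 time-out quoted for a start value of $70000
REFERENCE_START = 0x70000


def fastest_timeout_s(start):
    """Lower bound on any CPU: one chip-bus read (2 C2 cycles) per loop pass."""
    return start * 2 / C2_HZ


def a500_timeout_s(start):
    """A500 estimate, scaled linearly from the quoted ~2.7 s (an assumption)."""
    return A500_TIMEOUT_S * start / REFERENCE_START


for start in (0x70000, 0x80000, 0x8000_0000):
    print(f"${start:X}: at least {fastest_timeout_s(start):.2f} s, "
          f"about {a500_timeout_s(start):.1f} s on an A500")
```

Going from $70000 to $80000 adds only a fraction of a second on an A500, while $80000000 works out to roughly three and a half hours there, and around twenty minutes even at the bus-limited minimum.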
||
|