English Amiga Board

English Amiga Board (https://eab.abime.net/index.php)
-   project.WHDLoad (https://eab.abime.net/forumdisplay.php?f=63)
-   -   Zool WHDLoad Fix please (https://eab.abime.net/showthread.php?t=87498)

th4t1guy 25 June 2017 21:42

Quote:

Originally Posted by ross (Post 1167544)
Hi th4t1guy, can you post complete specs?

Thanks for the report.

Bye,
ross

NTSC CD32 with SX-1 + 8 MB fast RAM,
running ClassicWB 3.1 from an 8 GB CF card.

The only thing I changed was setting "PAL" in the tooltypes.

ross 25 June 2017 21:46

Quote:

Originally Posted by Akira (Post 1163801)
Zool and Zool 2 have massive slowdown issues. Nobody seems to be interested in looking at it because it's a shit ton of work from what I read.

Yes, there are several changes to be made.

I can go into technical details and, at the end of the beta stage, release the source code of the slave (but first I have to talk to Wepl about it, my use of WHDLoad is improper..).

There would be more to do, but it would require too much work (relocating code/data to fast RAM, even using FMODE=1 :shocked), so I hope that what I did is enough.

Bye,
ross

Nibbler 26 June 2017 12:36

1 Attachment(s)
Hi Ross.
I tested the beta of the AGA version and it's way better. :great
Slowdowns still happen here and there, but I think that's the game itself.
The blitter waits are pretty much fixed as far as I can tell.


The 16-bit version would be sweet too :) (for my A600 FuriaV2)
Here is a very interesting readme from the JST version.
Maybe it is worth taking a look.

HUGE greetings & Thanks, Nibbler :spin

ross 26 June 2017 16:34

Quote:

Originally Posted by StingRay (Post 1163998)
Probably some missing blitter waits and/or self-modifying code is responsible for that.

I've only looked at the Zool AGA code, but I'm pretty sure that the OCS code is similar
(AGA seems to be a quick conversion of the OCS code.. brutally increasing DMA usage from 4 to 7 BPL (4+3) with FMODE=0 :().

There is SMC and there are reversed blitter waits, so you are right
(all patched in the new AGA beta slave).

Bye,
ross

Wepl 27 June 2017 00:14

Quote:

Originally Posted by ross (Post 1167132)
I've no idea if can be solved the slowdowns but my approach is different from previous (and i'm sure that Wepl do not approve my slave (ab)use ;)).

I cannot approve changing the VBR in the Slave. It breaks several WHDLoad functions and is not supported. :)
The extra cycles consumed by WHDLoad during interrupt processing are minimal. I doubt that these can result in any noticeable degradation in normal demos/games. Excessive unusual interrupt load would be necessary for that.
But you are welcome to prove the opposite. ;)

SCO_1 27 June 2017 04:01

I've actually seen whdload Zool 2 freeze in fs-UAE.

Unfortunately, a great many games have timing faults of some kind that are heisenbugs.

It's a pity that bug reporting is not automated; almost certainly a great number of semi-reproducible bugs are slipping through unnoticed.

ross 27 June 2017 12:11

Quote:

Originally Posted by Wepl (Post 1167787)
I cannot approve changing the VBR in the Slave. It breaks several WHDLoad functions and is not supported. :)
The extra cycles consumed by WHDLoad during interrupt processing are minimal. I doubt that these can result in any noticeable degradation in normal demos/games. Excessive unusual interrupt load would be necessary for that.
But you are welcome to prove the opposite. ;)

OK, I'll make a clean version ;)

Bye!
ross

ross 27 June 2017 20:06

WHDLoad impact
 
OK, time for some numbers
(they are approximate; there are many more variables to consider).

I've rewritten the level 2 and level 3 IRQ services (CIAA/VBL).
For CIAA I've used TimerA/TimerB/SP (before, only SP was used).
TimerA is for the keyboard ACK after the SP IRQ, so it is unpredictable.

TimerB is for the DMA start of a sample in the PT replay routine (VBL) and for the repeat point (before, the delay was done with a CPU loop...).
So I have a minimum of 3 IRQs per frame.

On a fast processor Wepl's code can take 0.5 H-lines per IRQ, so 1.5 lines/frame for Zool.

This is a mere 0.5% of the total time (it can be 1% on a slow processor).

note: these numbers are for a non-optimal situation (stack in chip ram, vector call cycles in whdload routine count ...)

So I think WHDLoad cannot be the reason for the slowdowns :)
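
The estimate above is simple arithmetic and can be checked with a small sketch. It assumes a non-interlaced PAL frame of 312 raster lines (the post does not state the frame length) and, for the slow case, roughly one line per interrupt; the function name is made up for illustration.

```python
# Sketch of the interrupt overhead estimate from the post.
# Assumption: a PAL frame has 312 raster lines (not stated in the post).
PAL_LINES_PER_FRAME = 312


def irq_overhead_percent(irqs_per_frame, lines_per_irq,
                         lines_per_frame=PAL_LINES_PER_FRAME):
    """Share of a frame spent in WHDLoad's interrupt wrapper, in percent."""
    return 100.0 * irqs_per_frame * lines_per_irq / lines_per_frame


# 3 IRQs per frame at 0.5 lines each (fast processor, figures from the post)
fast = irq_overhead_percent(3, 0.5)
# 3 IRQs per frame at about 1 line each (assumed figure for a slow processor)
slow = irq_overhead_percent(3, 1.0)

print(f"fast: {fast:.2f}%  slow: {slow:.2f}%")
```

With these inputs the result is just under 0.5% and just under 1%, which matches the rounded figures quoted above.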

Cheers,
ross

ross 27 June 2017 21:39

In The Zone! a pure WHDLoad slave :)

Please test and report whether it is slower than the previous version.

@Wepl: for some time-critical demos/games my _Disable/_Enable can be useful.
What about an official version? (With some precautions, I have proved that it can be safe and usable.)
A big NOOO is an acceptable response ;)

Bye!
ross

Wepl 27 June 2017 23:11

Quote:

Originally Posted by ross (Post 1167934)
OK, time for some numbers
(they are approximate; there are many more variables to consider).

On a fast processor Wepl's code can take 0.5 H-lines per IRQ, so 1.5 lines/frame for Zool.

This is a mere 0.5% of the total time (it can be 1% on a slow processor).

note: these numbers are for a non-optimal situation (stack in chip ram, vector call cycles in whdload routine count ...)

Very nice to know :)
On which CPU did you test? And was there fast RAM?

Quote:

Originally Posted by ross (Post 1167950)
@Wepl: for some time-critical demos/games my _Disable/_Enable can be useful.

If there is an example game/demo which shows a slowdown because of WHDLoad extra code in the interrupts I will add an option to WHDLoad (e.g. FastInts/S) which will remove the extra checks in the Ints. This is the best solution I think.

Quote:

Originally Posted by ross (Post 1167950)
What about an official version? (With some precautions, I have proved that it can be safe and usable.)
A big NOOO is an acceptable response ;)

What do you mean by 'official version'?

ross 28 June 2017 01:04

Thanks Wepl for the interest!

Quote:

Originally Posted by Wepl (Post 1167968)
On which CPU did you test? And was there fast RAM?

The only Amiga I own is WinUAE... But I'm orthodox: cycle-exact CPU/DMA/memory access only.
In this case a '030 at 8x base frequency plus expansion RAM (yes, for me this is a reference/fast machine..).
WinUAE's emulation of latency on the internal bus is VERY good.
I measure the elapsed time from HPOS.
In your case there is latency from: CPU cycles for the IRQ start, some environment/stack checks, some custom/CIA reads/checks, fetching the autovector from chip RAM, building the new stack, and calling the real service (probably with the supervisor SP in chip RAM..).
I ran a quick test with an emulated vanilla A1200, which gave close to one H-line (but for a precise result I have to repeat it..).
On a '060 it is surely much better than 0.5 H-lines (practically only the internal bus latency).

I'm really sorry for the lack of a real Amiga... :sad
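
Timing by horizontal position, as described above, can be sketched as follows. This is only an illustration, not the measurement code actually used: it assumes two 16-bit samples of the VHPOSR register (vertical count in the high byte, horizontal count in the low byte, one step per colour clock), a PAL line of 227 colour clocks, and both samples taken in the same frame without crossing the V8 boundary. The function names are hypothetical.

```python
# Sketch: convert two VHPOSR samples into elapsed colour clocks and H-lines.
# Assumptions: PAL timing, both samples in the same frame, V8 unchanged.
CCK_PER_LINE = 227        # colour clocks per PAL raster line
PAL_CCK_HZ = 3_546_895    # PAL colour clock frequency in Hz


def split_vhposr(value):
    """Return (vertical, horizontal) counts from a 16-bit VHPOSR sample."""
    return (value >> 8) & 0xFF, value & 0xFF


def elapsed_cck(start, end):
    """Colour clocks elapsed between two VHPOSR samples."""
    v0, h0 = split_vhposr(start)
    v1, h1 = split_vhposr(end)
    return (v1 - v0) * CCK_PER_LINE + (h1 - h0)


def elapsed_lines(start, end):
    """Elapsed time expressed in H-lines."""
    return elapsed_cck(start, end) / CCK_PER_LINE


def elapsed_microseconds(start, end):
    """Elapsed time in microseconds on a PAL machine."""
    return elapsed_cck(start, end) * 1_000_000 / PAL_CCK_HZ


# Hypothetical samples: same line, horizontal count $10 -> $82
print(elapsed_cck(0x5010, 0x5082), elapsed_lines(0x5010, 0x5082))
```

A delta of about 114 colour clocks corresponds to roughly half an H-line, the order of magnitude discussed in this post.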
But I trust WinUAE.

Quote:

If there is an example game/demo which shows a slowdown because of WHDLoad extra code in the interrupts I will add an option to WHDLoad (e.g. FastInts/S) which will remove the extra checks in the Ints. This is the best solution I think.

Do you mean all ints handled like L4/L5/L6 [move.l 7x.w,-(sp) / rts]?
That can be a solution (absolutely fast!), but it is left in the users' hands..

Quote:

What do you mean by 'official version'?

A resload_Disable()/resload_Enable() pair, but that requires a slave rewrite...

Both solutions have pros and cons.

And first we need a demo/game that generates an IRQ storm..
But after that there's no more excuse for a slow slave :D

Bye!
ross

th4t1guy 28 June 2017 05:23

I ran the side by side comparison with the new slave, and speedwise, it's very close to the first beta you put out.
However, I think the first beta seems to be just a little smoother overall when lots of sprites are on screen.

If I set speed = "FAST" and inertia = "OFF" in the options, the speed differences seem to be a little more pronounced.

ross 28 June 2017 09:24

Quote:

Originally Posted by th4t1guy (Post 1168014)
I ran the side by side comparison with the new slave, and speedwise, it's very close to the first beta you put out.
However, I think the first beta seems to be just a little smoother overall when lots of sprites are on screen.

If I set speed = "FAST" and inertia = "OFF" in the options, the speed differences seem to be a little more pronounced.

Thanks, this is significant.

So Zool AGA is really time critical and a good candidate for FastInts/S or resload_Disable().
This is surely due to the excessive use of BPL DMA and small blits (the internal bus is hogged too much).
Not to mention the intensive use of variables in chip memory (CPU register usage is 'optional'..).

That small 0.5% mostly comes out of the free internal bus cycles, so its effect is multiplied and makes some difference.
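
One way to read the "multiplied" effect is as a ratio: if the CPU only gets a fraction of each frame's bus cycles, an overhead measured against the whole frame is larger when measured against the CPU's share. The sketch below illustrates this; the 25% figure is purely hypothetical and is not a measurement of Zool, and the function name is made up for illustration.

```python
# Sketch: overhead relative to the CPU's share of the frame.
# The 25% CPU share used below is a hypothetical value, not a measurement.


def overhead_relative_to_cpu_time(overhead_of_frame, cpu_share_of_frame):
    """Overhead as a fraction of the time the CPU can actually use."""
    if not 0.0 < cpu_share_of_frame <= 1.0:
        raise ValueError("cpu_share_of_frame must be in (0, 1]")
    return overhead_of_frame / cpu_share_of_frame


frame_overhead = 0.005   # the ~0.5% of total frame time from the earlier post
cpu_share = 0.25         # hypothetical: DMA leaves the CPU a quarter of the frame

effective = overhead_relative_to_cpu_time(frame_overhead, cpu_share)
print(f"{effective * 100:.1f}% of the CPU's usable time")
```

The smaller the share of the bus left to the CPU, the larger the same fixed interrupt cost becomes in relative terms.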

:)

Nibbler 30 June 2017 18:07

Hi Ross & Co. (Sorry, it took a while )

I agree with th4t1guy. The first beta was the "smoothest" ;)
Not much of a difference; just a little bit.

(I always played the same way in level 1-1... just pressing right on the joypad and firing fireballs... up to the forced wall-jump passage with the "double Zool ability"... then I kept running right without any pause and :evilgrin shooooooooting fireballs :evilgrin )

A good stress test ;)
(man, if Zool scrolled like Kid Chaos :spin !!!! )

Thank you ross :bowdown

ross 30 June 2017 19:24

Working on Zool (OCS).

In theory it should go much better than the AGA version.

See you soon,
ross

Nibbler 30 June 2017 20:26

No hurry Ross. Ready when you are. Big fan ;)

DamienD 30 June 2017 20:49

Quote:

Originally Posted by Nibbler (Post 1168766)
Big fan ;)

Really? Nobody would have ever guessed this :p

Nibbler 01 July 2017 10:53

- hihi - :spin

ross 02 July 2017 20:13

Zool (OCS) WHDLoad rework beta (in The Zone)

Phew, more patch points than the AGA version... :)

Ciao,
ross

Shatterhand 02 July 2017 20:59

I remember reviews pointing out that Zool AGA had lots of slowdowns and jerkiness not present in the OCS version.. this was in magazines at the time; I remember "The One" review pointing out there's no point in adding parallax and more colors if the game is going to run like shit.

Maybe this is not a WHDLoad problem after all?

As I never owned an AGA machine I can't give much personal info about it.


All times are GMT +2.