01 August 2020, 20:55 | #1 |
Registered User
Join Date: Oct 2017
Location: Sunderland, England
Posts: 2,702
|
Mega Typhoon Deconstruction
I've been meaning to take a look at Mega Typhoon for a while now and got around to doing it today for the first time.
While I guessed correctly that the game runs in 16 colours, I didn't anticipate that it does so in Dual Playfield mode. At first look there is certainly a lot of sprite multiplexing going on with the player ship and enemy bullets. What is interesting, though, is the way they drive the Blitter. From what I can gather, the game has a Copper list containing CMOVEs that write the Blitter's chip registers. Are they really gaining that much speed by driving it this way? More importantly, and the reason for my post: has anyone else done a deep dive on this game? (I fell into this trap when I did Xenon 2, only to find Galahad had already done a load of work on it.) It's certainly impressive for an ECS game.
Geezer
Update: I've just found Roondar's site and am reading it now... it doesn't reference the original EAB thread though. http://web.archive.org/web/201808290...-fastbobs.html
Last edited by mcgeezer; 01 August 2020 at 21:10. |
01 August 2020, 22:38 | #2 |
Inviyya Dude!
Join Date: Sep 2016
Location: Amiga Island
Posts: 2,773
|
I thought there was a discussion about it somewhere here on the EAB a few years ago...
The speed comes from using dual playfield and blitting directly into the front playfield, without the need to save and restore the background, or something like that... |
01 August 2020, 23:04 | #3 |
68k
Join Date: Sep 2005
Location: Somewhere
Posts: 828
|
@mcgeezer
Check this post and the other related ones. http://eab.abime.net/showpost.php?p=...&postcount=167 It would be really great to have the source code or a resourced version of Mega Typhoon. |
02 August 2020, 00:27 | #4 | |
Registered User
Join Date: May 2018
Location: Ireland
Posts: 674
|
Quote:
Funnily enough, the Amstrad CPC's Mission Genocide uses a similar technique, i.e. it reduces the total colours from 16 to 8 by using 2x4-colour playfields, which saves on removing and restoring the background when sprites move. Last edited by lmimmfn; 02 August 2020 at 00:37. |
|
02 August 2020, 01:41 | #5 | ||||
Registered User
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,411
|
Quote:
Quote:
Quote:
The last is that they wrote their entire animation & movement system in such a way that all coordinates & other values are always already in the correct form for blitting (so they never have to translate X,Y & animation frame numbers to Blitter Addresses/Shifts). They were quite proud of that one, they pointed it out in either the documentation or in an interview (I forget which). Quote:
Which can be a problem for using sprites. Last edited by roondar; 02 August 2020 at 02:20. Reason: Combined both my replies into one post instead of making two ;) |
||||
02 August 2020, 11:31 | #6 | |
CaptainM68K-SPS France
|
Quote:
|
|
02 August 2020, 12:54 | #7 |
Registered User
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,411
|
I'm not too sure of that actually.
Coin-ops generally use sprites, which tend to have easy-to-use coordinate systems already. For example, looking at the small amount of CPS1 documentation that is out there, it seems that CPS1 sprites simply have a 2-byte X value and a 2-byte Y value for their on-screen location. This means that, on CPS-1 arcade hardware at least, you don't need the address translation and shift calculation that the Amiga's Blitter requires. You simply write the desired X & Y coordinates to the proper location in memory. It's precisely this address/shift translation of X & Y coordinates that Mega Typhoon optimises. |
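For readers unfamiliar with the translation being discussed, here is a minimal sketch of the conventional per-bob calculation that a game normally performs every frame. It is not Mega Typhoon's code: the function names are hypothetical, the bitmap is assumed to be non-interleaved, and the 56 bytes per row simply follows from the 448-pixel-wide bitmap mentioned elsewhere in this thread.

```python
def bob_blit_params(x, y, bytes_per_row):
    """Translate pixel coordinates into a destination byte offset and a shift."""
    # The blitter addresses 16-bit words, so the offset must land on a word.
    word_offset = (x >> 4) * 2           # byte offset of the word containing x
    shift = x & 15                       # leftover pixels within that word
    offset = y * bytes_per_row + word_offset
    return offset, shift


def bltcon0_with_shift(shift, low_bits=0x0FCA):
    """Place the shift in bits 15-12 of a BLTCON0 value.

    0x0FCA enables all four DMA channels with the common cookie-cut minterm.
    """
    return (shift << 12) | low_bits


# Hypothetical bob at (37, 10) in a 448-pixel-wide (56 bytes per row) bitplane.
offset, shift = bob_blit_params(37, 10, 56)
bltcon0 = bltcon0_with_shift(shift)
```

A shift, a mask, a multiply and an add per object is not much, but it adds up across many bobs; keeping positions permanently in "offset plus shift" form, as described above, would avoid repeating it.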
02 August 2020, 15:22 | #8 | ||
Registered User
Join Date: Nov 2015
Location: Vaasa, Finland
Posts: 525
|
I used Google Translate on the German readme file on the Mega Typhoon game disk, and it says this about the "copper controlled blitter" method:
Quote:
Quote:
And when you play the game, the scrolling slows down for a moment at regular intervals, at points where there aren't many enemies on screen. I think that at these spots it builds the next background area by blitting those graphics brushes into the whole 448 * 1684 bitmap. Thanks to this method it doesn't need to blit new tile rows every 16 pixels like other shoot 'em ups do, which brings another small speed increase.
---
Also, here is a screenshot of how the game looks in the WinUAE visual debugger:
The cyan blitter operations always have those yellow dots before them, which I guess is the "copper controlling the blitter" thing. This is the only game where I have seen this happen in the debugger window. In other games there are usually long "empty sections" between the cyan "blitter activity" zones. Here, though, all the individual "blitter activity" zones join together into one big chunk that grows and shrinks as the game goes on. |
||
02 August 2020, 18:03 | #9 | |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,474
|
Quote:
In this game the blitter objects would be better described, in a broad sense, as "copper objects". The CPU constructs the copper list dynamically as a queue in which each object is already defined as a series of copper instructions that write to the blitter registers. Construction is fast because the 'copper objects' are simply linked by copper jumps. The jumps are 'near', so only COP1LCL is changed. The queue is sorted to make the blitter setup as fast as possible, so that as few registers as possible change between one blit and the next. A very clever use of the copper.
There is one thing that I didn't properly understand: why slow down the copper list with three consecutive CWAIT_BFD? There is a thread somewhere on EAB (where I also wrote some code) where it was reported that a single wait caused problems (but in the end it was never clarified whether it was a hardware problem specific to that machine). In theory only one should be sufficient on the Agnus from the A500 onwards (but maybe Toni should step in here to clarify). Considering how much they optimised everything else, it would be strange if there were no reason for this.
Another curiosity: instead of register $1FE, register $190 is used for a CNOP. For all intents and purposes, because of the way the bitplanes are used, there is no substantial difference, but it is quite peculiar.
|
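To make the "copper object" idea concrete, here is a rough sketch that assembles one such object as a list of 16-bit copper words. This is not a disassembly of the game: the helper names, the order of the instructions and the example values are assumptions. Only the register offsets are taken from the standard custom chip register map, and the blitter wait is the commonly used `$0001,$7FFE` encoding (a WAIT with the BFD bit cleared).

```python
# Custom chip register offsets relative to $DFF000.
BLTCON0 = 0x040
BLTDPTH, BLTDPTL = 0x054, 0x056
BLTSIZE = 0x058
COP1LCL = 0x082
COPJMP1 = 0x088


def cmove(reg, value):
    """Copper MOVE: register offset (even, bit 0 clear) followed by the data word."""
    return [reg & 0x01FE, value & 0xFFFF]


def cwait_blitter():
    """Copper WAIT with BFD cleared, so the copper also waits for the blitter."""
    return [0x0001, 0x7FFE]


def copper_object(reg_writes, bltsize, next_low, waits=3):
    """Build one queue entry: waits, register setup, blit start, near jump."""
    words = []
    for _ in range(waits):               # the game reportedly uses three waits
        words += cwait_blitter()
    for reg, value in reg_writes:
        words += cmove(reg, value)
    words += cmove(BLTSIZE, bltsize)     # writing BLTSIZE starts the blit
    words += cmove(COP1LCL, next_low)    # low word only: a 'near' jump target
    words += cmove(COPJMP1, 0)           # strobe: continue at COP1LC
    return words


def changed_regs(previous, current):
    """Return only the register writes that differ from the previous blit."""
    return sorted((r, v) for r, v in current.items() if previous.get(r) != v)


# Hypothetical example: the second blit only needs a new destination low word.
prev = {BLTCON0: 0x0FCA, BLTDPTH: 0x0001, BLTDPTL: 0x2000}
curr = {BLTCON0: 0x0FCA, BLTDPTH: 0x0001, BLTDPTL: 0x2038}
obj = copper_object(changed_regs(prev, curr), 0x0041, 0x1234)
```

The `changed_regs` helper illustrates why sorting the queue pays off: the more two consecutive blits have in common, the fewer CMOVEs the copper has to fetch between them.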
02 August 2020, 18:22 | #10 |
Inviyya Dude!
Join Date: Sep 2016
Location: Amiga Island
Posts: 2,773
|
That's how my Sprite Multiplexer operates as well.
Sorting by Y position, and then writing a dynamic copperlist. Works like a charm. I am not sure, though, what the benefit would be of using this system with BOBs. Is there really a speed advantage in doing this? Sounds more like a pain in the ass for something that you can do in a much less complicated way. |
02 August 2020, 18:43 | #11 | |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,474
|
Quote:
The copper is very fast at writing to blitter registers, needing only two bus/memory accesses. The quick way in 68000 code, move.w #const,blitter_reg(custom_base), takes 4 accesses to the internal bus/memory... To match the copper you would have to use move.w dx,(ax), but how often can you use that? Very, very rarely... |
|
02 August 2020, 19:30 | #12 |
Lemon. / Core Design
Join Date: Mar 2016
Location: Tier 5
Posts: 1,212
|
|
02 August 2020, 19:46 | #13 | |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,474
|
Quote:
But for a vertical shooter this is not usually a problem |
|
02 August 2020, 21:46 | #14 | |
CaptainM68K-SPS France
|
Quote:
Mega Typhoon is very clever |
|
03 August 2020, 03:02 | #15 | |
Registered User
Join Date: Nov 2017
Location: Los Angeles
Posts: 49
|
Quote:
That doesn't detract from your general point though - while it's possible to write more efficiently from the CPU, it's difficult and you sort of have to ignore all the other overheads (like the blitter waits). I read somewhere (in another thread I think?) that the player bullets and ship were re-using the same sprites with a palette change - that would be somewhat tricky to have working efficiently in this system. |
|
03 August 2020, 08:27 | #16 |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,474
|
Not from a speed perspective (even if you are correct from a 'physical' accesses perspective; I simplified to get the concept across).
Take this CMOVE, which starts a blitter operation:
dc.w $0058,$0041
How many cycles does it require? Only 4 cck cycles/8 68k cycles (for the two memory fetches of the copper command/parameter), because the write is 'hidden' and happens only on the RGA bus. The same MOVE in CPU code, using the aforementioned move.w #const,blitter_reg(custom_base):
dc.w $3d7c,$0041,$0058
Here you have 3 reads (for the opcode/parameter fetches) and 1 write (to the memory/custom chips), so 8 cck cycles/16 68k cycles, because the write is 'external'! This too is oversimplified, since copper/CPU/blitter concurrency on the buses is not so trivial, but it is only meant to make clear that in most real cases, at least on an architecture like a bare A500, the copper is much faster for this kind of operation (obviously only if you already have the copper list built in memory, as is the case in the game in question). |
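The counting in this post can be written down as a tiny model. It is only a sketch of the simplification used here (every bus access costs 2 colour clocks, and one colour clock equals two 68000 clocks); it ignores bus contention and is not a cycle-exact emulation. The function names are made up for illustration.

```python
CCK_PER_ACCESS = 2        # simplified: each chip bus access takes 2 colour clocks
CPU_CLOCKS_PER_CCK = 2    # simplified: 68000 clock is twice the colour clock


def copper_move_cost(words):
    """Copper fetches each word; the register write itself is hidden on the RGA bus."""
    cck = len(words) * CCK_PER_ACCESS
    return cck, cck * CPU_CLOCKS_PER_CCK


def cpu_move_cost(words, writes=1):
    """CPU fetches each instruction word and then performs an external write."""
    cck = (len(words) + writes) * CCK_PER_ACCESS
    return cck, cck * CPU_CLOCKS_PER_CCK


copper_words = [0x0058, 0x0041]          # CMOVE $0041 to BLTSIZE
cpu_words = [0x3D7C, 0x0041, 0x0058]     # move.w #$0041,$58(a6)

print(copper_move_cost(copper_words))    # (cck, 68k clocks) for the copper
print(cpu_move_cost(cpu_words))          # (cck, 68k clocks) for the CPU
```

Under these assumptions the copper write costs half as much as the CPU write, which is the 50 versus 100 weighting used later in the thread.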
03 August 2020, 13:20 | #17 | |
Registered User
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,411
|
Quote:
It seems to me that any discussion on Copper blitting (as a way to gain performance) kind of glosses over the updates to the Copper instructions needed between frames. Even if we assume the Copperlist itself can stay 100% static and no queue entries are ever added or removed (which doesn't seem quite right to me if we want to be able to cover every situation), you still have to change the data in the Copperlist in order to move or animate Bobs. Obviously you won't need to update all of the Copper instructions every frame, but it still costs some CPU time to update the Copperlist itself between frames. The question then becomes: how much raster time is lost doing this? I've never really seen a good answer to this, so I'd love to hear your (or anyone else's) view on it. |
|
03 August 2020, 15:16 | #18 | |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,474
|
Quote:
So this is difficult to answer; real tests should be made and statistics collected. In any case, I'll try a rough calculation. I take the image from message #8 and look at the DMA slots used for the generic blitting* of an object (very easy to spot because it is preceded by an unmistakable multiple wait sequence, really 3 CWAIT_BFD). [*this is a 'bad case'; for clears/other ops I can optimise it]
There are 11 writes to the blitter registers, plus the setting of the new copper execution position (the next object in the queue) and the CJMP, for a total of 13 copper 'operations'. Now I weigh a write by the CPU as 100 and one by the copper as 50 (based on the assumptions in my previous messages). To the copper ops weight I must add the writes to the clist by the CPU (100 each). But how many writes to the list do I need? I can only estimate an average: certainly 1 for the jump setting, plus those for the registers to be updated. If I am lucky 0 (a totally static object for the frame), otherwise even 5 or 6... Let's say that on average a couple per frame are enough; then I have: CPU = 11*100 = 1100, COPPER = 50*13 + 100*(2+1) = 950. I have gained around 15%.
How, then, did the programmers report a gain that can reach 30%? Well, if you read it, they wrote 'compared to the conventional blitter finished interrupt chaining', which is not the approach we would choose for optimised blitter usage. Correct me if I wrote too much nonsense; these are just quick guesses without thinking much about it.
|
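The back-of-envelope estimate above is easy to reproduce and to vary. The sketch below uses exactly the weights from the post (100 per CPU write, 50 per copper operation); the function names are hypothetical and the weights are the poster's guesses, not measurements.

```python
def cpu_cost(register_writes, cpu_weight=100):
    """Cost of setting up one blit entirely from the CPU."""
    return register_writes * cpu_weight


def copper_cost(copper_ops, list_updates, cpu_weight=100, copper_weight=50):
    """Cost of the copper executing its ops plus the CPU patching the list."""
    return copper_ops * copper_weight + list_updates * cpu_weight


def relative_gain(cpu, copper):
    """Fraction of the CPU cost saved by the copper approach."""
    return (cpu - copper) / cpu


cpu = cpu_cost(11)                        # 11 blitter register writes
copper = copper_cost(13, 2 + 1)           # 13 copper ops, 2 register patches + 1 jump
gain = relative_gain(cpu, copper)

# Same weights, but 4 register patches + 1 jump per object per frame.
worse = copper_cost(13, 4 + 1)
```

With these numbers the gain works out to roughly 13.6%, in line with the "around 15%" quoted. The estimate is sensitive to the number of list updates: under the same weights, the advantage disappears once the CPU has to patch more than four words per object per frame.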
03 August 2020, 15:51 | #19 |
Registered User
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,411
|
Right, it is indeed hard to be exact without measurements. Perhaps I should try it myself once to see how it goes. Thanks for your views though, they certainly sound reasonable.
|
03 August 2020, 16:25 | #20 | |
Registered User
Join Date: Aug 2018
Location: Untergrund/Germany
Posts: 408
|
Quote:
For my current game I also use blitting from the copper list, and I ran into these 'single wait' blit issues on real hardware.
My observations:
- A single wait doesn't always work on my A500, but two waits always work.
- On the A1200 a single wait is enough.
- Using a single wait always runs fine in WinUAE.
My experience: A500 + single wait does work in certain situations (as stated in the EAB forums), but it depends on channel usage and the number of blits. I never needed 3 waits. |
|