28 October 2016, 18:23 | #1 |
Registered User
Join Date: Oct 2016
Location: Australia
Posts: 58
|
The Amiga 1000 could have done a game like Wolfenstein in 1985 - shock!
Sorry about the title - I could not resist
For the past month or so I've been doing viability tests to see whether a game like Wolfenstein is possible on an Amiga with only a 68000 CPU. I like the idea that such a game could have been released in 1985, the year the Amiga 1000 launched. Had it been released then, I like to think it would have been a game changer and the PC landscape today would be different altogether. So this discussion will use the Amiga 1000 and what it had to offer, with memory upgrades only. That means slow memory does not exist yet, as that was an Amiga 500 thing. With that in mind, the A1000 could have its chip memory expanded from 256k to 512k, plus up to 8 megs of fast memory. Because slow memory does not exist yet, fast memory carries no stigma and can be exploited to the full. The first batches of the Amiga 1000 did not have EHB mode; it was added later, and users without it could upgrade. So for this discussion we will use two low-res graphics modes: 5 bit-planes and 6 bit-planes (EHB).
Now that the rules are set, here is what my tests showed is possible in the most taxing mode, EHB. We also need 3 frames to process a full screen, to get good bandwidth and parallel processing with the blitter. Here is a list of things it cannot do:
1. Code can never run from chip memory.
2. 1x1 pixels are out of the question; there is not enough bandwidth.
3. 2x1 pixels. The bandwidth is there, but drawing scaled vertical lines and merging them for a word write destroys it. If every vertical line combination of a 2x1 pixel could be pre-calculated it would be possible, but there is not enough memory to do that, not even close.
So realistically we can only use 2x2 pixels.
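To put numbers on why 1x1 and 2x1 are ruled out, here is a quick sketch of how much chunky data each pixel size means. It assumes one byte per chunky pixel and the 320x176 view used for Wolfenstein below. Every write, copy and C2P pass scales with this figure.

```python
VIEW_W, VIEW_H = 320, 176     # Wolfenstein's view without the status panel

def chunky_bytes(pixel_w, pixel_h):
    """Bytes in a chunky buffer with one byte per chunky pixel of pixel_w x pixel_h screen pixels."""
    return (VIEW_W // pixel_w) * (VIEW_H // pixel_h)

base = chunky_bytes(2, 2)
for pw, ph in [(1, 1), (2, 1), (2, 2)]:
    size = chunky_bytes(pw, ph)
    print(f"{pw}x{ph}: {size} bytes ({size // base}x the 2x2 buffer)")
```

The 2x2 case gives the 14080 bytes quoted later in the post. 2x1 doubles it and 1x1 quadruples it.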
I took a look at Gloom Deluxe with the pixels changed to 2x2 and it looks acceptable. And let's face it: seeing a solid, texture-mapped world moving around in 3D at that resolution would have been mind-blowing in 1985. The simplest and quickest way to get textures on the screen is ray-casting, which means we need scaled vertical lines. All of these can be pre-calculated, at the cost of memory, as sequences of code like:
Scaling up
    move.b (a0)+,d0
    move.b d0,160*1(a1)
    move.b d0,160*2(a1)
A 1x1 mapping ratio
    move.b (a0)+,160*1(a1)
Scaling down
    move.b 64*1(a0),160*1(a1)
This way, with a texture size of 64x64, a line needs at most 64 reads, and the number of writes is exactly the length of the line. That keeps bus usage to the bare minimum.
We want this game to run on both NTSC and PAL. Wolfenstein used a 320x200 screen, but the actual view without the panel at the bottom was 320x176. For our screen of 2x2 pixels we need 160x88, which comes to 14080 bytes. Copper line doubling takes the height to 176. The width is doubled by encoding the bytes in a special way for the C2P conversion routine (more on this later), and the blitter does the C2P conversion.
Before discussing how much bandwidth it takes to fill a whole screen with pixels, we need to set out what DMA we actually need:
1. Memory refresh
2. Bit-planes
3. Sound
4. Copper
5. Blitter
What we don't need:
1. Sprites while we render
2. Disk access while we render
Removing sprite and disk DMA gives us back 19 DMA slots per line (16 for the 8 sprites, 3 for the disk). That is nearly the 20 slots one low-res bit-plane needs, so while using 6 bit-planes for EHB mode we effectively pay for only 5. In the long run this means the blitter can do more per frame. There are two ways to get the chunky screen into chip memory, and each has its pros and cons.
The first way is to keep the chunky screen in fast memory and do all reads and writes there, then transfer it to chip mem with a movem copy. The transfer takes around half a frame, which leaves 2.5 frames to render with, and the blitter cannot be used while the copy is happening. The second way is to keep the chunky screen in chip memory. All render writes then go to chip memory, which is of course slower than fast memory, but the movem copy disappears and the blitter can run for all 3 frames. So which way is better for blitter usage and frame time? Both methods can write a whole screen's worth of pixels in 2 frames. Writing to fast memory is faster, but you pay it back with the copy and lose some blitter time. Writing to chip memory costs you on the bus writes, but you gain back the time of the copy. Surprisingly, the two are comparable, but writing to chip gives you extra blitter time, so keeping the chunky screen in chip memory is the way to go. This chunky screen needs to be double buffered. Remember that the code still runs from fast memory and all reads come from fast memory too, so the CPU is not crippled much by DMA contention.
So how do we do the C2P conversion? The blitter does this job. It can process the next frame's worth of data while the previous frame is displayed. That means there will be a 1 frame lag (3 frames really, because we will be running at 17fps on PAL and 20fps on NTSC), but that is not much of a problem. If we read the inputs every v-sync and store them, the game update can compensate for the lag. Everything is synchronised by waiting for the 3rd frame and line 176. When that happens, we change the copper pointer so that the bit-plane pointers show the new screen, and then fire off the first blitter process in the chain.
Once the first blit in that chain finishes, it triggers a blitter-finished interrupt and the next blitter process is fired off. This blitter chain is processed over 3 frames and must end before line 176 on the 3rd. My blitter tests over 3 frames show that the blitter has enough time to convert to planar, with time left over for other uses. This was the first thing I tested, and I was shocked that an OCS chipset with only a 68000 could do this. Using the blitter, with the code in fast memory, is what makes all this possible. Now we need a nice way to encode each byte for the C2P conversion:
12334566
The numbers represent bit-planes. The chunky screen pixel x order will be 0,8,1,9,2,10,3,11,4,12,5,13,6,14,7,15, where each number represents a byte. This way a word contains 2 pixels that are 8 pixels apart. This is not an issue for a ray-casting engine that only draws vertical lines. With this order we get a word that looks like this:
12334566 12334566
Doing the usual merge passes with the blitter gives these merge patterns:
12124545 12124545 (1 pass)
33336666 33336666 (1 pass)
12121212 12121212 (1 pass)
45454545 45454545 (1 pass)
33333333 33333333 converted bit-plane, double pixels (1 pass)
66666666 66666666 converted bit-plane, double pixels (1 pass)
To get double pixels for bit-planes 1, 2, 4 and 5, we mask out the unwanted bits and shift one of the channels by a pixel, using these masks:
10101010 10101010
01010101 01010101
which gives:
11111111 11111111 converted bit-plane (1 pass)
22222222 22222222 converted bit-plane (1 pass)
44444444 44444444 converted bit-plane (1 pass)
55555555 55555555 converted bit-plane (1 pass)
This breaks down to 10 passes.
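This is a bit-level model of the ten passes in plain Python, checked against a direct planar conversion. Each list comprehension stands for one blitter pass: A and B are the two words of a pair, and the mask plays the role of the constant C channel. It models the data flow only, not the registers.

The sketch makes two assumptions of its own:
- Byte order: within each group of 8 chunky pixels the bytes are stored as 0,4,1,5,2,6,3,7, so each word holds two chunky pixels that are 8 screen pixels apart. The 2-bit and 4-bit merges never move bits between the high and low byte of a word, so the high byte of every finished plane word can only contain pixels taken from the high bytes of its source words.
- Shift direction: on the real blitter, the left-shifting pass of each pair would typically run in descending mode, because ascending blits can only shift right.

```python
import random

def encode(v):
    """Pack a 6-bit colour into the '12334566' byte: plane 1 in bit 7, planes 3 and 6 doubled."""
    p = [(v >> k) & 1 for k in range(6)]          # p[0] = plane 1 ... p[5] = plane 6
    byte = 0
    for bit in (p[0], p[1], p[2], p[2], p[3], p[4], p[5], p[5]):
        byte = (byte << 1) | bit
    return byte

def chunky_words(pixels):
    """Assumed order: each word holds chunky pixels i and i+4 of an 8-pixel group."""
    words = []
    for g in range(0, len(pixels), 8):
        grp = pixels[g:g + 8]
        words += [(encode(grp[i]) << 8) | encode(grp[i + 4]) for i in range(4)]
    return words

def merge_pairs(src, mask, shift):
    """Two blitter-style passes over consecutive word pairs (a, b), mask = constant C."""
    inv = ~mask & 0xFFFF
    pairs = list(zip(src[0::2], src[1::2]))
    kept = [(a & mask) | ((b >> shift) & inv) for a, b in pairs]     # B shifted right
    moved = [((a << shift) & mask) | (b & inv) for a, b in pairs]    # A shifted left
    return kept, moved

def double_pixels(src):
    """Last two passes: spread interleaved bit pairs into two planes of doubled pixels."""
    first = [(w & 0xAAAA) | ((w >> 1) & 0x5555) for w in src]
    second = [((w << 1) & 0xAAAA) | (w & 0x5555) for w in src]
    return first, second

def c2p(pixels):
    words = chunky_words(pixels)
    p1245, p36 = merge_pairs(words, 0xCCCC, 2)     # 12124545 / 33336666
    p12, p45 = merge_pairs(p1245, 0xF0F0, 4)       # 12121212 / 45454545
    plane3, plane6 = merge_pairs(p36, 0xF0F0, 4)   # 33333333 / 66666666
    plane1, plane2 = double_pixels(p12)
    plane4, plane5 = double_pixels(p45)
    return [plane1, plane2, plane3, plane4, plane5, plane6]   # 10 passes in total

def reference_planes(pixels):
    """Plain planar conversion with every chunky pixel doubled horizontally."""
    planes = []
    for k in range(6):
        bits = [(v >> k) & 1 for v in pixels for _ in range(2)]
        planes.append([int("".join(map(str, bits[i:i + 16])), 2)
                       for i in range(0, len(bits), 16)])
    return planes

random.seed(1)
row = [random.randrange(64) for _ in range(160)]   # one 160-pixel chunky line
print(c2p(row) == reference_planes(row))
```

The three `merge_pairs` calls account for six passes and the two `double_pixels` calls for four, which matches the 10-pass count above.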
For the blitter timing, the calculations are in words and use PAL timings. Don't forget that one of the channels is a constant, so the DMA channel for C does not need to be enabled. This means the blitter takes 6 cycles per word:
(3520 words * 2 passes * 6 cycles / 7.09 MHz) / 1000 = roughly 5.96 ms
(1760 words * 8 passes * 6 cycles / 7.09 MHz) / 1000 = roughly 11.92 ms
That comes to about 17.87 ms. Even allowing for DMA contention, this can easily be completed over 3 frames at 17fps on PAL. As you can see, the numbers work out in theory. A 64-colour low-res screen using 2x2 pixels, with the code running in fast memory and the blitter work spread over 3 frames, makes a game like Wolfenstein possible on a memory-expanded Amiga 1000. What is mind-blowing is that this could have been done in 1985. Something released at that time would have been a game changer, and computer history would certainly have been different!
I have not really programmed the Amiga since the mid 90s, so it was really good fun working all this out over the last couple of months. Next on my list is to get the theory working in a little demo. I'm not sure how long that will take, as I work full time, but as I said, it's been fun doing this as a little hobby project.
Note: I have done these tests using an emulator (WinUAE) with the accuracy set to high. From what I've read, the 68000 emulation is pretty much spot on, and so are the custom chip timings. If not, all this theory needs to be tested on a real machine, which I do not have at present. But if it's more or less accurate, then all this is possible. |
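The timing estimate above can be reproduced in a few lines. The sketch also compares it with the 3-frame window. Dividing a cycle count by a clock in MHz gives microseconds, hence the final division by 1000. As in the post, the figure ignores DMA contention.

```python
CPU_CLOCK_MHZ = 7.09          # PAL 68000 clock, as used in the post
CYCLES_PER_WORD = 6           # A, B and D channels enabled; C is held as a constant
PAL_FRAME_MS = 1000 / 50
FRAMES_PER_UPDATE = 3

def blit_ms(words, passes, cycles_per_word=CYCLES_PER_WORD):
    # cycles / MHz = microseconds; / 1000 = milliseconds
    return words * passes * cycles_per_word / CPU_CLOCK_MHZ / 1000

first_stage = blit_ms(3520, 2)     # the two 2-bit merge passes
later_stages = blit_ms(1760, 8)    # the remaining eight passes, 1760 words each
total = first_stage + later_stages
window = FRAMES_PER_UPDATE * PAL_FRAME_MS

print(f"C2P: {first_stage:.3f} + {later_stages:.3f} = {total:.3f} ms")
print(f"window: {window:.0f} ms, {1000 / window:.1f} updates per second")
```

The raw C2P cost comes to a little under 0.9 of one PAL frame, against a 60 ms window at about 16.7 updates per second. That leaves the headroom the post relies on.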
28 October 2016, 21:39 | #2 |
Registered User
Join Date: Feb 2011
Location: Italy/Rome
Posts: 2,344
|
All this is fascinating, but what about the timing for ray casting, collisions and game logic? How much time do we have, even with fast mem?
|
28 October 2016, 22:17 | #3 |
Registered User
Join Date: Oct 2016
Location: Australia
Posts: 58
|
In theory we should have about 1 frame for all that. The CPU will be doing reads and writes to and from fast memory, so it will run at full speed with no DMA contention.
The ray-casting side can mostly run from very large look-up tables... 8 megs of fast memory will be very handy indeed. Collision is basically a 2D grid problem, because that's what Wolfenstein is; I guess it's pretty similar to games like The Chaos Engine or Alien Breed. I'm only guessing here, and the feasibility of these points will need investigating. But I think 1 frame or less will be OK for what breaks down to a 2D game problem. The real question is how fast the sprites can be. I have some really neat ideas on how this should be done, and that will be the next viability test I do. |
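Because the world is a 2D grid, collision can be as simple as looking up the tiles under the player's edges. Here is a minimal sketch of that idea. The level, `TILE` and `RADIUS` are made-up values for illustration, and this is not necessarily how Wolfenstein itself does it. It assumes each step is smaller than a tile and the radius is under half a tile.

```python
# Hypothetical level: '#' is a wall tile, '.' is open floor.
LEVEL = [
    "########",
    "#......#",
    "#..##..#",
    "#......#",
    "########",
]
TILE = 64      # world units per tile (assumption)
RADIUS = 16    # player half-width in world units (assumption)

def is_wall(x, y):
    return LEVEL[y // TILE][x // TILE] == "#"

def move(x, y, dx, dy):
    """Move one axis at a time, so the player slides along a wall instead of stopping."""
    if dx:
        edge = x + dx + (RADIUS if dx > 0 else -RADIUS)
        if not (is_wall(edge, y - RADIUS) or is_wall(edge, y + RADIUS)):
            x += dx
    if dy:
        edge = y + dy + (RADIUS if dy > 0 else -RADIUS)
        if not (is_wall(x - RADIUS, edge) or is_wall(x + RADIUS, edge)):
            y += dy
    return x, y
```

Each check is just a couple of table lookups, which is why a 2D collision pass should fit comfortably inside the frame estimated above. With power-of-two tiles the divisions become shifts on a 68000.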
30 October 2016, 14:21 | #4 |
Registered User
Join Date: Feb 2011
Location: Italy/Rome
Posts: 2,344
|
It would be interesting to do some tests
|
30 October 2016, 14:41 | #5 |
Registered User
Join Date: Jun 2008
Location: Boston USA
Posts: 466
|
|
30 October 2016, 15:38 | #6 |
Registered User
Join Date: Sep 2007
Location: Stockholm
Posts: 4,357
|
If you take the price and availability of fast mem in 1985 into account, this discussion is purely academic.
|
30 October 2016, 18:33 | #7 | |
Registered User
Join Date: Oct 2016
Location: Australia
Posts: 58
|
Quote:
Every system has a killer app that sells hardware... look what Doom did for PC graphics card sales. It just makes me wonder what would have happened if someone had released something like Wolfenstein before the A500 launched. We are talking 1985 here, so seeing textures moving around in 3D would have been mind-blowing. I think if a game like that had been released in 1985, people would have bought fast memory no matter what it cost, just to play a game that was miles ahead graphically of anything else. It's fun to think what might have been. But as I said, I agree that fast memory was not really a standard on the Amiga. The whole purpose of this thread is to prove it can be done. I know the Atari ST got a port of Wolfenstein, but with 16 colours and a frame rate that is not that good, and I know the Amiga can do so much better. When I say Wolfenstein, I don't really want to port it; I think it would be fun to do something original.
On the testing front, I've written a blitter chain system using the blitter interrupt and have the C2P working. It takes my byte format for pixels and converts it to 6 bit-planes for EHB mode, and of course 64 colours. If I put the code into chip memory the whole system runs very slowly and, TBH, would not be worth doing... fast memory is the only way this can work and give a decent frame rate. On a side note, I've just bought an A600 off eBay for testing on real hardware. |
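The control flow of a blitter chain driven by the blitter-finished interrupt can be modelled as a queue that the interrupt handler pops. This is a toy model: `BlitChain` and its method names are hypothetical, and nothing here touches hardware. In real code each entry would be the register values for one blit.

```python
from collections import deque

class BlitChain:
    """Toy model of a blitter chain driven by the blitter-finished interrupt."""

    def __init__(self, blits):
        self.pending = deque(blits)
        self.current = None
        self.log = []

    def _fire_next(self):
        # Load and start the next blit, or go idle when the chain is empty.
        self.current = self.pending.popleft() if self.pending else None
        if self.current is not None:
            self.log.append(f"start {self.current}")

    def start(self):
        """Called once per update, e.g. at the 3rd-frame, line-176 sync point."""
        self._fire_next()

    def on_blit_finished(self):
        """Interrupt handler: the previous blit is done, so kick off the next one."""
        self.log.append(f"done {self.current}")
        self._fire_next()

    @property
    def idle(self):
        return self.current is None


chain = BlitChain(["pass 1", "pass 2", "pass 3"])
chain.start()
while not chain.idle:          # stands in for the hardware raising the interrupt
    chain.on_blit_finished()
print(chain.log)
```

The CPU only spends time on the blitter inside the handler. Between interrupts it is free to run game code from fast memory, which is the parallelism the thread is after.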
|
31 October 2016, 10:16 | #8 | |
Registered User
Join Date: Dec 2014
Location: germany
Posts: 439
|
Quote:
*Source: http://aminet.net/package/util/misc/DumpA1000BootROM Last edited by chb; 31 October 2016 at 10:27. |
|
31 October 2016, 10:29 | #9 |
Registered User
Join Date: Oct 2016
Location: Australia
Posts: 58
|
I'm actually starting to change my mind about where the chunky buffer should be stored. I ran some experiments reading from chip memory with the code in fast memory, and things got slower. I want to do reads on the chunky buffer for transparency effects.
The C2P conversion takes around 1.5 frames, which means I can use the blitter for other graphics updates. Because I've now decided to store the chunky buffer in fast memory, the movem copy to chip takes about 0.5 frames, which leaves about 1 frame of blitter time for other things. Even though I lose 0.5 frames of blitter time, it works out about the same, because the blitter finishes its tasks faster when it is not yielding cycles to the CPU. In a simple test of writing to the chunky buffer in fast memory, I can update a whole screen's worth of pixels in just over 1.5 frames, and the movem copy brings that to just over 2 frames. That leaves nearly 1 frame for the CPU to process the game logic and ray casting. It looks pretty promising so far. |
31 October 2016, 11:24 | #10 |
Registered User
Join Date: Feb 2011
Location: Italy/Rome
Posts: 2,344
|
16 colors for 1x1 would suffice. Why use 2x2?
|
31 October 2016, 11:51 | #11 |
Registered User
Join Date: Oct 2016
Location: Australia
Posts: 58
|
Because we are using fast memory, DMA has no effect on the speed of the CPU. It's more about what the 68000 running from fast memory can write into fast memory. A byte is still used per pixel in the chunky buffer, so cutting the colour depth will not actually speed up the writes, because you still process the same amount of data. And with 1x1 pixels you have 4x the amount of data to shovel around. For example, the copy to chip takes 0.5 frames, so 2x1 pixels would take a frame and 1x1 pixels 2 frames... it would not be worth doing.
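The copy-time argument is a linear scaling of one measured figure. This sketch makes it explicit. It assumes the copy time grows in proportion to the chunky data, and it ignores the extra rendering and C2P work that a larger buffer also brings.

```python
MEASURED_COPY_FRAMES = 0.5   # movem copy of the 2x2 chunky buffer, as measured in the thread
FRAMES_PER_UPDATE = 3

def copy_frames(pixel_w, pixel_h):
    """Scale the measured 2x2 copy time by how much more chunky data a pixel size needs."""
    data_ratio = (2 // pixel_w) * (2 // pixel_h)
    return MEASURED_COPY_FRAMES * data_ratio

for pw, ph in [(2, 2), (2, 1), (1, 1)]:
    c = copy_frames(pw, ph)
    print(f"{pw}x{ph}: copy {c} frames, {FRAMES_PER_UPDATE - c} of {FRAMES_PER_UPDATE} frames left")
```

At 1x1 the copy alone would eat two of the three frames before any rendering, game logic or conversion is done.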
Having said all that, it would be nice to detect the CPU speed and adapt for better quality. But the whole purpose of this exercise is to get something that works on an Amiga 1000 with fast memory at a nice frame rate. Edit: The blitter time is also worth noting. Realistically, the more data we need for the chunky buffer, the more the blitter has to convert. The hard limit is that over 3 frames the blitter can convert 2x1 pixels for 6 bit-planes; with 1x1 pixels the blitter will not have enough time for the conversion. The tests I've been doing are just to see what bandwidth is available for the blitter as well as for the 68000 running in fast memory. Fast memory helps because it allows parallel processing: the blitter and the 68000 can run in tandem with no DMA contention. Last edited by AndNN; 31 October 2016 at 12:24. |
31 October 2016, 14:01 | #12 |
Registered User
Join Date: Feb 2011
Location: Italy/Rome
Posts: 2,344
|
With only 16 colors you'll fit 8 pixels into 32 bits instead of 4. Not bad: you can almost double the pixels written into chip and fast memory.
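The density argument is easiest to see by packing one 32-bit long both ways. In this sketch, a `move.l` of the packed value carries 8 pixels at 16 colours but only 4 at one byte per pixel. The function names are illustrative. The catch is that neighbouring columns now share a byte.

```python
def pack_16_colour(pixels):
    """Pack eight 4-bit chunky pixels into one 32-bit long, leftmost pixel in the top nibble."""
    if len(pixels) != 8 or any(not 0 <= p < 16 for p in pixels):
        raise ValueError("need eight values in 0..15")
    value = 0
    for p in pixels:
        value = (value << 4) | p
    return value

def pack_64_colour(pixels):
    """The byte-per-pixel layout: only four 6-bit chunky pixels fit in a long."""
    if len(pixels) != 4 or any(not 0 <= p < 64 for p in pixels):
        raise ValueError("need four values in 0..63")
    value = 0
    for p in pixels:
        value = (value << 8) | p
    return value
```

This only helps if whole longs can be written at once. A ray caster draws one vertical strip at a time, so neighbouring columns sharing a byte must be merged.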
|
31 October 2016, 14:14 | #13 |
Registered User
Join Date: Nov 2010
Location: South Wales
Age: 47
Posts: 944
|
|
31 October 2016, 15:59 | #14 |
Phone Homer
Join Date: Jun 2006
Location: 5150
Posts: 5,816
|
Ask the FBI
I couldn't find the video. Last edited by Retro1234; 31 October 2016 at 16:06. |
31 October 2016, 16:02 | #15 |
Registered User
Join Date: Feb 2011
Location: Italy/Rome
Posts: 2,344
|
We could write some code and do some real tests. As a target I would say an A500 + 1-2 MB of fast mem. What do you think?
|
31 October 2016, 22:49 | #16 |
Registered User
Join Date: Dec 2010
Location: Norway
Posts: 833
|
Don't forget to tick 'cycle exact' emulation in WinUAE to get realistic speed
|
01 November 2016, 03:05 | #17 | |
Registered User
Join Date: Oct 2016
Location: Australia
Posts: 58
|
Quote:
Because you are now using nibbles, you will have to merge them together. In my first post I mentioned the compiled sequences to draw vertical lines... as well as doing that for, say, even lines, for odd lines you will have to merge with ORs:
    move.b (a0)+,d0
    or.b d0,160*1(a1)
    or.b d0,160*2(a1)
That means you still need to write double the amount. Admittedly, the chunky buffer copy will be the same speed for 2x1 pixels as for 2x2 pixels. Also, your texture data will need different versions for odd and even lines. There is a way to cut the writes in half, which I mentioned in my first post: process two vertical lines together, do the OR on the CPU, and write once for 2 pixels. But to make this as fast as possible, each possible vertical line is a raw sequence of instructions, and that gets complicated for 2 lines at once. For every possible line, we would need a combined version for each odd and even line pairing. That is a crap load of memory that the Amiga does not have. Still, there's no harm in doing tests for this to see which options are available, and if anybody wants to have a go at testing, you are welcome. I myself will be going down the 2x2 pixel route using 64 colours and 32 colours, because my own testing proved that the bandwidth is comfortable when spread over 3 frames. And if I want to go full screen PAL, I will have options to do that. Who says I have not got a time machine... once this is proven I'm going back in time to change history |
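Here is a small model of the odd/even merge, with the even column of a byte stored with a plain write and the odd column ORed in. `ROW_BYTES = 80` is an assumption (160 chunky pixels per line, two 16-colour pixels per byte), and `draw_strip` is a hypothetical helper.

```python
ROW_BYTES = 80   # assumption: 160 chunky pixels per line, two 16-colour pixels per byte

def draw_strip(buf, x, texels, top):
    """Draw one vertical strip of 4-bit texels at chunky column x (sketch).

    Even columns live in the high nibble and are plain stores (move.b in compiled
    code); odd columns live in the low nibble and are OR-ed in (or.b), so the even
    column of each pair must be drawn first. In practice the even-column texture
    copy would be stored pre-shifted rather than shifted here.
    """
    offset = top * ROW_BYTES + x // 2
    for t in texels:
        if x % 2 == 0:
            buf[offset] = t << 4      # even column: overwrite the whole byte
        else:
            buf[offset] |= t          # odd column: merge into the byte
        offset += ROW_BYTES

buf = bytearray(ROW_BYTES * 3)
draw_strip(buf, 0, [0x1, 0x2, 0x3], top=0)
draw_strip(buf, 1, [0xA, 0xB, 0xC], top=0)
```

Every byte is touched twice, once per column. On the 68000, the `or.b` to memory is also a read-modify-write, which is the doubled bus traffic the post is worried about.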
|
01 November 2016, 06:57 | #18 |
Registered User
Join Date: Feb 2011
Location: Italy/Rome
Posts: 2,344
|
When you have to convert chunky to planar you'll see the difference in speed. And what about having only 4 planes enabled? You'll free a lot of DMA cycles, which will speed up the copy from fast to chip mem.
About the merge operation: I think there is a way to arrange the buffer to speed up these operations too |
01 November 2016, 10:17 | #19 |
Total Chaos forever!
Join Date: Aug 2007
Location: Waterville, MN, USA
Age: 49
Posts: 2,200
|
The Copper can stretch pixels vertically by alternating modulo register values so the chunky to planar conversion doesn't have to stretch in that direction.
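The alternating-modulo trick can be sketched as a generator of copper list words. The layout assumptions are:
- non-interleaved bit-planes of 40 bytes per low-res line;
- a display starting at line $2C (`FIRST_LINE`), so 176 lines never cross line 255;
- each MOVE landing before the end of that line's bit-plane fetch, which is when the modulo is added.

On the first line of each pair, the modulo is -40, so the next line fetches the same data again. On the second line it is 0, so the pointer moves on.

```python
BPL1MOD = 0x108          # odd bit-plane modulo (custom register offset)
BPL2MOD = 0x10A          # even bit-plane modulo
BYTES_PER_LINE = 40      # one 320-pixel low-res bit-plane line
FIRST_LINE = 0x2C        # assumed top of the display window
CHUNKY_LINES = 88

def line_doubling_copper():
    """Copper words (WAIT/MOVE pairs) that show every bit-plane line twice (sketch)."""
    words = []
    for i in range(CHUNKY_LINES * 2):
        mod = (-BYTES_PER_LINE) & 0xFFFF if i % 2 == 0 else 0
        words += [((FIRST_LINE + i) << 8) | 0x07, 0xFFFE]   # WAIT for the start of line
        words += [BPL1MOD, mod, BPL2MOD, mod]               # MOVE both modulos
    words += [0xFFFF, 0xFFFE]                                # end of copper list
    return words
```

Because the display hardware repeats each line, the C2P only has to produce 88 lines of planar data.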
|
01 November 2016, 10:33 | #20 | |
Registered User
Join Date: Oct 2016
Location: Australia
Posts: 58
|
Quote:
So all that will probably work really well, but it depends on a clever way to make the merge not double the writes. If there is a clever way to do that, then you can get 64 colours with 2x1 pixels. This is possible because the instruction timings are the same for bytes and words. So instead of updating the chunky buffer with bytes you use words: you do 2 reads, let the CPU merge, then write out a word. The method I mentioned needs so much memory that there is not enough when you draw vertically with the pre-compiled drawing instructions. If 16 colours at 2x1 pixels is possible, then that is just an extra mode alongside 64 colours at 2x2 pixels. The Atari ST version of Wolfenstein uses 16 colours with 2x2 pixels and the frame rate is not that good (don't get me wrong, the guy who did that jumped through hoops to get it working), but the Amiga can do so much better. I know there are demos that have done 2x1 pixels at full screen, but in a game there would not be enough time left for the CPU to run the game and ray-casting logic. That is why the frame rate here runs at 17fps on PAL and 20fps on NTSC... all the testing is just finding out what can actually be done without committing. |
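The pre-compiled drawing routines from the first post can be generated rather than written by hand. This sketch emits 68000 source text for one strip height. The names `compile_strip`, `TEX_H` and `ROW` are hypothetical. It assumes:
- a0 points at a texture column stored as 64 consecutive bytes;
- a1 points at the top of the strip in a chunky buffer with 160-byte lines.

It uses displacement reads instead of `(a0)+`, so texels skipped when scaling down cost nothing.

```python
TEX_H = 64    # texels per texture column (64x64 textures, from the thread)
ROW = 160     # bytes per chunky buffer line (the 160*n(a1) offsets in the first post)

def compile_strip(h):
    """Emit 68000 source text for one scaled vertical strip of height h (sketch).

    Clipping to the 88-line view is left out, so very tall strips are not handled.
    """
    code, y = [], 0
    while y < h:
        texel = y * TEX_H // h                    # texel that lands on output row y
        end = y + 1
        while end < h and end * TEX_H // h == texel:
            end += 1                              # following rows reuse it (scaling up)
        if end - y == 1:
            code.append(f"move.b {texel}(a0),{ROW * y}(a1)")          # one read, one write
        else:
            code.append(f"move.b {texel}(a0),d0")                      # one read ...
            code += [f"move.b d0,{ROW * r}(a1)" for r in range(y, end)]  # ... several writes
        y = end
    code.append("rts")
    return code

print("\n".join(compile_strip(3)))
```

Each routine keeps the property claimed in the first post: at most 64 reads per strip, and exactly one write per output row. One such routine is needed per possible strip height, which is where the memory cost comes from.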
|
|
|