English Amiga Board - View Single Post - The Amiga 1000 could of done a game like Wolfenstein in 1985

AndNN · 28 October 2016, 18:23

Sorry about the title - I could not resist

For the past month or so I've been doing viability tests to see if a game like Wolfenstein is possible on an Amiga with only a 68000 CPU.

I like the idea that a game like that could have been released in 1985, the year the Amiga 1000 was launched - if it had been, I like to think it would have been a game changer and the landscape of the PC world today would be different altogether. So this discussion will basically use the Amiga 1000 and what it had to offer, with upgrades to memory only... This means slow memory did not exist at the time, as that was an Amiga 500 only thing. With that in mind the A1000 could have the chip memory expanded from 256k to 512k and up to 8 megs worth of fast. Because slow memory does not exist yet, fast memory use does not have any stigma attached to it and can be exploited to the full.

The first batches of the Amiga 1000 did not have EHB mode, but it was added later and users that did not have it could upgrade. With that in mind, for the discussion we will use 2 graphics modes at low res - 5 bit-planes and 6 bit-planes (EHB).

Now that the rules are set, what have my tests proved is possible with the most taxing mode, which is EHB? We also need 3 frames to process a full screen to get good bandwidth and parallel processing with the blitter.

Here is a list of things that it cannot do.

1. Code can never run from chip memory
2. 1x1 pixels are out of the question - not enough bandwidth.
3. 2x1 pixels. There's bandwidth there but drawing scaled vertical lines and merging for a word write destroys the bandwidth. If every vertical line combination of a 2x1 pixel could be pre-calculated then it would be possible. Unfortunately there's nowhere near enough memory to do that.

So realistically we can only use 2x2 pixels. I took a look at Gloom Deluxe and changed the pixels to 2x2 and it looks acceptable - and let's face it, to see a solid world at that resolution moving around in 3D with texture maps would have been mind blowing in 1985.

OK, the simplest and quickest way to get textures on the screen is to use ray-casting. This means we need scaled vertical lines. All this can be pre-calculated for the cost of memory by using sequences of code like:

Scaling up

move.b (a0)+,d0
move.b d0,160*1(a1)
move.b d0,160*2(a1)

A 1x1 mapping ratio

move.b (a0)+,160*1(a1)

Scaling down

move.b 64*1(a0),160*1(a1)

By doing this, with a texture size of 64x64 the reads can only be 64 at most for a line, and the writes can only be the length of the line. This way we make sure that bus usage is at the bare minimum.
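To put numbers on the read and write counts above, here is a minimal sketch in Python (not Amiga code). It assumes a simple nearest-texel mapping, `row * 64 // height`, which the post does not spell out, and the function names are made up for illustration.

```python
def column_plan(height, tex_size=64):
    """Return (texel, screen_row) pairs for one scaled vertical line.

    Assumes nearest-texel sampling; the real pre-calculated code
    sequences may pick texels differently.
    """
    return [((row * tex_size) // height, row) for row in range(height)]


def bus_cost(height, tex_size=64):
    """Count texture reads and screen writes for one line.

    A texel is read once and then reused from a register for every
    row it covers, so reads = number of distinct texels touched.
    """
    plan = column_plan(height, tex_size)
    reads = len({texel for texel, _ in plan})
    writes = len(plan)
    return reads, writes


for h in (32, 64, 88):
    print(h, bus_cost(h))
```

Under that assumption a line shorter than the texture reads one texel per row, and a line taller than the texture never reads more than 64 texels, however many rows it writes.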

So we want this game to run on NTSC and PAL, and Wolfenstein used a 320 x 200 screen. But the actual screen space without the panel at the bottom was 320 x 176. For our screen of 2x2 pixels we need 160x88 which comes to 14080 bytes. With copper line doubling we can get the y to go to 176. The x gets doubled by encoding the bytes in a special way for the C2P conversion routine... more on this later, but the blitter will be used for the C2P conversion.
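The buffer sizes can be checked with a few lines of Python. The chunky figures come straight from the post. The planar figures are my own derivation, assuming each bit-plane is stored as 320 pixels by 88 lines and the copper repeats every line.

```python
CHUNKY_W, CHUNKY_H = 160, 88           # 2x2 pixels on a 320x176 view

chunky_bytes = CHUNKY_W * CHUNKY_H     # one byte per chunky pixel
double_buffered = 2 * chunky_bytes     # two chunky screens

display_w = CHUNKY_W * 2               # x doubled by the C2P encoding
display_h = CHUNKY_H * 2               # y doubled by the copper

# Assumption: a bit-plane only stores 88 lines, 1 bit per display pixel.
plane_bytes = (display_w // 8) * CHUNKY_H
plane_words = plane_bytes // 2

print(chunky_bytes, double_buffered, display_w, display_h)
print(plane_bytes, plane_words)
```

If that assumption holds, one bit-plane is 1760 words and half of the chunky screen is 3520 words, which are the two word counts used in the blitter timing calculation.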

Before I discuss how much bandwidth it takes to fill a whole screen with pixels we need to set out what DMA usage we actually need. What we need is:

1. Memory refresh
2. Bit-planes
3. Sound
4. Copper
5. Blitter

What we don't need is:

1. Sprites when we render
2. Disk access when we render

By removing sprites and disk DMA we gain 19 DMA slots back per line which equates to nearly 1 bit-plane which needs 20 DMA slots. This means we are effectively running only 5 bit-plane accesses when we are using 6 bit-planes for EHB mode. This means the blitter can do more per frame in the long run.
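The slot arithmetic can be written out as below. The split of the 19 slots into 16 sprite slots (8 sprites at 2 slots each) and 3 disk slots is the usual OCS per-line allocation as I remember it. The post itself only gives the totals.

```python
SPRITE_SLOTS = 8 * 2        # 8 sprites, 2 DMA slots each per line
DISK_SLOTS = 3              # disk DMA slots per line
LOWRES_WIDTH = 320

freed_slots = SPRITE_SLOTS + DISK_SLOTS      # slots we get back
slots_per_plane = LOWRES_WIDTH // 16         # one word fetch per 16 pixels

planes = 6                                   # EHB
effective_planes = planes - freed_slots / slots_per_plane

print(freed_slots, slots_per_plane, effective_planes)
```

So with 6 bit-planes on screen, the bus load left over is roughly that of 5 bit-planes with sprites and disk still enabled.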

There are two ways we can get the chunky screen to chip memory and each has its pros and cons.

The first way is to store the chunky screen in fast memory and do all your reads and writes there. Then just transfer that to chip mem by using a movem copy. The transfer takes around half a frame. This leaves 2.5 frames to render with. The half a frame transfer means that the blitter cannot be used while this is happening.

The second way is to store the chunky screen in chip memory. This means that all bus writes go to chip memory for the render but of course chip memory is slower than fast. But this means you remove the movem copy and the blitter can run for 3 frames.

So which way is quicker for blitter usage and frame time?

Both methods can write a whole screen's worth of pixels in 2 frames. Using the fast memory technique is faster of course, but you pay it back with the copy and lose a bit of blitter time. Writing to chip you lose out on the writes on the bus, but you gain it back because you don't need to copy. Surprisingly both methods are comparable, but writing to chip gives you extra blitter time.

So having the chunky screen in chip is the way to go. This chunky screen needs to be double buffered. What we have to remember here is that the code is still running from fast memory and all reads come from fast as well. The conclusion is that the CPU is not crippled that much by DMA contention.

So how do we do the C2P conversion? This job will be done with the blitter. The blitter can process the next frame's worth of data while the previous frame is displayed. This means there will be a 1 frame lag (3 frames really, because we will be running at roughly 17fps on PAL and 20fps on NTSC) but that really is not that much of a problem. If we read the inputs every v-sync and store them we can process the game update to compensate for the lag.

This will all be synchronised by waiting for the 3rd frame and line 176. When that condition happens we change the copper pointer so that the bit-plane pointers will show the new screen, and then fire off the first blitter process chain. Then once the first blit finishes it will trigger a blitter finish interrupt and the next blitter process can fire off. This blitter chain will process over 3 frames and must end before line 176 on the 3rd.

The blitter tests I did over 3 frames show that the blitter has enough time to convert to planar, with time left over for other uses. This was the first thing I tested and I was shocked that an OCS chipset with only a 68000 could do this. Using the blitter and running the code from fast memory makes all this possible.

Now we need a nice way to encode the byte for the C2P conversion:

12334566

The numbers represent bit-planes.

The chunky screen pixel x order will be

0,8,1,9,2,10,3,11,4,12,5,13,6,14,7,15

Each number represents a byte. This way a word contains 2 pixels that are 8 pixels apart. For a ray-casting engine that only draws vertical lines this is not an issue.
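The pixel order can be turned into an address calculation like this. It is only a sketch of the mapping as I read it, with function names of my own, and it assumes the 16-byte pattern simply repeats across the 160-byte row.

```python
ORDER = [0, 8, 1, 9, 2, 10, 3, 11, 4, 12, 5, 13, 6, 14, 7, 15]


def byte_offset(x):
    """Byte position inside a chunky row for chunky pixel column x."""
    group, within = divmod(x, 16)
    if within < 8:
        slot = 2 * within            # pixels 0..7 go to the even bytes
    else:
        slot = 2 * (within - 8) + 1  # pixels 8..15 go to the odd bytes
    return group * 16 + slot


def column_address(x, y, width=160):
    """Offset of pixel (x, y) from the start of the chunky screen."""
    return y * width + byte_offset(x)


# Reading the row byte by byte gives back the order from the post.
print([x for _, x in sorted((byte_offset(x), x) for x in range(16))])
```

Since a vertical line keeps the same x, the renderer works out `byte_offset(x)` once per column and then steps down by 160 bytes per row, which fits the 160*n(a1) offsets in the scaling code.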

With this order we get a word that looks like this:
12334566 12334566

Doing the usual merge passes with the blitter we get these merge patterns:

12124545 12124545 1 pass
33336666 33336666 1 pass

12121212 12121212 1 pass
45454545 45454545 1 pass
33333333 33333333 converted bit-plane (double pixels) 1 pass
66666666 66666666 converted bit-plane (double pixels) 1 pass

To get the double pixels for bit-planes 1,2,4 and 5 we just mask out the unwanted bits and shift one of the channels by a pixel like this:

10101010 10101010
01010101 01010101

11111111 11111111 converted bit-plane 1 pass
22222222 22222222 converted bit-plane 1 pass
44444444 44444444 converted bit-plane 1 pass
55555555 55555555 converted bit-plane 1 pass
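The merge patterns can be checked with a small symbolic model in Python. Each character is the bit-plane number of one bit in a word, MSB first. The masks ($CCCC, $F0F0, $AAAA) and shift amounts are my guess at the "usual" merge set-up. The model only tracks which plane each bit belongs to, so it feeds the same pattern in as both sources, and it ignores the bits a real blitter shift would carry in from the neighbouring word.

```python
WORD = "12334566" * 2    # one chunky word: two encoded bytes


def shr(word, n):
    """Shift right by n bits; '.' marks bits that come from elsewhere."""
    return "." * n + word[:-n]


def shl(word, n):
    """Shift left by n bits (descending-mode blit on real hardware)."""
    return word[n:] + "." * n


def select(mask, a, b):
    """Take a where the mask bit is 1, b where it is 0."""
    bits = format(mask, "016b")
    return "".join(x if m == "1" else y for m, x, y in zip(bits, a, b))


# First merge: 2 passes
w_1245 = select(0xCCCC, WORD, shr(WORD, 2))
w_36 = select(0xCCCC, shl(WORD, 2), WORD)

# Second merge: 4 passes, planes 3 and 6 are finished here
w_12 = select(0xF0F0, w_1245, shr(w_1245, 4))
w_45 = select(0xF0F0, shl(w_1245, 4), w_1245)
p3 = select(0xF0F0, w_36, shr(w_36, 4))
p6 = select(0xF0F0, shl(w_36, 4), w_36)

# Mask and shift by one pixel: 4 passes, doubles planes 1, 2, 4 and 5
p1 = select(0xAAAA, w_12, shr(w_12, 1))
p2 = select(0xAAAA, shl(w_12, 1), w_12)
p4 = select(0xAAAA, w_45, shr(w_45, 1))
p5 = select(0xAAAA, shl(w_45, 1), w_45)

for name, word in [("1245", w_1245), ("36", w_36), ("12", w_12),
                   ("45", w_45), ("p1", p1), ("p2", p2), ("p3", p3),
                   ("p4", p4), ("p5", p5), ("p6", p6)]:
    print(name, word[:8], word[8:])
```

That is 2 + 4 + 4 = 10 selects, one per blitter pass, and every intermediate word matches the patterns listed above.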

This breaks down to 10 passes. These calculations are in words, using PAL timings for blitter times. Don't forget we use one of the channels as a constant so the DMA channel for C does not need to be enabled - this means the blitter uses 6 cycles per word:

(3520 * 2 passes * 6 cycles / 7.09) / 1000 = roughly 5.957 ms
(1760 * 8 passes * 6 cycles / 7.09) / 1000 = roughly 11.915 ms

Which comes to 17.872 ms. Even with DMA contention to be considered this can easily be done over 3 frames at 17fps on PAL.
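The same sums in Python, with the 3 frame budget added for comparison. The 7.09 MHz clock and 6 cycles per word are the figures from the post. The 50 Hz PAL frame rate is the only thing I have added.

```python
PAL_CLOCK_MHZ = 7.09      # clock figure used in the post
CYCLES_PER_WORD = 6       # A, B and D channels enabled, C held constant
PAL_FRAME_HZ = 50         # assumption: plain 50 Hz PAL


def pass_time_ms(words, passes):
    """Blitter time in ms, ignoring DMA contention."""
    microseconds = words * passes * CYCLES_PER_WORD / PAL_CLOCK_MHZ
    return microseconds / 1000


merge_ms = pass_time_ms(3520, 2)
plane_ms = pass_time_ms(1760, 8)
total_ms = merge_ms + plane_ms

budget_ms = 3 * 1000 / PAL_FRAME_HZ
print(f"blitter {total_ms:.2f} ms of {budget_ms:.0f} ms "
      f"({100 * total_ms / budget_ms:.0f}%)")
```

So before contention the conversion takes a bit under a third of the 3 frame budget. How much of the remaining margin DMA contention eats is what needs measuring.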

As you can see, in theory, the numbers work out: with a 64 colour low res screen using 2x2 pixels, the code running in fast memory and the blitter work spread over 3 frames, it is possible to do a game like Wolfenstein on a memory expanded Amiga 1000. And what is mind blowing is that this could have been done in 1985.

Something released at that time would have been a game changer and certainly computer history would have been different!

I have not really programmed the Amiga since the mid 90s. So it was really good fun working all this out in the last couple of months. Next on my list is to get the theory working in a little demo - not sure how long it will take as I work full time. But as I said it's been fun doing this as a little hobby project.

Note:

I have done these tests using an emulator (WinUAE) with the accuracy set to high. From what I've read, the 68000 emulation is pretty much spot on and the custom chip timings are spot on. If not, then all this theory needs to be tested on a real machine, which I do not have at the present time. But if it's more or less accurate then all this is possible.
