English Amiga Board


28 October 2016, 18:23   #1
AndNN
Join Date: Oct 2016
Location: Australia
Posts: 58
The Amiga 1000 could have done a game like Wolfenstein in 1985 - shock!

Sorry about the title - I could not resist

For the past month or so I've been doing feasibility tests to see if a game like Wolfenstein is possible on an Amiga with only a 68000 CPU.

I like the idea that a game like that could have been released in 1985, the year the Amiga 1000 was launched. If it had been, I like to think it would have been a game changer and the landscape of the PC world today would be different altogether. So this discussion will basically use the Amiga 1000 and what it had to offer, with memory upgrades only. That means slow memory did not exist at the time, as that was an Amiga 500 thing. With that in mind, the A1000 could have its chip memory expanded from 256k to 512k, plus up to 8 megs of fast memory. Because slow memory does not exist yet, fast memory has no stigma attached to it and can be exploited to the full.

The first batches of the Amiga 1000 did not have EHB mode; it was added later, and users who did not have it could upgrade. With that in mind, this discussion will use two low-res graphics modes: 5 bit-planes (32 colours) and 6 bit-planes (EHB, 64 colours).

Now that the rules are set, here is what my tests have shown is possible with the most taxing mode, EHB. We need 3 frames to process a full screen, which gives us enough bandwidth and lets the blitter work in parallel with the CPU.

Here is a list of things it cannot do:

1. Code can never run from chip memory
2. 1x1 pixels are out of the question - not enough bandwidth.
3. 2x1 pixels. The bandwidth is there, but drawing scaled vertical lines and merging pairs of them for a word write destroys it. If every combination of two vertical lines for 2x1 pixels could be pre-calculated it would be possible, but there's not enough memory to pre-calculate that, not even close.

So realistically we can only use 2x2 pixels. I took a look at Gloom Deluxe with the pixels changed to 2x2 and it looks acceptable. And let's face it, seeing a solid world at that resolution moving around in 3D with texture maps would have been mind blowing in 1985.

OK, the simplest and quickest way to get textures on the screen is ray-casting. This means we need scaled vertical lines. All of this can be pre-calculated, at the cost of memory, by using sequences of code like:

Scaling up

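move.b (a0)+,d0           ; fetch the next texel
move.b d0,160*1(a1)       ; write it to one row of the chunky buffer (160 bytes per row)
move.b d0,160*2(a1)       ; and again to the next row

A 1x1 mapping ratio

move.b (a0)+,160*1(a1)    ; one texel, one row: a single memory-to-memory move

Scaling down

move.b 64*1(a0),160*1(a1) ; a fixed source offset picks the texel; skipped texels are never read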

This way, with 64x64 textures, a line never reads more than 64 texels, and the number of writes is exactly the length of the line. That keeps bus usage to the bare minimum.
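The compiled-line idea can be sketched as a small generator that emits the 68000 sequence for one column. This is a hypothetical helper, not the author's tool: it assumes each texture column is 64 consecutive bytes at a0, that a1 points at the top of the destination column in the 160-byte-wide chunky buffer, and that the source texel for row y is y * 64 / height.

```python
TEXTURE_SIZE = 64   # 64x64 textures, one byte per texel
ROW_BYTES = 160     # width of the 160x88 chunky buffer in bytes


def compile_column(height):
    """Return 68000 source lines that draw one texture column `height` rows tall.

    Hypothetical helper: assumes a0 points at a column of 64 consecutive texels
    and a1 at the top of the destination column in the chunky buffer.
    """
    lines = []
    if height >= TEXTURE_SIZE:
        # Scaling up or 1:1: every texel is read exactly once, in order, via (a0)+.
        rows_for_texel = {}
        for y in range(height):
            rows_for_texel.setdefault(y * TEXTURE_SIZE // height, []).append(y)
        for texel in range(TEXTURE_SIZE):
            rows = rows_for_texel[texel]
            if len(rows) == 1:
                # Texel lands on a single row: memory-to-memory move, no register.
                lines.append(f"move.b (a0)+,{ROW_BYTES}*{rows[0]}(a1)")
            else:
                # Texel covers several rows: load it into d0 once, store it repeatedly.
                lines.append("move.b (a0)+,d0")
                lines.extend(f"move.b d0,{ROW_BYTES}*{y}(a1)" for y in rows)
    else:
        # Scaling down: only the texels that land on a row are ever read.
        for y in range(height):
            texel = y * TEXTURE_SIZE // height
            lines.append(f"move.b {texel}(a0),{ROW_BYTES}*{y}(a1)")
    lines.append("rts")
    return lines


if __name__ == "__main__":
    for line in compile_column(3):
        print(line)
```

For a 3-row line this prints `move.b 0(a0),160*0(a1)`, `move.b 21(a0),160*1(a1)`, `move.b 42(a0),160*2(a1)` and `rts`. Whatever the height, a routine reads at most 64 texels and performs exactly one write per row, which is the bus-usage property described above; the price is one routine per possible line height in memory.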

We want this game to run on both NTSC and PAL. Wolfenstein used a 320x200 screen, but the actual view, without the status panel at the bottom, was 320x176. For our screen of 2x2 pixels we need 160x88, which at one byte per pixel comes to 14080 bytes. With copper line doubling we can get the height to 176. The width gets doubled by encoding the bytes in a special way for the C2P conversion routine; more on this later, but the blitter will be used for the C2P conversion.

Before I discuss how much bandwidth it takes to fill a whole screen with pixels we need to set out what DMA usage we actually need. What we need is:

1. Memory refresh
2. Bit-planes
3. Sound
4. Copper
5. Blitter

What we don't need is:

1. Sprites when we render
2. Disk access when we render

By turning off sprite and disk DMA we get 19 DMA slots per line back (16 for the 8 sprites, 3 for the disk), which is nearly the 20 slots one low-res bit-plane needs. So we are effectively paying for only 5 bit-planes' worth of DMA while using 6 bit-planes for EHB mode, which means the blitter can do more per frame in the long run.
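The slot counting above as a few lines of arithmetic. The per-line figures are the usual OCS allocations (2 words per line for each of the 8 sprites, 3 disk slots, one word fetch per 16 low-res pixels per plane); expressing the freed slots as "planes saved" is the same simplification the post makes.

```python
# DMA slots per raster line for a 320-pixel low-res display (standard OCS figures).
SPRITE_SLOTS = 8 * 2               # 8 sprites, 2 data words each per line
DISK_SLOTS = 3
PLANE_SLOTS = 320 // 16            # one word fetch per 16 pixels per bit-plane

freed = SPRITE_SLOTS + DISK_SLOTS  # slots handed back when sprite and disk DMA are off
planes = 6                         # EHB
net_planes = planes - freed / PLANE_SLOTS

print(f"{freed} slots freed, {PLANE_SLOTS} per bit-plane, net cost ~{net_planes:.2f} planes")
```

This prints `19 slots freed, 20 per bit-plane, net cost ~5.05 planes`.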

There are two ways we can get the chunky screen into chip memory, and each has its pros and cons.

The first way is to store the chunky screen in fast memory and do all your reads and writes there, then transfer it to chip mem using a movem copy. The transfer takes around half a frame, which leaves 2.5 frames to render with. The blitter cannot be used while this half-frame transfer is happening.

The second way is to store the chunky screen in chip memory. This means all of the render's bus writes go to chip memory, which is of course slower than fast. But it removes the movem copy, and the blitter can run for all 3 frames.

So which way is quicker for blitter usage and frame time?

Both methods can write a whole screen's worth of pixels in 2 frames. Using the fast memory technique is faster, of course, but you pay it back with the copy and lose a bit of blitter time. Writing to chip, you lose out on the writes over the bus, but you gain it back because you don't need to copy. Surprisingly, both methods are comparable, but writing to chip gives you extra blitter time.

So having the chunky screen in chip is the way to go. This chunky screen needs to be double buffered. What we have to remember here is that the code is still running from fast memory and all reads come from fast as well, so the CPU is not crippled much by DMA contention.

So how do we do the C2P conversion? This job will be done by the blitter. The blitter can process the next frame's worth of data while the previous frame is displayed. This means there will be a 1 frame lag (3 frames really, because we will be running at 17fps on PAL and 20fps on NTSC), but that is not really much of a problem. If we read the inputs every vsync and store them, the game update can process them to compensate for the lag.

This will all be synchronised by waiting for the 3rd frame and line 176. When that happens, we change the copper pointer so that the bit-plane pointers show the new screen, and then fire off the first blit in the chain. When the first blit finishes it triggers a blitter-finished interrupt, and the next blit can fire off. This blitter chain is spread over 3 frames and must end before line 176 of the 3rd.
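The chain logic can be modelled in a few lines. This is a toy simulation with hypothetical names, not Amiga code: on real hardware `start_blit` would write the blitter registers (ending with BLTSIZE, which starts the blit) and `on_blit_done` would be the body of the blitter-finished interrupt handler.

```python
from collections import deque


class BlitChain:
    """Toy model of an interrupt-driven blitter chain (hypothetical names)."""

    def __init__(self, start_blit):
        self.queue = deque()
        self.start_blit = start_blit   # callback that kicks off one blit
        self.busy = False

    def fire(self, blits):
        """Queue a screen's worth of blits and start the first one."""
        self.queue.extend(blits)
        self._next()

    def on_blit_done(self):
        """Blitter-finished interrupt: start the next queued blit, if any."""
        self.busy = False
        self._next()

    def _next(self):
        if not self.busy and self.queue:
            self.busy = True
            self.start_blit(self.queue.popleft())


# Simulated run: the "blitter" finishes each time the handler is called.
started = []
chain = BlitChain(started.append)
chain.fire([f"pass {i}" for i in range(1, 11)])   # e.g. the 10 C2P passes
while chain.busy:
    chain.on_blit_done()
print(started)
```

The CPU only pays for a short handler per blit, which is why the chain can run in the background while the game code executes from fast memory.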

The blitter tests I did over 3 frames show that the blitter has enough time to convert to planar, with time left over for other uses. This was the first thing I tested, and I was shocked that an OCS chipset with only a 68000 could do this. Using the blitter, with the code in fast memory, is what makes all this possible.

Now we need a nice way to encode the byte for the C2P conversion:

12334566

The numbers represent bit-planes.

The chunky screen pixel x order will be:

0,8,1,9,2,10,3,11,4,12,5,13,6,14,7,15

Each number represents a byte, so each word contains 2 pixels that are 8 pixels apart. For a ray casting engine that only draws vertical lines, this is not an issue.
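A sketch of that byte order, assuming x is counted in 2x2 chunky pixels and the 16-pixel pattern simply repeats along the row (the post only shows one group). `byte_offset` is a hypothetical helper a vertical-line drawer could use to find where column x lives in a chunky row.

```python
# Byte k of each 16-pixel group holds pixel ORDER[k],
# so every 16-bit word pairs pixel n with pixel n + 8.
ORDER = [p for n in range(8) for p in (n, n + 8)]


def byte_offset(x):
    """Byte offset of chunky column x within a row (repeating 16-pixel groups)."""
    group, i = divmod(x, 16)
    return 16 * group + 2 * (i % 8) + i // 8


print(ORDER)
print([byte_offset(x) for x in range(16)])
```

Because a ray caster draws one column at a time, it only ever needs this offset once per column, so the unusual order costs nothing at draw time.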

With this order we get a word that looks like this:
12334566 12334566

Doing the usual merge passes with the blitter we get these merge patterns:

12124545 12124545 1 pass
33336666 33336666 1 pass

12121212 12121212 1 pass
45454545 45454545 1 pass
33333333 33333333 converted bit-plane (double pixels) 1 pass
66666666 66666666 converted bit-plane (double pixels) 1 pass

To get the double pixels for bit-planes 1,2,4 and 5 we just mask out the unwanted bits and shift one of the channels by a pixel like this:

10101010 10101010
01010101 01010101

11111111 11111111 converted bit-plane 1 pass
22222222 22222222 converted bit-plane 1 pass
44444444 44444444 converted bit-plane 1 pass
55555555 55555555 converted bit-plane 1 pass
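The merge patterns above can be checked with a byte-level model. This is a sketch under stated assumptions: colour bit 0 is bit-plane 1 (as on the Amiga), and one byte from each of four chunky words is merged. On the blitter each step is a pass over whole words, with one source barrel-shifted and the constant mask standing in for the C channel; which chunky words get paired in each pass is not modelled here.

```python
def encode(colour):
    """Pack a 6-bit colour into the 12334566 chunky byte (bit 7 first)."""
    p = [(colour >> n) & 1 for n in range(6)]     # p[0] = plane 1 ... p[5] = plane 6
    return (p[0] << 7 | p[1] << 6 | p[2] << 5 | p[2] << 4 |
            p[3] << 3 | p[4] << 2 | p[5] << 1 | p[5])


def merge(x, y, mask, shift):
    """One merge pass: keep the `mask` bits of x, fill the rest from y >> shift."""
    return (x & mask) | ((y >> shift) & ~mask & 0xFF)


def double_left(v):
    """Keep bits 7,5,3,1 and copy each one pixel to the right."""
    v &= 0xAA
    return v | (v >> 1)


def double_right(v):
    """Keep bits 6,4,2,0 and copy each one pixel to the left."""
    v &= 0x55
    return v | (v << 1)


def c2p4(a, b, c, d):
    """Four encoded chunky bytes -> one byte per bit-plane, planes 1..6."""
    ab_1245 = merge(a, b, 0xCC, 2)                # 12124545
    ab_36 = merge(a << 2, b, 0xCC, 0)             # 33336666
    cd_1245 = merge(c, d, 0xCC, 2)
    cd_36 = merge(c << 2, d, 0xCC, 0)
    p12 = merge(ab_1245, cd_1245, 0xF0, 4)        # 12121212
    p45 = merge(ab_1245 << 4, cd_1245, 0xF0, 0)   # 45454545
    plane3 = merge(ab_36, cd_36, 0xF0, 4)         # 33333333, already doubled
    plane6 = merge(ab_36 << 4, cd_36, 0xF0, 0)    # 66666666, already doubled
    return [double_left(p12), double_right(p12), plane3,
            double_left(p45), double_right(p45), plane6]


# Colours 1, 2, 4 and 63 for four consecutive double-width pixels.
print([f"{v:08b}" for v in c2p4(*map(encode, [1, 2, 4, 63]))])
# ['11000011', '00110011', '00001111', '00000011', '00000011', '00000011']
```

Counting the steps gives the same 10 passes as the post: 2 two-bit merges, 4 nibble merges and 4 mask-and-shift doublings for planes 1, 2, 4 and 5.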

This comes to 10 passes. The calculations below are in words, using PAL timings for the blitter. Don't forget that we use one of the channels as a constant, so DMA channel C does not need to be enabled; this means the blitter takes 6 cycles per word:

(3520 words * 2 passes * 6 cycles / 7.09 MHz) / 1000 = roughly 5.96 ms
(1760 words * 8 passes * 6 cycles / 7.09 MHz) / 1000 = roughly 11.92 ms

Which comes to about 17.87 ms. Even allowing for DMA contention, this can easily be done over 3 frames at 17fps on PAL.
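The same estimate in code, keeping the post's assumptions (6 CPU clocks per word, a 7.09 MHz PAL clock, no DMA contention). The word counts follow from the 160x88 chunky buffer and 40-byte bit-plane rows.

```python
PAL_MHZ = 7.09
CYCLES_PER_WORD = 6
CHUNKY_WORDS = 160 * 88 // 2      # 7040 words in the chunky buffer
PLANE_WORDS = 40 * 88 // 2        # 1760 words per 320x88 bit-plane


def pass_ms(words, passes):
    """Blitter time in ms: clocks / MHz gives microseconds, / 1000 gives ms."""
    return words * passes * CYCLES_PER_WORD / PAL_MHZ / 1000


first = pass_ms(CHUNKY_WORDS // 2, 2)   # two passes over 3520 words each
rest = pass_ms(PLANE_WORDS, 8)          # eight passes over 1760 words each
budget = 3 * 20.0                       # three 50 Hz PAL frames, in ms
print(f"{first:.3f} + {rest:.3f} = {first + rest:.3f} ms of {budget:.0f} ms")
```

This prints `5.958 + 11.915 = 17.873 ms of 60 ms`, so the conversion uses under a third of the 3-frame window before contention is taken into account.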


As you can see, in theory the numbers work out: with a 64-colour low-res screen using 2x2 pixels, the code running in fast memory and the blitter work spread over 3 frames, it is possible to do a game like Wolfenstein on a memory-expanded Amiga 1000. And what is mind blowing is that this could have been done in 1985. Something released at that time would have been a game changer, and computer history would certainly have been different!

I have not really programmed the Amiga since the mid 90s. So it was really good fun working all this out in the last couple of months. Next on my list is to get the theory working in a little demo - not sure how long it will take as I work full time. But as I said it's been fun doing this as a little hobby project.

Note:

I have done these tests using an emulator (WinUAE) with the accuracy set to high. From what I've read, its 68000 emulation and custom chip timings are pretty much spot on. If not, all this theory needs to be tested on a real machine, which I do not have at present. But if it's more or less accurate, then all this is possible.
28 October 2016, 21:39   #2
sandruzzo
Join Date: Feb 2011
Location: Italy/Rome
Posts: 2,281
All this is fascinating, but what about the timing for ray casting, collisions and game logic? How much time do we have, even with fast mem?
28 October 2016, 22:17   #3
AndNN
Join Date: Oct 2016
Location: Australia
Posts: 58
In theory we should have about 1 frame for all that. The cpu will be doing reads and writes to and from fast memory so it will be running at full speed with no DMA contention.

The ray casting side can mostly run from very large look-up tables... 8 meg of fast memory will be very handy indeed.
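One example of the look-up-table approach: division is slow on the 68000, so the per-column "wall height = K / distance" can be read from a table built once at start-up. All units and constants here are illustrative assumptions (distances in 1/64ths of a map tile, a wall one tile away filling the 88-row view, heights clamped to the tallest pre-compiled line kept in memory).

```python
MAX_DIST = 4096                  # hypothetical: distances in 1/64ths of a tile
VIEW_ROWS = 88                   # rows in the 160x88 chunky view
PROJ = VIEW_ROWS * 64            # assumed projection: 1 tile away fills the view
MAX_HEIGHT = 255                 # assumed tallest pre-compiled line kept in memory

# Built once; in the game this would be a word table sitting in fast memory.
HEIGHT_TABLE = [MAX_HEIGHT] + [min(MAX_HEIGHT, PROJ // d) for d in range(1, MAX_DIST)]


def wall_height(dist):
    """Per-column wall height by table lookup instead of a divide."""
    return HEIGHT_TABLE[min(dist, MAX_DIST - 1)]


print(wall_height(64), wall_height(128), wall_height(20))
```

This prints `88 44 255`. The height can then directly select which compiled scaling routine to call for that column.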

Collision is basically a 2D grid problem because that's what Wolfenstein is. I guess it's pretty similar to games like the Chaos Engine or Alien Breed.
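A minimal sketch of what "a 2D grid problem" means for collision, using a hypothetical map and units: the player is a box of radius r, and each axis is moved and checked separately so the player slides along walls instead of sticking to them.

```python
TILE = 64                                   # map units per tile (assumption)
MAP = ["#####",
       "#...#",
       "#.#.#",
       "#...#",
       "#####"]


def solid(x, y):
    return MAP[y // TILE][x // TILE] == "#"


def blocked(x, y, r):
    """True if any corner of the player's bounding box is inside a wall tile."""
    return any(solid(x + dx, y + dy) for dx in (-r, r) for dy in (-r, r))


def move(x, y, dx, dy, r=16):
    """Try the x move, then the y move, keeping whichever is not blocked."""
    if not blocked(x + dx, y, r):
        x += dx
    if not blocked(x, y + dy, r):
        y += dy
    return x, y


print(move(96, 96, -40, 20))   # wall to the left: slides down instead
```

This prints `(96, 116)`. Each check is four table lookups, so the cost is tiny compared with the rendering.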

I'm just guessing here and making assumptions; the feasibility of these points will need investigating. But I think 1 frame or less will be OK for what breaks down to a 2D game problem.

But the real question is how fast the sprites can be. I have some really neat ideas on how this should be done. That will be the next feasibility test I do.
30 October 2016, 14:21   #4
sandruzzo
Join Date: Feb 2011
Location: Italy/Rome
Posts: 2,281
It would be interesting to do some tests.
30 October 2016, 14:41   #5
frank_b
Join Date: Jun 2008
Location: Boston USA
Posts: 466
Quote:
Originally Posted by sandruzzo
It would be interesting to do some tests.


It should be possible to do it on a 68000. It's already been done:
[YouTube video]
30 October 2016, 15:38   #6
idrougge
Join Date: Sep 2007
Location: Stockholm
Posts: 4,332
If you take the price and availability of fastmem in 1985 into account, this discussion is purely academic.
30 October 2016, 18:33   #7
AndNN
Join Date: Oct 2016
Location: Australia
Posts: 58
Quote:
Originally Posted by idrougge
If you take the price and availability of fastmem in 1985 into account, this discussion is purely academic.
I agree with that... I wonder what percentage of A1000 users upgraded the memory within the first year of launch. 256KB just seems pitiful to do anything useful with; after the A500 launch, those A1000 users would have upgraded the chip memory to 512KB. And when people expanded the A500 to 1 meg, an A1000 user would only have had the fast memory option.

Every system has the killer app that sells hardware... look what Doom did for PC graphics card sales. It just makes me wonder what would have happened if someone had released something like Wolfenstein before the A500 launch. And we are talking 1985 here, so it would have been mind blowing to see textures moving around in 3D. I think if a game like that had been released in 1985, people would have bought fast memory no matter what it cost, just to play a game that was miles ahead graphically of anything else. It's fun to think what might have been.

But as I said I agree that fast memory was not really a standard on the Amiga.


The whole purpose of this thread is to prove it can be done. I know the Atari ST got a port of Wolfenstein, but with 16 colours and a frame rate that's not that good, and I know the Amiga can do so much better.

When I say Wolfenstein, I don't really want to port it; I think it would be fun to do something original.

On the testing front, I've written a blitter chain system using the blitter interrupt and have the C2P working. That basically takes my byte format for pixels and converts it to 6 bit-planes for EHB mode and, of course, 64 colours.

If I put the code into chip memory the whole system runs very slowly and, TBH, would not be worth doing. Fast memory is the only way this can work and give a decent frame rate.

On a side note I've just bought an A600 off eBay for testing on real hardware.
31 October 2016, 10:16   #8
chb
Join Date: Dec 2014
Location: germany
Posts: 439
Quote:
Originally Posted by idrougge
If you take the price and availability of fastmem in 1985 into account, this discussion is purely academic.
Well, AFAIK, the 256k Kickstart RAM (WCS) in the A1000 had its own refresh logic, so it was like real fast mem. For normal operation it was write-protected after loading Kickstart from disk, but with some reset trickery it could be kept writable (at least Dragon's Lair and the A-Max Macintosh emulator make use of this)*. But of course, later on most machines, including most A1000s, had Kickstart in ROM, so this would only work on a very small share of the installed user base.




*Source: http://aminet.net/package/util/misc/DumpA1000BootROM

Last edited by chb; 31 October 2016 at 10:27.
31 October 2016, 10:29   #9
AndNN
Join Date: Oct 2016
Location: Australia
Posts: 58
I'm actually starting to change my mind on where the chunky buffer should be stored. I did some experiments reading from chip memory with the code in fast, and things got slower. I want to do reads on the chunky buffer for transparency effects.

The C2P conversion takes around 1.5 frames, which means I can use the blitter for other graphics updates. Because I've now decided to store the chunky buffer in fast, the movem copy to chip takes about 0.5 frames. That leaves about 1 frame of blitter time for other things.

Even though I lose 0.5 frames of blitter time, it still works out about the same, because the blitter completes its tasks faster when it's not yielding cycles to the CPU.

In a simple test of writing to the chunky buffer in fast memory, I can update a whole screen's worth of pixels in just over 1.5 frames, and the movem copy brings that to just slightly over 2 frames. That leaves nearly 1 frame for the CPU to process the game logic and ray casting.
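The budget in this post, written out with the rounded figures (in PAL frames; the measured numbers are "just over" these, so the leftovers are really a little under 1 frame each):

```python
FRAMES = 3      # one screen update every 3 PAL frames (about 17 fps)
RENDER = 1.5    # CPU filling the chunky buffer in fast memory
COPY = 0.5      # movem copy from fast to chip
C2P = 1.5       # blitter chunky-to-planar conversion

cpu_left = FRAMES - RENDER - COPY       # for game logic and ray casting
blitter_left = FRAMES - COPY - C2P      # the blitter is not used during the copy

print(f"CPU: {cpu_left} frames, blitter: {blitter_left} frames")
```

This prints `CPU: 1.0 frames, blitter: 1.0 frames`.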

It looks pretty promising so far.
31 October 2016, 11:24   #10
sandruzzo
Join Date: Feb 2011
Location: Italy/Rome
Posts: 2,281
16 colors for 1x1 would suffice. Why use 2x2?
31 October 2016, 11:51   #11
AndNN
Join Date: Oct 2016
Location: Australia
Posts: 58
Quote:
Originally Posted by sandruzzo
16 colors for 1x1 would suffice. Why use 2x2?
Because we are using fast memory, the DMA will have no effect on the speed of the CPU. It's more about what the 68000 running from fast can write into fast memory. A byte is still used per pixel in the chunky buffer, so cutting the colour depth down will not actually speed up the writes, because you still process the same amount of data. And with 1x1 pixels you multiply the amount of data to shovel around by 4. For example, the copy to chip takes 0.5 frames, so with 2x1 pixels it will take a frame and with 1x1 pixels 2 frames. It would not be worth doing.

But having said all that, it would be nice to detect the CPU speed and adapt for better quality. The whole purpose of this exercise, though, is to get something that would work on an Amiga 1000 with fast memory at a nice frame rate.

Edit:

Also worth noting is the blitter time. In realistic terms, the more data in the chunky buffer, the more the blitter has to convert. The hard limit is that over 3 frames the blitter can convert 2x1 pixels for 6 bit-planes; with 1x1 pixels the blitter will not have enough time for the conversion.
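A rough check of these limits, assuming copy and C2P time grow linearly with the amount of chunky data and starting from the roughly 0.5-frame copy and 1.5-frame C2P reported earlier in the thread for 2x2 pixels. This ignores that the doubling passes would change for 2x1 and disappear for 1x1, so it is only a first approximation.

```python
COPY_2X2 = 0.5      # frames: fast -> chip copy at 2x2 pixels
C2P_2X2 = 1.5       # frames: blitter C2P at 2x2 pixels
BUDGET = 3          # frames per screen update


def scaled(factor):
    """Copy and C2P time when the chunky data grows by `factor` (linear assumption)."""
    return COPY_2X2 * factor, C2P_2X2 * factor


for name, factor in (("2x2", 1), ("2x1", 2), ("1x1", 4)):
    copy, c2p = scaled(factor)
    verdict = "C2P fits" if c2p <= BUDGET else "C2P too slow"
    print(f"{name}: copy {copy} frames, C2P {c2p} frames ({verdict})")
```

Under this assumption 2x1 pixels use the whole 3-frame window for C2P and 1x1 pixels would need 6 frames, which matches the limits described above.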

The tests I've been doing are just to see what bandwidth is available for the blitter as well as for the 68000 running in fast. Fast memory helps because it allows parallel processing: the blitter and the 68000 can run in tandem with no DMA contention.

Last edited by AndNN; 31 October 2016 at 12:24.
31 October 2016, 14:01   #12
sandruzzo
Join Date: Feb 2011
Location: Italy/Rome
Posts: 2,281
With only 16 colors you'll have 8 pixels in 32 bits instead of 4. Not bad. You can almost double the pixels written into chip and fast memory.
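The packing difference in a few lines (a sketch; putting the first pixel in the most significant bits is an assumption):

```python
def pack(pixels, bits):
    """Pack pixel values MSB-first into one integer, `bits` bits per pixel."""
    value = 0
    for p in pixels:
        assert 0 <= p < (1 << bits)
        value = (value << bits) | p
    return value


LONG = 32
print(LONG // 8, "byte pixels vs", LONG // 4, "nibble pixels per longword")
print(hex(pack([1, 2, 3, 4], 8)), hex(pack([1, 2, 3, 4, 5, 6, 7, 8], 4)))
```

This prints `4 byte pixels vs 8 nibble pixels per longword` and `0x1020304 0x12345678`. Note that with nibbles two horizontally adjacent pixels share one byte, so a routine drawing a single vertical line only owns half of each byte it touches.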
31 October 2016, 14:14   #13
S0ulA55a551n
Join Date: Nov 2010
Location: South Wales
Age: 46
Posts: 934
Quote:
Originally Posted by idrougge
If you take the price and availability of fastmem in 1985 into account, this discussion is purely academic.
Well, it's all academic, as this didn't happen in 1985 and we don't have a time machine.
31 October 2016, 15:59   #14
Retro1234
Join Date: Jun 2006
Location: 5150
Posts: 5,773
Ask the FBI


I couldn't find the video.

Last edited by Retro1234; 31 October 2016 at 16:06.
31 October 2016, 16:02   #15
sandruzzo
Join Date: Feb 2011
Location: Italy/Rome
Posts: 2,281
We could write some code and do some real tests. As a target I would say an A500 + 1-2 MB of fast mem. What do you think?
31 October 2016, 22:49   #16
fgh
Join Date: Dec 2010
Location: Norway
Posts: 817
Don't forget to tick 'cycle exact' emulation in WinUAE to get realistic speed.
01 November 2016, 03:05   #17
AndNN
Join Date: Oct 2016
Location: Australia
Posts: 58
Quote:
Originally Posted by sandruzzo
With only 16 colors you'll have 8 pixels in 32 bits instead of 4. Not bad. You can almost double the pixels written into chip and fast memory.
You won't save on writes to the chunky buffer, though. The 68000 will have to do sequences like this.

Because you are now using nibbles, you will have to merge them together. In my first post I mentioned the compiled sequences to draw vertical lines; as well as those for, say, even lines, for odd lines you will have to merge with ORs:

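move.b (a0)+,d0       ; fetch the texel nibble
or.b d0,160*1(a1)     ; merge it into the byte shared with the neighbouring column
or.b d0,160*2(a1)     ; same for the next row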

Which means you still need to write double the amount. Admittedly, the chunky buffer copy will then be the same speed for 2x1 pixels as for 2x2 pixels. Also, your texture data will need different versions for odd and even lines.

There is a way to cut the writes in half, which I mentioned in my first post: process two vertical lines together, do the OR on the CPU and write once for 2 pixels. But to make this as fast as possible, each possible vertical line consists of a raw sequence of instructions, and this gets complicated for 2 lines at once: for every possible line we would need a version for the odd and even lines together. That is a crap load of memory that the Amiga does not have.

But there's no harm in doing tests for this to rule what options are available. And if anybody wants to have a go at testing you are welcome.

I myself will be going down the 2x2 pixel route using 64 and 32 colours, because my own testing showed that the bandwidth is comfortable when spread over 3 frames. And if I want to go full screen PAL, then I will have options to do that.

Who says I have not got a time machine... once this is proven I'm going back in time to change history
01 November 2016, 06:57   #18
sandruzzo
Join Date: Feb 2011
Location: Italy/Rome
Posts: 2,281
When you have to convert chunky to planar you'll see the difference in speed. And what about having only 4 planes enabled? You'll free up a lot of DMA cycles, which will speed up the copy from fast to chip mem.

About the merge operation: I think there is a way to arrange the buffer to speed up these operations too.
01 November 2016, 10:17   #19
Samurai_Crow
Join Date: Aug 2007
Location: Waterville, MN, USA
Age: 49
Posts: 2,186
The Copper can stretch pixels vertically by alternating modulo register values so the chunky to planar conversion doesn't have to stretch in that direction.
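One way to build that modulo trick, sketched in Python that emits raw copper-list words. Only the WAIT/MOVE word formats and the BPL1MOD/BPL2MOD register offsets ($DFF108/$DFF10A) are standard hardware facts; the start line $2C, the 40-byte low-res rows and making each change at the start of the line (so it is in place before that line's fetch ends and the modulo is added) are assumptions for illustration.

```python
BPL1MOD, BPL2MOD = 0x108, 0x10A   # bit-plane modulo registers (odd / even planes)
ROW_BYTES = 40                    # one 320-pixel low-res row per bit-plane


def wait(vpos, hpos=0x06):
    """Copper WAIT: VP<<8 | HP | 1, then $FFFE (compare all position bits)."""
    return [((vpos & 0xFF) << 8) | (hpos & 0xFE) | 1, 0xFFFE]


def move(reg, value):
    """Copper MOVE: register offset, then a 16-bit value."""
    return [reg, value & 0xFFFF]


def line_double_copper(first_line=0x2C, rows=88):
    """Show each bit-plane row twice: modulo -40 repeats a row, 0 moves on."""
    assert first_line + 2 * rows <= 256   # this sketch does not handle lines past 255
    words = []
    for row in range(rows):
        for repeat, mod in ((0, -ROW_BYTES), (1, 0)):
            words += wait(first_line + 2 * row + repeat)
            words += move(BPL1MOD, mod) + move(BPL2MOD, mod)
    return words + [0xFFFF, 0xFFFE]       # end of list: wait for an impossible position


cop = line_double_copper()
print(len(cop), [hex(w) for w in cop[:6]])
```

This prints `1058 ['0x2c07', '0xfffe', '0x108', '0xffd8', '0x10a', '0xffd8']`. With this, the C2P only has to produce 88 rows per bit-plane while the display shows 176.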
01 November 2016, 10:33   #20
AndNN
Join Date: Oct 2016
Location: Australia
Posts: 58
Quote:
Originally Posted by sandruzzo
When you have to convert chunky to planar you'll see the difference in speed. And what about having only 4 planes enabled? You'll free up a lot of DMA cycles, which will speed up the copy from fast to chip mem.

About the merge operation: I think there is a way to arrange the buffer to speed up these operations too.
That's true, there will be more DMA slots for the blitter. It may actually be better to write directly to chip and avoid the copy, which was not too bad with 6 bit-planes, as long as the CPU runs code and reads from fast. For this to work really well, the best time to fire off the blitter is while the CPU is doing the game logic entirely in fast (running, reading and writing). That way there will be no DMA contention between the blitter and the CPU for at least 1 frame.

So all that will probably work really well, but it depends on a clever way to make the merge not double the writes. If there is a clever way to do that, then you can get 64 colours with 2x1 pixels. This is possible because the instruction timings are the same for bytes and words: instead of updating the chunky buffer with bytes you use words, so you do 2 reads, let the CPU merge, then write out a word. The method I mentioned before needs more memory than there is when you draw vertically with the pre-compiled drawing instructions.

If 16 colours at 2x1 pixels is possible then that is just an extra mode added to the 64 colours at 2x2 pixels.

The Atari ST version of Wolfenstein uses 16 colours with 2x2 pixels and the frame rate is not that good (don't get me wrong, the guy who did that jumped through hoops to get it to work), but the Amiga can do so much better.

I know there are demos that have done 2x1 pixels at full screen, but that would not leave enough time for the CPU to run the game and ray casting logic at the target frame rate of 17fps on PAL and 20fps on NTSC. All the testing is just finding out what can actually be done without committing.
All times are GMT +2.