English Amiga Board


Old 18 April 2020, 22:11   #101
eXeler0
Registered User
 
 
Join Date: Feb 2015
Location: Sweden
Age: 50
Posts: 2,970
Quote:
Originally Posted by spiff View Post
  1. Dump 3d models and levels from VF
  2. Convert to 3d construction kit format
  3. Import
  4. Benchmark
  5. ???
  6. ... profit!!!

;-)
Actually, if someone could dump the models from the 32x version?
Obviously to have them as editable assets one would want them in a modern 3d software package. Or at least something like Milkshape 3d.
Unfortunately, I don't really know much about what formats devs used back in the day for 3D Amiga games. I always assumed everyone wrote their own importers. (Or, if it was really low poly, they probably typed in the vertex coords by hand. ;-)) [Edit] So, is there anything resembling a standard 3D file format on the Amiga that game devs used? What would you do today? (I know Vlad wrote his own tool to read the fbx format and convert it to his own..)
3D Construction Kit ;-) Didn't that only support its own Freescape engine?
Was there ever anything useful done with it? Gotta check youtube..

Last edited by eXeler0; 19 April 2020 at 01:02.
Old 21 April 2020, 01:30   #102
AmigaHope
Registered User
 
Join Date: Sep 2006
Location: New Sandusky
Posts: 943
Thanks for helping frame things in a better way than my attempts at explaining. =)

Quote:
Originally Posted by ReadOnlyCat View Post
1- hierarchical polygonal model animation
-> this is many times more expensive than 3D projection
-> forget it, too many matrix multiplications, the Amiga 030 must use Quake-like animation (pre-stored positions)
The problem is that Virtua Fighter is all about that smooth movement of the fighters, and having pre-calculated models for each frame of the fighters would take up way too much RAM.

680x0 didn't get a truly fast multiplier until the 68060, though you could probably muddle through a simplified version of VF's workload on a 40 MHz 68040 (multiply execution time of 14-20 cycles).

The SH-2 handles VF just fine and it takes the same number of cycles to multiply as the 68060. (Plus the 68060 is superscalar and can do a simple instruction for free at the same time, without the bus contention that the Saturn's dual SH-2 setup has to deal with. A 50 MHz 68060 should be significantly faster overall than the 2x 28 MHz SH-2s in the Saturn.)

Quote:
6- Polygon rasterization.
-> I will leave that to you
This is the real barrier on any Amiga using the native chipset. Drawing to chipmem is too slow and the blitter is too slow to do it for you.

Quote:
Originally Posted by coder76
There are some possibilities to make a good conversion of Virtua Fighter. The polygonal characters don't appear to have many colors
They actually have a lot of colors. The flat-shaded polygons made it deceptive, but if you look carefully there's lots of brightness variation across the model on a given frame. The models may have only 8 or so "base" colors but the fake lighting effects create many more.
Old 21 April 2020, 15:20   #103
chb
Registered User
 
Join Date: Dec 2014
Location: germany
Posts: 439
Quote:
Originally Posted by AmigaHope View Post
The problem is that Virtua Fighter is all about that smooth movement of the fighters, and having pre-calculated models for each frame of the fighters would take up way too much RAM.
Is that really such a big problem? If we take those 200 quads per player, that's around 200 vertices if one vertex is shared among 4 quads. Probably 16 bit fixed point is sufficient, so we get 3 coordinates x 2 bytes x 200 vertices = 1200 bytes per frame. So 3000 unique frames give you 50 seconds of 60 fps animation and take ~3.5 MB per player. I have no estimate, but 50 s seems to be a lot for a fighting game.
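The estimate above is easy to re-run with other vertex counts or frame rates. A minimal sketch, using only the figures from this post; the constant names are made up for illustration:

```python
# Back-of-envelope storage estimate for pre-stored vertex animation.
# All figures come from the post; the names are illustrative only.
VERTICES_PER_FIGHTER = 200   # ~200 quads, each vertex shared by 4 quads
COMPONENTS = 3               # x, y, z
BYTES_PER_COMPONENT = 2      # 16-bit fixed point
FRAMES = 3000
FPS = 60

bytes_per_frame = VERTICES_PER_FIGHTER * COMPONENTS * BYTES_PER_COMPONENT
total_bytes = bytes_per_frame * FRAMES
seconds = FRAMES / FPS

print(bytes_per_frame)                # bytes for one stored pose
print(total_bytes / (1024 * 1024))    # total size in MiB
print(seconds)                        # seconds of unique animation
```

Halving the stored frame rate or the vertex count halves the total, which is the main knob if RAM turns out to be the limit.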
Quote:
Originally Posted by AmigaHope View Post
This is the real barrier on any Amiga using the native chipset. Drawing to chipmem is too slow and the blitter is too slow to do it for you.
Hmm, I guess you'd fill in fast and transfer the complete frame to chip in one go. The CPU can write around 6-7 MB/s, or around 100k per frame to chip.

Furthermore, during fight the fighters occupy only a quite small part of the frame. You could transfer only the union of the rectangular bounding boxes of both players.

I'd probably use a sprite background (yes, repeats after 256 pix, but I guess that's ok), some copper gradient and 5 planes or so for the fighters. Then you could simply clear the (partial) frame with the blitter while the CPU works in fast mem. Maybe even draw and fill the floor (the 32x version is very simple) with the blitter in 2 extra planes.
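To get a feel for how much the partial transfer saves, here is a rough sketch of the bounding-box idea with the 5-plane layout mentioned above. The fighter boxes, the 16-pixel alignment and the function names are assumptions for illustration, not measurements from the game:

```python
def union_box(a, b):
    """Smallest rectangle (x0, y0, x1, y1) enclosing both boxes; x1/y1 are exclusive."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def bytes_to_copy(box, bitplanes, align=16):
    """Bytes written for a planar region, with x rounded outward to `align` pixels."""
    x0 = (box[0] // align) * align
    x1 = -(-box[2] // align) * align      # round up to the next multiple of align
    width, height = x1 - x0, box[3] - box[1]
    return (width // 8) * height * bitplanes

# Hypothetical fighter boxes on a 320x256 screen.
p1 = (60, 80, 140, 230)
p2 = (170, 70, 250, 230)

box = union_box(p1, p2)
partial = bytes_to_copy(box, bitplanes=5)
full = bytes_to_copy((0, 0, 320, 256), bitplanes=5)
print(box, partial, full)
```

With these made-up boxes the partial copy is well under half of the full 5-plane frame, and even the full frame stays below the ~100k per frame quoted above.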

Quote:
Originally Posted by AmigaHope View Post
They actually have a lot of colors. The flat-shaded polygons made it deceptive, but if you look carefully there's lots of brightness variation across the model on a given frame. The models may have only 8 or so "base" colors but the fake lighting effects create many more.
Hmm, but maybe you'd get away with 16 colors per player?

Anyway, I don't want to claim it is possible, just some thoughts about its feasibility.
Old 21 April 2020, 19:14   #104
spiff
Oh noes!
 
 
Join Date: Mar 2003
Location: Neverland
Posts: 766
Quote:
Originally Posted by chb View Post
Is that really such a big problem? If we take those 200 quads per player, that's around 200 vertexes...
It's more than that: physics for the pony tail, clothes (the hat), debris, juggling hits etc. Static animation will affect gameplay.

But then again, it's a conversion... so..
Old 21 April 2020, 23:46   #105
eXeler0
Registered User
 
 
Join Date: Feb 2015
Location: Sweden
Age: 50
Posts: 2,970
Quote:
Originally Posted by spiff View Post
It's more than that: physics for the pony tail, clothes (the hat), debris, juggling hits etc. Static animation will affect gameplay.

But then again, it's a conversion... so..
I think we've already established that it's no use trying to pretend we can match the power of the Model 1 hardware, so all those details can be ignored. The 32X version has none of that, although the Saturn version has some. The "Lau" character with the pony tail has some decent animation on the Saturn that looks like it's physics-driven; the 32X version just has stiff hair. It moves from one node point, but looks nothing like the Saturn. For an Amiga version, I think the realistic aim would be something that matches the 32X.

In this comparison table you can see by how much the Saturn outperforms a 133 MHz Pentium for geometric calculations etc. OK, these are theoretical numbers which were probably never achieved on the Saturn in any actual game, but still...

https://segaretro.org/Sega_Saturn/Ha...mparison_table
Old 22 April 2020, 00:41   #106
roondar
Registered User
 
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,430
Those numbers are very suspect though. The PS1 was, in reality, quite a bit better at "real world" 3D than the Saturn was.
Old 22 April 2020, 02:30   #107
ReadOnlyCat
Code Kitten
 
Join Date: Aug 2015
Location: Montreal/Canadia
Age: 52
Posts: 1,178
Quote:
Originally Posted by VladR View Post
Question : Can you have a Chunky 256-color RTG Framebuffer on a vanilla 030 ? Or do you need some accelerator for that ?
If you are willing to limit yourself to a 40 x 256 pixels resolution then even OCS models have a 4096 colors chunky mode available (using the Copper).

(Note: I am only slightly jesting.)

Quote:
Originally Posted by AmigaHope View Post
The problem is that Virtua Fighter is all about that smooth movement of the fighters, and having pre-calculated models for each frame of the fighters would take up way too much RAM.
Not necessarily, interpolation could be used to reduce the amount of stored positions and in any case we do not have a choice.
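As a rough illustration of the interpolation idea, here is a sketch that blends two stored keyframes with integer maths only. The keyframe data, the 8-bit blend weight and the function name are assumptions; it costs one multiply per coordinate, so whether an 030 can afford it is exactly the open question:

```python
def lerp_vertices(key_a, key_b, t, shift=8):
    """Blend two keyframes; t is a fixed-point weight in [0, 1 << shift]."""
    return [
        tuple(a + (((b - a) * t) >> shift) for a, b in zip(va, vb))
        for va, vb in zip(key_a, key_b)
    ]

# Two hypothetical keyframes with two vertices each (x, y, z).
key_a = [(0, 0, 0), (256, -128, 64)]
key_b = [(512, 256, 0), (0, 128, 64)]

halfway = lerp_vertices(key_a, key_b, t=128)   # t = 0.5 in 8-bit fixed point
print(halfway)
```

If the in-between steps are restricted to powers of two (1/2, 1/4, ...), the multiply can in principle be replaced by shifts and adds.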

Quote:
Originally Posted by AmigaHope View Post
The SH-2 handles VF just fine and it takes the same number of cycles to multiply as the 68060. (Plus the 68060 is superscalar and can do a simple instruction for free at the same time, without the bus contention that the Saturn's dual SH-2 setup has to deal with. A 50 MHz 68060 should be significantly faster overall than the 2x 28 MHz SH-2s in the Saturn.)
We are off topic here, but bus contention is only an issue once the working set no longer fits in the 4 KB cache, and 4 KB holds a lot of 4x4 matrices and v4 vectors to multiply together. If the two processors can be synchronized to work out of phase (one reads the bus while the other computes) then this would not be an issue at all.
Anyway, enough fantasizing about the Saturn, back to the Amiga.

Quote:
Originally Posted by AmigaHope View Post
This is the real barrier on any Amiga using the native chipset. Drawing to chipmem is too slow and the blitter is too slow to do it for you.
Others on the EAB have shown that C2P on a 50MHz 030+ can be done at CHIP bandwidth saturation speeds so if one was willing to halve the FPS count it would be possible to render in chunky Fast RAM and C2P efficiently to Chip RAM.

I do not know if this would be the most efficient use of CPU though. Sure, the Blitter is slower at filling RAM than a 50MHz 030 but this is the worst possible use of that CPU: better use it to project vertices while the Blitter is busy being a Blitter.

Quote:
Originally Posted by roondar View Post
Those numbers are very suspect though. The PS1 was, in reality, quite a bit better at "real world" 3D than the Saturn was.
I have never coded for the Saturn, only the PS1, but reading the official documents leaves me with the impression that the Saturn is indeed more powerful.
Moreover, the PS1's CPU alone has to handle both projections and game code, while the Saturn can dedicate one of its CPUs solely to 3D projections.
But anyway, let us not digress.

So, has anyone written a v4 x M4x4 multiplication routine yet?
How many of these can you run per frame?

(Obviously, we are talking 16 bit fixed point maths, this is what the Saturn and PS1 used.)
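For reference, this is the operation being asked about, written as a plain sketch rather than an optimized routine. The 4.12 fixed-point layout, the row-vector convention and all names are assumptions; a 68k version would be a chain of 16x16 -> 32-bit multiplies followed by a shift:

```python
FRAC_BITS = 12                 # assumed 4.12 fixed-point layout
ONE = 1 << FRAC_BITS

def to_fixed(x):
    """Convert a float to fixed point (setup/testing only)."""
    return int(round(x * ONE))

def vec4_mul_mat4(v, m):
    """Row vector v (4 ints) times a 4x4 matrix m (list of 4 rows)."""
    out = []
    for col in range(4):
        acc = 0
        for row in range(4):
            acc += v[row] * m[row][col]    # one multiply, 16 in total
        out.append(acc >> FRAC_BITS)       # drop the extra fractional bits
    return out

identity = [[ONE if r == c else 0 for c in range(4)] for r in range(4)]
translate = [row[:] for row in identity]
translate[3][:3] = [to_fixed(1.0), to_fixed(-2.0), to_fixed(0.5)]

v = [to_fixed(0.25), to_fixed(0.5), to_fixed(-1.0), ONE]
print(vec4_mul_mat4(v, translate))
```

That is 16 multiplies and 12 adds per vertex in the general case, so the per-frame vertex budget is roughly the multiply budget divided by 16, before any projection divide.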
Old 22 April 2020, 12:10   #108
roondar
Registered User
 
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,430
Quote:
Originally Posted by ReadOnlyCat View Post
I have never coded for the Saturn, only the PS1, but reading the official documents leaves me with the impression that the Saturn is indeed more powerful.
Moreover, the PS1's CPU alone has to handle both projections and game code, while the Saturn can dedicate one of its CPUs solely to 3D projections.
But anyway, let us not digress.
I agree we should not digress, so I'll leave it at this: the real world results speak for themselves. PS1 3D games are generally more impressive than Saturn ones. To me, that shows that spec sheets don't tell the whole story here.
Old 22 April 2020, 13:37   #109
chb
Registered User
 
Join Date: Dec 2014
Location: germany
Posts: 439
Quote:
Originally Posted by ReadOnlyCat View Post
So, has anyone written a v4 x M4x4 multiplication routine yet?
How many of these can you run per frame?

(Obviously, we are talking 16 bit fixed point maths, this is what the Saturn and PS1 used.)
When I look at that game, it seems to me that apart from the cut scenes (and maybe replays) one might get away with rotation around only one axis (y) plus zoom, maybe even simplifying perspective projection to a single zoom factor per player that's applied to all vertices. That gives you a 3x3 matrix where only 4 elements differ from 0 or 1, so only 4 multiplications to transform a 3D vector instead of 16. Plus translation, but that is just an addition.

Having only one axis of rotation and no perspective projection would also give you the possibility to pre-compute a ton of things like surface normals for lighting, backface culling etc.
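A quick sketch of the reduced transform, to make the multiply count concrete. The 4.12 fixed-point format and the function name are assumptions, and only the rotation is shown; in this sketch, scaling y by the zoom factor would cost one further multiply unless it is folded in elsewhere:

```python
import math

FRAC_BITS = 12                 # assumed 4.12 fixed-point layout
ONE = 1 << FRAC_BITS

def rotate_y(vertex, cos_a, sin_a, translation=(0, 0, 0)):
    """Rotate about the y axis with 4 multiplies, then translate with adds."""
    x, y, z = vertex
    tx, ty, tz = translation
    xr = (x * cos_a + z * sin_a) >> FRAC_BITS    # 2 multiplies
    zr = (z * cos_a - x * sin_a) >> FRAC_BITS    # 2 multiplies
    return (xr + tx, y + ty, zr + tz)            # y passes through untouched

angle = math.pi / 2
cos_a = int(round(math.cos(angle) * ONE))
sin_a = int(round(math.sin(angle) * ONE))

rotated = rotate_y((100, 50, 0), cos_a, sin_a, translation=(10, 0, 0))
print(rotated)
```

Since cos_a and sin_a are constant for a whole fighter per frame, they would come from a lookup table once per frame rather than being computed per vertex.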

Anyway, I had only a look at some Youtube clips of VF and may be entirely wrong.

Last edited by chb; 22 April 2020 at 13:42.
Old 22 April 2020, 14:41   #110
AmigaHope
Registered User
 
Join Date: Sep 2006
Location: New Sandusky
Posts: 943
Quote:
Originally Posted by ReadOnlyCat View Post
Not necessarily, interpolation could be used to reduce the amount of stored positions and in any case we do not have a choice.
That's precisely the sort of overhead that we don't have the oomph to do on 68030.

Quote:
We are off topic here, but bus contention is only an issue once the working set no longer fits in the 4 KB cache, and 4 KB holds a lot of 4x4 matrices and v4 vectors to multiply together. If the two processors can be synchronized to work out of phase (one reads the bus while the other computes) then this would not be an issue at all.
Anyway, enough fantasizing about the Saturn, back to the Amiga.
Well yeah, that was one of the big challenges devs faced -- keeping everything in that cache. My main point is that a 68060 -- which has execution times similar to the SH-2 -- clocked approx twice as fast as the SH-2 would still be faster than both SH-2s combined, by virtue of smoother memory access and the fact that the 060 can issue two instructions per cycle (if they're the right instructions).

Quote:
Others on the EAB have shown that C2P on a 50MHz 030+ can be done at CHIP bandwidth saturation speeds so if one was willing to halve the FPS count it would be possible to render in chunky Fast RAM and C2P efficiently to Chip RAM.
Yes this is true, but the key thing you said there is "halve the fps". The problem is (mostly) not the C2P, but the chipmem bandwidth. Even if AGA had a chunky pixel mode it would still be effing slow.

I'm honestly surprised that no accelerator manufacturers tried including a chipmem-writeback buffer. Obviously you wouldn't want it enabled for most writes but it would be cool to have a special-purpose buffer that you could write to and say "copy this to chipmem in the background and set an interrupt when you're done" for software that was aware of it.

Quote:
I do not know if this would be the most efficient use of CPU though. Sure, the Blitter is slower at filling RAM than a 50MHz 030 but this is the worst possible use of that CPU: better use it to project vertices while the Blitter is busy being a Blitter.
It's not that simple though, the blitter is pretty dumb and you have to queue its operations. If you use the CPU to queue it you have to have an extremely complex routine that interleaves with whatever other code your game is doing and doesn't attempt to write to chip until a blit is over but can come back and service the blitter again the instant it's ready. i.e. your transform routine has to run *in the same code* as your rasterization routine.

For simple scenes you can build a copperlist to make a more complex queue but that's way into the weeds and is no good for what we're trying to do.

In the end, given that the CPU has more bandwidth to the bitmap than the blitter does, doing a simple CPU alternating geometry + rasterization loop is 1000X simpler to code and not really any slower (and potentially faster).

Quote:
I have never coded for the Saturn, only the PS1, but reading the official documents leaves me with the impression that the Saturn is indeed more powerful.
Moreover, the PS1's CPU alone has to handle both projections and game code, while the Saturn can dedicate one of its CPUs solely to 3D projections.
But anyway, let us not digress.
The Saturn's rasterizer (VDP-1) is indeed more powerful than the PS1's (GPU) in terms of the number of textured polys it can throw around. VDP-1 is insanely powerful and fast. You build it a spritelist each frame and then it displays them (polys are rotated/scaled sprites). It's a sprite monster that wound up being good for both 2D and 3D games.

The PS1 had the GTE though, a dedicated 3D transform engine, so the CPU could focus on game code. The GTE was easy to program for directly, and also had optimized libraries made for it early on if you didn't want to write your own engine. The Saturn only has a second CPU and a generic DSP, both of which are hard to program for (especially the DSP), and both of which were slower than the PS1's GTE at geometry.

To get better 3D out of the Saturn you had to run transform threads in parallel on dissimilar processors, while figuring out a sane way to balance the workload between them, and very few games did this. The crappiest games didn't even try to use the second CPU, and all geometry was run with game code on the first CPU.

It's sort of digressing but it does also help illustrate the problem on the Amiga. Using the Amiga's built-in chipset, we have to do everything in software using a single CPU into a slow framebuffer, or we have to use the CPU to manage an even slower blitter queue while trying to squeeze out spare cycles for other stuff while still servicing it.

Virtua Fighter 32X is 100% software rendered (aside from the backgrounds and status bar overlays) but it has the benefit of having a much faster framebuffer to render to. Even then the result is just "good", any slower and it would be pretty poor.
Old 24 April 2020, 08:19   #111
ReadOnlyCat
Code Kitten
 
Join Date: Aug 2015
Location: Montreal/Canadia
Age: 52
Posts: 1,178
Quote:
Originally Posted by AmigaHope View Post
In the end, given that the CPU has more bandwidth to the bitmap than the blitter does, doing a simple CPU alternating geometry + rasterization loop is 1000X simpler to code and not really any slower (and potentially faster).
You may very well be right.
Although I suspect the Blitter could still be useful to do some menial tasks in the background while the CPU is busy with non-3D stuff.

Quote:
Originally Posted by AmigaHope View Post
It's sort of digressing but it does also help illustrate the problem on the Amiga. Using the Amiga's built-in chipset, we have to do everything in software using a single CPU into a slow framebuffer, or we have to use the CPU to manage an even slower blitter queue while trying to squeeze out spare cycles for other stuff while still servicing it.
I don't know that building the Blitter queue would be that complex, if it is already pre-stored in memory, the CPU only needs to feed it the input parameters for each blit. I suspect the Blitter would be most efficient for operations involving all four channels so maybe it is possible to task it only with 2D effects overlaid on top of the CPU rendered framebuffer (sparks, particles, etc.).

In any case, this is just speculation (from me).
What really needs to be estimated first is how many vertices can realistically be projected per frame.
Old 24 April 2020, 16:14   #112
roondar
Registered User
 
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,430
It's not really accurate to say that a CPU has more bandwidth to chip memory than the Blitter. Best-case bandwidth for both is identical (~7MB/sec), but many CPUs/turbo boards don't manage to reach that best case. Real-world bandwidth of the CPU is therefore often somewhat lower than the Blitter's.

The only reason it seems to be the case that the CPU has higher bandwidth is because it can use both fast and chip memory, while the Blitter is always using just chip memory. But this is a "trick of the mind": the CPU will still never exceed the ~7MB/sec to chip memory even if part of the operation is done using much faster fast memory.

Not only that, but the CPU can't use all chip memory cycles while the Blitter can, which might make using the Blitter at the same time the CPU is working a good idea. Even a basic A1200 without fast memory can already do this to increase total bandwidth to chip memory.

Last edited by roondar; 24 April 2020 at 16:34.
Old 24 April 2020, 16:48   #113
Gorf
Registered User
 
 
Join Date: May 2017
Location: Munich/Bavaria
Posts: 2,424
Quote:
Originally Posted by roondar View Post
It's not really accurate to say that a CPU has more bandwidth to chip memory than the Blitter. Best-case bandwidth for both is identical (7MB/sec), but many CPUs/turbo boards don't manage to reach that best case. Real-world bandwidth of the CPU is therefore often somewhat lower than the Blitter's....

Even a basic A1200 without fast memory can already do this to increase total bandwidth to chip memory.
I think it is 7MB/sec for the CPU on the A1200, since it can access 32 bits at once (and a little bit less on the A3000 and A4000 because of synchronization costs).
And 3.5MB/sec on an A500/A1000/A2000.

But the AGA Blitter is still only 16 bits wide... so even if it gets all cycles and the CPU gets none, it has only 3.5MB/sec... plus some more if you use only very few colors (16 colors in hires or >64 colors in lowres).

But still: on 32-bit Amigas the Blitter has less chip RAM bandwidth than the CPU.
Old 24 April 2020, 17:13   #114
roondar
Registered User
 
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,430
Quote:
Originally Posted by Gorf View Post
I think it is 7MB/sec for the CPU on the A1200, since it can access 32 bits at once (and a little bit less on the A3000 and A4000 because of synchronization costs).
And 3.5MB/sec on an A500/A1000/A2000.
True, but only in the best case (i.e. when using a well designed accelerator).

Quote:
But the AGA Blitter is still only 16 bits wide... so even if it gets all cycles and the CPU gets none, it has only 3.5MB/sec... plus some more if you use only very few colors (16 colors in hires or >64 colors in lowres).

But still: on 32-bit Amigas the Blitter has less chip RAM bandwidth than the CPU.
Nope. The Blitter can access all cycles (and therefore can reach ~7MB/sec bandwidth: ~3.5M cycles@16 bits), the CPU can only access every other cycle at maximum. All cycles @16bits is equal to half of cycles @ 32 bits

Edit: DMA contention does play a role here, but even then the Blitter will generally not lose 50% of cycles on AGA* and given enough colours both CPU and Blitter are affected in a similar (though not identical) fashion.

*) 320x256x8 bitplanes only takes roughly 1/7th of chip memory's total bandwidth on AGA.

Last edited by roondar; 24 April 2020 at 17:21.
Old 24 April 2020, 17:21   #115
Gorf
Registered User
 
 
Join Date: May 2017
Location: Munich/Bavaria
Posts: 2,424
Quote:
Originally Posted by roondar View Post
True, but only in the best case (i.e. when using a well designed accelerator).


Nope. The Blitter can access all cycles (and therefore can reach 7MB/sec bandwidth), the CPU can only access every other cycle at maximum. All cycles @16bits is equal to half of cycles @ 32 bits
only if you have a blank screen ... as soon as you have display DMA the Blitter can no longer access all cycles.
But the CPU can still access half of the cycles with double the width...

So: the Blitter has always LESS bandwidth!

Quote:
*) 320x256x8 bitplanes only takes roughly 1/7th of chip memory's total bandwidth on AGA.
Shouldn't it be a quarter of all chip RAM cycles (with "doubleCAS" 64-bit fetches)?

Leaving three quarters to the Blitter (in nasty mode)

still: 3/4 of 7MB/sec = 5.25MB/sec is less than the CPU Bandwidth of around 7 MB/sec in a A1200

Last edited by Gorf; 24 April 2020 at 17:31.
Old 24 April 2020, 17:33   #116
roondar
Registered User
 
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,430
Quote:
Originally Posted by Gorf View Post
only if you have a blank screen ... as soon as you have display DMA the Blitter can no longer access all cycles.
But the CPU can still access half of the cycles with double the width...

So: the Blitter has always LESS bandwidth!
This is false.

Whether or not it has more/less cycles depends on a variety of factors. Assuming AGA, if you have a small number of bitplanes and a very efficient accelerator (most are not) then yes, the CPU will exceed the Blitter speed. But note this will be nowhere near the 2x speed you claimed before, more like 5-10% (dependent on number of bitplanes displayed).

However, if you increase the number of bitplanes, the CPU advantage effectively diminishes to the point of vanishing, as display DMA will start to steal cycles from it at a rate comparable to what it steals from the Blitter. Again, assuming AGA and a "perfect" accelerator: at 8 bitplanes, there is effectively no difference. At 7 bitplanes, the difference is only about 2%. At 6 bitplanes it's closer to 3.6%. At 5 bitplanes it's about 5.5%. Etc.

Note that all this assumes the accelerator you're using hits every cycle to chip memory. Many don't. Even a 5% inefficiency here is enough to basically remove any advantage the CPU has.

Quote:
Shouldn't it be a quarter of all chip RAM cycles (with "doubleCAS" 64-bit fetches)?
No, it's a quarter of all chip ram cycles where the display is fetching. Which is a lot less. About 1/7th for a 320x256x8 display.

Edit: here's some cycle math to show what I mean. The Amiga chipset (PAL) has 70512 DMA cycles per frame available to it. Of these, you lose 1152 to memory refresh. This leaves 69360 cycles. A 320x256x8 (4x fetch) display uses (320/64)*256*8=10240 DMA cycles. This is about 1/7th of the total. Neither CPU nor Blitter can access these cycles because the 8 bitplane fetch means all cycles in their respective slots are taken, leaving none for the Blitter or CPU to interleave.
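The cycle math above can be reproduced in a few lines. This only re-does the arithmetic with the figures quoted in the post; the function and constant names are illustrative:

```python
TOTAL_DMA_CYCLES = 70512      # per PAL frame, as quoted in the post
REFRESH_CYCLES = 1152         # lost to memory refresh, as quoted in the post
usable = TOTAL_DMA_CYCLES - REFRESH_CYCLES

def bitplane_fetch_cycles(width, height, planes, pixels_per_fetch=64):
    """DMA cycles spent fetching bitplane data for one frame (4x fetch = 64 pixels)."""
    return (width // pixels_per_fetch) * height * planes

display = bitplane_fetch_cycles(320, 256, 8)
share = display / usable

print(usable, display, round(share, 3), round(1 / share, 1))
```

Changing `planes` shows how the display's share of the frame shrinks with fewer bitplanes.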
Quote:
still: 3/4 of 7MB/sec = 5.25MB/sec is less than the CPU Bandwidth of around 7 MB/sec in a A1200
It's closer to 1/7th as I pointed out. And the CPU loses a very similar chunk of performance in this scenario, so it won't win.

Last edited by roondar; 24 April 2020 at 17:41.
Old 24 April 2020, 17:40   #117
Gorf
Registered User
 
 
Join Date: May 2017
Location: Munich/Bavaria
Posts: 2,424
Quote:
Originally Posted by roondar View Post
This is false.

Whether or not it has more/less cycles depends on a variety of factors. Assuming AGA, if you have a small number of bitplanes and a very efficient accelerator (most are not) then yes, the CPU will exceed the Blitter speed. But note this will be nowhere near the 2x speed you claimed before, more like 5-10% (dependent on number of bitplanes displayed).
Highres screenmode with 8 bitplanes and the display DMA will eat up half of the available cycles.
So the Blitter has the same amount of cycles left the CPU would have.
Only the Blitter is 16bit wide, while the CPU is 32bit wide.
Therefore the CPU has double the bandwidth the Blitter has in this case.
Old 24 April 2020, 17:44   #118
roondar
Registered User
 
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,430
Quote:
Originally Posted by Gorf View Post
Highres screenmode with 8 bitplanes and the display DMA will eat up half of the available cycles.
So the Blitter has the same amount of cycles left the CPU would have.
Only the Blitter is 16bit wide, while the CPU is 32bit wide.
Therefore the CPU has double the bandwidth the Blitter has in this case.
We were talking about porting Virtua Fighter to the Amiga. This is obviously not going to be done in hires/superhires, and thus I didn't account for that.

Also, the Blitter only loses cycles while the bitplane data is being fetched, not across all cycles. That really does make a big difference. As does the fact that 8 bitplane fetches take all their cycles in their "blocks", so the CPU doesn't get better interleaving and effectively loses the same "chunk of performance" the Blitter does.

Edit: I suggest we move any further talk on Blitter vs CPU cycles somewhere else, I feel it's getting a bit too far off the topic at hand. More than willing to continue talking about it if you want, though.

Last edited by roondar; 24 April 2020 at 17:50.
Old 24 April 2020, 17:50   #119
Gorf
Registered User
 
 
Join Date: May 2017
Location: Munich/Bavaria
Posts: 2,424
Quote:
Originally Posted by roondar View Post
It's closer to 1/7th as I pointed out. And the CPU loses a very similar chunk of performance in this scenario, so it won't win.
How so?
Looking at the RKM that can't be:
On OCS/ECS, 4 bitplanes (16 colors) take exactly half of the available cycles.

On AGA we can fetch 4x as much in one go... but we now want double the bitplanes.
2/4=0.5
So we need 0.5 the DMA time we needed on ECS.
On ECS we needed 50% of all cycles.
So we need now 25% of all cycles (for LowRes@256)
Old 24 April 2020, 17:57   #120
roondar
Registered User
 
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,430
Quote:
Originally Posted by Gorf View Post
How so?
Looking at the RKM that can't be:
On OCS/ECS, 4 bitplanes (16 colors) take exactly half of the available cycles.

On AGA we can fetch 4x as much in one go... but we now want double the bitplanes.
2/4=0.5
So we need 0.5 the DMA time we needed on ECS.
On ECS we needed 50% of all cycles.
So we need now 25% of all cycles (for LowRes@256)
Maybe this helps?
  1. You are correct that OCS/ECS 4 bitplanes take half of available cycles. In fact, all your numbers on cycles are correct.
  2. However... you are not correct that Amiga bitplane fetches happen all the time. They only happen when they are needed for the display, which is only a certain part of the total time in a frame.
  3. The Blitter and CPU are on "even footing" for the part of the frame where no display fetches happen (during both horizontal and vertical border/blanking), which is why the total number of cycles the Blitter loses vs the CPU is so much lower than you expect.
But again, perhaps we should move this to a different thread.
 

