English Amiga Board

Amiga Games I'm willing to fund the development of (https://eab.abime.net/showthread.php?t=95987)

ImmortalA1000 08 June 2022 01:25

Quote:

Originally Posted by Adropac2 (Post 1545482)
I recall one of the coders of Fractalus, or perhaps of another of the three games, saying essentially that there wouldn't be any especially great gains from the difference in power. To be fair, I can see his logic, because I would guess there's really not much that only a little extra power could bring to the engine. It still would have been good to have seen Fractalus on Amiga, though.

That's because Rescue on Fractalus, and therefore the rest of the fractal game engines, are the brainchild of none other than Loren Carpenter, the God of 3D fractals (he invented the whole idea in the late 70s using Boeing mainframes), and as far as I know he never came back to Lucasfilm or wrote any other computer/console games. Either way, like I said above, that little bonus in Masterblazer on ST/Amiga was not really a serious effort and not that impressive. The 68000 is the best 16-bit CPU being put into computers of the mid 1980s and significantly more powerful than what was in the C64, Amstrad CPC, Atari 800, Apple II etc. Even without throwing extra CPU at it you still have the subtlety a Copper would bring to the table. Combine that with 5 bitplanes and the mind boggles what a genius like Loren Carpenter could have done with an Amiga 1000. He really had no interest though: he was doing George Lucas a favour, with a bit of personal challenge when the Lucasfilm guys asked 'is it possible', but ultimately he was only interested in cutting-edge computer graphics, not the home gaming business.

VladR 11 June 2022 23:53

Quote:

Originally Posted by ImmortalA1000 (Post 1537717)
I was always amazed that there was never an Amiga port of Rescue on Fractalus.

That's because RoF was tailor-designed around Atari 400/800 HW capabilities. It uses its architecture to the max.
Its viewport is 160x48. The lowest res on Amiga is 320x200, so the viewport would be 320x96. That's 4x as many pixels, right there.

Also, I'm pretty sure lots of Atari coders tried a similar engine on OCS and when it ran like crap, they just discarded it. Amiga arch with bitplanes is horrible for a game like this. The 7 MHz doesn't compensate for the difference in resolution, let alone the additional bitplane cost, which is non-existent on Atari.
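
The pixel counts behind these figures are easy to check. A minimal sketch, using only the resolutions quoted in this thread; the constant names and the helper `pixel_ratio` are illustrative, not from any post:

```python
# Viewport sizes quoted in the thread, as (width, height).
ATARI_ROF_VIEWPORT = (160, 48)   # RoF viewport on Atari 400/800
AMIGA_HALF_HEIGHT = (320, 96)    # the same viewport at 320 pixels wide
AMIGA_FULLSCREEN = (320, 200)    # full 320x200 screen


def pixel_ratio(a, b):
    """Return how many times as many pixels viewport `a` has as viewport `b`."""
    return (a[0] * a[1]) / (b[0] * b[1])


print(pixel_ratio(AMIGA_HALF_HEIGHT, ATARI_ROF_VIEWPORT))           # 4.0
print(round(pixel_ratio(AMIGA_FULLSCREEN, ATARI_ROF_VIEWPORT), 2))  # 8.33
```

So the half-height viewport is exactly 4x the Atari one, and a full 320x200 screen is a little over 8x.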

Quote:

Originally Posted by buzzybee (Post 1545369)
I seem to remember an interview with Factor-5-guys about them trying to port RoF to Amiga, but did not succeed due to the hardware lacking in computational power. The "Masterblazer"-intro shows the proof-of-concept (or rather, not-proof) they came up with, have a look at 1:25 mins:

https://youtu.be/pHSp63AGG0o?t=87

I don't understand. How is the above "not-proof" ? Compared to 160x48 on Atari, this is 320x200, so about 8x more pixels. And it runs at higher framerate. MasterBlazer is nothing short of jaw-dropping.

But, I realize that to consider it jaw-dropping, one would have to have experience on both Atari and Amiga and not many people fulfill that condition...

Quote:

Originally Posted by Adropac2 (Post 1545482)
I recall one of the coders of Fractalus, or perhaps of another of the three games, saying essentially that there wouldn't be any especially great gains from the difference in power. To be fair, I can see his logic, because I would guess there's really not much that only a little extra power could bring to the engine. It still would have been good to have seen Fractalus on Amiga, though.

As MasterBlazer proves, if they cut the vertical resolution in half (cockpit view), then the framerate would be even higher than it is now. Certainly more than playable.
RoF doesn't need more than 8 fps anyway. Anything above 8 fps is just a waste of performance that could rather be used for better visuals or physics.

Quote:

Originally Posted by ImmortalA1000 (Post 1549092)
I did see that a long time ago, it is a very basic routine and given the 68000 at 7 MHz is probably 500% better than even the top end spec (1.79 MHz of the Atari 800) it really wasn't good.

No, it isn't 500% better for RoF's ORA/EOR Filler. Far from it.
Even if the only difference was the resolution (160x96) vs (320x200), that's already a factor of 4x.

The second, much bigger issue is that the 68000 is very inefficient in ops per cycle compared to the 6502, due to its architecture. Please don't give me some synthetic benchmark or edge case where you can get more ops from the 68000, as we are talking about RoF here.

Sure, you have 16 registers and a shitton of addressing modes on the 68000. But as RoF proves, it's not needed. 2 index registers and 1 math register is clearly all that RoF needs.

On the 6502 an amazing number of ops take just 2 or 3 cycles each. Not the case for the 68000.

Fundamentally, frequency is irrelevant. What matters is how many ops you can do per frame (1/60 s). For what RoF needs, the 6502 can easily beat a 7 MHz 68000.
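
One way to put numbers on the "ops per frame" argument is to ask how many cycles the 68000 may spend on a step that costs the 6502 2 or 3 cycles before it falls behind. A minimal sketch under stated assumptions: the clock figures are the rounded ones used in this thread, DMA stealing is ignored, and the function names are hypothetical:

```python
ATARI_CLOCK_HZ = 1_790_000   # 6502 in the Atari 800, as quoted in the thread
AMIGA_CLOCK_HZ = 7_160_000   # 68000 in an NTSC Amiga, as quoted in the thread


def cycles_per_frame(clock_hz, fps=60):
    """Raw CPU cycles in one frame, before any DMA stealing."""
    return clock_hz / fps


def break_even_cycles(cycles_on_6502):
    """Cycles the 68000 may spend on the same step without falling behind."""
    return cycles_on_6502 * AMIGA_CLOCK_HZ / ATARI_CLOCK_HZ


print(int(cycles_per_frame(ATARI_CLOCK_HZ)))       # 29833
print(int(cycles_per_frame(AMIGA_CLOCK_HZ)))       # 119333
print(break_even_cycles(2), break_even_cycles(3))  # 8.0 12.0
```

On raw clock alone the 68000 has a 4x cycle budget, so the argument comes down to whether the equivalent 68000 sequence fits in 8 to 12 cycles.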

Amiga is great in Copper+Sprites+Blitter. But SW rasterizing sucks big time (understandably). RoF is using XOR/ORA filler.


I fully admit that I was spoiled by the Jaguar's architecture. It has a 68000 at 13.3 MHz, but its framebuffers are chunky, so it can deliver an oomph per MHz that is utterly impossible on Amiga with its bitplanes.

Hell, I have been recently rewriting lots of 3D routines from Atari 800 and running them under OCS and it's incredible how much the little 6502 can do in one frame. My appreciation for it rose substantially.


EDIT: But most people see things like:
1.79 MHz vs 7.16 MHz
8-bit vs 16/32-bit
And think - "ooooh, that's just so.much.faster". Right :-D

VladR 12 June 2022 00:05

Quote:

Originally Posted by ImmortalA1000 (Post 1549093)
combine that with 5 bitplanes and the mind boggles what a genius like Loren Carpenter could have done with an Amiga 1000

5 bitplanes on OCS ? The CPU throughput would be further butchered due to DMA to about ~50%.

That's effectively ~3.58 MHz (and yes, I am aware that mul and div and other ops could run while DMA is blocking CPU, but those aren't needed for our use case here - RoF)
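
The ~3.58 MHz figure is simply the clock scaled by the share of bus cycles left to the CPU. A minimal sketch, assuming the poster's ~50% estimate; real contention depends on what the code is accessing, and `effective_mhz` is a hypothetical helper:

```python
def effective_mhz(clock_mhz, cpu_share):
    """Clock rate the CPU effectively sees when it only gets `cpu_share` of the cycles."""
    return clock_mhz * cpu_share


print(effective_mhz(7.16, 0.50))  # 3.58
```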

Now Jaguar, with 13.3 MHz 68000, 26.6 MHz GPU, 26.6 MHz DSP, 64-bit Blitter and a linear chunky framebuffer - yes that's a different ballgame...

mcgeezer 12 June 2022 00:13

Quote:

Originally Posted by VladR (Post 1549643)
That's because RoF was tailor-designed around Atari 400/800 HW capabilities. It uses its architecture to the max.
Its viewport is 160x48. The lowest res on Amiga is 320x200, so the viewport would be 320x96. That's 4x as many pixels, right there.

Also, I'm pretty sure lots of Atari coders tried a similar engine on OCS and when it ran like crap, they just discarded it. Amiga arch with bitplanes is horrible for a game like this. The 7 MHz doesn't compensate for the difference in resolution, let alone the additional bitplane cost, which is non-existent on Atari.

I don't understand. How is the above "not-proof" ? Compared to 160x48 on Atari, this is 320x200, so about 8x more pixels. And it runs at higher framerate. MasterBlazer is nothing short of jaw-dropping.

But, I realize that to consider it jaw-dropping, one would have to have experience on both Atari and Amiga and not many people fulfill that condition...

As MasterBlazer proves, if they cut the vertical resolution in half (cockpit view), then the framerate would be even higher than it is now. Certainly more than playable.
RoF doesn't need more than 8 fps anyway. Anything above 8 fps is just a waste of performance that could rather be used for better visuals or physics.


No, it isn't 500% better for RoF's ORA/EOR Filler. Far from it.
Even if the only difference was the resolution (160x96) vs (320x200), that's already a factor of 4x.

The second, much bigger issue is that the 68000 is very inefficient in ops per cycle compared to the 6502, due to its architecture. Please don't give me some synthetic benchmark or edge case where you can get more ops from the 68000, as we are talking about RoF here.

Sure, you have 16 registers and a shitton of addressing modes on the 68000. But as RoF proves, it's not needed. 2 index registers and 1 math register is clearly all that RoF needs.

On the 6502 an amazing number of ops take just 2 or 3 cycles each. Not the case for the 68000.

Fundamentally, frequency is irrelevant. What matters is how many ops you can do per frame (1/60 s). For what RoF needs, the 6502 can easily beat a 7 MHz 68000.

Amiga is great in Copper+Sprites+Blitter. But SW rasterizing sucks big time (understandably). RoF is using XOR/ORA filler.


I fully admit that I was spoiled by the Jaguar's architecture. It has a 68000 at 13.3 MHz, but its framebuffers are chunky, so it can deliver an oomph per MHz that is utterly impossible on Amiga with its bitplanes.

Hell, I have been recently rewriting lots of 3D routines from Atari 800 and running them under OCS and it's incredible how much the little 6502 can do in one frame. My appreciation for it rose substantially.


EDIT: But most people see things like:
1.79 MHz vs 7.16 MHz
8-bit vs 16/32-bit
And think - "ooooh, that's just so.much.faster". Right :-D

Uhuh https://www.youtube.com/watch?v=lJyir-XEDwg

Quote:

Originally Posted by VladR (Post 1549645)
5 bitplanes on OCS ? The CPU throughput would be further butchered due to DMA to about ~50%.

That's effectively ~3.58 MHz (and yes, I am aware that mul and div and other ops could run while DMA is blocking CPU, but those aren't needed for our use case here - RoF)

Now Jaguar, with 13.3 MHz 68000, 26.6 MHz GPU, 26.6 MHz DSP, 64-bit Blitter and a linear chunky framebuffer - yes that's a different ballgame...

Now I reckon you're Kieran Hawken, or Ace Rimmer. :D

VladR 12 June 2022 00:19

I wonder what OCS game has a 3D terrain that people would consider as good enough, considering the negative attitude towards MasterBlazer's terrain.

TCD 12 June 2022 00:43

Quote:

Originally Posted by VladR (Post 1549649)
I wonder what OCS game has a 3D terrain that people would consider as good enough, considering the negative attitude towards MasterBlazer's terrain.

Hunter and Midwinter (2) spring to mind.

mcgeezer 12 June 2022 00:44

Quote:

Originally Posted by VladR (Post 1549649)
I wonder what OCS game has a 3D terrain that people would consider as good enough, considering the negative attitude towards MasterBlazer's terrain.

Look, you’re effectively saying the Amiga can’t do ROF. ROF ran on an Atari 8-bit at about 10 FPS, which was excellent. I love the game and I play it now and again. But to say an Amiga can’t do it because it hasn’t been done before?

You’re wrong. It’s two planes with a very short display. The engine could be done. I dunno why it never appeared on Amiga/ST or other platforms, maybe a licensing thing but programmers could have done it.

Encounter probably a similar story.

VladR 12 June 2022 01:39

Quote:

Originally Posted by mcgeezer (Post 1549652)
Look, you’re effectively saying the Amiga can’t do ROF. ROF ran on an Atari 8-bit at about 10 FPS, which was excellent. I love the game and I play it now and again. But to say an Amiga can’t do it because it hasn’t been done before?

You’re wrong. It’s two planes with a very short display. The engine could be done. I dunno why it never appeared on Amiga/ST or other platforms, maybe a licensing thing but programmers could have done it.

Yes, technically it can be done (there is MasterBlazer after all and it's fullscreen!). I'm not disputing that.

It's just that this whole "500%" performance compared to a 1.79 MHz 6502 is nonsense, especially with more than 4x the pixels that the Amiga has to draw compared to the 160x48 viewport on Atari.

If fractal terrain was easy to pull off on OCS Amiga at a great framerate, surely there would be hundreds of games utilizing that by now. But it's a very different beast compared to setting a few Copper registers or moving sprites around the screen like most games do!

Amiga is amazing for many things. SW rasterizer is not one of those things...

It's unfortunate Amiga didn't have native 160x200...

Go and implement XOR/ORA filler with 2 bitplanes and then let's compare how well it fares on 68000 compared to 6502. I'm not saying the second bitplane doubles the amount of CPU work, but it's close to that for that particular stage of the rendering pipeline (which is a pretty significant chunk of whole frame time).
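
The thread never shows RoF's actual filler, so the following is only a generic XOR (EOR) fill sketch, not RoF's code. It illustrates why a second bitplane means a second pass over the same rows: edge marks are XOR-accumulated down each plane separately. Each plane is modelled as a list of row bytes (8 pixels per byte), and the function names are hypothetical:

```python
def xor_fill(plane_rows):
    """Fill downwards in place: each row becomes the XOR of itself and all rows above.

    plane_rows holds one byte per row. A 1 bit starts a filled run in that
    pixel column, and the next 1 bit further down ends it (exclusive).
    """
    acc = 0
    for y, row in enumerate(plane_rows):
        acc ^= row
        plane_rows[y] = acc
    return plane_rows


def fill_two_planes(edges_plane0, edges_plane1):
    """Two bitplanes need two separate passes over the same rows."""
    return xor_fill(edges_plane0), xor_fill(edges_plane1)


# Edge marks on rows 1 and 4, in the two middle pixel columns (0x18).
plane = [0x00, 0x18, 0x00, 0x00, 0x18, 0x00]
print([hex(b) for b in xor_fill(plane)])
# ['0x0', '0x18', '0x18', '0x18', '0x0', '0x0']
```

In this model the inner loop runs once per plane, which is the "close to double" cost the post refers to for that stage.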


Quote:

Originally Posted by mcgeezer (Post 1549652)
Encounter probably a similar story.

Maybe it was ST (and not Amiga), but just a few weeks ago I found some YT channel with Encounter and its successor (or was that predecessor?) game by Paul Woakes. Sorry, I do not recall the name right now.
A proper 16-bit version, for sure. Very well suited for the HW!

VladR 12 June 2022 01:46

Quote:

Originally Posted by mcgeezer (Post 1549652)
Encounter probably a similar story.

Looks like I was wrong. While I did watch the ST version the other day, I also did watch an Amiga version of Encounter called Backlash:

https://www.youtube.com/watch?v=mR3D...el=Mamemeister

Adropac2 12 June 2022 02:30

Reading between the lines, the article I remember wasn't necessarily about whether this type of thing could be done better, but about whether a better engine on stronger machines would really allow them to take the concept of what Fractalus, Eidolon and Koronis Rift did any further.

VladR 12 June 2022 03:37

Quote:

Originally Posted by Adropac2 (Post 1549658)
Reading between the lines, the article I remember wasn't necessarily about whether this type of thing could be done better, but about whether a better engine on stronger machines would really allow them to take the concept of what Fractalus, Eidolon and Koronis Rift did any further.

That would be an interesting read, for sure.

I don't doubt for a second that an enhanced version of the Fractalus engine would look great. Just double the grid resolution in each direction and get 4x as much detail. But it really needs more color at that point (or dithering). We can have 64 shades on OCS (at the cost of butchering the CPU due to 6 BPs) without any tricks.

The million-dollar question, however, is whether it's going to look better than a voxel engine would on the same HW at the same framerate.

Imagine Comanche on 386DX40. Same or lower framerate than Fractalus. Especially the datadisks with reflection ran at around 4 fps. Would Fractalus look better than that ? I don't think so, yet its overdraw is also substantial.

I've recently implemented Voxel terrain on Jaguar and got some hard data on overdraw and other performance characteristics that it shares with Fractalus (though that particular type of renderer is still on my to do list, for a good reason).

I simply don't believe it can trump voxel, though it does have a very clean, neat look that voxel does not...


And where exactly would we draw the HW line on Amiga ? 68060 ? At that point good luck competing with voxel engines that look great and move even better...

VladR 12 June 2022 03:45

Quote:

Originally Posted by TCD (Post 1549651)
Hunter and Midwinter (2) spring to mind.

Thank you, Midwinter 2 looks interesting, though I would hazard a guess that MasterBlazer would be more visually appealing to most people (but it's not a full game either hence its framerate would be lower with everything else in the game running during each frame).

VladR 12 June 2022 19:40

Quote:

Originally Posted by ImmortalA1000 (Post 1549092)
I did see that a long time ago, it is a very basic routine and given the 68000 at 7 MHz is probably 500% better than even the top end spec (1.79 MHz of the Atari 800) it really wasn't good.

Now, as much as I love pulling numbers out of my arse, how about we throw in some real-world numbers ? I've recently implemented my very first OCS DrawPixel routine and tested it in 6-bitplane EHB (64 colors).
So, I can actually compare it to my own DrawPixel routine on 6502.

Disclaimer:
  • It's a first-version, generic, reference rasterizer routine without any optimizations whatsoever. Wouldn't use it in a game without spending another afternoon optimizing it.
  • I can see at least 3 immediate optimizations and will get to them (sooner or later - probably much later:D)
  • I'm sure somebody can supply a number from their own optimized routine on Amiga
  • I'm not claiming my 33-cycle 6502 DrawPixel routine is the fastest, but it is what I use and for my purposes it's "good enough" (such that I can't be arsed to write a faster one, at least :D)
Code:

Both Atari/Amiga cycle budgets are NTSC.
Atari's number accounts for DMA stealing.
Both gfx modes are 4-color.

--------------------------------------------------------------------------------------
Platform  |  Frequency (MHz)  |  Frame Cycles  |  DrawPixel Cycles  |  Pixels/Frame
--------------------------------------------------------------------------------------
6502      |  1.79             |  24,186        |  33                |  732.9
68000     |  7.16             |  119,333       |  278               |  429.3
--------------------------------------------------------------------------------------

Why DrawPixel ? Well, Star Raiders is a great example of a 3D game that is a flagship Atari game and is not present on the "500%" faster computer and renders pixels as its main FX.

For Amiga to merely equalize with the uber-puny 1.79 MHz Atari (a fair ask I believe given its architectural and CPU improvements), given its 4x higher clock speed, all it has to do is render 4X more pixels per frame.
732.9 x 4 = 2,931.6 pixels per frame
119,333 / 2,931.6 = 40.7 cycles per DrawPixel routine.
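
The same arithmetic as a short sketch, so the figures can be re-run with other cycle counts. All inputs are the numbers from the table above; the constant and function names are illustrative:

```python
FRAME_CYCLES_6502 = 24_186     # NTSC, after DMA stealing (from the table)
FRAME_CYCLES_68000 = 119_333   # NTSC (from the table)
DRAWPIXEL_6502 = 33            # cycles per DrawPixel on the 6502
DRAWPIXEL_68000 = 278          # cycles per 2-bitplane DrawPixel on the 68000


def pixels_per_frame(frame_cycles, cycles_per_pixel):
    """How many DrawPixel calls fit into one frame's cycle budget."""
    return frame_cycles / cycles_per_pixel


atari = pixels_per_frame(FRAME_CYCLES_6502, DRAWPIXEL_6502)
amiga = pixels_per_frame(FRAME_CYCLES_68000, DRAWPIXEL_68000)
target = atari * 4                       # 4x the Atari's pixels per frame
budget = FRAME_CYCLES_68000 / target     # cycles allowed per DrawPixel

print(f"{atari:.1f} {amiga:.1f} {target:.1f} {budget:.1f}")
```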


Can I please see some generic (x,y,color) 40-cycle DrawPixel (2 bpl) routine on Amiga ? Thank you !


EDIT: I spent an hour with a quick optimization and 6-Bitplane DrawPixel dropped from 1,576c to 1,108c. The 2-Bitplane one dropped from 612c to 554c. Well worth an hour. I can actually see it drawing faster under WinUAE...
EDIT2: Another hour and the 6-Bitplane DrawPixel dropped from 1,108c to 922c. The 2-Bitplane one dropped from 554c to 490c.
EDIT3: Another hour. 6-Bitplane DrawPixel dropped from 922c to 624c. The 2-Bitplane one dropped from 490c to 392c. Numbers are averages for full range of colors. Zero bits aren't forced anymore in this version. This is as far as I can get today without introducing LUT tables.
EDIT4: Another hour. 6-Bitplane DrawPixel dropped from 624c to 510c. The 2-Bitplane one dropped from 392c to 278c.

a/b 12 June 2022 20:10

Quote:

Originally Posted by VladR (Post 1549744)
Can I please see some generic (x,y,color) 40-cycle DrawPixel (2 bpl) routine on Amiga ? Thank you !

No offence man, but that's exactly how you end up with crappy atari ports to amiga. I've written at least a dozen different fx with dots (thousands of pix per frame, 50fps, and yeah that's not a proper game engine but it's far far from ~200 pix/frame) and none of them is using a generic drawpixel() because that's simply not how you do things.
If you *really* need something like that I'd use a 2^N wide screen, dump x/y into a list (one for each color), use blitter to convert x/y to bit (d0-d7) and 16-bit offset and dump the output to unrolled code, 1 instruction (or 2 for color3) per pixel.
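
The x/y to offset-and-bit conversion mentioned here is cheap on a 2^N wide screen because the row start is a shift rather than a multiply. A minimal CPU-side model of that conversion only; the 256-pixel width is an assumption, the names are hypothetical, and how the blitter would produce these values and patch unrolled code is not shown in the thread:

```python
WIDTH_LOG2 = 8                       # assumed: 256-pixel-wide bitplane (2^8)
ROW_BYTES = (1 << WIDTH_LOG2) // 8   # 32 bytes per row


def xy_to_offset_bit(x, y):
    """Return (byte offset into the bitplane, bit number 0-7) for pixel (x, y)."""
    offset = (y << (WIDTH_LOG2 - 3)) | (x >> 3)  # row start plus byte column
    bit = ~x & 7                                 # bit 7 is the leftmost pixel
    return offset, bit


print(xy_to_offset_bit(0, 0))    # (0, 7)
print(xy_to_offset_bit(9, 2))    # (65, 6)
print(xy_to_offset_bit(255, 1))  # (63, 0)
```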

VladR 12 June 2022 22:03

Quote:

Originally Posted by a/b (Post 1549745)
No offence man, but that's exactly how you end up with crappy atari ports to amiga. I've written at least a dozen different fx with dots (thousands of pix per frame, 50fps, and yeah that's not a proper game engine but it's far far from ~200 pix/frame) and none of them is using a generic drawpixel() because that's simply not how you do things.
If you *really* need something like that I'd use a 2^N wide screen, dump x/y into a list (one for each color), use blitter to convert x/y to bit (d0-d7) and 16-bit offset and dump the output to unrolled code, 1 instruction (or 2 for color3) per pixel.

Unfortunately, Star Raiders is not a compo demo with precomputed and unrolled code for each FX, but a real game where each frame you can turn anywhere in 3D space (free movement along all 3 axes), hence each dot is a generic pixel (after 3D transform).

Of course, with 0.5 MB RAM, Blitter, Copper and ~120,000 cycles per frame of 68000, there's no argument that 68000 can do some nice precomputed or real-time FX!

That being said, having written my very first OCS version of generic DrawPixel, for purposes of something like Star Raiders, it's obvious we would need multiple versions, probably sorted per color to avoid needless table look-ups and removing unneeded bit manipulations (especially for bit 0). Then again, time to sort the batches might be longer than the gains (it all depends on number of pixels), so extensive benchmarking would be needed.

But it is an interesting engineering problem, one with a plethora of different approaches on Amiga...

paraj 13 June 2022 12:51

No reason why a/b's approach wouldn't work in general, and I think with an extra blitter pass could be used with any screen width evenly divisible by 8.

A generic putpixel function hits some of the things Amiga doesn't do well, so you would probably never design something new that would make heavy use of that. Also don't know why you'd forgo using LUTs?

I'm sure it can be improved, but something like the below for interleaved bitmaps should be quite a bit faster (though still not very fast). If you don't need to clear 0 bits, the "else bclr" part can be left out for further speedup.
Code:

        ; a0=dest,d0=x,d1=y,d2=color
        ; registers modified:d0,d1,d2,a1
        ; cost: 102 (20/0)
        ; or with mulu.w/add.l: 126-156 (17/0)
        ; + 108 (21/6) @ 6 BPL
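putpixel:
        add.w  d1,d1                    ; d1 = y*2
        add.w  d1,d1                    ; d1 = y*4 (ytable entries are longwords)
        add.l  ytable(pc,d1.w),a0       ; a0 += byte offset of row y
        move.w  d0,d1                   ; keep a copy of x for the bit number
        lsr.w  #3,d0                    ; d0 = x/8 = byte column
        add.w  d0,a0                    ; a0 -> byte holding the pixel in plane 0
        not.w  d1                       ; bit 7 is the leftmost pixel, so invert x
        and.w  #7,d1                    ; d1 = 7-(x&7) = bit number
        add.w  d2,d2                    ; d2 = color*2 (word table index)
        lea    colfunctab(pc),a1
        move.w  0(a1,d2.w),d2           ; offset of colfunc<color> from colfunctab
        jmp    0(a1,d2.w)               ; tail-jump; the colfunc's rts returns to the caller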

ytable:
.ofs set 0
        rept screenh
        dc.l .ofs
.ofs set .ofs+rowbytes*screend
        endr

        ; cost: 12+16*screend (2+3*(screend-1)+4/screend)
colfunc macro
colfunc\<col>:
.cnt set 0
        rept screend
        ifne (col&(1<<.cnt))
        bset.b  d1,.cnt*rowbytes(a0)
        else
        bclr.b  d1,.cnt*rowbytes(a0)
        endc
.cnt set .cnt+1
        endr
        rts
        endm

col set 0
        rept (1<<screend)
        colfunc
col set col+1
        endr

colfuncentry macro
        dc.w    colfunc\<col>-colfunctab
        endm

colfunctab:
col set 0
        rept (1<<screend)
        colfuncentry
col set col+1
        endr

Even with this version you're not going to be drawing more than a couple of hundred pixels per frame (with a 320x256x6 display active).
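
For readers less familiar with 68000 assembly, a rough Python model of the routine above may help. It assumes the same interleaved layout (plane p of row y starts at y*rowbytes*screend + p*rowbytes) and reuses the names rowbytes, screenh, screend and ytable from the listing, with the 320x256x6 display mentioned above. `getpixel` is a hypothetical helper added only for checking. It models behaviour, not cycle cost:

```python
rowbytes, screenh, screend = 40, 256, 6   # 320x256, 6 bitplanes

# Byte offset of each row's first plane in an interleaved bitmap.
ytable = [y * rowbytes * screend for y in range(screenh)]


def putpixel(bitmap, x, y, color):
    """Set or clear the pixel's bit in every plane, like bset/bclr per plane."""
    addr = ytable[y] + (x >> 3)
    bit = ~x & 7                           # bit 7 is the leftmost pixel
    for plane in range(screend):
        if color & (1 << plane):
            bitmap[addr + plane * rowbytes] |= 1 << bit
        else:
            bitmap[addr + plane * rowbytes] &= ~(1 << bit) & 0xFF


def getpixel(bitmap, x, y):
    """Read the color index back by collecting one bit from each plane."""
    addr = ytable[y] + (x >> 3)
    bit = ~x & 7
    return sum(((bitmap[addr + p * rowbytes] >> bit) & 1) << p
               for p in range(screend))


screen = bytearray(rowbytes * screend * screenh)
putpixel(screen, 9, 2, 37)
print(getpixel(screen, 9, 2))  # 37
```

The per-plane loop is what the `colfunc` macro unrolls at assembly time, one specialised function per color.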

meynaf 13 June 2022 13:35

Quote:

Originally Posted by VladR (Post 1549744)
Now, as much as I love pulling numbers out of my arse, how about we throw in some real-world numbers ? I've recently implemented my very first OCS DrawPixel routine and tested it in 6-bitplane EHB (64 colors).
So, I can actually compare it to my own DrawPixel routine on 6502.

May we see said 6502 DrawPixel routine ? It would be more "real-world" if we had the code.
Would be interesting to see that XOR/ORA filler as well.
I've never seen 6502 code beating 68000.

a/b 13 June 2022 15:30

My point was, at least that's how I'd approach these things, if you write a game that heavily relies on drawing hundreds of pixels every frame you do not say: OK, I have a drawpixel() routine that takes care of that and we can sort it out later (even if it does work out, you will still be wasting a decent amount of milliseconds that could be better spent elsewhere).
I'd design my game around a drawmanypixels() routine, it would be tightly integrated into the game and part of design right from the start. Be it the format of bitmap (width 256, 320, 512-modulo, ..., interleaved or not, how much memory you can afford to waste), blitter available for mass processing yes/no, bus bandwidth usage e.g. is blitter running at the same time and/or high number of bitplanes (meaning cpu plotter with cycle-wise slower 1 mem access shifts/rolls becomes faster than LUTs with multiple mem accesses), how much memory can be reserved for unrolls and/or support tables so it can run reasonably fast, ...
And that's where the FX stuff comes in. Yeah, and I said already those are not actual game engines, but *many* techniques and tricks still apply, and will get you to the finish line easier than a drawpixel().
Because, unfortunately (or not), we have to deal with bitplanes. And as soon as you want to draw a large number of multi-color pixels and have to hammer individual bits you are in for a lot of hurt by doing it one by one.
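
A minimal model of the batching idea, assuming a non-interleaved 320x200 two-bitplane screen whose target pixels start cleared. The point is that the choice of which planes to touch is made once per color batch rather than once per pixel. `make_planes`, `draw_many_pixels` and `read_pixel` are hypothetical names, and this sketch says nothing about real cycle costs:

```python
WIDTH, HEIGHT, DEPTH = 320, 200, 2
ROW_BYTES = WIDTH // 8


def make_planes():
    """One bytearray per bitplane, all pixels cleared."""
    return [bytearray(ROW_BYTES * HEIGHT) for _ in range(DEPTH)]


def draw_many_pixels(planes, points_by_color):
    """Plot batches of (x, y) points grouped by color index."""
    for color, points in points_by_color.items():
        # Decided once per batch: only planes whose bit is set in `color`.
        touched = [planes[p] for p in range(DEPTH) if color & (1 << p)]
        for x, y in points:
            offset = y * ROW_BYTES + (x >> 3)
            mask = 0x80 >> (x & 7)
            for plane in touched:
                plane[offset] |= mask


def read_pixel(planes, x, y):
    """Collect one bit per plane to get the color index back."""
    offset = y * ROW_BYTES + (x >> 3)
    mask = 0x80 >> (x & 7)
    return sum(1 << p for p in range(DEPTH) if planes[p][offset] & mask)


screen = make_planes()
draw_many_pixels(screen, {1: [(0, 0)], 3: [(9, 2)], 2: [(319, 199)]})
print(read_pixel(screen, 9, 2))  # 3
```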

VladR 13 June 2022 17:04

Quote:

Originally Posted by meynaf (Post 1549834)
May we see said 6502 DrawPixel routine ? It would be more "real-world" if we had the code.
Would be interesting to see that XOR/ORA filler as well.
I've never seen 6502 code beating 68000.

Sure, I can try to dig up the 6502 routine from my sources later today or tomorrow and post it here.
Unfortunately, like I said, I have not implemented the XOR/ORA column filler used in Fractalus. Because the moment I do that, I engage in yet another rabbit hole of refactoring that will consume a few months of my limited free time. It's fairly high on my to-do list of things to try, but not in the top 3.

Quote:

Originally Posted by a/b (Post 1549846)
My point was, at least that's how I'd approach these things, if you write a game that heavily relies on drawing hundreds of pixels every frame you do not say: OK, I have a drawpixel() routine that takes care of that and we can sort it out later (even if it does work out, you will still be wasting a decent amount of milliseconds that could be better spent elsewhere).

Of course, as I already said before, I wouldn't just do that on Amiga (that's basically just a first reference rasterizer version for testing). My initial hunch was to group the pixels by the bitplane layer index as one of the first approaches to try.

I don't mind writing a dozen versions, really. Even if half of them do not turn out to be faster.
But, so far, after having spent 4 hours on optimizing, I came up with 4 versions and each was faster than the previous one.
The 6-bitplane one dropped from 1,576c down to 510c. I consider that a nice speed-up from one afternoon of effort.

Quote:

Originally Posted by a/b (Post 1549846)
I'd design my game around a drawmanypixels() routine, it would be tightly integrated into the game and part of design right from the start. Be it the format of bitmap (width 256, 320, 512-modulo, ..., interleaved or not, how much memory you can afford to waste), blitter available for mass processing yes/no, bus bandwidth usage e.g. is blitter running at the same time and/or high number of bitplanes (meaning cpu plotter with cycle-wise slower 1 mem access shifts/rolls becomes faster than LUTs with multiple mem accesses), how much memory can be reserved for unrolls and/or support tables so it can run reasonably fast, ...
And that's where the FX stuff comes in. Yeah, and I said already those are not actual game engines, but *many* techniques and tricks still apply, and will get you to the finish line easier than a drawpixel().
Because, unfortunately (or not), we have to deal with bitplanes. And as soon as you want to draw a large number of multi-color pixels and have to hammer individual bits you are in for a lot of hurt by doing it one by one.

Well, on Atari, I do the grouping because there are only 3 colors to draw and just a few pixels, so it's easy to do manually and each DrawPixel version has LUTs for a specific color mask.
That's one of the approaches to try on Amiga too.

I am certainly curious how we can use the Blitter for this scenario.

I'm loving this discussion ! So much stuff to learn here:)

VladR 13 June 2022 17:43

Quote:

Originally Posted by paraj (Post 1549831)
No reason why a/b's approach wouldn't work in general, and I think with an extra blitter pass could be used with any screen width evenly divisible by 8.

I'm not saying it wouldn't work. I haven't yet examined it. I wanna run out of CPU alternatives first and I keep getting new ideas with each optimization I do (so far 4 done and 2 more on the to-do list) :)

Quote:

Originally Posted by paraj (Post 1549831)
A generic putpixel function hits some of the things Amiga doesn't do well, so you would probably never design something new that would make heavy use of that.

OK, maybe not entirely generic.
But, if you imagine a Star Raiders starfield, the stars basically cover all XPOS and YPOS points and they can be of any color (say, 16 or 32 or 64).
So, that's kinda generic as we don't have control over where the points end up on screen. It depends on the player's input (3 axes of rotation).

We could certainly sort and group them based on color index, for example. This would avoid setting unwanted bits or even checking for 0 and bypassing the set.

Quote:

Originally Posted by paraj (Post 1549831)
Also don't know why you'd forgo using LUTs?

I didn't forgo LUTs, like I said, they're on the to-do list. Once I run out of possibilities to optimize my current version, that's when I will go for LUTs.

Quote:

Originally Posted by paraj (Post 1549831)
I'm sure it can be improved, but something like the below for interleaved bitmaps should be quite a bit faster (though still not very fast). If you don't need to clear 0 bits, the "else bclr" part can be left out for further speedup.
Code:

        ; a0=dest,d0=x,d1=y,d2=color
        ; registers modified:d0,d1,d2,a1
        ; cost: 102 (20/0)
        ; or with mulu.w/add.l: 126-156 (17/0)
        ; + 108 (21/6) @ 6 BPL
putpixel:
        add.w  d1,d1
        add.w  d1,d1
        add.l  ytable(pc,d1.w),a0
        move.w  d0,d1
        lsr.w  #3,d0
        add.w  d0,a0
        not.w  d1
        and.w  #7,d1
        add.w  d2,d2
        lea    colfunctab(pc),a1
        move.w  0(a1,d2.w),d2
        jmp    0(a1,d2.w)

ytable:
.ofs set 0
        rept screenh
        dc.l .ofs
.ofs set .ofs+rowbytes*screend
        endr

        ; cost: 12+16*screend (2+3*(screend-1)+4/screend)
colfunc macro
colfunc\<col>:
.cnt set 0
        rept screend
        ifne (col&(1<<.cnt))
        bset.b  d1,.cnt*rowbytes(a0)
        else
        bclr.b  d1,.cnt*rowbytes(a0)
        endc
.cnt set .cnt+1
        endr
        rts
        endm

col set 0
        rept (1<<screend)
        colfunc
col set col+1
        endr

colfuncentry macro
        dc.w    colfunc\<col>-colfunctab
        endm

colfunctab:
col set 0
        rept (1<<screend)
        colfuncentry
col set col+1
        endr


Thank you. I will examine this after I implement my LUT solution.
I most definitely want to have a discussion about LUTs for 6 bitplanes once I get my first LUT version up and running.

Quote:

Originally Posted by paraj (Post 1549831)
Even with this version you're not going to be drawing more than a couple of hundred pixels per frame (with a 320x256x6 display active).

Yeah, right now, my current 6-BPL version is 510c.

Given 6 bitplanes, the CPU will be at about 54% utilization after all the DMA - right ?
So, 0.54*119,333 = 64,439c available per frame, which results in 126 px (64439/510) rendered per frame.
That's actually quite nice. A starfield of ~128 stars in EHB mode is pretty good. I'm sure it will be a challenge to transform that many in the second frame even with LUTs (so that we fit into 30 fps) and fit all the game logic in there.
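
The budget above, written out so the inputs can be swapped. The 54% CPU share is the estimate from this post, the other numbers come from earlier in the thread, and the constant names are illustrative:

```python
FRAME_CYCLES = 119_333    # 68000 cycles per NTSC frame
CPU_SHARE_6BPL = 0.54     # estimated share left to the CPU with 6 bitplanes
DRAWPIXEL_6BPL = 510      # cycles per 6-bitplane DrawPixel

available = int(FRAME_CYCLES * CPU_SHARE_6BPL)   # cycles usable per frame
pixels = available // DRAWPIXEL_6BPL             # whole pixels per frame

print(available, pixels)  # 64439 126
```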

Hell, I'm sure people wouldn't complain too much if there was a 3D space game with 64 colors in 30 (or perhaps 20) fps.


All times are GMT +2.