English Amiga Board


Old 08 June 2022, 01:25   #21
ImmortalA1000
Registered User
 
Join Date: Feb 2009
Location: london/england
Posts: 1,347
Quote:
Originally Posted by Adropac2 View Post
I recall from one of the coders of Fractalus or perhaps it was another of three games, saying essentially that there wouldn't be any especially great gains from the difference in power. To be fair I can see his logic because there's really not much only a little extra power could bring I would guess to the engine. It still would have been good though to have seen Fractalus on Amiga
That's because Rescue on Fractalus, and therefore the rest of the fractal game engines, was the brainchild of none other than Loren Carpenter, the god of 3D fractals (he invented the whole idea in the late 70s using Boeing mainframes), and as far as I know he never came back to Lucasfilm or wrote any other computer/console games. Either way, like I said above, that little bonus in MasterBlazer on ST/Amiga was not a serious effort and not that impressive. The 68000 was the best 16-bit CPU being put into computers in the mid 1980s and significantly more powerful than what was in the C64, Amstrad CPC, Atari 800, Apple II etc. Even without throwing extra CPU at it, you still have the subtlety a Copper would bring to the table; combine that with 5 bitplanes and the mind boggles at what a genius like Loren Carpenter could have done with an Amiga 1000. He really had no interest though. He was doing George Lucas a favour, and taking on a bit of a personal challenge, when the Lucasfilm guys asked 'is it possible?', but ultimately he was only interested in cutting-edge computer graphics, not the home gaming business.
Old 11 June 2022, 23:53   #22
VladR
Registered User
 
Join Date: Dec 2019
Location: North Dakota
Posts: 741
Quote:
Originally Posted by ImmortalA1000 View Post
I was always amazed that there was never an Amiga port of Rescue on Fractalus.
That's because RoF was tailor-designed around Atari 400/800 HW capabilities. It uses its architecture to the max.
Its Viewport is 160x48. Lowest Res on Amiga is 320x200, so viewport would be 320x96. That's 4x more pixels, right there.

Also, I'm pretty sure lots of Atari coders tried a similar engine on OCS and when it ran like crap, they just discarded it. Amiga arch with bitplanes is horrible for a game like this. The 7 MHz doesn't compensate for difference in resolution, let alone the additional bitplane cost which is non-existent on Atari.
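The pixel-count comparison above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the three sizes are the ones quoted in this thread, and the constant names are made up for illustration.

```python
# Viewport sizes quoted in this thread, as (width, height) in pixels.
ATARI_ROF_VIEWPORT = (160, 48)   # RoF viewport on Atari, per the post
AMIGA_HALF_HEIGHT = (320, 96)    # same proportions at Amiga lores width
AMIGA_FULLSCREEN = (320, 200)    # full NTSC lores screen

def pixel_count(size):
    width, height = size
    return width * height

def ratio(a, b):
    """How many times more pixels viewport a has than viewport b."""
    return pixel_count(a) / pixel_count(b)

if __name__ == "__main__":
    print(ratio(AMIGA_HALF_HEIGHT, ATARI_ROF_VIEWPORT))            # 4.0
    print(round(ratio(AMIGA_FULLSCREEN, ATARI_ROF_VIEWPORT), 2))   # 8.33
```

So a cockpit-sized 320x96 view is exactly 4x the Atari viewport, and a fullscreen 320x200 view is a bit over 8x.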

Quote:
Originally Posted by buzzybee View Post
I seem to remember an interview with Factor-5-guys about them trying to port RoF to Amiga, but did not succeed due to the hardware lacking in computational power. The "Masterblazer"-intro shows the proof-of-concept (or rather, not-proof) they came up with, have a look at 1:25 mins:

[ Show youtube player ]
I don't understand. How is the above "not-proof" ? Compared to 160x48 on Atari, this is 320x200, so about 8x more pixels. And it runs at higher framerate. MasterBlazer is nothing short of jaw-dropping.

But, I realize that to consider it jaw-dropping, one would have to have experience on both Atari and Amiga and not many people fulfill that condition...

Quote:
Originally Posted by Adropac2 View Post
I recall from one of the coders of Fractulas or perhaps it was another of three games, saying essentially that there wouldn't be any especially great gains from the difference in power. To be fair I can see his logic because there's really not much only a little extra power could bring I would guess to the engine. It still would have been good though to have seen fractulas on Amiga
As the MasterBlazer proves, if they cut the vertical resolution to half (cockpit view), then the framerate would be even higher than it is now. Certainly more than playable.
RoF doesn't need more than 8 fps anyway. Anything above 8 fps is just waste of performance and could be rather used for better visuals or physics.

Quote:
Originally Posted by ImmortalA1000 View Post
I did see that a long time ago, it is a very basic routine and given the 68000 at 7mhz is probably 500% better than even the top end spec (1.79mhz of Atari 800) it really wasn't good.
No, it isn't 500% better for RoF's ORA/EOR Filler. Far from it.
Even if the only difference was the resolution (160x96) vs (320x200), that's already a factor of 4x.

The second, much bigger issue is that the 68000 is very inefficient in ops per cycle compared to the 6502, due to its architecture. Please don't give me some synthetic benchmark or edge case where you can get more ops from the 68000, as we are talking about RoF here.

Sure, you have 16 registers and a shitton of addressing modes on the 68000. But as RoF proves, they're not needed. 2 index registers and 1 math register is clearly all that RoF needs.

On the 6502 you can do an amazing amount of ops in just 2 or 3 cycles. Not the case for the 68000.

Fundamentally, frequency is irrelevant. What matters is how many ops you can do per frame (1/60s). For what RoF needs, it can easily beat 7 MHz 68000.
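To put rough numbers on "ops per frame", here is a minimal Python sketch. It assumes the shortest instruction takes 2 cycles on a 6502 and 4 cycles on a 68000, and it uses the NTSC frame budgets quoted later in this thread (24,186 usable cycles on Atari after DMA, 119,333 on Amiga before any bitplane DMA). It only bounds the instruction count from above; it says nothing about how much work each instruction does, and real code mixes in slower instructions.

```python
# Frame budgets (NTSC, cycles per 1/60 s) as quoted later in the thread.
FRAME_CYCLES = {"6502": 24_186, "68000": 119_333}
# Assumption: shortest instruction timings (2 cycles on 6502, 4 on 68000).
MIN_INSTRUCTION_CYCLES = {"6502": 2, "68000": 4}

def max_instructions_per_frame(cpu):
    """Upper bound on instructions per frame if each took the minimum time."""
    return FRAME_CYCLES[cpu] // MIN_INSTRUCTION_CYCLES[cpu]

if __name__ == "__main__":
    for cpu in FRAME_CYCLES:
        print(cpu, max_instructions_per_frame(cpu))
```

By this crude bound the 68000 fits roughly 2.5x as many shortest-case instructions into a frame, not 4x, before display DMA is subtracted.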

Amiga is great in Copper+Sprites+Blitter. But SW rasterizing sucks big time (understandably). RoF is using XOR/ORA filler.


I fully admit that I was spoiled by Jaguar's architecture. It has a 68000 at 13.3 MHz, but its framebuffers are chunky. So it can deliver an oomph per MHz that is utterly impossible on Amiga with its bitplanes.

Hell, I have recently been rewriting lots of 3D routines from Atari 800 and running them under OCS, and it's incredible how much the little 6502 can do in one frame. My appreciation for it rose substantially.


EDIT: But most people see things like:
1.79 MHz vs 7.16 MHz
8-bit vs 16/32-bit
And think - "ooooh, that's just so.much.faster". Right :-D

Last edited by VladR; 12 June 2022 at 00:09. Reason: typos
Old 12 June 2022, 00:05   #23
VladR
Registered User
 
Join Date: Dec 2019
Location: North Dakota
Posts: 741
Quote:
Originally Posted by ImmortalA1000 View Post
combine that with 5 bitplanes and the mind boggles what a genius like Loren Carpenter could have done with an Amiga 1000
5 bitplanes on OCS ? The CPU throughput would be further butchered due to DMA to about ~50%.

That's effectively ~3.58 MHz (and yes, I am aware that mul and div and other ops could run while DMA is blocking CPU, but those aren't needed for our use case here - RoF)

Now Jaguar, with 13.3 MHz 68000, 26.6 MHz GPU, 26.6 MHz DSP, 64-bit Blitter and a linear chunky framebuffer - yes that's a different ballgame...
Old 12 June 2022, 00:13   #24
mcgeezer
Registered User
 
Join Date: Oct 2017
Location: Sunderland, England
Posts: 2,702
Quote:
Originally Posted by VladR View Post
That's because RoF was tailor-designed around Atari 400/800 HW capabilities. It uses its architecture to the max.
Its Viewport is 160x48. Lowest Res on Amiga is 320x200, so viewport would be 320x96. That's 4x more pixels, right there.

Also, I'm pretty sure lots of Atari coders tried a similar engine on OCS and when it ran like crap, they just discarded it. Amiga arch with bitplanes is horrible for a game like this. The 7 MHz doesn't compensate for difference in resolution, let alone the additional bitplane cost which is non-existent on Atari.

I don't understand. How is the above "not-proof" ? Compared to 160x48 on Atari, this is 320x200, so about 8x more pixels. And it runs at higher framerate. MasterBlazer is nothing short of jaw-dropping.

But, I realize that to consider it jaw-dropping, one would have to have experience on both Atari and Amiga and not many people fulfill that condition...

As the MasterBlazer proves, if they cut the vertical resolution to half (cockpit view), then the framerate would be even higher than it is now. Certainly more than playable.
RoF doesn't need more than 8 fps anyway. Anything above 8 fps is just waste of performance and could be rather used for better visuals or physics.


No, it isn't 500% better for RoF's ORA/EOR Filler. Far from it.
Even if the only difference was the resolution (160x96) vs (320x200), that's already a factor of 4x.

The second, much bigger issue is that the 68000 is very inefficient in ops per cycle compared to the 6502, due to its architecture. Please don't give me some synthetic benchmark or edge case where you can get more ops from the 68000, as we are talking about RoF here.

Sure, you have 16 registers and shitton of addressing modes on 68000. But as RoF proves, it's not needed. 2 index registers and 1 math register is clearly all that RoF needs.

On 6502 you can do amazing amount of ops in just 2 or 3 cycles. Not the case for 68000.

Fundamentally, frequency is irrelevant. What matters is how many ops you can do per frame (1/60s). For what RoF needs, it can easily beat 7 MHz 68000.

Amiga is great in Copper+Sprites+Blitter. But SW rasterizing sucks big time (understandably). RoF is using XOR/ORA filler.


I fully admit, that I was spoiled by Jaguar's architecture. It has 68000 at 13.3 MHz, but its framebuffers are chunky. So, it can deliver an oomph per 1 Mhz that is utterly impossible on Amiga with its bitplanes.

Hell, I have been recently rewriting lots of 3D routines from Atari 800 and running them under OCS and it's incredible how much can the little 6502 do in one frame. My appreciation for it rose substantially.


EDIT: But most people see things like:
1.79 MHz vs 7.16 Mhz
8-bit vs 16/32-bit
And think - "ooooh, that's just so.much.faster". Right :-D
Uhuh [ Show youtube player ]

Quote:
Originally Posted by VladR View Post
5 bitplanes on OCS ? The CPU throughput would be further butchered due to DMA to about ~50%.

That's effectively ~3.58 MHz (and yes, I am aware that mul and div and other ops could run while DMA is blocking CPU, but those aren't needed for our use case here - RoF)

Now Jaguar, with 13.3 MHz 68000, 26.6 MHz GPU, 26.6 MHz DSP, 64-bit Blitter and a linear chunky framebuffer - yes that's a different ballgame...
Now I reckon you're Kieran Hawken, or Ace Rimmer.
Old 12 June 2022, 00:19   #25
VladR
Registered User
 
Join Date: Dec 2019
Location: North Dakota
Posts: 741
I wonder what OCS game has a 3D terrain that people would consider as good enough, considering the negative attitude towards MasterBlazer's terrain.
Old 12 June 2022, 00:43   #26
TCD
HOL/FTP busy bee
 
 
Join Date: Sep 2006
Location: Germany
Age: 46
Posts: 31,535
Quote:
Originally Posted by VladR View Post
I wonder what OCS game has a 3D terrain that people would consider as good enough, considering the negative attitude towards MasterBlazer's terrain.
Hunter and Midwinter (2) spring to mind.
Old 12 June 2022, 00:44   #27
mcgeezer
Registered User
 
Join Date: Oct 2017
Location: Sunderland, England
Posts: 2,702
Quote:
Originally Posted by VladR View Post
I wonder what OCS game has a 3D terrain that people would consider as good enough, considering the negative attitude towards MasterBlazer's terrain.
Look, you’re effectively saying the Amiga can’t do ROF. ROF ran on an Atari 8-bit at about 10 FPS, which was excellent. I love the game and I play it now and again. But to say an Amiga can’t do it because it hasn’t been done before?

You’re wrong. It’s two planes with a very short display. The engine could be done. I dunno why it never appeared on Amiga/ST or other platforms, maybe a licensing thing but programmers could have done it.

Encounter probably a similar story.
Old 12 June 2022, 01:39   #28
VladR
Registered User
 
Join Date: Dec 2019
Location: North Dakota
Posts: 741
Quote:
Originally Posted by mcgeezer View Post
Look, you’re effectively saying the Amiga can’t do ROF. ROF ran on an Atari 8-bit at about 10 FPS, which was excellent. I love the game and I play it now and again. But to say an Amiga can’t do it because it hasn’t been done before?

You’re wrong. It’s two planes with a very short display. The engine could be done. I dunno why it never appeared on Amiga/ST or other platforms, maybe a licensing thing but programmers could have done it.
Yes, technically it can be done (there is MasterBlazer after all and it's fullscreen!). I'm not disputing that.

It's just that this whole "500%" performance compared to the 1.79 MHz 6502 is nonsense, especially with more than 4x the pixels that Amiga has to draw compared to the 160x48 viewport on Atari.

If fractal terrain was easy to pull off on OCS Amiga at a great framerate, surely there would be hundreds of games utilizing that by now. But it's a very different beast compared to setting a few Copper registers or moving sprites around the screen like most games do!

Amiga is amazing for many things. SW rasterizer is not one of those things...

It's unfortunate Amiga didn't have native 160x200...

Go and implement XOR/ORA filler with 2 bitplanes and then let's compare how well it fares on 68000 compared to 6502. I'm not saying the second bitplane doubles the amount of CPU work, but it's close to that for that particular stage of the rendering pipeline (which is a pretty significant chunk of whole frame time).


Quote:
Originally Posted by mcgeezer View Post
Encounter probably a similar story.
Maybe it was ST (and not Amiga), but just a few weeks ago I found some YT channel with Encounter and its successor (or was that predecessor?) game by Paul Woakes. Sorry, I do not recall the name right now.
A proper 16-bit version, for sure. Very well suited for the HW!
Old 12 June 2022, 01:46   #29
VladR
Registered User
 
Join Date: Dec 2019
Location: North Dakota
Posts: 741
Quote:
Originally Posted by mcgeezer View Post
Encounter probably a similar story.
Looks like I was wrong. While I did watch the ST version the other day, I also did watch an Amiga version of Encounter called Backlash:

[ Show youtube player ]
Old 12 June 2022, 02:30   #30
Adropac2
Zone Friend
 
Join Date: Jan 2006
Location: Kent
Age: 51
Posts: 1,057
The article I remember, if I'm reading between the lines, wasn't about whether this type of thing could be done better, but about whether a better engine on stronger machines would really allow the concept of what Fractalus, Eidolon and Koronis Rift did to be taken further by them.
Old 12 June 2022, 03:37   #31
VladR
Registered User
 
Join Date: Dec 2019
Location: North Dakota
Posts: 741
Quote:
Originally Posted by Adropac2 View Post
The article I remember, if I'm reading between the lines, wasn't about whether this type of thing could be done better, but about whether a better engine on stronger machines would really allow the concept of what Fractalus, Eidolon and Koronis Rift did to be taken further by them.
That would be an interesting read, for sure.

I don't doubt for a second that an enhanced version of the Fractalus engine would look great. Just halve the grid spacing and get 4x as much detail. But it really needs more color at that point (or dithering). We can have 64 shades on OCS (at the cost of butchering the CPU due to 6 BPs) without any tricks.

The Million dollar question is, however, if it's going to look better than a Voxel engine would on same HW and same framerate.

Imagine Comanche on 386DX40. Same or lower framerate than Fractalus. Especially the datadisks with reflection ran at around 4 fps. Would Fractalus look better than that ? I don't think so, yet its overdraw is also substantial.

I've recently implemented Voxel terrain on Jaguar and got some hard data on overdraw and other performance characteristics that it shares with Fractalus (though that particular type of renderer is still on my to do list, for a good reason).

I simply don't believe it can trump voxel, though it does have a very clean, neat look that voxel does not...


And where exactly would we draw the HW line on Amiga ? 68060 ? At that point good luck competing with voxel engines that look great and move even better...
Old 12 June 2022, 03:45   #32
VladR
Registered User
 
Join Date: Dec 2019
Location: North Dakota
Posts: 741
Quote:
Originally Posted by TCD View Post
Hunter and Midwinter (2) spring to mind.
Thank you, Midwinter 2 looks interesting, though I would hazard a guess that MasterBlazer would be more visually appealing to most people (but it's not a full game either hence its framerate would be lower with everything else in the game running during each frame).
Old 12 June 2022, 19:40   #33
VladR
Registered User
 
Join Date: Dec 2019
Location: North Dakota
Posts: 741
Quote:
Originally Posted by ImmortalA1000 View Post
I did see that a long time ago, it is a very basic routine and given the 68000 at 7mhz is probably 500% better than even the top end spec (1.79mhz of Atari 800) it really wasn't good.
Now, as much as I love pulling numbers out of my arse, how about we throw in some real-world numbers ? I've recently implemented my very first OCS DrawPixel routine and tested it in 6-bitplane EHB (64 colors).
So, I can actually compare it to my own DrawPixel routine on 6502.

Disclaimer:
  • It's a first-version, generic, reference rasterizer routine without any optimizations whatsoever. Wouldn't use it in a game without spending another afternoon optimizing it.
  • I can see at least 3 immediate optimizations and will get to them (sooner or later - probably much later)
  • I'm sure somebody can supply a number from their own optimized routine on Amiga
  • I'm not claiming my 33-cycle 6502 DrawPixel routine is the fastest, but it is what I use and for my purposes it's "good enough" (such that I can't be arsed to write a faster one, at least)
Code:
Both Atari/Amiga Cycle budgets are NTSC
Atari's number accounts for DMA stealing. 
Both gfx modes are 4-color

----------------------------------------------------------------------------------
Platform   |   Frequency  |  Frame Cycles  |  DrawPixel Cycles  |  Pixels/Frame
----------------------------------------------------------------------------------
  6502            1.79            24,186                  33            732.9
 68000            7.16           119,333                 278            429.2
----------------------------------------------------------------------------------
Why DrawPixel ? Well, Star Raiders is a great example of a 3D game that is a flagship Atari game and is not present on the "500%" faster computer and renders pixels as its main FX.

For Amiga to merely equalize with the uber-puny 1.79 MHz Atari (a fair ask I believe given its architectural and CPU improvements), given its 4x higher clock speed, all it has to do is render 4X more pixels per frame.
732.9 x 4 = 2,931.6 pixels per frame
119,333 / 2,931.6 = 40.7 cycles per DrawPixel routine.
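The table and the break-even figure can be reproduced with a short Python sketch. The cycle counts are the ones posted above; the function names are only illustrative.

```python
# Figures taken from the table in the post (NTSC, 60 frames per second).
FRAME_CYCLES = {"6502": 24_186, "68000": 119_333}   # usable cycles per frame
DRAWPIXEL_CYCLES = {"6502": 33, "68000": 278}       # cycles per plotted pixel

def pixels_per_frame(cpu):
    """Pixels a CPU can plot if the whole frame is spent in DrawPixel."""
    return FRAME_CYCLES[cpu] / DRAWPIXEL_CYCLES[cpu]

def break_even_cycles(target_cpu, reference_cpu, factor):
    """Cycles per pixel the target may spend to plot factor x the reference."""
    target_pixels = pixels_per_frame(reference_cpu) * factor
    return FRAME_CYCLES[target_cpu] / target_pixels

if __name__ == "__main__":
    print(round(pixels_per_frame("6502"), 1))                  # 732.9
    print(round(break_even_cycles("68000", "6502", 4), 1))     # 40.7
```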


Can I please see some generic (x,y,color) 40-cycle DrawPixel (2 bpl) routine on Amiga ? Thank you !


EDIT: I spent an hour with a quick optimization and 6-Bitplane DrawPixel dropped from 1,576c to 1,108c. The 2-Bitplane one dropped from 612c to 554c. Well worth an hour. I can actually see it drawing faster under WinUAE...
EDIT2: Another hour and the 6-Bitplane DrawPixel dropped from 1,108c to 922c. The 2-Bitplane one dropped from 554c to 490c.
EDIT3: Another hour. 6-Bitplane DrawPixel dropped from 922c to 624c. The 2-Bitplane one dropped from 490c to 392c. Numbers are averages for full range of colors. Zero bits aren't forced anymore in this version. This is as far as I can get today without introducing LUTs.
EDIT4: Another hour. 6-Bitplane DrawPixel dropped from 624c to 510c. The 2-Bitplane one dropped from 392c to 278c.

Last edited by VladR; 13 June 2022 at 16:49. Reason: Optimization Update
Old 12 June 2022, 20:10   #34
a/b
Registered User
 
Join Date: Jun 2016
Location: europe
Posts: 1,039
Quote:
Originally Posted by VladR View Post
Can I please see some generic (x,y,color) 40-cycle DrawPixel (2 bpl) routine on Amiga ? Thank you !
No offence man, but that's exactly how you end up with crappy Atari ports to Amiga. I've written at least a dozen different FX with dots (thousands of pixels per frame at 50 fps; and yeah, that's not a proper game engine, but it's far, far from ~200 pix/frame) and none of them uses a generic drawpixel(), because that's simply not how you do things.
If you *really* need something like that, I'd use a 2^N wide screen, dump x/y into a list (one for each color), use the blitter to convert x/y to bit (d0-d7) and 16-bit offset, and dump the output to unrolled code: 1 instruction (or 2 for color 3) per pixel.
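The data flow of that idea can be sketched in Python. This is a model only: it does the x/y conversion on the CPU rather than with the blitter, it generates no unrolled code, and the 256-pixel width and all function names are assumptions made for illustration.

```python
from collections import defaultdict

SCREEN_WIDTH_LOG2 = 8                        # assumed 256-pixel-wide plane (2^N)
ROW_BYTES = (1 << SCREEN_WIDTH_LOG2) // 8    # 32 bytes per row of one plane

def to_offset_and_bit(x, y):
    """Convert a pixel position to (byte offset in a plane, bit number 0-7)."""
    offset = (y << (SCREEN_WIDTH_LOG2 - 3)) + (x >> 3)   # shift replaces multiply
    bit = 7 - (x & 7)                        # bit 7 is the leftmost pixel of a byte
    return offset, bit

def bucket_by_color(pixels):
    """Group (x, y, color) tuples into one (offset, bit) list per color."""
    buckets = defaultdict(list)
    for x, y, color in pixels:
        buckets[color].append(to_offset_and_bit(x, y))
    return dict(buckets)

def planes_to_set(color, depth=2):
    """Planes needing a set bit for this color, assuming a cleared screen."""
    return [plane for plane in range(depth) if color & (1 << plane)]
```

With 2 bitplanes, `planes_to_set` returns one plane for colors 1 and 2 and two planes for color 3, which matches the "1 instruction (or 2 for color 3) per pixel" count, provided the screen was cleared beforehand.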
Old 12 June 2022, 22:03   #35
VladR
Registered User
 
Join Date: Dec 2019
Location: North Dakota
Posts: 741
Quote:
Originally Posted by a/b View Post
No offence man, but that's exactly how you end up with crappy atari ports to amiga. I've written at least a dozen different fx with dots (thousands of pix per frame, 50fps, and yeah that's not a proper game engine but it's far far from ~200 pix/frame) and none of them is using a generic drawpixel() because that's simply not how you do things.
If you *really* need something like that, I'd use a 2^N wide screen, dump x/y into a list (one for each color), use the blitter to convert x/y to bit (d0-d7) and 16-bit offset, and dump the output to unrolled code: 1 instruction (or 2 for color 3) per pixel.
Unfortunately, Star Raiders is not a compo demo with precomputed and unrolled code for each FX, but a real game where each frame you can turn anywhere in 3D space (free movement along all 3 axes), hence each dot is a generic pixel (after 3D transform).

Of course, with 0.5 MB RAM, Blitter, Copper and ~120,000 cycles per frame of 68000, there's no argument that 68000 can do some nice precomputed or real-time FX!

That being said, having written my very first OCS version of generic DrawPixel, for purposes of something like Star Raiders, it's obvious we would need multiple versions, probably sorted per color to avoid needless table look-ups and removing unneeded bit manipulations (especially for bit 0). Then again, time to sort the batches might be longer than the gains (it all depends on number of pixels), so extensive benchmarking would be needed.

But it is an interesting engineering problem, one with a plethora of different approaches on Amiga...
Old 13 June 2022, 12:51   #36
paraj
Registered User
 
 
Join Date: Feb 2017
Location: Denmark
Posts: 1,099
No reason why a/b's approach wouldn't work in general, and I think with an extra blitter pass it could be used with any screen width evenly divisible by 8.

A generic putpixel function hits some of the things Amiga doesn't do well, so you would probably never design something new that would make heavy use of that. Also, I don't know why you'd forgo using LUTs?

I'm sure it can be improved, but something like the below for interleaved bitmaps should be quite a bit faster (though still not very fast). If you don't need to clear 0 bits, the "else bclr" part can be left out for a further speedup.
Code:
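        ; a0=dest,d0=x,d1=y,d2=color
        ; registers modified:d0,d1,d2,a0,a1
        ; cost: 102 (20/0)
        ; or with mulu.w/add.l: 126-156 (17/0)
        ; + 108 (21/6) @ 6 BPL
putpixel:
        add.w   d1,d1                   ; d1 = y*2
        add.w   d1,d1                   ; d1 = y*4 (index into longword ytable)
        add.l   ytable(pc,d1.w),a0      ; a0 += byte offset of row y
        move.w  d0,d1                   ; keep x for the bit number
        lsr.w   #3,d0                   ; d0 = x/8 (byte column)
        add.w   d0,a0                   ; a0 -> byte holding the pixel in plane 0
        not.w   d1
        and.w   #7,d1                   ; d1 = 7-(x&7), bit 7 is the leftmost pixel
        add.w   d2,d2                   ; d2 = color*2 (index into word table)
        lea     colfunctab(pc),a1
        move.w  0(a1,d2.w),d2           ; offset of colfunc<color> from colfunctab
        jmp     0(a1,d2.w)              ; colfunc's rts returns to our caller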

ytable:
.ofs set 0
        rept screenh
        dc.l .ofs
.ofs set .ofs+rowbytes*screend
        endr

        ; cost: 12+16*screend (2+3*(screend-1)+4/screend)
colfunc macro
colfunc\<col>:
.cnt set 0
        rept screend
        ifne (col&(1<<.cnt))
        bset.b  d1,.cnt*rowbytes(a0)
        else
        bclr.b  d1,.cnt*rowbytes(a0)
        endc
.cnt set .cnt+1
        endr
        rts
        endm

col set 0
        rept (1<<screend)
        colfunc
col set col+1
        endr

colfuncentry macro
        dc.w    colfunc\<col>-colfunctab
        endm

colfunctab:
col set 0
        rept (1<<screend)
        colfuncentry
col set col+1
        endr
Even with this version you're not going to be drawing more than a couple of hundred pixels per frame (with a 320x256x6 display active).
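For readers who don't speak 68000, here is a small Python model of what the routine above does to memory. It is a sketch of the logic only, with no cycle accuracy. The 320x200x2 interleaved layout is an assumption; `ROW_BYTES`, `DEPTH` and `HEIGHT` stand in for `rowbytes`, `screend` and `screenh`, and `get_pixel` is a hypothetical helper added just for checking results.

```python
ROW_BYTES = 40     # 320 pixels / 8 (assumed; rowbytes in the assembly)
DEPTH = 2          # number of bitplanes (assumed; screend in the assembly)
HEIGHT = 200       # number of rows (assumed; screenh in the assembly)

# Same role as ytable: byte offset of each row's plane 0 in an interleaved bitmap.
Y_TABLE = [y * ROW_BYTES * DEPTH for y in range(HEIGHT)]

def put_pixel(screen, x, y, color):
    """Set or clear one bit per plane so pixel (x, y) reads back as color."""
    base = Y_TABLE[y] + (x >> 3)
    bit = 7 - (x & 7)                            # same as not.w / and.w #7
    for plane in range(DEPTH):
        addr = base + plane * ROW_BYTES          # planes of a row are adjacent
        if color & (1 << plane):
            screen[addr] |= 1 << bit             # bset
        else:
            screen[addr] &= ~(1 << bit) & 0xFF   # bclr

def get_pixel(screen, x, y):
    """Read the color index back by gathering one bit from each plane."""
    base = Y_TABLE[y] + (x >> 3)
    bit = 7 - (x & 7)
    return sum(((screen[base + p * ROW_BYTES] >> bit) & 1) << p
               for p in range(DEPTH))

screen = bytearray(ROW_BYTES * DEPTH * HEIGHT)
```

The assembly avoids the per-plane `if` by jumping to a routine generated for each color, so each plane costs exactly one `bset` or `bclr`.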
Old 13 June 2022, 13:35   #37
meynaf
son of 68k
 
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
Quote:
Originally Posted by VladR View Post
Now, as much as I love pulling numbers out of my arse, how about we throw in some real-world numbers ? I've recently implemented my very first OCS DrawPixel routine and tested it in 6-bitplane EHB (64 colors).
So, I can actually compare it to my own DrawPixel routine on 6502.
May we see said 6502 DrawPixel routine ? It would be more "real-world" if we had the code.
Would be interesting to see that XOR/ORA filler as well.
I've never seen 6502 code beating 68000.
Old 13 June 2022, 15:30   #38
a/b
Registered User
 
Join Date: Jun 2016
Location: europe
Posts: 1,039
My point was, at least that's how I'd approach these things, if you write a game that heavily relies on drawing hundreds of pixels every frame you do not say: OK, I have a drawpixel() routine that takes care of that and we can sort it out later (even if it does work out, you will still be wasting a decent amount of milliseconds that could be better spent elsewhere).
I'd design my game around a drawmanypixels() routine, it would be tightly integrated into the game and part of design right from the start. Be it the format of bitmap (width 256, 320, 512-modulo, ..., interleaved or not, how much memory you can afford to waste), blitter available for mass processing yes/no, bus bandwidth usage e.g. is blitter running at the same time and/or high number of bitplanes (meaning cpu plotter with cycle-wise slower 1 mem access shifts/rolls becomes faster than LUTs with multiple mem accesses), how much memory can be reserved for unrolls and/or support tables so it can run reasonably fast, ...
And that's where the FX stuff comes in. Yeah, and I said already those are not actual game engines, but *many* techniques and tricks still apply, and will get you to the finish line easier than a drawpixel().
Because, unfortunately (or not), we have to deal with bitplanes. And as soon as you want to draw a large number of multi-color pixels and have to hammer individual bits you are in for a lot of hurt by doing it one by one.
Old 13 June 2022, 17:04   #39
VladR
Registered User
 
Join Date: Dec 2019
Location: North Dakota
Posts: 741
Quote:
Originally Posted by meynaf View Post
May we see said 6502 DrawPixel routine ? It would be more "real-world" if we had the code.
Would be interesting to see that XOR/ORA filler as well.
I've never seen 6502 code beating 68000.
Sure, I can try to dig up the 6502 routine from my sources later today or tomorrow and post it here.
Unfortunately, like I said, I have not implemented the XOR/ORA column filler used in Fractalus. Because the moment I do that, I engage in yet another rabbit hole of refactoring that will consume a few months of my limited free time. It's fairly high on my to-do list of things to try, but not in the top 3.

Quote:
Originally Posted by a/b View Post
My point was, at least that's how I'd approach these things, if you write a game that heavily relies on drawing hundreds of pixels every frame you do not say: OK, I have a drawpixel() routine that takes care of that and we can sort it out later (even if it does work out, you will still be wasting a decent amount of milliseconds that could be better spent elsewhere).
Of course, as I already said before, I wouldn't just do that on Amiga (that's basically just a first reference rasterizer version for testing). My initial hunch was to group the pixels by the bitplane layer index as one of the first approaches to try.

I don't mind writing dozen versions, really. Even if half of them do not turn out to be faster.
But, so far, after having spent 4 hours on optimizing, I came up with 4 versions and each was faster than the previous one.
The 6-bitplane one dropped from 1,576c down to 510c. I consider that a nice speed-up from one afternoon of effort.

Quote:
Originally Posted by a/b View Post
I'd design my game around a drawmanypixels() routine, it would be tightly integrated into the game and part of design right from the start. Be it the format of bitmap (width 256, 320, 512-modulo, ..., interleaved or not, how much memory you can afford to waste), blitter available for mass processing yes/no, bus bandwidth usage e.g. is blitter running at the same time and/or high number of bitplanes (meaning cpu plotter with cycle-wise slower 1 mem access shifts/rolls becomes faster than LUTs with multiple mem accesses), how much memory can be reserved for unrolls and/or support tables so it can run reasonably fast, ...
And that's where the FX stuff comes in. Yeah, and I said already those are not actual game engines, but *many* techniques and tricks still apply, and will get you to the finish line easier than a drawpixel().
Because, unfortunately (or not), we have to deal with bitplanes. And as soon as you want to draw a large number of multi-color pixels and have to hammer individual bits you are in for a lot of hurt by doing it one by one.
Well, on Atari, I do the grouping because there are only 3 colors to draw and just a few pixels, so it's easy to do manually, and each DrawPixel version has LUTs for a specific color mask.
That's one of the approaches to try on Amiga too.

I am certainly curious how we can use the Blitter for this scenario.

I'm loving this discussion ! So much stuff to learn here
Old 13 June 2022, 17:43   #40
VladR
Registered User
 
Join Date: Dec 2019
Location: North Dakota
Posts: 741
Quote:
Originally Posted by paraj View Post
No reason why a/b's approach wouldn't work in general, and I think with an extra blitter pass could be used with any screen width evenly divisible by 8.
I'm not saying it wouldn't work. I haven't yet examined it. I wanna run out of CPU alternatives first and I keep getting new ideas with each optimization I do (so far 4 done and 2 more on the to-do list)

Quote:
Originally Posted by paraj View Post
A generic putpixel function hits some of the things Amiga doesn't do well, so you would probably never design something new that would make heavy use of that.
OK, maybe not entirely generic.
But, if you imagine a Star Raiders starfield, the stars basically cover all XPOS and YPOS points and they can be of any color (say, 16 or 32 or 64).
So, that's kinda generic as we don't have control over where the points end up on screen. It depends on the player's input (3 axes of rotation).

We could certainly sort and group them based on color index, for example. This would avoid setting unwanted bits or even checking for 0 and bypassing the set.

Quote:
Originally Posted by paraj View Post
Also don't know why you'd forgo using LUTs?
I didn't forgo LUTs, like I said, they're on the to-do list. Once I run out of possibilities to optimize my current version, that's when I will go for LUTs.

Quote:
Originally Posted by paraj View Post
I'm sure it can be improved, but something like the below for interleaved bitmaps should be quite a bit faster (though still not very fast). If you don't need to clear 0 bits, the "else bclr" part can be left out for a further speedup.
Code:
        ; a0=dest,d0=x,d1=y,d2=color
        ; registers modified:d0,d1,d2,a1
        ; cost: 102 (20/0)
        ; or with mulu.w/add.l: 126-156 (17/0)
        ; + 108 (21/6) @ 6 BPL
putpixel:
        add.w   d1,d1
        add.w   d1,d1
        add.l   ytable(pc,d1.w),a0
        move.w  d0,d1
        lsr.w   #3,d0
        add.w   d0,a0
        not.w   d1
        and.w   #7,d1
        add.w   d2,d2
        lea     colfunctab(pc),a1
        move.w  0(a1,d2.w),d2
        jmp     0(a1,d2.w)

ytable:
.ofs set 0
        rept screenh
        dc.l .ofs
.ofs set .ofs+rowbytes*screend
        endr

        ; cost: 12+16*screend (2+3*(screend-1)+4/screend)
colfunc macro
colfunc\<col>:
.cnt set 0
        rept screend
        ifne (col&(1<<.cnt))
        bset.b  d1,.cnt*rowbytes(a0)
        else
        bclr.b  d1,.cnt*rowbytes(a0)
        endc
.cnt set .cnt+1
        endr
        rts
        endm

col set 0
        rept (1<<screend)
        colfunc
col set col+1
        endr

colfuncentry macro
        dc.w    colfunc\<col>-colfunctab
        endm

colfunctab:
col set 0
        rept (1<<screend)
        colfuncentry
col set col+1
        endr
Thank you. I will examine this after I implement my LUT solution.
I most definitely want to have a discussion about LUTs for 6 bitplanes once I get my first LUT version up and running.

Quote:
Originally Posted by paraj View Post
Even with this version you're not going to be drawing more than a couple of hundred pixels per frame (with a 320x256x6 display active).
Yeah, right now, my current 6-BPL version is 510c.

Given 6 bitplanes, the CPU will be at about 54% utilization after all the DMA - right ?
So, 0.54*119,333 = 64,439c available per frame, which results in 126 px (64439/510) rendered per frame.
That's actually quite nice. A starfield of ~128 stars in EHB mode is pretty good. I'm sure it will be a challenge to transform that many in second frame even with LUTs (so that we fit to 30 fps) and fit all the game logic there.
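The budget estimate above, as a small Python sketch. The 54% CPU share is the estimate from this post, not a measured figure, and the names are illustrative.

```python
NTSC_FRAME_CYCLES = 119_333    # 68000 cycles per 1/60 s frame, from the thread
CPU_SHARE_6BPL = 0.54          # poster's estimate of CPU share with 6 bitplanes
DRAWPIXEL_6BPL_CYCLES = 510    # poster's current 6-bitplane DrawPixel cost

def pixels_per_frame(frame_cycles, cpu_share, cycles_per_pixel):
    """Whole pixels that fit in one frame if the CPU does nothing else."""
    usable_cycles = int(frame_cycles * cpu_share)
    return usable_cycles // cycles_per_pixel

if __name__ == "__main__":
    print(pixels_per_frame(NTSC_FRAME_CYCLES, CPU_SHARE_6BPL,
                           DRAWPIXEL_6BPL_CYCLES))   # 126
```

Note that this spends the entire frame on plotting, so the 3D transform and game logic have to come out of a second frame, as the post says.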

Hell, I'm sure people wouldn't complain too much if there was a 3D space game with 64 colors in 30 (or perhaps 20) fps.