English Amiga Board


Old 13 January 2018, 15:45   #41
LaBodilsen
Registered User

 
Join Date: Dec 2017
Location: Gandrup / Denmark
Posts: 26
Quote:
Originally Posted by Master484 View Post
Could a system be used where the graphics of the previous frame are never cleared, but instead we preserve them, and when drawing a new frame we only draw those pixels that have been changed since the last frame, and therefore need to be updated.

And those pixels that remain the same color as they did last frame, are simply skipped.

So every frame we would go through the pixels one by one, and check the current color versus the color that it should be; and draw the pixel only in the case where the "should be color" is different from the current color.

I think in quite many cases the individual pixel colors in two consecutive frames would be the same, and also you would never need to totally "clear" the screen. So could this method work or boost the speed?
explained for a non-coder

what is currently happening:
Code:
copy TexturePixel1 to ScreenPixel1
copy TexturePixel2 to ScreenPixel2
copy TexturePixel3 to ScreenPixel3
etc
what you suggest would be
Code:
Read ScreenPixel1
Compare ScreenPixel1 to TexturePixel1
IF not equal:	Write TexturePixel1 to ScreenPixel1
If equal:	Read ScreenPixel2
Compare ScreenPixel2 to TexturePixel2
IF not equal:	Write TexturePixel2 to ScreenPixel2
If equal:	Read ScreenPixel3
Compare ScreenPixel3 to TexturePixel3
IF not equal:	Write TexturePixel3 to ScreenPixel3
If equal etc.....
So when the pixels are not equal, the current way is _MUCH_ faster than your suggestion. And if the pixels are equal, the current way is still faster, as you would have to read each screen pixel anyway to make the comparison. Hope it makes sense.
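To put rough numbers on this, here is a small Python sketch that counts memory accesses for both approaches. The cost model (one unit per read or write, nothing for the compare and branch) and the names `copy_cost` and `compare_cost` are assumptions made for illustration only, not measurements of the actual renderer.

```python
def copy_cost(num_pixels):
    """Plain copy: one texture read and one screen write per pixel."""
    return num_pixels * 2


def compare_cost(num_pixels, changed_fraction):
    """Compare-then-write: a texture read and a screen read for every pixel,
    plus one screen write for each pixel that actually changed."""
    changed = int(num_pixels * changed_fraction)
    return num_pixels * 2 + changed


pixels = 160 * 100  # hypothetical buffer size
for fraction in (0.0, 0.5, 1.0):
    print(fraction, copy_cost(pixels), compare_cost(pixels, fraction))
```

Even when nothing changes, the access counts only tie, and the compare and branch (ignored in this model) still cost extra cycles on top.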

Last edited by LaBodilsen; 13 January 2018 at 20:46.
Old 13 January 2018, 19:31   #42
Master484
Registered User
 
Join Date: Nov 2015
Location: Vaasa, Finland
Posts: 292
Thanks for the explanation. Indeed that makes my idea look quite slow. So back to the drawing board.

Although in theory the idea is fascinating, because the previous frame always approximately contains the same image that we are going to draw next. So if the old data could be somehow used to build the current frame, without too many reads and comparisons, then something like this might be useful.

But right now it's just an interesting theory, and I don't know how to make it work.
Old 13 January 2018, 22:27   #43
chb
Registered User

 
Join Date: Dec 2014
Location: germany
Posts: 63
Quote:
Originally Posted by Master484 View Post
Although in theory the idea is fascinating, because the previous frame always approximately contains the same image that we are going to draw next
That's a wrong assumption, or a misconception of what "approximately" means in this context. Imagine walking in the z-direction in the game. The zoom factor for every column changes and, because of perspective, the columns also move horizontally. No wall pixel will stay in its place, at least not in an easily predictable way.

You might think of something like video motion compensation, where blocks of pixels are moved to approximate the next frame. But that's not possible here: predicting the next frame would be much slower than just rendering it.

What could work: There are sometimes large parts of the image that are not textured, namely the floor and ceiling. One could divide the screen into 16- or 32-pixel-wide vertical stripes, determine the maximal distance from the center in y for each stripe, and do c2p and clearing only on that part. But there are a number of disadvantages. Depending on your chunky buffer layout, this might be rather hard to do. You have to call the blitter for c2p much more often (n times for every stripe, instead of n times for the whole image), which introduces some overhead: more if you use interrupts, less if you do copper waits, but in the latter case you have to be careful with frame boundaries (the copper list is restarted at vertical blank). And it improves only some cases (lots of floor and ceiling), not all, so the frame rate could be quite unsteady, albeit higher on average. Probably not that desirable for a game. On the other hand, if you have a lot of columns with a high zoom factor (so close to a wall), you might in general see fewer enemies and other sprites, which speeds up rendering and could compensate for this.
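A rough sketch of the stripe bookkeeping, in Python for readability. It assumes the raycaster already provides one wall height per screen column and that walls are centred vertically; `stripe_extents` and `rows_to_process` are made-up names, not anything from the actual code.

```python
def stripe_extents(wall_heights, stripe_width=16):
    """Return the tallest wall height found within each vertical stripe."""
    return [
        max(wall_heights[x:x + stripe_width])
        for x in range(0, len(wall_heights), stripe_width)
    ]


def rows_to_process(extent, screen_height):
    """Rows (top inclusive, bottom exclusive) that c2p and clearing must
    cover for a stripe, assuming walls are centred vertically."""
    extent = min(extent, screen_height)
    top = (screen_height - extent) // 2
    return top, top + extent


heights = [10] * 16 + [40] * 15 + [100]
for extent in stripe_extents(heights):
    print(extent, rows_to_process(extent, 100))
```

Note how a single tall column forces its whole stripe to full height, which is one reason the gain would vary so much from frame to frame.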

Last edited by chb; 13 January 2018 at 22:51. Reason: clarification
Old 14 January 2018, 11:59   #44
Master484
Registered User
 
Join Date: Nov 2015
Location: Vaasa, Finland
Posts: 292
I thought a little bit more about my idea, and I came up with a version that doesn't need any data reads or pixel comparisons.

So we would start with the same thing: no screen clearing and the previous frame is always preserved.

But when drawing the next frame, we only draw every other column. So if the screen has 320 columns, we only update 160 of them. This results in a screen where every other column is new and every other is from the previous frame. And when the next frame arrives, we change the draw order: if in the previous frame we drew columns 1, 3 and 5, then in this frame we draw columns 2, 4 and 6.

So this method would surely be faster: no screen clearing, and only need to raycast half of the columns each frame.

However, I'm not sure if this would look good. But because every other column would be 100% right, and the other half would be "approximately right", then maybe it could look acceptable?
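For anyone who wants to try this on captured frames, the column idea could be simulated roughly like this in Python. Frames are assumed to be plain lists of rows of pixel values, and `interleave_columns` is a hypothetical helper name, not part of any existing renderer.

```python
def interleave_columns(previous, current, frame_index):
    """Return a frame where only every other column comes from `current`.

    Even frames refresh the even columns, odd frames refresh the odd
    columns; all other columns are kept from `previous`.
    """
    parity = frame_index % 2
    return [
        [new if x % 2 == parity else old
         for x, (old, new) in enumerate(zip(old_row, new_row))]
        for old_row, new_row in zip(previous, current)
    ]


old_frame = [[0, 0, 0, 0]]
new_frame = [[1, 1, 1, 1]]
print(interleave_columns(old_frame, new_frame, 0))
print(interleave_columns(old_frame, new_frame, 1))
```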

---

Also, a more advanced version of this could be something like this: we draw every column, but skip every other pixel in the columns.

So in each column only every other pixel would be updated, and the rest would be from the previous frame. And for the next column we change the order, so if in the current column we updated pixels 1, 3 and 5 then in the next column we update pixels 2, 4 and 6.

So the resulting screen would have a "chessboard pattern" of new and old pixels. And when the frame changes, then we switch the pixel updating order in the columns, so that the old and new pixels alternate every frame.

But again, would this look good? Don't know.
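The chessboard variant can be mocked up the same way. Again this is only a Python sketch on lists of rows, with `checkerboard_update` as an invented name; it says nothing about how fast such a pattern could be drawn on real hardware.

```python
def checkerboard_update(previous, current, frame_index):
    """Refresh pixels in a chessboard pattern that flips every frame.

    A pixel at (x, y) is taken from `current` when (x + y) has the same
    parity as the frame index, otherwise it is kept from `previous`.
    """
    parity = frame_index % 2
    return [
        [new if (x + y) % 2 == parity else old
         for x, (old, new) in enumerate(zip(old_row, new_row))]
        for y, (old_row, new_row) in enumerate(zip(previous, current))
    ]


old_frame = [[0, 0], [0, 0]]
new_frame = [[1, 1], [1, 1]]
print(checkerboard_update(old_frame, new_frame, 0))
print(checkerboard_update(old_frame, new_frame, 1))
```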

Last edited by Master484; 14 January 2018 at 12:05.
Old 14 January 2018, 12:38   #45
Galahad/FLT
Going nowhere

 
Join Date: Oct 2001
Location: United Kingdom
Age: 44
Posts: 6,660
Quote:
Originally Posted by Master484 View Post
I thought a little bit more about my idea, and I came up with a version that doesn't need any data reads or pixel comparisons.

So we would start with the same thing: no screen clearing and the previous frame is always preserved.

But when drawing the next frame, we only draw every other column. So if the screen has 320 columns, we only update 160 of them. This results in a screen where every other column is new and every other is from the previous frame. And when the next frame arrives, we change the draw order: if in the previous frame we drew columns 1, 3 and 5, then in this frame we draw columns 2, 4 and 6.

So this method would surely be faster: no screen clearing, and only need to raycast half of the columns each frame.

However, I'm not sure if this would look good. But because every other column would be 100% right, and the other half would be "approximately right", then maybe it could look acceptable?

---

Also, a more advanced version of this could be something like this: we draw every column, but skip every other pixel in the columns.

So in each column only every other pixel would be updated, and the rest would be from the previous frame. And for the next column we change the order, so if in the current column we updated pixels 1, 3 and 5 then in the next column we update pixels 2, 4 and 6.

So the resulting screen would have a "chessboard pattern" of new and old pixels. And when the frame changes, then we switch the pixel updating order in the columns, so that the old and new pixels alternate every frame.

But again, would this look good? Don't know.
Alternating the pixel drawing would result in the whole screen appearing to "flash" as it redraws, and the effect would be more pronounced in lowres than in something like hires.
Old 14 January 2018, 15:09   #46
Kalms
Registered User
 
Join Date: Nov 2006
Location: Stockholm, Sweden
Posts: 169
@master484 How about you make proof-of-concepts of your ideas? If it is just a question of "what will the end result look like" and not "how quick will it run on the a500" then you don't need to implement a full Wolf3d renderer yourself; you can simulate the various ideas by doing video processing offline on a set of frame captures. (For source data, you could run wolf3d_v2 in an amiga emulator and do a video capture.) That way you will better understand which ideas are worth sharing, and that in turn saves time for others in the thread.
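A minimal sketch of what such an offline test harness might look like in Python. It assumes the captured frames have already been decoded into lists of rows of numeric intensities (for palette indices the error figure means much less), and the names `mean_abs_error`, `evaluate` and the `approximate` callback are invented for this example.

```python
def mean_abs_error(frame_a, frame_b):
    """Average absolute per-pixel difference between two equal-sized frames."""
    total = 0
    count = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for a, b in zip(row_a, row_b):
            total += abs(a - b)
            count += 1
    return total / count


def evaluate(frames, approximate):
    """Run approximate(previous_output, true_frame, index) over a capture
    and return the error of each output against the true frame."""
    output = frames[0]
    errors = []
    for index, frame in enumerate(frames[1:], start=1):
        output = approximate(output, frame, index)
        errors.append(mean_abs_error(output, frame))
    return errors


capture = [[[0, 0]], [[2, 2]], [[4, 4]]]
print(evaluate(capture, lambda prev, cur, i: cur))   # perfect renderer
print(evaluate(capture, lambda prev, cur, i: prev))  # never updates
```

Any partial-update scheme can be plugged in as `approximate`. A number like this does not replace actually looking at the result in motion, but it gives a first impression.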
Old 14 January 2018, 17:47   #47
britelite
Registered User
 
Join Date: Feb 2010
Location: Espoo / Finland
Posts: 322
I have to agree with Kalms here, posting random suggestions without displaying any knowledge of the subject at hand is better suited for the other Wolf3d-thread.
Old 14 January 2018, 19:01   #48
Master484
Registered User
 
Join Date: Nov 2015
Location: Vaasa, Finland
Posts: 292
Ok, I'll exit the thread and leave it to you guys.

I just thought to share these ideas, as they seemed good to me. But agreed, maybe it would be better to make some kind of proof-of-concept demo first. I do have a book about raycasting, so maybe I'll try to cook something up with Blitz if I have time.
Old 17 January 2018, 22:13   #49
britelite
Registered User
 
Join Date: Feb 2010
Location: Espoo / Finland
Posts: 322
Alright, had some spare time to tinker with the rendering again, and now the stream runs at a nice steady 25fps (around 1.6-1.9 frames). The rendering is now done horizontally, with the previously added double pixel blitter pass modified to also rotate the buffer back 90 degrees.

There's still room for improvement in the wall rendering (for example, the code slices could be modified to draw longwords where possible). But I think the next step would be to make the raycasting real time for interactivity, although this will probably not be happening any time soon.

The usual preview is available here
Old 18 January 2018, 09:19   #50
LaBodilsen
Registered User

 
Join Date: Dec 2017
Location: Gandrup / Denmark
Posts: 26
Quote:
Originally Posted by britelite View Post
Alright, had some spare time to tinker with the rendering again, and now the stream runs at a nice steady 25fps (around 1.6-1.9 frames).
I'm lost for words. This is simply amazing.

Quote:
The rendering is now done horizontally, with the previously added double pixel blitter pass modified to also rotate the buffer back 90 degrees.
So you get the 90 degrees rotation almost for free, as you have to perform that blitter pass anyway to sort out upper and lower pixels?

Quote:
There's still room for improvement in the wall rendering (for example, the code slices could be modified to draw longwords where possible).
Would that really help? As the A500 memory bus is 16-bit, wouldn't writing a longword still take as many cycles as writing 2 words? I think any improvement would be minimal.

Quote:
But I think the next step would be to make the raycasting real time for interactivity, although this will probably not be happening any time soon.
Maybe someone else can contribute to this. would you mind if anyone used the wall render for their own project, as long as proper credits are given?

If we could make a kinda framework for a game engine, then maybe others would love to create a game (or port) with that.

Quote:
The usual preview is available here
So cool, something to play with over the weekend. and maybe combine it with the raycaster i'm currently trying to create.
Old 18 January 2018, 09:29   #51
britelite
Registered User
 
Join Date: Feb 2010
Location: Espoo / Finland
Posts: 322
Quote:
Originally Posted by LaBodilsen View Post
So you get the 90 degrees rotation almost for free, as you have to perform that blitter pass anyway to sort out upper and lower pixels?
Indeed, the pass is slightly slower now as I need to restart the blitter more often.

Quote:
Would that really help? As the A500 memory bus is 16-bit, wouldn't writing a longword still take as many cycles as writing 2 words? I think any improvement would be minimal.
It might save a few cycles in some cases, but as you say it would probably be minimal.

Quote:
Maybe someone else can contribute to this. would you mind if anyone used the wall render for their own project, as long as proper credits are given?
Of course not, would be happy to see something real materialize from this.
Old 18 January 2018, 11:28   #52
chb
Registered User

 
Join Date: Dec 2014
Location: germany
Posts: 63
Wow, running in 1.6 -1.9 frames? Totally awesome! Congratulations, great achievement.

Quote:
Originally Posted by LaBodilsen View Post
Would that really help? As the A500 memory bus is 16-bit, wouldn't writing a longword still take as many cycles as writing 2 words? I think any improvement would be minimal.
The memory bus is only 16 bits wide, true, but you always need to fetch the instruction itself, too. That's why move.l is faster even on the A500: a move.w Dn,(a0)+ is 8 cycles (two memory accesses, one for the instruction and one for the data), a move.l is 12 cycles (three memory accesses, one for the instruction and two for the data). So one move.l Dn,(a0)+ takes 25% less time than two word moves. But that's close to the optimal case; for instructions that do more memory accesses (e.g. move.x (a0)+,(a1)+) the ratio of instruction fetches to data accesses is lower and therefore the advantage is smaller. On the other hand, if you use an instruction with complex address calculation like "d(An)" or "d(An,ix)", the gain may be even higher, as address calculations are always 32-bit anyway and have to be carried out twice for the two word instructions* - but then again that internal calculation is not slowed down by other DMA memory access.... So, it's complicated.
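The arithmetic behind those percentages, written out as a small Python calculation. The register-to-memory figures (8 and 12 cycles) are the ones given above; the memory-to-memory figures (12 and 20 cycles) are the ones used for move.w/move.l (a1)+,(a0)+ elsewhere in this thread. `saving_percent` is just a helper name for this example.

```python
def saving_percent(one_long, one_word):
    """Time saved by one longword move compared with two word moves."""
    two_words = 2 * one_word
    return 100 * (two_words - one_long) / two_words


# move.w Dn,(a0)+ = 8 cycles, move.l Dn,(a0)+ = 12 cycles
print(saving_percent(12, 8))

# move.w (a1)+,(a0)+ = 12 cycles, move.l (a1)+,(a0)+ = 20 cycles
print(round(saving_percent(20, 12), 1))
```

So the register-to-memory case saves a quarter of the time, while the memory-to-memory case saves about a sixth, in line with the point that more data accesses per instruction shrink the advantage.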

EDIT: The neogeo dev wiki has a nice table with instruction/addressing modes timings, AFAICS it's from the official 68000 manual, but with nicer formatting:
https://wiki.neogeodev.org/index.php...ctions_timings

EDIT2: *Hmm, does not seem to be true according to the table I linked...

Last edited by chb; 18 January 2018 at 11:42.
Old 18 January 2018, 12:51   #53
LaBodilsen
Registered User

 
Join Date: Dec 2017
Location: Gandrup / Denmark
Posts: 26
@chb
makes sense, so at least some gain could be made by using longwords. Maybe not much, but for a game like this running on the A500, we would need to make use of all the tricks available.

@Britelite
Do you have any good ideas for sprites, as i would see that as the next performance killer.
Old 18 January 2018, 13:16   #54
Mathesar
Registered User

 
Join Date: Aug 2014
Location: Netherlands
Posts: 108
Quote:
Originally Posted by britelite View Post
Alright, had some spare time to tinker with the rendering again, and now the stream runs at a nice steady 25fps (around 1.6-1.9 frames). The rendering is now done horizontally, with the previously added double pixel blitter pass modified to also rotate the buffer back 90 degrees.
Simply amazing!

About the raycaster. All tutorials I have seen state that you need to cast a ray for every horizontal pixel in the projection plane. Which is, in this case, 160 pixels, right? So 160 rays to be cast.

I was thinking, if the scene is simple, most rays would hit the same wall. What if you cast every other ray (only even rays) and, when two rays hit the same wall, just interpolate the ray in between? Might save a few cycles. You could do it even dirtier by only calculating every fourth ray or so and, when they hit the same wall, interpolating the other 3 rays.
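Something like the following Python sketch is what I have in mind. `cast_ray` stands for whatever the real raycaster does and is assumed to return a (wall id, distance) pair; the function name, the return format and the plain linear interpolation of the distance are all simplifications made up for this example, and it assumes at least one column.

```python
def cast_with_interpolation(cast_ray, num_columns, step=2):
    """Cast every `step`-th column; interpolate the skipped columns when both
    neighbouring rays hit the same wall, otherwise cast them normally.

    Returns the per-column results and the number of rays actually cast.
    """
    results = [None] * num_columns
    casts = 0
    anchors = list(range(0, num_columns, step))
    if anchors[-1] != num_columns - 1:
        anchors.append(num_columns - 1)  # always cast the last column
    for x in anchors:
        results[x] = cast_ray(x)
        casts += 1
    for left, right in zip(anchors, anchors[1:]):
        wall_l, dist_l = results[left]
        wall_r, dist_r = results[right]
        for x in range(left + 1, right):
            if wall_l == wall_r:
                t = (x - left) / (right - left)
                results[x] = (wall_l, dist_l + t * (dist_r - dist_l))
            else:
                results[x] = cast_ray(x)  # wall boundary: fall back to a real ray
                casts += 1
    return results, casts


columns, rays_cast = cast_with_interpolation(lambda x: ("A", 10.0 + x), 160, step=4)
print(rays_cast, "rays cast for", len(columns), "columns")
```

A thin pillar that fits between two cast rays would be missed entirely with this approach, so the step size trades speed against that kind of error.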
Old 18 January 2018, 13:28   #55
britelite
Registered User
 
Join Date: Feb 2010
Location: Espoo / Finland
Posts: 322
Quote:
Originally Posted by LaBodilsen View Post
Do you have any good ideas for sprites, as i would see that as the next performance killer.
One idea would be to check the zoom factor of a visible sprite and, depending on the factor, choose one of two different methods of drawing. If the sprite is small enough, then just scale the sprite/mask directly to the buffer with the CPU, but if a sprite is larger, then scale the sprite/mask vertically to a separate buffer with the CPU and then draw it to the screenbuffer with the blitter, expanding it horizontally in the process.
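In Python pseudo-form, the selection and the vertical scaling step might look roughly like this. The threshold value, the nearest-neighbour scaling and the names `scale_column` and `choose_sprite_path` are placeholders for illustration; the real cut-off would have to be found by measuring.

```python
def scale_column(column, new_height):
    """Nearest-neighbour scale of one sprite column to new_height texels."""
    source_height = len(column)
    return [column[y * source_height // new_height] for y in range(new_height)]


def choose_sprite_path(zoom, blitter_threshold=1.0):
    """Small sprites go straight to the buffer with the CPU; larger ones are
    scaled vertically by the CPU and expanded horizontally by the blitter."""
    if zoom < blitter_threshold:
        return "cpu_direct"
    return "cpu_vertical_then_blitter"


print(choose_sprite_path(0.5), scale_column([1, 2, 3, 4], 2))
print(choose_sprite_path(2.0), scale_column([1, 2], 4))
```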
Old Yesterday, 09:07   #56
LaBodilsen
Registered User

 
Join Date: Dec 2017
Location: Gandrup / Denmark
Posts: 26
Quote:
Originally Posted by britelite View Post
The usual preview is available here
I've had a chance to view the new preview, and the speed is simply amazing. But I'm not particularly fond of the mipmap artifacts, as they really degrade the visual quality.

Would it make much of a performance difference if the mipmapping was done differently? Instead of upscaling the smaller levels, downscale the larger level until the next mipmap level is reached, and then compensate for the performance loss by using longwords where possible.

E.g.: for wall heights 64-33 use the full-size texture, for 32-17 use the first mipmap level texture, and for 16-0 use the lowest mipmap level texture. (Hope I make sense.)
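The selection rule from that example, as a tiny Python sketch. It assumes three texture levels of 64, 32 and 16 pixels; `mip_level_for_height` is just a name picked for this example.

```python
def mip_level_for_height(wall_height):
    """Pick the smallest texture level that is at least as tall as the wall,
    so the texture is only ever scaled down, never up.

    Level 0 = 64px (full size), level 1 = 32px, level 2 = 16px.
    """
    if wall_height > 32:
        return 0
    if wall_height > 16:
        return 1
    return 2


for height in (64, 33, 32, 17, 16, 1):
    print(height, mip_level_for_height(height))
```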

I wouldn't mind if I had to manually hand-optimize every wall height below 64 pixels, as it's no more than 32 code segments.
Old Yesterday, 09:22   #57
britelite
Registered User
 
Join Date: Feb 2010
Location: Espoo / Finland
Posts: 322
Quote:
Originally Posted by LaBodilsen View Post
But I'm not particularly fond of the mipmap artifacts, as they really degrade the visual quality.
I agree that the visual quality is degraded, but would it really matter during gameplay?

Quote:
Would it make much of a performance difference if the mipmapping was done differently? Instead of upscaling the smaller levels, downscale the larger level until the next mipmap level is reached, and then compensate for the performance loss by using longwords where possible.
It would add 4 cycles to every pixel drawn with downscaling instead of upscaling, so the best case scenarios (when the walls are smaller and have fewer pixels to draw) would take a hit, but the worst case (when drawing full height strips) would remain the same. Mipmapping could also be added as an option in a game, so the player could choose between slightly better framerate or more precision.
Old Yesterday, 10:37   #58
LaBodilsen
Registered User

 
Join Date: Dec 2017
Location: Gandrup / Denmark
Posts: 26
Quote:
Originally Posted by britelite View Post
It would add 4 cycles to every pixel drawn with downscaling instead of upscaling, so the best case scenarios (when the walls are smaller and have fewer pixels to draw) would take a hit, but the worst case (when drawing full height strips) would remain the same.
Would it add 4 cycles in all cases if you used the approach that you suggested in the first post, for walls smaller than the texture size?

Code:
...
move.w (a1)+,(a0)+
move.w (a1)+,(a0)+
addq.l #2,a1 ; skipping an additional word
move.w (a1)+,(a0)+
move.w (a1)+,(a0)+
...
Of course, the closer we get to "move, add, move, add", the less sense it would make.

Quote:
Mipmapping could also be added as an option in a game, so the player could choose between slightly better framerate or more precision.
I like that, it would be a great way for people to choose what is more important for them.

Last edited by LaBodilsen; Yesterday at 10:48.
Old Yesterday, 11:11   #59
LaBodilsen
Registered User

 
Join Date: Dec 2017
Location: Gandrup / Denmark
Posts: 26
Just tried some rough calculations of the cycle count, to see if using downscaling with longwords could be worth it. I assume you upscale the mipmap like below.

Mipmap upscale:
Code:
move.w	(a1),(a0)+		; 12 cycles
move.w	(a1)+,(a0)+		; 12 cycles
move.w	(a1),(a0)+		; 12 cycles
move.w	(a1)+,(a0)+		; 12 cycles
= 48 cycles
Mipmap downscale:
Code:
move.w	(a1)+,(a0)+		; 12 cycles
move.w	(a1)+,(a0)+		; 12 cycles
addq.l	#2,A1			; 8 cycles
move.w	(a1)+,(a0)+		; 12 cycles
move.w	(a1)+,(a0)+		; 12 cycles
= 56 cycles
Mipmap downscale with longwords:
Code:
move.l	(a1)+,(a0)+		; 20 cycles
addq.l	#4,A1			; 8 cycles
move.l	(a1)+,(a0)+		; 20 cycles
= 48 cycles
So in some cases downscaling with longwords would be as fast as mipmap upscaling*, and downscaling with words would in some cases be only 8 cycles slower per 4 pixels. Or am I missing the point here? Of course, using longwords with mipmap upscaling would in some cases be faster still.
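The same tally as a small Python calculation, for anyone who wants to try other instruction mixes. It takes addq.l #x,An as 8 cycles, following the correction mentioned in the edit note of this post; the `CYCLES` table only lists the instructions used here and `total_cycles` is a made-up helper name.

```python
# 68000 cycle counts for the instructions used in the three variants.
CYCLES = {
    "move.w (a1),(a0)+": 12,
    "move.w (a1)+,(a0)+": 12,
    "move.l (a1)+,(a0)+": 20,
    "addq.l #x,a1": 8,
}


def total_cycles(sequence):
    """Sum the cycle counts of a straight-line instruction sequence."""
    return sum(CYCLES[instruction] for instruction in sequence)


upscale = ["move.w (a1),(a0)+", "move.w (a1)+,(a0)+"] * 2
downscale_words = (["move.w (a1)+,(a0)+"] * 2
                   + ["addq.l #x,a1"]
                   + ["move.w (a1)+,(a0)+"] * 2)
downscale_longs = ["move.l (a1)+,(a0)+", "addq.l #x,a1", "move.l (a1)+,(a0)+"]

for name, seq in (("upscale", upscale),
                  ("downscale, words", downscale_words),
                  ("downscale, longwords", downscale_longs)):
    print(name, total_cycles(seq))
```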

Just an idea to take advantage of this:
for walls between 63 - 48 use downscaling of 64px texture with longwords where possible, and for 47-33 use mipmap upscaling of 32px texture with longwords where possible.

As mentioned, I wouldn't mind if I had to manually hand-optimize every wall height below 64 pixels.

EDIT: *Changed the cycle count after Toni corrected me

Last edited by LaBodilsen; Yesterday at 13:32. Reason: Changed the cycle count after Tony corrected me
Old Yesterday, 13:25   #60
Toni Wilen
WinUAE developer
 
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 42
Posts: 20,174
addq.l #x,an is 8 cycles. (memory access + 4 idle cycles)