13 January 2018, 14:45 | #41 | |
Registered User
Join Date: Dec 2017
Location: Denmark
Posts: 179
|
Quote:
what is currently happening:
Code:
copy TexturePixel1 to ScreenPixel1
copy TexturePixel2 to ScreenPixel2
copy TexturePixel3 to ScreenPixel3
etc
Code:
Read ScreenPixel1
Compare ScreenPixel1 to TexturePixel1
IF not equal: Write TexturePixel1 to ScreenPixel1
If equal:
Read ScreenPixel2
Compare ScreenPixel2 to TexturePixel2
IF not equal: Write TexturePixel2 to ScreenPixel2
If equal:
Read ScreenPixel3
Compare ScreenPixel3 to TexturePixel3
IF not equal: Write TexturePixel3 to ScreenPixel3
If equal etc.....
Last edited by LaBodilsen; 13 January 2018 at 19:46. |
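A rough way to see why the compare-first variant loses: count memory accesses for both loops. This is only a sketch; the function names and the "one access per read or write" cost model are assumptions for illustration, and the compare and branch time is ignored entirely (which flatters the compare variant).

```python
# Hypothetical cost model: every pixel read or write counts as one memory access.
# Compare/branch time is ignored, which already favours the compare variant.

def copy_accesses(texture, screen):
    """Plain copy: one texture read and one screen write per pixel."""
    accesses = 0
    for i, pixel in enumerate(texture):
        screen[i] = pixel
        accesses += 2  # read texture pixel + write screen pixel
    return accesses


def compare_accesses(texture, screen):
    """Compare first: read both pixels, write only when they differ."""
    accesses = 0
    for i, pixel in enumerate(texture):
        accesses += 2  # read screen pixel + read texture pixel
        if screen[i] != pixel:
            screen[i] = pixel
            accesses += 1  # conditional write
    return accesses
```

Even when every pixel already matches, the compare loop touches memory as often as the plain copy, so under this model it can only break even or lose.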
|
13 January 2018, 18:31 | #42 |
Registered User
Join Date: Nov 2015
Location: Vaasa, Finland
Posts: 525
|
Thanks for the explanation. Indeed that makes my idea look quite slow. So back to the drawing board.
Although in theory the idea is fascinating, because the previous frame always approximately contains the same image that we are going to draw next. So if the old data could be somehow used to build the current frame, without too many reads and comparisons, then something like this might be useful. But right now it's just an interesting theory, and I don't know how to make it work. |
13 January 2018, 21:27 | #43 | |
Registered User
Join Date: Dec 2014
Location: germany
Posts: 440
|
Quote:
You might think of something like video motion compensation, where blocks of pixels are moved to approximate the next frame. But that's not possible here: predicting the next frame would be much slower than just rendering it.
What could work: there are sometimes large parts of the image that are not textured (floor and ceiling). One could divide the screen into 16- or 32-pixel-wide vertical stripes, determine the maximal distance from the center in y for each stripe, and do c2p and clearing only on that part.
But there are a number of disadvantages:
Depending on your chunky buffer layout, this might be rather hard to do.
You have to call the blitter for c2p much more often (n times for every stripe, instead of n times for the whole image), which introduces some overhead: more if you use interrupts, less if you do copper waits, but in the latter case you have to be careful with frame boundaries (the copper list is restarted at vertical blank).
It improves only some cases (lots of floor and ceiling), but not all, so the frame rate could be quite unsteady, albeit higher on average. Probably not that desirable for a game.
On the other hand, if you have a lot of columns with a high zoom factor (so close to a wall), you might in general see fewer enemies and other sprites, which speeds up rendering and could compensate for this.
Last edited by chb; 13 January 2018 at 21:51. Reason: clarification |
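The stripe idea above could be sketched like this. It assumes the renderer already knows the wall height of each screen column and that walls are centered vertically; `stripe_row_ranges` and its parameters are hypothetical names, not anything from the actual renderer.

```python
def stripe_row_ranges(wall_heights, screen_height, stripe_width=16):
    """Return one (top, bottom) row range per vertical stripe.

    wall_heights: wall height in pixels for every screen column (assumed input).
    The range is half-open: rows top .. bottom-1 need c2p and clearing.
    """
    center = screen_height // 2
    ranges = []
    for start in range(0, len(wall_heights), stripe_width):
        tallest = max(wall_heights[start:start + stripe_width])
        # Maximal distance from the vertical center, clamped to the screen.
        half = min((tallest + 1) // 2, center)
        ranges.append((center - half, center + half))
    return ranges
```

A stripe that contains even one close-up column falls back to the full screen height, which is why the gain depends so much on the scene.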
|
14 January 2018, 10:59 | #44 |
Registered User
Join Date: Nov 2015
Location: Vaasa, Finland
Posts: 525
|
I thought a little bit more about my idea, and I came up with a version that doesn't need any data reads or pixel comparisons.
So we would start with the same thing: no screen clearing and the previous frame is always preserved. But when drawing the next frame, we only draw every other column. So if screen has 320 columns, we only update 160 of them. This results in a screen where every other column is new and every other is from the previous frame. And when the next frame arrives then we change the draw order: if in previous frame we drew columns 1, 3 and 5 then in this frame we draw columns 2, 4 and 6.
So this method would surely be faster: no screen clearing, and only need to raycast half of the columns each frame. However, I'm not sure if this would look good. But because every other column would be 100% right, and the other half would be "approximately right", then maybe it could look acceptable?
---
Also, a more advanced version of this could be something like this: we draw every column, but skip every other pixel in the columns. So in each column only every other pixel would be updated, and the rest would be from the previous frame. And for the next column we change the order, so if in the current column we updated pixels 1, 3 and 5 then in the next column we update pixels 2, 4 and 6. So the resulting screen would have a "chessboard pattern" of new and old pixels. And when the frame changes, then we switch the pixel updating order in the columns, so that the old and new pixels alternate every frame.
But again, would this look good? Don't know.
Last edited by Master484; 14 January 2018 at 11:05. |
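Both variants are easy to model on plain 2D lists of pixel values. This is a minimal sketch with made-up function names; columns and rows are 0-based here, where the post counts from 1.

```python
def interleave_columns(previous, current, frame_index):
    """Keep the old frame and redraw only the columns whose parity matches the frame."""
    result = [row[:] for row in previous]
    for row_out, row_new in zip(result, current):
        for x in range(frame_index % 2, len(row_new), 2):
            row_out[x] = row_new[x]
    return result


def checkerboard(previous, current, frame_index):
    """Redraw every other pixel, shifting the pattern per row and per frame."""
    result = [row[:] for row in previous]
    for y, row_new in enumerate(current):
        for x in range(len(row_new)):
            if (x + y + frame_index) % 2 == 0:
                result[y][x] = row_new[x]
    return result
```

Passing the returned frame back in as `previous` for the next frame reproduces the "no screen clearing" behaviour described in the post.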
14 January 2018, 11:38 | #45 | |
Going nowhere
Join Date: Oct 2001
Location: United Kingdom
Age: 50
Posts: 9,020
|
Quote:
|
|
14 January 2018, 14:09 | #46 |
Registered User
Join Date: Nov 2006
Location: Stockholm, Sweden
Posts: 237
|
@master484 How about you make proof-of-concepts of your ideas? If it is just a question of "what will the end result look like" and not "how quick will it run on the a500" then you don't need to implement a full Wolf3d renderer yourself; you can simulate the various ideas by doing video processing offline on a set of frame captures. (For source data, you could run wolf3d_v2 in an amiga emulator and do a video capture.) That way you will better understand which ideas are worth sharing, and that in turn saves time for others in the thread.
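An offline test of that kind could look roughly like the sketch below. It assumes the captured frames have already been loaded as 2D lists of grayscale values (the loading step is left out), and `evaluate`, `mean_abs_error` and the `compose` callback are hypothetical names. The error number is only a crude stand-in for actually watching the result.

```python
def mean_abs_error(shown, reference):
    """Average per-pixel difference between what is shown and the true frame."""
    total = sum(
        abs(a - b)
        for row_s, row_r in zip(shown, reference)
        for a, b in zip(row_s, row_r)
    )
    count = sum(len(row) for row in reference)
    return total / count


def evaluate(frames, compose):
    """Replay captured frames through compose(previous_shown, new_frame, index).

    Returns one error value per frame after the first.
    """
    shown = frames[0]
    errors = []
    for index, frame in enumerate(frames[1:], start=1):
        shown = compose(shown, frame, index)
        errors.append(mean_abs_error(shown, frame))
    return errors
```

Any partial-update scheme can be plugged in as `compose`, so several ideas can be compared on the same capture.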
|
14 January 2018, 16:47 | #47 |
Registered User
Join Date: Feb 2010
Location: Espoo / Finland
Posts: 821
|
I have to agree with Kalms here, posting random suggestions without displaying any knowledge of the subject at hand is better suited for the other Wolf3d-thread.
|
14 January 2018, 18:01 | #48 |
Registered User
Join Date: Nov 2015
Location: Vaasa, Finland
Posts: 525
|
Ok, I'll exit the thread and leave it to you guys.
I just thought to share these ideas, as they seemed good to me. But agreed, maybe it would be better to make some kind of proof-of-concept demo first. I do have a book about raycasting, so maybe I'll try to cook something up with Blitz if I have time. |
17 January 2018, 21:13 | #49 |
Registered User
Join Date: Feb 2010
Location: Espoo / Finland
Posts: 821
|
Alright, had some spare time to tinker with the rendering again, and now the stream runs at a nice steady 25fps (around 1.6-1.9 frames). The rendering is now done horizontally, with the previously added double pixel blitter pass modified to also rotate the buffer back 90 degrees.
There's still room for improvement in the wall rendering (for example, the code slices could be modified to draw longwords where possible). But I think the next step would be to make the raycasting real time for interactivity, although this will probably not be happening any time soon. The usual preview is available here |
18 January 2018, 08:19 | #50 | |||||
Registered User
Join Date: Dec 2017
Location: Denmark
Posts: 179
|
Quote:
Quote:
Quote:
Quote:
If we could make a kinda framework for a game engine, then maybe others would love to create a game (or port) with that. Quote:
|
|||||
18 January 2018, 08:29 | #51 | |||
Registered User
Join Date: Feb 2010
Location: Espoo / Finland
Posts: 821
|
Quote:
Quote:
Quote:
|
|||
18 January 2018, 10:28 | #52 | |
Registered User
Join Date: Dec 2014
Location: germany
Posts: 440
|
Wow, running in 1.6 -1.9 frames? Totally awesome! Congratulations, great achievement.
Quote:
EDIT: The neogeo dev wiki has a nice table with instruction/addressing modes timings, AFAICS it's from the official 68000 manual, but with nicer formatting: https://wiki.neogeodev.org/index.php...ctions_timings
EDIT2: *Hmm, does not seem to be true according to the table I linked...
Last edited by chb; 18 January 2018 at 10:42. |
|
18 January 2018, 11:51 | #53 |
Registered User
Join Date: Dec 2017
Location: Denmark
Posts: 179
|
@chb
Makes sense, so at least some gain could be made by using longwords. Maybe not much, but for a game like this running on an A500, we would need to make use of all the tricks available.
@Britelite Do you have any good ideas for sprites? I would see that as the next performance killer. |
18 January 2018, 12:16 | #54 | |
Registered User
Join Date: Aug 2014
Location: Netherlands
Posts: 701
|
Quote:
About the raycaster: all tutorials I have seen state that you need to cast a ray for every horizontal pixel in the projection plane, which in this case is 160 pixels, right? So 160 rays to be cast. I was thinking: if the scene is simple, most rays would hit the same wall. What if you cast every other ray (only even rays), and when two rays hit the same wall, just interpolate the ray in between? Might save a few cycles. You could do it even dirtier by only calculating every fourth ray or so and, when they hit the same wall, interpolating the other 3 rays. |
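A minimal sketch of that interpolation idea, assuming a hypothetical `cast_ray(x)` that returns a `(wall_id, height)` pair for screen column `x`. Only the wall height is interpolated here. The texture column would need separate handling, and a thin object between two cast rays would be missed, so treat it as an illustration rather than a drop-in solution.

```python
def wall_heights_sparse(cast_ray, num_columns, step=2):
    """Cast every `step`-th column and fill the gaps by interpolation.

    cast_ray(x) -> (wall_id, height) is an assumed interface.
    Gaps are interpolated linearly when both neighbouring rays hit the same
    wall; otherwise the skipped columns are cast normally.
    """
    results = [None] * num_columns
    anchors = list(range(0, num_columns, step))
    if anchors[-1] != num_columns - 1:
        anchors.append(num_columns - 1)  # always cast the last column
    for x in anchors:
        results[x] = cast_ray(x)
    for left, right in zip(anchors, anchors[1:]):
        wall_l, h_l = results[left]
        wall_r, h_r = results[right]
        for x in range(left + 1, right):
            if wall_l == wall_r:
                t = (x - left) / (right - left)
                results[x] = (wall_l, h_l + (h_r - h_l) * t)
            else:
                results[x] = cast_ray(x)  # wall changes somewhere in the gap
    return results
```

With `step=4` this becomes the "even dirtier" variant, where up to three columns are filled in between each pair of cast rays.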
|
18 January 2018, 12:28 | #55 |
Registered User
Join Date: Feb 2010
Location: Espoo / Finland
Posts: 821
|
One idea would be to check the zoom factor of a visible sprite and depending on factor choose one of two different methods of drawing. If the sprite is small enough then just scale the sprite/mask directly to the buffer with CPU, but if a sprite is larger, then scale the sprite/mask vertically to a separate buffer with CPU and then draw it to the screenbuffer with the blitter, expanding it horizontally in the process.
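The two drawing paths can be modelled with nearest-neighbour scaling on 2D lists. This is only a sketch: the threshold value and all names are assumptions, and the second pass of `scale_two_pass` is a plain Python stand-in for what the blitter would do.

```python
def scale_axis(length, zoom):
    """Nearest-neighbour source index for every destination pixel on one axis."""
    return [int(i / zoom) for i in range(int(length * zoom))]


def scale_direct(sprite, zoom):
    """Small sprites: scale both axes in one CPU pass."""
    rows = scale_axis(len(sprite), zoom)
    cols = scale_axis(len(sprite[0]), zoom)
    return [[sprite[r][c] for c in cols] for r in rows]


def scale_two_pass(sprite, zoom):
    """Large sprites: vertical scale into a separate buffer, then expand horizontally."""
    rows = scale_axis(len(sprite), zoom)
    tall = [sprite[r][:] for r in rows]  # CPU pass into a separate buffer
    cols = scale_axis(len(sprite[0]), zoom)
    return [[row[c] for c in cols] for row in tall]  # stand-in for the blitter pass


def draw_sprite(sprite, zoom, threshold=1.0):
    """Pick the path by zoom factor; the threshold is a made-up tuning value."""
    if zoom <= threshold:
        return scale_direct(sprite, zoom)
    return scale_two_pass(sprite, zoom)
```

Both paths give the same pixels, so the choice between them is purely about which one is cheaper at a given zoom factor.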
|
19 January 2018, 08:07 | #56 | |
Registered User
Join Date: Dec 2017
Location: Denmark
Posts: 179
|
Quote:
Would it make much of a performance difference if the mipmapping was done differently? Instead of upscaling the smaller levels, downscale the larger level down to the next mipmap size, and then compensate for the performance loss by using longwords where possible. Ex: for wall heights between 64-33 use the full-size texture, for 32-17 use the first mipmap level texture, and for 16-0 use the lowest mipmap level texture (hope I make sense). I wouldn't mind if I had to manually hand-optimize every wall height below 64 pixels, as it's no more than 32 code segments. |
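The height ranges from the example map to mipmap levels as in the sketch below. The function names are hypothetical, and treating walls taller than 64 pixels as level 0 is an assumption the post does not cover.

```python
def mip_level_for_height(wall_height):
    """Pick the mipmap level so the source is never smaller than the wall.

    64-33 -> level 0 (64 px), 32-17 -> level 1 (32 px), 16-0 -> level 2 (16 px).
    Heights above 64 also use level 0 (assumption).
    """
    if wall_height > 32:
        return 0
    if wall_height > 16:
        return 1
    return 2


def texture_size(level):
    """Texture height in pixels for a mipmap level, halving from 64."""
    return 64 >> level
```

With this mapping the texture is always at least as tall as the wall slice for heights up to 64, so every such slice is a pure downscale.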
|
19 January 2018, 08:22 | #57 | ||
Registered User
Join Date: Feb 2010
Location: Espoo / Finland
Posts: 821
|
Quote:
Quote:
|
||
19 January 2018, 09:37 | #58 | ||
Registered User
Join Date: Dec 2017
Location: Denmark
Posts: 179
|
Quote:
Code:
...
move.w (a1)+,(a0)+
move.w (a1)+,(a0)+
addq.l #2,a1 ; skipping an additional word
move.w (a1)+,(a0)+
move.w (a1)+,(a0)+
...
Quote:
Last edited by LaBodilsen; 19 January 2018 at 09:48. |
||
19 January 2018, 10:11 | #59 |
Registered User
Join Date: Dec 2017
Location: Denmark
Posts: 179
|
Just tried some rough calculations of the cycle count, to see if downscaling with longwords could be worth it. I assume you upscale the mipmap like below.
Mipmap upscale: Code:
move.w (a1),(a0)+ ; 12 cycles
move.w (a1)+,(a0)+ ; 12 cycles
move.w (a1),(a0)+ ; 12 cycles
move.w (a1)+,(a0)+ ; 12 cycles
; total = 48 cycles
Code:
move.w (a1)+,(a0)+ ; 12 cycles
move.w (a1)+,(a0)+ ; 12 cycles
addq.l #2,A1 ; 8 cycles
move.w (a1)+,(a0)+ ; 12 cycles
move.w (a1)+,(a0)+ ; 12 cycles
; total = 56 cycles
Code:
move.l (a1)+,(a0)+ ; 20 cycles
addq.l #4,A1 ; 8 cycles
move.l (a1)+,(a0)+ ; 20 cycles
; total = 48 cycles
Just an idea to take advantage of this: for walls between 63-48 use downscaling of the 64px texture with longwords where possible, and for 47-33 use mipmap upscaling of the 32px texture with longwords where possible. As mentioned, I wouldn't mind if I had to manually hand-optimize every wall height below 64 pixels.
EDIT: *Changed the cycle count after Toni corrected me
Last edited by LaBodilsen; 19 January 2018 at 12:32. Reason: Changed the cycle count after Tony corrected me |
19 January 2018, 12:25 | #60 |
WinUAE developer
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,574
|
addq.l #x,an is 8 cycles. (memory access + 4 idle cycles)
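Tallying the three sequences from the earlier post with the 8-cycle figure for addq.l on an address register gives the totals below. The per-instruction numbers are the ones quoted in this thread; the table and function names are just for illustration.

```python
# Cycle counts as quoted in the thread (68000, no wait states assumed).
CYCLES = {
    "move.w (a1),(a0)+": 12,
    "move.w (a1)+,(a0)+": 12,
    "move.l (a1)+,(a0)+": 20,
    "addq.l #n,a1": 8,
}

UPSCALE_WORDS = ["move.w (a1),(a0)+", "move.w (a1)+,(a0)+"] * 2
DOWNSCALE_WORDS = (
    ["move.w (a1)+,(a0)+"] * 2 + ["addq.l #n,a1"] + ["move.w (a1)+,(a0)+"] * 2
)
DOWNSCALE_LONGS = ["move.l (a1)+,(a0)+", "addq.l #n,a1", "move.l (a1)+,(a0)+"]


def total_cycles(sequence):
    """Sum the quoted cycle counts for a list of instructions."""
    return sum(CYCLES[instruction] for instruction in sequence)


if __name__ == "__main__":
    for name, seq in [
        ("upscale (words)", UPSCALE_WORDS),
        ("downscale (words)", DOWNSCALE_WORDS),
        ("downscale (longwords)", DOWNSCALE_LONGS),
    ]:
        print(name, total_cycles(seq))
```

All three sequences write four words to the destination, so by these numbers the longword downscale costs the same as the word upscale, while the word downscale is the slowest of the three.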
|