English Amiga Board


21 February 2024, 14:52   #101
agermose
Registered User
Join Date: Nov 2019
Location: Odense / Denmark
Posts: 251
Quote:
Originally Posted by reassembler
I've been having a really really wild time optimizing sprite routines. Oh boy.

Some highlights:
1/ All 15 Stages are in and rendering correctly with original level data. You can progress through them, via the road split and cursor keys. The engine and the optimizations I've made seem technically sound. All the levels should be pixel perfect with the original in terms of layout, road height maps, curves, sprite layout.

2/ The sprite routines are now much, much faster: by a factor of 2-3x. I split the sprite rendering into 6 custom routines that can each make a variety of assumptions about how the sprite is going to be rendered. This code is a nightmare to maintain, but it's as optimized as my brain can handle for now. Only a new paradigm could improve things. Or just culling scenery.

3/ I optimized some of the original OutRun codebase. To be honest, it's not the critical path in terms of performance, but there were inefficiencies in the most called sprite routines. I guess I'm the first person to optimize the original code since 1986, so that's kind of fun and satisfying.

4/ In a move that somewhat surprised even me, I've mostly got shadow translucency working, with a solution that's no slower than rendering a sprite - well, it's faster than rendering the blobs I had previously! So yes, the engine can assign a shadow to a sprite and it will darken the underlying pixels of the road layer, as the hardware does. I'm actually using the correct darkening values, derived from hardware analysis. I'm not sure if MAME is still buggy, but its shadows were way too bright for many years. Let's make the Amiga version better!

There is an alternate shadow mode that OutRun also uses. I don't think I'll be able to support it, as it would require a per-pixel rendering check that would tank performance. But it is less visible and used to minor effect in comparison.

5/ I optimized my road code further still. It's funny - you think there's nothing left, and then you look at it another day and figure out a set of ideas. It's marginal gains at this point.
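
Point 4 describes darkening the road pixels underneath a shadow. Assuming the renderer draws into a chunky buffer (one palette index per pixel) before converting to bitplanes, one way to get this at roughly the cost of drawing a sprite is a palette-remap lookup: each pixel covered by the shadow is replaced by the index of a pre-darkened copy of its colour. This is a sketch under assumptions, not the port's actual code: the palette layout (darkened copy of colour `i` stored at `i + shadow_base`), the helper names and the 0.5 darkening factor are all hypothetical, whereas the post says the real values come from hardware analysis.

```python
# Hypothetical shadow-darkening sketch for a chunky (1 byte per pixel) buffer.
# Assumed palette layout: colours [0, shadow_base) are normal and the darkened
# copy of colour i lives at i + shadow_base.


def darken_palette(palette, factor):
    """Return darkened copies of (r, g, b) entries. `factor` is an assumed 0..1 scale."""
    return [tuple(int(c * factor) for c in rgb) for rgb in palette]


def make_shadow_lut(num_colours, shadow_base):
    """Map each palette index to its darkened twin.

    Indices already in the darkened bank map to themselves, so overlapping
    shadows do not darken a pixel twice.
    """
    return [i + shadow_base if i < shadow_base else i for i in range(num_colours)]


def apply_shadow(framebuffer, width, x0, y0, mask_rows, lut):
    """Remap framebuffer pixels covered by a shadow mask through `lut`.

    mask_rows is a list of rows of 0/1 flags (1 = covered). Like drawing a
    sprite, this is one read, one table lookup and one write per covered pixel.
    """
    for dy, row in enumerate(mask_rows):
        base = (y0 + dy) * width + x0
        for dx, covered in enumerate(row):
            if covered:
                framebuffer[base + dx] = lut[framebuffer[base + dx]]


# Usage: a 4x2 buffer whose road colours are indices 0-3.
road_palette = [(200, 100, 50), (120, 120, 120), (60, 180, 60), (30, 30, 200)]
full_palette = road_palette + darken_palette(road_palette, 0.5)
lut = make_shadow_lut(len(full_palette), len(road_palette))
fb = [0, 1, 2, 3,
      0, 1, 2, 3]
apply_shadow(fb, 4, 0, 1, [[0, 1, 1, 0]], lut)
print(fb)  # [0, 1, 2, 3, 0, 5, 6, 3]
```

Spending half the palette on darkened copies is the usual price of this approach; it trades colours for a per-pixel cost that is just a table lookup.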

Overall, I think this is the most complex optimization and porting project I've ever undertaken. Whilst I don't want to create a false sense of hope, the sprite scaling and scenery rendering were the areas I was most concerned about. And they're running faster now.

I'm away from my Amiga at the moment, so I haven't checked speeds again on hardware. I'll post a video next week. I've attached a few picture postcards. Enjoy. And any coding questions welcome. There's a lot of detail I could go into...

PS Thanks to Agermose for nudging me to turn the black border on!
About point 3. I'd be interested in what those optimisations were. I have made my own optimisations of the original code. Mostly related to how memory is accessed, and various 68020 optimisations. Those are on my critical path, when running on chip mem only :-)
21 February 2024, 15:43   #102
S0ulA55a551n
Registered User
Join Date: Nov 2010
Location: South Wales
Age: 47
Posts: 947
Quote:
Originally Posted by reassembler

(Disclosure: I own two STs)
And you were doing so well up until this point
21 February 2024, 23:27   #103
reassembler
Registered User
Join Date: Oct 2023
Location: London, UK
Posts: 124
Quote:
Originally Posted by agermose
About point 3. I'd be interested in what those optimisations were. I have made my own optimisations of the original code. Mostly related to how memory is accessed, and various 68020 optimisations. Those are on my critical path, when running on chip mem only :-)
In my case, I think the biggest gains have been / will be focusing on how the original codebase interfaced with hardware. And how it should actually interface with the new graphics routines coded for the Amiga. For instance, the original codebase performs a lot of shifting and manipulation to get the data in the format expected by the Sega hardware. Rethinking this could yield considerable gains.

The sprites are the area I focused on most closely with this. That being said, I'm wary of over-optimizing the original code at this point - any bugs I accidentally introduce will just make development slower and debugging harder. And in my case, the original 68k logic is still pretty quick compared with the software sprite emulation/code.

I had a quick pass tonight at replacing some of the 68000 instructions with 020/030 equivalents, especially using scaled indexing to access memory. This in itself wasn't so important, but it allowed me in a couple of cases to free up a register or so in some very intense routines. If you're doing add.w d0,d0-type stuff for indexing, you end up trashing the original register value, so a backup copy was sometimes needed before, whereas now it isn't.
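
The register saving described above comes from the 68020's scaled-index addressing mode. The toy model below (Python standing in for 68k, with registers held in a dict and a word table keyed by byte address) contrasts the two sequences. It is an illustration only and ignores details such as the sign extension of `d0.w`; the table contents are made up.

```python
# Toy model of indexing a table of 16-bit words on the 68000 versus the 68020+.
# Registers are a dict; `table` maps byte addresses to word values.


def lookup_68000(regs, table):
    # add.w d0,d0           ; double the index in place (original index is lost)
    regs["d0"] = (regs["d0"] + regs["d0"]) & 0xFFFF
    # move.w 0(a0,d0.w),d1
    regs["d1"] = table[regs["a0"] + regs["d0"]]


def lookup_68020(regs, table):
    # move.w (a0,d0.w*2),d1 ; scaling happens during address calculation
    regs["d1"] = table[regs["a0"] + regs["d0"] * 2]


table = {0x1000 + 2 * i: i * 10 for i in range(8)}  # hypothetical word table

old = {"a0": 0x1000, "d0": 3, "d1": 0}
lookup_68000(old, table)
new = {"a0": 0x1000, "d0": 3, "d1": 0}
lookup_68020(new, table)
print(old["d1"], old["d0"])  # 30 6
print(new["d1"], new["d0"])  # 30 3
```

Both read the same entry, but on the 68000 a routine that still needs the unscaled index afterwards has to copy `d0` into a spare register first; the scaled form makes that copy unnecessary.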

I gained an 8% overall speed increase in sprite rendering from this work, and a 20% increase in road rendering - although some of the road boost was down to fine-tuning my (bad) code. I probably need to stop trying to optimize things now and port some more features over next.

I also raised a VSCode/Amiga Assembly query here if anyone knows the answer:
https://github.com/prb28/vscode-amig...iscussions/301
21 February 2024, 23:30   #104
Karlos
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,775
Optimise a man's code and he'll see a performance gain for a single application.

Teach a man to optimise and he'll never finish anything ever again
21 February 2024, 23:38   #105
gimbal
cheeky scoundrel
Join Date: Nov 2004
Location: Spijkenisse/Netherlands
Age: 43
Posts: 7,029
Quote:
Originally Posted by Karlos
Optimise a man's code and he'll see a performance gain for a single application.

Teach a man to optimise and he'll never finish anything ever again
If the latter is true, who is optimizing a man's code then?
21 February 2024, 23:44   #106
Karlos
Alien Bleed
Join Date: Aug 2022
Location: UK
Posts: 4,775
Quote:
Originally Posted by gimbal
If the latter is true, who is optimizing a man's code then?
Someone who has learned only to apply his skill to code he didn't write in the first place. When it's your own baby, you just can't help tickling it, can you?
22 February 2024, 14:43   #107
agermose
Registered User
Join Date: Nov 2019
Location: Odense / Denmark
Posts: 251
Quote:
Originally Posted by reassembler
In my case, I think the biggest gains have been / will be focusing on how the original codebase interfaced with hardware. And how it should actually interface with the new graphics routines coded for the Amiga. For instance, the original codebase performs a lot of shifting and manipulation to get the data in the format expected by the Sega hardware. Rethinking this could yield considerable gains.

The sprites are the area I focused on most closely with this. That being said, I'm wary of over-optimizing the original code at this point - any bugs I accidentally introduce will just make development slower and debugging harder. And in my case, the original 68k logic is still pretty quick compared with the software sprite emulation/code.

I had a quick pass tonight at replacing some of the 68000 instructions with 020/030 equivalents, especially using scaled indexing to access memory. This in itself wasn't so important, but it allowed me in a couple of cases to free up a register or so in some very intense routines. If you're doing add.w d0,d0-type stuff for indexing, you end up trashing the original register value, so a backup copy was sometimes needed before, whereas now it isn't.

I gained an 8% overall speed increase in sprite rendering from this work, and a 20% increase in road rendering - although some of the road boost was down to fine-tuning my (bad) code. I probably need to stop trying to optimize things now and port some more features over next.

I also raised a VSCode/Amiga Assembly query here if anyone knows the answer:
https://github.com/prb28/vscode-amig...iscussions/301
Tossing the various copies to hardware RAM was the first thing I did; big improvement there.
Scaled indexing by 2 or 4: also done.
My next planned step is address-register-relative addressing for the main CPU code (similar to how the road CPU code uses a5; I have freed up a3 to do this). It saves a few cycles on every memory access, but it's a huge task.
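
Assuming "address register indexed" here means accessing variables as a displacement from a base register (`d16(a3)`) instead of by absolute address, the saving comes from two places: the instruction needs one extension word for the displacement instead of two for a 32-bit address, and on a plain 68000 the effective-address calculation for a byte/word operand drops from 12 to 8 clock periods (per the M68000 timing tables; 020/030 timings differ and depend on caching). Part of why it is a big job is that the signed 16-bit displacement only reaches about 32 KB either side of the base register. The helpers below are a hypothetical sketch of a build-time range check; the addresses are made up.

```python
# 68000 effective-address costs for a byte/word operand (M68000 timing tables).
EA_COST_68000 = {
    "abs.l": {"cycles": 12, "extension_words": 2},   # move.w some_var,d0
    "d16(An)": {"cycles": 8, "extension_words": 1},  # move.w some_var-base(a3),d0
}


def d16_offset(var_addr, base_addr):
    """Displacement for accessing var_addr as d16(An) with An = base_addr.

    Raises ValueError when the variable is outside the signed 16-bit window,
    i.e. it cannot be reached from this base register.
    """
    offset = var_addr - base_addr
    if not -0x8000 <= offset <= 0x7FFF:
        raise ValueError(f"{var_addr:#x} is out of d16 range of base {base_addr:#x}")
    return offset


def cycles_saved(accesses):
    """Rough 68000 saving from switching `accesses` word reads from abs.l to d16(An)."""
    per_access = EA_COST_68000["abs.l"]["cycles"] - EA_COST_68000["d16(An)"]["cycles"]
    return accesses * per_access


BASE = 0x60000  # hypothetical address held in a3
print(d16_offset(0x60100, BASE))  # 256
print(cycles_saved(1000))         # 4000
```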

I’m also experiencing the same problem in VSCode with a single-file build.

Last edited by agermose; 22 February 2024 at 15:27.
22 February 2024, 15:57   #108
agermose
Registered User
Join Date: Nov 2019
Location: Odense / Denmark
Posts: 251
Quote:
Originally Posted by reassembler
About 50% of the time in terms of level scenery. Pretty much all scenery needs to be flipped at some stage. My sprite flipping routine is only a few cycles slower than normal sprite rendering so it's not a performance drain.

It's more complicated overall. Sega's sprite hardware can render all sprites from left to right or right to left (both flipped and unflipped). Think of this as x-anchoring. The OutRun engine will actually update the anchoring applied as the screen is scrolled left and right. This is where the D-pad-style movement through the level that I've added comes in useful for testing.

In a further twist, the original hardware performs x-clipping automatically, which needs to be handled in our software interpretation. Y-clipping is handled by the original engine in software. Bear in mind that sprites are completely free-form in size and the larger ones scale beyond screen size. There are no nice assumptions of 'hey, this is a multiple of 8' or whatever!

There are a lot of variables to consider. You can potentially begin to understand why I march all sprites through one of six routines. In fact, the same sprite will often be rendered in multiple ways by the engine I'm writing once it's spawned in the game world:

1/ Is the sprite clipped? No... We can render faster!
2/ It's clipped. What direction are we rendering in (x delta/anchoring)? We can make assumptions and render faster.
3/ Is the sprite horizontally flipped? This determines the read direction of the source data and again uses a custom routine.
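
The three questions above can be read as a small decision tree. One decomposition that happens to yield exactly six routines is two unclipped variants (normal and flipped, assuming the anchor has already been resolved to a left-edge x so direction no longer matters) plus four clipped variants (two render directions times normal/flipped). This is a guess at the structure for illustration only; the post doesn't say how the six are split, and the routine names and the 320-pixel screen width are assumptions.

```python
SCREEN_WIDTH = 320  # assumption: arcade OutRun's 320-pixel-wide display


def is_clipped(x, width, screen_width=SCREEN_WIDTH):
    """True when any part of the sprite's horizontal span is off screen."""
    return x < 0 or x + width > screen_width


def select_routine(clipped, right_to_left, flipped):
    """Pick a hypothetical specialised draw routine for one sprite."""
    if not clipped:
        # Fully on screen: no per-pixel bounds checks are needed, and the
        # anchoring has (by assumption) already been turned into a left x.
        return "draw_unclipped_flipped" if flipped else "draw_unclipped"
    direction = "rtl" if right_to_left else "ltr"
    suffix = "_flipped" if flipped else ""
    return f"draw_clipped_{direction}{suffix}"


routines = {
    select_routine(c, r, f)
    for c in (False, True) for r in (False, True) for f in (False, True)
}
print(len(routines))  # 6
print(select_routine(is_clipped(-16, 64), right_to_left=False, flipped=True))
# draw_clipped_ltr_flipped
```

The benefit of splitting this way is that each routine's inner loop carries no per-pixel tests for conditions that were already settled once per sprite.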

The sprite routines I've written handle the free-form zooming and palette remapping. To do this, I've avoided anything that could be considered slow: division, multiplication and even logical shifts. As much as possible is register-driven.
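
Free-form zooming without per-pixel division or multiplication is classically done with a fixed-point accumulator: the per-sprite step is worked out once (or taken from a table), and the inner loop only adds and compares. The sketch below scales one row of a sprite that way, walking the source backwards for a horizontally flipped sprite. The 16.16 format and the function name are assumptions, clipping and transparency are left out, and this is not a transcription of the port's 68k routines.

```python
FRAC_ONE = 0x10000  # 1.0 in an assumed 16.16 fixed-point format


def scale_row(src, step_int, step_frac, dst_width, flipped=False):
    """Resample one sprite row to dst_width pixels using only adds/compares.

    step_int/step_frac give the source step per destination pixel
    (e.g. 0, 0x8000 = 0.5 source pixels, i.e. a 2x zoom).
    """
    delta = -1 if flipped else 1                  # source read direction
    advance = -step_int if flipped else step_int  # whole-pixel part of the step
    pos = len(src) - 1 if flipped else 0
    frac = 0
    out = []
    for _ in range(dst_width):
        out.append(src[pos])
        frac += step_frac
        if frac >= FRAC_ONE:  # fractional carry: move one extra source pixel
            frac -= FRAC_ONE
            pos += delta
        pos += advance
    return out


row = [1, 2, 3, 4]
print(scale_row(row, 0, 0x8000, 8))                # [1, 1, 2, 2, 3, 3, 4, 4]
print(scale_row(row, 0, 0x8000, 8, flipped=True))  # [4, 4, 3, 3, 2, 2, 1, 1]
print(scale_row(row, 2, 0, 2))                     # [1, 3]
```

On a 68k, the fractional carry can be picked up with an add followed by a branch or an `addx`, which is one way such a loop can stay entirely in registers.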

But ultimately there's a huge amount of sprite data pounding around. Hang On and Space Harrier would be really easy in comparison if I ever moved onto them.
About logical shifts. You are aware the cost is constant on 68020 or better?
On 68k the cost increases with shift count.
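
To put numbers on the shift remark: on a plain 68000 a register shift takes 6 + 2n clock periods for byte/word operands and 8 + 2n for long, where n is the shift count (M68000 instruction timing tables), whereas the 68020 and later have a barrel shifter, so the count no longer changes the cost. The helper below just encodes the 68000 formula; it also shows why `add.w d0,d0` (4 clock periods on a 68000) was the usual way to double a register.

```python
ADD_W_DN_DN = 4  # add.w Dn,Dn on a 68000, in clock periods


def shift_cycles_68000(size, count):
    """Clock periods for a register LSL/LSR/ASL/ASR on a plain 68000."""
    if size in ("b", "w"):
        return 6 + 2 * count
    if size == "l":
        return 8 + 2 * count
    raise ValueError(f"unknown operand size: {size!r}")


for n in (1, 4, 8):
    print(n, shift_cycles_68000("w", n), shift_cycles_68000("l", n))
# 1 8 10
# 4 14 16
# 8 22 24
```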
22 February 2024, 17:43   #109
hooverphonique
ex. demoscener "Bigmama"
Join Date: Jun 2012
Location: Fyn / Denmark
Posts: 1,652
@agermose @reassembler prb28 has provided a solution to the syntax verification step problem in the github issue already, in case you didn't notice.
22 February 2024, 18:11   #110
reassembler
Registered User
Join Date: Oct 2023
Location: London, UK
Posts: 124
Quote:
Originally Posted by agermose
About logical shifts. You are aware the cost is constant on 68020 or better?
On 68k the cost increases with shift count.
Yep.
22 February 2024, 18:11   #111
reassembler
Registered User
Join Date: Oct 2023
Location: London, UK
Posts: 124
Quote:
Originally Posted by hooverphonique
@agermose @reassembler prb28 has provided a solution to the syntax verification step problem in the github issue already, in case you didn't notice.
Yes, that's awesome - thanks very much, I responded in the thread!
23 February 2024, 17:44   #112
agermose
Registered User
Join Date: Nov 2019
Location: Odense / Denmark
Posts: 251
Quote:
Originally Posted by hooverphonique
@agermose @reassembler prb28 has provided a solution to the syntax verification step problem in the github issue already, in case you didn't notice.
Haha, no, I didn’t look. That was fast :-)
24 February 2024, 01:11   #113
eXeler0
Registered User
Join Date: Feb 2015
Location: Sweden
Age: 50
Posts: 2,998
Quote:
Originally Posted by reassembler
As promised, I made a video on hardware.

Also the C2P routine I'm using is here:
https://github.com/Kalmalyzer/kalms-...1_8_c5_030_2.s

Is there a faster one?

[YouTube video]
Holy sh1t, this is starting to look real good.
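
For context on the C2P (chunky-to-planar) step in the quote: the renderer works in a chunky buffer of one palette index per pixel, while AGA displays bitplanes, so each pixel's bits must be scattered across up to eight planes. The linked routine is an optimised implementation aimed at the 68030 (going by its filename); the straightforward reference version below is a hypothetical helper, far too slow for real use, meant only to show what the conversion computes. It could also serve as a checker for a faster routine's output.

```python
def c2p_reference(chunky, width, height, depth=8):
    """Convert row-major chunky pixels into `depth` Amiga-style bitplanes.

    Each plane is a bytearray of width * height // 8 bytes; within a byte,
    bit 7 is the leftmost pixel. Plane p holds bit p of every pixel.
    """
    if width % 8:
        raise ValueError("width must be a multiple of 8")
    plane_size = width * height // 8
    planes = [bytearray(plane_size) for _ in range(depth)]
    for i, pixel in enumerate(chunky[: width * height]):
        byte_index, bit = i // 8, 7 - (i % 8)
        for p in range(depth):
            if (pixel >> p) & 1:
                planes[p][byte_index] |= 1 << bit
    return planes


# One 8-pixel row: pixel 0 = 0x81 (bits 0 and 7), pixel 7 = 0x02 (bit 1).
planes = c2p_reference([0x81, 0, 0, 0, 0, 0, 0, 0x02], 8, 1)
print([hex(p[0]) for p in planes])
# ['0x80', '0x1', '0x0', '0x0', '0x0', '0x0', '0x0', '0x80']
```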
14 March 2024, 00:26   #114
reassembler
Registered User
Join Date: Oct 2023
Location: London, UK
Posts: 124
I've been working on two somewhat dull, but remaining parts of the rendering engine.

1/ The tilemaps.
2/ The palette fades between level transitions.

Given the end result - which isn't really as exciting as the road or sprites - these have taken a surprisingly long time to implement and have been painful to code.

OutRun's hardware has two separate tile layers, the foreground and background layers, which scroll independently. Each layer comprises 4 name tables that can be arranged in a number of different ways. OutRun's arrangement effectively gives a 2,048-pixel-wide, faster-scrolling foreground layer and a 1,536-pixel-wide background layer (used for the giant clouds on Stage 1, for example).

The tilemaps are relatively simple to cache into a giant off-screen image before blitting the visible area to the screen, and that blit partially doubles as a screen clear, which is nice.
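
Blitting the visible part of a pre-rendered, horizontally wrapping layer usually means splitting the copy in two when the window runs past the right edge of the cached image. The sketch below computes those copy rectangles for the 2,048- and 1,536-pixel layers described above. The 320-pixel screen width (the arcade's) and the split-into-two-blits approach are assumptions, since the post doesn't say how the port handles the wrap.

```python
SCREEN_WIDTH = 320        # assumption: arcade OutRun's horizontal resolution
FOREGROUND_WIDTH = 2048   # cached foreground layer width, from the post
BACKGROUND_WIDTH = 1536   # cached background layer width, from the post


def visible_spans(scroll_x, layer_width, screen_width=SCREEN_WIDTH):
    """Return (src_x, dst_x, width) copies covering the screen for a wrapping layer.

    At most two copies are needed as long as screen_width <= layer_width.
    """
    start = scroll_x % layer_width           # Python's % also wraps a negative scroll
    first = min(screen_width, layer_width - start)
    spans = [(start, 0, first)]
    if first < screen_width:                 # window runs off the right edge
        spans.append((0, first, screen_width - first))
    return spans


print(visible_spans(100, FOREGROUND_WIDTH))   # [(100, 0, 320)]
print(visible_spans(1900, FOREGROUND_WIDTH))  # [(1900, 0, 148), (0, 148, 172)]
print(visible_spans(-10, BACKGROUND_WIDTH))   # [(1526, 0, 10), (0, 10, 310)]
```

An alternative is to pad the cached image with an extra screen-width copy of its left edge so a single blit always suffices, trading memory for simpler blitting.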

The difficulty arises when the level transitions and new tilemaps sweep into view dynamically and smoothly. At this point, the palette usage doubles, the number of tilemaps doubles and there's also a need to decode the tilemaps and render them without slowing down the overall experience.

Because of the various optimizations I made to ensure tilemaps were as fast as possible during normal gameplay, this 5 second transition became incredibly tricky.

Anyway - it appears to be working and behaving identically to the arcade. But, my god, for some reason it was just an awful coding experience. So much line by line debugging was needed for this transition.

The palette transitions between stages weren't too bad. This section of code fades the road palette, the sky and the background when transitioning between stages. I pretty much reworked these for AGA, as the originals used Sega's 5-bit-per-channel colour format.
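
Reworking Sega's colours for AGA involves widening each 5-bit channel to AGA's 8 bits, and a stage-to-stage fade can then interpolate between two palettes. The sketch below shows both steps under assumptions: the channels are taken as already unpacked from Sega's palette words (the word layout isn't covered here), bit replication is one common way to widen 5 bits to 8, and the linear fade is illustrative, since the original code's fade curve may differ. The sample colours are made up.

```python
def expand5to8(v):
    """Widen a 5-bit channel (0-31) to 8 bits (0-255) by bit replication."""
    return (v << 3) | (v >> 2)


def to_aga(rgb5):
    """Convert an (r, g, b) tuple of 5-bit channels to 8-bit channels."""
    return tuple(expand5to8(c) for c in rgb5)


def fade_palette(src, dst, step, steps):
    """Linearly blend two palettes of 8-bit (r, g, b) tuples.

    step runs from 0 (all src) to steps (all dst).
    """
    return [
        tuple(a + (b - a) * step // steps for a, b in zip(c0, c1))
        for c0, c1 in zip(src, dst)
    ]


sky_a = [to_aga((31, 16, 0))]  # hypothetical stage colours
sky_b = [to_aga((0, 0, 31))]
print(sky_a)                             # [(255, 132, 0)]
print(fade_palette(sky_a, sky_b, 2, 4))  # [(127, 66, 127)]
```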

Aside from that, I've been optimizing other areas of the code base. The road rendering code is substantially faster still, at the expense of some memory.

This weekend, I'll try things out on hardware, take a video if I get a chance, and hopefully clean-up the latest code and kick the tyres for bugs.

The main thing - and I'm going to regret saying this no doubt - is that a lot of the remaining game logic should be relatively easy. It's almost a copy and paste job from the original source code. I've found solutions for all of the rendering required and just about have enough colours remaining. It also isn't particularly performance intensive.

Clouds and Foreground Tiles:


Level Transition Stuff:
Attached images: tilemaps2.PNG, tilemaps3.PNG
14 March 2024, 01:17   #115
dlfrsilver
CaptainM68K-SPS France
Join Date: Dec 2004
Location: Melun nearby Paris/France
Age: 47
Posts: 10,532
Looks awesome ! Keep up the good work !!
14 March 2024, 08:02   #116
jotd
This cat is no more
Join Date: Dec 2004
Location: FRANCE
Age: 52
Posts: 8,409
If you have the commented original 68000 code, the logic will work just as well.

The hard part is always adapting the HW specs to the Amiga, and inserting hacks known only to you so the rendering is good. Even in my Moon Patrol game (very modest, but Z80-based and requiring full RE), most of the time is spent on displaying colors properly.

In the end you'll be proud to see how well your hack works during those 5 seconds.

No compromises, hack it to death!

One of my personal favorites, keep up the good work.
14 March 2024, 08:39   #117
malko
Ex nihilo nihil
Join Date: Oct 2017
Location: CH
Posts: 5,099
Looks really good !!!
Hope that implementing the remaining logic will be as easy as you think
14 March 2024, 09:30   #118
Tigerskunk
Inviyya Dude!
Join Date: Sep 2016
Location: Amiga Island
Posts: 2,801
Quote:
Originally Posted by reassembler
As promised, I made a video on hardware.

Also the C2P routine I'm using is here:
https://github.com/Kalmalyzer/kalms-...1_8_c5_030_2.s

Is there a faster one?

[YouTube video]
Looks amazing.
May I ask what kind of setup you use here?
Obviously a 1200/AGA, but how much RAM and which CPU?

add/edit: Kudos, really nice project!

Last edited by Tigerskunk; 14 March 2024 at 09:40.
14 March 2024, 10:16   #119
reassembler
Registered User
Join Date: Oct 2023
Location: London, UK
Posts: 124
I'm using a TerribleFire 1230 card. I expect the end result to need 8 MB fast + 2 MB chip, if I wave my finger in the air.

I do intend to release a demo as I'd like reports from those with faster machines to provide performance stats. I would expect a considerable boost on 040 and 060 hardware.

Before I do so, I have some work to do on my ROM conversion tooling that translates the graphics ROM data to the format required by the Amiga.
14 March 2024, 11:08   #120
Tigerskunk
Inviyya Dude!
Join Date: Sep 2016
Location: Amiga Island
Posts: 2,801
Quote:
Originally Posted by reassembler
I'm using a TerribleFire 1230 card. I expect the end result to need 8 MB fast + 2 MB chip, if I wave my finger in the air.

I do intend to release a demo as I'd like reports from those with faster machines to provide performance stats. I would expect a considerable boost on 040 and 060 hardware.

Before I do so, I have some work to do on my ROM conversion tooling that translates the graphics ROM data to the format required by the Amiga.
Sounds good to me.

Will pull my Vampired 1200 out of the closet to test this.
All times are GMT +2.
