English Amiga Board


Old 16 May 2024, 18:19   #281
TCD
HOL/FTP busy bee

 
 
Join Date: Sep 2006
Location: Germany
Age: 46
Posts: 31,812
Yep, the 'I don't know how to code myself, but I know how you should have done it and then it would run at 60hz with twice as many objects, colors, music channels and rainbows, unicorns and megagodzillaspacekittens!!!!eins!' crowd is strong these days.
Old 16 May 2024, 18:22   #282
jotd
This cat is no more
 
 
Join Date: Dec 2004
Location: FRANCE
Age: 52
Posts: 8,325
Fortunately this doesn't happen often. Someone must have left the bozo gate open for a while.
Old 16 May 2024, 18:24   #283
TCD
HOL/FTP busy bee

 
 
Join Date: Sep 2006
Location: Germany
Age: 46
Posts: 31,812
Quote:
Originally Posted by jotd
Someone must have left the bozo gate open for a while.
Old 16 May 2024, 18:43   #284
reassembler
Registered User
 
 
Join Date: Oct 2023
Location: London, UK
Posts: 116
Back to C2P, I figured I'd 'quickly' try the blitter assisted routine here:
https://github.com/Kalmalyzer/kalms-...1_8_c3b1_030.s

For whatever reason and despite having success with other C2P routines, I can't get this to work.

- What is the difference between 'c3' and 'c5' in the naming conventions of this routine?
- Does anyone know what size the extra buffer (passed in the a2 register) has to be for this version? I've stored the buffer in chip memory.
- I can confirm I have enabled Blitter DMA.
- Am I doing something completely dumb when it comes to calling QBlit? Are there any circumstances where QBlit would fail?

Code:
move.l	S_GraBase,a6  ; graphics.library base, grabbed during startup (address appears OK)
                      ; QBlit expects a1 = pointer to the bltnode (setup not shown here)
jsr	_LVOQBlit(a6) ; QBlit lives at library offset -276
From quickly debugging, I'm presuming something isn't getting set up correctly, as the c2p routine stalls on second execution, waiting for the blitter indefinitely.
Old 16 May 2024, 19:11   #285
paraj
Registered User
 
 
Join Date: Feb 2017
Location: Denmark
Posts: 1,167
Haven't used his blitter assisted C2P's, but:

c3b1 = 3 cpu passes, 1 blitter pass. c5 = 5 cpu passes
95% sure the extra blit buffer needs to be the same size as the screen.
Do you have blitter interrupts enabled? If not, that's probably the cause as that's what will be driving the blits. If you kill the system, you probably have to use a replacement for QBlit, and the repo might have that (EDIT: it does: https://github.com/Kalmalyzer/kalms-...hers/qblit.lha)
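
For anyone following along, a plain C reference for what a chunky-to-planar conversion produces might help with the buffer-size question. This is only an illustrative sketch and not Kalms' routine: it is a naive bit-by-bit conversion with none of the merge passes, and the names `plane_bytes`, `planar_bytes` and `c2p_reference` are made up for the example. It assumes 8 bitplanes stored one after another and a width that is a multiple of 8. What it shows is that an 8-bitplane screen holds exactly as many bytes as the 8-bit chunky source, which is at least consistent with an intermediate buffer the size of the screen. It says nothing definite about what the real routine requires.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { PLANES = 8 };  /* assumption: 8 bitplanes, i.e. 256 colours */

/* Bytes in one bitplane: one bit per pixel. */
static size_t plane_bytes(size_t width, size_t height)
{
    assert(width % 8 == 0);  /* sketch only handles whole bytes per row */
    return (width / 8) * height;
}

/* Bytes in the whole planar screen. With 8 planes this equals
   width * height, the size of the 8-bit chunky buffer. */
static size_t planar_bytes(size_t width, size_t height)
{
    return plane_bytes(width, height) * PLANES;
}

/* Naive reference conversion: bit p of every chunky pixel is written
   to bitplane p. Planes are laid out back to back (non-interleaved). */
static void c2p_reference(const uint8_t *chunky, uint8_t *planar,
                          size_t width, size_t height)
{
    size_t pb = plane_bytes(width, height);

    memset(planar, 0, pb * PLANES);
    for (size_t i = 0; i < width * height; i++) {
        for (int p = 0; p < PLANES; p++) {
            if (chunky[i] & (1u << p)) {
                /* leftmost pixel of each byte is the most significant bit */
                planar[(size_t)p * pb + i / 8] |= (uint8_t)(0x80u >> (i % 8));
            }
        }
    }
}
```

Calling `planar_bytes(320, 256)` gives the same figure as `320 * 256`, so for a hypothetical 320x256 screen both the chunky buffer and the planar screen come to 81,920 bytes.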

Last edited by paraj; 16 May 2024 at 19:19.
Old 16 May 2024, 19:27   #286
reassembler
Registered User
 
 
Join Date: Oct 2023
Location: London, UK
Posts: 116
@paraj - well spotted. I'll give that a go tomorrow and report back on hardware.
Old 16 May 2024, 20:10   #287
reassembler
Registered User
 
 
Join Date: Oct 2023
Location: London, UK
Posts: 116
Right yes, that's 'working' albeit with a load of timing issues because the c2p is running in parallel as opposed to in series with the rest of the codebase. Will have to untangle that mess and see if it's yielded any exciting performance improvements!
Old 16 May 2024, 20:27   #288
paraj
Registered User
 
 
Join Date: Feb 2017
Location: Denmark
Posts: 1,167
Quote:
Originally Posted by reassembler
Right yes, that's 'working' albeit with a load of timing issues because the c2p is running in parallel as opposed to in series with the rest of the codebase. Will have to untangle that mess and see if it's yielded any exciting performance improvements!
Great (and good progress BTW!), but yeah, it's unfortunately not a free win. While the blitter will handle 2 passes this way, the CPU will still do as many chip writes as before, and since the blitter can only work on 16-bit words, and needs 3 chip accesses for each, it probably works best when you're limited by computation speed (and never on 040+).


Looking forward to the results of your tests, but you probably shouldn't spend too much time on getting it "right" if the speed improvement isn't very noticeable.
Old 16 May 2024, 21:17   #289
reassembler
Registered User
 
 
Join Date: Oct 2023
Location: London, UK
Posts: 116
Yeah, I haven't focused on pure Amiga hardware optimizations for some time so figured I'd revisit some options. The necessity to use chip memory kills a lot of ideas.

I do want to benchmark having dummy space either side of the chunky buffer on a horizontal line basis. This means some expensive clipping checks can be omitted. However, I'm unsure whether doing so would completely hose the c2p performance as the algorithm would have to skip bytes which might negate any advantage.

I do something internally like this for road rendering, which uses bucketloads of ram, but is lightning fast as a result. Basically avoiding checks and branch conditions in the most intense routines and granting the code permission to write all over the place is often a massive win. Being as dumb and simple as possible is often better than being smart.
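
To make the padded-buffer idea concrete, here is a minimal C sketch. It is an assumption-laden illustration rather than the game's actual code: the sizes `VISIBLE_W`, `HEIGHT` and `PAD`, and the helpers `draw_span_unclipped` and `visible_row`, are all hypothetical. Each chunky line gets dummy space on both sides, so a span that runs off the visible area simply lands in the padding and the drawing routine needs no clipping branch. The cost mentioned above shows up in `visible_row`: whatever converts the buffer has to step by the padded stride and skip the dummy bytes on every line.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes, chosen only to keep the example small. */
enum { VISIBLE_W = 32, HEIGHT = 4, PAD = 16 };
enum { STRIDE = PAD + VISIBLE_W + PAD };   /* bytes per padded line */

static uint8_t chunky[STRIDE * HEIGHT];

/* Fill a horizontal span with no clipping branch. x is in visible
   coordinates and may run up to PAD pixels off either edge. */
static void draw_span_unclipped(int x, int y, int len, uint8_t colour)
{
    /* Caller's contract; this check vanishes when built with NDEBUG. */
    assert(len >= 0 && x >= -PAD && x + len <= VISIBLE_W + PAD);
    memset(&chunky[y * STRIDE + PAD + x], colour, (size_t)len);
}

/* First visible pixel of a line: a conversion pass would read
   VISIBLE_W bytes from here, then advance by STRIDE to the next line. */
static const uint8_t *visible_row(int y)
{
    return &chunky[y * STRIDE + PAD];
}
```

For example, `draw_span_unclipped(-8, 1, 16, 7)` writes 8 bytes into the left padding and 8 visible pixels on line 1, with no test for the screen edge.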
Old 17 May 2024, 16:24   #290
mrupp
Registered User
 
 
Join Date: Jun 2019
Location: St.Gallen, Switzerland
Posts: 105
Quote:
Originally Posted by reassembler
Alright, you asked for it - a video:

[embedded YouTube video]
Thanks a lot for the vid, it already looks spectacular.

Quote:
Originally Posted by reassembler
I'm not going to go wild implementing 'new' features in the Amiga version. I've really done that as part of CannonBall (modern machines) and OutRun Enhanced (runs on Arcade hardware). This is more of a 'just get it bloody running' effort. That being said, I have fixed a few of the simple bugs present in the original game.
Yes of course, I totally understand. Maybe my feature request would fit better as an option in CannonBall. Just in case you feel like revisiting CannonBall at some later point, it would be a nice addition.

But by all means, keep working on this port first! It's simply amazing!
Old 17 May 2024, 21:24   #291
reassembler
Registered User
 
 
Join Date: Oct 2023
Location: London, UK
Posts: 116
Quote:
Originally Posted by paraj
Looking forward to the results of your tests, but you probably shouldn't spend too much time on getting it "right" if the speed improvement isn't very noticeable.
OK, thanks to your help I implemented the C2P algorithm using the Blitter.

I measured the time it took the AI to drive the car from the start line to a particular point of the game with a stop-watch. The AI is deterministic so will behave the same way every time from a fresh boot. Due to the way in which the blitter runs in the background, I figured this was a better way of analyzing overall performance, as opposed to actual in-engine timings.

With the normal 030 c2p... it took 2m 11s. With the blitter assisted c2p it took 1m 59s. That's 12 seconds saved: about 9% less time, or roughly 10% faster overall for a real world situation.

However... whilst this is a moderate speed-boost there are some caveats:

1/ The more intensive parts of the game are now obviously faster (because C2P is chugging away in the background)

2/ The lightweight parts of the game are now slower (because the C2P is still running when we get to the end of a frame, and we have to wait for it to finish).

3/ In order to get the benefit from Blitter C2P I have to start the C2P at the very beginning of the frame, effectively working on the previous frame's data, so that there's other stuff the game engine can be doing whilst the blitter slowly ploughs through the data.

So originally we had:
Game Logic -> Render to chunky -> c2p -> swap screen buffer -> vblank stuff

Now we have:
c2p -> Game Logic -> Render to chunky -> check c2p has finished -> swap screen buffer -> vblank stuff

This means that the frame displayed is 1 frame older than previously. It also means I need to add some hacks/delays with various palette updates that ran in the vblank. So overall the code gets a little messier and harder to debug.
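
A toy timing model in C may help show why the heavy parts get faster while the light parts get slower. Everything here is a hedged sketch: the struct `C2pCost`, the two functions and all the time units are invented for illustration and are not measurements from the game. It assumes the CPU passes of the blitter-assisted C2P run first and the blitter pass then overlaps the game logic and rendering, and it ignores any chip bus contention between CPU and blitter.

```c
#include <assert.h>

/* All times are hypothetical units, not measurements. */
typedef struct {
    int c2p_cpu_only;   /* complete CPU-only C2P (old pipeline) */
    int c2p_cpu_part;   /* CPU passes of the blitter-assisted C2P */
    int c2p_blit_part;  /* blitter pass, running in the background */
} C2pCost;

/* Old order: game logic + render, then a blocking CPU C2P. */
static int frame_time_serial(int game_work, C2pCost c)
{
    assert(game_work >= 0);
    return game_work + c.c2p_cpu_only;
}

/* New order: CPU passes first, then the blitter pass overlaps the game
   work. The frame ends only when both have finished. */
static int frame_time_overlapped(int game_work, C2pCost c)
{
    assert(game_work >= 0);
    int longest = game_work > c.c2p_blit_part ? game_work : c.c2p_blit_part;
    return c.c2p_cpu_part + longest;
}
```

With made-up costs of 10 (CPU-only), 6 (CPU part) and 8 (blitter part), a heavy frame with 20 units of game work drops from 30 to 26, while a light frame with 2 units of game work rises from 12 to 14 because the CPU ends up waiting on the blitter.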

I don't really know whether I love this approach. On the one hand, it's faster overall. On the other, it's kind of hacky and is very 030 specific. There's no way you'd want this on an 040. Plus there's the fact that I'm planning 030 optimizations/streamlining anyway. It kind of sucks if their benefit is diminished due to blitter waits.

I'll sleep on it, but I think if it had been significantly faster I would have welcomed it more. But I'm not so sure.

Edit: Not that it will tell you more than the above, but here's a quick video of it running:
[embedded YouTube video]

Last edited by reassembler; 17 May 2024 at 21:53.
Old 23 May 2024, 22:04   #292
edd_jedi
Registered User
 
 
Join Date: Apr 2010
Location: London / UK
Posts: 423
Looking really good so far, amazing work! Will keep an eye on this thread, if you ever want a build tested on my 060 just let me know
Old 23 May 2024, 22:36   #293
TEG
Registered User
 
 
Join Date: Apr 2017
Location: France
Posts: 635
And how does it behave in the heavier parts, like the one with the arches?
Old 27 May 2024, 00:07   #294
agermose
Registered User
 
Join Date: Nov 2019
Location: Odense / Denmark
Posts: 241
Quote:
Originally Posted by reassembler
OutRun generally pushes over 256 colours simultaneously and uses many more over the course of some stages. There's already palette reduction going on for AGA. By the time you slice that down to 128 colours, you're really looking at reworking all the palettes. Which I'm not enthusiastic about as there are 255 * 16 colour palettes available at any one time. So over 4,000 colours. Losing a bitplane doesn't seem worth it.

I always ask myself, what are the pillars that make the game successful? For OutRun it's the vibrant palette and scenery, cool audio, fast frame-rate, simple yet highly nuanced gameplay. Ideally, I'll keep as many of those pillars intact where possible - with hopefully some tasteful sacrifices.

Maybe in the future I'll port PCE OutRun to A500! But even that has a pretty impressive colour palette vs. a typical 16/32 colour Amiga game.

Agermose is currently reworking all the OutRun art for his port. That's a massive job. Easy enough to get a single level of the game working ok. But the scenery needs to work across multiple stages in various combinations of colours. A daunting amount of work. And also why his project is interesting because of the different approach taken.
Technically speaking it is now Adrian who is doing the gfx. I dumped everything including a huge Excel sheet with all the patterns and sprite/palette usage in the 15 stages. He’s doing the monster task of converting to 32 colours.

Last edited by agermose; 27 May 2024 at 00:19.
Old 27 May 2024, 00:14   #295
agermose
Registered User
 
Join Date: Nov 2019
Location: Odense / Denmark
Posts: 241
Quote:
Originally Posted by reassembler
Maybe. That would need to be benchmarked in terms of Sprite DMA usage trade-off, amount of chip ram needed as the cached tilemaps are huge, the cost of accessing chip ram vs. fast ram etc. Plus I've already eaten the cost of chunky conversion anyway, so that computation wouldn't be recovered.

So the honest answer is, I don't really know without spending considerable time trying it. It's more of a case of considering the overall architecture of the engine, as opposed to a binary 'sprites are almost free' unfortunately (an imaginary quote not yours!)

I'd probably rather keep remaining chip ram for music and sound effects.

There would be a considerable saving to be made by merging the tile layer into one single layer with no transparency - as then I'm just movem.l'ing (is that a new expression?) data around fast memory. Plus native sprites wouldn't handle the parallax anyway - which is the expensive part of this.

I think Agermose was experimenting with sprites for the tile layers on his AGA port. He may have some insights into the pros and cons. Bear in mind I'm using a full 8 bitplanes, so I'm already hammering those DMA slots hard... My lack of Amiga coding experience makes it hard to anticipate how it would play out. From previous experiments with Blitter usage it was a pain in the arse / bottleneck on the 030.
Yes I’m using the hw sprites for the back tile layer, ditched the front tile layer. I don’t think it is suited for your project. Pros are the easy scrolling, and “free” layer. There are quite a few cons, mainly the 3 colour limit, and the huge (chip) memory use (which can be reduced somewhat, but with complicated code). PM me if you want more details.

Last edited by agermose; 30 May 2024 at 14:05.
 

