English Amiga Board


Old 14 April 2022, 05:08   #1
Jobbo
Registered User
 
 
Join Date: Jun 2020
Location: Druidia
Posts: 388
Stable cpu raster bars?

Yes, I know that the copper is the right tool for implementing stable raster color bars.

But what if the copper is already very busy?

I'm using the CPU to change the background color halfway down the screen. The big problem is that the change is quite unstable and jumps around.

Is there some definitive technique for fixing this instability?

My current approach is to use a CIA timer interrupt and then a VHPOSR wait loop to try to synchronize to the beam. However, no matter what I do, the results are never 100% stable.

Anyone seen this done perfectly?

P.S. I'm using WinUAE in cycle exact mode and hoping that is 100% accurate, I guess it's possible there are still some differences on real hardware.
Old 14 April 2022, 07:46   #2
mc6809e
Registered User
 
Join Date: Jan 2012
Location: USA
Posts: 372
I would loop until you get close to your position, then do a jmp (0,d0,a0) into a block of nops, where a0 contains the address of the start of the nop list and d0 has hpos shifted left by one bit with H1 masked out. Each nop basically represents four pixels. At the end of the list is your color-changing code.

You'll have to turn off copper DMA temporarily so that it doesn't interfere with the cpu.

That should get you four pixel precision. If you need two pixel precision for some reason you can use a branch instruction based on h1 that branches to the next instruction to create an extra two cycle delay. Other DMA might ruin this, though.

That jmp (0,d0,a0) might cause trouble with other DMA going on, since it takes 14 cycles to compute the effective address. If that's a problem, you can just do a separate add instruction and then a jmp (a0).
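
To make the nop-list arithmetic concrete, here is a rough model in Python (not Amiga code). It assumes a plain 68000 where a nop is one 2-byte opcode word and takes 4 cycles with no DMA contention. The function name and the sled size are made up for illustration, and it works from "cycles left to wait" rather than from the exact hpos-based index described above.

```python
NOP_BYTES = 2    # a 68000 nop is one 16-bit opcode word ($4E71)
NOP_CYCLES = 4   # and takes 4 CPU cycles when the bus is free


def sled_entry_offset(sled_nops, cycles_to_wait):
    """Byte offset into a sled of `sled_nops` nops that burns
    `cycles_to_wait` cycles (rounded down to whole nops) before
    falling through to the code placed after the sled."""
    nops_needed = cycles_to_wait // NOP_CYCLES
    if not 0 <= nops_needed <= sled_nops:
        raise ValueError("wait does not fit in the sled")
    # Jumping further into the sled skips nops, so the wait gets shorter.
    return (sled_nops - nops_needed) * NOP_BYTES


if __name__ == "__main__":
    # Hypothetical 32-nop sled with 20 cycles still to burn.
    print(sled_entry_offset(32, 20))
```

The later you arrive, the further into the list you jump, so the color write after the last nop always lands at about the same beam position, give or take one nop.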

Worst-case interrupt latency on the 68000 is over 300 cycles, by the way. It happens with a movem.l of all registers followed by a divs. You're probably not doing that. Still, you may need to trigger the interrupt excessively early.

I'm sure there's a more efficient and clever way, but that's what I would try.

Last edited by mc6809e; 14 April 2022 at 07:52.
Old 14 April 2022, 08:36   #3
ross
Defendit numerus
 
 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
When I had to implement a stable raster in a hostile environment (i.e. with the copper not modifiable) but a known one (i.e. where I am certain of the worst situation regarding DMA traffic), my approach was to use the STOP instruction.
Remember, though, that the result is processor dependent, so the routine cannot be pixel perfect in every situation. You can, however, fork depending on the CPU or on where the code is executed from (chip, fast, cache ..).

Main idea:
- ciab timer for hsync (needs to be solo in the ICR!) in the previous line
- it triggers after an undefined number of cycles (even 300 ..), but for sure within this very same line
- set up a ciab trigger on the next underflow
- STOP #$2500, manage stack
- the next IRQ6 is fully cycle exact, now you can do what you want
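
Why the second interrupt can be exact while the first is not can be shown with a toy model in Python. This is only a sketch under strong assumptions: it ignores DMA contention completely, assumes an interrupt is taken at the next instruction boundary, and uses made-up instruction lengths. The function names are hypothetical.

```python
from itertools import accumulate


def latency_running(instr_cycles, irq_time):
    """Cycles from the interrupt request until the next instruction
    boundary, where a running CPU can start servicing it."""
    for boundary in accumulate(instr_cycles):
        if boundary >= irq_time:
            return boundary - irq_time
    raise ValueError("interrupt arrives after the instruction stream ends")


def latency_stopped(irq_time):
    """After STOP the CPU is not inside any instruction, so in this
    model there is nothing to finish before the interrupt is taken."""
    return 0


if __name__ == "__main__":
    stream = [4, 8, 12, 158, 4, 20]   # made-up instruction lengths in cycles
    running = {latency_running(stream, t) for t in range(1, 200, 7)}
    stopped = {latency_stopped(t) for t in range(1, 200, 7)}
    print(sorted(running))
    print(sorted(stopped))
```

In the running case the latency depends on which instruction happens to be in flight, which is the jitter. In the stopped case it is the same every time, which is the point of the STOP step.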
Old 14 April 2022, 09:44   #4
roondar
Registered User
 
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,410
Quote:
Originally Posted by ross View Post
When I had to implement a stable raster in a hostile environment (ie with the copper not modifiable)
Love the choice of words here: only write this kind of code with a proper helmet and bulletproof vest on.
Quote:

, but known (ie where I am certain of the worst situation regarding DMA traffic), it was to use the STOP instruction.
However, remember that in any case it is processor dependent and therefore the routine cannot be in every situation pixel perfect, however you can fork depending on the CPU or where the code is executed from (chip, fast, cache ..).
Wouldn't CPU dependent code get quite messy if you also have to account for different CPU speeds/Chip RAM access speeds? You might need to go so far as to measure real world latency ahead of time so as to make it work in a stable fashion across many CPU types.

In fact, I'm not 100% sure you can make this work across all the various turbo cards and specifically all their different Chip RAM access speeds. Not saying it can't be done of course, just wondering what the strategy would be to avoid issues. Otherwise, I'd say that putting the code in Chip RAM might avoid some of the issues.

It's an interesting idea for sure... Feels like it might fit better if limited to systems where the CPU access speed to Chip RAM is a known quantity, though.
Quote:
Main idea:
- ciab timer for hsync (needs to be solo in the ICR!) in the previous line
- it triggers after an undefined number of cycles (even 300 ..), but for sure within this very same line
- set up a ciab trigger on the next underflow
- STOP #$2500, manage stack
- the next IRQ6 is fully cycle exact, now you can do what you want
This sounds suspiciously like some of the C64 stable raster tricks

Anyway, I'd add to this that it might be a good idea to not use colour 0 for the change and aim for changing during the HBlank. That way, there's some extra leeway in the timing.
Old 14 April 2022, 10:28   #5
Toni Wilen
WinUAE developer
 
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,506
Audio DMA slots can cause problems because they are basically "randomly" used.

Quote:
Originally Posted by Jobbo View Post
P.S. I'm using WinUAE in cycle exact mode and hoping that is 100% accurate, I guess it's possible there are still some differences on real hardware.
You need a newer version than 4.9.1 (the usual winuae.7z can be used if you know what you are doing). CPU chipset accesses had an off-by-one-cycle timing bug (in the best case).
Old 14 April 2022, 14:44   #6
ross
Defendit numerus
 
 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
Quote:
Originally Posted by Toni Wilen View Post
Audio DMA slots can cause problems because they are basically "randomly" used.
That's right, and that's exactly the weak point I had thought about too..

The idea in this case is to use a set of selected instructions that intertwine and pass the audio cycles unscathed.
But I seem to remember that the ciab IRQ6 trigger call alone already 'surpasses' those cycles (only in bare 68k though..).
Of course the routine must finish before the end of the video line, otherwise we have the same problem again.

EDIT: if I find inspiration, I'll do some tests later

Last edited by ross; 14 April 2022 at 14:57.
Old 14 April 2022, 14:52   #7
ross
Defendit numerus
 
 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
Quote:
Originally Posted by roondar View Post
In fact, I'm not 100% sure you can make this work across all the various turbo cards and specifically all their different Chip RAM access speeds. Not saying it can't be done of course, just wondering what the strategy would be to avoid issues. Otherwise, I'd say that putting the code in Chip RAM might avoid some of the issues.
Yep, it is basically impossible to do in all configurations, short of degrading as much as possible (and in any case it would never be enough ..).

Well, I think in any case it is more of an academic discussion. Copper exists and should be abused
Old 14 April 2022, 15:16   #8
roondar
Registered User
 
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,410
Quote:
Originally Posted by ross View Post
Yep, it is basically impossible to do in all configurations, short of degrading as much as possible (and in any case it would never be enough ..).

Well, I think in any case it is more of an academic discussion. Copper exists and should be abused
Absolutely, I'd recommend using the Copper as well. But since we're here anyway I'm still going to show interest in crazy alternative ideas
Old 14 April 2022, 22:11   #9
Jobbo
Registered User
 
 
Join Date: Jun 2020
Location: Druidia
Posts: 388
Lots of good ideas but no silver bullets. I’ll have to keep playing around and see if it’s all worth it or if the alternative compromises to have the copper do the work are possible.
Old 15 April 2022, 00:03   #10
mc6809e
Registered User
 
Join Date: Jan 2012
Location: USA
Posts: 372
Quote:
Originally Posted by roondar View Post
Absolutely, I'd recommend using the Copper as well. But since we're here anyway I'm still going to show interest in crazy alternative ideas
Unless the copper is busy with the blitter.
Old 15 April 2022, 09:09   #11
ross
Defendit numerus
 
 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
Quote:
Originally Posted by mc6809e View Post
Unless the copper is busy with the blitter.
This would collapse the discussion from the beginning.

Quote:
Originally Posted by ross View Post
.. but known (ie where I am certain of the worst situation regarding DMA traffic)
Copper using Blitter by definition would take up most of the line's DMA cycles, making a stable raster bar virtually impossible.
Old 15 April 2022, 23:00   #12
Hannibal
Registered User
 
Join Date: May 2015
Location: Kirkland, Washington, USA
Posts: 56
While this is not practical, this is a fun thought experiment :-)
If blitter and copper are busy and unpredictable, could you possibly disable blitter and copper DMA just while you do anything that requires the stable raster, and enable them afterwards?

For sound, if you use software mixing and always run the same channels, and at the highest pitch (lowest period), doesn’t it use one dma read per audio channel per raster line? In that case it would be predictable, so possible to make it stable

And if the code then runs from chip ram, then it seems like the cpu model is the only thing that’s unpredictable, right? (Assuming no moving sprites and no disk read/write)
Old 16 April 2022, 03:00   #13
mc6809e
Registered User
 
Join Date: Jan 2012
Location: USA
Posts: 372
Quote:
Originally Posted by ross View Post
This would collapse the discussion from the beginning.

Copper using Blitter by definition would take up most of the line's DMA cycles, making a stable raster bar virtually impossible.
Nah. Your solution is better than you realize.

After the first interrupt, turn off copper and blitter DMA, "pausing" them both. Then do the STOP. After the second interrupt, change the background color and then "unpause" copper and blitter by turning their DMA back on. They'll continue from where they left off.
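
For reference, the two words written to DMACON for that pause and unpause can be worked out like this. It is a small Python sketch of the bit arithmetic only; the helper name is hypothetical, while the register address and bit positions are the standard DMACON layout.

```python
DMACON = 0xDFF096   # write-only DMA control register
SETCLR = 0x8000     # bit 15: 1 = set the listed bits, 0 = clear them
COPEN = 0x0080      # bit 7: copper DMA enable
BLTEN = 0x0040      # bit 6: blitter DMA enable


def dmacon_word(bits, enable):
    """Word to write to DMACON to switch the given enable bits on or off.
    Bits that are not listed keep their current state."""
    if bits & SETCLR:
        raise ValueError("pass enable bits only, not the SET/CLR bit")
    return (SETCLR if enable else 0) | bits


if __name__ == "__main__":
    pause = dmacon_word(COPEN | BLTEN, enable=False)
    resume = dmacon_word(COPEN | BLTEN, enable=True)
    print(f"pause:  write ${pause:04X} to ${DMACON:06X}")
    print(f"resume: write ${resume:04X} to ${DMACON:06X}")
```

Because only the listed bits change, bitplane, sprite and audio DMA are left alone by both writes.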

Besides, if the copper is running the blitter it's unlikely they're both stealing cycles at the same time. The CPU should have enough cycles available to handle the first int provided bp DMA isn't grabbing too much.

Here's what the bus looks like for an interrupt (from yacht.txt):

n nn ns ni n- n nS ns nV nv np np

Each symbol represents two CPU cycles. The CPU needs the bus for seven DMA cycles: three for the stack, two for the vector address, and two more to fetch the first instruction in the handler. Not too bad. In the worst case, the move that turns off DMA needs four more. That's just 11 memory accesses from interrupt to DMA pause.
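
The tally can be checked mechanically. The Python sketch below just counts symbols in the pattern quoted above. It assumes s/S mark stack accesses, v/V vector reads and p prefetches, as the breakdown in this post implies; the remaining symbols (n, i, -) are left out of the tally, as in the post. The four extra accesses for the move are taken from the post, not derived.

```python
PATTERN = "n nn ns ni n- n nS ns nV nv np np"


def summarize(pattern):
    """Count CPU cycles and memory accesses in a yacht.txt-style pattern."""
    symbols = pattern.replace(" ", "")
    return {
        "cpu_cycles": 2 * len(symbols),                   # two cycles per symbol
        "stack": sum(symbols.count(c) for c in "sS"),     # stack accesses
        "vector": sum(symbols.count(c) for c in "vV"),    # vector address reads
        "prefetch": symbols.count("p"),                   # handler instruction fetches
    }


if __name__ == "__main__":
    info = summarize(PATTERN)
    accesses = info["stack"] + info["vector"] + info["prefetch"]
    move_accesses = 4   # worst case for the DMA-off move, as stated in the post
    print(info)
    print("memory accesses up to DMA pause:", accesses + move_accesses)
```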
Old 16 April 2022, 09:42   #14
ross
Defendit numerus
 
ross's Avatar
 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
Quote:
Originally Posted by mc6809e View Post
Nah. Your solution is better than you realize.

After the first interrupt, turn off copper and blitter DMA, "pausing" them both. Then do the STOP. After the second interrupt, change the background color and then "unpause" copper and blitter by turning their DMA back on. They'll continue from where they left off.

Besides, if the copper is running the blitter it's unlikely they're both stealing cycles at the same time. The CPU should have enough cycles available to handle the first int provided bp DMA isn't grabbing too much.

Here's what the bus looks like for an interrupt (from yacht.txt):

n nn ns ni n- n nS ns nV nv np np

Each symbol represents two CPU cycles. The CPU needs the bus for seven DMA cycles: three for the stack, two for the vector address, and two more to fetch the first instruction in the handler. Not too bad. In the worst case, the move that turns off DMA needs four more. That's just 11 memory accesses from interrupt to DMA pause.
Yeah, but I was starting from an assumption of an even worse situation
If it's just copper driving the blitter *and* BLTPRI=0 then there's a good chance my method will work.
But if we add any situation where I can't temporarily stop the copper/blitter from acting I'm screwed..

The copper may need to do something else during the x position (change resolution, number of bitplanes, a BPLCON1 shift, some ptrs or who knows what), and if stopped it would generate heavy video glitches. I know, I'm a pessimist (I want to consider myself a realist..)

Well, I need an example of the offending copper list from Jobbo and try to build a stable raster on it with the CPU

Perhaps it is simpler than it seems.
(obviously I must not touch the copper list in any way!)
Old 18 April 2022, 06:28   #15
Jobbo
Registered User
 
 
Join Date: Jun 2020
Location: Druidia
Posts: 388
Quote:
Originally Posted by ross View Post
Main idea:
- ciab timer for hsync (needs to be solo in the ICR!) in the previous line
- it triggers after an undefined number of cycles (even 300 ..), but for sure within this very same line
- set up a ciab trigger on the next underflow
- STOP #$2500, manage stack
- the next IRQ6 is fully cycle exact, now you can do what you want

I've given this a quick try without success, I have some questions:

What is the stop command doing to the status register and why?
What do you mean by managing the stack?
Why do you say the timer needs to be solo?

I updated my test code in github with an example where I try to stabilize after an alarm interrupt. I have a and b timer interrupts in there also to spice things up because otherwise the timing is perfect anyway. No dma is active for this test.

It was a quick try and I'll look more closely at the stop command but wanted to see what more info you might be able to give me.
Old 18 April 2022, 10:31   #16
paraj
Registered User
 
 
Join Date: Feb 2017
Location: Denmark
Posts: 1,099
Ross can elaborate if he had something extra in mind, but:
- stop #$2500 sets the interrupt mask part of the status register to 5 (and keeps the supervisor bit set) -> only level 6 (and 7) interrupts will trigger (See section 6.3.2 of the 68000 user's manual).
- If the timer isn't solo (i.e. there are other possible CIAB interrupts) we can't be sure that the next level 6 interrupt is actually the one we're waiting for
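
To spell out the #$2500 operand, here is a small Python sketch that decodes a 68000 status register value. The field positions (trace in bit 15, supervisor in bit 13, interrupt mask in bits 10-8) are the standard 68000 layout; the function names are just for illustration.

```python
def decode_sr(sr):
    """Split a 68000 status register value into the fields that matter here."""
    return {
        "trace": bool(sr & 0x8000),        # bit 15
        "supervisor": bool(sr & 0x2000),   # bit 13
        "irq_mask": (sr >> 8) & 7,         # bits 10-8
    }


def accepted_levels(sr):
    """Interrupt levels that get through: strictly above the mask,
    plus level 7, which cannot be masked."""
    mask = (sr >> 8) & 7
    return [level for level in range(1, 8) if level > mask or level == 7]


if __name__ == "__main__":
    print(decode_sr(0x2500))
    print(accepted_levels(0x2500))
```

So the operand keeps the CPU in supervisor mode with tracing off and lets only levels 6 and 7 wake it from the STOP.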

I'm not sure about the manage stack part either, but to be cycle exact you have to make sure the stack, interrupt handler and VBR are all located in chip (or slow) mem.
Old 18 April 2022, 11:43   #17
ross
Defendit numerus
 
 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
Quote:
Originally Posted by Jobbo View Post
I've given this a quick try without success
Please wait, CIAB-TOD ALRM/IRQ behavior is under test.
As usual it does not work exactly as expected, and its behavior will soon be described and emulated properly.
(8520/6526 internals are not available..)

Quote:
.. but to be cycle exact you have to make sure the stack, interrupt handler and VBR are all located in chip (or slow) mem.
This, but also remember that STOP is usually used only by OS code and in supervisor mode.
If you are in a complex environment, with several possible IRQs that interrupt the STOP and also change processor state, it is possible that a stack 'supervisor' is needed.
In fact, for this simple case it is optional, and that is why I put it after the comma.
Old 19 April 2022, 04:54   #18
Jobbo
Registered User
 
 
Join Date: Jun 2020
Location: Druidia
Posts: 388
Quote:
Originally Posted by ross View Post
Please wait, CIAB-TOD ALRM/IRQ behavior is under test.
For whatever it's worth, I tried a version of my test exe on my real A1200 and the second TOD wait is not 100% stable. There are still earlier A/B waits; it might be stable without those. Or it might just be my bad code, or unexpected CPU interruptions between the stop and when I write out to color0. I did add a 10 scanline wait, so I guess something could be happening in that time.

I only mention this since it seemed like there was some expectation that it should be stable. I'm happy to update my test code and do more testing if this helps develop WinUAE or furthers people's understanding.

For my own purposes I just need to be able to trigger the color0 change consistently inside the hblank so any instability will be hidden.

I seem to be able to get that working relatively well, at least in WinUAE and then on my A1200. Though I might need different wait positions for each.

Where it's getting tricky is with part of my effect that uses HAM. It seems that changing color0 needs to happen at a very specific time. I'm not sure when the leading color is picked up by whatever will determine the left most HAM pixel. This combined with me moving around the fetch and window parameters each frame is making this hard! But also interesting!
Old 19 April 2022, 10:29   #19
ross
Defendit numerus
 
 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
Quote:
Originally Posted by Jobbo View Post
I only mention this since it seemed like there was some expectation that it should be stable.
If not 'stable' at least 'predictable'. This is the minimum for emulation

Quote:
I'm happy to update my test code and do more testing if this helps develop WinUAE or furthers people's understanding.
To tell the truth I haven't looked at your code yet (I have my own specific code suitable for these tests, which is too 'dirty' and crude for general use), but I don't rule out that I can use it as a test bed.

Old 19 April 2022, 13:13   #20
mc6809e
Registered User
 
Join Date: Jan 2012
Location: USA
Posts: 372
Since you're on the A1200 with a 68020, have you tried using movec to temporarily disable the cache? Maybe some of the inconsistency is due to cache effects.