English Amiga Board


Old 22 March 2012, 19:12   #1
mc6809e
Registered User
 
Join Date: Jan 2012
Location: USA
Posts: 372
Anyone have any luck with using CPU for manual mode sprites?

It's not too difficult to use manual mode for sprites with the copper. I'm wondering, though, about the feasibility of using the CPU and MOVEM to rapidly reload the sprite registers with new data after the sprites have displayed on a scanline.

The idea is to preload D0-D7 and A0-A7 with sprite data then halt the processor with a STOP instruction. Once stopped, the copper is used to interrupt the CPU mid-scanline and the int handler would use a MOVEM to dump the new sprite data into the sprite registers. It should be possible to get at least 16 sprites per line (with certain limitations, obviously).
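As a rough illustration of the idea above (not code from this thread): the POS, CTL, DATA and DATB registers of the eight sprites sit back to back in the custom register map, from offset $140 to $17E, which is exactly 16 long words. The Python sketch below only works out which sprite registers each CPU register would land on, assuming a single MOVEM.L D0-D7/A0-A7 aimed at SPR0POS. The helper names are made up for illustration.

```python
# Sketch: where each CPU register lands if one MOVEM.L D0-D7/A0-A7
# is written to the sprite register block starting at SPR0POS.
CUSTOM_BASE = 0xDFF000
SPR0POS_OFFSET = 0x140                      # first sprite register
SPRITE_FIELDS = ("POS", "CTL", "DATA", "DATB")
CPU_REGS = [f"D{i}" for i in range(8)] + [f"A{i}" for i in range(8)]


def word_address(word_index):
    """Absolute address of the n-th 16-bit word in the sprite block."""
    return CUSTOM_BASE + SPR0POS_OFFSET + 2 * word_index


def movem_sprite_map():
    """Map each CPU register to the (high word, low word) sprite registers."""
    mapping = {}
    for index, cpu_reg in enumerate(CPU_REGS):
        targets = []
        for half in range(2):               # high word first, then low word
            sprite, field = divmod(index * 2 + half, 4)
            targets.append(f"SPR{sprite}{SPRITE_FIELDS[field]}")
        mapping[cpu_reg] = tuple(targets)
    return mapping


if __name__ == "__main__":
    for reg, (high, low) in movem_sprite_map().items():
        print(f"{reg}: high word -> {high}, low word -> {low}")
```

One caveat for this sketch: A7 is also the stack pointer that interrupt processing uses, so the last long word (SPR7DATA/SPR7DATB) would probably have to be written some other way.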

I'm wondering, though, about the timing of chip register accesses. Is it four cpu cycles no matter what other DMA is going on? Seems at least the copper should prevent the CPU from accessing chip registers.
Old 25 March 2012, 06:29   #2
Photon
Moderator
 
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,602
Nothing stops CPU from writing SPRxDAT etc, it's exactly the same as writing them with the copper. At any chosen time, your Amiga is displaying a copper list. If the CPU wasn't able to change chip registers then, well then playing a song while showing anything onscreen would be impossible

With less than (what is it again? 5?) bitplanes on, MOVEM is indeed twice as fast as the copper with 4 cycles per word, since it doesn't have to read the MOVE-word of the copperlist.

Interrupt handling time varies between CPUs, on 68000 it's something like 70 cycles. Should be possible to time it for stock A500 and A1200, the rest is harder...

No need to stop the CPU if you use interrupts.

Writing 32 words will take between 1/3 and 1/2 of a scanline on A500; added to that is the interrupt handling code and register loading. I'd busy-wait for the raster with the CPU and save the interrupt handling code.
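A back-of-the-envelope check of those numbers, as a Python sketch. Assumptions not stated in the post: a PAL line of 227 colour clocks, two 68000 cycles per colour clock, the documented 68000 timing of 8 + 8n cycles for a MOVEM.L of n registers to (An), and a copper MOVE needing two fetches on alternate colour clocks (8 CPU cycles per word written). DMA contention and interrupt entry are ignored.

```python
# Rough timing budget: MOVEM.L versus copper MOVEs for 32 register words.
CCK_PER_LINE = 227            # assumption: PAL scanline in colour clocks
CPU_CYCLES_PER_CCK = 2        # assumption: 7.09 MHz 68000
CPU_CYCLES_PER_LINE = CCK_PER_LINE * CPU_CYCLES_PER_CCK


def movem_long_cycles(n_regs):
    """68000 MOVEM.L registers -> (An): 8 cycles overhead + 8 per register."""
    return 8 + 8 * n_regs


def copper_move_cycles(n_words):
    """Copper MOVE: two fetches, one every other colour clock = 8 CPU cycles."""
    return 8 * n_words


def fraction_of_line(cpu_cycles):
    return cpu_cycles / CPU_CYCLES_PER_LINE


if __name__ == "__main__":
    movem = movem_long_cycles(16)        # 16 long words = 32 words
    copper = copper_move_cycles(32)
    print(f"MOVEM.L x16: {movem} cycles, {fraction_of_line(movem):.2f} of a line")
    print(f"Copper x32:  {copper} cycles, {fraction_of_line(copper):.2f} of a line")
```

Under these assumptions the MOVEM comes out at a little under a third of a line and the copper at a bit over half, which is in line with the estimate above.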

I guess you're planning to write not only SPRxDAT but also -POS since you're using a MOVEM.

SPRxDAT writing is already present in parallax scrolling games of yore, so nothing impossible about this idea.

Only one way to find out
Old 25 March 2012, 20:03   #3
mc6809e
Registered User
 
Join Date: Jan 2012
Location: USA
Posts: 372
Quote:
Originally Posted by Photon View Post
Nothing stops CPU from writing SPRxDAT etc, it's exactly the same as writing them with the copper. At any chosen time, your Amiga is displaying a copper list. If the CPU wasn't able to change chip registers then, well then playing a song while showing anything onscreen would be impossible
But the copper only gets even cycles and the CPU can take any available cycle. It is very difficult to completely starve the CPU. You'd have to create a copper list that grabbed every even cycle and create a display that used maximum vertical overscan with all disk, audio, and sprite DMA turned on.

But my question was more about the timing of the write to the chip registers. If DMA is occurring, can the CPU still write to the registers? Let's suppose the CPU has completed a MOVE instruction fetch and is ready to execute its write to the chip registers. Is this write blocked during DMA? If it's just like the copper, then it should block, but I'm not sure. The CPU isn't competing with the chipset for access to RAM, after all. Can a chip register write happen while DMA is happening? I tend to believe that the CPU is blocked, otherwise you can get into situations where the copper and CPU are both trying to update the same register at the same time. Maybe Toni knows what happens here.

Quote:
Originally Posted by Photon View Post
With less than (what is it again? 5?) bitplanes on, MOVEM is indeed twice as fast as the copper with 4 cycles per word, since it doesn't have to read the MOVE-word of the copperlist.

Interrupt handling time varies between CPUs, on 68000 it's something like 70 cycles. Should be possible to time it for stock A500 and A1200, the rest is harder...

No need to stop the CPU if you use interrupts.

Writing 32 words will take between 1/3 and 1/2 of a scanline on A500; added to that is the interrupt handling code and register loading. I'd busy-wait for the raster with the CPU and save the interrupt handling code.
The trouble is that interrupts on the 68000 are handled between instructions so the time between the generated interrupt and writing to the chip registers is going to vary with where the interrupt occurs in the instruction sequence. There are even situations where the instruction sequence has no conditional branches and is predictable, but where the interrupt latency is still uncertain because it depends on the data being processed. If the instruction stream includes MULS, for example, interrupt handling latency will vary with the pattern of bits in one of the operands! With busy-waiting, things are much more predictable of course, but that still wastes bus cycles. With a STOP instruction, though, precise timing is possible and the bus is freed for other things like a blit.
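To put numbers on the MULS remark: the 68000 timing tables give MULS as 38 + 2n cycles, where n is the number of 01 or 10 bit pairs in the 16-bit source operand with a zero appended below the least significant bit (effective address time comes on top). A small Python sketch of that formula, with a hypothetical function name, shows how far the wait for the instruction to finish can swing:

```python
# Sketch of the 68000 MULS cycle count as a function of the source operand.
def muls_cycles(source):
    """38 + 2n cycles, n = number of adjacent bit pairs that differ in the
    17-bit value formed by the 16-bit source with a 0 appended as new LSB."""
    extended = (source & 0xFFFF) << 1        # bits 0..16, bit 0 is the added 0
    transitions = sum(
        1
        for bit in range(16)
        if ((extended >> bit) & 1) != ((extended >> (bit + 1)) & 1)
    )
    return 38 + 2 * transitions


if __name__ == "__main__":
    for operand in (0x0000, 0xFFFF, 0x00FF, 0x5555):
        print(f"MULS with source ${operand:04X}: {muls_cycles(operand)} cycles")
```

So an interrupt arriving just after a MULS has started may wait anywhere from under 40 to about 70 cycles before the CPU even begins exception processing, depending purely on the data.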

Quote:
Originally Posted by Photon View Post
I guess you're planning to write not only SPRxDAT but also -POS since you're using a MOVEM.

SPRxDAT writing is already present in parallax scrolling games of yore, so nothing impossible about this idea.

Only one way to find out
Yeah, I've seen examples where the copper modified SPRxDAT. Getting twice the write speed with MOVEM, though, is enticing. I'm also interested in what happens when the palette registers are written with MOVEM when more than 4 bitplanes are active. In this case having the chipset block writes during DMA is actually a good thing, since it lets you spread out palette changes over the length of a scanline. In HAM mode, it should be possible to change the lower 16 palette registers twice over the length of the line. With the STOP instruction/copper interrupt method, this would allow you to time the palette changes precisely, with one palette change every 8 pixels. There would be a couple of spots where you couldn't do this, of course, because the CPU would have to do another MOVEM instruction fetch.
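A rough Python sketch of that slot arithmetic. It assumes the lowres bitplane fetch order within each group of 8 colour clocks is "-, 4, 6, 2, -, 3, 5, 1" (the order usually quoted from the Hardware Reference Manual), that the 68000 can use at most every other colour clock, and that each group covers 16 lowres pixels. Instruction fetches for the MOVEM itself are ignored, so the real figure is a little lower.

```python
# Sketch: how many CPU register writes fit between lowres bitplane fetches.
LORES_FETCH_ORDER = (None, 4, 6, 2, None, 3, 5, 1)   # assumption, see above
PIXELS_PER_GROUP = 16                                # 8 colour clocks, lowres


def free_slots(bitplanes):
    """Slots in one 8-clock group not taken by bitplane DMA."""
    return [
        slot
        for slot, plane in enumerate(LORES_FETCH_ORDER)
        if plane is None or plane > bitplanes
    ]


def cpu_writes_per_group(bitplanes):
    """Free slots the CPU can actually use (at most every other clock)."""
    usable, last = 0, None
    for slot in free_slots(bitplanes):
        if last is None or slot - last >= 2:
            usable += 1
            last = slot
    return usable


def pixels_per_write(bitplanes):
    return PIXELS_PER_GROUP // cpu_writes_per_group(bitplanes)


def writes_per_line(bitplanes, width=320):
    return (width // PIXELS_PER_GROUP) * cpu_writes_per_group(bitplanes)


if __name__ == "__main__":
    for planes in (4, 5, 6):
        print(planes, free_slots(planes), pixels_per_write(planes),
              writes_per_line(planes))
```

With 6 bitplanes (HAM) this model leaves two usable slots per group, so one write every 8 pixels and about 40 writes across a 320-pixel line, enough for two passes over 16 palette registers.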
Old 26 March 2012, 08:26   #4
Toni Wilen
WinUAE developer
 
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,505
Chip RAM, Slow RAM and custom chipset registers share same internal Agnus busses and also have exact same timing = CPU chipset register access or chip ram access stalls identically if DMA accesses Chip RAM at the same time.

I agree that STOP is probably the best way to do this. Another accurate but also quite stupid method is to start a big blit that steals all CPU cycles. Copper can stop the blit by writing to DMACON, "releasing" the CPU
Old 27 March 2012, 21:16   #5
Photon
Moderator
 
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,602
If blitter nasty is on this is a better alternative than stopping the CPU. When I visited this Stop the Cpu Land 20 years ago I found no fruit. Or something

It seems a terrible waste (well, on cycle-starved OCS at least) when what you really want the CPU to do is being kept as busy as possible, indeed, same for the whole chipset, and without any of them doing a single unnecessary thing. You know, like run the OS and $#|+.

As Toni also says, writing chip regs with the CPU is exactly like writing them with the copper, except, as mentioned the copper wasting 4c reading the register word.


Quote:
Originally Posted by mc6809e View Post
But the copper only gets even cycles and the CPU can take any available cycle.
You got it the wrong way round. CPU is way down the list of getting MA slots, even below blitter DMA which is below the rest of the DMA.

Between DMA and CPU, DMA has higher priority and CMA has lower priority. Highest is the DRAM refresh slots because they can't be disabled, but apart from that the highest priority MA hog is display DMA (when bitplanes are on of course). After all the other DMA slots on a line comes the blitter DMA, and last the CPU MA / CMA.
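The ordering described above can be written down as a toy arbiter. This is only a Python sketch of the priority list as given in this post, with made-up requester names; it is not a model of Agnus and ignores fixed slot positions and the blitter-nasty bit.

```python
# Toy arbiter: who gets a memory slot when several requesters want it.
# Order follows the post: refresh, display DMA, other DMA, blitter, CPU.
PRIORITY = ("refresh", "bitplane", "other_dma", "blitter", "cpu")


def slot_owner(requests):
    """Return the highest-priority requester in `requests`, or None if idle."""
    for name in PRIORITY:
        if name in requests:
            return name
    return None


if __name__ == "__main__":
    print(slot_owner({"cpu", "blitter"}))      # blitter wins over the CPU
    print(slot_owner({"cpu"}))                 # CPU only gets leftover slots
```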

With CPU spriteDMA, on OCS it seems the CPU will waste 1/3 of the slots on a line busy-waiting or being stopped. The rest of the time it will be MOVEMing, and no other odd DMA must disturb it (blitter or >4 bpls).

As I said, don't ask, just try it. It's just an evening's work and it's your idea. My suggestion is to start with a busy-wait-for-HPOS (IIRC, diff hpos on odd/even lines) and a big MOVEM loaded before the wait.

Just as you can build a copperlist, you can build a complete program for the whole screen. If you put it in chipmem, accelerators (except ACA630 with some options on) won't throw up.
Old 28 March 2012, 23:47   #6
mc6809e
Registered User
 
Join Date: Jan 2012
Location: USA
Posts: 372
Quote:
Originally Posted by Photon View Post
You got it the wrong way round. CPU is way down the list of getting MA slots, even below blitter DMA which is below the rest of the DMA.

Between DMA and CPU, DMA has higher priority and CMA has lower priority.
You misunderstood. I said that the CPU can use any AVAILABLE cycle. Obviously if the copper is fetching instructions then the CPU is blocked. But the copper can only fetch on even cycles. It's entirely possible to have the CPU using odd cycles while the copper uses even cycles -- on horizontal scanlines immediately after VBlank that are still outside the playfield, for example. That was my point about the difficulty of trying to starve the CPU.

And while I agree with you that using the blitter to perform work while blocking the CPU can be a good use of cycles over just using STOP, you still have the problem of precisely timing chip register writes. The initiation of the blit must be done at the correct time so that the CPU performs its writes at the correct time. The only way I can see this being possible is by using the STOP instruction at least once. I suppose after that blits can be used, but you need at least one point where the CPU is perfectly in sync with the beam.
Old 20 October 2015, 01:13   #7
ReadOnlyCat
Code Kitten
 
Join Date: Aug 2015
Location: Montreal/Canadia
Age: 52
Posts: 1,178
My, just did a google search and look what an interesting thread I found.
The EAB is just a treasure trove of technical information.

So, without further ado, let me invigorate these old bones with new life! (*)

Quote:
Originally Posted by mc6809e View Post
[...] interrupts on the 68000 are handled between instructions so the time between the generated interrupt and writing to the chip registers is going to vary with where the interrupt occurs in the instruction sequence. [...]. If the instruction stream includes MULS, for example, interrupt handling latency will vary with the pattern of bits in one of the operands!
Fantastic, I knew interrupts had variable handling time on the 68k but did not know exactly why. Thank you Sir mc6809e for enlightening this kitten!

Quote:
Originally Posted by mc6809e View Post
Getting twice the write speed with MOVEM, though, is enticing. I'm also interested in what happens when the palette registers are written with MOVEM when more than 4 bitplanes are active. In this case having the chipset block writes during DMA is actually a good thing, since it lets you spread out palette changes over the length of a scanline. In HAM mode, it should be possible to change the lower 16 palette registers twice over the length of the line. With the STOP instruction/copper interrupt method, this would allow you to time the palette changes precisely, with one palette change every 8 pixels. There would be a couple of spots where you couldn't do this, of course, because the CPU would have to do another MOVEM instruction fetch.
A great idea! Did you get to use it? This could be used for CPU chunky couldn't it?
Also, now that I am thinking of it, Photon, are you using this technique in the Frazetta demo by any chance?

Quote:
Originally Posted by Photon View Post
With less than (what is it again? 5?) bitplanes on, MOVEM is indeed twice as fast as the copper with 4 cycles per word, since it doesn't have to read the MOVE-word of the copperlist.
So to pounce back on the idea of CPU chunky, that means that with a purely CPU-rendered image and four bitplanes we can create a full color 80x200/256 mode, with some finer details added via the bitplanes and/or sprites, right? As mc6809e said, it would not be exactly 80 pixels because of movem "reloading", but close enough.

Obviously, only the vertical blanking area will be left for any real processing time but still quite cool.

Did anyone ever do that on OCS?

Quote:
Originally Posted by Toni Wilen View Post
Chip RAM, Slow RAM and custom chipset registers share same internal Agnus busses and also have exact same timing = CPU chipset register access or chip ram access stalls identically if DMA accesses Chip RAM at the same time.
Even during reads? What a waste. Don't Agnus and the other kittens have their own internal bus?

Quote:
Originally Posted by mc6809e View Post
But the copper can only fetch on even cycles. It's entirely possible to have the CPU using odd cycles while the copper uses even cycles -- on horizontal scanlines immediately after VBlank that are still outside the playfield, for example. That was my point about the difficulty of trying to starve the CPU.
But by default the CPU will use even cycles, and in order for it to use odd ones you would have to force it by executing an instruction with an odd number of cycles first. Another issue is that even if you have no bitplanes displayed, RAM refresh will reset it back to even cycles. So if you wanted them to always alternate you would have to force the CPU back to odd cycles every line.

(*)
Old 20 October 2015, 08:51   #8
Toni Wilen
WinUAE developer
 
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,505
Quote:
Originally Posted by ReadOnlyCat View Post
Even during reads? What a waste. Don't Agnus and the other kittens have their own internal bus?
Chipset side data bus is shared between all custom chips and Chip RAM.
Old 20 October 2015, 23:26   #9
ReadOnlyCat
Code Kitten
 
Join Date: Aug 2015
Location: Montreal/Canadia
Age: 52
Posts: 1,178
Quote:
Originally Posted by Toni Wilen View Post
Chipset side data bus is shared between all custom chips and Chip RAM.
Oki. That is good to know, thanks again.
Makes it even more regrettable that they did not put the Chip registers in Chip RAM... This would have allowed the blitter to set them for super fast setups.
Old 21 October 2015, 10:56   #10
jotd
This cat is no more
 
Join Date: Dec 2004
Location: FRANCE
Age: 52
Posts: 8,162
that's why copperlists exist right?