English Amiga Board


English Amiga Board > Coders > Coders. Asm / Hardware

10 June 2016, 12:34   #1
zero
Registered User
Join Date: Jun 2016
Location: UK
Posts: 428
Maximum blitter speed with pipelining

How can we get the maximum speed from the blitter? Let's target Amigas with only chip RAM, so a stock A500 or A1200.

The blitter registers are not double-buffered, so you can't pre-load them between operations. The CPU stops when the blitter has priority anyway...

Can you turn the priority bit on halfway through an operation? Then you could start in friendly mode, load the next operation into CPU registers, enable the priority bit, and then write the next operation's settings into the blitter's registers as soon as the current blit finishes.

What other techniques can be used to get maximum speed from the blitter? Say I have a load of pre-calculated operations I want to perform, like a demo effect or bob list.
10 June 2016, 12:42   #2
britelite
Registered User
Join Date: Feb 2010
Location: Espoo / Finland
Posts: 818
Quote:
Originally Posted by zero
The CPU stops when the blitter has priority anyway...
No, it doesn't
10 June 2016, 13:03   #3
Toni Wilen
WinUAE developer
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,506
I'd say "it depends", not no or yes
10 June 2016, 13:26   #4
Mrs Beanbag
Glastonbridge Software
Join Date: Jan 2012
Location: Edinburgh/Scotland
Posts: 2,243
if you have only Chip RAM and no instruction caches
10 June 2016, 13:59   #5
zero
Registered User
Join Date: Jun 2016
Location: UK
Posts: 428
Yeah, I should have specified: a 68000 with only chip RAM stops, but of course an A1200 with a 68EC020 can continue to execute code from its cache. However, in this scenario, since we need to fetch the next blitter operation's parameters from RAM...

I suppose you could use speedcode, but getting the instructions into the cache seems like a tricky problem.

What about the copper? If you calculate the right wait positions so that it copies new data as soon as the blitter finishes, it could be quicker than the CPU I think. There might be some wasted cycles due to the copper not being able to wait for the exact point it needs to (extremes of the scanline).
10 June 2016, 19:05   #6
Toni Wilen
WinUAE developer
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,506
It still depends, even on a chip-RAM-only A500 (or one with chip plus "slow" RAM). It depends on the selected channel combination: some combinations have idle cycles that are free for the CPU.

The copper has a blitter-wait bit; many demos use it to start multiple blits one after another.
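For reference, this is where that blitter-wait bit sits in a copper WAIT instruction. The C sketch below only builds the two instruction words; the `cop_wait` helper and `CopIns` type are made up for illustration, and the bit layout is the one the Hardware Reference Manual gives for WAIT. With the BFD bit (bit 15 of the second word) clear, the copper also waits for the blitter to finish, so a WAIT for beam position 0,0 (always already reached) becomes a pure "wait for blitter" instruction, `$0001,$7FFE`.

```c
#include <assert.h>
#include <stdint.h>

/* One copper instruction = two 16-bit words. */
typedef struct { uint16_t w1, w2; } CopIns;

/* Hypothetical helper: encode a copper WAIT.
 * Word 1: VP7-VP0 in bits 15-8, HP in bits 7-1, bit 0 = 1.
 * Word 2: BFD in bit 15 (0 = also wait for blitter finished),
 *         VE6-VE0 in bits 14-8, HE in bits 7-1, bit 0 = 0.
 * hp/he are passed as the byte you would write in dc.w (bit 0 ignored). */
static CopIns cop_wait(uint8_t vp, uint8_t hp, uint8_t ve, uint8_t he,
                       int wait_blitter)
{
    CopIns ins;
    ins.w1 = (uint16_t)((vp << 8) | (hp & 0xFE) | 1);
    ins.w2 = (uint16_t)((wait_blitter ? 0 : 0x8000)
                        | ((ve & 0x7F) << 8) | (he & 0xFE));
    return ins;
}
```

As a check, `cop_wait(0x2C, 0x06, 0x7F, 0xFE, 0)` gives the familiar `$2C07,$FFFE`, and `cop_wait(0xFF, 0xFE, 0x7F, 0xFE, 0)` gives the `$FFFF,$FFFE` end-of-list marker.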
11 June 2016, 22:20   #7
zero
Registered User
Join Date: Jun 2016
Location: UK
Posts: 428
Thanks Toni. So would you say that is the fastest possible way to queue up blits, waiting and loading the registers with the copper?

Any other useful optimizations? I suppose arranging your bitmaps in memory so that you don't need to reload address registers might help.
12 June 2016, 06:28   #8
ReadOnlyCat
Code Kitten
Join Date: Aug 2015
Location: Montreal/Canadia
Age: 52
Posts: 1,178
This is definitely the fastest since the copper will react much faster than the CPU to a blitter interrupt and will be much faster at setting blitter registers.
Possible optimizations include grouping blits so that consecutive blits share very similar setups, so the minimum number of registers needs to be re-set between each, but this might require quite a bit of CPU if you need to do this dynamically and cannot predict the order statically.
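One way to exploit that grouping when filling a copper list is to compare each blit's register values with the previous blit's and only emit MOVEs for what changed. Below is a minimal C sketch, assuming a hypothetical `BlitSetup` table of (register offset, value) pairs. The blitter advances its pointer registers and reloads the data registers of enabled channels while it runs, so those are never skipped, and BLTSIZE is always emitted last because writing it starts the blit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLTSIZE  0x058   /* offset from $DFF000; writing it starts the blit */
#define MAX_REGS 16

typedef struct { uint16_t reg, val; } RegWrite;
typedef struct { size_t n; RegWrite w[MAX_REGS]; } BlitSetup;  /* hypothetical */

/* Registers the blit itself modifies, so they can't be assumed unchanged. */
static int changed_by_blit(uint16_t reg)
{
    return (reg >= 0x048 && reg <= 0x056)   /* BLTxPTH/L: advanced during the blit */
        || (reg >= 0x070 && reg <= 0x074);  /* BLTxDAT: loaded by enabled channels */
}

/* Write copper MOVE word pairs (register, value) into `out` for every register
 * of `next` that differs from `prev` (or all of them if prev is NULL), with
 * BLTSIZE last. Returns the number of words written (at most 2 * MAX_REGS). */
static size_t emit_delta(const BlitSetup *prev, const BlitSetup *next, uint16_t *out)
{
    size_t n = 0;
    assert(next->n <= MAX_REGS);
    for (size_t i = 0; i < next->n; i++) {
        const RegWrite *w = &next->w[i];
        int same = 0;
        if (w->reg == BLTSIZE)
            continue;                        /* emitted below, last */
        for (size_t j = 0; prev != NULL && j < prev->n; j++)
            if (prev->w[j].reg == w->reg && prev->w[j].val == w->val)
                same = 1;
        if (!same || changed_by_blit(w->reg)) {
            out[n++] = w->reg;
            out[n++] = w->val;
        }
    }
    for (size_t i = 0; i < next->n; i++)
        if (next->w[i].reg == BLTSIZE) {
            out[n++] = BLTSIZE;
            out[n++] = next->w[i].val;
        }
    return n;
}
```

The sorting itself (deciding which blits to put next to each other) is the part ReadOnlyCat warns can cost CPU time; this only shows the payoff once an order is chosen.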
12 June 2016, 06:34   #9
sandruzzo
Registered User
Join Date: Feb 2011
Location: Italy/Rome
Posts: 2,281
Interleaved bitplanes could be faster than the usual separate-bitplane setup.
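The saving from interleaving is mostly in the number of blits: with separate bitplanes a bob needs one blit per plane, while with interleaved planes the rows of all planes sit next to each other, so one taller blit can cover them all. A small C sketch of the OCS BLTSIZE encoding and the resulting blit count; the function names are hypothetical, and it assumes source and destination share the same interleaved layout. For cookie-cut bobs the mask then has to be stored once per plane row, which costs extra memory.

```c
#include <assert.h>
#include <stdint.h>

/* OCS BLTSIZE: height in bits 15-6 (1..1024, 1024 encoded as 0),
 * width in words in bits 5-0 (1..64, 64 encoded as 0). */
static uint16_t bltsize(unsigned height, unsigned width_words)
{
    assert(height >= 1 && height <= 1024);
    assert(width_words >= 1 && width_words <= 64);
    return (uint16_t)(((height & 0x3FF) << 6) | (width_words & 0x3F));
}

/* Blits needed to draw all planes of a bob `height` rows tall.
 * Separate planes: one blit per plane. Interleaved: one blit of
 * height * depth rows, split only if that exceeds the OCS 1024-row limit. */
static unsigned blits_needed(unsigned height, unsigned depth, int interleaved)
{
    if (!interleaved)
        return depth;
    return (height * depth + 1023) / 1024;
}
```

For a 16-row, 5-plane bob two words wide, that is five blits of `bltsize(16, 2)` versus one blit of `bltsize(80, 2)`, i.e. `$1402`, saving four full register setups and blitter waits.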
17 July 2016, 18:09   #10
Photon
Moderator
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,602
Quote:
Originally Posted by zero
The CPU stops when the blitter has priority anyway...
Quote:
Originally Posted by britelite
No, it doesn't
It does for most things everyone does with the Blitter.

Exceptions are clear and polyfill.

If you need CPU cycles during a blit, you can disable BLTPRI or use the uncommon USEx channel masks 7, 5, or 3. This will make the blit finish later.
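The "USEx channel masks" are the four channel-enable bits 11-8 of BLTCON0, read as A=8, B=4, C=2, D=1, so 7 is B+C+D, 5 is B+D and 3 is C+D. A C sketch that builds a BLTCON0 value and decodes a mask; the helper names are hypothetical, and it says nothing about cycle timings, which the Hardware Reference Manual tabulates per channel combination.

```c
#include <assert.h>
#include <stdint.h>

/* BLTCON0: ASH3-0 (A shift) in bits 15-12, USEA/B/C/D in bits 11-8,
 * minterm LF7-0 in bits 7-0. */
enum { USEA = 8, USEB = 4, USEC = 2, USED = 1 };

static uint16_t bltcon0(unsigned ash, unsigned use_mask, unsigned minterm)
{
    assert(ash < 16 && use_mask < 16 && minterm < 256);
    return (uint16_t)((ash << 12) | (use_mask << 8) | minterm);
}

/* Spell out which channels a 4-bit USEx mask enables, e.g. 7 -> "BCD". */
static const char *use_channels(unsigned use_mask, char buf[5])
{
    const char names[4] = {'A', 'B', 'C', 'D'};
    int n = 0;
    assert(use_mask < 16);
    for (int i = 0; i < 4; i++)
        if (use_mask & (8u >> i))
            buf[n++] = names[i];
    buf[n] = '\0';
    return buf;
}
```

For comparison, the usual cookie-cut setup is `bltcon0(shift, USEA | USEB | USEC | USED, 0xCA)`, giving `$xFCA`, and a plain A-to-D copy is `bltcon0(0, USEA | USED, 0xF0)`, giving `$09F0`.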
18 July 2016, 10:48   #11
roondar
Registered User
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,410
Quote:
Originally Posted by ReadOnlyCat
This is definitely the fastest since the copper will react much faster than the CPU to a blitter interrupt and will be much faster at setting blitter registers.
Possible optimizations include grouping blits so that consecutive blits share very similar setups, so the minimum number of registers needs to be re-set between each, but this might require quite a bit of CPU if you need to do this dynamically and cannot predict the order statically.
I'm a bit confused about this.

Surely if you're going to blit with the Copper the CPU has to create/update the copperlist for every blit you do? Doesn't this mean you end up spending more time because the CPU would've done more or less the same writes anyway to set up the blitter even when the Copper isn't used?

I guess it depends a bit, if you're dead set on using blitter interrupts it'd probably be faster than the CPU, but if you just blit in sequence with the CPU you don't have the overhead of interrupts so the CPU should win* in that case, shouldn't it?

*) You'd only have Blitter wait overhead but that can be limited by dynamically switching the BLTPRI bit in DMACON as part of the blitter wait.
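The footnote's trick can be sketched in C against a fake register interface so the logic runs on a host. The `FakeChips` type and its helpers are hypothetical; on real hardware they would be the DMACON write at $DFF096 and the DMACONR read at $DFF002. The macro names follow the NDK's `hardware/dmabits.h`. Setting BLTPRI is the first register access of the wait, standing in for the extra access a safe blitter wait needs anyway, and BLTPRI is cleared once BBUSY drops.

```c
#include <assert.h>
#include <stdint.h>

#define DMAF_SETCLR  0x8000u  /* DMACON bit 15: 1 = set the given bits, 0 = clear them */
#define DMAF_BLTDONE 0x4000u  /* DMACONR bit 14 (BBUSY): blitter busy */
#define DMAF_BLITHOG 0x0400u  /* DMACON bit 10 (BLTPRI): blitter priority */

/* Hypothetical stand-in for the custom chips. */
typedef struct {
    int busy_polls;    /* how many more DMACONR reads report "busy" */
    uint16_t log[8];   /* values written to DMACON, in order */
    int nlog;
} FakeChips;

static uint16_t read_dmaconr(FakeChips *c)
{
    if (c->busy_polls > 0) {
        c->busy_polls--;
        return DMAF_BLTDONE;
    }
    return 0;
}

static void write_dmacon(FakeChips *c, uint16_t v)
{
    assert(c->nlog < 8);
    c->log[c->nlog++] = v;
}

/* Blitter wait with BLTPRI switched on only while the CPU is idling anyway. */
static void wait_blit_hog(FakeChips *c)
{
    write_dmacon(c, DMAF_SETCLR | DMAF_BLITHOG);  /* BLTPRI on ($8400) */
    while (read_dmaconr(c) & DMAF_BLTDONE)        /* poll BBUSY */
        ;
    write_dmacon(c, DMAF_BLITHOG);                /* BLTPRI off again ($0400) */
}
```

The design point is that the CPU gives the blitter every free cycle only while it has nothing else to do, and goes back to sharing the bus as soon as the blit is done.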
18 July 2016, 10:51   #12
sandruzzo
Registered User
Join Date: Feb 2011
Location: Italy/Rome
Posts: 2,281
Quote:
Originally Posted by roondar
Surely if you're going to blit with the Copper the CPU has to create/update the copperlist for every blit you do? Doesn't this mean you end up spending more time because the CPU would've done more or less the same writes anyway to set up the blitter even when the Copper isn't used?
Maybe, if you can precalculate some copper list, you can achieve maximum blitter speed.
18 July 2016, 11:12   #13
zero
Registered User
Join Date: Jun 2016
Location: UK
Posts: 428
It helps to think about what the CPU has to do in order to set up the blitter. The CPU has to write several words to the blitter's registers. Normally it would fetch the data from RAM into Dx registers and then write it out again, but that could be optimized a bit with speed code (self-modifying code loading immediate values).

The copper is a bit more efficient because it's all basically speed code, and because it uses a reduced address space to write values encoded in the instructions directly to chipset registers.

In other words, the CPU has to do three memory accesses (load immediate instruction, move to absolute address instruction, write to blitter register) and the copper only has to do two (read instruction, write to blitter register).

Of course it's different when you have an 020 or some fast RAM and the CPU can pre-load the values it wants to write to the blitter. Even then the copper is probably faster because the CPU will have to poll the blitter to check when it has finished.

This brings us to the other major advantage of the copper. It can wait for the blitter to finish and there is no interrupt or polling overhead.

So the fastest way to blit is to create a custom copperlist. That can get very tricky if you are trying to use the copper for other stuff like palette changes and sprite hacks. I think this is why you rarely see both in demo effects that make heavy use of the blitter.
18 July 2016, 11:15   #14
roondar
Registered User
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,410
Quote:
Originally Posted by sandruzzo
Maybe, if you can precalculate some copper list, you can achieve maximum blitter speed.
But then you'd still need to update that precalculated list whenever you want more/less bobs, or different positions/animation frames for your bobs.

Maybe I'm just missing something, but it still feels to me using the copper to blit won't be faster. More convenient because you don't need to bother with interrupts etc, but I'm not so sure about it being faster.
18 July 2016, 11:18   #15
sandruzzo
Registered User
Join Date: Feb 2011
Location: Italy/Rome
Posts: 2,281
Quote:
Originally Posted by roondar
But then you'd still need to update that precalculated list whenever you want more/less bobs, or different positions/animation frames for your bobs.

Maybe I'm just missing something, but it still feels to me using the copper to blit won't be faster. More convenient because you don't need to bother with interrupts etc, but I'm not so sure about it being faster.
For a general game that will be a challenge. Maybe some steady parts, like scrolling updates, can be done by the blitter this way, or maybe demos and 3D stuff.
18 July 2016, 11:21   #16
roondar
Registered User
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,410
Quote:
Originally Posted by zero
This brings us to the other major advantage of the copper. It can wait for the blitter to finish and there is no interrupt or polling overhead.

So the fastest way to blit is to create a custom copperlist. That can get very tricky if you are trying to use the copper for other stuff like palette changes and sprite hacks. I think this is why you rarely see both in demo effects that make heavy use of the blitter.
I'm not denying that the copper list will be executed faster than the CPU can set up the blitter. It will!

What I'm saying is that:

A) the polling overhead is, if done right, nearly free* and
B) my guess is that writing the copper list will take about as much time as it would've taken to just blit directly with the CPU (consider: you need to do the exact same work as setting up the blitter to write out the copperlist updates)

Now, I haven't actually created a copper-based blitting system where the CPU creates/updates such a list, so I could be very wrong indeed. Hence my confusion.

*) If you use the extra DMA access that a safe blitter wait needs anyway to switch BLTPRI on as part of the wait, and then switch it off again after the blitter-wait loop, you're looking at something on the order of 20 cycles of total overhead per blit (assuming a 68000 and no fast memory).

Last edited by roondar; 18 July 2016 at 11:26.
18 July 2016, 11:37   #17
sandruzzo
Registered User
Join Date: Feb 2011
Location: Italy/Rome
Posts: 2,281
Could we have a dynamic copper list using the copper's jump register, and update only that with the CPU?
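The copper can load COP2LC and strobe COPJMP2 itself, so a main list can jump to a separate blit sub-list whose address the CPU changes by patching just two words, for example to flip between a sub-list being rebuilt and one being executed. A minimal C sketch of that patching; the `main_list` fragment and `select_sublist` helper are hypothetical. There is no call/return, so the sub-list has to end the list or jump back on its own, for instance by loading COP2LC with a return address and strobing COPJMP2 again.

```c
#include <assert.h>
#include <stdint.h>

#define COP2LCH 0x084  /* copper location 2, high word (offset from $DFF000) */
#define COP2LCL 0x086  /* copper location 2, low word */
#define COPJMP2 0x08A  /* strobe: restart the copper at COP2LC */

/* Hypothetical fragment of the main copper list (three MOVEs). */
static uint16_t main_list[] = {
    COP2LCH, 0x0000,  /* data patched by the CPU: sub-list address, high */
    COP2LCL, 0x0000,  /* data patched by the CPU: sub-list address, low */
    COPJMP2, 0x0000,  /* any write to COPJMP2 makes the copper jump */
};

/* Point the jump at a blit sub-list located at chip-RAM address `addr`. */
static void select_sublist(uint16_t *list, uint32_t addr)
{
    assert((addr & 1) == 0);  /* copper instructions are word-aligned */
    list[1] = (uint16_t)(addr >> 16);
    list[3] = (uint16_t)(addr & 0xFFFF);
}
```

With two sub-lists, the CPU fills the idle one during the frame and calls `select_sublist` once it is complete, so the copper never sees a half-written list.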
18 July 2016, 11:59   #18
zero
Registered User
Join Date: Jun 2016
Location: UK
Posts: 428
Quote:
Originally Posted by roondar
B) my guess is that writing the copper list will take about as much time as it would've taken to just blit directly with the CPU (consider: you need to do the exact same work as setting up the blitter to write out the copperlist updates)
That's definitely not the case though. You can create a copperlist like this (pseudo-code, I don't have my Amiga hat on right now):

Code:
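    ; on OCS the copper needs the CDANG bit in COPCON set to write these registers
    dc.w    $0048,$0000     ; BLTCPTH (data words patched by the CPU)
    dc.w    $004a,$0000     ; BLTCPTL
    ; ... other blitter registers ...
    dc.w    $0040,$0000     ; BLTCON0
    dc.w    $0058,$0000     ; BLTSIZE, written last: this starts the blit
    dc.w    $0001,$7ffe     ; wait for blitter finished (BFD bit clear)
    dc.w    $0048,$0000     ; BLTCPTH
    dc.w    $004a,$0000     ; BLTCPTL
    ; ... other blitter registers ...
    dc.w    $0040,$0000     ; BLTCON0
    dc.w    $0058,$0000     ; BLTSIZE
    dc.w    $0001,$7ffe     ; wait for blitter finished
    ; .... further slots ...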
Now, all you need to do to set up your blits is write some words into this copperlist. You can replace one of the waits with the usual $FFFF,$FFFE to end the list if you don't need every "slot".

Consider that many of the values will not change from frame to frame. In a game you might have slots allocated to player and enemy bobs, so their source addresses, masks, functions and sizes stay the same. Only the destination address and bit rotation changes.
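The slot idea can be sketched in C: each slot is a fixed run of copper MOVEs for the blitter registers followed by a blitter WAIT, and the per-frame update for a cookie-cut bob only patches the destination pointers (C and D both point at the screen) and the shift fields of BLTCON0/BLTCON1. The slot layout, register order and helper names are assumptions for illustration, not a layout given in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Custom-chip register offsets from $DFF000. */
enum {
    BLTCON0 = 0x040, BLTCON1 = 0x042, BLTAFWM = 0x044, BLTALWM = 0x046,
    BLTCPTH = 0x048, BLTCPTL = 0x04A, BLTBPTH = 0x04C, BLTBPTL = 0x04E,
    BLTAPTH = 0x050, BLTAPTL = 0x052, BLTDPTH = 0x054, BLTDPTL = 0x056,
    BLTSIZE = 0x058, BLTCMOD = 0x060, BLTBMOD = 0x062, BLTAMOD = 0x064,
    BLTDMOD = 0x066
};

/* Hypothetical slot layout: one MOVE per register, BLTSIZE last because
 * writing it starts the blit, then a WAIT for the blitter. */
static const uint16_t slot_regs[] = {
    BLTCON0, BLTCON1, BLTAFWM, BLTALWM,
    BLTAPTH, BLTAPTL, BLTBPTH, BLTBPTL, BLTCPTH, BLTCPTL, BLTDPTH, BLTDPTL,
    BLTAMOD, BLTBMOD, BLTCMOD, BLTDMOD, BLTSIZE
};
#define NREGS (sizeof slot_regs / sizeof slot_regs[0])
#define SLOT_WORDS (2 * NREGS + 2)

/* Fill one slot: MOVEs with zero data, then $0001,$7FFE (wait for blitter). */
static void init_slot(uint16_t *slot)
{
    for (size_t i = 0; i < NREGS; i++) {
        slot[2 * i] = slot_regs[i];
        slot[2 * i + 1] = 0;
    }
    slot[2 * NREGS] = 0x0001;
    slot[2 * NREGS + 1] = 0x7FFE;
}

/* Index of the data word written by the MOVE for `reg`. */
static size_t data_index(uint16_t reg)
{
    for (size_t i = 0; i < NREGS; i++)
        if (slot_regs[i] == reg)
            return 2 * i + 1;
    assert(0 && "register not in slot");
    return 0;
}

static void set_reg(uint16_t *slot, uint16_t reg, uint16_t val)
{
    slot[data_index(reg)] = val;
}

/* Per-frame update for a cookie-cut bob: new screen address for C and D,
 * new shift in ASH (BLTCON0) and BSH (BLTCON1), both bits 15-12.
 * Source, mask, minterm, modulos and size stay as they were. */
static void move_bob(uint16_t *slot, uint32_t dest, unsigned shift)
{
    uint16_t hi = (uint16_t)(dest >> 16), lo = (uint16_t)(dest & 0xFFFF);
    size_t c0 = data_index(BLTCON0), c1 = data_index(BLTCON1);
    assert(shift < 16);
    set_reg(slot, BLTCPTH, hi);
    set_reg(slot, BLTCPTL, lo);
    set_reg(slot, BLTDPTH, hi);
    set_reg(slot, BLTDPTL, lo);
    slot[c0] = (uint16_t)((slot[c0] & 0x0FFF) | (shift << 12));
    slot[c1] = (uint16_t)((slot[c1] & 0x0FFF) | (shift << 12));
}
```

Per bob and frame that is six word writes into the copper list, which is the saving zero describes; changing animation frames would add the A and B pointer words.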
18 July 2016, 16:28   #19
roondar
Registered User
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,410
Quote:
Originally Posted by zero
Consider that many of the values will not change from frame to frame. In a game you might have slots allocated to player and enemy bobs, so their source addresses, masks, functions and sizes stay the same. Only the destination address and bit rotation changes.
So, the gain is in not needing to update every value every frame.

That is indeed useful, though I suspect you'll need to update source/mask addresses relatively often for animation purposes, at which point the gain will be lower.

Nice idea though, learned something new
18 July 2016, 21:42   #20
ReadOnlyCat
Code Kitten
Join Date: Aug 2015
Location: Montreal/Canadia
Age: 52
Posts: 1,178
Quote:
Originally Posted by roondar
So, the gain is in not needing to update every value every frame.

That is indeed useful, though I suspect you'll need to update source/mask addresses relatively often for animation purposes, at which point the gain will be lower.

Nice idea though, learned something new
It doesn't work with every type of game, though. If one needs to create a Copper gradient while driving the Blitter, then a pre-made Copper list does not work anymore. This is why Lotus 2 (or 3, I am not sure anymore) uses CPU interrupts to drive the Blitter: they needed the Copper to create the colored roadside strips.
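Driving the blitter from the CPU instead means a blitter-finished interrupt (bit 6, BLIT, in INTENA/INTREQ, delivered on level 3) starts the next queued blit, leaving the copper free for gradients. A host-runnable C sketch of that handler logic; the queue type and helpers are hypothetical, and this is not code from any Lotus game.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INTF_BLIT 0x0040u  /* INTENA/INTREQ bit 6: blitter finished (level 3) */

/* Hypothetical queued blit: the values a real handler would write. */
typedef struct {
    uint32_t dest;
    uint16_t bltcon0, bltsize;
} Blit;

typedef struct {
    const Blit *q;
    int head, count;
    int started;         /* blits kicked off so far */
    uint16_t last_size;  /* stand-in for the BLTSIZE register */
} BlitQueue;

/* Stand-in for the register writes; real code would write BLTCON0,
 * the pointers etc. first and BLTSIZE last, which starts the blit. */
static void start_blit(BlitQueue *bq, const Blit *b)
{
    bq->last_size = b->bltsize;
    bq->started++;
}

/* Level 3 also carries copper and vertical-blank interrupts, hence the
 * check. Returns the value to write to INTREQ to acknowledge the blitter
 * interrupt (bit 15 clear, so it clears BLIT), or 0 if it wasn't ours. */
static uint16_t on_level3(BlitQueue *bq, uint16_t intreqr)
{
    assert(bq != NULL);
    if (!(intreqr & INTF_BLIT))
        return 0;
    if (bq->head < bq->count)
        start_blit(bq, &bq->q[bq->head++]);
    return INTF_BLIT;
}
```

The trade-off against the copper approach is the interrupt entry and exit cost per blit, which is the overhead roondar mentions earlier in the thread.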

All times are GMT +2.