English Amiga Board


Old 24 May 2022, 03:33   #1
remz
Registered User
 
Join Date: May 2022
Location: Canada
Posts: 138
Question Chunky True Color 4 pixels

Hi Amiga coders,

I was thinking of something and wanted to see if it would be technically doable. Please correct me at any step if I have made any errors or miscalculations.

The Copper can MOVE to a color register in 8 clock cycles when 4 or fewer bitplanes are enabled (if I am not mistaken, one Amiga CPU clock cycle matches the duration of one lowres pixel).
It appears that even with bitplane DMA turned off, the Copper won't go faster than one color change per 8 pixels. This seems to indicate that the Copper fetches its two 16-bit instruction words one at a time, with 2 clock cycles of "internal processing" in between, maybe like so:
Code:
read-work-read-Move-...
 0 1  2 3  4 5  6 7
The 68000 appears to act similarly: even with all DMA turned off, it won't go faster, since it typically reads from memory and then does internal work. This is what lets the Amiga's 68000 appear to run at full speed even when bitplane DMA uses all the odd cycles during the displayed portion of the screen.

With that in mind, I thought that by interleaving the Copper and the 68000, both setting color registers, it should be possible to have a chunky full-color screen with 4-pixel-wide pixels running with 0 bitplanes. However, the only way I have found so far to emit one word to color #0 in only 8 clock cycles is move d0,(a0), which implies preloading the CPU registers. The DMA timing would look like this:
Code:
Copper: read-work-read-Move
CPU:    read-Move-read-work
clock:   0 1  2 3  4 5  6 7
But I wanted at least to start a discussion about whether this is feasible.
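
The interleaving idea can be sketched as a toy model. This assumes, as described above, that the Copper and the CPU each manage one color write per 8 lowres pixels and that the CPU writes land 4 pixels after the Copper writes. The names `PERIOD`, `CPU_OFFSET` and `color_change_positions` are made up for illustration, and this is not measured hardware timing.

```python
# Toy model: lowres pixel positions at which COLOR00 changes on one scanline.
# Assumption: Copper and CPU each write once per 8 pixels, CPU shifted by 4.
PERIOD = 8       # pixels between two writes from the same source
CPU_OFFSET = 4   # hypothetical phase shift of the CPU writes


def color_change_positions(line_width, copper_start=0):
    """Return the sorted pixel positions of all color writes on one line."""
    copper = range(copper_start, line_width, PERIOD)
    cpu = range(copper_start + CPU_OFFSET, line_width, PERIOD)
    return sorted(list(copper) + list(cpu))


positions = color_change_positions(32)
print(positions)  # [0, 4, 8, 12, 16, 20, 24, 28]
```

If both sources hold their phase, the combined stream changes color every 4 pixels, which is where the "4 pixels wide chunky" figure comes from.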

(Note: the CPU doesn't appear to align perfectly on every scanline, even with interrupts turned off. Maybe something precise needs to be taken into account, for example one 'nop' every other scanline, perhaps due to alternating horizontal line length. I am not sure at this point.)

Also note that this assumes everything is running from chip RAM. If the CPU is running from fast RAM, then technically bitplane DMA could still be running with up to 4 bitplanes without a problem.

Last edited by remz; 24 May 2022 at 03:42. Reason: Typo in title
Old 24 May 2022, 09:56   #2
hooverphonique
ex. demoscener "Bigmama"
 
Join Date: Jun 2012
Location: Fyn / Denmark
Posts: 1,624
The following thread discusses color changes using cpu: http://eab.abime.net/showthread.php?t=110394
Old 24 May 2022, 10:05   #3
bloodline
Registered User
 
 
Join Date: Jan 2017
Location: London, UK
Posts: 433
Quote:
Originally Posted by remz View Post

With that in mind, I thought that by interleaving copper and 68000 both setting color registers, it should be possible to have a 4 pixels wide chunky full color screen running in 0 bitplane.
I'm not sure what you are trying to achieve here.

The advantage of a chunky display is to reduce the number of RAM accesses. So for the CPU to write a pixel on a normal 16 colour Amiga planar display it requires at least 4 separate RAM writes (and possibly some reads as well) not to mention masking and shifting (though 68k Bitwise operations mitigate this somewhat).

With a chunky display the pixel can be written with a single RAM write (or perhaps a read and a write in the case of a 4 bit framebuffer where you might need to mask half of the byte you aren't writing to).

I don't see how changing the colour palette multiple times per scanline using both the CPU and the Copper helps in this situation.
Old 24 May 2022, 10:32   #4
ross
Defendit numerus
 
 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
Quote:
Originally Posted by remz View Post
However the only way I found so far to emit one word to color #0 in only 8 clock cycles is move d0,(a0), which implies preloading the cpu registers.
You have already answered what the main problem is

Considering a maximum preload of 15 registers (with a7=$dff180) and using 17 changes with copper (the first and last in the line),
you would have a 128 wide pixels chunky 'screen' (15+17)*4, not extended to a full view.
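
The arithmetic behind that width, as a minimal sketch. The counts come from the post above; the constant and function names are only illustrative.

```python
PIXELS_PER_WRITE = 4   # each color write stays visible for 4 lowres pixels
CPU_REGISTERS = 15     # preloaded registers; a7 is reserved for $dff180
COPPER_MOVES = 17      # interleaved Copper writes, incl. first and last


def chunky_width(cpu_writes, copper_writes, pixels_per_write=PIXELS_PER_WRITE):
    """Width in lowres pixels covered by one burst of interleaved writes."""
    return (cpu_writes + copper_writes) * pixels_per_write


print(chunky_width(CPU_REGISTERS, COPPER_MOVES))  # 128
```

The CPU side is the limit here: once the preloaded registers are used up, it has to reload them, and during that time it cannot keep up the 8-cycle write rhythm.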

Quote:
(note: The cpu doesn't appear to align perfectly at every scanline even with interrupt turned off
This is another problem that is not trivial to solve, it is possible to do it but with effort.

That said, as an 'academic' problem it's interesting, but there are other ways to make chunky displays that are more usable.
Old 24 May 2022, 11:01   #5
bloodline
Registered User
 
 
Join Date: Jan 2017
Location: London, UK
Posts: 433
Quote:
Originally Posted by ross View Post
You have already answered what the main problem is

Considering a maximum preload of 15 registers (with a7=$dff180) and using 17 changes with copper (the first and last in the line),
you would have a 128 wide pixels chunky 'screen' (15+17)*4, not extended to a full view.


This is another problem that is not trivial to solve, it is possible to do it but with effort.

That said, as an 'academic' problem it's interesting, but there are other ways to make chunky displays that are more usable.
Ugh! So remz is actually trying to write a realtime Chunky to Planar conversion algorithm?
Old 24 May 2022, 11:57   #6
ross
Defendit numerus
 
 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
Quote:
Originally Posted by bloodline View Post
Ugh! So remz is actually trying to write a realtime Chunky to Planar conversion algorithm?
Well, actually the chunky to planar conversion is not there

He is trying to directly display a 12-bit true color buffer on the screen.
The buffer itself is not linear (or double), because it contains the even pixels on one side for the copper, and the odd ones on the other for the cpu.

This of course also leads to the problem of rendering to this buffer(s)..
Old 24 May 2022, 19:31   #7
remz
Registered User
 
Join Date: May 2022
Location: Canada
Posts: 138
Yes, I was intrigued by the "technical possibility" more than its real-life usefulness; as you both mentioned, the memory layout would be irksome and the timings complicated.
However with fast ram and 16 colors (4 bitplanes), possibly changing 80 colors per scanline could be intriguing.
A sort of "UltraDynamic HighColor" mode?
Old 24 May 2022, 21:02   #8
paraj
Registered User
 
 
Join Date: Feb 2017
Location: Denmark
Posts: 1,098
With fast ram available it probably doesn't make sense to involve the copper at all (at least for changing colors).
Old 24 May 2022, 23:14   #9
defor
Registered User
 
Join Date: Jun 2020
Location: Brno
Posts: 90
I'm afraid that using CPU to fetch colors to Denise (plus rather uncomfortable color buffer as some colors are set by CPU and others by Copper) is too restrictive.
Here is a very nice write-up about the good old classic copper chunky (on OCS using the 7bpl bug): https://eab.abime.net/showthread.php?t=107015
Old 25 May 2022, 00:04   #10
remz
Registered User
 
Join Date: May 2022
Location: Canada
Posts: 138
Yes that video of 57 copper chunky trick was very inspiring.
With code running from Fast RAM, is it possible to fully saturate the chip RAM bus with just the 680x0 CPU? If I read correctly, maximum chip RAM bandwidth is 7.15MB/sec?
This would mean being able to set one word every two pixels
(meaning a potential 160 pixels true color mode)?
I tried it in WinUAE but I didn't manage to get narrower than 4 pixels wide.

[edit] Thinking about it, setting a color register has nothing to do with chip ram: It is direct access to Denise, so it doesn't have any dma bandwidth restriction.

Does anyone know if the display hardware fetches the color registers at every pixel during a scanline? Maybe there is a limit there too.

Last edited by remz; 25 May 2022 at 02:11. Reason: Adding precision about chip ram bandwidth
Old 25 May 2022, 09:38   #11
defor
Registered User
 
Join Date: Jun 2020
Location: Brno
Posts: 90
There are 8 bus cycles per 16 lo-res pixels. The bus arbitration allows CPU to access every second bus cycle only (*if cycle is available). Hence 4 pixels by CPU, at best. You must check cycle counts for your particular processor (and its operating frequency) if it is able to utilize every available bus cycle. Therefore it is very configuration dependent.
(P.S.: Custom registers access (i.e. custom-chips) happens through the bus = all chip-ram access restrictions apply.)
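
A small sketch of that calculation. It assumes a 320-pixel lowres line for the per-line figure; the function names are illustrative only.

```python
def pixels_per_cpu_write(bus_cycles_per_16_pixels=8, cpu_cycle_stride=2):
    """Lowres pixels that pass between two consecutive CPU bus accesses."""
    pixels_per_bus_cycle = 16 // bus_cycles_per_16_pixels  # 2 pixels per cycle
    return pixels_per_bus_cycle * cpu_cycle_stride         # every second cycle


def max_cpu_writes(visible_pixels=320):
    """Upper bound on CPU color writes across one line of the given width."""
    return visible_pixels // pixels_per_cpu_write()


print(pixels_per_cpu_write())  # 4
print(max_cpu_writes())        # 80
```

The 80 writes per line is a best case: it only holds if the processor can actually issue a write on every bus cycle that is available to it.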
Old 25 May 2022, 10:11   #12
Toni Wilen
WinUAE developer
 
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,502
It is not possible for the CPU to access chip RAM (or the chipset bus) every cycle. All chipset variants have the same interleaved CPU access timing: the first cycle is used to transfer the address to Agnus/Alice (this cycle is always free for chipset DMA), the second cycle is used to transfer data.

Fast CPUs waste lots of cycles doing nothing when accessing chip bus.

EDIT: 7M/s is possible if chip ram bus is 32-bit (A3000 or AGA)

Last edited by Toni Wilen; 25 May 2022 at 10:24.
Old 25 May 2022, 23:23   #13
remz
Registered User
 
Join Date: May 2022
Location: Canada
Posts: 138
Quote:
Originally Posted by defor View Post
There are 8 bus cycles per 16 lo-res pixels. The bus arbitration allows CPU to access every second bus cycle only (*if cycle is available). Hence 4 pixels by CPU, at best. You must check cycle counts for your particular processor (and its operating frequency) if it is able to utilize every available bus cycle. Therefore it is very configuration dependent.
(P.S.: Custom registers access (i.e. custom-chips) happens through the bus = all chip-ram access restrictions apply.)
"allows CPU to access every second bus cycle only":
You mean even with all DMA off, the CPU cannot use every bus cycle?
For example, if I tried to MOVEM 64 bytes to set the whole 32-color palette as fast as possible, the MOVEM itself, when done on chip RAM, would not take 14+4*32 = 142 clock cycles to set 64 bytes (i.e. one color per 2 lo-res pixels)?

Toni:
What you are saying is interesting for the Amiga 3000's 32-bit chip RAM: basically I would be inclined to say the Amiga 3000 could be running at an "almost AGA" speed. With 32-bit chip RAM access and fast RAM, would the CPU be able to set sprites and colors potentially 4 times faster than the Copper?
This could open the door for massive ECS sprites by recycling them during a scanline with the CPU instead of the Copper. Oh, I am tempted to try it.
Old 25 May 2022, 23:46   #14
roondar
Registered User
 
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,408
Quote:
Originally Posted by remz View Post
"allows CPU to access every second bus cycle only":
You mean even with all DMA off, the CPU cannot use every bus cycle?
For example, if I tried to MOVEM 64 bytes to set the whole 32-color palette as fast as possible, the MOVEM itself, when done on chip RAM, would not take 14+4*32 = 142 clock cycles to set 64 bytes (i.e. one color per 2 lo-res pixels)?
Assuming a 68000, it would be around 142 (maybe 144; I thought it was 16+8*registers cycles for a movem.l to an address). However, half of those cycles are not on the bus, but internal to the CPU. You can see this in WinUAE with cycle-accurate timing and the Visual DMA Debugger feature. Doing so, you'll notice that CPU activity is always interleaved with either idle cycles or other DMA, never back to back.

This half-internal, half-bus split is also why the 68000 on an OCS/ECS system isn't really slowed down by bitplane DMA until you go to 5 bitplanes lowres or 3 bitplanes hires.

Note however that this explanation is slightly simplified. For one, the CPU can access any cycle on the Chip Memory bus that isn't in use by DMA, it just can't access two cycles back to back.
Quote:
Toni:
What you are saying is interesting for the Amiga 3000's 32-bit chip RAM: basically I would be inclined to say the Amiga 3000 could be running at an "almost AGA" speed. With 32-bit chip RAM access and fast RAM, would the CPU be able to set sprites and colors potentially 4 times faster than the Copper?
This could open the door for massive ECS sprites by recycling them during a scanline with the CPU instead of the Copper. Oh, I am tempted to try it.
Interesting idea, but do note that Fast RAM isn't infinite speed either, so you'll lose some speed compared to the theoretical maximum you point to here because you'd have to read in the data at some point too. That said, the Copper effectively manages to write only 1 word per 4 DMA cycles and this would be able to write 2 long words in the same time. So if you pre-load the CPU registers, this might be quite interesting.

It might be hard to time correctly though, and I don't actually know whether the A3000 has static Chip RAM access speeds or whether they are CPU dependent. On the A1200 at least, many CPU cards don't get full bandwidth when accessing Chip RAM; this might also be the case on the A3000?

Edit: the above text was replaced, it erroneously referred to speed differences between the 68000/OCS and 32 bit Chip RAM speeds instead of Copper vs. CPU on 32 bit ECS/AGA.

Last edited by roondar; 25 May 2022 at 23:53. Reason: Misunderstood the post I replied to.
Old 26 May 2022, 00:58   #15
remz
Registered User
 
Join Date: May 2022
Location: Canada
Posts: 138
Can you however interleave Copper and CPU to saturate the chip ram bandwidth if bitplane dma is turned off?
The problem that I expect with the copper is that it itself runs off chip ram: so any Move operation costs two word-fetches.

Please correct me if I'm wrong, but the Copper writing to a custom register, is it "using the bus"? From what I understand so far, it seems not: That would mean copper can write to any custom registers (even to other chips like Denise and Paula) "for free"?
Old 26 May 2022, 08:03   #16
bloodline
Registered User
 
 
Join Date: Jan 2017
Location: London, UK
Posts: 433
Quote:
Originally Posted by remz View Post
Can you however interleave Copper and CPU to saturate the chip ram bandwidth if bitplane dma is turned off?
The problem that I expect with the copper is that it itself runs off chip ram: so any Move operation costs two word-fetches.

Please correct me if I'm wrong, but the Copper writing to a custom register, is it "using the bus"? From what I understand so far, it seems not: That would mean copper can write to any custom registers (even to other chips like Denise and Paula) "for free"?
The Copper uses the chip bus to load its two instruction words, each of which uses a DMA slot. The Copper MOVE instruction uses the first word to load the register address from chip RAM and the second word to load the register's new value from chip RAM.

-Edit- I don’t call that “for free”, as all Copper instructions use the same two cycles.
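
To make the two-word layout concrete, here is a minimal sketch that builds the bytes of a Copper MOVE as they would sit in chip RAM. `COLOR00` at offset $180 matches the $dff180 address used earlier in the thread; the helper name `copper_move` and the range check are illustrative assumptions, not part of any official tool.

```python
import struct

COLOR00 = 0x180  # register offset from $dff000


def copper_move(reg_offset, value):
    """Encode one Copper MOVE: word 1 = register offset, word 2 = data."""
    # Bit 0 of the first word must be 0 for a MOVE, so the offset is even.
    if reg_offset & 1 or not 0 <= reg_offset < 0x200:
        raise ValueError("register offset must be even and below $200")
    return struct.pack(">HH", reg_offset, value & 0xFFFF)  # big-endian words


red = copper_move(COLOR00, 0x0F00)
print(red.hex())  # 01800f00
print(len(red))   # 4
```

Every color change in a Copper list therefore costs two word fetches from chip RAM, which is why a list of N color writes occupies 2*N words.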

Last edited by bloodline; 26 May 2022 at 08:21.
Old 26 May 2022, 10:13   #17
Cyprian
Registered User
 
Join Date: Jul 2014
Location: Warsaw/Poland
Posts: 171
Quote:
Originally Posted by Toni Wilen View Post
It is not possible for the CPU to access chip RAM (or the chipset bus) every cycle. All chipset variants have the same interleaved CPU access timing: the first cycle is used to transfer the address to Agnus/Alice (this cycle is always free for chipset DMA), the second cycle is used to transfer data.
Is the same access scheme also valid for hardware registers, e.g. color registers?

Or can they be accessed faster (1 CPU clock or 1 chipset bus cycle)?

Last edited by Cyprian; 26 May 2022 at 10:49.
Old 26 May 2022, 10:14   #18
Toni Wilen
WinUAE developer
 
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,502
Quote:
Originally Posted by remz View Post
would the CPU be able to set sprites and colors potentially 4 times faster than copper?
No because custom registers are 16-bit wide.

32-bit wide chip RAM (A3000 or AGA):
CPU can read or write 32-bit word (if address is 32-bit aligned) every second chipset cycle.
Custom registers (all chipsets):
CPU can read or write 16-bit word every second chipset cycle.

BPLxDAT and SPRxDAT are also only 16-bit wide from CPU point of view. Only DMA can do AGA 32-bit or 2x32-bit transfers.

CPU can use any free chipset cycle but CPU chipset bus access will always take 2 chipset cycles to complete. (Note that this is from chipset point of view, CPU/accelerator board can have write buffer(s) that can improve performance noticeably)
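
These rules translate into a rough upper bound on CPU throughput over the chipset bus. The sketch below assumes the chipset cycle rate equals the color clock (3,546,895 Hz PAL, 3,579,545 Hz NTSC), that no DMA steals cycles, and that no write buffering is involved. The function name is illustrative.

```python
COLOR_CLOCK_HZ = {"PAL": 3_546_895, "NTSC": 3_579_545}


def cpu_bytes_per_second(bus_width_bits, system="PAL", cycles_per_access=2):
    """Best-case CPU transfer rate: one access per two chipset cycles."""
    bytes_per_access = bus_width_bits // 8
    return COLOR_CLOCK_HZ[system] * bytes_per_access / cycles_per_access


# 16-bit custom registers vs. 32-bit chip RAM (A3000 or AGA)
print(f"{cpu_bytes_per_second(16) / 1e6:.2f} MB/s")          # 3.55 MB/s
print(f"{cpu_bytes_per_second(32) / 1e6:.2f} MB/s")          # 7.09 MB/s
print(f"{cpu_bytes_per_second(32, 'NTSC') / 1e6:.2f} MB/s")  # 7.16 MB/s
```

The roughly 7 MB/s figure only appears with 32-bit chip RAM accesses. Writes to custom registers stay at the 16-bit rate, so the color-change rate does not improve.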
Old 26 May 2022, 10:52   #19
Cyprian
Registered User
 
Join Date: Jul 2014
Location: Warsaw/Poland
Posts: 171
Quote:
Originally Posted by Toni Wilen View Post
Custom registers (all chipsets):
CPU can read or write 16-bit word every second chipset cycle.

...

Only DMA can do AGA 32-bit or 2x32-bit transfers.
Thanks for the clarification.

Last edited by Cyprian; 26 May 2022 at 12:12.
Old 26 May 2022, 13:21   #20
defor
Registered User
 
Join Date: Jun 2020
Location: Brno
Posts: 90
Quote:
Originally Posted by Toni Wilen View Post
CPU can use any free chipset cycle but CPU chipset bus access will always take 2 chipset cycles to complete
Does it mean that the CPU can use odd-numbered cycles if they're free, but as soon as the chipset needs them the CPU is "synced" back to even-numbered cycles (because it must wait)?
I thought that the bus controller (Agnus?) strictly allows the CPU to access even-numbered cycles only (if available). The DMA time slot allocation diagram in the HRM suggests that.
 

