English Amiga Board


Old 28 August 2022, 12:00   #21
chb
Registered User
 
Join Date: Dec 2014
Location: germany
Posts: 439
Quote:
Originally Posted by ross View Post
The OP also specified that by activating B channel the Blithog leaves the CPU out of the bus
That one is interesting BTW, according to the HRM errata AB->D with fill should have idle cycles, too.
Old 28 August 2022, 12:11   #22
ross
Defendit numerus
 
 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
Quote:
Originally Posted by chb View Post
That one is interesting BTW, according to the HRM errata AB->D with fill should have idle cycles, too.
Yep, interesting, and this really needs to be checked.

Since this is undocumented, it could be something specific to the OP's setup rather than a consequence of the active channels (even if that seems strange to me), or the errata itself is 'errato'.
But I fully trust WinUAE for this case.
Old 28 August 2022, 13:05   #23
chb
Registered User
 
Join Date: Dec 2014
Location: germany
Posts: 439
Quote:
Originally Posted by ross View Post
Yep, interesting, and this really needs to be checked.

Since this is undocumented, it could be something specific to the OP's setup rather than a consequence of the active channels (even if that seems strange to me), or the errata itself is 'errato'.
But I fully trust WinUAE for this case.

The errata is certainly very 'errato', but I haven't seen Toni mention that error (if it is one). Maybe fill idle cycles are different from 'normal' ones and can overlap with display DMA? AFAIR Toni mentioned that idle cycles occur because some Agnus internal hardware resource is shared by the blitter and the DMA sequencer. Maybe that's different for the fill stage and it's just an unused cycle there?
Old 28 August 2022, 13:17   #24
ross
Defendit numerus
 
 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
I'm pretty sure I have tests that do blitter fill with EFE or IFE (and FCI and DESC), with or without BLTPRI, and with selectable input channels (and with other competing DMA sources as well).

Or wait for Toni to tell us everything, since I remember nothing of it

EDIT: and right now I can't try any code.

Last edited by ross; 28 August 2022 at 13:53.
Old 28 August 2022, 13:41   #25
a/b
Registered User
 
Join Date: Jun 2016
Location: europe
Posts: 1,038
Quote:
Originally Posted by Jobbo View Post
My blit in one example is a fill of 192x192x4 pixels which will use channels A and D...
Is it a 100% 68000 and no fast memory system? Here is a simple test for a500 (best viewed in asm-one/pro ). It opens a 256x256x4bpl interleaved screen, and it blitter fills the upper half with a specific pattern (the lower half is empty).

Now, the relevant part. Uncomment ;1 (which changes the pattern to solid), there is no change. Uncomment ;2 (which changes destination to lower half), there is no change. Uncomment both, there is no change.
Code:
	move.w	#Height<<6+0,($058,a6)	; width = 4*256 = 1024px

;1	move.l	#$09ff0000,($040,a6)	; d = 1
;2	move.l	#Bitmaps\.2,($054,a6)	; d
Do these modifications after commenting out the "nasty on" line, now you can see the changes on the screen.
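
For anyone puzzled by the `Height<<6+0` in the size write, here is a minimal Python sketch of the BLTSIZE encoding. It assumes the standard OCS layout (height in bits 15-6, width in words in bits 5-0, with an all-zero field standing for the maximum). The helper name `bltsize` is made up for illustration and is not part of the test program.

```python
def bltsize(height_lines: int, width_words: int) -> int:
    """Encode an OCS BLTSIZE value (hypothetical helper, for illustration).

    height_lines: 1..1024 (1024 is encoded as 0 in the 10-bit field)
    width_words:  1..64   (64 is encoded as 0 in the 6-bit field)
    """
    if not 1 <= height_lines <= 1024:
        raise ValueError("height must be 1..1024 lines")
    if not 1 <= width_words <= 64:
        raise ValueError("width must be 1..64 words")
    return ((height_lines & 0x3FF) << 6) | (width_words & 0x3F)


# The test blit: 128 lines, 64 words (1024 pixels) per line.
print(hex(bltsize(128, 64)))
```

With `Height` = 128 this gives the same value as `Height<<6+0`, which is why a width field of 0 covers all four interleaved 256-pixel rows in one blitter line.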

And another thing, which is why I asked earlier what happens after a write to size. If you do something like move.w d0,(a0) where a0 is a blitter register (1 word instruction), and this is related to what Ross pointed out, you are most likely in danger because blitter won't immediately wake up and start blasting. That kind of instruction is very likely to slip through, but more "complex" stuff requiring extra fetches should be "safe" (I figure best to put it in quotes so I have an excuse :P ).

And finally, if you want your code to work on 020+ (this is related to what Thomas said, however the first word in thread title is A500 so...), you will need a different version, with blit waits. You could try messing around with disabling icache and similar and maybe it works, but I would rather have 2 versions. PITA especially if you have unrolled code ;(.

Code:
;---------------T

****************************************************************

Width		EQU	256
WidthB		EQU	Width/8
Height		EQU	128
Depth		EQU	4

****************************************************************

	SECTION	TestCode,CODE_C		; enforce chip/slow

Code	lea	($dff000),a6
	move.w	#$4000,($09a,a6)	; system off
	move.w	#$0020,($096,a6)

	move.l	#Copper,($080,a6)
	move.w	d0,($088,a6)

	lea	(Bitmaps),a0
	moveq	#0,d0
	move.w	#(Bitmaps\.End-Bitmaps)/16-1,d1
.ClearBMs	REPT	16/4
		move.l	d0,(a0)+
	ENDR
	dbf	d1,.ClearBMs

	lea	(Copper\.Planes+2),a0
	move.l	#Bitmaps,d0
	moveq	#WidthB,d1		; interleaved
	moveq	#Depth-1,d2
.SetPlanes	move.w	d0,(4,a0)
	swap	d0
	move.w	d0,(a0)
	swap	d0
	addq.l	#8,a0
	add.l	d1,d0
	dbf	d2,.SetPlanes

	move.w	#$8400,($096,a6)	; nasty on

.Main	move.l	($004,a6),d0
	and.l	#$01ff00,d0
	cmp.l	#$012c00,d0
	bne.b	.Main

	bsr.b	Test

	btst	#6,($bfe001)
	bne.b	.Main

	move.w	#$0400,($096,a6)	; nasty off

	move.l	(4).w,a0		; system on
	move.l	(156,a0),a0
	move.l	(38,a0),($080,a6)
	move.w	#$8020,($096,a6)
	move.w	#$c000,($09a,a6)
	rts

****************************************************************

Test
.WB1	btst	#14-8,($002,a6)
	bne.b	.WB1

	move.l	#$09f00000,($040,a6)	; d = a
	moveq	#~0,d0
	move.l	d0,($044,a6)		; f/lwm
	move.l	#SrcA,($050,a6)		; a
	move.l	#Bitmaps\.1,($054,a6)	; d
	move.w	#-Depth*WidthB,($064,a6) ; moda
	move.w	#0,($066,a6)		; modd
	move.w	#Height<<6+0,($058,a6)	; width = 4*256 = 1024px

;1	move.l	#$09ff0000,($040,a6)	; d = 1
;2	move.l	#Bitmaps\.2,($054,a6)	; d


.WB2	btst	#14-8,($002,a6)
	bne.b	.WB2
	rts

****************************************************************

	SECTION	TestChip,DATA_C

Copper	DC.W	$008e,$2ca1,$0090,$2ca1
	DC.W	$0092,$0048,$0094,$00c0
	DC.W	$0100,Depth<<12+$0200,$0102,$0000,$0104,$0000
	DC.W	$0106,$0c00,$010c,$0011,$01fc,$0000
	DC.W	$0108,(Depth-1)*WidthB,$010a,(Depth-1)*WidthB
.Planes
.D	SET	0
	REPT	Depth			; interleaved
		DC.W	$00e0+.D*4,0
		DC.W	$00e2+.D*4,0
.D		SET	.D+1
	ENDR
	DC.W	$0180,$000
.C	SET	0
	REPT	1<<Depth-1
.C		SET	.C+1
		DC.W	$0180+.C*2,.C*$111
	ENDR
	DC.W	$ffff,$fffe

SrcA	DCB.B	Depth*WidthB,$ab

****************************************************************

	SECTION	TestBitmap,BSS_C

Bitmaps
.1	DS.B	Depth*Height*WidthB
.2	DS.B	Depth*Height*WidthB
.End

****************************************************************
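
As a reading aid for the DMACON writes in the listing above ($8400, $0400, $0020, $8020), here is a small Python sketch that decodes a written value. It assumes the usual DMACON layout (bit 15 = SET/CLR, bit 10 = BLTPRI a.k.a. blithog, and so on). The function name `describe_dmacon_write` is hypothetical.

```python
# Writable DMACON bits, assuming the standard OCS register layout.
DMACON_BITS = {
    10: "BLTPRI", 9: "DMAEN", 8: "BPLEN", 7: "COPEN", 6: "BLTEN",
    5: "SPREN", 4: "DSKEN", 3: "AUD3EN", 2: "AUD2EN", 1: "AUD1EN", 0: "AUD0EN",
}


def describe_dmacon_write(value: int):
    """Return (action, [bit names]) for a 16-bit value written to DMACON."""
    action = "set" if value & 0x8000 else "clear"
    names = [name for bit, name in sorted(DMACON_BITS.items(), reverse=True)
             if value & (1 << bit)]
    return action, names


for v in (0x8400, 0x0400, 0x0020, 0x8020):
    print(hex(v), describe_dmacon_write(v))
```

So "nasty on" sets BLTPRI, "nasty off" clears it, and the two $xx20 writes turn sprite DMA off and back on.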
Old 28 August 2022, 14:02   #26
Niklas
Registered User
 
Join Date: Apr 2018
Location: Stockholm / Sweden
Posts: 129
I don't know if this has been mentioned in this thread (tl;dr) but the CPU cannot use two DMA slots back-to-back, even if a fast processor is used. So when comparing performance of copying words to and from chip memory using CPU vs using blitter, this is a clear advantage for the blitter.
Old 28 August 2022, 14:26   #27
Toni Wilen
WinUAE developer
 
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,502
Blitter internals are detailed here (including fill and line draw special cases): https://eab.abime.net/showthread.php?t=104887
Old 28 August 2022, 15:28   #28
ross
Defendit numerus
 
 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
Quote:
Originally Posted by Toni Wilen View Post
Blitter internals are detailed here (including fill and line draw special cases): https://eab.abime.net/showthread.php?t=104887
Right, thanks

So, here the fill cases:
Code:
  /-1-\   /-2-\
A - B - X - Y - OUT
Fill mode extra special case:
If BLTCON0(D) and !BLTCON0(C): Add extra idle cycle. X=idle cycle,Y=D


This means (with A enabled and Blithog enabled):
A->D, idle cycles
AB->D, idle cycles
AC->D, no idle cycles
ABC->D, no idle cycles
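
The rule above can be written down as a tiny Python check. This is only a sketch of the quoted fill special case, assuming fill mode is already enabled in BLTCON1 and using the standard BLTCON0 channel-enable bits. The function name is invented for illustration, and it says nothing about idle cycles that a channel combination may have outside of fill mode.

```python
# BLTCON0 channel-enable bits (standard layout).
USEA, USEB, USEC, USED = 0x0800, 0x0400, 0x0200, 0x0100


def fill_adds_idle_cycle(bltcon0: int) -> bool:
    """Quoted rule: in fill mode, D enabled and C disabled adds an idle cycle."""
    return bool(bltcon0 & USED) and not (bltcon0 & USEC)


cases = {
    "A->D": USEA | USED,
    "AB->D": USEA | USEB | USED,
    "AC->D": USEA | USEC | USED,
    "ABC->D": USEA | USEB | USEC | USED,
}
for name, channels in cases.items():
    print(name, "idle cycles" if fill_adds_idle_cycle(channels) else "no idle cycles")
```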

Perhaps Jobbo swapped the B and C channels.
Old 28 August 2022, 16:56   #29
Jobbo
Registered User
 
 
Join Date: Jun 2020
Location: Druidia
Posts: 386
Quote:
Originally Posted by ross View Post
Perhaps jobbo inverted B and C channels.
I double checked and yes I got that backwards, sorry.

Last edited by Jobbo; 28 August 2022 at 17:15.
Old 28 August 2022, 17:15   #30
Jobbo
Registered User
 
 
Join Date: Jun 2020
Location: Druidia
Posts: 386
Quote:
Originally Posted by a/b View Post
Is it a 100% 68000 and no fast memory system? Here is a simple test ...
I am indeed running on a standard A500 with no-fast ram.

You're coming at this from a different perspective, one where you are depending on blithog to avoid waits. That's interesting for bob records but it's not really what I was trying to understand.

In my case I'm using waits and don't have any reason not to. I'm using blithog to maximize the blitter work, while also doing separate cpu work.

All of this works fine for me, what was confusing was my reading of previous posts declaring blithog an absolute bus lockout for the cpu.

It only locks out the cpu for certain kinds of blits, but that nuance is lost in a lot of discussions. I wanted to know for sure that I wasn't getting unexpected cpu progress, either by using blithog incorrectly or configuring winuae badly. I'm satisfied that everything is good now.

I don't think my experience in any way contradicts what you're seeing.
Old 28 August 2022, 17:19   #31
Jobbo
Registered User
 
 
Join Date: Jun 2020
Location: Druidia
Posts: 386
One thing that was not clear to me at all was how those different blitter sequences with the idles would interact with bitplane dma.

It seemed plausible that the bitplane dma for planes 5 and 6 might fall into those idle slots and lock out the cpu.

But from what Roondar has said over on Discord it seems that regardless of the bitplane dma the same sequence will play out with the same number of idles available for the cpu, it will just be stretched out due to those additional bitplanes.

That wasn't clear to me at all but it's certainly simpler to understand if that is indeed the case.
Old 28 August 2022, 17:21   #32
ross
Defendit numerus
 
 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
Quote:
Originally Posted by Niklas View Post
I don't know if this has been mentioned in this thread (tl;dr) but the CPU cannot use two DMA slots back-to-back, even if a fast processor is used. So when comparing performance of copying words to and from chip memory using CPU vs using blitter, this is a clear advantage for the blitter.
Not explicitly in this very thread, but yes, Agnus does not allow *any* CPU to access chip RAM on two consecutive cycles.
This is why the CPU's actual bus usage, counted in internal cycles, is at least halved.
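
To put a number on "at least halved", here is a toy Python model of that restriction. It assumes the only rule is "no two consecutive slots for the CPU" and that the CPU wants every slot it can get. The function name `cpu_slots` is made up, and real 68000 timing is more involved than this.

```python
def cpu_slots(free):
    """Mark which free slots the CPU may take, never two in a row.

    free: list of bools, True = slot not used by any DMA channel.
    Returns a list of bools, True = slot granted to the CPU.
    """
    granted = []
    prev_cpu = False
    for is_free in free:
        use = is_free and not prev_cpu
        granted.append(use)
        prev_cpu = use
    return granted


all_free = [True] * 8
print(sum(cpu_slots(all_free)), "of", len(all_free), "slots used")
```

Even with the whole bus free, the CPU in this model ends up on every other slot, which is the advantage for the blitter that Niklas pointed out.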

Quote:
Originally Posted by Jobbo View Post
I double checked and yes I got that backwards, sorry.
Not a problem, so we know that errata is not 'errato', at least for this case
Old 28 August 2022, 17:32   #33
ross
Defendit numerus
 
 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
Quote:
Originally Posted by Jobbo View Post
One thing that was not clear to me at all was how those different blitter sequences with the idles would interact with bitplane dma.

It seemed plausible that the bitplane dma for planes 5 and 6 might fall into those idle slots and lock out the cpu.

But from what Roondar has said over on Discord it seems that regardless of the bitplane dma the same sequence will play out with the same number of idles available for the cpu, it will just be stretched out due to those additional bitplanes.

That wasn't clear to me at all but it's certainly simpler to understand if that is indeed the case.
This is simpler than you might think.

There is no interaction between the bitplanes DMA channels and those of the Blitter, except for the priority, which is always for the bitplanes.
Then the idle cycles are simply 'moved' forward in time until there is a free cycle (i.e. not used by a higher priority DMA channel, i.e. *all* other DMA channels in case of the Blitter).
As soon as this cycle is found it becomes an idle Blitter cycle and the CPU can use it
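
A toy Python model of the 'moved forward' behaviour described above. It assumes a blitter sequence is just a list of steps (with "-" marking an idle step) and that every step, idle or not, lands on the next slot not taken by higher-priority DMA. The sequence `A - D` and the alternating busy pattern are illustrative assumptions, not measured hardware traces.

```python
def schedule(blit_cycles, busy):
    """Assign blitter steps to bus slots.

    blit_cycles: list such as ["A", "-", "D"]; "-" is a blitter idle step.
    busy: list of bools, True = slot taken by higher-priority DMA.
    Returns one label per slot: "DMA", "BLT", "IDLE" (usable by the CPU) or "free".
    """
    owners = []
    pending = list(blit_cycles)
    for taken in busy:
        if taken:
            owners.append("DMA")       # the blitter step waits for a later slot
        elif pending:
            step = pending.pop(0)
            owners.append("IDLE" if step == "-" else "BLT")
        else:
            owners.append("free")      # blit finished, slot unused by this model
    return owners


blit = ["A", "-", "D"] * 2
quiet = schedule(blit, [False] * 6)            # no competing DMA
loaded = schedule(blit, [True, False] * 6)     # every other slot taken
print(quiet.count("IDLE"), loaded.count("IDLE"))
```

In both runs the blit produces the same number of idle slots; with competing DMA the sequence just takes twice as many slots to complete.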
Old 28 August 2022, 17:52   #34
a/b
Registered User
 
Join Date: Jun 2016
Location: europe
Posts: 1,038
Quote:
Originally Posted by Jobbo View Post
I am indeed running on a standard A500 with no-fast ram.

You're coming at this from a different perspective, one where you are depending on blithog to avoid waits. That's interesting for bob records but it's not really what I was trying to understand.

In my case I'm using waits and don't have any reason not to. I'm using blithog to maximize the blitter work, while also doing separate cpu work.

All of this works fine for me, what was confusing was my reading of previous posts declaring blithog an absolute bus lockout for the cpu.

It only locks out the cpu for certain kinds of blits, but that nuance is lost in a lot of discussions. I wanted to know for sure that I wasn't getting unexpected cpu progress, either by using blithog incorrectly or configuring winuae badly. I'm satisfied that everything is good now.

I don't think my experience in any way contradicts what you're seeing.
The problem is the fill special case, which I didn't know about. That explains the discrepancy between your observations and mine. I always evaluated fill as a D = A equivalent with an extra internal operation, e.g. my example code above is not using fill mode. Now I've added that extra, and indeed there are idle cycles even with blithog.

About hog or not to hog, it really depends on what you are doing. I always try to avoid it simply because the cpu, unlike the blitter, will not use all the free slots and it's typically better to have them run in parallel, so that the blitter grabs all the free slots throughout the entire frame for maximum bus saturation. Now if the cpu has very little to do, then blithog makes sense (e.g. fx I mentioned in my very first post of this thread). But again, it depends, so it's try and see (however, it helps when you understand wth is going on internally :P ).
Old 28 August 2022, 18:10   #35
chb
Registered User
 
Join Date: Dec 2014
Location: germany
Posts: 439
Quote:
Originally Posted by a/b View Post
About hog or not to hog, it really depends on what you are doing. I always try to avoid it simply because the cpu, unlike the blitter, will not use all the free slots and it's typically better to have them run in parallel, so that the blitter grabs all the free slots throughout the entire frame for maximum bus saturation.
There's AFAIR an argument for using blithog if a blit has idle cycles: idle cycles do not overlap with other DMA, but without blithog the cpu is granted access if it has been locked out of the bus by any DMA for three mem cycles. Because of this, a situation may occur where the blitter has to give a regular cycle to the cpu and a later idle cycle is wasted (because the CPU does not need that cycle anymore). So in that case blithog can actually increase bus saturation.
Old 28 August 2022, 18:49   #36
paraj
Registered User
 
paraj's Avatar
 
Join Date: Feb 2017
Location: Denmark
Posts: 1,098
While on the subject of blithog, won't it potentially mess up music? I don't mean audio DMA which will obviously work just fine, but the module playback part. Being a little bit off is probably fine (though the musician might think otherwise), but kicking off a large D=A blit with hog active could give a pretty jarring effect, no?

P.S. Too bad that blitter schematics didn't reveal any really good stuff. Could use an undocumented feature that sets B=A w/o DMA for something I'm looking into
Old 28 August 2022, 19:58   #37
Thomas Richter
Registered User
 
Join Date: Jan 2019
Location: Germany
Posts: 3,214
Audio DMA has priority over blitter DMA, so there is no problem delivering audio samples to Paula. However, there *may* be a problem if an audio interrupt that indicates that audio DMA has finished does not reach the CPU fast enough.
Old 28 August 2022, 20:00   #38
NorthWay
Registered User
 
Join Date: May 2013
Location: Grimstad / Norway
Posts: 839
To go on a slight tangent from the discussion here; does anyone know _why_ you can't use back-to-back cycles for the cpu? Did Jens of iComp meddle with this for his alleged 14M/s chipmem speed replacement mb or was that something else?
Old 28 August 2022, 20:52   #39
roondar
Registered User
 
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,406
Quote:
Originally Posted by NorthWay View Post
To go on a slight tangent from the discussion here; does anyone know _why_ you can't use back-to-back cycles for the cpu? Did Jens of iComp meddle with this for his alleged 14M/s chipmem speed replacement mb or was that something else?
Over at wiki.icomp.de, they say the following about the A1200 Reloaded:
Quote:
Originally Posted by wiki.icomp.de
Chipram is one of the special things where the Commodore A1200 Reloaded scores: Although it's "only" 2MByte Chipram like all other AGA machines before, the memory is much faster and can be accessed by the processor at full speed even when eight bitplanes are switched on. This is accomplished by making use of modern memory technology, which is a lot faster than the D-RAMs that the AGA chipset was designed for. The higher speed allows inserting extra timeslots between the original DMA slots of the Alice chip.
No idea how they do it on a hardware level while retaining compatibility with Alice, but they apparently just allow more cycles to the CPU.

As for your question, I'd also like to know why the CPU can't do back-to-back accesses, and in particular why Commodore didn't manage to upgrade this for the ECS or AGA chipsets. I'm no hardware guy, but this lack of change (while allowing '64 bit transfers' for AGA bitplanes/sprites) makes it seem to me like a very fundamental change to implement. Maybe someone can shed some light on that

Last edited by roondar; 28 August 2022 at 21:11.
Old 28 August 2022, 22:18   #40
Niklas
Registered User
 
Join Date: Apr 2018
Location: Stockholm / Sweden
Posts: 129
Quote:
Originally Posted by NorthWay View Post
To go on a slight tangent from the discussion here; does anyone know _why_ you can't use back-to-back cycles for the cpu?
That question is a bit tricky to answer. One obvious answer is "because that's the way they chose to design Agnus". To know definitively why they designed it that way, one would probably have to ask the people who were there and see if they can recount the reasons.

One probable explanation is the way the 68000 bus cycle works (as described in section 5 of https://www.nxp.com/docs/en/referenc.../MC68000UM.pdf): the 7 MHz 68000 CPU will always take one DMA slot to set up the next bus cycle after the previous access is complete. Given that this is the processor they were designing the chipset around, there was no point in designing Agnus to allow the CPU to do back-to-back DMA slot accesses.
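
As a back-of-the-envelope check of that argument, here is a short Python calculation. It assumes a PAL machine (CPU clock 7,093,790 Hz), one DMA slot per two CPU clocks, a minimum 68000 bus cycle of four clocks, and 16-bit accesses. The constant names are mine, and the result is an upper bound rather than a measured figure.

```python
PAL_CPU_HZ = 7_093_790          # assumed PAL 68000 clock
CLOCKS_PER_SLOT = 2             # one chip bus (DMA) slot = 2 CPU clocks
MIN_BUS_CYCLE_CLOCKS = 4        # shortest 68000 bus cycle
BYTES_PER_ACCESS = 2            # 16-bit chip bus

slots_per_second = PAL_CPU_HZ / CLOCKS_PER_SLOT
slots_per_access = MIN_BUS_CYCLE_CLOCKS / CLOCKS_PER_SLOT
max_cpu_accesses = slots_per_second / slots_per_access
max_cpu_bytes = max_cpu_accesses * BYTES_PER_ACCESS

print(f"slots per CPU access: {slots_per_access:.0f}")
print(f"upper bound for 68000 chip RAM traffic: {max_cpu_bytes / 1e6:.2f} MB/s")
```

Under these assumptions a stock 68000 can start a chip RAM access at most every second slot anyway, so a "no back-to-back slots" rule costs it nothing.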

One could argue that the timing of Agnus could have been made more general so as to allow a faster CPU to do back-to-back accesses, but to counter that I would say that it is usually a mistake to make a design more complicated in order to handle a future scenario that may never happen.
 

