English Amiga Board


Old 24 October 2011, 17:43   #1
Toni Wilen
WinUAE developer
 
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,517
How not to flush caches.

AROS m68k CopyBack random hang bug aka How Not To Flush Caches.

The m68k CacheClearU() executed the following code if the CPU is a 68040 or 68060 (I left out the Supervisor() call and other minor details; they aren't important here):

CPUSHA BC
CINVA BC

Question: Why is this a horribly bad idea?

(No, I didn't write this code and I didn't see the problem until a few days ago..)
Old 25 October 2011, 04:26   #2
matthey
Banned
 
Join Date: Jan 2010
Location: Kansas
Posts: 1,284
How did this only hang without the Supervisor call? No crash? I would expect this to work in Supervisor although it doesn't make much sense. The CPUSHA BC is all that is needed. CacheClearU() is a pretty simple function and there aren't many ways to write it the "correct" way. Did you check CacheClearE() also?

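CacheClearU:
move.l a5,a0 ; save a5 in a0
lea (.super_push,pc),a5 ; a5 = routine to run in supervisor mode
jsr (Supervisor,a6) ; exec Supervisor(), a6 = ExecBase
move.l a0,a5 ; restore a5
rts

.super_push:
cpusha bc ; push dirty lines to RAM (CPUSH normally also invalidates)
rte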
Old 25 October 2011, 07:56   #3
Toni Wilen
WinUAE developer
 
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,517
You misunderstood (and yes, a single cpusha bc fixes it). I now know what the problem is (was), I'm just wondering if anyone else can see the problem more quickly than I did.

Bonus question: why did it work fine (usually) without copyback cache?
Old 25 October 2011, 14:09   #4
TheDarkCoder
Registered User
 
Join Date: Dec 2007
Location: Dark Kingdom
Posts: 213
Just a quick guess: since the 040 is pipelined (and the 060 superscalar), CINVA starts executing before the push of the data cache is over. So some data cache entries are invalidated before being pushed (and so are not pushed back to RAM).
Without copyback there are no problems because the RAM is already consistent with the caches (no need to push back, indeed).
Old 25 October 2011, 14:25   #5
matthey
Banned
 
Join Date: Jan 2010
Location: Kansas
Posts: 1,284
Quote:
Originally Posted by TheDarkCoder
Just a quick guess: since the 040 is pipelined (and the 060 superscalar), CINVA starts executing before the push of the data cache is over. So some data cache entries are invalidated before being pushed (and so are not pushed back to RAM).
That's a good guess, but both the 040 and 060 do a pipeline synchronization before executing a CPUSHA or CINVA, similar to having a NOP before it. However, there could be bugs in some processors (check the CPU errata). A writethrough cache shouldn't have to push anything, as data is written to memory at the time of the write. The cache and memory are already consistent.

Last edited by matthey; 25 October 2011 at 14:43.
Old 25 October 2011, 15:37   #6
TheDarkCoder
Registered User
 
Join Date: Dec 2007
Location: Dark Kingdom
Posts: 213
Yeah, mine was too easy an explanation; Toni would have thought of it in 1 nanosecond.
Also, not doing a pipeline synchronization before executing this kind of instruction would have been a Stupid x86 Thing, unworthy of the wonderful 68k! :-D

So, unless there is some CPU bug, I don't know.
Old 25 October 2011, 15:45   #7
Toni Wilen
WinUAE developer
 
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,517
Something can happen between those two instructions..
Old 25 October 2011, 15:56   #8
Leffmann
 
Join Date: Jul 2008
Location: Sweden
Posts: 2,269
An interrupt could occur between the two instructions or even during the CPUSHA, but I can't see the full picture here. Tell us
Old 25 October 2011, 16:16   #9
TheDarkCoder
Registered User
 
Join Date: Dec 2007
Location: Dark Kingdom
Posts: 213
Well, the interrupt handler may write some information into the data cache which is needed outside the handler. CINVA then invalidates the cache entries without writing the information written by the handler to RAM. So the information is lost if the cache is in copyback mode.
Toni said that sometimes there are also problems in write-through mode, though; I don't know why (maybe in some circumstances the write-through does not happen?).

By the way, the CINVA instruction is inherently dangerous for this reason; it should be used with very special care.
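
The lost-write scenario described here can be reproduced with a toy model. This is only a sketch in Python, not 68k code: the `Cache` class, the `flush` helper and the address `0x100` are made up for illustration, and the "interrupt" is simply a callback that runs between the push and the invalidate.

```python
class Cache:
    """Toy data cache: maps address -> (value, dirty)."""

    def __init__(self, copyback):
        self.copyback = copyback
        self.lines = {}

    def write(self, ram, addr, value):
        if self.copyback:
            # Copyback: the value lives only in the cache until it is pushed.
            self.lines[addr] = (value, True)
        else:
            # Writethrough: RAM is updated on every write.
            self.lines[addr] = (value, False)
            ram[addr] = value

    def read(self, ram, addr):
        if addr in self.lines:
            return self.lines[addr][0]
        return ram.get(addr, 0)

    def cpusha(self, ram):
        # Push dirty lines to RAM, then invalidate everything.
        for addr, (value, dirty) in self.lines.items():
            if dirty:
                ram[addr] = value
        self.lines.clear()

    def cinva(self):
        # Invalidate everything WITHOUT pushing dirty lines.
        self.lines.clear()


def flush(cache, ram, with_cinva, interrupt):
    cache.cpusha(ram)
    interrupt()              # an interrupt arrives right after the push
    if with_cinva:
        cache.cinva()        # the extra instruction from the buggy version


def run(copyback, with_cinva):
    ram = {}
    cache = Cache(copyback)

    def handler():
        # The handler stores something the rest of the system needs later.
        cache.write(ram, 0x100, 42)

    flush(cache, ram, with_cinva, handler)
    return cache.read(ram, 0x100)


if __name__ == "__main__":
    print("copyback, CPUSHA + CINVA:", run(True, True))
    print("copyback, CPUSHA only:", run(True, False))
    print("writethrough, CPUSHA + CINVA:", run(False, True))
```

In this model only the copyback run with the extra CINVA loses the handler's write. The writethrough run survives because RAM was updated at the time of the write, so the model itself gives no reason for a failure in that mode.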
Old 25 October 2011, 18:57   #10
Toni Wilen
WinUAE developer
 
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,517
An interrupt between CPUSHA and CINVA was the problem. CPUSHA can take hundreds of cycles if there is lots of data to push (CopyBack enabled).

I didn't think this code was wrong because it has been like this for ages. (I guess it was actually never used until m68k AROS was resurrected about a year ago..)

It also caused very confusing side-effects: 1230scsi.device (1260 + Blizzard SCSI Kit) worked fine for 1-10 seconds before it hung. Interrupts and other tasks still kept working fine, which pointed to a task scheduling problem..

It gets even stranger: 1230scsi.device stopped detecting any drives if the system was reset after the hang.

How are you supposed to debug something like this?

Removed CINVA and the problem disappeared completely.
Old 26 October 2011, 05:15   #11
matthey
Banned
 
Join Date: Jan 2010
Location: Kansas
Posts: 1,284
Quote:
Originally Posted by Toni Wilen
Interrupt between CPUSHA and CINVA was the problem. CPUSHA can take hundreds of cycles if there is lots of data to push. (CopyBack enabled)
68060 User Manual...

CPUSHA <=5394(0/512) cycles
CINVA <=17(0/0) cycles

The CPUSHA instruction time assumes fast 2-1-1-1 memory too. Slower memory adds cycles very fast. That's with an 8 KB instruction cache and an 8 KB data cache. The Natami will have several times that amount of cache and will be even slower at flushing the whole cache despite having very fast memory. That's why it's important to have a working CacheClearE() and to encourage its use, so that only the part of the cache that needs flushing is flushed.
Old 26 October 2011, 10:28   #12
TheDarkCoder
Registered User
 
Join Date: Dec 2007
Location: Dark Kingdom
Posts: 213
Agreed. Anyway, I would expect CPUSHA to push only the dirty cache lines to memory, i.e. 512 is just the upper bound on the number of memory accesses.
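
The 512 figure is consistent with the cache geometry. A small sketch of the arithmetic, assuming the 8 KB data cache mentioned above and a 16-byte cache line (the line size is not stated in this thread, it is taken from the 68060 documentation); the function name is made up:

```python
DCACHE_BYTES = 8 * 1024   # 8 KB data cache, as quoted above
LINE_BYTES = 16           # assumption: 16-byte cache lines


def line_pushes(dirty_lines, cache_bytes=DCACHE_BYTES, line_bytes=LINE_BYTES):
    """Number of line writes a full push needs: one per dirty line,
    capped by the total number of lines in the cache."""
    total_lines = cache_bytes // line_bytes
    return min(dirty_lines, total_lines)


if __name__ == "__main__":
    print("worst case:", line_pushes(dirty_lines=10_000))
    print("clean cache:", line_pushes(dirty_lines=0))
```

With every line dirty this gives 512 pushes, matching the write count in the manual's (0/512) notation; a mostly clean cache needs far fewer.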
Old 26 October 2011, 21:04   #13
Toni Wilen
WinUAE developer
 
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,517
Quote:
Originally Posted by matthey
That's why it's important to have a working CacheClearE() and to encourage its use, so that only the part of the cache that needs flushing is flushed.
Thanks for the reminder.

CacheClearE() was also bad (bad as in it caused unnecessary performance loss): it only called CacheClearU().

Fixed today. It now uses CPUSHP if the flushed region is small enough (a megabyte or so, which prevents buggy programs from flushing the whole memory space slowly, page by page...) and it also flushes only the requested cache type(s).
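
The decision described here can be sketched roughly as follows. This is a Python model of the logic only, not the AROS source: the 4 KiB page size, the exact 1 MiB limit, and the names `plan_flush`, `ICACHE` and `DCACHE` are assumptions made for illustration.

```python
PAGE_SIZE = 4096            # assumption: 4 KiB MMU pages
PAGE_LIMIT = 1024 * 1024    # assumption: "a megabyte or so"

ICACHE = "ic"
DCACHE = "dc"


def plan_flush(address, length, caches):
    """Return the list of (mnemonic, cache, page_address) steps to run.

    caches is a set containing ICACHE, DCACHE or both.
    page_address is None for a whole-cache push.
    """
    if not caches or length == 0:
        return []
    # Only touch the requested cache type(s).
    scope = "bc" if caches == {ICACHE, DCACHE} else next(iter(caches))
    if length > PAGE_LIMIT:
        # Too large: one full push is cheaper than walking page by page.
        return [("cpusha", scope, None)]
    first = address & ~(PAGE_SIZE - 1)
    last = (address + length - 1) & ~(PAGE_SIZE - 1)
    return [("cpushp", scope, page)
            for page in range(first, last + PAGE_SIZE, PAGE_SIZE)]


if __name__ == "__main__":
    for step in plan_flush(0x1FF0, 0x20, {ICACHE, DCACHE}):
        print(step)
```

A small range that straddles a page boundary produces two page pushes, while anything above the limit falls back to a single whole-cache push.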
Old 27 October 2011, 08:21   #14
matthey
Banned
 
Join Date: Jan 2010
Location: Kansas
Posts: 1,284
@TheDarkCoder
Yes, pushing dirty (written) cache lines only sounds correct. Pushing all the cache lines would be rare but so is 2-1-1-1 memory on the Amiga. I would not be surprised if cpusha bc takes over a thousand cycles on average on Amiga 68060 accelerators.

@Toni Wilen
Thanks. That's the kind of support that projects like the Natami need in order to make a future possible.
Old 27 October 2011, 08:31   #15
TheDarkCoder
Registered User
 
Join Date: Dec 2007
Location: Dark Kingdom
Posts: 213
Quote:
Originally Posted by matthey
@TheDarkCoder
Yes, pushing dirty (written) cache lines only sounds correct. Pushing all the cache lines would be rare but so is 2-1-1-1 memory on the Amiga. I would not be surprised if cpusha bc takes over a thousand cycles on average on Amiga 68060 accelerators.
Could you please explain what 2-1-1-1 memory means?

Is it related to football?
Old 27 October 2011, 11:31   #16
Geijer
Oldtimer
 
Join Date: Nov 2010
Location: VXO / Sweden
Posts: 153
2-1-1-1 is a way to show the access speed of memory in burst mode.

The numbers are the clock cycles needed to fetch each data item. In the above example the whole burst of 4 "words" takes 5 cycles: the first item takes 2 cycles and each following item takes 1 more.

2-2-2-2 means that there is no difference in burst mode compared to non burst mode.

Old SDRAM is at best 5-1-1-1 iirc.
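
As a quick check of the arithmetic, the total for a burst is just the sum of the numbers in the pattern. A minimal sketch (the function name is made up):

```python
def burst_cycles(pattern):
    """Total clock cycles for one burst, e.g. "2-1-1-1" -> 2 + 1 + 1 + 1."""
    return sum(int(n) for n in pattern.split("-"))


if __name__ == "__main__":
    for pattern in ("2-1-1-1", "2-2-2-2", "5-1-1-1"):
        print(pattern, "=", burst_cycles(pattern), "cycles")
```

So 5-1-1-1 memory needs the same 8 cycles for a four-item burst as 2-2-2-2 memory; the difference is that nearly all of its cost sits in the first access.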
Old 27 October 2011, 11:50   #17
TheDarkCoder
Registered User
 
Join Date: Dec 2007
Location: Dark Kingdom
Posts: 213
Quote:
Originally Posted by Geijer
2-1-1-1 is a way to show the access speed of memory in burst mode.

The numbers are the clock cycles needed to fetch each data item. In the above example the whole burst of 4 "words" takes 5 cycles: the first item takes 2 cycles and each following item takes 1 more.

2-2-2-2 means that there is no difference in burst mode compared to non burst mode.

Old SDRAM is at best 5-1-1-1 iirc.
ok, thanks. Very interesting!

@matthey (or others):
So, are you saying that Amiga accelerators do not have 2-1-1-1 access speed in fast RAM?
What are the usual access speeds? Are the speeds of widespread accelerators (Cyberstorm, various Blizzards, Apollo, etc.) known?
Old 27 October 2011, 18:13   #18
matthey
Banned
 
Join Date: Jan 2010
Location: Kansas
Posts: 1,284
Quote:
Originally Posted by TheDarkCoder
So, are you saying that Amiga accelerators do not have 2-1-1-1 access speed in fast RAM?
What are the usual access speeds? Are the speeds of widespread accelerators (Cyberstorm, various Blizzards, Apollo, etc.) known?
The accelerator manufacturers generally did not publish the access times. Some had jumpers or settings to add wait states. I have heard numbers float around but I don't know how reliable they are. Here are possible maximums for some accelerators...

3640 7-7-7-7 (no burst)
WarpEngine 040 4-2-2-2
Cyberstorm MKIII/PPC 5-1-1-1
Atari CT60 5-1-1-1 reads 3-1-1-1 writes

The Natami has an SRAM cache that will likely do 3-1-1-1 or 2-1-1-1. Its regular DDR2 memory will likely be similar to the CT60 or a little better.
Old 28 October 2011, 10:05   #19
TheDarkCoder
Registered User
 
Join Date: Dec 2007
Location: Dark Kingdom
Posts: 213
thanks! :-)

Probably the numbers vary with the type of memory installed.