English Amiga Board


Old 02 September 2016, 22:35   #101
Mrs Beanbag
Glastonbridge Software
Join Date: Jan 2012
Location: Edinburgh/Scotland
Posts: 2,202
Quote:
Originally Posted by NorthWay View Post
The change to Exec is, but the struct it points to is indeed not documented, and definitely intended not to be.
good show, it means they can change it without notice for any purpose they like... and so can we...

but back to those pesky RMWs...

well which instructions fit that bill anyway? Any operation with <ea> as destination, i suppose. Such as "addq #1,<ea>", that could be useful for thread-safe stuff, right?

Well, give each core a priority (which might rotate on a per-cycle basis), the highest priority core that encounters a destination <ea> gets to set a busy bit that blocks all other memory reads (on this address?) until it clears it again at the end of the instruction. (How does the 68060 pipeline cope with such potential RAM-based data hazards, anyway? Just stall?)
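
A minimal sketch of the arbitration described above, as a toy software model rather than a hardware design. It assumes one busy bit per address and a priority that rotates by one core per cycle; the class and method names (`RmwArbiter`, `cycle`, `can_read`, `finish`) are made up for illustration.

```python
class RmwArbiter:
    """Toy model: one busy bit per address, rotating core priority."""

    def __init__(self, num_cores):
        self.num_cores = num_cores
        self.top = 0      # core that currently has the highest priority
        self.busy = {}    # address -> core holding the busy bit

    def _rank(self, core):
        # 0 is the highest priority; the ranking rotates with self.top.
        return (core - self.top) % self.num_cores

    def cycle(self, requests):
        """requests: {core: address} for cores starting an RMW this cycle.
        Returns the set of cores granted the busy bit; the rest must stall."""
        granted = set()
        for core in sorted(requests, key=self._rank):
            addr = requests[core]
            if addr not in self.busy:
                self.busy[addr] = core
                granted.add(core)
        self.top = (self.top + 1) % self.num_cores  # rotate once per cycle
        return granted

    def can_read(self, core, addr):
        # Reads of a busy address are blocked for every core but the owner.
        return self.busy.get(addr, core) == core

    def finish(self, core, addr):
        # End of the RMW instruction: the owner clears the busy bit.
        if self.busy.get(addr) == core:
            del self.busy[addr]
```

In this model only reads of the busy address are blocked, which is the "(on this address?)" variant; blocking all memory reads would be simpler in hardware but would stall unrelated code.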

As for TAS/CAS &c, found a thread about that very issue here on EAB, well i can see the problem when using it on chip RAM, because DMA doesn't respect it, which can lead to incorrect results. However on fast RAM, the situation is different. The Amiga only lives on one side of the expansion socket. Accelerators might have their own DMAs for various things, however, if we are designing an accelerator of our own, we have control over that.
Old 03 September 2016, 00:09   #102
Megol
Registered User

Join Date: May 2014
Location: inside the emulator
Posts: 370
Quote:
Originally Posted by meynaf View Post
Quite a lot of trouble. Consider two programs accessing the same list, especially with inline versions of Forbid/Disable.
Snooping for those cases is simple in hardware, and so is handling them. Making RMW instructions atomic isn't a problem either; there are a lot of other problems, though, and they are mostly OS related. I don't see any need to add special SMP instructions except for switching the snooping mechanism on and off.

Quote:
Originally Posted by Mrs Beanbag View Post
good show, it means they can change it without notice for any purpose they like... and so can we...

but back to those pesky RMWs...

well which instructions fit that bill anyway? Any operation with <ea> as destination, i suppose. Such as "addq #1,<ea>", that could be useful for thread-safe stuff, right?

Well, give each core a priority (which might rotate on a per-cycle basis), the highest priority core that encounters a destination <ea> gets to set a busy bit that blocks all other memory reads (on this address?) until it clears it again at the end of the instruction. (How does the 68060 pipeline cope with such potential RAM-based data hazards, anyway? Just stall?)

As for TAS/CAS &c, found a thread about that very issue here on EAB, well i can see the problem when using it on chip RAM, because DMA doesn't respect it, which can lead to incorrect results. However on fast RAM, the situation is different. The Amiga only lives on one side of the expansion socket. Accelerators might have their own DMAs for various things, however, if we are designing an accelerator of our own, we have control over that.
IMHO cache-line locking is the best solution to making RMW instructions atomic.

When reading from memory the data can be 1) in the local cache 2) in the cache of another processor 3) in main memory. If it is in the local cache the data will be locked (will not change coherency state until the instruction retires), if it is in a remote cache it will be fetched to the local cache and then locked. If it is in main memory it will be fetched and then locked when in the local cache.

In this way other processors can still continue to run programs unless they happen to access a cache-line that is locked.

Last edited by Megol; 03 September 2016 at 00:21.
Old 03 September 2016, 02:01   #103
NorthWay
Registered User
 
Join Date: May 2013
Location: Grimstad / Norway
Posts: 614
Quote:
Originally Posted by Megol View Post
IMHO cache-line locking is the best solution to making RMW instructions atomic.
That was what I was thinking of myself. Any idea how expensive it is in gate cost, execution slowdown from cache ping-ponging and implementation complexity?

And for once you really _need_ to have a per-cpu cache :-)
Old 05 September 2016, 22:25   #104
Mrs Beanbag
Glastonbridge Software
Join Date: Jan 2012
Location: Edinburgh/Scotland
Posts: 2,202
lately i've been musing about the possibility of doing massive hyperthreading/massive ILP instead of having multiple cores, but coming back to ISA stuff... or maybe this should go back in the "other thread"...

but something i've wondered before, if it could be possible to have a "fork" instruction, have some really simple hardware scheduler allowing one to create another thread directly from asm code.
Old 06 September 2016, 23:02   #105
Megol
Registered User

Join Date: May 2014
Location: inside the emulator
Posts: 370
Quote:
Originally Posted by NorthWay View Post
That was what I was thinking of myself. Any idea how expensive it is in gate cost, execution slowdown from cache ping-ponging and implementation complexity?
How expensive it is depends on how coherency is done, but it is essentially only a check, made when another processor wants access to a certain cacheline, of whether the line is local (this is always done anyway) and whether it is locked. If it is locked, the remote request is stalled until the instruction is finished. In most cases very little extra hardware is required.

There would be no more ping-pong effect than in other systems with cache coherency: a remote processor that wants to access a local cacheline always has to request it before continuing execution. Well, one could do a very complicated design where the remote processor can execute an instruction on the local processor, but for real workloads it would be slower.

The extra cost is mostly in the coherency logic itself; however, that is a cost one has to bear to make a user-friendly system.

(here local processor = the processor that currently owns the cacheline, remote processor = the processor that wants to access the same cacheline)

Quote:
And for once you really _need_ to have a per-cpu cache :-)
Not really as one could modify the protocol and have per core locking. But for most cases at least one cache (preferably two: instruction, data) is the best solution anyway.
Old 06 September 2016, 23:11   #106
Megol
Registered User

Join Date: May 2014
Location: inside the emulator
Posts: 370
Quote:
Originally Posted by Mrs Beanbag View Post
lately i've been musing about the possibility of doing massive hyperthreading/massive ILP instead of having multiple cores, but coming back to ISA stuff... or maybe this should go back in the "other thread"...

but something i've wondered before, if it could be possible to have a "fork" instruction, have some really simple hardware scheduler allowing one to create another thread directly from asm code.
The transputer had a hardware scheduler (well, microcode anyway) and very cheap creation of new threads. There have been other processors with cheap hardware supported multithreading but I can't remember anything except the transputer ATM, a search should show something.

Some other related design ideas for speculative threading etc. have the equivalent of fork instructions to spin off a speculative thread.
Old 06 September 2016, 23:59   #107
NorthWay
Registered User
 
Join Date: May 2013
Location: Grimstad / Norway
Posts: 614
I've seen some of the states for modern Power caches and I think they have more than 20 different possible ones. Shared read-only is one of them (IIRC). That can of course be converted to local-rw plus far-purge on first write. Or if the caches aren't exclusive as local-rw plus far-ro.

But I still don't know how the OS should behave as it considers itself alone.
Old 07 September 2016, 22:19   #108
Megol
Registered User

Join Date: May 2014
Location: inside the emulator
Posts: 370
Quote:
Originally Posted by NorthWay View Post
I've seen some of the states for modern Power caches and I think they have more than 20 different possible ones. Shared read-only is one of them (IIRC). That can of course be converted to local-rw plus far-purge on first write. Or if the caches aren't exclusive as local-rw plus far-ro.
Well, that's a question of optimization. Systems supporting massive numbers of processors want to reduce coherency overheads as much as practically possible; a smaller number of processors/cores can use less complicated designs.

A common coherency design uses four states per cacheline: MESI, or Modified/Exclusive/Shared/Invalid. In short:

Modified: The cacheline has the current data, which has been modified (not matching main memory).
Exclusive: The cacheline has the only cached copy of the data, and it matches main memory.
Shared: This cacheline has a copy of the data; other processors may also have it.
Invalid: Well, the line is invalid.

Making RMW instructions atomic can be done either by adding external logic not modifying the states themselves (just changing how state changes happen), by slightly redefining the Modified state (such as a RMW instruction changing the state to modified before actually modifying it and not allowing the state to change until the instruction is retired) or by adding a new state similar to modified that can only change to modified (Locked? Atomic? Something like that).
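
A rough sketch of the third option (an extra state that can only change to Modified), tracking a single cacheline across several cores. This is a toy model under stated assumptions: the state name "Locked" and the methods `read`, `begin_rmw` and `retire_rmw` are invented for illustration, and write-back of a Modified line is not modelled.

```python
M, E, S, I, L = "Modified", "Exclusive", "Shared", "Invalid", "Locked"

class Line:
    """Coherency state of one cacheline as seen by each core (toy model)."""

    def __init__(self, num_cores):
        self.state = [I] * num_cores

    def _locked_elsewhere(self, core):
        return any(s == L for c, s in enumerate(self.state) if c != core)

    def read(self, core):
        """Plain read. Returns False if the request has to stall."""
        if self._locked_elsewhere(core):
            return False
        if self.state[core] == I:
            others = [c for c, s in enumerate(self.state) if s != I]
            for c in others:
                self.state[c] = S   # current holders drop to Shared
            self.state[core] = S if others else E
        return True

    def begin_rmw(self, core):
        """Start of an RMW instruction: take the line and lock it."""
        if self._locked_elsewhere(core):
            return False
        for c in range(len(self.state)):
            if c != core:
                self.state[c] = I   # purge every remote copy
        self.state[core] = L
        return True

    def retire_rmw(self, core):
        # Locked can only ever change to Modified, when the RMW retires.
        if self.state[core] == L:
            self.state[core] = M
```

A remote core only stalls if it touches the locked line; requests to any other line are unaffected, which is the point of locking per cacheline rather than locking the bus.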

Quote:
But I still don't know how the OS should behave as it considers itself alone.
Yes. 100% compatibility will never be possible but I've not seen a good argument that an almost-SMP mode would be possible. Almost-SMP as there are some corner cases where true multi-processing would break down however for the most time the system should be similar to a SMP one.
Old 10 September 2016, 18:19   #109
Mrs Beanbag
Glastonbridge Software
Join Date: Jan 2012
Location: Edinburgh/Scotland
Posts: 2,202
Quote:
Originally Posted by Megol View Post
Yes. 100% compatibility will never be possible but I've not seen a good argument that an almost-SMP mode would be possible. Almost-SMP as there are some corner cases where true multi-processing would break down however for the most time the system should be similar to a SMP one.
right. (i presume you meant "would NOT be possible"?)

As for inter-process communication, a thing that didn't get mentioned when it should have been: as meynaf says, on modern OS different processes are quite isolated from each other in their own address space so can't just write to each other's data structures. Well that is true for different processes but it is not true for different threads, which one can spawn (with, for instance, std::thread in C++11). All the threads of one process exist in the same address space so you can do exactly what Amiga applications do, and write to shared memory. It is sometimes difficult to get right but there is no hardware problem. The programmer doesn't know, or need to know, whether the other threads are running on a different core or not. 68k actually should make this sort of thing easier if atomicity of <ea> destination instructions can be guaranteed.
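
To illustrate why that atomicity guarantee matters, here is a small deterministic model of two cores each executing one "addq #1,counter". It is only a sketch: the function `run`, the bus-slot schedule and the single bus lock are assumptions made for the example, not a description of any real hardware.

```python
def run(schedule, atomic):
    """Replay a schedule of core ids, one bus slot per entry.
    Each core performs one increment, split into a read slot and a
    write slot, so every core must appear exactly twice in the schedule."""
    memory = {"counter": 0}
    latch = {}       # core -> value read but not yet written back
    owner = None     # core holding the bus lock (atomic mode only)
    pending = list(schedule)
    while pending:
        core = pending.pop(0)
        if atomic and owner not in (None, core):
            pending.append(core)      # stalled: retry in a later slot
            continue
        if core not in latch:         # first slot: read
            latch[core] = memory["counter"]
            owner = core if atomic else None
        else:                         # second slot: modify and write
            memory["counter"] = latch.pop(core) + 1
            owner = None
    return memory["counter"]
```

With the interleaved schedule `[0, 1, 0, 1]`, `run(..., atomic=False)` loses one of the two increments, while `run(..., atomic=True)` stalls core 1 until core 0 has written back and both increments survive.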

The only difficulty i can see is with interrupts. When an interrupt happens, the program it belongs to might reasonably assume the interrupt routine cannot be interrupted by anything else, so it could cause trouble if that program can carry on executing even when the interrupt is happening. One would need either to know which process set up the interrupt and pause that one, or to cautiously suspend all processes during servicing of any interrupt.

For instance if i try to stop a Protracker module while the playroutine interrupt is happening on another hardware thread, it might cause trouble. This is not a situation anyone currently needs to worry about.

Last edited by Mrs Beanbag; 10 September 2016 at 18:26.
Old 10 September 2016, 20:47   #110
Megol
Registered User

Join Date: May 2014
Location: inside the emulator
Posts: 370
Quote:
Originally Posted by Mrs Beanbag View Post
right. (i presume you meant "would NOT be possible"?)
Yes.

Quote:
As for inter-process communication, a thing that didn't get mentioned when it should have been: as meynaf says, on modern OS different processes are quite isolated from each other in their own address space so can't just write to each other's data structures. Well that is true for different processes but it is not true for different threads, which one can spawn (with, for instance, std::thread in C++11). All the threads of one process exist in the same address space so you can do exactly what Amiga applications do, and write to shared memory. It is sometimes difficult to get right but there is no hardware problem. The programmer doesn't know, or need to know, whether the other threads are running on a different core or not. 68k actually should make this sort of thing easier if atomicity of <ea> destination instructions can be guaranteed.

The only difficulty i can see is with interrupts. When an interrupt happens, the program it belongs to might reasonably assume the interrupt routine cannot be interrupted by anything else, so it could cause trouble if that program can carry on executing even when the interrupt is happening. One would need either to know which process set up the interrupt and pause that one, or to cautiously suspend all processes during servicing of any interrupt.

For instance if i try to stop a Protracker module while the playroutine interrupt is happening on another hardware thread, it might cause trouble. This is not a situation anyone currently needs to worry about.
Interrupts are one problem but an easily solved one: just have one processor handle all interrupts/exceptions while all the others are halted.

That RMW instructions are virtually atomic on a one-processor system is solved by making them actually atomic on multiprocessor systems.

The disable/forbid etc. routines and their in-line macros can be handled by snooping changes of the two relevant addresses and stalling other processors. A more optimized way to handle it would be to support "virtual stalling": as long as other processors don't disturb the one calling forbid etc., they can be allowed to continue execution.
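
A sketch of the simple (non-virtual) stalling variant. It assumes the "two relevant addresses" are the byte-sized Forbid and Disable nest counters in ExecBase, that -1 means "not held", and that they sit at the offsets used below; those offsets and the class `ForbidSnooper` are assumptions for illustration and should be checked against the Exec includes.

```python
TD_NEST_CNT = 0x127   # assumed ExecBase offset of the Forbid() nest count
ID_NEST_CNT = 0x126   # assumed ExecBase offset of the Disable() nest count

class ForbidSnooper:
    """Watches byte writes to the two nest counters and decides who stalls."""

    def __init__(self, num_cores):
        self.num_cores = num_cores
        # -1 is taken to mean "multitasking / interrupts not held off".
        self.counts = {TD_NEST_CNT: -1, ID_NEST_CNT: -1}
        self.holder = None   # core that is inside Forbid()/Disable()

    def snoop_write(self, core, offset, value):
        if offset not in self.counts:
            return                       # not a watched address
        self.counts[offset] = value
        if max(self.counts.values()) >= 0:
            if self.holder is None:
                self.holder = core       # first core to take the "lock"
        else:
            self.holder = None           # both counters back at -1

    def stalled(self):
        """Cores that must not execute right now."""
        if self.holder is None:
            return set()
        return set(range(self.num_cores)) - {self.holder}
```

"Virtual stalling" would replace `stalled()` with a check made only when another core touches something the holder depends on, which is much harder to define for existing software.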

Programs executing under Amiga OS can assume that they will not be preempted by lower priority ones and can use this as a kind of synchronizing mechanism. The simple way to handle this is making sure all processors are executing programs of the same priority, it isn't optimal though.
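
The "same priority on all processors" rule can be sketched as a scheduling filter. This is a minimal illustration, assuming the ready list is a plain list of (name, priority) pairs; `pick_tasks` and the task names in the usage note are hypothetical.

```python
def pick_tasks(ready, num_cores):
    """ready: list of (name, priority) pairs.
    Returns the names to run in this quantum. Only tasks that share the
    highest ready priority are eligible, so no running task ever sees a
    lower-priority task execute alongside it."""
    if not ready:
        return []
    top = max(pri for _, pri in ready)
    eligible = [name for name, pri in ready if pri == top]
    return eligible[:num_cores]
```

For example, with `[("input.device", 20), ("ray", 0), ("packer", 0)]` and two cores, only `input.device` runs and the second core sits idle, which shows why this is safe but not optimal.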

There are certainly some other problems that have to be handled however I think multiprocessing is possible and a way to provide Amiga systems with a nice speed boost. Could be wrong of course, I often am.
Old 10 September 2016, 21:48   #111
Mrs Beanbag
Glastonbridge Software
Join Date: Jan 2012
Location: Edinburgh/Scotland
Posts: 2,202
Quote:
Originally Posted by Megol View Post
The disable/forbid etc. routines and their in-line macros can be handled by snooping changes of the two relevant addresses and stalling other processors. A more optimized way to handle it would be to support "virtual stalling": as long as other processors don't disturb the one calling forbid etc., they can be allowed to continue execution.
Forbid() is often used to prevent other processes from doing stuff temporarily. For instance, to shrink an allocated area of memory, one calls Forbid(), frees the memory and then re-allocates it with AllocAbs(). That is one example. If some other process could reserve the same memory in the meantime, that could get nasty. Mind you, certain memory tracker programs already break this, since they write tags all over blocks of RAM at the point of freeing.

Quote:
Programs executing under Amiga OS can assume that they will not be preempted by lower priority ones and can use this as a kind of synchronizing mechanism. The simple way to handle this is making sure all processors are executing programs of the same priority, it isn't optimal though.
that is true, setting a process to a high priority is like a way of doing "thread.join()" i suppose.. the OS could make it "safe" at the expense of stalling all other running processes even though they don't have anything to do with the high priority task, which is not optimal.
Old 11 September 2016, 11:09   #112
Samurai_Crow
Total Chaos forever!

Join Date: Aug 2007
Location: Ft. Collins, CO USA
Age: 45
Posts: 1,310
The AROS source has an experimental project called "Silly SMP" that didn't help much on Intel because of the scheduler. The way they were doing it required high priority tasks to be non-blocking like the third-party Executive utility on AmigaOS.