English Amiga Board


Old 21 January 2023, 17:10   #61
thebajaguy
Registered User
 
Join Date: Mar 2017
Location: Rhode Island / United States
Posts: 203
Quote:
Originally Posted by StompinSteve View Post
An interesting observation: only during the MEMF_CHIP tests, the screen gets corrupted, bounces all over the place, tears, and goes funky, and the speed at which it does this is different for each test (4096 byte, 32768 byte etc.).
At the same time, the IndivisionAGAmk3 OnScreenDisplay pops in and out saying that the resolution changed, that Interlace mode is on and off and goes crazy too.

As soon as the MEMF_CHIP benchmark is over, the screen is normal again.
I've never seen this. Only on this A1200 and only now that I run it on SCSI. It must mean that the test-software overwrites an area in chipram that has screenbuffer content in it I guess?
I have seen this jitter in other 'benchmark' situations in times past, but it's been a while.

As pure speculation, I suspect there may be unexpected 'interaction' between the CPU/Accelerator card having the ChipRAM bus and Alice (or Agnus) memory access while video DMA is stealing access cycles from the CPU.

I've wondered about it, but only someone with the test gear to watch who holds the ChipRAM bus, who gives it up, and when, might be able to tell. I don't think it's software-related.
Old 22 January 2023, 00:29   #62
StompinSteve
Village idiot
 
 
Join Date: Jun 2021
Location: Switzerland
Posts: 267
Update on the Jitter issue:
Just now, I created my first Quarterback backup of this system since installing the Blizzard SCSI Kit. During the actual backup, the screen goes bananas, in the same way as during the MEMF_CHIP benchmark.

As this never happened before when I was using an IDE CF "Harddisk" and the 1240T in this machine, I'm 100% sure this is related to the SCSI Kit IV.

Last edited by StompinSteve; 22 January 2023 at 01:42.
Old 28 January 2023, 10:45   #63
StompinSteve
Village idiot
 
 
Join Date: Jun 2021
Location: Switzerland
Posts: 267
For anyone looking for performance tips on using the ZuluSCSI, I wrote an article on the ZuluSCSI GitHub Homepage: https://github.com/ZuluSCSI/ZuluSCSI...iscussions/133
I also wrote an article with fun stuff you can do with a ZuluSCSI: https://github.com/ZuluSCSI/ZuluSCSI...iscussions/134
Old 30 January 2023, 23:32   #64
StompinSteve
Village idiot
 
 
Join Date: Jun 2021
Location: Switzerland
Posts: 267
I've been talking to the actual developer of the TF536 and asked him if there is anything that can be done concerning the performance. This is his answer:
Quote:
DMA is not possible with the TF536. you could get parity by restricting the buffer memory type to 24bit dma.

Realistically, just don't use that GVP and a TF536. They're not designed to work together, and the slow compatibility mode you get isn't worth it.
Same goes for the TF534, he later added.

So part of his reply was this: "you could get parity by restricting the buffer memory type to 24bit dma."
What does he mean by that? Does he refer to the 8MB FastRAM on the GVP and configuring "BufMem Type = 5" ?
Old 31 January 2023, 00:46   #65
alexh
Thalion Webshrine
 
 
Join Date: Jan 2004
Location: Oxford
Posts: 14,354
Yes
Old 31 January 2023, 04:40   #66
thebajaguy
Registered User
 
Join Date: Mar 2017
Location: Rhode Island / United States
Posts: 203
Parity does not exist on the Motorola bus. Parity checking can exist on SCSI, but that has nothing to do with the topic of performance.

Memory on the GVP card, or at least in the Zorro II 8MB space, with the GVP driver and a standard full mask of 0xFFFFFFFE will be your best performance option. Never let FFS do the work with that hack called the DMA mask.

The A2000 is a 24-bit address system (16MB address space), and the only way to get data into memory above 16MB is to DMA into the 24-bit address space and then CPU-copy it up, which will limit any I/O to 1/2 the Zorro bus bandwidth. gvpscsi.device does that natively.
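
A back-of-the-envelope check of the "1/2 the bandwidth" figure, as a sketch only. It assumes the DMA pass and the CPU copy pass both cross the same bus one after the other, never overlap, and run at the same raw rate. The 3.5 MB/s value and the function name are illustrative assumptions, not measurements.

```python
def two_pass_throughput(dma_rate, copy_rate):
    """Effective rate when every byte is moved twice, one pass after the other.

    Time per byte is the sum of the two per-byte times, so the two
    rates combine like resistors in parallel.
    """
    return 1.0 / (1.0 / dma_rate + 1.0 / copy_rate)


# Assumed raw bus rate in MB/s (illustrative, not a measurement).
bus_rate = 3.5

effective = two_pass_throughput(bus_rate, bus_rate)
print(f"{effective:.2f} MB/s")  # half of bus_rate when both passes are equally fast
```

If the copy pass were faster than the DMA pass, the result would land somewhere between half the DMA rate and the full DMA rate.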

BufMemType in mountlist and RDB definitions is the memory used for directory buffers, and NOTHING ELSE!
Old 31 January 2023, 08:43   #67
alexh
Thalion Webshrine
 
 
Join Date: Jan 2004
Location: Oxford
Posts: 14,354
Bad choice of words. TF is not using "parity" here in the sense of the SCSI error-checking feature, but as a word meaning equivalence.
Old 31 January 2023, 16:26   #68
thebajaguy
Registered User
 
Join Date: Mar 2017
Location: Rhode Island / United States
Posts: 203
OK - noted. Thanks.

As far as the technical advice goes, the driver (gvpscsi.device) always knows to use 24-bit DMA memory in a 32-bit system, and does so as efficiently as possible. The omniscsi.device driver also does it, and that's why DMA_Mask values on filesystems must be set to the full 32-bit value for them.

C= scsi.device for the A590/A2091 and hardframe.device for the Microbotics HardFrame are only 24-bit aware, and the mask set for 24-bit is required to work around the limitation on transfers to addresses above 16MB. Note that the mask has no effect on software which talks directly to the driver and expects it to fulfil data transfers to/from memory which does not have the 24-bit DMA flag on it.
Old 31 January 2023, 16:31   #69
StompinSteve
Village idiot
 
 
Join Date: Jun 2021
Location: Switzerland
Posts: 267
Quote:
Originally Posted by thebajaguy View Post
Parity does not exist on the Motorola bus. Parity checking can exist on SCSI, but that has nothing to do with the topic of performance.
Hihihi, no no Robert, he meant parity as in "The US Dollar and the Euro reached parity", meaning they are now equal.

Edit: Alexh beat me to it
Old 31 January 2023, 16:51   #70
SpeedGeek
Moderator
 
 
Join Date: Dec 2010
Location: Wisconsin USA
Age: 60
Posts: 841
Quote:
Originally Posted by StompinSteve View Post
I've been talking to the actual developer of the TF536 and asked him if there is anything that can be done concerning the performance. This is his answer:

Same goes for the TF534, he later added.

So part of his reply was this: "you could get parity by restricting the buffer memory type to 24bit dma."
What does he mean by that? Does he refer to the 8MB FastRAM on the GVP and configuring "BufMem Type = 5" ?
Both gvpscsi.device (v4.x) and omniscsi.device (all versions) automatically perform buffered DMA for transfers. This means they allocate some MEMF_24BITDMA memory and then call CopyMem() to move the data to its final destination (e.g. the 32-bit extended RAM on accelerator cards).
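
As a rough illustration of that buffered-DMA pattern (not the drivers' actual code), here is a small Python model. The `read_into` callback is a hypothetical stand-in for the DMA transfer, the 32 KB bounce size is an arbitrary assumption, and a plain slice assignment plays the role of CopyMem().

```python
def buffered_read(read_into, dest, length, bounce_size=32 * 1024):
    """Fill dest[0:length] through a fixed-size bounce buffer.

    read_into(buf, offset, n) is a hypothetical stand-in for the DMA
    transfer: it must fill buf[0:n] with the data at device offset `offset`.
    """
    bounce = bytearray(bounce_size)        # stands in for the MEMF_24BITDMA buffer
    done = 0
    while done < length:
        n = min(bounce_size, length - done)
        read_into(bounce, done, n)         # pass 1: device -> bounce buffer
        dest[done:done + n] = bounce[:n]   # pass 2: CPU copy -> final destination
        done += n
    return done


# Usage with a fake "disk" held in memory.
disk = bytes(range(256)) * 1024            # 256 KiB of test data

def fake_dma(buf, offset, n):
    buf[:n] = disk[offset:offset + n]

target = bytearray(100_000)                # stands in for 32-bit accelerator RAM
copied = buffered_read(fake_dma, target, len(target))
print(copied, target == disk[:100_000])
```

Every byte crosses the bus twice in this model, which is where the cost of the approach comes from.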

On the hardware side, there is not much more to be done other than to make sure the accelerator card can access the Zorro2 bus Fast RAM at the max 3.5MB/sec. transfer rate.

On the software side, the Guru ROM provides omniscsi.device (which supports synchronous SCSI transfers). But no one has yet mentioned using a CopyMemQuick patch:

https://eab.abime.net/showthread.php?t=76777

Also, if you're using the 68030.library + mmu.library you should have read this thread first:

https://eab.abime.net/showthread.php?t=108534

Last edited by SpeedGeek; 31 January 2023 at 21:23.
Old 31 January 2023, 17:32   #71
Thomas Richter
Registered User
 
Join Date: Jan 2019
Location: Germany
Posts: 3,233
Quote:
Originally Posted by thebajaguy View Post
BufMemType in mountlist and RDB definitions is the memory used for directory buffers, and NOTHING ELSE!
Well, it is used for all buffers the file system allocates itself. This includes directory buffers, but it also includes data buffers that are allocated if the mask does not fit.

If the user buffer address AND (not MASK) is non-zero, then the file system allocates a buffer of type BUFMEMTYPE, performs the IO into this buffer, and then copies it over to the target. This implies "single block transfer". Thus - slow!
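
The mask test described above can be sketched in a few lines. This is an illustration only, not the file system's actual code: the function name, the 0x00FFFFFE value used for a 24-bit mask and the sample addresses are assumptions chosen for the example.

```python
MASK_24BIT = 0x00FFFFFE   # assumed 24-bit mask: even addresses below 16 MB
MASK_FULL = 0xFFFFFFFE    # full mask: any even 32-bit address


def needs_bounce_buffer(address, mask):
    """Apply the rule quoted above: address AND (NOT mask) is non-zero."""
    return (address & ~mask & 0xFFFFFFFF) != 0


# Sample addresses (illustrative): low memory, above 16 MB, and an odd address.
for addr in (0x00200000, 0x08000000, 0x00200001):
    print(f"{addr:#010x}  24-bit mask: {needs_bounce_buffer(addr, MASK_24BIT)}  "
          f"full mask: {needs_bounce_buffer(addr, MASK_FULL)}")
```

With the 24-bit mask, the buffer above 16 MB takes the slow single-block path; with the full mask only the odd address does, and the driver is left to deal with everything else.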

In short, MASK and BUFMEMTYPE rarely help you to improve performance - the device driver underneath should (and in the case of omniscsi.device does) really know better into which memory it can perform DMA, and handle user-provided buffers itself by potentially copying data, though hopefully in larger units than single blocks.
Old 31 January 2023, 21:26   #72
StompinSteve
Village idiot
 
 
Join Date: Jun 2021
Location: Switzerland
Posts: 267
Quote:
Originally Posted by SpeedGeek View Post
But no one has yet mentioned using a CopyMemQuick patch:
https://eab.abime.net/showthread.php?t=76777

if you're using the 68030.library + mmu.library you should have read this thread first:
https://eab.abime.net/showthread.php?t=108534
Then one needs to know that such a thread exists

Anywho, I read both threads, like 3 times, coz some of the content I just glazed over, if I'm honest. But what I deduced from both threads is that it's worth trying to throw MMUlibs overboard and just do "cpu fastrom" instead. The second take-away is that using CMQ&B might be a good idea.
I only ran the CPU, Memory and Disk tests with SysSpeed v2.6.

So, with my A500Plus + TF536 and GVP, MMUlibs tuned to the teeth, I benchmarked 3 times, power-cycling after each run and waiting 15 seconds before powering back on again to flush out sneaky resident stuff.
I got certain results. This would be my baseline.

I then threw it all out, took the stock SetPatch that comes on the C= Workbench 3.1 floppies and even deleted the 68030.library from Libs: because Commodore only shipped its 68040.library file with WB3.1.
So I'm now stock as can be.
No 68030.library in Libs:
All I added was "C:CPU FastROM DataBurst >NIL:" simply because the system feels like a lazy dog without it.

In all memory benchmarks, the results were a tad higher. Only the ChipRam tests were the same as before. But every other memory test got a bit of a boost.

I then added only the current version of CMQ&B (directly after SetPatch) and....
another small bump in performance in all memory tests. And again the ChipRam tests were the same as before with MMUlibs present.

But the biggest shocker is the not insignificant performance boost in the disk tests. The four Throughput test results were the same, but the Ops/Sec tests saw quite an increase, especially the Seek/Read test, which went from 1146 to 1556 without MMUlibs.
Adding CMQ&B showed the same results as without.

In SysInfo, the disk-performance (which is a RawRead) went from 808KB/s consistently with MMUlibs, to 816KB/s consistently with just plain C= WB31 SetPatch and "cpu fastrom databurst".
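
To put the two disk figures quoted above side by side, here is a quick relative-change calculation. The numbers are the ones reported in this post; the helper name and the labels are just for illustration.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


results = {
    "SysSpeed Seek/Read (ops/sec)": (1146, 1556),
    "SysInfo raw read (KB/s)": (808, 816),
}

for name, (before, after) in results.items():
    print(f"{name}: {before} -> {after} ({percent_change(before, after):+.1f}%)")
```

That works out to roughly 36% for the Seek/Read test against about 1% for the raw read, so the gain shows up in the number of operations per second rather than in raw throughput.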

So, I'm now running without MMUlibs, and performance is a tad better than ever before.
What I don't know is what I'm missing out on without MMUlibs. There must be a reason that the floppy that came with the TF536 has the latest MuLibs SetPatch on it.
I switched from the C= WB31 SetPatch to the TF536 / MuLibs SetPatch version, not because it improves performance (it's exactly the same) but because "it feels like the right thing to do".

Last edited by SpeedGeek; 01 February 2023 at 00:00. Reason: typo correction
Old 31 January 2023, 23:52   #73
SpeedGeek
Moderator
 
 
Join Date: Dec 2010
Location: Wisconsin USA
Age: 60
Posts: 841
What I would have done differently is not enable CPU databurst. The Zorro2 bus Fast memory does not support burst anyway, and even if the 32-bit accelerator card Fast memory does, it won't offer much of a performance boost, and you probably don't know whether the accelerator card's memory control logic can abort the wrap-around burst cycles either. However, CPU instruction burst should be enabled by SetPatch and will certainly give you a performance boost (with burst-capable memory controllers).

It probably was not necessary to use the OS3.1 version of SetPatch. If SetPatch can't find the 68030.library it most certainly can't load it. Anyway, I would suggest trying RSCP again with the 512K transfer size selected.

Last edited by SpeedGeek; 01 February 2023 at 00:01.
Old 01 February 2023, 00:30   #74
StompinSteve
Village idiot
 
 
Join Date: Jun 2021
Location: Switzerland
Posts: 267
Quote:
Originally Posted by SpeedGeek View Post
What I would have done differently is not enable CPU databurst.
Ok I removed "databurst" from the CPU command. The only parameter now is "FastROM".

The SetPatch version I'm using now is the latest I could find on Aminet, version 43.6b.

RSCP (020 version) results:
512K Sequential MEMF_PUB: 800 K/sec
512K Sequential MEMF_24BIT: 1128 K/sec
512K Sequential MEMF_CHIP: 1127 K/sec

Concerning not having a 68030.library: there are people with arguments pro and con. Some say it's not needed on an 030 and others say it is. What is your take on this?
And what 68030.library file would be recommended? The only one I can find is the one in the MuLibs Distro.
Old 01 February 2023, 02:35   #75
dalek
Registered User
 
Join Date: Nov 2014
Location: NSW/Australia
Posts: 462
For the TF536, using the latest MMULib is the recommended approach.

Also, you've probably already spent more time researching this than you would have ever saved in your lifetime if the performance was say 4MB/s instead of 800k/s
Old 01 February 2023, 07:07   #76
thebajaguy
Registered User
 
Join Date: Mar 2017
Location: Rhode Island / United States
Posts: 203
At the end of the day, I believe the CIIN issue should be protected against with the MMU - when possible. There are system configurations that are unlikely to run into it - yes - but you will never know a dirty cache read of hardware happened until data is corrupted, or the system suddenly crashes. The MMU could have prevented it. In the 68EC030's case, setting the TTx registers to prevent data caching over the <16MB address space does the same (with a likely performance impact).

If the OS structures end up in ChipRAM because the AddMem for the accelerator's 32-bit FastRAM is done later than AutoConfig time, I prefer remapping the low memory area with the MMU (MuFastZero functionality). The benefits to OS response and I/O performance are noticeable, and it's more compatible than using MoveSSP and VBR remap solutions. It's less important to remap this memory space with the MMU if some 32-bit FastRAM is present at OS setup time, and the OS makes use of it.

I prefer using an on-card Kickstart remap over the MMU - when it is available. This saves potential (minor) MMU lookup table hits. The MMU may be the only option in some cases, though.

I have come to prefer that ALL 16-bit RAM be set to nocache in a system where a 32-bit CPU accelerator has a data cache (and it loads 4x 32-bit longwords, whether via the burst hardware protocol or just 4x normal sequential longword accesses). A 4x longword load (or save) is a high penalty to pay when a high % of accesses are cache misses and/or only a single or double longword (at most) is being retrieved or pushed. If the RAM priority is last-use (typical), and it is used mainly as a DMA-to-CPU copy buffer, the cache has no value. It's effectively shared RAM for I/O data, and will never cache-hit.
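
A small worked example of that penalty, assuming (as above) that a data cache miss pulls in a full line of 4 longwords and that 16-bit RAM moves 2 bytes per bus access. The constant and function names are made up for the illustration, and wait states are ignored.

```python
LINE_BYTES = 16   # one data cache line: 4 longwords of 4 bytes
BUS_BYTES = 2     # 16-bit RAM moves 2 bytes per bus access


def bus_accesses(num_bytes, bus_bytes=BUS_BYTES):
    """Number of bus accesses needed to move num_bytes (ceiling division)."""
    return -(-num_bytes // bus_bytes)


line_fill = bus_accesses(LINE_BYTES)   # whole line fetched on a miss
single = bus_accesses(4)               # only the one longword actually needed
print(line_fill, single, line_fill / single)
```

When only one longword of the line is ever used, the fill costs four times the bus accesses of an uncached read in this model, which is the case for a buffer that data merely passes through.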

My problem with OS 3.1's CPU FastROM solution is that it does not solve CIIN with the MMU when it does FastROM remap. It's a no-frills version of SetCPU that does Kickstart remap with the 68030 CPU, and toggles the CPU cache and burst registers on and off. It won't address remapping system structures that end up in ChipRAM (the late AddMem of FastRAM issue mentioned above). It has no cache-setting contingencies for other shared memory expansion options. In a perfect system, it can be the fastest Kickstart remap solution, but the perfect system, with software/drivers that avoid the corner-case CIIN flaw, is hard to define.

SetCPU offers a little more versatility, solving CIIN and fixing the Bridgecard's shared RAM issue (with Janus 2.1, but not earlier versions) if caching and FastROM are enabled after the drivers load. It doesn't provide an MMU-based solution for the low-memory OS structures in ChipRAM use case mentioned above. It doesn't handle EC processor cases. RTG cards (w/shared RAM) may have issues with its cache-mapping choices on them - it was designed before they became popular. Its ideal use case is a 68030 accelerator with some AutoConfig FastRAM on it - like the C= A2630 or similar.

MuLibs then becomes the Swiss Army knife solution, but will be slightly less efficient in some configurations because of the additional functionality. It's the only safe solution for the 68EC030. It can be paired with a few other OEM or 3rd party tools/solutions to improve on some of its inefficiencies, but care and understanding are needed so as to not duplicate a function or cause a conflict.

All of this is focused on the 68030, and its use in A500/A2000 24-bit motherboard use cases. The A3000/A4000 with their native 32-bit FastRAM and 32-bit ROM, and 32-bit CPU slot, warrant a different approach than the slower 16-bit data bus on the classic 68K systems. The 68040/68060 MMU, varied memory and bus access designs, and the behavioral differences from the 68030, are also different use cases than for the above opinion/solutions. The way the CPU library addresses their behavior has to be different than for the 68030, and tailored to the different busses in each.

Last edited by thebajaguy; 01 February 2023 at 07:40.
Old 01 February 2023, 08:47   #77
patrik
Registered User
 
 
Join Date: Jan 2005
Location: Umeå
Age: 43
Posts: 924
Are there any other cases of the cache inhibit bug than 030 systems with a bridgeboard or CyberVision64/3D?
Old 01 February 2023, 09:30   #78
Thomas Richter
Registered User
 
Join Date: Jan 2019
Location: Germany
Posts: 3,233
Quote:
Originally Posted by patrik View Post
Are there any other cases of the cache inhibit bug than 030 systems with a bridgeboard or CyberVision64/3D?
How can one possibly answer? Nobody has a complete list of all hardware available on the Amiga, and whether or not such hardware has longword aligned 32-bit registers in it. I can only provide the list of hardware I know and I had access to, and bridgeboards and the cvision3d are two affected boards, but there may be others.
Old 01 February 2023, 09:43   #79
Thomas Richter
Registered User
 
Join Date: Jan 2019
Location: Germany
Posts: 3,233
Quote:
Originally Posted by StompinSteve View Post
Ok I removed "databurst" from the CPU command. The only parameter now is "FastROM".
Which is nonsense. Really. The old "CPU FastROM" command activates the MMU the very same way "MuFastROM" does, but in a less compatible way with less tools available around it. There is no difference otherwise.


There is a "GVPCPUCtrl" command which performs the remapping by means of on-board logic on GVP boards (not using the MMU); however, this remapping does not enable the 68030 CIIN workaround, nor is it able to write-protect the ROM mirror (so programs can stomp upon the copy of the ROM). If you have no affected hardware, and you do not need the MMU otherwise, it may be an alternative. Note that the GVP on-board logic does not provide an alternative to MuFastZero, so it can only do "half of the job".



Quote:
Originally Posted by StompinSteve View Post
Concerning not having a 68030.library: there are people with arguments pro and con. Some say it's not needed on an 030 and other say it is. What is your take on this?
The purpose of this library is to turn on the MMU, same as the "CPU FastROM" command, but in an all-integrated way with the tools around it. It also provides an interface to control the FPU on board (with the FPU command line) if there is one, but that is of lesser importance.


Depending on the board you have, "MuFastROM" is almost always a good idea, and "MuFastZero" is also almost always a good idea. Depending on the hardware, enabling the MMU may be necessary to get it working, but see my post above - nobody has a complete list of hardware that is affected by the 68030 CIIN erratum. You can already trigger it with the custom chipset if you go for it, but it has separate read and write registers, so the problem does typically not manifest itself there.


Thus, please throw out "CPU FastROM". That is deprecated, and provides only disadvantages over the more compatible "MuFastROM".


Nor do I recommend any of these CMQ "improvements". Some of them use instructions that do not work 100% robustly over the Zorro bus, as they initiate bursts even though the bus cannot take them, at least on the 68040 onwards, and some have subtle defects. It is a micro-optimization that will likely not make a major difference anyhow. Whether it's 800KB/s or 880KB/s is not noticeable. Mapping the ROM to fast RAM, and mapping the zero-page (with exec in it) to fast RAM, will provide much more of an advantage than this.
Old 01 February 2023, 10:44   #80
StompinSteve
Village idiot
 
 
Join Date: Jun 2021
Location: Switzerland
Posts: 267
Quote:
Originally Posted by dalek View Post
Also, you've probably already spent more time researching this than you would have ever saved in your lifetime if the performance was say 4MB/s instead of 800k/s
No no no my Amiga friend. I was merely wondering why my GVP A500 HDD is a bit faster with a stock 68000 than with a 50MHz oh-thirty TF536.
I never expected the thread to become 400 pages long. I'm learning a lot.

But I'm a person that likes to squeeze every ounce of performance out of an Amiga, knowing very well that my iPhone is 10000000000 times faster anyways.

Last edited by StompinSteve; 01 February 2023 at 10:54.