English Amiga Board


24 December 2021, 15:27   #1
phx
Natteravn
 
Join Date: Nov 2009
Location: Herford / Germany
Posts: 2,500
68060 Rev.1 Superscalar bug?

For the last few days I have been debugging a wrong FPU result, which became correct as soon as I single-stepped through it with BDebug (real A3000, CSPPC-060).

Today I had the idea that it might be a superscalar issue and started inserting NOPs between instructions in the relevant part. The bug disappeared! Finally I isolated the problem to the following instructions (part of a dot-product operation, compiler-generated code):
Code:
        fmove.s (4,a1),fp3
        fmove.s (4,a0),fp0
        fmul.x  fp3,fp0
        fadd.x  fp0,fp4
The problem disappears when I insert a NOP between the two FMOVE.S instructions, which makes sure that the previous instruction has finished, or that the pipeline is drained(?). Similar to ISYNC on PPC, I guess.
The problem also disappears when I clear bit 0 (ESS) of the PCR register, which disables superscalar.
Is this a known bug in revision 1 of the 68060? Is it fixed in later revisions? My PCR is: $04300121.
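For readers following along, here is a minimal sketch (Python, meant to run on a host machine, not on the Amiga) that splits a PCR value such as $04300121 into its fields. The positions of the identification, revision, EDEBUG, DFP and ESS fields follow the MC68060 User's Manual; bit 5 is not described there and is included only because the load/store buffer bypass workaround discussed later in this thread uses it. The helper names are made up for illustration.

```python
# Field layout of the MC68060 Processor Configuration Register (PCR), as given
# in the MC68060 User's Manual. Bit 5 is NOT documented there; it is listed
# here only because the load/store buffer bypass errata workaround uses it.
PCR_ESS = 1 << 0       # enable superscalar dispatch
PCR_DFP = 1 << 1       # disable the floating-point unit
PCR_NOBYPASS = 1 << 5  # disable load/store buffer bypass (errata workaround)
PCR_EDEBUG = 1 << 7    # enable debug features


def decode_pcr(pcr: int) -> dict:
    """Split a 32-bit PCR value into its fields (hypothetical helper)."""
    return {
        "id": (pcr >> 16) & 0xFFFF,      # $0430 = MC68060
        "revision": (pcr >> 8) & 0xFF,
        "edebug": bool(pcr & PCR_EDEBUG),
        "nobypass": bool(pcr & PCR_NOBYPASS),
        "dfp": bool(pcr & PCR_DFP),
        "ess": bool(pcr & PCR_ESS),
    }


def disable_superscalar(pcr: int) -> int:
    """Return the PCR value with ESS cleared (single-issue dispatch)."""
    return pcr & ~PCR_ESS & 0xFFFFFFFF


if __name__ == "__main__":
    pcr = 0x04300121
    print(decode_pcr(pcr))
    print(f"${disable_superscalar(pcr):08X}")
```

With the value above this reports revision 1 with both ESS and bit 5 set, and clearing ESS gives $04300120, which is what switching off superscalar dispatch amounts to. On the 68060 itself the register is accessed with MOVEC in supervisor mode.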
24 December 2021, 16:49   #2
Thomas Richter
Registered User
 
Join Date: Jan 2019
Location: Germany
Posts: 3,233
Quote:
Originally Posted by phx
Today I had the idea that it might be a superscalar issue and started inserting NOPs between instructions in the relevant part. The bug disappeared! Finally I isolated the problem to the following instructions (part of a dot-product operation, compiler-generated code):
Code:
        fmove.s (4,a1),fp3
        fmove.s (4,a0),fp0
        fmul.x  fp3,fp0
        fadd.x  fp0,fp4
The problem disappears when I insert a NOP between the two FMOVE.S, which makes sure that the last instruction is finished, or the pipeline becomes empty(?).
Could you please include a few of the instructions preceding this sequence? It looks like a variant of the F6 issue, which is fixed in later revisions, but for that, the address registers would have to be loaded immediately before the sequence.



If you want to, I can test this on my rev.6 68060, but it would require a more complete test sequence.


The NOP stalls the pipeline long enough to avoid the result forwarding for the second fmove.


Actually, this also indicates that the 68060.library you are using is not careful enough to disable superscalar execution on rev.1 CPUs, which it really should do.
24 December 2021, 18:34   #3
SpeedGeek
Moderator
 
Join Date: Dec 2010
Location: Wisconsin USA
Age: 60
Posts: 841
Hmm...

Did you try using FNOP instead of NOP? But either way, I assume you want to keep Superscalar mode enabled.

Last edited by SpeedGeek; 24 December 2021 at 20:52.
24 December 2021, 22:30   #4
phx
Natteravn
 
Join Date: Nov 2009
Location: Herford / Germany
Posts: 2,500
Quote:
Originally Posted by Thomas Richter
Could you please include some instructions above the sequence? This looks like a variant of the F6 issue
I have now downloaded the errata, and you are absolutely right! The full sequence does indeed match F6:
Code:
        fmul.x  fp1,fp4
        move.l  (8+l691,a7),a1
        move.l  (88+l691,a7),a0
        fmove.s (4,a1),fp3
        fmove.s (4,a0),fp0
The errata suggests inserting a NOP after F<op.A> as a workaround, which is the FMUL here, although a NOP after F<op.B> also worked for me.
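Since a compiler-side fix comes up later in the thread, here is a hedged sketch of the kind of peephole pass that could apply the NOP workaround automatically. It is NOT the errata's definition of F6: the trigger pattern below (an FPU arithmetic instruction, at most two integer instructions, then two back-to-back FMOVE loads from memory) is only the shape of the sequence quoted above, and the instruction classes and function names are hypothetical. A real implementation would have to follow the exact conditions given in the Motorola errata.

```python
from dataclasses import dataclass

# Hypothetical instruction classes; a real compiler back end would derive
# these from its own instruction representation.
FPU_ARITH, FMOVE_LOAD, INT, NOP = "fpu_arith", "fmove_load", "int", "nop"


@dataclass
class Insn:
    kind: str
    text: str


def matches_f6_like(code, i, max_int=2):
    """True if the FPU op at index i is followed, after at most max_int
    integer instructions, by two back-to-back FMOVE loads (simplified)."""
    if code[i].kind != FPU_ARITH:
        return False
    j = i + 1
    # Skip over up to max_int integer instructions (e.g. address reg loads).
    while j < len(code) and code[j].kind == INT and j - i - 1 < max_int:
        j += 1
    return (j + 1 < len(code)
            and code[j].kind == FMOVE_LOAD
            and code[j + 1].kind == FMOVE_LOAD)


def insert_nops(code, max_int=2):
    """Return a copy of code with a NOP placed after every matching FPU op,
    i.e. after F<op.A> as the errata workaround suggests."""
    out = []
    for i, insn in enumerate(code):
        out.append(insn)
        if matches_f6_like(code, i, max_int):
            out.append(Insn(NOP, "nop"))
    return out


if __name__ == "__main__":
    seq = [
        Insn(FPU_ARITH, "fmul.x  fp1,fp4"),
        Insn(INT, "move.l  (8+l691,a7),a1"),
        Insn(INT, "move.l  (88+l691,a7),a0"),
        Insn(FMOVE_LOAD, "fmove.s (4,a1),fp3"),
        Insn(FMOVE_LOAD, "fmove.s (4,a0),fp0"),
    ]
    for insn in insert_nops(seq):
        print("        " + insn.text)
```

Fed the sequence from this post, the pass emits a NOP directly after the FMUL and leaves everything else untouched. Avoiding the pattern altogether by reordering instructions, as suggested further down, would cost nothing on unaffected CPUs, whereas an inserted NOP costs a little everywhere.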

Quote:
If you want to, I can test this on my rev.6 68060, but it would require a more complete test sequence.
It is Quake, compiled from pure C as part of a compiler test suite, so I could only send you the executable; you would need the game data yourself. But I would be surprised if it failed on a Rev.6 060, now that we have identified F6.

Quote:
Actually, this also indicates that the 68060.library you are using is not careful enough to disable superscalar execution on the rev1 cpus, which it should really do.
The system is running Kickstart 45.57, Workbench 45.4, with the Phase5 68060.library 46.7 from 1999.
Is disabling superscalar execution on rev.1 chips, and thereby degrading performance considerably, really the recommended workaround? This is the first time in decades that I have noticed a problem.
24 December 2021, 22:38   #5
phx
Natteravn
 
Join Date: Nov 2009
Location: Herford / Germany
Posts: 2,500
Quote:
Originally Posted by SpeedGeek
Did you try using FNOP instead of NOP? But either way, I assume you want to keep Superscalar mode enabled.
Just tried it. FNOP seems to have the same positive effect as NOP.
25 December 2021, 10:38   #6
Thomas Richter
Registered User
 
Join Date: Jan 2019
Location: Germany
Posts: 3,233
Quote:
Originally Posted by phx
Is it really the recommended workaround to disable superscalar execution on rev.1 chips and downgrade the performance considerably?
The downgrade is minimal, actually, so it does not matter much. So yes, I really recommend doing so. Correctness over speed: it does not matter how quickly the code produces a result if that result is incorrect.



Of course you can insert a NOP or FNOP in multiple places, but this degrades the performance of your application on all CPUs, whereas disabling the load-store-buffer bypass only degrades performance on affected CPUs. I believe the GNU compiler follows the first approach and injects a lot of NOPs.
25 December 2021, 11:51   #7
phx
Natteravn
 
Join Date: Nov 2009
Location: Herford / Germany
Posts: 2,500
Quote:
Originally Posted by Thomas Richter
The downgrade is minimal, actually, so it does not matter much. So yes, I really recommend doing so. Correctness over speed.
Agreed, at least when running compiler test suites, to avoid spending days hunting ghosts.

Quote:
whereas disabling the load-store-buffer bypass only degrades performance on affected CPUs.
One moment... the load-store-buffer bypass is a completely different issue, only present on rev.5 CPUs, IIRC? In fact, I still disable the bypass in my Startup-Sequence (by flipping bit 5 of the PCR), because I had a rev.5 CPU board installed before the repairs.
Now we are talking about disabling Superscalar (PCR bit 0), right?

BTW, is there a better 68060.library you would recommend that can easily be downloaded for my system? I may update to 3.2 in the future, but for now I want to run 3.9 for testing purposes.

Quote:
I believe the GNU compiler follows the first approach, by injecting a lot of NOPs.
I have already been discussing with Volker whether the compiler should at least have an option to deal with these bugs. Maybe it can easily avoid that sequence, so that no NOPs are needed.
25 December 2021, 14:48   #8
phx
Natteravn
 
Join Date: Nov 2009
Location: Herford / Germany
Posts: 2,500
Quote:
Originally Posted by Thomas Richter
The downgrade is minimal, actually, so it does not matter much.
I just ran some tests with BYTEmark '95, compiled with identical settings. The impact is not minimal: unfortunately, disabling superscalar execution has a noticeable effect.

With Superscalar enabled:
Code:
BYTEmark (tm) Native Mode Benchmark ver. 2 (10/95)
NUMERIC SORT:  Iterations/sec.: 14.887315  Index: 0.381795
STRING SORT:  Iterations/sec.: 1.533946  Index: 0.685409
BITFIELD:  Iterations/sec.: 4624770.943366  Index: 0.793311
FP EMULATION:  Iterations/sec.: 1.694917  Index: 0.813300
FOURIER:  Iterations/sec.: 597.684677  Index: 0.679745
ASSIGNMENT:  Iterations/sec.: 0.259337  Index: 0.986824
IDEA:  Iterations/sec.: 38.285396  Index: 0.585565
HUFFMAN:  Iterations/sec.: 25.484308  Index: 0.706680
NEURAL NET:  Iterations/sec.: 0.279023  Index: 0.448229
LU DECOMPOSITION:  Iterations/sec.: 9.138496  Index: 0.473421
...done...
===========OVERALL============
INTEGER INDEX: 0.682454
FLOATING-POINT INDEX: 0.524446
 (90 MHz Dell Pentium = 1.00)
==============================
Without Superscalar:
Code:
BYTEmark (tm) Native Mode Benchmark ver. 2 (10/95)
NUMERIC SORT:  Iterations/sec.: 10.661501  Index: 0.273421
STRING SORT:  Iterations/sec.: 1.193346  Index: 0.533220
BITFIELD:  Iterations/sec.: 2800908.938665  Index: 0.480455
FP EMULATION:  Iterations/sec.: 1.370070  Index: 0.657423
FOURIER:  Iterations/sec.: 576.291257  Index: 0.655414
ASSIGNMENT:  Iterations/sec.: 0.209205  Index: 0.796062
IDEA:  Iterations/sec.: 30.637904  Index: 0.468598
HUFFMAN:  Iterations/sec.: 20.308745  Index: 0.563162
NEURAL NET:  Iterations/sec.: 0.269402  Index: 0.432774
LU DECOMPOSITION:  Iterations/sec.: 7.997249  Index: 0.414299
...done...
===========OVERALL============
INTEGER INDEX: 0.515503
FLOATING-POINT INDEX: 0.489816
 (90 MHz Dell Pentium = 1.00)
==============================
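To put a number on "noticeable", here is a small sketch that computes the relative drop of the two overall indices from the runs above. Only the index values are taken from the post; the script is plain arithmetic and the names are made up.

```python
# Overall BYTEmark '95 indices copied from the two runs above.
WITH_SUPERSCALAR = {"integer": 0.682454, "floating-point": 0.524446}
WITHOUT_SUPERSCALAR = {"integer": 0.515503, "floating-point": 0.489816}


def percent_drop(before: float, after: float) -> float:
    """Relative decrease from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


if __name__ == "__main__":
    for name, before in WITH_SUPERSCALAR.items():
        after = WITHOUT_SUPERSCALAR[name]
        print(f"{name:>14}: {percent_drop(before, after):4.1f} % lower")
```

This works out to roughly 24.5% on the integer index and about 6.6% on the floating-point index, so the integer tests suffer most from single-issue dispatch.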
25 December 2021, 15:25   #9
Thomas Richter
Registered User
 
Join Date: Jan 2019
Location: Germany
Posts: 3,233
Yes, my bad. I confused superscalar dispatch with the load/store buffer bypass. On earlier CPUs I disable the latter, but not the former.
25 December 2021, 17:42   #10
paraj
Registered User
 
Join Date: Feb 2017
Location: Denmark
Posts: 1,104
PCR=$04300521 means rev.5 in this parlance, right? (I'm asking because the 68060UM says "The first revision is 00000000" for the Revision Number bits.)

In that case, let me know if you need some testing done on that revision. I have the Quake data files and can build vbcc plus patches if needed.

P.S.: I'm using the 68060.library from Thomas' MMUlib (helpfully linked here); since bit 5 is set, I'm guessing it's not rev.6.
25 December 2021, 20:48   #11
Bruce Abbott
Registered User
 
Join Date: Mar 2018
Location: Hastings, New Zealand
Posts: 2,581
This is bringing back some not-so-happy memories of the 68060 on my A3000. What a mess!
25 December 2021, 22:49   #12
phx
Natteravn
 
Join Date: Nov 2009
Location: Herford / Germany
Posts: 2,500
Quote:
Originally Posted by paraj
PCR=$04300521 means rev 5 in this parlance, right? (asking because 68060UM says "The first revision is 00000000" for the Revision Number bits).
Maybe the first revision is called rev.0? When I boot NetBSD, it prints "68060 rev.1" for my PCR from above.

Quote:
In that case let me know if you need some testing done on that revision.
If you really have some time, drop me an e-mail and I will send you the executable for testing. The problem on my 060 revision is that, due to the F6 issue, the player can look around but can neither move nor jump. Easy to spot.

Quote:
P.S.: I'm using the 68060.library from Thomas' MMUlib (helpfully linked here), since bit 5 is set I'm guessing it's not rev6.
Yes. No-bypass should only be set for revision 5.

But honestly, I'm not sure whether the bit should be set or cleared, as it is not documented in my 68060UM. On my system it is also set, although I'm running the NoBypass patch (by Simon Goodwin?) directly after SetPatch. I know that it does a bchg #5, so it would undo the workaround if the 3.9 SetPatch had already applied it. I'll have to check that some day...
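The concern about bchg can be illustrated with a small model of the two bit operations. This is a sketch in Python: `bchg` and `bset` below only mimic what the 68k instructions do to a data register value, and the case of bit 5 already being set is the hypothetical scenario described above. BCHG toggles the bit, so applying it to a PCR where the bit is already 1 clears it again, while BSET forces the bit to 1 and is therefore safe to apply more than once.

```python
PCR_NOBYPASS_BIT = 5  # set = load/store buffer bypass disabled


def bchg(value: int, bit: int) -> int:
    """Model of 68k BCHG on a data register: toggle the bit (bit # mod 32)."""
    return value ^ (1 << (bit % 32))


def bset(value: int, bit: int) -> int:
    """Model of 68k BSET on a data register: force the bit to 1 (bit # mod 32)."""
    return value | (1 << (bit % 32))


if __name__ == "__main__":
    already_set = 0x04300121   # bit 5 set by some earlier component
    still_clear = 0x04300101   # bit 5 not touched yet
    for pcr in (already_set, still_clear):
        print(f"${pcr:08X}: bchg -> ${bchg(pcr, PCR_NOBYPASS_BIT):08X}, "
              f"bset -> ${bset(pcr, PCR_NOBYPASS_BIT):08X}")
```

On the CPU, the PCR would be read into a data register with MOVEC, modified there, and written back with MOVEC in supervisor mode; a patch that uses BSET instead of BCHG would give the same result regardless of what ran before it.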
25 December 2021, 23:32   #13
Thomas Richter
Registered User
 
Join Date: Jan 2019
Location: Germany
Posts: 3,233
The CPU command of 3.2 should tell you which revision you have. Concerning the load/store buffer bypass: the bypass is disabled (and thus the workaround is enabled) if the bit is 1. This problem also applies to the first revision, the 1F43G mask, not only to rev.5.
26 December 2021, 02:09   #14
phx
Natteravn
 
Join Date: Nov 2009
Location: Herford / Germany
Posts: 2,500
Ok, thanks.
It seems I have already forgotten most of the errata again.
26 December 2021, 15:12   #15
SpeedGeek
Moderator
 
Join Date: Dec 2010
Location: Wisconsin USA
Age: 60
Posts: 841
Quote:
Originally Posted by phx
Maybe the first revision is called rev. 0? When I boot NetBSD then it prints "68060 rev.1" with my PCR from above.

If you really have some time, drop me a mail and I will send you the executable for testing. The problem on my 060 revision is that, due to the F6 issue, the player can look around but neither move nor jump. Easy to spot.

Yes. No-bypass should only be set for revision 5.

But, honestly I'm not sure if the bit should be set or cleared, as it is not documented in my 68060UM. On my system it is also set, although I'm running the NoBypass patch (from Simon Goodwin?) directly after SetPatch. I know that it does a bchg #5, so it would revert the workaround when the 3.9 SetPatch already did it. I have to check that some day...

What is the source of your information? If you looked at the Motorola 68060 errata, you should have seen that the same mask set affected by F6 is also affected by I14 and I15. So not only Rev.5 but also Rev.1 should have the load/store bypass disabled.

Neither the 3.9 SetPatch nor the P5 68060.library does anything with the load/store bypass disable bit (PCR bit #5). You were fortunate to have the NoBypass patch installed, which, BTW, does nothing for a Rev.5 CPU.

Last edited by SpeedGeek; 26 December 2021 at 15:17.
26 December 2021, 21:17   #16
phx
Natteravn
 
Join Date: Nov 2009
Location: Herford / Germany
Posts: 2,500
Quote:
Originally Posted by SpeedGeek
What is the source of your information?
My bad memory.

Quote:
If you looked at the Motorola 68060 Errata you should have seen the same mask set affected by F6 was also affected by I14 & I15.
Indeed. Thomas already mentioned that.

Quote:
Neither, the 3.9 Setpatch nor the P5 68060.library does anything with Load/Store Bypass disable (PCR register bit #5).
Ok. As expected.

Quote:
You were fortunate to have the NoBypass patch installed, which BTW does nothing for a Rev. 5 CPU.
How do you know? The errata only mentions mask sets, not revisions.
26 December 2021, 22:35   #17
Thomas Richter
Registered User
 
Join Date: Jan 2019
Location: Germany
Posts: 3,233
In the case of my 68060.library, the answer is quite simple: it runs an instruction sequence that triggers the defect and disables the load/store buffer bypass if the erratum could be reproduced. Thus, knowing the relation between mask sets and revisions is not necessary.
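The difference between a revision lookup table and this probe-based approach can be sketched as follows. This is not code from the 68060.library: `configure_bypass` and the probe callable are hypothetical, and the actual triggering instruction sequence, which has to run on the 68060 itself, is left out. The point is only that the decision depends on observed behaviour rather than on a mapping from revision numbers to mask sets.

```python
from typing import Callable

PCR_NOBYPASS = 1 << 5  # set = load/store buffer bypass disabled


def configure_bypass(pcr: int, erratum_reproduced: Callable[[], bool]) -> int:
    """Return the PCR value to write back.

    `erratum_reproduced` stands in for running the triggering instruction
    sequence on the CPU and checking its result (hypothetical probe).
    The no-bypass bit is set only when the defect actually showed up;
    otherwise the PCR is returned unchanged.
    """
    if erratum_reproduced():
        return pcr | PCR_NOBYPASS
    return pcr


if __name__ == "__main__":
    # Simulated probe results, for illustration only.
    print(f"${configure_bypass(0x04300101, lambda: True):08X}")
    print(f"${configure_bypass(0x04300601, lambda: False):08X}")
```

The trade-off of this design is that the result is only as reliable as the conditions under which the probe runs, which is exactly the concern raised further down in the thread.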
27 December 2021, 14:03   #18
phx
Natteravn
 
Join Date: Nov 2009
Location: Herford / Germany
Posts: 2,500
Nice!
But as far as I understand, it neither runs an F6 test sequence yet nor disables superscalar execution in any situation?
27 December 2021, 14:21   #19
Thomas Richter
Registered User
 
Join Date: Jan 2019
Location: Germany
Posts: 3,233
Correct. The F6 test sequence is currently only run by the CPU command. I'm a bit unsure what I should do about it. The OS math libraries are not affected (they don't trigger the defect), so the OS as such is fine. Unfortunately, the defect does affect code that uses the FPU directly in some cases.
27 December 2021, 14:53   #20
SpeedGeek
Moderator
 
Join Date: Dec 2010
Location: Wisconsin USA
Age: 60
Posts: 841
Quote:
Originally Posted by phx
How do you know? The errata only mentions mask sets, but no revisions.
I simply relied on the extensive 68060 research performed by thebajaguy linked here:

https://www.amibay.com/forum/amibaye...mation-request

Note: You will need to log in to Amibay to view this thread.

Of course, that information was not available when Simon Goodwin released his NoBypass patch (on Aminet).

Quote:
Originally Posted by Thomas Richter
In case of my 68060.library, the answer is quite simple: It runs an instruction sequence that triggers the defect, and it disables the load-store buffer bypass in case the erratum could be reproduced. Thus, knowledge of the relation between mask sets and revisions is not necessary.
This certainly seems like a practical alternative solution. My only concern is whether the errata test results are always the same. For example, what happens if superscalar mode is disabled, the data cache mode is Nocache Precise, or the store buffer is disabled?

Last edited by SpeedGeek; 27 December 2021 at 15:49.

Similar Threads
Thread Thread Starter Forum Replies Last Post
Wanted - 68060 Rev 6 Retro-mania MarketPlace 48 06 August 2020 23:14
68060 rev. errata and performance impact? gdonner support.Hardware 6 24 April 2019 18:43
Difference between a 68060 rev 5 and 6 Syntrax support.Hardware 2 10 February 2019 21:23
WTB: 68060 Rev 6 71E41J TjLaZer MarketPlace 3 03 January 2016 14:10
68060 emulation bug riftcon support.WinUAE 4 14 March 2008 22:52

All times are GMT +2.