English Amiga Board

SpeedGeek 11 November 2020 03:52

E clock speedup mod
 
2 Attachment(s)
Hello my fellow EABers!

As you probably know, I've spent way WAY too much time overclocking and hacking my A2630. But all of this time and effort was spent on the 32 bit "Fast" bus of the A2630.

So now, I have given some time and effort to the "Slow" bus on the A2000. I mean the snail SLOW bus on the A2000. The E clock bus!

So why bother with a mod that tweaks performance on the E clock bus? Because all a fast CPU can do is clock off wait states on the snail slow E clock bus. So I want the E clock cycle to be optimized for fewer wait states... and this is exactly what the new A2630 U506 GAL does (see below):

WARNING:
Just in case you did not RTFM, this mod assumes no real 6800 E clock devices exist in the system and the 8520 devices can handle some performance tweaks. So you should keep the original U506 PAL just in case you ever get a very rare Zorro II E clock Board!

NOTES:
A GAL16V8 or equivalent PLD is required and must be programmed with the .jed file provided! Speed is not important but a 15ns device is nominal. This can also be done for the A2620, but I don't have one to test it on. The A2620 PAL U309 is nearly identical to the A2630 PAL U506. So, A2620 owners can just rename the GAL to U309!
EDIT: It appears no fix is needed for A3000/A4000 owners. See post #10 below.

DISCLAIMER:
Use at your own risk! No warranty expressed or implied, etc.

Code:

Name    U506 ;
PartNo  XXXXX ;
Date    5/6/1988 - 10/29/2020 ;
Revision 02 ;
Designer Haynie - SpeedGeek ;
Company  Commodore ;
Assembly 312828 ;
Location U506 ;
Device  G16V8 ;

/************************************************************************/
/*                                                                        */
/*  A2630        E clock generation, VMA generation, EDTACK generation,        */
/*                Refresh request, and 4 bits of refresh counter.                */
/*                                                                        */
/************************************************************************/
/*  Allowable Target Device Types: 16R6A                                */
/************************************************************************/
/*  Clock:        7M                                                        */
/************************************************************************/
/*  Free Pins: 2(I)                                                        */
/************************************************************************/
/*  HISTORY                                                                */
/*        DBH Sep 25:        Made from U309R1 from A2630 Rev 2.                */
/*        SpeedGeek Oct. 29: Assume no real 6800 E CLK devices        */
/*      exist in the system and the 8520 devices can handle    */
/*        some performance tweaks.                                        */
/************************************************************************/

/**  Inputs  **/

PIN 1                = CLK                ;        /* 7 MHz */
PIN 3                = A0                ;        /* E and refresh counter bits */
PIN 4                = A1                ;
PIN 5                = A2                ;
PIN 6                = A3                ;
PIN 7                = B2000                ;        /* Board inside a B2000 */
PIN 8                = TRISTATE        ;        /* Bus tristate control */
PIN 9                = !VPA                ;        /* Valid peripheral address */
PIN 11                = !OE                ;

/**  Outputs  **/

PIN 19                = E                ;        /* the 6800 E clock */
PIN 12                = !REFREQ        ;        /* Refresh request to refresh logic */
PIN 13                = IVMA                ;        /* Internal VMA */
PIN 14                = EDTACK        ;        /* DTACK for 6800 cycle */
PIN 18                = !StBit3        ;        /* Refresh counter bit 3 */
PIN 17                = !StBit2        ;        /* Refresh counter bit 2 */
PIN 16                = !StBit1        ;        /* Refresh counter bit 1 */
PIN 15                = !StBit0        ;        /* Refresh counter bit 0 */

/** Declarations and Intermediate Variable Definitions **/

count                = A0 & A3;

/**  Logic Equations  **/

E                = A2;
E.OE                = !B2000;

/* Initially, the logic here enabled IVMA during (!A3 & A2 & !A1 & A0 & VPA).
  This is the proper time to have VMA come out, just about when the 68000
  would bring it out, actually slightly sooner since this PAL releases it on
  the wrong 7M edge.  The main problem with this scheme is that if VPA falls
  in the case that's just prior to that enabling term (what I call CASE 3
  in my timing), the I/O cycle should be held off until the next E cycle.
  The 68000 does this, but the above IVMA would run that cycle right away.
  The fix to this used here moves the IVMA equation up by one clock cycle,
  assuring that a CASE 3 VPA will be delayed.  This adds a potential problem
  in that IVMA is asserted sooner than a 68000 would assert it.  We
  know this is no problem for 8520 devices, and /VPA driven devices aren't
  supported under autoconfig, so we should be OK here.
*/ 

/* This was "!A3 & !A2 & !A1 & !A0 & VPA # !IVMA & !A3" but 8520
devices tolerate late assertion too! */

!IVMA.D                =  !A3 & !A2 & VPA
                # !IVMA & !A3;

/* This was "!A3 & A2 & A1 & A0 & !IVMA", but I think that may make
  the cycle end too late. So I'm pushing it down by one clock! */

!EDTACK.D        = !A3 & A2 & A1 & !A0 & !IVMA;

/*        This is the refresh counter described in the refreshcounter
 *        state machine file. We have added the holding term to
 *        The REFREQ output (REFREQ & !WIN).
 */

StBit3.D        = !count & !StBit1 & !StBit2 &  StBit3
                #  count & !StBit1 & !StBit2 & !StBit3
                #  count & !StBit0 &  StBit1 & !StBit3
                #  count & !StBit0 &  StBit2 & !StBit3
                # !count & !StBit0 &  StBit3;

StBit2.D        = !count &  StBit0 & !StBit1 &  StBit2 & !StBit3
                #  count & !StBit0 &  StBit1 & !StBit2 &  StBit3
                #  count & !StBit1 & !StBit2 &  StBit3
                #  count & !StBit0 &  StBit2 & !StBit3
                # !count & !StBit0 &  StBit2;

StBit1.D        =  count & !StBit0 & !StBit1 &  StBit2 & StBit3
                #  count & !StBit0 &  StBit1 & !StBit2 & StBit3
                #  count & !StBit0 &  StBit1 & !StBit3
                # !count & !StBit0 &  StBit1;

StBit0.D        =  count & !StBit0 &  StBit1 & StBit2 &  StBit3
                # !count &  StBit0 & !StBit1 & StBit2 & !StBit3
                # StBit0 & !StBit1 & !StBit2;

REFREQ =        count & StBit0 & !StBit1 & StBit2 & !StBit3;
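
To see at a glance what changed, here is a small Python sketch (not part of the .jed or the CUPL source) that enumerates which 4-bit counter values satisfy the old and new product terms. It assumes A0 is the least significant bit and holds the VPA and IVMA factors true. Which values the real counter actually steps through is not shown in this listing, so all 16 combinations are listed. The function names are made up for illustration.

```python
def bits(n):
    """Split a 4-bit value into (A3, A2, A1, A0). Assumption: A0 is the LSB."""
    return bool(n & 8), bool(n & 4), bool(n & 2), bool(n & 1)

def old_ivma_set(n):
    # Old term quoted in the comment: !A3 & !A2 & !A1 & !A0 (& VPA)
    a3, a2, a1, a0 = bits(n)
    return not a3 and not a2 and not a1 and not a0

def new_ivma_set(n):
    # New term from the listing: !A3 & !A2 (& VPA)
    a3, a2, a1, a0 = bits(n)
    return not a3 and not a2

def old_edtack(n):
    # Old term quoted in the comment: !A3 & A2 & A1 & A0 (& !IVMA)
    a3, a2, a1, a0 = bits(n)
    return not a3 and a2 and a1 and a0

def new_edtack(n):
    # New term from the listing: !A3 & A2 & A1 & !A0 (& !IVMA)
    a3, a2, a1, a0 = bits(n)
    return not a3 and a2 and a1 and not a0

def matches(term):
    """Return the sorted list of 4-bit values for which the term is true."""
    return [n for n in range(16) if term(n)]

print("old IVMA set term  :", matches(old_ivma_set))
print("new IVMA set term  :", matches(new_ivma_set))
print("old EDTACK term    :", matches(old_edtack))
print("new EDTACK term    :", matches(new_edtack))
```

Under those assumptions the IVMA set window widens from one counter value to four, and the EDTACK term fires one counter value earlier.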


AmigaHope 20 November 2020 12:41

I'm curious how compatible this is with "regular" Z2 cards that don't touch the external clock.

SpeedGeek 20 November 2020 14:48

Hmm... I thought this was very well explained. I have assumed that 99.9% of all Z2 cards use the 7 MHz clock and only 0.1% of Z2 cards use the E clock. Therefore, 99.9% of Z2 card users should have no problem.

However, if you believe my assumption is inaccurate then please provide me with the more accurate information. Thanks. ;)

BTW, E means Enable not External!

PR77 22 November 2020 13:27

This is a very interesting idea ... Do you have some benchmarks for comparison? With only Chip RAM present?

For newer accelerators which generate an E Clock (mine included) based on the original timing (E Clock active -> /DTACK assertion), this would not be too hard to tweak in the CPLD as well.

SpeedGeek 22 November 2020 17:23

Sorry, no benchmarks. Some benchmark programs just report the E clock frequency but never bother with performance results for the E clock bus. Chip RAM is not relevant since its timing is on the 7 MHz bus.

Some approximate wait state calculations, in 7 MHz clocks, are as follows:

Old U506 PAL
---------------
Best case 8 clocks
Worst case 17 clocks

New U506 GAL
----------------
Best case 5 clocks
Worst case 14 clocks

Notes: The number of CPU wait states varies with the CPU clock speed and the efficiency of the 68000 state machine logic. Also, the average case performance is much better for the new U506 GAL since it runs the case 3 cycle and the old U506 PAL skips it. Yes, it could be tweaked in a CPLD based design too.
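
For a feel of what those clock counts mean in time, here is a minimal Python sketch that converts the figures above to nanoseconds. It assumes an NTSC bus clock of 7.15909 MHz (PAL machines run slightly slower, at about 7.09 MHz). The counts are the approximate ones quoted in this post, not measurements.

```python
# Assumption: NTSC Amiga bus clock; PAL is about 7.09 MHz.
CLOCK_HZ = 7_159_090

# Approximate wait-state counts quoted in the post, in 7 MHz clocks.
OLD_PAL = {"best": 8, "worst": 17}
NEW_GAL = {"best": 5, "worst": 14}

def clocks_to_ns(clocks, clock_hz=CLOCK_HZ):
    """Convert a number of bus clocks to nanoseconds."""
    return clocks * 1e9 / clock_hz

# Clocks saved per E clock bus access, and the same saving in nanoseconds.
saved_clocks = {case: OLD_PAL[case] - NEW_GAL[case] for case in OLD_PAL}
saved_ns = {case: clocks_to_ns(n) for case, n in saved_clocks.items()}

for case in OLD_PAL:
    print(f"{case:5s}: old {clocks_to_ns(OLD_PAL[case]):7.1f} ns, "
          f"new {clocks_to_ns(NEW_GAL[case]):7.1f} ns, "
          f"saved {saved_clocks[case]} clocks ({saved_ns[case]:.1f} ns)")
```

With these numbers the saving is 3 clocks per access in both cases, a bit over 400 ns at the assumed clock.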

Dale423 26 May 2021 04:37

Are there any benchmarks for before and after?

Shadowfire 26 May 2021 19:09

Since the 8520's are the only chips in the system that use the E bus, it only affects accesses to the following things:
* Joystick fire buttons
* Filter LED
* Floppy disk control lines
* Parallel port
* Serial port

The only items on that list that you would hit more than once in a blue moon are the serial, parallel, and floppy ports. And the floppy port is already constrained by the speed of the floppy drive. Time spent fiddlefarting with the floppy control lines is MINUSCULE compared to the time to read in a track.

Say you're in a game and it's reading the fire button 60x a second. Let's give it an absolute best case scenario, and every time you read it, the mod saves 12 clock cycles.
So your game saves 720 clock cycles every second, or about 0.01% of the clock cycles available at 7 MHz.

You might not even be able to measure the speedup with a synthetic test.

The only real chance of a measurable improvement would be serial port bit-banging; it *might* allow you to use a faster baud rate (but I doubt it).
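
The fire button estimate above is easy to redo with other numbers. Here is a minimal Python sketch of it. The 60 polls per second and 12 clocks saved per poll are the figures from this post; the 7.15909 MHz NTSC clock is an assumption (the share is smaller still if you count faster CPU clocks instead).

```python
POLLS_PER_SECOND = 60        # fire button reads per second, from the post
CLOCKS_SAVED_PER_POLL = 12   # best-case saving per read, from the post
CLOCK_HZ = 7_159_090         # assumption: NTSC 7 MHz bus clock

# Total clocks saved each second and their share of all clocks in that second.
clocks_saved_per_second = POLLS_PER_SECOND * CLOCKS_SAVED_PER_POLL
percent_saved = 100.0 * clocks_saved_per_second / CLOCK_HZ

print(f"{clocks_saved_per_second} clocks saved per second "
      f"= {percent_saved:.4f}% of {CLOCK_HZ} clocks")
```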

SpeedGeek 26 May 2021 22:36

Quote:

Originally Posted by Shadowfire (Post 1487068)
Since the 8520's are the only chips in the system that use the E bus, it only affects accesses to the following things:
* Joystick fire buttons
* Filter LED
* Floppy disk control lines
* Parallel port
* Serial port

The only items on that list that you would hit more than once in a blue moon are the serial, parallel, and floppy ports. And the floppy port is already constrained by the speed of the floppy drive. Time spent fiddlefarting with the floppy control lines is MINUSCULE compared to the time to read in a track.

Say you're in a game and it's reading the fire button 60x a second. Let's give it an absolute best case scenario, and every time you read it, the mod saves 12 clock cycles.
So your game saves 720 clock cycles every second, or about 0.01% of the clock cycles available at 7 MHz.

You might not even be able to measure the speedup with a synthetic test.

The only real chance of a measurable improvement would be serial port bit-banging; it *might* allow you to use a faster baud rate (but I doubt it).

You forgot to mention the timer functions which are the hardware component of timer.device. There is also an interrupt control register. So, there is active use of the 8520's even when the I/O or floppy ports are inactive.

Regarding the speedup, a logic analyzer would be the best option to REALLY see what's going on here. ;)

rzookol 31 May 2021 23:04

I did some tests with mp3 decoding and didn't get any speedup :(

SpeedGeek 26 December 2021 17:23

Thanks to patrik, a benchmark tool is now available:

https://eab.abime.net/showthread.php?p=1523681

Unfortunately, I now have my GVP G-Force 030 installed in my A2000 (waiting support for another long delayed project). So I can't provide any immediate benchmark results.

