04 November 2019, 20:09 | #21 | |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 54
Posts: 4,491
|
Interesting thread!
Quote:
Do you have some code for a cold start setup (i.e. bootblock code for a game/demo start?) |
|
04 November 2019, 20:28 | #22 |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 54
Posts: 4,491
|
Unfortunately I have never had a 040 or 060, so I'm not totally sure how to use them to their full potential on real Amiga hardware.
[This should have gone in the message above, but oh well.] I found this interesting text about the 030 among my old documents: Code:
MC68030 COMPARISON WITH MC68020

I read with interest the "Assessing MC68030 and MC68882 Performance" letter from Roy Druian, Product Marketing Manager of Motorola, as well as Dave Bursky's "32-Bit Microprocessors 1987 Technology Forecast" in the Electronic Design magazine of Jan. 8, 1987. Mr. Druian's letter tried to demonstrate that Motorola's 68030 at 20MHz has twice the performance of 68020 at 16.67MHz. In Mr. Bursky's survey, Motorola's 68030 is presented as having 3 times the performance of 68020.

Using Motorola's own performance data for 68020 and the data available for 68030 (see bibliography below), I calculated the 68030 performance improvement relative to the 68020. These calculations, presented in detail below, show that the performance of 68030 is only 18% better than that of 68020 at the same frequency (20MHz) even if we believe the high hit ratios Motorola claims for its small (256 byte) internal caches. Using the data presented by Motorola in [2], the calculated 68030 performance is 3.3 MIPS at 20 MHz, while 68020 performance is 2.3 MIPS at 16.67 and 2.8 MIPS at 20MHz.

The other 68030 problems that Motorola's papers do not stress are:
- impossible timing for its synchronous bus protocol (very hard and expensive to run with zero wait-states at 20MHz).
- virtual (logical) caches (see [1] for problems of virtual caches)
- lack of hardware cache invalidation for the internal caches, which makes 68030 very hard to use in multiprocessing systems or where external devices (eg. DMA) may change memory values
- long context switching time, because it has to save temporary data inside the CPU (problem inherited from 68020)
- small TLB (Address Translation Cache) for the 68030 MMU (22 entries only)

Actually 68030 has no real MMU, as the TLB misses are handled by the 68030 Execution Unit and not by the MMU itself, transparent to the execution pipe. The high price of a TLB miss (because it is handled in microcode, serial to the instruction execution), combined with the relatively low TLB hit ratio (should be 90 - 95% for the 22 entry TLB and not 98% as Motorola claims) makes the use of the on-chip 68030 MMU expensive in terms of performance.

68030 vs. 68020 Comparison

The improvements of 68030 versus 68020 consist of improvement of the Instruction Cache hit ratio due to larger line size (16 bytes for 68030 vs. 4 bytes for 68020), addition of a small Data Cache and integration of the MMU on chip. As for the "internal Harvard architecture", 68020 allows also the overlap of instruction fetches with data operand access [3]. The 68030 left unchanged the Execution Unit, except for the addition of some instructions to support the TLB on chip [4]. As a consequence, the only change in the performance model of 68030 when compared with 68020 is the improvement in the performance penalty for addressing instructions and data in memory.

Using Motorola's own data (Table 6 in [2]), the average number of operand memory access per instruction for 68020 are:
- 0.384 reads per instruction
- 0.242 writes per instruction

The Data Cache with 48% hit ratio and the 2 clock bus cycle will improve the 68030 execution time by:

    0.384 * 0.48 * 2 + 0.384 * 0.52 * 1 = 0.567 clocks

where:
- 0.384 represents the average number of operand reads per instruction,
- 0.48 represents the Data Cache hit ratio,
- 2 represents the difference, in clocks, between 68020 going to external memory (no wait states) and 68030 finding data in the internal cache,
- 0.52 (1-0.48) represents the Data Cache miss ratio,
- 1 represents the difference, in clocks, between the 3 clock bus cycle of 68020 and the 2 clock bus cycle of 68030.

The writes have roughly the same influence on performance for 68020 and 68030, as the Data Cache is a write-through cache and writes are buffered by the BIU.

The other 68030 improvement is the higher Instruction Cache hit ratio because of 16 byte line size (the effect of the burst is already taken into account in the hit ratio of the Instruction Cache and Data Cache). According to Table 4 in [2], the number of clocks-per-instruction for 68020 drops from 7.159 with a 64% hit ratio instruction cache to 6.373 with a 100% hit ratio instruction cache (an improvement of 0.786 clocks-per-instruction). The estimated improvement given to 68030 by its 82% hit ratio instruction cache and its 2 clock bus protocol (no wait state) is 0.5 clocks-per-instruction.

The overall performance improvements due to the architectural improvements of 68030 relative to 68020 is then:

    0.567 + 0.5 = 1.067 clocks/instruction

According to Motorola's own calculations in [2], the average performance of 68020 with the Instruction Cache ON for the workload in [2] is 7.159 clocks/instruction (Table 4 in [2]) when no wait states are present. This translates into 2.3 MIPS at 16.67 MHz and 2.8 MIPS at 20MHz. The (7.159 - 1.067) = 6.092 clocks/instruction for 68030 translates into 3.3 MIPS at 20MHz.

The relative improvement in performance of 68030 versus 68020 at the same frequency is then:

    (3.3 - 2.8) * 100 / 2.8 = 18%

If the 16.67 MHz 68020 is compared against the 20 MHz 68030, the performance improvement factor is still only:

    (3.3 - 2.3) * 100 / 2.3 = 43%

68030 Synchronous Bus Timing

Even the 18% architectural improvements of 68030 versus 68020 is questionable because of the way the 68030 2-cycle synchronous bus protocol is designed.

For a Read bus cycle, the 68030 issues the address at the beginning of the first clock cycle and expects the system to return the ready signal (named STERM) at the end of the same cycle. Data is sampled by 68030 in the middle of the second cycle. According to the 68030 spec [6], the read bus timing at 20MHz is:
- Address-to-STERM time = 25 ns
- Address-to-Data time = 40 ns

For a Write bus cycle, 68030 issues the address at the beginning of the first clock cycle, data at the beginning of the second clock cycle and expects the system to return ready (STERM) at the end of the first cycle. According to the 68030 spec, the write bus timing at 20 MHz is:
- Address-to-STERM time = 25 ns
- Write Data Valid time = 25 ns

It is hard to avoid wait-states at 20MHz with such timing even when using the fastest and most expensive static RAM's. No wonder Motorola is unwilling to commit to pushing 68030 beyond 20MHz; with the 68030 bus protocol wait states are unavoidable.

Sorin Iacobovici
Computer and System Architecture
National Semiconductor Corp.
Santa Clara, Ca.

REFERENCES
----------
1. A.J. Smith "Cache Memories", ACM Computing Surveys, vol.14, no. 3, September 1982, pp. 473-530
2. D. MacGregor, J. Rubinstein "A Performance Analysis of MC68020-based Systems", IEEE Micro, Dec. 1985, pp. 50-70
3. D. MacGregor, D. Mothersole and B. Moyer "The Motorola MC68020", IEEE Micro, Vol.4, no.4 Aug. 1984, pp. 101-118
4. J.T.Reinhart "Extra Functions and Higher Speed Push Microprocessor to Top", Electron Products, Oct. 1, 1986, pp.35-39
5. D. MacGregor "Diverse Applications Put Spotlight on 68020's Improvements", Electronic Design, Feb. 7, 1985, pp.155-164
6. Motorola Inc. "MC68030 -Second Generation 32-Bit Enhanced Microprocessor", Technical Data, Motorola Inc., 1986 pp. 1-27

Last edited by ross; 04 November 2019 at 21:19. Reason: [] |
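As a rough cross-check of the arithmetic in the letter above, here is a minimal Python sketch. It only re-evaluates the figures the letter quotes; the variable and function names are illustrative and not part of the original text, and rounding the MIPS values to one decimal before taking percentages is an assumption about how the letter arrived at its 18% and 43%.

```python
# Figures quoted in the letter (attributed there to Tables 4 and 6 of ref. [2]).
READS_PER_INSTR = 0.384   # operand reads per instruction
DCACHE_HIT = 0.48         # 68030 data cache hit ratio
CPI_68020 = 7.159         # 68020 clocks/instruction, icache on, no wait states
ICACHE_GAIN = 0.5         # letter's estimate for the icache/bus improvement

# Clocks saved per instruction on operand reads: a cache hit saves 2 clocks,
# a miss still saves 1 clock (2-clock instead of 3-clock bus cycle).
dcache_gain = READS_PER_INSTR * (DCACHE_HIT * 2 + (1 - DCACHE_HIT) * 1)

cpi_68030 = CPI_68020 - (dcache_gain + ICACHE_GAIN)


def mips(clock_mhz, clocks_per_instr):
    """Million instructions per second for a given clock and CPI."""
    return clock_mhz / clocks_per_instr


mips_020_16 = mips(16.67, CPI_68020)
mips_020_20 = mips(20.0, CPI_68020)
mips_030_20 = mips(20.0, cpi_68030)

# Assumption: the letter rounds MIPS to one decimal before comparing.
r020_16 = round(mips_020_16, 1)
r020_20 = round(mips_020_20, 1)
r030_20 = round(mips_030_20, 1)
gain_same_clock = (r030_20 - r020_20) * 100 / r020_20
gain_vs_16mhz = (r030_20 - r020_16) * 100 / r020_16

print(f"data cache gain : {dcache_gain:.3f} clocks/instruction")
print(f"68030 CPI       : {cpi_68030:.3f}")
print(f"MIPS 020@16.67 / 020@20 / 030@20: {r020_16} / {r020_20} / {r030_20}")
print(f"gain at 20 MHz  : {gain_same_clock:.0f}%")
print(f"gain vs 16.67MHz: {gain_vs_16mhz:.0f}%")
```

The data cache term evaluates to about 0.568 rather than the letter's 0.567, which does not change the conclusion. Without the intermediate rounding, the improvement at equal clocks comes out at roughly 17.5%.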
06 November 2019, 14:29 | #23 | ||
Registered User
Join Date: Sep 2015
Location: Germany
Posts: 260
|
Quote:
Alternatively I have these links:
The 68030 and 68040 on the Zorro III Bus: http://amigadev.elowar.com/read/ADCD.../node0161.html
68040 Compatibility Warning: http://amigadev.elowar.com/read/ADCD.../node0083.html
68040 Programming: https://www.drdobbs.com/68040-programming/184408316
Quote:
That's an interesting document you found. Thanks for sharing. Last edited by dissident; 06 November 2019 at 14:36. |
||
06 November 2019, 17:02 | #24 | |
Natteravn
Join Date: Nov 2009
Location: Herford / Germany
Posts: 2,546
|
Quote:
1. Determine CPU and set up caches. Code:
;---------------------------------------------------------------------------
getCPU:
; Determine CPU type and the interrupt vector base.
; Called through Supervisor().
; a6 = SysBase
; Uses: d0, d1, d2

        ; get CPU type (68000-68040) from SysBase.AttnFlags
        moveq   #0,d0
        move.b  AttnFlags+1(a6),d1
        moveq   #0,d2
        lsr.b   #1,d1
        addx.b  d0,d2
        lsr.b   #1,d1
        addx.b  d0,d2
        lsr.b   #1,d1
        addx.b  d0,d2
        lsr.b   #1,d1
        addx.b  d0,d2
        move.b  d2,CPUtype(a4)
        beq     .1

        ; read vector base from VBR on 68010+ CPUs
        mc68010
        movec   vbr,d0
.1:     move.l  d0,AutoVecBase(a4)

        subq.b  #4,d2
        blo     .3

        ; 68060 check
        mc68040
        nop
        cpusha  bc
        moveq   #0,d0
        movec   d0,cacr
        ; Does the 68060 DC No Allocate Mode bit in CACR work?
        bset    #30,d0
        movec   d0,cacr
        movec   cacr,d0
        tst.l   d0
        beq     .2
        move.b  #6,CPUtype(a4)          ; ...then we have a real 68060
        move.l  #$00400000,d0           ; 68060: Clear All Branch Cache

        ; disable 68040/68060 caches for now
.2:     nop
        cpusha  bc
        movec   d0,cacr                 ; caches off for real
.3:     rte
Code:
;---------------------------------------------------------------------------
enableCaches:
; Enable 68040 and 68060 caches and set transparent translation registers.
; Called through Supervisor().
; a6 = SysBase
; d0 = main program base address

        cmp.b   #4,CPUtype(a4)
        blo     .2

        mc68040
        move.l  #$0000c040,d1
        movec   d1,itt0                 ; Cache Inhibit ChipRAM & Custom Chips
        movec   d1,dtt0
        and.l   #$ff000000,d0
        beq     .1                      ; main program in $00xxxxxx region?
        or.w    #$c020,d0
.1:     movec   d0,itt1                 ; Cache Copyback for Fast RAM
        movec   d0,dtt1

        ; paged MMU off
        nop
        moveq   #0,d0
        pflusha
        nop
        movec   d0,tc

        ; enable caches, store buffering, branch cache, clear branch cache
        move.l  #$a0c08000,d0
        movec   d0,cacr
.2:     rte
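To make the constants in the two routines above easier to follow, here is a small Python sketch that models them. It is only an illustration under stated assumptions: the function names are hypothetical, the AttnFlags bit positions are the AFB_68010..AFB_68040 values from exec/execbase.h, and the transparent translation register layout (base in bits 31-24, mask in 23-16, enable in bit 15, S field in bits 14-13, cache mode in bits 6-5) should be checked against the 68040/68060 user's manuals before relying on it.

```python
# AttnFlags low byte: bit 0 = 68010, bit 1 = 68020, bit 2 = 68030, bit 3 = 68040.
# Exec sets these cumulatively, so counting the set bits yields 0..4.
def cpu_type_from_attnflags(attn_low_byte):
    """Model of getCPU's four lsr/addx pairs: count set bits among bits 0-3."""
    count = 0
    for _ in range(4):
        count += attn_low_byte & 1   # lsr moves the bit into X, addx adds it
        attn_low_byte >>= 1
    return count


CACHE_MODES = {
    0: "cachable, write-through",
    1: "cachable, copyback",
    2: "cache-inhibited (serialized on 040, precise on 060)",
    3: "cache-inhibited (non-serialized on 040, imprecise on 060)",
}


def decode_ttr(value):
    """Split an ITTn/DTTn value into its main fields (assumed layout, see above)."""
    return {
        "base": (value >> 24) & 0xFF,      # address bits 31-24 to match
        "mask": (value >> 16) & 0xFF,      # 1 bits are ignored in the match
        "enabled": bool(value & 0x8000),
        "s_field": (value >> 13) & 3,      # 2 or 3: match user and supervisor
        "cache_mode": CACHE_MODES[(value >> 5) & 3],
    }


def ttr1_for(program_base):
    """Model of the and.l/or.w sequence that builds the ITT1/DTT1 value."""
    region = program_base & 0xFF000000
    # A program in the $00xxxxxx region leaves TTR1 at 0, i.e. disabled.
    return (region | 0xC020) if region else 0


print(cpu_type_from_attnflags(0x07))       # 68010+68020+68030 bits set
print(decode_ttr(0x0000C040))              # ITT0/DTT0: first 16 MB
print(hex(ttr1_for(0x08000000)))           # hypothetical Fast RAM load address
```

The 0x08000000 load address is just an example value. A 68060 is not distinguished at this stage (the count stops at 4), which is why getCPU follows up with the CACR probe.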
|
06 November 2019, 18:30 | #25 |
Registered User
Join Date: Jan 2019
Location: Germany
Posts: 3,317
|
A word of warning: If you turn off the MMU on a 68040 or 68060, you make reliable DMA transfer impossible. That is, read or write accesses from DMA driven expansions such as SCSI host adapters may have unpredictable results.
|
06 November 2019, 22:55 | #26 |
Natteravn
Join Date: Nov 2009
Location: Herford / Germany
Posts: 2,546
|
Thanks. I'm aware of that. I'm using these routines only in games, which take over the system and do nothing else than trackloading from 3.5" disk drives.
|
07 November 2019, 11:27 | #27 | ||||
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 54
Posts: 4,491
|
Thank you all for your answers and suggestions.
I just have to digest the whole thing (I still have some doubts). Quote:
Quote:
But for a properly patched game it would be better to have the ICACHE on (to speed up unpacking and similar). It seems that some 040/060 Amiga accelerators use the MMU right from cold start, so it needs proper handling. Quote:
There are a couple of instructions that I am not convinced about, but I have to do some tests first and then ask you questions. Quote:
But how does the global DCACHE enable bit affect memory access? I.e., if I globally disable the data cache, could accesses still be non-serialized? Cheers. |
||||
07 November 2019, 15:19 | #28 | |
Registered User
Join Date: Jan 2019
Location: Germany
Posts: 3,317
|
Quote:
|
|
07 November 2019, 16:03 | #29 |
ex. demoscener "Bigmama"
Join Date: Jun 2012
Location: Fyn / Denmark
Posts: 1,643
|
So given the above discussion, let's assume that the MMU needs to be enabled and have the right tables loaded for best performance and correct operation on 040/060. Is this done by the kickstart, or is Setpatch/680x0.library required? If the latter, this complicates things when trackloaded from the bootblock.
|
07 November 2019, 16:45 | #30 | ||
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 54
Posts: 4,491
|
Quote:
Quote:
So probably the best thing at cold/bootblock startup is to enable the instruction cache, disable the data cache globally and, in addition, disable the MMU (just in case). At that point, however, setting the xTT registers is pointless: for the icache it makes no difference whether a region is set up as WriteThrough or CopyBack, since instruction fetches only ever read from the bus, and the dcache is not used for any address. Only one thing is not fully clear to me: in that case, is access to chip RAM and the custom registers guaranteed to be serialized? |
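For reference, the CACR values under discussion can be put together from named bits. This is a minimal Python sketch, not startup code: the bit names follow the 68060 user's manual as far as I know them (the 68040 calls bits 31 and 15 DE and IE), the constant names are illustrative, and the question of what to do with the 68060 branch cache in an icache-only setup is not settled in this thread.

```python
# CACR bits (68060 names; positions assumed from the 68060 user's manual).
CACR_EDC = 1 << 31    # enable data cache        (68040: DE)
CACR_NAD = 1 << 30    # no-allocate mode, data cache (the bit getCPU probes)
CACR_ESB = 1 << 29    # enable store buffer
CACR_EBC = 1 << 23    # enable branch cache
CACR_CABC = 1 << 22   # clear all branch cache entries
CACR_EIC = 1 << 15    # enable instruction cache (68040: IE)

NAMES = {
    CACR_EDC: "EDC", CACR_NAD: "NAD", CACR_ESB: "ESB",
    CACR_EBC: "EBC", CACR_CABC: "CABC", CACR_EIC: "EIC",
}


def describe_cacr(value):
    """List the known bit names set in a CACR value, highest bit first."""
    return [name for bit, name in sorted(NAMES.items(), reverse=True) if value & bit]


# Value written by enableCaches in post #24.
full_setup = CACR_EDC | CACR_ESB | CACR_EBC | CACR_CABC | CACR_EIC

# Icache on, dcache off, as proposed above for a bootblock start.
icache_only = CACR_EIC

print(hex(full_setup), describe_cacr(full_setup))
print(hex(icache_only), describe_cacr(icache_only))
```

With only the instruction cache bit set, the same value works as a CACR write on both the 68040 and the 68060, since bit 15 has the same meaning on both.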
||