English Amiga Board


Old 27 August 2018, 15:32   #241
roondar
Registered User

 
Join Date: Jul 2015
Location: The Netherlands
Posts: 1,297
Quote:
Originally Posted by plasmab View Post
Indeed. The 020 doesn’t waste cycles so the chipset will be inserting waits.. as it should. It won’t wait accessing fastram though. The plain 68K would still access fastram like there was interleaved stuff going on.

This is the entire point.
I still don't really agree.

Which is to say, I do agree the CPU has large cycle counts per instruction, but I don't agree that the memory fetch is the only reason why this is. This is not to say the memory fetch is fast, but I don't agree that improving it the way you've suggested (i.e. memory bus = 1 cycle per word instead of 4) would've actually sped up the CPU all that much, if at all, given how the 68000 goes about executing instructions.

The TL;DR version of the stuff below is: because of the 68000's prefetch, improving RAM accesses (even when you get it as low as 1 cycle per access) is only viable if the rest of the CPU is also radically changed. In reality the CPU uses most (if not all) of the 4 cycle period for internal processing of instructions, and it needs more than 4 cycles to do so for more complicated instructions.

As for why this post got so long, well... I kinda got carried away. Some part of me finds this all way too interesting to be healthy

Long stuff follows:
My point here is that the 68000 does prefetching (read more here: http://pasti.fxatari.com/68kdocs/68kPrefetch.html). It can read and decode the next instruction while the current one is being executed. This means that during a 4 cycle memory fetch the CPU also executes the current instruction, unless more memory needs to be fetched.

As a very simple example (from the link above):
Quote:

Before anything else the NOP opcode needs to be fetched. Every 68000 bus access requires a minimum of four machine cycles. So nothing could be done for four full cycles. Then the opcode needs to be decoded. This would require another two cycles. And finally the instruction would be executed. In this case it would be just to update the PC and would take another two cycles (every 68000 internal processing requires at least two machine cycles). Note that no step could be performed before the previous one is finished.


So a NOP would have taken 8 cycles, twice as much as it really requires thanks to the prefetch.
Note here that even the simplest of instructions takes 4 internal cycles which all execute while the processor is also busy fetching new data. This essentially means that even if we could shorten the memory access per word to just 1 cycle, the NOP above would still take four cycles because the rest of the CPU can't execute it any faster and is already running concurrently with the fetch.
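The arithmetic in the quote can be sketched with a toy model. This is only an illustration under simplifying assumptions: the function names are made up here, and treating an instruction as "bus time overlapped with internal time" is a rough approximation, not a description of the real 68000 microcode.

```python
# Toy model of the NOP example. All names are hypothetical; the cycle
# figures are the ones given in the quoted prefetch document.
BUS_CYCLES_PER_WORD = 4   # minimum 68000 bus access
DECODE_CYCLES = 2         # decoding the opcode
EXECUTE_CYCLES = 2        # updating the PC


def nop_without_prefetch(bus_cycles=BUS_CYCLES_PER_WORD):
    """Fetch, decode and execute run strictly one after another."""
    return bus_cycles + DECODE_CYCLES + EXECUTE_CYCLES


def nop_with_prefetch(bus_cycles=BUS_CYCLES_PER_WORD):
    """The fetch of the next opcode overlaps the internal work, so the
    instruction costs whichever of the two takes longer."""
    internal = DECODE_CYCLES + EXECUTE_CYCLES
    return max(bus_cycles, internal)


print(nop_without_prefetch())   # 8
print(nop_with_prefetch())      # 4
print(nop_with_prefetch(1))     # 4: a 1 cycle bus changes nothing here
```

In this model the last line is the whole argument: once the fetch is hidden behind 4 cycles of internal work, making the fetch shorter does not make the NOP any faster.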

For me this leads to some conclusions/points/questions:
1) any 4 cycle instruction is not going to be sped up no matter what is changed in the memory interface, because the 68000 spends at least that many cycles internally executing even the simplest of instructions (see the prefetch document and the NOP example above).

Now, there are not many 4 cycle instructions, but in practice they are rather common in code.

2) It seems some extra internal processing is needed for instructions dealing with 32 bits as opposed to 16 bits.

Commands like add.w dx,dx vs add.l dx,dx and add.w dx,an* suggest 32 bit operations take a bunch more cycles even if no extra memory is accessed. This suggests to me that instructions operating on 32 bits are not likely to speed up with faster memory access even if they do fetch more words.

3) meanwhile, move.w dx,d(an), an instruction that essentially does an add.w dx,an as part of its execution, also costs 8 cycles, suggesting that the extra cycles the add costs are all spent internally as part of the move to memory and that it wouldn't actually speed up if memory were fetched any faster.

4) there are a whole bunch of more complicated commands that are also unlikely to see any speed up (f.ex. jsr/bsr/rte/trap/etc) for similar reasons - each requires the ALU in some capacity.

Note for instance that a bsr <offset.w> (which does 4 memory accesses) is only 2 cycles faster than a jsr <addr.l> (which does 5 memory accesses)*. If the 68000 was purely limited by bus access and not also limited by the ALU, the difference here should have been 4 cycles.
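The comparison above can be written out as a small calculation. This is a sketch only: the cycle counts and access counts are the ones given in this post, and the helper name bus_only_cycles is made up for illustration.

```python
# What would each instruction cost if only the bus mattered?
# (listed cycles, memory accesses) as given in this post.
BUS_CYCLES_PER_ACCESS = 4

timings = {
    "jsr xxxx.l": (20, 5),
    "bsr xxxx.w": (18, 4),
}


def bus_only_cycles(accesses, per_access=BUS_CYCLES_PER_ACCESS):
    """Cycle count if the instruction were limited purely by bus accesses."""
    return accesses * per_access


for name, (listed, accesses) in timings.items():
    bus = bus_only_cycles(accesses)
    print(name, "listed:", listed, "bus-only:", bus, "extra:", listed - bus)

listed_gap = timings["jsr xxxx.l"][0] - timings["bsr xxxx.w"][0]
bus_gap = bus_only_cycles(5) - bus_only_cycles(4)
print(listed_gap, bus_gap)   # 2 4
```

With these numbers the bsr spends 2 cycles more than its bus accesses alone would account for, which is the gap the argument attributes to internal (ALU) work.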

5) all this said, a few commands might see a performance increase with a new bus design, depending on how the 68000's internal operation works. Stuff like move.l dx,xxxx.l might speed up, as might the movem.w/l commands.

However, I must note here that it's not that easy to know exactly how many cycles are spent by the 68000 internally on these commands.

We do know this number is never going to be zero. So even with these fetch-heavy commands it's still possible the speed up will be fairly limited or even non-existent (f.ex. movem.w might need two cycles per word for pc updating and two cycles per word for moving the required data register into the output register, in which case it won't see any speed up at all even if memory accesses only took 1 cycle).
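How much a faster bus could help depends entirely on that unknown internal cost. A minimal sketch, assuming each transferred word costs the larger of its bus time and its internal time (the function names and the candidate internal costs are hypothetical, not measured values):

```python
# Speed up of a fetch-heavy instruction such as movem, per word moved,
# for different guesses at the internal cost per word.
def cycles_per_word(bus_cycles, internal_cycles):
    """Bus access and internal work overlap; the slower one dominates."""
    return max(bus_cycles, internal_cycles)


def speedup(internal_cycles, old_bus=4, new_bus=1):
    """Ratio of old to new per-word cost when only the bus gets faster."""
    old = cycles_per_word(old_bus, internal_cycles)
    new = cycles_per_word(new_bus, internal_cycles)
    return old / new


print(speedup(4))   # 1.0: the 2 + 2 cycle guess above, no gain at all
print(speedup(2))   # 2.0
print(speedup(1))   # 4.0: only reached if internal work is nearly free
```

Under this model the full 4x gain only appears if the internal work per word is as cheap as the new bus access itself.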

With all this in mind, I can see why you might think the 68000 has a poor memory interface, but I feel this is born in part due to not also looking at what the rest of the CPU is actually doing.

As it is, the 68000 memory interface is fast enough for what the rest of the CPU can keep up with. That is really all that is ever needed; anything faster is IMHO just bad design (in the form of over-engineering).

*)
Code:
add.w dx,dy = 4 cycles
add.l dx,dy = 8 cycles
add.w dx,an = 8 cycles ; this auto extends to operating on the full 32 bits - no adds/subs/moves on address registers are ever truly 16 bit on a 68000.
move.w dx,d(an) = 8 cycles

jsr xxxx.l = 20 cycles (3 reads/2 writes)
bsr xxxx.w = 18 cycles (2 reads/2 writes)
Cycle times as per http://oldwww.nvg.ntnu.no/amiga/MC68...timjmpetc.HTML
roondar is offline  
Old 27 August 2018, 15:44   #242
plasmab
Banned
plasmab's Avatar
 
Join Date: Sep 2016
Location: UK
Posts: 2,917
68k details

TL;DR

I never said it was the only reason. Just that it idles the bus half the time. When you think of things like a 32 bit reg write to ram.. that’s easy mode to do in 4 clock cycles. 68000 takes 8.

It’s not the whole story. But it would help.
plasmab is offline  
Old 27 August 2018, 15:49   #243
plasmab
Banned
plasmab's Avatar
 
Join Date: Sep 2016
Location: UK
Posts: 2,917
68k details

I actually implemented an Atari ST from scratch in verilog in 2012. I had to use DDR2 ram and HDMI out on the board I was using.... it was a challenge. The core was ultimately cherry picked for the best bits and they ended up in MiST.

I’ve seen and measured the effects of increasing the CPU ram speed (the micro coded ao68000).

So I really do know what I’m talking about.

So agree or disagree all you want.. until you build what I’m describing my opinion will not change. I have built and measured. Therefore I know.

[ Show youtube player ]

It’s not an entirely fair test because the cycle timings aren’t always the same. But it’s good enough to prove the bus is baws.

Last edited by plasmab; 27 August 2018 at 16:10.
plasmab is offline  
Old 27 August 2018, 16:31   #244
roondar
Registered User

 
Join Date: Jul 2015
Location: The Netherlands
Posts: 1,297
Quote:
I never said it was the only reason. Just that it idles the bus half the time. When you think of things like a 32 bit reg write to ram.. that’s easy mode to do in 4 clock cycles. 68000 takes 8.

It’s not the whole story. But it would help.
Apart from the fact it probably wouldn't help unless the rest of the CPU was completely redesigned to make your scheme work, but hey you didn't read the part of my post where I pointed that out, complete with evidence so I guess you just didn't realise that bit.

Quote:
So I really do know what I’m talking about.
Arguing from authority does not make one correct.

In this case, considering you're not even trying to explain how your improved bus would overcome the other architectural problems the 68000 has means you're proving nothing of the sort.

Oh and reimplementing an architecture in Verilog with all the niceties of modern technology doesn't mean anything for whether or not your scheme was possible to do in 1979. Considering it is apparently easy mode but still not done...

That's a pretty strong indicator it wasn't possible. Much stronger than your position they should've done it because you reimplemented their work 35 years later and managed to change it at any rate.
roondar is offline  
Old 27 August 2018, 16:36   #245
plasmab
Banned
plasmab's Avatar
 
Join Date: Sep 2016
Location: UK
Posts: 2,917
68k details

Your post was a novel. I didn’t read it.

Yes you’d need to redesign the chip. Never contested that.
plasmab is offline  
Old 27 August 2018, 16:42   #246
plasmab
Banned
plasmab's Avatar
 
Join Date: Sep 2016
Location: UK
Posts: 2,917
68k details

Quote:
Originally Posted by roondar View Post
Apart from the fact it probably wouldn't help unless the rest of the CPU was completely redesigned to make your scheme work, but hey you didn't read the part of my post where I pointed that out, complete with evidence so I guess you just didn't realise that bit.


Arguing from authority does not make one correct.

In this case, considering you're not even trying to explain how your improved bus would overcome the other architectural problems the 68000 has means you're proving nothing of the sort.

Oh and reimplementing an architecture in Verilog with all the niceties of modern technology doesn't mean anything for whether or not your scheme was possible to do in 1979. Considering it is apparently easy mode but still not done...

That's a pretty strong indicator it wasn't possible. Much stronger than your position they should've done it because you reimplemented their work 35 years later and managed to change it at any rate.


I never said they should have done this!!! I’m saying it would have improved things if they had.

Are you unable to appreciate the distinction?
plasmab is offline  
Old 27 August 2018, 16:45   #247
roondar
Registered User

 
Join Date: Jul 2015
Location: The Netherlands
Posts: 1,297
Quote:
Originally Posted by plasmab View Post
I never said they should have done this!!! I’m saying it would have improved things if they had.

Are you unable to appreciate the distinction?
I am indeed able to do so. And if you had read my novel, you'd know my main point in said novel was that it actually wouldn't have improved anything to any great degree (without that redesign I pointed out).

Unless they did do a full redesign. And my point was that I don't buy that this was possible at the time, so it's silly to argue that it could've been better.
roondar is offline  
Old 27 August 2018, 16:45   #248
plasmab
Banned
plasmab's Avatar
 
Join Date: Sep 2016
Location: UK
Posts: 2,917
Ok. You clearly are missing my point. I shall disengage now.
plasmab is offline  
Old 27 August 2018, 16:55   #249
roondar
Registered User

 
Join Date: Jul 2015
Location: The Netherlands
Posts: 1,297
To be fair I feel both of us are missing a point as well as making a point here.
But whatever. You are correct in that this has gone on long enough.

So we do agree on some things :P



Edit: I'm going to go one further and simply say this. Need to get it off my chest.



First: I am not a hardware expert. It's best I don't talk about such things because well, I'll get a lot wrong.

Second: What happened in this thread doesn't paint the best picture of me. I went into raging fanboy mode. I'm not even sure why, but I did. I actually got angry. Over a silly forum thread about ages old processors no one really cares about. It's not worth it, so...

To prevent me from posting yet more overly complicated rants in this thread, I'm just going to leave this thread now. I never wanted or intended to create friction and I feel I may have done.

I'll keep to threads about software from now on. At least there I can say I do know a few things.

Last edited by roondar; 27 August 2018 at 17:20.
roondar is offline  
Old 28 August 2018, 08:17   #250
drHirudo
Amiga user
drHirudo's Avatar
 
Join Date: Nov 2008
Location: Sofia / Bulgaria
Posts: 210
Quote:
Originally Posted by roondar View Post
To be fair I feel both of us are missing a point as well as making a point here.
But whatever. You are correct in that this has gone on long enough.

So we do agree on some things :P
The point is that the 68000 processor was designed to be easy to use with high level languages and compilers, hence the many registers, the CISC architecture, the relocatable stack and so on.
drHirudo is offline  
Old 28 August 2018, 19:05   #251
touko
Registered User

touko's Avatar
 
Join Date: Dec 2017
Location: france
Posts: 93
Quote:
I'm not sure a HU6820 can be called a 'normal' 6502.
You're right, it's in fact a custom 65C02 and not a vanilla 6502 .

Quote:
When I called it 'basically a faster 6502 with a block move' on my website I got a number of e-mails from people telling me the differences were much bigger than I had anticipated.
Yeah, even though it belongs to the 65xxx family, you can classify it as a better 65C02, which is more powerful than a classic 6502 but is still an 8 bit CPU.
But the RAM access times are more or less identical, which has to be taken into account if you are comparing 68k access times with 65xxx ones.
touko is offline  
Old 29 August 2018, 11:22   #252
hooverphonique
ex. demoscener "Bigmama"
 
Join Date: Jun 2012
Location: Fyn / Denmark
Posts: 949
Quote:
Originally Posted by plasmab View Post
Indeed. You just wire the address up backwards. Easy mode.

Didn’t think the A3000 did this. I saw a video interviewing Haynie and he mentioned it never made it in?

I’ve never owned an A3000

Neither have I


Yes, the buster chip supports static column page mode, and the kickstart enables it (and cache burst) if you have an 030, as far as I'm aware.
hooverphonique is offline  
Old 29 August 2018, 16:39   #253
chaos
Registered User

chaos's Avatar
 
Join Date: Mar 2013
Location: Slovenia
Posts: 135
Quote:
Originally Posted by plasmab View Post
I actually implemented an Atari ST from scratch in verilog in 2012. I had to use DDR2 ram and HDMI out on the board I was using.... it was a challenge. The core was ultimately cherry picked for the best bits and they ended up in MiST.
Is this true? First time I hear it. I was under the impression that Till Harbaum wrote The ST core for the MiST board, basically from scratch. I made a quick check and you aren't attributed in the source files. Is there anywhere I could see your ST code, for comparison?
chaos is offline  
Old 29 August 2018, 16:41   #254
plasmab
Banned
plasmab's Avatar
 
Join Date: Sep 2016
Location: UK
Posts: 2,917
Quote:
Originally Posted by chaos View Post
Is this true? First time I hear it. I was under the impression that Till Harbaum wrote The ST core for the MiST board, basically from scratch. I made a quick check and you aren't attributed in the source files. Is there anywhere I could see your ST code, for comparison?
*cough*

Didn't look very hard, did you...

https://github.com/mist-devel/mist-b...st/mfp_timer.v

also...

https://github.com/mist-devel/mist-b.../Contributions


You really didn't look very hard. Gotta love that accusation from someone who pretended to check.

EDIT: As I say, Till cherry picked some bits from my core, which I abandoned when I realised how much more advanced his was.

Last edited by plasmab; 29 August 2018 at 17:03.
plasmab is offline  
Old 29 August 2018, 17:12   #255
chaos
Registered User

chaos's Avatar
 
Join Date: Mar 2013
Location: Slovenia
Posts: 135
Chill man, no one is accusing you of anything. Like I wrote, I just made a quick check - that is, I opened some files and looked at the copyright notice.

I have no stake in this core btw, I was just curious... And hoped that there might be a different (maybe better) implementation of the ST core ...
chaos is offline  
Old 29 August 2018, 17:25   #256
plasmab
Banned
plasmab's Avatar
 
Join Date: Sep 2016
Location: UK
Posts: 2,917
Quote:
Originally Posted by chaos View Post
Chill man, no one is accusing you of anything. Like I wrote, I just made a quick check - that is opened some files, and looked at the copyright notice.

I have no stake in this core btw, I was just curious... And hoped that there might be a different (maybe better) implementation of the ST core ...
I've not built that core in 5 years. Till's is by far the best. I abandoned the ST core and developed the Archimedes core for MiST instead to prevent duplication of effort.
plasmab is offline  
Old 29 August 2018, 19:41   #257
Megol
Registered User

Megol's Avatar
 
Join Date: May 2014
Location: inside the emulator
Posts: 368
Quote:
Originally Posted by plasmab View Post
You really didnt look very hard. Gotta love that accusation from someone who pretended to check.
Stop being confrontational please thank you very much. It's irritating.

Last edited by Megol; 29 August 2018 at 20:12. Reason: It's better this way
Megol is offline  
Old 29 August 2018, 20:23   #258
plasmab
Banned
plasmab's Avatar
 
Join Date: Sep 2016
Location: UK
Posts: 2,917
Quote:
Originally Posted by Megol View Post
Stop being confrontational please thank you very much. It's irritating.

It’s also irritating and confrontational when someone starts a post with “Is this true? First time I hear it.”
plasmab is offline  
Old 29 August 2018, 21:45   #259
Bruce Abbott
Registered User

Bruce Abbott's Avatar
 
Join Date: Mar 2018
Location: Hastings, New Zealand
Posts: 254
Quote:
Originally Posted by plasmab View Post
it idles the bus half the time. When you think of things like a 32 bit reg write to ram.. that’s easy mode to do in 4 clock cycles. 68000 takes 8.
You could argue that the 68000 should be able to do it in 4 cycles (2 cycles per memory access) but by that reasoning the 6502 should be able to do it in half a cycle. The truth is that the 68000 and 6502 both 'idle' the bus half the time, which is fortunate because we can use that 'idle' time to interleave video display access.

The Z80 is an example of a CPU that doesn't 'idle' the bus half the time, which is why some memory cycles take less than 4 T states (eg. LD HL,nn is 3 memory cycles but only 10 T states). But this turns out to be a pain because you can't share video memory synchronously without forcing the CPU into always using 4 T states per memory cycle. And if it misses by just 1 T state it might have to wait for another 3, which significantly slows down some instructions.
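The two effects described above can be put into numbers. A rough sketch, assuming the shared-memory scheme gives the CPU one access slot every 4 T states starting at multiples of 4 (the slot layout and the function name wait_states are assumptions for illustration, not any particular machine's design):

```python
# Z80 memory cycles for LD HL,nn: opcode fetch (4 T states) plus two
# operand reads (3 T states each).
SLOT = 4                 # assumed length of one CPU memory slot in T states
ld_hl_nn = [4, 3, 3]


def wait_states(t_state, slot=SLOT):
    """T states to wait until the next slot boundary."""
    return (-t_state) % slot


print(sum(ld_hl_nn))                         # 10 T states when running free
print(sum(max(m, SLOT) for m in ld_hl_nn))   # 12 with every cycle forced to 4
print(wait_states(4), wait_states(5))        # 0 3: one T state late costs 3
```

Forcing every memory cycle to 4 T states costs this instruction 2 extra T states, and an access that arrives one T state after a slot boundary waits 3 more.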

The first computer I designed from scratch used a 1MHz 6802 and a MC6847 Video controller. I implemented an interleaved shared memory system where the 6847 accessed RAM during the first half of the CPU cycle when the E clock is low. To do this I made a basic PLL circuit which 'bumped' the 6847's clock until it was synchronized with the 6802. This solution was simple and elegant, and by using the latest 2K CMOS static RAM chips I also avoided having to make a DRAM controller.

You can't do that with a Z80. You either have to stretch the Z80's clock input like the ZX Spectrum does, or add wait states like the Amstrad CPC and Mattel Aquarius do, or don't bother synchronizing and get screen corruption whenever the CPU accesses video memory like the VZ200 did.

The Amiga's 68000 bus interface is actually pretty efficient considering the limitations of RAM cycle time, CPU random access (which prohibits using page mode) etc. All the 'idle' slots can be used by the custom chips, except for a small amount of 'wasted' time during horizontal and vertical blanking. The blitter shares slots with the CPU but only when both are accessing chip RAM, and you have control over how much the CPU has to give up.

If you want to see some horrible bus designs then look at other home computers such as the Atari 800 and C64. The Atari is so bad that it has to blank the screen while accessing the floppy drive! The C64 manages to avoid this, but only by dramatically slowing down the drive. I already mentioned the Acorn Electron, which uses a miserable four 64kx1 DRAMs which have to do two memory cycles for each byte - making the CPU up to 6 times slower in 'high' resolution modes. And early 16 bit PCs weren't that great either, since the 8086 has many 8 bit instructions which misalign the bus about half the time.
Bruce Abbott is offline  
Old 29 August 2018, 21:47   #260
Gorf
Registered User

 
Join Date: May 2017
Location: Munich/Bavaria
Posts: 809
Quote:
Originally Posted by plasmab View Post
Indeed. You just wire the address up backwards. Easy mode.

Didn’t think the A3000 did this. I saw a video interviewing Haynie and he mentioned it never made it in?

I’ve never owned an A3000
It does, memory bandwidth proves it - proud owner of two A3000
Gorf is offline  
 

