English Amiga Board


23 August 2018, 17:22   #181
grond
Registered User

 
Join Date: Jun 2015
Location: Germany
Posts: 653
Quote:
Originally Posted by plasmab View Post
Oh, I see you're assuming the CPU only uses 16-bit-wide instructions... that's not always the case. Often it has to wait for the operand before it can even start execution (e.g. a full address load is potentially 3 x 16 bits wide = (4+4+4) 12 clock cycles?), whereas with a faster bus you'd get that in 2+2+2+(2 for execution) = 8.
It cannot work like this in such a simple and small design. If you made the fetching of N-word instructions faster because there is nothing left to execute during N-1 of the fetch cycles, you would have to build a real pipeline with several stages and a variable-length execution stage. That simply was impossible within the tight limitations of 1979. Always remember: just 68000 transistors! And the designers were so proud they used so many that (according to a popular myth) they named the processor after it!
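plasmab's arithmetic in the quoted post can be written out as a small Python model. It simply takes the post's figures as assumptions: 4 clocks per 16-bit bus read on the stock 68000 and, for a hypothetical faster bus, 2 clocks per read plus 2 clocks of execution. The function and constant names are made up for illustration and do not describe any real chip.

```python
# Fetch-cost model using the figures from the quoted post (assumptions, not
# measured timings): 4 clocks per 16-bit read on the stock bus, and a
# hypothetical bus with 2-clock reads followed by 2 clocks of execution.

STOCK_CLOCKS_PER_WORD = 4
FAST_CLOCKS_PER_WORD = 2
FAST_EXEC_CLOCKS = 2


def stock_fetch_clocks(words: int) -> int:
    """Clocks spent only on reading an instruction of `words` 16-bit words."""
    return words * STOCK_CLOCKS_PER_WORD


def fast_bus_clocks(words: int) -> int:
    """Clocks to read the same words on the hypothetical bus, then execute."""
    return words * FAST_CLOCKS_PER_WORD + FAST_EXEC_CLOCKS


if __name__ == "__main__":
    # 3 words = opcode plus a 32-bit absolute address.
    for words in (1, 2, 3):
        print(f"{words} word(s): stock {stock_fetch_clocks(words)}, "
              f"hypothetical {fast_bus_clocks(words)}")
```

For the 3-word case this reproduces the post's 12 versus 8 clocks. Note that the stock figure counts only the reads, just as the post does, so it is not a full instruction timing.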
23 August 2018, 17:24   #182
plasmab
Banned
Join Date: Sep 2016
Location: UK
Posts: 2,917
Quote:
Originally Posted by grond View Post
It cannot work like this in such a simple and small design. If you made the fetching of N-word instructions faster because there is nothing left to execute during N-1 of the fetch cycles, you would have to build a real pipeline with several stages and a variable-length execution stage. That simply was impossible within the tight limitations of 1979. Always remember: just 68000 transistors! And the designers were so proud they used so many that (according to a popular myth) they named the processor after it!
Erm, no. When the first word is decoded you know you need N words for the operand. Since the prefetch is already getting the next one you need, you instruct it to keep going.

It's very simple. It's sad that you lack the ability to see this (there is no pipeline, just the argument fetch).

EDIT: Clearly the CPU was good enough. But this is my frustration. It could have been better very easily. Other CPUs of the time did this.

Last edited by plasmab; 23 August 2018 at 17:29.
23 August 2018, 17:33   #183
ross
Per aspera ad astra
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 49
Posts: 2,166
Quote:
Originally Posted by meynaf View Post
By the way, just out of curiosity, what else would you use if available ?
In no particular order, and some are of very little usefulness (and if I thought about it more, I'm sure others would come to mind):

- a byte-position scrambler (OK, there's 'swap', but it is so limited...)
- a byte indexer (something like d(ax,dx.b))
- scaled index (020+ covered this, but not for bytes...)
- a better use for the low bit of the Bcc offset (in a word-aligned architecture, why permit jumps to odd addresses?!?); yes, I know this is not so simple, but it's surely possible
- a quick form of variable shift/rotate operations on [ea]
- a bits.x 'flipper', for a fast mirror effect
- some simple form of bitfield instruction (but 020+ covered it in a very good way)
- an Scc that sets a single bit could be useful
- ALU instructions with saturation
- cmove (conditional move)
23 August 2018, 17:36   #184
plasmab
Banned
Join Date: Sep 2016
Location: UK
Posts: 2,917
Or I can explain it so that even grond has a chance to understand.

68000 - JSR $40000000

Bus Unit = BU
Execution Unit = EU.

Code:
Cycle | 0     | 1    | 2    | 3    | 4    | 5    | 6
BU    | Wait  | Read | Wait | Read | Wait | Read | Wait
EU    | *Wait | Wait | Wait | Wait | Wait | Wait | Exec
DATA  | ????  | 4EB9 | ???? | 4000 | ???? | 0000 | ????
* = Maybe execute last instruction

With a better bus unit this could have been the outcome.

68000++ (TF Version) JSR $40000000

Code:
Cycle | 0     | 1    | 2    | 3    | 4
BU    | Wait  | Read | Read | Read | Wait
EU    | *Wait | Wait | Wait | Wait | Exec
DATA  | ????  | 4EB9 | 4000 | 0000 | ????
* = Maybe execute last instruction

The point is that the CPU does this anyway... it just takes longer.

Last edited by plasmab; 23 August 2018 at 17:49.
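The two tables differ only in how closely the bus unit can space its reads. A minimal Python sketch, assuming exactly that (one idle slot between reads on the stock part, none on the hypothetical "68000++"), regenerates the BU rows and shows the execute step moving from the seventh slot to the fifth. The slot model and the function names are illustrations of the post's tables, not a description of real 68000 internals.

```python
# Rebuild the two bus timelines from the tables, one entry per slot.
# Assumption taken from the tables: the stock bus unit idles one slot between
# reads (gap=2), the hypothetical "68000++" reads back to back (gap=1).
# Slot 0 is the leading Wait where the previous instruction may still execute.


def exec_slot(words: int, gap: int) -> int:
    """Index of the slot in which the execution unit can start."""
    first_read = 1
    last_read = first_read + (words - 1) * gap
    return last_read + 1


def bus_timeline(words: int, gap: int) -> list[str]:
    """Bus-unit activity per slot, up to and including the execute slot."""
    slots = ["Wait"] * (exec_slot(words, gap) + 1)
    for i in range(words):
        slots[1 + i * gap] = "Read"
    return slots


if __name__ == "__main__":
    # Opcode word 4EB9 followed by the two address words 4000 and 0000.
    for name, gap in (("stock", 2), ("68000++", 1)):
        print(name, bus_timeline(3, gap), "exec in slot", exec_slot(3, gap))
```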
23 August 2018, 17:50   #185
meynaf
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 46
Posts: 3,600
Quote:
Originally Posted by ross View Post
In no particular order, and some are of very little usefulness (and if I think about it better, surely others would come):
How strange, most of them are suggestions I made quite a long time ago.


Quote:
Originally Posted by ross View Post
- a scrambler for bytes position (ok 'swap', but is so limited..)
ColdFire's BYTEREV?
Move to/from register parts?
(A generic byte scrambler inside registers would have a big shortcoming: you must transfer from memory to a register before moving bytes to the right place.)
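To make the difference concrete, here is a small Python model contrasting the 68000's SWAP (exchange the two 16-bit halves of a data register) with a full byte reversal of the kind ColdFire's BYTEREV is mentioned for. Registers are modelled as 32-bit unsigned integers, and the function names are illustrative only.

```python
# Registers modelled as 32-bit unsigned integers.
MASK32 = 0xFFFFFFFF


def swap(reg: int) -> int:
    """68000 SWAP: exchange the upper and lower 16-bit halves."""
    reg &= MASK32
    return ((reg >> 16) | (reg << 16)) & MASK32


def byterev(reg: int) -> int:
    """Full byte reversal: byte 3 <-> byte 0, byte 2 <-> byte 1."""
    return int.from_bytes((reg & MASK32).to_bytes(4, "big"), "little")


if __name__ == "__main__":
    print(hex(swap(0x12345678)))     # halves exchanged
    print(hex(byterev(0x12345678)))  # all four bytes reversed
```

SWAP alone only moves whole words, so reaching an arbitrary byte on the 68000 typically needs rotates (ROR/ROL) as well.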


Quote:
Originally Posted by ross View Post
- a byte indexer (something like d(ax,dx.b))
Oh yeah! Missing since the first day I started coding on the 68000!
(Besides, I would prefer an unsigned version.)
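The 68000 already has d8(An,Xn) with a word or longword index; the suggestion is a byte-sized one. A minimal Python sketch of the effective-address calculation, where the ".b" size and the unsigned option are the hypothetical extension and the function name and `signed` flag are made up:

```python
MASK32 = 0xFFFFFFFF
INDEX_BITS = {"b": 8, "w": 16, "l": 32}  # ".b" is the hypothetical addition


def indexed_ea(an: int, d8: int, xn: int, size: str, signed: bool = True) -> int:
    """Effective address for d8(An,Xn.size).

    The index register is truncated to `size`, then sign- or zero-extended
    (the real 68000 always sign-extends a .w index).
    """
    if not -128 <= d8 <= 127:
        raise ValueError("d8 must fit in a signed byte")
    bits = INDEX_BITS[size]
    index = xn & ((1 << bits) - 1)
    if signed and index >= 1 << (bits - 1):
        index -= 1 << bits
    return (an + d8 + index) & MASK32


if __name__ == "__main__":
    table, d0 = 0x1000, 0x000000F0
    print(hex(indexed_ea(table, 0, d0, "b")))                # signed byte: 0xff0
    print(hex(indexed_ea(table, 0, d0, "b", signed=False)))  # unsigned byte: 0x10f0
```

With the unsigned variant, a byte value in a data register could index a 256-entry table directly, without first clearing the register's upper bits.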


Quote:
Originally Posted by ross View Post
- a better usage for the low bit bcc offset (in a word-aligned architecture why permit to jmp on odd adress?!?); yes I know this is not so simple but sure possible
The design reason was that they didn't want to close the door to byte encoding...


Quote:
Originally Posted by ross View Post
- a quick form for variable shift/rol operation on [ea]
Quite useful. But not really encodable in a clean way for 68k.


Quote:
Originally Posted by ross View Post
- a bits.x 'flipper', for fast mirror effect
Like ColdFire's BITREV instruction, but extended to all 3 sizes?
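A minimal Python model of a size-generic bit reversal (BITREV extended to .b/.w/.l, as suggested here), plus why it matters for a mirror effect: to flip a one-bitplane line horizontally, each 16-bit word has its bits reversed and the words are taken in reverse order. The function names are illustrative.

```python
def bitrev(value: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of `value` (bits = 8, 16 or 32)."""
    value &= (1 << bits) - 1
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)  # move the current LSB into result
        value >>= 1
    return result


def mirror_line(words: list[int]) -> list[int]:
    """Horizontally mirror one bitplane line stored as 16-bit words,
    leftmost pixel in the most significant bit of the first word."""
    return [bitrev(w, 16) for w in reversed(words)]


if __name__ == "__main__":
    print(hex(bitrev(0x12345678, 32)))
    print([hex(w) for w in mirror_line([0xF000, 0x0000])])
```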


Quote:
Originally Posted by ross View Post
- some simple form of bitfield instruction (but 020+ covered it in a very good way )
A reverse mode added to bitfields, numbering bits like BTST does?


Quote:
Originally Posted by ross View Post
- an scc for set a single bit can be useful
I have bs<cc> macros in some include file.
The main problem with this one is finding an encoding.
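The macros themselves aren't shown, so this is only a guess at the semantics: a hypothetical BS<cc> that sets a chosen bit when the condition holds and clears it otherwise, the single-bit counterpart of Scc writing $FF or $00 to a byte. Only a handful of condition codes are modelled, using the standard 68000 flag tests; `bs_cc` and `CONDITIONS` are illustrative names.

```python
# Condition tests for a few 68000 condition codes, evaluated on N/Z/V/C flags.
CONDITIONS = {
    "eq": lambda f: f["Z"],
    "ne": lambda f: not f["Z"],
    "cs": lambda f: f["C"],
    "cc": lambda f: not f["C"],
    "mi": lambda f: f["N"],
    "pl": lambda f: not f["N"],
}


def bs_cc(value: int, bit: int, cc: str, flags: dict[str, bool]) -> int:
    """Hypothetical BS<cc>: bit `bit` of a 32-bit value takes the condition's value."""
    mask = 1 << bit
    if CONDITIONS[cc](flags):
        return (value | mask) & 0xFFFFFFFF
    return value & ~mask & 0xFFFFFFFF


if __name__ == "__main__":
    flags = {"N": False, "Z": True, "V": False, "C": False}  # e.g. after testing a zero
    print(bs_cc(0, 3, "eq", flags))  # Z is set, so bit 3 is set
```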


Quote:
Originally Posted by ross View Post
- ALU instructions with saturation
Like converting a longword to a signed/unsigned byte or word?


Quote:
Originally Posted by ross View Post
- cmove
Hmm... This is the only one I don't see the point of.
After all, it's just a branch around a move...
23 August 2018, 18:20   #186
roondar
Registered User
Join Date: Jul 2015
Location: The Netherlands
Posts: 1,364
Quote:
Originally Posted by meynaf View Post
How strange, most of them are suggestions i made quite a long time ago

Coldfire's byterev ?
Move to/from register parts ?
(Generic byte scrambler inside registers would have a big shortcoming : must transfer from memory to register before moving at right place.)
A move or similar command that would allow direct access to the individual bytes in a register is something I would personally find very handy.
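What direct access to the individual bytes of a register would save is easiest to see as bit manipulation. In this Python sketch (illustrative names; lane 0 is the low byte that MOVE.B already reaches, lane 3 the top byte), reading or writing any other lane needs the shifting and masking that 68000 code today does with rotates, SWAP and AND/OR.

```python
MASK32 = 0xFFFFFFFF


def get_lane(reg: int, lane: int) -> int:
    """Read byte `lane` (0 = least significant) of a 32-bit register."""
    return (reg >> (8 * lane)) & 0xFF


def set_lane(reg: int, lane: int, value: int) -> int:
    """Write `value` into byte `lane`, leaving the other three bytes intact."""
    shift = 8 * lane
    cleared = reg & ~(0xFF << shift) & MASK32
    return cleared | ((value & 0xFF) << shift)
```

For example, `set_lane(0x12345678, 2, 0xAB)` gives `0x12AB5678`, a single operation in the proposed instruction but several on a stock 68000.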
23 August 2018, 18:21   #187
plasmab
Banned
Join Date: Sep 2016
Location: UK
Posts: 2,917
Quote:
Originally Posted by roondar View Post
A move or similar command that would allow direct access to the individual bytes in a register is something I would personally find very handy.
Yes. God, yes.

That's always a PITA, even in C/C++ land.
23 August 2018, 19:36   #188
ross
Per aspera ad astra
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 49
Posts: 2,166
Quote:
Originally Posted by meynaf View Post
- a scrambler for bytes position
Move to/from register parts ?
Yes, this suffices. Possibly extended to [ea].

Quote:
- a bits.x 'flipper', for fast mirror effect
Like coldfire's bitrev instruction, but extended to all 3 sizes ?
Yes!

Quote:
- some simple form of bitfield instruction
Reverse mode added to bitfields, for counting like btst ?
This, or some simple nibble operations...

Quote:
- ALU instructions with saturation
Like convert longword to signed/unsigned byte or word ?
https://en.wikipedia.org/wiki/Saturation_arithmetic
ADDS can suffice
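As a sketch of what an ADDS (add with saturation, as described on the linked page) would compute, assuming it clamps to the operand size instead of wrapping around; the function names and signatures are hypothetical:

```python
def adds(a: int, b: int, bits: int, signed: bool = True) -> int:
    """Add two in-range integers and clamp the result to a `bits`-wide type."""
    if signed:
        lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    else:
        lo, hi = 0, (1 << bits) - 1
    return max(lo, min(hi, a + b))


def add_wrap(a: int, b: int, bits: int) -> int:
    """Ordinary two's-complement signed ADD for comparison: the result wraps."""
    r = (a + b) & ((1 << bits) - 1)
    return r - (1 << bits) if r >= 1 << (bits - 1) else r
```

For 100 + 100 in a signed byte, `adds` gives 127 while `add_wrap` gives -56, the kind of overflow artefact saturation is meant to avoid.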

Quote:
- cmove
Hmm... The only one i don't see the point.
After all, it's just a branch around a move...
The point is that there is no branch! (a really slow operation...)
Often you want to store a value into a variable only under a specific condition.
With CMOVE you can do that without a branch.
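The branch-free idea behind CMOVE can be sketched with a mask: turn the condition into all-zeros or all-ones and blend the two values. On a stock 68000 a similar mask can be built from Scc (which writes $00 or $FF to a byte) followed by EXT.W and EXT.L; the Python below only models the data flow, and `select` is an illustrative name.

```python
MASK32 = 0xFFFFFFFF


def select(cond: bool, src: int, dst: int) -> int:
    """Return src if cond else dst, without a branch in the data path."""
    mask = -int(bool(cond)) & MASK32  # 0x00000000 or 0xFFFFFFFF, like Scc + EXT.W + EXT.L
    return (src & mask) | (dst & ~mask & MASK32)
```

Whether the mask version actually beats a short branch on a particular 68k model depends on its timings, since the mask sequence is itself several instructions long.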
23 August 2018, 19:55   #189
meynaf
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 46
Posts: 3,600
Quote:
Originally Posted by ross View Post
Yes, this suffice. Possibly extended to [ea]
Right, moving a register part to/from ea was the idea.
I'd also allow the high word, not only bytes.


Quote:
Originally Posted by ross View Post
Not sure. I've done arithmetic with saturation on ops like DCT, and if you want good precision then you'd better use a larger data type than the one you target, then clamp at the very end.
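The precision point can be shown with a small Python example (values chosen for illustration): saturating after every addition in a signed byte loses information that accumulating in a wider type and clamping once at the end keeps.

```python
def clamp8(x: int) -> int:
    """Clamp to the signed 8-bit range."""
    return max(-128, min(127, x))


def sum_saturating_each_step(values: list[int]) -> int:
    acc = 0
    for v in values:
        acc = clamp8(acc + v)  # saturate after every step, like a byte ADDS
    return acc


def sum_wide_then_clamp(values: list[int]) -> int:
    return clamp8(sum(values))  # Python ints are unbounded: clamp only once


if __name__ == "__main__":
    samples = [100, 100, -150]
    print(sum_saturating_each_step(samples), sum_wide_then_clamp(samples))
```

For [100, 100, -150] the first version gives -23, because the intermediate 200 was clipped to 127, while the second gives 50, the exact sum.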


Quote:
Originally Posted by ross View Post
The point is that there is no branch! (a really slow operation..).
Often you want to insert a value into a variable only to a specific condition.
With CMOVE you can without branch
So it's only a matter of speed ?
Then this can be transparently done in hardware, by "fusing" the branch with the move operation.
23 August 2018, 20:29   #190
ross
Per aspera ad astra
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 49
Posts: 2,166
Quote:
Originally Posted by meynaf View Post
Not sure. I've done arithmetic with saturation on ops like dct, and if you want good precision then you'd better use a larger data type than the one you target, then clamp at the very end.
Yes, this would be an extra; in fact you can do without it, and it would only be useful for some simple and fast SIMD operations.

Quote:
So it's only a matter of speed ?
Then this can be transparently done in hardware, by "fusing" the branch with the move operation.
A matter of speed, and not a small one.
In my opinion the gain could even reach double-digit percentages, because it's useful in tight loops.
23 August 2018, 21:33   #191
touko
Registered User
Join Date: Dec 2017
Location: france
Posts: 96
Just for the record, the HuC6280 (the PC Engine's CPU, which is a 65C02 variant) has some block transfer opcodes, and you can go up to 19833 bytes/frame (@60 Hz).

Last edited by touko; 23 August 2018 at 22:01.
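The figure can be roughly reproduced from the CPU clock and the cost of a block transfer. The numbers below are assumptions, not anything stated in the post: a HuC6280 running at about 7.16 MHz, and a block-transfer instruction (such as TII) costing 17 cycles of setup plus 6 cycles per byte, with one transfer using the whole frame and nothing else running.

```python
CPU_HZ = 7_159_090    # assumption: HuC6280 high-speed clock (~7.16 MHz)
FRAME_HZ = 60         # frame rate used in the post
SETUP_CYCLES = 17     # assumption: fixed cost of one block-transfer instruction
CYCLES_PER_BYTE = 6   # assumption: cost per byte moved


def max_bytes_per_frame(transfers: int = 1) -> int:
    """Upper bound on bytes moved per frame if the CPU does nothing else."""
    cycles = CPU_HZ // FRAME_HZ
    return (cycles - transfers * SETUP_CYCLES) // CYCLES_PER_BYTE


if __name__ == "__main__":
    print(max_bytes_per_frame())
```

Under these assumptions the ceiling comes out at 19,883 bytes per frame, within a few dozen bytes of the quoted figure; real code loses more to interrupts, loop overhead and splitting the copy into several transfers.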
24 August 2018, 18:17   #192
grond
Registered User
Join Date: Jun 2015
Location: Germany
Posts: 653
Quote:
Originally Posted by plasmab View Post
Or i can explain it so that even grond has a chance to understand.
You are as rude as you are ignorant.


Quote:
68000 - JSR $40000000
Have you ever had a look at the clock cycles this takes on the 68000?

JSR (An) -- 16c
JSR $xxxx.w -- 18c
JSR $xxxxxxxx.l -- 20c

This means that everything the JSR does takes 16 cycles in execution. Loading each extra instruction word adds 2 cycles. Big deal. You cut the execution down to 1 cycle to make the difference look big, but in reality it was nearly irrelevant.
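The argument can be checked with the three figures quoted above. This Python snippet takes those cycle counts as given and computes how many clocks each extension word adds and what share of the instruction they represent; the dictionary layout and function names are just for illustration.

```python
# Cycle counts as quoted in the post; extension words follow the opcode word.
JSR_CLOCKS = {"(An)": 16, "$xxxx.w": 18, "$xxxxxxxx.l": 20}
EXT_WORDS = {"(An)": 0, "$xxxx.w": 1, "$xxxxxxxx.l": 2}


def clocks_per_ext_word(mode: str) -> float:
    """Extra clocks per extension word, relative to JSR (An)."""
    return (JSR_CLOCKS[mode] - JSR_CLOCKS["(An)"]) / EXT_WORDS[mode]


def ext_share(mode: str) -> float:
    """Fraction of the instruction's clocks attributable to extension words."""
    return (JSR_CLOCKS[mode] - JSR_CLOCKS["(An)"]) / JSR_CLOCKS[mode]


if __name__ == "__main__":
    for mode in ("$xxxx.w", "$xxxxxxxx.l"):
        print(mode, clocks_per_ext_word(mode), f"{ext_share(mode):.0%}")
```

On these numbers the two address words of JSR $xxxxxxxx.l account for 4 of its 20 clocks (20%).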
  • Could a 16-bit implementation of the 68000 ISA be conceived that is faster than the original 68000? Yes, of course!
  • Could it be done with 1979 technology? No, it could not.

Just imagine how processor development was done in 1977-79:
  • No graphics workstations to use for development, you had to reserve time on a mainframe if you wanted to simulate something and then wait hours to days for the simulation results of even small circuits.
  • You had to draw transistor level schematics of small functional units by hand and convert them into typed netlists to put into the simulator. An experienced developer would rather calculate the circuit by hand.
  • You had to draw layouts by hand.
  • No place-and-route tools.
  • Very few metal layers so that you would sometimes have to route signals through semiconductor layers instead => bad timing.

Then you fail to understand that doubling the bus width wasn't just a copy'n'paste thing in the 70s. Just have a look at the size of a DIP 68000 and of a DIP 6502. Think about how much more capacitive load there must be on each pin of the 68000 in its huge package, and how much more drive strength it would need to drive that at the same clock frequency as a smaller capacitive load.

You don't understand that a finer granularity of the 68000's microprogram to squeeze out a few wasted cycles would have meant increasing the size of the microprogram. There would not have been enough space to store that larger microprogram on the die. What were the wafer sizes of the time? 4 inches? How many dice could they fit on one wafer?

In short: you have no clue about processor design in the historical and technical context.

If you look at something that real experts in the field did and that was considered a great achievement at its time, it is safe to assume that they understood what they did and why they did it and you as a mere hobbyist don't.
24 August 2018, 18:24   #193
plasmab
Banned
Join Date: Sep 2016
Location: UK
Posts: 2,917
68k details

People like you said that they couldn’t go to the moon in the 1960s... yet it was done.

I’m laughing that you can’t see this.

EDIT: sure it wouldn’t have made the CPU 100x faster. I never said it would. I sit staring at 4 clocks per bus cycle and look at ways to make that shorter. It’s what I do.

I don’t care about the history. It’s not about blaming people for not doing it. It’s a wish/frustration/fantasy because the chip is exactly the same today.... by that I mean the ones rolling out of the fabs at this very moment.

Again I’m laughing at you for not seeing this.

And if you don’t want me to be rude, try being less condescending in your replies. Pot calling the kettle black there, mate.

Last edited by plasmab; 24 August 2018 at 18:40.
24 August 2018, 18:49   #194
plasmab
Banned
Join Date: Sep 2016
Location: UK
Posts: 2,917
Quote:
Originally Posted by grond View Post
You are as rude as you are ignorant.
Mods please! Name calling is something I’ve been pulled up for. Consistency please!

Quote:
In short: you have no clue about processor design in the historical and technical context.

If you look at something that real experts in the field did and that was considered a great achievement at its time, it is safe to assume that they understood what they did and why they did it and you as a mere hobbyist don't.

Of course I’m not an expert here. I was barely born then. I never said I was. I just said I wanted it and that it would have improved things... and that other chips (albeit less sophisticated ones) could do this at the time.

I maintain it would have improved things. Perhaps not enough to justify the expense of it but it would have made it better.
24 August 2018, 20:24   #195
plasmab
Banned
Join Date: Sep 2016
Location: UK
Posts: 2,917
I realised I meant JMP in my example, by the way, rather than JSR, which of course would push something to the stack too... taking another 8 clock cycles... 4 more than necessary.
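The extra cost of JSR over JMP can be checked against the MC68000 timing tables. The JMP figures below (8, 10 and 12 clocks) are my reading of those tables and worth double-checking against the manual; the JSR figures are the ones quoted earlier in the thread. The difference is the return address: a 32-bit longword pushed as two 16-bit writes of 4 clocks each. Names are illustrative.

```python
JSR_CLOCKS = {"(An)": 16, "abs.w": 18, "abs.l": 20}  # as quoted in the thread
JMP_CLOCKS = {"(An)": 8, "abs.w": 10, "abs.l": 12}   # assumption: MC68000 tables
BUS_WRITE_CLOCKS = 4       # one 16-bit write cycle, no wait states
RETURN_ADDRESS_WORDS = 2   # a 32-bit return address is pushed as two words


def push_cost(write_clocks: int = BUS_WRITE_CLOCKS) -> int:
    """Clocks to push the return address at the given cost per bus write."""
    return RETURN_ADDRESS_WORDS * write_clocks


if __name__ == "__main__":
    for mode in JSR_CLOCKS:
        print(mode, JSR_CLOCKS[mode] - JMP_CLOCKS[mode])  # 8 in every mode
    # A hypothetical 2-clock bus would push the same longword in 4 clocks.
    print(push_cost(), push_cost(write_clocks=2))
```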
24 August 2018, 20:38   #196
modrobert
old bearded fool
Join Date: Jan 2010
Location: Bangkok
Age: 52
Posts: 525
Lots of positive vibes in this thread, improving with each page, hehe.
24 August 2018, 20:46   #197
plasmab
Banned
Join Date: Sep 2016
Location: UK
Posts: 2,917

Quote:
Originally Posted by modrobert View Post
Lots of positive vibes in this thread, improving for each page, hehe.


You aren’t allowed to have an opinion here. If you aren’t steeped in the archaeology of chip design tools you have no place voicing an opinion. You are ignorant and rude to suggest that it would have been nice to have had things differently. To criticise the way Motorola designed its chips is blasphemous!!! Blasphemy I tell you!!



There was a similar regime in Spain... run by the church I believe... wasn’t expecting them.

Last edited by plasmab; 24 August 2018 at 20:55.
24 August 2018, 21:22   #198
pipper
Registered User
Join Date: Jul 2017
Location: San Jose
Posts: 144
Question: is the Amiga at least using those 'empty bus cycles' for Agnus' DMA activity? Or am I conflating two things here?
24 August 2018, 21:26   #199
plasmab
Banned
Join Date: Sep 2016
Location: UK
Posts: 2,917

Quote:
Originally Posted by pipper View Post
Question: is the Amiga at least using those 'empty bus cycles' for Agnus' DMA activity? Or am I conflating two things here?

At the risk of getting shot down... they are used when the video, audio or blitter needs them. Other times they aren’t used. E.g. with sound off and no blitting going on, they are wasted in the vertical blanking period.

Hope my ignorant answer isn’t too rude!

EDIT: add floppy to that list
24 August 2018, 21:35   #200
ross
Per aspera ad astra
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 49
Posts: 2,166
Add Memory Refresh, Copper and Sprites to the list.

All times are GMT +2.