14 December 2020, 18:26 | #21 | ||||||||
Registered User
Join Date: May 2017
Location: Munich/Bavaria
Posts: 2,425
|
nope.
it is just a 32bit write to the "other" half of a 64bit register, as mentioned before. The former hidden entry is overwritten and becomes visible. The old entry of the register is now in the hidden part. Quote:
In my case EXG d0,d1 could decode to move d0,d1 moveh d1,d0 (moveh being a move of the hidden part of the register, as micro-code that is not exposed as an op-code). That is of course up to the actual hardware implementation, if there ever is one... Quote:
Quote:
addnib color1, color2 And you solved the problem of the other thread with just one operation ... I call that a trick. Quote:
Quote:
Quote:
Quote:
You can speed things up by stopping the log and not using that command in your interrupt handlers and/or kernel, if that is what you prefer. Quote:
Last edited by Gorf; 14 December 2020 at 18:56. |
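The write behaviour and the EXG decoding described in this post can be sketched as a small Python model. This is only an illustration under stated assumptions: the class name `ShadowReg`, the function `exg`, and the restriction to full 32-bit writes are made up for the sketch, and `moveh` is the hypothetical micro-op from the post, not an existing instruction.

```python
MASK32 = 0xFFFFFFFF

class ShadowReg:
    """Logical view of one 64-bit register: a visible and a hidden 32-bit half."""

    def __init__(self, value=0):
        self.visible = value & MASK32
        self.hidden = 0

    def write(self, value):
        # The write overwrites the hidden entry, which then becomes visible;
        # the previously visible value ends up in the hidden part.
        self.visible, self.hidden = value & MASK32, self.visible

def exg(a, b):
    """EXG a,b decoded as 'move a,b' followed by the hypothetical 'moveh b,a'."""
    b.write(a.visible)  # move a,b: the old value of b drops into b's hidden half
    a.write(b.hidden)   # moveh b,a: copy b's hidden half (the old b) into a

d0, d1 = ShadowReg(0x11111111), ShadowReg(0x22222222)
exg(d0, d1)
print(hex(d0.visible), hex(d1.visible))
```

In this model the exchange needs no temporary register, because the value displaced by the first write is still reachable in the hidden half.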
||||||||
14 December 2020, 19:50 | #22 | |||||
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,351
|
Quote:
Quote:
In HW you don't need logical gates to change the order of bits. Quote:
The operations were there before. They are general purpose. They have a defined encoding. I simply used them. In less than a minute i can fire up my vm debugger, assemble these instructions somewhere in memory, and trace them to see their effect. If you call that a trick, we'd better stop this discussion right now. Still more work than reading "normal" code. Quote:
Quote:
In your starting post you said you wanted to know how useful your ideas would be for real assembler coders. You have your answer, even though it's not at all the one you wanted. |
|||||
14 December 2020, 22:11 | #23 | ||||||
Registered User
Join Date: May 2017
Location: Munich/Bavaria
Posts: 2,425
|
Quote:
Quote:
Quote:
Quote:
Quote:
Quote:
You already said earlier: "And as a coder i don't see the point in any of the 3, sorry." And I did not object to that - I was just answering your questions and clarifying things. So thank you for your feedback and for thinking about my ideas. And since you are the only one who did, I guess there is no benefit for others either. |
||||||
14 December 2020, 22:36 | #24 | |
Registered User
Join Date: Mar 2012
Location: Norfolk, UK
Posts: 1,157
|
Quote:
On the FPGAs I'm targeting the block RAM size is typically 9kbit, so a 16 x 32bit register file is quite wasteful; making use of those otherwise wasted bits would be interesting, so your stack idea did pique my interest. A simpler idea might be simply to give the CPU a complete shadow register file for exception processing. (In my own simplistic CPU project I used 8-bit opcodes in an effort to maximise code density. The result - for compiled C code at least - was in the same ballpark as i386 and m68k and significantly better than the various 32-bit RISC architectures - but the compressed RISC architectures did significantly better.) |
|
14 December 2020, 23:06 | #25 | |||
Registered User
Join Date: May 2017
Location: Munich/Bavaria
Posts: 2,425
|
Quote:
Quote:
There is a little-endian 68k FPGA implementation that holds up to 512 shadow registers for interrupts and tasks. Depending on the RAM interface it might be enough to have one or two of these for interrupt handling and kernel. But this is a feature that does not really matter much to the coder of an application, only for OS development or real-time stuff. Quote:
|
|||
15 December 2020, 10:24 | #26 | ||
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,351
|
If that 'no' means "there is no extra bit to handle", then it implies "they are exchanged by value", and then exg has to move 128 bits around, not 64.
Actually, every write will move 32 bits more than before (instead of just changing a single bit which will say where to fetch the actual value). Only signal routing. Quote:
Your ideas change the overall architecture of the cpu -- it's not as if it was just adding a new instruction. This has a big impact, and one that isn't easy to estimate. In the meantime, a few extra instructions could give a bigger gain. Normal code is already hard enough to handle. No need to add another layer. On the contrary, changes that enhance code readability are welcome. Higher than what ? Than bare 68k ? Not that hard. Try to beat me, now Quote:
Nevertheless i think you should experiment more with it. Document the full ISA. Write a fully featured VM, an assembler, a debugger. There is a lot to be learnt in the process. |
||
15 December 2020, 13:14 | #27 |
Thalion Webshrine
Join Date: Jan 2004
Location: Oxford
Posts: 14,448
|
Slightly off topic, but have you been watching the PiStorm project? It's a software-defined 680x0 replacement that uses a Raspberry Pi and a carrier board to plug into a 68000 socket. Currently getting 040@25MHz speeds with an RPI3.
Because the CPU is software defined running on an ARM processor, you can add or manipulate 680x0 instructions very easily. The project is very ambitious, with the intent to expose all SoC peripherals to the virtual 680x0 & AmigaOS. Mass storage and 32-bit FastRAM already working, Wifi etc in the works. And it's cheap and open source Last edited by alexh; 15 December 2020 at 13:20. |
15 December 2020, 14:08 | #28 | |||||
Registered User
Join Date: May 2017
Location: Munich/Bavaria
Posts: 2,425
|
Quote:
Only 32 bits are written. The rest is "only signal routing". No, really: it is just a state that gets flipped at every write access. In the simplest possible way, this would be a switch or a flip-flop determining which lines to use. And no: we do not need an extra register and we do not need to preserve this state on a context switch: the order in which the registers are written back determines this automatically. Quote:
It (EXG) is actually a very similar situation, but here you have no problem in seeing it as "simple". Besides the actual wiring you need of course some logic as well, telling the CPU when and what to do if this command is issued ... but ok. Quote:
I think I came up with some not so obvious ideas that could have a big impact - positive and negative - on code and performance. Quote:
That's the kind of feedback I was looking for. Maybe I can provide a tool - some editor plugin - to compensate for this and make these features more transparent. The "repeat" feature could look to the coder just like the original instructions in a different color or with a bracket around them... All "pop" instructions would have a comment, pointing you to the last writing command on that register ... Quote:
My ideas are not colliding with better instructions - both can be done simultaneously. Last edited by Gorf; 15 December 2020 at 14:23. |
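The flip-flop argument in this post can be sketched as follows. This is a minimal Python model, assuming full 32-bit writes only; the names `FlipFlopReg`, `save` and `restore` are hypothetical and exist only for this illustration. It shows one way to read the claim that the selector state does not have to be saved on a context switch: writing the hidden value back first and the visible value second recreates the same logical contents, whatever the selector bit happens to be.

```python
MASK32 = 0xFFFFFFFF

class FlipFlopReg:
    """Two physical 32-bit halves plus one selector bit (the flip-flop)."""

    def __init__(self):
        self.halves = [0, 0]
        self.sel = 0  # index of the currently visible half

    def write(self, value):
        self.sel ^= 1                           # the state flips on every write
        self.halves[self.sel] = value & MASK32  # only 32 bits are written

    def read(self):
        return self.halves[self.sel]

    def read_hidden(self):
        return self.halves[self.sel ^ 1]

def save(reg):
    """Context save: store both halves, but not the selector bit."""
    return (reg.read_hidden(), reg.read())

def restore(reg, saved):
    """Context restore: the write order alone decides what ends up visible."""
    hidden, visible = saved
    reg.write(hidden)   # pushed into the hidden half by the next write
    reg.write(visible)

r = FlipFlopReg()
for v in (1, 2, 3):
    r.write(v)          # visible is now 3, hidden is 2

fresh = FlipFlopReg()
restore(fresh, save(r))
print(fresh.read(), fresh.read_hidden(), r.sel, fresh.sel)
```

After the restore both registers show the same visible and hidden values even though their selector bits differ, which is the point of the sketch. Whether real hardware can do this without extra access time is exactly what the thread is arguing about, and the model says nothing on that.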
|||||
15 December 2020, 14:20 | #29 | |
Registered User
Join Date: May 2017
Location: Munich/Bavaria
Posts: 2,425
|
Quote:
But for CPU design (or better, VM design at this point) the PiStorm would have no benefit ... it would much more likely be a pain in the ass to try this on real hardware. And my ideas are definitely slowing things down in software, so some of the FPGA boards make much more sense here, once I try to implement this in hardware. |
|
15 December 2020, 15:44 | #30 | ||||
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,351
|
Quote:
Quote:
But in your case, you have 16 registers with the cost - and access time - of 32. Quote:
Quote:
For now your approach has lost the comparison - and by a fair amount. But that's only a single case. Time for another test maybe ? What about a situation in which we're out of registers ? After all, this is the original goal of your shadow register idea, isn't it ? As an example of this, an 8-bit c2p (or p2c, that's the same). 8 input pointers, 1 output pointer (or vice versa). 8 data, 1 loop counter, plus any used temporary. Code is also relatively repetitive, (at least in theory) giving you opportunities for instruction reuse. How would you fare in this example ? That depends. As long as they use some opcode space, it's space no longer available for anything else. |
||||
15 December 2020, 16:27 | #31 | |
Registered User
Join Date: May 2017
Location: Munich/Bavaria
Posts: 2,425
|
Quote:
Post the routine here and I will apply my ideas to it. My first estimate would be that only 4 registers will be needed for data. |
|
15 December 2020, 16:45 | #32 | |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,351
|
Quote:
https://github.com/Kalmalyzer/kalms-c2p |
|
15 December 2020, 19:50 | #33 | |
Registered User
Join Date: May 2017
Location: Munich/Bavaria
Posts: 2,425
|
Quote:
Code:
move.l (a0)+,d0
lsl.l #4,d0
or.l (a0)+,d0
move.l (a0)+,d1
lsl.l #4,d1
or.l (a0)+,d1
move.l (a0)+,d2
lsl.l #4,d2
or.l (a0)+,d2
move.l (a0)+,d3
lsl.l #4,d3
or.l (a0)+,d3
move.w d2,d6
move.w d3,d7
move.w d0,d2
move.w d1,d3
swap d2
swap d3
move.w d2,d0
move.w d3,d1
move.w d6,d2
move.w d7,d3
lsl.l #2,d0
lsl.l #2,d1
or.l d2,d0
or.l d3,d1
becomes:
move.l (a0)+,d3
lsl.l #4,d3
or.l (a0)+,d3
move.l d3,d0
rep3 -4,-3,-2 ; we repeat move.l (a0)+,d3 , lsl.l #4,d3 and or.l (a0)+,d3
move.l d3,d1
rep3 -4,-3,-2
move.l d3,d2
rep3 -4,-3,-2
move.w d2,d3 ; we temporarily overwrite d3
move.w d0,d2
swap d2
move.w d2,d0
pop.w d3,d2 ; we move.w d3,d2 and restore former d3
move.w d3,d2 ; we temporarily overwrite d2
move.w d1,d3
swap d3
move.w d3,d1
pop.w d2,d3 ; we move.w d2,d3 and restore former d2
lsl.l #2,d0
lsl.l #2,d1
or.l d2,d0
or.l d3,d1
We could of course just keep the original routine in the upper part for speed, but I wanted to demonstrate the "repeat" feature here. It introduces 3 move.l and would be slower without other mechanisms in the CPU like OoO execution ... except we might gain a few cycles by using already decoded instructions and fewer instruction fetches. Last edited by Gorf; 15 December 2020 at 20:18. |
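To follow what the rep3 lines in the listing above stand for, here is a small Python sketch that expands them back into plain instructions. It rests on an inferred reading, not on anything the post states outright: offsets are taken to index a log of executed instructions, replayed instructions are appended to that log, and the rep instruction itself is not logged. The function name `expand` is made up for the sketch.

```python
import re

REP = re.compile(r"^rep(\d+)\s+(.+)$")

def expand(listing):
    """Expand repN pseudo-instructions into the instruction stream they replay.

    Assumption: offsets are relative to the end of an execution log;
    replayed instructions are logged, the rep instruction itself is not.
    """
    log = []
    for raw in listing:
        line = raw.split(";")[0].strip()  # drop trailing comments
        if not line:
            continue
        m = REP.match(line)
        if not m:
            log.append(line)
            continue
        count = int(m.group(1))
        offsets = [int(o) for o in m.group(2).split(",")]
        if len(offsets) != count:
            raise ValueError(f"rep{count} expects {count} offsets: {line!r}")
        base = len(log)  # resolve every offset against the log before replaying
        if any(off >= 0 or base + off < 0 for off in offsets):
            raise ValueError(f"offset outside the log: {line!r}")
        log.extend([log[base + off] for off in offsets])
    return log

load = ["move.l (a0)+,d3", "lsl.l #4,d3", "or.l (a0)+,d3"]
listing = load + [
    "move.l d3,d0",
    "rep3 -4,-3,-2 ; repeat the three load instructions",
    "move.l d3,d1",
    "rep3 -4,-3,-2",
    "move.l d3,d2",
    "rep3 -4,-3,-2",
]
expanded = expand(listing)
print("\n".join(expanded))
```

Under this reading every rep3 -4,-3,-2 replays the three load instructions, so nine listed lines expand to fifteen executed instructions. The same rule also resolves the offsets in the second HAM8 listing later in the thread, which is why it was chosen here.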
|
15 December 2020, 20:35 | #34 |
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,351
|
So that's 26 instructions becoming 24. Partial code so no pressure on registers. Sorry, but I'm not impressed.
Now look at what the 16-bit part of the merges can look like : Code:
swap d4
swap d5
swap d6
swap d7
exg.w d0,d4
exg.w d1,d5
exg.w d2,d6
exg.w d3,d7
(And no, i have not invented it right now. It's just an excerpt of my current, tested and working, vm c2p code.) If you want some advice : keep it simple. As you said yourself in the title, your ideas are 'fancy'. Perhaps too much. So instead of looking for original ideas, it might be better to search for a good mix of everything. |
15 December 2020, 21:38 | #35 | |||
Registered User
Join Date: May 2017
Location: Munich/Bavaria
Posts: 2,425
|
Quote:
I tried to save registers here, and showed how my idea can be used to do exactly that. Quote:
You point me to code written for one ISA and I apply my ideas to that ... and then you say: look, on a completely different ISA it is even better. Quote:
It is even totally beside the point of my project, which is about trying unorthodox things. You had me already convinced that you do not find them useful or intriguing 10 posts ago... Last edited by Gorf; 15 December 2020 at 21:47. |
|||
15 December 2020, 23:10 | #36 |
Registered User
Join Date: May 2017
Location: Munich/Bavaria
Posts: 2,425
|
I just had a look at the second ham8 routine:
Code:
swap d0
swap d2
move.l d0,d1
move.l d2,d3
lsl.l #6,d0
lsl.l #6,d2
lsr.w #3,d0
lsr.w #3,d2
lsr.b #3,d0
lsr.b #3,d2
lsl.l #8,d0
lsl.l #8,d2
move.b d1,d0
move.b d3,d2
swap d1
swap d3
move.l d1,d5
move.l d3,d7
lsl.l #6,d1
lsl.l #6,d3
lsr.w #3,d1
lsr.w #3,d3
lsr.b #3,d1
lsr.b #3,d3
lsl.l #8,d1
lsl.l #8,d3
move.b d5,d1
move.b d7,d3
becomes
swap d0
move.l d0,d7
lsl.l #6,d7
lsr.w #3,d7
lsr.b #3,d7
lsl.l #8,d7
pop.b d0,d7
exg.l d0,d7
rep2 -6,-5
rep3 -6,-5,-4
move.l d7,d1
swap d2
move.l d2,d7
rep2 -8,-7
rep2 -8,-7
pop.b d2,d7
exg.l d2,d7
rep2 -6,-5
rep3 -6,-5,-4
move.l d7,d3
Last edited by Gorf; 15 December 2020 at 23:26. |
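For readers tracing the original (plain 68k) part of the listing above, this Python sketch steps one register through the first block: swap, move.l, lsl.l #6, lsr.w #3, lsr.b #3, lsl.l #8, move.b. It uses the standard 68k meaning of these instructions and ignores condition codes. The helper names and `ham8_block` are made up for the illustration, and the sketch does not try to model the pop or rep variants.

```python
MASK32 = 0xFFFFFFFF

def swap(x):
    """swap dN: exchange the upper and lower 16-bit words."""
    return ((x << 16) | (x >> 16)) & MASK32

def lsl_l(x, n):
    """lsl.l #n: shift the whole 32-bit register left."""
    return (x << n) & MASK32

def lsr_w(x, n):
    """lsr.w #n: shift only the low word right; the upper word is untouched."""
    return (x & 0xFFFF0000) | ((x & 0xFFFF) >> n)

def lsr_b(x, n):
    """lsr.b #n: shift only the low byte right; the upper 24 bits are untouched."""
    return (x & 0xFFFFFF00) | ((x & 0xFF) >> n)

def move_b(src, dst):
    """move.b src,dst: replace the low byte of dst with the low byte of src."""
    return (dst & 0xFFFFFF00) | (src & 0xFF)

def ham8_block(d0):
    """First block of the listing, applied to d0 with d1 as the saved copy."""
    d0 = swap(d0)
    d1 = d0              # move.l d0,d1
    d0 = lsl_l(d0, 6)
    d0 = lsr_w(d0, 3)
    d0 = lsr_b(d0, 3)
    d0 = lsl_l(d0, 8)
    d0 = move_b(d1, d0)  # move.b d1,d0
    return d0, d1

result, saved = ham8_block(0x00FF0000)
print(hex(result), hex(saved))
```

For the input 0x00FF0000 the block yields 0x00071FFF, with 0x000000FF kept in the second register. The three shifts of different widths spread one byte over several fields, which is the work that later posts propose to fold into a single instruction.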
16 December 2020, 09:23 | #37 | ||||
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,351
|
Quote:
Quote:
It does not matter which ISA the code you're rewriting was originally written for. I could just have shown some C code and asked you to "play compiler". Actually, proper code density comparison should have included other cpu families as well, not just 68k. Quote:
Quote:
Anyway... You know that with my instructions i can replace 4 blocks of move.l + lsl.l #6 + lsr.w #3 + lsr.b #3 each by a single instruction, going from 28 lines down to 16, and i didn't attempt to fully optimise it. So why come back for more ? |
||||
16 December 2020, 10:03 | #38 | |
Registered User
Join Date: Mar 2018
Location: Hastings, New Zealand
Posts: 2,708
|
Quote:
Assembly language programmers tend to get frustrated when certain instructions take up more space or work less efficiently than expected, but 99% of the time it isn't that important. Most programs can be made smaller simply by thinking about different ways of doing things or even not doing them at all. For example, trying to improve c2p code with 'extended' instructions. Better to embed an Akiko-like c2p converter into the machine, then you dramatically reduce code size, potentially do it much faster and have better compatibility. The 68k in an Amiga is what it is. Coders should enjoy trying to squeeze the most out of what they have rather than pining for something 'better'. When playing chess do you change the rules because a piece can't make a move you want? Of course not, and playing within the rules makes it more fun! |
|
16 December 2020, 10:24 | #39 | ||||
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,351
|
Quote:
Quote:
In my case it's not only about making the code smaller, but also easier to read and write. There is no progress if we just accept all limitations imposed on us. Quote:
Quote:
|
||||
16 December 2020, 10:36 | #40 | |||||
Registered User
Join Date: May 2017
Location: Munich/Bavaria
Posts: 2,425
|
Quote:
I used only 4 registers instead of 6 in fewer lines. Your reply to that: "Partial code so no pressure on registers.." ok ..... Quote:
But the 3 ideas I mentioned here specifically are quite ISA-independent - and they would not break any old code, no matter where they would be applied. Quote:
Quote:
Quote:
Last edited by Gorf; 16 December 2020 at 10:51. |
|||||