English Amiga Board


Old 20 May 2021, 11:07   #141
Don_Adan
Registered User
 
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 55
Posts: 1,979
Quote:
Originally Posted by Bruce Abbott View Post
I think litwr wants fastest and smallest, so it's a bit tricky. Is a 5% speedup worthwhile if it adds 4 bytes to the file size?

On litwr's benchmark site the main loop code size is 54 bytes on a 50MHz 68030 vs 57 bytes (~6% larger) on a 25MHz 386, while the 386 would theoretically be ~5% faster if running at the same clock speed. Some speed optimization might make the Amiga code 5% quicker but 5% larger, and therefore virtually identical to the 386 (except that the 030 is 25% faster in real terms because 386's top out at 40MHz).

It's also good to see the Amiga 1200 with Blizzard 1230-IV beating a 36MHz ARM3 and a 33MHz 80486 (though of course these figures don't mean much in the real world).


Fair enough, but ultimately we want to know what our efforts have achieved. Hoping to see a side-by-side comparison between the original code and the final optimized version.

Wow, such easy pickings! Perhaps the total size can still be shrunk quite a lot and get even quicker!
If the main loop is counted from label .l0 (without the Write routine), then besides today's 4-byte size optimisation I can gain 6 more bytes too. It seems the 386 is not good enough to beat the 68020 in code density.
Old 20 May 2021, 11:57   #142
alkis
Registered User
 
Join Date: Dec 2010
Location: Athens/Greece
Age: 53
Posts: 719
Quote:
Originally Posted by Bruce Abbott View Post
I think litwr wants fastest and smallest, so it's a bit tricky. Is a 5% speedup worthwhile if it adds 4 bytes to the file size?
You spelled "wants to troll" incorrectly there.

Basic Premise: (from troll's site)
Every program is satisfying four restrictions: 1) it measures time; 2) it uses an OS function to print digits, it prints 4 digits a time synchronously with the calculation of them; 3) it uses less than 64 KB RAM for the code and data; 4) it utilizes all available RAM below 64 KB limit to get the maximum number of calculated digits, so it is forbidden to restrict artificially the maximum number of digits.

Take note of 2. Use OS for print. I think it was meynaf who suggested the OS's RawDoFmt/Write a gazillion years ago, but troll said it was not fair. So, use the OS, but don't use the OS if the Amiga has an advantage.

It's pretty pointless, unless you want to keep feeding the troll though.

My €0.02
Old 20 May 2021, 12:12   #143
roondar
Registered User
 
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,421
Quote:
Originally Posted by alkis View Post
You spelled "wants to troll" incorrectly there.

Basic Premise: (from troll's site)
Every program is satisfying four restrictions: 1) it measures time; 2) it uses an OS function to print digits, it prints 4 digits a time synchronously with the calculation of them; 3) it uses less than 64 KB RAM for the code and data; 4) it utilizes all available RAM below 64 KB limit to get the maximum number of calculated digits, so it is forbidden to restrict artificially the maximum number of digits.

Take note of 2. Use OS for print. I think it was meynaf who suggested the OS's RawDoFmt/Write a gazillion years ago, but troll said it was not fair. So, use the OS, but don't use the OS if the Amiga has an advantage.

It's pretty pointless, unless you want to keep feeding the troll though.

My €0.02
You could easily argue that the 64KB code/data limitation also gives an artificial advantage to some implementations. In particular, this will benefit 8 bit architectures and probably those that have 64KB segmentation as well. To me it's actually an odd choice, regardless of platform. Optimisation tends to be either best speed or best size. Asking for best speed and best size at the same time usually gives neither.

I'm not going to guess about the intentions here (they may be perfectly legitimate, they may not), but IMHO it's quite clear that the stated limitations, as they are, make the program not very good as a cross-platform benchmarking tool. Meaning, it won't really tell you all that much about real world performance differences because of these kinds of specialised limitations.
Old 20 May 2021, 14:49   #144
Don_Adan
Registered User
 
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 55
Posts: 1,979
On the Amiga it is no problem to write a program which uses the full 64KB to store digits. But it would be unfair to 8-bit systems and maybe some other CPUs.
Old 20 May 2021, 19:10   #145
saimo
Registered User
 
 
Join Date: Aug 2010
Location: Italy
Posts: 787
I'm sorry if this sounds pedantic, but we should not spread wrong notions, especially when there is already some confusion.

Quote:
Originally Posted by meynaf View Post
As an example, most assemblers will accept moveq.l even though it is technically incorrect (moveq has no size).
moveq does have a size, and that's long. Here is the official definition:



Please don't be fooled by the fact that the writing Assembler Syntax: MOVEQ # < data > ,Dn doesn't include ".l", as the same happens also with the other instructions - for example, here's move:



Instructions without a size are explicitly declared unsized - here's an example:



Quote:
So about LSL.L D5 being syntactically correct or not, it is a matter of how you see it
Syntax is not an opinion: it's a formal set of rules defined by the designer of the CPU. The fact that some assemblers can be tolerant doesn't change the syntax. lsl.l d5 does not exist in the official syntax and is therefore wrong.
Old 20 May 2021, 19:25   #146
Thorham
Computer Nerd
 
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,771
Quote:
Originally Posted by alkis View Post
Take note of 2. Use OS for print. I think it was meynaf who suggested the OS's RawDoFmt/Write a gazillion years ago, but troll said it was not fair. So, use the OS, but don't use the OS if the Amiga has an advantage.
Just use VPrintf()
Old 20 May 2021, 19:39   #147
meynaf
son of 68k
 
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
Quote:
Originally Posted by saimo View Post
I'm sorry if this sounds pedantic, but we should not spread wrong notions, especially when there is already some confusion.
It is not wrong notions. It is 30+ years of coding.
That some "official" online doc says something does not change what is correct and what is not (btw. Freescale isn't Motorola as we knew it).


Quote:
Originally Posted by saimo View Post
moveq does have a size, and that's long.
No. Most, if not all, disassemblers, will output it without a size and all assemblers accept it without a size -- while you may eventually find some which reject moveq.l (and bset.b, etc).


Quote:
Originally Posted by saimo View Post
Syntax is not an opinion: it's a formal set of rules defined by the designer of the CPU. The fact that some assemblers can be tolerant doesn't change the syntax. lsl.l d5 does not exist in the official syntax and is therefore wrong.
If assemblers rejected all 'wrong' syntax with this definition, not a single source in the world would assemble
Old 20 May 2021, 20:00   #148
saimo
Registered User
 
 
Join Date: Aug 2010
Location: Italy
Posts: 787
Quote:
Originally Posted by meynaf View Post
It is not wrong notions. It is 30+ years of coding.
That some "official" online doc says something does not change what is correct and what is not (btw. Freescale isn't Motorola as we knew it).
That online doc is the PDF version of the Motorola's M68000 Family Programmer's Reference Manual. For the record, I have the physical book right here, and I can guarantee that the text matches 100%. FreeScale and NXP simply acquired the assets and rights, but didn't redefine the syntax.

Quote:
No. Most, if not all, disassemblers, will output it without a size and all assemblers accept it without a size -- while you may eventually find some which reject moveq.l (and bset.b, etc).
What assemblers and disassemblers do means less than zero: Motorola defined the syntax, and Motorola said that moveq has a long size.

Quote:
If assemblers rejected all 'wrong' syntax with this definition, not a single source in the world would assemble
That doesn't change the fact that a wrong syntax is just that: wrong.
Old 20 May 2021, 20:12   #149
meynaf
son of 68k
 
 
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
Quote:
Originally Posted by saimo View Post
That online doc is the PDF version of the Motorola's M68000 Family Programmer's Reference Manual. For the record, I have the physical book right here, and I can guarantee that the text matches 100%. FreeScale and NXP simply acquired the assets and rights, but didn't redefine the syntax.
Technically, moveq is 8->32, not 32. You don't believe me ? Read this then :
http://www2.ece.ohio-state.edu/~degr.../M68000PRM.pdf


Quote:
Originally Posted by saimo View Post
What assemblers and disassemblers do means less than zero: Motorola defined the syntax, and Motorola said that moveq has a long size.
Not exactly, no. They just say - and only in the specific manual you linked to - that it's a longword operation. Nowhere do they say we should write 'moveq.l' and not 'moveq'.


Quote:
Originally Posted by saimo View Post
That doesn't change the fact that a wrong syntax is just that: wrong.
Other syntaxes exist, not just the one of Motorola :
https://www.nextop.de/NeXTstep_3.3_D...mld/index.html
Old 20 May 2021, 20:20   #150
litwr
Registered User
 
Join Date: Mar 2016
Location: Ozherele
Posts: 229
Quote:
Originally Posted by saimo View Post
Don't worry, I'm not taking this conversation personally at all: I just had to point out that, unfortunately, your stubborn attitude is preventing you from seeing something that is very simple, and also shows disrespect to those who are trying to help you see.
Now, there is nothing controversial here: the pages of Motorola's official manual define clearly the syntax of the instructions, how the instructions work and how they are encoded. You simply fail to understand those pages. I tried to help you with an almost word-by-word guidance, but given that you choose not to see, I won't add anything else.
You have your interpretation. I have mine. I have shown you my logic; you prefer to stop showing yours. So I continue to insist that the official Moto doc doesn't forbid LSL.L D5.
BTW I have just checked LSL.L D5 with ASMONE - it works perfectly.
Old 20 May 2021, 20:24   #151
litwr
Registered User
 
Join Date: Mar 2016
Location: Ozherele
Posts: 229
Quote:
Originally Posted by Don_Adan View Post
divu.w is called 3 times, 3x53 cycles, so on average about 150 cycles less per access to the PR0000 routine. Plus much faster code for cv handling. In total, about 165 cycles faster per access.
Thank you but even 200 cycles give us less than 0.5% - it is still undetectable. Moreover your optimization may slow down the 68020.


EDIT. And 140 cycles for DIVU is the worst case. 78 is the best.

Last edited by litwr; 20 May 2021 at 20:40.
Old 20 May 2021, 20:28   #152
a/b
Registered User
 
Join Date: Jun 2016
Location: europe
Posts: 1,039
If you look at moveq's opcode you will not find size bits. moveq is always .L and there is no point, in my opinion, to write .L, it's unambiguous. Same with, for example, lea. And since these instructions are so common and frequently used it should be common knowledge what they do and cut the c... size out. And the fact that eg. winuae debugger's craptastic disassembler spits out nonsense like bt instead of bra, lea.l, moveq.l etc, does not change that.
If you look at addq.w #n,ax and addq.l #n,ax, they do exactly the same thing. You could say there's no point in writing the size, but they don't have the same opcode (size is part of the opcode in this case) so it does matter.
And finally...
lsl dx does not have its own opcode; it's an alias for lsl #1,dx at best. lsl <ea> does exist, *but* you should not stop there: you should look at its <ea> table and you'll see that dx is not supported (e.g. that specific opcode might be used to encode some other instruction).

Thread moves fast... No, Moto doc *does* "forbid" lsl dx. Again, look at the <ea> table for lsl and you will see: Dn -
You cannot just look at the first part of the information and then ignore the other, relevant, part.
What assemblers accept or don't is another thing; they are typically written to accept all kinds of crap for back/cross/whatever compatibility.
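To make the opcode argument above concrete, here is a small Python sketch that builds the opcode words from the bit layouts given in the PRM. It is only an illustration: the function names are made up, and only the cases discussed in this thread are modelled. It shows that MOVEQ has no size field, that the register form of LSL always carries a count, and that the one-operand LSL <ea> form accepts memory alterable modes only, so Dn is rejected.

```python
# Sketch of the 68000 bit layouts discussed above (helper names are hypothetical).

SIZE_BITS = {"b": 0b00, "w": 0b01, "l": 0b10}

def moveq(data, dn):
    """MOVEQ #data,Dn: 0111 rrr 0 dddddddd. There is no size field to fill in."""
    assert -128 <= data <= 127 and 0 <= dn <= 7
    return 0x7000 | (dn << 9) | (data & 0xFF)

def lsl_imm(count, dn, size="w"):
    """LSL.<size> #count,Dn: 1110 ccc 1 ss 0 01 rrr (a count of 8 is stored as 0)."""
    assert 1 <= count <= 8 and 0 <= dn <= 7
    return 0xE108 | ((count & 7) << 9) | (SIZE_BITS[size] << 6) | dn

# Mode field values for (An), (An)+, -(An), d16(An), d8(An,Xn).
MEMORY_ALTERABLE_MODES = {0b010, 0b011, 0b100, 0b101, 0b110}

def lsl_mem(mode, reg):
    """LSL <ea>: 1110 001 1 11 mmm rrr. Word sized, shifts by exactly one bit."""
    # Mode 111 with reg 0 or 1 is abs.W / abs.L, which is also memory alterable.
    allowed = mode in MEMORY_ALTERABLE_MODES or (mode == 0b111 and reg in (0, 1))
    if not allowed:
        raise ValueError("addressing mode not allowed for LSL <ea>")
    return 0xE3C0 | (mode << 3) | reg

print(hex(moveq(10, 4)))         # 0x780a
print(hex(lsl_imm(1, 5, "l")))   # 0xe38d, i.e. lsl.l #1,d5
print(hex(lsl_mem(0b010, 0)))    # 0xe3d0, i.e. lsl (a0)
```

Calling lsl_mem(0b000, 5), which would be "lsl d5" taken literally as the <ea> form, raises ValueError in this sketch. An assembler that accepts lsl d5 therefore has to emit the lsl #1,d5 encoding instead.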
Old 20 May 2021, 20:32   #153
litwr
Registered User
 
Join Date: Mar 2016
Location: Ozherele
Posts: 229
Quote:
Originally Posted by modrobert View Post
I tested now running in reverse order, exactly the same results, 'pi-na' is 0.24 seconds faster than 'pi-align'.

Try using 'cnop 0,4' to align with next long word address instead.
It is very strange.
Code:
CNOP 0,4
and
Code:
ALIGN 2
do the same things.

We need help from the 68k experts for this issue. The 68k experts! Help us! I am completely baffled here.
Old 20 May 2021, 20:41   #154
a/b
Registered User
 
Join Date: Jun 2016
Location: europe
Posts: 1,039
Use ALIGN 0,4.
I presume that ALIGN 2 is expanded to ALIGN 2,0, so it does no current address alignment (2nd argument is 0) and then adds 2 to the current address. I.e. it works the same only if the current address is not longword aligned (2, 6, 10, ...).
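For reference, the arithmetic behind CNOP offset,alignment can be sketched in a few lines of Python. This is only a model of the usual CNOP meaning (advance to the next address that equals offset modulo alignment); what ALIGN 2 expands to depends on the assembler, so it is deliberately not modelled here. The function name is made up.

```python
def cnop(pc, offset, alignment):
    """Address after CNOP offset,alignment: the first address >= pc
    for which address % alignment == offset."""
    return pc + (offset - pc) % alignment

# Word alignment and longword alignment differ exactly at addresses 2, 6, 10, ...
for pc in (0, 2, 4, 6, 10):
    print(pc, cnop(pc, 0, 2), cnop(pc, 0, 4))
```

So a directive that only guarantees word alignment and one that guarantees longword alignment give the same result only when the current address already happens to be a multiple of 4.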
Old 20 May 2021, 21:24   #155
modrobert
old bearded fool
 
 
Join Date: Jan 2010
Location: Bangkok
Age: 56
Posts: 779
Quote:
Originally Posted by litwr View Post
It is very strange.
Code:
CNOP 0,4
and
Code:
ALIGN 2
do the same things.
No, long word vs word.


PS: I didn't get any source code this time, so you better change it.
Old 20 May 2021, 22:26   #156
saimo
Registered User
 
 
Join Date: Aug 2010
Location: Italy
Posts: 787
Quote:
Originally Posted by meynaf View Post
Technically, moveq is 8->32, not 32. You don't believe me ? Read this then :
http://www2.ece.ohio-state.edu/~degr.../M68000PRM.pdf
No problem. Just let me download the file and have a look... Oh, surprise!



Quote:
Not exactly, no. They just say - and only in the specific manual you linked to - that it's a longword operation. Nowhere do they say we should write 'moveq.l' and not 'moveq'.
The specific manual I linked to is the same manual you linked to, and that I happen to have here on paper, straight from Motorola. And this is what it says:



Now, you're throwing in the mix two things I either didn't touch on or say:
* Technically, moveq is 8->32, not 32 - that's right, but I didn't even remotely touch on that aspect;
* we should write 'moveq.l' and not 'moveq' - nowhere I said that.

Let's instead look at what actually happened.
In post #140 you wrote: As an example, most assemblers will accept moveq.l even though it is technically incorrect (moveq has no size).
With post #145 I showed that "moveq has no size" is false, as the official reference manual from Motorola (again, the same you linked to) states that the size of moveq is long; additionally, I showed an example of an instruction that actually has no size (bfextu).
That's all there is to it, and I'm shocked that such a basic matter started such a reaction

Regarding appending ".l" to "moveq": it's redundant, but not technically wrong, because the size attribute of moveq is precisely .l. But that's a totally different story from lsr.l d5: that is just wrong, because Motorola's syntax - and, even more, instruction encoding - demands that a count be specified when the operand is a register.

Quote:
Other syntaxes exist, not just the one of Motorola :
https://www.nextop.de/NeXTstep_3.3_D...mld/index.html
Who designed the CPU and defined the instruction set with its syntax is the only authority in such matter, and that's Motorola. Alternative syntaxes can be and have been adopted, but they can't have higher authority.
Old 20 May 2021, 22:27   #157
litwr
Registered User
 
Join Date: Mar 2016
Location: Ozherele
Posts: 229
Quote:
Originally Posted by Don_Adan View Post
Next 2 bytes less.
Code:
         move.l #start+$10000-ra,D7
         divu.w #7*4,D7
         ext.l D7
         lsl.w #2,D7    ;d7=maxn
For my version of PR0000, 2 more bytes gained.
Thank you. But your version is longer and could be slower on the 68020/30. I am really very impressed by your efforts to make the code better. But you know, perfection is impossible: every next step towards the perfect result is much harder than the previous one. So IMHO we have very good code now. Further improvements will cost much and give almost nothing.
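For readers following along, the four quoted instructions can be traced with a short Python model. This is just a sketch: the byte count passed in is a made-up stand-in for start+$10000-ra (those labels live in the actual source, which is not shown here), and the function name is hypothetical.

```python
def maxn(avail_bytes):
    """Trace D7 through: move.l / divu.w #7*4 / ext.l / lsl.w #2."""
    # move.l #start+$10000-ra,D7 (avail_bytes stands in for that constant)
    d7 = avail_bytes & 0xFFFFFFFF
    # divu.w #7*4,D7: quotient goes to the low word, remainder to the high word
    quotient, remainder = divmod(d7, 7 * 4)
    assert quotient <= 0xFFFF, "divu.w would overflow and leave D7 unchanged"
    d7 = (remainder << 16) | quotient
    # ext.l D7: sign-extend the low word to 32 bits, which discards the remainder
    low = d7 & 0xFFFF
    d7 = low | (0xFFFF0000 if low & 0x8000 else 0)
    # lsl.w #2,D7: shift only the low word left by two bits
    d7 = (d7 & 0xFFFF0000) | ((d7 << 2) & 0xFFFF)
    return d7

print(maxn(60000))  # 8568 with this made-up byte count
```

Since fewer than 64 KB are available, the quotient stays far below $8000, so ext.l simply clears the remainder from the high word and the word-sized shift cannot lose bits.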

Quote:
Originally Posted by Bruce Abbott View Post
Only because the 68000 has this arbitrary limitation. To shift by more than 1 bit (without a barrel shifter or temporary registers to store intermediate results) it would have to perform multiple reads and writes to memory, which would be very slow. Also the 16 bit opcode does not have enough space to specify both shift value and <ea>.
Even the top 68k CPUs (even the 68060) can shift only words in memory, and only by 1 bit.

Quote:
Originally Posted by Bruce Abbott View Post
Yes. However in this case - as in many others - not all <ea> modes are valid. Specifically, modes An and Dn are illegal for shift/rotate, and will cause an exception if you try to execute them (even though some debuggers or disassemblers may think they are valid code).

There are equivalent opcodes using Dn explicitly that are valid, which an assembler could alias to for 'convenience'. This also applies to some other instructions that do have an equivalent <ea> opcode, which is a pain when the assembler silently changes one to the other without asking or providing any way to avoid it (sometimes we need to have the exact opcode we asked for!).
You are right, but it seems that you are trying to prove things that are very well known to us both. I have never claimed that LSR D5 encoding is a particular case of LSR <ea> encoding. I claimed exactly the same thing as you do: LSR D5 is a convenient shorthand version (an alias) for LSR #1,D5. You know, the x86 SHR AX,1 and SHR AX,2 have very different encodings and it is good that assemblers don't bother programmers to think about it. Technically it would be more correct to write SHR AX instead of SHR AX,1 because this allows us to use different encodings for both cases, but it breaks the convenience of logic and it is therefore not used.

Quote:
Originally Posted by Bruce Abbott View Post
Encoding is relevant though, because it tells you which modes are valid for particular instructions.
I can't completely agree. Encoding only provides the base for the whole "building" of the assembly. It is very odd to reduce assembler usability just making it to blindly follow hardware encoding.

Quote:
Originally Posted by Bruce Abbott View Post
"The great thing about standards is that there are so many of them!"
Yes, some code that is "syntactically correct" in one assembler may not be in another. To avoid confusion and maintain compatibility it is best to stick to a common subset with unambiguous syntax where possible, and specify the syntax used when it isn't. Otherwise people may have trouble understanding and using your code.
Thank you very much. You know there is a very old problem. You can just follow your understanding of the rules and try to satisfy everybody. This usually works worse than some people think. There is another way, someone can try to use better rules. IMHO briefer assembler statements are better for computer nerds.

Quote:
Originally Posted by Bruce Abbott View Post
Time for summary of progress so far?
How much space and time have we now saved (or gained) over litwr's original code, as a proportion of it?
IMHO we already got almost perfect code. I reported about this in http://eab.abime.net/showpost.php?p=...&postcount=115
However, saimo and Don_Adan just try to do the impossible. They pushed me to make some minor improvements which mean very little. Saimo also started this fruitless LSR D5 discussion.

Quote:
Originally Posted by Don_Adan View Post
The critical code can perhaps be made only a little shorter/faster, but other code that is called only once can still be shortened.
Here is example:
from
move #10,d4
to
moveq #10,D4
Mostly the time calculation routine can be optimised for space.
VASM assembles MOVE.L #10,d4 into MOVEQ #10,D4. However, you suggest replacing MOVE.W with MOVEQ, and that saves 2 bytes! Thank you very much.
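The byte counts can be checked with a tiny Python sketch (helper names are made up for illustration). It also models the one thing to watch when swapping MOVE.W for MOVEQ: MOVE.W writes only the low word of the register, while MOVEQ sign-extends its 8-bit immediate into all 32 bits, so the swap is only safe where the old upper word does not matter.

```python
MOVEQ_BYTES = 2  # a single opcode word, the immediate lives inside it

def move_imm_bytes(size):
    """Bytes taken by MOVE.<size> #imm,Dn: opcode word plus immediate extension."""
    return 2 + {"b": 2, "w": 2, "l": 4}[size]

def after_move_w(old_dn, imm):
    """Register value after MOVE.W #imm,Dn: the upper word is preserved."""
    return (old_dn & 0xFFFF0000) | (imm & 0xFFFF)

def after_moveq(imm8):
    """Register value after MOVEQ #imm8,Dn: sign-extended to 32 bits."""
    value = imm8 & 0xFF
    return (value | 0xFFFFFF00) if (value & 0x80) else value

print(move_imm_bytes("w") - MOVEQ_BYTES)   # 2 bytes saved versus MOVE.W
print(move_imm_bytes("l") - MOVEQ_BYTES)   # 4 bytes saved versus MOVE.L
print(hex(after_move_w(0x12345678, 10)))   # 0x1234000a
print(hex(after_moveq(10)))                # 0xa
```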

Quote:
Originally Posted by Don_Adan View Post
Perhaps this code can be shortened/optimised too. A little shorter, a little faster.
Thank you very much again. IMHO the code has become so polished that it can dazzle somebody by its light. But its speed and number of digits have not changed. However, the programs became 6 bytes smaller and this is good. The changes have just been committed.

Quote:
Originally Posted by Bruce Abbott View Post
I think litwr wants fastest and smallest, so it's a bit tricky. Is a 5% speedup worthwhile if it adds 4 bytes to the file size?
IMHO even a 1% speedup is rather impossible, it requires some real magic. All efforts gave us only 4 saved cycles. 4 more saved cycles were just rediscovered. You know, the main goal is speed, the code size is secondary and much less important.

Quote:
Originally Posted by Bruce Abbott View Post
It's also good to see the Amiga 1200 with Blizzard 1230-IV beating a 36MHz ARM3 and a 33MHz 80486 (though of course these figures don't mean much in the real world).
Of course, these are only results for this particular algorithm. This is mostly the division benchmark.
Old 20 May 2021, 22:32   #158
roondar
Registered User
 
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,421
Quote:
Originally Posted by saimo View Post
That's all there is to it, and I'm shocked that such a basic matter started such a reaction
I can't help but agree. In my opinion, the Motorola manuals are the authority here (unless there are errata, in which case they take precedence).
Quote:
But that's a totally different story from lsr.l d5: that is just wrong, because Motorola's syntax - and, even more, instruction encoding - demands that a count be specified when the operand is a register.
It seems to me if the instruction coding for a specific instruction form does not actually exist then using a syntax that implies that form is being used is just plain wrong. End of story.
Quote:
Who designed the CPU and defined the instruction set with its syntax is the only authority in such matter, and that's Motorola. Alternative syntaxes can be and have been adopted, but they can't have higher authority.
I fully agree
Old 20 May 2021, 22:33   #159
litwr
Registered User
 
Join Date: Mar 2016
Location: Ozherele
Posts: 229
Quote:
Originally Posted by Don_Adan View Post
If the main loop is counted from label .l0 (without the Write routine), then besides today's 4-byte size optimisation I can gain 6 more bytes too. It seems the 386 is not good enough to beat the 68020 in code density.
The main loop starts from .longdiv label and it ends on the bcc .l2 statement. The main loops for 80286 and 68020 have the same size now.
Old 20 May 2021, 22:37   #160
saimo
Registered User
 
 
Join Date: Aug 2010
Location: Italy
Posts: 787
Quote:
Originally Posted by litwr View Post
You have your interpretation. I have mine. I have shown you my logic; you prefer to stop showing yours.
The M68000UM is not a collection of poems. There is no room for interpretation. Your interpretation is wrong.

Quote:
So I continue to insist that official Moto's doc doesn't forbid LSL.L D5.
It does, and I already explained to you why, almost word by word.
The only result you'll achieve by not accepting that your interpretation is wrong is that you won't learn something new and your reputation will be affected negatively.

Quote:
BTW I have just checked LSL.L D5 with ASMONE - it works perfectly.
It means nothing.
 

