English Amiga Board


Go Back   English Amiga Board > Coders > Coders. Language > Coders. C/C++

 
 
Thread Tools
Old 31 October 2023, 22:33   #1
mark_k
Registered User
 
Join Date: Aug 2004
Location:
Posts: 3,349
C compiler generating instructions which can be jumped into the middle of?

Sorry if this sounds a little vague...

I seem to remember examining the code of some programs using the ReSource disassembler a long time ago, and noticing something a little odd.

(This isn't an actual code example, just trying to illustrate the point from my vague recollection.)

The C compiler had been generating code like
Code:
TST.W D2
BEQ.B *+2
CMPI.W #$7000,D0  ;$7000 is opcode for MOVEQ #0,D0
For that example, if D2 is non-zero it's a kind of NOP equivalent (the code doesn't depend on CMPI.W setting condition codes). If D2 contains 0 then the branch is taken to $7000 opcode which is MOVEQ #0,D0. The branch "jumps into the middle of" the CMPI.W instruction.

Does anyone remember which compiler could have output code sequences like that? Maybe some version of SAS/C???
mark_k is offline  
Old 31 October 2023, 23:13   #2
Don_Adan
Registered User
 
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 56
Posts: 2,029
From my memory it was SAS C optimization.
Slowest bra.b was replaced with fastest cmp.w.
Don_Adan is offline  
Old 01 November 2023, 07:42   #3
paraj
Registered User
 
paraj's Avatar
 
Join Date: Feb 2017
Location: Denmark
Posts: 1,182
TFX AGA contains a string saying it was compiled with SAS C 6.51, and has e.g.
Code:
	TST.L	D0			;50e98: 4a80
	BEQ.S	LAB_24C4+2		;50e9a: 6704
	MOVEQ	#20,D0			;50e9c: 7014
LAB_24C4:
	CMPI.W	#$7000,D0		;50e9e: 0c407000
So perhaps generated from
Code:
  if (a)
    a=20;
paraj is offline  
Old 01 November 2023, 08:08   #4
mark_k
Registered User
 
Join Date: Aug 2004
Location:
Posts: 3,349
Thanks! I disassembled a few programs with code sequences like that years ago but can't remember which. I'd like to find (ideally) some small/tiny programs to use as an example too.
mark_k is offline  
Old 01 November 2023, 08:32   #5
paraj
Registered User
 
paraj's Avatar
 
Join Date: Feb 2017
Location: Denmark
Posts: 1,182
Only have 6.50 at hand, and I'm not sure what exactly triggers the optimization, but:
Code:
long foo(long a) { return a=a?20:0; }
compiled with sc OPT
turns into:
Code:
        CMPA.L  0(A4),A7
        DC.W    $6500
        DC.W    $0000
        MOVE.L  D7,-(A7)
        MOVE.L  8(A7),D7
        TST.L   D7
        BEQ.S   LAB_0001+2
        MOVEQ   #20,D7
LAB_0001:
        CMPI.W  #$7e00,D0
        MOVE.L  D7,D0
        MOVE.L  (A7)+,D7
        RTS
paraj is offline  
Old 01 November 2023, 10:04   #6
hooverphonique
ex. demoscener "Bigmama"
 
Join Date: Jun 2012
Location: Fyn / Denmark
Posts: 1,635
Does this sort of trick work on pipelined cpu's? I would suspect the instruction decoding to go haywire when jumping to the middle of an instruction.
hooverphonique is offline  
Old 01 November 2023, 10:26   #7
mark_k
Registered User
 
Join Date: Aug 2004
Location:
Posts: 3,349
I vaguely remember reading that maybe the 68060 didn't like it. Perhaps that type of code generation was removed from the final SASC versions???
mark_k is offline  
Old 01 November 2023, 11:23   #8
Photon
Moderator
 
Photon's Avatar
 
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,650
Quote:
Originally Posted by paraj View Post
Only have 6.50 at hand, and I'm not sure what exactly triggers the optimization, but:
Code:
long foo(long a) { return a=a?20:0; }
compiled with sc OPT
turns into:
Code:
        CMPA.L  0(A4),A7
        DC.W    $6500
        DC.W    $0000
        MOVE.L  D7,-(A7)
        MOVE.L  8(A7),D7
        TST.L   D7
        BEQ.S   LAB_0001+2
        MOVEQ   #20,D7
LAB_0001:
        CMPI.W  #$7e00,D0
        MOVE.L  D7,D0
        MOVE.L  (A7)+,D7
        RTS
What strange C code, and what terrible Asm code.

The straight conversion to Asm from C was written would be moveq #20,d0;rts which is much more than 10x faster.

If jumping into the middle of instructions works on pipelined CPUs it must cause a pipeline stall - the next instruction is likely already decoded and ea calced.

The reason for jumping into the middle of instructions is probably some convoluted/misguided idea about branch handling with values if some flag is set (and refusing to use scc).

Or maybe the code is in turn generated from some prior misguided implementation of same from another language. But most languages in the 70s had simple, straightforward, and fast branch handling.
Photon is offline  
Old 01 November 2023, 11:50   #9
ross
Defendit numerus
 
ross's Avatar
 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 54
Posts: 4,488
Nahh , this is the right code:
Code:
    move.l  d0,d1
    sne d1
    moveq   #20,d0
    and.b   d1,d0
Nothing strange in what paraj wrote, I do it that way quite often too.

ross is offline  
Old 01 November 2023, 12:01   #10
hooverphonique
ex. demoscener "Bigmama"
 
Join Date: Jun 2012
Location: Fyn / Denmark
Posts: 1,635
Is this
Quote:
Originally Posted by paraj View Post
Code:
long foo(long a) { return a=a?20:0; }
not the same as
Code:
long foo(long a) { return a?20:0; }
?
hooverphonique is offline  
Old 01 November 2023, 12:13   #11
ross
Defendit numerus
 
ross's Avatar
 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 54
Posts: 4,488
Yep, in this case the same because return expect a long.

The former is a generic ternary assignment.
ross is offline  
Old 01 November 2023, 12:31   #12
ross
Defendit numerus
 
ross's Avatar
 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 54
Posts: 4,488
Quote:
Originally Posted by ross View Post
Nahh , this is the right code:
mmh, this is faster on 000:
Code:
    tst.l   d0
    beq.b   .0
    moveq   #20,d0
.0
How much slower is it on advanced processors to use a branch?
ross is offline  
Old 01 November 2023, 12:49   #13
Olaf Barthel
Registered User
 
Join Date: Aug 2010
Location: Germany
Posts: 532
Quote:
Originally Posted by mark_k View Post
I vaguely remember reading that maybe the 68060 didn't like it. Perhaps that type of code generation was removed from the final SASC versions???
Indeed, it was removed for this reason (in version 6.58, if memory serves). I also recall that the 68060 not only failed to appreciate this clever hack, it would throw an exception, thus turning an optimization into a pessimization. Possibly, there was no choice but to perform a full pipeline flush because of the ambiguity of branch prediction and the opcode breaking down into two instructions.

You know what's worse about this problem? The Lattice 'C' compiler had this optimization in place at least since version 5, which AmigaOS 2.04 and beyond were built with. Plenty of opportunity to make the 68060 lose its edge over a loop optimization, even today
Olaf Barthel is offline  
Old 01 November 2023, 12:58   #14
ross
Defendit numerus
 
ross's Avatar
 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 54
Posts: 4,488
Quote:
Originally Posted by Olaf Barthel View Post
I also recall that the 68060 not only failed to appreciate this clever hack, it would throw an exception...
Wait wait, by this you mean that in the 68060.library (or somewhere in the bootroms of the accelerator board) there must be management of this exception otherwise many programs (or the very system) would not work?
(I don't know much about the 68060, but it's always interesting to 'discover' these, for me, 'new features' )
ross is offline  
Old 01 November 2023, 15:00   #15
paraj
Registered User
 
paraj's Avatar
 
Join Date: Feb 2017
Location: Denmark
Posts: 1,182
Yes, the code is silly, it was just the shortest code where I (quickly) could get it to use the optimization (and yeah, it fails to notice the dead store).

I don't think there's any problem with it on 060. It certainly doesn't trigger an exception or misbehave (*).
Also think (if I'm reading the manual correctly) that the branchy version (from Ross) will be slightly faster if the branch is correctly predicted with high probability (>80%).

(*) MC68060UM 11.1.1 does list a case where there can be an issue with branch prediction, but it only concerns the TRAPF instruction.
paraj is offline  
Old 01 November 2023, 18:15   #16
Thomas Richter
Registered User
 
Join Date: Jan 2019
Location: Germany
Posts: 3,300
Quote:
Originally Posted by Olaf Barthel View Post
I also recall that the 68060 not only failed to appreciate this clever hack, it would throw an exception, thus turning an optimization into a pessimization.

It's really not that bad - there is no exception in this case (and yes, I just verified), at least not when I tried. It probably just messes with its branch prediction cache and probably the instruction pipeline.



The famous "clear the branch cache" handling in the 68060.library is for a different case. This exception is triggered if the MMU configuration changed under an existing (logically indexed) branch cache entry, and a prefetch of a branch exception on a (no-longer existing) branch which is predicted as taken into a no longer existing page. This triggers this exception as the CPU attempts to prefetch an instruction at the branch target at a page that is no longer resident.


In such a case, you need of course to flush the branch cache because the "seemingly" branch instruction is no longer a branch instruction, and the target of the branch is of no relevance.


So this exception is not very mysterious, just the description in the UM is quite opaque. What the manual calls a "context switch" is a "MMU reconfiguration".
Thomas Richter is offline  
Old 01 November 2023, 18:48   #17
Photon
Moderator
 
Photon's Avatar
 
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,650
I guess this boils down to my criticism of C as a wannabe low-level language, sometimes requiring unreadable or illogical code to control how some compiler version translates code.

In other languages, a=a is a comparison and comparing a variable with itself is unnecessary. C broke with all other languages and made = mean assignment, and it's equally unnecessary.

a==0?0:20, a==false?0:20, or a?20:0 would have been more logical and readable. (Last two, more readable with a cast.)

Pseudocode
Code:
if a=0 then result <- 0 else result <- 20
return result
For which the straight translation is,
Code:
foo:	cmp.l #0,d0
	beq.s .then
.else:	move.l #20,d0
	rts
.then:	move.l #0,d0
	rts
And it could still be faster than the illogical code you had to write, for a certain version of a compiler to optimize it into that mess of Assembly.

I know all relevant information to write a top performance Assembly program on 68060. I also know that it's hugely up to cache misses and very little else, and that even if you actually run it to measure it, your benchmark could be artificial; unrepresentative.

This means that if someone says, 'this is the fastest code', the first questions asked should be: in which exact context is it running, how many times is it called, how is it called, are interrupts running, etc. If the resulting code uses fast ALU instruction, is short, has few memory accesses, and has few branches, it's reasonable to guess that it will be faster, but this is far from true on CPUs with big caches, good branch prediction, inexpensive multiplication operations, etc.

The resulting code from this simple function is very strange, even if execution speed is not involved at all. It's not smallest, it's not fastest, it doesn't know about Scc and other features of the 68000 family, but it tries to apply some hack that will stall the pipeline on a 68000 CPU that has one.

Last edited by Photon; 01 November 2023 at 22:00.
Photon is offline  
Old 01 November 2023, 18:59   #18
paraj
Registered User
 
paraj's Avatar
 
Join Date: Feb 2017
Location: Denmark
Posts: 1,182
Quote:
Originally Posted by Thomas Richter View Post
It's really not that bad - there is no exception in this case (and yes, I just verified), at least not when I tried. It probably just messes with its branch prediction cache and probably the instruction pipeline.

The famous "clear the branch cache" handling in the 68060.library is for a different case. This exception is triggered if the MMU configuration changed under an existing (logically indexed) branch cache entry, and a prefetch of a branch exception on a (no-longer existing) branch which is predicted as taken into a no longer existing page. This triggers this exception as the CPU attempts to prefetch an instruction at the branch target at a page that is no longer resident.

In such a case, you need of course to flush the branch cache because the "seemingly" branch instruction is no longer a branch instruction, and the target of the branch is of no relevance.

So this exception is not very mysterious, just the description in the UM is quite opaque. What the manual calls a "context switch" is a "MMU reconfiguration".
Very interesting, thanks for the info. Indeed the UM is not very clear, but my takeaway was that it was not something that would come up in most Amiga cases (and I think that's the correct reading, right?).


I also tested earlier (before writing my reply) and there are no unexpected slowdowns even if the branches are unpredictable etc. I think that would also be very surprising.


Maybe there's an idea that the 060 somehow caches "instructions" (or remembers instruction boundaries). It doesn't. The instruction cache is "dumb", and when a branch is mispredicted all prior knowledge is lost and it flushes the instruction pipeline and restarts fetching from the correct address. (Not 100% true, but I think this is close enough). This can be observed easily with instruction fetch limited loops.
paraj is offline  
Old 01 November 2023, 20:36   #19
Minuous
Coder/webmaster/gamer
 
Minuous's Avatar
 
Join Date: Oct 2001
Location: Canberra/Australia
Posts: 2,669
Quote:
Originally Posted by Photon View Post
C broke with all other languages and made = mean assignment
That's not correct, there were already lots of languages using = for assignment.
Minuous is offline  
Old 01 November 2023, 21:10   #20
Olaf Barthel
Registered User
 
Join Date: Aug 2010
Location: Germany
Posts: 532
Quote:
Originally Posted by Photon View Post
I guess this boils down to my criticism of C as a wannabe low-level language, sometimes requiring unreadable or illogical code to control how some compiler version translates code.
The 'C' programming language was shaped by many desires and lessons learned from previous experience, not the least by the arduous Multics development cycle.

One reason why the 'C' language designers chose to keep the number of operators very small was the same as for keeping the number of reserved keywords small, e.g. reusing "static" for more than one purpose. The compiler and language designers wanted a very small language, which made for a small compiler and thereby also for fast compiler runs. If you compare 'C' against its precursors, with BCPL at the starting line, you'll find that the language became smaller and smaller with each step towards 'C'. The syntax also became more focused and less "baroque", if you will.

The Unix implementors wanted a system development language like BCPL, but on their own terms. They already had enough experience with assembly language not to build a better assembly language development platform instead (I have seen what that can produce: Elate). They chose BCPL with its Algol 68 legacy as the starting point for their work. 'C' may look like it was just one step above assembly language, but that was not the primary objective, more like what you get when you sharpen a knife blade over and over again: you'll bleed for it

There's a good description of how 'C' came to be in Peter van der Linden's book "Expert 'C' programming: Deep 'C' secrets". When I read it years ago, I was baffled by the explanations offered why 'C' became what it is. Really, that book explains so much about the 'C' language's collection of weird aspects it almost makes you appreciate it more than you did before But it is hard to like the language which keeps luring you into setting painful traps for yourself.
Olaf Barthel is offline  
 


Currently Active Users Viewing This Thread: 1 (0 members and 1 guests)
 
Thread Tools

Similar Threads
Thread Thread Starter Forum Replies Last Post
Who can help generating an IPF of Andromeda Mission? apex Amiga scene 5 25 December 2021 13:34
Generating FFP values at compile time deimos Coders. C/C++ 28 13 July 2021 21:19
Generating an accurate Paula period table 8bitbubsy Coders. General 55 07 September 2020 21:04
WinUAELoader: generating single game .uae? Telegattone support.WinUAE 1 27 December 2016 12:28
Software for generating screenshots and videos Edi (FZ2D) Retrogaming General Discussion 5 08 April 2010 23:34

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off

Forum Jump


All times are GMT +2. The time now is 03:38.

Top

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2024, vBulletin Solutions Inc.
Page generated in 0.11054 seconds with 13 queries