C compiler generating instructions which can be jumped into the middle of? - Page 2

Thomas Richter · 01 November 2023, 22:20

Quote:

Originally Posted by paraj

Very interesting, thanks for the info. Indeed the UM is not very clear, but my takeaway was that it was not something that would come up in most Amiga cases (and I think that's the correct reading, right?).

Indeed, that situation would relatively rarely come up at all.

Quote:

Originally Posted by paraj

Maybe there's an idea that the 060 somehow caches "instructions" (or remembers instruction boundaries). It doesn't. The instruction cache is "dumb", and when a branch is mispredicted all prior knowledge is lost and it flushes the instruction pipeline and restarts fetching from the correct address. (Not 100% true, but I think this is close enough). This can be observed easily with instruction fetch limited loops.

I can only speculate about how the branch cache operates. If you attempt to decipher what the 68060UM provides, it seems that the processor keeps pairs of (logical address, target address) for each taken branch, and performs instruction prefetch from the "target address" if the prefetch address hits the "logical address" in that cache; otherwise it prefetches in increasing address order. That would very well explain *why* the instruction prefetch could trigger an access error, and why flushing the branch cache helps (or is recommended).
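To make that speculation concrete, here is a minimal C model of such a branch cache. It assumes a direct-mapped table keyed by the logical fetch address; the table size (BC_ENTRIES), the index function and the 4-byte sequential fetch step are hypothetical choices for illustration, not the real 68060 geometry. It shows why a stale entry matters: once the logical-to-physical mapping changes, a hit still redirects prefetch to the old target, which may no longer be mapped.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical size; NOT the real 68060 branch cache geometry. */
#define BC_ENTRIES 256

typedef struct {
    int      valid;
    uint32_t branch_addr;   /* logical address at which the taken branch was fetched */
    uint32_t target_addr;   /* where prefetch continued the last time */
} bc_entry;

static bc_entry branch_cache[BC_ENTRIES];

/* 68k instructions are word aligned, so drop bit 0 before indexing. */
static unsigned bc_index(uint32_t addr)
{
    return (unsigned)((addr >> 1) % BC_ENTRIES);
}

/* Remember a taken branch as a (logical address, target address) pair. */
void bc_record(uint32_t branch_addr, uint32_t target_addr)
{
    bc_entry *e = &branch_cache[bc_index(branch_addr)];
    e->valid = 1;
    e->branch_addr = branch_addr;
    e->target_addr = target_addr;
}

/* Next prefetch address: the cached target on a hit, else the next sequential fetch. */
uint32_t bc_next_fetch(uint32_t fetch_addr)
{
    const bc_entry *e = &branch_cache[bc_index(fetch_addr)];
    if (e->valid && e->branch_addr == fetch_addr)
        return e->target_addr;
    return fetch_addr + 4;   /* assumed fetch step, for illustration only */
}

/* Invalidate all entries, as one would after an MMU context change. */
void bc_flush(void)
{
    memset(branch_cache, 0, sizeof branch_cache);
}
```

After bc_flush() the model falls back to plain sequential prefetch, which is the safe behaviour the flush recommendation is after.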

The instruction cache itself is a relatively stupid mechanism that sits "behind" the MMU, on the physical side of the bus, and buffers instruction fetches issued by the prefetch pipeline sitting at the other end of the MMU. Since the instruction cache is physically indexed (unlike the branch cache), it does not require flushing on "MMU context changes".

The MMULib always flushes the branch cache on MMU context changes, but that is *not* a task switch: a task switch (typically, at least) stays within the same context. I believe it was this chapter in the 68060UM that inspired me to name such objects "MMUContexts".

Photon · 01 November 2023, 23:07

Quote:

Originally Posted by paraj

The instruction cache is "dumb", and when a branch is mispredicted all prior knowledge is lost and it flushes the instruction pipeline and restarts fetching from the correct address. (Not 100% true, but I think this is close enough). This can be observed easily with instruction fetch limited loops.

What's in cache is just data, even code. The 68060 will by design predict the branch as taken and must discard the work done in the pipeline, because the half-instruction isn't decoded. The pipeline is made to decode instructions in sequence. Any CPU might attempt to decode at least the first instruction in case the branch is taken, but the generated code prevents any such thing in any CPU, and also thwarts debugging efforts if a problem is caused by hacky code of this sort.

Quote:

Originally Posted by Minuous

That's not correct; there were already lots of languages using = for assignment.

There are two sides to this: the CSci side, which says that comparison operators are well established and should be kept, and the other route that C took at the time. Distinguishing comparison from assignment is obviously vital to readability, but in this case, as written, the code makes no sense whether it is read as a comparison or as an assignment. The one-liner in the OP looks that bad because it triggers some optimization in a specific version of a compiler, which has nothing to do with a language; this is very bad for readability, and very far from CSci (=educational) and Pseudocode.

Some languages had a syntax that could distinguish assignment from comparison with the same operator. A simple example would be BASIC, requiring LET, and some dialects could later remove the LET. Other languages need superfluous parentheses around the simplest comparisons to make the distinction.

C added further hacky operators, such as += etc., which also deviated from all other languages. a <- a + b reads perfectly well and can be unambiguously interpreted by any interpreter or compiler.
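One concrete difference between a += b and the spelled-out a = a + b is worth noting: the C standard says that in E1 op= E2 the left operand E1 is evaluated only once. A minimal sketch; next_slot() and its call counter are hypothetical helpers that exist only to make the number of evaluations visible.

```c
/* Hypothetical helper state, used only to count evaluations of the lvalue. */
static int calls = 0;
static int slots[4];

/* Returns a pointer to the next slot and counts how often it was called. */
static int *next_slot(void)
{
    return &slots[calls++ % 4];
}

/* Compound assignment: the lvalue *next_slot() is evaluated exactly once. */
void add_compound(int b)
{
    *next_slot() += b;
}
```

Spelled out as *next_slot() = *next_slot() + b; the helper would run twice, so the value would be read from one slot and stored into another.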

I take the CSci route of <insert famous language developer/mathematician>. I can make a language filled with various hacky shorthand combination symbols that makes any reader of just normal code scratch his head. And then some more. And then some more.

But I won't, because all code is an expression of thought. I'd much rather that every language calling itself one got close to Pseudocode, which is vetted for the expressibility and readability of these thoughts. We don't want weird dialects anymore, and they weren't desirable then either.

Quote:

Originally Posted by Olaf Barthel

The 'C' programming language was shaped by many desires and lessons learned from previous experience, not the least by the arduous Multics development cycle.

One reason why the 'C' language designers chose to keep the number of operators very small was the same as for keeping the number of reserved keywords small, e.g. reusing "static" for more than one purpose. The compiler and language designers wanted a very small language, which made for a small compiler and thereby also for fast compiler runs. If you compare 'C' against its precursors, with BCPL at the starting line, you'll find that the language became smaller and smaller with each step towards 'C'. The syntax also became more focused and less "baroque", if you will.
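As an illustration of the "static" reuse mentioned above, a minimal sketch with made-up names: at file scope, static gives an identifier internal linkage (it is private to the file), while inside a function it gives a variable static storage duration, so its value persists between calls.

```c
/* File scope: "static" means internal linkage; the name is private to this file. */
static int file_private_calls = 0;

/* Block scope: "static" means static storage duration; id keeps its value between calls. */
int next_id(void)
{
    static int id = 0;
    file_private_calls++;
    return ++id;
}
```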

The Unix implementors wanted a system development language like BCPL, but on their own terms. They already had enough experience with assembly language not to build a better assembly language development platform instead (I have seen what that can produce: Elate). They chose BCPL with its Algol legacy as the starting point for their work. 'C' may look like it was just one step above assembly language, but that was not the primary objective; it is more like what you get when you sharpen a knife blade over and over again: you'll bleed for it.

There's a good description of how 'C' came to be in Peter van der Linden's book "Expert 'C' Programming: Deep 'C' Secrets". When I read it years ago, I was baffled by the explanations offered for why 'C' became what it is. Really, that book explains so much about the 'C' language's collection of weird aspects that it almost makes you appreciate it more than you did before.

But it is hard to like the language which keeps luring you into setting painful traps for yourself.

History is the description of how things happened, shrug. The pain I describe is what we could have avoided, so that from the 1990s onward programmers could simply express their thoughts normally in the CSci languages.

Back on topic: every language has a way of making a comparison that sets condition flags, because that's the only way to create if-statements like the one here. Turning down the CSci languages is why one is forced to write a=a code that expresses no thought, trying to provoke some optimization in some compiler version, only to get out an unanalyzable clump of assembly code that is far from optimized. And this is also true for innumerable obscure C statements that have no root in how a CPU works or in how you should express a thought in code.

paraj · 02 November 2023, 08:40

Quote:

Originally Posted by Thomas Richter

I can only speculate about how the branch cache operates. If you attempt to decipher what the 68060UM provides, it seems that the processor keeps pairs of (logical address, target address) for each taken branch, and performs instruction prefetch from the "target address" if the prefetch address hits the "logical address" in that cache; otherwise it prefetches in increasing address order. That would very well explain *why* the instruction prefetch could trigger an access error, and why flushing the branch cache helps (or is recommended).

Yes, something like that makes sense. There's really no time to do much of anything besides continuing the decode at the target address if a branch cache entry exists (if the branch is assumed to be taken, as it always is for bra). It's hard to get a 0-cycle cost otherwise.

Quote:

Originally Posted by Photon

What's in cache is just data, even code. The 68060 will by design predict the branch as taken and must discard the work done in the pipeline, because the half-instruction isn't decoded. The pipeline is made to decode instructions in sequence. Any CPU might attempt to decode at least the first instruction in case the branch is taken, but the generated code prevents any such thing in any CPU, and also thwarts debugging efforts if a problem is caused by hacky code of this sort.

It has a 5-state branch predictor, and e.g. alternating between taken/not-taken is just as fast as an always taken/not-taken branch (after warm-up). Also, from Table 10-17 it seems to assume that backward branches (Bcc) are taken and forward ones are not taken in the absence of any information.
I still don't see how the code could cause issues for the CPU. It will decode instructions based on the state of the prediction, and it will either be right or it will "backtrack" and still do the right thing (losing a couple of cycles).
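The static fallback read from Table 10-17 can be written down as a one-line rule. A minimal sketch, assuming the rule just compares the branch target with the branch's own address; the 5-state dynamic predictor used once history exists is not modelled, and treating a branch to itself as "backward" is an assumption.

```c
#include <stdint.h>

typedef enum { PREDICT_NOT_TAKEN = 0, PREDICT_TAKEN = 1 } prediction;

/* Static guess for a Bcc with no history:
   backward (target at or below the branch, as in a loop) => taken,
   forward => not taken. Unconditional BRA is always taken and is not handled here. */
prediction static_predict_bcc(uint32_t branch_addr, uint32_t target_addr)
{
    return target_addr <= branch_addr ? PREDICT_TAKEN : PREDICT_NOT_TAKEN;
}
```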

bebbo · 02 November 2023, 09:05

Quote:

Originally Posted by Photon

a <- a + b reads perfectly well and can be unambiguously interpreted by any interpreter or compiler.

That evaluates to true if b > 2a, and false otherwise.
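That reading follows from how C tokenizes the expression: there is no <- operator, so the characters are read as a < -a + b. Unary minus binds tighter than +, and + binds tighter than <, so this is a < ((-a) + b), i.e. 2a < b. A small sketch to check it (compilers may warn about the misleading spacing):

```c
#include <stdbool.h>

/* Parsed as a < ((-a) + b): there is no "<-" token in C. */
bool arrow_expr(int a, int b)
{
    return a <- a + b;
}
```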

hooverphonique · 02 November 2023, 13:31

Quote:

Originally Posted by bebbo

That evaluates to true if b > 2a, and false otherwise.

I think this is meant to express "let a = a + b".

Bruce Abbott · 02 November 2023, 17:17

Quote:

Originally Posted by mark_k

Sorry if this sounds a little vague...

I seem to remember examining the code of some programs using the ReSource disassembler a long time ago, and noticing something a little odd.

(This isn't an actual code example, just trying to illustrate the point from my vague recollection.)

The C compiler had been generating code like

Code:

TST.W D2
BEQ.B *+2
CMPI.W #$7000,D0  ;$7000 is opcode for MOVEQ #0,D0

...

Does anyone remember which compiler could have output code sequences like that? Maybe some version of SAS/C???

Yes, it's SAS/C.

I call this the 'skipword' instruction. It's used like this:-

Code:

 tst.w d2
 beq.s .zero
 moveq #5,d0
 skipword          ; macro, e.g. dc.w $0C40 (cmpi.w #imm,d0 opcode), swallows the next word
.zero:
 moveq #0,d0
.next:

The idea is that instead of having to branch around the 'moveq #0,d0', it simply skips over it by executing cmp.w #xx,d0 (where xx is the opcode word of the next instruction). This is faster than branching, and 'safe' because it doesn't affect anything except the flags.
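To see what "jumping into the middle" of the cmp means at the word level, here is a toy C interpreter that knows only two opcodes: MOVEQ #nn,D0 ($70nn) and CMPI.W #imm,D0 ($0C40 followed by an immediate word). It is a sketch, not an emulator: it assumes skipword emits $0C40 (as in mark_k's CMPI.W example) and ignores flags. The words after tst/beq in the example above are $7005, $0C40, $7000, with .zero labelling the last one.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy interpreter for two 68000 opcodes only; returns the final value of D0.
   0x70nn : MOVEQ #nn,D0     (one word, nn sign-extended)
   0x0C40 : CMPI.W #imm,D0   (opcode word plus one immediate word; flags not modelled)
   Any other word stops execution. */
int32_t run_toy(const uint16_t *code, size_t nwords, size_t pc)
{
    int32_t d0 = -1;                      /* arbitrary "not yet written" value */
    while (pc < nwords) {
        uint16_t op = code[pc];
        if ((op & 0xFF00u) == 0x7000u) {  /* MOVEQ #nn,D0 */
            d0 = (int8_t)(op & 0xFFu);
            pc += 1;
        } else if (op == 0x0C40u) {       /* CMPI.W #imm,D0 */
            pc += 2;                      /* next word is consumed as the immediate */
        } else {
            break;
        }
    }
    return d0;
}
```

Entering at word 0 (the fall-through path, D2 non-zero) yields 5, because $7000 is swallowed as the CMPI immediate; entering at word 2 (the beq target .zero) executes the same word as MOVEQ #0,D0 and yields 0.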

It's a tiny optimization that is often swamped by inefficient code elsewhere, but it does work so...

In assembler I would probably do something like this instead:-

Code:

 moveq #5,d0
 tst.w d2
 bne.s .next
 moveq #0,d0
.next:
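Both sequences compute the same value. A minimal C sketch of that value, written two ways to mirror the two code shapes (the function names are made up, and nothing here implies which shape a given compiler emits):

```c
#include <stdint.h>

/* Mirrors the skipword shape: pick one of two loads depending on D2. */
int32_t pick_branchy(int16_t d2)
{
    int32_t d0;
    if (d2 == 0)
        d0 = 0;
    else
        d0 = 5;
    return d0;
}

/* Mirrors the hand-written shape: load 5 first, overwrite only when D2 is zero. */
int32_t pick_preload(int16_t d2)
{
    int32_t d0 = 5;
    if (d2 == 0)
        d0 = 0;
    return d0;
}
```

In the hand-written version the moveq #5,d0 comes before tst.w d2, so that the tst.w, not the moveq, sets the flags that bne.s tests.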

Photon · 03 November 2023, 19:00

Quote:

Originally Posted by bebbo

That evaluates to true if b > 2a, and false otherwise.

Quote:

Originally Posted by hooverphonique

I think this is meant to express "let a = a + b".

Yes. In Pseudocode, a single left arrow symbol is used as the assignment operator. This symbol was present on the keyboard (as a character) of some systems in the 1970s, for those languages.

I tried using the corresponding Unicode character here, but it wasn't displayed correctly. Raw HTML is not allowed, e.g.

HTML Code:

&larr;

For these basic operators, there was a push for a more unified and math-like set of operator symbols in the mid-1980s.

The Pascal family uses inc(a,b), which is "increase a by b" (as opposed to "add them and replace a with the result"). And if an operator is a function, a language can allow it to be overloaded: a paradigm used widely today so that functions need not be named after their object type, and the same could apply to custom types in languages that support them.

In other languages used today, a right arrow is used for conditional deferred execution (and sometimes for a few more dissimilar ideas within the same language, which is not ideal). Using it for every statement, i.e. having the result at the end of a chain of manipulations, would make the language work like a CPU does, and turn advanced mathematical expressions into a chain, reducing any level of expression nesting to a stack depth of 1. ("The opposite of LISP".)
