Apollo Maggie 3D Chip - Preview Demo - Page 7

meynaf · 04 April 2022, 17:24

Quote:

Originally Posted by Thomas Richter

Nobody, performing productive work. What you do in your spare time is of course your business, but in professional software development, rarely ever, maybe some embedded projects if at all.

"professional software" (aka boring business software) is OT here.

Quote:

Originally Posted by Thomas Richter

We switched a long time ago from hand-optimized assembler to hand-tuned C++ code with CPU-specific vector extensions. Of course it is all "measure - tune - look at compiler code", so reading assembler is helpful to some degree, but writing it, no longer.

If you can't write it, you can't read it either. So writing it IS useful in all cases.
And even when you don't read it anymore, asm knowledge is useful for writing better code regardless of the language.

Quote:

Originally Posted by Thomas Richter

Compilers became quite powerful, and it's hard to beat them, and typically not worth the trouble. You get more gains by CPU specific optimizations.

Hard to beat ? I routinely beat GCC by a factor of at least 2, when it's not 4 or even more.

Quote:

Originally Posted by Thomas Richter

Mostly, it is, yes. Any moderately sized project in assembler is a PITA. It takes too long to write, too long to debug and too long to understand. Creating software is to a good degree about creating a structured architecture you and your colleagues are able to handle, and that's just not given with assembler.

It seems we've had a similar discussion in the past and it has led nowhere...
Anyhow, large projects involving several programmers are a PITA regardless of the language.

Quote:

Originally Posted by Thomas Richter

Assembler is mostly unstructured program flow and hard to follow. I'm using (for the 68K legacy projects) a lot of macros that create structure (saveregs/defvar/do-loop/for-next), just to avoid some of the typical assembler problems. It reads close to C in some respect, but there are still typical assembler problems left. Register allocation is a typical problem.

If you're writing asm like you would write C, you're missing the point.
Coding in asm needs a very different mindset, which some programmers simply don't have.
Register allocation is indeed typical, in the sense it's easy to handle while writing the code with the d8/a8 trick. The mistake is doing this allocation too early.
On the other hand, signed/unsigned mismatch is a typical C problem that does not occur in asm.
Different languages, different issues.

Quote:

Originally Posted by Thomas Richter

That depends on the project. My programs are "data-bound", so the code size does not really matter; what matters is the amount of data you can push through the code, and proper data flow such that the code can exploit the cache.

And this is where compiled code fails miserably.
In asm we can perform computations on 8-bit and 16-bit entities directly, where C insists upon converting everything to "int" (usually 32-bit) - and programmers usually don't care about size (after all, we've got plenty of memory !).
This leads to more data to handle, and of course it means more pressure on data cache.

meynaf · 04 April 2022, 17:34

Quote:

Originally Posted by Promilus

On the other hand I find that "let's make Diablo for Amiga, let's port Tomb Raider for Amiga" rather amusing. While this might allow some new developers to prove themselves it doesn't bring anything really new to amiga world.

For my part, i value doing game ports as it allows fixing errors in original games, sometimes even adding new features. And only Amiga makes this possible.
But i reckon game design appears to be a little out of new ideas. It seems to me that games were more innovative in 8-bit days.

dreadnought · 04 April 2022, 17:44

Quote:

Originally Posted by grond

Actually writing simple 3D games may be easier than writing good looking and entertaining 2D games. That's the primary reason why there are so many 1st person shooters. You need a few textures, a couple of monsters, some sound fx and then you just need maps, maps, maps.

I think "may be" is the key phrase in this paragraph. The AI & the engine won't write themselves, and you'll also need quality textures, models, maps, story, etc - so it's just like a 2D game really.

Promilus · 04 April 2022, 19:39

Quote:

In asm we can perform computations on 8-bit and 16-bit entities directly, where C insists upon converting everything to "int" (usually 32-bit)

It depends heavily on which optimization level is chosen and how exactly the code is written. If I use int8_t or char it WILL work with byte operations when optimization is set to none or size only. Weird things start to happen when -O2 or -O3 is chosen. But apparently that way IS faster.

Quote:

But i reckon game design appears to be a little out of new ideas. It seems to me that games were more innovative in 8-bit days

I have to agree with that. Nowadays there's plenty of "exceptional graphics" but fresh ideas ... not really. Just remasters, prequels, sequels ... achievements, finishers, microtransactions, quick time events etc. But unique storyline, gameplay... better search in indie...

meynaf · 04 April 2022, 20:12

Quote:

Originally Posted by Promilus

It depends heavily on which optimization is chosen and how exactly code is written.

Assuredly, and that's the problem.
Usually the programmer doesn't know what he's doing.

In addition, compilers must follow the specs and limitations of the source language where asm programmers must only get the right result. This means some optimizations are forever "forbidden" to even the best compilers.

paraj · 04 April 2022, 21:04

Quote:

Originally Posted by grond

The Pentium doesn't do more instructions per cycle than the 060 and most certainly not more than the 080.

IIRC Pentium allows FPU operations to execute concurrently with integer operations (famously used to interleave floating point divisions with the texture mapping inner loop in Quake), while an 060 will stall (again IIRC), but for integer stuff they seem mostly comparable architecture wise.
Don't know if that factors in to the estimate or how that's handled on the 080.

Thomas Richter · 04 April 2022, 21:31

Quote:

Originally Posted by meynaf

"professional software" (aka boring business software) is OT here.

Oh, you don't want to talk about it? But I do.

Quote:

Originally Posted by meynaf

If you can't write it, you can't read it either.

I can't write x86 assembler, but I can read it perfectly fine. I don't want to write it, actually. Neither do I want to write arm code, but I can read it fine. Yet, I don't want to care about all the details at instruction level.

Quote:

Originally Posted by meynaf

Hard to beat ? I routinely beat GCC by a factor of at least 2, when it's not 4 or even more.

Well, go along, and demonstrate.

Quote:

Originally Posted by meynaf

It seems we've had a similar discussion in the past and it has led nowhere...

To some conclusions, at least. You're a die-hard. Sure, as a hobbyist, fine. But that doesn't make this a relevant craft or a sane approach for software development.

Quote:

Originally Posted by meynaf

Anyhow, large projects involving several programmers are a PITA regardless of the language.

With you in the team, for sure. (-: The problem is: Most software is large, and requires more than one person in the team to create. Software is not about "hacking code", but creating architecture that is clear enough so other people can read, understand and use it.

Quote:

Originally Posted by meynaf

If you're writing asm like you would write C, you're missing the point.

No, you are missing the point. If you apply the mindset that assembler is a lump of code without structure, you will fail with every medium sized project because software is about structure and architecture. That is something very substantial you don't seem to understand - probably because you never worked on a serious project large enough to stumble over your own feet.

Quote:

Originally Posted by meynaf

On the other hand, signed/unsigned mismatch is a typical C problem that does not occur in asm.

Of course it does. If you compare a function argument that can represent a negative value with a value that can get large enough to set the MSB, then you do have a problem. The C compiler is just pointing you to a problem you might have easily overlooked when writing in assembler. Yes, you can solve it, but to solve a problem, you need to see it, and the compiler is able to point you to the problem.

Quote:

Originally Posted by meynaf

And this is where compiled code fails miserably.
In asm we can perform computations on 8-bit and 16-bit entities directly, where C insists upon converting everything to "int" (usually 32-bit) - and programmers usually don't care about size (after all, we've got plenty of memory !).

When's the last time you looked at compiled code, again? A compiler will typically eliminate the upconversion and downconversion.

Quote:

Originally Posted by meynaf

This leads to more data to handle, and of course it means more pressure on data cache.

Cache friendly code is not about low-level optimizations like that. Cache friendly code means that you need to create a "data-flow" architecture within which the data remains "hot" all the time. Cache friendly means having the right architecture, not worrying about upconversion of data. The compiler will take care of the latter just fine by itself.

Gorf · 04 April 2022, 21:32

Quote:

Originally Posted by paraj

IIRC Pentium allows FPU operations to execute concurrently with integer operations (famously used to interleave floating point divisions with the texture mapping inner loop in Quake), while an 060 will stall (again IIRC), but for integer stuff they seem mostly comparable architecture wise.

Both are correct.

Quote:

Shay Shariatzadeh

the FPU of the 68080 is fully pipelined, meaning it works in parallel to the integer ALU and can take full advantage of interleaving operations.

Pollock · 04 April 2022, 22:57

Quote:

Originally Posted by Gorf

Both are correct. The FPU of the 68080 is fully pipelined, meaning it works in parallel to the integer ALU and can take full advantage of interleaving operations.

yeah, but the FPU in my 68060 is faster than the Vampire FPU

Gorf · 04 April 2022, 23:00

Quote:

Originally Posted by Pollock

yeah, but the FPU in my 68060 is faster than the Vampire FPU

Is it?
I don't think so.

Pollock · 04 April 2022, 23:07

Quote:

Originally Posted by Gorf

Is it?
I don't think so.

actually, yes, it is.

Go benchmark both back to back, and you will see what i mean.

Gorf · 04 April 2022, 23:21

Quote:

Originally Posted by Pollock

actually, yes, it is.

Go benchmark both back to back, and you will see what i mean.

Are you talking V2 or V4?

Unless your 68060 is clocked over 150 MHz, I don't see it.
Or are you talking 80bit precision?

meynaf · 05 April 2022, 08:32

Quote:

Originally Posted by Thomas Richter

Oh, you don't want to talk about it? But I do.

You sure do, but it is OT not only on this thread but also on this whole site.
Wanting to do some strawman fallacy maybe ?

Quote:

Originally Posted by Thomas Richter

I can't write x86 assembler, but I can read it perfectly fine. I don't want to write it, actually. Neither do I want to write arm code, but I can read it fine. Yet, I don't want to care about all the details at instruction level.

If you don't care about the details, you also can't read it fine (you can read it, but not fine).

Quote:

Originally Posted by Thomas Richter

Well, go along, and demonstrate.

You could benchmark my flac decoder and my picture viewer, for example.
Good luck with your compiled code to even approach their performance.
Or you could open a thread here for a nice asm-vs-compiler coding contest.

Quote:

Originally Posted by Thomas Richter

To some conclusions, at least. You're a die-hard. Sure, as a hobbyist, fine. But that doesn't make this a relevant craft or a sane approach for software development.

I don't have this approach for business software, obviously, as such software won't use 68k at all and i clearly don't want to mess with x86 or whatever asm.

Quote:

Originally Posted by Thomas Richter

With you in the team, for sure. (-: The problem is: Most software is large, and requires more than one person in the team to create. Software is not about "hacking code", but creating architecture that is clear enough so other people can read, understand and use it.

With me in the team, the project won't go overly large.
What you are creating with your "large" projects is actually just waste, aka bloatware. I don't create bloatware (and when i have to work on some, i don't make things worse).
But even so, the complexity of a project shouldn't rise much with its size; there is something called modularity to cope with that.

Quote:

Originally Posted by Thomas Richter

No, you are missing the point. If you apply the mindset that assembler is a lump of code without structure, you will fail with every medium sized project because software is about structure and architecture. That is something very substantial you don't seem to understand - probably because you never worked on a serious project large enough to stumble over your own feet.

It appears you didn't understand it. I never wrote that assembler is a lump of code without structure, this is only your belief (or your own way to write asm, which could explain a few things).
Actually, a good asm program has more structure than with your average compiled code, simply because it's less tolerant to bad programming. It's like a violin - no space for mediocrity.
Of course software is about structure and architecture - but those do not depend that much on the used language.
And please do not make assumptions on the projects i worked on - you are clearly far from reality.

Quote:

Originally Posted by Thomas Richter

Of course it does. If you compare a function argument that can represent a negative value with a value that can get large enough to set the MSB, then you do have a problem. The C compiler is just pointing you to a problem you might have easily overlooked when writing in assembler. Yes, you can solve it, but to solve a problem, you need to see it, and the compiler is able to point you to the problem.

No. Totally wrong. In asm the code indicates the signedness and is unambiguous, while the C compiler doesn't say a thing if you keep the default signed data type on something that should obviously be unsigned. A typical case is a "<$20" test that confuses accented characters with control codes.
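
A minimal sketch of the "<$20" case in C, for readers who have not run into it. The function names are made up for this illustration, and the sketch assumes 8-bit bytes with two's complement representation, where the Latin-1 code 0xE9 ("é") stored in a signed 8-bit type holds the value -23:

```c
#include <stdbool.h>

/* Buggy variant: with a signed 8-bit type, the byte values 0x80..0xFF are
   negative, so they also compare below 0x20 and look like control codes. */
bool is_control_signed(signed char c)
{
    return c < 0x20;
}

/* Intended test: the byte is treated as unsigned, so only 0x00..0x1F match. */
bool is_control_unsigned(unsigned char c)
{
    return c < 0x20;
}
```

Both functions compile without a diagnostic by default; whether plain `char` behaves like the first or the second variant depends on the platform.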

Quote:

Originally Posted by Thomas Richter

When's the last time you looked at compiled code, again?

I'm not watching recent developments for x86 or whatever, if it's what you meant.

Quote:

Originally Posted by Thomas Richter

A compiler will typically eliminate the upconversion and downconversion.

Not the ones producing code for the same machines i'm writing asm for.

Quote:

Originally Posted by Thomas Richter

Cache friendly code is not about low-level optimizations like that. Cache friendly code means that you need to create a "data-flow" architecture within which the data remains "hot" all the time. Cache friendly means having the right architecture, not worrying about upconversion of data. The compiler will take care of the latter just fine by itself.

Nope. Cache friendly code uses as small data as possible, and only asm does this fine.
Having the right architecture is of course mandatory but you won't see if your architecture is right or not by just trusting what the compiler does.
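
To put a rough number on the "small data" argument, here is a small sketch that counts how many cache lines an array occupies for a given element size. The helper name is hypothetical, the array is assumed to start on a line boundary, and the line size is passed in because it differs between CPUs:

```c
#include <stddef.h>
#include <stdint.h>

/* Number of cache lines covered by `count` elements of `elem_size` bytes,
   assuming the array starts on a line boundary. `line_bytes` is a parameter
   because the line size depends on the CPU. */
size_t lines_touched(size_t count, size_t elem_size, size_t line_bytes)
{
    size_t bytes = count * elem_size;
    return (bytes + line_bytes - 1) / line_bytes;  /* round up */
}
```

With an assumed 16-byte line, 1024 samples stored as `int16_t` cover 128 lines, while the same samples stored as `int32_t` cover 256. This only describes the memory layout; it says nothing about which instructions a given compiler emits to process the data.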

Mathesar · 05 April 2022, 09:26

Quote:

Originally Posted by Thomas Richter

Quote:

Originally Posted by meynaf

And this is where compiled code fails miserably.
In asm we can perform computations on 8-bit and 16-bit entities directly, where C insists upon converting everything to "int" (usually 32-bit)

When's the last time you looked at compiled code, again? A compiler will typically eliminate the upconversion and downconversion.

Meynaf is right in some cases, but it is more of a processor problem than a compiler problem. I am a hardware developer and most of my coding is in ANSI C for embedded processors, which nowadays means ARM, more specifically ARM Cortex-M-something in "Thumb" mode. And the stupid thing with ARM is that all arithmetic instructions can only work on 32-bit values. So, if you call a function with 8-bit or 16-bit variables, the compiler will convert those into 32-bit values before doing any arithmetic on them. This conversion even takes an extra instruction and thus an extra cycle.
68K is more flexible in that regard as it can do arithmetic on 8/16/32 bit values.

As for the discussion about code size: the reason ARM has the "Thumb" mode is all about code size. Classic ARM code uses 32-bit instruction words while Thumb mode uses 16-bit instruction words. And 32-bit instruction words kinda add up when you are programming on a 16 KB Flash / 4 KB RAM device. (And don't get me started on GCC with its "newlib-nano". There is nothing "nano" about it.)

Mathesar · 05 April 2022, 09:49

Case in point:

Code:

int8_t Mul_8 (int8_t a, int8_t b)
{
    return (a*b);
}

Compiles to:

Code:

  26:main.c        **** int8_t Mul_8 (int8_t a, int8_t b)
  27:main.c        **** {
  54              		.loc 1 27 0
  55              		.cfi_startproc
  56              		@ args = 0, pretend = 0, frame = 0
  57              		@ frame_needed = 0, uses_anonymous_args = 0
  58              		@ link register save eliminated.
  59              	.LVL0:
  28:main.c        ****     return (a*b);
  60              		.loc 1 28 0
  61 0000 4843     		muls	r0, r1
  62              	.LVL1:
  63 0002 40B2     		sxtb	r0, r0
  29:main.c        **** }

See the sxtb instruction before returning the result in r0? Notice the compiler was quite smart here by only sign extending the result after the multiplication and not on both operands as well before.
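
The narrowing step in the listing follows from C's integer promotion rules: `a` and `b` are promoted to `int`, multiplied as `int`, and the product is converted back to `int8_t` for the return value. As a sketch, the same function next to a hypothetical variant (`Mul_8_wide` is not from the post) that simply returns the full-width product:

```c
#include <stdint.h>

/* Same function as above: the operands are promoted to int, multiplied as
   int, and the product is narrowed back to int8_t on return. */
int8_t Mul_8(int8_t a, int8_t b)
{
    return (a * b);
}

/* Hypothetical variant: returns the int product unchanged, so the C code
   asks for no narrowing of the result. */
int Mul_8_wide(int8_t a, int8_t b)
{
    return a * b;
}
```

Whether the second form actually saves an instruction depends on the compiler, the target and the calling convention, so it is worth checking the generated code rather than assuming.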

meynaf · 05 April 2022, 10:01

Quote:

Originally Posted by Mathesar

68K is more flexible in that regard as it can do arithmetic on 8/16/32 bit values.

And that's where the problem lies : i've yet to see a 68k compiler doing this. Instead, they extend the data...
But even when the compiler is able, it does not imply it really will. Consider the case of a complex enough computation whose intermediate results can potentially overflow (but do not in our use case). If a and b are int8 and you write a+b, what is the type of the result ? Furthermore, it's easy to get 'int' implicitly by several means - consider writing 'c' for example, the spec says it's int and not char.
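
For reference, both questions have fixed answers in standard C, and they can be checked at compile time. A minimal sketch (C11 or later for `static_assert`; the helper name `add8_wide` is made up here):

```c
#include <assert.h>
#include <stdint.h>

/* Integer promotion: both operands of + are promoted to int before the
   addition, so the expression has type int, not int8_t. */
static_assert(sizeof((int8_t)1 + (int8_t)1) == sizeof(int),
              "int8_t + int8_t has type int");

/* In C (unlike C++), a character constant such as 'c' has type int. */
static_assert(sizeof('c') == sizeof(int), "'c' is an int in C");

/* Hypothetical helper: the sum is computed as int, so 100 + 100 gives 200
   here instead of wrapping at 8 bits. */
int add8_wide(int8_t a, int8_t b)
{
    return a + b;
}
```

This shows what the language requires of the abstract computation. A compiler is still free to use narrower instructions when it can prove the result is the same, which is the part the thread disagrees about.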

Quote:

Originally Posted by Mathesar

Notice the compiler was quite smart here by only sign extending the result after the multiplication and not on both operands as well before.

That, you have to verify. It might be stupid enough to sign extend parameters while passing them !

Mathesar · 05 April 2022, 10:30

Quote:

Originally Posted by meynaf

And that's where the problem lies : i've yet to see a 68k compiler doing this. Instead, they extend the data...

But isn't that a problem of the fact that 68K compilers are kinda old now? Compilers for ARM and X86 are actively being developed and they have become much better over the years. Although one still has to tweak C-code sometimes to speed it up.

Quote:

Originally Posted by meynaf

That, you have to verify. It might be stupid enough to sign extend parameters while passing them !

Oh, I am sure it will! In fact, it depends on the calling convention. But what I've learned over time is to look (for critical parts) at the compiler output and adjust the C-code accordingly. Since ARM has become so widespread I often use 32bit variables (for a simple loop counter for example in a tight loop) even when 16bit or 8bit would have sufficed. When passing parameters around in registers it doesn't matter anyway and it prevents usage of the dreaded SXTB instruction and its variants.
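
A small sketch of the loop-counter habit described above, with hypothetical function names. Both versions compute the same sum; the only difference is the width of the counter and of the length parameter:

```c
#include <stdint.h>

/* Counter and length are 8 bits wide: on a target with 32-bit-only
   arithmetic the compiler may need extra extension or masking instructions
   to keep `i` within 8 bits. */
uint32_t sum_bytes_u8_counter(const uint8_t *data, uint8_t n)
{
    uint32_t sum = 0;
    for (uint8_t i = 0; i < n; i++)
        sum += data[i];
    return sum;
}

/* Counter and length use the native 32-bit width, even though 8 bits would
   be enough for short buffers. */
uint32_t sum_bytes_u32_counter(const uint8_t *data, uint32_t n)
{
    uint32_t sum = 0;
    for (uint32_t i = 0; i < n; i++)
        sum += data[i];
    return sum;
}
```

Whether the first form really costs extra instructions depends on the compiler, the target and the optimization level, so comparing the generated code for both is the only reliable check.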

meynaf · 05 April 2022, 10:53

Quote:

Originally Posted by Mathesar

But isn't that a problem of the fact that 68K compilers are kinda old now? Compilers for ARM and X86 are actively being developed and they have become much better over the years. Although one still has to tweak C-code sometimes to speed it up.

Yes, of course they are old and it counts. But are more recent compilers that much better ? How can we know, as nobody can really attempt to challenge them ?

My guess is that they perform reasonably well on small, straightforward code so they give the illusion of being "good enough" - but when code starts to become more complicated and demanding, havoc is unleashed.

Quote:

Originally Posted by Mathesar

Oh, I am sure it will! In fact, it depends on the calling convention. But what I've learned over time is to look (for critical parts) at the compiler output and adjust the C-code accordingly. Since ARM has become so widespread I often use 32bit variables (for a simple loop counter for example in a tight loop) even when 16bit or 8bit would have sufficed. When passing parameters around in registers it doesn't matter anyway and it prevents usage of the dreaded SXTB instruction and its variants.

Right, extending after a computation should have nice effects (ahem) as it involves a data dependency hazard (the cpu can't perform it prior to having the result of said computation).
In registers, being full size is not normally problematic but in memory it can be. For aarch64 i don't know, but IIRC arm32 could not perform 16-bit memory accesses.

Could be interesting, too, to see in your example what happens to r0,r1 if a function call is added before the multiply. Normally they should be considered scratch regs and lost in the call...

mschulz · 05 April 2022, 11:47

Quote:

Originally Posted by meynaf

Yes, of course they are old and it counts. But are more recent compilers that much better ? How can we know, as nobody can really attempt to challenge them ?

My guess is that they perform reasonably well on small, straightforward code so they give the illusion of being "good enough" - but when code starts to become more complicated and demanding, havoc is unleashed.

That depends. Consider a case combining a modern CPU (say some aarch64) and a compiler. Now, I am pretty sure you can write good code by hand for it, but you need to study the architecture of the exact model you are using. The optimization guides are also worth reading. Then you write code which is, let's say, optimal for the cortex-a53 you had. But once someone changes it to a cortex-a76 you would need to rewrite your code considering the optimization guide for this very model. Not because the cortex-a53 code will be bad, but rather because the a76 offers new/better/more efficient optimizations.

Once when I was working on PowerPC AROS I had to dive deeply into PPC assembly. I wrote nice looking code which was easy to understand and to follow. Then I took the optimization guides for PPC and improved performance of the code. It did work better yet was harder to read, harder to follow and not so nicely written, anymore.

What helps you while writing in m68k assembly is the (rather sad) fact that the architecture is already very archaic and, until vampire came out, not updated. Had it evolved like any other CPU architecture, you would hardly have a chance to write code as effective as what a compiler can produce for you.

Quote:

Originally Posted by meynaf

In registers, being full size is not normally problematic but in memory it can be. For aarch64 i don't know, but IIRC arm32 could not perform 16-bit memory accesses.

You remember wrong. 32-bit ARM as well as 16-bit Thumb can read/write 16-bit data as well as 8-bit data. There are alignment restrictions, but most of these can be disabled, with very few exceptions. Of course, aarch64 can read/write 8, 16, 32, 64 and 128 bit data too.

Quote:

Originally Posted by meynaf

Could be interesting, too, to see in your example what happens to r0,r1 if a function call is added before the multiply. Normally they should be considered scratch regs and lost in the call...

I bet the compiler would copy them to some callee-saved registers and later perform the multiplication there.
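
For anyone who wants to try this, a minimal test case could look like the sketch below. All names are hypothetical. The call goes through a function pointer so that, when the target is not known at compile time, the compiler has to keep `a` and `b` alive across a real call:

```c
#include <stdint.h>

/* Hypothetical stand-in for "some other function"; counts its invocations. */
int call_count = 0;

void count_call(void)
{
    call_count++;
}

/* a and b arrive in argument registers, which a call may overwrite, so both
   values have to be preserved somewhere until the multiply. */
int8_t Mul_8_after_call(int8_t a, int8_t b, void (*hook)(void))
{
    hook();
    return (int8_t)(a * b);
}
```

Where the two values end up (callee-saved registers or the stack) is up to the compiler and the ABI, so the generated code is the thing to look at.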

mschulz · 05 April 2022, 11:55

Quote:

Originally Posted by meynaf

That, you have to verify. It might be stupid enough to sign extend parameters while passing them !

This already shows how biased your opinion on compilers is...

Code:

#include <stdint.h>

int8_t Mul_8 (int8_t a, int8_t b);

int8_t foo() {
	return Mul_8(2,3)-5;
}

int8_t bar(int a, int b) {
	return Mul_8(a, b) + 3;
}

int8_t moo(int8_t a, int8_t b) {
	return Mul_8(a, b);
}

gives (gcc .cfi_proc stuff removed for clarity):

Code:

foo:
	str	x30, [sp, -16]!
	mov	w1, 3
	mov	w0, 2
	bl	Mul_8
	sub	w0, w0, #5
	ldr	x30, [sp], 16
	ret

bar:
	str	x30, [sp, -16]!
	bl	Mul_8
	add	w0, w0, 3
	ldr	x30, [sp], 16
	ret

moo:
	b	Mul_8

PS. In foo() and bar() I have added computations to the result of Mul_8, otherwise gcc would optimize all of them to a plain branch (b) instead of a branch with link (bl), same as moo()...
