English Amiga Board - View Single Post - Vampire discourse, keep it civil (was: Vampire 1200 V2 waiting times)

nonarkitten · 13 September 2022, 08:29

Quote:

Originally Posted by meynaf

This is totally ridiculous.
68K can do direct memory-to-memory moves, it can do arithmetic to/from memory directly too, it can handle three data types in computations and has a complete set of addressing modes.

ARM can load bytes, words and longs too. ARM does three-operand (ternary) arithmetic, though, so it needs far fewer pointless moves shuttling data around between registers.

Quote:

Originally Posted by meynaf

So much programming flexibility in comparison to any RISC cpu. ARM is incredibly limited and has a terrible syntax. Not to mention 68k also has better code density.

I like ARM's syntax. It's quite readable. Not sure what issue you have.

Better code density? Than Thumb2? No way.

Quote:

Originally Posted by meynaf

Why would I do this? It's a totally useless instruction in real-life code.

Which part, loading a signed int? Because this isn't real?

Code:

int8_t flags;

Or maybe it's the 32-byte offset? Because this isn't real? You couldn't have an array of these you might want to iterate over?

Code:

struct { int8_t flags; .... }; // size is 32 bytes

Or maybe it's the conditional? Because those never happen either, right?

Quote:

Originally Posted by meynaf

Very funny argument, considering predicates have been dropped in AArch64 (along with the automatic barrel shifter).

These were dropped because ARM64 still uses 32-bit opcodes, and doubling the register file meant every register field grew from 4 to 5 bits, so the 4-bit condition field had to go.
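A rough bit budget helps show the squeeze. It counts only the condition and register fields of a three-register ALU instruction and ignores opcode and shift bits, so treat it as an illustration, not a full encoding analysis:

```latex
\begin{aligned}
\text{A32:}                 &\quad 4_{\text{cond}} + 3 \times 4_{\text{reg}} = 16 \text{ bits} \\
\text{A64, if cond kept:}   &\quad 4_{\text{cond}} + 3 \times 5_{\text{reg}} = 19 \text{ bits} \\
\text{A64 as shipped:}      &\quad 3 \times 5_{\text{reg}} = 15 \text{ bits}
\end{aligned}
```

With the condition field gone, three 5-bit register fields still take fewer bits than A32 spent on condition plus registers. Conditional behavior survives in dedicated instructions such as B.cond and CSEL.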

Quote:

Originally Posted by meynaf

Not five instructions, just four (and if Günni listened to me it would be only three).

The original 68000 has no byte-to-long sign-extend (EXTB.L arrived with the 68020), but sure. On the 68020 and higher, this would "only" be four.

Quote:

Originally Posted by meynaf

As said, your one RISC instruction isn't very useful for real-life workloads. Now consider the simple and much more useful add.l #data,mem on 68k (a 32-bit constant added to a variable at a linear 32-bit address). Doing that on PPC should require something like 6 instructions. How many on ARM already?

Three: ldr, add, then str (assuming the address is already in a register and the constant fits in an immediate). Still, this is a little contrived; you usually don't march through your data in passes doing one thing at a time.
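To illustrate the point about not marching through data in passes, here is a minimal C sketch. The function names and the add-then-clamp workload are hypothetical, picked only for illustration. The two-pass version touches every element in memory twice; the fused version loads each element once, does all the work in a register, and stores it once.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical workload: add k to every element, then clamp to max. */

/* Two passes: every element is loaded and stored twice. */
void add_then_clamp_two_passes(int32_t *v, size_t n, int32_t k, int32_t max) {
    for (size_t i = 0; i < n; i++)
        v[i] += k;                  /* pass 1: load, add, store */
    for (size_t i = 0; i < n; i++)
        if (v[i] > max)             /* pass 2: load again, compare */
            v[i] = max;             /* and maybe store again */
}

/* One pass: load once, work in a register, store once. */
void add_then_clamp_one_pass(int32_t *v, size_t n, int32_t k, int32_t max) {
    for (size_t i = 0; i < n; i++) {
        int32_t x = v[i] + k;       /* single load; value now lives in a register */
        if (x > max)
            x = max;                /* extra work costs no memory traffic */
        v[i] = x;                   /* single store */
    }
}
```

Both produce the same result; the difference is how many times each element crosses the memory bus.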

On the 68K, RAM was "fast enough" not to care, so optimization tricks like lookup tables were common. At 1 GHz, those tricks stop paying off, because a cache miss can cost dozens or even hundreds of clock cycles. So it's better to compute on the fly and keep things in registers. The 68K style of assembly language assumes the CPU is no faster than RAM, and even by the 68060 that assumption was breaking down.
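A minimal C sketch of the table-versus-compute trade-off, using population count as a stand-in example (the choice of popcount and all names here are assumptions for illustration). The table version is the classic 68K-era trick; the computed version is a standard SWAR bit-counting sequence that stays entirely in registers. This makes no timing claim: which one wins depends on the CPU and on whether the table stays in cache.

```c
#include <stdint.h>

/* Table-driven: 256-entry lookup. Each lookup is a memory access
   that may miss the cache on a fast CPU. */
static uint8_t popcount_table[256];

void init_popcount_table(void) {
    for (int i = 0; i < 256; i++) {
        int c = 0;
        for (int b = i; b != 0; b >>= 1)
            c += b & 1;
        popcount_table[i] = (uint8_t)c;
    }
}

unsigned popcount_by_table(uint32_t x) {
    return popcount_table[x & 0xFF] + popcount_table[(x >> 8) & 0xFF]
         + popcount_table[(x >> 16) & 0xFF] + popcount_table[x >> 24];
}

/* Computed on the fly: a handful of ALU ops, no memory access at all. */
unsigned popcount_computed(uint32_t x) {
    x = x - ((x >> 1) & 0x55555555u);                 /* 2-bit partial counts */
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); /* 4-bit partial counts */
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;                 /* 8-bit partial counts */
    return (x * 0x01010101u) >> 24;                   /* sum bytes into top byte */
}
```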

Quote:

Originally Posted by meynaf

Why would we want to do this? We HAVE a stack, so why not just use it?

Because you don't have infinite memory bandwidth?

Quote:

Originally Posted by meynaf

Oh sorry, maybe ARM does not support a proper stack? Oh wait, I forgot: you wasted a GPR on the program counter, so you don't want to reserve another one. I understand.

Yes, ARM supports a "proper stack"; don't be stupid. And having PC as a regular register opens up unlimited potential for abuse, which is so cool: things you could never do on 68K.

Quote:

Originally Posted by meynaf

But we can serve an interrupt without touching the registers.

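But not the stack: before the first instruction of the handler runs, the 68K has already pushed SR and PC onto the supervisor stack (plus a format word on the 68010 and later). By the time you're in the interrupt, you're already dozens of cycles behind the ARM, which on classic ARM saves the return address and status into banked registers instead.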

Quote:

Originally Posted by meynaf

The 68k can support direct memory operations.
Oh, wait. ARM can't do ADDQ.W #1,mem? Poor mite, your so-great CPU can't implement a simple interrupt counter without touching registers.

That's silly. Why would you worry about touching registers? Your ADDQ is still performing the LOAD, ADD and STORE operations; you just don't see what the microcode is doing.
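Spelled out in C, the interrupt counter looks like this. It's a minimal sketch with a hypothetical `irq_count` variable and handler names; the point is that the one-opcode memory increment still implies a bus read, an ALU add, and a bus write.

```c
#include <stdint.h>

/* Hypothetical interrupt counter living in RAM. */
volatile uint16_t irq_count;

/* What ADDQ.W #1,irq_count does internally, written out step by step. */
void count_irq_explicit(void) {
    uint16_t tmp = irq_count;    /* LOAD  (bus read)  */
    tmp = (uint16_t)(tmp + 1);   /* ADD   (in the ALU) */
    irq_count = tmp;             /* STORE (bus write) */
}

/* The usual C spelling; on a load/store CPU this necessarily
   becomes the same load, add, store sequence. */
void count_irq(void) {
    irq_count++;
}
```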

Quote:

Originally Posted by meynaf

And it can't move a memory cell directly to another ?

Again, once processors get past roughly 200 MHz, doing everything directly in memory becomes a serious limitation, since RAM cannot keep up anymore.

Quote:

Originally Posted by meynaf

Also we CAN branch to a subroutine without touching the stack, it's just LEA+JMP. Not that this operation would be a common one, of course.

Good point, but then you don't get any sort of prediction on the "return."

Quote:

Originally Posted by meynaf

Try move ccr to a data register. Or use Scc instruction to keep the condition. Or better, do your computation on an address register. Or don't do it at all, it's not a common operation either.

It's pretty common in emulation.

On ARM you just omit the 'S' suffix on the opcode, and the ALU operation leaves the flags alone. The nice thing is, this works for every ALU operation, MUL and DIV included, not just the couple of cherry-picked ones that some engineer in 1976 thought would be useful. Saving and restoring the flags is two cycles too many for me.
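Since the context is emulation, here is a minimal C sketch of the ADD versus ADDS distinction as an emulator might model it. `Flags` and `arm_add` are hypothetical names, not from any real emulator. Only the flag-setting form writes NZCV, so an unrelated add between a compare and the branch that consumes its flags leaves them intact.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of ARM's NZCV condition flags. */
typedef struct {
    bool n, z, c, v;
} Flags;

/* Models ADD (set_flags = false) versus ADDS (set_flags = true):
   the sum is identical, only the S form updates the flags. */
uint32_t arm_add(Flags *f, uint32_t a, uint32_t b, bool set_flags) {
    uint32_t r = a + b;                           /* wraps modulo 2^32 */
    if (set_flags) {
        f->n = (r >> 31) & 1;                     /* negative: top bit of result */
        f->z = (r == 0);                          /* zero */
        f->c = r < a;                             /* unsigned carry out */
        f->v = ((~(a ^ b) & (a ^ r)) >> 31) & 1;  /* signed overflow */
    }
    return r;
}
```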

Quote:

Originally Posted by meynaf

But again, there is nothing wrong in having a stack. Or maybe you're allergic to stacks? Don't have a look at Java bytecode or WebAssembly then!

LOL. These are intermediate representations, and in practice both get JIT (or ahead-of-time) compiled into register-based machine code, where many of those stack operations get eliminated.
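A toy illustration of that idea in C. The bytecode format, `Op`, `Insn`, `interpret` and `compiled` are all hypothetical and far simpler than real JVM or Wasm pipelines; the sketch only shows that every operation in the interpreter goes through an in-memory operand stack, while the translated form keeps everything in registers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy stack bytecode. */
typedef enum { OP_PUSH_ARG, OP_ADD, OP_MUL, OP_END } Op;

typedef struct {
    Op op;
    int arg;  /* argument index, used only by OP_PUSH_ARG */
} Insn;

/* Interpreter: every operation pushes to or pops from the operand stack. */
int32_t interpret(const Insn *code, const int32_t *args) {
    int32_t stack[16];
    size_t sp = 0;  /* number of values on the stack */
    for (;; code++) {
        switch (code->op) {
        case OP_PUSH_ARG: stack[sp++] = args[code->arg]; break;
        case OP_ADD: sp--; stack[sp - 1] += stack[sp]; break;
        case OP_MUL: sp--; stack[sp - 1] *= stack[sp]; break;
        case OP_END: return stack[sp - 1];
        }
    }
}

/* Roughly what a JIT would produce for the program
   "push a0; push a1; add; push a2; mul": no stack traffic at all. */
int32_t compiled(const int32_t *args) {
    return (args[0] + args[1]) * args[2];
}
```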

Quote:

Originally Posted by meynaf

Not everything is useful, but we have bitfields, and good luck with your ARM doing the same with fewer instructions!

ARM has bitfields too: UBFX, SBFX, BFI and BFC have been there since ARMv6T2.
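For reference, here is what those bitfield operations compute, as a minimal C sketch (the function names are hypothetical). Fields are addressed ARM-style by their least significant bit; the 68020's BFEXTU/BFEXTS/BFINS do the same jobs but count the offset from the most significant bit. The signed version relies on two's-complement conversion and arithmetic right shift of negative values, which GCC and Clang provide but ISO C leaves implementation-defined.

```c
#include <stdint.h>

/* Unsigned extract, like UBFX. Field is `width` bits (1..31) starting at `lsb`. */
uint32_t bf_extract_u(uint32_t x, unsigned lsb, unsigned width) {
    return (x >> lsb) & ((1u << width) - 1);
}

/* Signed extract, like SBFX: move the field to the top,
   then arithmetic-shift it back down to sign-extend. */
int32_t bf_extract_s(uint32_t x, unsigned lsb, unsigned width) {
    return (int32_t)(x << (32 - lsb - width)) >> (32 - width);
}

/* Insert, like BFI: replace the field in dst with the low bits of val. */
uint32_t bf_insert(uint32_t dst, uint32_t val, unsigned lsb, unsigned width) {
    uint32_t mask = ((1u << width) - 1) << lsb;
    return (dst & ~mask) | ((val << lsb) & mask);
}
```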

ARM has better code density than 68K.

Your contrived example is far worse than mine. I see this kind of pattern all the time in compiled code on ARM, and I use it in PJIT. It's great. I love conditional everything. I love that every load can also be a sign- or zero-extend. I love that I can take huge steps when indexing; it's great for structs. But you're an ASM coder; you don't think in "structs."
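Here is the kind of C that produces the pattern from the `int8_t flags` and 32-byte struct snippets earlier in the post: walk an array of 32-byte records, do a sign-extending byte load from each, and act on it conditionally. `Record`, its padding field and `sum_negative_flags` are hypothetical; the padding only stands in for the elided fields. Whether a compiler actually emits LDRSB plus a predicated add depends on the target and optimization level.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-byte record; `other` stands in for the real fields. */
typedef struct {
    int8_t flags;
    uint8_t other[31];
} Record;

_Static_assert(sizeof(Record) == 32, "Record should be 32 bytes");

/* Steps through the array 32 bytes at a time. */
int32_t sum_negative_flags(const Record *r, size_t n) {
    int32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        int32_t f = r[i].flags;  /* sign-extending byte load (LDRSB on ARM) */
        if (f < 0)
            sum += f;            /* candidate for a conditional (predicated) add */
    }
    return sum;
}
```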

But a single RMW on a RAM variable? Unless that's ALL you're going to do with that variable, it's a lot more efficient to have separate LOAD/ADD/STORE steps, so the value can stay in a register for the rest of the work. Not that I've ever needed an interrupt just to count one number. That's what timers are for.

Your "everything in memory" model doesn't work on modern hardware, where even the fastest RAM is dozens of times slower than the CPU. Caching helps, but expecting it to save your bacon is poor design. And even on the 68000, loading stuff into registers to do a lot of work is still going to be faster than munching through RAM all the time. Every RMW is going to eat cycles, and ADDQ.L to a register is always going to be faster than ADDQ.L to RAM (on a 68000, 8 cycles versus 20 for ADDQ.L #q,(An)).
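A minimal C sketch of the two models (function names are hypothetical). In the first, the running total lives in RAM, so every iteration is a read-modify-write; `volatile` is used only to force the compiler to really perform each memory access instead of quietly keeping the total in a register. The second loads nothing extra, accumulates in a register, and stores once at the end.

```c
#include <stddef.h>
#include <stdint.h>

/* "Everything in memory": one load and one store of *total per element. */
void sum_in_memory(const int32_t *v, size_t n, volatile int32_t *total) {
    *total = 0;
    for (size_t i = 0; i < n; i++)
        *total += v[i];        /* load total, add, store total */
}

/* Work in a register, store once. */
void sum_in_register(const int32_t *v, size_t n, int32_t *total) {
    int32_t acc = 0;           /* typically kept in a register */
    for (size_t i = 0; i < n; i++)
        acc += v[i];
    *total = acc;              /* single store at the end */
}
```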