12 September 2022, 21:22 | #961
Registered User
Join Date: Mar 2018
Location: Hastings, New Zealand
Posts: 2,546
|
Quote:
6.56 Built-in Functions to Perform Arithmetic with Overflow Checking
Due to that and other issues with GCC I'm not touching it. Instead I port the code to SAS/C. I fully admit to not being an expert on GCC or C in general, so this often takes a while (I'm getting better at it...). This is worth doing because I have fun doing it, and it's in the retrocomputing spirit. Who knows, perhaps one day I will port some of my own C code to the Amiga, and of course release the sources for anyone else to use.

Quote:
But there was a rivalry between Intel and Motorola, with each trying to outdo the other - for a while. At some point (040?) Motorola probably realized they didn't have the chip-making skills to keep up, so they started removing features instead. Intel just kept making chips with more and more transistors, and putting bigger and bigger heatsinks with huge fans on them. That strategy worked. Cutting the CPU down to work with fewer transistors didn't. Imagine if Motorola had followed the same path as Intel. Assuming they were able to upgrade their foundries similarly, we might now have a successor to the 68030 similar to the 68080 but at GHz speeds. Apple might not have gone PPC and then back to Intel, making Macs more popular and 68k still relevant. The Amiga would ride on those coattails instead of heading down the dead end of PPC.
|
12 September 2022, 21:37 | #962
Registered User
Join Date: Mar 2018
Location: Hastings, New Zealand
Posts: 2,546
|
Quote:
Just go to Stack Overflow and you will see what I mean. It's full of condescending elites who berate any beginner brave enough to ask a question. Can be entertaining when they fight amongst themselves though...
|
13 September 2022, 03:05 | #963 |
Registered User
Join Date: Jun 2018
Location: Calgary/Canada
Posts: 247
|
How to check overflow of an integer in C.
Option 1. By far the most common: compare the result to your operands. If the result of an addition is less than either of the operands, then an overflow has occurred. Most decent compilers will compile this into a simple flag check.

Option 2. Use multi-precision arithmetic. If you want a 32x32 => 64-bit MULU on hardware that doesn't have it, break it into four 16x16 => 32-bit MULUs and add them together. My first real programming job was writing a 64-bit library for the 68332. It's not hard.

Option 3. Define a small assembly routine (or even an inline one) which does this and call it from C. Those GCC built-ins could be reimplemented for SAS/C in an afternoon.
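To make options 1 and 2 concrete, here is a portable C sketch (function names are my own; this is illustrative, not SAS/C-specific code):

```c
#include <stdint.h>

/* Option 1: detect unsigned 32-bit addition overflow by comparing the
   result against an operand -- wrap-around means a carry came out. */
static int add32_overflows(uint32_t a, uint32_t b, uint32_t *result)
{
    uint32_t r = a + b;   /* unsigned overflow wraps, which is well-defined */
    *result = r;
    return r < a;         /* result smaller than an operand => overflow */
}

/* Option 2: 32x32 => 64-bit unsigned multiply built from four
   16x16 => 32-bit multiplies, for CPUs without a 64-bit MULU. */
static void mulu32_64(uint32_t a, uint32_t b, uint32_t *hi, uint32_t *lo)
{
    uint32_t al = a & 0xFFFFu, ah = a >> 16;
    uint32_t bl = b & 0xFFFFu, bh = b >> 16;

    uint32_t ll = al * bl;                    /* low  x low  */
    uint32_t lh = al * bh;                    /* low  x high */
    uint32_t hl = ah * bl;                    /* high x low  */
    uint32_t hh = ah * bh;                    /* high x high */

    uint32_t mid   = lh + hl;                 /* may wrap past 32 bits */
    uint32_t carry = (mid < lh) ? 0x10000u : 0u;

    uint32_t lo32 = ll + (mid << 16);
    uint32_t c2   = (lo32 < ll) ? 1u : 0u;    /* carry into the high half */

    *lo = lo32;
    *hi = hh + (mid >> 16) + carry + c2;
}
```

A compiler that knows the target well can turn the first function into an add plus a carry-flag check, which is exactly the "simple flag check" mentioned above.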
13 September 2022, 05:47 | #964
Registered User
Join Date: Aug 2020
Location: Australia
Posts: 663
|
Quote:
---
The leaked NTVDMx64 is based on Microsoft's original NTVDM, hence it was Microsoft's own decision to drop 16-bit Windows support in 64-bit Windows. More info at https://github.com/leecher1337/ntvdmx64

The HAXM version doesn't emulate the CPU; it uses HAXM's VT-x hardware acceleration (the CPU needs to support it).
|
13 September 2022, 06:23 | #965
Registered User
Join Date: Aug 2020
Location: Australia
Posts: 663
|
Quote:
RiVA AMMX (AC68EC080) requires a separate build from RiVA 68K, while the PiStorm/RPi 3a/Emu68 path (similar in method to Transmeta's Code Morphing Software, see reference 1) improves the legacy RiVA 68K's playback performance. Depending on the CPU, CoffinOS R58's startup swaps between the RiVA AMMX and RiVA 68K executables.

Transmeta Code Morphing Software (CMS) includes a JIT with instruction reordering on a VLIW micro-architecture CPU. The compiled code is stored in a "translation cache". CMS does not try to retranslate the region in which an interrupt occurs, and it has techniques for handling self-modifying code.

Reference 1. https://www.cs.cornell.edu/courses/c...log/transmeta/
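Since the "translation cache" came up: conceptually it is just a map from guest program counter to compiled host code. A minimal C sketch of the idea (all names, the direct-mapped layout, and the sizes are my own illustration, not Transmeta's or Emu68's actual design):

```c
#include <stddef.h>
#include <stdint.h>

#define TC_BUCKETS 1024   /* illustrative size, not a real cache geometry */

typedef struct {
    uint32_t guest_pc;    /* 68k address the block was translated from */
    void    *host_code;   /* pointer to the generated host code */
} tc_entry_t;

static tc_entry_t tcache[TC_BUCKETS];

/* 68k instructions are word-aligned, so drop the low bit before hashing. */
static size_t tc_hash(uint32_t pc) { return (pc >> 1) % TC_BUCKETS; }

static void tc_insert(uint32_t pc, void *code)
{
    tc_entry_t *e = &tcache[tc_hash(pc)];
    e->guest_pc  = pc;    /* direct-mapped: a collision simply evicts */
    e->host_code = code;
}

/* Returns the translated block, or NULL (a miss triggers retranslation). */
static void *tc_lookup(uint32_t pc)
{
    tc_entry_t *e = &tcache[tc_hash(pc)];
    return (e->host_code && e->guest_pc == pc) ? e->host_code : NULL;
}
```

A real JIT layers much more on top (chaining, invalidation, flushing when full), but the lookup path is essentially this.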
|
13 September 2022, 08:29 | #966
Registered User
Join Date: Jun 2018
Location: Calgary/Canada
Posts: 247
|
Quote:
Quote:
Better code density? Than Thumb2? No way. Quote:
Code:
int8_t flags;

Code:
struct { int8_t flags; .... }; // size is 32 bytes

Quote:
Quote:
Quote:
On the 68K, RAM was "fast enough" not to care, so optimization tricks like using tables were a common thing. At 1 GHz, none of those tricks work anymore because a cache miss can cost dozens of clock cycles. So it's better to compute on the fly and keep things in registers. The 68K style of assembly language requires that the CPU be slower than RAM, and even by the 68060 that was breaking down. Quote:
Quote:
But not the stack. By the time you're in the interrupt, you're already dozens of cycles behind the ARM. Quote:
Again, once processors top around 200 MHz, direct memory access for everything becomes a serious limitation since RAM cannot keep up anymore. Quote:
Quote:
On ARM you just omit the 's' flag on the opcode and then ALU operations don't affect flags. The nice thing is, this works for all ALU operations, like MUL and DIV, and not just the couple of cherry-picked ones that some engineer in 1976 thought would be useful. Saving and restoring are two cycles too many for me. Quote:
Quote:
ARM has better code density than 68K. Your contrived example is far worse than mine. I see this kind of pattern all the time in compiled code on ARM and use it in PJIT. It's great. I love conditional everything. I love that every load can also be a sign or zero extend. I love that I can take huge steps when indexing. It's great for structs. But you're an ASM coder, you don't think in "structs."

But a single RMW for a RAM variable? Unless that's ALL you're going to do with that variable, it would be a lot more efficient to have separate LOAD/ADD/STORE steps. Not that I've ever had to have an interrupt just to count one number. That's what timers are for.

Your "everything in memory" model doesn't work with modern hardware, where even the fastest RAM is several dozen times slower than the CPU. Caching helps, but expecting it to save your bacon is poor programming design. And even on the 68000, loading stuff into registers to do a lot of work is still going to be faster than munching through RAM all the time. Every RMW is going to eat cycles, and ADDQ.L to a register is always going to be faster than ADDQ.L to RAM.
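The register-vs-memory point can be shown in C (a sketch of my own, not PJIT code): both loops below count zero bytes, but the first forces a memory read-modify-write on every hit, while the second accumulates in a local that any decent compiler keeps in a register and stores once at the end.

```c
#include <stdint.h>

/* The volatile pointer forces a load + add + store on every hit --
   the compiler may not cache the counter in a register. */
static void count_in_memory(volatile uint32_t *counter,
                            const uint8_t *buf, uint32_t n)
{
    for (uint32_t i = 0; i < n; i++)
        if (buf[i] == 0)
            (*counter)++;          /* RMW through memory each time */
}

/* Same result, but the running count lives in a register inside
   the loop and memory is touched only once at the end. */
static void count_in_register(uint32_t *counter,
                              const uint8_t *buf, uint32_t n)
{
    uint32_t c = 0;
    for (uint32_t i = 0; i < n; i++)
        if (buf[i] == 0)
            c++;
    *counter += c;                 /* single store at the end */
}
```

On a 68000 the difference is a few cycles per iteration; on a GHz-class core with a cache miss in the loop it can be dozens.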
13 September 2022, 08:37 | #967
Registered User
Join Date: Jun 2018
Location: Calgary/Canada
Posts: 247
|
Quote:
- Emu68 is not a tracing JIT; the cache is just flushed if it runs out
- Emu68 does not perform any interpreter passes first
- Emu68 does very little to optimize code
- Emu68 has no 'rollback' nor any need for one
- Emu68 handles self-modifying code by checksumming the whole compiled block

Bernie talks about this in the Apollo forums -- too much optimization is usually worse for JIT than just being a dumb translator. Emu68 doesn't do all that crap, so its best-case performance actually matches that of the ARM core itself. Transmeta was trying to be too smart here and didn't get that simpler is better.

Last edited by nonarkitten; 13 September 2022 at 08:45.
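The checksumming idea mentioned above can be sketched in a few lines of C (the hash choice and all names are mine for illustration; Emu68's actual scheme may differ): hash the guest bytes when a block is translated, rehash before reusing the translation, and retranslate on mismatch.

```c
#include <stddef.h>
#include <stdint.h>

/* FNV-1a, a simple 32-bit hash -- illustrative, not Emu68's algorithm. */
static uint32_t block_checksum(const uint8_t *code, size_t len)
{
    uint32_t h = 2166136261u;          /* FNV-1a offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= code[i];
        h *= 16777619u;                /* FNV prime */
    }
    return h;
}

/* Before running a cached translation, compare against the checksum
   saved at translation time; a mismatch means the guest code was
   modified and the block must be retranslated. */
static int block_is_stale(const uint8_t *code, size_t len, uint32_t saved)
{
    return block_checksum(code, len) != saved;
}
```

The appeal is that this needs no MMU write-protection tricks: one linear pass over the source bytes decides whether the cached block is still valid.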
|
13 September 2022, 10:26 | #968
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
|
Quote:
Quote:
Recent ARM, yes. Original ARM couldn't do words. And while it can load/store them, it cannot do direct computation on this data size, and it cannot perform arithmetic operations on memory on the fly. Instead of a single, simple RMW instruction, you have load + op + store. In comparison, even the 6502 can increment a memory cell in just one instruction. Quote:
Quote:
Oh yes, it has. And the bigger the program becomes, the bigger the difference is. Ready for a small code contest? You seem to forget that 16-bit Thumb opcodes can only access 8 registers while 68k 16-bit opcodes can access them all. And the operations they can perform are of course limited, while on 68k they are not. Quote:
Quote:
Quote:
No, really, it's not the byte, it's not the 32, it's not the conditional. Not individually. It's all of them together that makes an unlikely situation. Quote:
Quote:
Anyway, as I said, we don't always need to sign extend. Rarely, actually. We're not on some puny RISC CPU which can only perform 32-bit computations and is therefore forced to extend everything. Besides, most bytes are actually unsigned. Quote:
You forgot that there are two 32-bit values to load with ldr (the address and the data). Quote:
We can use tricks to reduce register pressure, like using the high part of a register to hold different data - something that requires the ability to perform byte or word operations without touching the rest of the register. Consider, for example, two flags and a 16-bit loop counter all in the same register. We don't have an infinite number of registers either, and the top of the stack is a nice candidate for the L1 cache. Oh yeah? So how many instructions to match the 68k's simple JSR through the stack? Quote:
Of course you also have one register less for use in regular programs. Hopefully ! Quote:
But if the code becomes complex enough, then ARM too will be forced to use the stack, destroying the benefit. Quote:
Besides, with a fully pipelined CPU there's no such microcode at all. Your LOAD, ADD and STORE are all done at different stages of the pipeline. Quote:
Second, it's not about direct memory for everything. Memory is still used, and when it has to be, you're happy not having to waste registers to access it. Quote:
Quote:
Now consider this situation:

Code:
.loop
    addx.l -(a0),-(a1)
    dbf d0,.loop

This happens in emulation as well; not all instructions touching the flags will change them all (rarely, actually - on 68k most leave the X bit alone). Consider a simple btst which only touches Z. ARM will not leave you the choice: all flags are altered, or none are. Quote:
Also, a JIT isn't really clever; it has fewer options than a regular compiler. Then show me the equivalent of bfins d2,(a0){d0:d1}. Or bfset (a0){d0:1}. Certainly not. You can always cherry-pick a single example but it won't prove anything. It's only with real examples of, say, 20-40 instructions that we can start to see something. Quote:
I see the value of this, but tell that to Gunnar who said mvz/mvs were useless and refused to add them... Quote:
Indexing is something like move.b (a0,d0.w),(a1,d1.w) or add.w (a0,d0.w),d1 and ARM can NOT do this. Quote:
So when it happens, be happy to have operations that don't require using a register, because in many situations you don't have any that's free and you have to save and restore one! Invalid point. ADDQ.L to a register assumes we have a register available, and RMW eats fewer cycles than a separate load+op+store.
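For readers unfamiliar with the 68020 bit-field instructions mentioned above: bfins inserts the low 'width' bits of a source into a destination at an MSB-relative bit offset. A C sketch of the single-longword case (my own illustration, simplified to fields where offset + width <= 32):

```c
#include <stdint.h>

/* Insert the low 'width' bits of 'src' into 'dst' at bit offset 'off',
   counting from the most significant bit, as 68k bit-field ops do.
   Assumes 1 <= width <= 32 and off + width <= 32. */
static uint32_t bfins32(uint32_t dst, uint32_t src,
                        unsigned off, unsigned width)
{
    uint32_t mask  = (width >= 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    unsigned shift = 32 - off - width;   /* convert MSB offset to LSB shift */

    dst &= ~(mask << shift);             /* clear the field */
    dst |= (src & mask) << shift;        /* insert the new bits */
    return dst;
}
```

On a 68020 this whole routine is one instruction; on a RISC without bit-field ops it compiles to the mask/shift/or sequence you see here.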
13 September 2022, 11:00 | #969 |
Registered User
Join Date: Jun 2015
Location: Germany
Posts: 1,918
|
The sign-extension on loading indeed is not necessary in the 080 because the ext-instruction is only another word-sized instruction that will be fused with the move. That means the combination of the move and the ext is treated exactly like a sign-extending ldr instruction, it will be scheduled to the ALU as a single instruction and it will execute in a single cycle and even in parallel with another instruction scheduled to the other pipeline.
With regard to ARM, I liked coding ARM in assembly language and I didn't have many difficulties with it. It looked to me much like a more orthogonal 68k CPU. There is no distinction between A and D registers and there are no exceptions like EOR and some other stuff. I also like 3-operand code, even though CPUs nowadays don't gain much (if anything) from it because 2-operand instructions and the extra move instructions can usually be fused into a single operation on a 3-operand ALU. It can thus be argued that 3-operand code means worse code density without producing faster code. I never found the ARM mnemonics hard to remember or decipher; it's simply a matter of getting used to them.

I never worked much on memory even on 68k. In speed-critical code you usually burst-load data, process it and then store it. For non-speed-critical code it just doesn't matter much whether you get worse code density. We now have plenty of RAM and storage.

I agree that predication is a concept on ARM that at first sight appears to be a great feature but at second sight isn't that attractive any more. It's four bits wasted in each instruction and it gets used rarely. For blocks of instructions a branch is usually better even on ARM.

So yes, code density suffers on ARM as much as on most RISCs, but the predictable size of instructions is a much more important advantage for creating a high-speed processor than code density is. I think the most popular compromise for recent CPU architectures is to mix 16-bit and 32-bit instructions, which still helps keep the instruction decoders simple (important for highly superscalar CPUs) but gives much improved overall code density.

Last edited by grond; 13 September 2022 at 11:14.
13 September 2022, 12:14 | #970 |
Registered User
Join Date: Dec 2019
Location: Ur, Atlantis
Posts: 1,899
|
Whoa! For those who follow the "how many PgDns to scroll a post" rankings, Meynaf has just come up with a world-beating 7-presser! That is truly impressive and leaves the previous contenders (TR: 4x, BA: 3x, ='_'=: 4x, though those were separate posts) waaaay behind.
It will take some doing, but can anyone beat this? |
13 September 2022, 12:21 | #971
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
|
Quote:
|
|
13 September 2022, 12:33 | #972
Registered User
Join Date: Jan 2019
Location: Germany
Posts: 3,215
|
Quote:
Code:
A1 = A0

..and nobody stops you creating a similar assembler for 68K or ARM such that

Code:
add.l 4(a0),d0

becomes something like

Code:
d0 += [a0+4]
|
13 September 2022, 12:35 | #973 |
Registered User
Join Date: Sep 2013
Location: Poland
Posts: 807
|
It's funny to see how (again) the discussion went from Apollo products (or design, or features) to asm vs C and 68k vs ARM. What I'd like to add on that particular topic: it doesn't matter if C is memory hungry. It doesn't even matter if it wastes clock cycles. If you can bring new software to the Amiga, that is good. And IIRC VanillaConquer is written in C and so is DevilutionX... So everyone can argue which approach is best in their opinion, but the one thing that remains is "put your money where your mouth is". ASM might be fun, fast and efficient, but that's irrelevant when there's hardly any NEW software written in it, even in the last corner where it is still widely used.
|
13 September 2022, 12:43 | #974
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
|
Quote:
In all cases, not having it still damages code density and programming flexibility, which are my main concerns. Quote:
Having A and D registers is a nice feature as they don't behave the same, and both behaviours have their use. Consider add.w d0,d1 vs adda.w d0,a1 - one touches the CCR and leaves the high part alone, the other leaves the CCR alone and provides automatic extension. Funny that you mention the EOR exception: 68k indeed can't do EOR from memory, but ARM can't either! You can't possibly see that as an advantage. Quote:
Quote:
Quote:
|
13 September 2022, 13:18 | #975 |
Banned
Join Date: Feb 2022
Location: Anywhere and everywhere I have a contract
Posts: 822
|
Well here is the deal with V2 licensing Bromigos! Just had this sent to me on Email.
license V2 cards: 90% of all V2 cards are licensed and can be updated without any trouble. If your card is licensed, the following information is not important for you. This is required for previously NOT licensed V2 accelerator cards.

Licensing your V2 card will allow you to benefit from the new core updates, to use all the new games and to take advantage of all the great new features. Unlicensed cards will only have a black and white screen after core update 2.16 and higher.

==> Did you buy your card from a reseller? Then please contact him.
==> If you want to purchase your Apollo 68080 core license in the Apollo computer shop: Buy your V2 license at the SPECIAL DISCOUNT PRICE of 50 € UNTIL THE END OF SEPTEMBER (normal price 100 €)

How does the licensing work:
You pay the license fee in the shop.
You read the serial number of your V2 and email us the number ==> type in the CLI: VControl SN
We will send you a personalized license sticker.
We will update the serial number in the core update asap.
From now on you can use all core updates without any problems.
13 September 2022, 13:28 | #976
Registered User
Join Date: Jun 2015
Location: Germany
Posts: 1,918
|
Quote:
I have never run out of memory for code. Yes, I tried to squeeze loops into the 020/030's tiny instruction cache for optimal execution, but it's not like I see this as a common use case for which an ISA should be designed. With cache sizes common in this century I don't see much of a problem in having lower code density (at least not if code density isn't lower by factors) if the instruction decoder gets so much simpler thanks to reliable instruction boundaries. Quote:
Quote:
Quote:
Yes, but you can't choose which one does, it's implied by the registers you use. ARM can do the extension and you can specify for each instruction whether it should modify the flags or not. Quote:
Quote:
Quote:
|
13 September 2022, 14:10 | #977
Registered User
Join Date: May 2018
Location: Ireland
Posts: 672
|
Quote:
|
|
13 September 2022, 14:19 | #978
son of 68k
Join Date: Nov 2007
Location: Lyon / France
Age: 51
Posts: 5,323
|
Quote:
But as he refused to provide some kind of nice programming tools, coders aren't exactly beating down his door. Quote:
Quote:
Quote:
Quote:
Quote:
Anyway, specifying if you modify the flags or not has a cost of 1 extra bit per instruction. I really prefer the D/A split which comes at no encoding cost. Quote:
Quote:
So if they say "cpu xyz has better asm than 68k", they speak about something they do not know. Write whole programs of significant size in asm first. Then we'll talk. Quote:
|
13 September 2022, 15:33 | #979
Ex nihilo nihil
Join Date: Oct 2017
Location: CH
Posts: 4,856
|
Quote:
In this thread alone, I've lost count of how many times this argument has been given: "with today's speeds, bla bla bla, it doesn't matter"...
|
13 September 2022, 15:45 | #980
Registered User
Join Date: Jun 2015
Location: Germany
Posts: 1,918
|
Quote:
I was referring to how I assess CPU architectures. Quote:
a) there are eight instructions
b) where each instruction is located

When you try to decode 32 bytes of x86 instruction data, you don't know any of this. 68k is somewhere in the middle. Quote:
Quote:
Quote:
Quote:
Last edited by grond; 13 September 2022 at 16:40. |