14 October 2021, 08:51 | #1301 |
old chunk of coal
Join Date: Nov 2011
Location: Hungary
Posts: 1,289
|
I attached a small(ish) test case for the inline assembly issue. This is part of the cache system from Build. I basically took a bunch of external assembly functions and turned them into GCC inlines. While the original ones work fine, these inlines don't. Maybe I made a mistake when I converted the functions to inlines, but I can't figure out what. I double-checked the input/output/clobber lists, and they looked OK.
I included generic C versions of the offending functions; if you define NOASM they will be used. You can build the example program with: m68k-amigaos-gcc -Wall -noixemul -m68040 -O2 -fomit-frame-pointer -fno-strict-aliasing -o cachetest cachetest.c |
14 October 2021, 18:06 | #1302 |
Registered User
Join Date: Feb 2017
Location: Denmark
Posts: 1,099
|
EDIT: You forgot d1 in the clobberlist
You have an error somewhere in copybufbyte (maybe because you forgot d1 from the clobberlist). Replacing its body with just Code:
"1: subq.l #1, d0\n\t"
" bmi.s 2f\n\t"
" move.b (a0)+,(a1)+\n\t"
" bra.s 1b\n\t"
"2:\n\t"
Debugging note: copybuf doesn't seem to be called at all, so there's no reason to include it, and you could have checked whether either of the functions worked on their own to reduce the test case (sometimes they interact, but it doesn't seem to be the case in this example).
BTW is that complicated function even worth it time wise compared to calling memcpy/CopyMemQuick or w/e?
Last edited by paraj; 14 October 2021 at 18:15. |
14 October 2021, 18:14 | #1303 |
old chunk of coal
Join Date: Nov 2011
Location: Hungary
Posts: 1,289
|
As I mentioned all of these functions work perfectly as external assembly, they only break when I turn them into inline assembly functions. It doesn't matter if the complicated copybufbyte function is worth it or not, I only included it in the example because it triggers the bug very easily. In the actual program (Blood) even the simplest inline assembly functions break the code.
|
14 October 2021, 18:41 | #1304 | |
Registered User
Join Date: Feb 2017
Location: Denmark
Posts: 1,099
|
Quote:
You just missed my edit, but your clobber list is bugged. Inline assembly is extremely difficult to get right and the errors are very unforgiving (as you've noticed). It'll often seem to work fine until a specific set of circumstances arises (sometimes much later in the development process), and then you'll have a very hard time tracking down the bugs. My recommendation would be to avoid inline assembly, stick with externally defined asm functions, and only use inline asm for the hopefully very few cases where it's essential for performance. Your mulscale32 function could be an example, but you'd want to express it in a way that doesn't force specific registers to be used, which would be more difficult to express properly but would allow it to interact better with the C optimizer. Really, avoid inline asm even if you think you know what you're doing. Speaking from experience.
Last edited by paraj; 14 October 2021 at 18:57. |
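A minimal sketch of what "not forcing specific registers" could look like for a mulscale32-style function, assuming it computes (a * b) >> 32 and that the target is a 68020-68040 where the 64-bit muls.l form is native. The name mulscale32_sketch is hypothetical, and this is not the code from the test case. The portable branch is used when NOASM is defined or the target isn't m68k.

```c
#include <stdint.h>

/* High 32 bits of the signed 64-bit product a * b.
 * Hypothetical stand-in for the mulscale32 discussed in the thread. */
static inline int32_t mulscale32_sketch(int32_t a, int32_t b)
{
#if defined(__m68k__) && !defined(NOASM)
    int32_t hi;
    /* muls.l <ea>,Dh:Dl multiplies Dl by <ea> and leaves the high half of
     * the product in Dh and the low half in Dl.
     * "d" lets GCC pick any data register. a is read-write ("+") because
     * the low half overwrites it. The "&" on hi is a conservative choice
     * that keeps it out of the register holding b.
     * No "volatile": the result depends only on the inputs, so the
     * optimizer may merge or drop calls. */
    __asm__ ("muls.l %2,%0:%1"
             : "=&d" (hi), "+d" (a)
             : "d" (b)
             : "cc");
    return hi;
#else
    /* Assumes >> on a negative int64_t is an arithmetic shift (true for GCC). */
    return (int32_t)(((int64_t)a * b) >> 32);
#endif
}
```

Because nothing here names d0 or d1, there is no clobber list entry to forget; only the condition codes are listed.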
|
14 October 2021, 19:01 | #1305 |
old chunk of coal
Join Date: Nov 2011
Location: Hungary
Posts: 1,289
|
Thanks, I indeed missed the edit! I'll add d1 to the clobber list and try again. The reason I'm trying to use inline assembly is that the code is riddled with those 64-bit math functions, and there must be a few cycles I could save since they are called very often.
If I can get this working then I'll try to use %0, %1, etc. instead of explicit registers.
edit: With d1 in the clobber list it works perfectly! Thanks again, I'll go back to every function to verify that nothing is missing from the clobber lists!
Last edited by BSzili; 14 October 2021 at 19:07. |
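A rough sketch of the %0, %1 approach applied to the simple byte-copy loop posted earlier in the thread. The name copybufbyte_sketch and its argument order are assumptions for illustration, not the function from the attached test case. Every register the loop modifies is declared as a read-write operand, so the compiler chooses the registers and there is nothing to leave off the clobber list.

```c
#include <assert.h>

/* Copy len bytes from src to dst, one byte at a time.
 * Hypothetical name and signature. */
static inline void copybufbyte_sketch(const void *src, void *dst, long len)
{
    assert(len >= 0);
#if defined(__m68k__) && !defined(NOASM)
    /* %0 and %1 are address registers, %2 is a data register.
     * All three are changed by the loop, hence "+".
     * "memory" tells GCC the asm reads and writes memory it cannot see;
     * "cc" that the condition codes change. */
    __asm__ volatile (
        "1: subq.l #1,%2\n\t"
        "   bmi.s 2f\n\t"
        "   move.b (%0)+,(%1)+\n\t"
        "   bra.s 1b\n"
        "2:"
        : "+a" (src), "+a" (dst), "+d" (len)
        :
        : "cc", "memory");
#else
    /* Portable version, used with NOASM or on other targets. */
    const unsigned char *s = src;
    unsigned char *d = dst;
    while (len-- > 0)
        *d++ = *s++;
#endif
}
```

The local pointers are modified inside the asm, so they must not be reused afterwards as if they still held the original addresses.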
15 October 2021, 17:17 | #1306 | |
bye
Join Date: Jun 2016
Location: Some / Where
Posts: 680
|
Quote:
sorry, but I don't see the reason for using such inline assembly with gcc. the NOASM functions aren't worse. With loop unrolling it should be even faster. https://franke.ms/cex/z/aTjM31 |
|
16 October 2021, 08:42 | #1307 |
old chunk of coal
Join Date: Nov 2011
Location: Hungary
Posts: 1,289
|
Of course. It was just an example to figure out what I messed up (the clobber lists). For the mulscale, etc. functions I need the assembly, as the 64-bit multiplication is replaced with the FPU version on the 68060. The rest will be gradually replaced with C versions, as GCC6 generates pretty good code.
|
16 October 2021, 10:11 | #1308 | |
bye
Join Date: Jun 2016
Location: Some / Where
Posts: 680
|
Quote:
Oh, I see. with -m68060 the code is... insane^^ that should be fixed in gcc... |
|
16 October 2021, 18:17 | #1309 | ||
bye
Join Date: Jun 2016
Location: Some / Where
Posts: 680
|
Quote:
oh - the 68060 code is OK! Why? Quote:
which means: you may use mulu.l/muls.l on a MC68060 but this will raise an exception and an emulation of the instruction gets invoked instead. I would not bet on which is faster in the end. |
||
16 October 2021, 19:39 | #1310 | |
Registered User
Join Date: Feb 2017
Location: Denmark
Posts: 1,099
|
Quote:
Might be missing something here, but it seems like the code generated for the C version of mulscale32 is slower than it needs to be for 060. It calls ___muldi3 (which is understandable since '060 doesn't have 32x32->64bit multiply), but then at least with my version (m68k-amigaos-gcc (GCC) 6.5.0b 210726154642, built from amiga-gcc commit 15656337dad68ed40f54d600ed2b19e64bfd9ea2) ___muldi3 looks like this: Code:
__muldi3 (DWtype u, DWtype v)
{
    4af8: 4e55 0000     link.w a5,#0
    4afc: 48e7 3c00     movem.l d2-d5,-(sp)
    4b00: 242d 0008     move.l 8(a5),d2
    4b04: 262d 000c     move.l 12(a5),d3
  const DWunion uu = {.ll = u};
  const DWunion vv = {.ll = v};
  DWunion w = {.ll = __umulsidi3 (uu.s.low, vv.s.low)};
    4b08: 2003          move.l d3,d0
{
    4b0a: 282d 0010     move.l 16(a5),d4
    4b0e: 2a2d 0014     move.l 20(a5),d5
  DWunion w = {.ll = __umulsidi3 (uu.s.low, vv.s.low)};
    4b12: 4c05 0401     mulu.l d5,d1,d0
    4b16: 2041          movea.l d1,a0
    4b18: 2240          movea.l d0,a1
    4b1a: 2008          move.l a0,d0
    4b1c: 2209          move.l a1,d1
  w.s.high += ((UWtype) uu.s.low * (UWtype) vv.s.high
    4b1e: 4c04 3800     muls.l d4,d3
               + (UWtype) uu.s.high * (UWtype) vv.s.low);
    4b22: 4c05 2800     muls.l d5,d2
    4b26: d483          add.l d3,d2
  w.s.high += ((UWtype) uu.s.low * (UWtype) vv.s.high
    4b28: 2002          move.l d2,d0
    4b2a: d088          add.l a0,d0
  return w.ll;
}
    4b2c: 4cdf 003c     movem.l (sp)+,d2-d5
    4b30: 4e5d          unlk a5
    4b32: 4e75          rts
I.e. it uses an emulated 32x32->64bit multiply (mulu.l d5,d1,d0) and does a bunch of extra work? |
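For reference, the high 32 bits of a 32x32 product can also be assembled from 16-bit halves using only multiplies with a 32-bit result, which the 68060 does have natively. The following is a sketch of that textbook decomposition. It is not what libgcc's ___muldi3 does, and it makes no claim about speed. The names umulhi32_sketch and mulscale32_nowide_sketch are made up for illustration.

```c
#include <stdint.h>

/* High 32 bits of the unsigned 64-bit product u * v, using only
 * 32-bit arithmetic (hypothetical helper). */
static inline uint32_t umulhi32_sketch(uint32_t u, uint32_t v)
{
    uint32_t ul = u & 0xFFFFu, uh = u >> 16;
    uint32_t vl = v & 0xFFFFu, vh = v >> 16;

    /* Four 16x16 -> 32 partial products; none can overflow 32 bits. */
    uint32_t ll = ul * vl;
    uint32_t lh = ul * vh;
    uint32_t hl = uh * vl;
    uint32_t hh = uh * vh;

    /* Sum the middle column (bits 16..31) to find the carry into bit 32.
     * Three 16-bit values added together stay below 2^18. */
    uint32_t mid = (ll >> 16) + (lh & 0xFFFFu) + (hl & 0xFFFFu);

    return hh + (lh >> 16) + (hl >> 16) + (mid >> 16);
}

/* Signed (a * b) >> 32 derived from the unsigned high word.
 * Assumes two's complement conversion between uint32_t and int32_t,
 * which GCC provides. */
static inline int32_t mulscale32_nowide_sketch(int32_t a, int32_t b)
{
    uint32_t hi = umulhi32_sketch((uint32_t)a, (uint32_t)b);
    if (a < 0) hi -= (uint32_t)b;   /* undo the unsigned reading of a */
    if (b < 0) hi -= (uint32_t)a;   /* undo the unsigned reading of b */
    return (int32_t)hi;
}
```

Whether four multiplies plus the fix-ups beat an FPU-based route on real hardware would have to be measured.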
|
16 October 2021, 20:17 | #1311 |
old chunk of coal
Join Date: Nov 2011
Location: Hungary
Posts: 1,289
|
Sorry I was a bit vague. I have custom FPU-based replacements for the missing instructions that are faster than the ones that come with the compiler. It's not a one-size-fits-all solution, as I have to set the FPU to round toward minus infinity, but for these games it's OK, as they only use the FPU lightly and don't depend on the default rounding for the game logic.
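To illustrate why the rounding direction matters, here is a portable sketch comparing a truncating float-to-int conversion with rounding toward minus infinity for a (a * b) >> 32 style result. It is not the FPU code described above: the function names are hypothetical, the floor step is done by hand instead of by changing the FPU rounding mode, and it assumes long double has a 64-bit mantissa (as on the 68881/68882 and x87) so that the product is exact.

```c
#include <stdint.h>

/* Truncating version: C float-to-int conversion rounds toward zero. */
static inline int32_t mulscale32_fp_trunc_sketch(int32_t a, int32_t b)
{
    /* Scaling by 2^-32 only changes the exponent, so it is exact. */
    long double x = (long double)a * (long double)b * (1.0L / 4294967296.0L);
    return (int32_t)x;
}

/* Flooring version: matches an arithmetic shift right by 32. */
static inline int32_t mulscale32_fp_floor_sketch(int32_t a, int32_t b)
{
    long double x = (long double)a * (long double)b * (1.0L / 4294967296.0L);
    int32_t q = (int32_t)x;        /* truncated toward zero */
    if ((long double)q > x)        /* negative with a fraction: step down */
        q--;
    return q;
}
```

For a = -1, b = 1 the exact quotient is a tiny negative number: truncation gives 0, while flooring gives -1, which is what the integer shift produces.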
|
16 October 2021, 20:50 | #1312 | |
bye
Join Date: Jun 2016
Location: Some / Where
Posts: 680
|
Quote:
hehe - fun - that lib wasn't built for 68060. => you need libs built for the 68060... |
|
17 October 2021, 15:47 | #1313 | |
bye
Join Date: Jun 2016
Location: Some / Where
Posts: 680
|
Quote:
something like Code:
int mulscale32(int u, int v)
{
    return ((double)u) * v / ((double)(1<<16) * (1<<16));
}
? |
|
17 October 2021, 19:54 | #1314 |
Registered User
Join Date: Nov 2015
Location: Italy
Posts: 191
|
There are some gcc inline fixedmul/fixeddiv functions in the DoomAttack source. If they still work with newer gcc versions (this was in the 2.95 era), maybe they can be used as inspiration. They look like this:
Code:
extern __inline fixed_t FixedMul(fixed_t eins,fixed_t zwei)
{
#ifndef version060
    __asm __volatile
    ("muls.l %1,%1:%0 \n\t"
     "move %1,%0 \n\t"
     "swap %0 "
     : "=d" (eins), "=d" (zwei)
     : "0" (eins), "1" (zwei)
    );
    return eins;
#else
    __asm __volatile
    ("fmove.l %0,fp0 \n\t"
     "fmul.l %2,fp0 \n\t"
     "fmul.x fp7,fp0 \n\t"
     /* "fintrz.x fp0,fp0 \n\t"*/
     "fmove.l fp0,%0"
     : "=d" (eins)
     : "0" (eins), "d" (zwei)
     : "fp0"
    );
    return eins;
#endif
}
|
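As a reading aid, here is a plain C sketch of what the non-060 branch appears to compute, assuming fixed_t is a 32-bit 16.16 fixed-point value as in Doom. The typedef and both function names are assumptions for illustration. The second function mimics the muls.l / move / swap sequence on the two halves of the product to show that it selects bits 16..47.

```c
#include <stdint.h>

typedef int32_t fixed_sketch_t;   /* assumed 16.16 fixed point */

/* Reference: 64-bit product shifted right by 16.
 * Assumes arithmetic >> and wrapping narrowing, as with GCC. */
static inline fixed_sketch_t fixedmul_ref_sketch(fixed_sketch_t a, fixed_sketch_t b)
{
    return (fixed_sketch_t)(((int64_t)a * b) >> 16);
}

/* Same result built the way the asm does it. */
static inline fixed_sketch_t fixedmul_halves_sketch(fixed_sketch_t a, fixed_sketch_t b)
{
    uint64_t p = (uint64_t)((int64_t)a * b);
    uint32_t hi = (uint32_t)(p >> 32);   /* Dh after muls.l */
    uint32_t lo = (uint32_t)p;           /* Dl after muls.l */

    /* "move %1,%0" is a word move: low 16 bits of hi replace those of lo. */
    uint32_t r = (lo & 0xFFFF0000u) | (hi & 0xFFFFu);

    /* "swap %0" exchanges the two 16-bit halves. */
    r = (r << 16) | (r >> 16);
    return (fixed_sketch_t)r;
}
```

Both functions agree whenever the true result fits in 32 bits, which is the normal case for 16.16 arithmetic.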
18 October 2021, 07:30 | #1315 | |
old chunk of coal
Join Date: Nov 2011
Location: Hungary
Posts: 1,289
|
Quote:
https://github.com/BSzili/jfbuild/bl...ragmas.h#L3635 |
|
26 October 2021, 14:13 | #1316 |
bye
Join Date: Jun 2016
Location: Some / Where
Posts: 680
|
There is a tutorial from Wei-ju Wu: Setting up gcc for Amiga cross development |
28 October 2021, 18:30 | #1317 |
old chunk of coal
Join Date: Nov 2011
Location: Hungary
Posts: 1,289
|
I have a question about the mathieeedoubtrans.library dependency in libnix. After looking at this issue I tried to compile my executable with
-noixemul -m68881 -mhard-float, but pow(), for example, still pulls in libm020/libm881/libm.a, which uses IEEEDPPow. Is it possible to avoid this? I could make a PR to add inline asm replacements to libnix when __HAVE_68881__ is defined. For example, clib2's pow implementation: https://github.com/adtools/clib2/blo...ath_pow.c#L122 It's available under a BSD license, and these shouldn't take up much more space than the mathieeedoubtrans.library calls. Would you be interested in such a patch for libnix? |
28 October 2021, 20:37 | #1318 | |
bye
Join Date: Jun 2016
Location: Some / Where
Posts: 680
|
Quote:
yes, it is possible: use -ffast-math https://franke.ms/cex/z/EEnfbE
Last edited by bebbo; 28 October 2021 at 20:45. |
|
28 October 2021, 20:50 | #1319 |
old chunk of coal
Join Date: Nov 2011
Location: Hungary
Posts: 1,289
|
Thanks, I guess I can use this pow replacement to avoid mathieeedoubtrans.library
|
29 October 2021, 08:10 | #1320 | |
bye
Join Date: Jun 2016
Location: Some / Where
Posts: 680
|
Quote:
I have no idea if it's smart to hook into the mathieee-stuff or not... ... it's an Amiga thing to provide these. It would also be possible to provide these functions as builtins, which would result in a direct call into the Amiga libraries without using any stub... ... well, only thoughts |
|