Casting floats to uint (or rather not) with VBCC

Hedeon · 04 January 2022, 10:54

Hi,

What I want to do is to have a single precision float number and endian swap it before I store it in a PCI register.

I have something like

StorePCI((ULONG)pciaddress, (float)driver->xvalue);

and StorePCI being:

StorePCI(__reg("a0") ULONG address, __reg("d0") ULONG value)="\trol.w\t#8,d0\n\tswap\td0\n\trol.w\t#8,d0\n\tmove.l\td0,(a0)\n";

What I need is something like:

fmove.s #1,fp0 //let's say value stored in driver->xvalue
fmove.s fp0,d0
move.l #pciaddress,a0

rol.w #8,d0
swap d0
rol.w #8,d0
move.l d0,(a0)

What I end up with is a fmove.d fp0,d0. So the integer value is stored in d0 ($1) instead of the floating point "raw" value ($3f800000)

How can I actually byteswap $3f800000 instead of $1 with VBCC. I have tried all different kinds of casting. I am a bit lost.
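
For reference, the rol.w / swap / rol.w sequence in the StorePCI macro can be modelled in plain C. This is only a sketch: the helper names (rol_w_8, swap_halves, storepci_swap) are made up for illustration and do not come from the driver source. It shows why the value arriving in d0 matters: the sequence reverses whatever 32 bits it is given, so $3f800000 becomes $0000803f while the converted integer $1 becomes $01000000.

```c
#include <stdint.h>

/* Model of "rol.w #8,d0": rotate the low 16 bits left by 8,
   leaving the upper 16 bits of the register untouched. */
static uint32_t rol_w_8(uint32_t d0)
{
    uint32_t low = d0 & 0xffffu;

    low = ((low << 8) | (low >> 8)) & 0xffffu;
    return (d0 & 0xffff0000u) | low;
}

/* Model of "swap d0": exchange the upper and lower 16-bit halves. */
static uint32_t swap_halves(uint32_t d0)
{
    return (d0 << 16) | (d0 >> 16);
}

/* The macro body without the final move.l: a full 32-bit byte reversal. */
static uint32_t storepci_swap(uint32_t d0)
{
    d0 = rol_w_8(d0);
    d0 = swap_halves(d0);
    d0 = rol_w_8(d0);
    return d0;
}
```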

Thomas Richter · 04 January 2022, 11:12

Does vbcc actually support single precision IEEE numbers? I'm asking, because SAS/C does not, it only supports mathffp numbers and double precision IEEE numbers.

However, in case it does - and check for the right math options - the following should do it:

Code:

union {
 float u_float;
 uint32_t u_int;
} u;
u.u_float = f; /* put the float value into the union */
i = (u.u_int >> 24) | ((u.u_int >> 8) & 0xff00) | ((u.u_int << 8) & 0xff0000) | ((u.u_int << 24)); /* perform endian swap */

Note that a C compiler has no way to reinterpret a float as int, a cast will always round, so you have to go through a temporary in memory. A C++-compiler has the possibility to do so via reinterpret_cast<int>(float). Also, there is no endian-swap primitive in C or C++, though a couple of modern compilers recognize the above idiom and generate ideal code, such as the GNU compiler.
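
The difference between converting the value and copying the bit pattern can be sketched as follows. The function names are hypothetical, and the sketch assumes that float is a 32-bit IEEE single on the target. It uses memcpy as the "temporary in memory" step; in C the cast itself discards the fractional part.

```c
#include <stdint.h>
#include <string.h>

/* Value conversion: the cast yields the integer part of the number,
   so 1.0f becomes 1 and 1.75f also becomes 1. */
static uint32_t float_to_int_value(float f)
{
    return (uint32_t)f;
}

/* Bit-pattern copy through memory: the 32 bits of the float are
   returned unchanged. Assumes sizeof(float) == sizeof(uint32_t). */
static uint32_t float_to_raw_bits(float f)
{
    uint32_t bits;

    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

With IEEE singles, float_to_raw_bits(1.0f) gives 0x3f800000, which is the value that then has to be byte swapped.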

Hedeon · 04 January 2022, 11:20

Quote:

Originally Posted by Hedeon

What I need is something like:

fmove.s driver->xvalue,fp0 //changed this to make it a bit more clear, not real code
fmove.s fp0,d0
move.l #pciaddress,a0

Even better would be:

move.l driver->xvalue, d0
move.l #pciaddress,a0

removing both fmoves.

Hedeon · 04 January 2022, 11:23

Quote:

Originally Posted by Thomas Richter

Does vbcc actually support single precision IEEE numbers? I'm asking, because SAS/C does not, it only supports mathffp numbers and double precision IEEE numbers.

However, in case it does - and check for the right math options - the following should do it:

Code:

union {
 float u_float;
 uint32_t u_int;
} u;
u.u_float = f; /* put the float value into the union */
i = (u.u_int >> 24) | ((u.u_int >> 8) & 0xff00) | ((u.u_int << 8) & 0xff0000) | ((u.u_int << 24)); /* perform endian swap */

Note that a C compiler has no way to reinterpret a float as int, a cast will always round, so you have to go through a temporary in memory. A C++-compiler has the possibility to do so via reinterpret_cast<int>(float). Also, there is no endian-swap primitive in C or C++, though a couple of modern compilers recognize the above idiom and generate ideal code, such as the GNU compiler.

Thanks Thomas, I'll give it a try. It is old source that used to compile with gcc2.95 m68k and PPC. Those seem to generate code even skipping the fmove.s opcodes and load d0 directly with the $3f800000 value.

Now I am trying to compile with VBCC and the result is different. I did not change the source except the asm macro.

(Same with PPC, where the old program compiled with gcc uses stfs and vbcc uses stfd (single versus double), or it just gets loaded with a lwz.)

Hedeon · 04 January 2022, 12:57

So the differences:

If float calculations are done, GCC in the end stores it in d0 using a fmove.s. When the value/parameter is directly used with no calculations it just uses move.l to d0

VBCC after calculations does a fmove.d to store the result in d0. When the value is taken directly, it casts using fmove.d to d0

vbc · 04 January 2022, 13:56

Quote:

Originally Posted by Hedeon

Hi,
[...]
StorePCI(__reg("a0") ULONG address, __reg("d0") ULONG value)="\trol.w\t#8,d0\n\tswap\td0\n\trol.w\t#8,d0\n\tmove.l\td0,(a0)\n";
[...]

How can I actually byteswap $3f800000 instead of $1 with VBCC. I have tried all different kinds of casting. I am a bit lost.

You have to declare value as float rather than ULONG to prevent a float=>int conversion. Unfortunately, vbcc does not allow specifying a data register for a float value when generating code for the FPU. (I am not sure what the rationale was originally. I did a quick test to change that and it seems to work, but maybe there is some code path in the backend that does not expect it and has to be adapted.)

Anyway, why not use one of those:

Best for soft-float only:

Code:

StorePCI(__reg("a0") ULONG address, __reg("d0") float value)="\trol.w\t#8,d0\n\tswap\td0\n\trol.w\t#8,d0\n\tmove.l\td0,(a0)\n";

Best for FPU only:

Code:

StorePCI(__reg("a0") ULONG address, __reg("fp0") float  value)="\tfmove.s\tfp0,d0\n\trol.w\t#8,d0\n\tswap\td0\n\trol.w\t#8,d0\n\tmove.l\td0,(a0)\n";

Slightly less efficient, but should work in both cases:

Code:

StorePCI(__reg("a0") ULONG address, float  value)="\tmove.l\t(a7),d0\n\trol.w\t#8,d0\n\tswap\td0\n\trol.w\t#8,d0\n\tmove.l\td0,(a0)\n";

Or use the most efficient one based on the selected target:

Code:

#if __FPU__>68000
void StorePCI(__reg("a0") ULONG address, __reg("fp0") float value)="\tfmove.s\tfp0,d0\n\trol.w\t#8,d0\n\tswap\td0\n\trol.w\t#8,d0\n\tmove.l\td0,(a0)\n";
#else
void StorePCI(__reg("a0") long address, __reg("d0") float value)="\trol.w\t#8,d0\n\tswap\td0\n\trol.w\t#8,d0\n\tmove.l\td0,(a0)\n";
#endif

vbc · 04 January 2022, 14:07

Quote:

Originally Posted by Thomas Richter

Does vbcc actually support single precision IEEE numbers? I'm asking, because SAS/C does not, it only supports mathffp numbers and double precision IEEE numbers.

Yes, unless code for kickstart 1.x is generated.

Quote:

However, in case it does - and check for the right math options - the following should do it:

Code:

union {
 float u_float;
 uint32_t u_int;
} u;
u.u_float = f; /* put the float value into the union */
i = (u.u_int >> 24) | ((u.u_int >> 8) & 0xff00) | ((u.u_int << 8) & 0xff0000) | ((u.u_int << 24)); /* perform endian swap */

The union hack is not correct C. It is only allowed to read the member of a union that has been written most recently. On higher optimization levels some compilers (including vbcc) will make use of C aliasing rules and code like this may not work.

What you can do is use a char * pointer to the float variable and read out the individual bytes. Note that only char * is allowed here without breaking aliasing rules.
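
A minimal sketch of the char-pointer approach, assuming a 4-byte float. The function name float_bytes_reversed is invented for illustration. Both the read from the float and the write into the result go through unsigned char pointers, so no object is accessed through an incompatible type.

```c
#include <stdint.h>

/* Returns a 32-bit value whose bytes are the bytes of *f in reverse
   storage order. Assumes sizeof(float) == 4. */
static uint32_t float_bytes_reversed(const float *f)
{
    const unsigned char *src = (const unsigned char *)f;
    uint32_t out;
    unsigned char *dst = (unsigned char *)&out;
    int i;

    for (i = 0; i < 4; i++)
        dst[i] = src[3 - i];   /* byte i of the result is byte 3-i of the float */
    return out;
}
```

Because the bytes are reversed in storage order, the result is the endian-swapped bit pattern regardless of the byte order of the machine running it, provided float and uint32_t use the same byte order. Whether a given compiler reduces the loop to a few register instructions is not something the sketch guarantees.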

paraj · 04 January 2022, 14:47

Quote:

Originally Posted by vbc

Yes, unless code for kickstart 1.x is generated.

The union hack is not correct C. It is only allowed to read the member of a union that has been written most recently. On higher optimization levels some compilers (including vbcc) will make use of C aliasing rules and code like this may not work.

What you can do is use a char * pointer to the float variable and read out the individual bytes. Note that only char * is allowed here without breaking aliasing rules.

Isn't it explicitly allowed in C99 and later? §6.5.2.3.3:

Quote:

A postfix expression followed by the . operator and an identifier designates a member of a structure or union object. The value is that of the named member, 82) and is an lvalue if the first expression is an lvalue. If the first expression has qualified type, the result has the so-qualified version of the type of the designated member.

Where footnote 82 says:

Quote:

If the member used to access the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.

Some discussion on stackoverflow: https://stackoverflow.com/a/25672839/786653

P.S. In C++ reinterpret_cast won't work, you either have to use memcpy or std::bit_cast (from C++20)

phx · 04 January 2022, 14:48

Quote:

Originally Posted by Hedeon

What I want to do is to have a single precision float number and endian swap it before I store it in a PCI register.

I have something like

StorePCI((ULONG)pciaddress, (float)driver->xvalue);

The m68k has no instruction to transfer the IEEE single precision value from an FPU register directly into a data register. fmove to Dn always means convert to integer.

So, as Thomas correctly pointed out, you have to go over a temporary memory location. In C this is usually done with a union, as shown in his example.

If you want it as an assembler inline function, it could look like this:

Code:

void StorePCI(__reg("a0") void *addr, __reg("fp0") float val) =
  "\tfmove.s\tfp0,-(sp)\n"
  "\tmove.l\t(sp)+,d0\n"
  "\trol.w\t#8,d0\n"
  "\tswap\td0\n"
  "\trol.w\t#8,d0\n"
  "\tmove.l\td0,(a0)";

Quote:

Originally Posted by Thomas Richter

Does vbcc actually support single precision IEEE numbers? I'm asking, because SAS/C does not, it only supports mathffp numbers and double precision IEEE numbers.

Yes it supports single precision IEEE, which is ideal for OS2/3. But for the Kickstart 1.x target I had to implement conversion routines from/to mathffp.

Quote:

Originally Posted by Hedeon

If float calculations are done, GCC in the end stores it in d0 using a fmove.s. When the value/parameter is directly used with no calculations it just uses move.l to d0

AFAIK gcc's m68k backend only knows the V.4-ABI, which makes a function always return float results in data registers (d0 for single, d0/d1 for double precision), while the AmigaOS-ABI prefers to use fp0 when compiled with an FPU-option. There is the -no-fp-return option to switch vbcc to V.4-ABI for floating point return values.

Thomas Richter · 04 January 2022, 15:03

Quote:

Originally Posted by vbc

The union hack is not correct C. It is only allowed to read the member of a union that has been written most recently. On higher optimization levels some compilers (including vbcc) will make use of C aliasing rules and code like this may not work.

*Cough* Aliasing between members of unions does not apply. There is a specific clause for that in C. The result is, of course, undefined, but what else can the C standard say?

See, for example:

https://stackoverflow.com/questions/...through-unions

Thomas Richter · 04 January 2022, 15:04

Quote:

Originally Posted by phx

The m68k has no instruction to transfer the IEEE single precision value from an FPU register directly into a data register.

Sure it does. "fmove.s fpx,dy" works nicely. It transfers the bit-pattern of the source fpu register, rounded to 32-bit single precision, to the data register dy. Would be rather bad if this wouldn't work because the processor libraries are full of such code. (-;

phx · 04 January 2022, 15:21

Quote:

Originally Posted by Thomas Richter

"fmove.s fpx,dy" works nicely. It transfers the bit-pattern of the source fpu register, rounded to 32-bit single precision, to the data register dy.

Indeed! Completely forgot about it.

Hedeon · 04 January 2022, 15:35

I made stuff more complicated to try to compile for both ppc and m68k. The ppc does not have direct fpu register to general register modes. So yesterday I ended up with sending the (ULONG*)&driver->xvalue to StorePCI and made value in the asm macro also ULONG* and added a normal load from memory to the data register first. Compiled for at least 68020 and fpu 68881. This also worked for PPC (with -lm).

However, while the speed from the vbcc m68k generated was comparable to gcc (around 5% slower on my 68060), the ppc generated one took a 40% speed hit. I guessed because for 68k the number of opcodes in the macro went from 4 to 5, while the ppc one went from 1 (a single stwbrx) to 2. Also, when looking at the differences in the code where vbcc uses more (double precision) fpu opcodes while gcc uses more general registers directly, I wondered if the mistake was in the casting as I really wanted to ditch those extra opcodes in the asm macros again for speed reasons. This function is used a lot. Hence the post, but focused on m68k while there is more expertise there.

Looking at the answers it looks like that for ppc and speed wise I have to revert to gcc2.95 anyway. Sorry for the maybe a bit misleading first post.
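
For comparison, a compiler-neutral fallback without inline assembly could look like the sketch below. store_pci_float is a hypothetical helper, not part of the driver, and the sketch assumes a 32-bit IEEE float. It copies the bit pattern with memcpy, applies the shift-and-mask swap posted earlier in the thread, and performs one 32-bit store.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: stores the byte-reversed bit pattern of f at *reg. */
static void store_pci_float(volatile uint32_t *reg, float f)
{
    uint32_t bits;

    memcpy(&bits, &f, sizeof bits);            /* raw bits, assuming a 32-bit float */
    bits = (bits >> 24) | ((bits >> 8) & 0xff00u) |
           ((bits << 8) & 0xff0000u) | (bits << 24);   /* endian swap */
    *reg = bits;                               /* single 32-bit write */
}
```

How close the generated code comes to a single stwbrx on PPC or to the four-instruction 68k sequence depends on the compiler and optimization level, so the speed concern above would still need to be measured.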

vbc · 04 January 2022, 16:32

Quote:

Originally Posted by paraj

Isn't it explicitly allowed in C99 and later? §6.5.2.3.3:

This footnote is not in my copy of the standard and it was apparently added later through a defect report. While I agree that the wording is somewhat misleading, the defect report suggests that it was intended as clarification, perhaps to allow a trap representation in all cases.

There still is 6.5p7 which lists all allowed types for accessing an lvalue:

Quote:

An object shall have its stored value accessed only by an lvalue expression that has one of the following types: 73)
— a type compatible with the effective type of the object,
— a qualified version of a type compatible with the effective type of the object,
— a type that is the signed or unsigned type corresponding to the effective type of the object,
— a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
— an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
— a character type.

vbc · 04 January 2022, 16:52

Quote:

Originally Posted by Thomas Richter

*Cough*

Gesundheit!

Quote:

Aliasing between members of unions does not apply. There is a specific clause for that in C.

Which clause?

Quote:

The result is, of course, undefined, but what else can the C standard say?

It could say "implementation defined result". I am not sure what clause you are referring to, but if the C standard mentions "undefined" that means the code is illegal and the compiler can ignore such a case without having to diagnose it or handling it in any meaningful way.

Quote:

https://stackoverflow.com/questions/...through-unions

That seems to contain a lot of opinions without any backing. comp.std.c was the place to get decent information on such topics. Unfortunately newsgroups are not much in fashion any more.

Thomas Richter · 04 January 2022, 17:15

Quote:

Originally Posted by vbc

Which clause?

See here for the references:

https://stackoverflow.com/questions/...-what-does-not

Quote:

Originally Posted by vbc

That seems to contain a lot of opinions without any backing. comp.std.c was the place to get decent information on such topics. Unfortunately newsgroups are not much in fashion any more.

I suggest then to go checking there if you don't believe me. I ran into this issue a while ago, this is why I mention it. Storing in memory and going through a pointer cast is indeed not going to work, and I had issues with that with, for example, the icc compiler. The "union hack", as you call it, solves that type of problem.

paraj · 04 January 2022, 17:56

Quote:

Originally Posted by vbc

This footnote is not in my copy of the standard and it was apparently added later through a defect report. While I agree that the wording is somewhat misleading, the defect report suggests that it was intended as clarification, perhaps to allow a trap representation in all cases.

There still is 6.5p7 which lists all allowed types for accessing an lvalue:

Sorry, I should have mentioned that I was looking at N1256 (final draft of TC3 from 2007).

I have to admit the issue is less clear than I remembered it, and it might be the case that the standard technically doesn't require type-punning through unions to be supported (this stackoverflow answer makes a persuasive argument).

You're also right that it was added through a defect report (DR283). Following the linked discussions, I think it's quite clear (from proposal N980) that the intention was to allow it though. It's certainly widely believed to be, as evidenced by this thread.

Hedeon · 04 January 2022, 19:12

Quote:

Originally Posted by phx

AFAIK gcc's m68k backend only knows the V.4-ABI, which makes a function always return float results in data registers (d0 for single, d0/d1 for double precision), while the AmigaOS-ABI prefers to use fp0 when compiled with an FPU-option. There is the -no-fp-return option to switch vbcc to V.4-ABI for floating point return values.

Tried the following with that option:

int main(void)
{
float x;
x = 1.0;
return (int)x;
}

result is moveq #1,d0 and sadly not move.l #$3f800000,d0

The union approach has me changing a lot of the code. Will take a while.

vbc · 04 January 2022, 19:49

Quote:

Originally Posted by Thomas Richter

See here for the references:

https://stackoverflow.com/questions/...-what-does-not

This is a discussion about details in the gcc documentation. The only mentioned parts of the C standard that I found are the non-normative footnote that was already mentioned in this thread and the part from 6.5 that I quoted. One poster even states: "What gcc says is it relaxes the rules a bit, and allows type-punning through unions even though the standard doesn't require it to"

Quote:

I suggest then to go checking there if you don't believe me.

This is what I did 20-25 years ago.

Quote:

I ran into this issue a while ago, this is why I mention it. Storing in memory and going through a pointer cast is indeed not going to work, and I had issues with that with, for example, the icc compiler. The "union hack", as you call it, solves that type of problem.

It may make that specific code work on specific versions of specific compilers. But it still violates the C standard and relies on internals of a specific compiler without the need to do so. There are alternatives which do not cause undefined behaviour, like using char-pointers.

paraj · 04 January 2022, 19:54

Quote:

Originally Posted by Hedeon

Tried the following with that option:

int main(void)
{
float x;
x = 1.0;
return (int)x;
}

result is moveq #1,d0 and sadly not move.l #$3f800000,d0

The union approach has me changing a lot of the code. Will take a while.

Is there any reason why you couldn't just do this?

Code:

#ifdef __M68K__
#ifdef __GNUC__
void StorePCIFloat(ULONG address, float f)
{
    union {
        float f;
        ULONG u;
    } u = { .f = f };
    *(volatile ULONG*)address = __builtin_bswap32(u.u);
}
#else
void StorePCIFloat(__reg("a0") ULONG address, __reg("fp0") float  value)="\tfmove.s\tfp0,d0\n\trol.w\t#8,d0\n\tswap\td0\n\trol.w\t#8,d0\n\tmove.l\td0,(a0)\n";
#endif

#else
// Insert equivalent PPC magic

#endif

(Yes you have to change every place you want to write a float, but at least any further changes will be localized).

The above generates very sensible code with both Bebbos GCC and VBCC.

GCC even manages to convert StorePCIFloat(0x1234, 1.0f) into move.l #32831,4660.w (32831 is $0000803f, the byte-swapped form of $3f800000, and 4660 is $1234).
