08 February 2021, 17:50 | #1 |
<optimized out>
Join Date: Sep 2020
Location: <optimized out>
Posts: 321
|
Pseudo Floating Point
I'm trying to implement some formulae in fixed point, in C if I can, in assembler if I have to.
The problem I have is that the numbers I'm multiplying can vary between very small and very large. The parts are always combined with multiplication, so I'm free to adjust the exponent part, where the point is, of the number as I please. I don't want to use floating point numbers. I just want to adjust how I scale so that things stay within data sizes I can code with. Every approach I've considered relies on knowing how big a number is, i.e. which is the highest bit set? It's not like real floating point where you always start with one point zero something. Obviously, my lack of maths vocabulary is letting me down here, but I think what I'm asking for is a method to see that a number is, say 21 bits long, and as I'd like to keep my numbers to 16 bits, then a shift of 5 bits and subtracting of 5 from my tracking of where the decimal place is, will fix everything. I know I've not explained this well. Edit There is one component of these equations that can range from 1 to 21,000,000. It gets squared. I don't to lose precision for small values, or have to use 64 bits maths for big values. I also don't want a full floating point system. If I could work out how to keep 16 bits of precision so a normal multiply would work, but know where my decimal place has ended up, rinse and repeat for every part of the formula... Last edited by Ernst Blofeld; 08 February 2021 at 18:01. |
08 February 2021, 18:10 | #2 |
Registered User
Join Date: Jan 2019
Location: Germany
Posts: 3,330
|
Sorry, I don't understand. If the range of your numbers varies, and there is not a static range into which they fit, floating point *is* the best choice. After all, floating point is just "dynamic scaling", whereas fixed point is "static scaling".
The scaling algorithm of floating point is just cleverly made such that the error remains minimal - if you want a similarly low error, you would just reinvent the wheel, i.e. "floating point".

On the Amiga, the simplest floating point for 32-bit numbers is mathffp, which is even in ROM, so everyone has it. 32 bits per number, but no "fancy stuff" such as INFs or NaNs. Works well enough. Mathffp is "floating point the home-made way": not overly smart, but the best speed/precision ratio on a 68000.

If you need a bit more of the fancy stuff, more precision, and more scientific credibility, go for mathieeesingbas. It even uses the FPU if there is one, is therefore faster on FPU-equipped systems, and it is not that much slower than mathffp. It's also in ROM. It is unfortunately not natively supported by most if not all compilers, and it cannot (easily) be mixed with double precision should you ever need that.

But, again, if you want "dynamic range" for your numbers, stay away from solving this yourself. It won't work well (unless you have the maths background); just use what others prepared for you. |
08 February 2021, 18:44 | #3 | |
<optimized out>
Join Date: Sep 2020
Location: <optimized out>
Posts: 321
|
Quote:
What I'm looking for is more than fixed point, and less than floating point. Maybe mathffp will be what I'm looking for, but going back and forth between it and the fixed point I've used elsewhere may be interesting. I'll look into it before dismissing it.

I'm not looking for "dynamic range" so much as "adaptable range". I have fixed upper and lower bounds that are not huge, and I need to work with other code which is happily ticking away with its fixed points. I could even resort to having several versions of the function I'm writing, to cope with different scales of the inputs, but no one really wants that.

Edit: I've taken a look at mathffp, and it still seems to leave me with the same problem as before. If I'm going to put my numbers into it, I've got to shift them so that the mantissa is normalised. How do I do that efficiently?

Last edited by Ernst Blofeld; 08 February 2021 at 19:06. |
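On the "how do I normalise efficiently" question: the plain 68000 has no count-leading-zeros instruction (the 68020+ has BFFFO), so one common trick is a 256-entry lookup table indexed a byte at a time. A hedged C sketch of that idea (table and names are mine):

```c
#include <stdint.h>

/* blen[b] = number of significant bits in byte b (0 for 0).
 * Built once at startup; on a 68000 this would simply be a
 * precomputed 256-byte table in the binary. */
static uint8_t blen[256];

static void blen_init(void)
{
    for (int i = 1; i < 256; i++)
        blen[i] = (uint8_t)(blen[i >> 1] + 1);
}

/* Bit length of a 32-bit value in at most three tests
 * and one table lookup. */
static int bitlen32(uint32_t v)
{
    if (v >> 24) return 24 + blen[v >> 24];
    if (v >> 16) return 16 + blen[v >> 16];
    if (v >> 8)  return  8 + blen[v >> 8];
    return blen[v];
}
```

The bit length minus 16 is then exactly the shift needed to normalise into 16 bits (or, minus 24, into an FFP mantissa).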
|
08 February 2021, 19:17 | #4 | |
Registered User
Join Date: Jan 2019
Location: Germany
Posts: 3,330
|
Quote:
I'm not quite clear on what you mean by that. Conversion of binary numbers to FFP and back, possibly? |
|
08 February 2021, 19:52 | #5 |
<optimized out>
Join Date: Sep 2020
Location: <optimized out>
Posts: 321
|
|
08 February 2021, 20:01 | #6 |
<optimized out>
Join Date: Sep 2020
Location: <optimized out>
Posts: 321
|
Even if everything to do with the physics and motion were done within this floating point world, I would still need to translate to and from the fixed point world that I render.
|
08 February 2021, 20:05 | #7 |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,068
|
I don't know what your rendering pipeline looks like or what you are doing internally, but here are my general observations and thoughts. You have mentioned millimetres several times. I really don't know why you'd need such high accuracy, and I think this is what's causing you all these problems: finding partial solutions that lead to more problems.
You are dealing with airplanes, tanks, runways, roads. The absolute minimum unit you need to accurately define these is about 10cm; let's make it 12.5cm (1/8m). That's 3 bits of fraction, 1 bit for a sign, and you have 12 bits left: -4096 to 4095, or a little over +/-4km, which is reasonably enough for every object definition in 3d. This guarantees all local computations are 16-bit.

Large objects (runways, roads) are generally static, so no local transformations are required; you keep their coords in 32-bit (1 sign, 28 int, 3 fraction), and all object positions are 32-bit as well. You transform these fully in 32-bit.

Next, if you use LOD to reduce all small/medium objects beyond 4000m range to a pixel, you just use the object's 32-bit position to draw it. Under 4000m, everything small/medium is within 16-bit range after you have translated the world to your position (or make it a little less than 4000m if you have larger medium objects, 95m+ in size), and you can use 16-bit maths again.

So: 16-bit object coords -> local 16-bit transformations resulting in 32-bit data -> global translation in 32-bit -> if over 4000m, draw a pixel and done (you check the distance first, before local transformations) -> global transformations in 16-bit resulting in 32-bit data -> projection and normalization (shift/swap) -> 2d screen coords.

Again, large objects and object positions you transform fully in 32-bit, but there shouldn't be too many of them, or they should have very simple geometry. With 28 bits available for the int part, and assuming "8-bit" sin/cos amplitude (-256 to 256), you have 19 bits left (or 20 if you limit sin/cos to 255), or 500+km. That's how I would approach the whole thing on m68k+A500 hardware. |
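The coordinate encoding described above could be sketched in C like this (the type and helper names are mine; the conversion helpers via double are for illustration only, the runtime maths stays integer):

```c
#include <stdint.h>

/* 1/8 m fixed point as described above:
 * 16-bit local coords: 1 sign + 12 integer + 3 fraction bits, ~+/-4 km.
 * 32-bit world coords: 1 sign + 28 integer + 3 fraction bits, 500+ km. */
typedef int16_t local_q3;
typedef int32_t world_q3;

static local_q3 local_from_m(double meters) { return (local_q3)(meters * 8.0); }
static double   m_from_local(local_q3 c)    { return c / 8.0; }
```

The point of the layout is that a local coordinate times an "8-bit" sin/cos value (-256..256) still fits comfortably in 32 bits, so all local transforms are 16x16->32 multiplies.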
08 February 2021, 20:13 | #8 | |
<optimized out>
Join Date: Sep 2020
Location: <optimized out>
Posts: 321
|
Quote:
What you've said about 1/8 metre resolution for objects is a good point for low-LOD far-away objects, but I know from experience it's not great close up. I will make sure I use it for my mid-distance objects.

My problem right now is velocities, from less than 1m/s to mach 4, which in many formulas get squared. One squared is a small number, but mach 4 squared isn't. Floating point is an answer, but maybe there are others.

Last edited by Ernst Blofeld; 08 February 2021 at 20:27. |
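For the squared-velocity case, the mantissa-plus-exponent idea from post #1 avoids 64-bit maths entirely: both factors fit in 16 bits, so the product fits in 32, and one renormalisation brings it back. A hedged sketch (the struct and names are mine):

```c
#include <stdint.h>

typedef struct { uint16_t m; int e; } pf;   /* value = m * 2^e */

/* Square a pseudo-float: 16x16 -> at most 32 bits, then shift the
 * product back into 16 bits and fold the shift into the exponent.
 * Exponents add on multiplication, hence e * 2 for a square. */
static pf pf_square(pf a)
{
    uint32_t p = (uint32_t)a.m * a.m;
    pf r = { 0, a.e * 2 };
    while (p > 0xFFFF) { p >>= 1; r.e++; }
    r.m = (uint16_t)p;
    return r;
}
```

Small velocities keep their full 16 bits of precision; large ones lose only the low bits to the renormalising shift, which is the ~0.002% relative error of a 16-bit mantissa either way.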
|
08 February 2021, 20:37 | #9 |
Registered User
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,438
|
Now, of course it's your decision to make. But be advised that floating point is not really a good idea for anything targeting a 68000, it's liable to be quite slow even if done well.
|
08 February 2021, 20:43 | #10 |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,068
|
The set-up I described gives you a reasonable sub-meter accuracy (12.5cm per second).
Again, I don't know how compatible that approach is with what you are doing (quaternions, 4x4 matrices, 3x3 matrices and doing stuff step by step as I described, ...). It's more work than, say, running an array of coords through a 4x4 matrix and calling it a day (use faster HW if it's not fast enough ;P), but it gives you a lot more room for optimizations and specializations along the way. |
08 February 2021, 20:45 | #11 | |
<optimized out>
Join Date: Sep 2020
Location: <optimized out>
Posts: 321
|
Quote:
|
|
08 February 2021, 21:00 | #12 | |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,068
|
Quote:
For example, if you have a full-screen Eurofighter, that's 320pix/16m, or 20pix for each meter of length; with 3-bit accuracy, the minimum unit of size is 2.5 pixels wide on the screen (and since it's all vector graphics it won't have a blocky look). |
|
08 February 2021, 21:14 | #13 | |
<optimized out>
Join Date: Sep 2020
Location: <optimized out>
Posts: 321
|
Quote:
|
|
08 February 2021, 21:21 | #14 | |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,068
|
Quote:
With a 500km x 500km map (remember, object positions are 32-bit with 1/8m accuracy, and this also holds for speeds), you can still handle both 12.5cm/s and 1.2km/h at the same time, and it would take you ~7min to go from one side of the map to the other.

Furthermore, how are you applying speed? The full amount every 1sec, 1/50 of the full amount every VBL, 1/fps of the full amount every rendered frame, ...? This might help regardless of fp or int, if you can reduce the maximum amount of speed you are adding per calculation. |
|
08 February 2021, 21:37 | #15 |
<optimized out>
Join Date: Sep 2020
Location: <optimized out>
Posts: 321
|
I am calculating forces and moments and integrating them up into acceleration and therefore velocity. I count the number of steps I need to do in the vertical blank, so it will be a divisor of 50 (maybe 5, 10 or 25 steps per second), but I add them all up in one step. I'll stop adding forces when I run out of CPU; for instance, at the moment drag is limited to a basic value based on the angle of attack, plus an extra for elevator position, multiplied by an adjustment that increases it going past the sound barrier. Accuracy is limited more by the lack of publicly available data than anything else.
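A minimal sketch of one such sub-step, assuming a/b's 1/8m fixed-point units and semi-implicit Euler (all names and the unit choice are my assumptions, not the poster's actual code):

```c
#include <stdint.h>

#define STEPS_PER_SEC 25   /* must divide 50, per the VBL counting above */

typedef struct { int32_t vel; int32_t pos; } body;  /* 1/8 m/s, 1/8 m */

/* One integration step with dt = 1/STEPS_PER_SEC:
 * dv = a*dt, then dp = v*dt using the updated velocity. */
static void step(body *b, int32_t accel /* 1/8 m/s^2 */)
{
    b->vel += accel / STEPS_PER_SEC;
    b->pos += b->vel / STEPS_PER_SEC;
}
```

The two integer divisions are exactly where small forces vanish to zero, which is why the thread keeps circling back to scaling: an acceleration below STEPS_PER_SEC units simply never moves the plane.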
|
08 February 2021, 21:59 | #16 | |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,068
|
Quote:
Since the plane is moving at high speed, you don't need extra precision for it, so you keep the camera position as 40-bit (an extra 8 bits of fraction, calculated once per frame so no problem, and you ignore them except...) and if an object is super close, you do alternative transformations:
1. partially normalize its coords after the local rotation (so you gain an extra few bits of precision from sin/cos); they still fit well within 16 bits (again, super close)
2. use those extra 8 bits of camera position
3. globally translate, rotate and project with extra precision so that the object doesn't jump around the screen (12.5cm is approx 20pix in this case, assuming the canopy is ~2m, but now with more bits it should be much smoother)
As long as 12.5cm is enough to model 3d objects with sufficient accuracy, you are fine.
edit: changed 8 extra bits to fewer (4 should be fine, leaving you with 8 bits or +/-256m object size) to make sure the object can still work with 16-bit maths
Last edited by a/b; 08 February 2021 at 22:53. |
|
09 February 2021, 08:47 | #17 | |
<optimized out>
Join Date: Sep 2020
Location: <optimized out>
Posts: 321
|
Quote:
|
|
09 February 2021, 09:41 | #18 | |
Registered User
Join Date: Jan 2019
Location: Germany
Posts: 3,330
|
Quote:
But that's quite trivial. MathFFP includes conversion to and from integer. All you need to do is fix up the binary exponent before or after the conversion by the number of fractional bits. For MathFFP, the exponent sits in the least significant 8 bits if I recall. Thus, to convert from fixed to float, first use the integer-to-FFP conversion, then subtract the number of fraction bits from the number with sub.b #fix_bits,d0. To convert from floating point to fixed point, reverse this: first add the number of fraction bits (add.b #fix_bits,d0), then call the FFP-to-integer conversion. |
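The exponent fixup can be sketched in C by modelling the FFP word layout (24-bit mantissa in the upper bits, sign in bit 7, excess-64 exponent in the low 7 bits). These helpers are my own illustration, not the actual mathffp routines; ffp_unscale() is the C equivalent of the sub.b #fix_bits,d0 trick, and assumes the exponent does not underflow.

```c
#include <stdint.h>
#include <math.h>

/* Build an FFP word from an unsigned integer (stand-in for the real
 * integer-to-FFP routine; excess low bits are simply truncated). */
static uint32_t ffp_from_uint(uint32_t v)
{
    int e = 32;
    if (v == 0) return 0;
    while (!(v & 0x80000000u)) { v <<= 1; e--; }   /* normalize mantissa */
    return (v & 0xFFFFFF00u) | (uint32_t)(64 + e); /* excess-64 exponent */
}

/* Divide an FFP value by 2^fix_bits, i.e. "the integer I converted
 * was really Q(fix_bits) fixed point": subtract from the exponent
 * in the low byte, exactly what sub.b #fix_bits,d0 does. */
static uint32_t ffp_unscale(uint32_t f, int fix_bits)
{
    if ((f & 0xFFFFFF00u) == 0) return f;   /* FFP zero stays zero */
    return f - (uint32_t)fix_bits;
}

/* Decode to double, for checking only. */
static double ffp_to_double(uint32_t f)
{
    double m, s;
    if ((f & 0xFFFFFF00u) == 0) return 0.0;
    m = (double)(f >> 8) / 16777216.0;      /* fraction in [0.5, 1) */
    s = (f & 0x80u) ? -1.0 : 1.0;
    return s * ldexp(m, (int)(f & 0x7F) - 64);
}
```

So converting the Q8 fixed-point value 256 (i.e. 1.0) means: convert 256 to FFP, then drop the exponent by 8, and out comes FFP 1.0.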
|
09 February 2021, 10:42 | #19 | |
<optimized out>
Join Date: Sep 2020
Location: <optimized out>
Posts: 321
|
Quote:
|
|
09 February 2021, 16:31 | #20 |
Registered User
Join Date: Mar 2013
Location: Slovenia
Posts: 138
|
If you need a little more range and can sacrifice some precision, you can also look into companding the data (think A-law or mu-law compression).
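A hedged sketch of that companding idea, using my own layout rather than actual A-law: pack a 32-bit magnitude into a 5-bit shift count plus an 11-bit mantissa, so roughly 11 significant bits survive across the whole 1-to-21,000,000 range mentioned in post #1.

```c
#include <stdint.h>

/* Compress a 32-bit magnitude into 16 bits: shift count in the top
 * 5 bits, 11-bit mantissa below.  Small values are stored exactly;
 * large ones keep their top 11 bits. */
static uint16_t compand(uint32_t v)
{
    unsigned e = 0;
    while (v > 0x7FF) { v >>= 1; e++; }   /* at most 21 shifts, fits 5 bits */
    return (uint16_t)((e << 11) | v);
}

static uint32_t expand(uint16_t c)
{
    return (uint32_t)(c & 0x7FF) << (c >> 11);
}
```

Values up to 2047 round-trip exactly; above that the relative error stays under about 0.05%, at the cost of a normalisation loop (or table) on encode.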
|