English Amiga Board


Go Back   English Amiga Board > Coders > Coders. Asm / Hardware

 
 
Thread Tools
Old 14 July 2020, 03:19   #1
Jobbo
Registered User

Jobbo's Avatar
 
Join Date: Jun 2020
Location: Lexington, MA
Posts: 8
3D Projection using log and exp.

Does anyone have experience using log and exp tables to speed up 3D projection by reducing the divs and muls to subs and adds?


It's basically about using these identities:


log(x/y) = log(x) - log(y)
log(x*y) = log(x) + log(y)


I'm struggling to get this working, either because I'm making mistakes or I've blown my precision.


Anyway I was wondering if anyone has experience with this they could share or if anyone could point me to some examples or documents.


Thanks
Jobbo is offline  
Old 14 July 2020, 19:24   #2
Kalms
Registered User
 
Join Date: Nov 2006
Location: Stockholm, Sweden
Posts: 226
Precision is not great, no. You end up having very large log/exp tables. I never tested it out myself, as a napkin analysis shows that it should really only be useful for spinning a large number of dots in the center of the screen. I think it would be more useful for the rotation than for the perspective projection.


Instead, consider using the identity (a * b) = ((a + b)^2 - (a - b)^2)/4 and having a single y = x^2/4 table. This can help slightly compared to using normal multiplies, and it is easier to understand where your precision is going (or disappearing). Less clock cycles under ideal conditions and arbitrary (a, b) values - but more instruction/data memory accesses in total so may come out worse if you have lots of DMA activity going on.

Then, for the perspective divide, use a table lookup to compute 1/x and use a regular multiply (as the numerical range of the input operands will require a very large table). Or, if you really want to use a table approach, convert the 1/x factor into (mantissa, exponent) format via a table lookup, use an x^2/4 table on the mantissa portion and then shift according to the exponent afterward. Converting the divide into a multiply-by-reciprocal is often a win; converting the multiply into table lookups and add/sub is only sometimes a win.
Kalms is offline  
Old 15 July 2020, 06:34   #3
Jobbo
Registered User

Jobbo's Avatar
 
Join Date: Jun 2020
Location: Lexington, MA
Posts: 8
Thanks Kalms,

I'll probably give a bunch of different appoached a try. It does seem like it's a balancing act that depends on what you are going for. I appreciate you sharing your experience.

I'm still determined to figure out the log/exp approach. Axis/Oxyron seems to have used it effectively for the starfield in Planet-Rocklobster. The code is on github but takes a little figuring out.

He states that he was able get 520 stars running at 50hz, rotating, projecting, plotting and clearing. Only adding sprites and copper activity reduced it for the demo. Based on the code and his approach that seems pretty optimal. Not sure if anyone has achieved more?
Jobbo is offline  
Old 18 July 2020, 10:13   #4
Antiriad_UK
OCS forever!

Antiriad_UK's Avatar
 
Join Date: Mar 2019
Location: Birmingham, UK
Posts: 248
I also looked at that code in Planet Rocklobster. I didn't know anything about logs/exp for mult/div so spent some time looking into that but couldn't quite work out how the table was generated for the demo or follow the code (can't even follow my own code without comments lol). I'd be interested to know how you get on.

I was doing my own starfield and I ended up doing the reciprocal method Kalms mentioned. With 2 bitplanes and a triple buffer I got around 350 stars but didn't spend major time optimizing as that was good enough for the title screen I wanted. But yeah a lot less that 500
Antiriad_UK is online now  
Old 18 July 2020, 10:35   #5
ross
Per aspera ad astra

ross's Avatar
 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 50
Posts: 2,707
I'm pretty sure that at some point I experimented with the LOG/EXP tables, but then for the 3D star field I used the classic method (out of laziness?).
In any case, I had enough stars
ross is offline  
Old 18 July 2020, 14:44   #6
Kalms
Registered User
 
Join Date: Nov 2006
Location: Stockholm, Sweden
Posts: 226
I had a look at Planet Rocklobster. It uses plain linear combinations to do the rotation, and log/exp for the division.

The code at https://github.com/AxisOxy/Planet-Ro....asm#L678-L766 is run each frame. It computes rotated direction vectors for the X, Y and Z axes respectively. Then https://github.com/AxisOxy/Planet-Ro....asm#L814-L876 is invoked a couple of times. Those routines compute scaled versions of those direction vectors, for length = 0,1, ..., 63.

Later, the pregenerated code at https://github.com/AxisOxy/Planet-Ro....asm#L333-L345 does rotations by fetching suitable pre-scaled versions of the pre-rotated X, Y and Z vectors, and adding them together.

The tradeoff being made here is, that generating 3*64 scaled direction vectors, and then adding 500 triplets of them together is quicker than rotating 500 vertices using matrix*vector multiplication. This, in turn depends on that A) generating a number of pre-scaled versions of the basis vectors can be done by stepping & adding, without needing to do multiplies (so generating 3*64 scaled direction vectors is a lot quicker than rotating 3*64 vectors), B) adding together 500 triplets of vectors is a lot quicker than rotating 500 vectors, and C) the space is dense or highly quantized (so the ratio of vertex count to scaled direction vector count is high). You get a dense space if you have a small object with lots of vertices in it; you get a highly quantized space if you enlarge the space so that each direction vector steps is many pixels on-screen - which is fine for a starfield because people can't see the quantization as well as if the vertices were connected to each other or otherwise formed a shape.

If we then look at the generated code, around https://github.com/AxisOxy/Planet-Ro...field.asm#L347 , then a2 is a log(x) table and a3/a5 are exp(x) tables with extra stuff built in (the X table probably preconverts output into bit/byte split data, the Y table probably pre-multiplies by pitch -- just guessing here).

Last edited by Kalms; 18 July 2020 at 17:50.
Kalms is offline  
Old 18 July 2020, 21:09   #7
Jobbo
Registered User

Jobbo's Avatar
 
Join Date: Jun 2020
Location: Lexington, MA
Posts: 8
Kalms' description is pretty thorough. However the devil is still in the details, it's easy to get tripped up on a number of different precision sensitive phases.

Like Kalms' said the rotation approach works best for a large number of vertices because of the fixed overhead building the axis tables. I actually did something similar by following Zappy's write up and his x86 code from many moons ago: http://www.codercorner.com/Fstrt19b.zip

It's very tricky to keep the axis table rotation results in a range that works well for the log/exp without overflowing. And so I'm impressed with how Axis handles that without the need for any relatively slow asl shifts.

Axis' seems to be using the addx trick to maintain good accuracy for the axis table lerps without the shifts.

He's also generating the rotation code so the unrotated vertices are embedded in the code as offsets to the axis tables, so no need to read any input vertices.

The log/exp tables are used in place of divides for projection using the identity: x/z=exp2(log2(x)-log2(z))

Where it gets tricky is when x is negative because you need to take log2(abs(x)) and then you need to sign correct the exp2 result.

Axis has managed to do this by arranging the table in such a way that there's no need for abs() or sign(). I can't quite see exactly how he's done it but I have something similar working in C code where I just push the log2(-x) value out further so log2(x)-log2(z) works out to a different area in the exp2 table.

Finally the Axis exp tables as Kalms' guessed are optimized so the y result is premultiplied with the pitch and also which bitplane, and the x results are split into word and bit for a single bset.

I don't 100% see exactly how he's doing it all because I can't tell exactly what the table contents are. But I've been able to figure out enough to get something similar almost working. Just a little more work to do optimizing it into asm.

For a starfield Axis' has as far as I can tell taken every optimization opportunity he could. I'm certainly impressed and it's been interesting to figure it out.

For my own purposes I might not go to the same extreme, I'd still like some flexibility so probably won't bake the vertices into the code.
Jobbo is offline  
Old 01 August 2020, 17:17   #8
Jobbo
Registered User

Jobbo's Avatar
 
Join Date: Jun 2020
Location: Lexington, MA
Posts: 8
Hi again

I finished up multiple implementations for vector rotation and projection so they could be compared. I'd be interested to hear if anyone has done the same and had different results.

All tests had 100 vectors randomly set in the range -16384...16384. Everything was running in Winuae as an A500 in cycle exact mode.

First up I tried three different approaches for vector rotation.

1. Regular old matrix 3x3 transform using muls.
Pros:
- No tables needed.
- Vector input can easily be dynamic.
Cons:
- Unsurprisingly quite slow. I got it about as fast as I could by making sure the matrix is entirely resident in registers and also by allowing the results to end up scaled down by 4x so I can just use swap with no shifts.
- Because it's using all the registers it's hard to imagine folding the projection into the same function to avoid piping results out and in.

2. Matrix 3x3 transform using log/exp table look ups.
Pros:
- Takes about 73% of the time the muls version took.
Cons:
- Needs 128k for the log/exp tables.
- Vectors need to be prepared as log x/y/z which makes it harder to handle dynamic changes to the vectors.
- Again pretty register heavy so hard to combine with projection.
- Precision is in theory a concern but honestly I found no issues.

3. Matrix 3x3 transform using axis lookup.
Pros:
- Easily the fastest given enough vectors. For my test it was 47% of the time taken for the muls version.
- Since the main transform code is simple it's easy to imagine folding the projection in here.
- Small table size.
Cons:
- Fixed overhead for building axis tables each frame. This isn't so bad if the vectors are reasonably quantized and there are plenty of them.
- Vectors need to be quantized and prepared so they index the axis table. This isn't really that bad depending on what you're trying to model. For less quantization you'll need bigger axis tables which then costs more fixed overhead.
- Vector inputs are now indices to the axis table which makes dynamic vectors slightly trick, though easier then the log version.

Next up for projection I tried two versions. Both also had some translation folded in for movement.

1. Reciprocal z table to avoid divides.
Pros:
- Nice speed up from the basic divs version.
- Simple enough to fold in with rotation if you wanted.
Cons:
- Requires a table, I used 32k.
- If you want to pre-multiply the screen pitch for y you'll need a separate y table and a separate lookup which will slow things down a little.

2. Log/exp table to avoid divides.
Pros:
- Slightly faster than the reciprocal z version, though not much in the generic case.
- Easy to see how you could fold this in with rotation.
- Easy to use separate x/y tables to pre-multiply the pitch for y.
- Also easy to see how to add screen center offset to tables.
Cons:
- Needs 112k table. In the generic case this will overlap with the log/exp table for rotation if that gets used.
- Needs more table space for separate x/y exp tables with per-multiplied centered results.
- Precision is in theory a concern but again I didn't find that.

Ultimately, which version to use will depend on the effect you're going for, which is something I've not really decided yet.

But looking at all the possibilities was lots of fun.

I'd love to hear if anyone has more suggested approaches or clever tricks they've used.

Last edited by Jobbo; 01 August 2020 at 17:23.
Jobbo is offline  
Old 01 August 2020, 18:03   #9
a/b
Registered User

 
Join Date: Jun 2016
Location: europe
Posts: 186
If you go the rotation matrix route, full 3-axis rotation is doable with some math magix, only 6 muls and 3 extra adds (which are 3 precalced muls), if the object doesn't change. Otherwise you have to precalc the muls again, or just don't use this method if it changes too often.
a/b is offline  
Old 01 August 2020, 19:55   #10
Jobbo
Registered User

Jobbo's Avatar
 
Join Date: Jun 2020
Location: Lexington, MA
Posts: 8
Quote:
Originally Posted by a/b View Post
If you go the rotation matrix route, full 3-axis rotation is doable with some math magix, only 6 muls and 3 extra adds (which are 3 precalced muls), if the object doesn't change. Otherwise you have to precalc the muls again, or just don't use this method if it changes too often.

That is a little too cryptic for me to figure out. Can you explain further what you mean?
Jobbo is offline  
Old 01 August 2020, 21:06   #11
a/b
Registered User

 
Join Date: Jun 2016
Location: europe
Posts: 186
Basic idea is that these two are identical: y=x and y=x+A-A.
Now if x is a bit more complex, like (sin(atan(x^3)*pi)/pentium_fdiv_is_best)*e^acos(sqrt(x)*0.42), you are in heaven. But.... if you pick a specially crafted constant A, you have a chance to simplify it down to something like y=x*69-A.
Same with 3-mul rotation, and how to do it is to start with these generally known and useful identities:
(a+-b)*(a+-b) = a^2+-2ab+b^2 (or a^2+b^2 = (a+-b)*(a-+b)+-2ab)
(a+-b)*(c+-d) = ac+-ad+-bc+-db (or +-ad+-bc = (a+-b)*(c+-d)+-ac+-bd)
So if rot_x = ax+by+cz, you can replace ax+by with one of the above and see where that brings you. Say ac becomes ab (mul of rot matrix coefficients) and bd becomes xy (precalced x*y coords mul).
Not trying to antagonize you :P, I don't have the exact formulas/code (my newer code is using either tables, non-carthesian coords or other weird stuff, and for non-criticals and prototyping I just use 9-mul version for its simplicity).
a/b is offline  
Old 02 August 2020, 17:51   #12
morbid
Registered User

 
Join Date: Aug 2020
Location: Huddinge
Posts: 2
You can find formulas here:
https://mikro.naprvyraz.sk/docs/Coding/1/3D-ROTAT.TXT
morbid is offline  
Old 02 August 2020, 18:20   #13
ross
Per aspera ad astra

ross's Avatar
 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 50
Posts: 2,707
Quote:
Originally Posted by morbid View Post
Thanks, concise and clear compendium.
ross is offline  
 


Currently Active Users Viewing This Thread: 1 (0 members and 1 guests)
 
Thread Tools

Similar Threads
Thread Thread Starter Forum Replies Last Post
WHDLOAD: Kickstart slave and exp mem Asman Coders. General 5 31 May 2010 10:28
Amiga 600 exp memory lukezab support.Hardware 4 30 December 2008 17:45
Laser Squad and Lords of Chaos (+exp) costabunny request.Old Rare Games 16 09 November 2008 15:22
MTek exp. board failed or... kixs support.Hardware 9 21 May 2007 15:38
A1200 bits (4/8mb ram exp, 3.5" HD upgrade kit, Mobo) Smiley MarketPlace 5 23 May 2006 17:06

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off

Forum Jump


All times are GMT +2. The time now is 21:45.


Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2020, vBulletin Solutions Inc.
Page generated in 0.09245 seconds with 14 queries