14 July 2020, 03:19  #1 
Registered User
Join Date: Jun 2020
Location: Lexington, MA
Posts: 8

3D Projection using log and exp.
Does anyone have experience using log and exp tables to speed up 3D projection by reducing the divs and muls to subs and adds?
It's basically about using these identities: log(x/y) = log(x)  log(y) log(x*y) = log(x) + log(y) I'm struggling to get this working, either because I'm making mistakes or I've blown my precision. Anyway I was wondering if anyone has experience with this they could share or if anyone could point me to some examples or documents. Thanks 
14 July 2020, 19:24  #2 
Registered User
Join Date: Nov 2006
Location: Stockholm, Sweden
Posts: 226

Precision is not great, no. You end up having very large log/exp tables. I never tested it out myself, as a napkin analysis shows that it should really only be useful for spinning a large number of dots in the center of the screen. I think it would be more useful for the rotation than for the perspective projection.
Instead, consider using the identity (a * b) = ((a + b)^2  (a  b)^2)/4 and having a single y = x^2/4 table. This can help slightly compared to using normal multiplies, and it is easier to understand where your precision is going (or disappearing). Less clock cycles under ideal conditions and arbitrary (a, b) values  but more instruction/data memory accesses in total so may come out worse if you have lots of DMA activity going on. Then, for the perspective divide, use a table lookup to compute 1/x and use a regular multiply (as the numerical range of the input operands will require a very large table). Or, if you really want to use a table approach, convert the 1/x factor into (mantissa, exponent) format via a table lookup, use an x^2/4 table on the mantissa portion and then shift according to the exponent afterward. Converting the divide into a multiplybyreciprocal is often a win; converting the multiply into table lookups and add/sub is only sometimes a win. 
15 July 2020, 06:34  #3 
Registered User
Join Date: Jun 2020
Location: Lexington, MA
Posts: 8

Thanks Kalms,
I'll probably give a bunch of different appoached a try. It does seem like it's a balancing act that depends on what you are going for. I appreciate you sharing your experience. I'm still determined to figure out the log/exp approach. Axis/Oxyron seems to have used it effectively for the starfield in PlanetRocklobster. The code is on github but takes a little figuring out. He states that he was able get 520 stars running at 50hz, rotating, projecting, plotting and clearing. Only adding sprites and copper activity reduced it for the demo. Based on the code and his approach that seems pretty optimal. Not sure if anyone has achieved more? 
18 July 2020, 10:13  #4 
OCS forever!
Join Date: Mar 2019
Location: Birmingham, UK
Posts: 248

I also looked at that code in Planet Rocklobster. I didn't know anything about logs/exp for mult/div so spent some time looking into that but couldn't quite work out how the table was generated for the demo or follow the code (can't even follow my own code without comments lol). I'd be interested to know how you get on.
I was doing my own starfield and I ended up doing the reciprocal method Kalms mentioned. With 2 bitplanes and a triple buffer I got around 350 stars but didn't spend major time optimizing as that was good enough for the title screen I wanted. But yeah a lot less that 500 
18 July 2020, 10:35  #5 
Per aspera ad astra
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 50
Posts: 2,707

I'm pretty sure that at some point I experimented with the LOG/EXP tables, but then for the 3D star field I used the classic method (out of laziness?).
In any case, I had enough stars 
18 July 2020, 14:44  #6 
Registered User
Join Date: Nov 2006
Location: Stockholm, Sweden
Posts: 226

I had a look at Planet Rocklobster. It uses plain linear combinations to do the rotation, and log/exp for the division.
The code at https://github.com/AxisOxy/PlanetRo....asm#L678L766 is run each frame. It computes rotated direction vectors for the X, Y and Z axes respectively. Then https://github.com/AxisOxy/PlanetRo....asm#L814L876 is invoked a couple of times. Those routines compute scaled versions of those direction vectors, for length = 0,1, ..., 63. Later, the pregenerated code at https://github.com/AxisOxy/PlanetRo....asm#L333L345 does rotations by fetching suitable prescaled versions of the prerotated X, Y and Z vectors, and adding them together. The tradeoff being made here is, that generating 3*64 scaled direction vectors, and then adding 500 triplets of them together is quicker than rotating 500 vertices using matrix*vector multiplication. This, in turn depends on that A) generating a number of prescaled versions of the basis vectors can be done by stepping & adding, without needing to do multiplies (so generating 3*64 scaled direction vectors is a lot quicker than rotating 3*64 vectors), B) adding together 500 triplets of vectors is a lot quicker than rotating 500 vectors, and C) the space is dense or highly quantized (so the ratio of vertex count to scaled direction vector count is high). You get a dense space if you have a small object with lots of vertices in it; you get a highly quantized space if you enlarge the space so that each direction vector steps is many pixels onscreen  which is fine for a starfield because people can't see the quantization as well as if the vertices were connected to each other or otherwise formed a shape. If we then look at the generated code, around https://github.com/AxisOxy/PlanetRo...field.asm#L347 , then a2 is a log(x) table and a3/a5 are exp(x) tables with extra stuff built in (the X table probably preconverts output into bit/byte split data, the Y table probably premultiplies by pitch  just guessing here). Last edited by Kalms; 18 July 2020 at 17:50. 
18 July 2020, 21:09  #7 
Registered User
Join Date: Jun 2020
Location: Lexington, MA
Posts: 8

Kalms' description is pretty thorough. However the devil is still in the details, it's easy to get tripped up on a number of different precision sensitive phases.
Like Kalms' said the rotation approach works best for a large number of vertices because of the fixed overhead building the axis tables. I actually did something similar by following Zappy's write up and his x86 code from many moons ago: http://www.codercorner.com/Fstrt19b.zip It's very tricky to keep the axis table rotation results in a range that works well for the log/exp without overflowing. And so I'm impressed with how Axis handles that without the need for any relatively slow asl shifts. Axis' seems to be using the addx trick to maintain good accuracy for the axis table lerps without the shifts. He's also generating the rotation code so the unrotated vertices are embedded in the code as offsets to the axis tables, so no need to read any input vertices. The log/exp tables are used in place of divides for projection using the identity: x/z=exp2(log2(x)log2(z)) Where it gets tricky is when x is negative because you need to take log2(abs(x)) and then you need to sign correct the exp2 result. Axis has managed to do this by arranging the table in such a way that there's no need for abs() or sign(). I can't quite see exactly how he's done it but I have something similar working in C code where I just push the log2(x) value out further so log2(x)log2(z) works out to a different area in the exp2 table. Finally the Axis exp tables as Kalms' guessed are optimized so the y result is premultiplied with the pitch and also which bitplane, and the x results are split into word and bit for a single bset. I don't 100% see exactly how he's doing it all because I can't tell exactly what the table contents are. But I've been able to figure out enough to get something similar almost working. Just a little more work to do optimizing it into asm. For a starfield Axis' has as far as I can tell taken every optimization opportunity he could. I'm certainly impressed and it's been interesting to figure it out. For my own purposes I might not go to the same extreme, I'd still like some flexibility so probably won't bake the vertices into the code. 
01 August 2020, 17:17  #8 
Registered User
Join Date: Jun 2020
Location: Lexington, MA
Posts: 8

Hi again
I finished up multiple implementations for vector rotation and projection so they could be compared. I'd be interested to hear if anyone has done the same and had different results. All tests had 100 vectors randomly set in the range 16384...16384. Everything was running in Winuae as an A500 in cycle exact mode. First up I tried three different approaches for vector rotation. 1. Regular old matrix 3x3 transform using muls. Pros:  No tables needed.  Vector input can easily be dynamic. Cons:  Unsurprisingly quite slow. I got it about as fast as I could by making sure the matrix is entirely resident in registers and also by allowing the results to end up scaled down by 4x so I can just use swap with no shifts.  Because it's using all the registers it's hard to imagine folding the projection into the same function to avoid piping results out and in. 2. Matrix 3x3 transform using log/exp table look ups. Pros:  Takes about 73% of the time the muls version took. Cons:  Needs 128k for the log/exp tables.  Vectors need to be prepared as log x/y/z which makes it harder to handle dynamic changes to the vectors.  Again pretty register heavy so hard to combine with projection.  Precision is in theory a concern but honestly I found no issues. 3. Matrix 3x3 transform using axis lookup. Pros:  Easily the fastest given enough vectors. For my test it was 47% of the time taken for the muls version.  Since the main transform code is simple it's easy to imagine folding the projection in here.  Small table size. Cons:  Fixed overhead for building axis tables each frame. This isn't so bad if the vectors are reasonably quantized and there are plenty of them.  Vectors need to be quantized and prepared so they index the axis table. This isn't really that bad depending on what you're trying to model. For less quantization you'll need bigger axis tables which then costs more fixed overhead.  Vector inputs are now indices to the axis table which makes dynamic vectors slightly trick, though easier then the log version. Next up for projection I tried two versions. Both also had some translation folded in for movement. 1. Reciprocal z table to avoid divides. Pros:  Nice speed up from the basic divs version.  Simple enough to fold in with rotation if you wanted. Cons:  Requires a table, I used 32k.  If you want to premultiply the screen pitch for y you'll need a separate y table and a separate lookup which will slow things down a little. 2. Log/exp table to avoid divides. Pros:  Slightly faster than the reciprocal z version, though not much in the generic case.  Easy to see how you could fold this in with rotation.  Easy to use separate x/y tables to premultiply the pitch for y.  Also easy to see how to add screen center offset to tables. Cons:  Needs 112k table. In the generic case this will overlap with the log/exp table for rotation if that gets used.  Needs more table space for separate x/y exp tables with permultiplied centered results.  Precision is in theory a concern but again I didn't find that. Ultimately, which version to use will depend on the effect you're going for, which is something I've not really decided yet. But looking at all the possibilities was lots of fun. I'd love to hear if anyone has more suggested approaches or clever tricks they've used. Last edited by Jobbo; 01 August 2020 at 17:23. 
01 August 2020, 18:03  #9 
Registered User
Join Date: Jun 2016
Location: europe
Posts: 186

If you go the rotation matrix route, full 3axis rotation is doable with some math magix, only 6 muls and 3 extra adds (which are 3 precalced muls), if the object doesn't change. Otherwise you have to precalc the muls again, or just don't use this method if it changes too often.

01 August 2020, 19:55  #10  
Registered User
Join Date: Jun 2020
Location: Lexington, MA
Posts: 8

Quote:
That is a little too cryptic for me to figure out. Can you explain further what you mean? 

01 August 2020, 21:06  #11 
Registered User
Join Date: Jun 2016
Location: europe
Posts: 186

Basic idea is that these two are identical: y=x and y=x+AA.
Now if x is a bit more complex, like (sin(atan(x^3)*pi)/pentium_fdiv_is_best)*e^acos(sqrt(x)*0.42), you are in heaven. But.... if you pick a specially crafted constant A, you have a chance to simplify it down to something like y=x*69A. Same with 3mul rotation, and how to do it is to start with these generally known and useful identities: (a+b)*(a+b) = a^2+2ab+b^2 (or a^2+b^2 = (a+b)*(a+b)+2ab) (a+b)*(c+d) = ac+ad+bc+db (or +ad+bc = (a+b)*(c+d)+ac+bd) So if rot_x = ax+by+cz, you can replace ax+by with one of the above and see where that brings you. Say ac becomes ab (mul of rot matrix coefficients) and bd becomes xy (precalced x*y coords mul). Not trying to antagonize you :P, I don't have the exact formulas/code (my newer code is using either tables, noncarthesian coords or other weird stuff, and for noncriticals and prototyping I just use 9mul version for its simplicity). 
02 August 2020, 17:51  #12 
Registered User
Join Date: Aug 2020
Location: Huddinge
Posts: 2

You can find formulas here:
https://mikro.naprvyraz.sk/docs/Coding/1/3DROTAT.TXT 
02 August 2020, 18:20  #13  
Per aspera ad astra
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 50
Posts: 2,707

Quote:


Currently Active Users Viewing This Thread: 1 (0 members and 1 guests)  
Thread Tools  
Similar Threads  
Thread  Thread Starter  Forum  Replies  Last Post 
WHDLOAD: Kickstart slave and exp mem  Asman  Coders. General  5  31 May 2010 10:28 
Amiga 600 exp memory  lukezab  support.Hardware  4  30 December 2008 17:45 
Laser Squad and Lords of Chaos (+exp)  costabunny  request.Old Rare Games  16  09 November 2008 15:22 
MTek exp. board failed or...  kixs  support.Hardware  9  21 May 2007 15:38 
A1200 bits (4/8mb ram exp, 3.5" HD upgrade kit, Mobo)  Smiley  MarketPlace  5  23 May 2006 17:06 

