Not sure if this helps, but you can change the logic of the table lookup approach to 'setup code, move.l from table, neg.w table offset, sub.l from table'. If you are doing matrix operations you can re-use some of the setup code across many multiplies. If you are transforming vertices you can also pre-shift your input data. If precision is not very important to you then you can do the transform in logarithmic space and thereby get down to 1 table lookup per multiply.
|