Old 06 January 2018, 12:14   #6
chb
Registered User
 
Join Date: Dec 2014
Location: germany
Posts: 439
Just an idea: if you render from the middle towards the upper and lower ends, you can exploit some symmetry. If you put texel i above the center, you'll always put texel h-i-1 below it (h = texture height). If you store your texture rotated by 90 degrees and scrambled like 0,h-1,1,h-2,2,h-3,..., you can read and write a word holding two pixels instead of a byte holding one, thereby doubling the speed. You'd need two additional blitter passes to separate/unscramble the two bytes in the output (or maybe you could even bake that into the c2p?), but you could still come out faster.
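
To make the scrambled layout concrete, here's a rough sketch in C (not actual renderer code, just the index math). The names scramble_column and read_pair are made up for illustration, and I'm assuming an even texture height and one byte per texel. After scrambling, pair k holds texel k and its mirror h-1-k, so the pair at k = h/2-1 is the one straddling the center.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: reorder one texture column of height h (assumed even)
   into the order 0, h-1, 1, h-2, ... so that pair k holds texels k and h-1-k. */
static void scramble_column(const uint8_t *src, uint8_t *dst, size_t h)
{
    assert(h % 2 == 0);
    for (size_t k = 0; k < h / 2; k++) {
        dst[2 * k]     = src[k];         /* texel drawn above the center */
        dst[2 * k + 1] = src[h - 1 - k]; /* its mirror, drawn below the center */
    }
}

/* Fetch pair k as one 16-bit value, first byte in the high half
   (the same order a big-endian 68k word read would give). */
static uint16_t read_pair(const uint8_t *scrambled, size_t k)
{
    return (uint16_t)((scrambled[2 * k] << 8) | scrambled[2 * k + 1]);
}
```

On the real machine the read_pair step would just be a single word move from the scrambled texture. The unscramble of the output (blitter or c2p) is not shown here.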

EDIT: Here's a nice example where a byte-swap instruction is missing...