I wonder if WinUAE could be patched to help profile Amiga executables? I imagine a MAP file could be produced during C/ASM compilation, and the emulator could be modified to dump the addresses it executes (from the program counter or the instruction cache). Those addresses could then be traced back to a particular function, giving the percentage of time spent in each one. I think Hatari has something similar.
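To make the idea a bit more concrete, here is a rough sketch of the post-processing side only, assuming the emulator could already dump a list of sampled program counter values. The symbol list format, the `profile` function and the example addresses are all made up for illustration; this is not an existing WinUAE or Hatari feature or file format.

```python
from bisect import bisect_right
from collections import Counter


def profile(symbols, samples):
    """Return {function name: percentage of samples}.

    symbols: list of (start_address, name) pairs, e.g. parsed from a MAP file
             (hypothetical format, real MAP files differ per toolchain).
    samples: iterable of program counter values dumped by the emulator.
    """
    table = sorted(symbols)
    starts = [addr for addr, _ in table]
    hits = Counter()
    for pc in samples:
        # Find the last symbol whose start address is <= pc.
        i = bisect_right(starts, pc) - 1
        name = table[i][1] if i >= 0 else "<unknown>"
        hits[name] += 1
    total = sum(hits.values())
    if total == 0:
        return {}
    return {name: 100.0 * count / total for name, count in hits.items()}


if __name__ == "__main__":
    # Made-up symbols and samples, purely for illustration.
    symbols = [(0x1000, "_main"), (0x1200, "_draw_sprites"), (0x1800, "_mix_audio")]
    samples = [0x1004, 0x1210, 0x1300, 0x17FE, 0x1900, 0x1234, 0x1802, 0x0FF0]
    for name, pct in sorted(profile(symbols, samples).items(), key=lambda kv: -kv[1]):
        print(f"{name:16s} {pct:5.1f}%")
```

Two caveats with this sketch: sample counts only approximate time if the PC is sampled at regular intervals (otherwise each sample would need weighting by cycles), and since only start addresses are known, anything past the last symbol gets counted against the last function.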
Dunno if that would help you programmers with your optimisations? I remember drawing a pixel bar with all the spare CPU time, and however far down the screen the bar reached was how much free CPU time you had per VBL.