"subjective quality" is a great term for that, since i suppose you can't describe it in scientific terms?

It can be seen on any common video upscaler, and it can be described by math too: pre- and post-ringing are errors that can be described.
It is similar in audio: sinc is optimal only in the frequency domain, as an approximation of the brick-wall filter with some limited (windowed) number of taps.
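
To make the ringing claim concrete, here is a minimal sketch that upsamples a step (which has no ringing in the source data) with a Hann-windowed sinc. The function names, the 4x factor, the 4 taps per side and the Hann window are all assumptions picked for illustration; this is not the interpolator from any particular tracker or Paula emulator.

```python
import math

def windowed_sinc_kernel(taps_per_side, factor):
    """Hann-windowed sinc kernel for upsampling by an integer factor (illustrative)."""
    half = taps_per_side * factor
    kernel = []
    for n in range(-half, half + 1):
        x = n / factor  # distance in input-sample units
        s = 1.0 if n == 0 else math.sin(math.pi * x) / (math.pi * x)
        w = 0.5 * (1.0 + math.cos(math.pi * n / half))  # Hann window, zero at the edges
        kernel.append(s * w)
    return kernel

def upsample(samples, factor, kernel):
    """Zero-stuff by `factor` and convolve with `kernel` (samples outside the input are 0)."""
    half = len(kernel) // 2
    out = []
    for m in range(len(samples) * factor):
        acc = 0.0
        for k, h in enumerate(kernel):
            idx = m - (k - half)  # position in the zero-stuffed signal
            if idx % factor == 0:
                i = idx // factor
                if 0 <= i < len(samples):
                    acc += samples[i] * h
        out.append(acc)
    return out

# A clean step: every source value is exactly 0.0 or 1.0.
step = [0.0] * 16 + [1.0] * 16
kernel = windowed_sinc_kernel(taps_per_side=4, factor=4)
out = upsample(step, 4, kernel)

# Undershoot before the edge is the pre-ringing, overshoot after it is the post-ringing.
print("min:", min(out), "max:", max(out))
```

The original samples pass through unchanged, but the interpolated points dip below 0 just before the edge and rise above 1 just after it. Whether that is audible is exactly what is being argued here; the sketch only shows that the error is measurable.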
actually, in terms of quality sinc is ideal. the ringing can be minimized as in my implementation in bubsy's protracker clone.

Optimal, but only in the frequency domain. FFT is not everything...
try this: close your eyes and listen to the audio, can you hear ringing, or can you hear exactly what comes out of a paula? have you attempted to record the output of your paula using your pc's soundcard, and did you notice any ringing then?
now we've completed our postulate and can move forward to the experimental stage:

Why not use a decent filter with flat phase and magnitude response? These are only coefficients in software...
All the proposed methods are OK, but they also go to prove that sinc can be replaced by a different type of interpolation, and that the number of taps (or rather the complexity of the interpolator) can be similar or lower, without the pre- and post-ringing which does not exist in the source data.
Interpolation is like guessing: maybe this sample is 0.1, or maybe 0.0975 (but it can be 0.97 too)... There is no single optimal way to solve such a problem, and it is a personal, subjective choice. As long as you filter out aliasing and fulfil the Nyquist criterion you are OK, but how that is done is a different story.
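
As a rough illustration of the point about simpler interpolators, here is a sketch of plain 2-tap linear interpolation on a step input. The function name and the 4x factor are assumptions chosen for the example. Linear interpolation is only one of many possible "guesses", and it pays for the lack of ringing with a weaker frequency response than a long windowed sinc.

```python
def linear_upsample(samples, factor):
    """Upsample by an integer factor using 2-tap linear interpolation (illustrative)."""
    out = []
    for i in range(len(samples) - 1):
        a, b = samples[i], samples[i + 1]
        for j in range(factor):
            t = j / factor  # fractional position between the two neighbours
            out.append(a + (b - a) * t)
    out.append(samples[-1])  # keep the final source sample
    return out

# A clean step: every source value is exactly 0.0 or 1.0.
step = [0.0] * 16 + [1.0] * 16
y = linear_upsample(step, 4)

# Each output is a weighted average of two neighbours, so it cannot leave their range.
print("min:", min(y), "max:", max(y))
```

The output never goes below 0 or above 1, so nothing appears around the edge that was not in the source data. How much aliasing and high-frequency loss you accept in exchange is the subjective part.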