Hi, thanks for the feedback. I agree with most of what you said; just a few comments:
I think the main thing is that you really don't want branches inside the inner loop. You can handle the cases where a component is unused by building a mask before entering the loop and just or'ing with that mask inside the loop.
Good point. Not sure whether that eliminates the need for both the 'if(DestFormat.bits[x])' AND 'if(SrcFormat.bits[x])', but I can merge them into one 'if' at least.
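To make the mask idea concrete, here is a minimal sketch of how I understand it. It is not the actual conversion code: the function name convert_row, the src_has_alpha flag and the fixed 32-bit layout with alpha in the top byte are all assumptions made up for illustration. The decision about the unused component is taken once before the loop, and the loop body only does an AND and an OR.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the real code): copy 32-bit pixels and
 * force alpha to opaque when the source format has no alpha component.
 * Assumes alpha occupies the top 8 bits of each pixel. */
static void convert_row(const uint32_t *src, uint32_t *dst, size_t count,
                        int src_has_alpha)
{
    const uint32_t alpha_mask = 0xff000000u;

    /* Both masks are chosen once, outside the inner loop. */
    uint32_t keep_mask = src_has_alpha ? 0xffffffffu : ~alpha_mask;
    uint32_t fill_mask = src_has_alpha ? 0u : alpha_mask;

    for (size_t i = 0; i < count; i++)
        dst[i] = (src[i] & keep_mask) | fill_mask; /* no per-pixel branch */
}
```

With src_has_alpha set to 0, whatever sits in the top byte of the source is discarded and replaced by 0xff; with it set to 1, the pixel passes through unchanged.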
As for the shifts, in the large majority of cases you will only need two shifts. The only time you need more than that is when the destination component is more than two times the size of the source component, so I don't think you want to bother with that in the common function.
I don't think the for loop adds much overhead, since even the cases that need only two shifts still require at least one if(SrcFormat.bits < DestFormat.bits) check. The code inside the for loop just adds one assignment and one additional if. Perhaps later on this could be split into a separate function, but I don't think it's worth adding another hundred lines of code for this small optimization :/
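For reference, this is roughly how the two variants compare. It is only a sketch under my own assumptions: expand_two_shifts and expand_loop are hypothetical names, the component value is assumed to be already shifted down to bit 0, and src_bits is assumed to be between 1 and 31. The first function is the two-shift case, valid only while the destination component is at most twice the source width; the second replicates the source bits in a loop and so also covers wider destinations.

```c
#include <stdint.h>

/* Two-shift expansion. Only valid when
 * src_bits <= dst_bits <= 2 * src_bits (hypothetical helper). */
static uint32_t expand_two_shifts(uint32_t v, unsigned src_bits,
                                  unsigned dst_bits)
{
    return (v << (dst_bits - src_bits)) | (v >> (2 * src_bits - dst_bits));
}

/* Loop-based expansion: keeps replicating the source bits downwards
 * until the destination width is filled. Requires src_bits > 0.
 * If dst_bits < src_bits it simply truncates the low bits. */
static uint32_t expand_loop(uint32_t v, unsigned src_bits, unsigned dst_bits)
{
    uint32_t out = 0;
    int shift = (int)dst_bits - (int)src_bits;

    while (shift > 0) {
        out |= v << shift;       /* one assignment per iteration */
        shift -= (int)src_bits;
    }
    out |= v >> (-shift);        /* final, possibly partial, copy */
    return out;
}
```

For a 5-bit to 8-bit conversion both functions do the same two shifts, so the loop version only costs the extra comparison; for something like 2 bits to 8 bits only the loop version gives a fully replicated result.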
Apart from that, I'll change my code according to your suggestions. Best regards, Tony