I believe this performs as well as, if not better than, direct calls did previously. It is possible to make things even better with tail calls on the PE side, but that is going to be a little harder, so I'll make another MR for it later.
FWIW, apart from the vkoverhead benchmark, I have yet to see a real-world scenario where it makes a difference, though I think this is straightforward enough.
CC @mbriar