I hope we can have something cleaner than that. For example, we already have wow_handlers16 for user32.dll -> user.exe calls. We could just have win16 kernel callbacks implemented in user32 and use wow_handlers16 to forward calls to user.exe.
Display drivers are trickier. Ideally, they would not need a PE side at all, and then they would not need kernel callbacks either. That is quite a lot of work and requires a number of changes, but it would be the end goal. Maybe we could use something different in the meantime.
Vulkan will still need kernel callbacks, but we could make the mechanism somewhat more generic. For example, we could have a single generic kernel callback in user32 and pass the address of the actual callback at the beginning of the args struct. user32 would then just read that address and call it. (Maybe the same mechanism could then be used by display drivers during the transition period, although that may not be very practical in some cases.)
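
To make the generic callback idea concrete, here is a rough sketch in plain C. All names (callback_header, dispatch_generic_callback, add_params and so on) are made up for illustration and are not existing Wine types or functions. It also assumes that the caller already knows the PE-side address of the target function, and it leaves out the real kernel callback plumbing entirely.

```c
#include <stddef.h>

/* Hypothetical names throughout; none of these are actual Wine types. */

/* Signature shared by every callback reachable through the generic entry. */
typedef int (*generic_callback_func)(void *args, size_t size);

/* Every args struct starts with this header, so the dispatcher can find
 * the real target without knowing anything else about the payload. */
struct callback_header
{
    generic_callback_func func;
};

/* Example payload for one specific callback. */
struct add_params
{
    struct callback_header header; /* must be the first member */
    int a;
    int b;
    int result;
};

/* The actual callback, which would live in the PE module that needs it. */
static int add_callback(void *args, size_t size)
{
    struct add_params *params = args;

    if (size < sizeof(*params)) return -1;
    params->result = params->a + params->b;
    return 0;
}

/* The single generic entry point that would be registered once in user32.
 * It only reads the address from the start of the args and calls it. */
static int dispatch_generic_callback(void *args, size_t size)
{
    struct callback_header *header = args;

    if (!args || size < sizeof(*header) || !header->func) return -1;
    return header->func(args, size);
}

/* Stand-in for the side that issues the callback: it fills in the header
 * and goes through the generic entry instead of a dedicated table slot. */
static int call_add(int a, int b, int *result)
{
    struct add_params params = { { add_callback }, a, b, 0 };
    int status = dispatch_generic_callback(&params, sizeof(params));

    *result = params.result;
    return status;
}
```

With something like this, adding a new callback would not require a new slot in a shared table; the cost is that the generic entry calls whatever address it is handed, so the caller has to be trusted to pass a valid one.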