For example, we could have a single generic kernel callback in user32 and pass the actual callback's address at the beginning of the args struct. user32 would then just read that address and call it.
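A minimal sketch of that idea, to make the shape concrete. All names here (`callback_header`, `example_params`, `generic_callback_dispatch`, `run_example`) are hypothetical and not taken from the existing code, and the dispatcher is an ordinary function standing in for the single generic callback entry; the real version would sit behind the kernel callback mechanism and probably needs more validation than this.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical signature shared by all user-side callbacks. */
typedef int (*user_callback_t)(void *args, size_t size);

/* Hypothetical header that every args struct would start with. */
struct callback_header
{
    user_callback_t func;   /* the actual callback to run on the user side */
};

/* Example args struct: the header comes first, then the real parameters. */
struct example_params
{
    struct callback_header header;
    int value;
};

/* Example user-side callback; it just doubles the value. */
static int example_callback(void *args, size_t size)
{
    struct example_params *params = args;

    if (size < sizeof(*params)) return -1;
    return params->value * 2;
}

/* Stand-in for the single generic callback: read the function pointer
 * from the start of the args struct and call it with the same args. */
static int generic_callback_dispatch(void *args, size_t size)
{
    struct callback_header *header = args;

    if (size < sizeof(*header) || !header->func) return -1;
    return header->func(args, size);
}

/* What the calling side would do: fill in the header, then dispatch. */
int run_example(int value)
{
    struct example_params params = { { example_callback }, value };

    /* The dispatcher relies on the header being the first member. */
    assert(offsetof(struct example_params, header) == 0);
    return generic_callback_dispatch(&params, sizeof(params));
}
```

With this layout, adding a new callback would only mean defining a new args struct that starts with the header; the dispatcher itself would not need a per-callback table entry.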
That was my first thought, but since it would again give someone a way to run arbitrary code from a kernel callback, I implemented this instead, hoping to limit which functions can be called. There are plenty of other ways to run arbitrary user code from "kernel" code, though, so that was kind of silly. I'll implement it the way you suggest instead.