My thinking is that the process that owns the wl_surface (the desktop process in the design described above) is the one that receives all input and must handle all output for it, and all of this involves some per-process state. So either the desktop process handles all that state, making the processes that own the HWNDs much leaner, or we somehow forward that state to the HWND process to handle (BTW, this seems quite painful; I would not recommend it).
In any case, Wine sometimes has to dispatch input to a different window than the one that received it from the host's point of view. We use the host window mostly as a hint, and I think we try not to send input to another window unless necessary, but in the end it doesn't make much difference which window really received it, and the receiver (if that's a dedicated process) could simply hint wineserver about the target HWND.
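To make the "host window as a hint" idea a bit more concrete, here is a rough sketch in C. It is not Wine's actual code: `hwnd_t`, `struct window`, `struct input_event` and `resolve_target` are hypothetical names made up for illustration, and the rule that an active capture overrides the hint is an assumption on my part.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration; not Wine's actual structures. */
typedef unsigned int hwnd_t;   /* stand-in for a window handle, 0 = none */

struct window
{
    hwnd_t handle;
    int left, top, right, bottom;   /* screen rectangle, right/bottom exclusive */
};

struct input_event
{
    hwnd_t hint;   /* window the host delivered the event to */
    int x, y;      /* pointer position in screen coordinates */
};

static int contains( const struct window *win, int x, int y )
{
    return x >= win->left && x < win->right && y >= win->top && y < win->bottom;
}

/* Pick the window that should receive the event.
 * windows[] is assumed to be ordered topmost first. */
static hwnd_t resolve_target( const struct window *windows, size_t count,
                              const struct input_event *event, hwnd_t capture )
{
    size_t i;

    assert( event != NULL );

    /* Assumption: an active capture overrides whatever the host reported. */
    if (capture) return capture;

    /* Prefer the hinted window as long as the event is consistent with it. */
    for (i = 0; i < count; i++)
        if (windows[i].handle == event->hint && contains( &windows[i], event->x, event->y ))
            return event->hint;

    /* Otherwise fall back to a hit test, topmost window first. */
    for (i = 0; i < count; i++)
        if (contains( &windows[i], event->x, event->y ))
            return windows[i].handle;

    return 0;
}
```

The point of the sketch is only that the receiving process does not need to be the final target: it can pass along the hinted window and let the dispatch logic keep it or pick another one when the hint does not fit.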
For DirectComposition, would the idea be that the host system handles the composition completely (through host child surfaces), or do you expect that Wine would also grow a full-fledged compositor (performing full input handling at the "root" level, compositing to a final surface that is then handed to the host, etc.)?
I don't really know what it would involve. I hope that we can always keep a fast path where a process can render and present its client surface directly.