Sorry for the delay, it's hard for me to have a concrete opinion. The approach is very different from what I imagined. Those win32u internals and the child list shuffling feel specific to this exact winex11 implementation, and it's not clear to me that we need a driver-specific thing like that in the first place.
With things like the compositor on the horizon, I expect that we will want to submit child windows to the compositor rather than to the graphics driver. I guess that means we will need to be able to create some form of virtualized swapchain where we have full control over the presentation, sort of like you did for the GDI fallback. Something like `VK_EXT_headless_surface` seems nice for it, but we have enough control over Vulkan overall to do things like that without it.
For the actual host WSI surface, we could require just one, for the top-level window. If we had it detached from the client surfaces (like above), we could just create, modify, recreate and reattach it to surfaces as it suits us. For the presentation, we'd pass through images whenever possible. If it's not possible to pass through, we fall back to the GDI path. Ideally, that's where we'd plug into the compositor to make it efficient, but I think it would be interesting to try something like that even without the compositor.
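To make the split a bit more concrete, here is a rough sketch of what I mean by detaching the host surface from the client surfaces. Every name in it (`host_surface`, `client_surface`, `choose_present_path`, and so on) is made up for illustration and doesn't exist in win32u or winex11, and `can_pass_through` is a placeholder for whatever check we would actually need. I haven't tried any of this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical types for illustration; none of these exist in win32u or winex11. */

/* The single real WSI surface, owned by the top-level window. */
struct host_surface
{
    unsigned int generation; /* bumped each time the host surface is recreated */
};

/* What the application sees as its surface; its lifetime is independent of the host surface. */
struct client_surface
{
    struct host_surface *host; /* NULL while detached */
    bool can_pass_through;     /* placeholder, assumed to be computed elsewhere */
};

enum present_path
{
    PRESENT_PASS_THROUGH, /* hand the client image to the host swapchain */
    PRESENT_GDI_FALLBACK, /* copy through the GDI path instead */
};

static void client_surface_attach( struct client_surface *client, struct host_surface *host )
{
    assert( client && host );
    client->host = host;
}

static void client_surface_detach( struct client_surface *client )
{
    assert( client );
    client->host = NULL;
}

/* Recreate the host surface in place; attached clients keep their pointer. */
static void host_surface_recreate( struct host_surface *host )
{
    assert( host );
    host->generation++;
}

/* Pass through only when attached and allowed, otherwise use the GDI fallback. */
static enum present_path choose_present_path( const struct client_surface *client )
{
    if (client->host && client->can_pass_through) return PRESENT_PASS_THROUGH;
    return PRESENT_GDI_FALLBACK;
}
```

The point is only that the client surface never owns the host one, so the host side can be recreated or swapped without the application noticing, and the present path is picked per presentation.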
I can't be sure about any of that without trying. That said, I'm unassigning myself as I don't intend to be a blocker.