Hmm, which test? I guess we do a Wait(0) and expect that to be signaled immediately, whereas waiting for the GPU to be done isn't fast enough?
I think that's unlikely enough to _actually_ break anything that having the temporary todo is fine. But it also calls into question whether Wait(0) is actually the right thing.
Or could we even just swap the order?
I doubt any real program is going to be broken by splitting that commit, but in both orders you have a check that temporarily regresses. I ended up merging them partly because I couldn't choose between the orders, partly because the changes are somewhat related, and partly because they're both quite small. At any rate, I split that commit both ways; tell me which one you prefer: https://gitlab.winehq.org/giomasce/wine/-/commits/dxgi4 and https://gitlab.winehq.org/giomasce/wine/-/commits/dxgi5 (they both contain the same commit at the beginning to restrict testing to the test that actually matters and skip all the mode changing mess).
Yeah, but there's a lot of nuance there. You might expect this kind of thing to be signaled, say:
(1) when the present is submitted to the GPU
Well, submitted to the underlying API (Vulkan in this case). Whether that coincides with when the present is submitted to the GPU might depend on the driver, I think. At any rate, it's the last moment in which we know something about that frame's timing; once it's submitted to Vulkan we have no way to know any more, unless we use VK_KHR_present_wait.
(2) when the present is visible to the underlying graphics library, so drawing commands from other APIs will work — this is what GLX_OML_sync_control does, or at least what we are trying to use it for. This may be before or after (1)
(3) when the first pixel is visible to the user — this is what KHR_present_wait is supposed to do
(4) when the last pixel is visible to the user
I could honestly see the language that Microsoft uses meaning any one of these. I might be more inclined to guess (4), honestly. I don't know which one makes sense if you want to avoid having more than N frames in flight, since I basically don't know what the purpose of that is. [I guess for something like a video game, where you want the user's reactions based on their last input, you'd want (4), but I don't think that quite works out wrt neurology anyway...]
I would argue that (3) and (4) are basically the same thing: AFAIK on modern monitors the image is displayed in one shot, there is no electron gun scanning the pixels one by one; as soon as the entire image is on the screen and the swap time comes, all the pixels switch to the new value. Even if I am mistaken, or if the user is using an older monitor, soon after the last pixel is made visible to the user a new frame will begin, and its first pixel will be visible to the user. So in practice (3) and (4) are either the same thing or very close to each other. In other words, I guess (3) and (4) are each the best approximation of the other if we want one of them and it is not directly available.
In my interpretation the expectation for the frame latency waitable is either (3) or (4). I don't know for sure which one, but as I said using VK_KHR_present_wait is the best strategy anyway.
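To make the "no more than N frames in flight" idea concrete, here is a minimal sketch that models the frame latency waitable as a counting semaphore. It is only a toy model under my own assumptions: all the names (`latency_waitable`, `latency_waitable_try_wait()`, `latency_waitable_release()`) are hypothetical and this is not how Wine or native implements it. In particular it says nothing about which of (1) to (4) triggers the release.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a frame latency waitable: a counting semaphore that starts
 * with max_latency tokens. All names here are hypothetical. */
struct latency_waitable
{
    unsigned int tokens;      /* frames the application may still queue */
    unsigned int max_latency; /* upper bound on frames in flight */
};

static void latency_waitable_init(struct latency_waitable *w, unsigned int max_latency)
{
    assert(max_latency > 0);
    w->tokens = max_latency;
    w->max_latency = max_latency;
}

/* Models Wait(0): succeeds only if a token is immediately available. */
static bool latency_waitable_try_wait(struct latency_waitable *w)
{
    if (!w->tokens)
        return false;
    --w->tokens;
    return true;
}

/* Models the "frame is done" event, whichever of (1)-(4) that is taken
 * to mean. The count never exceeds max_latency. */
static void latency_waitable_release(struct latency_waitable *w)
{
    if (w->tokens < w->max_latency)
        ++w->tokens;
}
```

With a maximum latency of 2, two consecutive `latency_waitable_try_wait()` calls succeed and the third fails until `latency_waitable_release()` is called. The whole disagreement above is about the moment at which that release should happen.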
This commit changes it to be (1). I don't see a reason for that to matter, but I don't have much knowledge of this in the first place.
Well, before that commit it is a random moment between when `Present()` is called and when the frame is actually presented. Which moment it is depends on factors that should be irrelevant, like how many other waits there are in the queue and how they interact with each other (i.e., when the internal vkd3d queue is flushed), potentially even for waits that happen after `Present()` is called.
If we want to emulate (4), I guess we'd want to do a CPU wait for a semaphore signaled by the present command we just queued.
I don't think the present command can signal anything (neither fences nor semaphores) on Vulkan. It is a one way command, with respect to timing. AFAICT once you've submitted something with `vkQueuePresentKHR()` the only way to know something about its timing is to use VK_KHR_present_wait (which is a can of worms on its own, since it requires a dedicated thread for doing that wait, but that's material for another MR). I think VK_KHR_present_wait was added precisely because of this `vkQueuePresentKHR()` deficiency.
So I am resorting to (1) for the moment because it's the only one that can be used in full generality, even if it's a rather poor approximation. Eventually VK_KHR_present_wait support should be added. My MR is designed so that, ideally, adding it shouldn't require many more changes (except for an additional thread per swapchain, unfortunately).
"When the rendering loop calls Present(), the system blocks the thread until it is done presenting a prior frame, making room to queue up the new frame, before it actually presents."
Which seems to say pretty unequivocally that native does a wait before present.
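For illustration, the "wait before present" ordering in that sentence can be sketched as a single-threaded simulation in which blocking is modeled by advancing a simulated clock. Everything here is an assumption made for the sketch: `sim_swapchain`, `sim_present()`, `MAX_LATENCY` and the idea that each frame takes a fixed `frame_time` to finish presenting are all hypothetical, and none of it is taken from the native implementation or from the MR.

```c
#include <assert.h>

#define MAX_LATENCY 2 /* hypothetical maximum frame latency */

/* Simulated swapchain: a ring of completion times for the frames in flight. */
struct sim_swapchain
{
    double done_at[MAX_LATENCY]; /* when each in-flight frame finishes presenting */
    unsigned int head, count;    /* oldest in-flight frame and number in flight */
    double now;                  /* simulated CPU clock */
    double last_done;            /* completion time of the most recently queued frame */
};

/* "Wait before present": if the queue is full, block (advance the clock)
 * until the oldest frame is done, then queue the new frame. Returns the
 * simulated time at which the call returns to the application. */
static double sim_present(struct sim_swapchain *s, double frame_time)
{
    double start;

    assert(frame_time > 0);

    if (s->count == MAX_LATENCY)
    {
        /* Block until the oldest in-flight frame is done presenting. */
        if (s->done_at[s->head] > s->now)
            s->now = s->done_at[s->head];
        s->head = (s->head + 1) % MAX_LATENCY;
        --s->count;
    }

    /* The new frame starts presenting once the previous one has finished. */
    start = s->last_done > s->now ? s->last_done : s->now;
    s->last_done = start + frame_time;
    s->done_at[(s->head + s->count) % MAX_LATENCY] = s->last_done;
    ++s->count;

    return s->now;
}
```

Starting from a zero-initialized `struct sim_swapchain` and calling `sim_present()` with a `frame_time` of 10, the first two calls return immediately at time 0, while the third and fourth are held back until times 10 and 20, i.e. until a prior frame has made room.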
Hmm, ok. I can't understand the reason for that choice, but you're right that it's the most obvious interpretation of that sentence. I'll fix the MR.
I can't think of a way in which this difference would be actually observable, mind.
Well, I guess advanced users can see the slight latency difference between when events are processed and when the frame is presented. Or you can see that with advanced equipment; I think there are YouTube channels that care about that. It's also likely that developers who care about latency explicitly use the frame latency waitable.