1/6 doesn't reflect the actual commit that was merged, so I guess it should be updated?
Sure. Initially I would assume that this MR would wait for the vkd3d release anyway, but if it is fine to touch the vkd3d sources directly it's even better for me; I'll fix the MR.
Ah, I think there's no reason not to wait for the release (and not waiting would tick off distributions, although we're already needlessly annoying them...)
Is this actually related to the bias removal? I don't immediately see why it's related, but I'm probably missing something.
I agree it's not very obvious, and surely there is room for interpretation about whether they're related or not. The immediate reason for putting these two changes together is that if I don't, I temporarily have to add a `todo_wine`: even while broken, the bias covers up for a deficiency with signalling through the command queue rather than signalling the fence directly. If that's ok for you, I can first remove the bias and then change the `Signal()` in another commit.
Hmm, which test? I guess we do a Wait(0) and expect that to be signaled immediately, whereas waiting for the GPU to be done isn't fast enough?
I think it's unlikely enough to *actually* break something that having the temporary todo is fine. But it also calls into question whether Wait(0) is actually the right thing.
Or could we even just swap the order?
Do you have a reference for where the frame latency event is actually supposed to be signaled? i.e. that KHR_present_wait is the correct place? I don't think I see a problem with moving it here either way, but it'd be nice to know.
[IDXGISwapChain2::GetFrameLatencyWaitableObject()](https://learn.microsoft.com/en-us/windows/win32/api/dxgi1_3/nf-dxgi1_3-idxgi...) says "a waitable handle that signals when the DXGI adapter has finished presenting a new frame", which I interpret as being the same thing that Vulkan specifies for VK_KHR_present_wait. AFAIU the idea is that you decide that you don't want to have more than, say, three frames in flight at any given time. So you start your semaphore at three, release each time a frame is presented (i.e., it's not in flight any more) and wait for it each time you want to start working on a new one.
Yeah, but there's a lot of nuance there. You might expect this kind of thing to be signaled, say:
(1) when the present is submitted to the GPU
(2) when the present is visible to the underlying graphics library, so drawing commands from other APIs will work — this is what GLX_OML_sync_control does, or at least what we are trying to use it for. This may be before or after (1)
(3) when the first pixel is visible to the user — this is what KHR_present_wait is supposed to do
(4) when the last pixel is visible to the user
I could honestly see the language that Microsoft uses meaning any one of these. I might be more inclined to guess (4), honestly. I don't know which one makes sense if you want to avoid having more than N frames in flight, since I basically don't know what the purpose of that is. [I guess for something like a video game, where you want the user's reactions based on their last input, you'd want (4), but I don't think that quite works out wrt neurology anyway...]
This commit changes it to be (1). I don't see a reason for that to matter, but I don't have much knowledge of this in the first place.
If we want to emulate (4), I guess we'd want to do a CPU wait for a semaphore signaled by the present command we just queued.
Should we wait _before_ presenting instead? It's what the documentation seems to suggest native does [1], and it seems like it makes more sense—wait until there's "room" to present a frame before doing so? I don't know if it makes a difference in practice, though.
I'm not sure I read that in that page. Reading step 4, it seems that the suggested loop is wait-render-present, rather than render-wait-present. So, if you assume that the onus of waiting is on the implementation rather than the client, the wait should happen after presenting, not before. And TBH that's what sounds more sensible to me. The wait operation is not to ensure you have room: you're already assuming you have room in the swapchain when you're rendering, because you're rendering to the swapchain image. The wait operation is to avoid starting computations for a following frame too early, in order to keep latency within the maximum you decided.
Sure, that's the intent of the latency event feature, but there's also this sentence:
"When the rendering loop calls Present(), the system blocks the thread until it is done presenting a prior frame, making room to queue up the new frame, before it actually presents."
Which seems to say pretty unequivocally that native does a wait before present.
I can't think of a way in which this difference would be actually observable, mind.
`+ swapchain->frame_latency = 3;`
This seems reasonable enough, but is it derived from anywhere specific?
I see it in [IDXGIDevice1::SetMaximumFrameLatency()](https://learn.microsoft.com/en-us/windows/win32/api/dxgi/nf-dxgi-idxgidevice...), section "Parameters". It also aligns with my timing observations. Unfortunately it's very hard to write tests for that which aren't flaky, because timing is the only observable I have, and that's of course very noisy (when I have access to the frame latency waitable it's a bit easier, since I can check the semaphore count directly).
Ah, thanks!