On Fri Feb 10 12:34:40 2023 +0000, Conor McCarthy wrote:
I don't think we need to lock the fence mutex before checking max_pending_value. All cases which update max_pending_value lead to a new call to d3d12_command_queue_flush_ops(), except for updates from vkd3d_wait_for_gpu_timeline_semaphore(), which are not relevant because higher values being waited on there have already been included in max_pending_value.
As before, I don't think we should assume a strong memory model, since I guess we want our code to be portable. On relaxed memory models you're not guaranteed that memory reads are current unless you've performed some acquire operation, like locking a mutex.
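To illustrate the point (just a minimal C sketch with made-up names, not the actual vkd3d code): the mutex is what provides the acquire/release pairing, so a value written under the lock by one thread is guaranteed to be visible to another thread that later takes the same lock, while a plain unlocked load has no such guarantee on a weakly ordered architecture.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical, simplified fence. pthread_mutex_lock() is an acquire
 * operation and pthread_mutex_unlock() a release, so updates made
 * under the lock are visible to any thread that subsequently locks
 * the same mutex. */
struct fence
{
    pthread_mutex_t mutex;
    uint64_t max_pending_value;
};

/* Writer: update the pending value under the lock. */
static void fence_update_max_pending(struct fence *fence, uint64_t value)
{
    pthread_mutex_lock(&fence->mutex);
    if (value > fence->max_pending_value)
        fence->max_pending_value = value;
    pthread_mutex_unlock(&fence->mutex);
}

/* Reader: without the lock (or some other acquire operation) this
 * load may observe a stale value on a relaxed memory model. */
static uint64_t fence_read_max_pending_unsafe(const struct fence *fence)
{
    return fence->max_pending_value; /* may be stale */
}

/* Reader: taking the lock is the acquire operation referred to above. */
static uint64_t fence_read_max_pending(struct fence *fence)
{
    uint64_t value;

    pthread_mutex_lock(&fence->mutex);
    value = fence->max_pending_value;
    pthread_mutex_unlock(&fence->mutex);
    return value;
}
```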
Specifically, in this case I guess this is a scenario which could go wrong: thread A waits on a fence with a value larger than the current `max_pending_value`; it therefore marks that queue as blocked. Then thread A continues doing other stuff until, for some unrelated reason, it begins flushing queues. In `d3d12_device_flush_blocked_queues_once()` it "steals" all the blocked queues, locks the device mutex and is about to start processing them when it finishes its time quantum and has to yield to another thread. On another core, thread B finally signals the fence with a value that would unblock the queue, and it updates `max_pending_value`. When thread B reaches `d3d12_device_flush_blocked_queues_once()` there is nothing to do, because thread A has temporarily stolen the blocked queues, so thread B resumes doing other stuff. This should not be a problem, because as soon as thread A is rescheduled it should see the new `max_pending_value` and act on it. But, as I said, unless there is sufficient synchronization, thread A is not guaranteed to see the `max_pending_value` that was set by thread B, and it could incorrectly conclude that there is nothing to do for that fence, missing the queue unblock. A sketch of the check I have in mind follows.
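Roughly like this (again a hedged sketch: `struct blocked_queue`, `try_unblock_queue()` and the field names are invented for illustration and don't correspond to the real vkd3d structures): thread A's flush path takes the fence mutex before comparing against `max_pending_value`, so its read cannot be stale with respect to thread B's signal.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Same hypothetical fence as in the sketch above. */
struct fence
{
    pthread_mutex_t mutex;
    uint64_t max_pending_value;
};

/* Hypothetical blocked-queue entry, invented for illustration. */
struct blocked_queue
{
    struct fence *fence;   /* fence the queue is waiting on */
    uint64_t wait_value;   /* value the queue is blocked on */
};

/* Thread A's flush path: locking the fence mutex is the acquire
 * operation that makes thread B's update to max_pending_value visible
 * here; an unlocked read could miss it and incorrectly conclude there
 * is nothing to do. */
static bool try_unblock_queue(struct blocked_queue *queue)
{
    bool unblocked;

    pthread_mutex_lock(&queue->fence->mutex);
    unblocked = queue->fence->max_pending_value >= queue->wait_value;
    pthread_mutex_unlock(&queue->fence->mutex);

    return unblocked;
}
```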