On Fri Feb 10 12:10:25 2023 +0000, Conor McCarthy wrote:
My understanding of the problem with blocked_queue_count is this sequence:
1. thread0: Is in d3d12_command_queue_Wait() after `if (!command_queue->ops_count && value <= fence->max_pending_value)` but has not yet called d3d12_device_add_blocked_command_queues().
2. thread1: Is handling a signal which unblocks the wait, but had not yet updated max_pending_value when thread0 checked it. It executes `if (!device->blocked_queue_count)` before thread0 calls d3d12_device_add_blocked_command_queues().
3. thread0: calls d3d12_device_add_blocked_command_queues() but the corresponding call to d3d12_device_flush_blocked_queues() has passed in thread1.
Notice that on relaxed memory architectures, like ARM, something even worse can happen: even after `d3d12_device_add_blocked_command_queues()` has released its mutex, another thread is not guaranteed to observe the new value of `blocked_queue_count` unless it has first acquired the mutex itself. The reading thread might, for example, read a stale cached value of `blocked_queue_count` without checking that the cache is still up to date. So I think we should really protect the read with the mutex.
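
To make the suggestion concrete, here is a minimal sketch of what reading the counter under the mutex could look like. It is not the actual vkd3d code: `struct sketch_device`, `blocked_queues_mutex` and the `sketch_device_*()` helpers are hypothetical names invented for illustration, and only `blocked_queue_count` is taken from the discussion above. It only illustrates the locked read; on its own it would not close the ordering window in the three-step sequence quoted above.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the device state. */
struct sketch_device
{
    pthread_mutex_t blocked_queues_mutex;
    size_t blocked_queue_count;
};

static void sketch_device_init(struct sketch_device *device)
{
    pthread_mutex_init(&device->blocked_queues_mutex, NULL);
    device->blocked_queue_count = 0;
}

/* Writer side: the counter is only modified while holding the mutex. */
static void sketch_device_add_blocked_queue(struct sketch_device *device)
{
    pthread_mutex_lock(&device->blocked_queues_mutex);
    ++device->blocked_queue_count;
    pthread_mutex_unlock(&device->blocked_queues_mutex);
}

/* Reader side: acquiring the same mutex synchronises with the writer's
 * unlock, so an increment made under an earlier lock is visible here. */
static bool sketch_device_has_blocked_queues(struct sketch_device *device)
{
    bool ret;

    pthread_mutex_lock(&device->blocked_queues_mutex);
    ret = device->blocked_queue_count != 0;
    pthread_mutex_unlock(&device->blocked_queues_mutex);
    return ret;
}
```

The signalling path would then call something like `sketch_device_has_blocked_queues()` instead of testing `blocked_queue_count` directly. An atomic counter with acquire/release ordering might be an alternative, but taking the mutex keeps the read consistent with how the writers already behave.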