I'm not well enough at the moment to examine this series thoroughly, but what issues is it designed to fix? I understand that in d3d12_device_flush_blocked_queues_once() a blocked queue which has been buffered in blocked_queues[] for re-adding may be unblocked and then re-added in another thread, then added again from blocked_queues[]. Can we just perform deduplication on the blocked queue array (maybe only if it fills up), since flushing an unblocked queue is harmless, or would there still be an issue?
On the other hand, why do you think that doing deduplication would be better? Keeping the blocked queues array deduplicated in the first place looks better to me because it saves time both when scanning the array for unblocked queues and when deduplicating it, should that become necessary. The cost of my implementation is basically having to lock and unlock the mutex more frequently, but all the critical sections are quite fast, so the mutex should rarely be contended.
The effect is probably mild, if it exists at all, on either side, granted, so I can drop this patch without too much remorse, but I am interested in your perspective.
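To make the two options discussed above concrete, here is a minimal sketch of both. It is not the vkd3d code: struct blocked_queue_set, struct queue, MAX_BLOCKED_QUEUES and both function names are hypothetical stand-ins, and a plain pthread mutex is assumed in place of whatever locking the device actually uses.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_BLOCKED_QUEUES 16 /* hypothetical capacity */

/* Hypothetical stand-in for a command queue. */
struct queue
{
    int id;
};

/* Hypothetical stand-in for the device's blocked queue array. */
struct blocked_queue_set
{
    pthread_mutex_t mutex;
    struct queue *queues[MAX_BLOCKED_QUEUES];
    size_t count;
};

/* Option A: keep the array deduplicated at insertion time.
 * Returns true if the queue was added, false if it was already
 * present or the array is full. */
static bool blocked_queue_set_add_unique(struct blocked_queue_set *set, struct queue *queue)
{
    bool added = false;
    size_t i;

    pthread_mutex_lock(&set->mutex);
    for (i = 0; i < set->count; ++i)
    {
        if (set->queues[i] == queue)
            goto done;
    }
    if (set->count < MAX_BLOCKED_QUEUES)
    {
        set->queues[set->count++] = queue;
        added = true;
    }
done:
    pthread_mutex_unlock(&set->mutex);
    return added;
}

/* Option B: tolerate duplicates and compact the array on demand,
 * e.g. only when it fills up. Keeps the first occurrence of each
 * queue and returns the new entry count. */
static size_t blocked_queue_set_dedup(struct blocked_queue_set *set)
{
    size_t i, j, kept = 0;

    pthread_mutex_lock(&set->mutex);
    for (i = 0; i < set->count; ++i)
    {
        for (j = 0; j < kept; ++j)
        {
            if (set->queues[j] == set->queues[i])
                break;
        }
        if (j == kept)
            set->queues[kept++] = set->queues[i];
    }
    set->count = kept;
    pthread_mutex_unlock(&set->mutex);
    return kept;
}
```

In this sketch option A pays for a linear scan under the mutex on every insertion, while option B defers a quadratic pass to the rare case where the array is compacted. With only a handful of queues either cost is probably negligible, which matches the expectation that the difference is mild.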