On 9/19/19 9:31 AM, Conor McCarthy wrote:
I think it's risky to store the lock in the vkd3d_view, because if the current thread hasn't already inc'd the ref count, the view could be deallocated at any time.
The core problem is that while ref counting is fine for concurrent management of object lifetime, it's not a safe way to acquire a reference to an object to which we have no reference. You could have just incremented the ref count of a deallocated object! So we need locks to protect the acquisition of a reference. I've attached a patch which does this and is sufficient to get the game working. It uses the device mutex so is not optimal, but it's designed to spend the minimum amount of time locked. See if you can make a faster spinlock version.
This is tricky. For maximum performance we must only wait for our destination descriptor's view to be replaced and in a valid state again. But, as you mentioned, if we store the lock inside the view, there are multiple steps involved in acquiring it (accessing the view pointer, then increment the reference count).
Also, in your patch, I noticed that you intentionally unlock before performing the copy or overwrite. This puzzles me, if one thread waits on the global mutex for access to a descriptor's view to dereference, and we unlock the mutex for them when the view is NULL, they may crash when trying to decrement the reference of the NULL view. If we add a nullcheck to vkd3d_view_decref_descriptor, we'd still have a problem with a resource leak, where a thread overwrites the descriptor without actually freeing the Vulkan object created by the other thread.
Taking this all into account, I think that a good solution would be keeping your global mutex (or making it a spinlock local to accessing the views, maybe even local to the heap?) to acquire the references safely. However, instead of destroying the Vulkan object inside of this lock, maybe we could copy the Vulkan handle, and destroy it after we unlock the broader mutex/spinlock. Assuming we keep the overwriting/copying outside of the lock, we would probably want to call the Vulkan destroy function after the descriptor's view is in a valid state again, to minimize time where the descriptor is invalid and avoid the problem I mentioned in the previous paragraph.
Thoughts?
-Derek