Giovanni Mascellani (@giomasce) commented about libs/vkd3d/vkd3d_private.h:
 HRESULT vkd3d_uav_clear_state_init(struct vkd3d_uav_clear_state *state, struct d3d12_device *device);
 void vkd3d_uav_clear_state_cleanup(struct vkd3d_uav_clear_state *state, struct d3d12_device *device);
+struct desc_object_cache_head
+{
+    void *head;
+    unsigned int spinlock;
+};
In theory, it is advisable not to put more than one spinlock in the same cache line: the line would be contended by different cores even when they operate on different spinlocks (false sharing). I guess that amounts to ensuring that `struct desc_object_cache_head` is padded and aligned to the size of a cache line. On Intel architectures a cache line is 64 bytes, and on a 64-bit build this struct is 16 bytes (an 8-byte pointer, a 4-byte `unsigned int` and 4 bytes of tail padding), so you are putting four spinlocks in the same line.
That, of course, could turn out to be one of those theoretical concerns that make no difference in practice, but it may be worth a try.
Unfortunately, as far as I know, C doesn't offer a portable way to query the cache line size at compile time ([as C++17 does](https://en.cppreference.com/w/cpp/thread/hardware_destructive_interference_s...) with `std::hardware_destructive_interference_size`). [Experimenting a little with Compiler Explorer](https://godbolt.org/z/fEe4rK74s), it seems that most architectures use either 32 or 64 bytes, with PowerPC at 128 bytes and ARM64 possibly even at 256 bytes. Given that we mostly care about Intel and ARM, I guess we can just settle for 64 bytes in general and 256 bytes on ARM64.