Re: [PATCH v4 0/1] MR297: vkd3d: Fix invalid atomic behaviour in the view cache linked list.

21 Aug 2023

...
Spinning is the big performance killer. That seems to be the case for mutexes too because entry uses spinlocking. I see no measurable performance gain from a 64-byte alignment,
I imagine at least part of that is due to the atomic operations on cache->next_index and cache->free_count in vkd3d_desc_object_cache_get() and vkd3d_desc_object_cache_push().
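To illustrate the point about alignment, here is a minimal sketch assuming a hypothetical layout (the struct names, `CACHE_LINE_SIZE` and `HEAD_COUNT` are made up for illustration and are not the actual vkd3d structures). Even if every list head sits on its own 64-byte cache line, counters such as `next_index` and `free_count` remain shared by all callers, so the line holding them can still bounce between cores:

```c
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>

#define CACHE_LINE_SIZE 64 /* assumed line size, typical for x86-64 */
#define HEAD_COUNT 16      /* illustrative value only */

/* Hypothetical per-head state, padded out to a full cache line. */
struct example_head
{
    alignas(CACHE_LINE_SIZE) void *head;
    atomic_uint spinlock;
};

/* Hypothetical cache: the heads no longer share lines with each other,
 * but every caller still updates the two counters at the end. */
struct example_cache
{
    struct example_head heads[HEAD_COUNT];
    atomic_uint next_index;
    atomic_uint free_count;
};
```

In this layout only the heads benefit from the padding; any gain from it could be masked by traffic on the shared counters.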
...
I did some measurements with Cyberpunk 2077 to see how many times we need to spin (i.e., execute the `for` loop) on average for each call to `vkd3d_desc_object_cache_get()`. Results seem to be good: the ratio never reaches 2. It starts at 1, then grows a bit towards 1.5-1.6, then decreases again, seemingly converging to 1. That means that after an initial transient we basically never spin more than once for each call to `vkd3d_desc_object_cache_get()`.
So it essentially gets rid of the contention; that's great to know.
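For reference, a measurement of this kind could be sketched roughly as follows. This is not the instrumentation that was actually used; the counters, the `atomic_flag` lock and all function names are hypothetical. It counts calls and loop iterations separately and reports their ratio:

```c
#include <stdatomic.h>

/* Hypothetical counters; the names are illustrative, not taken from vkd3d. */
static atomic_ulong call_count;      /* calls to the instrumented function */
static atomic_ulong iteration_count; /* total iterations of the retry loop */

static atomic_flag example_lock = ATOMIC_FLAG_INIT;

/* Acquire the lock, recording how many loop iterations it took. */
static void instrumented_lock(void)
{
    atomic_fetch_add_explicit(&call_count, 1, memory_order_relaxed);
    for (;;)
    {
        atomic_fetch_add_explicit(&iteration_count, 1, memory_order_relaxed);
        if (!atomic_flag_test_and_set_explicit(&example_lock, memory_order_acquire))
            return;
    }
}

static void instrumented_unlock(void)
{
    atomic_flag_clear_explicit(&example_lock, memory_order_release);
}

/* Average loop iterations per call; 1.0 means no call ever had to retry. */
static double spin_ratio(void)
{
    unsigned long calls = atomic_load_explicit(&call_count, memory_order_relaxed);
    unsigned long iterations = atomic_load_explicit(&iteration_count, memory_order_relaxed);

    return calls ? (double)iterations / (double)calls : 0.0;
}
```

A ratio that stays close to 1 means the loop body almost always succeeds on the first pass, which is what the numbers above suggest.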
...
I think the MR is already good enough to be accepted. Further optimizations, like tuning the cache size or adding thread-local caches, could be considered in the future if some more performance has to be squeezed out (though I wouldn't be opposed to having them immediately if anybody wants to implement them right away).
I think it's an improvement too, so I'll approve this. I do think there's further room for improvement though, both in terms of performance and in terms of code quality, and I'd prefer seeing those sooner rather than later. (E.g., I don't like the magic "16"; I don't like that we're rolling our own spinlocks here; I don't like the number of atomic operations in what's supposed to be a hot path.)
-- 
https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/297#note_42864
