I took some measurements with Cyberpunk 2077 to see how many times, on average, we need to spin (i.e., execute the `for` loop) for each call to `vkd3d_desc_object_cache_get()`. The results look good: the ratio never reaches 2. It starts at 1, grows to about 1.5-1.6, then decreases again, seemingly converging to 1. That means that after an initial transient we basically never spin more than once for each call to `vkd3d_desc_object_cache_get()`.
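For anyone who wants to repeat this kind of measurement, the ratio can be gathered with two counters: one incremented per call and one incremented per loop iteration. The following is only a minimal sketch of that idea, not the code from the MR: `struct object_cache`, `cache_get()`, `cache_put()` and `cache_spin_ratio()` are hypothetical names, and the cache here is a plain spinlock-guarded free list, assumed purely for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a cached object; not vkd3d's actual type. */
struct cache_node
{
    struct cache_node *next;
};

/* Hypothetical cache: a free list guarded by a spin flag, plus counters. */
struct object_cache
{
    atomic_flag lock;
    struct cache_node *head;
    atomic_ulong call_count; /* number of cache_get() calls */
    atomic_ulong spin_count; /* number of loop iterations inside cache_get() */
};

static void cache_init(struct object_cache *cache)
{
    atomic_flag_clear(&cache->lock);
    cache->head = NULL;
    atomic_init(&cache->call_count, 0);
    atomic_init(&cache->spin_count, 0);
}

static void cache_put(struct object_cache *cache, struct cache_node *node)
{
    assert(node);
    /* Spin until the flag is acquired; not counted, we only measure get. */
    while (atomic_flag_test_and_set_explicit(&cache->lock, memory_order_acquire))
        ;
    node->next = cache->head;
    cache->head = node;
    atomic_flag_clear_explicit(&cache->lock, memory_order_release);
}

static struct cache_node *cache_get(struct object_cache *cache)
{
    struct cache_node *node;

    atomic_fetch_add(&cache->call_count, 1);
    for (;;)
    {
        /* Count every iteration, including the one that succeeds. */
        atomic_fetch_add(&cache->spin_count, 1);
        if (!atomic_flag_test_and_set_explicit(&cache->lock, memory_order_acquire))
            break;
    }
    node = cache->head;
    if (node)
        cache->head = node->next;
    atomic_flag_clear_explicit(&cache->lock, memory_order_release);
    return node;
}

/* Average number of loop iterations per cache_get() call. */
static double cache_spin_ratio(struct object_cache *cache)
{
    unsigned long calls = atomic_load(&cache->call_count);

    return calls ? (double)atomic_load(&cache->spin_count) / calls : 0.0;
}
```

With this counting scheme a ratio of exactly 1 means every call acquired the cache on its first iteration, so anything above 1 is the extra spinning caused by contention.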
I think the MR is already good enough to be accepted. Further optimizations, such as tuning the cache size or adding thread-local caches, could be considered in the future if more performance needs to be squeezed out (though I wouldn't object to having them immediately if anybody wants to implement them right away).