Here is a preview of my shader cache work for early comments. It isn't complete, but does successfully cache things.
What's there:
* A new vkd3d API that is used internally for caching. It can also be used to implement the ID3D12ShaderCacheSession interface, and hopefully by wined3d as well
* Simple saving and loading of the cached objects
* It is used to cache render passes, root signatures and pipeline states
What is not yet there:
* Partial cache loading and eviction
* ID3D12ShaderCacheSession - largely because it needs bumping ID3D12Device up to version 9, which may bring unrelated regressions. For this and tests see my "cache-rework" branch (which
* Cache file compression
* Incremental updates of cache files - right now they are rewritten from scratch on exit
* Loading the cache in an extra thread. The pipeline state creation code will need some refactoring for that
I am not quite happy yet with the two patches that write and reload actual graphics pipelines. The way I am storing the d3d settings isn't quite consistent yet either: in some cases I use the d3d input data as the key directly, in others I store it as a value attached to a hash. The latter is usually the case if I need to cross-reference something, e.g. have a link from the pipeline state to the root signature. This kind of setup shows how wined3d can build a chain of linked state though.
There are also known issues with locking, explained in comments in the patches.
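To make the "value attached to a hash" style with cross-references a bit more concrete, here is a minimal sketch. It does not use the actual vkd3d_shader_cache API: `toy_cache`, `toy_cache_put`, `toy_cache_get`, `toy_pipeline_desc` and the FNV-1a hash are all stand-ins invented for illustration, and the fixed-size array is only there to keep the example short.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_CACHE_MAX_ENTRIES 16
#define TOY_CACHE_MAX_VALUE 64

/* One key/value pair; in this sketch the key is always a 64-bit hash. */
struct toy_cache_entry
{
    uint64_t key;
    size_t size;
    unsigned char value[TOY_CACHE_MAX_VALUE];
};

struct toy_cache
{
    struct toy_cache_entry entries[TOY_CACHE_MAX_ENTRIES];
    size_t count;
};

/* FNV-1a, used only as a convenient stand-in hash function. */
static uint64_t toy_hash(const void *data, size_t size)
{
    const unsigned char *p = data;
    uint64_t h = 0xcbf29ce484222325ull;
    size_t i;

    for (i = 0; i < size; ++i)
    {
        h ^= p[i];
        h *= 0x100000001b3ull;
    }
    return h;
}

static const struct toy_cache_entry *toy_cache_get(const struct toy_cache *cache, uint64_t key)
{
    size_t i;

    for (i = 0; i < cache->count; ++i)
        if (cache->entries[i].key == key)
            return &cache->entries[i];
    return NULL;
}

/* Returns 0 on success, -1 if the cache is full or the value is too large.
 * An already existing key is left untouched. */
static int toy_cache_put(struct toy_cache *cache, uint64_t key, const void *value, size_t size)
{
    struct toy_cache_entry *e;

    assert(cache && value);
    if (toy_cache_get(cache, key))
        return 0;
    if (cache->count == TOY_CACHE_MAX_ENTRIES || size > TOY_CACHE_MAX_VALUE)
        return -1;
    e = &cache->entries[cache->count++];
    e->key = key;
    e->size = size;
    memcpy(e->value, value, size);
    return 0;
}

/* Hypothetical pipeline record: it refers to its root signature by hash
 * instead of embedding it, which is what creates the link between entries. */
struct toy_pipeline_desc
{
    uint64_t root_signature_hash;
    uint32_t sample_count;
    uint32_t rt_format;
};

/* Stores a root signature blob under its own hash and returns that hash
 * (0 on failure; a real hash of 0 is ignored here for brevity). */
static uint64_t toy_store_root_signature(struct toy_cache *cache, const void *blob, size_t size)
{
    uint64_t h = toy_hash(blob, size);

    return toy_cache_put(cache, h, blob, size) ? 0 : h;
}

/* Follows the link from a cached pipeline record to its root signature. */
static const struct toy_cache_entry *toy_pipeline_root_signature(const struct toy_cache *cache,
        uint64_t pipeline_key)
{
    const struct toy_cache_entry *e = toy_cache_get(cache, pipeline_key);
    struct toy_pipeline_desc desc;

    if (!e || e->size != sizeof(desc))
        return NULL;
    memcpy(&desc, e->value, sizeof(desc));
    return toy_cache_get(cache, desc.root_signature_hash);
}
```

When preloading, a loader built along these lines would have to create the root signature before any pipeline record that references it, which is the "chain of linked state" idea in miniature.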
--
v2: vkd3d: Try to find a read-only cache in C:\windows\scache
vkd3d: Cache and preload compute pipelines.
DEBUG: Make cache profiling more visible
vkd3d: Add some cache efficiency debug code.
vkd3d: Add EXT_pipeline_creation_feedback.
vkd3d: Catch and release graphics pipelines.
Store graphics pipelines in the cache.
vkd3d: Precreate root signatures from cache
vkd3d: Keep root signatures around.
Store render passes in the on-disk cache and recreate them on startup.
vkd3d: Store the VK pipeline cache in an on-disk vkd3d cache.
vkd3d: Keep the application name around.
Add a win32 version of vkd3d_get_program_name.
vkd3d: Basic shader cache writing and reading.
vkd3d: Replace the custom render pass cache with vkd3d_shader_cache.
vkd3d: Implement vkd3d_shader_cache_enumerate.
Add cache locking.
vkd3d: Implement vkd3d_shader_cache_get.
vkd3d: Implement vkd3d_shader_cache_put.
Create and destroy the shader cache tree.
vkd3d: Implement shader_cache_open/close.
vkd3d: Define and stub the shader cache API.
https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541
After running 'make check', this script can be used to see which tests are not passing, showing the test line and a \[XP\], \[XF\], or \[F\] tag, for each backend.
It can be used like this:
```plaintext
python3 lightboard.py [-b] <vkd3d_build_path>/tests/hlsl
```

So we can say "All lights green across the board" in a cinematic way.
--
https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/427
moniker_tree_get_rightmost(root) can return the same pointer as the root parameter, so node can
be equal to root. moniker_tree_discard(node) frees node, which could be the same as root. Then
moniker_create_from_tree(root) could access the already freed pointer.
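
The hazard can be modelled outside of ole32 with a tiny tree. This is only a sketch under assumptions: `struct toy_node` and the `toy_*` helpers are simplified stand-ins rather than the real moniker tree code, and the guard shown is one possible way to avoid the stale pointer, not necessarily what the patch does.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-in for a moniker tree node; not the real ole32 layout. */
struct toy_node
{
    struct toy_node *parent;
    struct toy_node *left;
    struct toy_node *right;
};

static struct toy_node *toy_node_create(struct toy_node *left, struct toy_node *right)
{
    struct toy_node *node = calloc(1, sizeof(*node));

    assert(node);
    node->left = left;
    node->right = right;
    if (left) left->parent = node;
    if (right) right->parent = node;
    return node;
}

/* Follows the right links. For a tree without a right child this returns
 * root itself, which is the case the description is about. */
static struct toy_node *toy_tree_get_rightmost(struct toy_node *root)
{
    while (root && root->right)
        root = root->right;
    return root;
}

static void toy_tree_free(struct toy_node *node)
{
    if (!node) return;
    toy_tree_free(node->left);
    toy_tree_free(node->right);
    free(node);
}

/* Detaches node from its parent (if any) and frees it with its subtree. */
static void toy_tree_discard(struct toy_node *node)
{
    if (node->parent)
    {
        if (node->parent->left == node) node->parent->left = NULL;
        else node->parent->right = NULL;
    }
    toy_tree_free(node);
}

/* Discards the rightmost node and returns the root pointer that is still
 * valid afterwards: NULL if the rightmost node was the root itself. */
static struct toy_node *toy_tree_drop_rightmost(struct toy_node *root)
{
    struct toy_node *node = toy_tree_get_rightmost(root);
    int node_is_root;

    if (!node) return NULL;
    node_is_root = (node == root); /* compare before the pointer is freed */
    toy_tree_discard(node);
    return node_is_root ? NULL : root;
}
```

A caller that keeps using the return value of `toy_tree_drop_rightmost()` instead of its original `root` never touches freed memory.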
--
https://gitlab.winehq.org/wine/wine/-/merge_requests/4768
Based on [a patch](https://www.winehq.org/mailman3/hyperkitty/list/wine-devel@winehq.or… by Jinoh Kang (@iamahuman) from February 2022.
I removed the need for the event object and implemented fast paths for Linux.
On macOS 10.14+ `thread_get_register_pointer_values` is called on every thread of the process.
On Linux 4.14+ `membarrier(MEMBARRIER_CMD_GLOBAL_EXPEDITED, ...)` is used.
On x86 Linux <= 4.13 and on other platforms `madvise(..., MADV_DONTNEED)` is used, which sends IPIs to all cores causing them to do a memory barrier.
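
For readers unfamiliar with the two Linux paths, the following is a rough standalone sketch, not the code from the patches. The `toy_*` names and the `toy_flush_method` enum are made up for the example. It assumes kernel headers that define `MEMBARRIER_CMD_GLOBAL_EXPEDITED`, registers the process first (the membarrier(2) man page describes the global expedited command as covering registered processes), and falls back to the `madvise()` trick when the syscall is unavailable.

```c
#define _GNU_SOURCE
#include <linux/membarrier.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

enum toy_flush_method
{
    TOY_FLUSH_FAILED = -1,
    TOY_FLUSH_MEMBARRIER,
    TOY_FLUSH_MADVISE,
};

/* glibc has no membarrier() wrapper, so go through syscall(). */
static int toy_sys_membarrier(int cmd, unsigned int flags)
{
    return (int)syscall(__NR_membarrier, cmd, flags, 0);
}

/* Returns 0 if the expedited barrier was issued, -1 otherwise.
 * The static flag is not thread safe; real code would need a lock or
 * a one-time initialisation primitive. */
static int toy_try_membarrier(void)
{
    static int registered;
    int supported = toy_sys_membarrier(MEMBARRIER_CMD_QUERY, 0);

    if (supported < 0 || !(supported & MEMBARRIER_CMD_GLOBAL_EXPEDITED))
        return -1;
    if (!registered)
    {
        if (toy_sys_membarrier(MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED, 0))
            return -1;
        registered = 1;
    }
    return toy_sys_membarrier(MEMBARRIER_CMD_GLOBAL_EXPEDITED, 0);
}

/* Fallback: dirty an anonymous page and drop it again. Per the description
 * above, the resulting IPIs are what act as the barrier on other cores. */
static int toy_try_madvise(void)
{
    long page_size = sysconf(_SC_PAGESIZE);
    char *mem;
    int ret;

    if (page_size <= 0)
        return -1;
    mem = mmap(NULL, (size_t)page_size, PROT_READ | PROT_WRITE,
            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;
    *(volatile char *)mem = 1; /* make sure the page is actually mapped */
    ret = madvise(mem, (size_t)page_size, MADV_DONTNEED);
    munmap(mem, (size_t)page_size);
    return ret;
}

/* Tries the fast path first and reports which method ended up being used. */
static enum toy_flush_method toy_flush_process_write_buffers(void)
{
    if (!toy_try_membarrier())
        return TOY_FLUSH_MEMBARRIER;
    if (!toy_try_madvise())
        return TOY_FLUSH_MADVISE;
    return TOY_FLUSH_FAILED;
}
```

Mapping and unmapping a page on every call keeps the sketch short; a real implementation would more likely keep one page around and serialise access to it.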
--
v11: ntdll: Add thread_get_register_pointer_values-based implementation of NtFlushProcessWriteBuffers.
ntdll: Add sys_membarrier-based implementation of NtFlushProcessWriteBuffers.
ntdll: Add MADV_DONTNEED-based implementation of NtFlushProcessWriteBuffers.
https://gitlab.winehq.org/wine/wine/-/merge_requests/741
This adds the missing interfaces for `ID3D11On12Device1` and `ID3D11On12Device2` so they can be used in projects using mingw.
--
v2: d3d11on12: Add interfaces for ID3D11On12Device1 and ID3D11On12Device2
https://gitlab.winehq.org/wine/wine/-/merge_requests/4951
Currently GetFileType() ends up returning the file type based solely on the Unix fd type if it gets that from the server. The problematic case is when our process gets a pipe or socket fd as stdin or stderr from Unix. The server object which gets created through wine_server_fd_to_handle is a regular file regardless of the underlying Unix fd type. It probably can't be anything else, at least for pipes, as a pipe should have both ends but we have only one in this case.
That causes problems, e.g., with libuv (used, e.g., by Vampire Survivors). It checks the stdout handle type, and if it is a pipe it tries to call SetNamedPipeHandleState() on it and does not tolerate the failure.
It looks more sensible to me to report all the pipes and sockets created through wine_server_fd_to_handle() as regular files, as that matches the server objects we have for them. When that is not the case we should get the correct type from the server fd type.
--
https://gitlab.winehq.org/wine/wine/-/merge_requests/1425
This implements most of the `D3DKMTEnumAdapters2` function, which is [indirectly used by all games using the NVIDIA Streamline SDK](https://github.com/NVIDIAGameWorks/Streamline/blob/7ac42e47c7dd55b5b6d…. Having it (and [one other D3DKMT function](https://github.com/NVIDIAGameWorks/Streamline/blob/7ac42e47c7dd55… but one thing at a time) working is a requirement to enable some interesting features like DLFG (also known as DLSS-G / DLSS Frame Generation). Although DLFG is currently not supported by NVIDIA on Unix systems, the lack of these functions also prevents [some game modifications that hijack DLSS-G](https://github.com/Nukem9/dlssg-to-fsr3) from working altogether because the adapter check still has to be performed.
The way I chose to tackle this was adding a new GDI driver function which winex11 then implements. This mirrors how `D3DKMTOpenAdapterFromLuid` is implemented, but a not-so-short conversation I had with Paul Gofman made me realize that there could be a better way to do this by tapping into win32u's adapter cache. More on that closer to the end; first I'll explain my initial idea.
The handle assignment is quite tricky; because this is done in win32u, we need to prepare as many handles as the number of elements in the array we were passed. However, this array can be much larger than the actual number of GPUs in the system (the Windows 10 machine I ran some tests on always responded with a `NumAdapters` of 32 when given `NULL` in `pAdapters`, even though there were just 2 adapters available). I chose to resolve this by allocating that many handles, then calling the GDI driver function while still holding the mutex, then freeing unused handles from the end of the list. Some possible alternatives would be:
* Not holding the mutex while calling the GDI driver function. We would still allocate more handles than needed but other functions would not be blocked while winex11 is doing its thing. We would then acquire the mutex again to free unused handles, which would require us to search for them first because it would no longer be guaranteed that they are at the end of the list.
* Calling the GDI driver function more than once, the first time to actually enumerate GPUs and the second time to pass the handles to the GDI driver so it can save them, which would complicate the API contract between win32u and the GDI driver quite a bit.
* Passing an additional parameter to let the GDI driver allocate handles itself. It could be something as simple as a pointer to the next handle variable, or something as complicated as an allocator function that can do anything win32u needs it to do (plus maybe a deallocator function to handle errors). But I suppose it would still require the mutex to be held for the entire call.
* … or something else I didn't think of.
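
The approach I went with (reserve handles for the whole array, call the driver under the lock, then give back the unused tail) can be sketched as below. This is a simplified model, not the win32u code: `toy_handle`, `toy_enum_fn`, `toy_enum_adapters` and the plain counter standing in for the handle list are all assumptions made for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

typedef unsigned int toy_handle;

static pthread_mutex_t toy_lock = PTHREAD_MUTEX_INITIALIZER;
static toy_handle toy_next_handle = 1;

/* Hypothetical driver callback: it may use the first N of the reserved
 * handles and returns N, the number of adapters it actually found. */
typedef size_t (*toy_enum_fn)(const toy_handle *handles, size_t capacity);

/* Reserves one handle per array slot, lets the driver enumerate while the
 * lock is held, then releases the handles the driver did not use. */
static size_t toy_enum_adapters(toy_handle *handles, size_t capacity, toy_enum_fn driver_enum)
{
    size_t i, used;

    pthread_mutex_lock(&toy_lock);
    for (i = 0; i < capacity; ++i)
        handles[i] = toy_next_handle++;

    used = driver_enum(handles, capacity);
    assert(used <= capacity);

    /* Nobody else could allocate in between, so the unused handles are
     * exactly the newest ones and can be returned by rewinding the counter. */
    toy_next_handle -= (toy_handle)(capacity - used);
    for (i = used; i < capacity; ++i)
        handles[i] = 0;
    pthread_mutex_unlock(&toy_lock);

    return used;
}

/* Example driver that pretends the system has two GPUs. */
static size_t toy_two_gpu_driver(const toy_handle *handles, size_t capacity)
{
    (void)handles;
    return capacity < 2 ? capacity : 2;
}
```

The rewind only works because the lock is held across the driver call; the first alternative in the list above would lose that property and need a search instead.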
There are two more members returned in each adapter info struct, `NumOfSources` and `bPrecisePresentRegionsPreferred`. I'm not exactly sure how I am supposed to handle them, so for now they will always be left zeroed. VidPN and related stuff are just too arcane for me, sorry 😅
However, Paul suggested that I don't actually need to add a new GDI driver function to implement this. If I considered win32u's adapter list as maintained in `sysparams.c` to be the source of truth here, I could avoid calling into the GDI driver and just perform the enumeration within win32u. But with how other D3DKMT functions are currently implemented, this will almost surely cause issues when interacting with some of them; e.g. handles returned by `EnumAdapters2` wouldn't be usable with `D3DKMTQueryVideoMemoryInfo` because that function _does_ forward most of the heavy lifting to the GDI driver, and winex11 would be unaware of what the given handle represents, hence unable to query the appropriate `VkPhysicalDevice`.
(In the short term this could probably be avoided by calling `D3DKMTOpenAdapterFromLuid` for each enumerated LUID but this in turn relies on adapters actually having valid LUIDs assigned. On my current machine, Intel/Nvidia dual GPU laptop running Plasma Wayland, mainline Wine without some patches pulled from Proton seems to have… issues doing this properly…)
On the other hand, if everything in D3DKMT family of functions already went through win32u caches, this would have been much simpler. _Probably._ But because it does not, I feel like I should ask: why was it done like that? If there was a specific reason for avoiding win32u caches, what would be the correct way of implementing `EnumAdapters2` then? Following the way of `OpenAdapterFromLuid` and mapping UUIDs to LUIDs manually via the registry (my first/existing version, which also makes this fully reliant on Vulkan) or rewriting all this to go through win32u adapter cache? If the latter, then shouldn't existing D3DKMT functions perhaps be rewritten to use the cache as well and avoid calling GDI driver/Vulkan?
I'm open to suggestions so please let me know if you have any.
(I'm aware Wine is currently in code freeze, but I wanted to send this to gather some early feedback. I'm fine with waiting a few weeks before this becomes eligible for merging.)
--
https://gitlab.winehq.org/wine/wine/-/merge_requests/4791