- We shouldn't add this to include/vkd3d.h. We can start implementing the shader cache functionality inside libvkd3d, but we shouldn't expose it as a public libvkd3d API at any point. !732 should unblock moving the implementation to libvkd3d-shader.
That works for me. Once the wined3d side gets more mature we'll need to export it somewhere of course.
- Do we really need vkd3d_shader_cache_enumerate(), and do we really need enumeration to happen through a callback? Could we e.g. return a list of keys instead? As the documentation says, the callback runs while holding the cache mutex, and that seems like asking for trouble.
Returning a list of keys sounds doable and would avoid the held lock. I am also implicitly relying on the reentrancy of critical sections to call other get() calls from inside the callback.
I'll see if I find some inspiration in the implementation of iterator objects in the C++ STL.
- vkd3d_shader_cache_put() has a "flags" parameter, but vkd3d_shader_cache_get() doesn't. Are we sure we'll never need it?
I can add one without defining any valid flags for now. I have no particular preference either way. However, see below on your question regarding eviction - we could potentially drop the flags on put() as well.
- Presumably "mem_size" and "disk_size" in struct vkd3d_shader_cache_info are specified in bytes. Being uint32_t's that would make their maximum size about 4 GiB. And while you can store a fair amount of data in 4 GiB, it's perhaps not entirely inconceivable to go over that limit these days. Should we use size_t for these?
32 bit sizes follow Microsoft's API. If we want to make 64 bit limits possible I'd prefer uint64_t over size_t. If we use size_t we have to use 64 bit sizes in the disk files anyway and beware of situations where a 64 bit sized cache is opened in a 32 bit process.
A cache larger than the address space size is hypothetically possible once eviction is implemented.
- vkd3d_shader_cache_open() gets most of its parameters through struct vkd3d_shader_cache_info, but not "name". Seems arbitrary.
I blame Microsoft for that one too, but I am open to changes.
- By convention, vkd3d_shader_cache_open() would be called vkd3d_shader_open_cache(). (We also have e.g. vkd3d_create_instance(), vkd3d_create_device(), vkd3d_shader_free_shader_code(), and so on.)
The thought was to start all shader cache related functions with "vkd3d_shader_cache" - I'll rename _open and double check _close().
- Do we want to give the application control over when the cache should be flushed to disk? Probably not a requirement for the initial version, but it seems useful to have options other than "the cache get serialised to disk when the cache is closed" and "any shader cache API call may cause the cache to get serialised to disk".
Yes, I think we want that. The second game I tested in wined3d (Mass Effect 2) already ran into problems there - it just calls exit() rather than closing the device properly.
The default behavior should probably be something more reliable than "flush only on close". Either flush after every n writes, have a worker thread flush after a time of inactivity (although I don't like the idea of spawning a thread), flush from an _onexit() callback.
Microsoft's cache seems to flush on every write, at least for application-created caches. I don't want to do that in a blocking manner because depending on the hardware doing a disk write and possibly fsync is potentially worse than compiling spir-v.
- Suppose I want to run two instances of e.g. vkd3d-gears simultaneously. How should that work?
Good question. Windows refuses to open an on-disk cache from two different processes with CreateShaderCacheSession. I haven't tested what happens with the driver's own cache, but I guess one of the processes will end up without access to the cache.
And yes, doing share checks is not there right now. Windows has better support for file locking than Unix and the answer is probably somewhere in flock() or CreateFile's dwShareMode.
- How should names get resolved for on-disk caches? Should "cache" and "./cache" be considered the same cache?
I think we should leave it up to the file system semantics. And yes, we might run in a situation where a caller tries to open "c:/Cache", "C:\cache" and "\?\c:\cAcHe" and it fails because the existing cache list doesn't catch that they refer to the same file.
- Presumably we'd like to be able to evict entries at some point. How would that work with e.g. the render pass cache? Would an eviction just vkDestroyRenderPass() an entry that may still be in use by libvkd3d? Also, how would the cache know how to destroy the evicted entry?
For objects that might be referenced somewhere else the caller should essentially disable eviction by setting the limits to ~0u. That'll need an explicit check if we stick to uint32_t. If we use uint64_t we'll run out of address space before eviction kicks in.
I haven't considered a case where a caller may want eviction to happen, but has special requirements for how to free cached objects. We could either just say that's not supported (i.e., if your values are more than just dumb data, set limits to ~0u) or allow to specify an eviction callback.
An eviction callback could also be a way to specify something similar to PUT_FLAGS_NO_REPLACE. By passing a reason for freeing something (eviction or overwrite) and a return value, the callback could reject the attempt to overwrite an existing value.
- Serialising the rbtree in depth-first order, and then inserting one node at a time when reading it back doesn't seem optimal.
- vkd3d_shader_cache_write() doesn't seem very robust. Generally, when rewriting a file like that, you'd first write the new version to a temporary file, sync, and then replace the old file with the name one. Ideally we'd simply avoid completely rewriting the cache each time though.
Yeah I am still not a fan of doing this stuff ourselves, but as discussed previously I am not convinced by existing libraries I looked at.
Partial updates is something I had implemented in a pre-cleanup patch series. I can add it here if you want, but I don't think it is worth making the MR bigger for it.
- I don't think there's much point in carrying around the ID3D12Device9 patch in this MR. You should just upstream that, ideally sooner rather than later.
- That's largely true for the ID3D12ShaderCacheSession tests as well.
Yes, now that we are already at Device8 it isn't much trouble. Back in November bumping it from Device1 to Device9 was much more regression prone.
I'll send stubs + tests in the next days.