Here is a preview of my shader cache work for early comments. It isn't complete, but does successfully cache things.
What's there:

* A new vkd3d API that is used internally for caching, can be used to implement the ID3D12ShaderCacheSession interface, and hopefully be used by wined3d as well
* Simple saving and loading of the cached objects
* It is used to cache render passes, root signatures and pipeline states
What is not yet there:

* Partial cache loading and eviction
* ID3D12ShaderCacheSession - largely because it needs bumping ID3D12Device up to version 9, which may bring unrelated regressions. For this and tests see my "cache-rework" branch
* Cache file compression
* Incremental updates of cache files - right now they are rewritten from scratch on exit
* Loading the cache in an extra thread. The pipeline state creation code will need some refactoring for that
I am not quite happy yet with the two patches that write and reload actual graphics pipelines. The way I am storing the d3d settings aren't quite consistent yet either - in some cases I use the d3d input data as key directly, in others I store them as values attached to a hash value. The latter is usually the case if I need to cross-reference something, e.g. have a link from the pipeline state to the root signature. This kind of setup shows how wined3d can build a chain of linked state though.
There are also known issues with locking, explained in comments in the patches.
-- v3:
DEBUG: Make cache profiling more visible
Revert "simple benchmark"
simple benchmark
tests: Add tests for ID3D12ShaderCacheSession.
vkd3d: Implement ID3D12ShaderCacheSession::SetDeleteOnDestroy.
vkd3d: Implement ID3D12ShaderCacheSession on top of vkd3d_shader_cache.
vkd3d: Add a stub ID3D12ShaderCacheSession implementation.
vkd3d: Stub device methods up to D3D12Device9.
vkd3d: Load cached pipelines in a separate thread.
vkd3d: Try to find a read-only cache in C:\windows\scache
vkd3d: Remove the custom render pass lock.
vkd3d: Cache and preload compute pipelines.
vkd3d: Add some cache efficiency debug code.
vkd3d: Add EXT_pipeline_creation_feedback.
vkd3d: Catch and release graphics pipelines.
vkd3d: Store graphics pipelines in the cache.
vkd3d: Precreate root signatures from cache
vkd3d: Keep root signatures around.
vkd3d: Store render passes in the on-disk cache and recreate them on startup.
vkd3d: Store the VK pipeline cache in an on-disk vkd3d cache.
vkd3d: Keep the application name around.
vkd3d: Add a win32 version of vkd3d_get_program_name.
vkd3d: Basic shader cache writing and reading.
vkd3d: Replace the custom render pass cache with vkd3d_shader_cache.
vkd3d: Implement vkd3d_shader_cache_enumerate.
vkd3d: Add cache locking.
vkd3d: Implement vkd3d_shader_cache_get.
vkd3d: Implement vkd3d_shader_cache_put.
vkd3d: Create and destroy the shader cache tree.
vkd3d: Implement shader_cache_open/close.
vkd3d: Define and stub the shader cache API.
This merge request has too many patches to be relayed via email. Please visit the URL below to see the contents of the merge request. https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541
I've pushed an update of the shader cache patches. Sorry Gio, it is a mix of rebase and new cache work.
It has a big amount of patches now that can be roughly grouped as follows:
1-9: Define and implement the shader cache API itself
10-20: Use the cache to store vkd3d pipelines
21-23: Some refinements we may or may not want (esp 22 is fairly crossover centric)
24-28: Tests
29: Benchmark / debug hacks. We don't want to merge them as such
I think I got out the obvious TODOs, lack-of-error-handling FIXMEs etc. There are still some tasks left before this is ready to be merged (see below), but I don't think they are in the way of another round of review.
Major changes since the last version:
1) I've removed the hash collision return value and now compare the entire key in the tree compare callback. Extra care will need to be taken once we add memory eviction; I have added a comment to that effect. The reason is that wined3d can replace some of its own shader rbtrees with an in-memory cache as long as the cache doesn't throw away data randomly (which a hash collision de facto does).
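To illustrate the point, a compare callback can order by hash first (cheap) and still fall back to the full key, so colliding hashes never alias each other. The struct layout below is a sketch, not the actual vkd3d one:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical entry layout; the real vkd3d struct differs. */
struct cache_entry
{
    uint64_t hash;
    size_t key_size;
    const void *key;
};

/* Order by hash first for cheap comparisons, then by key size, then by
 * the full key bytes, so a hash collision never makes two distinct keys
 * compare equal. */
static int cache_entry_compare(const struct cache_entry *a, const struct cache_entry *b)
{
    if (a->hash != b->hash)
        return a->hash < b->hash ? -1 : 1;
    if (a->key_size != b->key_size)
        return a->key_size < b->key_size ? -1 : 1;
    return memcmp(a->key, b->key, a->key_size);
}
```

With eviction in the picture, the extra care mentioned above is that dropping one of two colliding entries must not be mistaken for dropping both.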
2) Plenty of work for a non-change: no sqlite. I looked into it, and we can certainly use it to write our files, but it is not suitable for replacing our own in-memory data structures. The performance of sqlite for a simple key-value lookup is a lot worse than the rbtree's. Furthermore, it can't stop being a transactional database system that writes to disk at the earliest opportunity. This makes cache writes for on-disk caches slow, although this is a property that native d3dscache shares.
If we write the entire cache in one transaction (i.e., keep our own rbtree and just flush it out in shader_cache_write in one go) it is fast. The sqlite way of integrating it is to put the 8 MB sqlite3.c 'amalgamation' into our source tree, but with our default compiler settings it spews out plenty of warnings about discarded consts and the like. The Wine way would be to put said sqlite3.c into wine/dlls/winsqlite3.dll and link to it. I don't want to add this delay right now, but I can certainly change my mind if there's consensus to do either of those two things.
Another concern with sqlite is the address space use. Its documentation specifies a number of knobs to control its memory consumption, but I didn't look if they actually do what we want.
3) I added a flag to disallow replacing an existing value.
4) I nicked VKD3D_MUTEX_INITIALIZER from !384.
5) Vulkan-style extended structures
---
Some things I didn't do (yet). I don't think they influence review much:
1) Relocate everything to vkd3d-shader. Or figure out the d3dscache.dll API and implement that. It doesn't look complicated. I think we want to sort this out before un-drafting this MR.
2) Size accounting. This is where vkd3d_shader_cache_{add/remove}_item come in in the future.
3) Cache eviction / partial reads. Not a major issue for me at this point, but we want this sorted before using it in wined3d (due to 32 bit concerns).
4) Still using mvk's hashing function. I don't think we need to change away from 64 bit hashes now that the cache always compares the full key.
5) A separate flag for memory only caches vs using a disk size of 0. I don't care, Matteo and Gio fight this out outside of the saloon at high noon :-)
6) Add a flag for a private only cache / make sharing opt-in as suggested by Gio: I have no strong opinion either way. I can't think of a case where we want in-process cache sharing ourselves, otoh it is yet another behavior modifying flag in the API.
I did a fairly high-level review of this. I did notice a number of details as well (e.g. "The output format of a compiled shader." as documentation for enum vkd3d_shader_cache_flags), but have largely left those out for brevity.
- We shouldn't add this to include/vkd3d.h. We can start implementing the shader cache functionality inside libvkd3d, but we shouldn't expose it as a public libvkd3d API at any point. !732 should unblock moving the implementation to libvkd3d-shader.
- Do we really need vkd3d_shader_cache_enumerate(), and do we really need enumeration to happen through a callback? Could we e.g. return a list of keys instead? As the documentation says, the callback runs while holding the cache mutex, and that seems like asking for trouble.
- vkd3d_shader_cache_put() has a "flags" parameter, but vkd3d_shader_cache_get() doesn't. Are we sure we'll never need it?
- Presumably "mem_size" and "disk_size" in struct vkd3d_shader_cache_info are specified in bytes. Being uint32_t's that would make their maximum size about 4 GiB. And while you can store a fair amount of data in 4 GiB, it's perhaps not entirely inconceivable to go over that limit these days. Should we use size_t for these?
- vkd3d_shader_cache_open() gets most of its parameters through struct vkd3d_shader_cache_info, but not "name". Seems arbitrary.
- By convention, vkd3d_shader_cache_open() would be called vkd3d_shader_open_cache(). (We also have e.g. vkd3d_create_instance(), vkd3d_create_device(), vkd3d_shader_free_shader_code(), and so on.)
- Do we want to give the application control over when the cache should be flushed to disk? Probably not a requirement for the initial version, but it seems useful to have options other than "the cache gets serialised to disk when the cache is closed" and "any shader cache API call may cause the cache to get serialised to disk".
- Suppose I want to run two instances of e.g. vkd3d-gears simultaneously. How should that work?
- How should names get resolved for on-disk caches? Should "cache" and "./cache" be considered the same cache?
- vkd3d_shader_cache_compare_key() should use vkd3d_u32_compare(), and introduce vkd3d_u64_compare().
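Presumably mirroring the existing 32-bit helper, the new function could be as simple as the following sketch:

```c
#include <stdint.h>

/* Branch-free three-way comparison, in the style of vkd3d_u32_compare():
 * returns negative, zero, or positive for x < y, x == y, x > y. */
static inline int vkd3d_u64_compare(uint64_t x, uint64_t y)
{
    return (x > y) - (x < y);
}
```

The `(x > y) - (x < y)` idiom avoids the overflow pitfalls of subtracting the operands directly.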
- > ```diff
  > +/* As the name implies this is taken from moltenvk. */
  > +#define MVKHASH_SEED 5381
  > +static inline uint64_t mvkHash64(const uint64_t *pVals, size_t count, uint64_t seed)
  > +{
  > +    uint64_t hash = seed;
  > +    for (size_t i = 0; i < count; ++i)
  > +        hash = ((hash << 5) + hash) ^ pVals[i];
  > +
  > +    return hash;
  > +}
  > ```
Please don't do that; MoltenVK has an incompatible license. There's also no need; this is essentially the djb2a hash, although modified in a questionable way to use a 64-bit "pVals". (Incidentally, note that djb2a is equivalent to FNV-1 with "offset" 5381 and "prime" 33.)
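For what it's worth, a clean-room byte-oriented djb2a is only a handful of lines, so nothing needs to be copied; the function name here is made up:

```c
#include <stddef.h>
#include <stdint.h>

/* Clean-room djb2a: hash = (hash * 33) ^ byte, seeded with 5381.
 * Written from the algorithm description, not from any existing code. */
static uint64_t hash_djb2a(const uint8_t *data, size_t size)
{
    uint64_t hash = 5381;
    size_t i;

    for (i = 0; i < size; ++i)
        hash = ((hash << 5) + hash) ^ data[i];
    return hash;
}
```

Hashing bytes rather than a `uint64_t` array also sidesteps the questionable modification noted above.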
- > ```diff
  > -    struct vkd3d_render_pass_cache render_pass_cache;
  > +    struct vkd3d_shader_cache *render_pass_cache;
  > ```
I think we'd like to instead define struct vkd3d_render_pass_cache like this:

```c
struct vkd3d_render_pass_cache
{
    struct vkd3d_shader_cache cache;
};
```

and retain some type checking.
- Presumably we'd like to be able to evict entries at some point. How would that work with e.g. the render pass cache? Would an eviction just vkDestroyRenderPass() an entry that may still be in use by libvkd3d? Also, how would the cache know how to destroy the evicted entry?
- Serialising the rbtree in depth-first order, and then inserting one node at a time when reading it back doesn't seem optimal.
- vkd3d_shader_cache_write() doesn't seem very robust. Generally, when rewriting a file like that, you'd first write the new version to a temporary file, sync, and then replace the old file with the new one. Ideally we'd simply avoid completely rewriting the cache each time though.
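A sketch of the temporary-file-plus-rename approach (POSIX flavour; the function name and the minimal error handling are illustrative only):

```c
#include <stdio.h>
#include <unistd.h>

/* Write the new cache contents to "<path>.tmp", sync it to disk, and only
 * then atomically replace the old file, so a crash mid-write never leaves
 * a truncated cache behind. */
static int cache_write_file(const char *path, const void *data, size_t size)
{
    char tmp_path[1024];
    FILE *f;

    snprintf(tmp_path, sizeof(tmp_path), "%s.tmp", path);
    if (!(f = fopen(tmp_path, "wb")))
        return -1;
    if (fwrite(data, 1, size, f) != size)
    {
        fclose(f);
        remove(tmp_path);
        return -1;
    }
    fflush(f);
    fsync(fileno(f)); /* Make sure the new contents hit the disk... */
    fclose(f);
    return rename(tmp_path, path); /* ...before replacing the old file. */
}
```

On POSIX, rename() replaces the target atomically; the Windows path would need ReplaceFile() or MoveFileEx() instead.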
- We shouldn't pollute the current working directory with shader cache files. These should go into XDG_CACHE_HOME/vkd3d unless otherwise specified, and the Windows equivalent on Windows.
- > ```diff
  > +HRESULT vkd3d_render_pass_cache_find(struct vkd3d_shader_cache *cache, struct d3d12_device *device,
  > +        const struct vkd3d_render_pass_key *key, VkRenderPass *vk_render_pass);
  > +struct vkd3d_shader_cache *vkd3d_root_signature_cache_init(struct d3d12_device *device);
  > +void vkd3d_root_signature_cache_cleanup(struct vkd3d_shader_cache *cache, struct d3d12_device *device);
  >  HRESULT vkd3d_render_pass_cache_find(struct vkd3d_shader_cache *cache, struct d3d12_device *device,
  >          const struct vkd3d_render_pass_key *key, VkRenderPass *vk_render_pass);
  > ```
That doesn't look right...
- Instead of introducing d3d12_cache_hr_from_vkd3d(), I think we can just add the extra cases to the existing hresult_from_vkd3d_result().
As for next steps:

- I don't think there's much point in carrying around the ID3D12Device9 patch in this MR. You should just upstream that, ideally sooner rather than later.
- That's largely true for the ID3D12ShaderCacheSession tests as well.
- You can then start implementing ID3D12ShaderCacheSession, and make those tests pass, using the vkd3d_shader_cache API as a purely internal API.
- Once that's in place the rest of libvkd3d can start using this for the renderpass cache, the Vulkan pipeline cache, and so on.
- Once we have those working well, we can introduce this as a public API in libvkd3d-shader, for external users like wined3d.
- We shouldn't add this to include/vkd3d.h. We can start implementing the shader cache functionality inside libvkd3d, but we shouldn't expose it as a public libvkd3d API at any point. !732 should unblock moving the implementation to libvkd3d-shader.
That works for me. Once the wined3d side gets more mature we'll need to export it somewhere of course.
- Do we really need vkd3d_shader_cache_enumerate(), and do we really need enumeration to happen through a callback? Could we e.g. return a list of keys instead? As the documentation says, the callback runs while holding the cache mutex, and that seems like asking for trouble.
Returning a list of keys sounds doable and would avoid the held lock. I am also implicitly relying on the reentrancy of critical sections to make other get() calls from inside the callback.
I'll see if I find some inspiration in the implementation of iterator objects in the C++ STL.
- vkd3d_shader_cache_put() has a "flags" parameter, but vkd3d_shader_cache_get() doesn't. Are we sure we'll never need it?
I can add one without defining any valid flags for now. I have no particular preference either way. However, see below on your question regarding eviction - we could potentially drop the flags on put() as well.
- Presumably "mem_size" and "disk_size" in struct vkd3d_shader_cache_info are specified in bytes. Being uint32_t's that would make their maximum size about 4 GiB. And while you can store a fair amount of data in 4 GiB, it's perhaps not entirely inconceivable to go over that limit these days. Should we use size_t for these?
32 bit sizes follow Microsoft's API. If we want to make 64 bit limits possible I'd prefer uint64_t over size_t. If we use size_t we have to use 64 bit sizes in the disk files anyway and beware of situations where a 64 bit sized cache is opened in a 32 bit process.
A cache larger than the address space size is hypothetically possible once eviction is implemented.
- vkd3d_shader_cache_open() gets most of its parameters through struct vkd3d_shader_cache_info, but not "name". Seems arbitrary.
I blame Microsoft for that one too, but I am open to changes.
- By convention, vkd3d_shader_cache_open() would be called vkd3d_shader_open_cache(). (We also have e.g. vkd3d_create_instance(), vkd3d_create_device(), vkd3d_shader_free_shader_code(), and so on.)
The thought was to start all shader cache related functions with "vkd3d_shader_cache" - I'll rename _open and double check _close().
- Do we want to give the application control over when the cache should be flushed to disk? Probably not a requirement for the initial version, but it seems useful to have options other than "the cache gets serialised to disk when the cache is closed" and "any shader cache API call may cause the cache to get serialised to disk".
Yes, I think we want that. The second game I tested in wined3d (Mass Effect 2) already ran into problems there - it just calls exit() rather than closing the device properly.
The default behavior should probably be something more reliable than "flush only on close": either flush after every n writes, have a worker thread flush after a period of inactivity (although I don't like the idea of spawning a thread), or flush from an _onexit() callback.
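The flush-every-n-writes variant needs almost no state; a hypothetical sketch (all names invented here, and the interval is arbitrary):

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative policy: report that a flush is due once every
 * FLUSH_INTERVAL put() calls. */
#define FLUSH_INTERVAL 64

struct cache_flush_state
{
    uint32_t writes_since_flush;
};

/* Called from put(); returns true when the caller should flush now. */
static bool cache_should_flush(struct cache_flush_state *s)
{
    if (++s->writes_since_flush < FLUSH_INTERVAL)
        return false;
    s->writes_since_flush = 0;
    return true;
}
```

A time-of-inactivity policy would need the worker thread mentioned above; this counter-based one does not, at the cost of losing up to FLUSH_INTERVAL-1 entries on an unclean exit.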
Microsoft's cache seems to flush on every write, at least for application-created caches. I don't want to do that in a blocking manner because depending on the hardware doing a disk write and possibly fsync is potentially worse than compiling spir-v.
- Suppose I want to run two instances of e.g. vkd3d-gears simultaneously. How should that work?
Good question. Windows refuses to open an on-disk cache from two different processes with CreateShaderCacheSession. I haven't tested what happens with the driver's own cache, but I guess one of the processes will end up without access to the cache.
And yes, doing share checks is not there right now. Windows has better support for file locking than Unix and the answer is probably somewhere in flock() or CreateFile's dwShareMode.
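On the Unix side, an exclusive-open check could be sketched with flock(); the function name is made up, and real code would need a Windows path via dwShareMode as noted above:

```c
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Take an exclusive advisory lock on the cache file. A second process
 * (or a second open of the same file) attempting the same non-blocking
 * lock fails with EWOULDBLOCK; a shared LOCK_SH lock could instead be
 * used to allow multi-process read-only access. */
static int cache_lock_file(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0644);

    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_EX | LOCK_NB) < 0)
    {
        close(fd);
        return -1; /* Someone else already owns the cache. */
    }
    return fd;
}
```

The lock is released automatically when the descriptor is closed, which conveniently also covers the exit()-without-cleanup case.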
- How should names get resolved for on-disk caches? Should "cache" and "./cache" be considered the same cache?
I think we should leave it up to the file system semantics. And yes, we might run into a situation where a caller tries to open "c:/Cache", "C:\cache" and "\\?\c:\cAcHe" and it fails because the existing cache list doesn't catch that they refer to the same file.
- Presumably we'd like to be able to evict entries at some point. How would that work with e.g. the render pass cache? Would an eviction just vkDestroyRenderPass() an entry that may still be in use by libvkd3d? Also, how would the cache know how to destroy the evicted entry?
For objects that might be referenced somewhere else the caller should essentially disable eviction by setting the limits to ~0u. That'll need an explicit check if we stick to uint32_t. If we use uint64_t we'll run out of address space before eviction kicks in.
I haven't considered a case where a caller may want eviction to happen, but has special requirements for how to free cached objects. We could either just say that's not supported (i.e., if your values are more than just dumb data, set the limits to ~0u) or allow the caller to specify an eviction callback.
An eviction callback could also be a way to specify something similar to PUT_FLAGS_NO_REPLACE. By passing a reason for freeing something (eviction or overwrite) and a return value, the callback could reject the attempt to overwrite an existing value.
- Serialising the rbtree in depth-first order, and then inserting one node at a time when reading it back doesn't seem optimal.
- vkd3d_shader_cache_write() doesn't seem very robust. Generally, when rewriting a file like that, you'd first write the new version to a temporary file, sync, and then replace the old file with the name one. Ideally we'd simply avoid completely rewriting the cache each time though.
Yeah I am still not a fan of doing this stuff ourselves, but as discussed previously I am not convinced by existing libraries I looked at.
Partial updates are something I had implemented in a pre-cleanup patch series. I can add them here if you want, but I don't think it is worth making the MR bigger for it.
- I don't think there's much point in carrying around the ID3D12Device9 patch in this MR. You should just upstream that, ideally sooner rather than later.
- That's largely true for the ID3D12ShaderCacheSession tests as well.
Yes, now that we are already at Device8 it isn't much trouble. Back in November bumping it from Device1 to Device9 was much more regression prone.
I'll send stubs + tests in the next days.
- vkd3d_shader_cache_put() has a "flags" parameter, but vkd3d_shader_cache_get() doesn't. Are we sure we'll never need it?
I can add one without defining any valid flags for now. I have no particular preference either way. However, see below on your question regarding eviction - we could potentially drop the flags on put() as well.
One of the advantages of not exposing anything in a public interface for the moment is that we can let interfaces evolve a bit before we commit to anything.
- Presumably "mem_size" and "disk_size" in struct vkd3d_shader_cache_info are specified in bytes. Being uint32_t's that would make their maximum size about 4 GiB. And while you can store a fair amount of data in 4 GiB, it's perhaps not entirely inconceivable to go over that limit these days. Should we use size_t for these?
32 bit sizes follow Microsoft's API. If we want to make 64 bit limits possible I'd prefer uint64_t over size_t. If we use size_t we have to use 64 bit sizes in the disk files anyway and beware of situations where a 64 bit sized cache is opened in a 32 bit process.
I'd agree with using `uint64_t` rather than `size_t`.
- Do we want to give the application control over when the cache should be flushed to disk? Probably not a requirement for the initial version, but it seems useful to have options other than "the cache gets serialised to disk when the cache is closed" and "any shader cache API call may cause the cache to get serialised to disk".
Yes, I think we want that. The second game I tested in wined3d (Mass Effect 2) already ran into problems there - it just calls exit() rather than closing the device properly.
The default behavior should probably be something more reliable than "flush only on close": either flush after every n writes, have a worker thread flush after a period of inactivity (although I don't like the idea of spawning a thread), or flush from an _onexit() callback.
A worker thread wouldn't seem inappropriate to me. It gives the advantage that writes are quick for the application (because they don't immediately hit the disk) and can be pooled in batches (which is quite a bit faster for the database, because it only syncs at the end of the transaction).
- Suppose I want to run two instances of e.g. vkd3d-gears simultaneously. How should that work?
Good question. Windows refuses to open an on-disk cache from two different processes with CreateShaderCacheSession. I haven't tested what happens with the driver's own cache, but I guess one of the processes will end up without access to the cache.
And yes, doing share checks is not there right now. Windows has better support for file locking than Unix and the answer is probably somewhere in flock() or CreateFile's dwShareMode.
Do we consider it valuable to replicate Windows' defect here? It seems that sharing the cache, at least for reading, is just better.
One of the main reasons I'm pushing for using SQLite (rather than implementing our own disk format) is precisely that I expect it to make dealing with scenarios like this much easier. Two processes can open the same SQLite database with no problem; then each time either process has some new shaders to put in the cache it can begin a transaction, INSERT the shaders and close the transaction. Every now and then the two processes can SELECT to refresh their in-memory idea of which shaders are available in the database, and when they need one it's just another SELECT. No further need to deal with OS-specific details (like locking) or to rewrite the whole file each time and then move it (which helps ensure that the database remains consistent, but means that writes by one concurrent writer are overwritten by the next one).
- How should names get resolved for on-disk caches? Should "cache" and "./cache" be considered the same cache?
I think we should leave it up to the file system semantics. And yes, we might run in a situation where a caller tries to open "c:/Cache", "C:\cache" and "\?\c:\cAcHe" and it fails because the existing cache list doesn't catch that they refer to the same file.
I'm not yet convinced that keeping a global list of all the open caches and deduplicating open calls at that level is a good idea, both for in-memory and disk-backed caches. My feeling is that the caller has better knowledge of what's actually going on and can decide when two caches are really the same cache or not. For example, I suppose that right now you rely on this mechanism to ensure that all the devices created within the same process use the same cache object. Why not open the cache just once, as soon as the first device needs it, and then keep the cache handle around in a global? This problem of comparing file paths goes out the window.
Giovanni Mascellani (@giomasce) commented about include/vkd3d.h:
```diff
+/**
+ * Properties of the opened shader cache.
+ *
+ * \since 1.12
+ */
+struct vkd3d_shader_cache_info
+{
+    enum vkd3d_structure_type type;
+    const void *next;
+    /** Maximum amount of data the cache holds in memory. */
+    uint32_t mem_size;
+    /** Maximum amount of data written to disk. Ignored for VKD3D_SHADER_CACHE_MEMORY_ONLY. */
+    uint32_t disk_size;
+    /** Maximum number of cache entries. */
+    uint32_t max_entries;
```
Does `max_entries` limit entries on disk or in memory?
In general, what is expected to happen when you reach the limits? I guess that on reaching the memory limit you want the cache to pick a bunch of entries, ensure they're already written to disk and then drop them from memory. But what happens when you reach the disk limit? Should you return an error or drop something anyway, on the basis that this is just a cache?
On Fri Mar 22 11:53:47 2024 +0000, Giovanni Mascellani wrote:
> Does `max_entries` limit entries on disk or in memory? In general, what is expected to happen when you reach the limits? I guess that on reaching the memory limit you want the cache to pick a bunch of entries, ensure they're already written to disk and then drop them from memory. But what happens when you reach the disk limit? Should you return an error or drop something anyway, on the basis that this is just a cache?
It just drops. Tbh I have only tested that with a memory only cache in my tests (see commit b7db58f9 - "tests: Add tests for ID3D12ShaderCacheSession.") but I see no reason for disk caches to behave differently. There is no indication that a key/value pair got evicted, other than that you don't find anything when you look up the key.
How does it choose? I guess least recently used, and the tests don't give me a reason to suspect anything else, but it might also be something more advanced.
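For illustration, least-recently-used bookkeeping only needs a per-entry use counter; this toy sketch makes no claim about what native actually does:

```c
#include <stdint.h>

/* Toy LRU bookkeeping: each entry records the tick of its last use, and
 * eviction picks the entry with the smallest tick. Purely illustrative. */
struct toy_entry
{
    uint64_t last_use;
};

static uint64_t lru_tick;

/* Record a get()/put() hit on this entry. */
static void toy_entry_touch(struct toy_entry *e)
{
    e->last_use = ++lru_tick;
}

/* Return the index of the least recently used entry. */
static int pick_eviction_victim(const struct toy_entry *entries, int count)
{
    int i, victim = 0;

    for (i = 1; i < count; ++i)
        if (entries[i].last_use < entries[victim].last_use)
            victim = i;
    return victim;
}
```

A real implementation would more likely keep an intrusive list ordered by use than scan all entries, but the observable behaviour in the tests would be the same.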
Fwiw I extended my tests a bit - native limited the maximum cache disk size to 1GB, although not the memory size. If I try to insert huge items (approx 512 MB) StoreValue() crashes. At larger values (1 GB) it returns an error, and at even larger values (around 2GB) it crashes again.
I think using 64 bit sizes is overengineering. It is easy to do though and doesn't cost much, at least for the total cache size.
> I think using 64 bit sizes is overengineering. It is easy to do though and doesn't cost much, at least for the total cache size.
I think the concern is mostly having problems with ABI compatibility, rather than expecting to use more than 4 GB shortly.
> I blame Microsoft for that one too, but I am open to changes.
We largely don't care what Microsoft did, as long as we can support all the features required to implement ID3D12ShaderCacheSession. I.e., generally the question should be "What makes sense?", rather than "What does Windows do?". That's slightly different from traditional Wine development, yes.
> Microsoft's cache seems to flush on every write, at least for application-created caches. I don't want to do that in a blocking manner because depending on the hardware doing a disk write and possibly fsync is potentially worse than compiling spir-v.
That doesn't have to be a problem per se, but it does imply users of the API would e.g. need to spawn a separate thread to avoid blocking, yes.
> The default behavior should probably be something more reliable than "flush only on close": either flush after every n writes, have a worker thread flush after a period of inactivity (although I don't like the idea of spawning a thread), or flush from an _onexit() callback.

> A worker thread wouldn't seem inappropriate to me. It gives the advantage that writes are quick for the application (because they don't immediately hit the disk) and can be pooled in batches (which is quite a bit faster for the database, because it only syncs at the end of the transaction).
Note though that if we introduce such a thread on the vkd3d-shader side of the API (as opposed to the user side), that would imply a thread per opened cache with the current API design.
> Note though that if we introduce such a thread on the vkd3d-shader side of the API (as opposed to the user side), that would imply a thread per opened cache with the current API design.
I don't think it necessitates that. We could use a worker thread that accepts work items from any cache. That worker API could even use host functionality like RtlQueueWorkItem where available.
That said, using one thread per cache might just be simpler.