[PATCH 0/17] MR541: RFC: Shader cache for vkd3d


Stefan Dösinger (＠stefan)

4 Jan 2024

9:17 p.m.

Here is a preview of my shader cache work for early comments. It isn't complete, but it does successfully cache things.

What's there:
* A new vkd3d API that is used internally for caching. It can be used to implement the ID3D12ShaderCacheSession interface and can hopefully be used by wined3d as well.
* Simple saving and loading of the cached objects.
* The cache is used for render passes, root signatures and pipeline states.

What is not yet there:
* Partial cache loading and eviction.
* ID3D12ShaderCacheSession, largely because it needs bumping ID3D12Device up to version 9, which may bring unrelated regressions. For this and tests see my "cache-rework" branch (which
* Cache file compression.
* Incremental updates of cache files. Right now they are rewritten from scratch on exit.
* Loading the cache in an extra thread. The pipeline state creation code will need some refactoring for that.

I am not quite happy yet with the two patches that write and reload actual graphics pipelines. The way I store the d3d settings isn't quite consistent yet either: in some cases I use the d3d input data as the key directly, in others I store it as a value attached to a hash. The latter is usually the case when I need to cross-reference something, e.g. a link from the pipeline state to the root signature. This kind of setup does show how wined3d could build a chain of linked state, though.

There are also known issues with locking, explained in comments in the patches.

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541


Stefan Dösinger

4 Jan

9:17 p.m.

New subject: [PATCH 01/17] vkd3d: Define and stub the shader cache API.

From: Stefan Dösinger stefan@codeweavers.com

---

Q1: Can I add these includes (stdbool.h and stdint.h) to vkd3d.h, or is that a problem?

Why not in vkd3d-shader? Because vkd3d-shader has no locks/mutexes, and I'd like to do the locking in the shader cache implementation instead of in the caller.

Q2: Since this is not in vkd3d-shader, I could use DXGI_ERROR_* instead of defining more vkd3d_result enums. I kind of like having this independent of DXGI types, though.
---
 Makefile.am           |   1 +
 include/vkd3d.h       | 186 ++++++++++++++++++++++++++++++++++++++++++
 include/vkd3d_types.h |  10 +++
 libs/vkd3d/cache.c    |  60 ++++++++++++++
 libs/vkd3d/vkd3d.map  |   6 ++
 5 files changed, 263 insertions(+)
 create mode 100644 libs/vkd3d/cache.c

diff --git a/Makefile.am b/Makefile.am
index 23e7add3c..e8b3e438a 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -331,6 +331,7 @@ libvkd3d_la_SOURCES = \
     include/vkd3d_d3d12.idl \
     include/vkd3d_d3dcommon.idl \
     include/vkd3d_unknown.idl \
+    libs/vkd3d/cache.c \
     libs/vkd3d/command.c \
     libs/vkd3d/device.c \
     libs/vkd3d/resource.c \
diff --git a/include/vkd3d.h b/include/vkd3d.h
index a3bb8e0dd..9a71e7a57 100644
--- a/include/vkd3d.h
+++ b/include/vkd3d.h
@@ -19,6 +19,8 @@
 #ifndef __VKD3D_H
 #define __VKD3D_H

+#include <stdbool.h>
+#include <stdint.h>
 #include <vkd3d_types.h>

 #ifndef VKD3D_NO_WIN32_TYPES
@@ -187,6 +189,63 @@ struct vkd3d_image_resource_create_info
     D3D12_RESOURCE_STATES present_state;
 };

+struct vkd3d_shader_cache;
+
+/** Flags modifying the behaviour of a shader cache. */
+enum vkd3d_shader_cache_flags
+{
+    /**
+     * No particular behaviour modifications.
+     */
+    VKD3D_SHADER_CACHE_FLAGS_NONE,
+    /**
+     * Don't acquire the cache mutex before access.
+     */
+    VKD3D_SHADER_CACHE_FLAGS_NO_SERIALIZE,
+
+    VKD3D_FORCE_32_BIT_ENUM(VKD3D_SHADER_CACHE_FLAGS),
+};
+
+/**
+ * Properties of a shader cache, passed to vkd3d_shader_cache_open().
+ *
+ * \since 1.10
+ */
+struct vkd3d_shader_cache_desc
+{
+    /** Maximum amount of data the cache holds in memory. */
+    uint32_t mem_size;
+    /** Maximum amount of data written to disk. Set to 0 for a memory-only cache. */
+    uint32_t disk_size;
+    /** Maximum number of cache entries. */
+    uint32_t max_entries;
+    /** A combination of \ref vkd3d_shader_cache_flags. */
+    enum vkd3d_shader_cache_flags flags;
+    /** An application-chosen version number. If the version of an existing
+     * cache on disk does not match, the old data will be discarded. */
+    uint64_t version;
+};
+
+/**
+ * Callback function for vkd3d_shader_cache_enumerate.
+ *
+ * \ref key and \ref value become invalid after the callback returns and must not be freed or modified.
+ *
+ * \param key The application-specified key of the currently enumerated element.
+ *
+ * \param key_size Size of \ref key in bytes.
+ *
+ * \param value The value associated with \ref key.
+ *
+ * \param value_size Size of \ref value in bytes.
+ *
+ * \param context The context parameter passed to \ref vkd3d_shader_cache_enumerate.
+ *
+ * \return true if the enumeration should be continued, false to abort it.
+ */
+typedef bool (vkd3d_shader_cache_traverse_func)(const void *key, uint32_t key_size,
+        const void *value, uint32_t value_size, void *context);
+
 #ifdef LIBVKD3D_SOURCE
 # define VKD3D_API VKD3D_EXPORT
 #else
@@ -282,6 +341,111 @@ VKD3D_API HRESULT vkd3d_create_versioned_root_signature_deserializer(const void
  */
 VKD3D_API void vkd3d_set_log_callback(PFN_vkd3d_log callback);

+/**
+ * Creates a new shader cache or opens an existing one.
+ *
+ * \param name The name of the cache. In case of an on-disk cache, this is a file name. In case of a memory-only
+ * cache, opening the same name again in the same process will return the same vkd3d_shader_cache handle.
+ * Cache handles are reference counted, so vkd3d_shader_cache_close has to be called for each successful
+ * vkd3d_shader_cache_open invocation.
+ *
+ * \param desc Cache properties. See \ref vkd3d_shader_cache_desc.
+ *
+ * \param cache Return pointer of the opened or created cache.
+ *
+ * \return A member of \ref vkd3d_result.
+ *
+ * \since 1.10
+ */
+VKD3D_API int vkd3d_shader_cache_open(const char *name,
+        const struct vkd3d_shader_cache_desc *desc, struct vkd3d_shader_cache **cache);
+
+/**
+ * Decrements the cache reference count, closing it if it falls to zero.
+ *
+ * \param cache The cache to close.
+ *
+ * \since 1.10
+ */
+VKD3D_API void vkd3d_shader_cache_close(struct vkd3d_shader_cache *cache);
+
+/**
+ * Stores a key-value pair in a shader cache.
+ *
+ * \param cache The cache to store the value in.
+ *
+ * \param key An opaque key of key_size bytes. The cache does not parse the key in any way. If the key already
+ * exists, the existing value will be replaced.
+ * FIXME: For some users (e.g. the renderpass cache) it would be interesting to prevent replacement and get
+ * an error instead if the value already exists. Without this they need their own lock to have an atomic
+ * get() - create new object - put() sequence.
+ *
+ * \param key_size The size of \ref key in bytes.
+ *
+ * \param value The value to associate with \ref key.
+ *
+ * \param value_size The size of \ref value in bytes.
+ *
+ * \return A member of \ref vkd3d_result.
+ *
+ * \since 1.10
+ */
+VKD3D_API int vkd3d_shader_cache_put(struct vkd3d_shader_cache *cache,
+        const void *key, uint32_t key_size, const void *value, uint32_t value_size);
+
+/**
+ * Retrieves the stored value associated with a key in a shader cache.
+ *
+ * If the key is found, \ref value_size is set to the size of the value stored in the cache. If \ref value is non-NULL,
+ * and the input value of \ref value_size is equal to or larger than the size of the stored value, the stored value
+ * will be copied to the memory pointed to by \ref value.
+ *
+ * \param cache The cache to retrieve the value from.
+ *
+ * \param key The key to look up.
+ *
+ * \param key_size The size of \ref key in bytes.
+ *
+ * \param value The buffer to write the value to, of size *value_size. This parameter may be NULL.
+ *
+ * \param value_size The size of \ref value in bytes. The size of the stored value will be returned here.
+ *
+ * \return A member of \ref vkd3d_result.
+ *
+ * \since 1.10
+ */
+VKD3D_API int vkd3d_shader_cache_get(struct vkd3d_shader_cache *cache,
+        const void *key, uint32_t key_size, void *value, uint32_t *value_size);
+
+/**
+ * Marks an on-disk shader cache for deletion.
+ *
+ * When the final reference of \ref cache is released, the cache files on disk will be deleted. This function has no
+ * effect on memory-only caches, which are discarded after use in any case.
+ *
+ * \param cache The cache to delete.
+ *
+ * \since 1.10
+ */
+VKD3D_API void vkd3d_shader_cache_delete_on_destroy(struct vkd3d_shader_cache *cache);
+
+/**
+ * Enumerates all key-value pairs in a cache.
+ *
+ * This function invokes \ref cb once for each stored entry. No particular enumeration order is guaranteed. The cache's lock is held
+ * during the entire operation, including when invoking the callback.
+ *
+ * \param cache The cache to enumerate.
+ *
+ * \param cb Callback function, see \ref vkd3d_shader_cache_traverse_func.
+ *
+ * \param context An application-specified pointer that is passed to the callback for each invocation.
+ *
+ * \since 1.10
+ */
+VKD3D_API void vkd3d_shader_cache_enumerate(struct vkd3d_shader_cache *cache,
+        vkd3d_shader_cache_traverse_func *cb, void *context);
+
 #endif /* VKD3D_NO_PROTOTYPES */

 /*
@@ -328,6 +492,28 @@ typedef HRESULT (*PFN_vkd3d_create_versioned_root_signature_deserializer)(const
 /** Type of vkd3d_set_log_callback(). \since 1.4 */
 typedef void (*PFN_vkd3d_set_log_callback)(PFN_vkd3d_log callback);

+/** Type of vkd3d_shader_cache_open(). \since 1.10 */
+typedef int (*PFN_vkd3d_shader_cache_open)(const char *name,
+        const struct vkd3d_shader_cache_desc *desc, struct vkd3d_shader_cache **cache);
+
+/** Type of vkd3d_shader_cache_close(). \since 1.10 */
+typedef void (*PFN_vkd3d_shader_cache_close)(struct vkd3d_shader_cache *cache);
+
+/** Type of vkd3d_shader_cache_put(). \since 1.10 */
+typedef int (*PFN_vkd3d_shader_cache_put)(struct vkd3d_shader_cache *cache,
+        const void *key, uint32_t key_size, const void *value, uint32_t value_size);
+
+/** Type of vkd3d_shader_cache_get(). \since 1.10 */
+typedef int (*PFN_vkd3d_shader_cache_get)(struct vkd3d_shader_cache *cache,
+        const void *key, uint32_t key_size, void *value, uint32_t *value_size);
+
+/** Type of vkd3d_shader_cache_delete_on_destroy(). \since 1.10 */
+typedef void (*PFN_vkd3d_shader_cache_delete_on_destroy)(struct vkd3d_shader_cache *cache);
+
+/** Type of vkd3d_shader_cache_enumerate(). \since 1.10 */
+typedef void (*PFN_vkd3d_shader_cache_enumerate)(struct vkd3d_shader_cache *cache,
+        vkd3d_shader_cache_traverse_func *cb, void *context);
+
 #ifdef __cplusplus
 }
 #endif  /* __cplusplus */
diff --git a/include/vkd3d_types.h b/include/vkd3d_types.h
index 4a7aca236..775a06a06 100644
--- a/include/vkd3d_types.h
+++ b/include/vkd3d_types.h
@@ -51,6 +51,16 @@ enum vkd3d_result
     VKD3D_ERROR_INVALID_SHADER = -4,
     /** The operation is not implemented in this version of vkd3d. */
     VKD3D_ERROR_NOT_IMPLEMENTED = -5,
+    /** The requested shader cache key was not found. */
+    VKD3D_ERROR_NOT_FOUND = -6,
+    /** The requested shader cache value was bigger than the passed buffer. */
+    VKD3D_ERROR_MORE_DATA = -7,
+    /** A different key with the same hash was found in the shader cache. */
+    VKD3D_ERROR_HASH_COLLISSION = -8,
+    /** A shader cache with the same name but different version is already opened. */
+    VKD3D_ERROR_VERSION_MISMATCH = -9,
+    /** The cache lock is contended. */
+    VKD3D_ERROR_LOCK_NOT_AVAILABLE = -10,

     VKD3D_FORCE_32_BIT_ENUM(VKD3D_RESULT),
 };
diff --git a/libs/vkd3d/cache.c b/libs/vkd3d/cache.c
new file mode 100644
index 000000000..51be6f265
--- /dev/null
+++ b/libs/vkd3d/cache.c
@@ -0,0 +1,60 @@
+/*
+ * Copyright 2024 Stefan Dösinger for CodeWeavers
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA
+ */
+
+#include "vkd3d_private.h"
+#include "rbtree.h"
+
+#include <stdarg.h>
+#include <stdio.h>
+
+int vkd3d_shader_cache_open(const char *name,
+        const struct vkd3d_shader_cache_desc *desc, struct vkd3d_shader_cache **cache)
+{
+    FIXME("%s, %p, %p: stub!\n", debugstr_a(name), desc, cache);
+    return VKD3D_ERROR_NOT_IMPLEMENTED;
+}
+
+void vkd3d_shader_cache_close(struct vkd3d_shader_cache *cache)
+{
+    FIXME("Stub!\n");
+}
+
+int vkd3d_shader_cache_put(struct vkd3d_shader_cache *cache,
+        const void *key, uint32_t key_size, const void *value, uint32_t value_size)
+{
+    FIXME("%p, %p, %#x, %p, %#x stub!\n", cache, key, key_size, value, value_size);
+    return VKD3D_ERROR_NOT_IMPLEMENTED;
+}
+
+int vkd3d_shader_cache_get(struct vkd3d_shader_cache *cache,
+        const void *key, uint32_t key_size, void *value, uint32_t *value_size)
+{
+    FIXME("%p, %p, %#x, %p, %p stub!\n", cache, key, key_size, value, value_size);
+    return VKD3D_ERROR_NOT_IMPLEMENTED;
+}
+
+void vkd3d_shader_cache_delete_on_destroy(struct vkd3d_shader_cache *cache)
+{
+    FIXME("Stub!\n");
+}
+
+void vkd3d_shader_cache_enumerate(struct vkd3d_shader_cache *cache,
+        vkd3d_shader_cache_traverse_func *cb, void *context)
+{
+    FIXME("%p, %p, %p: stub!\n", cache, cb, context);
+}
diff --git a/libs/vkd3d/vkd3d.map b/libs/vkd3d/vkd3d.map
index 441b2e35b..9e7bdbe9e 100644
--- a/libs/vkd3d/vkd3d.map
+++ b/libs/vkd3d/vkd3d.map
@@ -23,6 +23,12 @@ global:
     vkd3d_serialize_root_signature;
     vkd3d_serialize_versioned_root_signature;
     vkd3d_set_log_callback;
+    vkd3d_shader_cache_close;
+    vkd3d_shader_cache_delete_on_destroy;
+    vkd3d_shader_cache_enumerate;
+    vkd3d_shader_cache_get;
+    vkd3d_shader_cache_open;
+    vkd3d_shader_cache_put;

 local: *;
 };
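Patch 01 only defines the callback type for vkd3d_shader_cache_enumerate(), and the function itself is still a stub at this point. The following minimal sketch shows a caller-side callback that tallies entries through the context pointer and stops early by returning false. The enumerator here (mock_enumerate(), struct mock_entry, demo_entries) is hypothetical and walks a fixed table. The typedef is copied under a local name so the sketch compiles without vkd3d.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same shape as vkd3d_shader_cache_traverse_func from the patch (local copy). */
typedef bool (traverse_func)(const void *key, uint32_t key_size,
        const void *value, uint32_t value_size, void *context);

/* Hypothetical caller-side context: counts entries and value bytes. */
struct enum_stats
{
    unsigned int count;
    uint64_t value_bytes;
    unsigned int limit; /* stop after this many entries */
};

static bool count_entries(const void *key, uint32_t key_size,
        const void *value, uint32_t value_size, void *context)
{
    struct enum_stats *stats = context;

    (void)key;
    (void)key_size;
    (void)value; /* key/value are only valid during this call; copy if needed */
    ++stats->count;
    stats->value_bytes += value_size;
    return stats->count < stats->limit; /* false asks the enumerator to stop */
}

/* Hypothetical stand-in for a cache: a fixed table of string entries. */
struct mock_entry
{
    const char *key;
    const char *value;
};

static const struct mock_entry demo_entries[] =
{
    {"rs0", "root signature blob"},
    {"ps0", "pipeline blob"},
    {"rp0", "render pass"},
};

/* Calls cb once per entry and honours the documented "false aborts" contract. */
static void mock_enumerate(const struct mock_entry *entries, size_t n, traverse_func *cb, void *context)
{
    for (size_t i = 0; i < n; ++i)
    {
        if (!cb(entries[i].key, (uint32_t)strlen(entries[i].key),
                entries[i].value, (uint32_t)strlen(entries[i].value), context))
            break;
    }
}
```

Note that the enumerate implementation later in this series (patch 07) discards the callback's return value in its rb-tree trampoline. As posted, returning false would therefore not actually stop the walk.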

-- GitLab https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541

Stefan Dösinger

9:17 p.m.

New subject: [PATCH 02/17] vkd3d: Implement shader_cache_open/close.

From: Stefan Dösinger stefan@codeweavers.com

---
 libs/vkd3d/cache.c | 79 ++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 76 insertions(+), 3 deletions(-)

diff --git a/libs/vkd3d/cache.c b/libs/vkd3d/cache.c
index 51be6f265..c82c19090 100644
--- a/libs/vkd3d/cache.c
+++ b/libs/vkd3d/cache.c
@@ -22,16 +22,89 @@
 #include <stdarg.h>
 #include <stdio.h>

+/* List of open caches. I expect the number to be small. */
+static struct list cache_list = LIST_INIT(cache_list);
+static struct vkd3d_mutex cache_list_mutex;
+static LONG cache_mutex_initialized;
+
+struct vkd3d_shader_cache
+{
+    LONG refcount;
+    struct vkd3d_shader_cache_desc desc;
+    struct list cache_list_entry;
+    char name[1];
+};
+
 int vkd3d_shader_cache_open(const char *name,
         const struct vkd3d_shader_cache_desc *desc, struct vkd3d_shader_cache **cache)
 {
-    FIXME("%s, %p, %p: stub!\n", debugstr_a(name), desc, cache);
-    return VKD3D_ERROR_NOT_IMPLEMENTED;
+    struct vkd3d_shader_cache *object;
+    size_t size;
+
+    TRACE("%s, %p, %p.\n", debugstr_a(name), desc, cache);
+
+    if (!name || !desc)
+    {
+        WARN("No name or description, returning VKD3D_ERROR_INVALID_ARGUMENT.\n");
+        return VKD3D_ERROR_INVALID_ARGUMENT;
+    }
+
+    /* FIXME: This isn't thread safe and cache_mutex_initialized might overflow. Do we have
+     * something like DllMain or a platform-independent InitializeOnce? */
+    if (InterlockedIncrement(&cache_mutex_initialized) == 1)
+        vkd3d_mutex_init(&cache_list_mutex);
+
+    vkd3d_mutex_lock(&cache_list_mutex);
+    LIST_FOR_EACH_ENTRY(object, &cache_list, struct vkd3d_shader_cache, cache_list_entry)
+    {
+        if (!strcmp(object->name, name))
+        {
+            TRACE("Found an open cache of name %s.\n", debugstr_a(name));
+            if (object->desc.version != desc->version)
+            {
+                WARN("Version mismatch: %"PRIu64", %"PRIu64".\n", object->desc.version, desc->version);
+                vkd3d_mutex_unlock(&cache_list_mutex);
+                return VKD3D_ERROR_VERSION_MISMATCH;
+            }
+            InterlockedIncrement(&object->refcount);
+            *cache = object;
+            vkd3d_mutex_unlock(&cache_list_mutex);
+            return S_OK;
+        }
+    }
+
+    size = strlen(name) + 1;
+    object = vkd3d_calloc(1, offsetof(struct vkd3d_shader_cache, name[size]));
+    if (!object)
+    {
+        vkd3d_mutex_unlock(&cache_list_mutex);
+        return VKD3D_ERROR_OUT_OF_MEMORY;
+    }
+
+    object->refcount = 1;
+    object->desc = *desc;
+    memcpy(object->name, name, size);
+
+    list_add_head(&cache_list, &object->cache_list_entry);
+    vkd3d_mutex_unlock(&cache_list_mutex);
+
+    *cache = object;
+    return S_OK;
 }

 void vkd3d_shader_cache_close(struct vkd3d_shader_cache *cache)
 {
-    FIXME("Stub!\n");
+    ULONG refcount = InterlockedDecrement(&cache->refcount);
+    TRACE("cache %s refcount %u.\n", cache->name, refcount);
+
+    if (refcount)
+        return;
+
+    vkd3d_mutex_lock(&cache_list_mutex);
+    list_remove(&cache->cache_list_entry);
+    vkd3d_mutex_unlock(&cache_list_mutex);
+
+    vkd3d_free(cache);
 }

int vkd3d_shader_cache_put(struct vkd3d_shader_cache *cache,
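The FIXME in vkd3d_shader_cache_open() asks for a platform-independent one-time initialiser. On POSIX systems pthread_once() provides exactly that, and Windows has InitOnceExecuteOnce(). For a plain POSIX mutex, a statically initialised PTHREAD_MUTEX_INITIALIZER would avoid runtime initialisation altogether. Below is a minimal POSIX-only sketch. The names cache_list_lock(), cache_list_unlock() and cache_list_init_count are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Runs cache_list_init() exactly once, even if many threads race to open a cache. */
static pthread_once_t cache_list_once = PTHREAD_ONCE_INIT;
static pthread_mutex_t cache_list_mutex;
static int cache_list_init_count; /* demonstration only: how often init ran */

static void cache_list_init(void)
{
    pthread_mutex_init(&cache_list_mutex, NULL);
    ++cache_list_init_count;
}

/* Hypothetical helper an open() implementation could call before touching the list. */
static bool cache_list_lock(void)
{
    if (pthread_once(&cache_list_once, cache_list_init))
        return false;
    return !pthread_mutex_lock(&cache_list_mutex);
}

static void cache_list_unlock(void)
{
    pthread_mutex_unlock(&cache_list_mutex);
}
```

A portable version would presumably wrap both primitives behind one helper. No such helper appears in this series.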

-- GitLab https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541

Stefan Dösinger

9:17 p.m.

New subject: [PATCH 03/17] Create and destroy the shader cache tree.

From: Stefan Dösinger stefan@codeweavers.com

---
 libs/vkd3d/cache.c | 47 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)

diff --git a/libs/vkd3d/cache.c b/libs/vkd3d/cache.c
index c82c19090..c1d155746 100644
--- a/libs/vkd3d/cache.c
+++ b/libs/vkd3d/cache.c
@@ -27,14 +27,52 @@ static struct list cache_list = LIST_INIT(cache_list);
 static struct vkd3d_mutex cache_list_mutex;
 static LONG cache_mutex_initialized;

+/* Data structures used in the serialized files. Changing these will break compatibility with
+ * existing cache files, so bump the cache version if doing so.
+ *
+ * We don't intend these files to be read by third party code, so consider them a vkd3d
+ * implementation detail. */
+struct vkd3d_cache_object_v1
+{
+    uint64_t hash;
+    uint32_t offset;     /* Where key + value are located in the .val file. */
+    uint32_t disk_size;  /* Size of the entry in the .val file. May be compressed. */
+    uint32_t key_size;   /* Size of the app provided key. */
+    uint32_t value_size; /* Size of the value. key_size + value_size = uncompressed entry size. */
+};
+
+/* End disk data structures. */
+
 struct vkd3d_shader_cache
 {
     LONG refcount;
     struct vkd3d_shader_cache_desc desc;
     struct list cache_list_entry;
+
+    struct rb_tree tree;
+
     char name[1];
 };

+struct shader_cache_entry
+{
+    struct vkd3d_cache_object_v1 d;
+    struct rb_entry entry; /* Entry in the cache's rb tree, keyed by hash. */
+    uint8_t *payload; /* App key + value. Separate allocation to allow eviction. */
+};
+
+static int vkd3d_shader_cache_compare_key(const void *key, const struct rb_entry *entry)
+{
+    const uint64_t *k = key;
+    const struct shader_cache_entry *e = RB_ENTRY_VALUE(entry, struct shader_cache_entry, entry);
+
+    if (*k < e->d.hash)
+        return -1;
+    if (*k > e->d.hash)
+        return 1;
+    return 0;
+}
+
 int vkd3d_shader_cache_open(const char *name,
         const struct vkd3d_shader_cache_desc *desc, struct vkd3d_shader_cache **cache)
 {
@@ -84,6 +122,7 @@ int vkd3d_shader_cache_open(const char *name,
     object->refcount = 1;
     object->desc = *desc;
     memcpy(object->name, name, size);
+    rb_init(&object->tree, vkd3d_shader_cache_compare_key);

     list_add_head(&cache_list, &object->cache_list_entry);
     vkd3d_mutex_unlock(&cache_list_mutex);
@@ -92,6 +131,12 @@ int vkd3d_shader_cache_open(const char *name,
     return S_OK;
 }

+static void vkd3d_shader_cache_clear(struct rb_entry *entry, void *context)
+{
+    struct shader_cache_entry *e = RB_ENTRY_VALUE(entry, struct shader_cache_entry, entry);
+    vkd3d_free(e);
+}
+
 void vkd3d_shader_cache_close(struct vkd3d_shader_cache *cache)
 {
     ULONG refcount = InterlockedDecrement(&cache->refcount);
@@ -104,6 +149,8 @@ void vkd3d_shader_cache_close(struct vkd3d_shader_cache *cache)
     list_remove(&cache->cache_list_entry);
     vkd3d_mutex_unlock(&cache_list_mutex);

+    rb_destroy(&cache->tree, vkd3d_shader_cache_clear, NULL);
+
     vkd3d_free(cache);
 }
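vkd3d_shader_cache_compare_key() orders tree entries by their 64-bit hash using explicit less-than and greater-than tests. Here is a sketch of the same three-way pattern, exercised through qsort() for illustration. The names compare_hash() and qsort_hash_cmp() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Three-way comparison of 64-bit hashes: negative, zero or positive. */
static int compare_hash(uint64_t a, uint64_t b)
{
    if (a < b)
        return -1;
    if (a > b)
        return 1;
    return 0;
}

/* Adapter so the same ordering can drive qsort() over an array of hashes. */
static int qsort_hash_cmp(const void *a, const void *b)
{
    return compare_hash(*(const uint64_t *)a, *(const uint64_t *)b);
}
```

Returning (int)(a - b) instead would be tempting but wrong for 64-bit keys. The difference wraps and is then narrowed to int, so on common compilers 0x100000000 and 0 would compare as equal. Because the tree is keyed only by the hash, put and get still have to compare the full key to detect collisions, which is what the key_size/memcmp() checks in patch 04 do.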

-- GitLab https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541

Stefan Dösinger

9:17 p.m.

New subject: [PATCH 04/17] vkd3d: Implement vkd3d_shader_cache_put.

From: Stefan Dösinger stefan@codeweavers.com

---

The exit(1) should obviously be removed before this is merged.
---
 libs/vkd3d/cache.c | 122 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 120 insertions(+), 2 deletions(-)

diff --git a/libs/vkd3d/cache.c b/libs/vkd3d/cache.c
index c1d155746..250d3b825 100644
--- a/libs/vkd3d/cache.c
+++ b/libs/vkd3d/cache.c
@@ -73,6 +73,16 @@ static int vkd3d_shader_cache_compare_key(const void *key, const struct rb_entry
     return 0;
 }

+static void vkd3d_shader_cache_add_item(struct vkd3d_shader_cache *cache, struct shader_cache_entry *e)
+{
+    rb_put(&cache->tree, &e->d.hash, &e->entry);
+}
+
+static void vkd3d_shader_cache_remove_item(struct vkd3d_shader_cache *cache, struct shader_cache_entry *e)
+{
+    rb_remove(&cache->tree, &e->entry);
+}
+
 int vkd3d_shader_cache_open(const char *name,
         const struct vkd3d_shader_cache_desc *desc, struct vkd3d_shader_cache **cache)
 {
@@ -154,11 +164,119 @@ void vkd3d_shader_cache_close(struct vkd3d_shader_cache *cache)
     vkd3d_free(cache);
 }

+/* As the name implies this is taken from moltenvk. */
+#define MVKHASH_SEED 5381
+static inline uint64_t mvkHash64(const uint64_t *pVals, size_t count, uint64_t seed)
+{
+    uint64_t hash = seed;
+    for (size_t i = 0; i < count; ++i)
+        hash = ((hash << 5) + hash) ^ pVals[i];
+
+    return hash;
+}
+
+static uint64_t hash_key(const void *key, size_t size)
+{
+    uint64_t last = 0, ret;
+
+    ret = mvkHash64(key, size / sizeof(uint64_t), MVKHASH_SEED);
+    if (size % sizeof(uint64_t))
+    {
+        const char *c = key;
+        /* FIXME: Endianness? */
+        c += align(size, sizeof(uint64_t)) - sizeof(uint64_t);
+        memcpy(&last, c, size % sizeof(uint64_t));
+        ret = mvkHash64(&last, 1, ret);
+    }
+    return ret;
+}
+
+static bool vkd3d_shader_cache_trylock(struct vkd3d_shader_cache *cache)
+{
+    /* Not yet implemented. */
+    return true;
+}
+
+static void vkd3d_shader_cache_unlock(struct vkd3d_shader_cache *cache)
+{
+    /* Not yet implemented. */
+}
+
 int vkd3d_shader_cache_put(struct vkd3d_shader_cache *cache,
         const void *key, uint32_t key_size, const void *value, uint32_t value_size)
 {
-    FIXME("%p, %p, %#x, %p, %#x stub!\n", cache, key, key_size, value, value_size);
-    return VKD3D_ERROR_NOT_IMPLEMENTED;
+    struct shader_cache_entry *e;
+    struct rb_entry *entry;
+    enum vkd3d_result ret;
+    uint64_t hash;
+
+    TRACE("%p, %p, %#x, %p, %#x.\n", cache, key, key_size, value, value_size);
+
+    if (!vkd3d_shader_cache_trylock(cache))
+    {
+        WARN("Cache lock not available.\n");
+        return VKD3D_ERROR_LOCK_NOT_AVAILABLE;
+    }
+
+    hash = hash_key(key, key_size);
+    entry = rb_get(&cache->tree, &hash);
+    e = entry ? RB_ENTRY_VALUE(entry, struct shader_cache_entry, entry) : NULL;
+
+    if (e && (e->d.key_size != key_size || memcmp(e->payload, key, key_size)))
+    {
+        FIXME("Actual case of hash collision found.\n");
+        exit(1);
+    }
+
+    if (e && e->d.value_size >= value_size)
+    {
+        if (e->d.value_size == value_size && !memcmp(e->payload + e->d.key_size, value, value_size))
+        {
+            TRACE("No-op store call, existing item unchanged.\n");
+        }
+        else
+        {
+            e->d.value_size = value_size;
+            memcpy(e->payload + e->d.key_size, value, value_size);
+            TRACE("Cache item %"PRIu64" overwritten.\n", hash);
+        }
+        ret = VKD3D_OK;
+        goto unlock;
+    }
+    else if (e)
+    {
+        vkd3d_free(e->payload);
+        vkd3d_shader_cache_remove_item(cache, e);
+        vkd3d_free(e);
+    }
+
+    e = vkd3d_calloc(1, sizeof(*e));
+    if (!e)
+    {
+        ret = VKD3D_ERROR_OUT_OF_MEMORY;
+        goto unlock;
+    }
+    e->payload = vkd3d_malloc(key_size + value_size);
+    if (!e->payload)
+    {
+        vkd3d_free(e);
+        ret = VKD3D_ERROR_OUT_OF_MEMORY;
+        goto unlock;
+    }
+
+    e->d.key_size = key_size;
+    e->d.value_size = value_size;
+    e->d.hash = hash;
+    memcpy(e->payload, key, key_size);
+    memcpy(e->payload + key_size, value, value_size);
+
+    vkd3d_shader_cache_add_item(cache, e);
+    TRACE("Cache item %"PRIu64" stored.\n", hash);
+    ret = VKD3D_OK;
+
+unlock:
+    vkd3d_shader_cache_unlock(cache);
+    return ret;
 }

int vkd3d_shader_cache_get(struct vkd3d_shader_cache *cache,
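hash_key() passes the caller's key directly to mvkHash64() as a const uint64_t *. A key buffer that is not 8-byte aligned is therefore read through a misaligned pointer, which is undefined behaviour in C and can fault on strict-alignment targets. The sketch below keeps the same djb2-style recurrence and zero-padded tail, but loads each word with memcpy() instead. For any input the patch can read safely, it computes the same values. The names hash_step() and hash_key_unaligned() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MVKHASH_SEED 5381

/* One step of the recurrence used by mvkHash64(): hash * 33 XOR word. */
static uint64_t hash_step(uint64_t hash, uint64_t word)
{
    return ((hash << 5) + hash) ^ word;
}

/* Never dereferences the key as uint64_t, so any key alignment is fine.
 * Words are read in host byte order, like the patch, so hashes are not
 * portable across endianness. */
static uint64_t hash_key_unaligned(const void *key, size_t size)
{
    const unsigned char *bytes = key;
    size_t tail = size % sizeof(uint64_t);
    uint64_t hash = MVKHASH_SEED, word;
    size_t i;

    for (i = 0; i + sizeof(uint64_t) <= size; i += sizeof(uint64_t))
    {
        memcpy(&word, bytes + i, sizeof(word));
        hash = hash_step(hash, word);
    }
    if (tail)
    {
        word = 0; /* zero-pad the trailing partial word */
        memcpy(&word, bytes + size - tail, tail);
        hash = hash_step(hash, word);
    }
    return hash;
}
```

Because the trailing word is zero-padded, a 3-byte all-zero key and an 8-byte all-zero key hash identically. That is harmless only because put and get compare key_size and the key bytes after the tree lookup.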

-- GitLab https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541

Stefan Dösinger

9:17 p.m.

New subject: [PATCH 05/17] vkd3d: Implement vkd3d_shader_cache_get.

From: Stefan Dösinger stefan@codeweavers.com

---
 libs/vkd3d/cache.c | 58 ++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 56 insertions(+), 2 deletions(-)

diff --git a/libs/vkd3d/cache.c b/libs/vkd3d/cache.c
index 250d3b825..c8d77294a 100644
--- a/libs/vkd3d/cache.c
+++ b/libs/vkd3d/cache.c
@@ -282,8 +282,62 @@ unlock:
 int vkd3d_shader_cache_get(struct vkd3d_shader_cache *cache,
         const void *key, uint32_t key_size, void *value, uint32_t *value_size)
 {
-    FIXME("%p, %p, %#x, %p, %p stub!\n", cache, key, key_size, value, value_size);
-    return VKD3D_ERROR_NOT_IMPLEMENTED;
+    struct shader_cache_entry *e;
+    struct rb_entry *entry;
+    enum vkd3d_result ret;
+    uint32_t size_in;
+    uint64_t hash;
+
+    TRACE("%p, %p, %#x, %p, %p.\n", cache, key, key_size, value, value_size);
+
+    if (!vkd3d_shader_cache_trylock(cache))
+    {
+        WARN("Cache lock not available.\n");
+        return VKD3D_ERROR_LOCK_NOT_AVAILABLE;
+    }
+
+    size_in = *value_size;
+
+    hash = hash_key(key, key_size);
+    entry = rb_get(&cache->tree, &hash);
+    if (!entry)
+    {
+        WARN("Entry not found.\n");
+        ret = VKD3D_ERROR_NOT_FOUND;
+        goto unlock;
+    }
+
+    e = RB_ENTRY_VALUE(entry, struct shader_cache_entry, entry);
+    if (key_size != e->d.key_size || memcmp(key, e->payload, key_size))
+    {
+        /* There is a return value for this, but I want to see if this ever happens. */
+        FIXME("Hash collision, key sizes %u, %u, offset %#x, hash %"PRIu64".\n",
+                key_size, e->d.key_size, e->d.offset, e->d.hash);
+        exit(1);
+    }
+
+    *value_size = e->d.value_size;
+    if (!value)
+    {
+        TRACE("Found item, returning needed size %#x.\n", e->d.value_size);
+        ret = VKD3D_OK;
+        goto unlock;
+    }
+
+    if (size_in < e->d.value_size)
+    {
+        WARN("Output buffer is too small, got %#x want %#x.\n", size_in, e->d.value_size);
+        ret = VKD3D_ERROR_MORE_DATA;
+        goto unlock;
+    }
+
+    memcpy(value, e->payload + e->d.key_size, e->d.value_size);
+    ret = VKD3D_OK;
+    TRACE("Returning cached data.\n");
+
+unlock:
+    vkd3d_shader_cache_unlock(cache);
+    return ret;
 }

void vkd3d_shader_cache_delete_on_destroy(struct vkd3d_shader_cache *cache)
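vkd3d_shader_cache_get() supports a size query: with value == NULL it only reports the stored size. A caller that does not know the value size in advance would typically query, allocate, then fetch. The sketch below shows that pattern against a hypothetical single-entry mock_get() that follows the documented contract. The MOCK_* constants mirror the vkd3d_result values from patch 01 but are local stand-ins.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Local stand-ins mirroring the vkd3d_result values used by the patches. */
enum { MOCK_OK = 0, MOCK_ERROR_OUT_OF_MEMORY = -2, MOCK_ERROR_NOT_FOUND = -6, MOCK_ERROR_MORE_DATA = -7 };

/* Hypothetical single-entry store standing in for a real cache. */
struct mock_cache { const char *key; const char *value; };

/* Follows the documented vkd3d_shader_cache_get() contract. */
static int mock_get(const struct mock_cache *cache, const void *key, uint32_t key_size,
        void *value, uint32_t *value_size)
{
    uint32_t stored = (uint32_t)strlen(cache->value);
    uint32_t size_in = *value_size;

    if (key_size != strlen(cache->key) || memcmp(key, cache->key, key_size))
        return MOCK_ERROR_NOT_FOUND;
    *value_size = stored;            /* always report the stored size */
    if (!value)
        return MOCK_OK;              /* size query only */
    if (size_in < stored)
        return MOCK_ERROR_MORE_DATA; /* caller's buffer is too small */
    memcpy(value, cache->value, stored);
    return MOCK_OK;
}

/* Caller-side pattern: query the size, allocate, then fetch. */
static int get_alloc(const struct mock_cache *cache, const void *key, uint32_t key_size,
        void **out, uint32_t *out_size)
{
    uint32_t size = 0;
    void *buf;
    int ret;

    if ((ret = mock_get(cache, key, key_size, NULL, &size)))
        return ret;
    if (!(buf = malloc(size ? size : 1)))
        return MOCK_ERROR_OUT_OF_MEMORY;
    if ((ret = mock_get(cache, key, key_size, buf, &size)))
    {
        free(buf);
        return ret;
    }
    *out = buf;
    *out_size = size;
    return MOCK_OK;
}
```

With a real, shared cache another thread could store a larger value between the two calls. A robust caller would therefore retry on VKD3D_ERROR_MORE_DATA rather than treat it as fatal.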

-- GitLab https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541

Stefan Dösinger

9:17 p.m.

New subject: [PATCH 06/17] Add cache locking.

From: Stefan Dösinger stefan@codeweavers.com

Why is this in the cache rather than in the caller? To allow for future improvements, e.g. reader-writer locks that allow simultaneous reads while iterating over the cache.

Why trylock? If we try to add a new pipeline to the cache while iterating over existing pipelines during startup, it is better to discard that new pipeline than to block until all pipelines are loaded.
---
 libs/vkd3d/cache.c         | 14 +++++++++++---
 libs/vkd3d/vkd3d_private.h | 11 +++++++++++
 2 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/libs/vkd3d/cache.c b/libs/vkd3d/cache.c
index c8d77294a..2b692c343 100644
--- a/libs/vkd3d/cache.c
+++ b/libs/vkd3d/cache.c
@@ -49,6 +49,7 @@ struct vkd3d_shader_cache
     struct vkd3d_shader_cache_desc desc;
     struct list cache_list_entry;

+    struct vkd3d_mutex lock;
     struct rb_tree tree;

     char name[1];
@@ -133,6 +134,7 @@ int vkd3d_shader_cache_open(const char *name,
     object->desc = *desc;
     memcpy(object->name, name, size);
     rb_init(&object->tree, vkd3d_shader_cache_compare_key);
+    vkd3d_mutex_init(&object->lock);

     list_add_head(&cache_list, &object->cache_list_entry);
     vkd3d_mutex_unlock(&cache_list_mutex);
@@ -160,6 +162,7 @@ void vkd3d_shader_cache_close(struct vkd3d_shader_cache *cache)
     vkd3d_mutex_unlock(&cache_list_mutex);

     rb_destroy(&cache->tree, vkd3d_shader_cache_clear, NULL);
+    vkd3d_mutex_destroy(&cache->lock);

     vkd3d_free(cache);
 }
@@ -193,13 +196,18 @@ static uint64_t hash_key(const void *key, size_t size)

 static bool vkd3d_shader_cache_trylock(struct vkd3d_shader_cache *cache)
 {
-    /* Not yet implemented. */
-    return true;
+    if (cache->desc.flags & VKD3D_SHADER_CACHE_FLAGS_NO_SERIALIZE)
+        return true;
+
+    return vkd3d_mutex_trylock(&cache->lock);
 }

 static void vkd3d_shader_cache_unlock(struct vkd3d_shader_cache *cache)
 {
-    /* Not yet implemented. */
+    if (cache->desc.flags & VKD3D_SHADER_CACHE_FLAGS_NO_SERIALIZE)
+        return;
+
+    vkd3d_mutex_unlock(&cache->lock);
 }

 int vkd3d_shader_cache_put(struct vkd3d_shader_cache *cache,
diff --git a/libs/vkd3d/vkd3d_private.h b/libs/vkd3d/vkd3d_private.h
index bf32d04c2..909cd650b 100644
--- a/libs/vkd3d/vkd3d_private.h
+++ b/libs/vkd3d/vkd3d_private.h
@@ -220,6 +220,11 @@ static inline void vkd3d_mutex_lock(struct vkd3d_mutex *lock)
     EnterCriticalSection(&lock->lock);
 }

+static inline bool vkd3d_mutex_trylock(struct vkd3d_mutex *lock)
+{
+    return TryEnterCriticalSection(&lock->lock);
+}
+
 static inline void vkd3d_mutex_unlock(struct vkd3d_mutex *lock)
 {
     LeaveCriticalSection(&lock->lock);
@@ -324,6 +329,12 @@ static inline void vkd3d_mutex_lock(struct vkd3d_mutex *lock)
         ERR("Could not lock the mutex, error %d.\n", ret);
 }

+static inline bool vkd3d_mutex_trylock(struct vkd3d_mutex *lock)
+{
+    /* FIXME: Untested. */
+    return !pthread_mutex_trylock(&lock->lock);
+}
+
 static inline void vkd3d_mutex_unlock(struct vkd3d_mutex *lock)
 {
     int ret;
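The trylock rationale in this patch can be illustrated with POSIX primitives. pthread_mutex_trylock() returns 0 if it took the lock, and an error number (EBUSY) if the mutex is already held. For a default, non-recursive mutex this includes the case where the calling thread holds it. Below is a minimal sketch in which a hypothetical try_store() gives up instead of waiting. The -10 return mirrors VKD3D_ERROR_LOCK_NOT_AVAILABLE.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* POSIX counterpart of TryEnterCriticalSection(): true only if the lock was taken. */
static bool mutex_trylock(pthread_mutex_t *lock)
{
    return !pthread_mutex_trylock(lock);
}

/* Hypothetical put-like operation that drops the store when the lock is busy,
 * rather than blocking behind a long-running holder. */
static int try_store(pthread_mutex_t *lock, int *slot, int value)
{
    if (!mutex_trylock(lock))
        return -10; /* mirrors VKD3D_ERROR_LOCK_NOT_AVAILABLE */
    *slot = value;
    pthread_mutex_unlock(lock);
    return 0;
}
```

Win32 critical sections differ here. They are recursive, so TryEnterCriticalSection() succeeds if the calling thread already owns the section.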

-- GitLab https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541

Stefan Dösinger

9:17 p.m.

New subject: [PATCH 07/17] vkd3d: Implement vkd3d_shader_cache_enumerate.

From: Stefan Dösinger stefan@codeweavers.com

FIXME: Calling put/get from the enumeration callback will deadlock in a Unix build that uses POSIX mutexes instead of Win32 critical sections.
---
 libs/vkd3d/cache.c | 27 ++++++++++++++++++++++++++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/libs/vkd3d/cache.c b/libs/vkd3d/cache.c
index 2b692c343..de08c1b2c 100644
--- a/libs/vkd3d/cache.c
+++ b/libs/vkd3d/cache.c
@@ -194,6 +194,14 @@ static uint64_t hash_key(const void *key, size_t size)
     return ret;
 }

+static void vkd3d_shader_cache_lock(struct vkd3d_shader_cache *cache)
+{
+    if (cache->desc.flags & VKD3D_SHADER_CACHE_FLAGS_NO_SERIALIZE)
+        return;
+
+    vkd3d_mutex_lock(&cache->lock);
+}
+
 static bool vkd3d_shader_cache_trylock(struct vkd3d_shader_cache *cache)
 {
     if (cache->desc.flags & VKD3D_SHADER_CACHE_FLAGS_NO_SERIALIZE)
@@ -353,8 +361,25 @@ void vkd3d_shader_cache_delete_on_destroy(struct vkd3d_shader_cache *cache)
     FIXME("Stub!\n");
 }

+struct vkd3d_shader_cache_enum_ctx
+{
+    vkd3d_shader_cache_traverse_func *cb;
+    void *context;
+};
+
+static void vkd3d_shader_cache_trampoline(struct rb_entry *entry, void *context)
+{
+    struct vkd3d_shader_cache_enum_ctx *ctx = context;
+    struct shader_cache_entry *e = RB_ENTRY_VALUE(entry, struct shader_cache_entry, entry);
+    ctx->cb(e->payload, e->d.key_size, e->payload + e->d.key_size, e->d.value_size, ctx->context);
+}
+
 void vkd3d_shader_cache_enumerate(struct vkd3d_shader_cache *cache,
         vkd3d_shader_cache_traverse_func *cb, void *context)
 {
-    FIXME("%p, %p, %p: stub!\n", cache, cb, context);
+    struct vkd3d_shader_cache_enum_ctx ctx = {cb, context};
+
+    vkd3d_shader_cache_lock(cache);
+    rb_for_each_entry(&cache->tree, vkd3d_shader_cache_trampoline, &ctx);
+    vkd3d_shader_cache_unlock(cache);
 }
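The FIXME on this patch comes from a difference in re-entrancy. vkd3d_shader_cache_enumerate() holds the cache lock while running the callback, and a default POSIX mutex does not let the same thread take it again. One possible approach, sketched here and not taken from the series, is to create the POSIX mutex as PTHREAD_MUTEX_RECURSIVE so it behaves like a critical section. The names demo_cache, demo_get() and demo_enumerate() are hypothetical.

```c
#define _XOPEN_SOURCE 700 /* exposes PTHREAD_MUTEX_RECURSIVE under strict C modes */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Creates a mutex that the owning thread may lock again, like a Win32 critical section. */
static bool recursive_mutex_init(pthread_mutex_t *lock)
{
    pthread_mutexattr_t attr;
    bool ok;

    if (pthread_mutexattr_init(&attr))
        return false;
    ok = !pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE)
            && !pthread_mutex_init(lock, &attr);
    pthread_mutexattr_destroy(&attr);
    return ok;
}

struct demo_cache
{
    pthread_mutex_t lock;
    int value;
};

/* Takes the lock itself, like get()/put() do. */
static int demo_get(struct demo_cache *cache)
{
    int v;

    pthread_mutex_lock(&cache->lock);
    v = cache->value;
    pthread_mutex_unlock(&cache->lock);
    return v;
}

/* Holds the lock for the whole walk and re-enters demo_get() from inside,
 * which would deadlock with a default (non-recursive) mutex. */
static int demo_enumerate(struct demo_cache *cache)
{
    int v;

    pthread_mutex_lock(&cache->lock);
    v = demo_get(cache);
    pthread_mutex_unlock(&cache->lock);
    return v;
}
```

Recursive locking hides re-entrancy instead of forbidding it. A callback that inserts or removes entries during the walk would still be unsafe. The reader-writer lock idea from patch 06 is another direction.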

-- GitLab https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541

Stefan Dösinger

9:17 p.m.

New subject: [PATCH 08/17] vkd3d: Replace the custom render pass cache with vkd3d_shader_cache.

From: Stefan Dösinger stefan@codeweavers.com

---
 libs/vkd3d/device.c        |  4 +-
 libs/vkd3d/state.c         | 81 ++++++++++++++++----------------------
 libs/vkd3d/vkd3d_private.h | 15 ++-----
 3 files changed, 40 insertions(+), 60 deletions(-)

diff --git a/libs/vkd3d/device.c b/libs/vkd3d/device.c index 69a46e918..194cb11d1 100644 --- a/libs/vkd3d/device.c +++ b/libs/vkd3d/device.c @@ -2537,7 +2537,7 @@ static ULONG STDMETHODCALLTYPE d3d12_device_Release(ID3D12Device5 *iface) vkd3d_uav_clear_state_cleanup(&device->uav_clear_state, device); vkd3d_destroy_null_resources(&device->null_resources, device); vkd3d_gpu_va_allocator_cleanup(&device->gpu_va_allocator); - vkd3d_render_pass_cache_cleanup(&device->render_pass_cache, device); + vkd3d_render_pass_cache_cleanup(device->render_pass_cache, device); d3d12_device_destroy_pipeline_cache(device); d3d12_device_destroy_vkd3d_queues(device); vkd3d_desc_object_cache_cleanup(&device->view_desc_cache); @@ -4365,7 +4365,7 @@ static HRESULT d3d12_device_init(struct d3d12_device *device, goto out_cleanup_descriptor_heap_layouts; }

- vkd3d_render_pass_cache_init(&device->render_pass_cache); + device->render_pass_cache = vkd3d_render_pass_cache_init(device); vkd3d_gpu_va_allocator_init(&device->gpu_va_allocator); vkd3d_time_domains_init(device);

diff --git a/libs/vkd3d/state.c b/libs/vkd3d/state.c index 1457ddf9c..b812ea9f7 100644 --- a/libs/vkd3d/state.c +++ b/libs/vkd3d/state.c @@ -1554,13 +1554,12 @@ struct vkd3d_render_pass_entry

STATIC_ASSERT(sizeof(struct vkd3d_render_pass_key) == 48);

-static HRESULT vkd3d_render_pass_cache_create_pass_locked(struct vkd3d_render_pass_cache *cache, +static HRESULT vkd3d_render_pass_cache_create_pass_locked(struct vkd3d_shader_cache *cache, struct d3d12_device *device, const struct vkd3d_render_pass_key *key, VkRenderPass *vk_render_pass) { VkAttachmentReference attachment_references[D3D12_SIMULTANEOUS_RENDER_TARGET_COUNT + 1]; VkAttachmentDescription attachments[D3D12_SIMULTANEOUS_RENDER_TARGET_COUNT + 1]; const struct vkd3d_vk_device_procs *vk_procs = &device->vk_procs; - struct vkd3d_render_pass_entry *entry; unsigned int index, attachment_index; VkSubpassDescription sub_pass_desc; VkRenderPassCreateInfo pass_info; @@ -1568,17 +1567,6 @@ static HRESULT vkd3d_render_pass_cache_create_pass_locked(struct vkd3d_render_pa unsigned int rt_count; VkResult vr;

- if (!vkd3d_array_reserve((void **)&cache->render_passes, &cache->render_passes_size, - cache->render_pass_count + 1, sizeof(*cache->render_passes))) - { - *vk_render_pass = VK_NULL_HANDLE; - return E_OUTOFMEMORY; - } - - entry = &cache->render_passes[cache->render_pass_count]; - - entry->key = *key; - have_depth_stencil = key->depth_enable || key->stencil_enable; rt_count = have_depth_stencil ? key->attachment_count - 1 : key->attachment_count; assert(rt_count <= D3D12_SIMULTANEOUS_RENDER_TARGET_COUNT); @@ -1672,8 +1660,7 @@ static HRESULT vkd3d_render_pass_cache_create_pass_locked(struct vkd3d_render_pa pass_info.pDependencies = NULL; if ((vr = VK_CALL(vkCreateRenderPass(device->vk_device, &pass_info, NULL, vk_render_pass))) >= 0) { - entry->vk_render_pass = *vk_render_pass; - ++cache->render_pass_count; + vkd3d_shader_cache_put(cache, key, sizeof(*key), vk_render_pass, sizeof(*vk_render_pass)); } else { @@ -1684,28 +1671,18 @@ static HRESULT vkd3d_render_pass_cache_create_pass_locked(struct vkd3d_render_pa return hresult_from_vk_result(vr); }

-HRESULT vkd3d_render_pass_cache_find(struct vkd3d_render_pass_cache *cache, - struct d3d12_device *device, const struct vkd3d_render_pass_key *key, VkRenderPass *vk_render_pass) +HRESULT vkd3d_render_pass_cache_find(struct vkd3d_shader_cache *cache, struct d3d12_device *device, + const struct vkd3d_render_pass_key *key, VkRenderPass *vk_render_pass) { - bool found = false; + uint32_t size = sizeof(*vk_render_pass); + enum vkd3d_result ret; HRESULT hr = S_OK; - unsigned int i;

vkd3d_mutex_lock(&device->pipeline_cache_mutex);

- for (i = 0; i < cache->render_pass_count; ++i) - { - struct vkd3d_render_pass_entry *current = &cache->render_passes[i]; + ret = vkd3d_shader_cache_get(device->render_pass_cache, key, sizeof(*key), vk_render_pass, &size);

- if (!memcmp(&current->key, key, sizeof(*key))) - { - *vk_render_pass = current->vk_render_pass; - found = true; - break; - } - } - - if (!found) + if (ret) hr = vkd3d_render_pass_cache_create_pass_locked(cache, device, key, vk_render_pass);

vkd3d_mutex_unlock(&device->pipeline_cache_mutex); @@ -1713,27 +1690,37 @@ HRESULT vkd3d_render_pass_cache_find(struct vkd3d_render_pass_cache *cache, return hr; }

-void vkd3d_render_pass_cache_init(struct vkd3d_render_pass_cache *cache) +struct vkd3d_shader_cache *vkd3d_render_pass_cache_init(struct d3d12_device *device) { - cache->render_passes = NULL; - cache->render_pass_count = 0; - cache->render_passes_size = 0; + struct vkd3d_shader_cache_desc cache_desc = {0}; + struct vkd3d_shader_cache *cache; + enum vkd3d_result ret; + char cache_name[128]; + + cache_desc.mem_size = ~0; + cache_desc.max_entries = ~0; + cache_desc.flags = VKD3D_SHADER_CACHE_FLAGS_NO_SERIALIZE; + sprintf(cache_name, "memory:%p:renderpass", device); + + if ((ret = vkd3d_shader_cache_open(cache_name, &cache_desc, &cache))) + ERR("Failed to create an in-memory cache\n"); + return cache; }

-void vkd3d_render_pass_cache_cleanup(struct vkd3d_render_pass_cache *cache, - struct d3d12_device *device) +static bool vkd3d_rp_cache_cleanup(const void *key, uint32_t key_size, + const void *data, uint32_t data_size, void *context) { + struct d3d12_device *device = context; const struct vkd3d_vk_device_procs *vk_procs = &device->vk_procs; - unsigned int i; - - for (i = 0; i < cache->render_pass_count; ++i) - { - struct vkd3d_render_pass_entry *current = &cache->render_passes[i]; - VK_CALL(vkDestroyRenderPass(device->vk_device, current->vk_render_pass, NULL)); - } + VkRenderPass pass = *(VkRenderPass *)data; + VK_CALL(vkDestroyRenderPass(device->vk_device, pass, NULL)); + return true; +}

- vkd3d_free(cache->render_passes); - cache->render_passes = NULL; +void vkd3d_render_pass_cache_cleanup(struct vkd3d_shader_cache *cache, struct d3d12_device *device) +{ + vkd3d_shader_cache_enumerate(cache, vkd3d_rp_cache_cleanup, device); + vkd3d_shader_cache_close(cache); }

static void d3d12_init_pipeline_state_desc(struct d3d12_pipeline_state_desc *desc) @@ -2889,7 +2876,7 @@ static HRESULT d3d12_graphics_pipeline_state_create_render_pass( key.padding = 0; key.sample_count = graphics->ms_desc.rasterizationSamples;

- return vkd3d_render_pass_cache_find(&device->render_pass_cache, device, &key, vk_render_pass); + return vkd3d_render_pass_cache_find(device->render_pass_cache, device, &key, vk_render_pass); }

static VkLogicOp vk_logic_op_from_d3d12(D3D12_LOGIC_OP op) diff --git a/libs/vkd3d/vkd3d_private.h b/libs/vkd3d/vkd3d_private.h index 909cd650b..7bbb831e6 100644 --- a/libs/vkd3d/vkd3d_private.h +++ b/libs/vkd3d/vkd3d_private.h @@ -543,17 +543,10 @@ struct vkd3d_render_pass_key

struct vkd3d_render_pass_entry;

-struct vkd3d_render_pass_cache -{ - struct vkd3d_render_pass_entry *render_passes; - size_t render_pass_count; - size_t render_passes_size; -}; - -void vkd3d_render_pass_cache_cleanup(struct vkd3d_render_pass_cache *cache, struct d3d12_device *device); -HRESULT vkd3d_render_pass_cache_find(struct vkd3d_render_pass_cache *cache, struct d3d12_device *device, +struct vkd3d_shader_cache *vkd3d_render_pass_cache_init(struct d3d12_device *device); +void vkd3d_render_pass_cache_cleanup(struct vkd3d_shader_cache *cache, struct d3d12_device *device); +HRESULT vkd3d_render_pass_cache_find(struct vkd3d_shader_cache *cache, struct d3d12_device *device, const struct vkd3d_render_pass_key *key, VkRenderPass *vk_render_pass); -void vkd3d_render_pass_cache_init(struct vkd3d_render_pass_cache *cache);

struct vkd3d_private_store { @@ -1785,7 +1778,7 @@ struct d3d12_device bool worker_should_exit;

struct vkd3d_mutex pipeline_cache_mutex; - struct vkd3d_render_pass_cache render_pass_cache; + struct vkd3d_shader_cache *render_pass_cache; VkPipelineCache vk_pipeline_cache;

VkPhysicalDeviceMemoryProperties memory_properties;
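After this patch, `vkd3d_render_pass_cache_find()` is a get, then create-and-put on a miss, all under `device->pipeline_cache_mutex`, with the raw bytes of `struct vkd3d_render_pass_key` as the lookup key. Byte-wise keys only work if every byte of the struct is defined, which is presumably why the key carries an explicit `padding` member that the caller zeroes, together with the STATIC_ASSERT on its size. A minimal single-threaded sketch of that find-or-create pattern follows; `toy_pass_key`, `toy_store` and the fake handle counter are hypothetical stand-ins for the key, the shader cache and `vkCreateRenderPass()`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical key: no implicit padding, and the explicit padding byte is
 * zeroed by the caller, so memcmp() is a valid equality test. */
struct toy_pass_key
{
    uint32_t attachment_count;
    uint8_t depth_enable, stencil_enable, depth_write, padding;
    uint32_t sample_count;
};

struct toy_entry
{
    struct toy_pass_key key;
    uint64_t handle;
};

struct toy_store
{
    struct toy_entry entries[8];
    size_t count;
    unsigned int create_calls;
};

static bool toy_store_get(const struct toy_store *s, const struct toy_pass_key *key, uint64_t *handle)
{
    for (size_t i = 0; i < s->count; ++i)
    {
        if (!memcmp(&s->entries[i].key, key, sizeof(*key)))
        {
            *handle = s->entries[i].handle;
            return true;
        }
    }
    return false;
}

/* Stand-in for vkCreateRenderPass(): hands out a fresh fake handle. */
static uint64_t toy_create_pass(struct toy_store *s)
{
    return 0x1000 + ++s->create_calls;
}

/* Get, else create and put. In vkd3d the whole sequence runs under
 * device->pipeline_cache_mutex, so two threads cannot both create a pass
 * for the same key. */
static bool toy_find_or_create(struct toy_store *s, const struct toy_pass_key *key, uint64_t *handle)
{
    if (toy_store_get(s, key, handle))
        return true;
    if (s->count == sizeof(s->entries) / sizeof(s->entries[0]))
        return false;
    *handle = toy_create_pass(s);
    s->entries[s->count].key = *key;
    s->entries[s->count].handle = *handle;
    ++s->count;
    return true;
}
```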


Stefan Dösinger

9:17 p.m.

New subject: [PATCH 09/17] vkd3d: Basic shader cache writing and reading.

From: Stefan Dösinger stefan@codeweavers.com

---
 libs/vkd3d/cache.c | 217 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 217 insertions(+)

diff --git a/libs/vkd3d/cache.c b/libs/vkd3d/cache.c index de08c1b2c..6e16ecfb5 100644 --- a/libs/vkd3d/cache.c +++ b/libs/vkd3d/cache.c @@ -32,6 +32,21 @@ static LONG cache_mutex_initialized; * * We don't intend these files to be read by third party code, so consider them a vkd3d * implementation detail. */ + +/* TODO: Endianness of all these uints. */ + +/* VKD3DSHC */ +#define VKD3D_SHADER_CACHE_MAGIC 0x564B443344534843ull +#define VKD3D_SHADER_CACHE_VERSION ((uint64_t)1) + +struct vkd3d_cache_header_v1 +{ + uint64_t magic; + uint64_t struct_size; + uint64_t vkd3d_version; + uint64_t app_version; +}; + struct vkd3d_cache_object_v1 { uint64_t hash; @@ -52,6 +67,8 @@ struct vkd3d_shader_cache struct vkd3d_mutex lock; struct rb_tree tree;

+ FILE *indices, *values; + char name[1]; };

@@ -84,6 +101,136 @@ static void vkd3d_shader_cache_remove_item(struct vkd3d_shader_cache *cache, str rb_remove(&cache->tree, &e->entry); }

+static bool vkd3d_shader_cache_read_entry(struct vkd3d_shader_cache *cache, struct shader_cache_entry *e) +{ + size_t len; + + TRACE("reading object key len %u, data %u.\n", e->d.key_size, e->d.value_size); + /* TODO: Check if the read size makes sense - is it smaller than the requested + * max size, is it smaller than the file on the disk etc. */ + e->payload = vkd3d_malloc(e->d.key_size + e->d.value_size); + if (!e->payload) + { + WARN("Out of memory.\n"); + return false; + } + + if (e->d.disk_size != e->d.key_size + e->d.value_size) + ERR("How do I get a compressed object before implementing compression?\n"); + + fseek(cache->values, e->d.offset, SEEK_SET); + len = fread(e->payload, e->d.key_size + e->d.value_size, 1, cache->values); + if (len != 1) + { + /* I suppose this could be handled better. */ + ERR("Failed to read cached object data len %u offset %u.\n", + e->d.key_size + e->d.value_size, e->d.offset); + vkd3d_free(e->payload); + return false; + } + + return true; +} + +static void vkd3d_shader_cache_read(struct vkd3d_shader_cache *cache) +{ + struct shader_cache_entry *e = NULL; + struct vkd3d_cache_header_v1 hdr; + char *filename; + FILE *indices; + size_t len; + + filename = vkd3d_malloc(strlen(cache->name) + 5); + + sprintf(filename, "%s.val", cache->name); + cache->values = fopen(filename, "r+b"); + if (!cache->values) + { + cache->values = fopen(filename, "w+b"); + if (!cache->values) + { + WARN("Value file %s not found and could not be created.\n", filename); + cache->desc.disk_size = 0; /* Convert to mem only. */ + vkd3d_free(filename); + return; + } + } + + sprintf(filename, "%s.idx", cache->name); + indices = fopen(filename, "rb"); + if (!indices) + { + /* This happens when the cache files did not exist. Keep the opened + * values file, we'll use it later.
*/ + WARN("Index file %s not found.\n", filename); + vkd3d_free(filename); + return; + } + + vkd3d_free(filename); + + TRACE("Reading cache %s.{idx, val}.\n", cache->name); + + len = fread(&hdr, sizeof(hdr), 1, indices); + if (len != 1) + { + WARN("Failed to read cache header.\n"); + goto done; + } + if (hdr.magic != VKD3D_SHADER_CACHE_MAGIC) + { + WARN("Invalid cache magic.\n"); + goto done; + } + if (hdr.struct_size < sizeof(hdr)) + { + WARN("Invalid cache header size.\n"); + goto done; + } + if (hdr.vkd3d_version != VKD3D_SHADER_CACHE_VERSION) + { + WARN("vkd3d shader version mismatch: Got %"PRIu64", want %"PRIu64".\n", + hdr.vkd3d_version, VKD3D_SHADER_CACHE_VERSION); + goto done; + } + if (hdr.app_version != cache->desc.version) + { + WARN("Application version mismatch: Cache has %"PRIu64", app wants %"PRIu64".\n", + hdr.app_version, cache->desc.version); + goto done; + } + + while (!feof(indices)) + { + e = vkd3d_calloc(1, sizeof(*e)); + if (!e) + { + WARN("Alloc fail.\n"); + break; + } + + len = fread(&e->d, sizeof(e->d), 1, indices); + if (len != 1) + { + if (!feof(indices)) + ERR("Failed to read object header.\n"); + break; + } + + if (!vkd3d_shader_cache_read_entry(cache, e)) + break; + + vkd3d_shader_cache_add_item(cache, e); + + TRACE("Loaded an item.\n"); + e = NULL; + } + +done: + vkd3d_free(e); + fclose(indices); +} + int vkd3d_shader_cache_open(const char *name, const struct vkd3d_shader_cache_desc *desc, struct vkd3d_shader_cache **cache) { @@ -139,6 +286,9 @@ int vkd3d_shader_cache_open(const char *name, list_add_head(&cache_list, &object->cache_list_entry); vkd3d_mutex_unlock(&cache_list_mutex);

+ if (desc->disk_size) + vkd3d_shader_cache_read(object); + *cache = object; return S_OK; } @@ -149,6 +299,70 @@ static void vkd3d_shader_cache_clear(struct rb_entry *entry, void *context) vkd3d_free(e); }

+struct write_context +{ + struct vkd3d_shader_cache *cache; + FILE *indices; +}; + +static void vkd3d_shader_cache_write_entry(struct rb_entry *entry, void *context) +{ + struct shader_cache_entry *e = RB_ENTRY_VALUE(entry, struct shader_cache_entry, entry); + struct write_context *ctx = context; + struct vkd3d_shader_cache *cache = ctx->cache; + + /* TODO: Compress the data. */ + e->d.disk_size = e->d.key_size + e->d.value_size; + e->d.offset = ftell(cache->values); + + fwrite(&e->d, sizeof(e->d), 1, ctx->indices); + fwrite(e->payload, e->d.disk_size, 1, cache->values); +} + +static void vkd3d_shader_cache_write(struct vkd3d_shader_cache *cache) +{ + struct vkd3d_cache_header_v1 hdr; + struct write_context ctx; + char *filename; + + fseek(cache->values, 0, SEEK_END); + + filename = vkd3d_malloc(strlen(cache->name) + 5); + /* For now unconditionally repack. */ + if (1) + { + fclose(cache->values); + sprintf(filename, "%s.val", cache->name); + cache->values = fopen(filename, "w+b"); + if (!cache->values) + ERR("Reopen fail\n"); + } + + sprintf(filename, "%s.idx", cache->name); + ctx.indices = fopen(filename, "wb"); + if (!ctx.indices) + { + WARN("Failed to open %s\n", filename); + vkd3d_free(filename); + return; + } + vkd3d_free(filename); + + ctx.cache = cache; + hdr.magic = VKD3D_SHADER_CACHE_MAGIC; + hdr.struct_size = sizeof(hdr); + hdr.vkd3d_version = VKD3D_SHADER_CACHE_VERSION; + hdr.app_version = cache->desc.version; + + fwrite(&hdr, sizeof(hdr), 1, ctx.indices); + + rb_for_each_entry(&cache->tree, vkd3d_shader_cache_write_entry, &ctx); + + fseek(cache->values, 0, SEEK_END); + fclose(cache->values); + fclose(ctx.indices); +} + void vkd3d_shader_cache_close(struct vkd3d_shader_cache *cache) { ULONG refcount = InterlockedDecrement(&cache->refcount); @@ -157,6 +371,9 @@ void vkd3d_shader_cache_close(struct vkd3d_shader_cache *cache) if (refcount) return;

+ if (cache->desc.disk_size) + vkd3d_shader_cache_write(cache); + vkd3d_mutex_lock(&cache_list_mutex); list_remove(&cache->cache_list_entry); vkd3d_mutex_unlock(&cache_list_mutex);
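The on-disk layout introduced here is two files: `<name>.idx`, which starts with a `struct vkd3d_cache_header_v1` followed by one fixed-size record per entry, and `<name>.val`, which holds each entry's key bytes immediately followed by its value bytes at the offset stored in the record. The full `vkd3d_cache_object_v1` definition is not visible in this patch, so the sketch below uses a hypothetical `toy_record` whose field names follow the ones the patch accesses (`offset`, `key_size`, `value_size`, `disk_size`). It writes a header and two entries to `tmpfile()` streams, then re-reads them with the same header checks as `vkd3d_shader_cache_read()`. Like the patch, it writes integers in host byte order, which is what the endianness TODO refers to: `0x564B443344534843` spells "VKD3DSHC" most significant byte first, so a little-endian host stores it on disk as "CHSD3DKV".

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MAGIC 0x564B443344534843ull /* same value as VKD3D_SHADER_CACHE_MAGIC */
#define TOY_VERSION 1ull

struct toy_header { uint64_t magic, struct_size, vkd3d_version, app_version; };

/* Hypothetical index record; the real vkd3d_cache_object_v1 may differ. */
struct toy_record { uint64_t offset; uint32_t key_size, value_size, disk_size, reserved; };

/* Append key+value to the values file and a record to the index file. */
static bool toy_write_entry(FILE *idx, FILE *val, const void *key, uint32_t key_size,
        const void *value, uint32_t value_size)
{
    struct toy_record r = {0};
    long pos;

    if (fseek(val, 0, SEEK_END) || (pos = ftell(val)) < 0)
        return false;
    r.offset = (uint64_t)pos;
    r.key_size = key_size;
    r.value_size = value_size;
    r.disk_size = key_size + value_size; /* uncompressed, as in the patch */
    return fwrite(&r, sizeof(r), 1, idx) == 1
            && fwrite(key, key_size, 1, val) == 1
            && fwrite(value, value_size, 1, val) == 1;
}

/* Same checks as vkd3d_shader_cache_read(): magic, size, both versions. */
static bool toy_check_header(FILE *idx, uint64_t app_version)
{
    struct toy_header hdr;

    if (fread(&hdr, sizeof(hdr), 1, idx) != 1)
        return false;
    return hdr.magic == TOY_MAGIC && hdr.struct_size >= sizeof(hdr)
            && hdr.vkd3d_version == TOY_VERSION && hdr.app_version == app_version;
}

/* Read the next record and its payload; returns a malloc'ed key+value
 * buffer, or NULL at the end of the index or on error. */
static unsigned char *toy_read_entry(FILE *idx, FILE *val, struct toy_record *r)
{
    unsigned char *payload;

    if (fread(r, sizeof(*r), 1, idx) != 1)
        return NULL;
    if (!(payload = malloc(r->disk_size)))
        return NULL;
    if (fseek(val, (long)r->offset, SEEK_SET) || fread(payload, r->disk_size, 1, val) != 1)
    {
        free(payload);
        return NULL;
    }
    return payload;
}
```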


Stefan Dösinger

9:17 p.m.

New subject: [PATCH 10/17] Add a win32 version of vkd3d_get_program_name.

From: Stefan Dösinger stefan@codeweavers.com

Taken from wined3d_get_app_name.

---
 libs/vkd3d/utils.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/libs/vkd3d/utils.c b/libs/vkd3d/utils.c
index 5ebe1b63e..296460615 100644
--- a/libs/vkd3d/utils.c
+++ b/libs/vkd3d/utils.c
@@ -879,6 +879,33 @@ bool vkd3d_get_program_name(char program_name[PATH_MAX])
     return true;
 }
 
+#elif defined(WIN32)
+
+bool vkd3d_get_program_name(char program_name[PATH_MAX])
+{
+    char buffer[MAX_PATH];
+    unsigned int len;
+    char *p, *name;
+
+    *program_name = '\0';
+    len = GetModuleFileNameA(0, buffer, ARRAY_SIZE(buffer));
+    if (!(len && len < MAX_PATH))
+        return false;
+
+    name = buffer;
+    if ((p = strrchr(name, '/' )))
+        name = p + 1;
+    if ((p = strrchr(name, '\\')))
+        name = p + 1;
+
+    len = strlen(name) + 1;
+    if (PATH_MAX < len)
+        return false;
+
+    memcpy(program_name, name, len);
+    return true;
+}
+
 #else
 
 bool vkd3d_get_program_name(char program_name[PATH_MAX])
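`GetModuleFileNameA()` returns the full path of the executable; the function then keeps only what follows the last `/` and, within that remainder, the last `\\`, so paths using either separator style (or a mix) reduce to the bare file name. The same logic can be exercised off Windows with the portable sketch below; `path_basename()` and `copy_program_name()` are hypothetical helper names, not vkd3d functions.

```c
#include <stddef.h>
#include <string.h>

/* Return a pointer to the file-name component of path, accepting both '/'
 * and '\\' as separators, in the same order as the patch's strrchr() pair. */
static const char *path_basename(const char *path)
{
    const char *p, *name = path;

    if ((p = strrchr(name, '/')))
        name = p + 1;
    if ((p = strrchr(name, '\\')))
        name = p + 1;
    return name;
}

/* Copy the basename into a fixed buffer, failing if it does not fit
 * (the patch performs the same check against PATH_MAX). */
static int copy_program_name(char *dst, size_t dst_size, const char *path)
{
    const char *name = path_basename(path);
    size_t len = strlen(name) + 1;

    if (len > dst_size)
        return 0;
    memcpy(dst, name, len);
    return 1;
}
```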


Stefan Dösinger

9:17 p.m.

New subject: [PATCH 11/17] vkd3d: Keep the application name around.

From: Stefan Dösinger stefan@codeweavers.com

---
 libs/vkd3d/device.c        | 4 ++++
 libs/vkd3d/vkd3d_private.h | 1 +
 2 files changed, 5 insertions(+)

diff --git a/libs/vkd3d/device.c b/libs/vkd3d/device.c
index 194cb11d1..63f07fd46 100644
--- a/libs/vkd3d/device.c
+++ b/libs/vkd3d/device.c
@@ -583,6 +583,7 @@ static HRESULT vkd3d_instance_init(struct vkd3d_instance *instance,
     application_info.apiVersion = VK_API_VERSION_1_0;
     instance->api_version = VKD3D_API_VERSION_1_0;
 
+    application_info.pApplicationName = "";
     if ((vkd3d_application_info = vkd3d_find_struct(create_info->next, APPLICATION_INFO)))
     {
         if (vkd3d_application_info->application_name)
@@ -602,6 +603,9 @@ static HRESULT vkd3d_instance_init(struct vkd3d_instance *instance,
         application_info.pApplicationName = application_name;
     }
 
+    strncpy(instance->application_name, application_info.pApplicationName,
+            ARRAY_SIZE(instance->application_name));
+    instance->application_name[ARRAY_SIZE(instance->application_name) - 1] = '\0';
     TRACE("Application: %s.\n", debugstr_a(application_info.pApplicationName));
     TRACE("vkd3d API version: %u.\n", instance->api_version);
 
diff --git a/libs/vkd3d/vkd3d_private.h b/libs/vkd3d/vkd3d_private.h
index 7bbb831e6..52d72e3d0 100644
--- a/libs/vkd3d/vkd3d_private.h
+++ b/libs/vkd3d/vkd3d_private.h
@@ -191,6 +191,7 @@ struct vkd3d_instance
     uint64_t host_ticks_per_second;
 
     LONG refcount;
+    char application_name[PATH_MAX];
 };
 
 #ifdef _WIN32
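`strncpy()` does not NUL-terminate the destination when the source is at least as long as the size limit, which is why the patch writes `'\0'` into the last element of `application_name` itself. Defaulting `pApplicationName` to `""` presumably serves to guarantee that `strncpy()` never receives a NULL source when no name could be determined. A minimal sketch of the same bounded-copy idiom, with `copy_name()` as a hypothetical helper:

```c
#include <stddef.h>
#include <string.h>

/* Bounded copy that always leaves dst NUL-terminated, mirroring the
 * strncpy() + explicit terminator pair in the patch. A NULL src is treated
 * as an empty string here; the patch instead defaults the source to "". */
static void copy_name(char *dst, size_t dst_size, const char *src)
{
    if (!dst_size)
        return;
    strncpy(dst, src ? src : "", dst_size);
    dst[dst_size - 1] = '\0'; /* strncpy() leaves dst unterminated if src is too long */
}
```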


Stefan Dösinger

9:17 p.m.

New subject: [PATCH 12/17] vkd3d: Store the VK pipeline cache in an on-disk vkd3d cache.

From: Stefan Dösinger stefan@codeweavers.com

---
 libs/vkd3d/device.c        | 78 ++++++++++++++++++++++++++++++++++++--
 libs/vkd3d/vkd3d_private.h | 32 ++++++++++++++++
 2 files changed, 107 insertions(+), 3 deletions(-)

diff --git a/libs/vkd3d/device.c b/libs/vkd3d/device.c index 63f07fd46..46803c35c 100644 --- a/libs/vkd3d/device.c +++ b/libs/vkd3d/device.c @@ -19,8 +19,17 @@ #include "vkd3d_private.h" #include "vkd3d_version.h"

+#ifdef HAVE_UNISTD_H +#include <unistd.h> +#endif + #define VKD3D_MAX_UAV_CLEAR_DESCRIPTORS_PER_TYPE 256u

+/* FIXME: We may want to put the GPU and driver identities in there, + * although under which conditions the pipeline cache can be transferred + * from one GPU/driver to another is a Vulkan implementation detail. */ +static const char vk_pipeline_cache_key[] = "vk_pipeline_cache"; + struct vkd3d_struct { enum vkd3d_structure_type type; @@ -2058,15 +2067,26 @@ static HRESULT d3d12_device_init_pipeline_cache(struct d3d12_device *device) { const struct vkd3d_vk_device_procs *vk_procs = &device->vk_procs; VkPipelineCacheCreateInfo cache_info; + struct vkd3d_shader_cache_vk_blob *cache_data = NULL; + uint32_t cache_size = 0; VkResult vr;

vkd3d_mutex_init(&device->pipeline_cache_mutex);

+ if (!vkd3d_shader_cache_get(device->persistent_cache, vk_pipeline_cache_key, + sizeof(vk_pipeline_cache_key), NULL, &cache_size)) + { + cache_data = vkd3d_malloc(cache_size); + vkd3d_shader_cache_get(device->persistent_cache, vk_pipeline_cache_key, + sizeof(vk_pipeline_cache_key), cache_data, &cache_size); + cache_size -= offsetof(struct vkd3d_shader_cache_vk_blob, blob[0]); + } + cache_info.sType = VK_STRUCTURE_TYPE_PIPELINE_CACHE_CREATE_INFO; cache_info.pNext = NULL; cache_info.flags = 0; - cache_info.initialDataSize = 0; - cache_info.pInitialData = NULL; + cache_info.initialDataSize = cache_data ? cache_size : 0; + cache_info.pInitialData = cache_data ? cache_data->blob : NULL; if ((vr = VK_CALL(vkCreatePipelineCache(device->vk_device, &cache_info, NULL, &device->vk_pipeline_cache))) < 0) { @@ -2074,15 +2094,40 @@ static HRESULT d3d12_device_init_pipeline_cache(struct d3d12_device *device) device->vk_pipeline_cache = VK_NULL_HANDLE; }

+ vkd3d_free(cache_data); + return S_OK; }

static void d3d12_device_destroy_pipeline_cache(struct d3d12_device *device) { const struct vkd3d_vk_device_procs *vk_procs = &device->vk_procs; + struct vkd3d_shader_cache_vk_blob *cache_data = NULL; + size_t cache_size = 0; + VkResult vr;

if (device->vk_pipeline_cache) + { + vr = VK_CALL(vkGetPipelineCacheData(device->vk_device, device->vk_pipeline_cache, &cache_size, NULL)); + if (vr == VK_SUCCESS && cache_size) + cache_data = vkd3d_malloc(offsetof(struct vkd3d_shader_cache_vk_blob, blob[cache_size])); + if (cache_data) + { + cache_data->header.type = SHADER_CACHE_ENTRY_VULKAN_BLOB; + cache_data->header.vkd3d_revision = VKD3D_SHADER_CACHE_VKD3D_VERSION; + vr = VK_CALL(vkGetPipelineCacheData(device->vk_device, device->vk_pipeline_cache, + &cache_size, cache_data->blob)); + if (vr == VK_SUCCESS) + { + vkd3d_shader_cache_put(device->persistent_cache, vk_pipeline_cache_key, + sizeof(vk_pipeline_cache_key), cache_data, + offsetof(struct vkd3d_shader_cache_vk_blob, blob[cache_size])); + } + vkd3d_free(cache_data); + } + VK_CALL(vkDestroyPipelineCache(device->vk_device, device->vk_pipeline_cache, NULL)); + }

vkd3d_mutex_destroy(&device->pipeline_cache_mutex); } @@ -2549,6 +2594,7 @@ static ULONG STDMETHODCALLTYPE d3d12_device_Release(ID3D12Device5 *iface) if (device->use_vk_heaps) device_worker_stop(device); vkd3d_free(device->heaps); + vkd3d_shader_cache_close(device->persistent_cache); VK_CALL(vkDestroyDevice(device->vk_device, NULL)); if (device->parent) IUnknown_Release(device->parent); @@ -4317,7 +4363,9 @@ static void *device_worker_main(void *arg) static HRESULT d3d12_device_init(struct d3d12_device *device, struct vkd3d_instance *instance, const struct vkd3d_device_create_info *create_info) { + struct vkd3d_shader_cache_desc cache_desc = {0}; const struct vkd3d_vk_device_procs *vk_procs; + char *cache_name, *cwd; HRESULT hr;

device->ID3D12Device5_iface.lpVtbl = &d3d12_device_vtbl; @@ -4344,8 +4392,30 @@ static HRESULT d3d12_device_init(struct d3d12_device *device, if (FAILED(hr = vkd3d_create_vk_device(device, create_info))) goto out_free_instance;

+ /* FIXME: Does this use of getcwd work on Unix too? */ + cwd = getcwd(NULL, 0); + cache_name = vkd3d_malloc(strlen(cwd) + strlen(instance->application_name) + 8); + sprintf(cache_name, "%s/%s.cache", cwd, instance->application_name); + free(cwd); /* Use libc's free() because it is malloc'ed by getcwd. */ + + cache_desc.mem_size = 32 << 20; + cache_desc.disk_size = ~0u; + cache_desc.max_entries = ~0u; + cache_desc.version = VKD3D_SHADER_CACHE_OBJ_VERSION; + if (vkd3d_shader_cache_open(cache_name, &cache_desc, &device->persistent_cache)) + { + FIXME("Failed to open shader cache %s\n", debugstr_a(cache_name)); + cache_desc.disk_size = 0; + if (vkd3d_shader_cache_open(cache_name, &cache_desc, &device->persistent_cache)) + { + vkd3d_free(cache_name); + goto out_free_vk_resources; + } + } + vkd3d_free(cache_name); + if (FAILED(hr = d3d12_device_init_pipeline_cache(device))) - goto out_free_vk_resources; + goto out_free_cache;

if (FAILED(hr = vkd3d_private_store_init(&device->private_store))) goto out_free_pipeline_cache; @@ -4398,6 +4468,8 @@ out_free_private_store: vkd3d_private_store_destroy(&device->private_store); out_free_pipeline_cache: d3d12_device_destroy_pipeline_cache(device); +out_free_cache: + vkd3d_shader_cache_close(device->persistent_cache); out_free_vk_resources: vk_procs = &device->vk_procs; VK_CALL(vkDestroyDevice(device->vk_device, NULL)); diff --git a/libs/vkd3d/vkd3d_private.h b/libs/vkd3d/vkd3d_private.h index 52d72e3d0..8a8099400 100644 --- a/libs/vkd3d/vkd3d_private.h +++ b/libs/vkd3d/vkd3d_private.h @@ -43,6 +43,37 @@ #include <limits.h> #include <stdbool.h>

+/* The following structures define data structures that are stored in vkd3d's + * cache. It doesn't define the cache format itself, those details are found + * in cache.c. + * + * Changing the structures will break compatibility with existing cache files. + * In this case bump VKD3D_SHADER_CACHE_OBJ_VERSION. + * + * The structs aren't meant to be read by external code, consider them a vkd3d + * implementation detail. */ +#define VKD3D_SHADER_CACHE_OBJ_VERSION 1ull +#define VKD3D_SHADER_CACHE_VKD3D_VERSION 1u + +enum vkd3d_shader_cache_entry_type +{ + SHADER_CACHE_ENTRY_VULKAN_BLOB = VKD3D_MAKE_TAG('V', 'K', 'P', 'C'), +}; + +struct vkd3d_shader_cache_entry +{ + uint32_t vkd3d_revision; /* Put the git revision here, discard translated code if changed. */ + uint32_t type; +}; + +struct vkd3d_shader_cache_vk_blob +{ + struct vkd3d_shader_cache_entry header; + uint8_t blob[1]; +}; + +/* End shader data structures */ + #define VK_CALL(f) (vk_procs->f)

#define VKD3D_DESCRIPTOR_MAGIC_FREE 0x00000000u @@ -1779,6 +1810,7 @@ struct d3d12_device bool worker_should_exit;

struct vkd3d_mutex pipeline_cache_mutex; + struct vkd3d_shader_cache *persistent_cache; struct vkd3d_shader_cache *render_pass_cache; VkPipelineCache vk_pipeline_cache;
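The persisted Vulkan pipeline cache is stored as a `struct vkd3d_shader_cache_vk_blob`: an 8-byte `vkd3d_shader_cache_entry` header (revision and type) followed by the bytes returned by `vkGetPipelineCacheData()`, which is called once with a NULL buffer to learn the size and once more to fill it. On load, the header size is subtracted again before the remainder is passed as `initialDataSize`. The sketch below reproduces that framing without Vulkan. `fake_get_data()` is a hypothetical stand-in for the two-call size query, and the sketch uses a C99 flexible array member with `offsetof(..., blob) + size` instead of the patch's `blob[1]` and `offsetof(..., blob[cache_size])`, to stay within plain standard `offsetof()` usage.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct toy_entry_header { uint32_t vkd3d_revision; uint32_t type; };

struct toy_vk_blob
{
    struct toy_entry_header header;
    uint8_t blob[]; /* C99 flexible array member; the patch uses blob[1] */
};

/* Hypothetical two-call query: with data == NULL it only reports the size,
 * otherwise it copies the payload. Unlike Vulkan, this toy simply fails if
 * the buffer is too small. */
static int fake_get_data(const uint8_t *src, size_t src_size, size_t *size, void *data)
{
    if (!data)
    {
        *size = src_size;
        return 0;
    }
    if (*size < src_size)
        return -1;
    memcpy(data, src, src_size);
    *size = src_size;
    return 0;
}

/* Serialise: query the size, allocate header + payload, fill both. */
static struct toy_vk_blob *toy_pack(const uint8_t *src, size_t src_size, uint32_t type,
        uint32_t revision, size_t *total_size)
{
    struct toy_vk_blob *b;
    size_t size = 0;

    if (fake_get_data(src, src_size, &size, NULL) || !size)
        return NULL;
    if (!(b = malloc(offsetof(struct toy_vk_blob, blob) + size)))
        return NULL;
    b->header.type = type;
    b->header.vkd3d_revision = revision;
    if (fake_get_data(src, src_size, &size, b->blob))
    {
        free(b);
        return NULL;
    }
    *total_size = offsetof(struct toy_vk_blob, blob) + size;
    return b;
}

/* Deserialise: the payload size is the stored size minus the header. */
static size_t toy_payload_size(size_t total_size)
{
    size_t hdr = offsetof(struct toy_vk_blob, blob);

    return total_size >= hdr ? total_size - hdr : 0;
}
```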


Stefan Dösinger

9:17 p.m.

New subject: [PATCH 13/17] Store render passes in the on-disk cache and recreate them on startup.

From: Stefan Dösinger stefan@codeweavers.com

This doesn't gain much, since render pass creation is fast, but it is a nice demonstration. We might want to skip this patch when committing the cache upstream.

---
 libs/vkd3d/device.c        | 26 ++++++++++++++++++++++++++
 libs/vkd3d/state.c         |  5 +++++
 libs/vkd3d/vkd3d_private.h | 23 ++++++++++++-----------
 3 files changed, 43 insertions(+), 11 deletions(-)

diff --git a/libs/vkd3d/device.c b/libs/vkd3d/device.c index 46803c35c..82212a33e 100644 --- a/libs/vkd3d/device.c +++ b/libs/vkd3d/device.c @@ -4360,6 +4360,30 @@ static void *device_worker_main(void *arg) return NULL; }

+static bool d3d12_device_load_cache(const void *key, uint32_t key_size, + const void *value, uint32_t value_size, void *context) +{ + const struct vkd3d_shader_cache_entry *e = value; + struct d3d12_device *device = context; + VkRenderPass rp; + + TRACE("device %p got entry type (%c%c%c%c)\n", device, + e->type & 0xff, e->type >> 8 & 0xff, e->type >> 16 & 0xff, + e->type >> 24 & 0xff); + + switch (e->type) + { + case SHADER_CACHE_ENTRY_RENDER_PASS: + vkd3d_render_pass_cache_find(device->render_pass_cache, device, key, &rp); + break; + + case SHADER_CACHE_ENTRY_VULKAN_BLOB: + break; + } + + return true; +} + static HRESULT d3d12_device_init(struct d3d12_device *device, struct vkd3d_instance *instance, const struct vkd3d_device_create_info *create_info) { @@ -4451,6 +4475,8 @@ static HRESULT d3d12_device_init(struct d3d12_device *device,

device_init_descriptor_pool_sizes(device);

+ vkd3d_shader_cache_enumerate(device->persistent_cache, d3d12_device_load_cache, device); + if ((device->parent = create_info->parent)) IUnknown_AddRef(device->parent);

diff --git a/libs/vkd3d/state.c b/libs/vkd3d/state.c index b812ea9f7..5fa4da3e4 100644 --- a/libs/vkd3d/state.c +++ b/libs/vkd3d/state.c @@ -1560,6 +1560,7 @@ static HRESULT vkd3d_render_pass_cache_create_pass_locked(struct vkd3d_shader_ca VkAttachmentReference attachment_references[D3D12_SIMULTANEOUS_RENDER_TARGET_COUNT + 1]; VkAttachmentDescription attachments[D3D12_SIMULTANEOUS_RENDER_TARGET_COUNT + 1]; const struct vkd3d_vk_device_procs *vk_procs = &device->vk_procs; + struct vkd3d_shader_cache_entry rp_data; unsigned int index, attachment_index; VkSubpassDescription sub_pass_desc; VkRenderPassCreateInfo pass_info; @@ -1668,6 +1669,10 @@ static HRESULT vkd3d_render_pass_cache_create_pass_locked(struct vkd3d_shader_ca *vk_render_pass = VK_NULL_HANDLE; }

+ rp_data.type = SHADER_CACHE_ENTRY_RENDER_PASS; + rp_data.vkd3d_revision = VKD3D_SHADER_CACHE_VKD3D_VERSION; + vkd3d_shader_cache_put(device->persistent_cache, key, sizeof(*key), &rp_data, sizeof(rp_data)); + return hresult_from_vk_result(vr); }

diff --git a/libs/vkd3d/vkd3d_private.h b/libs/vkd3d/vkd3d_private.h index 8a8099400..5144fa2b1 100644 --- a/libs/vkd3d/vkd3d_private.h +++ b/libs/vkd3d/vkd3d_private.h @@ -55,8 +55,20 @@ #define VKD3D_SHADER_CACHE_OBJ_VERSION 1ull #define VKD3D_SHADER_CACHE_VKD3D_VERSION 1u

+struct vkd3d_render_pass_key +{ + unsigned int attachment_count; + bool depth_enable; + bool stencil_enable; + bool depth_stencil_write; + bool padding; + unsigned int sample_count; + VkFormat vk_formats[D3D12_SIMULTANEOUS_RENDER_TARGET_COUNT + 1]; +}; + enum vkd3d_shader_cache_entry_type { + SHADER_CACHE_ENTRY_RENDER_PASS = VKD3D_MAKE_TAG('R', 'P', 'A', 'S'), SHADER_CACHE_ENTRY_VULKAN_BLOB = VKD3D_MAKE_TAG('V', 'K', 'P', 'C'), };

@@ -562,17 +574,6 @@ D3D12_GPU_VIRTUAL_ADDRESS vkd3d_gpu_va_allocator_allocate(struct vkd3d_gpu_va_al void *vkd3d_gpu_va_allocator_dereference(struct vkd3d_gpu_va_allocator *allocator, D3D12_GPU_VIRTUAL_ADDRESS address); void vkd3d_gpu_va_allocator_free(struct vkd3d_gpu_va_allocator *allocator, D3D12_GPU_VIRTUAL_ADDRESS address);

-struct vkd3d_render_pass_key -{ - unsigned int attachment_count; - bool depth_enable; - bool stencil_enable; - bool depth_stencil_write; - bool padding; - unsigned int sample_count; - VkFormat vk_formats[D3D12_SIMULTANEOUS_RENDER_TARGET_COUNT + 1]; -}; - struct vkd3d_render_pass_entry;

struct vkd3d_shader_cache *vkd3d_render_pass_cache_init(struct d3d12_device *device);
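Entry types are four-character tags, and the TRACE in `d3d12_device_load_cache()` prints the lowest byte first, which implies `VKD3D_MAKE_TAG` packs its first character into the lowest byte. The sketch below assumes exactly that packing (its `TOY_MAKE_TAG` is a stand-in, not the vkd3d macro), decodes a tag back into text in the TRACE's order, and dispatches on the type the way the load callback does, ignoring types it does not handle. Note also that recreating a render pass from this callback appears to end in `vkd3d_shader_cache_put()` on `device->persistent_cache` while that same cache is being enumerated, which is the re-entrant case the enumerate FIXME warns about for POSIX mutexes.

```c
#include <stdint.h>

/* Assumed low-byte-first packing; a stand-in, not the vkd3d macro itself. */
#define TOY_MAKE_TAG(a, b, c, d) \
    ((uint32_t)(a) | ((uint32_t)(b) << 8) | ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))

enum toy_entry_type
{
    TOY_ENTRY_RENDER_PASS = TOY_MAKE_TAG('R', 'P', 'A', 'S'),
    TOY_ENTRY_VULKAN_BLOB = TOY_MAKE_TAG('V', 'K', 'P', 'C'),
};

/* Decode in the same order as the TRACE: lowest byte first. */
static void toy_tag_to_string(uint32_t tag, char out[5])
{
    for (int i = 0; i < 4; ++i)
        out[i] = (char)((tag >> (8 * i)) & 0xff);
    out[4] = '\0';
}

/* Mirrors the load callback's switch: unhandled types are ignored. */
static const char *toy_action_for(uint32_t type)
{
    switch (type)
    {
        case TOY_ENTRY_RENDER_PASS:
            return "recreate render pass";
        case TOY_ENTRY_VULKAN_BLOB:
            return "ignore (read by pipeline cache init)";
        default:
            return "ignore";
    }
}
```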


Stefan Dösinger

9:17 p.m.

New subject: [PATCH 14/17] vkd3d: Keep root signatures around.

From: Stefan Dösinger stefan@codeweavers.com

---
 libs/vkd3d/device.c        |  9 ++++-
 libs/vkd3d/state.c         | 69 +++++++++++++++++++++++++++++++++-----
 libs/vkd3d/vkd3d_private.h |  5 +++
 tests/d3d12.c              |  8 ++---
 4 files changed, 77 insertions(+), 14 deletions(-)

diff --git a/libs/vkd3d/device.c b/libs/vkd3d/device.c index 82212a33e..51effbef4 100644 --- a/libs/vkd3d/device.c +++ b/libs/vkd3d/device.c @@ -2594,6 +2594,7 @@ static ULONG STDMETHODCALLTYPE d3d12_device_Release(ID3D12Device5 *iface) if (device->use_vk_heaps) device_worker_stop(device); vkd3d_free(device->heaps); + vkd3d_root_signature_cache_cleanup(device->root_signature_cache, device); vkd3d_shader_cache_close(device->persistent_cache); VK_CALL(vkDestroyDevice(device->vk_device, NULL)); if (device->parent) @@ -4438,9 +4439,13 @@ static HRESULT d3d12_device_init(struct d3d12_device *device, } vkd3d_free(cache_name);

- if (FAILED(hr = d3d12_device_init_pipeline_cache(device))) + device->root_signature_cache = vkd3d_root_signature_cache_init(device); + if (!device->root_signature_cache) goto out_free_cache;

+ if (FAILED(hr = d3d12_device_init_pipeline_cache(device))) + goto out_free_cache2; + if (FAILED(hr = vkd3d_private_store_init(&device->private_store))) goto out_free_pipeline_cache;

@@ -4494,6 +4499,8 @@ out_free_private_store: vkd3d_private_store_destroy(&device->private_store); out_free_pipeline_cache: d3d12_device_destroy_pipeline_cache(device); +out_free_cache2: + vkd3d_shader_cache_close(device->root_signature_cache); out_free_cache: vkd3d_shader_cache_close(device->persistent_cache); out_free_vk_resources: diff --git a/libs/vkd3d/state.c b/libs/vkd3d/state.c index 5fa4da3e4..c8da18c3e 100644 --- a/libs/vkd3d/state.c +++ b/libs/vkd3d/state.c @@ -55,6 +55,12 @@ static ULONG STDMETHODCALLTYPE d3d12_root_signature_AddRef(ID3D12RootSignature * ULONG refcount = InterlockedIncrement(&root_signature->refcount);

TRACE("%p increasing refcount to %u.\n", root_signature, refcount); + if (refcount == 1) + { + if (FAILED(vkd3d_private_store_init(&root_signature->private_store))) + ERR("mama!!!\n"); + d3d12_device_add_ref(root_signature->device); + }

return refcount; } @@ -117,10 +123,8 @@ static ULONG STDMETHODCALLTYPE d3d12_root_signature_Release(ID3D12RootSignature if (!refcount) { struct d3d12_device *device = root_signature->device; - vkd3d_private_store_destroy(&root_signature->private_store); - d3d12_root_signature_cleanup(root_signature, device); - vkd3d_free(root_signature); d3d12_device_release(device); + vkd3d_private_store_destroy(&root_signature->private_store); }

return refcount; @@ -1392,7 +1396,7 @@ static HRESULT d3d12_root_signature_init(struct d3d12_root_signature *root_signa binding_desc = NULL;

root_signature->ID3D12RootSignature_iface.lpVtbl = &d3d12_root_signature_vtbl; - root_signature->refcount = 1; + root_signature->refcount = 0;

root_signature->vk_pipeline_layout = VK_NULL_HANDLE; root_signature->vk_set_count = 0; @@ -1492,11 +1496,6 @@ static HRESULT d3d12_root_signature_init(struct d3d12_root_signature *root_signa root_signature->push_constant_ranges, &root_signature->vk_pipeline_layout))) goto fail;

- if (FAILED(hr = vkd3d_private_store_init(&root_signature->private_store))) - goto fail; - - d3d12_device_add_ref(device); - return S_OK;

fail: @@ -1515,9 +1514,20 @@ HRESULT d3d12_root_signature_create(struct d3d12_device *device, struct vkd3d_shader_versioned_root_signature_desc vkd3d; } root_signature_desc; struct d3d12_root_signature *object; + uint32_t size = sizeof(object); HRESULT hr; int ret;

+ ret = vkd3d_shader_cache_get(device->root_signature_cache, bytecode, bytecode_length, + &object, &size); + if (ret == VKD3D_OK) + { + ERR("found cached root sig\n"); + *root_signature = object; + d3d12_root_signature_AddRef(&object->ID3D12RootSignature_iface); + return S_OK; + } + if ((ret = vkd3d_parse_root_signature_v_1_0(&dxbc, &root_signature_desc.vkd3d)) < 0) { WARN("Failed to parse root signature, vkd3d result %d.\n", ret); @@ -1540,11 +1550,52 @@ HRESULT d3d12_root_signature_create(struct d3d12_device *device,

TRACE("Created root signature %p.\n", object);

+ ret = vkd3d_shader_cache_put(device->root_signature_cache, bytecode, bytecode_length, + &object, size); + if (ret) + ERR("papa!\n"); + *root_signature = object; + d3d12_root_signature_AddRef(&object->ID3D12RootSignature_iface);

return S_OK; }

+struct vkd3d_shader_cache *vkd3d_root_signature_cache_init(struct d3d12_device *device) +{ + struct vkd3d_shader_cache_desc cache_desc = {0}; + struct vkd3d_shader_cache *cache; + char cache_name[64]; + + cache_desc.mem_size = ~0u; + cache_desc.max_entries = ~0u; + cache_desc.version = 0; + cache_desc.disk_size = 0; + + sprintf(cache_name, "memory:%p:root signatures", device); + if (vkd3d_shader_cache_open(cache_name, &cache_desc, &cache)) + return NULL; + + return cache; +} + +static bool vkd3d_rs_cache_cleanup(const void *key, uint32_t key_size, + const void *data, uint32_t data_size, void *context) +{ + struct d3d12_root_signature *root_signature = *(struct d3d12_root_signature **)data; + struct d3d12_device *device = context; + + d3d12_root_signature_cleanup(root_signature, device); + vkd3d_free(root_signature); + return true; +} + +void vkd3d_root_signature_cache_cleanup(struct vkd3d_shader_cache *cache, struct d3d12_device *device) +{ + vkd3d_shader_cache_enumerate(cache, vkd3d_rs_cache_cleanup, device); + vkd3d_shader_cache_close(device->root_signature_cache); +} + /* vkd3d_render_pass_cache */ struct vkd3d_render_pass_entry { diff --git a/libs/vkd3d/vkd3d_private.h b/libs/vkd3d/vkd3d_private.h index 5144fa2b1..b4fef5b8f 100644 --- a/libs/vkd3d/vkd3d_private.h +++ b/libs/vkd3d/vkd3d_private.h @@ -578,6 +578,10 @@ struct vkd3d_render_pass_entry;

struct vkd3d_shader_cache *vkd3d_render_pass_cache_init(struct d3d12_device *device); void vkd3d_render_pass_cache_cleanup(struct vkd3d_shader_cache *cache, struct d3d12_device *device); +HRESULT vkd3d_render_pass_cache_find(struct vkd3d_shader_cache *cache, struct d3d12_device *device, + const struct vkd3d_render_pass_key *key, VkRenderPass *vk_render_pass); +struct vkd3d_shader_cache *vkd3d_root_signature_cache_init(struct d3d12_device *device); +void vkd3d_root_signature_cache_cleanup(struct vkd3d_shader_cache *cache, struct d3d12_device *device); HRESULT vkd3d_render_pass_cache_find(struct vkd3d_shader_cache *cache, struct d3d12_device *device, const struct vkd3d_render_pass_key *key, VkRenderPass *vk_render_pass);

@@ -1813,6 +1817,7 @@ struct d3d12_device struct vkd3d_mutex pipeline_cache_mutex; struct vkd3d_shader_cache *persistent_cache; struct vkd3d_shader_cache *render_pass_cache; + struct vkd3d_shader_cache *root_signature_cache; VkPipelineCache vk_pipeline_cache;

VkPhysicalDeviceMemoryProperties memory_properties; diff --git a/tests/d3d12.c b/tests/d3d12.c index 1126d9749..552c864aa 100644 --- a/tests/d3d12.c +++ b/tests/d3d12.c @@ -2674,9 +2674,9 @@ static void test_create_root_signature(void) * heap manager reuses the allocation. */ hr = create_root_signature(device, &root_signature_desc, &root_signature2); ok(hr == S_OK, "Failed to create root signature, hr %#x.\n", hr); - todo ok(root_signature == root_signature2, "Got different root signature pointers.\n"); + ok(root_signature == root_signature2, "Got different root signature pointers.\n"); refcount = ID3D12RootSignature_Release(root_signature2); - todo ok(refcount == 1, "ID3D12RootSignature has %u references left.\n", (unsigned int)refcount); + ok(refcount == 1, "ID3D12RootSignature has %u references left.\n", (unsigned int)refcount);

hr = 0xdeadbeef; hr = ID3D12RootSignature_SetPrivateData(root_signature, &test_guid, sizeof(hr), &hr); @@ -2728,9 +2728,9 @@ static void test_create_root_signature(void)

hr = create_root_signature(device, &root_signature_desc, &root_signature2); ok(hr == S_OK, "Failed to create root signature, hr %#x.\n", hr); - todo ok(root_signature == root_signature2, "Got different root signature pointers.\n"); + ok(root_signature == root_signature2, "Got different root signature pointers.\n"); refcount = ID3D12RootSignature_Release(root_signature2); - todo ok(refcount == 1, "ID3D12RootSignature has %u references left.\n", (unsigned int)refcount); + ok(refcount == 1, "ID3D12RootSignature has %u references left.\n", (unsigned int)refcount);

refcount = ID3D12RootSignature_Release(root_signature); ok(!refcount, "ID3D12RootSignature has %u references left.\n", (unsigned int)refcount);
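The change above makes d3d12_root_signature_create() hand back the existing object when it sees the same bytecode again. The refcount now starts at 0, and the 0 -> 1 transition in AddRef() sets up the private data store and the device reference. The cache owns the memory until vkd3d_root_signature_cache_cleanup() runs. The standalone sketch below models only that ownership scheme. rs_cache, rs_object and the other names are hypothetical, and the fixed-size linear cache is an assumption standing in for vkd3d_shader_cache. Like the patch, it does no locking.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct d3d12_root_signature. */
struct rs_object
{
    unsigned int refcount;   /* 0 = alive in the cache, but not handed out */
    unsigned int init_count; /* how often the 0 -> 1 "revive" path ran */
};

struct rs_cache_entry
{
    void *key;
    size_t key_size;
    struct rs_object *object;
};

/* Hypothetical fixed-size cache; the patch uses a vkd3d_shader_cache. */
struct rs_cache
{
    struct rs_cache_entry entries[16];
    size_t count;
};

unsigned int rs_add_ref(struct rs_object *o)
{
    unsigned int refcount = ++o->refcount;

    /* Per-user state (private data, device ref) is set up on 0 -> 1. */
    if (refcount == 1)
        ++o->init_count;
    return refcount;
}

unsigned int rs_release(struct rs_object *o)
{
    assert(o->refcount > 0);
    /* Dropping to 0 does not free: the cache still owns the memory. */
    return --o->refcount;
}

/* Returns a referenced object for this bytecode, creating it on first use. */
struct rs_object *rs_cache_get_or_create(struct rs_cache *cache,
        const void *bytecode, size_t size)
{
    struct rs_cache_entry *e;
    struct rs_object *o;
    size_t i;

    for (i = 0; i < cache->count; ++i)
    {
        e = &cache->entries[i];
        if (e->key_size == size && !memcmp(e->key, bytecode, size))
        {
            rs_add_ref(e->object);
            return e->object;
        }
    }

    if (cache->count == sizeof(cache->entries) / sizeof(cache->entries[0]))
        return NULL;
    if (!(o = calloc(1, sizeof(*o))))
        return NULL;
    e = &cache->entries[cache->count];
    if (!(e->key = malloc(size)))
    {
        free(o);
        return NULL;
    }
    memcpy(e->key, bytecode, size);
    e->key_size = size;
    e->object = o;
    ++cache->count;

    rs_add_ref(o);
    return o;
}

/* Device teardown: analogous to vkd3d_root_signature_cache_cleanup(). */
void rs_cache_destroy(struct rs_cache *cache)
{
    size_t i;

    for (i = 0; i < cache->count; ++i)
    {
        free(cache->entries[i].key);
        free(cache->entries[i].object);
    }
    cache->count = 0;
}
```

The assertions that the patch un-todos in tests/d3d12.c check the same two properties. A second creation from identical bytecode yields the same pointer, and releasing that second reference leaves a refcount of 1.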

-- GitLab https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541

Stefan Dösinger

9:17 p.m.

New subject: [PATCH 15/17] vkd3d: Precreate root signatures from cache

From: Stefan Dösinger stefan@codeweavers.com

---
 libs/vkd3d/cache.c         |  2 +-
 libs/vkd3d/device.c        |  7 +++++++
 libs/vkd3d/state.c         | 35 +++++++++++++++++++++++++++++++----
 libs/vkd3d/vkd3d_private.h | 11 +++++++++++
 4 files changed, 50 insertions(+), 5 deletions(-)

diff --git a/libs/vkd3d/cache.c b/libs/vkd3d/cache.c
index 6e16ecfb5..7ba55e34d 100644
--- a/libs/vkd3d/cache.c
+++ b/libs/vkd3d/cache.c
@@ -395,7 +395,7 @@ static inline uint64_t mvkHash64(const uint64_t *pVals, size_t count, uint64_t s
     return hash;
 }

-static uint64_t hash_key(const void *key, size_t size)
+uint64_t hash_key(const void *key, size_t size)
 {
     uint64_t last = 0, ret;

diff --git a/libs/vkd3d/device.c b/libs/vkd3d/device.c
index 51effbef4..a608befd7 100644
--- a/libs/vkd3d/device.c
+++ b/libs/vkd3d/device.c
@@ -4364,6 +4364,7 @@ static void *device_worker_main(void *arg)
 static bool d3d12_device_load_cache(const void *key, uint32_t key_size,
         const void *value, uint32_t value_size, void *context)
 {
+    const struct vkd3d_shader_cache_root_signature *rs;
     const struct vkd3d_shader_cache_entry *e = value;
     struct d3d12_device *device = context;
     VkRenderPass rp;
@@ -4378,6 +4379,12 @@ static bool d3d12_device_load_cache(const void *key, uint32_t key_size,
             vkd3d_render_pass_cache_find(device->render_pass_cache, device, key, &rp);
             break;

+        case SHADER_CACHE_ENTRY_ROOT_SIGNATURE:
+            rs = value;
+            d3d12_root_signature_create(device, rs->dxbc, value_size
+                    - offsetof(struct vkd3d_shader_cache_root_signature, dxbc[0]), NULL);
+            break;
+
         case SHADER_CACHE_ENTRY_VULKAN_BLOB:
             break;
     }
diff --git a/libs/vkd3d/state.c b/libs/vkd3d/state.c
index c8da18c3e..f5b71bc82 100644
--- a/libs/vkd3d/state.c
+++ b/libs/vkd3d/state.c
@@ -1513,6 +1513,7 @@ HRESULT d3d12_root_signature_create(struct d3d12_device *device,
         D3D12_VERSIONED_ROOT_SIGNATURE_DESC d3d12;
         struct vkd3d_shader_versioned_root_signature_desc vkd3d;
     } root_signature_desc;
+    struct vkd3d_shader_cache_root_signature *cache_value;
     struct d3d12_root_signature *object;
     uint32_t size = sizeof(object);
     HRESULT hr;
@@ -1523,8 +1524,13 @@ HRESULT d3d12_root_signature_create(struct d3d12_device *device,
     if (ret == VKD3D_OK)
     {
         ERR("found cached root sig\n");
-        *root_signature = object;
-        d3d12_root_signature_AddRef(&object->ID3D12RootSignature_iface);
+        if (root_signature)
+        {
+            *root_signature = object;
+            d3d12_root_signature_AddRef(&object->ID3D12RootSignature_iface);
+        }
+        else
+            ERR("Why do I create a cached root sig twice?\n");
         return S_OK;
     }

@@ -1555,8 +1561,29 @@ HRESULT d3d12_root_signature_create(struct d3d12_device *device,
     if (ret)
         ERR("papa!\n");

-    *root_signature = object;
-    d3d12_root_signature_AddRef(&object->ID3D12RootSignature_iface);
+    /* Why the hash as key and d3d root signature description as value? Because we store
+     * the root signature hash in pipelines and need a way to look up the root signature
+     * when we recreate the pipelines.
+     *
+     * Alternatively we could use bytecode as key here and store a hash -> bytecode lookup
+     * at runtime in device->root_signature_cache. I am unsure for now. */
+    object->hash = hash_key(bytecode, bytecode_length);
+    size = offsetof(struct vkd3d_shader_cache_root_signature, dxbc[bytecode_length]);
+    cache_value = vkd3d_malloc(size);
+    cache_value->header.vkd3d_revision = VKD3D_SHADER_CACHE_VKD3D_VERSION;
+    cache_value->header.type = SHADER_CACHE_ENTRY_ROOT_SIGNATURE;
+    memcpy(cache_value->dxbc, bytecode, bytecode_length);
+    ret = vkd3d_shader_cache_put(device->persistent_cache, &object->hash, sizeof(object->hash),
+            cache_value, size);
+    if (ret)
+        ERR("uncle!\n");
+    vkd3d_free(cache_value);
+
+    if (root_signature)
+    {
+        *root_signature = object;
+        d3d12_root_signature_AddRef(&object->ID3D12RootSignature_iface);
+    }

     return S_OK;
 }
diff --git a/libs/vkd3d/vkd3d_private.h b/libs/vkd3d/vkd3d_private.h
index b4fef5b8f..57de059af 100644
--- a/libs/vkd3d/vkd3d_private.h
+++ b/libs/vkd3d/vkd3d_private.h
@@ -69,6 +69,7 @@ struct vkd3d_render_pass_key
 enum vkd3d_shader_cache_entry_type
 {
     SHADER_CACHE_ENTRY_RENDER_PASS = VKD3D_MAKE_TAG('R', 'P', 'A', 'S'),
+    SHADER_CACHE_ENTRY_ROOT_SIGNATURE = VKD3D_MAKE_TAG('R', 'O', 'O', 'T'),
     SHADER_CACHE_ENTRY_VULKAN_BLOB = VKD3D_MAKE_TAG('V', 'K', 'P', 'C'),
 };

@@ -84,8 +85,17 @@ struct vkd3d_shader_cache_vk_blob
     uint8_t blob[1];
 };

+struct vkd3d_shader_cache_root_signature
+{
+    struct vkd3d_shader_cache_entry header;
+    uint8_t dxbc[1];
+};
+
 /* End shader data structures */

+/* FIXME: Better name. */
+uint64_t hash_key(const void *key, size_t size);
+
 #define VK_CALL(f) (vk_procs->f)

 #define VKD3D_DESCRIPTOR_MAGIC_FREE 0x00000000u
@@ -1212,6 +1222,7 @@ struct d3d12_root_signature
 {
     ID3D12RootSignature ID3D12RootSignature_iface;
     LONG refcount;
+    uint64_t hash;

     VkPipelineLayout vk_pipeline_layout;
     struct d3d12_descriptor_set_layout descriptor_set_layouts[VKD3D_MAX_DESCRIPTOR_SETS];
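This patch stores each root signature in the persistent cache as a tagged header followed by the raw DXBC. The key is a 64-bit hash of the bytecode, so pipeline entries can refer to the root signature by hash. The sketch below reproduces that layout with a C99 flexible array member; the patch itself uses `dxbc[1]` together with `offsetof`. Assumptions:

* MAKE_TAG packs the first character into the lowest byte, which matches how the TRACE in d3d12_device_load_cache2() decodes entry types.
* FNV-1a stands in for the patch's hash_key(), which is built on a different hash (mvkHash64).
* All struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Four-character code, first character in the lowest byte. */
#define MAKE_TAG(a, b, c, d) ((uint32_t)(a) | ((uint32_t)(b) << 8) \
        | ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))
#define ENTRY_ROOT_SIGNATURE MAKE_TAG('R', 'O', 'O', 'T')

struct cache_entry_header
{
    uint32_t revision;
    uint32_t type;
};

struct cached_root_signature
{
    struct cache_entry_header header;
    uint8_t dxbc[]; /* C99 flexible array member */
};

/* 64-bit FNV-1a: a stand-in for hash_key(), not the same function. */
uint64_t fnv1a64(const void *data, size_t size)
{
    const uint8_t *p = data;
    uint64_t h = 0xcbf29ce484222325ull;
    size_t i;

    for (i = 0; i < size; ++i)
    {
        h ^= p[i];
        h *= 0x100000001b3ull;
    }
    return h;
}

/* Builds the value stored under the key fnv1a64(bytecode). Caller frees. */
struct cached_root_signature *cached_root_signature_create(const void *bytecode,
        size_t bytecode_size, uint32_t revision, size_t *value_size)
{
    struct cached_root_signature *v;

    assert(bytecode || !bytecode_size);
    *value_size = offsetof(struct cached_root_signature, dxbc) + bytecode_size;
    if (!(v = malloc(*value_size)))
        return NULL;
    v->header.revision = revision;
    v->header.type = ENTRY_ROOT_SIGNATURE;
    memcpy(v->dxbc, bytecode, bytecode_size);
    return v;
}
```

The comment in the patch weighs the trade-off. Keying by hash gives a pipeline entry a fixed-size reference to its root signature. Keying by bytecode instead would need a separate runtime hash-to-bytecode map.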


Stefan Dösinger

9:17 p.m.

New subject: [PATCH 16/17] Store graphics pipelines in the cache.

From: Stefan Dösinger stefan@codeweavers.com

---
 libs/vkd3d/state.c         | 153 +++++++++++++++++++++++++++++++++++++
 libs/vkd3d/vkd3d_private.h |  62 +++++++++++++++
 2 files changed, 215 insertions(+)

diff --git a/libs/vkd3d/state.c b/libs/vkd3d/state.c index f5b71bc82..c30f970c1 100644 --- a/libs/vkd3d/state.c +++ b/libs/vkd3d/state.c @@ -2456,6 +2456,133 @@ static HRESULT d3d12_pipeline_state_find_and_init_uav_counters(struct d3d12_pipe return hr; }

+static struct vkd3d_shader_cache_pipeline_state *vkd3d_cache_pipeline_from_d3d( + const struct d3d12_pipeline_state_desc *desc, + const struct d3d12_root_signature *root_signature, uint32_t *entry_size) +{ + struct vkd3d_shader_cache_pipeline_state *entry; + uint32_t size, pos = 0, i; + + size = desc->cs.BytecodeLength; + size += desc->vs.BytecodeLength; + size += desc->ps.BytecodeLength; + size += desc->ds.BytecodeLength; + size += desc->hs.BytecodeLength; + size += desc->gs.BytecodeLength; + size += desc->stream_output.NumEntries * sizeof(struct vkd3d_so_declaration_cache_entry); + size += desc->stream_output.NumStrides * sizeof(*desc->stream_output.pBufferStrides); + /* FIXME: Dynamically handle semantic strings */ + size += desc->input_layout.NumElements * sizeof(struct vkd3d_input_layout_element_cache); + + *entry_size = offsetof(struct vkd3d_shader_cache_pipeline_state, data[size]); + entry = vkd3d_calloc(1, *entry_size); + + entry->super.vkd3d_revision = VKD3D_SHADER_CACHE_VKD3D_VERSION; + entry->super.type = 0; + + entry->root_signature = root_signature->hash; + + entry->cs_size = desc->cs.BytecodeLength; + if (entry->cs_size) + { + memcpy(entry->data + pos, desc->cs.pShaderBytecode, entry->cs_size); + pos += entry->cs_size; + } + + entry->vs_size = desc->vs.BytecodeLength; + if (entry->vs_size) + { + memcpy(entry->data + pos, desc->vs.pShaderBytecode, entry->vs_size); + pos += entry->vs_size; + } + + entry->ps_size = desc->ps.BytecodeLength; + if (entry->ps_size) + { + memcpy(entry->data + pos, desc->ps.pShaderBytecode, entry->ps_size); + pos += entry->ps_size; + } + + entry->ds_size = desc->ds.BytecodeLength; + if (entry->ds_size) + { + memcpy(entry->data + pos, desc->ds.pShaderBytecode, entry->ds_size); + pos += entry->ds_size; + } + + entry->hs_size = desc->hs.BytecodeLength; + if (entry->hs_size) + { + memcpy(entry->data + pos, desc->hs.pShaderBytecode, entry->hs_size); + pos += entry->hs_size; + } + + entry->gs_size = desc->gs.BytecodeLength; + if 
(entry->gs_size) + { + memcpy(entry->data + pos, desc->gs.pShaderBytecode, entry->gs_size); + pos += entry->gs_size; + } + + entry->so_entries = desc->stream_output.NumEntries; + for (i = 0; i < entry->so_entries; ++i) + { + struct vkd3d_so_declaration_cache_entry *e = (void *)(entry->data + pos); + e->stream = desc->stream_output.pSODeclaration[i].Stream; + strncpy(e->semantic_name, desc->stream_output.pSODeclaration[i].SemanticName, 32); + e->semantic_name[31] = 0; + e->semantic_index = desc->stream_output.pSODeclaration[i].SemanticIndex; + e->start_component = desc->stream_output.pSODeclaration[i].StartComponent; + e->component_count = desc->stream_output.pSODeclaration[i].ComponentCount; + e->output_slot = desc->stream_output.pSODeclaration[i].OutputSlot; + + if (strlen(desc->stream_output.pSODeclaration[i].SemanticName) > 31) + FIXME("Output semantic name too long\n"); + + pos += sizeof(*e); + } + entry->so_strides = desc->stream_output.NumStrides; + if (entry->so_strides) + { + memcpy(entry->data + pos, desc->stream_output.pBufferStrides, + sizeof(*desc->stream_output.pBufferStrides) * entry->so_strides); + pos += sizeof(*desc->stream_output.pBufferStrides) * entry->so_strides; + } + + entry->input_layout_elements = desc->input_layout.NumElements; + for (i = 0; i < entry->input_layout_elements; ++i) + { + struct vkd3d_input_layout_element_cache *e = (void *)(entry->data + pos); + strncpy(e->semantic_name, desc->input_layout.pInputElementDescs[i].SemanticName, 32); + e->semantic_name[31] = 0; + e->semantic_index = desc->input_layout.pInputElementDescs[i].SemanticIndex; + e->format = desc->input_layout.pInputElementDescs[i].Format; + e->input_slot = desc->input_layout.pInputElementDescs[i].InputSlot; + e->aligned_byte_offset = desc->input_layout.pInputElementDescs[i].AlignedByteOffset; + e->input_slot_class = desc->input_layout.pInputElementDescs[i].InputSlotClass; + e->instance_data_step_rate = desc->input_layout.pInputElementDescs[i].InstanceDataStepRate; + + 
if (strlen(desc->input_layout.pInputElementDescs[i].SemanticName) > 31) + FIXME("Input semantic name too long\n"); + + pos += sizeof(*e); + } + + entry->blend_state = desc->blend_state; + entry->sample_mask = desc->sample_mask; + entry->rasterizer_state = desc->rasterizer_state; + entry->depth_stencil_state = desc->depth_stencil_state; + entry->strip_cut_value = desc->strip_cut_value; + entry->primitive_topology_type = desc->primitive_topology_type; + entry->rtv_formats = desc->rtv_formats; + entry->dsv_format = desc->dsv_format; + entry->sample_desc = desc->sample_desc; + entry->node_mask = desc->node_mask; + entry->flags = desc->flags; + + return entry; +} + static HRESULT d3d12_pipeline_state_init_compute(struct d3d12_pipeline_state *state, struct d3d12_device *device, const struct d3d12_pipeline_state_desc *desc) { @@ -3018,6 +3145,7 @@ static HRESULT d3d12_pipeline_state_init_graphics(struct d3d12_pipeline_state *s uint32_t aligned_offsets[D3D12_VS_INPUT_REGISTER_COUNT]; struct vkd3d_shader_descriptor_offset_info offset_info; struct vkd3d_shader_parameter ps_shader_parameters[1]; + struct vkd3d_shader_cache_pipeline_state *cache_entry; struct vkd3d_shader_transform_feedback_info xfb_info; struct vkd3d_shader_spirv_target_info ps_target_info; struct vkd3d_shader_interface_info shader_interface; @@ -3030,6 +3158,7 @@ static HRESULT d3d12_pipeline_state_init_graphics(struct d3d12_pipeline_state *s const struct vkd3d_format *format; unsigned int instance_divisor; VkVertexInputRate input_rate; + uint32_t cache_entry_size; unsigned int i, j; size_t rt_count; uint32_t mask; @@ -3535,6 +3664,18 @@ static HRESULT d3d12_pipeline_state_init_graphics(struct d3d12_pipeline_state *s state->vk_bind_point = VK_PIPELINE_BIND_POINT_GRAPHICS; d3d12_device_add_ref(state->device = device);

+ cache_entry = vkd3d_cache_pipeline_from_d3d(desc, root_signature, &cache_entry_size); + if (cache_entry) + { + uint64_t hash; + cache_entry->super.type = SHADER_CACHE_ENTRY_GRAPHICS_STATE; + hash = hash_key(cache_entry, cache_entry_size); + vkd3d_shader_cache_put(device->persistent_cache, &hash, sizeof(hash), + cache_entry, cache_entry_size); + vkd3d_free(cache_entry); + state->state_hash = hash; + } + return S_OK;

fail: @@ -3755,6 +3896,8 @@ VkPipeline d3d12_pipeline_state_get_or_create_pipeline(struct d3d12_pipeline_sta struct d3d12_graphics_pipeline_state *graphics = &state->u.graphics; VkPipelineVertexInputDivisorStateCreateInfoEXT input_divisor_info; VkPipelineTessellationStateCreateInfo tessellation_info; + struct vkd3d_graphics_pipeline_key persistent_key = {0}; + struct vkd3d_graphics_pipeline_entry cache_entry = {0}; VkPipelineVertexInputStateCreateInfo input_desc; VkPipelineInputAssemblyStateCreateInfo ia_desc; VkPipelineColorBlendStateCreateInfo blend_desc; @@ -3821,12 +3964,17 @@ VkPipeline d3d12_pipeline_state_get_or_create_pipeline(struct d3d12_pipeline_sta b->inputRate = graphics->input_rates[binding];

pipeline_key.strides[binding_count] = strides[binding]; + persistent_key.strides[binding] = strides[binding];

++binding_count; }

pipeline_key.dsv_format = dsv_format;

+ persistent_key.state = state->state_hash; + persistent_key.topology = topology; + persistent_key.dsv_format = dsv_format; + if ((vk_pipeline = d3d12_pipeline_state_find_compiled_pipeline(state, &pipeline_key, vk_render_pass))) return vk_pipeline;

@@ -3918,6 +4066,11 @@ VkPipeline d3d12_pipeline_state_get_or_create_pipeline(struct d3d12_pipeline_sta return VK_NULL_HANDLE; }

+ cache_entry.super.vkd3d_revision = VKD3D_SHADER_CACHE_VKD3D_VERSION; + cache_entry.super.type = SHADER_CACHE_ENTRY_GRAPHICS_PIPELINE; + vkd3d_shader_cache_put(device->persistent_cache, &persistent_key, sizeof(persistent_key), + &cache_entry, sizeof(cache_entry)); + if (d3d12_pipeline_state_put_pipeline_to_cache(state, &pipeline_key, vk_pipeline, pipeline_desc.renderPass)) return vk_pipeline;

diff --git a/libs/vkd3d/vkd3d_private.h b/libs/vkd3d/vkd3d_private.h index 57de059af..b7cc0af05 100644 --- a/libs/vkd3d/vkd3d_private.h +++ b/libs/vkd3d/vkd3d_private.h @@ -68,6 +68,9 @@ struct vkd3d_render_pass_key

enum vkd3d_shader_cache_entry_type { + SHADER_CACHE_ENTRY_COMPUTE_STATE = VKD3D_MAKE_TAG('C', 'O', 'M', 'P'), + SHADER_CACHE_ENTRY_GRAPHICS_PIPELINE = VKD3D_MAKE_TAG('G', 'F', 'X', 'P'), + SHADER_CACHE_ENTRY_GRAPHICS_STATE = VKD3D_MAKE_TAG('G', 'F', 'X', 'S'), SHADER_CACHE_ENTRY_RENDER_PASS = VKD3D_MAKE_TAG('R', 'P', 'A', 'S'), SHADER_CACHE_ENTRY_ROOT_SIGNATURE = VKD3D_MAKE_TAG('R', 'O', 'O', 'T'), SHADER_CACHE_ENTRY_VULKAN_BLOB = VKD3D_MAKE_TAG('V', 'K', 'P', 'C'), @@ -91,6 +94,64 @@ struct vkd3d_shader_cache_root_signature uint8_t dxbc[1]; };

+struct vkd3d_input_layout_element_cache +{ + char semantic_name[32]; /* Not a proper solution */ + UINT semantic_index; + DXGI_FORMAT format; + UINT input_slot; + UINT aligned_byte_offset; + D3D12_INPUT_CLASSIFICATION input_slot_class; + UINT instance_data_step_rate; +}; + +struct vkd3d_so_declaration_cache_entry +{ + UINT stream; + char semantic_name[32]; /* Not a proper solution */ + UINT semantic_index; + BYTE start_component; + BYTE component_count; + BYTE output_slot; +}; + +struct vkd3d_shader_cache_pipeline_state +{ + struct vkd3d_shader_cache_entry super; + uint64_t root_signature; + uint32_t cs_size, vs_size, ps_size, ds_size, hs_size, gs_size; + uint32_t so_entries, so_strides; + uint32_t so_RasterizedStream; + uint32_t input_layout_elements; + D3D12_BLEND_DESC blend_state; + UINT sample_mask; + D3D12_RASTERIZER_DESC rasterizer_state; + D3D12_DEPTH_STENCIL_DESC1 depth_stencil_state; + /* Input layout is appended */ + D3D12_INDEX_BUFFER_STRIP_CUT_VALUE strip_cut_value; + D3D12_PRIMITIVE_TOPOLOGY_TYPE primitive_topology_type; + struct D3D12_RT_FORMAT_ARRAY rtv_formats; + DXGI_FORMAT dsv_format; + DXGI_SAMPLE_DESC sample_desc; + UINT node_mask; + D3D12_PIPELINE_STATE_FLAGS flags; + uint8_t data[1]; +}; + +struct vkd3d_graphics_pipeline_key +{ + uint64_t state; + D3D12_PRIMITIVE_TOPOLOGY topology; + VkFormat dsv_format; + uint32_t strides[D3D12_IA_VERTEX_INPUT_RESOURCE_SLOT_COUNT]; +}; + +struct vkd3d_graphics_pipeline_entry +{ + struct vkd3d_shader_cache_entry super; + /* TODO: Translated spir-v code */ +}; + /* End shader data structures */

/* FIXME: Better name. */ @@ -1339,6 +1400,7 @@ struct d3d12_pipeline_state struct d3d12_compute_pipeline_state compute; } u; VkPipelineBindPoint vk_bind_point; + uint64_t state_hash;

struct d3d12_pipeline_uav_counter_state uav_counters;
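vkd3d_cache_pipeline_from_d3d() concatenates the shader bytecode of each stage into one trailing data[] array and records only the per-stage sizes. The loader in the next patch walks the same order to recover the pointers. Below is a reduced sketch of that pack/unpack pair for a fixed list of stages. The stage set and all names are hypothetical. Allocating with calloc(), as the patch does with vkd3d_calloc(), matters because the whole entry is afterwards hashed as raw bytes, so padding and unused bytes must be deterministic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum { STAGE_VS, STAGE_PS, STAGE_GS, STAGE_COUNT };

struct blob
{
    const void *data;
    uint32_t size;
};

struct packed_state
{
    uint32_t sizes[STAGE_COUNT];
    uint8_t data[]; /* stage blobs back to back, in enum order */
};

/* Zero-filled so that hashing the result as raw bytes is deterministic. */
struct packed_state *pack_stages(const struct blob *stages, size_t *total)
{
    struct packed_state *p;
    size_t payload = 0, pos = 0;
    unsigned int i;

    assert(stages && total);
    for (i = 0; i < STAGE_COUNT; ++i)
        payload += stages[i].size;

    *total = offsetof(struct packed_state, data) + payload;
    if (!(p = calloc(1, *total)))
        return NULL;

    for (i = 0; i < STAGE_COUNT; ++i)
    {
        p->sizes[i] = stages[i].size;
        if (stages[i].size)
            memcpy(p->data + pos, stages[i].data, stages[i].size);
        pos += stages[i].size;
    }
    return p;
}

/* Inverse walk: pointers into p->data; an absent stage yields NULL. */
void unpack_stages(const struct packed_state *p, struct blob *stages)
{
    size_t pos = 0;
    unsigned int i;

    for (i = 0; i < STAGE_COUNT; ++i)
    {
        stages[i].size = p->sizes[i];
        stages[i].data = p->sizes[i] ? p->data + pos : NULL;
        pos += p->sizes[i];
    }
}
```

Both sides must walk the blobs in exactly the same order. In the patches, the writer starts with the compute shader, while the reader in d3d12_device_load_cache2() starts with the vertex shader. That mismatch is harmless only as long as cs_size is 0 for graphics state entries.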


Stefan Dösinger

9:17 p.m.

New subject: [PATCH 17/17] vkd3d: Catch and release graphics pipelines.

From: Stefan Dösinger stefan@codeweavers.com

---
 libs/vkd3d/device.c        | 189 +++++++++++++++++++++++++++++++++++++
 libs/vkd3d/state.c         |   2 +-
 libs/vkd3d/vkd3d_private.h |   2 +
 3 files changed, 192 insertions(+), 1 deletion(-)

diff --git a/libs/vkd3d/device.c b/libs/vkd3d/device.c index a608befd7..906f6b710 100644 --- a/libs/vkd3d/device.c +++ b/libs/vkd3d/device.c @@ -4385,6 +4385,12 @@ static bool d3d12_device_load_cache(const void *key, uint32_t key_size, - offsetof(struct vkd3d_shader_cache_root_signature, dxbc[0]), NULL); break;

+ case SHADER_CACHE_ENTRY_COMPUTE_STATE: + case SHADER_CACHE_ENTRY_GRAPHICS_PIPELINE: + case SHADER_CACHE_ENTRY_GRAPHICS_STATE: + /* These are handled in a second pass */ + break; + case SHADER_CACHE_ENTRY_VULKAN_BLOB: break; } @@ -4392,6 +4398,188 @@ static bool d3d12_device_load_cache(const void *key, uint32_t key_size, return true; }

+static bool d3d12_device_load_cache2(const void *key, uint32_t key_size, + const void *value, uint32_t value_size, void *context) +{ + const struct vkd3d_shader_cache_entry *e = value; + struct vkd3d_shader_cache_root_signature *rs; + struct d3d12_root_signature *d3d12_root_sig; + struct vkd3d_shader_cache_pipeline_state *s; + const struct vkd3d_graphics_pipeline_key *k; + D3D12_INPUT_ELEMENT_DESC *il_element = NULL; + D3D12_SO_DECLARATION_ENTRY *so_decl = NULL; + struct d3d12_pipeline_state_desc desc; + struct d3d12_device *device = context; + struct d3d12_pipeline_state *object; + uint32_t size, size2, pos = 0; + enum vkd3d_result ret; + unsigned int i; + HRESULT hr; + + TRACE("device %p got entry type (%c%c%c%c)\n", device, + e->type & 0xff, e->type >> 8 & 0xff, e->type >> 16 & 0xff, + e->type >> 24 & 0xff); + + switch (e->type) + { + case SHADER_CACHE_ENTRY_RENDER_PASS: + case SHADER_CACHE_ENTRY_ROOT_SIGNATURE: + case SHADER_CACHE_ENTRY_VULKAN_BLOB: + /* Handled already */ + break; + + case SHADER_CACHE_ENTRY_COMPUTE_STATE: + /* TODO */ + break; + + case SHADER_CACHE_ENTRY_GRAPHICS_STATE: + /* Ignore, look it up when handling the full state */ + break; + + case SHADER_CACHE_ENTRY_GRAPHICS_PIPELINE: + k = key; + ret = vkd3d_shader_cache_get(device->persistent_cache, &k->state, sizeof(k->state), + NULL, &size); + if (ret) + { + FIXME("Did not find graphics state\n"); + break; + } + + s = vkd3d_malloc(size); + if (!s) + break; + ret = vkd3d_shader_cache_get(device->persistent_cache, &k->state, sizeof(k->state), + s, &size); + if (ret) + ERR("whut?\n"); + + ret = vkd3d_shader_cache_get(device->persistent_cache, &s->root_signature, sizeof(s->root_signature), + NULL, &size); + if (ret) + { + FIXME("Did not find root signature %lx for graphics pipeline\n", s->root_signature); + vkd3d_free(s); + break; + } + rs = vkd3d_malloc(size); + ret = vkd3d_shader_cache_get(device->persistent_cache, &s->root_signature, sizeof(s->root_signature), + rs, &size); + if (ret) + 
ERR("whut?\n"); + + size2 = sizeof(d3d12_root_sig); + ret = vkd3d_shader_cache_get(device->root_signature_cache, rs->dxbc, + size - offsetof(struct vkd3d_shader_cache_root_signature, dxbc[0]), + &d3d12_root_sig, &size2); + vkd3d_free(rs); + if (ret) + { + ERR("whut 2? Did not find root sig of hash %lx %d\n", s->root_signature, ret); + // return; + } + + memset(&desc, 0, sizeof(desc)); + desc.root_signature = &d3d12_root_sig->ID3D12RootSignature_iface; + + desc.vs.BytecodeLength = s->vs_size; + desc.vs.pShaderBytecode = s->vs_size ? s->data + pos : NULL; + pos += s->vs_size; + desc.ps.BytecodeLength = s->ps_size; + desc.ps.pShaderBytecode = s->ps_size ? s->data + pos : NULL; + pos += s->ps_size; + desc.ds.BytecodeLength = s->ds_size; + desc.ds.pShaderBytecode = s->ds_size ? s->data + pos : NULL; + pos += s->ds_size; + desc.hs.BytecodeLength = s->hs_size; + desc.hs.pShaderBytecode = s->hs_size ? s->data + pos : NULL; + pos += s->hs_size; + desc.gs.BytecodeLength = s->gs_size; + desc.gs.pShaderBytecode = s->gs_size ? 
s->data + pos : NULL; + pos += s->gs_size; + + desc.stream_output.NumEntries = s->so_entries; + if (s->so_entries) + { + so_decl = vkd3d_malloc(sizeof(*so_decl) * s->so_entries); + for (i = 0; i < s->so_entries; ++i) + { + struct vkd3d_so_declaration_cache_entry *sod = (void *)(s->data + pos); + so_decl[i].Stream = sod->stream; + so_decl[i].SemanticName = sod->semantic_name; + so_decl[i].SemanticIndex = sod->semantic_index; + so_decl[i].StartComponent = sod->start_component; + so_decl[i].ComponentCount = sod->component_count; + so_decl[i].OutputSlot = sod->output_slot; + pos += sizeof(*sod); + } + desc.stream_output.pSODeclaration = so_decl; + } + desc.stream_output.NumStrides = s->so_strides; + desc.stream_output.pBufferStrides = (void *)(s->data + pos); + pos += s->so_strides * sizeof(*desc.stream_output.pBufferStrides); + desc.stream_output.RasterizedStream = s->so_RasterizedStream; + + desc.blend_state = s->blend_state; + desc.sample_mask = s->sample_mask; + desc.rasterizer_state = s->rasterizer_state; + desc.depth_stencil_state = s->depth_stencil_state; + + desc.input_layout.NumElements = s->input_layout_elements; + if (s->input_layout_elements) + { + il_element = vkd3d_malloc(sizeof(*il_element) * s->input_layout_elements); + for (i = 0; i < s->input_layout_elements; ++i) + { + struct vkd3d_input_layout_element_cache *ile = (void *)(s->data + pos); + il_element[i].SemanticName = ile->semantic_name; + il_element[i].SemanticIndex = ile->semantic_index; + il_element[i].Format = ile->format; + il_element[i].InputSlot = ile->input_slot; + il_element[i].AlignedByteOffset = ile->aligned_byte_offset; + il_element[i].InputSlotClass = ile->input_slot_class; + il_element[i].InstanceDataStepRate = ile->instance_data_step_rate; + pos += sizeof(*ile); + } + desc.input_layout.pInputElementDescs = il_element; + } + + desc.strip_cut_value = s->strip_cut_value; + desc.primitive_topology_type = s->primitive_topology_type; + desc.rtv_formats = s->rtv_formats; + desc.dsv_format = 
s->dsv_format; + desc.sample_desc = s->sample_desc; + desc.node_mask = s->node_mask; + desc.flags = s->flags; + + if (!(object = vkd3d_malloc(sizeof(*object)))) + ERR("meh\n"); + /* We're happy with just creating and destroying it for now. It will feed the vulkan + * pipeline cache, which should re-use the pipeline when the game creates it for actual + * use later. + * + * FIXME: The manipulation of the device refcount in init() and Release() makes it + * unsafe to move this function to a separate thread. We might hold and release the + * last reference to the device. */ + hr = d3d12_pipeline_state_init_graphics(object, device, &desc); + if (SUCCEEDED(hr)) + { + VkRenderPass pass; + VkPipeline p = d3d12_pipeline_state_get_or_create_pipeline(object, + k->topology, k->strides, k->dsv_format, &pass); + TRACE("got render pass %lx\n", p); + ID3D12PipelineState_Release(&object->ID3D12PipelineState_iface); + } + + vkd3d_free(so_decl); + vkd3d_free(il_element); + vkd3d_free(s); + break; + } + + return true; +} + static HRESULT d3d12_device_init(struct d3d12_device *device, struct vkd3d_instance *instance, const struct vkd3d_device_create_info *create_info) { @@ -4488,6 +4676,7 @@ static HRESULT d3d12_device_init(struct d3d12_device *device, device_init_descriptor_pool_sizes(device);

vkd3d_shader_cache_enumerate(device->persistent_cache, d3d12_device_load_cache, device); + vkd3d_shader_cache_enumerate(device->persistent_cache, d3d12_device_load_cache2, device);

if ((device->parent = create_info->parent)) IUnknown_AddRef(device->parent); diff --git a/libs/vkd3d/state.c b/libs/vkd3d/state.c index c30f970c1..565b62d83 100644 --- a/libs/vkd3d/state.c +++ b/libs/vkd3d/state.c @@ -3131,7 +3131,7 @@ static VkLogicOp vk_logic_op_from_d3d12(D3D12_LOGIC_OP op) } }

-static HRESULT d3d12_pipeline_state_init_graphics(struct d3d12_pipeline_state *state, +HRESULT d3d12_pipeline_state_init_graphics(struct d3d12_pipeline_state *state, struct d3d12_device *device, const struct d3d12_pipeline_state_desc *desc) { unsigned int ps_output_swizzle[D3D12_SIMULTANEOUS_RENDER_TARGET_COUNT]; diff --git a/libs/vkd3d/vkd3d_private.h b/libs/vkd3d/vkd3d_private.h index b7cc0af05..41c03fbe5 100644 --- a/libs/vkd3d/vkd3d_private.h +++ b/libs/vkd3d/vkd3d_private.h @@ -1461,6 +1461,8 @@ HRESULT d3d12_pipeline_state_create_compute(struct d3d12_device *device, const D3D12_COMPUTE_PIPELINE_STATE_DESC *desc, struct d3d12_pipeline_state **state); HRESULT d3d12_pipeline_state_create_graphics(struct d3d12_device *device, const D3D12_GRAPHICS_PIPELINE_STATE_DESC *desc, struct d3d12_pipeline_state **state); +HRESULT d3d12_pipeline_state_init_graphics(struct d3d12_pipeline_state *state, + struct d3d12_device *device, const struct d3d12_pipeline_state_desc *desc); HRESULT d3d12_pipeline_state_create(struct d3d12_device *device, const D3D12_PIPELINE_STATE_STREAM_DESC *desc, struct d3d12_pipeline_state **state); VkPipeline d3d12_pipeline_state_get_or_create_pipeline(struct d3d12_pipeline_state *state,
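This patch enumerates the persistent cache twice. The first pass (d3d12_device_load_cache) recreates render passes and root signatures. The second pass (d3d12_device_load_cache2) rebuilds graphics pipelines, which refer to their state and root signature by 64-bit hash. Below is a self-contained sketch of that two-pass resolution over an in-memory list of tagged records. The record layout and names are hypothetical, and the "creation" step only counts whether a pipeline's root signature could be resolved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum record_type { REC_ROOT_SIGNATURE, REC_PIPELINE };

struct record
{
    enum record_type type;
    uint64_t key;        /* this record's hash */
    uint64_t dependency; /* pipelines: hash of their root signature */
};

struct loader
{
    uint64_t known_roots[8];
    size_t root_count;
    unsigned int pipelines_created, pipelines_skipped;
};

static int loader_has_root(const struct loader *l, uint64_t hash)
{
    size_t i;

    for (i = 0; i < l->root_count; ++i)
        if (l->known_roots[i] == hash)
            return 1;
    return 0;
}

/* Pass 1: objects without dependencies. */
static void load_pass1(struct loader *l, const struct record *r)
{
    size_t capacity = sizeof(l->known_roots) / sizeof(l->known_roots[0]);

    if (r->type == REC_ROOT_SIGNATURE && l->root_count < capacity)
        l->known_roots[l->root_count++] = r->key;
}

/* Pass 2: objects that reference pass-1 objects by hash. */
static void load_pass2(struct loader *l, const struct record *r)
{
    if (r->type != REC_PIPELINE)
        return;
    if (loader_has_root(l, r->dependency))
        ++l->pipelines_created;
    else
        ++l->pipelines_skipped; /* dangling reference: skip the entry */
}

void load_all(struct loader *l, const struct record *records, size_t count)
{
    size_t i;

    assert(l && (records || !count));
    for (i = 0; i < count; ++i)
        load_pass1(l, &records[i]);
    for (i = 0; i < count; ++i)
        load_pass2(l, &records[i]);
}
```

Because pipelines are handled in a second pass, the result does not depend on the order in which records come out of the enumeration. The explicit skip branch covers the case that the patch currently only logs. When the root signature lookup fails, d3d12_device_load_cache2() prints an error, but its `return` is commented out, so it carries on with d3d12_root_sig unset.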


Giovanni Mascellani (＠giomasce)

5 Jan

4:03 p.m.

I haven't read the code yet, but I'm not yet sold on the idea of reimplementing our own serialization format. AFAIU getting a database right (correct, stable, performant, etc.) is quite tricky, and since there are already a lot of time-tested alternatives around, I'd like to have a discussion about why cooking our own is our best way forward. We've already had some discussion internally, but maybe it's a good idea to also have it here and possibly go a bit deeper.

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541#note_56771

Stefan Dösinger (＠stefan)

7:28 p.m.

On Fri Jan 5 19:28:26 2024 +0000, Giovanni Mascellani wrote:


I haven't read the code yet, but I'm not yet sold on the idea of reimplementing our own serialization format. AFAIU getting a database right (correct, stable, performant, etc) is quite tricky, and since there are already a lot of time-tested alternatives around I'd like to have a discussion about why cooking our own is our best way forward. We've already had some discussion internally, but maybe it's a good idea to also have it here and possibly go a bit deeper.

I started with the same idea, for the same reasons you mention. An on-disk file is a potential attack vector, so we need to tread carefully, and I didn't put a lot of validation into the serialization I wrote there.

What I investigated in pre-existing libraries:

LevelDB: The compiled binary is about 4 times the size of vkd3d. The last commit was in April 2023.

RocksDB: A LevelDB fork, 10 times the size of vkd3d; it takes about 30 minutes to build on my system.

Fossilize: A library by Valve that is pretty close to what we need. See below.

Berkeley DB, gdbm, etc.: AFAIU a copy of those hangs around on every Unix system, but not inside Win32. They are either unmaintained or have incompatible licenses.

memcached, and a few others from web-related environments: Client-server architectures, even more overkill than RocksDB, although potentially smaller.

The one realistic choice is Mesa's C-only reimplementation of the Fossilize serializer. It looks reasonably small at first sight, but it depends on Mesa's hash table. I spent a few days trying to make sense of it, but failed. It mixes various hash formats (truncate_hash_to_64bits, and somewhere I saw 32-bit hashes too).

Afaics the foz backend is not the default backend in mesa (the default one populates a directory with thousands of files), so I am not sure how much testing it gets. The populate-a-directory approach is feasible for a cache on the user's machine (if you assume a post-FAT32 file system), but makes shipping a prepared cache awkward.

After some time of staring at the Mesa code I needed a feeling of progress and decided to roll my own 500 lines of code. Is it NIH syndrome? Certainly. But copy-pasting Mesa code I don't understand and hoping it does a good job may or may not be better.

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541#note_56786

Giovanni Mascellani (＠giomasce)

9:46 p.m.

On Fri Jan 5 19:28:25 2024 +0000, Stefan Dösinger wrote:

...

I started with the same idea, for the same reasons you mention. An on-disk file is a potential attack vector, so we need to tread carefully, and I didn't put a lot of validation into the serialization I wrote there. What I investigated in pre-existing libraries: LevelDB: The compiled binary is about 4 times the size of vkd3d. The last commit was in April 2023. RocksDB: A leveldb fork, 10 times the size of vkd3d, takes about 30 minutes to build on my system Fossilize: A library by valve that is pretty close to what we need. See below. berkeley db, gdbm, etc: Afaiu a copy of those hangs around on every Unix system, but not inside Win32. They are either unmaintained or have incompatible licenses. memcached, and a few others from web-related environments: Client-server architectures, even more overkill than RocksDB, although potentially smaller. The one realistic choice is Mesa's C-only reimplementation of the fossilize serializer. It looks reasonably small at first size, but it is dependent on Mesa's hash table. I spent a few days trying to make sense of it, but failed. It mixes various hash formats (truncate_hash_to_64bits, and somewhere I saw 32 bit hashes too). Afaics the foz backend is not the default backend in mesa (the default one populates a directory with thousands of files), so I am not sure how much testing it gets. The populate-a-directory approach is feasible for a cache on the user's machine (if you assume a post-FAT32 file system), but makes shipping a prepared cache awkward. After some time of staring at the mesa code I needed a feeling of progress and decided to roll my own 500 lines of code. Is it NIH syndrome? Certainly. But copypasting Mesa code I don't understand and hoping it does a good job may or may not be better.

What about SQLite? AFAIK it is pure C, very permissively licensed, easy to embed and relatively small. It's possible that size-wise it's still a significant increase over vkd3d, but personally I'd still prefer a significant increase over having to deal with all the complications of doing a database right.

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541#note_56798

Stefan Dösinger (＠stefan)

6 Jan

11:48 a.m.

On Fri Jan 5 21:46:51 2024 +0000, Giovanni Mascellani wrote:

...

What about SQLite? AFAIK it is pure C, very permissing licensing, easy to embed and relatively small. It's possible that size-wise it's still a significant increase over vkd3d, but personally I'd still prefer a significant increase over having to deal with all the complications of doing a database right.

We don't need the SQL query machinery, as the cache database is a simple key-value store. My system's libsqlite.so is about 1.5 MB vs. a stripped vkd3d at 377 KB. That said, I think Windows recently started shipping a build of SQLite itself. If we add this to Wine, vkd3d could use an externally provided SQLite in both a Unix and a Win32 configuration.

(Interestingly msi.dll reimplements SQL queries itself, but msi has to support a bespoke file format anyway)

I also tried to identify Microsoft's shader cache format, hoping that they reused an existing database system. However, no file format identification tool I tried identified it.

We can change the format in the future. Our own cache can be thrown away, so we don't necessarily need to maintain backwards compatibility with existing files (although we can, if we want to). It's worth waiting to see how games will use ID3D12ShaderCacheSession. I don't think the API is meant to allow games to ship pre-created cache files and depend on their contents, but that hasn't stopped Windows applications before. If we run into this kind of situation we'll have to reverse engineer Microsoft's format anyway.

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541#note_56807

Giovanni Mascellani (＠giomasce)

12:48 p.m.

On Sat Jan 6 11:48:38 2024 +0000, Stefan Dösinger wrote:

...

We don't need the SQL query machinery as the cache database is a simple key-value store. My systems libsqlite.so is about 1.5 mb vs a stripped vkd3d at 377kb. That said, I think Windows recently started shipping a build of sqlite itself. If we add this to Wine vkd3d could use externally provided sqlite in both a Unix and Win32 configuration. (Interestingly msi.dll reimplements SQL queries itself, but msi has to support a bespoke file format anyway) I also tried to identify Microsoft's shader cache format, hoping that they reused an existing database system. However, no file format identification tool I tried identified it. We can change the format in the future. Our own cache can be thrown away, so we don't necessarily need to maintain backwards compatibility with existing files (although we can, if we want to). It's worth waiting and seeing how games will use ID3D12ShaderCacheSession. I don't think the API is meant to allow games to ship pre-created cache files and depending on their contents, but that hasn't stopped Windows applications before. If we run into this kind of situation we'll have to reverse engineer Microsoft's format anyway.

I find 1.5 MB a reasonable price to pay for not having to maintain our own file format and ensure that it is reasonably efficient, concurrent and robust. As you say, nowadays SQLite is becoming more and more a staple dependency that you should just be able to assume that a given system offers; but even if that's not the case, I'd say that the balance is in favor of SQLite anyway.

It's true that this is a cache, so we can just drop it whenever some corruption occurs, but my feeling is that we can have a better result with less code on our side by using the wheel somebody already invented, and to me that's totally worth 1.5 MB that can even be shared with other potential users.

My two cents, though.

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541#note_56808

Conor McCarthy (＠cmccarthy)

8 Jan

1:48 p.m.

Conor McCarthy (@cmccarthy) commented about libs/vkd3d/cache.c:

...

 vkd3d_free(cache);
}

+/* As the name implies this is taken from moltenvk. */
+#define MVKHASH_SEED 5381
+static inline uint64_t mvkHash64(const uint64_t *pVals, size_t count, uint64_t seed)

An ideal 64-bit hash has a 50% collision probability at about 5 billion entries. I think we should aim for a collision never to occur at all, and 64 bits is not enough for that. Ideally a memcmp() of the entire key would be done in the comparison function after a hash match is found.
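The 5 billion figure follows from the standard birthday approximation for an ideal hash with N = 2^64 equally likely outputs. The same formula gives the collision probability for a more realistic entry count n; the value n = 10^6 below is only an illustrative assumption, not a measured cache size:

```latex
p(n) \approx 1 - e^{-n^2/(2N)}, \qquad N = 2^{64}

p(n) = \tfrac{1}{2} \iff n \approx \sqrt{2N \ln 2} \approx 1.177 \cdot 2^{32} \approx 5.06 \times 10^{9}

n = 10^{6}: \quad p \approx \frac{n^2}{2N} = \frac{10^{12}}{2^{65}} \approx 2.7 \times 10^{-8}
```

So a collision in any single cache is very unlikely, but across many users and caches it cannot be ruled out, which is the argument for confirming a hash match with a full key comparison.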

I'm not sure of the performance of the moltenvk hash, but a simple prime multiplication hash like [FNV-1a](https://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function) probably has fewer collisions.
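For comparison, a minimal sketch of 64-bit FNV-1a over a byte buffer, using the published FNV offset basis and prime. The name `fnv1a64` is illustrative and not part of vkd3d; whether it actually collides less than the MoltenVK hash on real pipeline keys has not been measured here.

```c
#include <stddef.h>
#include <stdint.h>

/* Published 64-bit FNV parameters. */
#define FNV1A64_OFFSET_BASIS 0xcbf29ce484222325ull
#define FNV1A64_PRIME        0x00000100000001b3ull

/* Hypothetical helper: 64-bit FNV-1a over an arbitrary byte buffer.
 * FNV-1a XORs each byte into the state first, then multiplies by the prime
 * (unsigned arithmetic, so the multiplication wraps modulo 2^64). */
static uint64_t fnv1a64(const void *data, size_t size)
{
    const uint8_t *bytes = data;
    uint64_t hash = FNV1A64_OFFSET_BASIS;
    size_t i;

    for (i = 0; i < size; ++i)
    {
        hash ^= bytes[i];
        hash *= FNV1A64_PRIME;
    }
    return hash;
}
```

Unlike the MoltenVK function, this works on bytes rather than on an array of `uint64_t`, so keys need not be padded to a multiple of 8 bytes. Either way, a hash match should still be confirmed against the full key.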

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541#note_56899

Conor McCarthy (＠cmccarthy)

1:50 p.m.

On Sat Jan 6 12:48:06 2024 +0000, Giovanni Mascellani wrote:

...

I find 1.5 MB a reasonable price to pay for not having to maintain our own file format and ensure that it is reasonably efficient, concurrent and robust. As you say, nowadays SQLite is becoming more and more a staple dependency that you should just be able to assume that a given system offers; but even if that's not the case, I'd say that the balance is in favor of SQLite anyway. It's true that this is a cache, so we can just drop it whenever some corruption occurs, but my feeling is that we can have a better result with less code on our side by using the wheel somebody already invented, and to me that's totally worth 1.5 MB that can even be shared with other potential users. My two cents, though.

Unless there are unforeseen complications, the 1.5 MB for SQLite may well be worth it. My two cents also.

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541#note_56901

Giovanni Mascellani (＠giomasce)

11 Jan

1:38 p.m.

Giovanni Mascellani (@giomasce) commented about include/vkd3d.h:

...

Huhu document me

\since 1.10

*/

+struct vkd3d_shader_cache_desc
+{
/** Maximum amount of data the cache holds in memory. */

uint32_t mem_size;

/** Maximum amount of data written to disk. Set to 0 for a memory-only cache. */

uint32_t disk_size;

/** Maximum number of cache entries. */

uint32_t max_entries;

/** Random flags, what else. */

enum vkd3d_shader_cache_flags flags;

/** An application-chosen version number. If the version of an existing
* cache on disk does match, the old data will be discarded. */

Maybe "does not match"?

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541#note_57169

Giovanni Mascellani (＠giomasce)

1:38 p.m.

Giovanni Mascellani (@giomasce) commented about include/vkd3d_types.h:

...

 VKD3D_ERROR_INVALID_SHADER = -4,
 /** The operation is not implemented in this version of vkd3d. */
 VKD3D_ERROR_NOT_IMPLEMENTED = -5,
/** The requested shader cache key was not found. */

VKD3D_ERROR_NOT_FOUND = -6,

/** The requested shader cache value was bigger than the passed buffer. */

VKD3D_ERROR_MORE_DATA = -7,

/** A different key with the same hash was found in the shader cache. */

VKD3D_ERROR_HASH_COLLISSION = -8,

`COLLISION`, I think.

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541#note_57170

Giovanni Mascellani (＠giomasce)

1:39 p.m.

Giovanni Mascellani (@giomasce) commented about include/vkd3d.h:

...

VKD3D_SHADER_CACHE_FLAGS_NO_SERIALIZE,

VKD3D_FORCE_32_BIT_ENUM(VKD3D_SHADER_CACHE_FLAGS),

+};

+/**

Huhu document me

\since 1.10

*/

+struct vkd3d_shader_cache_desc
+{

/** Maximum amount of data the cache holds in memory. */

uint32_t mem_size;

/** Maximum amount of data written to disk. Set to 0 for a memory-only cache. */

uint32_t disk_size;

Do these sizes count the sum of the sizes of stored keys and values, or the actual storage occupation, which in general I suppose might be larger (or even smaller, if compression is used)?

Also, I don't much like the idea that setting a specific (degenerate, but in principle valid) value for a numeric parameter triggers a behavior change (like the multiple handles to the same object below). Could `in_memory` be a separate boolean parameter?
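A minimal sketch of the separate-flag alternative, assuming a new bit is added to the flags enum. The names `example_cache_flags`, `EXAMPLE_CACHE_FLAGS_MEMORY_ONLY` and `example_cache_is_memory_only()` are hypothetical. Note that once the enum is used as a bit mask, its values need to be explicit powers of two rather than the implicit 0, 1, 2... sequence.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values; explicit bits so they can be OR'ed together. */
enum example_cache_flags
{
    EXAMPLE_CACHE_FLAGS_NONE         = 0x0,
    EXAMPLE_CACHE_FLAGS_NO_SERIALIZE = 0x1,
    EXAMPLE_CACHE_FLAGS_MEMORY_ONLY  = 0x2, /* Never read from or write to disk. */
};

struct example_cache_desc
{
    uint32_t mem_size;
    uint32_t disk_size;  /* Just a size limit; 0 no longer implies "memory only". */
    uint32_t max_entries;
    uint32_t flags;      /* Combination of enum example_cache_flags values. */
};

static bool example_cache_is_memory_only(const struct example_cache_desc *desc)
{
    return !!(desc->flags & EXAMPLE_CACHE_FLAGS_MEMORY_ONLY);
}
```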

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541#note_57171

Giovanni Mascellani (＠giomasce)

1:39 p.m.

Giovanni Mascellani (@giomasce) commented about include/vkd3d.h:

...

*/ VKD3D_API void vkd3d_set_log_callback(PFN_vkd3d_log callback);

+/**

Creates a new shader cache or opens an existing one.

\param name The name of the cache. In case of an on-disk cache, this is a file name. In case of a memory-only

cache, opening the same name again in the same process will return the same vkd3d_shader_cache handle.

Cache handles are reference counted, so vkd3d_shader_cache_close has to be called for each successful

vkd3d_shader_cache_open invocation.

Is it allowed to open the same file more than once in the same process? From different processes? What happens if you use a different name for the same file, e.g. because of a non-canonical path, symlinks, or hardlinks?
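One possible way to handle the symlink/hardlink part on POSIX systems, sketched under that assumption: instead of comparing name strings, compare the device and inode numbers reported by `stat()`, which are identical for every path that refers to the same file. The helper name is hypothetical, and a Win32 build would need a different mechanism (e.g. the volume serial number and file index from `GetFileInformationByHandle()`).

```c
#include <stdbool.h>
#include <sys/stat.h>

/* Hypothetical helper: returns true if both paths exist and refer to the same
 * file, regardless of symlinks, hardlinks or non-canonical spellings. */
static bool example_paths_are_same_file(const char *a, const char *b)
{
    struct stat sa, sb;

    if (stat(a, &sa) || stat(b, &sb))
        return false;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```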

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541#note_57172

Giovanni Mascellani (＠giomasce)

1:39 p.m.

Giovanni Mascellani (@giomasce) commented about include/vkd3d.h:

...

\param cache The cache to close.

\since 1.10

*/

+VKD3D_API void vkd3d_shader_cache_close(struct vkd3d_shader_cache *cache);

+/**

Stores a key-value pair in a shader cache.

\param cache The cache to store the value in.

\param key An opaque key of key_size bytes. The cache does not parse the key in any way. If the key already

exists, the existing value will be replaced.

FIXME: For some users (e.g. the renderpass cache) it would be interesting to prevent replacement and get

an error instead if the value already exists. Without this they need their own lock to have an atomic

get() - create new object - put() sequence.

Yeah, in general it seems sensible to allow for this in the API.

Conversely, it would be useful to have a way to destroy the current value (and possibly the key too? Maybe not) when a `vkd3d_shader_cache_put()` call is replacing it. Maybe we could pass a destructor to `vkd3d_shader_cache_put()` to be used on the value that is getting replaced, if any. Or simply pass the old value back to the caller, so it is dealt with appropriately.
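A minimal sketch of both suggestions on a toy in-memory store: a hypothetical `EXAMPLE_PUT_NO_REPLACE` flag that makes the put fail if the key already exists, and an `old_value` out-parameter that hands a replaced value back to the caller instead of freeing it. Keys are plain 64-bit integers for brevity; the store, function and flags are illustrative only, since the real cache uses an rbtree and its own locking.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_MAX_ENTRIES 16

/* Hypothetical put flags. */
enum example_put_flags
{
    EXAMPLE_PUT_REPLACE    = 0x0, /* Overwrite an existing value. */
    EXAMPLE_PUT_NO_REPLACE = 0x1, /* Fail if the key already exists. */
};

enum example_result
{
    EXAMPLE_OK = 0,
    EXAMPLE_ERROR_EXISTS = -1,
    EXAMPLE_ERROR_FULL = -2,
};

/* Toy store standing in for the real rbtree. */
struct example_store
{
    pthread_mutex_t mutex;
    unsigned int count;
    uint64_t keys[EXAMPLE_MAX_ENTRIES];
    void *values[EXAMPLE_MAX_ENTRIES];
};

/* Lookup and insert/replace happen under one lock acquisition, so with
 * NO_REPLACE two racing creators cannot both insert; the loser gets
 * EXAMPLE_ERROR_EXISTS. A replaced value is returned through old_value
 * (NULL otherwise) so the caller can destroy it. */
static enum example_result example_store_put(struct example_store *store, uint64_t key,
        void *value, unsigned int flags, void **old_value)
{
    enum example_result ret = EXAMPLE_OK;
    unsigned int i;

    *old_value = NULL;
    pthread_mutex_lock(&store->mutex);
    for (i = 0; i < store->count && store->keys[i] != key; ++i)
        ;
    if (i < store->count)
    {
        if (flags & EXAMPLE_PUT_NO_REPLACE)
            ret = EXAMPLE_ERROR_EXISTS;
        else
        {
            *old_value = store->values[i];
            store->values[i] = value;
        }
    }
    else if (store->count == EXAMPLE_MAX_ENTRIES)
        ret = EXAMPLE_ERROR_FULL;
    else
    {
        store->keys[store->count] = key;
        store->values[store->count++] = value;
    }
    pthread_mutex_unlock(&store->mutex);
    return ret;
}
```

Returning the old value rather than taking a destructor keeps the cache free of callbacks into the caller while its lock is held.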

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541#note_57173

Giovanni Mascellani (＠giomasce)

1:39 p.m.

Giovanni Mascellani (@giomasce) commented about libs/vkd3d/cache.c:

...

+#include "vkd3d_private.h"
+#include "rbtree.h"

+#include <stdarg.h>
+#include <stdio.h>

+int vkd3d_shader_cache_open(const char *name,
   const struct vkd3d_shader_cache_desc *desc, struct vkd3d_shader_cache **cache)
+{

FIXME("%s, %p, %p: stub!\n", debugstr_a(name), desc, cache);

return VKD3D_ERROR_NOT_IMPLEMENTED;

+}

+void vkd3d_shader_cache_close(struct vkd3d_shader_cache *cache)
+{

FIXME("Stub!\n");

Nitpick, but maybe `cache` should be dumped. Also in `vkd3d_shader_cache_delete_on_destroy()`.

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541#note_57174

Giovanni Mascellani (＠giomasce)

1:39 p.m.

Giovanni Mascellani (@giomasce) commented about libs/vkd3d/cache.c:

...

#include <stdarg.h>
#include <stdio.h>

+/* List of open caches. I expect the number to be small. */
+static struct list cache_list = LIST_INIT(cache_list);
+static struct vkd3d_mutex cache_list_mutex;
+static LONG cache_mutex_initialized;

+struct vkd3d_shader_cache
+{

LONG refcount;

struct vkd3d_shader_cache_desc desc;

struct list cache_list_entry;

char name[1];

We use C99, you don't need the `1`.
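For reference, a sketch of the C99 flexible array member form, with the allocation sized via `offsetof()` so it does not depend on trailing padding. The struct is trimmed to the fields needed here and the helper name is hypothetical; vkd3d itself would use `vkd3d_malloc()` rather than `malloc()`.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct example_cache
{
    long refcount;
    char name[]; /* C99 flexible array member; must be the last field. */
};

/* Allocates the struct and the NUL-terminated name in a single block. */
static struct example_cache *example_cache_create(const char *name)
{
    size_t len = strlen(name);
    struct example_cache *cache;

    if (!(cache = malloc(offsetof(struct example_cache, name) + len + 1)))
        return NULL;
    cache->refcount = 1;
    memcpy(cache->name, name, len + 1);
    return cache;
}
```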

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541#note_57175

Giovanni Mascellani (＠giomasce)

1:39 p.m.

Giovanni Mascellani (@giomasce) commented about libs/vkd3d/cache.c:

...

return VKD3D_ERROR_NOT_IMPLEMENTED;
struct vkd3d_shader_cache *object;

size_t size;

TRACE("%s, %p, %p.\n", debugstr_a(name), desc, cache);

if (!name || !desc)

{
   WARN("No name or description, returning VKD3D_ERROR_INVALID_ARGUMENT.\n");
   return E_INVALIDARG;
}

/* FIXME: This isn't thread safe and cache_mutex_initialized might overflow. Do we have a
* something like DllMain or a platform-independent InitializeOnce? */
if (InterlockedIncrement(&cache_mutex_initialized) == 1)
   vkd3d_mutex_init(&cache_list_mutex);

Both the platforms we support have static mutex initializers, we can just define them in the header. We can also add support for `InitializeOnce()` and `pthread_once()`, of course, but the static initializer seems better to me.

Also notice that we have platform-independent functions `vkd3d_atomic_increment()` and similar.
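A sketch of the static-initializer approach, assuming a pthread-based build (on Windows, `SRWLOCK_INIT` is an equivalent static initializer; that branch is omitted here). `struct example_mutex` and `EXAMPLE_MUTEX_INITIALIZER` are hypothetical stand-ins for vkd3d's mutex wrapper, and the counter stands in for the cache list.

```c
#include <pthread.h>

/* Hypothetical wrapper, shaped like a struct vkd3d_mutex-style type. */
struct example_mutex
{
    pthread_mutex_t lock;
};

#define EXAMPLE_MUTEX_INITIALIZER {PTHREAD_MUTEX_INITIALIZER}

/* Statically initialized: no init-once logic or overflow-prone counter is
 * needed before the first lock. */
static struct example_mutex cache_list_mutex = EXAMPLE_MUTEX_INITIALIZER;
static unsigned int open_cache_count;

static unsigned int example_register_cache(void)
{
    unsigned int count;

    pthread_mutex_lock(&cache_list_mutex.lock);
    count = ++open_cache_count;
    pthread_mutex_unlock(&cache_list_mutex.lock);
    return count;
}
```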

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541#note_57176

Giovanni Mascellani (＠giomasce)

1:39 p.m.

Giovanni Mascellani (@giomasce) commented about libs/vkd3d/cache.c:

...

+{

struct vkd3d_cache_object_v1 d;

struct rb_entry entry; /* Entry in the hash table. */

uint8_t *payload; /* App key + value. Separate allocation to allow eviction. */

+};

+static int vkd3d_shader_cache_compare_key(const void *key, const struct rb_entry *entry)
+{
const uint64_t *k = key;

const struct shader_cache_entry *e = RB_ENTRY_VALUE(entry, struct shader_cache_entry, entry);

if (*k < e->d.hash)
   return -1;
if (*k > e->d.hash)
   return 1;
return 0;

This only works as far as you keep this "experimental mode" in which you nuke the universe as soon as you find a collision. As soon as you want to fix that, the RB tree comparison function must be aware of the full key as well.
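A sketch of a comparison that stays a valid total order when hashes collide: order by hash first, then by key size, then by the key bytes. It assumes the search key is passed as a small struct carrying the hash together with the full key, and that the entry's key bytes are in memory, which is not necessarily true if payloads are loaded lazily. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical lookup key: the hash plus the full application key. */
struct example_cache_key
{
    uint64_t hash;
    const void *data;
    size_t size;
};

/* Returns <0, 0 or >0 like memcmp(). The hash only decides the order when the
 * hashes differ, so two different keys with equal hashes never compare equal. */
static int example_compare_key(const struct example_cache_key *a, const struct example_cache_key *b)
{
    if (a->hash != b->hash)
        return a->hash < b->hash ? -1 : 1;
    if (a->size != b->size)
        return a->size < b->size ? -1 : 1;
    if (!a->size)
        return 0;
    return memcmp(a->data, b->data, a->size);
}
```

In an rbtree comparison callback this would be called with the search key and the key stored in the entry.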

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541#note_57177

Giovanni Mascellani (＠giomasce)

1:39 p.m.

Giovanni Mascellani (@giomasce) commented about libs/vkd3d/vkd3d_private.h:

...

     ERR("Could not lock the mutex, error %d.\n", ret);
}

+static inline bool vkd3d_mutex_trylock(struct vkd3d_mutex *lock)
+{

/* FIXME: Untested. */

return !pthread_mutex_lock(&lock->lock);

`pthread_mutex_trylock()` exists. Technically you have to check for `EBUSY` though.
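A sketch of a trylock wrapper that distinguishes `EBUSY` (held by someone else, a normal outcome) from real errors. `struct example_mutex` is a hypothetical stand-in for the vkd3d wrapper, and `fprintf()` stands in for vkd3d's `ERR()`.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

struct example_mutex
{
    pthread_mutex_t lock;
};

/* Returns true if the lock was acquired, false if it is already held. */
static bool example_mutex_trylock(struct example_mutex *mutex)
{
    int ret = pthread_mutex_trylock(&mutex->lock);

    if (ret == EBUSY)
        return false;
    if (ret)
    {
        fprintf(stderr, "Could not lock the mutex, error %d.\n", ret);
        return false;
    }
    return true;
}
```

For a default (non-recursive) mutex, POSIX specifies that trylock returns `EBUSY` even when the calling thread itself holds the lock.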

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541#note_57178

Giovanni Mascellani (＠giomasce)

1:39 p.m.

Giovanni Mascellani (@giomasce) commented about libs/vkd3d/state.c:

...

-void vkd3d_render_pass_cache_init(struct vkd3d_render_pass_cache *cache)
+struct vkd3d_shader_cache *vkd3d_render_pass_cache_init(struct d3d12_device *device)
 {

cache->render_passes = NULL;

cache->render_pass_count = 0;

cache->render_passes_size = 0;

struct vkd3d_shader_cache_desc cache_desc = {0};

struct vkd3d_shader_cache *cache;

enum vkd3d_result ret;

char cache_name[128];

cache_desc.mem_size = ~0;

cache_desc.max_entries = ~0;

cache_desc.flags = VKD3D_SHADER_CACHE_FLAGS_NO_SERIALIZE;

sprintf(cache_name, "memory:%p:renderpass", device);

I guess here `%p` is to make sure that each device has its own private cache. But that still requires iterating through all the available caches with a global lock and comparing strings. Wouldn't it be better to just make the cache private by default and have a flag to opt in sharing?

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541#note_57179

Giovanni Mascellani (＠giomasce)

1:39 p.m.

Giovanni Mascellani (@giomasce) commented about include/vkd3d.h:

...

*/
VKD3D_SHADER_CACHE_FLAGS_NONE,

/**
* Don't acquire the cache mutex before access.
*/
VKD3D_SHADER_CACHE_FLAGS_NO_SERIALIZE,

VKD3D_FORCE_32_BIT_ENUM(VKD3D_SHADER_CACHE_FLAGS),
+};

+/**

Huhu document me

\since 1.10

*/

+struct vkd3d_shader_cache_desc

Maybe we should use the Vulkan convention for extensible structures here?
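For illustration, the Vulkan convention applied to a hypothetical cache description: every structure starts with a type tag and a `next` pointer, so new parameters can later be added as chained structures without changing existing layouts. vkd3d-shader's public structures such as `vkd3d_shader_compile_info` already follow this pattern with `type` and `next` fields; the type values and struct names below are made up for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structure type tags. */
enum example_structure_type
{
    EXAMPLE_STRUCTURE_TYPE_CACHE_DESC,
    EXAMPLE_STRUCTURE_TYPE_CACHE_DISK_INFO,
};

/* Common header shared by all chainable structures. */
struct example_struct_header
{
    enum example_structure_type type;
    const void *next;
};

struct example_cache_desc
{
    enum example_structure_type type; /* EXAMPLE_STRUCTURE_TYPE_CACHE_DESC */
    const void *next;                 /* Optional chain of extension structures. */
    uint32_t mem_size;
    uint32_t max_entries;
    uint32_t flags;
};

/* A possible later extension, chained through next. */
struct example_cache_disk_info
{
    enum example_structure_type type; /* EXAMPLE_STRUCTURE_TYPE_CACHE_DISK_INFO */
    const void *next;
    uint32_t disk_size;
};

/* Walks the next chain and returns the first structure of the requested type. */
static const void *example_find_struct(const void *chain, enum example_structure_type type)
{
    const struct example_struct_header *header;

    for (header = chain; header; header = header->next)
    {
        if (header->type == type)
            return header;
    }
    return NULL;
}
```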

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541#note_57180

Giovanni Mascellani (＠giomasce)

1:39 p.m.

Giovanni Mascellani (@giomasce) commented about libs/vkd3d/cache.c:

...

       cache->desc.disk_size = 0; /* Convert to mem only. */
       vkd3d_free(filename);
       return;
   }
}
sprintf(filename, "%s.idx", cache->name);
indices = fopen(filename, "rb");
if (!indices)
{

   /* This happens when the cache files did not exist. Keep the opened
    * values file, we'll use it later. */
   WARN("Index file %s not found.\n", filename);
   vkd3d_free(filename);
   return;
}

Given that you dump and read the whole database each time you might as well use a single file, which should decrease the corruption probability (for example because two processes read/write concurrently to the same database).
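If the whole database is rewritten on exit anyway, one common way to reduce the corruption risk is to write the complete snapshot to a temporary file and then `rename()` it over the old one. On POSIX file systems the rename replaces the target atomically, so readers see either the old or the new file, never a half-written one. This sketches that general technique, not what the patch does; the function name and the `.tmp` suffix are assumptions, and concurrent writers would still need locking or unique temporary names. On Windows, `rename()` fails if the target exists, so `MoveFileExW()` with `MOVEFILE_REPLACE_EXISTING` would be needed instead.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Writes data to "<path>.tmp" and renames it over path, so a crash mid-write
 * leaves the previous file intact. Not durable against power loss: no
 * fsync() is done here. */
static bool example_write_file_atomically(const char *path, const void *data, size_t size)
{
    size_t len = strlen(path);
    char *tmp_path;
    bool ok = false;
    FILE *f;

    if (!(tmp_path = malloc(len + sizeof(".tmp"))))
        return false;
    memcpy(tmp_path, path, len);
    memcpy(tmp_path + len, ".tmp", sizeof(".tmp"));

    if ((f = fopen(tmp_path, "wb")))
    {
        ok = fwrite(data, 1, size, f) == size;
        ok = !fclose(f) && ok;
        if (ok)
            ok = !rename(tmp_path, path);
        if (!ok)
            remove(tmp_path);
    }
    free(tmp_path);
    return ok;
}
```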

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541#note_57181

Conor McCarthy (＠cmccarthy)

2:32 p.m.

On Thu Jan 11 13:37:25 2024 +0000, Giovanni Mascellani wrote:

...

Both the platforms we support have static mutex initializers, we can just define them in the header. We can also add support for `InitializeOnce()` and `pthread_once()`, of course, but the static initializer seems better to me. Also notice that we have platform-independent functions `vkd3d_atomic_increment()` and similar.

!384 has a static mutex initializer, `VKD3D_MUTEX_INITIALIZER`

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541#note_57188

Henri Verbeet (＠hverbeet)

2:36 p.m.

On Mon Jan 8 13:50:51 2024 +0000, Conor McCarthy wrote:

...

Unless there are unforeseen complications, the 1.5 Mb for SQLite may well be worth it. My two cents also.

I already mentioned this on IRC, duplicating here for the benefit of others:

I think SQLite may be worth a look, to see how it compares to the current implementation. A couple of points though:

- To some extent the storage backend is a bit of an implementation detail; ideally it shouldn't be too hard to replace the storage backend, allow for multiple different implementations, or perhaps even to allow the user of vkd3d-shader to provide its own implementation.

- The main argument against SQLite is likely the cost of adding an effectively non-optional dependency. In particular, while SQLite is commonly available on Linux distributions, things are a bit harder on Windows/Wine. The implication would likely be adding SQLite as a bundled library to Wine (as well as CrossOver, but we're not necessarily terribly concerned about CrossOver here), so it may be worth getting @julliard's views on that beforehand.

- In general, filesystems tend to be pretty decent at storing files; being able to do significantly better by storing them in a database instead would seem like a surprising result, but I guess we'll find out.

- Are there any existing Win32 APIs we could use for the storage backend?
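To make the first point concrete, a minimal sketch of a pluggable storage backend: the cache core would only call through a table of function pointers, so a flat-file, SQLite-based or caller-supplied implementation could be swapped in. Every name here is hypothetical, and the toy backend stores a single small entry purely to exercise the interface.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical backend interface; the cache core only calls through this. */
struct example_storage_ops
{
    bool (*put)(void *ctx, const void *key, size_t key_size, const void *value, size_t value_size);
    /* On input *value_size is the buffer size; on success, the stored size. */
    bool (*get)(void *ctx, const void *key, size_t key_size, void *value, size_t *value_size);
};

/* Toy backend holding one small entry. */
struct example_single_entry
{
    uint8_t key[32], value[32];
    size_t key_size, value_size;
    bool valid;
};

static bool single_put(void *ctx, const void *key, size_t key_size, const void *value, size_t value_size)
{
    struct example_single_entry *e = ctx;

    if (key_size > sizeof(e->key) || value_size > sizeof(e->value))
        return false;
    memcpy(e->key, key, key_size);
    memcpy(e->value, value, value_size);
    e->key_size = key_size;
    e->value_size = value_size;
    e->valid = true;
    return true;
}

static bool single_get(void *ctx, const void *key, size_t key_size, void *value, size_t *value_size)
{
    struct example_single_entry *e = ctx;

    if (!e->valid || e->key_size != key_size || memcmp(e->key, key, key_size))
        return false;
    if (*value_size < e->value_size)
        return false;
    memcpy(value, e->value, e->value_size);
    *value_size = e->value_size;
    return true;
}

static const struct example_storage_ops single_entry_ops = {single_put, single_get};
```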

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541#note_57193

Stefan Dösinger (＠stefan)

3:16 p.m.

On Mon Jan 8 13:48:50 2024 +0000, Conor McCarthy wrote:

...

An ideal 64-bit hash has a 50% collision probability on about 5 billion entries. I think we should aim for a collision to never occur once, and 64 bits is not enough. Ideally a memcmp() of the entire key would be done in the comparison function after a hash match is found. I'm not sure of the performance of the moltenvk hash, but a simple prime multiplication hash like [FNV-1a](https://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function) probably has fewer collisions.

The idea behind the separate *payload allocation is to load it on demand and free it when the cache grows beyond the application-specified memory size. The rbtree search callback may not have the entire key available.

There are certainly ways to deal with that: load the contents from disk in the search callback (which means passing the cache or rbtree pointer somehow), or load them from the caller and re-run the search, and compare the full key only if *payload is allocated. Though I guess Microsoft created DXGI_ERROR_CACHE_HASH_COLLISION for some reason.

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541#note_57199

Stefan Dösinger (＠stefan)

3:17 p.m.

On Thu Jan 11 13:37:18 2024 +0000, Giovanni Mascellani wrote:

...

Maybe "does not match"?

Indeed

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541#note_57200

Stefan Dösinger (＠stefan)

3:23 p.m.

On Thu Jan 11 13:37:20 2024 +0000, Giovanni Mascellani wrote:

...

Do these sizes count the sum of the sizes of stored keys and values, or the actual storage occupation, which in general I suppose might be larger (or even smaller, if compression is used)? Also, I don't like too much the idea that setting a specific (degenerate, but in principle valid) value for a numeric parameter triggers a behavior change (like the multiple handles to the same object below). Could `in_memory` be a separate boolean parameter?

As far as ID3D12ShaderCacheSession is concerned, both key and value sizes count towards the size (See https://gitlab.winehq.org/stefan/vkd3d/-/tree/cache-rework for tests which aren't included in this MR). I haven't tested the impact of compression and haven't checked if native compresses the storage at all.

I think a flag for specifying a memory-only cache is a good idea. I (ab)used the disk_size = 0 before adding the flags field to the cache desc structure.

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541#note_57201

Stefan Dösinger (＠stefan)

3:28 p.m.

On Thu Jan 11 13:37:21 2024 +0000, Giovanni Mascellani wrote:

...

Is it allowed to open the same file more than once in the same process? From different processes? What happens if you use a different name for the same file, e.g. because of a non canonical path, symlinks, hardlinks?

In native it only works in the same process. A memory-only cache opened from a separate process doesn't magically share data. An on-disk cache that's opened from different processes returns an error, even if the version matches (I think an HRESULT-wrapped ERROR_SHARING_VIOLATION, but I am not sure any more; it was just an ad-hoc test, not something I included in the tests).

Native doesn't allow specifying a file path like this, just a GUID and whether it should be placed in the current working directory or the user's home directory. d3d12's own caches use a naming scheme that differs from this, though (something like the .exe filename, GPU name and GPU PCI IDs).

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541#note_57202

Alexandre Julliard (＠julliard)

3:33 p.m.

...

The main argument against SQLite is likely the cost of adding an effectively non-optional dependency. In particular, while SQLite is commonly available on Linux distributions, things are a bit harder on Windows/Wine. The implication would likely be adding SQLite as a bundled library to Wine (as well as CrossOver, but we're not necessarily terribly concerned about CrossOver here), so it may be worth getting @julliard's views on that beforehand.

SQLite is shipped with Windows as winsqlite3.dll, so it will be added to Wine sooner or later.

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541#note_57203

Stefan Dösinger (＠stefan)

4:04 p.m.

On Thu Jan 11 13:37:26 2024 +0000, Giovanni Mascellani wrote:

...

This only works as far as you keep this "experimental mode" in which you nuke the universe as soon as you find a collision. As soon as you want to fix that, the RB tree comparison function must be aware of the full key as well.

I don't think it does? _put / _get compare the key (as mentioned in the other comment, the idea is to load *payload from disk once the hash matches). There's a return value for hash collisions.

The "nuke the universe" aka exit(1) is there because I haven't yet seen a hash collision with the 64 bit hashes nor been able to produce one on Windows (and if we manage to deliberately produce one something is really wrong). I did see the exit(1) triggered due to bugs in my code though, e.g. incorrectly loading data from disk etc. So it did catch bugs by being noisy. I don't intend to have this in a non-draft MR though, just a loud FIXME.

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541#note_57213

Stefan Dösinger (＠stefan)

4:05 p.m.

On Thu Jan 11 13:37:27 2024 +0000, Giovanni Mascellani wrote:

...

`pthread_mutex_trylock()` exists. Technically you have to check for `EBUSY` though.

Oops, I wanted to use trylock() here. But still, I didn't test the Linux build codepath beyond checking that it compiles (and compiles without printf format warnings).

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541#note_57214

Stefan Dösinger (＠stefan)

4:19 p.m.

On Thu Jan 11 13:37:30 2024 +0000, Giovanni Mascellani wrote:

...

Maybe we should use the Vulkan convention for extensible structures here?

yeah that sounds like a good idea

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541#note_57217

Stefan Dösinger (＠stefan)

4:20 p.m.

On Thu Jan 11 13:37:31 2024 +0000, Giovanni Mascellani wrote:

...

Given that you dump and read the whole database each time you might as well use a single file, which should decrease the corruption probability (for example because two processes read/write concurrently to the same database).

That's not the plan in the long run, though. (The "cache-rework" branch, which I should have called cache-pre-rework, has delayed loading and partial updates, though not eviction.)

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541#note_57218

Stefan Dösinger (＠stefan)

4:43 p.m.

On Thu Jan 11 15:33:42 2024 +0000, Alexandre Julliard wrote:

...

...
The main argument against SQLite is likely the cost of adding an effectively non-optional dependency. In particular, while SQLite is commonly available on Linux distributions, things are a bit harder on Windows/Wine. The implication would likely be adding SQLite as a bundled library to Wine (as well as CrossOver, but we're not necessarily terribly concerned about CrossOver here), so it may be worth getting @julliard's views on that beforehand.

SQLite is shipped with Windows as winsqlite3.dll, so it will be added to Wine sooner or later.

I am not aware of any Win32 APIs suitable here, other than the mentioned winsqlite3.dll. There's cabinet.dll, but as far as I understand it it will only unpack the full file. Afaics we don't need to extract it to disk (carefully written callbacks could just write to memory). It would be platform specific, so we'd need some codepath for Unix too. I don't think it has the performance characteristics we want either.

Microsoft had some database libraries over time, but they aren't part of Win32. Unless we pack the cache into PE resource files or WritePrivateProfileString .ini files.

There's an LZ77 decompression algorithm in kernel32, but nothing to compress.

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541#note_57221

Henri Verbeet (＠hverbeet)

5:18 p.m.

...

It would be platform specific, so we'd need some codepath for Unix too.

Right, the idea in such a case would be to use one API on Windows and something else on Linux, although that probably comes with its own set of pitfalls.

In any case, having SQLite shipped with Windows certainly makes it a lot more feasible as a potential option.

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541#note_57225

wine-gitlab@winehq.org

50 comments

6 participants

tags (0)

participants (6)

Alexandre Julliard (＠julliard)
Conor McCarthy (＠cmccarthy)
Giovanni Mascellani (＠giomasce)
Henri Verbeet (＠hverbeet)
Stefan Dösinger
Stefan Dösinger (＠stefan)