[PATCH v2 0/22] MR541: RFC: Shader cache for vkd3d

Stefan Dösinger (＠stefan)

14 Jan 2024

9:51 a.m.

Here is a preview of my shader cache work for early comments. It isn't complete, but does successfully cache things.

What's there:
* A new vkd3d API that is used internally for caching, can be used to implement the ID3D12ShaderCacheSession interface and hopefully be used by wined3d as well
* Simple saving and loading of the cached objects
* It is used to cache render passes, root signatures and pipeline states

What is not yet there:
* Partial cache loading and eviction
* ID3D12ShaderCacheSession - largely because it needs bumping ID3D12Device up to version 9, which may bring unrelated regressions. For this and tests see my "cache-rework" branch (which
* Cache file compression
* Incremental updates of cache files - right now they are rewritten from scratch on exit
* Loading the cache in an extra thread. The pipeline state creation code will need some refactoring for that

I am not quite happy yet with the two patches that write and reload actual graphics pipelines. The way I am storing the d3d settings isn't quite consistent yet either - in some cases I use the d3d input data as the key directly, in others I store it as a value attached to a hash. The latter is usually the case if I need to cross-reference something, e.g. have a link from the pipeline state to the root signature. This kind of setup shows how wined3d can build a chain of linked state though.

There are also known issues with locking, explained in comments in the patches.

-- v2:
vkd3d: Try to find a read-only cache in C:\windows\scache
vkd3d: Cache and preload compute pipelines.
DEBUG: Make cache profiling more visible
vkd3d: Add some cache efficiency debug code.
vkd3d: Add EXT_pipeline_creation_feedback.
vkd3d: Catch and release graphics pipelines.
Store graphics pipelines in the cache.
vkd3d: Precreate root signatures from cache
vkd3d: Keep root signatures around.
Store render passes in the on-disk cache and recreate them on startup.
vkd3d: Store the VK pipeline cache in an on-disk vkd3d cache.
vkd3d: Keep the application name around.
Add a win32 version of vkd3d_get_program_name.
vkd3d: Basic shader cache writing and reading.
vkd3d: Replace the custom render pass cache with vkd3d_shader_cache.
vkd3d: Implement vkd3d_shader_cache_enumerate.
Add cache locking.
vkd3d: Implement vkd3d_shader_cache_get.
vkd3d: Implement vkd3d_shader_cache_put.
Create and destroy the shader cache tree.
vkd3d: Implement shader_cache_open/close.
vkd3d: Define and stub the shader cache API.

https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541

Stefan Dösinger

14 Jan

9:51 a.m.

New subject: [PATCH v2 01/22] vkd3d: Define and stub the shader cache API.

From: Stefan Dösinger stefan@codeweavers.com

---

Q1: Can I add those include files to vkd3d.h or is there a problem?

Why not in vkd3d-shader? Because vkd3d-shader has no locks/mutexes, and I'd like to do locking in the shader cache implementation instead of the caller.

Q2: Since this is not in vkd3d-shader I could use DXGI_ERROR_* instead of defining more vkd3d_result enums. I kinda like having this independent of dxgi types though.
---
 Makefile.am           |   1 +
 include/vkd3d.h       | 208 ++++++++++++++++++++++++++++++++++++++++++
 include/vkd3d_types.h |  10 ++
 libs/vkd3d/cache.c    |  60 ++++++++++++
 libs/vkd3d/vkd3d.map  |   6 ++
 5 files changed, 285 insertions(+)
 create mode 100644 libs/vkd3d/cache.c

diff --git a/Makefile.am b/Makefile.am index bc648b631..0c135b72e 100644 --- a/Makefile.am +++ b/Makefile.am @@ -332,6 +332,7 @@ libvkd3d_la_SOURCES = \ include/vkd3d_d3d12.idl \ include/vkd3d_d3dcommon.idl \ include/vkd3d_unknown.idl \ + libs/vkd3d/cache.c \ libs/vkd3d/command.c \ libs/vkd3d/device.c \ libs/vkd3d/resource.c \ diff --git a/include/vkd3d.h b/include/vkd3d.h index a3bb8e0dd..dd98bf34f 100644 --- a/include/vkd3d.h +++ b/include/vkd3d.h @@ -19,6 +19,8 @@ #ifndef __VKD3D_H #define __VKD3D_H

+#include <stdbool.h> +#include <stdint.h> #include <vkd3d_types.h>

#ifndef VKD3D_NO_WIN32_TYPES @@ -187,6 +189,72 @@ struct vkd3d_image_resource_create_info D3D12_RESOURCE_STATES present_state; };

+struct vkd3d_shader_cache; + +/** The output format of a compiled shader. */ +enum vkd3d_shader_cache_flags +{ + /** + * No particular behaviour modifications. + */ + VKD3D_SHADER_CACHE_FLAGS_NONE = 0x00000000, + /** + * Don't acquire the cache mutex before access. + */ + VKD3D_SHADER_CACHE_FLAGS_NO_SERIALIZE = 0x00000001, + /** + * Don't allow modifications, don't serialize back to a file. + */ + VKD3D_SHADER_CACHE_FLAGS_READ_ONLY = 0x00000002, + /** + * A memory only cache that is initially empty and gets discarded on close. + */ + VKD3D_SHADER_CACHE_FLAGS_MEMORY_ONLY = 0x00000004, + + VKD3D_FORCE_32_BIT_ENUM(VKD3D_SHADER_CACHE_FLAGS), +}; + +/** + * Huhu document me + * + * \since 1.10 + */ +struct vkd3d_shader_cache_desc +{ + /** Maximum amount of data the cache holds in memory. */ + uint32_t mem_size; + /** Maximum amount of data written to disk. Ignored for VKD3D_SHADER_CACHE_MEMORY_ONLY. */ + uint32_t disk_size; + /** Maximum number of cache entries. */ + uint32_t max_entries; + /** Random flags, what else. */ + enum vkd3d_shader_cache_flags flags; + /** An application-chosen version number. If the version of an existing + * cache on disk does not match, the old data will be discarded. */ + uint64_t version; +}; + +/** + * Callback function for vkd3d_shader_cache_enumerate. + * + * \ref key and \ref value become invalid after the callback returns and must not be freed or modified. + * + * \param key The application-specified key of the currently enumerated element. + * + * \param key_size Size of \ref key in bytes. + * + * \param value The value associated with \ref key. + * + * \param value_size Size of \ref value in bytes. + * + * \param context The context parameter passed to \ref vkd3d_shader_cache_enumerate. + * + * \return true if the enumeration should be continued, false to abort it. 
+ */ +typedef bool (vkd3d_shader_cache_traverse_func)(struct vkd3d_shader_cache *cache, + const void *key, uint32_t key_size, const void *value, + uint32_t value_size, void *context); + #ifdef LIBVKD3D_SOURCE # define VKD3D_API VKD3D_EXPORT #else @@ -282,6 +350,124 @@ VKD3D_API HRESULT vkd3d_create_versioned_root_signature_deserializer(const void */ VKD3D_API void vkd3d_set_log_callback(PFN_vkd3d_log callback);

+/** + * Creates a new shader cache or opens an existing one. + * + * \param name The name of the cache. In case of an on-disk cache, this is a file name. In case of a + * memory-only cache, opening the same name again in the same process will return the same + * vkd3d_shader_cache handle. + * Cache handles are reference counted, so vkd3d_shader_cache_close has to be called for each + * successful vkd3d_shader_cache_open invocation. + * + * \param desc Cache properties. See \ref vkd3d_shader_cache_desc. + * + * \param cache Return pointer of the opened or created cache. + * + * \return A member of \ref vkd3d_result. + * + * \since 1.10 + */ +VKD3D_API int vkd3d_shader_cache_open(const char *name, + const struct vkd3d_shader_cache_desc *desc, struct vkd3d_shader_cache **cache); + +/** + * Decrements the cache reference count, closing it if it falls to zero. + * + * \param cache The cache to close. + * + * \since 1.10 + */ +VKD3D_API void vkd3d_shader_cache_close(struct vkd3d_shader_cache *cache); + +/** + * Stores a key-value pair in a shader cache. + * + * \param cache The cache to store the value in. + * + * \param key An opaque key of key_size bytes. The cache does not parse the key in any way. If the + * key already exists, the existing value will be replaced. + * FIXME: For some users (e.g. the renderpass cache) it would be interesting to prevent replacement + * and get an error instead if the value already exists. Without this they need their own lock to + * have an atomic get() - create new object - put() sequence. + * + * \param key_size The size of \ref key in bytes. + * + * \param value The value to associate with \ref key. + * + * \param value_size The size of \ref value in bytes. + * + * \return A member of \ref vkd3d_result. 
+ * + * \since 1.10 + */ +VKD3D_API int vkd3d_shader_cache_put(struct vkd3d_shader_cache *cache, + const void *key, uint32_t key_size, const void *value, uint32_t value_size); + +/** + * Retrieves the stored value associated with a key in a shader cache. + * + * If the key is found, \ref value_size is set to the size of the value stored in the cache. If + * \ref value is non-NULL, and the input value of \ref value_size is equal to or larger than the + * size of the stored value, the stored value will be copied to the memory pointed to by \ref value. + * + * \param cache The cache to retrieve the value from. + * + * \param key The key to look up. + * + * \param key_size The size of \ref key in bytes. + * + * \param value The buffer where to write the value to, of size *value_size. This parameter may be + * NULL. + * + * \param value_size The size of \ref value in bytes. The size of the stored value will be returned + * here. + * + * \return A member of \ref vkd3d_result. + * + * \since 1.10 + */ +VKD3D_API int vkd3d_shader_cache_get(struct vkd3d_shader_cache *cache, + const void *key, uint32_t key_size, void *value, uint32_t *value_size); + +/** + * Marks an on-disk shader cache for deletion. + * + * When the final reference of \ref cache is released, the cache files on disk will be deleted. This + * function has no effect on memory-only caches, which are discarded after use in any case. + * + * \param cache The cache to delete. + * + * \since 1.10 + */ +VKD3D_API void vkd3d_shader_cache_delete_on_destroy(struct vkd3d_shader_cache *cache); + +/** + * Enumerates all key-value pairs in a cache. + * + * This function invokes \ref cb once for each stored entry. No particular enumeration order is + * guaranteed. + * The cache's lock is held during the entire operation, including when invoking the callback. + * + * This function does not make any guarantees about the order of enumeration. 
+ * + * FIXME: Should calling vkd3d_shader_cache_get or vkd3d_shader_cache_put on the same cache from the + * callback be allowed? I am inclined to say yes to _get, but not _put. In either case we need + * reentrant locks. _put that doesn't add or remove keys (just overwrites values) would be ok too, + * but that is harder to spell out. + * + * If we decide "no calls to _get", remove the \ref cache parameter from vkd3d_shader_cache_traverse_func. + * + * \param cache The cache which contents should be enumerated. + * + * \param cb callback function, see \ref vkd3d_shader_cache_traverse_func. + + * \param context An application-specified pointer that is passed to the callback for each invocation. + * + * \since 1.10 + */ +VKD3D_API void vkd3d_shader_cache_enumerate(struct vkd3d_shader_cache *cache, + vkd3d_shader_cache_traverse_func *cb, void *context); + #endif /* VKD3D_NO_PROTOTYPES */

/* @@ -328,6 +514,28 @@ typedef HRESULT (*PFN_vkd3d_create_versioned_root_signature_deserializer)(const /** Type of vkd3d_set_log_callback(). \since 1.4 */ typedef void (*PFN_vkd3d_set_log_callback)(PFN_vkd3d_log callback);

+/** Type of vkd3d_shader_cache_open(). \since 1.10 */ +typedef int (*PFN_vkd3d_shader_cache_open)(const char *name, + const struct vkd3d_shader_cache_desc *desc, struct vkd3d_shader_cache **cache); + +/** Type of vkd3d_shader_cache_close(). \since 1.10 */ +typedef void (*PFN_vkd3d_shader_cache_close)(struct vkd3d_shader_cache *cache); + +/** Type of vkd3d_shader_cache_put(). \since 1.10 */ +typedef int (*PFN_vkd3d_shader_cache_put)(struct vkd3d_shader_cache *cache, + const void *key, uint32_t key_size, const void *value, uint32_t value_size); + +/** Type of vkd3d_shader_cache_get(). \since 1.10 */ +typedef int (*PFN_vkd3d_shader_cache_get)(struct vkd3d_shader_cache *cache, + const void *key, uint32_t key_size, void *value, uint32_t *value_size); + +/** Type of vkd3d_shader_cache_delete_on_destroy(). \since 1.10 */ +typedef void (*PFN_vkd3d_shader_cache_delete_on_destroy)(struct vkd3d_shader_cache *cache); + +/** Type of vkd3d_shader_cache_enumerate(). \since 1.10 */ +typedef void (*PFN_vkd3d_shader_cache_enumerate)(struct vkd3d_shader_cache *cache, + vkd3d_shader_cache_traverse_func *cb, void *context); + #ifdef __cplusplus } #endif /* __cplusplus */ diff --git a/include/vkd3d_types.h b/include/vkd3d_types.h index 4a7aca236..041f2fed4 100644 --- a/include/vkd3d_types.h +++ b/include/vkd3d_types.h @@ -51,6 +51,16 @@ enum vkd3d_result VKD3D_ERROR_INVALID_SHADER = -4, /** The operation is not implemented in this version of vkd3d. */ VKD3D_ERROR_NOT_IMPLEMENTED = -5, + /** The requested shader cache key was not found. */ + VKD3D_ERROR_NOT_FOUND = -6, + /** The requested shader cache value was bigger than the passed buffer. */ + VKD3D_ERROR_MORE_DATA = -7, + /** A different key with the same hash was found in the shader cache. */ + VKD3D_ERROR_HASH_COLLISION = -8, + /** A shader cache with the same name but different version is already opened. */ + VKD3D_ERROR_VERSION_MISMATCH = -9, + /** The cache lock is contended. */ + VKD3D_ERROR_LOCK_NOT_AVAILABLE = -10,

VKD3D_FORCE_32_BIT_ENUM(VKD3D_RESULT), }; diff --git a/libs/vkd3d/cache.c b/libs/vkd3d/cache.c new file mode 100644 index 000000000..6fe12b10e --- /dev/null +++ b/libs/vkd3d/cache.c @@ -0,0 +1,60 @@ +/* + * Copyright 2024 Stefan Dösinger for CodeWeavers + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA + */ + +#include "vkd3d_private.h" +#include "rbtree.h" + +#include <stdarg.h> +#include <stdio.h> + +int vkd3d_shader_cache_open(const char *name, + const struct vkd3d_shader_cache_desc *desc, struct vkd3d_shader_cache **cache) +{ + FIXME("%s, %p, %p: stub!\n", debugstr_a(name), desc, cache); + return VKD3D_ERROR_NOT_IMPLEMENTED; +} + +void vkd3d_shader_cache_close(struct vkd3d_shader_cache *cache) +{ + FIXME("%p Stub!\n", cache); +} + +int vkd3d_shader_cache_put(struct vkd3d_shader_cache *cache, + const void *key, uint32_t key_size, const void *value, uint32_t value_size) +{ + FIXME("%p, %p, %#x, %p, %#x stub!\n", cache, key, key_size, value, value_size); + return VKD3D_ERROR_NOT_IMPLEMENTED; +} + +int vkd3d_shader_cache_get(struct vkd3d_shader_cache *cache, + const void *key, uint32_t key_size, void *value, uint32_t *value_size) +{ + FIXME("%p, %p, %#x, %p, %p stub!\n", cache, key, key_size, value, value_size); + return VKD3D_ERROR_NOT_IMPLEMENTED; +} + +void 
vkd3d_shader_cache_delete_on_destroy(struct vkd3d_shader_cache *cache) +{ + FIXME("%p Stub!\n", cache); +} + +void vkd3d_shader_cache_enumerate(struct vkd3d_shader_cache *cache, + vkd3d_shader_cache_traverse_func *cb, void *context) +{ + FIXME("%p, %p, %p: stub!\n", cache, cb, context); +} diff --git a/libs/vkd3d/vkd3d.map b/libs/vkd3d/vkd3d.map index 441b2e35b..9e7bdbe9e 100644 --- a/libs/vkd3d/vkd3d.map +++ b/libs/vkd3d/vkd3d.map @@ -23,6 +23,12 @@ global: vkd3d_serialize_root_signature; vkd3d_serialize_versioned_root_signature; vkd3d_set_log_callback; + vkd3d_shader_cache_close; + vkd3d_shader_cache_delete_on_destroy; + vkd3d_shader_cache_enumerate; + vkd3d_shader_cache_get; + vkd3d_shader_cache_open; + vkd3d_shader_cache_put;

local: *; };

-- GitLab https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541

Stefan Dösinger

9:51 a.m.

New subject: [PATCH v2 02/22] vkd3d: Implement shader_cache_open/close.

From: Stefan Dösinger stefan@codeweavers.com

---
 libs/vkd3d/cache.c | 79 ++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 76 insertions(+), 3 deletions(-)

diff --git a/libs/vkd3d/cache.c b/libs/vkd3d/cache.c index 6fe12b10e..1dce309f9 100644 --- a/libs/vkd3d/cache.c +++ b/libs/vkd3d/cache.c @@ -22,16 +22,89 @@ #include <stdarg.h> #include <stdio.h>

+/* List of open caches. I expect the number to be small. */ +static struct list cache_list = LIST_INIT(cache_list); +static struct vkd3d_mutex cache_list_mutex; +static LONG cache_mutex_initialized; + +struct vkd3d_shader_cache +{ + LONG refcount; + struct vkd3d_shader_cache_desc desc; + struct list cache_list_entry; + char name[1]; +}; + int vkd3d_shader_cache_open(const char *name, const struct vkd3d_shader_cache_desc *desc, struct vkd3d_shader_cache **cache) { - FIXME("%s, %p, %p: stub!\n", debugstr_a(name), desc, cache); - return VKD3D_ERROR_NOT_IMPLEMENTED; + struct vkd3d_shader_cache *object; + size_t size; + + TRACE("%s, %p, %p.\n", debugstr_a(name), desc, cache); + + if (!name || !desc) + { + WARN("No name or description, returning VKD3D_ERROR_INVALID_ARGUMENT.\n"); + return VKD3D_ERROR_INVALID_ARGUMENT; + } + + /* FIXME: This isn't thread safe and cache_mutex_initialized might overflow. Do we have a + * something like DllMain or a platform-independent InitializeOnce? */ + if (InterlockedIncrement(&cache_mutex_initialized) == 1) + vkd3d_mutex_init(&cache_list_mutex); + + vkd3d_mutex_lock(&cache_list_mutex); + LIST_FOR_EACH_ENTRY(object, &cache_list, struct vkd3d_shader_cache, cache_list_entry) + { + if (!strcmp(object->name, name)) + { + TRACE("found an open cache of name %s.\n", debugstr_a(name)); + if (object->desc.version != desc->version) + { + WARN("Version mismatch: %"PRIu64", %"PRIu64".\n", object->desc.version, desc->version); + vkd3d_mutex_unlock(&cache_list_mutex); + return VKD3D_ERROR_VERSION_MISMATCH; + } + InterlockedIncrement(&object->refcount); + *cache = object; + vkd3d_mutex_unlock(&cache_list_mutex); + return VKD3D_OK; + } + } + + size = strlen(name) + 1; + object = vkd3d_calloc(1, offsetof(struct vkd3d_shader_cache, name[size])); + if (!object) + { + vkd3d_mutex_unlock(&cache_list_mutex); + return VKD3D_ERROR_OUT_OF_MEMORY; + } + + object->refcount = 1; + object->desc = *desc; + memcpy(object->name, name, size); + + 
list_add_head(&cache_list, &object->cache_list_entry); + vkd3d_mutex_unlock(&cache_list_mutex); + + *cache = object; + return VKD3D_OK; }

void vkd3d_shader_cache_close(struct vkd3d_shader_cache *cache) { - FIXME("%p Stub!\n", cache); + ULONG refcount = InterlockedDecrement(&cache->refcount); + TRACE("cache %s refcount %u.\n", cache->name, refcount); + + if (refcount) + return; + + vkd3d_mutex_lock(&cache_list_mutex); + list_remove(&cache->cache_list_entry); + vkd3d_mutex_unlock(&cache_list_mutex); + + vkd3d_free(cache); }

int vkd3d_shader_cache_put(struct vkd3d_shader_cache *cache,

-- GitLab https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541
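The open/close logic in the patch above (look up an already open cache by name, bump its reference count, otherwise allocate a new object with the name stored inline) can be sketched outside vkd3d roughly as follows. This is a single-threaded sketch under stated assumptions: `toy_cache`, `toy_open` and `toy_close` are hypothetical names, the list is a plain singly linked list instead of vkd3d's `struct list`, there is no mutex, and a C99 flexible array member replaces the `name[1]` idiom.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct vkd3d_shader_cache. */
struct toy_cache
{
    unsigned int refcount;
    uint64_t version;
    struct toy_cache *next;
    char name[];            /* Name stored inline, after the struct. */
};

static struct toy_cache *toy_cache_list;

/* Returns 0 on success, -1 on version mismatch or allocation failure. */
static int toy_open(const char *name, uint64_t version, struct toy_cache **out)
{
    struct toy_cache *c;
    size_t size;

    assert(name && out);
    size = strlen(name) + 1;

    /* Reuse an already open cache of the same name. */
    for (c = toy_cache_list; c; c = c->next)
    {
        if (strcmp(c->name, name))
            continue;
        if (c->version != version)
            return -1;
        ++c->refcount;
        *out = c;
        return 0;
    }

    /* Not open yet: one allocation holds the struct and the name. */
    if (!(c = calloc(1, sizeof(*c) + size)))
        return -1;
    c->refcount = 1;
    c->version = version;
    memcpy(c->name, name, size);
    c->next = toy_cache_list;
    toy_cache_list = c;
    *out = c;
    return 0;
}

static void toy_close(struct toy_cache *cache)
{
    struct toy_cache **p;

    assert(cache && cache->refcount);
    if (--cache->refcount)
        return;

    /* Last reference: unlink from the list and free. */
    for (p = &toy_cache_list; *p; p = &(*p)->next)
    {
        if (*p == cache)
        {
            *p = cache->next;
            break;
        }
    }
    free(cache);
}
```

Opening the same name twice yields the same pointer with a reference count of 2, and the object is freed only after the second `toy_close`. A threaded version would presumably want the decrement and the list removal under the same lock, since the patch decrements the count before taking `cache_list_mutex`; that may be one of the locking issues mentioned in the cover letter.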

Stefan Dösinger

9:51 a.m.

New subject: [PATCH v2 03/22] Create and destroy the shader cache tree.

From: Stefan Dösinger stefan@codeweavers.com

---
 libs/vkd3d/cache.c | 47 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)

diff --git a/libs/vkd3d/cache.c b/libs/vkd3d/cache.c index 1dce309f9..cddb40eff 100644 --- a/libs/vkd3d/cache.c +++ b/libs/vkd3d/cache.c @@ -27,14 +27,52 @@ static struct list cache_list = LIST_INIT(cache_list); static struct vkd3d_mutex cache_list_mutex; static LONG cache_mutex_initialized;

+/* Data structures used in the serialized files. Changing these will break compatibility with + * existing cache files, so bump the cache version if doing so. + * + * We don't intend these files to be read by third party code, so consider them a vkd3d + * implementation detail. */ +struct vkd3d_cache_object_v1 +{ + uint64_t hash; + uint32_t offset; /* Where key + value are located in the .val file. */ + uint32_t disk_size; /* Size of the entry in the .val file. May be compressed. */ + uint32_t key_size; /* Size of the app provided key. */ + uint32_t value_size; /* Size of the value. key_size + value_size = uncompressed entry size. */ +}; + +/* End disk data structures. */ + struct vkd3d_shader_cache { LONG refcount; struct vkd3d_shader_cache_desc desc; struct list cache_list_entry; + + struct rb_tree tree; + char name[1]; };

+struct shader_cache_entry +{ + struct vkd3d_cache_object_v1 d; + struct rb_entry entry; /* Entry in the hash table. */ + uint8_t *payload; /* App key + value. Separate allocation to allow eviction. */ +}; + +static int vkd3d_shader_cache_compare_key(const void *key, const struct rb_entry *entry) +{ + const uint64_t *k = key; + const struct shader_cache_entry *e = RB_ENTRY_VALUE(entry, struct shader_cache_entry, entry); + + if (*k < e->d.hash) + return -1; + if (*k > e->d.hash) + return 1; + return 0; +} + int vkd3d_shader_cache_open(const char *name, const struct vkd3d_shader_cache_desc *desc, struct vkd3d_shader_cache **cache) { @@ -84,6 +122,7 @@ int vkd3d_shader_cache_open(const char *name, object->refcount = 1; object->desc = *desc; memcpy(object->name, name, size); + rb_init(&object->tree, vkd3d_shader_cache_compare_key);

list_add_head(&cache_list, &object->cache_list_entry); vkd3d_mutex_unlock(&cache_list_mutex); @@ -92,6 +131,12 @@ int vkd3d_shader_cache_open(const char *name, return VKD3D_OK; }

+static void vkd3d_shader_cache_clear(struct rb_entry *entry, void *context) +{ + struct shader_cache_entry *e = RB_ENTRY_VALUE(entry, struct shader_cache_entry, entry); + vkd3d_free(e); +} + void vkd3d_shader_cache_close(struct vkd3d_shader_cache *cache) { ULONG refcount = InterlockedDecrement(&cache->refcount); @@ -104,6 +149,8 @@ void vkd3d_shader_cache_close(struct vkd3d_shader_cache *cache) list_remove(&cache->cache_list_entry); vkd3d_mutex_unlock(&cache_list_mutex);

+ rb_destroy(&cache->tree, vkd3d_shader_cache_clear, NULL); + vkd3d_free(cache); }

-- GitLab https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541
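The tree in this patch is ordered by the 64-bit hash, and the comparison callback uses explicit `<` and `>` tests. A short sketch of why that form is used: the difference of two `uint64_t` values does not fit into an `int`, so a subtraction-based comparator could report the wrong order. The names `compare_hash` and `sort_hashes` are hypothetical, and `qsort()` is only used here as a convenient consumer of a three-way comparison; the patch uses vkd3d's red-black tree instead.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Three-way comparison of two 64-bit hashes, in the style of the
 * patch's vkd3d_shader_cache_compare_key(). */
static int compare_hash(uint64_t a, uint64_t b)
{
    if (a < b)
        return -1;
    if (a > b)
        return 1;
    return 0;
}

/* qsort() adapter, only here to exercise the comparison. */
static int compare_hash_qsort(const void *a, const void *b)
{
    return compare_hash(*(const uint64_t *)a, *(const uint64_t *)b);
}

static void sort_hashes(uint64_t *hashes, size_t count)
{
    assert(hashes);
    qsort(hashes, count, sizeof(*hashes), compare_hash_qsort);
}
```

Keys that differ only in their upper 32 bits, such as `0x100000000` and `0`, are ordered correctly by this comparison.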

Stefan Dösinger

9:51 a.m.

New subject: [PATCH v2 04/22] vkd3d: Implement vkd3d_shader_cache_put.

From: Stefan Dösinger stefan@codeweavers.com

---

The exit(1) should obviously be removed before merging this.
---
 libs/vkd3d/cache.c | 128 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 126 insertions(+), 2 deletions(-)

diff --git a/libs/vkd3d/cache.c b/libs/vkd3d/cache.c index cddb40eff..75ba0b64a 100644 --- a/libs/vkd3d/cache.c +++ b/libs/vkd3d/cache.c @@ -73,6 +73,16 @@ static int vkd3d_shader_cache_compare_key(const void *key, const struct rb_entry return 0; }

+static void vkd3d_shader_cache_add_item(struct vkd3d_shader_cache *cache, struct shader_cache_entry *e) +{ + rb_put(&cache->tree, &e->d.hash, &e->entry); +} + +static void vkd3d_shader_cache_remove_item(struct vkd3d_shader_cache *cache, struct shader_cache_entry *e) +{ + rb_remove(&cache->tree, &e->entry); +} + int vkd3d_shader_cache_open(const char *name, const struct vkd3d_shader_cache_desc *desc, struct vkd3d_shader_cache **cache) { @@ -154,11 +164,125 @@ void vkd3d_shader_cache_close(struct vkd3d_shader_cache *cache) vkd3d_free(cache); }

+/* As the name implies this is taken from moltenvk. */ +#define MVKHASH_SEED 5381 +static inline uint64_t mvkHash64(const uint64_t *pVals, size_t count, uint64_t seed) +{ + uint64_t hash = seed; + for (size_t i = 0; i < count; ++i) + hash = ((hash << 5) + hash) ^ pVals[i]; + + return hash; +} + +static uint64_t hash_key(const void *key, size_t size) +{ + uint64_t last = 0, ret; + + ret = mvkHash64(key, size / sizeof(uint64_t), MVKHASH_SEED); + if (size % sizeof(uint64_t)) + { + const char *c = key; + /* FIXME: Endianess? */ + c += align(size, sizeof(uint64_t)) - sizeof(uint64_t); + memcpy(&last, c, size % sizeof(uint64_t)); + ret = mvkHash64(&last, 1, ret); + } + return ret; +} + +static bool vkd3d_shader_cache_trylock(struct vkd3d_shader_cache *cache) +{ + /* Not yet implemented. */ + return true; +} + +static void vkd3d_shader_cache_unlock(struct vkd3d_shader_cache *cache) +{ + /* Not yet implemented. */ +} + int vkd3d_shader_cache_put(struct vkd3d_shader_cache *cache, const void *key, uint32_t key_size, const void *value, uint32_t value_size) { - FIXME("%p, %p, %#x, %p, %#x stub!\n", cache, key, key_size, value, value_size); - return VKD3D_ERROR_NOT_IMPLEMENTED; + struct shader_cache_entry *e; + struct rb_entry *entry; + enum vkd3d_result ret; + uint64_t hash; + + TRACE("%p, %p, %#x, %p, %#x.\n", cache, key, key_size, value, value_size); + + if (cache->desc.flags & VKD3D_SHADER_CACHE_FLAGS_READ_ONLY) + { + WARN("Attempt to modify a read only cache.\n"); + return VKD3D_ERROR; + } + + if (!vkd3d_shader_cache_trylock(cache)) + { + WARN("Cache lock not available.\n"); + return VKD3D_ERROR_LOCK_NOT_AVAILABLE; + } + + hash = hash_key(key, key_size); + entry = rb_get(&cache->tree, &hash); + e = entry ? 
RB_ENTRY_VALUE(entry, struct shader_cache_entry, entry) : NULL; + + if (e && (e->d.key_size != key_size || memcmp(e->payload, key, key_size))) + { + FIXME("Actual case of hash collission found.\n"); + exit(1); + } + + if (e && e->d.value_size >= value_size) + { + if (e->d.value_size == value_size && !memcmp(e->payload + e->d.key_size, value, value_size)) + { + TRACE("No-op store call, existing item unchanged.\n"); + } + else + { + e->d.value_size = value_size; + memcpy(e->payload + e->d.key_size, value, value_size); + TRACE("Cache item %"PRIu64" overwritten.\n", hash); + } + ret = VKD3D_OK; + goto unlock; + } + else if (e) + { + vkd3d_free(e->payload); + vkd3d_shader_cache_remove_item(cache, e); + vkd3d_free(e); + } + + e = vkd3d_calloc(1, sizeof(*e)); + if (!e) + { + ret = VKD3D_ERROR_OUT_OF_MEMORY; + goto unlock; + } + e->payload = vkd3d_malloc(key_size + value_size); + if (!e->payload) + { + vkd3d_free(e); + ret = VKD3D_ERROR_OUT_OF_MEMORY; + goto unlock; + } + + e->d.key_size = key_size; + e->d.value_size = value_size; + e->d.hash = hash; + memcpy(e->payload, key, key_size); + memcpy(e->payload + key_size, value, value_size); + + vkd3d_shader_cache_add_item(cache, e); + TRACE("Cache item %"PRIu64" stored.\n", hash); + ret = VKD3D_OK; + +unlock: + vkd3d_shader_cache_unlock(cache); + return ret; }

int vkd3d_shader_cache_get(struct vkd3d_shader_cache *cache,

-- GitLab https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541
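The key hashing in this patch processes the key in 64-bit words and zero-pads the trailing bytes into one last word. The sketch below follows the same arithmetic but reads each word with `memcpy()`, on the assumption that application keys are not guaranteed to be 8-byte aligned. `toy_hash_key` and `hash_step` are hypothetical names. Like the patch's version, the result depends on host endianness.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HASH_SEED 5381

/* One mixing step, same arithmetic as mvkHash64() in the patch:
 * hash * 33, XORed with the next 64-bit word. */
static uint64_t hash_step(uint64_t hash, uint64_t word)
{
    return ((hash << 5) + hash) ^ word;
}

static uint64_t toy_hash_key(const void *key, size_t size)
{
    const unsigned char *bytes = key;
    uint64_t hash = HASH_SEED, word;
    size_t i, full = size / sizeof(word), tail = size % sizeof(word);

    assert(key || !size);

    for (i = 0; i < full; ++i)
    {
        /* memcpy() avoids dereferencing a possibly unaligned uint64_t. */
        memcpy(&word, bytes + i * sizeof(word), sizeof(word));
        hash = hash_step(hash, word);
    }
    if (tail)
    {
        /* Zero-pad the trailing bytes into one last word. */
        word = 0;
        memcpy(&word, bytes + full * sizeof(word), tail);
        hash = hash_step(hash, word);
    }
    return hash;
}
```

Because of the zero padding, a key of three zero bytes and a key of eight zero bytes hash to the same value. That is one concrete reason the put and get paths compare `key_size` and the key bytes after the tree lookup instead of trusting the hash alone.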

Stefan Dösinger

9:51 a.m.

New subject: [PATCH v2 05/22] vkd3d: Implement vkd3d_shader_cache_get.

From: Stefan Dösinger stefan@codeweavers.com

---
 libs/vkd3d/cache.c | 58 ++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 56 insertions(+), 2 deletions(-)

diff --git a/libs/vkd3d/cache.c b/libs/vkd3d/cache.c
index 75ba0b64a..04126c2f0 100644
--- a/libs/vkd3d/cache.c
+++ b/libs/vkd3d/cache.c
@@ -288,8 +288,62 @@ unlock:
 int vkd3d_shader_cache_get(struct vkd3d_shader_cache *cache,
         const void *key, uint32_t key_size, void *value, uint32_t *value_size)
 {
-    FIXME("%p, %p, %#x, %p, %p stub!\n", cache, key, key_size, value, value_size);
-    return VKD3D_ERROR_NOT_IMPLEMENTED;
+    struct shader_cache_entry *e;
+    struct rb_entry *entry;
+    enum vkd3d_result ret;
+    uint32_t size_in;
+    uint64_t hash;
+
+    TRACE("%p, %p, %#x, %p, %p.\n", cache, key, key_size, value, value_size);
+
+    if (!vkd3d_shader_cache_trylock(cache))
+    {
+        WARN("Cache lock not available.\n");
+        return VKD3D_ERROR_LOCK_NOT_AVAILABLE;
+    }
+
+    size_in = *value_size;
+
+    hash = hash_key(key, key_size);
+    entry = rb_get(&cache->tree, &hash);
+    if (!entry)
+    {
+        WARN("entry not found\n");
+        ret = VKD3D_ERROR_NOT_FOUND;
+        goto unlock;
+    }
+
+    e = RB_ENTRY_VALUE(entry, struct shader_cache_entry, entry);
+    if (key_size != e->d.key_size || memcmp(key, e->payload, key_size))
+    {
+        /* There is a return value for this, but I want to see if this ever happens. */
+        FIXME("Hash collission. sizes %u, %u. read from offset %x hash %"PRIu64"\n",
+                key_size, e->d.key_size, e->d.offset, e->d.hash);
+        exit(1);
+    }
+
+    *value_size = e->d.value_size;
+    if (!value)
+    {
+        TRACE("Found item, returning needed size %#x.\n", e->d.value_size);
+        ret = VKD3D_OK;
+        goto unlock;
+    }
+
+    if (size_in < e->d.value_size)
+    {
+        WARN("Output buffer is too small, got %#x want %#x.\n", size_in, e->d.value_size);
+        ret = VKD3D_ERROR_MORE_DATA;
+        goto unlock;
+    }
+
+    memcpy(value, e->payload + e->d.key_size, e->d.value_size);
+    ret = VKD3D_OK;
+    TRACE("Returning cached data.\n");
+
+unlock:
+    vkd3d_shader_cache_unlock(cache);
+    return ret;
 }
 
 void vkd3d_shader_cache_delete_on_destroy(struct vkd3d_shader_cache *cache)

-- GitLab https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541
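The size handling in `vkd3d_shader_cache_get` supports a "query the size, then fetch" calling pattern. The sketch below mirrors that contract on a single hypothetical entry, so it can be read without the tree and locking code. `toy_entry`, `toy_get` and the `TOY_*` result codes are assumptions made for illustration and are not part of vkd3d.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum toy_result
{
    TOY_OK = 0,
    TOY_ERROR_MORE_DATA = -1,
};

/* Hypothetical single stored value standing in for a cache entry. */
struct toy_entry
{
    const void *value;
    uint32_t value_size;
};

/* Mirrors the documented contract of vkd3d_shader_cache_get():
 * *value_size always receives the stored size, and data is copied
 * only when a large enough buffer is supplied. */
static enum toy_result toy_get(const struct toy_entry *e, void *value, uint32_t *value_size)
{
    uint32_t size_in;

    assert(e && value_size);

    size_in = *value_size;
    *value_size = e->value_size;
    if (!value)
        return TOY_OK;              /* Size query only. */
    if (size_in < e->value_size)
        return TOY_ERROR_MORE_DATA; /* Caller's buffer is too small. */
    memcpy(value, e->value, e->value_size);
    return TOY_OK;
}
```

A caller would typically call once with a NULL buffer to learn the size, allocate, and call again. Note that the input value of `*value_size` is read even for the size query, so it should be initialized before the first call.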

Stefan Dösinger

9:51 a.m.

New subject: [PATCH v2 06/22] Add cache locking.

From: Stefan Dösinger stefan@codeweavers.com

Why is this in the cache and not the caller? To allow for future improvements, e.g. reader-writer locks, that allow for simultaneous reads while iterating over the cache.

Why trylock? If we try to add a new pipeline to the cache while iterating over existing pipelines during startup, it is better to discard that new pipeline than block until after all pipelines are loaded.
---
 libs/vkd3d/cache.c         | 14 +++++++++++---
 libs/vkd3d/vkd3d_private.h | 11 +++++++++++
 2 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/libs/vkd3d/cache.c b/libs/vkd3d/cache.c
index 04126c2f0..c7f543a64 100644
--- a/libs/vkd3d/cache.c
+++ b/libs/vkd3d/cache.c
@@ -49,6 +49,7 @@ struct vkd3d_shader_cache
     struct vkd3d_shader_cache_desc desc;
     struct list cache_list_entry;
 
+    struct vkd3d_mutex lock;
     struct rb_tree tree;
 
     char name[1];
@@ -133,6 +134,7 @@ int vkd3d_shader_cache_open(const char *name,
     object->desc = *desc;
     memcpy(object->name, name, size);
     rb_init(&object->tree, vkd3d_shader_cache_compare_key);
+    vkd3d_mutex_init(&object->lock);
 
     list_add_head(&cache_list, &object->cache_list_entry);
     vkd3d_mutex_unlock(&cache_list_mutex);
@@ -160,6 +162,7 @@ void vkd3d_shader_cache_close(struct vkd3d_shader_cache *cache)
     vkd3d_mutex_unlock(&cache_list_mutex);
 
     rb_destroy(&cache->tree, vkd3d_shader_cache_clear, NULL);
+    vkd3d_mutex_destroy(&cache->lock);
 
     vkd3d_free(cache);
 }
@@ -193,13 +196,18 @@ static uint64_t hash_key(const void *key, size_t size)
 
 static bool vkd3d_shader_cache_trylock(struct vkd3d_shader_cache *cache)
 {
-    /* Not yet implemented. */
-    return true;
+    if (cache->desc.flags & (VKD3D_SHADER_CACHE_FLAGS_NO_SERIALIZE | VKD3D_SHADER_CACHE_FLAGS_READ_ONLY))
+        return true;
+
+    return vkd3d_mutex_trylock(&cache->lock);
 }
 
 static void vkd3d_shader_cache_unlock(struct vkd3d_shader_cache *cache)
 {
-    /* Not yet implemented. */
+    if (cache->desc.flags & (VKD3D_SHADER_CACHE_FLAGS_NO_SERIALIZE | VKD3D_SHADER_CACHE_FLAGS_READ_ONLY))
+        return;
+
+    vkd3d_mutex_unlock(&cache->lock);
 }
 
 int vkd3d_shader_cache_put(struct vkd3d_shader_cache *cache,
diff --git a/libs/vkd3d/vkd3d_private.h b/libs/vkd3d/vkd3d_private.h
index 71945ea18..bbe8fa72c 100644
--- a/libs/vkd3d/vkd3d_private.h
+++ b/libs/vkd3d/vkd3d_private.h
@@ -221,6 +221,11 @@ static inline void vkd3d_mutex_lock(struct vkd3d_mutex *lock)
     EnterCriticalSection(&lock->lock);
 }
 
+static inline bool vkd3d_mutex_trylock(struct vkd3d_mutex *lock)
+{
+    return TryEnterCriticalSection(&lock->lock);
+}
+
 static inline void vkd3d_mutex_unlock(struct vkd3d_mutex *lock)
 {
     LeaveCriticalSection(&lock->lock);
@@ -325,6 +330,12 @@ static inline void vkd3d_mutex_lock(struct vkd3d_mutex *lock)
         ERR("Could not lock the mutex, error %d.\n", ret);
 }
 
+static inline bool vkd3d_mutex_trylock(struct vkd3d_mutex *lock)
+{
+    /* FIXME: Untested. */
+    return !pthread_mutex_trylock(&lock->lock);
+}
+
 static inline void vkd3d_mutex_unlock(struct vkd3d_mutex *lock)
 {
     int ret;

-- GitLab https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541
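The "discard instead of block" behaviour described in the commit message can be sketched with a try-only lock. In this sketch a C11 `atomic_flag` stands in for `struct vkd3d_mutex`, purely to keep the example free of platform headers; `toy_cache`, `toy_trylock`, `toy_unlock` and `toy_put` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical cache with a try-only lock. */
struct toy_cache
{
    atomic_flag busy;
    unsigned int stored;    /* Only modified while the lock is held. */
};

static bool toy_trylock(struct toy_cache *cache)
{
    /* test_and_set returns the previous state: false means we took it. */
    return !atomic_flag_test_and_set(&cache->busy);
}

static void toy_unlock(struct toy_cache *cache)
{
    atomic_flag_clear(&cache->busy);
}

/* Returns true if the item was stored, false if it was dropped because
 * the lock was held by someone else. */
static bool toy_put(struct toy_cache *cache)
{
    assert(cache);

    if (!toy_trylock(cache))
        return false;
    ++cache->stored;
    toy_unlock(cache);
    return true;
}
```

While another party holds the lock (for example a long enumeration at startup), `toy_put` returns immediately without storing anything, which corresponds to the `VKD3D_ERROR_LOCK_NOT_AVAILABLE` path in the patch.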

Stefan Dösinger

9:51 a.m.

New subject: [PATCH v2 07/22] vkd3d: Implement vkd3d_shader_cache_enumerate.

From: Stefan Dösinger stefan@codeweavers.com

FIXME: Calling put/get from the enum callback will deadlock on a Unix build, which uses POSIX mutexes instead of win32 critical sections: a POSIX mutex is not recursive by default, whereas a critical section can be re-entered by the thread that owns it. --- libs/vkd3d/cache.c | 22 ++++++++++++++++++++-- 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/libs/vkd3d/cache.c b/libs/vkd3d/cache.c index c7f543a64..6ca261507 100644 --- a/libs/vkd3d/cache.c +++ b/libs/vkd3d/cache.c @@ -194,6 +194,14 @@ static uint64_t hash_key(const void *key, size_t size) return ret; }

+static void vkd3d_shader_cache_lock(struct vkd3d_shader_cache *cache) +{ + if (cache->desc.flags & (VKD3D_SHADER_CACHE_FLAGS_NO_SERIALIZE | VKD3D_SHADER_CACHE_FLAGS_READ_ONLY)) + return; + + vkd3d_mutex_lock(&cache->lock); +} + static bool vkd3d_shader_cache_trylock(struct vkd3d_shader_cache *cache) { if (cache->desc.flags & (VKD3D_SHADER_CACHE_FLAGS_NO_SERIALIZE | VKD3D_SHADER_CACHE_FLAGS_READ_ONLY)) @@ -360,7 +368,17 @@ void vkd3d_shader_cache_delete_on_destroy(struct vkd3d_shader_cache *cache) }

void vkd3d_shader_cache_enumerate(struct vkd3d_shader_cache *cache, - vkd3d_shader_cache_traverse_func *cb, void *context) + vkd3d_shader_cache_traverse_func *cb, void *ctx) { - FIXME("%p, %p, %p: stub!\n", cache, cb, context); + struct shader_cache_entry *e; + struct rb_entry *iter; + + vkd3d_shader_cache_lock(cache); + RB_FOR_EACH(iter, &cache->tree) + { + e = RB_ENTRY_VALUE(iter, struct shader_cache_entry, entry); + if(!cb(cache, e->payload, e->d.key_size, e->payload + e->d.key_size, e->d.value_size, ctx)) + break; + } + vkd3d_shader_cache_unlock(cache); }

-- GitLab https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541
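
One possible way to address the FIXME about callbacks re-entering the cache on a pthread build is a recursive mutex. This is a sketch of the idea only, not what the series does. The demo_* names are hypothetical, and whether recursive locking is acceptable for vkd3d is a separate question.

```c
#define _XOPEN_SOURCE 700 /* For PTHREAD_MUTEX_RECURSIVE under strict -std modes. */
#include <assert.h>
#include <pthread.h>

/* Initialise a recursive mutex. Returns 0 on success, an errno value otherwise. */
int demo_recursive_mutex_init(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int ret;

    if ((ret = pthread_mutexattr_init(&attr)))
        return ret;
    ret = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    if (!ret)
        ret = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return ret;
}

/* Simulates enumerate() holding the lock while its callback calls get()/put().
 * Returns 0 if the nested lock could be taken. */
int demo_nested_lock(void)
{
    pthread_mutex_t m;
    int ret;

    if (demo_recursive_mutex_init(&m))
        return -1;

    pthread_mutex_lock(&m);       /* Outer lock, taken by the enumeration. */
    ret = pthread_mutex_lock(&m); /* Inner lock, taken from the callback. */
    if (!ret)
        pthread_mutex_unlock(&m);
    pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);

    return ret;
}
```

With a default (non-recursive) mutex the inner pthread_mutex_lock() would block forever, which is the deadlock described in the commit message.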

Stefan Dösinger

9:51 a.m.

New subject: [PATCH v2 08/22] vkd3d: Replace the custom render pass cache with vkd3d_shader_cache.

From: Stefan Dösinger stefan@codeweavers.com

--- libs/vkd3d/device.c | 4 +- libs/vkd3d/state.c | 82 ++++++++++++++++---------------------- libs/vkd3d/vkd3d_private.h | 15 ++----- 3 files changed, 41 insertions(+), 60 deletions(-)

diff --git a/libs/vkd3d/device.c b/libs/vkd3d/device.c index ea243977c..8dc9df959 100644 --- a/libs/vkd3d/device.c +++ b/libs/vkd3d/device.c @@ -2575,7 +2575,7 @@ static ULONG STDMETHODCALLTYPE d3d12_device_Release(ID3D12Device5 *iface) vkd3d_uav_clear_state_cleanup(&device->uav_clear_state, device); vkd3d_destroy_null_resources(&device->null_resources, device); vkd3d_gpu_va_allocator_cleanup(&device->gpu_va_allocator); - vkd3d_render_pass_cache_cleanup(&device->render_pass_cache, device); + vkd3d_render_pass_cache_cleanup(device->render_pass_cache, device); d3d12_device_destroy_pipeline_cache(device); d3d12_device_destroy_vkd3d_queues(device); vkd3d_desc_object_cache_cleanup(&device->view_desc_cache); @@ -4403,7 +4403,7 @@ static HRESULT d3d12_device_init(struct d3d12_device *device, goto out_cleanup_descriptor_heap_layouts; }

- vkd3d_render_pass_cache_init(&device->render_pass_cache); + device->render_pass_cache = vkd3d_render_pass_cache_init(device); vkd3d_gpu_va_allocator_init(&device->gpu_va_allocator); vkd3d_time_domains_init(device);

diff --git a/libs/vkd3d/state.c b/libs/vkd3d/state.c index 82782e7d5..9c389db02 100644 --- a/libs/vkd3d/state.c +++ b/libs/vkd3d/state.c @@ -1573,13 +1573,12 @@ struct vkd3d_render_pass_entry

STATIC_ASSERT(sizeof(struct vkd3d_render_pass_key) == 48);

-static HRESULT vkd3d_render_pass_cache_create_pass_locked(struct vkd3d_render_pass_cache *cache, +static HRESULT vkd3d_render_pass_cache_create_pass_locked(struct vkd3d_shader_cache *cache, struct d3d12_device *device, const struct vkd3d_render_pass_key *key, VkRenderPass *vk_render_pass) { VkAttachmentReference attachment_references[D3D12_SIMULTANEOUS_RENDER_TARGET_COUNT + 1]; VkAttachmentDescription attachments[D3D12_SIMULTANEOUS_RENDER_TARGET_COUNT + 1]; const struct vkd3d_vk_device_procs *vk_procs = &device->vk_procs; - struct vkd3d_render_pass_entry *entry; unsigned int index, attachment_index; VkSubpassDescription sub_pass_desc; VkRenderPassCreateInfo pass_info; @@ -1587,17 +1586,6 @@ static HRESULT vkd3d_render_pass_cache_create_pass_locked(struct vkd3d_render_pa unsigned int rt_count; VkResult vr;

- if (!vkd3d_array_reserve((void **)&cache->render_passes, &cache->render_passes_size, - cache->render_pass_count + 1, sizeof(*cache->render_passes))) - { - *vk_render_pass = VK_NULL_HANDLE; - return E_OUTOFMEMORY; - } - - entry = &cache->render_passes[cache->render_pass_count]; - - entry->key = *key; - have_depth_stencil = key->depth_enable || key->stencil_enable; rt_count = have_depth_stencil ? key->attachment_count - 1 : key->attachment_count; assert(rt_count <= D3D12_SIMULTANEOUS_RENDER_TARGET_COUNT); @@ -1691,8 +1679,7 @@ static HRESULT vkd3d_render_pass_cache_create_pass_locked(struct vkd3d_render_pa pass_info.pDependencies = NULL; if ((vr = VK_CALL(vkCreateRenderPass(device->vk_device, &pass_info, NULL, vk_render_pass))) >= 0) { - entry->vk_render_pass = *vk_render_pass; - ++cache->render_pass_count; + vkd3d_shader_cache_put(cache, key, sizeof(*key), vk_render_pass, sizeof(*vk_render_pass)); } else { @@ -1703,28 +1690,18 @@ static HRESULT vkd3d_render_pass_cache_create_pass_locked(struct vkd3d_render_pa return hresult_from_vk_result(vr); }

-HRESULT vkd3d_render_pass_cache_find(struct vkd3d_render_pass_cache *cache, - struct d3d12_device *device, const struct vkd3d_render_pass_key *key, VkRenderPass *vk_render_pass) +HRESULT vkd3d_render_pass_cache_find(struct vkd3d_shader_cache *cache, struct d3d12_device *device, + const struct vkd3d_render_pass_key *key, VkRenderPass *vk_render_pass) { - bool found = false; + uint32_t size = sizeof(*vk_render_pass); + enum vkd3d_result ret; HRESULT hr = S_OK; - unsigned int i;

vkd3d_mutex_lock(&device->pipeline_cache_mutex);

- for (i = 0; i < cache->render_pass_count; ++i) - { - struct vkd3d_render_pass_entry *current = &cache->render_passes[i]; + ret = vkd3d_shader_cache_get(device->render_pass_cache, key, sizeof(*key), vk_render_pass, &size);

- if (!memcmp(&current->key, key, sizeof(*key))) - { - *vk_render_pass = current->vk_render_pass; - found = true; - break; - } - } - - if (!found) + if (ret) hr = vkd3d_render_pass_cache_create_pass_locked(cache, device, key, vk_render_pass);

vkd3d_mutex_unlock(&device->pipeline_cache_mutex); @@ -1732,27 +1709,38 @@ HRESULT vkd3d_render_pass_cache_find(struct vkd3d_render_pass_cache *cache, return hr; }

-void vkd3d_render_pass_cache_init(struct vkd3d_render_pass_cache *cache) +struct vkd3d_shader_cache *vkd3d_render_pass_cache_init(struct d3d12_device *device) { - cache->render_passes = NULL; - cache->render_pass_count = 0; - cache->render_passes_size = 0; + struct vkd3d_shader_cache_desc cache_desc = {0}; + struct vkd3d_shader_cache *cache; + enum vkd3d_result ret; + char cache_name[128]; + + cache_desc.mem_size = ~0; + cache_desc.max_entries = ~0; + cache_desc.flags = VKD3D_SHADER_CACHE_FLAGS_NO_SERIALIZE | VKD3D_SHADER_CACHE_FLAGS_MEMORY_ONLY; + sprintf(cache_name, "memory:%p:renderpass", device); + + if ((ret = vkd3d_shader_cache_open(cache_name, &cache_desc, &cache))) + ERR("Failed to create an in-memory cache\n"); + return cache; }

-void vkd3d_render_pass_cache_cleanup(struct vkd3d_render_pass_cache *cache, - struct d3d12_device *device) +static bool vkd3d_rp_cache_cleanup(struct vkd3d_shader_cache *cache, + const void *key, uint32_t key_size, const void *data, + uint32_t data_size, void *context) { + struct d3d12_device *device = context; const struct vkd3d_vk_device_procs *vk_procs = &device->vk_procs; - unsigned int i; - - for (i = 0; i < cache->render_pass_count; ++i) - { - struct vkd3d_render_pass_entry *current = &cache->render_passes[i]; - VK_CALL(vkDestroyRenderPass(device->vk_device, current->vk_render_pass, NULL)); - } + VkRenderPass pass = *(VkRenderPass *)data; + VK_CALL(vkDestroyRenderPass(device->vk_device, pass, NULL)); + return true; +}

- vkd3d_free(cache->render_passes); - cache->render_passes = NULL; +void vkd3d_render_pass_cache_cleanup(struct vkd3d_shader_cache *cache, struct d3d12_device *device) +{ + vkd3d_shader_cache_enumerate(cache, vkd3d_rp_cache_cleanup, device); + vkd3d_shader_cache_close(cache); }

static void d3d12_init_pipeline_state_desc(struct d3d12_pipeline_state_desc *desc) @@ -2908,7 +2896,7 @@ static HRESULT d3d12_graphics_pipeline_state_create_render_pass( key.padding = 0; key.sample_count = graphics->ms_desc.rasterizationSamples;

- return vkd3d_render_pass_cache_find(&device->render_pass_cache, device, &key, vk_render_pass); + return vkd3d_render_pass_cache_find(device->render_pass_cache, device, &key, vk_render_pass); }

static VkLogicOp vk_logic_op_from_d3d12(D3D12_LOGIC_OP op) diff --git a/libs/vkd3d/vkd3d_private.h b/libs/vkd3d/vkd3d_private.h index bbe8fa72c..50aad0c68 100644 --- a/libs/vkd3d/vkd3d_private.h +++ b/libs/vkd3d/vkd3d_private.h @@ -544,17 +544,10 @@ struct vkd3d_render_pass_key

struct vkd3d_render_pass_entry;

-struct vkd3d_render_pass_cache -{ - struct vkd3d_render_pass_entry *render_passes; - size_t render_pass_count; - size_t render_passes_size; -}; - -void vkd3d_render_pass_cache_cleanup(struct vkd3d_render_pass_cache *cache, struct d3d12_device *device); -HRESULT vkd3d_render_pass_cache_find(struct vkd3d_render_pass_cache *cache, struct d3d12_device *device, +struct vkd3d_shader_cache *vkd3d_render_pass_cache_init(struct d3d12_device *device); +void vkd3d_render_pass_cache_cleanup(struct vkd3d_shader_cache *cache, struct d3d12_device *device); +HRESULT vkd3d_render_pass_cache_find(struct vkd3d_shader_cache *cache, struct d3d12_device *device, const struct vkd3d_render_pass_key *key, VkRenderPass *vk_render_pass); -void vkd3d_render_pass_cache_init(struct vkd3d_render_pass_cache *cache);

struct vkd3d_private_store { @@ -1786,7 +1779,7 @@ struct d3d12_device bool worker_should_exit;

struct vkd3d_mutex pipeline_cache_mutex; - struct vkd3d_render_pass_cache render_pass_cache; + struct vkd3d_shader_cache *render_pass_cache; VkPipelineCache vk_pipeline_cache;

VkPhysicalDeviceMemoryProperties memory_properties;

-- GitLab https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541
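
The render pass patch above boils down to "look the key up, create and store the object on a miss", with the key compared as raw bytes. A minimal standalone sketch of that pattern follows. All demo_* names are hypothetical, a flat array stands in for the rb tree, and an integer stands in for the VkRenderPass handle. Because the key is compared bytewise, padding has to be zeroed, which is why the caller in state.c sets key.padding = 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical key. The padding is an explicit member so memcmp() sees defined bytes. */
struct demo_key
{
    uint32_t attachment_count;
    uint8_t depth_enable;
    uint8_t padding[3];
    uint32_t sample_count;
};

struct demo_entry
{
    struct demo_key key;
    uint64_t value;
};

struct demo_cache
{
    struct demo_entry entries[16];
    size_t count;
    unsigned int created; /* How often the "expensive" create path ran. */
};

bool demo_cache_get(const struct demo_cache *c, const struct demo_key *key, uint64_t *value)
{
    size_t i;

    for (i = 0; i < c->count; ++i)
    {
        if (!memcmp(&c->entries[i].key, key, sizeof(*key)))
        {
            *value = c->entries[i].value;
            return true;
        }
    }
    return false;
}

/* Look the key up. On a miss, create the object and store it. */
bool demo_cache_find(struct demo_cache *c, const struct demo_key *key, uint64_t *value)
{
    if (demo_cache_get(c, key, value))
        return true;
    if (c->count == sizeof(c->entries) / sizeof(c->entries[0]))
        return false;

    *value = 0x1000 + c->count; /* Stand-in for vkCreateRenderPass(). */
    ++c->created;
    c->entries[c->count].key = *key;
    c->entries[c->count].value = *value;
    ++c->count;
    return true;
}
```

Looking up the same zero-initialised key twice returns the same value and runs the create path only once.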

Stefan Dösinger

9:51 a.m.

New subject: [PATCH v2 09/22] vkd3d: Basic shader cache writing and reading.

From: Stefan Dösinger stefan@codeweavers.com

--- libs/vkd3d/cache.c | 243 +++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 243 insertions(+)

diff --git a/libs/vkd3d/cache.c b/libs/vkd3d/cache.c index 6ca261507..049cf00ed 100644 --- a/libs/vkd3d/cache.c +++ b/libs/vkd3d/cache.c @@ -32,6 +32,21 @@ static LONG cache_mutex_initialized; * * We don't intend these files to be read by third party code, so consider them a vkd3d * implementation detail. */ + +/* TODO: Endianness of all these uints. */ + +/* VKD3DSHC */ +#define VKD3D_SHADER_CACHE_MAGIC 0x564B443344534843ull +#define VKD3D_SHADER_CACHE_VERSION ((uint64_t)1) + +struct vkd3d_cache_header_v1 +{ + uint64_t magic; + uint64_t struct_size; + uint64_t vkd3d_version; + uint64_t app_version; +}; + struct vkd3d_cache_object_v1 { uint64_t hash; @@ -52,6 +67,8 @@ struct vkd3d_shader_cache struct vkd3d_mutex lock; struct rb_tree tree;

+ FILE *indices, *values; + char name[1]; };

@@ -84,6 +101,145 @@ static void vkd3d_shader_cache_remove_item(struct vkd3d_shader_cache *cache, str rb_remove(&cache->tree, &e->entry); }

+static bool vkd3d_shader_cache_read_entry(struct vkd3d_shader_cache *cache, struct shader_cache_entry *e) +{ + size_t len; + + TRACE("reading object key len %u, data %u.\n", e->d.key_size, e->d.value_size); + /* TODO: Check if the read size makes sense - is it smaller than the requested + * max size, is it smaller than the file on the disk etc. */ + e->payload = vkd3d_malloc(e->d.key_size + e->d.value_size); + if (!e->payload) + { + WARN("Out of memory.\n"); + return false; + } + + if (e->d.disk_size != e->d.key_size + e->d.value_size) + ERR("How do I get a compressed object before implementing compression?\n"); + + fseek(cache->values, e->d.offset, SEEK_SET); + len = fread(e->payload, e->d.key_size + e->d.value_size, 1, cache->values); + if (len != 1) + { + /* I suppose this could be handled better. */ + ERR("Failed to read cached object data len %u offset %u.\n", + e->d.key_size + e->d.value_size, e->d.offset); + vkd3d_free(e->payload); + return false; + } + + return true; +} + +static void vkd3d_shader_cache_read(struct vkd3d_shader_cache *cache) +{ + const bool ro = cache->desc.flags & VKD3D_SHADER_CACHE_FLAGS_READ_ONLY; + struct shader_cache_entry *e = NULL; + struct vkd3d_cache_header_v1 hdr; + char *filename; + FILE *indices; + size_t len; + + filename = vkd3d_malloc(strlen(cache->name) + 5); + + sprintf(filename, "%s.val", cache->name); + cache->values = fopen(filename, ro ? "rb" : "r+b"); + if (!cache->values) + { + if (ro) + { + WARN("Read only cache file %s not found.\n", filename); + return; + } + + cache->values = fopen(filename, "w+b"); + if (!cache->values) + { + WARN("Value file %s not found and could not be created.\n", filename); + /* Convert to mem only. */ + cache->desc.disk_size = 0; + cache->desc.flags |= VKD3D_SHADER_CACHE_FLAGS_MEMORY_ONLY; + vkd3d_free(filename); + return; + } + } + + sprintf(filename, "%s.idx", cache->name); + indices = fopen(filename, "rb"); + if (!indices) + { + /* This happens when the cache files did not exist. Keep the opened + * values file, we'll use it later. */ + WARN("Index file %s not found.\n", filename); + vkd3d_free(filename); + return; + } + + vkd3d_free(filename); + + TRACE("Reading cache %s.{idx, val}.\n", cache->name); + + len = fread(&hdr, sizeof(hdr), 1, indices); + if (len != 1) + { + WARN("Failed to read cache header.\n"); + goto done; + } + if (hdr.magic != VKD3D_SHADER_CACHE_MAGIC) + { + WARN("Invalid cache magic.\n"); + goto done; + } + if (hdr.struct_size < sizeof(hdr)) + { + WARN("Invalid cache header size.\n"); + goto done; + } + if (hdr.vkd3d_version != VKD3D_SHADER_CACHE_VERSION) + { + WARN("vkd3d shader version mismatch: Got %"PRIu64", want %"PRIu64".\n", + hdr.vkd3d_version, VKD3D_SHADER_CACHE_VERSION); + goto done; + } + if (hdr.app_version != cache->desc.version) + { + WARN("Application version mismatch: Cache has %"PRIu64", app wants %"PRIu64".\n", + hdr.app_version, cache->desc.version); + goto done; + } + + while (!feof(indices)) + { + e = vkd3d_calloc(1, sizeof(*e)); + if (!e) + { + WARN("Alloc fail.\n"); + break; + } + + len = fread(&e->d, sizeof(e->d), 1, indices); + if (len != 1) + { + if (!feof(indices)) + ERR("Failed to read object header.\n"); + break; + } + + if (!vkd3d_shader_cache_read_entry(cache, e)) + break; + + vkd3d_shader_cache_add_item(cache, e); + + TRACE("Loaded an item.\n"); + e = NULL; + } + +done: + vkd3d_free(e); + fclose(indices); +} + int vkd3d_shader_cache_open(const char *name, const struct vkd3d_shader_cache_desc *desc, struct vkd3d_shader_cache **cache) { @@ -98,6 +254,17 @@ int vkd3d_shader_cache_open(const char *name, return VKD3D_ERROR_INVALID_ARGUMENT; }

+ if (desc->disk_size && (desc->flags & VKD3D_SHADER_CACHE_FLAGS_MEMORY_ONLY)) + { + WARN("Disk size %lu with memory only flag set.\n", desc->disk_size); + return VKD3D_ERROR_INVALID_ARGUMENT; + } + if (!desc->disk_size && !(desc->flags & VKD3D_SHADER_CACHE_FLAGS_MEMORY_ONLY)) + { + WARN("On-disk cache of size 0.\n"); + return VKD3D_ERROR_INVALID_ARGUMENT; + } + /* FIXME: This isn't thread safe and cache_mutex_initialized might overflow. Do we have a * something like DllMain or a platform-independent InitializeOnce? */ if (InterlockedIncrement(&cache_mutex_initialized) == 1) @@ -139,6 +306,9 @@ int vkd3d_shader_cache_open(const char *name, list_add_head(&cache_list, &object->cache_list_entry); vkd3d_mutex_unlock(&cache_list_mutex);

+ if (!(desc->flags & VKD3D_SHADER_CACHE_FLAGS_MEMORY_ONLY)) + vkd3d_shader_cache_read(object); + *cache = object; return VKD3D_OK; } @@ -149,6 +319,76 @@ static void vkd3d_shader_cache_clear(struct rb_entry *entry, void *context) vkd3d_free(e); }

+struct write_context +{ + struct vkd3d_shader_cache *cache; + FILE *indices; +}; + +static void vkd3d_shader_cache_write_entry(struct rb_entry *entry, void *context) +{ + struct shader_cache_entry *e = RB_ENTRY_VALUE(entry, struct shader_cache_entry, entry); + struct write_context *ctx = context; + struct vkd3d_shader_cache *cache = ctx->cache; + + /* TODO: Compress the data. */ + e->d.disk_size = e->d.key_size + e->d.value_size; + e->d.offset = ftell(cache->values); + + fwrite(&e->d, sizeof(e->d), 1, ctx->indices); + fwrite(e->payload, e->d.disk_size, 1, cache->values); +} + +static void vkd3d_shader_cache_write(struct vkd3d_shader_cache *cache) +{ + struct vkd3d_cache_header_v1 hdr; + struct write_context ctx; + char *filename; + + if (cache->desc.flags & VKD3D_SHADER_CACHE_FLAGS_READ_ONLY) + { + fclose (cache->values); + return; + } + + fseek(cache->values, 0, SEEK_END); + + filename = vkd3d_malloc(strlen(cache->name) + 5); + /* For now unconditionally repack. */ + if (1) + { + fclose(cache->values); + sprintf(filename, "%s.val", cache->name); + cache->values = fopen(filename, "w+b"); + if (!cache->values) + ERR("Reopen fail\n"); + } + + sprintf(filename, "%s.idx", cache->name); + ctx.indices = fopen(filename, "wb"); + if (!ctx.indices) + { + WARN("Failed to open %s\n", filename); + vkd3d_free(filename); + return; + } + vkd3d_free(filename); + + ctx.cache = cache; + hdr.magic = VKD3D_SHADER_CACHE_MAGIC; + hdr.struct_size = sizeof(hdr); + hdr.vkd3d_version = VKD3D_SHADER_CACHE_VERSION; + hdr.app_version = cache->desc.version; + + fwrite(&hdr, sizeof(hdr), 1, ctx.indices); + + rb_for_each_entry(&cache->tree, vkd3d_shader_cache_write_entry, &ctx); + + fseek(cache->values, 0, SEEK_END); + fclose(cache->values); + fclose(ctx.indices); +} + void vkd3d_shader_cache_close(struct vkd3d_shader_cache *cache) { ULONG refcount = InterlockedDecrement(&cache->refcount); @@ -157,6 +397,9 @@ void vkd3d_shader_cache_close(struct vkd3d_shader_cache *cache) if (refcount) return;

+ if (!(cache->desc.flags & VKD3D_SHADER_CACHE_FLAGS_MEMORY_ONLY)) + vkd3d_shader_cache_write(cache); + vkd3d_mutex_lock(&cache_list_mutex); list_remove(&cache->cache_list_entry); vkd3d_mutex_unlock(&cache_list_mutex);

-- GitLab https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541
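
Regarding the endianness TODO in the patch above: one common option is to serialise every header field byte by byte in a fixed order instead of writing the struct with fwrite(). The sketch below assumes little-endian on disk and repeats the header checks in the same order as vkd3d_shader_cache_read(). The demo_* names and the choice of byte order are assumptions, not something the series defines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_CACHE_MAGIC 0x564B443344534843ull /* "VKD3DSHC" */
#define DEMO_CACHE_VERSION 1ull
#define DEMO_HEADER_SIZE 32u /* Four 64-bit fields. */

/* Store v as 8 little-endian bytes, independent of host byte order. */
void demo_store_le64(uint8_t *dst, uint64_t v)
{
    unsigned int i;

    for (i = 0; i < 8; ++i)
        dst[i] = (uint8_t)(v >> (8 * i));
}

uint64_t demo_load_le64(const uint8_t *src)
{
    uint64_t v = 0;
    unsigned int i;

    for (i = 0; i < 8; ++i)
        v |= (uint64_t)src[i] << (8 * i);
    return v;
}

void demo_write_header(uint8_t out[DEMO_HEADER_SIZE], uint64_t app_version)
{
    demo_store_le64(out + 0, DEMO_CACHE_MAGIC);
    demo_store_le64(out + 8, DEMO_HEADER_SIZE);
    demo_store_le64(out + 16, DEMO_CACHE_VERSION);
    demo_store_le64(out + 24, app_version);
}

/* Magic, header size, vkd3d version, then application version. */
bool demo_check_header(const uint8_t in[DEMO_HEADER_SIZE], uint64_t app_version)
{
    if (demo_load_le64(in + 0) != DEMO_CACHE_MAGIC)
        return false;
    if (demo_load_le64(in + 8) < DEMO_HEADER_SIZE)
        return false;
    if (demo_load_le64(in + 16) != DEMO_CACHE_VERSION)
        return false;
    return demo_load_le64(in + 24) == app_version;
}
```

A file written this way reads back identically on big-endian and little-endian hosts, at the cost of a per-field copy.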

Stefan Dösinger

9:51 a.m.

New subject: [PATCH v2 10/22] Add a win32 version of vkd3d_get_program_name.

From: Stefan Dösinger stefan@codeweavers.com

Taken from wined3d_get_app_name. --- libs/vkd3d/utils.c | 27 +++++++++++++++++++++++++++ 1 file changed, 27 insertions(+)

diff --git a/libs/vkd3d/utils.c b/libs/vkd3d/utils.c index 751971220..0284896ac 100644 --- a/libs/vkd3d/utils.c +++ b/libs/vkd3d/utils.c @@ -881,6 +881,33 @@ bool vkd3d_get_program_name(char program_name[PATH_MAX]) return true; }

+#elif defined(WIN32) + +bool vkd3d_get_program_name(char program_name[PATH_MAX]) +{ + char buffer[MAX_PATH]; + unsigned int len; + char *p, *name; + + *program_name = '\0'; + len = GetModuleFileNameA(0, buffer, ARRAY_SIZE(buffer)); + if (!(len && len < MAX_PATH)) + return false; + + name = buffer; + if ((p = strrchr(name, '/' ))) + name = p + 1; + if ((p = strrchr(name, '\\'))) + name = p + 1; + + len = strlen(name) + 1; + if (PATH_MAX < len) + return false; + + memcpy(program_name, name, len); + return true; +} + #else

bool vkd3d_get_program_name(char program_name[PATH_MAX])

-- GitLab https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541
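
The path handling in the win32 helper can be tried in isolation, without GetModuleFileNameA(). The function below is a hypothetical standalone version of the two strrchr() steps: strip everything up to the last '/', then everything up to the last backslash in what remains.

```c
#include <assert.h>
#include <string.h>

/* Return a pointer to the file name component of path. Both '/' and '\\'
 * are treated as separators, as in the win32 helper above. */
const char *demo_base_name(const char *path)
{
    const char *name = path, *p;

    if ((p = strrchr(name, '/')))
        name = p + 1;
    if ((p = strrchr(name, '\\')))
        name = p + 1;
    return name;
}
```

Because the second search starts from the result of the first, mixed separators such as "C:/games\\demo.exe" are handled as well.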

Stefan Dösinger

9:51 a.m.

New subject: [PATCH v2 11/22] vkd3d: Keep the application name around.

From: Stefan Dösinger stefan@codeweavers.com

--- libs/vkd3d/device.c | 4 ++++ libs/vkd3d/vkd3d_private.h | 1 + 2 files changed, 5 insertions(+)

diff --git a/libs/vkd3d/device.c b/libs/vkd3d/device.c index 8dc9df959..8d0671add 100644 --- a/libs/vkd3d/device.c +++ b/libs/vkd3d/device.c @@ -614,6 +614,7 @@ static HRESULT vkd3d_instance_init(struct vkd3d_instance *instance, application_info.apiVersion = VK_API_VERSION_1_0; instance->api_version = VKD3D_API_VERSION_1_0;

+ application_info.pApplicationName = ""; if ((vkd3d_application_info = vkd3d_find_struct(create_info->next, APPLICATION_INFO))) { if (vkd3d_application_info->application_name) @@ -633,6 +634,9 @@ static HRESULT vkd3d_instance_init(struct vkd3d_instance *instance, application_info.pApplicationName = application_name; }

+ strncpy(instance->application_name, application_info.pApplicationName, + ARRAY_SIZE(instance->application_name)); + instance->application_name[ARRAY_SIZE(instance->application_name) - 1] = '\0'; TRACE("Application: %s.\n", debugstr_a(application_info.pApplicationName)); TRACE("vkd3d API version: %u.\n", instance->api_version);

diff --git a/libs/vkd3d/vkd3d_private.h b/libs/vkd3d/vkd3d_private.h index 50aad0c68..ad1c03397 100644 --- a/libs/vkd3d/vkd3d_private.h +++ b/libs/vkd3d/vkd3d_private.h @@ -192,6 +192,7 @@ struct vkd3d_instance uint64_t host_ticks_per_second;

LONG refcount; + char application_name[PATH_MAX]; };

#ifdef _WIN32

-- GitLab https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541
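
The explicit terminator after strncpy() in the patch above matters because strncpy() does not write a NUL byte when the source is at least as long as the destination. A small standalone illustration, using a hypothetical helper name:

```c
#include <assert.h>
#include <string.h>

/* Copy src into dst, truncating if needed. dst_size must be at least 1.
 * The result is always NUL-terminated. */
void demo_copy_name(char *dst, size_t dst_size, const char *src)
{
    assert(dst_size > 0);
    strncpy(dst, src, dst_size);
    dst[dst_size - 1] = '\0';
}
```

A long application name is silently truncated here, which is presumably acceptable for a name that only ends up in a cache file name.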

Stefan Dösinger

9:51 a.m.

New subject: [PATCH v2 12/22] vkd3d: Store the VK pipeline cache in an on-disk vkd3d cache.

From: Stefan Dösinger stefan@codeweavers.com

--- libs/vkd3d/device.c | 78 ++++++++++++++++++++++++++++++++++++-- libs/vkd3d/vkd3d_private.h | 32 ++++++++++++++++ 2 files changed, 107 insertions(+), 3 deletions(-)

diff --git a/libs/vkd3d/device.c b/libs/vkd3d/device.c index 8d0671add..b4ad0bdd1 100644 --- a/libs/vkd3d/device.c +++ b/libs/vkd3d/device.c @@ -19,8 +19,17 @@ #include "vkd3d_private.h" #include "vkd3d_version.h"

+#ifdef HAVE_UNISTD_H +#include <unistd.h> +#endif + #define VKD3D_MAX_UAV_CLEAR_DESCRIPTORS_PER_TYPE 256u

+/* FIXME: We may want to put the GPU and driver identities in there, + * although under which conditions the pipeline cache can be transferred + * from one GPU/driver to another is a Vulkan implementation detail. */ +static const char vk_pipeline_cache_key[] = "vk_pipeline_cache"; + struct vkd3d_struct { enum vkd3d_structure_type type; @@ -2096,15 +2105,26 @@ static HRESULT d3d12_device_init_pipeline_cache(struct d3d12_device *device) { const struct vkd3d_vk_device_procs *vk_procs = &device->vk_procs; VkPipelineCacheCreateInfo cache_info; + struct vkd3d_shader_cache_vk_blob *cache_data = NULL; + uint32_t cache_size = 0; VkResult vr;

vkd3d_mutex_init(&device->pipeline_cache_mutex);

+ if (!vkd3d_shader_cache_get(device->persistent_cache, vk_pipeline_cache_key, + sizeof(vk_pipeline_cache_key), NULL, &cache_size)) + { + cache_data = vkd3d_malloc(cache_size); + vkd3d_shader_cache_get(device->persistent_cache, vk_pipeline_cache_key, + sizeof(vk_pipeline_cache_key), cache_data, &cache_size); + cache_size -= offsetof(struct vkd3d_shader_cache_vk_blob, blob[0]); + } + cache_info.sType = VK_STRUCTURE_TYPE_PIPELINE_CACHE_CREATE_INFO; cache_info.pNext = NULL; cache_info.flags = 0; - cache_info.initialDataSize = 0; - cache_info.pInitialData = NULL; + cache_info.initialDataSize = cache_size; + cache_info.pInitialData = cache_data->blob; if ((vr = VK_CALL(vkCreatePipelineCache(device->vk_device, &cache_info, NULL, &device->vk_pipeline_cache))) < 0) { @@ -2112,15 +2132,40 @@ static HRESULT d3d12_device_init_pipeline_cache(struct d3d12_device *device) device->vk_pipeline_cache = VK_NULL_HANDLE; }

+ vkd3d_free(cache_data); + return S_OK; }

static void d3d12_device_destroy_pipeline_cache(struct d3d12_device *device) { const struct vkd3d_vk_device_procs *vk_procs = &device->vk_procs; + struct vkd3d_shader_cache_vk_blob *cache_data = NULL; + size_t cache_size = 0; + VkResult vr;

if (device->vk_pipeline_cache) + { + vr = VK_CALL(vkGetPipelineCacheData(device->vk_device, device->vk_pipeline_cache, &cache_size, NULL)); + if (vr == VK_SUCCESS && cache_size) + cache_data = vkd3d_malloc(offsetof(struct vkd3d_shader_cache_vk_blob, blob[cache_size])); + if (cache_data) + { + cache_data->header.type = SHADER_CACHE_ENTRY_VULKAN_BLOB; + cache_data->header.vkd3d_revision = VKD3D_SHADER_CACHE_VKD3D_VERSION; + vr = VK_CALL(vkGetPipelineCacheData(device->vk_device, device->vk_pipeline_cache, + &cache_size, cache_data->blob)); + if (vr == VK_SUCCESS) + { + vkd3d_shader_cache_put(device->persistent_cache, vk_pipeline_cache_key, + sizeof(vk_pipeline_cache_key), cache_data, + offsetof(struct vkd3d_shader_cache_vk_blob, blob[cache_size])); + } + vkd3d_free(cache_data); + } + VK_CALL(vkDestroyPipelineCache(device->vk_device, device->vk_pipeline_cache, NULL)); + }

vkd3d_mutex_destroy(&device->pipeline_cache_mutex); } @@ -2587,6 +2632,7 @@ static ULONG STDMETHODCALLTYPE d3d12_device_Release(ID3D12Device5 *iface) if (device->use_vk_heaps) device_worker_stop(device); vkd3d_free(device->heaps); + vkd3d_shader_cache_close(device->persistent_cache); VK_CALL(vkDestroyDevice(device->vk_device, NULL)); if (device->parent) IUnknown_Release(device->parent); @@ -4355,7 +4401,9 @@ static void *device_worker_main(void *arg) static HRESULT d3d12_device_init(struct d3d12_device *device, struct vkd3d_instance *instance, const struct vkd3d_device_create_info *create_info) { + struct vkd3d_shader_cache_desc cache_desc = {0}; const struct vkd3d_vk_device_procs *vk_procs; + char *cache_name, *cwd; HRESULT hr;

device->ID3D12Device5_iface.lpVtbl = &d3d12_device_vtbl; @@ -4382,8 +4430,30 @@ static HRESULT d3d12_device_init(struct d3d12_device *device, if (FAILED(hr = vkd3d_create_vk_device(device, create_info))) goto out_free_instance;

+ /* FIXME: Does this use of getcwd work on Unix too? */ + cwd = getcwd(NULL, 0); + cache_name = vkd3d_malloc(strlen(cwd) + strlen(instance->application_name) + 8); + sprintf(cache_name, "%s/%s.cache", cwd, instance->application_name); + free(cwd); /* Use libc's free() because it is malloc'ed by getcwd. */ + + cache_desc.mem_size = 32 << 20; + cache_desc.disk_size = ~0u; + cache_desc.max_entries = ~0u; + cache_desc.version = VKD3D_SHADER_CACHE_OBJ_VERSION; + if (vkd3d_shader_cache_open(cache_name, &cache_desc, &device->persistent_cache)) + { + FIXME("Failed to open shader cache %s\n", debugstr_a(cache_name)); + cache_desc.disk_size = 0; + if (vkd3d_shader_cache_open(cache_name, &cache_desc, &device->persistent_cache)) + { + vkd3d_free(cache_name); + goto out_free_vk_resources; + } + } + vkd3d_free(cache_name); + if (FAILED(hr = d3d12_device_init_pipeline_cache(device))) - goto out_free_vk_resources; + goto out_free_cache;

if (FAILED(hr = vkd3d_private_store_init(&device->private_store))) goto out_free_pipeline_cache; @@ -4436,6 +4506,8 @@ out_free_private_store: vkd3d_private_store_destroy(&device->private_store); out_free_pipeline_cache: d3d12_device_destroy_pipeline_cache(device); +out_free_cache: + vkd3d_shader_cache_close(device->persistent_cache); out_free_vk_resources: vk_procs = &device->vk_procs; VK_CALL(vkDestroyDevice(device->vk_device, NULL)); diff --git a/libs/vkd3d/vkd3d_private.h b/libs/vkd3d/vkd3d_private.h index ad1c03397..acd010521 100644 --- a/libs/vkd3d/vkd3d_private.h +++ b/libs/vkd3d/vkd3d_private.h @@ -43,6 +43,37 @@ #include <limits.h> #include <stdbool.h>

+/* The following structures define data structures that are stored in vkd3d's + * cache. It doesn't define the cache format itself, those details are found + * in cache.c. + * + * Changing the structures will break compatibility with existing cache files. + * In this case bump VKD3D_SHADER_CACHE_OBJ_VERSION. + * + * The structs aren't meant to be read by external code, consider them a vkd3d + * implementation detail. */ +#define VKD3D_SHADER_CACHE_OBJ_VERSION 1ull +#define VKD3D_SHADER_CACHE_VKD3D_VERSION 1u + +enum vkd3d_shader_cache_entry_type +{ + SHADER_CACHE_ENTRY_VULKAN_BLOB = VKD3D_MAKE_TAG('V', 'K', 'P', 'C'), +}; + +struct vkd3d_shader_cache_entry +{ + uint32_t vkd3d_revision; /* Put the git revision here, discard translated code if changed. */ + uint32_t type; +}; + +struct vkd3d_shader_cache_vk_blob +{ + struct vkd3d_shader_cache_entry header; + uint8_t blob[1]; +}; + +/* End shader data structures */ + #define VK_CALL(f) (vk_procs->f)

#define VKD3D_DESCRIPTOR_MAGIC_FREE 0x00000000u @@ -1780,6 +1811,7 @@ struct d3d12_device bool worker_should_exit;

struct vkd3d_mutex pipeline_cache_mutex; + struct vkd3d_shader_cache *persistent_cache; struct vkd3d_shader_cache *render_pass_cache; VkPipelineCache vk_pipeline_cache;

-- GitLab https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541
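
The size arithmetic around struct vkd3d_shader_cache_vk_blob (a small header followed by opaque driver data) can be sketched without Vulkan. The example below uses a C99 flexible array member instead of blob[1] and computes sizes as offsetof() plus the payload length. The demo_* names are hypothetical, and the payload stands in for what vkGetPipelineCacheData() would return.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct demo_entry_header
{
    uint32_t revision;
    uint32_t type;
};

struct demo_blob
{
    struct demo_entry_header header;
    uint8_t blob[]; /* C99 flexible array member. */
};

/* Total size of a cache value carrying payload_size bytes of driver data. */
size_t demo_blob_size(size_t payload_size)
{
    return offsetof(struct demo_blob, blob) + payload_size;
}

/* Wrap driver data in a header. The caller frees the result. */
struct demo_blob *demo_blob_wrap(const void *payload, size_t payload_size, uint32_t type)
{
    struct demo_blob *b;

    if (!(b = malloc(demo_blob_size(payload_size))))
        return NULL;
    b->header.revision = 1;
    b->header.type = type;
    memcpy(b->blob, payload, payload_size);
    return b;
}

/* Payload size of a stored value, or 0 if the value cannot hold a header. */
size_t demo_blob_payload_size(size_t value_size)
{
    size_t header_size = offsetof(struct demo_blob, blob);

    return value_size > header_size ? value_size - header_size : 0;
}
```

Checking the stored size against the header size before subtracting avoids an underflow if a truncated or foreign value ends up under the pipeline cache key.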

Stefan Dösinger

9:51 a.m.

New subject: [PATCH v2 13/22] Store render passes in the on-disk cache and recreate them on startup.

From: Stefan Dösinger stefan@codeweavers.com

This doesn't gain much because render pass creation is fast, but it is a nice demonstration. We might want to skip this patch when committing the cache upstream. --- libs/vkd3d/device.c | 27 +++++++++++++++++++++++++++ libs/vkd3d/state.c | 5 +++++ libs/vkd3d/vkd3d_private.h | 23 ++++++++++++----------- 3 files changed, 44 insertions(+), 11 deletions(-)

diff --git a/libs/vkd3d/device.c b/libs/vkd3d/device.c index b4ad0bdd1..d07bfa67f 100644 --- a/libs/vkd3d/device.c +++ b/libs/vkd3d/device.c @@ -4398,6 +4398,31 @@ static void *device_worker_main(void *arg) return NULL; }

+static bool d3d12_device_load_cache(struct vkd3d_shader_cache *cache, + const void *key, uint32_t key_size, const void *value, + uint32_t value_size, void *context) +{ + const struct vkd3d_shader_cache_entry *e = value; + struct d3d12_device *device = context; + VkRenderPass rp; + + TRACE("device %p got entry type (%c%c%c%c)\n", device, + e->type & 0xff, e->type >> 8 & 0xff, e->type >> 16 & 0xff, + e->type >> 24 & 0xff); + + switch (e->type) + { + case SHADER_CACHE_ENTRY_RENDER_PASS: + vkd3d_render_pass_cache_find(device->render_pass_cache, device, key, &rp); + break; + + case SHADER_CACHE_ENTRY_VULKAN_BLOB: + break; + } + + return true; +} + static HRESULT d3d12_device_init(struct d3d12_device *device, struct vkd3d_instance *instance, const struct vkd3d_device_create_info *create_info) { @@ -4489,6 +4514,8 @@ static HRESULT d3d12_device_init(struct d3d12_device *device,

device_init_descriptor_pool_sizes(device);

+ vkd3d_shader_cache_enumerate(device->persistent_cache, d3d12_device_load_cache, device); + if ((device->parent = create_info->parent)) IUnknown_AddRef(device->parent);

diff --git a/libs/vkd3d/state.c b/libs/vkd3d/state.c index 9c389db02..fe0405ac2 100644 --- a/libs/vkd3d/state.c +++ b/libs/vkd3d/state.c @@ -1579,6 +1579,7 @@ static HRESULT vkd3d_render_pass_cache_create_pass_locked(struct vkd3d_shader_ca VkAttachmentReference attachment_references[D3D12_SIMULTANEOUS_RENDER_TARGET_COUNT + 1]; VkAttachmentDescription attachments[D3D12_SIMULTANEOUS_RENDER_TARGET_COUNT + 1]; const struct vkd3d_vk_device_procs *vk_procs = &device->vk_procs; + struct vkd3d_shader_cache_entry rp_data; unsigned int index, attachment_index; VkSubpassDescription sub_pass_desc; VkRenderPassCreateInfo pass_info; @@ -1687,6 +1688,10 @@ static HRESULT vkd3d_render_pass_cache_create_pass_locked(struct vkd3d_shader_ca *vk_render_pass = VK_NULL_HANDLE; }

+ rp_data.type = SHADER_CACHE_ENTRY_RENDER_PASS; + rp_data.vkd3d_revision = VKD3D_SHADER_CACHE_VKD3D_VERSION; + vkd3d_shader_cache_put(device->persistent_cache, key, sizeof(*key), &rp_data, sizeof(rp_data)); + return hresult_from_vk_result(vr); }

diff --git a/libs/vkd3d/vkd3d_private.h b/libs/vkd3d/vkd3d_private.h index acd010521..542fa98a7 100644 --- a/libs/vkd3d/vkd3d_private.h +++ b/libs/vkd3d/vkd3d_private.h @@ -55,8 +55,20 @@ #define VKD3D_SHADER_CACHE_OBJ_VERSION 1ull #define VKD3D_SHADER_CACHE_VKD3D_VERSION 1u

+struct vkd3d_render_pass_key +{ + unsigned int attachment_count; + bool depth_enable; + bool stencil_enable; + bool depth_stencil_write; + bool padding; + unsigned int sample_count; + VkFormat vk_formats[D3D12_SIMULTANEOUS_RENDER_TARGET_COUNT + 1]; +}; + enum vkd3d_shader_cache_entry_type { + SHADER_CACHE_ENTRY_RENDER_PASS = VKD3D_MAKE_TAG('R', 'P', 'A', 'S'), SHADER_CACHE_ENTRY_VULKAN_BLOB = VKD3D_MAKE_TAG('V', 'K', 'P', 'C'), };

@@ -563,17 +575,6 @@ D3D12_GPU_VIRTUAL_ADDRESS vkd3d_gpu_va_allocator_allocate(struct vkd3d_gpu_va_al void *vkd3d_gpu_va_allocator_dereference(struct vkd3d_gpu_va_allocator *allocator, D3D12_GPU_VIRTUAL_ADDRESS address); void vkd3d_gpu_va_allocator_free(struct vkd3d_gpu_va_allocator *allocator, D3D12_GPU_VIRTUAL_ADDRESS address);

-struct vkd3d_render_pass_key
-{
-    unsigned int attachment_count;
-    bool depth_enable;
-    bool stencil_enable;
-    bool depth_stencil_write;
-    bool padding;
-    unsigned int sample_count;
-    VkFormat vk_formats[D3D12_SIMULTANEOUS_RENDER_TARGET_COUNT + 1];
-};
-
 struct vkd3d_render_pass_entry;

struct vkd3d_shader_cache *vkd3d_render_pass_cache_init(struct d3d12_device *device);

-- GitLab https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541
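Note on the entry type tags: the TRACE() call in d3d12_device_load_cache() prints e->type one byte at a time, lowest byte first, which recovers the four characters passed to VKD3D_MAKE_TAG. A minimal sketch of that round trip follows. EXAMPLE_MAKE_TAG and both helper functions are hypothetical stand-ins written for illustration, and the sketch assumes the real macro places its first character in the lowest byte, which is what the TRACE() byte order suggests.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for VKD3D_MAKE_TAG: packs four characters into a
 * 32-bit value, with the first character in the lowest byte (an assumption). */
#define EXAMPLE_MAKE_TAG(a, b, c, d) \
    ((uint32_t)(a) | ((uint32_t)(b) << 8) | ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))

/* Unpack a tag into a NUL-terminated string, lowest byte first. This is the
 * same byte order the TRACE() call in the patch uses. */
static void example_tag_to_string(uint32_t tag, char out[5])
{
    unsigned int i;

    for (i = 0; i < 4; ++i)
        out[i] = (char)(tag >> (8 * i) & 0xff);
    out[4] = '\0';
}

/* Returns non-zero if packing and unpacking give the characters back. */
static int example_tag_roundtrips(void)
{
    char buf[5];

    example_tag_to_string(EXAMPLE_MAKE_TAG('R', 'P', 'A', 'S'), buf);
    return !strcmp(buf, "RPAS");
}
```

Readable tags like these make it easy to spot entry types when inspecting a cache file in a hex dump.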


New subject: [PATCH v2 14/22] vkd3d: Keep root signatures around.

From: Stefan Dösinger stefan@codeweavers.com

---
 libs/vkd3d/device.c        |  9 ++++-
 libs/vkd3d/state.c         | 69 +++++++++++++++++++++++++++++++++-----
 libs/vkd3d/vkd3d_private.h |  5 +++
 tests/d3d12.c              |  8 ++---
 4 files changed, 77 insertions(+), 14 deletions(-)

diff --git a/libs/vkd3d/device.c b/libs/vkd3d/device.c index d07bfa67f..93771d638 100644 --- a/libs/vkd3d/device.c +++ b/libs/vkd3d/device.c @@ -2632,6 +2632,7 @@ static ULONG STDMETHODCALLTYPE d3d12_device_Release(ID3D12Device5 *iface) if (device->use_vk_heaps) device_worker_stop(device); vkd3d_free(device->heaps); + vkd3d_root_signature_cache_cleanup(device->root_signature_cache, device); vkd3d_shader_cache_close(device->persistent_cache); VK_CALL(vkDestroyDevice(device->vk_device, NULL)); if (device->parent) @@ -4477,9 +4478,13 @@ static HRESULT d3d12_device_init(struct d3d12_device *device, } vkd3d_free(cache_name);

-    if (FAILED(hr = d3d12_device_init_pipeline_cache(device)))
+    device->root_signature_cache = vkd3d_root_signature_cache_init(device);
+    if (!device->root_signature_cache)
         goto out_free_cache;
 
+    if (FAILED(hr = d3d12_device_init_pipeline_cache(device)))
+        goto out_free_cache2;
+
     if (FAILED(hr = vkd3d_private_store_init(&device->private_store)))
         goto out_free_pipeline_cache;

@@ -4533,6 +4538,8 @@ out_free_private_store: vkd3d_private_store_destroy(&device->private_store); out_free_pipeline_cache: d3d12_device_destroy_pipeline_cache(device); +out_free_cache2: + vkd3d_shader_cache_close(device->root_signature_cache); out_free_cache: vkd3d_shader_cache_close(device->persistent_cache); out_free_vk_resources: diff --git a/libs/vkd3d/state.c b/libs/vkd3d/state.c index fe0405ac2..417707088 100644 --- a/libs/vkd3d/state.c +++ b/libs/vkd3d/state.c @@ -55,6 +55,12 @@ static ULONG STDMETHODCALLTYPE d3d12_root_signature_AddRef(ID3D12RootSignature * ULONG refcount = InterlockedIncrement(&root_signature->refcount);

TRACE("%p increasing refcount to %u.\n", root_signature, refcount); + if (refcount == 1) + { + if (FAILED(vkd3d_private_store_init(&root_signature->private_store))) + ERR("mama!!!\n"); + d3d12_device_add_ref(root_signature->device); + }

return refcount; } @@ -117,10 +123,8 @@ static ULONG STDMETHODCALLTYPE d3d12_root_signature_Release(ID3D12RootSignature if (!refcount) { struct d3d12_device *device = root_signature->device; - vkd3d_private_store_destroy(&root_signature->private_store); - d3d12_root_signature_cleanup(root_signature, device); - vkd3d_free(root_signature); d3d12_device_release(device); + vkd3d_private_store_destroy(&root_signature->private_store); }

return refcount; @@ -1411,7 +1415,7 @@ static HRESULT d3d12_root_signature_init(struct d3d12_root_signature *root_signa binding_desc = NULL;

     root_signature->ID3D12RootSignature_iface.lpVtbl = &d3d12_root_signature_vtbl;
-    root_signature->refcount = 1;
+    root_signature->refcount = 0;

root_signature->vk_pipeline_layout = VK_NULL_HANDLE; root_signature->vk_set_count = 0; @@ -1511,11 +1515,6 @@ static HRESULT d3d12_root_signature_init(struct d3d12_root_signature *root_signa root_signature->push_constant_ranges, &root_signature->vk_pipeline_layout))) goto fail;

-    if (FAILED(hr = vkd3d_private_store_init(&root_signature->private_store)))
-        goto fail;
-
-    d3d12_device_add_ref(device);
-
     return S_OK;

fail: @@ -1534,9 +1533,20 @@ HRESULT d3d12_root_signature_create(struct d3d12_device *device, struct vkd3d_shader_versioned_root_signature_desc vkd3d; } root_signature_desc; struct d3d12_root_signature *object; + uint32_t size = sizeof(object); HRESULT hr; int ret;

+ ret = vkd3d_shader_cache_get(device->root_signature_cache, bytecode, bytecode_length, + &object, &size); + if (ret == VKD3D_OK) + { + ERR("found cached root sig\n"); + *root_signature = object; + d3d12_root_signature_AddRef(&object->ID3D12RootSignature_iface); + return S_OK; + } + if ((ret = vkd3d_parse_root_signature_v_1_0(&dxbc, &root_signature_desc.vkd3d)) < 0) { WARN("Failed to parse root signature, vkd3d result %d.\n", ret); @@ -1559,11 +1569,52 @@ HRESULT d3d12_root_signature_create(struct d3d12_device *device,

TRACE("Created root signature %p.\n", object);

+ ret = vkd3d_shader_cache_put(device->root_signature_cache, bytecode, bytecode_length, + &object, size); + if (ret) + ERR("papa!\n"); + *root_signature = object; + d3d12_root_signature_AddRef(&object->ID3D12RootSignature_iface);

return S_OK; }

+struct vkd3d_shader_cache *vkd3d_root_signature_cache_init(struct d3d12_device *device) +{ + struct vkd3d_shader_cache_desc cache_desc = {0}; + struct vkd3d_shader_cache *cache; + char cache_name[64]; + + cache_desc.mem_size = ~0u; + cache_desc.max_entries = ~0u; + cache_desc.flags = VKD3D_SHADER_CACHE_FLAGS_MEMORY_ONLY; + + sprintf(cache_name, "memory:%p:root signatures", device); + if (vkd3d_shader_cache_open(cache_name, &cache_desc, &cache)) + return NULL; + + return cache; +} + +static bool vkd3d_rs_cache_cleanup(struct vkd3d_shader_cache *cache, + const void *key, uint32_t key_size, const void *value, + uint32_t value_size, void *context) +{ + struct d3d12_root_signature *root_signature = *(struct d3d12_root_signature **)value; + struct d3d12_device *device = context; + + d3d12_root_signature_cleanup(root_signature, device); + vkd3d_free(root_signature); + return true; +} + +void vkd3d_root_signature_cache_cleanup(struct vkd3d_shader_cache *cache, struct d3d12_device *device) +{ + vkd3d_shader_cache_enumerate(cache, vkd3d_rs_cache_cleanup, device); + vkd3d_shader_cache_close(device->root_signature_cache); +} + /* vkd3d_render_pass_cache */ struct vkd3d_render_pass_entry { diff --git a/libs/vkd3d/vkd3d_private.h b/libs/vkd3d/vkd3d_private.h index 542fa98a7..13b031789 100644 --- a/libs/vkd3d/vkd3d_private.h +++ b/libs/vkd3d/vkd3d_private.h @@ -579,6 +579,10 @@ struct vkd3d_render_pass_entry;

struct vkd3d_shader_cache *vkd3d_render_pass_cache_init(struct d3d12_device *device); void vkd3d_render_pass_cache_cleanup(struct vkd3d_shader_cache *cache, struct d3d12_device *device); +HRESULT vkd3d_render_pass_cache_find(struct vkd3d_shader_cache *cache, struct d3d12_device *device, + const struct vkd3d_render_pass_key *key, VkRenderPass *vk_render_pass); +struct vkd3d_shader_cache *vkd3d_root_signature_cache_init(struct d3d12_device *device); +void vkd3d_root_signature_cache_cleanup(struct vkd3d_shader_cache *cache, struct d3d12_device *device); HRESULT vkd3d_render_pass_cache_find(struct vkd3d_shader_cache *cache, struct d3d12_device *device, const struct vkd3d_render_pass_key *key, VkRenderPass *vk_render_pass);

@@ -1814,6 +1818,7 @@ struct d3d12_device struct vkd3d_mutex pipeline_cache_mutex; struct vkd3d_shader_cache *persistent_cache; struct vkd3d_shader_cache *render_pass_cache; + struct vkd3d_shader_cache *root_signature_cache; VkPipelineCache vk_pipeline_cache;

VkPhysicalDeviceMemoryProperties memory_properties; diff --git a/tests/d3d12.c b/tests/d3d12.c index 427e59c22..bb2a04974 100644 --- a/tests/d3d12.c +++ b/tests/d3d12.c @@ -2674,9 +2674,9 @@ static void test_create_root_signature(void) * heap manager reuses the allocation. */ hr = create_root_signature(device, &root_signature_desc, &root_signature2); ok(hr == S_OK, "Failed to create root signature, hr %#x.\n", hr); - todo ok(root_signature == root_signature2, "Got different root signature pointers.\n"); + ok(root_signature == root_signature2, "Got different root signature pointers.\n"); refcount = ID3D12RootSignature_Release(root_signature2); - todo ok(refcount == 1, "ID3D12RootSignature has %u references left.\n", (unsigned int)refcount); + ok(refcount == 1, "ID3D12RootSignature has %u references left.\n", (unsigned int)refcount);

hr = 0xdeadbeef; hr = ID3D12RootSignature_SetPrivateData(root_signature, &test_guid, sizeof(hr), &hr); @@ -2728,9 +2728,9 @@ static void test_create_root_signature(void)

hr = create_root_signature(device, &root_signature_desc, &root_signature2); ok(hr == S_OK, "Failed to create root signature, hr %#x.\n", hr); - todo ok(root_signature == root_signature2, "Got different root signature pointers.\n"); + ok(root_signature == root_signature2, "Got different root signature pointers.\n"); refcount = ID3D12RootSignature_Release(root_signature2); - todo ok(refcount == 1, "ID3D12RootSignature has %u references left.\n", (unsigned int)refcount); + ok(refcount == 1, "ID3D12RootSignature has %u references left.\n", (unsigned int)refcount);

refcount = ID3D12RootSignature_Release(root_signature); ok(!refcount, "ID3D12RootSignature has %u references left.\n", (unsigned int)refcount);

-- GitLab https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541
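Note on the lifetime change in this patch: a root signature now starts with a refcount of 0, is owned by device->root_signature_cache, and is only freed by the cache cleanup callback. AddRef going from 0 to 1 re-initialises the private store and takes a device reference; Release reaching 0 undoes both but keeps the object. The sketch below models just that state machine. Every name in it is hypothetical, the private store is reduced to a flag, and it is single-threaded, whereas the patch uses InterlockedIncrement and the cover letter mentions known locking issues.

```c
#include <assert.h>
#include <stdbool.h>

/* All names here are hypothetical; they only mirror the shape of the patch. */
struct example_device
{
    int refcount;
};

struct example_root_signature
{
    int refcount;               /* 0 while only the cache holds the object */
    bool private_store_live;    /* stands in for the vkd3d private store */
    struct example_device *device;
};

/* 0 -> 1 brings the object back into use: per-client state is set up again
 * and the object takes a device reference. */
static int example_root_signature_add_ref(struct example_root_signature *rs)
{
    if (++rs->refcount == 1)
    {
        rs->private_store_live = true;
        ++rs->device->refcount;
    }
    return rs->refcount;
}

/* 1 -> 0 drops the per-client state and the device reference, but does not
 * free the object; the cache still owns it until device teardown. */
static int example_root_signature_release(struct example_root_signature *rs)
{
    assert(rs->refcount > 0);
    if (!--rs->refcount)
    {
        rs->private_store_live = false;
        --rs->device->refcount;
    }
    return rs->refcount;
}
```

This is why the updated test can expect the same pointer back from a second create_root_signature() call with an identical description: the second call finds the cached object and only bumps its refcount.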


New subject: [PATCH v2 15/22] vkd3d: Precreate root signatures from cache

From: Stefan Dösinger stefan@codeweavers.com

---
 libs/vkd3d/cache.c         |  2 +-
 libs/vkd3d/device.c        |  7 +++++++
 libs/vkd3d/state.c         | 35 +++++++++++++++++++++++++++++++----
 libs/vkd3d/vkd3d_private.h | 11 +++++++++++
 4 files changed, 50 insertions(+), 5 deletions(-)

diff --git a/libs/vkd3d/cache.c b/libs/vkd3d/cache.c
index 049cf00ed..5a158e9c3 100644
--- a/libs/vkd3d/cache.c
+++ b/libs/vkd3d/cache.c
@@ -421,7 +421,7 @@ static inline uint64_t mvkHash64(const uint64_t *pVals, size_t count, uint64_t s
     return hash;
 }
 
-static uint64_t hash_key(const void *key, size_t size)
+uint64_t hash_key(const void *key, size_t size)
 {
     uint64_t last = 0, ret;

diff --git a/libs/vkd3d/device.c b/libs/vkd3d/device.c index 93771d638..d5cc50f0c 100644 --- a/libs/vkd3d/device.c +++ b/libs/vkd3d/device.c @@ -4403,6 +4403,7 @@ static bool d3d12_device_load_cache(struct vkd3d_shader_cache *cache, const void *key, uint32_t key_size, const void *value, uint32_t value_size, void *context) { + const struct vkd3d_shader_cache_root_signature *rs; const struct vkd3d_shader_cache_entry *e = value; struct d3d12_device *device = context; VkRenderPass rp; @@ -4417,6 +4418,12 @@ static bool d3d12_device_load_cache(struct vkd3d_shader_cache *cache, vkd3d_render_pass_cache_find(device->render_pass_cache, device, key, &rp); break;

+ case SHADER_CACHE_ENTRY_ROOT_SIGNATURE: + rs = value; + d3d12_root_signature_create(device, rs->dxbc, value_size + - offsetof(struct vkd3d_shader_cache_root_signature, dxbc[0]), NULL); + break; + case SHADER_CACHE_ENTRY_VULKAN_BLOB: break; } diff --git a/libs/vkd3d/state.c b/libs/vkd3d/state.c index 417707088..f39055f90 100644 --- a/libs/vkd3d/state.c +++ b/libs/vkd3d/state.c @@ -1532,6 +1532,7 @@ HRESULT d3d12_root_signature_create(struct d3d12_device *device, D3D12_VERSIONED_ROOT_SIGNATURE_DESC d3d12; struct vkd3d_shader_versioned_root_signature_desc vkd3d; } root_signature_desc; + struct vkd3d_shader_cache_root_signature *cache_value; struct d3d12_root_signature *object; uint32_t size = sizeof(object); HRESULT hr; @@ -1542,8 +1543,13 @@ HRESULT d3d12_root_signature_create(struct d3d12_device *device, if (ret == VKD3D_OK) { ERR("found cached root sig\n"); - *root_signature = object; - d3d12_root_signature_AddRef(&object->ID3D12RootSignature_iface); + if (root_signature) + { + *root_signature = object; + d3d12_root_signature_AddRef(&object->ID3D12RootSignature_iface); + } + else + ERR("Why do I create a cached root sig twice?\n"); return S_OK; }

@@ -1574,8 +1580,29 @@ HRESULT d3d12_root_signature_create(struct d3d12_device *device, if (ret) ERR("papa!\n");

- *root_signature = object; - d3d12_root_signature_AddRef(&object->ID3D12RootSignature_iface); + /* Why the hash as key and d3d root signature description as value? Because we store + * the root signature hash in pipelines and need a way to look up the root signature + * when we recreate the pipelines. + * + * Alternatively we could use bytecode as key here and store a hash -> bytecode lookup + * at runtime in device->root_signature_cache. I am unsure for now. */ + object->hash = hash_key(bytecode, bytecode_length); + size = offsetof(struct vkd3d_shader_cache_root_signature, dxbc[bytecode_length]); + cache_value = vkd3d_malloc(size); + cache_value->header.vkd3d_revision = VKD3D_SHADER_CACHE_VKD3D_VERSION; + cache_value->header.type = SHADER_CACHE_ENTRY_ROOT_SIGNATURE; + memcpy(cache_value->dxbc, bytecode, bytecode_length); + ret = vkd3d_shader_cache_put(device->persistent_cache, &object->hash, sizeof(object->hash), + cache_value, size); + if (ret) + ERR("uncle!\n"); + vkd3d_free(cache_value); + + if (root_signature) + { + *root_signature = object; + d3d12_root_signature_AddRef(&object->ID3D12RootSignature_iface); + }

return S_OK; } diff --git a/libs/vkd3d/vkd3d_private.h b/libs/vkd3d/vkd3d_private.h index 13b031789..0705b5e7c 100644 --- a/libs/vkd3d/vkd3d_private.h +++ b/libs/vkd3d/vkd3d_private.h @@ -69,6 +69,7 @@ struct vkd3d_render_pass_key enum vkd3d_shader_cache_entry_type { SHADER_CACHE_ENTRY_RENDER_PASS = VKD3D_MAKE_TAG('R', 'P', 'A', 'S'), + SHADER_CACHE_ENTRY_ROOT_SIGNATURE = VKD3D_MAKE_TAG('R', 'O', 'O', 'T'), SHADER_CACHE_ENTRY_VULKAN_BLOB = VKD3D_MAKE_TAG('V', 'K', 'P', 'C'), };

@@ -84,8 +85,17 @@ struct vkd3d_shader_cache_vk_blob uint8_t blob[1]; };

+struct vkd3d_shader_cache_root_signature
+{
+    struct vkd3d_shader_cache_entry header;
+    uint8_t dxbc[1];
+};
+
 /* End shader data structures */
 
+/* FIXME: Better name. */
+uint64_t hash_key(const void *key, size_t size);
+
 #define VK_CALL(f) (vk_procs->f)

#define VKD3D_DESCRIPTOR_MAGIC_FREE 0x00000000u @@ -1213,6 +1223,7 @@ struct d3d12_root_signature { ID3D12RootSignature ID3D12RootSignature_iface; LONG refcount; + uint64_t hash;

VkPipelineLayout vk_pipeline_layout; struct d3d12_descriptor_set_layout descriptor_set_layouts[VKD3D_MAX_DESCRIPTOR_SETS];

-- GitLab https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541
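Note on the on-disk root signature value: the entry is a fixed header followed by the raw root signature bytecode, so its size is computed with offsetof() when writing, and the bytecode length is recovered from value_size when loading. A minimal sketch of both directions follows. The struct and function names are hypothetical, the header fields are simplified to two 32-bit members, and a C99 flexible array member is used where the patch declares dxbc[1].

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified mirror of the cache entry header. */
struct example_entry_header
{
    uint32_t type;
    uint32_t revision;
};

/* Header plus trailing bytecode; the patch uses dxbc[1] instead of a
 * flexible array member. */
struct example_root_signature_value
{
    struct example_entry_header header;
    uint8_t dxbc[];
};

/* Size of a value carrying `length` bytes of bytecode. */
static size_t example_value_size(size_t length)
{
    return offsetof(struct example_root_signature_value, dxbc) + length;
}

/* Build a value; returns NULL on allocation failure. The caller frees it. */
static struct example_root_signature_value *example_value_create(
        const void *bytecode, size_t length, size_t *size)
{
    struct example_root_signature_value *value;

    *size = example_value_size(length);
    if (!(value = malloc(*size)))
        return NULL;
    value->header.type = 0x544f4f52u;   /* 'R','O','O','T', lowest byte first */
    value->header.revision = 1;
    memcpy(value->dxbc, bytecode, length);
    return value;
}

/* Recover the bytecode length from the stored value size, as a loader would. */
static size_t example_value_bytecode_length(size_t value_size)
{
    assert(value_size >= offsetof(struct example_root_signature_value, dxbc));
    return value_size - offsetof(struct example_root_signature_value, dxbc);
}
```

Unlike the patch, which passes the result of vkd3d_malloc() straight to the header writes, the sketch checks the allocation before use.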


New subject: [PATCH v2 16/22] Store graphics pipelines in the cache.

From: Stefan Dösinger stefan@codeweavers.com

---
 libs/vkd3d/state.c         | 153 +++++++++++++++++++++++++++++++++++++
 libs/vkd3d/vkd3d_private.h |  62 +++++++++++++++
 2 files changed, 215 insertions(+)

diff --git a/libs/vkd3d/state.c b/libs/vkd3d/state.c index f39055f90..eb72464bd 100644 --- a/libs/vkd3d/state.c +++ b/libs/vkd3d/state.c @@ -2476,6 +2476,133 @@ static HRESULT d3d12_pipeline_state_find_and_init_uav_counters(struct d3d12_pipe return hr; }

+static struct vkd3d_shader_cache_pipeline_state *vkd3d_cache_pipeline_from_d3d( + const struct d3d12_pipeline_state_desc *desc, + const struct d3d12_root_signature *root_signature, uint32_t *entry_size) +{ + struct vkd3d_shader_cache_pipeline_state *entry; + uint32_t size, pos = 0, i; + + size = desc->cs.BytecodeLength; + size += desc->vs.BytecodeLength; + size += desc->ps.BytecodeLength; + size += desc->ds.BytecodeLength; + size += desc->hs.BytecodeLength; + size += desc->gs.BytecodeLength; + size += desc->stream_output.NumEntries * sizeof(struct vkd3d_so_declaration_cache_entry); + size += desc->stream_output.NumStrides * sizeof(*desc->stream_output.pBufferStrides); + /* FIXME: Dynamically handle semantic strings */ + size += desc->input_layout.NumElements * sizeof(struct vkd3d_input_layout_element_cache); + + *entry_size = offsetof(struct vkd3d_shader_cache_pipeline_state, data[size]); + entry = vkd3d_calloc(1, *entry_size); + + entry->super.vkd3d_revision = VKD3D_SHADER_CACHE_VKD3D_VERSION; + entry->super.type = 0; + + entry->root_signature = root_signature->hash; + + entry->cs_size = desc->cs.BytecodeLength; + if (entry->cs_size) + { + memcpy(entry->data + pos, desc->cs.pShaderBytecode, entry->cs_size); + pos += entry->cs_size; + } + + entry->vs_size = desc->vs.BytecodeLength; + if (entry->vs_size) + { + memcpy(entry->data + pos, desc->vs.pShaderBytecode, entry->vs_size); + pos += entry->vs_size; + } + + entry->ps_size = desc->ps.BytecodeLength; + if (entry->ps_size) + { + memcpy(entry->data + pos, desc->ps.pShaderBytecode, entry->ps_size); + pos += entry->ps_size; + } + + entry->ds_size = desc->ds.BytecodeLength; + if (entry->ds_size) + { + memcpy(entry->data + pos, desc->ds.pShaderBytecode, entry->ds_size); + pos += entry->ds_size; + } + + entry->hs_size = desc->hs.BytecodeLength; + if (entry->hs_size) + { + memcpy(entry->data + pos, desc->hs.pShaderBytecode, entry->hs_size); + pos += entry->hs_size; + } + + entry->gs_size = desc->gs.BytecodeLength; + if 
(entry->gs_size) + { + memcpy(entry->data + pos, desc->gs.pShaderBytecode, entry->gs_size); + pos += entry->gs_size; + } + + entry->so_entries = desc->stream_output.NumEntries; + for (i = 0; i < entry->so_entries; ++i) + { + struct vkd3d_so_declaration_cache_entry *e = (void *)(entry->data + pos); + e->stream = desc->stream_output.pSODeclaration[i].Stream; + strncpy(e->semantic_name, desc->stream_output.pSODeclaration[i].SemanticName, 32); + e->semantic_name[31] = 0; + e->semantic_index = desc->stream_output.pSODeclaration[i].SemanticIndex; + e->start_component = desc->stream_output.pSODeclaration[i].StartComponent; + e->component_count = desc->stream_output.pSODeclaration[i].ComponentCount; + e->output_slot = desc->stream_output.pSODeclaration[i].OutputSlot; + + if (strlen(desc->stream_output.pSODeclaration[i].SemanticName) > 31) + FIXME("Output semantic name too long\n"); + + pos += sizeof(*e); + } + entry->so_strides = desc->stream_output.NumStrides; + if (entry->so_strides) + { + memcpy(entry->data + pos, desc->stream_output.pBufferStrides, + sizeof(*desc->stream_output.pBufferStrides) * entry->so_strides); + pos += sizeof(*desc->stream_output.pBufferStrides) * entry->so_strides; + } + + entry->input_layout_elements = desc->input_layout.NumElements; + for (i = 0; i < entry->input_layout_elements; ++i) + { + struct vkd3d_input_layout_element_cache *e = (void *)(entry->data + pos); + strncpy(e->semantic_name, desc->input_layout.pInputElementDescs[i].SemanticName, 32); + e->semantic_name[31] = 0; + e->semantic_index = desc->input_layout.pInputElementDescs[i].SemanticIndex; + e->format = desc->input_layout.pInputElementDescs[i].Format; + e->input_slot = desc->input_layout.pInputElementDescs[i].InputSlot; + e->aligned_byte_offset = desc->input_layout.pInputElementDescs[i].AlignedByteOffset; + e->input_slot_class = desc->input_layout.pInputElementDescs[i].InputSlotClass; + e->instance_data_step_rate = desc->input_layout.pInputElementDescs[i].InstanceDataStepRate; + + 
if (strlen(desc->input_layout.pInputElementDescs[i].SemanticName) > 31) + FIXME("Input semantic name too long\n"); + + pos += sizeof(*e); + } + + entry->blend_state = desc->blend_state; + entry->sample_mask = desc->sample_mask; + entry->rasterizer_state = desc->rasterizer_state; + entry->depth_stencil_state = desc->depth_stencil_state; + entry->strip_cut_value = desc->strip_cut_value; + entry->primitive_topology_type = desc->primitive_topology_type; + entry->rtv_formats = desc->rtv_formats; + entry->dsv_format = desc->dsv_format; + entry->sample_desc = desc->sample_desc; + entry->node_mask = desc->node_mask; + entry->flags = desc->flags; + + return entry; +} + static HRESULT d3d12_pipeline_state_init_compute(struct d3d12_pipeline_state *state, struct d3d12_device *device, const struct d3d12_pipeline_state_desc *desc) { @@ -3038,6 +3165,7 @@ static HRESULT d3d12_pipeline_state_init_graphics(struct d3d12_pipeline_state *s uint32_t aligned_offsets[D3D12_VS_INPUT_REGISTER_COUNT]; struct vkd3d_shader_descriptor_offset_info offset_info; struct vkd3d_shader_parameter ps_shader_parameters[1]; + struct vkd3d_shader_cache_pipeline_state *cache_entry; struct vkd3d_shader_transform_feedback_info xfb_info; struct vkd3d_shader_spirv_target_info ps_target_info; struct vkd3d_shader_interface_info shader_interface; @@ -3050,6 +3178,7 @@ static HRESULT d3d12_pipeline_state_init_graphics(struct d3d12_pipeline_state *s const struct vkd3d_format *format; unsigned int instance_divisor; VkVertexInputRate input_rate; + uint32_t cache_entry_size; unsigned int i, j; size_t rt_count; uint32_t mask; @@ -3555,6 +3684,18 @@ static HRESULT d3d12_pipeline_state_init_graphics(struct d3d12_pipeline_state *s state->vk_bind_point = VK_PIPELINE_BIND_POINT_GRAPHICS; d3d12_device_add_ref(state->device = device);

+ cache_entry = vkd3d_cache_pipeline_from_d3d(desc, root_signature, &cache_entry_size); + if (cache_entry) + { + uint64_t hash; + cache_entry->super.type = SHADER_CACHE_ENTRY_GRAPHICS_STATE; + hash = hash_key(cache_entry, cache_entry_size); + vkd3d_shader_cache_put(device->persistent_cache, &hash, sizeof(hash), + cache_entry, cache_entry_size); + vkd3d_free(cache_entry); + state->state_hash = hash; + } + return S_OK;

fail: @@ -3775,6 +3916,8 @@ VkPipeline d3d12_pipeline_state_get_or_create_pipeline(struct d3d12_pipeline_sta struct d3d12_graphics_pipeline_state *graphics = &state->u.graphics; VkPipelineVertexInputDivisorStateCreateInfoEXT input_divisor_info; VkPipelineTessellationStateCreateInfo tessellation_info; + struct vkd3d_graphics_pipeline_key persistent_key = {0}; + struct vkd3d_graphics_pipeline_entry cache_entry = {0}; VkPipelineVertexInputStateCreateInfo input_desc; VkPipelineInputAssemblyStateCreateInfo ia_desc; VkPipelineColorBlendStateCreateInfo blend_desc; @@ -3841,12 +3984,17 @@ VkPipeline d3d12_pipeline_state_get_or_create_pipeline(struct d3d12_pipeline_sta b->inputRate = graphics->input_rates[binding];

pipeline_key.strides[binding_count] = strides[binding]; + persistent_key.strides[binding] = strides[binding];

++binding_count; }

pipeline_key.dsv_format = dsv_format;

+    persistent_key.state = state->state_hash;
+    persistent_key.topology = topology;
+    persistent_key.dsv_format = dsv_format;
+
     if ((vk_pipeline = d3d12_pipeline_state_find_compiled_pipeline(state, &pipeline_key, vk_render_pass)))
         return vk_pipeline;

@@ -3938,6 +4086,11 @@ VkPipeline d3d12_pipeline_state_get_or_create_pipeline(struct d3d12_pipeline_sta return VK_NULL_HANDLE; }

+    cache_entry.super.vkd3d_revision = VKD3D_SHADER_CACHE_VKD3D_VERSION;
+    cache_entry.super.type = SHADER_CACHE_ENTRY_GRAPHICS_PIPELINE;
+    vkd3d_shader_cache_put(device->persistent_cache, &persistent_key, sizeof(persistent_key),
+            &cache_entry, sizeof(cache_entry));
+
     if (d3d12_pipeline_state_put_pipeline_to_cache(state, &pipeline_key, vk_pipeline, pipeline_desc.renderPass))
         return vk_pipeline;

diff --git a/libs/vkd3d/vkd3d_private.h b/libs/vkd3d/vkd3d_private.h index 0705b5e7c..f934be5fb 100644 --- a/libs/vkd3d/vkd3d_private.h +++ b/libs/vkd3d/vkd3d_private.h @@ -68,6 +68,9 @@ struct vkd3d_render_pass_key

enum vkd3d_shader_cache_entry_type { + SHADER_CACHE_ENTRY_COMPUTE_STATE = VKD3D_MAKE_TAG('C', 'O', 'M', 'P'), + SHADER_CACHE_ENTRY_GRAPHICS_PIPELINE = VKD3D_MAKE_TAG('G', 'F', 'X', 'P'), + SHADER_CACHE_ENTRY_GRAPHICS_STATE = VKD3D_MAKE_TAG('G', 'F', 'X', 'S'), SHADER_CACHE_ENTRY_RENDER_PASS = VKD3D_MAKE_TAG('R', 'P', 'A', 'S'), SHADER_CACHE_ENTRY_ROOT_SIGNATURE = VKD3D_MAKE_TAG('R', 'O', 'O', 'T'), SHADER_CACHE_ENTRY_VULKAN_BLOB = VKD3D_MAKE_TAG('V', 'K', 'P', 'C'), @@ -91,6 +94,64 @@ struct vkd3d_shader_cache_root_signature uint8_t dxbc[1]; };

+struct vkd3d_input_layout_element_cache +{ + char semantic_name[32]; /* Not a proper solution */ + UINT semantic_index; + DXGI_FORMAT format; + UINT input_slot; + UINT aligned_byte_offset; + D3D12_INPUT_CLASSIFICATION input_slot_class; + UINT instance_data_step_rate; +}; + +struct vkd3d_so_declaration_cache_entry +{ + UINT stream; + char semantic_name[32]; /* Not a proper solution */ + UINT semantic_index; + BYTE start_component; + BYTE component_count; + BYTE output_slot; +}; + +struct vkd3d_shader_cache_pipeline_state +{ + struct vkd3d_shader_cache_entry super; + uint64_t root_signature; + uint32_t cs_size, vs_size, ps_size, ds_size, hs_size, gs_size; + uint32_t so_entries, so_strides; + uint32_t so_RasterizedStream; + uint32_t input_layout_elements; + D3D12_BLEND_DESC blend_state; + UINT sample_mask; + D3D12_RASTERIZER_DESC rasterizer_state; + D3D12_DEPTH_STENCIL_DESC1 depth_stencil_state; + /* Input layout is appended */ + D3D12_INDEX_BUFFER_STRIP_CUT_VALUE strip_cut_value; + D3D12_PRIMITIVE_TOPOLOGY_TYPE primitive_topology_type; + struct D3D12_RT_FORMAT_ARRAY rtv_formats; + DXGI_FORMAT dsv_format; + DXGI_SAMPLE_DESC sample_desc; + UINT node_mask; + D3D12_PIPELINE_STATE_FLAGS flags; + uint8_t data[1]; +}; + +struct vkd3d_graphics_pipeline_key +{ + uint64_t state; + D3D12_PRIMITIVE_TOPOLOGY topology; + VkFormat dsv_format; + uint32_t strides[D3D12_IA_VERTEX_INPUT_RESOURCE_SLOT_COUNT]; +}; + +struct vkd3d_graphics_pipeline_entry +{ + struct vkd3d_shader_cache_entry super; + /* TODO: Translated spir-v code */ +}; + /* End shader data structures */

/* FIXME: Better name. */ @@ -1340,6 +1401,7 @@ struct d3d12_pipeline_state struct d3d12_compute_pipeline_state compute; } u; VkPipelineBindPoint vk_bind_point; + uint64_t state_hash;

struct d3d12_pipeline_uav_counter_state uav_counters;

-- GitLab https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541
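Note on the packed pipeline state layout: vkd3d_cache_pipeline_from_d3d() records one size per shader stage and then appends the stage bytecode back to back in data[], in the order cs, vs, ps, ds, hs, gs, followed by the stream output declarations, the stream output strides and the input layout elements. Any reader has to walk the buffer in exactly the same order and advance past every block, including ones it does not use. The sketch below shows the technique with three generic blobs. All names are hypothetical and the total size is not checked for overflow.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define EXAMPLE_STAGE_COUNT 3

struct example_blob
{
    const void *data;
    uint32_t size;
};

/* Hypothetical packed record: one size per stage, then the bytes back to back. */
struct example_packed_state
{
    uint32_t sizes[EXAMPLE_STAGE_COUNT];
    uint8_t data[];
};

/* Pack all blobs into one allocation; returns NULL on allocation failure. */
static struct example_packed_state *example_pack(const struct example_blob *blobs)
{
    struct example_packed_state *state;
    uint32_t total = 0, pos = 0, i;

    for (i = 0; i < EXAMPLE_STAGE_COUNT; ++i)
        total += blobs[i].size;
    if (!(state = calloc(1, sizeof(*state) + total)))
        return NULL;

    for (i = 0; i < EXAMPLE_STAGE_COUNT; ++i)
    {
        state->sizes[i] = blobs[i].size;
        if (blobs[i].size)
            memcpy(state->data + pos, blobs[i].data, blobs[i].size);
        pos += blobs[i].size;
    }
    return state;
}

/* The reader advances past every stage in writer order, including empty
 * ones, so that later offsets stay correct. */
static void example_unpack(const struct example_packed_state *state,
        struct example_blob *blobs)
{
    uint32_t pos = 0, i;

    for (i = 0; i < EXAMPLE_STAGE_COUNT; ++i)
    {
        blobs[i].size = state->sizes[i];
        blobs[i].data = state->sizes[i] ? state->data + pos : NULL;
        pos += state->sizes[i];
    }
}
```

Because the key for the stored state is a hash over the whole packed record, allocating it zeroed (vkd3d_calloc in the patch, calloc here) keeps padding and unwritten fields from making two equal descriptions hash differently.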


New subject: [PATCH v2 17/22] vkd3d: Catch and release graphics pipelines.

From: Stefan Dösinger stefan@codeweavers.com

---
 libs/vkd3d/device.c        | 190 +++++++++++++++++++++++++++++++++++++
 libs/vkd3d/state.c         |   2 +-
 libs/vkd3d/vkd3d_private.h |   2 +
 3 files changed, 193 insertions(+), 1 deletion(-)

diff --git a/libs/vkd3d/device.c b/libs/vkd3d/device.c index d5cc50f0c..370a2f67f 100644 --- a/libs/vkd3d/device.c +++ b/libs/vkd3d/device.c @@ -4424,7 +4424,196 @@ static bool d3d12_device_load_cache(struct vkd3d_shader_cache *cache, - offsetof(struct vkd3d_shader_cache_root_signature, dxbc[0]), NULL); break;

+ case SHADER_CACHE_ENTRY_COMPUTE_STATE: + case SHADER_CACHE_ENTRY_GRAPHICS_PIPELINE: + case SHADER_CACHE_ENTRY_GRAPHICS_STATE: + /* These are handled in a second pass */ + break; + + case SHADER_CACHE_ENTRY_VULKAN_BLOB: + break; + } + + return true; +} + +static bool d3d12_device_load_cache2(struct vkd3d_shader_cache *cache, + const void *key, uint32_t key_size, const void *value, + uint32_t value_size, void *context) +{ + const struct vkd3d_shader_cache_entry *e = value; + struct vkd3d_shader_cache_root_signature *rs; + struct d3d12_root_signature *d3d12_root_sig; + struct vkd3d_shader_cache_pipeline_state *s; + const struct vkd3d_graphics_pipeline_key *k; + D3D12_INPUT_ELEMENT_DESC *il_element = NULL; + D3D12_SO_DECLARATION_ENTRY *so_decl = NULL; + struct d3d12_pipeline_state_desc desc; + struct d3d12_device *device = context; + struct d3d12_pipeline_state *object; + uint32_t size, size2, pos = 0; + enum vkd3d_result ret; + unsigned int i; + HRESULT hr; + + TRACE("device %p got entry type (%c%c%c%c)\n", device, + e->type & 0xff, e->type >> 8 & 0xff, e->type >> 16 & 0xff, + e->type >> 24 & 0xff); + + switch (e->type) + { + case SHADER_CACHE_ENTRY_RENDER_PASS: + case SHADER_CACHE_ENTRY_ROOT_SIGNATURE: case SHADER_CACHE_ENTRY_VULKAN_BLOB: + /* Handled already */ + break; + + case SHADER_CACHE_ENTRY_COMPUTE_STATE: + /* TODO */ + break; + + case SHADER_CACHE_ENTRY_GRAPHICS_STATE: + /* Ignore, look it up when handling the full state */ + break; + + case SHADER_CACHE_ENTRY_GRAPHICS_PIPELINE: + k = key; + ret = vkd3d_shader_cache_get(cache, &k->state, sizeof(k->state), + NULL, &size); + if (ret) + { + FIXME("Did not find graphics state\n"); + break; + } + + s = vkd3d_malloc(size); + if (!s) + break; + ret = vkd3d_shader_cache_get(cache, &k->state, sizeof(k->state), + s, &size); + if (ret) + ERR("whut?\n"); + + ret = vkd3d_shader_cache_get(cache, &s->root_signature, sizeof(s->root_signature), + NULL, &size); + if (ret) + { + FIXME("Did not find root signature %lx for 
graphics pipeline\n", s->root_signature); + vkd3d_free(s); + break; + } + rs = vkd3d_malloc(size); + ret = vkd3d_shader_cache_get(cache, &s->root_signature, sizeof(s->root_signature), + rs, &size); + if (ret) + ERR("whut?\n"); + + size2 = sizeof(d3d12_root_sig); + ret = vkd3d_shader_cache_get(device->root_signature_cache, rs->dxbc, + size - offsetof(struct vkd3d_shader_cache_root_signature, dxbc[0]), + &d3d12_root_sig, &size2); + vkd3d_free(rs); + if (ret) + { + ERR("whut 2? Did not find root sig of hash %lx %d\n", s->root_signature, ret); + // return; + } + + memset(&desc, 0, sizeof(desc)); + desc.root_signature = &d3d12_root_sig->ID3D12RootSignature_iface; + + desc.vs.BytecodeLength = s->vs_size; + desc.vs.pShaderBytecode = s->vs_size ? s->data + pos : NULL; + pos += s->vs_size; + desc.ps.BytecodeLength = s->ps_size; + desc.ps.pShaderBytecode = s->ps_size ? s->data + pos : NULL; + pos += s->ps_size; + desc.ds.BytecodeLength = s->ds_size; + desc.ds.pShaderBytecode = s->ds_size ? s->data + pos : NULL; + pos += s->ds_size; + desc.hs.BytecodeLength = s->hs_size; + desc.hs.pShaderBytecode = s->hs_size ? s->data + pos : NULL; + pos += s->hs_size; + desc.gs.BytecodeLength = s->gs_size; + desc.gs.pShaderBytecode = s->gs_size ? 
s->data + pos : NULL; + pos += s->gs_size; + + desc.stream_output.NumEntries = s->so_entries; + if (s->so_entries) + { + so_decl = vkd3d_malloc(sizeof(*so_decl) * s->so_entries); + for (i = 0; i < s->so_entries; ++i) + { + struct vkd3d_so_declaration_cache_entry *sod = (void *)(s->data + pos); + so_decl[i].Stream = sod->stream; + so_decl[i].SemanticName = sod->semantic_name; + so_decl[i].SemanticIndex = sod->semantic_index; + so_decl[i].StartComponent = sod->start_component; + so_decl[i].ComponentCount = sod->component_count; + so_decl[i].OutputSlot = sod->output_slot; + pos += sizeof(*sod); + } + desc.stream_output.pSODeclaration = so_decl; + } + desc.stream_output.NumStrides = s->so_strides; + desc.stream_output.pBufferStrides = (void *)(s->data + pos); + pos += s->so_strides * sizeof(*desc.stream_output.pBufferStrides); + desc.stream_output.RasterizedStream = s->so_RasterizedStream; + + desc.blend_state = s->blend_state; + desc.sample_mask = s->sample_mask; + desc.rasterizer_state = s->rasterizer_state; + desc.depth_stencil_state = s->depth_stencil_state; + + desc.input_layout.NumElements = s->input_layout_elements; + if (s->input_layout_elements) + { + il_element = vkd3d_malloc(sizeof(*il_element) * s->input_layout_elements); + for (i = 0; i < s->input_layout_elements; ++i) + { + struct vkd3d_input_layout_element_cache *ile = (void *)(s->data + pos); + il_element[i].SemanticName = ile->semantic_name; + il_element[i].SemanticIndex = ile->semantic_index; + il_element[i].Format = ile->format; + il_element[i].InputSlot = ile->input_slot; + il_element[i].AlignedByteOffset = ile->aligned_byte_offset; + il_element[i].InputSlotClass = ile->input_slot_class; + il_element[i].InstanceDataStepRate = ile->instance_data_step_rate; + pos += sizeof(*ile); + } + desc.input_layout.pInputElementDescs = il_element; + } + + desc.strip_cut_value = s->strip_cut_value; + desc.primitive_topology_type = s->primitive_topology_type; + desc.rtv_formats = s->rtv_formats; + desc.dsv_format = 
s->dsv_format; + desc.sample_desc = s->sample_desc; + desc.node_mask = s->node_mask; + desc.flags = s->flags; + + if (!(object = vkd3d_malloc(sizeof(*object)))) + ERR("meh\n"); + /* We're happy with just creating and destroying it for now. It will feed the vulkan + * pipeline cache, which should re-use the pipeline when the game creates it for actual + * use later. + * + * FIXME: The manipulation of the device refcount in init() and Release() makes it + * unsafe to move this function to a separate thread. We might hold and release the + * last reference to the device. */ + hr = d3d12_pipeline_state_init_graphics(object, device, &desc); + if (SUCCEEDED(hr)) + { + VkRenderPass pass; + VkPipeline p = d3d12_pipeline_state_get_or_create_pipeline(object, + k->topology, k->strides, k->dsv_format, &pass); + TRACE("got render pass %lx\n", p); + ID3D12PipelineState_Release(&object->ID3D12PipelineState_iface); + } + + vkd3d_free(so_decl); + vkd3d_free(il_element); + vkd3d_free(s); break; }

@@ -4527,6 +4716,7 @@ static HRESULT d3d12_device_init(struct d3d12_device *device,
     device_init_descriptor_pool_sizes(device);

     vkd3d_shader_cache_enumerate(device->persistent_cache, d3d12_device_load_cache, device);
+    vkd3d_shader_cache_enumerate(device->persistent_cache, d3d12_device_load_cache2, device);

     if ((device->parent = create_info->parent))
         IUnknown_AddRef(device->parent);
diff --git a/libs/vkd3d/state.c b/libs/vkd3d/state.c
index eb72464bd..8b72c6408 100644
--- a/libs/vkd3d/state.c
+++ b/libs/vkd3d/state.c
@@ -3151,7 +3151,7 @@ static VkLogicOp vk_logic_op_from_d3d12(D3D12_LOGIC_OP op)
     }
 }

-static HRESULT d3d12_pipeline_state_init_graphics(struct d3d12_pipeline_state *state,
+HRESULT d3d12_pipeline_state_init_graphics(struct d3d12_pipeline_state *state,
         struct d3d12_device *device, const struct d3d12_pipeline_state_desc *desc)
 {
     unsigned int ps_output_swizzle[D3D12_SIMULTANEOUS_RENDER_TARGET_COUNT];
diff --git a/libs/vkd3d/vkd3d_private.h b/libs/vkd3d/vkd3d_private.h
index f934be5fb..31f989947 100644
--- a/libs/vkd3d/vkd3d_private.h
+++ b/libs/vkd3d/vkd3d_private.h
@@ -1462,6 +1462,8 @@ HRESULT d3d12_pipeline_state_create_compute(struct d3d12_device *device,
         const D3D12_COMPUTE_PIPELINE_STATE_DESC *desc, struct d3d12_pipeline_state **state);
 HRESULT d3d12_pipeline_state_create_graphics(struct d3d12_device *device,
         const D3D12_GRAPHICS_PIPELINE_STATE_DESC *desc, struct d3d12_pipeline_state **state);
+HRESULT d3d12_pipeline_state_init_graphics(struct d3d12_pipeline_state *state,
+        struct d3d12_device *device, const struct d3d12_pipeline_state_desc *desc);
 HRESULT d3d12_pipeline_state_create(struct d3d12_device *device,
         const D3D12_PIPELINE_STATE_STREAM_DESC *desc, struct d3d12_pipeline_state **state);
 VkPipeline d3d12_pipeline_state_get_or_create_pipeline(struct d3d12_pipeline_state *state,

-- GitLab https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541

Stefan Dösinger

9:51 a.m.

New subject: [PATCH v2 18/22] vkd3d: Add EXT_pipeline_creation_feedback.

From: Stefan Dösinger stefan@codeweavers.com

---
 libs/vkd3d/device.c        |  1 +
 libs/vkd3d/state.c         | 19 +++++++++++++++++++
 libs/vkd3d/vkd3d_private.h |  1 +
 3 files changed, 21 insertions(+)

diff --git a/libs/vkd3d/device.c b/libs/vkd3d/device.c
index 370a2f67f..61adfa1f9 100644
--- a/libs/vkd3d/device.c
+++ b/libs/vkd3d/device.c
@@ -104,6 +104,7 @@ static const struct vkd3d_optional_extension_info optional_device_extensions[] =
     VK_EXTENSION(EXT_DEPTH_CLIP_ENABLE, EXT_depth_clip_enable),
     VK_EXTENSION(EXT_DESCRIPTOR_INDEXING, EXT_descriptor_indexing),
     VK_EXTENSION(EXT_MUTABLE_DESCRIPTOR_TYPE, EXT_mutable_descriptor_type),
+    VK_EXTENSION(EXT_PIPELINE_CREATION_FEEDBACK, EXT_pipeline_creation_feedback),
     VK_EXTENSION(EXT_ROBUSTNESS_2, EXT_robustness2),
     VK_EXTENSION(EXT_SHADER_DEMOTE_TO_HELPER_INVOCATION, EXT_shader_demote_to_helper_invocation),
     VK_EXTENSION(EXT_SHADER_STENCIL_EXPORT, EXT_shader_stencil_export),
diff --git a/libs/vkd3d/state.c b/libs/vkd3d/state.c
index 8b72c6408..b1a625687 100644
--- a/libs/vkd3d/state.c
+++ b/libs/vkd3d/state.c
@@ -3912,12 +3912,14 @@ VkPipeline d3d12_pipeline_state_get_or_create_pipeline(struct d3d12_pipeline_sta
         VkRenderPass *vk_render_pass)
 {
     VkVertexInputBindingDescription bindings[D3D12_IA_VERTEX_INPUT_RESOURCE_SLOT_COUNT];
+    VkPipelineCreationFeedback feedback = {0}, stage_feedback[VKD3D_MAX_SHADER_STAGES];
     const struct vkd3d_vk_device_procs *vk_procs = &state->device->vk_procs;
     struct d3d12_graphics_pipeline_state *graphics = &state->u.graphics;
     VkPipelineVertexInputDivisorStateCreateInfoEXT input_divisor_info;
     VkPipelineTessellationStateCreateInfo tessellation_info;
     struct vkd3d_graphics_pipeline_key persistent_key = {0};
     struct vkd3d_graphics_pipeline_entry cache_entry = {0};
+    VkPipelineCreationFeedbackCreateInfo feedback_info;
     VkPipelineVertexInputStateCreateInfo input_desc;
     VkPipelineInputAssemblyStateCreateInfo ia_desc;
     VkPipelineColorBlendStateCreateInfo blend_desc;
@@ -4077,6 +4079,15 @@ VkPipeline d3d12_pipeline_state_get_or_create_pipeline(struct d3d12_pipeline_sta
         return VK_NULL_HANDLE;
     }

+    if (device->vk_info.EXT_pipeline_creation_feedback)
+    {
+        pipeline_desc.pNext = &feedback_info;
+        feedback_info.sType = VK_STRUCTURE_TYPE_PIPELINE_CREATION_FEEDBACK_CREATE_INFO;
+        feedback_info.pNext = NULL;
+        feedback_info.pPipelineCreationFeedback = &feedback;
+        feedback_info.pipelineStageCreationFeedbackCount = ARRAY_SIZE(stage_feedback);
+        feedback_info.pPipelineStageCreationFeedbacks = stage_feedback;
+    }
     *vk_render_pass = pipeline_desc.renderPass;

     if ((vr = VK_CALL(vkCreateGraphicsPipelines(device->vk_device, device->vk_pipeline_cache,
@@ -4091,6 +4102,14 @@ VkPipeline d3d12_pipeline_state_get_or_create_pipeline(struct d3d12_pipeline_sta
     vkd3d_shader_cache_put(device->persistent_cache, &persistent_key, sizeof(persistent_key),
             &cache_entry, sizeof(cache_entry));

+    if (feedback.flags & VK_PIPELINE_CREATION_FEEDBACK_VALID_BIT)
+    {
+        if (feedback.flags & VK_PIPELINE_CREATION_FEEDBACK_APPLICATION_PIPELINE_CACHE_HIT_BIT)
+            TRACE("Pipeline was found in the Vulkan pipeline cache.\n");
+        else
+            TRACE("Pipeline was not found in the Vulkan pipeline cache.\n");
+    }
+
     if (d3d12_pipeline_state_put_pipeline_to_cache(state, &pipeline_key, vk_pipeline, pipeline_desc.renderPass))
         return vk_pipeline;

diff --git a/libs/vkd3d/vkd3d_private.h b/libs/vkd3d/vkd3d_private.h
index 31f989947..581bc893b 100644
--- a/libs/vkd3d/vkd3d_private.h
+++ b/libs/vkd3d/vkd3d_private.h
@@ -248,6 +248,7 @@ struct vkd3d_vulkan_info
     bool EXT_depth_clip_enable;
     bool EXT_descriptor_indexing;
     bool EXT_mutable_descriptor_type;
+    bool EXT_pipeline_creation_feedback;
     bool EXT_robustness2;
     bool EXT_shader_demote_to_helper_invocation;
     bool EXT_shader_stencil_export;

-- GitLab https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541

Stefan Dösinger

9:51 a.m.

New subject: [PATCH v2 19/22] vkd3d: Add some cache efficiency debug code.

From: Stefan Dösinger stefan@codeweavers.com

---
 libs/vkd3d/device.c        |  7 +++++++
 libs/vkd3d/state.c         | 14 +++++++++++++-
 libs/vkd3d/vkd3d_private.h |  2 ++
 3 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/libs/vkd3d/device.c b/libs/vkd3d/device.c
index 61adfa1f9..1f5be6e65 100644
--- a/libs/vkd3d/device.c
+++ b/libs/vkd3d/device.c
@@ -4653,6 +4653,8 @@ static HRESULT d3d12_device_init(struct d3d12_device *device,
     if (FAILED(hr = vkd3d_create_vk_device(device, create_info)))
         goto out_free_instance;

+    device->cache_hit = device->cache_miss = device->cache_ready = 0;
+
     /* FIXME: Does this use of getcwd work on Unix too? */
     cwd = getcwd(NULL, 0);
     cache_name = vkd3d_malloc(strlen(cwd) + strlen(instance->application_name) + 8);
@@ -4719,6 +4721,11 @@ static HRESULT d3d12_device_init(struct d3d12_device *device,
     vkd3d_shader_cache_enumerate(device->persistent_cache, d3d12_device_load_cache, device);
     vkd3d_shader_cache_enumerate(device->persistent_cache, d3d12_device_load_cache2, device);

+    TRACE("Creation time: %u cache hits, %u miss, %02f%% ratio\n", device->cache_hit, device->cache_miss,
+            ((float)device->cache_hit) / (device->cache_hit + device->cache_miss) * 100);
+    device->cache_hit = device->cache_miss = 0;
+    device->cache_ready = true;
+
     if ((device->parent = create_info->parent))
         IUnknown_AddRef(device->parent);

diff --git a/libs/vkd3d/state.c b/libs/vkd3d/state.c
index b1a625687..1804b5b0c 100644
--- a/libs/vkd3d/state.c
+++ b/libs/vkd3d/state.c
@@ -4105,9 +4105,21 @@ VkPipeline d3d12_pipeline_state_get_or_create_pipeline(struct d3d12_pipeline_sta
     if (feedback.flags & VK_PIPELINE_CREATION_FEEDBACK_VALID_BIT)
     {
         if (feedback.flags & VK_PIPELINE_CREATION_FEEDBACK_APPLICATION_PIPELINE_CACHE_HIT_BIT)
+        {
             TRACE("Pipeline was found in the Vulkan pipeline cache.\n");
-        else
+            device->cache_hit++;
+        }
+        else
+        {
             TRACE("Pipeline was not found in the Vulkan pipeline cache.\n");
+            device->cache_miss++;
+        }
+
+        if (device->cache_ready)
+        {
+            TRACE("runtime: %u cache hits, %u miss, %02f%% ratio\n", device->cache_hit, device->cache_miss,
+                    ((float)device->cache_hit) / (device->cache_hit + device->cache_miss) * 100);
+        }
     }

     if (d3d12_pipeline_state_put_pipeline_to_cache(state, &pipeline_key, vk_pipeline, pipeline_desc.renderPass))
diff --git a/libs/vkd3d/vkd3d_private.h b/libs/vkd3d/vkd3d_private.h
index 581bc893b..3c28166e9 100644
--- a/libs/vkd3d/vkd3d_private.h
+++ b/libs/vkd3d/vkd3d_private.h
@@ -1896,6 +1896,8 @@ struct d3d12_device
     struct vkd3d_shader_cache *render_pass_cache;
     struct vkd3d_shader_cache *root_signature_cache;
     VkPipelineCache vk_pipeline_cache;
+    uint32_t cache_hit, cache_miss;
+    bool cache_ready;

     VkPhysicalDeviceMemoryProperties memory_properties;

-- GitLab https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541
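
A note on the ratio printed by the debug code above: if no pipeline has been counted yet, `cache_hit + cache_miss` is zero and the float division yields NaN. Also, `%02f` sets a minimum field width of 2 with the default six decimals, whereas `%.2f` would print two decimals. A minimal sketch of a guarded helper follows; `example_cache_hit_percentage` is a hypothetical name and not part of the patch.

```c
#include <stdint.h>

/* Hypothetical helper, not part of the patch: returns the hit percentage,
 * or 0 when no pipeline creation has been counted yet. */
static float example_cache_hit_percentage(uint32_t hits, uint32_t misses)
{
    /* Sum in 64 bits so that two large counters cannot wrap around. */
    uint64_t total = (uint64_t)hits + misses;

    if (!total)
        return 0.0f;
    return (float)hits / (float)total * 100.0f;
}
```

A caller could then log `TRACE("... %.2f%% ratio\n", example_cache_hit_percentage(device->cache_hit, device->cache_miss));` without special-casing the empty state.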

Stefan Dösinger

9:51 a.m.

New subject: [PATCH v2 20/22] DEBUG: Make cache profiling more visible

From: Stefan Dösinger stefan@codeweavers.com

And disable the VK pipeline cache as it may hide bugs in our cache
---
 libs/vkd3d/device.c | 6 +++---
 libs/vkd3d/state.c  | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/libs/vkd3d/device.c b/libs/vkd3d/device.c
index 1f5be6e65..b0e6deb1d 100644
--- a/libs/vkd3d/device.c
+++ b/libs/vkd3d/device.c
@@ -2124,8 +2124,8 @@ static HRESULT d3d12_device_init_pipeline_cache(struct d3d12_device *device)
     cache_info.sType = VK_STRUCTURE_TYPE_PIPELINE_CACHE_CREATE_INFO;
     cache_info.pNext = NULL;
     cache_info.flags = 0;
-    cache_info.initialDataSize = cache_size;
-    cache_info.pInitialData = cache_data->blob;
+    cache_info.initialDataSize = 0;
+    cache_info.pInitialData = NULL;
     if ((vr = VK_CALL(vkCreatePipelineCache(device->vk_device, &cache_info, NULL,
             &device->vk_pipeline_cache))) < 0)
     {
@@ -4721,7 +4721,7 @@ static HRESULT d3d12_device_init(struct d3d12_device *device,
     vkd3d_shader_cache_enumerate(device->persistent_cache, d3d12_device_load_cache, device);
     vkd3d_shader_cache_enumerate(device->persistent_cache, d3d12_device_load_cache2, device);

-    TRACE("Creation time: %u cache hits, %u miss, %02f%% ratio\n", device->cache_hit, device->cache_miss,
+    ERR("Creation time: %u cache hits, %u miss, %02f%% ratio\n", device->cache_hit, device->cache_miss,
             ((float)device->cache_hit) / (device->cache_hit + device->cache_miss) * 100);
     device->cache_hit = device->cache_miss = 0;
     device->cache_ready = true;
diff --git a/libs/vkd3d/state.c b/libs/vkd3d/state.c
index 1804b5b0c..c59a97aa1 100644
--- a/libs/vkd3d/state.c
+++ b/libs/vkd3d/state.c
@@ -4117,7 +4117,7 @@ VkPipeline d3d12_pipeline_state_get_or_create_pipeline(struct d3d12_pipeline_sta

         if (device->cache_ready)
         {
-            TRACE("runtime: %u cache hits, %u miss, %02f%% ratio\n", device->cache_hit, device->cache_miss,
+            ERR("runtime: %u cache hits, %u miss, %02f%% ratio\n", device->cache_hit, device->cache_miss,
                     ((float)device->cache_hit) / (device->cache_hit + device->cache_miss) * 100);
         }
     }

-- GitLab https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541

Stefan Dösinger

9:51 a.m.

New subject: [PATCH v2 21/22] vkd3d: Cache and preload compute pipelines.

From: Stefan Dösinger stefan@codeweavers.com

---
 libs/vkd3d/device.c        | 42 +++++++++++++++++++++++++++++---------
 libs/vkd3d/state.c         | 16 ++++++++++++++-
 libs/vkd3d/vkd3d_private.h |  2 ++
 3 files changed, 49 insertions(+), 11 deletions(-)

diff --git a/libs/vkd3d/device.c b/libs/vkd3d/device.c
index b0e6deb1d..38d13f5e6 100644
--- a/libs/vkd3d/device.c
+++ b/libs/vkd3d/device.c
@@ -4469,7 +4469,6 @@ static bool d3d12_device_load_cache2(struct vkd3d_shader_cache *cache,
             /* Handled already */
             break;

-        case SHADER_CACHE_ENTRY_COMPUTE_STATE:
             /* TODO */
             break;

@@ -4495,12 +4494,21 @@ static bool d3d12_device_load_cache2(struct vkd3d_shader_cache *cache,
             if (ret)
                 ERR("whut?\n");

+            /* Fall through */
+        case SHADER_CACHE_ENTRY_COMPUTE_STATE:
+            if (e->type == SHADER_CACHE_ENTRY_COMPUTE_STATE)
+            {
+                s = (void *)value;
+                k = NULL; /* gcc thinks it may be used uninitialized. */
+            }
+
             ret = vkd3d_shader_cache_get(cache, &s->root_signature,
                     sizeof(s->root_signature), NULL, &size);
             if (ret)
             {
                 FIXME("Did not find root signature %lx for graphics pipeline\n",
                         s->root_signature);
-                vkd3d_free(s);
+                if (s != key)
+                    vkd3d_free(s);
                 break;
             }
             rs = vkd3d_malloc(size);
@@ -4523,6 +4531,9 @@ static bool d3d12_device_load_cache2(struct vkd3d_shader_cache *cache,
             memset(&desc, 0, sizeof(desc));
             desc.root_signature = &d3d12_root_sig->ID3D12RootSignature_iface;

+            desc.cs.BytecodeLength = s->cs_size;
+            desc.cs.pShaderBytecode = s->cs_size ? s->data + pos : NULL;
+            pos += s->cs_size;
             desc.vs.BytecodeLength = s->vs_size;
             desc.vs.pShaderBytecode = s->vs_size ? s->data + pos : NULL;
             pos += s->vs_size;
@@ -4602,19 +4613,30 @@ static bool d3d12_device_load_cache2(struct vkd3d_shader_cache *cache,
              * FIXME: The manipulation of the device refcount in init() and Release() makes it
              * unsafe to move this function to a separate thread. We might hold and release the
              * last reference to the device. */
-            hr = d3d12_pipeline_state_init_graphics(object, device, &desc);
-            if (SUCCEEDED(hr))
+            if (e->type == SHADER_CACHE_ENTRY_GRAPHICS_PIPELINE)
+            {
+                hr = d3d12_pipeline_state_init_graphics(object, device, &desc);
+                if (SUCCEEDED(hr))
+                {
+                    VkRenderPass pass;
+                    VkPipeline p = d3d12_pipeline_state_get_or_create_pipeline(object,
+                            k->topology, k->strides, k->dsv_format, &pass);
+                    TRACE("got render pass %lx\n", p);
+                    ID3D12PipelineState_Release(&object->ID3D12PipelineState_iface);
+                }
+                vkd3d_free(s);
+            }
+            else if (e->type == SHADER_CACHE_ENTRY_COMPUTE_STATE)
             {
-                VkRenderPass pass;
-                VkPipeline p = d3d12_pipeline_state_get_or_create_pipeline(object,
-                        k->topology, k->strides, k->dsv_format, &pass);
-                TRACE("got render pass %lx\n", p);
-                ID3D12PipelineState_Release(&object->ID3D12PipelineState_iface);
+                hr = d3d12_pipeline_state_init_compute(object, device, &desc);
+                if (SUCCEEDED(hr))
+                    ID3D12PipelineState_Release(&object->ID3D12PipelineState_iface);
+                else
+                    ERR("Cached compute pipeline did not build.\n");
             }

             vkd3d_free(so_decl);
             vkd3d_free(il_element);
-            vkd3d_free(s);
             break;
         }

diff --git a/libs/vkd3d/state.c b/libs/vkd3d/state.c
index c59a97aa1..508b6d659 100644
--- a/libs/vkd3d/state.c
+++ b/libs/vkd3d/state.c
@@ -2603,15 +2603,17 @@ static struct vkd3d_shader_cache_pipeline_state *vkd3d_cache_pipeline_from_d3d(
     return entry;
 }

-static HRESULT d3d12_pipeline_state_init_compute(struct d3d12_pipeline_state *state,
+HRESULT d3d12_pipeline_state_init_compute(struct d3d12_pipeline_state *state,
         struct d3d12_device *device, const struct d3d12_pipeline_state_desc *desc)
 {
     const struct vkd3d_vk_device_procs *vk_procs = &device->vk_procs;
     struct vkd3d_shader_interface_info shader_interface;
     struct vkd3d_shader_descriptor_offset_info offset_info;
+    struct vkd3d_shader_cache_pipeline_state *cache_entry;
     const struct d3d12_root_signature *root_signature;
     struct vkd3d_shader_spirv_target_info target_info;
     VkPipelineLayout vk_pipeline_layout;
+    uint32_t cache_entry_size;
     HRESULT hr;

     state->ID3D12PipelineState_iface.lpVtbl = &d3d12_pipeline_state_vtbl;
@@ -2682,6 +2684,18 @@ static HRESULT d3d12_pipeline_state_init_compute(struct d3d12_pipeline_state *st
         return hr;
     }

+    cache_entry = vkd3d_cache_pipeline_from_d3d(desc, root_signature, &cache_entry_size);
+    if (cache_entry)
+    {
+        uint64_t hash;
+        cache_entry->super.type = SHADER_CACHE_ENTRY_COMPUTE_STATE;
+        hash = hash_key(cache_entry, cache_entry_size);
+        vkd3d_shader_cache_put(device->persistent_cache, &hash, sizeof(hash),
+                cache_entry, cache_entry_size);
+        vkd3d_free(cache_entry);
+        state->state_hash = hash;
+    }
+
     state->vk_bind_point = VK_PIPELINE_BIND_POINT_COMPUTE;
     d3d12_device_add_ref(state->device = device);

diff --git a/libs/vkd3d/vkd3d_private.h b/libs/vkd3d/vkd3d_private.h
index 3c28166e9..3535d4389 100644
--- a/libs/vkd3d/vkd3d_private.h
+++ b/libs/vkd3d/vkd3d_private.h
@@ -1463,6 +1463,8 @@ HRESULT d3d12_pipeline_state_create_compute(struct d3d12_device *device,
         const D3D12_COMPUTE_PIPELINE_STATE_DESC *desc, struct d3d12_pipeline_state **state);
 HRESULT d3d12_pipeline_state_create_graphics(struct d3d12_device *device,
         const D3D12_GRAPHICS_PIPELINE_STATE_DESC *desc, struct d3d12_pipeline_state **state);
+HRESULT d3d12_pipeline_state_init_compute(struct d3d12_pipeline_state *state,
+        struct d3d12_device *device, const struct d3d12_pipeline_state_desc *desc);
 HRESULT d3d12_pipeline_state_init_graphics(struct d3d12_pipeline_state *state,
         struct d3d12_device *device, const struct d3d12_pipeline_state_desc *desc);
 HRESULT d3d12_pipeline_state_create(struct d3d12_device *device,

-- GitLab https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541

Stefan Dösinger

9:51 a.m.

New subject: [PATCH v2 22/22] vkd3d: Try to find a read-only cache in C:\windows\scache

From: Stefan Dösinger stefan@codeweavers.com

This is intended for crossover, not necessarily upstream vkd3d

FIXME: This will cause the read-write cache to write all objects from the read-only cache.
---
 libs/vkd3d/device.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/libs/vkd3d/device.c b/libs/vkd3d/device.c
index 38d13f5e6..1cf56c865 100644
--- a/libs/vkd3d/device.c
+++ b/libs/vkd3d/device.c
@@ -4648,6 +4648,7 @@ static HRESULT d3d12_device_init(struct d3d12_device *device,
 {
     struct vkd3d_shader_cache_desc cache_desc = {0};
     const struct vkd3d_vk_device_procs *vk_procs;
+    struct vkd3d_shader_cache *base_cache;
     char *cache_name, *cwd;
     HRESULT hr;

@@ -4679,14 +4680,19 @@ static HRESULT d3d12_device_init(struct d3d12_device *device,

     /* FIXME: Does this use of getcwd work on Unix too? */
     cwd = getcwd(NULL, 0);
-    cache_name = vkd3d_malloc(strlen(cwd) + strlen(instance->application_name) + 8);
-    sprintf(cache_name, "%s/%s.cache", cwd, instance->application_name);
-    free(cwd); /* Use libc's free() because it is malloc'ed by getcwd. */
-
+    cache_name = vkd3d_malloc(max(20, strlen(cwd)) + strlen(instance->application_name) + 8);
+    sprintf(cache_name, "C:\windows\scache\%s.cache", instance->application_name);
     cache_desc.mem_size = 32 << 20;
     cache_desc.disk_size = ~0u;
     cache_desc.max_entries = ~0u;
     cache_desc.version = VKD3D_SHADER_CACHE_OBJ_VERSION;
+    cache_desc.flags = VKD3D_SHADER_CACHE_FLAGS_READ_ONLY;
+    vkd3d_shader_cache_open(cache_name, &cache_desc, &base_cache);
+
+    sprintf(cache_name, "%s/%s.cache", cwd, instance->application_name);
+    cache_desc.flags = 0;
+    free(cwd); /* Use libc's free() because it is malloc'ed by getcwd. */
+
     if (vkd3d_shader_cache_open(cache_name, &cache_desc, &device->persistent_cache))
     {
         FIXME("Failed to open shader cache %s\n", debugstr_a(cache_name));
@@ -4740,6 +4746,10 @@ static HRESULT d3d12_device_init(struct d3d12_device *device,

     device_init_descriptor_pool_sizes(device);

+    vkd3d_shader_cache_enumerate(base_cache, d3d12_device_load_cache, device);
+    vkd3d_shader_cache_enumerate(base_cache, d3d12_device_load_cache2, device);
+    vkd3d_shader_cache_close(base_cache);
+
     vkd3d_shader_cache_enumerate(device->persistent_cache, d3d12_device_load_cache, device);
     vkd3d_shader_cache_enumerate(device->persistent_cache, d3d12_device_load_cache2, device);

-- GitLab https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541
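
The path handling in this patch depends on a hand-computed buffer size (`max(20, strlen(cwd)) + ... + 8`), and a backslash in a C string literal has to be written as `\\`. As a rough sketch only, the cache file name could be built by letting `snprintf()` measure the string first. `example_build_cache_path` is a hypothetical helper and plain `malloc()` stands in for `vkd3d_malloc()`.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not from the patch: returns a newly allocated
 * "<dir><sep><app_name>.cache" string, or NULL on failure. The caller
 * frees the result with free(). */
static char *example_build_cache_path(const char *dir, char sep, const char *app_name)
{
    /* The first call only measures the result, so the size cannot drift
     * from the format string. */
    int len = snprintf(NULL, 0, "%s%c%s.cache", dir, sep, app_name);
    char *path;

    if (len < 0 || !(path = malloc((size_t)len + 1)))
        return NULL;
    snprintf(path, (size_t)len + 1, "%s%c%s.cache", dir, sep, app_name);
    return path;
}
```

For the read-only cache this would be called as `example_build_cache_path("C:\\windows\\scache", '\\', instance->application_name)`, and with `cwd` and `'/'` for the read-write one.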

Stefan Dösinger (＠stefan)

10 a.m.

On Sun Jan 14 09:51:05 2024 +0000, Stefan Dösinger wrote:

...

changed this line in [version 2 of the diff](/wine/vkd3d/-/merge_requests/541/diffs?diff_id=93307&start_sha=db740062da3a7024292a76b9c0fcd26615023012#15cef1a3fac90dca5771e21603ae6cc81a5ac8a6_225_233)

Should be fixed in the update I just pushed

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541#note_57365

Giovanni Mascellani (＠giomasce)

16 Jan

11:35 a.m.

...

There's a return value for hash collisions.

Ah, I didn't know this was a possibility. Still, given that it doesn't seem complicated and you have to compare the key at least once anyway, I wonder why we don't compare the key in the RB comparison function too, so that you never have to declare a cache collision and there is less that can go wrong in the caller.
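
To illustrate the idea, a comparison function could order entries by hash first and fall back to the key bytes only when the hashes are equal. This is a sketch under assumptions: `struct example_cache_key` and `example_cache_key_compare` are hypothetical names, and the real vkd3d entry layout and rbtree callback signature may differ.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical key descriptor; not the structure used in the patches. */
struct example_cache_key
{
    uint64_t hash;
    size_t key_size;
    const void *key;
};

/* Orders by hash, then by key size, then by key bytes. Two entries compare
 * equal only if their keys are byte-identical, so a hash collision simply
 * results in two distinct tree nodes. */
static int example_cache_key_compare(const struct example_cache_key *a,
        const struct example_cache_key *b)
{
    if (a->hash != b->hash)
        return a->hash < b->hash ? -1 : 1;
    if (a->key_size != b->key_size)
        return a->key_size < b->key_size ? -1 : 1;
    if (!a->key_size)
        return 0;
    return memcmp(a->key, b->key, a->key_size);
}
```

Since the hash is compared first, the `memcmp()` only runs for entries whose hashes already match, which should keep the common lookup path cheap.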

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541#note_57500

Giovanni Mascellani (＠giomasce)

11:35 a.m.

Giovanni Mascellani (@giomasce) commented about libs/vkd3d/state.c:

...

 if (ret)
     ERR("papa!\n");
*root_signature = object;

d3d12_root_signature_AddRef(&object->ID3D12RootSignature_iface);
/* Why the hash as key and d3d root signature description as value? Because we store
* the root signature hash in pipelines and need a way to look up the root signature
* when we recreate the pipelines.
*
* Alternatively we could use bytecode as key here and store a hash -> bytecode lookup
* at runtime in device->root_signature_cache. I am unsure for now. */

Why not store the whole root signature bytecode in the pipeline cache, so you don't have to keep another map for resolving root signature hashes?

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541#note_57501

Giovanni Mascellani (＠giomasce)

11:35 a.m.

Giovanni Mascellani (@giomasce) commented about libs/vkd3d/vkd3d_private.h:

...

#define VKD3D_SHADER_CACHE_OBJ_VERSION 1ull
#define VKD3D_SHADER_CACHE_VKD3D_VERSION 1u

+struct vkd3d_render_pass_key
+{
     unsigned int attachment_count;
     bool depth_enable;
     bool stencil_enable;
     bool depth_stencil_write;
     bool padding;
     unsigned int sample_count;
     VkFormat vk_formats[D3D12_SIMULTANEOUS_RENDER_TARGET_COUNT + 1];
+};

Shouldn't we either mark structures used as keys as packed, or at least statically assert that their size is what we expect? I don't think it's happening here, but were we to introduce some padding (here or elsewhere), hashing might become unreliable.
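
A minimal sketch of what I mean, using a stand-in structure with fixed-width members rather than the real `vkd3d_render_pass_key`. The names are hypothetical, the FNV-1a hash is only a placeholder for whatever `hash_key()` does, and `_Static_assert` assumes a C11 compiler.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EXAMPLE_RT_COUNT 8 /* stand-in for D3D12_SIMULTANEOUS_RENDER_TARGET_COUNT */

/* Hypothetical key: every byte is covered by a named member. */
struct example_render_pass_key
{
    uint32_t attachment_count;
    uint8_t depth_enable;
    uint8_t stencil_enable;
    uint8_t depth_stencil_write;
    uint8_t padding;
    uint32_t sample_count;
    uint32_t formats[EXAMPLE_RT_COUNT + 1];
};

/* Fails to compile if the compiler inserts padding we did not account for. */
_Static_assert(sizeof(struct example_render_pass_key) == 12 + 4 * (EXAMPLE_RT_COUNT + 1),
        "unexpected padding in example_render_pass_key");

/* Zero the whole key first so the explicit padding member is deterministic. */
static void example_render_pass_key_init(struct example_render_pass_key *key)
{
    memset(key, 0, sizeof(*key));
}

/* Placeholder byte-wise hash (64-bit FNV-1a), standing in for hash_key(). */
static uint64_t example_hash_bytes(const void *data, size_t size)
{
    const unsigned char *p = data;
    uint64_t hash = 0xcbf29ce484222325ull;
    size_t i;

    for (i = 0; i < size; ++i)
    {
        hash ^= p[i];
        hash *= 0x100000001b3ull;
    }
    return hash;
}
```

With this, two keys filled with the same values hash identically no matter what was on the stack before, and any layout change that adds padding is caught at build time.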

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541#note_57502

Matteo Bruni (＠Mystral)

17 Jan

10:31 a.m.

On Thu Jan 11 15:23:12 2024 +0000, Stefan Dösinger wrote:

...

As far as ID3D12ShaderCacheSession is concerned, both key and value sizes count towards the size (See https://gitlab.winehq.org/stefan/vkd3d/-/tree/cache-rework for tests which aren't included in this MR). I haven't tested the impact of compression and haven't checked if native compresses the storage at all. I think a flag for specifying a memory-only cache is a good idea. I (ab)used the disk_size = 0 before adding the flags field to the cache desc structure.

I'm sure it's just a matter of personal preference but, in my case, I'd rather not add a pointless parameter when disk_size == 0 means what you'd expect it to mean.

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541#note_57695

Stefan Dösinger (＠stefan)

23 Jan

9:16 p.m.

On Thu Jan 11 17:18:41 2024 +0000, Henri Verbeet wrote:

...

...
It would be platform specific, so we'd need some codepath for Unix too.

Right, the idea in such a case would be to use one API on Windows and something else on Linux, although that probably comes with its own set of pitfalls. In any case, having SQLite shipped with Windows certainly makes it a lot more feasible as a potential option.

This is mostly a note for myself for future reference: One quality aspect of the serialization backend is how fast the cache file is loaded into memory. I fed a somewhat populated cache from Diablo 2 Resurrected to tests/d3d12.cross64.exe, which creates and destroys devices a lot. My simple reader did rather poorly: it increased the test runtime from ~25 to ~55 seconds with the code that actually creates the pipelines disabled.

I didn't implement nor test any alternative backends yet :-)

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541#note_58692

Stefan Dösinger (＠stefan)

26 Feb

7:48 a.m.

On Tue Jan 23 21:16:01 2024 +0000, Stefan Dösinger wrote:

...

This is mostly a note for myself for future reference: One quality aspect of the serialization backend is how fast the cache file is loaded into memory. I fed a somewhat populated cache from Diablo 2 Resurrected to tests/d3d12.cross64.exe, which creates and destroys devices a lot. My simple reader did rather poorly: it increased the test runtime from ~25 to ~55 seconds with the code that actually creates the pipelines disabled. I didn't implement nor test any alternative backends yet :-)

I am adding some thoughts on sqlite here, for lack of a better place to record them:

The sqlite feature set is fairly nice for what we need. The sqlite project advertises its use as a data storage format even if the SQL functionality isn't needed. Afaics it should support partial loading from disk into memory, although I haven't verified this in practice yet.

sqlite's build system is a bit peculiar: In a first step it uses TCL and other scripts to merge all source files into one "amalgamation" sqlite3.c file. In all likelihood we'd check that sqlite3.c file into Wine's git repo, either in libs/ or in dlls/winsqlite3/ itself. One complication is that the parameters passed to the TCL generators are different for Windows and Linux targets. We'd probably want to submit a patch to upstream sqlite that adds the ability to pass --useapicall from the Unix Makefile.

Judging by Microsoft's sqlite3.h header they use sqlite version 3.29.0. They made some adjustments to the header, mostly adding NTDDI_VERSION #ifs. I don't think we care about those differences.

If we use sqlite3 for the shader cache I think we should go for it hook, line and sinker and use the SQL query functionality to search for cache keys. I'd expect it to be more performant than homebrew code.

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/541#note_62611
