On Wed, 11 Aug 2021 at 11:50, Jan Sikorski jsikorski@codeweavers.com wrote:
On 10 Aug 2021, at 20:50, Henri Verbeet hverbeet@gmail.com wrote: On Tue, 10 Aug 2021 at 10:35, Jan Sikorski jsikorski@codeweavers.com wrote:
dlls/wined3d/shader_spirv.c | 112 +++++++++++++++++++++++++++++++++ dlls/wined3d/wined3d_private.h | 2 + 2 files changed, 114 insertions(+)
And this too is never used anywhere until patch 5/5 in this series.
Yes, do I squash it all together then?
2, 4, and 5 would probably need to be together, yes. In the case of wined3d_view_load_location(), you can still do that as a separate patch if you use it in wined3d_rendertarget_view_load_location() in the same patch.
- .shader_run_compute = wined3d_spirv_run_compute,
};
"shader_run_compute" remains uninitialised for the other shader backends.
It should be initialised to NULL, which seemed appropriate, but if we don’t want this I’ll make it spit an ERR.
So far we've gone with printing a FIXME/ERR for unimplemented wined3d_shader_backend_ops operations, yes. Perhaps we could revisit that, but even then we'd probably want to put an explicit NULL in e.g. "glsl_shader_backend".
Conceptually, it doesn't seem quite proper for the shader backends to do compute dispatch by themselves; ideally these would only translate shaders, although in practice they're also responsible for setting up some related state, for OpenGL in particular.
I'm not sure whether you perhaps already considered and rejected this, but would it be very hard to use the existing .shader_select_compute() operation from wined3d_unordered_access_view_vk_clear()? Or perhaps simply adapter_vk_dispatch_compute()?
My understanding is that I can’t touch wined3d_state, so in order to unite with existing functionality I’d have to decouple it from the state. Maybe we want this anyway? It looks straightforward to do for shader_select_compute. adapter_vk_dispatch_compute() mostly applies state so I don’t see how it could be of use here.
I think we have a number of options, each with their advantages and disadvantages:
- We can do this largely adapter agnostic in wined3d_device_context_clear_uav_float()/wined3d_device_context_clear_uav_uint() by using wined3d_state_create(), wined3d_device_context_set_state(), wined3d_device_context_set_shader(), wined3d_device_context_set_constant_buffers(), wined3d_device_context_set_unordered_access_views(), and wined3d_device_context_dispatch(). The nice thing about that approach is that it would work just as well for the OpenGL backend, even without ARB_clear_texture and ARB_clear_buffer, or e.g. for a hypothetical Metal backend. The main disadvantages would be that wined3d_device_context_set_state() is a fairly heavy operation, and we wouldn't be able to use the same approach for e.g. depth/stencil readback.
- Instead of using wined3d_state_create() and wined3d_device_context_set_state(), we could modify the existing device state before dispatch, and restore it afterwards. For graphics shaders that kind of thing can be a bit painful, but we wouldn't have to touch that much state for compute shaders. That still doesn't help for depth/stencil readback though.
- Instead of doing this on the application side of the command stream, we could do it from wined3d_cs_exec_clear_unordered_access_view() or wined3d_unordered_access_view_vk_clear(). We'd now have to manually set and invalidate the relevant state, but perhaps that's still ok. The basic principle should generalise to other compute shaders, and we're still not using anything adapter-specific, so we could e.g. call this from wined3d_unordered_access_view_gl_clear() as a fallback. (Or simply the main implementation; glClearTexSubImage() is convenient, but I'm not sure whether it's any more efficient than using compute shaders on any GL implementation.)
- There's a chance that implementing this in a Vulkan-specific way would be more efficient. E.g. there may be an advantage to using push constants instead of uniform buffers, and we'd avoid some state processing. It would be hard to quantify that without doing some benchmarking first though. If we take this approach, we essentially want a small wrapper around vkd3d_shader_compile() from the shader backend; we give it a D3D shader, some interface/binding information, and get back a SPIR-V shader. Decoupling shader_select_compute() from the state largely gets us there, although it introduces some asymmetry between shader_select() and shader_select_compute() (and decoupling shader_select() from the state won't be quite as easy). Arguably there's not much of a point in passing the shader backend a wined3d_shader in this case either; we may as well just pass it the byte code. This approach is likely the most flexible and the most efficient, but also likely the most complicated. By its nature, we can't use the same implementation for different adapter types.