On 6/16/22 11:48, Francisco Casas wrote:
Hello, thanks for commenting on the patch!
On 15-06-22 19:20, Zebediah Figura wrote:
From skimming the patch, here are some immediate thoughts:
- I have a patch which has been around for a while which replaces the
field linked list with an array. I think we wanted that anyway, but we especially want it given list_get().
This seems like a good idea, I will see if I can add it before this patch.
- All current (upstream) uses of hlsl_new_store() and hlsl_new_load()
with a nonzero offset fall into one of two categories:
- retrieving the element at a specific offset (initializer and matrix parsing)
- adding another offset to an existing hlsl_deref (splitting passes, initial parsing of field/element derefs)
I might have missed something, but assuming I've got that right, I think that the hlsl_new_load() and hlsl_new_store() should take arguments to reflect that. (I.e. split each into two functions, one taking an unsigned int and the other taking a const struct hlsl_deref *). In that case struct path_arg would be limited to hlsl.c, if it even needs to exist.
Good observation, I can see some problems though:
a) The version of the function that receives the component index can generate the path of constant nodes alright, but it also needs to insert these nodes in the corresponding instruction list.
The instruction list can be received in this case and the new constant nodes can be added at the head -- or -- we can allow the "steps" in the path to be either an hlsl_src or an unsigned int, and then add a pass that adds the constant nodes at the beginning of the program:
struct hlsl_step { struct hlsl_src *src; unsigned int c; /* used if src is NULL */ };
struct hlsl_deref { struct hlsl_ir_var *var;
unsigned int path_len; struct hlsl_step *path;
};
I don't like this idea too much, but it also covers for the field accesses that Giovanni mentioned.
b) In my patch, accessing matrix components requires paths of length 2 (which I think is the right thing to do) so the version of the functions that receive a hlsl_deref should also receive 2 optional nodes instead of 1. So for instance, hlsl_new_store would be:
struct hlsl_ir_store *hlsl_new_store(struct hlsl_ctx *ctx, struct hlsl_ir_var *var, const hlsl_deref *deref, struct hlsl_ir_node *node1, struct hlsl_ir_node *node2, struct hlsl_ir_node *rhs, unsigned int writemask, struct vkd3d_shader_location loc);
this adds a little clutter.
c) We also have to change other functions, for instance, add_load() in upstream adds an arbitrary register offset to an hlsl_deref offset. In the patch, this generality is achieved by concatenating paths. Given the current uses of add_load() we would also need to create two versions of this function, the "hlsl_deref + 2 optional nodes" one and the "component index" one, and solve this particular case of problem (a) too.
Sure, we'd have to split add_load() into two helpers too. I'm inclined to say that's worthwhile, though.
I am not sure that loosing the generality of using path_arg is a good thing, it can simplify some calls to loads and stores but that's the only benefit I see (unless I am missing something). I can switch to this approach though, if we agree in a way of solving (a).
The problem with path_arg is that it's another layer of abstraction that has to be understood. It'd be nice if we could avoid that, or at least avoid letting it leak out of hlsl.c.
One obvious way to solve (a) and (b) is to replace hlsl_new_load() with a hlsl_add_load() that explicitly takes an instruction list and adds to the end. I think we had even discussed doing something similar to this previously, to save a bit of effort in the common new + insert case. This should cover most cases, but I'm not sure it covers all cases—there are places in hlsl_codegen.c that we use list_add_before(), list_add_after(), list_add_head().
What we could do instead, to make sure all cases are covered, is something like
bool hlsl_new_xyzzy(struct hlsl_block *block, unsigned int c, ...) { struct hlsl_ir_constant *constant; struct hlsl_ir_xyzzy *xyzzy;
list_init(&block->instrs);
constant = hlsl_new_uint_constant(c); list_add_tail(&block->instrs, c);
xyzzy = hlsl_alloc(...); init_node(&xyzzy);
return true; /* or xyzzy for convenience? */ }
and then
{ struct hlsl_block block;
hlsl_new_xyzzy(&block, 123, ...); list_move_{head,tail,before,after}(..., &block->instrs); }
the idea basically being to return a list of instructions in a caller-allocated container.
I think there's been at least one other time I've wanted to do something like this, although I don't know what that was now.
Note that this step could also be done independently of anything else, which would help make the "big" patch more reviewable.
You mean something like adding a "Replace path_arg with specific load and store functions." patch afterwards?
I mean you can have one patch (or multiple patches) that change the hlsl_new_load() and hlsl_new_store() functions like I describe above, and then another patch (i.e. the one attached to your email, mostly) that changes the internal representation of hlsl_derefs from a register offset to a component chain.
- Similarly I think hlsl_new_resource_load() and friends should just
take a "const struct hlsl_deref *" or two.
Yes, this seems consistent if we do stores and loads this way.
I'm not immediately sure what to do with prepend_input_copy() and friends—maybe those should be converted to iterative passes instead of recursive ones.
I don't see any problem keeping them as recursive passes.
Well, if we follow my above suggestion regarding the interface for hlsl_new_load() and hlsl_new_store(), there's no way to pass a whole list of offsets at once. I think this is probably for the better, but it does mean we probably need to split up I/O copies one level at a time.
- Stuff like free_hlsl_deref() can be made into a separate patch.
Hmm, sorry, I am not sure I see a clear division here. What other things could be in this separate patch?
I just mean that you can invent a patch that adds
static void free_hlsl_deref(struct hlsl_deref *deref) { hlsl_src_remove(&deref->offset); }
and then in "this" patch you only need to change the body of free_hlsl_deref(), i.e. one hunk instead of several.