On 10/10/22 13:17, Francisco Casas wrote:
On 10-10-22 02:09, Zebediah Figura wrote:
I haven't done a detailed review of these patches, but looking over them at a high level, the general approach seems right for all of these, so I think it would make sense to start submitting them (a few at a time).
Sure!
I am thinking on prepending some patches with field-level comments for the data structures too. To put code where my mouth is (regarding my suggestion of field-level comments), and so that the fields usage is clear as the patch series progresses.
Sounds great :-)
I do have a few high-level comments that might be good to address sooner than later, though.
From "vkd3d-shader/hlsl: Span variables across multiple register spaces.":
- "The 'reg' field is still used for numeric register spaces. The
HLSL_SPACE_DEFAULT value can be used to indicate this." — My inclination is that we don't want to do this. We should just use one array, probably an array of hlsl_reg structures (even though writemask is meaningless for anything non-numeric). Same for reg_size vs reg_space_size.
I agree, I will try to go in that direction.
The only thing to keep in mind is that it implies a very intrusive patch (or couple of patches) that change all uses of the 'reg' and 'reg_size' fields, so that should be considered when rebasing.
Yeah, but in a sense that's a good thing. We're changing the ways those fields are used, so we do want to make sure we consider all of them.
- HLSL_SPACE_* naming could be improved, I think. It's written to match
the bytecode, but I'd advocate for SAMPLER and TEXTURE instead of S and T, and probably NUMERIC instead of DEFAULT. [Although... hmm. While writing this I realized that this does make it a bit easier to mix up HLSL_TYPE_TEXTURE and HLSL_SPACE_TEXTURE. Still...]
When I was first implementing this patch I indeed went for HLSL_SPACE_SAMPLER and HLSL_SPACE_TEXTURE, but because textures in SM1 use HLSL_SPACE_SAMPLER too, I thought that may lead to confusion. Also, these enum values get used in many places and I found the abbreviated names to be far less verbose, and harder to confuse with the HLSL_TYPE_* as you mentioned. But if you still want, I will go back to use those names.
I decided to name this "register space" as "DEFAULT" because I am also using it as a return value for hlsl_type_get_reg_space() for struct types, that may or may not contain objects within their fields. Maybe I should separate HLSL_SPACE_DEFAULT into HLSL_SPACE_NUMERIC, that encompasses the numeric register spaces according to the situation (constant, temps, or semantics) and HLSL_SPACE_MULTI for struct types; but I am not yet sure about it.
Are we ever going to call hlsl_type_get_reg_space() on anything larger than a single component?
- Also, "reg_space" is... a bit unfortunate, since "(register) space"
also means something else (specifically, sm 5.1 register space). Unfortunately what we're dealing with is namespaces on namespaces. Some other names that occur to me are "set" (although that one means something else for Vulkan), "namespace", "reg_type". "Set" is probably my favourite, especially since that matches the d3dx9 terminology. I'm open to bikeshedding this one though.
I like register "set". Another alternative may be register "group", if we don't want them to collide with other definitions.
- We probably want to move away from storing reg_size (and hence also
reg_space_size) in the variable. I don't mind deferring that one to later, though.
I think you mean "in the type" instead of "in the variable" (?). Hmm, I don't see the problem on keeping that information in the hlsl_type, since it is constant.
Still, yeah, if we want to switch to computing them on the fly as needed, we better do that later.
Yeah, I misspoke, I meant in the type.
From "vkd3d-shader/hlsl: Skip object components when creating input/output copies.":
- Why this patch? Is this broken now?
Yes, currently a shader that contains an input variable with an object field either reaches unreachable code or throws a failed assertion.
For instance:
struct apple { Texture2D tex; float4 a : sem2; };
float4 main(struct apple var) : sv_target { return var.tex.Load(float3(1, 2, 3)); }
I think it also resulted on some segfaults with similar shaders. The reason being that prepend_input_var_copy() and append_var_output_copy() are not prepared to handle object fields.
Still, this made me realize that I should do
if (var->data_type->reg_space_size[k] > 0)
instead of
if (var->reg_space[k].count > 0)
!
Because "var->reg_space[k].count" is not set yet.
But probably, that is redundant with the checks added on that patch to prepend_input_copy() and append_output_copy() in the same patch. I have to try to remember why I didn't just keep those.
Oh, that's awkward. So I guess what we really need is for prepend_input_{var,struct}_copy() to call prepend_uniform_copy() for uniforms. That should be relatively painless, though.
From "vkd3d-shader/hlsl: Track object components usage and allocate registers accordingly.":
- This one looks fundamentally sensible. It may be an open question
whether we want to merge the first_write/last_read fields into object_usage (and make the whole thing completely per-component rather than just per-object-component) but probably makes more sense just to leave it alone for the moment.
hmm... I may be wrong, but...
I think that... with the combined effect of:
- splitting struct copies.
- splitting array copies.
- splitting matrix copies.
- component-wise copy propagation.
all used temp variables should be a single register now (?).
Currently, we split copies, but don't split the variables themselves; i.e. they'll always take a contiguous allocation. That could change, of course (and doing so might help us pack registers more efficiently in some cases), but we don't and can't split uniforms.
And, on the other hand, the non-temps lifetime is the same as the whole program.
So, having a first_write, last_read for every component won't be very useful considering these things.
Still, yes, probably better to analyze this in the future.
Right, the first_write part wouldn't exactly be used for object types (and uniforms in general), but on the other hand, currently we're using both "last_read" and "objects_usage[...][...].used" to track whether a variable is read. So it's not exactly ideal, but it may take more prototyping to determine what is.
I also like having an object_usage array for each "register space" so that the offset can be used immediately to get the information about that object, instead of having to go through hlsl_reg_space_range_from_deref(), which we probably want to do after path indexes get into the sm1 and sm4 functions but we can't do easily now.
Hmm, I'm less sure about this one; it's basically redundant. I'll have to take a second look when that patch gets submitted...
- hlsl_reg_space_range_from_deref() is odd here, because, why the count
parameter? As far as I can tell it's not used in this patch.
Well, yes, so far it is only used for doing assert(count == 1); after calling hlsl_reg_space_range_from_deref(). Which I think is a good assertion to have, because we are checking that resource arrays cannot be passed when single resources are expected.
Also, I added it for consistency with hlsl_component_index_range_from_deref(), and I thought it could be useful in the future, if we want to handle SM 5.1 resource arrays and their non-constant indexes.
But yes, if you consider it is dead code I can remove it.
I think the assertion makes sense, but it should probably live inside of hlsl_reg_space_range_from_deref() in that case (and then the function probably shouldn't have "range" in the name either).
FWIW, a SM 5.1 resource array should I think always be considered a single component from the HLSL perspective. That matches the way they're initialized. We'll need a separate field in struct hlsl_ir_resource_load for the index.
- request_object_registers_for_allocation is one of those things that we
can probably do as part of track_object_components_usage(). I.e. count = max(count, index).
Yes, but I have the feeling that it is better to have this as a separate step, so that all uses of "reg_space" are in the same place, when doing register allocation, after compute_liveness(). Just to keep things separated.
Hmm, maybe. I'll probably need to take a second look at this one as well.
The last few patches related to sorting—can these be put into sm{1,4}_sort_externs() instead of being separate functions?
Just to clarify, there are two different orderings for the externs we have to care about:
A) The register indexes they have assigned. B) The order in which they appear in the CTAB/RDEF tables.
I am afraid it is necessary to call these sorting functions before doing the allocate_objects() calls, in order to ensure that A is done imitating the native compiler.
The sm{1,4}_sort_externs() then should ensure that B is done imitating the native compiler.
Ah, right, I failed to notice there are two different orders we care about. So the patches make sense as they are. And since I think we need to match declaration order (right?), that means we don't want to use the same code structure as smX_sort_externs().
I haven't yet took much time one the latter, as my understanding is that imitating A is more important for applications (if some of them make assumptions about on which register to place the resources).
If necessary I can work later on some patches that modify sm{1,4}_sort_externs() to try to achieve B.
I think we should be close already, so I'm not too concerned. If you know of some way we're off, that'd be appreciated to fix, or at least make a note of, but it doesn't need to block anything. (It's also a relatively weird thing for an application to care about...)