If an hlsl_ir_load loads a variable whose components are stored by different instructions, copy propagation doesn't replace it.
But if all of these instructions are constants (which is currently the case for value constructors), the load can be replaced with a constant value, which is what the first patch of this series does.
For instance, this shader:
```
sampler s;
Texture2D t;

float4 main() : sv_target
{
    return t.Gather(s, float2(0.6, 0.6), int2(0, 0));
}
```
results in the following IR before applying the patch:
```
   2:      float | 6.00000024e-01
   3:      float | 6.00000024e-01
   4:       uint | 0
   5:            | = (<constructor-2>[@4].x @2)
   6:       uint | 1
   7:            | = (<constructor-2>[@6].x @3)
   8:     float2 | <constructor-2>
   9:        int | 0
  10:        int | 0
  11:       uint | 0
  12:            | = (<constructor-5>[@11].x @9)
  13:       uint | 1
  14:            | = (<constructor-5>[@13].x @10)
  15:       int2 | <constructor-5>
  16:     float4 | gather_red(resource = t, sampler = s, coords = @8, offset = @15)
  17:            | return
  18:            | = (<output-sv_target0> @16)
```
and this IR afterwards:
```
   2:     float2 | {6.00000024e-01 6.00000024e-01 }
   3:       int2 | {0 0 }
   4:     float4 | gather_red(resource = t, sampler = s, coords = @2, offset = @3)
   5:            | return
   6:            | = (<output-sv_target0> @4)
```
This is required to write texel_offsets as aoffimmi modifiers in the sm4 backend, since it expects the texel_offset arguments to be hlsl_ir_constant.
This series also:
* Allows Gather() methods to use aoffimmi modifiers, when possible, instead of an additional source register. Aoffimmi modifiers are the only form allowed for shader model 4.1.
* Adds support for texel_offsets in the Load() method via aoffimmi modifiers (the only form allowed for Load()).
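As a rough illustration of the constraint behind both points, an immediate offset is only usable when every component fits a signed 4-bit field, i.e. the range -8 to 7 quoted in the error messages of this series. The sketch below is a standalone C model with hypothetical helper names (`offset_fits_aoffimmi`, `to_4bit_field`); it is not the `encode_texel_offset_as_aoffimmi()` implementation and deliberately says nothing about where the fields sit in the instruction token.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of vkd3d: returns true when every offset
 * component lies in the range -8 to 7 and can therefore be stored as an
 * immediate. */
static bool offset_fits_aoffimmi(const int32_t *offset, unsigned int count)
{
    unsigned int i;

    /* Texel offsets have at most three components (u, v, w). */
    assert(count <= 3);

    for (i = 0; i < count; ++i)
    {
        if (offset[i] < -8 || offset[i] > 7)
            return false;
    }
    return true;
}

/* Hypothetical helper: keep the low four bits of a value that already passed
 * the range check, which yields its 4-bit two's complement representation. */
static uint32_t to_4bit_field(int32_t value)
{
    return (uint32_t)value & 0xfu;
}
```

With this model, the offsets `int2(0, 1)` and `int2(-2, 0)` from the Load() tests pass the check, while `int2(8, 1)` does not, which matches the test that is expected to fail compilation.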
-- v9:
vkd3d-shader/hlsl: Fold swizzle chains.
vkd3d-shader/hlsl: Apply copy propagation to swizzled loads.
vkd3d-shader/hlsl: Use aoffimmis when writing gather resource loads.
vkd3d-shader/hlsl: Replace loads with constants in copy prop.
vkd3d-shader/hlsl: Synthesize the swizzle and replace the instruction inside of copy_propagation_compute_replacement().
vkd3d-shader/hlsl: Call copy_propagation_get_value() directly in copy_propagation_transform_object_load().
vkd3d-shader/hlsl: Add some swizzle manipulation definitions.
tests: Test constant propagation through swizzles.
vkd3d-shader/hlsl: Support offset argument for the texture Load() method.
tests: Test offset argument for the texture Load() method.
From: Francisco Casas fcasas@codeweavers.com
---
 Makefile.am                           |  1 +
 tests/texture-load-offset.shader_test | 51 +++++++++++++++++++++++++++
 2 files changed, 52 insertions(+)
 create mode 100644 tests/texture-load-offset.shader_test
diff --git a/Makefile.am b/Makefile.am
index 464b43ae..e1d56725 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -140,6 +140,7 @@ vkd3d_shader_tests = \
 	tests/swizzle-6.shader_test \
 	tests/swizzle-7.shader_test \
 	tests/texture-load.shader_test \
+	tests/texture-load-offset.shader_test \
 	tests/texture-load-typed.shader_test \
 	tests/trigonometry.shader_test \
 	tests/uav.shader_test \
diff --git a/tests/texture-load-offset.shader_test b/tests/texture-load-offset.shader_test
new file mode 100644
index 00000000..ab233c58
--- /dev/null
+++ b/tests/texture-load-offset.shader_test
@@ -0,0 +1,51 @@
+[require]
+shader model >= 4.0
+
+[texture 0]
+size (3, 3)
+0 0 0 1 1 0 0 1 2 0 0 1
+0 1 0 1 1 1 0 1 2 1 0 1
+0 2 0 1 1 2 0 1 2 2 0 1
+
+
+[pixel shader]
+Texture2D t;
+
+float4 main(float4 pos : sv_position) : sv_target
+{
+    return t.Load(int3(pos.xy, 0), int2(0, 1));
+}
+
+
+[test]
+draw quad
+todo probe (0, 0) rgba (0, 1, 0, 1)
+todo probe (1, 0) rgba (1, 1, 0, 1)
+todo probe (0, 1) rgba (0, 2, 0, 1)
+todo probe (1, 1) rgba (1, 2, 0, 1)
+
+
+[pixel shader]
+Texture2D t;
+
+float4 main(float4 pos : sv_position) : sv_target
+{
+    return t.Load(int3(pos.xy, 0), int2(-2, 0));
+}
+
+
+[test]
+draw quad
+todo probe (3, 0) rgba (1, 0, 0, 1)
+todo probe (4, 0) rgba (2, 0, 0, 1)
+todo probe (3, 1) rgba (1, 1, 0, 1)
+todo probe (4, 1) rgba (2, 1, 0, 1)
+
+
+[pixel shader fail todo]
+Texture2D t;
+
+float4 main(float4 pos : sv_position) : sv_target
+{
+    return t.Load(int3(pos.xy, 0), int2(8, 1));
+}
From: Francisco Casas fcasas@codeweavers.com
---
 libs/vkd3d-shader/hlsl_sm4.c          | 16 ++++++++++++++--
 tests/texture-load-offset.shader_test | 10 +++++-----
 2 files changed, 19 insertions(+), 7 deletions(-)
diff --git a/libs/vkd3d-shader/hlsl_sm4.c b/libs/vkd3d-shader/hlsl_sm4.c
index d9d05e04..ae7017a8 100644
--- a/libs/vkd3d-shader/hlsl_sm4.c
+++ b/libs/vkd3d-shader/hlsl_sm4.c
@@ -1418,7 +1418,8 @@ static void write_sm4_constant(struct hlsl_ctx *ctx,

 static void write_sm4_ld(struct hlsl_ctx *ctx, struct vkd3d_bytecode_buffer *buffer,
         const struct hlsl_type *resource_type, const struct hlsl_ir_node *dst,
-        const struct hlsl_deref *resource, const struct hlsl_ir_node *coords)
+        const struct hlsl_deref *resource, const struct hlsl_ir_node *coords,
+        const struct hlsl_ir_node *texel_offset)
 {
     bool uav = (resource_type->base_type == HLSL_TYPE_UAV);
     struct sm4_instruction instr;
@@ -1427,6 +1428,16 @@ static void write_sm4_ld(struct hlsl_ctx *ctx, struct vkd3d_bytecode_buffer *buf
     memset(&instr, 0, sizeof(instr));
     instr.opcode = uav ? VKD3D_SM5_OP_LD_UAV_TYPED : VKD3D_SM4_OP_LD;

+    if (texel_offset)
+    {
+        if (!encode_texel_offset_as_aoffimmi(&instr, texel_offset))
+        {
+            hlsl_error(ctx, &texel_offset->loc, VKD3D_SHADER_ERROR_HLSL_INVALID_TEXEL_OFFSET,
+                    "Offset must resolve to integer literal in the range -8 to 7.");
+            return;
+        }
+    }
+
     sm4_dst_from_node(&instr.dsts[0], dst);
     instr.dst_count = 1;

@@ -2171,7 +2182,8 @@ static void write_sm4_resource_load(struct hlsl_ctx *ctx,
     switch (load->load_type)
     {
         case HLSL_RESOURCE_LOAD:
-            write_sm4_ld(ctx, buffer, resource_type, &load->node, &load->resource, coords);
+            write_sm4_ld(ctx, buffer, resource_type, &load->node, &load->resource,
+                    coords, texel_offset);
             break;

         case HLSL_RESOURCE_SAMPLE:
diff --git a/tests/texture-load-offset.shader_test b/tests/texture-load-offset.shader_test
index ab233c58..6d732190 100644
--- a/tests/texture-load-offset.shader_test
+++ b/tests/texture-load-offset.shader_test
@@ -8,7 +8,7 @@ size (3, 3)
 0 2 0 1 1 2 0 1 2 2 0 1


-[pixel shader]
+[pixel shader todo]
 Texture2D t;

 float4 main(float4 pos : sv_position) : sv_target
@@ -18,14 +18,14 @@ float4 main(float4 pos : sv_position) : sv_target


 [test]
-draw quad
+todo draw quad
 todo probe (0, 0) rgba (0, 1, 0, 1)
 todo probe (1, 0) rgba (1, 1, 0, 1)
 todo probe (0, 1) rgba (0, 2, 0, 1)
 todo probe (1, 1) rgba (1, 2, 0, 1)


-[pixel shader]
+[pixel shader todo]
 Texture2D t;

 float4 main(float4 pos : sv_position) : sv_target
@@ -35,14 +35,14 @@ float4 main(float4 pos : sv_position) : sv_target


 [test]
-draw quad
+todo draw quad
 todo probe (3, 0) rgba (1, 0, 0, 1)
 todo probe (4, 0) rgba (2, 0, 0, 1)
 todo probe (3, 1) rgba (1, 1, 0, 1)
 todo probe (4, 1) rgba (2, 1, 0, 1)


-[pixel shader fail todo]
+[pixel shader fail]
 Texture2D t;

 float4 main(float4 pos : sv_position) : sv_target
From: Francisco Casas fcasas@codeweavers.com
The Load() method offsets are used for these tests since they must resolve to constants for the shaders to compile.
---
 Makefile.am                             |  1 +
 tests/swizzle-constant-prop.shader_test | 63 +++++++++++++++++++++++++
 2 files changed, 64 insertions(+)
 create mode 100644 tests/swizzle-constant-prop.shader_test
diff --git a/Makefile.am b/Makefile.am
index e1d56725..9e3bba79 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -139,6 +139,7 @@ vkd3d_shader_tests = \
 	tests/swizzle-5.shader_test \
 	tests/swizzle-6.shader_test \
 	tests/swizzle-7.shader_test \
+	tests/swizzle-constant-prop.shader_test \
 	tests/texture-load.shader_test \
 	tests/texture-load-offset.shader_test \
 	tests/texture-load-typed.shader_test \
diff --git a/tests/swizzle-constant-prop.shader_test b/tests/swizzle-constant-prop.shader_test
new file mode 100644
index 00000000..2599983b
--- /dev/null
+++ b/tests/swizzle-constant-prop.shader_test
@@ -0,0 +1,63 @@
+% The texel offset argument to Load() must resolve to a constant integer;
+% make sure that we can do so.
+
+[require]
+shader model >= 4.0
+
+
+[texture 0]
+size (4, 4)
+ 1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4
+ 5 5 5 5 6 6 6 6 7 7 7 7 8 8 8 8
+ 9 9 9 9 10 10 10 10 11 11 11 11 12 12 12 12
+13 13 13 13 14 14 14 14 14 15 15 15 16 16 16 16
+
+
+[pixel shader todo]
+Texture2D tex;
+uniform int i;
+
+float4 main() : sv_target
+{
+    int4 a = {1, 2, i, i};
+    return 100 * a + tex.Load(int3(0, 0, 0), a.xy);
+}
+
+[test]
+uniform 0 int 4
+todo draw quad
+todo probe all rgba (110, 210, 410, 410)
+
+
+[pixel shader todo]
+Texture2D tex;
+uniform int i;
+
+float4 main() : sv_target
+{
+    int4 a = {0, 1, 2, i};
+    int4 b = a.yxww;
+    int3 c = b.wyx;
+    return 100 * b + tex.Load(int3(0, 0, 0), c.yz);
+}
+
+[test]
+uniform 0 int 3
+todo draw quad
+todo probe all rgba (105, 5, 305, 305)
+
+
+[pixel shader todo]
+Texture2D tex;
+uniform int i;
+
+float4 main() : sv_target
+{
+    int4 a = {1, 2, 3, i};
+    return tex.Load(int3(0, 0, 0), a.wzxx.yxw.zx);
+}
+
+[test]
+uniform 0 int 1
+todo draw quad
+todo probe all rgba (14.0, 14.0, 14.0, 14.0)
From: Zebediah Figura zfigura@codeweavers.com
---
 libs/vkd3d-shader/hlsl.c              | 6 +++---
 libs/vkd3d-shader/hlsl.h              | 8 ++++++++
 libs/vkd3d-shader/hlsl_codegen.c      | 4 ++--
 libs/vkd3d-shader/hlsl_constant_ops.c | 8 ++------
 4 files changed, 15 insertions(+), 11 deletions(-)
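For readers following along, here is a small standalone C sketch of the swizzle encoding this patch names: two bits per component, component `i` stored at bit `2 * i`. The mask, shift and component extraction mirror the definitions added to hlsl.h, and `combine_swizzles()` follows the loop in `hlsl_combine_swizzles()`; the unprefixed function names and `make_swizzle()` are hypothetical stand-ins so the snippet compiles on its own.

```c
#include <assert.h>

#define HLSL_SWIZZLE_MASK (0x3u)
#define HLSL_SWIZZLE_SHIFT(idx) (2u * (idx))

/* Extract the source component selected for output component idx. */
static unsigned int swizzle_get_component(unsigned int swizzle, unsigned int idx)
{
    return (swizzle >> HLSL_SWIZZLE_SHIFT(idx)) & HLSL_SWIZZLE_MASK;
}

/* Compose two swizzles: the result applies `first`, then `second`. */
static unsigned int combine_swizzles(unsigned int first, unsigned int second, unsigned int dim)
{
    unsigned int ret = 0, i;

    assert(dim <= 4);
    for (i = 0; i < dim; ++i)
    {
        unsigned int s = swizzle_get_component(second, i);
        ret |= swizzle_get_component(first, s) << HLSL_SWIZZLE_SHIFT(i);
    }
    return ret;
}

/* Hypothetical convenience helper: build a swizzle from component indices
 * (0 = x, 1 = y, 2 = z, 3 = w). */
static unsigned int make_swizzle(unsigned int x, unsigned int y, unsigned int z, unsigned int w)
{
    return (x << HLSL_SWIZZLE_SHIFT(0)) | (y << HLSL_SWIZZLE_SHIFT(1))
            | (z << HLSL_SWIZZLE_SHIFT(2)) | (w << HLSL_SWIZZLE_SHIFT(3));
}
```

Applied to the chain `a.wzxx.yxw.zx` from the swizzle-constant-prop test, composing `.wzxx` with `.yxw` and then with `.zx` gives `.xz`, so the Load() offset is `(a.x, a.z) = (1, 3)`.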
diff --git a/libs/vkd3d-shader/hlsl.c b/libs/vkd3d-shader/hlsl.c index a440aa39..7d8e29f9 100644 --- a/libs/vkd3d-shader/hlsl.c +++ b/libs/vkd3d-shader/hlsl.c @@ -1641,7 +1641,7 @@ const char *debug_hlsl_swizzle(unsigned int swizzle, unsigned int size)
assert(size <= ARRAY_SIZE(components)); for (i = 0; i < size; ++i) - string[i] = components[(swizzle >> i * 2) & 3]; + string[i] = components[hlsl_swizzle_get_component(swizzle, i)]; string[size] = 0; return vkd3d_dbg_sprintf(".%s", string); } @@ -2299,8 +2299,8 @@ unsigned int hlsl_combine_swizzles(unsigned int first, unsigned int second, unsi unsigned int ret = 0, i; for (i = 0; i < dim; ++i) { - unsigned int s = (second >> (i * 2)) & 3; - ret |= ((first >> (s * 2)) & 3) << (i * 2); + unsigned int s = hlsl_swizzle_get_component(second, i); + ret |= hlsl_swizzle_get_component(first, s) << HLSL_SWIZZLE_SHIFT(i); } return ret; } diff --git a/libs/vkd3d-shader/hlsl.h b/libs/vkd3d-shader/hlsl.h index 14070239..fff7d797 100644 --- a/libs/vkd3d-shader/hlsl.h +++ b/libs/vkd3d-shader/hlsl.h @@ -63,6 +63,14 @@ | ((HLSL_SWIZZLE_ ## z) << 4) \ | ((HLSL_SWIZZLE_ ## w) << 6))
+#define HLSL_SWIZZLE_MASK (0x3u) +#define HLSL_SWIZZLE_SHIFT(idx) (2u * (idx)) + +static inline unsigned int hlsl_swizzle_get_component(unsigned int swizzle, unsigned int idx) +{ + return (swizzle >> HLSL_SWIZZLE_SHIFT(idx)) & HLSL_SWIZZLE_MASK; +} + enum hlsl_type_class { HLSL_CLASS_SCALAR, diff --git a/libs/vkd3d-shader/hlsl_codegen.c b/libs/vkd3d-shader/hlsl_codegen.c index 71e515b8..a90524e5 100644 --- a/libs/vkd3d-shader/hlsl_codegen.c +++ b/libs/vkd3d-shader/hlsl_codegen.c @@ -710,7 +710,7 @@ static struct hlsl_ir_node *copy_propagation_compute_replacement(struct hlsl_ctx TRACE("No single source for propagating load from %s[%u-%u].\n", var->name, start, start + count); return NULL; } - *swizzle |= value->component << i * 2; + *swizzle |= value->component << HLSL_SWIZZLE_SHIFT(i); }
TRACE("Load from %s[%u-%u] propagated as instruction %p%s.\n", @@ -1255,7 +1255,7 @@ static bool remove_trivial_swizzles(struct hlsl_ctx *ctx, struct hlsl_ir_node *i return false;
for (i = 0; i < instr->data_type->dimx; ++i) - if (((swizzle->swizzle >> (2 * i)) & 3) != i) + if (hlsl_swizzle_get_component(swizzle->swizzle, i) != i) return false;
hlsl_replace_node(instr, swizzle->val.node); diff --git a/libs/vkd3d-shader/hlsl_constant_ops.c b/libs/vkd3d-shader/hlsl_constant_ops.c index ea59fb86..3210bbd5 100644 --- a/libs/vkd3d-shader/hlsl_constant_ops.c +++ b/libs/vkd3d-shader/hlsl_constant_ops.c @@ -603,7 +603,7 @@ bool hlsl_fold_constant_swizzles(struct hlsl_ctx *ctx, struct hlsl_ir_node *inst { struct hlsl_ir_constant *value, *res; struct hlsl_ir_swizzle *swizzle; - unsigned int i, swizzle_bits; + unsigned int i;
if (instr->type != HLSL_IR_SWIZZLE) return false; @@ -615,12 +615,8 @@ bool hlsl_fold_constant_swizzles(struct hlsl_ctx *ctx, struct hlsl_ir_node *inst if (!(res = hlsl_new_constant(ctx, instr->data_type, &instr->loc))) return false;
- swizzle_bits = swizzle->swizzle; for (i = 0; i < swizzle->node.data_type->dimx; ++i) - { - res->value[i] = value->value[swizzle_bits & 3]; - swizzle_bits >>= 2; - } + res->value[i] = value->value[hlsl_swizzle_get_component(swizzle->swizzle, i)];
list_add_before(&swizzle->node.entry, &res->node.entry); hlsl_replace_node(&swizzle->node, &res->node);
From: Zebediah Figura zfigura@codeweavers.com
copy_propagation_compute_replacement() is not doing very much for us, and conceptually is a bit of an odd fit anyway, since it's meant to deal with multi-component types.
---
 libs/vkd3d-shader/hlsl_codegen.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/libs/vkd3d-shader/hlsl_codegen.c b/libs/vkd3d-shader/hlsl_codegen.c
index a90524e5..09cda714 100644
--- a/libs/vkd3d-shader/hlsl_codegen.c
+++ b/libs/vkd3d-shader/hlsl_codegen.c
@@ -763,15 +763,20 @@ static bool copy_propagation_transform_load(struct hlsl_ctx *ctx,
 static bool copy_propagation_transform_object_load(struct hlsl_ctx *ctx,
         struct hlsl_deref *deref, struct copy_propagation_state *state)
 {
+    struct copy_propagation_value *value;
     struct hlsl_ir_load *load;
-    struct hlsl_ir_node *instr;
-    unsigned int swizzle;
+    unsigned int start, count;
+
+    if (!hlsl_component_index_range_from_deref(ctx, deref, &start, &count))
+        return false;
+    assert(count == 1);

-    if (!(instr = copy_propagation_compute_replacement(ctx, state, deref, &swizzle)))
+    if (!(value = copy_propagation_get_value(state, deref->var, start)))
         return false;
+    assert(!value->component);

     /* Only HLSL_IR_LOAD can produce an object. */
-    load = hlsl_ir_load(instr);
+    load = hlsl_ir_load(value->node);

     hlsl_cleanup_deref(deref);
     hlsl_copy_deref(ctx, deref, &load->src);
From: Zebediah Figura zfigura@codeweavers.com
Rename it to copy_propagation_replace_with_single_instr() accordingly.
The idea is to introduce a constant vector replacement pass which will do the same thing.
---
 libs/vkd3d-shader/hlsl_codegen.c | 66 +++++++++++++++-----------------
 1 file changed, 31 insertions(+), 35 deletions(-)
diff --git a/libs/vkd3d-shader/hlsl_codegen.c b/libs/vkd3d-shader/hlsl_codegen.c index 09cda714..ebc1822b 100644 --- a/libs/vkd3d-shader/hlsl_codegen.c +++ b/libs/vkd3d-shader/hlsl_codegen.c @@ -681,61 +681,65 @@ static void copy_propagation_set_value(struct copy_propagation_var_def *var_def, } }
-static struct hlsl_ir_node *copy_propagation_compute_replacement(struct hlsl_ctx *ctx, - const struct copy_propagation_state *state, const struct hlsl_deref *deref, - unsigned int *swizzle) +static bool copy_propagation_replace_with_single_instr(struct hlsl_ctx *ctx, + const struct copy_propagation_state *state, struct hlsl_ir_load *load) { + const struct hlsl_deref *deref = &load->src; const struct hlsl_ir_var *var = deref->var; - struct hlsl_ir_node *instr = NULL; + struct hlsl_ir_node *instr = &load->node; + struct hlsl_ir_node *new_instr = NULL; unsigned int start, count, i; + unsigned int ret_swizzle = 0;
if (!hlsl_component_index_range_from_deref(ctx, deref, &start, &count)) - return NULL; - - *swizzle = 0; + return false;
for (i = 0; i < count; ++i) { struct copy_propagation_value *value = copy_propagation_get_value(state, var, start + i);
if (!value) - return NULL; + return false;
- if (!instr) + if (!new_instr) { - instr = value->node; + new_instr = value->node; } - else if (instr != value->node) + else if (new_instr != value->node) { TRACE("No single source for propagating load from %s[%u-%u].\n", var->name, start, start + count); - return NULL; + return false; } - *swizzle |= value->component << HLSL_SWIZZLE_SHIFT(i); + ret_swizzle |= value->component << HLSL_SWIZZLE_SHIFT(i); }
TRACE("Load from %s[%u-%u] propagated as instruction %p%s.\n", - var->name, start, start + count, instr, debug_hlsl_swizzle(*swizzle, count)); - return instr; + var->name, start, start + count, new_instr, debug_hlsl_swizzle(ret_swizzle, count)); + + if (instr->data_type->type != HLSL_CLASS_OBJECT) + { + struct hlsl_ir_swizzle *swizzle_node; + + if (!(swizzle_node = hlsl_new_swizzle(ctx, ret_swizzle, count, new_instr, &instr->loc))) + return false; + list_add_before(&instr->entry, &swizzle_node->node.entry); + new_instr = &swizzle_node->node; + } + + hlsl_replace_node(instr, new_instr); + return true; }
static bool copy_propagation_transform_load(struct hlsl_ctx *ctx, struct hlsl_ir_load *load, struct copy_propagation_state *state) { - struct hlsl_ir_node *instr = &load->node, *new_instr; - struct hlsl_type *type = instr->data_type; - struct hlsl_ir_swizzle *swizzle_node; - unsigned int dimx = 0; - unsigned int swizzle; + struct hlsl_type *type = load->node.data_type;
switch (type->type) { case HLSL_CLASS_SCALAR: case HLSL_CLASS_VECTOR: - dimx = type->dimx; - break; - case HLSL_CLASS_OBJECT: - dimx = 1; break;
case HLSL_CLASS_MATRIX: @@ -746,18 +750,10 @@ static bool copy_propagation_transform_load(struct hlsl_ctx *ctx, return false; }
- if (!(new_instr = copy_propagation_compute_replacement(ctx, state, &load->src, &swizzle))) - return false; + if (copy_propagation_replace_with_single_instr(ctx, state, load)) + return true;
- if (type->type != HLSL_CLASS_OBJECT) - { - if (!(swizzle_node = hlsl_new_swizzle(ctx, swizzle, dimx, new_instr, &instr->loc))) - return false; - list_add_before(&instr->entry, &swizzle_node->node.entry); - new_instr = &swizzle_node->node; - } - hlsl_replace_node(instr, new_instr); - return true; + return false; }
static bool copy_propagation_transform_object_load(struct hlsl_ctx *ctx,
From: Francisco Casas fcasas@codeweavers.com
If an hlsl_ir_load loads a variable whose components are stored by different instructions, copy propagation doesn't replace it.
But if all of these instructions are constants (which is currently the case for value constructors), the load can be replaced with a constant value, which some other instructions expect, e.g. texel_offsets when using aoffimmi modifiers.
For instance, this shader:
```
sampler s;
Texture2D t;

float4 main() : sv_target
{
    return t.Gather(s, float2(0.6, 0.6), int2(0, 0));
}
```
results in the following IR before applying the patch:
```
   2:      float | 6.00000024e-01
   3:      float | 6.00000024e-01
   4:       uint | 0
   5:            | = (<constructor-2>[@4].x @2)
   6:       uint | 1
   7:            | = (<constructor-2>[@6].x @3)
   8:     float2 | <constructor-2>
   9:        int | 0
  10:        int | 0
  11:       uint | 0
  12:            | = (<constructor-5>[@11].x @9)
  13:       uint | 1
  14:            | = (<constructor-5>[@13].x @10)
  15:       int2 | <constructor-5>
  16:     float4 | gather_red(resource = t, sampler = s, coords = @8, offset = @15)
  17:            | return
  18:            | = (<output-sv_target0> @16)
```
and this IR afterwards:
```
   2:     float2 | {6.00000024e-01 6.00000024e-01 }
   3:       int2 | {0 0 }
   4:     float4 | gather_red(resource = t, sampler = s, coords = @2, offset = @3)
   5:            | return
   6:            | = (<output-sv_target0> @4)
```
---
 libs/vkd3d-shader/hlsl_codegen.c           | 85 ++++++++++++++++++++++
 tests/hlsl-initializer-objects.shader_test |  8 +-
 tests/object-references.shader_test        |  6 +-
 tests/sampler-offset.shader_test           | 12 +--
 tests/shader_runner_d3d12.c                |  2 +-
 tests/texture-load-offset.shader_test      | 24 +++---
 6 files changed, 111 insertions(+), 26 deletions(-)
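The core of the transformation can be modelled outside the compiler. The following is a minimal sketch in plain C, assuming a simplified per-component record (`struct tracked_component`, a hypothetical stand-in for the copy propagation state) that only remembers whether the last store to a component had a constant right-hand side. It is not the vkd3d code, which works on `hlsl_ir_constant` nodes, but it follows the same rule: fold only if every loaded component is a known constant.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified record for one component of a variable. */
struct tracked_component
{
    bool is_constant;   /* true if the last store to this component had a constant rhs */
    int value;          /* the stored constant; only meaningful when is_constant is set */
};

/* Try to turn a load of `count` components, starting at `start`, into a
 * constant vector. On success the values are written to `out`; on failure
 * `out` is left untouched. */
static bool fold_to_constant_vector(const struct tracked_component *var,
        unsigned int start, unsigned int count, int out[4])
{
    int values[4] = {0};
    unsigned int i;

    assert(count <= 4);
    for (i = 0; i < count; ++i)
    {
        if (!var[start + i].is_constant)
            return false;
        values[i] = var[start + i].value;
    }

    memcpy(out, values, sizeof(values));
    return true;
}
```

In this model, a variable whose `x` and `y` were stored from the separate constants 123 and 345 folds to `{123, 345}`, while a variable with any non-constant component is left for the single-instruction replacement to handle.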
diff --git a/libs/vkd3d-shader/hlsl_codegen.c b/libs/vkd3d-shader/hlsl_codegen.c index ebc1822b..b05109b0 100644 --- a/libs/vkd3d-shader/hlsl_codegen.c +++ b/libs/vkd3d-shader/hlsl_codegen.c @@ -501,6 +501,52 @@ static bool lower_broadcasts(struct hlsl_ctx *ctx, struct hlsl_ir_node *instr, v return false; }
+/*
+ * Copy propagation. The basic idea is to recognize instruction sequences of the
+ * form:
+ *
+ *   2: <any instruction>
+ *   3: v = @2
+ *   4: load(v)
+ *
+ * and replace the load (@4) with the original instruction (@2).
+ * This works for multiple components, even if they're written using separate
+ * store instructions, as long as the rhs is the same in every case. This basic
+ * detection is implemented by copy_propagation_replace_with_single_instr().
+ *
+ * We use the same infrastructure to implement a more specialized
+ * transformation. We recognize sequences of the form:
+ *
+ *   2: 123
+ *   3: var.x = @2
+ *   4: 345
+ *   5: var.y = @4
+ *   6: load(var.xy)
+ *
+ * where the load (@6) originates from different sources but that are constant,
+ * and transform it into a single constant vector. This latter pass is done
+ * by copy_propagation_replace_with_constant_vector().
+ *
+ * This is a specialized form of vectorization, and begs the question: why does
+ * the load need to be involved? Can we just vectorize the stores into a single
+ * instruction, and then use "normal" copy-prop to convert that into a single
+ * vector?
+ *
+ * In general, the answer is yes, but there is a special case which necessitates
+ * the use of this transformation: non-uniform control flow. Copy-prop can act
+ * across some control flow, and in cases like the following:
+ *
+ *   2: 123
+ *   3: var.x = @2
+ *   4: if (...)
+ *   5:    456
+ *   6:    var.y = @5
+ *   7: load(var.xy)
+ *
+ * we can copy-prop the load (@7) into a constant vector {123, 456}, but we
+ * cannot easily vectorize the stores @3 and @6.
+ */
+
 enum copy_propagation_value_state
 {
     VALUE_STATE_NOT_WRITTEN = 0,
@@ -730,6 +776,42 @@ static bool copy_propagation_replace_with_single_instr(struct hlsl_ctx *ctx,
     return true;
 }
+static bool copy_propagation_replace_with_constant_vector(struct hlsl_ctx *ctx, + const struct copy_propagation_state *state, struct hlsl_ir_load *load) +{ + const struct hlsl_ir_var *var = load->src.var; + union hlsl_constant_value values[4] = {0}; + struct hlsl_ir_node *instr = &load->node; + struct hlsl_ir_constant *cons; + unsigned int start, count, i; + + if (!hlsl_component_index_range_from_deref(ctx, &load->src, &start, &count)) + return false; + + for (i = 0; i < count; ++i) + { + struct copy_propagation_value *value = copy_propagation_get_value(state, var, start + i); + + if (!value || value->node->type != HLSL_IR_CONSTANT) + return false; + + values[i] = hlsl_ir_constant(value->node)->value[value->component]; + } + + if (!(cons = hlsl_new_constant(ctx, instr->data_type, &instr->loc))) + return false; + cons->value[0] = values[0]; + cons->value[1] = values[1]; + cons->value[2] = values[2]; + cons->value[3] = values[3]; + list_add_before(&instr->entry, &cons->node.entry); + + TRACE("Load from %s[%u-%u] turned into a constant %p.\n", var->name, start, start + count, cons); + + hlsl_replace_node(instr, &cons->node); + return true; +} + static bool copy_propagation_transform_load(struct hlsl_ctx *ctx, struct hlsl_ir_load *load, struct copy_propagation_state *state) { @@ -750,6 +832,9 @@ static bool copy_propagation_transform_load(struct hlsl_ctx *ctx, return false; }
+ if (copy_propagation_replace_with_constant_vector(ctx, state, load)) + return true; + if (copy_propagation_replace_with_single_instr(ctx, state, load)) return true;
diff --git a/tests/hlsl-initializer-objects.shader_test b/tests/hlsl-initializer-objects.shader_test index d40ede46..d9c0bc91 100644 --- a/tests/hlsl-initializer-objects.shader_test +++ b/tests/hlsl-initializer-objects.shader_test @@ -29,7 +29,7 @@ draw quad probe all rgba (0.2, 0.2, 0.2, 0.1)
-[pixel shader todo] +[pixel shader] Texture2D tex;
struct foo @@ -48,11 +48,11 @@ float4 main() : sv_target }
[test] -todo draw quad -todo probe all rgba (31.1, 41.1, 51.1, 61.1) 1 +draw quad +probe all rgba (31.1, 41.1, 51.1, 61.1) 1
-[pixel shader todo] +[pixel shader] Texture2D tex1; Texture2D tex2;
diff --git a/tests/object-references.shader_test b/tests/object-references.shader_test index 12f745e6..ba9b1235 100644 --- a/tests/object-references.shader_test +++ b/tests/object-references.shader_test @@ -132,7 +132,7 @@ float4 main() : sv_target }
-[pixel shader todo] +[pixel shader] Texture2D tex; uniform float f;
@@ -153,5 +153,5 @@ float4 main() : sv_target
[test] uniform 0 float 10.0 -todo draw quad -todo probe (0, 0) rgba (11.0, 12.0, 13.0, 11.0) +draw quad +probe (0, 0) rgba (11.0, 12.0, 13.0, 11.0) diff --git a/tests/sampler-offset.shader_test b/tests/sampler-offset.shader_test index 2aa8f9b3..6f8357df 100644 --- a/tests/sampler-offset.shader_test +++ b/tests/sampler-offset.shader_test @@ -12,7 +12,7 @@ size (3, 3) 0.0 0.2 0.0 0.4 0.1 0.2 0.5 0.0 0.2 0.2 0.0 0.4
-[pixel shader todo] +[pixel shader] sampler s; Texture2D t;
@@ -22,11 +22,11 @@ float4 main() : sv_target }
[test] -todo draw quad +draw quad probe all rgba (0.1, 0.2, 0.5, 0.0)
-[pixel shader todo] +[pixel shader] sampler s; Texture2D t;
@@ -36,11 +36,11 @@ float4 main() : sv_target }
[test] -todo draw quad +draw quad probe all rgba (0.2, 0.2, 0.0, 0.4)
-[pixel shader todo] +[pixel shader] sampler s; Texture2D t;
@@ -50,5 +50,5 @@ float4 main() : sv_target }
[test] -todo draw quad +draw quad probe all rgba (0.0, 0.2, 0.0, 0.4) diff --git a/tests/shader_runner_d3d12.c b/tests/shader_runner_d3d12.c index bb4d9c5a..bd94b4c9 100644 --- a/tests/shader_runner_d3d12.c +++ b/tests/shader_runner_d3d12.c @@ -167,7 +167,7 @@ static ID3D12RootSignature *d3d12_runner_create_root_signature(struct d3d12_shad ID3D12GraphicsCommandList *command_list, unsigned int *uniform_index) { D3D12_ROOT_SIGNATURE_DESC root_signature_desc = {0}; - D3D12_ROOT_PARAMETER root_params[3], *root_param; + D3D12_ROOT_PARAMETER root_params[4], *root_param; D3D12_STATIC_SAMPLER_DESC static_samplers[1]; ID3D12RootSignature *root_signature; HRESULT hr; diff --git a/tests/texture-load-offset.shader_test b/tests/texture-load-offset.shader_test index 6d732190..52b6a5f9 100644 --- a/tests/texture-load-offset.shader_test +++ b/tests/texture-load-offset.shader_test @@ -8,7 +8,7 @@ size (3, 3) 0 2 0 1 1 2 0 1 2 2 0 1
-[pixel shader todo] +[pixel shader] Texture2D t;
float4 main(float4 pos : sv_position) : sv_target @@ -18,14 +18,14 @@ float4 main(float4 pos : sv_position) : sv_target
[test] -todo draw quad -todo probe (0, 0) rgba (0, 1, 0, 1) -todo probe (1, 0) rgba (1, 1, 0, 1) -todo probe (0, 1) rgba (0, 2, 0, 1) -todo probe (1, 1) rgba (1, 2, 0, 1) +draw quad +probe (0, 0) rgba (0, 1, 0, 1) +probe (1, 0) rgba (1, 1, 0, 1) +probe (0, 1) rgba (0, 2, 0, 1) +probe (1, 1) rgba (1, 2, 0, 1)
-[pixel shader todo] +[pixel shader] Texture2D t;
float4 main(float4 pos : sv_position) : sv_target @@ -35,11 +35,11 @@ float4 main(float4 pos : sv_position) : sv_target
[test] -todo draw quad -todo probe (3, 0) rgba (1, 0, 0, 1) -todo probe (4, 0) rgba (2, 0, 0, 1) -todo probe (3, 1) rgba (1, 1, 0, 1) -todo probe (4, 1) rgba (2, 1, 0, 1) +draw quad +probe (3, 0) rgba (1, 0, 0, 1) +probe (4, 0) rgba (2, 0, 0, 1) +probe (3, 1) rgba (1, 1, 0, 1) +probe (4, 1) rgba (2, 1, 0, 1)
[pixel shader fail]
From: Francisco Casas fcasas@codeweavers.com
If the offset of a gather resource load can be represented as an aoffimmi (vector of ints from -8 to 7), use one. This is of particular importance for 4.0 profiles, where this is the only valid way of representing offsets for this operation.
---
 libs/vkd3d-shader/hlsl_sm4.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)
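The decision this patch makes in write_sm4_gather() can be summarized as a small pure function. The sketch below is a standalone C model with hypothetical names (`enum gather_offset_mode`, `choose_gather_offset_mode`), not vkd3d code; the actual patch makes the same three-way choice inline, setting `VKD3D_SM5_OP_GATHER4_PO` for the source-register case and calling `hlsl_error()` for the error case.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of how a gather offset ends up being written. */
enum gather_offset_mode
{
    GATHER_OFFSET_NONE,             /* no offset argument was given */
    GATHER_OFFSET_AOFFIMMI,         /* offset stored as an immediate modifier */
    GATHER_OFFSET_SOURCE_REGISTER,  /* offset passed as an extra source register */
    GATHER_OFFSET_ERROR,            /* offset cannot be expressed for this profile */
};

static enum gather_offset_mode choose_gather_offset_mode(bool has_offset,
        bool fits_aoffimmi, unsigned int profile_major_version)
{
    /* Assumption of this sketch: only shader model 4 and later reach this code. */
    assert(profile_major_version >= 4);

    if (!has_offset)
        return GATHER_OFFSET_NONE;
    if (fits_aoffimmi)
        return GATHER_OFFSET_AOFFIMMI;
    /* The extra source register form is only available from profile 5 on. */
    if (profile_major_version < 5)
        return GATHER_OFFSET_ERROR;
    return GATHER_OFFSET_SOURCE_REGISTER;
}
```

Preferring the immediate form whenever it is available means the same HLSL compiles for both 4.x and 5.x profiles as long as the offset resolves to a constant in range.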
diff --git a/libs/vkd3d-shader/hlsl_sm4.c b/libs/vkd3d-shader/hlsl_sm4.c
index ae7017a8..a6694221 100644
--- a/libs/vkd3d-shader/hlsl_sm4.c
+++ b/libs/vkd3d-shader/hlsl_sm4.c
@@ -2121,11 +2121,19 @@ static void write_sm4_gather(struct hlsl_ctx *ctx, struct vkd3d_bytecode_buffer

     sm4_src_from_node(&instr.srcs[instr.src_count++], coords, VKD3DSP_WRITEMASK_ALL);

-    /* FIXME: Use an aoffimmi modifier if possible. */
     if (texel_offset)
     {
-        instr.opcode = VKD3D_SM5_OP_GATHER4_PO;
-        sm4_src_from_node(&instr.srcs[instr.src_count++], texel_offset, VKD3DSP_WRITEMASK_ALL);
+        if (!encode_texel_offset_as_aoffimmi(&instr, texel_offset))
+        {
+            if (ctx->profile->major_version < 5)
+            {
+                hlsl_error(ctx, &texel_offset->loc, VKD3D_SHADER_ERROR_HLSL_INVALID_TEXEL_OFFSET,
+                    "Offset must resolve to integer literal in the range -8 to 7 for profiles < 5.");
+                return;
+            }
+            instr.opcode = VKD3D_SM5_OP_GATHER4_PO;
+            sm4_src_from_node(&instr.srcs[instr.src_count++], texel_offset, VKD3DSP_WRITEMASK_ALL);
+        }
     }

     sm4_src_from_deref(ctx, &instr.srcs[instr.src_count++], resource, resource_type, instr.dsts[0].writemask);
From: Zebediah Figura zfigura@codeweavers.com
---
 libs/vkd3d-shader/hlsl_codegen.c        | 68 ++++++++++++++++++-------
 tests/swizzle-constant-prop.shader_test |  6 +--
 2 files changed, 52 insertions(+), 22 deletions(-)
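To see why looking through the swizzle helps, consider a variable whose `x` and `y` were stored from one instruction and whose `z` and `w` were stored from another. The full load has no single source, but `.zw` of that load does. The sketch below models this in standalone C; `struct tracked_component` and `find_single_source()` are hypothetical simplifications of the copy propagation state and of copy_propagation_replace_with_single_instr(), with instructions reduced to integer ids.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified record: which instruction last wrote a component
 * of the variable, and which component of that instruction was stored. */
struct tracked_component
{
    int source_id;          /* instruction id, or -1 if unknown */
    unsigned int component; /* component of the source instruction */
};

static unsigned int swizzle_get_component(unsigned int swizzle, unsigned int idx)
{
    return (swizzle >> (2u * idx)) & 0x3u;
}

/* Check whether the `count` components selected by `swizzle` all come from
 * the same instruction. On success, report that instruction and the swizzle
 * to apply to it. */
static bool find_single_source(const struct tracked_component *var, unsigned int swizzle,
        unsigned int count, int *source_id, unsigned int *ret_swizzle)
{
    unsigned int i;

    assert(count >= 1 && count <= 4);
    *source_id = -1;
    *ret_swizzle = 0;

    for (i = 0; i < count; ++i)
    {
        const struct tracked_component *c = &var[swizzle_get_component(swizzle, i)];

        if (c->source_id < 0)
            return false;
        if (*source_id < 0)
            *source_id = c->source_id;
        else if (*source_id != c->source_id)
            return false; /* no single source */
        *ret_swizzle |= c->component << (2u * i);
    }
    return true;
}
```

In this model a plain load corresponds to the identity swizzle `.xyzw` (0xe4), which is how the patch reuses the same helpers for both loads and swizzles of loads.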
diff --git a/libs/vkd3d-shader/hlsl_codegen.c b/libs/vkd3d-shader/hlsl_codegen.c index b05109b0..f6ba08a0 100644 --- a/libs/vkd3d-shader/hlsl_codegen.c +++ b/libs/vkd3d-shader/hlsl_codegen.c @@ -514,6 +514,9 @@ static bool lower_broadcasts(struct hlsl_ctx *ctx, struct hlsl_ir_node *instr, v * store instructions, as long as the rhs is the same in every case. This basic * detection is implemented by copy_propagation_replace_with_single_instr(). * + * In some cases, the load itself might not have a single source, but a + * subsequent swizzle might; hence we also try to replace swizzles of loads. + * * We use the same infrastructure to implement a more specialized * transformation. We recognize sequences of the form: * @@ -728,11 +731,11 @@ static void copy_propagation_set_value(struct copy_propagation_var_def *var_def, }
static bool copy_propagation_replace_with_single_instr(struct hlsl_ctx *ctx, - const struct copy_propagation_state *state, struct hlsl_ir_load *load) + const struct copy_propagation_state *state, const struct hlsl_deref *deref, + unsigned int swizzle, struct hlsl_ir_node *instr) { - const struct hlsl_deref *deref = &load->src; + const unsigned int instr_component_count = hlsl_type_component_count(instr->data_type); const struct hlsl_ir_var *var = deref->var; - struct hlsl_ir_node *instr = &load->node; struct hlsl_ir_node *new_instr = NULL; unsigned int start, count, i; unsigned int ret_swizzle = 0; @@ -740,11 +743,11 @@ static bool copy_propagation_replace_with_single_instr(struct hlsl_ctx *ctx, if (!hlsl_component_index_range_from_deref(ctx, deref, &start, &count)) return false;
-    for (i = 0; i < count; ++i)
+    for (i = 0; i < instr_component_count; ++i)
     {
-        struct copy_propagation_value *value = copy_propagation_get_value(state, var, start + i);
+        struct copy_propagation_value *value;

-        if (!value)
+        if (!(value = copy_propagation_get_value(state, var, start + hlsl_swizzle_get_component(swizzle, i))))
             return false;

         if (!new_instr)
@@ -753,14 +756,16 @@ static bool copy_propagation_replace_with_single_instr(struct hlsl_ctx *ctx,
         }
         else if (new_instr != value->node)
         {
-            TRACE("No single source for propagating load from %s[%u-%u].\n", var->name, start, start + count);
+            TRACE("No single source for propagating load from %s[%u-%u]%s\n",
+                    var->name, start, start + count, debug_hlsl_swizzle(swizzle, instr_component_count));
             return false;
         }
         ret_swizzle |= value->component << HLSL_SWIZZLE_SHIFT(i);
     }

-    TRACE("Load from %s[%u-%u] propagated as instruction %p%s.\n",
-            var->name, start, start + count, new_instr, debug_hlsl_swizzle(ret_swizzle, count));
+    TRACE("Load from %s[%u-%u]%s propagated as instruction %p%s.\n",
+            var->name, start, start + count, debug_hlsl_swizzle(swizzle, instr_component_count),
+            new_instr, debug_hlsl_swizzle(ret_swizzle, instr_component_count));

     if (instr->data_type->type != HLSL_CLASS_OBJECT)
     {
@@ -777,22 +782,24 @@ static bool copy_propagation_replace_with_single_instr(struct hlsl_ctx *ctx,
 }

 static bool copy_propagation_replace_with_constant_vector(struct hlsl_ctx *ctx,
-        const struct copy_propagation_state *state, struct hlsl_ir_load *load)
+        const struct copy_propagation_state *state, const struct hlsl_deref *deref,
+        unsigned int swizzle, struct hlsl_ir_node *instr)
 {
-    const struct hlsl_ir_var *var = load->src.var;
+    const unsigned int instr_component_count = hlsl_type_component_count(instr->data_type);
+    const struct hlsl_ir_var *var = deref->var;
     union hlsl_constant_value values[4] = {0};
-    struct hlsl_ir_node *instr = &load->node;
     struct hlsl_ir_constant *cons;
     unsigned int start, count, i;

-    if (!hlsl_component_index_range_from_deref(ctx, &load->src, &start, &count))
+    if (!hlsl_component_index_range_from_deref(ctx, deref, &start, &count))
         return false;

-    for (i = 0; i < count; ++i)
+    for (i = 0; i < instr_component_count; ++i)
     {
-        struct copy_propagation_value *value = copy_propagation_get_value(state, var, start + i);
+        struct copy_propagation_value *value;

-        if (!value || value->node->type != HLSL_IR_CONSTANT)
+        if (!(value = copy_propagation_get_value(state, var, start + hlsl_swizzle_get_component(swizzle, i)))
+                || value->node->type != HLSL_IR_CONSTANT)
             return false;

         values[i] = hlsl_ir_constant(value->node)->value[value->component];
@@ -806,7 +813,8 @@ static bool copy_propagation_replace_with_constant_vector(struct hlsl_ctx *ctx,
     cons->value[3] = values[3];
     list_add_before(&instr->entry, &cons->node.entry);

-    TRACE("Load from %s[%u-%u] turned into a constant %p.\n", var->name, start, start + count, cons);
+    TRACE("Load from %s[%u-%u]%s turned into a constant %p.\n",
+            var->name, start, start + count, debug_hlsl_swizzle(swizzle, instr_component_count), cons);

     hlsl_replace_node(instr, &cons->node);
     return true;
@@ -832,10 +840,28 @@ static bool copy_propagation_transform_load(struct hlsl_ctx *ctx,
         return false;
     }

-    if (copy_propagation_replace_with_constant_vector(ctx, state, load))
+    if (copy_propagation_replace_with_constant_vector(ctx, state, &load->src, HLSL_SWIZZLE(X, Y, Z, W), &load->node))
         return true;

-    if (copy_propagation_replace_with_single_instr(ctx, state, load))
+    if (copy_propagation_replace_with_single_instr(ctx, state, &load->src, HLSL_SWIZZLE(X, Y, Z, W), &load->node))
+        return true;
+
+    return false;
+}
+
+static bool copy_propagation_transform_swizzle(struct hlsl_ctx *ctx,
+        struct hlsl_ir_swizzle *swizzle, struct copy_propagation_state *state)
+{
+    struct hlsl_ir_load *load;
+
+    if (swizzle->val.node->type != HLSL_IR_LOAD)
+        return false;
+    load = hlsl_ir_load(swizzle->val.node);
+
+    if (copy_propagation_replace_with_constant_vector(ctx, state, &load->src, swizzle->swizzle, &swizzle->node))
+        return true;
+
+    if (copy_propagation_replace_with_single_instr(ctx, state, &load->src, swizzle->swizzle, &swizzle->node))
         return true;

     return false;
@@ -1039,6 +1065,10 @@ static bool copy_propagation_transform_block(struct hlsl_ctx *ctx, struct hlsl_b
                 copy_propagation_record_store(ctx, hlsl_ir_store(instr), state);
                 break;

+            case HLSL_IR_SWIZZLE:
+                progress |= copy_propagation_transform_swizzle(ctx, hlsl_ir_swizzle(instr), state);
+                break;
+
             case HLSL_IR_IF:
                 progress |= copy_propagation_process_if(ctx, hlsl_ir_if(instr), state);
                 break;
diff --git a/tests/swizzle-constant-prop.shader_test b/tests/swizzle-constant-prop.shader_test
index 2599983b..48c3ab79 100644
--- a/tests/swizzle-constant-prop.shader_test
+++ b/tests/swizzle-constant-prop.shader_test
@@ -13,7 +13,7 @@ size (4, 4)
 13 13 13 13 14 14 14 14 14 15 15 15 16 16 16 16


-[pixel shader todo]
+[pixel shader]
 Texture2D tex;
 uniform int i;

@@ -25,8 +25,8 @@ float4 main() : sv_target

 [test]
 uniform 0 int 4
-todo draw quad
-todo probe all rgba (110, 210, 410, 410)
+draw quad
+probe all rgba (110, 210, 410, 410)


 [pixel shader todo]
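To illustrate the idea behind the patch above, here is a minimal, standalone sketch of replacing a swizzled load with a constant vector. It is not the vkd3d code: the struct and function names are hypothetical, the per-component record is reduced to a flag and a float, and the swizzle encoding (two bits per component, component `i` at bit offset `2 * i`) is an assumption made for the example.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the per-component information that
 * copy propagation records for a variable. */
struct sketch_component_value
{
    bool is_constant;   /* true if the last store to this component was a constant */
    float value;        /* the stored constant, only meaningful if is_constant */
};

/* Try to evaluate "load(var)[start..].swizzle" to constants.
 * Assumed encoding: 2 bits per swizzle component, component i at bit 2 * i.
 * Writes "count" values to "out" and returns true only when every component
 * selected by the swizzle is known to be constant. */
static bool sketch_fold_swizzled_load(const struct sketch_component_value *var_values,
        unsigned int start, unsigned int swizzle, unsigned int count, float *out)
{
    unsigned int i;

    for (i = 0; i < count; ++i)
    {
        unsigned int component = (swizzle >> (2u * i)) & 0x3u;
        const struct sketch_component_value *v = &var_values[start + component];

        if (!v->is_constant)
            return false;
        out[i] = v->value;
    }
    return true;
}
```

Note that only the components selected by the swizzle are looked up, so a load of a partially constant variable can still be folded when the swizzle skips the non-constant components.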
From: Francisco Casas <fcasas@codeweavers.com>

---
 libs/vkd3d-shader/hlsl_codegen.c        | 34 +++++++++++++++++++++++++
 tests/swizzle-constant-prop.shader_test | 12 ++++-----
 2 files changed, 40 insertions(+), 6 deletions(-)

diff --git a/libs/vkd3d-shader/hlsl_codegen.c b/libs/vkd3d-shader/hlsl_codegen.c
index f6ba08a0..448e9ef8 100644
--- a/libs/vkd3d-shader/hlsl_codegen.c
+++ b/libs/vkd3d-shader/hlsl_codegen.c
@@ -1358,6 +1358,39 @@ static bool lower_narrowing_casts(struct hlsl_ctx *ctx, struct hlsl_ir_node *ins
     return false;
 }

+static bool fold_swizzle_chains(struct hlsl_ctx *ctx, struct hlsl_ir_node *instr, void *context)
+{
+    struct hlsl_ir_swizzle *swizzle;
+    struct hlsl_ir_node *next_instr;
+
+    if (instr->type != HLSL_IR_SWIZZLE)
+        return false;
+    swizzle = hlsl_ir_swizzle(instr);
+
+    next_instr = swizzle->val.node;
+
+    if (next_instr->type == HLSL_IR_SWIZZLE)
+    {
+        struct hlsl_ir_swizzle *new_swizzle;
+        struct hlsl_ir_node *new_instr;
+        unsigned int combined_swizzle;
+
+        combined_swizzle = hlsl_combine_swizzles(hlsl_ir_swizzle(next_instr)->swizzle,
+                swizzle->swizzle, instr->data_type->dimx);
+        next_instr = hlsl_ir_swizzle(next_instr)->val.node;
+
+        if (!(new_swizzle = hlsl_new_swizzle(ctx, combined_swizzle, instr->data_type->dimx, next_instr, &instr->loc)))
+            return false;
+
+        new_instr = &new_swizzle->node;
+        list_add_before(&instr->entry, &new_instr->entry);
+        hlsl_replace_node(instr, new_instr);
+        return true;
+    }
+
+    return false;
+}
+
 static bool remove_trivial_swizzles(struct hlsl_ctx *ctx, struct hlsl_ir_node *instr, void *context)
 {
     struct hlsl_ir_swizzle *swizzle;
@@ -2800,6 +2833,7 @@ int hlsl_emit_bytecode(struct hlsl_ctx *ctx, struct hlsl_ir_function_decl *entry
         progress = transform_ir(ctx, hlsl_fold_constant_exprs, body, NULL);
         progress |= transform_ir(ctx, hlsl_fold_constant_swizzles, body, NULL);
         progress |= copy_propagation_execute(ctx, body);
+        progress |= transform_ir(ctx, fold_swizzle_chains, body, NULL);
         progress |= transform_ir(ctx, remove_trivial_swizzles, body, NULL);
     }
     while (progress);
diff --git a/tests/swizzle-constant-prop.shader_test b/tests/swizzle-constant-prop.shader_test
index 48c3ab79..357a3496 100644
--- a/tests/swizzle-constant-prop.shader_test
+++ b/tests/swizzle-constant-prop.shader_test
@@ -29,7 +29,7 @@ draw quad
 probe all rgba (110, 210, 410, 410)


-[pixel shader todo]
+[pixel shader]
 Texture2D tex;
 uniform int i;

@@ -43,11 +43,11 @@ float4 main() : sv_target

 [test]
 uniform 0 int 3
-todo draw quad
-todo probe all rgba (105, 5, 305, 305)
+draw quad
+probe all rgba (105, 5, 305, 305)


-[pixel shader todo]
+[pixel shader]
 Texture2D tex;
 uniform int i;

@@ -59,5 +59,5 @@ float4 main() : sv_target

 [test]
 uniform 0 int 1
-todo draw quad
-todo probe all rgba (14.0, 14.0, 14.0, 14.0)
+draw quad
+probe all rgba (14.0, 14.0, 14.0, 14.0)
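For readers unfamiliar with swizzle folding, the following standalone sketch shows how two chained swizzles can be collapsed into one, which is what `fold_swizzle_chains()` relies on `hlsl_combine_swizzles()` to do. The `sketch_`/`SK_` names and the encoding (two bits per component, component `i` at bit offset `2 * i`, X=0 through W=3) are assumptions for illustration; the actual vkd3d helpers may differ in detail.

```c
#include <assert.h>

/* Assumed encoding, for illustration only: two bits per component,
 * component i stored at bit offset 2 * i. */
#define SK_X 0u
#define SK_Y 1u
#define SK_Z 2u
#define SK_W 3u
#define SKETCH_SWIZZLE_SHIFT(i) (2u * (i))
#define SKETCH_SWIZZLE_MASK 0x3u
#define SKETCH_SWIZZLE(x, y, z, w) \
        ((x) | ((y) << 2) | ((z) << 4) | ((w) << 6))

/* Extract the source component selected for output component i. */
static unsigned int sketch_swizzle_get_component(unsigned int swizzle, unsigned int i)
{
    return (swizzle >> SKETCH_SWIZZLE_SHIFT(i)) & SKETCH_SWIZZLE_MASK;
}

/* Combine "first" (applied to the source value) with "second" (applied to
 * the result of "first") into a single swizzle with "dim" components. */
static unsigned int sketch_combine_swizzles(unsigned int first, unsigned int second, unsigned int dim)
{
    unsigned int ret = 0, i;

    assert(dim >= 1 && dim <= 4);
    for (i = 0; i < dim; ++i)
    {
        /* Output component i reads component s of the intermediate value,
         * which in turn reads component first[s] of the source. */
        unsigned int s = sketch_swizzle_get_component(second, i);

        ret |= sketch_swizzle_get_component(first, s) << SKETCH_SWIZZLE_SHIFT(i);
    }
    return ret;
}
```

Under these assumptions, `v.zyx.xz` folds to `v.zx`: combining `SKETCH_SWIZZLE(SK_Z, SK_Y, SK_X, SK_X)` with `SKETCH_SWIZZLE(SK_X, SK_Z, SK_X, SK_X)` for two components yields `SKETCH_SWIZZLE(SK_Z, SK_X, SK_X, SK_X)`.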
:arrow_up: This version uses Zeb's reorganization of the solution, as in [her branch](https://gitlab.winehq.org/zfigura/vkd3d/-/tree/copyprop), plus two patches that it was missing (namely a52ccc3f and 69ab277a) and minor changes to the copy propagation comment (specifically, using IR instructions instead of pseudocode in the examples).
This merge request was approved by Zebediah Figura.
This merge request was approved by Giovanni Mascellani.
Giovanni Mascellani (@giomasce) commented about libs/vkd3d-shader/hlsl_codegen.c:
struct hlsl_deref *deref, struct copy_propagation_state *state)
{
- struct copy_propagation_value *value; struct hlsl_ir_load *load;
- struct hlsl_ir_node *instr;
- unsigned int swizzle;
- unsigned int start, count;
- if (!hlsl_component_index_range_from_deref(ctx, deref, &start, &count))
return false;
- assert(count == 1);
- if (!(instr = copy_propagation_compute_replacement(ctx, state, deref, &swizzle)))
- if (!(value = copy_propagation_get_value(state, deref->var, start))) return false;
- assert(!value->component);
I won't block the MR on that, but I don't like this style: `component` is really a number here, not a boolean value; in other words, zero is not a value that is more special than any other. So I would write `component == 0` rather than using operator `!`, which gives the reader the impression that zero is qualitatively different from any other value.
Very nice MR, thanks. I just have a minor comment, but that shouldn't block or even delay the MR. Just to say that.
This will need a rebase.
On Thu Jan 19 15:25:52 2023 +0000, Giovanni Mascellani wrote:
> I won't block the MR on that, but I don't like this style: `component` is really a number here, not a boolean value; in other words, zero is not a value that is more special than any other. So I would write `component == 0` rather than using operator `!`, which gives the reader the impression that zero is qualitatively different from any other value.
I agree, I will change it while I am rebasing. (albeit... it is on one of Zeb's patches, and... I feel like we had a similar conversation a year ago, but I am not sure...)
On Fri Jan 20 00:42:57 2023 +0000, Francisco Casas wrote:
> I agree, I will change it while I am rebasing. (albeit... it is on one of Zeb's patches, and... I feel like we had a similar conversation a year ago, but I am not sure...)
I don't mind using `!x` (or e.g. `if (x)`) for integer values, and I don't think Henri minds either, but I don't really want to spend time arguing about it.