First part of the continuation of the implementation of non-constant offset dereferences (a.k.a. relative addressing) for SM4, now that we use vsir registers in tpf.c.
As a quick recap: while parsing HLSL we are expressing derefs as paths, and then we are lowering these paths into a single offset node (which is closer to the bytecode) using the replace_deref_path_with_offset() pass, right before register allocation.
This first part of the series splits this offset node into 2 parts: - A constant uint, which will be called hlsl_deref.offset_const. - A non-hlsl_ir_constant offset node that will only be present when we need relative addressing, that we will end up calling hlsl_deref.offset_rel.
Both these fields will be analog to the ones used in vsir register indexes, vkd3d_shader_register_index.rel_addr and vkd3d_shader_src_param.offset respectively, which is something we need for the second part of this series.
The following patches are in my [nonconst-offsets-8-part](https://gitlab.winehq.org/fcasas/vkd3d/-/commits/nonconst-offsets-8-part) branch, if something is not clear in this series, it may be worth skimming through them.
Supersedes !229.
From: Francisco Casas fcasas@codeweavers.com
--- tests/hlsl/array-index-expr.shader_test | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+)
diff --git a/tests/hlsl/array-index-expr.shader_test b/tests/hlsl/array-index-expr.shader_test index 0a83080cc..f8d9f17b4 100644 --- a/tests/hlsl/array-index-expr.shader_test +++ b/tests/hlsl/array-index-expr.shader_test @@ -97,3 +97,22 @@ todo probe all rgba (5.0, 6.0, 7.0, 8.0) uniform 0 float4 1 1 0 0 todo draw quad todo probe all rgba (9.0, 10.0, 11.0, 12.0) + + +[pixel shader todo] +int4 a; + +float4 main() : sv_target +{ + float4 arr[] = {10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120}; + + float4 tmp = float4(1, 2, 3, 4); + tmp.yz = arr[a.z].wx; + + return tmp; +} + +[test] +uniform 0 int4 0 0 2 0 +todo draw quad +probe all rgba (1.0, 120.0, 90.0, 4.0)
From: Francisco Casas fcasas@codeweavers.com
Some functions work with dereferences and need to know if they are lowered yet. Adding this field is more readable than checking if deref->offset or deref->data_type are not NULL, and helps following patches. --- libs/vkd3d-shader/hlsl.c | 15 +++++---------- libs/vkd3d-shader/hlsl.h | 2 ++ libs/vkd3d-shader/hlsl_codegen.c | 1 + 3 files changed, 8 insertions(+), 10 deletions(-)
diff --git a/libs/vkd3d-shader/hlsl.c b/libs/vkd3d-shader/hlsl.c index 0bfba35f4..9cbad6ad7 100644 --- a/libs/vkd3d-shader/hlsl.c +++ b/libs/vkd3d-shader/hlsl.c @@ -249,14 +249,7 @@ static enum hlsl_regset type_get_regset(const struct hlsl_type *type)
enum hlsl_regset hlsl_deref_get_regset(struct hlsl_ctx *ctx, const struct hlsl_deref *deref) { - struct hlsl_type *type; - - if (deref->data_type) - type = deref->data_type; - else - type = hlsl_deref_get_type(ctx, deref); - - return type_get_regset(type); + return type_get_regset(hlsl_deref_get_type(ctx, deref)); }
unsigned int hlsl_type_get_sm4_offset(const struct hlsl_type *type, unsigned int offset) @@ -520,6 +513,8 @@ static bool init_deref(struct hlsl_ctx *ctx, struct hlsl_deref *deref, struct hl deref->var = var; deref->path_len = path_len; deref->offset.node = NULL; + deref->is_lowered = false; + deref->data_type = NULL;
if (path_len == 0) { @@ -609,7 +604,7 @@ struct hlsl_type *hlsl_deref_get_type(struct hlsl_ctx *ctx, const struct hlsl_de
assert(deref);
- if (deref->offset.node) + if (deref->is_lowered) return deref->data_type;
type = deref->var->data_type; @@ -1350,7 +1345,7 @@ struct hlsl_ir_load *hlsl_new_load_index(struct hlsl_ctx *ctx, const struct hlsl struct hlsl_type *type; unsigned int i;
- assert(!deref->offset.node); + assert(!deref->is_lowered);
type = hlsl_deref_get_type(ctx, deref); if (idx) diff --git a/libs/vkd3d-shader/hlsl.h b/libs/vkd3d-shader/hlsl.h index e45256bce..b92ec9e4c 100644 --- a/libs/vkd3d-shader/hlsl.h +++ b/libs/vkd3d-shader/hlsl.h @@ -624,6 +624,8 @@ struct hlsl_deref * stored in the data_type field. */ struct hlsl_src offset; struct hlsl_type *data_type; + /* Whether the path has been lowered to an offset or not. */ + bool is_lowered; };
struct hlsl_ir_load diff --git a/libs/vkd3d-shader/hlsl_codegen.c b/libs/vkd3d-shader/hlsl_codegen.c index c0d18a3ef..629113867 100644 --- a/libs/vkd3d-shader/hlsl_codegen.c +++ b/libs/vkd3d-shader/hlsl_codegen.c @@ -146,6 +146,7 @@ static bool replace_deref_path_with_offset(struct hlsl_ctx *ctx, struct hlsl_der return true; }
+ deref->is_lowered = true; deref->data_type = type;
if (!(offset = new_offset_instr_from_deref(ctx, &block, deref, &instr->loc)))
From: Francisco Casas fcasas@codeweavers.com
This uint will be used for the following:
- Since SM4's relative addressing (the capability of passing a register as an index to another register) only has whole-register granularity, we will need to make the offset node express the offset in whole-registers and specify the register component in this uint, otherwise we would have to add additional / and % operations in the output binary.
- If, after we apply constant folding and copy propagation, we determine that the offset is a single constant node, we can store all the offset in this uint constant, and remove the offset src.
This allows DCE to remove a good bunch of the nodes previously required only for the offset constants, which makes the output more liteweight and readable, and simplifies the implementation of relative addressing when writing tpf in the following patches.
In dump_deref(), we use "c" to indicate components instead of whole registers. Since now both the offset node and the offset uint are in components a lowered deref would look like:
var[@42c + 2c]
But, once we express the offset node in whole registers we will remove the "c" from the node part:
var[@22 + 3c] --- libs/vkd3d-shader/hlsl.c | 13 ++++-- libs/vkd3d-shader/hlsl.h | 12 +++--- libs/vkd3d-shader/hlsl_codegen.c | 72 ++++++++++++++++++++------------ 3 files changed, 62 insertions(+), 35 deletions(-)
diff --git a/libs/vkd3d-shader/hlsl.c b/libs/vkd3d-shader/hlsl.c index 9cbad6ad7..9e3ee8174 100644 --- a/libs/vkd3d-shader/hlsl.c +++ b/libs/vkd3d-shader/hlsl.c @@ -513,6 +513,7 @@ static bool init_deref(struct hlsl_ctx *ctx, struct hlsl_deref *deref, struct hl deref->var = var; deref->path_len = path_len; deref->offset.node = NULL; + deref->offset_const = 0; deref->is_lowered = false; deref->data_type = NULL;
@@ -542,6 +543,7 @@ bool hlsl_init_deref_from_index_chain(struct hlsl_ctx *ctx, struct hlsl_deref *d deref->path = NULL; deref->path_len = 0; deref->offset.node = NULL; + deref->offset_const = 0;
assert(chain); if (chain->type == HLSL_IR_INDEX) @@ -1138,6 +1140,7 @@ void hlsl_cleanup_deref(struct hlsl_deref *deref) deref->path_len = 0;
hlsl_src_remove(&deref->offset); + deref->offset_const = 0; }
/* Initializes a simple variable dereference, so that it can be passed to load/store functions. */ @@ -2329,11 +2332,15 @@ static void dump_deref(struct vkd3d_string_buffer *buffer, const struct hlsl_der } vkd3d_string_buffer_printf(buffer, "]"); } - else if (deref->offset.node) + else if (deref->is_lowered) { vkd3d_string_buffer_printf(buffer, "["); - dump_src(buffer, &deref->offset); - vkd3d_string_buffer_printf(buffer, "]"); + if (deref->offset.node) + { + dump_src(buffer, &deref->offset); + vkd3d_string_buffer_printf(buffer, "c + "); + } + vkd3d_string_buffer_printf(buffer, "%uc]", deref->offset_const); } } else diff --git a/libs/vkd3d-shader/hlsl.h b/libs/vkd3d-shader/hlsl.h index b92ec9e4c..9a333844c 100644 --- a/libs/vkd3d-shader/hlsl.h +++ b/libs/vkd3d-shader/hlsl.h @@ -615,14 +615,16 @@ struct hlsl_deref unsigned int path_len; struct hlsl_src *path;
- /* Single instruction node of data type uint used to represent the register offset (in register - * components, within the pertaining regset), from the start of the variable, of the part - * referenced. - * The path is lowered to this single offset -- whose value may vary between SM1 and SM4 -- - * before writing the bytecode. + /* Before writing the bytecode, deref paths are lowered into an offset (within the pertaining + * regset) from the start of the variable, to the part of the variable that is referenced. + * This offset is stored using two fields, one for a variable part and other for a constant + * part, which are added together: + * - offset: An offset given by an instruction node, in number of register components. + * - offset_const: A constant number of register components. * Since the type information cannot longer be retrieved from the offset alone, the type is * stored in the data_type field. */ struct hlsl_src offset; + unsigned int offset_const; struct hlsl_type *data_type; /* Whether the path has been lowered to an offset or not. */ bool is_lowered; diff --git a/libs/vkd3d-shader/hlsl_codegen.c b/libs/vkd3d-shader/hlsl_codegen.c index 629113867..c519ac4ec 100644 --- a/libs/vkd3d-shader/hlsl_codegen.c +++ b/libs/vkd3d-shader/hlsl_codegen.c @@ -23,8 +23,8 @@
/* TODO: remove when no longer needed, only used for new_offset_instr_from_deref() */ static struct hlsl_ir_node *new_offset_from_path_index(struct hlsl_ctx *ctx, struct hlsl_block *block, - struct hlsl_type *type, struct hlsl_ir_node *offset, struct hlsl_ir_node *idx, - enum hlsl_regset regset, const struct vkd3d_shader_location *loc) + struct hlsl_type *type, struct hlsl_ir_node *base_offset, struct hlsl_ir_node *idx, + enum hlsl_regset regset, unsigned int *offset_component, const struct vkd3d_shader_location *loc) { struct hlsl_ir_node *idx_offset = NULL; struct hlsl_ir_node *c; @@ -34,7 +34,7 @@ static struct hlsl_ir_node *new_offset_from_path_index(struct hlsl_ctx *ctx, str switch (type->class) { case HLSL_CLASS_VECTOR: - idx_offset = idx; + *offset_component += hlsl_ir_constant(idx)->value.u[0].u; break;
case HLSL_CLASS_MATRIX: @@ -69,8 +69,16 @@ static struct hlsl_ir_node *new_offset_from_path_index(struct hlsl_ctx *ctx, str { unsigned int field_idx = hlsl_ir_constant(idx)->value.u[0].u; struct hlsl_struct_field *field = &type->e.record.fields[field_idx]; + unsigned int field_offset = field->reg_offset[regset];
- if (!(c = hlsl_new_uint_constant(ctx, field->reg_offset[regset], loc))) + if (regset == HLSL_REGSET_NUMERIC) + { + assert(*offset_component == 0); + *offset_component = field_offset % 4; + field_offset -= *offset_component; + } + + if (!(c = hlsl_new_uint_constant(ctx, field_offset, loc))) return NULL; hlsl_block_add_instr(block, c);
@@ -83,27 +91,33 @@ static struct hlsl_ir_node *new_offset_from_path_index(struct hlsl_ctx *ctx, str vkd3d_unreachable(); }
- if (offset) + if (idx_offset) { - if (!(idx_offset = hlsl_new_binary_expr(ctx, HLSL_OP2_ADD, offset, idx_offset))) + if (!(base_offset = hlsl_new_binary_expr(ctx, HLSL_OP2_ADD, base_offset, idx_offset))) return NULL; - hlsl_block_add_instr(block, idx_offset); + hlsl_block_add_instr(block, base_offset); }
- return idx_offset; + return base_offset; }
/* TODO: remove when no longer needed, only used for replace_deref_path_with_offset() */ static struct hlsl_ir_node *new_offset_instr_from_deref(struct hlsl_ctx *ctx, struct hlsl_block *block, - const struct hlsl_deref *deref, const struct vkd3d_shader_location *loc) + const struct hlsl_deref *deref, unsigned int *offset_component, const struct vkd3d_shader_location *loc) { enum hlsl_regset regset = hlsl_deref_get_regset(ctx, deref); - struct hlsl_ir_node *offset = NULL; + struct hlsl_ir_node *offset; struct hlsl_type *type; unsigned int i;
+ *offset_component = 0; + hlsl_block_init(block);
+ if (!(offset = hlsl_new_uint_constant(ctx, 0, loc))) + return NULL; + hlsl_block_add_instr(block, offset); + assert(deref->var); type = deref->var->data_type;
@@ -112,8 +126,11 @@ static struct hlsl_ir_node *new_offset_instr_from_deref(struct hlsl_ctx *ctx, st struct hlsl_block idx_block;
if (!(offset = new_offset_from_path_index(ctx, &idx_block, type, offset, deref->path[i].node, - regset, loc))) + regset, offset_component, loc))) + { + hlsl_block_cleanup(&idx_block); return NULL; + }
hlsl_block_add_block(block, &idx_block);
@@ -127,14 +144,16 @@ static struct hlsl_ir_node *new_offset_instr_from_deref(struct hlsl_ctx *ctx, st static bool replace_deref_path_with_offset(struct hlsl_ctx *ctx, struct hlsl_deref *deref, struct hlsl_ir_node *instr) { - struct hlsl_type *type; + unsigned int offset_component; struct hlsl_ir_node *offset; struct hlsl_block block; + struct hlsl_type *type;
assert(deref->var);
/* register offsets shouldn't be used before this point is reached. */ assert(!deref->offset.node); + assert(!deref->offset_const);
type = hlsl_deref_get_type(ctx, deref);
@@ -149,12 +168,13 @@ static bool replace_deref_path_with_offset(struct hlsl_ctx *ctx, struct hlsl_der deref->is_lowered = true; deref->data_type = type;
- if (!(offset = new_offset_instr_from_deref(ctx, &block, deref, &instr->loc))) + if (!(offset = new_offset_instr_from_deref(ctx, &block, deref, &offset_component, &instr->loc))) return false; list_move_before(&instr->entry, &block.instrs);
hlsl_cleanup_deref(deref); hlsl_src_from_node(&deref->offset, offset); + deref->offset_const = offset_component;
return true; } @@ -4194,30 +4214,28 @@ bool hlsl_regset_index_from_deref(struct hlsl_ctx *ctx, const struct hlsl_deref
bool hlsl_offset_from_deref(struct hlsl_ctx *ctx, const struct hlsl_deref *deref, unsigned int *offset) { + enum hlsl_regset regset = hlsl_deref_get_regset(ctx, deref); struct hlsl_ir_node *offset_node = deref->offset.node; - enum hlsl_regset regset; unsigned int size;
- if (!offset_node) - { - *offset = 0; - return true; - } + *offset = deref->offset_const;
- /* We should always have generated a cast to UINT. */ - assert(offset_node->data_type->class == HLSL_CLASS_SCALAR - && offset_node->data_type->base_type == HLSL_TYPE_UINT); + if (offset_node) + { + /* We should always have generated a cast to UINT. */ + assert(offset_node->data_type->class == HLSL_CLASS_SCALAR + && offset_node->data_type->base_type == HLSL_TYPE_UINT);
- if (offset_node->type != HLSL_IR_CONSTANT) - return false; + if (offset_node->type != HLSL_IR_CONSTANT) + return false;
- *offset = hlsl_ir_constant(offset_node)->value.u[0].u; - regset = hlsl_deref_get_regset(ctx, deref); + *offset += hlsl_ir_constant(offset_node)->value.u[0].u; + }
size = deref->var->data_type->reg_size[regset]; if (*offset >= size) { - hlsl_error(ctx, &deref->offset.node->loc, VKD3D_SHADER_ERROR_HLSL_OFFSET_OUT_OF_BOUNDS, + hlsl_error(ctx, &offset_node->loc, VKD3D_SHADER_ERROR_HLSL_OFFSET_OUT_OF_BOUNDS, "Dereference is out of bounds. %u/%u", *offset, size); return false; }
From: Francisco Casas fcasas@codeweavers.com
This is required to use SM4 relative addressing, because it is limited to whole-register granularity. --- libs/vkd3d-shader/hlsl.c | 2 +- libs/vkd3d-shader/hlsl.h | 2 +- libs/vkd3d-shader/hlsl_codegen.c | 19 +++++++++---------- 3 files changed, 11 insertions(+), 12 deletions(-)
diff --git a/libs/vkd3d-shader/hlsl.c b/libs/vkd3d-shader/hlsl.c index 9e3ee8174..cb5b23db3 100644 --- a/libs/vkd3d-shader/hlsl.c +++ b/libs/vkd3d-shader/hlsl.c @@ -2338,7 +2338,7 @@ static void dump_deref(struct vkd3d_string_buffer *buffer, const struct hlsl_der if (deref->offset.node) { dump_src(buffer, &deref->offset); - vkd3d_string_buffer_printf(buffer, "c + "); + vkd3d_string_buffer_printf(buffer, " + "); } vkd3d_string_buffer_printf(buffer, "%uc]", deref->offset_const); } diff --git a/libs/vkd3d-shader/hlsl.h b/libs/vkd3d-shader/hlsl.h index 9a333844c..d801da31c 100644 --- a/libs/vkd3d-shader/hlsl.h +++ b/libs/vkd3d-shader/hlsl.h @@ -619,7 +619,7 @@ struct hlsl_deref * regset) from the start of the variable, to the part of the variable that is referenced. * This offset is stored using two fields, one for a variable part and other for a constant * part, which are added together: - * - offset: An offset given by an instruction node, in number of register components. + * - offset: An offset given by an instruction node, in whole registers. * - offset_const: A constant number of register components. * Since the type information cannot longer be retrieved from the offset alone, the type is * stored in the data_type field. */ diff --git a/libs/vkd3d-shader/hlsl_codegen.c b/libs/vkd3d-shader/hlsl_codegen.c index c519ac4ec..9f4e367bd 100644 --- a/libs/vkd3d-shader/hlsl_codegen.c +++ b/libs/vkd3d-shader/hlsl_codegen.c @@ -39,14 +39,7 @@ static struct hlsl_ir_node *new_offset_from_path_index(struct hlsl_ctx *ctx, str
case HLSL_CLASS_MATRIX: { - if (!(c = hlsl_new_uint_constant(ctx, 4, loc))) - return NULL; - hlsl_block_add_instr(block, c); - - if (!(idx_offset = hlsl_new_binary_expr(ctx, HLSL_OP2_MUL, c, idx))) - return NULL; - hlsl_block_add_instr(block, idx_offset); - + idx_offset = idx; break; }
@@ -54,6 +47,9 @@ static struct hlsl_ir_node *new_offset_from_path_index(struct hlsl_ctx *ctx, str { unsigned int size = hlsl_type_get_array_element_reg_size(type->e.array.type, regset);
+ if (regset == HLSL_REGSET_NUMERIC) + size /= 4; + if (!(c = hlsl_new_uint_constant(ctx, size, loc))) return NULL; hlsl_block_add_instr(block, c); @@ -75,7 +71,7 @@ static struct hlsl_ir_node *new_offset_from_path_index(struct hlsl_ctx *ctx, str { assert(*offset_component == 0); *offset_component = field_offset % 4; - field_offset -= *offset_component; + field_offset /= 4; }
if (!(c = hlsl_new_uint_constant(ctx, field_offset, loc))) @@ -4229,7 +4225,10 @@ bool hlsl_offset_from_deref(struct hlsl_ctx *ctx, const struct hlsl_deref *deref if (offset_node->type != HLSL_IR_CONSTANT) return false;
- *offset += hlsl_ir_constant(offset_node)->value.u[0].u; + if (regset == HLSL_REGSET_NUMERIC) + *offset += 4 * hlsl_ir_constant(offset_node)->value.u[0].u; + else + *offset += hlsl_ir_constant(offset_node)->value.u[0].u; }
size = deref->var->data_type->reg_size[regset];
From: Francisco Casas fcasas@codeweavers.com
--- libs/vkd3d-shader/hlsl_codegen.c | 29 +++++++++++++++++++++-------- 1 file changed, 21 insertions(+), 8 deletions(-)
diff --git a/libs/vkd3d-shader/hlsl_codegen.c b/libs/vkd3d-shader/hlsl_codegen.c index 9f4e367bd..e25396a79 100644 --- a/libs/vkd3d-shader/hlsl_codegen.c +++ b/libs/vkd3d-shader/hlsl_codegen.c @@ -175,6 +175,24 @@ static bool replace_deref_path_with_offset(struct hlsl_ctx *ctx, struct hlsl_der return true; }
+static bool clean_constant_deref_offset_srcs(struct hlsl_ctx *ctx, struct hlsl_deref *deref, + struct hlsl_ir_node *instr) +{ + if (deref->offset.node && deref->offset.node->type == HLSL_IR_CONSTANT) + { + enum hlsl_regset regset = hlsl_deref_get_regset(ctx, deref); + + if (regset == HLSL_REGSET_NUMERIC) + deref->offset_const += 4 * hlsl_ir_constant(deref->offset.node)->value.u[0].u; + else + deref->offset_const += hlsl_ir_constant(deref->offset.node)->value.u[0].u; + hlsl_src_remove(&deref->offset); + return true; + } + return false; +} + + /* Split uniforms into two variables representing the constant and temp * registers, and copy the former to the latter, so that writes to uniforms * work. */ @@ -4221,14 +4239,8 @@ bool hlsl_offset_from_deref(struct hlsl_ctx *ctx, const struct hlsl_deref *deref /* We should always have generated a cast to UINT. */ assert(offset_node->data_type->class == HLSL_CLASS_SCALAR && offset_node->data_type->base_type == HLSL_TYPE_UINT); - - if (offset_node->type != HLSL_IR_CONSTANT) - return false; - - if (regset == HLSL_REGSET_NUMERIC) - *offset += 4 * hlsl_ir_constant(offset_node)->value.u[0].u; - else - *offset += hlsl_ir_constant(offset_node)->value.u[0].u; + assert(offset_node->type != HLSL_IR_CONSTANT); + return false; }
size = deref->var->data_type->reg_size[regset]; @@ -4487,6 +4499,7 @@ int hlsl_emit_bytecode(struct hlsl_ctx *ctx, struct hlsl_ir_function_decl *entry /* TODO: move forward, remove when no longer needed */ transform_derefs(ctx, replace_deref_path_with_offset, body); while (hlsl_transform_ir(ctx, hlsl_fold_constant_exprs, body, NULL)); + transform_derefs(ctx, clean_constant_deref_offset_srcs, body);
do compute_liveness(ctx, entry_func);
From: Francisco Casas fcasas@codeweavers.com
This field is now analogous to vkd3d_shader_register_index.rel_addr.
Also, it makes sense to rename it now because all the constant part of the offset is now handled to hlsl_deref.offset_const. Consequently, it may also be NULL now. --- libs/vkd3d-shader/hlsl.c | 18 +++++++-------- libs/vkd3d-shader/hlsl.h | 4 ++-- libs/vkd3d-shader/hlsl_codegen.c | 38 ++++++++++++++++---------------- 3 files changed, 30 insertions(+), 30 deletions(-)
diff --git a/libs/vkd3d-shader/hlsl.c b/libs/vkd3d-shader/hlsl.c index cb5b23db3..3618d3473 100644 --- a/libs/vkd3d-shader/hlsl.c +++ b/libs/vkd3d-shader/hlsl.c @@ -512,7 +512,7 @@ static bool init_deref(struct hlsl_ctx *ctx, struct hlsl_deref *deref, struct hl { deref->var = var; deref->path_len = path_len; - deref->offset.node = NULL; + deref->offset_rel.node = NULL; deref->offset_const = 0; deref->is_lowered = false; deref->data_type = NULL; @@ -542,7 +542,7 @@ bool hlsl_init_deref_from_index_chain(struct hlsl_ctx *ctx, struct hlsl_deref *d
deref->path = NULL; deref->path_len = 0; - deref->offset.node = NULL; + deref->offset_rel.node = NULL; deref->offset_const = 0;
assert(chain); @@ -1117,7 +1117,7 @@ bool hlsl_copy_deref(struct hlsl_ctx *ctx, struct hlsl_deref *deref, const struc if (!other) return true;
- assert(!other->offset.node); + assert(!other->offset_rel.node);
if (!init_deref(ctx, deref, other->var, other->path_len)) return false; @@ -1139,7 +1139,7 @@ void hlsl_cleanup_deref(struct hlsl_deref *deref) deref->path = NULL; deref->path_len = 0;
- hlsl_src_remove(&deref->offset); + hlsl_src_remove(&deref->offset_rel); deref->offset_const = 0; }
@@ -1175,7 +1175,7 @@ struct hlsl_ir_node *hlsl_new_store_index(struct hlsl_ctx *ctx, const struct hls unsigned int i;
assert(lhs); - assert(!lhs->offset.node); + assert(!lhs->offset_rel.node);
if (!(store = hlsl_alloc(ctx, sizeof(*store)))) return NULL; @@ -1621,7 +1621,7 @@ static bool clone_deref(struct hlsl_ctx *ctx, struct clone_instr_map *map, { unsigned int i;
- assert(!src->offset.node); + assert(!src->offset_rel.node);
if (!init_deref(ctx, dst, src->var, src->path_len)) return false; @@ -2335,9 +2335,9 @@ static void dump_deref(struct vkd3d_string_buffer *buffer, const struct hlsl_der else if (deref->is_lowered) { vkd3d_string_buffer_printf(buffer, "["); - if (deref->offset.node) + if (deref->offset_rel.node) { - dump_src(buffer, &deref->offset); + dump_src(buffer, &deref->offset_rel); vkd3d_string_buffer_printf(buffer, " + "); } vkd3d_string_buffer_printf(buffer, "%uc]", deref->offset_const); @@ -2875,7 +2875,7 @@ static void free_ir_resource_load(struct hlsl_ir_resource_load *load)
static void free_ir_resource_store(struct hlsl_ir_resource_store *store) { - hlsl_src_remove(&store->resource.offset); + hlsl_src_remove(&store->resource.offset_rel); hlsl_src_remove(&store->coords); hlsl_src_remove(&store->value); vkd3d_free(store); diff --git a/libs/vkd3d-shader/hlsl.h b/libs/vkd3d-shader/hlsl.h index d801da31c..c35af4229 100644 --- a/libs/vkd3d-shader/hlsl.h +++ b/libs/vkd3d-shader/hlsl.h @@ -619,11 +619,11 @@ struct hlsl_deref * regset) from the start of the variable, to the part of the variable that is referenced. * This offset is stored using two fields, one for a variable part and other for a constant * part, which are added together: - * - offset: An offset given by an instruction node, in whole registers. + * - offset_rel: An offset given by an instruction node, in whole registers. * - offset_const: A constant number of register components. * Since the type information cannot longer be retrieved from the offset alone, the type is * stored in the data_type field. */ - struct hlsl_src offset; + struct hlsl_src offset_rel; unsigned int offset_const; struct hlsl_type *data_type; /* Whether the path has been lowered to an offset or not. */ diff --git a/libs/vkd3d-shader/hlsl_codegen.c b/libs/vkd3d-shader/hlsl_codegen.c index e25396a79..fc21cd0a4 100644 --- a/libs/vkd3d-shader/hlsl_codegen.c +++ b/libs/vkd3d-shader/hlsl_codegen.c @@ -148,7 +148,7 @@ static bool replace_deref_path_with_offset(struct hlsl_ctx *ctx, struct hlsl_der assert(deref->var);
/* register offsets shouldn't be used before this point is reached. */ - assert(!deref->offset.node); + assert(!deref->offset_rel.node); assert(!deref->offset_const);
type = hlsl_deref_get_type(ctx, deref); @@ -169,7 +169,7 @@ static bool replace_deref_path_with_offset(struct hlsl_ctx *ctx, struct hlsl_der list_move_before(&instr->entry, &block.instrs);
hlsl_cleanup_deref(deref); - hlsl_src_from_node(&deref->offset, offset); + hlsl_src_from_node(&deref->offset_rel, offset); deref->offset_const = offset_component;
return true; @@ -178,15 +178,15 @@ static bool replace_deref_path_with_offset(struct hlsl_ctx *ctx, struct hlsl_der static bool clean_constant_deref_offset_srcs(struct hlsl_ctx *ctx, struct hlsl_deref *deref, struct hlsl_ir_node *instr) { - if (deref->offset.node && deref->offset.node->type == HLSL_IR_CONSTANT) + if (deref->offset_rel.node && deref->offset_rel.node->type == HLSL_IR_CONSTANT) { enum hlsl_regset regset = hlsl_deref_get_regset(ctx, deref);
if (regset == HLSL_REGSET_NUMERIC) - deref->offset_const += 4 * hlsl_ir_constant(deref->offset.node)->value.u[0].u; + deref->offset_const += 4 * hlsl_ir_constant(deref->offset_rel.node)->value.u[0].u; else - deref->offset_const += hlsl_ir_constant(deref->offset.node)->value.u[0].u; - hlsl_src_remove(&deref->offset); + deref->offset_const += hlsl_ir_constant(deref->offset_rel.node)->value.u[0].u; + hlsl_src_remove(&deref->offset_rel); return true; } return false; @@ -3088,8 +3088,8 @@ static void compute_liveness_recurse(struct hlsl_block *block, unsigned int loop if (!var->first_write) var->first_write = loop_first ? min(instr->index, loop_first) : instr->index; store->rhs.node->last_read = last_read; - if (store->lhs.offset.node) - store->lhs.offset.node->last_read = last_read; + if (store->lhs.offset_rel.node) + store->lhs.offset_rel.node->last_read = last_read; break; } case HLSL_IR_EXPR: @@ -3116,8 +3116,8 @@ static void compute_liveness_recurse(struct hlsl_block *block, unsigned int loop
var = load->src.var; var->last_read = max(var->last_read, last_read); - if (load->src.offset.node) - load->src.offset.node->last_read = last_read; + if (load->src.offset_rel.node) + load->src.offset_rel.node->last_read = last_read; break; } case HLSL_IR_LOOP: @@ -3134,14 +3134,14 @@ static void compute_liveness_recurse(struct hlsl_block *block, unsigned int loop
var = load->resource.var; var->last_read = max(var->last_read, last_read); - if (load->resource.offset.node) - load->resource.offset.node->last_read = last_read; + if (load->resource.offset_rel.node) + load->resource.offset_rel.node->last_read = last_read;
if ((var = load->sampler.var)) { var->last_read = max(var->last_read, last_read); - if (load->sampler.offset.node) - load->sampler.offset.node->last_read = last_read; + if (load->sampler.offset_rel.node) + load->sampler.offset_rel.node->last_read = last_read; }
if (load->coords.node) @@ -3166,8 +3166,8 @@ static void compute_liveness_recurse(struct hlsl_block *block, unsigned int loop
var = store->resource.var; var->last_read = max(var->last_read, last_read); - if (store->resource.offset.node) - store->resource.offset.node->last_read = last_read; + if (store->resource.offset_rel.node) + store->resource.offset_rel.node->last_read = last_read; store->coords.node->last_read = last_read; store->value.node->last_read = last_read; break; @@ -4229,7 +4229,7 @@ bool hlsl_regset_index_from_deref(struct hlsl_ctx *ctx, const struct hlsl_deref bool hlsl_offset_from_deref(struct hlsl_ctx *ctx, const struct hlsl_deref *deref, unsigned int *offset) { enum hlsl_regset regset = hlsl_deref_get_regset(ctx, deref); - struct hlsl_ir_node *offset_node = deref->offset.node; + struct hlsl_ir_node *offset_node = deref->offset_rel.node; unsigned int size;
*offset = deref->offset_const; @@ -4261,8 +4261,8 @@ unsigned int hlsl_offset_from_deref_safe(struct hlsl_ctx *ctx, const struct hlsl if (hlsl_offset_from_deref(ctx, deref, &offset)) return offset;
- hlsl_fixme(ctx, &deref->offset.node->loc, "Dereference with non-constant offset of type %s.", - hlsl_node_type_to_string(deref->offset.node->type)); + hlsl_fixme(ctx, &deref->offset_rel.node->loc, "Dereference with non-constant offset of type %s.", + hlsl_node_type_to_string(deref->offset_rel.node->type));
return 0; }
From: Francisco Casas fcasas@codeweavers.com
--- libs/vkd3d-shader/hlsl.h | 3 +++ libs/vkd3d-shader/hlsl_codegen.c | 15 +++++++++++++++ 2 files changed, 18 insertions(+)
diff --git a/libs/vkd3d-shader/hlsl.h b/libs/vkd3d-shader/hlsl.h index c35af4229..420364028 100644 --- a/libs/vkd3d-shader/hlsl.h +++ b/libs/vkd3d-shader/hlsl.h @@ -422,6 +422,9 @@ struct hlsl_ir_var * It may be less than the allocation size, e.g. for texture arrays. */ unsigned int bind_count[HLSL_REGSET_LAST_OBJECT + 1];
+ /* Whether the shader performs dereferences with non-constant offsets in the variable. */ + bool indexable; + uint32_t is_input_semantic : 1; uint32_t is_output_semantic : 1; uint32_t is_uniform : 1; diff --git a/libs/vkd3d-shader/hlsl_codegen.c b/libs/vkd3d-shader/hlsl_codegen.c index fc21cd0a4..22fdf9a0b 100644 --- a/libs/vkd3d-shader/hlsl_codegen.c +++ b/libs/vkd3d-shader/hlsl_codegen.c @@ -3000,6 +3000,19 @@ static void dump_function(struct rb_entry *entry, void *context) rb_for_each_entry(&func->overloads, dump_function_decl, ctx); }
+static bool mark_indexable_vars(struct hlsl_ctx *ctx, struct hlsl_deref *deref, + struct hlsl_ir_node *instr) +{ + if (!deref->offset_rel.node) + return false; + + assert(deref->var); + assert(deref->offset_rel.node->type != HLSL_IR_CONSTANT); + deref->var->indexable = true; + + return true; +} + static char get_regset_name(enum hlsl_regset regset) { switch (regset) @@ -4510,6 +4523,8 @@ int hlsl_emit_bytecode(struct hlsl_ctx *ctx, struct hlsl_ir_function_decl *entry if (TRACE_ON()) rb_for_each_entry(&ctx->functions, dump_function, ctx);
+ transform_derefs(ctx, mark_indexable_vars, body); + calculate_resource_register_counts(ctx);
allocate_register_reservations(ctx);
From: Francisco Casas fcasas@codeweavers.com
--- libs/vkd3d-shader/tpf.c | 36 +++++++++++++++++++++++++++++++++++- 1 file changed, 35 insertions(+), 1 deletion(-)
diff --git a/libs/vkd3d-shader/tpf.c b/libs/vkd3d-shader/tpf.c index 0fc16b4c7..90b176003 100644 --- a/libs/vkd3d-shader/tpf.c +++ b/libs/vkd3d-shader/tpf.c @@ -3773,7 +3773,7 @@ static void sm4_register_from_deref(struct hlsl_ctx *ctx, struct vkd3d_shader_re struct hlsl_reg hlsl_reg = hlsl_reg_from_deref(ctx, deref);
assert(hlsl_reg.allocated); - reg->type = VKD3DSPR_TEMP; + reg->type = deref->var->indexable ? VKD3DSPR_IDXTEMP : VKD3DSPR_TEMP; reg->dimension = VSIR_DIMENSION_VEC4; reg->idx[0].offset = hlsl_reg.id; reg->idx_count = 1; @@ -4266,6 +4266,20 @@ static void write_sm4_dcl_temps(const struct tpf_writer *tpf, uint32_t temp_coun write_sm4_instruction(tpf, &instr); }
+static void write_sm4_dcl_indexable_temp(const struct tpf_writer *tpf, uint32_t idx, + uint32_t size, uint32_t comp_count) +{ + struct sm4_instruction instr = + { + .opcode = VKD3D_SM4_OP_DCL_INDEXABLE_TEMP, + + .idx = {idx, size, comp_count}, + .idx_count = 3, + }; + + write_sm4_instruction(tpf, &instr); +} + static void write_sm4_dcl_thread_group(const struct tpf_writer *tpf, const uint32_t thread_count[3]) { struct sm4_instruction instr = @@ -5583,6 +5597,7 @@ static void write_sm4_shdr(struct hlsl_ctx *ctx, struct extern_resource *extern_resources; unsigned int extern_resources_count, i; const struct hlsl_buffer *cbuffer; + const struct hlsl_scope *scope; const struct hlsl_ir_var *var; size_t token_count_position; struct tpf_writer tpf; @@ -5637,6 +5652,25 @@ static void write_sm4_shdr(struct hlsl_ctx *ctx, if (ctx->temp_count) write_sm4_dcl_temps(&tpf, ctx->temp_count);
+ LIST_FOR_EACH_ENTRY(scope, &ctx->scopes, struct hlsl_scope, entry) + { + LIST_FOR_EACH_ENTRY(var, &scope->vars, struct hlsl_ir_var, scope_entry) + { + if (var->is_uniform || var->is_input_semantic || var->is_output_semantic) + continue; + if (!var->regs[HLSL_REGSET_NUMERIC].allocated) + continue; + + if (var->indexable) + { + unsigned int id = var->regs[HLSL_REGSET_NUMERIC].id; + unsigned int size = align(var->data_type->reg_size[HLSL_REGSET_NUMERIC], 4) / 4; + + write_sm4_dcl_indexable_temp(&tpf, id, size, 4); + } + } + } + write_sm4_block(&tpf, &entry_func->body);
write_sm4_ret(&tpf);
Giovanni Mascellani (@giomasce) commented about libs/vkd3d-shader/hlsl.h:
* stored in the data_type field. */ struct hlsl_src offset; struct hlsl_type *data_type;
- /* Whether the path has been lowered to an offset or not. */
- bool is_lowered;
Instead of adding another field which just duplicates information, could you use a helper? It should serve the same documentation purpose.
Giovanni Mascellani (@giomasce) commented about libs/vkd3d-shader/hlsl.c:
} vkd3d_string_buffer_printf(buffer, "]"); }
else if (deref->offset.node)
else if (deref->is_lowered) { vkd3d_string_buffer_printf(buffer, "[");
dump_src(buffer, &deref->offset);
vkd3d_string_buffer_printf(buffer, "]");
if (deref->offset.node)
{
dump_src(buffer, &deref->offset);
vkd3d_string_buffer_printf(buffer, "c + ");
}
vkd3d_string_buffer_printf(buffer, "%uc]", deref->offset_const);
I wouldn't object to only printing the constant offset part if it is different from zero (unless, of course, the relative part is NULL too).
Giovanni Mascellani (@giomasce) commented about libs/vkd3d-shader/hlsl.h:
- /* Single instruction node of data type uint used to represent the register offset (in register
* components, within the pertaining regset), from the start of the variable, of the part
* referenced.
* The path is lowered to this single offset -- whose value may vary between SM1 and SM4 --
* before writing the bytecode.
- /* Before writing the bytecode, deref paths are lowered into an offset (within the pertaining
* regset) from the start of the variable, to the part of the variable that is referenced.
* This offset is stored using two fields, one for a variable part and other for a constant
* part, which are added together:
* - offset: An offset given by an instruction node, in number of register components.
struct hlsl_src offset;* - offset_const: A constant number of register components. * Since the type information cannot longer be retrieved from the offset alone, the type is * stored in the data_type field. */
- unsigned int offset_const;
Some bike shedding: `const_offset` sounds better to me, since I perceive `const` as an adjective.
Giovanni Mascellani (@giomasce) commented about libs/vkd3d-shader/hlsl_codegen.c:
switch (type->class) { case HLSL_CLASS_VECTOR:
idx_offset = idx;
*offset_component += hlsl_ir_constant(idx)->value.u[0].u;
So many flames to debate whether spaces or tabs should be used for indentation, and now we're turning to asterisks?!? :-P
Giovanni Mascellani (@giomasce) commented about libs/vkd3d-shader/hlsl_codegen.c:
struct hlsl_block idx_block; if (!(offset = new_offset_from_path_index(ctx, &idx_block, type, offset, deref->path[i].node,
regset, loc)))
regset, offset_component, loc)))
{
hlsl_block_cleanup(&idx_block);
Good to cleanup the node, but my feeling is that should be done at the same level it is initialized, i.e., inside `new_offset_from_path_index()`. To me keeping a consistent convention makes it easier to spot problems. Don't know what the others think about that. Another good option is to also initialize the block outside of `new_offset_from_path_index()`.
Giovanni Mascellani (@giomasce) commented about libs/vkd3d-shader/hlsl_codegen.c:
{ unsigned int size = hlsl_type_get_array_element_reg_size(type->e.array.type, regset);
if (regset == HLSL_REGSET_NUMERIC)
size /= 4;
It's pretty obvious, but can you `assert(size % 4 == 0)` please?
Small typo in "vkd3d-shader/hlsl: Absorv hlsl_ir_constant deref offsets into offset_const.": "Absorv" instead of "Absorb".
Giovanni Mascellani (@giomasce) commented about libs/vkd3d-shader/tpf.c:
if (ctx->temp_count) write_sm4_dcl_temps(&tpf, ctx->temp_count);
- LIST_FOR_EACH_ENTRY(scope, &ctx->scopes, struct hlsl_scope, entry)
- {
LIST_FOR_EACH_ENTRY(var, &scope->vars, struct hlsl_ir_var, scope_entry)
{
if (var->is_uniform || var->is_input_semantic || var->is_output_semantic)
continue;
if (!var->regs[HLSL_REGSET_NUMERIC].allocated)
continue;
if (var->indexable)
{
unsigned int id = var->regs[HLSL_REGSET_NUMERIC].id;
I'm not sure this is correct: it seems that the `r` and `x` registers are two independent namespaces, so the `id` that was allocated for the variable as an `r` register has no meaning if you want it to use it as an `x` register. This has a few undesirable implications: * You're wasting registers, because an `r` register makes it impossible to use the corresponding `x` registers and viceversa. This might be minor, I guess, since it's quite likely that a downstream compiler is going to redo register allocation anyway. But it's not pretty. * More importantly, you might end up allocating two different variables (with independent lifespans) on the same or conflicting `x` registers, therefore creating duplicate and possibly contradicting `dcl_indexable_temp` instructions. That doesn't look right.
I think that the proper way forward here is to skip indexable variables from the usual register allocation algorithm and have another round of `x` register allocation for them, in which I think that lifetime doesn't have to be considered (i.e., variables must use independent registers even if they never are alive at the same time). This last bit might require some independent confirmation: I remember it from past experiments, but I might be mistaken.
Mostly ok, except a handful of random comments and a slighter large issue in the last commit.