[PATCH 0/4] MR229: vkd3d-shader/hlsl: Support non-constant offset dereferences, part 2

List overview All Threads

newer

older

Re: [PATCH v2 0/1] MR3939:...

[PATCH 0/3] MR4024:...

Francisco Casas (＠fcasas)

9 Jun 2023 9 Jun '23

3 p.m.

Support for relative addressing in SM4.

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/229

Show replies by date

Francisco Casas

9 Jun 9 Jun

3 p.m.

New subject: [PATCH 1/4] vkd3d-shader/hlsl: Mark vars that require non-constant dereferences.

From: Francisco Casas fcasas@codeweavers.com

--- libs/vkd3d-shader/hlsl.h | 3 +++ libs/vkd3d-shader/hlsl_codegen.c | 18 ++++++++++++++++++ 2 files changed, 21 insertions(+)

diff --git a/libs/vkd3d-shader/hlsl.h b/libs/vkd3d-shader/hlsl.h index 7c4e74dc..107ee153 100644 --- a/libs/vkd3d-shader/hlsl.h +++ b/libs/vkd3d-shader/hlsl.h @@ -418,6 +418,9 @@ struct hlsl_ir_var struct vkd3d_shader_location first_sampler_dim_loc; } *objects_usage[HLSL_REGSET_LAST_OBJECT + 1];

+ /* Whether the shader performs dereferences with non-constant offsets in the variable. */ + bool indexable; + uint32_t is_input_semantic : 1; uint32_t is_output_semantic : 1; uint32_t is_uniform : 1; diff --git a/libs/vkd3d-shader/hlsl_codegen.c b/libs/vkd3d-shader/hlsl_codegen.c index 4317604b..2dc92f00 100644 --- a/libs/vkd3d-shader/hlsl_codegen.c +++ b/libs/vkd3d-shader/hlsl_codegen.c @@ -2670,6 +2670,22 @@ static void dump_function(struct rb_entry *entry, void *context) rb_for_each_entry(&func->overloads, dump_function_decl, ctx); }

+static bool mark_indexable_vars(struct hlsl_ctx *ctx, struct hlsl_deref *deref, + struct hlsl_ir_node *instr) +{ + if (!deref->offset.node) + return false; + + if (deref->offset.node->type != HLSL_IR_CONSTANT) + { + assert(deref->var); + deref->var->indexable = true; + return true; + } + + return false; +} + static char get_regset_name(enum hlsl_regset regset) { switch (regset) @@ -4112,6 +4128,8 @@ int hlsl_emit_bytecode(struct hlsl_ctx *ctx, struct hlsl_ir_function_decl *entry if (TRACE_ON()) rb_for_each_entry(&ctx->functions, dump_function, ctx);

+ transform_derefs(ctx, mark_indexable_vars, body); + allocate_register_reservations(ctx);

calculate_resource_register_counts(ctx);

-- GitLab https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/229

Francisco Casas

3 p.m.

New subject: [PATCH 2/4] vkd3d-shader/hlsl: Express deref->offset in whole registers.

From: Francisco Casas fcasas@codeweavers.com

--- libs/vkd3d-shader/hlsl.c | 11 ++++- libs/vkd3d-shader/hlsl.h | 11 ++--- libs/vkd3d-shader/hlsl_codegen.c | 72 +++++++++++++++++++------------- 3 files changed, 59 insertions(+), 35 deletions(-)

diff --git a/libs/vkd3d-shader/hlsl.c b/libs/vkd3d-shader/hlsl.c index f439c9f3..80fa725a 100644 --- a/libs/vkd3d-shader/hlsl.c +++ b/libs/vkd3d-shader/hlsl.c @@ -435,6 +435,7 @@ static bool init_deref(struct hlsl_ctx *ctx, struct hlsl_deref *deref, struct hl { deref->var = var; deref->path_len = path_len; + deref->offset_const = 0; deref->offset.node = NULL;

if (path_len == 0) @@ -462,6 +463,7 @@ bool hlsl_init_deref_from_index_chain(struct hlsl_ctx *ctx, struct hlsl_deref *d

deref->path = NULL; deref->path_len = 0; + deref->offset_const = 0; deref->offset.node = NULL;

assert(chain); @@ -2210,10 +2212,15 @@ static void dump_deref(struct vkd3d_string_buffer *buffer, const struct hlsl_der } vkd3d_string_buffer_printf(buffer, "]"); } - else if (deref->offset.node) + else if (deref->offset.node || deref->offset_const) { vkd3d_string_buffer_printf(buffer, "["); - dump_src(buffer, &deref->offset); + if (deref->offset.node) + dump_src(buffer, &deref->offset); + if (deref->offset.node && deref->offset_const) + vkd3d_string_buffer_printf(buffer, " + "); + if (deref->offset_const) + vkd3d_string_buffer_printf(buffer, "%u/4", deref->offset_const); vkd3d_string_buffer_printf(buffer, "]"); } } diff --git a/libs/vkd3d-shader/hlsl.h b/libs/vkd3d-shader/hlsl.h index 107ee153..09a9654f 100644 --- a/libs/vkd3d-shader/hlsl.h +++ b/libs/vkd3d-shader/hlsl.h @@ -599,12 +599,13 @@ struct hlsl_deref unsigned int path_len; struct hlsl_src *path;

- /* Single instruction node of data type uint used to represent the register offset (in register - * components, within the pertaining regset), from the start of the variable, of the part - * referenced. - * The path is lowered to this single offset -- whose value may vary between SM1 and SM4 -- - * before writing the bytecode. */ + /* Before writing the bytecode, the deref path is lowered into an offset from the start of the + * variable (within the pertaining regset), to the part of the variable that is referenced. + * This offset is composed of two parts, which are added together: + * - offset: An instruction node, storing the offset in whole registers. + * - offset_const: A constant offset, in number of register components. */ struct hlsl_src offset; + unsigned int offset_const; enum hlsl_regset offset_regset; };

diff --git a/libs/vkd3d-shader/hlsl_codegen.c b/libs/vkd3d-shader/hlsl_codegen.c index 2dc92f00..d4b7243a 100644 --- a/libs/vkd3d-shader/hlsl_codegen.c +++ b/libs/vkd3d-shader/hlsl_codegen.c @@ -23,8 +23,8 @@

/* TODO: remove when no longer needed, only used for new_offset_instr_from_deref() */ static struct hlsl_ir_node *new_offset_from_path_index(struct hlsl_ctx *ctx, struct hlsl_block *block, - struct hlsl_type *type, struct hlsl_ir_node *offset, struct hlsl_ir_node *idx, - enum hlsl_regset regset, const struct vkd3d_shader_location *loc) + struct hlsl_type *type, struct hlsl_ir_node *base_offset, struct hlsl_ir_node *idx, + enum hlsl_regset regset, unsigned int *offset_component, const struct vkd3d_shader_location *loc) { struct hlsl_ir_node *idx_offset = NULL; struct hlsl_ir_node *c; @@ -34,27 +34,21 @@ static struct hlsl_ir_node *new_offset_from_path_index(struct hlsl_ctx *ctx, str switch (type->class) { case HLSL_CLASS_VECTOR: - idx_offset = idx; + *offset_component += hlsl_ir_constant(idx)->value.u[0].u; break;

case HLSL_CLASS_MATRIX: - { - if (!(c = hlsl_new_uint_constant(ctx, 4, loc))) - return NULL; - hlsl_block_add_instr(block, c); - - if (!(idx_offset = hlsl_new_binary_expr(ctx, HLSL_OP2_MUL, c, idx))) - return NULL; - hlsl_block_add_instr(block, idx_offset); - + idx_offset = idx; break; - }

case HLSL_CLASS_ARRAY: { - unsigned int size = hlsl_type_get_array_element_reg_size(type->e.array.type, regset); + unsigned int elem_size = hlsl_type_get_array_element_reg_size(type->e.array.type, regset); + + if (regset == HLSL_REGSET_NUMERIC) + elem_size /= 4;

- if (!(c = hlsl_new_uint_constant(ctx, size, loc))) + if (!(c = hlsl_new_uint_constant(ctx, elem_size, loc))) return NULL; hlsl_block_add_instr(block, c);

@@ -69,8 +63,16 @@ static struct hlsl_ir_node *new_offset_from_path_index(struct hlsl_ctx *ctx, str { unsigned int field_idx = hlsl_ir_constant(idx)->value.u[0].u; struct hlsl_struct_field *field = &type->e.record.fields[field_idx]; + unsigned int field_offset = field->reg_offset[regset];

- if (!(c = hlsl_new_uint_constant(ctx, field->reg_offset[regset], loc))) + if (regset == HLSL_REGSET_NUMERIC) + { + assert(*offset_component == 0); + *offset_component = field_offset % 4; + field_offset /= 4; + } + + if (!(c = hlsl_new_uint_constant(ctx, field_offset, loc))) return NULL; hlsl_block_add_instr(block, c);

@@ -83,26 +85,33 @@ static struct hlsl_ir_node *new_offset_from_path_index(struct hlsl_ctx *ctx, str vkd3d_unreachable(); }

- if (offset) + if (idx_offset) { - if (!(idx_offset = hlsl_new_binary_expr(ctx, HLSL_OP2_ADD, offset, idx_offset))) + if (!(base_offset = hlsl_new_binary_expr(ctx, HLSL_OP2_ADD, base_offset, idx_offset))) return NULL; - hlsl_block_add_instr(block, idx_offset); + hlsl_block_add_instr(block, base_offset); }

- return idx_offset; + return base_offset; }

/* TODO: remove when no longer needed, only used for replace_deref_path_with_offset() */ static struct hlsl_ir_node *new_offset_instr_from_deref(struct hlsl_ctx *ctx, struct hlsl_block *block, - const struct hlsl_deref *deref, const struct vkd3d_shader_location *loc) + const struct hlsl_deref *deref, unsigned int *offset_component, const struct vkd3d_shader_location *loc) { - struct hlsl_ir_node *offset = NULL; + struct hlsl_ir_node *offset, *zero; struct hlsl_type *type; unsigned int i;

+ *offset_component = 0; + hlsl_block_init(block);

+ if (!(zero = hlsl_new_uint_constant(ctx, 0, loc))) + return NULL; + hlsl_block_add_instr(block, zero); + offset = zero; + assert(deref->var); type = deref->var->data_type;

@@ -111,8 +120,11 @@ static struct hlsl_ir_node *new_offset_instr_from_deref(struct hlsl_ctx *ctx, st struct hlsl_block idx_block;

if (!(offset = new_offset_from_path_index(ctx, &idx_block, type, offset, deref->path[i].node, - deref->offset_regset, loc))) + deref->offset_regset, offset_component, loc))) + { + hlsl_block_cleanup(&idx_block); return NULL; + }

hlsl_block_add_block(block, &idx_block);

@@ -126,6 +138,7 @@ static struct hlsl_ir_node *new_offset_instr_from_deref(struct hlsl_ctx *ctx, st static bool replace_deref_path_with_offset(struct hlsl_ctx *ctx, struct hlsl_deref *deref, struct hlsl_ir_node *instr) { + unsigned int offset_component; const struct hlsl_type *type; struct hlsl_ir_node *offset; struct hlsl_block block; @@ -147,12 +160,13 @@ static bool replace_deref_path_with_offset(struct hlsl_ctx *ctx, struct hlsl_der

deref->offset_regset = hlsl_type_get_regset(type);

- if (!(offset = new_offset_instr_from_deref(ctx, &block, deref, &instr->loc))) + if (!(offset = new_offset_instr_from_deref(ctx, &block, deref, &offset_component, &instr->loc))) return false; list_move_before(&instr->entry, &block.instrs);

hlsl_cleanup_deref(deref); hlsl_src_from_node(&deref->offset, offset); + deref->offset_const = offset_component;

return true; } @@ -3860,11 +3874,10 @@ bool hlsl_offset_from_deref(struct hlsl_ctx *ctx, const struct hlsl_deref *deref struct hlsl_ir_node *offset_node = deref->offset.node; unsigned int size;

+ *offset = deref->offset_const; + if (!offset_node) - { - *offset = 0; return true; - }

/* We should always have generated a cast to UINT. */ assert(offset_node->data_type->class == HLSL_CLASS_SCALAR @@ -3873,7 +3886,10 @@ bool hlsl_offset_from_deref(struct hlsl_ctx *ctx, const struct hlsl_deref *deref if (offset_node->type != HLSL_IR_CONSTANT) return false;

- *offset = hlsl_ir_constant(offset_node)->value.u[0].u; + if (deref->offset_regset == HLSL_REGSET_NUMERIC) + *offset += 4 * hlsl_ir_constant(offset_node)->value.u[0].u; + else + *offset += hlsl_ir_constant(offset_node)->value.u[0].u;

size = deref->var->data_type->reg_size[deref->offset_regset]; if (*offset >= size)

-- GitLab https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/229

Francisco Casas

3 p.m.

New subject: [PATCH 3/4] vkd3d-shader/hlsl: Remove constant deref offset srcs.

From: Francisco Casas fcasas@codeweavers.com

This allows to remove offset nodes from the IR, making it more readable. --- libs/vkd3d-shader/hlsl_codegen.c | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+)

diff --git a/libs/vkd3d-shader/hlsl_codegen.c b/libs/vkd3d-shader/hlsl_codegen.c index d4b7243a..d5a4ab25 100644 --- a/libs/vkd3d-shader/hlsl_codegen.c +++ b/libs/vkd3d-shader/hlsl_codegen.c @@ -171,6 +171,23 @@ static bool replace_deref_path_with_offset(struct hlsl_ctx *ctx, struct hlsl_der return true; }

+/* TODO: remove when no longer needed. */ +static bool clean_constant_deref_offset_srcs(struct hlsl_ctx *ctx, struct hlsl_deref *deref, + struct hlsl_ir_node *instr) +{ + if (deref->offset.node && deref->offset.node->type == HLSL_IR_CONSTANT) + { + if (deref->offset_regset == HLSL_REGSET_NUMERIC) + deref->offset_const += 4 * hlsl_ir_constant(deref->offset.node)->value.u[0].u; + else + deref->offset_const += hlsl_ir_constant(deref->offset.node)->value.u[0].u; + hlsl_src_remove(&deref->offset); + return true; + } + return false; +} + + /* Split uniforms into two variables representing the constant and temp * registers, and copy the former to the latter, so that writes to uniforms * work. */ @@ -4134,6 +4151,7 @@ int hlsl_emit_bytecode(struct hlsl_ctx *ctx, struct hlsl_ir_function_decl *entry /* TODO: move forward, remove when no longer needed */ transform_derefs(ctx, replace_deref_path_with_offset, body); while (hlsl_transform_ir(ctx, hlsl_fold_constant_exprs, body, NULL)); + transform_derefs(ctx, clean_constant_deref_offset_srcs, body);

do compute_liveness(ctx, entry_func);

-- GitLab https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/229

Francisco Casas

3 p.m.

New subject: [PATCH 4/4] vkd3d-shader/tpf: Support relative addressing in sm4.

From: Francisco Casas fcasas@codeweavers.com

The idea of the struct sm4_xregister is that it is a regular struct sm4_register but that can also contain other registers for relative addressing in their indexes. --- libs/vkd3d-shader/tpf.c | 289 ++++++++++++++++++++----- tests/array-index-expr.shader_test | 36 +-- tests/function-return.shader_test | 22 +- tests/hlsl-matrix-indexing.shader_test | 6 +- tests/return.shader_test | 22 +- 5 files changed, 283 insertions(+), 92 deletions(-)

diff --git a/libs/vkd3d-shader/tpf.c b/libs/vkd3d-shader/tpf.c index d066b13e..5b1b96a2 100644 --- a/libs/vkd3d-shader/tpf.c +++ b/libs/vkd3d-shader/tpf.c @@ -3316,6 +3316,16 @@ struct sm4_register unsigned int mod; };

+struct sm4_xregister +{ + struct sm4_register r; + + /* Relative addressing to be added to the constant offsets in r.idx[·]. */ + bool idx_has_reg[3]; + struct sm4_register idx_reg[3]; + unsigned int idx_reg_swizzle[3]; +}; + struct sm4_instruction { enum vkd3d_sm4_opcode opcode; @@ -3325,14 +3335,22 @@ struct sm4_instruction

struct sm4_dst_register { - struct sm4_register reg; + union + { + struct sm4_xregister xreg; + struct sm4_register reg; + }; unsigned int writemask; } dsts[2]; unsigned int dst_count;

struct sm4_src_register { - struct sm4_register reg; + union + { + struct sm4_xregister xreg; + struct sm4_register reg; + }; enum vkd3d_sm4_swizzle_type swizzle_type; unsigned int swizzle; } srcs[5]; @@ -3344,11 +3362,64 @@ struct sm4_instruction unsigned int idx_count; };

-static void sm4_register_from_deref(struct hlsl_ctx *ctx, struct sm4_register *reg, +static void sm4_register_from_node(struct sm4_register *reg, unsigned int *writemask, + enum vkd3d_sm4_swizzle_type *swizzle_type, const struct hlsl_ir_node *instr) +{ + assert(instr->reg.allocated); + reg->type = VKD3D_SM4_RT_TEMP; + reg->dim = VKD3D_SM4_DIMENSION_VEC4; + if (swizzle_type) + *swizzle_type = VKD3D_SM4_SWIZZLE_VEC4; + reg->idx[0] = instr->reg.id; + reg->idx_count = 1; + *writemask = instr->reg.writemask; +} + +static void sm4_xregister_set_idxs_from_deref(struct hlsl_ctx *ctx, struct sm4_xregister *xreg, + const struct hlsl_deref *deref, unsigned int *writemask) +{ + const struct hlsl_ir_var *var = deref->var; + unsigned int offset = 0; + + xreg->r.idx[0] = var->regs[HLSL_REGSET_NUMERIC].id; + xreg->idx_has_reg[0] = false; + xreg->r.idx_count = 1; + + if (!var->indexable) + { + offset = hlsl_offset_from_deref_safe(ctx, deref); + xreg->r.idx[0] += offset / 4; + } + else + { + offset = deref->offset_const; + xreg->r.idx[1] = offset / 4; + xreg->idx_has_reg[1] = false; + xreg->r.idx_count = 2; + + if (deref->offset.node) + { + unsigned int idx_reg_writemask; + + xreg->idx_has_reg[1] = true; + sm4_register_from_node(&xreg->idx_reg[1], &idx_reg_writemask, NULL, deref->offset.node); + xreg->idx_reg_swizzle[1] = hlsl_swizzle_from_writemask(idx_reg_writemask) & 0x3; /* NOTE: Native applies this mask to the swizzle. */ + } + } + + *writemask = 0xf & (0xf << (offset % 4)); + if (var->regs[HLSL_REGSET_NUMERIC].writemask) + *writemask = hlsl_combine_writemasks(var->regs[HLSL_REGSET_NUMERIC].writemask, *writemask); +} + +static void sm4_xregister_from_deref(struct hlsl_ctx *ctx, struct sm4_xregister *xreg, unsigned int *writemask, enum vkd3d_sm4_swizzle_type *swizzle_type, const struct hlsl_deref *deref, const struct hlsl_type *data_type) { const struct hlsl_ir_var *var = deref->var; + struct sm4_register *reg = &xreg->r; + + memset(xreg, 0, sizeof(*xreg));

if (var->is_uniform) { @@ -3424,16 +3495,12 @@ static void sm4_register_from_deref(struct hlsl_ctx *ctx, struct sm4_register *r } else { - struct hlsl_reg hlsl_reg = hlsl_reg_from_deref(ctx, deref); - - assert(hlsl_reg.allocated); reg->type = VKD3D_SM4_RT_INPUT; reg->dim = VKD3D_SM4_DIMENSION_VEC4; if (swizzle_type) *swizzle_type = VKD3D_SM4_SWIZZLE_VEC4; - reg->idx[0] = hlsl_reg.id; - reg->idx_count = 1; - *writemask = hlsl_reg.writemask; + + sm4_xregister_set_idxs_from_deref(ctx, xreg, deref, writemask); } } else if (var->is_output_semantic) @@ -3458,28 +3525,22 @@ static void sm4_register_from_deref(struct hlsl_ctx *ctx, struct sm4_register *r } else { - struct hlsl_reg hlsl_reg = hlsl_reg_from_deref(ctx, deref); - - assert(hlsl_reg.allocated); reg->type = VKD3D_SM4_RT_OUTPUT; reg->dim = VKD3D_SM4_DIMENSION_VEC4; - reg->idx[0] = hlsl_reg.id; - reg->idx_count = 1; - *writemask = hlsl_reg.writemask; + if (swizzle_type) + *swizzle_type = VKD3D_SM4_SWIZZLE_VEC4; + + sm4_xregister_set_idxs_from_deref(ctx, xreg, deref, writemask); } } else { - struct hlsl_reg hlsl_reg = hlsl_reg_from_deref(ctx, deref); - - assert(hlsl_reg.allocated); - reg->type = VKD3D_SM4_RT_TEMP; + reg->type = deref->var->indexable ? VKD3D_SM4_RT_INDEXABLE_TEMP : VKD3D_SM4_RT_TEMP; reg->dim = VKD3D_SM4_DIMENSION_VEC4; if (swizzle_type) *swizzle_type = VKD3D_SM4_SWIZZLE_VEC4; - reg->idx[0] = hlsl_reg.id; - reg->idx_count = 1; - *writemask = hlsl_reg.writemask; + + sm4_xregister_set_idxs_from_deref(ctx, xreg, deref, writemask); } }

@@ -3488,27 +3549,16 @@ static void sm4_src_from_deref(struct hlsl_ctx *ctx, struct sm4_src_register *sr { unsigned int writemask;

- sm4_register_from_deref(ctx, &src->reg, &writemask, &src->swizzle_type, deref, data_type); + sm4_xregister_from_deref(ctx, &src->xreg, &writemask, &src->swizzle_type, deref, data_type); if (src->swizzle_type == VKD3D_SM4_SWIZZLE_VEC4) src->swizzle = hlsl_map_swizzle(hlsl_swizzle_from_writemask(writemask), map_writemask); }

-static void sm4_register_from_node(struct sm4_register *reg, unsigned int *writemask, - enum vkd3d_sm4_swizzle_type *swizzle_type, const struct hlsl_ir_node *instr) -{ - assert(instr->reg.allocated); - reg->type = VKD3D_SM4_RT_TEMP; - reg->dim = VKD3D_SM4_DIMENSION_VEC4; - *swizzle_type = VKD3D_SM4_SWIZZLE_VEC4; - reg->idx[0] = instr->reg.id; - reg->idx_count = 1; - *writemask = instr->reg.writemask; -} - static void sm4_dst_from_node(struct sm4_dst_register *dst, const struct hlsl_ir_node *instr) { unsigned int swizzle_type;

+ memset(&dst->xreg, 0, sizeof(dst->xreg)); sm4_register_from_node(&dst->reg, &dst->writemask, &swizzle_type, instr); }

@@ -3548,11 +3598,41 @@ static void sm4_src_from_node(struct sm4_src_register *src, return; }

+ memset(&src->xreg, 0, sizeof(src->xreg)); sm4_register_from_node(&src->reg, &writemask, &src->swizzle_type, instr); if (src->swizzle_type == VKD3D_SM4_SWIZZLE_VEC4) src->swizzle = hlsl_map_swizzle(hlsl_swizzle_from_writemask(writemask), map_writemask); }

+static unsigned int sm4_get_index_adressing_from_xreg(const struct sm4_xregister *xreg, + unsigned int i, unsigned int *size) +{ + if (i >= xreg->r.idx_count) + { + *size = 0; + return 0; + } + + if (xreg->idx_has_reg[i]) + { + if (xreg->r.idx[i] == 0) + { + *size = 1 + xreg->idx_reg[i].idx_count; + return VKD3D_SM4_ADDRESSING_RELATIVE; + } + else + { + *size = 2 + xreg->idx_reg[i].idx_count; + return VKD3D_SM4_ADDRESSING_RELATIVE | VKD3D_SM4_ADDRESSING_OFFSET; + } + } + else + { + *size = 1; + return 0; + } +} + static uint32_t sm4_encode_register(const struct sm4_register *reg) { return (reg->type << VKD3D_SM4_REGISTER_TYPE_SHIFT) @@ -3560,27 +3640,53 @@ static uint32_t sm4_encode_register(const struct sm4_register *reg) | (reg->dim << VKD3D_SM4_DIMENSION_SHIFT); }

-static uint32_t sm4_register_order(const struct sm4_register *reg) +static uint32_t sm4_encode_xregister(const struct sm4_xregister *xreg) +{ + unsigned int idx_size; + unsigned int idx0_addressing = sm4_get_index_adressing_from_xreg(xreg, 0, &idx_size); + unsigned int idx1_addressing = sm4_get_index_adressing_from_xreg(xreg, 1, &idx_size); + unsigned int idx2_addressing = sm4_get_index_adressing_from_xreg(xreg, 2, &idx_size); + + return (xreg->r.type << VKD3D_SM4_REGISTER_TYPE_SHIFT) + | (xreg->r.idx_count << VKD3D_SM4_REGISTER_ORDER_SHIFT) + | (idx0_addressing << VKD3D_SM4_ADDRESSING_SHIFT0) + | (idx1_addressing << VKD3D_SM4_ADDRESSING_SHIFT1) + | (idx2_addressing << VKD3D_SM4_ADDRESSING_SHIFT2) + | (xreg->r.dim << VKD3D_SM4_DIMENSION_SHIFT); +} + +static uint32_t sm4_xregister_order(const struct sm4_xregister *xreg) { uint32_t order = 1; - if (reg->type == VKD3D_SM4_RT_IMMCONST) - order += reg->dim == VKD3D_SM4_DIMENSION_VEC4 ? 4 : 1; - order += reg->idx_count; - if (reg->mod) + unsigned int i; + + if (xreg->r.type == VKD3D_SM4_RT_IMMCONST) + order += xreg->r.dim == VKD3D_SM4_DIMENSION_VEC4 ? 4 : 1; + + for (i = 0; i < xreg->r.idx_count; i++) + { + unsigned int size; + + sm4_get_index_adressing_from_xreg(xreg, i, &size); + order += size; + } + + if (xreg->r.mod) ++order; + return order; }

static void write_sm4_instruction(struct vkd3d_bytecode_buffer *buffer, const struct sm4_instruction *instr) { uint32_t token = instr->opcode; - unsigned int size = 1, i, j; + unsigned int size = 1, i, j, k;

size += instr->modifier_count; for (i = 0; i < instr->dst_count; ++i) - size += sm4_register_order(&instr->dsts[i].reg); + size += sm4_xregister_order(&instr->dsts[i].xreg); for (i = 0; i < instr->src_count; ++i) - size += sm4_register_order(&instr->srcs[i].reg); + size += sm4_xregister_order(&instr->srcs[i].xreg); size += instr->idx_count; if (instr->byte_stride) ++size; @@ -3601,18 +3707,43 @@ static void write_sm4_instruction(struct vkd3d_bytecode_buffer *buffer, const st

for (i = 0; i < instr->dst_count; ++i) { - token = sm4_encode_register(&instr->dsts[i].reg); + token = sm4_encode_xregister(&instr->dsts[i].xreg); if (instr->dsts[i].reg.dim == VKD3D_SM4_DIMENSION_VEC4) token |= instr->dsts[i].writemask << VKD3D_SM4_WRITEMASK_SHIFT; put_u32(buffer, token);

for (j = 0; j < instr->dsts[i].reg.idx_count; ++j) - put_u32(buffer, instr->dsts[i].reg.idx[j]); + { + unsigned int addressing, idx_size; + + addressing = sm4_get_index_adressing_from_xreg(&instr->dsts[i].xreg, j, &idx_size); + + if (addressing & VKD3D_SM4_ADDRESSING_RELATIVE) + { + const struct sm4_register *idx_reg = &instr->dsts[i].xreg.idx_reg[j]; + uint32_t idx_reg_swizzle = instr->dsts[i].xreg.idx_reg_swizzle[j]; + + token = sm4_encode_register(idx_reg); + token |= VKD3D_SM4_SWIZZLE_SCALAR << VKD3D_SM4_SWIZZLE_TYPE_SHIFT; + token |= idx_reg_swizzle << VKD3D_SM4_SWIZZLE_SHIFT; + put_u32(buffer, token); + + for (k = 0; k < idx_reg->idx_count; ++k) + put_u32(buffer, idx_reg->idx[k]); + + if (addressing & VKD3D_SM4_ADDRESSING_OFFSET) + put_u32(buffer, instr->dsts[i].xreg.r.idx[k]); + } + else + { + put_u32(buffer, instr->dsts[i].reg.idx[j]); + } + } }

for (i = 0; i < instr->src_count; ++i) { - token = sm4_encode_register(&instr->srcs[i].reg); + token = sm4_encode_xregister(&instr->srcs[i].xreg); token |= (uint32_t)instr->srcs[i].swizzle_type << VKD3D_SM4_SWIZZLE_TYPE_SHIFT; token |= instr->srcs[i].swizzle << VKD3D_SM4_SWIZZLE_SHIFT; if (instr->srcs[i].reg.mod) @@ -3624,8 +3755,34 @@ static void write_sm4_instruction(struct vkd3d_bytecode_buffer *buffer, const st | VKD3D_SM4_EXTENDED_OPERAND_MODIFIER);

for (j = 0; j < instr->srcs[i].reg.idx_count; ++j) - put_u32(buffer, instr->srcs[i].reg.idx[j]); + { + unsigned int addressing, idx_size;

+ addressing = sm4_get_index_adressing_from_xreg(&instr->srcs[i].xreg, j, &idx_size); + + if (addressing & VKD3D_SM4_ADDRESSING_RELATIVE) + { + const struct sm4_register *idx_reg = &instr->srcs[i].xreg.idx_reg[j]; + uint32_t idx_reg_swizzle = instr->srcs[i].xreg.idx_reg_swizzle[j]; + + token = sm4_encode_register(idx_reg); + token |= VKD3D_SM4_SWIZZLE_SCALAR << VKD3D_SM4_SWIZZLE_TYPE_SHIFT; + token |= idx_reg_swizzle << VKD3D_SM4_SWIZZLE_SHIFT; + put_u32(buffer, token); + + for (k = 0; k < idx_reg->idx_count; ++k) + put_u32(buffer, idx_reg->idx[k]); + + if (addressing & VKD3D_SM4_ADDRESSING_OFFSET) + put_u32(buffer, instr->srcs[i].xreg.r.idx[k]); + } + else + { + put_u32(buffer, instr->srcs[i].reg.idx[j]); + } + } + + /* FIXME: This comes before or after the previous block? */ if (instr->srcs[i].reg.type == VKD3D_SM4_RT_IMMCONST) { put_u32(buffer, instr->srcs[i].reg.immconst_uint[0]); @@ -3882,6 +4039,20 @@ static void write_sm4_dcl_temps(struct vkd3d_bytecode_buffer *buffer, uint32_t t write_sm4_instruction(buffer, &instr); }

+static void write_sm4_dcl_indexable_temp(struct vkd3d_bytecode_buffer *buffer, uint32_t idx, + uint32_t size, uint32_t comp_count) +{ + struct sm4_instruction instr = + { + .opcode = VKD3D_SM4_OP_DCL_INDEXABLE_TEMP, + + .idx = {idx, size, comp_count}, + .idx_count = 3, + }; + + write_sm4_instruction(buffer, &instr); +} + static void write_sm4_dcl_thread_group(struct vkd3d_bytecode_buffer *buffer, const uint32_t thread_count[3]) { struct sm4_instruction instr = @@ -4313,7 +4484,7 @@ static void write_sm4_store_uav_typed(struct hlsl_ctx *ctx, struct vkd3d_bytecod memset(&instr, 0, sizeof(instr)); instr.opcode = VKD3D_SM5_OP_STORE_UAV_TYPED;

- sm4_register_from_deref(ctx, &instr.dsts[0].reg, &instr.dsts[0].writemask, NULL, dst, dst->var->data_type); + sm4_xregister_from_deref(ctx, &instr.dsts[0].xreg, &instr.dsts[0].writemask, NULL, dst, dst->var->data_type); instr.dst_count = 1;

sm4_src_from_node(&instr.srcs[0], coords, VKD3DSP_WRITEMASK_ALL); @@ -5033,7 +5204,7 @@ static void write_sm4_store(struct hlsl_ctx *ctx, memset(&instr, 0, sizeof(instr)); instr.opcode = VKD3D_SM4_OP_MOV;

- sm4_register_from_deref(ctx, &instr.dsts[0].reg, &writemask, NULL, &store->lhs, rhs->data_type); + sm4_xregister_from_deref(ctx, &instr.dsts[0].xreg, &writemask, NULL, &store->lhs, rhs->data_type); instr.dsts[0].writemask = hlsl_combine_writemasks(writemask, store->writemask); instr.dst_count = 1;

@@ -5148,6 +5319,7 @@ static void write_sm4_shdr(struct hlsl_ctx *ctx, struct vkd3d_bytecode_buffer buffer = {0}; unsigned int extern_resources_count, i; const struct hlsl_buffer *cbuffer; + const struct hlsl_scope *scope; const struct hlsl_ir_var *var; size_t token_count_position;

@@ -5202,6 +5374,25 @@ static void write_sm4_shdr(struct hlsl_ctx *ctx, if (ctx->temp_count) write_sm4_dcl_temps(&buffer, ctx->temp_count);

+ LIST_FOR_EACH_ENTRY(scope, &ctx->scopes, struct hlsl_scope, entry) + { + LIST_FOR_EACH_ENTRY(var, &scope->vars, struct hlsl_ir_var, scope_entry) + { + if (var->is_uniform || var->is_input_semantic || var->is_output_semantic) + continue; + if (!var->regs[HLSL_REGSET_NUMERIC].allocated) + continue; + + if (var->indexable) + { + unsigned int id = var->regs[HLSL_REGSET_NUMERIC].id; + unsigned int size = align(var->data_type->reg_size[HLSL_REGSET_NUMERIC], 4) / 4; + + write_sm4_dcl_indexable_temp(&buffer, id, size, 4); + } + } + } + write_sm4_block(ctx, &buffer, &entry_func->body);

write_sm4_ret(&buffer); diff --git a/tests/array-index-expr.shader_test b/tests/array-index-expr.shader_test index 0a83080c..b88c54a9 100644 --- a/tests/array-index-expr.shader_test +++ b/tests/array-index-expr.shader_test @@ -1,4 +1,4 @@ -[pixel shader todo] +[pixel shader] uniform float4 f[3]; uniform float2 i;

@@ -12,17 +12,17 @@ uniform 0 float4 1.0 2.0 3.0 4.0 uniform 4 float4 5.0 6.0 7.0 8.0 uniform 8 float4 9.0 10.0 11.0 12.0 uniform 12 float4 0 0 0 0 -todo draw quad -todo probe all rgba (1.0, 2.0, 3.0, 4.0) +draw quad +probe all rgba (1.0, 2.0, 3.0, 4.0) uniform 12 float4 1 0 0 0 -todo draw quad -todo probe all rgba (5.0, 6.0, 7.0, 8.0) +draw quad +probe all rgba (5.0, 6.0, 7.0, 8.0) uniform 12 float4 0 1 0 0 -todo draw quad -todo probe all rgba (5.0, 6.0, 7.0, 8.0) +draw quad +probe all rgba (5.0, 6.0, 7.0, 8.0) uniform 12 float4 1 1 0 0 -todo draw quad -todo probe all rgba (9.0, 10.0, 11.0, 12.0) +draw quad +probe all rgba (9.0, 10.0, 11.0, 12.0)

[pixel shader] @@ -74,7 +74,7 @@ draw quad probe all rgba (24.0, 0.0, 21.0, 1.0)

-[pixel shader todo] +[pixel shader] uniform float2 i;

float4 main() : sv_target @@ -86,14 +86,14 @@ float4 main() : sv_target

[test] uniform 0 float4 0 0 0 0 -todo draw quad -todo probe all rgba (1.0, 2.0, 3.0, 4.0) +draw quad +probe all rgba (1.0, 2.0, 3.0, 4.0) uniform 0 float4 1 0 0 0 -todo draw quad -todo probe all rgba (5.0, 6.0, 7.0, 8.0) +draw quad +probe all rgba (5.0, 6.0, 7.0, 8.0) uniform 0 float4 0 1 0 0 -todo draw quad -todo probe all rgba (5.0, 6.0, 7.0, 8.0) +draw quad +probe all rgba (5.0, 6.0, 7.0, 8.0) uniform 0 float4 1 1 0 0 -todo draw quad -todo probe all rgba (9.0, 10.0, 11.0, 12.0) +draw quad +probe all rgba (9.0, 10.0, 11.0, 12.0) diff --git a/tests/function-return.shader_test b/tests/function-return.shader_test index cbd29749..80bc5767 100644 --- a/tests/function-return.shader_test +++ b/tests/function-return.shader_test @@ -258,7 +258,7 @@ uniform 0 float 0.9 draw quad probe all rgba (0.4, 0.1, 0.7, 0.6) 1

-[pixel shader todo] +[pixel shader]

uniform float4 f[3];

@@ -295,21 +295,21 @@ float4 main() : sv_target uniform 0 float4 0.3 0.0 0.0 0.0 uniform 4 float4 0.0 0.0 0.0 0.0 uniform 8 float4 0.1 0.0 0.0 0.0 -todo draw quad -todo probe all rgba (0.3, 0.2, 0.6, 0.6) 1 +draw quad +probe all rgba (0.3, 0.2, 0.6, 0.6) 1

uniform 4 float4 0.35 0.0 0.0 0.0 -todo draw quad -todo probe all rgba (0.3, 0.3, 0.6, 0.6) 1 +draw quad +probe all rgba (0.3, 0.3, 0.6, 0.6) 1

uniform 8 float4 0.5 0.0 0.0 0.0 -todo draw quad -todo probe all rgba (0.3, 0.5, 0.6, 0.6) 1 +draw quad +probe all rgba (0.3, 0.5, 0.6, 0.6) 1

uniform 0 float4 1.0 0.0 0.0 0.0 -todo draw quad -todo probe all rgba (0.3, 0.5, 0.6, 0.6) 1 +draw quad +probe all rgba (0.3, 0.5, 0.6, 0.6) 1

uniform 4 float4 2.0 0.0 0.0 0.0 -todo draw quad -todo probe all rgba (0.4, 0.1, 0.6, 0.6) 1 +draw quad +probe all rgba (0.4, 0.1, 0.6, 0.6) 1 diff --git a/tests/hlsl-matrix-indexing.shader_test b/tests/hlsl-matrix-indexing.shader_test index a57d8fb8..f44ba63e 100644 --- a/tests/hlsl-matrix-indexing.shader_test +++ b/tests/hlsl-matrix-indexing.shader_test @@ -124,7 +124,7 @@ draw quad probe all rgba (8, 9, 10, 11)

-[pixel shader todo] +[pixel shader] uniform float i;

float4 main() : sv_target @@ -136,5 +136,5 @@ float4 main() : sv_target

[test] uniform 0 float 3 -todo draw quad -todo probe all rgba (12, 13, 14, 15) +draw quad +probe all rgba (12, 13, 14, 15) diff --git a/tests/return.shader_test b/tests/return.shader_test index 9f800d1a..30fcd65b 100644 --- a/tests/return.shader_test +++ b/tests/return.shader_test @@ -217,7 +217,7 @@ uniform 0 float 0.8 draw quad probe all rgba (0.5, 0.5, 0.5, 0.5)

-[pixel shader todo] +[pixel shader]

uniform float4 f[3];

@@ -243,21 +243,21 @@ void main(out float4 ret : sv_target) uniform 0 float4 0.3 0.0 0.0 0.0 uniform 4 float4 0.0 0.0 0.0 0.0 uniform 8 float4 0.1 0.0 0.0 0.0 -todo draw quad -todo probe all rgba (0.1, 0.1, 0.1, 0.1) 1 +draw quad +probe all rgba (0.1, 0.1, 0.1, 0.1) 1

uniform 4 float4 0.35 0.0 0.0 0.0 -todo draw quad -todo probe all rgba (0.2, 0.2, 0.2, 0.2) 1 +draw quad +probe all rgba (0.2, 0.2, 0.2, 0.2) 1

uniform 8 float4 0.5 0.0 0.0 0.0 -todo draw quad -todo probe all rgba (0.4, 0.4, 0.4, 0.4) 1 +draw quad +probe all rgba (0.4, 0.4, 0.4, 0.4) 1

uniform 0 float4 1.0 0.0 0.0 0.0 -todo draw quad -todo probe all rgba (0.4, 0.4, 0.4, 0.4) 1 +draw quad +probe all rgba (0.4, 0.4, 0.4, 0.4) 1

uniform 4 float4 2.0 0.0 0.0 0.0 -todo draw quad -todo probe all rgba (0.9, 0.9, 0.9, 0.9) 1 +draw quad +probe all rgba (0.9, 0.9, 0.9, 0.9) 1

-- GitLab https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/229

Francisco Casas (＠fcasas)

3:02 p.m.

I thought on including the following change but didn't because, if I remember correctly, last time we agreed on not doing it. It may make more sense now though.

My opinion is that we should include all the fields that are exclusive to both sm4_dst_register and sm4_src_register within struct sm4_register, namely:

```c unsigned int writemask; enum vkd3d_sm4_swizzle_type swizzle_type; unsigned int swizzle; ``` even if they are not used when the register is src or dst.

This would save us from passing them as arguments every time, which also adds complexity when these are register used as indexes of a struct sm4_xregister.

Also, this would also allows us to define struct sm4_xregister simply as: ```c struct sm4_xregister { struct sm4_register r;

/* Relative addressing to be added to the constant offsets in r.idx[·]. */ bool idx_has_reg[3]; struct sm4_register idx_reg[3]; }; ```

and use it directly for the dsts[] and srcs[] fields of struct sm4_instruction.

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/229#note_35273

Zebediah Figura (＠zfigura)

5:48 p.m.

I'm going to have to push back on 2/4. Along the lines of 215, I don't want to spend too much effort doing this on the HLSL level when we really should be doing these kinds of transformations over lower level IR.

I don't like how 4/4 avoids the recursive struct; probably better to do it like vkd3d_shader_register.

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/229#note_35284

Giovanni Mascellani (＠giomasce)

12 Jun 12 Jun

11:46 a.m.

2/4 seems to be doing two things: one is, as declared in the subject, dividing by four `deref->offset` (BTW, what is the reason? Maybe that should be explained in the commit message); the other is introducing `offset_const` and partially populating it in `new_offset_instr_from_deref()`. Could these two be splitted?

WRT Zeb's comment, I also think that the current implementation is a bit more complicated than it should be. But I think that a simpler implementation might be acceptable. What I would do is to avoid touching `offset_const` in `new_offset_instr_from_deref()`, and rather have `clean_constant_deref_offset_srcs()` from 3/4 shove every additive constant it can find into `offset_const` (basically to recurse on both arguments as long as you find `HLSL_OP2_ADD` expressions, and remove all the addends that are constants, adding them to `offset_const`).

Does this make sense to you?

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/229#note_35439

Henri Verbeet (＠hverbeet)

12:06 p.m.

...

I'm going to have to push back on 2/4. Along the lines of 215, I don't want to spend too much effort doing this on the HLSL level when we really should be doing these kinds of transformations over lower level IR.

I don't like how 4/4 avoids the recursive struct; probably better to do it like vkd3d_shader_register.

More broadly, is there a particular reason we couldn't start using the vkd3d_shader_instruction structures here? I didn't do a comprehensive review, but the sm4_instruction structures largely just look like subsets of the vkd3d_shader_instruction structures, as you'd expect.

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/229#note_35443

Giovanni Mascellani (＠giomasce)

1:13 p.m.

Giovanni Mascellani (@giomasce) commented about libs/vkd3d-shader/tpf.c:

...

     return;
 }
memset(&src->xreg, 0, sizeof(src->xreg)); sm4_register_from_node(&src->reg, &writemask, &src->swizzle_type, instr); if (src->swizzle_type == VKD3D_SM4_SWIZZLE_VEC4) src->swizzle = hlsl_map_swizzle(hlsl_swizzle_from_writemask(writemask), map_writemask);

}

+static unsigned int sm4_get_index_adressing_from_xreg(const struct sm4_xregister *xreg,

Typo: `addressing`

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/229#note_35452

Giovanni Mascellani (＠giomasce)

1:13 p.m.

Giovanni Mascellani (@giomasce) commented about libs/vkd3d-shader/tpf.c:

...

LIST_FOR_EACH_ENTRY(scope, &ctx->scopes, struct hlsl_scope, entry)
{

   LIST_FOR_EACH_ENTRY(var, &scope->vars, struct hlsl_ir_var, scope_entry)

```
   {
```

       if (var->is_uniform || var->is_input_semantic || var->is_output_semantic)

```
           continue;
```

       if (!var->regs[HLSL_REGSET_NUMERIC].allocated)

```
           continue;
```
```
       if (var->indexable)
```
```
       {
```

           unsigned int id = var->regs[HLSL_REGSET_NUMERIC].id;

           unsigned int size = align(var->data_type->reg_size[HLSL_REGSET_NUMERIC], 4) / 4;

           write_sm4_dcl_indexable_temp(&buffer, id, size, 4);

Maybe these could be sorted by `id`?

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/229#note_35453

Giovanni Mascellani (＠giomasce)

1:13 p.m.

On Mon Jun 12 13:13:16 2023 +0000, Henri Verbeet wrote:

...

...
I'm going to have to push back on 2/4. Along the lines of 215, I don't

want to spend too much effort doing this on the HLSL level when we really should be doing these kinds of transformations over lower level IR.

...
I don't like how 4/4 avoids the recursive struct; probably better to

do it like vkd3d_shader_register. More broadly, is there a particular reason we couldn't start using the vkd3d_shader_instruction structures here? I didn't do a comprehensive review, but the sm4_instruction structures largely just look like subsets of the vkd3d_shader_instruction structures, as you'd expect.

I'm not opposed in principle to migrating to `vkd3d_shader_instruction`, though I haven't considered precisely how much work it is. Surely reusing structures instead of defining a new variant has some value. OTOH, in this case we need much less than the whole `vkd3d_shader_register` structure. Currently I think we either use one constant register, or one constant and one dynamic register. Unless we already anticipate needing more, we could just design our structures around this.

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/229#note_35454

Giovanni Mascellani (＠giomasce)

1:36 p.m.

Incidentally, Francisco, do you have plans for concluding the paths to offsets transition? It would be nice to avoid having it dragging forever.

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/229#note_35455

Henri Verbeet (＠hverbeet)

13 Jun 13 Jun

11:21 a.m.

...

I'm not opposed in principle to migrating to `vkd3d_shader_instruction`, though I haven't considered precisely how much work it is. Surely reusing structures instead of defining a new variant has some value. OTOH, in this case we need much less than the whole `vkd3d_shader_register` structure. Currently I think we either use one constant register, or one constant and one dynamic register. Unless we already anticipate needing more, we could just design our structures around this.

I think the main value in using the vkd3d_shader_instruction structures is that it gets us a step closer to having a common low level IR in vkd3d-shader. I.e., allowing us to do HLSL/d3dbc/tpf/dxil/d3d-asm -> vkil -> d3dbc/tpf/dxil/d3d-asm/SPIR-V/GLSL/MSL/AIR in a more straightforward manner. Some of those are more useful than others, but e.g. d3d-asm->d3dbc is something that Wine's d3dx9/d3dcompiler needs, and e.g. HLSL->SPIR-V/GLSL/MSL/AIR is useful for porting Direct3D applications.

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/229#note_35614

Francisco Casas (＠fcasas)

1:15 p.m.

On Tue Jun 13 13:15:25 2023 +0000, Giovanni Mascellani wrote:

...

Incidentally, Francisco, do you have plans for concluding the paths to offsets transition? It would be nice to avoid having it dragging forever.

I will work on that once I finish implementing the remaining suggestions of !209. Because I am seeing this MR won't get accepted before that.

To be honest, I am not yet convinced if it would be really good to make the paths reach the bytecode part, but I will try implementing that nevertheless. It may change my opinion, as it often happens.

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/229#note_35629

Francisco Casas (＠fcasas)

7 Jul 7 Jul

2:54 a.m.

...

2/4 seems to be doing two things: one is, as declared in the subject, dividing by four `deref->offset` (BTW, what is the reason? Maybe that should be explained in the commit message); the other is introducing `offset_const` and partially populating it in `new_offset_instr_from_deref()`. Could these two be splitted?

Right, I am splitting the patch in two and adding an explanation to the patch description. The main reason for these changes is that SM4 relative addressing (using a register as an index of a src or dst register) is done with a whole-register granularity, so we store the register-component part of the offset with this new uint. Also, storing the offset in this way (node + uint) allows us to latter move all the offset to the uint in case the node is constant (what 3/4 does), removing the offset nodes and making the IR more readable.

...

WRT Zeb's comment, I also think that the current implementation is a bit more complicated than it should be. But I think that a simpler implementation might be acceptable. What I would do is to avoid touching `offset_const` in `new_offset_instr_from_deref()`, and rather have `clean_constant_deref_offset_srcs()` from 3/4 shove every additive constant it can find into `offset_const` (basically to recurse on both arguments as long as you find `HLSL_OP2_ADD` expressions, and remove all the addends that are constants, adding them to `offset_const`).

Does this make sense to you?

I don't think that idea would be better. The offset node, if it is to contain all the offset information until `clean_constant_deref_offset_srcs()`, would have to be in components and not in whole-registers, which means we would have to introduce HLSL_OP2_DIVs to get a node in whole-registers. Also, it may be a little brittle, since we would be asserting that we are always capable of extracting the component part of the offset as a constant there.

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/229#note_38265

Francisco Casas (＠fcasas)

2:54 a.m.

On Fri Jul 7 02:54:01 2023 +0000, Francisco Casas wrote:

...

I thought on including the following change but didn't because, if I remember correctly, last time we agreed on not doing it. It may make more sense now though. My opinion is that we should include all the fields that are exclusive to both sm4_dst_register and sm4_src_register within struct sm4_register, namely:
unsigned int writemask;
enum vkd3d_sm4_swizzle_type swizzle_type;
unsigned int swizzle;
even if they are not used when the register is src or dst. This would save us from passing them as arguments every time, which also adds complexity when these are register used as indexes of a struct sm4_xregister. Also, this would also allows us to define struct sm4_xregister simply as:
struct sm4_xregister
{
    struct sm4_register r;
    /* Relative addressing to be added to the constant offsets in
r.idx[·]. */
    bool idx_has_reg[3];
    struct sm4_register idx_reg[3];
};
and use it directly for the dsts[] and srcs[] fields of struct sm4_instruction.

This comment is superseded by the proposal in !269 .

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/229#note_38266

Francisco Casas (＠fcasas)

2:54 a.m.

...

I'm going to have to push back on 2/4. Along the lines of 215, I don't want to spend too much effort doing this on the HLSL level when we really should be doing these kinds of transformations over lower level IR.

I am not following what on 2/4 requires too much effort. Can you specify?

In sm4 relative addressing can only be done with "whole-register granularity", meaning that the node has to be expressed in whole registers and to specify register components we require a constant uint offset. If the node is constant, moving its value to this constant uint is a low-hanging fruit, I think.

...

I don't like how 4/4 avoids the recursive struct; probably better to do it like vkd3d_shader_register.

The thing is that the `struct sm4_instruction` values are "ephemeral", they are only created to be used as an input to write_sm4_instruction() and currently they don't allocate heap memory.

I made the extended register `struct sm4_xregister` to avoid allocating heap memory and having to take it into account and free it everywhere it is used, and its `.r` field can be initialized as a regular register.

I guess you are thinking on something more orthodox, along the lines of:

```c struct sm4_register_index { struct sm4_register *rel_addr; unsigned int offset; };

struct sm4_register { enum vkd3d_sm4_register_type type; struct sm4_register_index idx[2]; unsigned int idx_count;

/* ... */ };

static struct sm4_register *sm4_new_register(void); static void sm4_register_add_index(struct sm4_register *reg, struct sm4_register *rel_addr, unsigned int offset); static void sm4_register_cleanup(struct sm4_register *reg); ```

so I am implementing that.

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/229#note_38267

Francisco Casas (＠fcasas)

2:54 a.m.

On Mon Jun 12 13:13:16 2023 +0000, Giovanni Mascellani wrote:

...

I'm not opposed in principle to migrating to `vkd3d_shader_instruction`, though I haven't considered precisely how much work it is. Surely reusing structures instead of defining a new variant has some value. OTOH, in this case we need much less than the whole `vkd3d_shader_register` structure. Currently I think we either use one constant register, or one constant and one dynamic register. Unless we already anticipate needing more, we could just design our structures around this.

Given that the `struct sm4_instruction` values don't leave the `write_sm4_*()` functions and are just used as input for `write_sm4_instruction()`, I don't see much value in using `vkd3d_shader_instruction` instead. Also, some translation functions may be required, for instance, the vkd3d swizzles are different from the SM4 swizzles.

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/229#note_38269

Francisco Casas (＠fcasas)

2:54 a.m.

On Mon Jun 12 13:13:16 2023 +0000, Giovanni Mascellani wrote:

...

Maybe these could be sorted by `id`?

Yes, but I don't see a trivial way. What benefit can this bring?

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/229#note_38270

Francisco Casas (＠fcasas)

2:54 a.m.

...

Incidentally, Francisco, do you have plans for concluding the paths to offsets transition? It would be nice to avoid having it dragging forever.

I see now that you meant to just close it in any way, and not specifically dragging the paths into tpf.c and d3dbc.c.

I am super inclined to keep passing the derefs as an offset node plus some extra information, taking advantage of the IR infrastructure, as we do now instead of passing the whole deref path to tpf.c and d3dbc.c and encumbering them with the offset calculations.

I will include patches to remove the TODO's related to `replace_deref_path_with_offset()` in part4 of broaden resources support (after !209 gets approved), if there are objections we can discuss them there.

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/229#note_38271

Francisco Casas (＠fcasas)

10 Jul 10 Jul

4:58 p.m.

On Fri Jul 7 02:54:02 2023 +0000, Francisco Casas wrote:

...

...
I'm going to have to push back on 2/4. Along the lines of 215, I don't

want to spend too much effort doing this on the HLSL level when we really should be doing these kinds of transformations over lower level IR. I am not following what on 2/4 requires too much effort. Can you specify? In sm4 relative addressing can only be done with "whole-register granularity", meaning that the node has to be expressed in whole registers and to specify register components we require a constant uint offset. If the node is constant, moving its value to this constant uint is a low-hanging fruit, I think.

...
I don't like how 4/4 avoids the recursive struct; probably better to

do it like vkd3d_shader_register. The thing is that the `struct sm4_instruction` values are "ephemeral", they are only created to be used as an input to write_sm4_instruction() and currently they don't allocate heap memory. I made the extended register `struct sm4_xregister` to avoid allocating heap memory and having to take it into account and free it everywhere it is used, and its `.r` field can be initialized as a regular register. I guess you are thinking on something more orthodox, along the lines of:
struct sm4_register_index
{
    struct sm4_register *rel_addr;
    unsigned int offset;
};
struct sm4_register
{
    enum vkd3d_sm4_register_type type;
    struct sm4_register_index idx[2];
    unsigned int idx_count;
    /* ... */
};
static struct sm4_register *sm4_new_register(void);
static void sm4_register_add_index(struct sm4_register *reg, struct
sm4_register *rel_addr, unsigned int offset);
static void sm4_register_cleanup(struct sm4_register *reg);
so I am implementing that.

For the record, I implemented this in this branch https://gitlab.winehq.org/fcasas/vkd3d/-/commits/nonconst-offsets-6. As you can see, moving the registers to the heap is costly (https://gitlab.winehq.org/fcasas/vkd3d/-/commit/b304a29e7668493dec3183428510...), but after that it allows for a nice implementation of the writing instructions part (https://gitlab.winehq.org/fcasas/vkd3d/-/commit/83ff8f6b866e19639b5e946055e1...). So it may be worth it.

One thing about this branch is that it requires !269 first, and henri is suggesting making sm4_register a subset of struct vkd3d_shader_register, so I will get to work on that.

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/229#note_38551

Zebediah Figura (＠zfigura)

11 Jul 11 Jul

11:10 p.m.

...

The thing is that the `struct sm4_instruction` values are "ephemeral", they are only created to be used as an input to write_sm4_instruction() and currently they don't allocate heap memory.

I made the extended register `struct sm4_xregister` to avoid allocating heap memory and having to take it into account and free it everywhere it is used, and its `.r` field can be initialized as a regular register.

Eventually we are going to need those structs in heap memory; the idea is that sm4_instruction won't be ephemeral, but rather we'll eventually do passes over it.

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/229#note_38728

Ethan Lee (＠flibitijibibo)

22 Sep 22 Sep

8:24 p.m.

Sorry for the bump, but wanted to be sure I was caught up after disappearing for a few months.

If I haven't missed anything, this feature is blocked on the following changes: - [ ] Replace sm4_register with vkd3d_shader_register - [ ] Finish transition from paths to offsets - [ ] Address consistency issues between offsets and _register_ offsets?

This is still our only blocker for native D3D11 support, and thanks to Recent Developments I should be able to start helping with HLSL and TPF features again soon!

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/229#note_46386

Francisco Casas (＠fcasas)

25 Sep 25 Sep

3:51 p.m.

On Mon Sep 25 15:51:27 2023 +0000, Ethan Lee wrote:

...

Sorry for the bump, but wanted to be sure I was caught up after disappearing for a few months. If I haven't missed anything, this feature is blocked on the following changes:

[ ] Replace sm4_register with vkd3d_shader_register

[ ] Finish transition from paths to offsets

[ ] Address consistency issues between offsets and _register_ offsets?

This is still our only blocker for native D3D11 support, and thanks to Recent Developments I should be able to start helping with HLSL and TPF features again soon!

I think that the blocker is just the first point: "Replace sm4_register with vkd3d_shader_register" (the first part is !319), we can certainly support non-constant offset dereferences before solving the second point, which may take a while. I am not sure I understand the third point, besides what is addressed by 2/4.

...

This is still our only blocker for native D3D11 support, and thanks to Recent Developments I should be able to start helping with HLSL and TPF features again soon!

Nice!

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/229#note_46657

Ethan Lee (＠flibitijibibo)

27 Sep 27 Sep

9:09 p.m.

On Mon Sep 25 15:51:27 2023 +0000, Francisco Casas wrote:

...

I think that the blocker is just the first point: "Replace sm4_register with vkd3d_shader_register" (the first part is !319), we can certainly support non-constant offset dereferences before solving the second point, which may take a while. I am not sure I understand the third point, besides what is addressed by 2/4.

...
This is still our only blocker for native D3D11 support, and thanks to

Recent Developments I should be able to start helping with HLSL and TPF features again soon! Nice!

That all makes sense to me - with that in mind I tried doing a manual rebase of this based on !370, but got lost a bit at around the codegen stage. Happy to test out any WIPs of a refreshed version of this MR ASAP!

-- https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/229#note_47041