Recap:
I implemented relative addressing in !229, but I was suggested to handle register indexes by moving all the registers to the heap, which I did in my [nonconst-offsets-6](https://gitlab.winehq.org/fcasas/vkd3d/-/commits/nonconst-offsets-6) branch. A part of this implementation required !269, but I was asked to try to turn the sm4 register structs into the vkd3d-shader register structs instead, to get closer to a common low level IR.
This is the first part of that transformation. The whole thing is in my [use_vkd3d_reg](https://gitlab.winehq.org/fcasas/vkd3d/-/commits/use_vkd3d_reg) branch.
This is built on top of !225 (the first two patches), so it may be good to get that upstream first.
---
This patch series aims to do the following replacements: ``` struct sm4_register -> struct vkd3d_shader_register struct sm4_dst_register -> struct vkd3d_shader_dst_param struct sm4_src_register -> struct vkd3d_shader_src_param ``` to get us closer to a common level IR and simplify the implementation of relative-addressing.
To achieve this, the fields in the sm4 register structs are replaced with fields in the vkd3d-shader structs or removed altogether, one at the time.
As can be seen when looking at the whole branch, it is possible to do this transformation without having to add additional fields to `struct vkd3d_shader_register`, by restricting each register type to a single `enum vkd3d_sm4_dimension` (and its src registers to a single `enum vkd3d_sm4_swizzle_type`) by default.
The only exception we need so far is for sampler registers: They always have dimension NONE, except when used as arguments of gather instructions, in which case they have dimension VEC4 and SCALAR swizzle. This, and similar exceptions we may find in the future, can be handled using the opcode_info, as in 7/8 [81e17506](https://gitlab.winehq.org/fcasas/vkd3d/-/commit/81e17506ba2cb1fbf4d021159ab1...).
-- v2: vkd3d-shader/tpf: Get rid of sm4_register.dim. vkd3d-shader/tpf: Use 's' in src_info to expect sampler register with vec4 dimension. vkd3d-shader/tpf: Separate dst register write function. vkd3d-shader/tpf: Separate src register write function. vkd3d-shader/tpf: Make register_type_table an array of structs. vkd3d-shader/tpf: Use struct vkd3d_shader_register_index in sm4_register.idx[]. vkd3d-shader/tpf: Use enum vkd3d_shader_register_type in sm4_register.type. vkd3d-shader/tpf: Rename VKD3D_SM4_SWIZZLE_NONE as VKD3D_SM4_SWIZZLE_MASK4.
From: Conor McCarthy cmccarthy@codeweavers.com
--- libs/vkd3d-shader/tpf.c | 28 ++++++++++++++-------------- 1 file changed, 14 insertions(+), 14 deletions(-)
diff --git a/libs/vkd3d-shader/tpf.c b/libs/vkd3d-shader/tpf.c index 4be2f8b8..30ad0825 100644 --- a/libs/vkd3d-shader/tpf.c +++ b/libs/vkd3d-shader/tpf.c @@ -505,7 +505,7 @@ enum vkd3d_sm4_input_primitive_type
enum vkd3d_sm4_swizzle_type { - VKD3D_SM4_SWIZZLE_NONE = 0x0, + VKD3D_SM4_SWIZZLE_MASK4 = 0x0, VKD3D_SM4_SWIZZLE_VEC4 = 0x1, VKD3D_SM4_SWIZZLE_SCALAR = 0x2, }; @@ -1981,7 +1981,7 @@ static bool shader_sm4_read_src_param(struct vkd3d_shader_sm4_parser *priv, cons
switch (swizzle_type) { - case VKD3D_SM4_SWIZZLE_NONE: + case VKD3D_SM4_SWIZZLE_MASK4: if (shader_sm4_is_scalar_register(&src_param->reg)) src_param->swizzle = VKD3D_SHADER_SWIZZLE(X, X, X, X); else @@ -2534,19 +2534,19 @@ bool hlsl_sm4_register_from_semantic(struct hlsl_ctx *ctx, const struct hlsl_sem } register_table[] = { - {"sv_dispatchthreadid", false, VKD3D_SHADER_TYPE_COMPUTE, VKD3D_SM4_SWIZZLE_VEC4, VKD3D_SM5_RT_THREAD_ID, false}, - {"sv_groupid", false, VKD3D_SHADER_TYPE_COMPUTE, VKD3D_SM4_SWIZZLE_VEC4, VKD3D_SM5_RT_THREAD_GROUP_ID, false}, - {"sv_groupthreadid", false, VKD3D_SHADER_TYPE_COMPUTE, VKD3D_SM4_SWIZZLE_VEC4, VKD3D_SM5_RT_LOCAL_THREAD_ID, false}, + {"sv_dispatchthreadid", false, VKD3D_SHADER_TYPE_COMPUTE, VKD3D_SM4_SWIZZLE_VEC4, VKD3D_SM5_RT_THREAD_ID, false}, + {"sv_groupid", false, VKD3D_SHADER_TYPE_COMPUTE, VKD3D_SM4_SWIZZLE_VEC4, VKD3D_SM5_RT_THREAD_GROUP_ID, false}, + {"sv_groupthreadid", false, VKD3D_SHADER_TYPE_COMPUTE, VKD3D_SM4_SWIZZLE_VEC4, VKD3D_SM5_RT_LOCAL_THREAD_ID, false},
- {"sv_primitiveid", false, VKD3D_SHADER_TYPE_GEOMETRY, VKD3D_SM4_SWIZZLE_NONE, VKD3D_SM4_RT_PRIMID, false}, + {"sv_primitiveid", false, VKD3D_SHADER_TYPE_GEOMETRY, VKD3D_SM4_SWIZZLE_MASK4, VKD3D_SM4_RT_PRIMID, false},
/* Put sv_target in this table, instead of letting it fall through to * default varying allocation, so that the register index matches the * usage index. */ - {"color", true, VKD3D_SHADER_TYPE_PIXEL, VKD3D_SM4_SWIZZLE_VEC4, VKD3D_SM4_RT_OUTPUT, true}, - {"depth", true, VKD3D_SHADER_TYPE_PIXEL, VKD3D_SM4_SWIZZLE_VEC4, VKD3D_SM4_RT_DEPTHOUT, false}, - {"sv_depth", true, VKD3D_SHADER_TYPE_PIXEL, VKD3D_SM4_SWIZZLE_VEC4, VKD3D_SM4_RT_DEPTHOUT, false}, - {"sv_target", true, VKD3D_SHADER_TYPE_PIXEL, VKD3D_SM4_SWIZZLE_VEC4, VKD3D_SM4_RT_OUTPUT, true}, + {"color", true, VKD3D_SHADER_TYPE_PIXEL, VKD3D_SM4_SWIZZLE_VEC4, VKD3D_SM4_RT_OUTPUT, true}, + {"depth", true, VKD3D_SHADER_TYPE_PIXEL, VKD3D_SM4_SWIZZLE_VEC4, VKD3D_SM4_RT_DEPTHOUT, false}, + {"sv_depth", true, VKD3D_SHADER_TYPE_PIXEL, VKD3D_SM4_SWIZZLE_VEC4, VKD3D_SM4_RT_DEPTHOUT, false}, + {"sv_target", true, VKD3D_SHADER_TYPE_PIXEL, VKD3D_SM4_SWIZZLE_VEC4, VKD3D_SM4_RT_OUTPUT, true}, };
for (i = 0; i < ARRAY_SIZE(register_table); ++i) @@ -3507,7 +3507,7 @@ static void sm4_register_from_deref(struct hlsl_ctx *ctx, struct sm4_register *r reg->type = VKD3D_SM4_RT_SAMPLER; reg->dim = VKD3D_SM4_DIMENSION_NONE; if (swizzle_type) - *swizzle_type = VKD3D_SM4_SWIZZLE_NONE; + *swizzle_type = VKD3D_SM4_SWIZZLE_MASK4; reg->idx[0] = var->regs[HLSL_REGSET_SAMPLERS].id; reg->idx[0] += hlsl_offset_from_deref_safe(ctx, deref); assert(regset == HLSL_REGSET_SAMPLERS); @@ -3639,7 +3639,7 @@ static void sm4_dst_from_node(struct sm4_dst_register *dst, const struct hlsl_ir static void sm4_src_from_constant_value(struct sm4_src_register *src, const struct hlsl_constant_value *value, unsigned int width, unsigned int map_writemask) { - src->swizzle_type = VKD3D_SM4_SWIZZLE_NONE; + src->swizzle_type = VKD3D_SM4_SWIZZLE_MASK4; src->reg.type = VKD3D_SM4_RT_IMMCONST; if (width == 1) { @@ -4198,7 +4198,7 @@ static void write_sm4_ld(struct hlsl_ctx *ctx, struct vkd3d_bytecode_buffer *buf index = hlsl_ir_constant(sample_index);
memset(&instr.srcs[2], 0, sizeof(instr.srcs[2])); - instr.srcs[2].swizzle_type = VKD3D_SM4_SWIZZLE_NONE; + instr.srcs[2].swizzle_type = VKD3D_SM4_SWIZZLE_MASK4; reg->type = VKD3D_SM4_RT_IMMCONST; reg->dim = VKD3D_SM4_DIMENSION_SCALAR; reg->immconst_uint[0] = index->value.u[0].u; @@ -4317,7 +4317,7 @@ static void write_sm4_cast_from_bool(struct hlsl_ctx *ctx, instr.dst_count = 1;
sm4_src_from_node(&instr.srcs[0], arg, instr.dsts[0].writemask); - instr.srcs[1].swizzle_type = VKD3D_SM4_SWIZZLE_NONE; + instr.srcs[1].swizzle_type = VKD3D_SM4_SWIZZLE_MASK4; instr.srcs[1].reg.type = VKD3D_SM4_RT_IMMCONST; instr.srcs[1].reg.dim = VKD3D_SM4_DIMENSION_SCALAR; instr.srcs[1].reg.immconst_uint[0] = mask;
From: Francisco Casas fcasas@codeweavers.com
--- libs/vkd3d-shader/tpf.c | 88 ++++++++++++++++++++++++----------------- 1 file changed, 51 insertions(+), 37 deletions(-)
diff --git a/libs/vkd3d-shader/tpf.c b/libs/vkd3d-shader/tpf.c index 30ad0825..2264afb8 100644 --- a/libs/vkd3d-shader/tpf.c +++ b/libs/vkd3d-shader/tpf.c @@ -2519,7 +2519,7 @@ static bool type_is_integer(const struct hlsl_type *type) }
bool hlsl_sm4_register_from_semantic(struct hlsl_ctx *ctx, const struct hlsl_semantic *semantic, - bool output, unsigned int *type, enum vkd3d_sm4_swizzle_type *swizzle_type, bool *has_idx) + bool output, enum vkd3d_shader_register_type *type, enum vkd3d_sm4_swizzle_type *swizzle_type, bool *has_idx) { unsigned int i;
@@ -2529,24 +2529,24 @@ bool hlsl_sm4_register_from_semantic(struct hlsl_ctx *ctx, const struct hlsl_sem bool output; enum vkd3d_shader_type shader_type; enum vkd3d_sm4_swizzle_type swizzle_type; - enum vkd3d_sm4_register_type type; + enum vkd3d_shader_register_type type; bool has_idx; } register_table[] = { - {"sv_dispatchthreadid", false, VKD3D_SHADER_TYPE_COMPUTE, VKD3D_SM4_SWIZZLE_VEC4, VKD3D_SM5_RT_THREAD_ID, false}, - {"sv_groupid", false, VKD3D_SHADER_TYPE_COMPUTE, VKD3D_SM4_SWIZZLE_VEC4, VKD3D_SM5_RT_THREAD_GROUP_ID, false}, - {"sv_groupthreadid", false, VKD3D_SHADER_TYPE_COMPUTE, VKD3D_SM4_SWIZZLE_VEC4, VKD3D_SM5_RT_LOCAL_THREAD_ID, false}, + {"sv_dispatchthreadid", false, VKD3D_SHADER_TYPE_COMPUTE, VKD3D_SM4_SWIZZLE_VEC4, VKD3DSPR_THREADID, false}, + {"sv_groupid", false, VKD3D_SHADER_TYPE_COMPUTE, VKD3D_SM4_SWIZZLE_VEC4, VKD3DSPR_THREADGROUPID, false}, + {"sv_groupthreadid", false, VKD3D_SHADER_TYPE_COMPUTE, VKD3D_SM4_SWIZZLE_VEC4, VKD3DSPR_LOCALTHREADID, false},
- {"sv_primitiveid", false, VKD3D_SHADER_TYPE_GEOMETRY, VKD3D_SM4_SWIZZLE_MASK4, VKD3D_SM4_RT_PRIMID, false}, + {"sv_primitiveid", false, VKD3D_SHADER_TYPE_GEOMETRY, VKD3D_SM4_SWIZZLE_MASK4, VKD3DSPR_PRIMID, false},
/* Put sv_target in this table, instead of letting it fall through to * default varying allocation, so that the register index matches the * usage index. */ - {"color", true, VKD3D_SHADER_TYPE_PIXEL, VKD3D_SM4_SWIZZLE_VEC4, VKD3D_SM4_RT_OUTPUT, true}, - {"depth", true, VKD3D_SHADER_TYPE_PIXEL, VKD3D_SM4_SWIZZLE_VEC4, VKD3D_SM4_RT_DEPTHOUT, false}, - {"sv_depth", true, VKD3D_SHADER_TYPE_PIXEL, VKD3D_SM4_SWIZZLE_VEC4, VKD3D_SM4_RT_DEPTHOUT, false}, - {"sv_target", true, VKD3D_SHADER_TYPE_PIXEL, VKD3D_SM4_SWIZZLE_VEC4, VKD3D_SM4_RT_OUTPUT, true}, + {"color", true, VKD3D_SHADER_TYPE_PIXEL, VKD3D_SM4_SWIZZLE_VEC4, VKD3DSPR_OUTPUT, true}, + {"depth", true, VKD3D_SHADER_TYPE_PIXEL, VKD3D_SM4_SWIZZLE_VEC4, VKD3DSPR_DEPTHOUT, false}, + {"sv_depth", true, VKD3D_SHADER_TYPE_PIXEL, VKD3D_SM4_SWIZZLE_VEC4, VKD3DSPR_DEPTHOUT, false}, + {"sv_target", true, VKD3D_SHADER_TYPE_PIXEL, VKD3D_SM4_SWIZZLE_VEC4, VKD3DSPR_OUTPUT, true}, };
for (i = 0; i < ARRAY_SIZE(register_table); ++i) @@ -2555,7 +2555,8 @@ bool hlsl_sm4_register_from_semantic(struct hlsl_ctx *ctx, const struct hlsl_sem && output == register_table[i].output && ctx->profile->type == register_table[i].shader_type) { - *type = register_table[i].type; + if (type) + *type = register_table[i].type; if (swizzle_type) *swizzle_type = register_table[i].swizzle_type; *has_idx = register_table[i].has_idx; @@ -2652,7 +2653,6 @@ static void write_sm4_signature(struct hlsl_ctx *ctx, struct dxbc_writer *dxbc, LIST_FOR_EACH_ENTRY(var, &ctx->extern_vars, struct hlsl_ir_var, extern_entry) { unsigned int width = (1u << var->data_type->dimx) - 1, use_mask; - enum vkd3d_sm4_register_type type; uint32_t usage_idx, reg_idx; D3D_NAME usage; bool has_idx; @@ -2666,14 +2666,13 @@ static void write_sm4_signature(struct hlsl_ctx *ctx, struct dxbc_writer *dxbc, continue; usage_idx = var->semantic.index;
- if (hlsl_sm4_register_from_semantic(ctx, &var->semantic, output, &type, NULL, &has_idx)) + if (hlsl_sm4_register_from_semantic(ctx, &var->semantic, output, NULL, NULL, &has_idx)) { reg_idx = has_idx ? var->semantic.index : ~0u; } else { assert(var->regs[HLSL_REGSET_NUMERIC].allocated); - type = VKD3D_SM4_RT_INPUT; reg_idx = var->regs[HLSL_REGSET_NUMERIC].id; }
@@ -3431,7 +3430,7 @@ static uint32_t sm4_encode_instruction_modifier(const struct sm4_instruction_mod
struct sm4_register { - enum vkd3d_sm4_register_type type; + enum vkd3d_shader_register_type type; uint32_t idx[2]; unsigned int idx_count; enum vkd3d_sm4_dimension dim; @@ -3480,7 +3479,7 @@ static void sm4_register_from_deref(struct hlsl_ctx *ctx, struct sm4_register *r
if (regset == HLSL_REGSET_TEXTURES) { - reg->type = VKD3D_SM4_RT_RESOURCE; + reg->type = VKD3DSPR_RESOURCE; reg->dim = VKD3D_SM4_DIMENSION_VEC4; if (swizzle_type) *swizzle_type = VKD3D_SM4_SWIZZLE_VEC4; @@ -3492,7 +3491,7 @@ static void sm4_register_from_deref(struct hlsl_ctx *ctx, struct sm4_register *r } else if (regset == HLSL_REGSET_UAVS) { - reg->type = VKD3D_SM5_RT_UAV; + reg->type = VKD3DSPR_UAV; reg->dim = VKD3D_SM4_DIMENSION_VEC4; if (swizzle_type) *swizzle_type = VKD3D_SM4_SWIZZLE_VEC4; @@ -3504,7 +3503,7 @@ static void sm4_register_from_deref(struct hlsl_ctx *ctx, struct sm4_register *r } else if (regset == HLSL_REGSET_SAMPLERS) { - reg->type = VKD3D_SM4_RT_SAMPLER; + reg->type = VKD3DSPR_SAMPLER; reg->dim = VKD3D_SM4_DIMENSION_NONE; if (swizzle_type) *swizzle_type = VKD3D_SM4_SWIZZLE_MASK4; @@ -3519,7 +3518,7 @@ static void sm4_register_from_deref(struct hlsl_ctx *ctx, struct sm4_register *r unsigned int offset = hlsl_offset_from_deref_safe(ctx, deref) + var->buffer_offset;
assert(data_type->class <= HLSL_CLASS_VECTOR); - reg->type = VKD3D_SM4_RT_CONSTBUFFER; + reg->type = VKD3DSPR_CONSTBUFFER; reg->dim = VKD3D_SM4_DIMENSION_VEC4; if (swizzle_type) *swizzle_type = VKD3D_SM4_SWIZZLE_VEC4; @@ -3551,7 +3550,7 @@ static void sm4_register_from_deref(struct hlsl_ctx *ctx, struct sm4_register *r struct hlsl_reg hlsl_reg = hlsl_reg_from_deref(ctx, deref);
assert(hlsl_reg.allocated); - reg->type = VKD3D_SM4_RT_INPUT; + reg->type = VKD3DSPR_INPUT; reg->dim = VKD3D_SM4_DIMENSION_VEC4; if (swizzle_type) *swizzle_type = VKD3D_SM4_SWIZZLE_VEC4; @@ -3574,7 +3573,7 @@ static void sm4_register_from_deref(struct hlsl_ctx *ctx, struct sm4_register *r reg->idx_count = 1; }
- if (reg->type == VKD3D_SM4_RT_DEPTHOUT) + if (reg->type == VKD3DSPR_DEPTHOUT) reg->dim = VKD3D_SM4_DIMENSION_SCALAR; else reg->dim = VKD3D_SM4_DIMENSION_VEC4; @@ -3585,7 +3584,7 @@ static void sm4_register_from_deref(struct hlsl_ctx *ctx, struct sm4_register *r struct hlsl_reg hlsl_reg = hlsl_reg_from_deref(ctx, deref);
assert(hlsl_reg.allocated); - reg->type = VKD3D_SM4_RT_OUTPUT; + reg->type = VKD3DSPR_OUTPUT; reg->dim = VKD3D_SM4_DIMENSION_VEC4; reg->idx[0] = hlsl_reg.id; reg->idx_count = 1; @@ -3597,7 +3596,7 @@ static void sm4_register_from_deref(struct hlsl_ctx *ctx, struct sm4_register *r struct hlsl_reg hlsl_reg = hlsl_reg_from_deref(ctx, deref);
assert(hlsl_reg.allocated); - reg->type = VKD3D_SM4_RT_TEMP; + reg->type = VKD3DSPR_TEMP; reg->dim = VKD3D_SM4_DIMENSION_VEC4; if (swizzle_type) *swizzle_type = VKD3D_SM4_SWIZZLE_VEC4; @@ -3621,7 +3620,7 @@ static void sm4_register_from_node(struct sm4_register *reg, unsigned int *write enum vkd3d_sm4_swizzle_type *swizzle_type, const struct hlsl_ir_node *instr) { assert(instr->reg.allocated); - reg->type = VKD3D_SM4_RT_TEMP; + reg->type = VKD3DSPR_TEMP; reg->dim = VKD3D_SM4_DIMENSION_VEC4; *swizzle_type = VKD3D_SM4_SWIZZLE_VEC4; reg->idx[0] = instr->reg.id; @@ -3640,7 +3639,7 @@ static void sm4_src_from_constant_value(struct sm4_src_register *src, const struct hlsl_constant_value *value, unsigned int width, unsigned int map_writemask) { src->swizzle_type = VKD3D_SM4_SWIZZLE_MASK4; - src->reg.type = VKD3D_SM4_RT_IMMCONST; + src->reg.type = VKD3DSPR_IMMCONST; if (width == 1) { src->reg.dim = VKD3D_SM4_DIMENSION_SCALAR; @@ -3679,7 +3678,22 @@ static void sm4_src_from_node(struct sm4_src_register *src,
static uint32_t sm4_encode_register(const struct sm4_register *reg) { - return (reg->type << VKD3D_SM4_REGISTER_TYPE_SHIFT) + uint32_t sm4_reg_type = ~0u; + unsigned int i; + + /* Find sm4 register type from vkd3d_shader_register_type. */ + for (i = 0; i < ARRAY_SIZE(register_type_table); ++i) + { + if (reg->type == register_type_table[i]) + { + if (sm4_reg_type != ~0u) + ERR("Multiple maps for register_type %#x.\n", reg->type); + sm4_reg_type = i; + } + } + assert(sm4_reg_type != ~0u); + + return (sm4_reg_type << VKD3D_SM4_REGISTER_TYPE_SHIFT) | (reg->idx_count << VKD3D_SM4_REGISTER_ORDER_SHIFT) | (reg->dim << VKD3D_SM4_DIMENSION_SHIFT); } @@ -3687,7 +3701,7 @@ static uint32_t sm4_encode_register(const struct sm4_register *reg) static uint32_t sm4_register_order(const struct sm4_register *reg) { uint32_t order = 1; - if (reg->type == VKD3D_SM4_RT_IMMCONST) + if (reg->type == VKD3DSPR_IMMCONST) order += reg->dim == VKD3D_SM4_DIMENSION_VEC4 ? 4 : 1; order += reg->idx_count; if (reg->mod) @@ -3750,7 +3764,7 @@ static void write_sm4_instruction(struct vkd3d_bytecode_buffer *buffer, const st for (j = 0; j < instr->srcs[i].reg.idx_count; ++j) put_u32(buffer, instr->srcs[i].reg.idx[j]);
- if (instr->srcs[i].reg.type == VKD3D_SM4_RT_IMMCONST) + if (instr->srcs[i].reg.type == VKD3DSPR_IMMCONST) { put_u32(buffer, instr->srcs[i].reg.immconst_uint[0]); if (instr->srcs[i].reg.dim == VKD3D_SM4_DIMENSION_VEC4) @@ -3803,7 +3817,7 @@ static void write_sm4_dcl_constant_buffer(struct vkd3d_bytecode_buffer *buffer, .opcode = VKD3D_SM4_OP_DCL_CONSTANT_BUFFER,
.srcs[0].reg.dim = VKD3D_SM4_DIMENSION_VEC4, - .srcs[0].reg.type = VKD3D_SM4_RT_CONSTBUFFER, + .srcs[0].reg.type = VKD3DSPR_CONSTBUFFER, .srcs[0].reg.idx = {cbuffer->reg.id, (cbuffer->used_size + 3) / 4}, .srcs[0].reg.idx_count = 2, .srcs[0].swizzle_type = VKD3D_SM4_SWIZZLE_VEC4, @@ -3822,7 +3836,7 @@ static void write_sm4_dcl_samplers(struct hlsl_ctx *ctx, struct vkd3d_bytecode_b { .opcode = VKD3D_SM4_OP_DCL_SAMPLER,
- .dsts[0].reg.type = VKD3D_SM4_RT_SAMPLER, + .dsts[0].reg.type = VKD3DSPR_SAMPLER, .dsts[0].reg.idx_count = 1, .dst_count = 1, }; @@ -3863,7 +3877,7 @@ static void write_sm4_dcl_textures(struct hlsl_ctx *ctx, struct vkd3d_bytecode_b
instr = (struct sm4_instruction) { - .dsts[0].reg.type = uav ? VKD3D_SM5_RT_UAV : VKD3D_SM4_RT_RESOURCE, + .dsts[0].reg.type = uav ? VKD3DSPR_UAV : VKD3DSPR_RESOURCE, .dsts[0].reg.idx = {resource->id + i}, .dsts[0].reg.idx_count = 1, .dst_count = 1, @@ -3929,13 +3943,13 @@ static void write_sm4_dcl_semantic(struct hlsl_ctx *ctx, struct vkd3d_bytecode_b } else { - instr.dsts[0].reg.type = output ? VKD3D_SM4_RT_OUTPUT : VKD3D_SM4_RT_INPUT; + instr.dsts[0].reg.type = output ? VKD3DSPR_OUTPUT : VKD3DSPR_INPUT; instr.dsts[0].reg.idx[0] = var->regs[HLSL_REGSET_NUMERIC].id; instr.dsts[0].reg.idx_count = 1; instr.dsts[0].writemask = var->regs[HLSL_REGSET_NUMERIC].writemask; }
- if (instr.dsts[0].reg.type == VKD3D_SM4_RT_DEPTHOUT) + if (instr.dsts[0].reg.type == VKD3DSPR_DEPTHOUT) instr.dsts[0].reg.dim = VKD3D_SM4_DIMENSION_SCALAR;
hlsl_sm4_usage_from_semantic(ctx, &var->semantic, output, &usage); @@ -4067,7 +4081,7 @@ static void write_sm4_unary_op_with_two_destinations(struct vkd3d_bytecode_buffe assert(dst_idx < ARRAY_SIZE(instr.dsts)); sm4_dst_from_node(&instr.dsts[dst_idx], dst); assert(1 - dst_idx >= 0); - instr.dsts[1 - dst_idx].reg.type = VKD3D_SM4_RT_NULL; + instr.dsts[1 - dst_idx].reg.type = VKD3DSPR_NULL; instr.dsts[1 - dst_idx].reg.dim = VKD3D_SM4_DIMENSION_NONE; instr.dsts[1 - dst_idx].reg.idx_count = 0; instr.dst_count = 2; @@ -4127,7 +4141,7 @@ static void write_sm4_binary_op_with_two_destinations(struct vkd3d_bytecode_buff assert(dst_idx < ARRAY_SIZE(instr.dsts)); sm4_dst_from_node(&instr.dsts[dst_idx], dst); assert(1 - dst_idx >= 0); - instr.dsts[1 - dst_idx].reg.type = VKD3D_SM4_RT_NULL; + instr.dsts[1 - dst_idx].reg.type = VKD3DSPR_NULL; instr.dsts[1 - dst_idx].reg.dim = VKD3D_SM4_DIMENSION_NONE; instr.dsts[1 - dst_idx].reg.idx_count = 0; instr.dst_count = 2; @@ -4199,7 +4213,7 @@ static void write_sm4_ld(struct hlsl_ctx *ctx, struct vkd3d_bytecode_buffer *buf
memset(&instr.srcs[2], 0, sizeof(instr.srcs[2])); instr.srcs[2].swizzle_type = VKD3D_SM4_SWIZZLE_MASK4; - reg->type = VKD3D_SM4_RT_IMMCONST; + reg->type = VKD3DSPR_IMMCONST; reg->dim = VKD3D_SM4_DIMENSION_SCALAR; reg->immconst_uint[0] = index->value.u[0].u; } @@ -4318,7 +4332,7 @@ static void write_sm4_cast_from_bool(struct hlsl_ctx *ctx,
sm4_src_from_node(&instr.srcs[0], arg, instr.dsts[0].writemask); instr.srcs[1].swizzle_type = VKD3D_SM4_SWIZZLE_MASK4; - instr.srcs[1].reg.type = VKD3D_SM4_RT_IMMCONST; + instr.srcs[1].reg.type = VKD3DSPR_IMMCONST; instr.srcs[1].reg.dim = VKD3D_SM4_DIMENSION_SCALAR; instr.srcs[1].reg.immconst_uint[0] = mask; instr.src_count = 2;
From: Francisco Casas fcasas@codeweavers.com
--- libs/vkd3d-shader/tpf.c | 55 ++++++++++++++++++++++++----------------- 1 file changed, 32 insertions(+), 23 deletions(-)
diff --git a/libs/vkd3d-shader/tpf.c b/libs/vkd3d-shader/tpf.c index 2264afb8..d5937beb 100644 --- a/libs/vkd3d-shader/tpf.c +++ b/libs/vkd3d-shader/tpf.c @@ -3431,7 +3431,7 @@ static uint32_t sm4_encode_instruction_modifier(const struct sm4_instruction_mod struct sm4_register { enum vkd3d_shader_register_type type; - uint32_t idx[2]; + struct vkd3d_shader_register_index idx[2]; unsigned int idx_count; enum vkd3d_sm4_dimension dim; uint32_t immconst_uint[4]; @@ -3483,8 +3483,8 @@ static void sm4_register_from_deref(struct hlsl_ctx *ctx, struct sm4_register *r reg->dim = VKD3D_SM4_DIMENSION_VEC4; if (swizzle_type) *swizzle_type = VKD3D_SM4_SWIZZLE_VEC4; - reg->idx[0] = var->regs[HLSL_REGSET_TEXTURES].id; - reg->idx[0] += hlsl_offset_from_deref_safe(ctx, deref); + reg->idx[0].offset = var->regs[HLSL_REGSET_TEXTURES].id; + reg->idx[0].offset += hlsl_offset_from_deref_safe(ctx, deref); assert(regset == HLSL_REGSET_TEXTURES); reg->idx_count = 1; *writemask = VKD3DSP_WRITEMASK_ALL; @@ -3495,8 +3495,8 @@ static void sm4_register_from_deref(struct hlsl_ctx *ctx, struct sm4_register *r reg->dim = VKD3D_SM4_DIMENSION_VEC4; if (swizzle_type) *swizzle_type = VKD3D_SM4_SWIZZLE_VEC4; - reg->idx[0] = var->regs[HLSL_REGSET_UAVS].id; - reg->idx[0] += hlsl_offset_from_deref_safe(ctx, deref); + reg->idx[0].offset = var->regs[HLSL_REGSET_UAVS].id; + reg->idx[0].offset += hlsl_offset_from_deref_safe(ctx, deref); assert(regset == HLSL_REGSET_UAVS); reg->idx_count = 1; *writemask = VKD3DSP_WRITEMASK_ALL; @@ -3507,8 +3507,8 @@ static void sm4_register_from_deref(struct hlsl_ctx *ctx, struct sm4_register *r reg->dim = VKD3D_SM4_DIMENSION_NONE; if (swizzle_type) *swizzle_type = VKD3D_SM4_SWIZZLE_MASK4; - reg->idx[0] = var->regs[HLSL_REGSET_SAMPLERS].id; - reg->idx[0] += hlsl_offset_from_deref_safe(ctx, deref); + reg->idx[0].offset = var->regs[HLSL_REGSET_SAMPLERS].id; + reg->idx[0].offset += hlsl_offset_from_deref_safe(ctx, deref); assert(regset == HLSL_REGSET_SAMPLERS); reg->idx_count = 1; *writemask = VKD3DSP_WRITEMASK_ALL; @@ -3522,8 +3522,8 @@ static void sm4_register_from_deref(struct hlsl_ctx *ctx, struct sm4_register *r reg->dim = VKD3D_SM4_DIMENSION_VEC4; if (swizzle_type) *swizzle_type = VKD3D_SM4_SWIZZLE_VEC4; - reg->idx[0] = var->buffer->reg.id; - reg->idx[1] = offset / 4; + reg->idx[0].offset = var->buffer->reg.id; + reg->idx[1].offset = offset / 4; reg->idx_count = 2; *writemask = ((1u << data_type->dimx) - 1) << (offset & 3); } @@ -3538,7 +3538,7 @@ static void sm4_register_from_deref(struct hlsl_ctx *ctx, struct sm4_register *r
if (has_idx) { - reg->idx[0] = var->semantic.index + offset / 4; + reg->idx[0].offset = var->semantic.index + offset / 4; reg->idx_count = 1; }
@@ -3554,7 +3554,7 @@ static void sm4_register_from_deref(struct hlsl_ctx *ctx, struct sm4_register *r reg->dim = VKD3D_SM4_DIMENSION_VEC4; if (swizzle_type) *swizzle_type = VKD3D_SM4_SWIZZLE_VEC4; - reg->idx[0] = hlsl_reg.id; + reg->idx[0].offset = hlsl_reg.id; reg->idx_count = 1; *writemask = hlsl_reg.writemask; } @@ -3569,7 +3569,7 @@ static void sm4_register_from_deref(struct hlsl_ctx *ctx, struct sm4_register *r
if (has_idx) { - reg->idx[0] = var->semantic.index + offset / 4; + reg->idx[0].offset = var->semantic.index + offset / 4; reg->idx_count = 1; }
@@ -3586,7 +3586,7 @@ static void sm4_register_from_deref(struct hlsl_ctx *ctx, struct sm4_register *r assert(hlsl_reg.allocated); reg->type = VKD3DSPR_OUTPUT; reg->dim = VKD3D_SM4_DIMENSION_VEC4; - reg->idx[0] = hlsl_reg.id; + reg->idx[0].offset = hlsl_reg.id; reg->idx_count = 1; *writemask = hlsl_reg.writemask; } @@ -3600,7 +3600,7 @@ static void sm4_register_from_deref(struct hlsl_ctx *ctx, struct sm4_register *r reg->dim = VKD3D_SM4_DIMENSION_VEC4; if (swizzle_type) *swizzle_type = VKD3D_SM4_SWIZZLE_VEC4; - reg->idx[0] = hlsl_reg.id; + reg->idx[0].offset = hlsl_reg.id; reg->idx_count = 1; *writemask = hlsl_reg.writemask; } @@ -3623,7 +3623,7 @@ static void sm4_register_from_node(struct sm4_register *reg, unsigned int *write reg->type = VKD3DSPR_TEMP; reg->dim = VKD3D_SM4_DIMENSION_VEC4; *swizzle_type = VKD3D_SM4_SWIZZLE_VEC4; - reg->idx[0] = instr->reg.id; + reg->idx[0].offset = instr->reg.id; reg->idx_count = 1; *writemask = instr->reg.writemask; } @@ -3745,7 +3745,10 @@ static void write_sm4_instruction(struct vkd3d_bytecode_buffer *buffer, const st put_u32(buffer, token);
for (j = 0; j < instr->dsts[i].reg.idx_count; ++j) - put_u32(buffer, instr->dsts[i].reg.idx[j]); + { + put_u32(buffer, instr->dsts[i].reg.idx[j].offset); + assert(!instr->dsts[i].reg.idx[j].rel_addr); + } }
for (i = 0; i < instr->src_count; ++i) @@ -3762,7 +3765,10 @@ static void write_sm4_instruction(struct vkd3d_bytecode_buffer *buffer, const st | VKD3D_SM4_EXTENDED_OPERAND_MODIFIER);
for (j = 0; j < instr->srcs[i].reg.idx_count; ++j) - put_u32(buffer, instr->srcs[i].reg.idx[j]); + { + put_u32(buffer, instr->srcs[i].reg.idx[j].offset); + assert(!instr->srcs[i].reg.idx[j].rel_addr); + }
if (instr->srcs[i].reg.type == VKD3DSPR_IMMCONST) { @@ -3818,7 +3824,8 @@ static void write_sm4_dcl_constant_buffer(struct vkd3d_bytecode_buffer *buffer,
.srcs[0].reg.dim = VKD3D_SM4_DIMENSION_VEC4, .srcs[0].reg.type = VKD3DSPR_CONSTBUFFER, - .srcs[0].reg.idx = {cbuffer->reg.id, (cbuffer->used_size + 3) / 4}, + .srcs[0].reg.idx[0].offset = cbuffer->reg.id, + .srcs[0].reg.idx[1].offset = (cbuffer->used_size + 3) / 4, .srcs[0].reg.idx_count = 2, .srcs[0].swizzle_type = VKD3D_SM4_SWIZZLE_VEC4, .srcs[0].swizzle = HLSL_SWIZZLE(X, Y, Z, W), @@ -3853,7 +3860,7 @@ static void write_sm4_dcl_samplers(struct hlsl_ctx *ctx, struct vkd3d_bytecode_b if (resource->var && !resource->var->objects_usage[HLSL_REGSET_SAMPLERS][i].used) continue;
- instr.dsts[0].reg.idx[0] = resource->id + i; + instr.dsts[0].reg.idx[0].offset = resource->id + i; write_sm4_instruction(buffer, &instr); } } @@ -3878,7 +3885,7 @@ static void write_sm4_dcl_textures(struct hlsl_ctx *ctx, struct vkd3d_bytecode_b instr = (struct sm4_instruction) { .dsts[0].reg.type = uav ? VKD3DSPR_UAV : VKD3DSPR_RESOURCE, - .dsts[0].reg.idx = {resource->id + i}, + .dsts[0].reg.idx[0].offset = resource->id + i, .dsts[0].reg.idx_count = 1, .dst_count = 1,
@@ -3932,7 +3939,7 @@ static void write_sm4_dcl_semantic(struct hlsl_ctx *ctx, struct vkd3d_bytecode_b { if (has_idx) { - instr.dsts[0].reg.idx[0] = var->semantic.index; + instr.dsts[0].reg.idx[0].offset = var->semantic.index; instr.dsts[0].reg.idx_count = 1; } else @@ -3944,7 +3951,7 @@ static void write_sm4_dcl_semantic(struct hlsl_ctx *ctx, struct vkd3d_bytecode_b else { instr.dsts[0].reg.type = output ? VKD3DSPR_OUTPUT : VKD3DSPR_INPUT; - instr.dsts[0].reg.idx[0] = var->regs[HLSL_REGSET_NUMERIC].id; + instr.dsts[0].reg.idx[0].offset = var->regs[HLSL_REGSET_NUMERIC].id; instr.dsts[0].reg.idx_count = 1; instr.dsts[0].writemask = var->regs[HLSL_REGSET_NUMERIC].writemask; } @@ -4034,7 +4041,9 @@ static void write_sm4_dcl_thread_group(struct vkd3d_bytecode_buffer *buffer, con { .opcode = VKD3D_SM5_OP_DCL_THREAD_GROUP,
- .idx = {thread_count[0], thread_count[1], thread_count[2]}, + .idx[0] = thread_count[0], + .idx[1] = thread_count[1], + .idx[2] = thread_count[2], .idx_count = 3, };
On Fri Jul 21 00:52:08 2023 +0000, Francisco Casas wrote:
changed this line in [version 2 of the diff](/wine/vkd3d/-/merge_requests/281/diffs?diff_id=58577&start_sha=0ba7cf7e7d71559c2089db2e5df8d766f6bb823b#b9a696b7e2f161c9052ff1159d7d8cb2907cdeae_1550_1555)
I see. I did it in accordance with `get_opcode_info()`. I also changed it now.
From: Francisco Casas fcasas@codeweavers.com
--- libs/vkd3d-shader/tpf.c | 164 +++++++++++++++-------- libs/vkd3d-shader/vkd3d_shader_private.h | 2 + 2 files changed, 107 insertions(+), 59 deletions(-)
diff --git a/libs/vkd3d-shader/tpf.c b/libs/vkd3d-shader/tpf.c index d5937beb..8a1de888 100644 --- a/libs/vkd3d-shader/tpf.c +++ b/libs/vkd3d-shader/tpf.c @@ -430,6 +430,8 @@ enum vkd3d_sm4_register_type VKD3D_SM5_RT_DEPTHOUT_GREATER_EQUAL = 0x26, VKD3D_SM5_RT_DEPTHOUT_LESS_EQUAL = 0x27, VKD3D_SM5_RT_OUTPUT_STENCIL_REF = 0x29, + + VKD3D_SM4_REGISTER_TYPE_LAST = VKD3D_SM5_RT_OUTPUT_STENCIL_REF, };
enum vkd3d_sm4_extended_operand_type @@ -1468,50 +1470,50 @@ static const struct vkd3d_sm4_opcode_info opcode_table[] = {VKD3D_SM5_OP_CHECK_ACCESS_FULLY_MAPPED, VKD3DSIH_CHECK_ACCESS_FULLY_MAPPED, "u", "u"}, };
-static const enum vkd3d_shader_register_type register_type_table[] = -{ - /* VKD3D_SM4_RT_TEMP */ VKD3DSPR_TEMP, - /* VKD3D_SM4_RT_INPUT */ VKD3DSPR_INPUT, - /* VKD3D_SM4_RT_OUTPUT */ VKD3DSPR_OUTPUT, - /* VKD3D_SM4_RT_INDEXABLE_TEMP */ VKD3DSPR_IDXTEMP, - /* VKD3D_SM4_RT_IMMCONST */ VKD3DSPR_IMMCONST, - /* VKD3D_SM4_RT_IMMCONST64 */ VKD3DSPR_IMMCONST64, - /* VKD3D_SM4_RT_SAMPLER */ VKD3DSPR_SAMPLER, - /* VKD3D_SM4_RT_RESOURCE */ VKD3DSPR_RESOURCE, - /* VKD3D_SM4_RT_CONSTBUFFER */ VKD3DSPR_CONSTBUFFER, - /* VKD3D_SM4_RT_IMMCONSTBUFFER */ VKD3DSPR_IMMCONSTBUFFER, - /* UNKNOWN */ ~0u, - /* VKD3D_SM4_RT_PRIMID */ VKD3DSPR_PRIMID, - /* VKD3D_SM4_RT_DEPTHOUT */ VKD3DSPR_DEPTHOUT, - /* VKD3D_SM4_RT_NULL */ VKD3DSPR_NULL, - /* VKD3D_SM4_RT_RASTERIZER */ VKD3DSPR_RASTERIZER, - /* VKD3D_SM4_RT_OMASK */ VKD3DSPR_SAMPLEMASK, - /* VKD3D_SM5_RT_STREAM */ VKD3DSPR_STREAM, - /* VKD3D_SM5_RT_FUNCTION_BODY */ VKD3DSPR_FUNCTIONBODY, - /* UNKNOWN */ ~0u, - /* VKD3D_SM5_RT_FUNCTION_POINTER */ VKD3DSPR_FUNCTIONPOINTER, - /* UNKNOWN */ ~0u, - /* UNKNOWN */ ~0u, - /* VKD3D_SM5_RT_OUTPUT_CONTROL_POINT_ID */ VKD3DSPR_OUTPOINTID, - /* VKD3D_SM5_RT_FORK_INSTANCE_ID */ VKD3DSPR_FORKINSTID, - /* VKD3D_SM5_RT_JOIN_INSTANCE_ID */ VKD3DSPR_JOININSTID, - /* VKD3D_SM5_RT_INPUT_CONTROL_POINT */ VKD3DSPR_INCONTROLPOINT, - /* VKD3D_SM5_RT_OUTPUT_CONTROL_POINT */ VKD3DSPR_OUTCONTROLPOINT, - /* VKD3D_SM5_RT_PATCH_CONSTANT_DATA */ VKD3DSPR_PATCHCONST, - /* VKD3D_SM5_RT_DOMAIN_LOCATION */ VKD3DSPR_TESSCOORD, - /* UNKNOWN */ ~0u, - /* VKD3D_SM5_RT_UAV */ VKD3DSPR_UAV, - /* VKD3D_SM5_RT_SHARED_MEMORY */ VKD3DSPR_GROUPSHAREDMEM, - /* VKD3D_SM5_RT_THREAD_ID */ VKD3DSPR_THREADID, - /* VKD3D_SM5_RT_THREAD_GROUP_ID */ VKD3DSPR_THREADGROUPID, - /* VKD3D_SM5_RT_LOCAL_THREAD_ID */ VKD3DSPR_LOCALTHREADID, - /* VKD3D_SM5_RT_COVERAGE */ VKD3DSPR_COVERAGE, - /* VKD3D_SM5_RT_LOCAL_THREAD_INDEX */ VKD3DSPR_LOCALTHREADINDEX, - /* VKD3D_SM5_RT_GS_INSTANCE_ID */ VKD3DSPR_GSINSTID, - /* VKD3D_SM5_RT_DEPTHOUT_GREATER_EQUAL */ VKD3DSPR_DEPTHOUTGE, - /* VKD3D_SM5_RT_DEPTHOUT_LESS_EQUAL */ VKD3DSPR_DEPTHOUTLE, - /* VKD3D_SM5_RT_CYCLE_COUNTER */ ~0u, - /* VKD3D_SM5_RT_OUTPUT_STENCIL_REF */ VKD3DSPR_OUTSTENCILREF, +struct vkd3d_sm4_register_type_info +{ + enum vkd3d_sm4_register_type sm4_type; + enum vkd3d_shader_register_type vkd3d_type; +}; + +static const struct vkd3d_sm4_register_type_info register_type_table[] = +{ + {VKD3D_SM4_RT_TEMP, VKD3DSPR_TEMP}, + {VKD3D_SM4_RT_INPUT, VKD3DSPR_INPUT}, + {VKD3D_SM4_RT_OUTPUT, VKD3DSPR_OUTPUT}, + {VKD3D_SM4_RT_INDEXABLE_TEMP, VKD3DSPR_IDXTEMP}, + {VKD3D_SM4_RT_IMMCONST, VKD3DSPR_IMMCONST}, + {VKD3D_SM4_RT_IMMCONST64, VKD3DSPR_IMMCONST64}, + {VKD3D_SM4_RT_SAMPLER, VKD3DSPR_SAMPLER}, + {VKD3D_SM4_RT_RESOURCE, VKD3DSPR_RESOURCE}, + {VKD3D_SM4_RT_CONSTBUFFER, VKD3DSPR_CONSTBUFFER}, + {VKD3D_SM4_RT_IMMCONSTBUFFER, VKD3DSPR_IMMCONSTBUFFER}, + {VKD3D_SM4_RT_PRIMID, VKD3DSPR_PRIMID}, + {VKD3D_SM4_RT_DEPTHOUT, VKD3DSPR_DEPTHOUT}, + {VKD3D_SM4_RT_NULL, VKD3DSPR_NULL}, + {VKD3D_SM4_RT_RASTERIZER, VKD3DSPR_RASTERIZER}, + {VKD3D_SM4_RT_OMASK, VKD3DSPR_SAMPLEMASK}, + {VKD3D_SM5_RT_STREAM, VKD3DSPR_STREAM}, + {VKD3D_SM5_RT_FUNCTION_BODY, VKD3DSPR_FUNCTIONBODY}, + {VKD3D_SM5_RT_FUNCTION_POINTER, VKD3DSPR_FUNCTIONPOINTER}, + {VKD3D_SM5_RT_OUTPUT_CONTROL_POINT_ID, VKD3DSPR_OUTPOINTID}, + {VKD3D_SM5_RT_FORK_INSTANCE_ID, VKD3DSPR_FORKINSTID}, + {VKD3D_SM5_RT_JOIN_INSTANCE_ID, VKD3DSPR_JOININSTID}, + {VKD3D_SM5_RT_INPUT_CONTROL_POINT, VKD3DSPR_INCONTROLPOINT}, + {VKD3D_SM5_RT_OUTPUT_CONTROL_POINT, VKD3DSPR_OUTCONTROLPOINT}, + {VKD3D_SM5_RT_PATCH_CONSTANT_DATA, VKD3DSPR_PATCHCONST}, + {VKD3D_SM5_RT_DOMAIN_LOCATION, VKD3DSPR_TESSCOORD}, + {VKD3D_SM5_RT_UAV, VKD3DSPR_UAV}, + {VKD3D_SM5_RT_SHARED_MEMORY, VKD3DSPR_GROUPSHAREDMEM}, + {VKD3D_SM5_RT_THREAD_ID, VKD3DSPR_THREADID}, + {VKD3D_SM5_RT_THREAD_GROUP_ID, VKD3DSPR_THREADGROUPID}, + {VKD3D_SM5_RT_LOCAL_THREAD_ID, VKD3DSPR_LOCALTHREADID}, + {VKD3D_SM5_RT_COVERAGE, VKD3DSPR_COVERAGE}, + {VKD3D_SM5_RT_LOCAL_THREAD_INDEX, VKD3DSPR_LOCALTHREADINDEX}, + {VKD3D_SM5_RT_GS_INSTANCE_ID, VKD3DSPR_GSINSTID}, + {VKD3D_SM5_RT_DEPTHOUT_GREATER_EQUAL, VKD3DSPR_DEPTHOUTGE}, + {VKD3D_SM5_RT_DEPTHOUT_LESS_EQUAL, VKD3DSPR_DEPTHOUTLE}, + {VKD3D_SM5_RT_OUTPUT_STENCIL_REF, VKD3DSPR_OUTSTENCILREF}, };
static const enum vkd3d_shader_register_precision register_precision_table[] = @@ -1530,12 +1532,55 @@ static const struct vkd3d_sm4_opcode_info *get_opcode_info(enum vkd3d_sm4_opcode
for (i = 0; i < sizeof(opcode_table) / sizeof(*opcode_table); ++i) { - if (opcode == opcode_table[i].opcode) return &opcode_table[i]; + if (opcode == opcode_table[i].opcode) + return &opcode_table[i]; }
return NULL; }
+static const struct vkd3d_sm4_register_type_info *get_info_from_sm4_register_type( + enum vkd3d_sm4_register_type sm4_type) +{ + static const struct vkd3d_sm4_register_type_info *lookup[VKD3D_SM4_REGISTER_TYPE_LAST + 1] = {}; + static bool computed_lookup = false; + unsigned int i, t; + + if (!computed_lookup) + { + for (i = 0; i < ARRAY_SIZE(register_type_table); ++i) + { + t = register_type_table[i].sm4_type; + lookup[t] = ®ister_type_table[i]; + } + computed_lookup = true; + } + + assert(sm4_type <= VKD3D_SM4_REGISTER_TYPE_LAST); + return lookup[sm4_type]; +} + +static const struct vkd3d_sm4_register_type_info *get_info_from_vkd3d_register_type( + enum vkd3d_shader_register_type vkd3d_type) +{ + static const struct vkd3d_sm4_register_type_info *lookup[VKD3DSPR_REGISTER_TYPE_LAST + 1] = {}; + static bool computed_lookup = false; + unsigned int i, t; + + if (!computed_lookup) + { + for (i = 0; i < ARRAY_SIZE(register_type_table); ++i) + { + t = register_type_table[i].vkd3d_type; + lookup[t] = ®ister_type_table[i]; + } + computed_lookup = true; + } + + assert(vkd3d_type <= VKD3DSPR_REGISTER_TYPE_LAST); + return lookup[vkd3d_type]; +} + static void map_register(const struct vkd3d_shader_sm4_parser *sm4, struct vkd3d_shader_register *reg) { switch (sm4->p.shader_version.type) @@ -1642,6 +1687,7 @@ static bool sm4_register_is_descriptor(enum vkd3d_sm4_register_type register_typ static bool shader_sm4_read_param(struct vkd3d_shader_sm4_parser *priv, const uint32_t **ptr, const uint32_t *end, enum vkd3d_data_type data_type, struct vkd3d_shader_register *param, enum vkd3d_shader_src_modifier *modifier) { + const struct vkd3d_sm4_register_type_info *register_type_info; enum vkd3d_sm4_register_precision precision; enum vkd3d_sm4_register_type register_type; enum vkd3d_sm4_extended_operand_type type; @@ -1656,15 +1702,15 @@ static bool shader_sm4_read_param(struct vkd3d_shader_sm4_parser *priv, const ui token = *(*ptr)++;
register_type = (token & VKD3D_SM4_REGISTER_TYPE_MASK) >> VKD3D_SM4_REGISTER_TYPE_SHIFT; - if (register_type >= ARRAY_SIZE(register_type_table) - || register_type_table[register_type] == VKD3DSPR_INVALID) + register_type_info = get_info_from_sm4_register_type(register_type); + if (!register_type_info) { FIXME("Unhandled register type %#x.\n", register_type); param->type = VKD3DSPR_TEMP; } else { - param->type = register_type_table[register_type]; + param->type = register_type_info->vkd3d_type; } param->precision = VKD3D_SHADER_REGISTER_PRECISION_DEFAULT; param->non_uniform = false; @@ -3678,20 +3724,20 @@ static void sm4_src_from_node(struct sm4_src_register *src,
static uint32_t sm4_encode_register(const struct sm4_register *reg) { - uint32_t sm4_reg_type = ~0u; - unsigned int i; + const struct vkd3d_sm4_register_type_info *register_type_info; + uint32_t sm4_reg_type; + + register_type_info = get_info_from_vkd3d_register_type(reg->type);
- /* Find sm4 register type from vkd3d_shader_register_type. */ - for (i = 0; i < ARRAY_SIZE(register_type_table); ++i) + if (!register_type_info) { - if (reg->type == register_type_table[i]) - { - if (sm4_reg_type != ~0u) - ERR("Multiple maps for register_type %#x.\n", reg->type); - sm4_reg_type = i; - } + FIXME("Unhandled vkd3d-shader register type %#x.\n", reg->type); + sm4_reg_type = VKD3D_SM4_RT_TEMP; + } + else + { + sm4_reg_type = register_type_info->sm4_type; } - assert(sm4_reg_type != ~0u);
return (sm4_reg_type << VKD3D_SM4_REGISTER_TYPE_SHIFT) | (reg->idx_count << VKD3D_SM4_REGISTER_ORDER_SHIFT) diff --git a/libs/vkd3d-shader/vkd3d_shader_private.h b/libs/vkd3d-shader/vkd3d_shader_private.h index 49ad7b3e..48561fa0 100644 --- a/libs/vkd3d-shader/vkd3d_shader_private.h +++ b/libs/vkd3d-shader/vkd3d_shader_private.h @@ -508,6 +508,8 @@ enum vkd3d_shader_register_type VKD3DSPR_RASTERIZER, VKD3DSPR_OUTSTENCILREF,
+ VKD3DSPR_REGISTER_TYPE_LAST = VKD3DSPR_OUTSTENCILREF, + VKD3DSPR_INVALID = ~0u, };
On Thu Jul 20 11:01:57 2023 +0000, Giovanni Mascellani wrote:
Here you're ignoring `rel_addr`, which I guess is correct because at this stage it's never expected to be set, is that right? If so, please add an `assert()`, even if it's temporary. Same thing just below.
Yep, for now it is expected to remain NULL, until we implement non-constant offset dereferences. I added the asserts.
On Fri Jul 21 00:52:09 2023 +0000, Francisco Casas wrote:
changed this line in [version 2 of the diff](/wine/vkd3d/-/merge_requests/281/diffs?diff_id=58577&start_sha=0ba7cf7e7d71559c2089db2e5df8d766f6bb823b#b9a696b7e2f161c9052ff1159d7d8cb2907cdeae_1692_1710)
Right, at least so far we have a `vkd3d_shader_register_type` for each `vkd3d_sm4_register_type` in the register_type_table. I removed the check.
For next time, please make this two different commits.
Okay, splited.
Also, it would seem that `sm4_encode_register()` might remain as a useful helper, doesn't it?
I don't think so. It wouldn't spare us from calling get_register_type_info_from_handler() (and the pertaining checks) in both sm4_write_dst_register() and sm4_write_src_register() in later patches.
From: Francisco Casas fcasas@codeweavers.com
--- libs/vkd3d-shader/tpf.c | 84 +++++++++++++++++++++++++++-------------- 1 file changed, 55 insertions(+), 29 deletions(-)
diff --git a/libs/vkd3d-shader/tpf.c b/libs/vkd3d-shader/tpf.c index 8a1de888..a8d78f31 100644 --- a/libs/vkd3d-shader/tpf.c +++ b/libs/vkd3d-shader/tpf.c @@ -3744,6 +3744,60 @@ static uint32_t sm4_encode_register(const struct sm4_register *reg) | (reg->dim << VKD3D_SM4_DIMENSION_SHIFT); }
+static void sm4_write_src_register(struct vkd3d_bytecode_buffer *buffer, const struct sm4_src_register *src) +{ + const struct vkd3d_sm4_register_type_info *register_type_info; + uint32_t sm4_reg_type, reg_dim; + uint32_t token = 0; + unsigned int j; + + register_type_info = get_info_from_vkd3d_register_type(src->reg.type); + if (!register_type_info) + { + FIXME("Unhandled vkd3d-shader register type %#x.\n", src->reg.type); + sm4_reg_type = VKD3D_SM4_RT_TEMP; + } + else + { + sm4_reg_type = register_type_info->sm4_type; + } + + reg_dim = src->reg.dim; + + token |= sm4_reg_type << VKD3D_SM4_REGISTER_TYPE_SHIFT; + token |= src->reg.idx_count << VKD3D_SM4_REGISTER_ORDER_SHIFT; + token |= reg_dim << VKD3D_SM4_DIMENSION_SHIFT; + if (reg_dim == VKD3D_SM4_DIMENSION_VEC4) + { + token |= (uint32_t)src->swizzle_type << VKD3D_SM4_SWIZZLE_TYPE_SHIFT; + token |= src->swizzle << VKD3D_SM4_SWIZZLE_SHIFT; + } + if (src->reg.mod) + token |= VKD3D_SM4_EXTENDED_OPERAND; + put_u32(buffer, token); + + if (src->reg.mod) + put_u32(buffer, (src->reg.mod << VKD3D_SM4_REGISTER_MODIFIER_SHIFT) + | VKD3D_SM4_EXTENDED_OPERAND_MODIFIER); + + for (j = 0; j < src->reg.idx_count; ++j) + { + put_u32(buffer, src->reg.idx[j].offset); + assert(!src->reg.idx[j].rel_addr); + } + + if (src->reg.type == VKD3DSPR_IMMCONST) + { + put_u32(buffer, src->reg.immconst_uint[0]); + if (reg_dim == VKD3D_SM4_DIMENSION_VEC4) + { + put_u32(buffer, src->reg.immconst_uint[1]); + put_u32(buffer, src->reg.immconst_uint[2]); + put_u32(buffer, src->reg.immconst_uint[3]); + } + } +} + static uint32_t sm4_register_order(const struct sm4_register *reg) { uint32_t order = 1; @@ -3798,35 +3852,7 @@ static void write_sm4_instruction(struct vkd3d_bytecode_buffer *buffer, const st }
for (i = 0; i < instr->src_count; ++i) - { - token = sm4_encode_register(&instr->srcs[i].reg); - token |= (uint32_t)instr->srcs[i].swizzle_type << VKD3D_SM4_SWIZZLE_TYPE_SHIFT; - token |= instr->srcs[i].swizzle << VKD3D_SM4_SWIZZLE_SHIFT; - if (instr->srcs[i].reg.mod) - token |= VKD3D_SM4_EXTENDED_OPERAND; - put_u32(buffer, token); - - if (instr->srcs[i].reg.mod) - put_u32(buffer, (instr->srcs[i].reg.mod << VKD3D_SM4_REGISTER_MODIFIER_SHIFT) - | VKD3D_SM4_EXTENDED_OPERAND_MODIFIER); - - for (j = 0; j < instr->srcs[i].reg.idx_count; ++j) - { - put_u32(buffer, instr->srcs[i].reg.idx[j].offset); - assert(!instr->srcs[i].reg.idx[j].rel_addr); - } - - if (instr->srcs[i].reg.type == VKD3DSPR_IMMCONST) - { - put_u32(buffer, instr->srcs[i].reg.immconst_uint[0]); - if (instr->srcs[i].reg.dim == VKD3D_SM4_DIMENSION_VEC4) - { - put_u32(buffer, instr->srcs[i].reg.immconst_uint[1]); - put_u32(buffer, instr->srcs[i].reg.immconst_uint[2]); - put_u32(buffer, instr->srcs[i].reg.immconst_uint[3]); - } - } - } + sm4_write_src_register(buffer, &instr->srcs[i]);
if (instr->byte_stride) put_u32(buffer, instr->byte_stride);
On Fri Jul 21 00:52:10 2023 +0000, Francisco Casas wrote:
changed this line in [version 2 of the diff](/wine/vkd3d/-/merge_requests/281/diffs?diff_id=58577&start_sha=0ba7cf7e7d71559c2089db2e5df8d766f6bb823b#b9a696b7e2f161c9052ff1159d7d8cb2907cdeae_519_521)
Makes sense, I missed that we were adding the `u` when using this pattern.
On Fri Jul 21 00:52:11 2023 +0000, Francisco Casas wrote:
changed this line in [version 2 of the diff](/wine/vkd3d/-/merge_requests/281/diffs?diff_id=58577&start_sha=0ba7cf7e7d71559c2089db2e5df8d766f6bb823b#b9a696b7e2f161c9052ff1159d7d8cb2907cdeae_1488_1489)
I changed it. Albeit the `~0u` may have the advantage of being more noticeable.
From: Francisco Casas fcasas@codeweavers.com
--- libs/vkd3d-shader/tpf.c | 41 +++++++++++++++++++++-------------------- 1 file changed, 21 insertions(+), 20 deletions(-)
diff --git a/libs/vkd3d-shader/tpf.c b/libs/vkd3d-shader/tpf.c index a8d78f31..a4138e7d 100644 --- a/libs/vkd3d-shader/tpf.c +++ b/libs/vkd3d-shader/tpf.c @@ -3722,16 +3722,17 @@ static void sm4_src_from_node(struct sm4_src_register *src, src->swizzle = hlsl_map_swizzle(hlsl_swizzle_from_writemask(writemask), map_writemask); }
-static uint32_t sm4_encode_register(const struct sm4_register *reg) +static void sm4_write_dst_register(struct vkd3d_bytecode_buffer *buffer, const struct sm4_dst_register *dst) { const struct vkd3d_sm4_register_type_info *register_type_info; - uint32_t sm4_reg_type; - - register_type_info = get_info_from_vkd3d_register_type(reg->type); + uint32_t sm4_reg_type, reg_dim; + uint32_t token = 0; + unsigned int j;
+ register_type_info = get_info_from_vkd3d_register_type(dst->reg.type); if (!register_type_info) { - FIXME("Unhandled vkd3d-shader register type %#x.\n", reg->type); + FIXME("Unhandled vkd3d-shader register type %#x.\n", dst->reg.type); sm4_reg_type = VKD3D_SM4_RT_TEMP; } else @@ -3739,9 +3740,20 @@ static uint32_t sm4_encode_register(const struct sm4_register *reg) sm4_reg_type = register_type_info->sm4_type; }
- return (sm4_reg_type << VKD3D_SM4_REGISTER_TYPE_SHIFT) - | (reg->idx_count << VKD3D_SM4_REGISTER_ORDER_SHIFT) - | (reg->dim << VKD3D_SM4_DIMENSION_SHIFT); + reg_dim = dst->reg.dim; + + token |= sm4_reg_type << VKD3D_SM4_REGISTER_TYPE_SHIFT; + token |= dst->reg.idx_count << VKD3D_SM4_REGISTER_ORDER_SHIFT; + token |= reg_dim << VKD3D_SM4_DIMENSION_SHIFT; + if (reg_dim == VKD3D_SM4_DIMENSION_VEC4) + token |= dst->writemask << VKD3D_SM4_WRITEMASK_SHIFT; + put_u32(buffer, token); + + for (j = 0; j < dst->reg.idx_count; ++j) + { + put_u32(buffer, dst->reg.idx[j].offset); + assert(!dst->reg.idx[j].rel_addr); + } }
static void sm4_write_src_register(struct vkd3d_bytecode_buffer *buffer, const struct sm4_src_register *src) @@ -3838,18 +3850,7 @@ static void write_sm4_instruction(struct vkd3d_bytecode_buffer *buffer, const st }
for (i = 0; i < instr->dst_count; ++i) - { - token = sm4_encode_register(&instr->dsts[i].reg); - if (instr->dsts[i].reg.dim == VKD3D_SM4_DIMENSION_VEC4) - token |= instr->dsts[i].writemask << VKD3D_SM4_WRITEMASK_SHIFT; - put_u32(buffer, token); - - for (j = 0; j < instr->dsts[i].reg.idx_count; ++j) - { - put_u32(buffer, instr->dsts[i].reg.idx[j].offset); - assert(!instr->dsts[i].reg.idx[j].rel_addr); - } - } + sm4_write_dst_register(buffer, &instr->dsts[i]);
for (i = 0; i < instr->src_count; ++i) sm4_write_src_register(buffer, &instr->srcs[i]);
On Fri Jul 21 00:52:11 2023 +0000, Henri Verbeet wrote:
+static const struct vkd3d_sm4_register_type_info *get_register_type_info( + enum vkd3d_sm4_register_type type) +{ + unsigned int i; + + for (i = 0; i < ARRAY_SIZE(register_type_table); ++i) + { + if (type == register_type_table[i].type) return ®ister_type_table[i]; + } + + return NULL; +} + +static const struct vkd3d_sm4_register_type_info *get_register_type_info_from_handler( + enum vkd3d_shader_register_type handler) +{ + unsigned int i; + + for (i = 0; i < ARRAY_SIZE(register_type_table); ++i) + { + if (handler == register_type_table[i].handler_type) return ®ister_type_table[i]; + } + + return NULL; +}
We care about parser speed at least somewhat, and looping over the table like that is probably going to make it worse. (On that note, get_opcode_info() is one of the current offenders, and we should probably just index opcode_table[] by the opcode id.) We probably care slightly less about HLSL compilation time, but I'd probably feel better about simply having separate tables for vkd3d->sm4 and sm4->vkd3d. There may be different ways to get there, of course. The most straightforward approach would be to simply put two separate tables in tpf.c, and I think that would be fine. We could also generate those lookup tables from a single table like you have here though, either at runtime (for example, the first time get_register_type_info()/get_register_type_info_from_handler() is called) or at compile time. I don't love the names "type" and "handler_type"; I'd probably prefer some variant of "tpf_type"/"vkd3d_type", "sm4_type"/"ir_type", or something along those lines.
The `'s'` trick doesn't really thrill me, but right now I can't think
of anything better and I prefer to see this moving forward rather than bikeshedding that bit (which can always be improved later, if a better solution arise at some point). Likewise; I don't think it's great, but I can live with it. Does this series strictly depend on the first two patches by Conor?
I see, I modified get_register_type_info() and get_register_type_info_from_handler() to have static lookup arrays, initialized on the first call. If you think this solutions is fine, I can send a similar solution for get_opcode_info() as another MR.
Also, I renamed the functions to get_info_from_sm4_register_type() and get_info_from_vkd3d_register_type().
I also renamed the vkd3d_sm4_register_type_info fields as sm4_type and vkd3d_type.
Does this series strictly depend on the first two patches by Conor?
I see now that it is not as dependent as !269 was. I removed the first two patches and just replaced it with a patch that renames `VKD3D_SM4_SWIZZLE_NONE` to `VKD3D_SM4_SWIZZLE_MASK4`.
From: Francisco Casas fcasas@codeweavers.com
So far, for every register type we write we only use a single dimension type. The only exception being the sampler register for gather instructions, where the register dimension must be VEC4 (and the swizzle_type SCALAR) instead of NONE.
To get rid of sm4_src_register.dim, we need a way to handle this nasty exception, which is provided by this patch. --- libs/vkd3d-shader/tpf.c | 24 +++++++++++++++--------- 1 file changed, 15 insertions(+), 9 deletions(-)
diff --git a/libs/vkd3d-shader/tpf.c b/libs/vkd3d-shader/tpf.c index a4138e7d..ac74acbb 100644 --- a/libs/vkd3d-shader/tpf.c +++ b/libs/vkd3d-shader/tpf.c @@ -1198,6 +1198,7 @@ static void shader_sm5_read_sync(struct vkd3d_shader_instruction *ins, uint32_t * O -> VKD3D_DATA_OPAQUE * R -> VKD3D_DATA_RESOURCE * S -> VKD3D_DATA_SAMPLER + * s -> VKD3D_DATA_SAMPLER with scalar swizzle * U -> VKD3D_DATA_UAV */ static const struct vkd3d_sm4_opcode_info opcode_table[] = @@ -1329,7 +1330,7 @@ static const struct vkd3d_sm4_opcode_info opcode_table[] = {VKD3D_SM4_OP_DCL_GLOBAL_FLAGS, VKD3DSIH_DCL_GLOBAL_FLAGS, "", "", shader_sm4_read_dcl_global_flags}, {VKD3D_SM4_OP_LOD, VKD3DSIH_LOD, "f", "fRS"}, - {VKD3D_SM4_OP_GATHER4, VKD3DSIH_GATHER4, "u", "fRS"}, + {VKD3D_SM4_OP_GATHER4, VKD3DSIH_GATHER4, "u", "fRs"}, {VKD3D_SM4_OP_SAMPLE_POS, VKD3DSIH_SAMPLE_POS, "f", "Ru"}, {VKD3D_SM4_OP_SAMPLE_INFO, VKD3DSIH_SAMPLE_INFO, "f", "R"}, {VKD3D_SM5_OP_HS_DECLS, VKD3DSIH_HS_DECLS, "", ""}, @@ -1345,9 +1346,9 @@ static const struct vkd3d_sm4_opcode_info opcode_table[] = {VKD3D_SM5_OP_DERIV_RTX_FINE, VKD3DSIH_DSX_FINE, "f", "f"}, {VKD3D_SM5_OP_DERIV_RTY_COARSE, VKD3DSIH_DSY_COARSE, "f", "f"}, {VKD3D_SM5_OP_DERIV_RTY_FINE, VKD3DSIH_DSY_FINE, "f", "f"}, - {VKD3D_SM5_OP_GATHER4_C, VKD3DSIH_GATHER4_C, "f", "fRSf"}, - {VKD3D_SM5_OP_GATHER4_PO, VKD3DSIH_GATHER4_PO, "f", "fiRS"}, - {VKD3D_SM5_OP_GATHER4_PO_C, VKD3DSIH_GATHER4_PO_C, "f", "fiRSf"}, + {VKD3D_SM5_OP_GATHER4_C, VKD3DSIH_GATHER4_C, "f", "fRsf"}, + {VKD3D_SM5_OP_GATHER4_PO, VKD3DSIH_GATHER4_PO, "f", "fiRs"}, + {VKD3D_SM5_OP_GATHER4_PO_C, VKD3DSIH_GATHER4_PO_C, "f", "fiRsf"}, {VKD3D_SM5_OP_RCP, VKD3DSIH_RCP, "f", "f"}, {VKD3D_SM5_OP_F32TOF16, VKD3DSIH_F32TOF16, "u", "f"}, {VKD3D_SM5_OP_F16TOF32, VKD3DSIH_F16TOF32, "f", "u"}, @@ -1623,6 +1624,7 @@ static enum vkd3d_data_type map_data_type(char t) case 'R': return VKD3D_DATA_RESOURCE; case 'S': + case 's': return VKD3D_DATA_SAMPLER; case 'U': return VKD3D_DATA_UAV; @@ -3722,7 +3724,8 @@ static void sm4_src_from_node(struct sm4_src_register *src, src->swizzle = hlsl_map_swizzle(hlsl_swizzle_from_writemask(writemask), map_writemask); }
-static void sm4_write_dst_register(struct vkd3d_bytecode_buffer *buffer, const struct sm4_dst_register *dst) +static void sm4_write_dst_register(struct vkd3d_bytecode_buffer *buffer, const struct sm4_dst_register *dst, + char dst_info) { const struct vkd3d_sm4_register_type_info *register_type_info; uint32_t sm4_reg_type, reg_dim; @@ -3756,7 +3759,8 @@ static void sm4_write_dst_register(struct vkd3d_bytecode_buffer *buffer, const s } }
-static void sm4_write_src_register(struct vkd3d_bytecode_buffer *buffer, const struct sm4_src_register *src) +static void sm4_write_src_register(struct vkd3d_bytecode_buffer *buffer, const struct sm4_src_register *src, + char src_info) { const struct vkd3d_sm4_register_type_info *register_type_info; uint32_t sm4_reg_type, reg_dim; @@ -3775,6 +3779,8 @@ static void sm4_write_src_register(struct vkd3d_bytecode_buffer *buffer, const s }
reg_dim = src->reg.dim; + if (src_info == 's') + reg_dim = VKD3D_SM4_DIMENSION_VEC4;
token |= sm4_reg_type << VKD3D_SM4_REGISTER_TYPE_SHIFT; token |= src->reg.idx_count << VKD3D_SM4_REGISTER_ORDER_SHIFT; @@ -3823,6 +3829,7 @@ static uint32_t sm4_register_order(const struct sm4_register *reg)
static void write_sm4_instruction(struct vkd3d_bytecode_buffer *buffer, const struct sm4_instruction *instr) { + const struct vkd3d_sm4_opcode_info *opcode_info = get_opcode_info(instr->opcode); uint32_t token = instr->opcode; unsigned int size = 1, i, j;
@@ -3850,10 +3857,10 @@ static void write_sm4_instruction(struct vkd3d_bytecode_buffer *buffer, const st }
for (i = 0; i < instr->dst_count; ++i) - sm4_write_dst_register(buffer, &instr->dsts[i]); + sm4_write_dst_register(buffer, &instr->dsts[i], opcode_info ? opcode_info->dst_info[i] : ' ');
for (i = 0; i < instr->src_count; ++i) - sm4_write_src_register(buffer, &instr->srcs[i]); + sm4_write_src_register(buffer, &instr->srcs[i], opcode_info ? opcode_info->src_info[i] : ' ');
if (instr->byte_stride) put_u32(buffer, instr->byte_stride); @@ -5150,7 +5157,6 @@ static void write_sm4_gather(struct hlsl_ctx *ctx, struct vkd3d_bytecode_buffer
src = &instr.srcs[instr.src_count++]; sm4_src_from_deref(ctx, src, sampler, VKD3DSP_WRITEMASK_ALL); - src->reg.dim = VKD3D_SM4_DIMENSION_VEC4; src->swizzle_type = VKD3D_SM4_SWIZZLE_SCALAR; src->swizzle = swizzle;
From: Francisco Casas fcasas@codeweavers.com
--- libs/vkd3d-shader/tpf.c | 118 ++++++++++++++++++---------------------- 1 file changed, 53 insertions(+), 65 deletions(-)
diff --git a/libs/vkd3d-shader/tpf.c b/libs/vkd3d-shader/tpf.c index ac74acbb..cb688a14 100644 --- a/libs/vkd3d-shader/tpf.c +++ b/libs/vkd3d-shader/tpf.c @@ -517,6 +517,8 @@ enum vkd3d_sm4_dimension VKD3D_SM4_DIMENSION_NONE = 0x0, VKD3D_SM4_DIMENSION_SCALAR = 0x1, VKD3D_SM4_DIMENSION_VEC4 = 0x2, + + VKD3D_SM4_DIMENSION_INVALID = ~0u, };
enum vkd3d_sm4_resource_type @@ -1475,46 +1477,47 @@ struct vkd3d_sm4_register_type_info { enum vkd3d_sm4_register_type sm4_type; enum vkd3d_shader_register_type vkd3d_type; + enum vkd3d_sm4_dimension default_dimension; };
static const struct vkd3d_sm4_register_type_info register_type_table[] = { - {VKD3D_SM4_RT_TEMP, VKD3DSPR_TEMP}, - {VKD3D_SM4_RT_INPUT, VKD3DSPR_INPUT}, - {VKD3D_SM4_RT_OUTPUT, VKD3DSPR_OUTPUT}, - {VKD3D_SM4_RT_INDEXABLE_TEMP, VKD3DSPR_IDXTEMP}, - {VKD3D_SM4_RT_IMMCONST, VKD3DSPR_IMMCONST}, - {VKD3D_SM4_RT_IMMCONST64, VKD3DSPR_IMMCONST64}, - {VKD3D_SM4_RT_SAMPLER, VKD3DSPR_SAMPLER}, - {VKD3D_SM4_RT_RESOURCE, VKD3DSPR_RESOURCE}, - {VKD3D_SM4_RT_CONSTBUFFER, VKD3DSPR_CONSTBUFFER}, - {VKD3D_SM4_RT_IMMCONSTBUFFER, VKD3DSPR_IMMCONSTBUFFER}, - {VKD3D_SM4_RT_PRIMID, VKD3DSPR_PRIMID}, - {VKD3D_SM4_RT_DEPTHOUT, VKD3DSPR_DEPTHOUT}, - {VKD3D_SM4_RT_NULL, VKD3DSPR_NULL}, - {VKD3D_SM4_RT_RASTERIZER, VKD3DSPR_RASTERIZER}, - {VKD3D_SM4_RT_OMASK, VKD3DSPR_SAMPLEMASK}, - {VKD3D_SM5_RT_STREAM, VKD3DSPR_STREAM}, - {VKD3D_SM5_RT_FUNCTION_BODY, VKD3DSPR_FUNCTIONBODY}, - {VKD3D_SM5_RT_FUNCTION_POINTER, VKD3DSPR_FUNCTIONPOINTER}, - {VKD3D_SM5_RT_OUTPUT_CONTROL_POINT_ID, VKD3DSPR_OUTPOINTID}, - {VKD3D_SM5_RT_FORK_INSTANCE_ID, VKD3DSPR_FORKINSTID}, - {VKD3D_SM5_RT_JOIN_INSTANCE_ID, VKD3DSPR_JOININSTID}, - {VKD3D_SM5_RT_INPUT_CONTROL_POINT, VKD3DSPR_INCONTROLPOINT}, - {VKD3D_SM5_RT_OUTPUT_CONTROL_POINT, VKD3DSPR_OUTCONTROLPOINT}, - {VKD3D_SM5_RT_PATCH_CONSTANT_DATA, VKD3DSPR_PATCHCONST}, - {VKD3D_SM5_RT_DOMAIN_LOCATION, VKD3DSPR_TESSCOORD}, - {VKD3D_SM5_RT_UAV, VKD3DSPR_UAV}, - {VKD3D_SM5_RT_SHARED_MEMORY, VKD3DSPR_GROUPSHAREDMEM}, - {VKD3D_SM5_RT_THREAD_ID, VKD3DSPR_THREADID}, - {VKD3D_SM5_RT_THREAD_GROUP_ID, VKD3DSPR_THREADGROUPID}, - {VKD3D_SM5_RT_LOCAL_THREAD_ID, VKD3DSPR_LOCALTHREADID}, - {VKD3D_SM5_RT_COVERAGE, VKD3DSPR_COVERAGE}, - {VKD3D_SM5_RT_LOCAL_THREAD_INDEX, VKD3DSPR_LOCALTHREADINDEX}, - {VKD3D_SM5_RT_GS_INSTANCE_ID, VKD3DSPR_GSINSTID}, - {VKD3D_SM5_RT_DEPTHOUT_GREATER_EQUAL, VKD3DSPR_DEPTHOUTGE}, - {VKD3D_SM5_RT_DEPTHOUT_LESS_EQUAL, VKD3DSPR_DEPTHOUTLE}, - {VKD3D_SM5_RT_OUTPUT_STENCIL_REF, VKD3DSPR_OUTSTENCILREF}, + {VKD3D_SM4_RT_TEMP, VKD3DSPR_TEMP, VKD3D_SM4_DIMENSION_VEC4}, + {VKD3D_SM4_RT_INPUT, VKD3DSPR_INPUT, VKD3D_SM4_DIMENSION_VEC4}, + {VKD3D_SM4_RT_OUTPUT, VKD3DSPR_OUTPUT, VKD3D_SM4_DIMENSION_VEC4}, + {VKD3D_SM4_RT_INDEXABLE_TEMP, VKD3DSPR_IDXTEMP, VKD3D_SM4_DIMENSION_VEC4}, + {VKD3D_SM4_RT_IMMCONST, VKD3DSPR_IMMCONST, VKD3D_SM4_DIMENSION_INVALID}, + {VKD3D_SM4_RT_IMMCONST64, VKD3DSPR_IMMCONST64, VKD3D_SM4_DIMENSION_INVALID}, + {VKD3D_SM4_RT_SAMPLER, VKD3DSPR_SAMPLER, VKD3D_SM4_DIMENSION_NONE}, + {VKD3D_SM4_RT_RESOURCE, VKD3DSPR_RESOURCE, VKD3D_SM4_DIMENSION_VEC4}, + {VKD3D_SM4_RT_CONSTBUFFER, VKD3DSPR_CONSTBUFFER, VKD3D_SM4_DIMENSION_VEC4}, + {VKD3D_SM4_RT_IMMCONSTBUFFER, VKD3DSPR_IMMCONSTBUFFER, VKD3D_SM4_DIMENSION_VEC4}, + {VKD3D_SM4_RT_PRIMID, VKD3DSPR_PRIMID, VKD3D_SM4_DIMENSION_VEC4}, + {VKD3D_SM4_RT_DEPTHOUT, VKD3DSPR_DEPTHOUT, VKD3D_SM4_DIMENSION_SCALAR}, + {VKD3D_SM4_RT_NULL, VKD3DSPR_NULL, VKD3D_SM4_DIMENSION_NONE}, + {VKD3D_SM4_RT_RASTERIZER, VKD3DSPR_RASTERIZER, VKD3D_SM4_DIMENSION_VEC4}, + {VKD3D_SM4_RT_OMASK, VKD3DSPR_SAMPLEMASK, VKD3D_SM4_DIMENSION_VEC4}, + {VKD3D_SM5_RT_STREAM, VKD3DSPR_STREAM, VKD3D_SM4_DIMENSION_VEC4}, + {VKD3D_SM5_RT_FUNCTION_BODY, VKD3DSPR_FUNCTIONBODY, VKD3D_SM4_DIMENSION_VEC4}, + {VKD3D_SM5_RT_FUNCTION_POINTER, VKD3DSPR_FUNCTIONPOINTER, VKD3D_SM4_DIMENSION_VEC4}, + {VKD3D_SM5_RT_OUTPUT_CONTROL_POINT_ID, VKD3DSPR_OUTPOINTID, VKD3D_SM4_DIMENSION_VEC4}, + {VKD3D_SM5_RT_FORK_INSTANCE_ID, VKD3DSPR_FORKINSTID, VKD3D_SM4_DIMENSION_VEC4}, + {VKD3D_SM5_RT_JOIN_INSTANCE_ID, VKD3DSPR_JOININSTID, VKD3D_SM4_DIMENSION_VEC4}, + {VKD3D_SM5_RT_INPUT_CONTROL_POINT, VKD3DSPR_INCONTROLPOINT, VKD3D_SM4_DIMENSION_VEC4}, + {VKD3D_SM5_RT_OUTPUT_CONTROL_POINT, VKD3DSPR_OUTCONTROLPOINT, VKD3D_SM4_DIMENSION_VEC4}, + {VKD3D_SM5_RT_PATCH_CONSTANT_DATA, VKD3DSPR_PATCHCONST, VKD3D_SM4_DIMENSION_VEC4}, + {VKD3D_SM5_RT_DOMAIN_LOCATION, VKD3DSPR_TESSCOORD, VKD3D_SM4_DIMENSION_VEC4}, + {VKD3D_SM5_RT_UAV, VKD3DSPR_UAV, VKD3D_SM4_DIMENSION_VEC4}, + {VKD3D_SM5_RT_SHARED_MEMORY, VKD3DSPR_GROUPSHAREDMEM, VKD3D_SM4_DIMENSION_VEC4}, + {VKD3D_SM5_RT_THREAD_ID, VKD3DSPR_THREADID, VKD3D_SM4_DIMENSION_VEC4}, + {VKD3D_SM5_RT_THREAD_GROUP_ID, VKD3DSPR_THREADGROUPID, VKD3D_SM4_DIMENSION_VEC4}, + {VKD3D_SM5_RT_LOCAL_THREAD_ID, VKD3DSPR_LOCALTHREADID, VKD3D_SM4_DIMENSION_VEC4}, + {VKD3D_SM5_RT_COVERAGE, VKD3DSPR_COVERAGE, VKD3D_SM4_DIMENSION_VEC4}, + {VKD3D_SM5_RT_LOCAL_THREAD_INDEX, VKD3DSPR_LOCALTHREADINDEX,VKD3D_SM4_DIMENSION_VEC4}, + {VKD3D_SM5_RT_GS_INSTANCE_ID, VKD3DSPR_GSINSTID, VKD3D_SM4_DIMENSION_VEC4}, + {VKD3D_SM5_RT_DEPTHOUT_GREATER_EQUAL, VKD3DSPR_DEPTHOUTGE, VKD3D_SM4_DIMENSION_VEC4}, + {VKD3D_SM5_RT_DEPTHOUT_LESS_EQUAL, VKD3DSPR_DEPTHOUTLE, VKD3D_SM4_DIMENSION_VEC4}, + {VKD3D_SM5_RT_OUTPUT_STENCIL_REF, VKD3DSPR_OUTSTENCILREF, VKD3D_SM4_DIMENSION_VEC4}, };
static const enum vkd3d_shader_register_precision register_precision_table[] = @@ -3481,7 +3484,7 @@ struct sm4_register enum vkd3d_shader_register_type type; struct vkd3d_shader_register_index idx[2]; unsigned int idx_count; - enum vkd3d_sm4_dimension dim; + enum vkd3d_immconst_type immconst_type; uint32_t immconst_uint[4]; unsigned int mod; }; @@ -3528,7 +3531,6 @@ static void sm4_register_from_deref(struct hlsl_ctx *ctx, struct sm4_register *r if (regset == HLSL_REGSET_TEXTURES) { reg->type = VKD3DSPR_RESOURCE; - reg->dim = VKD3D_SM4_DIMENSION_VEC4; if (swizzle_type) *swizzle_type = VKD3D_SM4_SWIZZLE_VEC4; reg->idx[0].offset = var->regs[HLSL_REGSET_TEXTURES].id; @@ -3540,7 +3542,6 @@ static void sm4_register_from_deref(struct hlsl_ctx *ctx, struct sm4_register *r else if (regset == HLSL_REGSET_UAVS) { reg->type = VKD3DSPR_UAV; - reg->dim = VKD3D_SM4_DIMENSION_VEC4; if (swizzle_type) *swizzle_type = VKD3D_SM4_SWIZZLE_VEC4; reg->idx[0].offset = var->regs[HLSL_REGSET_UAVS].id; @@ -3552,7 +3553,6 @@ static void sm4_register_from_deref(struct hlsl_ctx *ctx, struct sm4_register *r else if (regset == HLSL_REGSET_SAMPLERS) { reg->type = VKD3DSPR_SAMPLER; - reg->dim = VKD3D_SM4_DIMENSION_NONE; if (swizzle_type) *swizzle_type = VKD3D_SM4_SWIZZLE_MASK4; reg->idx[0].offset = var->regs[HLSL_REGSET_SAMPLERS].id; @@ -3567,7 +3567,6 @@ static void sm4_register_from_deref(struct hlsl_ctx *ctx, struct sm4_register *r
assert(data_type->class <= HLSL_CLASS_VECTOR); reg->type = VKD3DSPR_CONSTBUFFER; - reg->dim = VKD3D_SM4_DIMENSION_VEC4; if (swizzle_type) *swizzle_type = VKD3D_SM4_SWIZZLE_VEC4; reg->idx[0].offset = var->buffer->reg.id; @@ -3590,7 +3589,6 @@ static void sm4_register_from_deref(struct hlsl_ctx *ctx, struct sm4_register *r reg->idx_count = 1; }
- reg->dim = VKD3D_SM4_DIMENSION_VEC4; *writemask = ((1u << data_type->dimx) - 1) << (offset % 4); } else @@ -3599,7 +3597,6 @@ static void sm4_register_from_deref(struct hlsl_ctx *ctx, struct sm4_register *r
assert(hlsl_reg.allocated); reg->type = VKD3DSPR_INPUT; - reg->dim = VKD3D_SM4_DIMENSION_VEC4; if (swizzle_type) *swizzle_type = VKD3D_SM4_SWIZZLE_VEC4; reg->idx[0].offset = hlsl_reg.id; @@ -3621,10 +3618,6 @@ static void sm4_register_from_deref(struct hlsl_ctx *ctx, struct sm4_register *r reg->idx_count = 1; }
- if (reg->type == VKD3DSPR_DEPTHOUT) - reg->dim = VKD3D_SM4_DIMENSION_SCALAR; - else - reg->dim = VKD3D_SM4_DIMENSION_VEC4; *writemask = ((1u << data_type->dimx) - 1) << (offset % 4); } else @@ -3633,7 +3626,6 @@ static void sm4_register_from_deref(struct hlsl_ctx *ctx, struct sm4_register *r
assert(hlsl_reg.allocated); reg->type = VKD3DSPR_OUTPUT; - reg->dim = VKD3D_SM4_DIMENSION_VEC4; reg->idx[0].offset = hlsl_reg.id; reg->idx_count = 1; *writemask = hlsl_reg.writemask; @@ -3645,7 +3637,6 @@ static void sm4_register_from_deref(struct hlsl_ctx *ctx, struct sm4_register *r
assert(hlsl_reg.allocated); reg->type = VKD3DSPR_TEMP; - reg->dim = VKD3D_SM4_DIMENSION_VEC4; if (swizzle_type) *swizzle_type = VKD3D_SM4_SWIZZLE_VEC4; reg->idx[0].offset = hlsl_reg.id; @@ -3669,7 +3660,6 @@ static void sm4_register_from_node(struct sm4_register *reg, unsigned int *write { assert(instr->reg.allocated); reg->type = VKD3DSPR_TEMP; - reg->dim = VKD3D_SM4_DIMENSION_VEC4; *swizzle_type = VKD3D_SM4_SWIZZLE_VEC4; reg->idx[0].offset = instr->reg.id; reg->idx_count = 1; @@ -3690,14 +3680,14 @@ static void sm4_src_from_constant_value(struct sm4_src_register *src, src->reg.type = VKD3DSPR_IMMCONST; if (width == 1) { - src->reg.dim = VKD3D_SM4_DIMENSION_SCALAR; + src->reg.immconst_type = VKD3D_IMMCONST_SCALAR; src->reg.immconst_uint[0] = value->u[0].u; } else { unsigned int i, j = 0;
- src->reg.dim = VKD3D_SM4_DIMENSION_VEC4; + src->reg.immconst_type = VKD3D_IMMCONST_VEC4; for (i = 0; i < 4; ++i) { if (map_writemask & (1u << i)) @@ -3737,14 +3727,14 @@ static void sm4_write_dst_register(struct vkd3d_bytecode_buffer *buffer, const s { FIXME("Unhandled vkd3d-shader register type %#x.\n", dst->reg.type); sm4_reg_type = VKD3D_SM4_RT_TEMP; + reg_dim = VKD3D_SM4_DIMENSION_VEC4; } else { sm4_reg_type = register_type_info->sm4_type; + reg_dim = register_type_info->default_dimension; }
- reg_dim = dst->reg.dim; - token |= sm4_reg_type << VKD3D_SM4_REGISTER_TYPE_SHIFT; token |= dst->reg.idx_count << VKD3D_SM4_REGISTER_ORDER_SHIFT; token |= reg_dim << VKD3D_SM4_DIMENSION_SHIFT; @@ -3772,16 +3762,21 @@ static void sm4_write_src_register(struct vkd3d_bytecode_buffer *buffer, const s { FIXME("Unhandled vkd3d-shader register type %#x.\n", src->reg.type); sm4_reg_type = VKD3D_SM4_RT_TEMP; + reg_dim = VKD3D_SM4_DIMENSION_VEC4; } else { sm4_reg_type = register_type_info->sm4_type; + reg_dim = register_type_info->default_dimension; }
- reg_dim = src->reg.dim; + if (src->reg.type == VKD3DSPR_IMMCONST) + reg_dim = src->reg.immconst_type == VKD3D_IMMCONST_VEC4 ? VKD3D_SM4_DIMENSION_VEC4 : VKD3D_SM4_DIMENSION_SCALAR; if (src_info == 's') reg_dim = VKD3D_SM4_DIMENSION_VEC4;
+ assert(reg_dim != VKD3D_SM4_DIMENSION_INVALID); + token |= sm4_reg_type << VKD3D_SM4_REGISTER_TYPE_SHIFT; token |= src->reg.idx_count << VKD3D_SM4_REGISTER_ORDER_SHIFT; token |= reg_dim << VKD3D_SM4_DIMENSION_SHIFT; @@ -3820,7 +3815,7 @@ static uint32_t sm4_register_order(const struct sm4_register *reg) { uint32_t order = 1; if (reg->type == VKD3DSPR_IMMCONST) - order += reg->dim == VKD3D_SM4_DIMENSION_VEC4 ? 4 : 1; + order += reg->immconst_type == VKD3D_IMMCONST_VEC4 ? 4 : 1; order += reg->idx_count; if (reg->mod) ++order; @@ -3902,7 +3897,6 @@ static void write_sm4_dcl_constant_buffer(struct vkd3d_bytecode_buffer *buffer, { .opcode = VKD3D_SM4_OP_DCL_CONSTANT_BUFFER,
- .srcs[0].reg.dim = VKD3D_SM4_DIMENSION_VEC4, .srcs[0].reg.type = VKD3DSPR_CONSTBUFFER, .srcs[0].reg.idx[0].offset = cbuffer->reg.id, .srcs[0].reg.idx[1].offset = (cbuffer->used_size + 3) / 4, @@ -4011,7 +4005,6 @@ static void write_sm4_dcl_semantic(struct hlsl_ctx *ctx, struct vkd3d_bytecode_b
struct sm4_instruction instr = { - .dsts[0].reg.dim = VKD3D_SM4_DIMENSION_VEC4, .dst_count = 1, };
@@ -4036,9 +4029,6 @@ static void write_sm4_dcl_semantic(struct hlsl_ctx *ctx, struct vkd3d_bytecode_b instr.dsts[0].writemask = var->regs[HLSL_REGSET_NUMERIC].writemask; }
- if (instr.dsts[0].reg.type == VKD3DSPR_DEPTHOUT) - instr.dsts[0].reg.dim = VKD3D_SM4_DIMENSION_SCALAR; - hlsl_sm4_usage_from_semantic(ctx, &var->semantic, output, &usage); if (usage == ~0u) usage = D3D_NAME_UNDEFINED; @@ -4171,7 +4161,6 @@ static void write_sm4_unary_op_with_two_destinations(struct vkd3d_bytecode_buffe sm4_dst_from_node(&instr.dsts[dst_idx], dst); assert(1 - dst_idx >= 0); instr.dsts[1 - dst_idx].reg.type = VKD3DSPR_NULL; - instr.dsts[1 - dst_idx].reg.dim = VKD3D_SM4_DIMENSION_NONE; instr.dsts[1 - dst_idx].reg.idx_count = 0; instr.dst_count = 2;
@@ -4231,7 +4220,6 @@ static void write_sm4_binary_op_with_two_destinations(struct vkd3d_bytecode_buff sm4_dst_from_node(&instr.dsts[dst_idx], dst); assert(1 - dst_idx >= 0); instr.dsts[1 - dst_idx].reg.type = VKD3DSPR_NULL; - instr.dsts[1 - dst_idx].reg.dim = VKD3D_SM4_DIMENSION_NONE; instr.dsts[1 - dst_idx].reg.idx_count = 0; instr.dst_count = 2;
@@ -4303,7 +4291,7 @@ static void write_sm4_ld(struct hlsl_ctx *ctx, struct vkd3d_bytecode_buffer *buf memset(&instr.srcs[2], 0, sizeof(instr.srcs[2])); instr.srcs[2].swizzle_type = VKD3D_SM4_SWIZZLE_MASK4; reg->type = VKD3DSPR_IMMCONST; - reg->dim = VKD3D_SM4_DIMENSION_SCALAR; + reg->immconst_type = VKD3D_IMMCONST_SCALAR; reg->immconst_uint[0] = index->value.u[0].u; } else if (ctx->profile->major_version == 4 && ctx->profile->minor_version == 0) @@ -4422,7 +4410,7 @@ static void write_sm4_cast_from_bool(struct hlsl_ctx *ctx, sm4_src_from_node(&instr.srcs[0], arg, instr.dsts[0].writemask); instr.srcs[1].swizzle_type = VKD3D_SM4_SWIZZLE_MASK4; instr.srcs[1].reg.type = VKD3DSPR_IMMCONST; - instr.srcs[1].reg.dim = VKD3D_SM4_DIMENSION_SCALAR; + instr.srcs[1].reg.immconst_type = VKD3D_IMMCONST_SCALAR; instr.srcs[1].reg.immconst_uint[0] = mask; instr.src_count = 2;
I see, I modified get_register_type_info() and get_register_type_info_from_handler() to have static lookup arrays, initialized on the first call. If you think this solutions is fine, I can send a similar solution for get_opcode_info() as another MR.
Unfortunately that's slightly wrong: without appropriate synchronization two different threads might try to do that initialization step concurrently of step on each other. I believe it's very improbable that anything bad happens, honestly, but it's the kind of thing I'd try to avoid. C doesn't offer, AFAICT, a fully satisfying way to do what you want to do in a DRY and self-contained way (which is part of why I didn't raise this point in my review). The ways forward I can see are these: * Do the same thing, but use proper locking and/or atomic operations to ensure concurrent threads do the right thing. It can be tricky and it forces us to deal with threading API in a library that probably shouldn't. I'd avoid that. * Do the same thing, but store the lookup tables in the parser context instead of in a static variable. There's a bit of CPU cycle wasting, but it's a small thing, and concurrent threads work on independent tables and do not step on each other. * Don't do anything fancy, just write both lookup tables in the code. It's not DRY, so in principle I hate that, but given that the two tables are still rather small and C really tries to get in your way here, it still feels one of the best alternatives. * Turn to the dark side and embrace the C way: the preprocessor! Create a file full of `X(VKD3D_SM4_RT_TEMP, VKD3DSPR_TEMP)` lines and include each time you need it defining `X()` appropriately each time. Once you're done with that you start doing things like `#define E } else {` and you're halfway submitting vkd3d-shader to the IOCCC. It's a great career path, though, I think you can get consultancy work from Satan himself.
Notice that with the third and fourth alternatives you can also write a `switch` statement instead of a lookup table, in theory giving your compiler even more optimization opportunities. Not sure the compiler is going to make good use of them, though.
I'd go with the second or third alternative.
On Fri Jul 21 14:11:14 2023 +0000, Giovanni Mascellani wrote:
I see, I modified get_register_type_info() and
get_register_type_info_from_handler() to have static lookup arrays, initialized on the first call. If you think this solutions is fine, I can send a similar solution for get_opcode_info() as another MR. Unfortunately that's slightly wrong: without appropriate synchronization two different threads might try to do that initialization step concurrently of step on each other. I believe it's very improbable that anything bad happens, honestly, but it's the kind of thing I'd try to avoid. C doesn't offer, AFAICT, a fully satisfying way to do what you want to do in a DRY and self-contained way (which is part of why I didn't raise this point in my review). The ways forward I can see are these:
- Do the same thing, but use proper locking and/or atomic operations to
ensure concurrent threads do the right thing. It can be tricky and it forces us to deal with threading API in a library that probably shouldn't. I'd avoid that.
- Do the same thing, but store the lookup tables in the parser context
instead of in a static variable. There's a bit of CPU cycle wasting, but it's a small thing, and concurrent threads work on independent tables and do not step on each other.
- Don't do anything fancy, just write both lookup tables in the code.
It's not DRY, so in principle I hate that, but given that the two tables are still rather small and C really tries to get in your way here, it still feels one of the best alternatives.
- Turn to the dark side and embrace the C way: the preprocessor! Create
a file full of `X(VKD3D_SM4_RT_TEMP, VKD3DSPR_TEMP)` lines and include each time you need it defining `X()` appropriately each time. Once you're done with that you start doing things like `#define E } else {` and you're halfway submitting vkd3d-shader to the IOCCC. It's a great career path, though, I think you can get consultancy work from Satan himself. Notice that with the third and fourth alternatives you can also write a `switch` statement instead of a lookup table, in theory giving your compiler even more optimization opportunities. Not sure the compiler is going to make good use of them, though. I'd go with the second or third alternative.
I see your point. AFAIK hlsl_sm4_write() always gets called by a single thread, so I guess that the problem may happen when calling the disassembler. I have to understand that code path better.
Anyways, notice that the initialization part of the functions just read from `register_type_table`, which is const, and write values to `lookup` without reading it, and the values are the same regardless of which thread is writing them. So, even if many threads were to pass the `!computed_lookup` check at the same time, they wouldn't be stepping on each other.
I dare to say that this is one of the weird cases where parallelization doesn't require synchronization structs. Now, the question is if it is possible that the compiler does some optimization that breaks this rather special situation.
I will explore the second and third alternatives though.
I see your point. AFAIK hlsl_sm4_write() always gets called by a single thread
Why? If the client calls `vkd3d_shader_compile()` from two different threads at the same time it can happen that the two `hlsl_sm4_write()` calls execute at the same time, can't it?
I agree that it's probable that in practice nothing bad happens, at least on the architectures we care about, because you're just going to do aligned pointer accesses. But technically concurrent access to the same memory location is UB in C (if at least one of the accesses is for writing), so I think we'd better stay away from that. Not sure if others have different opinions on the matter.
Why? If the client calls `vkd3d_shader_compile()` from two different threads at the same time it can happen that the two `hlsl_sm4_write()` calls execute at the same time, can't it?
Ah, right! That didn't occur to me.
Does this series strictly depend on the first two patches by Conor?
I see now that it is not as dependent as !269 was. I removed the first two patches and just replaced it with a patch that renames `VKD3D_SM4_SWIZZLE_NONE` to `VKD3D_SM4_SWIZZLE_MASK4`.
So it actually doesn't depend on those patches at all. :) This series might have some minor conflicts with that series, but we should simply deal with those once they actually happen; it's entirely possible for this series to go in first.
bool hlsl_sm4_register_from_semantic(struct hlsl_ctx *ctx, const struct hlsl_semantic *semantic, - bool output, unsigned int *type, enum vkd3d_sm4_swizzle_type *swizzle_type, bool *has_idx) + bool output, enum vkd3d_shader_register_type *type, enum vkd3d_sm4_swizzle_type *swizzle_type, bool *has_idx)
You should update the prototype in hlsl.h as well.
@@ -2555,7 +2555,8 @@ bool hlsl_sm4_register_from_semantic(struct hlsl_ctx *ctx, const struct hlsl_sem && output == register_table[i].output && ctx->profile->type == register_table[i].shader_type) { - *type = register_table[i].type; + if (type) + *type = register_table[i].type; if (swizzle_type) *swizzle_type = register_table[i].swizzle_type; *has_idx = register_table[i].has_idx; @@ -2652,7 +2653,6 @@ static void write_sm4_signature(struct hlsl_ctx *ctx, struct dxbc_writer *dxbc, LIST_FOR_EACH_ENTRY(var, &ctx->extern_vars, struct hlsl_ir_var, extern_entry) { unsigned int width = (1u << var->data_type->dimx) - 1, use_mask; - enum vkd3d_sm4_register_type type; uint32_t usage_idx, reg_idx; D3D_NAME usage; bool has_idx; @@ -2666,14 +2666,13 @@ static void write_sm4_signature(struct hlsl_ctx *ctx, struct dxbc_writer *dxbc, continue; usage_idx = var->semantic.index; - if (hlsl_sm4_register_from_semantic(ctx, &var->semantic, output, &type, NULL, &has_idx)) + if (hlsl_sm4_register_from_semantic(ctx, &var->semantic, output, NULL, NULL, &has_idx)) { reg_idx = has_idx ? var->semantic.index : ~0u; } else { assert(var->regs[HLSL_REGSET_NUMERIC].allocated); - type = VKD3D_SM4_RT_INPUT; reg_idx = var->regs[HLSL_REGSET_NUMERIC].id; }
I don't dislike this, but it does seem like somewhat of a separate change.
+ VKD3D_SM4_REGISTER_TYPE_LAST = VKD3D_SM5_RT_OUTPUT_STENCIL_REF,
We'd typically add something like VKD3D_SM4_REGISTER_TYPE_COUNT. E.g., VKD3D_SHADER_TYPE_COUNT.
- Do the same thing, but use proper locking and/or atomic operations to ensure concurrent threads do the right thing. It can be tricky and it forces us to deal with threading API in a library that probably shouldn't. I'd avoid that.
- Do the same thing, but store the lookup tables in the parser context instead of in a static variable. There's a bit of CPU cycle wasting, but it's a small thing, and concurrent threads work on independent tables and do not step on each other.
- Don't do anything fancy, just write both lookup tables in the code. It's not DRY, so in principle I hate that, but given that the two tables are still rather small and C really tries to get in your way here, it still feels one of the best alternatives.
- Turn to the dark side and embrace the C way: the preprocessor! Create a file full of `X(VKD3D_SM4_RT_TEMP, VKD3DSPR_TEMP)` lines and include each time you need it defining `X()` appropriately each time. Once you're done with that you start doing things like `#define E } else {` and you're halfway submitting vkd3d-shader to the IOCCC. It's a great career path, though, I think you can get consultancy work from Satan himself.
Notice that with the third and fourth alternatives you can also write a `switch` statement instead of a lookup table, in theory giving your compiler even more optimization opportunities. Not sure the compiler is going to make good use of them, though.
I'd go with the second or third alternative.
Yeah. I'd go with the second option.
It's probably not that hard to make the first option work with either something like "__attribute__((constructor))" or something like pthread_once(), although there may be a few pitfalls. I doubt it's worth the effort at this point though.
+ VKD3D_SM4_DIMENSION_INVALID = ~0u,
Do we really need this? We use it in register_type_table[] for VKD3DSPR_IMMCONST and VKD3DSPR_IMMCONST64, but then overwrite it for VKD3DSPR_IMMCONST. I'm not sure it adds that much over using e.g. VKD3D_SM4_DIMENSION_NONE or VKD3D_SM4_DIMENSION_VEC4 for these.