First part of v2 of !27, which aims to:
* Allow allocation of variables of complex types that contain both numerics and objects across multiple register sets (regsets).
* Support the tex2D and tex3D intrinsics, inferring the dimension of generic samplers from usage, writing sampler declarations, and writing sample instructions.
* Support arrays of resources for both SM1 and SM4 (not to be confused with the resource arrays of SM 5.1, which can have non-constant indexes).
* Support resources declared within structs.
* Support synthetic combined samplers for SM1 and synthetic separated samplers for SM4, considering that they can be arrays or members of structs.
* Imitate the way the native compiler assigns the register indexes of resources on allocation, which proved to be the most difficult part.
* Support object components within complex input parameters.
* Fix small corner cases.
This part consists of parsing the `tex2D()` and `tex3D()` intrinsics and beginning to support the allocation of variables across multiple regsets.
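For context, here is a minimal HLSL fragment (illustrative, not taken from the series' tests) showing the call shapes this part starts to parse. The two-argument forms are handled; the four-argument gradient forms are still rejected with a fixme:

```hlsl
// Minimal illustrative example: DX9-style sampling. tex2D() takes a sampler
// and float2 coordinates, tex3D() a sampler and float3 coordinates; both
// return a float4.
sampler2D sam;
sampler3D vol;

float4 main(float2 uv : TEXCOORD0, float3 uvw : TEXCOORD1) : COLOR
{
    return tex2D(sam, uv) + tex3D(vol, uvw);
}
```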
The whole series is on my [master6](https://gitlab.winehq.org/fcasas/vkd3d/-/commits/master6) branch.
--
v4:
* vkd3d-shader/hlsl: Allocate register reservations in a separate pass.
* vkd3d-shader/hlsl: Respect object reservations even if the object is unused.
* vkd3d-shader/hlsl: Allocate objects according to register set.
* vkd3d-shader/hlsl: Keep an hlsl_reg for each register set in hlsl_ir_var.
* vkd3d-shader/hlsl: Store the type's register size for each register set.
* vkd3d-shader/hlsl: Parse the tex3D() intrinsic.
* vkd3d-shader/hlsl: Parse the tex2D() intrinsic.
From: Francisco Casas <fcasas@codeweavers.com>
---
 libs/vkd3d-shader/hlsl_sm4.c | 3 +++
 1 file changed, 3 insertions(+)
diff --git a/libs/vkd3d-shader/hlsl_sm4.c b/libs/vkd3d-shader/hlsl_sm4.c
index af59b7c7..8f96c780 100644
--- a/libs/vkd3d-shader/hlsl_sm4.c
+++ b/libs/vkd3d-shader/hlsl_sm4.c
@@ -2243,7 +2243,10 @@ static void write_sm4_resource_load(struct hlsl_ctx *ctx,
         case HLSL_RESOURCE_SAMPLE:
             if (!load->sampler.var)
+            {
                 hlsl_fixme(ctx, &load->node.loc, "SM4 combined sample expression.");
+                return;
+            }
             write_sm4_sample(ctx, buffer, resource_type, &load->node,
                     &load->resource, &load->sampler, coords, texel_offset);
             break;
From: Zebediah Figura <zfigura@codeweavers.com>
---
Modifications:
* Using new hlsl_resource_load_params struct.
* Removed `HLSL_OP2_SAMPLE` from enum hlsl_ir_expr_op, since it is not used.
---
 libs/vkd3d-shader/hlsl.y | 61 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)
diff --git a/libs/vkd3d-shader/hlsl.y b/libs/vkd3d-shader/hlsl.y
index a4de0edd..d2a7c418 100644
--- a/libs/vkd3d-shader/hlsl.y
+++ b/libs/vkd3d-shader/hlsl.y
@@ -2888,6 +2888,66 @@ static bool intrinsic_step(struct hlsl_ctx *ctx,
     return !!add_implicit_conversion(ctx, params->instrs, ge, type, loc);
 }
+static bool intrinsic_tex(struct hlsl_ctx *ctx, const struct parse_initializer *params,
+        const struct vkd3d_shader_location *loc, const char *name, enum hlsl_sampler_dim dim)
+{
+    struct hlsl_resource_load_params load_params = {.type = HLSL_RESOURCE_SAMPLE};
+    const struct hlsl_type *sampler_type;
+    struct hlsl_ir_resource_load *load;
+    struct hlsl_ir_load *sampler_load;
+    struct hlsl_ir_node *coords;
+
+    if (params->args_count != 2 && params->args_count != 4)
+    {
+        hlsl_error(ctx, loc, VKD3D_SHADER_ERROR_HLSL_WRONG_PARAMETER_COUNT,
+                "Wrong number of arguments to function '%s': expected 2 or 4, but got %u.", name, params->args_count);
+        return false;
+    }
+
+    if (params->args_count == 4)
+    {
+        hlsl_fixme(ctx, loc, "Samples with gradients are not implemented.\n");
+    }
+
+    sampler_type = params->args[0]->data_type;
+    if (sampler_type->type != HLSL_CLASS_OBJECT || sampler_type->base_type != HLSL_TYPE_SAMPLER
+            || (sampler_type->sampler_dim != dim && sampler_type->sampler_dim != HLSL_SAMPLER_DIM_GENERIC))
+    {
+        struct vkd3d_string_buffer *string;
+
+        if ((string = hlsl_type_to_string(ctx, sampler_type)))
+            hlsl_error(ctx, loc, VKD3D_SHADER_ERROR_HLSL_INVALID_TYPE,
+                    "Wrong type for argument 1 of '%s': expected 'sampler' or '%s', but got '%s'.",
+                    name, ctx->builtin_types.sampler[dim]->name, string->buffer);
+        hlsl_release_string_buffer(ctx, string);
+    }
+    else
+    {
+        /* Only HLSL_IR_LOAD can return an object. */
+        sampler_load = hlsl_ir_load(params->args[0]);
+
+        load_params.resource = sampler_load->src;
+    }
+
+    if (!(coords = add_implicit_conversion(ctx, params->instrs, params->args[1],
+            hlsl_get_vector_type(ctx, HLSL_TYPE_FLOAT, hlsl_sampler_dim_count(dim)), loc)))
+        coords = params->args[1];
+
+    load_params.coords = coords;
+    load_params.format = hlsl_get_vector_type(ctx, HLSL_TYPE_FLOAT, 4);
+
+    if (!(load = hlsl_new_resource_load(ctx, &load_params, loc)))
+        return false;
+    list_add_tail(params->instrs, &load->node.entry);
+    return true;
+}
+
+static bool intrinsic_tex2D(struct hlsl_ctx *ctx,
+        const struct parse_initializer *params, const struct vkd3d_shader_location *loc)
+{
+    return intrinsic_tex(ctx, params, loc, "tex2D", HLSL_SAMPLER_DIM_2D);
+}
+
 static bool intrinsic_transpose(struct hlsl_ctx *ctx,
         const struct parse_initializer *params, const struct vkd3d_shader_location *loc)
 {
@@ -2981,6 +3041,7 @@ intrinsic_functions[] =
     {"smoothstep", 3, true, intrinsic_smoothstep},
     {"sqrt", 1, true, intrinsic_sqrt},
     {"step", 2, true, intrinsic_step},
+    {"tex2D", -1, false, intrinsic_tex2D},
     {"transpose", 1, true, intrinsic_transpose},
 };
From: Zebediah Figura <zfigura@codeweavers.com>
---
 libs/vkd3d-shader/hlsl.y | 7 +++++++
 1 file changed, 7 insertions(+)
diff --git a/libs/vkd3d-shader/hlsl.y b/libs/vkd3d-shader/hlsl.y
index d2a7c418..1ce399a6 100644
--- a/libs/vkd3d-shader/hlsl.y
+++ b/libs/vkd3d-shader/hlsl.y
@@ -2948,6 +2948,12 @@ static bool intrinsic_tex2D(struct hlsl_ctx *ctx,
     return intrinsic_tex(ctx, params, loc, "tex2D", HLSL_SAMPLER_DIM_2D);
 }

+static bool intrinsic_tex3D(struct hlsl_ctx *ctx,
+        const struct parse_initializer *params, const struct vkd3d_shader_location *loc)
+{
+    return intrinsic_tex(ctx, params, loc, "tex3D", HLSL_SAMPLER_DIM_3D);
+}
+
 static bool intrinsic_transpose(struct hlsl_ctx *ctx,
         const struct parse_initializer *params, const struct vkd3d_shader_location *loc)
 {
@@ -3042,6 +3048,7 @@ intrinsic_functions[] =
     {"sqrt", 1, true, intrinsic_sqrt},
     {"step", 2, true, intrinsic_step},
     {"tex2D", -1, false, intrinsic_tex2D},
+    {"tex3D", -1, false, intrinsic_tex3D},
     {"transpose", 1, true, intrinsic_transpose},
 };
From: Francisco Casas <fcasas@codeweavers.com>
---
 libs/vkd3d-shader/hlsl.c         | 84 +++++++++++++++++++++++---------
 libs/vkd3d-shader/hlsl.h         | 34 +++++++++----
 libs/vkd3d-shader/hlsl_codegen.c | 65 +++++++++++++-----------
 libs/vkd3d-shader/hlsl_sm1.c     |  2 +-
 libs/vkd3d-shader/hlsl_sm4.c     |  4 +-
 5 files changed, 126 insertions(+), 63 deletions(-)
diff --git a/libs/vkd3d-shader/hlsl.c b/libs/vkd3d-shader/hlsl.c
index 256e466a..1531551d 100644
--- a/libs/vkd3d-shader/hlsl.c
+++ b/libs/vkd3d-shader/hlsl.c
@@ -164,6 +164,29 @@ static unsigned int get_array_size(const struct hlsl_type *type)
     return 1;
 }

+enum hlsl_regset hlsl_type_get_regset(struct hlsl_ctx *ctx, const struct hlsl_type *type)
+{
+    if (type->type <= HLSL_CLASS_LAST_NUMERIC)
+        return HLSL_REGSET_NUMERIC;
+
+    if (type->type == HLSL_CLASS_OBJECT)
+    {
+        switch (type->base_type)
+        {
+            case HLSL_TYPE_TEXTURE:
+                return HLSL_REGSET_T;
+            case HLSL_TYPE_SAMPLER:
+                return HLSL_REGSET_S;
+            case HLSL_TYPE_UAV:
+                return HLSL_REGSET_U;
+            default:
+                return HLSL_REGSET_VOID;
+        }
+    }
+
+    return HLSL_REGSET_VOID;
+}
+
 unsigned int hlsl_type_get_sm4_offset(const struct hlsl_type *type, unsigned int offset)
 {
     /* Align to the next vec4 boundary if:
@@ -171,7 +194,7 @@ unsigned int hlsl_type_get_sm4_offset(const struct hlsl_type *type, unsigned int
      * (b) the type would cross a vec4 boundary; i.e. a vec3 and a
      * vec1 can be packed together, but not a vec3 and a vec2. */
-    if (type->type > HLSL_CLASS_LAST_NUMERIC || (offset & 3) + type->reg_size > 4)
+    if (type->type > HLSL_CLASS_LAST_NUMERIC || (offset & 3) + type->reg_size[HLSL_REGSET_NUMERIC] > 4)
         return align(offset, 4);
     return offset;
 }
@@ -179,31 +202,40 @@ unsigned int hlsl_type_get_sm4_offset(const struct hlsl_type *type, unsigned int
 static void hlsl_type_calculate_reg_size(struct hlsl_ctx *ctx, struct hlsl_type *type)
 {
     bool is_sm4 = (ctx->profile->major_version >= 4);
+    unsigned int k;
+
+    for (k = 0; k <= HLSL_REGSET_LAST; ++k)
+        type->reg_size[k] = 0;

     switch (type->type)
     {
         case HLSL_CLASS_SCALAR:
         case HLSL_CLASS_VECTOR:
-            type->reg_size = is_sm4 ? type->dimx : 4;
+            type->reg_size[HLSL_REGSET_NUMERIC] = is_sm4 ? type->dimx : 4;
             break;

         case HLSL_CLASS_MATRIX:
             if (hlsl_type_is_row_major(type))
-                type->reg_size = is_sm4 ? (4 * (type->dimy - 1) + type->dimx) : (4 * type->dimy);
+                type->reg_size[HLSL_REGSET_NUMERIC] = is_sm4 ? (4 * (type->dimy - 1) + type->dimx) : (4 * type->dimy);
             else
-                type->reg_size = is_sm4 ? (4 * (type->dimx - 1) + type->dimy) : (4 * type->dimx);
+                type->reg_size[HLSL_REGSET_NUMERIC] = is_sm4 ? (4 * (type->dimx - 1) + type->dimy) : (4 * type->dimx);
             break;

         case HLSL_CLASS_ARRAY:
         {
-            unsigned int element_size = type->e.array.type->reg_size;
-
             if (type->e.array.elements_count == HLSL_ARRAY_ELEMENTS_COUNT_IMPLICIT)
-                type->reg_size = 0;
-            else if (is_sm4)
-                type->reg_size = (type->e.array.elements_count - 1) * align(element_size, 4) + element_size;
-            else
-                type->reg_size = type->e.array.elements_count * element_size;
+                break;
+
+            for (k = 0; k <= HLSL_REGSET_LAST; ++k)
+            {
+                unsigned int element_size = type->e.array.type->reg_size[k];
+
+                if (is_sm4 && k == HLSL_REGSET_NUMERIC)
+                    type->reg_size[k] = (type->e.array.elements_count - 1) * align(element_size, 4) + element_size;
+                else
+                    type->reg_size[k] = type->e.array.elements_count * element_size;
+            }
+            break;
         }
@@ -212,16 +244,17 @@ static void hlsl_type_calculate_reg_size(struct hlsl_ctx *ctx, struct hlsl_type
             unsigned int i;

             type->dimx = 0;
-            type->reg_size = 0;
-
             for (i = 0; i < type->e.record.field_count; ++i)
             {
                 struct hlsl_struct_field *field = &type->e.record.fields[i];
-                unsigned int field_size = field->type->reg_size;

-                type->reg_size = hlsl_type_get_sm4_offset(field->type, type->reg_size);
-                field->reg_offset = type->reg_size;
-                type->reg_size += field_size;
+                for (k = 0; k <= HLSL_REGSET_LAST; ++k)
+                {
+                    if (k == HLSL_REGSET_NUMERIC)
+                        type->reg_size[k] = hlsl_type_get_sm4_offset(field->type, type->reg_size[k]);
+                    field->reg_offset[k] = type->reg_size[k];
+                    type->reg_size[k] += field->type->reg_size[k];
+                }

                 type->dimx += field->type->dimx * field->type->dimy * get_array_size(field->type);
             }
@@ -229,16 +262,23 @@ static void hlsl_type_calculate_reg_size(struct hlsl_ctx *ctx, struct hlsl_type
         }

         case HLSL_CLASS_OBJECT:
-            type->reg_size = 0;
+        {
+            enum hlsl_regset regset = hlsl_type_get_regset(ctx, type);
+
+            if (regset != HLSL_REGSET_VOID)
+                type->reg_size[regset] = 1;
             break;
+        }
     }
 }

-/* Returns the size of a type, considered as part of an array of that type.
- * As such it includes padding after the type. */
-unsigned int hlsl_type_get_array_element_reg_size(const struct hlsl_type *type)
+/* Returns the size of a type, considered as part of an array of that type, within a specific
+ * register set. As such it includes padding after the type, when applicable. */
+unsigned int hlsl_type_get_array_element_reg_size(const struct hlsl_type *type, enum hlsl_regset regset)
 {
-    return align(type->reg_size, 4);
+    if (regset == HLSL_REGSET_NUMERIC)
+        return align(type->reg_size[regset], 4);
+    return type->reg_size[regset];
 }
 static struct hlsl_type *hlsl_new_type(struct hlsl_ctx *ctx, const char *name, enum hlsl_type_class type_class,
diff --git a/libs/vkd3d-shader/hlsl.h b/libs/vkd3d-shader/hlsl.h
index bb63f827..1233d338 100644
--- a/libs/vkd3d-shader/hlsl.h
+++ b/libs/vkd3d-shader/hlsl.h
@@ -122,6 +122,17 @@ enum hlsl_matrix_majority
     HLSL_ROW_MAJOR
 };

+enum hlsl_regset
+{
+    HLSL_REGSET_S,
+    HLSL_REGSET_T,
+    HLSL_REGSET_U,
+    HLSL_REGSET_LAST_OBJECT = HLSL_REGSET_U,
+    HLSL_REGSET_NUMERIC,
+    HLSL_REGSET_VOID,
+    HLSL_REGSET_LAST = HLSL_REGSET_VOID,
+};
+
 /* An HLSL source-level data type, including anonymous structs and typedefs. */
 struct hlsl_type
 {
@@ -183,12 +194,12 @@ struct hlsl_type
         struct hlsl_type *resource_format;
     } e;

-    /* Number of numeric register components used by one value of this type (4 components make 1
-     * register).
-     * If type is HLSL_CLASS_STRUCT or HLSL_CLASS_ARRAY, this value includes the reg_size of
-     * their elements and padding (which varies according to the backend).
-     * This value is 0 for types without numeric components, like objects. */
-    unsigned int reg_size;
+    /* Number of numeric register components used by one value of this type, for each regset.
+     * For HLSL_REGSET_NUMERIC, 4 components make 1 register, while for other regsets 1 component
+     * makes 1 register.
+     * If type is HLSL_CLASS_STRUCT or HLSL_CLASS_ARRAY, the reg_size of their elements and padding
+     * (which varies according to the backend) is also included. */
+    unsigned int reg_size[HLSL_REGSET_LAST + 1];

     /* Offset where the type's description starts in the output bytecode, in bytes. */
     size_t bytecode_offset;
@@ -215,8 +226,8 @@ struct hlsl_struct_field
      * type->modifiers instead) and that also are specific to the field and not the whole variable.
      * In particular, interpolation modifiers. */
     unsigned int storage_modifiers;
-    /* Offset of the field within the type it belongs to, in numeric register components. */
-    unsigned int reg_offset;
+    /* Offset of the field within the type it belongs to, in register components, for each regset. */
+    unsigned int reg_offset[HLSL_REGSET_LAST + 1];

     /* Offset where the field's name starts in the output bytecode, in bytes. */
     size_t name_bytecode_offset;
@@ -546,10 +557,12 @@ struct hlsl_deref
     struct hlsl_src *path;

     /* Single instruction node of data type uint used to represent the register offset (in register
-     * components), from the start of the variable, of the part referenced.
+     * components, within the pertaining regset), from the start of the variable, of the part
+     * referenced.
      * The path is lowered to this single offset -- whose value may vary between SM1 and SM4 --
      * before writing the bytecode. */
     struct hlsl_src offset;
+    enum hlsl_regset offset_regset;
 };

 struct hlsl_ir_load
@@ -1065,13 +1078,14 @@ bool hlsl_scope_add_type(struct hlsl_scope *scope, struct hlsl_type *type);
 struct hlsl_type *hlsl_type_clone(struct hlsl_ctx *ctx, struct hlsl_type *old,
         unsigned int default_majority, unsigned int modifiers);
 unsigned int hlsl_type_component_count(const struct hlsl_type *type);
-unsigned int hlsl_type_get_array_element_reg_size(const struct hlsl_type *type);
+unsigned int hlsl_type_get_array_element_reg_size(const struct hlsl_type *type, enum hlsl_regset regset);
 struct hlsl_type *hlsl_type_get_component_type(struct hlsl_ctx *ctx, struct hlsl_type *type,
         unsigned int index);
 bool hlsl_type_is_row_major(const struct hlsl_type *type);
 unsigned int hlsl_type_minor_size(const struct hlsl_type *type);
 unsigned int hlsl_type_major_size(const struct hlsl_type *type);
 unsigned int hlsl_type_element_count(const struct hlsl_type *type);
+enum hlsl_regset hlsl_type_get_regset(struct hlsl_ctx *ctx, const struct hlsl_type *type);
 unsigned int hlsl_type_get_sm4_offset(const struct hlsl_type *type, unsigned int offset);
 bool hlsl_types_are_equal(const struct hlsl_type *t1, const struct hlsl_type *t2);
diff --git a/libs/vkd3d-shader/hlsl_codegen.c b/libs/vkd3d-shader/hlsl_codegen.c
index 4fa860a6..a35baa2b 100644
--- a/libs/vkd3d-shader/hlsl_codegen.c
+++ b/libs/vkd3d-shader/hlsl_codegen.c
@@ -24,7 +24,7 @@
 /* TODO: remove when no longer needed, only used for new_offset_instr_from_deref() */
 static struct hlsl_ir_node *new_offset_from_path_index(struct hlsl_ctx *ctx, struct hlsl_block *block,
         struct hlsl_type *type, struct hlsl_ir_node *offset, struct hlsl_ir_node *idx,
-        const struct vkd3d_shader_location *loc)
+        enum hlsl_regset regset, const struct vkd3d_shader_location *loc)
 {
     struct hlsl_ir_node *idx_offset = NULL;
     struct hlsl_ir_constant *c;
@@ -52,7 +52,7 @@ static struct hlsl_ir_node *new_offset_from_path_index(struct hlsl_ctx *ctx, str
         case HLSL_CLASS_ARRAY:
         {
-            unsigned int size = hlsl_type_get_array_element_reg_size(type->e.array.type);
+            unsigned int size = hlsl_type_get_array_element_reg_size(type->e.array.type, regset);

             if (!(c = hlsl_new_uint_constant(ctx, size, loc)))
                 return NULL;
@@ -70,7 +70,7 @@ static struct hlsl_ir_node *new_offset_from_path_index(struct hlsl_ctx *ctx, str
             unsigned int field_idx = hlsl_ir_constant(idx)->value[0].u;
             struct hlsl_struct_field *field = &type->e.record.fields[field_idx];

-            if (!(c = hlsl_new_uint_constant(ctx, field->reg_offset, loc)))
+            if (!(c = hlsl_new_uint_constant(ctx, field->reg_offset[regset], loc)))
                 return NULL;
             list_add_tail(&block->instrs, &c->node.entry);
@@ -110,7 +110,8 @@ static struct hlsl_ir_node *new_offset_instr_from_deref(struct hlsl_ctx *ctx, st
     {
         struct hlsl_block idx_block;

-        if (!(offset = new_offset_from_path_index(ctx, &idx_block, type, offset, deref->path[i].node, loc)))
+        if (!(offset = new_offset_from_path_index(ctx, &idx_block, type, offset, deref->path[i].node,
+                deref->offset_regset, loc)))
             return NULL;

         list_move_tail(&block->instrs, &idx_block.instrs);
@@ -134,6 +135,8 @@ static void replace_deref_path_with_offset(struct hlsl_ctx *ctx, struct hlsl_der
     /* register offsets shouldn't be used before this point is reached. */
     assert(!deref->offset.node);

+    deref->offset_regset = hlsl_type_get_regset(ctx, hlsl_deref_get_type(ctx, deref));
+
     if (!(offset = new_offset_instr_from_deref(ctx, &block, deref, &instr->loc)))
         return;
     list_move_before(&instr->entry, &block.instrs);
@@ -2172,32 +2175,33 @@ static struct hlsl_reg allocate_range(struct hlsl_ctx *ctx, struct liveness *liv
 static const char *debug_register(char class, struct hlsl_reg reg, const struct hlsl_type *type)
 {
     static const char writemask_offset[] = {'w','x','y','z'};
+    unsigned int reg_size = type->reg_size[HLSL_REGSET_NUMERIC];

-    if (type->reg_size > 4)
+    if (reg_size > 4)
     {
-        if (type->reg_size & 3)
-            return vkd3d_dbg_sprintf("%c%u-%c%u.%c", class, reg.id, class,
-                    reg.id + (type->reg_size / 4), writemask_offset[type->reg_size & 3]);
+        if (reg_size & 3)
+            return vkd3d_dbg_sprintf("%c%u-%c%u.%c", class, reg.id, class, reg.id + (reg_size / 4),
+                    writemask_offset[reg_size & 3]);

-        return vkd3d_dbg_sprintf("%c%u-%c%u", class, reg.id, class,
-                reg.id + (type->reg_size / 4) - 1);
+        return vkd3d_dbg_sprintf("%c%u-%c%u", class, reg.id, class, reg.id + (reg_size / 4) - 1);
     }
     return vkd3d_dbg_sprintf("%c%u%s", class, reg.id, debug_hlsl_writemask(reg.writemask));
 }
 static void allocate_variable_temp_register(struct hlsl_ctx *ctx, struct hlsl_ir_var *var, struct liveness *liveness)
 {
+    unsigned int reg_size = var->data_type->reg_size[HLSL_REGSET_NUMERIC];
+
     if (var->is_input_semantic || var->is_output_semantic || var->is_uniform)
         return;

     if (!var->reg.allocated && var->last_read)
     {
-        if (var->data_type->reg_size > 4)
-            var->reg = allocate_range(ctx, liveness, var->first_write,
-                    var->last_read, var->data_type->reg_size);
+        if (reg_size > 4)
+            var->reg = allocate_range(ctx, liveness, var->first_write, var->last_read, reg_size);
         else
-            var->reg = allocate_register(ctx, liveness, var->first_write,
-                    var->last_read, var->data_type->reg_size);
+            var->reg = allocate_register(ctx, liveness, var->first_write, var->last_read, reg_size);
+
         TRACE("Allocated %s to %s (liveness %u-%u).\n", var->name,
                 debug_register('r', var->reg, var->data_type), var->first_write, var->last_read);
     }
@@ -2211,12 +2215,12 @@ static void allocate_temp_registers_recurse(struct hlsl_ctx *ctx, struct hlsl_bl
 {
     if (!instr->reg.allocated && instr->last_read)
     {
-        if (instr->data_type->reg_size > 4)
-            instr->reg = allocate_range(ctx, liveness, instr->index,
-                    instr->last_read, instr->data_type->reg_size);
+        unsigned int reg_size = instr->data_type->reg_size[HLSL_REGSET_NUMERIC];
+
+        if (reg_size > 4)
+            instr->reg = allocate_range(ctx, liveness, instr->index, instr->last_read, reg_size);
         else
-            instr->reg = allocate_register(ctx, liveness, instr->index,
-                    instr->last_read, instr->data_type->reg_size);
+            instr->reg = allocate_register(ctx, liveness, instr->index, instr->last_read, reg_size);
         TRACE("Allocated anonymous expression @%u to %s (liveness %u-%u).\n", instr->index,
                 debug_register('r', instr->reg, instr->data_type), instr->index, instr->last_read);
     }
@@ -2274,7 +2278,7 @@ static void allocate_const_registers_recurse(struct hlsl_ctx *ctx, struct hlsl_b
     struct hlsl_ir_constant *constant = hlsl_ir_constant(instr);
     const struct hlsl_type *type = instr->data_type;
     unsigned int x, y, i, writemask, end_reg;
-    unsigned int reg_size = type->reg_size;
+    unsigned int reg_size = type->reg_size[HLSL_REGSET_NUMERIC];

     if (reg_size > 4)
         constant->reg = allocate_range(ctx, liveness, 1, UINT_MAX, reg_size);
@@ -2373,15 +2377,15 @@ static void allocate_const_registers(struct hlsl_ctx *ctx, struct hlsl_ir_functi
     {
         if (var->is_uniform && var->last_read)
         {
-            if (var->data_type->reg_size == 0)
+            if (var->data_type->reg_size[HLSL_REGSET_NUMERIC] == 0)
                 continue;

-            if (var->data_type->reg_size > 4)
-                var->reg = allocate_range(ctx, &liveness, 1, UINT_MAX, var->data_type->reg_size);
+            if (var->data_type->reg_size[HLSL_REGSET_NUMERIC] > 4)
+                var->reg = allocate_range(ctx, &liveness, 1, UINT_MAX, var->data_type->reg_size[HLSL_REGSET_NUMERIC]);
             else
             {
                 var->reg = allocate_register(ctx, &liveness, 1, UINT_MAX, 4);
-                var->reg.writemask = (1u << var->data_type->reg_size) - 1;
+                var->reg.writemask = (1u << var->data_type->reg_size[HLSL_REGSET_NUMERIC]) - 1;
             }
             TRACE("Allocated %s to %s.\n", var->name, debug_register('c', var->reg, var->data_type));
         }
@@ -2498,7 +2502,7 @@ static void calculate_buffer_offset(struct hlsl_ir_var *var)
     var->buffer_offset = buffer->size;
     TRACE("Allocated buffer offset %u to %s.\n", var->buffer_offset, var->name);
-    buffer->size += var->data_type->reg_size;
+    buffer->size += var->data_type->reg_size[HLSL_REGSET_NUMERIC];
     if (var->last_read)
         buffer->used_size = buffer->size;
 }
@@ -2753,6 +2757,7 @@ bool hlsl_component_index_range_from_deref(struct hlsl_ctx *ctx, const struct hl
 bool hlsl_offset_from_deref(struct hlsl_ctx *ctx, const struct hlsl_deref *deref, unsigned int *offset)
 {
     struct hlsl_ir_node *offset_node = deref->offset.node;
+    unsigned int size;

     if (!offset_node)
     {
@@ -2769,10 +2774,12 @@ bool hlsl_offset_from_deref(struct hlsl_ctx *ctx, const struct hlsl_deref *deref
     *offset = hlsl_ir_constant(offset_node)->value[0].u;

-    if (*offset >= deref->var->data_type->reg_size)
+    assert(deref->offset_regset != HLSL_REGSET_VOID);
+    size = deref->var->data_type->reg_size[deref->offset_regset];
+
+    if (*offset >= size)
     {
         hlsl_error(ctx, &deref->offset.node->loc, VKD3D_SHADER_ERROR_HLSL_OFFSET_OUT_OF_BOUNDS,
-                "Dereference is out of bounds. %u/%u", *offset, deref->var->data_type->reg_size);
+                "Dereference is out of bounds. %u/%u", *offset, size);
         return false;
     }
@@ -2798,6 +2805,8 @@ struct hlsl_reg hlsl_reg_from_deref(struct hlsl_ctx *ctx, const struct hlsl_dere
     struct hlsl_reg ret = var->reg;
     unsigned int offset = hlsl_offset_from_deref_safe(ctx, deref);

+    assert(deref->offset_regset == HLSL_REGSET_NUMERIC);
+
     ret.id += offset / 4;
     ret.writemask = 0xf & (0xf << (offset % 4));
diff --git a/libs/vkd3d-shader/hlsl_sm1.c b/libs/vkd3d-shader/hlsl_sm1.c
index a66d4028..c2830fe3 100644
--- a/libs/vkd3d-shader/hlsl_sm1.c
+++ b/libs/vkd3d-shader/hlsl_sm1.c
@@ -366,7 +366,7 @@ static void write_sm1_uniforms(struct hlsl_ctx *ctx, struct vkd3d_bytecode_buffe
         else
         {
             put_u32(buffer, vkd3d_make_u32(D3DXRS_FLOAT4, var->reg.id));
-            put_u32(buffer, var->data_type->reg_size / 4);
+            put_u32(buffer, var->data_type->reg_size[HLSL_REGSET_NUMERIC] / 4);
         }
         put_u32(buffer, 0); /* type */
         put_u32(buffer, 0); /* FIXME: default value */
diff --git a/libs/vkd3d-shader/hlsl_sm4.c b/libs/vkd3d-shader/hlsl_sm4.c
index 8f96c780..b0ea4a4c 100644
--- a/libs/vkd3d-shader/hlsl_sm4.c
+++ b/libs/vkd3d-shader/hlsl_sm4.c
@@ -395,7 +395,7 @@ static void write_sm4_type(struct hlsl_ctx *ctx, struct vkd3d_bytecode_buffer *b
             put_u32(buffer, field->name_bytecode_offset);
             put_u32(buffer, field->type->bytecode_offset);
-            put_u32(buffer, field->reg_offset);
+            put_u32(buffer, field->reg_offset[HLSL_REGSET_NUMERIC]);
         }
     }
@@ -711,7 +711,7 @@ static void write_sm4_rdef(struct hlsl_ctx *ctx, struct dxbc_writer *dxbc)
         put_u32(&buffer, 0); /* name */
         put_u32(&buffer, var->buffer_offset * sizeof(float));
-        put_u32(&buffer, var->data_type->reg_size * sizeof(float));
+        put_u32(&buffer, var->data_type->reg_size[HLSL_REGSET_NUMERIC] * sizeof(float));
         put_u32(&buffer, flags);
         put_u32(&buffer, 0); /* type */
         put_u32(&buffer, 0); /* FIXME: default value */
From: Francisco Casas <fcasas@codeweavers.com>
---
 libs/vkd3d-shader/hlsl.h         | 16 +++-----
 libs/vkd3d-shader/hlsl_codegen.c | 55 +++++++++++++-----------
 libs/vkd3d-shader/hlsl_sm1.c     | 22 +++++++----
 libs/vkd3d-shader/hlsl_sm4.c     | 67 ++++++++++++++++++++------------
 4 files changed, 96 insertions(+), 64 deletions(-)
diff --git a/libs/vkd3d-shader/hlsl.h b/libs/vkd3d-shader/hlsl.h
index 1233d338..91b222eb 100644
--- a/libs/vkd3d-shader/hlsl.h
+++ b/libs/vkd3d-shader/hlsl.h
@@ -383,19 +383,13 @@ struct hlsl_ir_var
     /* Offset where the variable's value is stored within its buffer in numeric register components.
      * This in case the variable is uniform. */
     unsigned int buffer_offset;
-    /* Register to which the variable is allocated during its lifetime.
-     * In case that the variable spans multiple registers, this is set to the start of the register
-     * range.
-     * The register type is inferred from the data type and the storage of the variable.
+    /* Register to which the variable is allocated during its lifetime, for each register set.
+     * In case that the variable spans multiple registers in one regset, this is set to the
+     * start of the register range.
      * Builtin semantics don't use the field.
      * In SM4, uniforms don't use the field because they are located using the buffer's hlsl_reg
-     * and the buffer_offset instead.
-     * If the variable is an input semantic copy, the register is 'v'.
-     * If the variable is an output semantic copy, the register is 'o'.
-     * Textures are stored on 's' registers in SM1, and 't' registers in SM4.
-     * Samplers are stored on 's' registers.
-     * UAVs are stored on 'u' registers. */
-    struct hlsl_reg reg;
+     * and the buffer_offset instead. */
+    struct hlsl_reg regs[HLSL_REGSET_LAST + 1];
     uint32_t is_input_semantic : 1;
     uint32_t is_output_semantic : 1;
diff --git a/libs/vkd3d-shader/hlsl_codegen.c b/libs/vkd3d-shader/hlsl_codegen.c
index a35baa2b..a5b4f78f 100644
--- a/libs/vkd3d-shader/hlsl_codegen.c
+++ b/libs/vkd3d-shader/hlsl_codegen.c
@@ -2195,15 +2195,17 @@ static void allocate_variable_temp_register(struct hlsl_ctx *ctx, struct hlsl_ir
     if (var->is_input_semantic || var->is_output_semantic || var->is_uniform)
         return;

-    if (!var->reg.allocated && var->last_read)
+    if (!var->regs[HLSL_REGSET_NUMERIC].allocated && var->last_read)
     {
         if (reg_size > 4)
-            var->reg = allocate_range(ctx, liveness, var->first_write, var->last_read, reg_size);
+            var->regs[HLSL_REGSET_NUMERIC] = allocate_range(ctx, liveness, var->first_write,
+                    var->last_read, reg_size);
         else
-            var->reg = allocate_register(ctx, liveness, var->first_write, var->last_read, reg_size);
+            var->regs[HLSL_REGSET_NUMERIC] = allocate_register(ctx, liveness, var->first_write,
+                    var->last_read, reg_size);

-        TRACE("Allocated %s to %s (liveness %u-%u).\n", var->name,
-                debug_register('r', var->reg, var->data_type), var->first_write, var->last_read);
+        TRACE("Allocated %s to %s (liveness %u-%u).\n", var->name, debug_register('r',
+                var->regs[HLSL_REGSET_NUMERIC], var->data_type), var->first_write, var->last_read);
     }
 }
@@ -2377,17 +2379,21 @@ static void allocate_const_registers(struct hlsl_ctx *ctx, struct hlsl_ir_functi
     {
         if (var->is_uniform && var->last_read)
         {
-            if (var->data_type->reg_size[HLSL_REGSET_NUMERIC] == 0)
+            unsigned int reg_size = var->data_type->reg_size[HLSL_REGSET_NUMERIC];
+
+            if (reg_size == 0)
                 continue;

-            if (var->data_type->reg_size[HLSL_REGSET_NUMERIC] > 4)
-                var->reg = allocate_range(ctx, &liveness, 1, UINT_MAX, var->data_type->reg_size[HLSL_REGSET_NUMERIC]);
+            if (reg_size > 4)
+            {
+                var->regs[HLSL_REGSET_NUMERIC] = allocate_range(ctx, &liveness, 1, UINT_MAX, reg_size);
+            }
             else
             {
-                var->reg = allocate_register(ctx, &liveness, 1, UINT_MAX, 4);
-                var->reg.writemask = (1u << var->data_type->reg_size[HLSL_REGSET_NUMERIC]) - 1;
+                var->regs[HLSL_REGSET_NUMERIC] = allocate_register(ctx, &liveness, 1, UINT_MAX, 4);
+                var->regs[HLSL_REGSET_NUMERIC].writemask = (1u << reg_size) - 1;
             }
-            TRACE("Allocated %s to %s.\n", var->name, debug_register('c', var->reg, var->data_type));
+            TRACE("Allocated %s to %s.\n", var->name, debug_register('c', var->regs[HLSL_REGSET_NUMERIC], var->data_type));
         }
     }
 }
@@ -2461,10 +2467,11 @@ static void allocate_semantic_register(struct hlsl_ctx *ctx, struct hlsl_ir_var
     }
     else
     {
-        var->reg.allocated = true;
-        var->reg.id = (*counter)++;
-        var->reg.writemask = (1 << var->data_type->dimx) - 1;
-        TRACE("Allocated %s to %s.\n", var->name, debug_register(output ? 'o' : 'v', var->reg, var->data_type));
+        var->regs[HLSL_REGSET_NUMERIC].allocated = true;
+        var->regs[HLSL_REGSET_NUMERIC].id = (*counter)++;
+        var->regs[HLSL_REGSET_NUMERIC].writemask = (1 << var->data_type->dimx) - 1;
+        TRACE("Allocated %s to %s.\n", var->name, debug_register(output ? 'o' : 'v',
+                var->regs[HLSL_REGSET_NUMERIC], var->data_type));
     }
 }
@@ -2627,10 +2634,14 @@ static void allocate_objects(struct hlsl_ctx *ctx, enum hlsl_base_type type)
     LIST_FOR_EACH_ENTRY(var, &ctx->extern_vars, struct hlsl_ir_var, extern_entry)
     {
+        enum hlsl_regset regset;
+
         if (!var->last_read || var->data_type->type != HLSL_CLASS_OBJECT
                 || var->data_type->base_type != type)
             continue;

+        regset = hlsl_type_get_regset(ctx, var->data_type);
+
         if (var->reg_reservation.type == type_info->reg_name)
         {
             const struct hlsl_ir_var *reserved_object = get_reserved_object(ctx, type_info->reg_name,
@@ -2652,8 +2663,8 @@ static void allocate_objects(struct hlsl_ctx *ctx, enum hlsl_base_type type)
                         type_info->reg_name, var->reg_reservation.index);
             }

-            var->reg.id = var->reg_reservation.index;
-            var->reg.allocated = true;
+            var->regs[regset].id = var->reg_reservation.index;
+            var->regs[regset].allocated = true;
             TRACE("Allocated reserved %s to %c%u.\n", var->name, type_info->reg_name, var->reg_reservation.index);
         }
         else if (!var->reg_reservation.type)
@@ -2661,8 +2672,8 @@ static void allocate_objects(struct hlsl_ctx *ctx, enum hlsl_base_type type)
             while (get_reserved_object(ctx, type_info->reg_name, index))
                 ++index;

-            var->reg.id = index;
-            var->reg.allocated = true;
+            var->regs[regset].id = index;
+            var->regs[regset].allocated = true;
             TRACE("Allocated object to %c%u.\n", type_info->reg_name, index);
             ++index;
         }
@@ -2802,7 +2813,7 @@ unsigned int hlsl_offset_from_deref_safe(struct hlsl_ctx *ctx, const struct hlsl
 struct hlsl_reg hlsl_reg_from_deref(struct hlsl_ctx *ctx, const struct hlsl_deref *deref)
 {
     const struct hlsl_ir_var *var = deref->var;
-    struct hlsl_reg ret = var->reg;
+    struct hlsl_reg ret = var->regs[HLSL_REGSET_NUMERIC];
     unsigned int offset = hlsl_offset_from_deref_safe(ctx, deref);

     assert(deref->offset_regset == HLSL_REGSET_NUMERIC);
@@ -2810,8 +2821,8 @@ struct hlsl_reg hlsl_reg_from_deref(struct hlsl_ctx *ctx, const struct hlsl_dere
     ret.id += offset / 4;

     ret.writemask = 0xf & (0xf << (offset % 4));
-    if (var->reg.writemask)
-        ret.writemask = hlsl_combine_writemasks(var->reg.writemask, ret.writemask);
+    if (var->regs[HLSL_REGSET_NUMERIC].writemask)
+        ret.writemask = hlsl_combine_writemasks(var->regs[HLSL_REGSET_NUMERIC].writemask, ret.writemask);
     return ret;
 }
diff --git a/libs/vkd3d-shader/hlsl_sm1.c b/libs/vkd3d-shader/hlsl_sm1.c
index c2830fe3..84abff15 100644
--- a/libs/vkd3d-shader/hlsl_sm1.c
+++ b/libs/vkd3d-shader/hlsl_sm1.c
@@ -315,7 +315,9 @@ static void write_sm1_uniforms(struct hlsl_ctx *ctx, struct vkd3d_bytecode_buffe
     LIST_FOR_EACH_ENTRY(var, &ctx->extern_vars, struct hlsl_ir_var, extern_entry)
     {
-        if (!var->semantic.name && var->reg.allocated)
+        enum hlsl_regset regset = hlsl_type_get_regset(ctx, var->data_type);
+
+        if (!var->semantic.name && var->regs[regset].allocated)
         {
             ++uniform_count;
@@ -353,20 +355,24 @@ static void write_sm1_uniforms(struct hlsl_ctx *ctx, struct vkd3d_bytecode_buffe
     LIST_FOR_EACH_ENTRY(var, &ctx->extern_vars, struct hlsl_ir_var, extern_entry)
     {
-        if (!var->semantic.name && var->reg.allocated)
+        enum hlsl_regset regset = hlsl_type_get_regset(ctx, var->data_type);
+
+        if (!var->semantic.name && var->regs[regset].allocated)
         {
             put_u32(buffer, 0); /* name */
             if (var->data_type->type == HLSL_CLASS_OBJECT
                     && (var->data_type->base_type == HLSL_TYPE_SAMPLER
                     || var->data_type->base_type == HLSL_TYPE_TEXTURE))
             {
-                put_u32(buffer, vkd3d_make_u32(D3DXRS_SAMPLER, var->reg.id));
+                assert(regset == HLSL_REGSET_S);
+                put_u32(buffer, vkd3d_make_u32(D3DXRS_SAMPLER, var->regs[regset].id));
                 put_u32(buffer, 1);
             }
             else
             {
-                put_u32(buffer, vkd3d_make_u32(D3DXRS_FLOAT4, var->reg.id));
-                put_u32(buffer, var->data_type->reg_size[HLSL_REGSET_NUMERIC] / 4);
+                assert(regset == HLSL_REGSET_NUMERIC);
+                put_u32(buffer, vkd3d_make_u32(D3DXRS_FLOAT4, var->regs[regset].id));
+                put_u32(buffer, var->data_type->reg_size[regset] / 4);
             }
             put_u32(buffer, 0); /* type */
             put_u32(buffer, 0); /* FIXME: default value */
@@ -377,7 +383,9 @@ static void write_sm1_uniforms(struct hlsl_ctx *ctx, struct vkd3d_bytecode_buffe
     LIST_FOR_EACH_ENTRY(var, &ctx->extern_vars, struct hlsl_ir_var, extern_entry)
     {
-        if (!var->semantic.name && var->reg.allocated)
+        enum hlsl_regset regset = hlsl_type_get_regset(ctx, var->data_type);
+
+        if (!var->semantic.name && var->regs[regset].allocated)
         {
             size_t var_offset = vars_start + (uniform_count * 5 * sizeof(uint32_t));
             size_t name_offset;
@@ -547,7 +555,7 @@ static void write_sm1_semantic_dcl(struct hlsl_ctx *ctx, struct vkd3d_bytecode_b
     ret = hlsl_sm1_usage_from_semantic(&var->semantic, &usage, &usage_idx);
     assert(ret);
     reg.type = output ? D3DSPR_OUTPUT : D3DSPR_INPUT;
-    reg.reg = var->reg.id;
+    reg.reg = var->regs[HLSL_REGSET_NUMERIC].id;
 }
token = D3DSIO_DCL; diff --git a/libs/vkd3d-shader/hlsl_sm4.c b/libs/vkd3d-shader/hlsl_sm4.c index b0ea4a4c..64daa28c 100644 --- a/libs/vkd3d-shader/hlsl_sm4.c +++ b/libs/vkd3d-shader/hlsl_sm4.c @@ -182,9 +182,9 @@ static void write_sm4_signature(struct hlsl_ctx *ctx, struct dxbc_writer *dxbc, } else { - assert(var->reg.allocated); + assert(var->regs[HLSL_REGSET_NUMERIC].allocated); type = VKD3D_SM4_RT_INPUT; - reg_idx = var->reg.id; + reg_idx = var->regs[HLSL_REGSET_NUMERIC].id; }
use_mask = width; /* FIXME: accurately report use mask */ @@ -480,16 +480,27 @@ static D3D_SRV_DIMENSION sm4_rdef_resource_dimension(const struct hlsl_type *typ } }
-static int sm4_compare_externs(const struct hlsl_ir_var *a, const struct hlsl_ir_var *b)
+static int sm4_compare_externs(struct hlsl_ctx *ctx, const struct hlsl_ir_var *a, const struct hlsl_ir_var *b)
 {
-    if (a->data_type->base_type != b->data_type->base_type)
-        return a->data_type->base_type - b->data_type->base_type;
-    if (a->reg.allocated && b->reg.allocated)
-        return a->reg.id - b->reg.id;
+    enum hlsl_regset a_regset = hlsl_type_get_regset(ctx, a->data_type);
+    enum hlsl_regset b_regset = hlsl_type_get_regset(ctx, b->data_type);
+    unsigned int a_id, b_id;
+    bool a_allocated, b_allocated;
+
+    a_allocated = a->regs[a_regset].allocated;
+    a_id = a->regs[a_regset].id;
+
+    b_allocated = b->regs[b_regset].allocated;
+    b_id = b->regs[b_regset].id;
+
+    if (a_regset != b_regset)
+        return a_regset - b_regset;
+    if (a_allocated && b_allocated)
+        return a_id - b_id;
     return strcmp(a->name, b->name);
 }
-static void sm4_sort_extern(struct list *sorted, struct hlsl_ir_var *to_sort) +static void sm4_sort_extern(struct hlsl_ctx *ctx, struct list *sorted, struct hlsl_ir_var *to_sort) { struct hlsl_ir_var *var;
@@ -497,7 +508,7 @@ static void sm4_sort_extern(struct list *sorted, struct hlsl_ir_var *to_sort)
LIST_FOR_EACH_ENTRY(var, sorted, struct hlsl_ir_var, extern_entry) { - if (sm4_compare_externs(to_sort, var) < 0) + if (sm4_compare_externs(ctx, to_sort, var) < 0) { list_add_before(&var->extern_entry, &to_sort->extern_entry); return; @@ -515,7 +526,7 @@ static void sm4_sort_externs(struct hlsl_ctx *ctx) LIST_FOR_EACH_ENTRY_SAFE(var, next, &ctx->extern_vars, struct hlsl_ir_var, extern_entry) { if (var->data_type->type == HLSL_CLASS_OBJECT) - sm4_sort_extern(&sorted, var); + sm4_sort_extern(ctx, &sorted, var); } list_move_tail(&ctx->extern_vars, &sorted); } @@ -544,8 +555,11 @@ static void write_sm4_rdef(struct hlsl_ctx *ctx, struct dxbc_writer *dxbc)
LIST_FOR_EACH_ENTRY(var, &ctx->extern_vars, struct hlsl_ir_var, extern_entry) { - if (var->reg.allocated && var->data_type->type == HLSL_CLASS_OBJECT) - ++resource_count; + enum hlsl_regset regset = hlsl_type_get_regset(ctx, var->data_type); + + if (regset > HLSL_REGSET_LAST_OBJECT || !var->regs[regset].allocated) + continue; + ++resource_count; }
LIST_FOR_EACH_ENTRY(cbuffer, &ctx->buffers, struct hlsl_buffer, entry) @@ -585,9 +599,10 @@ static void write_sm4_rdef(struct hlsl_ctx *ctx, struct dxbc_writer *dxbc)
LIST_FOR_EACH_ENTRY(var, &ctx->extern_vars, struct hlsl_ir_var, extern_entry) { + enum hlsl_regset regset = hlsl_type_get_regset(ctx, var->data_type); uint32_t flags = 0;
- if (!var->reg.allocated || var->data_type->type != HLSL_CLASS_OBJECT) + if (regset > HLSL_REGSET_LAST_OBJECT || !var->regs[regset].allocated) continue;
if (var->reg_reservation.type) @@ -595,7 +610,7 @@ static void write_sm4_rdef(struct hlsl_ctx *ctx, struct dxbc_writer *dxbc)
put_u32(&buffer, 0); /* name */ put_u32(&buffer, sm4_resource_type(var->data_type)); - if (var->data_type->base_type == HLSL_TYPE_SAMPLER) + if (regset == HLSL_REGSET_S) { put_u32(&buffer, 0); put_u32(&buffer, 0); @@ -608,7 +623,7 @@ static void write_sm4_rdef(struct hlsl_ctx *ctx, struct dxbc_writer *dxbc) put_u32(&buffer, ~0u); /* FIXME: multisample count */ flags |= (var->data_type->e.resource_format->dimx - 1) << VKD3D_SM4_SIF_TEXTURE_COMPONENTS_SHIFT; } - put_u32(&buffer, var->reg.id); + put_u32(&buffer, var->regs[regset].id); put_u32(&buffer, 1); /* bind count */ put_u32(&buffer, flags); } @@ -637,7 +652,9 @@ static void write_sm4_rdef(struct hlsl_ctx *ctx, struct dxbc_writer *dxbc)
LIST_FOR_EACH_ENTRY(var, &ctx->extern_vars, struct hlsl_ir_var, extern_entry) { - if (!var->reg.allocated || var->data_type->type != HLSL_CLASS_OBJECT) + enum hlsl_regset regset = hlsl_type_get_regset(ctx, var->data_type); + + if (regset > HLSL_REGSET_LAST_OBJECT || !var->regs[regset].allocated) continue;
string_offset = put_string(&buffer, var->name); @@ -863,7 +880,7 @@ static void sm4_register_from_deref(struct hlsl_ctx *ctx, struct sm4_register *r reg->dim = VKD3D_SM4_DIMENSION_VEC4; if (swizzle_type) *swizzle_type = VKD3D_SM4_SWIZZLE_VEC4; - reg->idx[0] = var->reg.id; + reg->idx[0] = var->regs[HLSL_REGSET_T].id; reg->idx_count = 1; *writemask = VKD3DSP_WRITEMASK_ALL; } @@ -873,7 +890,7 @@ static void sm4_register_from_deref(struct hlsl_ctx *ctx, struct sm4_register *r reg->dim = VKD3D_SM4_DIMENSION_VEC4; if (swizzle_type) *swizzle_type = VKD3D_SM4_SWIZZLE_VEC4; - reg->idx[0] = var->reg.id; + reg->idx[0] = var->regs[HLSL_REGSET_U].id; reg->idx_count = 1; *writemask = VKD3DSP_WRITEMASK_ALL; } @@ -883,7 +900,7 @@ static void sm4_register_from_deref(struct hlsl_ctx *ctx, struct sm4_register *r reg->dim = VKD3D_SM4_DIMENSION_NONE; if (swizzle_type) *swizzle_type = VKD3D_SM4_SWIZZLE_NONE; - reg->idx[0] = var->reg.id; + reg->idx[0] = var->regs[HLSL_REGSET_S].id; reg->idx_count = 1; *writemask = VKD3DSP_WRITEMASK_ALL; } @@ -1153,7 +1170,7 @@ static void write_sm4_dcl_sampler(struct vkd3d_bytecode_buffer *buffer, const st .opcode = VKD3D_SM4_OP_DCL_SAMPLER,
.dsts[0].reg.type = VKD3D_SM4_RT_SAMPLER, - .dsts[0].reg.idx = {var->reg.id}, + .dsts[0].reg.idx = {var->regs[HLSL_REGSET_S].id}, .dsts[0].reg.idx_count = 1, .dst_count = 1, }; @@ -1169,7 +1186,7 @@ static void write_sm4_dcl_texture(struct vkd3d_bytecode_buffer *buffer, const st | (sm4_resource_dimension(var->data_type) << VKD3D_SM4_RESOURCE_TYPE_SHIFT),
.dsts[0].reg.type = uav ? VKD3D_SM5_RT_UAV : VKD3D_SM4_RT_RESOURCE, - .dsts[0].reg.idx = {var->reg.id}, + .dsts[0].reg.idx = {uav ? var->regs[HLSL_REGSET_U].id : var->regs[HLSL_REGSET_T].id}, .dsts[0].reg.idx_count = 1, .dst_count = 1,
@@ -1208,9 +1225,9 @@ static void write_sm4_dcl_semantic(struct hlsl_ctx *ctx, struct vkd3d_bytecode_b else { instr.dsts[0].reg.type = output ? VKD3D_SM4_RT_OUTPUT : VKD3D_SM4_RT_INPUT; - instr.dsts[0].reg.idx[0] = var->reg.id; + instr.dsts[0].reg.idx[0] = var->regs[HLSL_REGSET_NUMERIC].id; instr.dsts[0].reg.idx_count = 1; - instr.dsts[0].writemask = var->reg.writemask; + instr.dsts[0].writemask = var->regs[HLSL_REGSET_NUMERIC].writemask; }
if (instr.dsts[0].reg.type == VKD3D_SM4_RT_DEPTHOUT) @@ -2438,7 +2455,9 @@ static void write_sm4_shdr(struct hlsl_ctx *ctx,
LIST_FOR_EACH_ENTRY(var, &ctx->extern_vars, const struct hlsl_ir_var, extern_entry) { - if (!var->reg.allocated || var->data_type->type != HLSL_CLASS_OBJECT) + enum hlsl_regset regset = hlsl_type_get_regset(ctx, var->data_type); + + if (regset > HLSL_REGSET_LAST_OBJECT || !var->regs[regset].allocated) continue;
if (var->data_type->base_type == HLSL_TYPE_SAMPLER)
From: Francisco Casas fcasas@codeweavers.com
--- libs/vkd3d-shader/hlsl.h | 17 ++++++++ libs/vkd3d-shader/hlsl_codegen.c | 68 ++++++++++---------------------- 2 files changed, 37 insertions(+), 48 deletions(-)
diff --git a/libs/vkd3d-shader/hlsl.h b/libs/vkd3d-shader/hlsl.h index 91b222eb..88a3a17a 100644 --- a/libs/vkd3d-shader/hlsl.h +++ b/libs/vkd3d-shader/hlsl.h @@ -777,6 +777,23 @@ struct hlsl_resource_load_params struct hlsl_ir_node *coords, *lod, *texel_offset; };
+static inline char hlsl_regset_name(enum hlsl_regset regset)
+{
+    switch (regset)
+    {
+        case HLSL_REGSET_S:
+            return 's';
+        case HLSL_REGSET_T:
+            return 't';
+        case HLSL_REGSET_U:
+            return 'u';
+        case HLSL_REGSET_NUMERIC:
+        case HLSL_REGSET_VOID:
+            vkd3d_unreachable();
+    }
+    vkd3d_unreachable();
+}
+
 static inline struct hlsl_ir_call *hlsl_ir_call(const struct hlsl_ir_node *node)
 {
     assert(node->type == HLSL_IR_CALL);
diff --git a/libs/vkd3d-shader/hlsl_codegen.c b/libs/vkd3d-shader/hlsl_codegen.c
index a5b4f78f..b5042997 100644
--- a/libs/vkd3d-shader/hlsl_codegen.c
+++ b/libs/vkd3d-shader/hlsl_codegen.c
@@ -2577,50 +2577,27 @@ static void allocate_buffers(struct hlsl_ctx *ctx)
     }
 }
-static const struct hlsl_ir_var *get_reserved_object(struct hlsl_ctx *ctx, char type, uint32_t index)
+static const struct hlsl_ir_var *get_reserved_object(struct hlsl_ctx *ctx, enum hlsl_regset regset,
+        uint32_t index)
 {
     const struct hlsl_ir_var *var;
     LIST_FOR_EACH_ENTRY(var, &ctx->extern_vars, const struct hlsl_ir_var, extern_entry)
     {
-        if (var->last_read && var->reg_reservation.type == type && var->reg_reservation.index == index)
+        if (var->last_read && var->reg_reservation.type == hlsl_regset_name(regset) && var->reg_reservation.index == index)
             return var;
     }
     return NULL;
 }
-static const struct object_type_info
+static void allocate_objects(struct hlsl_ctx *ctx, enum hlsl_regset regset)
 {
-    enum hlsl_base_type type;
-    char reg_name;
-}
-object_types[] =
-{
-    { HLSL_TYPE_SAMPLER, 's' },
-    { HLSL_TYPE_TEXTURE, 't' },
-    { HLSL_TYPE_UAV, 'u' },
-};
-
-static const struct object_type_info *get_object_type_info(enum hlsl_base_type type)
-{
-    unsigned int i;
-
-    for (i = 0; i < ARRAY_SIZE(object_types); ++i)
-        if (type == object_types[i].type)
-            return &object_types[i];
-
-    WARN("No type info for object type %u.\n", type);
-    return NULL;
-}
-
-static void allocate_objects(struct hlsl_ctx *ctx, enum hlsl_base_type type)
-{
-    const struct object_type_info *type_info = get_object_type_info(type);
+    char regset_name = hlsl_regset_name(regset);
     struct hlsl_ir_var *var;
     uint32_t min_index = 0;
     uint32_t index;
- if (type == HLSL_TYPE_UAV) + if (regset == HLSL_REGSET_U) { LIST_FOR_EACH_ENTRY(var, &ctx->extern_vars, struct hlsl_ir_var, extern_entry) { @@ -2634,21 +2611,17 @@ static void allocate_objects(struct hlsl_ctx *ctx, enum hlsl_base_type type)
LIST_FOR_EACH_ENTRY(var, &ctx->extern_vars, struct hlsl_ir_var, extern_entry) { - enum hlsl_regset regset; - - if (!var->last_read || var->data_type->type != HLSL_CLASS_OBJECT - || var->data_type->base_type != type) + if (!var->last_read || var->data_type->reg_size[regset] == 0) continue;
- regset = hlsl_type_get_regset(ctx, var->data_type); - - if (var->reg_reservation.type == type_info->reg_name) + if (var->reg_reservation.type == regset_name) { - const struct hlsl_ir_var *reserved_object = get_reserved_object(ctx, type_info->reg_name, + const struct hlsl_ir_var *reserved_object = get_reserved_object(ctx, regset, var->reg_reservation.index);
if (var->reg_reservation.index < min_index) { + assert(regset == HLSL_REGSET_U); hlsl_error(ctx, &var->loc, VKD3D_SHADER_ERROR_HLSL_OVERLAPPING_RESERVATIONS, "UAV index (%u) must be higher than the maximum render target index (%u).", var->reg_reservation.index, min_index - 1); @@ -2656,25 +2629,24 @@ static void allocate_objects(struct hlsl_ctx *ctx, enum hlsl_base_type type) else if (reserved_object && reserved_object != var) { hlsl_error(ctx, &var->loc, VKD3D_SHADER_ERROR_HLSL_OVERLAPPING_RESERVATIONS, - "Multiple objects bound to %c%u.", type_info->reg_name, - var->reg_reservation.index); + "Multiple objects bound to %c%u.", regset_name, var->reg_reservation.index); hlsl_note(ctx, &reserved_object->loc, VKD3D_SHADER_LOG_ERROR, - "Object '%s' is already bound to %c%u.", reserved_object->name, - type_info->reg_name, var->reg_reservation.index); + "Object '%s' is already bound to %c%u.", reserved_object->name, regset_name, + var->reg_reservation.index); }
var->regs[regset].id = var->reg_reservation.index; var->regs[regset].allocated = true; - TRACE("Allocated reserved %s to %c%u.\n", var->name, type_info->reg_name, var->reg_reservation.index); + TRACE("Allocated reserved %s to %c%u.\n", var->name, regset_name, var->regs[regset].id); } else if (!var->reg_reservation.type) { - while (get_reserved_object(ctx, type_info->reg_name, index)) + while (get_reserved_object(ctx, regset, index)) ++index;
var->regs[regset].id = index; var->regs[regset].allocated = true; - TRACE("Allocated object to %c%u.\n", type_info->reg_name, index); + TRACE("Allocated object to %c%u.\n", regset_name, index); ++index; } else @@ -2684,7 +2656,7 @@ static void allocate_objects(struct hlsl_ctx *ctx, enum hlsl_base_type type) type_string = hlsl_type_to_string(ctx, var->data_type); hlsl_error(ctx, &var->loc, VKD3D_SHADER_ERROR_HLSL_INVALID_RESERVATION, "Object of type '%s' must be bound to register type '%c'.", - type_string->buffer, type_info->reg_name); + type_string->buffer, regset_name); hlsl_release_string_buffer(ctx, type_string); } } @@ -2994,11 +2966,11 @@ int hlsl_emit_bytecode(struct hlsl_ctx *ctx, struct hlsl_ir_function_decl *entry else { allocate_buffers(ctx); - allocate_objects(ctx, HLSL_TYPE_TEXTURE); - allocate_objects(ctx, HLSL_TYPE_UAV); + allocate_objects(ctx, HLSL_REGSET_T); + allocate_objects(ctx, HLSL_REGSET_U); } allocate_semantic_registers(ctx); - allocate_objects(ctx, HLSL_TYPE_SAMPLER); + allocate_objects(ctx, HLSL_REGSET_S);
if (ctx->result) return ctx->result;
From: Francisco Casas fcasas@codeweavers.com
--- libs/vkd3d-shader/hlsl_codegen.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/libs/vkd3d-shader/hlsl_codegen.c b/libs/vkd3d-shader/hlsl_codegen.c index b5042997..3eec0d6f 100644 --- a/libs/vkd3d-shader/hlsl_codegen.c +++ b/libs/vkd3d-shader/hlsl_codegen.c @@ -2584,7 +2584,7 @@ static const struct hlsl_ir_var *get_reserved_object(struct hlsl_ctx *ctx, enum
     LIST_FOR_EACH_ENTRY(var, &ctx->extern_vars, const struct hlsl_ir_var, extern_entry)
     {
-        if (var->last_read && var->reg_reservation.type == hlsl_regset_name(regset) && var->reg_reservation.index == index)
+        if (var->reg_reservation.type == hlsl_regset_name(regset) && var->reg_reservation.index == index)
             return var;
     }
     return NULL;
From: Francisco Casas fcasas@codeweavers.com
This refactoring is required for improving the allocation strategy so it works with multiple-register variables. --- libs/vkd3d-shader/hlsl_codegen.c | 75 ++++++++++++++++++++++---------- 1 file changed, 52 insertions(+), 23 deletions(-)
diff --git a/libs/vkd3d-shader/hlsl_codegen.c b/libs/vkd3d-shader/hlsl_codegen.c index 3eec0d6f..e726d5a1 100644 --- a/libs/vkd3d-shader/hlsl_codegen.c +++ b/libs/vkd3d-shader/hlsl_codegen.c @@ -1924,6 +1924,39 @@ static void dump_function(struct rb_entry *entry, void *context) rb_for_each_entry(&func->overloads, dump_function_decl, ctx); }
+static void allocate_register_reservations(struct hlsl_ctx *ctx)
+{
+    struct hlsl_ir_var *var;
+
+    LIST_FOR_EACH_ENTRY(var, &ctx->extern_vars, struct hlsl_ir_var, extern_entry)
+    {
+        enum hlsl_regset regset = hlsl_type_get_regset(ctx, var->data_type);
+
+        if (regset > HLSL_REGSET_LAST_OBJECT)
+            continue;
+
+        if (var->reg_reservation.type)
+        {
+            if (var->reg_reservation.type != hlsl_regset_name(regset))
+            {
+                struct vkd3d_string_buffer *type_string;
+
+                type_string = hlsl_type_to_string(ctx, var->data_type);
+                hlsl_error(ctx, &var->loc, VKD3D_SHADER_ERROR_HLSL_INVALID_RESERVATION,
+                        "Object of type '%s' must be bound to register type '%c'.",
+                        type_string->buffer, hlsl_regset_name(regset));
+                hlsl_release_string_buffer(ctx, type_string);
+            }
+            else
+            {
+                var->regs[regset].allocated = true;
+                var->regs[regset].id = var->reg_reservation.index;
+                TRACE("Allocated reserved %s to %c%u.\n", var->name, var->reg_reservation.type, var->reg_reservation.index);
+            }
+        }
+    }
+}
+
 /* Compute the earliest and latest liveness for each variable. In the case that
  * a variable is accessed inside of a loop, we promote its liveness to extend
  * to at least the range of the entire loop. Note that we don't need to do this
@@ -2577,14 +2610,17 @@ static void allocate_buffers(struct hlsl_ctx *ctx)
     }
 }
-static const struct hlsl_ir_var *get_reserved_object(struct hlsl_ctx *ctx, enum hlsl_regset regset, +static const struct hlsl_ir_var *get_allocated_object(struct hlsl_ctx *ctx, enum hlsl_regset regset, uint32_t index) { const struct hlsl_ir_var *var;
LIST_FOR_EACH_ENTRY(var, &ctx->extern_vars, const struct hlsl_ir_var, extern_entry) { - if (var->reg_reservation.type == hlsl_regset_name(regset) && var->reg_reservation.index == index) + if (!var->regs[regset].allocated) + continue; + + if (index == var->regs[regset].id) return var; } return NULL; @@ -2611,37 +2647,39 @@ static void allocate_objects(struct hlsl_ctx *ctx, enum hlsl_regset regset)
LIST_FOR_EACH_ENTRY(var, &ctx->extern_vars, struct hlsl_ir_var, extern_entry) { - if (!var->last_read || var->data_type->reg_size[regset] == 0) + if (!var->last_read || !var->data_type->reg_size[regset]) continue;
- if (var->reg_reservation.type == regset_name) + if (var->regs[regset].allocated) { - const struct hlsl_ir_var *reserved_object = get_reserved_object(ctx, regset, - var->reg_reservation.index); + const struct hlsl_ir_var *reserved_object; + unsigned int index = var->regs[regset].id;
- if (var->reg_reservation.index < min_index) + reserved_object = get_allocated_object(ctx, regset, index); + + if (var->regs[regset].id < min_index) { assert(regset == HLSL_REGSET_U); hlsl_error(ctx, &var->loc, VKD3D_SHADER_ERROR_HLSL_OVERLAPPING_RESERVATIONS, "UAV index (%u) must be higher than the maximum render target index (%u).", - var->reg_reservation.index, min_index - 1); + var->regs[regset].id, min_index - 1); } else if (reserved_object && reserved_object != var) { hlsl_error(ctx, &var->loc, VKD3D_SHADER_ERROR_HLSL_OVERLAPPING_RESERVATIONS, - "Multiple objects bound to %c%u.", regset_name, var->reg_reservation.index); + "Multiple objects bound to %c%u.", regset_name, index); hlsl_note(ctx, &reserved_object->loc, VKD3D_SHADER_LOG_ERROR, - "Object '%s' is already bound to %c%u.", reserved_object->name, regset_name, - var->reg_reservation.index); + "Object '%s' is already bound to %c%u.", reserved_object->name, + regset_name, index); }
var->regs[regset].id = var->reg_reservation.index; var->regs[regset].allocated = true; TRACE("Allocated reserved %s to %c%u.\n", var->name, regset_name, var->regs[regset].id); } - else if (!var->reg_reservation.type) + else { - while (get_reserved_object(ctx, regset, index)) + while (get_allocated_object(ctx, regset, index)) ++index;
var->regs[regset].id = index; @@ -2649,16 +2687,6 @@ static void allocate_objects(struct hlsl_ctx *ctx, enum hlsl_regset regset) TRACE("Allocated object to %c%u.\n", regset_name, index); ++index; } - else - { - struct vkd3d_string_buffer *type_string; - - type_string = hlsl_type_to_string(ctx, var->data_type); - hlsl_error(ctx, &var->loc, VKD3D_SHADER_ERROR_HLSL_INVALID_RESERVATION, - "Object of type '%s' must be bound to register type '%c'.", - type_string->buffer, regset_name); - hlsl_release_string_buffer(ctx, type_string); - } } }
@@ -2958,6 +2986,7 @@ int hlsl_emit_bytecode(struct hlsl_ctx *ctx, struct hlsl_ir_function_decl *entry if (TRACE_ON()) rb_for_each_entry(&ctx->functions, dump_function, ctx);
+ allocate_register_reservations(ctx); allocate_temp_registers(ctx, entry_func); if (profile->major_version < 4) {
On Mon Jan 30 23:52:35 2023 +0000, Francisco Casas wrote:
We require allocating variables across multiple register sets because some variables (in particular structs) can contain fields of both object and numeric types, and we can't promote these fields into separate variables (well, at least not before allocation), since allocation has a set of wonky rules that consider the variable as a whole (which the next patches try to address). For instance, in the following shader: https://shader-playground.timjones.io/fc9cdeee0e000008281a7d3f1c7d94b9 even though only one texture and one sampler are used, their bindings are `t5` and `s2` respectively. I recall attempting to separate variables by regset a couple of times without success, so there are probably other reasons as well that I cannot clearly remember.
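For reference, the kind of variable in question looks roughly like this. This is a hypothetical struct loosely modeled on the linked shader-playground example; the field names and array sizes are made up for illustration:

```hlsl
// Hypothetical: a single uniform variable that needs registers in both the
// "t" (texture) and "s" (sampler) regsets, plus numeric registers.
struct apple
{
    Texture2D texs1[4];
    Texture2D texs2[2];
    sampler sam1[2];
    sampler sam2[1];
    float4 color;
};

uniform struct apple ap;

float4 main() : sv_target
{
    // Only ap.texs2[1] (texture slot 5) and ap.sam2[0] (sampler slot 2) are
    // used, but allocation treats `ap` as a whole, so the bindings still end
    // up at t5 and s2.
    return ap.texs2[1].Sample(ap.sam2[0], float2(0, 0)) + ap.color;
}
```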
Right, but above you also say that structs get split up into separate variables for sm5, and that putting object components into a struct is illegal for sm4. If we're going to split up the struct, that would seem to leave all uniform variables with a single register set, no?
So, I don't see much reason to have them represent another level of indirection using values such as `HLSL_REGSET_RESOURCES`, since they are used past this point of backend-specific-ness.
I don't see how this is another level of indirection. The idea is HLSL_REGSET_RESOURCE would map to VKD3DSPR_RESOURCE / VKD3D_SM4_RT_RESOURCE for sm4, and VKD3DSPR_SAMPLER for sm1.
On Tue Jan 31 18:33:08 2023 +0000, Zebediah Figura wrote:
Right, but above you also say that structs get split up into separate variables for sm5, and that putting object components into a struct is illegal for sm4. If we're going to split up the struct, that would seem to leave all uniform variables with a single register set, no?
Structs have to be split into separate components in sm5, but only **after** register allocation.
This is because register allocation still considers variables as a whole, it is only for the RDEF that components have to be written separately.
On Tue Jan 31 20:21:04 2023 +0000, Francisco Casas wrote:
Structs have to be split into separated components in sm5 but only **after** register allocation. This is because register allocation still considers variables as a whole, it is only for the RDEF that components have to be written separately.
What does it mean that "register allocation still considers variables as a whole"? Why can't we get the same effect from just allocating the separate variables (assuming we do it in a consistent order)?
On Tue Jan 31 18:37:19 2023 +0000, Zebediah Figura wrote:
So, I don't see much reason to have them represent another level of indirection using values such as `HLSL_REGSET_RESOURCES`, since they are used past this point of backend-specific-ness.

I don't see how this is another level of indirection. The idea is HLSL_REGSET_RESOURCE would map to VKD3DSPR_RESOURCE / VKD3D_SM4_RT_RESOURCE for sm4, and VKD3DSPR_SAMPLER for sm1.
What I am currently doing is to have each regset represent a single letter used for register reservations (and thus, in the assembly), which also matches the "space" that is used for calculating the register offsets of the object components within a struct.
So, SM1 would be mapping samplers (and the textures that result from combining sampler+texture, in following patches) to `HLSL_REGSET_S`, while for SM4 we would be mapping samplers to `HLSL_REGSET_S` and textures to `HLSL_REGSET_T`. This mapping is done now in `hlsl_type_get_regset()`.
```
SM1:
Texture -> HLSL_REGSET_S -> D3DSPR_SAMPLER
Sampler -> HLSL_REGSET_S -> D3DSPR_SAMPLER

SM4:
Texture -> HLSL_REGSET_T -> VKD3D_SM4_RT_RESOURCE
Sampler -> HLSL_REGSET_S -> VKD3D_SM4_RT_SAMPLER
UAV -> HLSL_REGSET_U -> VKD3D_SM5_RT_UAV
```
If I am understanding correctly, based on your last comment, you are suggesting doing:
```
SM1:
Texture -> HLSL_REGSET_RESOURCE -> D3DSPR_SAMPLER
Sampler -> HLSL_REGSET_SAMPLER -> D3DSPR_SAMPLER

SM4:
Texture -> HLSL_REGSET_RESOURCE -> VKD3D_SM4_RT_RESOURCE
Sampler -> HLSL_REGSET_SAMPLER -> VKD3D_SM4_RT_SAMPLER
UAV -> HLSL_REGSET_UAV -> VKD3D_SM5_RT_UAV
```
which means, always assigning each type to the same regset, regardless of the shader model, deferring the mapping to a unique register type until writing the binary. But if we do this, we would require an intermediate mapping from `HLSL_REGSET_*` to the letters used for register reservations and the different register spaces used for calculating offsets. Because, in SM1, sampler and texture indexes are not separated.
On Tue Jan 31 21:31:28 2023 +0000, Francisco Casas wrote:
What I am currently doing is to have each regset represent a single letter used for register reservations (and thus, in the assembly), which also matches the "space" that is used for calculating the register offsets of the object components within a struct. So, SM1 would be mapping samplers (and the textures that result from combining sampler+texture, in following patches) to `HLSL_REGSET_S`, while for SM4 we would be mapping samplers to `HLSL_REGSET_S` and textures to `HLSL_REGSET_T`. This mapping is done now in `hlsl_type_get_regset()`.
SM1:
Texture -> HLSL_REGSET_S -> D3DSPR_SAMPLER
Sampler -> HLSL_REGSET_S -> D3DSPR_SAMPLER

SM4:
Texture -> HLSL_REGSET_T -> VKD3D_SM4_RT_RESOURCE
Sampler -> HLSL_REGSET_S -> VKD3D_SM4_RT_SAMPLER
UAV -> HLSL_REGSET_U -> VKD3D_SM5_RT_UAV
If I am understanding correctly, based on your last comment, you are suggesting doing:
SM1:
Texture -> HLSL_REGSET_RESOURCE -> D3DSPR_SAMPLER
Sampler -> HLSL_REGSET_SAMPLER -> D3DSPR_SAMPLER

SM4:
Texture -> HLSL_REGSET_RESOURCE -> VKD3D_SM4_RT_RESOURCE
Sampler -> HLSL_REGSET_SAMPLER -> VKD3D_SM4_RT_SAMPLER
UAV -> HLSL_REGSET_UAV -> VKD3D_SM5_RT_UAV
which means, always assigning each type to the same regset, regardless of the shader model, deferring the mapping to a unique register type until writing the binary. But if we do this, we would require an intermediate mapping from `HLSL_REGSET_*` to the letters used for register reservations and the different register spaces used for calculating offsets. Because, in SM1, sampler and texture indexes are not separated.
We shouldn't even have sampler variables left over by the time we get to sm1.
But if we do this, we would require an intermediate mapping from HLSL_REGSET_* to the letters used for register reservations and the different register spaces used for calculating offsets.
But we need an intermediate mapping from HLSL_REGSET_* to backend-specific register type enums anyway. Why does it matter how we map hlsl types to regsets?
We shouldn't even have sampler variables left over by the time we get to sm1.
I am not sure if you meant to say "textures" instead of "samplers", but combined samplers appear as "Texture2D" in the CTAB, and `register(s#)` can be used on textures so that they reserve a particular sampler register:
https://shader-playground.timjones.io/d201fdfb98dd0cfa8149d3abb5f19320
Well, maybe I didn't consider the possibility of handling the combined sampler internally as a Sampler and adding a special flag to present it like a texture.
But we need an intermediate mapping from HLSL_REGSET_* to backend-specific register type enums anyway. Why does it matter how we map hlsl types to regsets?
I think that what I care about, rather than the name change, is that regsets can be used directly in the passes for allocating objects and for calculating deref offsets, both of which are backend-specific but have to be done before writing the actual bytecode.
On Tue Jan 31 21:28:48 2023 +0000, Zebediah Figura wrote:
What does it mean that "register allocation still considers variables as a whole"? Why can't we get the same effect from just allocating the separate variables (assuming we do it in a consistent order)?
In the same example above, the `uniform struct apple ap` is allocated from `t0` to `t5` and from `s0` to `s2`, even though only `ap.sam2[0]` and `ap.texs2[1]` are used. This means that all the registers for `ap.texs1[]` and `ap.sam1[]` were allocated even though they are not used.
The rules are that for each variable, all texture registers are allocated if only one is used, and sampler registers are allocated from the first one to the last one that is actually used. These rules work at the variable-level, without considering how the variable is subdivided into components.
We shouldn't even have sampler variables left over by the time we get to sm1.
I am not sure if you meant to say "textures" instead of "samplers", but combined samplers appear as "Texture2D" in the CTAB, and `register(s#)` can be used on textures so that they reserve a particular sampler register:
I mean we shouldn't have HLSL_TYPE_SAMPLER variables left over. That's what my (out-of-tree) patches accomplish by lowering "separate" HLSL_IR_RESOURCE_LOAD to "combined" HLSL_IR_RESOURCE_LOAD instructions. I.e. removing the "sampler" variable from the instruction.
Arguably it's a bit awkward naming-wise to map HLSL_TYPE_TEXTURE to what sm1 (or GLSL, for that matter) calls a "sampler", but in terms of functionality I think it makes the most sense (e.g. HLSL_TYPE_TEXTURE has a dimension, as do sm1 samplers; HLSL_TYPE_SAMPLER does not.)
But we need an intermediate mapping from HLSL_REGSET_* to backend-specific register type enums anyway. Why does it matter how we map hlsl types to regsets?
I think that what I care about, rather than the name change, is that regsets can be used directly in the passes for allocating objects and for calculating deref offsets, both of which are backend-specific but have to be done before writing the actual bytecode.
I don't quite see how the mapping affects that, though?
On Tue Jan 31 23:39:32 2023 +0000, Zebediah Figura wrote:
We shouldn't even have sampler variables left over by the time we get to sm1.

I am not sure if you meant to say "textures" instead of "samplers", but combined samplers appear as "Texture2D" in the CTAB, and `register(s#)` can be used on textures so that they reserve a particular sampler register:

I mean we shouldn't have HLSL_TYPE_SAMPLER variables left over. That's what my (out-of-tree) patches accomplish by lowering "separate" HLSL_IR_RESOURCE_LOAD to "combined" HLSL_IR_RESOURCE_LOAD instructions. I.e. removing the "sampler" variable from the instruction.

Arguably it's a bit awkward naming-wise to map HLSL_TYPE_TEXTURE to what sm1 (or GLSL, for that matter) calls a "sampler", but in terms of functionality I think it makes the most sense (e.g. HLSL_TYPE_TEXTURE has a dimension, as do sm1 samplers; HLSL_TYPE_SAMPLER does not.)

But we need an intermediate mapping from HLSL_REGSET_* to backend-specific register type enums anyway. Why does it matter how we map hlsl types to regsets?

I think that what I care about, rather than the name change, is that regsets can be used directly in the passes for allocating objects and for calculating deref offsets, both of which are backend-specific but have to be done before writing the actual bytecode.

I don't quite see how the mapping affects that, though?
I realize, by the way, that the current code doesn't really work like this (cf. the object_types global array), but I think the current code is broken for sm1 anyway.
On Tue Jan 31 23:03:22 2023 +0000, Francisco Casas wrote:
> In the same example above, the `uniform struct apple ap` is allocated from `t0` to `t5` and from `s0` to `s2`, even though only `ap.sam2[0]` and `ap.texs2[1]` are used. This means that all the registers for `ap.texs1[]` and `ap.sam1[]` were allocated even though they are not used. The rules are that, for each variable, all texture registers are allocated even if only one is used, and sampler registers are allocated from the first one to the last one that is actually used. These rules work at the variable level, without considering how the variable is subdivided into components.
Oh. Right. I think I knew that at some point but forgot. Sorry for the runaround.
I do think that hlsl_type_get_regset() feels conceptually broken, though, and HLSL_REGSET_VOID as well. The function signature implies to the reader that a type can only have one register set, which simply isn't true. If we were to assert that it was never called with a struct type, that would probably be fine. As far as I can tell that's true of all of its callers right now anyway, except for:
* write_sm4_rdef() and write_sm4_shdr(), which can just move it after the check for HLSL_CLASS_OBJECT;
* write_sm1_uniforms() should I think validate that structs never contain objects and then just assume that any struct is numeric;
* sm4_compare_externs() - why did this need to be changed? (And if this actually does change sorting order it deserves its own patch.)
> Oh. Right. I think I knew that at some point but forgot. Sorry for the runaround.
No problem, I am sorry for all the times it has happened to me too. There are... many things to consider at the same time.
> I do think that hlsl_type_get_regset() feels conceptually broken, though, and HLSL_REGSET_VOID as well.
Yep, the problem is that in following patches there are many more places where getting rid of `HLSL_REGSET_VOID` would mean having to add a check for whether a variable is a struct or not (and it can be quite easy to forget one). That's why, at first, I just mapped structs to `HLSL_REGSET_NUM`, and now `HLSL_REGSET_VOID` serves that purpose.
Also, since now `hlsl_type_calculate_reg_size()` calls `hlsl_type_get_regset()`, the latter also has to consider `HLSL_TYPE_PIXELSHADER`, `HLSL_TYPE_VERTEXSHADER`, `HLSL_TYPE_STRING`, and `HLSL_TYPE_VOID` as inputs.
I will keep thinking; maybe there is a better way to implement this, e.g. another function that tells whether a type belongs to a single regset or not.
On Tue Jan 31 16:41:54 2023 +0000, Francisco Casas wrote:
> There was some discussion in !27 about this. The initial name, "register space", was discarded. Zeb and I showed preference for "register set", Henri suggested "register file".
Ok, then it's fine.
> I do think that hlsl_type_get_regset() feels conceptually broken, though, and HLSL_REGSET_VOID as well.
>
> Yep, the problem is that in following patches there are many more places where getting rid of `HLSL_REGSET_VOID` would mean having to add a check for whether a variable is a struct or not (and it can be quite easy to forget one). That's why, at first, I just mapped structs to `HLSL_REGSET_NUM`, and now `HLSL_REGSET_VOID` serves that purpose.
The point is that any callers of hlsl_type_get_regset() either don't need to handle structs, or need to do something special to handle structs. In the case of sm1 "something special" might mean "assuming they're always numeric", but that's still something we want to spell out explicitly. HLSL_REGSET_VOID should never actually be interpreted, and as a corollary shouldn't exist.
> Also, since now `hlsl_type_calculate_reg_size()` calls `hlsl_type_get_regset()`, the latter also has to consider `HLSL_TYPE_PIXELSHADER`, `HLSL_TYPE_VERTEXSHADER`, `HLSL_TYPE_STRING`, and `HLSL_TYPE_VOID` as inputs.
hlsl_type_calculate_reg_size() really just shouldn't exist, at least not in its current form. We should never care about the register size of a variable that has any of those types. I think we eventually want to stop stashing reg_size in the type itself and just make hlsl_type_calculate_reg_size() a pure function. Probably it should just duplicate hlsl_type_get_regset() for now.