The general idea is:

* Run copy prop initially, and save the copy prop state.
* Unroll one iteration, substitute all jumps with stores to synthetic control vars, and move all instructions after the jumps into checks on those control vars.
* Run copy prop on the iteration, using the previous iteration's (or the root's) state as the parent state.
* Probe the copy prop state to check whether the "loop_broken" control variable was set; stop unrolling if it is true, otherwise continue.
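Roughly, in code (a condensed, hedged sketch of the driver; names follow the actual patch in 6/6, error handling and state cleanup omitted):

```c
/* Condensed sketch of the per-iteration driver described above; names follow
 * patch 6/6, error handling and state cleanup are omitted. */
static bool unroll_sketch(struct hlsl_ctx *ctx, struct hlsl_block *draft,
        struct hlsl_ir_loop *loop, struct copy_propagation_state *states,
        unsigned int max_iterations)
{
    struct hlsl_ir_var *loop_broken, *loop_continued;
    struct hlsl_ir_if *target_if;
    unsigned int i, index = 2;

    loop_broken = hlsl_new_synthetic_var(ctx, "loop_broken",
            hlsl_get_scalar_type(ctx, HLSL_TYPE_BOOL), &loop->node.loc);
    loop_continued = hlsl_new_synthetic_var(ctx, "loop_continued",
            hlsl_get_scalar_type(ctx, HLSL_TYPE_BOOL), &loop->node.loc);
    loop_unrolling_generate_const_bool_store(ctx, loop_broken, false, draft, &loop->node.loc);

    /* Run copy prop on everything preceding the loop, and save that state. */
    copy_propagation_state_init(ctx, &states[0], NULL);
    copy_propagation_transform_block(ctx, draft, &states[0]);

    for (i = 0; i < max_iterations; ++i)
    {
        struct hlsl_block body;

        /* Unroll one iteration into "if (!loop_broken) { ... }"; jumps inside
         * the body become stores to the synthetic control variables. */
        target_if = loop_unrolling_generate_var_check(ctx, draft, &loop->node.loc, loop_broken);
        hlsl_clone_block(ctx, &body, &loop->body);
        hlsl_block_add_block(&target_if->then_block, &body);
        loop_unrolling_remove_jumps(ctx, &target_if->then_block, loop_broken, loop_continued);

        /* Run copy prop on this iteration only, chained to the previous
         * iteration's (or the root's) state, and probe "loop_broken". */
        if (loop_unrolling_simplify_and_check_broken(ctx, &target_if->then_block,
                loop_broken, &states[i], &states[i + 1], &index))
            return true;
    }
    return false;
}
```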
Some stuff I'm less sure about and would love comments on:

* The way I deferred bounds checking.
* Splitting the iter block from the body. This was needed because otherwise copy prop marks variables that are changed in iter blocks (e.g. "i++") as invalidated and does not propagate them to subsequent iterations.
* Messing around with copy prop internals. I attempted two other, less invasive solutions: running copy prop on the entire block after each unroll, and running copy prop once, after unrolling to `max_iterations`. Both were too slow to be viable.
From: Victor Chiletto <vchiletto@codeweavers.com>
---
 tests/hlsl/texture-load.shader_test | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)
diff --git a/tests/hlsl/texture-load.shader_test b/tests/hlsl/texture-load.shader_test
index 495fa88ea..3858f7ca6 100644
--- a/tests/hlsl/texture-load.shader_test
+++ b/tests/hlsl/texture-load.shader_test
@@ -117,3 +117,30 @@ float4 main(float4 pos : sv_position) : sv_target
     Texture2DMS<float4> s = t;
     return s.Load(pos.yx, 0);
 }
+
+% SM4.0 cannot dynamically index multisampled textures, it relies on loop unrolling.
+
+[require]
+shader model >= 4.0
+shader model < 4.1
+
+[pixel shader todo]
+Texture2DMS<float4, 1> t;
+
+float4 main(float4 pos : sv_position) : sv_target
+{
+    int i;
+    float4 o;
+
+    for (i = 0; i < 1; i++)
+    {
+        o = t.Load(pos.xy, i);
+    }
+    return o;
+}
+
+[test]
+todo draw quad
+probe (0, 0) rgba (0.1, 0.2, 0.3, 0.4)
+probe (1, 0) rgba (0.5, 0.7, 0.6, 0.8)
+probe (0, 1) rgba (0.6, 0.5, 0.2, 0.1)
+probe (1, 1) rgba (0.8, 0.0, 0.7, 1.0)
From: Victor Chiletto <vchiletto@codeweavers.com>
We potentially generate OOB accesses during loop unrolling that are later deleted.
---
 libs/vkd3d-shader/hlsl_codegen.c | 173 ++++++++++++++++++++++---------
 1 file changed, 124 insertions(+), 49 deletions(-)
diff --git a/libs/vkd3d-shader/hlsl_codegen.c b/libs/vkd3d-shader/hlsl_codegen.c
index 5c09ce04f..882401665 100644
--- a/libs/vkd3d-shader/hlsl_codegen.c
+++ b/libs/vkd3d-shader/hlsl_codegen.c
@@ -1954,6 +1954,76 @@ bool hlsl_copy_propagation_execute(struct hlsl_ctx *ctx, struct hlsl_block *bloc
     return progress;
 }
+enum validation_result
+{
+    DEREF_VALIDATION_OK,
+    DEREF_VALIDATION_OUT_OF_BOUNDS,
+    DEREF_VALIDATION_NOT_CONSTANT,
+};
+
+static enum validation_result hlsl_validate_component_index_range_from_deref(struct hlsl_ctx *ctx,
+        const struct hlsl_deref *deref)
+{
+    struct hlsl_type *type = deref->var->data_type;
+    unsigned int i;
+
+    for (i = 0; i < deref->path_len; ++i)
+    {
+        struct hlsl_ir_node *path_node = deref->path[i].node;
+        unsigned int idx = 0;
+
+        assert(path_node);
+        if (path_node->type != HLSL_IR_CONSTANT)
+            return DEREF_VALIDATION_NOT_CONSTANT;
+
+        /* We should always have generated a cast to UINT. */
+        assert(path_node->data_type->class == HLSL_CLASS_SCALAR
+                && path_node->data_type->base_type == HLSL_TYPE_UINT);
+
+        idx = hlsl_ir_constant(path_node)->value.u[0].u;
+
+        switch (type->class)
+        {
+            case HLSL_CLASS_VECTOR:
+                if (idx >= type->dimx)
+                {
+                    hlsl_error(ctx, &path_node->loc, VKD3D_SHADER_ERROR_HLSL_OFFSET_OUT_OF_BOUNDS,
+                            "Vector index is out of bounds. %u/%u", idx, type->dimx);
+                    return DEREF_VALIDATION_OUT_OF_BOUNDS;
+                }
+                break;
+
+            case HLSL_CLASS_MATRIX:
+                if (idx >= hlsl_type_major_size(type))
+                {
+                    hlsl_error(ctx, &path_node->loc, VKD3D_SHADER_ERROR_HLSL_OFFSET_OUT_OF_BOUNDS,
+                            "Matrix index is out of bounds. %u/%u", idx, hlsl_type_major_size(type));
+                    return DEREF_VALIDATION_OUT_OF_BOUNDS;
+                }
+                break;
+
+            case HLSL_CLASS_ARRAY:
+                if (idx >= type->e.array.elements_count)
+                {
+                    hlsl_error(ctx, &path_node->loc, VKD3D_SHADER_ERROR_HLSL_OFFSET_OUT_OF_BOUNDS,
+                            "Array index is out of bounds. %u/%u", idx, type->e.array.elements_count);
+                    return DEREF_VALIDATION_OUT_OF_BOUNDS;
+                }
+                break;
+
+            case HLSL_CLASS_STRUCT:
+                break;
+
+            default:
+                vkd3d_unreachable();
+        }
+
+        type = hlsl_get_element_type_from_path_index(ctx, type, path_node);
+    }
+
+    return DEREF_VALIDATION_OK;
+}
+
 static void note_non_static_deref_expressions(struct hlsl_ctx *ctx, const struct hlsl_deref *deref,
         const char *usage)
 {
@@ -1971,60 +2041,77 @@ static void note_non_static_deref_expressions(struct hlsl_ctx *ctx, const struct
     }
 }
-static bool validate_static_object_references(struct hlsl_ctx *ctx, struct hlsl_ir_node *instr,
+static bool validate_dereferences(struct hlsl_ctx *ctx, struct hlsl_ir_node *instr,
         void *context)
 {
-    unsigned int start, count;
-
-    if (instr->type == HLSL_IR_RESOURCE_LOAD)
+    switch (instr->type)
     {
-        struct hlsl_ir_resource_load *load = hlsl_ir_resource_load(instr);
-
-        if (!load->resource.var->is_uniform)
+        case HLSL_IR_RESOURCE_LOAD:
         {
-            hlsl_error(ctx, &instr->loc, VKD3D_SHADER_ERROR_HLSL_NON_STATIC_OBJECT_REF,
-                    "Loaded resource must have a single uniform source.");
+            struct hlsl_ir_resource_load *load = hlsl_ir_resource_load(instr);
+
+            if (!load->resource.var->is_uniform)
+            {
+                hlsl_error(ctx, &instr->loc, VKD3D_SHADER_ERROR_HLSL_NON_STATIC_OBJECT_REF,
+                        "Loaded resource must have a single uniform source.");
+            }
+            else if (hlsl_validate_component_index_range_from_deref(ctx, &load->resource) == DEREF_VALIDATION_NOT_CONSTANT)
+            {
+                hlsl_error(ctx, &instr->loc, VKD3D_SHADER_ERROR_HLSL_NON_STATIC_OBJECT_REF,
+                        "Loaded resource from \"%s\" must be determinable at compile time.",
+                        load->resource.var->name);
+                note_non_static_deref_expressions(ctx, &load->resource, "loaded resource");
+            }
+
+            if (load->sampler.var)
+            {
+                if (!load->sampler.var->is_uniform)
+                {
+                    hlsl_error(ctx, &instr->loc, VKD3D_SHADER_ERROR_HLSL_NON_STATIC_OBJECT_REF,
+                            "Resource load sampler must have a single uniform source.");
+                }
+                else if (hlsl_validate_component_index_range_from_deref(ctx, &load->sampler) == DEREF_VALIDATION_NOT_CONSTANT)
+                {
+                    hlsl_error(ctx, &instr->loc, VKD3D_SHADER_ERROR_HLSL_NON_STATIC_OBJECT_REF,
+                            "Resource load sampler from \"%s\" must be determinable at compile time.",
+                            load->sampler.var->name);
+                    note_non_static_deref_expressions(ctx, &load->sampler, "resource load sampler");
+                }
+            }
+            break;
         }
-        else if (!hlsl_component_index_range_from_deref(ctx, &load->resource, &start, &count))
+        case HLSL_IR_RESOURCE_STORE:
         {
-            hlsl_error(ctx, &instr->loc, VKD3D_SHADER_ERROR_HLSL_NON_STATIC_OBJECT_REF,
-                    "Loaded resource from \"%s\" must be determinable at compile time.",
-                    load->resource.var->name);
-            note_non_static_deref_expressions(ctx, &load->resource, "loaded resource");
-        }
+            struct hlsl_ir_resource_store *store = hlsl_ir_resource_store(instr);

-        if (load->sampler.var)
-        {
-            if (!load->sampler.var->is_uniform)
+            if (!store->resource.var->is_uniform)
             {
                 hlsl_error(ctx, &instr->loc, VKD3D_SHADER_ERROR_HLSL_NON_STATIC_OBJECT_REF,
-                        "Resource load sampler must have a single uniform source.");
+                        "Accessed resource must have a single uniform source.");
             }
-            else if (!hlsl_component_index_range_from_deref(ctx, &load->sampler, &start, &count))
+            else if (hlsl_validate_component_index_range_from_deref(ctx, &store->resource) == DEREF_VALIDATION_NOT_CONSTANT)
             {
                 hlsl_error(ctx, &instr->loc, VKD3D_SHADER_ERROR_HLSL_NON_STATIC_OBJECT_REF,
-                        "Resource load sampler from \"%s\" must be determinable at compile time.",
-                        load->sampler.var->name);
-                note_non_static_deref_expressions(ctx, &load->sampler, "resource load sampler");
+                        "Accessed resource from \"%s\" must be determinable at compile time.",
+                        store->resource.var->name);
+                note_non_static_deref_expressions(ctx, &store->resource, "accessed resource");
             }
+            break;
         }
-    }
-    else if (instr->type == HLSL_IR_RESOURCE_STORE)
-    {
-        struct hlsl_ir_resource_store *store = hlsl_ir_resource_store(instr);
-
-        if (!store->resource.var->is_uniform)
+        case HLSL_IR_LOAD:
         {
-            hlsl_error(ctx, &instr->loc, VKD3D_SHADER_ERROR_HLSL_NON_STATIC_OBJECT_REF,
-                    "Accessed resource must have a single uniform source.");
+            struct hlsl_ir_load *load = hlsl_ir_load(instr);
+
+            hlsl_validate_component_index_range_from_deref(ctx, &load->src);
+            break;
         }
-        else if (!hlsl_component_index_range_from_deref(ctx, &store->resource, &start, &count))
+        case HLSL_IR_STORE:
         {
-            hlsl_error(ctx, &instr->loc, VKD3D_SHADER_ERROR_HLSL_NON_STATIC_OBJECT_REF,
-                    "Accessed resource from \"%s\" must be determinable at compile time.",
-                    store->resource.var->name);
-            note_non_static_deref_expressions(ctx, &store->resource, "accessed resource");
+            struct hlsl_ir_store *store = hlsl_ir_store(instr);
+
+            hlsl_validate_component_index_range_from_deref(ctx, &store->lhs);
+            break;
         }
+        default:
+            break;
     }
     return false;
@@ -5034,21 +5121,13 @@ bool hlsl_component_index_range_from_deref(struct hlsl_ctx *ctx, const struct hl
     {
         case HLSL_CLASS_VECTOR:
             if (idx >= type->dimx)
-            {
-                hlsl_error(ctx, &path_node->loc, VKD3D_SHADER_ERROR_HLSL_OFFSET_OUT_OF_BOUNDS,
-                        "Vector index is out of bounds. %u/%u", idx, type->dimx);
                 return false;
-            }
             *start += idx;
             break;
         case HLSL_CLASS_MATRIX:
             if (idx >= hlsl_type_major_size(type))
-            {
-                hlsl_error(ctx, &path_node->loc, VKD3D_SHADER_ERROR_HLSL_OFFSET_OUT_OF_BOUNDS,
-                        "Matrix index is out of bounds. %u/%u", idx, hlsl_type_major_size(type));
                 return false;
-            }
             if (hlsl_type_is_row_major(type))
                 *start += idx * type->dimx;
             else
@@ -5057,11 +5136,7 @@ bool hlsl_component_index_range_from_deref(struct hlsl_ctx *ctx, const struct hl
         case HLSL_CLASS_ARRAY:
             if (idx >= type->e.array.elements_count)
-            {
-                hlsl_error(ctx, &path_node->loc, VKD3D_SHADER_ERROR_HLSL_OFFSET_OUT_OF_BOUNDS,
-                        "Array index is out of bounds. %u/%u", idx, type->e.array.elements_count);
                 return false;
-            }
             *start += idx * hlsl_type_component_count(type->e.array.type);
             break;
@@ -5450,7 +5525,7 @@ int hlsl_emit_bytecode(struct hlsl_ctx *ctx, struct hlsl_ir_function_decl *entry
     lower_ir(ctx, lower_casts_to_bool, body);
     lower_ir(ctx, lower_int_dot, body);
-    hlsl_transform_ir(ctx, validate_static_object_references, body, NULL);
+    hlsl_transform_ir(ctx, validate_dereferences, body, NULL);
     hlsl_transform_ir(ctx, track_object_components_sampler_dim, body, NULL);
     if (profile->major_version >= 4)
         hlsl_transform_ir(ctx, lower_combined_samples, body, NULL);
From: Victor Chiletto <vchiletto@codeweavers.com>
---
 libs/vkd3d-shader/hlsl.c         | 19 ++++++++++++++++---
 libs/vkd3d-shader/hlsl.h         |  3 ++-
 libs/vkd3d-shader/hlsl.y         |  5 +----
 libs/vkd3d-shader/hlsl_codegen.c | 12 ++++++++++++
 libs/vkd3d-shader/tpf.c          |  1 +
 5 files changed, 32 insertions(+), 8 deletions(-)
diff --git a/libs/vkd3d-shader/hlsl.c b/libs/vkd3d-shader/hlsl.c
index cba954c98..b7bb856aa 100644
--- a/libs/vkd3d-shader/hlsl.c
+++ b/libs/vkd3d-shader/hlsl.c
@@ -1618,7 +1618,7 @@ struct hlsl_ir_node *hlsl_new_jump(struct hlsl_ctx *ctx, enum hlsl_ir_jump_type
 }
 struct hlsl_ir_node *hlsl_new_loop(struct hlsl_ctx *ctx,
-        struct hlsl_block *block, const struct vkd3d_shader_location *loc)
+        struct hlsl_block *block, struct hlsl_block *iter, const struct vkd3d_shader_location *loc)
 {
     struct hlsl_ir_loop *loop;
@@ -1627,6 +1627,11 @@ struct hlsl_ir_node *hlsl_new_loop(struct hlsl_ctx *ctx,
     init_node(&loop->node, HLSL_IR_LOOP, NULL, loc);
     hlsl_block_init(&loop->body);
     hlsl_block_add_block(&loop->body, block);
+
+    hlsl_block_init(&loop->iter);
+    if (iter)
+        hlsl_block_add_block(&loop->iter, iter);
+
     return &loop->node;
 }
@@ -1783,12 +1788,18 @@ static struct hlsl_ir_node *clone_load(struct hlsl_ctx *ctx, struct clone_instr_
 static struct hlsl_ir_node *clone_loop(struct hlsl_ctx *ctx, struct clone_instr_map *map, struct hlsl_ir_loop *src)
 {
     struct hlsl_ir_node *dst;
-    struct hlsl_block body;
+    struct hlsl_block body, iter;
     if (!clone_block(ctx, &body, &src->body, map))
         return NULL;
-    if (!(dst = hlsl_new_loop(ctx, &body, &src->node.loc)))
+    if (!clone_block(ctx, &iter, &src->iter, map))
+    {
+        hlsl_block_cleanup(&body);
+        return NULL;
+    }
+
+    if (!(dst = hlsl_new_loop(ctx, &body, &iter, &src->node.loc)))
     {
         hlsl_block_cleanup(&body);
         return NULL;
@@ -2694,6 +2705,7 @@ static void dump_ir_loop(struct hlsl_ctx *ctx, struct vkd3d_string_buffer *buffe
 {
     vkd3d_string_buffer_printf(buffer, "for (;;) {\n");
     dump_block(ctx, buffer, &loop->body);
+    dump_block(ctx, buffer, &loop->iter);
     vkd3d_string_buffer_printf(buffer, " %10s }", "");
 }
@@ -3012,6 +3024,7 @@ static void free_ir_load(struct hlsl_ir_load *load)
 static void free_ir_loop(struct hlsl_ir_loop *loop)
 {
+    hlsl_block_cleanup(&loop->iter);
     hlsl_block_cleanup(&loop->body);
     vkd3d_free(loop);
 }
diff --git a/libs/vkd3d-shader/hlsl.h b/libs/vkd3d-shader/hlsl.h
index 64111f3fc..9a04c8ee7 100644
--- a/libs/vkd3d-shader/hlsl.h
+++ b/libs/vkd3d-shader/hlsl.h
@@ -522,6 +522,7 @@ struct hlsl_ir_loop
     struct hlsl_ir_node node;
     /* loop condition is stored in the body (as "if (!condition) break;") */
     struct hlsl_block body;
+    struct hlsl_block iter;
     unsigned int next_index; /* liveness index of the end of the loop */
 };
@@ -1278,7 +1279,7 @@ bool hlsl_index_chain_has_resource_access(struct hlsl_ir_index *index);
 struct hlsl_ir_node *hlsl_new_index(struct hlsl_ctx *ctx, struct hlsl_ir_node *val,
         struct hlsl_ir_node *idx, const struct vkd3d_shader_location *loc);
 struct hlsl_ir_node *hlsl_new_loop(struct hlsl_ctx *ctx,
-        struct hlsl_block *block, const struct vkd3d_shader_location *loc);
+        struct hlsl_block *block, struct hlsl_block *iter, const struct vkd3d_shader_location *loc);
 struct hlsl_ir_node *hlsl_new_resource_load(struct hlsl_ctx *ctx,
         const struct hlsl_resource_load_params *params, const struct vkd3d_shader_location *loc);
 struct hlsl_ir_node *hlsl_new_resource_store(struct hlsl_ctx *ctx, const struct hlsl_deref *resource,
diff --git a/libs/vkd3d-shader/hlsl.y b/libs/vkd3d-shader/hlsl.y
index 52c217654..2372af026 100644
--- a/libs/vkd3d-shader/hlsl.y
+++ b/libs/vkd3d-shader/hlsl.y
@@ -610,15 +610,12 @@ static struct hlsl_block *create_loop(struct hlsl_ctx *ctx, enum loop_type type,
     if (!append_conditional_break(ctx, cond))
         goto oom;
-    if (iter)
-        hlsl_block_add_block(body, iter);
-
     if (type == LOOP_DO_WHILE)
         list_move_tail(&body->instrs, &cond->instrs);
     else
         list_move_head(&body->instrs, &cond->instrs);
-    if (!(loop = hlsl_new_loop(ctx, body, loc)))
+    if (!(loop = hlsl_new_loop(ctx, body, iter, loc)))
         goto oom;
     hlsl_block_add_instr(init, loop);
diff --git a/libs/vkd3d-shader/hlsl_codegen.c b/libs/vkd3d-shader/hlsl_codegen.c
index 882401665..87c0081ec 100644
--- a/libs/vkd3d-shader/hlsl_codegen.c
+++ b/libs/vkd3d-shader/hlsl_codegen.c
@@ -616,6 +616,7 @@ bool hlsl_transform_ir(struct hlsl_ctx *ctx, bool (*func)(struct hlsl_ctx *ctx,
     else if (instr->type == HLSL_IR_LOOP)
     {
         progress |= hlsl_transform_ir(ctx, func, &hlsl_ir_loop(instr)->body, context);
+        progress |= hlsl_transform_ir(ctx, func, &hlsl_ir_loop(instr)->iter, context);
     }
     else if (instr->type == HLSL_IR_SWITCH)
     {
@@ -843,6 +844,7 @@ static bool lower_return(struct hlsl_ctx *ctx, struct hlsl_ir_function_decl *fun
     else if (instr->type == HLSL_IR_LOOP)
     {
         has_early_return |= lower_return(ctx, func, &hlsl_ir_loop(instr)->body, true);
+        has_early_return |= lower_return(ctx, func, &hlsl_ir_loop(instr)->iter, true);
         if (has_early_return)
         {
@@ -1228,6 +1230,7 @@ static unsigned int index_instructions(struct hlsl_block *block, unsigned int in
     else if (instr->type == HLSL_IR_LOOP)
     {
         index = index_instructions(&hlsl_ir_loop(instr)->body, index);
+        index = index_instructions(&hlsl_ir_loop(instr)->iter, index);
         hlsl_ir_loop(instr)->next_index = index;
     }
     else if (instr->type == HLSL_IR_SWITCH)
@@ -1801,6 +1804,7 @@ static void copy_propagation_invalidate_from_block(struct hlsl_ctx *ctx, struct
     struct hlsl_ir_loop *loop = hlsl_ir_loop(instr);
     copy_propagation_invalidate_from_block(ctx, state, &loop->body, time);
+    copy_propagation_invalidate_from_block(ctx, state, &loop->iter, time);
     break;
 }
@@ -1858,9 +1862,11 @@ static bool copy_propagation_process_loop(struct hlsl_ctx *ctx, struct hlsl_ir_l
     bool progress = false;
     copy_propagation_invalidate_from_block(ctx, state, &loop->body, loop->node.index);
+    copy_propagation_invalidate_from_block(ctx, state, &loop->iter, loop->node.index);
     copy_propagation_state_init(ctx, &inner_state, state);
     progress |= copy_propagation_transform_block(ctx, &loop->body, &inner_state);
+    progress |= copy_propagation_transform_block(ctx, &loop->iter, &inner_state);
     copy_propagation_state_destroy(&inner_state);
     return progress;
@@ -3998,6 +4004,9 @@ static void compute_liveness_recurse(struct hlsl_block *block, unsigned int loop
             compute_liveness_recurse(&loop->body, loop_first ? loop_first : instr->index,
                     loop_last ? loop_last : loop->next_index);
+
+            compute_liveness_recurse(&loop->iter, loop_first ? loop_first : instr->index,
+                    loop_last ? loop_last : loop->next_index);
             break;
         }
         case HLSL_IR_RESOURCE_LOAD:
@@ -4439,6 +4448,7 @@ static void allocate_temp_registers_recurse(struct hlsl_ctx *ctx,
         {
             struct hlsl_ir_loop *loop = hlsl_ir_loop(instr);
             allocate_temp_registers_recurse(ctx, &loop->body, allocator);
+            allocate_temp_registers_recurse(ctx, &loop->iter, allocator);
             break;
         }
@@ -4567,6 +4577,7 @@ static void allocate_const_registers_recurse(struct hlsl_ctx *ctx,
         {
             struct hlsl_ir_loop *loop = hlsl_ir_loop(instr);
             allocate_const_registers_recurse(ctx, &loop->body, allocator);
+            allocate_const_registers_recurse(ctx, &loop->iter, allocator);
             break;
         }
@@ -5351,6 +5362,7 @@ static void remove_unreachable_code(struct hlsl_ctx *ctx, struct hlsl_block *bod
     struct hlsl_ir_loop *loop = hlsl_ir_loop(instr);
     remove_unreachable_code(ctx, &loop->body);
+    remove_unreachable_code(ctx, &loop->iter);
 }
 else if (instr->type == HLSL_IR_SWITCH)
 {
diff --git a/libs/vkd3d-shader/tpf.c b/libs/vkd3d-shader/tpf.c
index 4d0658313..4dea50237 100644
--- a/libs/vkd3d-shader/tpf.c
+++ b/libs/vkd3d-shader/tpf.c
@@ -5478,6 +5478,7 @@ static void write_sm4_loop(const struct tpf_writer *tpf, const struct hlsl_ir_lo
     write_sm4_instruction(tpf, &instr);
     write_sm4_block(tpf, &loop->body);
+    write_sm4_block(tpf, &loop->iter);
     instr.opcode = VKD3D_SM4_OP_ENDLOOP;
     write_sm4_instruction(tpf, &instr);
From: Victor Chiletto <vchiletto@codeweavers.com>
---
 libs/vkd3d-shader/hlsl.y         |  6 ----
 libs/vkd3d-shader/hlsl_codegen.c | 50 ++++++++++++++++++++++++++++++++
 2 files changed, 50 insertions(+), 6 deletions(-)
diff --git a/libs/vkd3d-shader/hlsl.y b/libs/vkd3d-shader/hlsl.y
index 2372af026..9c2339124 100644
--- a/libs/vkd3d-shader/hlsl.y
+++ b/libs/vkd3d-shader/hlsl.y
@@ -527,12 +527,6 @@ static void resolve_loop_continue(struct hlsl_ctx *ctx, struct hlsl_block *block
             }
             list_move_before(&instr->entry, &cond_block.instrs);
         }
-        else if (type == LOOP_FOR)
-        {
-            if (!hlsl_clone_block(ctx, &cond_block, iter))
-                return;
-            list_move_before(&instr->entry, &cond_block.instrs);
-        }
         jump->type = HLSL_IR_JUMP_CONTINUE;
     }
 }
diff --git a/libs/vkd3d-shader/hlsl_codegen.c b/libs/vkd3d-shader/hlsl_codegen.c
index 87c0081ec..8a94b4b8a 100644
--- a/libs/vkd3d-shader/hlsl_codegen.c
+++ b/libs/vkd3d-shader/hlsl_codegen.c
@@ -2394,6 +2394,55 @@ static bool remove_trivial_conditional_branches(struct hlsl_ctx *ctx, struct hls
     return true;
 }
+
+static void transform_insert_continue_iter_blocks(struct hlsl_ctx *ctx, struct hlsl_block *block, struct hlsl_block *iter)
+{
+    struct hlsl_ir_node *node;
+
+    LIST_FOR_EACH_ENTRY(node, &block->instrs, struct hlsl_ir_node, entry)
+    {
+        switch (node->type)
+        {
+            case HLSL_IR_LOOP:
+            {
+                struct hlsl_ir_loop *loop = hlsl_ir_loop(node);
+
+                transform_insert_continue_iter_blocks(ctx, &loop->body, &loop->iter);
+                break;
+            }
+            case HLSL_IR_IF:
+            {
+                struct hlsl_ir_if *iff = hlsl_ir_if(node);
+
+                transform_insert_continue_iter_blocks(ctx, &iff->then_block, iter);
+                transform_insert_continue_iter_blocks(ctx, &iff->else_block, iter);
+                break;
+            }
+            case HLSL_IR_SWITCH:
+            {
+                struct hlsl_ir_switch *s = hlsl_ir_switch(node);
+                struct hlsl_ir_switch_case *c;
+
+                LIST_FOR_EACH_ENTRY(c, &s->cases, struct hlsl_ir_switch_case, entry)
+                    transform_insert_continue_iter_blocks(ctx, &c->body, iter);
+
+                break;
+            }
+            case HLSL_IR_JUMP:
+            {
+                struct hlsl_ir_jump *jump = hlsl_ir_jump(node);
+
+                if (jump->type == HLSL_IR_JUMP_CONTINUE && !list_empty(&iter->instrs))
+                {
+                    struct hlsl_block clone;
+
+                    hlsl_clone_block(ctx, &clone, iter);
+                    list_move_before(&node->entry, &clone.instrs);
+                }
+                break;
+            }
+            default:
+                break;
+        }
+    }
+}
+
 static bool normalize_switch_cases(struct hlsl_ctx *ctx, struct hlsl_ir_node *instr, void *context)
 {
     struct hlsl_ir_switch_case *c, *def = NULL;
@@ -5523,6 +5572,7 @@ int hlsl_emit_bytecode(struct hlsl_ctx *ctx, struct hlsl_ir_function_decl *entry
         progress |= hlsl_transform_ir(ctx, remove_trivial_conditional_branches, body, NULL);
     } while (progress);

+    transform_insert_continue_iter_blocks(ctx, body, NULL);
     remove_unreachable_code(ctx, body);
     hlsl_transform_ir(ctx, normalize_switch_cases, body, NULL);
From: Victor Chiletto <vchiletto@codeweavers.com>
Based on a patch by Nikolay Sivov.
Co-authored-by: Nikolay Sivov <nsivov@codeweavers.com>
---
 libs/vkd3d-shader/hlsl.c |   7 +-
 libs/vkd3d-shader/hlsl.h |   4 +-
 libs/vkd3d-shader/hlsl.y | 153 ++++++++++++++++++++------------------
 3 files changed, 86 insertions(+), 78 deletions(-)
diff --git a/libs/vkd3d-shader/hlsl.c b/libs/vkd3d-shader/hlsl.c
index b7bb856aa..00dc562c1 100644
--- a/libs/vkd3d-shader/hlsl.c
+++ b/libs/vkd3d-shader/hlsl.c
@@ -1618,7 +1618,8 @@ struct hlsl_ir_node *hlsl_new_jump(struct hlsl_ctx *ctx, enum hlsl_ir_jump_type
 }
 struct hlsl_ir_node *hlsl_new_loop(struct hlsl_ctx *ctx,
-        struct hlsl_block *block, struct hlsl_block *iter, const struct vkd3d_shader_location *loc)
+        struct hlsl_block *block, struct hlsl_block *iter,
+        bool unroll, unsigned int unroll_limit, const struct vkd3d_shader_location *loc)
 {
     struct hlsl_ir_loop *loop;
@@ -1632,6 +1633,8 @@ struct hlsl_ir_node *hlsl_new_loop(struct hlsl_ctx *ctx,
     if (iter)
         hlsl_block_add_block(&loop->iter, iter);
+    loop->unroll = unroll;
+    loop->unroll_limit = unroll_limit;
     return &loop->node;
 }
@@ -1799,7 +1802,7 @@ static struct hlsl_ir_node *clone_loop(struct hlsl_ctx *ctx, struct clone_instr_
         return NULL;
     }
-    if (!(dst = hlsl_new_loop(ctx, &body, &iter, &src->node.loc)))
+    if (!(dst = hlsl_new_loop(ctx, &body, &iter, src->unroll, src->unroll_limit, &src->node.loc)))
     {
         hlsl_block_cleanup(&body);
         return NULL;
diff --git a/libs/vkd3d-shader/hlsl.h b/libs/vkd3d-shader/hlsl.h
index 9a04c8ee7..1266c3170 100644
--- a/libs/vkd3d-shader/hlsl.h
+++ b/libs/vkd3d-shader/hlsl.h
@@ -524,6 +524,8 @@ struct hlsl_ir_loop
     struct hlsl_block body;
     struct hlsl_block iter;
     unsigned int next_index; /* liveness index of the end of the loop */
+    unsigned int unroll_limit;
+    bool unroll;
 };
 struct hlsl_ir_switch_case
@@ -1279,7 +1281,7 @@ bool hlsl_index_chain_has_resource_access(struct hlsl_ir_index *index);
 struct hlsl_ir_node *hlsl_new_index(struct hlsl_ctx *ctx, struct hlsl_ir_node *val,
         struct hlsl_ir_node *idx, const struct vkd3d_shader_location *loc);
 struct hlsl_ir_node *hlsl_new_loop(struct hlsl_ctx *ctx,
-        struct hlsl_block *block, struct hlsl_block *iter, const struct vkd3d_shader_location *loc);
+        struct hlsl_block *block, struct hlsl_block *iter, bool unroll, unsigned int unroll_limit, const struct vkd3d_shader_location *loc);
 struct hlsl_ir_node *hlsl_new_resource_load(struct hlsl_ctx *ctx,
         const struct hlsl_resource_load_params *params, const struct vkd3d_shader_location *loc);
 struct hlsl_ir_node *hlsl_new_resource_store(struct hlsl_ctx *ctx, const struct hlsl_deref *resource,
diff --git a/libs/vkd3d-shader/hlsl.y b/libs/vkd3d-shader/hlsl.y
index 9c2339124..2e5bd9e02 100644
--- a/libs/vkd3d-shader/hlsl.y
+++ b/libs/vkd3d-shader/hlsl.y
@@ -554,12 +554,79 @@ static void check_loop_attributes(struct hlsl_ctx *ctx, const struct parse_attri
         hlsl_error(ctx, loc, VKD3D_SHADER_ERROR_HLSL_INVALID_SYNTAX,
                 "Unroll attribute can't be used with 'fastopt' attribute.");
 }
+static unsigned int evaluate_static_expression_as_uint(struct hlsl_ctx *ctx, struct hlsl_block *block,
+        const struct vkd3d_shader_location *loc)
+{
+    struct hlsl_ir_constant *constant;
+    struct hlsl_ir_node *node;
+    struct hlsl_block expr;
+    unsigned int ret = 0;
+    bool progress;
+
+    LIST_FOR_EACH_ENTRY(node, &block->instrs, struct hlsl_ir_node, entry)
+    {
+        switch (node->type)
+        {
+            case HLSL_IR_CONSTANT:
+            case HLSL_IR_EXPR:
+            case HLSL_IR_SWIZZLE:
+            case HLSL_IR_LOAD:
+            case HLSL_IR_INDEX:
+                continue;
+            case HLSL_IR_CALL:
+            case HLSL_IR_IF:
+            case HLSL_IR_LOOP:
+            case HLSL_IR_JUMP:
+            case HLSL_IR_RESOURCE_LOAD:
+            case HLSL_IR_RESOURCE_STORE:
+            case HLSL_IR_STORE:
+            case HLSL_IR_SWITCH:
+                hlsl_error(ctx, &node->loc, VKD3D_SHADER_ERROR_HLSL_INVALID_SYNTAX,
+                        "Expected literal expression.");
+        }
+    }
+
+    if (!hlsl_clone_block(ctx, &expr, &ctx->static_initializers))
+        return 0;
+    hlsl_block_add_block(&expr, block);
+
+    if (!add_implicit_conversion(ctx, &expr, node_from_block(&expr),
+            hlsl_get_scalar_type(ctx, HLSL_TYPE_UINT), loc))
+    {
+        hlsl_block_cleanup(&expr);
+        return 0;
+    }
+
+    do
+    {
+        progress = hlsl_transform_ir(ctx, hlsl_fold_constant_exprs, &expr, NULL);
+        progress |= hlsl_copy_propagation_execute(ctx, &expr);
+    } while (progress);
+
+    node = node_from_block(&expr);
+    if (node->type == HLSL_IR_CONSTANT)
+    {
+        constant = hlsl_ir_constant(node);
+        ret = constant->value.u[0].u;
+    }
+    else
+    {
+        hlsl_error(ctx, &node->loc, VKD3D_SHADER_ERROR_HLSL_INVALID_SYNTAX,
+                "Failed to evaluate constant expression.");
+    }
+
+    hlsl_block_cleanup(&expr);
+
+    return ret;
+}
+
 static struct hlsl_block *create_loop(struct hlsl_ctx *ctx, enum loop_type type,
         const struct parse_attribute_list *attributes, struct hlsl_block *init, struct hlsl_block *cond,
         struct hlsl_block *iter, struct hlsl_block *body, const struct vkd3d_shader_location *loc)
 {
+    unsigned int i, unroll_limit = 0;
     struct hlsl_ir_node *loop;
-    unsigned int i;
+    bool unroll = true;
     if (attribute_list_has_duplicates(attributes))
         hlsl_error(ctx, loc, VKD3D_SHADER_ERROR_HLSL_INVALID_SYNTAX, "Found duplicate attribute.");
@@ -572,18 +639,20 @@ static struct hlsl_block *create_loop(struct hlsl_ctx *ctx, enum loop_type type,
         const struct hlsl_attribute *attr = attributes->attrs[i];

         if (!strcmp(attr->name, "unroll"))
         {
-            if (attr->args_count)
-            {
-                hlsl_error(ctx, loc, VKD3D_SHADER_ERROR_HLSL_NOT_IMPLEMENTED, "Unroll attribute with iteration count.");
-            }
-            else
+            if (attr->args_count == 1)
             {
-                hlsl_warning(ctx, loc, VKD3D_SHADER_ERROR_HLSL_NOT_IMPLEMENTED, "Loop unrolling is not implemented.");
+                struct hlsl_block expr;
+
+                hlsl_block_init(&expr);
+                if (!hlsl_clone_block(ctx, &expr, &attr->instrs))
+                    return NULL;
+
+                unroll_limit = evaluate_static_expression_as_uint(ctx, &expr, loc);
+                hlsl_block_cleanup(&expr);
             }
         }
         else if (!strcmp(attr->name, "loop"))
         {
-            /* TODO: this attribute will be used to disable unrolling, once it's implememented. */
+            unroll = false;
         }
         else if (!strcmp(attr->name, "fastopt")
                 || !strcmp(attr->name, "allow_uav_condition"))
@@ -609,7 +678,7 @@ static struct hlsl_block *create_loop(struct hlsl_ctx *ctx, enum loop_type type,
     else
         list_move_head(&body->instrs, &cond->instrs);
-    if (!(loop = hlsl_new_loop(ctx, body, iter, loc)))
+    if (!(loop = hlsl_new_loop(ctx, body, iter, unroll, unroll_limit, loc)))
         goto oom;
     hlsl_block_add_instr(init, loop);
@@ -1263,72 +1332,6 @@ static struct hlsl_block *make_block(struct hlsl_ctx *ctx, struct hlsl_ir_node *
     return block;
 }
-static unsigned int evaluate_static_expression_as_uint(struct hlsl_ctx *ctx, struct hlsl_block *block,
-        const struct vkd3d_shader_location *loc)
-{
-    struct hlsl_ir_constant *constant;
-    struct hlsl_ir_node *node;
-    struct hlsl_block expr;
-    unsigned int ret = 0;
-    bool progress;
-
-    LIST_FOR_EACH_ENTRY(node, &block->instrs, struct hlsl_ir_node, entry)
-    {
-        switch (node->type)
-        {
-            case HLSL_IR_CONSTANT:
-            case HLSL_IR_EXPR:
-            case HLSL_IR_SWIZZLE:
-            case HLSL_IR_LOAD:
-            case HLSL_IR_INDEX:
-                continue;
-            case HLSL_IR_CALL:
-            case HLSL_IR_IF:
-            case HLSL_IR_LOOP:
-            case HLSL_IR_JUMP:
-            case HLSL_IR_RESOURCE_LOAD:
-            case HLSL_IR_RESOURCE_STORE:
-            case HLSL_IR_STORE:
-            case HLSL_IR_SWITCH:
-                hlsl_error(ctx, &node->loc, VKD3D_SHADER_ERROR_HLSL_INVALID_SYNTAX,
-                        "Expected literal expression.");
-        }
-    }
-
-    if (!hlsl_clone_block(ctx, &expr, &ctx->static_initializers))
-        return 0;
-    hlsl_block_add_block(&expr, block);
-
-    if (!add_implicit_conversion(ctx, &expr, node_from_block(&expr),
-            hlsl_get_scalar_type(ctx, HLSL_TYPE_UINT), loc))
-    {
-        hlsl_block_cleanup(&expr);
-        return 0;
-    }
-
-    do
-    {
-        progress = hlsl_transform_ir(ctx, hlsl_fold_constant_exprs, &expr, NULL);
-        progress |= hlsl_copy_propagation_execute(ctx, &expr);
-    } while (progress);
-
-    node = node_from_block(&expr);
-    if (node->type == HLSL_IR_CONSTANT)
-    {
-        constant = hlsl_ir_constant(node);
-        ret = constant->value.u[0].u;
-    }
-    else
-    {
-        hlsl_error(ctx, &node->loc, VKD3D_SHADER_ERROR_HLSL_INVALID_SYNTAX,
-                "Failed to evaluate constant expression.");
-    }
-
-    hlsl_block_cleanup(&expr);
-
-    return ret;
-}
-
 static bool expr_compatible_data_types(struct hlsl_type *t1, struct hlsl_type *t2)
 {
     if (t1->base_type > HLSL_TYPE_LAST_SCALAR || t2->base_type > HLSL_TYPE_LAST_SCALAR)
From: Victor Chiletto <vchiletto@codeweavers.com>
Based on a patch by Nikolay Sivov.
Co-authored-by: Nikolay Sivov <nsivov@codeweavers.com>
---
 libs/vkd3d-shader/hlsl_codegen.c         | 365 ++++++++++++++++++++++-
 libs/vkd3d-shader/vkd3d_shader_private.h |   1 +
 tests/hlsl/for.shader_test               |   4 +-
 tests/hlsl/function-return.shader_test   |   4 +-
 tests/hlsl/loop.shader_test              |  16 +-
 tests/hlsl/return.shader_test            |   4 +-
 tests/hlsl/texture-load.shader_test      |   4 +-
 7 files changed, 372 insertions(+), 26 deletions(-)
diff --git a/libs/vkd3d-shader/hlsl_codegen.c b/libs/vkd3d-shader/hlsl_codegen.c
index 8a94b4b8a..b4cff480c 100644
--- a/libs/vkd3d-shader/hlsl_codegen.c
+++ b/libs/vkd3d-shader/hlsl_codegen.c
@@ -2394,6 +2394,359 @@ static bool remove_trivial_conditional_branches(struct hlsl_ctx *ctx, struct hls
     return true;
 }
+static void transform_run_const_passes(struct hlsl_ctx *ctx, struct hlsl_block *body)
+{
+    bool progress;
+
+    do
+    {
+        progress = hlsl_transform_ir(ctx, hlsl_fold_constant_exprs, body, NULL);
+        progress |= hlsl_transform_ir(ctx, hlsl_fold_constant_swizzles, body, NULL);
+        progress |= hlsl_copy_propagation_execute(ctx, body);
+        progress |= hlsl_transform_ir(ctx, fold_swizzle_chains, body, NULL);
+        progress |= hlsl_transform_ir(ctx, remove_trivial_swizzles, body, NULL);
+        progress |= hlsl_transform_ir(ctx, remove_trivial_conditional_branches, body, NULL);
+    } while (progress);
+}
+
+static bool loop_unrolling_generate_const_bool_store(struct hlsl_ctx *ctx, struct hlsl_ir_var *var,
+        bool val, struct hlsl_block *block, struct vkd3d_shader_location *loc)
+{
+    struct hlsl_ir_node *const_false, *store;
+
+    if (!(const_false = hlsl_new_bool_constant(ctx, val, loc)))
+        return false;
+    hlsl_block_add_instr(block, const_false);
+
+    if (!(store = hlsl_new_simple_store(ctx, var, const_false)))
+        return false;
+    hlsl_block_add_instr(block, store);
+
+    return true;
+}
+
+static bool loop_unrolling_remove_jumps_recurse(struct hlsl_ctx *ctx, struct hlsl_block *block,
+        struct hlsl_ir_var *loop_broken, struct hlsl_ir_var *loop_continued);
+
+static bool loop_unrolling_remove_jumps_visit(struct hlsl_ctx *ctx, struct hlsl_ir_node *node,
+        struct hlsl_ir_var *loop_broken, struct hlsl_ir_var *loop_continued)
+{
+    switch (node->type)
+    {
+        case HLSL_IR_IF:
+        {
+            struct hlsl_ir_if *iff = hlsl_ir_if(node);
+
+            if (loop_unrolling_remove_jumps_recurse(ctx, &iff->then_block, loop_broken, loop_continued))
+                return true;
+            if (loop_unrolling_remove_jumps_recurse(ctx, &iff->else_block, loop_broken, loop_continued))
+                return true;
+            break;
+        }
+        case HLSL_IR_JUMP:
+        {
+            struct hlsl_ir_jump *jump = hlsl_ir_jump(node);
+
+            if (jump->type == HLSL_IR_JUMP_CONTINUE || jump->type == HLSL_IR_JUMP_BREAK)
+            {
+                struct hlsl_block draft;
+                struct hlsl_ir_var *var;
+
+                hlsl_block_init(&draft);
+
+                if (jump->type == HLSL_IR_JUMP_CONTINUE)
+                    var = loop_continued;
+                else
+                    var = loop_broken;
+
+                if (!loop_unrolling_generate_const_bool_store(ctx, var, true, &draft, &jump->node.loc))
+                    return false;
+
+                list_move_before(&jump->node.entry, &draft.instrs);
+                list_remove(&jump->node.entry);
+                hlsl_free_instr(&jump->node);
+
+                return true;
+            }
+            break;
+        }
+        default:
+            break;
+    }
+
+    return false;
+}
+
+static struct hlsl_ir_if *loop_unrolling_generate_var_check(struct hlsl_ctx *ctx,
+        struct hlsl_block *dst, struct vkd3d_shader_location *loc, struct hlsl_ir_var *var)
+{
+    struct hlsl_ir_node *cond, *iff;
+    struct hlsl_block then_block;
+    struct hlsl_ir_load *load;
+
+    hlsl_block_init(&then_block);
+
+    if (!(load = hlsl_new_var_load(ctx, var, loc)))
+        return NULL;
+    hlsl_block_add_instr(dst, &load->node);
+
+    if (!(cond = hlsl_new_unary_expr(ctx, HLSL_OP1_LOGIC_NOT, &load->node, loc)))
+        return NULL;
+    hlsl_block_add_instr(dst, cond);
+
+    if (!(iff = hlsl_new_if(ctx, cond, &then_block, NULL, loc)))
+        return NULL;
+    hlsl_block_add_instr(dst, iff);
+
+    return hlsl_ir_if(iff);
+}
+
+static bool loop_unrolling_remove_jumps_recurse(struct hlsl_ctx *ctx, struct hlsl_block *block,
+        struct hlsl_ir_var *loop_broken, struct hlsl_ir_var *loop_continued)
+{
+    struct hlsl_ir_node *node, *next;
+
+    LIST_FOR_EACH_ENTRY_SAFE(node, next, &block->instrs, struct hlsl_ir_node, entry)
+    {
+        if (loop_unrolling_remove_jumps_visit(ctx, node, loop_broken, loop_continued))
+        {
+            if (&next->entry != &block->instrs)
+            {
+                struct hlsl_ir_if *broken_check, *continued_check;
+                struct hlsl_block draft;
+
+                hlsl_block_init(&draft);
+
+                broken_check = loop_unrolling_generate_var_check(ctx, &draft, &next->loc, loop_broken);
+                continued_check = loop_unrolling_generate_var_check(ctx, &broken_check->then_block,
+                        &next->loc, loop_continued);
+                list_move_before(&next->entry, &draft.instrs);
+
+                list_move_slice_tail(&continued_check->then_block.instrs, &next->entry, list_tail(&block->instrs));
+            }
+
+            return true;
+        }
+    }
+
+    return false;
+}
+
+static void loop_unrolling_remove_jumps(struct hlsl_ctx *ctx, struct hlsl_block *block,
+        struct hlsl_ir_var *loop_broken, struct hlsl_ir_var *loop_continued)
+{
+    while (loop_unrolling_remove_jumps_recurse(ctx, block, loop_broken, loop_continued));
+}
+
+static bool loop_unrolling_simplify_and_check_broken(struct hlsl_ctx *ctx, struct hlsl_block *block,
+        struct hlsl_ir_var *loop_broken, struct copy_propagation_state *previous_state,
+        struct copy_propagation_state *current_state, unsigned int *index)
+{
+    struct copy_propagation_value *loop_broken_value;
+    unsigned int current_index;
+    bool progress;
+
+    do
+    {
+        copy_propagation_state_destroy(current_state);
+        copy_propagation_state_init(ctx, current_state, previous_state);
+
+        progress = hlsl_transform_ir(ctx, hlsl_fold_constant_exprs, block, NULL);
+        progress |= hlsl_transform_ir(ctx, hlsl_fold_constant_swizzles, block, NULL);
+        current_index = index_instructions(block, *index);
+        progress |= copy_propagation_transform_block(ctx, block, current_state);
+        progress |= hlsl_transform_ir(ctx, fold_swizzle_chains, block, NULL);
+        progress |= hlsl_transform_ir(ctx, remove_trivial_swizzles, block, NULL);
+        progress |= hlsl_transform_ir(ctx, remove_trivial_conditional_branches, block, NULL);
+    } while (progress);
+
+    *index = current_index;
+
+    if ((loop_broken_value = copy_propagation_get_value(current_state, loop_broken, 0, (unsigned int)-1))
+            && loop_broken_value->node->type == HLSL_IR_CONSTANT)
+    {
+        struct hlsl_ir_constant *condition = hlsl_ir_constant(loop_broken_value->node);
+
+        if (condition->value.u[0].u)
+            return true;
+    }
+
+    return false;
+}
+
+#define LOOP_UNROLLING_DEFAULT_MAX_ITERATIONS 1024
+
+static bool loop_unrolling_unroll_loop(struct hlsl_ctx *ctx, struct hlsl_block *block, struct hlsl_ir_loop *loop)
+{
+    struct copy_propagation_state partial_states[LOOP_UNROLLING_DEFAULT_MAX_ITERATIONS + 1] = { 0 };
+    struct hlsl_ir_var *loop_broken = NULL, *loop_continued = NULL;
+    unsigned int max_iterations, index, i;
+    struct hlsl_ir_if *target_if = NULL;
+    struct hlsl_block draft, tmp_dst;
+
+    max_iterations = LOOP_UNROLLING_DEFAULT_MAX_ITERATIONS;
+    if (loop->unroll_limit)
+        max_iterations = min(loop->unroll_limit, max_iterations);
+
+    hlsl_block_init(&draft);
+    hlsl_block_init(&tmp_dst);
+    list_move_slice_tail(&draft.instrs, list_head(&block->instrs), list_prev(&block->instrs, &loop->node.entry));
+
+    if (!(loop_broken = hlsl_new_synthetic_var(ctx, "loop_broken",
+            hlsl_get_scalar_type(ctx, HLSL_TYPE_BOOL), &loop->node.loc)))
+        goto fail;
+
+    if (!(loop_continued = hlsl_new_synthetic_var(ctx, "loop_continued",
+            hlsl_get_scalar_type(ctx, HLSL_TYPE_BOOL), &loop->node.loc)))
+        goto fail;
+
+    if (!loop_unrolling_generate_const_bool_store(ctx, loop_broken, false, &draft, &loop->node.loc))
+        goto fail;
+
+    if (!(target_if = loop_unrolling_generate_var_check(ctx, &draft, &loop->node.loc, loop_broken)))
+        goto fail;
+
+    copy_propagation_state_init(ctx, &partial_states[0], NULL);
+    index = index_instructions(&draft, 2);
+    copy_propagation_transform_block(ctx, &draft, &partial_states[0]);
+
+    for (i = 0; i < max_iterations; ++i)
+    {
+        if (!loop_unrolling_generate_const_bool_store(ctx, loop_continued, false, &target_if->then_block, &loop->node.loc))
+            goto fail;
+
+        if (!hlsl_clone_block(ctx, &tmp_dst, &loop->body))
+            goto fail;
+        hlsl_block_add_block(&target_if->then_block, &tmp_dst);
+
+        loop_unrolling_remove_jumps(ctx, &target_if->then_block, loop_broken, loop_continued);
+
+        if (loop_unrolling_simplify_and_check_broken(ctx, &target_if->then_block, loop_broken,
+                &partial_states[i], &partial_states[i + 1], &index))
+            break;
+
+        if (!(target_if = loop_unrolling_generate_var_check(ctx, &draft, &loop->node.loc, loop_broken)))
+            goto fail;
+
+        if (!hlsl_clone_block(ctx, &tmp_dst, &loop->iter))
+            goto fail;
+        hlsl_block_add_block(&target_if->then_block, &tmp_dst);
+    }
+
+    /* Native gives up on unrolling entirely after 1024 iterations. It also
+     * will not insert a loop if there are iterations left after
+     * max_iterations, e.g. [unroll(4)] for (i = 0; i < 8; ++i). */
+    if (i == LOOP_UNROLLING_DEFAULT_MAX_ITERATIONS)
+    {
+        hlsl_warning(ctx, &loop->node.loc, VKD3D_SHADER_WARNING_HLSL_UNABLE_TO_UNROLL,
+                "Unable to unroll loop, maximum iterations reached (%u).", LOOP_UNROLLING_DEFAULT_MAX_ITERATIONS);
+        goto fail;
+    }
+
+    list_remove(&loop->node.entry);
+    hlsl_free_instr(&loop->node);
+
+    list_move_head(&block->instrs, &draft.instrs);
+    hlsl_block_cleanup(&tmp_dst);
+    hlsl_block_cleanup(&draft);
+
+    for (i = 0; i < LOOP_UNROLLING_DEFAULT_MAX_ITERATIONS + 1; ++i)
+        copy_propagation_state_destroy(&partial_states[i]);
+
+    return true;
+
+fail:
+    hlsl_block_cleanup(&draft);
+    hlsl_block_cleanup(&tmp_dst);
+
+    for (i = 0; i < LOOP_UNROLLING_DEFAULT_MAX_ITERATIONS + 1; ++i)
+        copy_propagation_state_destroy(&partial_states[i]);
+
+    if (loop_continued)
+        hlsl_free_var(loop_continued);
+
+    if (loop_broken)
+        hlsl_free_var(loop_broken);
+
+    return false;
+}
+
+static struct hlsl_ir_loop *loop_unrolling_find_unrollable_loop(struct hlsl_ctx *ctx, struct hlsl_block *block,
+        struct hlsl_block **containing_block)
+{
+    struct hlsl_ir_node *instr, *next;
+
+    LIST_FOR_EACH_ENTRY_SAFE(instr, next, &block->instrs, struct hlsl_ir_node, entry)
+    {
+        switch (instr->type)
+        {
+            case HLSL_IR_LOOP:
+            {
+                struct hlsl_ir_loop *loop = hlsl_ir_loop(instr);
+                struct hlsl_ir_loop *nested_loop;
+
+                if ((nested_loop = loop_unrolling_find_unrollable_loop(ctx, &loop->body, containing_block)))
+                    return nested_loop;
+
+                if (loop->unroll)
+                {
+                    *containing_block = block;
+                    return loop;
+                }
+                break;
+            }
+            case HLSL_IR_IF:
+            {
+                struct hlsl_ir_if *iff = hlsl_ir_if(instr);
+                struct hlsl_ir_loop *loop;
+
+                if ((loop = loop_unrolling_find_unrollable_loop(ctx, &iff->then_block, containing_block)))
+                    return loop;
+                if ((loop = loop_unrolling_find_unrollable_loop(ctx, &iff->else_block, containing_block)))
+                    return loop;
+                break;
+            }
+            case HLSL_IR_SWITCH:
+            {
+                struct hlsl_ir_switch *s = hlsl_ir_switch(instr);
+                struct hlsl_ir_switch_case *c;
+                struct hlsl_ir_loop *loop;
+
+                LIST_FOR_EACH_ENTRY(c, &s->cases, struct hlsl_ir_switch_case, entry)
+                {
+                    if ((loop = loop_unrolling_find_unrollable_loop(ctx, &c->body, containing_block)))
+                        return loop;
+                }
+                break;
+            }
+            default:
+                break;
+        }
+    }
+
+    return NULL;
+}
+
+static void transform_unroll_loops(struct hlsl_ctx *ctx, struct hlsl_block *block)
+{
+    while (true)
+    {
+        struct hlsl_block clone, *containing_block;
+        struct hlsl_ir_loop *loop, *cloned_loop;
+
+        if (!(loop = loop_unrolling_find_unrollable_loop(ctx, block, &containing_block)))
+            return;
+
+        if (!hlsl_clone_block(ctx, &clone, block))
+            return;
+        cloned_loop = loop_unrolling_find_unrollable_loop(ctx, &clone, &containing_block);
+        assert(cloned_loop);
+
+        if (!loop_unrolling_unroll_loop(ctx, containing_block, cloned_loop))
+        {
+            hlsl_block_cleanup(&clone);
+            loop->unroll = false;
+            continue;
+        }
+
+        hlsl_block_cleanup(block);
+        hlsl_block_init(block);
+        hlsl_block_add_block(block, &clone);
+    }
+}
 static void transform_insert_continue_iter_blocks(struct hlsl_ctx *ctx, struct hlsl_block *block, struct hlsl_block *iter)
 {
@@ -5562,17 +5915,9 @@ int hlsl_emit_bytecode(struct hlsl_ctx *ctx, struct hlsl_ir_function_decl *entry
     lower_ir(ctx, lower_int_abs, body);
     lower_ir(ctx, lower_float_modulus, body);
     hlsl_transform_ir(ctx, fold_redundant_casts, body, NULL);
-    do
-    {
-        progress = hlsl_transform_ir(ctx, hlsl_fold_constant_exprs, body, NULL);
-        progress |= hlsl_transform_ir(ctx, hlsl_fold_constant_swizzles, body, NULL);
-        progress |= hlsl_copy_propagation_execute(ctx, body);
-        progress |= hlsl_transform_ir(ctx, fold_swizzle_chains, body, NULL);
-        progress |= hlsl_transform_ir(ctx, remove_trivial_swizzles, body, NULL);
-        progress |= hlsl_transform_ir(ctx, remove_trivial_conditional_branches, body, NULL);
-    }
-    while (progress);
+    transform_unroll_loops(ctx, body);
     transform_insert_continue_iter_blocks(ctx, body, NULL);
+    transform_run_const_passes(ctx, body);
     remove_unreachable_code(ctx, body);
     hlsl_transform_ir(ctx, normalize_switch_cases, body, NULL);
diff --git a/libs/vkd3d-shader/vkd3d_shader_private.h b/libs/vkd3d-shader/vkd3d_shader_private.h
index b07a7bff7..10864e162 100644
--- a/libs/vkd3d-shader/vkd3d_shader_private.h
+++ b/libs/vkd3d-shader/vkd3d_shader_private.h
@@ -157,6 +157,7 @@ enum vkd3d_shader_error
     VKD3D_SHADER_WARNING_HLSL_IMAGINARY_NUMERIC_RESULT = 5303,
     VKD3D_SHADER_WARNING_HLSL_NON_FINITE_RESULT = 5304,
     VKD3D_SHADER_WARNING_HLSL_IGNORED_ATTRIBUTE = 5305,
+    VKD3D_SHADER_WARNING_HLSL_UNABLE_TO_UNROLL = 5306,
VKD3D_SHADER_ERROR_GLSL_INTERNAL = 6000,
diff --git a/tests/hlsl/for.shader_test b/tests/hlsl/for.shader_test
index 7ce6c8213..b3fbd76d7 100644
--- a/tests/hlsl/for.shader_test
+++ b/tests/hlsl/for.shader_test
@@ -63,7 +63,7 @@ probe (481, 0, 640, 480) rgba ( 5.0, 10.0, 0.0, 0.0)
 [require]
 % Reset requirements
-[pixel shader todo(sm<4)]
+[pixel shader]
 float4 main(float tex : texcoord) : sv_target
 {
     int i;
@@ -76,7 +76,7 @@ float4 main(float tex : texcoord) : sv_target
 }
 [test]
-todo(sm<4 | glsl) draw quad
+todo(glsl) draw quad
 probe all rgba (10.0, 45.0, 0.0, 0.0)
 [pixel shader fail(sm<6)]
diff --git a/tests/hlsl/function-return.shader_test b/tests/hlsl/function-return.shader_test
index 3c085a578..9d754a0e2 100644
--- a/tests/hlsl/function-return.shader_test
+++ b/tests/hlsl/function-return.shader_test
@@ -143,7 +143,7 @@ uniform 0 float 0.9
 todo(sm<4 | glsl) draw quad
 probe all rgba (1.0, 0.9, 1.0, 0.6) 1
-[pixel shader todo(sm<4)]
+[pixel shader]
 float func(out float o)
 {
     o = 0.1;
@@ -181,7 +181,7 @@ float4 main() : sv_target
 }
 [test]
-todo(sm<4 | glsl) draw quad
+todo(glsl) draw quad
 probe all rgba (0.4, 0.3, 0.3, 0.9) 1
 [pixel shader todo(sm<4)]
diff --git a/tests/hlsl/loop.shader_test b/tests/hlsl/loop.shader_test
index 2de10d986..47fece6ac 100644
--- a/tests/hlsl/loop.shader_test
+++ b/tests/hlsl/loop.shader_test
@@ -1,6 +1,6 @@
 % TODO: dxcompiler emits no loops for any of these test shaders.
-[pixel shader todo(sm<4)]
+[pixel shader]
 float a;
 float4 main() : sv_target
@@ -18,11 +18,11 @@ float4 main() : sv_target
 [test]
 uniform 0 float 5.0
-todo(sm<4 | glsl) draw quad
+todo(glsl) draw quad
 probe all rgba (50.0, 50.0, 50.0, 50.0)
-[pixel shader todo(sm<4)]
+[pixel shader]
 float a;
 float4 main() : sv_target
@@ -41,10 +41,10 @@ float4 main() : sv_target
 [test]
 uniform 0 float 4.0
-todo(sm<4 | glsl) draw quad
+todo(glsl) draw quad
 probe all rgba (20.0, 20.0, 20.0, 20.0)
-[pixel shader todo(sm<4)]
+[pixel shader]
 float a;
 float4 main() : sv_target
@@ -70,10 +70,10 @@ float4 main() : sv_target
 [test]
 uniform 0 float 4.0
-todo(sm<4 | glsl) draw quad
+todo(glsl) draw quad
 probe all rgba (409.1, 409.1, 409.1, 409.1)
-[pixel shader todo(sm<4)]
+[pixel shader]
 float a;
 float4 main() : sv_target
@@ -100,7 +100,7 @@ float4 main() : sv_target
 [test]
 uniform 0 float 4.0
-todo(sm<4 | glsl) draw quad
+todo(glsl) draw quad
 probe all rgba (410.1, 410.1, 410.1, 410.1)
 % loop attribute by itself
diff --git a/tests/hlsl/return.shader_test b/tests/hlsl/return.shader_test
index 2195f749a..24c157afd 100644
--- a/tests/hlsl/return.shader_test
+++ b/tests/hlsl/return.shader_test
@@ -124,7 +124,7 @@ uniform 0 float 0.9
 todo(sm<4 | glsl) draw quad
 probe all rgba (0.4, 0.5, 0.6, 0.7) 1
-[pixel shader todo(sm<4)]
+[pixel shader]
 void main(out float4 ret : sv_target)
 {
     ret = float4(0.1, 0.2, 0.3, 0.4);
@@ -138,7 +138,7 @@ void main(out float4 ret : sv_target)
 }
 [test]
-todo(sm<4 | glsl) draw quad
+todo(glsl) draw quad
 probe all rgba (0.2, 0.4, 0.6, 0.8)
 [pixel shader todo(sm<4)]
diff --git a/tests/hlsl/texture-load.shader_test b/tests/hlsl/texture-load.shader_test
index 3858f7ca6..bf63ec307 100644
--- a/tests/hlsl/texture-load.shader_test
+++ b/tests/hlsl/texture-load.shader_test
@@ -124,7 +124,7 @@ float4 main(float4 pos : sv_position) : sv_target
 shader model >= 4.0
 shader model < 4.1
-[pixel shader todo]
+[pixel shader]
 Texture2DMS<float4, 1> t;
 float4 main(float4 pos : sv_position) : sv_target
@@ -139,7 +139,7 @@ float4 main(float4 pos : sv_position) : sv_target
 }
 [test]
-todo draw quad
+todo(glsl) draw quad
 probe (0, 0) rgba (0.1, 0.2, 0.3, 0.4)
 probe (1, 0) rgba (0.5, 0.7, 0.6, 0.8)
 probe (0, 1) rgba (0.6, 0.5, 0.2, 0.1)
 probe (1, 1) rgba (0.8, 0.0, 0.7, 1.0)
Nikolay Sivov (@nsivov) commented about libs/vkd3d-shader/hlsl_codegen.c:
>     list_move_head(&block->instrs, &draft.instrs);
>     hlsl_block_cleanup(&tmp_dst);
>     hlsl_block_cleanup(&draft);
>
>     for (i = 0; i < LOOP_UNROLLING_DEFAULT_MAX_ITERATIONS + 1; ++i)
>         copy_propagation_state_destroy(&partial_states[i]);
>
>     return true;
>
> fail:
>     hlsl_block_cleanup(&draft);
>     hlsl_block_cleanup(&tmp_dst);
>
>     for (i = 0; i < LOOP_UNROLLING_DEFAULT_MAX_ITERATIONS + 1; ++i)
>         copy_propagation_state_destroy(&partial_states[i]);
I find 1024 iteration loops disturbing. Why do we need to save this state? Could you give a small example for how this runs, and what gets saved.
I haven't reviewed in detail, but I have some high level comments:
* 2/6 seems suspicious. Why are we running copyprop on out-of-bounds loads if we're going to delete them? Can't we DCE them first?
* The point of having UNRESOLVED_CONTINUE vs CONTINUE is that the IR should always have uniform semantics. 4/6 is breaking that rule. If we need to do this we should move the *entire* resolution pass later, and change the jump type when we do.
* That said, why do we need 3/6 and 4/6 at all? It seems suspicious. After all, a loop like `for (i = 0; i < n; ++i) { ...; }` is isomorphic to `i = 0; for (;;) { if (!(i < n)) break; ...; ++i; }`, and we should be able to unroll both of them equally well.
* In 5/6 please separate a patch to move evaluate_static_expression_as_uint() upwards.
* Why loop_unrolling_find_unrollable_loop()? Usually we just iterate over all instructions, why are we not doing that here?
* Why are we cloning the loop before unrolling it?
* Saving and restoring the copy propagation state works, but it's a bit unfortunate, and reaching into the copyprop internals seems scary and is not something I really want to have to think through.
Instead here's an alternate approach. I haven't tried to implement it but I think it should work:
In the top level pass, every time we encounter a loop, we only unroll one iteration. Record that we have done so in the loop itself, probably by incrementing some counter, or maybe even by decrementing the "unroll_limit" field (not immediately sure if that causes problems). After running an iteration, run DCE, remove_unreachable_code(), etc.
The upshot of this is that we don't have to explicitly check whether we can stop unrolling: if we've unconditionally broken after some number of iterations, then remove_unreachable_code() / DCE will simply remove the loop, and we no longer need to unroll it.
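In code, the proposed scheme might look roughly like this (a minimal sketch; `iterations_unrolled` and `MAX_UNROLL_ITERATIONS` are hypothetical names, and break/continue handling is deliberately ignored, which, as discussed below, turns out to be the hard part):

```c
/* Minimal sketch of the incremental scheme above; "iterations_unrolled" and
 * MAX_UNROLL_ITERATIONS are hypothetical, and jumps in the pasted body are
 * deliberately not handled here. */
static bool unroll_one_iteration(struct hlsl_ctx *ctx, struct hlsl_ir_node *instr, void *context)
{
    struct hlsl_ir_loop *loop;
    struct hlsl_block copy;

    if (instr->type != HLSL_IR_LOOP)
        return false;
    loop = hlsl_ir_loop(instr);
    if (!loop->unroll || loop->iterations_unrolled++ >= MAX_UNROLL_ITERATIONS)
        return false;

    /* Paste one copy of the body in front of the loop; the loop itself stays. */
    if (!hlsl_clone_block(ctx, &copy, &loop->body))
        return false;
    list_move_before(&instr->entry, &copy.instrs);
    return true;
}

static void unroll_loops_incrementally(struct hlsl_ctx *ctx, struct hlsl_block *body)
{
    /* Once an unrolled iteration ends in an unconditional break, constant
     * folding plus remove_unreachable_code() / DCE delete the now-dead loop,
     * and the transform stops reporting progress. */
    while (hlsl_transform_ir(ctx, unroll_one_iteration, body, NULL))
    {
        transform_run_const_passes(ctx, body);
        remove_unreachable_code(ctx, body);
    }
}
```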
> In the top level pass, every time we encounter a loop, we only unroll one iteration. Record that we have done so in the loop itself, probably by incrementing some counter, or maybe even by decrementing the "unroll_limit" field (not immediately sure if that causes problems). After running an iteration, run DCE, remove_unreachable_code(), etc.
>
> The upshot of this is that we don't have to explicitly check whether we can stop unrolling: if we've unconditionally broken after some number of iterations, then remove_unreachable_code() / DCE will simply remove the loop, and we no longer need to unroll it.
As discussed on IRC, there is a problem with this approach. If the loop contains a (conditional) break, you need to guard out future iterations of that loop with a temporary variable. This means that every iteration of the loop will be inside an if block nested one additional level. That's fine in itself, and it produces correct code, but it runs into the documented maximum of 64 levels of nesting. [It's also mean to our current implementation of copy-prop, although I think we could potentially do things differently.]
For a realistic example, this shader can't be simplified after unrolling, but it should still unroll successfully:
```hlsl
Buffer<float> b;

float4 main() : sv_target
{
    [unroll]
    for (uint i = 0; i < 100; ++i)
    {
        if (!b[i])
            break;
    }
    return i;
}
```
(Obviously the "i < 100" part can be simplified, but the conditional break can't.)
What native does in this case is to create a structure like
```
if (!broken) body;
if (!broken) body;
if (!broken) body;
```
which is well and good, but can't exactly be constructed through incremental and independent lowering like I was proposing. So we will indeed need to construct this manually.
I also realize that incremental lowering can't really work anyway, because loop unrolling can, critically, *fail*, and that should emit a warning and leave the program where it was. (This is also, I assume, why we are cloning the loop before unrolling.)
With all that said, I still don't think we should need to mess with copy prop. What we should be able to do is something more like this:
```c
struct hlsl_ir_var *broken = hlsl_new_synthetic_var();
struct hlsl_block output;

hlsl_block_init(&output);

for (;;)
{
    struct hlsl_ir_node *load;
    struct hlsl_src src;

    // Pull out one body, guarded by "!broken".
    generate_iteration(&output, broken, ...);

    load = hlsl_new_var_load(broken);
    hlsl_block_add_instr(&output, load);
    hlsl_src_from_node(&src, load);

    // Run lowering passes on "output", including copy-prop, trivial branch simplification...
    ...;

    if (src.node->type == HLSL_IR_CONSTANT && /* and it's true... */)
    {
        /* successful unroll, break */
    }

    hlsl_src_remove(&src);
}
```
> 2/6 seems suspicious. Why are we running copyprop on out-of-bounds loads if we're going to delete them?
As we've discussed on IRC, since the loop check is stored within the loop body, on the last iteration we possibly run copyprop on OOB accesses.
> Can't we DCE them first?
We need to run copyprop and trivial if removal first to simplify stuff like

```hlsl
if (!broken)
{
    ++i;
    if (!(i < 5))
        break;
    /* loop body, potentially containing an OOB access */
}
```
> The point of having UNRESOLVED_CONTINUE vs CONTINUE is that the IR should always have uniform semantics. 4/6 is breaking that rule. If we need to do this we should move the *entire* resolution pass later, and change the jump type when we do.
I'll look into deferring that entire pass then, as it's needed due to:
> That said, why do we need 3/6 and 4/6 at all? It seems suspicious.
Consider a loop like:

```hlsl
Buffer<float> buf;

float a = 0.0f;
for (i = 0; i < 4; ++i)
{
    if (buf[i] < 0.1)
        break;
    if (buf[i] < 0.2)
        continue;
    a += buf[i];
}
```
Without 3/6 and 4/6, this would unroll to something like

```hlsl
if (!broken)
{
    continued = false;
    if (!(i < 4))
        broken = true;
    if (!broken && !continued)
    {
        if (buf[i] < 0.1)
            broken = true;
        if (!broken && !continued)
        {
            if (buf[i] < 0.2)
            {
                ++i;
                continued = true;
            }
            if (!broken && !continued)
                ++i;
        }
    }
}
```
This is _correct_, but it leads to copy prop invalidating `i` and therefore not propagating it to subsequent iterations. 3/6 and 4/6 allow us to generate this instead (except for the first iteration):

```hlsl
if (!broken)
{
    ++i; /* not present in the first iteration */
    continued = false;
    if (!(i < 4))
        broken = true;
    if (!broken && !continued)
    {
        if (buf[i] < 0.1)
            broken = true;
        if (!broken && !continued)
        {
            if (buf[i] < 0.2)
            {
                continued = true;
            }
            if (!broken && !continued)
            {
                a += buf[i];
            }
        }
    }
}
```
> In 5/6 please separate a patch to move evaluate_static_expression_as_uint() upwards.
Shall do.
> Why loop_unrolling_find_unrollable_loop()? Usually we just iterate over all instructions, why are we not doing that here?
Because we need to clone the blocks every time we attempt an unroll: as soon as an unroll is successful we have to clone the result as future unrolls might fail, so looping through the entire block once isn't feasible without poking with `clone_block`'s `struct clone_instr_map` to remap `loop->next` from the original block to the cloned one.
> Why are we cloning the loop before unrolling it?
> loop unrolling can, critically, *fail*, and that should emit a warning and leave the program where it was. (This is also, I assume, why we are cloning the loop before unrolling.)
Yep, but also, this MR's current handling of unrolling failure isn't 100% accurate to native. Native will unroll by default, even without the attribute present, but if the attribute is present and unrolling fails that should trigger an actual error. I'll look into fixing this.
> With all that said, I still don't think we should need to mess with copy prop. What we should be able to do is something more like this: [code snippet]
This is close to what this PR does, but it would run into the variable invalidation issue I mentioned above, and would also have a noticeable performance impact as copy prop would have to visit n² `if !broken`s.
> Why do we need to save this state?
It allows us to partially run copy prop after each unroll iteration on a single `if !broken` block, while also avoiding nesting those ifs.
> Could you give a small example for how this runs, and what gets saved.
Since we run copy prop for one iteration using the previous iteration's state as the parent, we mimic the nested ifs solution's copy prop state chaining.
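To illustrate the chaining, a simplified paraphrase of the existing lookup in hlsl_codegen.c (the real `copy_propagation_get_value()` also validates definition timestamps, which is elided here):

```c
/* Simplified from the lookup in hlsl_codegen.c: each state links to its
 * parent, so a lookup against iteration n's state falls through to iteration
 * n-1's, and ultimately to the state of the code before the loop, without
 * re-running copy prop on any earlier block. Timestamp checks are elided. */
static struct copy_propagation_value *lookup_sketch(const struct copy_propagation_state *state,
        const struct hlsl_ir_var *var, unsigned int component)
{
    for (; state; state = state->parent)
    {
        struct copy_propagation_var_def *var_def = copy_propagation_get_var_def(state, var);

        if (var_def && var_def->values[component].node)
            return &var_def->values[component];
    }
    return NULL;
}
```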
> I find 1024 iteration loops disturbing.
Yeah, it's a lot of stack space (~24 KiB). I should probably use `vkd3d_array_reserve` here instead?
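For reference, a sketch of what that could look like (assuming the usual `vkd3d_array_reserve()` helper; one caveat is that earlier states are linked by parent pointers, so if realloc moves the array they would need re-chaining, or each state would need its own allocation):

```c
/* Sketch: grow the state array on demand instead of using a fixed stack
 * array. Assumes the usual vkd3d_array_reserve() helper. Caveat: parent
 * pointers between states would dangle if realloc moves the array. */
struct copy_propagation_state *states = NULL;
size_t states_capacity = 0;

/* ... inside the per-iteration loop, before touching states[i + 1]: */
if (!vkd3d_array_reserve((void **)&states, &states_capacity, i + 2, sizeof(*states)))
    goto fail;
memset(&states[i + 1], 0, sizeof(states[i + 1]));
copy_propagation_state_init(ctx, &states[i + 1], &states[i]);
```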
> This is _correct_, but it leads to copy prop invalidating `i` and therefore not propagating it to subsequent iterations.
I suppose I'll have to see, but I'm not sure this is the way we want to solve that problem. In any case it's only an optimization, so I would leave it out for now.
> > Why loop_unrolling_find_unrollable_loop()? Usually we just iterate over all instructions, why are we not doing that here?
>
> Because we need to clone the blocks every time we attempt an unroll: as soon as an unroll is successful we have to clone the result as future unrolls might fail, so looping through the entire block once isn't feasible without poking with `clone_block`'s `struct clone_instr_map` to remap `loop->next` from the original block to the cloned one.
I don't understand.
> > With all that said, I still don't think we should need to mess with copy prop. What we should be able to do is something more like this: [code snippet]
>
> This is close to what this PR does, but it would run into the variable invalidation issue I mentioned above, and would also have a noticeable performance impact as copy prop would have to visit n² `if !broken`s.
By "variable invalidation issue" I gather you mean "poor performance with continue, because the compiler can't tell that the iteration block is executed on all branches"? Let's set that aside for now.
How bad is the performance penalty from visiting n² "if" blocks? Have you benchmarked it? I suspect that we should try to find a way to optimize copyprop independently of unrolling, if it's a problem.
What native does in this case is to create a structure like
```c
if (!broken) body;
if (!broken) body;
if (!broken) body;
```
which is well and good, but can't exactly be constructed through incremental and independent lowering like I was proposing. So we will indeed need to construct this manually.
If the only problem is ending up with a "chained" rather than "nested" structure to avoid hitting some depth limit, wouldn't it be simpler to add another pass that converts from one to the other? I didn't think this through (and I haven't read the MR yet), but superficially it doesn't look too hard. I guess that a nested unrolling of a loop would look like this:
```c
bool broken = false;

// loop body, which might set broken
if (!broken)
{
    // loop body, which might set broken
    if (!broken)
    {
        // etc
    }
}
```
We could run all the optimization on this structure, which is more conducive to copy propagation (because we're not polluting the control flow with variables that would have to be tracked). Then, for as long as trailing constructs like `if (!broken)` exist, we move them out of the outer selection construct. This looks sound to do. It would become:
```c
bool broken = false;

// loop body
if (!broken)
{
    // loop body
}
if (!broken)
{
    // etc
}
```
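For concreteness, a rough sketch of such a pass, assuming the unroller can recognize the `if (!broken)` checks it emitted; `block_last_instr()` and `is_broken_check()` are made-up helpers, while `hlsl_ir_if()` and the list primitives are the existing ones:
```c
/* Hedged sketch of the nested-to-chained conversion described above:
 * while the trailing instruction of an if (!broken) body is itself an
 * if (!broken), hoist it out so it becomes a sibling instead of a child.
 * Both checks test the same "broken" variable, and the inner check only
 * ran when the outer condition held, so semantics are preserved. */
static void flatten_broken_checks(struct hlsl_block *block)
{
    struct hlsl_ir_node *last = block_last_instr(block);

    while (last && is_broken_check(last))
    {
        struct hlsl_ir_if *iff = hlsl_ir_if(last);
        struct hlsl_ir_node *inner = block_last_instr(&iff->then_block);

        if (!inner || !is_broken_check(inner))
            break;

        /* Move the trailing inner check out, directly after the outer
         * one, then continue with whatever it contains. */
        list_remove(&inner->entry);
        list_add_after(&last->entry, &inner->entry);
        last = inner;
    }
}
```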
On Fri Apr 5 11:40:02 2024 +0000, Zebediah Figura wrote:
This is _correct_, but it leads to copy prop invalidating `i` and
therefore not propagating it to subsequent iterations. I suppose I'll have to see, but I'm not sure this is the way we want to solve that problem. In any case it's only an optimization, so I would leave it out for now.
Notice that native too might become unable to unroll a loop if you force it to lose the information of what is the iteration instruction. Compare for example https://shader-playground.timjones.io/5e211ad0eacd7f37429eed7821208dec and https://shader-playground.timjones.io/bbdbef08299b83cc0427884d49d656fc.
I've only briefly read the code so far, but is there any chance you could split the huge last commit a little? For example, splitting the jump removal part from the actual unrolling part? Or even further, if possible.
On Fri Apr 5 11:40:02 2024 +0000, Giovanni Mascellani wrote:
Notice that native too might become unable to unroll a loop if you force it to lose the information of what is the iteration instruction. Compare for example https://shader-playground.timjones.io/5e211ad0eacd7f37429eed7821208dec and https://shader-playground.timjones.io/bbdbef08299b83cc0427884d49d656fc.
Copying those shaders here to avoid depending on a possibly ephemeral site:
```hlsl
Buffer<float> buf;

float4 main() : sv_target
{
    float a = 0.0;
    int i = 0;

    [unroll]
    for (;; ++i)
    {
        if (!(i < 4))
            break;
        if (buf[i] < 0.1)
            break;
        if (buf[i] < 0.2)
            continue;
        a += buf[i];
    }

    return a;
}
```
```hlsl
Buffer<float> buf;

float4 main() : sv_target
{
    float a = 0.0;
    int i = 0;

    [unroll]
    for (;;)
    {
        if (!(i < 4))
            break;
        if (buf[i] < 0.1)
            break;
        if (buf[i] < 0.2)
        {
            ++i;
            continue;
        }
        a += buf[i];
        ++i;
    }

    return a;
}
```
On Fri Apr 5 20:00:48 2024 +0000, Zebediah Figura wrote:
Copying those shaders here to avoid depending on a possibly ephemeral site:
Note also, curiously, that you get the same difference even if you get rid of the continue block.
Regardless, I would prefer to set aside this optimization for now. I'd rather review a version of copyprop that does not rely on it, before adding that potential complexity, whether or not it's required (and native notwithstanding, it's not clear to me that we will require it).
On Fri Apr 5 20:12:52 2024 +0000, Zebediah Figura wrote:
Note also, curiously, that you get the same difference even if you get rid of the continue block. Regardless, I would prefer to set aside this optimization for now. I'd rather review a version of copyprop that does not rely on it, before adding that potential complexity, whether or not it's required (and native notwithstanding, it's not clear to me that we will require it).
I wouldn't call this an optimization; we will not be able to unroll loops with `continue`s at all if we drop 3/6 and 4/6.
Note also, curiously, that you get the same difference even if you get rid of the continue block.
This also happens with this MR, because copy prop marks `i` as invalid in the root block of the function, since it is stored to inside the `if (!broken)` blocks we generate (see below). I can only imagine something similar is happening in the native compiler.
By "variable invalidation issue" I gather you mean "poor performance with continue, because the compiler can't tell that the iteration block is executed on all branches"?
No, I mean failure to unroll shaders with conditional jumps, as the value of `i` will not be propagated forwards beyond the second iteration. Here's an example, using the shader from [here](https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/748#note_66697):
1st iteration
---
```hlsl
i = 0;
broken = false;
if (!broken)
{
    if (!(i < 100))
        broken = true;
    if (!b[i])
        broken = true;
}
```
After expr folding, copy prop, trivial if simplification (i.e. `transform_run_const_passes`):
```hlsl
i = 0;
broken = false;
if (!b[0])
    broken = true;
```
2nd iteration
---
```hlsl
i = 0;
broken = false;
if (!b[0])
    broken = true;
if (!broken)
{
    ++i;
    if (!(i < 100))
        broken = true;
    if (!b[i])
        broken = true;
}
```
Simplifying:
```hlsl
i = 0;
broken = false;
if (!b[0])
    broken = true;
if (!broken)
{
    ++i;
    if (!b[1])
        broken = true;
}
```
Things are still fine so far.
3rd iteration
---
This is where stuff breaks:
```hlsl
i = 0;
broken = false;
if (!b[0])
    broken = true;
if (!broken)
{
    ++i; /* this invalidates i in the parent block. */
    if (!b[1])
        broken = true;
}
if (!broken)
{
    /* i is invalid here */
    ++i;
    if (!(i < 100))
        broken = true;
    if (!b[i])
        broken = true;
}
```
Since `i` is invalid, nothing really simplifies, and we can't statically determine whether the loop breaks before `max_iterations`.
---
How bad is the performance penalty from visiting n² "if" blocks? Have you benchmarked it? I suspect that we should try to find a way to optimize copyprop independently of unrolling, if it's a problem.
Using the shader from [tests/hlsl/switch.shader_test:545](https://gitlab.winehq.org/wine/vkd3d/-/blob/master/tests/hlsl/switch.shader_...), which is probably the worst case in the vkd3d test suite:
267582ff7cbb43a7fee2b354c57cff7495f9267f (this MR):
```
$ time ./vkd3d-compiler -p ps_4_0 -x hlsl -b dxbc-tpf bad.hlsl -o out
bad.hlsl:14:1: W5306: Unable to unroll loop, maximum iterations reached (1024).
bad.hlsl:9:1: W5306: Unable to unroll loop, maximum iterations reached (1024).

real    0m0.403s
user    0m0.387s
sys     0m0.019s
```
[302834aa0be75d105e7f17487b6c475812fe8ec0](https://gitlab.winehq.org/vitorhnn/vkd3d/-/commit/302834aa0be75d105e7f17487b...) (Nesting ifs, passes all tests):
```
$ time ./vkd3d-compiler -p ps_4_0 -x hlsl -b dxbc-tpf bad.hlsl -o out

real    4m44.458s
user    4m44.075s
sys     0m0.018s
```
Interestingly, the native compiler also struggles to unroll this shader's outer loop (but only if it's forced to by the `[unroll]` attribute; otherwise it seems to bail out early).
wouldn't it be simpler to add another pass that converts from one to the other?
This was the approach I was going to take, until performance issues led me down the current path instead. That was my plan all the way until [ef6eeb84cdf3c297c2e69da3e924a3dc3a7e4f77](https://gitlab.winehq.org/vitorhnn/vkd3d/-/commit/ef6eeb84cdf3c297c2e69da3e9...), where I realized that with this approach I could just "chain" the iterations and wouldn't need such a pass.
On Sat Apr 6 00:07:04 2024 +0000, Victor Chiletto wrote:
I wouldn't call this an optimization; we will not be able to unroll loops with `continue`s at all if we drop 3/6 and 4/6.
Note also, curiously, that you get the same difference even if you get
rid of the continue block. This also happens with this MR, because copy prop marks `i` as invalid in the root block of the function, since it is stored to inside the `if (!broken)` blocks we generate (see below).
Well, it is an optimization, albeit quite an important one.
Part of what I'm asking here is that we start with the basic, generic version of unrolling, then add the necessary optimizations to make it reasonably performant. That'll aid review quite a lot.
I've been thinking about this, and also taking a closer look at the patches. As usual I've taken the long way around to come to the same conclusions the original author did. For the benefit of future readers I'll probably write a long-form comment for inclusion in the code.
The "hooking into" copy-prop actually makes a lot of sense now that I look, and it's not really that bad. All we're doing is saving state from the last block and using it on the new one. I'm inclined to think it's exactly what we want to do, maybe with minor adjustments.
I do still think we should hold off on the "iter" part though.
This merge request was closed by Victor Chiletto.
Closing in favor of !786.