**NOTE**: In the end I decided to send the first part of this MR separately as !616, which has already been merged.
For temporary registers, SM1-SM3 integer types are internally represented as floating point, so a cast from int to float only needs a MOV.
By the same token, casts from float to int can be implemented as a FLOOR + MOV, where the FLOOR is then lowered by the lower_floor() pass. Note that FLOOR rounds toward negative infinity while a cast to int truncates toward zero, so the two only agree for non-negative values (floor(-0.5) is -1, but (int)-0.5 is 0).
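The following is a minimal C sketch that compares the FLOOR + MOV lowering with a C-style cast. It assumes integer values are carried in float registers, as described above. `model_floor()` stands in for the value an SM1 FLOOR produces. All function names are hypothetical. `trunc_from_floor()` shows one possible way to recover truncation from FLOOR and the fractional part. It is not what this MR implements.

```c
#include <assert.h>

/* Stand-in for the value an SM1 FLOOR produces (round toward negative
 * infinity), written without libm; assumes |x| < 2^31. */
static float model_floor(float x)
{
    float t;

    assert(x > -2147483648.0f && x < 2147483648.0f);
    t = (float)(long)x; /* C conversion truncates toward zero */
    return t > x ? t - 1.0f : t;
}

/* The lowering discussed above: FLOOR, then a plain MOV that copies the
 * floored value unchanged, since integers live in float registers. */
static float sm1_cast_to_int(float x)
{
    return model_floor(x);
}

/* Reference behaviour of a C-style float-to-int cast: truncation toward zero. */
static float reference_cast_to_int(float x)
{
    return (float)(int)x;
}

/* Hypothetical alternative, not what this MR does: recover truncation from
 * FLOOR plus the fractional part (what SM1's frc would return). */
static float trunc_from_floor(float x)
{
    float f = model_floor(x);
    float frc = x - f; /* in [0, 1) */

    return (x < 0.0f && frc > 0.0f) ? f + 1.0f : f;
}
```

The cast-to-int test added in this MR only uses positive inputs (10.3, 11.5, 12.8). For those inputs, both behaviours coincide.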
For the constant integer registers "iN", there is no instruction that casts a floating-point register into them. For the address registers "aN" and the loop counter register "aL", vertex shaders have the "mova" instruction, but we don't use these registers in any way yet.
We will probably want to introduce these as synthetic variables allocated in a special register set. In that case we must remember to use MOVA instead of MOV when storing to them. However, they should never be the source or destination of a CAST operation.
Regarding constant integer registers, some shaders expect constants to be received as integers, for example:
    int m;

    float4 main() : sv_target
    {
        float4 res = {0, 0, 0, 0};

        for (int k = 0; k < m; ++k)
            res += k;
        return res;
    }
which compiles as:
    // Registers:
    //
    //   Name         Reg   Size
    //   ------------ ----- ----
    //   m            i0       1
    //

    ps_3_0
    def c0, 0, 1, 0, 0
    mov r0, c0.x
    mov r1.x, c0.x
    rep i0
      add r0, r0, r1.x
      add r1.x, r1.x, c0.y
    endrep
    mov oC0, r0
However, this only happens when the integer constant is used directly by an instruction that requires it. As I said, there is no instruction that converts it to a floating-point representation.
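Here is a hand translation of the first ps_3_0 listing into C, to make it easier to follow. It assumes `rep i0` runs its body exactly i0.x times. The function names are illustrative. Only one channel is tracked, because every channel of `res` receives the same value.

```c
#include <assert.h>

/* Hand translation of the first ps_3_0 listing; i0_x stands for the integer
 * constant "m" (rep only reads the .x component of i0). */
static float run_rep_listing(int i0_x)
{
    const float c0[4] = {0.0f, 1.0f, 0.0f, 0.0f}; /* def c0, 0, 1, 0, 0 */
    float r0 = c0[0];                             /* mov r0, c0.x */
    float r1_x = c0[0];                           /* mov r1.x, c0.x */

    for (int i = 0; i < i0_x; ++i)                /* rep i0 */
    {
        r0 = r0 + r1_x;                           /* add r0, r0, r1.x */
        r1_x = r1_x + c0[1];                      /* add r1.x, r1.x, c0.y */
    }                                             /* endrep */
    return r0;                                    /* mov oC0, r0 */
}

/* What the HLSL source computes: the sum of k for 0 <= k < m. */
static float hlsl_reference(int m)
{
    float res = 0.0f;

    for (int k = 0; k < m; ++k)
        res += k;
    return res;
}
```

Both functions return the same value for any m >= 0, for example 6 for m = 4.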
Notice how a more complex shader, which performs operations on the integer variable "m":
    int m;

    float4 main() : sv_target
    {
        float4 res = {0, 0, 0, 0};

        for (int k = 0; k < m * m; ++k)
            res += k;
        return res;
    }
gives the following output:
    // Registers:
    //
    //   Name         Reg   Size
    //   ------------ ----- ----
    //   m            c0       1
    //

    ps_3_0
    def c1, 0, 0, 1, 0
    defi i0, 255, 0, 0, 0
    mul r0.x, c0.x, c0.x
    mov r1, c1.y
    mov r0.y, c1.y
    rep i0
      mov r0.z, r0.x
      break_ge r0.y, r0.z
      add r1, r0.y, r1
      add r0.y, r0.y, c1.z
    endrep
    mov oC0, r1
This means three things. First, the uniform "m" is simply stored as a floating-point value in "c0". Second, the constant integer register "i0" is just set to 255 with "defi" (hoping that this is a high enough iteration count). Third, "break_ge" leaves the loop once the counter in r0.y reaches m * m, which is computed from c0.x into r0.x and copied to r0.z.
We could potentially use this approach to implement SM3 loops without expecting the variables to be received in constant integer registers.
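A hand translation of the second ps_3_0 listing makes the effect of the fixed `defi i0, 255, 0, 0, 0` bound concrete. The sketch makes two assumptions. First, `rep i0` runs at most i0.x times. Second, `break_ge` exits when its first operand is greater than or equal to its second. The function names are illustrative.

```c
#include <assert.h>

/* Hand translation of the second ps_3_0 listing: "m" arrives as a float in
 * c0.x, the loop is bounded by i0.x = 255, and break_ge exits early once the
 * counter reaches m * m. */
static float run_bounded_listing(float c0_x)
{
    const float c1[4] = {0.0f, 0.0f, 1.0f, 0.0f}; /* def c1, 0, 0, 1, 0 */
    const int i0_x = 255;                         /* defi i0, 255, 0, 0, 0 */
    float r0_x = c0_x * c0_x;                     /* mul r0.x, c0.x, c0.x */
    float r1 = c1[1];                             /* mov r1, c1.y */
    float r0_y = c1[1];                           /* mov r0.y, c1.y */

    for (int i = 0; i < i0_x; ++i)                /* rep i0 */
    {
        float r0_z = r0_x;                        /* mov r0.z, r0.x */

        if (r0_y >= r0_z)                         /* break_ge r0.y, r0.z */
            break;
        r1 = r0_y + r1;                           /* add r1, r0.y, r1 */
        r0_y = r0_y + c1[2];                      /* add r0.y, r0.y, c1.z */
    }                                             /* endrep */
    return r1;                                    /* mov oC0, r1 */
}

/* What the HLSL source computes: the sum of k for 0 <= k < m * m. */
static float hlsl_reference_squared(int m)
{
    float res = 0.0f;

    for (int k = 0; k < m * m; ++k)
        res += k;
    return res;
}
```

For m = 15 (225 iterations), both functions return 25200. For m = 16, the translated listing returns 32385 instead of 32640, because the loop is cut off after 255 iterations.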
According to the D3D documentation, for SM1-SM3 constant integer registers are only used by the 'loop' and 'rep' instructions.
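For reference, here is a small C model of how `loop aL, i#` consumes an integer constant. It assumes the component layout described in the D3D9 assembly documentation: i#.x is the iteration count, i#.y the initial value of aL, and i#.z the step added to aL after each iteration. `rep` only uses i#.x. The struct and function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical stand-in for an integer constant register i#. */
struct int_constant
{
    int x, y, z, w;
};

/* Models "loop aL, i#" under the assumed layout: i#.x iterations, aL starting
 * at i#.y and advancing by i#.z. Returns the sum of the aL values observed by
 * the loop body. */
static int loop_sum_of_aL(struct int_constant i)
{
    int aL = i.y, sum = 0;

    assert(i.x >= 0);
    for (int iter = 0; iter < i.x; ++iter)
    {
        sum += aL;
        aL += i.z;
    }
    return sum;
}
```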
-- 
v4: vkd3d-shader/hlsl: Lower casts to int for SM1.
    tests: Add simple test for implicit cast to int.
    vkd3d-shader/d3dbc: Implement casts from ints to floats as a MOV.
From: Francisco Casas fcasas@codeweavers.com
---
 libs/vkd3d-shader/d3dbc.c       | 82 +++++++++++++++++++++++++++++++++
 tests/hlsl/distance.shader_test |  2 +-
 tests/hlsl/half.shader_test     |  4 +-
 tests/hlsl/ldexp.shader_test    |  4 +-
 tests/hlsl/lerp.shader_test     |  6 +--
 5 files changed, 90 insertions(+), 8 deletions(-)
diff --git a/libs/vkd3d-shader/d3dbc.c b/libs/vkd3d-shader/d3dbc.c
index 9ad9f735d..27aec124d 100644
--- a/libs/vkd3d-shader/d3dbc.c
+++ b/libs/vkd3d-shader/d3dbc.c
@@ -1958,6 +1958,84 @@ static void write_sm1_unary_op(struct hlsl_ctx *ctx, struct vkd3d_bytecode_buffe
     write_sm1_instruction(ctx, buffer, &instr);
 }
 
+static void write_sm1_cast(struct hlsl_ctx *ctx, struct vkd3d_bytecode_buffer *buffer,
+        const struct hlsl_ir_node *instr)
+{
+    struct hlsl_ir_expr *expr = hlsl_ir_expr(instr);
+    const struct hlsl_ir_node *arg1 = expr->operands[0].node;
+    const struct hlsl_type *dst_type = expr->node.data_type;
+    const struct hlsl_type *src_type = arg1->data_type;
+
+    /* Narrowing casts were already lowered. */
+    assert(src_type->dimx == dst_type->dimx);
+
+    switch (dst_type->base_type)
+    {
+        case HLSL_TYPE_HALF:
+        case HLSL_TYPE_FLOAT:
+            switch (src_type->base_type)
+            {
+                case HLSL_TYPE_INT:
+                case HLSL_TYPE_UINT:
+                    /* Integers are internally represented as floats, so no change is necessary.*/
+                case HLSL_TYPE_HALF:
+                case HLSL_TYPE_FLOAT:
+                    write_sm1_unary_op(ctx, buffer, D3DSIO_MOV, &instr->reg, &arg1->reg, 0, 0);
+                    break;
+
+                case HLSL_TYPE_BOOL:
+                    hlsl_fixme(ctx, &instr->loc, "SM1 cast from bool to float.");
+                    break;
+
+                case HLSL_TYPE_DOUBLE:
+                    hlsl_fixme(ctx, &instr->loc, "SM1 cast from double to float.");
+                    break;
+
+                default:
+                    vkd3d_unreachable();
+            }
+            break;
+
+        case HLSL_TYPE_INT:
+        case HLSL_TYPE_UINT:
+            switch(src_type->base_type)
+            {
+                case HLSL_TYPE_INT:
+                case HLSL_TYPE_UINT:
+                    write_sm1_unary_op(ctx, buffer, D3DSIO_MOV, &instr->reg, &arg1->reg, 0, 0);
+                    break;
+
+                case HLSL_TYPE_HALF:
+                case HLSL_TYPE_FLOAT:
+                    hlsl_fixme(ctx, &instr->loc, "SM1 cast from float to integer.");
+                    break;
+
+                case HLSL_TYPE_BOOL:
+                    hlsl_fixme(ctx, &instr->loc, "SM1 cast from bool to integer.");
+                    break;
+
+                case HLSL_TYPE_DOUBLE:
+                    hlsl_fixme(ctx, &instr->loc, "SM1 cast from double to integer.");
+                    break;
+
+                default:
+                    vkd3d_unreachable();
+            }
+            break;
+
+        case HLSL_TYPE_DOUBLE:
+            hlsl_fixme(ctx, &instr->loc, "SM1 cast to double.");
+            break;
+
+        case HLSL_TYPE_BOOL:
+            /* Casts to bool should have already been lowered. */
+        default:
+            hlsl_fixme(ctx, &expr->node.loc, "SM1 cast from %s to %s.\n",
+                debug_hlsl_type(ctx, src_type), debug_hlsl_type(ctx, dst_type));
+            break;
+    }
+}
+
 static void write_sm1_constant_defs(struct hlsl_ctx *ctx, struct vkd3d_bytecode_buffer *buffer)
 {
     unsigned int i, x;
@@ -2179,6 +2257,10 @@ static void write_sm1_expr(struct hlsl_ctx *ctx, struct vkd3d_bytecode_buffer *b
             write_sm1_unary_op(ctx, buffer, D3DSIO_ABS, &instr->reg, &arg1->reg, 0, 0);
             break;
 
+        case HLSL_OP1_CAST:
+            write_sm1_cast(ctx, buffer, instr);
+            break;
+
         case HLSL_OP1_DSX:
             write_sm1_unary_op(ctx, buffer, D3DSIO_DSX, &instr->reg, &arg1->reg, 0, 0);
             break;
diff --git a/tests/hlsl/distance.shader_test b/tests/hlsl/distance.shader_test
index bf2423c7a..3f5446451 100644
--- a/tests/hlsl/distance.shader_test
+++ b/tests/hlsl/distance.shader_test
@@ -13,7 +13,7 @@ uniform 4 float4 2.0 -1.0 4.0 5.0
 draw quad
 probe all rgba (7.483983, 7.483983, 7.483983, 7.483983) 1
 
-[pixel shader todo(sm<4)]
+[pixel shader]
 uniform int4 x;
 uniform int4 y;
diff --git a/tests/hlsl/half.shader_test b/tests/hlsl/half.shader_test
index 8cf7a756f..fe7074e45 100644
--- a/tests/hlsl/half.shader_test
+++ b/tests/hlsl/half.shader_test
@@ -9,7 +9,7 @@ float4 main() : sv_target
 [require]
 options: backcompat
 
-[pixel shader todo(sm<4)]
+[pixel shader]
 uniform half h;
 
 float4 main() : sv_target
@@ -19,5 +19,5 @@ float4 main() : sv_target
 
 [test]
 uniform 0 float 10.0
-todo(sm<4) draw quad
+draw quad
 probe all rgba (10.0, 10.0, 10.0, 10.0)
diff --git a/tests/hlsl/ldexp.shader_test b/tests/hlsl/ldexp.shader_test
index 2d778e077..75492b66d 100644
--- a/tests/hlsl/ldexp.shader_test
+++ b/tests/hlsl/ldexp.shader_test
@@ -14,7 +14,7 @@ draw quad
 probe all rgba (2.0, 0.00292968750, 4096.0, 6.33825300e+030) 2
 
 
-[pixel shader todo(sm<4)]
+[pixel shader]
 uniform int4 x;
 uniform int4 y;
 
@@ -28,7 +28,7 @@ if(sm<4) uniform 0 float4 2 3 4 5
 if(sm<4) uniform 4 float4 0 -10 10 100
 if(sm>=4) uniform 0 int4 2 3 4 5
 if(sm>=4) uniform 4 int4 0 -10 10 100
-todo(sm<4) draw quad
+draw quad
 probe all rgba (2.0, 0.00292968750, 4096.0, 6.33825300e+030) 2
diff --git a/tests/hlsl/lerp.shader_test b/tests/hlsl/lerp.shader_test
index 901857dfd..ae6ef5343 100644
--- a/tests/hlsl/lerp.shader_test
+++ b/tests/hlsl/lerp.shader_test
@@ -16,7 +16,7 @@ draw quad
 probe all rgba (2.0, -10.0, -2.0, 76.25)
 
 
-[pixel shader todo(sm<4)]
+[pixel shader]
 uniform int4 x;
 uniform int4 y;
 uniform int4 s;
@@ -33,8 +33,8 @@ if(sm<4) uniform 8 float4 0 1 -1 1000000
 if(sm>=4) uniform 0 int4 2 3 4 0
 if(sm>=4) uniform 4 int4 0 -10 10 1000000
 if(sm>=4) uniform 8 int4 0 1 -1 1000000
-todo(sm<4) draw quad
-probe all rgba (2.0, -10.0, -2.0, 1e12)
+draw quad
+probe all rgba (2.0, -10.0, -2.0, 1e12) 4
 
 
 [pixel shader]
From: Francisco Casas fcasas@codeweavers.com
---
 tests/hlsl/cast-to-int.shader_test | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/tests/hlsl/cast-to-int.shader_test b/tests/hlsl/cast-to-int.shader_test
index 385e3375f..4605edb8e 100644
--- a/tests/hlsl/cast-to-int.shader_test
+++ b/tests/hlsl/cast-to-int.shader_test
@@ -1,3 +1,19 @@
+[pixel shader todo(sm<4)]
+uniform float3 f;
+
+float4 main() : sv_target
+{
+    int3 r = f;
+
+    return float4(r, 0);
+}
+
+[test]
+uniform 0 float4 10.3 11.5 12.8 13.1
+todo(sm<4) draw quad
+probe all rgba (10, 11, 12, 0)
+
+
 [pixel shader todo(sm<4)]
 uniform float f;
 uniform uint u;
From: Francisco Casas fcasas@codeweavers.com
---
 libs/vkd3d-shader/d3dbc.c          | 18 +++++++++---------
 libs/vkd3d-shader/hlsl_codegen.c   | 34 ++++++++++++++++++++++++++++++++++
 tests/hlsl/cast-to-int.shader_test |  4 ++--
 tests/hlsl/ceil.shader_test        |  8 ++++----
 tests/hlsl/floor.shader_test       |  8 ++++----
 tests/hlsl/round.shader_test       |  8 ++++----
 tests/hlsl/saturate.shader_test    |  4 ++--
 7 files changed, 59 insertions(+), 25 deletions(-)
diff --git a/libs/vkd3d-shader/d3dbc.c b/libs/vkd3d-shader/d3dbc.c
index 27aec124d..27f5c8104 100644
--- a/libs/vkd3d-shader/d3dbc.c
+++ b/libs/vkd3d-shader/d3dbc.c
@@ -2000,16 +2000,14 @@ static void write_sm1_cast(struct hlsl_ctx *ctx, struct vkd3d_bytecode_buffer *b
         case HLSL_TYPE_UINT:
             switch(src_type->base_type)
             {
+                case HLSL_TYPE_HALF:
+                case HLSL_TYPE_FLOAT:
+                    /* A compilation pass applies a FLOOR operation to casts to int, so no change is necessary. */
                 case HLSL_TYPE_INT:
                 case HLSL_TYPE_UINT:
                     write_sm1_unary_op(ctx, buffer, D3DSIO_MOV, &instr->reg, &arg1->reg, 0, 0);
                     break;
 
-                case HLSL_TYPE_HALF:
-                case HLSL_TYPE_FLOAT:
-                    hlsl_fixme(ctx, &instr->loc, "SM1 cast from float to integer.");
-                    break;
-
                 case HLSL_TYPE_BOOL:
                     hlsl_fixme(ctx, &instr->loc, "SM1 cast from bool to integer.");
                     break;
@@ -2244,6 +2242,12 @@ static void write_sm1_expr(struct hlsl_ctx *ctx, struct vkd3d_bytecode_buffer *b
 
     assert(instr->reg.allocated);
 
+    if (expr->op == HLSL_OP1_CAST)
+    {
+        write_sm1_cast(ctx, buffer, instr);
+        return;
+    }
+
     if (instr->data_type->base_type != HLSL_TYPE_FLOAT)
     {
         /* These need to be lowered. */
@@ -2257,10 +2261,6 @@ static void write_sm1_expr(struct hlsl_ctx *ctx, struct vkd3d_bytecode_buffer *b
             write_sm1_unary_op(ctx, buffer, D3DSIO_ABS, &instr->reg, &arg1->reg, 0, 0);
             break;
 
-        case HLSL_OP1_CAST:
-            write_sm1_cast(ctx, buffer, instr);
-            break;
-
         case HLSL_OP1_DSX:
             write_sm1_unary_op(ctx, buffer, D3DSIO_DSX, &instr->reg, &arg1->reg, 0, 0);
             break;
diff --git a/libs/vkd3d-shader/hlsl_codegen.c b/libs/vkd3d-shader/hlsl_codegen.c
index 6ad60e4c6..4121fadf3 100644
--- a/libs/vkd3d-shader/hlsl_codegen.c
+++ b/libs/vkd3d-shader/hlsl_codegen.c
@@ -2647,6 +2647,39 @@ static bool sort_synthetic_separated_samplers_first(struct hlsl_ctx *ctx)
     return false;
 }
 
+/* Append a FLOOR before a CAST to int or uint (which is written as a mere MOV). */
+static bool lower_casts_to_int(struct hlsl_ctx *ctx, struct hlsl_ir_node *instr, struct hlsl_block *block)
+{
+    struct hlsl_ir_node *arg, *floor, *cast2;
+    struct hlsl_ir_expr *expr;
+
+    if (instr->type != HLSL_IR_EXPR)
+        return false;
+    expr = hlsl_ir_expr(instr);
+    if (expr->op != HLSL_OP1_CAST)
+        return false;
+
+    arg = expr->operands[0].node;
+    if (instr->data_type->base_type != HLSL_TYPE_INT && instr->data_type->base_type != HLSL_TYPE_UINT)
+        return false;
+    if (arg->data_type->base_type != HLSL_TYPE_FLOAT && arg->data_type->base_type != HLSL_TYPE_HALF)
+        return false;
+
+    /* Check that the argument is not already a FLOOR */
+    if (arg->type == HLSL_IR_EXPR && hlsl_ir_expr(arg)->op == HLSL_OP1_FLOOR)
+        return false;
+
+    if (!(floor = hlsl_new_unary_expr(ctx, HLSL_OP1_FLOOR, arg, &instr->loc)))
+        return false;
+    hlsl_block_add_instr(block, floor);
+
+    if (!(cast2 = hlsl_new_cast(ctx, floor, instr->data_type, &instr->loc)))
+        return false;
+    hlsl_block_add_instr(block, cast2);
+
+    return true;
+}
+
 /* Lower DIV to RCP + MUL. */
 static bool lower_division(struct hlsl_ctx *ctx, struct hlsl_ir_node *instr, struct hlsl_block *block)
 {
@@ -5060,6 +5093,7 @@ int hlsl_emit_bytecode(struct hlsl_ctx *ctx, struct hlsl_ir_function_decl *entry
     lower_ir(ctx, lower_ternary, body);
     if (profile->major_version < 4)
     {
+        lower_ir(ctx, lower_casts_to_int, body);
         lower_ir(ctx, lower_division, body);
         lower_ir(ctx, lower_sqrt, body);
         lower_ir(ctx, lower_dot, body);
diff --git a/tests/hlsl/cast-to-int.shader_test b/tests/hlsl/cast-to-int.shader_test
index 4605edb8e..f35bb5604 100644
--- a/tests/hlsl/cast-to-int.shader_test
+++ b/tests/hlsl/cast-to-int.shader_test
@@ -1,4 +1,4 @@
-[pixel shader todo(sm<4)]
+[pixel shader]
 uniform float3 f;
 
 float4 main() : sv_target
@@ -10,7 +10,7 @@ float4 main() : sv_target
 
 [test]
 uniform 0 float4 10.3 11.5 12.8 13.1
-todo(sm<4) draw quad
+draw quad
 probe all rgba (10, 11, 12, 0)
diff --git a/tests/hlsl/ceil.shader_test b/tests/hlsl/ceil.shader_test
index aa4bdf297..91bfe1991 100644
--- a/tests/hlsl/ceil.shader_test
+++ b/tests/hlsl/ceil.shader_test
@@ -21,7 +21,7 @@ uniform 0 float4 -0.5 6.5 7.5 3.4
 draw quad
 probe all rgba (0.0, 7.0, 8.0, 4.0) 4
 
-[pixel shader todo(sm<4)]
+[pixel shader]
 uniform float4 u;
 
 float4 main() : sv_target
@@ -34,10 +34,10 @@ float4 main() : sv_target
 
 [test]
 uniform 0 float4 -0.5 6.5 7.5 3.4
-todo(sm<4) draw quad
+draw quad
 probe all rgba (7.0, 8.0, 0.0, 4.0) 4
 
-[pixel shader todo(sm<4)]
+[pixel shader]
 uniform int4 u;
 
 float4 main() : sv_target
@@ -51,5 +51,5 @@ float4 main() : sv_target
 [test]
 if(sm<4) uniform 0 float4 -1 6 7 3
 if(sm>=4) uniform 0 int4 -1 6 7 3
-todo(sm<4) draw quad
+draw quad
 probe all rgba (6.0, 7.0, -1.0, 3.0) 4
diff --git a/tests/hlsl/floor.shader_test b/tests/hlsl/floor.shader_test
index ed1ab7688..b3552515a 100644
--- a/tests/hlsl/floor.shader_test
+++ b/tests/hlsl/floor.shader_test
@@ -21,7 +21,7 @@ uniform 0 float4 -0.5 6.5 7.5 3.4
 draw quad
 probe all rgba (-1.0, 6.0, 7.0, 3.0) 4
 
-[pixel shader todo(sm<4)]
+[pixel shader]
 uniform float4 u;
 
 float4 main() : sv_target
@@ -34,11 +34,11 @@ float4 main() : sv_target
 
 [test]
 uniform 0 float4 -0.5 6.5 7.5 3.4
-todo(sm<4) draw quad
+draw quad
 probe all rgba (6.0, 7.0, -1.0, 3.0) 4
 
 
-[pixel shader todo(sm<4)]
+[pixel shader]
 uniform int4 u;
 
 float4 main() : sv_target
@@ -52,5 +52,5 @@ float4 main() : sv_target
 [test]
 if(sm<4) uniform 0 float4 -1 6 7 3
 if(sm>=4) uniform 0 int4 -1 6 7 3
-todo(sm<4) draw quad
+draw quad
 probe all rgba (6.0, 7.0, -1.0, 3.0) 4
diff --git a/tests/hlsl/round.shader_test b/tests/hlsl/round.shader_test
index 7b4c68cb7..b9234b010 100644
--- a/tests/hlsl/round.shader_test
+++ b/tests/hlsl/round.shader_test
@@ -13,7 +13,7 @@ probe all rgba (0.0, -7.0, 8.0, 3.0) 4
 
 
 
-[pixel shader todo(sm<4)]
+[pixel shader]
 uniform float4 u;
 
 float4 main() : sv_target
@@ -26,12 +26,12 @@ float4 main() : sv_target
 
 [test]
 uniform 0 float4 -0.4 -6.6 7.6 3.4
-todo(sm<4) draw quad
+draw quad
 probe all rgba (-7.0, 8.0, 0.0, 3.0) 4
 
 
 
-[pixel shader todo(sm<4)]
+[pixel shader]
 uniform float4 u;
 
 float4 main() : sv_target
@@ -42,5 +42,5 @@ float4 main() : sv_target
 
 [test]
 uniform 0 float4 -1 0 2 10
-todo(sm<4) draw quad
+draw quad
 probe all rgba (-1.0, 0.0, 2.0, 10.0) 4
diff --git a/tests/hlsl/saturate.shader_test b/tests/hlsl/saturate.shader_test
index 6852015b2..2ed83cf66 100644
--- a/tests/hlsl/saturate.shader_test
+++ b/tests/hlsl/saturate.shader_test
@@ -11,7 +11,7 @@ uniform 0 float4 0.7 -0.1 0.0 0.0
 todo(sm>=6) draw quad
 probe all rgba (0.7, 0.0, 1.0, 0.0)
 
-[pixel shader todo(sm<4)]
+[pixel shader]
 uniform float4 u;
 
 float4 main() : sv_target
@@ -22,5 +22,5 @@ float4 main() : sv_target
 
 [test]
 uniform 0 float4 -2 0 2 -1
-todo(sm<4 | sm>=6) draw quad
+todo(sm>=6) draw quad
 probe all rgba (0.0, 0.0, 1.0, 0.0)
This merge request was approved by Zebediah Figura.
Giovanni Mascellani (@giomasce) commented about libs/vkd3d-shader/hlsl_codegen.c:
    if (instr->type != HLSL_IR_EXPR)
        return false;
    expr = hlsl_ir_expr(instr);
    if (expr->op != HLSL_OP1_CAST)
        return false;
    arg = expr->operands[0].node;
    if (instr->data_type->base_type != HLSL_TYPE_INT && instr->data_type->base_type != HLSL_TYPE_UINT)
        return false;
    if (arg->data_type->base_type != HLSL_TYPE_FLOAT && arg->data_type->base_type != HLSL_TYPE_HALF)
        return false;
    /* Check that the argument is not already a FLOOR */
    if (arg->type == HLSL_IR_EXPR && hlsl_ir_expr(arg)->op == HLSL_OP1_FLOOR)
        return false;
It's ok for now, but it would be nice if this would eventually become another optimization pass, so that it covers other cases of double floors (or double whatever other idempotent operation).
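Here is a minimal sketch of the kind of generic pass suggested here. It uses a toy single-operand IR rather than vkd3d's `hlsl_ir_node`, and all names are hypothetical. The pass collapses f(f(x)) into f(x) whenever f is idempotent, such as FLOOR, ABS or SATURATE. It reports whether it changed anything, so it can run inside a fixed-point loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy single-operand IR; these names are hypothetical, not vkd3d's. */
enum toy_op
{
    TOY_VALUE,
    TOY_FLOOR,
    TOY_ABS,
    TOY_SATURATE,
    TOY_NEG,
};

struct toy_node
{
    enum toy_op op;
    struct toy_node *arg; /* NULL for TOY_VALUE */
};

/* f(f(x)) == f(x) holds for these operations. */
static bool toy_op_is_idempotent(enum toy_op op)
{
    return op == TOY_FLOOR || op == TOY_ABS || op == TOY_SATURATE;
}

/* Walks the operand chain and collapses f(f(x)) into f(x) for idempotent f.
 * Returns whether anything changed. */
static bool fold_repeated_idempotent_ops(struct toy_node *node)
{
    bool progress = false;

    for (; node; node = node->arg)
    {
        while (toy_op_is_idempotent(node->op) && node->arg && node->arg->op == node->op)
        {
            node->arg = node->arg->arg; /* skip the redundant inner operation */
            progress = true;
        }
    }
    return progress;
}
```

NEG is deliberately excluded, since NEG(NEG(x)) is x rather than NEG(x).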
This merge request was approved by Giovanni Mascellani.
This merge request was approved by Henri Verbeet.
On Thu Feb 15 11:41:30 2024 +0000, Giovanni Mascellani wrote:
It's ok for now, but it would be nice if this would eventually become another optimization pass, so that it covers other cases of double floors (or double whatever other idempotent operation).
I wrote this part to ensure that the transform_ir() call using this pass doesn't return true every time it is called, in case it is used in a loop, but it makes sense to also have a pass to lower idempotent functions.
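The termination argument can be illustrated with a toy model. The names are hypothetical and this is not vkd3d's API. The model uses a pass shaped like lower_casts_to_int() that wraps the cast operand in a FLOOR, and runs it repeatedly until it reports no progress. With the already-a-FLOOR check, it stabilizes after one round. Without the check, every round adds another FLOOR, so the loop only ends at the artificial round limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy IR for a single cast; these names are hypothetical, not vkd3d's. */
enum lowering_op
{
    LOWERING_FLOAT_VALUE,
    LOWERING_FLOOR,
    LOWERING_CAST_TO_INT,
};

struct lowering_node
{
    enum lowering_op op;
    struct lowering_node *arg;
};

/* Same shape as lower_casts_to_int(): wrap the operand of a cast in a FLOOR.
 * When skip_existing_floor is set, an operand that is already a FLOOR is left
 * alone. Returns whether the IR changed. */
static bool toy_lower_cast(struct lowering_node *cast, struct lowering_node *pool, size_t *used,
        bool skip_existing_floor)
{
    struct lowering_node *floor_node;

    if (cast->op != LOWERING_CAST_TO_INT)
        return false;
    if (skip_existing_floor && cast->arg->op == LOWERING_FLOOR)
        return false;

    floor_node = &pool[(*used)++];
    floor_node->op = LOWERING_FLOOR;
    floor_node->arg = cast->arg;
    cast->arg = floor_node;
    return true;
}

/* Reruns the pass while it reports progress, stopping after max_rounds or
 * when the node pool is exhausted. Returns the number of rounds that changed
 * the IR. */
static int run_until_stable(struct lowering_node *cast, struct lowering_node *pool, size_t pool_size,
        bool skip_existing_floor, int max_rounds)
{
    size_t used = 0;
    int rounds = 0;

    while (rounds < max_rounds && used < pool_size
            && toy_lower_cast(cast, pool, &used, skip_existing_floor))
        ++rounds;
    return rounds;
}
```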
This merge request was approved by Matteo Bruni.