SPIR-V says this is undefined behaviour, but Direct3D actually specifies that it should clamp. Most drivers do clamp, but llvmpipe does not. Hence this fixes a couple of tests with llvmpipe.
This does of course add overhead to the ftou instruction, but I cannot imagine that it is a common instruction to execute. This also is not the first time we perform such checks; cf. udiv.
From: Zebediah Figura zfigura@codeweavers.com
--- libs/vkd3d-shader/spirv.c | 41 +++++++++++++++++++++++++++++++++++++-- 1 file changed, 39 insertions(+), 2 deletions(-)
diff --git a/libs/vkd3d-shader/spirv.c b/libs/vkd3d-shader/spirv.c index d71f0a69..63b4107f 100644 --- a/libs/vkd3d-shader/spirv.c +++ b/libs/vkd3d-shader/spirv.c @@ -1715,6 +1715,15 @@ static uint32_t vkd3d_spirv_build_op_glsl_std450_cos(struct vkd3d_spirv_builder return vkd3d_spirv_build_op_glsl_std450_tr1(builder, GLSLstd450Cos, result_type, operand); }
+static uint32_t vkd3d_spirv_build_op_glsl_std450_max(struct vkd3d_spirv_builder *builder, + uint32_t result_type, uint32_t x, uint32_t y) +{ + uint32_t glsl_std450_id = vkd3d_spirv_get_glsl_std450_instr_set(builder); + uint32_t operands[] = {x, y}; + return vkd3d_spirv_build_op_ext_inst(builder, result_type, glsl_std450_id, + GLSLstd450NMax, operands, ARRAY_SIZE(operands)); +} + static uint32_t vkd3d_spirv_build_op_glsl_std450_nclamp(struct vkd3d_spirv_builder *builder, uint32_t result_type, uint32_t x, uint32_t min, uint32_t max) { @@ -6530,7 +6539,6 @@ static SpvOp spirv_compiler_map_alu_instruction(const struct vkd3d_shader_instru {VKD3DSIH_DTOU, SpvOpConvertFToU}, {VKD3DSIH_FTOD, SpvOpFConvert}, {VKD3DSIH_FTOI, SpvOpConvertFToS}, - {VKD3DSIH_FTOU, SpvOpConvertFToU}, {VKD3DSIH_IADD, SpvOpIAdd}, {VKD3DSIH_INEG, SpvOpSNegate}, {VKD3DSIH_ISHL, SpvOpShiftLeftLogical}, @@ -6996,6 +7004,33 @@ static void spirv_compiler_emit_udiv(struct spirv_compiler *compiler, } }
+static void spirv_compiler_emit_ftou(struct spirv_compiler *compiler, + const struct vkd3d_shader_instruction *instruction) +{ + struct vkd3d_spirv_builder *builder = &compiler->spirv_builder; + const struct vkd3d_shader_dst_param *dst = instruction->dst; + const struct vkd3d_shader_src_param *src = instruction->src; + uint32_t src_type_id, type_id, src_id, zero_id, val_id; + + assert(instruction->dst_count == 1); + assert(instruction->src_count == 1); + + /* OpConvertFToU has undefined results if the result cannot be represented + * as an unsigned integer, but Direct3D expects the result to saturate, + * and for NaN to yield zero. */ + + src_type_id = spirv_compiler_get_type_id_for_reg(compiler, &src->reg, dst->write_mask); + type_id = spirv_compiler_get_type_id_for_dst(compiler, dst); + src_id = spirv_compiler_emit_load_src(compiler, src, dst->write_mask); + zero_id = spirv_compiler_get_constant_float_vector(compiler, 0.0f, + vkd3d_write_mask_component_count(dst->write_mask)); + + val_id = vkd3d_spirv_build_op_glsl_std450_max(builder, src_type_id, src_id, zero_id); + val_id = vkd3d_spirv_build_op_tr1(builder, &builder->function_stream, SpvOpConvertFToU, type_id, val_id); + + spirv_compiler_emit_store_dst(compiler, dst, val_id); +} + static void spirv_compiler_emit_bitfield_instruction(struct spirv_compiler *compiler, const struct vkd3d_shader_instruction *instruction) { @@ -9280,7 +9315,6 @@ static int spirv_compiler_handle_instruction(struct spirv_compiler *compiler, case VKD3DSIH_DTOU: case VKD3DSIH_FTOD: case VKD3DSIH_FTOI: - case VKD3DSIH_FTOU: case VKD3DSIH_IADD: case VKD3DSIH_INEG: case VKD3DSIH_ISHL: @@ -9341,6 +9375,9 @@ static int spirv_compiler_handle_instruction(struct spirv_compiler *compiler, case VKD3DSIH_UDIV: spirv_compiler_emit_udiv(compiler, instruction); break; + case VKD3DSIH_FTOU: + spirv_compiler_emit_ftou(compiler, instruction); + break; case VKD3DSIH_DEQ: case VKD3DSIH_DGE: case VKD3DSIH_DLT:
From: Zebediah Figura zfigura@codeweavers.com
--- libs/vkd3d-shader/spirv.c | 20 +++++++++++++++----- 1 file changed, 15 insertions(+), 5 deletions(-)
diff --git a/libs/vkd3d-shader/spirv.c b/libs/vkd3d-shader/spirv.c index 63b4107f..116786eb 100644 --- a/libs/vkd3d-shader/spirv.c +++ b/libs/vkd3d-shader/spirv.c @@ -7007,10 +7007,12 @@ static void spirv_compiler_emit_udiv(struct spirv_compiler *compiler, static void spirv_compiler_emit_ftou(struct spirv_compiler *compiler, const struct vkd3d_shader_instruction *instruction) { + uint32_t src_id, zero_id, uint_max_id, float_max_id, condition_id, val_id; struct vkd3d_spirv_builder *builder = &compiler->spirv_builder; const struct vkd3d_shader_dst_param *dst = instruction->dst; const struct vkd3d_shader_src_param *src = instruction->src; - uint32_t src_type_id, type_id, src_id, zero_id, val_id; + uint32_t src_type_id, dst_type_id, condition_type_id; + unsigned int component_count;
assert(instruction->dst_count == 1); assert(instruction->src_count == 1); @@ -7019,14 +7021,22 @@ static void spirv_compiler_emit_ftou(struct spirv_compiler *compiler, * as an unsigned integer, but Direct3D expects the result to saturate, * and for NaN to yield zero. */
+ component_count = vkd3d_write_mask_component_count(dst->write_mask); src_type_id = spirv_compiler_get_type_id_for_reg(compiler, &src->reg, dst->write_mask); - type_id = spirv_compiler_get_type_id_for_dst(compiler, dst); + dst_type_id = spirv_compiler_get_type_id_for_dst(compiler, dst); src_id = spirv_compiler_emit_load_src(compiler, src, dst->write_mask); - zero_id = spirv_compiler_get_constant_float_vector(compiler, 0.0f, - vkd3d_write_mask_component_count(dst->write_mask));
+ zero_id = spirv_compiler_get_constant_float_vector(compiler, 0.0f, component_count); val_id = vkd3d_spirv_build_op_glsl_std450_max(builder, src_type_id, src_id, zero_id); - val_id = vkd3d_spirv_build_op_tr1(builder, &builder->function_stream, SpvOpConvertFToU, type_id, val_id); + + float_max_id = spirv_compiler_get_constant_float_vector(compiler, 4294967296.0f, component_count); + uint_max_id = spirv_compiler_get_constant_uint_vector(compiler, UINT_MAX, component_count); + condition_type_id = vkd3d_spirv_get_type_id(builder, VKD3D_SHADER_COMPONENT_BOOL, component_count); + condition_id = vkd3d_spirv_build_op_tr2(builder, &builder->function_stream, + SpvOpFOrdGreaterThanEqual, condition_type_id, val_id, float_max_id); + + val_id = vkd3d_spirv_build_op_tr1(builder, &builder->function_stream, SpvOpConvertFToU, dst_type_id, val_id); + val_id = vkd3d_spirv_build_op_select(builder, dst_type_id, condition_id, uint_max_id, val_id);
spirv_compiler_emit_store_dst(compiler, dst, val_id); }
Could you please also fix `ftoi`?
I had written a similar patch at some point. The reason why I hesitated sending it is that it's not completely clear what "Behavior is undefined" mean in the SPIR-V specification, since what your code does is to compute `ConvertFToU` anyway and then ignore it if the input is not in the range. According to the C/C++ meaning of "undefined behavior" this would be invalid: once you invoke UB there's nothing you can do to "go back", and even if you ignore the computed value the compiler has no obligations any more with you. If SPIR-V adopted the same meaning, which would sound plausible, your implementation (and mine as well) would be incorrect. Do you have any specific insight in this matter?
Could you please also fix `ftoi`?
Sure. I was more concerned about ftou, since it was one of the last things preventing Wine tests from working, but it shouldn't be too much more work to fix ftoi.
I had written a similar patch at some point. The reason why I hesitated sending it is that it's not completely clear what "Behavior is undefined" mean in the SPIR-V specification, since what your code does is to compute `ConvertFToU` anyway and then ignore it if the input is not in the range. According to the C/C++ meaning of "undefined behavior" this would be invalid: once you invoke UB there's nothing you can do to "go back", and even if you ignore the computed value the compiler has no obligations any more with you. If SPIR-V adopted the same meaning, which would sound plausible, your implementation (and mine as well) would be incorrect. Do you have any specific insight in this matter?
I don't have any better insight than you, and asking for a clarification on what undefined behaviour actually means would be a good thing.
I guess there are two questions here:
(1) can an "undefined" instruction do anything other than returning an arbitrary value, including cascading to further instructions that use that undefined value in computation?
I suspect the answer is, or will be, no, if nothing else then at least for arithmetic instructions. This is at least partially because it is hard for shaders to crash, or corrupt data, in a meaningful sense. I also suspect this is the intent because SPIR-V explicitly includes an OpUndef instruction, which would otherwise mean "please crash my computer". I also suspect this is the case because given the way shaders are designed, it's not really possible to avoid executing an undefined instruction; when branching is encountered shaders will basically execute both branches in parallel invocations, so all of the code gets run anyway.
(2) can an "undefined" instruction that is *not* used in computation [as with the OpSelect here] affect the result of that computation? E.g. can "<true> ? <value> : <undef>" yield anything other than <value>?
I suspect the answer is again no, although I don't have much better to base this on than "that's the simplest and most intuitive way for shaders to behave", and also again "what would be the point of OpUndef otherwise".
I don't *know* that the answers are no, and I think we should still request a spec clarification. But I think an answer of yes to either would make the language near unusable as a compiler target. Fundamentally, if this isn't enough to avoid undefined behaviour, I don't think anything is.
And it's worth pointing out that no spec will ever be perfect, and no "conformant" implementation will be perfect either, and this behaviour seems sensible, and fixes a problem in a way that aligns with what the spec's intent *looks* like.
Also, this isn't the first time that we've avoided undefined behaviour with an OpSelect—see again udiv.
On Wed Aug 2 13:44:29 2023 +0000, Zebediah Figura wrote:
Could you please also fix `ftoi`?
Sure. I was more concerned about ftou, since it was one of the last things preventing Wine tests from working, but it shouldn't be too much more work to fix ftoi.
I had written a similar patch at some point. The reason why I
hesitated sending it is that it's not completely clear what "Behavior is undefined" mean in the SPIR-V specification, since what your code does is to compute `ConvertFToU` anyway and then ignore it if the input is not in the range. According to the C/C++ meaning of "undefined behavior" this would be invalid: once you invoke UB there's nothing you can do to "go back", and even if you ignore the computed value the compiler has no obligations any more with you. If SPIR-V adopted the same meaning, which would sound plausible, your implementation (and mine as well) would be incorrect. Do you have any specific insight in this matter? I don't have any better insight than you, and asking for a clarification on what undefined behaviour actually means would be a good thing. I guess there are two questions here: (1) can an "undefined" instruction do anything other than returning an arbitrary value, including cascading to further instructions that use that undefined value in computation? I suspect the answer is, or will be, no, if nothing else then at least for arithmetic instructions. This is at least partially because it is hard for shaders to crash, or corrupt data, in a meaningful sense. I also suspect this is the intent because SPIR-V explicitly includes an OpUndef instruction, which would otherwise mean "please crash my computer". I also suspect this is the case because given the way shaders are designed, it's not really possible to avoid executing an undefined instruction; when branching is encountered shaders will basically execute both branches in parallel invocations, so all of the code gets run anyway. (2) can an "undefined" instruction that is *not* used in computation [as with the OpSelect here] affect the result of that computation? E.g. can "<true> ? <value> : <undef>" yield anything other than <value>? I suspect the answer is again no, although I don't have much better to base this on than "that's the simplest and most intuitive way for shaders to behave", and also again "what would be the point of OpUndef otherwise". I don't *know* that the answers are no, and I think we should still request a spec clarification. But I think an answer of yes to either would make the language near unusable as a compiler target. Fundamentally, if this isn't enough to avoid undefined behaviour, I don't think anything is. And it's worth pointing out that no spec will ever be perfect, and no "conformant" implementation will be perfect either, and this behaviour seems sensible, and fixes a problem in a way that aligns with what the spec's intent *looks* like. Also, this isn't the first time that we've avoided undefined behaviour with an OpSelect—see again udiv.
After some research, my feeling is different than yours WRT the two questions you asked. Specifically: * [I asked on a Khronos-managed Discord instance my question](https://discord.com/channels/427551838099996672/427951557045387276/113440593...); the only answer I got is that the C/C++ interpretation is valid, meaning that as soon as you invoke UB anything can happen: the program can change arbitrarily, the driver can declare the device lost, etc. Granted, that's just one random user giving their opinion, so it's nothing more than a data point. * The comment for `udiv` mentions that the RISC-V spec says: "The resulting value is undefined if Operand 2 is 0". That's not the spec text anymore: in commit https://gitlab.khronos.org/spirv/SPIR-V/-/commit/8e544a3947a67bf20edaf4ad306... it was changed to "Behavior is undefined if Operand 2 is 0", which is a stronger formulation (the same used for `OpConvertFToU`). Also, while the SPIR-V spec has in principle no relationship with the C/C++ specs, the fact that they decided to switch to the same formulation reinforces my belief that the same meaning is sought. Also, C is clearly the dominant programming language in this software engineering segment, so the interpretation is sensible anyway. * The fact that GPUs are going to execute both branches anyway to then select the relevant result is, IMHO, irrelevant here; it's just an implementation detail. There are many other effects that can happen if the generated code is UB (in the C/C++ sense), like a compiler inferring other unwanted optimizations.
So my interpretation is that to emit correct SPIR-V code we should use a full `OpBranchConditional`, so that `OpConvertFToU` is not even executed when outside of its domain.
OTOH, I don't think that necessarily has to happen within this MR, which I consider an improvement over the current situation anyway, at least from the correctness point of view. I don't know how to weigh overhead, really, though I tend to agree with you on the count we already have a similar trick for `udiv`.
On Wed Aug 2 13:44:29 2023 +0000, Giovanni Mascellani wrote:
After some research, my feeling is different than yours WRT the two questions you asked. Specifically:
- [I asked on a Khronos-managed Discord instance my
question](https://discord.com/channels/427551838099996672/427951557045387276/113440593...); the only answer I got is that the C/C++ interpretation is valid, meaning that as soon as you invoke UB anything can happen: the program can change arbitrarily, the driver can declare the device lost, etc. Granted, that's just one random user giving their opinion, so it's nothing more than a data point.
- The comment for `udiv` mentions that the RISC-V spec says: "The
resulting value is undefined if Operand 2 is 0". That's not the spec text anymore: in commit https://gitlab.khronos.org/spirv/SPIR-V/-/commit/8e544a3947a67bf20edaf4ad306... it was changed to "Behavior is undefined if Operand 2 is 0", which is a stronger formulation (the same used for `OpConvertFToU`). Also, while the SPIR-V spec has in principle no relationship with the C/C++ specs, the fact that they decided to switch to the same formulation reinforces my belief that the same meaning is sought. Also, C is clearly the dominant programming language in this software engineering segment, so the interpretation is sensible anyway.
- The fact that GPUs are going to execute both branches anyway to then
select the relevant result is, IMHO, irrelevant here; it's just an implementation detail. There are many other effects that can happen if the generated code is UB (in the C/C++ sense), like a compiler inferring other unwanted optimizations. So my interpretation is that to emit correct SPIR-V code we should use a full `OpBranchConditional`, so that `OpConvertFToU` is not even executed when outside of its domain. OTOH, I don't think that necessarily has to happen within this MR, which I consider an improvement over the current situation anyway, at least from the correctness point of view. I don't know how to weigh overhead, really, though I tend to agree with you on the count we already have a similar trick for `udiv`.
To be fair, I should say it's quite possible that was the intention of "undefined behaviour" in general. But in the case of arithmetic instructions, I find it surprising that they would really have completely undefined behaviour in practice, or even that that was intended in the specification.
I suspect the switch to consistent wording "undefined behaviour" was to specifically contrast it with "invalid shader module", which has clearly defined semantics—passing an invalid module to Vulkan APIs is illegal (undefined behaviour according to Vulkan).
So my interpretation is that to emit correct SPIR-V code we should use a full `OpBranchConditional`, so that `OpConvertFToU` is not even executed when outside of its domain.
It's worse than that; we'd have to perform a branch *per component*.
I find it hard to believe this was intended. I find it easier to believe that "illegal under any conditions; you'd better make sure your inputs are sanitary" was intended (as problematic as it is), or that "arithmetic instructions produce an undefined value, and you won't have any bad behaviour as long as that value never gets written, so OpSelect is fine" was intended. Of course, I find it even easier to believe that not that much thought was put into "UB" in the first place.
It's worse than that; we'd have to perform a branch *per component*.
Right. Though on second thought you can use `OpSelect` as you did, provided that you first mask away the invalid inputs with another `OpSelect`.