On Thu Jun 1 13:36:54 2023 +0000, Giovanni Mascellani wrote:
Yes, but the mere fact that it *can* happen is a problem for me, because it requires that every developer keeps in mind that `struct hlsl_ir_jump` changes meaning depending on the SM, which I think it's a bad idea. Especially, I don't want this to become a precedent for other IR instructions to be assigned different meanings depending on the SM. I don't know whether this is already happening for floating point rules, but if it is it's already bad enough, and if it isn't then let's not start it now.
At the time I wrote that comment (v3), the patch was emitting conditional HLSL_IR_JUMP_DISCARD for sm1, but still unconditional HLSL_IR_JUMP_DISCARD (guarded out with HLSL_IR_IF) for sm4. There are basically two issues here:
* That the IR should have identical semantics regardless of backend—i.e. what Giovanni is trying to argue. I think this is universally something we should stick to. I want to be able to read code that generates HLSL, or dumped HLSL IR, and know exactly what it's doing, without having to rely on context.
* That we should (as much as possible) generate IR that doesn't rely on the profile in the frontend, and use explicit lowering passes to change it if necessary. This is... something I've been trying to argue for on the grounds of modularity, and what I was trying to say in my original comment in this thread, but I can also see the counterargument that I am cargo-culting modularity, especially when we can't get away from having profile checks elsewhere in the HLSL code.
For the sake of explanation, D3DCOLORtoUBYTE4() is a bit of a different case than this: in that case the *semantics* of the function actually differ between sm1 and sm4. In the case of clip() the semantics are the same, but require different bytecode to express them. That doesn't mean I'm 100% convinced we want to use a lowering pass, along the lines of what I was saying above.
In terms of analogizing to other code, this is a more similar case to (float) division, sqrt(), dot(), or round(), which all have dedicated instructions for sm4, and need lowering for sm1. Currently we use dedicated passes to perform that lowering. Is that the best solution? Maybe. Probably?
On top of that, there is probably a decision that should at least be more explicit here: when I proposed to add an IR operator corresponding to the C ternary operator this was rejected because, if I understood well, we don't want to make the IR too rich, so that it's easier to write the optimizer (because there are fewer possible ways to express the same constructs). So, it is intended that HLSL `a ? b : c` must be translated to IR `if (a) { store(x, b); } else { store(x, c); } load(x)` rather than using an IR-level ternary operator. Here we've got essentially the same dilemma, so according to the same principle we should avoid adding `arg`, but rather use an `if` construct; and the task of lowering that to a better TPF program should be delegated to some other optimizer that might in principle appear at some point in the future.
Now, I was against this decision at the time, so I should be in favor to what Nikolay is doing here. And in a sense I am, except that I feel we should a bit more consistent about these design decisions. Is this case different from the ternary operator? Or did we decide that since the lower-level optimizer doesn't exist (or, more precisely, while it doesn't exist yet) we agree that it makes sense for the current IR to be a little more flexible, so that some optimizations can still be done on it? That wouldn't be similar in spirit to !215.
Honestly looking at that discussion I'm not really convinced of my arguments there either, or at least I don't think that they're very strong. I think keeping the IR as conceptually simple as possible is a good idea, but I'm not sure that adding a MOVC analogue really adds much in the way of conceptual complexity. It's not like conditional discard adds that much either. On the other hand, these things do add up...
One thing that we probably do want to avoid (which I think I alluded to an earlier comment) is the possibility of having to write both a lowering pass and a raising pass for the same transformation, i.e. doing the same transformation in both directions. E.g. we don't want to have to emit MOVC for ternaries, have to lower it for SM1, but then also have to raise "if (cond) a = b; else a = c;" to a movc. Similarly, it's a waste of time to emit a clip-discard statement, have to lower it for sm4, but then also have to raise an explicitly spelled clip() for sm1. On the other hand, native doesn't seem to raise *either* of these cases...