On Fri Mar 15 05:44:54 2024 +0000, Conor McCarthy wrote:
This is virtualisation, right? I wonder if there will be issues with older drivers, like we found with the TGSM MR. I don't think the tests there will trigger this case.
Yeah, unfortunately there is some virtualization here. Whether it exposes us to a variant of the problem we're having with TGSM, I'm not completely sure, but I lean towards thinking it does not. My understanding is that the real problem we're having with TGSM is not virtualization, but lack of merging information. The new structurizer has usable merging information, and AFAIU the SPIR-V specification only requires dynamically uniform control flow for tangled instructions, not necessarily statically uniform control flow. So the virtualization introduced by the break trampoline might make the compiler miss some optimization opportunities, but if invocations are expected to (dynamically) converge at some instruction, then they should (while clearly they cannot be expected to converge with the old structurizer).
That said, the optimization passes I'm going to submit soon should largely remove the need for multilevel jumps (because loops with breaks are converted to selections whenever possible), so eventually I'll also add code to skip emitting trampolines when they're not needed. [My work branch](https://gitlab.winehq.org/giomasce/vkd3d/-/commits/cfg4) already does a decent job (though skipping trampolines is not written yet). Eventually I'm also going to gather statistics on ShadowCI to see how often we're forced to emit trampolines and whether there are other heuristics that can help get rid of them. I guess in the general case they cannot be avoided completely, though: one can always come up with an arbitrarily complicated DXIL shader that cannot be represented with a structured program without virtualizing at least some control flow. Hopefully this kind of shader is not often generated by DXC.
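To make the "multilevel jump versus trampoline" distinction concrete, here is a minimal sketch in Python. It is not vkd3d code: the function names and the `break_outer` flag are made up for illustration, and it assumes a trampoline amounts to recording the intended jump in a variable, doing a single-level break, and re-dispatching after the inner loop.

```python
def find_direct(grid, target):
    # "return" acts as a multilevel jump: it leaves both loops at once.
    for i, row in enumerate(grid):
        for j, value in enumerate(row):
            if value == target:
                return (i, j)
    return None


def find_with_trampoline(grid, target):
    # Only single-level breaks are allowed here, as in structured control flow.
    found = None
    break_outer = False  # hypothetical flag: the virtualized part of the control flow
    for i, row in enumerate(grid):
        for j, value in enumerate(row):
            if value == target:
                found = (i, j)
                break_outer = True  # remember that we want to leave the outer loop too
                break               # leave the inner loop only
        # Trampoline: after the inner loop has merged, forward the jump one level up.
        if break_outer:
            break
    return found


GRID = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(find_direct(GRID, 5), find_with_trampoline(GRID, 5))
```

Both functions return the same result; the second one just pays for an extra variable and an extra conditional after the inner loop, which is roughly the cost being discussed.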
For completeness, let me mention that in general the new structurizer has to make some arbitrary choices, because the problem of structurizing a DXIL shader is intrinsically ill-defined. [Consider for example this CFG](http://magjac.com/graphviz-visual-editor/?dot=digraph%20%7B%0A%20%20%20%20n1...), which is reducible (trivially: it has no back edges):
```dot
digraph {
    n1 -> n2 -> n4;
    n1 -> n3 -> n5;
    n2 -> n5 -> n6;
    n3 -> n4 -> n6;
}
```
When constructing the block order the structurizer can arbitrarily decide the relative order of blocks 2 and 3 and, independently, of blocks 4 and 5 (well, the structurizer currently has a rule to prefer more recently added nodes, so in practice the two choices are not independent; but that's mostly a coincidence). Depending on that choice, either block 4 will be a merge for block 5 or the other way around, but there is nothing in the graph that suggests that one option is better than the other. So if, say, block 5 contains a barrier and we happen to make it the merge node for 4, then everything will work fine. If we happen to make the other choice, the program will be broken. And this is not a bug in the structurizer: the structure information is simply not there, and in some cases we can only guess. If we stumble upon cases like this, there is nothing we can do other than find heuristics and hope they're enough. I assume that's what Windows drivers do too.
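The ordering freedom in that CFG can be checked by brute force. A minimal sketch in Python (again not vkd3d code; `topological_orders` is a name made up for illustration) that enumerates every block order in which all edges point forward:

```python
from itertools import permutations

# Edges of the CFG above, with nN written as the integer N.
EDGES = [(1, 2), (2, 4), (1, 3), (3, 5), (2, 5), (5, 6), (3, 4), (4, 6)]
NODES = sorted({node for edge in EDGES for node in edge})


def topological_orders(nodes, edges):
    """Return every permutation of nodes in which each edge points forward."""
    orders = []
    for perm in permutations(nodes):
        position = {node: index for index, node in enumerate(perm)}
        if all(position[src] < position[dst] for src, dst in edges):
            orders.append(perm)
    return orders


ORDERS = topological_orders(NODES, EDGES)
for order in ORDERS:
    print(order)
```

There are four valid orders: blocks 2 and 3 can be swapped, and so can blocks 4 and 5, and the graph itself gives no reason to prefer one of them.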