On Wed Sep 6 10:55:59 2023 +0000, Henri Verbeet wrote:
How valuable is that? If it's significant, we may want to consider
converting TPF to SSA form as well in the SPIR-V backend.
It seems to me quite likely that translating sm4 to SSA would be more
expensive than passing it to the driver (and letting the driver translate it to SSA internally, probably), but translating sm6 to temps would be more expensive than passing it through to the driver. To be clear, my expectation would be that these redundant OpStore and OpLoad instructions largely don't matter. I could very well be wrong about that, of course. And perhaps there are other reasons to avoid using temps here, but I haven't seen those. So what I imagine we want to do, is to start with naively translating DXIL SSA values to vsir temp assignments. That should unblock upstreaming plenty of bits. We don't have control flow yet, we don't have phi instructions yet either, and it's not clear to me that there's a fundamental reason e.g. vsir phi instructions couldn't operate on vsir temps anyway. I think there are a few scenarios that could happen from there:
- It's fine.
- We introduce vsir SSA registers for some other reason, e.g. because
the HLSL compiler wants to do optimisations in that form; DXIL/SPIR-V could then just take advantage of that.
- We encounter some issue that can't reasonably be resolved with vsir
temps. We discuss it when we get there, instead of now in the abstract.
- We do some benchmarking/profiling and find that there are either
advantages in compile time or run time to introducing a separate register type. We discuss it when we get there, with the hard data to back it up, instead of speculating about it now. And perhaps this is also a good time to reflect on the broader upstreaming strategy. On a very broad level, the sequence that I would have hoped/expected to see would be something along these lines:
- Some shader_runner infrastructure to compile and run DXIL shaders.
- The most basic, straightforward implementation of the bits required
to make a fairly minimal test pass. E.g., tests/hlsl/swizzles.shader_test.
- Basic, straightforward implementations of features required to make
the rest of the tests pass.
- Features required by applications, but not covered by the tests.
Writing tests for these as they're implemented.
- Optimisations and other complications.
Also, I'd like to stress this because it has come up before for other MRs, please don't pre-emptively add complications for issues that will only come up later; whether that's in a later patch in the same series or 400 patches later in the branch you're upstreaming from. Reviewers generally can't or don't want to look that far ahead, and it just ends up slowing the entire process down. Patches should make sense in isolation, at the point in time where they're introduced.
Just to explain my acceptance, I think too that the ideal flow would be to first convert SSA to temporary load and store, and introduce later SSA if there is an advantage. Dealing with loads and stores is something I expect downstream compilers to do pretty well (even the HLSL compiler does that, to a certain extent!), so I wouldn't stress too much over that. Phi nodes shouldn't be too much of a hassle either: just before emitting a jump you look at the phi nodes of the target block and convert them to assignments, unless I am missing something.
That said, I don't think it is unreasonable to go with SSA from the get-go, so I accept this MR and leave to Henri to do the actual software architecture part.