Oof, I would have asked that we not do this quite yet :-/
I wouldn't call either d3dbc.c or tpf.c very long, but I suppose that's a matter of opinion. I imagine it doesn't help that I find the practice of spreading source code over a million tiny files to be one of the most irritating things a codebase can do...
I suppose it depends on your standard for how long a file can be. To me, anything more than a few thousand lines starts to feel like an awful lot, and at that point the compilation time does start to show.
Personally I think modularity is the most important thing, and I find it somehow easier to mentally work with files that segregate their components. In this case I think segregating the reader and writer would not have been a bad idea, and even if we don't, keeping the sm4 definitions in a separate file might also have been nice.
Obviously by that principle, any sufficiently modular piece of code can be split into its own file, however small. If it's only a few hundred lines, or a couple of functions (honestly, the number of [well-formed?] functions may matter less to me than the line count), I'd be inclined against it, but I think I have a lower threshold for what seems reasonable to split up. Take HLSL copy propagation, for instance, which is only about 500 lines, but on the other hand spans a whole 20 functions; that feels to me like a large enough self-contained chunk of code that it's worth at least considering segregation.
In any case, while it's perhaps not quite true at the moment, parsing and serialising the different bytecode formats should be closely related and use the same constants, structures, and so on. There's currently some HLSL-specific code intermixed that would more properly belong in hlsl.c/hlsl_codegen.c, but we can deal with that once we get to it.
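To illustrate what I mean by sharing definitions (purely a sketch; the names here are made up rather than the actual tpf.c identifiers): ideally there would be one opcode table that both the reader and the writer consult, so the two directions can't drift apart:

```c
/* Hypothetical shared sm4 definitions; illustrative only, not the real
 * identifiers or table layout from tpf.c. Opcode values are from the
 * D3D10+ token format. */
enum sm4_opcode
{
    SM4_OP_ADD = 0x00,
    SM4_OP_MOV = 0x36,
    SM4_OP_MUL = 0x38,
};

struct sm4_opcode_info
{
    enum sm4_opcode opcode;
    const char *name;
    unsigned int dst_count, src_count;
};

/* A single table like this would be consulted both when parsing existing
 * bytecode and when serialising new bytecode from the HLSL compiler. */
static const struct sm4_opcode_info sm4_opcode_table[] =
{
    {SM4_OP_ADD, "add", 1, 2},
    {SM4_OP_MOV, "mov", 1, 1},
    {SM4_OP_MUL, "mul", 1, 2},
};
```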
...Possibly. I would have appreciated a bit more discussion on this first. I had been vaguely thinking of moving more code *to* the backend files, so now I need to stop and re-plan.
And while I think we'd always been talking about using the same structures in hlsl_smX, I kind of thought that we would perform that conversion before moving things around. That would have let things be done more gradually, and avoided some unpleasant rebases.
Anyway, this generally ties into the question I brought up in [1]. I laid out a relatively complete summary of our options for the IR there, but it doesn't seem to have garnered any discussion. At best we seem to have committed to a "maximally CISC" instruction set with Conor's normalization patches, so that rules out option (2).
I'd obviously like to keep HLSL details out of the sm1/sm4 code, but I'd also really like to keep sm1/sm4 details out of hlsl_codegen.c as much as we can. We are still going to need specific hlsl -> smX glue for some things like register allocation and semantic validation, and that may be unavoidable (though on the other hand, a lot of that may end up being shareable... certainly in the case of register allocation it currently is; I'm just not sure if it should remain that way).
Past that, it's still an open question what kind of instructions the HLSL compiler should actually emit. I don't really know the answer to this one. My natural inclination is that we want the HLSL compiler to care about its target as little as possible [so I'm inclined against option (1)].
Going further, even if the IR is going to be maximal anyway, I'd say that HLSL should probably still emit a more minimal subset [basically option (3)], and then we would have backend-specific optimization passes that "raise" simple IR into more complex IR. We don't want to complexify the HLSL IR to match the CISC capabilities of vkd3d_shader_instruction, and if we're going to have to perform "raising" passes on vkd3d_shader_instruction anyway, probably better to raise them per backend.
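To make the "raising" idea a bit more concrete, here's a rough sketch of the kind of backend pass I have in mind; the types are deliberately simplified stand-ins, not the real vkd3d_shader_instruction layout:

```c
#include <stddef.h>

/* Simplified stand-in IR; not the real vkd3d_shader_instruction. */
enum ir_op {IR_OP_NOP, IR_OP_MUL, IR_OP_ADD, IR_OP_MAD};

struct ir_instruction
{
    enum ir_op op;
    unsigned int dst;     /* destination temp register */
    unsigned int src[3];  /* source temp registers */
};

/* Backend-specific raising pass: fuse a MUL immediately followed by an ADD
 * of its result into a single MAD, for targets that have one. A real pass
 * would also have to check that the intermediate temp has no other uses. */
static void raise_mul_add_to_mad(struct ir_instruction *ins, size_t count)
{
    size_t i;

    for (i = 0; i + 1 < count; ++i)
    {
        struct ir_instruction *mul = &ins[i], *add = &ins[i + 1];

        if (mul->op != IR_OP_MUL || add->op != IR_OP_ADD || add->src[0] != mul->dst)
            continue;

        add->op = IR_OP_MAD;
        add->src[2] = add->src[1];   /* addend */
        add->src[1] = mul->src[1];   /* second factor */
        add->src[0] = mul->src[0];   /* first factor */
        mul->op = IR_OP_NOP;         /* a real pass would remove it instead */
    }
}
```

The point being that the HLSL side would only ever emit the MUL/ADD form, and it would be up to each backend to decide whether fusing into MAD (or whatever its equivalent is) is worth doing for that target.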
Of course, there are upsides and downsides to each approach, and I'm not fully sure which is best.
[1] https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/37#note_13059