177 of 568 in my Cyberpunk shader collection are misaligned, and some I built are too, but it's uncommon enough to justify only copying when needed.
Do we know what causes this misalignment? Does dxc simply make no effort to align things while fxc does? Could you attach such a shader you compiled yourself here?
I added checksum writing to dxc (maybe there's a better way to make this available): [checksum hack](https://www.codeweavers.com/xfer/cmccarthy/dxc-add-checksum.diff)
I don't think we want to modify dxc like that, no.
In addition to instructions, much extra information needs to be passed out to the tracer or compiler, and parser_read_instruction is not practical for this. Instructions also need to reference other instructions, and the practice of writing one instruction to a struct would make that messy. Should vkd3d_shader_parser not be used at all for SM 6?
Well, a decision needs to be made. There are a couple of options:
- Use the existing vkd3d_shader_parser interface. This would of course be ideal; we'd be able to largely use the existing backends, perhaps with minor adjustments for things like handling additional instructions/capabilities. This seems unlikely to be feasible; still, it would be good to address the specific reasons behind that in the patch introducing whichever alternative we end up with.
- Don't use the vkd3d_shader_parser interface at all, and introduce something new only for DXIL. Perhaps this is unavoidable, but this would be the least desirable outcome, since it would imply needing DXIL-specific backends as well. That would most notably mean SPIR-V, but would have consequences for e.g. potential GLSL output as well.
- Extend/adjust the vkd3d_shader_parser interface. In particular, assuming for the sake of argument that the "parser_read_module" interface is the ideal interface for DXIL input, what are the reasons it would be unsuitable for dxbc-tpf and d3dbc input?
- Parse DXIL to the vkd3d_shader_sm6_module interface, and then do a lowering/flattening pass to turn it into a list of (likely extended) vkd3d_shader_instruction's.
The current patch essentially takes the approach of "pretend to use the vkd3d_shader_parser interface, but don't really", which I don't think is an appropriate option.
In any case, the patches you'll end up with will have to explain what specific issues they're addressing and what considerations went into the chosen approach, to someone potentially much less familiar with DXIL than you currently are. (And note that that person isn't necessarily "Henri, today"; this might be e.g. a new contributor reading the commit log 6 months from now, or perhaps yourself in a decade or so.)
It's perhaps also worth pointing out that this touches on ongoing conversations about the HLSL compiler's IRs. We currently essentially go from "HLSL IR" (almost) straight to d3dbc or dxbc-tpf bytecode. However, it turns out there may be some value in introducing an IR between that, on a level slightly higher than vkd3d_shader_instruction.
I initially was translating instructions to use vkd3d_shader_opcode, but since none can be handled by common code, we would end up using the same enum for two different things, with numerous values being used in one implementation but not the other. Also there would be inefficiencies, e.g. all numeric instructions with two inputs are emitted using a single BINOP with the actual operation encoded as an operand. We would translate this to one of the numerous VKD3DSIH equivalents, only to handle them all later in a single case statement.
It's possible to add another union member to vkd3d_shader_instruction for SM 6, but of the other fields only handler_idx, flags and src_count would be used. Use of common code would be rare; vkd3d_dxbc_compiler_map_ext_glsl_instruction() and vkd3d_dxbc_compiler_map_atomic_instruction() are the only two I found in a brief search, and the translation overhead probably cancels any gain.
Why is that? Or put a different way, what would prevent someone from transforming a list of vkd3d_shader_instruction's into a vkd3d_shader_sm6_module? And would that be worse than having to handle both in the backends?
I could reduce the large patch to nothing more than reading the block tree. This wouldn't do anything except validate the blocks and emit any errors encountered. Is that enough, or too much?
I'd have to see it; perhaps it would make sense to still split things further from there. It sounds like a decent start though.