Can we split registers? E.g., map o0.xyz in the vertex shader to v0.x and v1.zw in the pixel shader. Is that something we need to be able to do?
Well, sm4 doesn't need it. As for sm1... I can get the native assembler to output two dcl instructions with the same semantic, but I don't know how that's interpreted. Is that something that you've encountered before?
I seem to recall seeing applications doing that, yes, though it's also long enough ago that I don't remember for sure.
Right now, I'm just looking for either "that can't possibly happen" or "we could handle that in this way, if needed". The backup is of course doing a v2 of the structure, but it would be nice to avoid that if we can.
Is this strictly necessary for d3dbc shaders? I suppose "should" leaves the possibility of not providing it open, but remap_output_signature() seems to care less about it being absent than "should" may imply.
It's not necessary if the registers happen to match, but that is probably unlikely. I wouldn't object to making it a hard requirement for d3dbc.
Note also that shader model 1 and 2 d3dbc shaders have a natural mapping between vertex shader outputs and pixel shader inputs.
They do, though there is a snag. 1.x/2.x can output as many as 10 registers (8 texcoord and 2 colour. Also, is fog an issue here?), but early versions of GL might not be able to support that many locations (4.6 requires 16, as does Vulkan, but 3.0 seems to require only 8), which would force us to do remapping—and if the varyings aren't the same, I don't think we can remap in a consistent manner. So we may end up needing this struct for 1.x/2.x as well.
Right, a caller may want to remap, but it may also not necessarily need to. Also, does that imply we may need to remap pixel shader inputs as well, along the lines of wined3d_pixel_shader's "input_reg_map"?
+ /** + * The number of registers provided in \ref varying_map. + */ + unsigned int varying_map_size; +};
So, "register_count"? :)
Sure, though that means it's not named like varying_map...
We could also say "varying_count", I suppose. The issue I have with using "_size" for fields like these is that it's often ambiguous whether it refers to the size of the allocation or the number of elements.
+static inline int vkd3d_bit_scan(unsigned int *x) +{ + int bit_offset; +#ifdef HAVE_BUILTIN_FFS + bit_offset = __builtin_ffs(*x) - 1; +#else + for (bit_offset = 0; bit_offset < 32; bit_offset++) + if (*x & (1u << bit_offset)) break; +#endif + *x ^= 1u << bit_offset; + return bit_offset; +}
"x" should probably be a uint32_t, although I think there's also an argument for making it uintptr_t sized.
It was done on the model of wined3d and the other functions here, but sure, I can change it.
Yeah, I figured. The reason wined3d's wined3d_bit_scan() takes an unsigned int is essentially just "history", but it would be nice to make this consistent with the other bitmap functions like bitmap_set() here.
I don't think we want the #else implementation; it seems preferable to #error out. On Windows we could potentially use _BitScanForward(), and POSIX provides ffs(), but even without either of those we could probably do better in plain C.
Same applies here, though more saliently, I'm not sure how to interpret the statements "I don't think we want the #else implementation" alongside "we could probably do better in plain C"?
We seem to have grown the #else path in wined3d_bit_scan() at some point during Wine's PE move, but I'd argue we never want to use it; iterating over all bits is just not a sufficient implementation for something like wined3d_stateblock_apply() or context_apply_draw_state().
As for doing better than a linear scan without using either POSIX or Win32 functions, plenty has been written about that by other people. One fairly decent approach is to use a de Bruijn sequence together with a small lookup table. Still, I think ffs() and _BitScanForward() should cover everything we care about.
+ * This mapping should be constructed by vkd3d_shader_build_varying_map().
Should it? If we're going to prescribe that, we might as well just pass the nest stage input signature in struct vkd3d_shader_next_stage_info, and only provide a way to retrieve the output map used, whether that's vkd3d_shader_build_varying_map() or simply vkd3d_shader_scan().
Well... "should", not "must",
I'd argue it's more of a "may".