Can we split registers? E.g., map o0.xyz in the vertex shader to v0.x and v1.zw in the pixel shader. Is that something we need to be able to do?
Well, sm4 doesn't need it. As for sm1... I can get the native assembler to output two dcl instructions with the same semantic, but I don't know how that's interpreted. Is that something that you've encountered before?
I seem to recall seeing applications doing that, yes, though it's also long enough ago that I don't remember for sure.
Right now, I'm just looking for either "that can't possibly happen" or "we could handle that in this way, if needed". The backup is of course doing a v2 of the structure, but it would be nice to avoid that if we can.
Sure.
Unless actually doing that does something unexpected, I think it would be covered from an API perspective by emitting multiple dcls as a single entry in the signature (with a combined mask), and then adding code in the spirv to blit to the individual register variables. Actually I think our signature scanning code is already written that way, and depending on how the I/O lowering pass is written, it may be as well.
Right, a caller may want to remap, but it may also not necessarily need to. Also, does that imply we may need to remap pixel shader inputs as well, along the lines of wined3d_pixel_shader's "input_reg_map"?
Broadly. The way I was picturing this would actually work would be that you'd pass the varying limit as an argument to v-s (as a compile option?), vs would automatically remap PS registers according to the limit (setting their register_index in the signature accordingly), and then vkd3d_shader_build_varying_map() would use those mapped register indices to build the varying map for the VS.
+ /** + * The number of registers provided in \ref varying_map. + */ + unsigned int varying_map_size; +};
So, "register_count"? :)
Sure, though that means it's not named like varying_map...
We could also say "varying_count", I suppose. The issue I have with using "_size" for fields like these is that it's often ambiguous whether it refers to the size of the allocation or the number of elements.
I think varying_count is close enough, yeah.
+static inline int vkd3d_bit_scan(unsigned int *x) +{ + int bit_offset; +#ifdef HAVE_BUILTIN_FFS + bit_offset = __builtin_ffs(*x) - 1; +#else + for (bit_offset = 0; bit_offset < 32; bit_offset++) + if (*x & (1u << bit_offset)) break; +#endif + *x ^= 1u << bit_offset; + return bit_offset; +}
"x" should probably be a uint32_t, although I think there's also an argument for making it uintptr_t sized.
It was done on the model of wined3d and the other functions here, but sure, I can change it.
Yeah, I figured. The reason wined3d's wined3d_bit_scan() takes an unsigned int is essentially just "history", but it would be nice to make this consistent with the other bitmap functions like bitmap_set() here.
I don't think we want the #else implementation; it seems preferable to #error out. On Windows we could potentially use _BitScanForward(), and POSIX provides ffs(), but even without either of those we could probably do better in plain C.
Same applies here, though more saliently, I'm not sure how to interpret the statements "I don't think we want the #else implementation" alongside "we could probably do better in plain C"?
We seem to have grown the #else path in wined3d_bit_scan() at some point during Wine's PE move, but I'd argue we never want to use it; iterating over all bits is just not a sufficient implementation for something like wined3d_stateblock_apply() or context_apply_draw_state().
As for doing better than a linear scan without using either POSIX or Win32 functions, plenty has been written about that by other people. One fairly decent approach is to use a de Bruijn sequence together with a small lookup table. Still, I think ffs() and _BitScanForward() should cover everything we care about.
Makes sense, thanks.
+ * This mapping should be constructed by vkd3d_shader_build_varying_map().
Should it? If we're going to prescribe that, we might as well just pass the nest stage input signature in struct vkd3d_shader_next_stage_info, and only provide a way to retrieve the output map used, whether that's vkd3d_shader_build_varying_map() or simply vkd3d_shader_scan().
Well... "should", not "must",
I'd argue it's more of a "may".
Sure, that's reasonable. I'll make it more clear what the API for this is, then.