How do we represent registers not read by the next stage? vkd3d_shader_build_varying_map() seems to suggest we should map them to a free register, but I'm not sure that's ideal. It also sets "dst_mask" to 0, which is perhaps easier to work with. (So far that and the FIXME also seem like the only usage of "dst_mask" though...) In any case, the API documentation here leaves it unspecified.
dst_mask of zero, yes. Mapping them to a free register is done so that the current spirv code outputs them to a free location.Ideally we wouldn't output them at all, but I left that out of scope of the current patch.
Similarly, we *should* be outputting only the varyings that the PS uses [unless we have some Vulkan extension I forget the name of], but I left that out of scope of this patch as well. Point is, we have those things in the API so we can use them going forward.
Sort-of-underspecifying this struct, and how vkd3d_shader_build_varying_map() builds its output, was a deliberate choice, since I wanted to basically say "you should use those functions", and give us a bit of freedom at least as to what registers to actually use.
Can we split registers? E.g., map o0.xyz in the vertex shader to v0.x and v1.zw in the pixel shader. Is that something we need to be able to do?
Well, sm4 doesn't need it. As for sm1... I can get the native assembler to output two dcl instructions with the same semantic, but I don't know how that's interpreted. Is that something that you've encountered before?
I suspect that even if it is, the right thing to do here is handle that in the pixel shader, and convert it to a single semantic. I don't think our current spirv code is set up to handle it otherwise.
Is this strictly necessary for d3dbc shaders? I suppose "should" leaves the possibility of not providing it open, but remap_output_signature() seems to care less about it being absent than "should" may imply.
It's not necessary if the registers happen to match, but that is probably unlikely. I wouldn't object to making it a hard requirement for d3dbc.
Note also that shader model 1 and 2 d3dbc shaders have a natural mapping between vertex shader outputs and pixel shader inputs.
They do, though there is a snag. 1.x/2.x can output as many as 10 registers (8 texcoord and 2 colour. Also, is fog an issue here?), but early versions of GL might not be able to support that many locations (4.6 requires 16, as does Vulkan, but 3.0 seems to require only 8), which would force us to do remapping—and if the varyings aren't the same, I don't think we can remap in a consistent manner. So we may end up needing this struct for 1.x/2.x as well.
+ /** + * The number of registers provided in \ref varying_map. + */ + unsigned int varying_map_size; +};
So, "register_count"? :)
Sure, though that means it's not named like varying_map...
+static inline int vkd3d_bit_scan(unsigned int *x) +{ + int bit_offset; +#ifdef HAVE_BUILTIN_FFS + bit_offset = __builtin_ffs(*x) - 1; +#else + for (bit_offset = 0; bit_offset < 32; bit_offset++) + if (*x & (1u << bit_offset)) break; +#endif + *x ^= 1u << bit_offset; + return bit_offset; +}
"x" should probably be a uint32_t, although I think there's also an argument for making it uintptr_t sized.
It was done on the model of wined3d and the other functions here, but sure, I can change it.
I don't think we want the #else implementation; it seems preferable to #error out. On Windows we could potentially use _BitScanForward(), and POSIX provides ffs(), but even without either of those we could probably do better in plain C.
Same applies here, though more saliently, I'm not sure how to interpret the statements "I don't think we want the #else implementation" alongside "we could probably do better in plain C"?
+ * This mapping should be constructed by vkd3d_shader_build_varying_map().
Should it? If we're going to prescribe that, we might as well just pass the nest stage input signature in struct vkd3d_shader_next_stage_info, and only provide a way to retrieve the output map used, whether that's vkd3d_shader_build_varying_map() or simply vkd3d_shader_scan().
Well... "should", not "must", but moreover, the whole point here is that passing the signature to struct vkd3d_shader_next_stage_info got complicated in some ways, and this is nicer. Keeping this function around solves the caching problem, but:
(1) we'd have to reorganize some things in wined3d to allow passing the signature; it currently expects to use *only* the cache key as compile arguments. Minor problem, I guess, but if we *can* pass the cache key, I don't see a reason not to.
(2) this mapping is already kind of in optimal ("compiled", in a sense) form for vkd3d_shader_compile(), since we just look up the mapping in an array.
It might be nice to guarantee an upper bound of output_signature->element_count. That might require representing "varyings" in a way that omits unused registers, but that might not be a bad thing.
I originally was going to write that, but decided that since I was indexing the array by register_index, it wouldn't work. But on reflection there's no reason we can't index the array by the *signature* index, so yeah, that'd be better.