Sorry for the late reply; I had to think about this for a while...
On 8/31/22 18:52, Francisco Casas wrote:
On 31-08-22 14:12, Zebediah Figura wrote:
I don't think I understand the point of splitting uniforms for sm4 but not sm1. If we need to determine (and potentially store) the actual type of a uniform per-component anyway, why would we need to do that differently for structs and arrays?
Well, first, I don't think we should split uniforms completely, only separate the object components as standalone variables.
So, if we have, say
struct { float4 foo; Texture2D tex[2]; float4 bar; } pou;
We would effectively end up with 3 variables:
The original:
struct { float4 foo; Texture2D tex[2]; float4 bar; } pou;
and:
Texture2D "pou.tex[0]"; Texture2D "pou.tex[1]";
Because we would be adding a store from the "pou.tex[0]" variable to pou.tex[0], and from the "pou.tex[1]" variable to pou.tex[1], copy-prop should replace all derefs of the Texture components with derefs of these new variables.
The pou variable no longer has to worry about the register allocation of its Texture fields, because the new variables do that.
Generalizing, variables should no longer care about their object components, unless they are objects themselves.
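To make the proposal concrete, here is a rough sketch in C of how the object components of a variable could be enumerated and named. The type model (struct type, struct field, struct name_list) and the function collect_object_components() are hypothetical and much simpler than the compiler's real data structures; only the naming scheme ("pou.tex[0]") comes from the discussion above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified type model for illustration only. */
enum type_kind { KIND_NUMERIC, KIND_OBJECT, KIND_ARRAY, KIND_STRUCT };

struct type;

struct field
{
    const char *name;
    const struct type *type;
};

struct type
{
    enum type_kind kind;
    const struct type *elem;     /* KIND_ARRAY: element type */
    unsigned int elem_count;     /* KIND_ARRAY: number of elements */
    const struct field *fields;  /* KIND_STRUCT: field list */
    unsigned int field_count;    /* KIND_STRUCT: number of fields */
};

#define MAX_NAMES 16
#define MAX_NAME_LEN 64

struct name_list
{
    char names[MAX_NAMES][MAX_NAME_LEN];
    unsigned int count;
};

/* Recursively record the path of every object component reachable from
 * "type", using "prefix" as the path built so far. */
static void collect_object_components(const struct type *type,
        const char *prefix, struct name_list *out)
{
    char path[MAX_NAME_LEN];
    unsigned int i;

    switch (type->kind)
    {
        case KIND_NUMERIC:
            /* Numeric components stay in the original variable. */
            break;

        case KIND_OBJECT:
            assert(out->count < MAX_NAMES);
            snprintf(out->names[out->count++], MAX_NAME_LEN, "%s", prefix);
            break;

        case KIND_ARRAY:
            for (i = 0; i < type->elem_count; ++i)
            {
                snprintf(path, sizeof(path), "%s[%u]", prefix, i);
                collect_object_components(type->elem, path, out);
            }
            break;

        case KIND_STRUCT:
            for (i = 0; i < type->field_count; ++i)
            {
                snprintf(path, sizeof(path), "%s.%s", prefix, type->fields[i].name);
                collect_object_components(type->fields[i].type, path, out);
            }
            break;
    }
}
```

Called on a model of the "pou" struct with the prefix "pou", this collects two names, "pou.tex[0]" and "pou.tex[1]", one per standalone variable that would be created.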
But this goes both ways. The struct variables still contain multiple object types, and they'll need to be able to allocate one while ignoring the other.
This would make more sense if we were to remove the object types from the struct afterward. It's not perfectly clear to me that we want to do that vs. just keeping them there. A more interesting question is how this compares to cases in sm1 where a variable gets allocated to multiple register sets (float/int/bool).
I think something similar happens under the hood in SM4, consider the output of the following shader in the native compiler:
struct
{
    float4 foo;
    Texture2D tex[2];
    float4 bar;
} pou;

float4 main() : SV_TARGET
{
    return pou.bar + pou.tex[0].Load(int3(1, 2, 3)) + pou.tex[1].Load(int3(1, 2, 3));
}

// Buffer Definitions:
//
// cbuffer $Globals
// {
//
//   struct <unnamed>
//   {
//
//       float4 foo;                    // Offset:    0
//       float4 bar;                    // Offset:   16
//
//   } pou;                             // Offset:    0 Size:    32
//      Textures: t0-t1
//
// }
//
//
// Resource Bindings:
//
// Name                        Type  Format      Dim HLSL Bind  Count
// --------------------- ---------- ------- -------- --------- ------
// pou.tex[0]               texture  float4       2d        t0      1
// pou.tex[1]               texture  float4       2d        t1      1
// $Globals                 cbuffer      NA       NA       cb0      1

Object components are listed separately in the resource bindings, and they are effectively removed (or ignored?) in the definition of pou.
This is a more interesting reason. There are two caveats, though, one easy to handwave and one harder.
Firstly, it's worth noting that this doesn't happen with 4.0 or 4.1, only 5.0 (and 5.1 is a different beast anyway). Putting objects in a struct is illegal in 4.x, even if you're not mixing them with numeric types, but you can see a similar difference if you just declare a global array of objects.
Secondly, and more importantly, the original struct actually needs to retain some amount of information about the objects it contained, including which ones were actually used. Consider the following shader:
struct
{
    Texture2D one;
    Texture2D unused;
    float3 coords;
} apple;

struct
{
    // Texture2D unused;
    Texture2D three;
    float3 coords;
} banana;

float4 main() : SV_TARGET
{
    return apple.one.Load(apple.coords) + banana.three.Load(banana.coords);
}

If you move "unused" to the "banana" struct (as indicated by the comment), the shader itself stays the same, as does the binding table (in the RDEF section), but the constant buffers do not. The difference can be seen in the disassembly output, and corresponds to the "StartTexture" and "TextureSize" fields of D3D11_SHADER_VARIABLE_DESC.
This means, notably, that I don't think we can just remove object variables from their structs.
With this, for both SM1 and SM4, each variable should only care about the allocation of a single type of register, so we can use the register offsets as we do now.
Maybe, but on the other hand, could we just have multiple registers per struct variable? I.e. one per namespace (numeric, sampler, texture, uav...)
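As a rough illustration of what "one register per namespace" could look like, here is a minimal sketch in C. Everything in it is an assumption made for the example: the enum regset values, struct var, struct allocator and allocate_var() are hypothetical names, and the allocator simply hands out consecutive indices in each namespace.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical register namespaces; the names are illustrative only. */
enum regset
{
    REGSET_NUMERIC,
    REGSET_SAMPLER,
    REGSET_TEXTURE,
    REGSET_UAV,
    REGSET_COUNT,
};

struct reg
{
    unsigned int id;
    bool allocated;
};

struct var
{
    /* Number of registers the variable needs in each namespace. */
    unsigned int size[REGSET_COUNT];
    /* One base register per namespace, instead of a single register. */
    struct reg regs[REGSET_COUNT];
};

struct allocator
{
    /* Next free register index in each namespace. */
    unsigned int next[REGSET_COUNT];
};

static void allocator_init(struct allocator *alloc)
{
    memset(alloc, 0, sizeof(*alloc));
}

/* Reserve a contiguous range in every namespace the variable uses;
 * namespaces it does not use are left unallocated. */
static void allocate_var(struct allocator *alloc, struct var *var)
{
    unsigned int i;

    for (i = 0; i < REGSET_COUNT; ++i)
    {
        var->regs[i].allocated = var->size[i] > 0;
        var->regs[i].id = 0;
        if (!var->regs[i].allocated)
            continue;
        var->regs[i].id = alloc->next[i];
        alloc->next[i] += var->size[i];
    }
}
```

With this layout, a struct like "pou" that needs two numeric and two texture registers would get a numeric base of 0 and a texture base of 0 (covering t0-t1), and the next variable that needs a texture would start at index 2, without either namespace affecting the other.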
In the future we would probably want to remove the "offset" field from the derefs, and make each hlsl_sm*.c compute the register offsets from the derefs in their index path form.
We definitely want to get rid of "offset" from the derefs.
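For what it's worth, computing a register offset from an index path could look roughly like the sketch below. It is only an illustration under simplifying assumptions: the type model and the functions type_numeric_size() and path_numeric_offset() are hypothetical, sizes are counted in whole numeric registers, object components are assumed to take no space in the numeric namespace, and real packing and alignment rules are ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified type model for illustration only. */
enum type_kind { KIND_NUMERIC, KIND_OBJECT, KIND_ARRAY, KIND_STRUCT };

struct type
{
    enum type_kind kind;
    unsigned int reg_count;            /* KIND_NUMERIC: registers occupied */
    const struct type *elem;           /* KIND_ARRAY: element type */
    unsigned int elem_count;           /* KIND_ARRAY: number of elements */
    const struct type *const *fields;  /* KIND_STRUCT: field types */
    unsigned int field_count;          /* KIND_STRUCT: number of fields */
};

/* Size of a type in numeric registers; objects contribute nothing. */
static unsigned int type_numeric_size(const struct type *type)
{
    unsigned int i, size = 0;

    switch (type->kind)
    {
        case KIND_NUMERIC:
            return type->reg_count;
        case KIND_OBJECT:
            return 0;
        case KIND_ARRAY:
            return type->elem_count * type_numeric_size(type->elem);
        case KIND_STRUCT:
            for (i = 0; i < type->field_count; ++i)
                size += type_numeric_size(type->fields[i]);
            return size;
    }
    return 0;
}

/* Walk an index path (array indices and field indices) from the variable's
 * type and accumulate the numeric register offset of the component. */
static unsigned int path_numeric_offset(const struct type *type,
        const unsigned int *path, size_t path_len)
{
    unsigned int offset = 0, idx, i;
    size_t p;

    for (p = 0; p < path_len; ++p)
    {
        idx = path[p];
        if (type->kind == KIND_ARRAY)
        {
            assert(idx < type->elem_count);
            type = type->elem;
            offset += idx * type_numeric_size(type);
        }
        else
        {
            assert(type->kind == KIND_STRUCT && idx < type->field_count);
            /* Skip over the fields that precede the selected one. */
            for (i = 0; i < idx; ++i)
                offset += type_numeric_size(type->fields[i]);
            type = type->fields[idx];
        }
    }
    return offset;
}
```

For a model of "pou" in which float4 takes one register, the path {2} (the "bar" field) yields an offset of 1 register, which is consistent with the 16-byte offset the native compiler reports for "bar".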