Re: [vkd3d] Handling object components within the HLSL compiler.

20 Sep 2022

Hello,
Sorry for the late reply too; I had to hit my head against the wall a 
couple of times to understand this better.
On 08-09-22 00:37, Zebediah Figura wrote:
...
Sorry for the late reply; I had to think about this for a while...
On 8/31/22 18:52, Francisco Casas wrote:
...
On 31-08-22 14:12, Zebediah Figura wrote:
...
I don't think I understand the point of splitting uniforms for sm4 but
not sm1. If we need to determine (and potentially store) the actual type
of a uniform per-component anyway, why would we need to do that
differently for structs and arrays?
Well, first, I don't think we should split uniforms completely, only
separate the object components as standalone variables.
So, if we have, say
struct {
    float4 foo;
    Texture2D tex[2];
    float4 bar;
} pou;
We would effectively end up with 3 variables:
The original:

struct {
    float4 foo;
    Texture2D tex[2];
    float4 bar;
} pou;

and:

Texture2D "pou.tex[0]";
Texture2D "pou.tex[1]";

Because we would be adding a store from the "pou.tex[0]" variable to
pou.tex[0], and from the "pou.tex[1]" variable to pou.tex[1], copy-prop
should replace all derefs of the Texture components with derefs of these
new variables.
The pou variable no longer has to worry about the register allocation of
its Texture fields, because the new variables do that.
Generalizing, variables should no longer care about their object
components, unless they are objects themselves.
But this goes both ways. The struct variables still contain multiple 
object types, and they'll need to be able to allocate one while ignoring 
the other.
This would make more sense if we were to remove the object types from 
the struct afterward. It's not perfectly clear to me that we want to do 
that vs. just keeping them there. A more interesting question is how 
this compares to cases in sm1 where a variable gets allocated to 
multiple register sets (float/int/bool).
...
I think something similar happens under the hood in SM4, consider the
output of the following shader in the native compiler:

struct {
    float4 foo;
    Texture2D tex[2];
    float4 bar;
} pou;
float4 main() : SV_TARGET
{
    return pou.bar + pou.tex[0].Load(int3(1, 2, 3))
            + pou.tex[1].Load(int3(1, 2, 3));
}

// Buffer Definitions:
//
// cbuffer $Globals
// {
//
//   struct <unnamed>
//   {
//
//       float4 foo;                    // Offset:    0
//       float4 bar;                    // Offset:   16
//
//   } pou;                             // Offset:    0 Size:    32
                                          // Textures:  t0-t1
//
// }
//
//
// Resource Bindings:
//
// Name                   Type      Format  Dim       HLSL Bind Count
// --------------------- ---------- ------- --------  --------- ------
// pou.tex[0]            texture    float4        2d         t0      1
// pou.tex[1]            texture    float4        2d         t1      1
// $Globals              cbuffer        NA        NA        cb0      1

Object components are listed separately in the resource bindings, and
they are effectively removed (or ignored?) in the definition of pou.
This is a more interesting reason. There are two caveats, though, one 
easy to handwave and one harder.
Firstly, it's worth noting that this doesn't happen with 4.0 or 4.1, 
only 5.0 (and 5.1 is a different beast anyway). Putting objects in a 
struct is illegal in 4.x, even if you're not mixing them with numeric 
types, but you can see a similar difference if you just declare a global 
array of objects.
Secondly, and more importantly, the original struct actually needs to 
retain some amount of information about the objects it contained, 
including which ones were actually used. Consider the following shader:
struct
{
    Texture2D one;
    Texture2D unused;
    float3 coords;
} apple;
struct
{
    // Texture2D unused;
    Texture2D three;
    float3 coords;
} banana;
float4 main() : SV_TARGET
{
    return apple.one.Load(apple.coords)
            + banana.three.Load(banana.coords);
}
If you move "unused" to the "banana" struct (as indicated by the 
comment), the shader itself stays the same, as does the binding table 
(in the RDEF section), but the constant buffers do not. The difference 
can be seen in the disassembly output, and corresponds to the 
"StartTexture" and "TextureSize" fields of D3D11_SHADER_VARIABLE_DESC.
...
This means, notably, that I don't think we can just remove object
variables from their structs.
Moreover, the resource bindings actually change too.
If "unused" is added to the banana struct, banana.three is bound to 't3' 
instead of 't2'.
So far I have noticed that if only a single texture within a 
struct/array is used, all of the textures within that struct/array are 
allocated.
Sampler components are a little different; they are allocated from the 
first one to the last one used.
In both cases it is indeed necessary to keep track of the usage of 
object components; this is also a requirement for the inference of SM1 
sampler dimensions.
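To make the two observed rules concrete, here is a minimal sketch in C. It is not vkd3d code: the function names and the "used" flag array are made up for illustration, and the sampler rule encodes my reading of "from the first one to the last one used" as "from the first sampler component up to and including the last used one", which is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of vkd3d: number of texture registers
 * reserved for a struct/array with `count` texture components, where
 * used[i] tells whether component i is referenced by the shader.
 * Observed rule: if any texture is used, all of them are allocated. */
static size_t reserved_texture_count(const bool *used, size_t count)
{
    size_t i;

    assert(used || !count);
    for (i = 0; i < count; ++i)
    {
        if (used[i])
            return count;
    }
    return 0;
}

/* Hypothetical helper for samplers. Assumed reading of the observed
 * rule: registers are reserved from the first sampler component up to
 * and including the last used one; nothing is reserved if none is used. */
static size_t reserved_sampler_count(const bool *used, size_t count)
{
    size_t i, reserved = 0;

    assert(used || !count);
    for (i = 0; i < count; ++i)
    {
        if (used[i])
            reserved = i + 1;
    }
    return reserved;
}
```

With the apple struct from the shader above (flags {true, false} for "one" and "unused"), the texture rule reserves 2 registers, which would be consistent with banana.three ending up on 't2'.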
...
...
With this, for both SM1 and SM4, each variable should only care about
the allocation of a single type of register, so we can use the register
offsets as we do now.
Maybe, but on the other hand, could we just have multiple registers per 
struct variable?  I.e. one per namespace (numeric, sampler, texture, 
uav...)
Well, after trying many things, and checking your example ahead, I 
realized that indeed we have to allow variables to span across multiple 
register spaces if we want to correctly imitate the register indexes 
assigned to each object by the native compiler.
After going for the approach of separating object type components as 
different variables, so that each variable is allocated in a single 
register space, my hope was to be able to determine which object 
components needed to be allocated before the point where they were split 
into separate variables. That proved to be quite difficult without 
having to call copy propagation and other passes twice. Also, keeping 
track of the struct's sampler component usage and then transferring that 
information to the new variables ended up a little ugly.
At least now I am more familiar with register allocation.
I have spent some days implementing a proposal for allocating variables 
in multiple register spaces. Sadly, this requires changing a lot of 
code, though I am trying to do it cleanly.
I am also translating my SM1 resource loads and long list of related 
patches to this approach to see if it fits those well.
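As a rough illustration of what "one register range per register space" could look like, here is a minimal sketch in C. All the names (reg_space, reg_range, variable, allocator, allocate_variable) are hypothetical and do not reflect the actual patches; the per-space counts are assumed to be known already, e.g. from usage tracking.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical register spaces; the names are illustrative only. */
enum reg_space
{
    REG_SPACE_NUMERIC,
    REG_SPACE_TEXTURE,
    REG_SPACE_SAMPLER,
    REG_SPACE_UAV,
    REG_SPACE_COUNT,
};

/* A contiguous range of registers within one register space. */
struct reg_range
{
    bool allocated;
    unsigned int first;
    unsigned int count;
};

/* A variable that may span several register spaces at once. */
struct variable
{
    const char *name;
    unsigned int required[REG_SPACE_COUNT]; /* registers needed per space */
    struct reg_range regs[REG_SPACE_COUNT]; /* result of the allocation */
};

/* Next free register index, tracked independently for each space. */
struct allocator
{
    unsigned int next_free[REG_SPACE_COUNT];
};

/* Give the variable one contiguous range in every space it needs. */
static void allocate_variable(struct allocator *alloc, struct variable *var)
{
    int space;

    assert(alloc && var);
    for (space = 0; space < REG_SPACE_COUNT; ++space)
    {
        struct reg_range *range = &var->regs[space];

        range->count = var->required[space];
        range->allocated = range->count > 0;
        range->first = range->allocated ? alloc->next_free[space] : 0;
        alloc->next_free[space] += range->count;
    }
}
```

Allocating a variable that needs 2 texture registers, followed by one that needs 1, places the second one at texture index 2, while their numeric ranges are counted separately.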
...
...
In the future we would probably want to remove the "offset" field from
the derefs, and make each hlsl_sm*.c compute the register offsets from
the derefs in their index path form.
We definitely want to get rid of "offset" from the derefs.
Regarding this, after some thought I am actually not sure. Consider a 
complex non-constant offset dereference such as 'foo[n][m]' in:
---
unsigned int n, m;
float4 main() : sv_target
{
    float4 foo[3][4] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11};
    return foo[n][m];
}
---
So far I haven't seen these being represented with a complex expression 
within the index of an instruction's register (I doubt that register 
indexes can be more complex than a reference to another register plus a 
constant offset).
Instead, a single temporary register is created and then referenced (r0 
in this case):
---
ishl r0.x, cb0[0].x, l(2)
iadd r0.x, r0.x, cb0[0].y
mov r0.x, x0[r0.x + 0].x
mov o0.xyzw, r0.xxxx
---
So it seems reasonable to me to keep the deref offset instruction to 
serve this purpose.
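For reference, the arithmetic that the ishl/iadd pair performs for 'foo[n][m]' can be written as a small C sketch. The helper name is made up, and it assumes that each array element occupies exactly one register, as a float4 does.

```c
#include <assert.h>

/* Hypothetical helper: flatten a two-level index path [n][m] into a
 * single register offset, for an array whose inner dimension has
 * `inner_count` elements of one register each. */
static unsigned int flatten_offset(unsigned int n, unsigned int m,
        unsigned int inner_count)
{
    assert(inner_count > 0);
    return n * inner_count + m;
}
```

For foo[3][4] the inner count is 4, so n * 4 is the same as n << 2, which matches "ishl r0.x, cb0[0].x, l(2)" followed by the iadd with m.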
I suggest we review this again once I send this patch series as a 
merge request.
Best regards,
Francisco
