On 5/10/22 08:31, Giovanni Mascellani wrote:
Hi,
sorry, I made a mess with the commit message and some of it remained in the subject line.
At any rate, I think this patch still requires some thought: changing hlsl_type_component_count(), while making sense in itself, impacts hlsl_compute_component_offset().
More in general, I think there are two different concepts that are still a bit too confused in the code:
- the "logical" component count, i.e., how many things you have to put
into an initializer;
- the "physical" register count, i.e., how many register that type
will require.
Most of base types take one component and one register, which is probably why we're confusing them right now. But objects don't (they take a component, but no registers, because they are pure entities, they don't have a value). I think doubles wouldn't either if we supported them (they take one component, but two registers).
Also, components are always packed (as far as I know), while registers have padding we have to take into account (for example, a matrix row is padded to a whole four registers; same for an array item; but these rules depend on the SM).
Therefore I think we should distinguish more clearly which of these two concept is used each time, and be sure that we're computing correctly both offsets and sizes based on either of them.
Consider for example:
struct foo { Texture2D t; float x; };
With this patch the field x would go to register offset 1, which doesn't look correct.
BTW, I just made up the words "component" and "register", and I think they're wrong already, because I guess that a register is technically four things of what I called "register" in this email. Maybe HLSL/DXBC gurus have better word proposals.
Yes, quite. Without trying to hold things up, it's probably past time we fixed this in a better way. Adding hlsl_fixmes for object arrays is feasible but has proven somewhat awkward.
The above summary is about accurate, although I will nitpick one thing—objects *do* have registers; they're just allocated in a different namespace and according to different rules.
There was some private discussion between myself and Matteo about this, but I think the correct thing to do is to deal in HLSL *only* in terms of components.
In fact, I would go so far as to say that reg_size and reg_offset should go away (or perhaps be accessed only from hlsl_sm4.c and hlsl_sm1.c—but better just to invent a new helper to calculate them from scratch, and stop storing them in the hlsl_type structure, I think).
I would also go so far as to say that hlsl.c should stop caring about matrix majority, and just use the same component ordering for both row-major and column-major matrices.
There are a few interesting consequences here:
* Doing the translation from "component-based" offsets to "register-based" offsets (i.e. getting the alignment and namespaces right) is actually relatively trivial; as Francisco pointed out, we basically just need to use hlsl_compute_component_offset() in hlsl_offset_from_deref().
* One thing we should do, though, is make a point of watching for non-contiguous loads in hlsl_offset_from_deref(). Because of field alignment, I don't think we'll ever generate any, but that fact doesn't inhere in the IR, and accordingly I'd rather not even assert() that it's the case. (In fact, if we ignore matrix majority, it may end up being quite easy to "accidentally" generate non-contiguous loads.)
* in order to generate less terrible code, we are going to want to do vectorization passes, and probably other passes as well, on sm1/sm4 IR (henceforth "LLIR"). This doesn't necessarily need to happen yet, since we can generate correct code without such optimizations, but it is another reason that introducing a "proper" LLIR will need to happen, as has been talked about often.