On Wed Oct 11 10:37:17 2023 +0000, Giovanni Mascellani wrote:
I'm not sure this is correct: it seems that the `r` and `x` registers are two independent namespaces, so the `id` that was allocated for the variable as an `r` register has no meaning if you want to use it as an `x` register. This has a few undesirable implications:
- You're wasting registers, because allocating an `r` register makes it impossible to use the corresponding `x` register, and vice versa. This might be minor, I guess, since a downstream compiler is quite likely to redo register allocation anyway. But it's not pretty.
- More importantly, you might end up allocating two different variables (with independent lifespans) on the same or conflicting `x` registers, thereby creating duplicate and possibly contradictory `dcl_indexable_temp` instructions. That doesn't look right.

I think the proper way forward here is to exclude indexable variables from the usual register allocation algorithm and run a separate round of `x` register allocation for them, in which I think lifetimes don't have to be considered (i.e., variables must use independent registers even if they are never alive at the same time). This last bit might require independent confirmation: I remember it from past experiments, but I might be mistaken.
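A minimal sketch of the two-pass scheme proposed above, in C and for illustration only. The `struct var` layout, the function names and `MAX_R_REGS` are hypothetical, not the compiler's actual data structures; lifetimes are modelled as instruction indices, and variables are assumed to be sorted by first write. Ordinary temporaries share `r` registers when their lifetimes don't overlap, while indexable variables are skipped in that pass and later receive distinct `x` indices with lifetimes deliberately ignored.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_R_REGS 64 /* hypothetical cap, large enough for this sketch */

/* Hypothetical variable descriptor; lifetimes are instruction indices. */
struct var
{
    bool indexable;           /* true if the variable must live in an x register */
    unsigned int first_write;
    unsigned int last_read;
    unsigned int reg;         /* rN index if !indexable, xN index if indexable */
};

/* Pass 1: lifetime-aware r allocation. Assumes vars are sorted by first_write,
 * so reusing a register whose previous occupant died earlier is safe.
 * Indexable variables are skipped entirely. Returns false if out of registers. */
static bool allocate_r_registers(struct var *vars, size_t count)
{
    unsigned int busy_until[MAX_R_REGS] = {0};
    bool in_use[MAX_R_REGS] = {false};

    for (size_t i = 0; i < count; ++i)
    {
        unsigned int r;

        if (vars[i].indexable)
            continue;
        /* Lowest register that is free or whose occupant is already dead. */
        for (r = 0; r < MAX_R_REGS; ++r)
        {
            if (!in_use[r] || busy_until[r] < vars[i].first_write)
                break;
        }
        if (r == MAX_R_REGS)
            return false;
        in_use[r] = true;
        busy_until[r] = vars[i].last_read;
        vars[i].reg = r;
    }
    return true;
}

/* Pass 2: x allocation in its own namespace. Lifetimes are ignored on purpose:
 * every indexable variable gets a distinct, permanent xN index. */
static void allocate_x_registers(struct var *vars, size_t count)
{
    unsigned int next = 0;

    for (size_t i = 0; i < count; ++i)
    {
        if (vars[i].indexable)
            vars[i].reg = next++;
    }
}
```

With two scalar temporaries live over [0, 3] and [5, 8] and two arrays live over [1, 4] and [6, 9], both temporaries end up in `r0`, while the arrays get `x0` and `x1` even though they are never alive at the same time.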
You are correct, these are independent namespaces and I am wasting registers. It didn't seem like a problem to me when I implemented this, but now I see that I should improve it.
Furthermore, after experimenting more with the native compiler, I realized that every `xN` register has its own namespace, so we don't need to worry about overlaps between arrays. For instance, for the following shader:
```
float a, b, c, d;
float e, f, g, h;
int i, j;

float4 main() : sv_target
{
    float arr1[8] = {a, a, b, b, c, c, d, d};
    float arr2[8] = {e, e, f, f, g, g, h, h};

    arr1[i] = arr2[i];
    arr2[j] = arr1[j];

    return 1000 * float4(arr1[0], arr1[4], arr2[0], arr2[4])
            + 100 * float4(arr1[1], arr1[5], arr2[1], arr2[5])
            + 10 * float4(arr1[2], arr1[6], arr2[2], arr2[6])
            + 1 * float4(arr1[3], arr1[7], arr2[3], arr2[7]);
}
```
the native compiler emits:
```
dcl_indexableTemp x0[8], 4
dcl_indexableTemp x1[8], 4
```
instead of:
```
dcl_indexableTemp x0[8], 4
dcl_indexableTemp x8[8], 4
```
as might be expected if an 8-element array occupied indices `x0` through `x7`, given that both arrays have overlapping lifetimes.
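To make the expected-versus-observed numbering concrete, here is a small C sketch contrasting the two schemes. Both function names are hypothetical and made up for illustration: `number_flat` models `x` registers as a single element-addressed space, while `number_independent` models the per-register namespaces the native compiler appears to use.

```c
#include <stddef.h>

/* Hypothetical flat scheme: x registers as one element-addressed space, so an
 * array of N elements would occupy N consecutive indices (x0[8] -> x0..x7). */
static void number_flat(const unsigned int *sizes, size_t count, unsigned int *out)
{
    unsigned int next = 0;

    for (size_t i = 0; i < count; ++i)
    {
        out[i] = next;
        next += sizes[i];
    }
}

/* Scheme consistent with the native output: each xN is an independent
 * namespace, so every array consumes exactly one index whatever its size. */
static void number_independent(size_t count, unsigned int *out)
{
    for (size_t i = 0; i < count; ++i)
        out[i] = (unsigned int)i;
}
```

For two 8-element arrays, `number_flat` yields indices 0 and 8 (the `x0`/`x8` one might have expected), while `number_independent` yields 0 and 1, matching the native output.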
Regarding the possibility of overlapping indices in the declarations, you're right about that too: it didn't occur to me, and it didn't show up in any of my tests.
So, yes, the safest thing would be to assign a permanent index to every indexable temp, regardless of its lifetime. I implemented that on 9/9.