The following thread is based partly on, and makes reference to, private
conversation, but for the sake of openness I've elected to post it to
wine-devel.
A long time ago, HLSL_IR_LOAD—then called HLSL_IR_DEREF—was this:
enum hlsl_ir_deref_type
{
    HLSL_IR_DEREF_VAR,
    HLSL_IR_DEREF_ARRAY,
    HLSL_IR_DEREF_RECORD,
};

struct hlsl_deref
{
    enum hlsl_ir_deref_type type;
    union
    {
        struct hlsl_ir_var *var;
        struct
        {
            struct hlsl_ir_node *array;
            struct hlsl_ir_node *index;
        } array;
        struct
        {
            struct hlsl_ir_node *record;
            struct hlsl_struct_field *field;
        } record;
    } v;
};

struct hlsl_ir_deref
{
    struct hlsl_ir_node node;
    struct hlsl_deref src;
};
Now, one problem with this is that it was kind of mean to RA and
liveness analysis. For example, a line of HLSL like
var.a.b = 2.0;
produced the following IR:
2: 2.0
3: deref(var)
4: @3.a
5: @4.b = @2
This is annoying because:
* to discover that "var" is written, @5 needs to reach upwards through
a deref chain;
* reaching through the deref chain requires lots of assert() statements;
* @3 implies that "var" is read, which it isn't (and, if we reach
upwards through the deref chain, @4 implies the same thing).
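To make the first two points concrete, here is a minimal sketch of what "reaching upwards" looks like. The types are cut-down stand-ins for the structures quoted above, and every name in it (deref_get_base_var() included) is hypothetical rather than anything that was in the tree:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the structures quoted above; hypothetical names. */
enum node_type { NODE_CONSTANT, NODE_DEREF };
enum deref_type { DEREF_VAR, DEREF_ARRAY, DEREF_RECORD };

struct var { const char *name; };
struct node { enum node_type type; };

struct deref
{
    enum deref_type type;
    union
    {
        struct var *var;
        struct { struct node *array, *index; } array;
        struct { struct node *record; unsigned int field; } record;
    } v;
};

struct ir_deref
{
    struct node node;   /* kept first so a node pointer can be cast back */
    struct deref src;
};

/* Follow a deref chain upwards to the variable it ultimately names. */
static struct var *deref_get_base_var(const struct deref *deref)
{
    const struct node *parent;

    for (;;)
    {
        switch (deref->type)
        {
            case DEREF_VAR:
                return deref->v.var;
            case DEREF_ARRAY:
                parent = deref->v.array.array;
                break;
            case DEREF_RECORD:
                parent = deref->v.record.record;
                break;
            default:
                assert(0);
                return NULL;
        }
        /* Nothing in the types guarantees the parent is a deref, so we
         * can only assert it. */
        assert(parent && parent->type == NODE_DEREF);
        deref = &((const struct ir_deref *)parent)->src;
    }
}
```

Every pass that wants to know which variable a store touches needs a walk like this, and each step relies on an assert() rather than on the structure of the IR.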
I proposed that instead of using generic node pointers, we could have
arbitrarily long deref chains encoded in the hlsl_deref structure
itself. [1]
There was some discussion of that, mostly in that thread and also on
IRC. Most of the concern was about being nicer to liveness analysis and
RA.
What ultimately ended up happening is that Matteo proposed numeric
(register) offsets calculated at parse time, which is fundamentally
similar to my idea except that it's a lot simpler to work with.
Interestingly, the problem of multiple register sets was brought up [2]:
From my testing it essentially does, yes, i.e. if you have
struct { int unused; float f; bool b; } s;

float4 main(float4 pos : POSITION) : POSITION
{
    if (s.b)
        return s.f;
    return pos;
}
then "s" gets allocated to registers b0-b2 and c0-c1, but only b2
and c1 are ever used.
So yeah, it makes things pretty simple. I can see how it would have
been a lot uglier otherwise.
I guess we've finally run into that ugliness now :-(
The ultimate conclusions to draw from this historical exercise are:
- what I said about "we used to have derefs handled like that" is mostly
correct, although not quite. We did use to have richer type
information, and we did decide that offsets calculated at parse time
were preferable to that type information, although I thought we at one
point had something like [1] in the tree, which we didn't. Anyway the
decision to use offsets calculated at parse time seems to have been
motivated only by simplicity. To be fair, at the time, it *was* simpler.
- [1] and the later patch that replaced it were mostly motivated by RA.
We will probably end up doing RA after SMxIR translation, but we may
very likely do RA *before* it as well (tracking e.g. SMx instructions
with register numbers instead of having def-use chains.) A more salient
concern is that I still don't like the idea of having instructions in
the tree that aren't actually translated (or translatable) to SMxIR,
which means that we shouldn't have instructions that yield e.g. structs.
The ugliness that we've run into is: how do we emit IR for the following
variable load?
struct apple
{
    int a;
    struct
    {
        Texture2D b;
        int c;
    } s;
} a;

/* in some expression */
func(a.s);
Unlike the SM1 example above, the register numbers don't match up.
Separately, it's kind of ugly that backend-specific details regarding
register size and alignment are leaking into the frontend so much.
Similarly, the amount of code that has to deal with matrix majority is
unfortunate.
The former problem can potentially be solved by embedding multiple
register offsets into hlsl_deref (one per register type). Neither this
nor the latter problem is prohibitive, and I was at one point in favour
of continuing to use register offsets everywhere, but at this point my
feeling has changed, and I think using register offsets is looking more
ugly than the alternatives. I get the impression that Francisco
disagrees, though, which is why we should probably hash this out now.
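For reference, a rough sketch of what "one offset per register type" would mean for the struct apple example. The names are hypothetical, and the size rule (one slot per int, one slot per texture, no packing or alignment) is an assumption made purely for illustration, not the rule of any shader model:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch; these are not the vkd3d definitions. */
enum reg_set { REG_SET_NUMERIC, REG_SET_TEXTURE, REG_SET_COUNT };
enum base_type { TYPE_INT, TYPE_TEXTURE, TYPE_STRUCT };

struct type
{
    enum base_type base;
    const struct type *const *fields;   /* TYPE_STRUCT only */
    unsigned int field_count;           /* TYPE_STRUCT only */
};

/* Add the size of 'type' to size[], one counter per register set.
 * Assumed rule: an int takes one numeric slot, a texture one texture slot. */
static void type_reg_size(const struct type *type, unsigned int size[REG_SET_COUNT])
{
    unsigned int i;

    switch (type->base)
    {
        case TYPE_INT:
            ++size[REG_SET_NUMERIC];
            break;
        case TYPE_TEXTURE:
            ++size[REG_SET_TEXTURE];
            break;
        case TYPE_STRUCT:
            for (i = 0; i < type->field_count; ++i)
                type_reg_size(type->fields[i], size);
            break;
    }
}

/* Offset of field 'idx' inside struct 'type', one value per register set. */
static void field_reg_offsets(const struct type *type, unsigned int idx,
        unsigned int offset[REG_SET_COUNT])
{
    unsigned int i;

    assert(type->base == TYPE_STRUCT && idx < type->field_count);
    for (i = 0; i < REG_SET_COUNT; ++i)
        offset[i] = 0;
    for (i = 0; i < idx; ++i)
        type_reg_size(type->fields[i], offset);
}

/* struct apple { int a; struct { Texture2D b; int c; } s; } */
static const struct type int_type = {TYPE_INT, NULL, 0};
static const struct type texture_type = {TYPE_TEXTURE, NULL, 0};
static const struct type *const inner_fields[] = {&texture_type, &int_type};
static const struct type inner_type = {TYPE_STRUCT, inner_fields, 2};
static const struct type *const apple_fields[] = {&int_type, &inner_type};
static const struct type apple_type = {TYPE_STRUCT, apple_fields, 2};
```

Under that assumed rule, field_reg_offsets(&apple_type, 1, offset) for "a.s" yields a numeric offset of 1 and a texture offset of 0, so a single number can no longer describe where the load starts.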
Nor do I think we should use both register offsets and component offsets
(either in the same node type, or in different node types). That just
makes the IR way more complicated. Rather, I think we should be doing
everything in *just* component offsets until translation from HLSL IR to
SMx IR.
In order to deal with the problem of translating dynamic offsets from
components to registers, I see three options:
(a) emit code at runtime, or do some sophisticated lowering,
(b) use special offsetof and sizeof nodes,
(c) introduce a structured deref type, much like [1]. Francisco was
actually proposing something like this, although with an array instead
of a recursive structure, which strikes me as an improvement.
My guess is that (a) is very hard. I haven't really tried to reason it
out, though.
Given a choice between (b) and (c), I'm more inclined to pick (c). It
makes the IR structure more restrictive, and those restrictions
fundamentally match the structured nature of the language we're working
with, both things I tend to like.
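As a strawman for (c), here is a sketch of a deref that carries a flat array of steps instead of a recursive structure. All names are hypothetical, the fixed path length is arbitrary, and the indices are constants only; a real version would need node pointers in the path to allow dynamic indexing:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of option (c); none of these names exist in the tree. */
enum type_kind { KIND_SCALAR, KIND_ARRAY, KIND_STRUCT };

struct type
{
    enum type_kind kind;
    const struct type *element;         /* KIND_ARRAY */
    unsigned int element_count;         /* KIND_ARRAY */
    const struct type *const *fields;   /* KIND_STRUCT */
    unsigned int field_count;           /* KIND_STRUCT */
};

struct var { const char *name; const struct type *type; };

/* A variable plus one step per nesting level: a field index when the
 * current type is a struct, an element index when it is an array. */
struct deref
{
    const struct var *var;
    unsigned int path[8];
    unsigned int path_len;
};

static unsigned int type_component_count(const struct type *type)
{
    unsigned int i, count = 0;

    switch (type->kind)
    {
        case KIND_SCALAR:
            return 1;
        case KIND_ARRAY:
            return type->element_count * type_component_count(type->element);
        case KIND_STRUCT:
            for (i = 0; i < type->field_count; ++i)
                count += type_component_count(type->fields[i]);
            return count;
    }
    return 0;
}

/* Resolve the whole path to a component offset in one step. */
static unsigned int deref_component_offset(const struct deref *deref)
{
    const struct type *type = deref->var->type;
    unsigned int i, j, idx, offset = 0;

    for (i = 0; i < deref->path_len; ++i)
    {
        idx = deref->path[i];
        switch (type->kind)
        {
            case KIND_ARRAY:
                assert(idx < type->element_count);
                offset += idx * type_component_count(type->element);
                type = type->element;
                break;
            case KIND_STRUCT:
                assert(idx < type->field_count);
                for (j = 0; j < idx; ++j)
                    offset += type_component_count(type->fields[j]);
                type = type->fields[idx];
                break;
            default:
                assert(0);  /* cannot index into a scalar */
                return offset;
        }
    }
    return offset;
}
```

The variable is always directly available as deref->var, so liveness never has to chase a chain, and the path can only describe accesses that the type actually allows.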
Note that either way we're going to need specialized functions to
resolve deref offsets in one step. I also think that should depend on
the domain—e.g. for copy-prop we'll actually want to do everything in
component counts, but when translating to SMxIR we'll evaluate given the
register alignment constraints of the shader model. In the case of (b)
it's not going to be as simple as running the existing constant folding
pass, because we can't actually fold the sizeof/offsetof constants
(unless we dup the node list, evaluate, and then fold, which seems very
hairy and more work than the alternative).
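To illustrate what I mean by resolving in one step per domain, here is a sketch where the same path is evaluated with two different size rules. Names are hypothetical, and the register rule (every scalar or vector occupies exactly one register) is an assumption for the example, not the actual constraint of any shader model:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch; the names and the register rule are assumptions. */
enum offset_domain { DOMAIN_COMPONENTS, DOMAIN_REGISTERS };
enum type_kind { KIND_NUMERIC, KIND_STRUCT };

struct type
{
    enum type_kind kind;
    unsigned int dim;                   /* KIND_NUMERIC: 1 to 4 components */
    const struct type *const *fields;   /* KIND_STRUCT */
    unsigned int field_count;           /* KIND_STRUCT */
};

/* Size of a type in the units of the given domain. */
static unsigned int type_size(const struct type *type, enum offset_domain domain)
{
    unsigned int i, size = 0;

    if (type->kind == KIND_NUMERIC)
        return domain == DOMAIN_COMPONENTS ? type->dim : 1;
    for (i = 0; i < type->field_count; ++i)
        size += type_size(type->fields[i], domain);
    return size;
}

/* Walk a path of struct field indices and return the offset of the final
 * field, measured in components or in registers. */
static unsigned int resolve_offset(const struct type *type, const unsigned int *path,
        unsigned int path_len, enum offset_domain domain)
{
    unsigned int i, j, offset = 0;

    for (i = 0; i < path_len; ++i)
    {
        assert(type->kind == KIND_STRUCT && path[i] < type->field_count);
        for (j = 0; j < path[i]; ++j)
            offset += type_size(type->fields[j], domain);
        type = type->fields[path[i]];
    }
    return offset;
}
```

For struct { float a; struct { float3 b; float2 c; } s; } and the path {1, 1} (that is, "s.c"), this gives 4 in the component domain and 2 in the register domain. Copy-prop would use the former and SMxIR translation the latter, without either number being stored in the IR.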
I invite thoughts—especially from Matteo, since we discussed this sort
of problem ages ago.
ἔρρωσθε,
Zeb
[1] https://www.winehq.org/pipermail/wine-devel/2020-April/164399.html
[2] https://www.winehq.org/pipermail/wine-devel/2020-April/165493.html
> We cannot reliably detect presence of QEMU
When I said "it should at least refuse a MAP_FIXED mmap that would overwrite things the JIT is using", the "it" I had in mind was qemu-user, not wine-preloader. As you say, wine-preloader doesn't have a good way to know it's running inside qemu. qemu could see that executing this MAP_FIXED would trample on the host process in ways that compromise the interpreter, and it could choose to make the mmap syscall return MAP_FAILED.
> I don't know if there is a significant portion of Wine user base that uses QEMU to run x86 apps on other architectures.
I'm actually doing the opposite, which has to be even more niche. We're using qemu-user and binfmt_misc to run unit tests of an aarch64 executable (built with wineg++) on our x86_64 CI server, as part of cross-compiling. This kind of usage has to be vanishingly rare (winelib seems quite rare period).
Sorry to have started a somewhat offtopic thread with my comment here. I don't think there's much for wine-preloader to do about this directly. I just stumbled on this merge request in the course of debugging my crash; it was in the same area, and it made me think of more alternatives. I had forgotten that using mremap to move `[vdso]` out of the way might even be an option. The main connection is that if this MR makes wine ask to move the vdso out of the way (rather than just having MAP_FIXED clobber it), that seems like something qemu-user might be able to implement and carry on. Just clobbering it seems much harder to tolerate.
--
https://gitlab.winehq.org/wine/wine/-/merge_requests/6#note_2890
Required by Cyberpunk 2077 for 4.0 audio setups (`PKEY_AudioEndpoint_PhysicalSpeakers` is `SPEAKER_FRONT_LEFT | SPEAKER_FRONT_RIGHT | SPEAKER_BACK_LEFT | SPEAKER_BACK_RIGHT`) as it then dereferences the device path property (`{b3f8fa53-0004-438e-9003-51a46e139bfc},2`) without checking if it is non-NULL.
The patches use `{1}.ROOT\MEDIA\%04u` which is used on Windows for virtual audio devices (e.g. the ones created by Voicemeeter Banana).
--
v2: winealsa.drv: Set device path for all devices.
winepulse.drv: Set device path for all devices.
https://gitlab.winehq.org/wine/wine/-/merge_requests/325