From what I can tell, the recent work on SampleBias/SampleLevel did all of the work for us, we just need to take the same framework and include the case for the `sample_l` instruction.
Tested with some shaders from the native Linux version of Little Racers STREET.
--
v5: vkd3d-shader/tpf: For sample_l/sample_b, set lod swizzle to SCALAR.
https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/188
> > In practice I see a number of difficulties:
> >
> > * Shaders are not just code; they also embed interface information (input and output signatures, uniforms, buffers, object bindings, ...) and other random metadata (for example for the hull shaders). Should this be representable in the common IR?
>
> We need *some* way to represent that information so it doesn't get lost, at least. I think we don't want side channels, to be clear: I think that there should be a point where all of the information relevant to a shader is represented in v_s_i format, including metadata.
>
> And contrary to the instructions themselves, I think we want this information to be in a relatively simple and well-designed form. In a sense, it doesn't need to be "complex" like the instructions since we're not doing optimization passes over it \[if that makes sense\]. I think Conor has basically been pushing v_s_i in this direction in order to accommodate sm6 (at least wrt semantics), and given the tribulations I've had to endure to get sm1 to work, I think that's the right approach.
Yes, although I don't think that necessarily means storing these as "instructions". The approach I'm personally leaning towards at this point is to rebrand "struct vkd3d_shader_instruction_array" as "struct vkd3d_shader_program" (or something along those lines), and then add the input/output/patch constant signatures and the shader version structure to that structure. I think a similar approach should work for e.g. RDEF as well.
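To make that concrete, here is a rough sketch of what such a structure could look like. Every type and name below is a simplified, hypothetical stand-in (the real vkd3d-shader structures have different and many more fields); it only shows the signatures and version living next to the instruction array rather than in a side channel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the actual vkd3d-shader definitions. */
struct shader_version
{
    int type;
    unsigned int major, minor;
};

struct shader_signature_element
{
    const char *semantic_name;
    unsigned int semantic_index;
    unsigned int register_index;
};

struct shader_signature
{
    struct shader_signature_element *elements;
    size_t element_count;
};

struct shader_instruction
{
    int opcode;
};

struct shader_instruction_array
{
    struct shader_instruction *elements;
    size_t count;
};

/* The "program": the instruction array plus the interface metadata. */
struct shader_program
{
    struct shader_version version;
    struct shader_instruction_array instructions;
    struct shader_signature input_signature;
    struct shader_signature output_signature;
    struct shader_signature patch_constant_signature;
};

/* Example consumer: a pass can look up an input by register index without
 * needing anything beyond the program itself. */
static const struct shader_signature_element *shader_program_find_input(
        const struct shader_program *program, unsigned int register_index)
{
    size_t i;

    assert(program);
    for (i = 0; i < program->input_signature.element_count; ++i)
    {
        if (program->input_signature.elements[i].register_index == register_index)
            return &program->input_signature.elements[i];
    }
    return NULL;
}
```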
--
https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/174#note_31631
This is causing several GStreamer-Video-CRITICAL messages to be printed while running the tests:
```
(wine:363): GStreamer-Video-CRITICAL **: 09:01:22.664: gst_video_info_from_caps: assertion 'gst_caps_is_fixed (caps)' failed
```
--
https://gitlab.winehq.org/wine/wine/-/merge_requests/2546#note_31623
Please note that with this MR the following `TRACE()` calls are removed:
- `"ALSA does not support volume control\n"`
- `"OSS doesn't support setting volume\n"`
- `"PulseAudio does not support session volume control\n"`
Should I perhaps move them to the specific drivers' unixlib instead?
--
v2: winepulse: Use mmdevapi's SimpleAudioVolume.
wineoss: Use mmdevapi's SimpleAudioVolume.
https://gitlab.winehq.org/wine/wine/-/merge_requests/2726
According to the tests in 23b72ad, when we are reading a compressed stream, the type returned by `stream_props_GetMediaType()` should reflect the compressed format even if we ultimately output uncompressed data. For example, if we use the wmvcore reader to read a WMV3 stream and output RGB24, the format information returned by `stream_props_GetMediaType()` should be WMV3, not RGB24.
--
v8: winegstreamer: Use codec format in stream_props_GetMediaType.
https://gitlab.winehq.org/wine/wine/-/merge_requests/2387
> > > If we want it to be usable for things like disassembly (and assembly), we need v_s_i to be able to express everything that *any* frontend or backend can.
>
> And I don't think we want it, it would be unmanageable.
>
> > should v-s IR exist just as a common, "neutral" format, and then we'd have basically frontend- and/or backend-specific IR that we'd do passes over?
>
> Ideally this would feel a good design to me. Though I would say that it is not a hard requirement for frontends and backends to have their specific IR, even if it is conceivable that past a certain complexity threshold it is difficult to do without. Also, my idea is that you can do passes over {front,back}end specific IRs, but also over the common IR. At least, I'd design the common IR in such a way that they can be done.
This is what I originally thought, but I'm having slight second thoughts now. The thing is that the tradeoff is now you need a new IR for every bytecode language you have, which in our case means at least sm1, sm4, and eventually I think sm6. That's a lot of extra support code to add. Consider how much we'd need to add by putting that intermediate step into HLSL -> smX translation. The advantage of making v_s_i the IR is that you don't actually need any of that.
It does feel conceptually ugly to have a CISC IR, but looking past that, I'm not sure it's that bad? Some additions (new opcodes, new register types) aren't a problem, since the backends can just vkd3d_shader_error(). Some additions (new source fields, new instruction flags) are worse, though, since now you need to make sure that nothing else cares...
It may also not be that much of a concern if the number of languages we have doesn't actually grow that large.
It may make sense to start with something underdesigned—which I think means CISC—and if that starts to get actually unwieldy, then we can start overdesigning a RISC IR to mediate.
> In practice I see a number of difficulties:
>
> * Shaders are not just code; they also embed interface information (input and output signatures, uniforms, buffers, object bindings, ...) and other random metadata (for example for the hull shaders). Should this be representable in the common IR?
We need *some* way to represent that information so it doesn't get lost, at least. I think we don't want side channels, to be clear: I think that there should be a point where all of the information relevant to a shader is represented in v_s_i format, including metadata.
And contrary to the instructions themselves, I think we want this information to be in a relatively simple and well-designed form. In a sense, it doesn't need to be "complex" like the instructions since we're not doing optimization passes over it [if that makes sense]. I think Conor has basically been pushing v_s_i in this direction in order to accommodate sm6 (at least wrt semantics), and given the tribulations I've had to endure to get sm1 to work, I think that's the right approach.
--
https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/174#note_31612
Zebediah Figura (@zfigura) commented about dlls/hrtfapo/main.c:
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * This library is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with this library; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA
> + */
> +
> +#include "hrtfapoapi.h"
> +#include "wine/debug.h"
> +
> +WINE_DEFAULT_DEBUG_CHANNEL(hrtfapo);
Let's use the existing xaudio2 channel here.
And in general, matching the style used in xaudio2 would probably be good.
--
https://gitlab.winehq.org/wine/wine/-/merge_requests/2727#note_31599
Please note that with this MR the following `TRACE()` calls are removed:
- `"ALSA does not support volume control\n"`
- `"OSS doesn't support setting volume\n"`
- `"PulseAudio does not support session volume control\n"`
Should I perhaps move them to the specific drivers' unixlib instead?
--
https://gitlab.winehq.org/wine/wine/-/merge_requests/2726
See https://bugs.winehq.org/show_bug.cgi?id=54832
I'm starting to see this particular FIXME in quite a few games (Escape Goat 2, Murder Miners, and Little Racers STREET to name a few), and since I'm not sure how to fix this I figured I could at least provide a test for someone that knows more SM4 than me :P
--
v2: tests: Add a test for arrays with an expression as the index.
https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/189
On Sat Apr 29 14:54:15 2023 +0000, Jinoh Kang wrote:
> To reduce complexity, I've tried refactoring the state management.
> Currently, the async has the following two state flags that deal with
> the state of the client that originally requested the async:
> - `signaled`: indicates whether the async object is signaled; the client
> may wait on the async handle before returning from the corresponding
> wine system call.
> - Valid transition: 0 -> 1.
> - This flag is set if and only if any of the following conditions are true:
> - Given that `!async->blocking`,
> - `async_wake_obj(async)` has been called when
> `async->unknown_status = 0`, or
> - `async_handoff(async, ...)` has been called with `get_error() !=
> STATUS_ALERTED && (async->pending || !NT_ERROR(get_error()) &&
> (get_error() != STATUS_PENDING || async->iosb->status != STATUS_PENDING)`.
> - (Regardless of `async->blocking`) `async_set_result(async, status,
> ...)` has been called with `async->terminated && (!async->alerted ||
> status != STATUS_PENDING)`.
> - Note that this flag exhibits slightly different behavior depending
> on `async->blocking`.
> - If `async->blocking`, it tries to block the requestor until the
> async operation has completed.
> - If `!async->blocking`, it tries to signal as soon as the initial
> status is known, _unless_ the wait handle is being closed, in which case
> it doesn't bother to touch the flag at all.
> - If signaled and a wait on the async object is satisfied, the
> requestor function should have performed the synchronous completion
> sequence (either directly or via APC\_ASYNC\_IO) before the wait is
> satisfied, unless `async->pending`.
> - `direct_result`: from Wine codebase, `a flag if we're passing result
> directly from request instead of APC`.
> - Valid transition: 1 -> 0 (after set to 1 from `create_request_async`).
> - Whenever async state is settled, the flag is set if all of the
> following conditions are true:
> - The async was created via create_request_async(), and
> - async_set_unknown_status() has not been called, and
> - The async handle has never been waited on (wait consumes the
> flag), and
> - Any of:
> - async_handoff() has not been called, or
> - From async_handoff(), the async is terminated or alerted synchronously.
> - The flag has the following effect:
> - For each async_terminate() call, it will send APC_ASYNC_IO if and
> only if !direct_result.
> - For each async handle wait satisfaction, it will call
> async_set_result() if and only if direct_result.
> We can observe the following from above:
> - The flags are defined in terms of what the flags *do* rather than what
> the value of the flags *mean*. It doesn't help that the transitions for
> each flag are quite scattered and sparse.
> - Their behaviors are dependent on the client-side async implementation,
> so reasoning about them requires verifying how the requestor side behaves.
> - Depending on other parts of the async object state, one or more flags
> may be left unobserved and thus ignored, effectively becoming a "don't
> care" bit.
> This increases complexity and hinders maintenance in my opinion. In
> particular, it's difficult to determine precisely *when* the flag is or
> should be set or unset.
> I'd propose that the constituent conditions of each flag be refactored
> into separate state variables first. Paradoxically, it may even help
> combine two or more binary flags into an enum as the semantics become
> simpler.
Independently of the above, I'm thinking about introducing a new `completed` flag and replacing `terminated` with `completed || alerted`.
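Roughly, the idea would look like the sketch below. The struct is a cut-down, hypothetical stand-in for the server's async object, and the exact meaning of `completed` is an assumption here; the point is only that `terminated` becomes a derived property instead of a stored flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in containing only the two fields discussed. */
struct async_state
{
    bool completed; /* assumed meaning: the final status has been determined */
    bool alerted;   /* the requestor has been asked to perform the I/O */
};

/* "terminated" is computed from the two stored flags. */
static bool async_is_terminated(const struct async_state *async)
{
    assert(async);
    return async->completed || async->alerted;
}
```

Every existing check of `async->terminated` would then become a call to the helper, so there is one fewer stored bit that can get out of sync.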
--
https://gitlab.winehq.org/wine/wine/-/merge_requests/499#note_31580
On Tue Apr 18 19:25:54 2023 +0000, Zebediah Figura wrote:
> > I don't think so. `async_terminate()` is currently no-op if the async
> has already been terminated with a status other than `STATUS_ALERTED`,
> and the patch maintains this invariant.
> Right, that's the bit I failed to think through.
> > It is possible to introduce a new boolean flag, but the flag cannot
> replace `terminate_status`. `terminate_status` can be any arbitrary
> value passed to the third parameter of `async_set_timeout`.
> Ah, and I missed that that was a possibility, although, in my defense,
> it's not stated in the patch subject either. I.e. this patch doesn't
> just fix the race where an alerted async is canceled, it also fixes the
> similar race where it times out.
> So yes, I think the patch makes sense as is. Maybe a comment to yet
> again overdocument asyncs wouldn't hurt? I easily see myself looking at
> code like this and having to ask myself, "when exactly is
> terminate_status set?" and the answer is "if (and only if?) the async is
> canceled or times out while it's alerted", which is nice to have
> documented and not have to puzzle through later.
To reduce complexity, I've tried refactoring the state management. Currently, the async has the following two state flags that deal with the state of the client that originally requested the async:
- `signaled`: indicates whether the async object is signaled; the client may wait on the async handle before returning from the corresponding wine system call.
- Valid transition: 0 -> 1.
- This flag is set if and only if any of the following conditions are true:
- Given that `!async->blocking`,
- `async_wake_obj(async)` has been called when `async->unknown_status = 0`, or
- `async_handoff(async, ...)` has been called with `get_error() != STATUS_ALERTED && (async->pending || !NT_ERROR(get_error()) && (get_error() != STATUS_PENDING || async->iosb->status != STATUS_PENDING)`.
- (Regardless of `async->blocking`) `async_set_result(async, status, ...)` has been called with `async->terminated && (!async->alerted || status != STATUS_PENDING)`.
- Note that this flag exhibits slightly different behavior depending on `async->blocking`.
- If `async->blocking`, it tries to block the requestor until the async operation has completed.
- If `!async->blocking`, it tries to signal as soon as the initial status is known, _unless_ the wait handle is being closed, in which case it doesn't bother to touch the flag at all.
- If signaled and a wait on the async object is satisfied, the requestor function should have performed the synchronous completion sequence (either directly or via APC\_ASYNC\_IO) before the wait is satisfied, unless `async->pending`.
- `direct_result`: from Wine codebase, `a flag if we're passing result directly from request instead of APC`.
- Valid transition: 1 -> 0 (after set to 1 from `create_request_async`).
- Whenever async state is settled, the flag is set if all of the following conditions are true:
- The async was created via create_request_async(), and
- async_set_unknown_status() has not been called, and
- The async handle has never been waited on (wait consumes the flag), and
- Any of:
- async_handoff() has not been called, or
- From async_handoff(), the async is terminated or alerted synchronously.
- The flag has the following effect:
- For each async_terminate() call, it will send APC_ASYNC_IO if and only if !direct_result.
- For each async handle wait satisfaction, it will call async_set_result() if and only if direct_result.
We can observe the following from above:
- The flags are defined in terms of what the flags *do* rather than what the value of the flags *mean*. It doesn't help that the transitions for each flag are quite scattered and sparse.
- Their behaviors are dependent on the client-side async implementation, so reasoning about them requires verifying how the requestor side behaves.
- Depending on other parts of the async object state, one or more flags may be left unobserved and thus ignored, effectively becoming a "don't care" bit.
This increases complexity and hinders maintenance in my opinion. In particular, it's difficult to determine precisely *when* the flag is or should be set or unset.
I'd propose that the constituent conditions of each flag be refactored into separate state variables first. Paradoxically, it may even help combine two or more binary flags into an enum as the semantics become simpler.
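As an illustration of defining state by meaning rather than by effect, here is a minimal sketch for `direct_result` alone. The enum and function names are hypothetical; the behavior encoded is only what is listed above (APC_ASYNC_IO on terminate iff `!direct_result`, `async_set_result()` on wait iff `direct_result`, and a wait consuming the flag).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical replacement for the direct_result bit: names how the result
 * reaches the requestor. */
enum async_result_path
{
    ASYNC_RESULT_VIA_REQUEST, /* corresponds to direct_result == 1 */
    ASYNC_RESULT_VIA_APC,     /* corresponds to direct_result == 0 */
};

/* async_terminate() sends APC_ASYNC_IO only on the APC path. */
static bool async_terminate_sends_apc(enum async_result_path path)
{
    return path == ASYNC_RESULT_VIA_APC;
}

/* A satisfied wait calls async_set_result() only on the request path. */
static bool async_wait_sets_result(enum async_result_path path)
{
    return path == ASYNC_RESULT_VIA_REQUEST;
}

/* A wait consumes the direct path; the only valid transition is
 * request -> APC, mirroring direct_result going 1 -> 0. */
static enum async_result_path async_result_path_after_wait(enum async_result_path path)
{
    (void)path;
    return ASYNC_RESULT_VIA_APC;
}
```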
--
https://gitlab.winehq.org/wine/wine/-/merge_requests/499#note_31579
I just realized one more thing. I think we are not properly following the specification for both sample_l and sample_b.
In the **Restrictions** section of the documentation for sample_b it is mentioned that the last src register must use a single component selector if it is not a scalar immediate. This "select_component" is also illustrated in the sample_l format:
```
sample_l[_aoffimmi(u,v,w)] dest[.mask], srcAddress[.swizzle], srcResource[.swizzle], srcSampler, srcLOD.select_component
```
Currently, we are not respecting that since sm4_src_from_node() sets
```
src->swizzle_type = VKD3D_SM4_SWIZZLE_VEC4;
```
by default.
It should be VKD3D_SM4_SWIZZLE_SCALAR in this case.
If I am not mistaken, a simple:
```
instr.srcs[3].swizzle_type = VKD3D_SM4_SWIZZLE_SCALAR;
```
after calling sm4_src_from_node() should do. The disassembly should show a single component in the swizzle of the last src register.
--
https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/188#note_31568
On Fri Apr 28 21:50:24 2023 +0000, Francisco Casas wrote:
> Currently, there doesn't exist a `[test todo]` Section for the
> shader_runner.
> This only seems to work (in this incomplete stage) because the
> shader_runner is handling the section the same as the previous one,
> `[pixel shader todo]`.
> The proper way to write this part is:
> ```
> [test]
> todo draw quad
> todo probe all rgba (0.25, 0, 0.25, 0)
> ```
> and remove the `todo`s in the next commit.
Done!
--
https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/188#note_31566
From what I can tell, the recent work on SampleBias/SampleLevel did all of the work for us, we just need to take the same framework and include the case for the `sample_l` instruction.
Tested with some shaders from the native Linux version of Little Racers STREET.
--
v3: vkd3d-shader/tpf: Add support for emitting sample_l instructions
tests: Add a test for SampleLevel() function.
https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/188
From what I can tell, the recent work on SampleBias/SampleLevel did all of the work for us, we just need to take the same framework and include the case for the `sample_l` instruction.
Tested with some shaders from the native Linux version of Little Racers STREET.
--
v2: vkd3d-shader/tpf: Add support for emitting sample_l instructions
https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/188
From what I can tell, the recent work on SampleBias/SampleLevel did all of the work for us, we just need to take the same framework and include the case for the `sample_l` instruction.
Tested with some shaders from the native Linux version of Little Racers STREET.
--
https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/188
~~This one's marked as a draft, as there seems to be a blocker with the method parameters.~~
~~The first commit totally works, _if_ the ddx/ddy parameters are literals - they do _not_ work when passing a variable of any kind. The test comes from tests/d3d12.c, so I'm mostly just trying to migrate that to the HLSL test suite, but it currently hits an assert before we get to the resource load (which does eventually work) and I'm not sure what's causing it:~~
```
vkd3d-compiler: libs/vkd3d-shader/tpf.c:3190: sm4_register_from_node: Assertion `instr->reg.allocated' failed.
```
~~Seems like it's surprised when we try to load from the constant buffer maybe?~~ Fixed!
--
v6: tests: Add a basic compilation test for SampleGrad() method.
vkd3d-shader/hlsl: Add support for SampleGrad() method
https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/184
On Fri Apr 28 14:26:01 2023 +0000, Ethan Lee wrote:
> Understood, test has been pushed!
40 isn't that bad, no. Our trigonometry tests do far worse.
I'm not particularly surprised that derivatives aren't very accurate anyway.
--
https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/179#note_31550
--
v3: uiautomationcore: Retrieve runtime ID on UiaReturnRawElementProvider thread to prevent a deadlock.
uiautomationcore: Implement IUIAutomation::GetFocusedElement{BuildCacheRequest}.
uiautomationcore: Implement UiaNodeFromFocus.
https://gitlab.winehq.org/wine/wine/-/merge_requests/2682
This MR introduces the driver mechanisms to handle dynamic events from the Wayland compositor, using wl_output events as the guiding use case (i.e., we want to update the win32u display settings when the host settings change).
In this design we create a dedicated thread to read and dispatch Wayland events received from the compositor. If a Wayland event handler wants some code to be run in the context of a particular HWND's thread, it can add an internal event to a custom queue we have for each (GUI enabled) thread. The ProcessEvents driver callback processes internal events from that queue. In order to wake up waiting threads we use a pipe to notify about new internal events, with the read end acting as the thread's host queue fd. This is similar to how winemac.drv works.
We use the aforementioned mechanisms to queue win32u display device updates to the desktop window thread. Since there are many pieces that need to fall into place, this MR gradually reaches the final design:
1. We first introduce the dedicated read/dispatch thread and handle events (and also display device updates if in the desktop process) in that thread.
2. We ensure access to Wayland output information is thread-safe and consistent (since in step 3 we will need to access it from a different thread).
3. We finally introduce per-thread internal event queues and, if we are in the desktop process, queue the display device update to the desktop window thread internal event queue. Note that the main portion of the wl_output event code is still handled in the dedicated read/dispatch thread.
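The per-thread queue with pipe notification can be sketched as follows. This is a simplified illustration, not the driver code: all names are hypothetical, and the dispatch thread is assumed to call `queue_push()` while the owning thread polls `notify_fds[0]` and calls `queue_process()` (the role of the ProcessEvents callback).

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

struct internal_event
{
    struct internal_event *next;
    void (*handler)(void *data);
    void *data;
};

struct thread_event_queue
{
    pthread_mutex_t mutex;
    struct internal_event *head, *tail;
    int notify_fds[2]; /* [0]: read end, used as the thread's queue fd; [1]: write end */
};

static int queue_init(struct thread_event_queue *queue)
{
    queue->head = queue->tail = NULL;
    if (pipe(queue->notify_fds) == -1) return -1;
    pthread_mutex_init(&queue->mutex, NULL);
    return 0;
}

/* Called from the read/dispatch thread: append the event and write one
 * token to the pipe so that the owning thread wakes up. */
static int queue_push(struct thread_event_queue *queue, void (*handler)(void *), void *data)
{
    struct internal_event *event;
    char token = 0;
    int ret = 0;

    assert(handler);
    if (!(event = malloc(sizeof(*event)))) return -1;
    event->next = NULL;
    event->handler = handler;
    event->data = data;

    pthread_mutex_lock(&queue->mutex);
    if (write(queue->notify_fds[1], &token, 1) == 1)
    {
        if (queue->tail) queue->tail->next = event;
        else queue->head = event;
        queue->tail = event;
    }
    else
    {
        free(event);
        ret = -1;
    }
    pthread_mutex_unlock(&queue->mutex);
    return ret;
}

/* Called from the owning thread: detach the pending events, drain one
 * token per event, then run the handlers outside the lock. Returns the
 * number of events processed. */
static int queue_process(struct thread_event_queue *queue)
{
    struct internal_event *event, *next;
    int count = 0;
    char token;

    pthread_mutex_lock(&queue->mutex);
    event = queue->head;
    queue->head = queue->tail = NULL;
    for (next = event; next; next = next->next)
    {
        if (read(queue->notify_fds[0], &token, 1) != 1) break;
    }
    pthread_mutex_unlock(&queue->mutex);

    for (; event; event = next)
    {
        next = event->next;
        event->handler(event->data);
        free(event);
        ++count;
    }
    return count;
}

/* Trivial example handler: increments the int passed as data. */
static void increment_handler(void *data)
{
    ++*(int *)data;
}
```

Writing the token while holding the lock keeps the number of bytes in the pipe equal to the number of queued events, so draining in `queue_process()` never blocks.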
--
https://gitlab.winehq.org/wine/wine/-/merge_requests/2712
--
v3: vkd3d-shader/ir: Normalise signatures and input/output registers to the Shader Model 6 pattern.
vkd3d-shader/tpf: Fail parsing if an input/output parameter order is > 2.
tests/d3d12: Test register relative addressing in vertex and pixel shaders.
vkd3d-shader: Introduce an internal sm6 signature structure.
vkd3d-shader/tpf: Return an error from vkd3d_shader_sm4_parser_create() if the parser failed.
vkd3d-shader/d3dbc: Return an error from vkd3d_shader_sm1_parser_create() if the parser failed.
https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/181
--
v2: vkd3d-shader/ir: Normalise signatures and input/output registers to the Shader Model 6 pattern.
vkd3d-shader/tpf: Fail parsing if an input/output parameter order is > 2.
tests/d3d12: Test register relative addressing in vertex and pixel shaders.
vkd3d-shader: Introduce an internal sm6 signature structure.
https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/181
--
v5: tests: Add a basic compilation test for SampleGrad() method.
vkd3d-shader/hlsl: Add support for SampleGrad() method
https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/184
SPIR-V already handled DSX/DSY, so only D3DBC/TPF needed new case blocks.
You'll notice that there's no test for this one - in addition to being a pretty straightforward translation for all possible formats, this feature uses the render target width/height and I wasn't sure if there was a good way to ensure that the test would always make sense.
Instead, I did the test manually, and it's what you'd expect (EDIT: Previously the test used a uniform which always optimized to 0, new test uses VPOS instead):
HLSL:
```
float4 main(float4 pos : sv_position) : sv_target
{
float4 x = ddx(pos.x);
float4 y = ddy(pos.y);
return x + y;
}
```
D3DBC:
```
ps_3_0
dcl_position0 vPos
mov r0.xyzw, vPos.xyzw
mov r1.x, r0.x
dsx r1.x, r1.x
mov r0.x, r0.yxxx
dsy r0.x, r0.x
mov r1.xyzw, r1.x
mov r0.xyzw, r0.x
add r0.xyzw, r1.xyzw, r0.xyzw
mov oC0.xyzw, r0.xyzw
```
DXBC-TPF:
```
ps_4_0
dcl_input_ps_siv linear v0.xyzw, position
dcl_output o0.xyzw
dcl_temps 2
mov r0.xyzw, v0.xyzw
mov r1.x, r0.x
dsx r1.x, r1.x
mov r0.x, r0.yxxx
dsy r0.x, r0.x
mov r1.xyzw, r1.x
mov r0.xyzw, r0.x
add r0.xyzw, r1.xyzw, r0.xyzw
mov o0.xyzw, r0.xyzw
ret
```
Fixes https://bugs.winehq.org/show_bug.cgi?id=54827
--
v6: tests: Add tests for ddx(), ddy() intrinsics.
https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/179
--
v4: tests: Add a basic compilation test for SampleGrad() method.
vkd3d-shader/hlsl: Add support for SampleGrad() method
https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/184
--
v5: tests: Add test for ddx(), ddy() intrinsics
vkd3d-shader/hlsl: Add support for ddx(), ddy() intrinsics.
https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/179
--
v3: tests: Add a basic compilation test for SampleGrad() method.
https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/184
This is actually fixing two issues:
- if we get a negative stride, then the video transform should have
already flipped the image vertically, so we don't need to flip it
again.
(This is another side effect for handling negative strides in
wg_transform).
- it also fixes a crash in AoEII, as the image source line offsets
were incorrectly computed with unsigned arithmetic (while signed
arithmetic was expected), resulting in incorrect sign propagation
from 32bit to 64bit integers.
Signed-off-by: Eric Pouech <epouech(a)codeweavers.com>
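
The second issue can be shown in isolation. The sketch below is an illustration under stated assumptions, not the patched code: the function names are hypothetical, and the row index is assumed to be a 32-bit unsigned value multiplied by a signed 32-bit stride before being widened to 64 bits.

```c
#include <stdint.h>

/* Unsigned arithmetic: stride is converted to unsigned, the 32-bit product
 * wraps around, and widening to 64 bits zero-extends it, so a negative
 * offset turns into a large positive one. */
static int64_t line_offset_unsigned(uint32_t y, int32_t stride)
{
    uint32_t offset = y * (uint32_t)stride;
    return offset;
}

/* Signed arithmetic: widen first, then multiply, so the sign is kept. */
static int64_t line_offset_signed(uint32_t y, int32_t stride)
{
    return (int64_t)y * stride;
}
```

With `y = 2` and `stride = -16`, the unsigned variant yields 4294967264 while the signed one yields -32.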
--
https://gitlab.winehq.org/wine/wine/-/merge_requests/2699
--
v3: vkd3d-shader/hlsl: Return an hlsl_ir_node pointer from hlsl_new_resource_store().
vkd3d-shader/hlsl: Return an hlsl_ir_node pointer from hlsl_new_resource_load().
vkd3d-shader/hlsl: Return an hlsl_ir_node pointer from hlsl_new_loop().
vkd3d-shader/hlsl: Pass an hlsl_block pointer to hlsl_new_loop().
vkd3d-shader/hlsl: Reuse the "init" instruction list if possible in create_loop().
https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/176
--
v13: winepulse: Use mmdevapi's AudioSessionControl.
wineoss: Use mmdevapi's AudioSessionControl.
winecoreaudio: Use mmdevapi's AudioSessionControl.
winealsa: Move AudioSessionControl into mmdevapi.
https://gitlab.winehq.org/wine/wine/-/merge_requests/2669
On Fri Apr 28 05:16:50 2023 +0000, Georg Lehmann wrote:
> We are not supporting video extensions at the moment because
> `make_vulkan` wasn't updated to support more than one xml file.
VK_NV_displacement_micromap isn't final yet, we don't enable provisional extensions in winevulkan.
--
https://gitlab.winehq.org/wine/wine/-/merge_requests/2549#note_31470
On Fri Apr 28 05:16:50 2023 +0000, Oscar Barenys wrote:
> sorry for asking here if not appropriate, but just asking:
> just compiled a bunch of NV VK samples on Windows and was going to test
> on Linux via Wine..
> and noticed the KHR video decode sample:
> https://github.com/nvpro-samples/vk_video_samples/tree/main/vk_video_decoder
> and also the new DMM sample (VK_NV_displacement_micromap):
> https://github.com/nvpro-samples/vk_mini_samples/tree/main/samples/mm_displ…
> fail to run under wine..
> any reason for not supporting the "released" KHR video (decoding) exts:?
> ```
> VK_KHR_video_decode_queue
> VK_KHR_video_queue
> VK_KHR_video_decode_h264
> VK_KHR_video_decode_h265
> ```
> and also the new VK_NV_displacement_micromap released with 1.3.245 spec?
We are not supporting video extensions at the moment because `make_vulkan` wasn't updated to support more than one xml file.
--
https://gitlab.winehq.org/wine/wine/-/merge_requests/2549#note_31469
--
v12: winepulse: Use mmdevapi's AudioSessionControl.
wineoss: Use mmdevapi's AudioSessionControl.
winecoreaudio: Use mmdevapi's AudioSessionControl.
winealsa: Move AudioSessionControl into mmdevapi.
https://gitlab.winehq.org/wine/wine/-/merge_requests/2669
--
v11: winepulse: Use mmdevapi's AudioSessionControl.
wineoss: Use mmdevapi's AudioSessionControl.
winecoreaudio: Use mmdevapi's AudioSessionControl.
winealsa: Move AudioSessionControl into mmdevapi.
https://gitlab.winehq.org/wine/wine/-/merge_requests/2669
---
This must be the longest ignored d3d test failure. I think my r200 GPU
does not show this behavior, but my r500 one does. I'll be able to check
next week if anyone cares.
--
https://gitlab.winehq.org/wine/wine/-/merge_requests/2717
The intent of this series is to fix unexpected paths in the module list.
This happens:
- when running under old / (new) wow64
- when main module is located under the syswow64 directory
- the 32 bit modules are stored in LdrData (and then exposed through
a couple of ways) under syswow64 (they are normally stored under
system32, letting the redirection come into play when needed)
This triggers a couple of errors in winetest (as we're using
c:\windows\syswow64\msinfo32.exe in many tests to trigger a wow64
process from a winetest program).
This is the fix awaited in MR!2497.
@julliard: I'm not 100% happy with the fix itself, as it reintroduces
references to the redirected DLLs in ntdll/PE, but I couldn't find a better idea.
--
v2: ntdll: Store system DLL in LdrData under system32 for a wow64 process.
kernel32: Harden some wow64 module tests.
https://gitlab.winehq.org/wine/wine/-/merge_requests/2578
This is required by https://bugs.winehq.org/show_bug.cgi?id=54660 .
--
v9: vkd3d-shader/hlsl: Consider duplicated input semantic types equivalent in SM1.
vkd3d-shader/hlsl: Handle possibly different types in input semantic var load.
vkd3d-shader/hlsl: Error out when a semantic is used with incompatible types.
vkd3d-shader/hlsl: Error out when an output semantic is used more than once.
vkd3d-shader/hlsl: Support semantics for array types.
vkd3d-shader/hlsl: Don't create semantic vars more than once.
vkd3d-shader/hlsl: Move get_array_size() and get_array_type() to hlsl.c.
tests: Test duplicated semantics.
tests: Test array types with semantics.
vkd3d-shader/hlsl: Avoid invalid input/output copies for non-numeric types.
tests: Map unindentified hrs on compilation.
tests: Allow invalid vertex shader tests.
tests: Expect S_OK result on [vertex shader].
https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/148
This series allows linking of our d3dx9.dll against d3dcompiler.lib from the Windows SDK. It presumably also fixes the d3dcompiler tests, although I didn't bother building them with Visual Studio so far.
And yes, I did check that the stringification macro works on MSVC.
--
v3: d3dcompiler: Make D3DAssemble a private export.
d3dcompiler/tests: Load D3DAssemble via GetProcAddress.
d3dx9: Load D3DAssemble via GetProcAddress.
https://gitlab.winehq.org/wine/wine/-/merge_requests/2705
On Thu Apr 27 13:22:42 2023 +0000, Giovanni Mascellani wrote:
> This means that expressions of type other than `float` are cast to
> `float`. In principle that should be correct in the end, because AFAIU
> the sign should always be mapped correctly, but it makes me a bit
> nervous. Could the type of zero be set to coincide with the type of the argument?
Pushed an update to do this, since if anything it should be an optimization. As part of the update I've also added int versions of all the tests, to make sure it does the right thing for both float and non-float types.
--
https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/178#note_31399
Printing using EMF spool files is still missing some features and has some bugs:
- ResetDC should generally work on the gdi side but is ignored on playback; because of that, it hasn't really been tested
- no support for ExtEscape / Escape yet
- DC bounds are incorrect
To enable printing using EMF spool files, the printer's print processor needs to be changed to wineps.
--
v3: gdi32: Add GdiIsMetaPrintDC implementation.
https://gitlab.winehq.org/wine/wine/-/merge_requests/2718
--
v2: gdi32: Add GdiIsMetaPrintDC implementation.
gdi32: Implicitly call StartPage while creating spool file.
gdi32: Add support for creating EMF spool files.
gdi32: Factor out emf_create helper.
gdi32: Factor out emf_eof helper.
gdi32: Improve EMF DC cleanup when CloseEnhMetafile is not called.
gdi32: Store the printer info in a structure.
wineps: Reset current position on every page.
localspl: Validate datatype in StartDocPrinter.
localspl: Add support for PRINTER_ATTRIBUTE_RAW_ONLY printer attribute.
https://gitlab.winehq.org/wine/wine/-/merge_requests/2718
On Thu Apr 27 01:58:24 2023 +0000, Zebediah Figura wrote:
> > Among other things, this means that it's still not completely clear to
> > me what you mean by "maximally CISC" or "minimally CISC" (and while I
> > think I sort of get the general idea for the terms, that's not nearly
> > enough to understand options (3) and (4)).
> Ah, I'm sorry. To put it into more concrete terms: HLSL is intentionally
> a quite simple IR. I would describe it as erring on the side of RISC in
> design. Everything operates on SSA values except for extern and resource
> loads; single instructions do pretty much maximally modular things; we
> avoid adding new expression types or instruction types if we can instead
> just lower them to simpler expressions immediately.
> By contrast, both sm1 and sm4 are more CISC. One instruction can do
> multiple arithmetic operations (abs, neg, sat) in addition to whatever
> else it's doing; instructions load directly from multiple types of
> registers (instead of always going through an SSA value, or even just
> always going through a temporary register).
> Part of being able to make statements like "HLSL IR is RISC" is the
> knowledge that it used to be less so (e.g. we used to have
> HLSL_IR_CONSTRUCTOR, which I assume you can guess the idea of) and also
> that we've considered making it less so. On the basis that one HLSL
> instruction corresponds to one smX instruction we've considered adding
> some of those features to HLSL IR (for example, making hlsl_src hold a
> union that includes not just SSA values but also immediate constants or
> something similar to hlsl_deref). We eventually decided against those on
> the grounds that it would make the IR more complex, and harder to reason
> about when doing optimization passes.
> > * In general, I think that code processing should be done in the form
> > of IR passes as much as possible, rather than be embedded in the
> > frontends or backends. This helps modularity and code sharing. I think
> > Conor's patches go into this direction, which makes sense to me.
> > Frontends and backends already care about serialization and
> > deserialization and should not be loaded with excessive other duties.
> Yeah. Well, in a sense the question can be rephrased as: should v-s IR
> exist just as a common, "neutral" format, and then we'd have basically
> frontend- and/or backend-specific IR that we'd do passes over? E.g. in
> this case the backend IR would probably be adapted from struct
> sm4_instruction, which currently exists just to be a slightly more
> structured version of the byte code, but could grow to be more than
> that. It seems we probably won't be moving this way, but that's kind of
> what I was envisioning with that proposal. In that case struct
> vkd3d_shader_instruction wouldn't have any optimization passes done over
> it, the only raison d'être would be to help allow mixing any frontend
> with any backend.
> > * Currently my understanding is that `vkd3d_shader_instruction` is
> > basically modeled after SM4. When converting SM4 -> SPIR-V, the SM4 code
> > is basically deserialized to `vkd3d_shader_instruction` and then
> > rewritten to SPIR-V in a rather naive way. The deserialization step is
> > very syntactical, to the point that the original SM4 code can be
> > faithfully disassembled from the IR.
> I believe that vkd3d_shader_instruction is modeled after (or even
> "adapted directly from") struct wined3d_shader_instruction, which was
> designed to handle both sm1 and sm4. Fundamentally the formats are
> relatively similar, enough that it's possible to write a single
> disassembly routine, and a single GLSL shader backend, that mostly
> handles both, although there did need to be a lot of version-specific
> code in the latter case.
> > If we want `vkd3d_shader_instruction` to be flexible enough to support
> > different frontends and backends, I think it must somehow be unchained
> > from SM4. In particular, SM4 disassembling has to go through a different path.
> Right, and there's part of the rub. If we want it to be usable for
> things like disassembly (and assembly), we need v_s_i to be able to
> express everything that *any* frontend or backend can. This ends up kind
> of bloating the structure, which is one of the reasons I'm not sure we
> want that "maximally CISC" kind of IR.
> Like I mentioned, the less complicated an IR is, the easier it is to
> work with. On the other hand, the kind of passes we'd be potentially
> doing over a maximally CISC v_s_ir aren't the same as the work we do
> with HLSL IR. HLSL has to bridge the gap from text all the way down to
> byte code, but v_s_ir would potentially just be a bunch of peepholes.
> The fact that it doesn't have to deal with *types*, or well, doesn't
> have to deal with data structures, is already quite a benefit. So I'm
> not sure anymore that that per se is a concern.
> And of course when trading off one complex IR against multiple (ideally
> less complex) IRs, it takes some judgement to decide which is the best option.
Thanks for the explanation. Keeping in mind my remark about salt and cargo ships from yesterday, here are my thoughts.
> If we want it to be usable for things like disassembly (and assembly), we need v_s_i to be able to express everything that *any* frontend or backend can.
And I don't think we want that; it would be unmanageable.
> should v-s IR exist just as a common, "neutral" format, and then we'd have basically frontend- and/or backend-specific IR that we'd do passes over?
Ideally, this feels like a good design to me. Though I would say that it is not a hard requirement for frontends and backends to have their own specific IR, even if it is conceivable that past a certain complexity threshold it is difficult to do without. Also, my idea is that you can do passes over {front,back}end-specific IRs, but also over the common IR. At least, I'd design the common IR so that such passes are possible.
In practice I see a number of difficulties:
* Shaders are not just code; they also embed interface information (input and output signatures, uniforms, buffers, object bindings, ...) and other random metadata (for example for the hull shaders). Should this be representable in the common IR?
* How essential (or "RISC") should the IR be? Should it be scalar or vector? Making it more essential means that frontends have to do more work and common IR passes are easier to write; making it less so means that backends have to do more work, though it also has the advantage that passes that make sense for different frontends and backends can be deduplicated.
Maybe I am just rehashing the same elements around and around. It's quite hard to design this thing...
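To make the "essential vs. less essential" trade-off a bit more concrete, here is a minimal sketch of a pass that lowers "CISC"-style source modifiers (neg, abs) into separate instructions. Every type and function name below is hypothetical and exists only for this sketch; none of it is the actual vkd3d-shader IR.

```c
#include <stddef.h>

/* All names below are hypothetical and exist only for this sketch. */
enum sketch_op { OP_MOV, OP_NEG, OP_ABS, OP_ADD };
enum sketch_modifier { MOD_NONE, MOD_NEG, MOD_ABS };

/* "CISC"-style source: a register plus a modifier applied on read. */
struct sketch_src
{
    unsigned int reg;
    enum sketch_modifier mod;
};

struct sketch_instr
{
    enum sketch_op op;
    unsigned int dst;
    struct sketch_src src[2];
    unsigned int src_count;
};

/* Lower one instruction into out[], emitting a separate NEG or ABS
 * instruction into a fresh temporary for every modified source.
 * out[] must have room for in->src_count + 1 entries.
 * Returns the number of instructions written. */
static size_t lower_modifiers(const struct sketch_instr *in,
        struct sketch_instr *out, unsigned int *next_temp)
{
    struct sketch_instr lowered = *in;
    size_t count = 0;
    unsigned int i;

    for (i = 0; i < in->src_count; ++i)
    {
        struct sketch_instr pre = {0};

        if (in->src[i].mod == MOD_NONE)
            continue;

        pre.op = in->src[i].mod == MOD_NEG ? OP_NEG : OP_ABS;
        pre.dst = (*next_temp)++;
        pre.src[0].reg = in->src[i].reg;
        pre.src[0].mod = MOD_NONE;
        pre.src_count = 1;
        out[count++] = pre;

        /* The original instruction now reads the unmodified temporary. */
        lowered.src[i].reg = pre.dst;
        lowered.src[i].mod = MOD_NONE;
    }

    out[count++] = lowered;
    return count;
}
```

A frontend for a format with source modifiers would have to run something like this before handing code to a more "essential" common IR, while a backend for such a format would want the inverse pass to fold the extra instructions back in.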
--
https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/174#note_31369
--
v2: d3dcompiler: Make D3DAssemble a private export.
d3dcompiler/tests: Load D3DAssemble via GetProcAddress.
d3dx9: Load D3DAssemble via GetProcAddress.
https://gitlab.winehq.org/wine/wine/-/merge_requests/2705
--
v10: winepulse: Use mmdevapi's AudioSessionControl.
wineoss: Use mmdevapi's AudioSessionControl.
winecoreaudio: Use mmdevapi's AudioSessionControl.
winealsa: Move AudioSessionControl into mmdevapi.
https://gitlab.winehq.org/wine/wine/-/merge_requests/2669
On Thu Apr 27 06:51:02 2023 +0000, Nikolay Sivov wrote:
> How bad would it be to duplicate tests in msxml3/4/6, removing version
> checks completely?
could try that. don't think it'd be too bad
would it be just for the tests that are different? would msxml3 remain the main place for tests with common results and msxml4/6-specific tests under dlls/msxml[46]?
--
https://gitlab.winehq.org/wine/wine/-/merge_requests/895#note_31345
--
v9: winepulse: Use mmdevapi's AudioSessionControl.
wineoss: Use mmdevapi's AudioSessionControl.
winecoreaudio: Use mmdevapi's AudioSessionControl.
winealsa: Move AudioSessionControl into mmdevapi.
https://gitlab.winehq.org/wine/wine/-/merge_requests/2669
custom.c does not use the test.h and debug.h definitions and presumably has a good reason not to. But this resulted in the printf-format compiler attribute getting lost. So copy it from debug.h to allow proper checks of the ok_() format string.
Also fix said format strings.
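For readers unfamiliar with the attribute in question, here is a minimal sketch of how a printf-format attribute is attached to a variadic test helper. The macro `SKETCH_PRINTF_ATTR` and the function `sketch_ok` are hypothetical stand-ins, not the actual definitions from debug.h or custom.c; only the underlying GCC/Clang `format` attribute is real.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical macro name; GCC and Clang provide the underlying attribute. */
#if defined(__GNUC__)
#define SKETCH_PRINTF_ATTR(fmt, args) __attribute__((format(printf, fmt, args)))
#else
#define SKETCH_PRINTF_ATTR(fmt, args)
#endif

/* Hypothetical stand-in for a helper such as ok_(): argument 2 is the
 * format string and the variadic arguments start at position 3. */
static int sketch_ok(int condition, const char *fmt, ...) SKETCH_PRINTF_ATTR(2, 3);

static int sketch_ok(int condition, const char *fmt, ...)
{
    va_list args;

    if (condition)
        return 1;

    va_start(args, fmt);
    fprintf(stderr, "Test failed: ");
    vfprintf(stderr, fmt, args);
    va_end(args);
    return 0;
}
```

With the attribute in place, a call whose format specifier does not match the argument size (for example, passing a 64-bit value to `%u`) should be reported by `-Wformat`; without it, such mismatches go unnoticed.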
--
v2: msi/tests: ok_() takes printf-style arguments.
msi/tests: Fix the ok() formats so they match the size of their arguments.
https://gitlab.winehq.org/wine/wine/-/merge_requests/2692
This patch makes index expressions on resources hlsl_ir_index nodes
instead of hlsl_ir_resource_load nodes, because it is not known if they
will be used later as the lhs of an hlsl_ir_resource_store.
For now, the only benefit is consistency.
--
https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/182
webview2 uses this function to locate media foundation.
I also saw the UWP version of FH5 calling this function as well, interestingly.
--
v4: kernelbase: Add GetPackagesByPackageFamily stub.
https://gitlab.winehq.org/wine/wine/-/merge_requests/2713