This is the last thing needed to support non-constant offset dereferences in SM4.
It allows performing relative addressing on temps.
Besides this, I have additional patches for relative addressing on uniforms, and input and output semantics, but these may not be useful for now, since we copy all these variables into temps instead of using them directly.
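To make "relative addressing on temps" concrete: in SM4 assembly an indexable temp can be read as `x0[r1.x + 2]`, where the index combines a constant offset with a value read from a temp at run time. The sketch below models that evaluation and one plausible way of choosing the operand's index representation. The struct and function names are hypothetical, and the enum values are assumed to follow the D3D10 tokenized program format (`D3D10_SB_OPERAND_INDEX_*`); check them against the headers before relying on them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Index representation of an SM4 operand index. Values assumed to match
 * the D3D10 tokenized program format; the names are illustrative. */
enum index_representation
{
    INDEX_IMMEDIATE32 = 0,
    INDEX_RELATIVE = 2,
    INDEX_IMMEDIATE32_PLUS_RELATIVE = 3,
};

/* Hypothetical register index: a constant offset plus an optional
 * relative part read from a temp component such as r1.x. */
struct register_index
{
    unsigned int offset;
    bool has_rel_addr;
    const int *rel_addr; /* runtime value of e.g. r1.x */
};

/* Pick the cheapest representation that can express the index. */
static enum index_representation index_representation(const struct register_index *idx)
{
    if (!idx->has_rel_addr)
        return INDEX_IMMEDIATE32;
    return idx->offset ? INDEX_IMMEDIATE32_PLUS_RELATIVE : INDEX_RELATIVE;
}

/* Evaluate x#[offset + rel] against an indexable temp of `size` elements. */
static int read_indexable_temp(const int *x, size_t size, const struct register_index *idx)
{
    size_t i = idx->offset;

    if (idx->has_rel_addr)
        i += (size_t)*idx->rel_addr;
    assert(i < size); /* out-of-range (or negative) indices are not modelled */
    return x[i];
}
```

With `int x0[4] = {10, 20, 30, 40}` and `r1.x == 1`, the index `{2, true, &r1_x}` reads `x0[3]`, i.e. 40, and is written as immediate-plus-relative.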
--
v6: vkd3d-shader/tpf: Support relative addressing for indexable temps in SM4.
vkd3d-shader/tpf: Move sm4_register_from_node() up.
vkd3d-shader/tpf: Support writing relative addressing indexes.
vkd3d-shader/tpf: Write register index addressing.
vkd3d-shader/tpf: Encode dst and src registers using the same function.
tests: Add additional relative addressing tests.
tests: Rename array-index-expr.shader_test as non-const-indexing.shader_test.
https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/435
gcc tends to optimize away the magic field cleanup, so a freed query
can still appear to be allocated.
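For context on why SecureZeroMemory() helps: a store to an object that is about to be freed is a dead store, so gcc is allowed to drop it, whereas stores made through a volatile-qualified pointer are treated as side effects and kept in practice. The portable sketch below shows the idea; `struct query`, `QUERY_MAGIC`, `secure_zero()` and the other helpers are hypothetical and do not reflect pdh's actual code, and real code would call free() right after clearing the magic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical query object and magic value; not pdh's definitions. */
#define QUERY_MAGIC 0x50444851u

struct query
{
    uint32_t magic;
    int other_state;
};

/* Clear memory through a volatile pointer so the compiler keeps every
 * store even if the object is freed immediately afterwards. Mirrors the
 * intent of SecureZeroMemory(), not its implementation. */
static void secure_zero(void *ptr, size_t size)
{
    volatile unsigned char *p = ptr;

    while (size--)
        *p++ = 0;
}

static int is_valid_query(const struct query *query)
{
    return query->magic == QUERY_MAGIC;
}

static void destroy_query(struct query *query)
{
    assert(is_valid_query(query));
    /* A plain "query->magic = 0;" right before free() is a dead store
     * that gcc may remove; the volatile stores are kept. */
    secure_zero(&query->magic, sizeof(query->magic));
    /* Real code would free(query) here. */
}
```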
Signed-off-by: Eric Pouech <epouech(a)codeweavers.com>
--
v3: pdh: Zero out magic fields with SecureZeroMemory().
https://gitlab.winehq.org/wine/wine/-/merge_requests/4323
gcc tends to optimize away the magic field cleanup, so a freed query
can still appear to be allocated.
Signed-off-by: Eric Pouech <epouech(a)codeweavers.com>
--
v2: pdh: Zero out magic fields with SecureZeroMemory().
https://gitlab.winehq.org/wine/wine/-/merge_requests/4323
Signed-off-by: Nikolay Sivov <nsivov(a)codeweavers.com>
--
v5: vkd3d-shader/tpf: Initial support for writing fx_4_0/fx_4_1 binaries.
vkd3d-shader: Add separate binary target type for effects.
vkd3d-shader/hlsl: Handle effect group statement.
vkd3d-shader/hlsl: Add variables for techniques.
vkd3d-shader/hlsl: Rename rule for top-level techniques.
vkd3d-shader/hlsl: Add 'fxgroup' token.
tests: Add some tests for effects groups syntax.
https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/443
Signed-off-by: Nikolay Sivov <nsivov(a)codeweavers.com>
--
v4: vkd3d-shader/tpf: Initial support for writing fx_4_0/fx_4_1 binaries.
vkd3d-shader/hlsl: Handle effect group statement.
vkd3d-shader/hlsl: Add variables for techniques.
vkd3d-shader/hlsl: Rename rule for top-level techniques.
vkd3d-shader/hlsl: Add 'fxgroup' token.
tests: Add some tests for effects groups syntax.
vkd3d-shader: Add separate binary target types for effects.
https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/443
--
v2: wined3d: Translate sampler states to sampler objects in wined3d_device_apply_stateblock().
d3d8: Store the texture's parent device as a d3d8_device pointer.
wined3d: Pass shader type and unmodified index to context_preload_texture().
wined3d: Handle texture LOD in wined3d_sampler_desc_from_sampler_states().
wined3d: Pass a wined3d_texture to wined3d_sampler_desc_from_sampler_states().
https://gitlab.winehq.org/wine/wine/-/merge_requests/4057
> Sure. I think the main alternative would be to split the IR in two (or more) separate IRs though. I.e., you'd have a representation of the parsed TPF, then convert that to VSIR, and from there to SPIR-V.
The alternative I was considering would rather be to have only one IR, and use it for all shader language conversions, but not for disassembling. And then, for each language for which we support disassembling, have a dedicated disassembler (which probably doesn't really need an IR: it can emit as it parses). If we supported assembling too, we'd also have a dedicated assembler.
In my mind, assembling/disassembling and compiling (or transpiling, if we want to look more modern!) are two different beasts. For compiling it's useful to have an IR which is flexible and simple, but it doesn't need to faithfully represent precisely all the features of any other language. OTOH for assembling/disassembling you don't really care about flexibility, but it's important to represent faithfully every detail of the language.
My feeling is that trying to shove all these features (simplicity, flexibility, faithfulness to any language) into a single language is a bit overconstraining. Writing dedicated assemblers and disassemblers is some additional work too, but I'm not sure the balance is in favor of our current solution.
> The disassembler would operate on TPF IR, as would certain lowering passes. That's certainly a valid choice, but I think it's important to point out that while it would make some things easier, it would also make some things harder. The most obvious is perhaps that we'd need separate disassemblers for d3dbc, tpf, dxil, and vsir. Somewhat less obvious is perhaps that we may need to duplicate certain lowering passes between e.g. d3dbc and tpf, because we'd no longer be able to express them in vsir. It may also make it slightly harder to do something like HLSL IR -> vsir -> d3dbc, because we'd have to get rid of complex texturing instructions when converting HLSL IR to vsir, and then reintroduce them when converting vsir to d3dbc.
If VSIR features are useful for translating between languages, then I agree it's sensible to have them. The part I don't like is having features only because some of the languages we support need to faithfully represent all their features (e.g., having to keep operations like NEG and ABS as register modifiers instead of as regular operators).
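As an illustration of the NEG/ABS point: if modifiers stay on source operands, every pass has to account for them, whereas a lowering pass can turn them into ordinary instructions that write fresh temps. The mini-IR below is hypothetical (it is not vsir), and the function only sketches what such a lowering could look like.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mini-IR: a source may carry a NEG/ABS modifier, as d3dbc
 * and tpf encode them, or be a plain register. */
enum src_modifier { MOD_NONE, MOD_NEG, MOD_ABS, MOD_ABSNEG };
enum opcode { OP_ADD, OP_ABS, OP_NEG };

struct src { unsigned int reg; enum src_modifier mod; };
struct instr { enum opcode op; unsigned int dst; struct src src[2]; unsigned int src_count; };

/* Rewrite `in` into `out` so that no source carries a modifier: each
 * modifier becomes an explicit ABS/NEG writing a fresh temp taken from
 * *next_temp. Returns the number of instructions written. */
static size_t lower_src_modifiers(const struct instr *in, size_t count,
        struct instr *out, size_t capacity, unsigned int *next_temp)
{
    size_t n = 0, i;
    unsigned int j;

    for (i = 0; i < count; ++i)
    {
        struct instr ins = in[i];

        for (j = 0; j < ins.src_count; ++j)
        {
            struct src *s = &ins.src[j];

            if (s->mod == MOD_ABS || s->mod == MOD_ABSNEG)
            {
                assert(n < capacity);
                out[n++] = (struct instr){OP_ABS, *next_temp, {{s->reg, MOD_NONE}}, 1};
                s->reg = (*next_temp)++;
            }
            if (s->mod == MOD_NEG || s->mod == MOD_ABSNEG)
            {
                assert(n < capacity);
                out[n++] = (struct instr){OP_NEG, *next_temp, {{s->reg, MOD_NONE}}, 1};
                s->reg = (*next_temp)++;
            }
            s->mod = MOD_NONE;
        }
        assert(n < capacity);
        out[n++] = ins;
    }
    return n;
}
```

For `add r2, -|r0|, r1` this emits `abs r10, r0`, `neg r11, r10`, `add r2, r11, r1`. The cost is extra instructions and temps; a backend that wants modifiers back would have to fold them again.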
> Are those worth it? Well, maybe; it doesn't seem like an obvious win to me. Note also that it's not uncommon for languages/IRs to have different dialects or subsets; the obvious example here is perhaps LLVM IR/DXIL, but note that it's also true for all of d3dbc, tpf, HLSL, and GLSL to various extents.
I agree that finding the right balance is not easy here. But I can't help thinking our current situation is not ideal.
> We may want to tweak the vkd3d_shader_instruction_array data structure somewhat, but I don't think we'll need to do anything as drastic as converting the instruction array to a linked list; gap buffers tend to handle this kind of thing fairly well, and we may even be able to improve on that in specific instances.
I don't know much about gap buffers, but after some reading on Wikipedia I'm not convinced. It seems that a gap buffer makes sense when you have a cursor that mostly moves locally, while our passes usually scan the whole program each time. With a gap buffer you would end up copying the whole program on each pass, and at that point you could just as well rewrite it into a new array each time. Also, random insertion into a gap buffer seems to cost about the same as in an array.
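A minimal gap buffer sketch illustrating the cost model discussed above: inserting at the gap is O(1), but moving the gap copies every element it passes over, so a pass that jumps back to the start of the program pays for copying everything before the gap. The `int` elements stand in for instructions; the structure, names, fixed capacity and the `moved` counter are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define GAP_CAPACITY 64

/* Elements live in [0, gap_start) and [gap_end, GAP_CAPACITY). */
struct gap_buffer
{
    int data[GAP_CAPACITY];
    size_t gap_start, gap_end;
    size_t moved; /* total elements copied by gap moves */
};

static void gb_init(struct gap_buffer *gb)
{
    gb->gap_start = 0;
    gb->gap_end = GAP_CAPACITY;
    gb->moved = 0;
}

static size_t gb_size(const struct gap_buffer *gb)
{
    return GAP_CAPACITY - (gb->gap_end - gb->gap_start);
}

/* Move the gap to logical position pos; cost is the distance moved. */
static void gb_move_gap(struct gap_buffer *gb, size_t pos)
{
    size_t n;

    assert(pos <= gb_size(gb));
    if (pos < gb->gap_start)
    {
        n = gb->gap_start - pos;
        memmove(&gb->data[gb->gap_end - n], &gb->data[pos], n * sizeof(int));
        gb->gap_start -= n;
        gb->gap_end -= n;
        gb->moved += n;
    }
    else if (pos > gb->gap_start)
    {
        n = pos - gb->gap_start;
        memmove(&gb->data[gb->gap_start], &gb->data[gb->gap_end], n * sizeof(int));
        gb->gap_start += n;
        gb->gap_end += n;
        gb->moved += n;
    }
}

/* Insert at logical position pos; O(1) when pos is already at the gap. */
static void gb_insert(struct gap_buffer *gb, size_t pos, int value)
{
    assert(gb->gap_start < gb->gap_end); /* no growth in this sketch */
    gb_move_gap(gb, pos);
    gb->data[gb->gap_start++] = value;
}

static int gb_get(const struct gap_buffer *gb, size_t i)
{
    assert(i < gb_size(gb));
    if (i < gb->gap_start)
        return gb->data[i];
    return gb->data[i + (gb->gap_end - gb->gap_start)];
}
```

Appending ten elements moves nothing, but a single insertion at position 0 afterwards copies all ten.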
Following a link on Wikipedia, it seems that a [rope](https://en.wikipedia.org/wiki/Rope_(data_structure)) might be a better match for us, being essentially a compromise between an array and a linked structure.
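For comparison, a minimal rope sketch: leaves hold arrays and each internal node stores the length of its left subtree, so concatenation copies nothing while indexed lookup walks down the tree. Only concatenation and lookup are shown (insertion would be a split followed by two concatenations); all names are hypothetical, and a real implementation would also need rebalancing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct rope
{
    struct rope *left, *right; /* both NULL for a leaf */
    size_t weight;             /* leaf: element count; internal: length of left */
    const int *items;          /* leaf only; borrowed, not owned */
};

static struct rope *rope_leaf(const int *items, size_t count)
{
    struct rope *r = calloc(1, sizeof(*r));

    assert(r);
    r->items = items;
    r->weight = count;
    return r;
}

static size_t rope_length(const struct rope *r)
{
    if (!r->left)
        return r->weight;
    return r->weight + rope_length(r->right);
}

/* Concatenation allocates one node and copies no elements. */
static struct rope *rope_concat(struct rope *a, struct rope *b)
{
    struct rope *r = calloc(1, sizeof(*r));

    assert(r);
    r->left = a;
    r->right = b;
    r->weight = rope_length(a);
    return r;
}

/* Lookup costs O(depth) instead of O(1) for a flat array. */
static int rope_index(const struct rope *r, size_t i)
{
    while (r->left)
    {
        if (i < r->weight)
        {
            r = r->left;
        }
        else
        {
            i -= r->weight;
            r = r->right;
        }
    }
    assert(i < r->weight);
    return r->items[i];
}

static void rope_free(struct rope *r)
{
    if (r->left)
    {
        rope_free(r->left);
        rope_free(r->right);
    }
    free(r);
}
```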
--
https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/409#note_51289
Each _(w)environ[] entry has its own allocation chunk, so the
_(w)environ[i] pointer and the string it points to don't change when
any other entry is updated or deleted.
The proposed implementation still differs from native:
- allocation is done on the process heap, while native uses msvcrt's heap
- the initially allocated ANSI _environ[] doesn't use per-entry allocation.
This is only activated after a change (update/deletion) is made to
_environ[].
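A minimal sketch of per-entry allocation, assuming a simplified fixed-capacity table rather than msvcrt's actual data structures (all names are hypothetical): replacing or removing one entry frees and reallocates only that entry's string, so the string pointers held for other entries stay valid.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define ENV_MAX 16

/* Each "NAME=value" string is its own allocation; the pointer array is
 * NULL-terminated, like environ. */
struct env_table
{
    char *entries[ENV_MAX + 1];
    size_t count;
};

static char *dup_string(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);

    assert(copy);
    return memcpy(copy, s, len);
}

/* Index of the entry named by the first name_len chars of name, or count. */
static size_t env_find(const struct env_table *env, const char *name, size_t name_len)
{
    size_t i;

    for (i = 0; i < env->count; ++i)
        if (!strncmp(env->entries[i], name, name_len) && env->entries[i][name_len] == '=')
            return i;
    return env->count;
}

/* Add or replace "NAME=value"; only the affected entry is reallocated. */
static void env_put(struct env_table *env, const char *assignment)
{
    size_t i = env_find(env, assignment, strcspn(assignment, "="));

    if (i < env->count)
    {
        free(env->entries[i]);
        env->entries[i] = dup_string(assignment);
        return;
    }
    assert(env->count < ENV_MAX);
    env->entries[env->count++] = dup_string(assignment);
    env->entries[env->count] = NULL;
}

/* Remove NAME; later pointers shift down one slot, their strings untouched. */
static void env_remove(struct env_table *env, const char *name)
{
    size_t i = env_find(env, name, strlen(name));

    if (i == env->count)
        return;
    free(env->entries[i]);
    memmove(&env->entries[i], &env->entries[i + 1], (env->count - i) * sizeof(char *));
    env->count--;
}

static void env_free(struct env_table *env)
{
    size_t i;

    for (i = 0; i < env->count; ++i)
        free(env->entries[i]);
    env->count = 0;
    env->entries[0] = NULL;
}
```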
--
https://gitlab.winehq.org/wine/wine/-/merge_requests/4313
Goes atop MRs 409 and 436. The last three commits belong to this MR.
--
v3: vkd3d-shader/dxil: Implement the DXIL CMP2 instruction.
vkd3d-shader/spirv: Support orderedness inversion in comparison instructions.
vkd3d-shader/spirv: Support bool result in spirv_compiler_emit_comparison_instruction().
vkd3d-shader/dxil: Implement the DXIL CAST instruction.
vkd3d-shader/spirv: Support double in spirv_compiler_emit_ftou().
vkd3d-shader/spirv: Support double in spirv_compiler_emit_ftoi().
vkd3d-shader/spirv: Handle unsigned result in spirv_compiler_emit_ftoi().
vkd3d-shader/spirv: Introduce integer width cast instructions.
vkd3d-shader/spirv: Support bool cast in spirv_compiler_emit_alu_instruction().
vkd3d-shader/spirv: Support bool logic ops in spirv_compiler_emit_alu_instruction().
vkd3d-shader/spirv: Support bitcast in spirv_compiler_emit_load_ssa_reg().
https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/441
Since Yousician's last update, it throws an error when initialising audio output. Unfortunately I don't have access to the old version, but they seem to have dropped support for Windows versions older than 10, and now use only IAudioClient3_InitializeSharedAudioStream. They also use IDeviceTopology to get the type of the first output connector.
This is the bare minimum I needed to get it working.
--
v12: mmdevapi: add stub for IDeviceTopology
mmdevapi/tests: add test for IDeviceTopology
mmdevapi: implement IAudioClient3_InitializeSharedAudioStream
mmdevapi/tests: add test for AudioClient3_InitializeSharedAudioStream
https://gitlab.winehq.org/wine/wine/-/merge_requests/3554
Goes atop MR 409. The last six commits belong to this MR.
--
v6: vkd3d-shader/dxil: Implement the DXIL CAST instruction.
vkd3d-shader/spirv: Handle unsigned result in spirv_compiler_emit_ftoi().
vkd3d-shader/spirv: Introduce integer width cast instructions.
vkd3d-shader/spirv: Support bool cast in spirv_compiler_emit_alu_instruction().
vkd3d-shader/spirv: Support bool logic ops in spirv_compiler_emit_alu_instruction().
vkd3d-shader/spirv: Support bitcast in spirv_compiler_emit_load_ssa_reg().
vkd3d-shader/dxil: Implement the DXIL BINOP instruction.
vkd3d-shader/spirv: Support VKD3D_DATA_UINT in spirv_compiler_emit_neg().
vkd3d-shader/spirv: Handle the UMUL instruction.
vkd3d-shader/spirv: Introduce an IDIV instruction.
vkd3d-shader/spirv: Introduce an FREM instruction.
https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/436
--
v3: vkd3d-shader/hlsl: Consistently use HLSL allocation functions.
vkd3d-shader/hlsl: Sort keywords.
vkd3d-shader/hlsl: Remove C++ comment lexing.
vkd3d-shader/hlsl: Remove some tokens from the lexer.
https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/426
Some applications use SRWLOCK in an invalid way and then check its binary
representation. ~~Specifically, they release an unlocked SRWLOCK and then check that its bit pattern is
all ones.~~
Tweak the representation a bit so they are happy.
--
v5: ntdll: Tweak the binary representation of SRWLOCK.
https://gitlab.winehq.org/wine/wine/-/merge_requests/4310
Evan Tang reported that new fixmes appeared in shader_runner when
running some of his tests after commit
f50d0ae2cbc5ee2a26fedd7b8aac2def62decd6c.
vkd3d:652593:fixme:shader_sm4_read_src_param Unhandled mask 0x4.
The change to blame seems to be this added line in
sm4_src_from_constant_value().
+ src->swizzle = VKD3D_SHADER_NO_SWIZZLE;
In tpf binaries the low 12 bits of each src register token in an instruction
specify the swizzle, and there are 5 possible combinations:
Dimension NONE
-------- 00
Dimension SCALAR
-------- 01
Dimension VEC4, with a 4-bit writemask:
---- xxxx 00 10
Dimension VEC4, with an 8-bit swizzle:
xx xx xx xx 01 10
Dimension VEC4, with a 2-bit selected component index:
------ xx 10 10
So far, we have only seen src registers use 4-bit writemasks in a
single case: vec4 constants, where the mask is always zero.
So we expect this:
---- 0000 00 10
Now, I probably meant to initialize src->swizzle to zero when writing
constants, but VKD3D_SHADER_NO_SWIZZLE is not zero; it is actually the
default (identity) swizzle:
11 10 01 00
Its lowest 4 bits (0x4) get written into the mask field, which causes
the reader to complain.
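A small sketch of the bug, assuming the field layout described above (dimension in bits 0-1, selection mode in bits 2-3, mask or swizzle starting at bit 4) with values taken from the D3D10 tokenized program format; the macro and function names are illustrative, not vkd3d's. Passing VKD3D_SHADER_NO_SWIZZLE's value, 0xe4, where a 4-bit mask is expected leaves 0x4 in the mask field.

```c
#include <assert.h>
#include <stdint.h>

#define SM4_DIMENSION_VEC4  0x2u  /* bits 0-1: four components */
#define SM4_SELECT_MASK     0x0u  /* bits 2-3: 4-bit mask in bits 4-7 */
#define SM4_SELECT_SWIZZLE  0x1u  /* bits 2-3: 8-bit swizzle in bits 4-11 */
#define SHADER_NO_SWIZZLE   0xe4u /* .xyzw, i.e. 11 10 01 00 */

/* Mask mode: only the low 4 bits of `value` fit; the rest is dropped. */
static uint32_t encode_vec4_mask_operand(uint32_t value)
{
    return ((value & 0xfu) << 4) | (SM4_SELECT_MASK << 2) | SM4_DIMENSION_VEC4;
}

/* Swizzle mode: all 8 bits fit. */
static uint32_t encode_vec4_swizzle_operand(uint32_t swizzle)
{
    return ((swizzle & 0xffu) << 4) | (SM4_SELECT_SWIZZLE << 2) | SM4_DIMENSION_VEC4;
}

/* What a reader sees in the mask field of a mask-mode operand. */
static uint32_t decode_mask(uint32_t token)
{
    assert(((token >> 2) & 0x3u) == SM4_SELECT_MASK);
    return (token >> 4) & 0xfu;
}
```

`encode_vec4_mask_operand(0)` is the expected encoding for vec4 constants; `encode_vec4_mask_operand(SHADER_NO_SWIZZLE)` reproduces the `Unhandled mask 0x4` case.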
--
v3: vkd3d-shader/tpf: Don't pass 0x4 as mask for vec4 constant src registers.
https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/438
Some applications use SRWLOCK in an invalid way and then check its binary
representation. ~~Specifically, they release an unlocked SRWLOCK and then check that its bit pattern is
all ones.~~
Tweak the representation a bit so they are happy.
--
v13: ntdll: Tweak the binary representation of SRWLOCK.
https://gitlab.winehq.org/wine/wine/-/merge_requests/4310
Some applications use SRWLOCK in an invalid way and then check its binary
representation. ~~Specifically, they release an unlocked SRWLOCK and then check that its bit pattern is
all ones.~~
Tweak the representation a bit so they are happy.
--
v10: ntdll: Tweak the binary representation of SRWLOCK.
winnt: Add InterlockedExchangeAdd16.
https://gitlab.winehq.org/wine/wine/-/merge_requests/4310
Some applications use SRWLOCK in an invalid way and then check its binary
representation. ~~Specifically, they release an unlocked SRWLOCK and then check that its bit pattern is
all ones.~~
Tweak the representation a bit so they are happy.
--
v8: ntdll: Tweak the binary representation of SRWLOCK.
https://gitlab.winehq.org/wine/wine/-/merge_requests/4310
Some applications use SRWLOCK in an invalid way and then check its binary
representation. ~~Specifically, they release an unlocked SRWLOCK and then check that its bit pattern is
all ones.~~
Tweak the representation a bit so they are happy.
--
v4: ntdll: Tweak the binary representation of SRWLOCK.
https://gitlab.winehq.org/wine/wine/-/merge_requests/4310
Wine-Bug: https://bugs.winehq.org/show_bug.cgi?id=55842
The bug report blames commit 059094c1c18ddc33b04eac53a72fd0eb7510be94 ("ntdll: Define heap block's BLOCK_FLAG_LFH as 0x80."), but before that commit this only worked essentially by chance.
The problem this patch solves is that RtlValidateHeap() currently always fails for LFH blocks allocated from large block memory (as opposed to subheap blocks), which can happen for large enough LFH block sizes. In the regressed game, the caller of RtlValidateHeap() is msvcr80.msvcrt_heap_free(), which uses HeapValidate() to guess which heap the pointer being freed was allocated from. I am attaching a standalone test program which can be used to reproduce the problem without the patch.
[test_lfh_validate.c](/uploads/006b04a9a00ffb7949956b66a275d5cf/test_lfh_validate.c)
--
v4: ntdll: Fix pending free block validation in heap_validate() for LFH blocks.
ntdll: Handle LFH blocks allocated in large blocks in heap_validate_ptr().
https://gitlab.winehq.org/wine/wine/-/merge_requests/4232
Some applications use SRWLOCK in an invalid way and then check its binary
representation. ~~Specifically, they release an unlocked SRWLOCK and then check that its bit pattern is
all ones.~~
Tweak the representation a bit so they are happy.
--
v3: ntdll: Tweak the binary representation of SRWLOCK.
https://gitlab.winehq.org/wine/wine/-/merge_requests/4310
On Mon Nov 6 14:46:29 2023 +0000, Yuxuan Shui wrote:
> demonstrate the leak? or demonstrate the race condition?
> in the first case, our unit test already does. in the second case, i
> don't see how that's useful? the race condition is pretty clearly there,
> simply based on the fact that the callback might be running when you try
> to cancel it.
https://gist.github.com/yshui/2f1c5c8ce9d80e9eef1db4eab51344d0 demonstrates the race between `CloseThreadpool` and a running callback.
--
https://gitlab.winehq.org/wine/wine/-/merge_requests/4243#note_51113
This is the last thing needed to support non-constant offset dereferences in SM4.
It allows performing relative addressing on temps.
Besides this, I have additional patches for relative addressing on uniforms, and input and output semantics, but these may not be useful for now, since we copy all these variables into temps instead of using them directly.
--
v5: vkd3d-shader/tpf: Support relative addressing for indexable temps in SM4.
vkd3d-shader/tpf: Move sm4_register_from_node() up.
vkd3d-shader/tpf: Support writing relative addressing indexes.
vkd3d-shader/tpf: Write register index addressing.
vkd3d-shader/tpf: Encode dst and src registers using the same function.
https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/435
Evan Tang reported that new fixmes appeared in shader_runner when
running some of his tests after commit
f50d0ae2cbc5ee2a26fedd7b8aac2def62decd6c.
vkd3d:652593:fixme:shader_sm4_read_src_param Unhandled mask 0x4.
The change to blame seems to be this added line in
sm4_src_from_constant_value().
+ src->swizzle = VKD3D_SHADER_NO_SWIZZLE;
In tpf binaries the low 12 bits of each src register token in an instruction
specify the swizzle, and there are 5 possible combinations:
Dimension NONE
-------- 00
Dimension SCALAR
-------- 01
Dimension VEC4, with a 4-bit writemask:
---- xxxx 00 10
Dimension VEC4, with an 8-bit swizzle:
xx xx xx xx 01 10
Dimension VEC4, with a 2-bit selected component index:
------ xx 10 10
So far, we have only seen src registers use 4-bit writemasks in a
single case: vec4 constants, where the mask is always zero.
So we expect this:
---- 0000 00 10
Now, I probably meant to initialize src->swizzle to zero when writing
constants, but VKD3D_SHADER_NO_SWIZZLE is not zero; it is actually the
default (identity) swizzle:
11 10 01 00
Its lowest 4 bits (0x4) get written into the mask field, which causes
the reader to complain.
--
v2: vkd3d-shader/tpf: Don't pass 0x4 as mask for vec4 constant src registers.
https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/438
More preparatory work to declare I/O variables from the signature.
--
v2: vkd3d-shader/spirv: Use register counts from the signature and shader desc.
vkd3d-shader: Store the control point counts in struct vkd3d_shader_desc.
vkd3d-shader/spirv: Use the array sizes for shader phase builtins as well.
https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/439
Some applications use SRWLOCK in an invalid way and then check its binary
representation. Specifically, they release an unlocked SRWLOCK and then check that its bit pattern is
all ones.
Tweak the representation a bit so they are happy.
--
v2: ntdll: Tweak the binary representation of SRWLOCK.
https://gitlab.winehq.org/wine/wine/-/merge_requests/4310
Some applications use SRWLOCK in an invalid way and then check its binary
representation. Specifically, they release an unlocked SRWLOCK and then check that its bit pattern is
all ones.
Tweak the representation a bit so they are happy.
--
https://gitlab.winehq.org/wine/wine/-/merge_requests/4310
Signed-off-by: Nikolay Sivov <nsivov(a)codeweavers.com>
--
v2: vkd3d-shader/tpf: Initial support for writing fx_4_0/fx_4_1 binaries.
vkd3d-shader/hlsl: Handle effect group statement.
vkd3d-shader/hlsl: Add variables for techniques.
vkd3d-shader/hlsl: Rename rule for top-level techniques.
vkd3d-shader/hlsl: Add 'fxgroup' token.
tests: Add some tests for effects groups syntax.
https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/443
Wine-Bug: https://bugs.winehq.org/show_bug.cgi?id=55842
The bug report blames commit 059094c1c18ddc33b04eac53a72fd0eb7510be94 ("ntdll: Define heap block's BLOCK_FLAG_LFH as 0x80."), but before that commit this only worked essentially by chance.
The problem this patch solves is that RtlValidateHeap() currently always fails for LFH blocks allocated from large block memory (as opposed to subheap blocks), which can happen for large enough LFH block sizes. In the regressed game, the caller of RtlValidateHeap() is msvcr80.msvcrt_heap_free(), which uses HeapValidate() to guess which heap the pointer being freed was allocated from. I am attaching a standalone test program which can be used to reproduce the problem without the patch.
[test_lfh_validate.c](/uploads/006b04a9a00ffb7949956b66a275d5cf/test_lfh_validate.c)
--
v3: ntdll: Fix pending free block validation in heap_validate() for LFH blocks.
ntdll: Handle LFH blocks allocated in large blocks in heap_validate_ptr().
https://gitlab.winehq.org/wine/wine/-/merge_requests/4232
--
v8: vkd3d-shader/dxil: Implement the DXIL BINOP instruction.
vkd3d-shader/spirv: Support VKD3D_DATA_UINT in spirv_compiler_emit_neg().
vkd3d-shader/spirv: Handle the UMUL instruction.
vkd3d-shader/spirv: Introduce an IDIV instruction.
vkd3d-shader/spirv: Introduce an FREM instruction.
https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/409