> Would it help to return to the return address already on the PE stack?
Are we sure it's never clobbered?
> I guess that moving the ret address to rcx and push rcx / ret might be
> the same performance-wise as pushq 0x70(%rcx), ret.
Yes, skipping rcx save will break existing tests.
--
https://gitlab.winehq.org/wine/wine/-/merge_requests/1552#note_18485
On Fri Dec 2 20:30:47 2022 +0000, **** wrote:
> Paul Gofman replied on the mailing list:
> ```
> On 12/2/22 14:25, Gabriel Ivăncescu (@insn) wrote:
> > On Fri Dec 2 18:57:30 2022 +0000, Jacek Caban wrote:
> >>> This should help a bit more, does it make a difference for you?
> >> My previous test wasn't really good for measuring it.
> >> I hacked a micro-benchmark, which confirms that the patch improves
> >> performance a lot. It was visible when doing "real" Vulkan
> >> vkGetPhysicalDeviceProperties calls in a loop, but even cleaner when I
> >> changed it further to make Unix side to be no-op. It closes most of the
> >> gap between direct call and __wine_unix_call_dispatcher. Times recorded
> >> for no-op calls:
> >> - direct call: 5761
> >> - unpatched Wine: 13933
> >> - ret.diff: 6823 (55% time spent in __wine_unix_call_dispatcher, 29% in
> >> PE vkGetPhysicalDeviceProperties)
> >> Looks impressive!
> > @gofman This isn't about setting it in rcx or not, it's about
> > mispairing `call`s and `ret`s, which basically means 100% mispredicted
> > because CPUs are optimized for it, so it couldn't do any speculative
> > execution past the return before.
> >
> Yes, I figured that much. Yet the attached diff removes the return
> address from rcx in wine_syscall_dispatcher(), so I thought it makes
> sense to note that it will break things.
> ```
Would it help to return to the return address already on the PE stack?
--
https://gitlab.winehq.org/wine/wine/-/merge_requests/1552#note_18480
On Fri Dec 2 18:57:30 2022 +0000, Jacek Caban wrote:
> > This should help a bit more, does it make a difference for you?
> My previous test wasn't really good for measuring it.
> I hacked a micro-benchmark, which confirms that the patch improves
> performance a lot. It was visible when doing "real" Vulkan
> vkGetPhysicalDeviceProperties calls in a loop, but even cleaner when I
> changed it further to make Unix side to be no-op. It closes most of the
> gap between direct call and __wine_unix_call_dispatcher. Times recorded
> for no-op calls:
> - direct call: 5761
> - unpatched Wine: 13933
> - ret.diff: 6823 (55% time spent in __wine_unix_call_dispatcher, 29% in
> PE vkGetPhysicalDeviceProperties)
> Looks impressive!
@gofman This isn't about setting it in rcx or not, it's about mispairing `call`s and `ret`s, which basically means 100% mispredicted because CPUs are optimized for it, so it couldn't do any speculative execution past the return before.
--
https://gitlab.winehq.org/wine/wine/-/merge_requests/1552#note_18478
> This should help a bit more, does it make a difference for you?
My previous test wasn't really good for measuring it.
I hacked a micro-benchmark, which confirms that the patch improves performance a lot. It was visible when doing "real" Vulkan vkGetPhysicalDeviceProperties calls in a loop, but even cleaner when I changed it further to make Unix side to be no-op. It closes most of the gap between direct call and __wine_unix_call_dispatcher. Times recorded for no-op calls:
- direct call: 5761
- unpatched Wine: 13933
- ret.diff: 6823 (55% time spent in __wine_unix_call_dispatcher, 29% in PE vkGetPhysicalDeviceProperties)
Looks impressive!
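To put the numbers above in proportion, here is a small sketch of the arithmetic. It only uses the three times quoted in this message; the units are not stated, and the variable names are made up for illustration.

```javascript
// Times reported above for no-op calls (units are not stated in the message).
var direct = 5761;
var unpatched = 13933;
var patched = 6823; // "ret.diff"

// Overhead of going through the dispatcher, relative to a direct call.
var overheadBefore = unpatched - direct; // 8172
var overheadAfter = patched - direct;    // 1062

// Fraction of the gap between a direct call and the dispatcher that the patch closes.
var gapClosed = (overheadBefore - overheadAfter) / overheadBefore;

console.log("overhead before: " + overheadBefore);
console.log("overhead after:  " + overheadAfter);
console.log("gap closed:      " + (gapClosed * 100).toFixed(1) + "%");
```

By this measure the patch removes roughly 87% of the dispatcher overhead seen in the no-op case.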
--
https://gitlab.winehq.org/wine/wine/-/merge_requests/1552#note_18474
Signed-off-by: Nikolay Sivov <nsivov(a)codeweavers.com>
--
v2: d3d10/effect: Add 'frc' instruction support for expressions.
d3d10/effect: Add 'rcp' instruction support for expressions.
d3d10/effect: Add 'div' instruction support for expressions.
d3d10/effect: Add 'ftob' instruction support for expressions.
d3d10/effect: Partially implement updates through value expressions.
https://gitlab.winehq.org/wine/wine/-/merge_requests/1622
Implement a basic GC based on the mark-and-sweep algorithm, without requiring manually specifying "roots", which vastly simplifies the management. For now, it is triggered every 30 seconds since it last finished, on a new object initialization. Better heuristics could be used in the future.
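As a rough illustration of the trigger described above, the heuristic could be modeled like this. This is only a sketch in JavaScript, not the code from the MR: `createGcScheduler`, `onObjectInit` and the injected clock are hypothetical names, and it assumes the 30-second timer starts counting when the scheduler is created.

```javascript
var GC_INTERVAL_MS = 30 * 1000; // 30 seconds, as in the description above

// Hypothetical scheduler: runCollector and now are passed in so the
// behavior can be exercised with a fake clock.
function createGcScheduler(runCollector, now) {
    var lastFinished = now(); // assumption: the timer starts at creation
    return {
        // Meant to be called from every new object initialization.
        onObjectInit: function () {
            if (now() - lastFinished < GC_INTERVAL_MS) return false;
            runCollector();
            lastFinished = now(); // measured from when the run finished
            return true;
        }
    };
}

// Usage with a fake clock.
var fakeTime = 0;
var runs = 0;
var scheduler = createGcScheduler(
    function () { runs++; },
    function () { return fakeTime; }
);

scheduler.onObjectInit();  // too early, no collection
fakeTime = 30000;
scheduler.onObjectInit();  // 30 seconds elapsed, collector runs
fakeTime = 45000;
scheduler.onObjectInit();  // only 15 seconds since the last run, skipped
fakeTime = 60000;
scheduler.onObjectInit();  // runs again
console.log("collector runs: " + runs);
```

Tying the check to object creation means an idle script never pays for a collection, which is one reason such a simple heuristic can be good enough as a starting point.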
The comments in the code should hopefully explain the high-level logic of this approach without boilerplate details. I've tested it on the FFXIV launcher (along with other patches from Proton to make it work) and it stops the massive memory leak successfully by itself, so at least it does its job properly. The second patch in the MR is just an optimization for a *very* common case.
For artificial testing, one could use something like:
```javascript
function leak() {
    var a = {}, b = {};
    a.b = b; // a references b
    b.a = a; // b references a, closing the cycle
}
```
which creates a circular ref and will leak when the function returns.
It also introduces and makes use of a "heap_stack", which prevents stack overflows on long chains.
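For readers unfamiliar with the idea, here is a minimal model of a mark-and-sweep pass that needs no explicit roots and walks the graph with an explicit stack instead of recursion. It is a sketch of one common technique (subtracting internal references from refcounts, similar in spirit to CPython's cycle collector), written under the assumption that objects are refcounted; it is not taken from the MR, and `makeObj`, `addRef` and `collectGarbage` are hypothetical names.

```javascript
// Hypothetical object model: a refcount plus outgoing references
// to other tracked objects.
function makeObj(name) {
    return { name: name, refCount: 0, refs: [] };
}

function addRef(from, to) {
    from.refs.push(to);
    to.refCount++;
}

// Returns the tracked objects that are only kept alive by each other.
function collectGarbage(heap) {
    // 1. Start from the real refcounts and subtract every reference that
    //    comes from another tracked object.
    var external = new Map();
    heap.forEach(function (obj) { external.set(obj, obj.refCount); });
    heap.forEach(function (obj) {
        obj.refs.forEach(function (target) {
            external.set(target, external.get(target) - 1);
        });
    });

    // 2. Anything with a count left over is held from outside the heap,
    //    so it acts as an implicit root.
    var marked = new Set();
    var stack = [];
    heap.forEach(function (obj) {
        if (external.get(obj) > 0) {
            marked.add(obj);
            stack.push(obj);
        }
    });

    // 3. Mark with an explicit stack, so long chains cannot overflow
    //    the native call stack.
    while (stack.length > 0) {
        var current = stack.pop();
        current.refs.forEach(function (target) {
            if (!marked.has(target)) {
                marked.add(target);
                stack.push(target);
            }
        });
    }

    // 4. Sweep: everything left unmarked is unreachable garbage.
    return heap.filter(function (obj) { return !marked.has(obj); });
}

// Usage: a and b form an unreferenced cycle; c is held externally and keeps d alive.
var a = makeObj("a"), b = makeObj("b"), c = makeObj("c"), d = makeObj("d");
addRef(a, b);
addRef(b, a);
c.refCount++; // models a reference held from outside the tracked heap
addRef(c, d);

var garbageNames = collectGarbage([a, b, c, d]).map(function (obj) { return obj.name; });
console.log(garbageNames.join(", "));
```

In this model only the `a`/`b` cycle is reported as garbage, while `d` survives because it is reachable from the externally held `c`.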
--
v2: jscript: Create the source function's 'prototype' prop object on demand.
jscript: Run the garbage collector every 30 seconds on a new object
jscript: Implement CollectGarbage().
jscript: Implement a Garbage Collector to deal with circular references.
https://gitlab.winehq.org/wine/wine/-/merge_requests/1635