[Bug 59027] New: "Rebased" NTSync is broken: massive performance regression
http://bugs.winehq.org/show_bug.cgi?id=59027

            Bug ID: 59027
           Summary: "Rebased" NTSync is broken: massive performance regression
           Product: Wine
           Version: 10.19
          Hardware: x86-64
                OS: Linux
            Status: UNCONFIRMED
          Severity: major
          Priority: P2
         Component: -unknown
          Assignee: wine-bugs(a)list.winehq.org
          Reporter: virtuousfox(a)gmail.com
      Distribution: ---

The last time I had good performance in wine was with wine-staging-1.10 and the original NTSync patch (MR 7226). But after it was "rebased" into smaller chunks and officially adopted, it is as if performance is even worse than before it existed (possibly due to losing esync too). Is it still not fully merged or something, left broken in a half-merged state?

This is evident in the biggest offender I've found: https://bugs.winehq.org/show_bug.cgi?id=54693 - Freedom Planet 2 (and its demo) is back to 20 fps (it should have no problem reaching 200 even with CPU-only rendering, like vulkan:llvmpipe). I also see that in Dishonored 2 fps is often stuck at around 20-40 (previously: 50-75) while the GPU is underloaded at 50-75% and the 12-core CPU is at <10%. At least it's not eating up 70% of all CPU cores, like it did before (or was that only esync's thing?).

But /dev/ntsync has 666 permissions and I don't see any obvious errors or warnings. Perhaps it's silently ignored altogether, or there is another massive regression.

Tested recently with dxvk+app-emulation/vkd3d-proton using DXVK_HUD="devinfo,fps,frametimes,submissions,drawcalls,pipelines,memory,gpuload,api,scale=1.2" but wine's native rendering with mesa's overlay should show the same, last time I checked.
Mesa overlay can be used via:

VK_INSTANCE_LAYERS="VK_LAYER_MESA_overlay"
VK_LOADER_LAYERS_ENABLE+=",VK_LAYER_MESA_overlay"
VK_LAYER_MESA_OVERLAY_CONFIG="fps_sampling_period=80,width=480,position=top-left,submit,draw,pipeline_graphics,vert_invocations,geom_invocations,clip_invocations,frag_invocations,tess_eval_invocations,compute_invocations"

If everything works well, either your fps will be capped at the maximum or you should see CPU/GPU compute load or RAM/VRAM usage near 100%, acting as the bottleneck. Otherwise, the system is underutilized due to bad timing of something. Here the timing is particularly bad.

--
Do not reply to this email, post in Bugzilla using the above URL to reply.
You are receiving this mail because:
You are watching all bug changes.
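Whether the device is "silently ignored" can be checked directly instead of inferred from permissions. The following is a minimal sketch, assuming that a Wine process using NTSync keeps a file descriptor open on /dev/ntsync. That is an assumption about Wine's behaviour, not something confirmed in this report. The function name `ntsync_users` and its optional proc-root argument are hypothetical.

```shell
# List PIDs that currently hold /dev/ntsync open.
# Assumption: a Wine process using NTSync keeps an fd on the device node.
# The optional first argument (proc root) is hypothetical and exists only
# so the function can be tried against a fake directory tree.
ntsync_users() {
    proc=${1:-/proc}
    for fd in "$proc"/[0-9]*/fd/*; do
        # Unreadable or vanished fds fail the comparison and are skipped.
        [ "$(readlink "$fd" 2>/dev/null)" = /dev/ntsync ] || continue
        pid=${fd#"$proc"/}   # strip the proc root
        echo "${pid%%/*}"    # keep only the PID component
    done | sort -un
}
```

Running `ntsync_users` while the game is up and getting no output would suggest that no visible process has the device open. Processes owned by other users are only visible when run as root.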
--- Comment #1 from FoX <virtuousfox(a)gmail.com> ---
After trying to figure this out for months I've just stumbled on a massive breakthrough: it appears that all sync methods in both wine and proton are severely crippled by threading - the more cores they get, the worse they perform, but they always try to get all cores.

In a place where I get 24-26 fps with NTsync on current wine-staging, I've tried:
1) WINE_CPU_TOPOLOGY=2 wine-proton FP2.exe
2) taskset -c 2-3 wine FP2.exe
3) WINE_CPU_TOPOLOGY=4 wine-proton FP2.exe
4) taskset -c 2-5 wine FP2.exe

The results are astonishing:
1) 110-120 fps;
2) 55-65 fps;
3) 50-60 fps;
4) 24-26 fps.

Meaning that 2 threads (1 core) was the sweet spot, despite that single core being maxed out on load. I have 12 cores, so you can imagine how bad it is by default. Ironically, proton aced the test in the end, but it started with the worst results by default: 9 fps with default sync and <5 fps for esync & fsync.

However, limiting all wine processes and the apps themselves is a bad workaround in general. At least, there should be a way to limit only sync processes. Even pinning everything sync-related onto a single thread by default does not seem like a bad idea.
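The "2 threads (1 core)" reading depends on CPUs 2 and 3 being hardware threads of the same physical core, which varies between machines. A minimal sketch for checking that and for repeating the pinned runs follows. The helper names `core_siblings` and `run_pinned` and the optional sysfs-root argument are hypothetical; the sysfs file and `taskset -c` are standard on Linux.

```shell
# Print the hardware threads that share a physical core with CPU $1.
# The optional second argument (sysfs root) is hypothetical, for testing.
core_siblings() {
    cat "${2:-/sys/devices/system/cpu}/cpu$1/topology/thread_siblings_list"
}

# Run a command pinned to COUNT consecutive CPUs starting at START,
# e.g. `run_pinned 2 2 wine FP2.exe` is `taskset -c 2-3 wine FP2.exe`.
run_pinned() {
    start=$1 count=$2
    shift 2
    taskset -c "$start-$((start + count - 1))" "$@"
}
```

If `core_siblings 2` does not list CPU 3, then `taskset -c 2-3` spans two physical cores rather than one, and the comparison between cases 2) and 4) should be read accordingly.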
Zeb Figura <z.figura12(a)gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |z.figura12(a)gmail.com

--- Comment #2 from Zeb Figura <z.figura12(a)gmail.com> ---
(In reply to FoX from comment #1)
> After trying to figure this out for months I've just stumbled on a massive breakthrough: it appears that all sync methods in both wine and proton are severely crippled by threading - the more cores they get, the worse they perform but they always try to get all cores.
>
> In place where I get 24-26 fps with NTsync on current wine-staging, I've tried:
> 1) WINE_CPU_TOPOLOGY=2 wine-proton FP2.exe
> 2) taskset -c 2-3 wine FP2.exe
> 3) WINE_CPU_TOPOLOGY=4 wine-proton FP2.exe
> 4) taskset -c 2-5 wine FP2.exe

This doesn't make any sense. Sync methods don't, by themselves, "try to get all cores". Applications might, but that shouldn't make ntsync worse.

I also can't reproduce these results. I have a fairly high-powered computer, and with ntsync I reach even the highest FPS limit available (288 FPS). But without ntsync, performance gets worse, and if I limit it to 2 cores with taskset, performance gets worse still. That's more or less what I'd expect.

Can you please test with unmodified upstream non-staging wine, in a fresh prefix, without any external components including dxvk?
--- Comment #3 from FoX <virtuousfox(a)gmail.com> ---
(In reply to Zeb Figura from comment #2)
> This doesn't make any sense. Sync methods don't, by themselves, "try to get all cores". Applications might, but that shouldn't make ntsync worse.

This is what doesn't make sense. No matter the application, all cores are always used in the background, judging by the core utilization graph in gkrellm. I doubt that every single Windows game has something explicitly coded to scale only synchronization across all cores but nothing else.

> I also can't reproduce these results. I have a fairly high-powered computer, and with ntsync I reach even the highest FPS limit available (288 FPS). But without ntsync, performance gets worse, and if I limit it to 2 cores with taskset, performance gets worse still. That's more or less what I'd expect.

Good for you, but I've never seen such magic. And I did say that performance is worse without ntsync. It's just still bad with it (proton is the outlier in this). But I'm 90% sure performance in wine-staging was decent with the original ntsync merge request. The 10% chance is that I'm misremembering, because it was still way better than the complete slideshow without ntsync and core-limiting. Make no mistake, in some scenes some games can also reach high fps for me. But when they are affected by this, it tanks hard. It took me a few levels to reach one where Freedom Planet 2 is comically slow.

> Can you please test with unmodified upstream non-staging wine, in a fresh prefix, without any external components including dxvk?

Did that, but I had to at least switch to the vulkan renderer, as using the default opengl one hung all of wine when I tried loading GALLIUM_HUD (which is opengl-only), so badly that I had to use `wineboot -k -f -e` to get wine unstuck. With vanilla wine's native vulkan renderer, performance is almost exactly the same as wine-staging with dxvk, but way more stuttery. Same core load distribution too.
--- Comment #4 from FoX <virtuousfox(a)gmail.com> ---
Also, it appears that kernel scheduling tuning affects fps significantly, likely due to thread-switching latency. I noticed that on the last run the boost from taskset was too low; it turned out that tuned had silently failed to apply the profile on boot. After forcing it, fps returned to the previously stated values. I suspect that these settings change fps by up to 50-60% when constrained by taskset:

[cpu]
load_threshold=0.33
latency_low=1
latency_high=999
pm_qos_resume_latency_us=200
governor=schedutil
energy_perf_bias=performance
energy_performance_preference=performance
sampling_down_factor=3
min_perf_pct=63

[sysfs]
/sys/kernel/debug/sched/min_granularity_ns=2000
/sys/kernel/debug/sched/idle_min_granularity_ns=1000000
/sys/kernel/debug/sched/latency_ns=500000
/sys/kernel/debug/sched/wakeup_granularity_ns=1000
/sys/kernel/debug/sched/tunable_scaling=0
/sys/kernel/debug/sched/migration_cost_ns=4000
/sys/kernel/debug/sched/nr_migrate=1
/sys/devices/system/cpu/cpufreq/schedutil/rate_limit_us=50
/sys/block/nvme*n*/queue/scheduler=kyber
/sys/block/nvme*n*/queue/nr_requests=512
/sys/block/nvme*n*/queue/max_sectors_kb=2048
/sys/block/nvme*n*/queue/read_ahead_kb=16384
/sys/block/nvme*n*/queue/rq_affinity=2

[sysctl]
kernel.sched_autogroup_enabled=0
kernel.sched_cfs_bandwidth_slice_us=1000
kernel.sched_deadline_period_max_us=100000
kernel.sched_deadline_period_min_us=1000
kernel.sched_rt_runtime_us=500000
kernel.sched_rt_period_us=1000000
kernel.sched_rr_timeslice_ms=2
kernel.sched_util_clamp_max=1000
kernel.sched_util_clamp_min=850
kernel.sched_util_clamp_min_rt_default=975
vm.admin_reserve_kbytes=262144
vm.compaction_proactiveness=9
vm.dirty_ratio=24
vm.dirty_background_ratio=16
vm.vfs_cache_pressure=133
vm.swappiness=66
vm.page-cluster=1
vm.watermark_scale_factor=333

My kernel is built with:
CONFIG_PREEMPT_LAZY=y
CONFIG_PREEMPT_RT=y
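Since the profile was found to have silently failed to apply, it may help to compare the live values against the profile before each benchmark run. The following is a minimal sketch that covers only the [sysctl] section. The function name `sysctl_drift` and the optional proc-sys-root argument are hypothetical, and the parser assumes plain `key=value` lines without surrounding whitespace.

```shell
# Print "key wanted actual" for every [sysctl] entry in a tuned profile
# whose live value differs, or "key wanted (missing)" if the kernel does
# not expose that key. $1 is the profile file; $2 is an optional
# (hypothetical) replacement for /proc/sys, used for testing.
sysctl_drift() {
    profile=$1
    proc_sys=${2:-/proc/sys}
    awk -F= '
        /^\[/ { in_sysctl = ($0 == "[sysctl]"); next }   # track the section
        in_sysctl && NF >= 2 { print $1, substr($0, length($1) + 2) }
    ' "$profile" |
    while read -r key want; do
        # sysctl names map to paths by replacing dots with slashes.
        path="$proc_sys/$(printf %s "$key" | tr . /)"
        [ -r "$path" ] || { echo "$key $want (missing)"; continue; }
        have=$(cat "$path")
        [ "$have" = "$want" ] || echo "$key $want $have"
    done
}
```

Usage would be along the lines of `sysctl_drift /path/to/profile/tuned.conf`, where the path is whatever file holds the profile above. Empty output means every listed sysctl matches. `tuned-adm verify` performs a broader check of the active profile and may be the better first step where it is available.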
--- Comment #5 from FoX <virtuousfox(a)gmail.com> ---
Created attachment 79854
  --> http://bugs.winehq.org/attachment.cgi?id=79854
Freedom Planet 2's save on the most broken level (Avian Museum)

Got another breakthrough: running FP2 in wine-staging-10.20 with the command `chrt -v -f 1 taskset -c 2-3 wine FP2.exe` (forcing real-time FIFO priority while using only 2 cores) doubled fps and put it on par with proton. I suspect that it's because the game process wins scheduling contention against sync threads. But that wouldn't work with games that actually need many cores.
morgwai <foss(a)morgwai.pl> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |foss(a)morgwai.pl
--- Comment #6 from Zeb Figura <z.figura12(a)gmail.com> ---
What I'm trying to say is there's no such thing as a "sync thread". Synchronization is the tool you use to deal with multiple threads. Synchronization primitives don't by themselves do work, or spawn threads.

I don't know what all the custom kernel settings you're using do, but it would probably be prudent to start trying to alter those, and see if something closer to upstream defaults fixes your problems. I'm running stock Debian here.
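Which threads actually consume CPU time can be read per thread from procfs rather than inferred from a per-core load graph. The following is a minimal sketch, assuming the Linux `/proc/PID/task/TID/stat` layout described in proc(5): field 14 is utime, field 15 is stime, field 39 is the CPU last run on. The function name `thread_table` and the optional proc-root argument are hypothetical.

```shell
# For each thread of PID $1 print: TID, CPU last run on, cumulative
# user+system clock ticks, and the thread name, separated by tabs.
# $2 is an optional (hypothetical) replacement for /proc, for testing.
thread_table() {
    pid=$1
    proc=${2:-/proc}
    for task in "$proc/$pid"/task/*; do
        [ -r "$task/stat" ] || continue
        stat=$(cat "$task/stat") || continue
        # Drop "tid (comm) " so that names containing spaces do not shift
        # fields; what remains starts at stat field 3 (the state).
        rest=${stat##*\) }
        set -- $rest   # now $N holds stat field N+2
        printf '%s\t%s\t%s\t%s\n' \
            "${task##*/}" "${37}" "$(( ${12} + ${13} ))" "$(cat "$task/comm")"
    done
}
```

Calling `thread_table` twice a few seconds apart on the game's PID and comparing the tick counts shows which named threads are busy and on which CPUs they last ran. That gives both sides concrete data on what is occupying the cores.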