https://bugs.winehq.org/show_bug.cgi?id=57388
Bug ID: 57388
Summary: Major perf loss with blocking ReadFile() & OVERLAPPED
Product: Wine
Version: 9.20
Hardware: x86-64
OS: Linux
Status: UNCONFIRMED
Severity: minor
Priority: P2
Component: kernel32
Assignee: wine-bugs@winehq.org
Reporter: nyandarknessgirl@gmail.com
Distribution: ---
When using ordinary blocking file I/O with file-pointer-based reads (passing NULL as lpOverlapped to ReadFile()), performance is close to that of the native `read()` syscall on the host. However, when passing an OVERLAPPED structure to ReadFile() on a file opened in blocking mode (for pread-like behavior), performance drops significantly compared to a native pread() syscall on the host. It would be good to fix this, as positioned I/O is useful for a lot of apps. Some performance measurements:
Native Linux pread() test
512b reads: 1239998.666753 IOPS, 605.468099 MiB/s
4k reads: 1246816.786350 IOPS, 4870.378072 MiB/s
1M reads: 20914.414868 IOPS, 20914.414868 MiB/s

Wine 9.20 on the same host, blocking ReadFile() with OVERLAPPED file position
512b reads: 76785.134791 IOPS, 37.492742 MiB/s
4k reads: 74402.319471 IOPS, 290.634060 MiB/s
1M reads: 14141.221307 IOPS, 14141.221307 MiB/s
In both cases the file resides fully in the page cache and no actual random I/O is performed; merely specifying a file position through the OVERLAPPED structure affects performance a lot.
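For reference, the native side of this comparison can be sketched as below. This is a minimal illustration, not the benchmark used for the numbers above: `pos64` and `read_at` are hypothetical helper names, and the only assumption taken from WinAPI is that OVERLAPPED carries the position as two 32-bit halves (Offset and OffsetHigh).

```c
#define _XOPEN_SOURCE 700
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: join the two 32-bit halves that OVERLAPPED carries
 * (Offset = low part, OffsetHigh = high part) into one 64-bit position. */
static uint64_t pos64(uint32_t offset, uint32_t offset_high)
{
    return ((uint64_t)offset_high << 32) | offset;
}

/* Hypothetical helper: positioned read, the native counterpart of a blocking
 * ReadFile() call that receives its position through OVERLAPPED.
 * Returns the number of bytes read, or -1 on error (errno set by pread). */
static ssize_t read_at(int fd, void *buf, size_t len,
                       uint32_t offset, uint32_t offset_high)
{
    return pread(fd, buf, len, (off_t)pos64(offset, offset_high));
}
```

A call such as `read_at(fd, buf, 512, 4096, 0)` reads 512 bytes starting at byte 4096 with a single syscall and leaves the descriptor's own file position untouched.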
Zeb Figura z.figura12@gmail.com changed:
What |Removed |Added
----------------------------------------------------------------------------
CC | |z.figura12@gmail.com
--- Comment #1 from Zeb Figura z.figura12@gmail.com ---
Certain types of completion signalling are potentially cross-process, and need to be managed by a central authority (the wineserver). This means an RPC to the wineserver process, which is slow.
This will happen if:
* an event is passed;
* an APC routine is passed;
* the operation might queue a packet to a completion port. (Note that only the server knows if there is in fact a completion port associated with this file.)
The last one is presumably the culprit here, assuming you're not passing a valid event. If no OVERLAPPED structure is provided, completion is never queued, but if a valid OVERLAPPED structure is provided, it will be.
On the other hand, a non-overlapped file handle can't be associated with a port, so we can do a little better, and avoid trying to queue completion in that case. I've sent https://gitlab.winehq.org/wine/wine/-/merge_requests/6768.
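The conditions listed above can be summarized as a small predicate. This is only a sketch of the reasoning in this comment, not Wine's actual code: `struct read_request`, its fields, and `needs_server_roundtrip` are hypothetical names invented for illustration.

```c
#include <stdbool.h>

/* Hypothetical description of one read request; these are not Wine's types. */
struct read_request {
    bool has_event;          /* a valid event handle was passed */
    bool has_apc;            /* an APC routine was passed */
    bool has_overlapped;     /* the caller passed an OVERLAPPED structure */
    bool handle_overlapped;  /* the file handle was opened for overlapped I/O */
};

/* Returns true when completion has to go through the server (slow path). */
static bool needs_server_roundtrip(const struct read_request *r)
{
    /* Events and APCs are always signalled through the server. */
    if (r->has_event || r->has_apc)
        return true;

    /* Without an OVERLAPPED structure, completion is never queued. */
    if (!r->has_overlapped)
        return false;

    /* Only the server knows whether a port is attached, but a non-overlapped
     * handle cannot have one, so that case can stay on the fast path. */
    return r->handle_overlapped;
}
```

Under these assumptions, the case from the report (OVERLAPPED passed, no event, no APC, handle opened in blocking mode) returns false, while the same request on an overlapped handle still takes the server path.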
--- Comment #2 from LekKit nyandarknessgirl@gmail.com ---
Thank you very much, it's really a huge improvement. Here are the new results:
Wine 9.21 on the same host, synchronous ReadFile() with OVERLAPPED file position
512b reads: 634344.851175 IOPS, 309.738697 MiB/s
4k reads: 628872.834744 IOPS, 2456.534511 MiB/s
1M reads: 18620.033840 IOPS, 18620.033840 MiB/s
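As a quick check on these figures, the throughput column is the IOPS value multiplied by the block size and divided by 2^20 bytes. The helpers below are a hypothetical sketch for redoing that arithmetic and for comparing runs; the names `mib_per_s` and `ratio` are not from any benchmark tool.

```c
/* Throughput in MiB/s from an IOPS figure and a block size in bytes. */
static double mib_per_s(double iops, double block_bytes)
{
    return iops * block_bytes / (1024.0 * 1024.0);
}

/* How many times faster run a is than run b (both in IOPS). */
static double ratio(double a_iops, double b_iops)
{
    return a_iops / b_iops;
}
```

For the 512b case, `ratio(634344.851175, 76785.134791)` comes out at a little over 8, and `ratio(1239998.666753, 634344.851175)` at a little under 2, which matches the "huge improvement" and the remaining gap to native.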
It's still slower than the native version, but according to `strace` the only extra work is one more lseek(), needed to conform to WinAPI semantics (ReadFile()/WriteFile() move the file pointer past the buffer even in synchronous OVERLAPPED mode). That explains why it's ~2x slower for 512b/4k reads, where syscall entry latency dominates the execution time (CPU mitigations enabled).
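The syscall pattern described here can be sketched as follows. It is an illustration of the pread() plus lseek() sequence seen in `strace`, not Wine's implementation; `read_at_and_seek` is a hypothetical name.

```c
#define _XOPEN_SOURCE 700
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: positioned read that also moves the descriptor's file
 * pointer to just past the bytes read, mimicking the WinAPI behavior noted
 * above. Costs two syscalls per read instead of one.
 * Returns the number of bytes read, or -1 on error. */
static ssize_t read_at_and_seek(int fd, void *buf, size_t len, off_t pos)
{
    ssize_t got = pread(fd, buf, len, pos);            /* syscall 1: the read */
    if (got < 0)
        return -1;

    if (lseek(fd, pos + got, SEEK_SET) == (off_t)-1)   /* syscall 2: the seek */
        return -1;

    return got;
}
```

With small reads from the page cache the copy itself is cheap, so doubling the number of syscalls is roughly what doubles the per-read cost.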
A pretty good result overall; improving it further would require moving the file pointer into userspace, which I'm not sure is doable for Wine at all.
Best regards
LekKit nyandarknessgirl@gmail.com changed:
What |Removed |Added
----------------------------------------------------------------------------
Status |UNCONFIRMED |RESOLVED
Resolution |--- |FIXED
--- Comment #3 from LekKit nyandarknessgirl@gmail.com ---
Closing this as FIXED.
Alexandre Julliard julliard@winehq.org changed:
What |Removed |Added
----------------------------------------------------------------------------
Status |RESOLVED |CLOSED
--- Comment #4 from Alexandre Julliard julliard@winehq.org ---
Closing bugs fixed in 9.22.