Based on [a patch](https://www.winehq.org/mailman3/hyperkitty/list/wine-devel(a)winehq.org/mess...) by Jinoh Kang (@iamahuman) from February 2022. I removed the need for the event object and implemented fast paths for Linux and macOS.

On macOS 10.14+, `thread_get_register_pointer_values` is called on every thread of the process. On Linux 4.14+, `membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED, ...)` is used. On x86 Linux <= 4.13, `madvise(..., MADV_DONTNEED)` is used, which sends IPIs to all cores, causing each of them to execute a memory barrier. On non-x86 Linux <= 4.13 and on other platforms, the fallback path using APCs is used.

--
v3: ntdll: Add thread_get_register_pointer_values-based fast path for NtFlushProcessWriteBuffers.
ntdll: Add sys_membarrier-based fast path to NtFlushProcessWriteBuffers.
ntdll: Add MADV_DONTNEED-based fast path for NtFlushProcessWriteBuffers.
ntdll: Make server_select a memory barrier.
ntdll: Implement NtFlushProcessWriteBuffers.

https://gitlab.winehq.org/wine/wine/-/merge_requests/741
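For readers unfamiliar with the Linux 4.14+ path mentioned above, here is a minimal sketch of how a private expedited `membarrier` call can be issued. It is not the code from this merge request: the helper names `sys_membarrier` and `try_flush_process_write_buffers` are hypothetical, the registration flag is not thread-safe, and the sketch only reports failure instead of falling back to `madvise` or APCs.

```c
#define _GNU_SOURCE
#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical wrapper: glibc has no membarrier() function, so the
 * raw syscall is invoked directly. */
static int sys_membarrier(int cmd, unsigned int flags, int cpu_id)
{
    return (int)syscall(__NR_membarrier, cmd, flags, cpu_id);
}

/* Hypothetical helper: returns 1 if an expedited barrier was issued on
 * all running threads of this process, 0 if the kernel does not offer
 * it and the caller has to use a slower path. */
static int try_flush_process_write_buffers(void)
{
    static int registered; /* sketch only: not thread-safe */

    if (!registered)
    {
        /* MEMBARRIER_CMD_QUERY returns a bitmask of supported commands. */
        int cmds = sys_membarrier(MEMBARRIER_CMD_QUERY, 0, 0);
        if (cmds < 0 || !(cmds & MEMBARRIER_CMD_PRIVATE_EXPEDITED))
            return 0;

        /* The process must register before using the private expedited
         * command; otherwise the kernel rejects it with EPERM. */
        if (sys_membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, 0, 0) < 0)
            return 0;
        registered = 1;
    }

    return sys_membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0, 0) == 0;
}
```

A caller would try this helper first and take the next available path only when it returns 0, which keeps the kernel version check implicit in the `MEMBARRIER_CMD_QUERY` result rather than in a parsed version string.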