Based on [a patch](https://www.winehq.org/mailman3/hyperkitty/list/wine-devel@winehq.org/messag...) by Jinoh Kang (@iamahuman) from February 2022.
I removed the need for the event object and implemented fast paths for Linux. On Linux 4.14+ `membarrier(MEMBARRIER_CMD_GLOBAL_EXPEDITED, ...)` is used. On x86 Linux <= 4.13 `madvise(..., MADV_DONTNEED)` is used, which sends IPIs to all cores causing them to do a memory barrier. On non-x86 Linux 4.3-4.13 `membarrier(MEMBARRIER_CMD_SHARED, ...)` is used. On non-x86 Linux <= 4.2 and on other platforms the fallback path using APCs is used.
-- v2: ntdll: Add thread_get_register_pointer_values-based fast path for NtFlushProcessWriteBuffers. ntdll: Add sys_membarrier-based fast path to NtFlushProcessWriteBuffers. ntdll: Add MADV_DONTNEED-based fast path for NtFlushProcessWriteBuffers. ntdll: Make server_select a memory barrier. ntdll: Implement NtFlushProcessWriteBuffers.