Based on [a patch](https://www.winehq.org/mailman3/hyperkitty/list/wine-devel@winehq.org/messag...) by Jinoh Kang (@iamahuman) from February 2022.
I removed the need for the event object and implemented fast paths for Linux and macOS. On macOS 10.14+, `thread_get_register_pointer_values` is called on every thread of the process. On Linux 4.14+, `membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED, ...)` is used (the `GLOBAL_EXPEDITED` variant only appeared in Linux 4.16). On x86 Linux <= 4.13 and on other platforms, `madvise(..., MADV_DONTNEED)` is used, which sends IPIs to all cores, causing each of them to execute a memory barrier.
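For illustration, the Linux fallback order described above could look roughly like the sketch below. This is not the code from the patches: the function names (`try_membarrier`, `try_madvise`, `flush_process_write_buffers`) are hypothetical, the macOS path is omitted, and the sketch assumes that the private expedited command must be registered once per process before first use.

```c
#define _GNU_SOURCE
#include <linux/membarrier.h>
#include <pthread.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 on success, -1 if membarrier is unusable
 * (old kernel, missing syscall number, or blocked by a sandbox). */
static int try_membarrier(void)
{
#ifdef SYS_membarrier
    static int registered; /* sketch only: this flag is not thread-safe */

    if (!registered)
    {
        /* The process has to register before using PRIVATE_EXPEDITED. */
        if (syscall(SYS_membarrier, MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, 0) != 0)
            return -1;
        registered = 1;
    }
    return syscall(SYS_membarrier, MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0) == 0 ? 0 : -1;
#else
    return -1;
#endif
}

/* Hypothetical helper: dirty a private page, then drop it. Tearing down the
 * mapping makes the kernel shoot down TLB entries, which involves IPIs. */
static int try_madvise(void)
{
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static char *page;
    long size = sysconf(_SC_PAGESIZE);
    int ret = -1;

    pthread_mutex_lock(&lock);
    if (!page)
    {
        void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p != MAP_FAILED) page = p;
    }
    if (page)
    {
        *(volatile char *)page = 1; /* fault the page in so there is something to discard */
        ret = madvise(page, size, MADV_DONTNEED);
    }
    pthread_mutex_unlock(&lock);
    return ret;
}

/* Hypothetical entry point: prefer membarrier, fall back to madvise. */
int flush_process_write_buffers(void)
{
    if (try_membarrier() == 0) return 0;
    return try_madvise();
}
```

A caller would simply invoke `flush_process_write_buffers()` and treat a non-zero return as "no fast path available". Trying `membarrier` first keeps the common case to a single syscall, while the `madvise` path only needs one scratch page and a mutex.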
--
v11: ntdll: Add thread_get_register_pointer_values-based implementation of NtFlushProcessWriteBuffers.
     ntdll: Add sys_membarrier-based implementation of NtFlushProcessWriteBuffers.
     ntdll: Add MADV_DONTNEED-based implementation of NtFlushProcessWriteBuffers.