Based on [a patch](https://www.winehq.org/mailman3/hyperkitty/list/wine-devel@winehq.or… by Jinoh Kang (@iamahuman) from February 2022.
I removed the need for the event object and implemented fast paths for Linux and macOS.
On macOS 10.14+ `thread_get_register_pointer_values` is used on every thread of the process.
On Linux 4.14+ `membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED, ...)` is used.
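On x86 Linux <= 4.13 `madvise(..., MADV_DONTNEED)` is used. It triggers a TLB shootdown, which the kernel delivers as IPIs to the other cores that may be running threads of the process, causing each of them to execute a memory barrier.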
On non-x86 Linux <= 4.2 and on other platforms the fallback path using APCs is used.
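The two Linux fast paths can be sketched roughly as follows. This is a hypothetical illustration, not the code from the MR: the function names (`try_membarrier`, `try_madvise_ipi`, `flush_write_buffers_fast_path`) are made up, the command constants are copied from the kernel's `<linux/membarrier.h>`, and it uses `MEMBARRIER_CMD_PRIVATE_EXPEDITED`, which needs a one-time registration per process. The `madvise` part assumes that discarding a populated page on x86 forces a TLB shootdown delivered via IPIs; on arm64, for example, TLB invalidation is broadcast in hardware, which is presumably why that trick is limited to x86. The macOS path and the APC fallback are left out.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <unistd.h>
#ifdef __linux__
#include <sys/mman.h>
#include <sys/syscall.h>
#endif

/* Command values from the kernel's <linux/membarrier.h>, defined locally so
 * that older userspace headers still compile. */
#define MEMBARRIER_CMD_QUERY                      0
#define MEMBARRIER_CMD_PRIVATE_EXPEDITED          (1 << 3)
#define MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED (1 << 4)

/* 0 = not probed yet, 1 = usable, -1 = unavailable. */
static atomic_int membarrier_state;

/* Fast path 1 (Linux 4.14+): ask the kernel to run a full memory barrier on
 * every CPU currently running a thread of this process. */
static bool try_membarrier(void)
{
#if defined(__linux__) && defined(__NR_membarrier)
    int state = atomic_load(&membarrier_state);
    if (state == 0)
    {
        long cmds = syscall(__NR_membarrier, MEMBARRIER_CMD_QUERY, 0, 0);
        /* The expedited command fails with EPERM until the process registers. */
        if (cmds > 0 && (cmds & MEMBARRIER_CMD_PRIVATE_EXPEDITED) &&
            syscall(__NR_membarrier, MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, 0, 0) == 0)
            state = 1;
        else
            state = -1;
        atomic_store(&membarrier_state, state);
    }
    if (state == 1)
        return syscall(__NR_membarrier, MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0, 0) == 0;
#endif
    return false;
}

#if defined(__linux__) && (defined(__i386__) || defined(__x86_64__))
static atomic_flag ipi_lock = ATOMIC_FLAG_INIT;
static char *ipi_page; /* only accessed while ipi_lock is held */

/* Fast path 2 (x86 Linux): discarding a populated private page forces a TLB
 * shootdown, sent as IPIs to the CPUs that may be running our threads. */
static bool try_madvise_ipi(void)
{
    long page_size = sysconf(_SC_PAGESIZE);
    bool ok = false;

    while (atomic_flag_test_and_set_explicit(&ipi_lock, memory_order_acquire))
        ; /* serialize callers, the page is shared state */
    if (!ipi_page)
    {
        void *p = mmap(NULL, (size_t)page_size, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p != MAP_FAILED) ipi_page = p;
    }
    if (ipi_page)
    {
        *(volatile char *)ipi_page = 1; /* fault the page in so a mapping exists */
        ok = madvise(ipi_page, (size_t)page_size, MADV_DONTNEED) == 0;
    }
    atomic_flag_clear_explicit(&ipi_lock, memory_order_release);
    return ok;
}
#endif

/* Hypothetical entry point: true if a fast path flushed the write buffers of
 * all threads, false if the caller must use the APC-based fallback. */
bool flush_write_buffers_fast_path(void)
{
    if (try_membarrier()) return true;
#if defined(__linux__) && (defined(__i386__) || defined(__x86_64__))
    return try_madvise_ipi();
#else
    return false;
#endif
}
```

A caller would try `flush_write_buffers_fast_path()` first and only fall back to queuing APCs on every thread when it returns false.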
--
v8: ntdll: Add thread_get_register_pointer_values-based implementation of NtFlushProcessWriteBuffers.
ntdll: Add sys_membarrier-based implementation of NtFlushProcessWriteBuffers.
ntdll: Add MADV_DONTNEED-based implementation of NtFlushProcessWriteBuffers.
https://gitlab.winehq.org/wine/wine/-/merge_requests/741
I'm not exactly sure what is calling this function, but it's probably the antivirus I installed to test the KeInitializeGuardedMutex implementation.
Edit: TkCtrl2k64.sys calls it.
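For reference, ExInterlockedInsertTailList appends an entry to the tail of a circular, doubly linked `LIST_ENTRY` list while holding a spin lock, and Microsoft documents it as returning the entry that was last before the insertion, or NULL if the list was empty. A minimal user-space sketch of those semantics follows. It is not Wine's implementation: the type and function names are hypothetical stand-ins, and a C11 `atomic_flag` replaces `KSPIN_LOCK` without any IRQL handling.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for LIST_ENTRY: a circular list whose head's Flink is
 * the first entry and whose head's Blink is the last entry. */
typedef struct list_entry
{
    struct list_entry *Flink; /* next entry */
    struct list_entry *Blink; /* previous entry */
} list_entry;

typedef atomic_flag spin_lock; /* stand-in for KSPIN_LOCK, no IRQL raising */

static void init_list_head(list_entry *head)
{
    head->Flink = head->Blink = head; /* empty list points at itself */
}

static void acquire(spin_lock *lock)
{
    while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
        ; /* spin until the holder releases the lock */
}

static void release(spin_lock *lock)
{
    atomic_flag_clear_explicit(lock, memory_order_release);
}

/* Append entry at the tail under the lock. Returns the entry that was last
 * before the insertion, or NULL if the list was empty. */
static list_entry *interlocked_insert_tail_list(list_entry *head, list_entry *entry,
                                                spin_lock *lock)
{
    list_entry *old_tail;

    assert(head && entry && lock);
    acquire(lock);
    old_tail = head->Blink;
    entry->Flink = head;
    entry->Blink = old_tail;
    old_tail->Flink = entry;
    head->Blink = entry;
    release(lock);

    return old_tail == head ? NULL : old_tail;
}
```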
--
v7: ntoskrnl.exe: Implement ExInterlockedInsertTailList.
https://gitlab.winehq.org/wine/wine/-/merge_requests/1282
This was already committed, and I guess it's not hurting anything, but it seems like the wrong solution to the problem. Presumably either rpcrt4 should be using NdrAllocate/NdrFree for both, or it should be using internal I_RpcAllocate/I_RpcFree.
--
https://gitlab.winehq.org/wine/wine/-/merge_requests/1521#note_17270