Supersedes !741.

On macOS 10.14+, `thread_get_register_pointer_values` is called on every thread of the process.

On Linux 4.14+, `membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED, ...)` is used.

On x86 Linux <= 4.13 and on other platforms, it falls back to calling `NtGetContextThread()` on each thread.

The fast path patches from @tmatthies are slightly modified in the following ways:

1. On unsupported platforms, the `try_*()` functions return `FALSE` instead of `0`.
2. `try_exp_membarrier()` is called first, then `try_mach_tgrpvs()`.

---

Known applications fixed by this MR:

- osu! (rhythm game) song selection menu stuttering
- .NET CoreCLR GC
- HotSpot JVM (-XX:+UseSystemMemoryBarrier) safepoints

--
v2: kernel32/tests: Add a store buffering litmus test involving FlushProcessWriteBuffers.
    ntdll: Add slow fallback implementation of NtFlushProcessWriteBuffers.

https://gitlab.winehq.org/wine/wine/-/merge_requests/7250
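
For reviewers, the dispatch order described above can be sketched roughly as follows. This is an illustrative standalone sketch, not the code in this MR: only the names `try_exp_membarrier()` and `try_mach_tgrpvs()` come from the description, while `flush_write_buffers_sketch()`, `slow_fallback()`, the `flush_path` enum, and the function bodies are assumptions made for the example. The macOS and slow paths are left as placeholders, and the one-time registration is not made thread-safe here.

```c
#include <stdbool.h>

#ifdef __linux__
#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>
#endif

/* Hypothetical result type, only used to show which path was taken. */
enum flush_path { FLUSH_MEMBARRIER, FLUSH_MACH, FLUSH_SLOW };

/* Fast path 1: expedited private membarrier (Linux 4.14+).
 * Returns false when the syscall or the command is unavailable. */
static bool try_exp_membarrier(void)
{
#if defined(__linux__) && defined(__NR_membarrier)
    static int registered = -1; /* -1: not probed yet (not thread-safe in this sketch) */

    if (registered < 0)
    {
        /* MEMBARRIER_CMD_QUERY returns a bitmask of supported commands. */
        long cmds = syscall(__NR_membarrier, MEMBARRIER_CMD_QUERY, 0, 0);

        registered = cmds > 0
            && (cmds & MEMBARRIER_CMD_PRIVATE_EXPEDITED)
            && (cmds & MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED)
            /* The process must register before using the expedited command. */
            && syscall(__NR_membarrier, MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, 0, 0) == 0;
    }
    if (!registered) return false;
    return syscall(__NR_membarrier, MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0, 0) == 0;
#else
    return false; /* unsupported platform */
#endif
}

/* Fast path 2: placeholder for the macOS 10.14+ path, which would call
 * thread_get_register_pointer_values on every thread of the process.
 * Not implemented in this sketch, so it always reports "unsupported". */
static bool try_mach_tgrpvs(void)
{
    return false;
}

/* Placeholder for the slow path, which per the description would call
 * NtGetContextThread() on each thread. */
static void slow_fallback(void)
{
}

/* Try the fast paths in the order stated in the description, then fall back. */
enum flush_path flush_write_buffers_sketch(void)
{
    if (try_exp_membarrier()) return FLUSH_MEMBARRIER;
    if (try_mach_tgrpvs()) return FLUSH_MACH;
    slow_fallback();
    return FLUSH_SLOW;
}
```

On a Linux 4.14+ kernel that permits the syscall, `flush_write_buffers_sketch()` would be expected to return `FLUSH_MEMBARRIER`; elsewhere in this sketch it ends up in `FLUSH_SLOW`.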