Supersedes !741.
On macOS 10.14+ `thread_get_register_pointer_values` is called on every thread of the process. On Linux 4.14+ `membarrier(MEMBARRIER_CMD_GLOBAL_EXPEDITED, ...)` is used. On x86 Linux <= 4.13 and on other platforms it falls back to calling `NtGetContextThread()` on each thread.
The fast path patches from @tmatthies are slightly modified in the following ways:
1. On unsupported platforms, the `try_*()` function returns `FALSE` instead of `0`. 2. `try_exp_membarrier()` is called first, then `try_mach_tgrpvs()`.