On both Mac and Linux under Wow64, after ~120 threads are created, the 64-bit stacks start to be allocated above 4GB.
This triggered crashes in alloc_fs_sel() and wherever the result of get_cpu_area() was used. (On Mac the ntdll threadpool tests reproduced this, and on both platforms a test app that created 256 threads also reproduced it.)
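
To illustrate the failure mode, here is a minimal sketch of what happens when a 64-bit address is squeezed into 32 bits. It is not the ntdll code; `truncate_to_32` and `fits_in_32_bits` are hypothetical helpers, and the addresses are made-up examples.

```c
#include <stdint.h>

/* Hypothetical stand-in for code that stores a 64-bit address in a
 * 32-bit field: only the low 32 bits survive the cast. */
static uint32_t truncate_to_32(uint64_t addr)
{
    return (uint32_t)addr;
}

/* Returns 1 if the address survives a round trip through 32 bits,
 * 0 if the upper bits would be lost. */
static int fits_in_32_bits(uint64_t addr)
{
    return (uint64_t)truncate_to_32(addr) == addr;
}
```

An address such as 0x7ffe0000 passes the check, while 0x100010000 (just above 4GB) truncates to 0x10000 and would point somewhere unrelated.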
--
v2: ntdll: Avoid truncating pointer to 32-bits in get_cpu_area().
    ntdll: Use 32-bit stack in alloc_fs_sel().