On both Mac and Linux under Wow64, after ~120 threads are created, the 64-bit stacks start to be allocated above 4GB.
This triggered crashes in alloc_fs_sel() and wherever the result of get_cpu_area() was used. (On Mac the ntdll threadpool tests reproduced this, and on both platforms a test app that created 256 threads also reproduced it.)
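
As a rough illustration of why an address above 4GB can cause trouble, here is a minimal sketch. It assumes the affected code paths end up holding the address in a 32-bit field, which the description above does not state explicitly. Both helper names are hypothetical and are not part of the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not from the patch): returns true if addr survives
 * a round trip through a 32-bit field, i.e. it lies below 4GB. */
static bool fits_in_32_bits(uint64_t addr)
{
    return (uint64_t)(uint32_t)addr == addr;
}

/* Hypothetical helper: the value a 32-bit field would hold after storing
 * addr. For addresses at or above 4GB the high bits are silently lost. */
static uint32_t truncate_to_32_bits(uint64_t addr)
{
    return (uint32_t)addr;
}
```

Under that assumption, an address such as 0x100010000 would be stored as 0x00010000 and then point at unrelated memory, which is consistent with the crashes only appearing once allocations move above 4GB.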