This MR uses the Mach copy-on-write (COW) mechanism to implement write-watch functionality.
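To illustrate the underlying idea, here is a minimal standalone sketch (not Wine code, untested): arm a page by re-protecting it with `VM_PROT_COPY`, then detect writes by checking whether the region's share mode is still `SM_COW`. It skips the memory-entry step the patch performs and assumes the COW state survives until the first write.

```c
#include <assert.h>
#include <stdio.h>
#include <mach/mach.h>
#include <mach/mach_vm.h>

int main(void)
{
    mach_vm_address_t addr = 0, region_addr;
    mach_vm_size_t size = vm_page_size, region_size;
    vm_region_extended_info_data_t info;
    mach_msg_type_number_t info_count = VM_REGION_EXTENDED_INFO_COUNT;
    mach_port_t object_name;
    kern_return_t kr;

    kr = mach_vm_allocate( mach_task_self(), &addr, size, VM_FLAGS_ANYWHERE );
    assert( kr == KERN_SUCCESS );

    /* arm the watch: VM_PROT_COPY forces the page into a COW mapping */
    kr = mach_vm_protect( mach_task_self(), addr, size, 0,
                          VM_PROT_READ | VM_PROT_WRITE | VM_PROT_COPY );
    assert( kr == KERN_SUCCESS );

    *(volatile char *)addr = 1;  /* the first write resolves the COW */

    /* a page whose share mode is no longer SM_COW has been written to */
    region_addr = addr;
    kr = mach_vm_region( mach_task_self(), &region_addr, &region_size,
                         VM_REGION_EXTENDED_INFO, (vm_region_info_t)&info,
                         &info_count, &object_name );
    assert( kr == KERN_SUCCESS );
    printf( "page dirty: %s\n", info.share_mode == SM_COW ? "no" : "yes" );
    return 0;
}
```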
Below is the same micro-benchmark @gofman used in his [UFFD MR](https://gitlab.winehq.org/wine/wine/-/merge_requests/7871).
Parameters:
- number of concurrent threads;
- number of pages;
- delay between reading / resetting write watches (ms);
- random (1) or sequential (0) page write access;
- reset with `WRITE_WATCH_FLAG_RESET` in `GetWriteWatch()` (1) or in a separate `ResetWriteWatch()` call (0).

Results are in the form `<average write to page time, ns> / <average GetWriteWatch() time, µs>`:

| Parameters (threads, pages, delay, random, reset flag) | Windows | Mach COW | Fallback |
| --- | --- | --- | --- |
| 6 1080 3 1 1 | 897 / 80 | 371 / 12634 | 66202 / 186 |
| 6 1080 3 1 0 | 855 / 87 | 369 / 12637 | 66766 / 187 |
| 8 8192 3 1 1 | 6526 / 268 | 627 / 113263 | 111053 / 485 |
| 8 8192 3 1 0 | 1197 / 509 | 623 / 113810 | 122921 / 489 |
| 8 8192 1 1 1 | 1227 / 412 | 636 / 118930 | 150628 / 388 |
| 8 8192 1 1 0 | 5721 / 144 | 631 / 120538 | 146392 / 384 |
| 8 64 1 1 1 | 572 / 7 | 490 / 1078 | 1000 / 89 |
| 8 64 1 1 0 | 530 / 13 | 500 / 1075 | 1167 / 77 |
All measurements were taken on the same M2 Max machine: the Windows column is Windows 11 on ARM in a VM running the x64 binary under emulation, and the other columns are Wine through Rosetta without and with this MR.
Unlike UFFD, which is always faster than the fallback and comparable to Windows performance, here a good average write-to-page time is traded for a bad average `GetWriteWatch()` time (in roughly equal ratios).
However, in real-world applications (like the FFXIV + Dalamud mod framework/loader use case), this change reduces the cold-start time from about 25.5 s to 23.6 s, including loading a modern .NET 9 runtime into the game process and initializing a complex mod collection with fairly high GC pressure.
This is probably because the `GetWriteWatch()` calls made by the GC mostly happen concurrently, whereas in Wine's fallback implementation running threads are interrupted and often wait on the global virtual lock while the segfault is handled, blocking parallel accesses to write-watched memory and other VM operations.
Another advantage is that `VPROT_WRITEWATCH` can then be used for other purposes in the future. Rosetta is also sometimes a bit finicky about reported protections with the current implementation, but has so far always behaved as expected in my testing with the new one.
On ARM64 the `VM_PROT_COPY`/`SM_COW` mechanism also works as expected with native 16k pages (not that this matters much at the moment).
`GetWriteWatch()` with the reset flag also does not need to be transactional (unlike with UFFD), since only the marked pages are reset here, not the entire range.
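For context, this is the Win32 surface the MR backs; a minimal usage sketch of the standard API (hypothetical toy program, most error handling omitted):

```c
#include <windows.h>
#include <stdio.h>

int main(void)
{
    SYSTEM_INFO si;
    void *addresses[16];
    ULONG_PTR count = 16;
    DWORD granularity;
    char *mem;

    GetSystemInfo( &si );
    mem = VirtualAlloc( NULL, 16 * si.dwPageSize,
                        MEM_RESERVE | MEM_COMMIT | MEM_WRITE_WATCH, PAGE_READWRITE );
    if (!mem) return 1;

    mem[0] = 1;                  /* dirties page 0 */
    mem[5 * si.dwPageSize] = 1;  /* dirties page 5 */

    /* retrieve and reset in one call; with the Mach COW backend only the
     * pages actually returned get re-armed, so no transaction is needed */
    if (!GetWriteWatch( WRITE_WATCH_FLAG_RESET, mem, 16 * si.dwPageSize,
                        addresses, &count, &granularity ))
        printf( "%lu dirty pages\n", (unsigned long)count );
    return 0;
}
```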
From: Marc-Aurel Zent <mzent@codeweavers.com>
```diff
---
 dlls/ntdll/unix/virtual.c | 185 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 185 insertions(+)

diff --git a/dlls/ntdll/unix/virtual.c b/dlls/ntdll/unix/virtual.c
index 994f76fb72a..b3a45837acd 100644
--- a/dlls/ntdll/unix/virtual.c
+++ b/dlls/ntdll/unix/virtual.c
@@ -414,6 +414,191 @@ static void kernel_get_write_watches( void *base, SIZE_T size, void **buffer, UL
         addr = next_addr;
     }
 }
+#elif defined(__APPLE__)
+static vm_prot_t get_mach_prot( mach_vm_address_t addr )
+{
+    size_t i, idx = (size_t)ROUND_ADDR( addr, host_page_mask ) >> page_shift;
+    const BYTE *vprot_ptr;
+    BYTE vprot = 0;
+    vm_prot_t mach_prot = VM_PROT_NONE;
+
+    if ((idx >> pages_vprot_shift) >= pages_vprot_size) return VM_PROT_NONE;
+    if (!pages_vprot[idx >> pages_vprot_shift]) return VM_PROT_NONE;
+    assert( host_page_mask >> page_shift <= pages_vprot_mask );
+    vprot_ptr = pages_vprot[idx >> pages_vprot_shift] + (idx & pages_vprot_mask);
+    for (i = 0; i < host_page_size / page_size; i++) vprot |= vprot_ptr[i];
+    if ((vprot & VPROT_COMMITTED) && !(vprot & VPROT_GUARD))
+    {
+        if (vprot & VPROT_READ) mach_prot |= VM_PROT_READ;
+        if (vprot & VPROT_WRITE) mach_prot |= VM_PROT_WRITE | VM_PROT_READ;
+        if (vprot & VPROT_WRITECOPY) mach_prot |= VM_PROT_WRITE | VM_PROT_READ;
+        if (vprot & VPROT_EXEC) mach_prot |= VM_PROT_EXECUTE | VM_PROT_READ;
+        if (vprot & VPROT_WRITEWATCH) mach_prot &= ~VM_PROT_WRITE;
+    }
+
+    return mach_prot;
+}
+
+static void kernel_writewatch_init(void)
+{
+    use_kernel_writewatch = 1;
+    TRACE( "Using mach write watches.\n" );
+}
+
+static void kernel_writewatch_reset( void *start, SIZE_T len )
+{
+    mach_vm_address_t current_address = (mach_vm_address_t)ROUND_ADDR( start, host_page_mask );
+    SIZE_T end = current_address + ROUND_SIZE( start, len, host_page_mask );
+    kern_return_t kr;
+
+    while (current_address < end)
+    {
+        vm_prot_t mach_prot = get_mach_prot( current_address );
+
+        kr = mach_vm_protect( mach_task_self(), current_address, host_page_size, 0,
+                              mach_prot | VM_PROT_COPY );
+
+        if (kr != KERN_SUCCESS)
+        {
+            ERR( "mach_vm_protect failed on address %p: %d\n", (void *)current_address, kr );
+            break;
+        }
+
+        current_address += host_page_size;
+    }
+}
+
+static void kernel_writewatch_register_range( struct file_view *view, void *base, size_t size )
+{
+    mach_vm_address_t current_address = (mach_vm_address_t)ROUND_ADDR( base, host_page_mask );
+    mach_vm_address_t region_address;
+    mach_vm_size_t region_size;
+    mach_msg_type_number_t info_count;
+    mach_port_t object_name;
+    vm_region_extended_info_data_t info;
+    SIZE_T end = current_address + ROUND_SIZE( base, size, host_page_mask );
+    kern_return_t kr;
+
+    if (!(view->protect & VPROT_WRITEWATCH) || !use_kernel_writewatch) return;
+
+    while (current_address < end)
+    {
+        vm_prot_t mach_prot = get_mach_prot( current_address );
+
+        region_address = current_address;
+        info_count = VM_REGION_EXTENDED_INFO_COUNT;
+        kr = mach_vm_region( mach_task_self(), &region_address, &region_size, VM_REGION_EXTENDED_INFO,
+                             (vm_region_info_t)&info, &info_count, &object_name );
+
+        if (kr != KERN_SUCCESS)
+        {
+            ERR( "mach_vm_region failed: %d\n", kr );
+            break;
+        }
+
+        if (region_address > current_address)
+        {
+            ERR( "trying to register unmapped region\n" );
+            break;
+        }
+
+        assert( info.protection == mach_prot );
+
+        region_size = (mach_vm_size_t)host_page_size;
+        kr = mach_vm_protect( mach_task_self(), current_address, region_size, 0,
+                              mach_prot | VM_PROT_COPY );
+
+        if (kr != KERN_SUCCESS)
+        {
+            ERR( "mach_vm_protect failed: %d\n", kr );
+            break;
+        }
+
+        kr = mach_make_memory_entry_64( mach_task_self(), &region_size, current_address, mach_prot,
+                                        &object_name, MACH_PORT_NULL );
+
+        if (kr != KERN_SUCCESS)
+        {
+            ERR( "mach_make_memory_entry_64 failed: %d\n", kr );
+            current_address += host_page_size;
+            continue;
+        }
+
+        assert( region_size == host_page_size );
+        mach_port_deallocate( mach_task_self(), object_name );
+        current_address += host_page_size;
+    }
+}
+
+static void kernel_get_write_watches( void *base, SIZE_T size, void **buffer, ULONG_PTR *count, BOOL reset )
+{
+    mach_vm_address_t current_address;
+    mach_vm_address_t region_address;
+    mach_vm_size_t region_size;
+    mach_msg_type_number_t info_count;
+    mach_port_t object_name;
+    vm_region_extended_info_data_t info;
+    data_size_t remaining_size;
+    SIZE_T buffer_len = *count;
+    size_t end;
+    kern_return_t kr;
+
+    assert( !(size & page_mask) );
+
+    end = (size_t)((char *)base + size);
+    remaining_size = ROUND_SIZE( base, size, host_page_mask );
+    current_address = (mach_vm_address_t)ROUND_ADDR( base, host_page_mask );
+    *count = 0;
+
+    while (remaining_size && buffer_len)
+    {
+        region_address = current_address;
+        info_count = VM_REGION_EXTENDED_INFO_COUNT;
+        kr = mach_vm_region( mach_task_self(), &region_address, &region_size, VM_REGION_EXTENDED_INFO,
+                             (vm_region_info_t)&info, &info_count, &object_name );
+
+        if (kr != KERN_SUCCESS)
+        {
+            ERR( "mach_vm_region failed: %d\n", kr );
+            break;
+        }
+
+        if (region_address > min( current_address, (mach_vm_address_t)end )) break;
+
+        if (info.share_mode != SM_COW)
+        {
+            size_t c_addr = max( (size_t)current_address, (size_t)base );
+            size_t region_end = min( (size_t)(region_address + region_size), end );
+
+            while (buffer_len && c_addr < region_end)
+            {
+                buffer[(*count)++] = (void *)c_addr;
+                --buffer_len;
+                c_addr += page_size;
+            }
+        }
+
+        current_address += region_size;
+        remaining_size -= region_size;
+    }
+
+    if (reset)
+    {
+        ULONG_PTR i;
+        vm_prot_t mach_prot;
+
+        for (i = 0; i < *count; i++)
+        {
+            current_address = (mach_vm_address_t)buffer[i];
+            mach_prot = get_mach_prot( current_address );
+            kr = mach_vm_protect( mach_task_self(), current_address, page_size, 0,
+                                  mach_prot | VM_PROT_COPY );
+
+            if (kr != KERN_SUCCESS)
+                ERR( "mach_vm_protect failed: %d\n", kr );
+        }
+    }
+}
 #else
 static void kernel_writewatch_init(void)
 {
```
Tim Clem (@tclem) commented about dlls/ntdll/unix/virtual.c:
```diff
         addr = next_addr;
     }
 }
+#elif defined(__APPLE__)
+static vm_prot_t get_mach_prot( mach_vm_address_t addr )
+{
+    size_t i, idx = (size_t)ROUND_ADDR( addr, host_page_mask ) >> page_shift;
+    const BYTE *vprot_ptr;
+    BYTE vprot = 0;
+    vm_prot_t mach_prot = VM_PROT_NONE;
+
+    if ((idx >> pages_vprot_shift) >= pages_vprot_size) return VM_PROT_NONE;
+    if (!pages_vprot[idx >> pages_vprot_shift]) return VM_PROT_NONE;
+    assert( host_page_mask >> page_shift <= pages_vprot_mask );
+    vprot_ptr = pages_vprot[idx >> pages_vprot_shift] + (idx & pages_vprot_mask);
+    for (i = 0; i < host_page_size / page_size; i++) vprot |= vprot_ptr[i];
```
Is it worth factoring this out of `get_host_page_vprot` and reusing that here? Also, the code in `get_host_page_vprot` has a separate case for `!_WIN64`; is that relevant here?
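For reference, one possible shape of that refactor (hypothetical sketch against the file's context; it assumes `get_host_page_vprot()` returns the OR of the per-4k vprot bytes covering one host page, as the open-coded loop above does):

```c
static vm_prot_t get_mach_prot( mach_vm_address_t addr )
{
    /* hypothetical: reuse the existing lookup instead of open-coding it */
    BYTE vprot = get_host_page_vprot( (void *)addr );
    vm_prot_t mach_prot = VM_PROT_NONE;

    if (!(vprot & VPROT_COMMITTED) || (vprot & VPROT_GUARD)) return VM_PROT_NONE;
    if (vprot & VPROT_READ) mach_prot |= VM_PROT_READ;
    if (vprot & (VPROT_WRITE | VPROT_WRITECOPY)) mach_prot |= VM_PROT_READ | VM_PROT_WRITE;
    if (vprot & VPROT_EXEC) mach_prot |= VM_PROT_READ | VM_PROT_EXECUTE;
    if (vprot & VPROT_WRITEWATCH) mach_prot &= ~VM_PROT_WRITE;
    return mach_prot;
}
```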
Tim Clem (@tclem) commented about dlls/ntdll/unix/virtual.c:
```diff
+        {
+            ERR( "mach_vm_region failed: %d\n", kr );
+            break;
+        }
+
+        if (region_address > current_address)
+        {
+            ERR( "trying to register unmapped region\n" );
+            break;
+        }
+
+        assert( info.protection == mach_prot );
+
+        region_size = (mach_vm_size_t)host_page_size;
+        kr = mach_vm_protect( mach_task_self(), current_address, region_size, 0,
+                              mach_prot | VM_PROT_COPY );
```
Is there a reason to do this page by page? Can you not just do one `mach_vm_protect` for the whole range?
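One way that could look, for reference (untested sketch using the patch's own helpers, `get_mach_prot()` and `host_page_size`): coalesce runs of pages whose effective Mach protection is identical and re-arm each run with a single call, since per-page calls are only required where the protection actually changes.

```c
while (current_address < end)
{
    vm_prot_t run_prot = get_mach_prot( current_address );
    mach_vm_address_t run_start = current_address;

    /* extend the run while the effective Mach protection stays the same */
    do current_address += host_page_size;
    while (current_address < end && get_mach_prot( current_address ) == run_prot);

    kr = mach_vm_protect( mach_task_self(), run_start, current_address - run_start,
                          0, run_prot | VM_PROT_COPY );
    if (kr != KERN_SUCCESS)
    {
        ERR( "mach_vm_protect failed: %d\n", kr );
        break;
    }
}
```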
Tim Clem (@tclem) commented about dlls/ntdll/unix/virtual.c:
ERR( "mach_vm_protect failed: %d\n", kr );
break;
}
kr = mach_make_memory_entry_64( mach_task_self(), ®ion_size, current_address, mach_prot,
&object_name, MACH_PORT_NULL );
if (kr != KERN_SUCCESS)
{
ERR( "mach_make_memory_entry_64 failed: %d\n", kr );
current_address += host_page_size;
continue;
}
assert( region_size == host_page_size );
mach_port_deallocate( mach_task_self(), object_name );
What's the point of making this memory entry?