I'm still a bit hesitant about not querying first. At least the performance measurements seem to indicate a noticeable difference between WriteProcessMemory and NtWriteVirtualMemory on Windows in the “normal” case (though this could also be due to Windows unconditionally flushing the instruction cache).
Are you saying that it is slower on Windows? Is that probably on ARM only? I can't imagine it being quicker on Linux / x86, given the two server call round trips.