Looking into the code duplication some more: msvcrt links with ntdll, so it seems the simplest solution is to remove the duplicated functions from msvcrt, which will cause the ntdll versions to be called instead.
It's not as clean as I'd like, though. Some of the functions are better in msvcrt, so they should be backported, and others can't be moved at all. This isn't the only source of duplication either. For instance, math.c seems to be full of duplicates, but I don't even want to touch that...
If upstream typically doesn't like this kind of patch, maybe I should resubmit after I've developed an AVX2 patch as well? That would make a much more convincing argument on speed than the current patch, whose gain is fairly marginal.