So I now have access to a Windows 98 machine and some contemporaneous hardware, and results are... weird.
The only working card I have with hardware TCL is an NV17. It passes ddraw7's test_map_synchronisation() just fine.
It fails ddraw4's test_map_synchronisation() because it... seems to draw nothing at all if asked to draw more than 24 vertices. We don't actually check the output colour, which should be fixed, but the real problem is that all of the draws then complete pretty much immediately, which throws off the "how many primitives can we draw in 100 ms" calculation.
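To illustrate why an instantly-completing draw breaks that calculation, here is a minimal sketch, not the actual test code. It assumes the calibration doubles the primitive count until one draw takes at least the target time. All names (draw_fn, draw_normal, draw_dropped, calibrate) and the simulated timings are hypothetical; the "more than 24 vertices draws nothing" behaviour is modelled from the observation above.

```c
/* Hypothetical stand-in for "draw `count` triangles and wait for completion".
 * Returns a simulated elapsed time in microseconds, so the sketch is deterministic. */
typedef unsigned int (*draw_fn)(unsigned int count);

/* A well-behaved device: assume 10 us per triangle. */
static unsigned int draw_normal(unsigned int count)
{
    return count * 10;
}

/* Models the observed behaviour: a draw of more than 24 vertices
 * (8 triangles) renders nothing and returns almost at once. */
static unsigned int draw_dropped(unsigned int count)
{
    return count * 3 > 24 ? 1 : count * 10;
}

/* Double the triangle count until a single draw takes at least target_us.
 * Returns that count, or 0 if the cap is reached first. */
static unsigned int calibrate(draw_fn draw, unsigned int target_us, unsigned int cap)
{
    unsigned int count;

    for (count = 1; count <= cap; count *= 2)
    {
        if (draw(count) >= target_us)
            return count;
    }
    return 0;
}
```

With a 100 ms target (100000 us) and a cap of 1 << 20, calibrate(draw_normal, ...) settles on 16384 triangles, whereas calibrate(draw_dropped, ...) never sees a draw that is slow enough and returns 0. Checking the output colour would catch the second case before the timing loop is trusted.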
The same thing seems to happen if, on either ddraw4 or ddraw7, I create the VB with D3DVBCAPS_SYSTEMMEMORY. This does, at least, strongly suggest that in ddraw4 (and probably in ddraw7 too if you don't create a hardware TnL device), a non-RHW buffer is always in system memory.
On Stefan's suggestion I tried hacking ddraw4's test_map_synchronisation() to draw pretransformed vertices. That worked, but strangely. The NOOVERWRITE tests seem to consistently get 0xffff00, suggesting they are unsynchronized after all. The non-NOOVERWRITE tests randomly get one of 0x000000 (note that the clear colour is 0x0000ff), 0xffff00 (as if unsynchronized), or 0xff00ff.
In no case did I see GetVertexBufferDesc() return anything other than the exact flags that the buffer was created with.