Okay, so I did some more performance testing on the Windows 98 machine. I'm attaching a diff that shows the general form of the tests I did, and the results (sorry they're a bit of a mess). Cards tested were a GeForce4 MX (NV17), which supports hardware TCL, and a Rage 128, which doesn't.
The results are weird, but there are three consistent patterns that emerge from this and previous tests:
(1) The NVidia card shows a difference between buffers created with D3DVBCAPS_SYSTEMMEMORY and without it, even when vertex processing is done in software. The ATI card doesn't. The explanation here may be that creating and locking a vertex buffer doesn't actually depend on the device IID.
(2) RHW buffers on a non-TnLHal device broadly act more like XYZ buffers on a TnLHal device than like XYZ buffers on a non-TnLHal device. For example, NOOVERWRITE seems to be unsynchronized, and a buffer with D3DVBCAPS_SYSTEMMEMORY is slower than one without it. This is not really surprising.
(3) DISCARD and NOOVERWRITE flags do nothing on ddraw4. Oddly, they aren't rejected either (and I tested that other flags are rejected), but that may just be a consequence of having a runtime that supports ddraw7.
It's also worth mentioning that the test suggests that Prince of Persia should perform *terribly* on the NVidia card, but decently on the Rage 128, and in fact the demo does exactly this. I didn't test the full game, but if it performs better, that may well be due to driver hacks.
(1) and (2) do suggest that we should be honouring the D3DVBCAPS_SYSTEMMEMORY flag, and tests suggest that it matters even for ddraw4. At the same time, I haven't seen any evidence that a modern GPU will *ever* perform better with a vertex buffer not in sysmem unless the application is using the streaming pattern, and (3) means that it never will on ddraw4. Maybe things were different for contemporaneous GPUs, but I don't think that should matter when deciding how to optimize, and I don't think it mattered even before we recently dropped support for them.
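For readers unfamiliar with the term, the "streaming pattern" mentioned above usually means appending vertex data to a buffer with NOOVERWRITE until it is full, then locking with DISCARD and starting again from the beginning. The following is only a minimal sketch of that bookkeeping, not code from the patch series or from the attached tests. Every name in it (`stream_state`, `stream_reserve`, the `STREAM_LOCK_*` values) is hypothetical, and the flag values are stand-ins rather than the real lock flags from the ddraw headers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the NOOVERWRITE and DISCARD lock flags.
 * These are NOT the real values from ddraw.h. */
enum stream_lock_flag
{
    STREAM_LOCK_NOOVERWRITE = 1, /* caller promises not to touch data still in use */
    STREAM_LOCK_DISCARD = 2,     /* caller wants fresh storage; old contents are dropped */
};

struct stream_state
{
    size_t capacity; /* total number of vertices the buffer can hold */
    size_t pos;      /* index of the first unused vertex */
};

struct stream_lock
{
    size_t offset;              /* vertex index the caller should write at */
    enum stream_lock_flag flag; /* flag the caller would pass to the lock call */
};

/* Reserve room for "count" vertices: append while there is space,
 * otherwise discard the buffer and restart at offset 0. */
static struct stream_lock stream_reserve(struct stream_state *s, size_t count)
{
    struct stream_lock lock;

    /* A single batch larger than the whole buffer cannot be streamed. */
    assert(count <= s->capacity);

    if (count > s->capacity - s->pos)
    {
        /* Not enough room left: ask for new storage and wrap around. */
        lock.offset = 0;
        lock.flag = STREAM_LOCK_DISCARD;
        s->pos = count;
    }
    else
    {
        /* Append after the data already written, without synchronization. */
        lock.offset = s->pos;
        lock.flag = STREAM_LOCK_NOOVERWRITE;
        s->pos += count;
    }

    return lock;
}
```

An application following this pattern would call something like `stream_reserve()` before each draw, lock the buffer with the returned flag, write at the returned offset, unlock, and draw from there. The point of the pattern is that neither kind of lock should have to wait for the GPU, which is the case where a buffer outside system memory has a chance of being faster.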
The upshot of this is that I think patch 1/2 of this series, as is, really does do the right thing. I'll rebase it, and add a comment that explains this better, as requested.