At the same time, I haven't seen any evidence that a modern GPU will ever perform better with a vertex buffer that is not in sysmem, unless it's using the streaming pattern.
I guess it should perform better with a vidmem buffer if the buffer holds static data and is not locked after the initial upload.
Can you check D3DDEVICEDESC.dwMaxVertexCount on these cards? Regardless of what we do with these two patches, I think we should reduce it to whatever Windows reports (1024 in my case, on Windows 11).
Conceptually the two patches have my blessing though, regardless of the specifics of what PoP3D is doing.