What shook out of the discussion about the drawStridedSlow implementation of vertex blending was the suggestion that WineD3D will need a vertex-program implementation of the fixed-function vertex pipeline at some point anyway. It therefore makes more sense to get that vertex pipeline in place and build vertex blending on top of it than to add another software implementation to drawStridedSlow.
To that end, I've ported Stefan Dösinger's initial implementation of such a pipeline from October last year [1] to what turned out to be 1.1.14.
It fails two more D3D9 tests than 1.1.14:
visual.c:914: Test failed: Transformed vertex with linear vertex fog has color 0000ff00
visual.c:986: Test failed: Transformed vertex with linear vertex fog has color 0000ff00
(I don't know if these failed when it was first implemented, so I don't know if the problem is the patch, the port, or even the test. This is on my nVidia 8600 w/180.22 binary drivers, in case it makes a difference.)
I haven't run it with any games yet, or profiled it. Stefan's commented since that it was slower than the current system, albeit unoptimised, and that was the reason he didn't undertake any further work on it.
I don't know what games use the fixed-function pipeline, so I'm open to suggestions and bug report references.
All but the last patch in this tarball are ports of Stefan's original, and were done using git-am so still retain his original headers. The sorta-nasty no_d3dcolor_swizzle code in get_color and its caller in the last patch is entirely my fault though. ^_^
The last patch is actually also from Stefan's code, but it's my attempt to reinsert fog code I had removed, possibly incorrectly, during the port. However, it didn't fix the above fog failures, so I left it as a separate patch.
I'm not totally sure I've got all the hard parts of the port right. The two relevant changes since these patches were applied were the implementation of EXT_vertex_array_bgra and the rearrangement of the fog code.
For EXT_vertex_array_bgra and vertex_pipe->can_convert_d3dcolor, I'm not sure if there are places that check one and should check the other. I'm also not certain that the test in state.c line 4280 (streamsrc) is doing the right thing, although I think it is: the vertex program recognises and handles a position_transformed vertex declaration itself.
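For anyone not familiar with why the swizzle matters: a D3DCOLOR is a 32-bit value laid out as 0xAARRGGBB, so on a little-endian machine its bytes sit in memory as B, G, R, A, whereas GL expects R, G, B, A unless EXT_vertex_array_bgra is available. Below is a minimal sketch of the conversion that has to happen somewhere when the color can't be consumed as-is. The function name is made up for illustration and is not the get_color code from the patches.

```c
#include <stdint.h>

/* Hypothetical helper, not taken from the patches: unpack a D3DCOLOR
 * (0xAARRGGBB) into four normalised floats in GL's RGBA order. */
static void d3dcolor_to_rgba(uint32_t color, float rgba[4])
{
    rgba[0] = ((color >> 16) & 0xff) / 255.0f; /* red   */
    rgba[1] = ((color >>  8) & 0xff) / 255.0f; /* green */
    rgba[2] = ( color        & 0xff) / 255.0f; /* blue  */
    rgba[3] = ((color >> 24) & 0xff) / 255.0f; /* alpha */
}
```

If a path swizzles when it shouldn't (or skips the swizzle when it should do it), red and blue end up exchanged, which is why mixing up the two capability checks would matter.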
Of the original 13-patch series, patches 4 and 5 were rendered completely uninteresting by the fog changes, and 7 and 13 had already gone into the main tree. There was also what I believe was a bug in patch 2, which basically left pCaps->MaxUserClipPlanes uninitialised rather than fetching the value from the vertex_pipeline's get_caps method result.
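To illustrate the kind of fix I mean for the MaxUserClipPlanes problem, here is a rough sketch using simplified, hypothetical stand-ins for the real structures. The struct names, the field names other than MaxUserClipPlanes, and the helper functions are assumptions for illustration only and don't match the actual WineD3D definitions.

```c
#include <string.h>

/* Hypothetical, simplified stand-ins for the real WineD3D structures. */
struct vp_caps {
    unsigned int max_user_clip_planes;
};

struct vertex_pipeline {
    void (*get_caps)(struct vp_caps *caps);
};

struct device_caps {
    unsigned int MaxUserClipPlanes;
};

/* Copy the vertex pipeline's reported limit into the device caps,
 * instead of leaving MaxUserClipPlanes uninitialised. */
static void fill_vertex_caps(const struct vertex_pipeline *pipe,
                             struct device_caps *caps)
{
    struct vp_caps vp;

    memset(&vp, 0, sizeof(vp));  /* defined values even if get_caps sets nothing */
    pipe->get_caps(&vp);
    caps->MaxUserClipPlanes = vp.max_user_clip_planes;
}

/* Example backend reporting six user clip planes (an arbitrary value). */
static void example_get_caps(struct vp_caps *caps)
{
    caps->max_user_clip_planes = 6;
}
```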
Anyway, here it is. I hope this is of benefit to someone. The next thing I plan to do with this is try and wrap my head around the internals of the arbvp vertex program so I can implement vertex blending there.
[1] http://www.mail-archive.com/wine-devel@winehq.org/msg49501.html