Sorry for the slowness of review thus far.
One other general thing I notice is that we're still missing some flushes. We basically need to flush any time state is set on the stateblock or device, so we're missing at least the following:
* SetTransform()
* MultiplyTransform()
* setup_lighting()
* SetViewport()
* SetMaterial()
* SetLight()
* LightEnable()
* SetClipPlane()
* stateblock reset in ddraw_surface_create() for primary surfaces
* reset in ddraw_set_cooperative_level()
* d3d_device_set_render_target()
* d3d_device_update_depth_stencil()
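To illustrate the pattern I have in mind, here is a minimal standalone sketch. Everything in it (struct sketch_device, sketch_flush(), sketch_set_transform(), and so on) is a made-up stand-in rather than the actual ddraw or wined3d code; the only point is that every state setter flushes the pending batch before touching the state, so queued vertices get drawn with the state that was current when they were queued.

```c
#include <assert.h>

#define BATCH_CAPACITY 64

/* Hypothetical stand-in for the device; not the real ddraw structure. */
struct sketch_device
{
    unsigned int pending_vertices; /* vertices queued but not yet drawn */
    unsigned int draw_calls;       /* number of flushes that issued a draw */
    int transform;                 /* stand-in for any piece of device state */
    int last_drawn_transform;      /* state in effect when the last draw was issued */
};

/* Issue the queued vertices as one draw; a no-op if nothing is queued. */
static void sketch_flush(struct sketch_device *device)
{
    if (!device->pending_vertices)
        return;

    device->last_drawn_transform = device->transform;
    ++device->draw_calls;
    device->pending_vertices = 0;
}

/* Queue vertices, flushing first if the batch would overflow. */
static void sketch_queue_vertices(struct sketch_device *device, unsigned int count)
{
    assert(count <= BATCH_CAPACITY);

    if (device->pending_vertices + count > BATCH_CAPACITY)
        sketch_flush(device);
    device->pending_vertices += count;
}

/* Any state setter: flush first, so queued vertices use the old state. */
static void sketch_set_transform(struct sketch_device *device, int transform)
{
    sketch_flush(device);
    device->transform = transform;
}
```

Since the flush is a no-op when nothing is queued, calling it unconditionally from each setter should cost very little in the common case.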
I also believe we need to flush before a flip, blit, clear, or download, since we might be drawing to the source surface in that case. That includes:
* Blt()
* BltFast()
* GetDC()
* Lock()
* Load()
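A rough sketch of why the surface paths matter, again with purely hypothetical names (struct sketch_surface, struct sketch_batch, sketch_blt(), sketch_surface_read()) that do not correspond to real ddraw functions. A draw that is still sitting in the batch has not reached its target yet, so anything that reads from that surface has to flush first or it will see stale contents.

```c
#include <assert.h>

/* Hypothetical one-pixel surface; just enough to show stale reads. */
struct sketch_surface
{
    int pixel;
};

/* Hypothetical batch holding at most one deferred draw. */
struct sketch_batch
{
    struct sketch_surface *target;
    int pending_value;
    int has_pending;
};

/* Apply the deferred draw to its target, if there is one. */
static void sketch_batch_flush(struct sketch_batch *batch)
{
    if (!batch->has_pending)
        return;

    assert(batch->target);
    batch->target->pixel = batch->pending_value;
    batch->has_pending = 0;
}

/* Defer a draw; switching targets flushes the previous one first. */
static void sketch_draw(struct sketch_batch *batch,
        struct sketch_surface *target, int value)
{
    if (batch->has_pending && batch->target != target)
        sketch_batch_flush(batch);

    batch->target = target;
    batch->pending_value = value;
    batch->has_pending = 1;
}

/* Blit-like copy: flush before reading the source surface. */
static void sketch_blt(struct sketch_batch *batch,
        struct sketch_surface *dst, const struct sketch_surface *src)
{
    sketch_batch_flush(batch);
    dst->pixel = src->pixel;
}

/* Lock-like read: flush before exposing the surface contents. */
static int sketch_surface_read(struct sketch_batch *batch,
        const struct sketch_surface *surface)
{
    sketch_batch_flush(batch);
    return surface->pixel;
}
```

Without the flush in sketch_blt(), the destination would receive the source's old contents even though the application had already issued the draw.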
We should be able to skip execute buffers at least, since they're not going through vertex batching.