Hello,
The way opengl states are managed in wined3d at the moment is a bit messy and inefficient. Basically it is a mixture of modifying the opengl settings when the application modifies the d3d settings and brute-force applying the rest in drawprim. The situation has got worse with the addition of the DirectDraw blitting code and it is time to clean that up and make it more efficient.
Basically, during primitive drawing the opengl setup must equal the one requested by the application. Some things can be translated 1:1, e.g. WINED3DRS_LIGHTINGENABLE to glEnable(GL_LIGHTING). Other things, like fog or texture stage states, are more complex. Some operations in wined3d, for example unlocking a render target or doing a 2D blt, require special opengl settings.
At the moment we have the following situation:
* Some things are applied when the app requests a change.
* Some parts are applied by brute force in drawPrimitive.
* BltOverride and unlockRect read the old gl states, modify them the way they need, and then restore the old states.
This has some problems:
* Setting some things during each drawprim call wastes quite a few resources, due to the loops and gl calls involved.
* If an application performs only blts (e.g. a 2D-only app) or only locks the render target (e.g. playing a movie), storing and restoring the gl state is a waste of resources.
* Brute-force applying opengl states is likely to re-set the old state unnecessarily. Opengl does not guarantee that redundant calls are cheap.
* If an application sets and resets a state for some reason between 2 drawprim calls, that involves 2 opengl state switches that are not really necessary.
* Anything else I didn't think of?
I can think of a few ways to solve this:
* Do not do any opengl changes in SetRenderState, and apply all states in drawprim. This way UnlockRect and Blt don't have to care about resetting the things they changed, and redundant changes can be caught nicely.
* Apply everything when the app requests it, and take as many things as possible out of drawprim. This keeps drawprim small and efficient.
* Use a mixed style like it is done with transformed vs untransformed vertex drawing with last_was_rhw. That makes it easy to find out if reapplying anything is needed and frees other functions from resetting everything.
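A minimal sketch of the first option, to make the idea concrete. The gl calls are replaced by a counting stub so it runs without a GL context, and all names (struct device, set_render_state, apply_dirty_states, NUM_STATES) are made up for illustration; none of this is existing wined3d code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_STATES 8   /* hypothetical size, just for the sketch */

/* Stub standing in for the real opengl calls; it records and counts them. */
static unsigned gl_calls;
static unsigned gl_value[NUM_STATES];

static void stub_gl_apply(size_t state, unsigned value)
{
    gl_value[state] = value;
    gl_calls++;
}

struct device {
    unsigned d3d_value[NUM_STATES]; /* what the application requested */
    bool     dirty[NUM_STATES];     /* changed since the last draw? */
};

/* SetRenderState equivalent: record the request, make no gl call. */
static void set_render_state(struct device *dev, size_t state, unsigned value)
{
    assert(state < NUM_STATES);
    dev->d3d_value[state] = value;
    dev->dirty[state] = true;
}

/* Called at the start of drawprim: apply only the states that were touched. */
static void apply_dirty_states(struct device *dev)
{
    for (size_t i = 0; i < NUM_STATES; i++) {
        if (!dev->dirty[i])
            continue;
        stub_gl_apply(i, dev->d3d_value[i]);
        dev->dirty[i] = false;
    }
}
```

With this layout, setting and resetting a state between two draws costs one gl call instead of two, and a draw with no state changes in between costs none. The scan over all states could be replaced by a list of dirty entries if the table gets large.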
To avoid re-applying a setting that is already active, the current opengl state could be stored in the d3ddevice. I think this should be done in any case, and depending on the nature of some parameters the opengl state or the d3d state should be kept in there (or both). So we shouldn't use a d3d stateblock but instead our own structure where we can add the stuff we need.
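One way to picture such a structure is a small wrapper that remembers what was last sent to opengl and skips the call when nothing would change. This is only a sketch under assumptions: the capability ids, struct gl_cache and cached_set are hypothetical names, and a counting stub stands in for glEnable/glDisable.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical capability ids; real code would use GL_LIGHTING, GL_FOG, ... */
enum cap { CAP_LIGHTING, CAP_FOG, CAP_BLEND, CAP_COUNT };

/* Stub for glEnable/glDisable that only counts calls. */
static unsigned gl_calls;

static void stub_gl_set(enum cap c, bool on)
{
    (void)c;
    (void)on;
    gl_calls++;
}

/* Our own structure kept in the device, not a d3d stateblock. */
struct gl_cache {
    bool known[CAP_COUNT];   /* false until we have set the value once */
    bool enabled[CAP_COUNT]; /* last value sent to opengl */
};

/* Enable or disable a capability, skipping the gl call if it is redundant. */
static void cached_set(struct gl_cache *cache, enum cap c, bool on)
{
    assert(c < CAP_COUNT);
    if (cache->known[c] && cache->enabled[c] == on)
        return;               /* opengl already has this value */
    stub_gl_set(c, on);
    cache->known[c] = true;
    cache->enabled[c] = on;
}
```

A cache like this is only correct if every gl state change goes through the wrapper, including the ones made in BltOverride and UnlockRect; otherwise it goes stale.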
I also think we should try to get rid of as many things as possible in drawprim. I do not mean removing the state handling entirely, but having a simple flag which tells if a larger number of states needs attention, like last_was_rhw does. I am thinking about adding a last_was_blit flag which is set in BltOverride and UnlockRect.
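A rough sketch of how a last_was_blit flag could work. The counters stand in for the actual gl setup work, and the function names are placeholders. Skipping the 2D setup on consecutive blits is an extra assumption of this sketch, as is the device starting out in the 3D setup.

```c
#include <assert.h>
#include <stdbool.h>

/* Counters standing in for the expensive gl setup work. */
static unsigned blit_setups, draw_setups;

struct device {
    bool last_was_blit;   /* set by the blit paths, cleared by drawprim */
};

/* BltOverride / UnlockRect equivalent. */
static void blit(struct device *dev)
{
    assert(dev);
    if (!dev->last_was_blit) {
        blit_setups++;            /* switch opengl to the 2D setup */
        dev->last_was_blit = true;
    }
    /* ... do the blit; nothing is saved or restored ... */
}

/* drawprim equivalent. */
static void draw_primitive(struct device *dev)
{
    assert(dev);
    if (dev->last_was_blit) {
        draw_setups++;            /* reapply the states the blit changed */
        dev->last_was_blit = false;
    }
    /* ... draw ... */
}
```

A 2D-only application would then pay for the blit setup once and never for a restore, and a 3D application that never blits pays only for one flag test per draw.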
I'm not sure about the texture stage states and sampler states. In d3d the settings seem to be per-stage, while in opengl I think they are per texture. If so, we could store the last d3d settings that the texture was used with in the texture, to see if we have to reapply them.
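Assuming the filter and addressing settings really are per texture object in opengl, storing the last applied settings in the texture could look roughly like this. The struct layout, the chosen fields and apply_sampler are hypothetical, and a counting stub stands in for glTexParameteri.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of the sampler settings. */
struct sampler_state {
    int min_filter;
    int mag_filter;
    int address_u;
};

struct texture {
    bool has_applied;               /* false until the first use */
    struct sampler_state applied;   /* settings last sent to opengl for this texture */
};

/* Stub for glTexParameteri that only counts calls. */
static unsigned gl_calls;

static void stub_tex_parameter(int value)
{
    (void)value;
    gl_calls++;
}

/* Called when the texture is bound to a stage, with that stage's d3d settings. */
static void apply_sampler(struct texture *tex, const struct sampler_state *want)
{
    assert(tex && want);
    bool first = !tex->has_applied;

    if (first || tex->applied.min_filter != want->min_filter)
        stub_tex_parameter(want->min_filter);
    if (first || tex->applied.mag_filter != want->mag_filter)
        stub_tex_parameter(want->mag_filter);
    if (first || tex->applied.address_u != want->address_u)
        stub_tex_parameter(want->address_u);

    tex->applied = *want;
    tex->has_applied = true;
}
```

A texture that is always used with the same settings is then set up once, and moving it to a stage with different settings reapplies only the values that differ.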
A partially related sidenote about drawStridedSlow: It has a loop which iterates through the vertices to draw, and in this loop there are if statements checking which data the vertex contains. So even if some data isn't there, each check costs at least a comparison + a jump per vertex. I had a look at the vertex data 3DMark2000 uses and, for testing, removed the checks and handling for the things it didn't use. 3DMark2000 doesn't need drawStridedSlow a lot, but the score still increased from 4879 to 5013 3DMarks. As a comparison, forcing drawStridedFast gets about 5500 3DMarks. The fps in the low detail helicopter test increased from 95.6 to 99.2 (105 with drawStridedFast).
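The test above simply removed the checks. One possible way to get a similar effect without dropping support for any vertex component is to decide once, outside the loop, which components are present. The sketch below shows this for a single optional component; it is not what was measured, the struct and function names are invented, and counting stubs stand in for glVertex3fv / glNormal3fv.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical strided data: a NULL pointer means the component is absent. */
struct strided {
    const float *position;   /* 3 floats per vertex, always present */
    const float *normal;     /* 3 floats per vertex, or NULL */
    size_t count;
};

/* Stubs standing in for glVertex3fv / glNormal3fv; they only count calls. */
static unsigned vertex_calls, normal_calls;

static void stub_vertex(const float *v) { (void)v; vertex_calls++; }
static void stub_normal(const float *n) { (void)n; normal_calls++; }

/* Current style: the check is repeated for every vertex. */
static void draw_checked(const struct strided *sd)
{
    for (size_t i = 0; i < sd->count; i++) {
        if (sd->normal)
            stub_normal(sd->normal + 3 * i);
        stub_vertex(sd->position + 3 * i);
    }
}

/* Alternative: decide once, then run a loop without the check. */
static void draw_hoisted(const struct strided *sd)
{
    assert(sd->position != NULL);
    if (sd->normal) {
        for (size_t i = 0; i < sd->count; i++) {
            stub_normal(sd->normal + 3 * i);
            stub_vertex(sd->position + 3 * i);
        }
    } else {
        for (size_t i = 0; i < sd->count; i++)
            stub_vertex(sd->position + 3 * i);
    }
}
```

Both functions issue the same calls in the same order. The number of specialised loops grows quickly with the number of optional components, so in practice this would probably be limited to the most common combinations.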
Cheers, Stefan