A small implementation change plan:
With Henri's suggestion, we have two levels of indirection: one that maps a D3D state to a pipeline part and a state, and one that maps a pipeline part and a state to an application function. Since both are known at device creation time at the latest, we can remove one indirection there.
So my suggestion is this: instead of trying to keep the single table in code, add 3 pipeline stage backends, like Henri suggested: a vertex, a fragment and a misc backend. Each pipeline backend has a description structure which contains private data creation and destruction functions, possibly some flags to tell other parts of the pipeline how to communicate with it, and a set of states and application functions:
struct pipeline_backend
{
    DWORD state;            /* State this sets */
    apply_func apply;       /* Apply function like the current state mgmt uses */
    DWORD representative;   /* For state grouping */
};
state and representative work like the current state identifiers via STATE_RENDER(x), STATE_TEXTURESTAGE(x, y), etc. The apply function takes the stateblock, state and context as arguments, as usual.
The current global state table(s) are removed. At device creation (or Init3D or somewhere else) we select a vertex, fragment and misc state backend and a shader backend. The device contains a full state table like the current FFPStateTable / ATIFSStateTable. A pipeline compiler (is there some better name?) iterates over the 3 partial state tables and inserts them into the device's state table. If a state is handled by more than one pipeline part, a helper function can be used which calls the callbacks in a row. That's not as efficient as the current inlining, but it should be at least as fast as in Henri's proposal, and I can live with that.
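To make the "pipeline compiler" step more concrete, here is a rough sketch of how the partial tables could be merged into the device table. Everything in it is a placeholder for illustration, not existing code: state_id stands in for DWORD, struct context and the EX_STATE_* ids are made up, and the partial tables are assumed to end with an entry whose apply is NULL. For brevity the device entry stores up to three handlers and always dispatches through a loop, instead of storing a direct callback for the single-handler case. The first part to register a state decides its representative, which is an arbitrary choice.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int state_id;                 /* stand-in for DWORD */
struct context { unsigned int vertex_calls, fragment_calls, misc_calls; };
typedef void (*apply_func)(state_id state, void *stateblock, struct context *ctx);

#define STATE_COUNT  8                         /* hypothetical number of states */
#define MAX_HANDLERS 3                         /* vertex, fragment, misc */

/* One entry of a partial (per pipeline part) state table. */
struct pipeline_backend_entry
{
    state_id state;
    apply_func apply;                          /* NULL terminates the table */
    state_id representative;
};

/* One entry of the merged, per-device state table. */
struct device_state_entry
{
    apply_func handlers[MAX_HANDLERS];
    unsigned int handler_count;
    state_id representative;
};

/* The "pipeline compiler": merge all partial tables into the device table. */
static void compile_pipeline(struct device_state_entry *table,
        const struct pipeline_backend_entry *const *parts, size_t part_count)
{
    size_t i, p;

    for (i = 0; i < STATE_COUNT; ++i)
    {
        table[i].handler_count = 0;
        table[i].representative = (state_id)i;
    }

    for (p = 0; p < part_count; ++p)
    {
        const struct pipeline_backend_entry *e;

        for (e = parts[p]; e->apply; ++e)
        {
            struct device_state_entry *d;

            assert(e->state < STATE_COUNT);
            d = &table[e->state];
            assert(d->handler_count < MAX_HANDLERS);
            /* First part to register the state picks the representative. */
            if (!d->handler_count) d->representative = e->representative;
            d->handlers[d->handler_count++] = e->apply;
        }
    }
}

/* Helper that calls all callbacks registered for a state in a row. */
static void apply_state(const struct device_state_entry *table, state_id state,
        void *stateblock, struct context *ctx)
{
    unsigned int i;

    for (i = 0; i < table[state].handler_count; ++i)
        table[state].handlers[i](state, stateblock, ctx);
}

/* Example handlers and partial tables; they only count calls. */
enum { EX_STATE_VDECL = 0, EX_STATE_STREAMSRC = 1, EX_STATE_COLOROP = 2 };

static void vertex_vdecl(state_id s, void *sb, struct context *ctx)
{ (void)s; (void)sb; ctx->vertex_calls++; }
static void fragment_colorop(state_id s, void *sb, struct context *ctx)
{ (void)s; (void)sb; ctx->fragment_calls++; }
static void misc_vdecl(state_id s, void *sb, struct context *ctx)
{ (void)s; (void)sb; ctx->misc_calls++; }

static const struct pipeline_backend_entry vertex_part[] =
{
    {EX_STATE_VDECL, vertex_vdecl, EX_STATE_VDECL},
    {0, NULL, 0},
};
static const struct pipeline_backend_entry fragment_part[] =
{
    {EX_STATE_COLOROP, fragment_colorop, EX_STATE_COLOROP},
    {0, NULL, 0},
};
static const struct pipeline_backend_entry misc_part[] =
{
    {EX_STATE_VDECL, misc_vdecl, EX_STATE_VDECL},
    {EX_STATE_STREAMSRC, misc_vdecl, EX_STATE_VDECL},
    {0, NULL, 0},
};
```

With these example tables, compiling {vertex_part, fragment_part, misc_part} leaves the vdecl entry with two handlers, and the stream source entry with the vdecl as its representative. The rendering loop then only ever looks at the merged table.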
Advantages:
-> We can split up pipeline part handlers and select them dynamically etc
-> We have no additional overhead in the rendering loop since we only deal with one state table there
-> Minimal state polling because pixel and vertex shaders are dirtifyable states
Problems:
-> The issues from the last mail still apply and need to be solved (not a problem, just mentioning)
-> Shaders not 100% separated from the fixed function pipeline since both are equal states in the vertex and fragment pipelines
-> Neither a state handler nor shader_select is guaranteed to be applied on each draw, so we can't use them to enable GL_FRAGMENT_SHADER_ATI and GL_TEXTURE_SHADER_ATI. That will have to be a fixed function fragment pipeline callback. If there's a shader implementation using that extension in use as well, they can sort this out via private data sharing. (Note: we would get away with using the colorop setter for this, because after device creation this state is dirty, after a blit it is dirty, and otherwise the extension will be enabled anyway. I don't particularly like that as it is a rather fragile setup.)
-> What do we do if a state has different representatives in different pipeline parts? E.g. the vdecl and vertex shader will be linked in both the misc and vertex pipeline backends. In the misc backend, the stream sources will be added to that group as well. This means that if a stream source is changed, the VDECL fog/lighting changes will be performed needlessly (it happens currently as well, so it won't be a regression). There may be other such state groups, but I can't think of any right now.
Does that sound like a reasonable idea?